
December 2012

Big Data – II

By Samir Kapadia, Chartered Accountant
About this Article

This article is part 2 of the series on Big Data. It briefly deals with why Big Data is gaining so much importance and what the recent trends in Big Data collection and analysis are. The write up also discusses some of the technologies being used for Big Data analysis.

The previous write up briefly touched upon what Big Data is and the vital role it plays. This write up will delve a little further into some of the trends and developments in this arena.

Background:
Big Data, as discussed earlier, is all about collecting, storing and analysing data, and using the results for betterment (one sincerely hopes so). It is typically characterised by features such as volume, velocity, variety and veracity. While Big Data is not entirely a recent development, the manner in which data is gathered, the sources of information, the techniques for storage and the technologies for analysis have evolved significantly in recent times.

Big Data is for Everyone:
Generally speaking, most people believe that Big Data is for large corporations and businesses or for the Government. But the truth is, whether you're a 5-person shop or part of the Fortune 500, you can have Big Data and it can help you grow and become profitable. Today, anyone who wants to remain competitive has to analyse both internal and external data, as quickly and cost-effectively as possible. This rule applies equally to all types of organisations, big or small, giants or dwarfs.

Right now, you may be asking: how will Big Data help me find opportunities by analysing new sources of data? Here is one small example:

As the world becomes more instrumented, with RFID tags,
sensors and other sources, we are creating more and more data. When
paired with external data – like that generated by social media sites –
there’s incredible opportunity that is largely untapped and unanalysed.
This is where Big Data analysis comes into the picture. Every day,
companies of all sizes “cut through the noise” created by so much data
to find valuable insights.

Big Data analysis is not restricted to businesses and commercial organisations; it can be applied to the social sector too. Using the same techniques and tools (i.e. those used for developing marketing and risk management tools), Big Data analysis has the potential to revolutionise the functioning of the social sector. For instance, imagine the advantages of using Big Data analysis in:

  •  the public sector;

  •  the healthcare sector; or

  •  more generally, those sectors where an ethos of treating all citizens in the same way is expected.

The advantages would extend beyond commercial gains to the realm of mass social betterment.

How Big Data is Used:
Big Data allows organisations to create highly specific segmentations and to tailor products and services precisely to meet the needs of each segment.

Consumer
goods and service companies that have used segmentation for many years
are beginning to deploy ever more sophisticated Big Data techniques,
such as the real-time micro-segmentation of customers to target
promotions and advertising. As they create and store more transactional
data in digital form, organisations can collect more accurate and
detailed performance data, in real or near real time, on everything from
product inventories to personnel sick days. Information Technology is
used to instrument processes and then set up controlled experiments.

Data
generated therefrom is used to understand the root causes of the
results, thus enabling leaders to make decisions and implement change.
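The segmentation idea described above can be illustrated with a minimal, hypothetical sketch. The customer records, spend thresholds and segment names below are invented purely for illustration; real micro-segmentation would use far richer data and statistical models rather than fixed rules.

```python
# Hypothetical rule-based customer segmentation: bucketing customers
# by monthly spend so that promotions can be targeted per segment.
customers = [
    {"id": 1, "monthly_spend": 120.0},
    {"id": 2, "monthly_spend": 15.0},
    {"id": 3, "monthly_spend": 560.0},
]

def segment(customer):
    # Assign a segment label based on illustrative spend thresholds.
    spend = customer["monthly_spend"]
    if spend >= 500:
        return "premium"
    if spend >= 100:
        return "regular"
    return "occasional"

segments = {c["id"]: segment(c) for c in customers}
print(segments)  # {1: 'regular', 2: 'occasional', 3: 'premium'}
```

In practice, the thresholds themselves would be learned from transactional data, and the segments refreshed in real or near real time as new transactions arrive.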

Big Data technologies:
Some of the key Big Data technologies which are in play are described below:

  •  Cassandra: Cassandra is an open source (free) database management
    system, designed to handle huge amounts of data on a distributed system.
    This system was originally developed at Facebook and is now managed as a
    project of the Apache Software Foundation.

  •  Dynamo: A proprietary distributed data storage system developed by Amazon.

  •  Hadoop: Is an open source software framework for processing huge
    datasets on certain kinds of problems on a distributed system. Its
    development was inspired by Google’s MapReduce and Google File System.
    It was originally developed at Yahoo! and is now managed as a project of
    the Apache Software Foundation.

  •  R: “R” is an open source
    programming language and software environment for statistical computing
    and graphics. The R language has become a de facto standard among
    statisticians and is widely used for statistical software development
    and data analysis. R is part of the GNU Project, a collaboration that
    supports open source projects.

  •  HBase: An open source (free), distributed, non-relational database
    modelled on Google’s BigTable. It was originally developed by Powerset
    and is now managed as a project of the Apache Software Foundation as
    part of Hadoop.

  •  MapReduce: A software framework introduced
    by Google for processing huge datasets on certain kinds of problems on a
    distributed system. This too has been implemented in Hadoop.

  •  Stream processing: Also known as event stream processing. This refers
    to technologies designed to process large real-time streams of event
    data. Stream processing enables applications such as algorithmic trading
    in financial services, RFID event processing applications, fraud
    detection, process monitoring, and location-based services in
    telecommunications.

  •  Visualisation: This refers to technologies
    used for creating images, diagrams, or animations to communicate a
    message; they are often used to synthesise the results of Big Data
    analyses. Some instances of visualisation are tag clouds,
    clustergrams, history flows, etc.
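The map/reduce pattern named above can be sketched in miniature. The toy "documents" below are invented for illustration; frameworks such as Hadoop apply the same three steps (map, shuffle, reduce) across thousands of machines rather than a single in-memory list.

```python
from collections import defaultdict

# A toy illustration of the map/reduce pattern: counting words
# across several "documents".
documents = [
    "big data is big",
    "data drives decisions",
]

# Map step: turn each document into (word, 1) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle step: group the pairs by key (the word).
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce step: sum the counts for each word.
word_counts = {word: sum(counts) for word, counts in grouped.items()}

print(word_counts)  # {'big': 2, 'data': 2, 'is': 1, 'drives': 1, 'decisions': 1}
```

The appeal of the pattern is that the map and reduce steps are independent per key, so a framework can distribute them across a cluster without the programmer managing the parallelism.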

Myths surrounding Big Data:
While there are many myths surrounding Big Data, for the purpose of this write up I have briefly summarised a few myths commonly associated with it. These are:

Big Data is only about massive volumes of data:

As discussed in part 1, volume is only one of the factors. Generally, the industry considers petabytes of data as a starting point. However, it is only a starting point; there are other aspects such as velocity, variety and veracity to deal with.

Big Data means unstructured data:

While variety is an important characteristic, it should be understood in terms of the format in which the data is gathered and stored. Many people have a mistaken belief that the data would be in an unstructured format. As a matter of fact, the term “unstructured” is misleading to a certain extent, because it does not take into account the many varying and subtle structures typically associated with Big Data types. Candidly, many industry insiders admit that Big Data may well have different data types within the same set that do not share the same structure. Some suggest that a better way to describe Big Data would be to term it “multi-structured”.

Big Data is a silver bullet type solution:

This is an avoidable pitfall. Many businesses have a tendency to believe that Big Data is a silver bullet for their growth strategy. The applications available only offer one of the means to analyse data; applying the learnings from the analysis is altogether a different thing. What needs to be understood is that Big Data is only a means to an end and not the end itself.

What to expect in future:

  •  Big Data will be an important driver of business activities in the future. Almost all businesses will leverage the insights from Big Data based research to hone their strategy. Be it innovation, competition or value addition, Big Data’s contribution will be significant.

  •  The impact of Big Data will span across sectors. Among these, health sciences and natural sciences are likely to have a positive impact on the larger society.

  •  One can expect that the sources of data and the volume of data itself will grow exponentially. Consequently, the data integration process will have to become more efficient.

  •  There will be a demand for talented personnel. Notably, the demand will not be restricted to personnel possessing the requisite skills for collecting and analysing Big Data. The need will be for personnel who know how to use the results of Big Data analysis in effective decision making.

  •  Decision making as we know it (and practise it) today is likely to undergo a drastic change. Sophisticated analytics can substantially improve decision making, minimise risks, and unearth valuable insights that would otherwise remain hidden.

  •  We are likely to see a sea change in the regulatory environment, mainly relating to privacy, intellectual property rights and public liability.

Well, this concludes part 2 of the write up on Big Data. In my next write up I intend to deal with “the (ab)use of social media” and to cover some (disturbing) trends that have caught the attention of many. It’s still a thought, but the idea is fresh.

Disclaimer: The information/factual data provided in the above write up is based on several news reports, articles, etc., available in the public domain. The purpose of this write up is not to promote or malign any person, company or entity; the purpose is merely to create awareness and share knowledge that is already available in the public domain.
