Bombay Chartered Accountant Journal

November 2012

Big Data – What is it all About??

By Samir Kapadia, Chartered Accountant
About this article

Big Data is not a very new idea; it has been out there for quite some time. Nonetheless, very few people have realised its full potential. To highlight a few advantages, Big Data can help businesses become more efficient, serve their customers better and, at the same time, improve their bottom line. In a completely different sphere of life, Big Data helps research organisations track a variety of data, such as meteorological data and data from clinical tests.

Be it business establishments like eBay, Amazon and Facebook, or research organisations and public bodies like NASA, the UN and Governments across the world, the one common link for all those who use Big Data is technology. This article seeks to create awareness about how technology is used to store and analyse Big Data. Like all big ideas, it has several stories associated with it: successes as well as failures, myths, etc. This article will deal with some of the successes and failures.


Background

Ever wondered how a weather bureau predicts the weather or, for that matter, how organisations like NASA and ISRO monitor space? Apart from secretly tracking UFOs (in case you didn’t know already), that includes tracking various stars, planets, meteorites, comets, spacecraft, satellites and millions of pieces of floating junk that were, in some form or another, part of a satellite or of some cargo carried by one. Then there is the curious case of the measurements that scientists take, such as those in a nuclear test or at the Hadron Collider. How about mapping the human genome: did you know that there are more than a billion unique data sets?

I know that sounds hugely futuristic, and the question that begs to be answered is “What do I care?” or “How does it matter to me?”. Well, let’s just say that the examples described above are some of the sources and users of Big Data. Closer to home and to our everyday life, Big Data is used by giants like Facebook, Amazon and Walmart, to name a few, for improving customer experience.

Characteristics of Big Data:

Well, to be honest, “Big Data” is more of a term that was coined to describe the data itself. What I mean is that there is no “official” definition of “Big Data” or, for that matter, of “Small Data”. But, generally speaking, Big Data refers to data characterised by four features, i.e., volume, variety, velocity and veracity. To understand this better, let’s take a few illustrations of these characteristics that are closely identified with Big Data:

Volume:
Today, businesses everywhere are awash with ever-growing data of all types. Conservatively speaking, they collect huge amounts of data, often terabytes and in some cases petabytes of information.

For instance, a company like Twitter could turn the x terabytes of tweets created each day into improved product sentiment analysis. A company like General Electric is likely to convert billions of annual meter readings into better predictions of power consumption. One company boasts of systems that track crime-related events and can help Governments reduce crime rates.

Velocity:

Sometimes, a few minutes is too late. For certain time-sensitive processes, such as catching fraud, Big Data must be used as it streams into your enterprise in order to maximise its value.

For instance, exchanges like the Bombay Stock Exchange, the National Stock Exchange, etc., scrutinise millions of trade events created each day to identify potential fraud or error (like the punching error reported very recently). A couple of weeks ago (and even in the past), these exchanges assisted SEBI in pinpointing instances of circular trading and front running.

Variety:
For the readers of this Journal, data would mean spreadsheets, word documents, accounting records, etc. But in reality, there is a vast variety of forms and formats in which data can exist. In the case of Big Data, data may be of any type: structured and unstructured data, text data, sensor data, audio, video, click streams, log files and more. Typically, new insights are found when all these different types of data are put together and analysed from one or more specific points of reference.

The classic examples of this would be Facebook, Amazon, etc. and, if I may dare to say so, “algorithmic trading solutions”. It is said that, in some cases, the “algos” are so advanced that they analyse tweets and social media trends for “sentiment” and execute trades on the basis of such analysis alone.

Veracity:
What role does veracity have to play here? Imagine this: you spend a fortune putting in place a system to collect the data. Thereafter, the data is stored before an analysis is made. What good would the collection, storage and analysis be if the data collected was inaccurate? Further, customers part with their data willingly (and, most of the time, unknowingly); who ensures that their privacy is not violated? Statistically speaking, one in three business leaders don’t trust the information they use to make decisions.

How can you act upon information if you don’t trust it? Establishing trust in Big Data presents a huge challenge as the variety and number of sources grow.

Big Data – has been out there for some time:

Most people assume that Big Data is a recent phenomenon. But that’s not quite true. As a matter of fact, companies like American Express and Google have been using Big Data in some form or the other to analyse and predict customer behaviour, with a view to enhancing customer service and public perception. While this may or may not be true, the fact remains that the amount of data captured and analysed in the last two to three years far exceeds the total data (in volume and variety) captured over the last millennium (at the least).

Big Data – recent changes:
What most people don’t realise is the manner and extent to which changes have taken place in the last couple of years. To begin with, storage space has increased dramatically and our ability to process such data has been growing exponentially. One could also attribute some positives to technological advancement, the development of new analytical models, etc. Given all these, our need for such data, the manner of its use and its very application have undergone a sea change (one may say, a change of epic proportions). Here is why:

  • Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.
  • Facebook handles 40 billion photos from its user base.
  • FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts world-wide.
  • Decoding the human genome originally took 10 years to process; now it can be achieved in one week.
  • There are 4.6 billion mobile-phone subscriptions worldwide and between 1 billion and 2 billion people accessing the internet.
  • Between 1990 and 2005, more than 1 billion people worldwide entered the middle class, which means that more and more people who gain money will become more literate, which in turn leads to information growth.
  • The world’s effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000 and 65 exabytes in 2007; it is predicted that the amount of traffic flowing over the internet will reach 667 exabytes annually by 2013.

(Source: Wikipedia)
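The telecommunication capacity figures above mix petabytes and exabytes, which makes the jumps hard to compare at a glance. The short Python sketch below is only an illustration: it assumes decimal units (1 exabyte = 1,000 petabytes), and the names `capacity_pb` and `growth_multiples` are made up for this example. It puts the four historical figures on one scale and works out how many times capacity multiplied between successive data points. The 2013 prediction is left out because it describes annual internet traffic rather than capacity.

```python
PB_PER_EB = 1000  # assumption: decimal units, 1 exabyte = 1,000 petabytes

# Capacity figures quoted in the list above, converted to petabytes.
capacity_pb = {
    1986: 281,
    1993: 471,
    2000: 2.2 * PB_PER_EB,
    2007: 65 * PB_PER_EB,
}


def growth_multiples(series):
    """Return {(start_year, end_year): multiple} for consecutive data points."""
    years = sorted(series)
    return {(a, b): series[b] / series[a] for a, b in zip(years, years[1:])}


for (start, end), multiple in growth_multiples(capacity_pb).items():
    print(f"{start} to {end}: capacity multiplied about {multiple:.1f} times")
```

On these figures, most of the growth comes after 2000: capacity multiplied far more between 2000 and 2007 than in either of the two earlier periods.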

How big is “Big data”:

Consider this. In 2012, the Obama administration announced the Big Data Research and Development Initiative, which explored how Big Data could be used to address important problems facing the government. The initiative was composed of 84 different Big Data programs spread across six departments. The United States Federal Government owns six of the ten most powerful supercomputers in the world.

Big Data has increased the demand for information management specialists, due to which software giants such as SAG, Oracle Corporation, IBM, Microsoft, SAP and HP have spent more than $15 billion on software firms specialising only in data management and analytics. This industry on its own is estimated to be worth more than $100 billion. That’s not all: it is reported to be growing at almost 10% a year, which is roughly twice as fast as the software business as a whole.

In India, the Big Data industry is expected to grow from $200 million in 2012 to $1 billion in 2015, at a CAGR of over 83%. Nasscom’s prediction is that Big Data will help the BPO industry move forward by supporting “evidence-based” decision-making for clients, which in turn has a high impact on business operations.
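For readers who want to check growth figures like these, the compound annual growth rate (CAGR) follows from the start value, the end value and the number of years. The sketch below is a minimal illustration and not part of the Nasscom estimate: the function name `cagr` is made up here, and the three-year period is an assumption read off the 2012 and 2015 endpoints. With these rounded endpoints the formula gives roughly 71% a year, so the quoted figure of over 83% presumably rests on different base figures or a different period.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% a year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Rounded endpoints from the text, in millions of US dollars.
# Assumption: three annual compounding periods between 2012 and 2015.
india_big_data = cagr(200, 1000, 3)
print(f"Implied CAGR: {india_big_data:.1%}")
```

The same function can be reused for any of the other growth claims in this article, provided the start value, end value and period are known.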

Can we ignore Big data?

The answer seems to be a resounding NO. Why? Because, to remain competitive, all organisations need to analyse both internal and external data as quickly and cost-effectively as possible. As the world becomes more instrumented, with RFID tags, sensors and other sources, companies are creating more and more data. When this is paired with external data, like that generated by social media sites, there is an incredible opportunity that is largely untapped and unanalysed.

Parting remarks:

This write-up was intended to be a precursor, giving readers a basic overview of Big Data. In the next part, we will cover some more ground, delve into some more details, understand what all the hype is about and see whether or not there is a hidden pot of gold at the end of the rainbow.

Until then, I wish all the readers a Happy Dassera.

Disclaimer: The information/factual data provided in the above write-up is based on several news reports, articles, etc., available in the public domain. The purpose of this write-up is not to promote or malign any person or company or entity. The purpose is merely to create awareness and share knowledge that is already available in the public domain.
