4. • ‘Big Data’ is similar to ‘small data’, but bigger
• …but bigger data requires different approaches:
• Techniques, tools and architecture
• …with an aim to solve new problems
• …or old problems in a better way
6. Characteristics of Big Data:
1 - Scale (Volume)
• Data Volume: exponential increase in collected/generated data
7. Big Data in Today’s Business and Technology Environment
2.7 zettabytes of data exist in the digital universe today. (Source)
By April 2011, the U.S. Library of Congress had collected 235 terabytes of data. (Source)
The Obama administration is investing $200 million in big data research projects. (Source)
IDC estimates that by 2020, business transactions on the internet (business-to-business and business-to-consumer) will reach 450 billion per day. (Source)
Facebook stores, accesses, and analyzes 30+ petabytes of user-generated data. (Source)
Akamai analyzes 75 million events per day to better target advertisements. (Source)
94% of Hadoop users perform analytics on large volumes of data, which was not possible before; 88% analyze data in greater detail; and 82% can now retain more of their data. (Source)
8. Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data. (Source)
More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide. (Source)
Decoding the human genome originally took 10 years; now it can be achieved in one week. (Source)
In 2008, Google was processing 20,000 terabytes of data (20 petabytes) a day. (Source)
The largest AT&T database boasts titles including the largest volume of data in one unique database (312 terabytes) and the second largest number of rows in a unique
9. The Rapid Growth of Unstructured Data
YouTube users upload 48 hours of new video every minute of the day. (Source)
571 new websites are created every minute of the day. (Source)
Brands and organizations on Facebook receive 34,722 Likes every minute of the day. (Source)
100 terabytes of data are uploaded daily to Facebook. (Source)
According to Twitter’s own research in early 2012, it sees roughly 175 million tweets every day and has more than 465 million accounts. (Source)
30 billion pieces of content are shared on Facebook every month. (Source)
Data production will be 44 times greater in 2020 than it was in 2009. (Source)
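The 44-fold projection above can be restated as a yearly growth rate. The sketch below assumes steady compound growth over the 11 years from 2009 to 2020, which the source does not claim; the function name is illustrative only.

```python
def implied_annual_growth(factor: float, years: int) -> float:
    """Return the constant yearly growth rate that compounds to `factor`."""
    return factor ** (1.0 / years) - 1.0


# Assumption: growth is spread evenly across 2009-2020 (11 yearly steps).
rate = implied_annual_growth(44, 2020 - 2009)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the rate works out to roughly 41% per year, meaning data production would double about every two years.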
10. The Rapid Growth of Unstructured Data
In late 2011, IDC Digital Universe published a report indicating that some 1.8 zettabytes of data would be created that year. (Source)
In other words, the amount of data in the world today is equal to:
Every person in the US tweeting three tweets per minute for 26,976 years.
Every person in the world having more than 215 million high-resolution MRI scans a day.
More than 200 billion HD movies, which would take a person 47 million years to watch.
12. Social media and networks
(all of us are generating data)
Scientific instruments
(collecting all sorts of data)
Mobile devices
(tracking all objects all the time)
Sensor technology and networks
(measuring all kinds of data)
18. Why Big Data
• Key enablers of the appearance and growth of Big Data are:
– Increase of storage capacities
– Increase of processing power
– Availability of data
– Every day we create 2.5 quintillion bytes of data; 90% of the data in the world today has been created in the last two years alone
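Figures such as "2.5 quintillion bytes" are easier to compare with the petabyte and zettabyte numbers on other slides once they are converted to a common unit. The helper below is a minimal sketch that assumes decimal (SI) units, where each step is a factor of 1000; the name `humanize` is hypothetical.

```python
# Decimal (SI) byte units, each 1000 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def humanize(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000
    return f"{value:g} {UNITS[-1]}"  # not reached; keeps the return type explicit


daily = 2.5e18  # 2.5 quintillion bytes per day, as quoted on the slide
print(humanize(daily))        # daily volume in exabytes
print(humanize(daily * 365))  # yearly volume if the daily rate stayed constant
```

The yearly line is only an extrapolation of the quoted daily rate, not a figure from the source.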
19. Big Data Analytics
• Examining large amounts of data
• Appropriate information
• Identification of hidden patterns, unknown correlations
• Competitive advantage
• Better business decisions: strategic and operational
• Effective marketing, customer satisfaction, increased revenue
20. Applications for Big Data Analytics
Homeland Security
Finance
Smarter Healthcare
Multi-channel Sales
Telecom
Manufacturing
Traffic Control
Trading Analytics
Fraud and Risk
Log Analysis
Search Quality
Retail: Churn, NBO
21. Healthcare
• 80% of medical data is unstructured and clinically relevant
• Data resides in multiple places, such as individual EMRs, lab and imaging systems, physician notes, medical correspondence, claims, etc.
• Leveraging Big Data
• Build sustainable healthcare systems
• Collaborate to improve care and outcomes
• Increase access to healthcare
23. Potential Talent Pool - Big Data
India will require a minimum of 1 lakh (100,000) data scientists in the next couple of years, in addition to data analysts and data managers, to support the Big Data space.
26. Big Data Analytics Technologies
NoSQL: non-relational, or at least non-SQL, database solutions such as HBase (also a part of the Hadoop ecosystem), Cassandra, MongoDB, Riak, CouchDB, and many others.
Hadoop: an ecosystem of software packages, including MapReduce, HDFS, and a whole host of other software packages.
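To make the MapReduce idea concrete, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to word counting. It illustrates the programming model only and is not Hadoop's API; all function names are hypothetical, and a real cluster would run these stages in parallel across many machines with the input stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data about data"])
print(counts)
```

Because each map call looks at one line and each reduce call looks at one key, both stages can be split across machines without coordination, which is what makes the model scale out.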
27. Main Big Data Technologies
Hadoop, NoSQL Databases, Analytic Databases
Hadoop
• Low cost, reliable scale-out architecture
• Distributed computing
• Proven success in Fortune 500 companies
• Exploding interest
NoSQL Databases
• Huge horizontal scaling and high availability
• Highly optimized for retrieval and appending
• Types
– Document stores
– Key-value stores
– Graph databases
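The three NoSQL types listed above differ mainly in how a record is shaped and looked up. The sketch below models each with plain Python structures; it is not the API of any real database, and every name and sample record in it is made up for illustration.

```python
import json

# Key-value store: an opaque value fetched by exactly one key.
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "Pune"})}

# Document store: structured documents whose individual fields can be queried.
documents = [
    {"_id": 42, "name": "Ada", "city": "Pune"},
    {"_id": 43, "name": "Lin", "city": "Oslo"},
]

# Graph database: nodes plus explicit, typed relationships between them.
edges = [(42, "FOLLOWS", 43)]


def get_by_key(key: str) -> dict:
    """Key-value access: the store cannot look inside the value."""
    return json.loads(kv_store[key])


def find_by_field(field: str, value) -> list:
    """Document access: filter on any field of the document."""
    return [doc for doc in documents if doc.get(field) == value]


def followers_of(node_id: int) -> list:
    """Graph access: traverse relationships rather than fields."""
    return [src for src, rel, dst in edges if rel == "FOLLOWS" and dst == node_id]


print(get_by_key("user:42"))
print(find_by_field("city", "Oslo"))
print(followers_of(43))
```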
Analytic RDBMS
• Optimized for bulk-load and fast aggregate query workloads
• Types
– Column-oriented
– MPP (massively parallel processing)
– In-memory
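One reason column-oriented storage suits aggregate queries is that a sum over one field only has to touch that field's values. The sketch below contrasts a row layout with a column layout for the same made-up orders; it is a toy model in plain Python, not how any particular analytic database is implemented.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "north", "amount": 120.0},
    {"order_id": 2, "region": "south", "amount": 80.0},
    {"order_id": 3, "region": "north", "amount": 50.0},
]


def to_columns(records: list) -> dict:
    """Pivot a list of row dicts into one list of values per column."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)

# Row layout: the aggregate has to visit every whole record.
total_row_layout = sum(row["amount"] for row in rows)

# Column layout: the aggregate reads a single contiguous list.
total_column_layout = sum(columns["amount"])

print(total_row_layout, total_column_layout)
```

Both totals are identical; the difference is how much data must be read to compute them, which matters once the table has many columns and billions of rows.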