3. Definition
• Big data is a term applied to data sets whose
size is beyond the ability of commonly used
software tools to capture, manage, and process
the data within a tolerable elapsed time. Big
data sizes are a constantly moving target,
currently ranging from a few dozen terabytes to
many petabytes of data in a single data set.
6. • In a 2001 research report[14] and related
conference presentations, then-META Group
(now Gartner) analyst Doug Laney defined
data growth challenges (and opportunities) as
being three-dimensional, i.e. increasing volume
(amount of data), velocity (speed of data in/out),
and variety (range of data types, sources).
Gartner continues to use this model for
describing big data.[15]
7. Gartner
• Worldwide information volume is growing
at a minimum rate of 59 percent annually,
and while volume is a significant
challenge in managing big data, business and
IT leaders must focus on information volume,
variety and velocity.
• Volume
• Variety
• Velocity
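The 59 percent figure compounds quickly. As a rough illustration, assuming the rate holds constant every year (a simplification for the sake of arithmetic, not a Gartner forecast), the sketch below computes the implied doubling time and the growth multiplier after a given number of years. The function names are hypothetical.

```python
import math

# Minimum annual growth rate quoted on the slide (59 percent per year).
ANNUAL_GROWTH = 0.59


def doubling_time_years(rate: float) -> float:
    """Years for a quantity compounding at `rate` per year to double."""
    return math.log(2) / math.log(1 + rate)


def growth_factor(rate: float, years: int) -> float:
    """Multiplier on today's volume after `years` of compound growth."""
    return (1 + rate) ** years


if __name__ == "__main__":
    print(f"Doubling time: {doubling_time_years(ANNUAL_GROWTH):.2f} years")
    for years in (1, 5, 10):
        print(f"After {years:>2} years: x{growth_factor(ANNUAL_GROWTH, years):.1f}")
```

Under that assumption, volume roughly doubles every year and a half and grows about tenfold in five years.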
8. Volume
• Volume: The increase in data volumes within
enterprise systems is caused by transaction
volumes and other traditional data types, as
well as by new types of data. Too much volume
is a storage issue, but too much data is also a
massive analysis issue.
9. Variety
• Variety: IT leaders have always had an issue
translating large volumes of transactional
information into decisions; now there are
more types of information to analyze, coming
mainly from social media and mobile
(context-aware) sources. Variety includes tabular
data (databases), hierarchical data, documents,
e-mail, metering data, video, still images, audio,
stock ticker data, financial transactions and
more.
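Part of the variety problem is that hierarchical sources (JSON documents, XML, nested logs) do not line up with the tabular rows a database expects. A minimal sketch of one common bridging step, flattening nested keys into dotted column names; the `flatten` helper and the sample record are hypothetical, and lists are deliberately left unexpanded.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten a nested (hierarchical) record into a single-level dict.

    Keys become dotted paths, so the result can be stored as one table row.
    Non-dict values (including lists) are copied through unchanged.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))  # recurse into sub-objects
        else:
            flat[path] = value
    return flat


# Hypothetical hierarchical record, e.g. parsed from a JSON document.
order = {
    "id": 1001,
    "customer": {"name": "Ada", "address": {"city": "London", "zip": "N1"}},
    "total": 42.5,
}

print(flatten(order))
```

Each dotted key can then map to a column, at the cost of wide, sparse tables when records vary a lot in shape.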
10. Velocity
• Velocity: This involves streams of data,
structured record creation, and availability for
access and delivery. Velocity means both how
fast data is being produced and how fast the
data must be processed to meet demand.
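Velocity is often handled by processing a stream incrementally instead of storing everything first and querying later. The sketch below keeps a running count of events inside a sliding time window; `SlidingWindowCounter` is a hypothetical name, and the sketch assumes timestamps arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque


class SlidingWindowCounter:
    """Count events seen in the last `window` seconds of a stream.

    Memory is bounded by the number of events inside one window, because
    expired timestamps are evicted as new events arrive.
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self.events: Deque[float] = deque()

    def add(self, timestamp: float) -> int:
        """Record an event and return the count in (timestamp - window, timestamp]."""
        self.events.append(timestamp)
        # Evict events that have fallen out of the half-open window.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()
        return len(self.events)


counter = SlidingWindowCounter(window=10.0)
for t in [0.0, 2.0, 5.0, 11.0, 12.0, 30.0]:
    print(t, counter.add(t))
```

Each event is appended and evicted at most once, so the work per event is constant on average, which is what lets this kind of counter keep up with a fast stream.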
11. Why now?
There were 5 exabytes of information created between the dawn of
civilization through 2003, but that much information is now created every 2
days, and the pace is increasing
Eric Schmidt, Google CEO, Techonomy Conference, August 4, 2010
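To make the Schmidt figure concrete, the arithmetic below converts "5 exabytes every 2 days" into a rate, assuming decimal SI units (1 exabyte = 10^18 bytes; binary exbibytes would give slightly different numbers).

```python
# Figures from the quote: 5 exabytes created every 2 days.
# Assumption: decimal SI units, 1 EB = 10**18 bytes.
EXABYTE = 10**18
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_period = 5 * EXABYTE
period_seconds = 2 * SECONDS_PER_DAY

bytes_per_second = bytes_per_period / period_seconds
print(f"{bytes_per_second / 10**12:.1f} TB per second")   # terabytes per second
print(f"{bytes_per_period / 2 / 10**15:.0f} PB per day")  # petabytes per day
```

That works out to roughly 29 terabytes every second, or 2,500 petabytes a day.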
Data is becoming the new raw material of business: an economic input
almost on a par with capital and labour. “Every day I wake up and ask, ‘how
can I flow data better, manage data better, analyse data better?’” says Rollin
Ford, the CIO of Wal-Mart.
Source: Data, Data Everywhere, The Economist, February 25, 2010
12. Source: Mike Driscoll, CTO Metamarkets: The Three Sexy Skills of Data Scientists (& Data Driven Startups)
13. Source: Mike Driscoll, CTO Metamarkets: The Three Sexy Skills of Data Scientists (& Data Driven Startups)
17. How can Big Data create value?
• Creating transparency – enabling, for example,
the manufacturing sector to integrate “data from
R&D, engineering, and manufacturing units to
enable concurrent engineering ... (to)
significantly cut time to market and improve
quality.” This seems much like traditional data
warehousing.
18. How can Big Data create value?
• Enabling experimentation – “organizations can
collect more accurate and detailed performance
data ... to instrument processes and then set up
controlled experiments … (which) can enable
leaders to manage performance at higher
levels.” Super-crunching equals analytics +
experiments.
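Controlled experiments of this kind are often read out with a simple significance test. A minimal sketch of a two-sided, two-proportion z-test for an A/B test on conversion rates; the counts are invented for illustration, and the normal approximation assumes reasonably large samples.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Compare conversion rates of a control (A) and a variant (B).

    Returns (z statistic, two-sided p-value), using the pooled proportion
    under the null hypothesis that both groups share the same rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 200 of 10,000 users convert on A, 260 of 10,000 on B.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here the variant's lift (2.6 versus 2.0 percent) gives a z of roughly 2.8 and a p-value below 0.01.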
19. How can Big Data create value?
• Innovating new business models – “The
emergence of real-time location data has
created an entirely new set of location-based
services from navigation to pricing property and
casualty insurance based on where, and how,
people drive their cars.” This affirms Mike
Loukides' assertion “that data science enables
the creation of data products.”
20. How can Big Data create value?
• Supporting human decision making with
automated algorithms – “decision making may
never be the same; some organizations are
already making better decisions by analyzing
entire datasets from customers, employees, or
even sensors embedded in products.” The
statistical learning world continues to progress.
21. SAS - unstructured text
• http://www.youtube.com/user/SASsoftware?v=NHAq8jG4FX4&feature=pyv&ad=8557352196&kw=data%20analytics
22. Pattern-Based Strategy
• "The ability to manage extreme data will be a core competency of enterprises that
are increasingly using new forms of information — such as text, social and context —
to look for patterns that support business decisions in what we call Pattern-Based
Strategy," said Yvonne Genovese, vice president and distinguished analyst at
Gartner. "Pattern-Based Strategy, as an engine of change, utilizes all the
dimensions in its pattern-seeking process. It then provides the basis of the modeling
for new business solutions, which allows the business to adapt. The seek-model-
and-adapt cycle can then be completed in various mediums, such as social
computing analysis or context-aware computing engines."
23. Pattern-Based Strategy
• http://www.youtube.com/watch?v=r8N0L8Cz1qg&feature=BFa&list=UUSNX50LYGXWV_e5UWZGPGbw&lf=plpp_video
24. EMC’s Big Data Video
• http://www.youtube.com/watch?v=ILBV391a8Ic
• O’Reilly’s Take
• http://www.youtube.com/watch?v=Rn5rVGGfzy0&feature=related
25. Tricks of the Trade
• New Architecture
• In-Memory Analytics
30. In-Memory Indexing at SAP
• We have also got enterprise search; we really started doing that back in the
2003/2004 time period. That’s also when we started coming out with Business
Warehouse Accelerator. That was when Google was just really starting to become
Google, and we tried to do the same thing with enterprise data that Google does
with website data as far as indexing it. So we also put the indexes in memory,
so it sped up even further, and now if you actually look at HANA, it really is
kind of the next evolutionary step in that chain. This is in-memory processing,
and this isn’t something just for a specialist. It really is a technology that’s
matured to a level that it can run the entire business suite and run your entire
company in-memory and get all those benefits for everything.
• http://docs.media.bitpipe.com/io_10x/io_102428/item_477005/The%20Next%20Chapter%20of%20In-Memory%20Computing_PT_12.22.11.pdf
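The idea of keeping indexes in memory can be illustrated with a toy inverted index. This is not SAP's implementation (the quote does not describe one); it only shows why an index held in RAM can answer term lookups without touching disk. The class name and sample documents are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class InMemoryIndex:
    """A toy inverted index held entirely in RAM: term -> set of document ids."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    @staticmethod
    def tokenize(text: str) -> List[str]:
        """Lower-case the text and split it into alphanumeric terms."""
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id: int, text: str) -> None:
        """Index every term of a document under its id."""
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        """Return sorted ids of documents containing every query term (AND)."""
        terms = self.tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


index = InMemoryIndex()
index.add(1, "Quarterly sales report for EMEA")
index.add(2, "Sales forecast and inventory levels")
index.add(3, "Inventory audit, EMEA warehouses")
print(index.search("sales"), index.search("EMEA inventory"))
```

Every lookup is a dictionary access plus set intersections in memory; a disk-based index would pay for reading posting lists from storage on each query.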
33. For more on Hadoop
• http://www.slideshare.net/PhilippeJulio/hadoop-architecture
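Hadoop's core processing model is MapReduce: a map step emits key/value pairs, the framework shuffles them so that equal keys end up together, and a reduce step combines each group. The single-process Python sketch below imitates those three phases for a word count. It does not use Hadoop's actual API; in a real cluster each phase runs in parallel across many machines and the shuffle moves data over the network.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data is big", "data is everywhere"]))
```

Because mappers only see one line and reducers only see one key, both can be spread across machines without coordination; only the shuffle needs to move data between them.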
Organizations everywhere now realize that there is immense insight and value locked inside of the data, and new infrastructure and approaches to data analysis allow us to unlock that value.
This is what’s happened in the last four decades.
These four factors also happen to be inputs for data generation processes.
Sizes that were unimaginable a few years ago are now commonplace. Just storing and accessing the data can be difficult.
Size :: Managed with :: Stored
Small :: Excel, R :: fits in memory on one machine
Medium :: indexed files, monolithic DB :: fits on disk on one machine
Big :: Hadoop, distributed DB :: stored across many machines
Generally: data too big to fit on a disk, i.e. ‘data-center’ scale.
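The Small/Medium/Big tiers listed above amount to a simple rule of thumb. A sketch of that rule, assuming hypothetical single-machine RAM and disk capacities supplied by the caller (real sizing also depends on data format, compression and workload):

```python
GB = 10**9
TB = 10**12


def storage_tier(dataset_bytes: int, ram_bytes: int, disk_bytes: int) -> str:
    """Classify a data set against one machine's memory and disk capacity.

    Small  -> fits in memory (Excel, R)
    Medium -> fits on local disk (indexed files, monolithic DB)
    Big    -> must be stored across many machines (Hadoop, distributed DB)
    """
    if dataset_bytes <= ram_bytes:
        return "Small"
    if dataset_bytes <= disk_bytes:
        return "Medium"
    return "Big"


# Hypothetical single machine: 64 GB of RAM and 4 TB of disk.
for size in (10 * GB, 2 * TB, 50 * TB):
    print(size, storage_tier(size, ram_bytes=64 * GB, disk_bytes=4 * TB))
```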
Data that is difficult for computers to understand. The principal example is natural-language text; images, video and more also qualify. Valuable information is locked up inside this data (e.g. Twitter).
More data is coming in faster, and decision windows are getting smaller: data can go from valuable to worthless in a matter of minutes (seconds … no, milliseconds).
Source: Architecture for Big Data Analytics: MarkLogic white paper