The dawn of big data

New Rules; New Structures

Neal J. Hannon
University of Kansas
February 9, 2012

Data Mania
• http://www.youtube.com/watch?v=HQ_3g2hUCn4&feature=player_embedded

Definition

• Big data is a term applied to data sets whose
size is beyond the ability of commonly used
software tools to capture, manage, and process
the data within a tolerable elapsed time. Big
data sizes are a constantly moving target,
currently ranging from a few dozen terabytes to
many petabytes of data in a single data set.

• In a 2001 research report[14] and related
conference presentations, Doug Laney, then an
analyst at META Group (now Gartner), defined
data growth challenges (and opportunities) as
being three-dimensional, i.e., increasing volume
(amount of data), velocity (speed of data in/out),
and variety (range of data types, sources).
Gartner continues to use this model for
describing big data.[15]

Gartner
• Worldwide information volume is growing at a
minimum rate of 59 percent annually. While
volume is a significant challenge in managing
big data, business and IT leaders must focus on
all three dimensions: information volume,
variety and velocity.
• Volume
• Variety
• Velocity
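
To make the 59 percent figure concrete, here is a minimal sketch of the compound-growth arithmetic behind it. The function names and the 100 TB starting point are illustrative assumptions, not figures from Gartner; only the 59 percent minimum annual rate comes from the slide.

```python
import math

# Minimum annual growth rate quoted on the slide (59 percent).
ANNUAL_GROWTH = 0.59


def projected_volume(initial, years, rate=ANNUAL_GROWTH):
    """Volume after `years` of compound growth at `rate` per year."""
    return initial * (1 + rate) ** years


def doubling_time_years(rate=ANNUAL_GROWTH):
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2) / math.log(1 + rate)


# Hypothetical starting point: an enterprise holding 100 TB today.
start_tb = 100
print(f"Doubling time: {doubling_time_years():.2f} years")
print(f"After 5 years: {projected_volume(start_tb, 5):.0f} TB")
```

At this rate the volume doubles roughly every year and a half and grows about tenfold in five years, which is why the storage problem and the analysis problem arrive together.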

Volume

• Volume: The increase in data volumes within
enterprise systems is caused by transaction
volumes and other traditional data types, as
well as by new types of data. Too much volume
is a storage issue, but too much data is also a
massive analysis issue.

Variety
• Variety: IT leaders have always had an issue
translating large volumes of transactional
information into decisions. Now there are
more types of information to analyze, mainly
coming from social media and mobile
(context-aware). Variety includes tabular data
(databases), hierarchical data, documents,
e-mail, metering data, video, still images, audio,
stock ticker data, financial transactions and
more.

Velocity
• Velocity: This involves streams of data,
structured record creation, and availability for
access and delivery. Velocity means both how
fast data is being produced and how fast the
data must be processed to meet demand.

Why now?
There were 5 exabytes of information created between the dawn of
civilization through 2003, but that much information is now created every 2
days, and the pace is increasing.
Eric Schmidt, Google CEO, Techonomy Conference, August 4, 2010

Data is becoming the new raw material of business: an economic input
almost on a par with capital and labour. “Every day I wake up and ask, ‘how
can I flow data better, manage data better, analyse data better?’” says Rollin
Ford, the CIO of Wal-Mart.
Source: Data, Data Everywhere, The Economist, February 25, 2010

Source: Mike Driscoll, CTO Metamarkets: The Three Sexy Skills of Data Scientists (& Data Driven Startups)

How can big data create value?

• Creating transparency – enabling, for example,
the manufacturing sector to integrate “data from
R&D, engineering, and manufacturing units to
enable concurrent engineering ... (to)
significantly cut time to market and improve
quality.” This seems much like traditional data
warehousing.

How can Big Data create value?
• Enabling experimentation – “organizations can
collect more accurate and detailed performance
data ... to instrument processes and then set up
controlled experiments ... (which) can enable
leaders to manage performance at higher
levels.” Super-crunching equals analytics +
experiments.

• Innovating new business models – “The
emergence of real-time location data has
created an entirely new set of location-based
services from navigation to pricing property and
casualty insurance based on where, and how,
people drive their cars.” This affirms Mike
Loukides' assertion “that data science enables
the creation of data products.”

• Supporting human decision making with
automated algorithms – “decision making may
never be the same; some organizations are
already making better decisions by analyzing
entire datasets from customers, employees, or
even sensors embedded in products.” The
statistical learning world continues to progress.
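
The "analytics + experiments" idea from the experimentation bullet can be sketched as a simple controlled experiment: two variants, a conversion count for each, and a test of whether the difference is larger than chance would explain. This is a minimal illustration using a standard two-proportion z-test; the function name and the sample counts are hypothetical and do not come from the quoted report.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a is the control group, conv_b / n_b is the treatment group.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 10,000 users per group.
z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With more detailed performance data, the same comparison can be repeated per segment or per process step, which is where the volume of data starts to matter.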

SAS - unstructured text

• http://www.youtube.com/user/SASsoftware?v=NHAq8jG4FX4&feature=pyv&ad=8557352196&kw=data%20analytics

Pattern Based Strategy
• "The ability to manage extreme data will be a core competency of enterprises that
are increasingly using new forms of information — such as text, social and context —
to look for patterns that support business decisions in what we call Pattern-Based
Strategy," said Yvonne Genovese, vice president and distinguished analyst at
Gartner. "Pattern-Based Strategy, as an engine of change, utilizes all the
dimensions in its pattern-seeking process. It then provides the basis of the modeling
for new business solutions, which allows the business to adapt. The
seek-model-and-adapt cycle can then be completed in various mediums, such as social
computing analysis or context-aware computing engines."

Pattern Based Strategy

• http://www.youtube.com/watch?v=r8N0L8Cz1qg&feature=BFa&list=UUSNX50LYGXWV_e5UWZGPGbw&lf=plpp_video

EMC’s Big Data Video

• http://www.youtube.com/watch?v=ILBV391a8Ic

• O’Reilly’s Take
• http://www.youtube.com/watch?v=Rn5rVGGfzy0&feature=related

Tricks of the Trade

• New Architecture

• In Memory Analytics

In-Memory Indexing at SAP
• We have also got enterprise search; we really started doing that back in
the 2003/2004 time period, which is also when we started coming out with
Business Warehouse Accelerator. That was when Google was just really
starting to become Google, and we tried to do the same thing with enterprise
data that Google does with website data as far as indexing it. So we also
put the indexes in memory, so it sped up even further. And now, if you
actually look at HANA, it really is the next evolutionary step
in that chain. This is in-memory processing, and this isn’t something just
for a specialist. It really is a technology that has matured to a level that it can
run the entire business suite and run your entire company in memory and
get all those benefits for everything.

• http://docs.media.bitpipe.com/io_10x/io_102428/item_477005/The%20Next%20Chapter%20of%20In-Memory%20Computing_PT_12.22.11.pdf
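
The idea of indexing enterprise data the way a search engine indexes web pages, and keeping that index in memory, can be illustrated with a tiny inverted index. This is only a sketch of the general technique, not SAP's implementation; the function names and the sample records are hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical enterprise records keyed by id.
records = {
    1: "open invoice for customer Acme",
    2: "paid invoice for customer Globex",
    3: "open purchase order for Acme",
}
index = build_index(records)
print(search(index, "open acme"))
```

Because the whole index lives in a dictionary in RAM, each lookup is a hash probe plus a set intersection, with no disk access. That is the speed-up the quote describes, at the cost of needing enough memory to hold the index.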

For more on Hadoop
• http://www.slideshare.net/PhilippeJulio/hadoop-architecture

Obligatory Questions slide

• Any Questions?
