2. Introduction
Navigate. Doug Denton
Big Data Practice Lead
Level Seven
Guide. Tim Hoolihan
CTO, Dir. of Strategic Services
Level Seven
Explore. Michael DeAloia
Regional Vice President
Expedient Data Centers
3. Agenda
3:00 – Welcome & Introductions
3:05 – Explore the concept of Big Data
3:30 – Navigate through initial projects
4:00 – Beer break
4:15 – Back to work
4:30 – Guide to full company adoption
5:00 – Q&A and more beer (tour departs)
5. “BIG DATA DREAMS”
Michael C. DeAloia
Regional Vice President - Cleveland
Explore. Navigate. Guide.
6. Founded in 2001
Cleveland, Ohio
12-year tenure in technology infrastructure services
Scalable platform design
Petabytes of storage, hundreds of terabytes of memory in our cloud
Thousands of customers
2x year-over-year growth in cloud services
BIGDATADREAMS::THEEXPEDIENTECOLOGY
7.
Cloud & MS
Enterprise Cloud
Managed OS & App
Storage & Backup
Network Services
8.
1 Boston
2 Baltimore
2 Cleveland
1 Columbus
1 Indianapolis
2 Pittsburgh
10GbE interconnect
9. What is Big Data?
History of Big Data
8 Laws of Big Data
Q&A
Big Data by the Numbers
BIGDATADREAMS::ROADMAP
10. What is Big Data?
Gartner has named ‘Big Data’ a strategic technology for 2013.
BIGDATADREAMS::WHATISBIGDATA
11. What is Big Data?
Big Data /bɪɡ dā'tə/ n. A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Big Data challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.
12. What is Big Data?
The three Vs characterize what big data is all about and also help define the major issues that IT needs to address:
• Volume: The massive scale and growth of unstructured data outstrip traditional storage and analytical solutions.
• Variety: Traditional data management processes can’t cope with the heterogeneity of big data, including “shadow” or “dark” data such as access traces and Web search histories.
• Velocity: Data is generated in real time, with demands for usable information to be served up immediately.
13. What is Big Data?
“Big Data is the new oil.”
- Bryan Trogdon, as quoted in ‘The Future of Big Data’, Pew Research survey
14. What is Big Data?
Big Data Analytics IS:
• A technology-enabled strategy for gaining richer, deeper insights into customers, partners, and the business, and ultimately gaining competitive advantage.
• Working with data sets whose size and variety are beyond the ability of typical database software to capture, store, manage, and analyze.
• Processing a steady stream of real-time data in order to make time-sensitive decisions faster than ever before.
• Distributed in nature. Analytics processing goes to where the data is for greater speed and efficiency.
• A new paradigm in which IT collaborates with business users and “data scientists” to identify and implement analytics that will increase operational efficiency and solve new business problems.
• Moving decision making down in the organization and empowering people to make better, faster decisions in real time.

Big Data Analytics IS NOT:
• Just about technology. At the business level, it’s about how to exploit the vastly enhanced sources of data to gain insight.
• Only about volume. It’s also about variety and velocity. But perhaps most important, it’s about value derived from the data.
• Generated or used only by huge online companies like Google or Amazon anymore. While Internet companies may have pioneered the use of big data at web scale, applications touch every industry.
• About “one-size-fits-all” traditional relational databases built on shared disk and memory architecture. Big data uses a grid of computing resources for massively parallel processing (MPP).
• Meant to replace relational databases or the data warehouse. Structured data continues to be critically important. However, traditional systems may not be suitable for the new sources and contexts of big data.
15. What is Big Data?
“Every two days now we create as much information as we did from the dawn of civilization up until 2003. That’s something like five exabytes of data.”
- Eric Schmidt, CEO, Google
“By 2015 the digital universe is expected to
reach 8 zettabytes.”
- Intel
16.
1 zettabyte = 18 million copies of the Library of Congress
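The scale of these figures is easier to judge with the arithmetic written out. A minimal sketch, assuming decimal SI units (1 EB = 10^18 bytes, 1 ZB = 10^21 bytes) and taking the quoted figures at face value; the helper name is hypothetical:

```python
# Assumption: decimal SI units (powers of 10), not binary units.
EXABYTE = 10**18
ZETTABYTE = 10**21


def days_to_produce(target_bytes: int, bytes_per_period: int, days_per_period: int) -> float:
    """Days needed to generate target_bytes at a constant rate of bytes_per_period every days_per_period days."""
    return target_bytes / bytes_per_period * days_per_period


# Schmidt's estimate: about 5 EB every two days.
days_per_zettabyte = days_to_produce(ZETTABYTE, 5 * EXABYTE, 2)

# Size of one Library of Congress copy implied by "1 ZB = 18 million copies", in terabytes.
implied_copy_tb = ZETTABYTE / 18_000_000 / 10**12

print(f"Days to produce 1 ZB at 5 EB per two days: {days_per_zettabyte:.0f}")
print(f"Implied size of one Library of Congress copy: {implied_copy_tb:.1f} TB")
```

At the quoted rate, one zettabyte takes about 400 days to generate, and the Library of Congress comparison implies roughly 55.6 TB per copy. Both are order-of-magnitude estimates, not measurements.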
17. Who works with Big Data?
A new kind of professional is helping organizations make sense of the massive streams of digital information: the data scientist. Data scientists are responsible for modeling complex business problems, discovering business insights, and identifying opportunities.
They bring to the job:
• Skills for integrating and preparing large, varied data sets
• Advanced analytics and modeling skills to reveal and understand hidden relationships
• Business knowledge to apply context
• Communication skills to present results
34.
More sources and more devices
• Mobile
• Pictures
• Video
• SMS
• GPS
• Social Media
• Facebook
• Twitter
• YouTube
• Reviews
• Automated Sources
• RFID
• Telemetry
• Security cameras
Real-time correlation of data can be turned into golden nuggets of information.
BIGDATADREAMS::BYTHENUMBERS
35.
Big Data Law #1
The faster you analyze your data, the greater its predictive power.
BIGDATADREAMS::THE8LAWSOFBIGDATA
Great list developed by Dave Feinleib, Managing Director of Big Data Group.
36.
Big Data Law #2
Maintain one copy of your data, not
dozens.
37.
Big Data Law #3
Use more diverse data, not just more
data.
38.
Big Data Law #4
Data has value far beyond what you
originally anticipate.
39.
Big Data Law #5
Plan for exponential growth.
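Exponential growth is easy to under-plan because it compounds: data of size S0 growing by a factor g per year reaches S0 × g^t after t years, and outgrows a fixed capacity C after t = ln(C / S0) / ln(g) years. A minimal sketch of that arithmetic, assuming a constant annual growth factor; the 2x rate echoes the cloud growth figure on slide 6, and the 50 TB and 400 TB values are hypothetical:

```python
import math


def projected_size_tb(current_tb: float, growth_factor: float, years: float) -> float:
    """Data volume after `years`, growing by `growth_factor` per year (S0 * g**t)."""
    return current_tb * growth_factor ** years


def years_until_full(current_tb: float, capacity_tb: float, growth_factor: float) -> float:
    """Years until the data outgrows `capacity_tb`: t = ln(C / S0) / ln(g)."""
    if growth_factor <= 1:
        raise ValueError("growth_factor must be greater than 1 for the data to outgrow capacity")
    return math.log(capacity_tb / current_tb) / math.log(growth_factor)


# Hypothetical figures: 50 TB today, doubling every year, 400 TB of installed capacity.
for year in range(4):
    print(f"Year {year}: {projected_size_tb(50, 2.0, year):.0f} TB")
print(f"Capacity exhausted after about {years_until_full(50, 400, 2.0):.1f} years")
```

Doubling exhausts the 400 TB in about three years, whereas adding only the first year's 50 TB increment each year would suggest seven.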
40.
Big Data Law #6
Solve a real pain point.
41.
Big Data Law #7
Put data and humans together to get
more insight.
42.
Big Data Law #8
Big Data is transforming business the
same way IT did.
43.
Q&A
Michael C. DeAloia
Regional Vice President
Expedient Data Centers
m) 216.212.4067
e) michael.dealoia@expedient.com
BIGDATADREAMS::QUESTIONS&ANSWERS
44. Charting the Course to Big Data Implementation.
Doug Denton, Big Data Practice Lead
Tim Hoolihan, CTO, Dir. of Strategic Services
Explore. Navigate. Guide.
45. What’s Different About Big Data?
• Data that IT historically ignores
• Too much, too fast, too dirty to handle
• Represents 80% of all data
• Very different way of thinking about data
• Very different way of processing data
• A VERY BIG DEAL
You were blind, but now you see.
46. Why Now?
• Pretending 80% of data did not exist is OK
• Not really; numb & blind is no way to live
• Revolutionary tools now available
• Google, Facebook, Yahoo, IBM started
• Open source community advances
• HDFS, MapReduce, Pig, Hive, Jaql, …
• Inexpensive, networked infrastructure available
It is all about technology, baby.
47. Where are we coming from?
• Relational databases are the norm
• Stored after analysis and transformation
• Optimized for predicted retrieval
• Best for well-understood, highly structured data
• Only works for 20% of our data
When it works, it works really well.
48. Where we’re going – Data at Rest
• Data stored in original format
• Divide and conquer to process
• Best for massive, poorly structured data
• Supplements relational database tools
Think “batch processing”.
49. Where we’re going – Data in Motion
• Data that you never write down
• Network traffic, sensor data, phone calls
• Data that never stops
• Processing is done in real time
• Processing is done in memory
• Tools are less numerous
• IBM Streams
Think “watch a stream flow by”.
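The “watch a stream flow by” model can be illustrated without any particular product. This is not the IBM Streams API; it is a minimal Python sketch, assuming readings arrive one at a time from an unbounded source (the feed below is a hypothetical stand-in). Only a fixed-size window is held in memory, and each result is emitted as soon as its reading arrives:

```python
import itertools
from collections import deque
from typing import Iterable, Iterator


def sliding_average(readings: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the last `window` readings as each new reading arrives.

    Only the current window is kept in memory; the stream itself is never stored.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent: deque = deque(maxlen=window)
    running_sum = 0.0
    for value in readings:
        if len(recent) == window:
            running_sum -= recent[0]  # oldest reading is about to be evicted by append()
        recent.append(value)
        running_sum += value
        yield running_sum / len(recent)


def hypothetical_sensor_feed() -> Iterator[float]:
    """Stand-in for an endless source such as a sensor or network tap."""
    for i in itertools.count():
        yield 20.0 + (i % 5)


# The feed never ends, so take just the first few results for display.
for average in itertools.islice(sliding_average(hypothetical_sensor_feed(), window=3), 6):
    print(round(average, 2))
```

The running sum keeps each update constant-time. A production streaming system would also have to deal with out-of-order data, back-pressure, and node failures, which this sketch ignores.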
50. Where are we now?
• Ecosystem of supporting tools well formed
• Thanks Google, FB, Yahoo, IBM
• Thanks Open Source Community
• Tool sets offered as premium aggregations
• IBM BigInsights
• Cloud infrastructure economical & available
• Expedient
Tools are ready for the craftsman.
51. What are the Tools?
• Distributed File System
• Distributed MapReduce Runtime
• Jaql, Pig, Hive, Oozie, HBase, R and others
Find a knowledgeable craftsman.
52. What Makes the Tools Different?
First and foremost - the run-time environment
• Massively distributed
• Redundant
• Anticipates failure
• Runs on commodity servers & operating systems
53. What else?
Divide and conquer on a massive scale
• Break data into smaller chunks (map)
• Execute on chunks in parallel
• Execute code as close to the data as possible
• Execute multiple instances simultaneously
• Work with name-value pairs (tuples)
• Assemble comprehensive answer (reduce)
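The steps above can be sketched on a single machine with a word count, the customary first MapReduce example. This is a minimal Python sketch, not Hadoop's API: the function names are hypothetical, a thread pool stands in for a cluster of map tasks, and moving code to the data is not modeled:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def split_into_chunks(lines: List[str], chunk_size: int) -> List[List[str]]:
    """Break the input into smaller chunks, one per map task."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk: List[str]) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) name-value pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group every value emitted for the same key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: assemble one answer per key from its values."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: List[str], chunk_size: int = 2, workers: int = 4) -> Dict[str, int]:
    """Run map tasks concurrently over the chunks, then shuffle and reduce."""
    chunks = split_into_chunks(lines, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_chunk, chunks)  # one map task per chunk, run concurrently
        all_pairs = [pair for chunk_pairs in mapped for pair in chunk_pairs]
    return reduce_counts(shuffle(all_pairs))


documents = ["Big data is big", "Data never sleeps", "Think big"]
print(word_count(documents, chunk_size=1))
```

Because each map task sees only its own chunk and each reduce sees only one key's values, the same structure scales out once the chunks live on different machines.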
55. The Challenge
• New way of thinking about data
• Everything is valuable data
• New way of thinking about processing data
• No normalization, no relationships
• Program extracts attribute and forms tuple
• Tuples consolidated and reduced
• Integration focuses more on external sources, less on DWs
• New tools and approaches
• Lots of specialized, community-managed tools
• Technology adoption curve progressing rapidly
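The “extract an attribute, form a tuple, consolidate” pattern works directly on raw, unnormalized records. A minimal sketch, assuming web-server log lines in the common log format; the regular expression and function names are hypothetical, and lines that don't match are simply skipped rather than rejected up front:

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical pattern: the three-digit HTTP status that follows the quoted request.
STATUS_PATTERN = re.compile(r'"\s(\d{3})\b')


def extract_status(line: str) -> Optional[Tuple[str, int]]:
    """Pull one attribute out of an unparsed record and form a (status, 1) tuple."""
    match = STATUS_PATTERN.search(line)
    return (match.group(1), 1) if match else None


def consolidate(lines: Iterable[str]) -> Counter:
    """Reduce the tuples to one total per status code, ignoring records that don't match."""
    totals: Counter = Counter()
    for line in lines:
        pair = extract_status(line)
        if pair is not None:
            status, count = pair
            totals[status] += count
    return totals


raw_lines = [
    '10.0.0.1 - - [10/Oct/2013:13:55:36 -0700] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2013:13:55:37 -0700] "GET /missing HTTP/1.1" 404 0',
    "sensor=7 temp=71.2",  # dirty record from another source: no schema, no status
    '10.0.0.1 - - [10/Oct/2013:13:55:38 -0700] "GET /about HTTP/1.1" 200 128',
]
print(dict(consolidate(raw_lines)))
```

Deferring structure to read time (often called schema-on-read) is what lets new or messy sources be loaded before anyone designs tables for them.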
56. Meeting the Challenge
• Embrace the opportunity/inevitability
• Consider your place on the adoption curve
• Effectively, Efficiently, Intelligently:
• Experiment with technologies
• Prove concepts valuable to organization
• Prototype high value applications for quick wins
• Enable staff & organization
• Make a practical plan based on experience
Now is the time for leadership.
57. Big Data is a Big Deal for Business
• Bigger deal for CEO than for IT
• CEO singles look better than IT home runs
• Better for the CEO to drag IT than for IT to push the CEO
• You will need money
• You will need help keeping the faith
Dig where gold has been found.
61. Proving the Value
• GM/CEO needs to be in front of IT
• Think proof of value (POV), not proof of concept (POC)
• Get rid of the engineering mindset
• Stop thinking about specific tools – for now
• Sell the story without mentioning the tools
You still need the tools!
62. Top 5 Big Data Projects:
The Categories
1. Know Your Customer
2. Secure Cyber Assets
3. Optimize Operations
4. Expand Data Warehouse
5. Explore & Discover
63. Top 5 Big Data Projects:
1. Know Your Customer.
• Social Media
• Measure and track customer sentiment
• Real-time customer engagement
• Real-time selling
• Customer profiling
• Recent transactions
• Call center and web site activity
• Rate likelihood of defection
T-Mobile cut defections by 50% in one quarter.
64. Top 5 Big Data Projects:
2. Secure Cyber Assets
• Analyze
• Logs to inform security policies
• Network traffic to identify outliers & patterns
• Enforce in real-time
• Data in motion solution
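Outlier detection on network traffic can start very simply. A minimal sketch using a z-score rule (one of many possible techniques, swapped in here for illustration), assuming per-host byte counts have already been aggregated; the host names, counts, and the threshold of 2 standard deviations are hypothetical:

```python
from statistics import mean, pstdev
from typing import Dict, List


def flag_outliers(bytes_by_host: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Return hosts whose traffic exceeds the mean by more than `threshold` standard deviations."""
    values = list(bytes_by_host.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation across all hosts
    if sigma == 0:
        return []  # every host sent the same amount; nothing stands out
    return sorted(host for host, sent in bytes_by_host.items() if (sent - mu) / sigma > threshold)


# Hypothetical hourly traffic: nine ordinary hosts and one sending far more than the rest.
traffic = {f"host-{i}": 100.0 for i in range(9)}
traffic["host-9"] = 1000.0
print(flag_outliers(traffic))
```

An extreme host also inflates the mean and standard deviation it is measured against, which is why robust statistics such as the median are often preferred once the simple rule proves its value.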
65. Top 5 Big Data Projects:
3. Optimize Operations
• Predict equipment failures
• Just-in-time maintenance
• Identify sources of inefficiencies
66. Top 5 Big Data Projects:
4. Expand Data Warehouse
• Customer profile (email/doc/call contents)
• Predicted behavior (man/machine/process)
• Market segmentation
67. Top 5 Big Data Projects:
5. Explore and Discover
• Cost of a new customer
• Cost of a new product
• Efficacy of treatment
• Predictive analytics
• Data science analysis
68. Moving Forward
• Pick your team
• Call your shot
• Assemble your tools
• Prove the value (and your good judgment)
• Plot your course
70. Why Partner?
• What does a Strategic Partnership look like?
• What is the role of a data scientist?
71. Tools
• The tools are great, but…
• Owning a Hadoop cluster doesn’t make a Big Data
practice
• Just like owning a reporting tool doesn’t mean you
have a strong Business Intelligence initiative
• …it takes strategy and experts
72. Scenario A
• Retrained DBA or Developer
• Cost Model
• Looking for Theta with Linear Regression
• Local Minimum problems
• Lots of Iteration
• Even in a matrix / vector world, may iterate
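Scenario A's iterative approach can be sketched as batch gradient descent on the squared-error cost J(θ) = (1/2m) Σ (θ0 + θ1·x - y)². This is a minimal one-feature Python sketch; the data, step size alpha, and tolerance are hypothetical. For ordinary least squares this particular cost is convex, so here the real expense is the number of iterations and the tuning of alpha; local minima become a concern with non-convex cost models:

```python
from typing import List, Tuple


def gradient_descent(xs: List[float], ys: List[float], alpha: float = 0.05,
                     tolerance: float = 1e-8, max_steps: int = 100_000) -> Tuple[float, float, int]:
    """Fit y ≈ theta0 + theta1 * x by repeatedly stepping against the gradient of J(theta).

    Returns (theta0, theta1, steps_taken).
    """
    m = len(xs)
    theta0 = theta1 = 0.0
    for step in range(1, max_steps + 1):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
        if max(abs(grad0), abs(grad1)) < tolerance:
            return theta0, theta1, step
    return theta0, theta1, max_steps


# Hypothetical data lying exactly on y = 1 + 2x.
theta0, theta1, steps = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(f"theta0={theta0:.4f}, theta1={theta1:.4f} after {steps} iterations")
```

Every iteration touches every data point, so on a cluster each step is another full pass over the data.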
73. Scenario B
• Data Scientist
• Linear Algebra solving data in chunks
• Reducing several hundred iterations to one
• Use of proper data structures to leverage matrix / vector operations
• MIMD vs SIMD on the CPU
• Again, large cycles of optimization
• No local minimum problems
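Scenario B's “solve in chunks, once” approach can be sketched with the normal equation (XᵀX)θ = Xᵀy. For one feature plus an intercept, each chunk only needs to contribute five running sums; the sums from all chunks add together, and a single 2×2 solve replaces the loop. A minimal pure-Python sketch under those assumptions; function names and data are hypothetical, and with many, possibly correlated features one would solve the larger system with a numerically stable factorization (for example QR) rather than the explicit formula used here:

```python
from typing import Iterable, Sequence, Tuple

# Running sums that determine X^T X and X^T y: (n, sum_x, sum_y, sum_xx, sum_xy).
Stats = Tuple[float, float, float, float, float]


def chunk_stats(xs: Sequence[float], ys: Sequence[float]) -> Stats:
    """Map step: summarize one chunk of data independently of all others."""
    return (len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs), sum(x * y for x, y in zip(xs, ys)))


def combine(all_stats: Iterable[Stats]) -> Stats:
    """Reduce step: per-chunk sums simply add up."""
    totals = [0, 0, 0, 0, 0]
    for stats in all_stats:
        totals = [t + s for t, s in zip(totals, stats)]
    return tuple(totals)


def solve_normal_equation(stats: Stats) -> Tuple[float, float]:
    """Solve the 2x2 system [[n, sx], [sx, sxx]] · [theta0, theta1] = [sy, sxy] via Cramer's rule."""
    n, sx, sy, sxx, sxy = stats
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    theta0 = (sy * sxx - sx * sxy) / det
    theta1 = (n * sxy - sx * sy) / det
    return theta0, theta1


# Hypothetical data lying on y = 1 + 2x, split across two chunks.
chunks = [([0, 1], [1, 3]), ([2, 3, 4], [5, 7, 9])]
print(solve_normal_equation(combine(chunk_stats(xs, ys) for xs, ys in chunks)))
```

Each chunk is read once, and no step size or stopping rule is needed.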
74. Hats to Wear.
• Algorithms in Context
• Linear Algebra
• Data Structures
• CPU Architecture
• Rare among modern business app developers
• Concurrency Issues
• Cost Modeling
• Data Visualization
75. Why a Partner?
Multidisciplinary jobs are hard and have large barriers to entry
• Even with high market rates, supply can’t keep up
• Analogous to large ERP talent
• Retaining this talent is hard
• Particularly when under-utilized
• Rather than keeping skills sharp artificially, an outsourced data scientist stays sharp by working on real solutions
76. …in short
You don’t keep a trial attorney around full-time for the few
times you may need them.
Why keep a data scientist full-time?
77. Is there a role for internal?
• Tweaks to MapReduce jobs
• Debugging
• Reporting
• Integrating new sources
• Hardware / Infrastructure
• Pilots
78. Explorer. Navigator. Guide.
• Reducing risk of failure
• Working with your team
• Identifying initial projects
• Selecting best tools
• Creating a strategic adoption roadmap
• Avoiding common pitfalls
• Taking you beyond the initial phase
79. Q&A
Doug Denton, Big Data Practice Lead
m) 440.478.6003
e) doug.denton@lvlsvn.com
Tim Hoolihan, CTO, Dir. of Strategic Services
m) 330.338.1532
e) tim.hoolihan@lvlsvn.com
Editor's Notes
Gartner defines a strategic technology as one with the potential for significant impact on the enterprise within the next 3 years. Factors that denote high potential for significant impact include:
• High potential for disruption to IT or the business
• Need for a major dollar investment
• Risk of being late to adopt