1. “Big Data” and “The Cloud”
Robert J. Abate, CBIP, CDMP
Independent Consultant
Webinar: March 20th, 2012
2PM EST / 11AM PST
2. “Big Data” And “The Cloud” - Agenda
The Industry Is A Buzz…
The Challenges Of Big Data
Architectural Solutions & The Cloud
It’s A Brave New World
Case Studies
Questions & Answers
2 Big Data & The Cloud – March 20th, 2012 © 2012 – Dataversity & Robert J. Abate
3. The Industry Is A Buzz…
“Despite the hype, most firms find the technology useful to operate on data they already have”
Source: Forrester, June 2011
4. Everyone Is Talking About Big Data…
“Big data will represent a hugely disruptive force during the next five years – enabling levels of insight – that are currently unachievable through any other means” Gartner: May 2011
“Big Data: Huge Management Implications with Enormous Returns” IDC: March 2011
“Big data is still in mostly uncharted territory, but a surprising number are actually doing something with it” Forrester: June 2011
“61% of respondents feel big data will fundamentally change the way their business works” CIO/Insight: November 2010
“Most enterprise data warehouse (EDW) and BI teams currently lack a clear understanding of big data technologies, potential application areas, and why ‘big data BI’ contrasts with traditional BI tools. It differs dramatically from traditional BI in terms of both capabilities and in the technologies used to achieve those capability breakthroughs” Gartner: January 2012
5. What Are The Drivers For Big Data/Cloud
We Are In The Information Age
Every corporation today is in the “Data Business”
We Are Inundated In Data
Types
Sources
Varieties
Data Is Growing Exponentially
So are the challenges
Data Complexity Is Increasing
Causing insight to be lost
7. Big Data Is More Than Just Volume
Consider: Master Data, Fidelity, Complexity, Validity, Perishability, Linking Data
Structured Data: POS transactions, call detail records, credit card transactions, shipping updates, purchase orders, payments, shipments, account transactions
Unstructured Data: Web logs, newsfeeds, social media, geo-location, mobile, consumer comments, claims, doctor’s notes, clinical studies, images, video, audio
Device-generated Data: RFID sensors, smart meters, smart grids, GPS spatial, micro-payments
[Figure: data types (Transactional Data, Industry-specific, Web traffic, Video, Social, Text, Sensor/location-based, Audio, Documents, Images, Smart Grid) arranged around four dimensions: Velocity, Volume, Variety, Complexity]
8. Big Data’s Potential Is Limitless
TODAY
Less than 10% of enterprise information
“Rear-view” mirror reporting, dashboards and analysis
Days, weeks, months, or even quarters old
Incomplete, inaccurate, and disjointed data
Architectures and methods that take 6 to 18 months to exploit
TOMORROW
Vast majority of available sources and external data
Forward looking or “Windshield-view” predictions with recommendations
Real-time or near real-time
Correlated, high confidence, governed data
Vastly accelerated time to market
9. Time Really Is Money!
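“THE TIME VALUE CURVE”
© 2007 - Dr. Richard Hackathorn, Bolder Technology, Inc., All Rights Reserved. Used with Permission.
[Figure: value (vertical axis) plotted against time (horizontal axis) over the data lifecycle. Capture latency runs from Business Event to Data Ready For Analysis, analysis latency from there to Information Delivered, and decision latency from there to Action Taken. Action time is the total from Business Event to Action Taken, and the drop in value over that interval is the Value Lost.]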
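The curve's message can be made concrete with a little arithmetic: action time is the sum of the capture, analysis, and decision latencies, and value erodes while that time elapses. The sketch below is illustrative only. The latency figures, the `Latencies` and `value_remaining` names, and the half-life decay model are assumptions made for the example, not figures or formulas from the curve's author.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """Hours spent in each stage between a business event and the action taken."""
    capture: float   # business event -> data ready for analysis
    analysis: float  # data ready for analysis -> information delivered
    decision: float  # information delivered -> action taken

    @property
    def action_time(self) -> float:
        # Action time is simply the sum of the three latencies.
        return self.capture + self.analysis + self.decision


def value_remaining(initial_value: float, hours: float, half_life_hours: float) -> float:
    """Assumed decay model (illustrative): value halves every `half_life_hours`."""
    return initial_value * 0.5 ** (hours / half_life_hours)


# Hypothetical latency profiles for a nightly batch and a near real-time pipeline.
batch = Latencies(capture=24.0, analysis=12.0, decision=12.0)
near_real_time = Latencies(capture=1.0, analysis=1.0, decision=4.0)

for name, lat in (("batch", batch), ("near real-time", near_real_time)):
    kept = value_remaining(100.0, lat.action_time, half_life_hours=24.0)
    print(f"{name}: action time {lat.action_time:.0f} h, "
          f"value kept {kept:.1f}, value lost {100.0 - kept:.1f}")
```

Whatever decay shape is assumed, the comparison shows the same thing: shrinking any of the three latencies shortens action time and reduces the value lost.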
10. Data Is Coming At Us Faster
In A Recent TDWI Survey Of 450 CIOs
17% have a real time data warehouse
90% plan on having a real time warehouse
75% will replace to get to a real-time solution
Big Data Projects Are Enterprise-Scale
When asked: “What Is The Scope Of Your Big Data Initiative?”
Enterprise 65%
Line of business 8%
Departmental 8%
Project-based 8%
Regional 5%
Other 5%
Source: Forrester® June 2011 Global Big Data Online Survey
11. Data Is Coming From All Directions…
Data is now commonly entering into
the enterprise from external sources
Government (Census, Revenues, …)
Nielsen, NPD Group (Sales)
Bloomberg, NYSE (Financial Position)
Experian, TransUnion, Equifax (Credit Reporting)
Google Maps, MapInfo (Geospatial, …)
Radian6, Biz360, … (Client Trend Data)
Etc.
12. Need For “Trust In Data”
Compliance with laws
Sarbanes-Oxley [SOX], Basel II, HIPAA, etc.
Lack of confidence in the data
Reports utilizing same data do not report same totals or computations
Data not defined and readily available
Multiple sources of data have to be rationalized at each project start-up, thereby wasting valuable time & $ on every project
Data timeliness
Manual process to collect, analyze and provide results
Data integrity
Unknown filters, varying calculations/computations, fields used for data not indicative of field names, data passed along from one person to another to another to another…
13. Summation Of Industry “Buzz”
Business mandate to obtain more value out of the data (get answers)
Variety of sources, amounts, types and granularity of data that customers want to integrate is growing exponentially
Need to shrink the latency between the business event and the data availability for analysis and decision-making
Advancing agility of information is key
Need for Data trust and Compliance with regulations
14. The Challenges Of Big Data
“If It Was That Easy, Everyone Would Be Doing It”
Source: Unknown
15. The Information Issue Is?
Too many organizations are not using information to its full advantage!
1 in 3 business leaders frequently make critical decisions without the information they need
1 in 2 business leaders do not have access to the information across their organization needed to do their jobs
3 in 4 business leaders say more predictive information would drive better decisions
Source: IBM Institute for Business Value, March 2009
16. Business Alignment & Trust
A Recent CIO:INSIGHT Poll of CIOs Found
56% of respondents say they feel overwhelmed by the amount of data their enterprise manages
33% of respondents want even more sources of data, despite their feelings of being overwhelmed by it
62% of respondents say they’re frequently interrupted by irrelevant incoming data
43% of respondents say they’re dissatisfied with the current tools they use to filter out irrelevant data
46% of respondents say they’ve made inaccurate business decisions as a result of bad or outdated data
One in three report that they “can’t find the right people with the right data”
Source: “The Big Data Conundrum”, http://www.cioinsight.com/c/a/Storage/The-Big-Data-Conundrum-568229/
17. Viewed Another Way…
If a football team had these players on the field:
Only 4 of the 11 players on the field would know which goal is theirs
Only 6 of the 11 would care
Only 3 of the 11 would know what position they play and what they are supposed to do
9 players out of 11 would, in some way, be competing against their own team rather than the opponent
18. BI Perception Is Complicated & Slow
BI/DW is perceived as not “enabling” the business
Inhibitor to corporate progress: IT systems cannot be changed fast enough to meet market demands, seize opportunity or comply with a new requirement.
Weak alignment between IT and business strategy: marked by an intractable language barrier.
Business not always sure what information or dimensions they want or need to answer questions about what to do next
BI/DW has not been known as a source of innovations
The complexity of systems has caused BI/DW to be reactive rather than proactive
Siloed solutions, DBs and applications with trapped business rules
Multiple sources of information and no single “truth”
No “Architectural Blueprints” to the enterprise…
19. BI & D/W – The “Old Way”
Data Chaos
• Same type data is different in diverse systems
• EG: AT&T is the same as AT&T Inc
Master Data
• Publish and subscribe to master data
• EG: Single view of customer across all information systems
Business Intelligence
• Analyzing the data by looking into history
• Viewing graphs of historical information
PROCESSES: Data Discovery, DQ / Data Governance, Data Integration, BI & Data Mining
STAGES: Data Chaos, Defined Data, Master Data, Integrated Information, Business Intelligence, D/W KPIs & Dashboards
TOOLS: Profiling, Metadata / MDM, Data Modeling & ETL, BI / DW / OLAP
Defined Data
• Defined common meanings
• EG: Determine the sources, types, and properties of grouped (i.e.: customer) records
Integrated Information
• Bring metadata together with modeled information for reporting (BI) and warehousing (drilling and hierarchies).
D/W KPIs & Dashboards
• Drilling into information to find and analyze trends
• KPIs and metrics that offer a glimpse into historical performance
• Exception reporting and alerts
21. Advancing The Maturity Of BI
22. The Big Data Method
Data Chaos
• Same type data is different in diverse systems
• EG: AT&T is the same as AT&T Inc
Data Matching
• Profiling of information to determine quality
• Automated analysis to match information
Data Analytics
• Using Data Scientists, evaluate data utilizing mathematical algorithms and visualization toolsets
PROCESSES: Data Discovery, DQ / Data Governance, Analytics Utilizing Data Scientists
STAGES: Data Chaos, Data Analysis, Data Matching, Integrated Information, Data Analytics, Business Performance Optimization
TOOLS: Profiling & Matching / DQ, Query Federation, “R”,
Defined Data
• Defined common meanings
• EG: Determine the sources, types, and properties of grouped (i.e.: customer) records
Integrated Information
• Bring metadata together from matching into data stores and sharing with analysis toolsets
• Organizing information for rapid retrieval
Performance Optimization
• Using analytics, changes to business models are made
• Analysis of models improves business and optimizes business performance
23. Architectural Solutions &
The Cloud
“You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.”
Richard Buckminster Fuller
24. Big Data Required A Big Change
Consider: 100 GB would store the entire US Census DB “basic” information set for every living human being on the planet:
Age, Sex, Income, Ethnicity, Language, Religion, Housing Status, Location packed into a 128-bit record
That equates to about 6.75 billion rows of about 10 columns
Consider the Large Hadron Collider at CERN
Expected to produce 150,000 times as much raw data each year
What makes data sets large is repeated observations over time and/or space (temporal or spatial dimensions)
Web log has millions [M] of visits over a handful of pages
Retailer has 100K products, M customers, but billions of transactions
Hi-res scientific data like fMRI: 1K-GB per view
Cardinalities (distinct observations) were usually small with regard to total # of observations
This was starting to change with the advent of device-supplied information, sensors and other semi-structured and unstructured data sources
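The 100 GB figure is easy to check by hand. The sketch below assumes one 128-bit record per living person and a world population of roughly 6.75 billion (an assumption reflecting the period the slide's source was written), then applies the 150,000x factor quoted for the Large Hadron Collider. Variable names are illustrative.

```python
ROWS = 6_750_000_000   # assumed: one row per living person, circa 2009
BITS_PER_ROW = 128     # packed record of about 10 demographic attributes
LHC_FACTOR = 150_000   # LHC raw data per year relative to this table, as quoted

table_bytes = ROWS * BITS_PER_ROW // 8
table_gb = table_bytes / 10**9                       # decimal gigabytes
lhc_pb_per_year = table_bytes * LHC_FACTOR / 10**15  # decimal petabytes

print(f"demographic table: {table_gb:.0f} GB")
print(f"LHC raw data: {lhc_pb_per_year:.1f} PB per year")
```

The table comes out at 108 GB, which is the “about 100 GB” on the slide, and the collider comparison lands in the range of tens of petabytes per year.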
25. A Change In Technology Was Needed
Consider that relational technologies were invented to get data in and organized, not designed nor organized to get it out
RDBMSs were designed for efficient transaction processing on large data sets
Adding, Updating
Searching for & retrieving small amounts of data
Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09
26. Data Warehousing Was A “Fix”
DW was classically designed as “copy of transaction data specifically structured for query and analysis”
General approach was bulk ETL into a DB designed for queries
Big data caused this “Fix” to break
“Traditional RDBMS-based dimensional modeling and cube-based OLAP turns out to be too slow or too limited to support asking the really interesting questions of warehoused data”
“To achieve acceptable performance for highly order-dependent queries on truly large data, one must be willing to consider abandoning the purely relational database model”
Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09
27. Then Change Came In Technologies…
The advent of cloud and falling storage costs
Infrastructure utilization increased dramatically
Low TCO: the cost of storage and memory dropped significantly, spawning powerful computing paradigms and appliances
The advent of commodity-based processing in a grid or MPP configuration
Usage of existing hardware in a grid paradigm supporting queries across entire datasets
“Hadoop” & MPP Shared Nothing Architectures
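A shared-nothing query across an entire dataset follows a simple pattern: partition the data so each node owns its shard, let every node compute a partial result locally, then merge the small partial results. The sketch below imitates that pattern on one machine. It is not Hadoop's API: threads stand in for separate nodes, and the web-log records and function names are made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_nodes):
    """Split records into n_nodes disjoint shards; each shard belongs to one node."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def map_shard(shard):
    """Runs on one node against its local shard only: count visits per page."""
    return Counter(page for page, _visitor in shard)


def reduce_counts(partials):
    """Merge the small per-node results into the final answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical web-log records: (page, visitor_id)
log = [("/home", 1), ("/cart", 2), ("/home", 3),
       ("/home", 2), ("/cart", 1), ("/help", 4)]

shards = partition(log, n_nodes=3)
with ThreadPoolExecutor(max_workers=3) as pool:  # threads stand in for machines
    partials = list(pool.map(map_shard, shards))

visits = reduce_counts(partials)
print(visits.most_common())
```

Because no node needs another node's data during the map step, adding commodity machines adds capacity almost linearly, which is the appeal of both grid and MPP designs.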
28. Technology Solutions Appeared
Massively Parallel Processing
Teradata, Greenplum, etc.
Grid
Hadoop, MapReduce, Cassandra, etc.
Columnar
ParAccel, Vertica, Sybase, Sand Technologies, etc.
Hardware Appliances
DATAllegro, Netezza, Oracle Exadata, etc.
[Figure: A visualization of a network of Facebook connections, from previous related research by Mucha and others. Credit: Amanda L. Traud, Christina Frost, UNC-Chapel Hill.]
Source: http://www.physorg.com/news192985912.html
29. Virtualization & The Cloud
30. Data Virtualization In The Cloud
31. Advances Provided Answers To Silos
“What Areas Do Your Big Data Initiatives Address?”
Source: Forrester® June 2011 Global Big Data Online Survey
32. It’s A Brave New World…
“Who Owns Or Drives Your Big Data Initiatives?”
Business/IT collaboration 70%
Mostly business-driven, with minimal IT involvement 15%
Mostly IT-driven, with minimal business involvement 12%
Don’t know 2%
Other 2%
Source: Forrester, June 2011
33. From The Old Stack To A New Ecosystem
Data integration without pre-processing
Ability to locate and to query federated sources of data and content without costly data modeling and ETL transformation
Variety of sources (Mergers & Acquisitions, Growth, Services)
Inability to rapidly add new data sources because of tightly coupled business rules
Need for flexible data structures
Current structures are rigid and are views of the sources or the business requirements
Incorporation of unstructured data including social media
Need tools to integrate and analyze unstructured sources that are not currently used
Need to incorporate and utilize metadata
Metadata is disjointed, confined and incompatible; a uniform, agile approach is needed
Dynamic information with views for a reason
Need creation and structuring of views that support dynamic information for purpose
Information management and governance in a regulated world
Security and entitlement checking integrated with query processing
Information grants handled through XACML obligations
34. The New “Data Fabric” Transformation
Coordinates ingestion of information no matter what the source
Micro-batch takes the place of batch
Tagging replaces transformation
Federated query replaces ETL
Query direction removes the need for optimization of data stores
Purposeful view is the new master data repository
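Two of these ideas, “tagging replaces transformation” and “federated query replaces ETL”, can be sketched in a few lines. The example below is an illustration under stated assumptions rather than a description of any product: an in-memory SQLite table and an in-memory CSV stand in for two independent source systems, and the `TAGS` mapping, field names, and functions are hypothetical.

```python
import csv
import io
import sqlite3

# Source 1: a relational store (in-memory SQLite stands in for an operational DB).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cust (cust_id INTEGER, cust_nm TEXT)")
db.executemany("INSERT INTO cust VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2: a flat file from another system (an in-memory CSV stands in for it).
orders_csv = io.StringIO("customer,amount\n1,250.0\n2,100.0\n1,50.0\n")

# Tags map each source's field names to shared business terms.
# The source data itself is never rewritten or copied into a new schema.
TAGS = {
    "crm": {"cust_id": "customer_id", "cust_nm": "customer_name"},
    "orders": {"customer": "customer_id", "amount": "order_amount"},
}


def tagged(source, rows):
    """Yield rows re-keyed by business term, leaving the source untouched."""
    mapping = TAGS[source]
    for row in rows:
        yield {mapping[key]: value for key, value in row.items()}


def customer_totals():
    """Federated query: read both sources in place and join them at query time."""
    cursor = db.execute("SELECT cust_id, cust_nm FROM cust")
    columns = [col[0] for col in cursor.description]
    crm_rows = (dict(zip(columns, row)) for row in cursor)
    customers = {r["customer_id"]: r["customer_name"] for r in tagged("crm", crm_rows)}

    totals = {name: 0.0 for name in customers.values()}
    orders_csv.seek(0)  # re-read the file on every query; nothing is staged
    for order in tagged("orders", csv.DictReader(orders_csv)):
        totals[customers[int(order["customer_id"])]] += float(order["order_amount"])
    return totals


print(customer_totals())
```

The trade-off is the usual one: nothing is staged or remodeled up front, so new sources can be added by writing tags, but every query pays the cost of reaching into the sources at run time.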
35. Newest Trends In Big Data & The Cloud
Compelling Analytics Provide Extreme ROI
Data Visualization Technologies
Heat, Clouds, Clusters, Flows
Mixing Structured, Semi and Unstructured Sources
Self-service analytics - Build your own sandbox!
Data visualization is the study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information"
Big Data Cloud Encircled Warehouses
Data Virtualization
Abstracting the data from the systems
Complements existing data warehouses
Many times the size of structured warehouse
Provides for rapid analytic iterations
Source: Wikipedia - http://en.wikipedia.org/wiki/Data_visualization
36. Data Visualization In Practice
[Figure: WorldWideWeb Around Wikipedia - Wikipedia as part of the world wide web. Created by Chris 73 | Talk 09:56, 18 Jul 2004 (UTC) using TouchGraph GoogleBrowser V1.01]
Source: Wikipedia - http://en.wikipedia.org/wiki/Data_visualization
37. A Picture Is Worth A Thousand Words
Source: Greenplum, An EMC Corporation
38. Mixing Structured, Semi & Unstructured Sources…
Source: Information Builders
39. Big Data Cloud Encircled Warehouses
Source: EMC Corporation
40. Case Studies
In the real world, we find out the reasons why Murphy’s Law is so prevalent…
41. Telecomm Provider Finds Answers…
Before investing tens of millions in infrastructure, a telecomm firm learned where to invest its money…
Challenge
100TB Traditional EDW, Single Source Of Truth
Operational Reporting & Financial Consolidation
Heavy Governance And Control
Unable To Support Critical Business Initiatives
Customer Loyalty And Churn The #1 Business Initiative From The CEO
Enterprise Big Data Cloud Surrounded Warehouse
Extracted Data From EDW & Other Sources
Generated Social Graph From Call Detail And Subscriber Data
Within 2 Weeks Found “Connected” Subscribers 7X More Likely To Churn Than Average Users
Now Deploying 1PB Production
Source: Greenplum, an EMC Corporation
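The analysis in this case study can be pictured with a toy version: build a graph from call detail records, mark subscribers who are connected to someone who already churned, and compare churn rates. The sketch below is only an illustration of the idea. The records, the churned set, and the definition of “connected” (at least one churned neighbor) are assumptions, and the tiny data set does not reproduce the 7X figure.

```python
from collections import defaultdict

# Hypothetical call detail records: (caller, callee)
calls = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("e", "f"), ("g", "h")]
# Hypothetical subscribers who have already churned.
churned = {"a", "b", "d"}


def build_graph(cdrs):
    """Undirected social graph: an edge joins two subscribers who called each other."""
    graph = defaultdict(set)
    for caller, callee in cdrs:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def churn_rate(subscribers, churned_set):
    """Fraction of the given subscribers who churned (0.0 for an empty group)."""
    subscribers = set(subscribers)
    return len(subscribers & churned_set) / len(subscribers) if subscribers else 0.0


graph = build_graph(calls)
# Assumed definition: "connected" means having at least one neighbor who churned.
connected = {s for s, neighbors in graph.items() if neighbors & churned}

overall = churn_rate(graph, churned)
among_connected = churn_rate(connected, churned)
print(f"overall churn {overall:.2f}, churn among connected subscribers {among_connected:.2f}")
```

At production scale the same computation runs over billions of call records, which is why it was done in the big data environment surrounding the warehouse rather than inside the EDW itself.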
42. Questions & Answers
Open Exchange Of Ideas
Speaker Contact Information:
Robert J. Abate
r.j.abate@att.net
(201) 745-7680
43. Curriculum Vitae Of Presenter
Robert J. Abate, CBIP, CDMP
As a hands-on, accomplished Information Technology professional, Mr. Abate offers 30 years of experience in Architectures, Applications, Business Intelligence & Analytics, Infrastructure, and IT strategy. He is credited as one of the first to publish on Services Oriented Architectures (1996), and is a respected IT thought leader within the field. He holds a Bachelor of Science in Electrical Engineering, and is a Certified Business Intelligence Professional and a Certified Data Management Professional in four disciplines. Mr. Abate both chairs and presents at global conferences, is a member of the board of DAMA, and is a respected author. He can frequently be heard giving talks on topics such as “The Convergence Of SOA & BI,” “Best Practices In Enterprise Information Management,” “Making Big Data Analytics Actionable,” and “Data Services & Virtualization.”