Introduction to Big Data 
Three Engines for Harnessing the Power of Big Data 
Paul Barsch, Marketing Director
2 
What are Big Data? 
Big data is not about size alone. This year's big 
data is next year's normal-sized data. 
Generally, volume quickly gives way to the 
more defining requirements of variety, velocity 
and complexity. 
-Mark Beyer, Douglas Laney, Gartner 
“Examples include web logs, RFID, sensor networks, 
social networks, Internet text and documents, 
Internet search indexing, call detail records, 
genomics, astronomy, biological research, military 
surveillance, medical records, photography 
archives, video archives, and large scale 
eCommerce.” 
-Wikipedia, Big Data
3 
We’ve Come A Long Way! 
• In 1980, 1 terabyte of disk storage could cost up to $14M. 
• In 1998, Larry Page and Sergey Brin managed to patch together 1TB of disk by spending $15K on their credit cards. 
• Amazon.com price today: $87.99
4 
Big Data: From Transactions to Interactions 
[Figure: data sources grouped under ERP, CRM, WEB and BIG DATA, with volume labels Gigabytes, Terabytes, Petabytes and Exabytes and an axis labeled "Increasing Data Variety and Complexity".] 
Data types shown: Purchase Detail, Purchase Record, Payment Record, Segmentation, Offer Details, Customer Touches, Support Contacts, Web Logs, Offer History, A/B Testing, Dynamic Pricing, Affiliate Networks, Search Marketing, Behavioral Targeting, Dynamic Funnels, User Generated Content, Mobile Web, SMS/MMS, Sentiment, External Demographics, HD Video, Speech to Text, Product/Service Logs, Social Network, Business Data Feeds, User Click Stream, Behavioral Analytics 
Not Just “Big Data” but All Data
5 
Myriad Data Sources 
According to IDC, 80 percent of enterprise data today is multi-structured data, and that is growing at the exponential annual rate of 60 percent.
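To make the 60 percent figure concrete, the short sketch below works out what a constant 60 percent annual growth rate would imply over several years. It assumes the rate compounds once per year and stays constant, which the slide does not state; the function names are illustrative only.

```python
import math

ANNUAL_GROWTH = 0.60  # 60 percent per year, the IDC figure quoted on the slide


def growth_factor(years: float, rate: float = ANNUAL_GROWTH) -> float:
    """Multiplier on data volume after `years` of compound annual growth."""
    return (1.0 + rate) ** years


def doubling_time(rate: float = ANNUAL_GROWTH) -> float:
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"after {n} year(s): x{growth_factor(n):.2f}")
    print(f"doubling time: {doubling_time():.2f} years")
```

Under these assumptions the volume roughly quadruples in three years and grows about tenfold in five, doubling about every year and a half.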
6 
Data Growth 
Source: IDC - sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009 
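[Figure: data growth from "Transactions" to "Interactions", plotted against a scale of storage units] 
10^24 bytes: Yottabyte 
10^21 bytes: Zettabyte 
10^18 bytes: Exabyte 
10^15 bytes: Petabyte 
10^12 bytes: Terabyte 
10^9 bytes: Gigabyte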
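The unit ladder above can be sketched as a small helper that turns a raw byte count into the largest fitting decimal unit. This is a minimal illustration using the powers of 1000 shown on the slide; the `humanize` name is hypothetical, and binary units (powers of 1024) are deliberately not handled.

```python
# Decimal (SI) storage units, each 1000 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def humanize(num_bytes: float) -> str:
    """Format a byte count using the largest decimal unit that fits."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the largest unit we know about.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


if __name__ == "__main__":
    print(humanize(235 * 10**12))  # the 235 TB figure cited later in the deck
    print(humanize(5 * 10**15))    # a 5-petabyte warehouse
```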
7 
235 TB of Data – as of 2011 
“The average company (over 1000 employees) in 14 of 17 sectors stores 
more data than does the US Library of Congress” 
Source: HortonWorks: Apache Hadoop Basics Whitepaper, June 2013
8 
The Teradata Club of Elite Power Players 
Teradata creates elite club for petabyte-plus data 
warehouse customers 
'Petabyte Power Players' includes eBay, Wal-Mart, Bank of America, Dell, unnamed bank 
October 14, 2008 (Computerworld) Teradata Corp. took its second step in two days to reaffirm itself as king of the 
data warehousing mountain, as it announced five customers running data warehouses larger than a petabyte in 
size. At its PARTNERS conference in Las Vegas on Tuesday, the Miamisburg, Oh. vendor said the five members of its 
newly-created 'Petabyte Power Players' club include eBay Inc., with 5 petabytes of data, Wal-Mart Stores Inc., 
which has 2.5 petabytes, Bank of America Corp., which is storing 1.5 petabytes, Dell Inc., which has a 1PB data 
warehouse, and a final bank, with a 1.4PB data warehouse that chief marketing officer Darryl McDonald said he 
couldn't name yet. McDonald said the club should grow quickly as Teradata convinces other petabyte-plus 
enterprises to come forward. However, the many rumored government and military customers that use Teradata 
will remain publicity-shy, he said. Most of the customers have been using Teradata for at least half a decade. Take 
eBay, which started in 2002 with a single 14TB system. Today, it processes 50PB of information each day while 
adding 40TB of auction and purchase data. Not only is the data warehouse large, it is speedy, with eBay doing real-time 
analytics alongside less timely data mining efforts, McDonald said …. 
http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9117159
9 
Financial, Customer, Transactional Data Most Important to Business Strategy 
[Chart: share of respondents rating each data type "Very important" or "Important"; the per-category percentages are not recoverable from the extracted text.] 
Data types surveyed: Planning, budgeting, forecasting; Transactional-corporate apps; Customer; Transactional-custom apps; Spreadsheets; Unstructured internal; Product; System logs; Scientific; 3rd party; Partner; Video, imagery, audio; Sensor; Weblogs; Social network; Consumer mobile; Unstructured external 
Base: 603 global decision-makers involved in business intelligence, data management, and governance initiatives 
Source: Forrsights Strategy Spotlight: Business Intelligence And Big Data, Q4 2012
10 
Unified Data Architecture 
[Figure: architecture diagram] 
Components shown: Analytic Applications, Visualization & BI, Industry Accelerators, Event Processing, Big Data Architecture, Hadoop, Discovery Platform, Data Warehousing, Application Development, Systems Management, Collaboration, Access Layer, Data Integration and Management
11 
What is a Data Warehouse? 
• Subject oriented 
- A model of sales, inventory, finance, etc. with detailed data 
• Integrated 
- Consolidated data from many sources 
- Consistent, standardized data formats and values 
• Nonvolatile 
- Records kept unmodified for long periods of time 
• Time variant 
- Record versions with time stamps or temporal 
• Persistent storage 
- Not virtual, not federated 
Source: Gartner: Of Data Warehouses, Operational Data Stores, Data Marts and Data 'Outhouses', Dec 2005; 
Inmon, Building the Data Warehouse, 1992, Wiley and Sons
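The "nonvolatile" and "time variant" properties above can be sketched with an append-only history table. This is a minimal illustration using Python's built-in `sqlite3` module, not how any particular warehouse product implements versioning; the table, columns, and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_history (
        customer_id INTEGER NOT NULL,
        city        TEXT    NOT NULL,
        valid_from  TEXT    NOT NULL,  -- ISO date on which this version became current
        PRIMARY KEY (customer_id, valid_from)
    )
""")


def record_version(customer_id, city, valid_from):
    """Nonvolatile: append a new version; existing rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        (customer_id, city, valid_from),
    )


def city_as_of(customer_id, as_of):
    """Time variant: return the version that was current on the given ISO date."""
    row = conn.execute(
        """SELECT city FROM customer_history
           WHERE customer_id = ? AND valid_from <= ?
           ORDER BY valid_from DESC LIMIT 1""",
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None


# Hypothetical sample data: one customer who moved in mid-2013.
record_version(1, "Dayton", "2010-01-01")
record_version(1, "San Diego", "2013-06-15")
```

Because old versions are kept, a query such as `city_as_of(1, "2012-12-31")` can still answer where the customer was before the move.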
12 
Subject Areas: A Model of ‘Our’ Business 
[Figure: diagram of linked subject areas] 
Subject areas shown: Price history, Point of Sale, Inventory, Supplier, Contracts, Product/Services, Labor, E-Commerce, Associate, Channels, Customer, Sales transactions, Carrier Shipment, Campaigns, Promotion, Warehouse 
Each subject area has numerous large FACT tables (=big joins)
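As a rough illustration of why fact tables imply joins, the sketch below joins a tiny sales fact table to two subject-area tables and aggregates the result. It uses Python's built-in `sqlite3` module; the schema and rows are hypothetical and far smaller than anything a warehouse would hold.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name   TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    -- The fact table holds one row per sale and points at the subject areas.
    CREATE TABLE sales_fact (product_id INTEGER, customer_id INTEGER, amount REAL);

    INSERT INTO product  VALUES (1, 'Book'), (2, 'E-reader');
    INSERT INTO customer VALUES (10, 'East'), (11, 'West');
    INSERT INTO sales_fact VALUES
        (1, 10, 12.0), (1, 11, 15.0), (2, 10, 99.0), (2, 10, 99.0);
""")


def revenue_by_region_and_product():
    """Join the fact table to both subject areas and total the sales."""
    return db.execute("""
        SELECT c.region, p.name, SUM(f.amount)
        FROM sales_fact f
        JOIN product  p ON p.product_id  = f.product_id
        JOIN customer c ON c.customer_id = f.customer_id
        GROUP BY c.region, p.name
        ORDER BY c.region, p.name
    """).fetchall()


if __name__ == "__main__":
    for region, name, total in revenue_by_region_and_product():
        print(region, name, total)
```

In a real warehouse the fact table would hold billions of rows, which is what makes these joins "big".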
13 
Attributes for Enterprise Class Data Warehousing 
• High Performance Database: RDBMS with powerful architecture and rich features 
• High Performance Components: Powerful, robust hardware that supports the most demanding needs 
• Reliable: No single point of failure 
• High Availability: Data Warehouses are often mission critical 
• Scalable: Easily expand to meet high growth needs 
• High Concurrency: 10’s to 1000’s of concurrent users & multiple applications 
• Mixed Workloads: Reporting, ad hoc and complex queries on same platform 
• Secure: Full protection of customer data 
• Fully Managed: Single point of system operation 
• Investment Protection: Multiple generations of HW technologies in the same system 
• Data Center Compliant: Efficient systems that fit the enterprise data center processes
14 
BCBS North Carolina 
http://www.teradata.com/Resources/Videos/Blue-Cross-Blue-Shield-of-North-Carolina-High-Impact-Results-of-a-Data-Driven-Culture/?LangType=1033&LangSelect=true
15 
Why Data Discovery? 
• Discovery as a “process”*: 
– PoC/experimentation (8-10 weeks) 
– Rapid modeling – before scaling out on a global basis 
– Freedom to experiment without impacting production systems 
• Types of discovery analysis: 
– Customer Path 
– Fraud 
– Social Network 
– Attrition 
– Online testing/targeting 
• Go beyond expensive data scientists and “democratize” discovery 
[Figures: Customer Paths To Attrition; Fraudulent Paths] 
* Content courtesy of Thomas Davenport
16 
If You Know SQL – You Can Do This! 
Some of the 100+ out-of-the-box analytical apps 
• Path Analysis: Discover patterns in rows of sequential data 
• Text Analysis: Derive patterns and extract features in textual data 
• Statistical Analysis: High-performance processing of common statistical calculations 
• Segmentation: Discover natural groupings of data points 
• Marketing Analytics: Analyze customer interactions to optimize marketing decisions 
• Data Transformation: Transform data for more advanced analysis
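To give a feel for what path analysis over "rows of sequential data" means, here is a minimal sketch in plain Python. It is not the discovery platform's SQL interface, which the slides do not show; the event rows, event names, and the `paths_ending_in` helper are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (customer_id, timestamp, event) rows, as they might sit in a table.
EVENTS = [
    (1, 1, "login"), (1, 2, "billing_page"), (1, 3, "support_call"), (1, 4, "cancel"),
    (2, 1, "login"), (2, 2, "support_call"), (2, 3, "cancel"),
    (3, 1, "login"), (3, 2, "billing_page"), (3, 3, "support_call"), (3, 4, "cancel"),
    (4, 1, "login"), (4, 2, "purchase"),
]


def paths_ending_in(events, target, max_steps=3):
    """Count the event sequences (up to max_steps events) that lead to `target`."""
    # Rebuild each customer's ordered event sequence from the raw rows.
    by_customer = defaultdict(list)
    for customer_id, _ts, event in sorted(events, key=lambda r: (r[0], r[1])):
        by_customer[customer_id].append(event)

    counts = Counter()
    for seq in by_customer.values():
        if target in seq:
            end = seq.index(target)  # first occurrence of the target event
            counts[tuple(seq[max(0, end - max_steps):end + 1])] += 1
    return counts


if __name__ == "__main__":
    for path, n in paths_ending_in(EVENTS, "cancel").most_common():
        print(n, " -> ".join(path))
```

With this sample data, the most common path to cancellation runs through the billing page and a support call, which is the kind of pattern an attrition analysis looks for.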
17 
Barnes and Noble 
http://www.teradata.com/Resources/Videos/Data-Driven-Decision-Making/?LangType=1033&LangSelect=true
18 
Architecture Differences – File System vs. Relational Database 
• Hadoop 
• Teradata
19 
What Goes in Hadoop? 
© 2014 Teradata
20 
Benefits of Hadoop 
• Runs on 10 to 4,000 servers 
– Extreme scalability 
• Data analyzed where it is stored 
– Move function to data 
– Don’t move data to the function 
• Use popular developer tools 
– Java, grep, Python, etc. 
• Average programmers do parallel processing 
– Millions of Java programmers 
• All open source (free)
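The "move function to data" idea can be sketched as a map/shuffle/reduce job. The example below is a single-process simulation in plain Python, not the Hadoop API: each inner list stands in for a block of log lines stored on one node, and all names and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Each inner list stands in for a block of web-log lines stored on one node.
PARTITIONS = [
    ["GET /home", "GET /cart", "GET /home"],
    ["GET /home", "POST /checkout"],
]


def map_phase(lines):
    """Runs where the data lives: emit (url, 1) for every log line in one partition."""
    return [(line.split()[1], 1) for line in lines]


def shuffle(mapped_partitions):
    """Group the emitted values by key across all partitions."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_partitions):
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(partitions):
    # Only the small (url, 1) pairs leave each partition, not the raw log lines.
    mapped = [map_phase(p) for p in partitions]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(run_job(PARTITIONS))
```

In a real cluster the map step would run in parallel on the servers holding each block, which is why the same job can scale from a handful of machines to thousands.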
21 
Yahoo! Hadoop Clusters 
• ≈42,000 machines running Hadoop 
• Largest Hadoop clusters are currently 4000 nodes 
• Several petabytes of user data (compressed, unreplicated) 
• Run hundreds of thousands of jobs every month
22 
Yahoo! Japan 
http://blogs.teradata.com/customers/yahoojapan-increasing-roi-through-predictive-analytics-to-solve-customers-challenges-for-a-better-japan/
23 
How They All Work Together 
[Figure: end-to-end diagram from source data to analytic users] 
Labels shown: Service Management; Teradata Applications; Reports; Visualization Tools; Source Data; Marketing; Sales; Customers; Marketing Execution; Campaign Management; BI and Visualization; Advanced Analytics; Data Mining; Marketing Operations; Predictive Models; Data Integration; DATA INGEST; Data Infrastructure; Data Access; Analytic Users; Lifecycle Development and Sustainment; Production Support and Operations 
Source data types: ERP; CRM; SCM; Images, Audio & Video; Machine Logs, Text, Web, Social
24 
Verizon Wireless 
http://www.teradata.com/Resources/Videos/Verizon-Wireless-Employing-Unified-Data-Architecture-to-serve-100-million-customers/
25 
Thank You! 
Questions and Answers
