Big Data & Cloud
Infinite Monkey Theorem
CloudCon Expo & Conference
October, 2012
First
8/17/2013 Infochimps Confidential 2
What is Big Data?
“data sets so large and complex that it becomes
difficult to process using on-hand database
management tools.”
Data Volume Growing 44x
2010 = 1.2 zettabytes/yr
2020 = 35.2 zettabytes/yr
Source: 2011 IDC Digital Universe Study
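As a quick arithmetic check on the figures above, here is a minimal Python sketch that computes the growth multiple and the implied compound annual growth rate from the two endpoints shown. The variable names are illustrative. The two endpoints alone give a multiple of roughly 29x, so the 44x headline presumably rests on a different baseline than the 2010 value printed here; that reading is an assumption, not something the slide states.

```python
# Figures as printed on the slide (zettabytes per year).
volume_2010_zb = 1.2
volume_2020_zb = 35.2
years = 2020 - 2010

# Ratio of the two endpoints.
growth_factor = volume_2020_zb / volume_2010_zb
# Compound annual growth rate implied by those two endpoints.
cagr = growth_factor ** (1 / years) - 1

print(f"growth factor 2010-2020: {growth_factor:.1f}x")
print(f"implied annual growth:   {cagr:.1%}")
```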
Enterprise Data Warehouse
[Diagram: AMP nodes connected through the BYNET Interconnect to Parsing Engines; flow labeled Request → ??? → Answer]
Search, Recommend, Rank, Next-Best-Action, Score
Big Data Warehouse
[Diagram: a Master (Name Node, Job Tracker) and Slaves (Task Tracker, Data Node) connected through an Ethernet Interconnect; flow labeled Analytic Request → Answer]
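The master/slave layout on this slide follows the MapReduce pattern: the master splits an analytic request into tasks, each slave works on the data block it holds, and the partial results are combined into the answer. The sketch below imitates that flow in plain Python under stated assumptions. It does not use the Hadoop API; `BLOCKS`, `map_task`, `reduce_task`, and `run_job` are hypothetical names, the sample text is made up, and threads stand in for slave machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data blocks, one per slave node (made-up sample text).
BLOCKS = [
    "big data cloud",
    "cloud data warehouse",
    "big data big insights",
]

def map_task(block):
    """Runs on one slave: emit (word, 1) for each word in the local block."""
    return [(word, 1) for word in block.split()]

def reduce_task(word, counts):
    """Combine all the counts emitted for one key."""
    return word, sum(counts)

def run_job(blocks):
    """Plays the master: hand out map tasks, group by key, then reduce."""
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        mapped = list(pool.map(map_task, blocks))
    grouped = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            grouped[word].append(count)
    return dict(reduce_task(w, c) for w, c in grouped.items())

answer = run_job(BLOCKS)
print(answer)
```

The point of the design is that each map task touches only its own block, so adding slaves adds both storage and compute.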
Semi-Structured Data
Traditional Operational
Traditional Decision Support
Analytic Appliances
Real Time
Batch
Large Enterprise
Small Enterprise
Application Ecosystem
Deployment in Public/Private Cloud
Toolset Integration
Hardened
Next
Infinite Monkey Theorem (2):
an infinite number of monkeys hitting
keys on a typewriter for a period of time
will almost surely type a given text, such
as Shakespeare's Hamlet.
“unexperienced and unobservable”
based on
“real experiences and real observations”
Infinite Monkey Theorem (2):
an infinite number of monkeys hitting keys
on a typewriter for a period of time will
almost surely type a given text, such as
Shakespeare's Hamlet.
infinite number of monkeys → unlimited computational power
keys on a typewriter → processing data
almost surely → statistically significant
Shakespeare's Hamlet → insights
#thisischimpy
“Little Data For Business Users”
Problem
“Big Data For Business Users”
[Diagram: an Executive, Data, a question mark, and dollar signs]
Reduce Friction
#thisisreallygood
unlimited computational power
Public
Private
Virtual Private
Analysts use these images to count shipping containers coming off ships in California and get a sense of overall US import activity.
data processing
Public
Private
Virtual Private
Walmart
Target
Images
Docs, Text
Web Logs
Social
Sensors
GPS
Business Transactions & Interactions
Business Intelligence & Analytics
SQL, NoSQL, NewSQL
EDW, MPP, NewSQL
Dashboards, Reports, Visualization…
Web, Mobile, CRM, ERP, SCM…
statistically significant
Public
Private
Virtual Private
#lotsofdata + #simplealgorithms
[Diagram: inputs Cars In Lot, News Text, Web Pricing, Social Sentiment, Weather Sensors, and Local Employment feed a Quarterly Revenue Prediction]
insights
Public
Private
Virtual Private
Gnip Powertrack
Gnip EDC
Moreover Metabase
TV Transcription
Radio Transcription
Print Transcription
In-Motion Data Delivery Service
NoSQL
Listening Application
New Media
Traditional Media
APIs
Sources
Sentiment
Business Users
App Developer
Data Scientist
IT Staff
unlimited computational power
processing data
statistically significant
insights
#1BigDataCloudService
#inspiredbyAvinashKaushik

 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 

Infochimps Cloudcon 2012

Editor's notes

  1. Avinash Kaushik gave a talk at Strata 2012 in Santa Clara in March. If you listen to all the hype of Big Data, it solves for the first problem. If you listen to all the vendors, there is a lot of emphasis on the first part (perhaps Infochimps included), and very little on the second. I think that’s because we don’t exactly know how to truly empower the organization to interact directly with any/all data available. It’s too expensive, risky, complex.
  2. 40%+ YoY growth, with 2012 alone generating 2.4 zettabytes. Sources: http://jameskaskade.com/?p=2040 and http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
  3. AMP: Access Module Processor. PE: Parsing Engine. BYNET: Banyan Cross-bar Switch YNET (Y Network).
     Store: The Parsing Engine dispatches a request to insert a row. The BYNET ensures that the row gets to the appropriate AMP via the hashing algorithm. The AMP stores the row on its associated disk. Each AMP can have multiple physical disks associated with it.
     Retrieve: The Parsing Engine dispatches a request to retrieve one or more rows. The BYNET ensures that the appropriate AMP(s) are activated. The AMPs locate and retrieve the desired rows in parallel and will sort, aggregate or format them if needed. The BYNET returns the retrieved rows to the Parsing Engine. The Parsing Engine returns the row(s) to the requesting client application.
     Teradata’s shared-nothing architecture allows for highly scalable data volumes.
  4. 3 node Hadoop system: $8K/node, $10K switch, $4K/node Hadoop distro. $24K + $10K x 25% x 3 maintenance = $43K. $4K x 3 x 3 = $36K. Total =
     There are three essential elements of an analytic platform:
     (1) Strong support for analytic database query, with a variety of query styles: at a minimum, SQL, MDX or graph.
     (2) Strong support for analytic processes other than queries. Typically these would be in the areas of mathematics (statistics, predictive analytics, data mining, linear algebra, optimization, graph theory, etc.) and/or data transformation (e.g. sessionization, entity extraction).
     (3) Strong integration between the first two.
     The point is that an analytic platform is something on which you can build a range of powerful analytic applications. Some specifics of what to look for in an analytic platform may be found in the link above.
     http://www.dbms2.com/2011/02/24/analytic-platforms/
     http://www.dbms2.com/2011/01/18/architectural-options-for-analytic-database-management-systems/
     Enterprise data warehouse (full or partial)
       Kinds of data likely to be included: All, but especially operational
       Likely use styles: All
       Canonical example: Central EDW for a big enterprise
       Stresses: Concurrency, reliability, workload management
       Classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server.
     Traditional data mart
       Kinds of data likely to be included: All
       Likely use styles: Business intelligence, budgeting/consolidation, investigative
       Examples: Reporting servers, planning/consolidation servers, anything MOLAP, etc.
       Stresses: Performance, concurrency, TCO
       Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them, e.g. Sybase IQ and Vertica, have excellent track records in concurrent usage as well.
     Investigative data mart (agile)
       Kinds of data likely to be included: All, especially customer-centric
       Likely use styles: Investigative
       Canonical example: A few analysts getting a few TB to examine
       Stresses: Ease of setup/load, ease of admin, price/performance
       Infobright is often cost-effective among columnar analytic DBMS.
     Investigative data mart (big)
       Kinds of data likely to be included: All, especially customer-centric, logs, financial trade, scientific
       Likely use styles: Investigative
       Canonical example: Single-subject 20 TB to 20 PB relational database
       Stresses: Performance, scale-out, analytic functionality
       Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum.
     Bit bucket (Hadoop)
       Kinds of data likely to be included: Logs, other technical/external
       Likely use styles: Staging/ETL, investigative
       Canonical example: Log files in a Hadoop cluster
       Stresses: TCO, scale-out, transform/big-query performance, ETL functionality
     Archival data store
       Kinds of data likely to be included: Operational, CDR (call detail record), security log
       Likely use styles: Archival, reporting (for compliance), possibly also investigative
       Examples: Any long-term detailed historical store
       Stresses: TCO, compression, scale-out, performance (if multi-use)
       Perhaps only Rainstor truly embraces the archival positioning.
     Outsourced data mart
       Kinds of data likely to be included: All
       Likely use styles: Traditional BI, investigative analytics, staging/ETL
       Examples: Advertising tracking, SaaS CRM
       Stresses: Performance, TCO, reliability, concurrency
       Oracle shops = Vertica gets the nod in a number of these cases
     Operational analytic(s) server
       Kinds of data likely to be included: Customer-centric, log, financial trade
       Likely use styles: Advanced operational analytics
       Examples, lower latency: Web or call-center personalization, anti-fraud
       Examples, higher latency: Customer profiling, Basel 3 risk analysis
       Stresses: Performance, reliability, analytic functionality, perhaps concurrency
     http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/
  5. Being the CEO of Infochimps, I felt compelled to share a little “chimpy” research with you… The “Infinite Monkey Theorem” is a METAPHOR that directly relates to Big Data, and one I think you’ll appreciate. So what is the “Infinite Monkey Theorem”? The following definition is a variant of the original theorem; let me read it to you. This theorem has been traced back to Aristotle's “On Generation and Corruption”, where he makes deductions about the unexperienced and unobservable based on real experiences and real observations.
  6. This theorem has been traced back to Aristotle's “On Generation and Corruption”, where he makes deductions about the unexperienced and unobservable based on real experiences and real observations. Think about this a little… we’re talking about analyzing real-world experiences and observations to predict what will happen, what will happen with our business in the future: the unexperienced and unobserved. This is fundamentally what Big Data proposes to help with.
  7. So as a metaphor… the "monkey" is not an actual monkey, but a metaphor for an abstract device that produces a sequence of letters and symbols. And "almost surely" is a mathematical term with a precise meaning. Shakespeare’s Hamlet also represents a broader meaning: it represents any text, any work, any insight.
  8. So let's look at this in more depth. Infinite number of monkeys -> represents today’s seemingly unlimited computational power of either public or private Clouds, as an elastic delivery method. Keys on a typewriter -> capture discrete transactions, which derive meaning only when analyzed together. Again, we amass the computational power to process data. Almost surely -> is translated into a mathematical term, namely the concept of statistical significance. And finally, Shakespeare’s Hamlet is what we strive to create, and it is the source of our happiness: our translation of this raw resource into insight.
  9. Now this may seem “chimpy”… but this is beautiful. I love this metaphor. But we have a LARGE problem…
  10. We have a problem today WITH our data infrastructure: our ability to glean insights. I think all of you know what I’m referring to. It’s the fact that we’re operating on less than 15% of the corporate data available to us, even with the ENTERPRISE DATA WAREHOUSE, the EDW, which is supposedly storing a COMPLETE, SINGLE VIEW OF THE TRUTH. We’re still giving our business users only a tiny bit of data.
  11. The Business User
  12. The Business User
  13. The Business User
  14. So why is an elastic, unlimited computational resource important? Op-Ex vs. Cap-Ex. Cost reduction due to better utilization / productivity. Time-to-market.
  15. Hedge funds and Wall Street firms are using Cold War-style satellite surveillance to gather market-moving information. The Port of Long Beach is the second-busiest container port in the United States and acts as a major gateway for trade between the US and Asia. With activity at this port estimated at over $100 billion per year, it is a location that pays to keep track of.

Satellite analysts use these images to count shipping containers coming off ships in California and are able to get a sense of overall US import activity, comparing activity month by month. This analysis is being performed in Amazon's EC2.
  16. Now let's talk about processing your enterprise data assets, your Big Data. Again, we can leverage the cloud infrastructure to scale to the level of any processing needs you may have.
  17. The current image shows a Walmart in Wichita, Kansas. Analysts count cars in Walmart parking lots to measure overall customer traffic and understand its growth versus the competition. For example, Walmart's growth was determined to come mostly from areas of high unemployment. This type of analysis is being performed in Amazon's EC2.
  18. The current image shows a Target in the Moraine Point Plaza located in Gardiner, North. Analysts comparing satellite parking lot data with regional unemployment trends found Target's growth tended to come in areas of lower-than-average unemployment.

Again, these processes are being performed in Amazon EC2. This is interesting… but how do we process the data further to help derive more relevant insights? http://www.cnbc.com/id/38738810/Spying_For_Profits_The_Satellite_Image_Indicator
  19. This is done by taking data sources like images and storing them in Hadoop, then using Big Data tools like MapReduce to perform sophisticated analysis on those aggregated data sets. Why is this concept so disruptive? A fraction of the price, no structured data model (that is, no star schema), yet the ability to run sophisticated queries and algorithms against all your detailed data.
  20. The Business User
  21. The previous examples of Walmart and Target involved using a regression algorithm which was executed against the satellite data + other data to produce a quarterly revenue prediction which BEAT all previous models.
  22. Which brings us to the discussion around insights.
  23. Quote that sets the theme: the definition of the “Infinite Monkey Theorem”.
  24. The Business User
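
Note 7 describes "almost surely" as a mathematical term with a precise meaning. As a rough illustration that is not part of the original talk, the sketch below computes the chance that at least one of m independent random typing attempts reproduces a fixed text, i.e. 1 - (1 - k^-n)^m for a k-key typewriter and an n-character text. The function name `prob_text_appears`, the uniform-keystroke assumption, and the example numbers are all assumptions made for illustration.

```python
import math


def prob_text_appears(num_keys: int, text_len: int, attempts: int) -> float:
    """Chance that at least one of `attempts` independent attempts, each made of
    `text_len` uniformly random keystrokes on a `num_keys`-key typewriter,
    reproduces one fixed text exactly. (Hypothetical helper, illustration only.)
    """
    if num_keys < 1 or text_len < 0 or attempts < 0:
        raise ValueError("num_keys must be >= 1; text_len and attempts must be >= 0")
    # Probability that a single attempt matches the text: k^-n.
    p_single = float(num_keys) ** -text_len
    if p_single >= 1.0:
        # Degenerate case (one key or empty text): any attempt succeeds.
        return 1.0 if attempts > 0 else 0.0
    # 1 - (1 - p)^m, written with log1p/expm1 so tiny p does not round away.
    return -math.expm1(attempts * math.log1p(-p_single))


# Example: a 5-letter word on a 26-key typewriter, with more and more attempts.
for m in (10**3, 10**6, 10**9):
    print(m, prob_text_appears(26, 5, m))
```

The probability climbs toward 1 as the number of attempts grows without bound, which is the sense of "almost surely" in the theorem; for any finite number of attempts and a text as long as Hamlet, it stays vanishingly small.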