1




 Big Data
 the next frontier

RVC Seminar                                Leonid Zhukov
Moscow, 08/02/2013   Professor Higher School of Economics
2
Big data




+ Graph of term popularity




                              www.visibletechologies.com
3
McKinsey, May 2011




                     www.mckinsey.com
4
Headlines




            Data driven business

            Data democratization

            Data scientists
5
The White House



+ $200M initiative
+ NSF: core techniques
+ NIH: 1000 genomes
+ DOE: advanced computing
+ DOD: data to decisions
+ USGS: Earth system


                            www.whitehouse.gov
6
Gartner Hype Cycle




                     www.gartner.com
7
 Market Forecast




+ Market forecasts:
  + IDC: 2015 - $16.9B
  + Gartner: 2016 - $55B
+ Venture money invested (Reuters):
  + 2009 - $1.1B
  + 2010 - $1.53B
  + 2011 - $2.47B
                                                      www.wikibon.com
8
Big Data Revenue 2012




 + Big Business:
    +   IBM
    +   HP
    +   Oracle
    +   Teradata
    +   EMC             www.wikibon.com
9
Big Data Vendors!




    + Hadoop:
      + Cloudera
      + MapR Technologies
      + HortonWorks          www.wikibon.com
10
Forrester Wave




                 www.forrester.com
11
What is big data




+ Big data:
  + “Data you can’t process by traditional tools”
  + “A phenomenon defined by the rapid acceleration in the
     expanding volume of high velocity, complex and diverse
     types of data.”

  + “Refers to a collection of tools, techniques and technologies
     for working with data productively, at any scale.”
12
What is Big data

 + 3V
    + Volume: petabytes (1000TB) to exabytes (1000PB)
    + Variety: structured, semi-structured, unstructured
    + Velocity: Tb/s data streams
 + Requires distributed processing
 + Big data = storage + processing
 + Big data = Hadoop (but not only Hadoop)
13
Big data Glossary


+ Hadoop, MapReduce, Hive, Pig, Cascading,
  HBase, Hypertable, Cassandra, Flume, Sqoop,
  Mongo, Voldemort, Storm, Kafka, Drill, Dremel,
  Impala, Zookeeper, Ambari, Oozie, Yarn, Redis,
  Riak, Pregel, Gremlin, Giraph, Solr, Lucene, R,
  Mahout, Weka
14
How big is Big?

+ Google
  + 24 PB data processed daily
+ Twitter
  + 340 million daily tweets
  + 1.6 billion search queries
  + 7 TB added daily
+ Facebook
  + 750 million users
  + 12 TB of content daily
  + 2.7 billion “likes” and comments daily
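
To put these daily volumes on a common scale, they can be converted into average per-second rates. The sketch below is only arithmetic on the figures from this slide; it assumes decimal units (1 PB = 1000 TB, as used elsewhere in the deck), and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

TB = 10**12  # decimal terabyte, in bytes (assumption: decimal units)
PB = 10**15  # decimal petabyte, in bytes


def daily_to_per_second(amount_per_day):
    """Convert a per-day quantity into an average per-second rate."""
    return amount_per_day / SECONDS_PER_DAY


# Daily figures quoted on the slide.
daily_figures = {
    "Google, bytes processed": 24 * PB,
    "Twitter, bytes added": 7 * TB,
    "Facebook, bytes of content": 12 * TB,
    "Twitter, tweets": 340e6,
}

for label, per_day in daily_figures.items():
    print(f"{label}: {daily_to_per_second(per_day):,.0f} per second")
```

On these numbers, 24 PB per day averages out to roughly 278 GB every second, which is why a single machine is not an option.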
15
Sources of Big Data




                      www.ibm.com
16
Supercomputing


+ National Labs, Universities, Military
+ Processing power, flops, MPI
+ Parallel computing:
   + Cray, IBM SP, SGI
   + Beowulf cluster (Linux commodity)
17
New realities


+ Yahoo, AltaVista, Inktomi, Google
+ Consumer web companies:
   + web search (crawling, indexing)
   + advertising
   + email services
   + ecommerce


   + Commodity hardware
18
Google




  2003   2004
19
GFS/HDFS

+ Distributed, replicated data blocks (64 MB)
+ Master-slave architecture (Name Node, Data Nodes)
+ Not a general-purpose file system
+ Access via command-line utilities and API
+ Files can’t be modified once written
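
The block and replica idea can be sketched in a few lines. This is a toy model, not the GFS or HDFS implementation: the replication factor of 3 and the round-robin placement are assumptions made for illustration, and the function and node names are hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB blocks, as on the slide
REPLICATION = 3                # assumption: three copies of every block


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block a file is cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, data_nodes, replication=REPLICATION):
    """Toy name-node table: block index -> data nodes holding a copy.

    Placement is plain round-robin; a real system would also weigh
    rack layout, free space and load.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    return {
        block: [data_nodes[(block + r) % len(data_nodes)]
                for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
for block, nodes in placement.items():
    print(f"block {block} ({blocks[block]} bytes): {nodes}")
```

A 200 MB file becomes three full 64 MB blocks plus one 8 MB block, and losing any single data node still leaves two copies of every block.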
20
  MapReduce

+ MapReduce programming model:
  + functional programming
  + like UNIX pipeline
+ Master-slave architecture
  + Master: divide, schedule, monitor work
  + Slave: actual processing
+ Scalable:
  + no file IO
  + no networking
  + no synchronization
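
The programming model can be illustrated with the classic word count, run in a single process. This is a sketch of the model only, not Hadoop's API: the function names are illustrative, and the shuffle step that a real framework performs across the network is simulated with a dictionary.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big cluster", "data locality"])
print(counts)
```

The user writes only the map and reduce functions, which are pure and touch no files, sockets or locks. The rough UNIX analogue is a pipeline such as `tr | sort | uniq -c`, with `sort` playing the role of the shuffle.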
21
 Data movement




+ store and process data on the same nodes
+ bring code to data, data “locality”
                                             www.cloudera.com
22
Hadoop
+ Doug Cutting
  + Search indexer - Lucene
  + Web crawler - Nutch
  + Hadoop
     + HDFS
     + MapReduce
23
Yahoo!
+ 40,000 servers
+ 170PB storage
+ 1000+ active users
+ 5M+ monthly jobs
+ email spam filters
+ categorization, personalization
+ computational advertising
24
NoSQL Database Revolution
+ Needed:
   + fast read/write time
   + high concurrency
   + easy horizontal scalability
+ Flat data structure
+ Sacrificed:
   + DB Schema
   + SQL
   + Transactions
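
The trade-off above can be sketched as a toy key-value store that spreads keys over shards by hash. This does not reproduce any particular product; the class and method names are hypothetical, and hash-modulo sharding is an assumption chosen for brevity.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store: flat data, no schema, no SQL, no transactions."""

    def __init__(self, num_shards):
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        # Each dict stands in for one server.
        self._shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        """Pick a shard from a stable hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def put(self, key, value):
        self._shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        return self._shards[self._shard_for(key)].get(key, default)

    def shard_sizes(self):
        return [len(shard) for shard in self._shards]


store = ShardedKeyValueStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})  # values are schema-less

print(store.get("user:42"))
print(store.shard_sizes())
```

Every read and write touches exactly one shard, so capacity grows by adding shards. With plain modulo hashing, changing the shard count moves most keys; systems in the Dynamo family use consistent hashing to limit that.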
25
NoSQL World

+ Key-value: Dynamo, Voldemort, Redis, Riak
+ Column (tabular): HBase, Hypertable, Cassandra
+ Document store: CouchDB, MongoDB
+ Graph: Neo4J, FlockDB
+ 120+ products (2012)
26
Hadoop stack




               www.hortonworks.com
27
Hadoop tools

+ Pig
  + high level scripting language (PigLatin)
  + converts to MapReduce jobs
+ Hive
  + SQL-like queries on data in HDFS
  + converts to MapReduce jobs
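
Why a query can be converted into a MapReduce job is easiest to see side by side. In the sketch below SQLite stands in for Hive purely to run the declarative query; the table, the data and the hand-written map and reduce steps are made up for illustration and do not show how Hive's planner actually works.

```python
import sqlite3
from collections import defaultdict

rows = [("books", 12.0), ("music", 5.0), ("books", 8.0),
        ("games", 20.0), ("music", 2.5)]

# Declarative form: the kind of query a Hive user would write.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sql_result = dict(conn.execute(
    "SELECT category, SUM(amount) FROM sales GROUP BY category"))
conn.close()


# The same computation as explicit map and reduce steps.
def map_row(row):
    """Map: the GROUP BY column becomes the key."""
    category, amount = row
    return category, amount


groups = defaultdict(list)
for key, value in map(map_row, rows):  # map, then group by key
    groups[key].append(value)
mr_result = {key: sum(values) for key, values in groups.items()}  # reduce

print(sql_result == mr_result)
```

The GROUP BY column becomes the map output key and the aggregate becomes the reduce function, which is the kind of mechanical translation these tools automate.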
28

Hadoop data movement




                       www.cloudera.com
29
Typical Hadoop usage
 +   Text mining
 +   Pattern recognition
 +   Recommendation systems (collaborative filtering)
 +   Prediction models
 +   Risk assessment
 +   Sentiment analysis
 +   Customer churn prediction
 +   Customer segmentation
 +   Point of Sale Transaction analysis
 +   Data “sandbox”
30

Application fields

+ Science: sensors, genome, weather, satellite,
   imaging

+ Engineering: log analytics, status feeds, network
   messages, spam filters

+ Product: financial, pharmaceutical, insurance,
   energy, retail, ecommerce, healthcare, telecom

+ Business: analytics, BI
31
Business analytics



+ Analytic
+ Operational




        Capture, analyze, learn from data
                                            www.datasciencecentral.com
32
Who uses Hadoop?




                   www.cloudera.com
33
Why Hadoop?




              www.thinkbiganalytics.com
34
Cloudera




+ Enterprise support for Apache Hadoop
+ Founded 2008, funding $141 M
+ Employees: 230
+ Products:
  + CDH 4 (Cloudera distribution of Hadoop)
  + Impala
  + Consulting and training
                                           www.cloudera.com
35
MapR




+ Founded 2009, funding $20M
+ MapR Technologies is engineering game-changing
  Map/Reduce related technologies

+ Products:
  + M3,M5,M7
  + NFS access, no single point of failure
  + NOT open source !
                                             www.mapr.com
36
HortonWorks




+ Founded 2011
+ Yahoo spin-off
+ Products:
  + HDP distribution
  + tools

                       www.hortonworks.com
37
Hadoop Ecosystem




                   www.datameer.com
38
Big Data Landscape




                     www.bigdatalandscape.com
39
Splunk




+ Founded 2003, raised $230M, IPO 2012, Market cap $3.35B
+ Machine logs analysis, operational intelligence
+ Collecting, searching, monitoring




                                                            www.splunk.com
40
Datameer




+ Founded 2009, funding $17.8M

+ Big data:
  + Data integration
  + Data Analytics
  + Data Visualization
                         www.datameer.com
41
Datasift




+ Founded 2010, funding $29.7M
+ Data platform for social web
+ Aggregate and filter data



                                 www.datasift.com
42
Infochimps




+ Founded 2009, funding $5.5M
+ Transitioned from data marketplace to big data platform
+ End-to-end big data solution, real time




                                                        www.infochimps.com
43
Tableau software




+ Founded 2003, funding $15M
+ Big data analytics
+ Big data visualization

                               www.tableau.com
44
Big data startups 2012

+ Platfora, in memory BI on Hadoop
+ Sumologic, log file analysis
+ Hadapt, Hadoop + RDBMS
+ Metamarkets, patterns in data flow
+ DataStax, consulting, training
+ Karmasphere, BI, analytics on Hadoop
45
Big data startups 2013


+ 10gen, MongoDB
+ ClearStory, big data aggregation + analytics
+ Continuuity, Hadoop API
+ Parstream, database analytics
+ Zoomdata, data visualization
+ Climate corporation, predictive analytics
46
Big data by industry




                       www.gartner.com
47
Big data Processing

                    Batch processing    Interactive                Stream
Query time          minutes to hours    milliseconds to seconds    continuous
Data volume         TB to PB            GB to PB                   continuous
Programming model   MapReduce           Queries                    DAG
Users               Developers          Analysts                   Developers
Open source         Hadoop MapReduce    Drill, Impala              Storm, Kafka
48
New technologies

+ Real-time querying
  + Drill (based on Google Dremel)
  + Impala (Cloudera)


+ Data stream processing
  + Storm (Twitter), real time analytics
  + Kafka (LinkedIn), messaging system
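
What separates stream processing from batch is that results are updated per event instead of per job. The sketch below keeps a sliding-window count over an ordered event stream; it is a plain Python generator, not the Storm or Kafka API, and the event format and function name are assumptions.

```python
from collections import Counter, deque


def windowed_counts(events, window_seconds):
    """Yield (timestamp, counts) after every event.

    `events` is an iterable of (timestamp, key) pairs sorted by time;
    counts cover the half-open window (timestamp - window_seconds, timestamp].
    """
    window = deque()    # events currently inside the window
    counts = Counter()  # running count per key
    for timestamp, key in events:
        window.append((timestamp, key))
        counts[key] += 1
        # Evict events that have fallen out of the window.
        while window and window[0][0] <= timestamp - window_seconds:
            _, old_key = window.popleft()
            counts[old_key] -= 1
            if counts[old_key] == 0:
                del counts[old_key]
        yield timestamp, dict(counts)


stream = [(0, "a"), (1, "b"), (2, "a"), (12, "a")]
results = list(windowed_counts(stream, window_seconds=10))
for timestamp, snapshot in results:
    print(timestamp, snapshot)
```

Memory is bounded by the window size, not by the total length of the stream, so the same loop could run indefinitely on an unbounded source.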
49
Machine learning

 + Predictive analytics
 + Patterns discovery
 + Data mining
 + Tools:
    + Mahout
    + R
50
Big data revolution

+ Google: GFS, MapReduce, BigTable,
+ Yahoo: Hadoop
+ Amazon: DynamoDB
+ Facebook: Cassandra, HBase
+ Twitter: FlockDB, Storm
+ LinkedIn: Voldemort, Kafka
51
Observations

+ Game changing technologies come from big companies
+ Open Source (!)
+ Start-up ecosystem
+ Less general, more specialized
+ Next step: big data analytics and visualization
52
Data scientist

+ Machine Learning
+ Data Mining
+ Statistics
+ Software Engineering
+ Hadoop/MapReduce/HBase/Hive/Pig
+ Java, Python, C/C++, SQL

“By 2018, the United States alone could face a shortage of 140,000 to 190,000
people with deep analytical skills as well as 1.5 million managers and analysts with
the know-how to use the analysis of big data to make effective decisions.”
53
Big Data Products MindMap




                    www.garycrawford.co.uk
54
Contacts


+ Leonid Zhukov, Ph.D.
+ School of Applied Mathematics and Information Science
   Higher School of Economics, NRU-HSE

+ lzhukov@hse.ru
+ www.leonidzhukov.ru

The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Business of Big Data

  • 1. Big Data: the next frontier. RVC Seminar, Moscow, 08/02/2013. Leonid Zhukov, Professor, Higher School of Economics
  • 2. Big data + Graph of term popularity www.visibletechologies.com
  • 3. McKinsey, May 2011 www.mckinsey.com
  • 4. Headlines + Data-driven business + Data democratization + Data scientists
  • 5. The White House + $200M initiative + NSF: core techniques + NIH: 1000 genomes + DOE: advanced computing + DOD: data to decisions + USGS: Earth system www.whitehouse.gov
  • 6. Gartner Hype Cycle www.gartner.com
  • 7. Market Forecast + Market forecasts: IDC: 2015 - $16.9B; Gartner: 2016 - $55B + Venture money invested (Reuters): 2009 - $1.1B; 2010 - $1.53B; 2011 - $2.47B www.wikibon.com
  • 8. Big Data Revenue 2012 + Big Business: + IBM + HP + Oracle + Teradata + EMC www.wikibon.com
  • 9. Big Data Vendors! + Hadoop: + Cloudera + MapR Technologies + HortonWorks www.wikibon.com
  • 10. Forrester Wave www.forrester.com
  • 11. What is big data + Big data: + “Data you can’t process by traditional tools” + “A phenomenon defined by the rapid acceleration in the expanding volume of high velocity, complex and diverse types of data.” + “Refers to a collection of tools, techniques and technologies for working with data productively, at any scale.”
  • 12. What is Big data + 3V + Volume: petabytes (1000 TB) to exabytes (1000 PB) + Variety: structured, semi-structured, unstructured + Velocity: Tb/s data streams + Requires distributed processing + Big data = storage + processing + Big data = Hadoop (not only)
  • 13. Big data Glossary + Hadoop, MapReduce, Hive, Pig, Cascading, HBase, Hypertable, Cassandra, Flume, Sqoop, Mongo, Voldemort, Storm, Kafka, Drill, Dremel, Impala, Zookeeper, Ambari, Oozie, Yarn, Redis, Riak, Pregel, Gremlin, Giraph, Solr, Lucene, R, Mahout, Weka
  • 14. How big is Big? + Google: 24 PB of data processed daily + Twitter: 340 million daily tweets; 1.6 billion search queries; 7 TB added daily + Facebook: 750 million users; 12 TB of daily content; 2.7 billion “likes” and comments daily
  • 15. Sources of Big Data www.ibm.com
  • 16. Supercomputing + National Labs, Universities, Military + Processing power, flops, MPI + Parallel computing: + Cray, IBM SP, SGI + Beowulf cluster (Linux commodity)
  • 17. New realities + Yahoo, AltaVista, Inktomi, Google + Consumer web companies: + web search (crawling, indexing) + advertising + email services + ecommerce + Commodity hardware
  • 19. GFS/HDFS + Distributed replicated data blocks (64 MB) + Master-slave architecture (Name Node, Data Nodes) + Not a general file system + Access via command line utils and API + Files can’t be modified after they are written
  • 20. MapReduce + Scalable: no file IO; no networking; no synchronization + MapReduce programming model: functional programming; like UNIX pipeline + Master-slave architecture: Master: divide, schedule, monitor work; Slave: actual processing
  • 21. Data movement + store and process data on the same nodes + bring code to data, data “locality” www.cloudera.com
  • 22. Hadoop + Doug Cutting + Search indexer - Lucene + Web crawler - Nutch + Hadoop + HDFS + MapReduce
  • 23. Yahoo! + 40,000 servers + 170 PB storage + 1000+ active users + 5M+ monthly jobs + email spam filters + categorization, personalization + computational advertising
  • 24. Data Base NoSQL Revolution + Needed: + fast read/write time + high concurrency + easy horizontal scalability + Flat data structure + Sacrificed: + DB Schema + SQL + Transactions
  • 25. NoSQL World + Key-value: Dynamo, Voldemort, Redis, Riak + Column (tabular): HBase, Hypertable, Cassandra + Document store: CouchDB, MongoDB + Graph: Neo4J, FlockDB + 120+ products (2012)
  • 26. Hadoop stack www.hortonworks.com
  • 27. Hadoop tools + Pig + high-level scripting language (PigLatin) + converts to MapReduce jobs + Hive + SQL-like queries on data in HDFS + converts to MapReduce jobs
  • 28. Hadoop data movement www.cloudera.com
  • 29. Typical Hadoop usage + Text mining + Pattern recognition + Recommendation systems (collaborative filtering) + Prediction models + Risk assessment + Sentiment analysis + Customer churn prediction + Customer segmentation + Point of Sale Transaction analysis + Data “sandbox”
  • 30. Application fields + Science: sensors, genome, weather, satellite, imaging + Engineering: log analytics, status feeds, network messages, spam filters + Product: financial, pharmaceutical, insurance, energy, retail, ecommerce, healthcare, telecom + Business: analytics, BI
  • 31. Business analytics + Analytic + Operational Capture, analyze, learn from data www.datasciencecentral.com
  • 32. Who uses Hadoop? www.cloudera.com
  • 33. Why Hadoop? www.thinkbiganalytics.com
  • 34. Cloudera + Enterprise support for Apache Hadoop + Founded 2008, funding $141M + Employees: 230 + Products: + CDH 4 (Cloudera distribution of Hadoop) + Impala + Consulting and training www.cloudera.com
  • 35. MapR + Founded 2009, funding $20M + MapR Technologies is engineering game-changing Map/Reduce-related technologies + Products: + M3, M5, M7 + NFS, no single point of failure + NOT open source! www.mapr.com
  • 36. HortonWorks + Founded 2011 + Yahoo spin-off + Products: + HDP distribution + tools www.hortonworks.com
  • 37. Hadoop Ecosystem www.datameer.com
  • 38. Big Data Landscape www.bigdatalandscape.com
  • 39. Splunk + Founded 2003, raised $230M, IPO 2012, Market cap $3.35B + Machine logs analysis, operational intelligence + Collecting, searching, monitoring www.splunk.com
  • 40. Datameer + Founded 2009, funding $17.8M + Big data: + Data integration + Data Analytics + Data Visualization www.datameer.com
  • 41. Datasift + Founded 2010, funding $29.7M + Data platform for social web + Aggregate and filter data www.datasift.com
  • 42. Infochimps + Founded 2009, funding $5.5M + Transitioned from data marketplace to big data platform + End-to-end big data solution, real time www.infochimps.com
  • 43. Tableau software + Founded 2003, funding $15M + Big data analytics + Big data visualization www.tableau.com
  • 44. Big data Startups 2012 + Platfora, in-memory BI on Hadoop + Sumologic, log file analysis + Hadapt, Hadoop+RDBMS + Metamarkets, patterns in data flow + DataStax, consulting, training + Karmasphere, BI, analytics on Hadoop
  • 45. Big data startups 2013! + 10gen, MongoDB + ClearStory, big data aggregation + analytics + Continuuity, Hadoop API + Parstream, database analytics + Zoomdata, data visualization + Climate Corporation, predictive analytics
  • 46. Big data by industry www.gartner.com
  • 47. Big data Processing + Batch: query time: minutes to hours; data volume: TB to PB; programming model: MapReduce; users: developers; open source: Hadoop MapReduce + Interactive: query time: milliseconds to seconds; data volume: GB to PB; programming model: queries; users: analysts; open source: Drill, Impala + Stream: query time: continuous; data volume: continuous; programming model: DAG; users: developers; open source: Storm, Kafka
  • 48. New technologies + Real-time querying + Drill (based on Google Dremel) + Impala (Cloudera) + Data stream processing + Storm (Twitter), real-time analytics + Kafka (LinkedIn), messaging system
  • 49. Machine learning + Predictive analytics + Pattern discovery + Data mining + Tools: + Mahout + R
  • 50. Big data revolution + Google: GFS, MapReduce, BigTable + Yahoo: Hadoop + Amazon: DynamoDB + Facebook: Cassandra, HBase + Twitter: FlockDB, Storm + LinkedIn: Voldemort, Kafka
  • 51. Observations + Game-changing technologies come from big companies + Open Source (!) + Start-up ecosystem + Less general, more specialized + Next step: big data analytics and visualization
  • 52. Data scientist + Machine Learning + Data Mining + Statistics + Software Engineering + Hadoop/MapReduce/HBase/Hive/Pig + Java, Python, C/C++, SQL + “By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”
  • 53. Big Data Products MindMap www.garycrawford.co.uk
  • 54. Contacts + Leonid Zhukov, Ph.D. + School of Applied Mathematics and Information Science, Higher School of Economics, NRU-HSE + lzhukov@hse.ru + www.leonidzhukov.ru
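
Slide 20 describes the MapReduce programming model as functional programming that works like a UNIX pipeline. The idea can be sketched in a single Python process as a word count. This is an illustration only, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are hypothetical, chosen just for this sketch.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Chain map -> shuffle -> reduce, like stages of a UNIX pipeline."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; each string stands in for one data block.
    docs = ["big data", "big data big frontier"]
    print(word_count(docs))
```

In a real cluster the map and reduce calls would run on the slave nodes that hold the data blocks, while the master divides, schedules, and monitors the work; here everything runs sequentially in one process to keep the sketch short.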