Big Data Analytics
                                Peter Sirota
General Manager, Amazon Elastic MapReduce
Overview
1. Introducing Big Data

2. From data to actionable information

3. Analytics and Cloud Computing

4. The Big Data ecosystem

1. Introducing Big Data
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
The cost of data generation is falling
Generation: lower cost, higher throughput
Collection & storage: highly constrained
Analytics & computation
Collaboration & sharing
[Chart: data volume, comparing generated data with data available for analysis]
Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Elastic and highly scalable
+ No upfront capital expense
+ Only pay for what you use
+ Available on-demand
= Remove constraints
Generation
Collection & storage
Accelerated
Analytics & computation
Collaboration & sharing
Close the gap.
Big Data
Technologies and techniques for working productively with data, at any scale.

2. From data to actionable information
“Who buys video games?”
Per day:
3.5 billion records
13 TB of click stream logs
71 million unique cookies
Results:
500% return on ad spend
17,000% reduction in procurement time
“Who is using our service?”
Finding signal in the noise of logs

Identified early mobile usage
Invested heavily in mobile development
In January 2013, 9,432,061 unique mobile devices used the Yelp mobile app.

4 million+ calls. 5 million+ directions.
Open web index.
3.4 billion records.
Available to all.
Full parse for impact of social networks:
300 lines of Ruby code.
14 hours.
$100.
Tweeting about Flu
Source: You Are What You Tweet: Analyzing Twitter for Public Health. M. J. Paul and M. Dredze, 2011
Tweeting about Food
[Chart: tweets about the price of rice compared with official food price inflation]

3. Analytics and Cloud Computing
Generation
Collection & storage: S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase
Analytics & computation: EC2 & Elastic MapReduce
Collaboration & sharing: EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift
AWS Data Pipeline
Elastic MapReduce
Managed Hadoop analytics
[Diagram: Elastic MapReduce data flow]
Input data: S3, DynamoDB, Redshift
Code
Elastic MapReduce
Name node
Elastic cluster, S3/HDFS
Queries + BI, via JDBC, Pig, Hive
Output
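As a rough illustration of the "Code" box in the flow above, the sketch below runs the map, shuffle, and reduce steps of a Hadoop-style job in a single Python process. It is not the Elastic MapReduce API: the tab-separated record layout (cookie ID, then URL) and all function names are assumptions made for this example, and a real job would read its input from S3 or HDFS and spread the work across the cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_record(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (url, 1) for each well-formed click record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 2:  # hypothetical layout: cookie_id <TAB> url
        _cookie_id, url = fields
        yield url, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)


def count_clicks(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_record(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["c1\t/home", "c2\t/home", "c1\t/games", "malformed line"]
    print(count_clicks(sample))
```

On a cluster, the map and reduce functions stay this small; the framework handles splitting the input, moving the grouped keys between nodes, and writing the output.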
1. Elastic clusters
10 hours
6 hours
Peak capacity
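One way to read the figures above, assuming the work divides evenly across nodes: a job is a fixed number of node-hours, so adding nodes shortens the run without changing the total. The starting cluster size of 6 nodes is hypothetical, and real Hadoop jobs scale less than perfectly linearly, so treat this as back-of-the-envelope arithmetic only.

```python
import math


def node_hours(nodes: int, hours: float) -> float:
    """Total work, expressed as nodes multiplied by wall-clock hours."""
    return nodes * hours


def nodes_for_deadline(current_nodes: int, current_hours: float,
                       target_hours: float) -> int:
    """Nodes needed to finish the same work in target_hours.

    Assumes ideal linear scaling (an assumption, not a guarantee).
    """
    work = node_hours(current_nodes, current_hours)
    return math.ceil(work / target_hours)


if __name__ == "__main__":
    # Hypothetical: a 6-node cluster takes 10 hours; how many for 6 hours?
    needed = nodes_for_deadline(current_nodes=6, current_hours=10, target_hours=6)
    print(needed, node_hours(6, 10), node_hours(needed, 6))
```

Under per-node-hour pricing and this idealized model, both cluster sizes consume the same node-hours, so the larger cluster costs the same and simply finishes sooner.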
2. Rapid, tuned provisioning
Tedious.
Remove undifferentiated heavy lifting.
3. Hadoop all the way down
Robust ecosystem.
Databases, machine learning, segmentation, clustering, analytics, metadata stores, exchange formats, and so on...
4. Agility for experimentation
Instance choice.
Stay flexible on instance type & number.
5. Cost optimizations
Built for Spot.
Name-your-price supercomputing.
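A minimal sketch of the name-your-price idea, assuming the bidding model Spot used at the time of this talk: the cluster runs in any hour where the market price is at or below the bid, and is billed at the market price for that hour. The hourly prices, the bid, and the function name are invented for illustration, and the model ignores partial hours and the work lost when instances are interrupted.

```python
def spot_run(prices_per_hour: list[float], bid: float,
             nodes: int) -> tuple[int, float]:
    """Return (hours the cluster ran, total cost) for a series of hourly prices.

    Simplified model: the cluster runs only in hours where price <= bid,
    and each running node pays the market price for that hour.
    """
    hours_run = 0
    cost = 0.0
    for price in prices_per_hour:
        if price <= bid:
            hours_run += 1
            cost += price * nodes
    return hours_run, cost


if __name__ == "__main__":
    # Hypothetical per-node hourly prices in dollars.
    hours, cost = spot_run([0.03, 0.05, 0.12, 0.04], bid=0.10, nodes=10)
    print(f"ran {hours} hours for ${cost:.2f}")
```

The hour priced above the bid is skipped rather than paid for, which is why jobs that tolerate interruption are the natural fit.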
1. Elastic clusters
2. Rapid, tuned provisioning
3. Hadoop all the way down
4. Agility for experimentation
5. Cost optimizations
Vin Sharma vin.sharma@intel.com
Director, Product Strategy & Marketing
Big Data Software, Intel Corporation
Analysis of Data Can Transform Society

Enhance scientific understanding, drive innovation, and accelerate medical cures.
Create new business models and improve organizational processes.
Increase public safety and improve energy efficiency with smart grids.
Intel’s Vision to Democratize Big Data

Unlock Value in Silicon
Support Open Platforms
Deliver Software Value
Intel at the Intersection of Big Data

HPC: Enabling exascale computing on massive data sets
Cloud: Helping enterprises build open interoperable clouds
Open Source: Contributing code and fostering ecosystem
Intel® Technology at the Heart of the Cloud

Server
Storage
Network
Scale-Out Big Data
Compute Platform Optimization

Cost-effective performance
• Intel® Advanced Vector Extensions Technology
• Intel® Turbo Boost Technology 2.0
• Intel® Advanced Encryption Standard New Instructions Technology
Intel® Advanced Vector Extensions Technology

• Newest in a long line of processor instruction innovations
• Increases floating point operations per clock, up to 2X performance (1)

(1) Performance comparison using Linpack benchmark. See backup for configuration details.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more legal information on performance forecasts go to http://www.intel.com/performance
Intel® Turbo Boost Technology 2.0

More Performance
Higher turbo speeds maximize performance for single and multi-threaded applications
Intel® Advanced Encryption Standard New Instructions

• Processor assistance for performing AES encryption: 7 new instructions
• Makes enabled encryption software faster and stronger
The Power of Intel® Platform Solutions: TeraSort for 1 TB sort

Previous Intel® Xeon® Processor: 4 HRS
Intel® Xeon® Processor E5 2600: 50% reduction
Solid-State Drive: 80% reduction
10G Ethernet: 50% reduction
Intel® Apache Hadoop: 40% reduction
Result: 10 MIN

Richer user experiences
The Virtuous Cycle of User Experience

Clients
Cloud
Intelligent Systems

4. The Big Data Ecosystem
Data, data, everywhere...
Data is stored in silos.
S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
“How do I get my data to the cloud?”
Data mobility
    Generated and stored in AWS
    Inbound data transfer is free
    Multipart upload to S3
    Physical media
    AWS Direct Connect
    Regional replication of AMIs and snapshots
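
The "Multipart upload to S3" item above splits a large object into parts that can be sent independently. The sketch below only plans the byte ranges and makes no AWS calls. The 5 MiB minimum part size and the 10,000-part ceiling are S3's documented limits; the function name and the 8 MiB default are choices made for this example.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 maximum number of parts in one upload


def plan_parts(total_size: int,
               part_size: int = 8 * MIB) -> list[tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    part_size = max(part_size, MIN_PART_SIZE)
    # Grow the part size if the object would otherwise need too many parts.
    part_size = max(part_size, math.ceil(total_size / MAX_PARTS))

    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


if __name__ == "__main__":
    for part in plan_parts(20 * MIB):
        print(part)
```

Each planned range could then be uploaded separately, in parallel or with per-part retries, which is what makes the approach useful for large files on unreliable links.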
“How do I integrate my data for maximum impact?”
AWS Data Pipeline
Orchestration for data-intensive workloads.
 Announced in November, available now.
AWS Data Pipeline
   Data-intensive orchestration and automation
   Reliable and scheduled
   Easy to use, drag and drop
   Execution and retry logic
   Map data dependencies
   Create and manage temporary compute resources
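
The ideas in the list above (mapped data dependencies, ordered execution, retry logic) can be sketched in a few lines. This is not the AWS Data Pipeline API or its definition format: the run_pipeline helper and the task names are hypothetical, and the ordering relies on Python's standard graphlib module.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(
    tasks: dict[str, Callable[[], None]],
    depends_on: dict[str, set[str]],
    max_attempts: int = 3,
) -> list[str]:
    """Run tasks in dependency order, retrying each up to max_attempts times.

    Returns the names of the tasks in the order they completed.
    """
    # Map each task to the set of tasks that must finish before it starts.
    graph = {name: depends_on.get(name, set()) for name in tasks}
    completed = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up after the final attempt
        completed.append(name)
    return completed


if __name__ == "__main__":
    steps = {
        "copy_logs": lambda: print("copying logs"),
        "analyze": lambda: print("running analysis"),
        "report": lambda: print("writing report"),
    }
    deps = {"analyze": {"copy_logs"}, "report": {"analyze"}}
    print(run_pipeline(steps, deps))
```

A managed service adds what this sketch leaves out: schedules, notifications, and provisioning of the temporary compute that each step runs on.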
Anatomy of a pipeline
Additional checks and notifications
Arbitrarily complex pipelines
aws.amazon.com/datapipeline
aws.amazon.com/big-data
Summary
1. Introducing Big Data

2. From data to actionable information

3. Analytics and Cloud Computing

4. The Big Data ecosystem
Get 600 Hours of free supercomputing time!
www.powerof60.com
Thank you!
sirota@amazon.com

Weitere ähnliche Inhalte

Was ist angesagt?

VMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUG IT
 
Hadoop and Beyond
Hadoop and BeyondHadoop and Beyond
Hadoop and BeyondPaco Nathan
 
Cloud Computing for Data Professionals
Cloud Computing for Data ProfessionalsCloud Computing for Data Professionals
Cloud Computing for Data ProfessionalsAnkit Rathi
 
Intro to Cascading (SpringOne2GX)
Intro to Cascading (SpringOne2GX)Intro to Cascading (SpringOne2GX)
Intro to Cascading (SpringOne2GX)Paco Nathan
 
Multi-thematic spatial databases
Multi-thematic spatial databasesMulti-thematic spatial databases
Multi-thematic spatial databasesConor Mc Elhinney
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightPaco Nathan
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Industry experts webinar slides (final v1.0)
Industry experts webinar slides (final   v1.0)Industry experts webinar slides (final   v1.0)
Industry experts webinar slides (final v1.0)NuoDB
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...Gezim Sejdiu
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storagehybrid cloud
 
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaEfficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaGezim Sejdiu
 
Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Martin Bém
 
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics SystemFour Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics SystemTreasure Data, Inc.
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 

Was ist angesagt? (20)

VMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware Hadoop
 
Hadoop and Beyond
Hadoop and BeyondHadoop and Beyond
Hadoop and Beyond
 
Cloud Computing for Data Professionals
Cloud Computing for Data ProfessionalsCloud Computing for Data Professionals
Cloud Computing for Data Professionals
 
Intro to Cascading (SpringOne2GX)
Intro to Cascading (SpringOne2GX)Intro to Cascading (SpringOne2GX)
Intro to Cascading (SpringOne2GX)
 
Multi-thematic spatial databases
Multi-thematic spatial databasesMulti-thematic spatial databases
Multi-thematic spatial databases
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Industry experts webinar slides (final v1.0)
Industry experts webinar slides (final   v1.0)Industry experts webinar slides (final   v1.0)
Industry experts webinar slides (final v1.0)
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storage
 
Sandish3Certs
Sandish3CertsSandish3Certs
Sandish3Certs
 
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaEfficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
 
Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23
 
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics SystemFour Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
Big data landscape
Big data landscapeBig data landscape
Big data landscape
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 

Andere mochten auch

Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop InnoTech
 
High performance computing принципы проектирования сети
High performance computing принципы проектирования сетиHigh performance computing принципы проектирования сети
High performance computing принципы проектирования сетиMUK Extreme
 
How HPC Transforms the Corporate Information Technology Ecosystem
How HPC Transforms the Corporate Information Technology EcosystemHow HPC Transforms the Corporate Information Technology Ecosystem
How HPC Transforms the Corporate Information Technology Ecosysteminside-BigData.com
 
High Performance Computing: State of the Industry
High Performance Computing: State of the IndustryHigh Performance Computing: State of the Industry
High Performance Computing: State of the IndustryIMEX Research
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for researchEsteban Hernandez
 
Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseDataWorks Summit
 
Analytics 3.0 Measurable business impact from analytics & big data
Analytics 3.0 Measurable business impact from analytics & big dataAnalytics 3.0 Measurable business impact from analytics & big data
Analytics 3.0 Measurable business impact from analytics & big dataMicrosoft
 
Big Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBig Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBigDataExpo
 
Benefiting from Big Data - A New Approach for the Telecom Industry
Benefiting from Big Data - A New Approach for the Telecom Industry  Benefiting from Big Data - A New Approach for the Telecom Industry
Benefiting from Big Data - A New Approach for the Telecom Industry Persontyle
 
Impact of big data on analytics
Impact of big data on analyticsImpact of big data on analytics
Impact of big data on analyticsCapgemini
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Andere mochten auch (16)

Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop
 
High performance computing принципы проектирования сети
High performance computing принципы проектирования сетиHigh performance computing принципы проектирования сети
High performance computing принципы проектирования сети
 
How HPC Transforms the Corporate Information Technology Ecosystem
How HPC Transforms the Corporate Information Technology EcosystemHow HPC Transforms the Corporate Information Technology Ecosystem
How HPC Transforms the Corporate Information Technology Ecosystem
 
High Performance Computing: State of the Industry
High Performance Computing: State of the IndustryHigh Performance Computing: State of the Industry
High Performance Computing: State of the Industry
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for research
 
Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the Enterprise
 
Analytics 3.0 Measurable business impact from analytics & big data
Analytics 3.0 Measurable business impact from analytics & big dataAnalytics 3.0 Measurable business impact from analytics & big data
Analytics 3.0 Measurable business impact from analytics & big data
 
Big Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBig Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of Analytics
 
IDC HPC Market Update
IDC HPC Market UpdateIDC HPC Market Update
IDC HPC Market Update
 
2016 IDC HPC Market Update
2016 IDC HPC Market Update2016 IDC HPC Market Update
2016 IDC HPC Market Update
 
EPA Horizon 2020 SC5 Roadshow presentation - UCD 02.06.15
EPA Horizon 2020 SC5 Roadshow presentation - UCD 02.06.15EPA Horizon 2020 SC5 Roadshow presentation - UCD 02.06.15
EPA Horizon 2020 SC5 Roadshow presentation - UCD 02.06.15
 
HPC Market Update from IDC
HPC Market Update from IDCHPC Market Update from IDC
HPC Market Update from IDC
 
Benefiting from Big Data - A New Approach for the Telecom Industry
Benefiting from Big Data - A New Approach for the Telecom Industry  Benefiting from Big Data - A New Approach for the Telecom Industry
Benefiting from Big Data - A New Approach for the Telecom Industry
 
Impact of big data on analytics
Impact of big data on analyticsImpact of big data on analytics
Impact of big data on analytics
 
Big Data and Advanced Analytics
Big Data and Advanced AnalyticsBig Data and Advanced Analytics
Big Data and Advanced Analytics
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Ähnlich wie Big Data Analytics

Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services Amazon Web Services
 
Big Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace WebinarBig Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace WebinarAmazon Web Services
 
Big Data Analytics with Amazon Web Services
Big Data Analytics with Amazon Web ServicesBig Data Analytics with Amazon Web Services
Big Data Analytics with Amazon Web ServicesAmazon Web Services
 
Introduction to Elastic MapReduce
Introduction to Elastic MapReduceIntroduction to Elastic MapReduce
Introduction to Elastic MapReduceAmazon Web Services
 
Data Driven Innovation with Amazon Web Services
Data Driven Innovation with Amazon Web ServicesData Driven Innovation with Amazon Web Services
Data Driven Innovation with Amazon Web ServicesAmazon Web Services
 
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...Amazon Web Services
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud ComputingAmazon Web Services
 
DAT103 Introducing Amazon RedShift - AWS re: Invent 2012
DAT103 Introducing Amazon RedShift - AWS re: Invent 2012DAT103 Introducing Amazon RedShift - AWS re: Invent 2012
DAT103 Introducing Amazon RedShift - AWS re: Invent 2012Amazon Web Services
 
Big data hadoop ecosystem and nosql
Big data hadoop ecosystem and nosqlBig data hadoop ecosystem and nosql
Big data hadoop ecosystem and nosqlKhanderao Kand
 
NoSQL for the SQL Server Pro
NoSQL for the SQL Server ProNoSQL for the SQL Server Pro
NoSQL for the SQL Server ProLynn Langit
 
AWS Big Data Analytics IP Expo 2013
AWS Big Data Analytics IP Expo 2013AWS Big Data Analytics IP Expo 2013
AWS Big Data Analytics IP Expo 2013Amazon Web Services
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasThoughtworks
 
Information processing architectures
Information processing architecturesInformation processing architectures
Information processing architecturesRaji Gogulapati
 
AWS를 활용한 Big Data 실전 배치 사례 :: 이한주 :: AWS Summit Seoul 2016
AWS를 활용한 Big Data 실전 배치 사례 :: 이한주 :: AWS Summit Seoul 2016AWS를 활용한 Big Data 실전 배치 사례 :: 이한주 :: AWS Summit Seoul 2016
AWS를 활용한 Big Data 실전 배치 사례 :: 이한주 :: AWS Summit Seoul 2016Amazon Web Services Korea
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 

Big Data Analytics

  • 1. Big Data Analytics. Peter Sirota, General Manager, Amazon Elastic MapReduce
  • 2. Overview: 1. Introducing Big Data; 2. From data to actionable information; 3. Analytics and Cloud Computing; 4. The Big Data ecosystem
  • 4. Generation; Collection & storage; Analytics & computation; Collaboration & sharing
  • 5. The cost of data generation is falling
  • 6. Generation (lower cost, higher throughput); Collection & storage; Analytics & computation; Collaboration & sharing
  • 7. Generation (lower cost, higher throughput); Collection & storage; Analytics & computation; Collaboration & sharing. Label beside the stages after Generation: "Highly constrained"
  • 8. Chart of data volume: generated data vs. data available for analysis. Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  • 9. Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints
  • 10. Generation (lower cost, higher throughput); Collection & storage; Analytics & computation; Collaboration & sharing. Label beside the stages after Generation: "Highly constrained"
  • 11. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Label beside the stages after Generation: "Accelerated"
  • 13. Big Data: technologies and techniques for working productively with data, at any scale.
  • 14. Part 2: From data to actionable information
  • 15. “Who buys video games?”
  • 16. Per day: 3.5 billion records; 13 TB of click stream logs; 71 million unique cookies
  • 19. Results: 500% return on ad spend; 17,000% reduction in procurement time
  • 20. “Who is using our service?”
  • 21. Finding signal in the noise of logs. Identified early mobile usage. Invested heavily in mobile development.
  • 22. In January 2013, 9,432,061 unique mobile devices used the Yelp mobile app. 4 million+ calls. 5 million+ directions.
  • 23. Open web index. 3.4 billion records. Available to all.
  • 24. Full parse for impact of social networks: 300 lines of Ruby code. 14 hours. $100.
  • 25. Tweeting about Flu. Source: "You Are What You Tweet: Analyzing Twitter for Public Health", M. J. Paul and M. Dredze, 2011
  • 26. Tweeting about Food: tweets about the price of rice vs. official food price inflation
  • 27. Part 3: Analytics and Cloud Computing
  • 28. Generation; Collection & storage; Analytics & computation; Collaboration & sharing
  • 29. Generation; Collection & storage (S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase); Analytics & computation; Collaboration & sharing
  • 30. Generation; Collection & storage; Analytics & computation (EC2 & Elastic MapReduce); Collaboration & sharing
  • 31. Generation; Collection & storage; Analytics & computation; Collaboration & sharing (EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift)
  • 32. Generation; Collection & storage (S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase); AWS Data Pipeline; Analytics & computation (EC2 & Elastic MapReduce); Collaboration & sharing (EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift)
  • 33. Generation; Collection & storage (S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase); AWS Data Pipeline; Analytics & computation (EC2 & Elastic MapReduce); Collaboration & sharing (EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift)
  • 37. S3, DynamoDB, Redshift; Input data; Code; Elastic MapReduce
  • 38. S3, DynamoDB, Redshift; Input data; Code; Elastic MapReduce; Name node
  • 39. S3, DynamoDB, Redshift; Input data; Code; Elastic MapReduce; Name node; S3/HDFS; Elastic cluster
  • 40. S3, DynamoDB, Redshift; Input data; Code; Elastic MapReduce; Name node; S3/HDFS; Elastic cluster; Queries + BI (via JDBC, Pig, Hive)
  • 41. S3, DynamoDB, Redshift; Input data; Code; Elastic MapReduce; Name node; S3/HDFS; Elastic cluster; Queries + BI (via JDBC, Pig, Hive); Output
  • 57. 2. Rapid, tuned provisioning
  • 59. Remove undifferentiated heavy lifting.
  • 60. 3. Hadoop all the way down
  • 61. Robust ecosystem. Databases, machine learning, segmentation, clustering, analytics, metadata stores, exchange formats, and so on...
  • 62. 4. Agility for experimentation
  • 63. Instance choice. Stay flexible on instance type & number.
  • 66. 1. Elastic clusters; 2. Rapid, tuned provisioning; 3. Hadoop all the way down; 4. Agility for experimentation; 5. Cost optimizations
  • 67. Vin Sharma (vin.sharma@intel.com), Director, Product Strategy & Marketing, Big Data Software, Intel Corporation
  • 68. Analysis of Data Can Transform Society: Enhance scientific understanding, drive innovation, and accelerate medical cures. Create new business models and improve organizational processes. Increase public safety and improve energy efficiency with smart grids.
  • 69. Intel’s Vision to Democratize Big Data: Unlock Value in Silicon; Support Open Platforms; Deliver Software Value
  • 70. Intel at the Intersection of Big Data. HPC: enabling exascale computing on massive data sets. Cloud: helping enterprises build open interoperable clouds. Open Source: contributing code and fostering ecosystem.
  • 71. Intel® Technology at the Heart of the Cloud: Server, Storage, Network
  • 72. Scale-Out Big Data Compute Platform Optimization: cost-effective performance • Intel® Advanced Vector Extensions Technology • Intel® Turbo Boost Technology 2.0 • Intel® Advanced Encryption Standard New Instructions Technology
  • 73. Intel® Advanced Vector Extensions Technology • Newest in a long line of processor instruction innovations • Increases floating point operations per clock up to 2X¹ performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more legal information on performance forecasts go to http://www.intel.com/performance. See backup for configuration details. ¹ Performance comparison using Linpack benchmark.
  • 74. Intel® Turbo Boost Technology 2.0: More Performance. Higher turbo speeds maximize performance for single and multi-threaded applications
  • 75. Intel® Advanced Encryption Standard New Instructions • Processor assistance for performing AES encryption • 7 new instructions • Makes enabled encryption software faster and stronger
  • 76. The Power of Intel® Platform Solutions: richer user experiences. Chart: TeraSort for 1 TB sort, from 4 hrs to 10 min. Step labels: Previous Intel® Xeon® Processor; Intel® Xeon® Processor E5 2600; Solid-State Drive; 10G Ethernet; Intel® Apache Hadoop. Reduction labels: 50%, 80%, 50%, 40%.
  • 77. The Virtuous Cycle of User Experience: Clients, Cloud, Intelligent Systems
  • 78. Part 4: The Big Data Ecosystem
  • 79. Data, data, everywhere... Data is stored in silos.
  • 80. S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
  • 81. “How do I get my data to the cloud?”
  • 82. Data mobility: Generated and stored in AWS; Inbound data transfer is free; Multipart upload to S3; Physical media; AWS Direct Connect; Regional replication of AMIs and snapshots
  • 83. “How do I integrate my data for maximum impact?”
  • 84. S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
  • 85. S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
  • 86. S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
  • 87. S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
  • 88. S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
  • 89. AWS Data Pipeline: Orchestration for data-intensive workloads. Announced in November, available now.
  • 90. AWS Data Pipeline: Data-intensive orchestration and automation; Reliable and scheduled; Easy to use, drag and drop; Execution and retry logic; Map data dependencies; Create and manage temporary compute resources
  • 91. Anatomy of a pipeline
  • 92. Additional checks and notifications
  • 96. Summary: 1. Introducing Big Data; 2. From data to actionable information; 3. Analytics and Cloud Computing; 4. The Big Data ecosystem
  • 97. Get 600 Hours of free supercomputing time! www.powerof60.com
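
Slide 90 lists "execution and retry logic" and "map data dependencies" among the AWS Data Pipeline features. The sketch below illustrates only those two general ideas in plain Python; it does not use or imitate the AWS Data Pipeline API. The function `run_pipeline`, the task names, and the `max_attempts` parameter are all hypothetical, chosen for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, depends_on, max_attempts=3):
    """Run tasks in dependency order, retrying each task that fails.

    tasks: dict mapping a task name to a zero-argument callable.
    depends_on: dict mapping a task name to the names it must wait for.
    Returns the task names in the order they completed.
    """
    graph = {name: depends_on.get(name, ()) for name in tasks}
    completed = []
    # static_order() yields each task only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                break  # success: stop retrying this task
            except Exception:
                if attempt == max_attempts:
                    raise  # give up; downstream tasks never start
        completed.append(name)
    return completed


# Hypothetical three-step pipeline: "load" fails once, then succeeds.
calls = {"load": 0}


def copy_logs():
    pass


def flaky_load():
    calls["load"] += 1
    if calls["load"] == 1:
        raise RuntimeError("transient failure")


def report():
    pass


order = run_pipeline(
    {"copy_logs": copy_logs, "load": flaky_load, "report": report},
    {"load": ["copy_logs"], "report": ["load"]},
)
print(order)
```

A managed service would add scheduling, notifications, and provisioning of temporary compute on top of this; the sketch assumes everything runs in one process and retries immediately without back-off.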