BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
BIG-DATA
When your data sets become so large that you have to start innovating how to collect, store, analyze and share them
3Vs: Volume, Velocity, Variety
BIG-DATA
The collection and analysis of large amounts of data creates competitive advantage
BIGGER IS BETTER
Online Population
    Mobile Phone
    Machine Data
1 Trillion Objects!
COLLECT | STORE | ANALYZE | SHARE
•   Stream data to Amazon using Apache Flume
    • Amazon S3
    • Amazon Elastic MapReduce
COLLECT | STORE | ANALYZE | SHARE
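[Chart: storage options plotted by structure (high to low) and size (small to large). From high to low structure: RDS, DynamoDB, HBase, EMR HDFS, S3. RDS sits at small size, S3 at large size; logs on app servers sit at small size and low structure.]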
ANALYZE
ORGANIZE | CLEAN | ENRICH | CONDENSE
• DynamoDB Table: Daily-Orders (NoSQL Table)
• On-Premise DB Table: Customer-Demographics (SQL Table)
• S3://clickstream-data/: Apache Logs
• 3rd Party Data: Social Networking Information (accessed via web API)
• RDS Table: Targeting-Information
S3 file: s3://weekly-trend-data/ (CSV Report)
S3 file: s3://monthly-trend-data/ (CSV Report)
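The slides do not show how the trend reports are produced. As a minimal single-machine sketch of the clean and condense steps, the following rolls daily order rows up into a weekly CSV. The field names (order_date, amount), the sample rows, and the helper functions are hypothetical and not taken from the talk; a real EMR job would distribute this work across a cluster.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical rows standing in for the Daily-Orders table.
daily_orders = [
    {"order_date": "2012-11-26", "amount": "20.00"},
    {"order_date": "2012-11-28", "amount": "5.50"},
    {"order_date": "2012-12-03", "amount": "12.25"},
    {"order_date": "", "amount": "9.99"},  # incomplete record, dropped below
]


def weekly_trend(rows):
    """Clean (skip rows without a date) and condense (sum per ISO week)."""
    totals = defaultdict(float)
    for row in rows:
        if not row["order_date"]:
            continue
        year, week, _ = date.fromisoformat(row["order_date"]).isocalendar()
        totals[f"{year}-W{week:02d}"] += float(row["amount"])
    return sorted(totals.items())


def to_csv(trend):
    """Render the weekly totals as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["week", "total_amount"])
    for week, total in trend:
        writer.writerow([week, f"{total:.2f}"])
    return buf.getvalue()


print(to_csv(weekly_trend(daily_orders)))
```

The same grouping key could be switched from ISO week to calendar month to sketch the monthly report.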
AMAZON ELASTIC MAPREDUCE
Reduces complexity/cost of Hadoop Management
Integrates seamlessly with AWS Services
Leverages unmatched operational experience
Hadoop on Elastic MapReduce lowers the cost of developing and operating a distributed system.
Amazon EMR and Amazon S3
[Diagram: S3; Recommendation Engine, Ad-hoc Analysis, Personalization]
[Diagram: S3; Prod Cluster (EMR); EMR]
Data consumed in multiple ways
[Diagram: S3; Prod Cluster (EMR); Query Cluster (EMR); EMR]
[Diagram: multiple EMR clusters]
[Diagrams: DynamoDB; S3, EMR, DynamoDB; S3, DynamoDB]
ANALYZE SHARE
VISUALIZE | EXPLORE | DECIDE
Big Data Use Cases
• Digital Advertising
• Web Analytics
• Log Processing
• Data Warehousing
Media/Advertising: Targeted Advertising; Image and Video Processing
Oil & Gas: Seismic Analysis
Retail: Recommendations; Transactions Analysis
Life Sciences: Genome Analysis
Financial Services: Monte Carlo Simulations; Risk Analysis
Security: Anti-virus; Fraud Detection; Image Recognition
Social Network/Gaming: User Demographics; Usage analysis; In-game metrics
Who is VivaKi?




           ©2011. All rights reserved. VivaKi. Proprietary and Confidential.
Big Data Challenge for VivaKi




Enablement | Activation | Attribution




The Product Solution – Fluent from Razorfish
A digital marketing technology platform that provides marketers and agencies with a single,
integrated software application to target, distribute, and manage multi-channel digital campaigns and
experiences.




Marketing Central (Marketing Planning and Management, Team Collaboration and Workflow)
Experience Publishing (CMS / DMS, Multi-Channel and Multi-Device Distribution, Social Monitoring)
Targeting (Multi-Channel Aware Segmentation and Targeting) | Insights (Analytics and Reporting, including Attribution)
Data Warehouse (Data Sources - 1st and 3rd Party, Data Normalization + Transformation, Data Management)
Amazon Cloud Infrastructure



VivaKi Technology Solution




Example: Atlas Cookie Level Data

[Diagram: User Browsing Session; Click Stream; Ad Server Logs; Feed; Historical Click Stream Data; Data Mining; Segmentation & Categorization Algorithm; Apply Customization; Customer Loyalty Data; Ad Serving System; Cross Selling System]



Example: Atlas Cookie Level Data
 Operational Specifics
                   Traditional Data Center Solution              Amazon Cloud Solution
  Configuration    30 Processing Servers (HP ProLiant DL-360)    EMR Cluster of up to 1000 EC2 Instances
                   3 SQL Servers (HP ProLiant DL-580)            200GB additional S3 storage per month
                   10TB SAN Storage
  Processing       2 to 30 hours                                 reliably 9 hours
  Data Retention   90 days                                       18 months
  System Cost      $5000/month                                   $10000/month
  Personnel Cost   $15000/month                                  $5500/month
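The table lists system and personnel costs separately. A small worked calculation, using only the monthly figures quoted above, shows the combined monthly cost for each option. The variable names are illustrative, and the slide itself does not state these totals.

```python
# Monthly figures copied from the comparison table (USD).
traditional = {"system": 5000, "personnel": 15000}
cloud = {"system": 10000, "personnel": 5500}

traditional_total = sum(traditional.values())
cloud_total = sum(cloud.values())

# Difference in combined monthly cost, absolute and relative.
savings = traditional_total - cloud_total
savings_pct = 100 * savings / traditional_total

print(f"Traditional: ${traditional_total}/month")
print(f"Cloud:       ${cloud_total}/month")
print(f"Difference:  ${savings}/month ({savings_pct:.1f}%)")
```

On these numbers the system cost doubles in the cloud, but the lower personnel cost more than offsets it.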



 Business Impact
    • No upfront investment in hardware
    • No hardware procurement delay
    • No additional operations staff was hired
    • We completed development and testing of our first client project in six weeks. Our process is completely automated.
    • Our first client campaign experienced a 500% increase in its return on ad spend
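A 500% increase means the new value is the old value plus five times the old value, so six times the baseline. The sketch below makes that arithmetic explicit. The baseline return on ad spend of 2.0 is a hypothetical number chosen for illustration, since the slide does not give the campaign's actual figures.

```python
def apply_percent_increase(baseline, percent):
    """Return the value after increasing baseline by the given percentage."""
    return baseline * (1 + percent / 100)


baseline_roas = 2.0  # hypothetical: revenue earned per unit of ad spend
new_roas = apply_percent_increase(baseline_roas, 500)

print(f"Baseline: {baseline_roas}, after a 500% increase: {new_roas}")
```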
Better?
[Chart annotations: Search Ads Restyled; Etsy on Oprah; Hurricane Strikes; Justin Bieber Sneezes; New Cat Meme]
5%

95%
Thank you!


aws.amazon.com/big-data
We are sincerely eager to hear your FEEDBACK on this presentation and on re:Invent.
Please fill out an evaluation form when you have a chance.
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012

  • 3. BIG-DATA When your data sets become so large that you have to start innovating how to collect, store, analyze and share it
  • 4. 3Vs: Volume, Velocity, Variety
  • 5. BIG-DATA The collection and analysis of large amounts of data creates competitive advantage
  • 8. Online Population, Mobile Phone, Machine Data
  • 10. COLLECT | STORE | ANALYZE | SHARE
  • 11. COLLECT | STORE | ANALYZE | SHARE
  • 13. Stream data to Amazon using Apache Flume • Amazon S3 • Amazon Elastic MapReduce
  • 14. COLLECT | STORE | ANALYZE | SHARE
  • 16. Chart of storage options by Structure (High to Low) and Size (Large to Small): S3, EMR HDFS, HBase, DynamoDB, RDS, Logs on App servers
  • 17. ANALYZE ORGANIZE | CLEAN | ENRICH | CONDENSE
  • 18. DynamoDB Table: Daily-Orders (NoSQL Table); On-Premise DB Table: Customer-Demographics (SQL Table); RDS Table: Targeting-Information
  • 19. DynamoDB Table: Daily-Orders (NoSQL Table); On-Premise DB Table: Customer-Demographics (SQL Table); S3://clickstream-data/ (Apache Logs); 3rd Party Data: Social Networking Information (accessed via web API); RDS Table: Targeting-Information
  • 20. S3 file: s3://weekly-trend-data/ (CSV Report); S3 file: s3://monthly-trend-data/ (CSV Report)
  • 21. AMAZON ELASTIC MAPREDUCE: Reduces complexity/cost of Hadoop management; integrates seamlessly with AWS services; leverages unmatched operational experience
  • 24. Hadoop on Elastic MapReduce lowers the cost of developing and operating a distributed system.
  • 26. Amazon EMR and Amazon S3 S3
  • 27. Data consumed in multiple ways: Prod Cluster (EMR), S3, EMR; Recommendation Engine, Ad-hoc Analysis, Personalization
  • 28. Prod Cluster (EMR) S3 EMR Query Cluster (EMR) EMR EMR EMR EMR
  • 29. DynamoDB S3
  • 30. EMR DynamoDB S3
  • 32. ANALYZE SHARE VISUALIZE | EXPLORE | DECIDE
  • 36. Big Data Use Cases
  • 37. Digital Advertising, Web Analytics, Log Processing, Data Warehousing
  • 38. Media/Advertising: Targeted Advertising, Image and Video Processing; Oil & Gas: Seismic Analysis; Retail: Recommendations, Transactions Analysis; Life Sciences: Genome Analysis; Financial Services: Monte Carlo Simulations, Risk Analysis; Security: Anti-virus, Fraud Detection, Image Recognition; Social Network/Gaming: User Demographics, Usage analysis, In-game metrics
  • 41. Who is VivaKi? ©2011. All rights reserved. VivaKi. Proprietary and Confidential.
  • 42. Big Data Challenge for VivaKi: Enablement, Activation, Attribution
  • 43. The Product Solution – Fluent from Razorfish: A digital marketing technology platform that provides marketers and agencies with a single, integrated software application to target, distribute, and manage multi-channel digital campaigns and experiences. Components: Marketing Central (Marketing Planning and Management, Team Collaboration and Workflow); Experience Publishing (CMS / DMS, Multi-Channel and Multi-Device Distribution, Social Monitoring); Targeting (Multi-Channel Aware Segmentation and Targeting); Insights (Analytics and Reporting, including Attribution); Data Warehouse (Data Sources - 1st and 3rd Party, Data Normalization + Transformation, Data Management); Amazon Cloud Infrastructure
  • 44. VivaKi Technology Solution
  • 45. Example: Atlas Cookie Level Data. Click Stream Feed; Historical Click Stream Data; User Browsing Session; Ad Server Logs; Data Mining; Segmentation & Categorization; Apply Customization Algorithm; Customer Loyalty Data; Ad Serving System; Cross Selling System
  • 46. Example: Atlas Cookie Level Data. Operational Specifics (Traditional Data Center Solution vs. Amazon Cloud Solution): Configuration: 30 Processing Servers (HP Proliant DL-360), 3 SQL Servers (HP Proliant DL-580), 10TB SAN Storage vs. EMR Cluster of up to 1000 EC2 Instances, 200GB additional S3 storage per month; Processing: 2 to 30 hours vs. reliably 9 hours; Data Retention: 90 days vs. 18 months; System Cost: $5000/month vs. $10000/month; Personnel Cost: $15000/month vs. $5500/month. Business Impact: no upfront investment in hardware; no hardware procurement delay; no additional operations staff was hired; we completed development and testing of our first client project in six weeks, and our process is completely automated; our first client campaign experienced a 500% increase in their return on ad spend
  • 57. Etsy on Oprah; Search Ads Restyled; Hurricane Strikes; Justin Bieber Sneezes; New Cat Meme
  • 68. We are sincerely eager to hear your FEEDBACK on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
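
The monthly cost figures on slide 46 can be totalled with a short sketch like the one below. It assumes the first value in each pair on the slide belongs to the traditional data center and the second to the Amazon cloud solution; the flattened transcript suggests that column order but does not confirm it. The names COSTS, monthly_total and monthly_savings are hypothetical and exist only for this illustration.

```python
# Monthly figures as read from slide 46. Assigning the first value of each
# pair to the traditional data center is an ASSUMPTION about column order.
COSTS = {
    "traditional": {"system": 5000, "personnel": 15000},
    "cloud": {"system": 10000, "personnel": 5500},
}


def monthly_total(parts):
    """Sum the system and personnel cost for one solution (USD per month)."""
    return sum(parts.values())


def monthly_savings(costs=COSTS):
    """Traditional total minus cloud total (USD per month)."""
    return monthly_total(costs["traditional"]) - monthly_total(costs["cloud"])


if __name__ == "__main__":
    for name, parts in COSTS.items():
        print(f"{name}: ${monthly_total(parts)}/month")
    print(f"difference: ${monthly_savings()}/month")
```

Under that column-order assumption, the higher system cost of the cloud solution is more than offset by the lower personnel cost. The sketch covers only the two cost rows shown on the slide.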