SlideShare ist ein Scribd-Unternehmen logo
1 von 26
Elastic MapReduce Office Hours April 13th, 2011
Introduction Richard Cole, Lead Engineer Adam Gray, Product Manager
Office Hours IS Simply, Office Hours is a program the enables a technical audience the ability to interact with AWS technical experts. We look to improve this program by soliciting feedback from Office Hours attendees.  Please let us know what you would like to see.
Office Hours is NOT Support If you have a problem with your current deployment please visit our forums or our support website http://aws.amazon.com/premiumsupport/ A place to find out about upcoming services We do not typically disclose information about our services until they are available
Agenda What’s New How-to Demonstrations Resize a running job flow Launch a Hive-based Data Warehouse (Contextual Advertising Example) Question and Answer  Please begin submitting questions now
What’s New?
S3 Multipart Upload ,[object Object]
Can begin upload process before a Hadoop task has finished
Parallel uploads and earlier start mean data-intensive applications finish significantly fasterUsage: ./elastic-mapreduce--create --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop  –args “-c,fs.s3n.multipart.uploads.enabled=true, -c,fs.s3n.multipart.uploads.split.size=524288000”
Elastic IP Address Integration ,[object Object]
Reference the same IP address for a long-running job flow even if it has to be restarted
Reference transient job flows in a consistent way each time they are launchedUsage: ./elastic-mapreduce --create --eip [existing_ip]
EC2 Instance Tagging ,[object Object]
Useful if you are running multiple job flows concurrently or managing a large numbers of Amazon EC2 instances,[object Object]
Example 1 – Resize a running job flow ,[object Object]
Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight)Job Flow Job Flow Job Flow Data Warehouse (Batch Processing) Allocate  4 instances Expand to  25 instances Expand to  9 instances 3 Hours Data Warehouse (Steady State) Data Warehouse (Steady State) Shrink to  9 instances Expand to  25 instances Time remaining: Time remaining: 14 Hours Time remaining: 7 Hours
Job Flow Architecture
Launch a Job Flow // Launch a job flow with 8 core nodes ./elastic-mapreduce--create --alive --instance-group master --instance-type m2.2xlarge --instance-count 1 --instance-group core --instance-type m1.small --instance-count 8 Created job flow  j-ABABABABAB  // Describe job flow ./elastic-mapreduce --describe j-ABABABABAB
Resize Job Flow // Add 8 m2.4xlarge TASK Nodes ./elastic-mapreduce --jobflow j-ABABABABAB --add-instance-group task --instance-type m2.4xlarge --instance-count 8 // Expand CORE Nodes from 8 to 12 ./elastic-mapreduce --jobflow j-ABABABABAB --modify-instance-group core --instance-count 12 // Shrink TASK Nodes from 8 to 6 ./elastic-mapreduce --jobflow j-ABABABABAB --modify-instance-group task --instance-count 6
Terminate Job Flow // Terminate Job Flow ./elastic-mapreduce--terminate j-ABABABABAB
Example 2 – Launch a Hive-based Data Warehouse Apache Hive Data warehouse for Hadoop Open source project started at Facebook Turns data on Hadoop into a virtually limitless data warehouse Provides data summarization, ad hoc querying and analysis Enables SQL-like queries on structured and unstructured data ,[object Object],Inherits linear scalability of Hadoop
Launch a Hive Cluster(Contextual Advertising Example) // Launch a Hive cluster with cluster compute nodes ./elastic-mapreduce --create --alive --hive-interactive --name "Hive Job Flow" --instance-type cc1.4xlarge Created job flow  j-ABABABABAB  // SSH to Master Node ./elastic-mapreduce --ssh --jobflow j-ABABABABAB  // Run a Hive Session on the Master Node hadoop@domU-12-31-39-07-D2-14:~$ hive -d SAMPLE=s3://elasticmapreduce/samples/hive-ads -d DAY=2009-04-13 -d HOUR=08 -d NEXT_DAY=2009-04-13 -d NEXT_HOUR=09 -d OUTPUT=s3://mybucket/samples/output  hive>
Define Impressions Table Show Tables hive> show tables;  OK Time taken: 3.51 seconds hive>  Define Impressions Table ADD JAR ${SAMPLE}/libs/jsonserde.jar ; CREATE EXTERNAL TABLE impressions (  requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ipstring )  PARTITIONED BY (dt string)  ROW FORMAT  serde'com.amazon.elasticmapreduce.JsonSerde'  with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId,  					referrer, userAgent, userCookie, ip' ) LOCATION '${SAMPLE}/tables/impressions' ;
Recover Partitions The table is partitioned based on time, we can tell Hive about the existence of a single partition using the following statement. ALTER TABLE impressions ADD PARTITION (dt='2009-04-13-08-05') ; If we were to query the table at this point the results would contain data from just the single partition. We can instruct Hive to recover all partitions by inspecting the data stored in Amazon S3 using the RECOVER PARTITIONS statement. ALTER TABLE impressions RECOVER PARTITIONS ;
Define Clicks Table We follow the same process to define the clicks table and recover its partitions. CREATE EXTERNAL TABLE clicks ( impressionIdstring  )  PARTITIONED BY (dt string)  ROW FORMAT  SERDE 'com.amazon.elasticmapreduce.JsonSerde'  WITH SERDEPROPERTIES ( 'paths'='impressionId' )  LOCATION '${SAMPLE}/tables/clicks'  ;  ALTER TABLE clicks RECOVER PARTITIONS ;
Define Output Table We are going to combine the clicks and impressions tables so that we have a record of whether or not each impression resulted in a click. We'd like this data stored in Amazon S3 so that it can be used as input to other job flows.  CREATE EXTERNAL TABLE joined_impressions (  requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ip string, clicked Boolean  )  PARTITIONED BY (day string, hour string)  	STORED AS SEQUENCEFILE  	LOCATION '${OUTPUT}/joined_impressions'  ;

Weitere ähnliche Inhalte

Was ist angesagt?

AWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWSAWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWSAmazon Web Services
 
Gaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsGaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsAmazon Web Services
 
使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排Amazon Web Services
 
AWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacksAWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacksEmmanuel Quentin
 
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...Julien SIMON
 
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...Amazon Web Services
 
Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算Amazon Web Services
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Laurent Bernaille
 
Running Docker clusters on AWS (June 2016)
Running Docker clusters on AWS (June 2016)Running Docker clusters on AWS (June 2016)
Running Docker clusters on AWS (June 2016)Julien SIMON
 
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot InstancesWorkshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot InstancesAmazon Web Services
 
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017Amazon Web Services
 
AWS basics session
AWS basics sessionAWS basics session
AWS basics sessionSharad Gupta
 
Introduction to Batch Processing on AWS
Introduction to Batch Processing on AWSIntroduction to Batch Processing on AWS
Introduction to Batch Processing on AWSAmazon Web Services
 
AWS Cloud Formation
AWS Cloud Formation AWS Cloud Formation
AWS Cloud Formation Adam Book
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWSAmazon Web Services
 
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel AvivAn introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel AvivAmazon Web Services
 

Was ist angesagt? (20)

AWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWSAWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWS
 
Gaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsGaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your Logs
 
使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排
 
AWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacksAWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacks
 
AWS Partner Presentation - SAP
AWS Partner Presentation - SAP AWS Partner Presentation - SAP
AWS Partner Presentation - SAP
 
Understanding The Benefits Of Amazon EC2
Understanding The Benefits Of Amazon EC2Understanding The Benefits Of Amazon EC2
Understanding The Benefits Of Amazon EC2
 
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
 
Introduction to Amazon EC2
Introduction to Amazon EC2Introduction to Amazon EC2
Introduction to Amazon EC2
 
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
 
Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016
 
Running Docker clusters on AWS (June 2016)
Running Docker clusters on AWS (June 2016)Running Docker clusters on AWS (June 2016)
Running Docker clusters on AWS (June 2016)
 
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot InstancesWorkshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
 
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
 
AWS basics session
AWS basics sessionAWS basics session
AWS basics session
 
Introduction to Batch Processing on AWS
Introduction to Batch Processing on AWSIntroduction to Batch Processing on AWS
Introduction to Batch Processing on AWS
 
Amazon EC2 Masterclass
Amazon EC2 MasterclassAmazon EC2 Masterclass
Amazon EC2 Masterclass
 
AWS Cloud Formation
AWS Cloud Formation AWS Cloud Formation
AWS Cloud Formation
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel AvivAn introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
 

Andere mochten auch

Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Amazon Web Services
 
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduceMatthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReducehuguk
 
AWS Customer Presentation - kikin
AWS Customer Presentation -  kikinAWS Customer Presentation -  kikin
AWS Customer Presentation - kikinAmazon Web Services
 
AWS 101 business seminar in Taipei
AWS 101 business seminar in TaipeiAWS 101 business seminar in Taipei
AWS 101 business seminar in TaipeiAmazon Web Services
 
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSVArchitecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSVAmazon Web Services
 
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWSAWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWSAmazon Web Services
 
Bringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potentialBringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potentialAdrian Hornsby
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Amazon Web Services
 
How to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesHow to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesVladimir Simek
 

Andere mochten auch (11)

Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
 
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduceMatthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
 
AWS Customer Presentation - kikin
AWS Customer Presentation -  kikinAWS Customer Presentation -  kikin
AWS Customer Presentation - kikin
 
AWS 101 business seminar in Taipei
AWS 101 business seminar in TaipeiAWS 101 business seminar in Taipei
AWS 101 business seminar in Taipei
 
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSVArchitecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSV
 
Running databases on AWS
Running databases on AWSRunning databases on AWS
Running databases on AWS
 
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWSAWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
 
Bringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potentialBringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potential
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
 
How to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesHow to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutes
 
Technical Track
Technical TrackTechnical Track
Technical Track
 

Ähnlich wie AWS Office Hours: Amazon Elastic MapReduce

Richard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCampRichard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCampBigDataCamp
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR MasterclassIan Massingham
 
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...AWS Chicago
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainYahoo Developer Network
 
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data... Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...Big Data Spain
 
Nosql hands on handout 04
Nosql hands on handout 04Nosql hands on handout 04
Nosql hands on handout 04Krishna Sankar
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...Amazon Web Services
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...Andrew Lamb
 
Tuning and optimizing webcenter spaces application white paper
Tuning and optimizing webcenter spaces application white paperTuning and optimizing webcenter spaces application white paper
Tuning and optimizing webcenter spaces application white paperVinay Kumar
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and DrupalPromet Source
 
Hadoop & Zing
Hadoop & ZingHadoop & Zing
Hadoop & ZingLong Dao
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...Altinity Ltd
 
AWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewAWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewDan Morrill
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewMadhur Nawandar
 
Apache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseApache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseNag Arvind Gudiseva
 
Advance Sql Server Store procedure Presentation
Advance Sql Server Store procedure PresentationAdvance Sql Server Store procedure Presentation
Advance Sql Server Store procedure PresentationAmin Uddin
 
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)Amazon Web Services Korea
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zingzingopen
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 

Ähnlich wie AWS Office Hours: Amazon Elastic MapReduce (20)

Richard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCampRichard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCamp
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
 
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data... Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 
Nosql hands on handout 04
Nosql hands on handout 04Nosql hands on handout 04
Nosql hands on handout 04
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Tuning and optimizing webcenter spaces application white paper
Tuning and optimizing webcenter spaces application white paperTuning and optimizing webcenter spaces application white paper
Tuning and optimizing webcenter spaces application white paper
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and Drupal
 
Hadoop & Zing
Hadoop & ZingHadoop & Zing
Hadoop & Zing
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...
 
AWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewAWS Hadoop and PIG and overview
AWS Hadoop and PIG and overview
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overview
 
Apache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseApache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBase
 
Advance Sql Server Store procedure Presentation
Advance Sql Server Store procedure PresentationAdvance Sql Server Store procedure Presentation
Advance Sql Server Store procedure Presentation
 
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zing
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 

Mehr von Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Kürzlich hochgeladen

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 

Kürzlich hochgeladen (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 

AWS Office Hours: Amazon Elastic MapReduce

  • 1. Elastic MapReduce Office Hours April 13th, 2011
  • 2. Introduction Richard Cole, Lead Engineer Adam Gray, Product Manager
  • 3. Office Hours IS Simply, Office Hours is a program the enables a technical audience the ability to interact with AWS technical experts. We look to improve this program by soliciting feedback from Office Hours attendees. Please let us know what you would like to see.
  • 4. Office Hours is NOT Support If you have a problem with your current deployment please visit our forums or our support website http://aws.amazon.com/premiumsupport/ A place to find out about upcoming services We do not typically disclose information about our services until they are available
  • 5. Agenda What’s New How-to Demonstrations Resize a running job flow Launch a Hive-based Data Warehouse (Contextual Advertising Example) Question and Answer Please begin submitting questions now
  • 7.
  • 8. Can begin upload process before a Hadoop task has finished
  • 9. Parallel uploads and earlier start mean data-intensive applications finish significantly fasterUsage: ./elastic-mapreduce--create --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop –args “-c,fs.s3n.multipart.uploads.enabled=true, -c,fs.s3n.multipart.uploads.split.size=524288000”
  • 10.
  • 11. Reference the same IP address for a long-running job flow even if it has to be restarted
  • 12. Reference transient job flows in a consistent way each time they are launchedUsage: ./elastic-mapreduce --create --eip [existing_ip]
  • 13.
  • 14.
  • 15.
  • 16. Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight)Job Flow Job Flow Job Flow Data Warehouse (Batch Processing) Allocate 4 instances Expand to 25 instances Expand to 9 instances 3 Hours Data Warehouse (Steady State) Data Warehouse (Steady State) Shrink to 9 instances Expand to 25 instances Time remaining: Time remaining: 14 Hours Time remaining: 7 Hours
  • 18. Launch a Job Flow // Launch a job flow with 8 core nodes ./elastic-mapreduce--create --alive --instance-group master --instance-type m2.2xlarge --instance-count 1 --instance-group core --instance-type m1.small --instance-count 8 Created job flow j-ABABABABAB // Describe job flow ./elastic-mapreduce --describe j-ABABABABAB
  • 19. Resize Job Flow // Add 8 m2.4xlarge TASK Nodes ./elastic-mapreduce --jobflow j-ABABABABAB --add-instance-group task --instance-type m2.4xlarge --instance-count 8 // Expand CORE Nodes from 8 to 12 ./elastic-mapreduce --jobflow j-ABABABABAB --modify-instance-group core --instance-count 12 // Shrink TASK Nodes from 8 to 6 ./elastic-mapreduce --jobflow j-ABABABABAB --modify-instance-group task --instance-count 6
  • 20. Terminate Job Flow // Terminate Job Flow ./elastic-mapreduce--terminate j-ABABABABAB
  • 21.
  • 22. Launch a Hive Cluster(Contextual Advertising Example) // Launch a Hive cluster with cluster compute nodes ./elastic-mapreduce --create --alive --hive-interactive --name "Hive Job Flow" --instance-type cc1.4xlarge Created job flow j-ABABABABAB // SSH to Master Node ./elastic-mapreduce --ssh --jobflow j-ABABABABAB // Run a Hive Session on the Master Node hadoop@domU-12-31-39-07-D2-14:~$ hive -d SAMPLE=s3://elasticmapreduce/samples/hive-ads -d DAY=2009-04-13 -d HOUR=08 -d NEXT_DAY=2009-04-13 -d NEXT_HOUR=09 -d OUTPUT=s3://mybucket/samples/output hive>
  • 23. Define Impressions Table Show Tables hive> show tables; OK Time taken: 3.51 seconds hive> Define Impressions Table ADD JAR ${SAMPLE}/libs/jsonserde.jar ; CREATE EXTERNAL TABLE impressions ( requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ipstring ) PARTITIONED BY (dt string) ROW FORMAT serde'com.amazon.elasticmapreduce.JsonSerde' with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId, referrer, userAgent, userCookie, ip' ) LOCATION '${SAMPLE}/tables/impressions' ;
  • 24. Recover Partitions The table is partitioned based on time, we can tell Hive about the existence of a single partition using the following statement. ALTER TABLE impressions ADD PARTITION (dt='2009-04-13-08-05') ; If we were to query the table at this point the results would contain data from just the single partition. We can instruct Hive to recover all partitions by inspecting the data stored in Amazon S3 using the RECOVER PARTITIONS statement. ALTER TABLE impressions RECOVER PARTITIONS ;
  • 25. Define Clicks Table We follow the same process to define the clicks table and recover its partitions. CREATE EXTERNAL TABLE clicks ( impressionIdstring ) PARTITIONED BY (dt string) ROW FORMAT SERDE 'com.amazon.elasticmapreduce.JsonSerde' WITH SERDEPROPERTIES ( 'paths'='impressionId' ) LOCATION '${SAMPLE}/tables/clicks' ; ALTER TABLE clicks RECOVER PARTITIONS ;
  • 26. Define Output Table We are going to combine the clicks and impressions tables so that we have a record of whether or not each impression resulted in a click. We'd like this data stored in Amazon S3 so that it can be used as input to other job flows. CREATE EXTERNAL TABLE joined_impressions ( requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ip string, clicked Boolean ) PARTITIONED BY (day string, hour string) STORED AS SEQUENCEFILE LOCATION '${OUTPUT}/joined_impressions' ;
  • 27. Define Local Impressions Table Next, we create some temporary tables in the job flow's local HDFS partition to store intermediate impression and click data. CREATE TABLE tmp_impressions( requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ip string ) STORED AS SEQUENCEFILE ; We insert data from the impressions table for the time duration we're interested in. Note that because the impressions table is partitioned only the relevant partitions will be read. INSERT OVERWRITE TABLE tmp_impressions SELECT from_unixtime(cast((cast(i.requestBeginTime as bigint) / 1000) as int)) requestBeginTime, i.adId, i.impressionId, i.referrer, i.userAgent, i.userCookie, i.ip FROM impressions i WHERE i.dt >= '${DAY}-${HOUR}-00' AND i.dt < '${NEXT_DAY}-${NEXT_HOUR}-00' ;
  • 28. Define Local Clicks Table For clicks, we extend the period of time over which we join by 20 minutes. Meaning we accept a click that occurred up to 20 minutes after the impression. CREATE TABLE tmp_clicks( impressionIdstring ) STORED AS SEQUENCEFILE; INSERT OVERWRITE TABLE tmp_clicks SELECT impressionId FROM clicks c WHERE c.dt>= '${DAY}-${HOUR}-00' AND c.dt < '${NEXT_DAY}-${NEXT_HOUR}-20' ;
  • 29. Join Tables Now we combine the impressions and clicks tables using a left outer join. This way any impressions that did not result in a click are preserved. INSERT OVERWRITE TABLE joined_impressions PARTITION (day='${DAY}', hour='${HOUR}') SELECT i.requestBeginTime, i.adId, i.impressionId, i.referrer, i.userAgent, i.userCookie, i.ip, (c.impressionId is not null) clicked FROM tmp_impressionsi LEFT OUTER JOIN tmp_clicksc ON i.impressionId = c.impressionId ;
  • 30. Terminate Interactive Session // Terminate Job Flow ./elastic-mapreduce--terminate j-ABABABABAB
  • 31. Question & Answer Visithttp://aws.amazon.com/officehours to watch recorded sessions and to sign up for upcoming sessions.