Big Data in the Cloud 
Russell Nash 
Solutions Architect, Amazon Web Services, APAC 
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Big picture slide
Hadoop 
MPP 
NoSQL 
STREAMING
Figure: Traditional Database, Hadoop, NoSQL, and MPP DW positioned by data Structure (High/Low) against data Size (Small/Large).
Hadoop 
MPP 
NoSQL 
Structure 
Latency 
Interfaces
Background 
2004 – Map Reduce 
2006 – Hadoop
Input File → Hadoop cluster (Functions) → Output
1. Very Flexible
2. Very Scalable
3. Often Transient
Diagram: Input file → map → reduce → Output file
Diagram: the same map → reduce pipeline running as three parallel copies, each reading its own input file and writing its own output file
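The map → reduce flow above can be sketched in plain Python. This is a single-process illustration. It assumes word count as the example job, because the slides do not name one. The names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical. On a real Hadoop cluster, the framework runs map tasks on many nodes in parallel and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(input_files: List[List[str]]) -> Dict[str, int]:
    """Map every line of every input file, then shuffle and reduce.

    Each inner list stands in for one input file; on a cluster each file
    (or file split) would be mapped by a different node.
    """
    mapped = [
        pair
        for lines in input_files
        for line in lines
        for pair in map_phase(line)
    ]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    files = [["big data in the cloud"], ["data in the cloud", "big cluster"]]
    print(run_job(files))
    # → {'big': 2, 'data': 2, 'in': 2, 'the': 2, 'cloud': 2, 'cluster': 1}
```

Map and reduce are independent pure functions, and the only coordination point is the shuffle. That is why the same job can scale from one node to hundreds without changing the user code.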
Big Data Verticals and Use Cases
• Media/Advertising: Targeted Advertising; Image and Video Processing
• Oil & Gas: Seismic Analysis
• Retail: Recommendations; Transactions Analysis
• Life Sciences: Genome Analysis
• Financial Services: Monte Carlo Simulations; Risk Analysis
• Security: Anti-virus; Fraud Detection; Image Recognition
• Social Network/Gaming: User Demographics; Usage Analysis; In-game Metrics
Deployment Options 
On-premise 
Cloud 
Managed on Cloud
Elastic MapReduce
Manageability 
Scalability 
Cost
400 GB of logs per day 
~12 Terabytes per month
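The ~12 TB figure follows from 400 GB × 30 days = 12,000 GB. The same arithmetic gives the size of the six months of search history loaded in step 1 below. This assumes 30-day months and decimal units (1 TB = 1,000 GB); both assumptions are ours, not the slide's.

```python
GB_PER_DAY = 400      # from the slide
DAYS_PER_MONTH = 30   # assumption: a 30-day month
GB_PER_TB = 1000      # assumption: decimal (SI) units


def terabytes(gb_per_day: float, days: int) -> float:
    """Total volume in TB for a given daily ingest rate and number of days."""
    return gb_per_day * days / GB_PER_TB


per_month = terabytes(GB_PER_DAY, DAYS_PER_MONTH)
six_months = terabytes(GB_PER_DAY, 6 * DAYS_PER_MONTH)

if __name__ == "__main__":
    print(f"{per_month:.0f} TB per month, {six_months:.0f} TB for six months")
    # → 12 TB per month, 72 TB for six months
```

With binary units (1 TB = 1,024 GB), the monthly total is about 11.7 TB, which still rounds to the slide's "~12".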
1) Load log file data for six months of user search history into Amazon S3

Search ID | Search Text | Final Selection
12423451 | westen | Westin
14235235 | wisten | Westin
54332232 | westenn | Westin
Amazon S3 Amazon EMR 
Log Files 
2) Spin up a 200 node cluster 
Hadoop Cluster
3) 200 nodes simultaneously analyze 
this data looking for common 
misspellings 
… this takes a few hours 
Hadoop Cluster 
Amazon S3 Amazon EMR
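Step 3 does not say how "common misspellings" are identified. One simple heuristic fits the sample table in step 1: count how often users typed a query and then selected something different. The sketch below is our assumption, not the method from the talk. The record layout is taken from the table, and `find_misspellings` is a hypothetical name. On EMR, the counting would be split across nodes: the map stage emits (typed text, selection) pairs and the reduce stage sums them.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record layout, taken from the sample table:
# (search_id, search_text, final_selection)
Record = Tuple[str, str, str]


def find_misspellings(records: Iterable[Record], min_count: int = 1) -> Dict[str, str]:
    """Map each query that differs from the user's final selection to the
    selection most often chosen after it (a frequency heuristic)."""
    # Count each (typed text, selection) pair; this is what the map
    # (emit pair) and reduce (sum) stages would compute on a cluster.
    pair_counts: Counter = Counter()
    for _search_id, text, selection in records:
        typed = text.strip().lower()
        if typed != selection.strip().lower():
            pair_counts[(typed, selection)] += 1

    # Keep the most frequent selection for each typed query.
    best: Dict[str, Tuple[str, int]] = {}
    for (typed, selection), count in pair_counts.items():
        if count >= min_count and (typed not in best or count > best[typed][1]):
            best[typed] = (selection, count)
    return {typed: selection for typed, (selection, _) in best.items()}


if __name__ == "__main__":
    sample = [
        ("12423451", "westen", "Westin"),
        ("14235235", "wisten", "Westin"),
        ("54332232", "westenn", "Westin"),
    ]
    print(find_misspellings(sample))
    # → {'westen': 'Westin', 'wisten': 'Westin', 'westenn': 'Westin'}
```

In practice, `min_count` would be set well above 1 so that one-off typos do not become suggestions.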
Amazon S3 Amazon EMR 
4) New common misspellings and 
suggestions loaded back into S3 
Hadoop Cluster 
Log Files
Amazon S3 Amazon EMR 
5) When the job is done, the 
cluster is shut down. 
Log Files
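Steps 2 to 5 map onto a single transient EMR cluster. The slides do not say how the job was submitted. The sketch below is one possibility: it builds keyword arguments for boto3's `run_job_flow` call with `InstanceCount=200`, runs a Hadoop Streaming step through `command-runner.jar`, and sets `KeepJobFlowAliveWhenNoSteps=False` so the cluster shuts down when the step finishes (step 5). The following are placeholder assumptions: the release label, the instance types, the job and step names, and every S3 URI and mapper/reducer script. `build_cluster_request` and `launch` are hypothetical helpers.

```python
from typing import Any, Dict


def build_cluster_request(
    input_uri: str,
    output_uri: str,
    mapper_uri: str,
    reducer_uri: str,
    node_count: int = 200,
) -> Dict[str, Any]:
    """Build kwargs for boto3's EMR run_job_flow; values marked 'assumption' are placeholders."""
    mapper_name = mapper_uri.rsplit("/", 1)[-1]
    reducer_name = reducer_uri.rsplit("/", 1)[-1]
    return {
        "Name": "search-misspellings",          # hypothetical job name
        "ReleaseLabel": "emr-6.15.0",          # assumption
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge", # assumption
            "SlaveInstanceType": "m5.xlarge",  # assumption
            "InstanceCount": node_count,       # includes the master node
            # Step 5: terminate the cluster once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "find-misspellings",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "hadoop-streaming",
                        "-files", f"{mapper_uri},{reducer_uri}",
                        "-mapper", mapper_name,
                        "-reducer", reducer_name,
                        "-input", input_uri,    # step 1: log files in S3
                        "-output", output_uri,  # step 4: results back to S3
                    ],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request: Dict[str, Any]) -> str:
    """Submit the request and return the cluster ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so building the request does not require boto3

    return boto3.client("emr").run_job_flow(**request)["JobFlowId"]
```

`InstanceCount` counts the master node. With a single instance type, 200 instances therefore means one master node and 199 core nodes.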
The Hadoop Ecosystem
Trends 
SQL on Hadoop 
Spark
Hadoop | MPP | NoSQL
Structure: Any
Latency: Mins-Hours
Interfaces: Programming, SQL-Like, Tools
Background 
SQL Databases 
for analytical workloads 
Performance 
Scalability 
Ease of Use 
Cost
Diagram: BI Tools connect to a Leader Node, which coordinates multiple Compute Nodes.
1. SQL
2. High Performance
3. Broad Toolset
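The leader/compute split can be illustrated with the "aggregate by month" query from the evaluation that follows. Each compute node aggregates only the rows it stores, and the leader merges the partial results. This is a local sketch under stated assumptions. The `(order_date, amount)` row shape is hypothetical. Round-robin placement is used as one possible distribution style. Nodes are simulated sequentially, whereas a real MPP database runs them in parallel.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical fact-table row: (order_date, amount)
Row = Tuple[date, float]
Month = Tuple[int, int]


def distribute(rows: List[Row], n_nodes: int) -> List[List[Row]]:
    """Spread rows round-robin across compute nodes (one possible distribution style)."""
    slices: List[List[Row]] = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        slices[i % n_nodes].append(row)
    return slices


def compute_node_aggregate(local_rows: List[Row]) -> Counter:
    """Each compute node aggregates only the rows stored on it."""
    partial: Counter = Counter()
    for order_date, amount in local_rows:
        partial[(order_date.year, order_date.month)] += amount
    return partial


def leader_node_query(slices: List[List[Row]]) -> Dict[Month, float]:
    """The leader collects every node's partial result and merges them."""
    merged: Counter = Counter()
    for partial in map(compute_node_aggregate, slices):  # parallel on a real cluster
        merged.update(partial)  # Counter.update adds values key by key
    return dict(sorted(merged.items()))


if __name__ == "__main__":
    rows = [
        (date(2014, 1, 5), 10.0),
        (date(2014, 1, 20), 5.0),
        (date(2014, 2, 1), 7.5),
        (date(2014, 2, 28), 2.5),
        (date(2014, 3, 3), 1.0),
    ]
    print(leader_node_query(distribute(rows, 3)))
    # → {(2014, 1): 15.0, (2014, 2): 10.0, (2014, 3): 1.0}
```

The result does not depend on how many nodes the rows are spread over. Adding nodes shrinks each node's share of the work.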
Deployment Options 
On-premise 
Cloud 
Managed on Cloud
Amazon Redshift
Manageability 
Scalability 
Cost
Performance Evaluation on 2B Rows 
Aggregate by month
Traditional SQL 
Database 
02:08:35 
00:35:46 
00:00:12
Hadoop | MPP | NoSQL
Structure: Any | Full
Latency: Mins-Hours | Seconds-Minutes
Interfaces: Programming, SQL-Like, Tools | SQL, BI Tools
Background 
Databases for 
webscale transactions 
Performance 
Flexibility
Relational Table
ID | Age | State
123 | 20 | CA
345 | 25 | WA
678 | 40 | FL

Non-Relational Table
ID | Attributes
123 | Age: 20, State: CA
345 | Age: 25, Country: Australia, Gender: F, Smoker: No
678 | Age: 40
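The difference between the two tables can be shown with plain Python structures. Tuples stand in for rows under a fixed schema. Dictionaries stand in for schemaless items that share only a key. This illustrates the data model only; it is not the DynamoDB API. The helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# Relational table: a fixed schema; every row has exactly these columns.
COLUMNS: Tuple[str, ...] = ("ID", "Age", "State")
relational_rows: List[Tuple[int, int, str]] = [
    (123, 20, "CA"),
    (345, 25, "WA"),
    (678, 40, "FL"),
]

# Non-relational table: items share only the key (ID); other attributes vary per item.
items: Dict[int, Dict[str, Any]] = {
    123: {"Age": 20, "State": "CA"},
    345: {"Age": 25, "Country": "Australia", "Gender": "F", "Smoker": "No"},
    678: {"Age": 40},
}


def get_attribute(item_id: int, name: str) -> Optional[Any]:
    """Look up one attribute by key; a missing attribute is simply absent (None)."""
    return items.get(item_id, {}).get(name)


def all_attribute_names() -> List[str]:
    """The union of attribute names across items (there is no single schema)."""
    return sorted({name for attrs in items.values() for name in attrs})


if __name__ == "__main__":
    print(get_attribute(345, "Country"))  # → Australia
    print(get_attribute(678, "State"))    # → None
    print(all_attribute_names())          # → ['Age', 'Country', 'Gender', 'Smoker', 'State']
```

Adding a new attribute to one item, such as `items[678]["Smoker"] = "Yes"`, needs no schema change. In the relational table, the same change would require adding a column to every row.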
Deployment Options 
On-premise 
Cloud 
Managed on Cloud
DynamoDB 
Manageability 
Scalability 
Cost
digital advertising 
real-time bidding
Hadoop | MPP | NoSQL
Structure: Any | Full | Semi
Latency: Mins-Hours | Seconds-Minutes | Sub-second
Interfaces: Programming, SQL-Like, Tools | SQL, BI Tools | Programming, Tools
Streaming Analytics
Diagram: Amazon Kinesis architecture. Multiple data sources send records to an AWS endpoint. The stream is divided into shards (Shard 1, Shard 2 … Shard N), and its data is replicated across three Availability Zones. Consumer applications read from the stream in parallel: App.1 (Aggregate & De-Duplicate), App.2 (Metric Extraction), App.3 (Sliding Window Analysis) and App.4 (Machine Learning). Results flow to S3, DynamoDB, Redshift and EMR.
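Kinesis assigns each record to a shard using an MD5 hash of the record's partition key. The hash is treated as a 128-bit integer, and the record goes to the shard whose hash-key range contains that integer. The sketch below mimics this routing locally under two assumptions. First, the key space is split into `num_shards` equal, contiguous ranges, as for a stream created with evenly sized shards. Second, the digest is read as a big-endian integer. `shard_for_key` is a hypothetical helper, not part of any AWS SDK.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # hash keys range over [0, 2**128 - 1]


def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Return the 0-based index of the shard that would receive this partition key.

    Assumes num_shards equal, contiguous hash-key ranges; the last shard
    absorbs any remainder when the space does not divide evenly.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_KEY_SPACE // num_shards
    return min(hash_key // range_size, num_shards - 1)


if __name__ == "__main__":
    for key in ["sensor-17", "sensor-42", "user-1001"]:
        print(key, "-> shard", shard_for_key(key, 4))
```

Records with the same partition key always land on the same shard, so per-key ordering is preserved. Choosing high-cardinality keys spreads load across shards.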
• Sensor network analytics
• Ad network analytics 
• Log centralization 
• Click stream analysis 
• Hardware and software appliance metrics 
• …more…
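Click stream analysis often uses the kind of sliding-window computation labelled App.3 in the diagram above. The sketch below is a minimal in-memory version: it counts page views per URL over the last N seconds. It assumes events arrive in timestamp order. It does not show how a real consumer would read records from Kinesis shards. The class name is hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Dict, Tuple


class SlidingWindowCounter:
    """Count events per key (e.g. page views per URL) over the last `window_seconds`.

    Assumes events arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, str]] = deque()
        self._counts: Counter = Counter()

    def add(self, timestamp: float, key: str) -> None:
        """Record one event and drop any events that have left the window."""
        self._events.append((timestamp, key))
        self._counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Keep only events with timestamps in (now - window, now].
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_key = self._events.popleft()
            self._counts[old_key] -= 1
            if self._counts[old_key] == 0:
                del self._counts[old_key]

    def snapshot(self) -> Dict[str, int]:
        """Current per-key counts for the window ending at the latest event."""
        return dict(self._counts)


if __name__ == "__main__":
    w = SlidingWindowCounter(60)
    w.add(0, "/home")
    w.add(10, "/cart")
    w.add(30, "/home")
    w.add(75, "/checkout")  # events at t=0 and t=10 fall out of the 60 s window
    print(w.snapshot())     # → {'/home': 1, '/checkout': 1}
```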
Amazon Mobile Analytics 
Fast: get your data within an hour 
Automatic MAU, DAU, session and 
retention reports 
Design and track custom app events 
Data is not mined or sold by Amazon
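DAU and MAU count distinct users, not events. The sketch below shows minimal versions of both definitions under stated assumptions. It uses a hypothetical `(user_id, date)` event list. It treats MAU as a calendar month, although some tools use a trailing 30-day window instead. It does not show how the Mobile Analytics service computes its reports.

```python
from datetime import date
from typing import Iterable, Tuple

# Hypothetical event shape: (user_id, day the user was active in the app)
Event = Tuple[str, date]


def daily_active_users(events: Iterable[Event], day: date) -> int:
    """Distinct users with at least one event on `day`."""
    return len({user for user, event_day in events if event_day == day})


def monthly_active_users(events: Iterable[Event], year: int, month: int) -> int:
    """Distinct users with at least one event in the given calendar month."""
    return len({
        user
        for user, event_day in events
        if (event_day.year, event_day.month) == (year, month)
    })


if __name__ == "__main__":
    events = [
        ("u1", date(2014, 11, 3)),
        ("u2", date(2014, 11, 3)),
        ("u1", date(2014, 11, 3)),   # repeat activity by the same user counts once
        ("u1", date(2014, 11, 20)),
        ("u3", date(2014, 11, 20)),
        ("u4", date(2014, 10, 31)),
    ]
    print(daily_active_users(events, date(2014, 11, 3)))  # → 2
    print(monthly_active_users(events, 2014, 11))         # → 3
```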
Expand your skills with AWS
• Certification Exams: Validate your proven technical expertise with the AWS platform. aws.amazon.com/certification
• On-Demand Resources, Videos & Labs: Get hands-on practice working with AWS technologies in a live environment. aws.amazon.com/training/self-paced-labs
• Instructor-Led Courses, Training Classes: Expand your technical expertise to design, deploy, and operate scalable, efficient applications on AWS. aws.amazon.com/training
Big Data Tutorials 
aws.amazon.com/big-data 
Redshift Free Trial 
aws.amazon.com/redshift/free-trial