agenda overview (wifi: Guest/Stick@@4999)
08:00 AM Welcome
08:45 AM Introduction to Big Data @ AWS
10:00 AM Break
10:15 AM Data Collection and Storage
11:30 AM Break
11:45 AM Real-time Event Processing
01:00 PM Lunch
01:30 PM HPC in the Cloud
02:45 PM Break
03:00 PM Processing & Analytics
November 10, 2015
Herndon, VA
AWS big data platform
global footprint
Over 1 million active customers
across 190 countries
800+ government agencies
3,000+ educational institutions
11 regions
28 availability zones
52 edge locations
Every day, AWS adds enough new server capacity to support
Amazon.com when it was a $7 billion global enterprise.
Gartner Magic Quadrant for
Cloud Infrastructure as a Service, Worldwide
Gartner “Magic Quadrant for Cloud Infrastructure as a Service, Worldwide,” Lydia Leong, Douglas Toombs, Bob Gill, May 18, 2015. This Magic Quadrant graphic was published by Gartner, Inc. as part of a larger research note
and should be evaluated in the context of the entire report. The Gartner report is available at http://aws.amazon.com/resources/analyst-reports/. Gartner does not endorse any vendor, product or service depicted in its research
publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should
not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
broad & deep core services
rich platform services
thousands of big data customers
big data portfolio
Analyze Store Collect
Amazon Machine Learning
Amazon Kinesis Analytics
AWS Import/Export
AWS Direct Connect
Amazon Kinesis
Amazon Kinesis Firehose
AWS Database Migration
Amazon Glacier
Amazon S3
Amazon CloudSearch
Amazon DynamoDB
Amazon RDS, Aurora
Amazon Elasticsearch
AWS Data Pipeline
Amazon Redshift
Amazon EMR
Amazon EC2
Amazon QuickSight
big data pipeline
Data Answers
Collect Process Analyze
Store
primitive patterns
Data Collection and Storage
Data Processing
Event Processing
Data Analysis
time data
primitive patterns
S3
Kinesis
DynamoDB
RDS (Aurora)
AWS Lambda
KCL Apps
EMR
Redshift
Machine Learning
Amazon
QuickSight
primitive patterns
S3
Kinesis
DynamoDB
RDS (Aurora)
data collection and storage
File: media, log files (sets of records)
Stream: records (e.g., device stats)
Transactional: database reads/writes
Apps Devices Logging Frameworks
AWS services – data collection and storage
S3
Kinesis
DynamoDB
RDS (Aurora)
benefits of streamlined data collection
Increase velocity of data
• Upgrade existing applications to log records rather than files – driven by need for greater agility
• Build new applications that are designed for streaming data from the outset
Example: social media analytics (reference architecture)
S3: $0.030/GB-month
Redshift: starts at $0.25/hour
EC2: starts at $0.02/hour
Glacier: $0.010/GB-month
Kinesis: $0.015/hour per shard (1 MB/s in, 2 MB/s out); $0.028 per million PUTs
500MM tweets/day = ~5,800 tweets/sec
At 2 KB/tweet, that is ~12 MB/sec (~1 TB/day)
Kinesis cost is $0.765/hour (12 shards at $0.015/hour plus ~20.9 million PUTs/hour at $0.028/million)
Redshift cost is $0.850/hour (for a 2TB node)
S3 cost is $1.28/hour (no compression)
Total: $2.895/hour
cost &
scale
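The hourly figures on this slide can be reproduced with a few lines of arithmetic. The sketch below is not from the deck: the function names are hypothetical, the 1,000 records/sec per-shard write limit comes from the Kinesis service limits rather than the slide, and the S3 retention window (30 days of data, 1,024 GB per TB, a 720-hour month) is an assumption chosen because it reproduces the slide's $1.28/hour.

```python
import math

SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600


def kinesis_hourly_cost(records_per_day, bytes_per_record,
                        shard_hour_price=0.015,
                        price_per_million_puts=0.028,
                        shard_bytes_per_sec=1_000_000,
                        shard_records_per_sec=1_000):
    """Return (shards, dollars_per_hour) for a steady ingest rate."""
    records_per_sec = records_per_day / SECONDS_PER_DAY
    ingest_bytes_per_sec = records_per_sec * bytes_per_record
    # A shard is limited by both bytes/sec and records/sec; size for the tighter one.
    shards = max(
        math.ceil(ingest_bytes_per_sec / shard_bytes_per_sec),
        math.ceil(records_per_sec / shard_records_per_sec),
    )
    puts_per_hour = records_per_sec * SECONDS_PER_HOUR
    cost = (shards * shard_hour_price
            + puts_per_hour / 1_000_000 * price_per_million_puts)
    return shards, cost


def s3_hourly_cost(tb_per_day, days_retained=30, price_per_gb_month=0.030,
                   gb_per_tb=1024, hours_per_month=720):
    """Hourly storage cost once `days_retained` days of data have accumulated."""
    stored_gb = tb_per_day * days_retained * gb_per_tb
    return stored_gb * price_per_gb_month / hours_per_month


shards, kinesis = kinesis_hourly_cost(500_000_000, 2_000)
s3 = s3_hourly_cost(1)
redshift = 0.850  # slide's figure for one 2 TB node
total = kinesis + redshift + s3
print(shards, round(kinesis, 3), round(s3, 2), round(total, 2))
```

Using the unrounded rate (about 5,787 tweets/sec instead of the slide's 5,800) puts the Kinesis and total figures within a fraction of a cent of the slide's numbers.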
benefits of streamlined data collection
• Instrument existing applications
• Inject code to log activity – “new big data”
• Example: WAPO Labs Social Reader (now Trove)
Existing Application
DynamoDB table(s)
GET calls & Queries
PUT calls
Query(…
PutItem(…
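A minimal sketch of the "inject code to log activity" idea: one helper that an existing application calls wherever something worth recording happens. The attribute names (`user_id`, `event_time`, `action`) and the in-memory stand-in are hypothetical and not taken from the deck; the only real API shape assumed is boto3's `Table.put_item(Item=...)`, so a table obtained from `boto3.resource("dynamodb").Table(...)` could be passed in place of the stand-in.

```python
import time
import uuid


def log_activity(table, user_id, action, details=None):
    """Write one activity record through a DynamoDB-style put_item call."""
    item = {
        "user_id": user_id,                     # hypothetical hash key
        "event_time": int(time.time() * 1000),  # hypothetical range key (ms)
        "event_id": str(uuid.uuid4()),
        "action": action,
    }
    if details:
        item["details"] = details
    table.put_item(Item=item)
    return item


class InMemoryTable:
    """Stand-in exposing the same put_item(Item=...) shape as a boto3 Table."""

    def __init__(self):
        self.items = []

    def put_item(self, Item):
        self.items.append(Item)


table = InMemoryTable()
log_activity(table, "reader-42", "article_read", {"article": "a-123"})
print(len(table.items), table.items[0]["action"])
```

Keeping the table as a parameter means the instrumentation can be exercised locally before any PUT calls reach a real DynamoDB table.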
benefits of streamlined data collection
Increase data granularity
Customers × Devices × Data Items × Item Size × Frequency
Challenge: compounding scale
Benefit: improved data quality
primitive patterns
AWS Lambda
KCL Apps
event processing – enabling capabilities
S3 Event Notifications
Kinesis stream
DynamoDB Streams
AWS Lambda
KCL Apps
real-time event processing
• Event-driven programming
• Trigger activities based on real-time input
Examples:
  - Proactively detect hardware errors in device logs
  - Identify fraud from activity logs
  - Monitor performance SLAs
  - Notify when inventory drops below a threshold
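The inventory example above could look roughly like this as an AWS Lambda function reading from a Kinesis stream. This is a sketch under assumptions: the record fields (`sku`, `quantity`) and the threshold are hypothetical, and the alert is only printed rather than published anywhere. The event layout (`Records[].kinesis.data`, base64-encoded) is the standard shape Lambda delivers for Kinesis event sources.

```python
import base64
import json

LOW_STOCK_THRESHOLD = 10  # hypothetical threshold


def find_low_stock(event, threshold=LOW_STOCK_THRESHOLD):
    """Decode each Kinesis record and keep those below the threshold."""
    alerts = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        update = json.loads(payload)
        if update["quantity"] < threshold:
            alerts.append(update)
    return alerts


def handler(event, context):
    """Lambda entry point: report every low-stock update in the batch."""
    alerts = find_low_stock(event)
    for alert in alerts:
        print(f"LOW STOCK: {alert['sku']} has {alert['quantity']} left")
    return {"alerts": len(alerts)}


def _record(doc):
    """Build one record the way Lambda delivers Kinesis data (local testing only)."""
    data = base64.b64encode(json.dumps(doc).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


sample_event = {"Records": [_record({"sku": "A-1", "quantity": 3}),
                            _record({"sku": "B-2", "quantity": 50})]}
result = handler(sample_event, None)
```

Separating `find_low_stock` from `handler` keeps the trigger logic testable without deploying anything.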
benefits of event processing
• Build / add real-time events
  - Take action between data collection and analytics
• Alerts and notifications, performance and security
• Automated data enrichment (e.g., aggregations)
• Decouple application modules
  - Streamline development and maintenance
  - Increase agility
• MVP + iterate on discrete components
Collect | Store | Analyze
Alert
primitive patterns
EMR
Redshift
Machine Learning
NASDAQ
Legacy Data Warehouse
• Expensive ($1.16M annually)
• Limited capacity (1 year of data online)
• 4-8 billion rows inserted per trading day, storing:
  - Orders
  - Trades
  - Quotes
  - Market Data
  - Security Master
  - Membership
The DW can be used to analyze market share, client activity, and surveillance, power our billing, and more.
NASDAQ
• 5.5B records are loaded into Amazon Redshift every day
• Security requirements for client-side encryption
• Historical data: HDFS became too expensive
  - S3 + EMR to the rescue
EMR & Redshift
Amazon Redshift has security built-in
• SSL to secure data in transit
• Encryption to secure data at rest
  - AES-256; hardware accelerated
  - All blocks on disks and in Amazon S3 encrypted
  - HSM support
• No direct access to compute nodes
• Audit logging, AWS CloudTrail, AWS KMS integration
• Amazon VPC support
• SOC 1/2/3, PCI-DSS Level 1, FedRAMP, HIPAA
Diagram labels: 10 GigE (HPC); Ingestion; Backup; Restore; Customer VPC; Internal VPC; JDBC/ODBC
Retail and POS Analytics
Process tens of TB in hours vs. 2 weeks
80-90% reduction in costs
big data use cases
Internet of Things
Digital Advertising
Online Gaming
Log Analytics
Customer Value Scoring
Personalization Engine
TempTracker
beehive monitoring in the AWS cloud
temperature sensor board
Raspberry Pi micro server
waterproof housing
Python (boto)
DynamoDB
Kinesis
App
Kinesis
ingestion
dashboard
Lambda
event
source
SNS
TempTracker: IoT sensor ingestion example
DynamoDB schema
hash range attributes
internal temperature
outside temperature
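A sketch of what the sensor-side Python might look like for a schema of this shape. The deck names only the two temperature attributes, so the key names (`hive_id` as hash key, `timestamp` as range key) and the stream name are hypothetical. The call shape `put_record(StreamName=..., Data=..., PartitionKey=...)` matches the boto3 Kinesis client; the deck itself mentions the older boto library, whose call signature differs. The fake client exists only so the example runs without AWS credentials.

```python
import json
import time


def build_reading(hive_id, internal_c, outside_c, timestamp=None):
    """Assemble one sensor reading keyed by hive and time."""
    return {
        "hive_id": hive_id,  # hypothetical hash key
        # hypothetical range key, in epoch seconds
        "timestamp": int(timestamp if timestamp is not None else time.time()),
        "internal_temperature": internal_c,
        "outside_temperature": outside_c,
    }


def send_reading(kinesis_client, stream_name, reading):
    """Put one reading on the stream, partitioned by hive."""
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(reading).encode("utf-8"),
        PartitionKey=reading["hive_id"],
    )


class FakeKinesis:
    """Local stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.records = []

    def put_record(self, StreamName, Data, PartitionKey):
        self.records.append((StreamName, Data, PartitionKey))
        return {"SequenceNumber": str(len(self.records))}


client = FakeKinesis()
reading = build_reading("hive-1", 34.5, 12.0, timestamp=1447149600)
send_reading(client, "temptracker", reading)
print(client.records[0][2], json.loads(client.records[0][1]))
```

Partitioning by hive keeps each hive's readings ordered within a shard, which is convenient for a downstream consumer that tracks per-hive trends.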
big data case study: Kaiten Sushiro
Kaiten Sushiro
• Kaiten Sushi Chain restaurant
• Gathering sensor data into Kinesis
Kaiten Sushiro data flow
2009 2010 2011 2012 2013
Move to AWS
cameras
Switch to DynamoDB
IoT / connected devices
Simple video monitoring & security
Fast growth – “suddenly petabytes”
EC2 (live streaming)
S3 (CVR data)
DynamoDB (meta data)
CloudFront (CDN)
EMR (activity recognition)
applying analytics to connected device data
VPC Subnet
MQTT Broker on EC2 Instance
VPC
Internet Gateway
EMR
Kinesis
DynamoDB
Redshift
Lambda
SNS
S3
Data Pipeline
backend analytics architecture for connected device data
AWS big data ecosystem
S3
Kinesis
EMR
Redshift
Data Pipeline
DynamoDB
Collect Process & Analyze Visualize
AWS Professional Services
Partnering in Your Journey
Technical Specialists: Specialty practices for AWS skills transfer, security, infrastructure architecture, application optimization, analytics, big data, and operational integration
Advisory Services: Portfolio strategy and planning, cost/benefit modeling, governance, change management, and risk management as they relate to implementing the AWS platform
Collaboration: Working together with you and APN Premier Partners you already trust to provide you with access to all resources needed to realize breakthrough results
Proven Process: Best practices and patterns to help your teams get the foundation right, deploy and migrate workloads, and create a modern IT operating model to support your business
criteria for big data competency
Partner types: Technology (ISV), Consulting (SI)
APN Membership: Advanced Partner
AWS Support: Business Level
Customer Success: 4+ big data customer references
AWS certifications: 4 AWS certified staff
Big Data Practice: Public reference to firm's solutions, tools, and guidance on big data
Solution Review:
• Product approved by AWS Architect Review Board
• Available in 3+ AWS regions
• Public support statement
Minimum requirements to have a solution / service approved
big data partner solutions
Solutions vetted by the AWS Partner Competency Program
Data Enablement: Move, synchronize, cleanse, and manage data
Data Analysis & Visualization: Turn data into actionable insight and enhance decision making
Infrastructure Intelligence: Harness data generated from your systems and infrastructure
Advanced Analytics: Anticipate future behaviors and conduct what-if analysis
big data service offers
Service expertise vetted by the AWS Partner Competency Program
AWS marketplace
1-click deployment to launch, in multiple regions around the world
Pay-as-you-go pricing with no long term contracts required
2,000+ product listings to browse, test and buy software
Enterprise software store for business users who need simplified procurement
Advanced Analytics
Database and Data Enablement
Business Intelligence
Amazon QuickSight
A very fast, cloud-powered business intelligence service for 1/10th the cost of traditional BI software
$9 per user per month with 1 year commitment
Business User
Business User
QuickSight API
QuickSight UI
Mobile Devices
Web Browsers
Partner BI Products
Metadata
Data Prep
Connectors
Suggestions
SPICE
Amazon S3
Amazon Kinesis
Amazon DynamoDB
Amazon EMR
Amazon Redshift
Amazon RDS
Files
Third-party
Key Features
• Easy exploration of AWS data
• Fast insights with SPICE
  - Super-fast, Parallel, In-memory, Calculation Engine
• Intuitive visualizations and transitions with AutoGraph
• Native mobile experience
• Secure sharing and collaboration using StoryBoard
Easy Exploration of AWS Data
• Securely discover and connect to AWS data
• Quickly explore AWS data sources
  - Relational databases
  - NoSQL databases
  - Amazon EMR, Amazon S3, files
  - Streaming data sources
• Easily import data from any table or file
• Automatic detection of data types
Amazon EMR
Amazon Kinesis
Amazon DynamoDB
Amazon Redshift
Amazon RDS
Amazon S3
File Upload
Third Party
Fast Insights with SPICE
• Super-fast, Parallel, In-memory, Calculation Engine
• 2x to 4x compression of columnar data
• Compiled queries with machine code generation
• Rich calculations
• SQL-like syntax
• Very fast response time to queries
• Fully managed – No hardware or software to license
Intuitive Visualizations with AutoGraph
• Automatic detection of data types
• Optimal query generation
• Appropriate graph type selection
• Ability to customize the graph type
• Very fast response
Tell a Story with Your Data
• Enable interactive exploration
• Very fast response
• Capture the critical snapshot of analysis
• Build a sequence of analysis
• Share it securely
Native mobile experience
• iOS, Android
• Full experience on tablets
• Consumption experience on smart phones
• Very fast response
Dashboard
Embeddable
Amazon QuickSight Pricing
                            Standard Edition      Enterprise Edition
Subscription                Annual    Monthly     Annual    Monthly
Price per user per month    $9        $12         $18       $24
SPICE Capacity (GB)*        10        10          10        10
Additional SPICE GB-month   $0.25                 $0.38
* Per user SPICE capacity is pooled across all users in an account. As an example, a customer with 100 user subscriptions will get 1,000 GB of SPICE capacity for the account.
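The pooling rule in the footnote can be worked through in code. The sketch below uses only the preview prices listed on this slide; the function names are hypothetical, and reading $0.25 as the Standard rate and $0.38 as the Enterprise rate for additional SPICE capacity is an assumption based on the column layout.

```python
PRICE_PER_USER = {
    ("standard", "annual"): 9,
    ("standard", "monthly"): 12,
    ("enterprise", "annual"): 18,
    ("enterprise", "monthly"): 24,
}
EXTRA_SPICE_PER_GB_MONTH = {"standard": 0.25, "enterprise": 0.38}  # assumed mapping
SPICE_GB_PER_USER = 10


def pooled_spice_gb(users):
    """SPICE capacity included for the whole account, pooled across users."""
    return users * SPICE_GB_PER_USER


def monthly_cost(users, edition="standard", subscription="annual",
                 spice_gb_needed=0):
    """Monthly bill: per-user fees plus any SPICE capacity beyond the pool."""
    extra_gb = max(0, spice_gb_needed - pooled_spice_gb(users))
    return (users * PRICE_PER_USER[(edition, subscription)]
            + extra_gb * EXTRA_SPICE_PER_GB_MONTH[edition])


print(pooled_spice_gb(100))
print(monthly_cost(100, spice_gb_needed=1200))
```

With 100 Standard annual users the pool is 1,000 GB, so a 1,200 GB dataset is billed for the 200 GB overage on top of the per-user fees.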
How Do I Get Started Using
Amazon QuickSight?
Sign-in
First analysis in about 60 seconds
Register for the Preview @ aws.amazon.com/quicksight
Thank you
Questions?

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

AWS Big Data Platform

  • 1. agenda overview (wifi: Guest/Stick@@4999) 08:00 AM Welcome 08:45 AM Introduction to Big Data @ AWS 10:00 AM Break 10:15 AM Data Collection and Storage 11:30 AM Break 11:45 AM Real-time Event Processing 01:00 PM Lunch 01:30 PM HPC in the Cloud 02:45 PM Break 03:00 PM Processing & Analytics
  • 2. November 10, 2015 Herndon, VA AWS big data platform
  • 3. global footprint: Over 1 million active customers across 190 countries; 800+ government agencies; 3,000+ educational institutions; 11 regions; 28 availability zones; 52 edge locations. Every day, AWS adds enough new server capacity to support Amazon.com when it was a $7 billion global enterprise.
  • 4. Gartner Magic Quadrant for Cloud Infrastructure as a Service, Worldwide Gartner “Magic Quadrant for Cloud Infrastructure as a Service, Worldwide,” Lydia Leong, Douglas Toombs, Bob Gill, May 18, 2015. This Magic Quadrant graphic was published by Gartner, Inc. as part of a larger research note and should be evaluated in the context of the entire report. The Gartner report is available at http://aws.amazon.com/resources/analyst-reports/. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
  • 5. broad & deep core services
  • 7. thousands of big data customers
  • 8. big data portfolio: Analyze, Store, Collect. Amazon Machine Learning, Amazon Kinesis Analytics, AWS Import/Export, AWS Direct Connect, Amazon Kinesis, Amazon Kinesis Firehose, AWS Database Migration, Amazon Glacier, Amazon S3, Amazon CloudSearch, Amazon DynamoDB, Amazon RDS, Aurora, Amazon ElasticSearch, AWS Data Pipeline, Amazon Redshift, Amazon EMR, Amazon EC2, Amazon QuickSight
  • 9. big data pipeline Data Answers Collect Process Analyze Store
  • 10. Collect Process Analyze Store primitive patterns Data Collection and Storage Data Processing Event Processing Data Analysis
  • 11. primitive patterns Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis time data
  • 12. primitive patterns S3 Kinesis DynamoDB RDS (Aurora) AWS Lambda KCL Apps EMR Redshift Machine Learning Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis Amazon QuickSight
  • 13. primitive patterns S3 Kinesis DynamoDB RDS (Aurora) Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 14. data collection and storage File: media, log files (sets of records) Stream: records (e.g., device stats) Transactional: database reads/writes Sources: Apps, Devices, Logging, Frameworks
  • 15. AWS services – data collection and storage S3 Kinesis DynamoDB RDS (Aurora)
  • 16. benefits of streamlined data collection Increase velocity of data • Upgrade existing applications to log records rather than files – driven by need for greater agility • Build new applications that are designed for streaming data from the outset Example: social media analytics (reference architecture)
  • 17.
  • 19. cost & scale: 500MM tweets/day = ~5,800 tweets/sec; 2k/tweet is ~12MB/sec (~1TB/day); $0.015/hour per shard, $0.028/million PUTs; Kinesis cost is $0.765/hour; Redshift cost is $0.850/hour (for a 2TB node); S3 cost is $1.28/hour (no compression); Total: $2.895/hour
  • 20. benefits of streamlined data collection • Instrument existing applications • Inject code to log activity – “new big data” • Example: WAPO Labs Social Reader (now Trove) Existing Application DynamoDB table(s) GET calls & Queries PUT calls Query(… PutItem(…
  • 21. benefits of streamlined data collection Increase data granularity Customers Devices Data Items Item Size Frequency Challenge: compounding scale Benefit: improved data quality
  • 22. primitive patterns AWS Lambda KCL Apps Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 23. event processing – enabling capabilities S3 Event Notifications Kinesis stream DynamoDB Streams AWS Lambda KCL Apps
  • 24. real-time event processing • Event-driven programming • Trigger activities based on real-time input Examples: ◦ Proactively detect hardware errors in device logs ◦ Identify fraud from activity logs ◦ Monitor performance SLAs ◦ Notify when inventory drops below a threshold
  • 25. benefits of event processing • Build / add real-time events ◦ Take action between data collection and analytics • Alerts and notifications, performance and security • Automated data enrichment (e.g., aggregations) • De-couple application modules ◦ Streamline development and maintenance ◦ Increase agility • MVP + iterate on discrete components Collect | Store | Analyze Alert
  • 26. Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis primitive patterns EMR Redshift Machine Learning
  • 27. NASDAQ Legacy Data Warehouse • Expensive ($1.16M annually) • Limited capacity (1 year of data online) • 4-8 billion rows inserted per trading day, storing: ◦ Orders ◦ Trades ◦ Quotes ◦ Market Data ◦ Security Master ◦ Membership DW can be used to analyze market share, client activity, surveillance, power our billing, and more…
  • 28. NASDAQ • 5.5B Records are loaded to Amazon Redshift every day • Security Requirements for Client Side Encryption • Historical Data - HDFS became too expensive ◦ S3 + EMR to the Rescue EMR & Redshift
  • 29. Amazon Redshift has security built-in • SSL to secure data in transit • Encryption to secure data at rest ◦ AES-256; hardware accelerated ◦ All blocks on disks and in Amazon S3 encrypted ◦ HSM Support • No direct access to compute nodes • Audit logging, AWS CloudTrail, AWS KMS integration • Amazon VPC support • SOC 1/2/3, PCI-DSS Level 1, FedRAMP, HIPAA (diagram labels: 10 GigE (HPC), Ingestion, Backup, Restore, Customer VPC, Internal VPC, JDBC/ODBC)
  • 30. Retail and POS Analytics: Process 10s of TB in hours vs. 2 weeks; 80-90% reduction in costs
  • 31. big data use cases Internet of Things Digital Advertising Online Gaming Log Analytics Customer Value Scoring Personalization Engine Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 33. temperature sensor board raspberry pi micro server waterproof housing
  • 37. big data case study: Kaiten Sushiro
  • 38. Kaiten Sushiro • Kaiten Sushi Chain restaurant • Gathering sensor data into Kinesis
  • 40. 2009 2010 2011 2012 2013 Move to AWS cameras Switch to DynamoDB IoT / connected devices Simple video monitoring & security Fast growth – “suddenly petabytes”
  • 41. EC2 (live streaming) S3 (CVR data) DynamoDB (meta data) CloudFront (CDN) EMR (activity recognition)
  • 42. applying analytics to connected device data VPC Subnet MQTT Broker on EC2 Instance VPC Internet Gateway EMR Kinesis DynamoDB Redshift Lambda SNS S3 Data Pipeline
  • 43. backend analytics architecture for connected device data
  • 44. AWS big data ecosystem S3 Kinesis EMR Redshift Data Pipeline DynamoDB Collect Process & Analyze Visualize
  • 45. AWS Professional Services Partnering in Your Journey Technical Specialists Specialty practices for AWS skills transfer, security, infrastructure architecture, application optimization, analytics, big data, and operational integration Advisory Services Portfolio strategy and planning, cost/benefit modeling, governance, change management and risk management as it relates to implementing the AWS platform Collaboration Working together with you and APN Premier Partners you already trust to provide you with access to all resources needed to realize breakthrough results Proven Process Best practices and patterns to help your teams get the foundation right, deploy and migrate workloads, and create a modern IT operating model to support your business
  • 46. criteria for big data competency Technology (ISV) Consulting (SI) APN Membership Advanced Partner AWS Support Business Level Customer Success 4+ big data customer references AWS certifications 4 AWS certified staff Big Data Practice Public reference to firm's solutions, tools, and guidance on big data Solution Review • Product approved by AWS Architect Review Board • Available in 3+ AWS regions • Public support statement Minimum requirements to have a solution / service approved
  • 47. big data partner solutions Solutions vetted by the AWS Partner Competency Program Data Enablement Move, synchronize, cleanse, and manage data Data Analysis & Visualization Turn data into actionable insight and enhance decision making Infrastructure Intelligence Harness data generated from your systems and infrastructure Advanced Analytics Anticipate future behaviors and conduct what-if analysis
  • 48. big data service offers Service expertise vetted by the AWS Partner Competency Program
  • 49. AWS marketplace 1-click deployment to launch, in multiple regions around the world Pay-as-you-go pricing with no long term contracts required 2,000+ product listings to browse, test and buy software Enterprise software store for business users who need simplified procurement Advanced Analytics Database and Data Enablement Business Intelligence
  • 51. A very fast, cloud-powered, business intelligence service for 1/10th the cost of traditional BI software
  • 52. $9 per user per month with 1 year commitment
  • 53. Business User, Business User; QuickSight API, QuickSight UI; Mobile Devices, Web Browsers, Partner BI Products; Metadata, Data Prep, Connectors, Suggestions, SPICE; Amazon S3, Amazon Kinesis, Amazon DynamoDB, Amazon EMR, Amazon Redshift, Amazon RDS, Files, Third-party
  • 54. Key Features • Easy exploration of AWS data • Fast insights with SPICE ◦ Super-fast, Parallel, In-memory, Calculation Engine • Intuitive visualizations and transitions with AutoGraph • Native mobile experience • Secure sharing and collaboration using StoryBoard
  • 55. Easy Exploration of AWS Data • Securely discover and connect to AWS data • Quickly explore AWS data sources ◦ Relational databases ◦ NoSQL databases ◦ Amazon EMR, Amazon S3, files ◦ Streaming data sources • Easily import data from any table or file • Automatic detection of data types Sources: Amazon EMR, Amazon Kinesis, Amazon DynamoDB, Amazon Redshift, Amazon RDS, Amazon S3, File Upload, Third Party
  • 56. Fast Insights with SPICE • Super-fast, Parallel, In-memory, Calculation Engine • 2 to 4x compression columnar data • Compiled queries with machine code generation • Rich calculations • SQL-like syntax • Very fast response time to queries • Fully managed – No hardware or software to license
  • 57.
  • 58. Intuitive Visualizations with AutoGraph • Automatic detection of data types • Optimal query generation • Appropriate graph type selection • Ability to customize the graph type • Very fast response
  • 59. Tell a Story with Your Data • Enable interactive exploration • Very fast response • Capture the critical snapshot of analysis • Build a sequence of analysis • Share it securely
  • 60. Native mobile experience • iOS, Android • Full experience on tablets • Consumption experience on smart phones • Very fast response
  • 63. Amazon QuickSight Pricing • Standard Edition: $9 per user per month (annual subscription), $12 (monthly subscription); SPICE capacity 10 GB per user*; additional SPICE $0.25 per GB-month • Enterprise Edition: $18 per user per month (annual subscription), $24 (monthly subscription); SPICE capacity 10 GB per user*; additional SPICE $0.38 per GB-month * Per user SPICE capacity is pooled across all users in an account. As an example, a customer with 100 user subscriptions will get 1,000 GB of SPICE capacity for the account.
  • 64. How Do I Get Started Using Amazon QuickSight?
  • 65. Sign-in First analysis in about 60 seconds Register for the Preview @ aws.amazon.com/quicksight
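
The hourly figures on slide 19 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the prices are the ones quoted on the slide, the tweet rate is the slide's rounded 5,800/sec, and the 1 MB/sec ingest limit per shard is an assumption that does not appear on the slide.

```python
import math

# Figures quoted on slide 19 (prices as shown there, not current pricing).
TWEETS_PER_SEC = 5_800           # ~500MM tweets/day, rounded as on the slide
BYTES_PER_TWEET = 2_000          # "2k/tweet"
SHARD_PRICE_PER_HOUR = 0.015
PUT_PRICE_PER_MILLION = 0.028
REDSHIFT_PER_HOUR = 0.850        # one 2TB node
S3_PER_HOUR = 1.28               # no compression

# Assumption (not on the slide): each shard ingests up to 1 MB/sec.
SHARD_INGEST_BYTES_PER_SEC = 1_000_000


def shards_needed(records_per_sec: int, bytes_per_record: int) -> int:
    """Shards required to absorb the incoming byte rate."""
    return math.ceil(records_per_sec * bytes_per_record / SHARD_INGEST_BYTES_PER_SEC)


def kinesis_cost_per_hour(records_per_sec: int, bytes_per_record: int) -> float:
    """Hourly shard charge plus hourly PUT charge."""
    shard_cost = shards_needed(records_per_sec, bytes_per_record) * SHARD_PRICE_PER_HOUR
    puts_per_hour_millions = records_per_sec * 3600 / 1_000_000
    return shard_cost + puts_per_hour_millions * PUT_PRICE_PER_MILLION


kinesis = kinesis_cost_per_hour(TWEETS_PER_SEC, BYTES_PER_TWEET)
total = kinesis + REDSHIFT_PER_HOUR + S3_PER_HOUR
print(f"shards={shards_needed(TWEETS_PER_SEC, BYTES_PER_TWEET)} "
      f"kinesis=${kinesis:.3f}/hour total=${total:.3f}/hour")
```

Under these assumptions the stream needs 12 shards, and most of the Kinesis charge comes from PUT volume rather than from shard hours.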

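
One of the event-processing examples on slide 24, "notify when inventory drops below a threshold", could be sketched as an AWS Lambda function reading from a Kinesis stream. This is a minimal sketch rather than anything shown in the deck: the record fields `sku` and `quantity`, the threshold value, and the helper `make_event` are all hypothetical. It relies only on the fact that Lambda delivers each Kinesis record payload base64-encoded under `Records[].kinesis.data`.

```python
import base64
import json

# Hypothetical threshold and record shape, chosen only for illustration.
LOW_STOCK_THRESHOLD = 10


def find_low_stock(event, threshold=LOW_STOCK_THRESHOLD):
    """Return the SKUs whose reported quantity is below the threshold."""
    low = []
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(payload)
        if item["quantity"] < threshold:
            low.append(item["sku"])
    return low


def handler(event, context):
    """Lambda entry point. The 'alert' here is just a printed line."""
    low = find_low_stock(event)
    for sku in low:
        print(f"ALERT: inventory low for {sku}")
    return {"low_stock": low}


def make_event(items):
    """Build a Kinesis-shaped event from plain dicts, for local testing."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(i).encode()).decode()}}
            for i in items
        ]
    }


if __name__ == "__main__":
    sample = make_event([{"sku": "A", "quantity": 3}, {"sku": "B", "quantity": 50}])
    print(handler(sample, None))
```

In a real deployment the print call would likely be replaced by a notification service call. It is kept as a print here so the sketch runs without AWS credentials.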
Editor's notes

  1. 500MM tweets/day = 5,800 tweets/second 2.5k/tweet =
  2. 500MM tweets/day = 5,800 tweets/second 2.5k/tweet =
  3. DASHBOARDING Compose multiple visuals from different tables and sources into a dashboard that you can publish and distribute
  4. Imagery on