Managing Big Data in the AWS Cloud 
Siva Raghupathy 
Principal Solutions Architect 
Amazon Web Services
Agenda 
• Big data challenges 
• AWS big data portfolio 
• Architectural considerations 
• Customer success stories 
• Resources to help you get started 
• Q&A
Data Volume, Velocity, & Variety 
• 4.4 zettabytes (ZB) of data exists in the digital universe today 
– 1 ZB = 1 billion terabytes 
• 450 billion transactions per day by 2020 
• More unstructured data than structured data 
(Chart: data volume growing from GB toward ZB, 1990–2020)
Big Data 
• Hourly server logs: how your systems were misbehaving an hour ago 
• Weekly / monthly bill: what you spent this past billing cycle 
• Daily customer-preferences report from your web site's click stream: tells you what deal or ad to try next time 
• Daily fraud reports: tells you if there was fraud yesterday 
Real-time Big Data 
• Real-time metrics: what just went wrong now 
• Real-time spending alerts/caps: guaranteeing you can't overspend 
• Real-time analysis: tells you what to offer the current customer now 
• Real-time detection: blocks fraudulent use now 
Big Data: Best Served Fresh
Data Analysis Gap 
• Generated data is growing much faster than the data available for analysis (1990–2020) 
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011 
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Big Data 
• Potentially massive datasets 
• Iterative, experimental style of data manipulation and analysis 
• Frequently not a steady-state workload; peaks and valleys 
• Time to results is key 
• Hard to configure/manage 
AWS Cloud 
• Massive, virtually unlimited capacity 
• Iterative, experimental style of infrastructure deployment/usage 
• At its most efficient with highly variable workloads 
• Parallel compute clusters from a single data source 
• Managed services
AWS Big Data Portfolio 
• Collect / Ingest: Kinesis, Amazon SQS, Import/Export, Direct Connect 
• Store: S3, DynamoDB, Glacier, RDS 
• Process / Analyze: EMR, EC2, Redshift, Data Pipeline 
• Visualize / Report
Ingest: The act of collecting and storing data
Why Data Ingest Tools? 
• Data ingest tools (e.g., Kafka or Kinesis) convert many random streams of data into a smaller set of sequential streams 
– Sequential streams are easier to process 
– Easier to scale 
– Easier to persist
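The fan-in idea can be sketched in a few lines of Python. This is a toy model (the function name and event shapes are ours, not a Kafka or Kinesis API): many producers' events collapse into a fixed set of sequential streams, keyed by a stable hash, so per-key ordering is preserved.

```python
import hashlib
from collections import defaultdict

def sequentialize(events, num_streams):
    """Toy fan-in: route each (key, payload) event to one of a fixed
    number of sequential streams via a stable hash of its key."""
    streams = defaultdict(list)
    for key, payload in events:
        h = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
        streams[h % num_streams].append((key, payload))
    return streams

# Two producers interleaved; events for "host-a" stay in order in one stream.
events = [("host-a", 1), ("host-b", 7), ("host-a", 2)]
streams = sequentialize(events, 2)
```

Because a key always hashes to the same stream, downstream consumers can process each stream sequentially without re-sorting.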
Data Ingest Tools 
• Facebook Scribe – data collector 
• Apache Kafka – data collector 
• Apache Flume – data movement and transformation 
• Amazon Kinesis – data collector
Real-time processing of streaming data 
High throughput 
Elastic 
Easy to use 
Connectors for EMR, S3, Redshift, DynamoDB 
Amazon 
Kinesis
Amazon Kinesis Architecture
• Durable, highly consistent storage replicates data across three data centers (availability zones) 
• Millions of sources producing 100s of terabytes per hour 
• A front end handles authentication and authorization 
• Ordered stream of events supports multiple readers 
• Consumers: real-time dashboards and alarms; machine-learning algorithms or sliding-window analytics; aggregate analysis in Hadoop or a data warehouse; aggregate and archive to S3 
• Inexpensive: $0.028 per million puts
Kinesis Stream: 
Managed ability to capture and store data 
• Streams are made of Shards 
• Each Shard ingests data up to 1 MB/sec, and up to 1,000 TPS 
• Each Shard emits up to 2 MB/sec 
• All data is stored for 24 hours 
• Scale Kinesis streams by adding or removing Shards 
• Replay data inside the 24-hour window
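Those per-shard limits make capacity planning simple arithmetic. A small helper (the limits come from the bullets above; the function name and signature are ours):

```python
import math

def shards_needed(in_mb_per_sec, in_records_per_sec, out_mb_per_sec):
    """Minimum shard count given per-shard limits of 1 MB/s and
    1,000 records/s on ingest and 2 MB/s on egress."""
    return max(
        math.ceil(in_mb_per_sec / 1.0),
        math.ceil(in_records_per_sec / 1000.0),
        math.ceil(out_mb_per_sec / 2.0),
        1,  # a stream always has at least one shard
    )

# 5 MB/s in, 4,000 records/s, 10 MB/s read by all consumers combined
print(shards_needed(5, 4000, 10))  # 5
```

Whichever dimension is the bottleneck (bytes in, records in, or bytes out) determines the shard count.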
Simple Put interface to store data in Kinesis 
• Producers use a PUT call to store data in a Stream 
• PutRecord {Data, PartitionKey, StreamName} 
• A Partition Key is supplied by the producer and used to distribute PUTs across Shards 
• Kinesis MD5-hashes the supplied partition key to map each record into a Shard's hash key range 
• A unique Sequence # is returned to the Producer upon a successful PUT call
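A sketch of that routing, as our own model of the documented behavior (not AWS code): MD5 the partition key into a 128-bit number, then pick the shard whose slice of the evenly divided hash key space contains it.

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    """Map a partition key onto one of `num_shards` contiguous,
    equal hash key ranges covering the 128-bit MD5 space."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return min(h * num_shards // 2**128, num_shards - 1)

# A given key always hashes to the same shard, so its records stay ordered.
assert shard_for_key("user-42", 4) == shard_for_key("user-42", 4)
```

This is why the choice of partition key matters: too few distinct keys, and records pile onto a handful of shards regardless of how many you provision.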
Building Kinesis Processing Apps: Kinesis Client Library 
Client library for fault-tolerant, at-least-once, continuous processing 
• Java client library, source available on GitHub 
• Build & deploy your app with the KCL on your EC2 instance(s) 
• The KCL is an intermediary between your application & the stream 
– Automatically starts a Kinesis Worker for each shard 
– Simplifies reading by abstracting individual shards 
– Increases / decreases Workers as the # of shards changes 
– Checkpoints to keep track of a Worker's location in the stream; restarts Workers if they fail 
• Integrates with Auto Scaling groups to redistribute workers to new instances
Sending & Reading Data from Kinesis Streams 
• Sending (write): HTTP POST, AWS SDK, LOG4J, Flume, Fluentd 
• Reading (read): Get* APIs, Kinesis Client Library + Connector Library, Apache Storm, Amazon Elastic MapReduce
AWS Partners for Data Load and Transformation 
Hparser, Big Data Edition 
Flume, Sqoop
Storage
Storage 
(positioned along two axes: data structure complexity × query structure complexity) 
• Structured – Simple Query: NoSQL (Amazon DynamoDB); Cache (Amazon ElastiCache: Memcached, Redis) 
• Structured – Complex Query: SQL (Amazon RDS); Data Warehouse (Amazon Redshift); Search (Amazon CloudSearch) 
• Unstructured – No Query: Cloud Storage (Amazon S3, Amazon Glacier) 
• Unstructured – Custom Query: Hadoop/HDFS (Amazon Elastic MapReduce)
Store anything 
Object storage 
Scalable 
Designed for 99.999999999% durability 
Amazon 
S3
Why is Amazon S3 good for Big Data? 
• No limit on the number of Objects 
• Object size up to 5TB 
• Central data storage for all systems 
• High bandwidth 
• 99.999999999% durability 
• Versioning, Lifecycle Policies 
• Glacier Integration
Amazon S3 Best Practices 
• Use random hash prefix for keys 
• Ensure a random access pattern 
• Use Amazon CloudFront for high throughput GETs and PUTs 
• Leverage the high durability, high throughput design of Amazon S3 for 
backup and as a common storage sink 
• Durable sink between data services 
• Supports de-coupling and asynchronous delivery 
• Consider RRS for lower cost, lower durability storage of derivatives or copies 
• Consider parallel threads and multipart upload for faster writes 
• Consider parallel threads and range get for faster reads
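The "random hash prefix" bullet can be made concrete with a small helper (the function name and prefix length are our own choices): deriving a short hex prefix from the key itself spreads otherwise-sequential names across the key space while staying deterministic, so the full key can always be recomputed.

```python
import hashlib

def hashed_key(original_key, prefix_len=4):
    """Prepend a short hex prefix derived from the key itself so that
    sequential names (dates, counters) spread across S3's key space
    instead of concentrating on one index partition."""
    digest = hashlib.md5(original_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{original_key}"

print(hashed_key("logs/2014/09/24/host-01.gz"))
```

Date-first keys like `logs/2014/09/24/...` all share one prefix; hashing first randomizes the access pattern the slide asks for.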
Aggregate All Data in S3, Surrounded by a Collection of the Right Tools 
EMR, Kinesis, Data Pipeline, Redshift, DynamoDB, RDS, Cassandra, Storm, and Spark Streaming all read from and write to Amazon S3
Fully-managed NoSQL database service 
Built on solid-state drives (SSDs) 
Consistent low latency performance 
Any throughput rate 
No storage limits 
Amazon 
DynamoDB
DynamoDB Concepts 
table 
items 
attributes 
schema-less 
schema is defined per attribute
DynamoDB: Access and Query Model 
• Two primary key options 
• Hash key: Key lookups: “Give me the status for user abc” 
• Composite key (Hash with Range): “Give me all the status updates for user ‘abc’ 
that occurred within the past 24 hours” 
• Support for multiple data types 
– String, number, binary… or sets of strings, numbers, or binaries 
• Supports both strong and eventual consistency 
– Choose your consistency level when you make the API call 
– Different parts of your app can make different choices 
• Global Secondary Indexes
DynamoDB: High Availability and Durability
What does DynamoDB handle for me? 
• Scaling without down-time 
• Automatic sharding 
• Security inspections, patches, upgrades 
• Automatic hardware failover 
• Multi-AZ replication 
• Hardware configuration designed specifically for DynamoDB 
• Performance tuning 
…and a lot more
Amazon DynamoDB Best Practices 
• Keep item size small 
• Store metadata in Amazon DynamoDB and blobs in Amazon S3 
• Use a table with a hash key for extremely high scale 
• Use hash-range key to model 
– 1:N relationships 
– Multi-tenancy 
• Avoid hot keys and hot partitions 
• Use table per day, week, month etc. for storing time series data 
• Use conditional updates
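The "table per day" bullet for time-series data boils down to a naming convention; a minimal sketch (the `events` base name and format are our own assumptions):

```python
from datetime import date, timedelta

def table_for(day, base="events"):
    """One table per day: writes always hit the current (hot) table,
    and expiring old data is a cheap table drop (or archive to S3),
    not a row-by-row delete."""
    return f"{base}_{day:%Y_%m_%d}"

today = date(2014, 9, 24)
print(table_for(today))                      # events_2014_09_24
print(table_for(today - timedelta(days=1)))  # events_2014_09_23
```

This also lets you provision high write throughput only on the current table and dial older tables down.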
Relational Databases 
Fully managed; zero admin 
MySQL, PostgreSQL, Oracle & SQL Server 
Amazon 
RDS
Process and Analyze
Processing Frameworks 
• Batch Processing 
– Take large amount (>100TB) of cold data and ask questions 
– Takes hours to get answers back 
• Stream Processing (real-time) 
– Take small amount of hot data and ask questions 
– Takes short amount of time to get your answer back
Processing Frameworks 
• Batch Processing 
– Amazon EMR (Hadoop) 
– Amazon Redshift 
• Stream Processing 
– Spark Streaming 
– Storm
Columnar data warehouse 
ANSI SQL compatible 
Massively parallel 
Petabyte scale 
Fully-managed 
Very cost-effective 
Amazon 
Redshift
Amazon Redshift architecture 
• Leader Node 
– SQL endpoint 
– Stores metadata 
– Coordinates query execution 
• Compute Nodes 
– Local, columnar storage 
– Execute queries in parallel 
– Load, backup, restore via 
Amazon S3 
– Parallel load from Amazon DynamoDB 
• Hardware optimized for data processing 
• Two hardware platforms 
– DW1: HDD; scale from 2TB to 1.6PB 
– DW2: SSD; scale from 160GB to 256TB 
(Diagram: clients connect to the leader node via JDBC/ODBC; compute nodes interconnect over 10 GigE (HPC); ingestion, backup, and restore flow through Amazon S3)
Amazon Redshift Best Practices 
• Use COPY command to load large data sets from Amazon S3, Amazon 
DynamoDB, Amazon EMR/EC2/Unix/Linux hosts 
– Split your data into multiple files 
– Use GZIP or LZOP compression 
– Use manifest file 
• Choose proper sort key 
– Range or equality on WHERE clause 
• Choose proper distribution key 
– Join column, foreign key or largest dimension, group by column
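The first three COPY bullets (split the data, compress it, use a manifest) describe a staging step that can be sketched locally. This is our own illustration, with local files standing in for S3 objects and hypothetical names throughout; the manifest JSON shape follows the `{"entries": [{"url": ..., "mandatory": ...}]}` layout COPY expects.

```python
import gzip
import json
import os
import tempfile

def stage_for_copy(rows, num_parts, out_dir):
    """Write rows round-robin into gzipped, pipe-delimited part files
    plus a manifest listing them, so COPY can load the parts in
    parallel (ideally a multiple of the cluster's slice count)."""
    entries = []
    for p in range(num_parts):
        path = os.path.join(out_dir, f"part-{p:04d}.gz")
        with gzip.open(path, "wt") as f:
            for row in rows[p::num_parts]:
                f.write("|".join(str(v) for v in row) + "\n")
        entries.append({"url": path, "mandatory": True})
    manifest = os.path.join(out_dir, "load.manifest")
    with open(manifest, "w") as f:
        json.dump({"entries": entries}, f)
    return manifest

rows = [(i, f"item-{i}") for i in range(100)]
manifest_path = stage_for_copy(rows, 4, tempfile.mkdtemp())
```

With the parts and manifest uploaded to S3, a single COPY referencing the manifest loads all files in parallel rather than serially from one large file.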
Hadoop/HDFS clusters 
Hive, Pig, Impala, HBase 
Easy to use; fully managed 
On-demand and spot pricing 
Tight integration with S3, 
DynamoDB, and Kinesis 
Amazon 
Elastic 
MapReduce
How Does EMR Work? 
1. Put the data into S3 
2. Choose: Hadoop distribution, # of nodes, types of nodes, Hadoop apps like Hive/Pig/HBase 
3. Launch the cluster using the EMR console, CLI, SDK, or APIs 
4. Get the output from S3
How Does EMR Work? 
• You can easily resize the cluster, and launch parallel clusters against the same data in S3
How Does EMR Work? 
• Use Spot nodes to save time and money
The Hadoop Ecosystem works inside of EMR
Amazon EMR Best Practices 
• Balance transient vs persistent clusters 
to get the best TCO 
• Leverage Amazon S3 integration 
– Consistent View for EMRFS 
• Use Compression (LZO is a good pick) 
• Avoid small files (< 100MB; s3distcp can help!) 
• Size cluster to suit each job 
• Use EC2 Spot Instances
Amazon EMR Nodes and Size 
• Tuning cluster size can be more efficient than tuning Hadoop code 
• Use m1 and c1 family for functional testing 
• Use m3 and c3 xlarge and larger nodes for production workloads 
• Use cc2/c3 for memory and CPU intensive jobs 
• hs1, hi1, i2 instances for HDFS workloads 
• Prefer a smaller cluster of larger nodes
Partners – Analytics (Scientific, algorithmic, predictive, etc)
Visualize
Partners - BI & Data Visualization
Putting All The AWS Data Tools Together & 
Architectural Considerations
One tool to rule them all?
Data Characteristics: Hot, Warm, Cold 

              Hot        Warm      Cold 
Volume        MB–GB      GB–TB     PB 
Item size     B–KB       KB–MB     KB–TB 
Latency       ms         ms, sec   min, hrs 
Durability    Low–High   High      Very High 
Request rate  Very High  High      Low 
Cost/GB       $$–$       $–¢¢      ¢
Service            Average latency     Data volume          Item size          Request rate              Cost ($/GB/month)  Durability 
ElastiCache        ms                  GB                   B–KB               Very High                 $$                 Low–Moderate 
Amazon DynamoDB    ms                  GB–TBs (no limit)    B–KB (64 KB max)   Very High                 ¢¢                 Very High 
Amazon RDS         ms–sec              GB–TB (3 TB max)     KB (~row size)     High                      ¢¢                 High 
CloudSearch        ms–sec              GB–TB                KB (1 MB max)      High                      $                  High 
Amazon Redshift    sec–min             TB–PB (1.6 PB max)   KB (64 K max)      Low                       ¢                  High 
Amazon EMR (Hive)  sec–min, hrs        GB–PB (~nodes)       KB–MB              Low                       ¢                  High 
Amazon S3          ms–sec–min (~size)  GB–PB (no limit)     KB–GB (5 TB max)   Low–Very High (no limit)  ¢                  Very High 
Amazon Glacier     hrs                 GB–PB (no limit)     GB (40 TB max)     Very Low (no limit)       ¢                  Very High
Cost Conscious Design 
Example: Should I use Amazon S3 or Amazon DynamoDB? 
"I'm currently scoping out a project that will greatly increase my team's use of Amazon S3. Hoping you could answer some questions. The current iteration of the design calls for many small files, perhaps up to a billion during peak. The total size would be on the order of 1.5 TB per month…" 

Request rate (writes/sec): 300 
Object size (bytes): 2,048 
Total size (GB/month): 1,483 
Objects per month: 777,600,000
DynamoDB or S3? 
Request rate (writes/sec): 300; Object size (bytes): 2,048; Total size (GB/month): 1,483; Objects per month: 777,600,000
            Request rate   Object size  Total size   Objects       Use 
            (writes/sec)   (bytes)      (GB/month)   per month 
Scenario 1  300            2,048        1,483        777,600,000   Amazon DynamoDB 
Scenario 2  300            32,768       23,730       777,600,000   Amazon S3
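The scenario numbers are straightforward arithmetic, reproduced here to show where the table's figures come from (a 30-day month is assumed). At small object sizes the request count, and therefore per-request pricing, dominates, which is why DynamoDB wins the 2 KB case while S3 wins once objects get larger.

```python
# Reproduce the slide's numbers: 300 writes/sec of 2,048-byte objects.
writes_per_sec = 300
object_bytes = 2048
secs_per_month = 60 * 60 * 24 * 30  # 30-day month

objects_per_month = writes_per_sec * secs_per_month
gb_per_month = objects_per_month * object_bytes / 1024**3

print(f"{objects_per_month:,}")  # 777,600,000 -- matches the table
print(round(gb_per_month))       # 1483
```

Running the same math with 32,768-byte objects gives the ~23,730 GB/month of Scenario 2.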
Lambda Architecture
Putting it all together 
De-coupled architecture 
• Multi-tier data processing architecture 
• Ingest & Store de-coupled from Processing 
• Ingest tools write to multiple data stores 
• Processing frameworks (Hadoop, Spark, etc.) read from data stores 
• Consumers can decide which data store to read from depending on 
their data processing requirement
Data Temperature: Hot → Cold 
• Stream (hot): Kinesis/Kafka feeding Spark Streaming / Storm 
• Interactive (warm): Impala, Spark, Redshift over NoSQL / DynamoDB and Hadoop HDFS 
• Batch (cold): EMR/Hadoop, Spark, Redshift over S3 
• Answer latency grows from low (hot) to high (cold)
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
Customer Use Cases
Automatic spelling corrections Autocomplete Search Recommendations
Automatic spelling corrections: a look at how it works 
Data analyzed using EMR: months of user history and common misspellings (e.g., Weste, Winstin, Westa, Whenstin)
Yelp web site log data goes into Amazon S3: months of user search data (search terms, misspellings, final click-throughs)
Amazon Elastic MapReduce spins up a 200-node Hadoop cluster
All 200 nodes of the cluster simultaneously look for common misspellings (e.g., Westen, Wistin, Westan)
A map of common misspellings and suggested corrections is loaded back into Amazon S3.
Then the cluster is shut down; Yelp only pays for the time they used it
Each of Yelp's 80 Engineers Can Do This Whenever They Have a Big Data Problem 
Yelp spins up over 250 Hadoop clusters per week in EMR.
Data Innovation Meets Action at Scale 
at NASDAQ OMX 
• NASDAQ’s technology powers more than 70 marketplaces in 50 countries 
• NASDAQ’s global platform can handle more than 1 million messages/second at 
a median speed of sub-55 microseconds 
• NASDAQ owns & operates 26 markets including 3 clearinghouses & 5 central securities depositories
• More than 5,500 structured products are tied to NASDAQ’s global indexes with 
the notional value of at least $1 trillion 
• NASDAQ powers 1 in 10 of the world’s securities transactions
NASDAQ’s Big Data Challenge 
• Archiving Market Data 
– A classic “Big Data” problem 
• Power Surveillance and Business Intelligence/Analytics 
• Minimize Cost 
– Not only infrastructure, but development/IT labor costs too 
• Empower the business for self-service
SIP Total Monthly Message Volumes (OPRA, UQDF and CQS) 
Market Data Is Big Data 
Charts courtesy of the Financial Information Forum 
Financial Information Forum; redistribution without permission from FIF prohibited; email: fifinfo@fif.com 
(Chart: NASDAQ Exchange daily peak messages) 

UQDF/CQS total monthly message volume and combined average daily volume: 
Date    UQDF           CQS             Avg Daily Volume 
Aug-12  2,317,804,321  8,241,554,280   459,102,548 
Sep-12  1,948,330,199  7,452,279,225   494,768,917 
Oct-12  1,016,336,632  7,452,279,225   403,267,422 
Nov-12  2,148,867,295  9,552,313,807   557,199,100 
Dec-12  2,017,355,401  8,052,399,165   503,487,728 
Jan-13  2,099,233,536  7,474,101,082   455,873,077 
Feb-13  1,969,123,978  7,531,093,813   500,011,463 
Mar-13  2,010,832,630  7,896,498,260   495,366,545 
Apr-13  2,447,109,450  9,805,224,566   556,924,273 
May-13  2,400,946,680  9,430,865,048   537,809,624 
Jun-13  2,601,863,331  11,062,086,463  683,197,490 
Jul-13  2,142,134,920  8,266,215,553   473,106,840 
Aug-13  2,188,338,764  9,079,813,726   512,188,750 

OPRA total monthly message volume and average daily volume: 
Date    OPRA             Avg Daily Volume 
Aug-12  80,600,107,361   3,504,352,494 
Sep-12  77,303,404,427   4,068,600,233 
Oct-12  98,407,788,187   4,686,085,152 
Nov-12  104,739,265,089  4,987,584,052 
Dec-12  81,363,853,339   4,068,192,667 
Jan-13  82,227,243,377   3,915,583,018 
Feb-13  87,207,025,489   4,589,843,447 
Mar-13  93,573,969,245   4,678,698,462 
Apr-13  123,865,614,055  5,630,255,184 
May-13  134,587,099,561  6,117,595,435 
Jun-13  162,771,803,250  8,138,590,163 
Jul-13  120,920,111,089  5,496,368,686 
Aug-13  136,237,441,349  6,192,610,970 

OPRA annual increase: 69%; CQS annual increase: 10%; UQDF annual decrease: 6%
NASDAQ’s Legacy Solution 
• On-premises MPP DB 
– Relatively expensive, finite storage 
– Required periodic additional expenses to add more storage 
– Ongoing IT (administrative) human costs 
• Legacy BI tool 
– Requires developer involvement for new data sources, reports, 
dashboards, etc.
New Solution: Amazon Redshift 
• Cost Effective 
– Redshift is 43% of the cost of legacy 
• Assuming equal storage capacities 
– Doesn’t include IT ongoing costs! 
• Performance 
– Outperforms NASDAQ’s legacy BI/DB solution 
– Insert 550K rows/second on a 2 node 8XL cluster 
• Elastic 
– NASDAQ can add additional capacity on demand, easy to grow their cluster
New Solution: Pentaho BI/ETL 
• Amazon Redshift partner 
– http://aws.amazon.com/redshift/partners/pentaho/
• Self Service 
– Tools empower BI users to integrate 
new data sources, create their own 
analytics, dashboards, and reports 
without requiring development 
involvement 
• Cost effective
Net Result 
• New solution is cheaper, faster, and offers capabilities that NASDAQ 
didn’t have before 
– Empowers NASDAQ’s business users to explore data like they never 
could before 
– Reduces IT and development as bottlenecks 
– Margin improvement (expense reduction and supports business 
decisions to grow revenue)
NEXT STEPS
AWS is here to help 
Solution 
Architects 
Professional 
Services 
Premium 
Support 
AWS Partner 
Network (APN)
aws.amazon.com/partners/competencies/big-data 
Partner with an AWS Big Data expert
Big Data Case Studies 
Learn from other AWS customers 
aws.amazon.com/solutions/case-studies/big-data
AWS Marketplace 
AWS Online Software Store 
aws.amazon.com/marketplace 
Shop the big data category
AWS Public Data Sets 
Free access to big data sets 
aws.amazon.com/publicdatasets
AWS Grants Program 
AWS in Education 
aws.amazon.com/grants
AWS Big Data Test Drives 
APN Partner-provided labs 
aws.amazon.com/testdrive/bigdata
AWS Training & Events 
Webinars, Bootcamps, 
and Self-Paced Labs 
aws.amazon.com/events 
https://aws.amazon.com/training
Big Data on AWS 
Course on Big Data 
aws.amazon.com/training/course-descriptions/bigdata
reinvent.awsevents.com
aws.amazon.com/big-data
sivar@amazon.com 
Thank You!

 
The Path to Business Agility for Vodafone: How Amazon made us "boring" - Sess...
The Path to Business Agility for Vodafone: How Amazon made us "boring" - Sess...The Path to Business Agility for Vodafone: How Amazon made us "boring" - Sess...
The Path to Business Agility for Vodafone: How Amazon made us "boring" - Sess...
 
Leveraging the Cloud to Strengthen Democracy: A Case Study - AWS Washington D...
Leveraging the Cloud to Strengthen Democracy: A Case Study - AWS Washington D...Leveraging the Cloud to Strengthen Democracy: A Case Study - AWS Washington D...
Leveraging the Cloud to Strengthen Democracy: A Case Study - AWS Washington D...
 
AWS Summit Stockholm 2014 – B5 – The TCO of cloud applications
AWS Summit Stockholm 2014 – B5 – The TCO of cloud applicationsAWS Summit Stockholm 2014 – B5 – The TCO of cloud applications
AWS Summit Stockholm 2014 – B5 – The TCO of cloud applications
 
(EDU201) How Technology is Transforming Education | AWS re:Invent 2014
(EDU201) How Technology is Transforming Education | AWS re:Invent 2014(EDU201) How Technology is Transforming Education | AWS re:Invent 2014
(EDU201) How Technology is Transforming Education | AWS re:Invent 2014
 
RightScale Webinar: Decoding AWS Reserved Instances (RIs) What It Means for C...
RightScale Webinar: Decoding AWS Reserved Instances (RIs) What It Means for C...RightScale Webinar: Decoding AWS Reserved Instances (RIs) What It Means for C...
RightScale Webinar: Decoding AWS Reserved Instances (RIs) What It Means for C...
 
AWSome Data Protection with Veeam
AWSome Data Protection with VeeamAWSome Data Protection with Veeam
AWSome Data Protection with Veeam
 
AWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis Webinar
 
AWS Summit Stockholm 2014 – T2 – Understanding AWS security
AWS Summit Stockholm 2014 – T2 – Understanding AWS securityAWS Summit Stockholm 2014 – T2 – Understanding AWS security
AWS Summit Stockholm 2014 – T2 – Understanding AWS security
 
Day 2 - Amazon RDS - Letting AWS run your Low Admin, High Performance Database
Day 2 - Amazon RDS - Letting AWS run your Low Admin, High Performance DatabaseDay 2 - Amazon RDS - Letting AWS run your Low Admin, High Performance Database
Day 2 - Amazon RDS - Letting AWS run your Low Admin, High Performance Database
 
Migrating Enterprise Applications to AWS
Migrating Enterprise Applications to AWSMigrating Enterprise Applications to AWS
Migrating Enterprise Applications to AWS
 
AWS Webinar - Measuring Your Application Performance and Health
AWS Webinar - Measuring Your Application Performance and HealthAWS Webinar - Measuring Your Application Performance and Health
AWS Webinar - Measuring Your Application Performance and Health
 
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
 
Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)
 
Security in the Cloud - AWS Symposium 2014 - Washington D.C.
Security in the Cloud - AWS Symposium 2014 - Washington D.C. Security in the Cloud - AWS Symposium 2014 - Washington D.C.
Security in the Cloud - AWS Symposium 2014 - Washington D.C.
 
AWS Webcast - Best Practices in Architecting for the Cloud
AWS Webcast - Best Practices in Architecting for the CloudAWS Webcast - Best Practices in Architecting for the Cloud
AWS Webcast - Best Practices in Architecting for the Cloud
 
AWS Webcast - AWS Cloud Solution for State and Local Law Enforcement Agencies
AWS Webcast -  AWS Cloud Solution for State and Local Law Enforcement Agencies AWS Webcast -  AWS Cloud Solution for State and Local Law Enforcement Agencies
AWS Webcast - AWS Cloud Solution for State and Local Law Enforcement Agencies
 
(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS
 
(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014
(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014
(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014
 

Ähnlich wie AWS Webcast - Managing Big Data in the AWS Cloud_20140924

Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
Introduction to Database Services
Introduction to Database ServicesIntroduction to Database Services
Introduction to Database ServicesAmazon Web Services
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSAmazon Web Services
 
Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rasmus Ekman
 
Intro to database_services_fg_aws_summit_2014
Intro to database_services_fg_aws_summit_2014Intro to database_services_fg_aws_summit_2014
Intro to database_services_fg_aws_summit_2014Amazon Web Services LATAM
 
Real-time Analytics with Open-Source
Real-time Analytics with Open-SourceReal-time Analytics with Open-Source
Real-time Analytics with Open-SourceAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel AvivBig Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel AvivAmazon Web Services
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...Amazon Web Services
 

Ähnlich wie AWS Webcast - Managing Big Data in the AWS Cloud_20140924 (20)

Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Deep Dive in Big Data
Deep Dive in Big DataDeep Dive in Big Data
Deep Dive in Big Data
 
Introduction to Database Services
Introduction to Database ServicesIntroduction to Database Services
Introduction to Database Services
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWS
 
Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)
 
Intro to database_services_fg_aws_summit_2014
Intro to database_services_fg_aws_summit_2014Intro to database_services_fg_aws_summit_2014
Intro to database_services_fg_aws_summit_2014
 
Real-time Analytics with Open-Source
Real-time Analytics with Open-SourceReal-time Analytics with Open-Source
Real-time Analytics with Open-Source
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel AvivBig Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 

Mehr von Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Kürzlich hochgeladen

IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 

Kürzlich hochgeladen (20)

IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...

AWS Webcast - Managing Big Data in the AWS Cloud_20140924

  • 7. AWS Big Data Portfolio Collect / Ingest Kinesis Store Process / Analyze Visualize / Report EMR EC2 Redshift Data Pipeline S3 DynamoDB Glacier RDS Import Export Direct Connect Amazon SQS
  • 8. Ingest: The act of collecting and storing data
  • 9. Why Data Ingest Tools? • Data ingest tools convert random streams of data into a smaller set of sequential streams – Sequential streams are easier to process – Easier to scale – Easier to persist (diagram: many producer streams funneled through Kafka or Kinesis into processing apps)
  • 10. Data Ingest Tools • Facebook Scribe → Data collector • Apache Kafka → Data collector • Apache Flume → Data movement and transformation • Amazon Kinesis → Data collector
  • 11. Real-time processing of streaming data High throughput Elastic Easy to use Connectors for EMR, S3, Redshift, DynamoDB Amazon Kinesis
  • 12. Amazon Kinesis Architecture AZ AZ AZ Durable, highly consistent storage replicates data across three data centers (availability zones) Amazon Web Services Aggregate and archive to S3 Millions of sources producing 100s of terabytes per hour Front End Authentication Authorization Ordered stream of events supports multiple readers Real-time dashboards and alarms Machine learning algorithms or sliding window analytics Aggregate analysis in Hadoop or a data warehouse Inexpensive: $0.028 per million puts
  • 13. Kinesis Stream: Managed ability to capture and store data • Streams are made of Shards • Each Shard ingests up to 1 MB/sec and up to 1,000 TPS • Each Shard emits up to 2 MB/sec • All data is stored for 24 hours • Scale Kinesis streams by adding or removing Shards • Replay data inside the 24-hour window
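Those per-shard limits imply a simple capacity rule for choosing a shard count. A minimal sketch; the helper below is illustrative, not part of the Kinesis API:

```python
import math

# Per-shard limits as stated on the slide:
# ingest: 1 MB/sec and 1,000 records/sec; egress: 2 MB/sec.
INGEST_MB_PER_SHARD = 1.0
INGEST_RECORDS_PER_SHARD = 1000
EGRESS_MB_PER_SHARD = 2.0

def shards_needed(mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the minimum shard count satisfying all three limits."""
    return max(
        math.ceil(mb_per_sec / INGEST_MB_PER_SHARD),
        math.ceil(records_per_sec / INGEST_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_sec / EGRESS_MB_PER_SHARD),
    )

# Example: 5 MB/sec in, 4,000 records/sec, consumers reading 10 MB/sec total.
print(shards_needed(5, 4000, 10))  # -> 5
```

Whichever limit is tightest (here, both the 5 MB/sec ingest and the 10 MB/sec egress demand 5 shards) sets the stream size; resharding adjusts it later.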
  • 14. Simple Put interface to store data in Kinesis • Producers use a PUT call to store data in a Stream • PutRecord {Data, PartitionKey, StreamName} • A Partition Key is supplied by producer and used to distribute the PUTs across Shards • Kinesis MD5 hashes supplied partition key over the hash key range of a Shard • A unique Sequence # is returned to the Producer upon a successful PUT call
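The MD5-over-hash-key-range routing described above can be illustrated with a simplified re-implementation. Real shards carry explicit StartingHashKey/EndingHashKey values; this sketch assumes evenly split ranges and is for illustration only:

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    """Mimic Kinesis routing: MD5 the partition key to a 128-bit value,
    then locate that value in the shards' (assumed evenly split) hash
    key ranges. Simplified model, not the service implementation."""
    key_space = 2 ** 128
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = key_space // num_shards
    return min(h // range_size, num_shards - 1)

# Records with the same partition key always land on the same shard,
# which is what preserves per-key ordering:
assert shard_for_key("user-42", 4) == shard_for_key("user-42", 4)
```

This also shows why a low-cardinality partition key is dangerous: all records hash to a handful of shards and the rest of the stream's capacity sits idle.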
  • 15. Building Kinesis Processing Apps: Kinesis Client Library Client library for fault-tolerant, at-least-once, continuous processing o Java client library, source available on GitHub o Build & deploy your app with the KCL on your EC2 instance(s) o KCL is the intermediary between your application & the stream – Automatically starts a Kinesis Worker for each shard – Simplifies reading by abstracting individual shards – Increases/decreases Workers as the # of shards changes – Checkpoints to keep track of a Worker's location in the stream; restarts Workers if they fail o Integrates with Auto Scaling groups to redistribute workers to new instances
  • 16. Sending & Reading Data from Kinesis Streams Sending Reading HTTP Post AWS SDK LOG4J Flume Fluentd Get* APIs Kinesis Client Library + Connector Library Apache Storm Amazon Elastic MapReduce Write Read
  • 17. AWS Partners for Data Load and Transformation Hparser, Big Data Edition Flume, Sqoop
  • 19. Storage (axes: data structure complexity × query structure complexity) • Structured – Simple Query: NoSQL (Amazon DynamoDB), Cache (Amazon ElastiCache: Memcached, Redis) • Structured – Complex Query: SQL (Amazon RDS), Data Warehouse (Amazon Redshift), Search (Amazon CloudSearch) • Unstructured – No Query: Cloud Storage (Amazon S3, Amazon Glacier) • Unstructured – Custom Query: Hadoop/HDFS (Amazon Elastic MapReduce)
  • 20. Store anything Object storage Scalable Designed for 99.999999999% durability Amazon S3
  • 21. Why is Amazon S3 good for Big Data? • No limit on the number of Objects • Object size up to 5TB • Central data storage for all systems • High bandwidth • 99.999999999% durability • Versioning, Lifecycle Policies • Glacier Integration
  • 22. Amazon S3 Best Practices • Use random hash prefix for keys • Ensure a random access pattern • Use Amazon CloudFront for high throughput GETs and PUTs • Leverage the high durability, high throughput design of Amazon S3 for backup and as a common storage sink • Durable sink between data services • Supports de-coupling and asynchronous delivery • Consider RRS for lower cost, lower durability storage of derivatives or copies • Consider parallel threads and multipart upload for faster writes • Consider parallel threads and range get for faster reads
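Per the first bullet, and the S3 key-naming guidance current at the time of this talk, a short deterministic hash prefix spreads sequential key names (such as timestamps) across S3's index partitions. A minimal sketch; the helper name is ours:

```python
import hashlib

def prefixed_key(natural_key, prefix_len=4):
    """Prepend a short hash so lexicographically adjacent keys
    (e.g. date-based paths) land on different S3 index partitions.
    The prefix is deterministic, so readers can recompute the full
    key from the natural key alone."""
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    return "%s/%s" % (digest[:prefix_len], natural_key)

print(prefixed_key("logs/2014-09-24/host1/00001.gz"))
```

A random (rather than hashed) prefix would spread load equally well, but then the key could not be reconstructed for point reads without a lookup table.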
  • 23. Aggregate All Data in S3 Surrounded by a collection of the right tools: EMR, Kinesis, Data Pipeline, Redshift, DynamoDB, RDS, Cassandra, Storm, Spark Streaming
  • 24. Fully-managed NoSQL database service Built on solid-state drives (SSDs) Consistent low latency performance Any throughput rate No storage limits Amazon DynamoDB
  • 25. DynamoDB Concepts table items attributes schema-less schema is defined per attribute
  • 26. DynamoDB: Access and Query Model • Two primary key options • Hash key: Key lookups: “Give me the status for user abc” • Composite key (Hash with Range): “Give me all the status updates for user ‘abc’ that occurred within the past 24 hours” • Support for multiple data types – String, number, binary… or sets of strings, numbers, or binaries • Supports both strong and eventual consistency – Choose your consistency level when you make the API call – Different parts of your app can make different choices • Global Secondary Indexes
  • 27. DynamoDB: High Availability and Durability
  • 28. What does DynamoDB handle for me? • Scaling without down-time • Automatic sharding • Security inspections, patches, upgrades • Automatic hardware failover • Multi-AZ replication • Hardware configuration designed specifically for DynamoDB • Performance tuning …and a lot more
  • 29. Amazon DynamoDB Best Practices • Keep item size small • Store metadata in Amazon DynamoDB and blobs in Amazon S3 • Use a table with a hash key for extremely high scale • Use hash-range key to model – 1:N relationships – Multi-tenancy • Avoid hot keys and hot partitions • Use table per day, week, month etc. for storing time series data • Use conditional updates
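The time-series guidance above (hash-range keys plus a table per period) can be sketched as a small routing helper. The table-naming scheme and attribute names here are hypothetical, invented for illustration, not an AWS API:

```python
import datetime

def table_for(ts, granularity="month"):
    """Route a time-series item to a per-period table (e.g.
    'events_2014_09'), so old periods can be archived, dropped, or
    throughput-downscaled wholesale instead of row by row."""
    if granularity == "day":
        return ts.strftime("events_%Y_%m_%d")
    return ts.strftime("events_%Y_%m")

def item_key(device_id, ts):
    """Hash key = device, range key = ISO timestamp: supports the
    'all updates for X within the past 24 hours' query pattern."""
    return {"device_id": device_id, "ts": ts.isoformat()}

now = datetime.datetime(2014, 9, 24, 12, 0, 0)
print(table_for(now))  # -> events_2014_09
print(item_key("sensor-7", now))
```

Spreading hot write traffic across devices (the hash key) rather than across time also avoids the hot-partition problem the slide warns about.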
  • 30. Relational Databases Fully managed; zero admin MySQL, PostgreSQL, Oracle & SQL Server Amazon RDS
  • 32. Processing Frameworks • Batch Processing – Take large amount (>100TB) of cold data and ask questions – Takes hours to get answers back • Stream Processing (real-time) – Take small amount of hot data and ask questions – Takes short amount of time to get your answer back
  • 33. Processing Frameworks • Batch Processing – Amazon EMR (Hadoop) – Amazon Redshift • Stream Processing – Spark Streaming – Storm
  • 34. Columnar data warehouse ANSI SQL compatible Massively parallel Petabyte scale Fully-managed Very cost-effective Amazon Redshift
  • 35. Amazon Redshift architecture • Leader Node – SQL endpoint – Stores metadata – Coordinates query execution • Compute Nodes – Local, columnar storage – Execute queries in parallel – Load, backup, restore via Amazon S3 – Parallel load from Amazon DynamoDB • Hardware optimized for data processing • Two hardware platforms – DW1: HDD; scale from 2TB to 1.6PB – DW2: SSD; scale from 160GB to 256TB 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC
  • 36. Amazon Redshift Best Practices • Use COPY command to load large data sets from Amazon S3, Amazon DynamoDB, Amazon EMR/EC2/Unix/Linux hosts – Split your data into multiple files – Use GZIP or LZOP compression – Use manifest file • Choose proper sort key – Range or equality on WHERE clause • Choose proper distribution key – Join column, foreign key or largest dimension, group by column
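The manifest file the COPY command consumes is plain JSON listing each (ideally gzipped) slice of one logical load; splitting into multiple files lets every Redshift slice ingest in parallel. A minimal generator, with bucket and key names invented for illustration:

```python
import json

def copy_manifest(bucket, keys, mandatory=True):
    """Build a Redshift COPY manifest: one entry per input file,
    each flagged mandatory so a missing slice fails the load loudly."""
    return {
        "entries": [
            {"url": "s3://%s/%s" % (bucket, k), "mandatory": mandatory}
            for k in keys
        ]
    }

m = copy_manifest("my-bucket", ["load/part-000.gz", "load/part-001.gz"])
print(json.dumps(m, indent=2))
# The COPY statement would then reference the uploaded manifest, e.g.:
# COPY sales FROM 's3://my-bucket/load/manifest' ... GZIP MANIFEST;
```

The example SQL is a sketch of the usual shape, not a complete statement; credentials and column options are omitted.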
  • 37. Hadoop/HDFS clusters Hive, Pig, Impala, HBase Easy to use; fully managed On-demand and spot pricing Tight integration with S3, DynamoDB, and Kinesis Amazon Elastic MapReduce
  • 38. How Does EMR Work? 1. Put the data into S3 2. Choose: Hadoop distribution, # of nodes, types of nodes, Hadoop apps like Hive/Pig/HBase 3. Launch the cluster using the EMR console, CLI, SDK, or APIs 4. Get the output from S3
  • 39. How Does EMR Work? You can easily resize the cluster, and launch parallel clusters using the same data in S3
  • 40. How Does EMR Work? Use Spot nodes to save time and money
  • 41. The Hadoop Ecosystem works inside of EMR
  • 42. Amazon EMR Best Practices • Balance transient vs persistent clusters to get the best TCO • Leverage Amazon S3 integration – Consistent View for EMRFS • Use Compression (LZO is a good pick) • Avoid small files (< 100MB; s3distcp can help!) • Size cluster to suit each job • Use EC2 Spot Instances
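The small-files rule of thumb above can be made concrete: below roughly 100 MB, per-file Hadoop task overhead dominates, so files should be combined before processing (s3distcp's --targetSize option does this during the copy itself). The greedy planner below only illustrates the batching logic, it is not part of any AWS tool:

```python
def plan_merges(file_sizes, target_bytes=100 * 1024 * 1024):
    """Greedily group (name, size) pairs into batches of at least
    ~target_bytes, so each merged output clears the 100 MB threshold."""
    batches, current, current_size = [], [], 0
    for name, size in file_sizes:
        current.append(name)
        current_size += size
        if current_size >= target_bytes:
            batches.append(current)
            current, current_size = [], 0
    if current:
        batches.append(current)  # leftover batch may be under target
    return batches

files = [("f%d" % i, 30 * 1024 * 1024) for i in range(10)]  # ten 30 MB files
print([len(b) for b in plan_merges(files)])  # -> [4, 4, 2]
```

Ten 30 MB inputs collapse to three outputs, cutting mapper count (and task startup overhead) by more than two thirds.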
  • 43. Amazon EMR Nodes and Size • Tuning cluster size can be more efficient than tuning Hadoop code • Use m1 and c1 family for functional testing • Use m3 and c3 xlarge and larger nodes for production workloads • Use cc2/c3 for memory and CPU intensive jobs • hs1, hi1, i2 instances for HDFS workloads • Prefer a smaller cluster of larger nodes
  • 44. Partners – Analytics (Scientific, algorithmic, predictive, etc)
  • 46. Partners - BI & Data Visualization
  • 47. Putting All The AWS Data Tools Together & Architectural Considerations
  • 48. One tool to rule them all
  • 49. Data Characteristics: Hot, Warm, Cold
                      Hot         Warm       Cold
      Volume          MB–GB       GB–TB      PB
      Item size       B–KB        KB–MB      KB–TB
      Latency         ms          ms, sec    min, hrs
      Durability      Low–High    High       Very High
      Request rate    Very High   High       Low
      Cost/GB         $$–$        $–¢¢       ¢
  • 50.
                          Avg latency           Data volume          Item size          Request rate               Cost ($/GB/month)   Durability
      ElastiCache         ms                    GB                   B–KB               Very High                  $$                  Low–Moderate
      Amazon DynamoDB     ms                    GB–TB (no limit)     B–KB (64 KB max)   Very High                  ¢¢                  Very High
      Amazon RDS          ms, sec               GB–TB (3 TB max)     KB (~row size)     High                       ¢¢                  High
      Amazon CloudSearch  ms, sec               GB–TB                KB (1 MB max)      High                       $                   High
      Amazon Redshift     sec, min              TB–PB (1.6 PB max)   KB (64 K max)      Low                        ¢                   High
      Amazon EMR (Hive)   sec, min, hrs         GB–PB (~nodes)       KB–MB              Low                        ¢                   High
      Amazon S3           ms, sec, min (~size)  GB–PB (no limit)     KB–GB (5 TB max)   Low–Very High (no limit)   ¢                   Very High
      Amazon Glacier      hrs                   GB–PB (no limit)     GB (40 TB max)     Very Low (no limit)        ¢                   Very High
  • 51. Cost Conscious Design Example: Should I use Amazon S3 or Amazon DynamoDB? “I’m currently scoping out a project that will greatly increase my team’s use of Amazon S3. Hoping you could answer some questions. The current iteration of the design calls for many small files, perhaps up to a billion during peak. The total size would be on the order of 1.5 TB per month…” Request rate (Writes/sec) Object size (Bytes) Total size (GB/month) Objects per month 300 2048 1483 777,600,000
  • 52. Request rate (Writes/sec) Object size (Bytes) Total size (GB/month) Objects per month DynamoDB or S3? 300 2,048 1,483 777,600,000
  • 53. Amazon DynamoDB Request rate (Writes/sec) Object size (Bytes) Total size (GB/month) Objects per month Scenario 1 300 2,048 1,483 777,600,000 Scenario 2 300 32,768 23,730 777,600,000 Amazon S3 use use
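The scenario numbers above follow from simple arithmetic: the request count is identical in both cases, and only the object size (hence total volume and per-request overhead) changes, which is what tips the choice toward DynamoDB for many tiny objects and S3 once objects grow:

```python
# Reproduce the slides' sizing arithmetic: 300 writes/sec over a
# 30-day month, for 2 KB objects (scenario 1) and 32 KB objects
# (scenario 2). Pricing is omitted; the point is count vs. volume.
writes_per_sec = 300
seconds_per_month = 86400 * 30
objects_per_month = writes_per_sec * seconds_per_month

def monthly_gb(object_bytes):
    return objects_per_month * object_bytes / 1024.0 ** 3

print(objects_per_month)         # -> 777600000
print(round(monthly_gb(2048)))   # -> 1483
print(round(monthly_gb(32768)))  # -> 23730
```

At 2 KB, nearly 778 million monthly PUT requests dwarf 1.5 TB of storage, so per-request pricing dominates; at 32 KB the same request count carries ~24 TB, and per-GB storage pricing takes over.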
  • 55. Putting it all together De-coupled architecture • Multi-tier data processing architecture • Ingest & Store de-coupled from Processing • Ingest tools write to multiple data stores • Processing frameworks (Hadoop, Spark, etc.) read from data stores • Consumers can decide which data store to read from depending on their data processing requirement
  • 56. (Diagram: tools arranged along a hot-to-cold data temperature axis, with latency rising as data cools. Hot end: Kinesis/Kafka ingest, Spark Streaming/Storm, NoSQL/DynamoDB. Middle: Spark, Impala, Redshift, EMR/Hadoop over HDFS. Cold end: EMR/Hadoop and Redshift over S3.)
  • 59. Automatic spelling corrections Autocomplete Search Recommendations
  • 60. A look at how it works Data Analyzed Using EMR: Months of user history Common misspellings Weste Winstin Westa Whenstin Automatic spelling corrections
  • 61. Yelp web site log data goes into Amazon S3 Months of user search data Search terms Misspellings Final click throughs Amazon S3
  • 62. Amazon Elastic MapReduce spins up a 200 node Hadoop cluster Hadoop Cluster Amazon S3 Amazon EMR
  • 63. All 200 nodes of the cluster simultaneously look for common misspellings Hadoop Cluster Amazon S3 Amazon EMR Westen Wistin Westan
  • 64. A map of common misspellings and suggested corrections are loaded back into Amazon S3. Hadoop Cluster Amazon S3 Amazon EMR Westen Wistin Westan
  • 65. Then the cluster is shut down; Yelp only pays for the time they used it Hadoop Cluster Amazon S3 Amazon EMR
  • 66. Each of Yelp's 80 Engineers Can Do This Whenever They Have a Big Data Problem: Yelp spins up over 250 Hadoop clusters per week in EMR. Amazon S3 Amazon EMR
  • 68. Data Innovation Meets Action at Scale at NASDAQ OMX • NASDAQ's technology powers more than 70 marketplaces in 50 countries • NASDAQ's global platform can handle more than 1 million messages/second at a median speed of sub-55 microseconds • NASDAQ owns & operates 26 markets including 3 clearinghouses & 5 central securities depositories • More than 5,500 structured products are tied to NASDAQ's global indexes with a notional value of at least $1 trillion • NASDAQ powers 1 in 10 of the world's securities transactions
  • 69. NASDAQ’s Big Data Challenge • Archiving Market Data – A classic “Big Data” problem • Power Surveillance and Business Intelligence/Analytics • Minimize Cost – Not only infrastructure, but development/IT labor costs too • Empower the business for self-service
  • 70. Market Data Is Big Data: SIP Total Monthly Message Volumes (OPRA, UQDF and CQS). Charts courtesy of the Financial Information Forum; redistribution without permission from FIF prohibited, email: fifinfo@fif.com. (Chart: NASDAQ Exchange Daily Peak Messages.)
      Total monthly message volume (UQDF, CQS) and combined average daily volume:
      Date     UQDF            CQS             Combined Avg Daily
      Aug-12   2,317,804,321   8,241,554,280   459,102,548
      Sep-12   1,948,330,199   7,452,279,225   494,768,917
      Oct-12   1,016,336,632   7,452,279,225   403,267,422
      Nov-12   2,148,867,295   9,552,313,807   557,199,100
      Dec-12   2,017,355,401   8,052,399,165   503,487,728
      Jan-13   2,099,233,536   7,474,101,082   455,873,077
      Feb-13   1,969,123,978   7,531,093,813   500,011,463
      Mar-13   2,010,832,630   7,896,498,260   495,366,545
      Apr-13   2,447,109,450   9,805,224,566   556,924,273
      May-13   2,400,946,680   9,430,865,048   537,809,624
      Jun-13   2,601,863,331   11,062,086,463  683,197,490
      Jul-13   2,142,134,920   8,266,215,553   473,106,840
      Aug-13   2,188,338,764   9,079,813,726   512,188,750
      Total monthly message volume (OPRA) and average daily volume:
      Date     OPRA              Avg Daily
      Aug-12   80,600,107,361    3,504,352,494
      Sep-12   77,303,404,427    4,068,600,233
      Oct-12   98,407,788,187    4,686,085,152
      Nov-12   104,739,265,089   4,987,584,052
      Dec-12   81,363,853,339    4,068,192,667
      Jan-13   82,227,243,377    3,915,583,018
      Feb-13   87,207,025,489    4,589,843,447
      Mar-13   93,573,969,245    4,678,698,462
      Apr-13   123,865,614,055   5,630,255,184
      May-13   134,587,099,561   6,117,595,435
      Jun-13   162,771,803,250   8,138,590,163
      Jul-13   120,920,111,089   5,496,368,686
      Aug-13   136,237,441,349   6,192,610,970
      Annual change: OPRA increase 69%, CQS increase 10%, UQDF decrease 6%.
  • 71. NASDAQ’s Legacy Solution • On-premises MPP DB – Relatively expensive, finite storage – Required periodic additional expenses to add more storage – Ongoing IT (administrative) human costs • Legacy BI tool – Requires developer involvement for new data sources, reports, dashboards, etc.
  • 72. New Solution: Amazon Redshift • Cost Effective – Redshift is 43% of the cost of legacy • Assuming equal storage capacities – Doesn’t include IT ongoing costs! • Performance – Outperforms NASDAQ’s legacy BI/DB solution – Insert 550K rows/second on a 2 node 8XL cluster • Elastic – NASDAQ can add additional capacity on demand, easy to grow their cluster
  • 73. New Solution: Pentaho BI/ETL • Amazon Redshift partner – http://aws.amazon.com/redshift/partners/pentaho/ • Self Service – Tools empower BI users to integrate new data sources, create their own analytics, dashboards, and reports without requiring development involvement • Cost effective
  • 74. Net Result • New solution is cheaper, faster, and offers capabilities that NASDAQ didn’t have before – Empowers NASDAQ’s business users to explore data like they never could before – Reduces IT and development as bottlenecks – Margin improvement (expense reduction and supports business decisions to grow revenue)
  • 76. AWS is here to help Solution Architects Professional Services Premium Support AWS Partner Network (APN)
  • 78. Big Data Case Studies Learn from other AWS customers aws.amazon.com/solutions/case-studies/big-data
  • 79. AWS Marketplace AWS Online Software Store aws.amazon.com/marketplace Shop the big data category
  • 80. AWS Public Data Sets Free access to big data sets aws.amazon.com/publicdatasets
  • 81. AWS Grants Program AWS in Education aws.amazon.com/grants
  • 82. AWS Big Data Test Drives APN Partner-provided labs aws.amazon.com/testdrive/bigdata
  • 83. AWS Training & Events Webinars, Bootcamps, and Self-Paced Labs aws.amazon.com/events https://aws.amazon.com/training
  • 84. Big Data on AWS Course on Big Data aws.amazon.com/training/course-descriptions/bigdata

Editor's notes

  1. Organized the deck so that the partner slide in each section closes that section.
  2. 2 x 2 Matrix Structured Level of query (from none to complex) Draw down the slide
  3. Transition Statement – RDBMS is still a viable and important component in Big Data Architecture Traditional SQL Database Fully managed which means zero admin Most popular flavors Binary compatible
  4. Generally come in two major types Batch Streaming
  5. Examples
  6. Needs a transition statement – Looking at AWS Portfolio in context of Processing …. Columnar data warehouse Massively parallel (MPP) Petabyte scale Fully managed $1,000/TB/Year (with Heavy RI)
  7. Leader node Compute Node Hardware optimized Two different hardware platforms (SSD and HDD) Parallel Load API (of course)
  8. Copy Split files into 1 to 2 GB compressed Use manifest file Sort keys Distribution keys System has option to make educated guess
  9. Regular Hadoop/HDFS Support for popular add-ons Fully managed and easy to use On demand and SPOT pricing Integrated with other AWS services S3 DDB Kinesis Bootstrap capabilities have most flexibility at the layer above core Hadoop/HDFS
  10. Popular pattern 1-Customer puts data into S3 2-Make some decisions about what to run (type, number and other technologies to install) 3-Use CLI, SDK, Console or API to launch 4-Output is sent to S3 Call out S3 integration as an important innovation and addition
  11. Time to resize is going to be a combination of EC2/AMI boot time + the bootstrap options.
  12. Call out that the nodes that are added to a running cluster that are SPOT must be task nodes (details) Additional nodes to a running cluster that are SPOT S3DistCp to load/unload from HDFS Shutdown the cluster (stop being charged except
  13. Core Hadoop is: Map Reduce – Computational Model HDFS – Hadoop Distributed File System Additional Tools have entered the eco system Tools to help get data into Hadoop Tools to connect to Relational Systems Monitoring Machine Learning This slide is a small slice
  14. EMRFS all of your files will be processed as intended when you run a chained series of MapReduce jobs. This is not a replacement file system. Instead, it extends the existing file system with mechanisms that are designed to detect and react to inconsistencies. The detection and recovery process includes a retry mechanism. After it has reached a configurable limit on the number of retries (to allow S3 to return what EMRFS expects in the consistent view), it will either (your choice) raise an exception or log the issue and continue. The EMRFS consistent view creates and uses metadata in an Amazon DynamodB table to maintain a consistent view of your S3 objects. This table tracks certain operations but does not hold any of your data. The information in the table is used to confirm that the results returned from an S3 LIST operation are as expected, thereby allowing EMRFS to check list consistency and read-after-write consistency. Compression Always Compress Data Files On Amazon S3 Reduces Bandwidth Between Amazon S3 and Amazon EMR Speeds Up Your Job Compress Mappers and Reducer Output Advise Compressing all files for an instance for a day
  15. Do not use smaller nodes for production workloads unless you're 100% sure you know what you're doing. The majority of jobs I've seen require more CPU and memory than the smaller instances offer, which often causes job failures if the cluster is not fine-tuned. Instead of spending time fine-tuning small nodes, get a larger node and run your workload with peace of mind. Anything m1.xlarge and larger is a good candidate: m1.xlarge, c1.xlarge, m2.4xlarge, and all cluster compute instances are good choices.
  16. To summarize the review of the AWS Big Data Portfolio There’s no single tool that can do every job needed
  17. Emphasize that this is an “aid” for the design process used to compare options. In my role as an SA it helps to have a heuristic tool to think about the requirements Is the data HOT, Warm or cold As a designer – by asking various questions can slot the data into one of these buckets Less of a rule and more of a guideline
  18. This material in customer’s own words http://www.youtube.com/watch?v=j7uZGgSxJGM&t=3m0s
  19. Access to AWS expertise Solution Architects and security experts Architecture reviews and best practices Case studies and sample architectures http://aws.amazon.com/solutions/case-studies/ http://aws.amazon.com/architecture/ Training and Premium Support Four support tiers, including Enterprise: http://aws.amazon.com/premiumsupport/ Trusted Advisor service for cost optimization AWS Professional Services Domain and product experts
  20. APN Competency Program (to Customers)   What: The APN Competency Program is designed to provide AWS Customers with top quality APN Partners who have demonstrated technical proficiency and proven success in specialized solutions areas. Partners who’ve attained an APN Competency offer a variety of services, software, and solutions on the AWS Cloud. -          Big Data Specific: o   Who: Big Data Competency Partners help customers evaluate and use the tools, techniques, and technologies of working with data productively, at any scale. Learn More about Big Data Competency Partners: aws.amazon.com/partners/competencies/big-data/ -          Announcement – Big Data Competency for APN Technology Partners o   Technology Partners for Launch: Microstrategy, SAP, Informatica, SnapLogic, Tableau Software, Attunity, MapR, Sumo Logic, Splunk o   Find out more information on these partners on the Big Data Competency Partners page(aws.amazon.com/partners/competencies/big-data) -          Advise customers to take advantage of these partners software and solutions in Big Data on AWS – they’ve been qualified by Partner Teams, Service Teams, have Marketplace Software, Built Test Drives, etc.   APN Competency Program (to Partners)   What:  The APN Competency program is designed to highlight APN Partners who have demonstrated technical proficiency and proven customer success in specialized solution areas. Attaining an APN Competency allows partners to differentiate themselves to customers by showcasing expertise in a specific solution area. 
-          Banner Attached – Learn More about APN Competencies -          Learn More about the Program: APN Competency Program (aws.amazon.com/partners/competencies)   -          Learn More about APN Competency Partners: o   SAP (aws.amazon.com/partners/competencies/sap) o   Oracle (aws.amazon.com/partners/competencies/oracle) o   Big Data (aws.amazon.com/partners/competencies/big-data) o   MSP (aws.amazon.com/partners/competencies/msp) o   Microsoft (aws.amazon.com/partners/competencies/Microsoft) -          Announcement – Big Data Competency for APN Technology Partners o   Technology Partners for Launch: Microstrategy, SAP, Informatica, SnapLogic, Tableau Software, Attunity, MapR, Sumo Logic, Splunk o   Find out more information on these partners on the Big Data Competency Partners page(aws.amazon.com/partners/competencies/big-data)
  21. Life technologies LinkedIn DropCam ICRAR CDC Channel4 Yelp Nokia
  22. AWS Marketplace is the AWS Online Software Store Customer can find, research, buy software including a wide variety of big data options and software to help you manage your databases With AWS Marketplace, the simple hourly pricing of most products aligns with EC2 usage model You can find, purchase and 1-Click launch in minutes, making deployment easy Marketplace billing integrated into your AWS account 1300+ product listings across 25 categories Description: Attunity CloudBeam for Amazon Redshift (Express) enables organizations to simplify, automate, and accelerate bulk data loading from database sources (Oracle, Microsoft SQL Server, and MySQL) to Amazon Redshift. Attunity CloudBeam allows your team to avoid the heavy lifting of manually extracting data, transferring via API/script, chopping, staging, and importing.
  23. We will provide researchers and professors of accredited schools and universities with free access to AWS to accelerate science and discovery. With AWS in Education, educators, academic researchers, and students can apply to obtain free usage credits to tap into the on-demand infrastructure of the Amazon Web Services cloud to teach advanced courses, tackle research endeavors, and explore new projects – tasks that previously would have required expensive up-front and ongoing investments in infrastructure.
  24. Microstrategy Splunk QlikView EMR Pig MongoDB Oracle BI, OBIEE 11g SAP Hana Yellowfin BI
  25. Speaker Notes: We have just released “Big Data on AWS”, a new technical training course for individuals who are responsible for implementing big data environments, namely Data Scientists, Data Analysts, and Enterprise Big Data Solution Architects. This course is designed to teach technical end users how to use Amazon EMR to process data using the broad ecosystem of Hadoop tools like Pig and Hive. We also cover how to create big data environments, work with Amazon DynamoDB and Amazon Redshift, understand the benefits of Amazon Kinesis, and leverage best practices to design big data environments for security and cost-effectiveness. Upcoming classes include: Audience Individuals responsible for implementing big data environments: Data Scientists, Data Analysts, and Enterprise Big Data Solution Architects Objectives Understand the architecture of an Amazon EMR cluster Choose appropriate AWS data storage options for use with Amazon EMR Know your options for ingesting, transferring, and compressing data for use with Amazon EMR Use common programming frameworks for Amazon EMR including Hive, Pig, and Streaming Work with Amazon Redshift and Spark/Shark to implement big data solutions Leverage big data visualization software Choose appropriate security and cost management options for Amazon EMR Understand the benefits of using Amazon Kinesis for big data Prerequisites Basic familiarity with big data technologies, including Apache Hadoop and HDFS Knowledge of big data technologies such as Pig, Hive, and MapReduce helpful, but not required Working knowledge of core AWS services and public cloud implementation AWS Essentials course completion or equivalent experience Basic understanding of data warehousing, relational database systems, and database design Format Instructor-Led & Hands-on Labs Duration 3 days Details aws.amazon.com/training/course-descriptions/bigdata/ Big Data on AWS Big Data on AWS introduces you to cloud-based big data solutions and Amazon Elastic 
MapReduce (EMR), the AWS big data platform. In this course, we show you how to use Amazon EMR to process data using the broad ecosystem of Hadoop tools like Pig and Hive. We also teach you how to create big data environments, work with Amazon DynamoDB and Amazon Redshift, understand the benefits of Amazon Kinesis, and leverage best practices to design big data environments for security and cost-effectiveness. Intended Audience This course is intended for: Partners and customers responsible for implementing big data environments, including: Data Scientists Data Analysts Enterprise, Big Data Solution Architects Prerequisites We recommend that attendees of this course have: Basic familiarity with big data technologies, including Apache Hadoop and HDFS. Knowledge of big data technologies such as Pig, Hive, and MapReduce is helpful but not required Working knowledge of core AWS services and public cloud implementation. Students should complete the AWS Essentials course or have equivalent experience: http://aws.amazon.com/training/course-descriptions/essentials/ Basic understanding of data warehousing, relational database systems, and database design Delivery Method Instructor-Led Training (ILT) Hands-on Labs on AWS Hands-On Activity This course allows you to test new skills and apply knowledge to your working environment through a variety of practical exercises. 
Duration 3 days Course Outline Day 1 Overview of Big Data and Apache Hadoop Benefits of Amazon EMR Amazon EMR Architecture Using Amazon EMR Launching and Using an Amazon EMR Cluster High-Level Apache Hadoop Programming Frameworks Using Hive for Advertising Analytics Day 2 Other Apache Hadoop Programming Frameworks Using Streaming for Life Sciences Analytics Overview: Spark and Shark for In-Memory Analytics Using Spark and Shark for In-Memory Analytics Managing Amazon EMR Costs Overview of Amazon EMR Security Exploring Amazon EMR Security Data Ingestion, Transfer, and Compression Day 3 Using Amazon Kinesis for Real-Time Big Data Processing AWS Data Storage Options Using DynamoDB with Amazon EMR Overview: Amazon Redshift and Big Data Using Amazon Redshift for Big Data Visualizing and Orchestrating Big Data Using Tableau Desktop or Jaspersoft BI to Visualize Big Data By the end of this course, you will be able to: Understand Apache Hadoop in the context of Amazon EMR Understand the architecture of an Amazon EMR cluster Launch an Amazon EMR cluster using an appropriate Amazon Machine Image and Amazon EC2 instance types Choose appropriate AWS data storage options for use with Amazon EMR Know your options for ingesting, transferring, and compressing data for use with Amazon EMR Use common programming frameworks available for Amazon EMR including Hive, Pig, and Streaming Work with Amazon Redshift to implement a big data solution Leverage big data visualization software Choose appropriate security options for Amazon EMR and your data Perform in-memory data analysis with Spark and Shark on Amazon EMR Choose appropriate options to manage your Amazon EMR environment cost-effectively Understand the benefits of using Amazon Kinesis for big data
  26. Sign Up Big Data & HPC track with over 20 sessions Link to reinvent 20 + sessions on big data and high performance computing
  27. Again mention survey.