Data Processing and Analytics
Agenda Overview
10:00 AM Registration
10:30 AM Introduction to Big Data @ AWS
12:00 PM Lunch + Registration for Technical Sessions
12:30 PM Data Collection and Storage
1:45 PM Real-time Event Processing
3:00 PM Analytics (incl. Machine Learning)
4:30 PM Open Q&A Roundtable
(Diagram: Collect → Process → Analyze, with Store underneath; stages labeled Data Collection and Storage, Data Processing, Event Processing, Data Analysis)
Primitive Patterns: EMR, Redshift, Machine Learning
Amazon Elastic MapReduce (EMR)
Why Amazon EMR?
• Easy to use: launch a cluster in minutes
• Low cost: pay an hourly rate
• Elastic: easily add or remove capacity
• Reliable: spend less time monitoring
• Secure: manage firewalls
• Flexible: control the cluster
The Hadoop ecosystem can run in Amazon EMR
Try different configurations to find your optimal architecture
Choose your instance types
• CPU: c3 family, cc1.4xlarge, cc2.8xlarge
• Memory: m2 family, r3 family
• Disk/IO: d2 family, i2 family
• General: m1 family, m3 family
Example workloads: batch processing, machine learning, Spark and interactive, large HDFS
Easy to add/remove compute capacity to your cluster
• Resizable clusters: match compute demands with cluster sizing
Easy to use Spot Instances
• Spot Instances for task nodes: up to 90% off Amazon EC2 on-demand pricing
• On-demand for core nodes: standard Amazon EC2 pricing for on-demand capacity
Meet SLA at predictable cost; exceed SLA at lower cost
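To make the pricing trade-off concrete, here is a minimal sketch of the hourly cost of a cluster that mixes on-demand core nodes with Spot task nodes. The node counts, the hourly rate, and the discount in the example are hypothetical placeholders, not AWS prices; only the "up to 90% off" ceiling comes from the slide.

```python
def cluster_hourly_cost(core_nodes, task_nodes, on_demand_rate, spot_discount):
    """Hourly cost of on-demand core nodes plus Spot task nodes.

    on_demand_rate: price per node-hour (hypothetical value in the example).
    spot_discount:  fraction off the on-demand rate, between 0.0 and 0.9
                    (the slide quotes "up to 90%").
    """
    if not 0.0 <= spot_discount <= 0.9:
        raise ValueError("spot_discount must be between 0.0 and 0.9")
    spot_rate = on_demand_rate * (1.0 - spot_discount)
    return core_nodes * on_demand_rate + task_nodes * spot_rate


# Hypothetical example: 4 core nodes, 8 task nodes, $0.50 per node-hour.
all_on_demand = cluster_hourly_cost(4, 8, 0.50, 0.0)
with_spot = cluster_hourly_cost(4, 8, 0.50, 0.9)
print(all_on_demand, with_spot)
```

Real Spot prices vary over time, so the discount is best treated as an input to explore rather than a fixed number.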
Amazon S3 as your persistent data store
• Separate compute and storage
• Resize and shut down Amazon EMR clusters with no data loss
• Point multiple Amazon EMR clusters at the same data in Amazon S3
(Diagram: two EMR clusters sharing one Amazon S3 store)
EMRFS makes it easier to leverage S3
• Better performance and error handling options
• Transparent to applications – Use “s3://”
• Consistent view
  - Consistent list and read-after-write for new puts
• Support for Amazon S3 server-side and client-side encryption
• Faster listing using EMRFS metadata
Fast listing of S3 objects using EMRFS metadata
(Diagram: Amazon S3; EMRFS metadata in Amazon DynamoDB)
• List and read-after-write consistency
• Faster list operations
Number of objects | Without Consistent Views | With Consistent Views
------------------+--------------------------+----------------------
1,000,000         | 147.72                   | 29.70
100,000           | 12.70                    | 3.69
*Tested using a single-node cluster with an m3.xlarge instance.
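The improvement in the listing figures is easier to read as a ratio. The sketch below only divides the published numbers; the slide does not state the unit of the timings, so none is assumed here.

```python
# Listing times from the slide (units are not stated there).
listing_times = {
    1_000_000: (147.72, 29.70),  # (without, with) consistent views
    100_000: (12.70, 3.69),
}

def speedup(without, with_cv):
    """Ratio of listing time without consistent views to time with them."""
    return without / with_cv

speedups = {n: speedup(*times) for n, times in listing_times.items()}
for n, ratio in speedups.items():
    print(f"{n:>9,} objects: {ratio:.2f}x faster")
```

On these figures, listing is roughly 5x faster for one million objects and roughly 3.4x faster for one hundred thousand.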
EMRFS - S3 client-side encryption
(Diagram: EMRFS enabled for Amazon S3 client-side encryption; Amazon S3 encryption clients; key vendor (AWS KMS or your custom key vendor); Amazon S3 holding client-side encrypted objects)
Optimize to leverage HDFS
• Iterative workloads
  - If you’re processing the same dataset more than once
• Disk I/O intensive workloads
Persist data on Amazon S3 and use S3DistCp to copy it to HDFS for processing
Pattern #1: Batch processing
• GBs of logs pushed to Amazon S3 hourly
• Daily Amazon EMR cluster using Hive to process data
• Input and output stored in Amazon S3
• Load subset into Redshift DW
Pattern #2: Online data-store
• Data pushed to Amazon S3
• Daily Amazon EMR cluster extracts, transforms, and loads (ETL) data into the database
• 24/7 Amazon EMR cluster running HBase holds the last 2 years’ worth of data
• Front-end service uses the HBase cluster to power a dashboard with high concurrency
Pattern #3: Interactive query
• TBs of logs sent daily
• Logs stored in S3
• Transient EMR clusters
• Hive Metastore
Example: Log Processing using Amazon EMR
• Aggregating small files using S3DistCp
• Defining Hive tables with data on Amazon S3
• Interactive querying using Hue
(Diagram: Amazon S3 log bucket → Amazon EMR → processed and structured log data)
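The idea behind aggregating small files can be sketched in a few lines. The following is not S3DistCp itself; it only illustrates grouping many small log objects by a pattern in their key, so that each group could then be concatenated into one larger file. The key names and the regular expression are hypothetical.

```python
import re
from collections import defaultdict

def group_keys(keys, pattern):
    """Group object keys by the first capture group of `pattern`.

    Keys that do not match the pattern are left out of the result.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for key in keys:
        match = regex.search(key)
        if match:
            groups[match.group(1)].append(key)
    return dict(groups)

# Hypothetical hourly log objects; group them by day.
keys = [
    "logs/2015-06-01-00.log",
    "logs/2015-06-01-01.log",
    "logs/2015-06-02-00.log",
    "logs/README.txt",
]
by_day = group_keys(keys, r"logs/(\d{4}-\d{2}-\d{2})-\d{2}\.log")
print(by_day)
```

Fewer, larger files generally mean fewer tasks and less per-file overhead for the Hive jobs that read them.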
Data Analyzed Using EMR:
• Months of user history
• Common misspellings: Westen, Wistin, Westan, Whestin
• Automatic spelling corrections
Amazon Redshift
Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year
Selected Amazon Redshift Customers
Clickstream Analysis for Amazon.com
• Redshift runs web log analysis for Amazon.com
  - 100-node Redshift cluster
  - Over one petabyte workload
  - Largest table: 400 TB
  - 2 TB of data per day
• Understand customer behavior
  - Who is browsing but not buying
  - Which products / features are winners
  - What sequence led to higher customer conversion
Redshift Performance Realized
• Scan 15 months of data: 14 minutes
  - 2.25 trillion rows
• Load one day’s worth of data: 10 minutes
  - 5 billion rows
• Backfill one month of data: 9.75 hours
  - 150 billion rows
• Pig → Amazon Redshift: 2 days to 1 hr
  - 10B row join with 700M rows
• Oracle → Amazon Redshift: 90 hours to 8 hrs
  - Reduced number of SQLs by a factor of 3
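The scan, load, and backfill figures can be turned into average throughput with simple arithmetic. This sketch uses only the row counts and durations quoted on the slide and assumes the durations are wall-clock times for the whole cluster.

```python
def rows_per_second(rows, minutes):
    """Average throughput implied by a row count and a wall-clock duration."""
    return rows / (minutes * 60)

scan = rows_per_second(2.25e12, 14)            # scan 15 months of data
load = rows_per_second(5e9, 10)                # load one day of data
backfill = rows_per_second(150e9, 9.75 * 60)   # backfill one month of data

print(f"scan:     {scan:.3g} rows/s")
print(f"load:     {load:.3g} rows/s")
print(f"backfill: {backfill:.3g} rows/s")
```

That works out to roughly 2.7 billion rows per second scanned, and several million rows per second for loads and backfills.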
Amazon Redshift Architecture
• Leader Node
  - SQL endpoint
  - Stores metadata
  - Coordinates query execution
• Compute Nodes
  - Local, columnar storage
  - Execute queries in parallel
  - Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH
• Two hardware platforms
  - Optimized for data processing
  - DW1: HDD; scale from 2 TB to 2 PB
  - DW2: SSD; scale from 160 GB to 325 TB
(Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore)
Amazon Redshift Node Types
• Optimized for I/O intensive workloads
• High disk density
• On demand at $0.85/hour
• As low as $1,000/TB/Year
• Scale from 2 TB to 1.6 PB
DW1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed storage
DW1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed, 2 GB/sec scan rate

• High performance at smaller storage size
• High compute and memory density
• On demand at $0.25/hour
• As low as $5,500/TB/Year
• Scale from 160 GB to 256 TB
DW2.L: 16 GB RAM, 2 Cores, 160 GB compressed SSD storage
DW2.8XL: 256 GB RAM, 32 Cores, 2.56 TB of compressed SSD storage
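For comparison with the "as low as" figures, the on-demand hourly rates can be converted to a yearly cost per terabyte. This is a sketch under a stated assumption: that the quoted hourly rates apply to the smallest node of each family (DW1.XL with 2 TB, DW2.L with 160 GB), which the slide does not say explicitly.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

def on_demand_cost_per_tb_year(hourly_rate, storage_tb):
    """Yearly on-demand cost of one node divided by its storage in TB."""
    return hourly_rate * HOURS_PER_YEAR / storage_tb

# Assumption: $0.85/hour is for DW1.XL (2 TB), $0.25/hour for DW2.L (0.16 TB).
dw1 = on_demand_cost_per_tb_year(0.85, 2.0)
dw2 = on_demand_cost_per_tb_year(0.25, 0.16)
print(round(dw1, 2), round(dw2, 2))
```

Under that assumption the on-demand cost comes to about $3,723 and $13,687.50 per TB per year. Both are well above the "as low as" figures, which presumably reflect discounted pricing that the slide does not detail.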
Amazon Redshift dramatically reduces I/O
Column storage
Data compression
Zone maps
Direct-attached storage
• With row storage you do unnecessary I/O
• To get the total amount, you have to read everything
ID  | Age | State | Amount
----+-----+-------+-------
123 | 20  | CA    | 500
345 | 25  | WA    | 250
678 | 40  | FL    | 125
957 | 37  | WA    | 375
Column storage
• With column storage, you only read the data you need
ID  | Age | State | Amount
----+-----+-------+-------
123 | 20  | CA    | 500
345 | 25  | WA    | 250
678 | 40  | FL    | 125
957 | 37  | WA    | 375
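The difference between the two layouts can be sketched with the four-row table from the slide. This is an in-memory illustration of the access pattern only, not of how Redshift stores data on disk; the function names are hypothetical.

```python
# The four-row table from the slide.
COLUMNS = ["ID", "Age", "State", "Amount"]
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]

def total_from_rows(rows, column):
    """Row layout: every field of every row is read to reach one column."""
    idx = COLUMNS.index(column)
    values_read = sum(len(row) for row in rows)
    return sum(row[idx] for row in rows), values_read

def total_from_columns(columns, column):
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)

# Pivot the rows into one list per column.
COLUMN_STORE = {name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)}

row_total, row_reads = total_from_rows(ROWS, "Amount")
col_total, col_reads = total_from_columns(COLUMN_STORE, "Amount")
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts give the same total of 1250, but the row layout touches 16 values while the column layout touches 4.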
Data compression
• COPY compresses automatically
• You can analyze and override
• More performance, less cost
analyze compression listing;
Table   | Column         | Encoding
--------+----------------+---------
listing | listid         | delta
listing | sellerid       | delta32k
listing | eventid        | delta32k
listing | dateid         | bytedict
listing | numtickets     | bytedict
listing | priceperticket | delta32k
listing | totalprice     | mostly32
listing | listtime       | raw
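To give a feel for why an encoding such as delta can save space, here is a simplified illustration of the idea: store the first value, then only the differences between neighbors. It is a sketch of the concept, not Redshift's on-disk format, and the sample IDs are hypothetical.

```python
def delta_encode(values):
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values

# Hypothetical, mostly increasing IDs such as a listid column might hold.
list_ids = [1000, 1001, 1003, 1004, 1010]
encoded = delta_encode(list_ids)
print(encoded)
```

When neighboring values are close together, the differences are small numbers that need fewer bytes than the original values.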
Zone maps
• Track the minimum and maximum value for each block
• Skip over blocks that don’t contain relevant data
Block contents                    | Min | Max
----------------------------------+-----+----
10, 13, 14, 26, …, 100, 245, 324  | 10  | 324
375, 393, 417, …, 512, 549, 623   | 375 | 623
637, 712, 809, …, 834, 921, 959   | 637 | 959
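Block skipping with zone maps can be sketched as follows. The blocks hold only the values visible on the slide (the elided ones are left out), and the query range of 400 to 600 is a hypothetical example.

```python
# Only the values visible on the slide; the elided ones ("…") are left out.
BLOCKS = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

def build_zone_map(blocks):
    """Record (min, max) for each block."""
    return [(min(block), max(block)) for block in blocks]

def range_scan(blocks, zone_map, low, high):
    """Return values in [low, high], reading only blocks whose range overlaps."""
    hits, blocks_read = [], 0
    for block, (block_min, block_max) in zip(blocks, zone_map):
        if block_max < low or block_min > high:
            continue  # the zone map proves this block has no matching value
        blocks_read += 1
        hits.extend(v for v in block if low <= v <= high)
    return hits, blocks_read

zone_map = build_zone_map(BLOCKS)
hits, blocks_read = range_scan(BLOCKS, zone_map, 400, 600)
print(zone_map)
print(hits, blocks_read)
```

For this range only the middle block is read; the other two are skipped on the strength of their minimum and maximum alone.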
Direct-attached storage
• Use local storage for performance
• Maximize scan rates
• Automatic replication and continuous backup
• HDD & SSD platforms
Amazon Redshift parallelizes and distributes everything
Query
Load
Backup/Restore
Resize
Load
• Load in parallel from Amazon S3 or DynamoDB or any SSH connection
• Data automatically distributed and sorted according to DDL
• Scales linearly with the number of nodes in the cluster
Backup/Restore
• Backups to Amazon S3 are automatic, continuous and incremental
• Configurable system snapshot retention period. Take user snapshots on-demand
• Cross region backups for disaster recovery
• Streaming restores enable you to resume querying faster
Resize
• Resize while remaining online
• Provision a new cluster in the background
• Copy data in parallel from node to node
• Only charged for source cluster
• Automatic SQL endpoint switchover via DNS
• Decommission the source cluster
• Simple operation via Console or API
Architecture and its Table Design Implications
Table Distribution Styles
• Distribution Key: same key to same location
• All: all data on every node
• Even: round robin distribution
(Diagrams: two nodes with two slices each; Node 1 holds Slices 1 and 2, Node 2 holds Slices 3 and 4)
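The three distribution styles can be sketched as small placement functions over four slices, matching the two-node, two-slices-per-node diagram. The hash used for the key style (`zlib.crc32`) is a stand-in chosen for determinism; the slides do not say which hash Redshift uses, and the sample rows are hypothetical.

```python
import zlib

NUM_SLICES = 4  # two nodes with two slices each, as in the diagram

def distribute_key(rows, key_index):
    """KEY style: rows with the same key value land on the same slice."""
    slices = [[] for _ in range(NUM_SLICES)]
    for row in rows:
        # crc32 is a stand-in hash chosen for determinism (an assumption).
        digest = zlib.crc32(str(row[key_index]).encode("utf-8"))
        slices[digest % NUM_SLICES].append(row)
    return slices

def distribute_even(rows):
    """EVEN style: round robin across slices, regardless of content."""
    slices = [[] for _ in range(NUM_SLICES)]
    for i, row in enumerate(rows):
        slices[i % NUM_SLICES].append(row)
    return slices

def distribute_all(rows, num_nodes=2):
    """ALL style: a full copy of the table on every node."""
    return [list(rows) for _ in range(num_nodes)]

# Hypothetical (customer_id, amount) rows.
rows = [(1, 10), (2, 20), (1, 30), (3, 40), (2, 50), (4, 60), (1, 70), (3, 80)]
by_key = distribute_key(rows, 0)
even = distribute_even(rows)
everywhere = distribute_all(rows)
print([len(s) for s in even])
```

Key distribution keeps joinable rows together, even distribution balances slice sizes, and all distribution trades storage for never having to move the table at query time.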
Sorting Data
• In the slices (on disk), the data is sorted by a sort key
  - If no sort key exists, Redshift uses the data insertion order
• Choose a sort key that is frequently used in your queries
  - As a query predicate (date, identifier, …)
  - As a join parameter (it can also be the hash key)
• The sort key allows Redshift to avoid reading entire blocks based on predicates
  - For example, with a timestamp sort key on a table where only recent data is accessed, queries skip blocks containing “old” data
Interleaved Multi Column Sort
• Compound Sort Keys
  - Optimized for applications that filter data by one leading column
• Interleaved Sort Keys (new)
  - Optimized for filtering data by up to eight columns
  - No storage overhead, unlike an index
  - Lower maintenance penalty compared to indexes
Compound Sort Keys Illustrated
• Records in Redshift are stored in blocks
• For this illustration, let’s assume that four records fill a block
• Records with a given cust_id are all in one block
• However, records with a given prod_id are spread across four blocks
(Figure: grid of [cust_id, prod_id] records, next to a table with columns cust_id, prod_id, other columns, grouped into blocks)
cust_id \ prod_id | 1     | 2     | 3     | 4
------------------+-------+-------+-------+------
1                 | [1,1] | [1,2] | [1,3] | [1,4]
2                 | [2,1] | [2,2] | [2,3] | [2,4]
3                 | [3,1] | [3,2] | [3,3] | [3,4]
4                 | [4,1] | [4,2] | [4,3] | [4,4]
Interleaved Sort Keys Illustrated
• Records with a given cust_id are spread across two blocks
• Records with a given prod_id are also spread across two blocks
• Data is sorted in equal measures for both keys
(Figure: table with columns cust_id, prod_id, other columns, grouped into blocks)
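The block counts quoted for the two sort styles can be reproduced with a small sketch over the same 16 records, four per block. The interleaved ordering here uses bit interleaving (a Z-order curve) as an assumed stand-in, since the slides do not describe the algorithm itself.

```python
from itertools import product

RECORDS_PER_BLOCK = 4
RECORDS = list(product(range(1, 5), range(1, 5)))  # (cust_id, prod_id) pairs

def to_blocks(sorted_records):
    """Cut a sorted record list into fixed-size blocks."""
    return [sorted_records[i:i + RECORDS_PER_BLOCK]
            for i in range(0, len(sorted_records), RECORDS_PER_BLOCK)]

def z_order(record):
    """Interleave the bits of (cust_id - 1) and (prod_id - 1)."""
    cust, prod = record[0] - 1, record[1] - 1
    code = 0
    for bit in range(2):  # ids 1..4 need two bits each
        code |= ((cust >> bit) & 1) << (2 * bit + 1)
        code |= ((prod >> bit) & 1) << (2 * bit)
    return code

def blocks_touched(blocks, column, value):
    """Count the blocks holding at least one record with column == value."""
    return sum(any(rec[column] == value for rec in block) for block in blocks)

compound = to_blocks(sorted(RECORDS))               # sort by cust_id, then prod_id
interleaved = to_blocks(sorted(RECORDS, key=z_order))

for name, blocks in (("compound", compound), ("interleaved", interleaved)):
    print(name,
          "cust_id=1:", blocks_touched(blocks, 0, 1),
          "prod_id=1:", blocks_touched(blocks, 1, 1))
```

With the compound ordering a cust_id filter reads one block and a prod_id filter reads four; with the interleaved ordering each filter reads two.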
Amazon Redshift works with your existing analysis tools
(Diagram: analysis tools connect to Amazon Redshift via JDBC/ODBC)
Custom ODBC and JDBC Drivers
• Up to 35% higher performance than open source drivers
• Supported by Informatica, MicroStrategy, Pentaho, Qlik, SAS, Tableau, TIBCO, and others
• Will continue to support PostgreSQL open source drivers
• Download drivers from console
User Defined Functions
• We’re enabling User Defined Functions (UDFs) so you can add your own
  - Scalar and Aggregate Functions supported
• You’ll be able to write UDFs using Python 2.7
  - Syntax is largely identical to PostgreSQL UDF syntax
  - System and network calls within UDFs are prohibited
• Comes with Pandas, NumPy, and SciPy pre-installed
  - You’ll also be able to import your own libraries for even more flexibility
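As a rough sketch of what a scalar UDF body might look like, here is a plain Python function that sticks to syntax valid in both Python 2.7 and Python 3 and makes no system or network calls. The function name and its NULL-handling convention are hypothetical, and the SQL statement that would register it in Redshift is not shown because the slides do not give it.

```python
def f_greater(a, b):
    """Return the larger of two values; None (SQL NULL) if either is missing."""
    if a is None or b is None:
        return None
    return a if a > b else b


print(f_greater(3, 7))
print(f_greater(None, 7))
```

Keeping the logic in an ordinary function like this makes it easy to test locally before wrapping it in a UDF definition.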
(Diagram labels: SELECT INTO OUTFILE; bcp (SQL Server); s3cmd; COPY; Staging; SQL; Prod)
Redshift Use Case
Operational Reporting with Redshift
(Diagram: Amazon S3 log bucket → Amazon EMR → processed and structured log data → Amazon Redshift → operational reports)
Amazon Web Services’ global customer and partner conference
Learn more and register:
reinvent.awsevents.com
October 6-9, 2015 | The Venetian - Las Vegas, NV
Thank you
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon RedshiftAmazon Web Services
 
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Amazon Web Services
 
(BDT320) New! Streaming Data Flows with Amazon Kinesis Firehose
(BDT320) New! Streaming Data Flows with Amazon Kinesis Firehose(BDT320) New! Streaming Data Flows with Amazon Kinesis Firehose
(BDT320) New! Streaming Data Flows with Amazon Kinesis FirehoseAmazon Web Services
 
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014Amazon Web Services
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Amazon Web Services
 
Getting Strated with Amazon Dynamo DB (Jim Scharf) - AWS DB Day
Getting Strated with Amazon Dynamo DB (Jim Scharf) - AWS DB DayGetting Strated with Amazon Dynamo DB (Jim Scharf) - AWS DB Day
Getting Strated with Amazon Dynamo DB (Jim Scharf) - AWS DB DayAmazon Web Services Korea
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysisAmazon Web Services
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseAmazon Web Services
 
Scalability of Amazon Redshift Data Loading and Query Speed
Scalability of Amazon Redshift Data Loading and Query SpeedScalability of Amazon Redshift Data Loading and Query Speed
Scalability of Amazon Redshift Data Loading and Query SpeedFlyData Inc.
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...Amazon Web Services
 
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDBAWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDBAmazon Web Services
 
(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...
(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...
(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...Amazon Web Services
 
(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big DataAmazon Web Services
 
Optimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics WorkloadsOptimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics WorkloadsAmazon Web Services
 
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon RedshiftAmazon Web Services
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseAmazon Web Services
 
(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduceAmazon Web Services
 

Was ist angesagt? (20)

(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
 
(BDT320) New! Streaming Data Flows with Amazon Kinesis Firehose
(BDT320) New! Streaming Data Flows with Amazon Kinesis Firehose(BDT320) New! Streaming Data Flows with Amazon Kinesis Firehose
(BDT320) New! Streaming Data Flows with Amazon Kinesis Firehose
 
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
 
Getting Strated with Amazon Dynamo DB (Jim Scharf) - AWS DB Day
Getting Strated with Amazon Dynamo DB (Jim Scharf) - AWS DB DayGetting Strated with Amazon Dynamo DB (Jim Scharf) - AWS DB Day
Getting Strated with Amazon Dynamo DB (Jim Scharf) - AWS DB Day
 
DynamodbDB Deep Dive
DynamodbDB Deep DiveDynamodbDB Deep Dive
DynamodbDB Deep Dive
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data Warehouse
 
Scalability of Amazon Redshift Data Loading and Query Speed
Scalability of Amazon Redshift Data Loading and Query SpeedScalability of Amazon Redshift Data Loading and Query Speed
Scalability of Amazon Redshift Data Loading and Query Speed
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDBAWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
 
(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...
(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...
(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...
 
(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data
 
Optimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics WorkloadsOptimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics Workloads
 
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data Warehouse
 
(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce
 

Andere mochten auch

Business Intelligence, Analytics e Big Data: una guida per capire e orientarsi
Business Intelligence, Analytics e Big Data: una guida per capire e orientarsiBusiness Intelligence, Analytics e Big Data: una guida per capire e orientarsi
Business Intelligence, Analytics e Big Data: una guida per capire e orientarsiSMAU
 
L'ARPA e il progetto SmartOpenData
L'ARPA e il progetto SmartOpenDataL'ARPA e il progetto SmartOpenData
L'ARPA e il progetto SmartOpenDatajexxon
 
Ppt Pdf
Ppt PdfPpt Pdf
Ppt Pdfkumee
 
Zaadoptuj Rzekę
Zaadoptuj RzekęZaadoptuj Rzekę
Zaadoptuj Rzekębeowulf
 
Surface Computing
Surface ComputingSurface Computing
Surface Computingrandyp311
 
Providing Better Producer Administration With TrueProducer
Providing Better Producer Administration With TrueProducerProviding Better Producer Administration With TrueProducer
Providing Better Producer Administration With TrueProducerCallidus Software
 
Reunió pares p4 2012 13
Reunió pares p4 2012 13Reunió pares p4 2012 13
Reunió pares p4 2012 13marblocs
 
Crop Circles - Cerchi nel grano
Crop Circles - Cerchi nel granoCrop Circles - Cerchi nel grano
Crop Circles - Cerchi nel granogiusnico
 
Conchiglie
Conchiglie Conchiglie
Conchiglie giusnico
 
Competitività, diversità culturale e creatività Mediterranea
Competitività, diversità culturale e creatività MediterraneaCompetitività, diversità culturale e creatività Mediterranea
Competitività, diversità culturale e creatività Mediterraneajexxon
 
Trends and Best Practices for Implementing SaaS for Your Business
Trends and Best Practices for Implementing SaaS for Your BusinessTrends and Best Practices for Implementing SaaS for Your Business
Trends and Best Practices for Implementing SaaS for Your BusinessCallidus Software
 
Il Caso di Studio "Living Labs"
Il Caso di Studio "Living Labs"Il Caso di Studio "Living Labs"
Il Caso di Studio "Living Labs"jexxon
 

Andere mochten auch (20)

Business Intelligence, Analytics e Big Data: una guida per capire e orientarsi
Business Intelligence, Analytics e Big Data: una guida per capire e orientarsiBusiness Intelligence, Analytics e Big Data: una guida per capire e orientarsi
Business Intelligence, Analytics e Big Data: una guida per capire e orientarsi
 
Bren!!!! She
Bren!!!! SheBren!!!! She
Bren!!!! She
 
L'ARPA e il progetto SmartOpenData
L'ARPA e il progetto SmartOpenDataL'ARPA e il progetto SmartOpenData
L'ARPA e il progetto SmartOpenData
 
Ppt Pdf
Ppt PdfPpt Pdf
Ppt Pdf
 
Zaadoptuj Rzekę
Zaadoptuj RzekęZaadoptuj Rzekę
Zaadoptuj Rzekę
 
Issueno.1
Issueno.1Issueno.1
Issueno.1
 
WLCG-Discu
WLCG-DiscuWLCG-Discu
WLCG-Discu
 
Surface Computing
Surface ComputingSurface Computing
Surface Computing
 
Providing Better Producer Administration With TrueProducer
Providing Better Producer Administration With TrueProducerProviding Better Producer Administration With TrueProducer
Providing Better Producer Administration With TrueProducer
 
Reunió pares p4 2012 13
Reunió pares p4 2012 13Reunió pares p4 2012 13
Reunió pares p4 2012 13
 
Beware the Shiny!
Beware the Shiny!Beware the Shiny!
Beware the Shiny!
 
Crop Circles - Cerchi nel grano
Crop Circles - Cerchi nel granoCrop Circles - Cerchi nel grano
Crop Circles - Cerchi nel grano
 
Catriel She
Catriel SheCatriel She
Catriel She
 
She
SheShe
She
 
Conchiglie
Conchiglie Conchiglie
Conchiglie
 
Competitività, diversità culturale e creatività Mediterranea
Competitività, diversità culturale e creatività MediterraneaCompetitività, diversità culturale e creatività Mediterranea
Competitività, diversità culturale e creatività Mediterranea
 
she de ivan y cecilia
she de ivan y ceciliashe de ivan y cecilia
she de ivan y cecilia
 
Trends and Best Practices for Implementing SaaS for Your Business
Trends and Best Practices for Implementing SaaS for Your BusinessTrends and Best Practices for Implementing SaaS for Your Business
Trends and Best Practices for Implementing SaaS for Your Business
 
she de franco.......bueno bueno y juan pablo
she de franco.......bueno bueno y juan pabloshe de franco.......bueno bueno y juan pablo
she de franco.......bueno bueno y juan pablo
 
Il Caso di Studio "Living Labs"
Il Caso di Studio "Living Labs"Il Caso di Studio "Living Labs"
Il Caso di Studio "Living Labs"
 

Ähnlich wie Processing and Analytics

Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceAmazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Amazon Web Services
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon RedshiftAmazon Web Services
 
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)Amazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftAmazon Web Services
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAmazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksAmazon Web Services
 
Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Amazon Web Services
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Amazon Web Services
 
London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017Pratim Das
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features Amazon Web Services
 
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?Amazon Web Services Korea
 

Ähnlich wie Processing and Analytics (20)

Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Uses and Best Practices for Amazon Redshift
Data Processing and Analytics

  • 2. Agenda Overview
      10:00 AM Registration
      10:30 AM Introduction to Big Data @ AWS
      12:00 PM Lunch + Registration for Technical Sessions
      12:30 PM Data Collection and Storage
      1:45 PM Real-time Event Processing
      3:00 PM Analytics (incl. Machine Learning)
      4:30 PM Open Q&A Roundtable
  • 3. Primitive Patterns (diagram). Stages: Collect, Process, Analyze, Store. Areas: Data Collection and Storage; Data Processing; Event Processing; Data Analysis. Services: EMR, Redshift, Machine Learning
  • 5. Why Amazon EMR?
      • Easy to use: launch a cluster in minutes
      • Low cost: pay an hourly rate
      • Elastic: easily add or remove capacity
      • Reliable: spend less time monitoring
      • Secure: manage firewalls
      • Flexible: control the cluster
  • 6. The Hadoop ecosystem can run in Amazon EMR
  • 7. Choose your instance types: try different configurations to find your optimal architecture
      • CPU: c3 family, cc1.4xlarge, cc2.8xlarge
      • Memory: m2 family, r3 family
      • Disk/IO: d2 family, i2 family
      • General: m1 family, m3 family
      • Workload labels on the slide: batch process, machine learning, Spark and interactive, large HDFS
  • 8. Resizable clusters
      • Easy to add/remove compute capacity to your cluster
      • Match compute demands with cluster sizing
  • 9. Easy to use Spot Instances
      • Spot Instances for task nodes: up to 90% off Amazon EC2 on-demand pricing. Exceed SLA at lower cost
      • On-demand for core nodes: standard Amazon EC2 pricing for on-demand capacity. Meet SLA at predictable cost
  • 10. Amazon S3 as your persistent data store
      • Separate compute and storage
      • Resize and shut down Amazon EMR clusters with no data loss
      • Point multiple Amazon EMR clusters at the same data in Amazon S3
  • 11. EMRFS makes it easier to leverage S3
      • Better performance and error handling options
      • Transparent to applications: use “s3://”
      • Consistent view
          - For consistent list and read-after-write for new puts
      • Support for Amazon S3 server-side and client-side encryption
      • Faster listing using EMRFS metadata
  • 12. Amazon S3 EMRFS metadata in Amazon DynamoDB
      • List and read-after-write consistency
      • Faster list operations
      Fast listing of S3 objects using EMRFS metadata*

      Number of objects | Without Consistent Views | With Consistent Views
      1,000,000         | 147.72                   | 29.70
      100,000           | 12.70                    | 3.69

      *Tested using a single node cluster with a m3.xlarge instance.
  • 13. EMRFS - S3 client-side encryption (diagram): Amazon S3 (client-side encrypted objects); Amazon S3 encryption clients; EMRFS enabled for Amazon S3 client-side encryption; key vendor (AWS KMS or your custom key vendor)
  • 14. Optimize to leverage HDFS
      • Iterative workloads
          - If you’re processing the same dataset more than once
      • Disk I/O intensive workloads
      Persist data on Amazon S3 and use S3DistCp to copy to HDFS for processing
  • 15. Pattern #1: Batch processing
      • GBs of logs pushed to Amazon S3 hourly
      • Daily Amazon EMR cluster using Hive to process data
      • Input and output stored in Amazon S3
      • Load subset into Redshift DW
  • 16. Pattern #2: Online data-store
      • Data pushed to Amazon S3
      • Daily Amazon EMR cluster to Extract, Transform, and Load (ETL) data into database
      • 24/7 Amazon EMR cluster running HBase holds last 2 years’ worth of data
      • Front-end service uses HBase cluster to power dashboard with high concurrency
  • 17. Pattern #3: Interactive query
      • TBs of logs sent daily
      • Logs stored in S3
      • Transient EMR clusters
      • Hive Metastore
  • 18. Example: Log Processing using Amazon EMR
      • Aggregating small files using s3distcp
      • Defining Hive tables with data on Amazon S3
      • Interactive querying using Hue
      Diagram: Amazon S3 log bucket; Amazon EMR; processed and structured log data
  • 19. Data Analyzed Using EMR
      • Months of user history
      • Common misspellings: Westen, Wistin, Westan, Whestin
      • Automatic spelling corrections
  • 20. Amazon Redshift Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year
  • 22. Clickstream Analysis for Amazon.com
      • Redshift runs web log analysis for Amazon.com
          - 100 node Redshift Cluster
          - Over one petabyte workload
          - Largest table: 400TB
          - 2TB of data per day
      • Understand customer behavior
          - Who is browsing but not buying
          - Which products / features are winners
          - What sequence led to higher customer conversion
  • 23. Redshift Performance Realized
      • Scan 15 months of data: 14 minutes
          - 2.25 trillion rows
      • Load one day worth of data: 10 minutes
          - 5 billion rows
      • Backfill one month of data: 9.75 hours
          - 150 billion rows
      • Pig → Amazon Redshift: 2 days to 1 hr
          - 10B row join with 700M rows
      • Oracle → Amazon Redshift: 90 hours to 8 hrs
          - Reduced number of SQLs by a factor of 3
  • 24. Amazon Redshift Architecture
      • Leader Node
          - SQL endpoint
          - Stores metadata
          - Coordinates query execution
      • Compute Nodes
          - Local, columnar storage
          - Execute queries in parallel
          - Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH
      • Two hardware platforms
          - Optimized for data processing
          - DW1: HDD; scale from 2TB to 2PB
          - DW2: SSD; scale from 160GB to 325TB
      Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore
  • 25. Amazon Redshift Node Types
      DW1 nodes
      • Optimized for I/O intensive workloads
      • High disk density
      • On demand at $0.85/hour
      • As low as $1,000/TB/Year
      • Scale from 2TB to 1.6PB
      • DW1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed storage
      • DW1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed, 2 GB/sec scan rate
      DW2 nodes
      • High performance at smaller storage size
      • High compute and memory density
      • On demand at $0.25/hour
      • As low as $5,500/TB/Year
      • Scale from 160GB to 256TB
      • DW2.L: 16 GB RAM, 2 Cores, 160 GB compressed SSD storage
      • DW2.8XL: 256 GB RAM, 32 Cores, 2.56 TB of compressed SSD storage
  • 26. Amazon Redshift dramatically reduces I/O (topics: Column storage, Data compression, Zone maps, Direct-attached storage)
      • With row storage you do unnecessary I/O
      • To get total amount, you have to read everything

      ID  | Age | State | Amount
      123 | 20  | CA    | 500
      345 | 25  | WA    | 250
      678 | 40  | FL    | 125
      957 | 37  | WA    | 375
  • 27. Amazon Redshift dramatically reduces I/O (topics: Column storage, Data compression, Zone maps, Direct-attached storage)
      • With column storage, you only read the data you need

      ID  | Age | State | Amount
      123 | 20  | CA    | 500
      345 | 25  | WA    | 250
      678 | 40  | FL    | 125
      957 | 37  | WA    | 375
  • 28. Amazon Redshift dramatically reduces I/O (topics: Column storage, Data compression, Zone maps, Direct-attached storage)
      • COPY compresses automatically
      • You can analyze and override
      • More performance, less cost

      analyze compression listing;

       Table   | Column         | Encoding
      ---------+----------------+----------
       listing | listid         | delta
       listing | sellerid       | delta32k
       listing | eventid        | delta32k
       listing | dateid         | bytedict
       listing | numtickets     | bytedict
       listing | priceperticket | delta32k
       listing | totalprice     | mostly32
       listing | listtime       | raw
  • 29. Amazon Redshift dramatically reduces I/O (topics: Column storage, Data compression, Zone maps, Direct-attached storage)
      • Track the minimum and maximum value for each block
      • Skip over blocks that don’t contain relevant data

      Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324   (min 10, max 324)
      Block 2: 375 | 393 | 417 | … | 512 | 549 | 623     (min 375, max 623)
      Block 3: 637 | 712 | 809 | … | 834 | 921 | 959     (min 637, max 959)
  • 30. Amazon Redshift dramatically reduces I/O (topics: Column storage, Data compression, Zone maps, Direct-attached storage)
      • Use local storage for performance
      • Maximize scan rates
      • Automatic replication and continuous backup
      • HDD & SSD platforms
  • 31. Amazon Redshift parallelizes and distributes everything (topics: Query, Load, Backup/Restore, Resize)
  • 32. Amazon Redshift parallelizes and distributes everything (topics: Query, Load, Backup/Restore, Resize)
      • Load in parallel from Amazon S3 or DynamoDB or any SSH connection
      • Data automatically distributed and sorted according to DDL
      • Scales linearly with the number of nodes in the cluster
  • 33. Amazon Redshift parallelizes and distributes everything (topics: Query, Load, Backup/Restore, Resize)
      • Backups to Amazon S3 are automatic, continuous and incremental
      • Configurable system snapshot retention period. Take user snapshots on-demand
      • Cross region backups for disaster recovery
      • Streaming restores enable you to resume querying faster
  • 34. Amazon Redshift parallelizes and distributes everything (topics: Query, Load, Backup/Restore, Resize)
      • Resize while remaining online
      • Provision a new cluster in the background
      • Copy data in parallel from node to node
      • Only charged for source cluster
  • 35. Amazon Redshift parallelizes and distributes everything (topics: Query, Load, Backup/Restore, Resize)
      • Automatic SQL endpoint switchover via DNS
      • Decommission the source cluster
      • Simple operation via Console or API
  • 36. Architecture and its Table Design Implications
  • 37. Table Distribution Styles (each diagram shows Node 1 with Slices 1 and 2, and Node 2 with Slices 3 and 4)
      • Distribution Key: same key to same location
      • All: all data on every node
      • Even: round robin distribution
  • 38. Sorting Data
      • In the slices (on disk), the data is sorted by a sort key
          - If no sort key exists Redshift uses the data insertion order
      • Choose a sort key that is frequently used in your queries
          - As a query predicate (date, identifier, …)
          - As a join parameter (it can also be the hash key)
      • The sort key allows Redshift to avoid reading entire blocks based on predicates
          - For example, a table containing a timestamp sort key where only recent data is accessed will skip blocks containing “old” data
  • 39. Interleaved Multi Column Sort
      • Compound Sort Keys
          - Optimized for applications that filter data by one leading column
      • Interleaved Sort Keys (new)
          - Optimized for filtering data by up to eight columns
          - No storage overhead unlike an index
          - Lower maintenance penalty compared to indexes
  • 40. Compound Sort Keys Illustrated
      • Records in Redshift are stored in blocks.
      • For this illustration, let’s assume that four records fill a block
      • Records with a given cust_id are all in one block
      • However, records with a given prod_id are spread across four blocks
      [Figure: 4×4 grid of [cust_id, prod_id] pairs from [1,1] to [4,4], next to a table with columns cust_id, prod_id and other columns, divided into four blocks]
  • 41. Interleaved Sort Keys Illustrated
      • Records with a given cust_id are spread across two blocks
      • Records with a given prod_id are also spread across two blocks
      • Data is sorted in equal measures for both keys
      [Figure: 4×4 grid of [cust_id, prod_id] pairs from [1,1] to [4,4], next to a table with columns cust_id, prod_id and other columns, divided into four blocks]
  • 42. Amazon Redshift works with your existing analysis tools JDBC/ODBC Amazon Redshift
  • 43. Custom ODBC and JDBC Drivers
      • Up to 35% higher performance than open source drivers
      • Supported by Informatica, Microstrategy, Pentaho, Qlik, SAS, Tableau, Tibco, and others
      • Will continue to support PostgreSQL open source drivers
      • Download drivers from console
  • 44. User Defined Functions
      • We’re enabling User Defined Functions (UDFs) so you can add your own
          - Scalar and Aggregate Functions supported
      • You’ll be able to write UDFs using Python 2.7
          - Syntax is largely identical to PostgreSQL UDF Syntax
          - System and network calls within UDFs are prohibited
      • Comes with Pandas, NumPy, and SciPy pre-installed
          - You’ll also be able to import your own libraries for even more flexibility
  • 46. Operational Reporting with Redshift (diagram): Amazon S3 log bucket; Amazon EMR; processed and structured log data; Amazon Redshift; operational reports
  • 47. Amazon Web Services’ global customer and partner conference Learn more and register: reinvent.awsevents.com October 6-9, 2015 | The Venetian - Las Vegas, NV
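
The zone-map idea from slide 29 (track the minimum and maximum per block, skip blocks that cannot match) can be illustrated with a small Python sketch. This is only a toy model, not Redshift's implementation: the `Block` class and `scan_range` function are hypothetical names, and each block holds only the values visible on the slide (the elided ones are left out).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A toy storage block; lo/hi play the role of the zone-map entry."""
    values: List[int]

    @property
    def lo(self) -> int:
        return min(self.values)

    @property
    def hi(self) -> int:
        return max(self.values)


def scan_range(blocks: List[Block], lo: int, hi: int) -> Tuple[List[int], int]:
    """Return values in [lo, hi] and the number of blocks actually read."""
    hits: List[int] = []
    blocks_read = 0
    for block in blocks:
        # The zone map alone shows that this block cannot contain a match.
        if block.hi < lo or block.lo > hi:
            continue
        blocks_read += 1
        hits.extend(v for v in block.values if lo <= v <= hi)
    return hits, blocks_read


# Only the values shown on the slide; the "…" entries are omitted.
blocks = [
    Block([10, 13, 14, 26, 100, 245, 324]),
    Block([375, 393, 417, 512, 549, 623]),
    Block([637, 712, 809, 834, 921, 959]),
]

hits, blocks_read = scan_range(blocks, 400, 600)
print(hits, blocks_read)
```

With the predicate `400 <= value <= 600`, only the second block overlaps the range, so the other two are skipped without being read.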

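The block counts on slides 40 and 41 (compound versus interleaved sort keys) can be reproduced with a short Python sketch. The slides do not say how Redshift interleaves keys, so the bit-interleaving (Z-order) function below is an assumption chosen for illustration; `interleave_bits`, `to_blocks` and `blocks_touched` are hypothetical helper names.

```python
from itertools import product
from typing import List, Tuple

Record = Tuple[int, int]  # (cust_id, prod_id)
BLOCK_SIZE = 4  # the slides assume four records fill a block


def interleave_bits(c: int, p: int, bits: int = 2) -> int:
    """Interleave the bits of c and p (Z-order), with c in the higher position."""
    z = 0
    for i in range(bits):
        z |= ((c >> i) & 1) << (2 * i + 1)
        z |= ((p >> i) & 1) << (2 * i)
    return z


def to_blocks(records: List[Record]) -> List[List[Record]]:
    """Cut an already sorted record list into fixed-size blocks."""
    return [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]


def blocks_touched(blocks: List[List[Record]], column: int, value: int) -> int:
    """Count the blocks containing at least one record with the given value."""
    return sum(any(rec[column] == value for rec in block) for block in blocks)


# All 16 combinations of cust_id and prod_id in 1..4, as in the slide grid.
records = list(product(range(1, 5), repeat=2))

# Compound key: sort by cust_id first, then prod_id.
compound = to_blocks(sorted(records))
# Interleaved key (assumed Z-order): both columns get equal weight.
interleaved = to_blocks(
    sorted(records, key=lambda r: interleave_bits(r[0] - 1, r[1] - 1))
)

for name, blocks in (("compound", compound), ("interleaved", interleaved)):
    print(name,
          "cust_id=1 blocks:", blocks_touched(blocks, 0, 1),
          "prod_id=1 blocks:", blocks_touched(blocks, 1, 1))
```

Under these assumptions the compound layout keeps each cust_id in one block but spreads each prod_id over four, while the interleaved layout spreads both over two, matching the bullets on the two slides.
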
Editor's notes

  1. Six main reasons to use Amazon EMR.
  2. Amazon EMR is more than just MapReduce. Bootstrap actions are available on GitHub.
  3. In the next few slides, we’ll talk about data persistence models with Amazon EMR. The first pattern is Amazon S3 as HDFS. With this data persistence model, data gets stored on Amazon S3. HDFS does not play any role in storing data. As a matter of fact, HDFS is only there for temporary storage. Another common thing I hear is that storing data on Amazon S3 instead of HDFS slows my job down a lot because data has to get copied to HDFS/disk first before processing starts. That’s incorrect. If you tell Hadoop that your data is on Amazon S3, Hadoop reads directly from Amazon S3 and streams data to mappers without touching the disk. To be completely correct, data does touch HDFS when it has to shuffle from mappers to reducers, but as I mentioned, HDFS acts as the temp space and nothing more. EMRFS is an implementation of HDFS used for reading and writing regular files from Amazon EMR directly to Amazon S3. EMRFS provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side encryption, read-after-write consistency, and list consistency.
  4. And every other feature that comes with Amazon S3, such as SSE, lifecycle policies, etc. And again, keep in mind that Amazon S3 as the storage layer is the main reason why we can build elastic clusters where nodes get added and removed dynamically without any data loss.
  5. In the next few slides, we’ll talk about data persistence models with EMR. The first pattern is Amazon S3 as HDFS. With this data persistence model, data is stored on Amazon S3, and HDFS plays no role in storing persistent data; it is there only for temporary storage. A common objection I hear is that storing data on Amazon S3 instead of HDFS slows a job down a lot because the data has to be copied to HDFS/disk before processing starts. That’s incorrect. If you tell Hadoop that your data is on Amazon S3, Hadoop reads directly from Amazon S3 and streams the data to mappers without touching the disk. To be completely correct, data does touch HDFS when it is shuffled from mappers to reducers, but as I mentioned, HDFS acts as temp space and nothing more. EMRFS is an implementation of HDFS used for reading and writing regular files from Amazon EMR directly to Amazon S3. EMRFS provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side encryption, read-after-write consistency, and list consistency.
  6. In the next few slides, we’ll talk about data persistence models with EMR. The first pattern is Amazon S3 as HDFS. With this data persistence model, data is stored on Amazon S3, and HDFS plays no role in storing persistent data; it is there only for temporary storage. A common objection I hear is that storing data on Amazon S3 instead of HDFS slows a job down a lot because the data has to be copied to HDFS/disk before processing starts. That’s incorrect. If you tell Hadoop that your data is on Amazon S3, Hadoop reads directly from Amazon S3 and streams the data to mappers without touching the disk. To be completely correct, data does touch HDFS when it is shuffled from mappers to reducers, but as I mentioned, HDFS acts as temp space and nothing more. EMRFS is an implementation of HDFS used for reading and writing regular files from Amazon EMR directly to Amazon S3. EMRFS provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side encryption, read-after-write consistency, and list consistency.
  7. EMR example #3: EMR for ETL and query engine for investigations which require all raw data
  8. CloudFront logs arrive out of order.
  9. 200-node cluster: spin it up daily, then shut it down.
  10. Nasdaq: security. HasOffers: loads 60M rows per day in 2-minute intervals. Desk: high-concurrency, user-facing portal (read/write cluster). Amazon.com/NTT: PB scale. Pinterest saw 50-100x speedups when it moved 300 TB from Hadoop to Redshift. Nokia saw a 50% reduction in costs.
  11. Today we will go over the role of Amazon Redshift in addressing the web log analysis problem for one of the largest online retailers, Amazon.com <go over the slide with restated language>
  12. Read only the data you need
  13. Read only the data you need
  14. Read only the data you need
  15. Read only the data you need
  16. Read only the data you need
  17. Comments on next slide.
  18. Redshift is a distributed system. A cluster contains a leader node and compute nodes. A compute node contains slices (one per core) that contain data. Data is distributed among slices in 3 ways: Even (rows distributed in round-robin fashion; the default), Key (rows distributed based on a distribution key, the hash of a defined column), and All (rows distributed to all slices). Queries run on all slices in parallel. Optimal query throughput can be achieved when data is evenly spread across slices.
  19. Redshift leverages sorting in storage. Redshift stores column data in blocks; for the sort key, the data blocks are “marked” with the min and max values of that column, allowing Redshift to skip reading blocks that are not relevant to the current query. Check that join parameter statement is true (best practices on designing tables)
  20. Redshift works with the customer’s BI tool of choice through Postgres drivers and a JDBC or ODBC connection. A number of the partners shown here have certified integration with Redshift, meaning they have done testing to validate or build Redshift integration and make using Redshift easy from a UI perspective. If customers use tools that are not shown, we can work with Redshift on getting them integrated.
  21. So, we started with our MySQL server. But this time we would run SQL statements directly on the server itself to dump the data out to local files. Then, using s3cmd, we copied the flat files into our S3 bucket. Select data from MySQL and use s3cmd to copy these flat files to S3. Use BCP to export data into an EC2 instance, which generates and copies flat files to S3. And then instead of using EMR, we just run some crazy SQL statements to transform the data into the production version of Redshift. Copy data into a staging schema in Redshift where it can be transformed via SQL to the final table structure and loaded into the production schema. Use standard tools, like MicroStrategy and Tableau, to provide business views into the data. And then of course we need a good way for business users to look at the data, and that’s where MicroStrategy and Tableau come into play.
  22. CloudFront logs arrive out of order.
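
Notes 18 and 19 describe how rows are spread across slices and how min/max block markers let a query skip irrelevant blocks. The following is a toy sketch of both ideas in plain Python, not Redshift's actual implementation. The helper names (`distribute`, `build_zone_map`, `blocks_to_read`), the use of CRC32 as the hash, and the block size are all assumptions made for illustration.

```python
import zlib


def distribute(rows, n_slices, style="even", key=None):
    """Assign rows (dicts) to slices using one of the three styles in note 18."""
    slices = [[] for _ in range(n_slices)]
    for i, row in enumerate(rows):
        if style == "even":
            # Round-robin: row i goes to slice i mod n_slices.
            slices[i % n_slices].append(row)
        elif style == "key":
            # Hash of the distribution key column picks the slice.
            # CRC32 is a stand-in; the real hash function is not specified here.
            h = zlib.crc32(str(row[key]).encode("utf-8"))
            slices[h % n_slices].append(row)
        elif style == "all":
            # Every slice receives a copy of the row.
            for s in slices:
                s.append(row)
        else:
            raise ValueError("unknown distribution style: %r" % style)
    return slices


def build_zone_map(sorted_values, block_size):
    """Record (min, max) for each fixed-size block of a sorted column."""
    zone_map = []
    for start in range(0, len(sorted_values), block_size):
        block = sorted_values[start:start + block_size]
        zone_map.append((min(block), max(block)))
    return zone_map


def blocks_to_read(zone_map, lo, hi):
    """Return indexes of blocks whose [min, max] range overlaps [lo, hi]."""
    return [i for i, (bmin, bmax) in enumerate(zone_map)
            if bmax >= lo and bmin <= hi]


if __name__ == "__main__":
    rows = [{"id": i} for i in range(8)]
    print([len(s) for s in distribute(rows, 4, "even")])
    print([len(s) for s in distribute(rows, 4, "key", key="id")])
    zone_map = build_zone_map(list(range(100)), 25)
    print(blocks_to_read(zone_map, 30, 40))
```

With key distribution, rows that share a key value always land on the same slice, which is why a skewed key can leave slices unevenly loaded. With a sorted column, a narrow range predicate overlaps only a few blocks, so the rest are never read.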