© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT
Data lakes and analytics
Giorgio Nobile – AWS Solutions Architect
Francesco Marelli – AWS Solutions Architect
Dario De Agostini – CTO Thron
AWS Summit 2019 - Milan
https://bit.ly/AWSDataLakeMilan
Defining the AWS Data Lake
A data lake is an architecture with a virtually limitless, centralized storage platform capable of categorization, processing, analysis, and consumption of heterogeneous datasets
Key data lake attributes
• Decoupled storage and compute
• Rapid ingest and transformation
• Secure multi-tenancy
• Query in place
• Schema on read
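"Schema on read" means raw data is stored untouched and a schema is applied only when the data is queried. A minimal sketch in plain Python; the log layout and column names are assumptions made for illustration and do not come from the slides:

```python
import csv
import io

# Raw data lands in the lake as-is; no schema is enforced at write time.
RAW_OBJECT = "10.0.0.1,200,512\n10.0.0.2,404,0\n"

# Hypothetical schema chosen at read time: (column name, type converter).
SCHEMA = [("src_ip", str), ("status", int), ("bytes_sent", int)]


def read_with_schema(raw_text, schema):
    """Parse raw CSV text and apply the given schema while reading."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        rows.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
    return rows


rows = read_with_schema(RAW_OBJECT, SCHEMA)
total_bytes = sum(row["bytes_sent"] for row in rows)
```

A different consumer could read the same raw object with a different schema, which is the point of deferring the schema to read time.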
Data lakes help you cost-effectively scale
Store exabytes of data
Stage data from landing dock to transformed to curated, and make it available at each stage
Load, transform, and catalog once
Make data available to many tools
Open formats and interfaces support innovation
[Diagram labels: Snowball; Snowmobile; Kinesis Data Firehose; Kinesis Data Streams; Kinesis Video Streams; Amazon S3; Amazon Redshift; Amazon EMR; Athena; Amazon Kinesis; Amazon Elasticsearch Service; AI Services; Amazon QuickSight]
How it works: Data Lakes and analytics on AWS
[Diagram labels: sources (OLTP, ERP, CRM, LOB, Devices, Web, Sensors, Social); Kinesis; Amazon S3; IAM; KMS; Data Catalog; Athena; Amazon Redshift; AI Services; Amazon EMR; Amazon QuickSight]
Build Data Lakes quickly
• Identify, crawl, and catalog sources
• Ingest and clean data
• Transform into optimal formats
Simplify security management
• Enforce encryption
• Define access policies
• Implement audit logging
Enable self-service and combined analytics
• Analysts discover all data available for analysis from a single data catalog
• Use multiple analytics tools over the same data
Why Amazon S3 for the Data Lake?
• High Performance
• Secure
• Durable
• Available
• Easy to use
• Scalable & Affordable
• Integrated
Amazon Kinesis—Real Time
Easily collect, process, and analyze video and data streams in real time
Kinesis Video Streams: capture, process, and store video streams for analytics
Kinesis Data Streams: build custom applications that analyze data streams
Kinesis Data Firehose: load data streams into AWS data stores
Kinesis Data Analytics: analyze data streams with SQL
Processing and Querying In Place
Fully Managed Process & Query
• Catalog, Transform, & Query Data in Amazon S3
• No physical instances to manage
User-Defined Functions (Lambda function)
• Bring your own functions & code
• Execute without provisioning servers
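As an illustration of "bring your own functions" running without provisioned servers, here is a minimal sketch of a Python AWS Lambda handler that reads the bucket and key out of an S3 event notification. The handler name, the sample bucket, and the sample key are hypothetical; the slides do not show any handler code.

```python
import urllib.parse


def handler(event, context):
    """Return the (bucket, key) pairs named in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Local dry run with a hand-written event; the bucket and key are made up.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "logs/2019/file+1.csv"}}}
    ]
}
result = handler(sample_event, None)
```

A real function would go on to process each object; this sketch stops at identifying which objects triggered it.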
Amazon S3 Select and Amazon Glacier Select
Select a subset of data from an object using a SQL expression
Motivation Behind Amazon S3 Select
Without S3 Select: "GET all the data from S3 objects, and my application will filter the data that I need"
Redshift Spectrum example:
Customer: ran 50,000 queries
Amount of data fetched from S3: 6 PB
Amount of data used in Amazon Redshift: 650 TB
Data needed from S3: about 10%
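The "10%" figure follows from the two volumes quoted above. A quick check, assuming 1 PB = 1000 TB (the slide does not state which convention it uses):

```python
# Figures quoted on the slide.
fetched_tb = 6 * 1000   # 6 PB fetched from S3, taking 1 PB = 1000 TB (assumption)
used_tb = 650           # 650 TB actually used in Amazon Redshift

used_fraction = used_tb / fetched_tb
wasted_tb = fetched_tb - used_tb
print(f"used: {used_fraction:.1%}, transferred but discarded: {wasted_tb} TB")
```

Roughly nine tenths of the transferred bytes were filtered out after download, which is the work S3 Select moves to the storage side.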
Amazon S3 Select: Serverless MapReduce
Before: 200 seconds and 11.2 cents
    # Download and process all keys
    for key in src_keys:
        response = s3_client.get_object(Bucket=src_bucket, Key=key)
        contents = response['Body'].read().decode('utf-8')
        for line in contents.split('\n')[:-1]:
            line_count += 1
            try:
                data = line.split(',')
                srcIp = data[0][:8]
                ….
After: 95 seconds and 2.8 cents
    # Select IP address and keys
    for key in src_keys:
        response = s3_client.select_object_content(
            Bucket=src_bucket, Key=key, ExpressionType='SQL',
            Expression="SELECT SUBSTR(obj._1, 1, 8), obj._2 FROM s3object AS obj",
            InputSerialization={'CSV': {}},
            OutputSerialization={'CSV': {}})
        # The result arrives as an event stream under 'Payload', not as a 'Body'
        contents = b''.join(event['Records']['Payload']
                            for event in response['Payload'] if 'Records' in event)
        for line in contents.decode('utf-8').splitlines():
            line_count += 1
            try:
                ….
2X Faster at 1/5 of the cost
Amazon Athena—Interactive Analysis
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Supports Multiple Data Formats – Define Schema on Demand
Query instantly · Pay per query · Open · Easy
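To make "no infrastructure to set up" concrete, here is a minimal sketch of submitting a query to Athena and polling for completion. It takes the client as a parameter so it can be exercised without AWS access; the function name, database, table, and bucket names are placeholders, not taken from the deck.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Start a query, wait for a terminal state, and return (query_id, state)."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   run_athena_query(boto3.client("athena"), "SELECT COUNT(*) FROM events",
#                    "demo_db", "s3://example-bucket/athena-results/")
```

Results are written to the S3 output location; a production version would likely add a timeout and surface the failure reason.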
Choosing the Right Data Formats
There is no such thing as the “best” data format
• All involve tradeoffs, depending on workload & tools
• CSV, TSV, JSON are easy, but not efficient
  • Compress & store/archive as raw input
• Columnar compressed formats are generally preferred
  • Parquet or ORC
  • Smaller storage footprint = lower cost
  • More efficient scan & query
• Row-oriented (Avro) is good for full data scans
Key considerations are cost, performance & support
Choosing the Right Data Formats (cont'd)
Pay by the amount of data scanned per query
Use Compressed Columnar Formats
• Parquet
• ORC
Easy to integrate with wide variety of tools
Dataset                                  Size on Amazon S3        Query run time   Data scanned             Cost
Logs stored as text files                1 TB                     237 seconds      1.15 TB                  $5.75
Logs stored in Apache Parquet format*    130 GB                   5.13 seconds     2.69 GB                  $0.013
Savings                                  87% less with Parquet    34x faster       99% less data scanned    99.7% cheaper
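The cost column can be reproduced from the data-scanned column. The sketch below assumes a price of $5 per TB scanned, which is implied by the table's first row ($5.75 for 1.15 TB) rather than stated on the slide, and assumes 1 TB = 1024 GB:

```python
PRICE_PER_TB_USD = 5.0   # assumption: implied by $5.75 for 1.15 TB in the table
GB_PER_TB = 1024         # assumption: binary convention


def query_cost_usd(scanned_gb):
    """Cost of one query under pay-per-data-scanned pricing."""
    return scanned_gb / GB_PER_TB * PRICE_PER_TB_USD


text_cost = query_cost_usd(1.15 * GB_PER_TB)   # logs stored as text files
parquet_cost = query_cost_usd(2.69)            # same logs stored as Parquet
saving = 1 - parquet_cost / text_cost
```

Because the price is tied to bytes scanned, the columnar format's smaller scan is what drives the cost difference.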
Data Prep is ~80% of Data Lake Work
Building training sets
Cleaning and organizing data
Collecting data sets
Mining data for patterns
Refining algorithms
Other
AWS Glue—Serverless Data Catalog & ETL
Data Catalog: discover data and extract schema
ETL job authoring: auto-generates customizable ETL code in Python and Spark
Automatically discovers data and stores schema
Data searchable, and available for ETL
Generates customizable code
Schedules and runs your ETL jobs
Serverless
AWS Lake Formation (join the preview)
Build, secure, and manage a data lake in days
Build a data lake in days, not months
Build and deploy a fully managed data lake with a few clicks
Enforce security policies across multiple services
Centrally define security, governance, and auditing policies in one place and enforce those policies for all users and all applications
Combine different analytics approaches
Empower analyst and data scientist productivity, giving them self-service discovery and safe access to all data from a single catalog
Traditionally, analytics looked like this
Expensive: large initial capex + $10k-$50k/TB/year
GBs-TBs scale [not designed for PB/EBs]
Relational data
90% of data was thrown away because of cost
OLTP ERP CRM LOB
Data Warehouse
Business Intelligence
Data Lakes evolve the traditional approach
[Diagram labels: OLTP, ERP, CRM, LOB; Data Warehouse; Business Intelligence; Devices, Web, Sensors, Social; Data Lake; Catalog; Machine Learning; DW Queries; Big data processing; Interactive; Real-time]
Relational and non-relational data
TBs-EBs scale
Schema defined during analysis
Diverse analytical engines to gain insights
Designed for low-cost storage and analytics
What does data warehouse modernization mean?
Easy to use: don't waste time on menial administrative tasks and maintenance
Extends to your Data Lake: directly analyze data stored in your data lake in open formats
Any scale of data, workloads, and users: dynamically scale up to guarantee performance even with unpredictable demands and data volumes
Faster time-to-insights: consistently fast performance, even with thousands of concurrent queries and users
Amazon Redshift
Fast, simple, cost-effective data warehouse that can extend queries to your Data Lake
Fastest: get faster time-to-insight for all types of analytics workloads; powered by machine learning, columnar storage, and MPP
Unlimited scale: dynamically scale up to guarantee performance even with unpredictable analytical demands and data volumes
Extends your Data Lake: analyze data in the Amazon S3 Data Lake in place and in open formats, together with data loaded into Redshift's high-performance SSDs
1/10th the cost: start at $0.25 per hour, save costs with automated administration tasks, and eliminate business impact due to downtime; as low as $1,000 per terabyte per year
Analyze data in open formats such as Parquet, ORC, and JSON, using SQL tools
Amazon Redshift architecture
Leader Node
• Simple SQL end point
• Stores metadata
• Optimizes query plan
• Coordinates query execution
Compute Nodes
• Local columnar storage
• Parallel/distributed execution of all queries, loads, backups, restores, resizes
Start at just $0.25/hour
DC1: SSD; scale from 160 GB to 326 TB
DS2: HDD; scale from 2 TB to 2 PB
[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion / Backup / Restore]
Security is built-in
Network isolation
End-to-end encryption
Integration with AWS Key Management Service
Select compliance certifications*
[Diagram labels: JDBC/ODBC; Customer VPC; Internal VPC; Leader Node; Compute Nodes; 10 GigE (HPC); Amazon S3]
Concurrency Scaling for bursts of user activity (Preview)
Automatically creates more clusters on demand
Consistently fast performance even with thousands of concurrent queries
No advance hydration required
Quickly scale to serve changing query workload
[Diagram labels: Caching Layer; Backup; Redshift Managed S3]
Amazon Redshift Elastic Resize (GA)
Scale up and down in minutes
Adds additional nodes to Redshift cluster
Distributes data across new configuration in minutes
Minimal transition time
Scale compute and storage on demand
[Diagram labels: Redshift Cluster; JDBC/ODBC; Leader Node; CN1, CN2, CN3, CN4; Backup; Redshift Managed S3]
Amazon Redshift intelligent administration
Automates data distribution in tables for improved performance and disk space utilization.
Provides intelligent recommendations for tuning based on continuous workload analysis.
[Diagram: ALL, EVEN, and KEY distribution styles, each shown across Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4), with keys keyA, keyB, keyC, keyD and a recommended distribution key; label: Advise]
No more messing with distkeys!
Coming Soon!
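The three distribution styles named on this slide (ALL, EVEN, KEY) can be pictured with a small simulation over the two-node, four-slice layout in the diagram. This is a conceptual sketch only: the function names are hypothetical, and crc32 stands in for whatever hash Redshift actually uses.

```python
import zlib

SLICES = ["slice1", "slice2", "slice3", "slice4"]      # 2 nodes x 2 slices
NODES = {"node1": ["slice1", "slice2"], "node2": ["slice3", "slice4"]}


def distribute_even(rows):
    """EVEN: spread rows round-robin over all slices."""
    placement = {s: [] for s in SLICES}
    for i, row in enumerate(rows):
        placement[SLICES[i % len(SLICES)]].append(row)
    return placement


def distribute_key(rows, key):
    """KEY: rows with the same key value always land on the same slice."""
    placement = {s: [] for s in SLICES}
    for row in rows:
        # crc32 is a stand-in hash, not the function Redshift uses.
        index = zlib.crc32(str(row[key]).encode()) % len(SLICES)
        placement[SLICES[index]].append(row)
    return placement


def distribute_all(rows):
    """ALL: every node holds a full copy of the table."""
    return {node: list(rows) for node in NODES}


rows = [{"k": k, "v": i} for i, k in enumerate(["keyA", "keyB", "keyC", "keyD"] * 2)]
even = distribute_even(rows)
by_key = distribute_key(rows, "k")
everywhere = distribute_all(rows)
```

KEY keeps matching rows together, which helps joins on that key but can skew slices; EVEN balances storage; ALL trades space for local joins. Choosing among them is the decision the slide says will be automated.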
Amazon Redshift intelligent maintenance
Auto Vacuum, Auto Analyze, Auto WLM Concurrency Setting
Maintenance processes like vacuum and analyze will automatically run in the background.
Redshift will automatically adjust the WLM concurrency setting to deliver optimal throughput.
Moving towards zero-maintenance.
Coming Soon!
Run stored procedures in Amazon Redshift
Bring your existing stored procedures and run them in Amazon Redshift.
Amazon Redshift will support stored procedures in PL/pgSQL format, enabling you to bring your existing stored procedures to Amazon Redshift.
Migrating to Amazon Redshift is even easier!
Coming Soon!
where the data is to efficiently run ETL, data validation, and custom business logic.
THE INTELLIGENT DAM PLATFORM
SPEAKER
DARIO DE AGOSTINI
CTO & Co-Founder THRON
https://www.linkedin.com/in/dariodeagostini/
With the support of Artificial Intelligence, THRON lets you reduce the management costs of all the human activities tied to the entire content lifecycle.
SUCCESS STORIES
THRON lets you control the entire content lifecycle:
THRON was included in Forrester's Landscape as an emerging vendor of a cutting-edge DAM for marketers, because it demonstrates advanced analytics and intelligence capabilities. – Nick Barber, Senior Analyst
CONTENT WORKFLOW
THE NEED
DATA VOLUME
[Chart: events processed, axis from 1,200,000,000 to 2,200,000,000]
100 million new events per month
Retention makes the data volume grow
UNPREDICTABLE LOAD
4x
ARCHITECTURE 1/4
ARCHITECTURE 2/4
ARCHITECTURE 3/4
ARCHITECTURE 4/4
BENEFITS ACHIEVED
Efficient resource use: the ES cluster goes from 4 I3.2xlarge instances for data load to 3 I3.large instances for serving. Spot Instances are used for EMR.
Drastic reduction in development time: Data Pipeline abstracts data-flow management and makes evolution very easy, with excellent handling of timeouts and retries.
Lower "ancillary" costs: alarms via SNS and centralized logging.
Scalability: Kinesis and Lambda provide great scalability for real-time data processing.
Accessible data exploration: Athena saves us about one person-day per month for less than $50/month of spend.
Resilience and high availability: thanks to the use of containers on ECS.
Built in fewer than 7 person-days
FOLLOW US
Follow us on Medium: https://medium.com/thron-tech
Join us, we're hiring: https://www.thron.com/en/about/careers
https://bit.ly/AWSDataLakeMilan
JSON Payload Example for each event sent
{
"r": 255,
"g": 0,
"b": 0,
"c": "Red",
"device": {
"id": "4992157",
"browser": "Chrome",
"browserVersion": "72.0.3626.109",
"os": "Mac OS",
"isMobile": false,
"isMobileIOS": false,
"isMobileAndroid": false
},
"dt": {
"year": 2019,
"month": 2,
"day": 25,
"hour": 18,
"minutes": 43,
"seconds": 47,
"millis": 725
},
"id": 1551116627725,
"region": "Outside Italy",
"awsExperience": "1-3 Years",
"awsServiceArea": "Management Tools"
}
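A payload like the one above has to be serialized before it is handed to Kinesis Data Firehose. The deck does not show the sending code (the demo uses the browser JS SDK), so the following Python sketch is an assumption: it writes each event as one newline-terminated JSON line, a common convention that keeps one event per line in the delivered S3 objects. The function name and stream name are hypothetical.

```python
import json


def to_firehose_record(event):
    """Serialize one event as newline-terminated JSON bytes."""
    line = json.dumps(event, separators=(",", ":")) + "\n"
    return {"Data": line.encode("utf-8")}


# Trimmed copy of the payload shown above.
event = {"r": 255, "g": 0, "b": 0, "c": "Red", "id": 1551116627725}
record = to_firehose_record(event)

# Usage (requires boto3 and AWS credentials; the stream name is a placeholder):
#   import boto3
#   boto3.client("firehose").put_record(
#       DeliveryStreamName="example-stream", Record=record)
```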
Demo Application Architecture
[Diagram labels: Users (client, mobile client); AWS Browser JS SDK; Amazon CloudFront; Amazon Cognito; Amazon S3 (static web application); Amazon Kinesis Data Firehose; S3 bucket; AWS Glue; Amazon Athena; Amazon QuickSight; AWS Cloud; Region]
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Module 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSModule 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWS
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS. IPExpo Manchester.
Building Data Lakes and Analytics on AWS. IPExpo Manchester.Building Data Lakes and Analytics on AWS. IPExpo Manchester.
Building Data Lakes and Analytics on AWS. IPExpo Manchester.
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLPreparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Creating and Managing Data Lakes and Data Warehouses (Creare e gestire Data Lake e Data Warehouses)

  • 1. Data lakes and analytics. AWS Summit 2019, Milan. Speakers: Giorgio Nobile (AWS Solutions Architect), Francesco Marelli (AWS Solutions Architect), Dario De Agostini (CTO, Thron). © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 2. https://bit.ly/AWSDataLakeMilan
  • 3. Defining the AWS Data Lake. A data lake is an architecture with a virtually limitless centralized storage platform capable of categorization, processing, analysis, and consumption of heterogeneous datasets.
       Key data lake attributes:
       • Decoupled storage and compute
       • Rapid ingest and transformation
       • Secure multi-tenancy
       • Query in place
       • Schema on read
  • 4. Data lakes help you cost-effectively scale.
       • Store exabytes of data
       • Stage data from landing dock to transformed to curated, and make it available in each stage
       • Load, transform, and catalog once
       • Make data available to many tools
       • Open formats and interfaces support innovation
       Services shown: Snowball, Snowmobile, Kinesis Data Firehose, Kinesis Data Streams, Kinesis Video Streams, Amazon S3, Amazon Redshift, Amazon EMR, Athena, Amazon Kinesis, Amazon Elasticsearch Service, AI Services, Amazon QuickSight.
  • 5. How it works: Data Lakes and analytics on AWS.
       Build Data Lakes quickly:
       • Identify, crawl, and catalog sources
       • Ingest and clean data
       • Transform into optimal formats
       Simplify security management:
       • Enforce encryption
       • Define access policies
       • Implement audit logging
       Enable self-service and combined analytics:
       • Analysts discover all data available for analysis from a single data catalog
       • Use multiple analytics tools over the same data
       Diagram labels: sources (OLTP, ERP, CRM, LOB, Devices, Web, Sensors, Social), Kinesis, S3, IAM, KMS, Data Catalog, Amazon S3, Athena, Amazon Redshift, AI Services, Amazon EMR, Amazon QuickSight.
  • 6. Why Amazon S3 for the Data Lake? High performance, secure, durable, available, easy to use, scalable and affordable, integrated.
  • 7. Amazon Kinesis: Real Time. Easily collect, process, and analyze video and data streams in real time.
       • Kinesis Video Streams: capture, process, and store video streams for analytics
       • Kinesis Data Streams: build custom applications that analyze data streams
       • Kinesis Data Firehose: load data streams into AWS data stores
       • Kinesis Data Analytics: analyze data streams with SQL
  • 8. Processing and Querying In Place.
       Fully managed process & query:
       • Catalog, transform, and query data in Amazon S3
       • No physical instances to manage
       User-defined functions (Lambda function):
       • Bring your own functions and code
       • Execute without provisioning servers
  • 9. Amazon S3 Select and Amazon Glacier Select. Select a subset of data from an object based on a SQL expression.
  • 10. Motivation Behind Amazon S3 Select. "GET all the data from S3 objects, and my application will filter the data that I need."
       Redshift Spectrum example:
       • Customer runs 50,000 queries
       • Amount of data fetched from S3: 6 PB
       • Amount of data used in Amazon Redshift: 650 TB
       • Data needed from S3: 10%
  • 11. Amazon S3 Select: Serverless MapReduce. 2x faster at 1/5 of the cost.
       Before (200 seconds and 11.2 cents):
           # Download and process all keys
           for key in src_keys:
               response = s3_client.get_object(Bucket=src_bucket, Key=key)
               contents = response['Body'].read().decode('utf-8')
               for line in contents.split('\n')[:-1]:
                   line_count += 1
                   try:
                       data = line.split(',')
                       srcIp = data[0][:8]
                       ….
       After (95 seconds and 2.8 cents):
           # Select IP address and keys
           for key in src_keys:
               response = s3_client.select_object_content(
                   Bucket=src_bucket, Key=key,
                   ExpressionType='SQL',
                   Expression="SELECT SUBSTR(obj._1, 1, 8), obj._2 FROM s3object AS obj",
                   InputSerialization={'CSV': {}},
                   OutputSerialization={'CSV': {}})
               # The result is an event stream; only 'Records' events carry rows
               contents = b''.join(event['Records']['Payload']
                                   for event in response['Payload']
                                   if 'Records' in event).decode('utf-8')
               for line in contents.split('\n')[:-1]:
                   line_count += 1
                   try:
                       ….
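The S3 Select call on slide 11 can be wrapped in a small helper. The following is a minimal sketch, assuming CSV objects without a header row; the function name `select_ip_prefixes` is hypothetical, and the client is passed in as an argument so the helper can be exercised without AWS credentials.

```python
def select_ip_prefixes(s3_client, bucket, key):
    """Return the rows S3 Select produces for one CSV object, as strings.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression="SELECT SUBSTR(obj._1, 1, 8), obj._2 FROM s3object AS obj",
        InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is an event stream; only "Records" events carry data,
    # and a row may be split across two consecutive events.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    text = b"".join(chunks).decode("utf-8")
    return [line for line in text.split("\n") if line]
```

With real credentials this would be called as `select_ip_prefixes(boto3.client("s3"), bucket, key)`. Joining the chunks before splitting on newlines avoids counting a row twice when it straddles two events.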
  • 12. Amazon Athena: Interactive Analysis. Interactive query service to analyze data in Amazon S3 using standard SQL. No infrastructure to set up or manage and no data to load. Supports multiple data formats; define schema on demand.
       • Query instantly
       • Pay per query
       • Open
       • Easy
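To make "query instantly, no data to load" concrete, here is a minimal sketch of submitting a SQL statement to Athena through boto3 and reading back the first page of results. The helper name `run_athena_query` and its arguments are hypothetical; the client is injected so the flow can be tried against a stub.

```python
import time


def run_athena_query(athena_client, sql, database, output_location, poll_seconds=1.0):
    """Start a query, wait for a terminal state, and return rows as lists of strings.

    `athena_client` is expected to behave like boto3.client("athena").
    """
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # First page of results only; a NULL cell has no "VarCharValue" key.
    results = athena_client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

For SELECT statements the first row returned is typically the column header row, so callers usually skip it. Larger result sets would need pagination, which this sketch leaves out.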
  • 13. Choosing the Right Data Formats. There is no such thing as the "best" data format.
       • All involve tradeoffs, depending on workload and tools
       • CSV, TSV, JSON are easy, but not efficient
           • Compress and store/archive as raw input
       • Columnar compressed formats are generally preferred
           • Parquet or ORC
           • Smaller storage footprint = lower cost
           • More efficient scan and query
       • Row-oriented (AVRO) is good for full data scans
       Key considerations are cost, performance, and support.
  • 14. Choosing the Right Data Formats (cont'd). Pay by the amount of data scanned per query. Use compressed columnar formats (Parquet, ORC). Easy to integrate with a wide variety of tools.
       Dataset                               | Size on Amazon S3     | Query run time | Data scanned          | Cost
       Logs stored as text files             | 1 TB                  | 237 seconds    | 1.15 TB               | $5.75
       Logs stored in Apache Parquet format* | 130 GB                | 5.13 seconds   | 2.69 GB               | $0.013
       Savings                               | 87% less with Parquet | 34x faster     | 99% less data scanned | 99.7% cheaper
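The cost column on slide 14 follows from pay-per-data-scanned pricing. The sketch below recomputes the figures, assuming a price of $5 per TB scanned (implied by $5.75 for 1.15 TB) and binary units; both constants are assumptions, not values stated on the slide. The costs and savings percentages come out in line with the slide, while the run times as transcribed (237 s and 5.13 s) work out to roughly 46x rather than the stated 34x, so one of those figures may have been garbled.

```python
PRICE_PER_TB_USD = 5.0   # assumption: implied by $5.75 for 1.15 TB scanned
GB_PER_TB = 1024         # assumption: binary units


def scan_cost_usd(gb_scanned):
    """Cost of a query that scans the given number of gigabytes."""
    return gb_scanned / GB_PER_TB * PRICE_PER_TB_USD


def percent_less(before, after):
    """How much smaller `after` is than `before`, as a percentage."""
    return (1 - after / before) * 100


text_scanned_gb = 1.15 * GB_PER_TB
parquet_scanned_gb = 2.69

text_cost = scan_cost_usd(text_scanned_gb)
parquet_cost = scan_cost_usd(parquet_scanned_gb)
speedup = 237 / 5.13

print(f"text cost:      ${text_cost:.2f}")
print(f"parquet cost:   ${parquet_cost:.3f}")
print(f"storage saved:  {percent_less(1 * GB_PER_TB, 130):.0f}%")
print(f"scan reduction: {percent_less(text_scanned_gb, parquet_scanned_gb):.1f}%")
print(f"cost reduction: {percent_less(text_cost, parquet_cost):.1f}%")
print(f"speedup:        {speedup:.0f}x")
```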
  • 15. Data Prep is ~80% of Data Lake Work. Chart categories: building training sets; cleaning and organizing data; collecting data sets; mining data for patterns; refining algorithms; other.
  • 16. AWS Glue: Serverless Data Catalog & ETL.
       Data Catalog: discover data and extract schema.
       ETL job authoring: auto-generates customizable ETL code in Python and Spark.
       • Automatically discovers data and stores schema
       • Data searchable, and available for ETL
       • Generates customizable code
       • Schedules and runs your ETL jobs
       • Serverless
  • 17. AWS Lake Formation (join the preview). Build, secure, and manage a data lake in days.
       • Build a data lake in days, not months: build and deploy a fully managed data lake with a few clicks.
       • Enforce security policies across multiple services: centrally define security, governance, and auditing policies in one place and enforce those policies for all users and all applications.
       • Combine different analytics approaches: empower analyst and data scientist productivity, giving them self-service discovery and safe access to all data from a single catalog.
  • 18. Traditionally, analytics looked like this. Sources (OLTP, ERP, CRM, LOB) feed a data warehouse, which feeds business intelligence.
       • Relational data
       • GBs-TBs scale [not designed for PB/EBs]
       • Expensive: large initial capex + $10k-$50k/TB/year
       • 90% of data was thrown away because of cost
  • 19. Data Lakes evolve the traditional approach.
       • Relational and non-relational data
       • TBs-EBs scale
       • Schema defined during analysis
       • Diverse analytical engines to gain insights
       • Designed for low-cost storage and analytics
       Diagram labels: OLTP, ERP, CRM, LOB, Data Warehouse, Business Intelligence; Devices, Web, Sensors, Social, Data Lake, Catalog; Machine Learning, DW Queries, Big data processing, Interactive, Real-time.
  • 20. What does data warehouse modernization mean?
       • Easy to use: don't waste time on menial administrative tasks and maintenance.
       • Extends to your Data Lake: directly analyze data stored in your data lake in open formats.
       • Any scale of data, workloads, and users: dynamically scale up to guarantee performance even with unpredictable demands and data volumes.
       • Faster time-to-insights: consistently fast performance, even with thousands of concurrent queries and users.
  • 21. Amazon Redshift. Fast, simple, cost-effective data warehouse that can extend queries to your Data Lake.
       • Fastest: get faster time-to-insight for all types of analytics workloads; powered by machine learning, columnar storage, and MPP.
       • Unlimited scale: dynamically scale up to guarantee performance even with unpredictable analytical demands and data volumes.
       • Extends your Data Lake: analyze data in the Amazon S3 Data Lake in place and in open formats such as Parquet, ORC, and JSON, using SQL tools, together with data loaded into Redshift's high-performance SSDs.
       • 1/10th the cost: start at $0.25 per hour, save costs with automated administration tasks, and eliminate business impact due to downtime; as low as $1,000 per terabyte per year.
  • 22. Amazon Redshift architecture.
       Leader node (reached over JDBC/ODBC):
       • Simple SQL endpoint
       • Stores metadata
       • Optimizes query plan
       • Coordinates query execution
       Compute nodes (connected over 10 GigE (HPC)):
       • Local columnar storage
       • Parallel/distributed execution of all queries, loads, backups, restores, resizes
       • Ingestion, backup, restore
       Start at just $0.25/hour. DC1: SSD; scale from 160 GB to 326 TB. DS2: HDD; scale from 2 TB to 2 PB.
  • 23. Security is built-in.
       • Network isolation
       • End-to-end encryption
       • Integration with AWS Key Management Service
       • Select compliance certifications*
       Diagram labels: Customer VPC, Internal VPC, JDBC/ODBC, Leader Node, Compute Nodes, 10 GigE (HPC), Amazon S3.
  • 24. Concurrency Scaling for bursts of user activity (Preview).
       • Automatically creates more clusters on demand
       • Consistently fast performance even with thousands of concurrent queries
       • No advance hydration required
       • Quickly scale to serve changing query workload
       Diagram labels: Caching Layer, Backup, Redshift Managed S3.
  • 25. Amazon Redshift Elastic Resize (GA). Scale up and down in minutes.
       • Adds additional nodes to Redshift cluster
       • Distributes data across new configuration in minutes
       • Minimal transition time
       • Scale compute and storage on demand
       Diagram labels: JDBC/ODBC, Leader Node, compute nodes CN1 to CN4, Redshift Cluster, Backup, Redshift Managed S3.
  • 26. Amazon Redshift intelligent administration (Coming Soon!). No more messing with distkeys!
       • Automates data distribution in tables for improved performance and disk space utilization.
       • Provides intelligent recommendations for tuning based on continuous workload analysis.
       • Advise: recommended distribution key.
       Distribution styles shown, each across Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4): ALL, EVEN, KEY (keyA, keyB, keyC, keyD).
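The three distribution styles named on slide 26 can be illustrated with a toy placement function. This is only a sketch of the idea on the slide's two-node, four-slice layout: the CRC32 hash is a stand-in (not Redshift's actual hash function), and the names `distribute`, `SLICES`, and `NODES` are hypothetical.

```python
import zlib

# Two nodes with two slices each, mirroring the layout drawn on the slide.
NODES = {"node1": ["slice1", "slice2"], "node2": ["slice3", "slice4"]}
SLICES = [s for slices in NODES.values() for s in slices]


def distribute(rows, style, key=None):
    """Return a dict mapping each slice to the rows it would hold."""
    placement = {s: [] for s in SLICES}
    if style == "EVEN":
        # Round-robin: rows are spread evenly regardless of their values.
        for i, row in enumerate(rows):
            placement[SLICES[i % len(SLICES)]].append(row)
    elif style == "KEY":
        # Rows with the same key value always land on the same slice.
        for row in rows:
            h = zlib.crc32(str(row[key]).encode("utf-8"))  # stand-in hash
            placement[SLICES[h % len(SLICES)]].append(row)
    elif style == "ALL":
        # One full copy of the table per node (placed on its first slice here).
        for slices in NODES.values():
            placement[slices[0]].extend(rows)
    else:
        raise ValueError(f"unknown distribution style: {style}")
    return placement


rows = [{"id": i, "k": k} for i, k in enumerate(["keyA", "keyB", "keyC", "keyD"] * 2)]
for style in ("EVEN", "KEY", "ALL"):
    sizes = {s: len(r) for s, r in distribute(rows, style, key="k").items()}
    print(style, sizes)
```

The trade-off the slide hints at is visible in the sizes: EVEN balances storage, KEY co-locates rows that share a key (useful for joins, but skewed if keys are uneven), and ALL multiplies storage by the number of nodes.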
  • 27. Amazon Redshift intelligent maintenance (Coming Soon!). Auto Vacuum, Auto Analyze, Auto WLM concurrency setting.
       • Maintenance processes like vacuum and analyze will automatically run in the background.
       • Redshift will automatically adjust the WLM concurrency setting to deliver optimal throughput.
       Moving towards zero maintenance.
  • 28. Run stored procedures in Amazon Redshift (Coming Soon!). Bring your existing stored procedures and run them in Amazon Redshift. Amazon Redshift will support stored procedures in PL/pgSQL format, enabling you to bring your existing stored procedures to Amazon Redshift. Migrating to Amazon Redshift is even easier! ... where the data is, to efficiently run ETL, data validation, and custom business logic.
  • 29. THE INTELLIGENT DAM PLATFORM. Speaker: Dario De Agostini, CTO & Co-Founder, THRON. https://www.linkedin.com/in/dariodeagostini/
  • 30. With the support of artificial intelligence, THRON lets you reduce the cost of managing all the human activities tied to the entire content lifecycle.
  • 31. Success stories
  • 32. Content workflow. THRON lets you control the entire content lifecycle. "THRON has been included in the Forrester's Landscape as an emerging provider of a cutting-edge DAM for marketers, because it demonstrates advanced analytics and intelligence capabilities." (Nick Barber, Senior Analyst)
  • 33. The need
  • 34. Data volume. 100 million new events per month; retention makes the data volume grow. [Chart: events processed, axis from 1,200,000,000 to 2,200,000,000]
  • 35. Unpredictable load: 4x
  • 36. Architecture 1/4
  • 37. Architecture 2/4
  • 38. Architecture 3/4
  • 39. Architecture 4/4
  • 40. Benefits obtained. Built in less than 7 person-days.
       • Efficient resource usage: the ES cluster goes from 4 I3.2xlarge instances for data load to 3 I3.large instances for serving. Spot instances are used for EMR.
       • Drastic reduction in development time: Data Pipeline abstracts data flow management and makes evolution very easy, with excellent handling of timeouts and retries.
       • Lower "ancillary" costs: alarms via SNS and centralized logging.
       • Scalability: Kinesis and Lambda provide great scalability for real-time data processing.
       • Accessible data exploration: Athena saves us about one person-day per month for less than $50/month.
       • Resilience and high availability: thanks to the use of containers on ECS.
  • 41. Follow us on Medium: https://medium.com/thron-tech. Join us, we're hiring: https://www.thron.com/en/about/careers
  • 42. https://bit.ly/AWSDataLakeMilan
  • 43. JSON payload example for each event sent:
       {
         "r": 255,
         "g": 0,
         "b": 0,
         "c": "Red",
         "device": {
           "id": "4992157",
           "browser": "Chrome",
           "browserVersion": "72.0.3626.109",
           "os": "Mac OS",
           "isMobile": false,
           "isMobileIOS": false,
           "isMobileAndroid": false
         },
         "dt": {
           "year": 2019,
           "month": 2,
           "day": 25,
           "hour": 18,
           "minutes": 43,
           "seconds": 47,
           "millis": 725
         },
         "id": 1551116627725,
         "region": "Outside Italy",
         "awsExperience": "1-3 Years",
         "awsServiceArea": "Management Tools"
       }
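A short sketch of how a consumer might read this payload follows. It assumes that `id` is a timestamp in milliseconds since the Unix epoch (the slide does not say so, but the value decodes to 17:43:47.725 UTC on 2019-02-25, which is consistent with the `dt` block being local time one hour ahead of UTC). The helper names `event_time_utc` and `flatten` are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

SAMPLE_EVENT = json.loads("""
{"r": 255, "g": 0, "b": 0, "c": "Red",
 "device": {"id": "4992157", "browser": "Chrome", "browserVersion": "72.0.3626.109",
            "os": "Mac OS", "isMobile": false, "isMobileIOS": false,
            "isMobileAndroid": false},
 "dt": {"year": 2019, "month": 2, "day": 25, "hour": 18,
        "minutes": 43, "seconds": 47, "millis": 725},
 "id": 1551116627725, "region": "Outside Italy",
 "awsExperience": "1-3 Years", "awsServiceArea": "Management Tools"}
""")


def event_time_utc(event):
    """Interpret the event id as epoch milliseconds (an assumption) and return UTC time."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=event["id"])


def flatten(event, prefix=""):
    """Turn nested objects into flat column names such as device_browser."""
    flat = {}
    for name, value in event.items():
        column = f"{prefix}{name}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}_"))
        else:
            flat[column] = value
    return flat


print(event_time_utc(SAMPLE_EVENT).isoformat())
print(sorted(flatten(SAMPLE_EVENT)))
```

Flattening is optional: query engines can usually address nested fields directly, but flat column names tend to be easier to chart in a BI tool.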
  • 44. Demo Application Architecture. Components shown: Users (client, mobile client); Amazon CloudFront; static web application in an S3 bucket; AWS Browser JS SDK; Amazon Cognito; Amazon Kinesis Data Firehose; Amazon S3; AWS Glue; Amazon Athena; Amazon QuickSight; all within an AWS Cloud Region.
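In the demo the events are sent from the browser with the AWS JS SDK. As a stand-in, here is a minimal Python sketch of the same step using the boto3 Firehose `put_record` call; the function name `send_event` and the stream name in the usage note are hypothetical, and the client is injected so the helper can be tested without AWS access.

```python
import json


def send_event(firehose_client, stream_name, event):
    """Send one event to a Firehose delivery stream as newline-terminated JSON.

    `firehose_client` is expected to behave like boto3.client("firehose").
    """
    # One compact JSON object per line, so that files delivered to S3
    # can be read as line-delimited JSON by downstream query engines.
    data = (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")
    response = firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": data},
    )
    return response["RecordId"]
```

With real credentials this would be called as `send_event(boto3.client("firehose"), "demo-events", payload)`. Firehose concatenates records as they arrive, which is why the trailing newline is added by the sender.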