SlideShare ist ein Scribd-Unternehmen logo
1 von 44
Downloaden Sie, um offline zu lesen
2 0 1 9
Data Lakes & Analytics en
AWS
Laura Mariana Caicedo
Solutions Architect
Lauracai10
What to expect from this session
• Data Importance
• Big Data Process
• DataLake – LakeFormation
• Right Tool to analyze
Data is a strategic asset
for every organization
The world’s most valuable
resource is
*Copyright: The Economist, 2017, David Parkins
The move
toward
data-centric
companies
Five largest companies by
market cap*
2001
2006
2011
2016
2018
$1.091T
$406B
$446B
$406B
$582B
$976B
$365B
$383B
$556B
$383B
$877B
$272B
$327B
$277B
$452B
$839B
$261B
$293B
$237B
$364B
$523B
$260B
$273B
$228B
$228B
Thinking about data as an asset, not a
cost
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Stop
throwing
data away
Make it
available to
more users
Arm users
with more
data processing
technologies
Data
every 5 years
There is more data
than people think
15
years
live for
Data platforms need to
1,000x
scale
>10x
grows
There are more
people accessing data
And more
requirements for
making data available
Data Scientists
Analysts
Business Users
Applications
Secure Real time
Flexible Scalable
Big Data
Types of big data analytics
•Batch/
interactive
•Stream
processing
•Machine
learning
Plethora of tools
Collect Store
Process/
Analyze
Consume
Time to answer (latency)
Throughput
Cost
Simplify big data processing
StoreCollect
Analytics used to look like this
OLTP ERP CRM LOB
Data warehouse
Business intelligence
Relational data
TBs–PBs scale
Schema defined prior to data load
Operational reporting and ad hoc
Large initial CAPEX + $10K $50K/TB/Year
A data lake is a centralized repository that
allows you to store all your structured and
unstructured data at any scale
Why data lakes?
Data Lakes provide:
Relational and non-relational data
Scale-out to EBs
Diverse set of analytics and machine learning tools
Work on data without any data movement
Designed for low cost storage and analytics
OLTP ERP CRM LOB
Data Warehouse
Business
Intelligence
Data Lake
1001100001001010111001010
1011100101010000101111101
1010
0011110010110010110
0100011000010
Devices Web Sensors Social
Catalog
Machine
Learning
DW Queries Big data
processing
Interactive Real-time
• OLTP (Online Transaction Processing)
Characterized by a large number of short transactions (INSERT,
UPDATE, DELETE) that serve as persistence layer for applications.
e.g. Aurora, MySQL, PostgreSQL, etc. Typically a row-store
architecture
• OLAP (Online Analytical Processing)
Characterized by relatively low volume of transactions, and
queries are often complex and involve aggregations against large
historical datasets for data-driven decision making. e.g. Amazon
Redshift, Greenplum, etc. Typically a column-store architecture
• Data Lake
An architectural paradigm that allows customers to store all of
their data in a single unified place where they can collect and
store any data, at any scale, and at low cost. Data lakes
complement (not replace) other data stores such as data
warehouses. e.g. S3 data lake
OLTP
PostgreSQL
Amazon
Aurora
Amazon EC2
(Business Application)User
Applications
DataLake
Data Stores: What’s the Difference?
OLAP
ETL Tools
Amazon
QuickSight
Amazon Redshift
Amazon
Glue
BI Tools
OLTP ERP CRM LOBUser
Dashboards
Amazon S3 | AWS Glue
Any analytic
workload, any scale,
at the lowest possible
cost
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
On-premises
Data Movement
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
Analytics
Machine Learning
Real-time
Data Movement
Data Lake
There are lots of ingestion tools
Amazon S3
Process Consume
S3 Transfer
Acceleration
Data sources
Transactions
Web logs /
cookies
ERP
Connected
devices
Typical steps of building a data lake
Setup storage1
Move data2
Cleanse, prep, and
catalog data
3
Configure and enforce
security and compliance
policies
4
Make data available
for analytics
5
Building data lakes can still take
months
Data preparation accounts for ~80% of the work
Building training sets
Cleaning and organizing data
Collecting data sets
Mining data for patterns
Refining algorithms
Other
Sample of steps required
Find sources
Create Amazon Simple Storage Service (Amazon S3) locations
Configure access policies
Map tables to Amazon S3 locations
ETL jobs to load and clean data
Create metadata access policies
Configure access from analytics services
Rinse and repeat for other:
data sets, users, and end-services
And more:
manage and monitor ETL jobs
update metadata catalog as data changes
update policies across services as users and permissions change
manually maintain cleansing scripts
create audit processes for compliance
…
Manual | Error-prone | Time consuming
Enforce security policies
across multiple services
Gain and manage new
insights
Identify, ingest, clean,
and transform data
Build a secure data lake in days
AWS Lake Formation
How it works
Register existing data or import new
Amazon S3 forms the storage layer for
Lake Formation
Register existing S3 buckets that
contain your data
Ask Lake Formation to create required
S3 buckets and import data into them
Data is stored in your account. You have
direct access to it. No lock-in.
Data Lake Storage
Data
Catalog
Access
Control
Data import
Lake Formation
Crawlers ML-based
data prep
Easily load data to your data lake
logs
DBs
Blueprints
Data Lake Storage
Data
Catalog
Access
Control
Data import
Lake Formation
Crawlers ML-based
data prep
one-shot
incremental
With blueprints
You
1. Point us to the source
2. Tell us the location to load to
in your data lake
3. Specify how often you want to
load the data
Blueprints
1. Discover the source table(s)
schema
2. Automatically convert to the
target data format
3. Automatically partition the
data based on the
partitioning schema
4. Keep track of data that was
already processed
5. You can customize any of
the above
Easily de-duplicate your data with ML transforms
Secure once, access in multiple ways
Data Lake Storage
Data
Catalog
Access
Control
Lake Formation
Admin
Security permissions in Lake Formation
Control data access with simple
grant and revoke permissions
Specify permissions on tables and
columns rather than on buckets
and objects
Easily view policies granted to a
particular user
Audit all data access at one place
Process/
analyze
Process/analyze
Process/ Analyze analytics
• Amazon Redshift & Amazon Redshift Spectrum
• Managed data warehouse
• Spectrum enables querying S3
• Amazon Athena
• Serverless interactive query service
• Amazon EMR
• Managed Hadoop framework for running Apache Spark, Flink, Presto, Tez, Hive, Pig,
HBase, and others
Amazon
Redshift
EMR Amazon
Athena
Serverless Query Processing
• Serverless query service for querying data in S3 using standard SQL with
no infrastructure to manage
• No data loading required; query directly from Amazon S3
• Use standard ANSI SQL queries with support for joins, JSON, and
window functions
• Support for multiple data formats include text, CSV, TSV, JSON, Avro,
ORC, Parquet
• Pay per query only when you’re running queries based on data scanned.
If you compress your data, you pay less and your queries run faster
Amazon
Athena
Querying it in Amazon Athena
Either Create a Crawler to
auto-generate schema
OR
Write a DDL on the Athena
console/API/ JDBC/ODBC
driver
Start Querying Data
Relational data warehouse
Massively parallel; Petabyte scale
Fully managed
HDD and SSD Platforms
$1,000/TB/Year; starts at $0.25/hour
Amazon
Redshift
a lot faster
a lot simpler
a lot cheaper
Amazon Redshift Speed: Three Highlights
Machine-learning based acceleration
1
2
Result-set Caching for Local & Data Lake Queries
(sub-second repeat
queries)
3
Constant improvements in performance for
real-world workloads
10x faster
than it was two years ago
Amazon Redshift is now
from over 200 features and enhancements
released due to lessons learnt from more than
10,000 customer deployments processing over 2 exabytes
of data every dayRedshift Spectrum caches
intermediate results that can
benefit different queries
The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a
market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is
based on best available resources. Opinions reflect judgment at the time and are subject to change.
The Forrester Wave™: Enterprise Data
Warehouse, Q4 2015
Forrester Wave™ Big Data Warehouse Q4 2018
AWS rated top in the
leader bracket and
received a score of
5/5 (the highest score
possible) in a number
of areas such as Use
Cases, Roadmap,
Market Awareness,
and Ability to Execute
AWS positioned as a
Leader in the Gartner
Magic Quadrant for
Data Management
Gartner Magic Quadrant, 2018
Solutions for
Analytics
Semi-structured/Unstructured Data Processing
• Hadoop, Hive, Presto, Spark, Tez, Impala etc.
• Release 5.2: Hadoop 2.7.3, Hive 2.1, Spark 2.02, Zeppelin, Presto, HBase 1.2.3 and HBase on
S3, Phoenix, Tez, Flink.
• New applications added within 30 days of their open source release
• Fully managed, Auto Scaling clusters with support for on-demand and
spot pricing
• Support for HDFS and S3 file systems enabling separated compute and
storage; multiple clusters can run against the same data in S3
• HIPAA-eligible. Support for end-to-end encryption, IAM/VPC, S3 client-
side encryption with customer managed keys and AWS KMS
Amazon EMR
The Hadoop ecosystem can run in Amazon EMR
Which analytics should I use? Process/analyze
Batch
Takes minutes to hours
Example: Daily/weekly/monthly reports
Amazon EMR (MapReduce, Hive, Pig, Spark)
Interactive
Takes seconds
Example: Self-service dashboards
Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark)
Stream
Takes milliseconds to seconds
Example: Fraud alerts, one-minute metrics
Amazon EMR (Spark Streaming), Amazon Kinesis Data Analytics,
Amazon KCL, AWS Lambda, and others
Predictive
Takes milliseconds (real-time) to minutes (batch)
Example: Fraud detection, forecasting demand, speech
recognition
Amazon SageMaker, Amazon Polly, Amazon Rekognition, Amazon
Transcribe, Amazon Translate, Amazon EMR (Spark ML), Amazon
Deep Learning AMI (MXNet, TensorFlow, Theano, Torch, CNTK, and Caffe)
FastSlow
Amazon Redshift
& Spectrum
Amazon Athena
BatchInteractive
Amazon ES
Presto
Amazon
EMR
Predictive
AmazonML
KCL
Apps
AWS Lambda
Amazon Kinesis
Analytics
Stream
Streaming
Fast
Collect Store ConsumeProcess/analyzeETL
Amazon
QuickSight
Analysis&visualizationDatasceince
AI Apps
Apps
SW engineers/
business
users
Data scientists/
data engineers
Amazon QuickSight: Examples – Salesforce Analytics
Summary
• Data MUST be used in every organization
• Data lakes are very important to consume structured and
unstructured data
• Data lake governance
• Analyze data with the right tool
• Different type of consumers

Weitere ähnliche Inhalte

Was ist angesagt?

AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Introduction to Cloud Computing with Amazon Web Services
Introduction to Cloud Computing with Amazon Web ServicesIntroduction to Cloud Computing with Amazon Web Services
Introduction to Cloud Computing with Amazon Web ServicesAmazon Web Services
 
Casi reali di Mass Migration nel Cloud: benefici tangibili ed intangibili
Casi reali di Mass Migration nel Cloud: benefici tangibili ed intangibiliCasi reali di Mass Migration nel Cloud: benefici tangibili ed intangibili
Casi reali di Mass Migration nel Cloud: benefici tangibili ed intangibiliAmazon Web Services
 
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data WarehouseSoluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data WarehouseAmazon Web Services
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesAmazon Web Services
 
AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)
AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)
AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)Amazon Web Services
 
Running Lean Architectures: How to Optimize for Cost Efficiency
Running Lean Architectures: How to Optimize for Cost Efficiency Running Lean Architectures: How to Optimize for Cost Efficiency
Running Lean Architectures: How to Optimize for Cost Efficiency Amazon Web Services
 
Are you Well-Architected? - AWS Online Tech Talks
Are you Well-Architected? - AWS Online Tech TalksAre you Well-Architected? - AWS Online Tech Talks
Are you Well-Architected? - AWS Online Tech TalksAmazon Web Services
 
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...AWS Germany
 
Big Data & Analytics: End to End on AWS - Technical 101
Big Data & Analytics: End to End on AWS - Technical 101Big Data & Analytics: End to End on AWS - Technical 101
Big Data & Analytics: End to End on AWS - Technical 101Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Building a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSBuilding a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSAmazon Web Services
 
AWS re:Invent 2016: Deliver Engaging Experiences with Custom Apps Built on Sa...
AWS re:Invent 2016: Deliver Engaging Experiences with Custom Apps Built on Sa...AWS re:Invent 2016: Deliver Engaging Experiences with Custom Apps Built on Sa...
AWS re:Invent 2016: Deliver Engaging Experiences with Custom Apps Built on Sa...Amazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
AWS re:Invent 2016: Cost Optimization at Scale (ENT209)
AWS re:Invent 2016: Cost Optimization at Scale (ENT209)AWS re:Invent 2016: Cost Optimization at Scale (ENT209)
AWS re:Invent 2016: Cost Optimization at Scale (ENT209)Amazon Web Services
 
Build an App on AWS for Your First 10 Million Users
Build an App on AWS for Your First 10 Million UsersBuild an App on AWS for Your First 10 Million Users
Build an App on AWS for Your First 10 Million UsersAmazon Web Services
 

Was ist angesagt? (20)

AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Introduction to Cloud Computing with Amazon Web Services
Introduction to Cloud Computing with Amazon Web ServicesIntroduction to Cloud Computing with Amazon Web Services
Introduction to Cloud Computing with Amazon Web Services
 
Casi reali di Mass Migration nel Cloud: benefici tangibili ed intangibili
Casi reali di Mass Migration nel Cloud: benefici tangibili ed intangibiliCasi reali di Mass Migration nel Cloud: benefici tangibili ed intangibili
Casi reali di Mass Migration nel Cloud: benefici tangibili ed intangibili
 
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data WarehouseSoluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)
AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)
AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)
 
Running Lean Architectures: How to Optimize for Cost Efficiency
Running Lean Architectures: How to Optimize for Cost Efficiency Running Lean Architectures: How to Optimize for Cost Efficiency
Running Lean Architectures: How to Optimize for Cost Efficiency
 
Are you Well-Architected? - AWS Online Tech Talks
Are you Well-Architected? - AWS Online Tech TalksAre you Well-Architected? - AWS Online Tech Talks
Are you Well-Architected? - AWS Online Tech Talks
 
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
 
Big Data & Analytics: End to End on AWS - Technical 101
Big Data & Analytics: End to End on AWS - Technical 101Big Data & Analytics: End to End on AWS - Technical 101
Big Data & Analytics: End to End on AWS - Technical 101
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Building a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSBuilding a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWS
 
AWS re:Invent 2016: Deliver Engaging Experiences with Custom Apps Built on Sa...
AWS re:Invent 2016: Deliver Engaging Experiences with Custom Apps Built on Sa...AWS re:Invent 2016: Deliver Engaging Experiences with Custom Apps Built on Sa...
AWS re:Invent 2016: Deliver Engaging Experiences with Custom Apps Built on Sa...
 
Databases
DatabasesDatabases
Databases
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Running Lean Architectures
Running Lean ArchitecturesRunning Lean Architectures
Running Lean Architectures
 
AWS re:Invent 2016: Cost Optimization at Scale (ENT209)
AWS re:Invent 2016: Cost Optimization at Scale (ENT209)AWS re:Invent 2016: Cost Optimization at Scale (ENT209)
AWS re:Invent 2016: Cost Optimization at Scale (ENT209)
 
Build an App on AWS for Your First 10 Million Users
Build an App on AWS for Your First 10 Million UsersBuild an App on AWS for Your First 10 Million Users
Build an App on AWS for Your First 10 Million Users
 

Ähnlich wie Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS

Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Amazon Web Services
 
Big Data & Analytics - Innovating at the Speed of Light
Big Data & Analytics - Innovating at the Speed of LightBig Data & Analytics - Innovating at the Speed of Light
Big Data & Analytics - Innovating at the Speed of LightAmazon Web Services LATAM
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudAmazon Web Services
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseData Con LA
 
Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Amazon Web Services
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSAmazon Web Services
 
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...Amazon Web Services
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)Amazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWSAmazon Web Services
 
Modern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon RedshiftModern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon RedshiftAmazon Web Services
 
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsFinding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsAmazon Web Services
 

Ähnlich wie Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS (20)

Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
 
Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2
 
Big Data & Analytics - Innovating at the Speed of Light
Big Data & Analytics - Innovating at the Speed of LightBig Data & Analytics - Innovating at the Speed of Light
Big Data & Analytics - Innovating at the Speed of Light
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
Aws meetup 20190427
Aws meetup 20190427Aws meetup 20190427
Aws meetup 20190427
 
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWS
 
Modern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon RedshiftModern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon Redshift
 
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsFinding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
 

Mehr von Amazon Web Services LATAM

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.Amazon Web Services LATAM
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.Amazon Web Services LATAM
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAmazon Web Services LATAM
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAmazon Web Services LATAM
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSAmazon Web Services LATAM
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSAmazon Web Services LATAM
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAmazon Web Services LATAM
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAmazon Web Services LATAM
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosAmazon Web Services LATAM
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSAmazon Web Services LATAM
 

Mehr von Amazon Web Services LATAM (20)

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWS
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWS
 
Cómo empezar con Amazon EKS
Cómo empezar con Amazon EKSCómo empezar con Amazon EKS
Cómo empezar con Amazon EKS
 
Como começar com Amazon EKS
Como começar com Amazon EKSComo começar com Amazon EKS
Como começar com Amazon EKS
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWS
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWS
 
Ransomware: Estratégias de Mitigação
Ransomware: Estratégias de MitigaçãoRansomware: Estratégias de Mitigação
Ransomware: Estratégias de Mitigação
 
Ransomware: Estratégias de Mitigación
Ransomware: Estratégias de MitigaciónRansomware: Estratégias de Mitigación
Ransomware: Estratégias de Mitigación
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWS
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administrados
 
Simplifique su BI con AWS
Simplifique su BI con AWSSimplifique su BI con AWS
Simplifique su BI con AWS
 
Simplifique o seu BI com a AWS
Simplifique o seu BI com a AWSSimplifique o seu BI com a AWS
Simplifique o seu BI com a AWS
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWS
 

Kürzlich hochgeladen

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Kürzlich hochgeladen (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS

  • 1. 2 0 1 9
  • 2. Data Lakes & Analytics en AWS Laura Mariana Caicedo Solutions Architect Lauracai10
  • 3. What to expect from this session • Data Importance • Big Data Process • DataLake – LakeFormation • Right Tool to analyze
  • 4. Data is a strategic asset for every organization The world’s most valuable resource is *Copyright: The Economist, 2017, David Parkins
  • 5. The move toward data-centric companies Five largest companies by market cap* 2001 2006 2011 2016 2018 $1.091T $406B $446B $406B $582B $976B $365B $383B $556B $383B $877B $272B $327B $277B $452B $839B $261B $293B $237B $364B $523B $260B $273B $228B $228B
  • 6. Thinking about data as an asset, not a cost © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Stop throwing data away Make it available to more users Arm users with more data processing technologies
  • 7. Data every 5 years There is more data than people think 15 years live for Data platforms need to 1,000x scale >10x grows
  • 8. There are more people accessing data And more requirements for making data available Data Scientists Analysts Business Users Applications Secure Real time Flexible Scalable
  • 10. Types of big data analytics •Batch/ interactive •Stream processing •Machine learning
  • 12. Collect Store Process/ Analyze Consume Time to answer (latency) Throughput Cost Simplify big data processing
  • 14. Analytics used to look like this OLTP ERP CRM LOB Data warehouse Business intelligence Relational data TBs–PBs scale Schema defined prior to data load Operational reporting and ad hoc Large initial CAPEX + $10K $50K/TB/Year
  • 15. A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale
  • 16. Why data lakes? Data Lakes provide: Relational and non-relational data Scale-out to EBs Diverse set of analytics and machine learning tools Work on data without any data movement Designed for low cost storage and analytics OLTP ERP CRM LOB Data Warehouse Business Intelligence Data Lake 1001100001001010111001010 1011100101010000101111101 1010 0011110010110010110 0100011000010 Devices Web Sensors Social Catalog Machine Learning DW Queries Big data processing Interactive Real-time
  • 17. • OLTP (Online Transaction Processing) Characterized by a large number of short transactions (INSERT, UPDATE, DELETE) that serve as persistence layer for applications. e.g. Aurora, MySQL, PostgreSQL, etc. Typically a row-store architecture • OLAP (Online Analytical Processing) Characterized by relatively low volume of transactions, and queries are often complex and involve aggregations against large historical datasets for data-driven decision making. e.g. Amazon Redshift, Greenplum, etc. Typically a column-store architecture • Data Lake An architectural paradigm that allows customers to store all of their data in a single unified place where they can collect and store any data, at any scale, and at low cost. Data lakes complement (not replace) other data stores such as data warehouses. e.g. S3 data lake OLTP PostgreSQL Amazon Aurora Amazon EC2 (Business Application)User Applications DataLake Data Stores: What’s the Difference? OLAP ETL Tools Amazon QuickSight Amazon Redshift Amazon Glue BI Tools OLTP ERP CRM LOBUser Dashboards
  • 18. Amazon S3 | AWS Glue Any analytic workload, any scale, at the lowest possible cost AWS Direct Connect AWS Snowball AWS Snowmobile AWS Database Migration Service AWS IoT Core Amazon Kinesis Data Firehose Amazon Kinesis Data Streams Amazon Kinesis Video Streams On-premises Data Movement Amazon SageMaker AWS Deep Learning AMIs Amazon Rekognition Amazon Lex AWS DeepLens Amazon Comprehend Amazon Translate Amazon Transcribe Amazon Polly Amazon Athena Amazon EMR Amazon Redshift Amazon Elasticsearch Service Amazon Kinesis Amazon QuickSight Analytics Machine Learning Real-time Data Movement Data Lake
  • 19. There are lots of ingestion tools Amazon S3 Process Consume S3 Transfer Acceleration Data sources Transactions Web logs / cookies ERP Connected devices
  • 20. Typical steps of building a data lake Setup storage1 Move data2 Cleanse, prep, and catalog data 3 Configure and enforce security and compliance policies 4 Make data available for analytics 5
  • 21. Building data lakes can still take months
  • 22. Data preparation accounts for ~80% of the work Building training sets Cleaning and organizing data Collecting data sets Mining data for patterns Refining algorithms Other
  • 23. Sample of steps required Find sources Create Amazon Simple Storage Service (Amazon S3) locations Configure access policies Map tables to Amazon S3 locations ETL jobs to load and clean data Create metadata access policies Configure access from analytics services Rinse and repeat for other: data sets, users, and end-services And more: manage and monitor ETL jobs update metadata catalog as data changes update policies across services as users and permissions change manually maintain cleansing scripts create audit processes for compliance … Manual | Error-prone | Time consuming
  • 24. Enforce security policies across multiple services Gain and manage new insights Identify, ingest, clean, and transform data Build a secure data lake in days AWS Lake Formation
  • 26. Register existing data or import new Amazon S3 forms the storage layer for Lake Formation Register existing S3 buckets that contain your data Ask Lake Formation to create required S3 buckets and import data into them Data is stored in your account. You have direct access to it. No lock-in. Data Lake Storage Data Catalog Access Control Data import Lake Formation Crawlers ML-based data prep
  • 27. Easily load data to your data lake logs DBs Blueprints Data Lake Storage Data Catalog Access Control Data import Lake Formation Crawlers ML-based data prep one-shot incremental
  • 28. With blueprints You 1. Point us to the source 2. Tell us the location to load to in your data lake 3. Specify how often you want to load the data Blueprints 1. Discover the source table(s) schema 2. Automatically convert to the target data format 3. Automatically partition the data based on the partitioning schema 4. Keep track of data that was already processed 5. You can customize any of the above
  • 29. Easily de-duplicate your data with ML transforms
  • 30. Secure once, access in multiple ways Data Lake Storage Data Catalog Access Control Lake Formation Admin
  • 31. Security permissions in Lake Formation Control data access with simple grant and revoke permissions Specify permissions on tables and columns rather than on buckets and objects Easily view policies granted to a particular user Audit all data access at one place
  • 33. Process/analyze Process/ Analyze analytics • Amazon Redshift & Amazon Redshift Spectrum • Managed data warehouse • Spectrum enables querying S3 • Amazon Athena • Serverless interactive query service • Amazon EMR • Managed Hadoop framework for running Apache Spark, Flink, Presto, Tez, Hive, Pig, HBase, and others Amazon Redshift EMR Amazon Athena
  • 34. Serverless Query Processing • Serverless query service for querying data in S3 using standard SQL with no infrastructure to manage • No data loading required; query directly from Amazon S3 • Use standard ANSI SQL queries with support for joins, JSON, and window functions • Support for multiple data formats include text, CSV, TSV, JSON, Avro, ORC, Parquet • Pay per query only when you’re running queries based on data scanned. If you compress your data, you pay less and your queries run faster Amazon Athena
  • 35. Querying it in Amazon Athena Either Create a Crawler to auto-generate schema OR Write a DDL on the Athena console/API/ JDBC/ODBC driver Start Querying Data
  • 36. Relational data warehouse Massively parallel; Petabyte scale Fully managed HDD and SSD Platforms $1,000/TB/Year; starts at $0.25/hour Amazon Redshift a lot faster a lot simpler a lot cheaper
  • 37. Amazon Redshift Speed: Three Highlights Machine-learning based acceleration 1 2 Result-set Caching for Local & Data Lake Queries (sub-second repeat queries) 3 Constant improvements in performance for real-world workloads 10x faster than it was two years ago Amazon Redshift is now from over 200 features and enhancements released due to lessons learnt from more than 10,000 customer deployments processing over 2 exabytes of data every dayRedshift Spectrum caches intermediate results that can benefit different queries
  • 38. The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change. The Forrester Wave™: Enterprise Data Warehouse, Q4 2015 Forrester Wave™ Big Data Warehouse Q4 2018 AWS rated top in the leader bracket and received a score of 5/5 (the highest score possible) in a number of areas such as Use Cases, Roadmap, Market Awareness, and Ability to Execute AWS positioned as a Leader in the Gartner Magic Quadrant for Data Management Gartner Magic Quadrant, 2018 Solutions for Analytics
  • 39. Semi-structured/Unstructured Data Processing • Hadoop, Hive, Presto, Spark, Tez, Impala etc. • Release 5.2: Hadoop 2.7.3, Hive 2.1, Spark 2.02, Zeppelin, Presto, HBase 1.2.3 and HBase on S3, Phoenix, Tez, Flink. • New applications added within 30 days of their open source release • Fully managed, Auto Scaling clusters with support for on-demand and spot pricing • Support for HDFS and S3 file systems enabling separated compute and storage; multiple clusters can run against the same data in S3 • HIPAA-eligible. Support for end-to-end encryption, IAM/VPC, S3 client- side encryption with customer managed keys and AWS KMS Amazon EMR
  • 40. The Hadoop ecosystem can run in Amazon EMR
  • 41. Which analytics should I use? Process/analyze Batch Takes minutes to hours Example: Daily/weekly/monthly reports Amazon EMR (MapReduce, Hive, Pig, Spark) Interactive Takes seconds Example: Self-service dashboards Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark) Stream Takes milliseconds to seconds Example: Fraud alerts, one-minute metrics Amazon EMR (Spark Streaming), Amazon Kinesis Data Analytics, Amazon KCL, AWS Lambda, and others Predictive Takes milliseconds (real-time) to minutes (batch) Example: Fraud detection, forecasting demand, speech recognition Amazon SageMaker, Amazon Polly, Amazon Rekognition, Amazon Transcribe, Amazon Translate, Amazon EMR (Spark ML), Amazon Deep Learning AMI (MXNet, TensorFlow, Theano, Torch, CNTK, and Caffe) FastSlow Amazon Redshift & Spectrum Amazon Athena BatchInteractive Amazon ES Presto Amazon EMR Predictive AmazonML KCL Apps AWS Lambda Amazon Kinesis Analytics Stream Streaming Fast
  • 42. Collect Store ConsumeProcess/analyzeETL Amazon QuickSight Analysis&visualizationDatasceince AI Apps Apps SW engineers/ business users Data scientists/ data engineers
  • 43. Amazon QuickSight: Examples – Salesforce Analytics
  • 44. Summary • Data MUST be used in every organization • Data lakes are very important to consume structured and unstructured data • Data lake governance • Analyze data with the right tool • Different type of consumers