SlideShare a Scribd company logo
1 of 58
Download to read offline
1© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | 1© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Asif Abbasi – Specialist SA Analytics
03.Nov.2019
Cutting to the chase for Machine Learning
Analytics Ecosystem & AWS Lake Formation
@masifabbasi
2© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Agenda
• Analytics Ecosystem
• Why did we build AWS Lake Formation?
• What is AWS Lake Formation?
• How does AWS Lake Formation help you?
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Analytics Value Stream
idea insight
COLLECT STORE
PROCESS/
ANALYZE
CONSUME
time to first answer
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Agile Analytics
• Experiment
• Invest in and scale up success
• Fail fast
• Adapt and evolve rapidly
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
A Modern Data Platform
• Treat data as a reusable asset
(while keeping the cost of reuse low)
• Apply an open set of processing
strategies in parallel
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Lake
Store all your data, forever,
at every stage of its lifecycle
Apply it using the right tool for the job
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Storage Foundations: Amazon S3
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Events and Lifecycle Management
Standard
Active data Archive dataInfrequently accessed data
Standard - Infrequent
Access
Amazon Glacier
Create
Delete
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Benefits of Using S3 as Your Data Lake Foundation
• Unlimited number of objects and
volume
• 99.99% availability
• 99.999999999% durability
• Versioning
• Tiered storage via lifecycle policies
• SSL, client/server-side encryption
at rest
• Low cost (~ $2700/month/100 TB)
• Natively supported by big data
frameworks (eg. Spark, Hive,
Presto)
• Decouples storage and compute
• Run transient compute clusters
(with Amazon EC2 Spot
Instances)
• Multiple, heterogeneous
clusters can use same data
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ingest
Kinesis Firehose
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Catalog: Discover and Govern Your Data
Kinesis Firehose
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Glue Data Catalog
Hive metastore-compatible, highly-available
metadata repository:
• Search metadata for data discovery
• Connection info – JDBC URLs, credentials
• Classification for identifying and parsing
files
• Versioning of table metadata as schemas
evolve and other metadata are updated
• Table definitions – usable by Redshift,
Athena, Glue, EMR
Populate using Hive DDL, bulk import, or
automatically through crawlers.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Catalog and Query Multiple Sources
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Access Control and Auditing
Kinesis Firehose
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Access Control and Auditing
IAM
Amazon
S3
Amazon
DynamoDB
Amazon EMR Amazon
Kinesis
Amazon
Athena
Service API Access
AWS
CloudTrail
Identity and Access Management
• Security at the API and data level
3rd party ecosystem security tools
• Blue Talon, Apache Ranger, Sentry, etc
Storage-level access logging and audit
• AWS CloudTrail logs API calls
• S3 access logging
• Log analytics with Athena
Encryption
• AWS Server-Side Encryption
• AWS Key Management Service
• AWS CloudHSM
• Custom materials providers
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Process and Analyze
Kinesis Firehose
Athena
Query Service
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Job Authoring with AWS Glue
• Python/Scala code
generated by AWS Glue
• Connect a notebook or
IDE to AWS Glue
• Existing code brought into
AWS Glue
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Job Execution with AWS Glue
• Schedule-based
• Event-based
• On demand
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Serverless Queries: Amazon Athena
• Interactive queries
• ANSI SQL
• No infrastructure or administration
• Zero spin up time
• Query data in its raw format
• AVRO, Text, CSV, JSON, weblogs, AWS
service logs
• Convert to an optimized form like ORC
or Parquet for the best performance
and lowest cost
• No loading of data, no ETL required
• Stream data from directly from Amazon S3
• Take advantage of Amazon S3 durability
and availability
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Warehousing: Amazon Redshift
Managed, massively parallel, petabyte-scale, relational data warehouse
Scale from 160GB to 2PB online
Automatic streaming backup/restore to S3,
Automatic failover and recovery
ANSI SQL interface
Load data from S3, DynamoDB and EMR
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Query Exabytes of Data: Redshift Spectrum
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object
storage
Data
Catalog
Apache Hive
Metastore
Run Amazon Redshift SQL
queries against exabytes
of data in Amazon S3
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Managed Hadoop Framework: Amazon EMR
Scalable compute clusters as a
service
• Create ephemeral clusters –
only pay for the compute
you need
• Integrated with S3 (EMRFS)
• Use spot instances to
reduce processing time
AND costs
• Dynamically scale up and
down
• Data encryption at-rest and
in-transit
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Business Analytics Service: Amazon QuickSight
Deep Integration with
AWS Data Sources
Amazon RDS,
Aurora
Amazon
Redshift
Amazon
Athena
Amazon S3
Flat Files
• Fully managed
• 1/10 cost of traditional BI solutions
• Integrated with AWS data sources
and third-party sources
• SPICE – Super-fast, Parallel, In-
memory, Calculation Engine
• Collaborate, share and publish data
sets and analyses
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Building a Data Strategy on AWS
Kinesis Firehose
Athena
Query Service
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
An Open Ecosystem of Applications
Processing &
Analytics
BI & Data Visualization
Kinesis Streams
& Firehose
Batch
EMR
Hadoop, Spark,
Presto
Redshift
Data Warehouse
Athena
Query Service
AWS Batch
Predictive
Real-time
AWS Lambda
Apache Storm
on EMR
Apache
Flink on
EMR
Spark
Streaming on
EMR
Elasticsearch
Service
Kinesis Analytics,
Kinesis Streams
EastiCache DAX
Online/Transactional
DynamoDB
(NoSQL)
Aurora
(Relational)
Neptun
e
(Graph)
26© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | 26© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Data Lakes
27© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
A data lake is a centralized repository that allows
you to store all your structured and unstructured
data at any scale
28© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Why data lakes?
Data Lakes provide:
Relational and non-relational data
Scale-out to EBs
Diverse set of analytics and machine learning tools
Work on data without any data movement
Designed for low cost storage and analytics
OLTP ERP CRM LOB
Data Warehouse
Business
Intelligence
Data Lake
1001100001001010111
0010101011100101010
0001011111011010
0011110010110010110
0100011000010
Devices Web Sensors Social
Catalog
Machine
Learning
DW
Queries
Big data
processing
Interactive Real-time
29© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Any analytic
workload, any scale,
at the lowest possible
cost
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
On-premises
Data Movement
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
Analytics
Machine Learning
Real-time
Data Movement
30© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Typical steps of building a data lake
Setup storage1
Move data2
Cleanse, prep, and
catalog data
3
Configure and enforce
security and compliance
policies
4
Make data available
for analytics
5
31© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Building data lakes can still take months
32© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Data preparation accounts for ~80% of the work
Building training sets
Cleaning and organizing data
Collecting data sets
Mining data for patterns
Refining algorithms
Other
33© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Sample of steps required Find sources
Create Amazon Simple Storage Service (Amazon S3) locations
Configure access policies
Map tables to Amazon S3 locations
ETL jobs to load and clean data
Create metadata access policies
Configure access from analytics services
Rinse and repeat for other:
data sets, users, and end-services
And more:
manage and monitor ETL jobs
update metadata catalog as data changes
update policies across services as users and permissions change
manually maintain cleansing scripts
create audit processes for compliance
…
Manual | Error-prone | Time consuming
34© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Enforce security policies
across multiple services
Gain and manage new
insights
Identify, ingest, clean, and
transform data
Build a secure data lake in days
AWS Lake Formation
35© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
How it works
36© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Register existing data or import new
Amazon S3 forms the storage layer for
Lake Formation
Register existing S3 buckets that
contain your data
Ask Lake Formation to create required
S3 buckets and import data into them
Data is stored in your account. You
have direct access to it. No lock-in.
Data Lake Storage
Data
Catalog
Access
Control
Data
import
Lake Formation
Crawlers ML-based
data prep
37© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Easily load data to your data lake
logs
DBs
Blueprints
Data Lake Storage
Data
Catalog
Access
Control
Data
import
Lake Formation
Crawlers ML-based
data prep
one-shot
incremental
38© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
With blueprints
You
1. Point us to the source
2. Tell us the location to load
to in your data lake
3. Specify how often you want
to load the data
Blueprints
1. Discover the source table(s)
schema
2. Automatically convert to
the target data format
3. Automatically partition the
data based on the
partitioning schema
4. Keep track of data that was
already processed
5. You can customize any of
the above
39© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Blueprints build on AWS Glue
40© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Easily de-duplicate your data with ML transforms
41© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Fuzzy de-duplication – under the hood
42© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Fuzzy de-duplication – Innovations
400M+
7.5B+
2.5
43© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Secure once, access in multiple ways
Data Lake Storage
Data
Catalog
Access
Control
Lake Formation
Admin
44© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Security permissions in Lake Formation
Control data access with simple
grant and revoke permissions
Specify permissions on tables and
columns rather than on buckets
and objects
Easily view policies granted to a
particular user
Audit all data access at one place
45© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Security permissions in Lake Formation
Search and view permissions
granted to a user, role, or group in
one place
Verify permissions granted to a
user
Easily revoke policies for a user
46© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Grant table and column-level permissions
User 1
User 2
47© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Security – deep dive
User
IAM users, roles
Active Directory Amazon S3
48© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Search and collaborate across multiple users
Text-based, faceted search
across all metadata
Add attributes like Data
owners, stewards, and other as
table properties
Add data sensitivity level,
column definitions, and others
as column properties
Text-based search and filtering
Query data in Amazon Athena
49© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Audit and monitor in real time
See detailed alerts in the
console
Download audit logs for
further analytics
Data ingest and catalog
notifications also published to
Amazon CloudWatch events
50© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Example: a data lake in 3 easy steps
51© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Step 1: Blueprints to ingest data
52© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Monitor the import
1
53© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Imported data as table in the data lake
54© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Step 2: Grant permissions to securely share data
55© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Step 3: Run query in Amazon Athena
56© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
AWS Lake Formation Pricing
No additional charges – Only pay for the
underlying services used.
57© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Customer interest
“We are very excited about the launch of AWS Lake
Formation, which provides a central point of control to
easily load, clean, secure, and catalog data from thousands
of clients to our AWS-based data lake, dramatically
reducing our operational load. … Additionally, AWS Lake
Formation will be HIPAA compliant from day one …”
Aaron Symanski, CTO, Change Healthcare
“I can’t wait for my team to get our hands on AWS Lake
Formation. With an enterprise-ready option like Lake
Formation, we will be able to spend more time deriving
value from our data rather than doing the heavy lifting
involved in manually setting up and managing our data
lake.”
Joshua Couch, VP Engineering, Fender Digital
58© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved |
Thank you!

More Related Content

What's hot

Analyzing Your Web and Application Logs
Analyzing Your Web and Application Logs Analyzing Your Web and Application Logs
Analyzing Your Web and Application Logs Amazon Web Services
 
Jeff Barr Amazon Services Cloud Computing
Jeff Barr Amazon Services Cloud ComputingJeff Barr Amazon Services Cloud Computing
Jeff Barr Amazon Services Cloud Computingdeimos
 
Being Well Architected in the Cloud
Being Well Architected in the CloudBeing Well Architected in the Cloud
Being Well Architected in the CloudAdrian Hornsby
 
AWSome Day 2016 - Module 1: AWS Introduction and History
AWSome Day 2016 - Module 1: AWS Introduction and HistoryAWSome Day 2016 - Module 1: AWS Introduction and History
AWSome Day 2016 - Module 1: AWS Introduction and HistoryAmazon Web Services
 
AWS Storage and Content Delivery Services
AWS Storage and Content Delivery ServicesAWS Storage and Content Delivery Services
AWS Storage and Content Delivery ServicesAmazon Web Services
 
AWSome Day 2016 - Module 3: Security, Identity, and Access Management
AWSome Day 2016 - Module 3: Security, Identity, and Access ManagementAWSome Day 2016 - Module 3: Security, Identity, and Access Management
AWSome Day 2016 - Module 3: Security, Identity, and Access ManagementAmazon Web Services
 
AWS re:Invent 2016: Enabling Enterprise Migrations: Creating an AWS Landing Z...
AWS re:Invent 2016: Enabling Enterprise Migrations: Creating an AWS Landing Z...AWS re:Invent 2016: Enabling Enterprise Migrations: Creating an AWS Landing Z...
AWS re:Invent 2016: Enabling Enterprise Migrations: Creating an AWS Landing Z...Amazon Web Services
 
Introducing “Well-Architected” For Developers - Technical 101
Introducing “Well-Architected” For Developers - Technical 101Introducing “Well-Architected” For Developers - Technical 101
Introducing “Well-Architected” For Developers - Technical 101Amazon Web Services
 
Migrating to the cloud - Windows on AWS
Migrating to the cloud - Windows on AWSMigrating to the cloud - Windows on AWS
Migrating to the cloud - Windows on AWSAmazon Web Services
 
Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 
Serverless computing - Build and run applications without thinking about servers
Serverless computing - Build and run applications without thinking about serversServerless computing - Build and run applications without thinking about servers
Serverless computing - Build and run applications without thinking about serversAmazon Web Services
 
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAmazon Web Services Korea
 
(SEC321) Implementing Policy, Governance & Security for Enterprises
(SEC321) Implementing Policy, Governance & Security for Enterprises(SEC321) Implementing Policy, Governance & Security for Enterprises
(SEC321) Implementing Policy, Governance & Security for EnterprisesAmazon Web Services
 

What's hot (20)

Security best practices
Security best practices Security best practices
Security best practices
 
Analyzing Your Web and Application Logs
Analyzing Your Web and Application Logs Analyzing Your Web and Application Logs
Analyzing Your Web and Application Logs
 
Jeff Barr Amazon Services Cloud Computing
Jeff Barr Amazon Services Cloud ComputingJeff Barr Amazon Services Cloud Computing
Jeff Barr Amazon Services Cloud Computing
 
Being Well Architected in the Cloud
Being Well Architected in the CloudBeing Well Architected in the Cloud
Being Well Architected in the Cloud
 
AWSome Day 2016 - Module 1: AWS Introduction and History
AWSome Day 2016 - Module 1: AWS Introduction and HistoryAWSome Day 2016 - Module 1: AWS Introduction and History
AWSome Day 2016 - Module 1: AWS Introduction and History
 
AWS Storage and Content Delivery Services
AWS Storage and Content Delivery ServicesAWS Storage and Content Delivery Services
AWS Storage and Content Delivery Services
 
AWS Black Belt Tips
AWS Black Belt TipsAWS Black Belt Tips
AWS Black Belt Tips
 
AWSome Day 2016 - Module 3: Security, Identity, and Access Management
AWSome Day 2016 - Module 3: Security, Identity, and Access ManagementAWSome Day 2016 - Module 3: Security, Identity, and Access Management
AWSome Day 2016 - Module 3: Security, Identity, and Access Management
 
Protecting Your Data in AWS
Protecting Your Data in AWSProtecting Your Data in AWS
Protecting Your Data in AWS
 
AWS re:Invent 2016: Enabling Enterprise Migrations: Creating an AWS Landing Z...
AWS re:Invent 2016: Enabling Enterprise Migrations: Creating an AWS Landing Z...AWS re:Invent 2016: Enabling Enterprise Migrations: Creating an AWS Landing Z...
AWS re:Invent 2016: Enabling Enterprise Migrations: Creating an AWS Landing Z...
 
Introducing “Well-Architected” For Developers - Technical 101
Introducing “Well-Architected” For Developers - Technical 101Introducing “Well-Architected” For Developers - Technical 101
Introducing “Well-Architected” For Developers - Technical 101
 
Getting started with AWS
Getting started with AWSGetting started with AWS
Getting started with AWS
 
Amazon RDS_Deep Dive - SRV310
Amazon RDS_Deep Dive - SRV310 Amazon RDS_Deep Dive - SRV310
Amazon RDS_Deep Dive - SRV310
 
Migrating to the cloud - Windows on AWS
Migrating to the cloud - Windows on AWSMigrating to the cloud - Windows on AWS
Migrating to the cloud - Windows on AWS
 
Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS
 
Protecting Your Data in AWS
Protecting Your Data in AWSProtecting Your Data in AWS
Protecting Your Data in AWS
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 
Serverless computing - Build and run applications without thinking about servers
Serverless computing - Build and run applications without thinking about serversServerless computing - Build and run applications without thinking about servers
Serverless computing - Build and run applications without thinking about servers
 
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
 
(SEC321) Implementing Policy, Governance & Security for Enterprises
(SEC321) Implementing Policy, Governance & Security for Enterprises(SEC321) Implementing Policy, Governance & Security for Enterprises
(SEC321) Implementing Policy, Governance & Security for Enterprises
 

Similar to Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Formation

Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Amazon Web Services
 
在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析Amazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with ZopaAmazon Web Services
 
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Amazon Web Services
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAmazon Web Services
 
Builders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCBuilders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCAmazon Web Services LATAM
 
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
Big Data Meets AI - Driving Insights and Adding Intelligence to Your SolutionsAmazon Web Services
 
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSSteven Hsieh
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopAWS Germany
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF LoftData Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF LoftAmazon Web Services
 
Using Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczUsing Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczAmazon Web Services
 

Similar to Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Formation (20)

Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the Cloud
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
 
在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with Zopa
 
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Builders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCBuilders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LC
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWS
 
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF LoftData Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft
 
Using Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczUsing Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter Dachnowicz
 

More from AWS Riyadh User Group

AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul Maddox
AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul MaddoxAWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul Maddox
AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul MaddoxAWS Riyadh User Group
 
AWS reinvent 2019 recap - Riyadh - Database and Analytics - Assif Abbasi
AWS reinvent 2019 recap - Riyadh - Database and Analytics - Assif AbbasiAWS reinvent 2019 recap - Riyadh - Database and Analytics - Assif Abbasi
AWS reinvent 2019 recap - Riyadh - Database and Analytics - Assif AbbasiAWS Riyadh User Group
 
AWS reinvent 2019 recap - Riyadh - Network and Security - Anver Vanker
AWS reinvent 2019 recap - Riyadh - Network and Security - Anver VankerAWS reinvent 2019 recap - Riyadh - Network and Security - Anver Vanker
AWS reinvent 2019 recap - Riyadh - Network and Security - Anver VankerAWS Riyadh User Group
 
AWS reinvent 2019 recap - Riyadh - AI And ML - Ahmed Raafat
AWS reinvent 2019 recap - Riyadh - AI And ML - Ahmed RaafatAWS reinvent 2019 recap - Riyadh - AI And ML - Ahmed Raafat
AWS reinvent 2019 recap - Riyadh - AI And ML - Ahmed RaafatAWS Riyadh User Group
 
Amazon SageMaker Build, Train and Deploy Your ML Models
Amazon SageMaker Build, Train and Deploy Your ML ModelsAmazon SageMaker Build, Train and Deploy Your ML Models
Amazon SageMaker Build, Train and Deploy Your ML ModelsAWS Riyadh User Group
 
AWS Technical Day Riyadh Nov 2019 - The art of mastering data protection on aws
AWS Technical Day Riyadh Nov 2019 - The art of mastering data protection on awsAWS Technical Day Riyadh Nov 2019 - The art of mastering data protection on aws
AWS Technical Day Riyadh Nov 2019 - The art of mastering data protection on awsAWS Riyadh User Group
 
AWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in aws
AWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in awsAWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in aws
AWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in awsAWS Riyadh User Group
 
AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]AWS Riyadh User Group
 
Amazon Virtual Private Cloud - VPC 2
Amazon Virtual Private Cloud - VPC 2Amazon Virtual Private Cloud - VPC 2
Amazon Virtual Private Cloud - VPC 2AWS Riyadh User Group
 
Amazon Virtual Private Cloud - VPC 1
Amazon Virtual Private Cloud - VPC 1Amazon Virtual Private Cloud - VPC 1
Amazon Virtual Private Cloud - VPC 1AWS Riyadh User Group
 

More from AWS Riyadh User Group (20)

AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul Maddox
AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul MaddoxAWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul Maddox
AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul Maddox
 
AWS reinvent 2019 recap - Riyadh - Database and Analytics - Assif Abbasi
AWS reinvent 2019 recap - Riyadh - Database and Analytics - Assif AbbasiAWS reinvent 2019 recap - Riyadh - Database and Analytics - Assif Abbasi
AWS reinvent 2019 recap - Riyadh - Database and Analytics - Assif Abbasi
 
AWS reinvent 2019 recap - Riyadh - Network and Security - Anver Vanker
AWS reinvent 2019 recap - Riyadh - Network and Security - Anver VankerAWS reinvent 2019 recap - Riyadh - Network and Security - Anver Vanker
AWS reinvent 2019 recap - Riyadh - Network and Security - Anver Vanker
 
AWS reinvent 2019 recap - Riyadh - AI And ML - Ahmed Raafat
AWS reinvent 2019 recap - Riyadh - AI And ML - Ahmed RaafatAWS reinvent 2019 recap - Riyadh - AI And ML - Ahmed Raafat
AWS reinvent 2019 recap - Riyadh - AI And ML - Ahmed Raafat
 
Demistifying serverless on aws
Demistifying serverless on awsDemistifying serverless on aws
Demistifying serverless on aws
 
Amazon SageMaker Build, Train and Deploy Your ML Models
Amazon SageMaker Build, Train and Deploy Your ML ModelsAmazon SageMaker Build, Train and Deploy Your ML Models
Amazon SageMaker Build, Train and Deploy Your ML Models
 
AWS Technical Day Riyadh Nov 2019 - The art of mastering data protection on aws
AWS Technical Day Riyadh Nov 2019 - The art of mastering data protection on awsAWS Technical Day Riyadh Nov 2019 - The art of mastering data protection on aws
AWS Technical Day Riyadh Nov 2019 - The art of mastering data protection on aws
 
AWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in aws
AWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in awsAWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in aws
AWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in aws
 
AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]
 
AWS Amplify
AWS AmplifyAWS Amplify
AWS Amplify
 
EC2 and S3 Level 100
EC2 and S3 Level 100EC2 and S3 Level 100
EC2 and S3 Level 100
 
Devops on AWS
Devops on AWSDevops on AWS
Devops on AWS
 
Blockchain on AWS
Blockchain on AWSBlockchain on AWS
Blockchain on AWS
 
AWS AI Services
AWS AI ServicesAWS AI Services
AWS AI Services
 
AWS Cloudformation Session 01
AWS Cloudformation Session 01AWS Cloudformation Session 01
AWS Cloudformation Session 01
 
AWS Cloud Security
AWS Cloud SecurityAWS Cloud Security
AWS Cloud Security
 
AWS Messaging
AWS MessagingAWS Messaging
AWS Messaging
 
Amazon Virtual Private Cloud - VPC 2
Amazon Virtual Private Cloud - VPC 2Amazon Virtual Private Cloud - VPC 2
Amazon Virtual Private Cloud - VPC 2
 
Amazon Virtual Private Cloud - VPC 1
Amazon Virtual Private Cloud - VPC 1Amazon Virtual Private Cloud - VPC 1
Amazon Virtual Private Cloud - VPC 1
 
Containers on AWS
Containers on AWSContainers on AWS
Containers on AWS
 

Recently uploaded

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Recently uploaded (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Formation

  • 1. 1© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | 1© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Asif Abbasi – Specialist SA Analytics 03.Nov.2019 Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Formation @masifabbasi
  • 2. 2© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Agenda • Analytics Ecosystem • Why did we build AWS Lake Formation? • What is AWS Lake Formation? • How does AWS Lake Formation help you?
  • 3. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Analytics Value Stream idea insight COLLECT STORE PROCESS/ ANALYZE CONSUME time to first answer
  • 4. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agile Analytics • Experiment • Invest in and scale up success • Fail fast • Adapt and evolve rapidly
  • 5. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. A Modern Data Platform • Treat data as a reusable asset (while keeping the cost of reuse low) • Apply an open set of processing strategies in parallel
  • 6. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Lake Store all your data, forever, at every stage of its lifecycle Apply it using the right tool for the job
  • 7. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Storage Foundations: Amazon S3
  • 8. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Events and Lifecycle Management Standard Active data Archive dataInfrequently accessed data Standard - Infrequent Access Amazon Glacier Create Delete
  • 9. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Benefits of Using S3 as Your Data Lake Foundation • Unlimited number of objects and volume • 99.99% availability • 99.999999999% durability • Versioning • Tiered storage via lifecycle policies • SSL, client/server-side encryption at rest • Low cost (~ $2700/month/100 TB) • Natively supported by big data frameworks (eg. Spark, Hive, Presto) • Decouples storage and compute • Run transient compute clusters (with Amazon EC2 Spot Instances) • Multiple, heterogeneous clusters can use same data
  • 10. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ingest Kinesis Firehose
  • 11. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Catalog: Discover and Govern Your Data Kinesis Firehose
  • 12. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Glue Data Catalog Hive metastore-compatible, highly-available metadata repository: • Search metadata for data discovery • Connection info – JDBC URLs, credentials • Classification for identifying and parsing files • Versioning of table metadata as schemas evolve and other metadata are updated • Table definitions – usable by Redshift, Athena, Glue, EMR Populate using Hive DDL, bulk import, or automatically through crawlers.
  • 13. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Catalog and Query Multiple Sources
  • 14. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Access Control and Auditing Kinesis Firehose
  • 15. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Access Control and Auditing IAM Amazon S3 Amazon DynamoDB Amazon EMR Amazon Kinesis Amazon Athena Service API Access AWS CloudTrail Identity and Access Management • Security at the API and data level 3rd party ecosystem security tools • Blue Talon, Apache Ranger, Sentry, etc Storage-level access logging and audit • AWS CloudTrail logs API calls • S3 access logging • Log analytics with Athena Encryption • AWS Server-Side Encryption • AWS Key Management Service • AWS CloudHSM • Custom materials providers
  • 16. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Process and Analyze Kinesis Firehose Athena Query Service
  • 17. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Job Authoring with AWS Glue • Python/Scala code generated by AWS Glue • Connect a notebook or IDE to AWS Glue • Existing code brought into AWS Glue
  • 18. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Job Execution with AWS Glue • Schedule-based • Event-based • On demand
  • 19. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless Queries: Amazon Athena • Interactive queries • ANSI SQL • No infrastructure or administration • Zero spin up time • Query data in its raw format • AVRO, Text, CSV, JSON, weblogs, AWS service logs • Convert to an optimized form like ORC or Parquet for the best performance and lowest cost • No loading of data, no ETL required • Stream data from directly from Amazon S3 • Take advantage of Amazon S3 durability and availability
  • 20. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Warehousing: Amazon Redshift Managed, massively parallel, petabyte-scale, relational data warehouse Scale from 160GB to 2PB online Automatic streaming backup/restore to S3, Automatic failover and recovery ANSI SQL interface Load data from S3, DynamoDB and EMR
  • 21. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Query Exabytes of Data: Redshift Spectrum Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Data Catalog Apache Hive Metastore Run Amazon Redshift SQL queries against exabytes of data in Amazon S3
  • 22. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Managed Hadoop Framework: Amazon EMR Scalable compute clusters as a service • Create ephemeral clusters – only pay for the compute you need • Integrated with S3 (EMRFS) • Use spot instances to reduce processing time AND costs • Dynamically scale up and down • Data encryption at-rest and in-transit
  • 23. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Business Analytics Service: Amazon QuickSight Deep Integration with AWS Data Sources Amazon RDS, Aurora Amazon Redshift Amazon Athena Amazon S3 Flat Files • Fully managed • 1/10 cost of traditional BI solutions • Integrated with AWS data sources and third-party sources • SPICE – Super-fast, Parallel, In- memory, Calculation Engine • Collaborate, share and publish data sets and analyses
  • 24. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Building a Data Strategy on AWS Kinesis Firehose Athena Query Service
  • 25. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. An Open Ecosystem of Applications Processing & Analytics BI & Data Visualization Kinesis Streams & Firehose Batch EMR Hadoop, Spark, Presto Redshift Data Warehouse Athena Query Service AWS Batch Predictive Real-time AWS Lambda Apache Storm on EMR Apache Flink on EMR Spark Streaming on EMR Elasticsearch Service Kinesis Analytics, Kinesis Streams EastiCache DAX Online/Transactional DynamoDB (NoSQL) Aurora (Relational) Neptun e (Graph)
  • 26. 26© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | 26© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Data Lakes
  • 27. 27© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale
  • 28. 28© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Why data lakes? Data Lakes provide: Relational and non-relational data Scale-out to EBs Diverse set of analytics and machine learning tools Work on data without any data movement Designed for low cost storage and analytics OLTP ERP CRM LOB Data Warehouse Business Intelligence Data Lake 1001100001001010111 0010101011100101010 0001011111011010 0011110010110010110 0100011000010 Devices Web Sensors Social Catalog Machine Learning DW Queries Big data processing Interactive Real-time
  • 29. 29© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Any analytic workload, any scale, at the lowest possible cost AWS Direct Connect AWS Snowball AWS Snowmobile AWS Database Migration Service AWS IoT Core Amazon Kinesis Data Firehose Amazon Kinesis Data Streams Amazon Kinesis Video Streams On-premises Data Movement Amazon SageMaker AWS Deep Learning AMIs Amazon Rekognition Amazon Lex AWS DeepLens Amazon Comprehend Amazon Translate Amazon Transcribe Amazon Polly Amazon Athena Amazon EMR Amazon Redshift Amazon Elasticsearch Service Amazon Kinesis Amazon QuickSight Analytics Machine Learning Real-time Data Movement
  • 30. 30© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Typical steps of building a data lake Setup storage1 Move data2 Cleanse, prep, and catalog data 3 Configure and enforce security and compliance policies 4 Make data available for analytics 5
  • 31. 31© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Building data lakes can still take months
  • 32. 32© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Data preparation accounts for ~80% of the work Building training sets Cleaning and organizing data Collecting data sets Mining data for patterns Refining algorithms Other
  • 33. 33© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Sample of steps required Find sources Create Amazon Simple Storage Service (Amazon S3) locations Configure access policies Map tables to Amazon S3 locations ETL jobs to load and clean data Create metadata access policies Configure access from analytics services Rinse and repeat for other: data sets, users, and end-services And more: manage and monitor ETL jobs update metadata catalog as data changes update policies across services as users and permissions change manually maintain cleansing scripts create audit processes for compliance … Manual | Error-prone | Time consuming
  • 34. 34© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Enforce security policies across multiple services Gain and manage new insights Identify, ingest, clean, and transform data Build a secure data lake in days AWS Lake Formation
  • 35. 35© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | How it works
  • 36. 36© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Register existing data or import new Amazon S3 forms the storage layer for Lake Formation Register existing S3 buckets that contain your data Ask Lake Formation to create required S3 buckets and import data into them Data is stored in your account. You have direct access to it. No lock-in. Data Lake Storage Data Catalog Access Control Data import Lake Formation Crawlers ML-based data prep
  • 37. 37© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Easily load data to your data lake logs DBs Blueprints Data Lake Storage Data Catalog Access Control Data import Lake Formation Crawlers ML-based data prep one-shot incremental
  • 38. 38© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | With blueprints You 1. Point us to the source 2. Tell us the location to load to in your data lake 3. Specify how often you want to load the data Blueprints 1. Discover the source table(s) schema 2. Automatically convert to the target data format 3. Automatically partition the data based on the partitioning schema 4. Keep track of data that was already processed 5. You can customize any of the above
  • 39. 39© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Blueprints build on AWS Glue
  • 40. 40© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Easily de-duplicate your data with ML transforms
  • 41. 41© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Fuzzy de-duplication – under the hood
  • 42. 42© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Fuzzy de-duplication – Innovations 400M+ 7.5B+ 2.5
  • 43. 43© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Secure once, access in multiple ways Data Lake Storage Data Catalog Access Control Lake Formation Admin
  • 44. 44© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Security permissions in Lake Formation Control data access with simple grant and revoke permissions Specify permissions on tables and columns rather than on buckets and objects Easily view policies granted to a particular user Audit all data access at one place
  • 45. 45© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Security permissions in Lake Formation Search and view permissions granted to a user, role, or group in one place Verify permissions granted to a user Easily revoke policies for a user
  • 46. 46© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Grant table and column-level permissions User 1 User 2
  • 47. 47© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Security – deep dive User IAM users, roles Active Directory Amazon S3
  • 48. 48© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Search and collaborate across multiple users Text-based, faceted search across all metadata Add attributes like Data owners, stewards, and other as table properties Add data sensitivity level, column definitions, and others as column properties Text-based search and filtering Query data in Amazon Athena
  • 49. 49© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Audit and monitor in real time See detailed alerts in the console Download audit logs for further analytics Data ingest and catalog notifications also published to Amazon CloudWatch events
  • 50. 50© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Example: a data lake in 3 easy steps
  • 51. 51© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Step 1: Blueprints to ingest data
  • 52. 52© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Monitor the import 1
  • 53. 53© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Imported data as table in the data lake
  • 54. 54© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Step 2: Grant permissions to securely share data
  • 55. 55© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Step 3: Run query in Amazon Athena
  • 56. 56© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | AWS Lake Formation Pricing No additional charges – Only pay for the underlying services used.
  • 57. 57© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Customer interest “We are very excited about the launch of AWS Lake Formation, which provides a central point of control to easily load, clean, secure, and catalog data from thousands of clients to our AWS-based data lake, dramatically reducing our operational load. … Additionally, AWS Lake Formation will be HIPAA compliant from day one …” Aaron Symanski, CTO, Change Healthcare “I can’t wait for my team to get our hands on AWS Lake Formation. With an enterprise-ready option like Lake Formation, we will be able to spend more time deriving value from our data rather than doing the heavy lifting involved in manually setting up and managing our data lake.” Joshua Couch, VP Engineering, Fender Digital
  • 58. 58© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Thank you!