© 2020, Amazon Web Services, Inc. or its Affiliates.
Cobus Bernard
Senior Developer Advocate
Amazon Web Services
AWS Lake Formation
Deep Dive
What’s coming up:
Bonus
• QR codes
• Data Lake refresher
• AWS Lake Formation & Related Services
• Use-case examples
• Going forward
• QA
Poll – Which of these are you using?
• S3 Lifecycle policy
• Glue ETL with ML
• Athena to query Relational Databases
• AWS Datalake
Data Lake: Amazon S3 | AWS Glue
Data Movement (on-premises): AWS Direct Connect | AWS Snowball | AWS Snowmobile | AWS Database Migration Service
Data Movement (real-time): AWS IoT Core | Amazon Kinesis Data Firehose | Amazon Kinesis Data Streams | Amazon Kinesis Video Streams
Machine Learning: Amazon SageMaker | AWS Deep Learning AMIs | Amazon Rekognition | Amazon Lex | AWS DeepLens | Amazon Comprehend | Amazon Translate | Amazon Transcribe | Amazon Polly
Analytics: Amazon Athena | Amazon EMR | Amazon Redshift | Amazon Elasticsearch Service | Amazon Kinesis | Amazon QuickSight
Data Lake Refresher
Concepts & flow
A data lake is a centralized repository that allows you
to store all your structured and unstructured data at
any scale
Data Lake Definition
• All data in one place, a single source of truth
• Support Different Formats - structured/semi-structured/unstructured/raw data
• Supports fast ingestion and consumption
• Schema on read
• Designed for low-cost storage
• Decouples storage and compute
• Supports protection and security rules
Data Lake Main Concepts
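As a rough illustration of "schema on read", the sketch below stores raw records untouched and applies a schema only when they are read. The field names, sample records, and the `read_with_schema` helper are hypothetical and not from the talk.

```python
import json

# Raw events are stored as-is; nothing validates them at write time.
RAW_LINES = [
    '{"device": "sensor-1", "temp": "21.5", "ts": "2020-03-01T10:00:00"}',
    '{"device": "sensor-2", "temp": "19.0", "extra": "ignored"}',
]

# The schema belongs to the reader, not to the storage layer.
SCHEMA = {"device": str, "temp": float}


def read_with_schema(lines, schema):
    """Parse raw JSON lines, keeping and casting only the fields the reader asks for."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({name: cast(record[name]) for name, cast in schema.items()})
    return rows


rows = read_with_schema(RAW_LINES, SCHEMA)
```

A different consumer could read the same raw lines with a different schema, which is the point of deferring the schema to read time.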
Traditional Data lake
Then:
- Enterprise Data Warehouse
- Batch-based ETL for BI analysis
- Dashboarding tools
Data silos to
DW Silo 1 (sources: ERP | CRM | LOB): Business Intelligence
DW Silo 2 (sources: Devices | Web | Sensors | Social): Business Intelligence
Modern Data lake
Now:
- More data, less structured
- Accessed in various ways
- Real-time streaming
- Machine learning, scientific,
analytics, regulatory requirements
- Low cost storage & analytics
Sources: OLTP | ERP | CRM | LOB | Devices | Web | Sensors | Social
Data Warehouse: Business Intelligence
Data Lake: Catalog | Machine Learning | DW Queries | Big data processing | Interactive | Real-time
Why choose AWS for data lakes and analytics?
• Most secure infrastructure for analytics
• Most scalable and cost effective
• Easiest to build data lakes and analytics
• Most comprehensive and open
Moving to data lake architectures
Extends or evolves DW architectures
Store any data in any format
Durable, available, and exabyte scale
Secure, compliant, auditable
Run any type of analytics from DW to Predictive
Data lake: Data Warehousing | Analytics | Machine Learning
More data lakes and analytics than anywhere else
Tens of thousands of data lakes run on AWS across all industries
Workflow of AWS services for big data
End-to-end Pipeline
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Simplified Data Pipeline
Stages: Data Sources | Ingest | Store (Amazon S3) | Catalog | Process & Analyze | Consume
Multiple Data Sources
Data sources: Amazon DynamoDB | Web logs / cookies | ERP | Connected devices
Stages: Ingest | Store (Amazon S3) | Catalog | Process & Analyze | Consume
Amazon DynamoDB
Fully managed, multi-region, multi-master database
Nonrelational database that delivers reliable performance at any scale
Consistent single-digit millisecond latency
Built-in security, backup and restore, in-memory Caching
Support Streams
Ingestion Options
Data sources: Amazon DynamoDB | Web logs / cookies | ERP | Connected devices
Ingest: Amazon Kinesis | AWS Snowball | Amazon MSK | Database Migration Service
Stages: Store (Amazon S3) | Catalog | Process & Analyze | Consume
Amazon Kinesis
Data Streams
• For technical developers
• Build your own custom
applications that process
or analyze streaming data
Amazon Kinesis
Data Firehose
• For all developers, data
scientists
• Easily load massive
volumes of streaming data
into S3, Amazon Redshift,
and Amazon Elasticsearch
Amazon Kinesis
Data Analytics
• For all developers, data
scientists
• Easily analyze data streams
using standard SQL queries
Amazon Kinesis: Streaming Data Made Easy
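To make the difference between Data Streams and Data Firehose a little more concrete, here is a minimal sketch that builds the request arguments for each service's `put_record` call in boto3. The stream names, the sample event, and the helper functions are hypothetical; only the argument shapes follow the boto3 API.

```python
import json


def build_stream_record(stream_name, event, partition_field):
    """Arguments for the Kinesis Data Streams put_record call (your own consumers read it)."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        # Records with the same partition key land on the same shard.
        "PartitionKey": str(event[partition_field]),
    }


def build_firehose_record(delivery_stream_name, event):
    """Arguments for the Kinesis Data Firehose put_record call (delivered to S3, Redshift, ...)."""
    return {
        "DeliveryStreamName": delivery_stream_name,
        # A trailing newline keeps one JSON document per line in the delivered files.
        "Record": {"Data": (json.dumps(event) + "\n").encode("utf-8")},
    }


event = {"device": "sensor-1", "temp": 21.5}
stream_args = build_stream_record("example-stream", event, "device")
firehose_args = build_firehose_record("example-delivery-stream", event)

# With boto3 installed and AWS credentials configured, these could be sent as:
#   boto3.client("kinesis").put_record(**stream_args)
#   boto3.client("firehose").put_record(**firehose_args)
```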
Storage Layer
Data sources: Amazon DynamoDB | Web logs / cookies | ERP | Connected devices
Ingest: Amazon Kinesis | AWS Snowball | Amazon MSK | Database Migration Service
Store: Amazon S3
Stages: Catalog | Process & Analyze | Consume
Secure, highly scalable, durable object storage with millisecond
latency for data access
Store any type of data (from web sites, mobile apps, corporate
applications, and IoT sensors) at any scale
Store data in the format you want:
Unstructured (logs, dump files) | semi-structured (JSON, XML) | structured (CSV, Parquet)
Storage lifecycle integration
Amazon S3-Standard | Amazon S3-Infrequent Access | Amazon Glacier
Amazon S3 is the Base
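The storage lifecycle integration mentioned above can be sketched as an S3 lifecycle rule that moves objects from Standard to Infrequent Access and then to Glacier. The bucket name, prefix, and day thresholds below are hypothetical; the dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`.

```python
def build_lifecycle_request(bucket, prefix, ia_after_days=30, glacier_after_days=90):
    """Build a lifecycle rule: Standard -> Standard-IA -> Glacier for one prefix."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("objects must reach Infrequent Access before Glacier")
    return {
        "Bucket": bucket,
        "LifecycleConfiguration": {
            "Rules": [
                {
                    "ID": f"tier-{prefix.strip('/')}",
                    "Filter": {"Prefix": prefix},
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                        {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                    ],
                }
            ]
        },
    }


lifecycle_request = build_lifecycle_request("example-data-lake", "raw/")

# With boto3 and AWS credentials:
#   boto3.client("s3").put_bucket_lifecycle_configuration(**lifecycle_request)
```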
Data Discovery and Catalog
Data sources: Amazon DynamoDB | Web logs / cookies | ERP | Connected devices
Ingest: Amazon Kinesis | AWS Snowball | Amazon MSK | Database Migration Service
Store: Amazon S3
Catalog: AWS Glue
Stages: Process & Analyze | Consume
Automatically discovers data and stores schema
Data searchable, and available for ETL
Generates customizable code
Schedules and runs your ETL jobs
Serverless
AWS Glue - Serverless Data Catalog and ETL
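The automatic discovery described above is done by a Glue crawler. As a minimal sketch, the helper below builds the arguments for boto3's `create_crawler` call; the crawler name, role ARN, database name, S3 path, and schedule are all hypothetical placeholders.

```python
def build_crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Arguments for glue.create_crawler: crawl one S3 path into one catalog database."""
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule:
        # Glue schedules use cron syntax, e.g. "cron(0 2 * * ? *)" for 02:00 UTC daily.
        request["Schedule"] = schedule
    return request


crawler_request = build_crawler_request(
    name="example-raw-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="example_lake_db",
    s3_path="s3://example-data-lake/raw/",
    schedule="cron(0 2 * * ? *)",
)

# With boto3 and AWS credentials:
#   glue = boto3.client("glue")
#   glue.create_crawler(**crawler_request)
#   glue.start_crawler(Name=crawler_request["Name"])
```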
Process and Analyze
Data sources: Amazon DynamoDB | Web logs / cookies | ERP | Connected devices
Ingest: Amazon Kinesis | AWS Snowball | Amazon MSK | Database Migration Service
Store: Amazon S3
Catalog: AWS Glue
Process & Analyze: Amazon Athena | Amazon EMR | Amazon Redshift | Amazon Elasticsearch
Stages: Consume
Interactive query service to analyze data in Amazon
S3 using standard SQL
No infrastructure to set up or manage and no data
to load
Supports Multiple Data Formats – Define Schema
on Demand
Amazon Athena - Interactive Analysis
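Querying S3 data with standard SQL can be sketched as a single Athena request. The helper below builds the arguments for boto3's `start_query_execution`; the table, database, and results bucket are hypothetical names.

```python
def build_query_request(sql, database, output_location):
    """Arguments for athena.start_query_execution."""
    if not output_location.startswith("s3://"):
        raise ValueError("Athena writes query results to an S3 location")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


query_request = build_query_request(
    "SELECT device, avg(temp) AS avg_temp FROM readings GROUP BY device",
    database="example_lake_db",
    output_location="s3://example-athena-results/queries/",
)

# With boto3 and AWS credentials:
#   boto3.client("athena").start_query_execution(**query_request)
```

The query runs asynchronously, so a real client would poll the execution status before fetching results.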
Querying the Data Lake
Data sources: Amazon DynamoDB | Web logs / cookies | ERP | Connected devices
Ingest: Amazon Kinesis | AWS Snowball | Amazon MSK | Database Migration Service
Store: Amazon S3
Catalog: AWS Glue
Process & Analyze: Amazon Athena | Amazon EMR | Amazon Redshift | Amazon Elasticsearch
Consume: BI Tools | Jupyter Notebooks | Amazon API Gateway | Amazon QuickSight
Amazon QuickSight
Supports a variety of data sources and targets
Fully managed and scalable
Super fast and easy to use
Cost-effective
Poll – Which do you struggle most with?
• Data migration/ingestion
• Data processing/ETL
• Data access security/auditing
• Cancelling your line with Telkom
Data preparation accounts for ~80% of the work
Building training sets
Cleaning and organizing data
Collecting data sets
Mining data for patterns
Refining algorithms
Other
Sample of steps required
Find sources
Create Amazon Simple Storage Service (Amazon S3) locations
Configure access policies
Map tables to Amazon S3 locations
ETL jobs to load and clean data
Create metadata access policies
Configure access from analytics services
Rinse and repeat for other:
data sets, users, and end-services
And more:
manage and monitor ETL jobs
update metadata catalog as data changes
update policies across services as users and permissions change
manually maintain cleansing scripts
create audit processes for compliance
…
Manual | Error-prone | Time consuming
Lake Formation
Data lake infrastructure & management
S3/Glacier | AWS Glue | Lake Formation
Build a secure data lake in days
with AWS Lake Formation
Amazon S3 data lake storage
AWS Lake Formation
Simplified ingest and cleaning enables data
engineers to build faster
Centralized management of fine-grained
permissions empowers security officers
Comprehensive set of integrated tools enables
consistent user access
AWS Glue | Blueprints | ML Transforms | Data catalog | Access control
Top features of Lake Formation
• Enhanced governance layer - security and governance layer at the
Data Catalog level
• Blueprints / Data Importers - templates for ETL, metadata (schema)
and partition management
• ML Transformations – ML algorithms that customers can use to create
their own ML Transforms (e.g. record de-duplication)
• Enhanced Data Catalog - enable users to record more metadata and
tag Data Catalog objects (i.e. databases, tables, columns)
Easier way to load data to the lake
Logs
DBs
Blueprints
One-shot
Incremental
How it works
Lake Formation
Security
Security permissions in Lake Formation
Search and view permissions granted
to a user, role, or group in one place
Verify permissions granted to a user
Easily revoke policies for a user
Security permissions in Lake Formation
Control data access with simple grant
and revoke permissions
Specify permissions on tables and
columns rather than on buckets and
objects
Easily view policies granted to a
particular user
Audit all data access in one place
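Granting on tables and columns rather than on buckets and objects might look like the sketch below, which builds the arguments for Lake Formation's `grant_permissions` call in boto3. The principal ARN, database, table, and column names are hypothetical; the same argument shape is accepted by `revoke_permissions`.

```python
def build_column_grant(principal_arn, database, table, columns, permissions=("SELECT",)):
    """Arguments for lakeformation.grant_permissions (or revoke_permissions) on specific columns."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": list(permissions),
    }


grant = build_column_grant(
    principal_arn="arn:aws:iam::123456789012:user/example-analyst",
    database="example_lake_db",
    table="patients",
    columns=["patient_id", "city"],  # sensitive columns are simply left out
)

# With boto3 and AWS credentials:
#   lf = boto3.client("lakeformation")
#   lf.grant_permissions(**grant)    # grant column-level SELECT
#   lf.revoke_permissions(**grant)   # later, revoke the same permission
```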
Roles in a data lake project
IAM Administrator: Provisions AWS resources and accounts
e.g.
- Create Data Lake Admin
- Create S3 bucket
- Create Roles/Policies
Data Lake Admin: Manages Data Lake Resources
e.g.
- Registers Data Lake
- Secure data lake/add access controls
Data Lake Developer/Analyst: Access/Query Data Lake
e.g.
- Query Data Lake
- Creates ML Transforms
- Writes ETL Jobs
- Visualization using Amazon QuickSight
Configuration and Execution Steps
Step | Activity | User
1 | Provision the data lake administrator, the data lake analyst/developer, and the data lake and Glue roles | IAM Administrator
2 | Register a data lake | Data Lake Administrator
3 | Assign Lake Formation permissions to the data lake analyst | Data Lake Administrator
4 | Create an AWS Glue database | Data Lake Administrator
5 | Crawl and catalog patient data in AWS Glue | Data Lake Analyst
6 | Assign table permissions to the data lake analyst | Data Lake Administrator
7 | Observe the data pattern and duplicates in the data using Amazon Athena | Data Lake Analyst
8 | Create, teach, and tune an AWS Lake Formation ML Transform | Data Lake Analyst
9 | Create an AWS Glue ETL job that uses the ML Transform for data deduplication | Data Lake Analyst
10 | Catalog de-duplicated data and query it using Amazon Athena | Data Lake Analyst
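Steps 2 and 4 above (registering the data lake location and creating a Glue database) can be sketched as two request builders for boto3's `register_resource` and `create_database` calls. The bucket, prefix, and database names are hypothetical, and using the service-linked role is an assumption, not something the talk specifies.

```python
def build_register_request(bucket, prefix=""):
    """Arguments for lakeformation.register_resource: register an S3 location with the data lake."""
    arn = f"arn:aws:s3:::{bucket}"
    if prefix:
        arn += "/" + prefix.strip("/")
    # Assumption: let Lake Formation use its service-linked role to access the location.
    return {"ResourceArn": arn, "UseServiceLinkedRole": True}


def build_database_request(name, description=""):
    """Arguments for glue.create_database: a catalog database to hold the crawled tables."""
    return {"DatabaseInput": {"Name": name, "Description": description}}


register_request = build_register_request("example-data-lake", "raw/")
database_request = build_database_request("example_lake_db", "Patient records demo")

# With boto3 and AWS credentials, run as the data lake administrator:
#   boto3.client("lakeformation").register_resource(**register_request)
#   boto3.client("glue").create_database(**database_request)
```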
Secure once, access in multiple ways
Data Lake Storage
Data
Catalog
Access
Control
Lake Formation
Admin
Search and collaborate across multiple users
Text-based, faceted search across
all metadata
Add attributes like Data owners,
stewards, and other as table
properties
Add data sensitivity level, column
definitions, and others as column
properties
Text-based search and filtering
Query data in Amazon Athena
Enhanced Governance Layer
AWS Lake Formation provides a security and governance layer at the Data
Catalog level. Users can grant or revoke permissions to the Data Catalog
objects such as databases, tables and columns for IAM principals (IAM
users and roles). This functionality will be extended to row level access in
subsequent releases.
Lake Formation
Glue/Blueprints
AWS Glue
Less hassle
Integrated across AWS: supports
Aurora, RDS, Redshift, S3, and common
database engines in your VPC running
on EC2
Serverless
Serverless: no infrastructure to provision
or manage
More power
Automatically generates the code to
execute your data transformations and
loading processes
Simple, flexible, and cost-effective ETL & Data Catalog
Blueprints build on AWS Glue
Blueprints / Data Importers
Blueprints are templates for data ingestion, transformation, metadata
(schema) and partition management. Blueprints help customers to
quickly and easily build and maintain a data lake.
With blueprints
You
1. Point us to the source
2. Tell us the location to load
to in your data lake
3. Specify how often you
want to load the data
Blueprints
1. Discover the source table(s)
schema
2. Automatically convert to the target
data format
3. Automatically partition the data
based on the partitioning schema
4. Keep track of data that was
already processed
5. You can customize any of the
above
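The "keep track of data that was already processed" step is what makes incremental loads possible. The sketch below shows the general bookmark idea in plain Python; it is an illustration of the concept only, not how blueprints are implemented, and the column name and sample rows are hypothetical.

```python
def select_new_rows(rows, bookmark_column, last_bookmark):
    """Return rows beyond the last bookmark, plus the bookmark to store for the next run."""
    new_rows = [
        row for row in rows
        if last_bookmark is None or row[bookmark_column] > last_bookmark
    ]
    # If nothing new arrived, keep the old bookmark.
    new_bookmark = max((row[bookmark_column] for row in new_rows), default=last_bookmark)
    return new_rows, new_bookmark


source = [{"id": 1}, {"id": 2}, {"id": 3}]

# First run: no bookmark yet, so everything is loaded.
first_load, bookmark = select_new_rows(source, "id", None)

# New data arrives; the second run picks up only what was not processed before.
source.append({"id": 4})
second_load, bookmark = select_new_rows(source, "id", bookmark)
```

This only works when the bookmark column increases monotonically, such as an auto-incrementing key or an insertion timestamp.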
Glue Features and Tips
• Use VPC Endpoint for Glue, Athena and S3
• Daisy-chain JDBC jobs to avoid startup pain
• Use Lambda for tricks that aren’t yet available
Lambda with Glue
• Automatically start an AWS Glue job when a file is uploaded to S3
• Use Lambda to Auto-Run a newly created Glue Crawler
• Use CloudWatch with Lambda to receive
SNS notices for specific Glue Events
• Create & Enable Glue Trigger in Lambda
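The first pattern above, starting a Glue job when a file lands in S3, could be sketched as the Lambda handler below. The job name and the `--source_path` job argument are hypothetical; the event shape is the standard S3 notification format, and `start_job_run` is the boto3 Glue call.

```python
from urllib.parse import unquote_plus

GLUE_JOB_NAME = "example-etl-job"  # hypothetical job name


def build_job_runs(event, job_name=GLUE_JOB_NAME):
    """Translate an S3 notification event into one start_job_run request per uploaded object."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        runs.append({
            "JobName": job_name,
            "Arguments": {"--source_path": f"s3://{bucket}/{key}"},
        })
    return runs


def lambda_handler(event, context):
    """Lambda entry point: start a Glue job run for every object in the event."""
    import boto3  # provided by the AWS Lambda Python runtime

    glue = boto3.client("glue")
    return [glue.start_job_run(**run)["JobRunId"] for run in build_job_runs(event)]
```

Keeping the event parsing separate from the AWS call makes the handler easy to unit test without credentials.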
Lake Formation
ML Transformations
ML Transformations
Deduplication & Fuzzy Matching of your Data
ML Transforms
1. Create a new ML Transform
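Outside the console, creating the transform could be sketched as a request for boto3's `create_ml_transform` call with the `FIND_MATCHES` transform type. The transform name, role ARN, table names, primary key column, and the trade-off value are hypothetical.

```python
def build_find_matches_request(name, role_arn, database, table, primary_key,
                               precision_recall=0.5):
    """Arguments for glue.create_ml_transform using the FindMatches transform."""
    if not 0.0 <= precision_recall <= 1.0:
        raise ValueError("precision_recall must be between 0.0 and 1.0")
    return {
        "Name": name,
        "Role": role_arn,
        "InputRecordTables": [{"DatabaseName": database, "TableName": table}],
        "Parameters": {
            "TransformType": "FIND_MATCHES",
            "FindMatchesParameters": {
                "PrimaryKeyColumnName": primary_key,
                # Closer to 1.0 favours precision, closer to 0.0 favours recall.
                "PrecisionRecallTradeoff": precision_recall,
            },
        },
    }


transform_request = build_find_matches_request(
    name="example-patient-dedup",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="example_lake_db",
    table="patients",
    primary_key="patient_id",
    precision_recall=0.9,
)

# With boto3 and AWS credentials:
#   boto3.client("glue").create_ml_transform(**transform_request)
```

The transform still has to be taught with labelled examples before it is used in an ETL job.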
2.
Fuzzy deduplication – Under the hood
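To give a feel for fuzzy deduplication in general, the sketch below compares every pair of records with a simple string-similarity score from Python's `difflib`. This is a deliberately naive stand-in and not the algorithm FindMatches uses, which is a trained ML model; the records, fields, and threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_matches(records, fields, threshold=0.85):
    """Return index pairs of records whose average field similarity meets the threshold."""
    matches = []
    for (i, left), (j, right) in combinations(enumerate(records), 2):
        score = sum(similarity(left[f], right[f]) for f in fields) / len(fields)
        if score >= threshold:
            matches.append((i, j))
    return matches


patients = [
    {"name": "John Smith", "city": "Cape Town"},
    {"name": "Jon Smith", "city": "Cape Town"},
    {"name": "Mary Jones", "city": "Durban"},
]
duplicates = find_matches(patients, fields=["name", "city"])
```

Comparing all pairs grows quadratically with the number of records, which is one reason a managed, learned approach is attractive at data lake scale.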
What next?
Recommended things to check out
http://bit.ly/LFML-2019
Lake Formation Lab
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Free foundational to advanced digital courses cover AWS services and teach
architecting best practices
AWS Training and Certification
Visit aws.amazon.com/training/path-architecting/
Classroom offerings, including Architecting on AWS, feature
AWS expert instructors and hands-on labs
Validate expertise with the AWS Certified Solutions Architect - Associate or
AWS Certified Solutions Architect - Professional exams
Resources created by the experts at AWS to propel your organization and career forward
AWS Big Data blogs
Real deep, use-cases, step by step.
• “Use AWS Glue to run ETL jobs against non-native JDBC data
sources”
• “Matching patient records with the AWS Lake Formation
FindMatches transform”
• “Discovering metadata with AWS
Lake Formation”
https://aws.amazon.com/blogs/?awsf.blog-master-category=category%23big-data
Q&A
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature...
Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature...Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature...
Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature...Amazon Web Services Korea
 
Real-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisReal-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisAmazon Web Services
 
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019Amazon Web Services Korea
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWSGary Stafford
 
데이터 분석플랫폼을 위한 데이터 전처리부터 시각화까지 한번에 보기 - 노인철 AWS 솔루션즈 아키텍트 :: AWS Summit Seoul ...
데이터 분석플랫폼을 위한 데이터 전처리부터 시각화까지 한번에 보기 - 노인철 AWS 솔루션즈 아키텍트 :: AWS Summit Seoul ...데이터 분석플랫폼을 위한 데이터 전처리부터 시각화까지 한번에 보기 - 노인철 AWS 솔루션즈 아키텍트 :: AWS Summit Seoul ...
데이터 분석플랫폼을 위한 데이터 전처리부터 시각화까지 한번에 보기 - 노인철 AWS 솔루션즈 아키텍트 :: AWS Summit Seoul ...Amazon Web Services Korea
 
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...Amazon Web Services Korea
 
Introduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftIntroduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftAmazon Web Services
 
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...Amazon Web Services Korea
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceAmazon Web Services
 
데이터 분석가를 위한 신규 분석 서비스 - 김기영, AWS 분석 솔루션즈 아키텍트 / 변규현, 당근마켓 소프트웨어 엔지니어 :: AWS r...
데이터 분석가를 위한 신규 분석 서비스 - 김기영, AWS 분석 솔루션즈 아키텍트 / 변규현, 당근마켓 소프트웨어 엔지니어 :: AWS r...데이터 분석가를 위한 신규 분석 서비스 - 김기영, AWS 분석 솔루션즈 아키텍트 / 변규현, 당근마켓 소프트웨어 엔지니어 :: AWS r...
데이터 분석가를 위한 신규 분석 서비스 - 김기영, AWS 분석 솔루션즈 아키텍트 / 변규현, 당근마켓 소프트웨어 엔지니어 :: AWS r...Amazon Web Services Korea
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduceAmazon Web Services
 
Speed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSSpeed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSData Science Milan
 
Introduction to Amazon Elastic File System (EFS)
Introduction to Amazon Elastic File System (EFS)Introduction to Amazon Elastic File System (EFS)
Introduction to Amazon Elastic File System (EFS)Amazon Web Services
 
Object Storage: Amazon S3 and Amazon Glacier
Object Storage: Amazon S3 and Amazon GlacierObject Storage: Amazon S3 and Amazon Glacier
Object Storage: Amazon S3 and Amazon GlacierAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 

Was ist angesagt? (20)

Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature...
Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature...Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature...
Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature...
 
Real-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisReal-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon Kinesis
 
AWS Technical Essentials Day
AWS Technical Essentials DayAWS Technical Essentials Day
AWS Technical Essentials Day
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
데이터 분석플랫폼을 위한 데이터 전처리부터 시각화까지 한번에 보기 - 노인철 AWS 솔루션즈 아키텍트 :: AWS Summit Seoul ...
데이터 분석플랫폼을 위한 데이터 전처리부터 시각화까지 한번에 보기 - 노인철 AWS 솔루션즈 아키텍트 :: AWS Summit Seoul ...데이터 분석플랫폼을 위한 데이터 전처리부터 시각화까지 한번에 보기 - 노인철 AWS 솔루션즈 아키텍트 :: AWS Summit Seoul ...
데이터 분석플랫폼을 위한 데이터 전처리부터 시각화까지 한번에 보기 - 노인철 AWS 솔루션즈 아키텍트 :: AWS Summit Seoul ...
 
Building Data Lakes with AWS
Building Data Lakes with AWSBuilding Data Lakes with AWS
Building Data Lakes with AWS
 
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
 
Introduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftIntroduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF Loft
 
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database Service
 
Introduction to AWS Glue
Introduction to AWS Glue Introduction to AWS Glue
Introduction to AWS Glue
 
데이터 분석가를 위한 신규 분석 서비스 - 김기영, AWS 분석 솔루션즈 아키텍트 / 변규현, 당근마켓 소프트웨어 엔지니어 :: AWS r...
데이터 분석가를 위한 신규 분석 서비스 - 김기영, AWS 분석 솔루션즈 아키텍트 / 변규현, 당근마켓 소프트웨어 엔지니어 :: AWS r...데이터 분석가를 위한 신규 분석 서비스 - 김기영, AWS 분석 솔루션즈 아키텍트 / 변규현, 당근마켓 소프트웨어 엔지니어 :: AWS r...
데이터 분석가를 위한 신규 분석 서비스 - 김기영, AWS 분석 솔루션즈 아키텍트 / 변규현, 당근마켓 소프트웨어 엔지니어 :: AWS r...
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
 
Introduction to AWS Glue
Introduction to AWS GlueIntroduction to AWS Glue
Introduction to AWS Glue
 
Speed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSSpeed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWS
 
Introduction to Amazon Elastic File System (EFS)
Introduction to Amazon Elastic File System (EFS)Introduction to Amazon Elastic File System (EFS)
Introduction to Amazon Elastic File System (EFS)
 
Object Storage: Amazon S3 and Amazon Glacier
Object Storage: Amazon S3 and Amazon GlacierObject Storage: Amazon S3 and Amazon Glacier
Object Storage: Amazon S3 and Amazon Glacier
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 

Ähnlich wie AWS Lake Formation Deep Dive

Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Amazon Web Services
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAdir Sharabi
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudAmazon Web Services
 
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Amazon Web Services
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSAmazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
Building-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWSBuilding-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Module 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSModule 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSLam Le
 

AWS Lake Formation Deep Dive

  • 1. AWS Lake Formation Deep Dive. Cobus Bernard, Senior Developer Advocate, Amazon Web Services. © 2020, Amazon Web Services, Inc. or its Affiliates.
  • 2. What’s coming up: • Bonus: QR codes • Data Lake refresher • AWS Lake Formation & Related Services • Use-case examples • Going forward • Q&A
  • 3. Poll – Which of these are you using? • S3 Lifecycle policy • Glue ETL with ML • Athena to query Relational Databases • AWS Datalake. [Diagram: Data Lake (Amazon S3 | AWS Glue); On-premises Data Movement (AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Database Migration Service); Real-time Data Movement (AWS IoT Core, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams); Machine Learning (Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, Amazon Lex, AWS DeepLens, Amazon Comprehend, Amazon Translate, Amazon Transcribe, Amazon Polly); Analytics (Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis, Amazon QuickSight)]
  • 4. Data Lake Refresher: Concepts & flow
  • 5. Data Lake Definition: A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.
  • 6. Data Lake Main Concepts: • All data in one place, a single source of truth • Supports different formats: structured, semi-structured, unstructured, and raw data • Supports fast ingestion and consumption • Schema on read • Designed for low-cost storage • Decouples storage and compute • Supports protection and security rules
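"Schema on read" from the list above can be illustrated with a small sketch: raw records are stored untouched, and each reader applies its own schema when it reads them. The sample records, the `SCHEMA` mapping, and the `read_with_schema` helper are hypothetical names made up for this illustration; they are not from the slides or from any AWS API.

```python
import json

# Raw events land in the lake exactly as produced; nothing is enforced on write.
RAW_LINES = [
    '{"user": "a1", "amount": "12.50", "ts": "2020-05-01"}',
    '{"user": "b2", "amount": "3", "extra": "ignored"}',
]

# Hypothetical schema chosen by the reader at query time: column name -> caster.
SCHEMA = {"user": str, "amount": float}


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto the reader's schema."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for column, cast in schema.items():
            value = record.get(column)
            # Missing fields become None instead of failing the whole read.
            row[column] = cast(value) if value is not None else None
        rows.append(row)
    return rows


rows = read_with_schema(RAW_LINES, SCHEMA)
```

A different consumer could read the same raw lines with a different `SCHEMA` (for example, one that also casts `ts`) without rewriting the stored data.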
  • 7. Traditional Data Lake. Then: - Enterprise data warehouse - Batch-based ETL for BI analysis - Dashboarding tools. [Diagram, data silos: ERP, CRM, LOB; DW Silo 1; Business Intelligence; Devices, Web, Sensors, Social; DW Silo 2; Business Intelligence]
  • 8. Modern Data Lake. Now: - More data, less structured - Accessed in various ways - Real-time streaming - Machine learning, scientific, analytics, regulatory requirements - Low-cost storage & analytics. [Diagram labels: OLTP, ERP, CRM, LOB; Data Warehouse; Business Intelligence; Data Lake; Devices, Web, Sensors, Social; Catalog; Machine Learning; DW Queries; Big data processing; Interactive; Real-time]
  • 9. Why choose AWS for data lakes and analytics? • Most secure infrastructure for analytics • Most scalable and cost effective • Easiest to build data lakes and analytics • Most comprehensive and open
  • 10. Moving to data lake architectures: • Extends or evolves DW architectures • Store any data in any format • Durable, available, and exabyte scale • Secure, compliant, auditable • Run any type of analytics, from DW to predictive. [Diagram labels: Data Warehousing; Analytics; Machine Learning; Data lake]
  • 11. More data lakes and analytics than anywhere else: tens of thousands of data lakes run on AWS across all industries.
  • 12. Workflow of AWS services for big data: End-to-end Pipeline
  • 13. Simplified Data Pipeline. [Diagram stages: Data Sources; Ingest; Store (Amazon S3); Catalog; Process & Analyze; Consume]
  • 14. Multiple Data Sources. [Diagram: Data sources (Amazon DynamoDB, web logs / cookies, ERP, connected devices); Ingest; Store (Amazon S3); Catalog; Process & Analyze; Consume]
  • 15. Amazon DynamoDB: • Fully managed, multi-region, multi-master database • Nonrelational database that delivers reliable performance at any scale • Consistent single-digit millisecond latency • Built-in security, backup and restore, in-memory caching • Supports streams
  • 16. Ingestion Options. [Diagram: Data sources (Amazon DynamoDB, web logs / cookies, ERP, connected devices); Ingest (Amazon Kinesis, AWS Snowball, Amazon MSK, Database Migration Service); Store (Amazon S3); Catalog; Process & Analyze; Consume]
  • 17. Amazon Kinesis: Streaming Data Made Easy. Amazon Kinesis Data Streams: • For technical developers • Build your own custom applications that process or analyze streaming data. Amazon Kinesis Data Firehose: • For all developers, data scientists • Easily load massive volumes of streaming data into S3, Amazon Redshift, and Amazon Elasticsearch. Amazon Kinesis Data Analytics: • For all developers, data scientists • Easily analyze data streams using standard SQL queries
  • 18. Storage Layer. [Diagram: Data sources (Amazon DynamoDB, web logs / cookies, ERP, connected devices); Ingest (Amazon Kinesis, AWS Snowball, Amazon MSK, Database Migration Service); Store (Amazon S3); Catalog; Process & Analyze; Consume]
  • 19. Amazon S3 is the Base: • Secure, highly scalable, durable object storage with millisecond latency for data access • Store any type of data (web sites, mobile apps, corporate applications, and IoT sensors) at any scale • Store data in the format you want: unstructured (logs, dump files) | semi-structured (JSON, XML) | structured (CSV, Parquet) • Storage lifecycle integration: Amazon S3-Standard | Amazon S3-Infrequent Access | Amazon Glacier
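The storage lifecycle integration mentioned above (S3 Standard, then Infrequent Access, then Glacier) is typically expressed as a lifecycle rule on the bucket. The following is a minimal sketch that only builds such a rule as a plain dictionary, in the shape that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument. The helper name, the rule ID, the `raw/` prefix, and the 30/90-day thresholds are assumptions chosen for illustration, not values from the slides.

```python
def build_lifecycle_configuration(prefix, ia_after_days, glacier_after_days):
    """Build a lifecycle configuration that tiers objects under `prefix`.

    Objects move to Infrequent Access first and to Glacier later, so the
    Glacier threshold must come after the Infrequent Access threshold.
    """
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be greater than ia_after_days")
    return {
        "Rules": [
            {
                "ID": "tier-" + prefix.strip("/"),  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# Assumed thresholds: 30 days to Infrequent Access, 90 days to Glacier.
config = build_lifecycle_configuration("raw/", 30, 90)
```

Applying it would be a separate step against a real bucket, for example by passing `config` as `LifecycleConfiguration` together with a `Bucket` name to an S3 client; that call is left out here so the sketch runs without AWS credentials.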
  • 20. Data Discovery and Catalog. [Diagram: Data sources (Amazon DynamoDB, web logs / cookies, ERP, connected devices); Ingest (Amazon Kinesis, AWS Snowball, Amazon MSK, Database Migration Service); Store (Amazon S3); Catalog (AWS Glue); Process & Analyze; Consume]
  • 21. AWS Glue - Serverless Data Catalog and ETL: • Automatically discovers data and stores schema • Data searchable, and available for ETL • Generates customizable code • Schedules and runs your ETL jobs • Serverless
  • 22. Process and Analyze. [Diagram: Data sources (Amazon DynamoDB, web logs / cookies, ERP, connected devices); Ingest (Amazon Kinesis, AWS Snowball, Amazon MSK, Database Migration Service); Store (Amazon S3); Catalog (AWS Glue); Process & Analyze (Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch); Consume]
  • 23. Amazon Athena - Interactive Analysis: • Interactive query service to analyze data in Amazon S3 using standard SQL • No infrastructure to set up or manage and no data to load • Supports multiple data formats; define schema on demand
  • 24. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Ingest Consume Amazon Kinesis BITools Querying the Data Lake Database Migration Service AWS Snowball Amazon MSK Amazon Athena Amazon EMR Amazon Redshift Amazon Elasticsearch Process & Analyze Jupyter Notebooks Amazon API Gateway Amazon QuickSight Catalog AWS Glue Store Amazon S3 Store Amazon S3 Data sources Amazon DynamoDB Web logs / cookies ERP Connected devices
  • 25. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Amazon QuickSight Supports variety of Data source andTargets Fully managed and scalable Super fast and easy to use Cost-effective
  • 26. Poll – Which do you struggle most with? • Data migration/ingestion • Data processing/ETL • Data access security/auditing • Cancelling your line with Telkom
  • 27. Data preparation accounts for ~80% of the work. [Pie chart categories: building training sets; cleaning and organizing data; collecting data sets; mining data for patterns; refining algorithms; other]
  • 28. Sample of steps required: find sources; create Amazon Simple Storage Service (Amazon S3) locations; configure access policies; map tables to Amazon S3 locations; write ETL jobs to load and clean data; create metadata access policies; configure access from analytics services; rinse and repeat for other data sets, users, and end-services. And more: manage and monitor ETL jobs; update the metadata catalog as data changes; update policies across services as users and permissions change; manually maintain cleansing scripts; create audit processes for compliance; … Manual | Error-prone | Time-consuming
  • 29. Lake Formation: data lake infrastructure & management. [Diagram labels: S3/Glacier; AWS Glue; Lake Formation]
  • 30. Lake Formation: data lake infrastructure & management. [Diagram labels: S3/Glacier; AWS Glue; Lake Formation]
  • 31. Build a secure data lake in days with AWS Lake Formation. Simplified ingest and cleaning enables data engineers to build faster. Centralized management of fine-grained permissions empowers security officers. A comprehensive set of integrated tools enables consistent user access. [Diagram labels: Amazon S3 data lake storage; AWS Lake Formation; AWS Glue; Blueprints; ML Transforms; Data catalog; Access control]
  • 32. Top features of Lake Formation • Enhanced governance layer - a security and governance layer at the Data Catalog level • Blueprints / Data Importers - templates for ETL, metadata (schema) and partition management • ML Transformations - ML algorithms that customers can use to create their own ML Transforms (e.g. record de-duplication) • Enhanced Data Catalog - enables users to record more metadata and tag Data Catalog objects (e.g. databases, tables, columns)
  • 33. Easier way to load data to the lake. [Diagram labels: Logs; DBs; Blueprints; One-shot; Incremental]
  • 34. How it works
  • 35. How it works
  • 36. Lake Formation Security
  • 37. Security permissions in Lake Formation. Search and view permissions granted to a user, role, or group in one place. Verify permissions granted to a user. Easily revoke policies for a user.
  • 38. Security permissions in Lake Formation. Control data access with simple grant and revoke permissions. Specify permissions on tables and columns rather than on buckets and objects. Easily view policies granted to a particular user. Audit all data access in one place.
  • 39. Roles in a data lake project
      - IAM Administrator: provisions AWS resources and accounts, e.g. creates the Data Lake Admin, creates the S3 bucket, creates roles/policies
      - Data Lake Admin: manages data lake resources, e.g. registers the data lake, secures the data lake / adds access controls
      - Data Lake Developer/Analyst: accesses/queries the data lake, e.g. queries the data lake, creates ML Transforms, writes ETL jobs, visualizes using Amazon QuickSight
  • 40. Configuration and Execution Steps (step, activity, user)
      1. Provision the data lake administrator, the data lake analyst/developer, and the data lake and Glue roles (IAM Administrator)
      2. Register a data lake (Data Lake Administrator)
      3. Assign Lake Formation permissions to the data lake analyst (Data Lake Administrator)
      4. Create an AWS Glue database (Data Lake Administrator)
      5. Crawl and catalog patient data in AWS Glue (Data Lake Analyst)
      6. Assign table permissions to the data lake analyst (Data Lake Administrator)
      7. Observe the data pattern and duplicates in the data using Amazon Athena (Data Lake Analyst)
      8. Create, teach and tune an AWS Lake Formation ML Transform (Data Lake Analyst)
      9. Create an AWS Glue ETL job to use the ML Transform for data deduplication (Data Lake Analyst)
      10. Catalog the de-duplicated data and query it using Amazon Athena (Data Lake Analyst)
  • 41. Secure once, access in multiple ways. [Diagram labels: Data Lake Storage; Data Catalog; Access Control; Lake Formation; Admin]
  • 42. Search and collaborate across multiple users. Text-based, faceted search across all metadata. Add attributes like data owners, stewards, and others as table properties. Add data sensitivity level, column definitions, and others as column properties. Text-based search and filtering. Query data in Amazon Athena.
  • 43. Enhanced Governance Layer. AWS Lake Formation provides a security and governance layer at the Data Catalog level. Users can grant or revoke permissions on Data Catalog objects such as databases, tables and columns for IAM principals (IAM users and roles). This functionality will be extended to row-level access in subsequent releases.
  • 44. Lake Formation Glue/Blueprints
  • 45. AWS Glue: simple, flexible, and cost-effective ETL & Data Catalog. Less hassle: integrated across AWS; supports Aurora, RDS, Redshift, S3, and common database engines in your VPC running on EC2. Serverless: no infrastructure to provision or manage. More power: automatically generates the code to execute your data transformations and loading processes.
  • 46. Blueprints build on AWS Glue
  • 47. Blueprints / Data Importers. Blueprints are templates for data ingestion, transformation, metadata (schema) and partition management. Blueprints help customers to quickly and easily build and maintain a data lake.
  • 48. With blueprints. You: 1. Point us to the source 2. Tell us the location to load to in your data lake 3. Specify how often you want to load the data. Blueprints: 1. Discover the source table(s) schema 2. Automatically convert to the target data format 3. Automatically partition the data based on the partitioning schema 4. Keep track of data that was already processed 5. You can customize any of the above
  • 49. Glue Features and Tips • Use VPC Endpoint for Glue, Athena and S3 • Daisy-chain JDBC jobs to avoid startup pain • Use Lambda for tricks that aren’t yet available
  • 50. Lambda with Glue • Automatically start an AWS Glue job when a file is uploaded to S3 • Use Lambda to auto-run a newly created Glue Crawler • Use CloudWatch with Lambda to receive SNS notices for specific Glue events • Create & enable a Glue Trigger in Lambda
  • 51. Lake Formation ML Transformations
  • 52. ML Transformations: Deduplication & Fuzzy Matching of your Data
  • 53. ML Transforms
  • 54. 1. Create a new ML Transform
  • 55. 2.
  • 56. Fuzzy deduplication – Under the hood
  • 57. What next? Recommended things to check out
  • 58. Lake Formation Lab: http://bit.ly/LFML-2019
  • 59. AWS Training and Certification. Resources created by the experts at AWS to propel your organization and career forward. Free foundational to advanced digital courses cover AWS services and teach architecting best practices. Classroom offerings, including Architecting on AWS, feature AWS expert instructors and hands-on labs. Validate expertise with the AWS Certified Solutions Architect - Associate or AWS Certified Solutions Architect - Professional exams. Visit aws.amazon.com/training/path-architecting/
  • 60. AWS Big Data blogs. Real deep, use-cases, step by step. • “Use AWS Glue to run ETL jobs against non-native JDBC data sources” • “Matching patient records with the AWS Lake Formation FindMatches transform” • “Discovering metadata with AWS Lake Formation” https://aws.amazon.com/blogs/?awsf.blog-master-category=category%23big-data
  • 61. Q&A
  • 62. Thank you!
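
Slides 38 and 43 describe granting permissions on tables and columns rather than on buckets and objects. A minimal sketch of what such a column-level grant could look like with the Lake Formation `GrantPermissions` API follows. The helper names, the role ARN, and the database, table, and column names in the comments are illustrative assumptions and do not come from the deck.

```python
from typing import Any, Dict, List


def build_column_grant(principal_arn: str, database: str, table: str,
                       columns: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for a column-level SELECT grant.

    The dictionary follows the request shape of the Lake Formation
    GrantPermissions API (Principal, Resource, Permissions).
    """
    if not columns:
        raise ValueError("at least one column is required for a column-level grant")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def grant_select(client: Any, principal_arn: str, database: str, table: str,
                 columns: List[str]) -> Dict[str, Any]:
    """Send the grant through a Lake Formation client and return the request.

    `client` is expected to behave like boto3.client("lakeformation").
    """
    request = build_column_grant(principal_arn, database, table, columns)
    client.grant_permissions(**request)
    return request


# Hypothetical usage (needs boto3 and AWS credentials; all names are made up):
#   import boto3
#   grant_select(boto3.client("lakeformation"),
#                "arn:aws:iam::111122223333:role/data-lake-analyst",
#                "clinic", "patients", ["patient_id", "city"])
```

Keeping the request builder separate from the call makes the grant easy to review or log before it is sent, which fits the "audit all data access in one place" theme of slide 38.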

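Slide 50 lists "automatically start an AWS Glue job when a file is uploaded to S3" as a Lambda use case. The sketch below shows one way a Lambda handler might do this, assuming the function is subscribed to S3 object-created notifications. The job name and the `--source_bucket` / `--source_key` argument names are hypothetical; the Glue job script would have to read them itself.

```python
from typing import Any, Dict, List
from urllib.parse import unquote_plus

# Hypothetical Glue job name; replace with a job that exists in your account.
GLUE_JOB_NAME = "example-etl-job"


def extract_s3_objects(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return the bucket and key of every S3 record in a notification event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if not s3:
            continue  # skip records that did not come from S3
        objects.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
        })
    return objects


def start_jobs(glue_client: Any, event: Dict[str, Any],
               job_name: str = GLUE_JOB_NAME) -> List[str]:
    """Start one Glue job run per uploaded object and return the run IDs."""
    run_ids = []
    for obj in extract_s3_objects(event):
        response = glue_client.start_job_run(
            JobName=job_name,
            Arguments={
                "--source_bucket": obj["bucket"],
                "--source_key": obj["key"],
            },
        )
        run_ids.append(response["JobRunId"])
    return run_ids


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point: create a Glue client and start the job runs."""
    import boto3  # imported here so the helpers above work without boto3

    return {"job_run_ids": start_jobs(boto3.client("glue"), event)}
```

Passing the client into `start_jobs` keeps the event parsing testable without AWS access. The Lambda execution role would also need permission to call `glue:StartJobRun`.
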
Editor's notes

  1. Data lake: the new, the old. End-to-end pipeline. Lake Formation: security, Glue, Blueprints, Lambda, ML.
  2. Who doesn’t have an AWS account?
  3. Dark data. Schema on read means the schema is defined in a catalog. Parquet preferred. Compute is managed, but there are options.
  4. Business decisions were made around enterprise data warehousing with BI tools. Data is now less relational and more diverse, growing 10x every 5 years. Who has access, and what type of access?
  5. Business decisions were made around enterprise data warehousing with BI tools. Data is now less relational and more diverse, growing 10x every 5 years. Who has access, and what type of access? The data lake is the evolution of data warehousing.
  6. 1/ An easy path to build a data lake and start running diverse analytics workloads; 2/ secure cloud storage, compute, and network infrastructure that meets the specific needs of analytic workloads; 3/ a fully integrated analytics stack with a mature set of analytics tools, covering all common use cases and leveraging open source and standard languages, engines, and platforms; and 4/ performance, the most scalability, and the lowest cost for analytics.
  7. Analyze in a variety of ways with different engines. Go beyond insights: from operational reporting on historical data to ML and real-time analytics that accurately predict future outcomes. Use S3 to provide even more insight without the delays and cost of moving or transforming your data.
  8. From many mature companies with dark data to startups running real-time applications built on what they learn.
  9. (How to Build a Data Lake.pptx)
  10. Spend more time here, ask what people are using
  11. Natively supported by big data frameworks (Spark, Hive, Presto, and others). Decouples storage and compute: no need to run compute clusters for storage (unlike HDFS); can run transient Amazon EMR clusters with Amazon EC2 Spot Instances; multiple, heterogeneous analysis clusters and services can use the same data. Designed for 99.999999999% durability. No need to pay for data replication within a region. Secure: SSL, client/server-side encryption at rest.
  12. Encryptable. Hive compatible.
  13. Mini-ETL with Create Table As Select (CTAS), views, workgroups, querying JSON, catalog upgrades.
  14. Who doesn’t love a good pie chart?
  15. I think that “other” might be designing their data storage structure. As someone helping solve issues in customer datasets, I can tell you some people need to spend more time defining their partition structure and data generation size.
  16. Mention security here, and how governments and medical companies are using it. There is a slide coming with the detail, so just mention the security here.
  17. Many solutions exist, but they are still not simple enough. There are manual and time-consuming tasks such as loading data from diverse sources, monitoring these data flows, setting up partitions, turning on encryption and managing keys, re-organizing data into columnar format, and granting and auditing access. Days, not months. Enables secured self-service discovery and access for users. Aware of multiple analytics services, with easy on-demand access to specific resources that fit the processor and memory requirements of each analytics workload. The data is curated and cataloged, already prepared for any flavor of analytics, and related records are matched and de-duplicated with machine learning. Automation reduces the time it takes to get to answers when your data lake is built on top of AWS.
  18. Lake Formation simplifies this manual process and automates many of the steps, allowing customers to set up a data lake in just a few clicks from a single, unified dashboard. This reduces the time to set up a secure data lake from months to days. To eliminate silos, you need to build a data lake. Security control at the object level for our object storage (data lake storage) layer; other cloud vendors only provide bucket-level security control. Deep integration across the services that are needed to get answers from your data, including storage, compute, networking, and data movement. For example, Amazon EMR makes it easy to use EC2 Spot Instances to save up to 90% on analytic workloads, and Amazon Redshift allows you to query your S3 objects directly from your data warehouse. A single security model across all analytic services: AWS Lake Formation provides a single way to control access to your data whether you are accessing that data from a data warehouse, a Spark cluster, or a serverless query technology. Mature analytics services: Amazon EMR was first released in 2009 and Amazon Redshift first launched in 2013. Amazon S3 was one of the first AWS products and has been available since 2006. Tens of thousands of customers have data lakes on AWS and X exabytes of data is analyzed every day. A single object storage layer that is compatible with all AWS analytics and machine learning services: Amazon S3 is our only object storage service; we do not have different versions of S3 and we do not have separate “data lake storage.” Five storage tiers and intelligent tiering in Amazon S3, so you are able to store more data at a lower cost and with less manual data lifecycle work than with any other cloud provider. Amazon S3 and AWS managed services store customer data in independent data centers across three Availability Zones within a single AWS Region, and can be configured to replicate data between Regions regardless of storage class, providing a very high degree of fault tolerance and data durability out of the box. AWS analytics services provide best-of-breed performance: Amazon Redshift is 2x faster than the next most popular competitor and Amazon EMR runs Apache Spark workloads over 10x faster than open source Spark. Speed helps get to answers quickly and also helps keep costs down for complex analytics.
  19. AWS Lake Formation has an enhanced Data Catalog that enables users to record more metadata and tags on databases, tables and columns. All this data will be searchable.
  20. Good time to break?
  21. Database, table, column.
  22. IAM is API based, but it isn’t designed for real-world access control. On the Glue Catalog we can grant resource-level permissions, but again it is API based and doesn’t give granular enough access.
  23. Register a location down to file level. You need to provide an IAM role that has access to that location and trusts Lake Formation. We update the service-linked role (SLR) policy: by default it has permissions to list the bucket, and as you register locations we add additional permissions.
  24. The data lake admin is not the IAM admin by default; keep them separate for security. The data lake will not ignore IAM. IAM admins need to add themselves as data lake admin. In Lake Formation you give permissions on a table, without having to give access to the bucket.
  25. A Glue customer who is not a Lake Formation customer will have full access until a location is registered. Registering tells Athena, Redshift and EMR to check Lake Formation when querying. By default, no one has access. Grant permissions to any IAM users/roles. All permissions are on Catalog objects. After registering, Athena won’t work. Root is denied and shouldn’t access; permissions are allowed on users/roles.
  26. Takes in a principal (user/role), then a resource (db/table/column). Table permissions grant/deny. Similar to databases, but with differences. Grant permissions are permissions to give permissions, for example for managers. Best practice for design. Example: Athena query: get-table, get-temporary-credentials. We don’t use
  27. To grant permissions on a table, you must specify the database as well. Athena and Redshift can do column-level filtering; Glue can’t.
  28. Column-level support. Encryption on the catalog. Service-side security.
  29. Relationship advice. Who here is building ETL with Glue? Who uses Crawler? Who uses Workflow?
  30. 1/ Integrated: AWS Glue natively supports data stored in Amazon Aurora and all other Amazon RDS engines, Amazon Redshift, and Amazon S3, as well as common database engines and databases running on Amazon EC2. 2/ Cost Effective / Serverless: AWS Glue is serverless. There is no infrastructure to provision or manage. AWS Glue handles provisioning, configuration, and scaling of the resources required to run your ETL jobs on a fully managed, scale-out Apache Spark environment. You pay only for the resources used while your jobs are running. 3/ More Power: AWS Glue automates much of the effort in building, maintaining, and running ETL jobs. AWS Glue crawls your data sources, identifies data formats, and suggests schemas and transformations. AWS Glue automatically generates the code to execute your data transformations and loading processes.
  31. Console-only feature. Why no S3 to S3?
  32. Endpoint is more secure, with less latency. Examples:
  33. AWS Lake Formation includes specialized ML-based dataset transformation algorithms customers can use to create their own ML Transforms. These include record de-duplication and match finding.
  34. 1/ Less Hassle: AWS Glue is integrated across a wide range of AWS services, meaning less hassle for you when onboarding. AWS Glue natively supports data stored in Amazon Aurora and all other Amazon RDS engines, Amazon Redshift, and Amazon S3, as well as common database engines and databases in your Virtual Private Cloud (Amazon VPC) running on Amazon EC2. 2/ Cost Effective / Serverless: AWS Glue is serverless. There is no infrastructure to provision or manage. AWS Glue handles provisioning, configuration, and scaling of the resources required to run your ETL jobs on a fully managed, scale-out Apache Spark environment. You pay only for the resources used while your jobs are running. 3/ More Power: AWS Glue automates much of the effort in building, maintaining, and running ETL jobs. AWS Glue crawls your data sources, identifies data formats, and suggests schemas and transformations. AWS Glue automatically generates the code to execute your data transformations and loading processes.
  35. Go read the docs!!
  36. Story background: Georgia-Pacific, owned by Koch Industries, is an American wood products, pulp, and paper company based in Atlanta, Georgia. The organization is one of the world’s largest manufacturers and distributors of pulp, towel and tissue paper and dispensers, packaging, and wood and gypsum building products. They use an S3 data lake as part of an advanced analytics and ML solution to gain new insights, optimize processes, and maximize resources. They now save millions annually by leveraging new insights to improve equipment failure predictions, run more production lines efficiently, and ensure high quality products. https://aws.amazon.com/solutions/case-studies/georgia-pacific/ Georgia-Pacific knew it could learn from its structured and unstructured data, but the company lacked a cost-effective storage mechanism to ingest, transform, house, and analyze this data. In the first six months, Georgia-Pacific transferred about 50 TB of production data (more than 500 billion records) from hundreds of large, complex manufacturing and converting-process machines. The company uses Amazon Kinesis to stream real-time data from manufacturing equipment to a central data lake based on Amazon Simple Storage Service (Amazon S3), allowing it to efficiently ingest and analyze structured and unstructured data at scale. Georgia-Pacific uses Amazon Elastic MapReduce (Amazon EMR) to transform the data before delivering it in a structured fashion to data analysts through Amazon Redshift. The analysts use Amazon Athena on top of Amazon S3 to query the raw data, which includes information on pulping mechanisms, paper machines, converting lines, vibration trends, throughput, and paper quality. Georgia-Pacific also uses Amazon SageMaker, an AWS machine-learning (ML) solution, to build, train, and deploy ML models at scale. Using ML models built with raw production data, Amazon SageMaker provides real-time feedback to machine operators regarding optimum machine speeds and other adjustable variables, enabling less experienced operators to detect breaks earlier and maintain quality.
  37. Sticky-note feedback.
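
Note 23 above describes registering a location with Lake Formation, either with the service-linked role or with an IAM role that can access the location. As a rough sketch, the request for the Lake Formation `RegisterResource` API could be assembled as follows. The helper names and the bucket, prefix, and role values in the comments are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def s3_location_arn(bucket: str, prefix: str = "") -> str:
    """Return the S3 ARN for a bucket, or for a prefix inside that bucket."""
    prefix = prefix.strip("/")
    return f"arn:aws:s3:::{bucket}/{prefix}" if prefix else f"arn:aws:s3:::{bucket}"


def build_register_request(bucket: str, prefix: str = "",
                           role_arn: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for the Lake Formation RegisterResource API.

    With no role_arn the request asks Lake Formation to use its
    service-linked role; otherwise the given IAM role is used.
    """
    request: Dict[str, Any] = {"ResourceArn": s3_location_arn(bucket, prefix)}
    if role_arn:
        request["RoleArn"] = role_arn
    else:
        request["UseServiceLinkedRole"] = True
    return request


# Hypothetical usage (needs boto3 and AWS credentials; names are made up):
#   import boto3
#   lf = boto3.client("lakeformation")
#   lf.register_resource(**build_register_request("example-data-lake", "raw/patients"))
```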