© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
23 August, 2022
Simplify data integration using AWS Glue
Nico Anandito
Analytics Specialist Solutions Architect
Amazon Web Services
Agenda
1. AWS Glue introduction
2. AWS Glue to simplify data integration
• Ingest
• Transform
• Operationalize
3. Demo
Data integration is hard
Data
• Growing exponentially
• From new sources
• Increasingly diverse
Personas
• No or low code developers
• Data analysts and data scientists
Applications
• Real-time / SLA sensitive
• Highly scalable
• Price performance
Data integration platform trends
• Tools for all personas
• Scalable infrastructure
• Open standards
• Low cost
Modern Data Architecture with AWS Glue
• Data integration in batch and real time
• Performant and cost-effective
• Centralized catalog and governance
• Tools for diverse skill sets
Services shown: Amazon DynamoDB, Amazon SageMaker, Amazon Redshift, Amazon OpenSearch Service, Amazon EMR, Amazon S3, Amazon Aurora, Amazon Athena
AWS Glue
Serverless data integration for complex workloads
Serverless
No infrastructure to maintain. Allocate the compute power you need and run jobs.
Cost-effective
All-in-one pricing model is 55% cheaper than other cloud data integration solutions.
Handles complex workloads
Connect to 65+ data sources and process petabytes of data in real time; includes batch and event-driven modes.
No lock-in
Develop data integration pipelines in open source SparkSQL, PySpark, Python, and Scala.
Data integration for every user
Development environments cater to different skill sets: visual ETL development for data engineers, notebook-style development for data scientists, and no-code development for data analysts.
Globe Telecom Develops A 360-Degree Customer View on AWS
Building a robust subscriber profile for more than 90 million customers using AWS Glue
• Can onboard 40 times more user attributes a month
• High platform availability
• Integrates easily with downstream applications
“Now, more than ever, multiple downstream applications and analytical functions have access to real-time behavioral data, placing us in a stronger position to deliver more relevant and meaningful interactions with each of our customers. We can personalize engagements from messaging, real-time offers, to product bundles and more,”
Derick Adil, Director, Asset Delivery and Domain Integration, Globe Telecom
Read more: https://aws.amazon.com/solutions/case-studies/globe-telecom-cadenz/
Ingestion Transform Deploy
AWS Glue
Serverless Data Integration in the Cloud
Unified Data Catalog with automated schema discovery
Automated schema discovery and management
Sources: transactional systems (OLTP, ERP, CRM), data warehouse, data lake, devices, web, sensors
• Structured and semi-structured discovery (Glue crawlers)
• No movement of data = low costs/admin
• All metadata centrally available for search and query = productivity
• Automate data discovery = productivity
• Unify structured, semi-structured data = speed to insight
Consumers of the Data Catalog: machine learning, DW queries, big data processing, interactive, real-time, business intelligence
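The crawler-based schema discovery on this slide can also be set up from code. The following is a minimal sketch, assuming the boto3 Glue client calls `create_crawler` and `start_crawler`; the crawler name, IAM role ARN, database name, and S3 path are hypothetical placeholders, and the two helper functions are illustrative names that do not come from the slides.

```python
from typing import Any, Dict


def build_crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> Dict[str, Any]:
    """Build the keyword arguments for the Glue CreateCrawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue_client: Any, request: Dict[str, Any]) -> str:
    """Create the crawler, start one run, and return the crawler name.

    glue_client is expected to behave like boto3.client("glue").
    """
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])
    return request["Name"]


# Hypothetical values, for illustration only.
request = build_crawler_request(
    name="sales-raw-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="sales_raw",
    s3_path="s3://example-bucket/sales/raw/",
)
```

With AWS credentials configured, passing `boto3.client("glue")` as `glue_client` would issue the real calls; keeping the client as a parameter lets the request be inspected or tested without touching an AWS account.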
Custom Connectors with AWS Glue
Data sources: on-premises DBs, proprietary stores, SaaS applications
Custom connector
• No additional cost for connecting to sources
• Flexible and easy to build connectors
AWS Glue Connectors Marketplace
+ Many more…
Ingestion Transform Deploy
AWS Glue
Serverless Data Integration in the Cloud
AWS Glue Execution Engine
Cost effective
Job starts in seconds
Reduced job latencies enabling micro-batching
Serverless Apache Spark and Python environment
Per-second billing with a 1-minute minimum
Fast and predictable
Diverse workloads
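The per-second billing rule with a 1-minute minimum can be worked through as simple arithmetic. The sketch below assumes that jobs are charged per DPU (Data Processing Unit) hour; the price of 0.44 USD per DPU-hour used in the example is an assumption for illustration, so check current AWS Glue pricing for your region.

```python
import math


def billed_seconds(runtime_seconds: float, minimum_seconds: int = 60) -> int:
    """Round the runtime up to whole seconds and apply the 1-minute minimum."""
    return max(math.ceil(runtime_seconds), minimum_seconds)


def job_cost(runtime_seconds: float, dpus: int, price_per_dpu_hour: float) -> float:
    """Cost = DPUs x billed hours x price per DPU-hour."""
    return dpus * billed_seconds(runtime_seconds) / 3600 * price_per_dpu_hour


ASSUMED_PRICE = 0.44  # USD per DPU-hour; an assumption, not taken from the slides

short_job = job_cost(20, dpus=10, price_per_dpu_hour=ASSUMED_PRICE)  # billed as 60 s
longer_job = job_cost(90, dpus=10, price_per_dpu_hour=ASSUMED_PRICE)  # billed as 90 s
print(f"20 s job: ${short_job:.4f}, 90 s job: ${longer_job:.4f}")
```

The 20-second job is billed for the 60-second minimum, while the 90-second job is billed for exactly 90 seconds, which is why short micro-batches remain inexpensive but never cost less than one minute of capacity.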
AWS Glue Studio
Visual job authoring and monitoring
• Monitor thousands of jobs through a single pane of glass
• Advanced transforms through code snippets
• Support for AWS Marketplace and custom connectors
• Preview your data at each step of the visual job authoring process
• Real-time schema inference without having to catalog
AWS Glue Studio Notebook (new)
• Interactive AWS Glue jobs development
• Submit AWS Glue jobs from the AWS Glue Studio notebook
• Use notebook magic to define transforms in SQL and control cost
• Built-in monitoring support
AWS Glue Interactive Sessions (new)
Existing options
• Time to first query = 10-15 minutes
• High cost of a long-running cluster
• “Noisy neighbor” problem
AWS Glue Interactive Sessions
• Step 1: Connect notebook to Sessions API (time required: in seconds)
• Time to first query ~ 1 min
• Development tool of your choice
• Rapid development
• Built-in cost control
Challenges with PII detection and remediation
• Size of the dataset
• Location of PII
• Accuracy
AWS Glue PII detection and remediation (new)
Three simple steps
1. Type of scan: full scan or sample scan
2. Entities to detect: built-in entities (e.g. SSN, passport) or custom entities
3. Remediation: store results or redact/mask results
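To make the three steps concrete, here is a small plain-Python sketch of the same flow: choose a full or sample scan, pick the entities to detect, then redact matches. It is an illustrative stand-in that uses regular expressions and is not how AWS Glue implements detection; the `US_SSN` pattern is simplified and the `EMPLOYEE_ID` custom entity is hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Step 2: entities to detect (simplified patterns, for illustration only).
ENTITY_PATTERNS: Dict[str, re.Pattern] = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # stands in for a built-in entity
    "EMPLOYEE_ID": re.compile(r"\bEMP-\d{6}\b"),      # hypothetical custom entity
}


def select_rows(rows: List[str], sample_every: int = 1) -> List[str]:
    """Step 1: full scan (sample_every=1) or sample scan (every n-th row)."""
    return rows[::sample_every]


def detect(rows: Iterable[str]) -> Dict[str, int]:
    """Count, per entity, how many scanned rows contain at least one match."""
    counts = {name: 0 for name in ENTITY_PATTERNS}
    for row in rows:
        for name, pattern in ENTITY_PATTERNS.items():
            if pattern.search(row):
                counts[name] += 1
    return counts


def redact(row: str, mask: str = "****") -> str:
    """Step 3: replace every detected entity with a mask string."""
    for pattern in ENTITY_PATTERNS.values():
        row = pattern.sub(mask, row)
    return row


rows = ["ssn 123-45-6789", "no pii here", "badge EMP-000042"]
findings = detect(select_rows(rows))      # full scan
masked = [redact(row) for row in rows]
```

A sample scan trades accuracy for speed: `select_rows(rows, sample_every=2)` would inspect only every second row, which is one way the dataset-size challenge from the previous slide shows up in practice.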
Glue Auto-scaling (new)
[Figure: cost over the job execution timeline (t1 to t11) of an AWS Glue job from start to end, without autoscaling vs. with autoscaling; the difference is the potential savings. Annotated phases: list operation, wide transform, uneven distribution of data partitions.]
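The potential savings in the autoscaling chart come from paying only for the workers each phase actually needs. The sketch below is a simplified model under stated assumptions: the per-interval worker demand for t1 to t11 is hypothetical, a fixed-size job is assumed to be provisioned for peak demand throughout, and scaling is assumed to be instantaneous (real autoscaling reacts with some delay).

```python
from typing import List


def worker_intervals_fixed(demand: List[int]) -> int:
    """Without autoscaling: peak-sized capacity is held for every interval."""
    return max(demand) * len(demand)


def worker_intervals_autoscaled(demand: List[int]) -> int:
    """With (idealized) autoscaling: capacity follows demand in each interval."""
    return sum(demand)


def potential_savings(demand: List[int]) -> float:
    """Fraction of worker-intervals saved by following demand."""
    fixed = worker_intervals_fixed(demand)
    return (fixed - worker_intervals_autoscaled(demand)) / fixed


# Hypothetical workers needed at t1..t11: a light list operation, a wide
# transform that needs the most workers, then a tail with skewed partitions.
demand = [2, 2, 10, 10, 10, 4, 4, 4, 2, 2, 2]
print(f"fixed={worker_intervals_fixed(demand)}, "
      f"autoscaled={worker_intervals_autoscaled(demand)}, "
      f"savings={potential_savings(demand):.0%}")
```

Jobs with a flat demand profile gain little from autoscaling in this model, because `max(demand) * len(demand)` and `sum(demand)` are then nearly equal.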
AWS Glue Streaming ETL
• Process stream data & make it queryable in seconds
• Join streams against each other or static data
• Automatic updates to the AWS Glue Data Catalog
• Dozens of supported data targets
Simplify your architecture with one service for streaming and batch data integration
Ingestion Transform Deploy
AWS Glue
Serverless Data Integration in the Cloud
Monitoring dashboard to check job status
Orchestrate jobs easily with AWS Glue workflows
• Orchestrate Glue jobs and other AWS services
• Schedule jobs or trigger based on events
• Monitor execution of the workflows in one place
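Scheduling and event-style chaining in a workflow can be expressed as trigger definitions. The following is a minimal sketch, assuming the boto3 Glue client calls `create_workflow` and `create_trigger`; the workflow, trigger, and job names are hypothetical, and the helper functions are illustrative rather than part of any AWS library.

```python
from typing import Any, Dict, List


def scheduled_trigger(workflow: str, name: str, cron: str, job_name: str) -> Dict[str, Any]:
    """Trigger that starts a job on a cron schedule."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": cron,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def conditional_trigger(workflow: str, name: str, upstream_job: str, job_name: str) -> Dict[str, Any]:
    """Trigger that starts a job once an upstream job has succeeded."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": upstream_job, "State": "SUCCEEDED"}
            ]
        },
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def deploy_workflow(glue_client: Any, workflow: str, triggers: List[Dict[str, Any]]) -> None:
    """Create the workflow, then attach each trigger to it."""
    glue_client.create_workflow(Name=workflow)
    for trigger in triggers:
        glue_client.create_trigger(**trigger)


# Hypothetical names: ingest nightly at 02:00 UTC, then transform on success.
triggers = [
    scheduled_trigger("nightly-etl", "start-ingest", "cron(0 2 * * ? *)", "ingest-job"),
    conditional_trigger("nightly-etl", "after-ingest", "ingest-job", "transform-job"),
]
```

Calling `deploy_workflow(boto3.client("glue"), "nightly-etl", triggers)` with valid credentials would create the workflow; both jobs are assumed to exist already.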
Glue APIs to build CI/CD pipeline
• Boto3 endpoints to automate CI/CD pipeline
• Automate to save development hours
• Deploy jobs faster without any manual intervention
• Manage Data Catalog through code snippets
[Diagram: data engineers commit to AWS CodeCommit/Git; AWS CodePipeline and AWS Lambda deploy the AWS Glue job in the AWS Cloud]
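The deploy step in this pipeline can be sketched as a function that a Lambda handler might call after a commit. This is a minimal sketch, assuming the boto3 Glue client calls `get_job`, `create_job`, and `update_job`; the job name, role ARN, script location, Glue version, and worker settings are hypothetical choices, not values from the slides.

```python
from typing import Any, Dict


def job_definition(role_arn: str, script_location: str, workers: int = 2) -> Dict[str, Any]:
    """Describe a Spark ETL job; the same fields work for create and update."""
    return {
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": "3.0",       # assumed version
        "WorkerType": "G.1X",       # assumed worker type
        "NumberOfWorkers": workers,
    }


def deploy_job(glue_client: Any, name: str, definition: Dict[str, Any]) -> str:
    """Create the job if it does not exist yet, otherwise update it in place."""
    try:
        glue_client.get_job(JobName=name)
    except glue_client.exceptions.EntityNotFoundException:
        glue_client.create_job(Name=name, **definition)
        return "created"
    glue_client.update_job(JobName=name, JobUpdate=definition)
    return "updated"


# Hypothetical values, for illustration only.
definition = job_definition(
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueJobRole",
    script_location="s3://example-bucket/scripts/transform.py",
)
```

Inside a Lambda function, the handler would call `deploy_job(boto3.client("glue"), "transform-job", definition)`; making the deploy idempotent means the pipeline can rerun on every commit without manual cleanup.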
Demo
Demo Architecture
Summary
AWS Glue to simplify data integration in the cloud
• No-code to advanced data use cases
• Process petabytes of data in both batch and real time using Apache Spark
• Migrate from expensive traditional ETL solutions to gain flexibility and reduce costs
• Catalog data assets to make them available to AWS Analytics Services
A modern data strategy can help you manage, act on, and react to your data so you can make better decisions, respond faster, and uncover new opportunities. Dive deeper with these resources today.
• Harness data to reinvent your organization
• In unpredictable times, a data strategy is key
• Make data a strategic asset
• Rewiring your culture to be data-driven
• Put your data to work with a modern analytics approach
• … and more!
Visit the AWS Data resource hub: tinyurl.com/aws-data-hub-id
AWS Training and Certification for Data and Analytics
AWS Data & Analytics FREE Training Resources
Discover how to harness data, one of the world’s most valuable resources, and innovate at scale.
AWS Data Analytics Learning Plan
This learning plan exposes you to the fastest way to get answers from all your data to all your users. It can also help prepare you for the AWS Certified Data Analytics - Specialty certification exam.
AWS Certified Data Analytics - Specialty
Earning AWS Certified Data Analytics - Specialty validates expertise in using AWS data lakes and analytics services.
Links: https://bit.ly/3Ntlhy7 https://go.aws/3lwF0RR https://bit.ly/3wBVjD1
Thank you for attending AWS Innovate – Data Edition
We hope you found it interesting! A kind reminder to complete the survey.
Let us know what you thought of today’s event and how we can improve the event experience for you in the future.
aws-apj-marketing@amazon.com
twitter.com/AWSCloud
facebook.com/AmazonWebServices
youtube.com/user/AmazonWebServices
slideshare.net/AmazonWebServices
twitch.tv/aws
Thank you!
© 2021, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Color palette feedback
E X A M P L E S
0
0
0
255
255
255
40
40
40
254
143
1
222
72
230
240
130
112
229
230
255
233
127
147
64
104
138
191
112
213
Current palette
0
0
0
255
255
255
6
0
73
254
143
1
222
72
230
74
201
209
21
163
99
105
20
225
40
73
189
181
145
253
Recommended palette
(hyperlinks)
(hyperlinks)
Or whatever color is
predominant in the slide
background if a solidish
color is selected for
content slides
Text
Text
Text
Text
Text
Text
Text
Hyperlink
Usable on this
background color
Usable on this
background color
Usable on this
background color
242
244
244
242
244
244

Weitere ähnliche Inhalte

Ähnlich wie Sederhanakan_integrasi_data_anda_dengan_AWS_Glue_handout.pdf

Module 3 - QuickSight Overview
Module 3 - QuickSight OverviewModule 3 - QuickSight Overview
Module 3 - QuickSight OverviewLam Le
 
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...Amazon Web Services
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxSwathiPonugumati
 
AWS Enterprise Day | Journey to the AWS Cloud
AWS Enterprise Day | Journey to the AWS CloudAWS Enterprise Day | Journey to the AWS Cloud
AWS Enterprise Day | Journey to the AWS CloudAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Transforming Enterprise IT - Transformation Day Montreal 2018
Transforming Enterprise IT - Transformation Day Montreal 2018Transforming Enterprise IT - Transformation Day Montreal 2018
Transforming Enterprise IT - Transformation Day Montreal 2018Amazon Web Services
 
Best Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Best Practices for Cloud Migrations with Zero Disruption with AWS MarketplaceBest Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Best Practices for Cloud Migrations with Zero Disruption with AWS MarketplaceDenodo
 
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...Amazon Web Services
 
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018Amazon Web Services
 
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...Amazon Web Services
 
AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS Chicago
 
Machine learning in the physical world by Kip Larson from AWS IoT
Machine learning in the physical world by  Kip Larson from AWS IoTMachine learning in the physical world by  Kip Larson from AWS IoT
Machine learning in the physical world by Kip Larson from AWS IoTBill Liu
 
AWS IoT - from Cloud to Edge | AWS Floor28
AWS IoT - from Cloud to Edge | AWS Floor28AWS IoT - from Cloud to Edge | AWS Floor28
AWS IoT - from Cloud to Edge | AWS Floor28Amazon Web Services
 
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...Amazon Web Services
 
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...Amazon Web Services LATAM
 
Scale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWSScale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWSAmazon Web Services
 
reInvent reCap 2022
reInvent reCap 2022reInvent reCap 2022
reInvent reCap 2022CloudHesive
 
Implementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentImplementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentAmazon Web Services
 
How Cardknox Migrated 1M+ Sensitive Records to AWS
 How Cardknox Migrated 1M+ Sensitive Records to AWS How Cardknox Migrated 1M+ Sensitive Records to AWS
How Cardknox Migrated 1M+ Sensitive Records to AWSAmazon Web Services
 
Single View of Data
Single View of DataSingle View of Data
Single View of Dataconfluent
 

Ähnlich wie Sederhanakan_integrasi_data_anda_dengan_AWS_Glue_handout.pdf (20)

Module 3 - QuickSight Overview
Module 3 - QuickSight OverviewModule 3 - QuickSight Overview
Module 3 - QuickSight Overview
 
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
 
AWS Enterprise Day | Journey to the AWS Cloud
AWS Enterprise Day | Journey to the AWS CloudAWS Enterprise Day | Journey to the AWS Cloud
AWS Enterprise Day | Journey to the AWS Cloud
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Transforming Enterprise IT - Transformation Day Montreal 2018
Transforming Enterprise IT - Transformation Day Montreal 2018Transforming Enterprise IT - Transformation Day Montreal 2018
Transforming Enterprise IT - Transformation Day Montreal 2018
 
Best Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Best Practices for Cloud Migrations with Zero Disruption with AWS MarketplaceBest Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Best Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
 
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
 
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
 
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
 
AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user group
 
Machine learning in the physical world by Kip Larson from AWS IoT
Machine learning in the physical world by  Kip Larson from AWS IoTMachine learning in the physical world by  Kip Larson from AWS IoT
Machine learning in the physical world by Kip Larson from AWS IoT
 
AWS IoT - from Cloud to Edge | AWS Floor28
AWS IoT - from Cloud to Edge | AWS Floor28AWS IoT - from Cloud to Edge | AWS Floor28
AWS IoT - from Cloud to Edge | AWS Floor28
 
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
 
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
 
Scale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWSScale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWS
 
reInvent reCap 2022
reInvent reCap 2022reInvent reCap 2022
reInvent reCap 2022
 
Implementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentImplementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid Environment
 
How Cardknox Migrated 1M+ Sensitive Records to AWS
 How Cardknox Migrated 1M+ Sensitive Records to AWS How Cardknox Migrated 1M+ Sensitive Records to AWS
How Cardknox Migrated 1M+ Sensitive Records to AWS
 
Single View of Data
Single View of DataSingle View of Data
Single View of Data
 

Kürzlich hochgeladen

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Kürzlich hochgeladen (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Sederhanakan_integrasi_data_anda_dengan_AWS_Glue_handout.pdf

  • 1. © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. 23 August, 2022
  • 2. Simplify data integration using AWS Glue. Nico Anandito, Analytics Specialist Solutions Architect, Amazon Web Services
  • 3. Agenda: 1. AWS Glue introduction; 2. AWS Glue to simplify data integration (Ingest, Transform, Operationalize); 3. Demo
  • 4. Data integration is hard. Data: growing exponentially, from new sources, increasingly diverse
  • 5. Data integration is hard. Data: growing exponentially, from new sources, increasingly diverse. Personas: no or low code developers, data analysts and data scientists
  • 6. Data integration is hard. Data: growing exponentially, from new sources, increasingly diverse. Personas: no or low code developers, data analysts and data scientists. Applications: real-time/SLA sensitive, highly scalable, price performance
  • 7. Data integration platform trends: tools for all personas, scalable infrastructure, open standards, low cost
  • 8. Modern Data Architecture with AWS Glue: data integration in batch and real time; performant and cost-effective; centralized catalog and governance; tools for diverse skill sets. Services shown: Amazon DynamoDB, Amazon SageMaker, Amazon Redshift, Amazon OpenSearch Service, Amazon EMR, Amazon S3, Amazon Aurora, Amazon Athena
  • 9. AWS Glue: Serverless Data Integration for complex workloads. Serverless: no infrastructure to maintain; allocate needed compute power and run jobs. Cost-effective: all-in-one pricing model is 55% cheaper than other cloud data integration solutions. Handles complex workloads: connect to 65+ data sources, process petabytes of data in real time, includes batch and event-driven modes. No lock-in: develop data integration pipelines in open source SparkSQL, PySpark, Python, Scala. Data integration for every user: development environments catered to different skill sets, with visual ETL development for data engineers, notebook-style development for data scientists, and no-code development for data analysts
  • 10. Globe Telecom develops a 360-degree customer view on AWS: building a robust subscriber profile for more than 90 million customers using AWS Glue. Can onboard 40 times more user attributes a month; high platform availability; integrates easily with downstream applications. “Now, more than ever, multiple downstream applications and analytical functions have access to real-time behavioral data, placing us in a stronger position to deliver more relevant and meaningful interactions with each of our customers. We can personalize engagements from messaging, real-time offers, to product bundles and more,” Derick Adil, Director, Asset Delivery and Domain Integration, Globe Telecom. Read more: https://aws.amazon.com/solutions/case-studies/globe-telecom-cadenz/
  • 11. AWS Glue: Serverless Data Integration in the Cloud. Ingestion, Transform, Deploy
  • 12. Unified Data Catalog with automated schema discovery. Sources: transactional systems (OLTP, ERP, CRM), data warehouse, data lake, devices, web, sensors. Automated schema discovery and management: structured and semi-structured discovery (Glue Crawlers). No movement of data = low costs/admin; all metadata centrally available for search and query = productivity; automate data discovery = productivity; unify structured, semi-structured data = speed to insight. Workloads around the Data Catalog: machine learning, DW queries, big data processing, interactive, real-time, business intelligence
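As a minimal sketch of how the crawler-based schema discovery on this slide might be set up programmatically: the helper below only assembles the keyword arguments that boto3's `create_crawler` call accepts, so it can be inspected without touching AWS. The crawler name, IAM role ARN, database name, and S3 path are hypothetical placeholders, not values from the deck.

```python
def build_crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Assemble keyword arguments for boto3's glue.create_crawler call."""
    request = {
        "Name": name,
        "Role": role_arn,            # IAM role the crawler assumes
        "DatabaseName": database,    # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        # Glue cron expression, e.g. "cron(0 2 * * ? *)" for 02:00 UTC daily
        request["Schedule"] = schedule
    return request


def create_and_start_crawler(glue_client, request):
    """Register the crawler, then trigger one run (glue_client = boto3.client("glue"))."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])


# Hypothetical values, for illustration only.
request = build_crawler_request(
    name="demo-crawler",
    role_arn="arn:aws:iam::123456789012:role/DemoGlueRole",
    database="demo_db",
    s3_path="s3://demo-bucket/raw/",
)
```

Keeping the request builder separate from the AWS call is a design choice for this sketch: the payload can be reviewed or unit-tested before anything is created in an account.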
  • 13. Custom Connectors with AWS Glue. Data sources: on-premises DBs, proprietary stores, SaaS applications. Custom connector: no additional cost for connecting to sources; flexible and easy to build connectors
  • 14. AWS Glue Connectors Marketplace + many more…
  • 15. AWS Glue: Serverless Data Integration in the Cloud. Ingestion, Transform, Deploy
  • 16. AWS Glue Execution Engine: cost-effective, fast and predictable, diverse workloads. Job starts in seconds; reduced job latencies enabling micro-batching; serverless Apache Spark and Python environment; per-second billing with a 1-minute minimum
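The billing rule on this slide (per-second billing with a 1-minute minimum) can be made concrete with a small worked calculation. This is only a sketch: the per-DPU-hour rate used below is an assumed illustrative figure, not one taken from the deck, and actual rates vary by region and job type.

```python
def billed_seconds(runtime_seconds, minimum_seconds=60):
    """Per-second billing, but never less than the 1-minute minimum."""
    return max(runtime_seconds, minimum_seconds)


def job_cost(runtime_seconds, dpus, price_per_dpu_hour):
    """Cost of one job run: billed hours x allocated DPUs x hourly rate."""
    return billed_seconds(runtime_seconds) / 3600 * dpus * price_per_dpu_hour


ASSUMED_RATE = 0.44  # assumed USD per DPU-hour, for illustration only

short_run = job_cost(20, dpus=10, price_per_dpu_hour=ASSUMED_RATE)   # billed as 60 s
minute_run = job_cost(60, dpus=10, price_per_dpu_hour=ASSUMED_RATE)  # billed as 60 s
longer_run = job_cost(95, dpus=10, price_per_dpu_hour=ASSUMED_RATE)  # billed as 95 s
```

Under these assumptions a 20-second run costs the same as a 60-second run, while anything longer is charged for exactly the seconds used, which is what makes short micro-batch jobs practical.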
  • 17. AWS Glue Studio: visual job authoring and monitoring. Monitor thousands of jobs through a single pane of glass; advanced transforms through code snippets; support for AWS Marketplace and custom connectors; preview your data at each step of the visual job authoring process; real-time schema inference without having to catalog
  • 18. AWS Glue Studio Notebook (new): interactive AWS Glue jobs development. Submit AWS Glue jobs from the AWS Glue Studio notebook; use notebook magics to define transforms in SQL and control cost; built-in monitoring support
  • 19. AWS Glue Interactive Sessions (new). Existing options: time to first query = 10-15 minutes; high cost of a long-running cluster; “noisy neighbor” problem. AWS Glue Interactive Sessions: step 1, connect notebook to Sessions API (in seconds); time to first query ~ 1 min; development tool of your choice; rapid development; built-in cost control
  • 20. Challenges with PII detection and remediation: size of the dataset, location of PII, accuracy
  • 21. AWS Glue PII detection and remediation (new) in three simple steps. 1. Type of scan: full scan or sample scan. 2. Entities to detect: built-in entities (e.g. SSN, passport) or custom entities. 3. Remediation: store results or redact/mask results
  • 22. Glue Auto-scaling (new). Chart: cost over the job execution timeline (t1 to t11, from AWS Glue job start to end), without autoscaling vs. with autoscaling; stages labeled list operation, wide transform, and uneven distribution of data partitions; the gap between the two curves is marked as potential savings
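A minimal sketch of how a job definition might opt in to the auto-scaling behaviour shown on this slide. It assumes Glue version 3.0 or later and uses the `--enable-auto-scaling` job parameter, with `NumberOfWorkers` acting as the upper bound on workers. The job name, role ARN, and script location are hypothetical placeholders.

```python
def build_autoscaling_job(name, role_arn, script_location, max_workers):
    """Assemble keyword arguments for boto3's glue.create_job with auto scaling on."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",               # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": "3.0",                # assumption: auto scaling needs 3.0 or later
        "WorkerType": "G.1X",
        "NumberOfWorkers": max_workers,      # treated as the maximum when auto scaling is on
        "DefaultArguments": {"--enable-auto-scaling": "true"},
    }


# Hypothetical values, for illustration only.
job = build_autoscaling_job(
    name="demo-etl-job",
    role_arn="arn:aws:iam::123456789012:role/DemoGlueRole",
    script_location="s3://demo-bucket/scripts/etl.py",
    max_workers=20,
)
# To create it: boto3.client("glue").create_job(**job)
```

With this setting the job is billed for the workers actually in use at each stage rather than for the fixed maximum, which is the source of the savings the chart illustrates.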
  • 23. AWS Glue Streaming ETL: simplify your architecture with one service for streaming and batch data integration. Process stream data and make it queryable in seconds; join streams against each other or static data; automatic updates to the AWS Glue Data Catalog; dozens of supported data targets
  • 24. AWS Glue: Serverless Data Integration in the Cloud. Ingestion, Transform, Deploy
  • 25. Monitoring dashboard to check job status
  • 26. Orchestrate jobs easily with AWS Glue workflows. Orchestrate Glue jobs and other AWS services; schedule jobs or trigger based on events; monitor execution of the workflows in one place
  • 27. Glue APIs to build CI/CD pipeline. Boto3 endpoints to automate CI/CD pipeline; automate to save development hours; deploy jobs faster without any manual intervention; manage Data Catalog through code snippets. Diagram components (AWS Cloud): data engineers, AWS CodeCommit/Git, AWS CodePipeline, AWS Lambda, AWS Glue job; flow labels: commit, deploy
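The deploy step in this pipeline could look roughly like the sketch below, for example inside the Lambda function the diagram shows. It is an assumption about one reasonable shape for that step, not the presenter's implementation: it checks whether the job exists with `get_job`, then calls `create_job` or `update_job` accordingly. The function name `deploy_job` and the job definition values are hypothetical.

```python
def deploy_job(glue_client, job_definition):
    """Create the Glue job if it does not exist yet, otherwise update it in place."""
    name = job_definition["Name"]
    try:
        glue_client.get_job(JobName=name)
    except glue_client.exceptions.EntityNotFoundException:
        glue_client.create_job(**job_definition)
        return "created"
    # update_job takes the job name separately, so drop it from the update payload.
    job_update = {key: value for key, value in job_definition.items() if key != "Name"}
    glue_client.update_job(JobName=name, JobUpdate=job_update)
    return "updated"


# Hypothetical job definition, for illustration only.
job_definition = {
    "Name": "demo-etl-job",
    "Role": "arn:aws:iam::123456789012:role/DemoGlueRole",
    "Command": {
        "Name": "glueetl",
        "ScriptLocation": "s3://demo-bucket/scripts/etl.py",
        "PythonVersion": "3",
    },
}
# In a pipeline step: deploy_job(boto3.client("glue"), job_definition)
```

Passing the client in as an argument keeps the function idempotent and easy to exercise with a stub client before wiring it into AWS CodePipeline.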
  • 28. Demo
  • 29. Demo Architecture
  • 30. Summary: AWS Glue to simplify data integration in the cloud. No-code to advanced data use cases; process petabytes of data both in batch and real time using Apache Spark; migrate from expensive traditional ETL solutions to gain flexibility and reduce costs; catalog data assets to make them available to AWS Analytics Services
  • 31. A modern data strategy can help you manage, act on, and react to your data so you can make better decisions, respond faster, and uncover new opportunities. Dive deeper with these resources today: Harness data to reinvent your organization; In unpredictable times, a data strategy is key; Make data a strategic asset; Rewiring your culture to be data-driven; Put your data to work with a modern analytics approach; … and more! Visit the AWS Data resource hub: tinyurl.com/aws-data-hub-id
  • 32. AWS Training and Certification for Data and Analytics. Discover how to harness data, one of the world’s most valuable resources, and innovate at scale. This learning plan exposes you to the fastest way to get answers from all your data to all your users. It can also help prepare you for the AWS Certified Data Analytics - Specialty certification exam. Earning AWS Certified Data Analytics - Specialty validates expertise in using AWS data lakes and analytics services. Resources: AWS Data & Analytics FREE Training Resources; AWS Data Analytics Learning Plan; AWS Certified Data Analytics - Specialty. Links: https://bit.ly/3Ntlhy7 https://go.aws/3lwF0RR https://bit.ly/3wBVjD1
  • 33. Thank you for attending AWS Innovate – Data Edition. We hope you found it interesting! A kind reminder to complete the survey. Let us know what you thought of today’s event and how we can improve the event experience for you in the future. Contact: aws-apj-marketing@amazon.com, twitter.com/AWSCloud, facebook.com/AmazonWebServices, youtube.com/user/AmazonWebServices, slideshare.net/AmazonWebServices, twitch.tv/aws
  • 34. Thank you!