Abebooks is an Amazon subsidiary that treats data as an asset. The team is always looking for ways to improve its existing analytics solution and extract information from terabytes of data.
One of its recent initiatives was the migration from a legacy DW platform to AWS Redshift. During this journey, our data engineers met many challenges and sometimes tried to reinvent the wheel.
This talk covers the Abebooks journey towards a cloud DW, including the ETL tool selection process for the cloud and the adoption process for end users. It will help you understand the potential of a modern cloud DW, learn from our use case, and save time on future projects.
AWS User Group: Building Cloud Analytics Solution with AWS
1. select outline
from victoria_aws_user_group
where topic_name = 'Building Modern Analytics Solutions'
and solution_type = 'Cloud'
and solution_provider = 'AWS'
and presenter = 'Dmitry Anoshin'
and contact_info = 'Dmitry.Anoshin@gmail.com'
7. Other Activities
Tableau Cookbook 2019.X
• BI Tech Talk (100+ BI teams globally)
• Amazon Tableau User Group (2000+ users)
• Conferences (EDW 2018, 2019)
• Amazon internal conferences
9. Business Value
Stakeholders Employees Customers
Value
"The goal of any organization is to generate Value"
The Future of Competition.
https://www.amazon.com/Future-Competition-Co-Creating-Unique-Customers/dp/1578519535
10. BI Value Chain
Stakeholders Employees Customers
Value
Decisions
Data
Value creation based on
effective decisions
Effective decisions based on
accurate information
12. About Abebooks
• Online marketplace for books, art & collectibles.
• Amazon subsidiary since 2008; a marketplace for used books and, increasingly, non-book collectibles
• 350 million listings
• 3 in the "Data Engineering Team"
• 2 locations: Victoria, BC and Dusseldorf
16. Leadership Principles
• Customer Obsession
• Ownership
• Invent and Simplify
• Are Right, A Lot
• Learn and Be Curious
• Hire and Develop the Best
• Insist on the Highest Standards
• Think Big
• Bias for Action
• Frugality
• Earn Trust
• Dive Deep
• Have Backbone; Disagree and Commit
• Deliver Results
https://www.amazon.jobs/en/principles
17. Leadership Principles (for me)
• Customer Obsession
• Ownership
• Invent and Simplify
• Are Right, A Lot
• Learn and Be Curious
• Hire and Develop the Best
• Insist on the Highest Standards
• Think Big
• Bias for Action
• Frugality
• Earn Trust
• Dive Deep
• Have Backbone; Disagree and Commit
• Deliver Results
https://www.amazon.jobs/en/principles
40. For Data to be a differentiator, customers need to be able to…
• Capture and store new non-relational data at PB-EB scale in real time
• Discover value in new types of analytics that go beyond batch reporting to incorporate real-time, predictive, voice, and image recognition
• Democratize access to data in a secure and governed way
New types of analytics
Dashboards Predictive Image Recognition Voice Real-time
New types of data
41. Traditionally, analytics used to look like this
OLTP ERP CRM LOB
Data Warehouse
Business Intelligence
• Relational data sources
• TBs–PBs scale
• Schema defined prior to data load
• Operational reporting and ad hoc
42. Data & analytics partners extend the traditional approach
Data Warehouse
Business Intelligence
OLTP ERP CRM LOB Devices Web Sensors Social
Big Data processing, real-time, Machine Learning
Data Lake
• Relational and non-relational data
• TBs–EBs scale
• Diverse analytical engines
• Low-cost storage & analytics
46. Cloud Migration Strategy
Lift & Shift
• Typical approach
• Move all at once
• Target platform then evolve
• Approach gets you to the cloud quickly
• Relatively small barrier to learning new technology since it tends to be a close fit
Split & Flip
• Split application into logical functional data layers
• Match the data functionality with the right technology
• Leverage the wide selection of tools on AWS to best fit the need
• Move data in phases: prototype, learn and perfect
48. Choosing ETL Tool for Cloud
Use Cases
• OLTP to S3
• S3 to Redshift
• SFTP/API to Redshift
• Data Transformation
• Dimensional Modelling
Tools
• Pentaho DI
• Informatica
• AWS Data Pipeline/Glue
• Talend
• Matillion
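As an illustration of the "S3 to Redshift" use case, here is a minimal Python sketch that only builds the text of a Redshift COPY statement for CSV files staged in S3. It does not connect to a cluster. The table name, bucket path, IAM role ARN, and the helper name build_copy_statement are all hypothetical placeholders, not values from the Abebooks project, and the talk does not say the team wrote its loads this way.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, ignore_header=1):
    """Return a Redshift COPY statement that loads CSV files from S3.

    All arguments are illustrative placeholders supplied by the caller.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV\n"
        f"IGNOREHEADER {ignore_header};"
    )


# Hypothetical example values, for illustration only.
statement = build_copy_statement(
    table="staging.orders",
    s3_uri="s3://example-bucket/exports/orders/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(statement)
```

An ETL tool would typically generate and run a statement like this for you; the sketch only shows roughly what the load step amounts to.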
49. ETL Criteria
High:
• Support native Redshift driver
• Easily capture from relational db, CDC
• Ease of Use for BI/DW
• Cover use cases
• On-Premise
Medium:
• Support NoSQL
• Company "Winner"
• Deployment/Architecture
• Encryption
• Ease of Use for non BI/DW
• Data Transformations
• Management
• Pricing
• Performance
Low:
• Version Control
• Linux OS
• ETL Monitoring
• Logging
• R/Python
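One way to turn High/Medium/Low criteria like these into a comparison is a weighted scoring matrix. The sketch below is only an assumption about how such a comparison could be done, not the method used in the talk: the numeric weights (3/2/1), the 1-5 ratings, and the tool names "Tool A" and "Tool B" are all hypothetical, and only four of the criteria are included to keep it short.

```python
# Hypothetical numeric weights for the three priority levels.
WEIGHTS = {"high": 3, "medium": 2, "low": 1}

# A small subset of the criteria, with the priority the slide assigns them.
CRITERIA = {
    "native Redshift driver": "high",
    "ease of use for BI/DW": "high",
    "pricing": "medium",
    "version control": "low",
}


def weighted_score(ratings):
    """Sum rating * weight over all criteria; a missing rating counts as 0."""
    total = 0
    for criterion, priority in CRITERIA.items():
        total += WEIGHTS[priority] * ratings.get(criterion, 0)
    return total


# Made-up ratings on a 1-5 scale, for illustration only.
ratings_by_tool = {
    "Tool A": {"native Redshift driver": 5, "ease of use for BI/DW": 4,
               "pricing": 3, "version control": 2},
    "Tool B": {"native Redshift driver": 2, "ease of use for BI/DW": 3,
               "pricing": 5, "version control": 5},
}

scores = {tool: weighted_score(r) for tool, r in ratings_by_tool.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

With these made-up numbers, a tool that does well on the High criteria outscores one that does well only on Medium and Low ones, which is the point of ranking the criteria in the first place.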
50. Why We Picked Matillion
• Specific Redshift support, built around the Redshift platform
• Speed of ETL operations
• Speed of development
• Wide range of data sources supported
• Ease of use outside of DE/DBA expertise
• Native with AWS
• $$$
• The biggest risk: putting all our eggs in the Matillion basket, betting on a small, new player.
57. Coursera:
• Data Warehouse for Business Intelligence Specialization
• Data Engineering on Google Cloud Platform
• Architecting with Google Cloud Platform
AWS Tutorials:
• Getting Started with Amazon Redshift
• Sizing Amazon Redshift
• Getting Started with Amazon Redshift Spectrum, Athena, Glue, EMR
• AWS Free Tier (for example, 2 months of Redshift)
AWS Trainings:
• AWS Technical Essentials
Other:
Google Machine Learning Crash Course (Deep Learning with TensorFlow)
http://hackerrank.com
Sql-ex.ru