© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
01.10.2018
Migrating to 21st Century analytics
Shafreen Sayyed, AWS Solutions Architect
Varun Gangoor, Zopa Sr. Data Engineer
Agenda
• Trends and use cases
• Data Storage
• Data collection and automated data ingestion
• Data processing and analysis
• Data consumption and visualisation
• Security and compliance
• Zopa’s Story
• Summary
Four trends are shaping Financial Services in EMEA today
• Digital innovation creating opportunities to grow and increasing competitive pressure
• Increasing regulatory burden and ever-increasing compliance obligations
• Resource constraints on the road to regaining financial health
• Laser focus on operational risk
Typical First Use Cases for Financial Services
Risk Calculations using Grid Computing
• Instead of 10 servers running 6 hours, why not have 60 servers run for an hour?
Digital channels
• Automating customer contact
• Alexa / Echo
• Internet banking
• Data streaming
• Device Farm
• Greenfield digital banks
Software Development
• Dramatically accelerate the development cycle
• Scale resources up and down
Big Data and AI/ML
• Storing and analyzing large data sets in the cloud, using data lakes, state-of-the-art analytics, AI and machine learning technology
Backup, Archive, Disaster Recovery
• Durable, redundant, encrypted, lockable
• WORM compliance records management
• Spin up redundant data centers quickly
Open Banking and FinTech integration
• Build and scale secure APIs quickly to enable an ecosystem of FinTech technology
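The grid computing question on this slide rests on simple arithmetic: 10 servers for 6 hours and 60 servers for 1 hour consume the same number of server-hours. A minimal sketch of that calculation follows. It assumes an idealised, perfectly parallel workload and billing proportional to server-hours; the function names are illustrative and do not come from the deck.

```python
def server_hours(servers: int, hours: float) -> float:
    """Total compute consumed: number of servers times wall-clock hours."""
    return servers * hours


def wall_clock_hours(total_server_hours: float, servers: int) -> float:
    """Wall-clock time if the work splits evenly across servers (idealised)."""
    return total_server_hours / servers


# The slide's example: 10 servers for 6 hours.
baseline = server_hours(servers=10, hours=6)

# Same amount of work spread over 60 servers instead.
scaled_out = wall_clock_hours(baseline, servers=60)

print(f"{baseline} server-hours finish in {scaled_out} hour(s) on 60 servers")
```

Under these assumptions the compute bill is unchanged while the answer arrives six times sooner; real risk workloads rarely parallelise perfectly, so treat this as an upper bound on the speed-up.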
Where do I start?
Data → Ingest / Collect → Store → Process / analyze → Consume / visualize → Answers and insights
Start here (with a business case)
Storage is Job #1
Object Storage is Foundational
• Amazon S3
• Amazon RDS
• Amazon DynamoDB
• EBS
• Amazon EFS
Data collection and automated data ingestion
Data collection
AWS Direct Connect
• Dedicated 1 Gbps and 10 Gbps fibre link to AWS
• Low cost, with consistent low latency/jitter
• Direct access to AWS services and your VPCs
AWS Snowball
• Tamper-resistant case and electronics
• Ruggedized case that can withstand 8.5 G
• Available in 50 TB or 80 TB capacities
AWS Database Migration Service
• Modernise, migrate, or replicate your RDBMS
• Fan-in multiple sources to single target
• Platform and schema conversion
Automated Data Ingestion
• Database Migration Service
• Amazon Kinesis
• Amazon S3 Acceleration
• Amazon S3 Upload
• Snowball
• Snowball Edge
• Snowmobile
Data processing and analysis
Data ingestion
Amazon Kinesis
• Fully-managed real-time stream processing
• Highly available across multiple AZs
• Can capture and store:
  • Terabytes of data per hour
  • From hundreds of thousands of sources
AWS IoT
• Collect data from your connected devices
• Communicate securely back to your devices
• Can easily support:
  • Billions of devices
  • Trillions of messages
“If you knew the state of every thing in the world, and could reason on top of that data, what problems could you solve?”
Amazon Kinesis Analytics – Simple SQL Interface
SELECT STREAM author,
  count(author) OVER ONE_MINUTE
FROM Tweets
WHERE text LIKE '%#FSInsight%'
WINDOW ONE_MINUTE AS
  (PARTITION BY author
   RANGE INTERVAL '1' MINUTE PRECEDING);
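To make the windowing semantics of the streaming query concrete, here is a rough Python equivalent: for each tweet containing the hashtag, it emits the author together with the number of that author's matching tweets in the preceding minute. This is only a sketch of the idea, not how Kinesis Analytics is implemented. It assumes tweets arrive in timestamp order, and the `Tweet` tuple layout and function name are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical record layout: (timestamp in seconds, author, text).
Tweet = Tuple[float, str, str]


def count_per_author(
    tweets: Iterable[Tweet],
    hashtag: str = "#FSInsight",
    window_seconds: float = 60.0,
) -> Iterator[Tuple[str, int]]:
    """Yield (author, count over the preceding window) for each matching tweet."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)  # one window per author
    for ts, author, text in tweets:
        if hashtag not in text:  # plays the role of the WHERE ... LIKE filter
            continue
        times = recent[author]
        times.append(ts)
        # Drop timestamps that fell out of the one-minute range.
        while times[0] < ts - window_seconds:
            times.popleft()
        yield author, len(times)


sample = [
    (0, "ann", "Hello #FSInsight"),
    (10, "bob", "no tag here"),
    (30, "ann", "#FSInsight again"),
    (45, "bob", "#FSInsight"),
    (100, "ann", "still #FSInsight"),
]
results = list(count_per_author(sample))
print(results)
```

Keeping one deque per author mirrors the `PARTITION BY author` clause: each author's window slides independently of the others.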
Storage + Catalog
Scalable (secure, versioned, durable) storage +
Immutable data at every stage of its lifecycle +
Versioned schema and metadata
=
Data discovery, lineage
AWS Glue
• Data Catalog
Discover and store metadata
• Job Execution
Serverless scheduling and execution
• Job Authoring
Auto-generated ETL code
Glue Data Catalog
Hive metastore-compatible, highly available metadata repository:
• Classification for identifying and parsing files
• Versioning of table metadata as schemas evolve
• Table definitions – usable by Redshift, Athena, Glue, EMR
Populate using Hive DDL, bulk import, or automatically through crawlers.
Crawlers: Automatic Schema Inference
[Diagram: enumerate S3 objects (file 1, file 2, …, file N) → identify file type and parse files → semi-structured per-file schema → semi-structured unified schema]
• Custom classifiers: app log parser, metrics parser, …
• System classifiers: JSON parser, CSV parser, Apache log parser, …
• Field types shown in the schemas: int, char, bool, array, struct
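The crawler flow on this slide (parse each file, derive a per-file schema, then merge into a unified schema) can be illustrated with a small sketch. This is not AWS Glue's actual algorithm; it is a simplified stand-in that handles JSON objects only. The type names follow the slide's labels, while the function names and the rule for resolving conflicting types are assumptions.

```python
import json
from typing import Any, Dict, Iterable


def infer_type(value: Any) -> Any:
    """Map a parsed JSON value to a type label like those on the slide."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "char"
    if isinstance(value, list):
        return {"array": infer_type(value[0]) if value else "unknown"}
    if isinstance(value, dict):
        return {"struct": {k: infer_type(v) for k, v in value.items()}}
    return "unknown"


def per_file_schema(text: str) -> Dict[str, Any]:
    """Schema of one file, assumed here to hold a single JSON object."""
    return {name: infer_type(value) for name, value in json.loads(text).items()}


def unify(schemas: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge per-file schemas into one; fields missing from a file are kept."""
    unified: Dict[str, Any] = {}
    for schema in schemas:
        for name, typ in schema.items():
            if name not in unified:
                unified[name] = typ
            elif unified[name] != typ:
                unified[name] = "char"  # assumed fallback for conflicting types
    return unified


files = [
    '{"id": 1, "name": "a", "tags": [1, 2]}',
    '{"id": 2, "active": true, "owner": {"name": "b"}}',
]
unified_schema = unify(per_file_schema(f) for f in files)
print(unified_schema)
```

The unified schema is the union of the fields seen across files, which is why a table definition can keep working as new files introduce additional columns.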
Indexing and Searching Using Metadata
[Diagram: Amazon S3 (ObjectCreated / ObjectDeleted) → AWS Lambda → PutItem → Metadata Index (Amazon DynamoDB) → Update Stream → Extract Search Fields → Update Index → Search Index (Amazon Elasticsearch)]
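As a rough illustration of the first hop in this pattern, the sketch below turns an S3 event notification into the index actions a Lambda function might apply to the metadata table. It deliberately stops short of calling DynamoDB, so it runs without AWS credentials. The function name and the shape of the returned actions are hypothetical; the event fields follow the S3 notification format, in which deletions arrive as `ObjectRemoved` events.

```python
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def to_index_actions(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Translate S3 notification records into metadata-index actions."""
    actions: List[Dict[str, Any]] = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys in S3 notifications are URL-encoded.
        key = unquote_plus(s3["object"]["key"])
        item_key = {"bucket": s3["bucket"]["name"], "key": key}
        event_name = record["eventName"]
        if event_name.startswith("ObjectCreated"):
            item = dict(item_key)
            item["size"] = s3["object"].get("size", 0)
            item["event_time"] = record.get("eventTime")
            actions.append({"action": "PutItem", "item": item})
        elif event_name.startswith("ObjectRemoved"):
            actions.append({"action": "DeleteItem", "key": item_key})
    return actions


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "eventTime": "2018-10-01T00:00:00.000Z",
            "s3": {
                "bucket": {"name": "example-lake"},
                "object": {"key": "raw/my+file.csv", "size": 42},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-lake"},
                "object": {"key": "raw/old.csv"},
            },
        },
    ]
}
actions = to_index_actions(sample_event)
print(actions)
```

In a deployed function each action would be applied to the metadata table, and the table's update stream would then drive the search index refresh shown in the diagram.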
Data processing and analysis
Amazon Redshift (structured data processing)
• Petabyte scale data warehouse
• Fault-tolerant scalable cluster with node auto-recovery
• Auto backup into Amazon S3
Amazon EMR (semi/unstructured data processing)
• Fully-managed big data platform
• Auto-scaling clusters
• Supports Hadoop: Hive, Spark, Presto, Zeppelin, HBase, Flink, HDFS and Amazon S3 filesystems
Amazon Athena (serverless query processing)
• No infrastructure to manage
• No data loading required
• Supports multiple data formats:
  • CSV, TSV, Avro, ORC, Parquet
• Uses ANSI SQL to directly query Amazon S3
Consume and visualise data
AWS Lambda
• No infrastructure to manage
• Event-driven processing
• Pay per 100 ms CPU
• Node.js, Python, Java and C# (.NET Core)
Amazon Machine Learning
• No infrastructure to manage
• Multiple classifier types
• Interactive UI for modelling and dataset visualisation
Amazon QuickSight
• No infrastructure to manage
• Fast, cloud-powered BI tool
• Scales to hundreds of thousands of users
• Quick calculations with SPICE
Managed ETL with AWS Glue
• Python code generated by AWS Glue
• Connect a notebook or IDE to AWS Glue
• Existing code brought into AWS Glue
Job Execution with AWS Glue
• Schedule-based
• Event-based
• On demand
Amazon Athena – Analyze Data in S3
• Interactive queries
• ANSI SQL
• No infrastructure or administration
• Zero spin up time
• Query data in its raw format
• AVRO, Text, CSV, JSON, weblogs, AWS service logs
• Convert to an optimized form like ORC or Parquet for the best performance and lowest cost
• No loading of data, no ETL required
• Stream data directly from Amazon S3, take advantage of Amazon S3 durability and availability
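The advice to convert data to a columnar form such as ORC or Parquet comes down to how many bytes a query has to read. The toy comparison below stores the same hypothetical loan records row by row (CSV style) and column by column, then measures the bytes touched by a query that needs only one column. It ignores compression and file metadata, and the field names and helper functions are invented for illustration.

```python
import csv
import io
from typing import Dict, List

# Hypothetical sample data: 1,000 loan records with three fields.
rows: List[Dict[str, str]] = [
    {
        "loan_id": str(i),
        "amount": str(1000 + i),
        "purpose": "car" if i % 2 else "home improvement",
    }
    for i in range(1000)
]


def row_layout_bytes(records: List[Dict[str, str]]) -> int:
    """Bytes scanned in a row-oriented file: every column of every row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writerows(records)
    return len(buffer.getvalue().encode("utf-8"))


def column_layout_bytes(records: List[Dict[str, str]], needed: List[str]) -> int:
    """Bytes scanned when columns are stored separately and only some are read."""
    return sum(
        len("\n".join(record[column] for record in records).encode("utf-8"))
        for column in needed
    )


full_scan = row_layout_bytes(rows)
amount_only = column_layout_bytes(rows, ["amount"])
print(f"row layout: {full_scan} bytes, columnar (amount only): {amount_only} bytes")
```

Because a query over a columnar layout reads only the columns it names, it touches a fraction of the data, which is where both the speed and the cost benefit come from when a service bills by data scanned.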
Simple query editor with syntax highlighting and autocomplete
Data Catalog
Query History, Saved Queries, and Catalog Management
Data consumption and visualisation
Using Amazon Athena with Amazon QuickSight
QuickSight allows you to connect to data from a wide variety of AWS, third-party, and on-premises sources including Amazon Athena
• Amazon RDS
• Amazon S3
• Amazon Redshift
• Amazon Athena
Security and compliance
Modern data architecture
Insights to enhance business applications, new digital services
[Diagram: Data sources → Ingest → Data Lake, ML / Analytics, AI Services → Serving]
• Data sources: Transactions, Web logs / cookies, ERP, Connected devices, Social media
• Pipelines: DATA PIPELINES, EVENT PIPELINES
• Flow labels: Data, Event, Action, Insights
• ML / Analytics: Predict / Recommend
• Serving: Data analysts, Data scientists, Business users, Engagement platforms, Automation / events
Modern data architecture
Insights to enhance business applications, new digital services
[Diagram: Data sources → Ingest → Speed (Real-time) and Scale (Batch) → Serving]
• Data sources: Transactions, Web logs / cookies, Connected devices, ERP, Social media
• Enterprise Apps: Oracle, Teradata, SQL Server, MySQL, etc.
• Schemaless: Amazon Elasticsearch
• Direct Query: Amazon Athena
• Near-Zero Latency: Amazon DynamoDB
• Staged Data (Data Lake): Amazon S3
• Semi/Unstructured: Amazon EMR
• Data Warehouse: Amazon Redshift
• Serving: Data analysts, Data scientists, Business users, Engagement platforms, Automation / events
Access securely granted - trust and compliance
1. Restricted access roles set up within account
2. Highly governed access to service endpoints
3. Remote access to virtual desktops
• AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
Migrating to 21st Century analytics
Varun Gangoor
Senior Data Engineer
email: varun.gangoor@zopa.com
Who are we
A pioneering financial services company
• World’s 1st peer-to-peer lending platform in 2004
• £3.5 billion lent to date, and our growth is accelerating
• 390,000 people have taken a Zopa loan
• 75,000 actively invest through Zopa
What we do
Simple loans. Smart investments.
[Diagram: Investors put money in (Invests) and receive Interest + capital; Borrowers receive Loans and make Repayments]
Analytics @Zopa
Business Analytics
• Centralised Data Warehouse
• Business Users
• Great Customer Experience
Data Consumers
Data Scientists
* Statistical model and monitoring
* Raw data
* Cold data
Data Analysts
* Data quality and analysis
* Raw and aggregated data
* Cold and warm data
Product Analysts
* Product feature analysis
* Raw and aggregated data
* Hot and warm data
Leadership Team
* Custom dashboards
* Aggregated data
Analytical Data Warehouse Requirements
• Data Existence: Same data exists in multiple locations
• Reporting: Running reports from DWH instead of operational systems
• Data Quality: Knowledge base about our systems and our data
• Time: Reporting queries against the historical data
• Historical Data: Need to gather in one accessible location
• Integration: Variety of sources and data formats
• Data Growth: From gigabytes to petabytes
• Varying Data: Rapid changes in application features and data sources
Bigdata @Zopa
V's @Zopa
Validity
• Data Quality
• Data Governance
Variability
• Change in Data
• Change in Model
Vocabulary
• Data Model
• Data Dictionary
Value
• Useful Data
Volume
• Terabytes
• Tables and Files
Variety
• Structured
• Semi-structured
Velocity
• Batch
• Real-time
Veracity
• Trustworthiness
• Availability
Tech Stack
Collect
Kinesis Firehose
Glue
Process/Analyse
Lambda
Batch
Glue
Redshift
Redshift Spectrum
Athena
Store
S3
Glacier
Consume
SageMaker
Tableau
Python
SQL
Zopa's Enterprise Data Warehouse
Thank you
We are hiring!
Summary
Central Storage
Secure, cost-effective storage in S3
• S3
Data Ingestion
Get your data into S3 quickly and securely
• Firehose, Direct Connect, Snowball, DMS
Catalog & Search
Capture, access, and search metadata
• DynamoDB, Amazon ES, Glue, Macie
Processing & Analytics
Use predictive and prescriptive analytics to gain better understanding
• Athena, QuickSight, EMR, Redshift
Access & User Interface
Give your users easy & secure access
• API Gateway, IAM, Cognito
Protect & Secure
Use entitlements to ensure data is secure and users' identities are verified
• Security Token Service, CloudWatch, CloudTrail, KMS
Again, where do I start?
Data → Ingest / Collect → Store → Process / analyze → Consume / visualize → Answers and insights
Seriously, start here (with a business case)
Then collect your data
Build decoupled systems
• Use Amazon S3 as the data fabric of your data lake
• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Leverage AWS managed services
• Scalable/elastic, available, reliable, secure, no/low admin
Use log-centric design patterns
• Immutable log, batch, interactive & real-time views
Be cost-conscious
• Big data ≠ big cost
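The log-centric pattern named above can be sketched in a few lines: events are appended to an immutable log, and both a batch view and a real-time view are derived from that single source. This is a toy in-memory illustration under assumed names (`ImmutableLog`, `batch_view`, `RealTimeView`); in the architecture described in this deck, Amazon S3 would hold the log and services such as EMR, Athena or Kinesis would compute the views.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """One immutable log entry; frozen so it cannot be changed after writing."""
    offset: int
    key: str
    amount: int


class ImmutableLog:
    """Append-only store: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, key: str, amount: int) -> Event:
        event = Event(len(self._events), key, amount)
        self._events.append(event)
        return event

    def read(self, start: int = 0) -> Tuple[Event, ...]:
        return tuple(self._events[start:])


def batch_view(log: ImmutableLog) -> Dict[str, int]:
    """Recompute totals from the full log (the batch path)."""
    totals: Counter = Counter()
    for event in log.read():
        totals[event.key] += event.amount
    return dict(totals)


class RealTimeView:
    """Incrementally maintained totals (the real-time path)."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()
        self.next_offset = 0

    def catch_up(self, log: ImmutableLog) -> None:
        # Only process events that were appended since the last call.
        for event in log.read(self.next_offset):
            self.totals[event.key] += event.amount
            self.next_offset = event.offset + 1


log = ImmutableLog()
view = RealTimeView()
log.append("loans", 100)
log.append("repayments", 40)
view.catch_up(log)
log.append("loans", 250)
view.catch_up(log)
print(batch_view(log), dict(view.totals))
```

Because both views are derived from the same unmodified log, either one can be rebuilt or replaced without touching the stored data, which is the decoupling the first bullet asks for.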
AWS Solution Builder - Data Lake on AWS
• Reference Architecture deployment via CloudFormation
• Configures core services to tag, search and catalogue datasets
• Deploys a console to search and browse available datasets
http://amzn.to/2nTVjcp
Thank you

Weitere ähnliche Inhalte

Was ist angesagt?

Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Amazon Web Services
 
Humans and Data Don't Mix- Best Practices to Secure Your Cloud
Humans and Data Don't Mix- Best Practices to Secure Your CloudHumans and Data Don't Mix- Best Practices to Secure Your Cloud
Humans and Data Don't Mix- Best Practices to Secure Your CloudAmazon Web Services
 
Building Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scaleBuilding Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scaleAmazon Web Services
 
What’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial DatabasesWhat’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial DatabasesAmazon Web Services
 
ABD206-Building Visualizations and Dashboards with Amazon QuickSight
ABD206-Building Visualizations and Dashboards with Amazon QuickSightABD206-Building Visualizations and Dashboards with Amazon QuickSight
ABD206-Building Visualizations and Dashboards with Amazon QuickSightAmazon Web Services
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017Amazon Web Services
 
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...Amazon Web Services
 
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...Amazon Web Services
 
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...Amazon Web Services
 
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017Amazon Web Services
 
ABD303_Developing an Insights Platform—the Sysco Journey from Disparate Syste...
ABD303_Developing an Insights Platform—the Sysco Journey from Disparate Syste...ABD303_Developing an Insights Platform—the Sysco Journey from Disparate Syste...
ABD303_Developing an Insights Platform—the Sysco Journey from Disparate Syste...Amazon Web Services
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and QuboleAmazon Web Services
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueAmazon Web Services
 
ABD316_American Heart Association Finding Cures to Heart Disease Through the ...
ABD316_American Heart Association Finding Cures to Heart Disease Through the ...ABD316_American Heart Association Finding Cures to Heart Disease Through the ...
ABD316_American Heart Association Finding Cures to Heart Disease Through the ...Amazon Web Services
 
ABD207 building a banking utility leveraging aws to fight financial crime and...
ABD207 building a banking utility leveraging aws to fight financial crime and...ABD207 building a banking utility leveraging aws to fight financial crime and...
ABD207 building a banking utility leveraging aws to fight financial crime and...Amazon Web Services
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Amazon Web Services
 
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Amazon Web Services
 

Was ist angesagt? (20)

Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
 
Building Data Lakes with AWS
Building Data Lakes with AWSBuilding Data Lakes with AWS
Building Data Lakes with AWS
 
Humans and Data Don't Mix- Best Practices to Secure Your Cloud
Humans and Data Don't Mix- Best Practices to Secure Your CloudHumans and Data Don't Mix- Best Practices to Secure Your Cloud
Humans and Data Don't Mix- Best Practices to Secure Your Cloud
 
Building Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scaleBuilding Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scale
 
What’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial DatabasesWhat’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial Databases
 
ABD206-Building Visualizations and Dashboards with Amazon QuickSight
ABD206-Building Visualizations and Dashboards with Amazon QuickSightABD206-Building Visualizations and Dashboards with Amazon QuickSight
ABD206-Building Visualizations and Dashboards with Amazon QuickSight
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
 
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
 
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
 
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
 
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
 
ABD303_Developing an Insights Platform—the Sysco Journey from Disparate Syste...
ABD303_Developing an Insights Platform—the Sysco Journey from Disparate Syste...ABD303_Developing an Insights Platform—the Sysco Journey from Disparate Syste...
ABD303_Developing an Insights Platform—the Sysco Journey from Disparate Syste...
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
ABD316_American Heart Association Finding Cures to Heart Disease Through the ...
ABD316_American Heart Association Finding Cures to Heart Disease Through the ...ABD316_American Heart Association Finding Cures to Heart Disease Through the ...
ABD316_American Heart Association Finding Cures to Heart Disease Through the ...
 
ABD207 building a banking utility leveraging aws to fight financial crime and...
ABD207 building a banking utility leveraging aws to fight financial crime and...ABD207 building a banking utility leveraging aws to fight financial crime and...
ABD207 building a banking utility leveraging aws to fight financial crime and...
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWS
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
 

Ähnlich wie 21st Century Analytics with Zopa

STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansAmazon Web Services
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...AWS Riyadh User Group
 
RET301-Build Single Customer View across Multiple Retail Channels using AWS S...
RET301-Build Single Customer View across Multiple Retail Channels using AWS S...RET301-Build Single Customer View across Multiple Retail Channels using AWS S...
RET301-Build Single Customer View across Multiple Retail Channels using AWS S...Amazon Web Services
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Amazon Web Services
 
Builders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCBuilders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCAmazon Web Services LATAM
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...
Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...
Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...Amazon Web Services
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersAmazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTHow TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTAmazon Web Services
 
ARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million UsersARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million UsersAmazon Web Services
 
STG316_Optimizing Storage for Big Data Workloads
STG316_Optimizing Storage for Big Data WorkloadsSTG316_Optimizing Storage for Big Data Workloads
STG316_Optimizing Storage for Big Data WorkloadsAmazon Web Services
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...Amazon Web Services
 

21st Century Analytics with Zopa

  • 1. Migrating to 21st Century analytics. 01.10.2018. Shafreen Sayyed (AWS Solutions Architect), Varun Gangoor (Zopa Sr. Data Engineer). © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. Agenda
      • Trends and use cases
      • Data storage
      • Data collection and automated data ingestion
      • Data processing and analysis
      • Data consumption and visualisation
      • Security and compliance
      • Zopa’s story
      • Summary
  • 3. Four trends are shaping Financial Services in EMEA today
      • Digital innovation creating opportunities to grow and increasing competitive pressure
      • Increasing regulatory burden and ever-increasing compliance obligations
      • Resource constraints on the road to regaining financial health
      • Laser focus on operational risk
  • 4. Typical First Use Cases for Financial Services
      • Risk Calculations using Grid Computing: Instead of 10 servers running 6 hours, why not have 60 servers run for an hour?
      • Digital channels: Automating customer contact; Alexa / Echo; Internet banking; Data streaming; Device Farm; Greenfield digital banks
      • Software Development: Dramatically accelerate the development cycle. Scale resources up and down.
      • Big Data and AI/ML: Storing and analyzing large data sets in the cloud, using data lakes, state-of-the-art analytics, AI and machine learning technology
      • Backup, Archive, Disaster Recovery: Durable, redundant, encrypted, lockable. WORM compliance records management. Spin up redundant data centers quickly.
      • Open Banking and FinTech integration: Build and scale secure APIs quickly to enable an ecosystem of FinTech technology
  • 5. Where do I start? Data → Ingest / Collect → Store → Process / analyze → Consume / visualize → Answers and insights
  • 6. Where do I start? Data → Ingest / Collect → Store → Process / analyze → Consume / visualize → Answers and insights. Start here (with a business case).
  • 7. Storage is Job #1
  • 8. Object Storage is Foundational: Amazon S3, Amazon RDS, Amazon DynamoDB, EBS, Amazon EFS
  • 9. Data collection and automated data ingestion
  • 10. Data collection
      • AWS Direct Connect
          • Dedicated 1 Gbps and 10 Gbps fibre link to AWS
          • Low cost, with consistent low latency/jitter
          • Direct access to AWS services and your VPCs
      • AWS Snowball
          • Tamper-resistant case and electronics
          • Ruggedized case that can withstand 8.5 G
          • Available in 50 TB or 80 TB capacities
      • AWS Database Migration Service
          • Modernise, migrate, or replicate your RDBMS
          • Fan-in multiple sources to single target
          • Platform and schema conversion
  • 11. Automated Data Ingestion: Database Migration Service, Amazon Kinesis, Amazon S3 Acceleration, Amazon S3 Upload, Snowball, Snowball Edge, Snowmobile
  • 12. Data processing and analysis
  • 13. Data ingestion
      • Amazon Kinesis
          • Fully-managed real-time stream processing
          • Highly available across multiple AZs
          • Can capture and store terabytes of data per hour from hundreds of thousands of sources
      • AWS IoT
          • Collect data from your connected devices
          • Communicate securely back to your devices
          • Can easily support billions of devices and trillions of messages
      • “If you knew the state of every thing in the world, and could reason on top of that data, what problems could you solve?”
  • 14. Amazon Kinesis Analytics: Simple SQL Interface. SELECT STREAM author, count(author) OVER ONE_MINUTE FROM Tweets WHERE text LIKE '%#FSInsight%' WINDOW ONE_MINUTE AS (PARTITION BY author RANGE INTERVAL '1' MINUTE PRECEDING);
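To make the windowed query on slide 14 concrete, here is a minimal plain-Python sketch of the same idea: for each tweet containing the hashtag, count that author's matching tweets in the preceding minute. It does not use Kinesis Analytics; the `(timestamp, author, text)` tuple layout, the function name, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=1)  # mirrors RANGE INTERVAL '1' MINUTE PRECEDING


def windowed_counts(tweets, hashtag="#FSInsight"):
    """Yield (author, count) for every matching tweet.

    `tweets` is an iterable of (timestamp, author, text) tuples, assumed to be
    ordered by timestamp. The tuple layout is hypothetical.
    """
    recent = defaultdict(deque)  # author -> timestamps still inside the window
    for ts, author, text in tweets:
        if hashtag not in text:  # the WHERE ... LIKE filter
            continue
        window = recent[author]  # the PARTITION BY author
        window.append(ts)
        # Drop timestamps that are more than one minute older than this row.
        while window[0] < ts - WINDOW:
            window.popleft()
        yield author, len(window)


t0 = datetime(2018, 10, 1, 9, 0, 0)
sample = [
    (t0, "ann", "hello #FSInsight"),
    (t0 + timedelta(seconds=30), "ann", "more #FSInsight"),
    (t0 + timedelta(seconds=40), "bob", "no tag here"),
    (t0 + timedelta(seconds=45), "bob", "#FSInsight"),
    (t0 + timedelta(seconds=100), "ann", "#FSInsight again"),
]
results = list(windowed_counts(sample))
print(results)
```

A deque per author keeps the cost of expiring old rows proportional to the number of rows that leave the window, which is the usual trade-off for sliding windows over an ordered stream.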
  • 15. Storage + Catalog: Scalable (secure, versioned, durable) storage + Immutable data at every stage of its lifecycle + Versioned schema and metadata = Data discovery, lineage
  • 16. AWS Glue
      • Data Catalog: Discover and store metadata
      • Job Execution: Serverless scheduling and execution
      • Job Authoring: Auto-generated ETL code
  • 17. Glue Data Catalog. Hive metastore-compatible, highly available metadata repository:
      • Classification for identifying and parsing files
      • Versioning of table metadata as schemas evolve
      • Table definitions usable by Redshift, Athena, Glue, EMR
      Populate using Hive DDL, bulk import, or automatically through crawlers.
  • 18. Crawlers: Automatic Schema Inference
      • Flow: enumerate S3 objects (file 1, file 2, …, file N) → identify file type and parse files → semi-structured per-file schema → semi-structured unified schema
      • Custom classifiers: app log parser, metrics parser, …
      • System classifiers: JSON parser, CSV parser, Apache log parser, …
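The per-file-schema to unified-schema step on slide 18 can be sketched in a few lines of Python. This is not how AWS Glue crawlers are implemented; it is a simplified illustration in which each parsed record yields a field-to-type mapping and conflicting types fall back to `"string"`. The function names and the fallback rule are assumptions.

```python
def infer_schema(record):
    """Map each field of one parsed record to a simple type name."""
    return {name: type(value).__name__ for name, value in record.items()}


def unify(schemas):
    """Merge per-file schemas into one schema.

    Fields seen in any file are kept. When two files disagree on a field's
    type, the field falls back to "string" (an assumed rule for this sketch).
    """
    unified = {}
    for schema in schemas:
        for name, type_name in schema.items():
            seen = unified.setdefault(name, type_name)
            if seen != type_name:
                unified[name] = "string"
    return unified


# Hypothetical records, one per file.
files = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "active": True},
    {"id": "x3"},
]
per_file = [infer_schema(record) for record in files]
unified = unify(per_file)
print(unified)
```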
  • 19. Indexing and Searching Using Metadata: Amazon S3 (ObjectCreated, ObjectDeleted) → AWS Lambda → PutItem → Metadata Index (Amazon DynamoDB) → Update Stream → Extract Search Fields → Update Index → Search Index (Amazon Elasticsearch)
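As a rough sketch of the first hop on slide 19, the function below turns an S3 event notification into metadata-index actions. It only builds the actions; actually writing them (for example with a DynamoDB `put_item` call) is left out so the sketch runs without AWS access. The item fields and function name are assumptions. Note that S3 reports deletions with event names starting with `ObjectRemoved`, which is what the slide labels "ObjectDeleted".

```python
from urllib.parse import unquote_plus


def metadata_actions(event):
    """Translate an S3 event notification into (action, item) pairs.

    The item layout (bucket, key, size, event_time) is a hypothetical choice
    for the metadata index, not something the slides specify.
    """
    actions = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        item = {
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
        }
        if record["eventName"].startswith("ObjectCreated"):
            item["size"] = s3["object"].get("size", 0)
            item["event_time"] = record.get("eventTime")
            actions.append(("put", item))
        elif record["eventName"].startswith("ObjectRemoved"):
            actions.append(("delete", item))
    return actions


# Hand-written sample event with hypothetical bucket and keys.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "eventTime": "2018-10-01T09:00:00.000Z",
            "s3": {
                "bucket": {"name": "example-data-lake"},
                "object": {"key": "raw/loans+2018.csv", "size": 1024},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-data-lake"},
                "object": {"key": "raw/old.csv"},
            },
        },
    ]
}
actions = metadata_actions(sample_event)
print(actions)
```

Keeping the translation separate from the write makes the handler easy to test locally; a Lambda entry point would call this function and then apply each action to the index.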
  • 20. Data processing and analysis
      • Amazon Redshift: Structured data processing
          • Petabyte scale data warehouse
          • Fault-tolerant scalable cluster with node auto-recovery
          • Auto backup into Amazon S3
      • Amazon EMR: Semi/unstructured data processing
          • Fully-managed big data platform
          • Auto-scaling clusters
          • Supports Hadoop: Hive, Spark, Presto, Zeppelin, HBase, Flink; HDFS and Amazon S3 filesystems
      • Amazon Athena: Serverless query processing
          • No infrastructure to manage
          • No data loading required
          • Supports multiple data formats: CSV, TSV, Avro, ORC, Parquet
          • Uses ANSI SQL to directly query Amazon S3
  • 21. Consume and visualise data
      • AWS Lambda
          • No infrastructure to manage
          • Event-driven processing
          • Pay per 100 ms CPU
          • Node.js, Python, Java and C# (.NET Core)
      • Amazon Machine Learning
          • No infrastructure to manage
          • Multiple classifier types
          • Interactive UI for modelling and dataset visualisation
      • Amazon QuickSight
          • No infrastructure to manage
          • Fast, cloud-powered BI tool
          • Scales to hundreds of thousands of users
          • Quick calculations with SPICE
  • 22. Managed ETL with AWS Glue
      • Python code generated by AWS Glue
      • Connect a notebook or IDE to AWS Glue
      • Existing code brought into AWS Glue
  • 23. Job Execution with AWS Glue
      • Schedule-based
      • Event-based
      • On demand
  • 24. Amazon Athena: Analyze Data in S3
      • Interactive queries
      • ANSI SQL
      • No infrastructure or administration
      • Zero spin up time
      • Query data in its raw format
          • Avro, Text, CSV, JSON, weblogs, AWS service logs
          • Convert to an optimized form like ORC or Parquet for the best performance and lowest cost
      • No loading of data, no ETL required
      • Stream data directly from Amazon S3, take advantage of Amazon S3 durability and availability
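The advice on slide 24 to convert raw data to Parquet can be illustrated with an Athena CREATE TABLE AS SELECT (CTAS) statement. The sketch below only builds the SQL text; the table names and S3 location are hypothetical, and the slides do not say which conversion method the presenters used.

```python
def ctas_to_parquet(source_table, target_table, target_location):
    """Build an Athena CTAS statement that rewrites a table as Parquet.

    All three arguments are caller-supplied names; nothing here is validated
    against a real catalog.
    """
    if not target_location.startswith("s3://"):
        raise ValueError("target_location must be an s3:// URI")
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = 'PARQUET', external_location = '{target_location}')\n"
        f"AS SELECT * FROM {source_table}"
    )


# Hypothetical names, for illustration only.
statement = ctas_to_parquet(
    "weblogs_raw", "weblogs_parquet", "s3://example-bucket/weblogs_parquet/"
)
print(statement)
```

The resulting string could be pasted into the Athena query editor or submitted through an SDK; columnar formats such as Parquet reduce the data scanned per query, which is why the slide links them to lower cost.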
  • 25. Simple query editor with syntax highlighting and autocomplete; Data Catalog; Query History, Saved Queries, and Catalog Management
  • 26. Data consumption and visualisation
  • 27. Using Amazon Athena with Amazon QuickSight. QuickSight allows you to connect to data from a wide variety of AWS, third-party, and on-premises sources including Amazon Athena. Sources shown: Amazon RDS, Amazon S3, Amazon Redshift, Amazon Athena.
  • 28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 29. Security and compliance
  • 30. Modern data architecture: Insights to enhance business applications, new digital services
      • Data sources: Transactions, Web logs / cookies, ERP, Connected devices, Social media
      • Ingest: Data pipelines, Event pipelines
      • Data Lake, ML / Analytics, AI Services
      • Data → Insights; Event → Action; Predict / Recommend
      • Serving: Data analysts, Data scientists, Business users, Engagement platforms, Automation / events
  • 31. Modern data architecture: Insights to enhance business applications, new digital services
      • Data sources: Transactions, Web logs / cookies, Connected devices, Social media, ERP, Enterprise Apps (Oracle, Teradata, SQL Server, MySQL, etc.)
      • Ingest: Speed (Real-time), Scale (Batch)
      • Staged Data (Data Lake): Amazon S3
      • Semi/Unstructured: Amazon EMR
      • Data Warehouse: Amazon Redshift
      • Schemaless: Amazon Elasticsearch
      • Direct Query: Amazon Athena
      • Near-Zero Latency: Amazon DynamoDB
      • Serving: Data analysts, Data scientists, Business users, Engagement platforms, Automation / events
      • Access securely granted (trust and compliance): 1. Restricted access roles set up within account 2. Highly governed access to service endpoints 3. Remote access to virtual desktops
      • AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
  • 32. Migrating to 21st Century analytics. Varun Gangoor, Senior Data Engineer. Email: varun.gangoor@zopa.com
  • 33. Who are we
  • 34. A pioneering financial services company
      • World’s 1st peer-to-peer lending platform in 2004
      • £3.5 billion lent to date, and our growth is accelerating
      • 390,000 people have taken a Zopa loan
      • 75,000 actively invest through Zopa
  • 35. What we do
  • 36. Simple loans. Smart investments. Diagram labels: Investors, Borrowers, Invests, Loans, Repayments, Interest + capital
  • 37. Analytics @Zopa
  • 38. Business Analytics, Centralised Data Warehouse, Business Users, Great Customer Experience
  • 39. Data Consumers
      • Data Scientists: Statistical model and monitoring; Raw data; Cold data
      • Data Analysts: Data quality and analysis; Raw and aggregated data; Cold and warm data
      • Product Analysts: Product feature analysis; Raw and aggregated data; Hot and warm data
      • Leadership Team: Custom dashboards; Aggregated data
  • 40. Analytical Data Warehouse Requirements
      • Data Existence: Same data exists in multiple locations
      • Reporting: Running reports from DWH instead of operational systems
      • Data Quality: Knowledge base about our systems and our data
      • Time: Reporting queries against the historical data
      • Historical Data: Need to gather in one accessible location
      • Integration: Variety of sources and data formats
      • Data Growth: From gigabytes to petabytes
      • Varying Data: Rapid changes in application features and data sources
  • 41. Big data @Zopa
  • 42. V's @Zopa
      • Validity: Data Quality, Data Governance
      • Variability: Change in Data, Change in Model
      • Vocabulary: Data Model, Data Dictionary
      • Value: Useful Data
      • Volume: Terabytes, Tables and Files
      • Variety: Structured, Semi-structured
      • Velocity: Batch, Real-time
      • Veracity: Trustworthiness, Availability
  • 43. Tech Stack
      • Collect: Kinesis Firehose, Glue
      • Process/Analyse: Lambda, Batch, Glue, Redshift, Redshift Spectrum, Athena
      • Store: S3, Glacier
      • Consume: SageMaker, Tableau, Python, SQL
  • 44. Zopa's Enterprise Data Warehouse
  • 45. Thank you. We are hiring!
  • 46. Summary
  • 47. Summary of building blocks
      • Central Storage: Secure, Cost Effective Storage in S3 (S3)
      • Access & User Interface: Give your users easy & secure access (API Gateway, IAM, Cognito)
      • Protect & Secure: Use entitlements to ensure data is secure and users' identities are verified (Security Token Service, CloudWatch, CloudTrail, KMS)
      • Processing & Analytics: Use predictive and prescriptive analytics to gain better understanding (Athena, QuickSight, EMR, Redshift)
      • Data Ingestion: Get your data into S3 quickly and securely (Firehose, Direct Connect, Snowball, DMS)
      • Catalog & Search: Capture, Access, and Search Metadata (DynamoDB, Amazon ES, Glue, Macie)
  • 48. Again, where do I start? Data → Ingest / Collect → Store → Process / analyze → Consume / visualize → Answers and insights. Seriously, start here (with a business case). Then collect your data.
  • 49. Summary principles
      • Build decoupled systems
          • Use Amazon S3 as the data fabric of your data lake
          • Data → Store → Process → Store → Analyze → Answers
      • Use the right tool for the job
          • Data structure, latency, throughput, access patterns
      • Leverage AWS managed services
          • Scalable/elastic, available, reliable, secure, no/low admin
      • Use log-centric design patterns
          • Immutable log, batch, interactive & real-time views
      • Be cost-conscious
          • Big data ≠ big cost
  • 50. AWS Solution Builder - Data Lake on AWS
      • Reference Architecture deployment via CloudFormation
      • Configures core services to tag, search and catalogue datasets
      • Deploys a console to search and browse available datasets
      • http://amzn.to/2nTVjcp
  • 51. Thank you