SlideShare a Scribd company logo
1 of 22
Download to read offline
A																					Presenta*on	
Strategies for suppor.ng near real .me analy.cs, OLAP, and
interac.ve data explora.on
Dr. Jeremy Engle
Engineering Manager Data Team
The Jellyvision Lab
© 2017 The Jellyvision Lab, Inc. All rights reserved.
NOTICE: We have put a lot of work into this presenta.on. Please don’t take this
material and put it into your own material without first talking with us. Seriously.
Fancy Title…WTF are you talking about?
•  How Jellyvision does the data
•  What we have done/learned
•  Strategies for effec.ve design
•  What technologies helped us
•  Best outcomes
•  You learned something
•  My pics/gifs amused you
A bit about Jellyvision
•  Interac.ve soZware that talks people through important, complex, and
poten.ally snooze-inducing life decisions.
•  Our recipe: behavioral science, purposeful humor, mighty tech, and oregano.
•  850 companies with more than 15 million employees in total – including 91 of
the Fortune 500 and 15 of the country's 50 largest companies.
•  A B2B2C company
•  Our customers are corpora.ons
•  Our users are their employees
What does the data team do?
•  Pipeline = Collec.ng/Processing/Abstrac.ng
•  Visualiza.on = Applica.ons
•  Stewards = 
•  Data valida.on/Governance
•  Ad hoc repor.ng
•  Internal dashboards
•  100s million events per year
•  Will eclipse 1TB collec.ve data this year
•  Peak traffic 70x from low traffic
What does our use of data empower?
•  Customer KPIs
•  Usage
•  Sa.sfac.on
•  Feedback
•  Customers can pick any .me
range at day granularity
•  Support 8+ products/versions
•  Analysis of user behavior to determine
effec.veness of new and legacy content
(more on this later)
•  Monitoring for customer success
•  One off data requests
•  Specific customer data points
•  Customers that are configured like XX
What is our soZware like?
•  Custom built pipeline
•  Near real .me KPI aggregator
•  CouchDB and “Data Movers”
•  Pipeline for each of 8+
products/varia.ons
•  Migrate to data warehouse
with Kinesis Firehose+Lambda
(Snowflake)
•  Ruby and Python Web apps
•  Pipeline populates rela.onal DB with
aggregated metrics
•  Based on Command Query Responsibility
Segrega.on Parern
•  hrps://mar.nfowler.com/bliki/CQRS.html
What .me granulari.es we support
•  Processing
•  Near real .me (des.na.on in minutes)
•  Batch (des.na.on in hours to days)
•  Data Access
•  Web applica.on (Response in < 500ms)
•  hrps://research.googleblog.com/2009/06/speed-marers.html
•  Interac.ve OLAP (Response in ~2sec)
•  OLAP Querying/Repor.ng (Response in seconds to minutes)
How to build a stack
Modularize your stack
•  Collec.on
•  When/What data fires
•  APIs to receive data
•  Processing/Movement
•  EL/ETL/Streaming
•  Aggrega.on
•  Storage/Access
•  Usage/Visualiza.on/Analysis
Strategy: Buy versus Build
•  If you can buy it….do that
•  Massive change in last 3 years
•  Turn key analy.cs
•  Reduce/eliminate need for engineers
•  Infrastructure
•  Engineers go faster but need to be
able to integrate mul.ple 3rd par.es
Technologies for Data
•  Movement
•  Kinesis, SQS, DB Migra.on Service, Kawa
•  Processing
•  EMR, Kinesis Analy.cs, Spark/Flink/Storm
•  Workflow/Orchestra.on
•  Glue, Data Pipeline, Azkaban, Airflow, Luigi
•  Storage/Access (Many include some form of processing)
•  DynamoDB, S3, RedshiZ/Spectrum, Snowflake, The
whole NoSQL ecosystem
Using Your Data
Data Warehouse versus Data Lake
h-p://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html
Data Lakehouse (Snowflake)
•  Support schemaless AND structure at physical layer
•  Create structure where desirable
•  Make code/queries simpler
•  Make columnar queries more preformat
•  Crea.ng structure does not require massive data
maintenance event
•  Flexibility!!!!!!!
How we are using a Data Lakehouse
•  Custom BI Tool for exploring user behavior
•  Example Query: Find users that answered
ques.on A with 1 and ques.on B with 2
•  Two approaches:
•  Each addi.onal constraint requires a self-
join with ques.ons
•  For each session, build a map of Ques.on/
Answer pairs
Data Lakehouse: Itera.ve Development
•  Views create abstrac.ons
•  Encapsulate common business logic
•  Abstract away table/database structure
•  Structure(s) under the view can itera.vely
change
•  Performance can be improved itera.vely
•  Make a smaller lake
•  Covert to columnar physical model
•  Leverage cluster/sort keys
Processing Your Data
Define your processing
•  Immutable data simplifies everything
•  Define for each unit of data:
•  Aggrega.ons
•  Roll ups
•  Required latency
•  Define how aggrega.on and usage logic
intersect with movement and storage
systems
Event driven data processing
•  Use non-temporal triggers to facilitate flow of data
•  Difference in polling .me (think Kinesis Firehose) creates the dis.nc.on
between near-real .me and batch processing
•  Can also be designed to support real .me analy.cs
•  Especially useful for system that have varied traffic levels
Mutable data processing
•  Data is dirty, some.mes to the point that it must be fixed
•  Applying logic to mul.ple systems is .me consuming and fragile, with
streaming systems it is especially problema.c
•  Big data systems make it feasible to store full historical data
•  Source of truth should be upstream of all transforma.on
•  Changing the source of truth cascades to all downstream storage
•  Downstream aggrega.ons should be done AND triggered in a way to
facilitate updates. i.e. store at session level AND daily rollup
•  Handling deletes means aggrega.ons can be triggered independent of
create/update
All Done!
Ques.ons on how we
formed the data team,
ques.ons on what we
do, technologies we
use, HIPAA, or anything
else just come find me
or contact me.
Thank	you!	
Jeremy Engle
Manager Data Team
jengle@jellyvision.com

More Related Content

What's hot

Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Databricks
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsDatabricks
 
DevOps for Databricks
DevOps for DatabricksDevOps for Databricks
DevOps for DatabricksDatabricks
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELKYuHsuan Chen
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingDatabricks
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeDatabricks
 
Parallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected WaysParallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected WaysDatabricks
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevAltinity Ltd
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...t_ivanov
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

What's hot (20)

Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark Jobs
 
DevOps for Databricks
DevOps for DatabricksDevOps for Databricks
DevOps for Databricks
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta Lake
 
Parallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected WaysParallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected Ways
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Similar to Strategies for Near Real-Time Analytics and Interactive Data Exploration

5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game ChangerCaserta
 
Operations for databases: the agile/devops journey
Operations for databases: the agile/devops journeyOperations for databases: the agile/devops journey
Operations for databases: the agile/devops journeyEduardo Piairo
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMichael Hiskey
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurgeRTTS
 
Big Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesBig Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesRob Winters
 
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Global Business Events
 
PostgreSQL as a Big Data Platform
PostgreSQL as a Big Data Platform PostgreSQL as a Big Data Platform
PostgreSQL as a Big Data Platform Chris Travers
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarKognitio
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionDmitry Anoshin
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web developmentTung Nguyen
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013StampedeCon
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWSCaserta
 
Hadoop bangalore-meetup-dec-2011-yoda
Hadoop bangalore-meetup-dec-2011-yodaHadoop bangalore-meetup-dec-2011-yoda
Hadoop bangalore-meetup-dec-2011-yodaInMobi
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 

Similar to Strategies for Near Real-Time Analytics and Interactive Data Exploration (20)

5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
Operations for databases: the agile/devops journey
Operations for databases: the agile/devops journeyOperations for databases: the agile/devops journey
Operations for databases: the agile/devops journey
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
Interactive query using hadoop
Interactive query using hadoopInteractive query using hadoop
Interactive query using hadoop
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
 
Big Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesBig Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil Games
 
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
 
PostgreSQL as a Big Data Platform
PostgreSQL as a Big Data Platform PostgreSQL as a Big Data Platform
PostgreSQL as a Big Data Platform
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
 
Retail & CPG
Retail & CPGRetail & CPG
Retail & CPG
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWS
 
Hadoop bangalore-meetup-dec-2011-yoda
Hadoop bangalore-meetup-dec-2011-yodaHadoop bangalore-meetup-dec-2011-yoda
Hadoop bangalore-meetup-dec-2011-yoda
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 

More from AWS Chicago

AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS Chicago
 
Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...
Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...
Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...AWS Chicago
 
WilliamCollins_Road-to-Transit-Gateway.pptx
WilliamCollins_Road-to-Transit-Gateway.pptxWilliamCollins_Road-to-Transit-Gateway.pptx
WilliamCollins_Road-to-Transit-Gateway.pptxAWS Chicago
 
Suresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdf
Suresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdfSuresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdf
Suresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdfAWS Chicago
 
Streamlined Entitlements with AWS Lake Formation - Anusha Dwivedula
Streamlined Entitlements with AWS Lake Formation - Anusha DwivedulaStreamlined Entitlements with AWS Lake Formation - Anusha Dwivedula
Streamlined Entitlements with AWS Lake Formation - Anusha DwivedulaAWS Chicago
 
Steve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptx
Steve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptxSteve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptx
Steve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptxAWS Chicago
 
Saurabh_Shanbhag - Building_SaaS_on_AWS.pptx
Saurabh_Shanbhag - Building_SaaS_on_AWS.pptxSaurabh_Shanbhag - Building_SaaS_on_AWS.pptx
Saurabh_Shanbhag - Building_SaaS_on_AWS.pptxAWS Chicago
 
Sanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdfSanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdfAWS Chicago
 
Ross Stuart_Using ML to Solve Lifes Problems.pptx
Ross Stuart_Using ML to Solve Lifes Problems.pptxRoss Stuart_Using ML to Solve Lifes Problems.pptx
Ross Stuart_Using ML to Solve Lifes Problems.pptxAWS Chicago
 
robsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdf
robsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdfrobsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdf
robsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdfAWS Chicago
 
Sanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdfSanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdfAWS Chicago
 
Mohamed Wali_AWS Security Reference Architecture.pptx
Mohamed Wali_AWS Security Reference Architecture.pptxMohamed Wali_AWS Security Reference Architecture.pptx
Mohamed Wali_AWS Security Reference Architecture.pptxAWS Chicago
 
Nick-Walter-HOB_Migrating_Dinosaurs.pptx
Nick-Walter-HOB_Migrating_Dinosaurs.pptxNick-Walter-HOB_Migrating_Dinosaurs.pptx
Nick-Walter-HOB_Migrating_Dinosaurs.pptxAWS Chicago
 
Pat_Davies_AWSCostOptimization_Final.pdf
Pat_Davies_AWSCostOptimization_Final.pdfPat_Davies_AWSCostOptimization_Final.pdf
Pat_Davies_AWSCostOptimization_Final.pdfAWS Chicago
 
MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...
MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...
MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...AWS Chicago
 
MichaelSoule-UsingJupyterNotebooks.pptx
MichaelSoule-UsingJupyterNotebooks.pptxMichaelSoule-UsingJupyterNotebooks.pptx
MichaelSoule-UsingJupyterNotebooks.pptxAWS Chicago
 
Michal Brygidyn_CloudHackingScenarios.pdf
Michal Brygidyn_CloudHackingScenarios.pdfMichal Brygidyn_CloudHackingScenarios.pdf
Michal Brygidyn_CloudHackingScenarios.pdfAWS Chicago
 
Kamil Kolodziejski_Structura-AWS.pptx
Kamil Kolodziejski_Structura-AWS.pptxKamil Kolodziejski_Structura-AWS.pptx
Kamil Kolodziejski_Structura-AWS.pptxAWS Chicago
 
John Merline AWS Certification FAQ.pptx
John Merline AWS Certification FAQ.pptxJohn Merline AWS Certification FAQ.pptx
John Merline AWS Certification FAQ.pptxAWS Chicago
 
JuliaFMorgado_Breaking_bad_habits.pptx
JuliaFMorgado_Breaking_bad_habits.pptxJuliaFMorgado_Breaking_bad_habits.pptx
JuliaFMorgado_Breaking_bad_habits.pptxAWS Chicago
 

More from AWS Chicago (20)

AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user group
 
Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...
Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...
Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...
 
WilliamCollins_Road-to-Transit-Gateway.pptx
WilliamCollins_Road-to-Transit-Gateway.pptxWilliamCollins_Road-to-Transit-Gateway.pptx
WilliamCollins_Road-to-Transit-Gateway.pptx
 
Suresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdf
Suresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdfSuresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdf
Suresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdf
 
Streamlined Entitlements with AWS Lake Formation - Anusha Dwivedula
Streamlined Entitlements with AWS Lake Formation - Anusha DwivedulaStreamlined Entitlements with AWS Lake Formation - Anusha Dwivedula
Streamlined Entitlements with AWS Lake Formation - Anusha Dwivedula
 
Steve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptx
Steve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptxSteve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptx
Steve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptx
 
Saurabh_Shanbhag - Building_SaaS_on_AWS.pptx
Saurabh_Shanbhag - Building_SaaS_on_AWS.pptxSaurabh_Shanbhag - Building_SaaS_on_AWS.pptx
Saurabh_Shanbhag - Building_SaaS_on_AWS.pptx
 
Sanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdfSanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdf
 
Ross Stuart_Using ML to Solve Lifes Problems.pptx
Ross Stuart_Using ML to Solve Lifes Problems.pptxRoss Stuart_Using ML to Solve Lifes Problems.pptx
Ross Stuart_Using ML to Solve Lifes Problems.pptx
 
robsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdf
robsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdfrobsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdf
robsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdf
 
Sanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdfSanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdf
 
Mohamed Wali_AWS Security Reference Architecture.pptx
Mohamed Wali_AWS Security Reference Architecture.pptxMohamed Wali_AWS Security Reference Architecture.pptx
Mohamed Wali_AWS Security Reference Architecture.pptx
 
Nick-Walter-HOB_Migrating_Dinosaurs.pptx
Nick-Walter-HOB_Migrating_Dinosaurs.pptxNick-Walter-HOB_Migrating_Dinosaurs.pptx
Nick-Walter-HOB_Migrating_Dinosaurs.pptx
 
Pat_Davies_AWSCostOptimization_Final.pdf
Pat_Davies_AWSCostOptimization_Final.pdfPat_Davies_AWSCostOptimization_Final.pdf
Pat_Davies_AWSCostOptimization_Final.pdf
 
MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...
MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...
MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...
 
MichaelSoule-UsingJupyterNotebooks.pptx
MichaelSoule-UsingJupyterNotebooks.pptxMichaelSoule-UsingJupyterNotebooks.pptx
MichaelSoule-UsingJupyterNotebooks.pptx
 
Michal Brygidyn_CloudHackingScenarios.pdf
Michal Brygidyn_CloudHackingScenarios.pdfMichal Brygidyn_CloudHackingScenarios.pdf
Michal Brygidyn_CloudHackingScenarios.pdf
 
Kamil Kolodziejski_Structura-AWS.pptx
Kamil Kolodziejski_Structura-AWS.pptxKamil Kolodziejski_Structura-AWS.pptx
Kamil Kolodziejski_Structura-AWS.pptx
 
John Merline AWS Certification FAQ.pptx
John Merline AWS Certification FAQ.pptxJohn Merline AWS Certification FAQ.pptx
John Merline AWS Certification FAQ.pptx
 
JuliaFMorgado_Breaking_bad_habits.pptx
JuliaFMorgado_Breaking_bad_habits.pptxJuliaFMorgado_Breaking_bad_habits.pptx
JuliaFMorgado_Breaking_bad_habits.pptx
 

Recently uploaded

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

Strategies for Near Real-Time Analytics and Interactive Data Exploration

  • 1. A Presenta*on Strategies for suppor.ng near real .me analy.cs, OLAP, and interac.ve data explora.on Dr. Jeremy Engle Engineering Manager Data Team The Jellyvision Lab © 2017 The Jellyvision Lab, Inc. All rights reserved. NOTICE: We have put a lot of work into this presenta.on. Please don’t take this material and put it into your own material without first talking with us. Seriously.
  • 2. Fancy Title…WTF are you talking about? •  How Jellyvision does the data •  What we have done/learned •  Strategies for effec.ve design •  What technologies helped us •  Best outcomes •  You learned something •  My pics/gifs amused you
  • 3. A bit about Jellyvision •  Interac.ve soZware that talks people through important, complex, and poten.ally snooze-inducing life decisions. •  Our recipe: behavioral science, purposeful humor, mighty tech, and oregano. •  850 companies with more than 15 million employees in total – including 91 of the Fortune 500 and 15 of the country's 50 largest companies. •  A B2B2C company •  Our customers are corpora.ons •  Our users are their employees
  • 4. What does the data team do? •  Pipeline = Collec.ng/Processing/Abstrac.ng •  Visualiza.on = Applica.ons •  Stewards = •  Data valida.on/Governance •  Ad hoc repor.ng •  Internal dashboards •  100s million events per year •  Will eclipse 1TB collec.ve data this year •  Peak traffic 70x from low traffic
  • 5. What does our use of data empower? •  Customer KPIs •  Usage •  Sa.sfac.on •  Feedback •  Customers can pick any .me range at day granularity •  Support 8+ products/versions •  Analysis of user behavior to determine effec.veness of new and legacy content (more on this later) •  Monitoring for customer success •  One off data requests •  Specific customer data points •  Customers that are configured like XX
  • 6. What is our soZware like? •  Custom built pipeline •  Near real .me KPI aggregator •  CouchDB and “Data Movers” •  Pipeline for each of 8+ products/varia.ons •  Migrate to data warehouse with Kinesis Firehose+Lambda (Snowflake) •  Ruby and Python Web apps •  Pipeline populates rela.onal DB with aggregated metrics •  Based on Command Query Responsibility Segrega.on Parern •  hrps://mar.nfowler.com/bliki/CQRS.html
  • 7. What .me granulari.es we support •  Processing •  Near real .me (des.na.on in minutes) •  Batch (des.na.on in hours to days) •  Data Access •  Web applica.on (Response in < 500ms) •  hrps://research.googleblog.com/2009/06/speed-marers.html •  Interac.ve OLAP (Response in ~2sec) •  OLAP Querying/Repor.ng (Response in seconds to minutes)
  • 8. How to build a stack
  • 9. Modularize your stack •  Collec.on •  When/What data fires •  APIs to receive data •  Processing/Movement •  EL/ETL/Streaming •  Aggrega.on •  Storage/Access •  Usage/Visualiza.on/Analysis
  • 10. Strategy: Buy versus Build •  If you can buy it….do that •  Massive change in last 3 years •  Turn key analy.cs •  Reduce/eliminate need for engineers •  Infrastructure •  Engineers go faster but need to be able to integrate mul.ple 3rd par.es
  • 11. Technologies for Data •  Movement •  Kinesis, SQS, DB Migra.on Service, Kawa •  Processing •  EMR, Kinesis Analy.cs, Spark/Flink/Storm •  Workflow/Orchestra.on •  Glue, Data Pipeline, Azkaban, Airflow, Luigi •  Storage/Access (Many include some form of processing) •  DynamoDB, S3, RedshiZ/Spectrum, Snowflake, The whole NoSQL ecosystem
  • 13. Data Warehouse versus Data Lake h-p://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html
  • 14. Data Lakehouse (Snowflake) •  Support schemaless AND structure at physical layer •  Create structure where desirable •  Make code/queries simpler •  Make columnar queries more preformat •  Crea.ng structure does not require massive data maintenance event •  Flexibility!!!!!!!
  • 15. How we are using a Data Lakehouse •  Custom BI Tool for exploring user behavior •  Example Query: Find users that answered ques.on A with 1 and ques.on B with 2 •  Two approaches: •  Each addi.onal constraint requires a self- join with ques.ons •  For each session, build a map of Ques.on/ Answer pairs
  • 16. Data Lakehouse: Itera.ve Development •  Views create abstrac.ons •  Encapsulate common business logic •  Abstract away table/database structure •  Structure(s) under the view can itera.vely change •  Performance can be improved itera.vely •  Make a smaller lake •  Covert to columnar physical model •  Leverage cluster/sort keys
  • 18. Define your processing •  Immutable data simplifies everything •  Define for each unit of data: •  Aggrega.ons •  Roll ups •  Required latency •  Define how aggrega.on and usage logic intersect with movement and storage systems
  • 19. Event driven data processing •  Use non-temporal triggers to facilitate flow of data •  Difference in polling .me (think Kinesis Firehose) creates the dis.nc.on between near-real .me and batch processing •  Can also be designed to support real .me analy.cs •  Especially useful for system that have varied traffic levels
  • 20. Mutable data processing •  Data is dirty, some.mes to the point that it must be fixed •  Applying logic to mul.ple systems is .me consuming and fragile, with streaming systems it is especially problema.c •  Big data systems make it feasible to store full historical data •  Source of truth should be upstream of all transforma.on •  Changing the source of truth cascades to all downstream storage •  Downstream aggrega.ons should be done AND triggered in a way to facilitate updates. i.e. store at session level AND daily rollup •  Handling deletes means aggrega.ons can be triggered independent of create/update
  • 21. All Done! Ques.ons on how we formed the data team, ques.ons on what we do, technologies we use, HIPAA, or anything else just come find me or contact me.
  • 22. Thank you! Jeremy Engle Manager Data Team jengle@jellyvision.com