Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit TurboTax Case Study
1. Ravi Pillala, Chief Data Architect & Distinguished Engineer
Modernizing Analytics & AI for today’s needs:
Intuit TurboTax Case Study
7/21/2022
4. ©2021 Intuit Inc. All rights reserved. 4
Unique consumer and small business assets at scale
5. RETURNING TURBOTAX CUSTOMER: Liam
● Married 2 years ago. Last year he claimed his daughter, Candace, as a dependent; this year his ex-wife will claim their daughter.
● Recently left his job at Toyota to work for Honda
● Had been renting, but just bought a condo
Goal: To be confident he can file easily with TurboTax, given all the changes in his life.
6. First-time filers
Goal: To be confident she can file easily with TurboTax to get the maximum refund possible.
Liam
8. Powering Prosperity with AI and Data-driven platforms
10. Key areas to focus
Behavioral Analytics
● Event Collection Standards
● Customer intents
● Personalization
Analytics at Scale
● Analytics tech stack
● Separate storage & processing
● Real-time analytics
Data Discovery/Understanding
● Data Documentation
● Tools to explore data
● Data Stewardship
● Centralized Governance
● Data Lineage
12. From: Behavior Analytics - Event Collection
13. Legacy Clickstream Architecture
Data available for consumption after 4 hours to 1 day
14. Legacy Payload
fid : 75F773438B1D0E25-3DDB5C9586B1731B
cc : USD
ch : support
c1 : TT_S_SQ_COOKIE
c2 : 1588699149406
c4 : fecb4198593190599779
c5 : Customer Care
c6 : sh-view
c7 : Help System<mytt
c14 : View>LCQ>4331716>>2>IL
c19 : ViewWidget
c34 : en-US
c36 : websdk-prod
c44 : HPArticle<MYTT:undefined<expert_approved_ugc:false
v3 : display:viewWidget
pageName : MYTT/sh-view
v47 : https://ttlc.intuit.com/questions/4331716
WHERE? WHAT? WHO?
Unreadable and Difficult to Use
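To make the readability problem concrete, here is a minimal sketch that parses a few lines of the payload above. The `key : value` layout is taken from the slide; the function name `parse_payload` and the rule used to flag opaque keys are illustrative assumptions, not part of Intuit's tooling.

```python
# A few lines copied from the legacy payload on this slide.
LEGACY_PAYLOAD = """\
cc : USD
ch : support
c5 : Customer Care
c6 : sh-view
c19 : ViewWidget
v3 : display:viewWidget
pageName : MYTT/sh-view
v47 : https://ttlc.intuit.com/questions/4331716
"""


def parse_payload(text):
    """Split each 'key : value' line on the first ' : ' separator."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" : ")
        fields[key.strip()] = value.strip()
    return fields


payload = parse_payload(LEGACY_PAYLOAD)

# Keys such as c5, c6 or v3 are numbered slots (assumed rule: a 'c' or 'v'
# followed only by digits). Nothing in the key says whether the value
# describes who, what, or where.
opaque_keys = sorted(k for k in payload if k[0] in "cv" and k[1:].isdigit())

print(opaque_keys)
```

Most of the sampled keys are numbered slots, so a reader needs an external lookup table before answering even simple WHO/WHAT/WHERE questions.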
15. To: Behavior Analytics - Event Collection
Amplitude
Adobe
Braze
16. Rainbow Properties
● What (logical): action, object, object_detail
● What (behavioral): ui_action, ui_object, ui_object_detail, ui_access_point
● Where: Domain (org, purpose, scope), screen, scope_area
● Who: ivid, pseudonym_id
17. Event Collection Standards (ECS) - Standard Event Tracking Example
WHO
WHAT
org : cg
purpose : prod
scope : turbotax
event sender name : oihs/contact-us-plugin/widget
event sender purpose : care
event sender scope : contactus
event sender screen : questionStep
event : content : engaged
object : content
action : engaged
search term : I haven't received my refund yet and I need to know what's the problem.
ui action : clicked
ui object : button
ui object detail : Continue
workflow id : 7fa8d4d6-6fb5-41c0-b2d5-971742227b6c
topic name : cg-turbotax-clickstream
timestamp : 2020-05-04T06:27:41.799Z
userId : 20abd451b935d4c27ad417a258f15ccba
*** This example only includes a specific subset of attributes ***
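As a rough illustration of how the standardized attributes answer WHO, WHAT and WHERE directly, the sketch below regroups values from the example above into a nested dictionary. The grouping keys (`where`, `what_logical`, `what_behavioral`, `who`), the placement of `userId` under `who`, and the helper functions are assumptions made for illustration; the slide only shows the flat attribute list.

```python
# Values copied from the ECS example; the nesting is a hypothetical layout.
ecs_event = {
    "where": {
        "org": "cg",
        "purpose": "prod",
        "scope": "turbotax",
        "screen": "questionStep",  # from "event sender screen"
    },
    "what_logical": {"object": "content", "action": "engaged"},
    "what_behavioral": {
        "ui_action": "clicked",
        "ui_object": "button",
        "ui_object_detail": "Continue",
    },
    "who": {"user_id": "20abd451b935d4c27ad417a258f15ccba"},
}


def event_name(event):
    """Compose the event name from object and action, as in 'content : engaged'."""
    what = event["what_logical"]
    return f"{what['object']} : {what['action']}"


def describe(event):
    """Build a readable sentence from the behavioral and location attributes."""
    where, ui, who = event["where"], event["what_behavioral"], event["who"]
    return (
        f"user {who['user_id']} {ui['ui_action']} {ui['ui_object']} "
        f"'{ui['ui_object_detail']}' on {where['scope']}/{where['screen']}"
    )


print(event_name(ecs_event))
print(describe(ecs_event))
```

Unlike the legacy payload, every attribute name here states what it holds, so the description can be assembled without a lookup table.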
18. Key areas to focus: Behavioral Analytics, Analytics at Scale, Data Discovery/Understanding
19. Intuit analytics journey before modernization
Reporting silos → MPP appliance → Hadoop data lake → New MPP appliance → Migrated to Cloud
20. Lift and Shift to AWS
Diagram labels:
● Data Sources: Applications, Behavioral, 3rd Party
● MPP
● Data Lake: Hive Metastore, Data
● EC2, EBS, …
● EMR Cluster
● Processing: Batch, Stream
● Data Workers
Tables: 50K, Data: 2.5PB, ETLs: 10K, Queries: 500K, Users: 2000, ETL Users: 60
21. Phase 2: Migrating to Redshift (Modernizing analytics)
Diagram labels:
● Data Sources: Applications, Behavioral, 3rd Party
● Data Lake: Hive Metastore, Data
● ETL Processing
● EMR Cluster: Batch, Stream
● AWS Glue
● Redshift ETL
● Athena
● Redshift Reporting
● Dashboards
● Data Workers
Tables: 10K, Data: 400TB, ETLs: 3K, Queries: 130K, Users: 2000, ETL Users: N/A
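If the footers of the lift-and-shift slide and the Phase 2 slide are read as before and after figures for the same platform (the slides do not state this explicitly), the relative reductions can be worked out as below. The sketch assumes decimal units (2.5PB = 2,500TB); the dictionary keys are illustrative names.

```python
# Footer figures from the two slides; data volume expressed in TB,
# assuming 1 PB = 1000 TB.
before = {"tables": 50_000, "data_tb": 2_500, "etls": 10_000, "queries": 500_000}
after = {"tables": 10_000, "data_tb": 400, "etls": 3_000, "queries": 130_000}


def reduction_pct(old, new):
    """Percentage reduction from old to new, rounded to one decimal place."""
    return round(100 * (old - new) / old, 1)


reductions = {name: reduction_pct(before[name], after[name]) for name in before}

for name, pct in reductions.items():
    print(f"{name}: {pct}% smaller")
```

Under that reading, tables drop by 80%, data volume by 84%, ETL jobs by 70% and queries by 74%, while the user count stays at 2000.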
22. Modernized analytics platform with Redshift
● Amazon Redshift managed storage
● Data sharing
● Amazon Redshift Spectrum
● Concurrency scaling
● Elasticity
23. Key areas to focus: Behavioral Analytics, Analytics at Scale, Data Discovery/Understanding
24. Processors and Pipelines
● Serial processors (e.g., reusable intermediate topic)
● Parallel processors (e.g., fleet deployment)
● Processor = Business Logic & Code
● Pipeline = Deployment & Infrastructure
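The split between processors and pipelines can be sketched as follows. This is a simplified, in-process illustration only: the slide's reference to a reusable intermediate topic suggests the real processors exchange data over streaming topics, which is replaced here by plain function chaining, and the names `normalize`, `tag_intent` and `Pipeline` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]
Processor = Callable[[Event], Event]  # a processor holds business logic only


def normalize(event: Event) -> Event:
    """Processor 1: lower-case the UI action."""
    return {**event, "ui_action": event["ui_action"].lower()}


def tag_intent(event: Event) -> Event:
    """Processor 2: derive a toy intent label from the normalized action."""
    intent = "engaged" if event["ui_action"] == "clicked" else "browsing"
    return {**event, "intent": intent}


@dataclass
class Pipeline:
    """Deployment concerns: which processors run, and with how many workers."""

    processors: List[Processor]
    parallelism: int = 1  # size of the parallel worker "fleet"

    def run(self, events: List[Event]) -> List[Event]:
        def apply_all(event: Event) -> Event:
            for processor in self.processors:  # serial chain of processors
                event = processor(event)
            return event

        # Events are spread across workers; map() keeps the input order.
        with ThreadPoolExecutor(max_workers=self.parallelism) as pool:
            return list(pool.map(apply_all, events))


pipeline = Pipeline(processors=[normalize, tag_intent], parallelism=2)
results = pipeline.run([{"ui_action": "Clicked"}, {"ui_action": "VIEWED"}])
print(results)
```

Changing `parallelism` or the processor list alters how the work is deployed without touching the business logic inside each processor, which is the separation the bullets describe.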
25. Tech Stack Overview
Layers:
● Processor CI/CD Layer
● UX Layer
● Control Layer
● Runtime Layer
● Infrastructure Layer
● Application Layer
● Pipeline CI/CD Layer
Groupings: Customer Experience, Behind-the-scenes
26. Key areas to focus: Behavioral Analytics, Analytics at Scale, Data Discovery/Understanding
27. Our Data Ecosystem is big, complex and messy...
28. Our Data Ecosystem is big, complex and messy...
We have a lot of data, which is great, but it is very hard to discover data and figure out what to use
● 200,000+ Tables
● 3,000+ Schemas
● 200+ Data Sources
Diagram labels: DATA SOURCES; Internal; External/3P; DATA LAKE; RAW DATA; CURATED DATA; ANALYST PROCESSED; DATA WAREHOUSE(S); SELECT RAW DATA; DATA MARTS; REPORTING TABLES & more
Pradeep
29. Our Users
What are the Core Personas and why is data important to them?
● I am a DATA SCIENTIST building ML models and often use data produced by BU/FG Analysts. I would like to know the owner, data quality and reliability of the data I want to use.
● I am a BU DEVELOPER trying to see if data produced by a newly launched service is being ingested accurately into the lake for downstream consumption.
● I am a DATA ENGINEER building pipelines for data marts and trying to choose the right data for my use case and get alerted when metadata changes occur so I can ensure my pipelines continue to work properly.
● I am a BUSINESS ANALYST trying to build dashboards to report on KPIs for a newly launched product feature. I need to find data that I can trust and use for my analysis.
● I am an ENTITY DATA STEWARD curating Data Map entities in my domain for downstream use. I need to query the raw data to produce the entities.
Veena
30. Understanding user problems we need to solve
What is making data discovery and exploration hard for our data workers?
Where can I find the data?
What does the data mean?
Can I trust the data?
How is the data connected?
How can I get access to data?
Which datasource to use?
When to use what tool?
Why are my queries slow?
DISCOVERY EXPLORATION
31. Ideal State
What do users need for a great Data Discovery & Exploration experience?
A tool that helps our data workers to
● easily find relevant data that is well documented, reliable and trustworthy, with quality metrics such as data freshness and completeness, the ability to quickly reach out to the owner for clarifications, and suggestions of similar data and joins to solve the use case
● seamlessly request access, run queries against blazing-fast, performant engines, and reuse and share their work
Veena
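The quality metrics named above, freshness and completeness, can be computed in many ways; the slides do not say how Intuit's tool defines them. A minimal sketch under assumed definitions (completeness as the share of rows with a non-null value, freshness as the time since the last update), with hypothetical function names and sample rows:

```python
from datetime import datetime, timezone


def completeness(rows, column):
    """Fraction of rows in which `column` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def freshness(last_updated, now):
    """Time elapsed since the table was last updated."""
    return now - last_updated


# Hypothetical sample: four rows, two with a usable 'refund' value.
rows = [{"refund": 100}, {"refund": None}, {"refund": 250}, {}]
refund_completeness = completeness(rows, "refund")

last_load = datetime(2022, 7, 21, 8, 0, tzinfo=timezone.utc)
checked_at = datetime(2022, 7, 21, 12, 0, tzinfo=timezone.utc)
table_age = freshness(last_load, checked_at)

print(refund_completeness, table_age)
```

Surfacing numbers like these next to each table in a catalog is one way to help users answer "Can I trust the data?" before they start querying.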
33. OUR APPROACH
● Data Map: Organize and govern data across Intuit
● Data Discovery: Build a rich data discovery (catalog) experience for all our data in the lake & warehouses
● Data Exploration: Buy a superior data exploration tool for all our data
- powered by MDR
36. Key areas to focus: Behavioral Analytics, Analytics at Scale, Data Discovery/Understanding
38. AWS Glue/ Lake Formation: Data Lake Design
39. Data Lake & Data Mesh
Tax
Work
Commerce
Finance
41. Intuit’s Journey
Eras: DOS (1980s), Windows (1990s), Web (2000s), Mobile and Cloud (2010s), Artificial Intelligence (2020 to Present*)
Axis label: Data volume per customer
Milestones:
● Intuit Founded
● Customers: 1.3M, Revenue: $33M, Digital Footprint: MBs
● Customers: 5.6M, Revenue: $1B, Digital Footprint: GBs
● Customers: 29M, Revenue: $3.5B, Digital Footprint: TBs
● Customers: 102M, Revenue: $9.6B, Digital Footprint: PBs
● 2019: Analytical Platform on AWS
● 2021: Analytics powered by Redshift