Businesses are building digital platforms on modern architecture principles such as domain-driven design, microservices, and event-driven architecture. These platforms are becoming ever more modular, flexible, and complex.
Although they are built on principles such as loose coupling, independent scaling, and plug-and-play components, and must respect regulations and security considerations on data, their complexity leaves many unknowns and grey areas across the architecture. Details of how the different components of this complex architecture interact with each other are lost. Generating insights becomes a multi-team, multi-stage, and hence multi-day activity.
Multiple users and stakeholders of the platform want different, timely insights to take both corrective and preventive actions. Business teams want to know how the business is doing in every corner of the country in near real time, at zip-code granularity. Tech teams want to correlate flow changes with system health, including downstream stability, as it happens. Knowing these details also helps provide feedback to the platform itself, to make it more efficient, and to the underlying business process.
In this talk we intend to share how we made all the business and technical insights of a complicated platform available in real time, with limited incremental effort and constant validation of the ideas and slices with business teams. Since the client was a bank, we will also touch on handling financial data securely while still enabling insights for a large group of stakeholders.
We kept the self-service aspect at the center of our solution, to accommodate a growing number of components in the source platform, evolving requirements, and even new platforms altogether. Configurability and scalability were key here: it was important that all the data collected from the source platform was discoverable and presentable. This also led to evolving the solution along the lines of domain data products, where the data is generated and consumed by those who understand it best.
2. Who we are
Balvinder Khurana
Data Architect and Global Data Community Lead
Sushant Joshi
Product Principal
@sushantjoshi
https://sushant-joshi.medium.com
Balvinder has 14+ years of experience in building large-scale custom software and big data platform solutions for complicated client problems. She has extensive experience in analysis, design, architecture, and development of web-based enterprise systems and analytical systems using Agile practices like Scrum and XP. Balvinder currently works as a Data Architect and Global Data Community Lead for Thoughtworks.
Sushant is a Product Principal at ThoughtWorks. His work includes working with clients to assess product-market fit, create goal-aligned roadmaps, and deliver products. He brings his ever-curious mindset, business knowledge, and interdisciplinary thinking to solve problems that form our surroundings. His primary focus area is product discovery, through which he helps address key product risks in the early stages of the product.
Sushant is passionate about Indian digital ecosystems. He is working with Indian companies to create better products.
6. What are we trying to achieve
Actionable Insights through Real Time, Self Serviced Information of everything that’s happening on the platform
Few examples:
● How much amount was disbursed yesterday in Mumbai?
● How many car loans were sanctioned this week?
● What’s the health of APIs and underlying systems for last 3 hours?
Funnel stages: Login, Offers, Risk Checks, Sanction, 2FA, Disbursement
Questions along the funnel:
● What are the reasons for drop offs? Is it system or user?
● Are my APIs overloaded?
● Average time for disbursement
● Opportunity loss in the Funnel at different stages
● Which offers are attractive?
● How many customers are moving further after viewing offers
● How long it takes to send the OTP and customer action?
● Do users need alternate mechanism?
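The funnel questions on this slide (drop-offs and opportunity loss at each stage) come down to counting how many customers reach each stage and comparing neighbouring stages. The sketch below is only an illustration under assumptions: the event shape `(customer_id, stage)`, the function name `funnel_report`, and the sample data are hypothetical and do not come from the platform described in the talk.

```python
# Funnel stages in the order shown on the slide.
STAGES = ["Login", "Offers", "Risk Checks", "Sanction", "2FA", "Disbursement"]


def funnel_report(events):
    """Return (stage, customers_reached, drop_off_vs_previous_stage) per stage.

    `events` is an iterable of (customer_id, stage) pairs; this shape is an
    assumption made for the example.
    """
    reached = {stage: set() for stage in STAGES}
    for customer_id, stage in events:
        if stage in reached:
            reached[stage].add(customer_id)  # count each customer once per stage

    report = []
    previous = None
    for stage in STAGES:
        count = len(reached[stage])
        if previous is None or previous == 0:
            drop_off = 0.0  # first stage, or nobody reached the previous one
        else:
            drop_off = 1 - count / previous
        report.append((stage, count, round(drop_off, 2)))
        previous = count
    return report


# Made-up sample data: four customers log in, one gets a loan disbursed.
sample_events = [
    ("c1", "Login"), ("c2", "Login"), ("c3", "Login"), ("c4", "Login"),
    ("c1", "Offers"), ("c2", "Offers"), ("c3", "Offers"),
    ("c1", "Risk Checks"), ("c2", "Risk Checks"),
    ("c1", "Sanction"), ("c2", "Sanction"),
    ("c1", "2FA"),
    ("c1", "Disbursement"),
]

for stage, count, drop_off in funnel_report(sample_events):
    print(f"{stage:<13} reached={count} drop_off={drop_off:.0%}")
```

In a real-time setting the same counts would be maintained incrementally over an event stream rather than recomputed from a list; the batch form is used here only to keep the example short.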
7. Explosion of personas and explosion of requirements
Revenue Generated
Price Sensitivity
Number of Users
Customer micro-segment
Business
Deployment status
Load on a service
Downtime for Service
Developers
Service Availability
Service Traceability
Routing
Security team
Insights
What can I understand?
Data Scientists
Customer 360
Customer Propensity
Customer-product fit
Customer facing executives
8. Isolated Solutions
Data sources: Web application, Mobile application, User click stream, Social media, Market data, System level metrics, Customer support, Competitor data, Logs
● Developers/IT Support
○ Ask: Understand system health and avoid failures
○ Solution: Real time monitoring of all infrastructure components and service issues using Prometheus, with charts created on Grafana. Many other tools like EFK stack, Kafka etc. are used.
● Product Owners
○ Ask: No drop-outs, all journeys should be completed
○ Solution: Periodic data is provided (sometimes manually) after pulling it out of tools like Kafka, GTM in the form of Excel.
● Business/C-level execs
○ Ask: How is my business performing, where are the leakages
○ Solution: Data pulled out on-demand (manually) and shared via email/Excel sheets.
9. Limitations and pain points
Business
● Siloed
○ Systems
○ Tools
● Different
○ Targets
○ Maturity
○ Objectives
Data
● Siloed
○ Data (Storage)
○ Business Units
● Different
○ Granularity
○ Formats
○ Architectures
○ Exploration Scopes
Technology
● Different
○ Tech Stack
○ Architecture
○ Tools
13. The Superdata solution a.k.a Command center
What is it really?
Make it quick and easy to explore a hypothesis (business or technical), accept or disprove it, and move on to find the root cause.
Platform which enables people to use their skills, extend their senses, support their intuitions.
14. Principles of Super-data
From requests to predefined datasets serving insights & exploration
Traditionally, data is looked at for the weekly or monthly leadership review, so someone requests and presents what needs to be presented.
● Persona and business domains, wants/requirements
● Concept Validation
● Product mindset
● Design
#Data discoverability #Data presentability
● Holistic (Business + Tech)
● Port the same to other domains
15. Tech Solution
● Source Systems: Bank Applications Data, Tele Channels Data, Physical Store Data, Social Media Data, Partners data
● Data Ingestion & Integration: Batch Ingestion, Unstructured Source Ingestion, API Ingestion, Streaming Ingestion
● Orchestration Service
● Data Integration & ML: ELT, ETL, Stream Processing, ML Toolkits*, Deep Learning*
● Data Platform: Cloud DW, Data Lake (Cloud Object Storage), Data Marts, ODS
● Dashboards, Data Service APIs, Reports
● DevOps/DataOps
● Data Governance: Data Catalog, Data Quality, Security
● Business Events
*Future Scope
16. Tech Stack
[Diagram: the same components as the Tech Solution slide: Source Systems, Data Ingestion & Integration, Orchestration Service, Data Integration & ML, Data Platform, Dashboards, Data Service APIs, Reports, DevOps/DataOps, Data Governance, Business Events]
17. Serve Data as a Product
[Diagram: the same platform architecture, overlaid with domain boundaries]
Domain driven data boundaries: Auto Loan, Customer, Personal Loan, Credit Card, Social Media, Customer Profile
The boundaries cut across the platform - from source to consumption!
18. Data Product
Data Platform Architecture Quantum: fundamental unit of architecture
Self-Serve Data Product
Domain
● Polyglot Data Input Ports
● Polyglot Data Output Ports
● Control Ports: stats, logs, metrics, self discovery, management
Six dimensions of a data product:
● Discoverable
● Addressable
● Self-describing
● Trustworthy: defined and monitored SLOs
● Secure: enforce globally configured access control at each data product output port
● Interoperable: governed by global open standard
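To make the ports and dimensions on the Data Product slide more concrete, here is a minimal sketch of how a data product descriptor might look in code. Everything in it is an assumption made for illustration: the class names, the `dataproduct://` address scheme, the role and classification names, and the `GLOBAL_ACCESS_POLICY` table are hypothetical and are not taken from the solution described in the talk.

```python
from dataclasses import dataclass, field

# Hypothetical global policy: data classification -> roles allowed to read it.
GLOBAL_ACCESS_POLICY = {
    "internal": {"business", "developer", "data_scientist"},
    "financial": {"business", "data_scientist"},
}


@dataclass
class OutputPort:
    name: str
    fmt: str             # polyglot output, e.g. "sql", "events", "files"
    classification: str  # key into GLOBAL_ACCESS_POLICY


@dataclass
class DataProduct:
    domain: str
    name: str
    description: str                                  # self-describing
    slos: dict = field(default_factory=dict)          # trustworthy
    input_ports: list = field(default_factory=list)
    output_ports: dict = field(default_factory=dict)

    @property
    def address(self) -> str:
        # Addressable: one stable identifier per product (made-up scheme).
        return f"dataproduct://{self.domain}/{self.name}"

    def add_output_port(self, port: OutputPort) -> None:
        self.output_ports[port.name] = port

    def read(self, port_name: str, role: str) -> str:
        # Secure: the global policy is checked at the output port on each read.
        port = self.output_ports[port_name]
        if role not in GLOBAL_ACCESS_POLICY.get(port.classification, set()):
            raise PermissionError(f"{role} may not read {port.classification} data")
        return f"{self.address}/{port.name}"


class Catalog:
    """Discoverable: a registry that can be searched by description."""

    def __init__(self):
        self._products = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.address] = product

    def search(self, text: str) -> list:
        return [address for address, product in self._products.items()
                if text.lower() in product.description.lower()]


product = DataProduct(
    domain="auto-loan",
    name="disbursements",
    description="Amounts disbursed per city and day",
    slos={"freshness_minutes": 5},
    input_ports=["loan-platform-events"],
)
product.add_output_port(OutputPort("daily_summary", "sql", "financial"))

catalog = Catalog()
catalog.register(product)
print(catalog.search("disbursed"))
print(product.read("daily_summary", "business"))
```

Keeping the policy in one global table while checking it inside `read` mirrors the wording on the slide: access rules are configured once, but enforced at every output port.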
Sushant
Every executive likes information at their fingertips, in a form that helps them probe in real time and take decisions in time.
We hear this advice from everyone: know where you stand.
They look for actionable insights available in real time rather than monthly or periodic reports.
Sushant
Imagine you need to catch a flight: the first thing you want to know is how long it will take to reach the airport, because traffic is unpredictable.
They look for actionable insights available in real time rather than monthly or periodic reports.
Sushant & Balvinder (Techview of The Ask)
THE BANK and the operating environment
Pre-Covid days: digital lending is on everyone’s agenda / some are exploring, toying with the idea
Situation at the banking world
Engagement is a problem and Fintechs are vying for the pie
Payment infrastructure is coming of age / wallets
Customer engagement is coming at the center of the strategy
KYC Resulting into
KYC not for compliance but for acquiring, retaining and serving right
THE ASK
Define data strategy and roadmap for a data platform on cloud
Self-service data platform which can onboard multiple products and systems in future
To know the customer you need to know your systems well
CLIENT BACKGROUND
A leading bank in India that had embarked on an ambitious digital journey to bring all retail loans under one roof, provide a better customer experience, and eliminate waste in the process.
This was also the time when startups had started making significant inroads into banks’ business.
GOAL
Unified and clear view of customer actions
Self-service insights into customer and system behavior in real-time (at scale) to
Identify value pockets through behaviour based segments
Create business ecosystem to power real-time offers based on data insights
Provide a clear view of the business for timely actions and course corrections, to optimize identified metrics such as Risk and Account Profitability
Sushant
WHAT
Generalise.
We are directly jumping on the domain oriented
Sushant
WHO
Double click on objective through lenses of stakeholders
Balvinder will come on this slide
Balvinder
HOW
Because of all the limitations mentioned earlier, each stakeholder group started attempting to solve their problems individually
Organisation goals were not aligned, and individuals were opting for solutions that made sense to them and were feasible within the limited resources they had.
The intent (and hence the call to action), granularity, and scope of the information needed are different for each group. So not just the tools, but also the data that is consumed, is isolated.
For each group:
Stakeholder - ask - solution - data
Add data stakeholder group
Balvinder
But it was not easy to reach answers to the questions of each stakeholder group. There were many hurdles on the way.
Business
Siloed and fragmented systems
Different targets and no common agreed goals for building data world view
Varying maturity of business and tech orgs
Who is focussed on customer happiness?
Do we want 99.99% availability but still a pissed-off customer?
Data
Useful data siloed and locked into various tools, owned by business orgs operating in silos
Disparate data sources
Difficult to Correlate Quickly for Monitoring or finding Business relevance
Limited scope for exploration
Unified architectures (of consuming platform) are not possible
Technology
Competing or non-compatible tech stacks
Learn individual tools
Balvinder
Human/emotional dimension
What happens because of the frustration
Consolidated insights were still a problem
It needed coordination to make the information available, and then synthesis by someone who may not have the best understanding of how that information is collected
A patchy solution for one group of people still served only a limited section, so broad-based acceptance, and hence data availability, was a challenge
Sushant
What does a large enterprise look like
General picture
Thought process
Complexity of channels X complexity of Products
Each product has its own way of selling and operations
Customer segments also need specific handling, such as HNI, premium, priority sector etc.
Each product type takes its own shape in terms of strategy, risks and balance
Large enterprises have multi-speed departments, which dictates an inherent need for different systems and customised processes that suit each department.
This leads to each business group optimising people, tech and processes based on their goals
Which results in
Silos
Fragmented systems
Disparate data sources owned by business orgs (operating in silos)
Competing tech stacks // non compatible tech stacks
Different targets and no common agreed goals for building data world view
Varying maturity of business and tech orgs
Siloed attempts to solve the challenges made the crisis even bigger, as the big picture was missing
No one talked about it explicitly and what it means for the data solution
Sushant
Sushant & Balvinder
Sushant
Principles we considered while building a super-data solution
How did we conceptualize this
From the traditional mindset of requests and presentation to exploration
Requests had to be placed with the team serving the data; different setups would be obtained, followed by manual correlation between business and tech data sets.
This would take time, and would need institutional knowledge to work with and interpret the data, or multiple cycles
Cross-domain learning was impossible
Balvinder
In line with regulated environment
Scalability
Configurability
Self-service consumption
Security
Banking and regulated world
Financial data
Approach to security
Data Quality
Balvinder
Bootstrap the platform easily, scale to the sources and to the consumers, and keep incrementally materializing differentiating data-driven value
Decompose data products around domains and distribute the ownership. This is the same principle we have been applying in the web services world to create microservices.
Balvinder
It’s not just a dataset, but a data product
Explain and lead to definition - Six dimensions of data product
Balvinder
Data has a better Idea
Sushant
Quotes from the floor
Data comment came on Friday evening - the day Sindhu was playing her semifinal match at Tokyo Olympics
Sushant
Slide 1- 7 : 8-9 mins
Slide 8-10: 5-6 mins
Slide 11-14: 3-4 mins
Slide 15-20: 7 mins
Slide 21-22: 3 min
QnA : 15 mins