Cloud and Analytics -- 2020 sparksummit
1. Cloud and Analytics
– from Platforms to an Ecosystem
Ming Yuan, Zurich North America
David Carlson, Databricks
2. Agenda
▪ Data and Analytics at ZNA
▪ Data and Metadata
▪ Data Exploration and ETL
▪ Containerization
▪ DevOps in Analytics
3. Zurich is a data-enabled innovative company
• Data is used in day-to-day decision-making in key
business domains
• A strong data science team delivers predictive models and
business insights
• We are an early adopter of advanced analytics and cloud
analytics
Multiple Databases
On-premises Data Warehouse
Hadoop Data Lake
Cloud Data Lake
• Governance processes on data access and utilization are established
• Metadata is collected and stored in the repository system
4. Key capabilities support data analytics life cycle
• Data Discovery
• Data Integration
• Collaboration
• Business Impact (Operationalization)
• Scalability
• Multiple Personas
• Support multiple types of implementations
Life cycle stages: Ideation, Model Build, Model Deployment, Model Execution, Model Monitoring
5. Data foundation and processing power
▪ Support ML and advanced analysis to discover business insights and drive
appropriate actions
▪ Enable cross-domain data sharing, aggregation, and integration
▪ Modernize the technical landscape to handle data sets that were previously
unprocessable
Data
▪ Optimize data processing and archiving strategies to
reduce operation costs
▪ Apply data governance best practices to manage
utilization
6. Data lake consists of ADLS and Databricks® clusters
Diagram labels: Data Sources, Provisioning Store, Curation Layer, Data Consumption, Azure Subscription, Services
Provisioning Store
▪ Landing: Change Data Capture (CDC) or full snapshots
▪ Staging: Enrich Landing zone data with additional date format fields and remove special characters
▪ Active: CDC records (I, U, D) applied to a copy of the previous day's data
▪ Archive: Rolling pointers to previous day's Active…
Curation Layer
▪ Universal Data Model: Enterprise-level curated datasets covering broad utilization
▪ Curated Data Sets: Pertaining to the needs of a specific business domain
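The Active zone step, where CDC records (I, U, D) are applied to a copy of the previous day's data, could be sketched roughly as follows. This is only an illustration: the record layout `(operation, key, row)`, the in-memory dictionaries, and the function name `apply_cdc` are assumptions made for the example, not the actual Databricks implementation described in the talk.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]
# Hypothetical CDC record layout: (operation, primary key, changed fields)
CdcRecord = Tuple[str, str, Row]


def apply_cdc(previous: Dict[str, Row], changes: Iterable[CdcRecord]) -> Dict[str, Row]:
    """Build today's snapshot from a copy of yesterday's data plus CDC records."""
    # Work on a copy so the previous day's snapshot stays intact.
    current = {key: dict(row) for key, row in previous.items()}
    for op, key, row in changes:
        if op == "I":      # insert a new row
            current[key] = dict(row)
        elif op == "U":    # update: merge changed fields into the existing row
            current[key] = {**current.get(key, {}), **row}
        elif op == "D":    # delete the row if it exists
            current.pop(key, None)
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return current


# Made-up sample data for illustration only.
yesterday = {"p1": {"premium": 100}, "p2": {"premium": 250}}
changes = [
    ("U", "p1", {"premium": 120}),
    ("D", "p2", {}),
    ("I", "p3", {"premium": 75}),
]
today = apply_cdc(yesterday, changes)
```

Because the function never mutates its input, yesterday's snapshot remains available, which is in the spirit of keeping the previous day's Active data reachable from the Archive zone.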
7. Metadata management and data discovery
▪ For metadata administrators
▪ Maintain business glossary for data domains that are owned by function or business units
▪ Import technical metadata and catalog it as data assets
▪ Curate technical metadata by relating it to logical business terms
▪ Maintain data flows that map transformations
▪ For data consumers
▪ Search, explore and discover data assets and data lineage
▪ Interpret data with correct meaning and context
▪ Navigate data flows to analyze processes and assess change impact
▪ Evaluate data quality reports and drive improvement actions
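As a rough illustration of assessing change impact by navigating data flows, lineage can be treated as a directed graph and walked downstream from the asset that changes. The asset names and the `downstream_assets` helper below are hypothetical; a catalog product would expose lineage through its own interfaces, which are not shown here.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_assets(flows: Dict[str, List[str]], changed: str) -> Set[str]:
    """Return every asset reachable downstream of `changed` (breadth-first walk)."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for target in flows.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # The changed asset itself is not counted as impacted.
    return seen - {changed}


# Hypothetical data flows: source asset -> assets derived from it.
flows = {
    "claims_db.claims": ["lake.staging.claims"],
    "lake.staging.claims": ["lake.active.claims"],
    "lake.active.claims": ["curated.loss_ratio", "curated.claim_features"],
    "policy_db.policies": ["curated.loss_ratio"],
}
impacted = downstream_assets(flows, "lake.staging.claims")
```

Tracking visited nodes keeps the walk finite even if the recorded flows contain a cycle.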
8. Alation® Data Catalog manages metadata
ingestion
Database
Data Warehouse
Cloud Data Lake
JSON Streams
Ingest and refresh schema, table, and column definitions
Build data lineage, popularity, common queries, and more
Profile and store sample data sets
Collect user information and usage metrics
Open APIs to programmatically import business glossaries
2,053,632
9. Intuitive user interfaces to access metadata
▪ Users and Stewards actively curate the pages
▪ Natural-language search to easily discover unknowns
▪ Everyone collaborates and communicates
▪ Query intelligently against source systems
10. Data exploration and ETL implementations
▪ Explore, validate, and analyze existing data sets
▪ Curate new data sets for model development
▪ Construct ETL flows with embedded AI/modeling components
▪ Release ETL flows to production environment
▪ Provide runtime environments to trigger, manage, and monitor ETL flows in
production
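An ETL flow with an embedded modeling component can be pictured as a chain of steps in which one step scores records. The sketch below is a plain-Python illustration under stated assumptions: the field names, the `validate` and `score` steps, and the stand-in scoring rule are all hypothetical, and it does not represent how Dataiku or any other tool builds flows.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Step = Callable[[List[Record]], List[Record]]


def validate(records: List[Record]) -> List[Record]:
    """Keep only records with a non-negative claim amount."""
    return [r for r in records if r.get("claim_amount", -1.0) >= 0]


def score(records: List[Record]) -> List[Record]:
    """Embedded modeling step: a stand-in rule in place of a trained model."""
    return [
        {**r, "risk_score": min(1.0, r["claim_amount"] / 10000.0)}
        for r in records
    ]


def run_flow(records: List[Record], steps: List[Step]) -> List[Record]:
    """Run each step in order, feeding its output to the next step."""
    for step in steps:
        records = step(records)
    return records


# Made-up input records for illustration only.
raw = [{"claim_amount": 2500.0}, {"claim_amount": -5.0}, {"claim_amount": 20000.0}]
curated = run_flow(raw, [validate, score])
```

Keeping each step a plain function makes it straightforward to test a step in isolation before the flow is released to production.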
11. Leverage technical stack and skills across Personas
▪ Linux server on Azure Cloud
▪ Centralized or ad-hoc data sources, data lake
▪ Available or spun-up processing resources
▪ Leveraging best storage and compute resources
▪ Dataiku deployment servers for enterprise-grade operationalization
▪ Production systems
▪ Centralized server to facilitate access to data and foster collaboration
▪ Browser-based user interfaces
▪ User/task-specific interaction modes
▪ Integration with metadata system
12. Containerization in building model API services
▪ Standardize the runtime environment using commonly used ML libraries for
development and production
▪ Elastically scale the system capacity for the development environment
▪ Easily migrate system stacks from development environment to production
▪ Build CI/CD pipelines and deployment environments based on
open standards
▪ Monitor and ensure the health of model implementations in
production
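A model API service of the kind that gets packaged into a container might look roughly like the sketch below, which uses only the Python standard library. The `/health` and `/predict` paths, the placeholder `predict` function, and the port are assumptions chosen for illustration; the talk does not specify the actual service design or framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def predict(features: dict) -> float:
    """Placeholder model (hypothetical): a fixed linear score on feature 'x'."""
    return 0.5 * float(features.get("x", 0.0)) + 1.0


def route(method: str, path: str, body: bytes) -> Tuple[int, dict]:
    """Map a request to (HTTP status, JSON payload), independent of the server."""
    if method == "GET" and path == "/health":
        # Health endpoint that an orchestrator could poll.
        return 200, {"status": "ok"}
    if method == "POST" and path == "/predict":
        try:
            features = json.loads(body or b"{}")
            return 200, {"prediction": predict(features)}
        except (ValueError, TypeError, AttributeError):
            return 400, {"error": "invalid request body"}
    return 404, {"error": "not found"}


class ModelHandler(BaseHTTPRequestHandler):
    def _respond(self, method: str) -> None:
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else b""
        status, payload = route(method, self.path, body)
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        self._respond("GET")

    def do_POST(self) -> None:
        self._respond("POST")


def serve(port: int = 8080) -> None:
    """Start the service; a container entrypoint would call this."""
    HTTPServer(("0.0.0.0", port), ModelHandler).serve_forever()
```

Separating `route` from the HTTP handler lets the same logic be exercised in a CI/CD pipeline without starting a server, while `serve()` is what a container image would run in development and production alike.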
13. Containerize models as cloud-native applications
Client App
Client App
Orchestration
We observed improved agility in development, more portability in deployment, and better elasticity in production
14. DevOps in data & analytics
▪ For platform administrators
▪ Codify the installation and configuration of key components in the ecosystem
▪ Streamline the process of testing and upgrading systems to newer versions
▪ Automate system backup and restoration
▪ For model services developers
▪ Standardize the deployment pipelines to reduce the effort per project
▪ Increase the agility of deploying applications from development to production
▪ Reduce the time to fix bugs after production releases
16. Analytical platforms fitting into different
scenarios are integrated as an ecosystem
Ideation, Model Build, Model Deployment, Model Execution, Model Monitoring
18. Zurich Insurance Group (Zurich), headquartered and founded in Switzerland, is a leading multi-
line insurance group with more than 140 years’ experience serving businesses worldwide,
including over 100 years in North America. We are committed to delivering broad and flexible
insurance solutions to our customers and helping them understand, manage and minimize risk.
Through member companies in North America, Zurich is a leading commercial property-casualty
insurance provider serving small businesses, mid-sized and large companies, including
multinational corporations.
Approximately 55,000 employees
Managing complex risks for 7,600 international programs through our global network
Achieving USD 5.3 billion in business operating profit (BOP) in 2019
Providing comprehensive solutions and insights for 25 industries
Insuring more than 215,500 customers
Insuring more than 90 percent of the Fortune 500
The Alation Data Catalog and its logo are used with kind permission of Alation, Inc.
The Dataiku DSS and its logo are used with kind permission of Dataiku, Inc.
The Domino Data Lab and its logo are used with kind permission of Domino Data Lab, Inc.
Use of them does not endorse the products.