Journey to Azure
WHAT WE’VE LEARNED ALONG THE WAY
September 2020
© 2020 VisiQuate, Inc. All Rights Reserved.
Table of contents
• About VisiQuate
• Client Case Study and Table Setting
• Why? and Where? The idea behind the migration
• Initial architecture: what works, what doesn’t work
• Architecture evolution: v2, v3, …
• Summary: what we’ve learned, what would we do differently
VisiQuate
VisiQuate is an advanced analytics service that helps enterprises achieve peak business health.
AI and ML Powered full stack data platform
Powered by AI and machine learning, streaming
analytics from disparate data sources deliver leading
insights that alert the right users to problems and
opportunities
Value focused point solutions
Our cloud-based solutions target a 3-5x ROI within 12 months, are non-invasive to internal IT, and are supported with white-glove service by our SMEs, technical experts and account managers.
Velocity Consulting Services
Increase your speed to value and ROI with Velocity Data Fanatics as a Service offerings.
Case Study: Transformation at a multi-region, nationwide Health System
Starting point: data chaos, data silos at the regional and facility level, manual data collection, integration and reporting, lack of data governance.
Project Goals:
• Consolidated data lake infrastructure serving
  – Power users in Rev Cycle, Finance, Operations
    • Streamlined month-end close
    • Dynamic data mining and exploration
    • ML and data science initiatives
  – Automation serving data marts and consolidated system-wide reporting
    • Rev cycle
    • ED
    • Hospital operations
    • Finance / Decision support
Client Starting Point Architecture
[Diagram. Components shown: Data Sources (structured data, semi-structured data, RDBMS); SSIS package; ODS; EDW; MS SQL Server; SQL Server Agent; manual data upload; ad-hoc data requests; reporting / data distribution]
Issues with current architecture
• Difficult to scale - MS SQL can only scale up, not out
• Overprovisioning - you buy a box that stays idle during off-peak hours
• Rigid hardware footprint - hard to provision resources for a short period of time
• Schema on write only
• No separation of storage and compute
• Read/write concurrency
• Maintenance cost is relatively high
Client’s Big Idea
We can solve all our problems if we migrate everything to the cloud
Standard whitepaper architecture looks simple
[Diagram. Components shown: Data Sources (structured files, semi-structured files, RDBMS, logs, events); ingestion engine; real-time processing; batch processing; Hadoop cluster (Spark, Hive); BLOB storage; reporting engine (in-memory cubes, reports/dashboards, dynamic query generation); virtual private cloud]
This is how it looks in reality
Project Goals
• Flexible, scalable and secure infrastructure
• Ability to scale up and scale down
• Pay as you go
• Reduce maintenance
• Data Lake - schema on read model
• Getting ready for AI/ML data processing
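The "schema on read" goal can be illustrated with a minimal sketch. It assumes raw records are stored as JSON lines exactly as received, and that a schema (here a hypothetical `CLAIM_SCHEMA` with made-up field names) is applied only when the data is read. It is an illustration of the idea, not the project's actual code.

```python
import json

# Raw records land in the lake exactly as received; nothing is validated on write.
# The field names below are hypothetical.
RAW_LINES = [
    '{"claim_id": "1", "amount": "125.50", "facility": "A"}',
    '{"claim_id": "2", "amount": "80", "extra_field": "ignored"}',
]

# Hypothetical schema, applied only at read time.
CLAIM_SCHEMA = {"claim_id": int, "amount": float, "facility": str}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and cast each field to the type the reader asks for."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Missing fields become None instead of failing the load.
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


claims = read_with_schema(RAW_LINES, CLAIM_SCHEMA)
```

A different consumer could read the same raw lines with a different schema, which is the flexibility a schema-on-write warehouse does not offer.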
Why Azure
PROS
• Existing Microsoft products footprint: Office, SQL Server, etc.
• Azure AD – leverage Office 365 subscription for Azure setup
• Native MS SQL Server support – different options for migration
• Power BI visualization as a part of Office 365
• Scalability
• Pay-as-you-go model
• HIPAA compliant (for the majority of services)
CONS
• Stability issues due to rapid go-to-market strategy
• Not as many serverless services
• Open source is not a native technology
• Developer community is not as strong as alternatives
Migration strategies
Here is our interpretation of Microsoft’s scenarios for migrating to the cloud:
1. Rehost (aka lift and shift) – a migration scenario with no or minimal code change to move your data tier to the cloud. Select an option for your SQL Server and ETL jobs. Migrate manually or using an Azure migration tool.
Advantages: fastest and cheapest. Gets the most out of your existing investment.
Disadvantages: not every piece of your infrastructure can be migrated one-to-one. For example, in Azure SSIS is a part of Data Factory and will have to be rebuilt. You also won’t leverage all the advantages of the cloud, such as scalability.
2. Refactor – adjust your data tier to use the best of cloud-native services. This requires rebuilding some parts of your database and/or ETL, for example migrating your ETL projects to Azure Data Factory.
Advantages: only partial code change. Leveraging Azure-native services brings scalability, performance and cost gains.
Disadvantages: requires some experience with Azure services and some code changes. Takes more time.
3. Rearchitect – rearchitect your data tier to use cloud-native Azure services, for example migrating your SQL Server database to Azure SQL Database.
Advantages: gets the most benefit from Azure services. For example, use serverless services to reduce cost.
Disadvantages: requires architecture and code changes. Can potentially delay the migration. Can be considered phase 2 of a migration project, after a lift and shift migration.
4. Rebuild – completely rethink your architecture, making it a greenfield project that uses the best available technology for your data tier, such as a Big Data stack. Azure provides a wide variety of development and deployment capabilities. Enrich your data with AI, IoT and streaming pipelines.
Advantages: you get a modern data tier with the full advantages of a cloud-native design.
Disadvantages: like any greenfield project, it requires skills, time and budget.
What we did: Phase 1 & 2
Phase 1: Rehost/lift and shift POC – during this phase we ran a proof of concept and migrated our data layer to Azure using the following services:
Azure BLOB storage – raw data storage
SQL Server Managed Instance – SQL Server instance
Azure VM – to run SSIS packages for ETL
Results: within the scope of that POC we successfully migrated from managed hosting to the Azure cloud without significant code change.
Phase 2: Rebuild – re-think, re-architect and re-build our data layer architecture using the full power of Azure cloud services. Making it a greenfield project lets you bring in modern technologies and really look at things differently. A Big Data stack and the ability to scale resources quickly for a short period of time open a lot of opportunities to take your data pipelines to a whole new level.
Project Risks
All the typical risks associated with a greenfield project, but specifically:
• Lack of experience with certain technologies
• Lack of real-life case studies and white papers about Big Data Azure deployments
• Azure developer community is not big
• Time and budget
Architecture – v1
Initial Azure Architecture
• One Spark cluster (Spark ETL + Hive Query)
• Spark as both processing and query engine
• Parquet as storage format
• Python 2.7.x (the last 2.x line, first released in 2010)
• SQL style ETL code
• All Development on STG
• Code management in GIT
• MicroStrategy processing through Thrift server
Azure HDInsight ecosystem advantages
• Low cost - by creating clusters on demand
• Automated cluster creation
• Managed hardware and configuration
• Easily scalable up or down
• Global availability
• Secure and compliant - protect enterprise data assets with Virtual Network, encryption, and integration with Azure AD.
• Simplified version management – Hadoop ecosystem components are kept up to date.
• Extensibility with custom tools or third-party applications - Azure Marketplace
• Easy management, administration, and monitoring - Azure Monitor.
• Integration with other Azure services
  • Azure Data Factory (ADF)
  • Azure Blob Storage
  • Azure Data Lake Storage Gen2
  • Azure SQL Database
  • Azure Analysis Services
• Self-healing processes and components
We’re in Production - lessons learned
• You CAN build that architecture from scratch and be in production relatively fast.
• Azure can sometimes surprise you: network issues, lost connections, background updates, documented features that don’t work, etc.
• Performance of certain queries (e.g. table-based queries) might require additional attention.
• Plan ahead for access and security configuration.
• Make sure you understand your concurrency so you can architect appropriately.
• Sometimes documentation and community search take longer than expected.
Drivers for Architecture v2
• Separate clusters for ETL and run-time
• Parallel ETL architecture
• New HDP platform 3.x
• Upgraded to Python 3.5 (new libraries, end of support for Python 2)
• Code structure redesign: OOP instead of SQL style
• Internal Logging module
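As an illustration of the last two drivers, here is a minimal sketch of what "OOP instead of SQL style" combined with an internal logging module could look like. The class names, the `etl` logger name and the row fields are all hypothetical; the deck does not show the actual code, and real steps would operate on Spark data frames rather than Python lists.

```python
import logging

logger = logging.getLogger("etl")  # hypothetical logger name


class EtlStep:
    """Base class: one unit of ETL work with uniform start/done/failure logging."""

    name = "step"

    def run(self, rows):
        logger.info("start %s (%d rows in)", self.name, len(rows))
        try:
            result = self.transform(rows)
        except Exception:
            # Log the traceback once, centrally, then let the caller decide.
            logger.exception("failed %s", self.name)
            raise
        logger.info("done %s (%d rows out)", self.name, len(result))
        return result

    def transform(self, rows):
        raise NotImplementedError


class DropVoided(EtlStep):
    name = "drop_voided"

    def transform(self, rows):
        return [r for r in rows if r.get("status") != "void"]


class AddNetAmount(EtlStep):
    name = "add_net_amount"

    def transform(self, rows):
        return [{**r, "net": r["charge"] - r["adjustment"]} for r in rows]


def run_pipeline(steps, rows):
    """Run steps in order, feeding each step's output into the next."""
    for step in steps:
        rows = step.run(rows)
    return rows
```

Compared with one long SQL script, each step can be tested on its own, and every step gets the same logging without repeating it.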
V2 in Production
Architecture v2 - Lessons learned
• Hive can be fast
• Documented features do not always work (e.g. Hive locks, cluster access setup, cache, etc.)
• There are differences in how Spark and Hive treat metadata.
• You need to learn where to persist your data (data frame vs. table vs. disk) for optimal performance.
• Cluster resource management and balancing for parallel jobs need attention.
Architecture v3
• HDInsight Spark cluster for ETL
• Separate ETL pipelines for every data source
• ETL Orchestration – running jobs in parallel
• Increasing HDInsight cluster capacity but decreasing its uptime
• Additional data persistence to simplify Spark jobs
• Using ORC format for DWH and Data Marts (Hive)
• Azure Synapse as Analytical Database and Analytics sandbox
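The "separate pipeline per data source, run in parallel" idea can be sketched as follows. This is a minimal illustration using Python's standard `concurrent.futures`; the source names and job functions are hypothetical, and in a real deployment each job would presumably submit work to the HDInsight cluster rather than run locally.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_pipelines(pipelines, max_workers=4):
    """Run independent per-source pipelines concurrently.

    Returns (results, failures), both keyed by source name, so one failing
    source does not block the others.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(job): source for source, job in pipelines.items()}
        for future in as_completed(futures):
            source = futures[future]
            try:
                results[source] = future.result()
            except Exception as exc:
                failures[source] = exc
    return results, failures


# Hypothetical per-source jobs, for illustration only.
def load_claims():
    return "claims loaded"


def load_payments():
    return "payments loaded"


def load_broken_feed():
    raise RuntimeError("source file missing")


results, failures = run_pipelines({
    "claims": load_claims,
    "payments": load_payments,
    "broken_feed": load_broken_feed,
})
```

Keeping failures separate from results makes it possible to publish the sources that succeeded and rerun only the ones that did not.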
Why Azure Synapse
• Low maintenance solution
• Very fast data ingestion from the Data Lake (ORC, Parquet files)
• Several data replication models (can fine tune for your load pattern)
• Great performance on analytical workloads
• Flexible performance/cost configuration through performance tiers
• Ability to reduce performance tiers during nonproduction hours
• Easy to set up a separate instance for analysts (sandbox)
• SQL Server-like experience for end users (SQL Server Management Studio, etc.)
• Many connectivity options (ODBC, Power BI, Excel, etc.)
Production system workload
[Chart: daily production workload (axis ticks 0 to 12) with series "HDI" and "Synapse" and markers "Data Ready" and "Work hours"]
Production Azure Synapse Workload pattern
ADWH daily scaling:
• 100 DWU – off-business-hours mode
• 1000 DWU – data refresh
• 300 DWU – run-time
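The daily scaling pattern above could be driven by a small helper like the one sketched here. The DWU levels come from the slide; the phase names, the database name and the `c` suffix (Gen2-style service objective names such as `DW1000c`) are assumptions, and the statement would still need to be executed against the server by whatever scheduler is in use.

```python
# DWU levels from the slide; phase names are hypothetical labels.
DWU_BY_PHASE = {
    "off_hours": 100,
    "data_refresh": 1000,
    "run_time": 300,
}


def scale_statement(database, phase):
    """Build the T-SQL that moves the pool to the DWU level for a phase.

    Assumes Gen2-style service objective names (the trailing "c").
    """
    dwu = DWU_BY_PHASE[phase]
    return (
        f"ALTER DATABASE [{database}] "
        f"MODIFY (SERVICE_OBJECTIVE = 'DW{dwu}c');"
    )


# "edw" is a made-up database name.
refresh_sql = scale_statement("edw", "data_refresh")
```

Because scaling drops open connections, a schedule like this is best run at the boundaries between phases, when no user queries are expected.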
Synapse: lessons learned
• Documentation is not always up to date
• Performance tier scaling drops open connections
• No support for cross-database queries
• No @@ROWCOUNT support
• No PARTITIONED BY functionality when creating external tables based on Hive tables
• No native Hive metastore integration
• Doesn’t integrate with SSDT
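Since performance tier scaling drops open connections, client code that talks to Synapse benefits from a reconnect-and-retry wrapper. The sketch below is one possible shape, not the team's implementation: `connect` and `work` are caller-supplied callables, and `retry_on` defaults to Python's built-in `ConnectionError` as a stand-in, because the exception type a real database driver raises is driver-specific.

```python
import time


def run_with_reconnect(connect, work, retries=3, delay_seconds=0.0,
                       retry_on=(ConnectionError,)):
    """Call work(connection); on a dropped connection, reconnect and retry.

    Expects retries >= 1. Re-raises the last error once retries are exhausted.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        connection = connect()  # always start from a fresh connection
        try:
            return work(connection)
        except retry_on as exc:
            last_error = exc
            time.sleep(delay_seconds * attempt)  # simple linear backoff
    raise last_error
```

The work passed in should be safe to repeat, for example a read query or a load step that replaces its target, because a retry reruns it from the start.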
Summary
• There is definitely a way to get to the cloud quickly – you should consider different options
• If you decide to rebuild your architecture, be ready to go through several iterations
• HDInsight clusters are a great engine for data pipelines and parallel workloads, but it’s a very different technology and the ramp-up period is not insignificant
• Azure Synapse is a great engine for analytical workloads and data exploration and works great with the familiar toolset (SSMS)
Leonid Nekhymchuk, Chief Technology Officer,
leonid.nekhymchuk@visiquate.com
Valeriy Zinovjev, Client Engineering Manager
valeriy.zinovjev@visiquate.com
Thank you.
Any Questions?

VisiQuate: Azure cloud migration case study

  • 1. Journey to Azure WHATWE’VE LEARNED ALONGTHE WAY September 2020
  • 2. © 2020 VisiQuate, Inc. All Rights Reserved.© 2020 VisiQuate, Inc. All Rights Reserved. Table of contents • About VisiQuate • Client Case Study and Table Setting • Why? and Where? The idea behind the migration • Initial architecture: what works, what doesn’t work • Architecture evolution: v2, v3, … • Summary: what we’ve learned, what would we do differently
  • 3. © 2020 VisiQuate, Inc. All Rights Reserved.© 2020 VisiQuate, Inc. All Rights Reserved. VisiQuate
  • 4. VisiQuate is an advanced analytics service that helps enterprises achieve peak business health. AI and ML Powered full stack data platform Powered by AI and machine learning, streaming analytics from disparate data sources deliver leading insights that alert the right users to problems and opportunities Value focused point solutions Our cloud-based solutions target a 3-5x ROI within 12 months and are non-invasive to Internal IT and is supported with white glove service by our SME’s, technical experts and account managers. Velocity Consulting Services Increase your speed to value and ROI withVelocity Data Fanatics as a Service offerings.
  • 5. © 2019 VisiQuate, Inc. All Rights Reserved. Case Study:Transformation at a multi-region, nationwide Health System Starting point: data chaos, data silos at the regional and facility level, manual data collection, integration and reporting, lack of data governance. Project Goals: • Consolidated data lake infrastructure serving – Power users in Rev Cycle, Finance, operations • Streamlined month end close • Dynamic data mining and exploration • ML and data science initiatives – Automation serving data marts and consolidated system wide reporting • Rev cycle • ED • Hospital operations • Finance / Decision support
  • 6. © 2020 VisiQuate, Inc. All Rights Reserved.© 2020 VisiQuate, Inc. All Rights Reserved. Client Starting Point Architecture Data Sources Structured data Semi-Structured data RDBMS SSIS package ODS EDW MS SQL Server SQL Server Agent Manual data upload Ad-hoc data requests Reporting/ Data distribution
  • 7. © 2020 VisiQuate, Inc. All Rights Reserved. Issues with current architecture • Difficult to scale - MS SQL can only scale up can’t scale out • Overprovisioning - you buy a box that stays idle during off peak hours • Rigid hardware footprint - hard to provision resources for a short period of time • Schema on write only • No separation of storage and compute • Read/write concurrency • Maintenance cost is relatively high
  • 8. © 2020 VisiQuate, Inc. All Rights Reserved.© 2020 VisiQuate, Inc. All Rights Reserved. We can solve all our problems if we migrate everything to a cloud Client’s Big Idea
  • 9. © 2019 VisiQuate, Inc. All Rights Reserved. Standard whitepaper architecture looks simple Ingestion engine Real-Time Processing Batch Processing Hadoop Cluster Spark Hive BLOB Storage Reporting Engine In-Memory cubes Reports/dashboards Dynamic query generation Virtual private cloud Structured files Semi-Structured files RDBMS Logs Events Data Sources
  • 10. © 2019 VisiQuate, Inc. All Rights Reserved. This is how it looks in reality
  • 11. © 2020 VisiQuate, Inc. All Rights Reserved. • Flexible, scalable and secure infrastructure • Ability to scale up and scale down • Pay as you go • Reduce maintenance • Data Lake - schema on read model • Getting ready for AI/ML data processing Project Goals
  • 12. © 2020 VisiQuate, Inc. All Rights Reserved.© 2020 VisiQuate, Inc. All Rights Reserved. PROS • Existing Microsoft products footprint: Office, SQL Server, etc • Azure AD – leverage Office 365 subscription for Azure set up • Native MS SQL Server support – different options for migration • Power BI visualization as a part of Office 365 • Scalability • Pay as you go model • HIPAA compliant (for majority of services) CONS • Stability Issues due to rapid go to market strategy • Not as many serverless services • Open Source is not Native technology • Developer community is not as strong as alternatives Why Azure
  • 13. Migration strategies
      Here is our interpretation of Microsoft’s scenarios for migrating to the cloud:
      1. Rehost (aka lift and shift) – migrate your data tier to the cloud with no or minimal code change. Select an option for your SQL Server and ETL jobs, then migrate manually or using the Azure migration tool.
         Advantages: fastest and cheapest; gets the most out of your existing investment.
         Disadvantages: not every piece of your infrastructure can be migrated one-to-one. For example, SSIS is part of Data Factory in Azure and will have to be rebuilt. You also won’t get all the advantages of the cloud, such as scalability.
      2. Refactor – adjust your data tier to use the best of the cloud-native services. This requires rebuilding some parts of your database and/or ETL, for example migrating your ETL projects to Azure Data Factory.
         Advantages: only partial code change. Using Azure-native services brings scalability, performance and cost gains.
         Disadvantages: requires some experience with Azure services and some code changes. Takes more time.
      3. Rearchitect – rearchitect your data tier to use cloud-native Azure services. For example, migrate your SQL Server database to Azure SQL Database.
         Advantages: gets the most benefit from Azure services, for example using serverless services to reduce cost.
         Disadvantages: requires architecture and code changes and can delay the migration. Can be treated as phase 2 of a migration project, after a lift-and-shift migration.
      4. Rebuild – completely rethink your architecture as a greenfield project, using the best available technology for your data tier, such as the Big Data stack. Azure provides a wide variety of development and deployment capabilities. Enrich your data with AI, IoT and streaming pipelines.
         Advantages: you end up with a modern, cloud-native data tier.
         Disadvantages: like any greenfield project, it requires skills, time and budget.
  • 14. What we did: Phases 1 & 2
      Phase 1: Rehost / lift and shift POC – in this phase we migrated our data layer to Azure as a proof of concept, using the following services:
      • Azure Blob Storage – raw data storage
      • SQL Server Managed Instance – SQL Server instance
      • Azure VM – to run SSIS packages for ETL
      Results: within the scope of the POC we successfully migrated from managed hosting to the Azure cloud without significant code changes.
      Phase 2: Rebuild – rethink, re-architect and rebuild our data layer architecture using the full power of Azure cloud services. Treating it as a greenfield project lets you bring in modern technologies and look at things differently. The Big Data stack and the ability to scale resources quickly for short periods open up many opportunities to take your data pipelines to a whole new level.
  • 15. Project Risks
      All the typical risks associated with a greenfield project, but specifically:
      • Lack of experience with certain technologies
      • Lack of real-life case studies and white papers about Big Data Azure deployments
      • Azure developer community is not big
      • Time and budget
  • 16. Architecture – v1
  • 17. Initial Azure Architecture
      • One Spark cluster (Spark ETL + Hive Query)
      • Spark as both processing and query engine
      • Parquet as the storage format
      • Python 2.7.x (first released in 2010)
      • SQL-style ETL code
      • All development on STG
      • Code management in Git
      • MicroStrategy processing through the Thrift server
  • 18. Azure HDInsight ecosystem advantages
      • Low cost, by creating clusters on demand
      • Automated cluster creation
      • Managed hardware and configuration
      • Easily scalable up or down
      • Global availability
      • Secure and compliant: protect enterprise data assets with Virtual Network, encryption, and integration with Azure AD
      • Simplified version management: Hadoop ecosystem components are kept up to date
      • Extensibility with custom tools or third-party applications (Azure Marketplace)
      • Easy management, administration, and monitoring (Azure Monitor)
      • Integration with other Azure services:
          • Azure Data Factory (ADF)
          • Azure Blob Storage
          • Azure Data Lake Storage Gen2
          • Azure SQL Database
          • Azure Analysis Services
      • Self-healing processes and components
  • 19. We’re in Production - lessons learned
      • You CAN build that architecture from scratch and be in production relatively fast.
      • Azure can sometimes surprise you: network issues, lost connections, background updates, documented features that don’t work, etc.
      • Performance of certain queries (e.g. table-based queries) might require additional attention.
      • Plan ahead for access and security configuration.
      • Make sure you understand your concurrency so you can architect appropriately.
      • Sometimes documentation and community searches take longer than expected.
  • 20. Drivers for Architecture v2
      • Separate clusters for ETL and run-time
      • Parallel ETL architecture
      • New HDP platform 3.x
      • Upgraded to Python 3.5 (new libs, end of support for Python v2)
      • Code structure redesign: OOP instead of SQL style
      • Internal logging module
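As a rough illustration of the move from SQL-style scripts to an OOP code structure with an internal logging module, the sketch below wraps one ETL step in a class. The names `EtlStep` and `UppercaseNames` and the record layout are hypothetical and not taken from the VisiQuate code base; plain Python lists stand in for Spark data frames so the example runs without a cluster.

```python
import logging
from abc import ABC, abstractmethod

logging.basicConfig(level=logging.INFO)


class EtlStep(ABC):
    """Base class for one ETL step (hypothetical name, for illustration only)."""

    def __init__(self, name):
        self.name = name
        # Each step gets its own named logger, standing in for an internal logging module.
        self.log = logging.getLogger(f"etl.{name}")

    @abstractmethod
    def transform(self, rows):
        """Return the transformed rows."""

    def run(self, rows):
        # Shared logging lives in the base class, so every step reports the same way.
        self.log.info("start: %d input rows", len(rows))
        result = self.transform(rows)
        self.log.info("done: %d output rows", len(result))
        return result


class UppercaseNames(EtlStep):
    """Example step: normalise the 'name' field of every record."""

    def transform(self, rows):
        return [{**row, "name": row["name"].upper()} for row in rows]


step = UppercaseNames("uppercase_names")
cleaned = step.run([{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}])
```

Keeping logging in the base class is one possible design: each new step only implements `transform`, and every step logs row counts in a uniform way.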
  • 21. V2 in Production
  • 22. Architecture v2 - Lessons learned
      • Hive can be fast.
      • Documented features do not always work (e.g. Hive locks, setting up access to clusters, cache, etc.).
      • Spark and Hive treat metadata differently.
      • For optimal performance you need to learn where to persist your data: in a data frame, in a table, or on disk.
      • Cluster resource management and balancing are needed for parallel jobs.
  • 23. Architecture v3
  • 24. Architecture v3
      • HDInsight Spark cluster for ETL
      • Separate ETL pipelines for every data source
      • ETL orchestration: running jobs in parallel
      • Increasing HDInsight cluster capacity but decreasing its uptime
      • Additional data persistence to simplify Spark jobs
      • Using ORC format for DWH and Data Marts (Hive)
      • Azure Synapse as analytical database and analytics sandbox
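The combination of one ETL pipeline per data source and an orchestrator that runs jobs in parallel can be sketched as follows. This is a minimal sketch using only the Python standard library, not the orchestration the authors actually used; `run_pipeline`, `run_all` and the source names are hypothetical, and a real pipeline would submit a Spark job where the placeholder returns a status string.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_pipeline(source):
    """Placeholder for one per-source ETL pipeline (hypothetical)."""
    # A real pipeline would submit a Spark job here; this only reports success.
    return f"{source}: ok"


def run_all(sources, max_parallel=3):
    """Run one pipeline per data source, at most max_parallel at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(run_pipeline, s): s for s in sources}
        for future in as_completed(futures):
            source = futures[future]
            try:
                results[source] = future.result()
            except Exception as exc:
                # One failed source should not stop the other pipelines.
                results[source] = f"{source}: failed ({exc})"
    return results


statuses = run_all(["billing", "claims", "scheduling"])
```

Capping `max_parallel` is one simple way to respect the cluster resource limits mentioned in the v2 lessons; the right value depends on cluster size and job weight.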
  • 25. Why Azure Synapse
      • Low-maintenance solution
      • Lightning-fast data ingestion from the Data Lake (ORC, Parquet files)
      • Several data replication models (can be fine-tuned for your load pattern)
      • Great performance on analytical workloads
      • Flexible performance/cost configuration through performance tiers
      • Ability to reduce performance tiers during non-production hours
      • Easy to set up a separate instance for analysts (sandbox)
      • SQL Server-like experience for end users (SQL Server Management Studio, etc.)
      • Many connectivity options (ODBC, Power BI, Excel, etc.)
  • 26. Production system workload
      [Chart: HDI and Synapse workload over the day; labels: Data Ready, Work hours]
      ADWH daily scaling:
      • 100 DWU – off business hours mode
      • 1000 DWU – data refresh
      • 300 DWU – run-time
  • 27. Production Azure Synapse workload pattern
      ADWH daily scaling:
      • 100 DWU – off business hours mode
      • 1000 DWU – data refresh
      • 300 DWU – run-time
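The daily scaling pattern can be expressed as a small lookup from hour of day to target DWU level. The three DWU values come from the slide; the hour boundaries (`REFRESH_START`, `WORK_START` and so on) are assumptions made up for this sketch, since the deck does not state when each mode begins. The function only computes the target level and does not call any Azure API.

```python
# DWU levels from the slide; the hour boundaries below are assumptions.
DWU_OFF_HOURS = 100
DWU_REFRESH = 1000
DWU_RUNTIME = 300

REFRESH_START, REFRESH_END = 4, 7   # assumed nightly data refresh window
WORK_START, WORK_END = 7, 19        # assumed business hours


def target_dwu(hour):
    """Return the DWU level for a given hour of the day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if REFRESH_START <= hour < REFRESH_END:
        return DWU_REFRESH
    if WORK_START <= hour < WORK_END:
        return DWU_RUNTIME
    return DWU_OFF_HOURS


# Target level for every hour of one day.
schedule = [target_dwu(h) for h in range(24)]
```

Under these assumed windows the schedule adds up to 7,500 DWU-hours per day, compared with 24,000 if the pool stayed at 1000 DWU around the clock; the actual ratio depends on the real refresh and business-hour windows.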
  • 28. Synapse: lessons learned
      • Documentation is not always up to date
      • Performance tier scaling drops open connections
      • No support for cross-database queries
      • No @@ROWCOUNT support
      • No PARTITIONED BY functionality when creating external tables based on Hive tables
      • No native Hive metastore integration
      • Doesn’t integrate with SSDT
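Because performance tier scaling drops open connections, client code that runs during a scale operation generally needs to reconnect and retry. The sketch below shows one generic way to do that; it is not the authors' implementation. `run_with_retry`, `fake_connect` and `fake_query` are hypothetical names, and the built-in `ConnectionError` stands in for whatever exception a real database driver raises when the connection is lost.

```python
import time


def run_with_retry(connect, run_query, attempts=3, delay_seconds=0.0):
    """Run a query, opening a fresh connection again if the previous one was dropped."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            connection = connect()  # open a new connection on every attempt
            return run_query(connection)
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay_seconds * attempt)  # linear back-off between attempts
    raise last_error


# Simulated client: the first call fails as if a scale operation dropped the connection.
calls = {"count": 0}


def fake_connect():
    return object()


def fake_query(connection):
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("connection dropped during scaling")
    return 42


result = run_with_retry(fake_connect, fake_query)
```

In a real deployment the caught exception type and the back-off delay would need to match the driver in use and the typical duration of a scale operation.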
  • 29. Summary
      • There is definitely a way to get to the cloud quickly, and you should consider the different options.
      • If you decide to rebuild your architecture, be ready to go through several iterations.
      • HDInsight clusters are a great engine for data pipelines and parallel workloads, but it’s a very different technology and the ramp-up period is not insignificant.
      • Azure Synapse is a great engine for analytical workloads and data exploration, and works well with the familiar toolset (SSMS).
  • 30. Leonid Nekhymchuk, Chief Technology Officer, leonid.nekhymchuk@visiquate.com
      Valeriy Zinovjev, Client Engineering Manager, valeriy.zinovjev@visiquate.com
  • 31. Thank you. Any Questions?

Editor's Notes

  1. Azure AD – leverage Office 365 subscription for Azure set up: https://docs.microsoft.com/en-us/azure/cost-management-billing/manage/office-365-account-for-azure-subscription