DW Migration Webinar-March 2022.pptx

  1. Modernize your Data Warehouse: A migration journey to the Databricks Lakehouse Platform
     Amit Kara, Director, Technical Product Marketing
     Soham Bhatt, SME Lead, DW Migration
     ©2021 Databricks Inc. — All rights reserved
  2. Agenda
     • Why lakehouse for data warehousing
     • How does Databricks help with Data Warehousing
     • Key differentiators when using the Databricks Lakehouse Platform
     • Demo: Data warehousing on Databricks
     • How to modernize your data warehouse to a Lakehouse
     • Key takeaways for migrating to the Lakehouse
  3. What’s the problem we’re solving?
  4. Legacy Data Warehouses aren’t keeping up
     • Data Warehouses can’t keep up with data volume and variety
     • Innovation hinges on integrating ML/AI and predictive insights
     • Business agility requires reliable, real-time data
     • Not cost effective, especially with scale
     • Data is vendor locked-in and duplicated
  5. The problem with legacy CDW: a fragmented approach to modernizing your architecture
     [Diagram labels: Structured, Semi-Structured, Unstructured; Data Lake (ADLS, AWS S3, GCP); ELT/ETL; Cloud Data Warehouse; BI Reports, Dashboards & SQL; Data Science (Model Training, Model Scoring, Model Deployment)]
     • Limited support for streaming
     • Limited support for unstructured data (audio/images/video)
     • Complex, with many stages; data is duplicated
     • Lock-in / proprietary format
     • Compute cost for all data access
     • Disparate tooling decreases data team productivity
  6. Why Data Warehousing on Databricks?
  7. Why Data Warehousing on Databricks
     • Your tools of choice: Use your favorite tools like Fivetran, dbt, PowerBI, Tableau or Databricks to ingest, transform and query all your data in-place.
     • Serverless compute: Lower costs and eliminate the need to manage, configure or scale cloud infrastructure with serverless and get the best price/performance.
     • Unified governance: Simplify architecture, establish one single copy for all your data, and one unified governance layer across all data teams using standard SQL.
     • Break down silos: Empower data scientists and analysts to access the most complete and freshest data faster, and uncover new insights together.
     [Diagram labels: Unity Catalog; Delta Lake; All structured and unstructured data; Cloud Data Lake; Data Warehousing; Data Engineering; Data Science and ML; Data Streaming]
  8. Connect your data, analytics and AI tools to the Databricks Lakehouse
     Discover, connect, and process data, analytics, and AI tools to your lakehouse
     • Discover validated data and AI solutions for new use cases
     • Set up in a few clicks with pre-built integrations
     • Integrated out-of-the-box with Partner Connect
     [Diagram labels: Partners; Business Intelligence; ML Tools; Data Preparation; Data Connectors; Solution Accelerators; Data Apps]
  9. Databricks thrives within your modern data stack
     [Diagram labels: Unity Catalog; Delta Lake; All structured and unstructured data; Cloud Data Lake; Data Warehousing; Data Engineering; Data Science and ML; Data Streaming; BI and Dashboards; Data Science; Data Pipelines; Data Governance; Machine Learning; Data Ingestion]
  10. First-class SQL development experience: collaboratively query, explore, and transform data in-place
      Query data lake data using familiar ANSI SQL, and collaboratively find and share new insights faster with the built-in SQL query editor, alerts, visualizations, and interactive dashboards.
  11. Elastic, instant compute decoupled from storage
      • Quickly set up optimized compute resources with SQL endpoints (powered by the vectorized Photon engine)
      • High concurrency built-in with automatic load balancing
      • Intelligent workload management and faster reads from cloud storage
      • Instant startup and greater availability
      • Available in Databricks Serverless (preview)
      No resource management needed with Serverless
  12. Built from the ground up for best price/performance: lightning fast analytics
      Query and analyze your most complete and freshest data with up to 12x better price/performance than traditional cloud data warehouses.
      Source: Performance Benchmark with Barcelona Supercomputing Center
  13. Unity Catalog: fine-grained governance on the Lakehouse
      ● Centralized metadata and user management
      ● Centralized data access controls
      ● Data lineage (Private Preview)
      ● Data access auditing
      ● Data search and discovery (Coming Soon)
      ● Secure data sharing with Delta Sharing
      ● Standard SQL
  14. Key considerations for Modern Analytics & DW
      ❏ Empower Business Units for Self-service and Advanced Analytics
      ❏ Simple, Collaborative, Agile Cross-Functional teams
      ❏ Machine Learning and Artificial Intelligence: CIO level initiatives
      ❏ Platform that supports all data types, structured and unstructured
      ❏ Cloud: choose best of breed, Open Tech Stack vs Proprietary
  15. Demo
  16. Modern Data Warehousing on Databricks
      Select the Ingestion, ETL, Presentation Layer and Governance Ecosystem on the Databricks Platform
      • Ingestion: Batch Ingestion, Stream Ingestion
      • BRONZE: Raw Ingestion and History
      • SILVER: Filtered, Cleaned, Augmented
      • GOLD: Business Aggregates & Data Models
      • Data Science and Machine Learning: Databricks Machine Learning
      • Enterprise Reporting and BI: DBSQL Endpoints, Databricks SQL
      • Data Governance powered by Databricks Unity Catalog
      [Other diagram labels: Curated Data; Databricks Notebooks, Delta Live Tables; ETL Partners; EDC]
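The bronze, silver, and gold layers named on slide 16 can be illustrated with a small sketch. This is only an assumption-laden toy: plain Python lists stand in for the tables that would live in the lakehouse, and the record fields (`order_id`, `region`, `amount`) and function names are hypothetical, not taken from the deck.

```python
from collections import defaultdict
from typing import Dict, List


def to_silver(bronze: List[Dict[str, str]]) -> List[dict]:
    """Filter, clean, and deduplicate raw (bronze) records."""
    seen = set()
    silver = []
    for rec in bronze:
        order_id = (rec.get("order_id") or "").strip()
        try:
            amount = float(rec.get("amount") or "")
        except (TypeError, ValueError):
            continue  # drop records whose amount cannot be parsed
        if not order_id or order_id in seen:
            continue  # drop records without a key and duplicates
        seen.add(order_id)
        region = (rec.get("region") or "").strip().upper() or "UNKNOWN"
        silver.append({"order_id": order_id, "region": region, "amount": amount})
    return silver


def to_gold(silver: List[dict]) -> Dict[str, float]:
    """Build a business aggregate (gold): total amount per region."""
    totals: Dict[str, float] = defaultdict(float)
    for row in silver:
        totals[row["region"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    # Bronze keeps the raw history exactly as ingested, including bad rows.
    bronze = [
        {"order_id": "1", "region": "emea", "amount": "10.5"},
        {"order_id": "1", "region": "emea", "amount": "10.5"},  # duplicate
        {"order_id": "2", "region": "amer", "amount": "oops"},  # bad amount
        {"order_id": "3", "region": "EMEA ", "amount": "4.5"},
    ]
    print(to_gold(to_silver(bronze)))
```

The point of the layering is that bronze is never cleaned in place, so silver and gold can always be rebuilt from it if the cleaning rules change.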
  17. Building your Lakehouse: comprehensive investment into your success
      • Solution Accelerators
      • In-person and Virtual Training
      • Co-located Professional Services
      Supported by 24/7/365 global, production operations at scale
      ©2022 Databricks Inc. — All rights reserved
  18. Migration Methodology
      • Phase 1, Discovery: Migration specific discovery and consultation
      • Phase 2, Assessment: Assessment, Design, Tooling, Accelerators, Sizing, Partners
      • Phase 3, Strategy: Technology mapping, migration workshop, migration planning
      • Phase 4, Production Pilot: Reference implementation of a production use case, overall migration implementation plan
      • Phase 5, Execution: Migration execution and support
      [Diagram labels: Databricks Migration Team with/without Partner; Databricks PS Driven; Partner Driven]
  19. Migration Approach
      Architecture/Infrastructure
      ● Establish deployment architecture
      ● Implement security and governance framework
      Data Migration
      ● Map data structures and layout
      ● Complete one-time load
      ● Implement incremental load approach
      ETL and Pipelines
      ● Migrate data transformation and pipeline code, orchestration and jobs
      ● Speed up your migration using automation tools
      ● Validate: compare your results with on-prem data and expected results
      BI and Analytics
      ● Re-point reports and analytics for Business Analysts and Business Outcomes
      ● Semantic layer/OLAP cube repointing
      ● Connect to reporting and analytics applications
      Data Science/ML
      ● Establish connectivity to ML tools
      ● Onboard Data Science teams
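The validation step in the ETL and Pipelines workstream on slide 19 (comparing migrated results with on-prem data) can be sketched as a row-count and column-sum reconciliation. This is a minimal illustration, not a Databricks tool: two in-memory SQLite databases stand in for the legacy warehouse and the lakehouse, and the `orders` table, its columns, and the helper names are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def table_profile(conn: sqlite3.Connection, table: str,
                  numeric_cols: List[str]) -> Tuple[int, ...]:
    """Return (row_count, sum(col1), sum(col2), ...) for one table."""
    parts = ["COUNT(*)"] + [f"COALESCE(SUM({c}), 0)" for c in numeric_cols]
    select_list = ", ".join(parts)
    # Table and column names come from trusted migration metadata here.
    row = conn.execute(f"SELECT {select_list} FROM {table}").fetchone()
    return tuple(row)


def compare_tables(source: sqlite3.Connection, target: sqlite3.Connection,
                   table: str, numeric_cols: List[str]) -> Dict[str, object]:
    """Profile the same table on both sides and report whether they agree."""
    src = table_profile(source, table, numeric_cols)
    tgt = table_profile(target, table, numeric_cols)
    return {"table": table, "source": src, "target": tgt, "match": src == tgt}


def make_demo_db(rows: List[Tuple[int, int]]) -> sqlite3.Connection:
    """Create an in-memory stand-in database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    rows = [(1, 1250), (2, 990), (3, 4000)]
    legacy = make_demo_db(rows)
    lakehouse = make_demo_db(rows[:2])  # simulate a row lost during migration
    print(compare_tables(legacy, lakehouse, "orders", ["amount_cents"]))
```

Integer cents are used so that sums compare exactly; with floating-point columns a tolerance would be needed. Counts and sums catch missing or duplicated rows but not every value-level difference, so a real validation would usually add per-row hashes or sampled comparisons.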
  20. Strategies for Data Migration: one-time loads, catch-up loads, real-time vs. batch ingestion
      1. Extract from databases through JDBC/ODBC connectors using spark.read.jdbc (parallel ingestion)
      2. Extract to cloud storage and use Databricks Auto Loader for streaming ingest
      3. ISV partners for real-time CDC ingestion (Arcion, Fivetran, Qlik, Rivery, StreamSets, and others)
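For the first strategy on slide 20, parallel ingestion over JDBC depends on splitting the source table into non-overlapping slices that can be read concurrently. The sketch below shows one way to build such slices as SQL predicates in plain Python. The function name, the `order_id` column, and the bounds are hypothetical, and the even-split logic is an illustration rather than Spark's own partitioning algorithm.

```python
from typing import List


def range_predicates(column: str, lower: int, upper: int,
                     num_partitions: int) -> List[str]:
    """Split the inclusive key range [lower, upper] into contiguous,
    non-overlapping WHERE clauses, one per parallel read."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper < lower:
        raise ValueError("upper must not be smaller than lower")
    total = upper - lower + 1
    # Never create more slices than there are distinct key values.
    parts = min(num_partitions, total)
    base, extra = divmod(total, parts)
    predicates = []
    start = lower
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        end = start + size  # exclusive upper edge of this slice
        predicates.append(f"{column} >= {start} AND {column} < {end}")
        start = end
    return predicates


if __name__ == "__main__":
    preds = range_predicates("order_id", 1, 1_000_000, 4)
    for p in preds:
        print(p)
    # On a cluster, the list could be handed to the JDBC reader, for example:
    # df = spark.read.jdbc(url=jdbc_url, table="sales.orders",
    #                      predicates=preds, properties=connection_properties)
    # (jdbc_url, the table name, and connection_properties are placeholders.)
```

Rows whose key is NULL or outside the given bounds are not covered by these predicates, so the bounds should come from a `MIN`/`MAX` query on the source and NULL keys need a separate predicate if they can occur.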
  21. Strategies for ETL/Code Migration
      Use of automated tools or frameworks can reduce your timelines by over 50%!
      Migration of Stored Procedures and/or ETL Mappings
      • For Databricks Notebooks based ETL:
         • Delta Live Tables or Databricks Notebook-based ETL
         • Metadata-driven Ingestion Frameworks
      • ETL tool Partners:
         • Matillion, Prophecy, dbt, Informatica, Talend, Infoworks, and many more
      • Auto code converters accelerate migrations!
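A metadata-driven ingestion framework, as mentioned on slide 21, keeps the list of tables and their load rules in configuration and generates the load statements from it. The following sketch assumes a very small metadata model: the `TableSpec` class, the table names, and the watermark convention are hypothetical and only show the idea of driving full and incremental loads from one table of metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TableSpec:
    source: str                             # table in the legacy warehouse
    target: str                             # table in the lakehouse
    watermark_column: Optional[str] = None  # None means always reload fully


def extraction_query(spec: TableSpec, last_watermark: Optional[str]) -> str:
    """Build the extraction query: full on first load, incremental afterwards."""
    query = f"SELECT * FROM {spec.source}"
    if spec.watermark_column and last_watermark is not None:
        # Values are interpolated for readability only; real code should
        # use parameter binding instead of string formatting.
        query += f" WHERE {spec.watermark_column} > '{last_watermark}'"
    return query


def build_plan(specs: List[TableSpec],
               watermarks: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (target_table, extraction_query) pairs for every configured table."""
    return [(s.target, extraction_query(s, watermarks.get(s.target)))
            for s in specs]


if __name__ == "__main__":
    specs = [
        TableSpec("dw.customers", "bronze.customers"),
        TableSpec("dw.orders", "bronze.orders", watermark_column="updated_at"),
    ]
    # Highest watermark value already loaded, keyed by target table.
    watermarks = {"bronze.orders": "2022-03-01 00:00:00"}
    for target, query in build_plan(specs, watermarks):
        print(target, "<-", query)
```

Adding a table to the migration then means adding one `TableSpec` entry rather than writing a new pipeline, which is the main appeal of the approach.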
  22. Repoint Cubes and Reports to Databricks: unleash self-service analytics with a Semantic Lakehouse
      • As easy as repointing your reports to DBSQL JDBC/ODBC drivers (Photon and our newest Cloud Fetch ODBC drivers)
      • Key Integrations
         • PowerBI Premium (semantic layers, composite models, up to 400 GB caching)
         • Tableau Hyper Extracts
         • Looker
         • OLAP cube partners like MicroStrategy
         • AtScale: Universal Semantic layer (aggs built in Databricks)
  23. Key Takeaways: Migration is a team sport
      ● Data Warehousing on Lakehouse is simple
      ● Migrations can be accelerated using automation tools
      ● Extensive Partner Ecosystem around Databricks Modern Data Stack
      ● Huge set of joint offerings to accelerate migrations with SI/Consulting Partners
  24. Next Steps
      1. Learn more about the Inner Workings of the Lakehouse
      2. Schedule a Data Warehouse migration workshop
      3. Schedule a Databricks SQL Hands-on workshop
      Customize your EDW/ETL Migration Success Plan with an Expert-led Migration Assessment Workshop