
Designing a Modern Data Warehouse in Azure


  1. 1. Designing a Modern Data Warehouse in Azure
  2. 2. Antonios Chatzipavlis, Data Solutions Consultant & Trainer • 1988 Beginning of my professional career • 1996 Started working with SQL Server 6.0 • 1998 Certified as MCSD (3rd in Greece) • 1999 Became an MCT • 2010 Microsoft MVP on Data Platform; created www.sqlschool.gr • 2012 Became MCT Regional Lead by Microsoft Learning • 2013 Certified as MCSE: Data Platform and MCSE: Business Intelligence • 2016 Certified as MCSE: Data Management & Analytics • 2018 Certified as MCSA: Machine Learning; recertified as MCSE: Data Management & Analytics
  3. 3. What we are doing: Articles • SQL Server in Greek • SQL Nights • Webcasts • SQL Server News • Downloads • Resources. Follow us: fb/sqlschoolgr • fb/groups/sqlschool • @antoniosch • @sqlschool • yt/c/SqlschoolGr. SQLschool.gr Group – A community for Greek professionals who use the Microsoft Data Platform. Ask your question at help@sqlschool.gr
  4. 4. Explore everything PASS has to offer: Free online resources • Newsletters • PASS.org • Get involved • Free online webinar events • Local user groups around the world • Free 1-day local training events • Online special interest user groups • Business analytics training
  5. 5. bit.ly/AAB2019Evaluation
  6. 6. WHAT IS A DATA WAREHOUSE? A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.
  7. 7. TRADITIONAL DATA WAREHOUSE
  8. 8. SELF-SERVICE DATA WAREHOUSE
  9. 9. TRADITIONAL DW LIMITATIONS: Data sources • Users • Competition • Scaling up • Data platforms
  10. 10. CURRENT DW CHALLENGES: Timeliness • Flexibility • Quality • Findability
  11. 11. RECENT RESEARCH SURVEYS: More than 50% of respondents report that they will replace their primary DW platform and analytics tools within 3 years.
  12. 12. The data tsunami
  13. 13. WHY YOU NEED A MODERN DATA WAREHOUSE: Customer experience • Quality assurance • Operational efficiency • Innovation
  14. 14. THE CRITERIA FOR SELECTING A MODERN DW Meets Current and Future Needs
  15. 15. ON-PREMISES VS. CLOUD DW • Evaluating Time to Value • Accounting for Storage and Computing Costs • Sizing, Balancing and Tuning • Considering Data Preparation and ETL Costs • Cost of Specialized Business Analytic Tools • Scaling and Elasticity • Delays and Downtime • Cost of Security Breaches • Data Protection and Recovery
  16. 16. STEPS TO GETTING STARTED WITH CLOUD DW • Evaluate your data warehousing needs. • Migrate or start fresh. • Establish success criteria. • Evaluate cloud data warehouse solutions. • Calculate your total cost of ownership. • Set up a proof of concept (POC).
  17. 17. Azure Modern Data Warehouse
  18. 18. MODERN DATA WAREHOUSE
  19. 19. MODERN DATA WAREHOUSE IN AZURE
  20. 20. ADVANCED ANALYTICS ON BIG DATA
  21. 21. REAL-TIME ANALYTICS
  22. 22. SQL SERVER 2019 BIG DATA CLUSTERS
  23. 23. INGEST DATA — ADF: • PaaS • Mapping Data Flow transforms data (ETL) • Copy Data tool easily copies from source to destination • Templates • Any new project • Converting SSIS packages • Row-by-row ETL can be slower • Data needs to be moved to Databricks, limited by compute size • Mapping Data Flow takes time to start up. SSIS: • SSDT – Visual Studio • Very popular product • Used for on-prem ETL for many years • Too big an effort to migrate existing packages • Skill set staying on-prem • Change to the IR in ADF • Row-by-row ETL can be slower • Data needs to be moved to the IR • Limited by node size/number of SSIS IR nodes
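To make the ADF ingestion option concrete, here is a minimal sketch of triggering and monitoring an existing copy pipeline from Python with the azure-mgmt-datafactory SDK. The subscription, resource group, factory and pipeline names are placeholder assumptions, not part of the deck.

```python
# Minimal sketch: trigger an existing ADF copy pipeline and poll its status.
# Resource group, factory and pipeline names are hypothetical placeholders.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-dw-demo"          # assumed resource group
FACTORY_NAME = "adf-dw-demo"           # assumed data factory
PIPELINE_NAME = "CopySalesToStaging"   # assumed copy pipeline

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Kick off the pipeline (created beforehand, e.g. with the Copy Data tool or a template).
run = adf_client.pipelines.create_run(RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME, parameters={})

# Poll until the run finishes.
while True:
    status = adf_client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id).status
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(30)

print(f"Pipeline run {run.run_id} finished with status: {status}")
```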
  24. 24. STORE DATA — ADLS Gen2: • PaaS • Best features of blob storage • Not all features are available yet • Some products do not support it yet • 5 TB file size limit. Blob Storage: • PaaS • Original storage • Most popular • Don't use for new projects • Account limit of 2 PB for US and Europe • 4.75 TB file size limit. SQL Server 2019 Big Data Cluster: • IaaS • Combines the SQL Server database engine, Spark and HDFS (ADLS Gen2) into a unified data platform • Deployed as containers on Kubernetes • PolyBase • Hybrid cloud • Data virtualization • AI platform
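A minimal sketch of the storage step, assuming ADLS Gen2 and the azure-storage-file-datalake SDK; the account URL, filesystem and paths are placeholder assumptions.

```python
# Minimal sketch: land a raw extract in an ADLS Gen2 filesystem.
# Account, filesystem and path names are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://dwdemolake.dfs.core.windows.net"  # assumed storage account
FILESYSTEM = "raw"                                        # assumed filesystem (container)

service = DataLakeServiceClient(account_url=ACCOUNT_URL, credential=DefaultAzureCredential())
fs = service.get_file_system_client(FILESYSTEM)

# Upload today's extract into a date-partitioned folder (individual files stay well under the 5 TB limit).
file_client = fs.get_file_client("sales/2019/12/01/sales.csv")
with open("sales.csv", "rb") as data:
    file_client.upload_data(data, overwrite=True)
```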
  25. 25. PREP DATA — Azure Databricks: • PaaS • Processing massive amounts of data • Training & deploying models • Managing workflows • Spark & notebooks • Integration with ADLS, SQL DW, PBI • Writing code • High learning curve. Azure HDInsight: • PaaS • Deploys & provisions Apache Hadoop clusters • No integration with SQL DW • Always running and incurring cost • Hortonworks merged with Cloudera. PolyBase & stored procedures in SQL DW: • IaaS • T-SQL queries via external tables • Tuning queries • Increased storage space. Power BI Dataflows: • Power BI service • Power Query • Self-service data prep • Individual solution • Small workloads • Don't use this to replace a DW or ADF
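For the Databricks option, a minimal PySpark sketch of a prep step as it might appear in a notebook; it assumes the cluster already has access to the ADLS Gen2 account, and the paths and column names are placeholder assumptions.

```python
# Minimal PySpark sketch of a prep step in a Databricks notebook.
# Assumes the cluster can already reach the ADLS Gen2 account; paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically inside Databricks

raw_path = "abfss://raw@dwdemolake.dfs.core.windows.net/sales/2019/12/01/"
curated_path = "abfss://curated@dwdemolake.dfs.core.windows.net/sales/"

# Read the raw CSV extract, clean it, and write a curated Parquet dataset.
sales = (
    spark.read.option("header", "true").csv(raw_path)
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .filter(F.col("order_id").isNotNull())
)

sales.write.mode("overwrite").partitionBy("order_date").parquet(curated_path)
```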
  26. 26. MODEL & SERVE DATA — Azure SQL DW: • PaaS • Fully managed petabyte-scale cloud DW • Can scale compute and storage independently • Can be paused • MPP. Azure Analysis Services: • PaaS • Tabular model • Fast queries • High concurrency • Semantic layer • Vertical scale-out • High availability • Advanced time calculations • Time to process the cube. Azure SQL Database: • PaaS • Suitable for a small DW • Size limits per tier • Optimized for OLTP. SQL Server in a VM: • IaaS • MDX models. Cosmos DB: • PaaS • Globally distributed • Multi-model database service • Spark to Cosmos DB connector for DW aggregations
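A small sketch of the serving side, assuming Azure SQL DW and a star schema queried over pyodbc; the server, credentials and table names are placeholder assumptions.

```python
# Minimal sketch: serve an aggregate from Azure SQL DW over pyodbc.
# Server, database, credentials and table names are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=dwdemo.database.windows.net;"
    "DATABASE=sqldw;"
    "UID=dw_reader;PWD=<password>"
)

# Typical serving query against a star schema in the DW.
sql = """
SELECT d.calendar_year, SUM(f.amount) AS total_sales
FROM dbo.FactSales AS f
JOIN dbo.DimDate  AS d ON f.date_key = d.date_key
GROUP BY d.calendar_year
ORDER BY d.calendar_year;
"""

for year, total in conn.cursor().execute(sql):
    print(year, total)
```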
  27. 27. ETL vs ELT — Time (load): ETL uses a staging area and system, with extra time to load the data; ELT keeps everything in one system and loads only once. Time (transformation): with ETL you need to wait, especially for big data sizes – as data grows, transformation time increases; with ELT everything is in one system and speed does not depend on data size. Time (maintenance): ETL is high maintenance – you choose which data to load and transform, and must do it again if data is deleted or you want to enhance the main data repository; ELT is low maintenance – all data is always available. Implementation complexity: ETL requires less space at an early stage and the result is clean; ELT requires in-depth knowledge of tools and expert design of the main large repository. Analysis & processing style: ETL is based on multiple scripts to create the views – deleting a view means deleting data; ELT creates ad hoc views – low cost for building and maintaining. Data limitation or restriction: ETL limits by presuming and choosing data a priori; ELT is limited only by hardware (effectively none) and the data retention policy. DW support: ETL is the prevalent legacy model used for on-premises, relational, structured data; ELT is tailored to scalable cloud infrastructure supporting structured and unstructured (big data) sources. Data lake support: not part of the ETL approach; ELT enables use of a lake with unstructured data. Usability: ETL means fixed tables and a fixed timeline, used mainly by IT; ELT is ad hoc, agile and flexible, usable by everyone from developer to citizen integrator. Cost-effectiveness: ETL is not cost-effective for small and medium businesses; ELT is scalable and available to all business sizes using online SaaS solutions.
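To illustrate the ELT side of the comparison, here is a minimal sketch in which raw data has already been loaded (e.g. via PolyBase external tables) and the transformation runs inside the warehouse itself with a CTAS statement; the connection details and table names are placeholder assumptions.

```python
# Minimal ELT sketch: the raw data is already loaded into the DW (e.g. via a
# PolyBase external table), so the transform runs on the MPP engine itself.
# Connection details and table names are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=dwdemo.database.windows.net;DATABASE=sqldw;"
    "UID=dw_loader;PWD=<password>",
    autocommit=True,
)

# Transform in place with CTAS: no separate ETL tier, the DW does the heavy lifting.
conn.cursor().execute("""
CREATE TABLE dbo.FactSales
WITH (DISTRIBUTION = HASH(customer_key), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT s.order_id, s.customer_key, s.order_date,
       CAST(s.amount AS decimal(18,2)) AS amount
FROM stg.ext_Sales AS s
WHERE s.order_id IS NOT NULL;
""")
```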
  28. 28. LAMBDA ARCHITECTURE
  29. 29. LAMBDA ARCHITECTURE IN AZURE
  30. 30. COMMON DATA MODEL
  31. 31. Antonios Chatzipavlis, Data Solutions Consultant & Trainer • fb/sqlschoolgr – fb/groups/sqlschool • @antoniosch – @sqlschool • yt/c/SqlschoolGr • SQLschool.gr Group. Thank you!
  32. 32. A community for Greek professionals who use the Microsoft Data Platform. Copyright © 2018 SQLschool.gr. All rights reserved. PRESENTER MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
