After this tutorial, you will know enough to make informed decisions about designing and setting up a cloud-based Big Data platform with the so-called Modern Data Stack (MDS). Bogdan will introduce the main concepts of the MDS, explain its benefits, and show why it is a compelling choice for companies of any size. He will also offer advice to help you navigate the ever-increasing number of vendors in this space. You will learn what to consider when implementing the MDS in your company, and see how it works in a live demo.
10. The 3 V’s (extended to 5)
• Volume
• Variety
• Velocity
• Veracity
• Value
Wikipedia Draft of the Modern Data Stack article: https://en.wikipedia.org/wiki/Draft:Modern_Data_Stack
13. The Cloud DWH
• Arbitrary Scale
• Decoupling of costs for Storage and Compute
• SQL interface
• Providers
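The "Decoupling of costs for Storage and Compute" bullet can be illustrated with a small cost sketch. All prices, capacities, and function names below are made-up illustration values and do not reflect any vendor's pricing; the point is only the shape of the two billing models.

```python
import math

# All constants are hypothetical illustration values, not real vendor prices.
STORAGE_PRICE_PER_TB_MONTH = 20.0   # decoupled: price per TB stored per month
COMPUTE_PRICE_PER_HOUR = 2.0        # decoupled: price per hour of query compute
NODE_PRICE_PER_MONTH = 1000.0       # coupled: price of one always-on node
NODE_STORAGE_TB = 2.0               # coupled: storage capacity of one node


def decoupled_monthly_cost(storage_tb: float, compute_hours: float) -> float:
    """Storage and compute are billed independently of each other."""
    return (storage_tb * STORAGE_PRICE_PER_TB_MONTH
            + compute_hours * COMPUTE_PRICE_PER_HOUR)


def coupled_monthly_cost(storage_tb: float) -> float:
    """Nodes bundle storage and compute, so data volume dictates the node count."""
    nodes = max(1, math.ceil(storage_tb / NODE_STORAGE_TB))
    return nodes * NODE_PRICE_PER_MONTH
```

In the decoupled model, a workload with a lot of data but few queries pays mostly for storage, while in the coupled model the same data volume forces a cluster size regardless of how often it is queried.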
15. Data Storage – History in a Nutshell
• 1979 Oracle
• 1989-1993 MS SQL Server, IBM DB2
• 1995-1996 MySQL, PostgreSQL
• 2005 M. Stonebraker - One Size Fits All: An Idea Whose Time Has Come and Gone¹
• 2005-2011 Hadoop, MongoDB, CouchDB, Cassandra, Cloudera, MapR, Hortonworks
• 2011-2014 BigQuery, Amazon Redshift, Snowflake
• 2014 Apache Spark
• 2015-2017 Google Spanner, CockroachDB, FaunaDB
• 2019 Databricks Delta Lake, Cloud DWHs gain significant traction
• 2020 Modern Data Stack²
¹ http://cs.brown.edu/~ugur/fits_all.pdf
² https://www.moderndatastackconference.com/
19. dbt – Data Build Tool
• Transformation Layer as Code
• Models defined in SQL
• Jinja templating for additional logic on top of SQL
• Version control with git
• Automated deployment
• Automated testing
• Automated generation of documentation and data lineage
• Analyst -> Analytics Engineer
• Easy to learn
Coalesce, “The annual conference dedicated to the advancement and practice of Analytics Engineering.”, https://coalesce.getdbt.com/
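The dbt bullets above (models defined in SQL, templating on top of SQL, data lineage) can be sketched in a few lines of Python. dbt models reference each other with `{{ ref('model_name') }}`, which dbt resolves through Jinja. The sketch below only mimics that idea with a regular expression and is not dbt's implementation. The model names, the SQL, and the `analytics` schema are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL text with dbt-style ref() placeholders.
MODELS = {
    "stg_orders": "select id, customer_id, amount from raw.orders",
    "stg_customers": "select id, name from raw.customers",
    "customer_revenue": (
        "select c.name, sum(o.amount) as revenue "
        "from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on c.id = o.customer_id "
        "group by c.name"
    ),
}

# Matches {{ ref('model_name') }} and captures the model name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def dependencies(sql: str) -> set[str]:
    """Return the model names a SQL text refers to via ref()."""
    return set(REF_PATTERN.findall(sql))


def build_order(models: dict[str, str]) -> list[str]:
    """Order models so each one comes after the models it depends on."""
    graph = {name: dependencies(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def compile_model(sql: str, schema: str = "analytics") -> str:
    """Replace each ref() placeholder with a schema-qualified table name."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)


if __name__ == "__main__":
    for name in build_order(MODELS):
        print(name, "->", compile_model(MODELS[name]))
```

The dependency graph extracted from the `ref()` calls is the same information that lets a tool derive the build order and draw a lineage diagram without anyone maintaining them by hand.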
20. Advantages of the Modern Data Stack
• Easy to spin up
• Cost efficient
• Analytics Engineering
Future Data 2020 - Tristan Handy - The Modern Data Stack: Past, Present, and Future (https://www.youtube.com/watch?v=1Zj8gTLdf5s)
23. Keep in Touch
• Personal: linkedin.com/in/bogdan-pirvu/
• Modern Data Science Group: linkedin.com/groups/9234718/
• Vienna Data Science Tools Meetup: meetup.com/vienna-data-science-tools/events/292826191/