Architecting Agile Data Applications for Scale
Richard Garris
AVP, Field Engineering @ Databricks
Agenda
▪ About Me
▪ The World’s Most Valuable Companies
▪ Waterfall to Agile
▪ Traditional Data Platforms
▪ Modern Data Platforms
▪ Summary
About Me
Ohio State and CMU Graduate
Almost 2 decades in the data space
IMS Database IBM Mainframe ➤ Oracle / SQL Server ➤ Big Data (Hadoop, Spark)
• 3 years as an independent consultant
• 5 years at PwC in the Data Management practice
• 3.5 years at Google on their data team
• 6 years at Databricks
Certified Scrum Master
Other Talks
• Apache Spark and Agile Model Development on Data Science Central
• https://vimeo.com/239762464
• ETL 2.0: Data Engineering using Azure Databricks and Apache Spark on MSDN Channel 9
• https://channel9.msdn.com/Events/Connect/2017/E108
The World’s Most Valuable Companies (in Billions)
Top 15 Companies in 2021
Top 15 Companies in 1993
Source: Data is Beautiful
https://www.youtube.com/channel/UCkWbqlDAyJh2n8DN5X6NZyg
Source: MGMResearch
https://mgmresearch.com/top-us-companies-by-market-cap/
What do the top five companies do differently?
▪ Lots and lots of data?
Not really. FAAMG may have some unique datasets (e.g. 2.7 B user profiles, search results), but the other Fortune 500, commercial, mid-market and digital native companies and public sector organizations have a lot of data too!
▪ Better AI, DL, ML algorithms?
Not really. They did at one point in time, but many of the AI, DL and ML algorithms are available in open source (TensorFlow, PyTorch, MXNet, LightGBM) or have been published in research papers.
▪ Better data processing?
Not really. At one point, Google, Amazon, Microsoft, Facebook and Apple had the best infrastructure in the world to process data, but public cloud gives everyone access to most of that. There is also open source and commercial software available to anyone who wants to process Big Data at scale.
36,000 60,000 90,000
20,000 60,000
Any guesses as to what these numbers are?
Number of Engineers*
*Estimated using Glassdoor, public job postings and financials (R&D spend as a % of total FTE). No confidential information was used to derive these values, and the exact number of engineers is not public information.
What do engineers bring to the modern enterprise?
Agile Application Development Lifecycle
Startups don’t change the world,
they adapt to the world faster than
everyone else
-Jacques Benkoski, USVP
But what does this have to do
with Data Applications?
These companies also brought Agile
to Data Applications and that’s what
makes them competitive!
What are Agile Data Applications?
▪ Self-contained, end-to-end projects that solve a data problem
▪ Built by data developers using open source programming languages
▪ Follow good software engineering principles
▪ Can leverage algorithms and analytics
▪ Scalable both in terms of big data and total cost of ownership
▪ Meet the responsiveness requirements of end users
▪ Deployable into a production environment
The Waterfall Development Methodology
1990 – early 2000s
Concept & Requirements ➤ Analysis & Design ➤ Develop & Implement ➤ Test & QA ➤ Deploy & Maintain
One stage always follows the previous one, and it’s hard to accommodate changes
Traditional Data Architecture (worked well with Waterfall)
Data flow: Operational Systems ➤ ETL (Extract, Transform and Load) ➤ Staging Area ➤ Enterprise Data Warehouse ➤ Data Marts ➤ Users / Analysis / Predictive Analytics
Example systems and data: Inventory, Sales, Purchasing, ERP, Sales Data, Finance Data
Roles:
▪ DBAs: Data Modeling, Database Admin & Security, Tuning
▪ ETL Designers: mostly work in GUI ETL tools, SQL / Stored Procs (CTAS)
▪ BI Analysts: limited to BI tools / Report Designers, limited SQL
▪ Business: define the requirements, domain experts
1990 – early 2000s
▪ Most operational systems only housed structured data with small data volumes
▪ ETL was built with GUI tools (Informatica, ODI, Ab Initio, DataStage), SQL or stored procedures
▪ Large volumes of data were often kept in unmanaged staging tables that were often archived to save cost
▪ The “single source of truth” was the monolithic data warehouse
▪ Inflexible model: adding a single column for a downstream report or model could take 6 months because of the tight coupling from ETL to staging to EDW to report
▪ Didn't work well for machine intelligence and AI / ML. Data mining was mostly used for R&D and limited to the refined and aggregated data
▪ DBAs, ETL Designers and BI Analysts dominated the traditional approach; there were no real data engineers or data scientists
▪ Most EDWs were sold as expensive appliances with data locked into a proprietary format and with combined compute and storage
▪ The only way to scale out was to buy more appliances
▪ Minimal support for arbitrary files, semi-structured, unstructured or streaming sources
▪ Worked well for human intelligence like static reports and dashboards
Pure Agile Development Methodology
Mid 2000s – early 2010s
• Agile Manifesto
• Agile introduced change as part of the process
• Early versions of Agile (Scrum and XP) worked well for small self-managed teams
• It didn’t scale well to larger teams and the needs of larger enterprises
• It also lacked some of the discipline of Waterfall
Open Data Lake Architecture (like pure Agile)
Mid 2000s to early 2010s
Sources: Machine Data, CRM, ERP, Finance Data
New Sources: Geospatial, Sensor / Logs, Clickstream Data
Hadoop Data Lake components: HDFS, YARN Scheduler, MapReduce, Spark, Hive, Mahout (with the Enterprise Data Warehouse alongside)
Roles:
▪ Hadoop Admin: administer the cluster; manage HDFS, YARN and applications; tuning
▪ Hadoop Dev: Java developers using MapReduce, Pig, Spark, Cascading, Mahout …
▪ Analysts: Hive, Impala, Drill, LLAP (or BI tools)
▪ Supports new sources like web-scale data, SaaS sources and operational systems, with structured (sequence files, Parquet), semi-structured (logs, JSON) and unstructured (images, audio, video) data, because everything is a file
▪ Distributed file system built on commodity servers
▪ Could handle high volume, velocity and variety of data
▪ Applications could be written and deployed inside Hadoop using YARN in Java, Scala, Python, Hive (SQL), Pig, and Mahout for ML
▪ Commodity servers used to scale out compute for analytics
▪ Initially cheaper because you used commodity servers instead of specialized hardware as with an EDW, but because compute and storage were paired together you had to buy more servers for storage even if you didn’t need more compute
▪ Mixed bag on performance: it allowed scale-out of compute resources, but tuning Hadoop and YARN as well as query engines like Impala, Hive and Hive variants such as Hive LLAP is difficult
▪ Schema on read versus schema on write created a ton of agility, but the lack of schema enforcement and data reliability became an issue at scale (hence the Data Lake becoming a Data Swamp)
▪ Still had some monolithic attributes that are a better fit for Waterfall (e.g. because all of the applications run inside Hadoop, you have to upgrade all your applications when you upgrade the cluster)
▪ The goal and promise of Hadoop was to offload or replace the EDW, but that didn’t really happen
▪ Required specialized people to manage and develop on Hadoop (admins, trained developers) and ultimately proved difficult with so many specialized, divergent frameworks (MapReduce, Tez, Hive, SQL on Hadoop, Spark, Flink, Storm, Mahout, Cascading)
▪ Analysts and business users don’t concern themselves with the infrastructure, so they were shielded from the complexity, but they would complain if SLAs weren’t being met and would fall back to the EDW
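The trade-off between schema on read and schema on write mentioned above can be sketched in a few lines of plain Python. This is only an illustration, not how Hadoop or any table format implements it: the `EVENT_SCHEMA` fields, the sample records and both `write_*` helpers are hypothetical names made up for the example.

```python
import json

# Hypothetical schema for a clickstream event: field name -> required Python type.
EVENT_SCHEMA = {"user_id": int, "url": str, "ts": str}

# Hypothetical raw input, one JSON document per line.
RAW_LINES = [
    '{"user_id": 1, "url": "/home", "ts": "2021-05-01T10:00:00"}',
    '{"user_id": "two", "url": "/cart", "ts": "2021-05-01T10:01:00"}',  # wrong type
    '{"url": "/checkout", "ts": "2021-05-01T10:02:00"}',                # missing field
]


def conforms(record, schema):
    """Return True when every schema field is present with the expected type."""
    return all(isinstance(record.get(name), typ) for name, typ in schema.items())


def write_schema_on_read(lines):
    """Schema on read: store every line untouched; problems surface at query time."""
    return list(lines)


def write_schema_on_write(lines, schema):
    """Schema on write: validate before storing; bad records are rejected up front."""
    accepted, rejected = [], []
    for line in lines:
        record = json.loads(line)
        (accepted if conforms(record, schema) else rejected).append(record)
    return accepted, rejected


lake = write_schema_on_read(RAW_LINES)
accepted, rejected = write_schema_on_write(RAW_LINES, EVENT_SCHEMA)
print(len(lake), len(accepted), len(rejected))  # prints: 3 1 2
```

The first path never says no, which is where the agility comes from; the second path keeps the stored data trustworthy at the cost of rejecting two of the three records.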
Modern Agile (Hybrid, Disciplined Agile Delivery, SAFe)
Mid 2010s – Today
Source: Project Management Institute (PMI)
The Next Hybrid is the Modern Lakehouse Platform (Data Lake + Data Warehouse)
Late 2010s – 2020s and beyond
Sources: Machine Data, CRM, Finance Data
New Sources: Geospatial, Sensor / Logs, Clickstream Data
BRONZE (Landing) ➤ SILVER (Refined) ➤ GOLD (Aggregates)
Open Cloud Storage (S3, ADLS, GCS)
Schema / ACID (Delta Lake, Iceberg, Hudi)
(Ingestion Tools)
The Modern Open Lakehouse
Outputs: Customer Facing Applications, Internal Analytics, Downstream Specialized Data Stores
Legacy Stores: ERP, Finance Data
▪ Supports old and new sources
▪ Stored in open storage (open format, reliable and infinitely scalable)
▪ Data management layer for reliability and schema
▪ Multiple layers to support staging through production-grade tables
▪ Agile data application platform that separates compute and code from storage
▪ Internal applications (dashboards, reports, custom apps)
▪ External customer-facing applications (end-to-end model lifecycle, recommendation systems)
▪ Move data downstream to specialized data stores like graph databases, NoSQL, and SQL stores such as MPP or EDWs
▪ Supports structured (tables), semi-structured (logs, JSON), unstructured (images, audio, video) and live (streaming) data
▪ Scalability of the cloud and multi-cloud
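As a rough illustration of the layered tables in the diagram (landing, refined, aggregates), the sketch below pushes a handful of made-up order events through three stages in plain Python. A real lakehouse would run this on a distributed engine over a table format such as Delta Lake; the sample rows and the `to_silver` / `to_gold` helpers are assumptions made for this example.

```python
from collections import defaultdict

# Landing layer: raw events exactly as they arrived (hypothetical sample data).
bronze = [
    {"order_id": "1", "region": "EMEA", "amount": "120.50"},
    {"order_id": "2", "region": "emea", "amount": "79.50"},
    {"order_id": "2", "region": "emea", "amount": "79.50"},   # duplicate delivery
    {"order_id": "3", "region": "APAC", "amount": "n/a"},     # unparseable amount
    {"order_id": "4", "region": "APAC", "amount": "200"},
]


def to_silver(rows):
    """Refine: cast types, normalize values, drop duplicates and bad rows."""
    seen, silver = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine this row instead
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].upper(),
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Aggregate: total order amount per region, ready for reporting."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # prints: {'EMEA': 200.0, 'APAC': 200.0}
```

Keeping the landing data untouched means the refined and aggregate layers can be rebuilt whenever the cleaning rules change, which is what makes the layered design friendly to iteration.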
Modern Data Personas
▪ Great for Data Scientists
  ▪ Data Science is a science: constant evolution through experiments and hypotheses is part of the process
  ▪ Moves data scientists off their laptops with R / Python / SAS and toward secure and scalable compute
  ▪ Data scientists often need access to the raw or bronze transaction data for modeling, and that data is often expensive or hard to justify storing in the EDW, or hard to access and use from Hadoop
▪ Great for Data Engineering
  ▪ Data Engineers are developers
  ▪ Write code in standard programming languages (Java, Scala, Python), not proprietary stored procedures
  ▪ They should write high-quality production code that is testable, reusable and modular and can be continuously integrated and deployed (CI/CD)
▪ Great for Data Analysts
  ▪ Data Analysts want more data and they want data faster
  ▪ SQL skills are expected, and even some light Python or R for advanced analytics
A Lakehouse is a Hybrid that supports the Modern Data Scientist, Data Engineer and Data Analyst
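One way to read "testable, reusable and modular" is to keep each transformation a small pure function with its own test that a CI job can run. The sketch below is a minimal example of that style; `add_revenue` and its column names are hypothetical and not taken from the talk.

```python
def add_revenue(rows, price_key="unit_price", qty_key="quantity"):
    """Return new rows with a 'revenue' field; the input rows are not modified."""
    return [{**row, "revenue": row[price_key] * row[qty_key]} for row in rows]


def test_add_revenue():
    rows = [{"sku": "A", "unit_price": 2.5, "quantity": 4}]
    result = add_revenue(rows)
    assert result == [
        {"sku": "A", "unit_price": 2.5, "quantity": 4, "revenue": 10.0}
    ]
    assert "revenue" not in rows[0]  # no side effects on the input


# In a CI pipeline a test runner would discover and run this; here it is called directly.
test_add_revenue()
```

Because the function takes plain rows in and returns plain rows out, the same logic can be unit tested on a laptop and reused inside a larger pipeline.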
Why Cloud?
▪ Agile infrastructure that is infinitely scalable
  ▪ Separates compute from storage (scale compute as needed, scale storage without thinking about it)
  ▪ Infrastructure as code and part of the CI/CD process
  ▪ No need to hard-code to the infrastructure for deployment
▪ Reliable, fault tolerant and recoverable
  ▪ Pipelines run independently of the compute, so server outages don’t stop production pipelines
  ▪ Can handle cases where a node or two fails but the job continues, because failure is inevitable at scale
  ▪ If a job does fail, the integrity of the data is not compromised and you can recover
▪ Portable
  ▪ Portable across different types of compute
  ▪ Portable across different clouds
Cloud brings agility to Data Applications when done right
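The recoverability point (a failed job must not corrupt existing data) can be illustrated with an all-or-nothing commit. The sketch below stands in for that idea using a local temporary file and an atomic rename; transactional table formats achieve a similar guarantee by other means. The `commit_atomically` helper, the file name and the records are hypothetical.

```python
import json
import os
import tempfile


def commit_atomically(records, target_path):
    """Write to a temporary file, then rename it over the target in one step."""
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            for record in records:
                handle.write(json.dumps(record) + "\n")  # raises on bad input
        os.replace(tmp_path, target_path)  # atomic when both paths share a filesystem
    except Exception:
        os.remove(tmp_path)  # discard the partial output; the target is untouched
        raise


def read_records(path):
    """Load one JSON document per line."""
    with open(path) as handle:
        return [json.loads(line) for line in handle]


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "daily_totals.jsonl")

# First run succeeds and publishes its output.
commit_atomically([{"day": "2021-05-01", "total": 10}], target)

try:
    # object() is not JSON serializable, so this run fails part-way through.
    commit_atomically([{"day": "2021-05-02", "total": 20}, {"bad": object()}], target)
except TypeError:
    pass  # the job failed, but readers still see the last good version

print(read_records(target))  # prints: [{'day': '2021-05-01', 'total': 10}]
```

Readers only ever see the previous complete output or the new complete output, so a failed run can simply be retried.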
What about Data Mesh?
Data Mesh is an architectural pattern introduced by Zhamak Dehghani of Thoughtworks in “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh”
• Data is a product
• Data is a business asset
• Data should be monetized, otherwise it becomes a liability
• Data belongs to decentralized domains or product owners
• Each team is self-managed
• But the governance and standards are centralized to allow for interoperability and data sharing
• Sounds a lot like the Hybrid Agile + Lakehouse in the Cloud approach!
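The split between decentralized ownership and centralized standards can be pictured as each domain publishing its own data product while one shared set of rules checks it. The sketch below is a hypothetical illustration of that idea only; the `DataProduct` class, the required metadata keys and the allowed formats are invented for the example and do not come from the Data Mesh article.

```python
from dataclasses import dataclass, field

# Hypothetical central standards shared by every domain.
REQUIRED_METADATA = {"owner", "description", "format"}
ALLOWED_FORMATS = {"delta", "parquet"}


@dataclass
class DataProduct:
    """A dataset owned and published by one domain team."""
    domain: str
    name: str
    metadata: dict = field(default_factory=dict)

    def violations(self):
        """List the central standards this product does not yet meet."""
        problems = [
            f"missing metadata: {key}"
            for key in sorted(REQUIRED_METADATA - set(self.metadata))
        ]
        fmt = self.metadata.get("format")
        if fmt is not None and fmt not in ALLOWED_FORMATS:
            problems.append(f"unsupported format: {fmt}")
        return problems


sales = DataProduct(
    "sales", "orders",
    {"owner": "sales-team", "description": "Order facts", "format": "delta"},
)
marketing = DataProduct(
    "marketing", "campaigns",
    {"owner": "mkt-team", "format": "csv"},
)

print(sales.violations())      # prints: []
print(marketing.violations())  # prints: ['missing metadata: description', 'unsupported format: csv']
```

Each team stays free to build its product its own way, and the shared check is what keeps the products interoperable.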
Lakehouse Technology Choices
Databricks
▪ Cloud-native (separates compute from storage, autoscaling, cost management)
▪ Multi-cloud (AWS, Azure, Google)
▪ Open formats (Delta Lake, Parquet, Avro, JSON)
▪ Open source (Scala, Python, SQL, R, Spark)
▪ Machine Learning and Data Science out of the box (Notebooks, Jupyter, MLflow)
▪ Supports agile with IDE integration and Projects in the Workspace
▪ Production apps with DBConnect and SQLAnalytics or JDBC
Cloud Provider
▪ Cloud Hadoop (EMR, HDI, Dataproc)
▪ Use cloud storage (S3, ADLS, GCS)
▪ Query your Data Lake directly (Redshift Spectrum / Athena, Azure Synapse, or BigQuery External Tables)
▪ Connect your choice of Notebook for Exploratory Data Analysis
▪ Connect your choice of MLOps tool (SageMaker, Azure Machine Learning, Google Cloud AI Platform)
▪ Productionize apps using containers and managed k8s
Do It Yourself
▪ Scalable Object Storage (on-premises or cloud)
▪ Scalable Compute (virtualization, k8s, OpenStack, cloud, Mesos)
▪ Distributed compute framework (Hadoop or Open Source Spark)
▪ A query engine (Trino / Presto, SparkSQL)
▪ Notebook for EDA (Jupyter, Zeppelin, Domino)
▪ MLOps (Open Source MLflow, Dataiku)
▪ Productionize apps using containers and virtualization provider (k8s)
Why build your Agile Data Applications in a Lakehouse?
Data Warehouse or First Gen Data Lake
▪ Often have to pay more for storage and over-provision your compute
▪ Rework and change are expensive; not built for agility
▪ Data is monolithic, making it hard to support Data Mesh and self-managed data domains
Agile Data Applications in Lakehouse
▪ Only pay for what you use (lower TCO)
▪ Agility and change are part of the Data Application Lifecycle
▪ Easily supports Data Applications per project, team or domain, which fits the Data Mesh paradigm

Weitere ähnliche Inhalte

Was ist angesagt?

Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptxWasm1953
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...HostedbyConfluent
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Let’s get to know Snowflake
Let’s get to know SnowflakeLet’s get to know Snowflake
Let’s get to know SnowflakeKnoldus Inc.
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionJames Serra
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptxchennakesava44
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 

Was ist angesagt? (20)

Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Why Data Vault?
Why Data Vault? Why Data Vault?
Why Data Vault?
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Let’s get to know Snowflake
Let’s get to know SnowflakeLet’s get to know Snowflake
Let’s get to know Snowflake
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
data warehouse vs data lake
data warehouse vs data lakedata warehouse vs data lake
data warehouse vs data lake
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 

Ähnlich wie Architecting Agile Data Applications for Scale

Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketDremio Corporation
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeeling Cheung
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonJeffrey T. Pollock
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with HadoopPrecisely
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantagePrecisely
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...MapR Technologies
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which DataWorks Summit
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeMicrosoft
 
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...Pentaho
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudDataWorks Summit
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationDenodo
 
AGIT 2015 - Hans Viehmann: "Big Data and Smart Cities"
AGIT 2015  - Hans Viehmann: "Big Data and Smart Cities"AGIT 2015  - Hans Viehmann: "Big Data and Smart Cities"
AGIT 2015 - Hans Viehmann: "Big Data and Smart Cities"jstrobl
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?samthemonad
 

Ähnlich wie Architecting Agile Data Applications for Scale (20)

Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lon
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLake
 
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
AGIT 2015 - Hans Viehmann: "Big Data and Smart Cities"
AGIT 2015  - Hans Viehmann: "Big Data and Smart Cities"AGIT 2015  - Hans Viehmann: "Big Data and Smart Cities"
AGIT 2015 - Hans Viehmann: "Big Data and Smart Cities"
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?
 

Mehr von Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Architecting Agile Data Applications for Scale

  • 1. Architecting Agile Data Applications for Scale Richard Garris AVP, Field Engineering @ Databricks
  • 2. Agenda ▪ About Me ▪ The World’s Most Valuable Companies ▪ Waterfall to Agile ▪ Traditional Data Platforms ▪ Modern Data Platforms ▪ Summary
  • 3. About Me Ohio State and CMU Graduate Almost 2 decades in the data space IMS Database IBM Mainframe ➤ Oracle / SQL Server ➤ Big Data (Hadoop, Spark) • 3 years as an independent consultant • 5 years at PwC in the Data Management practice • 3.5 years at Google on their data team • 6 years at Databricks Certified Scrum Master Other Talks • Apache Spark and Agile Model Development on Data Science Central • https://vimeo.com/239762464 • ETL 2.0: Data Engineering using Azure Databricks and Apache Spark on MSDN Channel 9 • https://channel9.msdn.com/Events/Connect/2017/E108
  • 4. The World’s Most Valuable Companies (in Billions) Top 15 Companies in 2021 Top 15 Companies in 1993 Source: Data is Beautiful https://www.youtube.com/channel/UCkWbqlDAyJh2n8DN5X6NZyg Source: MGMResearch https://mgmresearch.com/top-us-companies-by-market-cap/
  • 5.
  • 6. What do the top five companies do differently?
  • 7. What do the top five companies do differently?
      ▪ Lots and lots of data? Not really. FAAMG may have some unique datasets (e.g. 2.7 B user profiles, search results), but the other Fortune 500, commercial, mid-market, and digital native companies and public sector organizations have a lot of data too!
      ▪ Better AI, DL, ML algorithms? Not really. They did at one point in time, but many of the AI, DL and ML algorithms are available in open source (TensorFlow, PyTorch, MXNet, LightGBM) or have been released in research papers.
      ▪ Better data processing? Not really. At one point, Google, Amazon, Microsoft, Facebook and Apple had the best infrastructure in the world to process data, but public cloud gives everyone access to most of that. There is also open source and commercial software available to anyone who wants to process Big Data at scale.
  • 8. Any guesses as to what these numbers are? 36,000, 60,000, 90,000, 20,000, 60,000: Number of Engineers* (*Estimated using Glassdoor, public job postings, and financials (R&D spend as a % of total FTE). No confidential information was used to derive these values, and the exact number of engineers is not public information.)
  • 9. What do engineers bring to the modern enterprise? Agile Application Development Lifecycle
  • 10. Startups don’t change the world, they adapt to the world faster than everyone else -Jacques Benkoski, USVP
  • 11. But what does this have to do with Data Applications? These companies also brought Agile to Data Applications and that’s what makes them competitive!
  • 12. What are Agile Data Applications?
      ▪ Self-contained, end-to-end projects that solve a data problem
      ▪ Built by data developers using open source programming languages
      ▪ Follow good software engineering principles
      ▪ Can leverage algorithms and analytics
      ▪ Scalable both in terms of big data and total cost of ownership
      ▪ Meet the responsiveness requirements of end users
      ▪ Deployable into a production environment
  • 13. The Waterfall Development Methodology (1990 to early 2000s): Concept & Requirements ➤ Analysis & Design ➤ Develop & Implement ➤ Test & QA ➤ Deploy & Maintain. One stage always follows the previous one, and it’s hard to accommodate changes.
  • 14. Traditional Data Architecture (worked well with Waterfall), 1990 to early 2000s
      ▪ Diagram: Operational Systems (Inventory, Sales, Purchasing, ERP) ➤ Staging Area ➤ Enterprise Data Warehouse ➤ Data Marts (Sales Data, Finance Data) ➤ Users / Analysis / Predictive Analytics, connected by ETL (Extract, Transform and Load)
      ▪ DBAs: Data Modeling, Database Admin & Security, Tuning
      ▪ ETL Designers: Mostly work in GUI ETL Tools, SQL / Stored Procs (CTAS)
      ▪ BI Analysts: Limited to BI tools / Report Designers, Limited SQL
      ▪ Business: Define the Requirements, Domain Experts
      ▪ Most operational systems only housed structured data with small data volumes
      ▪ GUI tools (Informatica, ODI, Ab Initio, DataStage), SQL or Stored Procedures
      ▪ Large volumes of data often kept in unmanaged staging tables that were often archived to save cost
      ▪ The “single source of truth” was the monolithic data warehouse
      ▪ Inflexible model: adding a single column for a report or model downstream could take 6 months because of the tight coupling from ETL to staging to EDW to report
      ▪ Didn't work well for machine intelligence and AI / ML. Data mining was mostly used for R&D and limited to the refined and aggregated data.
      ▪ DBAs, ETL Designers, and BI Analysts dominate the Traditional Approach, with no real data engineers or data scientists
      ▪ Most EDWs were sold as expensive appliances with data locked into a proprietary format with combined compute and storage
      ▪ The only way to scale out is to buy more appliances
      ▪ Minimal support for arbitrary files, semi-structured, unstructured or streaming sources
      ▪ Worked well for human intelligence like static reports and dashboards
  • 15. Pure Agile Development Methodology Mid 2000s – early 2010s • Agile Manifesto • Agile introduced change as part of the process • Early versions of Agile (Scrum and XP) worked well for small self-managed teams • It didn’t scale well to larger teams and the needs of larger enterprises • It also lacked some of the discipline of Waterfall
  • 16. Open Data Lake Architecture (like pure Agile), mid 2000s to early 2010s
      ▪ Diagram labels: Hadoop Data Lake (HDFS, Map Reduce, Spark, Hive, Mahout, YARN Scheduler), Enterprise Data Warehouse; sources: Machine Data, CRM, Finance Data, New Sources, Geospatial, Sensor / Logs, Clickstream Data, ERP, Finance Data
      ▪ Hadoop Admin: Administer the Cluster; Manage HDFS, YARN, Applications; Tuning
      ▪ Hadoop Dev: Map Reduce, Pig, Spark, Cascading, Mahout …; Java Developers
      ▪ Analysts: Hive, Impala, Drill, LLAP (or BI tools)
      ▪ Supports new sources like web scale data, SaaS sources, and operational systems with structured data (sequence files, Parquet), semi-structured data (logs, JSON) and unstructured data (images, audio, video) because everything is a file
      ▪ Distributed file system built on commodity servers
      ▪ Could handle high volumes, velocity, and variety of data
      ▪ Applications could be written and deployed inside Hadoop using YARN in Java, Scala, Python, Hive (SQL), Pig, and Mahout for ML
      ▪ Commodity servers used to scale out compute for analytics
      ▪ Initially cheaper because you used commodity servers versus specialized hardware like with an EDW, but because compute and storage were paired together you had to buy more servers for storage even if you didn’t need more compute
      ▪ Mixed bag on performance: allowed scale out of compute resources, but tuning Hadoop and YARN as well as query engines like Impala, Hive, and Hive variants like Hive LLAP is difficult
      ▪ Schema on read versus schema on write created a ton of agility, but the lack of schema enforcement and reliability of the data became an issue at scale (hence the Data Lake becoming a Data Swamp)
      ▪ Still had some monolithic attributes that are a better fit for waterfall (e.g. because all of the applications run inside Hadoop, you have to upgrade all your applications when you upgrade the cluster)
      ▪ The goal and promise of Hadoop was to offload or replace the EDW, but that didn’t really happen
      ▪ Required specialized people to manage and develop on Hadoop (admins, trained developers) and was ultimately difficult with so many specialized, divergent frameworks (MapReduce, Tez, Hive, SQL on Hadoop, Spark, Flink, Storm, Mahout, Cascading)
      ▪ Analysts and Business Users don’t concern themselves with the infrastructure, so they were shielded from the complexity, but they would complain if SLAs weren’t being met and would fall back to the EDW
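The schema-on-read trade-off described on slide 16 can be illustrated with a small sketch. It uses only the Python standard library; the `EXPECTED_SCHEMA` mapping, the sample records, and the helper names `violations`, `write_schema_on_read` and `write_with_enforcement` are hypothetical and are not part of Hadoop or any lake format. With schema on read, every record lands and problems surface only at query time; with enforcement at write time, malformed records are set aside before they reach the table.

```python
import json

# Hypothetical expected schema: column name -> required Python type.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "region": str}

# Hypothetical raw input lines, one JSON document per line.
RAW_LINES = [
    '{"order_id": 1, "amount": 19.99, "region": "EMEA"}',
    '{"order_id": "2", "amount": 5.0, "region": "APAC"}',  # wrong type
    '{"order_id": 3, "region": "AMER"}',                    # missing column
]


def violations(record, schema):
    """Return a list of human-readable schema violations for one record."""
    problems = []
    for column, expected_type in schema.items():
        if column not in record:
            problems.append(f"missing column {column}")
        elif not isinstance(record[column], expected_type):
            problems.append(f"{column} is not {expected_type.__name__}")
    return problems


def write_schema_on_read(lines):
    """Schema on read: store every parsed record, valid or not."""
    return [json.loads(line) for line in lines]


def write_with_enforcement(lines, schema):
    """Schema on write: accept only conforming records, set aside the rest."""
    accepted, rejected = [], []
    for line in lines:
        record = json.loads(line)
        (rejected if violations(record, schema) else accepted).append(record)
    return accepted, rejected


lake = write_schema_on_read(RAW_LINES)
accepted, rejected = write_with_enforcement(RAW_LINES, EXPECTED_SCHEMA)
print(len(lake), len(accepted), len(rejected))
```

In this sketch the unenforced path keeps all three records, while the enforced path keeps one and sets two aside, which is the kind of gap that tends to grow unnoticed at scale.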
  • 17. Modern Agile (Hybrid, Disciplined Agile Delivery, SAFe) Mid 2010s – Today Source: PMI Institute
  • 18. The Next Hybrid is the Modern Lakehouse Platform (Data Lake + Data Warehouse), late 2010s to 2020s and beyond
      ▪ Diagram sources: Machine Data, CRM, Finance Data, New Sources, Geospatial, Sensor / Logs, Clickstream Data, ERP, Finance Data, Legacy Stores (via Ingestion Tools)
      ▪ The Modern Open Lakehouse: BRONZE (Landing) ➤ SILVER (Refined) ➤ DOGECOIN (Aggregates), on Open Cloud Storage (S3, ADLS, GCS) with a Schema / ACID layer (Delta Lake, Iceberg, Hudi)
      ▪ Diagram consumers: Customer Facing Applications, Internal Analytics, Downstream Specialized Data Stores
      ▪ Supports Old and New Sources
      ▪ Stored in Open Storage (Open Format, Reliable and Infinitely Scalable)
      ▪ Data management layer for reliability and schema
      ▪ Multiple layers to support staging to production grade tables
      ▪ Agile data application platform that separates compute and code from storage
      ▪ Internal applications (dashboards, reports, custom apps)
      ▪ External customer facing applications (end-to-end model lifecycle, recommendation systems, customer facing applications)
      ▪ Move data downstream to specialized data stores like graph databases, NoSQL, or SQL stores such as MPP databases or EDWs
      ▪ Supports structured (tables), semi-structured (logs, JSON), unstructured (images, audio, video), and live (streaming) data
      ▪ Scalability of the cloud and multi-cloud
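The landing, refined, and aggregate layers on slide 18 can be sketched in plain Python. This is a minimal illustration and not Delta Lake, Iceberg, or Hudi code: the function names `refine` and `aggregate`, the sample clickstream rows, and the cleaning rules are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw clickstream events as they might land from ingestion.
LANDING_ROWS = [
    {"user": "a", "page": "/home", "ms": "120"},
    {"user": "a", "page": "/home", "ms": "120"},   # exact duplicate
    {"user": "b", "page": "/cart", "ms": "oops"},  # unparseable latency
    {"user": "b", "page": "/home", "ms": "80"},
]


def refine(rows):
    """Landing -> refined: drop duplicates and rows that fail type casting."""
    seen, refined = set(), []
    for row in rows:
        key = (row["user"], row["page"], row["ms"])
        if key in seen:
            continue
        seen.add(key)
        try:
            latency = int(row["ms"])
        except ValueError:
            continue  # a real pipeline would more likely quarantine this row
        refined.append({"user": row["user"], "page": row["page"], "ms": latency})
    return refined


def aggregate(rows):
    """Refined -> aggregates: average latency per page."""
    totals = defaultdict(lambda: [0, 0])  # page -> [sum of ms, row count]
    for row in rows:
        totals[row["page"]][0] += row["ms"]
        totals[row["page"]][1] += 1
    return {page: total / count for page, (total, count) in totals.items()}


refined_rows = refine(LANDING_ROWS)
page_latency = aggregate(refined_rows)
print(page_latency)
```

Keeping the raw landing rows untouched and deriving each later layer from the one before it means a cleaning rule can be changed and the downstream layers rebuilt, which is where the agility of the layered design comes from.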
  • 19. Modern Data Personas
      ▪ Great for Data Scientists
          ▪ Data Science is a science: constant evolution through experiments and hypotheses is part of the process
          ▪ Moves data scientists toward secure and scalable compute and off their laptops with R / Python / SAS
          ▪ Data scientists often need access to the raw or bronze transaction data for modeling, and that data is often expensive or hard to justify storing in the EDW, or hard to get access to and use from Hadoop
      ▪ Great for Data Engineering
          ▪ Data Engineers are developers
          ▪ Write code in standard programming languages (Java, Scala, Python), not proprietary stored procedures
          ▪ They should write high quality production code that is testable, reusable and modular and can be continuously integrated and deployed (CI/CD)
      ▪ Great for Data Analysts
          ▪ Data Analysts want more data and they want data faster
          ▪ SQL skills are expected and even some light Python or R for advanced analytics
      ▪ A Lakehouse is a Hybrid that supports the Modern Data Scientist, Data Engineer and Data Analyst
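As one illustration of the "testable, reusable and modular" point on slide 19, a transformation can be written as a pure function and checked by a unit test that a CI job could run. The function `normalize_customer`, its fields, and the test cases are hypothetical examples, not code from the talk.

```python
import unittest


def normalize_customer(record):
    """Pure transformation: trim and lower-case the email, upper-case the country.

    It takes a dict and returns a new dict, so it can be tested without
    a cluster and reused across pipelines.
    """
    return {
        "customer_id": record["customer_id"],
        "email": record["email"].strip().lower(),
        "country": record["country"].strip().upper(),
    }


class NormalizeCustomerTest(unittest.TestCase):
    def test_normalizes_email_and_country(self):
        raw = {"customer_id": 7, "email": "  Ada@Example.COM ", "country": "gb "}
        self.assertEqual(
            normalize_customer(raw),
            {"customer_id": 7, "email": "ada@example.com", "country": "GB"},
        )

    def test_does_not_mutate_input(self):
        raw = {"customer_id": 7, "email": "A@B.C", "country": "us"}
        normalize_customer(raw)
        self.assertEqual(raw["email"], "A@B.C")


# Run the tests programmatically, as a CI step might.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeCustomerTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the function has no dependency on a cluster or a database, the same tests can run on a laptop and in a build pipeline before the code is deployed.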
  • 20. Why Cloud?
      ▪ Agile infrastructure that is infinitely scalable
          ▪ Separates compute from storage (scale compute as needed, scale storage without thinking about it)
          ▪ Infrastructure as code and part of the CI/CD process
          ▪ No need to hard code to the infrastructure for deployment
      ▪ Reliable, fault tolerant and recoverable
          ▪ Pipeline runs independent of the compute, so server outages don’t stop production pipelines
          ▪ Can handle cases where a node or two fails but the job continues, because failure is inevitable at scale
          ▪ If a job does fail, then the integrity of the data is not compromised and you can recover
      ▪ Portable
          ▪ Portable across different types of compute
          ▪ Portable across different clouds
      ▪ Cloud brings agility to Data Applications when done right
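The recoverability bullets on slide 20 (a failed job should not compromise data integrity, and a rerun should recover) can be sketched with an all-or-nothing publish step. This sketch uses a local temporary directory as a stand-in for cloud storage; `publish_atomically`, `run_with_retry`, and the simulated failure are hypothetical, and real object stores and table formats provide their own commit mechanisms.

```python
import json
import os
import tempfile


def publish_atomically(rows, target_path):
    """Write to a temp file, then rename, so readers never see a partial file."""
    directory = os.path.dirname(target_path)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(rows, handle)
        # Atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def run_with_retry(job, attempts=3):
    """Rerun a failing job; safe because each attempt publishes all or nothing."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return job(attempt)
        except RuntimeError as error:
            last_error = error
    raise last_error


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "daily_sales.json")
publish_atomically([{"day": 1, "total": 10}], target)  # earlier good output


def flaky_job(attempt):
    """Simulated job whose first attempt dies before publishing."""
    if attempt == 1:
        raise RuntimeError("simulated node failure")
    publish_atomically([{"day": 2, "total": 25}], target)
    return attempt


succeeded_on = run_with_retry(flaky_job)
with open(target) as handle:
    final_rows = json.load(handle)
print(succeeded_on, final_rows)
```

The first attempt fails without touching the published file, and the second attempt replaces it in one step, so a reader sees either the old output or the new one and never a partial write.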
  • 21. What about Data Mesh? Data Mesh is an architectural pattern introduced by Zhamak Dehghani of Thoughtworks in “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh”
      • Domain-driven design
      • Data is a product
      • Data is a business asset
      • Data should be monetized, otherwise it becomes a liability
      • Data belongs to decentralized domains or product owners
      • Each team is self-managed
      • But the governance and standards are centralized to allow for interoperability and data sharing
      • Sounds a lot like the Hybrid Agile + Lakehouse in the Cloud approach!
  • 22. Lakehouse Technology Choices
      ▪ Databricks
          ▪ Cloud-native (separates compute from storage, autoscaling, cost management)
          ▪ Multi-cloud (AWS, Azure, Google)
          ▪ Open formats (Delta Lake, Parquet, Avro, JSON)
          ▪ Open source (Scala, Python, SQL, R, Spark)
          ▪ Machine Learning and Data Science out of the box (Notebooks, Jupyter, MLflow)
          ▪ Supports agile with IDE integration and Projects in the Workspace
          ▪ Production apps with DBConnect and SQLAnalytics or JDBC
      ▪ Cloud Provider
          ▪ Cloud Hadoop (EMR, HDI, DataProc)
          ▪ Use cloud storage (S3, ADLS, GCS)
          ▪ Query your Data Lake directly (Redshift Spectrum / Athena, Azure Synapse, or BigQuery External Tables)
          ▪ Connect your choice of Notebook for Exploratory Data Analysis
          ▪ Connect your choice of MLOps tool (SageMaker, Azure Machine Learning, Google Cloud AI Platform)
          ▪ Productionize apps using containers and managed k8s
      ▪ Do It Yourself
          ▪ Scalable Object Storage (on-premise or cloud)
          ▪ Scalable Compute (virtualization, k8s, OpenStack, cloud, Mesos)
          ▪ Distributed compute framework (Hadoop or Open Source Spark)
          ▪ A query engine (Trino / Presto, SparkSQL)
          ▪ Notebook for EDA (Jupyter, Zeppelin, Domino)
          ▪ MLOps (Open Source MLflow, Dataiku)
          ▪ Productionize apps using containers and virtualization provider (k8s)
  • 23. Why build your Agile Data Applications in a Lakehouse
      ▪ Data Warehouse or First Gen Data Lake
          ▪ Often have to pay more for storage and over-provision your compute
          ▪ Rework and change are expensive; not built for agility
          ▪ Data is monolithic, which makes it hard to support Data Mesh and Self Managed Data Domains
      ▪ Agile Data Applications in Lakehouse
          ▪ Only pay for what you use (Lower TCO)
          ▪ Agility and change are part of the Data Application Lifecycle
          ▪ Easily supports Data Applications per Project, Team or Domain, in line with the Data Mesh paradigm
  • 24. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.