The Evolving Landscape of Data Engineering
Bucharest Big Data Meetup @ TechHub
Andrei Savu / @andreisavu
Andrei Savu
Currently Staff Engineer @ Twitter:
* Twitter Ad Exchange Data Team
* Focus on Mobile Monetization
Co-organizer of the Data Engineering Club in San Francisco.
Previously Tech Lead at Cloudera via the Axemblr.com acquisition. Started the Cloud engineering team.
One of the early founders of the Bucharest Java User Group.
Topics
What is data engineering?
The Past / Drivers of innovation:
● OSS communities
● AWS history
● Google Cloud history
The Present: Common Patterns
The Future: Wish List
Where do I start?
What is data engineering? (vs. data science, vs. ML)
“Unlike data scientists — and inspired by
our more mature parent, software
engineering — data engineers build tools,
infrastructure, frameworks, and services. In fact,
it’s arguable that data engineering is much
closer to software engineering than it is to a data
science.”
Maxime Beauchemin
The Rise of the Data Engineer
The Past - OSS
Weeks of Provisioning
Static Infrastructure
Commodity Hardware
Commodity Networking
Data Locality Important
Running in the Public Cloud was unusual
CAPEX
The Past - AWS
Visionary Business
Fast iterations
Data Management as a key platform use case
Incredible Scale
Transition to “serverless”
OPEX & Elastic
The Past - Google Cloud
Visionary Products
Fast iterations
Machine Learning as a key use case
State of the Art data platform
Last 3 years on fast forward
Intelligent Billing
OPEX & Elastic
The Present: Patterns
Weeks to Minutes to Seconds
Hadoop/Spark ecosystem is mature and continues to innovate.
We have a broad set of options.
Big Data is much bigger (e.g. x1e.32xlarge: 3TB mem, 128 vCPUs, 14Gbps network)
Scale continues to be hard.
Cloud economics can be very disruptive (especially for data workloads)
High-performance networks are common.
Storage can be decoupled from compute.
Zone/DC locality is important (laws of physics)
Service Endpoints (not clusters, aka serverless, aka managed etc.).
Sophisticated Auto-scaling (batch & streaming, spot vs. on-demand, multi-az).
Multi-DC and Multi-Region from Day 1.
The Future: Wish List
A Data Catalog product as the center of the
universe.
Data Monitoring Systems:
* statistical properties, anomaly detection,
schema changes, consumption patterns etc.
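To make the monitoring wish more concrete, here is a minimal sketch of two of the listed checks: schema changes and a simple statistical anomaly test. It is not taken from the talk. The function names, the `{column: type}` schema representation, the sample row counts, and the 3-standard-deviation threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def schema_changes(old_schema, new_schema):
    """Compare two {column: type} dicts (a hypothetical schema representation)."""
    added = sorted(set(new_schema) - set(old_schema))
    removed = sorted(set(old_schema) - set(new_schema))
    retyped = sorted(
        col for col in set(old_schema) & set(new_schema)
        if old_schema[col] != new_schema[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


def is_anomalous(history, today, threshold=3.0):
    """Flag today's value if it lies more than `threshold` sample standard
    deviations away from the historical mean (a basic z-score test)."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


# Hypothetical daily row counts for one table.
row_counts = [1000, 1020, 990, 1010, 1005]
print(is_anomalous(row_counts, 1008))  # close to the usual volume
print(is_anomalous(row_counts, 200))   # a sudden drop

print(schema_changes(
    {"id": "int", "ts": "string"},
    {"id": "int", "ts": "timestamp", "country": "string"},
))
```

A real monitoring system would likely track many more properties (null rates, distinct counts, freshness) and use more robust statistics; the sketch only shows the shape of the checks.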
More intelligence at the data infrastructure level:
* data format migrations, intelligent caching
based on access patterns.
Declarative data transformation vs. explicit ETL.
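One way to read the contrast between declarative transformation and explicit ETL is sketched below, using only the Python standard library. The event data, table name, and function names are hypothetical. The explicit version spells out how to loop and accumulate, while the declarative version states the desired result in SQL and leaves the execution plan to the engine (here an in-memory SQLite database).

```python
import sqlite3
from collections import defaultdict

# Hypothetical ad-click events: (campaign, cost_in_cents).
events = [("a", 120), ("b", 80), ("a", 30), ("c", 50), ("b", 20)]


def explicit_etl(rows):
    """Explicit style: the code describes each step of the aggregation."""
    totals = defaultdict(int)
    for campaign, cost in rows:
        totals[campaign] += cost
    return sorted(totals.items())


def declarative_sql(rows):
    """Declarative style: describe the result and let the engine plan it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (campaign TEXT, cost INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT campaign, SUM(cost) FROM events "
        "GROUP BY campaign ORDER BY campaign"
    ).fetchall()
    conn.close()
    return result


print(explicit_etl(events))
print(declarative_sql(events))
```

Both functions return the same totals per campaign. The difference is who owns the "how": in the declarative form the engine is free to change the execution strategy without the query changing.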
Intelligent data sampling products. Cost will continue to be a concern even when scale is not.
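As a baseline for what a sampling product would have to improve on, the sketch below shows reservoir sampling (Algorithm R), a well-known technique for drawing a fixed-size uniform sample from a stream of unknown length in a single pass. It is an illustration, not something the talk prescribes. The function name and the `seed` parameter are assumptions.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable,
    reading it once and keeping only k items in memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(100_000), k=5, seed=42)
print(len(sample))
```

Because only `k` items are held at any time, the cost of the sample is bounded regardless of how large the input grows, which is the property the cost remark above points at.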
Where do I start?
Technologies:
● SQL + Python
● Pandas + Numpy
● Jupyter or Zeppelin
● Spark
Google Cloud:
● https://www.coursera.org/specializations/gcp-data-machine-learning ($300 credit)
Domain Knowledge:
● Critical business questions
● The data needed to answer them
● Understand access patterns
Thanks! Questions?
