Lukáš Vereš May 3, 2021
Webinar:
Building big data pipelines: Lessons learned
About me
› Lead big data delivery projects; act as
a delivery lead and solution architect
› Projects focused on the creation of
pipeline generation frameworks and
data delivery for customers
› Experience from big pharma and
finance-related companies
› 10+ years of experience in the industry
Lukáš Vereš
Delivery lead for big
data projects
PROFINIT
Our competencies
Company stats
SOFTWARE DEVELOPMENT
APPLICATION OUTSOURCING
ENTERPRISE INTEGRATION
BUSINESS INTELLIGENCE/DWH
BIG DATA AND DATA SCIENCE
22+ yrs.
On the
tech market
since 1998.
Prague
Headquarters
at the centre of
Europe.
500+
Experienced
and enthusiastic
professionals.
Top 3
CAD company
in the Czech Republic
(IDC study).
26M €
Company
revenue in
2019.
Multiple areas
Clients from
Finance, Insurance
and Telco industries.
50+
We serve many
prominent world
clients
Certifications, culture & quality
A long history of technical engineering excellence has
led Western companies to rely heavily on skills and
expertise from the Czech Republic. We are proud of the quality
of our services and of the certificates ISO 9001, ISO 27001,
ISO 20000 and PRINCE2, which underpin our commitment
to providing high-quality, sustainable services.
ISO 9000 ISO 27000 ISO 20000
What am I going to talk about?
Today’s topics
› Personal experience
› Lessons learned
› Big data
What is big data?
The Original Vs
› Velocity
› Volume
› Variety
The business perspective on big data
› Validity
– Is the system under development or is it stable?
– Are data secure and can you trust them?
– Are the data compliant with laws and regulatory policies like the GDPR and CCPA?
› Value
– Set your value objectives, then measure progress against the chosen metrics
› Visualisation
– All data flows and processes need to be monitored and illustrated descriptively
– Understand what is actually being carried out and how
Technical perspective on big data
› Data storage
– Data storage systems: HDFS, GlusterFS, etc.
– File and table formats with internal structure: Avro, Parquet, Delta Lake, etc.
› Data processing
– Data transformation: Spark, MapReduce, etc.
– SQL engines in the Hadoop ecosystem: Impala, Presto, Hive, Phoenix on HBase
› Open technologies
– Free access with the option to get support for special components
› Big data volume
– Datasets bigger than 1 TB, hundreds of datasets or more from one source, different file formats
Poll
First lesson:
“Measure twice but really cut just
once when it comes to selecting
technologies”
Real life stories
› Implemented two different
architectures and toolsets
for the same purpose
› Added one more reporting
tool to the pile
Benefits of multiple technologies
› Two different solutions for the same thing
create a competitive environment
› New ideas are created to try to differentiate
› People get the chance to decide which
technology they prefer to work with
Downsides of multiple technologies
› Choosing a toolset can add more work to onboarding projects
› Less transparency in decision-making
› Every tool has to be supported,
adding complexity to the infrastructure
Key Takeaways
› Do your work and write down any discrepancies and recommendations
› Be transparent in your decision-making
› IT teams need to support business teams by teaching them how to use
existing tools
Second lesson:
“Communication works for those
who work at it”
Real life stories
› The development team relocated a support
team member to work with them on product
development
› A member of the support team had
a workstation next to the developers
› The data warehouse team struggled to
understand the impact of a data lake
on their transformations
Benefits of Intensive Collaboration
› Helps the support team understand the technology and improves
constructive discussion
› Involvement of the other team can improve the quality of the delivery
› Decreases frustration in teams where there are misunderstandings
Downsides of Intensive Collaboration
› It consumes the capacity of the support team member
› Not everyone is both a good learner and a good teacher
Key Takeaways
› Invest in teams, not only in individuals
› It is a process, not a one-time experience;
it will take time to evolve
Poll
Third lesson:
“Think about data as a commodity”
Real life stories
› Too many data sources
to load
– The goal was to speed up
the pipeline delivery
process
– Difficulties with changing
architecture
› Developed a solution for
generating data ingest pipelines
– A semi-manual self-service
approach to speeding up delivery
– Built on the AWS platform as
serverless architecture
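The idea of generating ingest pipelines from a description of each source can be illustrated with a minimal sketch. The webinar does not describe the framework's internals, so everything below is an assumption made for illustration: the `SourceConfig` fields, the step names and the `generate_pipeline` helper are hypothetical, and no AWS service is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical declarative description of one data source."""
    name: str
    file_format: str            # e.g. "csv" or "parquet"
    target_table: str
    required_columns: tuple = ()


def generate_pipeline(source):
    """Turn a source description into an ordered list of ingest steps.

    Every source gets the same standardized steps; only the parameters differ.
    """
    steps = [
        {"step": "extract", "source": source.name, "format": source.file_format},
    ]
    # Add a validation step only when the config lists required columns.
    if source.required_columns:
        steps.append({"step": "validate", "columns": list(source.required_columns)})
    steps.append({"step": "load", "table": source.target_table})
    return steps


# Hypothetical sources; onboarding a new one means adding one config entry.
sources = [
    SourceConfig("crm_export", "csv", "lake.crm_customers", ("customer_id",)),
    SourceConfig("web_events", "parquet", "lake.web_events"),
]

pipelines = {s.name: generate_pipeline(s) for s in sources}
for name, steps in pipelines.items():
    print(name, [s["step"] for s in steps])
```

In a design like this, the self-service part is the config entry and the generator stays the same for every source, which is one plausible reading of "semi-manual".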
Benefits of Data Pipelines
› Speeding up pipeline development can increase data scientists’
and data analysts’ interest in getting data in a more standardized
way
› Standardization of data ingest increases the quality of the work
done by data scientists and data analysts
› Serverless architecture
Downsides of Data Pipelines
› Managing the lifecycle of data sources
› It takes time to build it
› Transparent costs
Key Takeaways
› Build a framework for pipeline generation; it will pay
off in the long run
– Save money on support
– A unified, standardized way to give analytics teams access
› Think about how to load datasets to target systems
faster
– Open new possibilities for business customers
– Give even very small customers without big budgets access
to data
Fourth lesson:
“Involve business from the beginning”
Real life stories
› Implemented framework
from scratch without
involving the business
side from the beginning
› Loading all datasets into the
data lake to prepare them for the
data analyst or data scientist
was the wrong dogma
Benefits of Late Business Involvement
› Gives developers space to focus on technology and ideas
› Developers can try new things, even ones that lead to dead ends
› Loading all datasets ahead of time gets data closer to the data
scientists and data analysts so it is ready anytime they need it
Downsides of Late Business Involvement
› If developers have more space, they might build
a solution that doesn’t fit real use cases
› Source systems keep changing, and the business
has to pay to support these changes and
their impact
Key Takeaways
› Think about when the right time is to involve people from
the business side
– The business can give developers space to work on a framework,
but at the same time, they should provide specific use cases
› The dogma of loading all datasets in advance is wrong
– Higher costs for pipeline support
– Frustration from fixing issues on pipelines that no one uses
– The focus is on fixing bugs instead of delivering higher quality
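One way to act on this takeaway is to check which pipelines are actually used before paying to support them. The sketch below is only an illustration and does not come from the webinar: the `last_read` registry, the 90-day threshold and the `find_unused_pipelines` helper are all assumptions.

```python
from datetime import date, timedelta


def find_unused_pipelines(last_read, today, max_idle_days=90):
    """Return names of pipelines whose output has not been read recently.

    last_read maps a pipeline name to the date its dataset was last read,
    or None if nobody has ever read it.
    """
    cutoff = today - timedelta(days=max_idle_days)
    unused = [
        name
        for name, read_on in last_read.items()
        if read_on is None or read_on < cutoff
    ]
    return sorted(unused)


# Hypothetical usage data; in practice this might come from access logs.
last_read = {
    "crm_export": date(2021, 4, 28),
    "legacy_billing": date(2020, 6, 1),
    "web_events": None,
}

candidates = find_unused_pipelines(last_read, today=date(2021, 5, 3))
print(candidates)
```

Pipelines flagged this way are candidates for a conversation with the business side about a concrete use case, not for automatic removal.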
Summary
Lessons to be learned from this presentation
› Be constructive and honest when choosing technologies
› Have people work with different teams
› Deliver datasets from source systems faster
› Create solutions around business use cases
Profinit EU, s.r.o.
Tychonova 2, 160 00 Prague 6 | Phone +420 224 316 016
Web
www.profinit.eu
LinkedIn
linkedin.com/company/profinit
Twitter
twitter.com/Profinit_EU
Facebook
facebook.com/Profinit.EU
Youtube
Profinit EU
Thank you
for your attention
We need your help to be better!
› Since you are here, please help us
improve our events and webinars
by taking a look at our short survey.
We appreciate your interest in helping
us grow. www.bigdataforbanking.com
linkedin.com/company/profinit
www.profinit.eu
› Contacts
Lukáš Vereš
lukas.veres@profinit.eu
Delivery lead for big data projects

Weitere ähnliche Inhalte

Was ist angesagt?

Cloud Computing 101 Workshop Sample
Cloud Computing 101 Workshop SampleCloud Computing 101 Workshop Sample
Cloud Computing 101 Workshop SampleAlan Quayle
 
Edge AI Framework for Healthcare Applications
Edge AI Framework for Healthcare ApplicationsEdge AI Framework for Healthcare Applications
Edge AI Framework for Healthcare ApplicationsDebmalya Biswas
 
“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...
“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...
“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...Edge AI and Vision Alliance
 
Towards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICETowards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICEPooyan Jamshidi
 
“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLASLA-Ready Network
 
Mr. Hesham Rasmy's presentation at QITCOM 2011
Mr. Hesham Rasmy's presentation at QITCOM 2011Mr. Hesham Rasmy's presentation at QITCOM 2011
Mr. Hesham Rasmy's presentation at QITCOM 2011QITCOM
 
RECAP Project Overview
RECAP Project OverviewRECAP Project Overview
RECAP Project OverviewRECAP Project
 
Cloud migration
Cloud migration Cloud migration
Cloud migration deszal
 
Capella Days 2021 | An example of model-centric engineering environment with ...
Capella Days 2021 | An example of model-centric engineering environment with ...Capella Days 2021 | An example of model-centric engineering environment with ...
Capella Days 2021 | An example of model-centric engineering environment with ...Obeo
 
PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...
PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...
PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...Cedar Consulting
 
NCMS UberCloud Experiment Webinar .
NCMS UberCloud Experiment Webinar .NCMS UberCloud Experiment Webinar .
NCMS UberCloud Experiment Webinar .hpcexperiment
 
Identifying Workloads to Move to the Cloud
Identifying Workloads to Move to the CloudIdentifying Workloads to Move to the Cloud
Identifying Workloads to Move to the CloudRightScale
 
Engineering Simulation Meets the Cloud
Engineering Simulation Meets the CloudEngineering Simulation Meets the Cloud
Engineering Simulation Meets the Cloudhpcexperiment
 
Cloud Computing Tutorial - Jens Nimis
Cloud Computing Tutorial - Jens NimisCloud Computing Tutorial - Jens Nimis
Cloud Computing Tutorial - Jens NimisJensNimis
 
The Cloud Presentation 2016
The Cloud Presentation 2016The Cloud Presentation 2016
The Cloud Presentation 2016Joel Kline
 

Was ist angesagt? (20)

Cloud Computing 101 Workshop Sample
Cloud Computing 101 Workshop SampleCloud Computing 101 Workshop Sample
Cloud Computing 101 Workshop Sample
 
Edge AI Framework for Healthcare Applications
Edge AI Framework for Healthcare ApplicationsEdge AI Framework for Healthcare Applications
Edge AI Framework for Healthcare Applications
 
“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...
“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...
“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...
 
Towards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICETowards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICE
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA
 
Mr. Hesham Rasmy's presentation at QITCOM 2011
Mr. Hesham Rasmy's presentation at QITCOM 2011Mr. Hesham Rasmy's presentation at QITCOM 2011
Mr. Hesham Rasmy's presentation at QITCOM 2011
 
RECAP Project Overview
RECAP Project OverviewRECAP Project Overview
RECAP Project Overview
 
Cloud migration
Cloud migration Cloud migration
Cloud migration
 
Coud discovery chap 10
Coud discovery chap 10Coud discovery chap 10
Coud discovery chap 10
 
Capella Days 2021 | An example of model-centric engineering environment with ...
Capella Days 2021 | An example of model-centric engineering environment with ...Capella Days 2021 | An example of model-centric engineering environment with ...
Capella Days 2021 | An example of model-centric engineering environment with ...
 
PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...
PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...
PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...
 
NCMS UberCloud Experiment Webinar .
NCMS UberCloud Experiment Webinar .NCMS UberCloud Experiment Webinar .
NCMS UberCloud Experiment Webinar .
 
Identifying Workloads to Move to the Cloud
Identifying Workloads to Move to the CloudIdentifying Workloads to Move to the Cloud
Identifying Workloads to Move to the Cloud
 
Engineering Simulation Meets the Cloud
Engineering Simulation Meets the CloudEngineering Simulation Meets the Cloud
Engineering Simulation Meets the Cloud
 
Cloud Computing Tutorial - Jens Nimis
Cloud Computing Tutorial - Jens NimisCloud Computing Tutorial - Jens Nimis
Cloud Computing Tutorial - Jens Nimis
 
Cloud migration
Cloud migration Cloud migration
Cloud migration
 
The Cloud Presentation 2016
The Cloud Presentation 2016The Cloud Presentation 2016
The Cloud Presentation 2016
 
Cloud technology for hospitality
Cloud technology for hospitalityCloud technology for hospitality
Cloud technology for hospitality
 
Coud discovery chap 9
Coud discovery chap 9Coud discovery chap 9
Coud discovery chap 9
 

Ähnlich wie Building big data pipelines—lessons learned

Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerMicrosoft
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Databricks
 
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Denodo
 
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Soujanya V
 
A Successful Data Strategy for Insurers in Volatile Times (EMEA)
A Successful Data Strategy for Insurers in Volatile Times (EMEA)A Successful Data Strategy for Insurers in Volatile Times (EMEA)
A Successful Data Strategy for Insurers in Volatile Times (EMEA)Denodo
 
Understand your data dependencies – Key enabler to efficient modernisation
 Understand your data dependencies – Key enabler to efficient modernisation  Understand your data dependencies – Key enabler to efficient modernisation
Understand your data dependencies – Key enabler to efficient modernisation Profinit
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)Denodo
 
Webinar: The 5 Most Critical Things to Understand About Modern Data Integration
Webinar: The 5 Most Critical Things to Understand About Modern Data IntegrationWebinar: The 5 Most Critical Things to Understand About Modern Data Integration
Webinar: The 5 Most Critical Things to Understand About Modern Data IntegrationSnapLogic
 
GHD iConnect - our intranet for the future
GHD iConnect - our intranet for the futureGHD iConnect - our intranet for the future
GHD iConnect - our intranet for the futureMaree Courts
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceData Science Milan
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...Denodo
 
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced AnalyticsADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced AnalyticsDATAVERSITY
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?Denodo
 
Webinar on Big Data Challenges : Presented by Raj Kasturi
Webinar on Big Data Challenges : Presented by Raj KasturiWebinar on Big Data Challenges : Presented by Raj Kasturi
Webinar on Big Data Challenges : Presented by Raj KasturioGuild .
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Precisely
 
Decision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great DataDecision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great DataDLT Solutions
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Denodo
 

Ähnlich wie Building big data pipelines—lessons learned (20)

Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringer
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
 
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
 
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
 
A Successful Data Strategy for Insurers in Volatile Times (EMEA)
A Successful Data Strategy for Insurers in Volatile Times (EMEA)A Successful Data Strategy for Insurers in Volatile Times (EMEA)
A Successful Data Strategy for Insurers in Volatile Times (EMEA)
 
Understand your data dependencies – Key enabler to efficient modernisation
 Understand your data dependencies – Key enabler to efficient modernisation  Understand your data dependencies – Key enabler to efficient modernisation
Understand your data dependencies – Key enabler to efficient modernisation
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014
 
TOUG Big Data Challenge and Impact
TOUG Big Data Challenge and ImpactTOUG Big Data Challenge and Impact
TOUG Big Data Challenge and Impact
 
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
 
Webinar: The 5 Most Critical Things to Understand About Modern Data Integration
Webinar: The 5 Most Critical Things to Understand About Modern Data IntegrationWebinar: The 5 Most Critical Things to Understand About Modern Data Integration
Webinar: The 5 Most Critical Things to Understand About Modern Data Integration
 
Introduction to BigData
Introduction to BigData Introduction to BigData
Introduction to BigData
 
GHD iConnect - our intranet for the future
GHD iConnect - our intranet for the futureGHD iConnect - our intranet for the future
GHD iConnect - our intranet for the future
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial Intelligence
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
 
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced AnalyticsADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
 
Webinar on Big Data Challenges : Presented by Raj Kasturi
Webinar on Big Data Challenges : Presented by Raj KasturiWebinar on Big Data Challenges : Presented by Raj Kasturi
Webinar on Big Data Challenges : Presented by Raj Kasturi
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
 
Decision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great DataDecision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great Data
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
 

Mehr von Profinit

Reference Data Management
Reference Data ManagementReference Data Management
Reference Data ManagementProfinit
 
Propensity Modelling for Banks
Propensity Modelling for BanksPropensity Modelling for Banks
Propensity Modelling for BanksProfinit
 
Legacy systems modernisation
Legacy systems modernisationLegacy systems modernisation
Legacy systems modernisationProfinit
 
Automating Data Lakes, Data Warehouses and Data Stores
Automating Data Lakes, Data Warehouses and Data StoresAutomating Data Lakes, Data Warehouses and Data Stores
Automating Data Lakes, Data Warehouses and Data StoresProfinit
 
4 Steps Towards Data Transparency
4 Steps Towards Data Transparency4 Steps Towards Data Transparency
4 Steps Towards Data TransparencyProfinit
 
Software systems modernisation
Software systems modernisationSoftware systems modernisation
Software systems modernisationProfinit
 
Odborná snídaně: Datový sklad jako Perpetuum Mobile
Odborná snídaně: Datový sklad jako Perpetuum MobileOdborná snídaně: Datový sklad jako Perpetuum Mobile
Odborná snídaně: Datový sklad jako Perpetuum MobileProfinit
 
Data Science a MLOps v prostředí cloudu
Data Science a MLOps v prostředí clouduData Science a MLOps v prostředí cloudu
Data Science a MLOps v prostředí clouduProfinit
 
Detekce sociálních vazeb: domácnosti a přátelé
Detekce sociálních vazeb: domácnosti a přáteléDetekce sociálních vazeb: domácnosti a přátelé
Detekce sociálních vazeb: domácnosti a přáteléProfinit
 
Výsledky backtestu propensitního modelu
Výsledky backtestu propensitního modeluVýsledky backtestu propensitního modelu
Výsledky backtestu propensitního modeluProfinit
 
Propensitní modelování
Propensitní modelováníPropensitní modelování
Propensitní modelováníProfinit
 
Profinit Webinar: Benefits of Software Systems Modernization over their Repla...
Profinit Webinar: Benefits of Software Systems Modernization over their Repla...Profinit Webinar: Benefits of Software Systems Modernization over their Repla...
Profinit Webinar: Benefits of Software Systems Modernization over their Repla...Profinit
 
Profinit webinar: Instalment Detector
Profinit webinar: Instalment DetectorProfinit webinar: Instalment Detector
Profinit webinar: Instalment DetectorProfinit
 
Profinit_snidane_DWH_22_10_2019_publish
Profinit_snidane_DWH_22_10_2019_publishProfinit_snidane_DWH_22_10_2019_publish
Profinit_snidane_DWH_22_10_2019_publishProfinit
 
2019 09-23-snidane qa-public
2019 09-23-snidane qa-public2019 09-23-snidane qa-public
2019 09-23-snidane qa-publicProfinit
 
2019 03-20 snidane-serie-kuchyne-full
2019 03-20 snidane-serie-kuchyne-full2019 03-20 snidane-serie-kuchyne-full
2019 03-20 snidane-serie-kuchyne-fullProfinit
 
2018 11-28 snidane-serie-kuchyne
2018 11-28 snidane-serie-kuchyne2018 11-28 snidane-serie-kuchyne
2018 11-28 snidane-serie-kuchyneProfinit
 
Matedatový sklad
Matedatový skladMatedatový sklad
Matedatový skladProfinit
 
Projekt Bitcoinová burza Coinmate
Projekt Bitcoinová burza CoinmateProjekt Bitcoinová burza Coinmate
Projekt Bitcoinová burza CoinmateProfinit
 
Projekt Edenred Cafeteria
Projekt Edenred CafeteriaProjekt Edenred Cafeteria
Projekt Edenred CafeteriaProfinit
 

Mehr von Profinit (20)

Reference Data Management
Reference Data ManagementReference Data Management
Reference Data Management
 
Propensity Modelling for Banks
Propensity Modelling for BanksPropensity Modelling for Banks
Propensity Modelling for Banks
 
Legacy systems modernisation
Legacy systems modernisationLegacy systems modernisation
Legacy systems modernisation
 
Automating Data Lakes, Data Warehouses and Data Stores
Automating Data Lakes, Data Warehouses and Data StoresAutomating Data Lakes, Data Warehouses and Data Stores
Automating Data Lakes, Data Warehouses and Data Stores
 
4 Steps Towards Data Transparency
4 Steps Towards Data Transparency4 Steps Towards Data Transparency
4 Steps Towards Data Transparency
 
Software systems modernisation
Software systems modernisationSoftware systems modernisation
Software systems modernisation
 
Odborná snídaně: Datový sklad jako Perpetuum Mobile
Odborná snídaně: Datový sklad jako Perpetuum MobileOdborná snídaně: Datový sklad jako Perpetuum Mobile
Odborná snídaně: Datový sklad jako Perpetuum Mobile
 
Data Science a MLOps v prostředí cloudu
Data Science a MLOps v prostředí clouduData Science a MLOps v prostředí cloudu
Data Science a MLOps v prostředí cloudu
 
Detekce sociálních vazeb: domácnosti a přátelé
Detekce sociálních vazeb: domácnosti a přáteléDetekce sociálních vazeb: domácnosti a přátelé
Detekce sociálních vazeb: domácnosti a přátelé
 
Výsledky backtestu propensitního modelu
Výsledky backtestu propensitního modeluVýsledky backtestu propensitního modelu
Výsledky backtestu propensitního modelu
 
Propensitní modelování
Propensitní modelováníPropensitní modelování
Propensitní modelování
 
Profinit Webinar: Benefits of Software Systems Modernization over their Repla...
Profinit Webinar: Benefits of Software Systems Modernization over their Repla...Profinit Webinar: Benefits of Software Systems Modernization over their Repla...
Profinit Webinar: Benefits of Software Systems Modernization over their Repla...
 
Profinit webinar: Instalment Detector
Profinit webinar: Instalment DetectorProfinit webinar: Instalment Detector
Profinit webinar: Instalment Detector
 
Profinit_snidane_DWH_22_10_2019_publish
Profinit_snidane_DWH_22_10_2019_publishProfinit_snidane_DWH_22_10_2019_publish
Profinit_snidane_DWH_22_10_2019_publish
 
2019 09-23-snidane qa-public
2019 09-23-snidane qa-public2019 09-23-snidane qa-public
2019 09-23-snidane qa-public
 
2019 03-20 snidane-serie-kuchyne-full
2019 03-20 snidane-serie-kuchyne-full2019 03-20 snidane-serie-kuchyne-full
2019 03-20 snidane-serie-kuchyne-full
 
2018 11-28 snidane-serie-kuchyne
2018 11-28 snidane-serie-kuchyne2018 11-28 snidane-serie-kuchyne
2018 11-28 snidane-serie-kuchyne
 
Matedatový sklad
Matedatový skladMatedatový sklad
Matedatový sklad
 
Projekt Bitcoinová burza Coinmate
Projekt Bitcoinová burza CoinmateProjekt Bitcoinová burza Coinmate
Projekt Bitcoinová burza Coinmate
 
Projekt Edenred Cafeteria
Projekt Edenred CafeteriaProjekt Edenred Cafeteria
Projekt Edenred Cafeteria
 

Building big data pipelines—lessons learned

  • 1. Lukáš Vereš May 3, 2021 Webinar: Building big data pipelines: Lessons learned
  • 2. 2 About me › Lead big data delivery projects; act as a delivery lead and solution architect › Projects focused on the creation of pipeline generation frameworks and data delivery for customers › Experience from big pharma and finance-related companies › 10+ years of experience in the industry Lukáš Vereš Delivery lead for big data projects
  • 3. 3 PROFINIT Our competencies Company stats SOFTWARE DEVELOPMENT APPLICATION OUTSOURCING ENTERPRISE INTEGRATION BUSINESS INTELLIGENCE/DWH BIG DATA AND DATA SCIENCE 22+ yrs. On the tech market since 1998. Prague Headquarters at the centre of Europe. 500+ Experienced and enthusiastic professionals. Top 3 CAD company in the Czech Republic (IDC study). 26M € Company revenue in 2019. Multiple areas Clients from the Finance, Insurance and Telco industries. 50+ We serve many prominent world clients Certifications, culture & quality A long history of technical engineering excellence has led western companies to rely heavily on skills and expertise from the Czech Republic. We are proud of the quality of our services and the certificates ISO 9001, ISO 27001, ISO 20000, PRINCE 2, underpinning our commitment to provide high quality sustainable services. ISO 9000 ISO 27000 ISO 20000
  • 4. What am I going to talk about?
  • 5. 5 Today’s topics › Personal experience › Lessons learned › Big data
  • 6. What is big data?
  • 7. 7 The Original Vs › Velocity › Volume › Variety
  • 8. 8 The business perspective on big data › Validity – Is the system under development or is it stable? – Are data secure and can you trust them? – Are the data compliant with laws and regulatory policies like the GDPR and CCPA? › Value – Set up your value objectives and then use the chosen metrics › Visualisation – All data flows and processes need to be monitored and illustrated descriptively – Understand what is actually being carried out and how
  • 9. 9 Technical perspective on big data › Data storage – Data storage systems: HDFS, GlusterFS, etc. – File formats with internal structures: Avro, Parquet, Delta Lake, etc. › Data processing – Data transformation: Spark, MapReduce, etc. – SQL engines: Hadoop, Impala, Presto, Hive, HBase, Phoenix › Open technologies – Free access with the option to get support for special components › Big data volume – Datasets bigger than 1 TB, hundreds of datasets or more from one source, different file formats
  • 10. Poll
  • 11. First lesson: “Measure twice but really cut just once when it comes to selecting technologies”
  • 12. 12 Real life stories › Implemented two different architectures and toolsets for the same purpose › Added one more reporting tool to the pile
  • 13. 13 Benefits of multiple technologies › Two different solutions for the same thing create a competitive environment › New ideas are created to try to differentiate › People get the chance to decide which technology they prefer to work with
  • 14. 14 Downsides of multiple technologies › Choosing a toolset can add more work to onboarding projects › Less transparency in decision-making › Every tool has to be supported, adding complexity to the infrastructure
  • 15. 15 Key Takeaways › Do your work and write down any discrepancies and recommendations › Be transparent in your decision-making › IT teams need to support business teams by teaching them how to use existing tools
  • 16. Second lesson: “Communication works for those who work at it”
  • 17. 17 Real life stories › The development team relocated a support team member to work with them on product development › A member of the support team had a workstation next to the developers › The data warehouse team struggled to understand the impact of a data lake on their transformations
  • 18. 18 Benefits of Intensive Collaboration › Helps the support team understand the technology and improves constructive discussion › Involvement of the other team can improve the quality of the delivery › Decreases frustration in teams where there are misunderstandings
  • 19. 19 Downsides of Intensive Collaboration › It consumes the capacity of the support team member › Not everyone is both a good learner and a good teacher
  • 20. 20 Key Takeaways › Invest in teams, not only in individuals › It is a process, not a one-time experience; it will take time to evolve
  • 21. Poll
  • 22. Third lesson: “Think about data as a commodity”
  • 23. 23 Real life stories › Too many data sources to load – The goal was to speed up the pipeline delivery process – Difficulties with changing architecture › Developed a solution for generating data ingest pipelines – A semi-manual self-service approach to speeding up delivery – Built on the AWS platform as serverless architecture
  • 24. 24 Benefits of Data Pipelines › Speeding up pipeline development can increase data scientists’ and data analysts’ interest in getting data in a more standardized way › Standardization of data ingest increases the data quality of work done by data scientists and data analysts › Serverless architecture
  • 25. 25 Downsides of Data Pipelines › Managing the lifecycle of data sources › It takes time to build it › Transparent costs
  • 26. 26 Key Takeaways › Build a framework for pipeline generation; it will pay off in the long run – Save money on support – Unified way to give analytics teams access in a standardized way › Think about how to load datasets to target systems faster – Open new possibilities for business customers – Give even very small customers without big budgets access to data
  • 27. Fourth lesson: “Involve business from the beginning”
  • 28. 28 Real life stories › Implemented framework from scratch without involving the business side from the beginning › Loading all datasets in the data lake to prepare it for the data analyst or data scientist was the wrong dogma
  • 29. 29 Benefits of Late Business Involvement › Gives developers space to focus on technology and ideas › Can try new things with dead ends › Loading all datasets ahead of time gets data closer to the data scientists and data analysts so it is ready anytime they need it
  • 30. 30 Downsides of Late Business Involvement › If developers have more space, they might build a solution that doesn’t fit real use cases › Source systems are changing, and businesses need to pay to support these changes and impacts
  • 31. 31 Key Takeaways › Think about when the right time is to involve people from the business side – The business can give developers space to work on a framework, but at the same time, they should provide specific use cases › The dogma for loading all datasets in advance is wrong – Higher costs for pipeline support – Frustration from fixing issues on pipelines that no one uses – The focus is on fixing bugs instead of delivering higher quality
  • 33. 33 Lessons to be learned from this presentation › Be constructive and honest when choosing technologies › Have people work with different teams › Deliver datasets from source systems faster › Create solutions around business use cases
  • 34. Profinit EU, s.r.o. Tychonova 2, 160 00 Prague 6 | Phone + 420 224 316 016 Web www.profinit.eu LinkedIn linkedin.com/company/profinit Twitter twitter.com/Profinit_EU Facebook facebook.com/Profinit.EU Youtube Profinit EU Thank you for your attention
  • 35. 35 We need your help to be better! › Since you are here, please help us improve our events and webinars by taking a look at our short survey. We appreciate your interest in helping us grow. www.bigdataforbanking.com linkedin.com/company/profinit www.profinit.eu › Contacts Lukáš Vereš lukas.veres@profinit.eu Delivery lead for big data projects
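
The third lesson (slides 23 to 26) recommends building a framework that generates data ingest pipelines rather than writing each one by hand. The slides do not show how the framework was implemented, so the following is only a minimal sketch of the general idea: each source is described declaratively and the same generator expands every description into a standardized list of steps. The names SourceSpec, generate_pipeline and generate_all, the step names, and the choice of Parquet as the target format are all assumptions made for illustration, not details taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical declarative description of one dataset to ingest."""
    name: str
    file_format: str   # e.g. "csv" or "parquet"
    target_table: str


def generate_pipeline(spec: SourceSpec) -> list[str]:
    """Expand one spec into an ordered list of standardized step names."""
    steps = [f"extract:{spec.name}"]
    # Assumption for this sketch: everything lands in Parquet,
    # so only non-Parquet sources get a conversion step.
    if spec.file_format != "parquet":
        steps.append(f"convert:{spec.file_format}->parquet")
    steps.append(f"validate:{spec.name}")
    steps.append(f"load:{spec.target_table}")
    return steps


def generate_all(specs: list[SourceSpec]) -> dict[str, list[str]]:
    """Generate a pipeline for every source, keyed by source name."""
    return {spec.name: generate_pipeline(spec) for spec in specs}


specs = [
    SourceSpec("orders", "csv", "lake.orders"),
    SourceSpec("customers", "parquet", "lake.customers"),
]
pipelines = generate_all(specs)
for name, steps in pipelines.items():
    print(name, steps)
```

Onboarding a new source in such a setup means adding one more SourceSpec instead of writing a new pipeline, which is one way to read the "semi-manual self-service approach" on slide 23. Generating specs only for datasets with a known business use case would also fit the fourth lesson's warning against loading everything in advance.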