Data Integration & Data Quality
Your open source based BI solution!!
by Stratebi
Table of contents
Introduction to Data Quality
What is Data Quality?
Why Data Quality?
Concepts
Data Quality advantages
Data Quality & Business Intelligence
BI Tenets
Data integration
Best practices
Open Source & Data Quality
Data Quality & Pentaho Data Integration (PDI)
PDI / ETLs / Integrity / Validation
Data Cleaner
Integration Data Cleaner and PDI
Initial Contact
Customer Successes
Private Sector
Public Sector
Introduction to Data Quality
http://optimizeyourdataquality.wordpress.com/
Introduction
What is Data Quality?
Non-standard definition
“The processes and technologies involved in
ensuring the conformance of data values to
business requirements and acceptance criteria”
Attributes sought in data:
Accuracy
Consistency
Integrity
Validity
http://unitar.org
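As a minimal sketch of what "conformance to acceptance criteria" can look like in practice, the following Python checks three of the attributes listed above (validity, consistency, integrity) on a batch of records. The field names, the e-mail pattern and the country list are hypothetical and only serve the illustration; accuracy is left out because it needs a trusted reference source to compare against.

```python
import re

# Hypothetical acceptance criteria; field names and rules are illustrative only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
KNOWN_COUNTRIES = {"ES", "BR", "FR"}


def check_record(record, known_ids):
    """Return the list of rules that one record violates."""
    problems = []
    # Validity: the value must match the expected format.
    if not EMAIL_PATTERN.match(record.get("email") or ""):
        problems.append("validity:email")
    # Consistency: coded values must come from the agreed reference list.
    if record.get("country") not in KNOWN_COUNTRIES:
        problems.append("consistency:country")
    # Integrity: identifiers must be present and unique.
    if record.get("id") is None or record["id"] in known_ids:
        problems.append("integrity:id")
    return problems


def check_all(records):
    """Map each record position to its violations, tracking ids seen so far."""
    seen, report = set(), {}
    for position, record in enumerate(records):
        problems = check_record(record, seen)
        if record.get("id") is not None:
            seen.add(record["id"])
        if problems:
            report[position] = problems
    return report
```

Records that pass every rule do not appear in the report, so an empty dictionary means the batch met all of the criteria that were checked.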
Why Data Quality?
Concepts
Data governance
Strategic decision making: improved and faster
Managing data quality: a critical issue
Data Quality tasks must be performed in the data integration stage
Data Quality benefits
Suitable customer segmentation → Customer satisfaction
Avoid processing unreliable data → Cost reduction
Trustworthy and valuable information
Improving business processes → Increased profits
Data Quality & Business Intelligence
What is Business Intelligence (BI)?
The ability to apprehend the
interrelationships of presented
facts in such a way as to guide
action towards a desired goal
Data Quality & Business Intelligence
Visual tools for optimal and simple analysis
Robust and trustworthy data
Business Intelligence Tenets
Processes involved:
•Data integration
•Efficient usage of company information
Data Integration
Key for any BI project
ETL = Extract, Transform and Load
The data integration process involves moving data from different sources, transforming it, and storing it in unified databases: data warehouses / data marts.
Main tasks:
Extract data from multiple sources
Ensure clean, consistent data
Combine data
Load data into a data warehouse (DW)
http://blog.bootstraptoday.com
Source systems (diagram): CRM, ERP, BPM, CMS
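The four main tasks above can be sketched end to end in a few lines of Python. This is an illustration under stated assumptions, not a substitute for an ETL tool: the two CSV strings stand in for hypothetical CRM and ERP exports, the table name `dw_customer` is invented, and an in-memory SQLite database plays the role of the data warehouse.

```python
import csv
import io
import sqlite3

# Two hypothetical sources standing in for, e.g., a CRM and an ERP export.
CRM_CSV = "customer_id,name\n1, Ana \n2,Luis\n"
ERP_CSV = "customer_id,total\n1,100.5\n2,\n"


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Ensure clean, consistent data: trim whitespace, turn blanks into None."""
    return [{k: (v.strip() or None) for k, v in row.items()} for row in rows]


def combine(customers, orders):
    """Combine both sources on the shared customer_id key."""
    totals = {row["customer_id"]: row["total"] for row in orders}
    combined = []
    for customer in customers:
        total = totals.get(customer["customer_id"])
        combined.append((
            int(customer["customer_id"]),
            customer["name"],
            float(total) if total else None,
        ))
    return combined


def load(rows, connection):
    """Load the combined rows into a warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS dw_customer (id INTEGER, name TEXT, total REAL)"
    )
    connection.executemany("INSERT INTO dw_customer VALUES (?, ?, ?)", rows)
    connection.commit()


def run_etl(connection):
    """Run extract, clean, combine and load in sequence."""
    customers = clean(extract(CRM_CSV))
    orders = clean(extract(ERP_CSV))
    load(combine(customers, orders), connection)
```

Calling `run_etl(sqlite3.connect(":memory:"))` fills the table; the cleaning step sits between extraction and combination so that later steps only ever see normalized values.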
CHALLENGES:
Heterogeneous data sources
Large data volumes
Improve operational efficiency
Data source synchronization
Scalability
Data integration and Data Quality, closely related concepts
Data Integration
The Data Quality process can be performed in different ways:
Manual → Ad-hoc queries, file searching, etc.
Automated → Included in the data integration process
Both are complementary, though:
Data Quality tasks as a part of the Data Integration process (ETL)
Data integration
Best ETL practices
Centralize procedures: ensures homogeneity and consistency of data coming from a wide variety of sources.
Avoid redundant calculations: if a value has already been calculated, do not repeat the operation. This improves performance and avoids possible inconsistencies.
Establish “quality control” points: verify the execution of the process at key points and record tracking data for future audits.
Implement reload processes: useful for recovering from initial loading issues or failures.
Use intermediate structures: eases process monitoring.
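Two of these practices, quality-control points and intermediate structures, can be sketched as follows. The `checkpoint` helper, the `audit_log` list and the field names are hypothetical; a real process would persist the audit trail in a table or log file instead of keeping it in memory.

```python
from datetime import datetime, timezone

audit_log = []  # Hypothetical in-memory audit trail; a real process would persist it.


def checkpoint(stage, rows, required_fields):
    """Quality-control point: count rows and missing values, record them for audits."""
    missing = sum(
        1
        for row in rows
        for field in required_fields
        if row.get(field) in (None, "")
    )
    entry = {
        "stage": stage,
        "row_count": len(rows),
        "missing_values": missing,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.append(entry)
    return entry


def run_with_checkpoints(rows):
    """Keep an intermediate structure between steps so each one can be inspected."""
    staged = [dict(row) for row in rows]  # intermediate copy of the input
    checkpoint("staged", staged, ["id", "amount"])
    valid = [row for row in staged if row.get("amount") not in (None, "")]
    checkpoint("validated", valid, ["id", "amount"])
    return valid
```

Comparing the `row_count` and `missing_values` of consecutive entries in `audit_log` shows how many rows each step dropped and why, which is the kind of tracking data an audit needs.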
Best ETL practices
Centralized and standardized processes
Checkpoints and registrations
Intermediate structures
Allows:
Apply BI techniques to the data quality process
Analyze and make the most of data quality results
Open Source & Data Quality
ETL tools and Data Quality
Main ETL Open Source solutions:
Pentaho Data Integration
Talend Open Studio
Data Quality Open Source solutions:
DataCleaner
Talend Data Quality
Google Refine
Data Quality & Pentaho Data Integration
Intuitive ETL tool based on jobs and transformations.
Freedom to decide where and how to perform tasks (profiling, cleansing, integrity, validation), based on metadata.
Data Quality oriented components available in PDI transformations.
Not a pure profiling tool; however, DataCleaner can be integrated.
Plug-in architecture that allows its functionality to be extended.
Component variety:
Cleansing
Scripting (SQL, JavaScript)
Validation
Statistics
Etc…
A well-defined ETL process divided into several phases is essential:
1. Preparation process
2. Data receipt
3. Data processing
4. Final Load
5. Result reports
6. Activity control
This approach allows:
Standardizing processes in an organization
Scaling better as the number of sources grows
Centralized control of process results
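A phased process like this can be sketched as an ordered list of steps with a small runner that records the outcome of each one, which covers the activity control phase. The phase bodies below are hypothetical placeholders operating on a shared dictionary; in PDI each phase would typically be its own job or transformation.

```python
def run_phases(phases, context):
    """Run named phases in order, recording the outcome of each (activity control)."""
    activity = []
    for name, phase in phases:
        try:
            phase(context)
            activity.append((name, "ok"))
        except Exception as error:  # Record the failure and stop the run.
            activity.append((name, f"failed: {error}"))
            break
    return activity


# Hypothetical phase bodies; each one reads and updates a shared context dict.
def prepare(ctx):
    ctx["target"] = []


def receive(ctx):
    ctx["raw"] = [" 10 ", "x", "30"]


def process(ctx):
    ctx["clean"] = [int(v) for v in ctx["raw"] if v.strip().isdigit()]


def final_load(ctx):
    ctx["target"].extend(ctx["clean"])


def report(ctx):
    ctx["report"] = {"received": len(ctx["raw"]), "loaded": len(ctx["target"])}


PHASES = [
    ("preparation", prepare),
    ("data receipt", receive),
    ("data processing", process),
    ("final load", final_load),
    ("result reports", report),
]
```

Because every source goes through the same ordered list, adding a source means adding data rather than new control flow, and the returned activity list gives one central place to inspect results.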
Data Cleaner
Profiling tool recommended by Pentaho
Alternative tools:
Desktop tools
Web tools
PDI Plugin
Data Cleaner Desktop
Functionalities:
Data Cleansing
Data dictionaries definition
Search for patterns, duplicates, null check, etc.
Monitoring
Complete execution stats
Etc.
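To make the profiling checks concrete (patterns, duplicates, null check), here is a plain Python sketch of what such a profile computes for one column. It does not use or reproduce the Data Cleaner API; the function names and the pattern notation (`a` for letters, `9` for digits) are assumptions chosen for the illustration.

```python
import re
from collections import Counter


def to_pattern(value):
    """Reduce a value to its shape: letters become 'a', digits become '9'."""
    shape = re.sub(r"[A-Za-z]", "a", value)
    return re.sub(r"[0-9]", "9", shape)


def profile_column(values):
    """Count nulls, duplicated values, and value patterns for one column."""
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "null_count": len(values) - len(present),
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
        "patterns": Counter(to_pattern(v) for v in present),
    }
```

A column of postal codes that mostly matches one pattern with a few outliers is a typical finding: the rare patterns point at the values worth cleansing first.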
Data Cleaner Monitor (web)
Functionalities:
Centralized monitoring
Smart visualization
Schedule execution of Data Cleaner and PDI jobs
Create custom metrics
Etc.
Integration Data Cleaner / PDI
After installing the PDI Data Cleaner plug-in, there are two usage options:
Option A: Profile data using a PDI step
Option B: Execute a Data Cleaner job
References
International Association for Information and Data Quality:
http://iaidq.org/
Pentaho Data Integration:
http://www.pentaho.com/explore/pentaho-data-integration/
Data Cleaner:
http://datacleaner.org/
About us
www.TodoBI.com
info@stratebi.com
www.stratebi.com
More information:
Tel.: 91.788.34.10
Madrid: Pº de la Castellana, 164, 1º
Barcelona: C/ Valencia, 63
Brazil: Av. Paulista, 37 4 andar

 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Data Quality Integration (ETL) Open Source

  • 1. Data Integration & Data Quality. Your open source based BI solution!
  • 2. Table of contents. Introduction to Data Quality: What is Data Quality?; Why Data Quality?; Concepts; Data Quality advantages. Data Quality & Business Intelligence: BI Tenets; Data integration; Best practices. Open Source & Data Quality: Data Quality & Pentaho Data Integration (PDI); PDI / ETLs / Integrity / Validation; Data Cleaner; Integration Data Cleaner and PDI.
  • 5. Introduction to Data Quality. http://optimizeyourdataquality.wordpress.com/
  • 6. Introduction: What is Data Quality? Non-standard definition: “The processes and technologies involved in ensuring the conformance of data values to business requirements and acceptance criteria.” Attributes sought in data: accuracy, consistency, integrity, validity. http://unitar.org
  • 9. Introduction. Data governance. Improved and faster strategic decision making. Managing data quality is a critical issue. Data Quality tasks must be performed in the data integration stage.
  • 10. Introduction: Data Quality benefits. Suitable customer segmentation → customer satisfaction. Avoiding the processing of unreliable data → cost reduction. Trustworthy and valuable information. Improved business processes. Increased profits.
  • 12. Data Quality & Business Intelligence: Business Intelligence Tenets. What is Business Intelligence (BI)? The ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal. Visual tools for optimal and simple analysis. Robust and trustworthy data. Processes involved: data integration; efficient usage of company information.
  • 13. Data Quality & Business Intelligence: Data Integration. Key for any BI project. ETL = Extract, Transform and Load. The data integration process involves moving data from different sources, transforming it, and storing it in unified databases (data warehouse / data marts). Main tasks: extract data from multiple sources; ensure clean, consistent data; combine data; load data into a DW. Sources shown: CRM, ERP, BPM, CMS. http://blog.bootstraptoday.com
  • 14. Data Quality & Business Intelligence: Data Integration. Challenges: heterogeneous data sources; large data volumes; improving operational efficiency; data source synchronization; scalability. Data integration and Data Quality are closely related concepts.
  • 15. Data Quality & Business Intelligence: Data Integration. The Data Quality process can be performed in different ways: manually (ad-hoc queries, file searching, etc.) or automatically (included in the data integration process). The two approaches are complementary. Data Quality tasks are part of the data integration process (ETL).
  • 16. Data Quality & Business Intelligence: Best ETL practices. Centralize procedures: ensure homogeneity and consistency of data from a great variety of sources. Avoid redundant calculations: if a value has already been computed, do not repeat the operation; this improves performance and avoids possible inconsistencies. Establish “quality control” points: they verify the execution of the process at key points and record tracking data for future audits. Implement information reloading processes: useful for recovering from initial loading issues or failures. Use intermediate structures: they ease process monitoring.
  • 17. Data Quality & Business Intelligence: Best ETL practices. Centralized and standardized processes, checkpoints and records, and intermediate structures allow: applying BI techniques to the data quality process; analyzing and making the most of data quality results.
  • 19. Open Source & Data Quality: ETL tools and Data Quality. Main ETL Open Source solutions: Pentaho Data Integration, Talend Open Studio. Data Quality Open Source solutions: DataCleaner, Talend Data Quality, Google Refine.
  • 20. Open Source & Data Quality: Data Quality & Pentaho Data Integration. Intuitive ETL tool based on jobs and transformations. Freedom to decide where and how to perform tasks (profiling, cleansing, integrity, validation), based on metadata. Data Quality oriented components are available in PDI transformations. Not a pure profiling tool; however, DataCleaner can be integrated. Plug-in architecture that allows expanding its functionality.
  • 21. Open Source & Data Quality: Data Quality & Pentaho Data Integration. Component variety: cleansing, scripting (SQL, JavaScript), validation, statistics, etc.
  • 22. Open Source & Data Quality: Data Quality & Pentaho Data Integration. An accurate ETL divided into several phases is essential: 1. Preparation process; 2. Data receipt; 3. Data processing; 4. Final load; 5. Result reports; 6. Activity control. This approach allows: standardizing processes in an organization; scaling better as the number of sources increases; centralized control of process results.
  • 23. Open Source & Data Quality: Data Cleaner. Profiling tool recommended by Pentaho. Alternative tools: desktop tools, web tools, PDI plugin.
  • 24. Open Source & Data Quality: Data Cleaner Desktop. Functionalities: data cleansing; definition of data dictionaries; search for patterns, duplicates, null checks, etc.; monitoring; complete execution stats; etc.
  • 25. Open Source & Data Quality: Data Cleaner Monitor (web). Functionalities: centralized monitoring; smart visualization; scheduled execution of Data Cleaner and PDI jobs; custom metrics; etc.
  • 26. Open Source & Data Quality: Integration Data Cleaner / PDI. After installing the PDI Data Cleaner plug-in, there are two usage options. Option A: profile data using a PDI step.
  • 27. Open Source & Data Quality: Integration Data Cleaner / PDI. After installing the PDI Data Cleaner plug-in, there are two usage options. Option B: execute a Data Cleaner job.
  • 28. References. International Association for Information and Data Quality (http://iaidq.org/), Pentaho Data Integration (http://www.pentaho.com/explore/pentaho-data-integration/), Data Cleaner (http://datacleaner.org/)
  • 29. About us. www.TodoBI.com, info@stratebi.com, www.stratebi.com. More information: Tel.: 91.788.34.10. Madrid: Pº de la Castellana, 164, 1º. Barcelona: C/ Valencia, 63. Brazil: Av. Paulista, 37 4 andar.
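
The slides describe the ETL tasks (extract, ensure clean data, combine, load) and the practices of quality-control points and intermediate structures only in outline. The following is a minimal Python sketch of how those ideas could fit together, not the deck's own implementation and not PDI code. The source rows, the `dw_customer` table, and every function name are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source extracts; a real job would read from CRM/ERP systems.
CRM_ROWS = [
    {"customer_id": "1", "email": " Ana@Example.com "},
    {"customer_id": "2", "email": ""},
]
ERP_ROWS = [
    {"customer_id": "1", "total": "120.50"},
    {"customer_id": "3", "total": "n/a"},
]


def clean_crm(row):
    """Standardize a CRM row; return None when it fails validation."""
    email = row["email"].strip().lower()
    if "@" not in email:
        return None
    return {"customer_id": int(row["customer_id"]), "email": email}


def clean_erp(row):
    """Standardize an ERP row; return None when the amount is not numeric."""
    try:
        total = float(row["total"])
    except ValueError:
        return None
    return {"customer_id": int(row["customer_id"]), "total": total}


def checkpoint(audit, stage, rows_in, rows_out):
    """Quality-control point: record how many rows a stage kept and rejected."""
    audit.append({"stage": stage, "in": rows_in, "kept": rows_out,
                  "rejected": rows_in - rows_out})


def run_etl(conn):
    audit = []
    # Transform: clean each source once and keep the result in an
    # intermediate structure so later steps do not recompute it.
    crm = [r for r in map(clean_crm, CRM_ROWS) if r is not None]
    checkpoint(audit, "clean_crm", len(CRM_ROWS), len(crm))
    erp = [r for r in map(clean_erp, ERP_ROWS) if r is not None]
    checkpoint(audit, "clean_erp", len(ERP_ROWS), len(erp))

    # Combine: join the two cleaned sources on customer_id.
    totals = {r["customer_id"]: r["total"] for r in erp}
    combined = [(r["customer_id"], r["email"], totals[r["customer_id"]])
                for r in crm if r["customer_id"] in totals]
    checkpoint(audit, "combine", len(crm), len(combined))

    # Load: write the combined rows into the target table.
    conn.execute(
        "CREATE TABLE dw_customer (customer_id INTEGER, email TEXT, total REAL)")
    conn.executemany("INSERT INTO dw_customer VALUES (?, ?, ?)", combined)
    conn.commit()
    return audit


conn = sqlite3.connect(":memory:")
audit_log = run_etl(conn)
loaded = conn.execute(
    "SELECT customer_id, email, total FROM dw_customer").fetchall()
```

Here `audit_log` plays the role of the tracking data kept for future audits: each entry records how many rows a stage received, kept, and rejected, so a failed or suspicious load can be traced to the stage that caused it.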

Editor's Notes

  1. Data Profiling: the process of examining the data that exists in the source systems and gathering statistics and information about it. Data Cleansing: the process of detecting and correcting corrupt, inconsistent, or erroneous data. Data Integrity: the process of analyzing the consistency of the data and the relationships between different data sets. Data Validation: the process of applying validation rules to the data, based on data dictionaries and/or business rules. Master Data Management: the set of processes, policies, standards, and tools used to manage an organization's master data (usually non-transactional information). Data Auditing: the process of managing how well the data fits the purposes defined by the organization; the necessary policies must be established. Act + monitor. Data Governance: a concept that encompasses all the previous processes and allows an organization to have reliable information.
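
The profiling, cleansing, and validation processes defined in the note above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the records, the `DATA_DICTIONARY` rules, and all function names are invented here, and the code does not reproduce how Data Cleaner or PDI implement these tasks.

```python
import re
from collections import Counter

# Hypothetical records and data dictionary, invented for illustration.
RECORDS = [
    {"id": 1, "country": "ES", "postcode": "28046"},
    {"id": 2, "country": "es", "postcode": "08015"},
    {"id": 3, "country": None, "postcode": "2804"},
    {"id": 2, "country": "ES", "postcode": "08015"},
]
DATA_DICTIONARY = {
    "country": {"allowed": {"ES", "BR"}},
    "postcode": {"pattern": re.compile(r"\d{5}")},
}


def profile(records, field):
    """Data profiling: collect null and distinct counts for one field."""
    values = [r[field] for r in records]
    non_null = [v for v in values if v is not None]
    return {"nulls": len(values) - len(non_null),
            "distinct": len(set(non_null))}


def duplicate_keys(records, key):
    """Return the key values that appear more than once."""
    counts = Counter(r[key] for r in records)
    return sorted(k for k, n in counts.items() if n > 1)


def cleanse(record):
    """Data cleansing: fix what can be fixed safely (here, letter case)."""
    fixed = dict(record)
    if isinstance(fixed["country"], str):
        fixed["country"] = fixed["country"].strip().upper()
    return fixed


def validate(record, dictionary):
    """Data validation: return the fields that break a dictionary rule."""
    errors = []
    for field, rule in dictionary.items():
        value = record[field]
        if value is None:
            errors.append(field)
        elif "allowed" in rule and value not in rule["allowed"]:
            errors.append(field)
        elif "pattern" in rule and not rule["pattern"].fullmatch(value):
            errors.append(field)
    return errors


country_profile = profile(RECORDS, "country")
dupes = duplicate_keys(RECORDS, "id")
cleansed = [cleanse(r) for r in RECORDS]
invalid = {r["id"]: validate(r, DATA_DICTIONARY) for r in cleansed
           if validate(r, DATA_DICTIONARY)}
```

Profiling runs on the raw records so that it reports the data as it arrives (the mixed-case country codes count as two distinct values), while validation runs after cleansing so that only problems cleansing could not repair are flagged.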