ETL for the Masses with Power Query and M
Régis Baccaro – IBM
@regbac
Introduction

Régis Baccaro

@regbac

http://Theblobfarm.wordpress.com
http://Thelovefarm.wordpress.com

regis@baccaro.com
• Founder and lead organizer of SQL Saturday Denmark
• PASS Regional Mentor
• Works for IBM
• Passionate about the community
• .NET developer, BI dude, SharePoint fellow and accidental DBA
Agenda
• Power Query and the M language
• E and T and L with Power Query
• Data refresh techniques with PQ
• Next step
Introduction
• Power Query
  • Get data experience
  • Filter and combine
  • Embedded M for repeatable mashup

• Power Query Formula Language (aka M)
  • Mostly pure
  • Higher-order
  • Dynamically typed
  • Partially lazy
  • Functional programming language
Elements of the language
• Expressions – central construct
  • Evaluated to a single value

• Values
  • Primitives
  • List – ordered sequence
  • Record – set of fields
  • Table
  • Function
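To make the kinds of values concrete, here is a minimal sketch in M. The names (Primitive, AList, ARecord, and so on) and the sample data are invented for illustration and are not from the talk.

```powerquery
let
    // A primitive value
    Primitive = 42,
    // A list: an ordered sequence of values
    AList = {1, 2, 3},
    // A record: a set of named fields
    ARecord = [Name = "Contoso", City = "Copenhagen"],
    // A table built from column names and rows
    ATable = #table({"Id", "Name"}, {{1, "A"}, {2, "B"}}),
    // A function is a value too
    AFunction = (x) => x + 1
in
    [
        ListCount = List.Count(AList),   // 3
        City = ARecord[City],            // "Copenhagen"
        Rows = Table.RowCount(ATable),   // 2
        Next = AFunction(Primitive)      // 43
    ]
```

The whole query is itself a single expression that evaluates to a single value, here a record.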
Evaluation
• Excel-like (surprise!)
• Nested records
  • In records
  • In lists

• Lazy evaluation
  • Lists and records (and let)

• Eager evaluation
  • Everything else
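A small sketch of what lazy evaluation means in practice. The error messages and names are made up for illustration. Each error below sits in a let binding, record field, or list item that is never read, so the query should still return a value.

```powerquery
let
    // This binding is never referenced, so its error is never raised
    Unused = error "let bindings are evaluated only when needed",
    // Only field A is read below; field B stays unevaluated
    R = [A = 1, B = error "field B is never read"],
    // Only item 0 is read below; item 1 stays unevaluated
    L = {1, error "item 1 is never read"}
in
    [FromRecord = R[A], FromList = L{0}]
```

Reading R[B] or L{1} instead would raise the corresponding error.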
Functions and Standard Library
• Mapping from a set of values to a single value
• (named parameters) => function body

• Common set of definitions
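As an illustration of the "(parameters) => body" form, the sketch below defines a hypothetical function Add and then passes an anonymous function to List.Transform from the standard library. The each keyword is shorthand for a one-parameter function whose parameter is named _.

```powerquery
let
    // A named function with optional type annotations
    Add = (x as number, y as number) as number => x + y,
    // "each _ * 2" is shorthand for "(_) => _ * 2"
    Doubled = List.Transform({1, 2, 3}, each _ * 2)
in
    [Sum = Add(1, 2), Doubled = Doubled]   // [Sum = 3, Doubled = {2, 4, 6}]
```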
Operators
• Meaning varies depending on kind of value

• & = text or list concatenation and records merge
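The three meanings of & can be seen side by side in a short sketch (the sample values are arbitrary):

```powerquery
[
    // Text concatenation
    Text = "Power" & " Query",                   // "Power Query"
    // List concatenation
    List = {1, 2} & {3},                         // {1, 2, 3}
    // Record merge: fields from the right-hand record win on conflict
    Record = [A = 1, B = 2] & [B = 3, C = 4]     // [A = 1, B = 3, C = 4]
]
```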
Metadata
• Information about a value that is associated with a value
• A record
• Exists for every value
• Unobtrusive way to add information
• Accessed with Value.Metadata
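A minimal sketch of attaching and reading metadata. The field name Source and its text are invented for illustration. The meta operator attaches a record to a value, and Value.Metadata reads it back.

```powerquery
let
    // Attach a metadata record to an ordinary number
    Tagged = 42 meta [Source = "manual entry"],
    Meta = Value.Metadata(Tagged)
in
    [
        Value = Tagged,                   // still behaves as 42
        Source = Meta[Source],            // "manual entry"
        Plain = Value.Metadata(1)         // [] : the empty record is the default
    ]
```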
let ... in expression
• So far only literal values
• let allows a set of values to be:
  • Computed
  • Named
  • Used in the subsequent expression that follows the in

let
    Source = Web.Page(Web.Contents("http://www.cvr.dk/Site/Forms/CompanySearch/CompanySearch.aspx?......")),
    RowCount = Table.RowCount(Source)
in
    RowCount
IF expression
• Selects between two expressions based on a logical condition
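For example, a hypothetical Classify function (the name and thresholds are invented here) could pick a label from a row count. Note that if is an expression in M, so the else branch is always required.

```powerquery
let
    // if ... then ... else yields a value, so it can be nested
    Classify = (rows as number) as text =>
        if rows = 0 then "empty"
        else if rows < 100 then "small"
        else "large"
in
    {Classify(0), Classify(42), Classify(5000)}   // {"empty", "small", "large"}
```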
Error expression
• When an expression evaluation cannot yield a value
• Raised with error
• Handled with try
• Produces an Error record
• try ... otherwise: used to supply default values
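A short sketch of both forms. The helper name Parse and the message "boom" are illustrative. A plain try returns a record describing the outcome, while try ... otherwise substitutes a default value.

```powerquery
let
    // try ... otherwise: fall back to 0 when the text is not a number
    Parse = (t as text) => try Number.FromText(t) otherwise 0,
    // Plain try: returns a record with HasError and either Value or Error
    Attempt = try error "boom"
in
    [
        Good = Parse("12"),                   // 12
        Bad = Parse("abc"),                   // 0
        HasError = Attempt[HasError],         // true
        Message = Attempt[Error][Message]     // "boom"
    ]
```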
Keywords and Operators
• and as each else error false if in is let meta not otherwise or section shared then true try type #binary #date #datetime #datetimezone #duration #infinity #nan #sections #shared #table #time
• , ; = < <= > >= <> + - * / & ( ) [ ] { } @ ! ? => .. ...
The "E" - Why is Power Query great for extracting data?
• Multiple data sources

Hey wait ! Where is PDW ?
Query folding - A step toward declarative ETL approach
• Declarative vs Imperative
• Query folding similar to predicate pushdown
• Does Power Query have a Query Optimizer ?
• Demo
Query folding - the unofficial list:
• SQL Databases
• OData and OData based sources, such as the Windows Azure Marketplace and SharePoint Lists
• Active Directory
• HDFS.Files, Folder.Files, and Folder.Contents (for basic operations on paths)

• Column removal
• Renaming
• Joins
• Type conversions
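As a sketch of a query whose steps are candidates for folding, the example below applies renaming, a type conversion and column removal to a SQL table. The server, database, table and column names are all hypothetical, and whether a given step actually folds depends on the connector and the Power Query version.

```powerquery
let
    // Hypothetical server and database names
    Source = Sql.Database("localhost", "CompanyDb"),
    // Hypothetical table dbo.Companies
    Companies = Source{[Schema = "dbo", Item = "Companies"]}[Data],
    // Renaming (hypothetical column names)
    Renamed = Table.RenameColumns(Companies, {{"cvr_no", "CvrNumber"}}),
    // Type conversion
    Typed = Table.TransformColumnTypes(Renamed, {{"CvrNumber", type text}}),
    // Column removal
    Trimmed = Table.RemoveColumns(Typed, {"InternalNotes"})
in
    Trimmed
```

When these steps fold, the source database receives a single SQL statement rather than Power Query fetching the full table and transforming it locally.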
Real life scenario – ETL for the masses
• Seen a lot of demos
• Built a lot of demos
• They are always so clean !
Real life scenario
Transform
• M is how the magic happens!
• Data manipulation
• Records
• Lists
• Tables

• Merging
• Function calls
What about our scenario?
• Where should I get my data from?
• Pure Excel
• Excel and MDS/DQS/SSIS/SQL
• Web, SQL, XML, ?

• Let me show you ! Input
• (cvr web)
Let’s go to homegrown data?
• Bad web service
• Bad HTML structure
• Let’s go with local data that we can control

Isolated DB

• SQL Server
• Excel

• Let’s Query!
Local storage
Clean up before you merge!
• DQS
  Knowledge base with CVR
  + Cleansing project with LinkedIn input
  = Demo2.1_AndreasStrandbyClean

• Hit ratio increased...

[Chart: "Hit" and "Total" (axis 0 to 250) with a percentage axis (0% to 100%), for "Clean join" and "Nested Merge join"]
Smarter Power Query
• Expression.Evaluate()
• Examples
• Load query text from file
• Load function from file
• Passing parameters (as constants)

• Demo
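A minimal sketch of Expression.Evaluate. The query text and the names a and b are made up. The second argument is the environment the text is evaluated in: #shared exposes the standard library, and extra fields act as parameters passed as constants.

```powerquery
let
    // Query text held in a string; it could equally be loaded from a file, e.g.
    // Text.FromBinary(File.Contents("C:\queries\MyQuery.m"))  (hypothetical path)
    QueryText = "let x = 20, y = 22 in x + y",
    Result = Expression.Evaluate(QueryText),
    // Library functions and parameters must be supplied through the environment
    WithLibrary = Expression.Evaluate("List.Sum({a, b})", #shared & [a = 1, b = 2])
in
    [Result = Result, WithLibrary = WithLibrary]   // [Result = 42, WithLibrary = 3]
```

Without #shared in the environment, the evaluated text cannot see List.Sum or any other library function.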
Refreshing Power Query data
• Different solutions
• All with flaws !
Refreshing Power Query data – with VB6 !
• Back from 2006
• From Kim GreenLee

Plus
• Can be scheduled
• More robust than the non-technical solution

Minus
• VB6 – are you kidding?
Refreshing Power Query data – with PowerShell

Plus
• Robust

Minus
• Hard to troubleshoot
• Cannot run as a task in Windows Task Scheduler unless the task is set to run only when the user is logged on
Refreshing Power Query data – The non-technical way
• Let me show you !
Plus
• Very easy

Minus
• Not very corporate!
• The spreadsheet needs to be open
• Excel file not saved
• Locked out when it refreshes
Refreshing Power Query data – The non-technical way part 2
• Let me show you !
Plus
• Very easy
• Uses technique from previous

Minus
• Not very corporate!
• The spreadsheet needs to be open
Refreshing Power Query data – with SSIS

Plus
• Robust

Minus
• Requires a SQL Server (wait, it’s a plus!)
• Needs an SSIS / C# developer
Refreshing Power Query data – with SSIS
• Using DQS for cleansing input

• Let me show you !
How is Power Query going to be used?
• Data store accumulating interesting data points
• Hook into read only data for reporting purposes or data marts
• One file to accumulate (Produce)
• Multiple files or programs to report (Consume)
• I don’t believe in “Data Steward”
• I believe someone will be in charge of procuring and monitoring data stores of disparate data (such as IT or DBAs).
Conclusion
• A step toward declarative ETL approach
• Still much work to do !
We have
• A declarative data integration language
• Only surfaced in Power Query
• Can push data to an Excel spreadsheet
Imagine.....
• Connection to heterogeneous data sources
THANK YOU!
@REGBAC
HTTP://THEBLOBFARM.WORDPRESS.COM
REGIS@BACCARO.COM
