SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Scott Weinstein Principal Lab49 weblogs.asp.net/sweinstein   / @ScottWeinstein Using PowerShell to simplify your ETL
GOLD SPONSORS
SILVER SPONSOR
BRONZE SPONSORS
OTHER SPONSORS
SSIS Considered Harmful  Visual programming is a terrible interface for ETL work History of UI visual designers is instructive here No way to copy/share “code samples” Low information density Even worse It’s not even a good designer Poor defaults, poor layout Nested menus of property grids are hard to use No real VCS ability Sure it’s XML, but… No diff No merge Demo ?
But what about all the features? Some are trivial: Foreach Loop Container Execute SQL Task Send Mail Task OMG! Many are useful, but not hard to recreate Demos soon Others may be worth consideration Fuzzy matching
But what about performance The advantages for data flow are real, but not that large A 5M row test on my home machine gave a 12% performance advantage to SSIS over PowerShell + SqlBulkCopy
The Answer PowerShell A general purpose scripting and automation language Wide adoption MS: Windows Server, Windows 7, SQL Server 2008, VM Manager, IIS, Deployment Toolkit Others: VMWare, Quest, IBM Websphere
PowerShell Features Takes the best of Unix, VMS, Perl Adds in integration with .Net, COM, WMI Adds in  REPL environment Full scripting/programming Flexible typing model Pipes Regular expressions XML Navigation Providers Built in command parser Standardized syntax for commands
Introducing PSIS http://psis.codeplex.com/ Yes, the name is derivative Right now, just a playground for some demos Bulk Data Transfer Star-Schema Populator Execute SQL
PSIS demo Just the ETL No logging, error reporting, etc Some typical ETL tasks Clear staging tables Bulk transfer data from one DB to another Populate a star schema Fact table and dimension tables Extract data to a .csv file
Clear staging tables Use Invoke-MSSqlCommand Use Truncate <tablename>
Bulk transfer At core, we’ll use .Net’sSqlBulkCopy under the covers uses TDS’s BCP protocol For  concurrency we’ll use the Task Parallel Library We’ll simplify things by making the destination tables map directly to the source tables Select statements could provide needed mapping
Populate a Star-Schema Destination
Populate a Star-Schema Source a view on the staging tables, creating a flat de-normalized row of all dimension and fact value Use convention to indicate which columns are fact or dimension tables
Populate a Star-Schema Code generate a populator Create functions, given a source row which return/create the corresponding dimension key Run it with the TPL for performance
Populate a Star-Schema Areas of improvement Add support for slowly changing dimensions Add support for lower memory requirements  query DB for dimension values on demand Measure and tune performance at the micro-benchmark level Concurrent dictionaries Partial loads
Extract to file Invoke-MSSqlCommand Export-Csv
Others?

Weitere ähnliche Inhalte

Was ist angesagt?

Managing V Sphere With The Vesi
Managing V Sphere With The VesiManaging V Sphere With The Vesi
Managing V Sphere With The VesiEric Sloof
 
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY StyleAthens Big Data
 
Reactive programming
Reactive programmingReactive programming
Reactive programmingNick Hodge
 
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Bolke de Bruin
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for PrometheusMitsuhiro Tanda
 
Spring.Net, Feb 2008, PostSharp: A Technical Introduction
Spring.Net, Feb 2008, PostSharp:  A Technical IntroductionSpring.Net, Feb 2008, PostSharp:  A Technical Introduction
Spring.Net, Feb 2008, PostSharp: A Technical Introductiongfraiteur
 
Using Amazon EC2 to Scale Your Web Application
Using Amazon EC2 to Scale Your Web ApplicationUsing Amazon EC2 to Scale Your Web Application
Using Amazon EC2 to Scale Your Web Applicationravipratapm
 
Threading Made Easy! A Busy Developer’s Guide to Kotlin Coroutines
Threading Made Easy! A Busy Developer’s Guide to Kotlin CoroutinesThreading Made Easy! A Busy Developer’s Guide to Kotlin Coroutines
Threading Made Easy! A Busy Developer’s Guide to Kotlin CoroutinesLauren Yew
 
Reactive programming and RxJS
Reactive programming and RxJSReactive programming and RxJS
Reactive programming and RxJSRavi Mone
 
Continuous deployment of Rails apps on AWS OpsWorks
Continuous deployment of Rails apps on AWS OpsWorksContinuous deployment of Rails apps on AWS OpsWorks
Continuous deployment of Rails apps on AWS OpsWorksTomaž Zaman
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAlex Palamides
 
Building Robust Pipelines with Airflow
Building Robust Pipelines with AirflowBuilding Robust Pipelines with Airflow
Building Robust Pipelines with AirflowErin Shellman
 
Serverless in azure
Serverless in azureServerless in azure
Serverless in azureVeresh Jain
 
Design Matters: Why In-Place Copy Data Management is the Right Choice
Design Matters: Why In-Place Copy Data Management is the Right Choice Design Matters: Why In-Place Copy Data Management is the Right Choice
Design Matters: Why In-Place Copy Data Management is the Right Choice Catalogic Software
 
An Introduction to the Heatmap / Histogram Plugin
An Introduction to the Heatmap / Histogram PluginAn Introduction to the Heatmap / Histogram Plugin
An Introduction to the Heatmap / Histogram PluginMitsuhiro Tanda
 
Reactive programming with rx java
Reactive programming with rx javaReactive programming with rx java
Reactive programming with rx javaCongTrung Vnit
 
Diminuendo! Tactics in Support of FaaS Migrations Slides
Diminuendo! Tactics in Support of FaaS Migrations SlidesDiminuendo! Tactics in Support of FaaS Migrations Slides
Diminuendo! Tactics in Support of FaaS Migrations SlidesSebastian Werner
 
Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013Thoughtworks
 

Was ist angesagt? (19)

Managing V Sphere With The Vesi
Managing V Sphere With The VesiManaging V Sphere With The Vesi
Managing V Sphere With The Vesi
 
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
 
Reactive programming
Reactive programmingReactive programming
Reactive programming
 
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for Prometheus
 
Spring.Net, Feb 2008, PostSharp: A Technical Introduction
Spring.Net, Feb 2008, PostSharp:  A Technical IntroductionSpring.Net, Feb 2008, PostSharp:  A Technical Introduction
Spring.Net, Feb 2008, PostSharp: A Technical Introduction
 
Using Amazon EC2 to Scale Your Web Application
Using Amazon EC2 to Scale Your Web ApplicationUsing Amazon EC2 to Scale Your Web Application
Using Amazon EC2 to Scale Your Web Application
 
Threading Made Easy! A Busy Developer’s Guide to Kotlin Coroutines
Threading Made Easy! A Busy Developer’s Guide to Kotlin CoroutinesThreading Made Easy! A Busy Developer’s Guide to Kotlin Coroutines
Threading Made Easy! A Busy Developer’s Guide to Kotlin Coroutines
 
Riding the Streaming Wave DIY style
Riding the Streaming Wave  DIY styleRiding the Streaming Wave  DIY style
Riding the Streaming Wave DIY style
 
Reactive programming and RxJS
Reactive programming and RxJSReactive programming and RxJS
Reactive programming and RxJS
 
Continuous deployment of Rails apps on AWS OpsWorks
Continuous deployment of Rails apps on AWS OpsWorksContinuous deployment of Rails apps on AWS OpsWorks
Continuous deployment of Rails apps on AWS OpsWorks
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using R
 
Building Robust Pipelines with Airflow
Building Robust Pipelines with AirflowBuilding Robust Pipelines with Airflow
Building Robust Pipelines with Airflow
 
Serverless in azure
Serverless in azureServerless in azure
Serverless in azure
 
Design Matters: Why In-Place Copy Data Management is the Right Choice
Design Matters: Why In-Place Copy Data Management is the Right Choice Design Matters: Why In-Place Copy Data Management is the Right Choice
Design Matters: Why In-Place Copy Data Management is the Right Choice
 
An Introduction to the Heatmap / Histogram Plugin
An Introduction to the Heatmap / Histogram PluginAn Introduction to the Heatmap / Histogram Plugin
An Introduction to the Heatmap / Histogram Plugin
 
Reactive programming with rx java
Reactive programming with rx javaReactive programming with rx java
Reactive programming with rx java
 
Diminuendo! Tactics in Support of FaaS Migrations Slides
Diminuendo! Tactics in Support of FaaS Migrations SlidesDiminuendo! Tactics in Support of FaaS Migrations Slides
Diminuendo! Tactics in Support of FaaS Migrations Slides
 
Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013
 

Ähnlich wie Using PowerShell to Simplify your ETL

SQL Server 2008 Integration Services
SQL Server 2008 Integration ServicesSQL Server 2008 Integration Services
SQL Server 2008 Integration ServicesEduardo Castro
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationPhilip Yurchuk
 
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginnersSQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginnersTobias Koprowski
 
Experiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure teamExperiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure teamBrian Benz
 
Linq 1224887336792847 9
Linq 1224887336792847 9Linq 1224887336792847 9
Linq 1224887336792847 9google
 
Silicon Valley Code Camp 2011: Play! as you REST
Silicon Valley Code Camp 2011: Play! as you RESTSilicon Valley Code Camp 2011: Play! as you REST
Silicon Valley Code Camp 2011: Play! as you RESTManish Pandit
 
Linq To The Enterprise
Linq To The EnterpriseLinq To The Enterprise
Linq To The EnterpriseDaniel Egan
 
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minsSparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minssparkflows
 
Building a Testable Data Access Layer
Building a Testable Data Access LayerBuilding a Testable Data Access Layer
Building a Testable Data Access LayerTodd Anglin
 
Azure DevOps for Developers
Azure DevOps for DevelopersAzure DevOps for Developers
Azure DevOps for DevelopersSarah Dutkiewicz
 
Making your RDBMS fast!
Making your RDBMS fast! Making your RDBMS fast!
Making your RDBMS fast! VictorSzoltysek
 
Introduction to Threading in .Net
Introduction to Threading in .NetIntroduction to Threading in .Net
Introduction to Threading in .Netwebhostingguy
 
Datastage Online Training @ Adithya Elearning
Datastage Online Training @ Adithya ElearningDatastage Online Training @ Adithya Elearning
Datastage Online Training @ Adithya Elearningshanmukha rao dondapati
 
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for DevelopersMSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for DevelopersDave Bost
 
Introduction to Codenvy / JugSummerCamp 2014
Introduction to Codenvy / JugSummerCamp 2014Introduction to Codenvy / JugSummerCamp 2014
Introduction to Codenvy / JugSummerCamp 2014Florent BENOIT
 
Moving from SQL Server to MongoDB
Moving from SQL Server to MongoDBMoving from SQL Server to MongoDB
Moving from SQL Server to MongoDBNick Court
 
Introduction to PowerShell - Be a PowerShell Hero - SPFest workshop
Introduction to PowerShell - Be a PowerShell Hero - SPFest workshopIntroduction to PowerShell - Be a PowerShell Hero - SPFest workshop
Introduction to PowerShell - Be a PowerShell Hero - SPFest workshopMichael Blumenthal (Microsoft MVP)
 

Ähnlich wie Using PowerShell to Simplify your ETL (20)

SQL Server 2008 Integration Services
SQL Server 2008 Integration ServicesSQL Server 2008 Integration Services
SQL Server 2008 Integration Services
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data Integration
 
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginnersSQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
 
Experiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure teamExperiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure team
 
Linq 1224887336792847 9
Linq 1224887336792847 9Linq 1224887336792847 9
Linq 1224887336792847 9
 
Silicon Valley Code Camp 2011: Play! as you REST
Silicon Valley Code Camp 2011: Play! as you RESTSilicon Valley Code Camp 2011: Play! as you REST
Silicon Valley Code Camp 2011: Play! as you REST
 
It ready dw_day3_rev00
It ready dw_day3_rev00It ready dw_day3_rev00
It ready dw_day3_rev00
 
Linq To The Enterprise
Linq To The EnterpriseLinq To The Enterprise
Linq To The Enterprise
 
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minsSparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
 
Building a Testable Data Access Layer
Building a Testable Data Access LayerBuilding a Testable Data Access Layer
Building a Testable Data Access Layer
 
Azure DevOps for Developers
Azure DevOps for DevelopersAzure DevOps for Developers
Azure DevOps for Developers
 
Making your RDBMS fast!
Making your RDBMS fast! Making your RDBMS fast!
Making your RDBMS fast!
 
Introduction to Threading in .Net
Introduction to Threading in .NetIntroduction to Threading in .Net
Introduction to Threading in .Net
 
Datastage Online Training @ Adithya Elearning
Datastage Online Training @ Adithya ElearningDatastage Online Training @ Adithya Elearning
Datastage Online Training @ Adithya Elearning
 
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for DevelopersMSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
 
Introduction to Codenvy / JugSummerCamp 2014
Introduction to Codenvy / JugSummerCamp 2014Introduction to Codenvy / JugSummerCamp 2014
Introduction to Codenvy / JugSummerCamp 2014
 
Sparkflows Use Cases
Sparkflows Use CasesSparkflows Use Cases
Sparkflows Use Cases
 
SparkFlow
SparkFlow SparkFlow
SparkFlow
 
Moving from SQL Server to MongoDB
Moving from SQL Server to MongoDBMoving from SQL Server to MongoDB
Moving from SQL Server to MongoDB
 
Introduction to PowerShell - Be a PowerShell Hero - SPFest workshop
Introduction to PowerShell - Be a PowerShell Hero - SPFest workshopIntroduction to PowerShell - Be a PowerShell Hero - SPFest workshop
Introduction to PowerShell - Be a PowerShell Hero - SPFest workshop
 

Kürzlich hochgeladen

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Kürzlich hochgeladen (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Using PowerShell to Simplify your ETL

  • 1. Scott Weinstein Principal Lab49 weblogs.asp.net/sweinstein / @ScottWeinstein Using PowerShell to simplify your ETL
  • 6. SSIS Considered Harmful Visual programming is a terrible interface for ETL work History of UI visual designers is instructive here No way to copy/share “code samples” Low information density Even worse It’s not even a good designer Poor defaults, poor layout Nested menus of property grids are hard to use No real VCS ability Sure it’s XML, but… No diff No merge Demo ?
  • 7. But what about all the features? Some are trivial: Foreach Loop Container Execute SQL Task Send Mail Task OMG! Many are useful, but not hard to recreate Demos soon Others may be worth consideration Fuzzy matching
  • 8. But what about performance The advantages for data flow are real, but not that large A 5M row test on my home machine gave a 12% performance advantage to SSIS over PowerShell + SqlBulkCopy
  • 9. The Answer PowerShell A general purpose scripting and automation language Wide adoption MS: Windows Server, Windows 7, SQL Server 2008, VM Manager, IIS, Deployment Toolkit Others: VMWare, Quest, IBM Websphere
  • 10. PowerShell Features Takes the best of Unix, VMS, Perl Adds in integration with .Net, COM, WMI Adds in REPL environment Full scripting/programming Flexible typing model Pipes Regular expressions XML Navigation Providers Built in command parser Standardized syntax for commands
  • 11. Introducing PSIS http://psis.codeplex.com/ Yes, the name is derivative Right now, just a playground for some demos Bulk Data Transfer Star-Schema Populator Execute SQL
  • 12. PSIS demo Just the ETL No logging, error reporting, etc Some typical ETL tasks Clear staging tables Bulk transfer data from one DB to another Populate a star schema Fact table and dimension tables Extract data to a .csv file
  • 13. Clear staging tables Use Invoke-MSSqlCommand Use Truncate <tablename>
  • 14. Bulk transfer At core, we’ll use .Net’sSqlBulkCopy under the covers uses TDS’s BCP protocol For concurrency we’ll use the Task Parallel Library We’ll simplify things by making the destination tables map directly to the source tables Select statements could provide needed mapping
  • 15. Populate a Star-Schema Destination
  • 16. Populate a Star-Schema Source a view on the staging tables, creating a flat de-normalized row of all dimension and fact value Use convention to indicate which columns are fact or dimension tables
  • 17. Populate a Star-Schema Code generate a populator Create functions, given a source row which return/create the corresponding dimension key Run it with the TPL for performance
  • 18. Populate a Star-Schema Areas of improvement Add support for slowly changing dimensions Add support for lower memory requirements query DB for dimension values on demand Measure and tune performance at the micro-benchmark level Concurrent dictionaries Partial loads
  • 19. Extract to file Invoke-MSSqlCommand Export-Csv