Azure Data Factory is a hybrid data integration service in Azure that lets you create, manage & operate data pipelines. It is a serverless orchestrator for pipelines that move, transform and load data; a fully managed Extract, Transform, Load (ETL) & Extract, Load, Transform (ELT) service, if you will.
In this talk I'll cover the basics of Azure Data Factory and show you how you can create, manage & operate data pipelines.
4. Hi!
Tom Kerkhove
• Azure Consultant at Codit
• Microsoft Azure MVP & Advisor
• Belgian Azure User Group (AZUG)
blog.tomkerkhove.be
@TomKerkhove
tomkerkhove
8. ➔ Managed data orchestration service
➔ Allows you to run pipelines
➔ Execute SSIS packages
➔ Support for hybrid scenarios
➔ Data movement-as-a-service with 70+ connectors
➔ Visual tooling & programmability
➔ .NET, Python, REST, ARM
What is Azure Data Factory?
9. What is Azure Data Factory?
[Diagram: one or more triggers start a pipeline, which contains multiple activities]
10. ➔ A pipeline represents a business process with multiple “steps”,
which are represented by activities, and is started by a trigger
➔ Activities represent the steps in a business process; each performs
a specific action.
➔ Activities can be chained: whether an activity runs depends on the
outcome of the previous step, which can be success, failure, skipped or completion
What is Azure Data Factory?
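The pipeline model on this slide can be sketched as a pipeline definition, written here as a Python dictionary that mirrors the JSON shape Data Factory uses (`dependsOn` with `dependencyConditions`). The pipeline and activity names are hypothetical, and `should_run` is only a simplified illustration of how dependency conditions behave, not part of any SDK.

```python
# Hypothetical pipeline definition; all names are illustrative.
pipeline = {
    "name": "DailyExportPipeline",
    "properties": {
        "activities": [
            {"name": "WaitForSource", "type": "Wait",
             "typeProperties": {"waitTimeInSeconds": 5}},
            {"name": "ProcessData", "type": "Wait",
             "typeProperties": {"waitTimeInSeconds": 1},
             "dependsOn": [{"activity": "WaitForSource",
                            "dependencyConditions": ["Succeeded"]}]},
            {"name": "NotifyOnFailure", "type": "Wait",
             "typeProperties": {"waitTimeInSeconds": 1},
             "dependsOn": [{"activity": "WaitForSource",
                            "dependencyConditions": ["Failed"]}]},
            {"name": "Cleanup", "type": "Wait",
             "typeProperties": {"waitTimeInSeconds": 1},
             "dependsOn": [{"activity": "ProcessData",
                            "dependencyConditions": ["Completed"]}]},
        ]
    },
}


def should_run(activity, outcomes):
    """Simplified check: every dependency must have an accepted outcome."""
    for dependency in activity.get("dependsOn", []):
        outcome = outcomes.get(dependency["activity"])
        accepted = set(dependency["dependencyConditions"])
        # "Completed" means the previous step finished, whether it
        # succeeded or failed.
        if "Completed" in accepted:
            accepted |= {"Succeeded", "Failed"}
        if outcome not in accepted:
            return False
    return True


activities = {a["name"]: a for a in pipeline["properties"]["activities"]}
print(should_run(activities["NotifyOnFailure"], {"WaitForSource": "Failed"}))
```

Here `NotifyOnFailure` only runs when `WaitForSource` fails, while `Cleanup` runs once `ProcessData` has finished either way.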
11. ➔ Different types of triggers
➔ On-Demand (Via REST API, .NET, etc.)
• Azure API Management can make this easier
➔ Scheduled / Wall-clock
➔ Tumbling Windows (aka “data slicing”)
➔ Event-based (New file is added to blob storage)
➔ Support for passing parameters
Triggers
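To make the tumbling-window idea concrete, here is a minimal sketch of how a time range is cut into contiguous, non-overlapping slices, each of which would drive one pipeline run. It does not call Data Factory; the `tumbling_windows` helper and the `windowStart` / `windowEnd` parameter names are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def tumbling_windows(start, end, size):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs."""
    if size <= timedelta(0):
        raise ValueError("size must be positive")
    window_start = start
    # Only complete windows are produced; a trailing partial slice is skipped.
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


start = datetime(2018, 1, 1, tzinfo=timezone.utc)
windows = list(
    tumbling_windows(start, start + timedelta(hours=3), timedelta(hours=1))
)

# Each slice would be passed to one pipeline run as parameters
# (hypothetical parameter names).
runs = [
    {"windowStart": s.isoformat(), "windowEnd": e.isoformat()}
    for s, e in windows
]
```

Because every window starts exactly where the previous one ends, each piece of data belongs to exactly one run, which is what makes re-running a single slice safe.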
13. ➔ Data Movement
➔ Azure, Databases, NoSQL, File, SaaS, Web, etc.
➔ Data Transformation
➔ Pig, Hive, Stored Procedure, U-SQL, ML, Spark, MapReduce, etc.
➔ Control Flow
➔ Web call, Lookup, Get Metadata, If, Wait, ForEach, Execute Pipeline, etc.
➔ Custom
➔ Run commands on an Azure Batch cluster
➔ Run R scripts on an HDInsight cluster
Activities
14. ➔ An activity can produce or consume a data set, which is a
representation of a data structure in a data store that can be
used as a source or sink.
➔ Linked Services define how an activity can connect to an
external system. This external system can be a data store or
a compute resource.
What is Azure Data Factory?
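The relationship between activities, data sets and linked services can be sketched with definitions shaped like the JSON Data Factory uses, again written as Python dictionaries. All names, containers and file names are hypothetical, the connection string is a placeholder, and the `blob_dataset` and `store_type_for` helpers exist only for this illustration.

```python
# Linked service: how to connect to an external system (placeholder secret).
linked_service = {
    "name": "BlobStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<connection-string>"},
    },
}


def blob_dataset(name, container, file_name):
    """Build a data set that points at a file reachable via the linked service."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": linked_service["name"],
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "fileName": file_name,
                }
            },
        },
    }


datasets = {
    d["name"]: d
    for d in (
        blob_dataset("RawOrders", "raw", "orders.csv"),
        blob_dataset("CuratedOrders", "curated", "orders.csv"),
    )
}
linked_services = {linked_service["name"]: linked_service}

# Copy activity: consumes one data set (source) and produces another (sink).
copy_activity = {
    "name": "CopyOrders",
    "type": "Copy",
    "inputs": [{"referenceName": "RawOrders", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "CuratedOrders", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},
        "sink": {"type": "DelimitedTextSink"},
    },
}


def store_type_for(dataset_reference):
    """Follow activity -> data set -> linked service and return the store type."""
    dataset = datasets[dataset_reference["referenceName"]]
    name = dataset["properties"]["linkedServiceName"]["referenceName"]
    return linked_services[name]["properties"]["type"]
```

The activity never holds connection details itself: it only names data sets, and each data set names the linked service that knows how to reach the store.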
15. What is Azure Data Factory?
[Diagram: an Activity produces and consumes a Data Set; a Data Set represents data stored in a Linked Service]
16. ➔ Compute infrastructure used by Data Factory
➔ Azure, Azure-SSIS or Self-Hosted (Any cloud or on-prem)
➔ Core capabilities
➔ Data movement
➔ Pipeline activity execution
➔ SSIS package execution
➔ Pipelines issue command & control, the integration runtime executes
➔ Data movement is from IR to IR
➔ All execution happens in the sources & sinks
Integration Runtime (IR)
19. ➔ Stores SSISDB in Azure SQL DB or Managed Instance
➔ Azure-SSIS integration runtime as compute-layer
➔ Compute part for running SSIS
➔ Managed cluster of Azure VMs
➔ Can be linked to VNET for hybrid scenarios
➔ Lift & shift packages to the cloud
Running SSIS packages in Azure
21. ➔ Native support for Managed Service Identity (MSI)
➔ Native integration with Azure Key Vault
➔ Encrypted-in-transit via HTTPS
➔ Supports encryption-at-rest with data stores
Security
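As an illustration of the Key Vault integration, a linked service can reference a secret instead of embedding it. The sketch below follows the `AzureKeyVaultSecret` reference shape used in linked service JSON, written as Python dictionaries. The vault, server, database and secret names are placeholders, and `secret_names` is a hypothetical helper for this example only.

```python
# Linked service pointing at the vault itself (placeholder vault name).
key_vault_linked_service = {
    "name": "KeyVaultLinkedService",
    "properties": {
        "type": "AzureKeyVault",
        "typeProperties": {"baseUrl": "https://<vault-name>.vault.azure.net"},
    },
}

# SQL linked service: the password is not stored inline but looked up
# in Key Vault when the connection is made.
sql_linked_service = {
    "name": "SqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": (
                "Server=tcp:<server>.database.windows.net,1433;"
                "Database=<database>;User ID=<user>;"
            ),
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "KeyVaultLinkedService",
                    "type": "LinkedServiceReference",
                },
                "secretName": "sql-password",
            },
        },
    },
}


def secret_names(linked_service):
    """Collect the Key Vault secret names a linked service refers to."""
    properties = linked_service["properties"]["typeProperties"]
    return [
        value["secretName"]
        for value in properties.values()
        if isinstance(value, dict) and value.get("type") == "AzureKeyVaultSecret"
    ]
```

Keeping the secret in the vault means rotating it does not require redeploying the linked service, as long as the secret name stays the same.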
23. ➔ Every user should be capable of requesting their data
Using Azure Serverless to become GDPR compliant
[Diagram labels: User Profile information, StackExchange Data Set, Kerkhove.tom@gmail.com]
25. ➔ Visual monitoring in the portal
➔ Monitoring per pipeline run
➔ Detailed information per activity
➔ Azure Monitor integration
➔ Diagnostic Logs
➔ Metrics
➔ Alerts
Monitoring
26. ➔ Serverless orchestration
➔ Pay for what you use
➔ Data-centric vs Application-centric workflows
➔ Work together seamlessly
How is this different from Logic Apps?
27. ➔ Azure Data Factory is a great way to orchestrate data
processes and build data-integration pipelines
➔ Very powerful for data-centric workloads
➔ Unsung hero in the serverless space
➔ A perfect match with Azure Logic Apps
➔ Allows you to get to market very quickly with the built-in
connectors
Conclusion