Mastering Integration Runtimes in Azure Data Factory and Synapse Analytics
The Integration Runtime (IR) is the backbone of Azure Data Factory and Azure Synapse Analytics, providing the compute infrastructure to enable a wide range of data integration capabilities across different network environments. Whether you’re moving data between cloud data stores, dispatching transformation activities, or natively executing SSIS packages, the IR is the critical component that makes it all possible.
This guide covers the three types of integration runtime, their capabilities, the network environments they support, and how to determine which one to use for your specific data integration needs.
Azure Integration Runtime
The Azure integration runtime is fully managed, serverless compute in Azure that can:
- Run Data Flows in Azure
- Run copy activities between cloud data stores
- Dispatch a wide range of transformation activities in the public cloud
Azure IR Network Environment
The Azure IR supports connecting to data stores and compute services with publicly accessible endpoints. When the Managed Virtual Network feature is enabled, the Azure IR can also connect to data stores through private endpoints (Azure Private Link) in a private network environment. Keeping traffic on private endpoints, combined with creating the Azure IR in a specific region, helps you meet strict data compliance requirements such as keeping data within a certain geography.
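To make the Managed Virtual Network option more concrete, here is a minimal sketch that builds the kind of JSON definition an Azure IR in a managed virtual network typically has. The helper name `managed_vnet_azure_ir` is hypothetical, and the field names reflect the commonly exported definition shape as an assumption; compare against a definition exported from your own factory before relying on it.

```python
import json


def managed_vnet_azure_ir(name: str, region: str = "AutoResolve") -> dict:
    """Build an Azure IR definition that references a managed virtual network.

    Assumption: the field names mirror the JSON shape exported by the
    service; "default" is assumed to be the managed virtual network name.
    """
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                # A fixed region pins where the activity runs;
                # "AutoResolve" leaves the choice to the service.
                "computeProperties": {"location": region}
            },
            "managedVirtualNetwork": {
                "type": "ManagedVirtualNetworkReference",
                "referenceName": "default",
            },
        },
    }


# Example: an IR pinned to one region (the IR name is illustrative only).
example_ir = managed_vnet_azure_ir("ManagedVnetIR", "West Europe")
print(json.dumps(example_ir, indent=2))
```

Pinning the region rather than using auto-resolve is the part that relates to geography; the managed virtual network reference is what enables private endpoints.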
Azure IR Compute and Scaling
The Azure IR provides elastic, pay-as-you-go compute power to move data between cloud data stores securely, reliably, and at high performance. You can specify the number of Data Integration Units to use for copy activities, and the Azure IR will automatically scale up the compute as needed, without requiring any manual intervention from you.
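As an illustration of specifying Data Integration Units, the sketch below assembles a copy activity payload with an optional `dataIntegrationUnits` setting. The function name `copy_activity` and the dataset names are hypothetical, and the binary source and sink types are chosen only to keep the example short; treat the payload shape as an assumption to check against your own pipeline JSON.

```python
from typing import Optional


def copy_activity(name: str, source_dataset: str, sink_dataset: str,
                  diu: Optional[int] = None) -> dict:
    """Build a copy activity payload, optionally requesting a DIU count.

    Assumption: omitting the setting leaves the DIU choice to the service.
    """
    activity = {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "BinarySource"},
            "sink": {"type": "BinarySink"},
        },
    }
    if diu is not None:
        # Only emit the property when the caller asked for a specific value.
        activity["typeProperties"]["dataIntegrationUnits"] = diu
    return activity


# Example: request 8 DIUs for a blob-to-lake copy (names are illustrative).
example_activity = copy_activity("CopyBlobToLake", "SourceBlobs", "LakeFiles", diu=8)
```

Leaving `diu` unset is usually a reasonable starting point; an explicit value is mainly useful when you want predictable cost or throughput for a known workload.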
For activities like Lookup, GetMetadata, and activity dispatching, the Azure IR handles these lightweight operations without the need to scale up the compute size.
Self-Hosted Integration Runtime
If you need to access data stores in a private network environment that doesn’t have direct line-of-sight from the public cloud, the self-hosted IR is the answer. This runtime is installed on an on-premises machine or virtual machine and is capable of:
- Running copy activities between cloud data stores and on-premises/VNet data stores
- Dispatching a variety of transformation activities against compute resources in the private network
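An activity reaches a private data store through the self-hosted IR because the store's linked service points at that runtime. The sketch below shows one way such a linked service definition might be assembled with a `connectVia` reference. The helper name, the IR name, and the placeholder connection string are all hypothetical.

```python
def linked_service_via_ir(name: str, ls_type: str,
                          type_properties: dict, ir_name: str) -> dict:
    """Build a linked service definition that connects through a named IR.

    Assumption: the payload follows the usual linked service JSON shape,
    where "connectVia" carries an integration runtime reference.
    """
    return {
        "name": name,
        "properties": {
            "type": ls_type,
            "typeProperties": type_properties,
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


# Example: an on-premises SQL Server reached through a self-hosted IR.
# The connection string is a placeholder, not a working value.
on_prem_sql = linked_service_via_ir(
    name="OnPremSqlServer",
    ls_type="SqlServer",
    type_properties={"connectionString": "<your-connection-string>"},
    ir_name="MySelfHostedIR",
)
```

If `connectVia` is left out of a linked service, the default Azure IR is typically used, which cannot reach stores without a public endpoint.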
Self-Hosted IR Network Environment
The self-hosted IR only makes outbound HTTP-based connections to the internet, allowing you to securely integrate data in private network environments behind firewalls or within virtual private networks.
Self-Hosted IR Compute and Scaling
You install the self-hosted IR on a Windows machine in your private network. For high availability and scalability, you can associate the logical self-hosted IR instance with multiple on-premises machines in an active-active configuration.
Azure-SSIS Integration Runtime
If you have an existing investment in SQL Server Integration Services (SSIS) packages, you can lift and shift those workloads to Azure by creating an Azure-SSIS integration runtime. This fully managed cluster of Azure VMs allows you to natively execute your SSIS packages in the cloud.
Azure-SSIS IR Network Environment
The Azure-SSIS IR can be provisioned in either a public network or a private network. If you have on-premises data sources or destinations, you can join the Azure-SSIS IR to a virtual network that is connected to your on-premises network.
Azure-SSIS IR Compute and Scaling
The Azure-SSIS IR is a dedicated cluster of Azure VMs for running your SSIS packages. You can bring your own Azure SQL Database or SQL Managed Instance to host the SSIS catalog (SSISDB), and you can scale the compute power by adjusting the node size and number of nodes in the cluster.
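The two scaling levers described above, node size and node count, can be pictured as two fields in the runtime definition. The following sketch adjusts them on a copy of a definition. The helper `scale_ssis_ir`, the sample values, and the placeholder catalog endpoint are hypothetical, and the field names are an assumption about the definition shape rather than a confirmed schema.

```python
import copy
from typing import Optional

# Illustrative Azure-SSIS IR definition; all values are placeholders.
base_ssis_ir = {
    "name": "MyAzureSsisIR",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "East US",
                "nodeSize": "Standard_D4_v3",
                "numberOfNodes": 2,
            },
            "ssisProperties": {
                # SSISDB is hosted on your own Azure SQL Database or
                # SQL Managed Instance.
                "catalogInfo": {"catalogServerEndpoint": "<your-server-endpoint>"}
            },
        },
    },
}


def scale_ssis_ir(definition: dict, node_size: Optional[str] = None,
                  node_count: Optional[int] = None) -> dict:
    """Return a copy of the definition with a new node size and/or count."""
    scaled = copy.deepcopy(definition)  # leave the caller's definition untouched
    compute = scaled["properties"]["typeProperties"]["computeProperties"]
    if node_size is not None:
        compute["nodeSize"] = node_size  # scale up: larger VMs
    if node_count is not None:
        if node_count < 1:
            raise ValueError("node_count must be at least 1")
        compute["numberOfNodes"] = node_count  # scale out: more VMs
    return scaled


# Example: scale out from 2 to 4 nodes without changing the node size.
scaled_out = scale_ssis_ir(base_ssis_ir, node_count=4)
```

Scaling out tends to help when many packages run in parallel, while a larger node size tends to help individual packages that are compute- or memory-bound.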
Determining the Right Integration Runtime
When the linked services used by a single activity reference more than one type of integration runtime, one runtime is chosen for the whole activity according to this order of precedence:
- A self-hosted integration runtime takes precedence over an Azure integration runtime in an Azure Data Factory or Synapse workspace managed virtual network.
- An Azure integration runtime in a managed virtual network takes precedence over the global Azure integration runtime.
For example, if a copy activity has a source linked service using a self-hosted IR and a sink linked service using an Azure IR in a managed virtual network, the self-hosted IR will be used for the entire copy operation.
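The precedence rules above can be modeled in a few lines. This is only an illustrative model of the ordering described here, not a call into any Azure SDK; the names `IRKind` and `effective_ir` are hypothetical.

```python
from enum import IntEnum
from typing import Iterable


class IRKind(IntEnum):
    """Integration runtime kinds, ordered from lowest to highest precedence."""
    GLOBAL_AZURE = 0
    MANAGED_VNET_AZURE = 1
    SELF_HOSTED = 2


def effective_ir(kinds: Iterable[IRKind]) -> IRKind:
    """Return the IR kind that wins for an activity.

    `kinds` holds the IR kind referenced by each linked service
    in the activity (for example, source and sink of a copy).
    """
    kinds = list(kinds)
    if not kinds:
        raise ValueError("an activity must reference at least one integration runtime")
    # The highest-precedence kind is used for the entire activity.
    return max(kinds)


# The copy example from the text: self-hosted source, managed VNet sink.
winner = effective_ir([IRKind.SELF_HOSTED, IRKind.MANAGED_VNET_AZURE])
print(winner.name)
```

Encoding the order as integer values keeps the rule in one place, so the selection reduces to taking the maximum.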
By understanding the capabilities and network environment support for each integration runtime type, you can make an informed decision on which one to use for your specific data integration requirements, ensuring optimal performance, security, and compliance.