Continuous Integration and Delivery in Azure Data Factory
Continuous integration is the practice of testing each change made to your codebase automatically and as early as possible. Continuous delivery follows the testing that happens during continuous integration and pushes changes to a staging or production system.
In Azure Data Factory, continuous integration and delivery (CI/CD) means moving Data Factory pipelines from one environment (development, test, production) to another. Azure Data Factory uses Azure Resource Manager (ARM) templates to store the configuration of your various ADF entities (pipelines, datasets, data flows, and so on). There are two suggested methods to promote a data factory to another environment:
- Automated deployment using Data Factory’s integration with Azure Pipelines
- Manual upload of a Resource Manager template by using Data Factory UX integration with Azure Resource Manager
CI/CD Lifecycle
Below is a sample overview of the CI/CD lifecycle in an Azure data factory that’s configured with Azure Repos Git. For more information on how to configure a Git repository, see Source control in Azure Data Factory.
- A development data factory is created and configured with Azure Repos Git. All developers should have permission to author Data Factory resources like pipelines and datasets.
- A developer creates a feature branch to make a change. They debug their pipeline runs with their most recent changes. For more information on how to debug a pipeline run, see Iterative development and debugging with Azure Data Factory.
- After a developer is satisfied with their changes, they create a pull request from their feature branch to the main or collaboration branch to get their changes reviewed by peers.
- After a pull request is approved and changes are merged into the main branch, the changes get published to the development factory.
- When the team is ready to deploy the changes to a test or UAT (User Acceptance Testing) factory, the team goes to their Azure Pipelines release and deploys the desired version of the development factory to UAT. This deployment takes place as part of an Azure Pipelines task and uses Resource Manager template parameters to apply the appropriate configuration (see the sketch after this list).
- After the changes have been verified in the test factory, deploy to the production factory by using the next task of the pipelines release.
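As a rough illustration of that deployment step, the sketch below uses the Azure SDK for Python to deploy the ARM template that Data Factory exports on publish, applying environment-specific configuration through template parameters. The subscription, resource group, factory, and file names are placeholders, and an actual release would normally run the equivalent ARM template deployment task inside Azure Pipelines rather than a standalone script.

```python
# Minimal sketch: promote the exported ARM template to another environment.
# Assumes the azure-identity and azure-mgmt-resource packages; all names are placeholders.
import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resource.resources.models import Deployment, DeploymentProperties

SUBSCRIPTION_ID = "<subscription-id>"      # target subscription (placeholder)
RESOURCE_GROUP = "rg-datafactory-test"     # target resource group (placeholder)

credential = DefaultAzureCredential()
client = ResourceManagementClient(credential, SUBSCRIPTION_ID)

# ARMTemplateForFactory.json is generated when the development factory is published.
with open("ARMTemplateForFactory.json") as f:
    template = json.load(f)

# Environment-specific configuration is applied through ARM template parameters,
# for example the name of the target factory.
parameters = {"factoryName": {"value": "adf-contoso-test"}}  # placeholder name

poller = client.deployments.begin_create_or_update(
    RESOURCE_GROUP,
    "adf-cicd-deployment",
    Deployment(
        properties=DeploymentProperties(
            mode="Incremental",
            template=template,
            parameters=parameters,
        )
    ),
)
poller.result()  # block until the deployment finishes
```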
Best Practices for CI/CD
If you’re using Git integration with your data factory and have a CI/CD pipeline that moves your changes from development into test and then to production, we recommend these best practices:
- Git integration: Configure only your development data factory with Git integration. Changes to test and production are deployed via CI/CD and don’t need Git integration.
- Pre- and post-deployment script: Before the Resource Manager deployment step in CI/CD, you need to complete certain tasks, like stopping and restarting triggers and performing cleanup. We recommend that you use PowerShell scripts before and after the deployment task. For more information, see Update active triggers. The Data Factory team provides a script for this purpose (see the first sketch after this list).
- Integration runtimes and sharing: Integration runtimes don’t change often and are similar across all stages in your CI/CD. So Data Factory expects you to have the same name, type, and subtype of integration runtime across all stages of CI/CD. If you want to share integration runtimes across all stages, consider using a ternary factory just to contain the shared integration runtimes.
- Managed private endpoint deployment: If a private endpoint already exists in a factory and you try to deploy an ARM template that contains a private endpoint with the same name but with modified properties, the deployment will fail. You can override it by parameterizing that property and providing the respective value during deployment.
- Key Vault: When you use linked services whose connection information is stored in Azure Key Vault, we recommend keeping separate key vaults for different environments and configuring separate permission levels for each key vault (see the second sketch after this list).
- Resource naming: Because of ARM template constraints, deployment issues can arise if your resource names contain spaces. The Azure Data Factory team recommends using ‘_’ or ‘-’ characters instead of spaces in resource names.
- Altering the repository: ADF manages the Git repository content automatically. Manually altering or adding unrelated files or folders in the ADF Git repository data folder can cause resource-loading errors.
- Exposure control and feature flags: When working in a team, there are instances where you may merge changes but don’t want them to run in elevated environments such as PROD and QA. The ADF team recommends using the DevOps concept of feature flags, where you can combine global parameters and the If Condition activity to hide sets of logic based on these environment flags (see the third sketch after this list).
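The script the Data Factory team provides for the pre- and post-deployment steps is a PowerShell script; the sketch below only illustrates the core idea in Python, assuming the azure-identity and azure-mgmt-datafactory packages and placeholder resource names: stop every started trigger before the ARM deployment and restart the same triggers afterwards.

```python
# Minimal sketch of the pre-/post-deployment idea: stop active triggers before the
# ARM deployment and restart them afterwards. Assumes azure-identity and
# azure-mgmt-datafactory; resource names are placeholders, and the official
# PowerShell script also handles cleanup of removed resources.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-datafactory-test"
FACTORY_NAME = "adf-contoso-test"

credential = DefaultAzureCredential()
adf = DataFactoryManagementClient(credential, SUBSCRIPTION_ID)

def stop_active_triggers():
    """Pre-deployment: stop every trigger that is currently started."""
    started = [
        t.name
        for t in adf.triggers.list_by_factory(RESOURCE_GROUP, FACTORY_NAME)
        if getattr(t.properties, "runtime_state", None) == "Started"
    ]
    for name in started:
        adf.triggers.begin_stop(RESOURCE_GROUP, FACTORY_NAME, name).result()
    return started

def restart_triggers(names):
    """Post-deployment: restart the triggers that were stopped earlier."""
    for name in names:
        adf.triggers.begin_start(RESOURCE_GROUP, FACTORY_NAME, name).result()

# Usage around the ARM template deployment step:
# previously_started = stop_active_triggers()
# ... run the Resource Manager deployment ...
# restart_triggers(previously_started)
```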
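For the Key Vault recommendation, the intent is that each environment has its own vault while the linked service definitions stay identical, with only the vault URL differing per environment. The sketch below, assuming the azure-mgmt-datafactory models and placeholder names such as AzureKeyVault1 and SqlConnectionPassword, shows a Key Vault linked service plus a SQL linked service that pulls its password from that vault.

```python
# Minimal sketch: one Key Vault linked service per environment, with credentials
# resolved from the vault at runtime. Assumes azure-mgmt-datafactory models;
# all names, URLs, and secret names are placeholders.
from azure.mgmt.datafactory.models import (
    AzureKeyVaultLinkedService,
    AzureKeyVaultSecretReference,
    AzureSqlDatabaseLinkedService,
    LinkedServiceReference,
    LinkedServiceResource,
)

# Per-environment vault URL; in CI/CD this value would come from an ARM template parameter.
key_vault_ls = LinkedServiceResource(
    properties=AzureKeyVaultLinkedService(
        base_url="https://kv-contoso-test.vault.azure.net/"
    )
)

# The SQL linked service never embeds the password; every environment's vault
# holds a secret with the same name, so the definition is identical across stages.
sql_ls = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(
        connection_string="Server=sql-contoso-test.database.windows.net;Database=Sales;User ID=adf_user;",
        password=AzureKeyVaultSecretReference(
            store=LinkedServiceReference(
                type="LinkedServiceReference", reference_name="AzureKeyVault1"
            ),
            secret_name="SqlConnectionPassword",
        ),
    )
)
```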
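Finally, a minimal sketch of the feature-flag pattern, again assuming the azure-mgmt-datafactory SDK: the pipeline reads an environment flag from a global parameter (for example a hypothetical enableNewLoad parameter, set differently per environment through the ARM template parameters) and routes execution through an If Condition activity, so the same published pipeline behaves differently in DEV and PROD. All names are placeholders.

```python
# Minimal sketch: gate a branch of pipeline logic behind a global parameter
# acting as a feature flag. Assumes azure-mgmt-datafactory; names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    Expression,
    IfConditionActivity,
    PipelineResource,
    WaitActivity,
)

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-datafactory-dev"
FACTORY_NAME = "adf-contoso-dev"

credential = DefaultAzureCredential()
adf = DataFactoryManagementClient(credential, SUBSCRIPTION_ID)

# "enableNewLoad" is a Bool global parameter defined on the factory; each
# environment's deployment sets its own value.
gate = IfConditionActivity(
    name="GateNewLoadLogic",
    expression=Expression(value="@pipeline().globalParameters.enableNewLoad"),
    # Placeholder activities; replace with the real logic being flagged on or off.
    if_true_activities=[WaitActivity(name="RunNewLogic", wait_time_in_seconds=1)],
    if_false_activities=[WaitActivity(name="SkipNewLogic", wait_time_in_seconds=1)],
)

adf.pipelines.create_or_update(
    RESOURCE_GROUP,
    FACTORY_NAME,
    "FeatureFlaggedPipeline",
    PipelineResource(activities=[gate]),
)
```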
Unsupported Features
- Data Factory doesn’t allow cherry-picking of commits or selective publishing of resources. A publish includes all changes made in the data factory.
- The Azure Data Factory team doesn’t recommend assigning Azure RBAC controls to individual entities (pipelines, datasets, etc.) in a data factory.
- You can’t publish from private branches, host projects on Bitbucket, or export and import alerts and metrics as parameters.
- Starting 1 November 2021, the ‘PartialArmTemplates’ folder is no longer published to the adf_publish branch. Switch to a supported deployment mechanism that uses the ‘ARMTemplateForFactory.json’ or ‘linkedTemplates’ files.
Related Content
- Continuous deployment improvements
- Automate continuous integration using Azure Pipelines releases
- Manually promote a Resource Manager template to each environment
- Use custom parameters with a Resource Manager template
- Linked Resource Manager templates
- Using a hotfix production environment
- Sample pre- and post-deployment script