Let’s walk through some of the key enhancements — including SQL Server Integration Services (SSIS) capabilities (finally!).
The overall concept of data sets, activities and pipelines remains intact in version 2; however, the new version brings a few changes. The following table introduces the high-level differences between the two ADF versions.

| Feature | ADF version 1 | ADF version 2 |
|---|---|---|
| Data sets | Data sets refer to source and destination data stores (tables, files, folders, etc.); the availability property refers to the processing window time slice (hourly, daily, monthly) | Removed the availability feature; triggers replace this need |
| Linked services | Connection string information for external resources | Added the connectVia property to use Integration Runtimes |
| Pipelines | Logical grouping of activities with properties for start time, end time and paused state | Removed the properties for start time, end time and paused state; use triggers instead |
| Activities | Actions to perform within a pipeline; data movement and transformation | Added control flow activities |
| Hybrid data movement | Data Management Gateways orchestrate on-premises to cloud data transfer | Integration Runtimes: Azure (cloud only); Self-hosted (hybrid on-premises and cloud); Azure-SSIS (SSIS execution) |
| Parameters | N/A | Key-value parameters passed to pipeline activities via manual execution or triggers |
| Expressions | Built-in system variables and functions | JSON values that are evaluated at runtime and return another JSON value |
| Pipeline runs | N/A | Single instance of pipeline execution, assigned a unique GUID value |
| Activity runs | N/A | Instance of activity execution within a pipeline |
| Trigger runs | N/A | Instance of trigger execution |
| Scheduling | Pipeline start and end time, data set availability | Trigger executions |

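
To make the parameter and expression rows more concrete, here is a minimal sketch of a version 2 pipeline definition, built as a Python dictionary and printed as JSON. The pipeline, activity, data set and parameter names (`CopyPipeline`, `CopyFromBlob`, `InputDataset`, `OutputDataset`, `inputPath`, `path`) are hypothetical, and the layout follows the version 2 JSON format as I understand it, so check it against the current schema before relying on it.

```python
import json

# Hypothetical version 2 pipeline definition; every name here is illustrative.
pipeline = {
    "name": "CopyPipeline",
    "properties": {
        # Key-value parameters supplied when the pipeline is run,
        # either manually or by a trigger.
        "parameters": {"inputPath": {"type": "String"}},
        "activities": [
            {
                "name": "CopyFromBlob",
                "type": "Copy",
                "inputs": [
                    {
                        "referenceName": "InputDataset",
                        "type": "DatasetReference",
                        # An expression: a JSON string value that the service
                        # evaluates at run time to produce another value.
                        "parameters": {"path": "@pipeline().parameters.inputPath"},
                    }
                ],
                "outputs": [
                    {"referenceName": "OutputDataset", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink": {"type": "BlobSink"},
                },
            }
        ],
        # There are no start time, end time or paused properties here;
        # in version 2 scheduling is left to triggers.
    },
}

print(json.dumps(pipeline, indent=2))
```

The point of the sketch is what is absent as much as what is present: the pipeline carries parameters and expressions, but nothing about when it runs.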
One of the larger changes is the move away from time slices and data set availability toward a more traditional ETL scheduling approach. Instead of an activity waiting for a data set to become available while a pipeline is executing, the pipeline itself is triggered and kicks off the activity, regardless of the state of the data set.
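As a rough illustration of trigger-based scheduling, the sketch below builds a schedule trigger definition that runs a pipeline every hour and passes it a parameter. The trigger name, pipeline name, parameter and start time (`HourlyTrigger`, `CopyPipeline`, `inputPath`, `2018-01-01T00:00:00Z`) are made up for the example, and the JSON shape reflects my understanding of the version 2 schedule trigger format rather than an authoritative definition.

```python
import json

# Hypothetical schedule trigger; names and times are illustrative only.
trigger = {
    "name": "HourlyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            # The recurrence replaces the version 1 idea of hourly time slices.
            "recurrence": {
                "frequency": "Hour",
                "interval": 1,
                "startTime": "2018-01-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        # The trigger points at the pipeline(s) it starts and supplies
        # values for their parameters.
        "pipelines": [
            {
                "pipelineReference": {
                    "type": "PipelineReference",
                    "referenceName": "CopyPipeline",
                },
                "parameters": {"inputPath": "input/hourly"},
            }
        ],
    },
}

print(json.dumps(trigger, indent=2))
```

Note that the schedule lives entirely on the trigger: the pipeline it references needs no knowledge of when, or how often, it will be run.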
The Integration Runtime (IR) is the compute infrastructure that ADF version 2 uses for data movement, activity execution and SSIS package execution. It provides the bridge between an activity and the linked services that the activity references. Each linked service references an IR, which supplies the compute environment where the activity runs, in the region nearest the target data store for the most efficient performance.
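The relationship between a linked service and its IR can be sketched as follows. This is a minimal example that assumes an on-premises SQL Server source reached through a self-hosted IR; the names `OnPremSqlServer` and `MySelfHostedIR` and the placeholder connection string are hypothetical, and the JSON layout is my reading of the version 2 linked service format.

```python
import json

# Hypothetical linked service; names and connection string are placeholders.
linked_service = {
    "name": "OnPremSqlServer",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "<connection string>",
            }
        },
        # connectVia names the Integration Runtime that provides the compute
        # for activities using this linked service.
        "connectVia": {
            "referenceName": "MySelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}

print(json.dumps(linked_service, indent=2))
```

An activity never names the IR directly in this sketch; it reaches the IR through the linked service it uses.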
The introduction of native SSIS capabilities in ADF version 2 was a key addition to the cloud data orchestration service. It gives clients a stepping stone off their on-premises servers and toward a cloud-first strategy, rather than forcing them to completely rearchitect their existing data integration process from SSIS to ADF version 1. Along with the SSIS integration, many other features, such as control flow activities and triggers, allow for greater flexibility in pipeline executions. As ADF version 2 moves out of public preview and toward general availability, users can submit feedback to Microsoft for further enhancements to the service.