Something i wanted to blog about for a long time now was about SSIS Performance enhancements and monitoring.
SQL Server Integration Services (SSIS) as it is called is a upgrade from from DTS (Data Transformation Services).
Very Simply : The SSIS Engine is architecturally divided into 2 components (Runtime Engine and Dataflow Engine). The Runtime Engine component controls the execution of tasks and the Dataflow Engine controls/manages the data pipeline.
BUFFER: The data flow engine uses a buffer-oriented architecture to efficiently load and manipulate datasets in memory (BOL).
As the Data Flows through the pipeline SSIS tries to reuse the data from previous buffers as much as possible when additional operations are performed. how Buffers are used/reused depend upon the the type of transformation as : Row Transformations / Partially blocking transformations / blocking transformations.
Choosing your components wisely and a good architecture of your Transformation environment can help improve SSIS performance.
Row Transformation : Derived Column, Data Conversion, Multicast, and Lookup are some components used in this transformation.The output rows match the input rows. They reuse existing buffers and do no need to create new buffers for data transformations.
Partially blocking transformations : used to combine datasets and can have multiple data inputs. Their number of input records normally do not match their number of output records, so these transformations are called asynchronous transformations. Merge, Merge Join, and Union All components are partially blocking as their outputs are copied into new buffers and a new thread is used/initialized in the data flow.
Blocking transformations : These kind of transformation, must read and process all data/rows before any output can be written. These perform the most work and also are heavy workers thus consuming resources.Aggregate and Sort components are known as blocking transformations. Every time a new buffer is created for the output.
EXECUTION TREES:
you can trim down or re-organize the Execution Trees to evaluate performance conditions. Execution trees normally project how to buffers are going to be used.
Use Containers to distinguish between different data flow tasks.
Consider Running Trees/Tasks in parallel rather then serially, this will make use of the parallelism Feature of SSIs and Improve performance.
Use row transformations where possible to limit the number of buffers that are created and used.
Remove Unnecessary columns, that are not going to be used in the transformations.
Configure the DataTypes correctly. This will help reduce the Source Data Set.
Check the settings for DefaultMaxBufferRows and DefaultMaxBufferSize. Start with default and evaluate as to what should be the best buffer size for the execution plan. (This is recommended for complex Packages or the package using the blocking transformation Types, i.e. to tweak this setting to allow data to be collected into buffers as much as possible. There can also be adverse affects of this setting is not used properly.)
Work with parallelism as i described above: MaxConcurrentExecutables setting helps you play with your package Thread setting. The default is –1, which is the logical machine processor+2. So depending on your workload, tweak this setting, which can help you gain performance improvement in your package.
Let maximum of your Tasks/Data Flows Run in parallel if they are not dependent on the previous or the next task.
TROUBLESHOOTING:
If your package is running slow, here is something you can check.
Check the source and targets to evaluate their performance. i.e. if it is a text file source, then dump the file into another temp table to see the speed of execution.
Similarly check the target load speed, based on some other dummy values, to check the execution speed.
The Dataflow Tasks in the middle of the Source and Target have to be dissected individually and checked for performance conditions.
Work with one transformation at a Time to evaluate its speed of execution.
check the underlying table for lengthy varchar/char datatypes, if your are loading 100K+ rows into several tables and it is taking ages. Cut the varchar/Char Datatype Length to something reasonable. example: if loading 20 columns with 100K+ rows and each is varchar(50), it will surely take more than 5 mins to load the package, so work with the team to get down the lengths to a reasonable limits, which will immediately improve your package performance.
Use SSIS Logging / Performance Counters / SQL Profiler to track down performance issues.