Developing an ETL is a part of challenge but the real challenge lies
in developing an optimized one. In this very blog I will try to explain
different performance optimizations technique that can be incorporated
in ETL packages so as to run ETL in effective and efficient way. I have
collected the material by reading MSBI books, googling out different
blogs and from my own experience with ETL.
Here we goes..
- At connections level: By default, the network
packet size has 0 value which means 4KB of data will be transferred at a
time. Change this value to 32767 so as to enable 32 KB of data transfer
to be transferred at one time

2.
Using ‘Fast Parse’ property: This option can be used for very fast loading of data from flat files. To enable this option, perform following steps:
a. Right click the flat file source, and click ‘Show Advance Editor’:

b. Go to ‘Input and Output properties’ tab. In the Inputs and
Outputs, expand the Output columns and select the column for which you
want to enable ‘fast parse’. After selecting the column, go to Custom
Properties and set ‘FastParse’ property to ‘True’ as shown:

3.
Avoid ‘Select *’: Instead of using ‘Select *’
while pulling data from SQL server table, use the column list and try to
pull only required columns of table. SSIS uses buffer oriented
architecture to transfer data wherein data from source first comes in
buffers, required transformations are performed and then the data is
dumped to destinations. However, size of the buffer depends on the
estimated row size which is calculated as the maximum size of all the
columns in a row. So if there are more number of columns in a row, they
will occupy more space in buffer resulting in less number of rows in
buffer. Therefore, always try to pull only required columns not all.
4.
Avoid Asynchronous transformations: As far as
possible, try to avoid them. Asynchronous transformations like sort
needs to read and process all input data before creating any output
records. As a result, data transfer process gets slower.
5.
Considering data types: The data types should be
as narrow as possible since they will be allocated less memory. Allocate
the required number of bytes to the columns.
6.
Indexes: If you are pulling large volume of data,
then drop all the indexes if exists. After your load is complete, then
re-create the indexes
7.
OLEDB Destination settings:
- Keep Identity: By default, this property is unchecked, which means
source identity values will not be copied to destination so destination
table will create its own identity values if it has an identity column.
If you check this property, then DFT engine will preserve the values of
source identity and pass it to destination. If possible, keep it
unchecked
- Check Constraints: By default, this property is checked. If you are
sure that incoming data will not violate destination table constraints
then uncheck this property. This setting specifies that the dataflow
pipeline engine will validate the incoming data against the constraints
of target table. If you un-check this option it will improve the
performance of the data load.
- Rows per batch: By default, the value of this property is -1 or
empty which means all the incoming rows will be treated as single batch.
Change the value of this property to a positive value that specifies
the number of rows in batch.
- Insert commit size: By default, the value is ‘2147483647’ as you
can see in the above screenshot which means all the incoming data will
be committed in one shot after successful completion of the DFT. We can
specify a positive value for this setting to indicate that commit will
be done for those number of records. Changing the default value for this
setting will put overhead on the dataflow engine to commit several
times but at the same time it will release the pressure on the
transaction log and tempdb to grow specifically during high volume data
transfers.
8.
Using Delay Validation property: By default, this
property is set to ‘False’. SSIS performs early validation that
validates the package and its components before starting execution of
package. If you are using some tasks that are dynamically passing some
values to other tasks then set this property to ‘True’ for such tasks
otherwise your package will fail.
9.
Use Performance Counters to monitor SSIS performance
10.
Using checkpoints:
11.
Parallel execution of tasks can also lead to performance benefits