I don’t know if the heading means anything, it just popped out of my mind. ok… so we are talking about pulling data from hetero systems into SQL Server using 3rd party drivers.

we all know how to get connected using OLE DB Providers, but what about ODBC and that too working against MainFrame DB2 and NonStop Tandem.. wow!

I am going to talk about about HP NONSTOP TANDEM particularly in this scenario. There are a few drivers i worked with.. Attunity software / ODBC-MX and Sequelink software. While Attunity and Sequelink are direct competitors and provide the luxury of OLE DB and ODBC drivers, which ODBC-MX as the name suggests is a Pure ODBC only.

But ODBC-MX is a version 3.X of the ODBC Family while Attunity and Sequelink are still at version 2.

So does this affect us? Well Microsoft is moving away from version 2.x of ODBC drivers starting its release of Excel 2010 versions and will only support version ODBC version 3.X.. This is my GUESS as i had been working with EXCEL and Power Pivot.

Now back to the main Topic of getting “ODBC connected through SSIS for ETL.”

First create a ODBC connection (DSN) in Administrative Tools –> ODBC

odbc_configure_mx_1

I Created a new connection in connection manager in BID-SSIS using ADO.NET Connection, then click on NEW..

I will not be able to Reveal all the settings, (you know why).. against company Law… but this is still as close as i get get to present it.

so select ODBC from the Provider List.

odbc_window_2

now select the dsn (ODBC) you created initially.

connection_mgr_3

using the username and pass, test your connection. (Mine Works all the times) :)

DataReaderSource_4

Goto DataFlow task and Pull “DataReader Source” as your Source connection.

In the Connection Manager Field use the connection you created using the SSIS connection Manager.

Now go to Component Properties: Click on the SQLCommand and we will write a sweet command, because as you guessed right, sorry no direct table selections and column mapping unless you provide column names as a select stm.

DataReaderSource_SQL_5

I am going to write a simple select as a example for this tutorial. (but the code is hidden, and you know why)

DataReaderSource_columnmapp_6

After the sql command is complete, then click on column mapping and LO!, you have a column mapping..

now go ahead a dump this data into a txt target or a OLEDB (Database Table) as a target. Rest is history.

There are situations where you might get “Another process is using the file”, Cannot Open File or Data file. I had this message with one of my SSIS Packages. Everybody was tired with their analysis and I was Pulled in to diagnose this issue.

This File is loaded through a 3 step Process. The file is generated from Mainframe and ftp’ed using another process to SQL Server Shared Folder. From Where it is picked up by the SSIS Package and Processed.(All ETL is done HERE).

The Mainframe and the FTP Process were Ruled out by the Network Engineers and Mainframe DBA’s, So it was all in WINTEL (as Usual). My diagnosis Ran me through these Steps and wanted to share this will all the Eager DBA’s.

I tried Renaming the File first to see if it was really used by another process. Yes it would not let me rename the file.

I closed the SSIS Studio and also closed all the other Open Locations (being cautious). but that did not help.

I had 2 Choices: One to keep working on to diagnose the issue OR get a Quick Fix and load the File, so that things can start moving. I did the later part first.

Let us consider the File name to be InvXXXX011_02.txt

I Copied the File (CTRL+C) and Pasted, so as a “Copy of InvXXXX011_02.txt” was created. I renamed the File to something else, in this case it was InvXXXX011_02222.txt and change my SSIS connection to point to this new File. Also changed the configuration Table value to reflect this new file name and got the package Working.

Now, it was time to diagnose the File issue.. As i did before the rename of the original File did not work, so I had to call in the Mighty Process Explorer from SYSINTERNALS. I download it on the server in question.

image

Click on Find –> Find Handle or DLL

image

Write the name of the File: In my case it was InvXXXX011_02.txt. click Search.

In my case the File was listed in the Space below and i double clicked on the File to return to the Handle Window, from where did i right clicked and Close Handle. and the File was closed.

This time i tried Running the SSIS Package with the Original names and it worked, the SSIS Package Processed without any error.

Something i wanted to blog about for a long time now was about SSIS Performance enhancements and monitoring.

SQL Server Integration Services (SSIS) as it is called is a upgrade from from DTS (Data Transformation Services).

Very Simply : The SSIS Engine is architecturally divided into 2 components (Runtime Engine and Dataflow Engine). The Runtime Engine component controls the execution of tasks and the Dataflow Engine controls/manages the data pipeline.

BUFFER: The data flow engine uses a buffer-oriented architecture to efficiently load and manipulate datasets in memory (BOL).

As the Data Flows through the pipeline SSIS tries to reuse the data from previous buffers as much as possible when additional operations are performed. how Buffers are used/reused depend upon the the type of transformation as : Row Transformations / Partially blocking transformations / blocking transformations.

Choosing your components wisely and a good architecture of your Transformation environment can help improve SSIS performance.

Row Transformation : Derived Column, Data Conversion, Multicast, and Lookup are some components used in this transformation.The output rows match the input rows. They reuse existing buffers and do no need to create new buffers for data transformations.

Partially blocking transformations : used to combine datasets and can have multiple data inputs. Their number of input records normally do not match their number of output records, so these transformations are called asynchronous transformations. Merge, Merge Join, and Union All components are partially blocking as their outputs are copied into new buffers and a new thread is used/initialized in the data flow.

Blocking transformations : These kind of transformation, must read and process all data/rows before any output can be written. These perform the most work and also are heavy workers thus consuming resources.Aggregate and Sort components are known as blocking transformations. Every time a new buffer is created for the output.

EXECUTION TREES:

you can trim down or re-organize the Execution Trees to evaluate performance conditions. Execution trees normally project how to buffers are going to be used.

Use Containers to distinguish between different data flow tasks.

Consider Running Trees/Tasks in parallel rather then serially, this will make use of the parallelism Feature of SSIs and Improve performance.

Use row transformations where possible to limit the number of buffers that are created and used.

Remove Unnecessary columns, that are not going to be used in the transformations.

Configure the DataTypes correctly. This will help reduce the Source Data Set.

Check the settings for DefaultMaxBufferRows and DefaultMaxBufferSize. Start with default and evaluate as to what should be the best buffer size for the execution plan. (This is recommended for complex Packages or the package using the blocking transformation Types, i.e. to tweak this setting to allow data to be collected into buffers as much as possible. There can also be adverse affects of this setting is not used properly.)

Work with parallelism as i described above: MaxConcurrentExecutables setting helps you play with your package Thread setting. The default is –1, which is the logical machine processor+2. So depending on your workload, tweak this setting, which can help you gain performance improvement in your package.

Let maximum of your Tasks/Data Flows Run in parallel if they are not dependent on the previous or the next task.

TROUBLESHOOTING:

If your package is running slow, here is something you can check.

Check the source and targets to evaluate their performance. i.e. if it is a text file source, then dump the file into another temp table to see the speed of execution.

Similarly check the target load speed, based on some other dummy values, to check the execution speed.

The Dataflow Tasks in the middle of the Source and Target have to be dissected individually and checked for performance conditions.

Work with one transformation at a Time to evaluate its speed of execution.

check the underlying table for lengthy varchar/char datatypes, if your are loading 100K+ rows into several tables and it is taking ages. Cut the varchar/Char Datatype Length to something reasonable. example: if loading 20 columns with 100K+ rows and each is varchar(50), it will surely take more than 5 mins to load the package, so work with the team to get down the lengths to a reasonable limits, which will immediately improve your package performance.

Use SSIS Logging / Performance Counters / SQL Profiler to track down performance issues.

If you have to deploy a SSIS (SQL Server Integration Services) Package, you have to create a deployment Utility in Visual Studio or SQL Server Business Intelligence Development Studio under SSIS Project.

Here is the Quick Basics of creating a SSIS Package. Open Business Intelligence Studio.(Pl don’t ask where can you find it.. if you don’t know, this is a homework for you..)

image

After you are done creating the SISS package and are ready for the deployment.. right click Project explorer and click properties, you will get a Properties window. Look for Deployment Utility as seen.

image

set CreateDeploymentUtility to True as seen. Also set your Deployment path is it is a UNC.(\\Servername\etc)

image

Then goto Solution Explorer..right click Project name and click build.

image

When you create a deployment utility there are 2 file created at the Target location. One is the PackageName.dtsx and other is the PackageName.SSISDeploymentManifest File.

image

Double click the SSISDeploymentManifest File to launch the Package deployment utility as a GUI.

image

select how you want to store your package. Depends on the process your company follows(if any).

Deployment to SQL Server will dump the page info in MSDB Database. (rest if self explanatory)

image

Target server can be ..? (you guessed it right this time.)

image

Nothing much here… Just continue.

image

click NEXT on confirm installation.. At this time your package will be validated.. i,e. all the connection strings and column mapping and truncations errors will be generated if any.. even warning messages are generated..

image

After you have finished deploying this package.. There are several thins you can do…

  1. If this is a test system.. execute the package and check if it works :)
  2. Create a SQL Server Agent job for your SSIS Package.(including scheduling)

image

Under MSDB you can see all your packages.

image

I know you can do the SQL jobs(every DBA does) yourself so i don’t have to provide more screens.

© 2010 D B A N A T I O N Suffusion WordPress theme by Sayontan Sinha
Better Tag Cloud