Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,162
Privacy Policy · Terms
filter by tags archive

SSISThe backlash

time to read 5 min | 874 words

Jamie Thomson has responded to my I Hate SSIS post, he agreed that most of them are valid concerns, but he also brought up some counter arguments that I wanted to respond to. The first thing that I wanted to mention is that JT has a solution for watching variables content, and I have updated the previous post & the "I Hate SSIS" page accordingly.

Now for the parts I disagree with:

Ayende: I wish I had a dollar for every time that SSIS kept track of something it shouldn't. Be it the old configuration, hard coding the connection string inside the package and completely ignoring the configuration
JT: I have never seen this happen in three years of using the product. If it seems as though configurations are not being used then they have been setup wrongly. That is not to say that the process of setting them up couldn't be improved.

I have it happening pretty much every day. Here is a simple story, I had canceled the package configuration, reconfigured a data source to point to the test database, and run the package. It executed itself against the production database!! Once I found that out, I managed to see it do it twice (while the data source point to the test database!), but I haven't been able to consistently reproduce it since. I can assure the reader that I have taken the time to understand how this thing works, out of sheer necessity. It still manage to mess with me.

Ayende: Security? Who needs that
JT: Is this a serious comment?

Please do not try to put my words out of context, the full statement was: "Security? Who needs that: I should also mention that SSIS packages requires sysadmin rights to run when scheduled as a job. Which of course it will not tell you until you have run the job. I am aware of the agent proxy solution"

Ayende: I should also mention that SSIS packages requires sysadmin rights to run when scheduled as a job
JT: This is completely untrue. It is possible to setup proxy accounts that are not sysadmin in order to run packages.

Again, please do not quote out of context, as you can see above, the very next statement acknowledged the existence of proxy solutions, I still want to understand why this exists.

To my comments about the bad configuration scheme and their unpredictability:

JT: Back to my point above, if this is happening then the configurations have been setup wrongly. It NEVER chooses configurations at random. It would be good if the person making the point could make some suggestion as to how it could be improved because if people are experiencing this then there needs to be improvement somewhere. And what's the issue with environment variables?

The issue with environment variables is this is actually something that I would never consider for configuration. Putting a connection string in an environment variables is strange. JT, let us start with a concept that doesn't hard code configuration information to the package. I want to point to a configuration file that is in the same directory as the package, it doesn't let me handle it. I want to choose one of three databases for configuration, depending when I want to do that, etc.

As for their unpredictability, it may have a system for that, but as I pointed above, even with the configuration OFF it will still do things that I don't want.

On UPSERT support:

JT: Hmm...not sure about this one. UPSERT is an operation that would have to be supported by the database platform being inserted/updated wouldn't it? Not sure why this is SSIS's fault. Perhaps I'm misunderstanding in which case I'm happy to be put straight.

It quite trivial to allow update / insert based on a given set of key fields, and it is certainly something that I would expect to see in an ETL product. Given the common need for this. Even something that was DB specific would be welcomed.

SSIS speed, lack there of:

JT: There is room for improvement here in the bloated VS shell but mainly its important to understand WHY this is happening. When a package opens up it tries to validate all external connections. If this is taking a long time then the blame is on the external connections and the network in between, not on SSIS. It is possible to turn off this validation by selecting 'Work Offline' from the SSIS menu.

Um, there is something that is called a background thread, and it is used to do work without freezing the UI. I don't care about the time that it takes to validate things, I want to get things done, let the tool sort those out without interrupting me. Working offline is not a valid option, because then you get a whole lot of validation errors, just for the fun of it.

And last:

SSIS can easily be used in a multi-developer environemnt. I know this because I'm currently working in one.

Good, how do you handle two developers working on the same package? How do you handle branching and merging?

time to read 1 min | 157 words

If you haven't guess it by now, I am not fond of SSIS. The latest installment is probably a "feature". Assume the following, I develop an SSIS package on my local machine, testing it against a local database.

Now, I want to run the package against a remote database. I did the Right Thing and put the connection string in a data source, so I change that and run the package from Visual Studio. Imagine my surprise when Visual Studio does a full cycle, including reporting the number of rows that it copied. Everything seems to be fine, until I checked the database itself.

About half the tables where empty, and I am still not sure why. The best guess that I can make is that it is caching the previous database credentials, and writing to that, since I found the data in the local database. Argh!!

time to read 1 min | 126 words

Here is SSIS deciding that "No, you really don't need to move this data."

NoNeedForThatData.png

I really like the "smarts" that went into that engine. I make sure that I keep busy and don't move on to othe areas of the application, but dedicate my full an undivided attention to SSIS, as it is apperantly should be.

Would it be too hard to provide a deteministic engine? SSIS would run only (random) parts of the sequence container if I right click it and tell it to "Execute Container". Doing a full debug seems to work, but it execute stuff that I don't want to run right now. Urgh!

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 4 days from now
  3. What happens when a sparse file allocation fails? - 6 days from now
  4. NTFS has an emergency stash of disk space - 8 days from now
  5. Challenge: Giving file system developer ulcer - 11 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}