Memorable code
time to read 6 min | 1027 words
public class Program
{
    static List<Thread> list = new List<Thread>();

    private static void Main(string[] args)
    {
        var lines = File.ReadAllLines(args[0]);
        foreach (var line in lines)
        {
            var t = new Thread(Upsert)
            {
                Priority = ThreadPriority.Highest,
                IsBackground = true
            };
            list.Add(t);
            t.Start(line);
        }
        foreach (var thread in list)
        {
            thread.Join();
        }
    }

    private static void Upsert(object o)
    {
        var args = o.ToString().Split(',');
        try
        {
            using (var con = new SqlConnection(Environment.CommandLine.Split(' ')[1]))
            {
                var cmd = new SqlCommand
                {
                    Connection = con,
                    CommandText = "INSERT INTO Accounts VALUES(@p1, @p2, @p3, @p4,@p5)"
                };
                for (var index = 0; index < args.Length; index++)
                {
                    cmd.Parameters.AddWithValue(@"@p" + (index + 1), args[index]);
                }
                try
                {
                    cmd.ExecuteNonQuery();
                }
                catch (SqlException e)
                {
                    if (e.Number == 2627)
                    {
                        cmd.CommandText = "UPDATE Accounts SET Name = @p2, Email = @p3, Active = @p4, Birthday = @p5 WHERE ID = @p1";
                        cmd.ExecuteNonQuery();
                    }
                }
            }
        }
        catch (SqlException e)
        {
            if (e.Number == 1205)
            {
                var t = new Thread(Upsert)
                {
                    Priority = ThreadPriority.Highest,
                    IsBackground = true
                };
                list.Add(t);
                t.Start(o);
            }
        }
    }
}
Comments
Nice. The possibility of an InvalidOperationException being thrown while iterating over list is especially sublime.
Handling 1205 is crucial as the db gets hit with all the rows in a "batch" :-))
I guess they never had an exception number 1205? Or is the foreach not reached before all worker threads are done because of their priority?
I like how they reinvented while(true) by starting a new thread that does the retry on failure.
Also it's nice how they Join on background threads. If none of the threads were background threads, the whole Join business would be unnecessary.
Due to my limited experience, I will assume the code shows the ideal way to access a database. ;-)
So here we have the 'gem' we were talking about ;-) I'm sorry for anyone whose eyes looked at it.
Well ... exception over exception .. this is not good.
According to the flow, when the INSERT fails with a unique index/constraint violation (http://weblogs.asp.net/guys/archive/2005/05/20/408142.aspx), the system tries to perform an UPDATE.
This is stupid: it should do a SELECT first to check the unique key, and then do the INSERT or UPDATE. And if there is any other error, nothing happens; the system just carries on.
If the UPDATE gets a deadlock error (never got this), the system retries. It should put the operation in a queue or something like that and try again later, not like this.
This code only handles a few exceptions. What about others, like a wrong number of parameters or something like that?
Regards, Paulo Aboim Pinto, Odivelas - Portugal
Posting this without any comment may lead people into thinking that this is code recommended by you...
Woa...make sure you wear a radiation suit when this piece of code crosses your path. I am certain I see at most half of the issues that could arise in this lethal combination of File IO, threading and DB ops. This code scares me and if I'd have the option I would run away.
You got to hand it to the author: he managed to approach CRUD without even reading anything that the DB might return :D (nitpick: yeah, SqlException does read results)
Not too bad ;-) Two main things:
- There is no error handling for lines in the file that produce an error other than 2627 or 1205 (there might be a NOT NULL constraint on some column).
- Assuming the db is fairly standard: parallelizing inserts on the client side is not optimal, since the database can only do physical writes one at a time. Better is to first start a transaction, then prepare the statement, and then execute it for every line. Especially NOT performing a prepare every time (just executing) helps a LOT - up to 100 times faster!
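A minimal single-threaded sketch of that advice - open once, one transaction, one prepared command reused for every line. The connection string, file path variable, and the parameter types/sizes are my assumptions, not from the post:

```csharp
// Sketch: open once, begin a transaction, prepare once, execute per line.
using (var con = new SqlConnection(connectionString))
{
    con.Open(); // the original code never opens the connection
    using (var tx = con.BeginTransaction())
    {
        var cmd = new SqlCommand(
            "INSERT INTO Accounts VALUES(@p1, @p2, @p3, @p4, @p5)", con, tx);
        for (var i = 1; i <= 5; i++) // types and sizes are assumed here
            cmd.Parameters.Add("@p" + i, SqlDbType.NVarChar, 256);
        cmd.Prepare(); // prepare once, execute many

        foreach (var line in File.ReadLines(path))
        {
            var parts = line.Split(',');
            for (var i = 0; i < parts.Length; i++)
                cmd.Parameters[i].Value = parts[i];
            cmd.ExecuteNonQuery();
        }
        tx.Commit();
    }
}
```

Note that SqlCommand.Prepare requires every parameter to have an explicit type and size, which is why they are declared up front instead of via AddWithValue.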
I predict this becoming an interview question for a dozen or more companies now :)
There are very few legitimate reasons to use ThreadPriority.AboveNormal; there are essentially no legitimate reasons to use ThreadPriority.Highest. I can't even look at the rest of the code now.
Just in general a no-go. Load the data from the text file into the database in bulk and do the upsert via stored procedures. If you want more performance and less transaction log file growth, you can use partitioned tables and do partition swapping.
What version of MSSQL is this? If it is 2008 I would recommend looking into the MERGE DML statement along with table-valued parameters, which would allow you to pass the data as a single structured table and perform the INSERT/UPDATE in a single atomic command. Otherwise, for performance purposes, I would probably recommend using SqlBulkCopy to push the data into a temporary table and then using bulk INSERT/UPDATE commands from SQL. That can be done in batches of any size if you want progress during the file load or to lessen the possibility of lock contention.
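A hedged sketch of the SqlBulkCopy-plus-staging route described above. The column names come from the UPDATE statement in the post; the column types, the #Staging table name, and the conversion of every CSV field straight from string are my assumptions:

```csharp
// Sketch: stage the whole file in one bulk copy, then upsert it with a
// single set-based MERGE (SQL Server 2008+). Types are assumptions.
var table = new DataTable();
foreach (var col in new[] { "ID", "Name", "Email", "Active", "Birthday" })
    table.Columns.Add(col);
foreach (var line in File.ReadLines(path))
    table.Rows.Add(line.Split(','));

using (var con = new SqlConnection(connectionString))
{
    con.Open();
    new SqlCommand(
        "CREATE TABLE #Staging (ID int, Name nvarchar(100), " +
        "Email nvarchar(100), Active bit, Birthday datetime)", con)
        .ExecuteNonQuery();

    using (var bulk = new SqlBulkCopy(con) { DestinationTableName = "#Staging" })
        bulk.WriteToServer(table);

    new SqlCommand(@"
        MERGE Accounts AS t
        USING #Staging AS s ON t.ID = s.ID
        WHEN MATCHED THEN UPDATE SET Name = s.Name, Email = s.Email,
                                     Active = s.Active, Birthday = s.Birthday
        WHEN NOT MATCHED THEN INSERT (ID, Name, Email, Active, Birthday)
             VALUES (s.ID, s.Name, s.Email, s.Active, s.Birthday);", con)
        .ExecuteNonQuery();
}
```

The temp table lives on the same connection/session, so the bulk copy and the MERGE both see it; one round of I/O instead of one connection per row.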
A classic example of a programmer using the wrong approach to increase data access performance with multithreading:

b) Usually an INSERT/UPDATE takes less time than creating a new thread. Even with a ThreadPool, the context-switch time is usually greater than the time to run a simple SQL INSERT/UPDATE. In this example, threads are used without any ThreadPool, so a new thread is created every time (which is very SLOW) instead of reusing an existing one. An acceptable approach might be to use a ThreadPool (or the TPL) and partition the initial data so that each thread has some real work to do, not just a single SQL statement :D

c) Different DB engines usually have a special API for bulk-loading data into a table - so why not use that, and do all the required logic (if any) once the data is already in the DB?

d) No connection string is shown here, but it is very important to configure the DB connection pool and set it to the right size so you don't overload your DB (especially because all inserts go to the same table :D)

e) The code is NOT safe, not only because some exceptions are not handled (per the comments above), but also because some operations access shared resources without concurrency in mind: in the Upsert(object o) method, inside the catch block, list.Add(t); does exactly that, for example :)

Basically I see many such bugs in only a few lines of code, which is a bad sign :( To be safe, the programmer would probably have been better off staying single-threaded :D
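A minimal sketch of the ThreadPool/TPL-with-partitioning idea from point b). The batch size, the parallelism cap, and the UpsertRow helper are all illustrative assumptions:

```csharp
// Sketch: let the TPL schedule the work instead of hand-rolling threads,
// and give each worker a whole batch of lines rather than a single row.
var lines = File.ReadAllLines(path);
var batches = lines
    .Select((line, i) => new { line, i })
    .GroupBy(x => x.i / 1000, x => x.line); // assumed batch size of 1000

Parallel.ForEach(
    batches,
    new ParallelOptions { MaxDegreeOfParallelism = 4 }, // assumed cap
    batch =>
    {
        using (var con = new SqlConnection(connectionString))
        {
            con.Open();
            foreach (var line in batch)
                UpsertRow(con, line); // hypothetical per-row upsert helper
        }
    });
```

One connection per batch instead of one per row also keeps the connection pool from being exhausted.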
P.S. some issues might be more easy to prevent with asynchronous / non-blocking programming in single thread, i.e. ala NodeJS :D Just run as many async, non-blocking inserts as you want and leave the rest to framework :D
It's the code fragment that keeps on giving.
@Paulo Abom Pinto - it may really not be as stupid as you think. If the nature of the data is such that the majority of operations are inserts, the above (just the insert / catch exception / update part of it) is actually quite an efficient strategy. Using a SELECT to check before every single INSERT means extra work for the database server, extra network traffic, slower performance, deadlocks and a non-atomic operation (what if you SELECT to check and then try to INSERT, but someone already inserted the data in the meantime?).
@EDev Agree with you. Annotation: if you use "SELECT .. WITH (XLOCK) .." (or Oracle's "SELECT ... FOR UPDATE") you can be sure that no one gets in between.
Why did you do this to me? This is the worst thing ever!
It's easy to say "this is terrible". If you aren't going to point out why, even if you think it should be obvious, then you're not helping. You have a great opportunity to show people the better way. Why not do that instead?
Some people don't know any better because they've never been shown the "right" way to do things.
missing con.Open()?
http://www.youtube.com/watch?v=juFZh92MUOY
Threads in a loop are often an indicator of scary code.
Typical post, leave some code, don't show what is wrong and how to fix it, thanks for helping NOT!
And yes, before one of your followers tells me it is obvious what is wrong here: I can clearly see it, but not EVERYONE is at a level where they can see this.
These are the people who would benefit from being shown some crap code and how it should have been done. Otherwise these posts are just a stroke of your ego and do not achieve anything.
Bob: come on, man! He never promised to be didactic in every post! Ayende: thanks - that was awesome.
@Dan Bailiff & Bob, Keep in mind, this is Oren's blog. He can help or not as he sees fit. Personally, just sifting through the comments has helped me pick out several issues I missed on my first read. That lead me to thinking about how I would re-architect to fix those problems in addition to the ones I spotted instantly. That alone is useful.
Bob - if he had tried to explain everything that is wrong with this code it would have been a very long post.
And maybe I'm way off, but I think that even the newest programmer should know the basics of exception handling, magic numbers, transactions, threading and the rest. Those who don't notice the problems with this code should read some more basic programming articles.
No more posts in queue!
Heh... all that effort to get parallelism and he shot it in the foot. Using ThreadPriority.Highest means that the worker thread will take higher priority than any other thread in the process... including the process that's running Main and scheduling the other threads. Once the thread quantum expires, no more iterations on the loop until the scheduled (and probably small number of) worker threads all finish.
nice :)
This is obviously production code for bank accounts, and may be the reason for the world's economic woes..
When all you have is a hammer, everything looks like a nail.
Not so memorable ;) For example, if you start the threads in the second loop you can have really memorable code !!! ;) No limit to the worst!
@Dan Bailiff and @Bob
Eventually you'll figure out only the k00l kidz hang out here and they all try to out do each other to impress the guy who runs the blog.
You'll need to pass your Ayende blog readers certification test before you'll be accepted as a k00l kid.
Fear not, most of them don't understand either - they're just trying to impress each other.
Yawn, how boring. It's like living high school over and over. Occasionally there is a nice post, but each time you see code dumps, skip the post, the only responses are from kidz trying to impress each other.
A very nice new API - upsert.
@EDev of course the SELECT-first is not the perfect way, but he could ensure that the record is locked on SELECT, or use a database transaction to secure the flow.
In my opinion some of this code should be in a stored procedure, with the thread flow in the C#.
to those who suggested select and then insert/update, it's more efficient to do update, check the return value to see if any row was updated, if it's 0 then do an insert.
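That suggestion in code: ExecuteNonQuery returns the number of affected rows, so 0 means the row doesn't exist yet. The column names follow the post's UPDATE statement; the connection and values variables and the race-condition caveat are mine:

```csharp
// Sketch of the update-first upsert; everything around the SQL is
// illustrative scaffolding.
var cmd = new SqlCommand(
    "UPDATE Accounts SET Name = @p2, Email = @p3, Active = @p4, " +
    "Birthday = @p5 WHERE ID = @p1", con);
for (var i = 0; i < values.Length; i++)
    cmd.Parameters.AddWithValue("@p" + (i + 1), values[i]);

if (cmd.ExecuteNonQuery() == 0) // no row updated: it doesn't exist yet
{
    cmd.CommandText = "INSERT INTO Accounts VALUES(@p1, @p2, @p3, @p4, @p5)";
    cmd.ExecuteNonQuery(); // can still race with a concurrent insert;
                           // wrap in a transaction or catch error 2627
}
```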
@Paulo Abom Pinto - in 99% of the times, there's no justification for using a stored procedure to do anything.
I think the biggest problem with this code is that it expresses no intent. I read the code yet I don't have any idea what it does.
Yeah, there might be issues with the implementation, but even without ToString() or the threading or whatever this would still be bad code. Good code can be understood.
This code basically does nothing but spin up a thread for each line of the text file it is reading. It never opens the connection to SQL in the first place, so none of the catches will be executed after it fails on ExecuteNonQuery.
A piece of code and everyone starts laughing. But what is the context of this code? What if it was a test for an interview, without any VS at hand and just 30 minutes of time? In that case it isn't that bad. And it would excuse that there is no con.Open()... btw, most of the commenters above did not realize this.
If the insert throws an exception then it must be an update haha. Love it
If somebody can code this in 30 minutes then either he is really good or really bad; generally it is the latter, because it includes premature optimization code that is obscure.
Not opening the connection is a legitimate bug that you find out about after the first run. Not using a transaction is a huge bug that is not as easy to find. And there is no way someone wrote this code during a 30-minute interview and knew the numbers of specific SQL exceptions.
At least an invalid server name in the connection string will timeout in parallel! meh.
@Harry Agreed. And frequently even when Ayende does provide feedback on the code samples he presents, the comments include lots of additional feedback that he doesn't mention. In the case of code whose shortcomings actually require significant expertise to really appreciate, then it makes sense for Ayende to chime in. In the case of a code sample like this that's just egregiously broken, it makes perfect sense to just let the commenters evaluate it. It's like crowdsourcing source code review.
@Chris Taveres Not necessarily. Any blocking calls should make the thread sleep, at which point the other threads will have an opportunity to run. Which, of course, makes setting ThreadPriority.Highest kind of pointless, but it shouldn't starve out the other worker threads, since most of the time spent here will either be in establishing the connection or executing the command - both of which I would expect to cause the thread to sleep. Unless I'm missing something - I have to admit I'm a bit of a multithreading neophyte.
A "great" example of cloud computing. A cloud of threads...
A cloud of sycophants.
@Clay: you failed to honor Ayende properly in that post. Please enter additional grovel, else you will be reported to the authorities.
@Christopher
Thanks for bringing that to my attention. I apologize as I should have been more clear. A slobbering cloud of sycophants.
What happened to Oren? I hope he is alive.
@dead-detector He has been teaching classes in NYC. See the top of the sidebar.
Is your RavenDB database corrupted? The post and comments seem duplicated (or triplicated... or more) :)
Duplicated, yes - fixed now; a post with an explanation is coming up shortly.
I think that for db connections async dev might be beneficial, although I have not tried it. However, I would just opt for a BackgroundWorker instead. IMO the biggest problem with this code is in the exception handling: it lacks an "else throw;" in case the exception isn't the one expected...
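The missing "else throw;" looks roughly like this - a sketch against the post's inner try/catch, with updateSql standing in for the UPDATE text:

```csharp
try
{
    cmd.ExecuteNonQuery();
}
catch (SqlException e)
{
    if (e.Number == 2627) // unique constraint violation: fall back to UPDATE
    {
        cmd.CommandText = updateSql; // the UPDATE statement from the post
        cmd.ExecuteNonQuery();
    }
    else
    {
        throw; // rethrow, preserving the stack trace, instead of
               // silently swallowing every other SqlException
    }
}
```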
I was checking whether someone had provided an answer on this one yet... Looking back, please forget my previous, incredibly stupid comment; I was talking rubbish. Allowing an app to set up a multitude of concurrent connections would hog your DB connection pool, leaving no connections for the rest of your app - in other words, a recipe for disaster. Rather odd that I missed it the first time, as did everyone else... So, next to the missing "else throw" in the first exception handler, just ditch the parallel approach altogether, as it would probably hog all the connections to your database!