What is wrong here? Solution
In my previous post, I showed the database schema and the UI and asked what was wrong with that. Before we move on, here is what I showed.
If you looked carefully, you might have noticed that there are duplicate PO# and Tracking# in the UI. More than that, we somehow double charged the customer for shipping.
What is going on?
It is actually fairly obvious, when you think about it. Look at the schema, there isn’t actually any association between the Tracking # and the PO #. In most orders, we have only 1 PO #, so it was easy to just add this information by just pulling it from the DB and adding a few columns. But when we got an order that has multiple POs… that is when all hell breaks lose.
This is a classic Cartesian Product problem.
The solution? Actually model the UI to avoid suggesting that there is a relationship between the tracking and purchase orders, like this:
Comments
Spelling of OrderId is wrong in OrderLines table ;-)
That solution works in any case except one where management is adamant "We MUST be able to see purchase orders and tracking in the same grouping". I've run into a few places like that in the "real world" where someone absolutely MUST have things displayed a certain way whether it's convenient or not, and won't take no for an answer.
@Wayne - I have run into that too, but normally I say "how do you want it displayed when they have multiple POs and multiple tracking numbers? With what we have now, it would show double for shipping charges"
Making them make the decision makes them think about it, and when they flounder, you say "what if we did this?"
Note that nowhere in there am I telling them that they are wrong. I are not directly saying it's a bad design, and I am always using the term "we", including them.
You'd be surprised at much more effective that can be over saying "No."
@Philip - many customers are not interested at all on the limitations of the current technical solution - they just want to hear how long it will take to implement the requirement and how much it will cost. :)
So, any conclusions? Cartesian product considered harmful?
The whole model is so bad, that there's no real solution other than redoing it. For example, the 'purchase order' entity, what does it mean? It only adds a new attribute 'ApprovedAt' for the OrderId it's related to, but... does that mean an order can be approved multiple times? And what does each time mean?
Looking at the UI it doesn't make any sense either.
Oh, and there's no such thing as a 'cartesian product problem': a cartesian product is a set operation, that's all. This problem isn't about a cartesian product, as it's not a problem related to a set operation but related to a crappy modeled database.
Tip: first create an abstract entity model and then project it to a relational model (and code). You'll see your tables make much more sense.
@Rafal if you don't know what a cartesian product is, you should read up on it. Every join operation is based on it. http://en.wikipedia.org/wiki/Cartesian_product. In theory at least. (so take the cross-join, which is the cartasian product) and filter out the rows not matching the ON predicate). In practice, a hash-based join or other shortcuts are easier as cross-join based joins are often slow because it produces a lot of rows which can be discarded by the predicate (so you can better use the predicate to do the join to filter out rows which don't need to be considered in the first place.)
@Frans Bouma, I'd be interested in seeing how you'd redo this model.
The ability to add more than one PO per order is required, as approval may come from different departments within our customers. So PO should also have approved_value.
In our case we have lineitems ->packs->tracking rather than orders->tracking. We can have lineitems from multiple orders in a single pack, for small items, but we can also have multiple packs for a single lineitem.
Thanks
Steven
@Steven: First of all, the model doesn't enforce relational integrity properly. Look at Tracking. I can add multiple rows for the same Order, but what does that mean? Only if there's more information in the entity which tells me what the costs are for, it's not useful: as the information in the entity will not tell me the difference between row X and row Y (except the cost/sentat value differences). This is important as the model doesn't stop me from adding multiple times the same row (different ID, same values for OrderId, cost, sentat).
Same is true for purchaseorders: they're not related to anything, I can add 10 times the same values for Orderid and ApprovedAt.
OrderLines: the model allows multiple orderline items for the same product. This doesn't make sense. This can be fixed by making the PK (OrderId, ProductId).
If the actual, full, model is doing all that, OK. If not: it needs correction. Though looking at the fact that all tables have the same pk 'Id', I fear someone who thought to be clever made this model and giving every entity the pk 'Id', even if it didn't make any sense (like in OrderLine).
If there's an entity 'Department', PurchaseOrders should get as pk 'OrderId, DepartmentId', so the model already enforces the fact that you can't add multiple approvals of order X by the same department D.
If the model is this way because the o/r mapper is too lame to deal with compound PKs when it needs to, use another one which does.
Frans - relax, it was irony. Next time I'll try to make it more explicit.
@rafal hehe :) I didn't catch that, but then again... I sometimes have to switch off my Sheldon mode ;)
Actual question: Am I missing something? These two posts leave me with a "Well, duh..." feeling. An obvious error and the obvious fix is to....remove it.
Was it to show the pitfalls of sloppy relational querying?
Comment preview