Where is the ROI?
Justin has a post called: Keep your IQueryable in check, in which he basically says that while he finds value in the Query Object pattern that I recently talked about, he is concerned by the fact that I am returning an IQueryable (or ICriteria or IQuery) from the query object. The problem is that the UI layer can then modify the query in potentially undesirable ways. The example that he gave is:
latePayingCustomQuery
.Where(c => c.LastName.StartsWith("E"))
.Skip((page - 1) * itemCount)
.Take(10);
Which perform a query on an unindex string column, which is slow.
He suggests limiting what the user can do by providing a IPagable && ISortable that would limit the options of the UI layer to abuse it.
My response to that is: And…?
I am not in favor of doing things to protect developers, but there is also another side to that.
What is the worst thing that can possibly happen? That part of the application would be slow. We will find it during testing, or (heaven forbid), on production.
And…?
How is this in any way different than anything else that we are doing? Presumably the developer who wrote the code didn’t wrote it just because he felt like it, there was a business reason behind it. At that point, we treat this issue as a standard perf issue, and fix it.
The cost of doing that in the rare cases where it will happen vs. the cost of creating the limited abstraction, forcing developers to deal with that, etc. More than that, I can assure you that you will need to break out of the straightjacket at some point, why put it on in the first place?
This is a question that I routinely ask my clients. Why do you want that? “To stop developers from…”
To which I reply: “What is the cost of doing it this way? What is the cost of a mistake? What is the cost of fixing that mistake? What is the difference between that and just another bug?”
There is a reason that I choose the ALT.Net logo to be running with scissors. Power tools are here to be used. Yes, there are things that you want to hide, but those are by far the exception, not the norm.
Comments
I like this mentality. People whom doesn't understand the architect of a project shouldn't work on it in the first place.
Ayende, I can't agree with you more!
Two outcomes from over abstraction (besides the wasted time on the front end):
1) we will subvert your abstraction and in doing so achieve less performance in man hours and cycles
2) we will complain for the rest of our lives that the abstraction is broken or has lost its luster when new & improved abstractions comes to market
3) *just thought of this one - we will need documentation that does not exist and reach deadlock
of course I'm referring to the proverbial you in my comment =)
I have no issue with this particular problem (IQueryable <t).
BUT. I do with ICriteria. IQueryable is a part of BCL, it's in System.Core and its referenced in most other dlls anyway.
However, having NHibernate classes/interfaces in my view logic, lights a warning sign in my head. Do I really want to have a reference to NHibernate in every one of my assemblies? Do I want to deal with NHibernate in every one of my layers?
I wonder, has anyone ever (for comedy sake), needlessly abstracted and abstracted until you cannot possibly think of another abstraction.
My point being that the skills of abstraction are knowing when and how to abstract, not about creating cool and clever abstractions without need.. its along the same lines as anything, ie- the refactorbation problem.
Refactorbation, man we should have a developer quote site.
Krzysztof ,
do you have no problem with SqlConnection because it is in the BCL?
See my previous discussion about DAL all the way to the UI.
Thanks for the response, I think this discussion can only help people make better decisions.
So my question to you now is what does passing the IQueryable out of your query object buy you? If you want to keep the query contained within the query object then all you are doing by passing it out is allowing things that you don't want to happen. Why would you pass out IQueryable when you really don't want the queries formed outside of the query class? Or do you?
And my problem isn't that the application will perform poorly, my problem is that in many cases you will have an application that performs fine and performs as expected, but you start to have little bits of queries all over you application and you don't have anywhere to go to see what queries you are executing. Both the repository pattern and then query objects you are using have this benefit, but once you start returning IQueryable you don't know where your queries are being formed.
So yeah, I think that people should be given sharp tools so that they can do great things, but it just seems here that we are allowing things that we explicitly don't want happening. I'm keeping an open mind on this, I just want to be convinced if I am in fact wrong.
There's a difference between giving people sharp tools and giving people sharp tools which, when operated in what one might call a perfectly reasonable (or even idiomatic) fashion, may chop off one's thumb, despite the fact they were pointed nowhere near it.
Of course it's easier to simply ignore the fact that developers may not read the docs, or that they will fail to guess what seems obvious to you about the given interface, and I'd be tempted to do this myself, but:
Imagine yourself using someone else's API. Think about the possibility that you may not have the experience or information necessary to make assumptions about what is acceptable to put in your calls. Remember that there's always someone smarter than you - or at least someone who will make poor assumptions about your knowledge or experience.
Who said that this is a goal?
I want to encapsulate query logic, but there is a lot of value in having the high level able to enhance the query again.
In you example, doing a filter on top of the query is something that is quite common, and acceptable.
I want to encapsulate the query business logic, not filtering by name, there is no logic in that, and I have no problem with seeing this appear in two places in the app.
I should point out that nothing really prevent you from opening a SqlConnection in the view either. You just don't do that.
Rik,
I don't disagree with you.
But I think that I draw the line about sharp & dangerous tools in a very different place.
@Ayende I see where you are coming from, but if you want to perform additional grouping, sorting, filtering, etc... wouldn't it make sense to compose query objects rather than just having this done in the view? It seems to me like you would get better query reuse, and again, more isolation of your queries.
I'm not sure that we are going to agree on this, I know that many people think that I am "protecting" my teammates, but I've been on way too many projects where I wish there had been a little bit more thought put around guiding developers down the right path.
And yeah, I know that tests should catch perf problems, and that we should trust co-workers to do the right thing, but in the world of deadlines and budgets, what percentage of most systems get a code review?
As firefly said above, "people who don't understand the architecture of a project shouldn't work on it." Well, "should" and "do" are completely different subjects. So yeah, maybe I am trying to solve institutional problems by putting some big flashing signs in the code saying "you should be going this direction!", but on some projects that it is exactly what is needed.
As a final note, if I was working on a, let's say, 5 person project all of whom were good developers and all of whom cared about their craft and wanted to create excellent software, then this wouldn't be an issue. Unfortunately that is not reality for the majority of developers out there. And that is completely ignoring the fact that, as consultants, we often have no idea who is going to be maintaining our code going forward.
Justin,
a) I think you miss the point that I am trying to make. You don't mess around with the query in the view.
The highest that you deal with the query (vs. its results) is in the controller.
b) I fully agree that thought should be put into making sure people fall to the pit of success. I disagree about putting roadblocks along the way, I think it is the wrong approach.
c) The question is, what is the cost of catching this perf problem in prod? Obviously it can't be too high, otherwise you would have run a load test first. Since it is not too high, why are you worrying about it?
d) Road blocks don't work. Making sure that the right way to do things is _easy_, and making sure that you put some effort into introducing people to the project properly generally work.
People want to do the right thing, after all.
e) I am working with mixed skilled teams as well. I haven't found this to be the major issue. Training is, and proper introduction, but that is it.
I also accept as given that not everyone will write the same level of code, but as long as they are writing good enough code, that is fine
The only thing I found interesting about pushing the query to the view was that the view (as a template of that data) may often have a better idea of specifically what properties it wants to use.. so far I've kept the controllers knowing what data the view will want to use, but this is something that always pops out at me, but unless theres a performance issue with pushing more properties than needed I don't think it needs to happen.. (by properties I mean the view essentially doing a Select(..) as such to get a subset of the data- with a queryable, the query provider can optimize that requirement to only pull required data).
Stephen,
That is generally not really that much of an issue.
In most cases (unless you have BLOB or TEXT columns), it is actually more efficient to have a single select for rows in the table than to pull out just the columns that you want.
Mostly because it will make better use of the query cache
Ah very true about the cache.
Nice post, this is something I've believed for some time.
I think the 5:01 developers out there get a really bad rap. Developers who are just there to cash a pay cheque (and there are plenty) will generally move to the point of least resistance. So they'll code what is easiest, so if you (as in "you" the project lead) write code such is this that they will use, simply explaining it (with a sample) will in most cases be more than enough to get them on the right track.
Your "average" programmer likely is still an intelligent person, so building roadblocks to prevent usage probably isn't nearly as beneficial as just explaining it to them.
Here's a scenario where stopping the developer before he can hang himself with too much rope is applicable: I used to work on a very large site with a whole lot of traffic. I was part of the DB team. We supported webapp developers with APIs, Ops, etc. but a good portion of our daily operations was recovering from things that shouldn't have been done. Queries that under test DB sizes and loads seemed reasonable to developers that could hang up the main customer database causing apparent site outages in certain areas, kicking in SLAs with partners and other lovely things.
Now don't get me wrong, that this was possible to occur was the product of many WTFs, but from an operations stand point for my group it was easier to shorten the rope to be hung by than to fix all the systemic problems in the organization (which we tried to fix as well, mind you).
All that said, i much prefer working on teams where the trust, skills and responsibilities are assigned to all members in such a way that allowing developers to run with scissors is an acceptable risk, and have sought out such organizations since that past experience.
Interesting points made by all, so here we go, the be all end all IQueryFilter:
public interface IQueryFilter <t
{
<t GetQuery(IQueryable <t query);
}
public LatePayingCustomerQuery : IQueryFilter <customer
{
<customer GetQuery(IQueryable <customer query) {
}
^^
No need for a reference to nhibernate (no ISession) works with any Linq enabled persistence technology (nh, linqtosql, etc).
I hope I don't get shot, but i'm implementing it!
I think I don't want to write Linq expression in UI either, just for the same reason I don't want to write SQL/HQL in it.
I was thinking 3 folds implication:
I know unit-test helps there but I know many projects that don't even have unit-test (other than their own hands).
It's really hard to write query in UI if we have to handpick expressions and entity properties that are understood by ORM mapping out of a forest of domain structure. That tightly depends on ORM configuration. IMO it leaks persistence infrastructure to UI layer. And I believe it slows developers down. I think it's sensible to abstract those queries in data-access layer so we can deal with it and never look back at it again. I thought this is what SoC is all about
There are many ways to write the same Linq expression. Some are more optimum that the other. And I don't think UI layer is the best place to do it.
More about protecting the team, it's something to protect myself. I don't want to consult ORM mapping config in every screen just to validate if I'm writing a valid Linq expression.
Btw, what's your view about writing HQL in UI Controller/Services?
Arne,
You have a process problem. Why aren't you giving the developers a real sized database?
Why aren't you having a load test with real sized database before going to production?
Henry,
1) that is not a problem, the first time it is used, it will generate an error, that is very easy to discover and detect.
2) never had that problem, and see the previous posts about the problem with doing it this way
3) again, read the post and show me the ROI
Ayende,
Oh, I agree that there were plenty of process problems and having dev be an exact replica of production is desirable, but in this case, even with my process complaints I don't think it would have been feasible. Our databases were several gigabytes (back when gigabytes meant something), we had, at any given time, 10 or more separate dev environments going for 100+ developers in different groups, so getting everyone production equivalent hardware and DBs wasn't feasible. And even if that hadn't been a limiting factor, faking the production loads all our apps were generating would have required yet more dev hardware.
Arne,
That is pretty common scenario, and I agree that you don't want all devs to use real size data.
But that is no reason not to have load test with reasonably sized data base.
You only need one of those.
ayende, do you think that Justin is confusing middleware with DL? I don't see any domain level problem being solved by supplying page & sort... I'm not trying to call out Justin, but just wondering if you think that's where ppl are getting confused?
Ayende, I think using Query object (in your previous post) separates the concern out of UI, without imposing OCP problem. My concern is only when Linq/HQL ex[ressions are exposed to UI layer.
Doesn't look like it belongs there. It would be OK if writing Linq expression was easy. Alas the fact is it is not. Too many infrastructure-related shinanigans to deal with.
Btw, I am quite happy using NHQG in UI layer because the abstraction is less leaky than Linq IMO.
Btw I'm curious, how is it different to write Linq expressions in UI from writing HQL statements in UI?
Cali,
I am not sure that I understand the question
Hendry,
I am surprised that you find NHQG easier, it was explicitly modelled to be as close to Linq as I could get in C# 2.0
And no, I wouldn't condone using HQL in the UI
I wouldn't condone doing any complex query manipulation in the UI, for that matter.
Linq starts to feel like magic string to me (almost equivalent to HQL, except plus intellisense and refactoring), in the sense that it's so liberal that you can throw almost any expression to it, and they will all compile fine, but only small portion of it are actually valid syntax,.. depending on how I configure my hbm.
NHQG is very rigid that it allows me to write only what is not allowed by my Hbm config. The feedback is immediate.
Altho one might argue that integration-test can provide better feedback to Linq, but so can it to HQL.
Uh, I meant, "NHQG is very rigid that it allows me to write only what is allowed by my Hbm config" :P
we throw around the term middleware a lot where i work. i apologize i misused the term. i really meant infrastructure or framework level.
If he's trying to achieve that level of protection against mistake then how is that fear domain specific? Isn't that an infrastructure concern?
Justin argues that by supplying the paging and sorting implementations that he's protecting the infrastructure from abuse, slowness, stupidness... He says, "So we really have two conflicting requirements, we want to provide the flexibility of paging and sorting up closer to the UI, but we also want to lock down the queries a bit." I argue that not only are these conflicting requirements but they aren't even domain layer requirements... they are infrastructure level.
would you say that's accurate or am i missing something?
what i would be more concerned about in my DL or UI is allowing a developer the option to make a mistake that allows the user to subvert security or logic... not developer.
I don't think thats essentially the point, security and whatever makes sense to exist at lower layers still exists, I think the point is that the ability to richly query the data can still bubble up to the controllers.
Ayende,
I guess my answer would still be that in the our case that would still have been a bad ROI. Education, code reviews and creating DB APIs that make devs fall in the pit of success were still a lot cheaper.
I'm going to overlook that we usually had several parallel QA/Test efforts going on for different features to be released on the site (and site here was lots of different app clusters totallling over 1000 servers) and having them all cycle through one set of load testing hardware would have been a whole lot of man hours and a big delay added to a release cycle.
So every once in a while someone would write a query they shouldn't have and had they checked with a DBA they would have gotten a very quick "woah, buddy, don't do that, let's see how we can get what you need another way". But they made it into production, because the test loads didn't show this to be bad and even production didn't until a certian load scenario happened. Now the postmortem happens and here's conversation asking for the duplicate load testing hardware:
CTO: "how can we prevent this from happening again?"
Me: "You know that $250k cluster we have for the main prod DBs? We need another one of those and ops cycles to populate them and build fake load scenarios so we can test the dev's code"
CFO: "Really? The only way you can prevent your devs from doing something stupid is by buying loads of hardware to see their stupid actions under load? You call that engineering?. Purchase request denied"
And that response would have been valid. We had process problems, but throwing hardware at them was not the right solution. We eventually changed the most contentious DB to be a service layer that devs talked against, since them doing ad hoc queries in the first place was the real WTF. But a service layer is just another way of saying "we are limiting what you can do against the DB and if you need more, let's talk about your needs and we'll build it intelligently." Allowing adhoc queries against a DB from the webapp tier breaks down at some size of operation, test environments not withstanding, imho.
CaliCoder,
Obviously that is a totally different question, and I don't advocate that at all.
Arne,
I don't agree that you need an exact duplicate hardware for that.
For that matter, I would say that even if you needed, asking for 500$ per load test and setting up the test on the cloud using something like EC2 or GoGrid is a great way of doing that (yeah, I know all the trouble with data in the cloud, that is an example).
Having an additional server serve as the DB for the site for load test (standard one!) would actually make the load tests more effective, since the DB would be slower.
Basically, I don't consider it disciplined to just move to production without having a proper release procedure, and that include load testing, among other things.
Cycling different versions through a single set of hardware is actually easy, you need to put several versions of the site on the same hardware.
If this is hard, this point to a problem in your deployment.
A customer that I used to work with had a 4 days mandatory load test for every release. That is 4 days of hammering the site, trying to find slow parts, memory leaks, etc.
Not doing this, especially after the first time you got bitten, is not responsible.
Ayende,
I guess I'm not explaining this properly, since your response seems to indicate that our environment was a lot of simpler than it was.
First, this was almost 10 years ago now, so no cloud
Second, we had release procedures, including various QA cycles and load testing, and we did the load tests with slower hardware with limited sets, but that cannot catch all scenarios. Our production environment had a number of database machines with different databases being used in different capacities by a lot of different app clusters, so relying on our test loads recreating all possible contention scenarios was just likely.
Third, cycling versions through a single set of hardware was a large ordeal with a lot of hours and time in transferring large datasets, no matter the automation and procedure. This wasn't a setup with a single DB on a single server and one webapp hitting it, even when reduced to test environments.
Fourth, I have repeatedly stipulated that we had process problems, I'm just not convinced that had we had this ideal test environment our problems would have gone away.
All I'm really saying is that I still don't buy that it is better ROI to permit ad hoc queries against DBs with high contention and using load testing to determine what's bad, rather than limiting access to that resource under contention and when the access is insufficient determine what is the best solution.
Arne, the problem with your arguement is that you are completely loading your example to make your answer correct.
The reality is, there should be many places where a silly query should get caught, and in one situation 10 years ago, it happened to get to production. But due to that one case, you are advocating a heck of a lot of extra work not directly tackling the problem to prevent it from ever happening again.
Basically, and get ready for a lousy analogy, 10 years ago there was a car accident and someone died due to not wearing a seat belt and now to prevent it from happening again you are restricting the top speed of all cars to 5 km/hr rather than just ask people to wear seatbelts.
See...I told you it was lousy. ;)
Well Ayende is absolutely right.
You can't build a software thinking in terms of "the developpers are going to do bad things".
Well I suppose that's because tests only occurs at the end of the development or even in production.
And beside, using tools like linq or nh are subject to great performance issues.
Limitating a collection, especially binded to data, is useless. It's a fake best practice.
If it is the last solution that could be applied, well, you're in deep trouble.
It's like saying "making cars with a 20mph max speed is a good solution to avoid car crashes".
Hmm.. Maybe I'm missing the point but code like this:
latePayingCustomQuery
.Where(c => c.LastName.StartsWith("E"))
.Skip((page - 1) * itemCount)
.Take(10);
shouldn't be placed in UI but it belongs to Service Layer, right? That layer is responsible to build query and other stuff which goes down to DAL?
dario-g,
No, that belongs in the presentation layer, not service layer
Comment preview