Reviewing OSS Project: Whiteboard Chat–Unbounded Result Sets and Denial of Service Attacks
Originally posted at 3/8/2011
As a reminder, I am going over the problems that I found while reviewing the Whiteboard Chat project during one of my NHibernate courses. Here is the method:
[Transaction]
[Authorize]
[HttpPost]
public ActionResult GetLatestPost(int boardId, string lastPost)
{
    DateTime lastPostDateTime = DateTime.Parse(lastPost);
    IList<Post> posts = _postRepository
        .FindAll(new GetPostLastestForBoardById(lastPostDateTime, boardId))
        .OrderBy(x => x.Id).ToList();

    //update the latest known post
    string lastKnownPost = posts.Count > 0
        ? posts.Max(x => x.Time).ToString()
        : lastPost; //no updates

    Mapper.CreateMap<Post, PostViewModel>()
        .ForMember(dest => dest.Time, opt => opt.MapFrom(src => src.Time.ToString()))
        .ForMember(dest => dest.Owner, opt => opt.MapFrom(src => src.Owner.Name));

    UpdatePostViewModel update = new UpdatePostViewModel();
    update.Time = lastKnownPost;
    Mapper.Map(posts, update.Posts);
    return Json(update);
}
In this post, I want to focus on a very common issue that I see over & over again. The problem is that people usually don't notice this sort of issue at all.
The problem is quite simple: there is no limit on the amount of information that we can request from this method. What this means is that we can send it 1900-01-01 as the date and force the application to load all the posts on the board.
Even on a moderately busy board, we are talking about tens or hundreds of thousands of posts being loaded. That is going to put a lot of pressure on the database, on the server's memory, and on the amount of money that you'll pay at the end of the month for network bandwidth.
There is a reason why I strongly recommend always using a limit, especially in cases like this, where the query is practically shouting at you that the number of items can be very big.
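To make that concrete, here is a minimal sketch of capping the query at the database level. This is not the project's actual repository/specification API: it assumes direct access to an NHibernate ISession, assumes that Post exposes Board and Time properties, and uses a hypothetical MaxPosts constant. In the real code the limit would belong inside the GetPostLastestForBoardById query object.

using System;
using System.Collections.Generic;
using System.Linq;
using NHibernate;
using NHibernate.Linq;

public class PostQueries
{
    // Hypothetical hard cap; pick whatever number makes sense for the UI.
    private const int MaxPosts = 200;

    public IList<Post> GetLatestPosts(ISession session, int boardId, DateTime lastPostDateTime)
    {
        return session.Query<Post>()
            .Where(p => p.Board.Id == boardId && p.Time > lastPostDateTime)
            .OrderBy(p => p.Id)
            .Take(MaxPosts) // becomes part of the generated SQL, not an in-memory filter
            .ToList();
    }
}

Because Take() is part of the query that NHibernate sends to the database, the server never hands back more than MaxPosts rows, no matter what date the client decides to send.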
In the next post, we will analyze the SELECT N+1 issue that I found in this method (so far, I have a 100% record of finding this type of issue in every application that I have reviewed)…
More posts in "Reviewing OSS Project" series:
- (15 Mar 2011) Whiteboard Chat–The Select N+1 issue
- (14 Mar 2011) Whiteboard Chat–Unbounded Result Sets and Denial of Service Attacks
- (11 Mar 2011) Whiteboard Chat–setup belongs in the initializer
- (10 Mar 2011) Whiteboard Chat–overall design
- (09 Mar 2011) Whiteboard Chat
Comments
Is the Owner property lazy loaded?
Dmitry,
Yes
"There is a reason why I strongly recommend to always use a limit"
And, once again, this is a bad recommendation. Getting all the posts in the board (or the equivalent in other apps) is quite often a legitimate use case.
Use limits when they make sense, sure. Blanket recommendations (or crippling a query engine silently) like this are bad.
@jdn The point is not that the app shouldn't be able to show all posts. It is that the service call shouldn't be returning them all back in a single response.
If you do have a legitimate need for all of the board's posts (and let's just assume that you do for the sake of argument) then there are better back end implementation details for getting that data to the client than a single response with all of the data.
@jdn
Look at Twitter and other really big services. They provide a "more" button, which can easily be clicked several times, rather than a "kill my DB by querying for everything" button. I hope you aren't considering the export/import scenario here ;)
jdn - When would it be a legitimate use case?
Can you provide an example?
It would only be a legitimate case when you know the size of the collection up front. The number of countries in the world or the number of employees in a company is not likely to increase dramatically one day.
@karg
What if I want them all in one call?
@kooletz
I will grant you that Twitter is probably not a legitimate use case. That's rather the extreme case though.
@daniel
If I want all of the open orders for a trade group that I support, that might return 15 records, or it might return 100,000+ (that's an accurate range, btw, not making it up). Regardless, when I query for all open orders, I want ALL open orders. And not paged either (if I want them paged, I'll page them explicitly).
Ayende said 'always' and I think he means it, that's why he made Raven DB safe/crippled by default (he calls it one, I call it the other, I'll let you guess which...LOL). And, he's been open about the fact that he did it for marketing reasons as well as technical ones.
And he's still wrong. YMMV.
Jdn,
What are you going to do with 100,000+ orders?
I am going to process them and send them to a 3rd party vended application that expects them in one shot (I've never really thought about it, but I don't think they even have paging in their API).
@jdn
What if I want to make a blocking network call on my UI thread so I lock up my UI? That makes me a bad developer for wanting to do that.
You seem to be stuck on the idea that since you have a desire to show a large amount of data to the user at once that it must be returned in a single service call.
Why is it that you specifically want it returned in one service call?
Jdn,
I can guarantee that the 3rd party vendors would REALLY like it if you didn't do this.
See this:
msdn.microsoft.com/.../cc663023.aspx#id0090070
Udi describes a very similar scenario and what happens when you throw that at a system.
Ayende:
I can guarantee you that the 3rd party vendor expects it in one shot.
I am familiar with Udi's article and scenario, and it is irrelevant.
Which is part of the point. You don't know my scenario or the vendor, I do.
Now, if you want to ask me whether I think the vendor is doing things correctly, we may come to a different conclusion.
Karg:
Who said anything about showing a large amount of data?
It is true that Ayende's specific example is an ActionResult, but again, he said "always" and believes it (and built/crippled RavenDB around the concept).
Karg:
Sorry, missed the question.
To use my specific example, the end client wants everything to be sent to them in one shot. Since our systems are perfectly capable of handling 100,000 open orders in one shot, there is no reason not to get it in one call.
It is an interesting fact that even good developers appear not to consider the possibility of 'unbounded' result sets. I've discussed it with Ayende, and I don't dispute that even good developers write bad code.
It still isn't right to make code un-self-documenting.
If I write:
myCollection.Skip(472).Take(14393)
I mean, "skip 472 records and then take the next 14393". I do not mean, "skip 472 records and then take whatever Ayende thinks is the correct silent formerly poorly documented number of records that he thinks you should take because he doesn't want bad performance to reflect badly on RavenDB."
@jdn
1) he never said it had to be a silent limit. There's nothing really wrong with letting the caller specify the limit, as long as it's kept reasonable.
2) if you have service A that calls service B to get data, then send it to service C (where service C needs it in one shot), why can't service B have limits? Service A can call it multiple times, aggregate, then send to C. When you write service D that presents service B results to screen, it's already paginated.
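To illustrate point 2, here is a rough sketch of the aggregate-then-forward idea, using entirely hypothetical interfaces (none of these types come from Whiteboard Chat or the systems discussed in this thread):

using System.Collections.Generic;

// "Service B": only ever hands out bounded pages.
public interface IOrderSource
{
    IList<Order> GetOpenOrders(int page, int pageSize);
}

// "Service C": the vendor-style endpoint that wants everything in one shot.
public interface IOrderSink
{
    void SendOpenOrders(IList<Order> allOrders);
}

// "Service A": the one place that knowingly aggregates the pages.
public class OrderForwarder
{
    private const int PageSize = 1000;

    public void Forward(IOrderSource source, IOrderSink sink)
    {
        var all = new List<Order>();
        int page = 0;
        IList<Order> batch;
        do
        {
            batch = source.GetOpenOrders(page++, PageSize);
            all.AddRange(batch);
        } while (batch.Count == PageSize); // a short page means we've read everything

        sink.SendOpenOrders(all); // C still gets its single call
    }
}

public class Order { /* placeholder for whatever the real order type is */ }

The point of the split is that only the forwarder ever holds the full set in memory; the query service itself stays bounded.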
I am with jdn on this one. If the 3rd party vendor requires the whole set of records anyway, how does splitting the request into multiple paginated calls (as a few people have suggested above) make it any cheaper? You're still querying for the same set of data; splitting it into multiple streams does nothing but make it a lot more costly. The laws of physics ensure that.
This is a perfect example of enforcing blanket guidance (i.e. pagination) just for the sake of it, backed by no legitimate reason, and in doing so causing the exact problem the guidance is meant to overcome.
Hendry,
It means that Service B doesn't need to worry about Out Of Memory exceptions, for one.
Since in Service A, you are explicitly doing something out of the ordinary, you take care of that only in that place, and you don't have to worry about this in multiple systems.
I thought we had streaming of large data in (web) services? And similarly in NH. An unbounded query might not necessarily be all bad, but holding unbounded data in memory definitely is.
@jdn
So you're saying that because you happen to have an (allegedly) legitimate case where you need ALL of the data, that it's a bad idea in general to limit it by default?
I don't know about you, but I would much rather have a system that, by default, caters to the 99% case (i.e., when requesting all of the data at once is a mistake) and requires some minor tweaking to make that 1% case work when I'm really sure I need to shoot myself in the foot.
In RavenDB you can override the max page size on the server. Problem solved.
In the case of this post, we're talking about a UI. I don't care what business function your app performs, displaying 100,000 records all at once to a user without paging is a bad idea for a whole host of reasons.
@Nick
No one is arguing that you should pull 100,000 records to display in a UI.
I completely disagree that it is a 99% case that it is a mistake to request all data at once. My (allegedly) legitimate case is a very legitimate case, and I can come up with many more.
When I write code that pulls data, it is up to me to decide how much data is going to be pulled and whether it needs to be paged. I absolutely don't want some silent global variable deciding that for me.
It is unfortunate that too many developers are too lazy to figure out the performance impact of the code that they write. Treating the symptom by crippling by default is an anti-practice, not a best practice.