Because, clearly, that is what is missing. RavenDB GetAll extension method
Comments
Creating it in the first place is a bit WTF. Deciding to hold all of the results in a List and only return it after all of the calls complete, despite being inside an IEnumerable method just... elevates it to another level.
I can somewhat understand wanting to get all documents. But:
var results = new List<T>();
Really..? Btw, what do you think of my addition?

public static IEnumerable<T> GetRange<T>(this IDocumentStore documentStore, int start, int count)
{
    var results = new List<T>();
    for (int i = 0; i < count; i++)
    {
        results.Add(documentStore.GetAll<T>().ElementAt(start + i));
    }
    return results;
}
:trollface:
Ugh, no preview and no edit >.< Let's see if this works:
:trollface:
Can someone clarify what's wrong with this please? I'm new to RavenDB and understand the basic do's and don'ts, but a rundown of why this is bad would be great, for myself as well as anyone else, especially those who might come to this page after googling 'ravendb getall'.
Joel, Look at unbounded result sets, as well as the real reason why we don't allow this in RavenDB. Basically: what happens if you have 1 million results?
For the record, I agree with the design impetus for making this so. Then again, sometimes one just wants to get all the Ts, and many times you know you won't have 1m or even 1000 records in a collection, but you could well have more than 128 and you don't want to write a pager loop to handle it.
Now, I recall seeing somewhere there was a new 'stream me all the T' API option, but that doesn't help people on older versions.
I have some collections with many small documents, and I just need all of them, easy. As I work a lot with moving/importing data (~2000 docs) around, I had to do the same workaround. Forcing users to do stupid things themselves, and then blaming them for it, strikes me as quite silly.
I agree that most of the time you don't want unbounded result sets. But there are legitimate reasons for wanting to retrieve all the data in a collection. For instance when exporting data in some other format or when generating a sitemap.xml with all pages and such.
There are exceptions to every rule.
I agree, there are definitely cases that you need more than 1024 records. Even worse, when using a hosted RavenDB, you can't easily change this value to retrieve more.
For example, I need to list all Zip Codes in a state to allow users to multi-select them.
Not saying his implementation is good, but there are definitely cases where it's needed.
@Duckie,
having to move/import data in batches already sounds like a "workaround". If you sent a message to the target system as soon as the entity represented by the document changed, that batch process would become a real-time interface, and the need to query all documents would go away.
Yield return would at least prevent complete waste when the calling code does Take(x).
The "pager code" is pretty simple to write and is a good warning that you are doing something potentially dangerous.
Trying to make GetAll generic and reusable is much much more difficult. What I've seen is that soon you want to add a Where condition, then you want custom skip/take, then you want to get the Statistics, then you want to Include some other document, then you want to WaitForStale...
Soon this GetAll method and its overloads are a pretty substantial API for which each combination of parameters has exactly one usage in the system.
And then there's this: http://ayende.com/blog/161249/ravendbs-querying-streaming-unbounded-results
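For anyone who landed here from a search, here is a minimal sketch of the kind of pager loop being discussed. The name GetAllPaged and the session-per-page choice are illustrative only, not part of the RavenDB client API:

// Hypothetical extension method sketching the pager loop (not part of the RavenDB client).
// Assumes the usual Raven.Client, Raven.Client.Linq and System.Linq namespaces.
public static IEnumerable<T> GetAllPaged<T>(this IDocumentStore store, int pageSize = 1024)
{
    int start = 0;
    while (true)
    {
        // A fresh session per page keeps us under the max-requests-per-session limit.
        using (var session = store.OpenSession())
        {
            var page = session.Query<T>()
                              .Skip(start)
                              .Take(pageSize)
                              .ToList();

            if (page.Count == 0)
                yield break;

            foreach (var item in page)
                yield return item; // yield as we go instead of buffering the whole set

            start += page.Count;
        }
    }
}

Because it yields as it goes, a caller that does Take(x) stops issuing requests once it has enough, which is the yield return point raised above.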
@Scott - Each zip code has its own document? I would think they would be grouped into far fewer docs.
@duckie - import/export could be done via the smuggler api. It would be interesting to see what Studio is doing here and emulate that.
What's wrong with this? I mean, theoretically a Windows server can 'scale' up to 4TB of memory. That way you don't have to pay developers to think and write good code!
Wyatt, What is the actual user scenario that requires all the data, when the data can be many thousands of records?
Duckie, We have explicit support for bulk insert / reads. That prevents you from loading everything into memory.
Scott, Why are you storing all the zip codes as separate documents?
... and I don't understand why you don't understand it. There are scenarios beyond OLTP web applications where you just need this: GetAll(). I'm using it heavily in a desktop application that runs on RavenDB embedded. I know the performance implications of every other approach and yes, I think GetAll is the best in our situation. I'm sure there are other valid use cases as well, which you could have addressed with a better implementation of the streaming API.
Ayende, I need all the data in memory, so I can use whatever LINQ commands, filtering, querying, sorting, etc. I want. Performance here is not an issue at all. I've got loads of data I need to do manipulation on.
Duckie, Whatever for? Filtering, querying & sorting are DB tasks, not in-memory tasks.
@Duckie, @Daniel:
Don't worry. Ayende has been wrong about this from the start but implemented this auto-handcuff for marketing reasons.
There are sound technical reasons for wanting GetAll(). There used to be a way to override the "dumb by default" behavior in RavenDB, not sure if it is still in the code base or not.
I wonder how many hundreds or thousands of apps are actually efficient because RavenDB forced them to be, and forced lazy developers to do proper paging and/or document structure.
RavenDB has forced me to think about performance from the start, when normally I'd be lazy about it with SQL+O/RM.
@Daniel and @jdn
Sure. And it's pretty easy to roll yourself with the exact "flavor" you need (from my other comment). A GetAll in the API doesn't add much value to the common case.
For "embedded and not that much data and I understand" scenarios, I personally have used LoadStartingWith and avoid the query issues altogether.
LoadStartingWith + the new Streaming API + Smuggler + roll your own while loop = a lot of ways to handle these situations without having a simple, but dangerous, method exposed on the api.
Also, Dynamic Reporting takes care of another set of cases: http://ayende.com/blog/162339/ravendbs-dynamic-reporting
Facets solve for still others.
The difference being that these choices address specific concerns regarding working with the entire dataset instead of exposing a seemingly simple api method and hoping the user understands the intersection between the subtleties of what they are actually trying to achieve and what the api is actually doing.
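To make the LoadStartingWith route mentioned above concrete, here is a rough sketch. It assumes the 2.x overload that takes (keyPrefix, matches, start, pageSize), and a hypothetical ZipCode class whose documents use 'zipcodes/' ids:

using (var session = store.OpenSession())
{
    // Loads documents by id prefix: no query, no index, no staleness to worry about.
    ZipCode[] page = session.Advanced.LoadStartingWith<ZipCode>("zipcodes/", null, 0, 128);
}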
@Kijana:
If I say "Select * from", I want select *.
If I want "select top 1024 from", then I will write that.
"LoadStartingWith + the new Streaming API + Smuggler + roll your own while loop = " a pain in the kiester.
At some point, it went from "running with scissors" to "crawling with pillows."
@Judah is quite right that Raven makes you think about performance and therefore paging.
My only beef is I think an exception should be thrown if the number of documents requested is greater than the default 128.
@jdn - Sure. If I were writing SQL, fine. The problem is we're using abstractions on top of abstractions.
Code like that GetAll extension method is one of the primary reasons so many people (DBAs) say "EF Sucks". EF is fine, but once you abstract away what's going on past a certain point, it will just lead to painful "surprises" down the road.
I once worried about this and typed up a post for the forum. I then realized that the while loop to page the results was shorter than the post I was writing.
Ayende, the DB cannot do what I want without a lot of investment in time. I just need my data out, so I can work with it myself.
I understand the desire for optimal use of RavenDB by limiting the API, but forcing users to do stupid things is .. stupid.
Maybe just make a method called query.GetAllWhileUnderstandingThisIsStupid() ..
@Tim, you mean if the total document count is greater than 128 and you haven't specified a Take?
I like to explicitly define a Take for all queries, but I'd probably say log WARN instead of throw.
This reminds me of a technical lead in a Fortune 500 company explaining to me how having a web service exposing something like public DataSet Execute(string query, string connectionString) was great for speeding up development and deployments. Yes you can, no you shouldn't.
@David
The 'I might need to get everything because of sitemap' argument is questionable. Google doesn't NEED a sitemap to index your site; you just need to ensure that all of your pages are reachable from the bookmark URL. Oren's blog has lots of dynamic content too, a lot more than 1024 posts - see the sidebar. But of course it is all indexed by Google. Someone should write an article about this...
Sitemaps are not only about making a list of links for indexing, but also about showing Google the structure of the site. Besides, if they want to expose a sitemap, why is this questionable?
Fact is, if you want to load many documents into memory you have to do special stuff with RavenDB, no matter what valid reason you might have for it.
This is what users experience / what I experienced:
You only get a limited number of records. You increase this. You run into the maximum limit of records. You start paging it out, but you run into the maximum-queries-per-session exception. You increase the number of allowable requests, or you create multiple sessions.
Since streams were added, it is of course easier to do.
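For reference, the streaming route looks roughly like this in 2.5 (a sketch only; Order is a stand-in class, not from the thread):

using (var session = store.OpenSession())
using (var enumerator = session.Advanced.Stream(session.Query<Order>()))
{
    while (enumerator.MoveNext())
    {
        var order = enumerator.Current.Document;
        // Process each document as it arrives; nothing is buffered client side,
        // and the page-size and requests-per-session limits don't get in the way.
    }
}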
At the beginning I had the same thoughts.. but now, no way.. I'd rather write a while loop than just blindly get all documents.
I found myself asking.. do I need this here, is the model designed correctly, or should this be a map/reduce..
don't change a thing.
We actually have some legacy APIs that we've converted over to use RavenDB on the back end, but we still have to maintain the non-paged methods.
We have the following (better) extension method to get all. It obeys skipped results and returns an IEnumerable<T> so you can avoid materializing the whole thing if you're just operating over the whole set.
This is with Raven 1.0, we'll use Streams when we upgrade.
http://pastebin.com/AqaAu6DC
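The pastebin isn't reproduced here, but the pattern it describes (paging that accounts for server-side skipped results and yields lazily) might look roughly like this; a sketch only, assuming the client's RavenQueryStatistics:

// Hypothetical helper, not the pastebin code: lazy paging that obeys skipped results.
public static IEnumerable<T> GetAllLazily<T>(this IDocumentSession session, int pageSize = 1024)
{
    int fetched = 0, skipped = 0;
    while (true)
    {
        RavenQueryStatistics stats;
        var page = session.Query<T>()
                          .Statistics(out stats)
                          .Skip(fetched + skipped) // account for documents the server skipped
                          .Take(pageSize)
                          .ToList();

        if (page.Count == 0)
            yield break;

        fetched += page.Count;
        skipped += stats.SkippedResults;

        foreach (var item in page)
            yield return item; // the caller can stop early without materializing the rest
    }
}

Note that a single session caps how many requests it will make, so this suits collections of modest size; past that, streams or a session-per-page loop are the safer bet.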
I'm using embedded in a desktop application and I have to agree completely with @Daniel here. "GetAll" is absolutely essential for my use cases, as is ensuring that the query does not return any stale results.
I'm also using 1.0 currently, but will likely move to streams when I get time to upgrade.
Oh dear, how embarrassing, I know it's wrong but I needed a quick hack and had just read this:
http://stackoverflow.com/questions/11268955/retrieving-entire-data-collection-from-a-raven-db
I put it on my blog in case I needed it again; honestly didn't expect anyone to find it! I'll remove it for fear of encouraging others.