What is wrong with this code?
There is a huge bug in this code, resulting in data corruption. Can you spot what it is?
public byte[] DownloadBytes(string url, ICredentials credentials) { WebRequest request = Util.SetupWebRequest(WebRequest.Create(url), credentials); using (WebResponse response = request.GetResponse()) { using (Stream stream = GetResponseStream(response)) { byte[] buffer = new byte[response.ContentLength]; int current = 0; int read; do { read = stream.Read(buffer, current, buffer.Length - current); current += read; } while (read != 0); return buffer; } } }
Hint, this has nothing to do with exception handling. Assumes that nothing goes wrong.
Comments
Perhabs that response.ContentLength doesn't include headers but your Stream does?
No, the steam doesn't include headers.
Response.ContentLength could actually return -1 if the Content-Length header is omitted by the server.
Also this code can wrap the content, because the ContentLength property is actually less than the WebResponse actual length. My guess is that ContentLength usually holds buffer size instead of real content length.
For the purpose of this code, the server always returns the Content-Length header.
And you are right on the money in terms of the real issue, but not on the cause.
It tries to read the entire contents twice?
No.
Notice the parameters to Read() change
If you read in larger than int size?
Content-Length is the length in characters, not bytes. So if the stream is UTF8 (for example) the byte length is actually larger than the content length.
Not that.
That would be a problem, but this code is meant to deal with files < 10MB and commonly ~10Kb
El,
Not to my understanding:
Hmmm... now im just guessing but could it be the authentication of your credentials that mess up the stream you get back?
Getting back something other than 200 OK? Like continue or redirect, etc?
No, auth is working just fine
Doesn't care about redirect and stuff.
I still think you need to use StreamReader with an encoding
http://msdn2.microsoft.com/en-us/library/system.net.httpwebresponse.getresponsestream.aspx
Just reading my rss quickly, but I love a good challenge.
I see right away there are credentials of some kind being passed to the server, but we're not actually checking the result of that authentication...just assuming we're passing authentication every time and returning our buffer....
Note that we are returning a byte array, we don't care about encoding
If the authentication fails, it will throw an exception.
We don't have much to do about it
read = stream.Read(buffer, current, buffer.Length - current);
you're reading from 0 to length - 0... shouldn't this be
int readSize = 1024 //pick a number, here.
read = stream.Read(buffer, current, readSize);
You are attempting to read your entire buffer the first time. Theoretically, (read) should never come back greater than 0. However, if it does, you will be attempting to write outside the bounds of the buffer.
Taking a guess here... ContentLength does not include the header info, but that header info is sent to the StreamReader?
Derick and Craig: You should revisit the documentation on Stream Read it tries to read from current position and "up to" the number of bytes specified in the third argument. It doesn't nessecary return the specified number of bytes specified
stream.Read expects the "current" offset to be zero-based, your line should read:
read = stream.Read(buffer, current - 1, buffer.Length - current);
Large content is actually split into several chunks. According to Chunked transfer encoding each chunk starts with it's own size. The buffer would be of the length of the first chunk?
Derick
No, that is fine, put this in position 0 and read length - 0 bytes to the buffer
Craig,
Hm? I am not following your reasoning
Erik,
No, the stream doesn't include the headers
Well, I think I almost had it... you could get current = -1 then... probably something more like:
read = stream.Read(buffer, Math.Max(0, current -1), buffer.Length - current);
Evgeny ,
Interesting point, but I don't think so.
It isn't the scenario here, nevertheless
Doug,
No.
Current starts at 0.
Then we read read(buffer, 0, 1024)
Now, let us say that we only read 10 bytes.
current += read; // current = 10
buffer[0 .. 9] - contains the read bytes.
byte[10] is empty and where we will start reading next time
Anything to do with disposing the stream, followed by disposing the response object?
response.ContentLength is a long value which is 64 bits. current and read need to be long also.
I think there may be a problem with the termination of the loop - while (read != 0)
What if there are no bytes to be read when you try - you might terminate before you've actually read the complete response.
Shouldn't it be while (current < buffer.Length - 1)
Or maybe not :-) The documentation says the Read method blocks until at least one byte is read, so it would only return 0 at the end of the stream.
You cannot use the content-length to determine the length to read. For example if the server is compressing the data the length will be incorrect from your perspective.
Is it because the return happens before the end of the first using statement? Which would mean the response is never closed. Just a wild guess here...
What if there is nothing wrong with the code and the AI "Ayende" is having fun watching the scramble?
No there is definitely a bug with the use of content length. The HTTP content length header refers to the actual body sent over the wire. What this code needs is the actual length of the payload. These can be 2 different numbers. If the web server is compressing the data on the fly, which they all do (I.e. Mod-deflate) the content length will be smaller than the app layer payload. Hence the file is truncated and data lost, as mentioned.
Unless you add an accept encoding header to the request I don't think the server will return compressed pages.
Pete,
Yes, except that we don't care about large file sizes
Jab,
No, that is always a safe thing to do
Jay,
No, the docs explicitly says that this is not required but OK to do
El,
DING DING DING!
You got it!
Pete,
Yes, but .NET automatically send accepts gzip, deflate
Hi,
I think the problem is that once 'read' is assigned to it never becomes 0 and the while loop is not exited. You should 'read = 0;' right after 'do {...'
Angel
So what is the answer in the actual code? Do you use response.Headers[HttpResponseHeader.ContentLength] or is that actually the same as response.ContentLength?
Comment preview