What is wrong with this code?

There is a huge bug in this code, resulting in data corruption. Can you spot what it is?

public byte[] DownloadBytes(string url,
                            ICredentials credentials)
{
    WebRequest request = Util.SetupWebRequest(WebRequest.Create(url), credentials);

    using (WebResponse response = request.GetResponse())
    {
        using (Stream stream = GetResponseStream(response))
        {
            byte[] buffer = new byte[response.ContentLength];
            int current = 0;
            int read;
            do
            {
                read = stream.Read(buffer, current, buffer.Length - current);
                current += read;
            } while (read != 0);
            return buffer;
        }
    }
}

Hint: this has nothing to do with exception handling. Assume that nothing goes wrong.

Tags:

Challanges

Comments

01 Apr 2008
19:15 PM

Jakob Andersen

Perhaps response.ContentLength doesn't include the headers but your Stream does?

01 Apr 2008
19:23 PM

Ayende Rahien

No, the stream doesn't include headers.

01 Apr 2008
19:28 PM

Evgeny Shapiro

Response.ContentLength could actually return -1 if the Content-Length header is omitted by the server.

Also this code can wrap the content, because the ContentLength property is actually less than the WebResponse actual length. My guess is that ContentLength usually holds buffer size instead of real content length.

01 Apr 2008
19:32 PM

Ayende Rahien

For the purpose of this code, the server always returns the Content-Length header.

And you are right on the money in terms of the real issue, but not on the cause.

01 Apr 2008
19:34 PM

Matt

It tries to read the entire contents twice?

01 Apr 2008
19:38 PM

Ayende Rahien

No.

Notice the parameters to Read() change

01 Apr 2008
19:42 PM

Joseph Gutierrez

If you read in larger than int size?

01 Apr 2008
19:43 PM

El Guapo

Content-Length is the length in characters, not bytes. So if the stream is UTF8 (for example) the byte length is actually larger than the content length.

01 Apr 2008
19:45 PM

Ayende Rahien

Not that.

That would be a problem, but this code is meant to deal with files < 10MB and commonly ~10Kb

01 Apr 2008
19:46 PM

Ayende Rahien

El,

Not to my understanding:

The Content-Length entity-header field indicates the size of the entity-body, in decimal number of OCTETs,

01 Apr 2008
19:51 PM

Jakob Andersen

Hmmm... now I'm just guessing, but could it be the authentication of your credentials that messes up the stream you get back?

01 Apr 2008
19:56 PM

Getting back something other than 200 OK? Like continue or redirect, etc?

01 Apr 2008
19:57 PM

Ayende Rahien

No, auth is working just fine

01 Apr 2008
19:59 PM

Ayende Rahien

Doesn't care about redirect and stuff.

01 Apr 2008
20:03 PM

El Guapo

I still think you need to use StreamReader with an encoding

http://msdn2.microsoft.com/en-us/library/system.net.httpwebresponse.getresponsestream.aspx

01 Apr 2008
20:06 PM

Shaneo

Just reading my rss quickly, but I love a good challenge.

I see right away there are credentials of some kind being passed to the server, but we're not actually checking the result of that authentication...just assuming we're passing authentication every time and returning our buffer....

01 Apr 2008
20:06 PM

Ayende Rahien

Note that we are returning a byte array, we don't care about encoding

01 Apr 2008
20:09 PM

Ayende Rahien

If the authentication fails, it will throw an exception.

We don't have much to do about it

01 Apr 2008
20:16 PM

Derick Bailey

read = stream.Read(buffer, current, buffer.Length - current);

you're reading from 0 to length - 0... shouldn't this be

int readSize = 1024 //pick a number, here.

read = stream.Read(buffer, current, readSize);

01 Apr 2008
20:16 PM

Craig

You are attempting to read your entire buffer the first time. Theoretically, (read) should never come back greater than 0. However, if it does, you will be attempting to write outside the bounds of the buffer.

01 Apr 2008
20:18 PM

Erik

Taking a guess here... ContentLength does not include the header info, but that header info is sent to the StreamReader?

01 Apr 2008
20:21 PM

Jakob Andersen

Derick and Craig: You should revisit the documentation on Stream.Read. It reads from the current position "up to" the number of bytes specified in the third argument. It doesn't necessarily return the specified number of bytes.

01 Apr 2008
20:23 PM

Doug Mayer

stream.Read expects the "current" offset to be zero-based, your line should read:

read = stream.Read(buffer, current - 1, buffer.Length - current);

01 Apr 2008
20:25 PM

Evgeny Shapiro

Large content is actually split into several chunks. According to chunked transfer encoding, each chunk starts with its own size. The buffer would be of the length of the first chunk?

01 Apr 2008
20:26 PM

Ayende Rahien

Derick

No, that is fine, put this in position 0 and read length - 0 bytes to the buffer

01 Apr 2008
20:27 PM

Ayende Rahien

Craig,

Hm? I am not following your reasoning

01 Apr 2008
20:27 PM

Ayende Rahien

Erik,

No, the stream doesn't include the headers

01 Apr 2008
20:28 PM

Doug Mayer

Well, I think I almost had it... you could get current = -1 then... probably something more like:

read = stream.Read(buffer, Math.Max(0, current -1), buffer.Length - current);

01 Apr 2008
20:30 PM

Ayende Rahien

Evgeny ,

Interesting point, but I don't think so.

It isn't the scenario here, nevertheless

01 Apr 2008
20:31 PM

Ayende Rahien

Doug,

No.

Current starts at 0.

Then we read read(buffer, 0, 1024)

Now, let us say that we only read 10 bytes.

current += read; // current = 10

buffer[0 .. 9] - contains the read bytes.

byte[10] is empty and where we will start reading next time
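The bookkeeping in this walkthrough can be sketched against an in-memory stream. This is only an illustration: `PartialReadSketch`, `ReadInSlices` and the 10-byte slice size are hypothetical, and the slice cap exists purely to mimic a network stream handing back fewer bytes than requested.

```csharp
using System;
using System.IO;

public static class PartialReadSketch
{
    // Same loop shape as the post's code, but each call asks for at most
    // sliceSize bytes so that every Read is a "short" read.
    public static byte[] ReadInSlices(Stream stream, int length, int sliceSize)
    {
        byte[] buffer = new byte[length];
        int current = 0;
        int read;
        do
        {
            int count = Math.Min(sliceSize, buffer.Length - current);
            // Bytes land at buffer[current .. current + read - 1]
            read = stream.Read(buffer, current, count);
            Console.WriteLine("offset {0}, read {1}", current, read);
            current += read; // the next read starts right after the last byte written
        } while (read != 0);
        return buffer;
    }

    public static void Main()
    {
        byte[] source = new byte[25];
        for (int i = 0; i < source.Length; i++) source[i] = (byte)(i + 1);

        byte[] copy = ReadInSlices(new MemoryStream(source), source.Length, 10);
        Console.WriteLine(copy[24]); // last byte of the source, copied intact
    }
}
```

With 25 source bytes the offsets advance 0, 10, 20, 25, and nothing is overwritten or skipped, so the offset arithmetic itself is sound.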

01 Apr 2008
20:35 PM

Jay

Anything to do with disposing the stream, followed by disposing the response object?

01 Apr 2008
20:36 PM

Pete

response.ContentLength is a long value which is 64 bits. current and read need to be long also.

01 Apr 2008
20:37 PM

Craig Shearer

I think there may be a problem with the termination of the loop - while (read != 0)

What if there are no bytes to be read when you try - you might terminate before you've actually read the complete response.

Shouldn't it be while (current < buffer.Length - 1)

01 Apr 2008
20:44 PM

Craig Shearer

Or maybe not :-) The documentation says the Read method blocks until at least one byte is read, so it would only return 0 at the end of the stream.

01 Apr 2008
20:46 PM

El Guapo

You cannot use the content-length to determine the length to read. For example if the server is compressing the data the length will be incorrect from your perspective.

01 Apr 2008
20:56 PM

Jab

Is it because the return happens before the end of the first using statement? Which would mean the response is never closed. Just a wild guess here...

01 Apr 2008
20:57 PM

Conspiracy Theorist

What if there is nothing wrong with the code and the AI "Ayende" is having fun watching the scramble?

01 Apr 2008
21:02 PM

El Guapo

No there is definitely a bug with the use of content length. The HTTP content length header refers to the actual body sent over the wire. What this code needs is the actual length of the payload. These can be 2 different numbers. If the web server is compressing the data on the fly, which they all do (I.e. Mod-deflate) the content length will be smaller than the app layer payload. Hence the file is truncated and data lost, as mentioned.

01 Apr 2008
21:09 PM

Pete

Unless you add an accept encoding header to the request I don't think the server will return compressed pages.

01 Apr 2008
21:09 PM

Ayende Rahien

Pete,

Yes, except that we don't care about large file sizes

01 Apr 2008
21:11 PM

Ayende Rahien

Jab,

No, that is always a safe thing to do

01 Apr 2008
21:12 PM

Ayende Rahien

Jay,

No, the docs explicitly say that this is not required but OK to do

01 Apr 2008
21:12 PM

Ayende Rahien

El,

DING DING DING!

You got it!

01 Apr 2008
21:14 PM

Ayende Rahien

Pete,

Yes, but .NET automatically sends Accept-Encoding: gzip, deflate

03 Apr 2008
07:06 AM

Angel

Hi,

I think the problem is that once 'read' is assigned to it never becomes 0 and the while loop is not exited. You should 'read = 0;' right after 'do {...'

Angel

13 Apr 2008
19:49 PM

Anonymous Coward

So what is the answer in the actual code? Do you use response.Headers[HttpResponseHeader.ContentLength] or is that actually the same as response.ContentLength?
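The thread never answers this. One possible fix, offered as a sketch rather than the author's actual change, is to stop sizing the buffer from Content-Length and read until the stream reports the end. `DownloadSketch` and `ReadToEnd` are hypothetical names, and the demo swaps the web response for a `GZipStream` over an in-memory buffer, on the assumption that the real response stream is decompressed transparently in the same way.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class DownloadSketch
{
    // Copies everything the stream yields into a growable buffer,
    // without relying on a length announced up front.
    public static byte[] ReadToEnd(Stream stream)
    {
        using (MemoryStream result = new MemoryStream())
        {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
            {
                result.Write(chunk, 0, read);
            }
            return result.ToArray();
        }
    }

    public static void Main()
    {
        byte[] payload = new byte[10000]; // zeros, so it compresses well

        // Compress the payload; wire.Length plays the role of Content-Length.
        byte[] wire;
        using (MemoryStream ms = new MemoryStream())
        {
            using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress))
            {
                gz.Write(payload, 0, payload.Length);
            }
            wire = ms.ToArray();
        }

        using (Stream body = new GZipStream(new MemoryStream(wire), CompressionMode.Decompress))
        {
            byte[] downloaded = ReadToEnd(body);
            Console.WriteLine("on the wire: {0}, after decompression: {1}",
                              wire.Length, downloaded.Length);
        }
    }
}
```

In the original code the buffer would be sized to the smaller on-the-wire number. Once it fills up, Read is asked for zero bytes and typically returns 0, so the loop exits with the tail of the file missing.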


Oren Eini

CEO of RavenDB