Archive for December 20th, 2010
For a while now I’ve been trying to deal with a bizarre issue in Rails where uploads would, apparently at random, cause a server error. The server would report a ‘bad content’ error, and often the upload would nearly complete as far as the user was concerned, only to fall at the last hurdle, bailing out at 99%. It happened infrequently and I could never replicate it, but I eventually found an article that explained the problem we seemed to be having.
In the above article, it’s explained that there is an odd bug in a specific version of Rack that only gets triggered in the very specific case where a multipart form boundary happens to align exactly with the multipart content parser’s buffer size.
A fix can be found in later versions, as shown in this commit: “Fixed multipart parameter parsing for when a field’s body ends at the same time as a chunk (i.e. we’ve reached EOL and buffer is empty)”.
Once you read the code and understand the context, the commit makes it clear why the error was occurring. Essentially, when the data consumed so far happened to align perfectly with the buffer, the parser assumed it had reached the end of the document, without checking that it wasn’t merely at the end of a field rather than the end of the entire document.
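To make the failure mode concrete, here’s a simplified sketch of that class of bug. This is not Rack’s actual parser; the boundary string, field format, and helper are all invented for illustration. Fields are delimited by a boundary marker and the input is read in fixed-size chunks; the buggy variant treats an empty buffer as end-of-document, so a field whose body ends exactly on a chunk boundary makes it stop early, while the fixed variant stops only at actual EOF.

```ruby
require "stringio"

# Hypothetical boundary marker for this sketch (not Rack's).
BOUNDARY = "--boundary".freeze

def parse_fields(io, chunk_size:, buggy:)
  fields = []
  buffer = +""
  loop do
    chunk = io.read(chunk_size)
    buffer << chunk if chunk
    # Consume every complete field currently sitting in the buffer.
    while (idx = buffer.index(BOUNDARY))
      body = buffer.slice!(0, idx + BOUNDARY.length)
      field = body.chomp(BOUNDARY)
      fields << field unless field.empty?
    end
    if buggy
      break if buffer.empty?   # bug: empty buffer is not the same as end of document
    else
      break if chunk.nil?      # fix: stop only at actual EOF
    end
  end
  fields
end

data = "field1--boundaryfield2--boundary"
# "field1--boundary" is exactly 16 bytes, so it perfectly fills one 16-byte chunk.
p parse_fields(StringIO.new(data), chunk_size: 16, buggy: true)   # → ["field1"]
p parse_fields(StringIO.new(data), chunk_size: 16, buggy: false)  # → ["field1", "field2"]
```

The buggy run drops the second field entirely because the first field’s body happened to end exactly where the chunk did, which mirrors why the real bug only surfaced at random: it depended on the upload’s byte layout lining up with the buffer.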
It’s an incredibly annoying and hard-to-diagnose error, especially since, from a high-level view at the frontend of a Rails app, it’s extremely unclear what is triggering it. It was causing nginx to spit out 422 errors, which led me down the path of suspecting that users’ clients were prematurely terminating the upload connection. That wasted some time, but eventually it turned out to be this peculiar boundary-alignment bug. I’m keeping an eye out to see whether any more issues crop up, but hopefully that has solved it.
On a side note, it seems odd that Rack uses such a small buffer size of only 16 KB. For processing large uploads that puts an unnecessarily high I/O load on the server, considering the number of reads it has to perform on files potentially upwards of several gigabytes. Once we’ve seen more evidence that this bad content issue has gone away, it might be worth adjusting this value to make it more appropriate for processing large multipart data payloads.
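Some rough arithmetic on that side note. This is just a back-of-the-envelope helper (the 1 MB figure is a hypothetical alternative, not anything Rack offers), but it shows how the read count scales with buffer size for a 2 GB upload:

```ruby
# Number of read calls needed to stream a file of the given size
# when reading buffer_size bytes at a time.
def reads_needed(file_size, buffer_size)
  (file_size.to_f / buffer_size).ceil
end

TWO_GB = 2 * 1024**3

puts reads_needed(TWO_GB, 16 * 1024)  # 131072 reads at 16 KB
puts reads_needed(TWO_GB, 1024**2)    # 2048 reads at 1 MB
```

That’s over a hundred thousand reads per 2 GB upload at 16 KB, versus a couple of thousand with a 1 MB buffer, which is why the current value feels small for large multipart payloads.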