Archive for December, 2010
For a while now I’ve been trying to deal with a bizarre issue with rails where uploads would, apparently at random, fail with a server error. The error reported ‘bad content’, and often the upload would nearly complete as far as the user was concerned, only to fall at the last hurdle, bailing out at 99%. This happened pretty infrequently and I could never reproduce it, but I eventually found an article that explained the problem we seemed to be having.
In the above article, it’s explained that there is an odd bug in a specific version of rack that only gets triggered in the rare case where a multipart form boundary happens to align exactly with the multipart parser’s buffer size.
A fix can be found in later versions, as shown in this commit: Fixed multipart parameter parsing for when a field’s body ends at the same time as a chunk (i.e. we’ve reached EOL and buffer is empty)
Once you read the code and understand the context, the commit makes it clear why the error was occurring. Essentially, whenever the data read so far happened to align perfectly with the buffer, the parser assumed it had reached the end of the document, without checking whether it had merely reached the end of a field rather than the end of the entire document.
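To make the failure condition concrete, here’s a small sketch (in Python, not Rack’s actual Ruby code — the function name and structure are my own illustration) of the alignment the bug depends on: a streaming parser reads the body in fixed-size chunks, and the buggy check treated “buffer empty right after a field’s trailing EOL” as end-of-document.

```python
# Hypothetical illustration of the alignment condition, NOT Rack's real code.
BUFFER_SIZE = 16 * 1024  # rack's chunk size at the time

def ends_on_chunk_boundary(field_end_offset, chunk_size=BUFFER_SIZE):
    """A field whose body ends exactly on a chunk boundary leaves the
    parser's read buffer empty -- the state the buggy check misread as
    end-of-document instead of end-of-field."""
    return field_end_offset % chunk_size == 0

# A field ending exactly at 16 KiB hits the bug; one byte either way doesn't.
print(ends_on_chunk_boundary(16 * 1024))      # True  -> misparsed as EOF
print(ends_on_chunk_boundary(16 * 1024 + 1))  # False -> parsed normally
```

This is also why the failure looked random from the outside: it depended purely on the byte lengths of the uploaded fields.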
It’s an incredibly annoying and hard-to-diagnose error, especially since, from a high-level view all the way at the frontend of a rails app, it’s extremely unclear what is triggering it. It was causing nginx to spit out 422 errors, which led me down the path of suspecting that users’ clients were prematurely terminating the upload connection, which wasted some time, but no, eventually it turned out to be this peculiar boundary alignment bug. I’m keeping an eye out to see whether any more issues crop up, but hopefully that has solved it.
On a side note, it seems odd that rack uses such a small buffer size of only 16k. For processing large uploads that seems to put an unnecessarily high I/O load on the server, considering the number of reads it has to perform on files potentially upwards of several gigabytes. Once we’ve seen more evidence that this bad content issue has gone away, it might be worth adjusting this value to make it more appropriate for processing large multipart data payloads.
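The arithmetic behind that complaint is easy to check. A rough sketch (my own numbers, assuming one read call per chunk) of how many reads a fixed 16 KiB buffer costs on a large payload, versus a bigger buffer:

```python
BUFFER_SIZE = 16 * 1024  # rack's 16 KiB chunk size

def read_calls(payload_bytes, chunk=BUFFER_SIZE):
    # Number of read() calls needed to stream a payload in fixed-size
    # chunks (ceiling division).
    return -(-payload_bytes // chunk)

two_gib = 2 * 1024 ** 3
print(read_calls(two_gib))             # 131072 reads at 16 KiB
print(read_calls(two_gib, 1024 ** 2))  # 2048 reads at 1 MiB
```

Whether the per-call overhead actually dominates depends on the OS and the rest of the stack, but two orders of magnitude fewer calls for a modest buffer bump is why the 16k default feels low for multi-gigabyte uploads.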
When running the mysql client to connect to a database on the local machine over TCP by specifying --port and, optionally, --host=localhost, the client will still use a socket connection under unix.
This could potentially be a critical issue: if the host runs multiple instances and the user connects to the wrong one, they won’t know this has happened and hence could delete or modify vital information.
For more information see the mysql bug:
MySQL Bug: mysql client connects via socket even when --port and --host is specified
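As far as I can tell, the way around this is to stop the client special-casing the hostname “localhost”. A sketch of both approaches (the port number 3307 here is a made-up example for a second instance — adjust for your setup):

```shell
# Force the protocol so the client honours --port instead of silently
# falling back to the unix socket for "localhost":
mysql --protocol=TCP --host=localhost --port=3307 -u someuser -p

# Alternatively, use the loopback IP instead of the special name
# "localhost", which also forces a TCP connection:
mysql --host=127.0.0.1 --port=3307 -u someuser -p
```

Either way, it’s worth confirming which instance you landed on (e.g. with `status` or `SHOW VARIABLES LIKE 'port';`) before touching anything important.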
Ok, so this was a weird one: we were using unlink in perl and it was returning success, yet the file still existed. So what gives?
After much hunting and digging it turned out to be a nice little gotcha in the way unlink works under cygwin, which is then inherited by their perl implementation.
The basic crux is that, in order to be as unix-like as possible, cygwin makes use of the delete-on-close feature in windows to attempt to delete files that are share-locked by other applications. The result is that even though the file is reported as deleted, it is only marked as pending deletion, and will only actually be deleted when the shared lock count reaches zero.
Unfortunately it’s easy for this to fail: if the other locking application makes a change to the file, the delete request will be silently discarded.
So there you have it: cygwin + perl + unlink = files delete sometimes. You have been warned.
For more info on the internals see the following: Re: Inconsistent behaviour when removing files on Cygwin