Multiplay Labs

tech hints and tips from Multiplay

ruby very slow http downloads and high cpu usage

I’ve been doing some work which required downloading files using ruby, and during tests I was getting very slow download times coupled with very high CPU load from the ruby process.

The code in use was very simple:

require 'open-uri'
open( [http uri] )

I did some benchmarking: downloading a 157MB file from the local machine took over 20 seconds and used 100% CPU, whereas wget fetched the same file in only 0.7 seconds.
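To put those timings in perspective, here is a minimal sketch of how such a comparison could be measured and turned into throughput figures. The helper names are hypothetical and not part of the original test setup; the numbers are the ones quoted above.

```ruby
require 'benchmark'

# Hypothetical helper (not from the original tests): converts a transfer
# size in megabytes and an elapsed time in seconds into MB per second.
def throughput_mb_per_sec(megabytes, seconds)
  megabytes / seconds.to_f
end

# Hypothetical helper: returns the wall-clock seconds taken by the block,
# which would wrap whatever download call is being measured.
def time_download
  Benchmark.realtime { yield }
end

# Figures quoted above: 157MB in just over 20 seconds versus 0.7 seconds.
ruby_rate = throughput_mb_per_sec(157, 20)
wget_rate = throughput_mb_per_sec(157, 0.7)
puts format('open-uri: %.2f MB/s, wget: %.2f MB/s', ruby_rate, wget_rate)
```

On those figures the open-uri download managed at most about 7.85 MB/s, against roughly 224 MB/s for wget.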

Digging further and profiling the code with RubyProf revealed that over 11,000 threads were being created during the download. I tracked this down to the net/protocol module, where the Net::BufferedIO#rbuf_fill method uses a timeout block to wrap the @io.sysread(1024) call. This is clearly an extremely bad way to do it, as it creates a new thread for every read call to monitor for the timeout, and it was totally crippling performance.
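The pattern can be illustrated with a simplified stand-in for the read loop. This is a sketch and not the actual net/protocol source: the method name is hypothetical and a StringIO takes the place of the socket. It counts how many timeout blocks are entered for a given payload and chunk size; how expensive each block is depends on the Ruby version in use.

```ruby
require 'stringio'
require 'timeout'

# Simplified stand-in (hypothetical, not the real net/protocol code):
# every chunk_size read is wrapped in its own timeout block.
# Returns the data read and the number of successful timed reads.
def read_all_with_timeout(io, read_timeout, chunk_size = 1024)
  buffer = String.new
  reads = 0
  begin
    loop do
      Timeout.timeout(read_timeout) { buffer << io.sysread(chunk_size) }
      reads += 1
    end
  rescue EOFError
    # sysread signals end of input by raising EOFError
  end
  [buffer, reads]
end

payload = 'x' * (1024 * 1024) # 1MB of dummy data
_, small_reads = read_all_with_timeout(StringIO.new(payload), 5)
_, large_reads = read_all_with_timeout(StringIO.new(payload), 5, 1024 * 1024)
puts "1024-byte reads: #{small_reads}, 1MB reads: #{large_reads}"
```

With 1024-byte reads, every megabyte transferred pays the timeout overhead over a thousand times, which is why both the timeout mechanism and the read size matter.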

After playing with several changes to the core net/protocol.rb including:

  • Replacing timeout( @read_timeout ) { .. } with
  • Increasing the read requests to 1MB
  • Guarding against the use of a str.split!

I managed to get ruby to perform very similarly to wget, downloading my test file in 0.8 seconds.
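As an illustration of the first two ideas in that list, here is a minimal sketch of a read with a timeout that does not start a thread per call and that asks for 1MB at a time. Using IO.select with read_nonblock is an assumption on my part for the purpose of the example and should not be read as the contents of the actual patch; the method name fill_buffer is hypothetical.

```ruby
require 'timeout'

# Hypothetical sketch: IO.select is an assumed replacement for the timeout
# block, not necessarily what the real patch does.
# Appends up to chunk_size bytes from io to rbuf, waiting at most
# read_timeout seconds for data to become available.
def fill_buffer(io, rbuf, read_timeout, chunk_size = 1024 * 1024)
  rbuf << io.read_nonblock(chunk_size)
rescue IO::WaitReadable
  # No data yet: block in select until the io is readable or the timeout
  # expires. No monitoring thread is needed for this.
  raise Timeout::Error, 'read timed out' unless IO.select([io], nil, nil, read_timeout)
  retry
end

reader, writer = IO.pipe
writer.write('hello')
buffer = String.new
fill_buffer(reader, buffer, 1)
puts buffer
```

End of input still surfaces as EOFError from read_nonblock, so a caller can handle it in the same way as with sysread.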

For those using ruby to make http requests of any significant size, I would strongly suggest applying the patch I’ve uploaded to the Ruby bug tracker: Ruby very slow http downloads bug

Written by Dilbert

June 27th, 2009 at 4:24 pm

Posted in Code,Hackery
