
A comparison of JRuby and Ruby MRI with Sequel

Published: April 25, 2010 (over 7 years ago)
Updated: over 2 years ago

For the most part, I stick with Apache 2, Phusion Passenger, and Ruby MRI as my deployment stack. Even so, I regularly venture into other territory to see what’s going on in the wider world of Ruby. This weekend, I turned my eyes to JRuby for the first time in almost 18 months. I wanted to see how JRuby would perform with a database-intensive application.

I have been loading data from one database into another using Pentaho’s Kettle, and since Pentaho’s toolset is built on Java, my work with Kettle got me wondering, “why not JRuby?” I’ve been hearing great things about how JRuby leverages the Java stack to bring all manner of improvements over Ruby MRI, notably scalability through deployment on any of the many Java application servers, and native system threads.

I didn’t pull down the latest and greatest JRuby release for my Mac. I simply installed what was available via MacPorts:

sudo port install apache-ant
sudo port install jruby

I don’t know whether the apache-ant port was actually needed, but I’d read that JRuby depends on Ant to compile its components, so I installed it too. Once I had typed the two port commands and watched the terminal install both, I was done. The above got me jruby 1.4.0 (ruby 1.8.7 patchlevel 174) (2010-04-22 6586) (Java HotSpot(TM) Client VM 1.5.0_22) [i386-java], and I was ready to play.
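Because the same test script gets run under both interpreters, it helps to label each set of numbers with the interpreter that produced it. A minimal sketch, with hypothetical helper names `jruby?` and `interpreter_label`: JRuby reports `RUBY_PLATFORM` as `"java"` and defines `JRUBY_VERSION`, while MRI 1.8.7 has no `RUBY_ENGINE` constant, hence the `defined?` guard.

```ruby
# True when running under JRuby, which reports its platform as "java".
def jruby?
  RUBY_PLATFORM == 'java'
end

# Human-readable label for benchmark output (hypothetical helper).
def interpreter_label
  if jruby?
    "JRuby #{JRUBY_VERSION} (Ruby #{RUBY_VERSION})"
  else
    # RUBY_ENGINE only exists from Ruby 1.9 on, so fall back to "ruby".
    engine = defined?(RUBY_ENGINE) ? RUBY_ENGINE : 'ruby'
    "#{engine} #{RUBY_VERSION} (#{RUBY_PLATFORM})"
  end
end

puts interpreter_label
```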

The JDBC Drivers

What I really wanted to find out was whether JRuby and Sequel were hard to set up and get working with SQL Server 2000 MSDE and MySQL. Both proved fairly simple, although I had to patch Sequel 3.10.0 to get MySQL working over JDBC.

Microsoft SQL Server JDBC Driver

I was impressed with the Microsoft SQL Server JDBC Driver 3.0. It worked as advertised and was straightforward to install, although I had to Google a bit to find out how, since wading through the help docs didn’t answer the question. It turns out that simply putting the jar file in JRuby’s lib folder is all that’s needed. If you installed JRuby via MacPorts as above, that folder on your Mac is:

/opt/local/share/java/jruby/lib

The latest GA version of the SQL Server JDBC 3.0 driver at the time of this writing was 3.0.1301.101.
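With the jar in place, Sequel’s jdbc adapter connects using a JDBC URL passed to `Sequel.connect`. A minimal sketch of building a URL in the form documented for Microsoft’s driver, `jdbc:sqlserver://host:port;property=value;...`. The host, database name, and credentials are placeholders, 1433 is SQL Server’s default TCP port, and `sqlserver_jdbc_url` is a hypothetical helper.

```ruby
# Build a Microsoft SQL Server JDBC URL of the form
#   jdbc:sqlserver://host:port;databaseName=...;user=...;password=...
# All argument values used below are placeholders.
def sqlserver_jdbc_url(host, database, user, password, port = 1433)
  props = {
    'databaseName' => database,
    'user'         => user,
    'password'     => password
  }.map { |key, value| "#{key}=#{value}" }.join(';')
  "jdbc:sqlserver://#{host}:#{port};#{props}"
end

# prints jdbc:sqlserver://localhost:1433;databaseName=warehouse;user=sa;password=secret
puts sqlserver_jdbc_url('localhost', 'warehouse', 'sa', 'secret')
```

The helper does no escaping, so a value containing `;` would need extra handling per the driver’s documentation.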

MySQL JDBC Driver

MySQL’s JDBC driver was equally easy: download it and copy the jar to the JRuby lib folder referenced above. The latest GA driver version at the time of this writing was 5.1.12. The one change I had to make was to Sequel 3.10.0: adding the Statement.RETURN_GENERATED_KEYS option to the executeUpdate call in Sequel’s adapters/jdbc.rb. A thread on the sequel-talk mailing list covers this.
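For context, the patch comes down to a JDBC detail: `Statement#executeUpdate(String, int)` takes a flag telling the driver to make auto-generated keys (such as a new row’s AUTO_INCREMENT id) available through `getGeneratedKeys`. The sketch below shows that pattern as it might look from JRuby. It is not Sequel’s code or the patch from the sequel-talk thread, and `insert_returning_key` is a hypothetical name. Outside JRuby it falls back to the constant’s numeric value so the logic can be tried with a stand-in connection object.

```ruby
# Illustrates the JDBC pattern behind the patch, not Sequel's actual code.
if RUBY_PLATFORM == 'java'
  require 'java'
  RETURN_GENERATED_KEYS = java.sql.Statement::RETURN_GENERATED_KEYS
else
  # Outside JRuby, use the constant's documented value (1) so the logic
  # can be exercised with a stand-in connection object.
  RETURN_GENERATED_KEYS = 1
end

# Run an INSERT and return the first auto-generated key, or nil.
# conn is a java.sql.Connection under JRuby (hypothetical helper).
def insert_returning_key(conn, sql)
  stmt = conn.createStatement
  begin
    # Without this flag, a driver may refuse to hand back generated keys.
    stmt.executeUpdate(sql, RETURN_GENERATED_KEYS)
    keys = stmt.getGeneratedKeys
    keys.next ? keys.getLong(1) : nil
  ensure
    stmt.close
  end
end
```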

The Script

To find out how JRuby and Ruby MRI stack up, I wrote a script, available on github.com, that generates 20,000 rows of fake data of the kind you might find in an invoice/order table on an e-commerce site, and then clones that data to another table. Both steps are fairly I/O-heavy, and the JRuby page on benchmarking warns that JRuby will be a bit slower than MRI at this kind of work. The test results tend to bear this out. In the real world, I do this sort of cloning to copy and consolidate data from various production transactional databases into a data warehouse.
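The script itself isn’t reproduced in this post. As a rough sketch of the data-generation step, assuming a simple invoice/order schema (the column names, value ranges, and the `fake_order`/`fake_orders` helpers are all hypothetical), rows might be generated like this. It targets a modern Ruby: the `Random` class and range arguments to `rand` are not available in Ruby 1.8.7.

```ruby
require 'date'

# One hypothetical invoice/order row; the real script's schema may differ.
def fake_order(id, rng)
  quantity   = rng.rand(1..10)
  unit_price = rng.rand(100..10_000) / 100.0 # 1.00 to 100.00
  {
    :id         => id,
    :invoice_no => format('INV-%06d', id),
    :order_date => Date.new(2010, 1, 1) + rng.rand(0..114),
    :quantity   => quantity,
    :unit_price => unit_price,
    :total      => (quantity * unit_price).round(2)
  }
end

# A seeded generator makes every benchmark run produce identical data.
def fake_orders(count, seed = 2010)
  rng = Random.new(seed)
  (1..count).map { |id| fake_order(id, rng) }
end
```

Seeding keeps runs under different interpreters comparable. With Sequel, rows like these could be written with `DB[:sample_data].multi_insert(rows)`; the post doesn’t say whether its script inserted rows one at a time or in batches.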

The Results

Running the script under both interpreters, I found JRuby generally slower than Ruby MRI, but still roughly on par, with the usual single-threaded approach most Ruby developers take: JRuby in server mode was actually faster at creating the 20,000 records, but both JRuby modes were slower than MRI at cloning them. However, JRuby showed a marked improvement when I took the time to craft a threaded approach to the problem, while Ruby MRI took a performance hit with threading. Unfortunately, boosting the thread count from 4 to 8 did almost nothing, so there is a practical limit to how many threads can be put to use in a scenario like this. During the test runs, I noticed that CPU, disk, and network I/O never quite maxed out, although JRuby in server mode seemed to push things the hardest, with CPU usage hovering around 146% (out of a maximum of 200% on my Intel Core 2 Duo).
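The threaded approach isn’t shown in the post. A minimal sketch of one way to do it, assuming the work is split by id into one contiguous slice per thread: `id_ranges` and `threaded_clone` are hypothetical names, and the in-memory arrays stand in for the sample_data and cloned_data tables. In a real Sequel version, each thread would read and insert its slice through its own connection from Sequel’s pool (sized with the `:max_connections` option to `Sequel.connect`).

```ruby
require 'thread'

# Split ids 1..total into `slices` contiguous, near-equal ranges.
def id_ranges(total, slices)
  size = (total + slices - 1) / slices # ceiling division
  ranges = (0...slices).map do |i|
    first = i * size + 1
    last  = [first + size - 1, total].min
    first..last
  end
  ranges.reject { |r| r.first > r.last } # drop empty trailing slices
end

# Stand-in for cloning sample_data into cloned_data: each thread copies one
# id slice of `source` (row hashes ordered by id) into `dest`.
def threaded_clone(source, dest, thread_count = 4)
  lock = Mutex.new
  workers = id_ranges(source.size, thread_count).map do |range|
    Thread.new(range) do |r|
      rows = source[(r.first - 1)..(r.last - 1)] # ids are 1-based
      lock.synchronize { dest.concat(rows) }      # guard the shared array
    end
  end
  workers.each(&:join)
  dest
end
```

On the interpreter side, MRI 1.8 runs Ruby threads as green threads and later MRI versions hold a global interpreter lock, so MRI never runs Ruby code on both cores at once, whereas JRuby maps each Ruby thread to a native JVM thread. That difference is consistent with the threaded results described above, though the benchmark doesn’t isolate it.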

None of these numbers inspire me to use Ruby or JRuby for any heavy lifting any time soon. Pentaho’s Kettle may sway me toward a Java solution, as it can scale a bit further, given appropriate resources, than my current solution: a compiled Delphi executable running on Windows XP. The Delphi-based solution can churn through about 2 million rows of data in 15 minutes. These Ruby solutions, by comparison, would take roughly 8 to 14 hours to churn through the same 2 million rows.
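Put as throughput, those figures work out to about 2,222 rows per second for the Delphi executable versus roughly 40 to 69 rows per second for Ruby, a gap of 32x to 56x. A small sketch of the arithmetic (`rows_per_second` and the constant names are illustrative only):

```ruby
# Throughput implied by the figures quoted above for 2 million rows.
ROWS = 2_000_000

def rows_per_second(rows, seconds)
  rows / seconds.to_f
end

DELPHI_RPS   = rows_per_second(ROWS, 15 * 60)   # 15 minutes
RUBY_8H_RPS  = rows_per_second(ROWS, 8 * 3600)  # 8 hours
RUBY_14H_RPS = rows_per_second(ROWS, 14 * 3600) # 14 hours

printf("Delphi: %.0f rows/s\n", DELPHI_RPS)
printf("Ruby:   %.0f to %.0f rows/s\n", RUBY_14H_RPS, RUBY_8H_RPS)
printf("Delphi is %.0fx to %.0fx faster\n",
       DELPHI_RPS / RUBY_8H_RPS, DELPHI_RPS / RUBY_14H_RPS)
```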

Elapsed (real) time in seconds, 20,000 rows:

Test name              JRuby client   JRuby server   Ruby MRI
Create 20k records          111.085         86.989    105.050
Clone with 1 thread          63.268         55.652     47.930
Clone with 4 threads         38.000         31.070     57.220

The raw output of each run follows.
>> jruby -S test.rb 
      user     system      total        real
populate sample_data table: creating the sample_data table...
populating sample_data with 20000 rows
sample data:   100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:01:50
111.085000   0.000000 111.085000 (111.085000)

clone sample_data to cloned_data: creating the cloned_data table...
cloning 20000 records for sample_data
sample_data:   100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:01:03
 63.268000   0.000000  63.268000 ( 63.267000)

threaded clone: cloning 20000 records for sample_data
sample_data:   100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:00:37
 38.000000   0.000000  38.000000 ( 38.000000)
>> jruby -S test.rb --server
      user     system      total        real
populate sample_data table: creating the sample_data table...
populating sample_data with 20000 rows
sample data:   100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:01:26
 86.989000   0.000000  86.989000 ( 86.989000)

clone sample_data to cloned_data: creating the cloned_data table...
cloning 20000 records for sample_data
sample_data:   100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:00:55
 55.652000   0.000000  55.652000 ( 55.652000)

threaded clone: cloning 20000 records for sample_data
sample_data:   100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:00:30
 31.070000   0.000000  31.070000 ( 31.070000)
>> ruby test.rb 
      user     system      total        real
populate sample_data table: creating the sample_data table...
populating sample_data with 20000 rows
sample data:   100% |oooooooooooooooooooooooooooooooooooooooo| Time: 00:01:45
 85.370000   4.460000  89.830000 (105.049892)

clone sample_data to cloned_data: creating the cloned_data table...
cloning 20000 records for sample_data
sample_data:   100% |oooooooooooooooooooooooooooooooooooooooo| Time: 00:00:47
 31.700000   3.500000  35.200000 ( 47.930685)

threaded clone: cloning 20000 records for sample_data
sample_data:   100% |oooooooooooooooooooooooooooooooooooooooo| Time: 00:00:56
 39.450000   3.760000  43.210000 ( 57.226705)
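The threaded rows in the results table are easier to compare as speedups. A small sketch that recomputes them from the table values (the hash layout and the `threading_speedup` name are illustrative only); a ratio above 1.0 means the 4-thread clone beat the single-threaded one. By this measure, threading made the clone roughly 1.7x faster on JRuby client, 1.8x on JRuby server, and about 0.84x on MRI, i.e. slower.

```ruby
# Elapsed (real) seconds from the results table, 20,000 rows per run.
CLONE_TIMES = {
  'JRuby client' => { :single => 63.268, :threaded => 38.000 },
  'JRuby server' => { :single => 55.652, :threaded => 31.070 },
  'Ruby MRI'     => { :single => 47.930, :threaded => 57.220 }
}
ROW_COUNT = 20_000

# Speedup > 1.0 means the 4-thread clone finished faster.
def threading_speedup(times)
  times[:single] / times[:threaded]
end

CLONE_TIMES.each do |name, times|
  printf("%-13s speedup %.2fx, threaded %.0f rows/s\n",
         name, threading_speedup(times), ROW_COUNT / times[:threaded])
end
```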