A comparison of JRuby vs. Ruby MRI with Sequel
Published: April 25, 2010 (almost 9 years ago)
Updated: almost 4 years ago
For the most part, I stick with Apache 2, Phusion Passenger, and Ruby MRI as my deployment stack. Even so, I regularly foray into other territories to see what’s going on in the wide world of Ruby. This weekend, I turned my eyes to JRuby again for the first time in almost 18 months. I wanted to explore how JRuby would do performance-wise with a database-intensive application.
I have been doing some data loading from one database into another using Pentaho’s Kettle and that got me curious about JRuby since Pentaho’s toolset is built upon Java. Quite simply, my work with Kettle got me wondering, “why not JRuby?” I’ve been hearing some great things about JRuby and how it greatly leverages the Java stack to bring you all manner of improvements over Ruby MRI, notably scalability on any one of the many Java deployment container servers and native-system threading capabilities.
I didn’t pull down the latest and greatest JRuby libraries to my Mac. I simply got what I could get via MacPorts:
sudo port install apache-ant
sudo port install jruby
I don’t know whether the apache-ant port was actually needed, but I’d read that JRuby depends on ant to compile its components. As fast as I could type the two port commands and watch the terminal session install both, I was done. The above got me jruby 1.4.0 (ruby 1.8.7 patchlevel 174) (2010-04-22 6586) (Java HotSpot(TM) Client VM 1.5.0_22) [i386-java], and I was ready to play.
The JDBC Drivers
What I was really curious to find out was whether JRuby and Sequel were all that hard to set up and get working with SQL Server 2000 MSDE and MySQL. Both proved fairly simple to get working, although I had to patch Sequel 3.10.0 to get MySQL working with JDBC.
Microsoft SQL Server JDBC Driver
I was impressed with the Microsoft SQL Server JDBC Driver 3.0. It worked as advertised and was pretty straightforward to install, although I had to Google a bit to find out how, since wading through the help docs got me nowhere. It turned out that simply putting the jar file in JRuby’s lib folder was all that was needed. If you use port to install JRuby as I did above, then that folder on your Mac is:
The latest GA version of the SQL Server JDBC 3.0 driver at the time of this writing was version 3.0.1301.101.
MySQL JDBC Driver
MySQL’s JDBC driver was equally easy: download it and copy the jar to the JRuby lib folder referenced above. The latest GA driver version at the time of this writing was 5.1.12. One small change I had to make to Sequel 3.10.0 was to add the Statement.RETURN_GENERATED_KEYS option to the executeUpdate call in Sequel’s adapters/jdbc.rb. The following thread on sequel-talk covers this.
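With both jars in place, the same script can pick a connection string depending on which interpreter is running it. The following is a minimal sketch, not the code from my test script: the helper names `jruby?` and `connection_url` are hypothetical, the host, database, and credentials are placeholders, and it assumes Sequel accepts `jdbc:mysql://` and `jdbc:sqlserver://` URLs under JRuby and a `mysql://` URL under MRI.

```ruby
# Hypothetical helper: true when running under JRuby, whose
# RUBY_PLATFORM contains "java" (see the [i386-java] version string).
def jruby?
  !!(RUBY_PLATFORM =~ /java/)
end

# Hypothetical helper: build a connection URL for the given database.
# All defaults are placeholders, not values from the real setup.
def connection_url(db, opts = {})
  host = opts[:host] || 'localhost'
  name = opts[:database] || 'test'
  user = opts[:user] || 'user'
  pass = opts[:password] || 'secret'
  on_jruby = opts.key?(:jruby) ? opts[:jruby] : jruby?

  case db
  when :mysql
    if on_jruby
      "jdbc:mysql://#{host}/#{name}?user=#{user}&password=#{pass}"
    else
      "mysql://#{user}:#{pass}@#{host}/#{name}"
    end
  when :sqlserver
    raise ArgumentError, 'this sketch only covers SQL Server via JDBC' unless on_jruby
    "jdbc:sqlserver://#{host};databaseName=#{name};user=#{user};password=#{pass}"
  else
    raise ArgumentError, "unknown database: #{db.inspect}"
  end
end

puts connection_url(:mysql)
# With the sequel gem installed, the URL would be handed to Sequel.connect:
#   DB = Sequel.connect(connection_url(:mysql))
```

The `:jruby` option exists only so both branches can be exercised from either interpreter.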
To find out how JRuby and Ruby MRI stacked up, I wrote a script, available on github.com, that generates 20,000 rows of fake data typical of what you’d find in an invoice/order table on an e-commerce site. I then clone this data to another table. Both of these actions are fairly heavy I/O operations, and the JRuby page on benchmarking warns that this kind of work will be a bit slower than on MRI. The test results tend to bear this out. The real-world scenario where I do this sort of cloning is copying and consolidating data from various production transactional databases into a data warehouse.
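To give a feel for the shape of the test, here is a minimal sketch of generating fake invoice-like rows and timing the step with Ruby's standard `Benchmark` module. It is not the script from github: the column names, value ranges, and the `fake_row` and `generate_rows` helpers are assumptions made up for illustration, and the rows stay in memory rather than being inserted into a table.

```ruby
require 'benchmark'

# Hypothetical invoice-like row; the column names are illustrative only.
def fake_row(id, rng)
  {
    :id          => id,
    :customer_id => rng.rand(5_000) + 1,
    # A date within one year of 2010-01-01 00:00:00 UTC.
    :order_date  => Time.at(1_262_304_000 + rng.rand(86_400 * 365)).utc.strftime('%Y-%m-%d'),
    :quantity    => rng.rand(10) + 1,
    :unit_price  => (rng.rand(10_000) + 100) / 100.0
  }
end

# Build `count` rows from a seeded generator so runs are repeatable.
def generate_rows(count, seed = 42)
  rng = Random.new(seed)
  (1..count).map { |id| fake_row(id, rng) }
end

rows = nil
# Benchmark.bm prints the user/system/total/real columns for each report.
Benchmark.bm(28) do |bm|
  bm.report('generate 20000 fake rows:') { rows = generate_rows(20_000) }
end
puts "generated #{rows.size} rows"
```

In a real run, the body of the `report` block would insert each row into the database, which is where nearly all of the time goes.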
Running the script under both Ruby MRI and JRuby, I found JRuby generally slower than Ruby MRI, though still roughly on par, with the usual single-threaded approach to writing Ruby code that most developers use. However, JRuby showed marked improvement once I took the time to craft a threaded approach to the problem, while Ruby MRI took a performance hit with threads. Unfortunately, boosting the thread count from 4 to 8 did almost nothing, so there is a practical limit to how many threads can be put to use in a scenario like this. I did notice during the test runs that my CPU, disk, and network I/O never quite maxed out, although JRuby in server mode seemed to push things the hardest, with CPU hovering at about 146% (200% max on an Intel Core 2 Duo).
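The threaded approach can be sketched roughly as follows: split the source rows into batches, push the batches onto a queue, and let a fixed number of worker threads drain it. This is an illustration under stated assumptions rather than the code from my script: plain arrays stand in for the two database tables, and `threaded_clone` and the batch size of 500 are hypothetical.

```ruby
require 'thread'

# Clone rows from `source` using `thread_count` worker threads.
# Plain arrays stand in for the source and destination tables.
def threaded_clone(source, thread_count, batch_size = 500)
  queue = Queue.new
  source.each_slice(batch_size) { |batch| queue << batch }
  thread_count.times { queue << :done } # one stop marker per worker

  destination = []
  lock = Mutex.new

  workers = (1..thread_count).map do
    Thread.new do
      while (batch = queue.pop) != :done
        copied = batch.map { |row| row.dup } # stand-in for the INSERTs
        lock.synchronize { destination.concat(copied) }
      end
    end
  end
  workers.each { |t| t.join }
  destination
end

source = (1..20_000).map { |i| { :id => i, :amount => i * 1.5 } }
cloned = threaded_clone(source, 4)
puts "cloned #{cloned.size} of #{source.size} rows"
```

Against a real database, each worker would do its own reads and inserts instead of copying hashes, so how much the threads help depends on how well the interpreter and driver overlap that I/O.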
None of these numbers inspire me to use Ruby or JRuby for any heavy lifting any time soon. Pentaho’s Kettle may sway me toward a Java solution, as it can scale a bit further with appropriate resources than my current solution, a Delphi-compiled executable running on Windows XP. The Delphi-based solution can churn through about 2 million rows of data in 15 minutes. The Ruby solutions, by comparison, would take roughly 8 to 14 hours to churn through the same 2 million rows of data.
| Test Name | JRuby Client (s) | JRuby Server (s) | Ruby MRI (s) |
|---|---|---|---|
| Create 20k Records | 111.085 | 86.989 | 105.050 |
| Clone with 1 Thread | 63.268 | 55.652 | 47.930 |
| Clone with 4 Threads | 38.000 | 31.070 | 57.220 |
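To make the threading effect easier to read, the timings in the table above can be turned into rows per second and a 4-thread versus 1-thread speedup. The sketch below only does arithmetic on the table's figures; the constant and helper names are hypothetical.

```ruby
ROWS = 20_000

# Elapsed seconds copied from the table above.
TIMINGS = {
  'JRuby Client' => { :create => 111.085, :clone_1 => 63.268, :clone_4 => 38.000 },
  'JRuby Server' => { :create => 86.989,  :clone_1 => 55.652, :clone_4 => 31.070 },
  'Ruby MRI'     => { :create => 105.050, :clone_1 => 47.930, :clone_4 => 57.220 }
}

# Throughput for a step that processed all 20,000 rows.
def rows_per_second(seconds)
  ROWS / seconds
end

# Values above 1.0 mean the 4-thread clone was faster than the 1-thread clone.
def thread_speedup(timing)
  timing[:clone_1] / timing[:clone_4]
end

TIMINGS.each do |name, t|
  printf("%-13s %7.1f rows/s with 4 threads, %.2fx vs. 1 thread\n",
         name, rows_per_second(t[:clone_4]), thread_speedup(t))
end
```

By these figures, both JRuby modes come out at roughly 1.7x to 1.8x with four threads, while Ruby MRI lands below 1x, meaning the threaded clone was slower there.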
>> jruby -S test.rb
      user     system      total        real
populate sample_data table:
creating the sample_data table...
populating sample_data with 20000 rows
sample data: 100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:01:50
111.085000 0.000000 111.085000 (111.085000)
clone sample_data to cloned_data:
creating the cloned_data table...
cloning 20000 records for sample_data
sample_data: 100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:01:03
63.268000 0.000000 63.268000 ( 63.267000)
threaded clone:
cloning 20000 records for sample_data
sample_data: 100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:00:37
38.000000 0.000000 38.000000 ( 38.000000)

>> jruby -S test.rb --server
      user     system      total        real
populate sample_data table:
creating the sample_data table...
populating sample_data with 20000 rows
sample data: 100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:01:26
86.989000 0.000000 86.989000 ( 86.989000)
clone sample_data to cloned_data:
creating the cloned_data table...
cloning 20000 records for sample_data
sample_data: 100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:00:55
55.652000 0.000000 55.652000 ( 55.652000)
threaded clone:
cloning 20000 records for sample_data
sample_data: 100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:00:30
31.070000 0.000000 31.070000 ( 31.070000)

>> ruby test.rb
      user     system      total        real
populate sample_data table:
creating the sample_data table...
populating sample_data with 20000 rows
sample data: 100% |oooooooooooooooooooooooooooooooooooooooo| Time: 00:01:45
85.370000 4.460000 89.830000 (105.049892)
clone sample_data to cloned_data:
creating the cloned_data table...
cloning 20000 records for sample_data
sample_data: 100% |oooooooooooooooooooooooooooooooooooooooo| Time: 00:00:47
31.700000 3.500000 35.200000 ( 47.930685)
threaded clone:
cloning 20000 records for sample_data
sample_data: 100% |oooooooooooooooooooooooooooooooooooooooo| Time: 00:00:56
39.450000 3.760000 43.210000 ( 57.226705)