我知道红宝石的“合作”线程使用绿色线程。如何在我的应用程序中创建真正的“操作系统级”线程,以便使用多个cpu内核进行处理?


当前回答

使用drb怎么样?它不是真正的多线程,而是几个进程之间的通信,但你现在可以在1.8中使用它,而且它的阻力相当低。

其他回答

这取决于实现:

MRI没有,YARV更接近。 JRuby和MacRuby有。

Ruby有闭包block, lambdas和Procs。为了充分利用JRuby中的闭包和多核,Java的执行器派上了用场;对于MacRuby,我喜欢GCD的队列。 注意,能够创建真正的“操作系统级”线程并不意味着可以使用多个cpu内核进行并行处理。请看下面的例子。

这是一个简单的Ruby程序的输出,它使用Ruby 2.1.0使用3个线程:

(jalcazar@mac ~)$ ps -M 69877
USER     PID   TT   %CPU STAT PRI     STIME     UTIME COMMAND
jalcazar 69877 s002    0.0 S    31T   0:00.01   0:00.04 /Users/jalcazar/.rvm/rubies/ruby-2.1.0/bin/ruby threads.rb
   69877         0.0 S    31T   0:00.01   0:00.00 
   69877        33.4 S    31T   0:00.01   0:08.73 
   69877        43.1 S    31T   0:00.01   0:08.73 
   69877        22.8 R    31T   0:00.01   0:08.65 

正如您在这里看到的,有四个操作系统线程,但是只有状态为R的线程正在运行。这是由于Ruby线程实现方式的限制。


同样的程序,现在是JRuby。您可以看到三个状态为R的线程,这意味着它们正在并行运行。

(jalcazar@mac ~)$ ps -M 72286
USER     PID   TT   %CPU STAT PRI     STIME     UTIME COMMAND
jalcazar 72286 s002    0.0 S    31T   0:00.01   0:00.01 /Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/bin/java -Djdk.home= -Djruby.home=/Users/jalcazar/.rvm/rubies/jruby-1.7.10 -Djruby.script=jruby -Djruby.shell=/bin/sh -Djffi.boot.library.path=/Users/jalcazar/.rvm/rubies/jruby-1.7.10/lib/jni:/Users/jalcazar/.rvm/rubies/jruby-1.7.10/lib/jni/Darwin -Xss2048k -Dsun.java.command=org.jruby.Main -cp  -Xbootclasspath/a:/Users/jalcazar/.rvm/rubies/jruby-1.7.10/lib/jruby.jar -Xmx1924M -XX:PermSize=992m -Dfile.encoding=UTF-8 org/jruby/Main threads.rb
   72286         0.0 S    31T   0:00.00   0:00.00 
   72286         0.0 S    33T   0:00.00   0:00.00 
   72286         0.0 S    31T   0:00.09   0:02.34 
   72286         7.9 S    31T   0:00.15   0:04.63 
   72286         0.0 S    31T   0:00.00   0:00.00 
   72286         0.0 S    31T   0:00.00   0:00.00 
   72286         0.0 S    31T   0:00.00   0:00.00 
   72286         0.0 S    31T   0:00.04   0:01.68 
   72286         0.0 S    31T   0:00.03   0:01.54 
   72286         0.0 S    31T   0:00.00   0:00.00 
   72286         0.0 S    31T   0:00.01   0:00.01 
   72286         0.0 S    31T   0:00.00   0:00.01 
   72286         0.0 S    31T   0:00.00   0:00.03 
   72286        74.2 R    31T   0:09.21   0:37.73 
   72286        72.4 R    31T   0:09.24   0:37.71 
   72286        74.7 R    31T   0:09.24   0:37.80 

同样的程序,现在有了MacRuby。还有三个线程并行运行。这是因为MacRuby线程是POSIX线程(真正的“操作系统级”线程),没有GVL

(jalcazar@mac ~)$ ps -M 38293
USER     PID   TT   %CPU STAT PRI     STIME     UTIME COMMAND
jalcazar 38293 s002    0.0 R     0T   0:00.02   0:00.10 /Users/jalcazar/.rvm/rubies/macruby-0.12/usr/bin/macruby threads.rb
   38293         0.0 S    33T   0:00.00   0:00.00 
   38293       100.0 R    31T   0:00.04   0:21.92 
   38293       100.0 R    31T   0:00.04   0:21.95 
   38293       100.0 R    31T   0:00.04   0:21.99 

同样的程序,但是现在用的是老式的核磁共振成像。由于这个实现使用绿色线程,所以只显示一个线程

(jalcazar@mac ~)$ ps -M 70032
USER     PID   TT   %CPU STAT PRI     STIME     UTIME COMMAND
jalcazar 70032 s002  100.0 R    31T   0:00.08   0:26.62 /Users/jalcazar/.rvm/rubies/ruby-1.8.7-p374/bin/ruby threads.rb

如果您对Ruby多线程感兴趣,那么您可能会对我的使用fork处理程序调试并行程序的报告感兴趣。 要更全面地了解Ruby内部结构,《Ruby Under a Microscope》是一本不错的读物。 此外,Omniref中的Ruby线程和C中的全局解释器锁在源代码中解释了为什么Ruby线程不能并行运行。

我将让“系统监视器”来回答这个问题。在这两种情况下,我用8个Ruby线程在i7(4个超线程核)机器上运行相同的代码(下面,计算质数)……第一次运行是:

Jruby 1.5.6 (ruby 1.8.7 patchlevel 249) (2014-02-03 6586) (OpenJDK 64-Bit Server VM 1.7.0_75) [amd64-java]

第二种是:

Ruby 2.2.1.p95 (2014-05-08) [x86_64-linux-gnu]

有趣的是,对于JRuby线程,CPU更高,但是对于解释后的Ruby,完成的时间略短。从图中很难看出这一点,但是第二次(解释的Ruby)运行使用了大约1/2的cpu(没有超线程?)

def eratosthenes(n)
  nums = [nil, nil, *2..n]
  (2..Math.sqrt(n)).each do |i|
    (i**2..n).step(i){|m| nums[m] = nil}  if nums[i]
  end
  nums.compact
end

MAX_PRIME=10000000
THREADS=8
threads = []

1.upto(THREADS) do |num|
  puts "Starting thread #{num}"
  threads[num]=Thread.new { eratosthenes MAX_PRIME }
end

1.upto(THREADS) do |num|
    threads[num].join
end

因为不能编辑那个答案,所以在这里添加一个新的回答。

更新(2017-05-08)

这篇文章很旧了,信息也跟不上时代 (2017)胎面,以下是一些补充:

Opal is a Ruby to JavaScript source-to-source compiler. It also has an implementation of the Ruby corelib, It current very active develompent, and exist a great deal of (frontend) framework worked on it. and production ready. Because base on javascript, it not support parallel threads. truffleruby is a high performance implementation of the Ruby programming language. Built on the GraalVM by Oracle Labs,TruffleRuby is a fork of JRuby, combining it with code from the Rubinius project, and also containing code from the standard implementation of Ruby, MRI, still live development, not production ready. This version ruby seem like born for performance, I don't know if support parallel threads, but I think it should.

更新自Jörg 2011年9月的评论

你好像搞混了两个完全不同的东西 Ruby编程语言和特定的线程模型之一 Ruby编程语言的具体实现。在那里 目前大约有11种不同的Ruby实现 编程语言,具有非常不同和独特的线程 模型。

(不幸的是,这11个实现中只有两个是真正的 准备投入生产使用,但要到年底这个数字 (更新:现在是5个:MRI, JRuby, YARV (Ruby 1.9的解释器),Rubinius和IronRuby)。

The first implementation doesn't actually have a name, which makes it quite awkward to refer to it and is really annoying and confusing. It is most often referred to as "Ruby", which is even more annoying and confusing than having no name, because it leads to endless confusion between the features of the Ruby Programming Language and a particular Ruby Implementation. It is also sometimes called "MRI" (for "Matz's Ruby Implementation"), CRuby or MatzRuby. MRI implements Ruby Threads as Green Threads within its interpreter. Unfortunately, it doesn't allow those threads to be scheduled in parallel, they can only run one thread at a time. However, any number of C Threads (POSIX Threads etc.) can run in parallel to the Ruby Thread, so external C Libraries, or MRI C Extensions that create threads of their own can still run in parallel. The second implementation is YARV (short for "Yet Another Ruby VM"). YARV implements Ruby Threads as POSIX or Windows NT Threads, however, it uses a Global Interpreter Lock (GIL) to ensure that only one Ruby Thread can actually be scheduled at any one time. Like MRI, C Threads can actually run parallel to Ruby Threads. In the future, it is possible, that the GIL might get broken down into more fine-grained locks, thus allowing more and more code to actually run in parallel, but that's so far away, it is not even planned yet. JRuby implements Ruby Threads as Native Threads, where "Native Threads" in case of the JVM obviously means "JVM Threads". JRuby imposes no additional locking on them. So, whether those threads can actually run in parallel depends on the JVM: some JVMs implement JVM Threads as OS Threads and some as Green Threads. (The mainstream JVMs from Sun/Oracle use exclusively OS threads since JDK 1.3) XRuby also implements Ruby Threads as JVM Threads. Update: XRuby is dead. IronRuby implements Ruby Threads as Native Threads, where "Native Threads" in case of the CLR obviously means "CLR Threads". IronRuby imposes no additional locking on them, so, they should run in parallel, as long as your CLR supports that. Ruby.NET also implements Ruby Threads as CLR Threads. Update: Ruby.NET is dead. Rubinius implements Ruby Threads as Green Threads within its Virtual Machine. More precisely: the Rubinius VM exports a very lightweight, very flexible concurrency/parallelism/non-local control-flow construct, called a "Task", and all other concurrency constructs (Threads in this discussion, but also Continuations, Actors and other stuff) are implemented in pure Ruby, using Tasks. Rubinius can not (currently) schedule Threads in parallel, however, adding that isn't too much of a problem: Rubinius can already run several VM instances in several POSIX Threads in parallel, within one Rubinius process. Since Threads are actually implemented in Ruby, they can, like any other Ruby object, be serialized and sent to a different VM in a different POSIX Thread. (That's the same model the BEAM Erlang VM uses for SMP concurrency. It is already implemented for Rubinius Actors.) Update: The information about Rubinius in this answer is about the Shotgun VM, which doesn't exist anymore. The "new" C++ VM does not use green threads scheduled across multiple VMs (i.e. Erlang/BEAM style), it uses a more traditional single VM with multiple native OS threads model, just like the one employed by, say, the CLR, Mono, and pretty much every JVM. MacRuby started out as a port of YARV on top of the Objective-C Runtime and CoreFoundation and Cocoa Frameworks. It has now significantly diverged from YARV, but AFAIK it currently still shares the same Threading Model with YARV. Update: MacRuby depends on apples garbage collector which is declared deprecated and will be removed in later versions of MacOSX, MacRuby is undead. Cardinal is a Ruby Implementation for the Parrot Virtual Machine. It doesn't implement threads yet, however, when it does, it will probably implement them as Parrot Threads. Update: Cardinal seems very inactive/dead. MagLev is a Ruby Implementation for the GemStone/S Smalltalk VM. I have no information what threading model GemStone/S uses, what threading model MagLev uses or even if threads are even implemented yet (probably not). HotRuby is not a full Ruby Implementation of its own. It is an implementation of a YARV bytecode VM in JavaScript. HotRuby doesn't support threads (yet?) and when it does, they won't be able to run in parallel, because JavaScript has no support for true parallelism. There is an ActionScript version of HotRuby, however, and ActionScript might actually support parallelism. Update: HotRuby is dead.

不幸的是,这11个Ruby实现中只有两个是正确的 实际生产就绪:MRI和JRuby。

因此,如果您想要真正的并行线程,JRuby目前是您的首选 唯一的选择——并不是说这是一个坏的选择:JRuby实际上更快 比核磁共振成像更稳定。

否则,“经典的”Ruby解决方案就是使用进程 而不是并行的线程。Ruby核心库 包含Process模块和Process.fork 方法,这使得分叉另一个Ruby非常容易 的过程。此外,Ruby标准库包含 分布式Ruby (dRuby / dRb)库,它允许Ruby 代码应该简单地分布在多个进程中,而不是 只能在同一台机器上,但也可以跨网络。

如果您确实需要Ruby在产品级系统中的并行性(您不能使用beta),那么进程可能是更好的选择。 但是,首先在JRuby下尝试线程绝对是值得的。

另外,如果您对Ruby下线程的未来感兴趣,您可能会发现本文很有用。