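orionsnow posted on 2010-11-27 14:05

On the running speed of R

There is a commercial product that wraps R and adds some parallel-computing components.

It claims to be much faster than base R (especially when multiple CPUs are used).

It also publishes a speed comparison, called R benchmark 2.5:

http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php

I downloaded the code from that page and ran it myself. This 64-bit software turned out to be about as fast as 32-bit R 2.8.1. It may well be faster than R 2.11, but that run has not finished yet.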


Test system: Intel i5 M540 (4 cores) @ 2.53 GHz, 4 GB memory, Win7 64-bit

R 2.8.1
R 2.11.1
Revolution R 4.0

Under R 2.8.1:

R Benchmark 2.5
   ===============
Number of times each test is run__________________________:3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):0.806666666661234
2400x2400 normal distributed random matrix ^1000____ (sec):0.889999999999418
Sorting of 7,000,000 random values__________________ (sec):1.31000000000252
2800x2800 cross-product matrix (b = a' * a)_________ (sec):37.1633333333302
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):17.5633333333341
                      --------------------------------------------
               Trimmed geom. mean (2 extremes eliminated):2.73583193927614

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):1.13333333333624
Eigenvalues of a 640x640 random matrix______________ (sec):1.76999999999922
Determinant of a 2500x2500 random matrix____________ (sec):7.88666666667268
Cholesky decomposition of a 3000x3000 matrix________ (sec):5.81333333332926
Inverse of a 1600x1600 random matrix________________ (sec):6.00000000000485
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):3.95230010779117

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):0.966666666664726
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):0.553333333339348
Grand common divisors of 400,000 pairs (recursion)__ (sec):1.38333333333139
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):1.07000000000213
Escoufier's method on a 45x45 matrix (mixed)________ (sec):0.669999999998254
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):0.884935819802985


Total time for all 15 tests_________________________ (sec):84.9800000000056
Overall mean (sum of I, II and III trimmed means/3)_ (sec):2.12300182763157
                      --- End of test ---
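
The "Trimmed geom. mean" lines can be reproduced by hand. A minimal sketch in R, assuming the benchmark drops the fastest and the slowest of the five timings in a section and takes the geometric mean of the remaining three. The helper name `trimmed_geo_mean` is made up for illustration and is not taken from the benchmark script. Under the same assumption, the "Overall mean" reported above matches the geometric mean of the three section means rather than their sum divided by 3.

```r
# Hypothetical helper (not from the benchmark script): drop the smallest and
# largest timing, then take the geometric mean of what is left.
trimmed_geo_mean <- function(x) {
  kept <- sort(x)[2:(length(x) - 1)]
  exp(mean(log(kept)))
}

# Timings (sec) copied from the R 2.8.1 report above.
sec_I   <- c(0.806666666661234, 0.889999999999418, 1.31000000000252,
             37.1633333333302, 17.5633333333341)
sec_II  <- c(1.13333333333624, 1.76999999999922, 7.88666666667268,
             5.81333333332926, 6.00000000000485)
sec_III <- c(0.966666666664726, 0.553333333339348, 1.38333333333139,
             1.07000000000213, 0.669999999998254)

section_means <- c(I   = trimmed_geo_mean(sec_I),
                   II  = trimmed_geo_mean(sec_II),
                   III = trimmed_geo_mean(sec_III))
print(section_means)

# Geometric mean of the three section means.
overall <- exp(mean(log(section_means)))
print(overall)
```

Because the geometric mean is used, one very slow test (such as the 37-second cross-product) is dropped or damped instead of dominating the summary figure.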

Under Revolution R:



   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):1.35666666666665
2400x2400 normal distributed random matrix ^1000____ (sec):5.13999999999995
Sorting of 7,000,000 random values__________________ (sec):1.17999999999991
2800x2800 cross-product matrix (b = a' * a)_________ (sec):3.07000000000001
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):1.67333333333325
                      --------------------------------------------
               Trimmed geom. mean (2 extremes eliminated):1.91013764671915

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):1.13666666666662
Eigenvalues of a 640x640 random matrix______________ (sec):1.29666666666662
Determinant of a 2500x2500 random matrix____________ (sec):1.54333333333337
Cholesky decomposition of a 3000x3000 matrix________ (sec):1.40000000000002
Inverse of a 1600x1600 random matrix________________ (sec):1.44999999999997
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):1.38072798762309

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):6
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):0.879999999999882
Grand common divisors of 400,000 pairs (recursion)__ (sec):1.7366666666666
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):1.70000000000005
Escoufier's method on a 45x45 matrix (mixed)________ (sec):1.20000000000005
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):1.52445828649778


Total time for all 15 tests_________________________ (sec):30.7633333333329
Overall mean (sum of I, II and III trimmed means/3)_ (sec):1.59011833720329
                      --- End of test ---
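
To see where the two runs differ, the per-test timings can be divided against each other. A rough sketch using the numbers printed in the two reports, rounded to three decimals; the vector names and the short test labels are only illustrative. A ratio above 1 means Revolution R was faster on that test, a ratio below 1 means plain R 2.8.1 was faster.

```r
# Per-test timings (sec), rounded from the two reports above.
tests <- c("create/transp", "matrix^1000", "sort", "crossprod", "lin.regr",
           "fft", "eigen", "det", "cholesky", "inverse",
           "fibonacci", "hilbert", "gcd", "toeplitz", "escoufier")
r281 <- c(0.807, 0.890, 1.310, 37.163, 17.563,
          1.133, 1.770, 7.887, 5.813, 6.000,
          0.967, 0.553, 1.383, 1.070, 0.670)
revo <- c(1.357, 5.140, 1.180, 3.070, 1.673,
          1.137, 1.297, 1.543, 1.400, 1.450,
          6.000, 0.880, 1.737, 1.700, 1.200)
names(r281) <- names(revo) <- tests

# Speedup of Revolution R relative to R 2.8.1, largest first.
speedup <- r281 / revo
print(round(sort(speedup, decreasing = TRUE), 2))

# Tests where plain R 2.8.1 was the faster of the two.
print(names(speedup)[speedup < 1])

# Ratio of the reported total times.
print(84.98 / 30.7633)
```

In these numbers the large gains sit in the linear-algebra tests (cross-product, regression, determinant, Cholesky, inverse), while all five tests in section III ran slower under Revolution R.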



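The odd part is the random-matrix test:

set.seed(1)                      # fixed seed so every run uses the same matrix
m <- 10000
n <- 5000
A <- matrix(runif(m * n), m, n)  # m x n matrix of uniform random numbers
system.time(B <- crossprod(A))   # time B = t(A) %*% A

R 2.8.1 took more than 280 seconds, and the machine even froze on the second run.

Revolution R took only 32 seconds.

The page also gives an explanation: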

The table above reflects the elapsed time for this and the other benchmark tests. For the Revolution R benchmarks, the computations were limited to 1 core and 4 cores by calling setMKLthreads(1) and setMKLthreads(4) respectively. Note that Revolution R performs very well even in single-threaded tests: this is a result of the optimized algorithms in the Intel MKL library linked to Revolution R. The slightly greater than linear speedup may be due to the greater total cache available to all CPU cores, or simply better OS CPU scheduling--no attempt was made to pin execution threads to physical cores. Consult Revolution R's documentation to learn how to run benchmarks that use fewer cores than your hardware offers.
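
As a hedged illustration of the quoted setup: a single-threaded run can be requested before timing, and the elapsed time read out of `system.time`. The sketch below assumes `setMKLthreads` is only available inside Revolution R, so the call is guarded and plain R simply skips it. The helper name `time_crossprod` and the small matrix size are choices made here to keep the run short; they are not part of the original benchmark.

```r
# Hypothetical helper: time crossprod() on an m x n uniform random matrix and
# return the elapsed (wall-clock) seconds.
time_crossprod <- function(m, n, threads = NULL) {
  # setMKLthreads() is the function named in the quoted text; it is assumed
  # to be absent in plain R, in which case this step is skipped.
  if (!is.null(threads) && exists("setMKLthreads", mode = "function")) {
    setMKLthreads(threads)
  }
  set.seed(1)
  A <- matrix(runif(m * n), m, n)
  elapsed <- system.time(B <- crossprod(A))[["elapsed"]]
  # crossprod(A) computes t(A) %*% A, so the result must be n x n.
  stopifnot(all(dim(B) == c(n, n)))
  elapsed
}

elapsed_1 <- time_crossprod(1000, 500, threads = 1)
print(elapsed_1)
```

Reading the "elapsed" entry rather than the CPU time matters for multi-threaded runs, since CPU time adds up the work of all threads.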

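zjs571 posted on 2010-11-28 00:31

I also ran the last random-matrix test. 10000x5000 really does take a long time, and I was not willing to wait for it, so I only tried 1000x5000. I ran it in R, Matlab and C#, which took roughly 23 s, 20 s and 18 s respectively. C++ would probably be another 20% faster. None of these runs were parallelized, though. As far as I know, C# with .NET 4 can run loops in parallel (Parallel Class: Provides support for parallel loops and regions.). That would be much faster, and the principle should be the same as in Revolution R.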

orionsnow posted on 2010-11-28 03:05

set.seed(1)
m <- 1000
n <- 5000
A <- matrix(runif(m * n), m, n)
system.time(B <- crossprod(A))

R 2.8.1: 28 s
R 2.12.0: 20 s
Revolution R: 4 s

So with several CPUs, letting Revolution R do the optimization automatically works quite well.

[deleted user] posted on 2010-11-28 16:44

What a complicated question, just passing by...

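orionsnow posted on 2010-12-14 21:28

Last edited by orionsnow on 2010-12-14 21:01

Today I also tested matrix inversion:

set.seed(1)
m <- 1000
n <- 1000
A <- matrix(runif(m * n), m, n)
system.time(B <- solve(A))   # time the inversion of A

1000 x 1000:
R: 2.02 sec, Revolution R: 0.55 sec

10000 x 10000:
R: "kann Vektor der Größe 762.9 MB nicht allozieren" (cannot allocate vector of size 762.9 MB)
Revolution R: 348 sec

I did not try higher dimensions; there is not enough memory for that.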
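
The allocation failure fits a quick size estimate: R stores a numeric matrix as 8-byte doubles, so a 10000 x 10000 matrix alone needs about 762.9 MB (counting 1 MB as 2^20 bytes), which is the size named in the error message. A small sketch; the helper name `matrix_mb` is made up for illustration.

```r
# Hypothetical helper: size in MB (2^20 bytes) of an m x n matrix of doubles.
matrix_mb <- function(m, n) m * n * 8 / 2^20

print(matrix_mb(1000, 1000))    # the 1000-dim inversion test
print(matrix_mb(10000, 10000))  # the 10000-dim inversion test
print(matrix_mb(10000, 5000))   # input A of the earlier crossprod test
```

The input and the inverse each take that much space, and the inversion probably needs working copies on top, so a 4 GB machine is tight for this case and a 32-bit session would likely run out of address space.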

zidragon posted on 2010-12-16 22:35

R is built on Eclipse, isn't it? The baseline speed depends on the platform; the rest comes down to the algorithms.

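orionsnow posted on 2011-1-8 23:27

R is built on Eclipse, isn't it? The baseline speed depends on the platform; the rest comes down to the algorithms.
zidragon posted on 2010-12-16 21:35

I always thought R was written in C or Fortran, because installing R on Linux requires compilers for both of them.

"The baseline speed depends on the platform; the rest comes down to the algorithms": yes, you are right about that.

I ran these tests mainly as a benchmark. Benchmarks carry weight in practical work, and they also make it easier to convince experts from other fields.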

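orionsnow posted on 2011-6-21 14:55

I also ran the last random-matrix test. 10000x5000 really does take a long time, and I was not willing to wait for it. I only tried 1000x5000, in R, M ...
zjs571 posted on 2010-11-28 00:31


R seems to have been updated recently: it can now call an open-source C package that supports multiple CPUs, which makes it almost as fast as running in C.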

树獭宝宝 posted on 2011-6-25 22:04

It's all right, I guess.

orionsnow posted on 2011-12-7 10:59

R is at 2.14 now.

Haven't tested it yet.