benchmarks - Pascal-J/Jfire GitHub Wiki

Matrix multiply (1024x1024) timing comparisons between pure J and Jfire on three backends: AFCPU, OpenCL on the CPU (labeled APU below), and OpenCL on the GPU. The CPU and GPU tested are both part of a 2-year-old AMD A8-5500.
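
For readers less familiar with the pure-J side of the comparison, here is a small-scale sketch of the expression timed in the transcripts below. The names `mp` and `m` are illustrative assumptions, not part of Jfire, and a 2x2 matrix stands in for the 1024x1024 one.

```j
   mp =: +/ . *          NB. matrix product (mp is a hypothetical name)
   ] m =: 2 2 $ i. 5     NB. $ reshapes its right argument; only 0 1 2 3 are needed here
0 1
2 3
   mp~ m                 NB. ~ (reflexive) supplies m as both arguments: m mp m
2  3
6 11
   timespacex 'mp~ m'    NB. elapsed seconds and bytes used; values vary by machine
```

`timespacex` takes the sentence as a string, which is why the benchmark expressions below are quoted.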

  AFCPU: matmulp_base_                                                                                                          
1024 1024                                                                                                                
  timespacex' JR@:((0 0 matmul~)tsfX) AfM matmulp_base_ $ ?. 5$1' NB. float                                              
0.0439747 32512                                                                                                          
0.0517751 32512                                                                                                          
0.177184 5.03478e7                             NB. <-- ROUND TRIP TIME, INCLUDING ARRAY CREATION AND TRANSFER FROM AND TO J
  timespacex'+/ . *~ matmulp_base_ $ i. 5'                                                                               
2.06675 3.35581e7                              NB. pure J 20x+ slower than the AFCPU timings above; ~12x slower than the full round trip

  AFOPENCL(APU): matmulp_base_                                                                                                          
1024 1024                                                                                                                
  timespacex' JR@:((0 0 matmul~)tsfX) AfM matmulp_base_ $ ?. 5$1' NB. float                                              
9.95201e_5 32512                                                                                                         
0.00010016 32512                                                                                                         
1.33115 5.03478e7                              NB. <-- ROUND TRIP TIME, INCLUDING ARRAY CREATION AND TRANSFER FROM AND TO J
  timespacex'+/ . *~ matmulp_base_ $ i. 5'                                                                               
2.07765 3.35581e7                                                                                                        
  ((+/ . *)~ -: [: JR@:(0 0 matmul~)tsfX AfM) matmulp_base_ $ ?. 5$1  NB. TEST MATCHED. timing includes getting back to J
0.469563 3.35863e7                          NB. <-- ROUND TRIP TIME FROM AND TO J (normally should be close)                                                                         
1                                                                                                                        
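
The test above relies on `-:` (match), which returns 1 when both sides agree in shape and values. A minimal sketch of the same kind of check follows, with a hand-written product `altmp` standing in for the Jfire path. `mp`, `altmp`, and `m` are hypothetical names used only for illustration.

```j
   mp    =: +/ . *              NB. built-in matrix product
   altmp =: (+/@:*)"1 _         NB. stand-in: each row of x scales the rows of y, which are then summed
   m =: 3 3 $ i. 5
   (mp~ -: altmp~) m            NB. fork: (m mp m) -: (m altmp m)
1
```

For floating-point arguments, `-:` compares using J's default comparison tolerance rather than exact bit equality.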

  AFOPENCL(GPU): matmulp_base_
1024 1024                                                                                                                
  timespacex' JR@:((0 0 matmul~)tsfX) AfM matmulp_base_ $ ?. 5$1' NB. float                                              
7.616e_5 32512                                 NB. Lazy evaluation means pointer is returned quickly                                                                           
0.00020416 32512                                                                                                         
0.0784996 5.03478e7                            NB. <-- ROUND TRIP TIME, INCLUDING ARRAY CREATION AND TRANSFER FROM AND TO J
  timespacex'+/ . *~ matmulp_base_ $ i. 5'                                                                               
2.01942 3.35581e7                              NB. pure J 250x+ slower than the lazily evaluated GPU timings above; ~26x slower than the full round trip
  ((+/ . *)~ -: [: JR@:(0 0 matmul~)tsfX AfM) matmulp_base_ $ ?. 5$1  NB. TEST MATCHED. timing includes getting back to J
0.0348432 3.35863e7                                                                                                      
1
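
To put the three backends on a common footing, the round-trip times can be divided into the pure-J times from the same sessions. The sketch below only copies figures from the transcripts above; `rt` and `pj` are hypothetical names.

```j
   NB. seconds, copied from the transcripts above
   rt =: 0.177184 1.33115 0.0784996    NB. round trips: AFCPU, AFOPENCL(APU), AFOPENCL(GPU)
   pj =: 2.06675 2.07765 2.01942       NB. pure J, timed in the same three sessions
   pj % rt                             NB. how many times slower pure J is than each round trip
```

This works out to roughly 12x, 1.6x, and 26x respectively, counting array creation and the transfer back to J.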