benchmarks - Pascal-J/Jfire GitHub Wiki
Matrix multiply (1024x1024) comparisons between pure J, AFCPU, OpenCL on the GPU, and OpenCL on the CPU. The CPU and GPU tested are those of a two-year-old AMD A8-5500.
AFCPU:
matmulp_base_
1024 1024
timespacex' JR@:((0 0 matmul~)tsfX) AfM matmulp_base_ $ ?. 5$1' NB. float
0.0439747 32512
0.0517751 32512
0.177184 5.03478e7 NB. <-- round-trip time: array creation plus transfer from and to J
timespacex'+/ . *~ matmulp_base_ $ i. 5'
2.06675 3.35581e7 NB. pure J 20x+ slower (about 47x vs. the first AFCPU timing, about 12x vs. the round trip)
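As a small-scale illustration of the pure-J baseline and of how the slowdown factors follow from the timings listed above, here is a minimal sketch in plain J. The names `m` and `p` are hypothetical and the 3x3 shape is chosen only so the result can be checked by hand; it does not use any Jfire verbs.

```j
NB. Sketch only: m and p are illustrative names, not part of Jfire.
m =: 3 3 $ i. 5          NB. $ recycles 0 1 2 3 4 to fill the 3x3 shape
p =: +/ . *~ m           NB. reflexive matrix product, same as m +/ . * m
p -: m +/ . * m          NB. 1 when both spellings agree

NB. Pure-J time divided by each AFCPU time listed above
NB. (roughly 47, 40 and 12 respectively).
r =: 2.06675 % 0.0439747 0.0517751 0.177184
```

The last line divides one number by a list, so all three ratios come back at once.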
AFOPENCL(APU):
matmulp_base_
1024 1024
timespacex' JR@:((0 0 matmul~)tsfX) AfM matmulp_base_ $ ?. 5$1' NB. float
9.95201e_5 32512
0.00010016 32512
1.33115 5.03478e7 NB. <-- round-trip time: array creation plus transfer from and to J
timespacex'+/ . *~ matmulp_base_ $ i. 5'
2.07765 3.35581e7
((+/ . *)~ -: [: JR@:(0 0 matmul~)tsfX AfM) matmulp_base_ $ ?. 5$1 NB. TEST MATCHED. timing includes getting back to J
0.469563 3.35863e7 NB. <-- ROUND TRIP TIME FROM AND TO J (normally should be close)
1
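The match test above relies on `-:`, which compares floating-point values with J's default comparison tolerance rather than bit for bit. A minimal sketch of that behaviour in plain J follows; the values and the names `near`, `far` and `exact` are illustrative and not taken from the benchmark.

```j
NB. Sketch only: how -: (match) treats nearly equal floats.
near =: 1 -: 1 + 1e_15          NB. 1: difference is within the default tolerance (2^_44)
far =: 1 -: 1 + 1e_10           NB. 0: difference is outside the tolerance
exact =: 1 (-:!.0) 1 + 1e_15    NB. 0: !.0 sets the tolerance to zero
```

A result of 1 from the test therefore means the two products agree to within tolerance, not necessarily that they are identical in every bit.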
AFOPENCL(GPU):
matmulp_base_
1024 1024
timespacex' JR@:((0 0 matmul~)tsfX) AfM matmulp_base_ $ ?. 5$1' NB. float
7.616e_5 32512 NB. Lazy evaluation means pointer is returned quickly
0.00020416 32512
0.0784996 5.03478e7 NB. <-- round-trip time: array creation plus transfer from and to J
timespacex'+/ . *~ matmulp_base_ $ i. 5'
2.01942 3.35581e7 NB. pure J about 26x slower than the 0.0784996 s round trip above
((+/ . *)~ -: [: JR@:(0 0 matmul~)tsfX AfM) matmulp_base_ $ ?. 5$1 NB. TEST MATCHED. timing includes getting back to J
0.0348432 3.35863e7
1
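Single timings such as the ones above vary from run to run (compare the first two AFCPU lines). One way to smooth that out is to average several runs. The sketch below does so for the pure-J product only, using the standard foreigns `6!:2` (time) and `7!:2` (space); the 256x256 size, the repeat count of 5, and the names `a`, `t` and `s` are assumptions made for illustration.

```j
NB. Sketch only: average several runs instead of reading a single timing.
a =: ? 256 256 $ 0               NB. hypothetical 256x256 matrix of random floats
t =: 5 (6!:2) '+/ . *~ a'        NB. mean seconds per run over 5 runs
s =: 7!:2 '+/ . *~ a'            NB. bytes needed to execute the sentence once
ts =: t , s                      NB. time followed by space, as in the listings above
```

The same pattern could be applied to the other sentences timed on this page, although lazily evaluated results would still need to be brought back to J for the timing to cover the full computation.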