Home - Pascal-J/Jfire GitHub Wiki

Performance-wise, Jfire/ArrayFire has some advantages over pure J and a couple of disadvantages:

Advantages:
It is very easy to get data in and out of J.
Round-trip overhead is fairly low: a matrix multiply of a 100x100 array, including the data round trip, runs at equal speed in J (1 thread) and AFCPU (4 cores). For GPU matrix multiply, the breakeven is around a 128x128 array size, but GPU performance ramps up much more quickly than CPU performance on my devices. On a 1024x1024 array, the OpenCL GPU backend is 250x faster than J, and the parallel CPU library is 20x faster than J. This is with a "low-end", 3-year-old integrated graphics chip on an AMD APU.
Lazy evaluation! If you have J code to run that doesn't depend on any ArrayFire computation, it can run concurrently with the ArrayFire computation. For instance, I can make 5k-12k matmul calls (of any size) per second to the CPU, and 9k-13k matmul calls per second to the GPU.
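To compare the figures above against your own machine, a pure-J baseline can be timed without Jfire at all. The following is only a minimal sketch using built-in J primitives; the names mp, n, a, b and t are picked here for illustration and are not part of Jfire.

```j
mp =: +/ . *              NB. J's built-in matrix product
n  =: 100                 NB. matrix dimension; try 128 or 1024 as well
a  =: ? (n,n) $ 0         NB. n-by-n matrix of uniform random floats
b  =: ? (n,n) $ 0
t  =: 10 (6!:2) 'a mp b'  NB. average seconds per multiply over 10 runs
```

Dividing t by the time reported for the same size on an ArrayFire device gives a speedup ratio comparable to the ones quoted above, provided the ArrayFire timing also includes the data round trip.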

Disadvantages:
Low-end GPUs are unlikely to support double-precision floating point. With the float datatype, though, matmul results match J's. Somehow double precision also matches (likely due to J's tolerant comparisons).
There is a JIT compilation step that creates a noticeable delay (close to 1 second on my system) on the initial load of the library and on the first use of any function.
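On the tolerant comparisons mentioned in the first disadvantage: J's default equality treats two numbers as equal when their relative difference is within the comparison tolerance (2^_44 by default). The sketch below shows only that J behavior, in plain J with no Jfire calls; x and y are illustrative names. Whether this is what makes the device results match is a guess, and a check with exact comparison would be needed to confirm it.

```j
x =: 1
y =: 1 + 1e_15    NB. differs from x by a few units in the last place
x = y             NB. tolerant equality (the default): 1
x =!.0 y          NB. exact equality, tolerance set to 0: 0
9!:18 ''          NB. query the current comparison tolerance
```

Comparing device output against J's result with `-:!.0` instead of `-:` would report exact agreement only.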

After loading arrayf_test.ijs, you can benchmark matrix multiplication for any array sizes and devices. Example for the CPU and OpenCL devices:

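NB. create device objects for the CPU and OpenCL backends
P =. ('afcpu';0) conew 'afdevice'
G =. ('afopencl';0) conew 'afdevice'
NB. benchmark matrix multiply at each listed array size on both devices
128 512 1024 100 multimatmulsF P,G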

matrix mult benchmark