mex functions for code speedup - EranOfek/AstroPack GitHub Wiki

mex functions

For code speedup we have developed matlab executable (mex) functions. These functions are written in C++, C and Fortran. These function main goal is to allow for code speedup in critical points.

non-mex changes that requires testing with the pipeline:

celestial.Kepler.kepler_elliptic line 116

Array arithmetic and manipulation

tools.array.mex.countNaN

Counts the number of NaN in a single or double array. For arrays smaller than about a few thousands elements this function is considerably slower compared with sum(isnan(Array),'all'). However, for arrays with >10,000 elements it is about a factor of two faster.

a=rand(1e4,1); a(1000)=NaN;
tic; for I=1:1e5, cc=tools.array.mex.countNaN(a); end, toc
tic; for I=1:1e5, bb=sum(isnan(a)); end, toc

tools.array.mex.countVal

Counts the number of elements equal to some value in a single or double array. For arrays smaller than about a few thousands elements this function is considerably slower compared with sum(Array=Val,'all'). However, for arrays with >10,000 elements it is about a factor of two faster.

a=rand(1e4,1); a(1000)=2;
tic; for I=1:1e5, cc=tools.array.mex.countVal(a,2); end, toc
tic; for I=1:1e5, bb=sum(a==2); end, toc

tools.array.sumInRange

tools.array.sumInRange uses tools.array.mex.sumInRange to sum all values overall dimensions in an array that their value is in some range. I.e.,

F=~isnan(A) & A>0.25 & A<0.75; S1=sum(A(F),'all'); N1=sum(F,'all');

For arrays with more than a few thousands elements, this function is x3 times faster than matlab

A=rand(1700,1700);
tic;for I=1:1:100, [S,N]=tools.array.sumInRange(A,0.25,0.75); end,toc
tic; for I=1:1:100, F=~isnan(A) & A>0.25 & A<0.75; S1=sum(A(F),'all'); N1=sum(F,'all'); end,toc
[N1-N]
[S1-S]

tools.array.sum2InRange

tools.array.sum2InRange uses tools.array.mex.sum2InRange to sum the squares of all values overall dimensions in an array that their value is in some range. I.e.,

F=~isnan(A) & A>0.25 & A<0.75; S2=sum(A(F).^2,'all'); N1=sum(F,'all');

For arrays with more than a few thousand elements, this function is x4 times faster than matlab

A=rand(1700,1700);
tic;for I=1:1:100, [S2,S,N]=tools.array.sum2InRange(A,0.25,0.75); end,toc
tic; for I=1:1:100, F=~isnan(A) & A>0.25 & A<0.75; S1=sum(A(F),'all'); N1=sum(F,'all'); end,toc
[N1-N], [S1-S]
tic; for I=1:1:100, F=~isnan(A) & A>0.25 & A<0.75; S2a=sum(A(F).^2,'all'); N1=sum(F,'all'); end,toc
[N1-N], [S2a-S2]

tools.array.mex.squeezeSumCubeMatNorm

This function performs th operator:

squeeze(sum(WInt.*MatXcen,[1 2],'omitnan')).*Norm

For large arrays it is 10% to 30% faster compared with MATLAB:

MatXcen=rand(15,15);  
Norm=ones(1000,1);
WInt=imUtil.kernel2.gauss(ones(1000,1)+randn(1000,1));
tic;for I=1:10000, a=tools.array.mex.squeezeSumCubeMatNorm(WInt, MatXcen, Norm);end, toc
tic;for I=1:10000, b=squeeze(sum(WInt.*MatXcen,[1 2],'omitnan')).*Norm; end, toc        
max(abs(a-b)) 

tools.array.mex.squeezeSumAmultB_Dim12

tools.array.onesExcept

tools.array.onesCondition

tools.array.bitsetFlag

tools.array.bitand_array

tools.array.bitor_array

tools.array.mex.diluteMatrix

Perform:

Out = In(1:StepSize:end).';

About 30% faster compared to matlab.

tools.array.mex.diluteMatrix_MinMax

Perform:

Out=In(1:StepSize:end); Out=Out(Out>Min & Out<Max); 

but about x2 faster then matlab.

Crops and cut subarrays

imUtil.cut.image2cutouts

tools.array.cropMat

Searche and find

tools.find.mfind_bin (binary search on multiple columns)

tools.find.mex.findEqual

find the indices of values in array equal to some value.

For arrays with more than 10,000 elements this function is x2 faster than find. For smaller array it is slower compared with find.

A=rand(1e6,1);
A(1e5)=2;     
tic;for J=1:1:1e3, IM=find(A==2); end, toc
tic;for J=1:1:1e3, IM=tools.find.mex.findEqual(A,2); end, toc

tools.find.mex.findLargerFirst

tools.find.mex.findSmallerFirst

Stat

tools.math.stat.minmax

tools.math.stat.meanStdSum_2Dcols_IgnoreNaN

tools.math.stat.meanStdSum_AllDim_IgnoreNaN

tools.math.stat.median1

tools.math.stat.squeezeStdCube_Dim12.cpp

tools.math.stat.mex.std_madmean_mex

Std using the mean absolute deviation (mad) function (MEX) This is equivalent to:

1.253.*mean(abs(X - mean(X,Dim, 'omitnan')),Dim, 'omitnan');

For large array adnd Dim=3 this function is x4 times faster than matlab (for other options it may be slower!).

Data = single(randn(1700,1700,20));
tic;for i=1:10, [a,b]=tools.math.stat.mex.std_madmean_mex(Data,3,1);end, toc
tic;for i=1:10, a1=tools.math.stat.std_mad(Data,0,3);b1=mean(Data,3);end,toc
max(abs(a1-a),[],'all')
max(abs(b1-b),[],'all')

fft related

tools.math.fft.mex.unwrap_mex

Phase unwrap (lime matlab unwarp), but x20 times faster.

a=tools.math.fft.mex.unwrap_mex(R);
R=rand(1e3,1e3).*2.*pi;
tic; for I=1:100, a=tools.math.fft.mex.unwrap_mex(R);end, toc
tic; for I=1:100, b=unwrap(R); end, toc
max(abs(a-b),[],'all')

Strings

tools.string.mex.str2uint64

Cells

tools.cell.isempty_cell

tools.cell.isnan_cell

Celestial coordinates

celestial.coo.mex.haversine_ScalarArray

celestial.coo.mex.haversineApprox_ScalarArray

Approximate sphereical_distance between scalar and array. The accuracy is excellent for small angles, and the output is monotonic with distance so safe to use for cases when angular distance smaller than 1000" are needed (e.g., source matching). The error is about 1 mas for a distance of 1000 arcsec. The approximation function is becoming faster compared with celestial.coo.sphere_dist_fast, for arrays with more than 1000 elements.

RA1=1;
Dec1=1;
RA=rand(1e4,1).*2.*pi;
Dec=rand(1e4,1).*pi - pi./2;
tic;for I=1:1000, D1=celestial.coo.sphere_dist_fast(RA1,Dec1,RA,Dec);end, toc               
tic;for I=1:1000, D0=celestial.coo.mex.haversineApprox_ScalarArray(RA1,Dec1,RA,Dec); end,toc

celestial.coo.mex.haversineThresh_ScalarArray

celestial.coo.mex.hadec2azalt

Histograms

tools.hist.histcounts2regular_mex

2D histogram for evenly spaced edges. About x30 times faster than histcounts2

X=rand(1e6,1); Y=rand(1e6,1); E=(0:0.01:1);
tic;for I=1:100, N=tools.hist.histcounts2regular_mex(X,Y,E,E);end,toc 
>> tic;for I=1:100, N1=histcounts2(X,Y,E,E);end,toc                     

tools.hist.mex.histcounts1regular

1D histogram for evenly spaced edges. About x2.6 times faster than matlab.internal.math.histcounts

Bits encoding

tools.bit.bitEncode

Checksums

tools.checksum.mex_xxhash

tools.checksum.mex_xxhashFile

OS

tools.os.mex.is_avx2_supported

tools.os.mex.is_avx512_supported