mex functions for code speedup - EranOfek/AstroPack GitHub Wiki
mex functions
For code speedup we have developed matlab executable (mex) functions. These functions are written in C++, C and Fortran. These function main goal is to allow for code speedup in critical points.
non-mex changes that requires testing with the pipeline:
celestial.Kepler.kepler_elliptic line 116
Array arithmetic and manipulation
tools.array.mex.countNaN
Counts the number of NaN in a single or double array. For arrays smaller than about a few thousands elements this function is considerably slower compared with sum(isnan(Array),'all'). However, for arrays with >10,000 elements it is about a factor of two faster.
a=rand(1e4,1); a(1000)=NaN;
tic; for I=1:1e5, cc=tools.array.mex.countNaN(a); end, toc
tic; for I=1:1e5, bb=sum(isnan(a)); end, toc
tools.array.mex.countVal
Counts the number of elements equal to some value in a single or double array. For arrays smaller than about a few thousands elements this function is considerably slower compared with sum(Array=Val,'all'). However, for arrays with >10,000 elements it is about a factor of two faster.
a=rand(1e4,1); a(1000)=2;
tic; for I=1:1e5, cc=tools.array.mex.countVal(a,2); end, toc
tic; for I=1:1e5, bb=sum(a==2); end, toc
tools.array.sumInRange
tools.array.sumInRange uses tools.array.mex.sumInRange to sum all values overall dimensions in an array that their value is in some range. I.e.,
F=~isnan(A) & A>0.25 & A<0.75; S1=sum(A(F),'all'); N1=sum(F,'all');
For arrays with more than a few thousands elements, this function is x3 times faster than matlab
A=rand(1700,1700);
tic;for I=1:1:100, [S,N]=tools.array.sumInRange(A,0.25,0.75); end,toc
tic; for I=1:1:100, F=~isnan(A) & A>0.25 & A<0.75; S1=sum(A(F),'all'); N1=sum(F,'all'); end,toc
[N1-N]
[S1-S]
tools.array.sum2InRange
tools.array.sum2InRange uses tools.array.mex.sum2InRange to sum the squares of all values overall dimensions in an array that their value is in some range. I.e.,
F=~isnan(A) & A>0.25 & A<0.75; S2=sum(A(F).^2,'all'); N1=sum(F,'all');
For arrays with more than a few thousand elements, this function is x4 times faster than matlab
A=rand(1700,1700);
tic;for I=1:1:100, [S2,S,N]=tools.array.sum2InRange(A,0.25,0.75); end,toc
tic; for I=1:1:100, F=~isnan(A) & A>0.25 & A<0.75; S1=sum(A(F),'all'); N1=sum(F,'all'); end,toc
[N1-N], [S1-S]
tic; for I=1:1:100, F=~isnan(A) & A>0.25 & A<0.75; S2a=sum(A(F).^2,'all'); N1=sum(F,'all'); end,toc
[N1-N], [S2a-S2]
tools.array.mex.squeezeSumCubeMatNorm
This function performs th operator:
squeeze(sum(WInt.*MatXcen,[1 2],'omitnan')).*Norm
For large arrays it is 10% to 30% faster compared with MATLAB:
MatXcen=rand(15,15);
Norm=ones(1000,1);
WInt=imUtil.kernel2.gauss(ones(1000,1)+randn(1000,1));
tic;for I=1:10000, a=tools.array.mex.squeezeSumCubeMatNorm(WInt, MatXcen, Norm);end, toc
tic;for I=1:10000, b=squeeze(sum(WInt.*MatXcen,[1 2],'omitnan')).*Norm; end, toc
max(abs(a-b))
tools.array.mex.squeezeSumAmultB_Dim12
tools.array.onesExcept
tools.array.onesCondition
tools.array.bitsetFlag
tools.array.bitand_array
tools.array.bitor_array
tools.array.mex.diluteMatrix
Perform:
Out = In(1:StepSize:end).';
About 30% faster compared to matlab.
tools.array.mex.diluteMatrix_MinMax
Perform:
Out=In(1:StepSize:end); Out=Out(Out>Min & Out<Max);
but about x2 faster then matlab.
Crops and cut subarrays
imUtil.cut.image2cutouts
tools.array.cropMat
Searche and find
tools.find.mfind_bin (binary search on multiple columns)
tools.find.mex.findEqual
find the indices of values in array equal to some value.
For arrays with more than 10,000 elements this function is x2 faster than find. For smaller array it is slower compared with find.
A=rand(1e6,1);
A(1e5)=2;
tic;for J=1:1:1e3, IM=find(A==2); end, toc
tic;for J=1:1:1e3, IM=tools.find.mex.findEqual(A,2); end, toc
tools.find.mex.findLargerFirst
tools.find.mex.findSmallerFirst
Stat
tools.math.stat.minmax
tools.math.stat.meanStdSum_2Dcols_IgnoreNaN
tools.math.stat.meanStdSum_AllDim_IgnoreNaN
tools.math.stat.median1
tools.math.stat.squeezeStdCube_Dim12.cpp
tools.math.stat.mex.std_madmean_mex
Std using the mean absolute deviation (mad) function (MEX) This is equivalent to:
1.253.*mean(abs(X - mean(X,Dim, 'omitnan')),Dim, 'omitnan');
For large array adnd Dim=3 this function is x4 times faster than matlab (for other options it may be slower!).
Data = single(randn(1700,1700,20));
tic;for i=1:10, [a,b]=tools.math.stat.mex.std_madmean_mex(Data,3,1);end, toc
tic;for i=1:10, a1=tools.math.stat.std_mad(Data,0,3);b1=mean(Data,3);end,toc
max(abs(a1-a),[],'all')
max(abs(b1-b),[],'all')
fft related
tools.math.fft.mex.unwrap_mex
Phase unwrap (lime matlab unwarp), but x20 times faster.
a=tools.math.fft.mex.unwrap_mex(R);
R=rand(1e3,1e3).*2.*pi;
tic; for I=1:100, a=tools.math.fft.mex.unwrap_mex(R);end, toc
tic; for I=1:100, b=unwrap(R); end, toc
max(abs(a-b),[],'all')
Strings
tools.string.mex.str2uint64
Cells
tools.cell.isempty_cell
tools.cell.isnan_cell
Celestial coordinates
celestial.coo.mex.haversine_ScalarArray
celestial.coo.mex.haversineApprox_ScalarArray
Approximate sphereical_distance between scalar and array. The accuracy is excellent for small angles, and the output is monotonic with distance so safe to use for cases when angular distance smaller than 1000" are needed (e.g., source matching). The error is about 1 mas for a distance of 1000 arcsec. The approximation function is becoming faster compared with celestial.coo.sphere_dist_fast, for arrays with more than 1000 elements.
RA1=1;
Dec1=1;
RA=rand(1e4,1).*2.*pi;
Dec=rand(1e4,1).*pi - pi./2;
tic;for I=1:1000, D1=celestial.coo.sphere_dist_fast(RA1,Dec1,RA,Dec);end, toc
tic;for I=1:1000, D0=celestial.coo.mex.haversineApprox_ScalarArray(RA1,Dec1,RA,Dec); end,toc
celestial.coo.mex.haversineThresh_ScalarArray
celestial.coo.mex.hadec2azalt
Histograms
tools.hist.histcounts2regular_mex
2D histogram for evenly spaced edges. About x30 times faster than histcounts2
X=rand(1e6,1); Y=rand(1e6,1); E=(0:0.01:1);
tic;for I=1:100, N=tools.hist.histcounts2regular_mex(X,Y,E,E);end,toc
>> tic;for I=1:100, N1=histcounts2(X,Y,E,E);end,toc
tools.hist.mex.histcounts1regular
1D histogram for evenly spaced edges. About x2.6 times faster than matlab.internal.math.histcounts