Data simulation - kreutz-lab/OmicsData GitHub Wiki

Two data simulation processes are available within the OmicsData environment. The first is based on O’Brien et al. (2018):

[full,data] = SimuDataOBrien(MV,MNAR,nfeat,nsamp);

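Here, MV is the percentage of missing values, MNAR is the percentage of those missing values that are missing not at random (i.e. given relative to the MV, not to the whole data set), and nfeat and nsamp are the numbers of features and samples. The second data simulation is based on Lazar et al. (2016):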

[full,data] = SimuDataLazar(MV,MNAR,nfeat,nsamp);
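
Because MNAR is given relative to MV rather than to the full data set, it can help to work out what a parameter choice implies in absolute counts. The following is only a sketch of that arithmetic, not the code used inside either function. The parameter values are hypothetical, the 0-100 percentage scale is an assumption based on the word "percentage" above, and the rounding is likewise an assumption:

```matlab
% Hypothetical parameter values, chosen only for illustration.
MV    = 30;    % percentage of all values that are missing (assumed 0-100 scale)
MNAR  = 40;    % percentage of the missing values that are MNAR (assumed 0-100 scale)
nfeat = 1000;  % number of features
nsamp = 10;    % number of samples

nTotal = nfeat * nsamp;             % total number of data points: 10000
nMV    = round(MV/100 * nTotal);    % expected number of missing values: 3000
nMNAR  = round(MNAR/100 * nMV);     % expected number of MNAR values: 1200
nOther = nMV - nMNAR;               % remaining (non-MNAR) missing values: 1800

fprintf('total: %d, missing: %d, MNAR: %d, other missing: %d\n', ...
        nTotal, nMV, nMNAR, nOther);

% With OmicsData on the MATLAB path, the same values could then be passed on,
% e.g. (left commented out, since the expected scale should be checked first):
% [full,data] = SimuDataLazar(MV,MNAR,nfeat,nsamp);
```

With these values, 40% MNAR corresponds to 12% of all data points, since it applies only to the 30% that are missing. Whether the functions expect 30 or 0.3 for a 30% rate is not stated here, so check the function help before calling them.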

References

O’Brien, J. J., Gunawardena, H. P., Paulo, J. A., Chen, X., Ibrahim, J. G., Gygi, S. P., and Qaqish, B. F. (2018). The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments. Ann Appl Stat, 12(4), 2075–2095.

Lazar, C., Gatto, L., Ferro, M., Bruley, C., and Burger, T. (2016). Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J Proteome Res, 15, 1116–1125.