How Life Works Annotated publications - mauriceling/mauriceling.github.io GitHub Wiki

[1] Chay, ZE, Lee, CH, Lee, KC, Oon, JSH, Ling, MHT. 2010. Russel and Rao Coefficient is a Suitable Substitute for Dice Coefficient in Studying Restriction Mapped Genetic Distances of Escherichia coli. iConcept Journal of Computational and Mathematical Biology 1:1.

The Dice coefficient (also known as the Nei and Li coefficient) has been commonly used as a measure of genetic similarity from DNA fingerprints. This manuscript examines 19 other coefficients for their suitability. Our results suggest that the Dennis, Fossum, Matching, and Russel and Rao coefficients work as well as or better than Dice. The Dennis, Matching, and Fossum coefficients had the highest discriminatory abilities but are limited by the lack of upper or lower bounds. The Russel and Rao coefficient is highly correlated with the Dice coefficient (r² = 0.998) and has both upper and lower bounds, suggesting that it can substitute for the Dice coefficient in studying genetic distances in E. coli.
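As a minimal sketch, assume each DNA fingerprint is encoded as a presence/absence (1/0) vector over the same set of band positions. Under that assumption, the Dice and Russel and Rao coefficients can be computed from the usual 2×2 match counts: a = bands present in both profiles, b and c = bands unique to one profile, d = bands absent from both. The function names and example profiles are hypothetical. The Nei-Li dissimilarity is taken here as one minus Dice, following the equivalence noted above.

```python
def band_counts(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 band profiles.

    a: band in both, b: only in x, c: only in y, d: absent from both.
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = len(x) - a - b - c
    return a, b, c, d


def dice(x, y):
    """Dice (Nei and Li) similarity: 2a / (2a + b + c); ignores shared absences."""
    a, b, c, _ = band_counts(x, y)
    return 2 * a / (2 * a + b + c)  # ZeroDivisionError if neither profile has a band


def russel_rao(x, y):
    """Russel and Rao similarity: a / n, where n = a + b + c + d."""
    a, _, _, _ = band_counts(x, y)
    return a / len(x)


def nei_li_dissimilarity(x, y):
    """Dissimilarity taken as 1 - Dice."""
    return 1 - dice(x, y)


# Hypothetical fingerprints over six band positions
strain_1 = [1, 1, 0, 1, 0, 0]
strain_2 = [1, 0, 0, 1, 1, 0]
print(dice(strain_1, strain_2), russel_rao(strain_1, strain_2))
```

Russel and Rao divides by all n band positions, so shared absences (d) pull its value down, whereas Dice ignores d. Both coefficients stay within [0, 1].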

[2] Chia, CY, Lim, CWX, Leong, WT, Ling, MHT. 2010. High Expression Stability of Microtubule Affinity Regulating Kinase 3 (MARK3) Makes It a Reliable Reference Gene. IUBMB Life 62(3): 200-203.

Differences in gene expression are characteristic of the functions of different cell types, and genes with low expression variance can be used as standards for quantitative gene expression studies. Microarray technology is used to study global gene expression within a cell and hence represents a suitable source of data to mine for genes with low expression variance. The coefficient of variation (COV) of each gene was determined, and a threshold of COV < 0.1 was used to select stably expressed genes in each dataset. Our results showed that microtubule affinity-regulating kinase 3 (MARK3) has the lowest COV in eight microarray datasets. In addition, the expression of housekeeping genes, which are widely expected to be stably expressed, tends to fluctuate highly under different conditions, marking them as less reliable for use as reference genes.
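The COV selection step can be sketched as follows. The sketch assumes a dictionary that maps each gene to its expression values across arrays, and it defines COV as the sample standard deviation divided by the mean. The gene names and values are made up for illustration, and any normalization the study may have applied is omitted.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def stable_genes(expression, threshold=0.1):
    """Return genes whose COV across arrays is below `threshold`,
    sorted from most to least stable."""
    covs = {gene: coefficient_of_variation(vals) for gene, vals in expression.items()}
    return sorted((g for g, cov in covs.items() if cov < threshold), key=covs.get)


# Hypothetical expression values for three genes across four arrays
expression = {
    "geneA": [10.0, 10.5, 9.8, 10.1],
    "geneB": [5.0, 9.0, 2.0, 7.0],
    "geneC": [20.0, 20.2, 19.9, 20.1],
}
print(stable_genes(expression))
```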

[3] Lee, CH, Oon, JSH, Lee, KC, Lee, CH, Ling, MHT. 2011. Escherichia coli ATCC 8739 Adapts to the Presence of Sodium Chloride, Monosodium Glutamate, and Benzoic Acid after Extended Culture. ISRN Microbiology 2012, Article ID 965356.

We observed the adaptation of E. coli cultured in different concentrations of food additives (sodium chloride, benzoic acid and monosodium glutamate), singly or in combination, over 70 passages. Adaptability over time was estimated by generation time and cell density at stationary phase. Polymerase Chain Reaction (PCR)/Restriction Fragment Length Polymorphism (RFLP) analysis, using 3 primers and 3 restriction endonucleases, was used to characterize adaptation/evolution at the genomic level, and the resulting profiles were compared by the Nei-Li Dissimilarity Index. Our results demonstrated that E. coli in every treatment had adapted over 465 generations. The types of stress were found to differ even when only the concentration of the same additive differed. Genomic analysis by PCR/RFLP suggests that the stress response in E. coli may be similar.
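Generation time is commonly derived from two cell densities measured during exponential growth. The abstract does not say exactly how it was computed, so the sketch below uses the textbook relation g = t / log2(N_t / N_0) with hypothetical densities and timings.

```python
import math


def generations(n0, nt):
    """Number of doublings between cell densities n0 and nt."""
    if nt <= n0:
        raise ValueError("no net growth between the two measurements")
    return math.log2(nt / n0)


def generation_time(n0, nt, elapsed):
    """Mean doubling time, in the units of `elapsed`, assuming exponential growth."""
    return elapsed / generations(n0, nt)


# Hypothetical: 1e6 cells/ml grows to 6.4e7 cells/ml in 300 minutes (64-fold, 6 doublings)
print(generation_time(1e6, 6.4e7, 300))  # minutes per generation
```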

[4] Heng, SSJ, Chan, OYW, Keng, BMH, Ling, MHT. 2011. Glucan biosynthesis protein G (mdoG) is a Suitable Reference Gene in Escherichia coli K-12. ISRN Microbiology 2011, Article ID 469053.

The expression of reference genes used in gene expression studies is assumed to be stable under most circumstances. However, a number of studies have demonstrated that the expression of such genes varies under experimental conditions. In addition, genes that are stably expressed in one organ may not be stably expressed in other organs or other organisms, suggesting the need to identify reference genes for each organ and organism. This study aims to identify stably expressed genes in Escherichia coli. Three microarray datasets from E. coli substrain MG1655 and 1 dataset from W3110 were analysed. The coefficient of variation (CV) of each gene was calculated, and the 10% of genes with the lowest CV among the 4631 genes common to the 3 MG1655 datasets were analysed using NormFinder. Glucan biosynthesis protein G (mdoG), which is involved in cell wall synthesis, displayed the lowest weighted CV and weighted NormFinder Stability Index for the MG1655 datasets, while also being the most stable in the W3110 dataset, suggesting that mdoG is a suitable reference gene for E. coli K-12. Gene Ontology over-representation analysis of the 39 genes suggested an over-representation of cell division, carbohydrate metabolism, and protein synthesis, which is consistent with the short generation time of E. coli.

[5] Too, IHK, Ling, MHT. 2011. Signal Peptidase Complex Subunit 1 (SPCS1) and Hydroxyacyl-CoA Dehydrogenase Beta Subunit (HADHB) are Suitable Reference Genes in Human Lungs. ISRN Bioinformatics 2012, Article ID 790452.

Lung cancer is a common cancer, and expression profiling can provide accurate indications to advance medical intervention. However, this requires the availability of stably expressed genes as references. Recent studies have shown that genes that are stably expressed in one tissue may not be stably expressed in other tissues, suggesting the need to identify stably expressed genes in each tissue for use as reference genes. DNA microarray analysis has been used to identify reference genes with low expression fluctuation. Fourteen datasets with different lung conditions were employed in our study. The coefficient of variation, followed by NormFinder, was used to identify stably expressed genes. Our results showed that classical reference genes such as GAPDH and HPRT1 were highly variable and thus unsuitable as reference genes. SPCS1 and HADHB, which are involved in fundamental biochemical processes, demonstrated high expression stability, suggesting their suitability for human lung cell profiling.

[6] Dundas, JB, Ling, MHT. 2012. Reference Genes for Measuring mRNA Expression. Theory in Biosciences 131: 215-223.

The aim of this review is to find answers to some of the questions surrounding reference genes and their reliability for quantitative experiments. Reference genes are assumed to be expressed at a constant level over a range of conditions, such as temperature. These genes, such as GAPDH and beta-actin, are used extensively in gene expression studies using techniques such as quantitative PCR. Several studies have been carried out to identify reference genes. However, considerable evidence calls the general suitability of these genes into question. Recent studies have shown that different factors, including the environment and methods, play an important role in changing the expression levels of reference genes. Thus, we conclude that no reference gene can be deemed suitable for all experimental conditions. In addition, we believe that every experiment will require the scientific evaluation and selection of the best candidate gene for use as a reference gene in order to obtain reliable scientific results.

[7] Goh, DJW, How, JA, Lim, JZR, Ng, WC, Oon, JSH, Lee, KC, Lee, CH, Ling, MHT. 2012. Gradual and Step-wise Halophilization Enables Escherichia coli ATCC 8739 to Adapt to 11% NaCl. Electronic Physician 4(3): 527-535.

E. coli is a non-halophilic microbe and is used to indicate faecal contamination. Salt (sodium chloride, NaCl) is a common food additive and is used in preservatives to counter microbial growth. Previous studies have shown that pathogenic E. coli has a higher salt tolerance than non-pathogenic E. coli. How E. coli interacts with the salt present in the human diet is under-studied; thus, it is important to investigate this relationship. In this study, we observed the genetic changes and growth kinetics of E. coli ATCC 8739 under 3% to 11% NaCl over 80 passages. Growth kinetics were estimated by generation time, cell density, and the minimum inhibitory concentration (MIC) of NaCl. Our results suggested that E. coli was able to adapt from 1% NaCl to 11% NaCl with an increment of 1% NaCl per month. Our MIC results suggested that E. coli was able to grow at NaCl concentrations of more than 7.5%, with the Area under Curve (AUC) increasing from 5% at passage 44 (cultured in 5% NaCl) to 13% at passage 72 (cultured in 7% NaCl). We conclude that E. coli ATCC 8739 can be adapted to grow in 11% NaCl by incremental adaptation.

[8] Ling, MHT, Rabara, RC, Tripathi, P, Rushton, PJ, Ge, X. 2013. Extending MapMan Ontology to Tobacco for Visualization of Gene Expression. Dataset Papers in Biology 2013, Article ID 706465.

Microarrays are a large-scale expression profiling method that has been used to study the transcriptome of plants under various environmental conditions. However, manual inspection of microarray data is difficult at the genome level because of the large number of genes (normally at least 30000) and the many different processes that occur within any given plant. MapMan software, which was initially developed to visualize microarray data for Arabidopsis, has been adapted to other plant species by mapping other species onto the MapMan ontology. This paper provides a detailed procedure and the relevant computer code to generate a MapMan ontology mapping file for tobacco (Nicotiana tabacum L.) using potato and Arabidopsis as intermediates. The mapping file can be used directly with our custom-made NimbleGen oligoarray, which contains gene sequences from both the tobacco gene space sequence and the tobacco gene index 4 (NTGI4) collection of ESTs. The generated dataset will be informative for scientists working on tobacco as their model plant, as it provides a MapMan ontology mapping file for tobacco and the homology between tobacco coding sequences and those of potato and Arabidopsis; our procedure and code can also be adapted for other plant species whose complete genomes are not yet available.

[9] How, JA, Lim, JZR, Goh, DJW, Ng, WC, Oon, JSH, Lee, KC, Lee, CH, Ling, MHT. 2013. Adaptation of Escherichia coli ATCC 8739 to 11% NaCl. Dataset Papers in Biology 2013, Article ID 219095.

This manuscript describes and provides the raw data used in the halophilization project (Goh et al., 2012).

[10] Low, SXZ, Aw, ZQ, Loo, BZL, Lee, KC, Oon, JSH, Lee, CH, Ling, MHT. 2013. Viability of Escherichia coli ATCC 8739 in Nutrient Broth, Luria-Bertani Broth and Brain Heart Infusion over 11 Weeks. Electronic Physician 5:576-581.

Escherichia coli is a widely studied prokaryotic system. A recent study demonstrated that the reduced growth of E. coli after extended culture in Luria-Bertani broth results from the depletion of fermentable sugars, and that the broth can still sustain extended culture because its amino acids can be utilized as a carbon source. However, this has not been demonstrated in other media. This study aimed to determine the growth and viability of E. coli ATCC 8739 in 3 different media, Nutrient Broth (NB), Brain Heart Infusion (BHI), and Luria-Bertani Broth (LB), over 11 weeks. Growth of E. coli ATCC 8739 was determined by optical density, and viability by serial dilution/spread-plate enumeration. After 11 weeks, the media were exhausted by repeated culture. Glucose was added to the exhausted media to determine whether glucose is the growth-limiting factor. Our results showed that cell density in all 3 media increased from the inoculation density of 2.67 × 10^5 cells/ml to about 1 × 10^9 cells/ml by the end of week 1, peaked at about 1 × 10^13 cells/ml at week 4, and declined to about 5 × 10^7 cells/ml at week 7. Cell density is highly correlated with genomic DNA content (r² = 0.93) but poorly correlated with optical density (r² < 0.2). Our results also showed that the spent media were able to support further growth after glucose supplementation. NB, LB, and BHI are able to support extended periods of culture, and glucose depletion is the likely reason for declining cell growth.
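To convert viable counts from serial dilution and spread plating back to the undiluted culture, divide the colony count by the plated volume and the dilution. The sketch below assumes 0.1 ml spread per plate and keeps only plates in the conventional 30 to 300 colony range. Both values are assumptions, not details reported in the abstract, and the plate counts are hypothetical.

```python
COUNTABLE_RANGE = (30, 300)  # common plate-count convention; assumed, not from the study


def cfu_per_ml(colonies, dilution, volume_plated_ml):
    """Viable cells per ml of the undiluted culture, estimated from one plate.

    dilution: dilution of the plated suspension, e.g. 1e-6 for a 10^-6 dilution.
    """
    return colonies / (dilution * volume_plated_ml)


def mean_cfu_per_ml(plates, volume_plated_ml=0.1):
    """Average the estimates from plates whose counts fall in the countable range.

    plates: list of (dilution, colony_count) pairs.
    """
    lo, hi = COUNTABLE_RANGE
    estimates = [cfu_per_ml(count, dilution, volume_plated_ml)
                 for dilution, count in plates if lo <= count <= hi]
    if not estimates:
        raise ValueError("no plate falls within the countable range")
    return sum(estimates) / len(estimates)


# Hypothetical 10-fold dilution series: only the 10^-6 plate is countable
plates = [(1e-5, 1500), (1e-6, 150), (1e-7, 14)]
print(f"{mean_cfu_per_ml(plates):.2e} cells/ml")
```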

[11] Ling, MHT, Ban, Y, Wen, H, Wang, SM, Ge, SX. 2013. Conserved Expression of Natural Antisense Transcripts in Mammals. BMC Genomics 14(1): 243.

Recent studies have found thousands of natural antisense transcripts originating from the same genomic loci as protein-coding genes but from the opposite strand. It is unclear whether the majority of antisense transcripts are functional or merely transcriptional noise. Using the Affymetrix Exon array with a modified cDNA synthesis protocol that enables genome-wide detection of antisense transcription, we conducted large-scale expression analysis of antisense transcripts in nine corresponding tissues from human, mouse and rat. We detected thousands of antisense transcripts, some of which show tissue-specific expression and could be subjected to further study for their potential function in the corresponding tissues/organs. The expression patterns of many antisense transcripts are conserved across species, suggesting selective pressure on these transcripts. When compared to protein-coding genes, antisense transcripts showed a lesser degree of expression conservation. We also found a positive correlation between sense and antisense expression across tissues. Our results suggest that natural antisense transcripts are subjected to selective pressure, but to a lesser degree than sense transcripts, in mammals.

[12] Keng, BMH, Chan, OYW, Heng, SSJ, Ling, MHT. 2013. Transcriptome Analysis of Spermophilus lateralis and Spermophilus tridecemlineatus Liver Does Not Suggest the Presence of Spermophilus-liver-specific Reference Genes. ISRN Bioinformatics 2013, Article ID 361321.

The expression of reference genes used in gene expression studies is assumed to be stable under most circumstances. However, studies have demonstrated that genes assumed to be stably expressed in one species are not necessarily stably expressed in other organisms, and some studies have suggested the possibility of reference genes that are both genus-specific and organ-specific. This study aims to evaluate the likelihood of genus-specific reference genes for liver using comparable microarray datasets from Spermophilus lateralis and Spermophilus tridecemlineatus. The coefficient of variation (CV) of each probe was calculated, and 178 probes were common to the lowest 10% CV of both datasets (n = 1258). All 3 lists were analysed by NormFinder. The NormFinder ranks of the common CV-identified stable probes were well correlated between the two species (p-value = 1e-5). This is consistent with previous studies indicating that the liver transcriptomes of S. lateralis and S. tridecemlineatus are comparable. NormFinder analysis suggests that the most invariant probe for S. tridecemlineatus was 02n12, while the most invariant probe for S. lateralis was 24j21. However, our results showed that Probes 02n12 and 24j21 are ranked 8644 and 926 in terms of invariance for S. lateralis and S. tridecemlineatus, respectively. This suggests the lack of common liver-specific reference probes for both S. lateralis and S. tridecemlineatus. Given that S. lateralis and S. tridecemlineatus are closely related species and the datasets are comparable, our results do not support the presence of genus-specific reference genes.
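The overlap step can be sketched as follows: rank probes by CV in each dataset, keep the lowest 10%, and intersect the two sets. The sketch assumes both datasets are dictionaries keyed by probe ID and ranks only probes present in both. The probe IDs and values are hypothetical, and the NormFinder step is not reproduced.

```python
import statistics


def cv(values):
    """Coefficient of variation: sample standard deviation / mean."""
    return statistics.stdev(values) / statistics.mean(values)


def lowest_cv_probes(dataset, probes, fraction=0.1):
    """Return the probes (drawn from `probes`) in the lowest `fraction` of CV in `dataset`."""
    ranked = sorted(probes, key=lambda p: cv(dataset[p]))
    keep = max(1, int(len(ranked) * fraction))
    return set(ranked[:keep])


def common_stable_probes(dataset_a, dataset_b, fraction=0.1):
    """Probes that fall in the lowest-CV fraction of both datasets."""
    shared = sorted(dataset_a.keys() & dataset_b.keys())  # assumption: rank shared probes only
    return (lowest_cv_probes(dataset_a, shared, fraction)
            & lowest_cv_probes(dataset_b, shared, fraction))


# Hypothetical four-probe datasets; fraction=0.5 keeps the two most stable probes in each
species_a = {"p1": [1, 1.01, 0.99], "p2": [1, 2, 3], "p3": [5, 5.1, 4.9], "p4": [1, 5, 9]}
species_b = {"p1": [2, 2.02, 1.98], "p2": [3, 3.3, 2.7], "p3": [1, 3, 5], "p4": [4, 4.2, 3.8]}
print(common_stable_probes(species_a, species_b, fraction=0.5))
```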

[13] Too, IHK, Heng, SSJ, Chan, OYW, Keng, BMH, Chia, CY, Lim, CWX, Leong, WT, Chu, QH, Ang, EJG, Lin, YJ, Ling, MHT. 2014. Identification of Reference Genes by Meta-Microarray Analyses. In James V. Rogers (ed), Microarrays: Principles, Applications and Technologies. Nova Science Publishers, Inc.

The expression levels of reference genes used in gene expression studies are assumed not to change under most circumstances. However, a number of studies have demonstrated that genes theoretically assumed to be stably expressed were found to vary under experimental conditions. In addition, previous studies have also reported that genes stably expressed in one organ may not be stably expressed in other organs or in a different organism, suggesting the need to identify reference genes for each organ and each organism. Because they can analyze the expression of thousands of genes in a single experiment, microarrays present a suitable resource for the analysis and identification of reference genes. We present four cases of practical applications of microarrays in which multiple published microarray datasets were examined to identify suitable reference genes using the coefficient of variation (CV) and NormFinder. Our results suggest that microtubule affinity-regulating kinase 3 (MARK3) is a suitable reference gene for mouse liver, 40S ribosomal protein S29 (Rps29) is a suitable reference gene for mouse testes and pancreas, signal peptidase complex subunit 1 (SPCS1) and hydroxyacyl-CoA dehydrogenase beta subunit (HADHB) are suitable reference genes for human lungs, and glucan biosynthesis protein G (mdoG) is a suitable reference gene for Escherichia coli. Further analysis suggests that the identified reference genes are involved in fundamental biochemical processes. This is consistent with the theoretical basis and with previous studies indicating that housekeeping genes are, on the whole, generally stably expressed. However, our results also suggest that certain housekeeping genes that are stably expressed in one tissue or one organism may not be stably expressed in different tissues or organisms, supporting the need to identify reference genes for each tissue and organism.

[14] Chan, OYW, Keng, BMH, Ling, MHT. 2014. Correlation and Variation Based Method for Reference Genes Identification from Large Datasets. Electronic Physician 6(1): 719-727.

Reference genes are assumed to be stably expressed under most circumstances. Previous studies have shown that common algorithms for identifying potential reference genes, such as NormFinder, geNorm, and BestKeeper, are not suitable for microarray-sized datasets. The aim of this study was to evaluate existing methods and develop new methods for identifying reference genes from microarray datasets. We evaluated the correlation between the outputs of 7 published methods for identifying reference genes, including NormFinder, geNorm, and BestKeeper, using subsets of published microarray data. From these results, seven novel combinations of published methods for identifying reference genes were evaluated. Our results showed that NormFinder’s and geNorm’s indices were highly correlated (R² = 0.987, P < 0.0001), which is consistent with the findings of previous studies. However, lower correlations were found between NormFinder’s and BestKeeper’s indices (R² = 0.489, 0.01 < P < 0.05) and between NormFinder’s index and the coefficient of variation (CV) (R² = 0.483, 0.01 < P < 0.05). We developed two novel methods with high correlations with NormFinder (R² = 0.796 for both methods, P < 0.0001). In addition, the computational times required by the two novel methods scaled linearly with the size of the dataset. Our findings suggest that both of our novel methods can be used as alternatives to NormFinder, geNorm, and BestKeeper for identifying reference genes from large datasets. These methods were implemented as a tool, OLIgonucleotide Variable Expression Ranker (OLIVER), which can be downloaded from http://sourceforge.net/projects/bactome/files/OLIVER/OLIVER_1.zip.
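The R² values compare two methods' stability scores for the same genes. Here R² is taken as the square of Pearson's correlation coefficient, which is an assumption since the abstract does not define it. A minimal sketch with hypothetical score lists:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def r_squared(scores_a, scores_b):
    """Squared Pearson correlation between two methods' scores for the same genes."""
    return pearson_r(scores_a, scores_b) ** 2


# Hypothetical stability indices from two methods for the same five genes
method_1 = [0.12, 0.35, 0.20, 0.50, 0.41]
method_2 = [0.10, 0.30, 0.25, 0.55, 0.38]
print(round(r_squared(method_1, method_2), 3))
```

Squaring discards the sign, so two methods that rank genes in exactly opposite orders would also give R² = 1. The sign of r shows which case applies.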

[15] Loo, BZL, Low, SXZ, Aw, ZQ, Lee, KC, Oon, JSH, Lee, CH, Ling, MHT. 2014. Escherichia coli ATCC 8739 Adapts Specifically to Sodium Chloride, Monosodium Glutamate, and Benzoic Acid after Prolonged Stress. Asia Pacific Journal of Life Sciences 7(3): 243-258.

Escherichia coli lives in the human intestine, and any form of adaptation may affect the human body. The effects of food additives on E. coli have been less studied than those of antibiotics. A recent study demonstrated that E. coli is able to adapt to food additives with a global stress response. This study continues the investigation of E. coli evolution in different food additives (sodium chloride, benzoic acid, monosodium glutamate) at different concentrations, singly or in combination, for over 83 passages. Adaptability of the cells was estimated by generation time and cell density at the stationary phase. Polymerase Chain Reaction (PCR)/Restriction Fragment Length Polymorphism (RFLP) was used to analyze adaptation at the genomic level. Our results show that adaptation started to slow down and that the gradients of generation time against passage are less steep than in the previous study, suggesting that most adaptive mutations occurred within the first 500 generations. At the genomic level, ecological specialization was observed: the cells adapted through different mechanisms and diverged from each other, although the resulting effect of the medium was the same. This suggests that different concentrations of food additives cause different types of chemical stress, rather than different levels of chemical stress.

[16] Ling, MHT, Poh, CL. 2014. A Predictor for Predicting Escherichia coli Transcriptome and the Effects of Gene Perturbations. BMC Bioinformatics 15: 140.

A means to predict the effects of gene over-expression, knockouts, and environmental stimuli in silico is useful for systems biologists to develop and test hypotheses. Several studies have predicted the expression of all Escherichia coli genes from sequences and reported a correlation of 0.301 between predicted and actual expression. However, these do not allow biologists to study the effects of gene perturbations on the native transcriptome. We developed a predictor that uses a gene co-expression network to predict transcriptome-scale gene expression from a small number (n = 59) of known gene expressions; it can be used to predict the effects of over-expressions and knockdowns on the E. coli transcriptome. In terms of transcriptome prediction, our results show that the correlation between predicted and actual expression values is 0.467, which is similar to the microarray intra-array variation (p-value = 0.348), suggesting that intra-array variation accounts for a substantial portion of the transcriptome prediction error. In terms of predicting the effects of gene perturbation(s), our results suggest that the expression of 83% of the genes affected by perturbation can be predicted within 40% error, and that the correlation between predicted and actual expression values among the affected genes is 0.698. With the ability to predict the effects of gene perturbations, we demonstrated that our predictor has the potential to estimate the effects of varying gene expression levels on the native transcriptome. We present a potential means to predict an entire transcriptome and a tool to estimate the effects of gene perturbations for E. coli, which will aid biologists in hypothesis development. This study forms the baseline for future work in using gene co-expression networks for gene expression prediction.

[17] Keng, BMH, Chan, OYW, Ling, MHT. 2014. Codon Usage Bias is Evolutionarily Conserved. Asia Pacific Journal of Life Sciences 7(3): 233-242.

Codon usage bias (CUB) reflects the frequency distribution of codon usage in the genome. Several studies suggest that CUB is based on the codon combinations that are most chemically efficient and minimise translational error, and show that CUB is similar amongst closely related species. However, previous studies were mainly carried out on a limited number of related species. This study tests the hypothesis that CUB is evolutionarily conserved by examining CUB over a large set of organisms. Codon usage distributions from 18 organisms across a diversity of classes were examined, and the correlations of codon usage frequencies were calculated between and within classes. Our results demonstrated that Pearson’s correlation between the CUBs of different organisms within the same class is significantly higher than random. The correlation between the CUBs of mammals, birds, insects, yeast, and bacteria also corresponded to evolutionary distance. This suggests that CUB is evolutionarily conserved and that the degree of conservation corresponds to evolutionary distance.
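A minimal sketch of such a comparison follows. It assumes CUB is represented as the relative frequency of each of the 64 codons, pooled over an organism's coding sequences read in frame, with codons containing non-ACGT characters skipped. The sequences are hypothetical, and the study's exact normalization may differ.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]  # all 64 codons, fixed order
CODON_SET = set(CODONS)


def codon_frequencies(cds_sequences):
    """Relative frequency of each codon (in CODONS order) across in-frame coding sequences."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):  # ignore a trailing partial codon
            codon = seq[i:i + 3]
            if codon in CODON_SET:  # skip codons with ambiguous bases such as N
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete codons found")
    return [counts[c] / total for c in CODONS]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical coding sequences for two organisms
organism_1 = codon_frequencies(["ATGAAATAA"])
organism_2 = codon_frequencies(["ATGAAAAAATAA"])
print(round(pearson_r(organism_1, organism_2), 3))
```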

[18] Wang, HJ, Ling, MHT, Chua, TK, Poh, CL. 2017. Two Cellular Resource Based Models Linking Growth and Parts Characteristics Aids the Study and Optimization of Synthetic Gene Circuits. Engineering Biology 1(1): 30-39.

A major challenge in synthetic genetic circuit development is the inter-dependency between heterologous gene expression by circuits and the host’s growth rate. Increasing heterologous gene expression increases the burden on the host and reduces host growth, which in turn reduces overall heterologous protein abundance. Hence, it is difficult to design predictable genetic circuits. Here, we develop two biophysical models, one for promoters and another for ribosome binding sites (RBS), to correlate heterologous gene expression and growth reduction. We model cellular resource allocation in E. coli to describe the burden, as growth reduction, caused by genetic circuits. To facilitate their use in genetic circuit design, the inputs to the models are common characteristics of biological parts [e.g. relative promoter strength (RPU) and relative ribosome binding site strength (RRU)]. The models suggest that E. coli’s growth rate decreases linearly with increasing RPU/RRU of the genetic circuits; thus, they provide 2 handy models that take parts characteristics as input to estimate growth rate reduction, for fine-tuning genetic circuit design in silico prior to construction. Our promoter model correlates well with experiments using various genetic circuits, both single and double expression cassettes, up to a relative promoter unit of 3.7 with a 60% growth rate reduction (average R² ≈ 0.9).

[19] Ling, MHT. 2018. Back-of-the-Envelope Guide (A Tutorial) to 10 Intracellular Landscapes. MOJ Proteomics & Bioinformatics 7(1): 00209.

Landscape is a metaphor for conceptualizing and visualizing a score across one or more biological entities or concepts. This review provides a cursory overview of 10 landscapes in intracellular biology (in alphabetical order: copy number, fluxome, genome, metabolome, molecular, mutation, phenome, proteome, regulome, and transcriptome) without going into extensive depth; hence, this article can act as a first tutorial on intracellular landscapes. The value ahead lies in being able to compare and interrogate across multiple landscapes at different resolutions.

[20] Wong, A, Ling, MHT. 2018. Characterization of Transcriptional Activities. In Guenther, R. and Steel, D. (eds.), Encyclopedia of Bioinformatics and Computational Biology, 1st Edition. ISBN 978-0-12811-414-8.

Transcription is the first stage of gene expression, leading to the eventual determination of protein abundance and affecting metabolism. Hence, there is a need to measure and characterize transcriptional activities accurately. This article discusses the experimental techniques to characterize transcriptional activities, from a single-gene approach (quantitative PCR) to high-throughput methods (microarray technology and next-generation sequencing). Computational approaches to predict relative transcript abundance from sequence features and gene co-expression networks are also described. As a proxy for protein/enzyme abundance, transcriptional activities are critical in developing simulatable biochemical models, which can then be used to test biological hypotheses prior to laboratory experimentation.

[21] Li, BT, Lim, JX, Ling, MHT. 2018. Analyzing Transcriptome-Phenotype Correlations. In Guenther, R. and Steel, D. (eds.), Encyclopedia of Bioinformatics and Computational Biology, 1st Edition. ISBN 978-0-12811-414-8.

Advances in high-throughput transcriptomics methods over the last 2 decades have enabled many studies examining transcriptome differences between two samples or across time points. Transcriptomic experimental techniques are more developed and readily available than those for the proteome, metabolome, and fluxome. The transcriptome is the first-order activity of the genome and leads to higher-order changes, such as changes in the proteome, metabolome, and fluxome. However, the eventual aim is to understand how changes in the omics result in phenotypic differences between the samples. This article gives an overview of transcriptomic techniques and how phenotypic differences can be elucidated.

[22] Ling, MHT. 2018. Survey of Antisense Transcription. In Guenther, R. and Steel, D. (eds.), Encyclopedia of Bioinformatics and Computational Biology, 1st Edition. ISBN 978-0-12811-414-8.

In recent years, many antisense transcripts have been discovered. Antisense transcripts are complementary to the coding transcripts, also known as sense transcripts. Duplex formation between sense and antisense transcripts has been thought to limit the effective abundance of sense transcripts, leading to reduced levels of translated peptides/proteins. However, recent discoveries show that this is not simply the case: duplex formation is a first-order effect that can lead to higher-order effects, such as interfering with the transcription and translation of sense transcripts and affecting their maturation and half-life. In this article, 4 cases are examined in depth to illustrate that the effects of antisense transcripts on the eventual peptide/protein level can be complex and should be considered on a case-by-case basis.

[23] Lim, JX, Li, BT, Ling, MHT. 2018. Sequence Composition. In Guenther, R. and Steel, D. (eds.), Encyclopedia of Bioinformatics and Computational Biology, 1st Edition. ISBN 978-0-12811-414-8.

Genomic sequence is commonly known as the “blueprint of life”, but deciphering this blueprint has proven to be difficult and elusive. The first task in deciphering this code is sequence analysis, resulting in an annotated sequence. This annotated sequence represents the feature composition of the sequence, commonly known as sequence composition. In this article, we examine some of the available tools to identify various DNA sequence features, before reviewing recent studies on the application of sequence composition. Through these applications, we can appreciate that sequence composition is an integral aspect of sequence analysis.

[24] Lim, JX, Ling, MHT. 2018. Gene Ontology and KEGG Orthology Mappings for 10 Strains of Pseudomonas stutzeri. EC Proteomics and Bioinformatics 3(1): 12-18.

Gene Ontology (GO) and KEGG Orthology (KO) are controlled vocabularies for annotating gene and protein functions and mapping functions onto pathways, which enables metagenomic analysis. Pseudomonas stutzeri is an environmental bacterium with potential for biotechnology applications in the environment, despite being an opportunistic pathogen. However, there have been no GO or KO annotations for P. stutzeri. This study presents the first GO and KO mappings for 10 strains of P. stutzeri to support further studies into P. stutzeri. Of the 42764 peptides in 10 strains of P. stutzeri, 30435 (71.17%) peptides were annotated with one or more GO terms and 25034 (58.54%) peptides were annotated with KO terms. The annotation files and sequences can be downloaded at https://tinyurl.com/GO-KO-Pstutzeri.

[25] Suwinski, P, Ong, CK, Ling, MH, Poh, YM, Khan, AM, Ong, HS. 2019. Advancing Personalized Medicine through the Application of Whole Exome Sequencing and Big Data Analytics. Frontiers in Genetics 10: 49.

There is growing attention towards personalized medicine. This is led by a fundamental shift from the ‘one size fits all’ paradigm for the treatment of patients with conditions or predisposition to diseases, to one that embraces novel approaches, such as tailored target therapies, to achieve the best possible outcomes. Driven by these, several national and international genome projects have been initiated to reap the benefits of personalized medicine. Exome and targeted sequencing provide a balance between cost and benefit, in contrast to Whole Genome Sequencing (WGS). Whole Exome Sequencing (WES) targets approximately 3% of the whole genome, namely the protein-coding regions. Nonetheless, it has the characteristics of big data in large-scale deployment. Herein, the application of WES and its relevance in advancing Personalized Medicine is reviewed. WES is mapped to the Big Data “10 Vs”, and the resulting challenges are discussed. Applications of existing biological databases and bioinformatics tools to address the bottleneck in data processing and analysis are presented, including the need for new-generation big data analytics for the multi-omics challenges of personalized medicine. This includes the incorporation of artificial intelligence (AI) in the clinical utility landscape of genomic information, and future considerations for creating a new frontier towards advancing the field of personalized medicine.

[26] Ling, MHT. 2019. De Novo Putative Protein Domains from Random Peptides. Acta Scientific Microbiology 2(4): 109-112.

How prebiotic chemistry in the primordial world became biochemistry is a major question in evolutionary biology. Studies have found that biological activities from random DNA sequences are not rare and that abiotically-catalyzed polymerization of 13-amino-acid chains can occur. However, it is not clear whether random chains of 13 amino acids or longer are biologically functional. In this study, random peptide sequences were generated and mapped to ProSite motifs and the NCBI Conserved Domains Database. Results suggest that a large fraction of randomly generated 13-amino-acid chains may contain putative protein domains, while longer random peptide chains may contain functional protein domains. A large diversity of protein domains is observed. Hence, it is plausible for putative functions to originate from abiotically-catalyzed 13-amino-acid chains. As both self-replicating RNA molecules and prion proteins have been found, it is plausible that RNA and peptides may have co-existed and synergized in the primordial world.

[27] Kim, JH, Ling, MHT. 2019. Proteome Diversities Among 19 Archaebacterial Species. Acta Scientific Microbiology 2(5): 20-27.

Archaebacteria are known for their presence in varied extreme environments, suggesting potential applications and an ongoing need to study their diversity. This has led to increasing emphasis on archaeal genomic and proteomic studies. However, no work to date has examined the overall proteomic diversity in archaebacteria. In this study, we examined the proteomic diversities among 19 sequenced archaebacterial species and found significant differences (p-value < 2 x 10^-43) in average peptide lengths, isoelectric points, aromaticity, instability, and hydropathy. The majority of the peptides in each species are stable. Predominantly consistent, though widely varied, correlations were observed between peptide physical properties, except between peptide length and hydropathy. This study provides a cursory view highlighting the diversity of archaeal proteomes, thus reiterating the call for further studies into these organisms.

[28] Maitra, A, Ling, MHT. 2019. Codon Usage Bias and Peptide Properties of Pseudomonas balearica DSM 6083T. MOJ Proteomics & Bioinformatics 8(2):27‒39.

Pseudomonas balearica DSM 6083T has potential applications in bioremediation and its genome was recently sequenced. Codon usage bias is important in the study of evolutionary pressures on an organism, and the physical properties of peptides may help identify functional peptides. However, neither has been studied for P. balearica DSM 6083T. Here, we investigated the codon usage bias and peptide properties of the 4,050 coding sequences in P. balearica. Codon usage analysis suggests that all preferred codons were either G- or C-ending. There is a skew towards smaller peptides, and all peptide properties (pI, aromaticity, hydropathy, and instability) are correlated (|r| > 0.102, p-value < 7e-11). %GC is correlated (|r| > 0.122, p-value < 6e-15) with peptide length, aromaticity, hydropathy, and instability. Peptide length is correlated (|r| < 0.057, p-value < 0.0003) with pI, aromaticity, and instability. Codon usage is correlated (r < -0.042, p-value < 0.0075) with all peptide properties, while amino acid usage is correlated (r < -0.084, p-value < 8e-8) with all peptide properties except instability. A substantial proportion (26.9%) of genes show codon and amino acid ratios significantly different from the genomic and proteomic averages respectively (p-value < 1.2e-5), suggesting potential exogenous origins. These results suggest a complex interplay of the metagenomic environment and various genomic / proteomic properties in shaping the evolution of P. balearica DSM 6083T.
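
One common way to quantify "preferred codons" is relative synonymous codon usage (RSCU): a codon's observed count divided by the mean count of all codons for the same amino acid, so RSCU > 1 marks a codon used more often than uniform synonymous usage would predict. The Python below is a minimal, hypothetical sketch, not the paper's pipeline (the measure actually used is not stated here). It assumes the standard genetic code (NCBI translation table 1), reads each coding sequence in frame from its first base, and skips codons containing ambiguous bases. The function name `rscu` is illustrative.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1); codon index = 16*i + 4*j + k over bases "TCAG"
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def rscu(cds_list):
    """Relative synonymous codon usage (RSCU) pooled over a list of coding sequences.

    RSCU = observed count of a codon / mean count of all codons encoding the same
    amino acid (stop codons are treated as one family, '*').
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for pos in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[pos:pos + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases such as N
                counts[codon] += 1
    # Group the 64 codons into synonymous families
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid never observed: RSCU undefined, so omitted
        expected = total / len(codons)
        for c in codons:
            result[c] = counts[c] / expected
    return result
```

Under this sketch, the G- or C-ending preferred codons reported above would appear as codons ending in G or C with RSCU above 1.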

[29] Thong-Ek, C, Usman, S, Woo, JH, Chua, JW, Kwek, BZN, Ardhanari-Shanmugam, KD, B, V, Shahrukh, K, Ling, MHT. 2019. Potential De Novo Origins of Archaebacterial Glycerol-1-Phosphate Dehydrogenase (G1PDH). Acta Scientific Microbiology 2(6): 106-110.

Eubacterial glycerol-1-phosphate dehydrogenase (G1PDH) may have originated from archaebacteria by horizontal gene transfer; however, the origin of archaebacterial G1PDH remains unanswered. While recent studies show possible de novo origination of protein-encoding genes and functional promoters, the mechanism of de novo origination of functional genes remains debatable. In this study, we examine the probability of de novo emergence of putative G1PDH from random sequences. Our results show a high number of open reading frames in random sequences, and that 71.8% of randomly generated sequences have a 9.88% probability of being putative G1PDH. Hence, de novo origination of archaebacterial G1PDH from random sequences is plausible.

[30] Kwek, BZN, Ardhanari-Shanmugam, KD, Woo, JH, Usman, S, Chua, JW, B, V, Shahrukh, K, Thong-Ek, C, Ling, MHT. 2019. Random Sequences May Have Putative Beta-Lactamase Properties. Acta Scientific Medical Sciences 3(7): 113-117.

Beta-lactamases, which confer resistance to beta-lactam antibiotics, are of medical and healthcare concern globally. Studies have placed the emergence of beta-lactamases at more than 2 billion years ago. However, it is not known where the first beta-lactamase originated. In this study, we examine the probability of de novo emergence of putative beta-lactamases from random sequences. A set of 10 thousand randomly generated sequences was aligned, using the Smith-Waterman and Needleman-Wunsch algorithms, to a set of known class D beta-lactamases isolated from GenBank to determine the probability of each randomly generated sequence being a putative beta-lactamase. Our results suggest that a substantial proportion of randomly generated sequences may be putative beta-lactamases, with 4% of the randomly generated sequences showing 99% probability as putative beta-lactamases. To test whether a putative beta-lactamase can evolve over generations to acquire more characteristics of known beta-lactamases, in silico evolution was carried out using DOSE, an evolution simulation software. Our simulation results also suggest that a putative beta-lactamase may rapidly evolve into a more functional beta-lactamase under selection. Hence, de novo origination of beta-lactamase from random sequences is plausible.
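
The alignment step above relies on the Smith-Waterman local alignment algorithm. As a minimal sketch of that algorithm (best local score only, no traceback), the Python below fills the dynamic-programming matrix row by row with a linear gap penalty. The match, mismatch, and gap scores are assumptions for illustration; protein alignments normally use a substitution matrix such as BLOSUM62 with affine gaps, and the conversion of scores into the probabilities reported in the paper is not reproduced here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap penalty).

    Scores are illustrative assumptions, not the settings used in the paper.
    """
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # gap in b
            left = curr[j - 1] + gap    # gap in a
            # local alignment: scores never drop below zero
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

In a screen like the one described, each random query would be scored against every reference sequence and, for example, its best score retained for downstream comparison.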

[31] Chang, ED, Ling, MHT. 2019. Explaining Monod in Terms of Escherichia coli Metabolism. Acta Scientific Microbiology 2(9): 66-71.

The Monod Equation is a simple empirical equation relating the concentration of a limiting substrate to cell growth rate. Despite its use in many studies, there is a need to explain growth rate in terms of metabolism, which can then inform metabolic engineering efforts. Here, we attempt to explain the Monod Equation in terms of simulated metabolism, in the form of metabolic fluxes from an Escherichia coli MG1655 flux balance analysis (FBA) model, to yield a growth rate objective function. Flux values represent changes in molecule concentrations over time, making the biomass objective function a rate equation. This makes it difficult to represent the biomass objective function as a predictive model of metabolic fluxes, which is essentially an analytical equation of fluxes. Our results show a strong correlation (r = 0.972, p-value = 1.16 x 10^-14) between Monod’s predicted growth rate and the biomass objective value from the FBA model. Using this relationship, Monod’s predicted growth rate can be predicted from 14 fluxes (r = 1, p-value < 1 x 10^-16, SSE = 2.3 x 10^-7, MSE = 1.8 x 10^-9). Therefore, this study explains the growth rate of E. coli MG1655 in terms of its metabolic fluxes and presents a methodology for unifying the Monod Equation with simulated or experimental metabolism.
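
For reference, the Monod Equation is μ = μ_max·S / (K_s + S), where S is the limiting substrate concentration, μ_max the maximum specific growth rate, and K_s the half-saturation constant (the value of S at which μ = μ_max/2). A minimal Python sketch follows; the parameter values in the usage block are hypothetical and are not fitted to E. coli MG1655.

```python
def monod_growth_rate(substrate, mu_max, ks):
    """Monod equation: specific growth rate at a given limiting-substrate concentration.

    substrate: limiting substrate concentration S (>= 0)
    mu_max:    maximum specific growth rate
    ks:        half-saturation constant (S at which rate = mu_max / 2)
    """
    if substrate < 0:
        raise ValueError("substrate concentration must be non-negative")
    return mu_max * substrate / (ks + substrate)


if __name__ == "__main__":
    # Hypothetical parameters (mu_max in 1/h, ks in g/L), for illustration only
    for s in (0.0, 0.1, 0.5, 2.0, 10.0):
        print(s, round(monod_growth_rate(s, mu_max=1.0, ks=0.5), 3))
```

The curve rises almost linearly at low S and saturates towards μ_max at high S, which is the behaviour that the FBA biomass objective is compared against in the study.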

[32] Ardhanari-Shanmugam, KD, Shahrukh, K, B, V, Woo, JH, Thong-Ek, C, Usman, S, Kwek, BZN, Chua, JW, Ling, MHT. 2019. De Novo Origination of Bacillus subtilis 168 Promoters from Random Sequences. Acta Scientific Microbiology 2(11): 07-10.

How the first promoters may have originated is a question of evolutionary curiosity. Several studies have shown that new promoters arise by copying an existing promoter sequence. Although de novo origination of promoters has also been suggested, evidence for it is limited. Hence, we investigate the possibility of de novo origination of promoters in this study using the model organism Bacillus subtilis 168. Ten thousand random sequences were generated, and their alignment to known promoter sequences from B. subtilis 168 was used to assess their probability of being putative promoters. Results showed that 380 out of 10,000 random sequences have ≥97% probability. In silico evolution was performed to test the possibility of promoter selection under selective pressure, and our simulation results suggest that the functionality of a random sequence may increase over time. Therefore, de novo origination of promoters from random sequences is possible.

[33] Usman, S, Chua, JW, Ardhanari-Shanmugam, KD, Thong-Ek, C, B, V, Shahrukh, K, Woo, JH, Kwek, BZN, Ling, MHT. 2019. Pseudomonas balearica DSM 6083T promoters can potentially originate from random sequences. MOJ Proteomics & Bioinformatics 8(2): 66‒70.

Recent studies have proposed that many genes plausibly emerged from previously non-coding genomic regions. However, how a promoter can emerge and function properly for de novo genes remains debatable, as this has not been shown in large numbers of organisms. Therefore, this study aims to explore the possibility of de novo evolution of a promoter from random sequences using Pseudomonas balearica DSM 6083T as the model organism. Our results show that 39.3% of the generated random sequences have a 68.6% probability of being a functional promoter. Evolution simulation was carried out to observe the effect of evolution on the putative P. balearica promoter over generations. The simulation results show that selection enhances the functionality of the generated random sequences over time. Therefore, it is plausible that P. balearica promoters could emerge from random sequences, which is consistent with findings from previous studies.

[34] Sim, BKY, Ling, MHT. 2020. Possibility of Abiotic Genesis of Biochemistry. EC Microbiology 16(6): 104-109.

One of the first and primary questions on the origin of life is how life can originate from primordial Earth chemistry. More than 60 years ago, Stanley Miller and Harold Urey conducted the famous Miller-Urey experiment, in which a heated mixture of water, methane, ammonia, and hydrogen, representing early compounds on Earth, produced several amino acids when passed through an electrical discharge representing lightning. This gave rise to the possibility of abiotic genesis of biochemistry. Over the next six decades, evidence supporting various primordial macrobiomolecules emerged, leading to the concepts of RNA, DNA, and peptides being the first primordial macrobiomolecules. In this short review, the possibilities of each world originating separately and of the worlds coevolving were examined. Current evidence suggests that the RNA world and the peptide/amyloid world may have originated independently, with substantial possibility of interplay between these three worlds. Hence, the RNA world, DNA world, and peptide/amyloid world may have coevolved regardless of whether they originated independently. Thus, this calls for a reconciliation into a peptide-nucleic acid world.

[35] Teng, RSY, Kwang, JCY, Chin, ASQ, Sander, CJ, Ang, IZL, Foong, JH, Cheong, KC, Hon, RYH, Ling, MHT. 2020. Correlation Analysis on Transcriptomes from Published Human Skin Studies Show Variations between Control Samples. EC Clinical and Medical Case Reports 3(6): 143-146.

Reproducibility has been shown to be a problem in many areas of science, leading to a “reproducibility crisis”. Many studies have examined factors limiting experimental reproducibility, and one of the factors suggested is the stability of the control samples underpinning all experimental findings. This study examines the transcriptomes of the control samples from three published human skin studies using correlation analysis to evaluate the stability of clinical control samples. Our results show significant differences (t-test p-value < 5.4E-5, Mann-Whitney U p-value < 0.00001) between within-data-set correlations and between-data-set correlations, suggesting significant differences between control samples from different data sets. This may have implications for the interpretation of clinically important results.

[36] Neo, CY, Ling, MHT. 2020. Prevalence and Length of Open Reading Frames Vary Across Randomly Generated Sequences of Different Nucleotide Compositions. EC Microbiology 16(7): 72-78.

The emergence of open reading frames (ORFs) is an important step in the origination of de novo genes. However, the conditions leading to the origination of de novo genes are not well understood. This study aims to determine the effect of nucleotide composition on the length and occurrence of ORFs by examining various ORF parameters in randomly generated sequences of 85 different nucleotide compositions. Our results suggest that various ORF parameters differ significantly across nucleotide compositions (p-value < 1E-120). The average length, standard error of the average length, average maximum length, and standard error of the average maximum length of ORFs are moderately predictable (0.43 < r^2 < 0.59) from nucleotide composition. These results suggest that the prevalence and length of ORFs may be a function of the underlying nucleotide composition.
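
The experiment described above, drawing random sequences from a chosen nucleotide composition and scanning them for ORFs, can be sketched as follows. This is a simplified, hypothetical version rather than the paper's code: it scans only the forward strand, defines an ORF as an ATG followed by the first in-frame stop codon (TAA, TAG, or TGA), and ignores nested ATGs inside an ORF. The function names and the composition in the usage block are illustrative.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def random_sequence(length, composition, rng):
    """Draw a random DNA sequence; composition maps each base to its probability."""
    bases = list(composition)
    weights = [composition[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


def find_orfs(seq, min_codons=1):
    """Forward-strand ORFs as (start, end), 0-based half-open, from ATG to the first in-frame stop.

    min_codons counts codons including the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    rng = random.Random(42)
    gc_rich = {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}  # hypothetical composition
    seq = random_sequence(10_000, gc_rich, rng)
    lengths = [end - start for start, end in find_orfs(seq)]
    print(len(lengths), sum(lengths) / len(lengths) if lengths else 0)
```

Repeating the usage block over a grid of compositions and summarising the ORF counts and lengths would mirror the kind of comparison reported above, though with whatever ORF definition the sketch assumes.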

[37] Cheong, KC, Hon, RYH, Sander, CJ, Ang, IZL, Foong, JH, Ling, MHT. 2020. A Simulation Study on the Effects of Media Composition on the Growth Rate of Escherichia coli MG1655 using iAF1260 Model. Acta Scientific Microbiology 3(8): 40-44.

Media compositions are important determinants of growth rate, and genome-scale models (GSMs) have been used to optimize media for metabolite production and growth. Recently, iAF1260, a GSM based on Escherichia coli MG1655, was used to study the effects of varying glucose concentration in media on growth rate and metabolic fluxes. In this study, the effects of other media components in the presence of varying glucose concentrations on the predicted growth rate of E. coli MG1655 were examined. Our results show that 10 media components (ammonium, calcium, chloride, copper, glucose, manganese, magnesium, molybdate, phosphate, and potassium) have a substantial impact on the predicted growth rate of E. coli MG1655. Of these, 4 components (glucose, ammonium, magnesium, and phosphate) have the most impact. However, our results also demonstrate the limitations of iAF1260, as the effects of some media components that had been shown to affect E. coli growth rate were not reflected by the model.

[38] Gunalan, K, Wong, CQL, Neo, MPY, Ling, MHT. 2020. [One Percent of Escherichia coli O157:H7 Peptides May Contain Putative Beta-Lactamase Activity.](https://github.com/mauriceling/mauriceling.github.io/wiki/One-Percent-of-Escherichia-coli-O157-H7-Peptides-May-Contain-Putative-Beta-Lactamase-Activity.) EC Microbiology 16(8): 73-79.

Beta-lactamases are enzymes conferring resistance to beta-lactam antibiotics, a resistance that has become a global challenge. Studies have suggested that beta-lactamases are primitive enzymes that existed before the antibiotic era, raising the question of the potential sources and emergence of beta-lactamases. This study examines the possibility of putative beta-lactamases in Escherichia coli O157:H7 by sequence comparison to known extended-spectrum beta-lactamases (ESBLs) from E. coli. Our results suggest that 57 of 5021 (1.14%) E. coli O157:H7 peptides have a 64.7% probability of beta-lactamase activity. Phylogenetic analysis clustered the top 10 (by sequence similarity score) of these 58 peptides within known ESBLs. This suggests that these peptides may contain putative beta-lactamase activity and could potentially be a source of putative beta-lactamases.

[39] Tan, XT, Ramesh, A, Wang, VCC, Kamarudin, NJ, Chew, SSM, Murthy, MV, Yablochkin, NV, Mathivanan, K, Ling, MHT. 2020. Core Pseudomonas Genome From 10 Pseudomonas Species. MOJ Proteomics & Bioinformatics 9(3): 68‒71.

The core genome of a set of organisms is the set of homologous genes shared by all of them, and it has many applications. The Pseudomonas genus is highly diverse, including both plant and animal pathogens. Hence, the core genome of the Pseudomonas genus can be useful. Current studies present contradictory results, with the reported core genome of the Pseudomonas genus marginally larger than that of Pseudomonas aeruginosa. In this study, we attempt to identify a core Pseudomonas genome from 10 publicly available annotated genomes by intersecting homologous coding sequences using BLAST. Our results suggest a 218-gene core genome, which is 3.46% of the coding sequences of P. aeruginosa. Of the 218 genes, 136 were mapped to official gene symbols and were enriched in 8 Gene Ontology biological process clusters related to central metabolism.
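
The intersection step above can be sketched in Python, assuming that the coding sequences of one reference genome have already been searched with BLAST against each other genome and saved in BLAST's tabular output (`-outfmt 6`, whose default 12 columns include `qseqid` first, `pident` third, and `evalue` eleventh). The e-value and identity cut-offs and the function names are hypothetical; the BLAST settings used in the paper are not stated here.

```python
def parse_blast_tab(lines, max_evalue=1e-10, min_identity=30.0):
    """Query IDs with at least one hit passing the cut-offs in BLAST -outfmt 6 lines.

    Cut-offs are illustrative assumptions, not the paper's settings.
    """
    queries = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, pident, evalue = fields[0], float(fields[2]), float(fields[10])
        if evalue <= max_evalue and pident >= min_identity:
            queries.add(qseqid)
    return queries


def core_genome(hits_per_genome):
    """Reference genes that have a homolog in every genome.

    hits_per_genome: dict genome name -> set of reference gene IDs with a passing hit
    """
    gene_sets = list(hits_per_genome.values())
    if not gene_sets:
        return set()
    return set.intersection(*(set(s) for s in gene_sets))
```

With real data, `hits_per_genome` could be built by calling `parse_blast_tab` on one BLAST output file per genome and keying each result by genome name.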

[40] Kamarudin, NJ, Wang, VCC, Tan, XT, Ramesh, A, Chew, SSM, Murthy, MV, Yablochkin, NV, Mathivanan, K, Ling, MHT. 2020. [A Simulation Study on the Effects of Founding Population Size and Number of Alleles Per Locus on the Observed Population Genetic Profile: Implications to Broodstock Management.](https://github.com/mauriceling/mauriceling.github.io/wiki/A-Simulation-Study-on-the-Effects-of-Founding-Population-Size-and-Number-of-Alleles-Per-Locus-on-the-Observed-Population-Genetic-Profile-Implications-to-Broodstock-Management.) EC Veterinary Science 5(8): 176-180.

Loss of genetic variability in small populations, known as the founder effect, is commonly seen in aquaculture, where broodstocks are not routinely supplemented from the wild, leading to detrimental effects. Yet, the relationship between founding population size and the observed population genetic profile is not clear. Here, the effects of founding population size and the number of alleles per locus on the observed population genetic profile across multiple generations were examined using simulation. Our results suggest that the number of alleles per locus (p-value = 1.2E-102) and generation count (p-value < 1E-240) are significant factors in genetic drift, but founding population size is not (p-value = 0.12). This suggests that genetic drift occurs regardless of population size, which may have implications for broodstock management: the impact of genetic drift needs to be constantly minimized regardless of broodstock population size.
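
Genetic drift of this kind is classically modelled with the Wright-Fisher model, in which every gene copy of the next generation is sampled at random, with replacement, from the current generation. The sketch below is a textbook single-locus, neutral Wright-Fisher simulation swapped in for illustration; it is not the simulation used in the paper, and the founding sizes, allele count, and generation count in the usage block are hypothetical. It tracks the number of distinct alleles still present, one simple indicator of lost genetic variability.

```python
import random


def simulate_drift(pop_size, n_alleles, generations, rng):
    """Neutral Wright-Fisher drift at one locus in a diploid population of pop_size.

    Returns the number of distinct alleles remaining after each generation.
    """
    if 2 * pop_size < n_alleles:
        raise ValueError("need at least as many gene copies (2N) as alleles")
    # Founding gene pool: 2N gene copies, alleles in (near-)equal proportions
    gene_pool = [i % n_alleles for i in range(2 * pop_size)]
    history = []
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random from the current pool
        gene_pool = rng.choices(gene_pool, k=2 * pop_size)
        history.append(len(set(gene_pool)))
    return history


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (10, 50, 250):  # hypothetical founding population sizes
        print(n, simulate_drift(n, n_alleles=10, generations=50, rng=rng)[-1])
```

Because alleles can only be lost and never regained without mutation or migration, the allele count in this model never increases; a single run shows only one random outcome, so replicates would be needed for any statistical comparison.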

[41] Wang, VCC, Kamarudin, NJ, Tan, XT, Ramesh, A, Chew, SSM, Murthy, MV, Yablochkin, NV, Mathivanan, K, Ling, MHT. 2020. A Case Study using Mitochondrial Genomes of the Order Diprotodontia (Australasian Marsupials) Suggests that Single Ortholog is Not Sufficient for Phylogeny. EC Clinical and Medical Case Reports 3(9): 93-114.

All organisms existing today descended from a common ancestor, and phylogenetic trees are a common means to analyze such evolutionary histories. Currently, orthologs are routinely used to construct phylogenetic trees. However, the number of orthologs required to determine the evolutionary history of a set of organisms is not clear. In this case study, we compare the phylogenetic trees generated from single orthologs against the tree generated from the complete set of orthologs, using 13 mitochondrial genes of 24 species from the Order Diprotodontia. Using the phylogenetic tree generated from the complete set of orthologs as the benchmark, our results suggest that using a single ortholog may result in distinctly different phylogenies compared to the benchmark, and that the average number of branch points from multiple single orthologs is significantly different (paired t-statistic = 8.01, p-value = 3.27e-14) from the benchmark. This suggests that phylogenetic analysis from a single ortholog or multiple single orthologs is not likely to reflect actual evolutionary history, and that the complete set of orthologs is required.

[42] Murthy, MV, Balan, D, Kamarudin, NJ, Wang, VCC, Tan, XT, Ramesh, A, Chew, SSM, Yablochkin, NV, Mathivanan, K, Ling, MHT. 2020. UniKin1: A Universal, Non-Species-Specific Whole Cell Kinetic Model. Acta Scientific Microbiology 3(10): 04-08.

Mathematical models of metabolism can be a useful tool for metabolic engineering. Genome-scale models (GSMs) and kinetic models (KMs) are the two main types of models. GSMs provide steady-state fluxes while KMs provide time-course profiles of metabolites, which is an advantage in identifying metabolic bottlenecks. However, KMs require more accurate parameters than GSMs, resulting in fewer large-scale KMs than GSMs. Recently, large-scale KMs have been developed, but they are not based on standard enzymatic rate equations, making it difficult to interpret results in terms of enzyme kinetics. Here, we construct a universal, non-species-specific KM of core metabolism, based on the Michaelis-Menten Equation, from glucose to the 20 amino acids and 5 nucleotides, using reactions listed in the Kyoto Encyclopaedia of Genes and Genomes (KEGG). Non-species specificity is achieved by using the same Michaelis-Menten constant (Km), turnover number (Vmax), and concentration for each metabolite and enzyme in each equation. The resulting model consists of 566 reactions, 306 metabolites, and 310 enzymes, involved in 1284 metabolite productions and 1249 metabolite usages. Sensitivity analysis shows that 85% of the metabolite concentrations change with a change in one enzyme kinetic parameter. This forms a base model for developing species-specific whole cell KMs.
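
Each reaction in a model of this kind follows a Michaelis-Menten rate law, v = Vmax·[S] / (Km + [S]), where in general Vmax = kcat·[E] (the turnover number times the enzyme concentration). The Python below is a hypothetical toy example, not UniKin1 itself: it shows the rate law and an explicit-Euler time course for a single irreversible reaction S → P. The parameter values, time step, and function names are assumptions; a whole-cell model couples hundreds of such equations and would typically use a dedicated ODE solver.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Michaelis-Menten velocity v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


def simulate_s_to_p(s0, vmax, km, dt=0.01, steps=1000):
    """Explicit-Euler time course of one irreversible reaction S -> P.

    Returns a list of (time, [S], [P]) tuples, including the initial state.
    """
    s, p = s0, 0.0
    trace = [(0.0, s, p)]
    for step in range(1, steps + 1):
        # Amount converted in this step, capped so [S] never goes negative
        converted = min(michaelis_menten_rate(s, vmax, km) * dt, s)
        s -= converted
        p += converted
        trace.append((step * dt, s, p))
    return trace


if __name__ == "__main__":
    # Hypothetical parameters shared by every reaction, mimicking a non-species-specific template
    for t, s, p in simulate_s_to_p(s0=5.0, vmax=1.0, km=0.5, steps=1000)[::200]:
        print(f"t={t:.1f}  S={s:.3f}  P={p:.3f}")
```

Giving every reaction the same Km and Vmax, as the template model does, means differences in its time courses arise from network structure rather than from species-specific kinetic parameters.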

[43] Chew, SSM, Murthy, MV, Kamarudin, NJ, Wang, VCC, Tan, XT, Ramesh, A, Yablochkin, NV, Mathivanan, K, Ling, MHT. 2020. Rapid Genetic Diversity with Variability between Replicated Digital Organism Simulations and its Implications on Cambrian Explosion. EC Clinical and Medical Case Reports 3(11): 64-68.

The Cambrian Explosion resulted in a substantial increase in biodiversity, which may be attributed to both environmental and biological factors. Although an increased rate of genetic evolution has been shown during this period, the role of genetic evolution in increased biodiversity is unclear. Re-creating the Cambrian Explosion experimentally is not feasible. In this study, we used digital organisms (DOs) at a high rate of random point mutations in the absence of selective pressure to examine the extent of genetic evolution possible during the Cambrian Explosion. Our simulation results suggest that rapid and significant genetic divergence in the absence of selective pressure can occur at the species level and at the local population level, with significant differences between local populations (F ≥ 15.97, p-value ≤ 1.4E-79). Hence, the emergence of biodiversity in the Cambrian Explosion may be due to the release of accumulated adaptive potential.

[44] Chua, SCH, Ling, MHT. 2021. Stop Codon Usage Varies on CDS Length, Nucleotide Compositions, and Peptide Instability in Six Escherichia coli Strains. EC Clinical and Medical Case Reports 4(2): 39-46.

Prokaryotic stop codon usage has been shown to be influenced by GC content and release factor abundance. However, GC content and release factors are unlikely to be the only factors influencing stop codon usage, as nucleotide compositions and peptide properties have also been shown to influence codon usage bias, of which stop codon usage is a specific instance. Here, the stop codon usage frequencies, nucleotide compositions, and peptide properties in six strains of E. coli (MG1655, W3110, BL21, O25b:H4, O157:H7, and 58-3) were examined. Our results suggest that the stop codon usage frequencies of pathogenic strains O157:H7 and O25b:H4 are significantly different from those of the other strains (Chi-Square ≥ 7.241, p-value ≤ 2.7E-02), suggesting different evolutionary paths between pathogenic and non-pathogenic strains of E. coli. The average lengths, nucleotide compositions, and peptide instability differ significantly between stop codons in all cases (F ≥ 3.07, p-value ≤ 4.7E-02) except for average thymine composition in E. coli 58-3. This suggests a relationship between stop codon usage and nucleotide compositions other than GC content, and/or peptide properties.
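
Stop codon usage frequencies like those compared above can be tallied from the final codon of each coding sequence. The minimal Python sketch below assumes the CDS strings are already extracted in the 5' to 3' coding orientation, and it skips any sequence whose length is not a multiple of three or that does not end in TAA, TAG, or TGA. The function name is illustrative.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_codon_usage(cds_list):
    """Fraction of CDSs terminated by each stop codon (the last in-frame codon of each CDS)."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        if len(cds) % 3 == 0 and cds[-3:] in STOP_CODONS:
            counts[cds[-3:]] += 1
    total = sum(counts.values())
    return {codon: (counts[codon] / total if total else 0.0) for codon in STOP_CODONS}
```

For comparisons between strains, the raw counts rather than the fractions would be the input to a chi-square test of independence, for example `scipy.stats.chi2_contingency` on a strains-by-stop-codons table.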

[45] Cho, JL, Ling, MHT. 2021. Adaptation of Whole Cell Kinetic Model Template, UniKin1, to Escherichia coli Whole Cell Kinetic Model, ecoJC20. EC Microbiology 17(2): 254-260.

Mathematical modelling can be used to study metabolism involving thousands of biochemical reactions, and kinetic models (KMs) of metabolism enable the time-course analysis of metabolic changes. Recently, a universal whole cell KM of central metabolism, UniKin1, has been presented. Here, we adapt UniKin1 into an Escherichia coli-specific model by changing the initial concentrations of 48.7% (n = 149) of the metabolites and 25.2% (n = 78) of the enzymes to E. coli-specific concentrations, and term our model ecoJC20. Our simulation results suggest that ecoJC20 is substantially different from UniKin1. We also demonstrate the potential of ecoJC20 to evaluate the effects of transgenic nitrogen fixation pathways on central metabolism, thus underpinning the potential applications of kinetic models as an experimental design tool.

[46] Kuan, ZJ, Ling, MHT. 2021. Core Genome of Poales, An Economically Important Order of Monocotyledons. EC Agriculture 7(2): 24-29.

The importance of Poales species, which include rice, wheat, and maize, has led to various studies on their tolerance and evolution. Evolutionary studies are largely dependent on the presence of orthologs. A recent study suggests that the complete set of orthologs is required to reflect actual evolutionary history, thereby underpinning the need to identify the core genome of Poales, representing the set of orthologs across Poales species. Here, we identified a 6,122-gene core genome of Poales, and functional analysis suggests a strong role of interspecies interactions within the Poales core genome.

[47] Kuan, ZJ, Amir-Hamzah, N, Ling, MHT. 2021. Coffee as a Potential Nutraceutical. EC Nutrition 16(3): 57-65.

Global coffee production nearly doubled over the last decade, making coffee one of the most popular beverages in current society. However, recent consumer studies suggest concerns about potential health detriments from regular coffee consumption. Given its popularity, any health benefits or detriments can have substantial public health impact. Here, we examined several potential benefits and risks of coffee drinking. Evidence is inconclusive in several cases and warrants further studies. Despite this, there are likely more health benefits than harms, especially when moderation is applied. Hence, we are of the view that coffee is a potential nutraceutical when consumed in moderation and with adequate hydration.

[48] Lim, GZK, Azmi, HH, Dolmatova, M, Ling, MHT. 2021. Significant Differences in Nucleotide and Peptide Features Between Chromosomes Suggesting Sequence Non-Randomness Across Chromosomes. Acta Scientific Microbiology 4(4): 23-28.

Eukaryotic genomes are organized into multiple chromosomes. Several studies have suggested that chromosomes are organized functionally and spatially, indicative of selective pressure in chromosomal organization. However, the question remains as to whether chromosomes of the same organism differ significantly in nucleotide and peptide features. Here, we examine three eukaryotic species across kingdoms, namely animalia (Sarcophilus harrisii), plantae (Prunus dulcis), and the SAR supergroup (Plasmodium falciparum), to identify whether chromosomes of the same organism are significantly different based on nucleotide and peptide features. Our results show that the average GC contents of coding sequences are significantly different (p-value ≤ 3.30E-09) from their chromosomal GC content in all 30 chromosomes across 3 organisms. Our results also show that 38 of the 45 nucleotide and peptide features examined (15 features in each of 3 organisms) are significantly different (p-value ≤ 0.044) between chromosomes. These results imply the presence of selective pressure in chromosomal organization.
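
The GC comparison above can be sketched as follows. These hypothetical helpers (not the paper's code) compute GC content over unambiguous A/C/G/T bases and compare the mean GC of coding sequences against the whole chromosome, assuming 0-based, half-open CDS coordinates. Because G pairs with C, GC content is identical on both strands, so reverse-strand genes need no reverse complement for this measure.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def cds_vs_chromosome_gc(chromosome, cds_coordinates):
    """Return (mean GC of the CDSs, GC of the whole chromosome).

    cds_coordinates: list of 0-based half-open (start, end) tuples on the chromosome
    """
    cds_gc = [gc_content(chromosome[start:end]) for start, end in cds_coordinates]
    return sum(cds_gc) / len(cds_gc), gc_content(chromosome)
```

To test the difference statistically, the per-CDS values could be compared against the chromosomal value with a one-sample t-test such as `scipy.stats.ttest_1samp`; whether the paper used that particular test is not stated here.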

[49] Johny, A, Sumedha, PR, Ling, MHT. 2021. Simulation Suggests that One-Off Simple Supplementation from the Wild into Captive Population May Not Increase Captive Genetic Diversity. EC Veterinary Science 6(7): 107-111.

Loss of genetic diversity in captive populations due to inbreeding is a concern in commercial farming. Supplementation from the wild, which is deemed to be more genetically diverse, into captivity has been proposed as a method to increase genetic diversity in inbred captive populations. Here, we examine the possibility of a one-off supplementation, maintaining the increased captive population size for one generation after supplementation, using computer simulations on 50 markers of 10 equally proportioned alleles each. Our results suggest that one-off supplementation is not likely to increase the genetic diversity of the captive population, and we also observed that the genetic diversity of the captive population may decrease in proportion to the supplementation ratio.

[50] Tan, FL, Kuan, ZJ, Amir-Hamzah, N, Kng, X, Wee, YY, Sor, SX, Ling, MHT. 2022. Significant Differences in Media Components and Predicted Growth Rates of 58 Escherichia coli Genome-scale Models. Acta Scientific Microbiology 5(2): 56-68.

Escherichia coli is a common host for metabolite production, and genome-scale metabolic models (GSMs) are an important computational tool to aid in such experimental design. As of September 30, 2021, 58 GSMs have been registered with the BiGG database. However, these GSMs have been built for different applications and no large-scale comparative study has been performed to date. In this study, we examine the media components and predicted growth rates of these 58 GSMs using flux balance analysis across various glucose uptake rates. Only 5 out of 29 uptake rates (as a proxy for media components) are common to all 58 GSMs, namely proton, water, ammonium, oxygen, and phosphate. 74.25% (2370 of 3192) of pairwise comparisons of predicted growth rates show significant differences (p-value < 0.05), and 34 of 42 pairwise comparisons of predicted growth rates within the same strain are significantly different. Hence, our results demonstrate substantial differences in media components and significant differences in predicted growth rates between GSMs, even between GSMs constructed for the same strain.

[51] Chua, MTE, Dumanglas, ABG, Ling, MHT. 2022. Gene Co-Expressions Cannot Predict Protein-Protein Interactions in Escherichia coli. EC Microbiology 18(3): 102-109.

Gene co-expression is the correlation of gene expression across multiple samples or conditions. Significant gene co-expressions have been used to construct gene co-expression networks and to elucidate biological information. However, the suitability of gene co-expressions for predicting protein-protein interactions (PPIs) is not clear. In this study, ten gene co-expression measures were evaluated for their suitability in predicting PPIs in Escherichia coli. Our results show poor precision (precision ≤ 0.00188). This suggests that gene co-expression alone is not likely to be suitable for predicting protein-protein interactions.
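
The evaluation above can be illustrated with a minimal sketch that uses a single co-expression measure, the Pearson correlation, to predict an interaction for every gene pair whose absolute correlation reaches a threshold, and then scores those predictions with precision = TP / (TP + FP) against a set of known interactions. The threshold, data layout, and function names are hypothetical; the ten measures evaluated in the paper are not reproduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined (ZeroDivisionError) for constant profiles


def coexpression_ppi_precision(expression, known_ppis, threshold=0.9):
    """Predict a PPI for every gene pair with |r| >= threshold; return precision TP / (TP + FP).

    expression: dict gene -> list of expression values across the same samples
    known_ppis: iterable of (gene, gene) pairs treated as true interactions
    """
    known = {frozenset(pair) for pair in known_ppis}
    predicted = {
        frozenset((g1, g2))
        for g1, g2 in combinations(sorted(expression), 2)
        if abs(pearson(expression[g1], expression[g2])) >= threshold
    }
    if not predicted:
        return 0.0  # no predictions: precision taken as 0 here by convention
    return len(predicted & known) / len(predicted)
```

Unordered pairs are stored as `frozenset`s so that (A, B) and (B, A) count as the same interaction when predictions are matched against the known set.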

[52] Sor, SX, Wee, YY, Kng, X, Ling, MHT. 2022. A Systematic Scoping Review on the Current Applications of Environmental DNA (eDNA). EC Clinical and Medical Case Reports 5(4): 46-64.

Environmental DNA (eDNA) is DNA shed by an organism into its surroundings, which presents a non-invasive means to detect organisms of interest or to assess the biodiversity of a specific environment. Applications of eDNA are growing rapidly, but a systematic review of the breadth and depth of the applications of eDNA has not been carried out. Here, we present a systematic scoping review of the current applications of eDNA up to July 31, 2021, using Google Scholar and PubMed as source databases. 159 articles were identified, and 54 articles were included in this review. Our analysis suggests 10 themes of applications, namely (a) detecting rare, cryptic or endangered species, (b) detecting bacterial and parasitic pathogens/disease outbreaks, (c) invasive species detection, (d) biodiversity characterisation and biomonitoring, (e) spawning ecology, (f) management of fisheries, (g) hatchery management/selective breeding application, (h) forensic/forensic ecology, (i) crop cultivation and soil fertility, and (j) anthropogenic effects on biodiversity.

[53] Wee, YY, Kng, X, Sor, SX, Ling, MHT. 2022. Genome-Scale Metabolic Model-Based Reactome-Phenome Map of Synechocystis sp. PCC 6803, A Potential Biofuel Producer. Medicon Microbiology 1(4): 02-08.

Synechocystis sp. PCC 6803 is a potential producer of lipids, alcohols, and biofuels. Genome-scale models (GSMs) have been used to examine potential knockouts to optimize the production of specific metabolites (such as ethanol). Beyond metabolite production, GSMs can also be used to examine the effects of genes from the perspective of the genotype-phenotype relationship. However, most GSMs are reaction-based rather than gene-based. Hence, GSMs can be used for reactome-phenome mapping, where each reaction may be the result of one or more genes. In this study, we examine the reactome-phenome map of Synechocystis sp. PCC 6803 using its GSM, iJN678, by performing single knockouts of each of its 863 reactions. Our results suggest that 37.3% to 39.7% (322 to 343 reactions) of the knockouts have minimal impact on the phenome, as they were clustered together with the wildtype phenotype, and 53.5% (462 reactions) are essential. The remaining 58 to 79 reactions can be clustered into 9 to 33 phenotypic clusters. Moreover, the fluxome variation within the wildtype cluster is significantly larger than that of the essential reaction cluster (t ≥ 3.26, p-value ≤ 1.3E-3). This suggests that individual reaction knockouts may have measurable effects on the fluxes, which may be useful in metabolic engineering.

[56] Tang, AY, Ling, MHT. 2022. Relapse Processes are Important in Modelling Drug Epidemic. Acta Scientific Medical Sciences 6(6): 177-182.

The global drug epidemic is an important public health issue. Mathematical modelling is vital for gaining insights that may inform policy making. Several modelling studies fail to adequately address relapse, which includes rapid relapse into heavy or light drug use, and relapse after extended sobriety. Here, we study the impact of relapses by incorporating relapse processes into an existing 6-compartment model. Our results show that the proportions of drug users are higher with relapse processes than without; yet, the proportion in rehabilitation is lower with relapse than without. This highlights the importance of relapse processes in modelling drug epidemics.

[57] Sim, BJH, Wong, KM, Ling, MHT. 2022. Metabolite Overproduction Potential of Saccharomyces cerevisiae S288C Explored Using Its Genome-Scale Metabolic Model, iMM904. EC Microbiology 18(7): 46-51.

While many genome-scale metabolic models (GSMs) have been constructed for specific purposes, they can also be used to explore the potential of an organism for metabolite overproduction. Saccharomyces cerevisiae is a widely used microorganism in biotechnology with many successful applications, and a GSM, iMM904, has been constructed for S. cerevisiae S288C. In this study, we explore the metabolite overproduction capabilities of S. cerevisiae S288C under single gene knockouts using iMM904. Our simulation results suggest that 217 of the 1577 (13.76%) single reaction knockouts potentially lead to metabolite overproduction. This suggests the potential of overproducing native metabolites using only gene knockouts and forms the basis of future validation studies between S. cerevisiae S288C and its corresponding GSM.

[58] Loh, BJK, Kannan, KSS, Patil, T, Vij, R, Ling, MHT. 2022. Inconsistent Phylogenetic Trees from Nucleotide or Amino Acid Sequences from Mammalian Mitochondrial Genomes. EC Clinical and Medical Case Reports 5(7): 03-09.

Phylogenetic trees built from orthologs are commonly used to analyze the evolutionary histories of organisms. A recent study suggests that phylogenetic analyses based on a single orthologous gene or on multiple single orthologous genes are unlikely to reflect actual evolutionary history, and that the core genome is required. However, it is not clear whether this finding can be extrapolated to orthologous peptide sequences. In this study, we compare phylogenetic trees constructed using nucleotide and peptide sequences from the mitochondrial genomes of fourteen mammals. Our results confirm that different orthologous nucleotide sequences may result in different phylogenetic trees, with 20.5% of pairwise comparisons being significantly different (p-value < 0.05). This result extends to orthologous peptide sequences, with 33.3% of the pairwise comparisons being significantly different. In addition, the phylogenetic tree constructed using the core genome is significantly different (paired t-test p-value = 5.52E-7) from the phylogenetic tree constructed using the core proteome.

[59] Ng, ASY, Azan, NK, Samsudi, F, Mazlan, MR, Loh, YK, Ling, MHT. 2023. A 5-Year Systematic Review (01 April 2017 to 31 March 2022) on the Causes of Abdominal Obesity. EC Clinical and Medical Case Reports 6(1): 90-110.

Abdominal obesity (AO) is a global public health concern, yet few reviews have examined its underlying causes. Here, we conduct a systematic review on the causes of AO using publications indexed in PubMed from April 1, 2017, to March 31, 2022. Of 199 articles, 46 (23%) were included, revealing 10 themes of causes; namely, (a) age, gender, socioeconomic, and genetic / biological determinants, (b) nutritional intake, (c) lifestyle habits, (d) mental and cognitive disorders, (e) smoking, (f) gastrointestinal microbiota, (g) alcohol consumption, (h) rural regions, (i) non-alcoholic fatty liver disease, and (j) noise pollution.

[60] Azan, NK, Ng, ASY, Samsudi, F, Mazlan, MR, Loh, YK, Ling, MHT. 2023. A 5-Year Systematic Review (2018 to 2022) on The Effectiveness of Mediterranean Diet in Preventing Alzheimer’s Disease. Acta Scientific Nutritional Health 7(2): 79-90.

Alzheimer’s disease (AD) is an age-related neuronal disorder characterized by abnormal levels of the proteins beta amyloid (Aβ) and tau, resulting in a gradual loss of cognitive functions due to an impaired network of neurons in the brain. Past literature has proposed dietary intervention through the Mediterranean Diet (MD) as a means to prevent AD development. We conducted a systematic review on the effectiveness of MD in preventing AD, using PubMed as the source database and covering the last 5 years, from 2018 up to July 3, 2022. 131 articles were identified, and 26 articles were included in this review. After analysing the articles, 5 themes were identified to examine the effectiveness of MD; namely, (a) MD adherence and AD risk, (b) MD and AD pathological development, (c) MD and cognitive health, (d) the Mediterranean-DASH Intervention for Neurodegenerative Delay (MIND) diet, and (e) diet-microbe interaction. MD adherence is a vital factor in achieving successful dietary intervention, and various covariates and demographics affect adherence levels. The literature presents differing evidence on MD’s efficacy in preventing AD. We conclude that MD is effective to a certain extent in preventing AD, depending on factors such as adherence levels and demographics, and that further longitudinal studies and randomised controlled trials (RCTs) are warranted.

[61] Wong, KM, Sim, BJH, Ling, MHT. 2023. Consistency Between Saccharomyces cerevisiae S288C Genome Scale Models (iND750 and iMM904). Acta Scientific Microbiology 6(3): 63-68.

Saccharomyces cerevisiae is an important experimental organism for industrial and scientific research, and S. cerevisiae S288C was the first eukaryote to have its genome sequenced. Genome-scale metabolic models (GSMs) are computational tools to explore metabolic engineering requirements. Currently, there are 2 major GSMs of S. cerevisiae S288C, iND750 and iMM904, which raises the question of whether they are consistent with each other. Here, we compare iND750 and iMM904 by examining the fluxomic changes resulting from single reaction knockouts. 40.5% to 50.3% (n = 637) of the reactions are common to both GSMs. Of these, 64 reaction knockouts (10.0% of common reactions, or between 4.1% and 5.2% of the total reactions in each GSM) resulted in significant fluxomic changes. This is significantly lower (t = -15.882, df = 30, p-value = 3.82E-16) than expected from a randomization test, suggesting that iND750 and iMM904 are likely to be consistent with each other from the perspective of common reactions.

[62] Roh, D, Naing, SY, Ling, MHT. 2023. Peptide Properties of Saccharomyces arboricola H-6 Suggest Randomness in Chromosomal Organization. EC Microbiology 19(3): 01-08.

Eukaryotic genomes are organized into multiple chromosomes, and studies have suggested that chromosomal organization is subject to evolutionary pressure, as demonstrated by the non-randomness of properties across chromosomes. However, a recent study provided evidence to suggest that chromosomal organization may be more random in unicellular than multicellular eukaryotes. Here, we examine the distribution of five peptide features (length, aromaticity, instability, hydropathy, and isoelectric point) across the 16 nuclear chromosomes of Saccharomyces arboricola H-6, a recently identified and sequenced unicellular eukaryote. Our results show that only hydropathy is not random across the chromosomes (F = 1.914, p-value = 0.018), thereby supporting the hypothesis that chromosomal organization may be random in unicellular eukaryotes.
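One way to test whether a peptide feature is random across chromosomes is to compute the feature for every peptide and compare chromosomes with a one-way ANOVA F statistic. The sketch below assumes hydropathy is measured as GRAVY (grand average of hydropathy, the mean Kyte-Doolittle value per residue); the abstract does not state the exact scale or test, and the function names and toy sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Mean Kyte-Doolittle hydropathy over the standard residues of a peptide."""
    values = [KYTE_DOOLITTLE[aa] for aa in peptide.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("peptide contains no standard amino acids")
    return sum(values) / len(values)


def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical peptides grouped by chromosome (not real S. arboricola data).
chromosome_peptides = {
    "chrI": ["MKVLAG", "MSTPQE"],
    "chrII": ["MLLIVF", "MDEKRN"],
    "chrIII": ["MAVLIW", "MQQSTY"],
}
groups = [[gravy(p) for p in peps] for peps in chromosome_peptides.values()]
f_statistic = one_way_anova_f(groups)
```

To turn F into a p-value, it would be compared with an F distribution on (k - 1, n - k) degrees of freedom; `scipy.stats.f_oneway` performs the whole test if SciPy is available.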

[63] Toh, BCY, Ling, MHT. 2023. Applications Utilizing CRISPR/Cas9. Novel Research in Sciences 14(1):NRS.000826.

The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system is an adaptive immune system used by prokaryotes, which has been adapted for many laboratory applications. Here, we illustrate applications of CRISPR/Cas9 in 7 areas: (i) genome engineering, (ii) editing of single-stranded RNA (ssRNA), (iii) high throughput gene screening, (iv) creating disease models, (v) live labelling of chromosomal loci, (vi) epigenome editing, and (vii) regulation of endogenous gene expression.

[64] Chia, VSQ, Ling, MHT. 2023. Potential Information Processing Differences in Male and Hermaphrodite Neural Networks of Caenorhabditis elegans. Medicon Medical Sciences 5(2): 53-59.

Connectome generally refers to connectivity at multiple scales: macroscale connectivity between anatomical areas of the brain, mesoscale connectivity between neurons, and synaptic connectivity at the microscale level. Studies have implicated macroscale connectomes in functional behaviours. Although macroscale connectomes are likely to affect functions via mesoscale connectomes, this has not been demonstrated. Recently, mesoscale connectomes of male and hermaphrodite Caenorhabditis elegans have been published. Here, we computationally simulate the mesoscale connectomes of male and hermaphrodite C. elegans to examine differences in information processing. Our results show that the number of significantly different neurons (n = 28, p-value < 0.05) is significantly higher than random (p-value = 0.00468), suggesting potential differences in information processing between male and hermaphrodite C. elegans. Hence, mesoscale connectome differences may result in information processing differences.

[65] Yap, SSK, Choy, WJ, Tan, RYH, Ling, MHT. 2024. Assembly of Single Substance Use Epidemiological Models. Acta Scientific Medical Sciences 8(1): 43-50.

Substance use/abuse is a public health concern with a long history, and mathematical modelling is an important tool to study its epidemiology. Recently, a study showed that adding 2 processes into a 6-compartment model with 15 processes can drastically affect the conclusions, illustrating the importance of a more complete but complicated model. A systematic review in 2022 presented 24 ordinary differential equation (ODE) models of substance use/abuse epidemiology. This study aims to assemble these 24 ODE models, for single substance use only, by stepwise analysis and assembly. Multiple substance uses and comorbidities are deemed out of scope. The assembled model consists of 11 compartments [(i) susceptible without or refusing health education (S), (ii) susceptible with or accepted health education (C), (iii) light drug users (L), (iv) heavy drug users (H), (v) users under in-patient treatment (Ti), (vi) users under out-patient treatment (To), (vii) users in remission (Re), (viii) drug sellers (D), (ix) susceptible who matured (M), (x) users who quit permanently (Q), and (xi) removed (R)] with 42 processes and 40 parameters. To improve its usability, we present the assembled model, SubstanceUseModel, as a Python command-line script in which model parameters can be changed using command-line arguments. This can form the basis for further model development in the field.

[66] Lao, S, Seow, SK, Ong, RT, Dave, VS, Ling, MHT. 2023. Systematic Review on the Effects of Food on Mental Health via Gut Microbiome. SciMedicine Journal 5(2): 81-91.

Recent studies have suggested that diet may affect the gut microbiome and subsequently influence mental health. While several systematic reviews have been done on the effects of diet on mental health via the gut microbiome, they are focused on either specific diets or specific mental disorders. This systematic review examines the effects of diet on broad-based mental health via the gut microbiome. Of 99 studies published prior to 2023 and listed in PubMed, 21 are included. Our analysis suggests that vegan diet, Mediterranean style diet, fibre, probiotics, dietary vitamin D, unpasteurised milk, foods with a low omega-3 to omega-6 ratio, and Xiao Yan San may have positive effects on the gut microbiome, leading to a positive influence on mental health, while meat-rich diet, high-fat diet, high fructose intake, and zinc deficiency may have negative effects on the gut microbiome, leading to a negative influence on mental health. Collectively, the effects of diet on mental health via the gut microbiome may be explained by the composition of the gut microbiome and by the effects of metabolites produced by the gut microbiome on gut permeability.

[67] Ong, RT, Lao, S, Seow, SK, Dave, VS, Ling, MHT. 2024. Systematic Review of PubMed Articles Prior to 2023 on Effects of Breakfast on School Performance. Medicon Medical Sciences 6(1): 11-25.

Breakfast has been touted as the most important meal of the day, especially for school-aged individuals who require energy and nutrients to support activities in school. However, the effects of breakfast on school performance have not been systematically reviewed. This study aims to fill this gap by conducting a systematic review on the effects of breakfast on school performance using articles indexed in PubMed prior to 2023. Of the original 94 articles, 41 (43.6%) were included. The majority of the studies concluded that regular breakfast allows students to perform better in school; hence, skipping breakfast, especially prolonged skipping of meals, is not encouraged for school-aged individuals, as it may impair the cognitive function required for education.

[68] Lum, AKY, Shanmugam, JH, Teo, W, Kwan, ZJ, Ng, SMH, Ling, MHT. 2024. Core Genome of Deinococcota Phylum from 72 Strains Across 40 Species Consist of Only One Gene, Beta Subunit of DNA-Directed RNA Polymerase. Medicon Microbiology 3(1): 03-06.

Deinococcota, or Deinococcus-Thermus, is a phylum of highly environmentally tolerant extremophiles with value in industrial applications and evolution studies. Phylogenetics using the core genome is an important aspect of evolutionary studies. However, the core genome of the Deinococcota phylum has not yet been identified. In this study, we report 6 species-specific core genomes of Deinococcota. However, the core genome of Deinococcota from 72 strains across 40 species consists of only one gene, the beta subunit of DNA-directed RNA polymerase. This surprisingly small core genome may be the result of tolerance to diverse environments and may suggest that sequence similarity alone may not be sufficient to identify core genomes.
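Once each strain's genes have been assigned to ortholog families, a core genome is the set of families shared by every strain, that is, a set intersection. The sketch below assumes the similarity-based family assignment has already been done (that step is not shown), and the helper names, strain names and family labels are hypothetical. It computes both species-level and phylum-level core genomes.

```python
from collections import defaultdict


def core_genome(strain_families):
    """Families present in every strain.

    strain_families: dict strain -> set of ortholog-family IDs.
    """
    if not strain_families:
        return set()
    return set.intersection(*strain_families.values())


def species_core_genomes(strain_families, strain_species):
    """Core genome computed separately for the strains of each species."""
    by_species = defaultdict(dict)
    for strain, families in strain_families.items():
        by_species[strain_species[strain]][strain] = families
    return {species: core_genome(members) for species, members in by_species.items()}


# Hypothetical strains and family labels, for illustration only.
strain_families = {
    "strain1": {"F1", "F2", "F3"},
    "strain2": {"F1", "F2"},
    "strain3": {"F1", "F3"},
}
strain_species = {"strain1": "species_X", "strain2": "species_X", "strain3": "species_Y"}

phylum_core = core_genome(strain_families)
per_species = species_core_genomes(strain_families, strain_species)
```

Because the intersection can only shrink as strains are added, a core genome taken across many diverse strains can collapse to a single family even when each species retains a larger core. This is the pattern the abstract describes.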

[69] Seow, SK, Dave, VS, Ong, RT, Lao, S, Ling, MHT. 2024. A 10-Year Systematic Review (2013 to 2022) on Effects of Diet on Migraine. EC Clinical and Medical Case Reports 7(2): 01-15.

Migraine, a type of headache characterized by moderate or severe throbbing pain on one side of the head, has sparked growing interest in relation to the role of diet. Although extensive research has been conducted over the years, few reviews have been done. Here, a systematic review is conducted to determine the diets, foods, and dietary patterns that worsen or reduce migraine, as well as the mechanisms behind them. Using articles indexed in PubMed within the last 10 years, from 2013 to 2022, 190 articles were identified. Of these, 45 articles were included in this review. After analysis, two distinct themes emerged, namely (a) diet, food, and dietary patterns that worsen migraine, and (b) diet, food, and dietary patterns that reduce migraine. The current body of literature shows that diet plays a critical role and exerts a notable influence on migraine. Diets that worsen migraine include pro-inflammatory diet, high-sodium diet, and high-fat diet. Foods that worsen migraine include meat, milk and dairy products, alcoholic beverages, and chocolate. Low meal frequency may also worsen migraine. On the other hand, diets that reduce migraine include ketogenic diet, Mediterranean diet, DASH diet, and MIND diet. Foods that reduce migraine include fruits and vegetables, as well as cold-water fatty fish. High meal frequency may also reduce migraine. Interestingly, caffeinated beverages may worsen or reduce migraine, depending on consumption.

[70] Kwan, ZJ, Teo, W, Lum, AKY, Ng, SMH, Ling, MHT. 2024. Ab Initio Whole Cell Kinetic Model of Stutzerimonas balearica DSM 6083 (pbmKZJ23). Acta Scientific Microbiology 7(2): 28-31.

Stutzerimonas balearica (formerly known as Pseudomonas balearica) is an environmentally tolerant bacterium with denitrification and bioremediation capabilities. Hence, it has been studied for industrial applications, such as high-value chemical production using metabolic engineering or synthetic biology approaches. Mathematical modelling has the potential to predict biological phenotypes under metabolic perturbations, which can be used to guide engineering approaches. However, there is no mathematical model of S. balearica to date. In this study, we present a simulatable whole-cell kinetic model of S. balearica DSM 6083, pbmKZJ23, constructed using an ab initio approach by identifying enzymes from its published genome. The resulting model consists of 737 metabolites, 533 enzymes, and 802 reactions, and can serve as a baseline model for incorporating other cellular and growth processes, or as a system to examine the cellular resource allocations necessary for engineering.