Literature Review - CankayaUniversity/ceng-407-408-2020-2021-Monitoring-System-of-Water-Quality-and-Efficiency-of-Wastewater-Treatment GitHub Wiki
Abdulkerim GÜVEN, Alp ÖZEREN, M. Kayhan ARICAN, Oğuzhan SALTIK
{c1711033, c1711051, c1611004, c1611048}@student.cankaya.edu.tr
Department of Computer Engineering, Çankaya University
Nov 5, 2020, version 1.0
Table of Contents
- Table of Contents
- Abstract
- Introduction
- Related Work
- Proposed System
- REFERENCES
Abstract
In this project, we aim to design a web-based monitoring system for water quality and treatment efficiency, to be used for decision-making involving wastewater treatment plants. Our web-based system will visualize historical and current water quality data, and machine learning algorithms will be used to predict water quality. The project has two parts: analyzing water quality data for rivers, lakes, and seas all around Turkey, and analyzing data from water treatment plants. The data for water treatment plants include samples taken from both the inlets and outlets of these plants.
Introduction
Water is the most important source of life. Water covers 71% of the Earth's surface, and only 3% of that water is fresh. Humankind has always settled near freshwater sources, and water has had an enormous effect on human life throughout history. Even in ancient times, people found ways to purify water or to keep it clean [1]. Considering that people drink about 2-3 liters of water per day, they need to be sure that the water is clean and drinkable. Water that contains no toxic chemicals is a habitat for microorganisms. Although most microorganisms are harmless, some viruses and bacteria can harm health [2]. There are also toxic inorganic substances that cannot be tolerated.
Climate change and the growing demand for water from a rapidly increasing population, industrialization, agriculture, and other sectors are putting serious pressure on the quality and quantity of water resources. For these reasons, managing and monitoring water, and detecting potential dangers before they affect it, are highly important for protecting and cleaning water resources.
Therefore, the Ministry of Environment and Urban Planning monitors the physical, chemical, and biological parameters of important rivers, lakes, drainage channels, and seas inside the Special Environmental Protection Areas (SEPA).
There are 19 wastewater plants that analyze samples taken from 254 SEPA spots. The gathered data are stored in a database. However, there is no front-end software for accessing, processing, or visualizing the data. Hence, the frequently changing needs of institutions regarding what data to collect and how to report the findings are not satisfied.
Managers and analysts need operational tools that help them understand complex information about water quality. Tools based on statistical approaches are often unable to conduct a detailed analysis because of sparse data and interactions between analysis results that are not visible. In this project, the large and complex data from samples taken in this area since 2005, in cooperation with public institutions, will be visualized. The visualization module will include basic data management functions such as time series management, spatial selection and representation, data availability assessment, and data series comparison (visual and statistical).
Improved prediction accuracy and reduced computational complexity are vital for precise control over water quality. For this purpose, our aim is to develop a machine learning model that effectively predicts water quality and to establish an early warning system for water pollution. In this study, artificial neural network algorithms (such as WNN and BPNN) will be trained on the collected data with the goal of obtaining a model with high prediction sensitivity. Also, the parameters of each treatment plant and the performance of artificial intelligence techniques (e.g. transfer learning) will be evaluated in order to create a water quality prediction model for that plant and for plants with similar parameters.
Related Work
Machine Learning Methods for Better Water Quality Prediction
In this research paper [3], the dataset was created from 4 monitoring stations on the Johor River in Johor State, Malaysia. A comparison is made between the following machine learning algorithms: WDT-ANFIS, ANFIS, RBF-ANN, and MLP-ANN. Because of noise in the data, it is relatively difficult to make an accurate prediction. Hence, the authors recommend a neuro-fuzzy inference system augmented with a wavelet de-noising technique (WDT-ANFIS), which depends on historical data of the water quality parameter.
Dataset and Data Processing
Selecting the input variables for a model is very important for Artificial Neural Networks. The following water quality parameters were chosen for ANN modelling: temperature, electrical conductivity, salinity, nitrate (NO3), turbidity, phosphate (PO4), chloride (Cl), potassium (K), sodium (Na), magnesium (Mg), iron (Fe), and Escherichia coli (E. coli). These input parameters were used in many previous studies for ANN models [4,5,6]. Using these parameters, pH, suspended solids (SS), and ammoniacal nitrogen (AN) can be predicted.
Model Performance
There are three models in total, one for each primary water quality parameter: AN, SS, and pH. The performance of the models is measured with the Coefficient of Efficiency (CE). Mean Square Error (MSE) is used to assess how closely the network output fits the desired output; smaller MSE values indicate better performance. The Coefficient of Correlation (CC) is employed to inspect the linear relationship between the measured and predicted dissolved oxygen in the water. Under this methodology, the WDT-ANFIS models outperformed the others.
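The three measures named above can be sketched in a few lines. This is a minimal illustration, not the code used in [3]: CE is assumed here to take the Nash-Sutcliffe form and CC the Pearson form, and the function names and sample readings are hypothetical.

```python
import math


def mean_square_error(observed, predicted):
    """Average squared difference between outputs; smaller is better."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def coefficient_of_efficiency(observed, predicted):
    """Assumed Nash-Sutcliffe form: 1.0 is a perfect fit, 0.0 is no better
    than always predicting the mean of the observations."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def coefficient_of_correlation(observed, predicted):
    """Assumed Pearson form: strength of the linear relationship."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical pH readings, for illustration only.
measured = [7.1, 7.4, 6.9, 7.8]
estimated = [7.0, 7.5, 7.0, 7.6]
print(mean_square_error(measured, estimated))
print(coefficient_of_efficiency(measured, estimated))
print(coefficient_of_correlation(measured, estimated))
```

Note that a prediction with a constant offset can still have CC equal to 1 while its CE is low, which is one reason to report the measures together.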
Study of Short-Term Water Quality Prediction Model Based on Wavelet Neural Network
This research paper [7] combines the wavelet transform with a Back Propagation (BP) neural network to build a short-term water quality prediction model. The trained model is used to predict the water quality of freshwater pearl breeding ponds in Duchang County, Jiangxi province, China. The paper also compares the Elman Neural Network, the Wavelet Neural Network (WNN), and a BP network. The proposed model features a high learning speed and improved accuracy.
Dataset and Data Processing
The dataset consists of measurements taken from Jishan Lake in Duchang County, Jiangxi province, China. The study used ecological environment monitoring data from a mussel aquaculture pond as research samples; each sample includes solar radiation, water temperature, dissolved oxygen, pH, humidity, and wind speed. The sampling period was from July 21 to July 27, 2010. Data were collected every 60 minutes for a total of 168 samples, of which 144 were used as the training set and 24 as the test set. The data were normalized because the variables have different scales, which reduced the influence of these scale differences on prediction performance. The model inputs were the dissolved oxygen, pH, temperature, humidity, wind speed, and solar radiation of the first half-hour, and the outputs were the subsequent dissolved oxygen values. Once properly trained, the model could predict dissolved oxygen in freshwater pearl aquaculture ponds.
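The preprocessing described above can be sketched as follows. The paper [7] is not quoted here on which normalization it uses or how the split was drawn, so min-max scaling and a chronological split are assumptions, as are the function names and the synthetic series.

```python
def min_max_normalize(values):
    """Rescale a series to [0, 1] so variables with different units
    contribute on a comparable scale (min-max scaling is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(samples, n_train):
    """Keep time order: the first n_train samples train, the rest test."""
    return samples[:n_train], samples[n_train:]


# Hypothetical hourly dissolved oxygen series: 7 days x 24 readings = 168.
series = [5.0 + 0.01 * i for i in range(168)]
scaled = min_max_normalize(series)
train, test = chronological_split(scaled, 144)
print(len(train), len(test))  # 144 24
```

In practice the minimum and maximum would be taken from the training portion only and then reused for the test portion, so that no information from the test period leaks into training.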
Model Performance
Model performance was measured by Absolute Percentage Error (APE) and Mean Absolute Percentage Error (MAPE). The Wavelet Neural Network (WNN) outperformed BPNN and Elman NN, achieving a significantly lower APE. The model accuracy was greater than 90%. As shown in Figure 1, WNN also has higher prediction precision and stronger learning and generalization ability than BPNN and Elman NN [7].
Figure 1: WNN compared to BPNN, Elman NN, and Actual Data. (x axis: time in minutes and seconds, y axis: Predicted dissolved oxygen mg/L)
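For reference, APE and MAPE can be computed as in the sketch below. The formulas are the standard definitions; the function names and the sample values are hypothetical and are not taken from [7].

```python
def absolute_percentage_errors(actual, predicted):
    """APE per sample: |actual - predicted| / |actual|, as a percentage."""
    return [100.0 * abs(a - p) / abs(a) for a, p in zip(actual, predicted)]


def mean_absolute_percentage_error(actual, predicted):
    """MAPE: the average of the per-sample APE values."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(errors) / len(errors)


# Hypothetical dissolved oxygen values in mg/L, for illustration only.
actual = [10.0, 20.0]
predicted = [9.0, 22.0]
print(absolute_percentage_errors(actual, predicted))
print(mean_absolute_percentage_error(actual, predicted))
```

Both measures are undefined when an actual value is zero, which is worth checking before applying them to parameters that can legitimately be zero.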
Prediction of water quality time series data based on least squares support vector machine
This paper [8] proposes using the least squares support vector machine (LS-SVM) algorithm to construct a non-linear time series forecasting model for predicting water quality.
Dataset and Data Processing
The authors use a small number of samples (the actual number is not reported) provided by the Beijing Water Authority. After the data points were normalized, random white noise with a variance of 0.01 was added to the training samples.
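The noise step can be illustrated with a short sketch. It assumes the white noise is zero-mean Gaussian, which the summary above does not state; a variance of 0.01 then corresponds to a standard deviation of 0.1. The function name and the seed are hypothetical.

```python
import math
import random


def add_white_noise(values, variance=0.01, seed=0):
    """Add zero-mean Gaussian noise (an assumption) with the given variance."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    std = math.sqrt(variance)  # variance 0.01 gives standard deviation 0.1
    return [v + rng.gauss(0.0, std) for v in values]


# Hypothetical normalized training values.
clean = [0.2, 0.4, 0.6, 0.8]
print(add_white_noise(clean))
```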
Model Performance
A comparative study of prediction is performed using the LS-SVM algorithm, Backpropagation (BP), and Radial Basis Function (RBF) network methods. The predicted values from the three models are compared with the true values by examining the percent of deviation. Since the LS-SVM model has the lowest average deviation from the true value, it is considered the best of the three. The paper argues that the LS-SVM model can take full advantage of the distribution of the training samples and has a better ability to process small samples. It concludes that LS-SVM has a lower root mean square error and mean relative error than the other methods, has high prediction accuracy, and is applicable to real-time water quality data with a small sample.
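To make the LS-SVM idea concrete, the sketch below fits an LS-SVM regressor by solving its standard linear system instead of a quadratic program. It is not the implementation from [8]: the RBF kernel, the values of the regularization parameter gamma and the kernel width sigma, and all function names are assumptions chosen for illustration.

```python
import math


def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    dist2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-dist2 / (2.0 * sigma ** 2))


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_lssvm(xs, ys, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T."""
    n = len(xs)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf_kernel(xs[i], xs[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        system.append(row)
    solution = solve_linear_system(system, [0.0] + list(ys))
    return solution[0], solution[1:]  # bias b, support values alpha


def predict_lssvm(x, xs, bias, alphas, sigma=1.0):
    """f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return bias + sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alphas, xs))


# Hypothetical one-dimensional training data, for illustration only.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 2.0, 1.5, 3.0]
bias, alphas = fit_lssvm(xs, ys)
print(predict_lssvm([1.5], xs, bias, alphas))
```

Because training reduces to one linear solve over as many unknowns as there are samples, this formulation is convenient for the small sample sizes the paper emphasizes.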
Proposed System
The Wastewater Monitoring project will consist of two modules. The first module will be responsible for monitoring the water quality by providing visualizations of physical, chemical and biological factors taken from the dataset. The second module will be responsible for predicting the water quality using machine learning, based on the current data.
In the following sections, the properties of the dataset are given and the two planned modules are explained.
Dataset
The dataset contains the data gathered since 2005 from testing important rivers, lakes, drainage channels, and marine areas in 245 Special Environmental Protection Areas in Turkey in terms of physical, inorganic-chemical, and organic parameters.
In the dataset, we have samples from rivers, seas, lakes, and water treatment plants. Sampling at water treatment plants started in 2011. The number of samples taken from all of the water sources has increased each year; for example, 15 samples were taken from rivers in 2005 and 32 in 2006. All samples have the columns SAMPLE_NAME, REGION_NAME, LOCATION, X_UTM, Y_UTM, and DATE, which describe where and when the sample was taken, as well as TEMPERATURE_C. The dataset also contains the following values for all samples: pH, SALINITY_THOUSANDTH, DISSOLVED_OXYGEN_MGL, and DISSOLVED_OXYGEN_PERCENT. For water treatment plants we also have ELECTRICAL_CONDUCTIVITY, BOD_MG (biochemical oxygen demand), COD_MG (chemical oxygen demand), TOTAL_SUSPENDED_SOLID_MGL, TOTAL_NITROGEN_MGL, TOTAL_PHOSPHORUS_MGL, TOTAL_COLIFORM_CFU_100ML, FECAL_COLIFORM_CFU_100ML, and FLOW_RATE. However, the dataset has some null values in alternating years, and sometimes there are extra sample parameters for the same sample location.
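Since null values vary by year, a first data availability check could count missing values per year and column. The sketch below assumes each sample has already been loaded as a dictionary keyed by the column names above, that DATE is a `datetime.date`, and that missing values are represented as `None`; the rows themselves are invented for illustration.

```python
from collections import defaultdict
from datetime import date


def null_counts_by_year(rows):
    """Return {year: {column: number of missing values}} for the given rows."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        year = row["DATE"].year
        for column, value in row.items():
            if value is None:
                counts[year][column] += 1
    return {year: dict(columns) for year, columns in counts.items()}


# Hypothetical rows; real rows would carry all columns listed above.
rows = [
    {"SAMPLE_NAME": "S1", "DATE": date(2005, 6, 1), "pH": 7.9,
     "DISSOLVED_OXYGEN_MGL": None},
    {"SAMPLE_NAME": "S2", "DATE": date(2005, 7, 1), "pH": None,
     "DISSOLVED_OXYGEN_MGL": None},
    {"SAMPLE_NAME": "S3", "DATE": date(2006, 6, 1), "pH": 8.1,
     "DISSOLVED_OXYGEN_MGL": 6.4},
]
print(null_counts_by_year(rows))
```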
The dataset is provided in Microsoft Access 2003 (.mdb) format. To visualize the data and train machine learning algorithms, we need to access the data inside these ".mdb" files. Since this file format is proprietary and has no public specification, there are only a couple of ways to obtain the data programmatically. The first is to write a Microsoft Access extension in “Visual Basic for Applications” and ask the users of the software to export the data with it to XML, which we can then read easily. The second is to ask the users to install the Access Database Driver that Microsoft provides and then use this driver to access the file contents as if the file itself were a database.
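Both routes can be sketched in Python. The sketch makes several assumptions: the table name `SAMPLES` and the file path are hypothetical, the exported XML is assumed to hold one element per row with one child element per field, and the driver route relies on the third-party `pyodbc` package together with the installed Microsoft Access ODBC driver, so that part only runs on a machine where both are present.

```python
import xml.etree.ElementTree as ET


def rows_from_exported_xml(xml_text, table):
    """Read rows from an XML export, assuming one <table> element per row
    with one child element per field."""
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text for field in row} for row in root.iter(table)]


def access_connection_string(mdb_path):
    """Build an ODBC connection string for the Microsoft Access driver."""
    return "DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};" + f"DBQ={mdb_path};"


def rows_via_driver(mdb_path, table):
    """Query the .mdb file through ODBC; requires pyodbc and the driver."""
    import pyodbc  # third-party, imported lazily so the XML route works alone

    connection = pyodbc.connect(access_connection_string(mdb_path))
    try:
        cursor = connection.cursor()
        cursor.execute(f"SELECT * FROM [{table}]")
        columns = [column[0] for column in cursor.description]
        return [dict(zip(columns, record)) for record in cursor.fetchall()]
    finally:
        connection.close()


# Hypothetical export containing a single row of a table named SAMPLES.
exported = """<dataroot>
  <SAMPLES><SAMPLE_NAME>S1</SAMPLE_NAME><pH>7.9</pH></SAMPLES>
</dataroot>"""
print(rows_from_exported_xml(exported, "SAMPLES"))
print(access_connection_string(r"C:\data\sepa.mdb"))
```

The XML route returns every field as text, so numeric columns would still need to be converted before plotting or training.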
Water Quality Monitoring System
The governmental agency, The Ministry of Environment and Urban Planning, needs a reporting system that visualizes the observations from important rivers, lakes, and marine special environmental protection areas (SEPA) in Turkey.
The reports are then used for decision-making. In the past, the reports and the visualizations in them were prepared manually. Our goal is to develop a web-based reporting system for SEPA that will automatically read the observation dataset and present test results to decision-makers. This system will have detailed filtering features and will be able to perform data visualization. The data collected from the field will be used to produce visualizations for statistical modeling. Data visualization helps decision-makers identify features that are not easily noticed by statistical models or humans, such as outlier parameter values and missing values. Visualizations also support correlation analysis and help determine the relationships between dependent variables [9].
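As an illustration of the kind of features a chart could highlight, the sketch below flags missing and outlying readings in one parameter series. The interquartile range rule with a factor of 1.5 is an assumption chosen for the example and is not a method prescribed by the agency or by [9]; the function name and readings are hypothetical.

```python
import statistics


def missing_and_outliers(values, k=1.5):
    """Return (indices of missing values, indices of IQR-rule outliers)."""
    missing = [i for i, v in enumerate(values) if v is None]
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [
        i for i, v in enumerate(values)
        if v is not None and (v < low or v > high)
    ]
    return missing, outliers


# Hypothetical pH readings with one gap and one implausible value.
readings = [7.0, 7.1, None, 7.2, 7.3, 7.1, 7.0, 7.2, 14.0]
print(missing_and_outliers(readings))
```

The returned indices could be used to mark the corresponding points on a time series plot so that decision-makers see them at a glance.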
In the reports for the years 2005-2009 provided to us by the agency, the tables were often plotted as bar charts. The graphics were created from samples taken from the inlets and outlets of wastewater treatment plants, as well as from different points of seas, lakes, and rivers. The chart parameters vary by year and region, but generally include pH, Temperature (°C), Light Transmittance (m), Dissolved Oxygen (mg/L), O2 (%), Ammonia (mg/L), Total Phenol (mg/L), Total Coliform (CFU/100 mL), Fecal Coliform (CFU/100 mL), Fecal Streptococcus (CFU/100 mL), Oil-Grease (mg/L), Color (Pt-Co), and Odor (TON).
There are some studies on water quality visualization. One of them used old-fashioned interactive maps and various types of plotting [10]. It also used tables, sorted by year, whose parameters are somewhat similar to those used to determine water quality in the reports provided to us by the agency: total phosphorus, total nitrogen, electrical conductivity, pH, and dissolved oxygen.
A modern, user-friendly interface and more effective, easy-to-understand graphics will be produced, taking into account the types of graphics and parameters used previously. Also, the locations where the test samples were taken will be displayed on interactive maps.
Tools and Frameworks
There are multiple promising options for generating charts and creating user interfaces. The tools being considered include, but are not limited to, Electron, React.js, Chart.js, ASP.NET, Flask, and Qt. We are weighing all the options and will take the agency's input into account, since the end product will run in their environment.
Water Quality Prediction
The second goal of the project is to predict the future properties of a water sample. These predicted properties can then be used to predict the water quality and inform the water treatment plant.
Water quality is affected by many parameters, and these parameters have complex non-linear relationships with each other and with water quality. For this reason, traditional data processing techniques are no longer efficient enough for forecasting how the water should be treated.
Tools and Frameworks
For this project, we decided to use the Python programming language to implement the machine learning algorithms. The machine learning models will be trained locally. As the machine learning framework, we decided to use TensorFlow because, after training, the model can easily be used in TensorFlow.js, which makes it very easy to run in a browser.
REFERENCES
- [1] APEC. 'The History of Clean Drinking Water', 2018. [Online]. Available: https://www.freedrinkingwater.com/resource-history-of-clean-drinking-water.htm [Accessed: 2020/11/01]
- [2] Minnesota Department of Health, 'Bacteria, Viruses, and Parasites in Drinking Water', 2019. [Online PDF]. Available: https://www.health.state.mn.us/communities/environment/water/docs/contaminants/parasitesfactsht.pdf [Accessed: 2020/11/01]
- [3] A. N. Ahmed, F. B. Othman, H. A. Afan, R. K. Ibrahim, C. M. Fai, M. S. Hossain, M. Ehteram, and A. Elshafie, “Machine learning methods for better water quality prediction,” Journal of Hydrology, vol. 578, p. 124084, Aug. 2019.
- [4] J.-T. Kuo, M.-H. Hsieh, W.-S. Lung, and N. She, “Using artificial neural networks for reservoir eutrophication prediction,” Ecological Modelling, vol. 200, no. 1-2, pp. 171–177, 2007. Retrieved from: https://www.sciencedirect.com/science/article/abs/pii/S0304380006002985?via%3Dihub
- [5] A. Zaqoot, A. K. Ansari, M. A. Unar, and S. H. Khan, “Prediction of dissolved oxygen in the Mediterranean Sea along Gaza, Palestine – an artificial neural network approach,” Water Science and Technology, vol. 60, no. 12, pp. 3051–3059, 2009. Retrieved from: https://iwaponline.com/wst/article-abstract/60/12/3051/13774/Prediction-of-dissolved-oxygen-in-the?redirectedFrom=fulltext
- [6] Sengorur, B., Dogan, E., Koklu, R., Samandar, A. "Dissolved Oxygen Estimation using Artificial Neural Network for Water Quality Control", Electronic Letters on Science and Engineering 1, pp. 13-16, 2005. Retrieved from: https://dergipark.org.tr/en/pub/else/issue/29326/313793
- [7] L. Xu and S. Liu, “Study of short-term water quality prediction model based on wavelet neural network,” Mathematical and Computer Modelling, 22-Dec-2012. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0895717712003676. [Accessed: 01-Nov-2020].
- [8] Tan, G., Yan, J., Gao, C. and Yang, S. “Prediction of water quality time series data based on least squares support vector machine”, Procedia Engineering, 31, pp.1194-1199. 2012.
- [9] Unwin, A. (2020). Why is Data Visualization Important? What is Important in Data Visualization? Harvard Data Science Review, 2(1). Retrieved from: https://doi.org/10.1162/99608f92.8ae4d525
- [10] Ramsay, Ian & Shen, S. & Tennakoon, S.. (2009). Water Quality Visualisation and Tracking - Generic Decision Support Tool. Retrieved from: https://www.researchgate.net/publication/237627349_Water_Quality_Visualisation_and_Tracking_-_Generic_Decision_Support_Tool