<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml"><head/><body><content id="9781788393485/0ac1eacf_3e00_4944_b261_0f403e63f0d5_xhtml" format="xhtml" nodetypeset="all" previewonerror="true"><section>
<header>
<h1 class="epub__header-title"><a name="Transforming Your"></a>Transforming Your Data</h1>
</header>
<article>
<p><a name="can be"></a>Real-world datasets vary widely: variables can be textual, numerical, or categorical, and observations can be missing, false, or anomalous (outliers). To perform a proper data analysis, we need to understand how to parse data correctly, clean it, and build an output matrix suited to machine learning analysis. To extract knowledge, the reader must be able to create an observation matrix using different data analysis and cleaning techniques.</p>
<p><a name="extract features"></a>In this chapter, we'll present Cloud Dataprep, a service for preprocessing data, extracting features, and cleaning up records. We'll also cover Cloud Dataflow, a service for implementing streaming and batch processing. We'll go into some practical details with real-life examples. We'll start by discovering different ways to transform data and how much cleaning the data requires. We will then analyze the techniques available for preparing the most suitable data for analysis and modeling, including imputing missing data, detecting and eliminating outliers, and adding derived variables. Finally, we will learn how to normalize the data, a process that removes the units of measurement so that data from different locations can be compared easily.</p>
<p><a name="this chapter"></a>In this chapter, we will cover the following topics:</p>
<ul>
<li>Different ways to transform data</li>
<li>How to organize data</li>
<li>Dealing with missing data</li>
<li>Detecting outliers</li>
<li>Data normalization</li>
</ul>
<p><a name="able to"></a>By the end of the chapter, we will be able to prepare data so that its information content is best exposed to the regression tools. We'll learn how to apply transformation methods to our own data and how these techniques work. We'll discover how to clean the data, identify and handle missing entries, and work with outliers. We'll also learn how to use normalization techniques to compare data from different locations.</p>
</article>
</section></content></body></html>