Feature Engineering - sagr4019/ResearchProject GitHub Wiki

What is Feature Engineering?

Feature engineering is a key part of machine learning. It is the process of transforming raw data into features that represent the underlying problem; these features act as inputs for machine learning models. Features depend on the problem at hand (e.g. features for image recognition are completely different from those for network traffic classification) and will influence your results: "The quality and quantity of the features will have great influence on whether the model is good or not." Feature engineering is difficult and time-consuming. One way to reduce this effort is feature extraction, which is described further below.


Feature relevance

Depending on the feature, it can be strongly relevant (it carries information that exists in no other feature), relevant, weakly relevant (it carries some information that other features also include) or irrelevant. It is important to create many candidate features: even if some of them turn out to be irrelevant, you cannot afford to miss the relevant ones. Afterwards, feature selection can be used to discard irrelevant features and prevent overfitting.
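As a minimal sketch of one such feature selection step (the threshold and toy data below are illustrative assumptions, not from the source), a (near-)constant feature column carries no information and is a typical first candidate to drop:

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def select_features(rows, threshold=0.0):
    """Return the indices of feature columns whose variance exceeds the threshold.

    A constant column (variance 0) is irrelevant: it cannot help the
    model distinguish between observations."""
    n_features = len(rows[0])
    columns = [[row[i] for row in rows] for i in range(n_features)]
    return [i for i, col in enumerate(columns) if variance(col) > threshold]

# Toy dataset: column 1 is constant across all observations.
data = [
    [2.0, 1.0, 0.5],
    [3.0, 1.0, 0.7],
    [2.5, 1.0, 0.1],
]
print(select_features(data))  # [0, 2] -- the constant column is dropped
```

Real pipelines use richer criteria (correlation with the target, mutual information), but the principle is the same: generate broadly, then filter.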

Better features mean flexibility.

You can choose “the wrong models” (less than optimal) and still get good results. Most models can pick up on good structure in data. The flexibility of good features will allow you to use less complex models that are faster to run, easier to understand and easier to maintain. This is very desirable.

Better features mean simpler models.

With well engineered features, you can choose “the wrong parameters” (less than optimal) and still get good results, for much the same reasons. You do not need to work as hard to pick the right models and the most optimized parameters. With good features, you are closer to the underlying problem and a representation of all the data you have available and could use to best characterize that underlying problem.

What are Features?

A feature is typically a specific representation on top of raw data: an individual, measurable attribute, usually depicted by a column in a dataset. In a generic two-dimensional dataset, each observation is depicted by a row and each feature by a column, which holds a specific value for that observation.

Source: https://towardsdatascience.com/understanding-feature-engineering-part-1-continuous-numeric-data-da4e47099a7b
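As a minimal illustration of this row/column view (the feature names and values below are made up), a tabular dataset can be seen as rows of observations indexed by feature columns:

```python
# Toy two-dimensional dataset: each inner list is one observation (row),
# each position in the list is one feature (column).
feature_names = ["duration_s", "bytes_sent", "packet_count"]
observations = [
    [0.5, 1024, 12],
    [2.1, 4096, 40],
    [0.2,  512,  6],
]

# Extract one feature (a column) across all observations.
bytes_sent = [row[feature_names.index("bytes_sent")] for row in observations]
print(bytes_sent)  # [1024, 4096, 512]
```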


Subforms of Feature Engineering

Feature Extraction:

Feature extraction is the automatic construction of new features from raw data. Some observations are far too voluminous in their raw state to be modeled directly by predictive modeling algorithms. For more information see FeatureExtractionUsingConvolution.

Common examples include image, audio, and textual data, but could just as easily include tabular data with millions of attributes.

Feature extraction is a process of automatically reducing the dimensionality of these types of observations into a much smaller set that can be modelled.

For tabular data, this might include projection methods like Principal Component Analysis and unsupervised clustering methods. For image data, this might include line or edge detection. Depending on the domain, image, video and audio observations lend themselves to many of the same types of digital signal processing (DSP) methods.
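As a sketch of such a projection method (assuming NumPy is available; the toy data are made up), Principal Component Analysis reduces tabular data to fewer columns by projecting onto the eigenvectors of the feature covariance matrix:

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)            # center each feature column
    cov = np.cov(X_centered, rowvar=False)     # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    top = eigvecs[:, ::-1][:, :n_components]   # keep components with most variance
    return X_centered @ top

# Toy data: 4 observations with 3 correlated features, reduced to 2.
X = np.array([
    [2.0,  4.1, 1.0],
    [3.0,  6.2, 1.1],
    [4.0,  7.9, 0.9],
    [5.0, 10.0, 1.0],
])
X_reduced = pca(X, n_components=2)
print(X_reduced.shape)  # (4, 2)
```

The reduced columns are the new, automatically constructed features: they summarize the variance of the original attributes in a much smaller set that can be modelled directly.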

Key to feature extraction is that the methods are automatic (although may need to be designed and constructed from simpler methods) and solve the problem of unmanageably high dimensional data, most typically used for analog observations stored in digital formats.


References

Discover Feature Engineering, How to Engineer Features and How to Get Good at It

Understanding Feature Engineering

The Art of Feature Engineering