Predicting the house prices by Saria - UMKCNSF/UMKC--HACKATHON GitHub Wiki
Use case id: 16-UM-HousePrice
Title: Predicting the house prices
Presenter: Saria Goudarzvand
House Price Prediction
This use case challenges you to predict the final price of a house. There are 79 columns in the given dataset. One important step in any analysis is deciding which features to include in it. Feature engineering can aid this process, and here it also lets you add or update a crime rate feature. You are welcome to experiment with additional features to see how different combinations change the result.
As you know, the crime rate of an area can change over time, and increased crime generally results in lower housing prices. Sadly, datasets are not available online that capture the impact of recent crime rates on housing prices. By querying the web, you can find information related to certain areas/neighborhoods. If the text is positive, the sale price should increase by a point. However, if it is negative, the sale price should decrease by a point. It is up to you to decide whether you want to create a crime rate feature column with positive/negative values, analyze the dataset, or change the price column in the dataset at some point.
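As a minimal sketch of the first option, the snippet below adds a crime rate column to rows held as plain dictionaries (the shape `csv.DictReader` produces). The function name `add_crime_feature`, the column name `CrimeRate`, and the sample rows and point values are all made up for illustration. `Neighborhood` and `SalePrice` are assumed to be the relevant column names in the dataset.

```python
def add_crime_feature(rows, neighborhood_points, default=0):
    """Return copies of the rows with a 'CrimeRate' column added.

    neighborhood_points maps a neighborhood name to +1 (positive text),
    -1 (negative text) or 0 (neutral). Unknown neighborhoods get `default`.
    """
    updated = []
    for row in rows:
        new_row = dict(row)  # copy, so the original data stays untouched
        new_row["CrimeRate"] = neighborhood_points.get(row["Neighborhood"], default)
        updated.append(new_row)
    return updated


# Invented sample data, only to show the shape of the input and output.
rows = [
    {"Neighborhood": "A", "SalePrice": 200000},
    {"Neighborhood": "B", "SalePrice": 150000},
    {"Neighborhood": "C", "SalePrice": 180000},
]
points = {"A": 1, "B": -1}  # hypothetical results of the text analysis

rows_with_crime = add_crime_feature(rows, points)
for row in rows_with_crime:
    print(row)
```

Keeping the points in a separate column, rather than editing the price directly, makes it easy to train the model with and without the feature later.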
Let’s have a look at the steps involved in the process:
- Pre-process the data (e.g. delete non-informative columns and rows with null values).
- Explore the data and engineer features to find the features most positively correlated with the target (Actual Price).
- Split the dataset into training and test subsets.
- Build a regression model (you are free to apply any advanced model such as GradientBoostingRegressor).
- Evaluate the performance of your model. Since you have probably added at least one feature, you should evaluate your model with and without that feature. You can use accuracy or RMSE to evaluate your model. (RMSE, the root mean squared error, measures how far the predicted values are from the actual values.)
- Visualize the results.
- There will be additional credit if you improve the result by restricting the features that have less impact on the result (regularization).
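Three of the steps above (ranking features by correlation, splitting the data, and computing RMSE) can be sketched in plain Python. This is only an illustration under simple assumptions: the helper names `pearson`, `split_rows`, and `rmse` are hypothetical, and in practice you would probably rely on a library such as pandas or scikit-learn instead.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows reproducibly and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

To compare the model with and without the crime feature, call `rmse` on the test-set predictions of both models; the lower value indicates the better fit.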
Detailed information about how to update the crime feature column
You can use the Google API to search for a string or zip code and get the most relevant information. Once you get the results, you have several options for processing them. The simplest approach is to take the first result and analyze its text to see whether it is positive or negative.
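The simplest approach could look roughly like the sketch below. It assumes you have already extracted the result texts into a list of strings. The word lists are hypothetical placeholders, and a real solution would likely use a proper sentiment analysis tool rather than keyword counting.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
POSITIVE_WORDS = {"safe", "safer", "quiet", "peaceful", "improved", "declined"}
NEGATIVE_WORDS = {"dangerous", "unsafe", "violent", "robbery", "burglary", "shooting"}


def sentiment_point(text):
    """Return +1 for positive text, -1 for negative text, 0 otherwise."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return 1
    if negatives > positives:
        return -1
    return 0


def point_for_first_result(result_texts):
    """Score only the first search result; no results counts as neutral."""
    if not result_texts:
        return 0
    return sentiment_point(result_texts[0])
```

The returned point can then be stored per neighborhood or zip code and used as the value of the crime feature.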
Dataset:
https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
API usage:
API URL Explanations: https://developers.google.com/places/web-service/search
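As a rough sketch of how a request could be put together, the snippet below builds a search URL without sending it. It assumes the Text Search endpoint and its `query` and `key` parameters as described on the page linked above, so please check that page for the current form. The function name `build_search_url`, the zip code, and `YOUR_API_KEY` are placeholders.

```python
from urllib.parse import urlencode

# Assumed Text Search endpoint of the Google Places web service.
TEXT_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"


def build_search_url(query, api_key):
    """Return the request URL for searching a string or zip code."""
    params = {"query": query, "key": api_key}
    return f"{TEXT_SEARCH_URL}?{urlencode(params)}"


# Placeholder values; replace them with your own search string and key.
url = build_search_url("crime rate 64110", "YOUR_API_KEY")
print(url)
```

The URL can then be fetched with any HTTP client, and the JSON response parsed to pick out the first result.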
Good luck
Questions:
Please create an issue on GitHub with the use case id, or send an email to: [email protected]