Week 08 (W2 Jan11) London RE - Rostlab/DM_CS_WS_2016-17 GitHub Wiki
Summary:
Over the past weeks, we developed several more descriptive features for our dataset and are now looking into building predictive features from all of the available data attributes. We will also adjust the existing descriptive attributes where needed and create new ones when required. A short summary of the work done over this period:
- Extracted geocoded values and other vital information for all public transit stations in London.
- For each listing address, calculated the distance to the nearest public transport point.
- Took last feedback into account and revised the Area measurement attribute(s).
Details of work done:
Public Transit data and distance of the nearest station from each listing:
Using OpenStreetMap, we extracted key information about the metro and public transit system. Just like Munich, London is divided into zones that form rings, starting in the middle of the city and spreading outwards. Most of our listings, approximately 80%, lie between zone 1 and zone 2.5.
Location of the Public Transit points:
Location of the property listings:
The majority of the listings have a public transit point within a 1-2 kilometer radius. Because the distances vary so little from listing to listing, it is difficult to draw a conclusive correlation between price and distance alone.
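The nearest-station distance described above can be sketched as follows. This is only a minimal illustration, assuming great-circle (haversine) distance on latitude/longitude pairs; the function names and the station tuples are hypothetical, and the actual computation used for the dataset may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(listing, stations):
    """Return (station_name, distance_km) for the station closest to a listing.

    listing:  (lat, lon) of the property (hypothetical layout)
    stations: iterable of (name, lat, lon) tuples (hypothetical layout)
    """
    lat, lon = listing
    return min(
        ((name, haversine_km(lat, lon, s_lat, s_lon))
         for name, s_lat, s_lon in stations),
        key=lambda pair: pair[1],
    )


# Made-up coordinates, for illustration only
stations = [("station_a", 51.50, -0.10), ("station_b", 51.60, -0.10)]
name, dist_km = nearest_station((51.51, -0.10), stations)
```

A linear scan over all stations is fine for a few hundred transit points; a spatial index would only matter for much larger inputs.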
Histogram of Nearest Public Station:
As one can see, most listings are within 1-2 kilometers of a public transit point.
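The counts behind such a histogram can be reproduced with a simple binning step. The sketch below assumes 1 km wide bins and a plain list of distances in kilometers; `bin_distances` is a hypothetical helper, not the plotting code used for the figure.

```python
import math
from collections import Counter


def bin_distances(distances_km, bin_width_km=1.0):
    """Count distances per bin; keys are (lower, upper) bin edges in km."""
    counts = Counter(math.floor(d / bin_width_km) for d in distances_km)
    return {
        (i * bin_width_km, (i + 1) * bin_width_km): counts[i]
        for i in sorted(counts)
    }


# Made-up distances, for illustration only
histogram = bin_distances([0.2, 1.1, 1.9, 1.5, 2.3])
```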
Rework on the property area measurement:
We incorporated the feedback received on the last wiki. However, predicting the area for listings that do not mention it failed: only about 35% of the listings state an area, so there was not enough training data to build an appropriate model.
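To make the coverage problem concrete, the listings can be split into those with and without a stated area. This is a small sketch under assumptions: listings are dictionaries and the key `floor_area` is a hypothetical field name, not necessarily the attribute name in our dataset.

```python
def split_by_area(listings, area_key="floor_area"):
    """Split listings into (with_area, without_area) and report coverage.

    area_key is a hypothetical field name; coverage is the fraction of
    listings that state an area.
    """
    with_area = [l for l in listings if l.get(area_key) is not None]
    without_area = [l for l in listings if l.get(area_key) is None]
    coverage = len(with_area) / len(listings) if listings else 0.0
    return with_area, without_area, coverage


# Made-up records, for illustration only
sample = [
    {"id": 1, "floor_area": 62.0},
    {"id": 2},
    {"id": 3, "floor_area": None},
    {"id": 4},
]
with_area, without_area, coverage = split_by_area(sample)
```

Only the `with_area` group can serve as training data for an area model, which is why low coverage limits what can be learned.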
Predictive Tasks - Future Work
Use of current values as training data
We are planning to use our current data as training data.
More data-mining
We are going to fetch more recent entries from the Zoopla API and use that data as our test set. A model trained on the current data should then be able to predict values for the test data with reasonable accuracy.
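One way to organize this is a split by date, with older listings as training data and newer ones as test data. The sketch below is only an assumption about how the split could look; the `listed_on` key and the cutoff date are hypothetical and do not refer to actual Zoopla API fields.

```python
from datetime import date


def split_by_cutoff(listings, cutoff, date_key="listed_on"):
    """Listings dated on or before cutoff go to train, later ones to test."""
    train = [l for l in listings if l[date_key] <= cutoff]
    test = [l for l in listings if l[date_key] > cutoff]
    return train, test


# Made-up records, for illustration only
listings = [
    {"id": 1, "listed_on": date(2016, 11, 20)},
    {"id": 2, "listed_on": date(2016, 12, 15)},
    {"id": 3, "listed_on": date(2017, 1, 9)},
]
train, test = split_by_cutoff(listings, cutoff=date(2016, 12, 31))
```

Splitting by time rather than at random keeps the test set strictly newer than the training set, which matches the plan of evaluating on freshly collected entries.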
Learning
We are still exploring techniques and ways to accomplish this. As we have just returned from the holidays, we plan to meet this week to discuss our tasks. Please stay tuned.