Inspiration

Commuting is a Victorian invention, spurred on by the coming of the railways. Now, in the post-COVID world, with many workers spending only a few days in the office, commuting longer distances may become more common. The difficulty lies in finding data about housing costs (in this case rents) and commuting costs: a systematic manual search would take an inordinate amount of time. So, I thought, could this be put into a nifty program? The challenge also suited me personally, as I wanted to gain experience with APIs, geospatial data visualisation and web scraping, having never used any of these tools before.

What it does

The program creates an HTML map showing the 20 candidate stations nearest to an office in the centre of Bristol. The closest station overall, Bristol Temple Meads, is the destination of the commute (it is not a candidate, as you wouldn't catch a train to work from there, you'd walk). Each candidate station has a circle overlay centred upon it. The circle has a radius of 1 mile, which denotes the extent of the property search. The colour of the circle denotes the combined cost of commuting and renting there (lighter red is more expensive; note that the readme file states this incorrectly). The text pop-up (accessible by clicking on the circle) gives the station name and a price breakdown.
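As a rough illustration of the colour scale described above, the sketch below maps a combined cost onto a shade of red, with lighter shades for more expensive stations. The function name `cost_to_hex`, the two endpoint colours and the linear interpolation are all assumptions made for this example and are not taken from the project code.

```python
def cost_to_hex(cost, min_cost, max_cost, dark=(139, 0, 0), light=(255, 200, 200)):
    """Map a combined cost to a hex colour: dark red (cheapest) to light red (dearest).

    The endpoint colours and the linear scale are illustrative assumptions.
    """
    if max_cost == min_cost:
        t = 0.0  # all stations cost the same, so use the dark end of the scale
    else:
        t = (cost - min_cost) / (max_cost - min_cost)
    t = min(1.0, max(0.0, t))  # clamp costs that fall outside the given range
    # Interpolate each RGB channel between the dark and light endpoints.
    r, g, b = (round(d + t * (l - d)) for d, l in zip(dark, light))
    return f"#{r:02x}{g:02x}{b:02x}"
```

The returned string could then be passed as the fill colour of each station's circle in whichever mapping library is used.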

How we built it

Firstly, the program identifies the nearest n+1 stations to a provided office location (n=20 in the current version) using a CSV data file from the Office of Rail and Road. Then, using the nearest station as the destination station, the fares from the remaining n stations are retrieved via calls to the BR Fares API (https://www.brfares.com/api/). The median rental price (for a 2-bed property) in the 1 mile area surrounding each station is found by scraping Rightmove.com. The output map is exported to HTML and can be opened in a browser.
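The station selection step can be sketched roughly as follows. This is a minimal example, not the project's actual code: it assumes the stations have already been loaded from the CSV into dictionaries with hypothetical `name`, `lat` and `lon` keys, and it uses the haversine great-circle distance, which may differ from the distance measure the program really uses. The sample coordinates are approximate and for illustration only.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_stations(office, stations, n):
    """Return (destination, candidates): the closest station and the next n closest."""
    office_lat, office_lon = office
    ranked = sorted(
        stations,
        key=lambda s: haversine_miles(office_lat, office_lon, s["lat"], s["lon"]),
    )
    top = ranked[: n + 1]
    return top[0], top[1:]


# Approximate coordinates, for illustration only (not taken from the ORR file).
SAMPLE_STATIONS = [
    {"name": "Bath Spa", "lat": 51.378, "lon": -2.357},
    {"name": "Bristol Temple Meads", "lat": 51.449, "lon": -2.581},
    {"name": "Bedminster", "lat": 51.440, "lon": -2.594},
    {"name": "Lawrence Hill", "lat": 51.458, "lon": -2.564},
]

destination, candidates = nearest_stations((51.4545, -2.5879), SAMPLE_STATIONS, n=2)
print(destination["name"], [c["name"] for c in candidates])
```

Sorting the whole list is simple and fast enough for a few thousand stations; a spatial index would only be worth considering for much larger datasets.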

Challenges we ran into

Fares - The BR Fares API has a usage limit of 100 requests/day. This limit was reached during the hackathon but wasn't too much of an issue - see below for the solution.

Rightmove - In an attempt to deter web scraping, the Rightmove location variable uses a proprietary code. These codes can be scraped via a more sophisticated process (Selenium etc.), but I was unable to implement this, so at this stage part of the scraping process is not automatic: the location codes have had to be hard-coded in a CSV file.

Accomplishments that we're proud of

I'm proud of the implementation of the fares and station candidates calculators. The code is robust: the candidates calculator returns any n candidates for any location across Britain, and the fares code stores fare data locally to minimise API calls, making the 100 requests/day limit manageable.
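The idea of storing fare data locally can be sketched as a small file-backed cache. Everything named here is an assumption for illustration: the `FareCache` class, the JSON file format and the `fetch` callable (a stand-in for whatever function actually calls the BR Fares API) are hypothetical and not taken from the project.

```python
import json
from pathlib import Path


class FareCache:
    """Keep fares in a local JSON file so each station pair is requested only once."""

    def __init__(self, path, fetch):
        self.path = Path(path)
        self.fetch = fetch  # hypothetical callable: fetch(origin, destination) -> fare
        # Load any fares saved by a previous run; start empty otherwise.
        self.fares = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, origin, destination):
        key = f"{origin}-{destination}"
        if key not in self.fares:
            # Only a cache miss costs one of the 100 daily requests.
            self.fares[key] = self.fetch(origin, destination)
            self.path.write_text(json.dumps(self.fares, indent=2))
        return self.fares[key]
```

With a cache like this, re-running the program for the same 20 candidates on a later day would need no API requests at all, although cached fares would go stale if prices change.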

What we learned

I hadn't worked seriously with APIs, geospatial data or web scraping before. I feel I've learnt a lot about APIs, particularly with regard to making efficient use of API calls, and have learnt a good amount about geospatial data and web scraping. I certainly wouldn't feel like a beginner approaching future projects which use these three tools.

What's next for Commuting cost calculator

The next step is to take the features so far and make them all automatic, i.e. fully implement the Rightmove scraper. Having done this, there are a few avenues for future work:

1) Allow users to change the variables within the map (e.g. office location, number of candidates to consider, housing radius, percentile of rent results to consider).
2) Implement a zone where stations would not be considered, e.g. because they are already within walking distance of the office, but where the rental price would still be considered on its own.
3) Link new transport data sources, e.g. Transport for London/Greater Manchester. Currently the tool doesn't consider metro/tram/underground options, which makes it unrealistic for cities with light rail (this is why Bristol was selected: it has notoriously limited public transport options).
4) Add historic housing data so results aren't dependent on what's available right now.
