Asher Falcon

live public transport routing

We've all used a public transit router to find a quick route from A to B, such as Google Maps or CityMapper, but how do they actually work? In this post I will discuss how I tried to build my own public transport routing system for London that supports buses, tubes and trains.

To limit the scope of the project, I decided to only support buses, tubes and trains in London and surrounding areas. This still, however leaves a lot of data to process, and the main goal of the project is to use live arrivals data so we know the actual time the next bus, tube or train will arrive, which needs to be updated frequently. To get an idea of the scope of these modes in London, the following table summarizes the details of each:

Anyone that's studied Graph Theory would look at a problem like this and quickly recognise that each stop/station could be a node and each journey could be an edge, and to get the fastest routes we could update the edges in real time and use Dijkstra's algorithm to find the shortest path. This would work fine in a small network, but when you have lots of edges it can take a while, and it won't optimise for the amount of transfers you take, for example it might prefer for you to take three or four buses instead of one train even if it's only a few minutes quicker, where it is obviously wildly less convenient.

The solution I decided to use is an algorithm called RAPTOR, which was introduced by Microsoft Research in 2012. RAPTOR optimises for the amount of transfers you take whilst minimising the time taken for the journey by running in rounds, where round $k$ searches for routes with $k$ transfers. It works similarly to Djikstra with how it stores the shortest time to arrive at each node, but it takes advantage of the fact that you can group the edges into the same vehicle or trip and traverse all the way to the end. You initiate the algorithm by setting the arrival time for each stop to $\infty$ , except the start node which you set to zero, or in my implementation I chose to use unix timestamps, so I set it to the current time in milliseconds. The algorithm will then repeat the following process for each round:

This process repeats for each round, so if you are going from A to B a direct route would be found in the first round, and then a route that might require a walk before a bus or two buses where you transfer at the exact same stop would come up in the second round. I'll now move on to one of the most difficult parts of this project, which is getting the live data. I'll start with the rail data, as it was by far the easiest to get. I got the data from the Rail Data Marketplace, which sounds terribly official but it's basically just a developer portal for national rail. I believe various companies sell custom data on there, but you can get all the data you need for this project for free. Creating an account was straightforward, and once I was in I found the endpoint I needed, the Live Arrival and Departure Boards. You just supply a CRS code, which is a unique identifier for each train station, and you get a response containing the scheduled and estimated times for each train. Before I could make use of this however, I needed to know the CRS codes for all the stations and to get their coordinates which would be necessary for calculating walking times further down the line. The rail data site had an easy to use dataset for this which I hacked together a python script to parse each station and store it in a local sqlite database. Below you can see an interactive sample of the data we get in response when asking for the arrivals and departures for London Victoria:

{9 ItemstrainServices: []10 Items
Xmlns: {}1 Items
generatedAt: "2025-10-26T18:20:36.2627696+00:00"
locationName: "London Victoria"
crs: "VIC"
filterType: "to"
nrccMessages: []1 Items
platformAvailable: true
areServicesAvailable: true
}

After playing around with a few requests, I realised that the serviceID field for each train service has two parts, for example the first train service shown above is 1971893VICTRIC_, where the first part (the number) uniquely identifies the specific train. Using this, we can form together our unique trips to be used in the RAPTOR algorithm, and add in the time it will arrive at each stop. I couldn't find an endpoint to get this for every station, so for now the algorithm sends out requests for a few hundred stations to make sure it doesn't miss any trains, and then pieces them together.

The next part was getting the bus data. If you remember from the table before, there are thousands of buses running at a time, and the scale of the data was made apparent when I hit the TFL bus arrivals endpoint and it took over 15 seconds to load. After parsing it, I found it had over 108,000 arrival times, where each arrival time is a object with a bus stop id and a bus route, see an example of one of these objects below:

[1 Items{21 Itemsid: "1652474360"
operationType: 2
vehicleId: "LJ67DKY"
naptanId: "490014149S"
stationName: "Walpole Road"
lineId: "219"
lineName: "219"
platformName: "M"
direction: "inbound"
bearing: "210"
tripId: "552779"
baseVersion: "20251017"
destinationNaptanId: ""
destinationName: "Wimbledon"
timestamp: "2025-10-26T18:57:03.618Z"
currentLocation: ""
towards: "South Wimbledon"
expectedArrival: "2025-10-26T18:56:49Z"
timeToLive: "1970-01-01T00:00:00Z"
modeName: "bus"
timing: {}7 Items
}
]

Like with the trains, we can piece together unique trips for each route using the vehicleId to group together the arrival times for the same bus, and add in the time it will arrive at each stop. The process for the London Underground was extremely similar, however you have to be careful because the vehicleId for the underground isn't unique, as each line could have multiple trains with the same vehicleId, e.g. a train on the Victoria line could have the same vehicleId as a train on the District line. To get around this, I added in the line name to the vehicleId, so now each train has a unique vehicleId like circle/234. Now I have all the trips for each mode, but something's missing - walking. Walking is crucial for two main reasons: firstly, it massively increases the number of route combinations you can take. For example, you might take a bus to a stop that's a 5-minute walk from a tube station, which could save you 20 minutes compared to staying on the bus. Secondly, it's essential for connecting stops that are technically different but practically the same - like two bus stops with identical names on opposite sides of the street, which have completely different stop IDs in the TfL system. Without walking transfers, the algorithm wouldn't even know these stops are related.

At first I considered using some public API to compute the walking times, but after doing the math I realised I could need more than a million requests, and even using servers that support batch requests could end up taking ages, but I came across an open source project called OSRM which is a routing engine for walking, driving and cycling. I downloaded the OpenStreetMap data for England because it was the smallest available dataset that covered London, and set it up on my machine. It took around 20 minutes to load all the data and then it was ready - I could send two coordinates to the endpoint and it would immediately return the walking time between them. Between trains, tubes and buses there are over 33,500 unique stops, so for each one to limit the number of requests I only queried the walking times for stops within a few kilometers. Surprisingly, it was able to calculate over 1.8M walking distances in less than 15 minutes, and the end result was a json file that was 55MB! I'm sure I could have optimised the storage format but it wasn't large enough to be a serious issue.

To account for walking between stops, I just considered the walk from one stop to another to be a transfer and ran RAPTOR like normal. Hacking together a quick backend to allow searching for stops to pick as the start and end points and using the raptor algorithm to find the fastest route, I was able to get a working prototype. A simple frontend was then added to allow searching for routes and displaying the results. One thing I was unable to do with this project was show on the frontend the route trace for trains and tubes. For buses I could get a geoJson list of coordinates of the route the bus took and then cut it down for the stops you board and get off at, but for trains and tubes I struggled to find a reliable way of getting the route it took across the rail network, and just ended up making it draw straight lines.

I've chosen not to deploy this project publicly as I found the number of requests it has to make to stay up to date and accurate is too high for the utility it would provide, however I have recorded (and sped up) some footage of it in action below:

The code for this project is available on a git repo here, and as always thanks for reading, and feel free to reach out to me here if you have any questions or comments.