Many users (myself included) have noticed a recurring issue with the Belfast bus system: timing discrepancies and "ghost buses" that don’t show up. Buses often arrive too early, too late, or not at all. The issue has been ongoing since late December 2023, with over 10,000 metro services cancelled or missing in 12 months, causing frustration among commuters.
The Minister for Transport has even commented that "ghost buses" are "simply not acceptable."
The Data Problem
The crux of the issue is simple: static timetables don't align with reality. According to reports, 2,500 complaints have been made to Translink, with passengers frustrated by buses that “disappear” from digital displays, or app schedules that don't reflect cancellations at all.
These discrepancies lead to wasted time, missed connections, and decreased trust in public transport. The solution? A real-time arrival prediction model that uses historical data and user-reported events, rather than relying solely on static schedules.
Enter Weighted Averages
In statistics, weighted averages allow us to make certain data points more important than others. This is particularly useful in cases like the Belfast bus system, where the most recent data is likely the most accurate predictor of future events.
A weighted average works by assigning a weight (or "confidence score") to each data point, multiplying each value by its weight, and dividing the sum of those products by the sum of the weights. In this case, more recent bus arrival times are given higher weights since they reflect the latest, most relevant information.
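Written as a formula, the idea looks roughly like this. The symbols are my own notation rather than anything from the project: t_i is the i-th observed arrival time (in minutes since midnight), w_i is the weight assigned to it, and n is the number of observations.

```latex
\bar{t} = \frac{\sum_{i=1}^{n} w_i \, t_i}{\sum_{i=1}^{n} w_i}
```

If every weight is equal, this reduces to the ordinary average; making w_i larger for newer observations pulls the result toward recent arrivals.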
How It Works in Practice
For instance, imagine we have the following bus arrival times, listed from oldest to newest:
08:02
08:05
08:07
08:10
08:12
We assign weights like this (newer data gets higher weight):
08:12 = 5
08:10 = 4
08:07 = 3
08:05 = 2
08:02 = 1
Now, we convert these times into minutes since midnight:
08:12 = 492 minutes
08:10 = 490 minutes
08:07 = 487 minutes
08:05 = 485 minutes
08:02 = 482 minutes
Next, we multiply each time by its corresponding weight:
08:12 (492m) × 5 = 2460
08:10 (490m) × 4 = 1960
08:07 (487m) × 3 = 1461
08:05 (485m) × 2 = 970
08:02 (482m) × 1 = 482
Sum them up:
2460 + 1960 + 1461 + 970 + 482 = 7333
Then, we sum the weights:
5 + 4 + 3 + 2 + 1 = 15
Finally, we calculate the weighted average by dividing the total sum by the total weight:
7333 ÷ 15 ≈ 488.87 minutes past midnight, which is 08:08:52 (rounded to 08:09).
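The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch, not the project's actual code: the helper names (to_minutes, to_clock, weighted_arrival) are hypothetical, and the weights 1 to 5 are simply the ones used in this example.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_clock(minutes: float) -> str:
    """Convert minutes since midnight to 'HH:MM:SS', rounded to the second."""
    total_seconds = round(minutes * 60)
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hours:02d}:{mins:02d}:{secs:02d}"


def weighted_arrival(times: list[str], weights: list[float]) -> float:
    """Weighted average of arrival times, in minutes since midnight."""
    if not times or len(times) != len(weights):
        raise ValueError("times and weights must be non-empty and the same length")
    weighted_sum = sum(to_minutes(t) * w for t, w in zip(times, weights))
    return weighted_sum / sum(weights)


# Arrival times from the example, oldest first, so the newest gets weight 5.
arrivals = ["08:02", "08:05", "08:07", "08:10", "08:12"]
weights = [1, 2, 3, 4, 5]

estimate = weighted_arrival(arrivals, weights)
print(round(estimate, 2), to_clock(estimate))  # 488.87 08:08:52
```

Working in minutes since midnight keeps the maths simple, but it assumes all times fall on the same side of midnight; services that run past 00:00 would need extra handling.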
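This weighted average gives us an estimate that leans toward the most recent arrivals. A plain average of the same five times is 08:07:12, while the weighted version lands more than a minute and a half later at 08:08:52, closer to how the bus has actually been running lately and not tied to an outdated, static schedule.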
How This Solves the Ghost Bus Problem
Here’s where the magic happens: by recalculating the weighted average as new arrivals come in, we move away from static timetables and toward dynamic predictions that react to how buses are actually performing.
Every time a bus arrives, that data point gets added to the system, and the system recalculates the expected time for the next bus. The more recent arrivals have a higher weight, meaning the predicted arrival time always reflects the latest performance.
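One way to sketch this recalculation loop is with a fixed-size window of recent arrivals. Everything here is an assumption made for illustration: the class name RollingArrivalEstimator, the window of five observations, and the linear weights 1..n (oldest to newest) borrowed from the worked example. The real project may do this differently.

```python
from collections import deque
from typing import Optional


class RollingArrivalEstimator:
    """Keeps the most recent arrivals and returns their weighted average."""

    def __init__(self, window: int = 5) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self._arrivals: deque = deque(maxlen=window)

    def record(self, minutes_since_midnight: float) -> None:
        """Add a newly observed arrival time."""
        self._arrivals.append(minutes_since_midnight)

    def estimate(self) -> Optional[float]:
        """Weighted average in minutes since midnight, or None with no data."""
        if not self._arrivals:
            return None
        # Oldest observation gets weight 1, newest gets weight n.
        weights = range(1, len(self._arrivals) + 1)
        weighted_sum = sum(m * w for m, w in zip(self._arrivals, weights))
        return weighted_sum / sum(weights)


estimator = RollingArrivalEstimator(window=5)
for observed in [482, 485, 487, 490, 492]:  # 08:02 ... 08:12
    estimator.record(observed)
first = estimator.estimate()

estimator.record(495)  # a new arrival at 08:15 pushes out the 08:02 entry
second = estimator.estimate()
print(first, second)
```

Each new arrival shifts the estimate later or earlier depending on how the latest bus performed, and old observations fall out of the window on their own.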
For passengers, this means no more waiting around in the rain, unsure when (or if) their bus will arrive. Instead, they get a live ETA based on actual performance, not just scheduled times.
Why This Matters for Developers
The concept of weighted averages is straightforward, but its real-world application in transport systems shows the power of data-driven predictions. In a public transit scenario like this, relying on historical performance (rather than static timetables) makes a huge difference in delivering a better user experience.
The model could be further enhanced with machine learning to predict delays, cancellations, and bus frequency, factoring in variables like traffic patterns, weather conditions, or driver availability.
For developers working on similar projects, this kind of dynamic prediction model has wide-ranging applications: any system that relies on real-time data (e.g., ride-sharing, delivery systems, public transit apps) can benefit from this approach.
TL;DR
To fix the ghost bus problem in Belfast, we use weighted averages to predict bus arrival times dynamically. By giving more weight to recent bus data, we move away from static timetables and create more accurate, real-time predictions, ultimately improving the passenger experience.
Closing Thoughts
This solution illustrates a simple but effective way of using historical performance data to improve real-time predictions. It also opens up opportunities to build better, smarter systems that adapt to the real world, helping developers address similar challenges across various industries.
Check the project out here