Have you ever waited almost half an hour for a bus, only to find it packed when it finally arrives? After spending some time at Nanyang Technological University (NTU), we, a group of NTU students, found this situation very frustrating -- we could have walked if we had known the bus was going to be crowded! This made us wonder -- is there a way to tell whether a bus is crowded? Can this feature be integrated into the current NTU bus app? We could have installed infrared sensors (as is done on SBS buses), but we thought we could make do with what the buses already have -- the bus camera footage.
What it does
We attempt to develop a method to estimate how many people are currently on the bus using the bus camera footage, so that this information can be relayed to app users. With it, app users can tell whether a bus is crowded and decide whether it is worth the wait!
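As a rough illustration of how a head count could become a "crowded or not" message for app users, here is a minimal sketch. The `crowd_level` function, the bus capacity, and the 50% and 80% cut-offs are all assumptions made for this example; they are not values from our prototype.

```python
def crowd_level(people_count: int, capacity: int) -> str:
    """Map a head count to a coarse crowd label (thresholds are assumed)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if people_count < 0:
        raise ValueError("people_count cannot be negative")

    ratio = people_count / capacity
    if ratio < 0.5:  # assumed cut-off: under half full
        return "not crowded"
    if ratio < 0.8:  # assumed cut-off: under 80% full
        return "moderately crowded"
    return "crowded"


# Hypothetical bus with room for 40 passengers.
for count in (10, 24, 36):
    print(count, crowd_level(count, capacity=40))
```

In practice the cut-offs would need to be tuned to how full a bus feels to riders, which we have not measured.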
How we built it
Using the bus camera footage, we developed a computer vision system that recognises the objects in the footage using YOLO. The footage is decomposed into frames (which are simply images), and we run object recognition on these individual frames. By recognising Person objects in each frame, we can count how many people are on the bus, which gives us insight into how crowded the bus is.
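The counting step can be sketched as follows. This is a minimal illustration rather than our actual code: it assumes the detector's output for one frame has already been converted into a list of (label, confidence) pairs, and both the `count_people` helper and the 0.5 confidence threshold are hypothetical.

```python
from typing import List, Tuple

# Assumed format: one (class label, confidence score) pair per detected object.
Detection = Tuple[str, float]


def count_people(detections: List[Detection], min_confidence: float = 0.5) -> int:
    """Count detections labelled "person" whose confidence meets the threshold."""
    return sum(
        1
        for label, confidence in detections
        if label == "person" and confidence >= min_confidence
    )


# Made-up detections for a single frame, for illustration only.
frame_detections = [
    ("person", 0.91),
    ("person", 0.62),
    ("backpack", 0.80),
    ("person", 0.30),  # below the threshold, so not counted
]
print(count_people(frame_detections))
```

A lower threshold picks up more people in low-resolution footage but also lets in more false detections, so the value would have to be chosen by checking it against real bus footage.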
Challenges we ran into
It was our first time developing a computer vision system, and the modules were foreign to us! Learning took up a big portion of the time spent on this project, and it was a really satisfying process. We also faced issues with the suitability of the footage, as the camera systems used in the buses are far from the best available. The low resolution of the footage posed a challenge to our recognition system and compelled us to explore how to preprocess the images before using them for object recognition. We also attempted to optimise existing algorithms to yield better results. In addition, we recorded video ourselves in order to train our model better.
Accomplishments that we're proud of
We're very proud to have a working prototype of this object recognition model! We can pass a video into the model and, by specifying the frame rate, have it return the number of people in each frame. This is heartening, as it is evidence that our idea is feasible -- we can predict the crowd using bus videos! We also made the model work with a live webcam, so it is very possible to do the object recognition on the fly.
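To show what "specifying the frame rate" could mean in code, here is a minimal sketch of picking which frames to analyse and collecting a count for each. The function names are hypothetical, frames are stood in for by plain Python lists of labels, and the counting function is passed in as a parameter so the sketch does not depend on any particular detector.

```python
from typing import Callable, Dict, List, Sequence


def sampled_frame_indices(total_frames: int, video_fps: float, sample_fps: float) -> List[int]:
    """Return the indices of the frames to analyse when sampling at sample_fps."""
    if total_frames < 0:
        raise ValueError("total_frames cannot be negative")
    if video_fps <= 0 or sample_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Never sample more often than the video actually provides frames.
    step = max(1, round(video_fps / sample_fps))
    return list(range(0, total_frames, step))


def people_per_frame(
    frames: Sequence[object],
    video_fps: float,
    sample_fps: float,
    count_fn: Callable[[object], int],
) -> Dict[int, int]:
    """Map each sampled frame index to the count returned by count_fn."""
    indices = sampled_frame_indices(len(frames), video_fps, sample_fps)
    return {i: count_fn(frames[i]) for i in indices}


# Stand-in "frames": each one is just the list of labels a detector might report.
fake_frames = [["person", "person"]] + [[]] * 4 + [["person"]] + [[]] * 4
counts = people_per_frame(
    fake_frames,
    video_fps=5,
    sample_fps=1,
    count_fn=lambda frame: frame.count("person"),
)
print(counts)
```

Sampling fewer frames per second reduces the work per video, which matters if the same loop is meant to keep up with a live camera.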
What we learned
What did we learn? It's not easy to pinpoint a single thing, as this hackathon was a learning process in itself. The ideation process was challenging, as we had to ask ourselves what societal problem we could address with a technological solution. In the end, we decided on a problem close to our hearts -- NTU buses. We ventured into an unknown field, object recognition, and had to teach ourselves how it is usually done. It was a valuable opportunity to learn how images and videos are processed in Python (with OpenCV), as well as how to use YOLO weights and helper functions to recognise the objects in a given image.
There are so many exciting things that we would love to explore in this project if given more time. We believe there are still many ways to optimise our algorithm to give better predictions, given that camera systems in buses do not have the best video resolution. Could we instead take an incremental approach from the bus door's perspective, counting the number of people getting on and off? We could also extend this project to predict crowds not only in buses but also at bus stops, as this information would tell us how many people are likely to board a bus at a given stop. We also want to explore whether we can develop a forecasting model to predict how crowded a bus would be on a given day and at a given time.
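The door-based idea could look something like the sketch below. We have not built this; it assumes some door-facing detector already reports, for each stop, how many people boarded and how many got off, and the `running_occupancy` helper is hypothetical.

```python
from typing import Iterable, List, Tuple


def running_occupancy(door_events: Iterable[Tuple[int, int]], initial: int = 0) -> List[int]:
    """Return the estimated occupancy after each (boarded, alighted) event."""
    occupancy = initial
    history = []
    for boarded, alighted in door_events:
        # Clamp at zero: detection errors could otherwise push the estimate negative.
        occupancy = max(0, occupancy + boarded - alighted)
        history.append(occupancy)
    return history


# Made-up events for three stops: (people boarding, people getting off).
print(running_occupancy([(5, 0), (3, 2), (0, 4)]))
```

One drawback of this approach is that counting errors accumulate over a trip, so the estimate would likely need to be reset or corrected periodically, for example against a full-frame count.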