Inspiration
This project draws inspiration from the impractical bus schedules posted on the official Texas A&M University (TAMU) bus routes website. It addresses the challenges of unpredictable rush-hour traffic and disruptions caused by events, weather conditions, and train delays, factors that often contributed to us being late to our classes on campus.
What it does
The project's primary function is to periodically gather bus data (including but not limited to: longitude, latitude, type of bus, speed, last GPS update, passenger load, stop name, departure status, etc.) in JSON format and store it in CSV files. To facilitate data separation within the CSV, we employ the '|' character as a delimiter, as the JSON data itself contains commas.
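As a rough illustration of the '|' delimiter choice, here is a minimal sketch using Python's standard csv module. The field names, the `append_buses` helper, and the sample payload are hypothetical and only stand in for the real bus data; they are not taken from our script.

```python
import csv
import io
import json

# Hypothetical column names; the real feed exposes more fields than these.
FIELDS = ["route", "latitude", "longitude", "speed", "next_stop"]


def append_buses(json_text, out):
    """Parse a JSON array of bus records and write one '|'-delimited row per bus."""
    buses = json.loads(json_text)
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    for bus in buses:
        # Missing keys become empty cells so every row has the same width.
        writer.writerow([bus.get(field, "") for field in FIELDS])
    return len(buses)


# Made-up sample record whose stop name contains a comma.
sample = json.dumps([
    {"route": "01", "latitude": 30.61, "longitude": -96.34,
     "speed": 12.5, "next_stop": "Commons, Stop 2"}
])

buffer = io.StringIO()
rows_written = append_buses(sample, buffer)
print(buffer.getvalue(), end="")
```

Because the delimiter is '|', the comma inside the stop name needs no quoting, and reading the file back with `csv.reader(..., delimiter="|")` returns the value unchanged.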
How we built it
The project leverages the urllib3 library to simulate the communication behavior of the official TAMU bus website. Various functions within the code facilitate HTTP requests, encompassing both GET and POST methods, essential for tasks such as negotiating connections and querying data. Robust error handling, coupled with detailed logging, allows us to capture and report exceptions effectively. The code follows a modular structure, ensuring that the data collected from the website is systematically stored in CSV files. The script operates in a continuous loop, guaranteeing the timely updating of data.
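The continuous loop with error handling and logging can be sketched roughly as follows. This is a simplified illustration rather than our actual script: `collect`, `fetch`, `store`, and the interval are hypothetical names, and in practice `fetch` would wrap the urllib3 requests while `store` would write to the CSV files.

```python
import logging
import time

logger = logging.getLogger("bus_collector")  # hypothetical logger name


def collect(fetch, store, interval_s=30.0, max_cycles=None, sleep=time.sleep):
    """Call fetch() and store() repeatedly, logging failures instead of crashing.

    max_cycles=None loops forever; a number makes the loop stop, which is
    handy for testing. Returns the count of failed cycles.
    """
    cycles = 0
    failures = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            store(fetch())
        except Exception:
            # Log the full traceback and keep going so one bad response
            # does not end a long-running collection job.
            failures += 1
            logger.exception("collection cycle %d failed", cycles)
        cycles += 1
        sleep(interval_s)
    return failures


# Usage with stand-in functions: the second fetch fails, the others succeed.
responses = iter([{"bus": 1}, RuntimeError("timeout"), {"bus": 2}])


def fake_fetch():
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


stored = []
failed = collect(fake_fetch, stored.append, interval_s=0, max_cycles=3,
                 sleep=lambda seconds: None)
```

Passing `fetch`, `store`, and `sleep` in as arguments keeps the loop independent of the network and the file system, so it can be exercised without contacting the bus website.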
Challenges we ran into
Although TAMU transportation services used to offer API documentation, they have since transitioned to a new asynchronous SignalR API, greatly increasing complexity. Since there is no official documentation for the bus routes API, we had to reverse engineer it by mimicking the request behavior seen in the network tab of the browser's developer tools. Additionally, the API does not support the latest SSL standards, so we had to use urllib3 to work around that issue. To add to our challenges, we encountered limitations in data availability due to a football game and the fact that fewer buses run on the weekends.
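One way such an SSL workaround can look is sketched below. It assumes the server needs legacy (unsafe) renegotiation, which is a common cause of handshake failures with newer OpenSSL versions; the exact setting a given server needs may differ, and `make_legacy_ssl_context` is a hypothetical helper name.

```python
import ssl


def make_legacy_ssl_context():
    """Build an SSL context that still verifies certificates but tolerates
    servers requiring legacy renegotiation (an assumption about the server)."""
    ctx = ssl.create_default_context()
    # ssl.OP_LEGACY_SERVER_CONNECT exists as a named constant only on newer
    # Python versions; 0x4 is the underlying OpenSSL flag value.
    ctx.options |= getattr(ssl, "OP_LEGACY_SERVER_CONNECT", 0x4)
    return ctx


ctx = make_legacy_ssl_context()
```

The resulting context can then be handed to urllib3, for example with `urllib3.PoolManager(ssl_context=ctx)`, so that every request made through that pool uses the relaxed setting while certificate and hostname checks stay enabled.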
Accomplishments that we're proud of
We are proud of having reverse engineered the SignalR protocol and of being able to reliably scrape the bus data with zero official documentation. Furthermore, we are proud to have followed professional design practices in the data collection script, which is fairly modular and robust. This design allows for a number of configuration options, such as the list of route numbers and the SSL context configuration, and it handles exceptions gracefully, which is essential for long-running data collection tasks.
What we learned
We learned the fundamentals of the SignalR protocol and gained hands-on experience in reverse engineering requests through developer tools. Additionally, we gained insight into ensuring secure communication with websites through SSL/TLS configuration.
What's next for Texas A&M Bus Routes Dataset
Our next step is to expand the dataset's capabilities. By combining weather and traffic data from sources like TomTom, we aim to develop AI tools that analyze trends and associations across these data categories. Ultimately, our goal is to create a reliable bus arrival time predictor, addressing the challenges posed by unpredictable traffic and external factors.