## Inspiration

One day sitting in the car stuck in traffic, I wondered if the congestion was the result of a faulty design of an inefficient network. Soon, what became my research question began to form: how do we create a system or mathematical model to automatically generate high-efficiency transportation networks given a simulated urban environment with dynamic population, the shape of the city, the length constraints of the network and the number of stations? This is an optimization algorithm that has led me to explore graph theory and the application of genetic algorithms.

## What it does

The system first takes in information about the number of stations desired and total length of network, as well as the population distribution (which is defined in terms as parameters to multiple Gaussian Distributions). The system first use the genetic algorithm to find the optimal locations of the predefined number of stations. Then, the system uses Prim's algorithm to connect all stations, and uses my algorithm to add more routes upon this tree such that the efficiency of a network is improved until the network length reaches the predefined quota. The output of the system will be a map of the transportation network surrounded within the contour of the city, as well as vectors containing the coordinates of these nodes and a matrix detailing how they are connected. The system will also display the average travel time for that network based on the population distribution.

## How I built it

All parts of the system was programmed in MATLAB, while the data analytical tasks were done with Microsoft Excel and a Geogebra.

To use the genetic algorithm to find the optimal distribution of stations, I must first create a measurement for the fitness of individuals. To this end, I first analyzed the features (e.g. number of edges in the network) of 31 topological structures confined in a 12x12 plane, with 21 stations in 72 units in total network length. Using Dijkstra's algorithm, I determined each structure's average travel time given a different types of population distributions, as simulated by the Gaussian Mixture Model (GMM). The numerical expressions of the aforementioned characteristics were statistically analyzed, and the feature with the highest correlation with the average travel times was used as the benchmark feature - that is, the fitness function of the genetic algorithm is defined by that feature. I found that the percentage of the average travel time spent on walking was the most strongly-correlating feature with the average travel time, and so the fitness value of every individual in the algorithm would be inversely proportional to the average walking distance from any point (randomly chosen by the GMM) to the nearest station. Thus, the genetic algorithm was programmed to generate a distribution of stations such that the time spent on walking is minimized for that population distribution.

## Challenges I ran into

Given a set of coordinates for the stations of the network, the challenge arrises when one must connect them together to achieve maximum time efficiency while not exceeding the user-inputed network length quota. Even when Prim's algorithm is used to create a Minimum Spanning Tree (MST), paths must be added to increase the efficiency of the MST. At first, the algorithm was designed to assess the resulting average travel time for each addable path between two stations that have not been connected yet in the MST. However, due to the magnitude of the number of stations a typical city can have, this process is very computationally expensive. The challenge then, is to create an algorithm that achieves the same job without assessing all remaining possible addable paths for every iteration of path addition.

To achieve this, I made the observation that an addable path that does not contribute a lot to the network efficiency would not contribute a radically different amount in the following iteration. As such, after ranking the possibilities after the first-time round evaluation of all possible addable paths, the top of the list is added onto the network. Then, only the next 50 records of the ranking system are assessed and reranked by their ability to improve the latest network. The program will add the best path into the network, and repeat this process until the quota for network length is reached.

## Accomplishments that I'm proud of

I implemented my system on Beijing City to compare the generated network with the current Beijing subway system. As over 93% of all daily commuting trips are within the 5th ring road, the program only generated starting and destination points within this area. I used multivariate gaussian distributions to model the population. I verified this model of the Beijing City by computing the average travel time of the current Beijing subway system under these conditions, and was glad to find that it matched the publicly reported 45 minutes.

Applying my system on the same model of Beijing City, I found that my system created a subway network that was 14.05% more time efficient than the current Beijing subway.

At the national science fair, I won the Mu Alpha Theta Award from the National Mathematics Honor Society, the ISEF National Champion Award, Top Logical Science Project Award, Best of Show Award, Rank of Superior Award, and the title as an Intel ISEF Finalist.