For a manager of a small business, be it a store, restaurant, gym, or even a movie theater, improving the customer experience and understanding what's going on inside the building are tremendously important. Analytics on when people enter the building, which areas they spend time in, and where crowds and lines form can give managers useful insights: identifying parts of the layout that are poorly designed and cause congestion, discovering that certain table setups or shop items are particularly engaging, and gaining a clearer picture of the business so that they can make data-driven decisions about how to improve.

What it does

Given a live video feed from an overhead camera, Crowd Insights’ AI algorithms detect human heads within the video and use this positional data to identify lines and clusters of people and create heatmaps. The small business owner can then examine this data to learn about human traffic flow within their store over a specified period of time.
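As a rough illustration of the heatmap idea, here is a minimal sketch that counts head detections per grid cell over several frames. The function name `accumulate_heatmap`, the frame size, the cell size, and the sample coordinates are all assumptions made for this example, not values from our pipeline.

```python
import math

def accumulate_heatmap(frames, width, height, cell):
    """Count head detections per grid cell across all frames."""
    cols = math.ceil(width / cell)
    rows = math.ceil(height / cell)
    grid = [[0] * cols for _ in range(rows)]
    for heads in frames:
        for x, y in heads:
            # Skip detections that fall outside the frame.
            if 0 <= x < width and 0 <= y < height:
                grid[int(y // cell)][int(x // cell)] += 1
    return grid

# Hypothetical detections: one list of (x, y) head positions per frame.
frames = [
    [(10, 10), (15, 12), (90, 40)],
    [(12, 11), (95, 45)],
    [(14, 9), (200, 10)],  # the second point lies outside the frame
]
heatmap = accumulate_heatmap(frames, width=100, height=50, cell=25)
for row in heatmap:
    print(row)
```

Cells with high counts correspond to areas where people spent the most time over the chosen period.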

There are a variety of use cases for this data: tracking congestion, finding popular in-store hotspots, spotting long lines, and so on. By analyzing these trends over time, small business owners can make informed decisions about how to improve the way customers physically interact with the store. For example, if they notice that many people tend to gather around a certain product, they can move that product toward the back of the store to prevent crowding at the entrance.

Other use cases for this technology include event management. Event organizers such as the TreeHacks team can use it to monitor congestion in each room and help direct people from highly crowded rooms to open work spaces. They can also monitor lines, e.g., for food or networking, and figure out new ways to deal with long lines and heavy foot traffic.

How we built it

We built the theory and data science toolkits, machine learning model, frontend, and backend separately. For the machine learning, we used FCHD, a fully convolutional head detector implemented in PyTorch, running on a Google Cloud VM. We then passed the list of detected heads to the graph theory library we built, which constructs the minimum spanning tree of the head positions, removes edges that are too long, and performs elliptical fits to determine whether each remaining group of points is a line or a cluster. We also aggregated head locations over time into a heatmap of the environment to show which places see the most activity. Firebase relays data between the head detector and the computer that sends the webcam feed (such as a Raspberry Pi). Finally, a web app built with ReactJS displays the results.
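The grouping step can be sketched roughly as follows. This is a simplified illustration rather than our actual library: it runs Kruskal's algorithm on the complete graph of head positions, cuts tree edges longer than a threshold, and returns the connected groups that remain. The names `mst_edges` and `group_heads`, the `max_edge` value, and the sample coordinates are assumptions for the example.

```python
import math
from itertools import combinations

def find(parent, i):
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def mst_edges(points):
    """Kruskal's algorithm on the complete graph of head positions."""
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    parent = list(range(len(points)))
    tree = []
    for length, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:  # the edge joins two different components
            parent[ri] = rj
            tree.append((length, i, j))
    return tree

def group_heads(points, max_edge):
    """Cut MST edges longer than max_edge and return the remaining groups."""
    parent = list(range(len(points)))
    for length, i, j in mst_edges(points):
        if length <= max_edge:
            parent[find(parent, i)] = find(parent, j)
    groups = {}
    for idx in range(len(points)):
        groups.setdefault(find(parent, idx), []).append(idx)
    return sorted(groups.values())

# Hypothetical head positions in pixel coordinates.
heads = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10), (30, 30)]
print(group_heads(heads, max_edge=3.0))  # [[0, 1, 2], [3, 4], [5]]
```

Each returned group is a list of indices into the input, so it can be handed to a later step that decides whether the group is a line or a cluster.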

Challenges we ran into

One main issue was finding a vision model that could provide dense data on human positions in a camera frame. Most models perform decently at close range, but precision becomes a problem when monitoring areas more than 15 feet from the camera. Because we needed dense data at those distances, we had to test many model architectures and fusion techniques to find the one that gave the best results.

We also had a lot of trouble rendering the line/cluster data from Firebase in a real-time graph on the website. This was tough because no one on the team had extensive experience with realtime updates or with push/pull requests between Firebase and the web app. To solve this, we worked together to break the problem into two parts: collecting and parsing data from Firebase, and displaying that data in a dynamic graph.

Lastly, this was our first time incorporating a big chunk of frontend programming into an application. Our experience with JavaScript, HTML, and Firebase was limited, so it took us a long time to learn the languages' syntax from scratch. However, this also made the project really impactful for us, as it was an exceptional learning opportunity.

Accomplishments that we’re proud of

We implemented simple but effective algorithms for recognizing clusters of crowds and lines. We used minimum spanning trees and ellipse fitting to identify clusters, then took clusters with particularly elongated ellipses and fit them with best-fit lines. We developed a decision tree that applied knowledge from all branches of computer science, from theory to machine learning and software engineering, in a product that became more than the sum of its parts. The final web product took tens of hours to complete, and we’re confident that we were able to get it right.
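The line-versus-cluster decision can be illustrated with a minimal sketch. It stands in for the ellipse fit with the covariance ellipse of the points (an assumption for this example, not necessarily the fitting method we used) and labels a group a line when the major axis is much longer than the minor axis. The names `principal_axes` and `classify_group`, the `min_elongation` threshold, and the sample points are hypothetical.

```python
import math

def principal_axes(points):
    """Return (major, minor, angle) of the covariance ellipse of 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    major = math.sqrt(half_trace + spread)
    minor = math.sqrt(max(half_trace - spread, 0.0))
    # Orientation of the major axis, which is also the best-fit line direction.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return major, minor, angle

def classify_group(points, min_elongation=3.0):
    """Label a group 'line' if its ellipse is strongly elongated, else 'cluster'."""
    if len(points) < 3:
        return "cluster"  # too few points to judge shape (assumed convention)
    major, minor, _ = principal_axes(points)
    if major > 0 and (minor == 0 or major / minor >= min_elongation):
        return "line"
    return "cluster"

# Hypothetical groups: a slightly wobbly queue and a compact crowd.
queue = [(i, 0.1 * (i % 2)) for i in range(8)]
crowd = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
print(classify_group(queue), classify_group(crowd))
```

The major-axis angle from `principal_axes` gives the direction of the line through the group's centroid, which is one simple way to obtain a best-fit line for an elongated group.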

What we learned

We learned a lot about frontend development and about creating algorithms: ReactJS, ChartJS, CanvasJS, Plotly, and Firebase on the frontend; ML head and body detection algorithms; Kruskal’s minimum spanning tree, automatic k-means clustering, and depth-first search; and, with Firebase, realtime graphs and how to upload data from the Jetson to Firebase to the web app.

Even though the project was divided into a frontend and backend portion, all members were able to understand the implementation on both sides. Throughout the implementation, we worked as a unified team, especially when we ran into roadblocks. The core takeaway from this project is our improved understanding of realtime databases, machine learning models, and frontend program structure.

What's next for Crowd Insights AI

One big next step would be applying mapping techniques to create a 3D map of the shop and then localizing detected crowds in that map. This would allow the business owner to analyze exactly which shelves or tables are becoming crowded. Furthermore, applying spatial transforms to the angled camera footage would let us track positions in 3D space from the 2D image.
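One possible form of such a spatial transform is a planar homography that maps pixel coordinates to floor coordinates. The sketch below is only an illustration of that idea, since we have not built this step: it estimates the homography from four pixel/floor correspondences and applies it to a point. The function names and all coordinates are hypothetical, and the approach assumes heads can be treated as lying on a single plane.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_homography(image_pts, floor_pts):
    """Estimate a 3x3 homography from four pixel/floor correspondences."""
    a, b = [], []
    for (x, y), (u, v) in zip(image_pts, floor_pts):
        # Two linear equations per correspondence, with h[2][2] fixed to 1.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def to_floor(h, point):
    """Map a pixel coordinate to floor coordinates using homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )

# Hypothetical calibration: floor corners as seen by an angled camera (pixels)
# and the same corners measured on the floor (metres).
image_corners = [(100, 400), (540, 400), (440, 100), (200, 100)]
floor_corners = [(0, 0), (5, 0), (5, 8), (0, 8)]
H = fit_homography(image_corners, floor_corners)
print(to_floor(H, (320, 250)))
```

With detections mapped onto the floor plane, distances between people and the extent of a crowd could be measured in real units rather than pixels.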

We'd also want to apply optical flow and motion tracking to see how people are moving through the space and what slows them down.
