Inspiration

We were inspired to take on this project by the Bloomberg industry challenge. We also thought it would be a good way to learn more about data pipelines and to develop a public-facing cloud solution.

What it does

Stores and analyzes DoD contract data in a SQL database, then provides a secure, outward-facing query API.

How we built it

  • Wrote a series of scripts to scrape raw data from the DoD website
  • Developed a custom LLM completions model with OpenAI
  • Used completions model to parse the natural language of published contracts into storable structured JSON
  • Wrote a script to filter and clean all the parsed data
  • Uploaded all data (over 12,000 parsed contracts!) to CockroachDB, a serverless, PostgreSQL-compatible cloud database
  • Built AWS API Gateway endpoints and AWS Lambda functions to securely authenticate and pull contract information and metrics from the database
  • Developed a React frontend to dynamically pull and display paginated data from the API
  • Deployed using GitHub Pages!
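The filter-and-clean step above can be sketched roughly like this. This is a minimal illustration, not the actual script: the field names and normalization rules are hypothetical stand-ins for whatever schema the LLM was prompted to emit.

```python
# Hypothetical required fields for a parsed contract record.
REQUIRED_FIELDS = {"contractor", "amount", "branch", "date"}

def clean_records(raw_records):
    """Drop malformed records and normalize the dollar amount to an int."""
    cleaned = []
    for rec in raw_records:
        # Skip records the LLM failed to parse completely.
        if not REQUIRED_FIELDS.issubset(rec):
            continue
        # Normalize "$1,200,000" -> 1200000 (whole dollars).
        amount = str(rec["amount"]).replace("$", "").replace(",", "")
        if not amount.isdigit():
            continue
        cleaned.append({**rec, "amount": int(amount)})
    return cleaned

records = [
    {"contractor": "Acme Corp", "amount": "$1,200,000",
     "branch": "Navy", "date": "2023-01-05"},
    {"contractor": "NoAmount Inc", "branch": "Army", "date": "2023-01-06"},
]
print(clean_records(records))  # only the complete Acme record survives
```

With 12,000+ records, a pass like this catches the handful of contracts where the model returned incomplete or non-numeric output before anything reaches the database.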
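The Lambda-behind-API-Gateway layer might look something like the sketch below. Everything here is an assumption for illustration: the table name, page size, and the shared-secret check are hypothetical, and the database call is injected as a plain function so the sketch runs without a real CockroachDB connection (the deployed function would execute the SQL through a driver such as psycopg2).

```python
import os

PAGE_SIZE = 25  # hypothetical page size

def handler(event, query_db=None):
    """Sketch of a Lambda handler: authenticate, then return one page of contracts."""
    params = event.get("queryStringParameters") or {}
    # Simple shared-secret check for illustration; a real deployment
    # would lean on API Gateway's own auth mechanisms.
    if params.get("api_key") != os.environ.get("API_KEY"):
        return {"statusCode": 403, "body": "forbidden"}
    page = max(int(params.get("page", 1)), 1)
    # LIMIT/OFFSET pagination keeps each response small for the frontend.
    sql = "SELECT * FROM contracts ORDER BY date DESC LIMIT %s OFFSET %s"
    rows = query_db(sql, (PAGE_SIZE, (page - 1) * PAGE_SIZE))
    return {"statusCode": 200, "body": rows}
```

Keeping the credentials in environment variables and the database behind Lambda means the frontend never talks to CockroachDB directly, which is the main point of the API layer.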

Challenges we ran into

I forgot to await the resolution of one contract-page scrape before starting the next, accidentally sending around 2,400 simultaneous requests to the DoD servers. My IP got flagged for a DDoS attempt, which blocked that machine from accessing the site again.
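The fix is to await each page before requesting the next one. Here is a minimal sketch with a stubbed fetch (the real scraper made HTTP requests to DoD contract pages; the URLs and fetch function below are placeholders):

```python
import asyncio

async def fetch_page(url):
    # Stub standing in for a real HTTP request (e.g. via aiohttp).
    await asyncio.sleep(0)
    return f"contents of {url}"

async def scrape_all(urls):
    results = []
    for url in urls:
        # Awaiting here keeps requests sequential. The missing `await`
        # in the original script fired every request at once.
        results.append(await fetch_page(url))
    return results

pages = asyncio.run(scrape_all(
    [f"https://example.com/contracts/{i}" for i in range(3)]
))
print(len(pages))
```

Without the `await`, the loop just collects 2,400 pending coroutines and the event loop launches them all concurrently, which is exactly what a DDoS filter is built to catch.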

Accomplishments that we're proud of

We built a working product using technologies that were completely new to us.

What we learned

OpenAI credits aren't free, and AWS Lambda doesn't automatically save debug logs

What's next for DoDCAT

We will fix the frontend so that you can actually switch pages (the pages exist, we promise!)
