Inspiration

We were inspired to take on this project by the Bloomberg industry challenge. We also thought it would be a good way to learn more about data pipelines and to develop a public-facing cloud solution.

What it does

Stores and analyzes DoD contract data in a SQL database, then provides a secure, outward-facing query API.

How we built it

  • Wrote a series of scripts to scrape raw data from the DoD website
  • Developed a custom LLM completions model with OpenAI
  • Used completions model to parse the natural language of published contracts into storable structured JSON
  • Wrote a script to filter and clean all the parsed data
  • Uploaded all data (over 12,000 parsed contracts!) to CockroachDB, a serverless, PostgreSQL-compatible cloud database
  • Built AWS API Gateway endpoints and AWS Lambda functions to securely authenticate and pull contract information and metrics from the database
  • Developed a React frontend to dynamically pull and display paginated data from the API
  • Deployed using GitHub Pages!
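The filter-and-clean step above can be sketched roughly like this. This is a minimal illustration, not the actual script: the field names and normalization rules are hypothetical stand-ins for whatever schema the LLM was prompted to emit.

```python
# Hypothetical required fields for a parsed contract record.
REQUIRED_FIELDS = {"contractor", "amount", "branch", "date"}

def clean_records(raw_records):
    """Drop malformed records and normalize the dollar amount to an int."""
    cleaned = []
    for rec in raw_records:
        # Skip records the LLM failed to parse completely.
        if not REQUIRED_FIELDS.issubset(rec):
            continue
        # Normalize "$1,200,000" -> 1200000 (whole dollars).
        amount = str(rec["amount"]).replace("$", "").replace(",", "")
        if not amount.isdigit():
            continue
        cleaned.append({**rec, "amount": int(amount)})
    return cleaned

records = [
    {"contractor": "Acme Corp", "amount": "$1,200,000",
     "branch": "Navy", "date": "2023-01-05"},
    {"contractor": "NoAmount Inc", "branch": "Army", "date": "2023-01-06"},
]
print(clean_records(records))  # only the complete Acme record survives
```

With 12,000+ records, a pass like this catches the handful of contracts where the model returned incomplete or non-numeric output before anything reaches the database.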
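The Lambda-behind-API-Gateway layer might look something like the sketch below. Everything here is an assumption for illustration: the table name, page size, and the shared-secret check are hypothetical, and the database call is injected as a plain function so the sketch runs without a real CockroachDB connection (the deployed function would execute the SQL through a driver such as psycopg2).

```python
import os

PAGE_SIZE = 25  # hypothetical page size

def handler(event, query_db=None):
    """Sketch of a Lambda handler: authenticate, then return one page of contracts."""
    params = event.get("queryStringParameters") or {}
    # Simple shared-secret check for illustration; a real deployment
    # would lean on API Gateway's own auth mechanisms.
    if params.get("api_key") != os.environ.get("API_KEY"):
        return {"statusCode": 403, "body": "forbidden"}
    page = max(int(params.get("page", 1)), 1)
    # LIMIT/OFFSET pagination keeps each response small for the frontend.
    sql = "SELECT * FROM contracts ORDER BY date DESC LIMIT %s OFFSET %s"
    rows = query_db(sql, (PAGE_SIZE, (page - 1) * PAGE_SIZE))
    return {"statusCode": 200, "body": rows}
```

Keeping the credentials in environment variables and the database behind Lambda means the frontend never talks to CockroachDB directly, which is the main point of the API layer.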

Challenges we ran into

I forgot to await the resolution of one contract-page scrape before starting the next, accidentally sending around 2,400 simultaneous requests to the DoD servers. My IP got flagged for a DDoS attempt, which blocked that machine from accessing the site again.
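The fix is to await each page before requesting the next one. Here is a minimal sketch with a stubbed fetch (the real scraper made HTTP requests to DoD contract pages; the URLs and fetch function below are placeholders):

```python
import asyncio

async def fetch_page(url):
    # Stub standing in for a real HTTP request (e.g. via aiohttp).
    await asyncio.sleep(0)
    return f"contents of {url}"

async def scrape_all(urls):
    results = []
    for url in urls:
        # Awaiting here keeps requests sequential. The missing `await`
        # in the original script fired every request at once.
        results.append(await fetch_page(url))
    return results

pages = asyncio.run(scrape_all(
    [f"https://example.com/contracts/{i}" for i in range(3)]
))
print(len(pages))
```

Without the `await`, the loop just collects 2,400 pending coroutines and the event loop launches them all concurrently, which is exactly what a DDoS filter is built to catch.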

Accomplishments that we're proud of

We built a working product using technologies that were completely new to us.

What we learned

OpenAI credits aren't free, and AWS Lambda doesn't automatically save debug logs

What's next for DoDCAT

We will fix the frontend so that you can actually switch pages (the pages exist, we promise!)
