Inspiration

Inspired by sites that track congresspersons' stock trades, we decided to track executives' trading history in their own companies' stock.

What it does

Veritas is a financial intelligence platform dedicated to restoring integrity to capital markets by revealing subtle, suspicious trading patterns among corporate executives. We move beyond public headlines to give investors a clear, unvarnished view of corporate conduct.

  • Focusing on the Critical Window:

We recognize that the most material non-public information is often traded on just before it becomes official news. Veritas automatically links two SEC data sources:

Executive Action (Form 4): We track every stock sale and purchase made by officers and directors.

Corporate Event (Form 8-K): We analyze the text of mandatory company disclosures (such as mergers, major debt defaults, or unexpected resignations) that occur within 30 days of that trade.

  • Exposing the Pattern of Suspicion:

Veritas uses a fast vector similarity model to analyze the language of every disclosure. The model compares the Form 8-K text against labeled examples to classify the event's likely market impact: STOCK_UP, STOCK_DOWN, or NEUTRAL. We then apply a strict suspicion logic: if an executive sold stock shortly before a disclosure classified as STOCK_DOWN, the trade is flagged as suspicious, because they may have acted to avoid a loss based on privileged information. Likewise, if an executive bought stock shortly before a disclosure classified as STOCK_UP, the trade is flagged, because they may have acted to realize a gain based on private knowledge.

  • The Corporate Integrity Score:

The platform aggregates detected anomalies to generate a single, easy-to-understand Corporate Integrity Score for the company. This score reflects the overall trustworthiness of the leadership team's actions relative to their access to confidential information. We give the public the intelligence previously available only to high-frequency traders and regulators.

How we built it

We built the Veritas application as a robust, high-performance financial intelligence platform using a stable tech stack to solve the problem of market integrity and executive accountability. Our primary engineering goal was to eliminate reliance on slow, external API calls by building the analysis engine directly into the backend infrastructure.

  • Technology Foundation:

The application runs on a modular stack: React (TypeScript) for the clean, analytical interface; a Python Flask API to orchestrate data and logic; and MongoDB Atlas for data persistence and, critically, high-speed vector search. The entire backend, including the AI model, is packaged via Docker for automated deployment using Google Cloud Build.

  • Low-Latency AI Engine:

To ensure predictions are near-instant, we do the expensive AI work ahead of time and run inference locally:

Local Embedding Model: We utilize the open-source Sentence Transformer model (all-MiniLM-L6-v2), which runs locally inside the container, eliminating network lag, API key issues, and quota failures.

Vector Database Training: We built the classifier's reference set from over 500 custom-labeled examples of long, descriptive SEC filing text (manually tagged as STOCK_UP, STOCK_DOWN, or NEUTRAL). Each text snippet was converted into a 384-dimensional vector and stored in the MongoDB Atlas Vector Search index.
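As a rough illustration of the storage step above, here is a minimal sketch of what a stored example and its Atlas Vector Search index definition might look like. The field names (`text`, `label`, `embedding`) and the cosine similarity setting are assumptions made for illustration; only the 384 dimensions and the three labels come from the description above.

```python
# Sketch only: field names and the similarity metric are assumptions.
EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2
LABELS = ("STOCK_UP", "STOCK_DOWN", "NEUTRAL")

# Index definition in the shape Atlas Vector Search expects for a vector field.
VECTOR_INDEX_DEFINITION = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",             # assumed field name
            "numDimensions": EMBEDDING_DIM,
            "similarity": "cosine",          # assumed metric
        }
    ]
}


def make_example_document(text, label, embedding):
    """Validate one labeled snippet and return the document to insert."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}")
    return {
        "text": text,
        "label": label,
        "embedding": [float(x) for x in embedding],
    }
```

Validating the dimension before insertion is cheap and catches the case where a different embedding model is swapped in without rebuilding the index.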

  • Data Parsing and Analysis Pipeline:

The system links trades to events through a precise, high-fidelity pipeline:

Targeted Content Extraction: When a company is searched, the system fetches raw 8-K filings. A custom Python parser is used to scrape the messy SEC HTML, extracting the entire narrative block for any Item X.XX disclosure, ensuring no critical context is missed.
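The extraction step could be sketched roughly as follows, using only the Python standard library. This is not our exact parser: the regular expressions, the `extract_items` name, and the rule of cutting the text at the SIGNATURE heading are simplifying assumptions, and a cross-reference such as "see Item 9.01" inside a narrative would split a block early.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


# Matches headers such as "Item 5.02" or "Item 5.02." (assumed pattern).
ITEM_HEADER = re.compile(r"Item\s+(\d\.\d{2})\.?", re.IGNORECASE)
# The signature block is assumed to mark the end of the narrative.
SIGNATURE = re.compile(r"\bSIGNATURES?\b")


def extract_items(html_text):
    """Return {item_number: narrative_text} for each Item X.XX block."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    # Collapse all whitespace, including non-breaking spaces from &nbsp;.
    text = re.sub(r"\s+", " ", " ".join(parser.parts)).strip()

    signature = SIGNATURE.search(text)
    if signature:
        text = text[: signature.start()]

    headers = list(ITEM_HEADER.finditer(text))
    items = {}
    for i, header in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[header.end():end].strip()
        number = header.group(1)
        # If an item number appears more than once, keep the longest block.
        if len(body) > len(items.get(number, "")):
            items[number] = body
    return items
```

Calling `extract_items` on the raw HTML of a filing returns a dictionary keyed by item number, and each value can then be passed to the embedding model.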

Vector Search Prediction: The extracted narrative text is converted into a query vector, and the database instantly retrieves the 10 closest matching vectors from our 500+ pre-labeled examples. The model determines the final sentiment by calculating the weighted majority vote of these matches, yielding a high-confidence prediction in milliseconds.
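To make the prediction step concrete, here is a minimal sketch of the query and the vote. The `$vectorSearch` stage is the standard Atlas aggregation stage, but the index name, the field names, and the `numCandidates` value are assumptions. Weighting each neighbor by its similarity score is one reasonable reading of "weighted majority vote", not necessarily the exact weighting we used.

```python
from collections import defaultdict


def build_vector_search_pipeline(query_vector, k=10, index_name="filing_vectors"):
    """Build an aggregation pipeline that returns the k nearest labeled examples.

    `index_name` and the field names are hypothetical placeholders.
    """
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": "embedding",
                "queryVector": list(query_vector),
                "numCandidates": 10 * k,  # assumed oversampling factor
                "limit": k,
            }
        },
        {
            "$project": {
                "_id": 0,
                "label": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def weighted_vote(matches):
    """Pick the label with the largest total similarity score.

    `matches` is a list of {"label": str, "score": float} dicts. Returns
    (label, confidence), where confidence is the winning label's share of
    the total score. Falls back to NEUTRAL when there is nothing to vote on.
    """
    totals = defaultdict(float)
    for match in matches:
        totals[match["label"]] += match["score"]
    total_score = sum(totals.values())
    if total_score <= 0:
        return "NEUTRAL", 0.0
    label = max(totals, key=totals.get)
    return label, totals[label] / total_score
```

The pipeline would be passed to a collection's `aggregate` call, and the resulting documents fed straight into `weighted_vote`.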

Anomaly Scoring: The Flask API checks all executive trades within 30 days of the filing date. A trade is flagged as Suspicious if the executive's action benefited from the classified market impact (e.g., Executive Sold before a model-predicted STOCK_DOWN event).
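The flagging rule can be sketched as below. The `Trade` class and the `is_suspicious` function are illustrative names rather than our actual API, and the sketch assumes that only trades placed on or before the filing date count toward the 30-day window.

```python
from dataclasses import dataclass
from datetime import date

WINDOW_DAYS = 30

# Trade direction paired with the predicted impact it would benefit from.
SUSPICIOUS_PAIRS = {("SELL", "STOCK_DOWN"), ("BUY", "STOCK_UP")}


@dataclass(frozen=True)
class Trade:
    executive: str
    action: str        # "BUY" or "SELL"
    trade_date: date


def is_suspicious(trade, filing_date, predicted_impact, window_days=WINDOW_DAYS):
    """Flag a trade placed shortly before a filing it would have benefited from."""
    days_before = (filing_date - trade.trade_date).days
    # Assumption: trades after the filing, or too far before it, are ignored.
    if not 0 <= days_before <= window_days:
        return False
    return (trade.action, predicted_impact) in SUSPICIOUS_PAIRS
```

A NEUTRAL prediction never matches either pair, so trades near neutral filings are not flagged.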

The resulting analysis is used to generate the Corporate Integrity Score, giving investors transparent and trustworthy intelligence.

Challenges we ran into

We initially tried using a news API instead of SEC 8-K forms to find significant company events, because it returned a cleanly formatted string for each event that we could feed into the vector database. However, we could not find an API with enough free credits for our use. Next, we discussed using an LLM to generate potential URLs relevant to a specific company, but this would have taken between two and ten seconds. In the end, we went back to the SEC API and parsed the 8-K forms for summaries. Extracting useful data from these forms proved to be the hardest part of the project, but after much struggle we were able to produce event summaries to feed into the vector database.

We also ran into issues with the integrity score algorithm, specifically in determining the weights and decay model for the time difference between a trade being placed and an important, stock-altering event occurring. We tested various weights and landed on an exponential decay model, which more accurately represented the possibility of insider trading.
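For illustration, an exponential decay weighting might look like the sketch below. The 7-day half-life, the 0 to 100 scale, and the way penalties are normalized by the number of trades are all hypothetical choices made for this example, not the weights we settled on.

```python
import math


def decay_weight(days_before_event, half_life_days=7.0):
    """Weight a flagged trade by how close it was to the event.

    A trade on the day of the event has weight 1.0, and the weight halves
    every `half_life_days` (a hypothetical parameter).
    """
    if days_before_event < 0:
        raise ValueError("days_before_event must be non-negative")
    return math.exp(-math.log(2) * days_before_event / half_life_days)


def integrity_score(flagged_gaps_days, total_trades, half_life_days=7.0):
    """Return a 0-100 score, where 100 means no flagged trades.

    `flagged_gaps_days` holds, for each flagged trade, the number of days
    between the trade and the event. The normalization by `total_trades`
    is an assumption for this sketch.
    """
    if total_trades <= 0:
        return 100.0
    penalty = sum(decay_weight(d, half_life_days) for d in flagged_gaps_days)
    return 100.0 * max(0.0, 1.0 - penalty / total_trades)
```

With this shape, a flagged trade one day before a filing costs far more than one placed four weeks before it, which matches the intuition that trades closest to the event are the most suspicious.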

Accomplishments that we're proud of

Our favorite accomplishments include:

  • Implementing NLP on Form 8-K filings to determine a positive, negative, or neutral outlook for the stock price.
  • Using caching to store users' past searches.
  • Successfully parsing difficult forms from the SEC EDGAR API.
  • Dynamically plotting company executives' stock trades in a searchable chart.
  • Using Docker to deploy the backend to Google Cloud.

What we learned

Our biggest takeaway was learning when to use libraries and APIs and when to write our own solutions. We spent a few hours researching how to write our own NLP model for sentiment analysis before realizing that the solution we needed already existed. On the other hand, there wasn't a good free library for parsing SEC Form 8-Ks, so we knew we needed to go all in on extracting the relevant data ourselves. Building visually appealing, dynamic visualizations for the data was time-consuming and challenging, but it was a valuable learning experience for the entire team.

What's next for Veritas

We'd love to add compatibility for more SEC forms like the 10-K (annual reports) or the 10-Q (quarterly reports). Additionally, we tried adding pictures for profiles of well-known company executives, but we did not have enough time to completely finish the feature without bugs.
