Inspiration

As an engineer deeply involved in application security, I've consistently observed a critical bottleneck: the disconnect between identifying API vulnerabilities and providing developers with immediate, actionable remediation. Security alerts flood in, but the process of researching, understanding, and implementing fixes is often manual, time-consuming, and resource-intensive. This project was born from the desire to truly "shift left" security, making expert advice an integrated, seamless part of the development workflow, rather than an afterthought. The AI Accelerate Hackathon, especially the Fivetran challenge, presented the perfect opportunity to combine robust data pipelines with cutting-edge AI to solve this pervasive industry problem.

What it Does: The OWASP API Vulnerability Advisor

The OWASP API Vulnerability Advisor is an end-to-end, AI-powered RAG (Retrieval-Augmented Generation) system that transforms raw CVE data into actionable API security intelligence.

  1. Automated Data Ingestion: A custom connector built with the Fivetran Connector SDK ingests API vulnerability data from the National Vulnerability Database (NVD) API. The connector handles incremental syncs, filters for OWASP-relevant CVEs, and delivers the cleaned data to Google BigQuery.
  2. Centralized Vulnerability Data: Google BigQuery serves as our scalable, serverless data warehouse, providing a live repository of categorized and filterable CVE information.
  3. Intelligent Remediation Engine: A Streamlit web application, deployed on Google Cloud Run, acts as the user interface. Users select severity and date range filters; on submission, the app queries BigQuery for the matching CVEs and passes them as context to Gemini 2.5 Pro on Vertex AI. Prompted to act as an API security expert, the model generates concise remediation steps, best practices, and code examples tailored to the retrieved vulnerabilities.
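
The retrieve-then-generate flow in step 3 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the table name, column names, the `Cve` type, the row limit, and the prompt wording are all assumptions, and the sketch stops at building the SQL and the prompt rather than calling BigQuery or Vertex AI.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical table name; the write-up does not give the real schema.
CVE_TABLE = "my-project.security.nvd_cves"


@dataclass(frozen=True)
class Cve:
    cve_id: str
    severity: str
    description: str


def build_cve_query(severities: list[str], start: date, end: date) -> tuple[str, dict]:
    """Return parameterized SQL and the values to bind to it.

    Filter values are bound as query parameters instead of being formatted
    into the SQL text, so UI input cannot change the statement itself.
    """
    sql = (
        "SELECT cve_id, severity, description\n"
        f"FROM `{CVE_TABLE}`\n"
        "WHERE severity IN UNNEST(@severities)\n"
        "  AND DATE(TIMESTAMP(last_modified_date)) BETWEEN @start_date AND @end_date\n"
        "ORDER BY TIMESTAMP(last_modified_date) DESC\n"
        "LIMIT 20"  # assumed cap to keep the prompt small
    )
    params = {
        "severities": [s.upper() for s in severities],
        "start_date": start,
        "end_date": end,
    }
    return sql, params


def build_prompt(cves: list[Cve]) -> str:
    """Place the retrieved CVEs into the prompt as grounding context."""
    context = "\n".join(f"- {c.cve_id} [{c.severity}]: {c.description}" for c in cves)
    return (
        "You are an API security expert. For each CVE below, give remediation "
        "steps, best practices, and a short code example.\n\n"
        f"Retrieved CVEs:\n{context}"
    )
```

With the google-cloud-bigquery client, the returned values would typically be bound through `ArrayQueryParameter` and `ScalarQueryParameter` before running the query, and the prompt string would then be sent to the Gemini model.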

This system provides a "win-win-win" scenario for developers (instant solutions), testers (clear verification guidance), and InfoSec (proactive risk reduction).

How it was Built

The architecture leverages the best of cloud-native and open-source tools:

  • Data Source: National Vulnerability Database (NVD) API for authoritative CVE data.
  • Data Integration: Fivetran Custom Connector SDK for reliable, incremental ELT into BigQuery.
  • Data Storage: Google BigQuery for scalable, performant data warehousing.
  • AI Backend: Vertex AI (Gemini 2.5 Pro) for natural language understanding and generation, serving as the generation step of the RAG pipeline.
  • Web Application: Streamlit for rapid UI development and Python-native interaction.
  • Deployment: Docker and Google Cloud Run for efficient, scalable, and serverless hosting.
  • Configuration Management: config.ini, kept out of version control, for flexible environment configuration.

The development process emphasized modularity, using separate directories for the Fivetran connector and the RAG app, while ensuring a cohesive deployment strategy via Docker and Cloud Build.
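
As one way to picture the configuration and deployment pieces listed above, the sketch below reads settings from config.ini text and takes the listening port from the `PORT` environment variable that Cloud Run provides to the container. The section names, keys, the `Settings` type, and the `app.py` script name are hypothetical; the write-up does not show the real file layout.

```python
import configparser
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    project_id: str
    dataset: str
    model: str
    port: int


def load_settings(ini_text: str, env: Optional[Mapping[str, str]] = None) -> Settings:
    """Combine config.ini values with the PORT environment variable."""
    env = os.environ if env is None else env
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return Settings(
        project_id=parser["gcp"]["project_id"],  # hypothetical section and key
        dataset=parser["gcp"]["dataset"],        # hypothetical section and key
        model=parser["vertex"]["model"],         # hypothetical section and key
        # Fall back to 8080 when PORT is unset, e.g. during local runs.
        port=int(env.get("PORT", "8080")),
    )


def streamlit_command(settings: Settings, script: str = "app.py") -> list[str]:
    """Build the container start command so Streamlit listens on PORT."""
    return [
        "streamlit", "run", script,
        f"--server.port={settings.port}",
        "--server.address=0.0.0.0",
    ]
```

Binding to 0.0.0.0 rather than localhost matters inside a container, because the platform's traffic arrives from outside the container's loopback interface.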

Challenges I Faced & Overcame

  1. NVD API Rate Limits & Data Volume: Initially, managing API rate limits and processing large volumes of NVD data efficiently within the Fivetran SDK was a challenge. I optimized the connector to handle pagination gracefully and filter for critical data points early, reducing unnecessary processing.
  2. BigQuery DATETIME vs. TIMESTAMP Parsing: A subtle but critical issue arose because last_modified_date values carry timezone offsets (+HH:MM), which BigQuery's DATETIME type does not accept, so queries failed. I resolved this by using BigQuery's TIMESTAMP() function, which parses ISO 8601 strings with offsets, making date comparisons reliable.
  3. Seamless Cloud Run Deployment: Getting config.ini into the Docker build while keeping it out of Git, and making Streamlit bind to the PORT environment variable, required careful Dockerfile and .gcloudignore configuration. This was crucial for a smooth transition from local development to production-ready deployment.
  4. UX for RAG Latency: RAG applications fetch data and then call an LLM, which can introduce latency. To improve the user experience, I implemented a two-step spinner in Streamlit: it shows "Querying BigQuery...", then displays the CVEs found, and finally shows "Generating AI Recommendations..." so users always know what the app is doing.
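
For the first challenge in the list above, pagination with request pacing and early filtering can be sketched as below. It assumes the NVD CVE API 2.0 response shape (`totalResults` and `vulnerabilities`, paged with `startIndex` and `resultsPerPage`). The `fetch_page` callable, the `keep` predicate, and the six-second pause are illustrative assumptions, not the connector's actual code; the current NVD guidance should be checked for the right pacing.

```python
import time
from typing import Callable, Iterator


def iter_cves(
    fetch_page: Callable[[int, int], dict],
    page_size: int = 2000,
    pause_seconds: float = 6.0,
    keep: Callable[[dict], bool] = lambda item: True,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield CVE records page by page, pausing between requests.

    fetch_page(start_index, page_size) is a hypothetical callable that
    performs one HTTP request and returns the decoded JSON response.
    """
    start_index = 0
    while True:
        page = fetch_page(start_index, page_size)
        items = page.get("vulnerabilities", [])
        # Filter early so irrelevant records never reach later processing.
        for item in items:
            if keep(item):
                yield item
        start_index += len(items)
        # Stop on an empty page or once every reported result has been seen.
        if not items or start_index >= page.get("totalResults", 0):
            break
        sleep(pause_seconds)  # stay under the API's rate limit
```

Passing `fetch_page` and `sleep` in as arguments keeps the paging logic testable without network access or real delays.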

What I Learned

This project significantly deepened my understanding of:

  • The practical application of Fivetran's Custom Connector SDK for bespoke data ingestion needs.
  • Optimizing BigQuery queries for date/time handling and performance in real-world scenarios.
  • Designing and implementing a production-ready RAG pipeline leveraging Vertex AI for domain-specific insights.
  • Best practices for Streamlit app development and containerized deployment on Google Cloud Run.
  • The importance of thoughtful UX design in AI applications to communicate system progress.
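
The date/time lesson above traces back to the DATETIME versus TIMESTAMP issue: a DATETIME carries no time zone, while a TIMESTAMP is an absolute point in time. The sketch below shows the same idea on the Python side, normalizing ISO 8601 strings with offsets to UTC before comparing them. The function name and the choice to treat offset-free values as UTC are assumptions for illustration.

```python
from datetime import datetime, timezone


def to_utc(value: str) -> datetime:
    """Parse an ISO 8601 string and return an aware datetime in UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: values without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


# Comparing the raw strings orders these two values incorrectly, because the
# offsets differ; comparing the normalized datetimes gives the true order.
earlier_looking = "2025-01-15T23:00:00-05:00"  # 04:00 UTC on 16 January
later_looking = "2025-01-16T01:00:00+00:00"    # 01:00 UTC on 16 January
assert earlier_looking < later_looking
assert to_utc(earlier_looking) > to_utc(later_looking)
```

This is the same normalization that BigQuery's TIMESTAMP() function performs when it parses a string containing an offset.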

Future Vision

My future vision for the OWASP API Vulnerability Advisor includes:

  • Advanced Filtering: Incorporating filtering by specific CWE IDs or affected software versions.
  • Proactive Alerts: Integrating with notification systems (e.g., Slack, PagerDuty) to push AI-generated remediations for new critical CVEs.
  • Interactive Remediation: Allowing developers to provide feedback on recommendations, improving future AI responses through fine-tuning.
  • Multi-API Integration: Connecting to other vulnerability databases or internal security tools.
  • Agentic Workflows: Exploring how AgentSpace could orchestrate multiple steps of vulnerability analysis and remediation, potentially even drafting pull requests for simple fixes.

This project lays a robust foundation for a powerful tool that can dramatically improve developer productivity and overall API security posture.

Built With

  • ai/ml
  • cicd
  • cloud-computing
  • containerization
  • data-warehousing
  • docker
  • fivetran-custom-connector-sdk
  • git
  • github
  • google-bigquery
  • google-cloud-build
  • google-cloud-platform-(gcp)
  • google-cloud-run
  • google-vertex-ai-(gemini-2.5-pro)
  • large-language-models-(llms)
  • markdown
  • nvd-api-(national-vulnerability-database)
  • python-3.11
  • rest-apis
  • retrieval-augmented-generation-(rag)
  • streamlit