- Pull request adding my new Fivetran custom SDK connector to the Fivetran repo
- RAG application: OWASP API Vulnerability Advisor landing page
- BigQuery datasets receiving data from my Fivetran custom SDK connector
- Sync log for my Fivetran custom SDK connector, owasp_api_vulns
- My public forked repo for my custom SDK connector
- RAG application: OWASP API Vulnerability Advisor results page (top)
- RAG application: OWASP API Vulnerability Advisor results page (part)
Inspiration
As an engineer deeply involved in application security, I've consistently observed a critical bottleneck: the disconnect between identifying API vulnerabilities and providing developers with immediate, actionable remediation. Security alerts flood in, but the process of researching, understanding, and implementing fixes is often manual, time-consuming, and resource-intensive. This project was born from the desire to truly "shift left" security, making expert advice an integrated, seamless part of the development workflow, rather than an afterthought. The AI Accelerate Hackathon, especially the Fivetran challenge, presented the perfect opportunity to combine robust data pipelines with cutting-edge AI to solve this pervasive industry problem.
What it Does: The OWASP API Vulnerability Advisor
The OWASP API Vulnerability Advisor is an end-to-end, AI-powered RAG (Retrieval-Augmented Generation) system that transforms raw CVE data into actionable API security intelligence.
- Automated Data Ingestion: A custom connector built with the Fivetran Connector SDK ingests current API vulnerability data from the National Vulnerability Database (NVD) API. The connector handles incremental syncs, filters for OWASP-related CVEs, and pushes clean data directly into Google BigQuery.
- Centralized Vulnerability Data: Google BigQuery serves as our scalable, serverless data warehouse, providing a live repository of categorized and filterable CVE information.
- Intelligent Remediation Engine: A Streamlit web application, deployed on Google Cloud Run, acts as the user interface. Users select severity and date range filters, and upon submission, the app dynamically queries BigQuery to retrieve contextually relevant CVEs. This specific, up-to-date data is then passed to Vertex AI's Gemini model (Gemini 2.5 Pro). The Gemini model, acting as an API security expert, generates concise, detailed remediation steps, best practices, and even code examples tailored to the identified vulnerabilities.
This system provides a "win-win-win" scenario for developers (instant solutions), testers (clear verification guidance), and InfoSec (proactive risk reduction).
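The ingestion step described above can be sketched in plain Python. This is a minimal illustration, not the connector's actual code: it pages through NVD CVE API 2.0 responses using the `startIndex` and `totalResults` fields, keeps only CVEs whose English description matches a keyword list, and tracks the newest `lastModified` value as a cursor for the next incremental sync. The keyword list, the row column names other than `last_modified_date`, and the injected `fetch_page` function are assumptions made for the example; the Fivetran SDK calls that would upsert rows and checkpoint the cursor are deliberately left out.

```python
import re
from typing import Callable, Iterator

# Hypothetical keyword list: the write-up does not say how "OWASP-related" is decided.
OWASP_PATTERN = re.compile(r"\b(api|authorization|authentication|injection|ssrf)\b")


def iter_cves(fetch_page: Callable[[int, int], dict], page_size: int = 2000) -> Iterator[dict]:
    """Yield every CVE object, advancing startIndex until totalResults is reached."""
    start = 0
    while True:
        page = fetch_page(start, page_size)  # one decoded JSON response
        items = page.get("vulnerabilities", [])
        for item in items:
            yield item["cve"]
        start += len(items)
        if not items or start >= page.get("totalResults", 0):
            break


def english_description(cve: dict) -> str:
    """Return the English description text, or an empty string if none exists."""
    for entry in cve.get("descriptions", []):
        if entry.get("lang") == "en":
            return entry.get("value", "")
    return ""


def sync(fetch_page: Callable[[int, int], dict], last_cursor: str) -> tuple[list[dict], str]:
    """Return the rows to load and the new cursor (newest lastModified seen).

    Cursor comparison is lexicographic, which assumes all timestamps share
    one ISO 8601 layout and time zone.
    """
    rows, cursor = [], last_cursor
    for cve in iter_cves(fetch_page):
        modified = cve.get("lastModified", "")
        cursor = max(cursor, modified)
        description = english_description(cve)
        if OWASP_PATTERN.search(description.lower()):  # filter early, before loading
            rows.append({
                "cve_id": cve["id"],
                "description": description,
                "last_modified_date": modified,
            })
    return rows, cursor
```

Passing the fetch function in as a parameter keeps the paging and filtering logic testable without network access; a real `fetch_page` would also need to respect the NVD rate limits.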
How it was Built
The architecture leverages the best of cloud-native and open-source tools:
- Data Source: National Vulnerability Database (NVD) API for authoritative CVE data.
- Data Integration: Fivetran Custom Connector SDK for reliable, incremental ELT into BigQuery.
- Data Storage: Google BigQuery for scalable, performant data warehousing.
- AI Backend: Vertex AI (Gemini 2.5 Pro) for advanced natural language understanding and generation, acting as the generation step of the RAG pipeline.
- Web Application: Streamlit for rapid UI development and Python-native interaction.
- Deployment: Docker and Google Cloud Run for efficient, scalable, and serverless hosting.
- Configuration Management: `config.ini` for secure and flexible environment configuration.
The development process emphasized modularity, using separate directories for the Fivetran connector and the RAG app, while ensuring a cohesive deployment strategy via Docker and Cloud Build.
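To make the retrieval and generation handoff concrete, here is a rough sketch of how the app's filters could become a parameterized BigQuery query and how the returned rows could be folded into a prompt for Gemini. It only builds strings and does not call any cloud service. The table name `cves`, the project name, the `severity`, `cve_id`, and `description` columns, the prompt wording, and the row limit are all assumptions for illustration; only `owasp_api_vulns` and `last_modified_date` come from this project.

```python
from datetime import date

# Hypothetical destination table; replace with the table the connector writes to.
TABLE = "my_project.owasp_api_vulns.cves"


def build_query(severities: list[str], start: date, end: date) -> tuple[str, dict]:
    """Return SQL plus named parameters for the selected severities and date range.

    The end date is exclusive. TIMESTAMP() lets BigQuery parse ISO 8601
    strings, including ones that carry a +HH:MM offset.
    """
    sql = f"""
        SELECT cve_id, severity, description, last_modified_date
        FROM `{TABLE}`
        WHERE severity IN UNNEST(@severities)
          AND TIMESTAMP(last_modified_date) >= TIMESTAMP(@start_date)
          AND TIMESTAMP(last_modified_date) < TIMESTAMP(@end_date)
        ORDER BY TIMESTAMP(last_modified_date) DESC
        LIMIT 20
    """
    params = {
        "severities": [s.upper() for s in severities],
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
    }
    return sql, params


def build_prompt(rows: list[dict]) -> str:
    """Fold the retrieved CVE rows into a single prompt string for the model."""
    context = "\n".join(
        f"- {row['cve_id']} ({row['severity']}): {row['description']}" for row in rows
    )
    return (
        "You are an API security expert. For each CVE below, give remediation "
        "steps, best practices, and a short code example where useful.\n\n"
        f"CVEs:\n{context}"
    )
```

With the `google-cloud-bigquery` client, the `params` dictionary would typically be translated into `ArrayQueryParameter` and `ScalarQueryParameter` objects on a `QueryJobConfig`, which keeps user-selected filter values out of the SQL text itself.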
Challenges I Faced & Overcame
- NVD API Rate Limits & Data Volume: Initially, managing API rate limits and processing large volumes of NVD data efficiently within the Fivetran SDK was a challenge. I optimized the connector to handle pagination gracefully and filter for critical data points early, reducing unnecessary processing.
- BigQuery `DATETIME` vs. `TIMESTAMP` Parsing: A subtle but critical issue arose with parsing `last_modified_date` containing timezone offsets (+HH:MM) for BigQuery's `DATETIME` type, which caused query failures. I debugged and resolved this by correctly using BigQuery's `TIMESTAMP()` function for automatic ISO 8601 parsing, ensuring robust date comparisons.
- Seamless Cloud Run Deployment: Integrating `config.ini` into the Docker build process while keeping it out of Git, and ensuring Streamlit correctly bound to the `PORT` environment variable required careful `Dockerfile` and `.gcloudignore` configuration. This was crucial for a smooth transition from local development to production-ready deployment.
- UX for RAG Latency: RAG applications involve fetching data and then calling an LLM, which can introduce latency. To improve user experience, I implemented a two-step spinner in Streamlit, clearly indicating "Querying BigQuery..." then displaying found CVEs, and finally "Generating AI Recommendations...", managing user expectations effectively.
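The timezone-offset problem above was fixed on the BigQuery side with `TIMESTAMP()`. The same pitfall can be illustrated in Python: a value with a +HH:MM offset names an absolute instant, so it should be normalized to UTC before comparison rather than treated as a wall-clock `DATETIME`. The helper below is a sketch and not part of the project; the name `to_utc` and the choice to treat offset-less values as UTC are assumptions.

```python
from datetime import datetime, timezone


def to_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp and return it as a timezone-aware UTC datetime."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: values without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


# The same instant written with and without an offset compares equal once normalized.
same_instant = to_utc("2024-05-01T12:30:00+05:30") == to_utc("2024-05-01T07:00:00")
```

Normalizing at ingestion time would be an alternative to parsing at query time; either way, the comparison should happen between values of one absolute-time type.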
What I Learned
This project significantly deepened my understanding of:
- The practical application of Fivetran's Custom Connector SDK for bespoke data ingestion needs.
- Optimizing BigQuery queries for date/time handling and performance in real-world scenarios.
- Designing and implementing a production-ready RAG pipeline leveraging Vertex AI for domain-specific insights.
- Best practices for Streamlit app development and containerized deployment on Google Cloud Run.
- The importance of thoughtful UX design in AI applications to communicate system progress.
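For the Cloud Run deployment point, one detail worth showing is port binding: Cloud Run tells the container which port to listen on through the `PORT` environment variable, and Streamlit has to be started on that port and on all interfaces. The sketch below builds the launch command from the environment. It is an illustration under assumptions: the script name `app.py` and the fallback port of 8080 for local runs are not taken from the project.

```python
import os
from typing import Mapping


def streamlit_command(env: Mapping[str, str], script: str = "app.py") -> list[str]:
    """Build the Streamlit launch command, honoring the PORT variable if set."""
    port = env.get("PORT", "8080")  # assumed default for local development
    if not port.isdigit():
        raise ValueError(f"PORT must be numeric, got {port!r}")
    return [
        "streamlit", "run", script,
        "--server.port", port,
        "--server.address", "0.0.0.0",  # listen on all interfaces inside the container
        "--server.headless", "true",    # do not try to open a browser
    ]


if __name__ == "__main__":
    # Print the command; a container entrypoint could pass it to subprocess.run.
    print(" ".join(streamlit_command(os.environ)))
```

The same effect is often achieved directly in the `Dockerfile` with a shell-form `CMD` that expands `$PORT`; the function form just makes the behavior easy to test.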
Future Vision
My future vision for the OWASP API Vulnerability Advisor includes:
- Advanced Filtering: Incorporating filtering by specific CWE IDs or affected software versions.
- Proactive Alerts: Integrating with notification systems (e.g., Slack, PagerDuty) to push AI-generated remediations for new critical CVEs.
- Interactive Remediation: Allowing developers to provide feedback on recommendations, improving future AI responses through fine-tuning.
- Multi-API Integration: Connecting to other vulnerability databases or internal security tools.
- Agentic Workflows: Exploring how AgentSpace could orchestrate multiple steps of vulnerability analysis and remediation, potentially even drafting pull requests for simple fixes.
This project lays a robust foundation for a powerful tool that can dramatically improve developer productivity and overall API security posture.
Built With
- ai/ml
- cicd
- cloud-computing
- containerization
- data-warehousing
- docker
- fivetran-custom-connector-sdk
- git
- github
- google-bigquery
- google-cloud-build
- google-cloud-platform-(gcp)
- google-cloud-run
- google-vertex-ai-(gemini-2.5-pro)
- large-language-models-(llms)
- markdown
- nvd-api-(national-vulnerability-database)
- python-3.11
- rest-apis
- retrieval-augmented-generation-(rag)
- streamlit