Inspiration
Traditional competitive intelligence is broken:
- Costs $50,000-$150,000/year for enterprise platforms
- Requires 20+ hours/week of manual analyst work
- Data scattered across sources (patents, jobs, news, code)
- Insights arrive too late to inform strategic decisions
- Human analysts struggle to correlate patterns across domains
I watched strategy teams spend entire days manually checking USPTO patent filings, scraping job boards, reading news articles, and browsing GitHub repos—only to miss critical signals because the data points were never connected.
The question: What if an AI agent could autonomously gather all this data, correlate patterns humans miss, and deliver strategic predictions in under 60 seconds?
What it does
IntelAgent is a fully autonomous AI competitive intelligence platform that:
Gathers Intelligence - Automatically collects data from 4 sources:
- Patents (via Fivetran Custom Connector + BigQuery) → R&D focus and IP strategy
- Job Postings (via Cloud Functions + Firestore) → Hiring patterns and org priorities
- News Coverage (via Cloud Functions + Firestore) → Public narrative and partnerships
- GitHub Activity (via Cloud Functions + Firestore) → Developer ecosystem strategy
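As an illustration of what the job collector might do before writing to Firestore, here is a minimal sketch that flattens a Greenhouse job board response into one document per posting. It assumes the public Job Board API response shape (for example from `https://boards-api.greenhouse.io/v1/boards/<board_token>/jobs`), with a `jobs` list whose entries carry `id`, `title`, `location.name`, `updated_at` and `absolute_url`. The function name, the document fields and the ID scheme are hypothetical, and the HTTP fetch and Firestore write are left out so the logic runs on its own.

```python
from typing import Any, Dict, List


def normalize_greenhouse_jobs(payload: Dict[str, Any], company: str) -> List[Dict[str, Any]]:
    """Turn a Greenhouse job board payload into Firestore-ready documents.

    Each document gets a stable ID so re-running the scraper overwrites
    existing postings instead of duplicating them.
    """
    docs = []
    for job in payload.get("jobs", []):
        docs.append(
            {
                # Hypothetical ID scheme: lowercase company name plus Greenhouse job ID
                "doc_id": f"{company.lower()}-{job['id']}",
                "company": company,
                "title": job.get("title", ""),
                # location is a nested object; tolerate it being missing or null
                "location": (job.get("location") or {}).get("name", ""),
                "updated_at": job.get("updated_at"),
                "url": job.get("absolute_url"),
            }
        )
    return docs


sample = {
    "jobs": [
        {
            "id": 101,
            "title": "Enterprise Account Executive",
            "location": {"name": "San Francisco, CA"},
            "updated_at": "2025-01-10T12:00:00-05:00",
            "absolute_url": "https://example.com/jobs/101",
        },
        {"id": 102, "title": "Research Engineer", "location": None},
    ]
}
docs = normalize_greenhouse_jobs(sample, "Anthropic")
```

In a deployed function, each document could then be stored with the `google-cloud-firestore` client, e.g. `db.collection("jobs").document(doc["doc_id"]).set(doc)`.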
Analyzes Patterns - Gemini 2.5 Pro autonomously:
- Chooses which tools to call based on the query
- Executes function calls in parallel for speed
- Cross-correlates signals to find non-obvious insights
- Adapts when data is unavailable (graceful degradation)
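The parallel execution and graceful degradation described above can be sketched with a thread pool. This is not the project's code: the call format (a list of `{"name", "args"}` dicts, roughly what a function-calling response carries), the `run_tool_calls` helper and the stub tools are all hypothetical. Errors are returned to the model as data, so it can continue with the remaining sources instead of aborting the analysis.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_tool_calls(
    calls: List[Dict[str, Any]],
    registry: Dict[str, Callable[..., Any]],
    max_workers: int = 4,
) -> List[Dict[str, Any]]:
    """Execute the model's requested tool calls concurrently.

    Failures are reported back as data rather than raised, so the model can
    keep working with whichever sources did return results.
    """

    def execute(call: Dict[str, Any]) -> Dict[str, Any]:
        name = call["name"]
        handler = registry.get(name)
        if handler is None:
            return {"name": name, "response": {"error": f"unknown tool {name}"}}
        try:
            return {"name": name, "response": {"result": handler(**call.get("args", {}))}}
        except Exception as exc:  # graceful degradation: surface the failure to the model
            return {"name": name, "response": {"error": str(exc)}}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves the order of the requested calls
        return list(pool.map(execute, calls))


def get_patents(company: str, limit: int = 10) -> list:
    return []  # stub: pretend BigQuery found nothing


def get_jobs(company: str) -> list:
    raise TimeoutError("job board API timed out")  # stub: simulate an outage


results = run_tool_calls(
    [
        {"name": "get_patents", "args": {"company": "Anthropic"}},
        {"name": "get_jobs", "args": {"company": "Anthropic"}},
    ],
    {"get_patents": get_patents, "get_jobs": get_jobs},
)
```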
Generates Predictions - Produces:
- Executive summary with strategic direction
- Detailed analysis across all 4 intelligence sources
- Evidence-based 30/60/90-day forecasts with confidence levels
- Transparent reasoning showing how conclusions were reached
Example Query: "Analyze Anthropic's strategic direction"
Agent Output (in 45 seconds):
- Found 6 patents (all AI agent automation → product direction clear)
- Analyzed 224 jobs (67% sales vs 12% R&D → enterprise pivot)
- Reviewed 57 news articles (Google Cloud partnership → infrastructure scaling)
- Examined 50 GitHub repos (151K stars → strong developer adoption)
Prediction: Enterprise platform launch in 30-60 days (HIGH confidence)
Evidence: Sales hiring surge + Google Cloud partnership + enterprise safety patents + developer SDK maturity
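The hiring signal in this example comes down to a share-of-postings calculation. A minimal sketch, assuming postings are bucketed by keywords in the job title; the buckets and the `hiring_mix` name are illustrative, not the project's actual classification.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical keyword buckets; a real classifier would need a richer taxonomy
CATEGORY_KEYWORDS = {
    "sales": ("account executive", "sales", "customer success", "solutions"),
    "r&d": ("research", "scientist", "machine learning"),
}


def hiring_mix(titles: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of postings in each category (first match wins)."""
    counts: Counter = Counter()
    total = 0
    for title in titles:
        total += 1
        lowered = title.lower()
        category = next(
            (cat for cat, words in CATEGORY_KEYWORDS.items()
             if any(word in lowered for word in words)),
            "other",
        )
        counts[category] += 1
    if total == 0:
        return {}
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


mix = hiring_mix([
    "Enterprise Account Executive",
    "Sales Engineer",
    "Research Scientist, Interpretability",
    "Recruiting Coordinator",
])
```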
How I built it
Architecture (9 services working together):
Fivetran Custom Connector
└─ USPTO Patents API → BigQuery patent_intelligence dataset
Cloud Functions Gen 2 (Data Collectors)
├─ job-scraper: Greenhouse API → Firestore
├─ news-search: Google News RSS → Firestore
└─ github-activity: GitHub API → Firestore
BigQuery
├─ Patents Public Dataset (100M+ patents globally)
└─ patent_intelligence dataset (recent USPTO filings via Fivetran)
Firestore (NoSQL Database)
├─ jobs collection (224 documents for Anthropic)
├─ news collection (57 documents)
└─ github collection (50 repositories)
Cloud Run
└─ Streamlit app (Python 3.11, auto-scaling)
Vertex AI
└─ Gemini 2.5 Pro (function calling, 16K output tokens)
Cloud Scheduler
└─ Automated data refresh (daily)
Cloud Build + Artifact Registry
└─ CI/CD pipeline
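Of the three Cloud Function collectors, news-search is the simplest to illustrate. The sketch below assumes the function fetches a Google News RSS search feed (for example `https://news.google.com/rss/search?q=Anthropic`) and reads the standard RSS 2.0 `item` elements. The helper name and document fields are hypothetical, and the HTTP fetch and Firestore write are omitted so the parsing can run on its own.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_news_rss(xml_text: str, company: str) -> List[Dict[str, str]]:
    """Extract articles from an RSS 2.0 feed into Firestore-ready documents."""
    root = ET.fromstring(xml_text)
    docs = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        docs.append(
            {
                # Hashing the link gives a stable document ID, so a re-run
                # overwrites an article instead of storing it twice
                "doc_id": hashlib.sha1(link.encode("utf-8")).hexdigest(),
                "company": company,
                "title": item.findtext("title", default=""),
                "url": link,
                "published": item.findtext("pubDate", default=""),
                "source": item.findtext("source", default=""),
            }
        )
    return docs


SAMPLE_FEED = """<rss version="2.0"><channel>
<item>
  <title>Anthropic expands Google Cloud partnership</title>
  <link>https://example.com/article-1</link>
  <pubDate>Mon, 06 Jan 2025 10:00:00 GMT</pubDate>
  <source url="https://example.com">Example News</source>
</item>
</channel></rss>"""

articles = parse_news_rss(SAMPLE_FEED, "Anthropic")
```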
Agent Intelligence:
- 15,000-character system instruction defining analytical framework
- Function calling with 4 tools: `get_patents()`, `get_jobs()`, `get_news()`, `get_github()`
- Iterative reasoning: Shows thinking before/during/after tool execution
- Cross-signal correlation: Connects patterns across independent sources
- Adaptive behavior: Handles missing data gracefully
Code Highlights:
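```python
from vertexai.generative_models import FunctionDeclaration, GenerationConfig, GenerativeModel, Tool

# Tool definition for Gemini: parameters are an OpenAPI-style schema object
patent_function = FunctionDeclaration(
    name="get_patents",
    description="Get recent patent filings from BigQuery",
    parameters={
        "type": "object",
        "properties": {
            "company": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["company"],
    },
)
# get_jobs, get_news and get_github are declared the same way and added to this list
intelligence_tool = Tool(function_declarations=[patent_function])

# Agent execution with function calling
model = GenerativeModel(
    "gemini-2.5-pro",
    tools=[intelligence_tool],
    generation_config=GenerationConfig(
        temperature=0.7,
        max_output_tokens=16384,
    ),
)
```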
Production Features:
- Exponential backoff retry (handles rate limits)
- Error handling with fallbacks
- Comprehensive logging
- Graceful degradation when data unavailable
Challenges I ran into
1. BigQuery Patents Schema Complexity
- The Patents dataset uses `ARRAY<STRING>` for assignees, not `ARRAY<STRUCT>`
- Initial queries failed with schema errors
- Solution: Used `UNNEST(assignee)` and `assignee_harmonized[].name` for robust matching
- Added a fallback to a simpler query if the comprehensive one fails
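A sketch of the query-plus-fallback approach, assuming the public `patents-public-data.patents.publications` table, where `assignee` is an array of strings and `assignee_harmonized` is an array of structs with a `name` field. The SQL text and the `fetch_patents` helper are illustrative, not the project's queries. `run_query` is a hypothetical injected callable; in practice it would wrap `google.cloud.bigquery.Client.query` with `ScalarQueryParameter`s, but injecting it lets the fallback logic run without BigQuery.

```python
from typing import Any, Callable, Dict, List

# Comprehensive query: match either harmonized assignee names or raw assignee strings.
FULL_QUERY = """
SELECT p.publication_number, p.filing_date
FROM `patents-public-data.patents.publications` AS p
WHERE EXISTS (SELECT 1 FROM UNNEST(p.assignee_harmonized) AS a
              WHERE LOWER(a.name) LIKE @pattern)
   OR EXISTS (SELECT 1 FROM UNNEST(p.assignee) AS a WHERE LOWER(a) LIKE @pattern)
ORDER BY p.filing_date DESC
LIMIT @limit
"""

# Simpler fallback: raw assignee strings only.
SIMPLE_QUERY = """
SELECT p.publication_number, p.filing_date
FROM `patents-public-data.patents.publications` AS p
WHERE EXISTS (SELECT 1 FROM UNNEST(p.assignee) AS a WHERE LOWER(a) LIKE @pattern)
ORDER BY p.filing_date DESC
LIMIT @limit
"""

QueryRunner = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def fetch_patents(run_query: QueryRunner, company: str, limit: int = 10) -> List[Dict[str, Any]]:
    """Try the comprehensive query first; fall back to the simpler one on failure."""
    params = {"pattern": f"%{company.lower()}%", "limit": limit}
    try:
        return run_query(FULL_QUERY, params)
    except Exception:  # a real client would raise e.g. google.api_core.exceptions.BadRequest
        return run_query(SIMPLE_QUERY, params)


calls = []


def fake_runner(sql: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Stand-in for BigQuery that rejects the comprehensive query."""
    calls.append((sql, params))
    if "assignee_harmonized" in sql:
        raise RuntimeError("schema error")
    return [{"publication_number": "US-1234567-B2"}]


rows = fetch_patents(fake_runner, "Anthropic", limit=5)
```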
2. Gemini Rate Limits on Free Tier
- ResourceExhausted errors when making 4+ tool calls rapidly
- Solution: Implemented exponential backoff retry (2s → 4s → 8s → 60s)
- Added 300-second deadline with graceful timeout handling
- Increased output tokens to 16K to reduce multiple generations
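The retry schedule above can be sketched as a small wrapper. This assumes the listed delays are a doubling sequence starting at 2 s and capped at 60 s, with the 300-second deadline covering the whole call; the retryable exception types are passed in (with the Vertex AI SDK that would typically be `google.api_core.exceptions.ResourceExhausted`). `call_with_backoff` is a hypothetical name, and `sleep`/`clock` are injectable only so the example can run without actually waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    deadline: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call fn, retrying retryable errors with doubling delays until the deadline."""
    start = clock()
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            delay = min(base_delay * (2 ** attempt), max_delay)
            # Give up (re-raise) if waiting again would pass the overall deadline
            if clock() - start + delay > deadline:
                raise
            sleep(delay)
            attempt += 1


class RateLimited(Exception):
    """Stand-in for google.api_core.exceptions.ResourceExhausted."""


attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 4:
        raise RateLimited("429 quota exceeded")
    return "ok"


slept = []
result = call_with_backoff(flaky, (RateLimited,), sleep=slept.append)
```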
3. Executive Summary Truncation Bug
- Regex parser stopped at keywords like "strategic" or "patent" in body text
- Thought it found a section header when it was just regular content
- Solution: Required a `##` prefix to distinguish headers from text
- Changed the regex from `(?=Strategic)` to `(?=\n##\s+Strategic Reasoning)`
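The difference between the two lookaheads is easy to reproduce. In this sketch the lookaheads are the ones quoted above; the `## Executive Summary` header and the `\Z` fallback (for reports where the next header is missing) are assumptions about the report format.

```python
import re

# Buggy version: a bare lookahead on a word also fires inside body text
NAIVE = re.compile(r"Executive Summary\s*(.*?)(?=Strategic)", re.DOTALL)

# Fixed version: stop only at a real "## Strategic Reasoning" header on its own line,
# or at the end of the text if that header is missing
FIXED = re.compile(
    r"##\s+Executive Summary\s*(.*?)(?=\n##\s+Strategic Reasoning|\Z)", re.DOTALL
)

report = (
    "## Executive Summary\n"
    "Anthropic's strategic focus is enterprise.\n"
    "Strategic partnerships drive infrastructure scaling.\n"
    "\n"
    "## Strategic Reasoning\n"
    "Details here."
)

naive_summary = NAIVE.search(report).group(1).strip()
fixed_summary = FIXED.search(report).group(1).strip()
```

The naive pattern truncates the summary at the sentence that happens to start with "Strategic", while the fixed pattern keeps both sentences.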
4. Vertex AI SDK Migration
- Preview API (`vertexai.preview.generative_models`) deprecated
- Stable API doesn't support the "default" field in function parameters
- Solution: Migrated to `vertexai.generative_models`, removed unsupported fields
- Updated all imports and tested thoroughly
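Removing unsupported fields can be done mechanically over the existing parameter schemas. A hypothetical helper, not taken from the project: it recursively drops blocked keys and returns a new schema, leaving the original untouched. One caveat of this simple approach is that a property literally named `default` would also be dropped. Any default value then has to be applied inside the tool function or mentioned in the parameter description instead.

```python
from typing import Any, Iterable


def strip_unsupported_fields(schema: Any, unsupported: Iterable[str] = ("default",)) -> Any:
    """Recursively drop schema keys the stable SDK rejects, returning a new schema."""
    blocked = set(unsupported)
    if isinstance(schema, dict):
        return {
            key: strip_unsupported_fields(value, blocked)
            for key, value in schema.items()
            if key not in blocked
        }
    if isinstance(schema, list):
        return [strip_unsupported_fields(value, blocked) for value in schema]
    return schema


preview_params = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "limit": {"type": "integer", "default": 10},
    },
    "required": ["company"],
}
stable_params = strip_unsupported_fields(preview_params)
```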
5. Real Patent Data Access - The Fivetran Solution
- Problem: BigQuery Patents Public Dataset has 1-3 month sync lag for recent USPTO filings
- Anthropic's 6 patents (filed 2024-2025) not yet in BigQuery = blind spot for recent competitive intelligence
- Solution: Built custom Fivetran connector
- Fetches patents directly from Google Patents API (zero lag)
- Syncs to the BigQuery `patent_intelligence` dataset with UPSERT operations
- Automated daily/hourly refresh schedule
- Seamlessly integrates with existing BigQuery queries
- Result: Agent now queries both historical BigQuery dataset AND recent USPTO filings via Fivetran
- Provides complete analysis with most current patent data
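The incremental part of such a connector can be sketched independently of the SDK. This assumes each fetched record has a `publication_number` (used as the primary key) and an ISO `publication_date`, and that the sync state stores the latest date seen; all of these field names and the `plan_sync` helper are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def plan_sync(
    fetched: List[Dict[str, Any]], state: Dict[str, str]
) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
    """Pick the records to upsert and compute the next cursor.

    Records are keyed by publication_number, so replaying a window that
    overlaps the previous sync updates rows instead of duplicating them.
    """
    cursor = state.get("last_publication_date", "")
    latest: Dict[str, Dict[str, Any]] = {}
    for record in fetched:
        # ISO dates (YYYY-MM-DD) compare correctly as strings; ">=" re-reads the
        # boundary day, which is safe because writes are upserts
        if record["publication_date"] >= cursor:
            latest[record["publication_number"]] = record  # later duplicates win
    new_cursor = max((r["publication_date"] for r in latest.values()), default=cursor)
    return list(latest.values()), {"last_publication_date": new_cursor}


fetched = [
    {"publication_number": "US-1", "publication_date": "2025-01-05", "title": "Boundary day"},
    {"publication_number": "US-2", "publication_date": "2025-01-10", "title": "Agent orchestration"},
    {"publication_number": "US-2", "publication_date": "2025-01-10", "title": "Agent orchestration (corrected)"},
    {"publication_number": "US-0", "publication_date": "2024-12-01", "title": "Before cursor"},
]
rows, next_state = plan_sync(fetched, {"last_publication_date": "2025-01-05"})
```

In a connector built with the Fivetran Connector SDK, each returned row would be emitted through the SDK's upsert operation and the new state saved with a checkpoint, so the next run resumes from the cursor.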
Accomplishments that we're proud of
✅ True Agentic Autonomy - Agent makes real decisions instead of following scripts
- Chooses which tools to call based on query context
- Executes multiple tool calls in parallel
- Adapts strategy when data unavailable
✅ ~99.5% Cost Reduction - $50,000/year → $20/month ($240/year)
- Serverless architecture (pay only for usage)
- No expensive enterprise platform licenses
- Gemini 2.5 Pro highly cost-efficient
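As a quick check on the cost figure, the $50,000/year and $20/month numbers can be compared on an annual basis (a minimal arithmetic sketch; the function name is illustrative):

```python
def cost_reduction_pct(before_per_year: float, after_per_month: float) -> float:
    """Percentage saved when an annual cost is replaced by a monthly one."""
    after_per_year = after_per_month * 12
    return 100 * (1 - after_per_year / before_per_year)


saving = cost_reduction_pct(50_000, 20)  # $20/month is $240/year
```

This comes to roughly 99.5%; a 99.96% reduction would correspond to $20 per year.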
✅ Hours to Seconds - 20 hours/week of manual research → under 60 seconds per analysis
- Parallel data collection via Cloud Functions
- Real-time BigQuery queries
- Instant AI synthesis
✅ Transparent Reasoning - Shows thinking process
- "Analysis Strategy" section before tool calls
- "Initial Findings Review" after data collection
- Explains evidence for every claim
- Builds user trust through explainability
✅ Multi-Signal Correlation - Connects patterns across 4 independent sources
- Patents + Jobs + News + GitHub = comprehensive view
- Finds insights humans miss (e.g., patent cluster + hiring spike = product launch)
✅ Production Quality - Enterprise-ready implementation
- Error handling and retry logic
- Graceful degradation when data missing
- Comprehensive logging for debugging
- Auto-scaling serverless infrastructure
✅ Custom Fivetran Connector - Bridges BigQuery patent data lag
- Fetches USPTO patents in real-time (zero lag vs 60-90 day BigQuery lag)
- Automated sync pipeline with UPSERT deduplication
- Enriches BigQuery Patents Public Dataset with recent filings
- Critical for detecting competitor pivots immediately
✅ Real Business Value - Solves actual $50K problem for competitive intelligence teams
What I learned
Technical Learnings:
Gemini 2.5 Pro's Function Calling is Exceptional
- Reliably chooses correct tools based on query context
- Handles complex multi-tool scenarios autonomously
- Understands when to call tools multiple times
- Much better than GPT-4 function calling in our testing
BigQuery Public Datasets are Powerful but Complex
- 100M+ patents available for free
- Schema requires deep understanding (ARRAY vs STRUCT)
- `assignee_harmonized` field crucial for standardization
- 1-3 month sync lag for very recent patents → solved with Fivetran
Fivetran Custom Connectors Enable Real-Time Data Pipelines
- Built Python connector using Fivetran SDK
- Automated sync schedule with configurable frequency
- UPSERT operations prevent duplicate patent records
- Seamless BigQuery integration (writes to custom dataset)
- Critical for time-sensitive competitive intelligence
Firestore Perfect for Multi-Source Aggregation
- Flexible schema handles different data types
- Real-time updates from Cloud Functions
- Simple queries with `.where()` filters
- Cost-effective for moderate data volumes
Cloud Run Simplicity Accelerates Development
- Just need a Dockerfile
- Auto-scaling handles unpredictable AI workloads
- Deploy in minutes with `gcloud run deploy`
- No infrastructure management
AI Agent Design Learnings:
Show Your Work = Trust
- Users trust AI more when they see reasoning
- "Analysis Strategy" before action builds confidence
- Citing specific evidence (patent numbers, job titles) crucial
- Transparency > black box magic
Graceful Degradation is Essential
- Data sources fail (APIs down, rate limits, no data)
- Agent must adapt, not crash
- "No patents found → focus on other 3 sources" = good UX
- Communicate what's missing and why it's okay
Cross-Signal Correlation = Magic Moment
- Single source = data dump
- Multiple sources correlated = strategic insight
- "Patents on X + Jobs hiring for X + News about X = Prediction Y" = value
- This is where AI > human analyst (pattern recognition at scale)
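A toy version of the "Patents on X + Jobs hiring for X + News about X" pattern: a theme counts as corroborated when it shows up in several independent sources. In IntelAgent the correlation is done by Gemini in its reasoning, not by keyword matching; this hypothetical `corroborate` function and its thresholds only illustrate why agreement across sources raises confidence.

```python
from typing import Dict, List

SOURCES = ("patents", "jobs", "news", "github")


def corroborate(theme: str, signals: Dict[str, List[str]]) -> Dict[str, object]:
    """Count how many independent sources mention a theme and grade confidence.

    Illustrative thresholds: 3+ sources is HIGH, 2 is MEDIUM, otherwise LOW.
    """
    theme = theme.lower()
    supporting = [
        source for source in SOURCES
        if any(theme in item.lower() for item in signals.get(source, []))
    ]
    count = len(supporting)
    level = "HIGH" if count >= 3 else "MEDIUM" if count == 2 else "LOW"
    return {"theme": theme, "sources": supporting, "confidence": level}


signals = {
    "patents": ["Enterprise safety controls for AI agents"],
    "jobs": ["Enterprise Account Executive", "Research Engineer"],
    "news": ["Google Cloud partnership targets enterprise customers"],
    "github": ["Agent SDK examples"],
}
finding = corroborate("enterprise", signals)
```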
Structured Output Matches Analyst Workflows
- Executive Summary → Decision makers
- Detailed Analysis → Strategy teams
- Predictions → Planning teams
- Format matters as much as content
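One way to keep output in those audience-specific sections is to fill a fixed structure and render it with `##` headers, which also gives a summary parser reliable section boundaries. A sketch with hypothetical class and field names; the project's actual report is generated by Gemini following its system instruction.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Prediction:
    horizon_days: int  # 30, 60 or 90
    statement: str
    confidence: str  # e.g. "HIGH", "MEDIUM", "LOW"
    evidence: List[str] = field(default_factory=list)


@dataclass
class Report:
    executive_summary: str
    detailed_analysis: str
    predictions: List[Prediction]

    def to_markdown(self) -> str:
        """Render one '## ' section per audience, in a fixed order."""
        lines = [
            "## Executive Summary", self.executive_summary, "",
            "## Detailed Analysis", self.detailed_analysis, "",
            "## Predictions",
        ]
        for pred in sorted(self.predictions, key=lambda pred: pred.horizon_days):
            lines.append(f"- {pred.horizon_days} days ({pred.confidence}): {pred.statement}")
            lines.extend(f"  - Evidence: {item}" for item in pred.evidence)
        return "\n".join(lines)


report = Report(
    "Enterprise pivot under way.",
    "Hiring skews toward sales roles.",
    [Prediction(60, "Enterprise platform launch", "HIGH", ["Sales hiring surge"])],
)
md = report.to_markdown()
```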
What's next for IntelAgent
Near-Term (Next 30 Days):
- [ ] LinkedIn integration for employee movement tracking
- [ ] Custom alerts (email/Slack) on strategic shift detection
- [ ] Comparative dashboards (side-by-side company analysis)
- [ ] Export to PDF/PowerPoint for executive presentations
- [ ] API access for programmatic integration
Medium-Term (3-6 Months):
- [ ] Funding database integration (Crunchbase, PitchBook)
- [ ] Sentiment analysis on news coverage (NLP deep dive)
- [ ] Custom data source connectors (user-supplied APIs)
- [ ] Multi-company monitoring dashboard
- [ ] Historical trend analysis and pattern detection
Long-Term (6-12 Months):
- [ ] Multi-modal analysis (earnings call videos, slide decks)
- [ ] Predictive ML models trained on historical accuracy
- [ ] Customer success stories and case studies
- [ ] Enterprise features (SSO, audit trails, role-based access)
- [ ] White-label offering for consulting firms
Production Hardening:
- [ ] Increase Vertex AI quotas for enterprise scale
- [ ] Add Redis caching for frequently accessed patents
- [ ] Implement user authentication and usage tracking
- [ ] Build compliance audit trail (SOC 2, GDPR)
- [ ] Add data quality monitoring and alerts
Business Model:
- Free tier: 10 analyses/month
- Pro: $49/month unlimited analyses
- Enterprise: Custom pricing with dedicated support
Built With
- bigquery
- cloud-functions
- cloud-run
- cloud-scheduler
- firestore
- fivetran
- gemini-2-5-pro
- google-cloud
- python
- streamlit
- vertex-ai