Intelligence Agent

Inspiration

What inspired you to build this project?

Traditional competitive intelligence is broken:

Costs $50,000-$150,000/year for enterprise platforms
Requires 20+ hours/week of manual analyst work
Data scattered across sources (patents, jobs, news, code)
Insights arrive too late to inform strategic decisions
Human analysts struggle to correlate patterns across domains

I watched strategy teams spend entire days manually checking USPTO patent filings, scraping job boards, reading news articles, and browsing GitHub repos—only to miss critical signals because the data points were never connected.

The question: What if an AI agent could autonomously gather all this data, correlate patterns humans miss, and deliver strategic predictions in under 60 seconds?

What it does

Explain what your project does in simple terms.

IntelAgent is a fully autonomous AI competitive intelligence platform that:

Gathers Intelligence - Automatically collects data from 4 sources:
- Patents (via Fivetran Custom Connector + BigQuery) → R&D focus and IP strategy
- Job Postings (via Cloud Functions + Firestore) → Hiring patterns and org priorities
- News Coverage (via Cloud Functions + Firestore) → Public narrative and partnerships
- GitHub Activity (via Cloud Functions + Firestore) → Developer ecosystem strategy
Analyzes Patterns - Gemini 2.5 Pro autonomously:
- Chooses which tools to call based on the query
- Executes function calls in parallel for speed
- Cross-correlates signals to find non-obvious insights
- Adapts when data is unavailable (graceful degradation)
Generates Predictions - Produces:
- Executive summary with strategic direction
- Detailed analysis across all 4 intelligence sources
- Evidence-based 30/60/90-day forecasts with confidence levels
- Transparent reasoning showing how conclusions were reached

Example Query: "Analyze Anthropic's strategic direction"

Agent Output (in 45 seconds):

Found 6 patents (all AI agent automation → product direction clear)
Analyzed 224 jobs (67% sales vs 12% R&D → enterprise pivot)
Reviewed 57 news articles (Google Cloud partnership → infrastructure scaling)
Examined 50 GitHub repos (151K stars → strong developer adoption)

Prediction: Enterprise platform launch in 30-60 days (HIGH confidence)
Evidence: Sales hiring surge + Google Cloud partnership + enterprise safety patents + developer SDK maturity
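The hiring split quoted in the sample output (67% sales vs 12% R&D) comes down to grouping job postings by function and taking shares. A minimal sketch of that step, assuming each job record carries a `department` field (a hypothetical field name; the actual Firestore schema is not shown here) and using made-up records rather than the scraped data:

```python
from collections import Counter


def department_shares(jobs):
    """Return each department's share of postings as a percentage (0-100)."""
    counts = Counter(job.get("department", "unknown") for job in jobs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dept: round(100 * n / total, 1) for dept, n in counts.items()}


# Illustrative records only, not real scraped data.
sample_jobs = (
    [{"department": "sales"}] * 6
    + [{"department": "research"}] * 3
    + [{"department": "operations"}] * 1
)
shares = department_shares(sample_jobs)
```

A skew like this is only a signal, not a conclusion; the agent still has to weigh it against the other three sources.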

How I built it

Describe your technical implementation.

Architecture (9 services working together):

Fivetran Custom Connector
  └─ USPTO Patents API → BigQuery patent_intelligence dataset

Cloud Functions Gen 2 (Data Collectors)
  ├─ job-scraper: Greenhouse API → Firestore
  ├─ news-search: Google News RSS → Firestore
  └─ github-activity: GitHub API → Firestore

BigQuery
  ├─ Patents Public Dataset (100M+ patents globally)
  └─ patent_intelligence dataset (recent USPTO filings via Fivetran)

Firestore (NoSQL Database)
  ├─ jobs collection (224 documents for Anthropic)
  ├─ news collection (57 documents)
  └─ github collection (50 repositories)

Cloud Run
  └─ Streamlit app (Python 3.11, auto-scaling)

Vertex AI
  └─ Gemini 2.5 Pro (function calling, 16K output tokens)

Cloud Scheduler
  └─ Automated data refresh (daily)

Cloud Build + Artifact Registry
  └─ CI/CD pipeline

Agent Intelligence:

15,000-character system instruction defining analytical framework
Function calling with 4 tools: get_patents(), get_jobs(), get_news(), get_github()
Iterative reasoning: Shows thinking before/during/after tool execution
Cross-signal correlation: Connects patterns across independent sources
Adaptive behavior: Handles missing data gracefully
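The parallel tool execution and graceful handling of missing data could look roughly like the sketch below. It uses only the standard library and stub functions standing in for get_jobs() and get_patents(); the helper name `run_tools` and the 30-second per-tool timeout are assumptions for illustration, not the project's actual code.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tools(tools, company):
    """Call every tool concurrently; record failures instead of raising."""
    results, missing = {}, {}
    with ThreadPoolExecutor(max_workers=len(tools) or 1) as pool:
        futures = {name: pool.submit(fn, company) for name, fn in tools.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=30)
            except Exception as exc:  # one failing source must not sink the run
                missing[name] = str(exc)
    return results, missing


# Stub tools standing in for the real data collectors.
def fake_get_jobs(company):
    return [{"title": "Account Executive"}]


def fake_get_patents(company):
    raise RuntimeError("no patents found")


results, missing = run_tools(
    {"get_jobs": fake_get_jobs, "get_patents": fake_get_patents}, "ExampleCo"
)
```

Returning `missing` alongside `results` lets the final report state which sources were unavailable and why, rather than silently omitting them.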

Code Highlights:

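# Tool definition for Gemini (parameters follow an OpenAPI-style schema)
from vertexai.generative_models import (
    FunctionDeclaration, GenerationConfig, GenerativeModel, Tool,
)

patent_function = FunctionDeclaration(
    name="get_patents",
    description="Get recent patent filings from BigQuery",
    parameters={
        "type": "object",
        "properties": {
            "company": {"type": "string"},
            "limit": {"type": "integer"},
        },
    },
)

# Bundle the declarations into one tool (the other three are declared the same way)
intelligence_tool = Tool(function_declarations=[patent_function])

# Agent execution with function calling
model = GenerativeModel(
    "gemini-2.5-pro",
    tools=[intelligence_tool],
    generation_config=GenerationConfig(
        temperature=0.7,
        max_output_tokens=16384,
    ),
)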

Production Features:

Exponential backoff retry (handles rate limits)
Error handling with fallbacks
Comprehensive logging
Graceful degradation when data unavailable

Challenges I ran into

What obstacles did you face?

1. BigQuery Patents Schema Complexity

The Patents dataset stores assignee as ARRAY<STRING>, not ARRAY<STRUCT>; the structured names live in assignee_harmonized
Initial queries failed with schema errors
Solution: Used UNNEST(assignee) for the raw strings and the name field of UNNEST(assignee_harmonized) for robust matching
Added a fallback to a simpler query if the comprehensive one fails
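A sketch of the comprehensive-then-simple fallback described above. The table name `patents-public-data.patents.publications`, the selected columns, and the `run_query(sql, params)` callable are assumptions made for illustration; in practice `run_query` would wrap a BigQuery client and bind `@company` as a query parameter. The stand-in runner at the end only exercises the fallback path and returns a placeholder row.

```python
def comprehensive_sql(limit):
    """Match on harmonized assignee names (ARRAY<STRUCT> with a name field)."""
    return f"""
        SELECT DISTINCT p.publication_number, p.filing_date
        FROM `patents-public-data.patents.publications` AS p,
             UNNEST(p.assignee_harmonized) AS ah
        WHERE LOWER(ah.name) LIKE CONCAT('%', LOWER(@company), '%')
        ORDER BY p.filing_date DESC
        LIMIT {int(limit)}
    """


def simple_sql(limit):
    """Fallback: match on the raw assignee strings (ARRAY<STRING>)."""
    return f"""
        SELECT DISTINCT p.publication_number, p.filing_date
        FROM `patents-public-data.patents.publications` AS p,
             UNNEST(p.assignee) AS a
        WHERE LOWER(a) LIKE CONCAT('%', LOWER(@company), '%')
        ORDER BY p.filing_date DESC
        LIMIT {int(limit)}
    """


def query_with_fallback(run_query, company, limit=20):
    """Try the comprehensive query first, then the simpler one if it raises."""
    last_error = None
    for build in (comprehensive_sql, simple_sql):
        try:
            return run_query(build(limit), {"company": company})
        except Exception as exc:
            last_error = exc
    raise last_error


# Stand-in runner that rejects the first query, to exercise the fallback path.
def fake_runner(sql, params):
    if "assignee_harmonized" in sql:
        raise ValueError("schema error")
    return [{"publication_number": "PLACEHOLDER", "company": params["company"]}]


rows = query_with_fallback(fake_runner, "ExampleCo")
```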

2. Gemini Rate Limits on Free Tier

ResourceExhausted errors when making 4+ tool calls rapidly
Solution: Implemented exponential backoff retry (2s → 4s → 8s, then a final 60s wait)
Added a 300-second overall deadline with graceful timeout handling
Raised the output limit to 16K tokens so a full report fits in one generation instead of several
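The retry schedule and deadline above can be sketched as follows. `RateLimitError` is a stand-in for the SDK's ResourceExhausted error, and the function name, injectable `sleep`/`clock` parameters, and the flaky test function are all hypothetical, added so the example runs without any cloud dependency.

```python
import time


class RateLimitError(Exception):
    """Stand-in for the SDK's ResourceExhausted error."""


def call_with_backoff(fn, delays=(2, 4, 8, 60), deadline=300,
                      sleep=time.sleep, clock=time.monotonic):
    """Retry fn on rate-limit errors, waiting delays[i] seconds between tries."""
    start = clock()
    for delay in (*delays, None):  # None marks the final attempt
        try:
            return fn()
        except RateLimitError:
            # Give up when retries are exhausted or the next wait would
            # overrun the overall deadline.
            if delay is None or clock() - start + delay > deadline:
                raise
            sleep(delay)


# Simulated call that is rate-limited twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("quota exceeded")
    return "ok"


waits = []  # record requested waits instead of actually sleeping
result = call_with_backoff(flaky, sleep=waits.append)
```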

3. Executive Summary Truncation Bug

Regex parser stopped at keywords like "strategic" or "patent" in body text
It treated ordinary content as a section header and cut the summary short
Solution: Required a ## prefix to distinguish headers from text
Changed the lookahead from (?=Strategic) to (?=\n##\s+Strategic Reasoning)
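A small reproduction of the truncation and the fix. Only the two lookaheads come from the write-up; the capture prefix, the case-insensitive flag on the old pattern (used here to reproduce the described behavior), and the sample report text are assumptions.

```python
import re

report = (
    "## Executive Summary\n"
    "Anthropic's strategic focus is shifting toward enterprise.\n"
    "Recent patent filings support this.\n"
    "## Strategic Reasoning\n"
    "Sales hiring outpaces R&D hiring.\n"
)

# Old: stops at the first "strategic" anywhere, including body text.
OLD = re.compile(
    r"## Executive Summary\n(.*?)(?=Strategic)", re.DOTALL | re.IGNORECASE
)
# New: stops only at a real "## Strategic Reasoning" header on its own line.
NEW = re.compile(
    r"## Executive Summary\n(.*?)(?=\n##\s+Strategic Reasoning)", re.DOTALL
)

old_summary = OLD.search(report).group(1)
new_summary = NEW.search(report).group(1)
```

With the old pattern the captured summary ends right before the word "strategic" in the first sentence; the new pattern captures both sentences.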

4. Vertex AI SDK Migration

Preview API (vertexai.preview.generative_models) deprecated
Stable API doesn't support "default" field in function parameters
Solution: Migrated to vertexai.generative_models, removed unsupported fields
Updated all imports and tested thoroughly

5. Real Patent Data Access - The Fivetran Solution

Problem: BigQuery Patents Public Dataset has a 1-3 month sync lag for recent USPTO filings
Anthropic's 6 patents (filed 2024-2025) were not yet in BigQuery, leaving a blind spot for recent competitive intelligence
Solution: Built a custom Fivetran connector
- Fetches patents directly from the USPTO Patents API, avoiding the public dataset's sync lag
- Syncs to the BigQuery patent_intelligence dataset with UPSERT operations
- Automated refresh schedule (daily or hourly)
- Integrates with existing BigQuery queries
Result: Agent now queries both the historical BigQuery dataset and recent USPTO filings via Fivetran
This gives a complete analysis with the most current patent data
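The UPSERT behavior is what keeps repeated syncs from duplicating patents. The sketch below illustrates the semantics only: it does not use the Fivetran SDK, a plain dict stands in for the destination table, and `publication_number` as the primary key plus the sample records are assumptions.

```python
def upsert(table, rows, key="publication_number"):
    """Insert new rows and overwrite existing ones that share the same key."""
    for row in rows:
        table[row[key]] = {**table.get(row[key], {}), **row}
    return table


# Placeholder records, not real patent data.
first_sync = [
    {"publication_number": "P-1", "title": "Draft title"},
    {"publication_number": "P-2", "title": "Second filing"},
]
second_sync = [
    {"publication_number": "P-1", "title": "Final title"},  # updated record
    {"publication_number": "P-3", "title": "Third filing"},  # new record
]

table = {}
upsert(table, first_sync)
upsert(table, second_sync)
```

After both syncs the table holds three rows, and the record seen twice carries its latest values.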

Accomplishments that I'm proud of

What achievements stand out?

✅ True Agentic Autonomy - Agent makes real decisions, not following scripts

Chooses which tools to call based on query context
Executes multiple tool calls in parallel
Adapts strategy when data unavailable

✅ ~99.5% Cost Reduction - $50,000/year → $20/month ($240/year)

Serverless architecture (pay only for usage)
No expensive enterprise platform licenses
Gemini 2.5 Pro highly cost-efficient

✅ Roughly 1,000x Speed Improvement - 20 hours/week of manual work → about 60 seconds per analysis

Parallel data collection via Cloud Functions
Real-time BigQuery queries
Instant AI synthesis

✅ Transparent Reasoning - Shows thinking process

"Analysis Strategy" section before tool calls
"Initial Findings Review" after data collection
Explains evidence for every claim
Builds user trust through explainability

✅ Multi-Signal Correlation - Connects patterns across 4 independent sources

Patents + Jobs + News + GitHub = comprehensive view
Finds insights humans miss (e.g., patent cluster + hiring spike = product launch)
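In IntelAgent the correlation itself is done by Gemini, guided by the system instruction. Purely to illustrate the idea of a signal that shows up across independent sources, here is a minimal sketch; the function name, the three-source threshold, and the topic labels are all hypothetical.

```python
def correlated_topics(signals, min_sources=3):
    """Return topics that appear in at least min_sources independent sources."""
    seen = {}
    for source, topics in signals.items():
        for topic in set(topics):
            seen.setdefault(topic, set()).add(source)
    return {t: sorted(s) for t, s in seen.items() if len(s) >= min_sources}


# Illustrative topic labels per source, not real extracted data.
signals = {
    "patents": ["agent automation", "safety"],
    "jobs": ["enterprise sales", "agent automation"],
    "news": ["agent automation", "cloud partnership"],
    "github": ["sdk"],
}
hits = correlated_topics(signals)
```

A topic confirmed by three of four sources is a stronger basis for a forecast than any single source on its own.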

✅ Production Quality - Enterprise-ready implementation

Error handling and retry logic
Graceful degradation when data missing
Comprehensive logging for debugging
Auto-scaling serverless infrastructure

✅ Custom Fivetran Connector - Bridges BigQuery patent data lag

Fetches USPTO patents on each sync (vs the 1-3 month lag in the BigQuery public dataset)
Automated sync pipeline with UPSERT deduplication
Enriches BigQuery Patents Public Dataset with recent filings
Critical for detecting competitor pivots immediately

✅ Real Business Value - Solves actual $50K problem for competitive intelligence teams

What I learned

Key technical and design insights.

Technical Learnings:

Gemini 2.5 Pro's Function Calling is Exceptional
- Reliably chooses correct tools based on query context
- Handles complex multi-tool scenarios autonomously
- Understands when to call tools multiple times
- Much better than GPT-4 function calling in my testing
BigQuery Public Datasets are Powerful but Complex
- 100M+ patents available for free
- Schema requires deep understanding (ARRAY vs STRUCT)
- assignee_harmonized field crucial for standardization
- 1-3 month sync lag for very recent patents → solved with Fivetran
Fivetran Custom Connectors Enable Real-Time Data Pipelines
- Built Python connector using Fivetran SDK
- Automated sync schedule with configurable frequency
- UPSERT operations prevent duplicate patent records
- Seamless BigQuery integration (writes to custom dataset)
- Critical for time-sensitive competitive intelligence
Firestore Perfect for Multi-Source Aggregation
- Flexible schema handles different data types
- Real-time updates from Cloud Functions
- Simple queries with .where() filters
- Cost-effective for moderate data volumes
Cloud Run Simplicity Accelerates Development
- Just need a Dockerfile
- Auto-scaling handles unpredictable AI workloads
- Deploy in minutes with gcloud run deploy
- No infrastructure management

AI Agent Design Learnings:

Show Your Work = Trust
- Users trust AI more when they see reasoning
- "Analysis Strategy" before action builds confidence
- Citing specific evidence (patent numbers, job titles) crucial
- Transparency > black box magic
Graceful Degradation is Essential
- Data sources fail (APIs down, rate limits, no data)
- Agent must adapt, not crash
- "No patents found → focus on other 3 sources" = good UX
- Communicate what's missing and why it's okay
Cross-Signal Correlation = Magic Moment
- Single source = data dump
- Multiple sources correlated = strategic insight
- "Patents on X + Jobs hiring for X + News about X = Prediction Y" = value
- This is where AI > human analyst (pattern recognition at scale)
Structured Output Matches Analyst Workflows
- Executive Summary → Decision makers
- Detailed Analysis → Strategy teams
- Predictions → Planning teams
- Format matters as much as content

What's next for IntelAgent

Future roadmap and enhancements.

Near-Term (Next 30 Days):

[ ] LinkedIn integration for employee movement tracking
[ ] Custom alerts (email/Slack) on strategic shift detection
[ ] Comparative dashboards (side-by-side company analysis)
[ ] Export to PDF/PowerPoint for executive presentations
[ ] API access for programmatic integration

Medium-Term (3-6 Months):

[ ] Funding database integration (Crunchbase, PitchBook)
[ ] Sentiment analysis on news coverage (NLP deep dive)
[ ] Custom data source connectors (user-supplied APIs)
[ ] Multi-company monitoring dashboard
[ ] Historical trend analysis and pattern detection

Long-Term (6-12 Months):

[ ] Multi-modal analysis (earnings call videos, slide decks)
[ ] Predictive ML models trained on historical accuracy
[ ] Customer success stories and case studies
[ ] Enterprise features (SSO, audit trails, role-based access)
[ ] White-label offering for consulting firms

Production Hardening:

[ ] Increase Vertex AI quotas for enterprise scale
[ ] Add Redis caching for frequently accessed patents
[ ] Implement user authentication and usage tracking
[ ] Build compliance audit trail (SOC 2, GDPR)
[ ] Add data quality monitoring and alerts

Business Model:

Free tier: 10 analyses/month
Pro: $49/month unlimited analyses
Enterprise: Custom pricing with dedicated support

Built With

bigquery
cloud-functions
cloud-run
cloud-scheduler
firestore
fivetran
gemini-2-5-pro
google-cloud
python
streamlit
vertex-ai

Updates

richelgomez99 Gomez started this project — Oct 24, 2025 04:57 PM EDT
