As someone navigating the job market and building sales pipelines, I noticed a frustrating pattern: the best opportunities weren't on job boards or in CRM databases—they were hidden in plain sight. Recruiters posting referral threads on Reddit, hiring managers announcing openings on LinkedIn, potential customers complaining about problems on X (Twitter).

I thought: What if we could automatically scan these platforms and surface actionable signals—not just discussions, but posts where someone can actually do something?

That's how Signal AI was born: an AI-powered scout that finds hiring signals for job seekers and buying signals for sales teams by intelligently filtering through thousands of posts across LinkedIn, Reddit, and X.

What I Learned

1. The Challenge of Semantic Search

Traditional keyword matching fails when people use different wording. "PM" vs "Product Manager," "Seattle area" vs "Greater Seattle," "hiring" vs "looking for"—these pairs are semantically equivalent but share almost no keywords. I learned that LLMs are remarkably effective at semantic matching: they understand intent beyond the exact words used.

2. The Cost-Performance Trade-off

Early on, I used Google Custom Search Engine (CSE) for everything. It was simple but had limitations:

  • Indexing delay: 1-7 days for new content
  • Limited freshness: Recent posts often missed
  • API costs: Can add up quickly

I learned to balance:

  • Direct APIs for Reddit and X, plus Google CSE for LinkedIn, for near-real-time results
  • Batch LLM processing to reduce costs while maintaining quality
  • Heuristic pre-filtering before expensive LLM calls

3. Quality Over Quantity

Initially, I was too generous with results—84% were marked as LOW intent. I learned that strict filtering is better than showing everything. Users want actionable opportunities, not noise. This meant:

  • Strict LLM scoring criteria
  • Always filtering LOW signals
  • Prioritizing posts with clear actionability (DM, email, referral links)

4. The Importance of Deduplication

When searching across multiple platforms with overlapping queries, duplicates are inevitable. I implemented three-stage deduplication:

  1. URL deduplication: Remove exact duplicates
  2. Account deduplication: Limit to 1 post per account/user
  3. Final normalization: Handle URL variations (query params, fragments)
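
The three stages above can be sketched in Python. This is a minimal illustration rather than the project's actual code (the `normalize_url` helper, the dict keys, and the fact that normalization subsumes the exact-URL stage are assumptions):

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Strip query params and fragments so URL variants collapse to one key."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path.rstrip("/"), "", ""))

def deduplicate(posts):
    """Drop exact/normalized URL duplicates, and keep at most one post per account."""
    seen_urls, seen_accounts, unique = set(), set(), []
    for post in posts:
        key = normalize_url(post["url"])      # stages 1 and 3: exact + normalized URL
        account = post.get("account")         # stage 2: one post per author
        if key in seen_urls or (account and account in seen_accounts):
            continue
        seen_urls.add(key)
        if account:
            seen_accounts.add(account)
        unique.append(post)
    return unique
```

Normalizing before the set lookup means `?utm_source=...` variants and trailing fragments hash to the same key, so the exact-duplicate and variant-duplicate checks become a single pass.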

How I Built It

Architecture

Backend (Python/FastAPI)

  • FastAPI for async API endpoints
  • httpx.AsyncClient with connection pooling for performance
  • Server-Sent Events (SSE) for streaming responses
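
For the SSE piece, the wire format is simple enough to show directly. This is a hedged sketch of the message framing only (the payload shape is invented for the example); in FastAPI you would yield these strings from an async generator wrapped in a `StreamingResponse` with `media_type="text/event-stream"`:

```python
import json

def sse_event(data, event=None):
    """Format one Server-Sent Events message: optional 'event:' line,
    then 'data:' with a JSON payload, terminated by a blank line."""
    msg = ""
    if event:
        msg += f"event: {event}\n"
    msg += f"data: {json.dumps(data)}\n\n"
    return msg
```

The trailing blank line is what tells the browser's `EventSource` that the message is complete, which is why each event ends in `\n\n`.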

Search Strategy

  • Multi-platform search: Reddit API, X API v2, Google CSE for LinkedIn
  • Two-stage ranking:
    1. Fast heuristic filtering (keywords, recency, source quality)
    2. LLM semantic matching on top candidates
  • Batch LLM processing: Score 5 posts per call, 5 parallel batches = 25 posts simultaneously
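
To make the first ranking stage concrete, here is a toy version of the heuristic pre-filter. The weights, keyword list, and source scores are invented for illustration; the real criteria are the project's own:

```python
from datetime import datetime, timezone

HIRING_KEYWORDS = ("hiring", "looking for", "open role", "join us")
SOURCE_QUALITY = {"linkedin": 2.0, "reddit": 1.0, "x": 1.0}

def heuristic_score(post, now=None):
    """Cheap pre-filter: keyword hits + recency + source quality.
    Only the top-scoring posts go on to the expensive LLM stage."""
    now = now or datetime.now(timezone.utc)
    text = post["text"].lower()
    keyword_score = sum(kw in text for kw in HIRING_KEYWORDS)
    age_days = (now - post["created_at"]).days
    recency_score = max(0.0, 1.0 - age_days / 30)   # decays to 0 after a month
    source_score = SOURCE_QUALITY.get(post["source"], 0.5)
    return keyword_score + recency_score + source_score

def prefilter(posts, top_n=100):
    return sorted(posts, key=heuristic_score, reverse=True)[:top_n]
```

Because this stage is pure string and arithmetic work, it can cut 300+ candidates down to ~100 in milliseconds before any LLM tokens are spent.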

Frontend (Vanilla JavaScript)

  • Dual-card landing page (Find Jobs / Find Customers)
  • Full-page flow views for focused experience
  • Real-time loading indicators
  • Results tables with intent badges

Key Features

Find Jobs Flow:

  1. Parse natural language query → extract roles, companies, locations
  2. Generate diverse search queries (20+ variations)
  3. Search across platforms (LinkedIn-focused for jobs)
  4. Heuristic ranking → top 200 posts
  5. LLM semantic scoring → HIGH/MEDIUM intent only
  6. Deduplicate by URL and account
  7. Return top 50 actionable signals
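
Step 2 of the flow above can be sketched as simple query expansion from the parsed fields. The synonym table and query format here are illustrative assumptions, not the project's actual generation logic (which is LLM-driven):

```python
from itertools import product

ROLE_SYNONYMS = {"product manager": ["Product Manager", "PM", "product lead"]}
HIRING_PHRASES = ["hiring", "looking for", "open role"]

def expand_queries(role, location):
    """Turn one parsed role/location pair into many keyword variations
    so the search APIs catch different phrasings of the same opening."""
    roles = ROLE_SYNONYMS.get(role.lower(), [role])
    return [f'"{r}" {phrase} {location}' for r, phrase in product(roles, HIRING_PHRASES)]
```

Even this tiny table yields 9 variations per role/location pair, which is how a single natural-language query fans out into 20+ platform searches.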

Find Customers Flow:

  1. Extract ICP from website URL
  2. Generate GTM topics
  3. Create search keywords
  4. Search for buying signals
  5. LLM filter for relevance
  6. Return actionable customer signals

Challenges Faced

1. Performance Optimization

Problem: Initial searches took 2+ minutes—too slow for users.

Solution:

  • Implemented batch LLM processing (5 posts per call instead of one call per post)
  • Parallel batch execution (5 batches simultaneously)
  • Heuristic pre-filtering (reduce 300+ posts → 100 before LLM)
  • Result: 10-20 seconds instead of 2+ minutes

2. API Cost Management

Problem: SerpApi was expensive ($50/month for 5,000 searches).

Solution:

  • Switched LinkedIn search to Google CSE (much cheaper: $5 per 1,000 queries)
  • Kept direct APIs for Reddit and X (real-time, free tier available)
  • Reduced costs by ~80% while maintaining quality

3. Quality vs Quantity

Problem: Too many LOW intent results (84% of results).

Solution:

  • Made LLM scoring stricter (HIGH requires: hiring signal + way to apply + role match + location/company)
  • Always filter LOW signals (quality over quantity)
  • Improved prompts to emphasize actionability
  • Result: 0% LOW signals, only HIGH/MEDIUM actionable posts

4. Duplicate Detection

Problem: Same posts appearing multiple times from different queries.

Solution:

  • Three-stage deduplication (URL → account → normalization)
  • Account-based deduplication (limit 1 post per user/account)
  • Normalize URLs (remove query params, fragments)

5. Flow Isolation

Problem: Job search results appearing in customer flow (and vice versa).

Solution:

  • Clear results when switching flows
  • Separate global state variables
  • Reset all chips, inputs, and data on flow switch

6. LinkedIn Post Quality

Problem: Getting job board links instead of actual posts.

Solution:

  • Filter out /jobs/ paths
  • Focus on /posts/, /activity/, /feed/ URLs
  • Check title patterns for job aggregators
  • Use semantic matching to identify real posts
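
A minimal sketch of that URL-and-title filtering, using the path fragments listed above (the aggregator title patterns are invented examples, not the project's real list):

```python
from urllib.parse import urlsplit

POST_PATHS = ("/posts/", "/activity/", "/feed/")
AGGREGATOR_TITLES = ("jobs in", "careers at", "job openings")  # hypothetical patterns

def looks_like_real_post(url, title):
    """Keep organic LinkedIn posts; drop job-board pages and aggregator listings."""
    path = urlsplit(url).path
    if "/jobs/" in path:
        return False
    if not any(p in path for p in POST_PATHS):
        return False
    return not any(t in title.lower() for t in AGGREGATOR_TITLES)
```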

7. Date Filtering Accuracy

Problem: Google CSE date filtering wasn't reliable.

Solution:

  • Implemented client-side date parsing with python-dateutil
  • Strict date validation
  • Fallback for posts without clear dates
  • User-configurable date ranges (1 month to 2 years)
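
The project parses dates with python-dateutil; the sketch below uses only the standard library to show the validation shape, and the keep-on-missing-date fallback is my assumption about the behavior described above:

```python
from datetime import datetime, timedelta, timezone

def within_range(date_str, max_age_days, now=None):
    """Validate a post's date client-side instead of trusting CSE date filters.
    Posts without a parseable date fall back to being kept, on the assumption
    that the later LLM stage can judge them."""
    now = now or datetime.now(timezone.utc)
    if not date_str:
        return True  # fallback: no date available
    try:
        posted = datetime.fromisoformat(date_str)
    except ValueError:
        return True  # unparseable date: keep rather than drop
    if posted.tzinfo is None:
        posted = posted.replace(tzinfo=timezone.utc)
    return now - posted <= timedelta(days=max_age_days)
```

The user-configurable ranges then just map to `max_age_days` values (30 for "1 month", 730 for "2 years").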

Technical Highlights

Batch LLM Processing

import asyncio

async def score_all(batches, score_batch):
    # 5 posts per batch, 5 batches per gather() call = 25 posts scored at once
    for g in range(0, len(batches), 5):
        await asyncio.gather(*(score_batch(b) for b in batches[g:g + 5]))

Semantic Matching

Instead of exact keyword matching, the LLM understands:

  • "PM" = "Product Manager"
  • "Seattle area" = "Greater Seattle"
  • "hiring" = "looking for" = "open role"

Intent Scoring

  • HIGH: Clear hiring signal + way to apply + role match + location/company
  • MEDIUM: Hiring signal but less actionable
  • LOW: Filtered out (quality over quantity)
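
The rubric above reduces to a small decision function. This mirrors the stated criteria as rule-based code for clarity; in the actual system the LLM applies these rules from the prompt:

```python
def score_intent(has_hiring_signal, has_apply_path, role_match, location_or_company_match):
    """HIGH needs every criterion; MEDIUM needs at least a hiring signal;
    everything else is LOW and gets filtered out before results are shown."""
    if has_hiring_signal and has_apply_path and role_match and location_or_company_match:
        return "HIGH"
    if has_hiring_signal:
        return "MEDIUM"
    return "LOW"
```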

Results

  • Search Speed: 10-20 seconds (down from 2+ minutes)
  • Cost Reduction: ~80% cheaper API costs
  • Quality: 0% LOW signals, only actionable HIGH/MEDIUM
  • Coverage: LinkedIn-focused for jobs, balanced for customers
  • Deduplication: Zero duplicate posts

What's Next

  • Real-time monitoring: Set up alerts for specific keywords/roles
  • Personalized scoring: Learn from user feedback to improve ranking
  • More platforms: Expand to other communities (Discord, Slack, etc.)
  • Batch exports: Export results to CSV/Google Sheets
  • API access: Let other apps integrate Signal Finder

Takeaways

  1. Semantic understanding beats keyword matching - LLMs unlock intent-based search
  2. Batch processing is essential - Parallel LLM calls dramatically improve performance
  3. Quality > Quantity - Users prefer fewer, actionable results over many irrelevant ones
  4. Cost optimization matters - Direct APIs + smart caching reduce expenses significantly
  5. User experience first - Clean UI, loading indicators, and flow isolation make all the difference
