As someone navigating the job market and building sales pipelines, I noticed a frustrating pattern: the best opportunities weren't on job boards or in CRM databases—they were hidden in plain sight. Recruiters posting referral threads on Reddit, hiring managers announcing openings on LinkedIn, potential customers complaining about problems on X (Twitter).

I thought: What if we could automatically scan these platforms and surface actionable signals—not just discussions, but posts where someone can actually do something?

That's how Signal AI was born: an AI-powered scout that finds hiring signals for job seekers and buying signals for sales teams by intelligently filtering through thousands of posts across LinkedIn, Reddit, and X.

What I Learned

1. The Challenge of Semantic Search

Traditional keyword matching fails when people use different wording. "PM" vs "Product Manager," "Seattle area" vs "Greater Seattle," "hiring" vs "looking for"—these pairs are semantically equivalent but share almost no keywords. I learned that LLMs are remarkably effective at semantic matching: they understand intent beyond the exact words used.

2. The Cost-Performance Trade-off

Early on, I used Google Custom Search Engine (CSE) for everything. It was simple but had limitations:

  • Indexing delay: 1-7 days for new content
  • Limited freshness: Recent posts often missed
  • API costs: Can add up quickly

I learned to balance:

  • Direct APIs for Reddit and X, plus Google CSE for LinkedIn, for near-real-time results
  • Batch LLM processing to reduce costs while maintaining quality
  • Heuristic pre-filtering before expensive LLM calls

3. Quality Over Quantity

Initially, I was too generous with results—84% were marked as LOW intent. I learned that strict filtering is better than showing everything. Users want actionable opportunities, not noise. This meant:

  • Strict LLM scoring criteria
  • Always filtering LOW signals
  • Prioritizing posts with clear actionability (DM, email, referral links)

4. The Importance of Deduplication

When searching across multiple platforms with overlapping queries, duplicates are inevitable. I implemented three-stage deduplication:

  1. URL deduplication: Remove exact duplicates
  2. Account deduplication: Limit to 1 post per account/user
  3. Final normalization: Handle URL variations (query params, fragments)
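
The three stages above can be sketched in Python. This is a minimal illustration rather than the project's actual code (the `normalize_url` helper, the dict keys, and the fact that normalization subsumes the exact-URL stage are assumptions):

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Strip query params and fragments so URL variants collapse to one key."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path.rstrip("/"), "", ""))

def deduplicate(posts):
    """Drop exact/normalized URL duplicates, and keep at most one post per account."""
    seen_urls, seen_accounts, unique = set(), set(), []
    for post in posts:
        key = normalize_url(post["url"])      # stages 1 and 3: exact + normalized URL
        account = post.get("account")         # stage 2: one post per author
        if key in seen_urls or (account and account in seen_accounts):
            continue
        seen_urls.add(key)
        if account:
            seen_accounts.add(account)
        unique.append(post)
    return unique
```

Normalizing before the set lookup means `?utm_source=...` variants and trailing fragments hash to the same key, so the exact-duplicate and variant-duplicate checks become a single pass.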

How I Built It

Architecture

Backend (Python/FastAPI)

  • FastAPI for async API endpoints
  • httpx.AsyncClient with connection pooling for performance
  • Server-Sent Events (SSE) for streaming responses
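
For the SSE piece, the wire format is simple enough to show directly. This is a hedged sketch of the message framing only (the payload shape is invented for the example); in FastAPI you would yield these strings from an async generator wrapped in a `StreamingResponse` with `media_type="text/event-stream"`:

```python
import json

def sse_event(data, event=None):
    """Format one Server-Sent Events message: optional 'event:' line,
    then 'data:' with a JSON payload, terminated by a blank line."""
    msg = ""
    if event:
        msg += f"event: {event}\n"
    msg += f"data: {json.dumps(data)}\n\n"
    return msg
```

The trailing blank line is what tells the browser's `EventSource` that the message is complete, which is why each event ends in `\n\n`.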

Search Strategy

  • Multi-platform search: Reddit API, X API v2, Google CSE for LinkedIn
  • Two-stage ranking:
    1. Fast heuristic filtering (keywords, recency, source quality)
    2. LLM semantic matching on top candidates
  • Batch LLM processing: Score 5 posts per call, 5 parallel batches = 25 posts simultaneously
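
To make the first ranking stage concrete, here is a toy version of the heuristic pre-filter. The weights, keyword list, and source scores are invented for illustration; the real criteria are the project's own:

```python
from datetime import datetime, timezone

HIRING_KEYWORDS = ("hiring", "looking for", "open role", "join us")
SOURCE_QUALITY = {"linkedin": 2.0, "reddit": 1.0, "x": 1.0}

def heuristic_score(post, now=None):
    """Cheap pre-filter: keyword hits + recency + source quality.
    Only the top-scoring posts go on to the expensive LLM stage."""
    now = now or datetime.now(timezone.utc)
    text = post["text"].lower()
    keyword_score = sum(kw in text for kw in HIRING_KEYWORDS)
    age_days = (now - post["created_at"]).days
    recency_score = max(0.0, 1.0 - age_days / 30)   # decays to 0 after a month
    source_score = SOURCE_QUALITY.get(post["source"], 0.5)
    return keyword_score + recency_score + source_score

def prefilter(posts, top_n=100):
    return sorted(posts, key=heuristic_score, reverse=True)[:top_n]
```

Because this stage is pure string and arithmetic work, it can cut 300+ candidates down to ~100 in milliseconds before any LLM tokens are spent.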

Frontend (Vanilla JavaScript)

  • Dual-card landing page (Find Jobs / Find Customers)
  • Full-page flow views for focused experience
  • Real-time loading indicators
  • Results tables with intent badges

Key Features

Find Jobs Flow:

  1. Parse natural language query → extract roles, companies, locations
  2. Generate diverse search queries (20+ variations)
  3. Search across platforms (LinkedIn-focused for jobs)
  4. Heuristic ranking → top 200 posts
  5. LLM semantic scoring → HIGH/MEDIUM intent only
  6. Deduplicate by URL and account
  7. Return top 50 actionable signals
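
Step 2 of the flow above can be sketched as simple query expansion from the parsed fields. The synonym table and query format here are illustrative assumptions, not the project's actual generation logic (which is LLM-driven):

```python
from itertools import product

ROLE_SYNONYMS = {"product manager": ["Product Manager", "PM", "product lead"]}
HIRING_PHRASES = ["hiring", "looking for", "open role"]

def expand_queries(role, location):
    """Turn one parsed role/location pair into many keyword variations
    so the search APIs catch different phrasings of the same opening."""
    roles = ROLE_SYNONYMS.get(role.lower(), [role])
    return [f'"{r}" {phrase} {location}' for r, phrase in product(roles, HIRING_PHRASES)]
```

Even this tiny table yields 9 variations per role/location pair, which is how a single natural-language query fans out into 20+ platform searches.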

Find Customers Flow:

  1. Extract ICP from website URL
  2. Generate GTM topics
  3. Create search keywords
  4. Search for buying signals
  5. LLM filter for relevance
  6. Return actionable customer signals

Challenges Faced

1. Performance Optimization

Problem: Initial searches took 2+ minutes—too slow for users.

Solution:

  • Implemented batch LLM processing (5 posts per call instead of one call per post)
  • Parallel batch execution (5 batches simultaneously)
  • Heuristic pre-filtering (reduce 300+ posts → 100 before LLM)
  • Result: 10-20 seconds instead of 2+ minutes

2. API Cost Management

Problem: SerpApi was expensive ($50/month for 5,000 searches).

Solution:

  • Switched LinkedIn search to Google CSE (much cheaper: $5 per 1,000 queries)
  • Kept direct APIs for Reddit and X (real-time, free tier available)
  • Reduced costs by ~80% while maintaining quality

3. Quality vs Quantity

Problem: Too many LOW intent results (84% of results).

Solution:

  • Made LLM scoring stricter (HIGH requires: hiring signal + way to apply + role match + location/company)
  • Always filter LOW signals (quality over quantity)
  • Improved prompts to emphasize actionability
  • Result: 0% LOW signals, only HIGH/MEDIUM actionable posts

4. Duplicate Detection

Problem: Same posts appearing multiple times from different queries.

Solution:

  • Three-stage deduplication (URL → account → normalization)
  • Account-based deduplication (limit 1 post per user/account)
  • Normalize URLs (remove query params, fragments)

5. Flow Isolation

Problem: Job search results appearing in customer flow (and vice versa).

Solution:

  • Clear results when switching flows
  • Separate global state variables
  • Reset all chips, inputs, and data on flow switch

6. LinkedIn Post Quality

Problem: Getting job board links instead of actual posts.

Solution:

  • Filter out /jobs/ paths
  • Focus on /posts/, /activity/, /feed/ URLs
  • Check title patterns for job aggregators
  • Use semantic matching to identify real posts
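
A minimal sketch of that URL-and-title filtering, using the path fragments listed above (the aggregator title patterns are invented examples, not the project's real list):

```python
from urllib.parse import urlsplit

POST_PATHS = ("/posts/", "/activity/", "/feed/")
AGGREGATOR_TITLES = ("jobs in", "careers at", "job openings")  # hypothetical patterns

def looks_like_real_post(url, title):
    """Keep organic LinkedIn posts; drop job-board pages and aggregator listings."""
    path = urlsplit(url).path
    if "/jobs/" in path:
        return False
    if not any(p in path for p in POST_PATHS):
        return False
    return not any(t in title.lower() for t in AGGREGATOR_TITLES)
```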

7. Date Filtering Accuracy

Problem: Google CSE date filtering wasn't reliable.

Solution:

  • Implemented client-side date parsing with python-dateutil
  • Strict date validation
  • Fallback for posts without clear dates
  • User-configurable date ranges (1 month to 2 years)
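
The project parses dates with python-dateutil; the sketch below uses only the standard library to show the validation shape, and the keep-on-missing-date fallback is my assumption about the behavior described above:

```python
from datetime import datetime, timedelta, timezone

def within_range(date_str, max_age_days, now=None):
    """Validate a post's date client-side instead of trusting CSE date filters.
    Posts without a parseable date fall back to being kept, on the assumption
    that the later LLM stage can judge them."""
    now = now or datetime.now(timezone.utc)
    if not date_str:
        return True  # fallback: no date available
    try:
        posted = datetime.fromisoformat(date_str)
    except ValueError:
        return True  # unparseable date: keep rather than drop
    if posted.tzinfo is None:
        posted = posted.replace(tzinfo=timezone.utc)
    return now - posted <= timedelta(days=max_age_days)
```

The user-configurable ranges then just map to `max_age_days` values (30 for "1 month", 730 for "2 years").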

Technical Highlights

Batch LLM Processing

import asyncio

async def score_all(batches, score_batch):
    # 5 posts per batch, 5 batches per gather() call = 25 posts scored at once
    for g in range(0, len(batches), 5):
        await asyncio.gather(*(score_batch(b) for b in batches[g:g + 5]))

Semantic Matching

Instead of exact keyword matching, the LLM understands:

  • "PM" = "Product Manager"
  • "Seattle area" = "Greater Seattle"
  • "hiring" = "looking for" = "open role"

Intent Scoring

  • HIGH: Clear hiring signal + way to apply + role match + location/company
  • MEDIUM: Hiring signal but less actionable
  • LOW: Filtered out (quality over quantity)
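
The rubric above reduces to a small decision function. This mirrors the stated criteria as rule-based code for clarity; in the actual system the LLM applies these rules from the prompt:

```python
def score_intent(has_hiring_signal, has_apply_path, role_match, location_or_company_match):
    """HIGH needs every criterion; MEDIUM needs at least a hiring signal;
    everything else is LOW and gets filtered out before results are shown."""
    if has_hiring_signal and has_apply_path and role_match and location_or_company_match:
        return "HIGH"
    if has_hiring_signal:
        return "MEDIUM"
    return "LOW"
```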

Results

  • Search Speed: 10-20 seconds (down from 2+ minutes)
  • Cost Reduction: ~80% cheaper API costs
  • Quality: 0% LOW signals, only actionable HIGH/MEDIUM
  • Coverage: LinkedIn-focused for jobs, balanced for customers
  • Deduplication: Zero duplicate posts

What's Next

  • Real-time monitoring: Set up alerts for specific keywords/roles
  • Personalized scoring: Learn from user feedback to improve ranking
  • More platforms: Expand to other communities (Discord, Slack, etc.)
  • Batch exports: Export results to CSV/Google Sheets
  • API access: Let other apps integrate Signal Finder

Takeaways

  1. Semantic understanding beats keyword matching - LLMs unlock intent-based search
  2. Batch processing is essential - Parallel LLM calls dramatically improve performance
  3. Quality > Quantity - Users prefer fewer, actionable results over many irrelevant ones
  4. Cost optimization matters - Direct APIs + smart caching reduce expenses significantly
  5. User experience first - Clean UI, loading indicators, and flow isolation make all the difference
