Info Hunter - Project Description

💡 Inspiration

As developers, we've all experienced the frustration of spending hours jumping between GitHub repositories, Stack Overflow threads, documentation sites, and blog posts just to find that one code example that solves our problem. The information we need is scattered across the web, and traditional search engines often return hundreds of results with minimal relevance.

We built Info Hunter to solve a real pain point: by our estimate, developers can lose 2-3 hours a day searching for code examples and solutions. We wanted to create a single, intelligent search interface that aggregates programming knowledge from across the web and understands what developers are actually looking for, going beyond keyword matching to interpret intent and context.

The idea was born from the recognition that while we have amazing resources like GitHub, Stack Overflow, and technical blogs, there wasn't a unified way to search them all intelligently. Info Hunter fills that gap with AI-powered semantic search that makes finding relevant code examples as easy as asking a question.

🎯 What It Does

Info Hunter is a developer knowledge aggregator and search engine that makes finding code examples and programming solutions faster and more efficient.

Core Features:

🔍 Intelligent Search:

Keyword Search: Traditional full-text search across all indexed content
Semantic Search: AI-powered search that understands meaning and context
Hybrid Search: Combines keyword and semantic search for best results
Filters by source type, programming language, framework, tags, and date

🤖 AI-Powered Features:

"Ask Info Hunter": Ask natural language questions and get answers with citations
AI Enrichment: Automatically generates better summaries, tags, and quality scores
Smart Tagging: AI identifies programming languages, frameworks, and key concepts

📚 Knowledge Aggregation:

Ingests content from GitHub READMEs via REST API
Pulls questions and answers from Stack Exchange (Stack Overflow, etc.)
Aggregates articles from programming RSS feeds
Extracts code blocks with their surrounding explanations
Respects licensing and maintains full attribution

🎨 Modern UI:

Sleek glassmorphism design with smooth animations
Interactive kinetic glyph background
Real-time search with highlighting
Responsive and accessible interface

🔒 Developer-Focused:

Proper attribution and source links for all content
Respects API rate limits and terms of service
Open source and fully self-hostable
Docker-based deployment for easy setup

🛠️ How We Built It

Architecture Overview

Info Hunter follows a microservices architecture with clear separation between ingestion, storage, indexing, and search.

Backend (Python/FastAPI):

Built with FastAPI for high-performance async API endpoints
SQLAlchemy and Alembic for database ORM and migrations
PostgreSQL as the canonical data store
Elasticsearch 8.x for full-text and vector search
Celery with Redis for background job processing
Modular connector architecture for different data sources (GitHub, Stack Exchange, RSS)

Data Ingestion Pipeline:

Connectors fetch data from sources (GitHub API, Stack Exchange API, RSS feeds)
Extractors parse markdown/HTML and extract code blocks with context
Deduplication prevents duplicate entries using content hashing
Storage saves to PostgreSQL with proper schema
Indexing updates Elasticsearch with searchable content

AI Integration:

OpenAI and Anthropic provider adapter for flexible AI model usage
Embeddings: OpenAI text-embedding-3-small for semantic search vectors
Enrichment Task: Celery task that uses LLMs to improve metadata
RAG Pipeline: Retrieval-Augmented Generation for question answering
Pydantic schemas for strict JSON output validation from LLMs
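
To make the RAG step concrete, here is a minimal sketch of how retrieved search hits could be turned into a prompt that asks the model to cite its sources. The `Source` class, the `build_rag_prompt` function, and the prompt wording are all hypothetical and chosen for illustration; the project's real prompt may look quite different.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One retrieved document (hypothetical shape, for illustration only)."""
    title: str
    url: str
    text: str


def build_rag_prompt(question, sources, max_chars_per_source=500):
    """Build a prompt that numbers each source so the answer can cite [n]."""
    lines = [
        "Answer the question using only the numbered sources below.",
        "Cite sources inline as [n].",
        "",
    ]
    for index, source in enumerate(sources, start=1):
        lines.append(f"[{index}] {source.title} ({source.url})")
        # Truncate long sources so the prompt stays within a size budget.
        lines.append(source.text[:max_chars_per_source])
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

Numbering the sources in the prompt is what lets the answer carry citations back to the original GitHub, Stack Exchange, or RSS item.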

Frontend (Next.js/React):

Next.js 15 with App Router for modern React development
TypeScript for type safety
Tailwind CSS for utility-first styling
Framer Motion for smooth animations and transitions
Axios for API communication
Custom components for glassmorphism UI and kinetic interactions

Infrastructure:

Docker Compose for orchestration of all services
Multi-stage Docker builds for optimized images
Health checks and service dependencies
Volume mounts for development hot-reloading

Development Workflow

Setup: Docker Compose brings up all services (Postgres, Redis, Elasticsearch, Backend, Celery workers, Frontend)
Ingestion: Admin endpoints trigger connectors to fetch and process data
AI Processing: Background tasks enrich content and generate embeddings
Search: Users query through REST API, which searches Elasticsearch
Frontend: React components fetch results and display with animations

🚧 Challenges We Ran Into

Technical Challenges

1. Elasticsearch Vector Search Implementation

Challenge: Implementing semantic search with vector embeddings in Elasticsearch 8.x
Solution: Used dense_vector field type with cosine similarity scoring, implemented hybrid search that combines keyword and vector queries
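
As a rough illustration of that setup, the sketch below shows what an index mapping with a dense_vector field might look like, plus a plain-Python cosine similarity to show what the score measures. The field names (`title`, `body`, `language`, `embedding`) are assumptions, not the project's actual schema; the 1536 dimension matches the default output size of text-embedding-3-small.

```python
import math

EMBEDDING_DIMS = 1536  # default output size of text-embedding-3-small

# Hypothetical index mapping; field names are illustrative only.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "body": {"type": "text"},
            "language": {"type": "keyword"},
            "embedding": {
                "type": "dense_vector",
                "dims": EMBEDDING_DIMS,
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Vectors that point in the same direction score close to 1 regardless of their length, which is why cosine similarity suits comparing embeddings of texts with different sizes.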

2. AI Provider Abstraction

Challenge: Supporting multiple AI providers (OpenAI, Anthropic) with different APIs
Solution: Created a unified adapter interface with provider-specific implementations, allowing easy switching between providers
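
One way such an adapter could be structured is sketched below. The class names, the `complete` method, and the registry are hypothetical, and the providers are stubs that return canned text instead of calling the real OpenAI or Anthropic SDKs.

```python
from typing import Protocol


class LLMProvider(Protocol):
    """Interface every provider adapter is expected to satisfy (hypothetical)."""
    name: str

    def complete(self, prompt: str) -> str: ...


class StubOpenAIProvider:
    """Stand-in for an OpenAI-backed adapter; makes no network calls."""
    name = "openai"

    def complete(self, prompt: str) -> str:
        return f"[openai stub] {prompt}"


class StubAnthropicProvider:
    """Stand-in for an Anthropic-backed adapter; makes no network calls."""
    name = "anthropic"

    def complete(self, prompt: str) -> str:
        return f"[anthropic stub] {prompt}"


# Registry keyed by the value a config setting might hold.
PROVIDERS = {
    "openai": StubOpenAIProvider,
    "anthropic": StubAnthropicProvider,
}


def get_provider(name: str) -> LLMProvider:
    """Return a provider instance by name, case-insensitively."""
    try:
        return PROVIDERS[name.lower()]()
    except KeyError:
        raise ValueError(f"unknown AI provider: {name!r}") from None
```

Callers depend only on `complete`, so switching providers becomes a configuration change rather than a code change.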

3. Code Block Extraction and Context

Challenge: Extracting code snippets while preserving surrounding explanation text
Solution: Built custom markdown and HTML parsers that track context around code blocks, maintaining the relationship between code and explanations
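
The idea can be sketched for the markdown case with a small regex-based extractor that pairs each fenced block with the paragraph immediately before it. This is a simplified stand-in, not the project's parser: `CodeSnippet` and `extract_snippets` are hypothetical names, and real-world markdown (nested fences, indented blocks, HTML) needs more care.

```python
import re
from dataclasses import dataclass

# Built indirectly so this example can itself sit inside a fenced block.
FENCE = "`" * 3
FENCE_RE = re.compile(
    rf"^{FENCE}(\w*)[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


@dataclass
class CodeSnippet:
    language: str
    code: str
    context: str  # the prose paragraph immediately preceding the block


def extract_snippets(markdown: str) -> list[CodeSnippet]:
    """Return each fenced code block together with its preceding paragraph."""
    snippets = []
    prev_end = 0
    for match in FENCE_RE.finditer(markdown):
        # Only look at prose between the previous block and this one.
        prose = markdown[prev_end:match.start()].strip()
        context = prose.split("\n\n")[-1].strip() if prose else ""
        snippets.append(
            CodeSnippet(
                language=match.group(1) or "unknown",
                code=match.group(2).rstrip("\n"),
                context=context,
            )
        )
        prev_end = match.end()
    return snippets
```

Keeping the context next to the code means a search hit can show why a snippet exists, not just what it contains.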

4. Deduplication Logic

Challenge: Preventing duplicate entries from the same source across multiple ingestion runs
Solution: Implemented content hashing and deterministic dedupe keys, with efficient database queries to check for existing content
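
A minimal sketch of this approach follows, assuming SHA-256 over lightly normalized text and a dedupe key built from the source name and the item's ID at that source. The key format, the normalization rules, and the in-memory store (standing in for the PostgreSQL lookup) are all assumptions.

```python
import hashlib


def normalize(text: str) -> str:
    """Normalize line endings and trailing whitespace, keeping indentation."""
    lines = text.replace("\r\n", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines).strip()


def content_hash(text: str) -> str:
    """Deterministic SHA-256 hex digest of the normalized content."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def dedupe_key(source: str, external_id: str) -> str:
    """Stable key identifying one item at one source (hypothetical format)."""
    return f"{source}:{external_id}"


class InMemoryStore:
    """Stand-in for the database table that records seen content."""

    def __init__(self):
        self._hash_by_key = {}

    def upsert(self, source: str, external_id: str, text: str) -> str:
        key = dedupe_key(source, external_id)
        digest = content_hash(text)
        previous = self._hash_by_key.get(key)
        if previous == digest:
            return "skipped"  # same item, same content: nothing to do
        self._hash_by_key[key] = digest
        return "inserted" if previous is None else "updated"
```

Because both the key and the hash are deterministic, re-running ingestion over the same source is safe: unchanged items are skipped and edited items are detected.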

5. Rate Limiting and API Compliance

Challenge: Respecting rate limits for GitHub, Stack Exchange, and other APIs
Solution: Built per-domain rate limiting with exponential backoff and retry logic, with configurable limits per connector
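
The backoff half of that solution might look like the sketch below; per-domain limits are left out for brevity. `RateLimitError`, `backoff_delays`, and `call_with_retry` are hypothetical names, and the default delays are illustrative rather than the project's configured values.

```python
import random
import time


class RateLimitError(Exception):
    """Raised by a connector when the remote API signals throttling."""


def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, retries=5):
    """Delays in seconds that grow geometrically and are capped at max_delay."""
    return [min(max_delay, base * factor**attempt) for attempt in range(retries)]


def call_with_retry(func, *, retries=5, base=1.0, max_delay=60.0,
                    sleep=time.sleep, jitter=random.random):
    """Call func, retrying on RateLimitError with jittered exponential backoff.

    Makes up to retries + 1 attempts; the last failure propagates to the caller.
    """
    for delay in backoff_delays(base=base, max_delay=max_delay, retries=retries):
        try:
            return func()
        except RateLimitError:
            # Scale the delay to 50-100% so workers do not retry in lockstep.
            sleep(delay * (0.5 + jitter() / 2))
    return func()
```

Injecting `sleep` and `jitter` as parameters keeps the retry logic testable without real waiting.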

6. Frontend Performance

Challenge: Initial load performance issues (LCP 4.17s, CLS 0.46, INP 3,688ms)
Solution: Implemented lazy loading, memoization, reduced DOM nodes, optimized CSS with GPU acceleration, and reduced animation complexity

7. AI Output Validation

Challenge: Ensuring LLMs return structured, valid JSON for enrichment data
Solution: Used Pydantic schemas with strict validation, prompt engineering for JSON-only output, and proper error handling for malformed responses
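
A sketch of what that validation could look like is shown here. The `Enrichment` model and its fields are assumptions based on the summaries, tags, and quality scores mentioned earlier; the 0 to 1 score range is invented for illustration. The code avoids version-specific helpers so it should run under both Pydantic v1 and v2.

```python
import json
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class Enrichment(BaseModel):
    """Expected shape of the LLM's JSON reply (hypothetical schema)."""
    summary: str
    tags: List[str]
    quality_score: float = Field(ge=0.0, le=1.0)


def parse_enrichment(raw: str) -> Optional[Enrichment]:
    """Return a validated Enrichment, or None if the reply is unusable."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None  # the model replied with something that is not JSON
    if not isinstance(payload, dict):
        return None  # valid JSON, but not an object
    try:
        return Enrichment(**payload)
    except ValidationError:
        return None  # wrong types, missing fields, or out-of-range score
```

Returning `None` lets the calling task decide whether to retry the LLM call or keep the item's original metadata.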

Integration Challenges

1. Celery Task Coordination

Challenge: Ensuring idempotent tasks and proper error handling across distributed workers
Solution: Implemented task retries with exponential backoff, comprehensive logging, and database-level idempotency checks

2. Docker Networking

Challenge: Service discovery and communication between containers
Solution: Used Docker Compose service names for internal networking, proper health checks, and dependency management

3. Search Query Building

Challenge: Building complex Elasticsearch queries that support both keyword and semantic search simultaneously
Solution: Created a flexible query builder that constructs bool queries with proper must/should/filter clauses based on search mode
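
A mode-aware builder along these lines could look like the following sketch, which only assembles the request body as a plain dict. The function name, field names, and boost on `title` are hypothetical; the body follows the Elasticsearch 8.x search API, where a top-level `knn` section can be sent alongside `query`.

```python
def build_search_body(text, mode="hybrid", query_vector=None, language=None, size=10):
    """Build an Elasticsearch search body for keyword, semantic, or hybrid mode."""
    if mode not in {"keyword", "semantic", "hybrid"}:
        raise ValueError(f"unknown search mode: {mode!r}")

    # Filters narrow the candidate set without affecting relevance scores.
    filters = []
    if language:
        filters.append({"term": {"language": language}})

    body = {"size": size}

    if mode in ("keyword", "hybrid"):
        body["query"] = {
            "bool": {
                "must": [
                    {"multi_match": {"query": text, "fields": ["title^2", "body"]}}
                ],
                "filter": filters,
            }
        }

    if mode in ("semantic", "hybrid"):
        if query_vector is None:
            raise ValueError("semantic and hybrid modes need a query_vector")
        knn = {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": size,
            "num_candidates": max(100, size * 10),
        }
        if filters:
            knn["filter"] = {"bool": {"filter": filters}}
        body["knn"] = knn

    return body
```

Applying the same filters to both sections keeps keyword and vector results restricted to the same subset of documents.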

🏆 Accomplishments That We're Proud Of

1. Full-Stack AI Integration

We successfully integrated AI at multiple levels (semantic search with embeddings, content enrichment, and RAG-based question answering), all working seamlessly together. The hybrid search (keyword + semantic) delivers significantly better results than either approach alone.

2. Production-Ready Architecture

Built a scalable, maintainable architecture with proper separation of concerns, background job processing, error handling, and comprehensive logging. The system can handle large-scale ingestion and search workloads.

3. Beautiful, Performant UI

Created a stunning glassmorphism interface with smooth animations that doesn't sacrifice performance. Optimized from initial LCP of 4.17s down to sub-second load times while maintaining visual polish.

4. Comprehensive Testing and Documentation

Implemented unit tests for critical components, created detailed flow diagrams, and maintained thorough documentation including setup guides, API documentation, and architectural decisions.

5. Open Source Best Practices

Following best practices for open source projects: proper .gitignore, environment variable management, Docker-based deployment, comprehensive README, and clear contribution guidelines.

6. API Design

Designed a clean, RESTful API with proper error handling, pagination, filtering, and support for both traditional and AI-powered search modes. The API is intuitive and well-documented.

7. Developer Experience

Made the project easy to set up and run locally with Docker Compose, providing example configuration files and clear documentation. Developers can be up and running in minutes.

8. Respectful Data Usage

Implemented proper attribution, licensing respect, and API compliance. The system never scrapes without permission and uses official APIs wherever possible.

📚 What We Learned

Technical Learnings:

Vector Search and Embeddings: Deep dive into semantic search, vector databases, and how to effectively combine keyword and vector search for optimal results.
Elasticsearch Advanced Features: Learned to leverage Elasticsearch's dense_vector fields, query DSL, highlighting, and complex bool queries for sophisticated search functionality.
LLM Integration Patterns: Gained experience with prompt engineering, structured output generation, error handling with AI APIs, and building reliable RAG pipelines.
Async Python: Mastered FastAPI's async capabilities, proper async/await patterns, and coordinating async operations with Celery tasks.
Frontend Performance Optimization: Learned about Core Web Vitals, React optimization techniques (memoization, lazy loading), CSS performance (GPU acceleration, containment), and reducing JavaScript bundle sizes.
Microservices Coordination: Understanding service discovery, health checks, dependency management, and proper logging across distributed services.

Process Learnings:

Incremental Development: Building the MVP first (ingestion + basic search) then adding AI features incrementally proved more effective than trying to build everything at once.
Debugging Distributed Systems: Learned the importance of structured logging, instrumentation, and trace IDs when debugging issues across multiple services.
AI API Rate Limiting: Understanding and implementing proper rate limiting and retry logic is crucial when working with paid AI APIs.
User Experience Matters: Even with powerful backend features, the frontend experience determines user adoption. Investing time in UI/UX pays off.
Documentation as You Go: Maintaining documentation alongside code prevents it from becoming outdated and helps with onboarding.

🚀 What's Next for Info Hunter

Short-Term Roadmap (Next 3 Months)

Enhanced Search:

[ ] Query autocomplete and suggestions
[ ] Search history and saved searches UI improvements
[ ] Advanced filters (author, license type, code quality metrics)
[ ] Search result ranking improvements based on user feedback

Content Expansion:

[ ] Add more data sources (Dev.to, Medium programming tags, official documentation sites)
[ ] Support for more file types (Jupyter notebooks, API specs, tutorials)
[ ] GitHub Gist integration
[ ] Package manager documentation (npm, PyPI, etc.)

AI Improvements:

[ ] Multi-model support for different use cases (faster/cheaper models for simple tasks)
[ ] Code explanation generation for complex snippets
[ ] Automatic code quality scoring
[ ] Duplicate detection using embeddings

Medium-Term Roadmap (3-6 Months)

User Features:

[ ] User accounts and personalization
[ ] Custom collections and bookmarks
[ ] Collaborative filtering ("users who viewed this also viewed")
[ ] Comments and community annotations on code snippets

Advanced Features:

[ ] Code snippet syntax highlighting improvements
[ ] Side-by-side code comparison
[ ] Code snippet execution in browser (sandboxed)
[ ] Integration with IDEs (VS Code extension, JetBrains plugin)

Performance & Scale:

[ ] Horizontal scaling for ingestion workers
[ ] Elasticsearch cluster support
[ ] CDN integration for static assets
[ ] Caching layer (Redis) for frequent searches

Long-Term Vision (6+ Months)

Platform Features:

[ ] Public API for developers to integrate Info Hunter search
[ ] Webhooks for new content matching saved searches
[ ] Browser extension for quick search from any page
[ ] Mobile app (iOS/Android)

Community & Open Source:

[ ] Contributor guidelines and project governance
[ ] Plugin system for custom connectors
[ ] Community-driven content curation
[ ] Translation support for international developers

Enterprise Features:

[ ] Self-hosted enterprise deployment options
[ ] Private knowledge base support (internal docs)
[ ] SSO integration
[ ] Analytics and usage insights dashboard

Research & Innovation:

[ ] Fine-tuned models for code understanding
[ ] Automatic code example quality assessment
[ ] Context-aware code suggestions
[ ] Integration with AI coding assistants (GitHub Copilot, Cursor)

🛠️ Technologies Used

Backend

Python 3.11 - Core programming language
FastAPI - High-performance async web framework
SQLAlchemy - Python SQL toolkit and ORM
Alembic - Database migration tool
PostgreSQL 15 - Relational database
Elasticsearch 8.11 - Search and analytics engine
Celery - Distributed task queue
Redis 7 - In-memory data store and message broker
Pydantic - Data validation using Python type annotations
BeautifulSoup4 - HTML parsing and extraction
Markdown - Markdown parsing library
Requests - HTTP library for API calls

AI & Machine Learning

OpenAI API - GPT models for text generation and embeddings
Anthropic API - Claude models (alternative AI provider)
text-embedding-3-small - OpenAI embedding model for semantic search

Frontend

Next.js 15 - React framework with App Router
React 18.3 - UI library
TypeScript - Typed JavaScript
Tailwind CSS - Utility-first CSS framework
Framer Motion - Animation library for React
Axios - Promise-based HTTP client

Infrastructure & DevOps

Docker - Containerization
Docker Compose - Multi-container orchestration
Nginx - (Potential) reverse proxy and load balancer

Development Tools

Git - Version control
pytest - Python testing framework
ESLint/Prettier - Code linting and formatting (frontend)

APIs & Services

GitHub REST API - Repository and content access
Stack Exchange API - Q&A site data
RSS/Atom Feeds - Blog and article aggregation

Libraries & Utilities

python-dateutil - Date parsing utilities
uuid - UUID generation (Python stdlib)
logging - Application logging (Python stdlib)
asyncio - Asynchronous I/O support (Python stdlib)

📊 Tech Stack Summary

Frontend Layer:
├── Next.js 15 (React 18.3)
├── TypeScript
├── Tailwind CSS
└── Framer Motion

API Layer:
├── FastAPI (Python 3.11)
├── RESTful endpoints
└── Async/await support

Data Layer:
├── PostgreSQL 15 (canonical storage)
└── Elasticsearch 8.11 (search index)

Background Processing:
├── Celery (task queue)
└── Redis 7 (broker & cache)

AI Integration:
├── OpenAI API
├── Anthropic API
└── Custom provider adapter

Infrastructure:
├── Docker
└── Docker Compose

Data Sources:
├── GitHub REST API
├── Stack Exchange API
└── RSS/Atom Feeds

Info Hunter represents a full-stack application that combines modern web technologies, AI capabilities, and best practices to solve a real developer pain point. It's a testament to how thoughtful architecture, clean code, and user-centric design can create a powerful tool that developers actually want to use.

Built With

axios
celery
docker
elasticsearch
fastapi
framer-motion
github
next.js
openai-api
postgresql
python
redis
rss
sqlalchemy
stack-exchange
tailwind-css
typescript

Updates

Abdulhamid Sonaike started this project — Jan 07, 2026 09:40 AM EST
