Info Hunter - Project Description
💡 Inspiration
As developers, we've all experienced the frustration of spending hours jumping between GitHub repositories, Stack Overflow threads, documentation sites, and blog posts just to find that one code example that solves our problem. The information we need is scattered across the web, and traditional search engines often return hundreds of results with minimal relevance.
We built Info Hunter to solve a real pain point: developers waste 2-3 hours daily searching for code examples and solutions. We wanted to create a single, intelligent search interface that aggregates programming knowledge from across the web and understands what developers are actually looking for—not just matching keywords, but understanding intent and context.
The idea was born from the recognition that while we have amazing resources like GitHub, Stack Overflow, and technical blogs, there wasn't a unified way to search them all intelligently. Info Hunter fills that gap with AI-powered semantic search that makes finding relevant code examples as easy as asking a question.
🎯 What It Does
Info Hunter is a developer knowledge aggregator and search engine that makes finding code examples and programming solutions faster and more efficient.
Core Features:
🔍 Intelligent Search:
- Keyword Search: Traditional full-text search across all indexed content
- Semantic Search: AI-powered search that understands meaning and context
- Hybrid Search: Combines keyword and semantic search for best results
- Filters by source type, programming language, framework, tags, and date
🤖 AI-Powered Features:
- "Ask Info Hunter": Ask natural language questions and get answers with citations
- AI Enrichment: Automatically generates better summaries, tags, and quality scores
- Smart Tagging: AI identifies programming languages, frameworks, and key concepts
📚 Knowledge Aggregation:
- Ingests content from GitHub READMEs via REST API
- Pulls questions and answers from Stack Exchange (Stack Overflow, etc.)
- Aggregates articles from programming RSS feeds
- Extracts code blocks with their surrounding explanations
- Respects licensing and maintains full attribution
🎨 Modern UI:
- Sleek glassmorphism design with smooth animations
- Interactive kinetic glyph background
- Real-time search with highlighting
- Responsive and accessible interface
🔒 Developer-Focused:
- Proper attribution and source links for all content
- Respects API rate limits and terms of service
- Open source and fully self-hostable
- Docker-based deployment for easy setup
🛠️ How We Built It
Architecture Overview
Info Hunter follows a microservices architecture with clear separation between ingestion, storage, indexing, and search.
Backend (Python/FastAPI):
- Built with FastAPI for high-performance async API endpoints
- SQLAlchemy and Alembic for database ORM and migrations
- PostgreSQL as the canonical data store
- Elasticsearch 8.x for full-text and vector search
- Celery with Redis for background job processing
- Modular connector architecture for different data sources (GitHub, Stack Exchange, RSS)
Data Ingestion Pipeline:
- Connectors fetch data from sources (GitHub API, Stack Exchange API, RSS feeds)
- Extractors parse markdown/HTML and extract code blocks with context
- Deduplication prevents duplicate entries using content hashing
- Storage saves to PostgreSQL with proper schema
- Indexing updates Elasticsearch with searchable content
AI Integration:
- OpenAI and Anthropic provider adapter for flexible AI model usage
- Embeddings: OpenAI text-embedding-3-small for semantic search vectors
- Enrichment Task: Celery task that uses LLMs to improve metadata
- RAG Pipeline: Retrieval-Augmented Generation for question answering
- Pydantic schemas for strict JSON output validation from LLMs
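To make the RAG step behind "Ask Info Hunter" more concrete, here is a minimal sketch of how retrieved passages might be numbered and folded into a prompt so the model can cite them. The `Passage` shape, the `build_rag_prompt` name, and the prompt wording are assumptions made for illustration; retrieval and the LLM call itself are left out.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    """A retrieved search hit (hypothetical shape, not the project's schema)."""
    title: str
    url: str
    text: str


def build_rag_prompt(question: str, passages: list[Passage], max_chars: int = 500) -> str:
    """Number each passage so the model can cite sources as [1], [2], ..."""
    blocks = []
    for i, passage in enumerate(passages, start=1):
        snippet = passage.text[:max_chars]  # truncate to keep the prompt small
        blocks.append(f"[{i}] {passage.title} ({passage.url})\n{snippet}")
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the sources below. "
        "Cite sources by number, e.g. [1].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Because every source carries its number and URL into the prompt, the citation markers in the answer can be mapped back to links when the response is rendered.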
Frontend (Next.js/React):
- Next.js 15 with App Router for modern React development
- TypeScript for type safety
- Tailwind CSS for utility-first styling
- Framer Motion for smooth animations and transitions
- Axios for API communication
- Custom components for glassmorphism UI and kinetic interactions
Infrastructure:
- Docker Compose for orchestration of all services
- Multi-stage Docker builds for optimized images
- Health checks and service dependencies
- Volume mounts for development hot-reloading
Development Workflow
- Setup: Docker Compose brings up all services (Postgres, Redis, Elasticsearch, Backend, Celery workers, Frontend)
- Ingestion: Admin endpoints trigger connectors to fetch and process data
- AI Processing: Background tasks enrich content and generate embeddings
- Search: Users query through REST API, which searches Elasticsearch
- Frontend: React components fetch results and display with animations
🚧 Challenges We Ran Into
Technical Challenges
1. Elasticsearch Vector Search Implementation
- Challenge: Implementing semantic search with vector embeddings in Elasticsearch 8.x
- Solution: Used the dense_vector field type with cosine similarity scoring, and implemented hybrid search that combines keyword and vector queries
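As a rough illustration, an index mapping for this setup could look like the fragment below, alongside a plain-Python version of the cosine score that the vector field is configured to use. The field names (`title`, `body`, `embedding`) are assumptions; the 1536 dimensions match the default output size of text-embedding-3-small.

```python
import math

# text-embedding-3-small returns 1536-dimensional vectors by default.
EMBEDDING_DIMS = 1536

# Index mapping fragment; field names are illustrative, not the project's schema.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "body": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": EMBEDDING_DIMS,
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common choice for text embeddings.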
2. AI Provider Abstraction
- Challenge: Supporting multiple AI providers (OpenAI, Anthropic) with different APIs
- Solution: Created a unified adapter interface with provider-specific implementations, allowing easy switching between providers
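One way such an adapter layer might be structured is sketched below. All names (`AIProvider`, `EchoProvider`, `get_provider`) are hypothetical, and the stand-in provider simply echoes the prompt so the example runs without network access or API keys.

```python
from abc import ABC, abstractmethod


class AIProvider(ABC):
    """Common interface every provider adapter implements (hypothetical)."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Return the model's text response for a prompt."""


class EchoProvider(AIProvider):
    """Stand-in used here instead of a real OpenAI or Anthropic client."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Registry keyed by a config value such as an environment variable.
_PROVIDERS: dict[str, AIProvider] = {
    "openai": EchoProvider("openai"),
    "anthropic": EchoProvider("anthropic"),
}


def get_provider(name: str) -> AIProvider:
    """Look up an adapter by name; callers never touch provider-specific code."""
    try:
        return _PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown AI provider: {name!r}") from None
```

With this shape, switching providers is a configuration change: calling code asks for `get_provider("anthropic")` instead of `get_provider("openai")` and keeps using `complete()`.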
3. Code Block Extraction and Context
- Challenge: Extracting code snippets while preserving surrounding explanation text
- Solution: Built custom markdown and HTML parsers that track context around code blocks, maintaining the relationship between code and explanations
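The markdown half of this idea can be sketched with a regular expression that finds fenced blocks and keeps the paragraph immediately before each one. This is a simplified illustration rather than the project's parser: the function name and the returned dictionary keys are assumptions, and HTML input is not handled.

```python
import re

FENCE = "`" * 3  # built programmatically so this example can sit inside a fenced block

# Matches a fenced block: opening fence, optional language tag, body, closing fence.
_BLOCK_RE = re.compile(
    rf"^{FENCE}(?P<lang>[\w+-]*)[ \t]*\n(?P<code>.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_code_with_context(markdown: str) -> list[dict]:
    """Return each fenced code block with the paragraph immediately before it."""
    results = []
    prev_end = 0
    for match in _BLOCK_RE.finditer(markdown):
        # Only look at text since the previous block, so code never becomes context.
        between = markdown[prev_end:match.start()].strip()
        # The preceding paragraph is whatever follows the last blank line.
        context = between.split("\n\n")[-1].strip() if between else ""
        results.append(
            {
                "language": match.group("lang") or None,
                "code": match.group("code").rstrip("\n"),
                "context": context,
            }
        )
        prev_end = match.end()
    return results
```

Storing the context next to the snippet is what lets search results show the explanation together with the code.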
4. Deduplication Logic
- Challenge: Preventing duplicate entries from the same source across multiple ingestion runs
- Solution: Implemented content hashing and deterministic dedupe keys, with efficient database queries to check for existing content
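A minimal sketch of this kind of deduplication follows, assuming a dedupe key of the form `source:external_id` and a SHA-256 hash over whitespace-normalized text. Both choices are illustrative, and the in-memory store stands in for the PostgreSQL table.

```python
import hashlib
import re


def content_hash(text: str) -> str:
    """SHA-256 of whitespace-normalized text, so reformatting alone changes nothing."""
    normalized = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe_key(source: str, external_id: str) -> str:
    """Deterministic key identifying one item from one source across runs."""
    return f"{source}:{external_id}"


class InMemoryStore:
    """Stand-in for the database table; a unique index would play this role there."""

    def __init__(self) -> None:
        self._hash_by_key: dict[str, str] = {}

    def upsert(self, source: str, external_id: str, text: str) -> str:
        """Return 'created', 'updated', or 'skipped' depending on what changed."""
        key = dedupe_key(source, external_id)
        new_hash = content_hash(text)
        old_hash = self._hash_by_key.get(key)
        if old_hash == new_hash:
            return "skipped"
        self._hash_by_key[key] = new_hash
        return "created" if old_hash is None else "updated"
```

Re-running ingestion against unchanged content then returns `"skipped"` for every item, so repeated runs do not grow the index.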
5. Rate Limiting and API Compliance
- Challenge: Respecting rate limits for GitHub, Stack Exchange, and other APIs
- Solution: Built per-domain rate limiting with exponential backoff and retry logic, with configurable limits per connector
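The retry half of this can be sketched as below. `RateLimitError`, `backoff_delays`, and `call_with_backoff` are hypothetical names, the jitter strategy is an assumption, and per-domain request pacing is omitted for brevity.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Raised by a connector when the remote API signals throttling (hypothetical)."""


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Delays of base * 2**attempt seconds, capped so waits stay bounded."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with jittered exponential backoff."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return fn()
        except RateLimitError:
            # Random jitter spreads retries out so workers don't retry in lockstep.
            sleep(random.uniform(0, delay))
    return fn()  # final attempt; any error now propagates to the caller
```

Passing `sleep` as a parameter keeps the function easy to test, since a test can record the delays instead of actually waiting.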
6. Frontend Performance
- Challenge: Initial load performance issues (LCP 4.17s, CLS 0.46, INP 3,688ms)
- Solution: Implemented lazy loading, memoization, reduced DOM nodes, optimized CSS with GPU acceleration, and reduced animation complexity
7. AI Output Validation
- Challenge: Ensuring LLMs return structured, valid JSON for enrichment data
- Solution: Used Pydantic schemas with strict validation, prompt engineering for JSON-only output, and proper error handling for malformed responses
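To show the kind of checks involved, here is a dependency-free sketch that validates an LLM response strictly before it is stored. It deliberately uses a plain dataclass in place of a Pydantic model so it runs with the standard library alone; the field names (`summary`, `tags`, `quality_score`) and the 0 to 1 score range are assumptions, not the project's actual schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Enrichment:
    """Fields are hypothetical; the project's real schema is not shown here."""
    summary: str
    tags: tuple[str, ...]
    quality_score: float


def parse_enrichment(raw: str) -> Enrichment:
    """Parse an LLM response strictly; raise ValueError on anything unexpected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    expected = {"summary", "tags", "quality_score"}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    summary, tags, score = data["summary"], data["tags"], data["quality_score"]
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("summary must be a non-empty string")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("tags must be a list of strings")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("quality_score must be a number")
    if not 0.0 <= score <= 1.0:
        raise ValueError("quality_score must be between 0 and 1")
    return Enrichment(summary=summary, tags=tuple(tags), quality_score=float(score))
```

A caller can catch `ValueError`, log the raw response, and retry or skip the item, so a malformed reply never reaches the database.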
Integration Challenges
1. Celery Task Coordination
- Challenge: Ensuring idempotent tasks and proper error handling across distributed workers
- Solution: Implemented task retries with exponential backoff, comprehensive logging, and database-level idempotency checks
2. Docker Networking
- Challenge: Service discovery and communication between containers
- Solution: Used Docker Compose service names for internal networking, proper health checks, and dependency management
3. Search Query Building
- Challenge: Building complex Elasticsearch queries that support both keyword and semantic search simultaneously
- Solution: Created a flexible query builder that constructs bool queries with proper must/should/filter clauses based on search mode
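A query builder along these lines might look like the sketch below, which returns a plain dictionary suitable as an Elasticsearch 8.x search request body. The function name, the field names (`title`, `body`, `embedding`, `language`), the title boost, and the `num_candidates` heuristic are all assumptions for illustration.

```python
from typing import Any, Optional


def build_search_body(
    text: str,
    mode: str = "hybrid",
    query_vector: Optional[list[float]] = None,
    language: Optional[str] = None,
    size: int = 10,
) -> dict[str, Any]:
    """Build a search request body for keyword, semantic, or hybrid mode."""
    if mode not in {"keyword", "semantic", "hybrid"}:
        raise ValueError(f"unknown search mode: {mode!r}")
    if mode != "keyword" and query_vector is None:
        raise ValueError("semantic and hybrid modes need a query_vector")

    # Filters narrow results without affecting relevance scores.
    filters = [{"term": {"language": language}}] if language else []
    body: dict[str, Any] = {"size": size}

    if mode in {"keyword", "hybrid"}:
        body["query"] = {
            "bool": {
                "must": [{"multi_match": {"query": text, "fields": ["title^2", "body"]}}],
                "filter": filters,
            }
        }
    if mode in {"semantic", "hybrid"}:
        body["knn"] = {
            "field": "embedding",
            "query_vector": query_vector,
            "k": size,
            "num_candidates": max(50, size * 5),
            "filter": {"bool": {"filter": filters}},
        }
    return body
```

In hybrid mode the body carries both a `query` and a `knn` section, so Elasticsearch scores documents on keyword relevance and vector similarity together.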
🏆 Accomplishments That We're Proud Of
1. Full-Stack AI Integration: We successfully integrated AI at multiple levels (semantic search with embeddings, content enrichment, and RAG-based question answering), all working seamlessly together. The hybrid search (keyword + semantic) delivers significantly better results than either approach alone.
2. Production-Ready Architecture: Built a scalable, maintainable architecture with proper separation of concerns, background job processing, error handling, and comprehensive logging. The system can handle large-scale ingestion and search workloads.
3. Beautiful, Performant UI: Created a stunning glassmorphism interface with smooth animations that doesn't sacrifice performance. Optimized from an initial LCP of 4.17s down to sub-second load times while maintaining visual polish.
4. Comprehensive Testing and Documentation: Implemented unit tests for critical components, created detailed flow diagrams, and maintained thorough documentation including setup guides, API documentation, and architectural decisions.
5. Open Source Best Practices: Followed best practices for open source projects, including a proper .gitignore, environment variable management, Docker-based deployment, a comprehensive README, and clear contribution guidelines.
6. API Design: Designed a clean, RESTful API with proper error handling, pagination, filtering, and support for both traditional and AI-powered search modes. The API is intuitive and well-documented.
7. Developer Experience: Made the project easy to set up and run locally with Docker Compose, providing example configuration files and clear documentation. Developers can be up and running in minutes.
8. Respectful Data Usage: Implemented proper attribution, licensing respect, and API compliance. The system never scrapes without permission and uses official APIs wherever possible.
📚 What We Learned
Technical Learnings:
Vector Search and Embeddings: Deep dive into semantic search, vector databases, and how to effectively combine keyword and vector search for optimal results.
Elasticsearch Advanced Features: Learned to leverage Elasticsearch's dense_vector fields, query DSL, highlighting, and complex bool queries for sophisticated search functionality.
LLM Integration Patterns: Gained experience with prompt engineering, structured output generation, error handling with AI APIs, and building reliable RAG pipelines.
Async Python: Mastered FastAPI's async capabilities, proper async/await patterns, and coordinating async operations with Celery tasks.
Frontend Performance Optimization: Learned about Core Web Vitals, React optimization techniques (memoization, lazy loading), CSS performance (GPU acceleration, containment), and reducing JavaScript bundle sizes.
Microservices Coordination: Understanding service discovery, health checks, dependency management, and proper logging across distributed services.
Process Learnings:
Incremental Development: Building the MVP first (ingestion + basic search) then adding AI features incrementally proved more effective than trying to build everything at once.
Debugging Distributed Systems: Learned the importance of structured logging, instrumentation, and trace IDs when debugging issues across multiple services.
AI API Rate Limiting: Understanding and implementing proper rate limiting and retry logic is crucial when working with paid AI APIs.
User Experience Matters: Even with powerful backend features, the frontend experience determines user adoption. Investing time in UI/UX pays off.
Documentation as You Go: Maintaining documentation alongside code prevents it from becoming outdated and helps with onboarding.
🚀 What's Next for Info Hunter
Short-Term Roadmap (Next 3 Months)
Enhanced Search:
- [ ] Query autocomplete and suggestions
- [ ] Search history and saved searches UI improvements
- [ ] Advanced filters (author, license type, code quality metrics)
- [ ] Search result ranking improvements based on user feedback
Content Expansion:
- [ ] Add more data sources (Dev.to, Medium programming tags, official documentation sites)
- [ ] Support for more file types (Jupyter notebooks, API specs, tutorials)
- [ ] GitHub Gist integration
- [ ] Package manager documentation (npm, PyPI, etc.)
AI Improvements:
- [ ] Multi-model support for different use cases (faster/cheaper models for simple tasks)
- [ ] Code explanation generation for complex snippets
- [ ] Automatic code quality scoring
- [ ] Duplicate detection using embeddings
Medium-Term Roadmap (3-6 Months)
User Features:
- [ ] User accounts and personalization
- [ ] Custom collections and bookmarks
- [ ] Collaborative filtering ("users who viewed this also viewed")
- [ ] Comments and community annotations on code snippets
Advanced Features:
- [ ] Code snippet syntax highlighting improvements
- [ ] Side-by-side code comparison
- [ ] Code snippet execution in browser (sandboxed)
- [ ] Integration with IDEs (VS Code extension, JetBrains plugin)
Performance & Scale:
- [ ] Horizontal scaling for ingestion workers
- [ ] Elasticsearch cluster support
- [ ] CDN integration for static assets
- [ ] Caching layer (Redis) for frequent searches
Long-Term Vision (6+ Months)
Platform Features:
- [ ] Public API for developers to integrate Info Hunter search
- [ ] Webhooks for new content matching saved searches
- [ ] Browser extension for quick search from any page
- [ ] Mobile app (iOS/Android)
Community & Open Source:
- [ ] Contributor guidelines and project governance
- [ ] Plugin system for custom connectors
- [ ] Community-driven content curation
- [ ] Translation support for international developers
Enterprise Features:
- [ ] Self-hosted enterprise deployment options
- [ ] Private knowledge base support (internal docs)
- [ ] SSO integration
- [ ] Analytics and usage insights dashboard
Research & Innovation:
- [ ] Fine-tuned models for code understanding
- [ ] Automatic code example quality assessment
- [ ] Context-aware code suggestions
- [ ] Integration with AI coding assistants (GitHub Copilot, Cursor)
🛠️ Technologies Used
Backend
- Python 3.11 - Core programming language
- FastAPI - High-performance async web framework
- SQLAlchemy - Python SQL toolkit and ORM
- Alembic - Database migration tool
- PostgreSQL 15 - Relational database
- Elasticsearch 8.11 - Search and analytics engine
- Celery - Distributed task queue
- Redis 7 - In-memory data store and message broker
- Pydantic - Data validation using Python type annotations
- BeautifulSoup4 - HTML parsing and extraction
- Markdown - Markdown parsing library
- Requests - HTTP library for API calls
AI & Machine Learning
- OpenAI API - GPT models for text generation and embeddings
- Anthropic API - Claude models (alternative AI provider)
- text-embedding-3-small - OpenAI embedding model for semantic search
Frontend
- Next.js 15 - React framework with App Router
- React 18.3 - UI library
- TypeScript - Typed JavaScript
- Tailwind CSS - Utility-first CSS framework
- Framer Motion - Animation library for React
- Axios - Promise-based HTTP client
Infrastructure & DevOps
- Docker - Containerization
- Docker Compose - Multi-container orchestration
- Nginx - (Potential) reverse proxy and load balancer
Development Tools
- Git - Version control
- pytest - Python testing framework
- ESLint/Prettier - Code linting and formatting (frontend)
- Alembic - Database migrations
APIs & Services
- GitHub REST API - Repository and content access
- Stack Exchange API - Q&A site data
- RSS/Atom Feeds - Blog and article aggregation
Libraries & Utilities
- python-dateutil - Date parsing utilities
- uuid - UUID generation (Python stdlib)
- logging - Structured logging
- asyncio - Asynchronous I/O support
📊 Tech Stack Summary
Frontend Layer:
├── Next.js 15 (React 18.3)
├── TypeScript
├── Tailwind CSS
└── Framer Motion
API Layer:
├── FastAPI (Python 3.11)
├── RESTful endpoints
└── Async/await support
Data Layer:
├── PostgreSQL 15 (canonical storage)
└── Elasticsearch 8.11 (search index)
Background Processing:
├── Celery (task queue)
└── Redis 7 (broker & cache)
AI Integration:
├── OpenAI API
├── Anthropic API
└── Custom provider adapter
Infrastructure:
├── Docker
└── Docker Compose
Data Sources:
├── GitHub REST API
├── Stack Exchange API
└── RSS/Atom Feeds
Info Hunter represents a full-stack application that combines modern web technologies, AI capabilities, and best practices to solve a real developer pain point. It's a testament to how thoughtful architecture, clean code, and user-centric design can create a powerful tool that developers actually want to use.
Built With
- axios
- celery
- docker
- elasticsearch
- fastapi
- framer-motion
- github
- next.js
- openai-api
- postgresql
- python
- redis
- rss
- sqlalchemy
- stack-exchange
- tailwind-css
- typescript