Inspiration
CitaMind was inspired by recent reports that papers published at top machine learning conferences like NeurIPS 2025 contained "hallucinated citations" — references to papers that simply don't exist, likely generated by AI without the authors realizing. GPTZero's investigation found over 100 phantom citations across 51 NeurIPS papers, each missed by 3-5 peer reviewers; one study found that 55% of ChatGPT-3.5 citations are fabricated (Walters & Wilder, 2023); and the US government's MAHA health report contained 19+ phantom citations. If the world's best researchers and peer reviewers can't catch these, we need automated tools that can.
Initial Disclosure
At the start of the hackathon, I was unaware of GPTZero's Hallucination Check tool, which addresses a similar problem. Our positioning evolved to focus on what makes CitaMind different: (1) a search-first UX where users find any paper from any venue without uploading a PDF, (2) a community-powered Phantom Registry that grows smarter with every scan, (3) real-time streaming verification with inline text highlighting, and (4) structured AI research summaries with gap analysis — features GPTZero does not offer.
What It Does
Users create an account, then search for papers across Nature, arXiv, ACM, PubMed, IEEE, Springer, and thousands more venues. They select a paper and it is analyzed in seconds. The system provides:
Phantom Citation Detection — identifies references to papers that do not exist in any academic database, verified across Semantic Scholar, OpenAlex, and CrossRef with Claude-powered confirmation.
Miscited & Retracted Detection — flags papers used to support claims not found in the cited work, and papers that have been officially retracted by their publisher.
AI Writing Score — calculates how much of the paper was likely written with AI, with per-section breakdown and flagged passages.
Research Summary with Gap Analysis — users can generate and listen to (via ElevenLabs) a structured AI summary covering key findings, methodology, contributions, and gaps/limitations in the research.
Community Phantom Registry — when users save a paper to their library (stored in HarperDB), all detected phantom citations automatically feed into a shared, open-source database. An author leaderboard, venue breakdown chart, and year-over-year trend graph visualize the data. The system grows smarter as more papers are scanned.
How We Built It
Frontend: Next.js 14 (App Router) + TypeScript + Tailwind CSS + shadcn/ui, with a retro box-style UI. Developed entirely in Zed.
Paper Search & Data:
- OpenAlex API — primary search engine with 16,000+ papers pre-cached across CHI, NeurIPS, ACL, EMNLP, UIST, CVPR, and 30+ venues
- Semantic Scholar API — reference fetching with three-tier fallback (DOI → ARXIV prefix → title search)
- CrossRef API — citation cross-verification with strong title matching (Jaccard + Levenshtein + containment)
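The "strong title matching" step combines the three checks named above. A minimal sketch in TypeScript, with illustrative helper names and thresholds (not the actual CitaMind values):

```typescript
// Tokenize a title into a set of lowercase alphanumeric words.
function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().replace(/[^a-z0-9\s]/g, "").split(/\s+/).filter(Boolean));
}

// Jaccard similarity over word sets: |intersection| / |union|.
function jaccard(a: string, b: string): number {
  const ta = tokens(a), tb = tokens(b);
  const inter = [...ta].filter((t) => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : inter / union;
}

// Classic dynamic-programming Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      );
  return dp[a.length][b.length];
}

// A title "matches" if any of the three signals clears its threshold.
function titlesMatch(cited: string, found: string): boolean {
  const a = cited.toLowerCase().trim(), b = found.toLowerCase().trim();
  if (a.includes(b) || b.includes(a)) return true;      // containment
  if (jaccard(a, b) >= 0.8) return true;                // token overlap
  const dist = levenshtein(a, b);
  return dist / Math.max(a.length, b.length) <= 0.15;   // normalized edit distance
}
```

Combining all three is what makes the match robust: containment handles subtitles, Jaccard handles word reordering, and Levenshtein handles typos.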
Paper Rendering:
- ar5iv (ar5iv.labs.arxiv.org) — renders any arXiv paper as clean HTML with proper LaTeX math via MathML
- Citation numbers in the text are clickable pills that scroll to and highlight the corresponding reference
Verification Pipeline (SSE streaming):
S2 References → Title Quality Filter → Claude Haiku Title Check → Multi-Signal Scoring (title existence + author overlap + placeholder detection) → Claude Batch Confirmation → Final Classification
- References with S2 paperId are pre-verified
- Unverified refs checked against OpenAlex and CrossRef with author overlap verification
- Placeholder author detection catches "John Doe", "Jane Smith", "Firstname Lastname"
- Claude confirms borderline cases before marking phantom
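The multi-signal scoring and placeholder detection could look roughly like this sketch; the signal shape, thresholds, and regex are illustrative assumptions, not the production code:

```typescript
// Placeholder-author patterns that immediately mark a reference as phantom.
const PLACEHOLDER_AUTHORS = /\b(john doe|jane smith|firstname lastname)\b/i;

interface Signals {
  titleFoundInIndex: boolean; // title matched in OpenAlex or CrossRef
  authorOverlap: number;      // 0..1 fraction of cited authors on the matched record
  authors: string;            // raw cited author string
}

type Verdict = "verified" | "borderline" | "phantom";

function classify(s: Signals): Verdict {
  if (PLACEHOLDER_AUTHORS.test(s.authors)) return "phantom";
  // Title found and authors largely agree: safe to verify.
  if (s.titleFoundInIndex && s.authorOverlap >= 0.5) return "verified";
  // Title missing everywhere, or real title with the wrong authors
  // (author-swap suspect): escalate to Claude before marking phantom.
  return "borderline";
}
```

Note that nothing here is marked phantom on weak evidence alone — borderline cases go to Claude for confirmation, which matches the conservative-classification lesson below.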
AI Detection: Claude Sonnet analyzes each section for AI writing patterns with conservative scoring (0-100 scale, most papers score 0-5%).
Research Summary: Claude Haiku generates structured summaries (TL;DR, key findings, methodology, contributions, gaps). ElevenLabs reads the summary aloud with pause/resume support.
Database: HarperDB stores users, paper library, and the global Phantom Registry.
Data Visualization: Recharts for venue breakdown donut chart and year-over-year phantom trends. Author leaderboard ranks researchers by phantom citation count.
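The author leaderboard reduces to a simple aggregation over registry entries. A sketch, where `PhantomEntry` is a hypothetical simplification of the HarperDB record shape:

```typescript
// Minimal registry entry: who authored the scanned paper, and when.
interface PhantomEntry {
  paperAuthors: string[]; // authors of the paper containing the phantom citation
  year: number;           // publication year, used for the trend graph
}

// Rank authors by how many phantom citations appear in their papers.
function leaderboard(entries: PhantomEntry[]): [string, number][] {
  const counts = new Map<string, number>();
  for (const e of entries)
    for (const a of e.paperAuthors) counts.set(a, (counts.get(a) ?? 0) + 1);
  return [...counts.entries()].sort((x, y) => y[1] - x[1]);
}
```

The same pass can bucket entries by venue or year to feed the donut chart and trend graph.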
Challenges
- Semantic Scholar rate limits — 100 requests per 5 minutes unauthenticated. Solved with three-tier fallback (DOI → ARXIV → title search) and aggressive caching.
- False phantom detections — body text fragments appearing as citations from S2's API. Solved with regex title filters + Claude Haiku batch check.
- OpenAlex reference undercounting — many papers show 0 references in OpenAlex. Solved by always using Semantic Scholar for references.
- ar5iv theorem box sizing — oversized empty blocks from inline styles and SVGs. Solved by stripping all inline styles and SVGs from the HTML.
- Author-swap hallucinations — citations using real paper titles with fabricated authors. Solved with author overlap verification during the multi-source check.
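The three-tier fallback with caching (used for both reference fetching and the rate-limit fix above) can be sketched as follows; `resolvePaper` and the injected lookup functions are illustrative names, not the real implementation:

```typescript
// A lookup resolves some key to a Semantic Scholar paperId, or null on miss.
type Lookup = (key: string) => Promise<string | null>;

// In-memory cache so the rate-limited API is never hit twice for one reference.
const cache = new Map<string, string | null>();

async function resolvePaper(
  ref: { doi?: string; arxivId?: string; title: string },
  byDoi: Lookup,
  byArxiv: Lookup,
  byTitle: Lookup,
): Promise<string | null> {
  const key = ref.doi ?? ref.arxivId ?? ref.title;
  if (cache.has(key)) return cache.get(key)!;
  let id: string | null = null;
  if (ref.doi) id = await byDoi(ref.doi);                     // tier 1: DOI
  if (!id && ref.arxivId) id = await byArxiv(`ARXIV:${ref.arxivId}`); // tier 2: ARXIV prefix
  if (!id) id = await byTitle(ref.title);                     // tier 3: title search
  cache.set(key, id); // cache even misses, so failures don't burn quota twice
  return id;
}
```

Injecting the lookups keeps the fallback strategy testable without touching the real API.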
What We Learned
- ar5iv is an incredible resource — any arXiv paper instantly becomes clean, parseable HTML with proper math rendering
- Academic citation hallucination is a much harder problem than "does this paper exist?" — 25% of hallucinations use real paper titles with fabricated metadata
- Community databases create network effects — every scan makes the phantom registry more valuable for everyone
- Conservative classification is critical — a single false positive destroys credibility in a demo
What's Next
- Author-level verification for S2-resolved references (comparing cited authors against database authors)
- Claim verification using Claude to check if cited papers actually support the claims made
- Browser extension for inline verification while reading papers on arXiv/ACM/IEEE
- Institutional dashboards for journals and conference program committees
Built With
- anthropic-claude-api
- ar5iv
- crossref-api
- d3.js
- elevenlabs-api
- harperdb
- katex
- next.js
- openalex-api
- react
- recharts
- semantic-scholar-api
- shadcn/ui
- tailwind-css
- typescript
- vercel
- zed