Inspiration
Most online explainers (and even some YouTubers) are entertaining but imprecise, and many are built to grow an ad-driven business rather than to teach accurately. LLMs amplify this: they can sound confident while mixing up facts, dates, and sources. LLM training relies on baselines and A/B tests with domain experts; everyday learners have no such safeguards. Hello exists to bridge that gap by turning trusted, traditional texts into verified, age-appropriate learning content with citations, so people can learn from AI without hallucination risk. We also expect that, as LLMs run into a data wall, the cost of high-quality education will rise again: LLMs will cover general concepts and knowledge, while precise, expert-level instruction will become more expensive.
What it does
Hello transforms public-domain classics and reference works into rigorously sourced, readable versions for different audiences (child / teen / adult), all with inline citations.
Ingest & Clean: Import reliable editions (PDF/EPUB/HTML), OCR + de-noise, deduplicate, and segment by logical sections.
Understand & Index: Embed chunks and build a retrieval index with metadata (work, chapter, page, edition).
Write with Proof: A RAG writer composes level-appropriate explanations and summaries, auto-linking every claim to sources.
Provenance Viewer: Hover to see the exact snippet, page image, and edition info; click to jump back to the original passage.
Reading Levels: One switch changes the voice (kid-friendly → exam prep → expert notes) while keeping citations intact.
Fact Checks: A verifier pass flags unsupported claims, missing citations, and mismatches (dates, names, definitions).
Author Mode (beta): Traditional authors can upload a manuscript and co-write with the AI assistant, keeping full control of tone, scope, and citations.
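To make the citation and fact-check features above more concrete, here is a minimal sketch of how passages, cited sentences, and verifier flags might be modeled. It is not Hello's actual code: the class names, the `flag_sentence` helper, the flag strings, and the token-overlap threshold are all assumptions made for illustration, and a real verifier would likely use a model-based check rather than word overlap.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Passage:
    """One retrievable chunk with the metadata named above (hypothetical schema)."""
    passage_id: str
    work: str
    chapter: str
    page: int
    edition: str
    text: str


@dataclass
class CitedSentence:
    """One generated sentence plus the passage ids it claims as support."""
    text: str
    citations: list[str] = field(default_factory=list)


def _tokens(text: str) -> set[str]:
    # Lowercased words with surrounding punctuation stripped.
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def flag_sentence(sentence: CitedSentence, passages: dict[str, Passage],
                  min_overlap: float = 0.5) -> list[str]:
    """Return flags for one sentence; an empty list means no issue was found."""
    if not sentence.citations:
        return ["missing_citation"]
    flags = []
    words = _tokens(sentence.text)
    for pid in sentence.citations:
        passage = passages.get(pid)
        if passage is None:
            flags.append(f"unknown_source:{pid}")
            continue
        # Assumption: word overlap is a crude stand-in for a real support check.
        overlap = len(words & _tokens(passage.text)) / max(len(words), 1)
        if overlap < min_overlap:
            flags.append(f"weak_support:{pid}")
    return flags


# Toy data, invented for this example only.
passages = {
    "p1": Passage("p1", "Example Work", "Chapter 1", 12, "hypothetical edition",
                  "The river floods the valley every spring."),
}
draft = [
    CitedSentence("The river floods the valley every spring.", ["p1"]),
    CitedSentence("Floods are good for the soil."),
    CitedSentence("The king died in 1200.", ["p1"]),
]
report = [flag_sentence(s, passages) for s in draft]
print(report)
```

In this toy run the first sentence passes, the second is flagged for having no citation, and the third is flagged because its cited passage does not appear to support it.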
How we built it
Frontend: Next.js + TypeScript. Reading-level switcher, side-by-side “Explain ⟷ Source,” and a provenance drawer.
Backend: FastAPI service orchestrating pipelines and evaluations.
Pipelines:
OCR/cleanup → structure detection (chapters/sections/footnotes) → chunking with semantic+layout cues.
Embeddings + vector index (pgvector/Postgres) with edition/page metadata.
RAG composer that plans first, writes second, and then attaches citations per sentence.
Verifier loop: retrieval-augmented critique that checks for missing/weak sources and prompts the writer to revise.
Evaluations: A small harness that measures coverage (did we cite the right place?), support (is the claim in the source?), and style fit per reading level; includes a spot-check UI for human review.
Storage: Source files in object storage; normalized passages and citations in Postgres.
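The verifier loop described in the pipeline list can be sketched as a bounded write, critique, revise cycle. This is only an illustration under stated assumptions: `write_with_verification`, the `Writer` and `Verifier` signatures, and the `max_rounds` cap are hypothetical names, and the toy writer and verifier stand in for the real RAG composer and retrieval-augmented critique.

```python
from typing import Callable

Draft = list[str]                           # sentences of the current draft
Writer = Callable[[str, list[str]], Draft]  # (topic, feedback) -> new draft
Verifier = Callable[[Draft], list[str]]     # draft -> list of issues found


def write_with_verification(topic: str, writer: Writer, verifier: Verifier,
                            max_rounds: int = 3) -> tuple[Draft, list[str]]:
    """Draft, then revise until the verifier is satisfied or rounds run out.

    Returns the final draft and any issues that remain, so unresolved
    cases can be routed to human review.
    """
    draft = writer(topic, [])
    for _ in range(max_rounds):
        issues = verifier(draft)
        if not issues:
            return draft, []
        # Feed the critique back so the writer can revise.
        draft = writer(topic, issues)
    return draft, verifier(draft)


# Toy stand-ins (assumptions): the writer only adds a citation once it is asked to.
def toy_writer(topic: str, feedback: list[str]) -> Draft:
    sentence = f"{topic} is discussed in chapter 2."
    return [sentence + " [p1]"] if feedback else [sentence]


def toy_verifier(draft: Draft) -> list[str]:
    return [f"missing citation: {s}" for s in draft if "[" not in s]


final_draft, remaining = write_with_verification("Tides", toy_writer, toy_verifier)
print(final_draft, remaining)
```

Capping the number of rounds keeps a writer that never converges from looping forever; whatever issues remain are returned rather than hidden.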
Challenges we ran into
Edition drift: Different editions paginate and phrase content differently; we built stable anchors beyond page numbers (work→chapter→paragraph hashes).
Citations at the sentence level: Getting the model to cite every claim without over-linking required a plan-then-write approach and a post-pass validator.
Reading-level fidelity: Simplifying language without losing meaning (and while preserving citations) demanded a constraints-first prompt design.
OCR noise and footnotes: Separating marginalia/footnotes from core text so we cite the right thing.
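As an illustration of the work→chapter→paragraph-hash anchors mentioned under edition drift, here is one way such an anchor could be computed. The normalization rules, the 12-character digest length, and the function names are assumptions, not a description of Hello's implementation. Note that a content hash like this only survives layout differences (pagination, line wrapping, hyphenation); editions that actually rephrase a paragraph would need some form of fuzzy matching, which this sketch does not attempt.

```python
import hashlib
import re
import unicodedata


def normalize_paragraph(text: str) -> str:
    """Reduce a paragraph to its letters and digits only.

    Dropping whitespace, punctuation, and case makes the result insensitive
    to line wrapping and line-break hyphenation.
    """
    text = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"[\W_]+", "", text)


def paragraph_anchor(work: str, chapter: str, paragraph: str) -> str:
    """Build a page-independent anchor of the form work/chapter/hash."""
    digest = hashlib.sha256(normalize_paragraph(paragraph).encode("utf-8")).hexdigest()
    return f"{work}/{chapter}/{digest[:12]}"


# The same paragraph, wrapped differently in two hypothetical editions.
anchor_a = paragraph_anchor("example-work", "ch01",
                            "The river floods the\nvalley every spring.")
anchor_b = paragraph_anchor("example-work", "ch01",
                            "The river floods the val-\nley every   spring.")
anchor_c = paragraph_anchor("example-work", "ch01",
                            "The river is dry every summer.")
print(anchor_a == anchor_b, anchor_a == anchor_c)
```

Because the anchor depends only on the work, the chapter, and the normalized paragraph text, a citation can still resolve when two editions place the same paragraph on different pages.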
Accomplishments that we're proud of
A smooth Explain ⟷ Source experience where every sentence can prove itself.
Consistent, age-appropriate outputs from the same source—great for teachers, parents, and self-learners.
A verifier loop that actually changes behavior: the model now proposes revisions when support is weak.
An Author Mode (beta) that lets traditional writers keep their voice while gaining AI speed and rigorous sourcing.
What we learned
If you demand citations up front (not as an afterthought), the model writes differently—more modular, more careful.
UI matters as much as NLP: provenance that’s one click away builds trust.
“Reading level” isn’t just vocabulary; it’s examples, scaffolding, and what you choose not to say.
Human spot checks are still essential, but a good harness can focus that effort where it matters.
What’s next for Hello — a tool for traditional authors to collaborate with an AI creator
Author Workspace: Versioning, outline locking, “must-include sources,” and house-style presets.
Rubric-based grading: Domain-specific rubrics (history, biology, literature) to score support and coverage automatically.
Rights & Editions: Expand beyond public domain via publisher partnerships and edition-aware licensing.
Classroom Packs: Teacher dashboards with reading-level bundles, quizzes auto-sourced from citations, and printable handouts.
Multimedia: Cite-aware illustrations, timelines, and short videos whose captions also link back to sources.
Benchmarks: Public eval sets for “supported-claim rate” and “provenance precision” so others can compare fairly.
Built With
- next.js
- python
- react
- typescript