Most (>99.9%) languages are not used on the internet. They will soon DIE if they’re not used on the internet. We need to bring them to the internet. And to do that, we need to make them much easier to use. This project aims to establish a framework to do that.

Not all languages are equal. There are at least 3000 (!) written languages, and likely a LOT more. The number of languages commonly used on the internet: fewer than 8 :( We care about languages. Languages are cool!

Non-Roman, non-English languages are hard to type. Not fundamentally, but because not enough work has been done. Why not? Not enough resources. Not enough data.

Think of students as almost-perfect teachers. Computers are BABIES!

Not a new idea AT ALL. Duolingo leverages language learners’ work to sell translation products. Google Translate uses suggestions from its users. Google sees computer-aided services (translation) as aids, not solutions, because even humans don’t agree on exact translations! We know random errors are averaged away, and systematic errors are easily detected.

Errors will be averaged out, so the system won’t be completely off. There will also be teacher supervision. This is better for the instructor too, who can see all the answers on one screen and mass-correct the problems. If EVERYONE makes the same mistake, we learn WHAT kinds of errors humans make, and can work on helping to fix them.
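One way to picture the "averaging out" is a simple majority vote over the class's answers, with a separate check for a mistake that most students share. This is only a minimal sketch under our own assumptions: the function names `aggregate_answers` and `common_mistake`, the thresholds, and the lowercase/strip normalization are all hypothetical, not part of any existing system.

```python
from collections import Counter


def _normalize(answer):
    # Assumption: answers that differ only in case or surrounding
    # whitespace count as the same answer.
    return answer.strip().lower()


def aggregate_answers(answers, agreement_threshold=0.6):
    """Majority vote over the student answers for one exercise.

    Returns (top_answer, agreement, needs_review). needs_review is True
    when the class does not agree enough, so the teacher should look.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(_normalize(a) for a in answers)
    top_answer, top_count = counts.most_common(1)[0]
    agreement = top_count / len(answers)
    return top_answer, agreement, agreement < agreement_threshold


def common_mistake(answers, teacher_answer, min_share=0.5):
    """Return the most frequent wrong answer if at least min_share of
    the class gave it (a systematic error), otherwise None."""
    correct = _normalize(teacher_answer)
    wrong = Counter(a for a in map(_normalize, answers) if a != correct)
    if not wrong:
        return None
    mistake, count = wrong.most_common(1)[0]
    return mistake if count / len(answers) >= min_share else None


answers = ["their", "their", "there", "their ", "Their"]
print(aggregate_answers(answers))        # ('their', 0.8, False)
print(common_mistake(answers, "there"))  # their
print(common_mistake(answers, "their"))  # None
```

A single stray answer disappears in the vote, while a wrong answer shared by most of the class is surfaced for the teacher instead of being silently learned.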

The instructor decides the activity: fill-in-the-blanks, spelling/grammar correction, similar words, etc. The system generates examples of that activity that it has trouble figuring out. It takes the students’ answers and the teacher’s validation, and improves itself :) Better students, better autocorrect, better typing, better translation!
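"Examples it has trouble figuring out" could be chosen in many ways. The sketch below assumes the system can score its candidate answers for each exercise, and picks the exercises where its top two candidates are closest. The function name `pick_hard_examples` and the shape of `predictions` are hypothetical, used only for illustration.

```python
def pick_hard_examples(predictions, k=2):
    """Pick the k exercises the system is least sure about.

    predictions maps each exercise to a dict of candidate answer -> score
    (assumed to be probabilities). An exercise is "hard" when the gap
    between the two best candidates is small.
    """
    def margin(candidates):
        scores = sorted(candidates.values(), reverse=True)
        if not scores:
            return 0.0        # no guess at all: as uncertain as it gets
        if len(scores) == 1:
            return scores[0]  # a single candidate: treat its score as the gap
        return scores[0] - scores[1]

    return sorted(predictions, key=lambda ex: margin(predictions[ex]))[:k]


predictions = {
    "I ___ to school": {"go": 0.9, "goes": 0.1},
    "She ___ an apple": {"eat": 0.48, "eats": 0.52},
    "They ___ happy": {"is": 0.3, "are": 0.7},
}
print(pick_hard_examples(predictions, k=2))
# ['She ___ an apple', 'They ___ happy']
```

The exercises picked this way go to the students, and their answers, once validated by the teacher, become new training data for exactly the cases where the system was unsure.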

This is not an all-or-nothing game. Even telling the computer that cell_Lymphoma_PTCL does NOT mean peripheral in context is a BIG improvement (a real example). The system and the students will make many, many mistakes, but the system will still be better than it is now. Computers are REALLY bad at this sort of stuff, and as we mentioned earlier, even YOUNG adults can teach computers languages :) No one is going to spend billions of dollars on training data for thousands of languages. This is a chance for those languages!
