Subglyph

Inspiration

Subglyph is basically a one-way substitution cipher with a twist: it substitutes homoglyphs (visually similar characters), so the resulting text can be copied and pasted and stays legible to humans, but cannot be indexed by computers without preprocessing. This is problematic for text-based data science. The ideas behind the as-yet-incomplete machine learning component of Subglyph could also be used to measure linguistic and glyphic drift, which could help us understand how the components of written languages have drifted across regions.

What it does

So far, Subglyph is a simple hand-coded dictionary with between 5 and 15 possible substitutions per Latin alphanumeric character. The appeal is that it’s about 50 lines of code that could change online communication if an idea like it took off. It requires the social momentum of casual use to create an adequate “mask” against indexing.
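As a rough illustration of the dictionary approach described above, here is a minimal sketch. The `HOMOGLYPHS` table and the `encode` function are hypothetical names invented for this example, and the table is far smaller than the 5 to 15 substitutions per character that Subglyph uses; it is not the project's actual dictionary or code.

```javascript
// Hypothetical, deliberately tiny homoglyph table (NOT Subglyph's real dictionary).
// Each Latin letter maps to look-alike characters from other Unicode blocks.
const HOMOGLYPHS = new Map([
  ["a", ["\u0430", "\u0251"]], // Cyrillic а, Latin alpha ɑ
  ["c", ["\u0441"]],           // Cyrillic с
  ["e", ["\u0435"]],           // Cyrillic е
  ["o", ["\u043E", "\u03BF"]], // Cyrillic о, Greek omicron ο
  ["p", ["\u0440"]],           // Cyrillic р
  ["x", ["\u0445"]],           // Cyrillic х
]);

// Replace every character that has an entry in the table with one of its
// look-alikes, picked at random. `rng` is injectable so the choice can be
// made deterministic (for example in tests).
function encode(text, rng = Math.random) {
  return Array.from(text, (ch) => {
    const options = HOMOGLYPHS.get(ch);
    if (!options) return ch; // no substitute known: keep the character
    return options[Math.floor(rng() * options.length)];
  }).join("");
}

const masked = encode("peace");
// The masked string renders much like the original but is a different
// sequence of code points, so an exact keyword match no longer finds it.
console.log(masked, masked === "peace", masked.includes("peace"));
```

Because every letter in "peace" has an entry in this toy table, the masked string never equals the original. Text containing letters without an entry would pass through partly unchanged.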

As society gravitates towards virtualized communication tools, the share of conversations that are surveillable and indexable asymptotically approaches 100%. The difference between a surveillable conversation and an indexable conversation is that indexing is cheap and can be automated, whereas surveilling conversations requires human intelligence.

Governments want filtering and indexing to be cheap. In cases where this filters messages from radical organizations like ISIS, that’s fine. In cases where indexing of textual communications helps isolate, psychologically profile and harass political dissidents, that’s not OK. Increasingly often, we’re seeing governments use technology to psychologically disrupt and sociologically stifle people who have simply been labelled. Unfairly and invisibly labelling people who are different invites sociopathic, carnivorous behavior and creates serious mental health issues.

Staving Off Deletion of Religious Texts

If one wanted to save their religious texts from deletion from the Internet Archive, they could use Subglyph to encode them before overlaying them on meme images and, finally, storing them on the archive. This makes the problem of data discovery sufficiently hard without making it too hard, as cryptography would.

Another great idea for preserving crucial data: put a USB stick on a fucking comet and let it sail out to the Oort Cloud. It would need a basic solar-powered Faraday cage for protection against ionizing radiation.

Generation of Unindexable Text with iOS/Android Keyboards

Once the machine learning algorithms are sufficient, Subglyph can be used to create keyboards with custom homoglyphic dictionaries that rotate characters out once they’ve been used. This helps achieve the minimum social momentum needed to spur widespread adoption, which in turn forces tech companies to adapt their products and algorithms to be friendlier to what was formerly seen as spam text. Meh, it’s a mixed bag.
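One possible reading of "rotating" substitutes is a per-character queue: each time a look-alike is used, it moves to the back of the line, so consecutive occurrences of the same letter get different code points. The sketch below assumes that interpretation. `createRotatingEncoder` is a hypothetical helper, and the three-entry table stands in for a learned dictionary that does not exist yet.

```javascript
// Build an encoder that cycles through the substitutes for each character.
// `table` is a plain object: { letter: [substitute, substitute, ...] }.
function createRotatingEncoder(table) {
  // Copy each list so the caller's table is never mutated.
  const queues = new Map(
    Object.entries(table).map(([ch, subs]) => [ch, [...subs]])
  );

  return function rotatingEncode(text) {
    return Array.from(text, (ch) => {
      const queue = queues.get(ch);
      if (!queue || queue.length === 0) return ch; // nothing to substitute
      const next = queue.shift(); // take the least recently used substitute
      queue.push(next);           // and rotate it to the back
      return next;
    }).join("");
  };
}

// Hypothetical table: Cyrillic о, Greek omicron ο, Armenian օ.
const rotatingEncode = createRotatingEncoder({
  o: ["\u043E", "\u03BF", "\u0585"],
});

// Each "o" comes out as a different look-alike; "z" has no entry and is kept.
console.log(rotatingEncode("zoo too"));
```

The queue state lives inside the closure, so it persists across calls. A keyboard built this way would keep rotating as the user types rather than restarting with every message.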

For Information Warfare, Subglyph Can Penetrate Balkanized Internets

If Subglyph became commonplace, political dissidents could use it to access information that would otherwise get them flagged. This usually occurs in countries that flag content providers that match keywords, which requires fast indexing. Because all warfare is essentially information warfare, Subglyph could be useful … but on second thought, this would be a terrible idea because it doesn’t guarantee the safety of the dissidents. However, there may be some formulation of this idea which works for pushing information into these areas.

What's next for Subglyph

Machine Learning to Produce Custom Homoglyphic Unicode Dictionaries

That’s just the start. The second, more complicated portion of Subglyph is about training a machine learning algorithm to produce custom dictionaries. As this training component of Subglyph gets closer to completion, I will update my project page with a graphical interface for visualizing the training process.

Built With

javascript

Updates

david conner started this project — Feb 19, 2017 10:54 AM EST
