Hausdorff Space - separating ideas

Inspiration

We wanted to save time when doing research and to help readers read smarter.

What it does

It takes a webpage or plain text, parses it, and decides which paragraphs are the most relevant. Relevance is based not only on which words appear most often but also on how they are correlated throughout the whole text. It then highlights the paragraphs with the highest density of so-called "good" words. It also generates a sequence of hashtags that show at a glance what the text is really about.
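To make the idea concrete, here is a minimal Python sketch of the density step only. It is not the project's C++ code: the function names, the stop-word list, and the `top_k` and `keep` parameters are all assumptions made for illustration, and the correlation step is left out entirely, so "good" words are simply the most frequent ones.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the original project's list (if any) is not known.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "that", "on", "for", "with", "as", "are", "was"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def good_words(paragraphs, top_k=5):
    """Return the top_k most frequent non-stop-words across all paragraphs."""
    counts = Counter(
        w for p in paragraphs for w in tokenize(p) if w not in STOP_WORDS
    )
    return [w for w, _ in counts.most_common(top_k)]

def density(paragraph, good):
    """Fraction of the paragraph's words that are 'good' words."""
    words = tokenize(paragraph)
    if not words:
        return 0.0
    return sum(w in good for w in words) / len(words)

def summarise(paragraphs, top_k=5, keep=1):
    """Return (indices of paragraphs to highlight, hashtags)."""
    top = good_words(paragraphs, top_k)
    good = set(top)
    # Rank paragraph indices by density of good words, highest first.
    ranked = sorted(range(len(paragraphs)),
                    key=lambda i: density(paragraphs[i], good),
                    reverse=True)
    highlighted = sorted(ranked[:keep])  # restore document order
    hashtags = ["#" + w for w in top]
    return highlighted, hashtags
```

Calling `summarise(paragraphs, top_k=5, keep=2)` on a list of paragraph strings returns the indices of the two densest paragraphs and up to five hashtags. Note that a pure density score favours short paragraphs, which is one reason a real implementation would want the extra correlation signal.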

How we built it

We used Jsoup (a Java library) for the parsing, C++ for the algorithms, and PHP and HTML for the webpage.

Challenges we ran into

Everything was basically a huge challenge, which makes us really happy. Putting everything together was especially hard, and it made us realise how harmful sloppy programming habits are.

Accomplishments that we're proud of

We are new to hackathons, to web development and all, so overall we are really proud of having created something and handed it in.

What we learned

All the members of the team were absolute noobs at web coding and at hackathons in general, so, as mentioned above, this has all been new to us. We only knew C++, so everything beyond that was completely new.

What's next for Hausdorff Space

We have produced, though have not had time to attach them to the webpage, some interesting graphics of word distribution and word repetition throughout the text. These would give more insight into how the text is structured and would improve the summarising algorithm if taken into account. Turning it into a Chrome extension is a long-term ambition.
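As a rough illustration of the kind of data such a graphic could be built from, the sketch below counts how often one word occurs in each slice of a text. It is an assumption-laden example, not the team's script: the function name `word_distribution` and the `bins` parameter are hypothetical, and plotting is left out.

```python
import re

def word_distribution(text, word, bins=4):
    """Count how often `word` occurs in each of `bins` equal slices of the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = [0] * bins
    if not tokens:
        return counts
    target = word.lower()
    for position, token in enumerate(tokens):
        if token == target:
            # Map the token position to a bin index in [0, bins - 1].
            counts[position * bins // len(tokens)] += 1
    return counts
```

The returned list can be passed to any plotting tool as bar heights. A word whose counts are spread evenly runs through the whole text, while a word concentrated in one bin is local to a single section.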

Built With

Submitted to

HackUPC 2016

Created by

I worked on the back-end of the project. I developed an algorithm to evaluate the importance of words and paragraphs based on how many times the words appear in the text.

Carles Domingo
I worked on the back-end of the project, looking for the "most relevant words" in a text.
Based on these important words, another team member and I designed an algorithm to sort the paragraphs by how much information each one gives.

marti roset
I worked on the back-end of the project, first on parsing the HTML and then on writing code to generate frequency distributions of the words. Finally, I wrote Python code to plot the frequency of a specified word in a given text. Unfortunately, we did not have enough time to include this functionality in the final project.

Eric Guisado
Carlos Segarra

Updates

Carlos Segarra started this project — Feb 21, 2016 05:52 AM EST
