As freshmen, we found it very difficult to land internships for the summer of 2017. We wanted to make it easier for ourselves and other college students to find internships that match their skills and interests.
What it does
Tintern has three components. The first is a web crawler that explores the web and builds a database of companies and the internships they offer, along with relevant data such as programming languages, location, position, and industry type. The second is a Facebook Messenger chatbot that lets people describe their interests and skills in natural language and then displays relevant internships. (You must be added to our Facebook App as a developer to use the chatbot, since we did not have time to go through Facebook's approval process for chatbots.) The third is a companion website, built from scratch and modeled on Typeform's design, that displays relevant internships based on the skills and interests you enter.
How we built it
We built the web crawler in Java, using regular expressions to identify internships and extract relevant data. The data was stored in a JSON file, which we uploaded to a Node.js Express server on Heroku so that queries could be run against it. On top of this data we built an API that the server accesses directly and the website reaches through POST requests. The Facebook Messenger chatbot was built with Facebook's developer tools, and all of its language processing runs on our Heroku Express server. We also used jQuery to make the website modern, clean, and responsive, while following Material Design conventions.
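The query layer described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Tintern's actual server code: the record fields (`company`, `position`, `location`, `languages`, `industry`), the request shape, and the `loadDatabase` and `search` functions are all hypothetical, and the filter is deliberately simpler than the scoring described under Challenges. The sketch avoids an Express dependency so it runs on its own; in an Express app, the same `search` call would sit inside a POST route handler that returns the result as JSON.

```typescript
// Hypothetical shape of one record in the crawler's JSON output.
interface Internship {
  company: string;
  position: string;
  location: string;
  languages: string[];
  industry: string;
}

// Hypothetical shape of the JSON body a client POSTs to the search API.
interface SearchRequest {
  location?: string;
  languages?: string[];
}

// Parse the crawler's JSON file contents once, at server start-up.
function loadDatabase(json: string): Internship[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("expected a JSON array of internships");
  }
  return data as Internship[];
}

// Query logic shared by the chatbot and the website: keep records in the
// requested location (if given) that use at least one requested language (if given).
function search(db: Internship[], req: SearchRequest): Internship[] {
  const wantedLangs = (req.languages ?? []).map((l) => l.toLowerCase());
  const wantedLoc = req.location?.toLowerCase();
  return db.filter((job) => {
    const locOk = !wantedLoc || job.location.toLowerCase() === wantedLoc;
    const langOk =
      wantedLangs.length === 0 ||
      job.languages.some((l) => wantedLangs.includes(l.toLowerCase()));
    return locOk && langOk;
  });
}

// Illustrative data only; not taken from Tintern's database.
const db = loadDatabase(`[
  {"company":"Acme","position":"Software Intern","location":"Boston","languages":["Java","SQL"],"industry":"Finance"},
  {"company":"Globex","position":"Web Intern","location":"Austin","languages":["JavaScript"],"industry":"Retail"}
]`);

// Called directly here; a POST route handler would pass the parsed request body instead.
const results = search(db, { languages: ["java"] });
console.log(results.map((r) => r.company)); // [ 'Acme' ]
```

Matching is case-insensitive and exact per language name, so a request for "java" does not match a "JavaScript" posting.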
Challenges we ran into
Web crawling was a major challenge because websites do not share a common structure. Information could not be extracted with one fixed pattern, so we built a dynamic crawler that adapts to the structure of each website it crawls in order to better locate and analyze data. It was also difficult to decide which parameters to extract, because companies do not all publish the same internship information. In the end, we focused on company size, location, programming languages, platform, industry category, and internship position.
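As a rough sketch of regex-based field extraction, the TypeScript below pulls programming languages and a labeled location out of a posting's plain text. It assumes a hypothetical fixed keyword list and a "Location:" label, which real postings will not always use, and it does not show the adaptive, per-site logic described above. The crawler itself was written in Java; these patterns, including the lookbehind, should carry over largely unchanged to `java.util.regex`.

```typescript
// Hypothetical keyword list; the project's real list is not given in the write-up.
const LANGUAGES = ["Java", "JavaScript", "Python", "C++", "C#", "Ruby", "Swift"];

// Escape regex metacharacters such as "+" in "C++".
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One case-insensitive pattern per language. The lookarounds replace \b, which
// fails next to symbols like "+" and "#", and they stop "Java" from matching
// inside "JavaScript".
const LANGUAGE_PATTERNS = LANGUAGES.map((name) => ({
  name,
  pattern: new RegExp(`(?<![A-Za-z0-9+#])${escapeRegExp(name)}(?![A-Za-z0-9+#])`, "i"),
}));

function extractLanguages(pageText: string): string[] {
  return LANGUAGE_PATTERNS.filter(({ pattern }) => pattern.test(pageText)).map(({ name }) => name);
}

// Assumes the posting labels its location explicitly, e.g. "Location: Seattle, WA".
const LOCATION_PATTERN = /Location:\s*([^\n]+)/i;

function extractLocation(pageText: string): string | null {
  const match = LOCATION_PATTERN.exec(pageText);
  return match ? match[1].trim() : null;
}

// Illustrative posting text, not crawled data.
const posting = `Software Engineering Intern
Location: Seattle, WA
We use Java, C++ and some JavaScript on the front end.`;

console.log(extractLanguages(posting)); // [ 'Java', 'JavaScript', 'C++' ]
console.log(extractLocation(posting)); // Seattle, WA
```

A static keyword list like this trades recall for simplicity: it misses languages that are not listed and can still produce false positives for common words, which is part of why per-site adaptation helps.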
Building the Node.js server was challenging because it had to host a Facebook Messenger chatbot, a dynamic website, and an in-house search API. Setting up the Express routing to retrieve the correct data and respond to GET and POST requests from both the chatbot and the website proved difficult. In addition, the in-house search API's algorithm had to be optimized to minimize delays while handling requests from multiple sources. Our search API scored each internship position based on how well it fit the user's skills and interests. Finally, making a chatbot that responds to unstructured human input was nontrivial and required various contextual language-processing methods.
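One plausible way to score positions by fit is a weighted count of matches, sketched below. The weights, field names, and the `scoreInternship` and `rankInternships` helpers are assumptions for illustration; the write-up does not specify Tintern's actual scoring formula. The user's skills and interests are normalized into sets once per query, so each position is scored with cheap set lookups, which is one simple way to keep per-request latency low.

```typescript
// Hypothetical record and profile shapes; field names are illustrative.
interface Internship {
  company: string;
  position: string;
  languages: string[];
  industry: string;
}

interface UserProfile {
  skills: string[]; // e.g. programming languages the user knows
  interests: string[]; // e.g. industries the user cares about
}

// Assumed weights: each matching skill counts more than an industry match.
const SKILL_WEIGHT = 2;
const INTEREST_WEIGHT = 1;

// Score one position against pre-normalized (lowercased) skill and interest sets.
function scoreInternship(job: Internship, skills: Set<string>, interests: Set<string>): number {
  const skillHits = job.languages.filter((l) => skills.has(l.toLowerCase())).length;
  const interestHit = interests.has(job.industry.toLowerCase()) ? 1 : 0;
  return SKILL_WEIGHT * skillHits + INTEREST_WEIGHT * interestHit;
}

// Rank positions by score (highest first), dropping those with no match at all.
function rankInternships(jobs: Internship[], user: UserProfile, limit = 5): Internship[] {
  // Build the lookup sets once per query rather than once per position.
  const skills = new Set(user.skills.map((s) => s.toLowerCase()));
  const interests = new Set(user.interests.map((i) => i.toLowerCase()));
  return jobs
    .map((job) => ({ job, score: scoreInternship(job, skills, interests) }))
    .filter(({ score }) => score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ job }) => job);
}

// Illustrative data only.
const jobs: Internship[] = [
  { company: "Acme", position: "Software Intern", languages: ["Java", "SQL"], industry: "Finance" },
  { company: "Globex", position: "Web Intern", languages: ["JavaScript", "CSS"], industry: "Retail" },
  { company: "Initech", position: "Data Intern", languages: ["Python", "SQL"], industry: "Finance" },
];
const user: UserProfile = { skills: ["python", "sql"], interests: ["finance"] };

console.log(rankInternships(jobs, user).map((j) => j.company)); // [ 'Initech', 'Acme' ]
```

Here Initech scores 5 (two skills plus the industry), Acme scores 3 (one skill plus the industry), and Globex scores 0 and is dropped. Both the chatbot route and the website route could call the same ranking function, so results stay consistent across clients.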
Accomplishments that we're proud of
We are proud of the web crawler, chatbot, in-house API, and companion website.
What we learned
We learned a lot about web crawling, regular expressions, language analysis, Node.js frameworks, and jQuery.
What's next for Tintern
The next step for Tintern is to submit the chatbot to Facebook for approval and to increase the number of internship positions in our database. (We currently have over 200 positions in the database.)
We registered the custom domain matchtintern.com on Domain.com this morning (it took a while to get the free code from the rep), so it does not yet point to our server while the DNS records update.