We wanted to build an effective chatbot that takes a URL, extracts the questions and answers from that website's FAQ into a JSON object, and lets a user query that object by voice.
The back end does this processing: it finds the questions and answers on the website and maps them into the JSON object. The front end is a web app in which the user speaks a question to Prometheus, our chatbot. The app converts the speech to text, searches the JSON object for similar questions (with some tolerance for error, such as small transcription mistakes), and, once it finds a match, Prometheus speaks the answer aloud to the user.
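The source does not say how the "tolerance for error" is computed. As a minimal sketch, assuming normalized Levenshtein similarity as the matching technique (our choice for illustration, not necessarily what Prometheus uses), the query step could look like this. The sample `faq` entries, the `findAnswer` helper, and the 0.6 threshold are all hypothetical.

```javascript
// Hypothetical sample data in the { question, answer, origin } shape the
// scraper writes out; the entries are illustrative only.
const faq = [
  { question: "How do I reset my password?", answer: "Use the Forgot password link.", origin: { x: 20, y: 100 } },
  { question: "What are your opening hours?", answer: "Monday to Friday, 9am to 5pm.", origin: { x: 20, y: 200 } },
];

// Levenshtein edit distance using a single rolling row of the DP table.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // previous row's entry at column j - 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // previous row's entry at column j
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Lowercase, drop punctuation, and collapse whitespace so that
// "Reset password?" and "reset password" compare as equal.
function normalize(s) {
  return s.toLowerCase().replace(/[^a-z0-9\s]/g, "").replace(/\s+/g, " ").trim();
}

// Similarity in [0, 1]; 1 means identical after normalization.
function similarity(a, b) {
  const x = normalize(a);
  const y = normalize(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Return the entry whose question best matches the transcribed speech,
// or null when no score reaches the (assumed) threshold.
function findAnswer(entries, spokenText, threshold = 0.6) {
  let best = null;
  let bestScore = -1;
  for (const entry of entries) {
    const score = similarity(spokenText, entry.question);
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return bestScore >= threshold ? best : null;
}

// A transcription with a typo still finds the right entry.
console.log(findAnswer(faq, "how do i reset my pasword").answer);
// prints: Use the Forgot password link.
```

Dividing the edit distance by the longer string's length keeps the score between 0 and 1 regardless of question length, so a single threshold can be applied to every entry.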
The web scraper uses the Node.js module Puppeteer to drive headless Chrome and automate the scraping. It first evaluates a function in the page that collects all leaf nodes (elements containing a text node with at least 5 consecutive characters). It then calls element.getBoundingClientRect() on each leaf node to obtain the origin (x, y) of its top-left corner. Next, it scans the rectangle that runs from x - 1px to x + 30px horizontally and from y to the y of the next element vertically, collecting the leaf nodes inside it. Boilerplate text, e.g. "Back to top", is discarded. Finally, the scraper reads the innerText of each remaining element, builds an object with a question, answer, and origin, and writes the stringified array to disk. That completes the scraping and the parsing into a JSON array.
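The geometric pairing step can be sketched in plain JavaScript, outside the browser, on leaf nodes that have already been reduced to `{ text, x, y }`; in the real scraper those values would come from `getBoundingClientRect()` and `innerText` inside Puppeteer's `page.evaluate()`. Several details here are assumptions rather than facts from the project: a leaf node counts as a question if its text ends in "?", "5 consecutive characters" is read as five non-whitespace characters in a row, and the "next element" that bounds the scan window is taken to be the next question. `pairFaq`, `BOILERPLATE`, and the sample nodes are hypothetical.

```javascript
// Assumed list of boilerplate strings to drop (compared case-insensitively).
const BOILERPLATE = new Set(["back to top"]);

// Assumption: "at least 5 consecutive characters" = 5 non-whitespace chars in a row.
function hasEnoughText(text) {
  return /\S{5,}/.test(text);
}

// Assumption: a leaf node is treated as a question if its text ends with "?".
function isQuestion(node) {
  return node.text.trim().endsWith("?");
}

// Pair each question with the leaf nodes inside its scan window:
// x in [q.x - 1, q.x + 30], y in [q.y, y of the next question).
function pairFaq(nodes) {
  const leaves = nodes
    .filter((n) => hasEnoughText(n.text))
    .filter((n) => !BOILERPLATE.has(n.text.trim().toLowerCase()))
    .sort((a, b) => a.y - b.y || a.x - b.x); // top-to-bottom, then left-to-right

  const questions = leaves.filter(isQuestion);
  return questions.map((q, i) => {
    // The last question's window extends to the bottom of the page.
    const nextY = i + 1 < questions.length ? questions[i + 1].y : Infinity;
    const answerParts = leaves.filter(
      (n) =>
        n !== q &&
        n.x >= q.x - 1 && n.x <= q.x + 30 &&
        n.y >= q.y && n.y < nextY
    );
    return {
      question: q.text.trim(),
      answer: answerParts.map((n) => n.text.trim()).join(" "),
      origin: { x: q.x, y: q.y },
    };
  });
}

// Hypothetical leaf nodes, as page.evaluate() might return them.
const nodes = [
  { text: "How do I reset my password?", x: 20, y: 100 },
  { text: "Use the Forgot password link on the login page.", x: 40, y: 130 },
  { text: "Back to top", x: 20, y: 170 },
  { text: "What are your opening hours?", x: 20, y: 200 },
  { text: "Unrelated sidebar text", x: 600, y: 210 },
  { text: "Monday to Friday, 9am to 5pm.", x: 40, y: 230 },
];
const faq = pairFaq(nodes);
console.log(JSON.stringify(faq, null, 2));
```

The sidebar node is rejected because its x lies outside the window, and "Back to top" is dropped before pairing. The string produced by `JSON.stringify(faq)` is what would then be written to disk, for example with Node's `fs.writeFileSync`.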
The chatbot is built with Node.js and React, transpiled with Babel, and bundled with webpack. The main view is a Carousel controlled by an ArrayAdapter that holds the WelcomeScreen view and the SpeechAi view.
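The source does not show how the ArrayAdapter is written in the React app. As a hypothetical, framework-free model of the idea, the adapter below owns the ordered list of views and the carousel only tracks which index is on screen. The class names and methods are illustrative assumptions, and plain strings stand in for the WelcomeScreen and SpeechAi components.

```javascript
// Hypothetical adapter: owns the ordered list of views.
class ArrayAdapter {
  constructor(items) {
    this.items = [...items];
  }
  getCount() {
    return this.items.length;
  }
  getItem(index) {
    return this.items[index];
  }
}

// Hypothetical carousel controller: only tracks the current index.
class Carousel {
  constructor(adapter) {
    this.adapter = adapter;
    this.index = 0;
  }
  current() {
    return this.adapter.getItem(this.index);
  }
  // Move forward, stopping at the last view (no wrap-around assumed).
  next() {
    this.index = Math.min(this.index + 1, this.adapter.getCount() - 1);
    return this.current();
  }
  // Move back, stopping at the first view.
  prev() {
    this.index = Math.max(this.index - 1, 0);
    return this.current();
  }
}

// Strings stand in for the WelcomeScreen and SpeechAi React components.
const carousel = new Carousel(new ArrayAdapter(["WelcomeScreen", "SpeechAi"]));
console.log(carousel.current()); // prints WelcomeScreen
console.log(carousel.next());    // prints SpeechAi
console.log(carousel.next());    // prints SpeechAi (already at the last view)
```

With this split, adding a third screen would only mean adding it to the adapter's array; the carousel's navigation logic stays unchanged.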