Google's advertising is targeted; typical fact apps are not. Taking inspiration from what Google did, I built a data-driven fact app with question (quiz) functionality.

What it does

The server scrapes web databases for facts, runs them through Watson topic analysis, and saves the facts with their related topics and relevance in an Azure dataset.

Scrapes the webpage for facts (using the Watson API), pipes them to Watson topic analysis, and stores them in the fact dataset.

Sends the raw text of the webpage being browsed to Watson for topic analysis.

Polls the trained Azure model for the most relevant facts and displays them somewhere convenient on the page.

If a fact has been displayed before, its subject is replaced with a blank that the user has to fill in. (Potential future leaderboard.)
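The quiz behaviour described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function names present_fact and check_answer, the in-memory `seen` set, and the sample fact are all assumptions made for illustration, and it assumes the subject of each fact is already known.

```python
import re


def present_fact(fact, subject, seen):
    """Return the fact unchanged the first time; afterwards blank out its subject.

    `seen` is a hypothetical in-memory set of facts already shown to the user.
    """
    if fact not in seen:
        seen.add(fact)
        return fact
    # The fact was shown before: replace the first occurrence of the
    # subject (case-insensitive) with a blank of the same length.
    blank = "_" * len(subject)
    return re.sub(re.escape(subject), blank, fact, count=1, flags=re.IGNORECASE)


def check_answer(answer, subject):
    """Compare the user's answer to the subject, ignoring case and outer spaces."""
    return answer.strip().lower() == subject.strip().lower()


seen = set()
fact = "The Eiffel Tower is located in Paris."
print(present_fact(fact, "Eiffel Tower", seen))  # first view: plain fact
print(present_fact(fact, "Eiffel Tower", seen))  # second view: subject blanked
```

A correct answer checked with check_answer is the kind of event a future leaderboard could count.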

tl;dr: displays random facts related to the topic being read, adds facts found on the page to the database, and quizzes the user if a fact has been displayed previously.

How I built it

Scrape facts and pass them on for content analysis.

Train Azure ML with the fact dataset. When input is received, the website's content (the list of topics from Watson) is piped to a Python script on Azure, which sends the topic table from the Azure dataset and the webpage content to RxNLP for text similarity processing. The resulting similarity values are returned to Azure for processing, and a fact is returned to the Chrome extension.
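The matching step can be illustrated with a small local stand-in. This sketch does not call RxNLP or Azure; it swaps in plain cosine similarity over topic-to-relevance scores to show how a page's topics might be ranked against stored facts. The record layout, the function names cosine and most_relevant, and the sample data are assumptions, not the project's real schema.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse topic -> relevance dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty topic list matches nothing
    return dot / (norm_a * norm_b)


def most_relevant(page_topics, facts, top_n=1):
    """Return the top_n facts whose topics are most similar to the page's."""
    ranked = sorted(
        facts,
        key=lambda fact: cosine(page_topics, fact["topics"]),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical fact records: text plus topics with relevance scores.
facts = [
    {"text": "sample fact about eggs", "topics": {"food": 0.9, "cooking": 0.6}},
    {"text": "sample fact about rockets", "topics": {"space": 0.95, "engineering": 0.5}},
]
page_topics = {"cooking": 0.8, "food": 0.7}  # topics of the page being browsed
print(most_relevant(page_topics, facts)[0]["text"])
```

In the setup described above, the similarity scores would come back from the external service instead of being computed locally; only the ranking step would stay the same.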

A simple Chrome extension that just pipes data to the trained ML models.

Challenges I ran into

Limited dataset: the Azure training set was only ~1000 rows, which was not enough.

API key limits: most API keys were limited to 1000 calls, which was not enough. I had to get friends to sign up.

Mysterious errors from Azure: a generic error (Error 0000) with no error log.

Accomplishments that I'm proud of

First time using data APIs and using big data to drive an application.

What I learned

Basic ML systems and the accuracy and reliability of current ML technologies. What kinds of datasets work better (dataset optimization). Neural networks and genetic algorithms.

What's next for ExtraFacts

Extend the fact database.

Collect data from extension usage (increases the dataset).

Use browsing history to construct more complex fact recommendations, e.g. if the user browses eggs and then apples, display facts on health foods / cooking.

Leaderboards and a reward system.
