While in statistics class, we realized that entering data into calculators is tedious and easy to automate. Furthermore, students from lower socioeconomic backgrounds sometimes lack access to tools like the TI-84 calculator that let them carry out statistical procedures, even though they may have access to computers at school. We combined these two observations in the hope of inspiring the statisticians of tomorrow.
What it does
Our Java application has the user specify a URL and a search term, or whatever they want to find on the web page. A Java web crawler built with Jsoup then finds URLs containing that search term and returns any relevant quantitative data to a statistics engine. The statistics engine performs the calculations the user requests and presents the results in an appropriate format. It can perform basic statistical analysis of data from one or two links, z-test confidence intervals and hypothesis tests, and the corresponding t-test procedures. It can also calculate various forms of regression and, after choosing the optimal equation, display it graphically.
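The z and t procedures listed above reduce to a handful of formulas. The sketch below is a hypothetical illustration, not the project's code: it assumes the population standard deviation σ is known for the z procedures, takes the critical value z* as a parameter (1.96 for 95% confidence), and approximates the standard normal CDF with the Abramowitz and Stegun 7.1.26 erf formula, since Java's standard library has no `erf`. The t statistic is included, but its p-value would need a Student's t CDF, which is omitted here. Class and method names are invented for illustration.

```java
/** Hypothetical sketch of one-sample z and t procedures; not the Stats Engine source. */
public class ZProcedures {

    static double mean(double[] data) {
        double sum = 0.0;
        for (double v : data) sum += v;
        return sum / data.length;
    }

    /** Sample standard deviation with the n - 1 denominator. */
    static double sampleStdDev(double[] data) {
        double m = mean(data);
        double ss = 0.0;
        for (double v : data) ss += (v - m) * (v - m);
        return Math.sqrt(ss / (data.length - 1));
    }

    /** Standard normal CDF via the Abramowitz and Stegun 7.1.26 erf approximation (erf error < 1.5e-7). */
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-x * x);
        // Phi(z) = (1 + erf(z / sqrt 2)) / 2, using erf(-x) = -erf(x) for negative z
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** z confidence interval xbar +/- zStar * sigma / sqrt(n), with sigma assumed known. */
    static double[] zInterval(double[] data, double sigma, double zStar) {
        double xbar = mean(data);
        double margin = zStar * sigma / Math.sqrt(data.length);
        return new double[] {xbar - margin, xbar + margin};
    }

    /** Two-sided z-test of H0: mu = mu0; returns {z statistic, p-value}. */
    static double[] zTest(double[] data, double sigma, double mu0) {
        double z = (mean(data) - mu0) / (sigma / Math.sqrt(data.length));
        double p = 2.0 * (1.0 - normalCdf(Math.abs(z)));
        return new double[] {z, p};
    }

    /** t statistic (xbar - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom. */
    static double tStatistic(double[] data, double mu0) {
        return (mean(data) - mu0) / (sampleStdDev(data) / Math.sqrt(data.length));
    }

    public static void main(String[] args) {
        double[] sample = {52, 48, 55, 50, 45, 53, 49, 51, 54, 47};
        double[] ci = zInterval(sample, 3.0, 1.96);
        double[] test = zTest(sample, 3.0, 48.0);
        System.out.printf("95%% z-interval: (%.3f, %.3f)%n", ci[0], ci[1]);
        System.out.printf("z = %.3f, two-sided p = %.4f%n", test[0], test[1]);
        System.out.printf("t = %.3f with %d df%n", tStatistic(sample, 48.0), sample.length - 1);
    }
}
```

Passing z* in as a number keeps the sketch free of an inverse-normal routine; a fuller engine would compute it from the requested confidence level.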
How we built it
Jsoup: With the help of Jsoup, we created a Java web crawler that finds the first instance of the search term. The program then parses the page on which the search term appears and extracts quantitative data related to the search term.
Statistical engine: Using the Jsoup web crawler together with Bing search queries, we quickly found functions for calculations such as the normal distribution, Student's t-distribution, and various regressions.
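The writeup does not say how the optimal regression equation is chosen. One common approach, assumed here rather than taken from the project, is to fit each candidate model by least squares and keep the one with the highest coefficient of determination R² on the original data. The sketch compares a linear model with an exponential model fitted by regressing ln y on x; the class and method names are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: fit candidate regression models and keep the one with the highest R^2. */
public class RegressionPicker {

    /** Ordinary least-squares line y = a + b*x; returns {a, b}. */
    static double[] linearFit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double a = (sy - b * sx) / n;
        return new double[] {a, b};
    }

    /** R^2 = 1 - SS_res / SS_tot, evaluated on the original y values (assumes y is not constant). */
    static double rSquared(double[] x, double[] y, DoubleUnaryOperator model) {
        double mean = 0;
        for (double v : y) mean += v;
        mean /= y.length;
        double ssRes = 0, ssTot = 0;
        for (int i = 0; i < y.length; i++) {
            double r = y[i] - model.applyAsDouble(x[i]);
            ssRes += r * r;
            ssTot += (y[i] - mean) * (y[i] - mean);
        }
        return 1.0 - ssRes / ssTot;
    }

    /** Fits linear and exponential models and returns the name of the one with the higher R^2. */
    static String bestModel(double[] x, double[] y) {
        double[] lin = linearFit(x, y);
        DoubleUnaryOperator linear = v -> lin[0] + lin[1] * v;

        // Exponential model y = A * e^(B*x): fit a line to (x, ln y); this requires every y > 0.
        double[] lnY = new double[y.length];
        for (int i = 0; i < y.length; i++) lnY[i] = Math.log(y[i]);
        double[] ex = linearFit(x, lnY);
        DoubleUnaryOperator exponential = v -> Math.exp(ex[0] + ex[1] * v);

        return rSquared(x, y, exponential) > rSquared(x, y, linear) ? "exponential" : "linear";
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2.7, 7.4, 20.1, 54.6, 148.4};
        System.out.println("best model: " + bestModel(x, y));
    }
}
```

R² only measures closeness of fit, so a real engine choosing among many model families might also inspect residuals before plotting the winner.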
Challenges we ran into
Because we only discovered Jsoup halfway through the hackathon, our original web crawler checked page content with a rudimentary HTML parser we wrote ourselves, which broke on markup that did not follow standard formatting conventions. To solve this, we replaced our parsing method with methods from Jsoup that reversed our process and produced the same result without compromising memory.
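Whichever parser produces the page text, pulling quantitative data that follows the search term can be kept separate from the parsing itself. The sketch below assumes the page has already been reduced to plain text (for example with Jsoup's `Element.text()`); the class name, the character `window`, and the number pattern are hypothetical choices, not the project's actual heuristic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: collect numbers that appear shortly after the first occurrence of a search term. */
public class NumberExtractor {
    // Optional minus sign, digits, optional thousands groups like ",400", optional decimal part.
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:,\\d{3})*(?:\\.\\d+)?");

    static List<Double> numbersNear(String text, String term, int window) {
        List<Double> values = new ArrayList<>();
        Matcher hit = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE).matcher(text);
        if (!hit.find()) return values; // search term not on this page
        // Only scan a fixed number of characters after the term; a number cut by the window may be truncated.
        int end = Math.min(text.length(), hit.end() + window);
        Matcher num = NUMBER.matcher(text.substring(hit.end(), end));
        while (num.find()) {
            values.add(Double.parseDouble(num.group().replace(",", "")));
        }
        return values;
    }

    public static void main(String[] args) {
        String pageText = "The median income is 52,400 dollars, up 3.5 percent from 2019.";
        System.out.println(numbersNear(pageText, "median income", 80));
    }
}
```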
Accomplishments that we're proud of
We are quite proud of the Bing integration: we had to study HTTP request protocols to send Bing search queries from Java, so that our algorithm gets results fine-tuned to what the user requests. We are also proud of our self-built HTML parser, even though it is not yet functional; it needs refinement before we can use it in an algorithm like this one.
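The HTTP side of a Bing integration can be sketched with the `java.net.http` client added in Java 11. This is an assumption-laden illustration, not the team's code: it targets the Bing Web Search API v7 endpoint `https://api.bing.microsoft.com/v7.0/search`, which authenticates with an `Ocp-Apim-Subscription-Key` header. Microsoft has announced the retirement of the Bing Search APIs, so treat the endpoint as illustrative. The class name and the `BING_API_KEY` environment variable are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of building and sending a Bing Web Search request (Java 11+). */
public class BingQuery {
    // Assumed Bing Web Search API v7 endpoint; verify against current documentation.
    static final String ENDPOINT = "https://api.bing.microsoft.com/v7.0/search";

    /** Builds a GET request; the query is URL-encoded so spaces and symbols are safe in the URI. */
    static HttpRequest buildRequest(String query, String apiKey) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(ENDPOINT + "?q=" + q + "&count=10"))
                .header("Ocp-Apim-Subscription-Key", apiKey)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        String key = System.getenv("BING_API_KEY"); // hypothetical environment variable
        if (key == null) {
            // Without a key, just show the request that would be sent.
            System.out.println(buildRequest("median household income", "demo-key").uri());
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest("median household income", key), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // JSON; result links are listed under webPages.value[].url
    }
}
```

Building the request separately from sending it keeps the query encoding testable without a network connection or an API key.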
What we learned
We learned how to integrate search engine queries, how complex many of the statistical formulas we often took for granted really are, and, of course, the importance of teamwork!
What's next for Stats Engine
We hope to improve the GUI and include more statistical analysis options. We are also looking into refining the web crawler so that results load more efficiently and are presented in a way that is easy to read and understand.