Whenever teachers, students, or researchers collect data, sifting through the sheer volume of sources returned by databases and search engines can be time-consuming and often blunts the specificity of their analysis.
What it does
To address this problem, I developed an algorithm that efficiently compiles data and primary sources for historical research specifically, making it a useful prototype for gauging the feasibility of a compilation algorithm that spans the natural, social, and applied sciences.
How I built it
The project is built in Java and uses Selenium, a browser automation library that provides methods for navigating pages and extracting web data. I created a class for each database; each class is implemented slightly differently, since the process for extracting information differs from database to database, but the general structure remains the same. Each class contains two common instance variables, two primary methods, and any additional helper methods for parsing and data extraction.

The two variables are `link` and `body`. The `link` variable is the preset search URL used to find sources in that database; when an object is instantiated, the constructor builds the link by placing the query inside the URL's parameters. An easy way to visualize this is shown on the left. The `body` changes each time a new source is found: it holds all the textual content of the source page before it has been converted and parsed.

The most fundamental parts of the program, however, are the two primary methods, "Initial Run" and "Secondary Run". The initial run submits the query to the database's built-in search system (likely a probability-based linear search). Depending on how the database displays its data, the initial run converts the results page into an XML or text file, analyzes its components, and extracts the individual links to the informational pages into an array known as `resources`, using a loop with a fixed number of iterations; this extraction is the primary focus of the initial run method. Using this set of links, the secondary run method then opens each web page in the background and saves the relevant information as an instance of a class I created called `Page`. When this object is instantiated, it formats and parses the textual information, using a method I created called `separate`, to maintain the usability and visual appeal of the results page after a search is complete.
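The per-database class structure described above can be sketched roughly as follows. The class name, the example URL, and the method signatures here are illustrative, not the project's actual code; the real project drives a browser via Selenium, whereas this simplified sketch takes the results page's HTML as a plain string so the two-phase structure is visible without a live browser:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of one database-specific class: a preset search link
// built in the constructor, a body holding the current page's raw text, and
// an initial run that collects source links into a "resources" list.
public class DatabaseSearch {
    private final String link;  // preset search URL with the query in its parameters
    private String body;        // raw text of the page currently being processed

    public DatabaseSearch(String query) {
        // Hypothetical search endpoint; each database class would differ here.
        this.link = "https://example-archive.org/search?q=" + query.replace(" ", "+");
    }

    public String getLink() {
        return link;
    }

    /** Initial run: parse the results page and collect links to source pages. */
    public List<String> initialRun(String resultsHtml) {
        this.body = resultsHtml;
        List<String> resources = new ArrayList<>();
        Matcher m = Pattern.compile("href=\"([^\"]+)\"").matcher(body);
        int limit = 10;  // loop with a fixed number of iterations, as described
        while (m.find() && resources.size() < limit) {
            resources.add(m.group(1));
        }
        return resources;
    }
}
```

In the real project, the secondary run would then visit each collected link and wrap its text in a `Page` object; that step is omitted here since it depends on a running browser.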
This method essentially looks at the length of the page excerpt and then re-organizes the extra gaps and spaces within the text to create one free-flowing paragraph. The process is repeated for each web page in the `resources` array, continuously building a string that compiles the relevant information from every source. Finally, the compiled string is returned and displayed, along with the citation or description, from the JSP file onto a web page. Since each tab represents a different source, some JSP files are better optimized for displaying images than text.
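The whitespace re-organization step could be sketched like this (the class and method names are illustrative stand-ins for the project's `separate` method, and the single regex is an assumption about how the cleanup is done):

```java
// Minimal sketch of the separate() step: collapse runs of whitespace
// (blank lines, tabs, stray spaces) in a page excerpt into single spaces,
// yielding one free-flowing paragraph.
public class PageText {
    /** Collapse all whitespace runs into single spaces and trim the ends. */
    public static String separate(String excerpt) {
        return excerpt.replaceAll("\\s+", " ").trim();
    }
}
```

Applied to raw page text such as `"First line.\n\n\tSecond   line."`, this produces a single paragraph with uniform spacing.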
Challenges I ran into
Some difficulties included getting the JSP files to properly integrate the Java server code and getting the dependencies to cooperate.
Accomplishments that I'm proud of
Creating a fully functional web application utilizing Java, HTML, and JSP.
What I learned
I learned how to use web scrapers to increase the efficiency of searching techniques.
What's next for Super Search Algorithm
When completed, this algorithm will have a multitude of applications: basic research at the high school level across subjects like literature and biology; assistance to teachers when creating projects and tests; more complex research and source selection at the college level; and use by any community of professionals with very specific needs that require in-depth searching through a select number of databases under time constraints (for example, lawyers searching prior case law, car mechanics troubleshooting specific vehicular issues, and electricians or plumbers finding solutions to unusual problems). The most notable aspect of the Super Search algorithm, however, is its capacity to revolutionize the way we research: it can provide an abundance of sources in a fraction of the time it would normally take to locate just one.