Inspiration

Nowadays, millions of students live with learning, physical, or visual disabilities that hinder their ability to access traditional printed educational materials. This is a particularly concerning issue in science, technology, engineering, and mathematics (STEM) subjects, as most educational resources are not designed to cater to the needs of these students. While technology has made written content more accessible, making educational visuals like graphs accessible remains complex and resource-intensive. Consequently, only a tiny fraction of educational materials is available to learners with these challenges, unless machine learning can provide a solution.

Our project aims to develop a solution that automatically extracts data from five common types of charts typically found in STEM textbooks. Trained on a dataset of images of various plots, this project has the potential to enhance the accessibility of graph-based content for millions of students with varying learning needs or disabilities.

What it does

Our project establishes a solid platform for converting plots into readable data series. For categorical x-axis labels (including, but not limited to, country, month, and income level), we map each category to its y value (population, sex ratio, etc.). If the x-axis labels are numerical, we map the two continuous numerical values to each other. At the end, we report the type of the graph, what the x and y labels mean, and all the data points as tuples.
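To make the output format concrete, here is a minimal sketch of what an extracted data series might look like; the dictionary keys and the `describe` helper are illustrative names, not the project's actual API.

```python
# Hypothetical example of an extracted series (illustrative values only).
extraction = {
    "chart_type": "vertical_bar",
    "x_label": "Country",
    "y_label": "Population (millions)",
    "data": [("Brazil", 214.3), ("Japan", 125.7), ("Egypt", 109.3)],
}

def describe(result):
    """Render an extracted series as readable text: chart type,
    axis meanings, then every (x, y) data point."""
    lines = [
        f"Chart type: {result['chart_type']}",
        f"X axis: {result['x_label']}, Y axis: {result['y_label']}",
    ]
    for x, y in result["data"]:
        lines.append(f"  {x} -> {y}")
    return "\n".join(lines)

print(describe(extraction))
```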

How we built it

First, we fine-tuned two image-classification models. The first model established whether an input image is a graph, and the second categorized each plot into one of the five chart types. Afterward, we fine-tuned YOLOv7, a real-time object-detection model, on a constructed dataset to identify the general locations of the plot's information near the x and y axes by drawing bounding boxes around the valid data points. After getting the locations of the data points, we used a fine-tuned EasyOCR (optical character recognition) program to retrieve the text inside each box, regardless of whether it was slanted or shaded in a different color. With the positions of the labeled points, we can calculate how much value a point carries per unit of pixel length, and from that compute its real value for both x and y. Then, we pair up the data and print it to the terminal.
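The pixel-to-value calibration step above amounts to a linear interpolation between two known axis ticks. A minimal sketch, assuming two tick labels have already been located and OCR'd (the function name and example numbers are ours, not the project's exact code):

```python
def pixel_to_value(px, tick1_px, tick1_val, tick2_px, tick2_val):
    """Linearly map a pixel coordinate to an axis value using two known ticks.

    Works for either axis; note that image pixel rows increase downward,
    which the sign of units_per_pixel handles automatically.
    """
    units_per_pixel = (tick2_val - tick1_val) / (tick2_px - tick1_px)
    return tick1_val + (px - tick1_px) * units_per_pixel

# Example: y-axis tick "0" sits at pixel row 400 and tick "100" at row 100.
# A point at row 250 is halfway between them, so its value is 50.0.
value = pixel_to_value(250, 400, 0.0, 100, 100.0)  # -> 50.0
```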

Challenges we ran into

The first challenge we ran into was finding the x-axis and y-axis. We first searched for related APIs, but we didn't manage to find one suitable and efficient for our specific task. Our final solution was to brute-force through the graphs and find the axes. The second challenge was how to scan the text. Some cases were significantly difficult, such as discriminating between the digit 0 and the letter o, especially in blurry graphs. We went through various methods, and finally settled on a fine-tuned version of a deep-learning model, EasyOCR, which could scan all the text with astonishing accuracy after we fine-tuned it on a dataset of extracted text patches paired with their correct transcriptions. The third challenge we ran into was connecting our local files to a website with user-interaction functions. Sadly, we could not fix a missing-modules issue, which remains unresolved.
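The brute-force axis search can be sketched roughly as follows: on a grayscale image, take the row with the most dark pixels as the x-axis and the column with the most dark pixels as the y-axis. This is a simplified illustration of the idea under assumed thresholds, not our exact implementation.

```python
import numpy as np

def find_axes(gray, dark_threshold=80):
    """Brute-force axis search on a grayscale image (uint8).

    The x-axis is taken to be the pixel row containing the most dark
    pixels; the y-axis is the column containing the most dark pixels.
    Returns (x_axis_row, y_axis_col).
    """
    dark = gray < dark_threshold                   # boolean mask of dark pixels
    x_axis_row = int(np.argmax(dark.sum(axis=1)))  # darkest row
    y_axis_col = int(np.argmax(dark.sum(axis=0)))  # darkest column
    return x_axis_row, y_axis_col

# Synthetic 100x100 white image with an L-shaped pair of dark axes.
img = np.full((100, 100), 255, dtype=np.uint8)
img[90, 10:95] = 0   # horizontal x-axis at row 90
img[10:91, 10] = 0   # vertical y-axis at column 10
print(find_axes(img))  # -> (90, 10)
```

Real chart images have gridlines and bar edges that can also form long dark runs, so in practice this needs extra heuristics (e.g., preferring lines near the image border).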

Accomplishments that we're proud of

For our image-classification model, we achieved 99.6% accuracy on our test set, which accounts for 10% of our dataset of more than 60,000 images. The accuracy, precision, and recall of our YOLOv7 model are all greater than 0.95. This implies that our model is not only making correct predictions most of the time but is also reliable both in confirming positive cases and in not misclassifying negative cases as positive. Designing the whole pipeline was also a lot of work: we had to make sure each segment functioned properly and handed off smoothly to the next step. We are proud to declare that our whole project is overall a success.
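For readers unfamiliar with these metrics, they come straight from the confusion-matrix counts. A quick sketch with made-up counts (not our actual evaluation numbers):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts.

    accuracy  = fraction of all predictions that are correct
    precision = of everything predicted positive, how much really is
    recall    = of everything truly positive, how much we caught
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Illustrative counts only:
acc, prec, rec = classification_metrics(tp=950, fp=30, fn=20, tn=1000)
# acc = 0.975, and both prec and rec exceed 0.95
```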

What we learned

First, we learned the value of clear direction: we set a clear division of work and achieved our goals separately and efficiently. Additionally, we learned that we should spend more time doing and less time discussing. We found that, theoretically, we could solve our problems; however, unexpected issues arose when implementing many functions of our code. Also, we learned to weigh the pros and cons of traditional machine learning against the more innovative deep-learning methods, and to choose between them intelligently based on our needs. Moreover, we should have researched the related topics more to determine our development direction; knowing the cutting-edge discoveries of computer science is an excellent help for a project like ours. Last but not least, we learned that integrating machine-learning processes into web development is challenging work.

What's next for The "Plotters"

Regrettably, we could not finish the web backend within the 36 hours. However, we still aim to resolve this, since it would directly connect our project to users who want accurate information from graphs and give it real-life meaning. We could further include more types of graphs, such as pie charts, in our scope of implementation. We also intend to support conversions between different graph types, e.g., horizontal bar charts and dot plots.
