Inspiration
We've dedicated ourselves to enhancing the Parsons puzzle, created by Dr. Dale Parsons and Dr. Patricia Haden and described as an "automated interactive tool that provides practice with basic programming principles in an entertaining puzzle-like format." Our improvements span the GUI, the fragment-shuffling algorithm, and the incorporation of data mining and machine learning to understand student behavior.
However, a persistent challenge in creating a Parsons puzzle lies at its core: producing the program code the puzzle is built from. Writing quality code or searching for it online is tedious and inefficient, especially when customizing code for students with varying levels of proficiency and understanding.
Fortunately, the introduction of GPT last year sparked an idea: integrating the Parsons puzzle with GPT. This approach delegates code generation entirely to GPT, letting developers focus on application development.
Moreover, we strongly believe that the Parsons puzzle offers an entertaining way to learn programming across age groups, especially for beginners. While we acknowledge the importance of reading and writing code, the conventional methods can be tedious. Parsons puzzles let students practice coding skills and learn from well-structured programming examples at the same time.
Our vision is to introduce the GPT-Parsons puzzle at this hackathon as a groundbreaking learning method not only for programming, but also for other subjects such as history, mathematics, and chains of reasoning.
What it does
A Parsons puzzle breaks content into fragments and shuffles them into a random, incorrect order. Users are presented with these fragments and tasked with sorting them back into the correct order.
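The core mechanic can be sketched in a few lines of Python (names here are illustrative, not our actual implementation):

```python
import random

def make_puzzle(solution, seed=None):
    """Split a solution into line fragments and shuffle them."""
    fragments = [line for line in solution.splitlines() if line.strip()]
    shuffled = fragments[:]
    rng = random.Random(seed)
    # Reshuffle until the order actually differs from the solution.
    while shuffled == fragments:
        rng.shuffle(shuffled)
    return fragments, shuffled

def is_solved(fragments, attempt):
    """The puzzle is solved when the fragments are back in order."""
    return attempt == fragments

solution = "def add(a, b):\n    return a + b\nprint(add(2, 3))"
correct, shuffled = make_puzzle(solution, seed=42)
print(is_solved(correct, shuffled))  # False: order was scrambled
print(is_solved(correct, correct))   # True: original order restored
```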
Our application comprises two interfaces. The student interface presents Parsons puzzles for students to practice solving. The second interface is for instructors, allowing them to create puzzles with GPT: they select the puzzle domain, topic, difficulty level, and other parameters to generate the puzzle. Instructors can then edit the exercise, validate its correctness, and publish it for students.
How we built it
We use the Azure stack of technologies for gpt-parsons.

The front end is hosted in Azure Storage as a static web app. It contains a section for students (the landing page) and one for instructors.
The instructor dashboard starts the exercise-creation procedure, a long-running process (a single OpenAI GPT response takes 15 to 40 seconds). The dashboard lists raw generated exercises, validated exercises, and approved puzzles. It also shows statistics on student performance (fragment moves, failed attempts, successes, and skips), and the instructor can track student sessions. All these operations are privileged and protected with an API key.
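The per-puzzle statistics shown on the dashboard can be aggregated from session records roughly like this (a hedged sketch; field names are illustrative, not our exact schema):

```python
# Illustrative session records as they might come back from storage.
sessions = [
    {"puzzle_id": "p1", "moves": 7,  "failed_attempts": 1, "solved": True,  "skipped": False},
    {"puzzle_id": "p1", "moves": 12, "failed_attempts": 3, "solved": False, "skipped": True},
]

def puzzle_stats(sessions, puzzle_id):
    """Aggregate moves, failures, successes, and skips for one puzzle."""
    rows = [s for s in sessions if s["puzzle_id"] == puzzle_id]
    return {
        "attempts": len(rows),
        "avg_moves": sum(s["moves"] for s in rows) / len(rows),
        "failures": sum(s["failed_attempts"] for s in rows),
        "successes": sum(s["solved"] for s in rows),
        "skips": sum(s["skipped"] for s in rows),
    }

print(puzzle_stats(sessions, "p1"))
```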
The student UI lets students practice Parsons puzzles. For Python programming, the task is to rearrange fragments into correct Python code that either matches the GPT-proposed solution or passes GPT-generated unit tests. The UI supports touch, drag-and-drop, and keyboard interaction, and the code is syntax-highlighted for the student's convenience.
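A minimal sketch of checking a submission against GPT-generated tests (the real check runs in our backend; bare `exec` is shown for illustration only and would need sandboxing in production):

```python
def passes_tests(student_code, test_code):
    """Run GPT-generated unit tests (plain assert statements) against
    the student's assembled code in a shared namespace."""
    namespace = {}
    try:
        exec(student_code, namespace)  # define the student's functions
        exec(test_code, namespace)     # asserts raise on failure
        return True
    except Exception:
        return False

tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(passes_tests("def add(a, b):\n    return a + b", tests))  # True
print(passes_tests("def add(a, b):\n    return a - b", tests))  # False
```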
The backend is implemented with Azure Functions and deployed as the gpt-parsons App Service. It follows the REST paradigm around the entities student session, exercise_creation, exercise, and puzzle, which are persisted in Cosmos DB (NoSQL). This includes storing GPT responses, manual and automated validations of generated content, fragmentation of exercises, and interaction statistics.
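For illustration, an exercise document in Cosmos DB might look roughly like this (field names are assumptions, not our exact schema):

```python
# Hypothetical shape of an "exercise" document before instructor review.
exercise = {
    "id": "ex-001",
    "type": "exercise",
    "domain": "python",
    "topic": "string manipulation",
    "difficulty": "intermediate",
    "solution": "def reverse(s):\n    return s[::-1]",
    "unit_tests": "assert reverse('ab') == 'ba'",
    "validated": False,  # flipped after manual/automated validation
}
print(exercise["id"], exercise["validated"])
```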
The AI service is built with Azure ML Prompt Flow and deployed as another App Service.

Challenges we ran into
Originally we planned to trigger GPT generation in the student flow (anonymous access). However, we had to create a parallel workflow for instructors, both to control the cost of calling GPT and to improve responsiveness (responses from the OpenAI service are very slow).
Developing good prompts is also a challenge. Advanced puzzles should contain sufficiently complex code; in most cases, however, gpt-3.5-turbo resorts to simple, well-known practice problems (factorial, Fibonacci, and so on). Prompt Flow lets us shift this burden partially onto the instructor by asking them to provide the topic, form of exercise, and other input parameters. Another issue with GPT generation is duplicated responses. Instead of implementing some form of prompt seeding, we include "avoid" content in the prompt: things GPT should not generate. When Python exercises are generated in one batch, the "avoid" list accumulates across calls the names of top-level entities in the code, such as functions and classes. In the limit, "avoid" could grow without bound, and it is not practical to send every previously generated symbol cached in Cosmos DB with each request for a new exercise.
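Collecting the "avoid" names from a generated exercise can be sketched with the standard `ast` module (a sketch of the idea, not our exact code):

```python
import ast

def top_level_names(code):
    """Collect names of top-level functions, classes, and variables from
    a generated exercise, to feed into the next prompt's "avoid" list
    so GPT does not repeat itself within a batch."""
    names = []
    for node in ast.parse(code).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.append(node.name)
        elif isinstance(node, ast.Assign):
            names += [t.id for t in node.targets if isinstance(t, ast.Name)]
    return names

avoid = []
generated = "def factorial(n):\n    return 1 if n < 2 else n * factorial(n - 1)"
avoid += top_level_names(generated)  # avoid is now ['factorial']
```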
Accomplishments that we're proud of
Our application stands out because it autonomously generates Parsons puzzles through GPT, while comparable tools still rely on instructor-provided content.
In a remarkably short time frame, we've built a comprehensive application. Beyond the student interface for puzzle-solving practice, we've developed an instructor interface with rich customization options: instructors can tailor puzzles by adjusting parameters such as difficulty level, number of lines, and number of injected bugs for knowledge testing.
The instructor interface also provides student statistics, enabling analysis of overall performance or of an individual student's solving behavior, including move count and time taken.
Notably, our application is cost-effective: each puzzle requires only 70-80 tokens on average, making GPT-based puzzle generation efficient. Our carefully designed prompt flow also safeguards puzzle quality through robust validation, while adding entertaining elements to boost the fun factor.
What we learned
Major points we confirmed for ourselves about GPT as a content provider:
- Generated content has to be verified, no matter how confident GPT sounds. For instance, our prompt flow asks GPT to generate not only the correct solution but also similar code with an introduced bug; occasionally it produces two correct alternative solutions instead. We validate the generated solution and the student's submission against unit tests proposed by GPT, and we also let the instructor manually validate and modify the synthesized content.
- Controlling the quality of GPT-generated content takes effort. The LLM resorts to simple solutions and, by default, produces many duplicates.
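The "two correct solutions" failure mode can be caught mechanically: a buggy variant is accepted only if it actually fails the proposed tests. A hedged sketch (bare `exec` shown for illustration, without the sandboxing a real deployment needs):

```python
def variant_is_really_buggy(buggy_code, test_code):
    """Accept a GPT-generated "buggy" variant only if it fails the
    GPT-proposed unit tests; otherwise GPT has likely produced a
    second correct solution instead of a bug."""
    namespace = {}
    try:
        exec(buggy_code, namespace)  # define the variant
        exec(test_code, namespace)   # plain assert statements
    except Exception:
        return True   # a test failed: the bug is real
    return False      # all tests passed: not actually buggy

tests = "assert add(2, 3) == 5\nassert add(0, 0) == 0"
print(variant_is_really_buggy("def add(a, b):\n    return a - b", tests))  # True
print(variant_is_really_buggy("def add(a, b):\n    return a + b", tests))  # False
```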
Azure technologies mostly target simplifying backend development; in this project, front-end development took the major share of the time.
What's next for gpt-parsons
Initially crafted for beginners learning programming, the Parsons puzzle excels at its intended purpose. Recognizing its potential beyond programming, we've expanded it to include history and chain-of-reasoning puzzles. Our goal is to diversify Parsons puzzles with a broader range of topics, benefiting a wider user base. We also aim to deepen the OpenAI integration in future development to improve puzzle quality. The tool can also support research on how learners acquire new domain knowledge.
Built With
- azure
- cosmosdb
- gpt
- html5
- javascript
- openai
- python