How AI is Assisting With Mathematical Education
Mathematics and programming are joined at the hip. So much so that one of the first questions asked by anyone learning to program is: "Do I need to know maths to be a good programmer?" Whilst the answer is "No", the two remain closely connected, and the connection is particularly strong in game programming, financial programming and scientific programming. A major problem confronting both fields is that teaching them together in a classroom is difficult: teachers who are good at programming are not usually maths specialists, and maths specialists are not usually experienced programmers. This disadvantages students, who often end up knowing one subject or the other but not both.
The use case which this article relates to is education about statistics and statistical programming.
Description
The artificial intelligence (AI) tool used in this article is ChatGPT, although much of what follows applies to similar tools. The target audience is anyone who works or studies in the natural sciences (e.g. physics, chemistry, biology) or the social sciences (e.g. economics, psychology, sociology).
One of the great advantages of ChatGPT and tools like it is that you can ask a question in plain, conversational language and get a structured response presented in clear steps. For example, if I wanted to perform categorical regression in the statistical programming language R and was not sure how to do this, I could ask ChatGPT the following:
How can I perform categorical regression in R?
Sure enough, ChatGPT responds with a structured overview of the main options, namely:
Categorical regression in R typically involves modeling the relationship between a categorical dependent variable and one or more independent variables. There are different techniques and models available, depending on the nature of your categorical dependent variable (e.g., binary, nominal, ordinal). I'll provide a brief overview of how to perform categorical regression for a few common scenarios:
1. Binary Logistic Regression:
Binary logistic regression is used when the dependent variable is binary (two categories, often coded as 0 and 1). You can use the glm() function in R to perform binary logistic regression. Here's a basic example:
# Load necessary library (if not already loaded)
library(stats)
# Create your data frame with dependent and independent variables
data <- data.frame(
outcome = c(0, 1, 1, 0, 1, 0, 1, 0),
predictor1 = c(2, 4, 3, 6, 5, 8, 7, 9),
predictor2 = c(1, 2, 2, 3, 3, 4, 4, 5)
)
# Fit a binary logistic regression model
model <- glm(outcome ~ predictor1 + predictor2, data = data, family = "binomial")
# Summary of the model
summary(model)
This code fits a binary logistic regression model to predict the binary outcome based on two predictor variables. Replace the data and variable names with your own dataset and variable names.
2. Multinomial Logistic Regression:
Multinomial logistic regression is used when the dependent variable has more than two categories. You can use the multinom() function from the nnet package to perform multinomial logistic regression. Here's a basic example using multinom:
# Load necessary library (if not already loaded)
library(nnet)
# Create your data frame with dependent and independent variables
data <- data.frame(
outcome = factor(c("A", "B", "C", "A", "B", "C")),
predictor1 = c(2, 4, 3, 6, 5, 8),
predictor2 = c(1, 2, 2, 3, 3, 4)
)
# Fit a multinomial logistic regression model
model <- multinom(outcome ~ predictor1 + predictor2, data = data)
# Summary of the model
summary(model)
This code fits a multinomial logistic regression model to predict the outcome with three categories based on two predictor variables. Adjust the data and variable names to fit your specific dataset.
3. Ordinal Logistic Regression:
Ordinal logistic regression is used when the dependent variable is ordered and categorical. You can use the polr() function from the MASS package to perform ordinal logistic regression. Here's a basic example:
# Load necessary library (if not already loaded)
library(MASS)
# Create your data frame with dependent and independent variables
data <- data.frame(
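# Set the level order explicitly; otherwise R orders the levels alphabetically (High < Low < Medium)
outcome = ordered(c("Low", "Low", "Medium", "High", "Medium", "High"), levels = c("Low", "Medium", "High")),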
predictor1 = c(2, 4, 3, 6, 5, 8),
predictor2 = c(1, 2, 2, 3, 3, 4)
)
# Fit an ordinal logistic regression model
model <- polr(outcome ~ predictor1 + predictor2, data = data)
# Summary of the model
summary(model)
This code fits an ordinal logistic regression model to predict an ordered outcome variable based on two predictor variables. Modify the data and variable names according to your dataset.
Remember to adjust the models, predictors, and data based on your specific research question and dataset. Additionally, you may want to assess the model's fit and interpret the results to draw meaningful conclusions from your categorical regression analysis.
This is really helpful and, of course, anyone with a more specific model in mind can tailor their question to ChatGPT as needed. One of the really interesting things about the response is that it distinguishes between multinomial and ordinal logistic regression, which highlights the importance of understanding the difference between the two. This can really raise the productivity of users within the natural sciences as well as the social sciences.
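The distinction between the two models can be made concrete with a small sketch. It is written in plain Python only to spell out the arithmetic, and it is not part of the ChatGPT response. It assumes the usual formulations: a multinomial model has one linear predictor per non-baseline category, whereas a proportional-odds ordinal model (the form fitted by polr()) has a single linear predictor and a set of increasing cutpoints. The function names and all numeric values are hypothetical.

```python
import math


def logistic(z):
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def multinomial_probs(linear_predictors):
    """Nominal outcome: one linear predictor per non-baseline category.

    The baseline category's predictor is fixed at 0, and the category
    probabilities are obtained by normalising the exponentiated scores.
    """
    scores = [0.0] + list(linear_predictors)
    weights = [math.exp(s) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def ordinal_probs(eta, cutpoints):
    """Ordered outcome: one linear predictor and increasing cutpoints.

    Uses P(Y <= k) = logistic(cutpoint_k - eta); each category probability
    is the difference between consecutive cumulative probabilities.
    """
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs


# Hypothetical values, chosen only for illustration.
nominal = multinomial_probs([0.4, -0.3])                 # A (baseline), B, C
ordinal = ordinal_probs(eta=0.5, cutpoints=[-1.0, 1.0])  # Low < Medium < High

print([round(p, 3) for p in nominal])
print([round(p, 3) for p in ordinal])
```

In the ordinal case the order of the categories is built into the model through the cutpoints, which is why the level order of the outcome variable matters when fitting such a model in R.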
For those without advanced programming experience, this means they can ask ChatGPT a question that combines programming with their subject discipline and receive an appropriate, well-structured response. This matters for a language such as R, where the right approach is very "case specific" (it depends on the data and the model in question), so a code-completion tool such as GitHub Copilot is of limited help to someone who does not yet know which functions to call. ChatGPT therefore fills an important void. It can also increase productivity significantly: for example, rather than working through a whole textbook such as Applied Econometrics with R by Christian Kleiber and Achim Zeileis, a user can ask directly about the technique they need.
Tutorial for Best Use and Practice
To follow this use case, work through the steps below:
- Register for a ChatGPT account. As of October 23rd 2023, registration for ChatGPT is free, and there is no significant difference in accessibility between the free version and the premium offering known as ChatGPT Plus.
- Enter your question in the area at the bottom of the screen that says "Send a message". One of the amazing things about ChatGPT and other Large Language Models (LLMs) is that you do not need to know how to code: you can write your question as if you were talking to another person. For these instructions, we will use the following query as an example:
How can I do polynomial regression in R?
- Wait for ChatGPT to process the query and provide a response. The response is usually quick, but the time may vary with service and bandwidth availability.
- The response provided by ChatGPT should look something like this:
Polynomial regression in R can be performed using the lm() function, which is used for linear regression, but you can extend it to polynomial regression by adding polynomial terms to your model formula. Here's a step-by-step guide on how to perform polynomial regression in R:
- Load Your Data: First, you need to load your data into R using functions like read.csv(), read.table(), or any other method that suits your data format.
- Create Your Model Formula: Decide the degree of the polynomial you want to fit to your data. For example, if you want to fit a quadratic polynomial, you'll need to include the linear and quadratic terms in your formula. Suppose you have a dependent variable y and an independent variable x. To create a quadratic polynomial regression model, you can use the following formula:
formula <- y ~ x + I(x^2)
The I() function is used to indicate that x^2 should be treated as an independent variable and not as part of the formula itself.
- Fit the Polynomial Regression Model: Use the lm() function to fit the polynomial regression model with your data:
model <- lm(formula, data = your_data_frame)
Replace formula with your chosen formula and your_data_frame with the name of your data frame.
- View Model Summary: You can view the summary of the polynomial regression model to see coefficients, R-squared, and other relevant statistics:
summary(model)
This will provide detailed information about the fitted model, including coefficient estimates, p-values, and more.
- Plot the Polynomial Regression Line: To visualize the polynomial regression fit, you can create a scatter plot of your data points and overlay the regression line:
plot(x, y)
# Sort by x first so the fitted curve is drawn from left to right
ord <- order(x)
lines(x[ord], predict(model)[ord], col = "red")
Replace x and y with your actual data.
- Predict New Values: You can use your polynomial regression model to make predictions for new data points using the predict() function:
new_data <- data.frame(x = c(new_values))
predicted_values <- predict(model, newdata = new_data)
Replace new_values with the values for which you want to make predictions.
That's it! You have now performed polynomial regression in R. Be sure to choose the appropriate degree of the polynomial based on the nature of your data and the relationship you want to model. You can experiment with different degrees (e.g., linear, quadratic, cubic) to see which one best fits your data.
- You can now open R or an environment that runs R (e.g. RStudio), try the suggested code on your own data, and voila! You now have advice on how to use R within the context of your science discipline.
- I have personally found ChatGPT to be a great resource within economics. One example that comes to mind is the use of Amazon SageMaker within Amazon Web Services (AWS). I do not have a lot of experience with either, and by asking ChatGPT where to begin I was able to increase my knowledge of both platforms substantially and to better understand how to integrate them within economics.
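To see what the formula y ~ x + I(x^2) in the quoted response asks R to do, the fit can be sketched by hand. The sketch below is in plain Python, purely to make the arithmetic explicit: it builds the columns 1, x and x^2 and solves the least-squares normal equations. This is an assumption-laden illustration rather than a description of how lm() works internally (lm() uses a QR decomposition), and the helper names and the noise-free data are made up.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def fit_polynomial(xs, ys, degree=2):
    """Least-squares polynomial fit via the normal equations (X'X) b = X'y."""
    # Each row of the design matrix is [1, x, x^2, ...].
    design = [[x ** p for p in range(degree + 1)] for x in xs]
    cols = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(cols)]
           for i in range(cols)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(cols)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free data generated from y = 1 + 2x + 3x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]

coefficients = fit_polynomial(xs, ys)  # intercept, linear term, quadratic term
print(coefficients)
```

Because the data here contain no noise, the recovered coefficients should match the generating values up to floating-point error; with real data they are estimates, and the model summary becomes the place to judge them.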
Impact on Learning
One of the greatest impacts of ChatGPT on learning is that it will help to bridge the divide between mathematics and programming, which educational practitioners currently find challenging to integrate. This is especially important in disciplines such as economics, where many lecturers and academics have little or no programming experience, because it makes it easier to learn how to build models.
Another advantage that ChatGPT offers for learning is that it can greatly improve productivity. Currently, in order to learn a programming language and find out how to integrate it into the discipline in which you are trained (e.g. economics), you often have to read a dedicated book on the subject, much of which will cover material that you might not need to know. ChatGPT, on the other hand, points you in the right direction through a series of steps. For example, I am currently learning how to integrate the language Octave into economics, and with the prompt "How can I perform a regression analysis in Octave?" I get a very helpful series of steps, as shown below:
Performing a regression analysis in Octave is similar to performing one in other programming languages like R or Python. Octave is an open-source alternative to MATLAB and is well-suited for data analysis and statistics. Here are the basic steps to perform a simple linear regression analysis in Octave:
- Load Your Data: First, you'll need to load your data into Octave. You can do this using functions like load, csvread, or by manually defining your data as arrays or matrices.
data = load('your_data_file.txt'); % Load data from a text file
- Define Your Variables: Assuming you have a dependent variable y and one or more independent variables (features) X, define them as variables in Octave.
y = data(:, 1); % Assuming the first column is your dependent variable
X = data(:, 2:end); % Assuming columns 2 and onwards are your independent variables
- Perform Linear Regression: Octave provides functions for linear regression, such as pinv (for finding the pseudo-inverse) or the backslash operator \ to solve the normal equations. You can use either method to fit a linear regression model. Here's an example using the backslash operator:
% Add a column of ones to X for the intercept term
X = [ones(length(y), 1), X];
% Calculate the coefficients using the backslash operator
coefficients = X \ y;
- View Regression Results: You can view the coefficients and other relevant statistics to understand your regression model.
disp('Coefficients:');
disp(coefficients);
- Make Predictions: You can use the calculated coefficients to make predictions on new data points.
new_data = [1, new_x]; % Assuming new_x is a new data point
prediction = new_data * coefficients;
- Plot the Regression Line: You can create a scatter plot of your data and overlay the regression line to visualize the relationship between variables.
scatter(X(:, 2), y); % Assuming X(:, 2) is your independent variable
hold on;
plot(X(:, 2), X * coefficients, 'r', 'LineWidth', 2);
hold off;
This example covers simple linear regression. For more complex regression models (e.g., multiple regression, polynomial regression), you would need to modify the independent variable matrix and coefficients accordingly.
Remember that Octave also has various packages and libraries that can simplify and enhance your regression analysis, such as the statistics package. You may need to install and load these packages based on your specific analysis requirements.
Without ChatGPT, I would have to browse through many books to try to determine how Octave can be integrated in a discipline of my choice.
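One way to sanity-check output like this is to compare it with the textbook closed form. With a single predictor, the intercept and slope of an ordinary least-squares fit can be computed directly from sums of deviations. The sketch below does this in plain Python, chosen only to make the calculation explicit; the function names and the data are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares with one predictor, using the closed form:
    slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(intercept, slope, new_x):
    """Prediction for a new point, the scalar analogue of [1, new_x] * coefficients."""
    return intercept + slope * new_x


# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_regression(xs, ys)
print(intercept, slope)
print(predict(intercept, slope, 6.0))
```

For the same data, the two values should agree with the coefficients vector produced by the backslash operator once the column of ones has been added to X.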
Limitations and Ethical Considerations
ChatGPT is very powerful; there is no doubt about it. However, as with any technology, it has its limitations. For example, ChatGPT currently cannot provide reliable guidance on whether a regression analysis is warranted or conforms to best practice; in other words, choosing and interpreting diagnostic tests such as the Durbin-Watson (DW) test still requires domain knowledge and expertise. Similarly, ChatGPT cannot really provide ethical advice on conducting a particular study, nor on avoiding unethical practices such as "p-hacking" or R-squared manipulation; that guidance is something you will still need to obtain from your institution. Another limitation is that whilst ChatGPT can provide step-by-step instructions for a particular statistical task, it cannot teach you from the ground up. You still need some familiarity with basic statistical principles to capitalise on this use case.
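For readers unfamiliar with the Durbin-Watson test mentioned above, the statistic itself is simple to compute from the residuals of a fitted regression: it is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below uses plain Python and made-up residuals, and the function name is hypothetical. Values lie between 0 and 4, and values near 2 suggest no first-order autocorrelation. Deciding whether the test is appropriate, and what to do about the result, remains a matter of judgement.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum((curr - prev) ** 2
                    for prev, curr in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]          # sign flips at every step
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]  # long runs of the same sign

print(durbin_watson(alternating))
print(durbin_watson(persistent))
```

Residuals that flip sign at every step push the statistic above 2 (negative autocorrelation), while long runs of the same sign pull it below 2 (positive autocorrelation).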
Other Comments
In 2022, I was chosen by NASA as a "Subject Matter Expert" (SME) in Open Software. One of my responsibilities is to explain to those in the natural sciences why open software is useful in their line of work. ChatGPT came to the fore after I had already been selected as an SME, and I believe it will be helpful in my future work with NASA.
Built With
- chatgpt
