Final Report
Due to Devpost Markdown restrictions, the final report has been submitted to our GitHub repository.
Check-in 3
Introduction
Many of us graduate students spend countless hours preparing application materials for well-known fellowships in order to receive the funding necessary to complete our research. The National Science Foundation's Graduate Research Fellowship Program (NSF GRFP) is arguably the most well-known graduate fellowship available to citizens of the United States, but as a result it is highly competitive. Fellowships such as the NSF GRFP are also known for being extremely hard to predict: many applications are rejected even when they appear qualified by many standards on paper. We are seeking to better understand the features of a winning NSF GRFP essay and how applicable those features are to other, similar graduate research fellowship applications. The fact that none of us can escape the issue of funding in graduate school (even at Brown) initially motivated us to identify the features of a winning NSF GRFP essay, and the accessibility of winning NSF GRFP essays on the internet turned this ambition into our class project. The problem we tackle is an unsupervised learning problem, since our data will not have labels (winning vs. losing essay); instead, we seek to learn the features of the input data.
Challenges
The dataset we intended to retrieve originally comprised 288 entries. However, some of the files attached to these entries turned out to be corrupted. We addressed this through hyperlink manipulation: for Google Drive documents, we passed the document's Google ID to the default Google export URL, and for various search engines we sent direct hyperlink requests with appropriate headers. Optimizing the evaluation metric was expected to be a challenge for this project; we have begun developing our approach, but we are still working on the code for this part.
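The link-repair step described above can be sketched roughly as follows. The URL pattern and header below are common conventions for Google Drive direct downloads, not code copied from our scraper, and `FILE_ID_123` is a placeholder:

```python
# Hedged sketch of the link-repair step: given a Google Drive file ID,
# build a direct export/download URL instead of the (possibly broken)
# share link originally attached to the entry.
def drive_export_url(file_id: str) -> str:
    return f"https://drive.google.com/uc?export=download&id={file_id}"

# Some hosts reject bare requests, so a browser-like User-Agent
# header is sent along with each direct hyperlink request.
HEADERS = {"User-Agent": "Mozilla/5.0"}

url = drive_export_url("FILE_ID_123")
```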
Insights
It is important to note that this project is novel, both in its focus on NSF GRFP applications and in its use of outlier detection as a method; we have not found other sources applying such a method to this subject. Therefore, producing the first performance metrics represents a significant milestone, especially since the initial goal of the project is not to outperform existing examples. Concrete results and comparable metrics will be produced next week. Our expectations for the preprocessing phase have been met so far: training and test datasets have been created, and model development has begun on these datasets.
Plan
We are largely on track with our project plan. We had expected retrieving the data to make it difficult to complete the preprocessing phase by this point, but we have now prepared the training and test datasets. Encoding is complete, and we are actively working on the model. These achievements indicate that we are aligned with our target goals, and we may even reach our stretch goal if we do not face many unexpected challenges. In the next steps, we need to devote more time to optimizing our evaluation metric and to further developing the model.
Can you write a winner? Feature identification and outlier detection of winning NSF GRFP essays
Aley Abdel-Ghaffar, Hannah Snell, Shevaughn Holness, Yu Zhu, Zeynep Kilinc
Introduction
Many of us graduate students spend countless hours preparing application materials for well-known fellowships in order to receive the funding necessary to complete our research. The National Science Foundation's Graduate Research Fellowship Program (NSF GRFP) is arguably the most well-known graduate fellowship available to citizens of the United States, but as a result it is highly competitive. Fellowships such as the NSF GRFP are also known for being extremely hard to predict: many applications are rejected even when they appear qualified by many standards on paper. We are seeking to better understand the features of a winning NSF GRFP essay and how applicable those features are to other, similar graduate research fellowship applications. The fact that none of us can escape the issue of funding in graduate school (even at Brown) initially motivated us to identify the features of a winning NSF GRFP essay, and the accessibility of winning NSF GRFP essays on the internet turned this ambition into our class project. The problem we tackle is an unsupervised learning problem, since our data will not have labels (winning vs. losing essay); instead, we seek to learn the features of the input data.
Related Work
Many deep learning models for automated essay grading and evaluation have been built before (https://link.springer.com/article/10.1007/s41237-021-00142-y), but not specifically for fellowship essays. This is less a classification project than an exploratory effort to identify the key factors that make a "winning" NSF GRFP essay. Compared with other related work, the key obstacle we are encountering is data imbalance: in the dataset we found, there are far more funded data points than unfunded ones. So, while this sounds like a relatively simple classification problem, we need a model that can handle such an imbalance. One paper we plan to re-implement is "Anomaly Detection Using One-Class Neural Networks." Its main purpose is to develop a deep learning based outlier detection model; that is, it identifies data points that look "abnormal" (outliers) based on latent representations from an autoencoder. It suits our problem because we can treat the funded data points as the "normal" class and the unfunded data points as outliers. The novelty of the paper is that it trains the feature representation (i.e., the encoder) with a modified outlier detection objective, whereas previous methods first obtain a feature representation from an autoencoder and then feed it into a separate outlier detection metric. We will adapt their architecture, but make the encoder and decoder suitable for text. "Text Classification via Large Language Models" is another approach we plan to try for this imbalanced dataset. Since grant proposals are essentially text, a large language model should also suit our setting.
Specifically, the Clue and Reasoning Prompting (CARP) method developed in that paper performs text classification while returning not only the classification result, but also the keywords it extracts from the text (the "clues") and the reasons behind its output (the "reasoning"). Moreover, CARP performs well under zero- and few-shot conditions, which makes it feasible for our dataset given the limited number of unfunded data points.
- https://arxiv.org/pdf/1802.06360.pdf
- https://github.com/raghavchalapathy/oc-nn
- https://arxiv.org/pdf/2305.08377.pdf
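To make the clue/reasoning/label structure concrete, a CARP-style prompt for our task might be assembled along these lines. The template wording and label names below are our own placeholders, not text taken from the CARP paper:

```python
# A hedged sketch of a CARP-style prompt: the model is asked for clues,
# then step-by-step reasoning, then a final label. Template wording and
# the FUNDED/UNFUNDED labels are illustrative placeholders.
CARP_TEMPLATE = (
    "Essay excerpt:\n{essay}\n\n"
    "Step 1: List the clues (keywords, phrases, tones) relevant to funding success.\n"
    "Step 2: Reason step by step about what the clues imply.\n"
    "Step 3: Output the final label: FUNDED or UNFUNDED."
)

def build_carp_prompt(essay: str, max_chars: int = 2000) -> str:
    """Crop long essays so the prompt fits the model's context window."""
    return CARP_TEMPLATE.format(essay=essay[:max_chars])

prompt = build_carp_prompt("My research investigates ...")
```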
Data
In this project, NSF GRFP application data are used to analyze features associated with acceptance. The data were obtained from a public database (https://docs.google.com/spreadsheets/d/1xoezGhbtcpg3BvNdag2F5dTQM-Xl2EELUgAfG1eUg0s/edit#gid=0) and include each applicant's proposal, seniority, personal statement, and area of interest. A total of 287 applicant records are included in the dataset. Light preprocessing is required to prepare the applicants' proposals and personal statements for use in the model.
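As a minimal sketch of the kind of cleaning step we have in mind for the essays (the function name and exact rules are illustrative, not our final pipeline):

```python
import re

def clean_essay(text: str) -> str:
    """Normalize whitespace and non-breaking spaces before
    tokenization (a minimal placeholder for our preprocessing)."""
    text = text.replace("\u00a0", " ")   # non-breaking spaces from web copies
    text = re.sub(r"\s+", " ", text)     # collapse runs of whitespace/newlines
    return text.strip()

cleaned = clean_essay("  Broader   impacts:\n my work will\u00a0...  ")
```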
Methodology
We plan to implement two model architectures. (1) We will use a One-Class Neural Network (OC-NN) based on the following paper (https://arxiv.org/pdf/1802.06360.pdf). While we will adapt most of the architecture and objective function from that paper, we will replace the 2D CNNs in the encoder and decoder blocks (designed for images) with 1D CNNs, along with an additional layer for embedding transformations from the tokens. (2) We also plan to use a language-model-based approach, Clue and Reasoning Prompting (CARP), based on this paper (https://arxiv.org/pdf/2305.08377.pdf), which is a few-shot prompting method that works well in low-resource settings. We will follow their pipeline, but will likely preprocess our texts to make them feasible for the model (for example, cropping the text or summarizing it with the assistance of a language model so that enough prompt-answer pairs fit in context).
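For context, the one-class objective from the OC-NN paper can be sketched in NumPy as below: regularizers on the network weights, a hinge penalty on scores that fall below the margin r, and a -r term that pushes the margin upward. The single ReLU hidden layer, the ν value, and the variable names here are illustrative, not our final text model:

```python
import numpy as np

def ocnn_loss(X, V, w, r, nu=0.1):
    """Sketch of the OC-NN objective:
    0.5||w||^2 + 0.5||V||^2 + (1/nu) * mean(max(0, r - <w, g(Vx)>)) - r."""
    hidden = np.maximum(0.0, X @ V)           # ReLU hidden layer g(Vx)
    scores = hidden @ w                       # per-example score <w, g(Vx)>
    hinge = np.maximum(0.0, r - scores).mean()
    reg = 0.5 * np.sum(w ** 2) + 0.5 * np.sum(V ** 2)
    return reg + hinge / nu - r

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))                  # 32 encoded essays, 8 features
V = rng.normal(size=(8, 4)) * 0.1
w = rng.normal(size=4) * 0.1
loss = ocnn_loss(X, V, w, r=0.1)
```

At test time, examples whose score falls below the learned r would be flagged as outliers, which is how we would treat candidate "unfunded" essays.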
Metrics
"Success" can be measured by the accuracy of the classification results. Specifically, the model has two major outputs: whether the input is an outlier (categorical) and its distance from the cluster space (numerical). To train the model, we will need to add NSF GRFP winners and honorable mentions. A successful model will be one where the distances for the honorable mentions differ significantly from the distances of the winners. Additionally, essays from other graduate fellowship applications (e.g., the Ford Foundation fellowship) should land farther from the cluster than the honorable mentions. We plan to experiment with interpretability metrics to evaluate the model's success further. On top of that, for the stretch goal, we aim to simulate negative cases to assess the model's performance with more rejected examples. For this project, the notion of "accuracy" does not apply in the same way we have been thinking about it in class: there is no exact percentage that quantifies how well the model works; it is more a measure of whether we are seeing what we expect to see. The paper our approach builds on is "Anomaly Detection Using One-Class Neural Networks" by Chalapathy, Menon, and Chawla, whose purpose was to create a deep learning one-class outlier detection model that trains the neural network with the explicit objective of outlier detection.
The reason we are doing this is that we were originally interested in building a deep learning model to predict whether an NSF GRFP draft would be funded. The problem was that we had access to many successful NSF GRFP examples but few unsuccessful ones, since people rarely post their failures. Thus, outlier detection, which only requires the normal group and some anomalies, became our focus. The innovation on top of that is to investigate which features of the input data the model found most important in forming the cluster (model interpretability), and to investigate how close accepted NSF GRFP essays are to essays from other competitive graduate-level fellowship programs.
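The distance-based comparison described above can be sketched as follows; the Euclidean distance, the synthetic embeddings, and the cluster center here are illustrative placeholders for whatever latent space the trained model produces:

```python
import numpy as np

def distances_to_center(embeddings, center):
    """Euclidean distance of each essay embedding to the learned
    cluster center (placeholder for the model's distance output)."""
    return np.linalg.norm(embeddings - center, axis=1)

rng = np.random.default_rng(1)
center = np.zeros(4)
winners = rng.normal(0.0, 0.2, size=(20, 4))    # synthetic: near the center
honorable = rng.normal(0.8, 0.2, size=(10, 4))  # synthetic: farther out
d_win = distances_to_center(winners, center).mean()
d_hm = distances_to_center(honorable, center).mean()
```

A model would pass this check if, as above, the mean distance for winners is clearly smaller than for honorable mentions, with other fellowships' essays farther still.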
| Goal type | Description |
|---|---|
| Base goal | The base goal of this project is to be able to identify the features of a selected NSF GRFP grant through detecting the outliers in a group of funded and unfunded applications. |
| Target Goal | We aim to outperform regular machine learning algorithms to assess whether a grant application will be accepted by the NSF. |
| Stretch Goal | We intend to simulate data from unfunded project applications. With the assessment of the quality of this data, our goal is to use the generated data to evaluate the success of our model even further. |
Ethics
This investigatory approach to examining applications allows us to interpret what resources are valued during the review process. Given the hefty weight of funding and the pressure to receive external support, we can see the effect of undergraduate or graduate institutions and research opportunities, as well as writing resources. These opportunities are not innately equitable, and our analysis will provide insight into who gets heard and whose ideas are invested in.
The amount of text data in this specific area is vast. Each year tens of thousands of applicants will submit their essays, and close to a thousand will be admitted. There is a general outline or framework for coaching students who are writing this essay, but a deep learning approach allows quantifiable analysis of qualitative data. It will also allow for much faster analysis of large data that can be useful for future applicants.
| Student | Labor |
|---|---|
| Aley Abdel-Ghaffar | Data preprocessing, Model testing, paper writeup |
| Hannah Snell | Data preprocessing, Model building, paper writeup |
| Shevaughn Holness | Model building, accuracy metrics + interpretability, paper writeup |
| Yu Zhu | Data scraping, Model testing , paper writeup |
| Zeynep Kilinc | Data scraping, accuracy metrics + interpretability, paper writeup |
Built With
- deep-learning
- machine-learning
- python
- tensorflow