Inspiration
We are concerned about the injustice that pervades the criminal conviction process. Many people are wrongly accused of a crime, only to be exonerated much later or, worse, executed. Since 1989, there have been 3,249 exonerations in the USA, amounting to 27,200 cumulative years lost by the individuals wrongly convicted [1].
Some AI crime prediction mechanisms wrongfully accuse and convict innocent people because of biases in the system. For example, in a 2020 case, a man was convicted due to a faulty facial recognition algorithm and had to serve jail time [2]. This could have been avoided if additional layers of checks had been applied to determine the probability of conviction.
Conventionally, AI models predict an individual's likelihood of being a convict using a model trained only on a dataset of convicts. This is inherently biased: the model learns from a narrow dataset, so a greater number of people will be wrongfully predicted to be convicts. The features/variables in a convicts-only dataset show only which characteristics convicts have; they do not show which characteristics mark an individual as innocent. For example, if a dataset of convicts suggests that feature A identifies a convict, then an innocent individual with that feature who is run through the model will be wrongfully convicted. Such a model is unethical, as it embodies bias and tends to lead to wrongful convictions because of the skewed dataset.
Therefore, our model will use the combined datasets of convicts and wrongful convicts to better determine which features/variables an individual should and should not have in order to be convicted. This reduces the bias of the previous model, because the set of features/variables used to identify a convict is obtained from a balanced dataset. For example, feature A from the previous model may also appear in the dataset of wrongful convicts, in which case its importance in predicting that an individual is a convict is reduced. It is therefore more ethical to train the model on a balanced dataset that considers both actual convicts and wrongful convicts, as this helps to produce predictions that are fairer for all.
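The feature A argument can be illustrated with a minimal sketch. The records below are made-up toy data, and `feature_rates` and `rate_gap` are hypothetical helper names, not part of our model. The idea is only that a feature appearing equally often among convicts and wrongful convicts carries no signal, which cannot be seen from a convicts-only dataset.

```python
from collections import Counter

# Hypothetical toy records: (set of features, label).
# Label 1 = correctly convicted, 0 = wrongfully convicted (later exonerated).
records = [
    ({"A", "C"}, 1),
    ({"A"}, 1),
    ({"C"}, 1),
    ({"A", "B"}, 0),
    ({"A"}, 0),
    ({"B"}, 0),
]

def feature_rates(records, label):
    """Fraction of records with the given label that exhibit each feature."""
    group = [features for features, y in records if y == label]
    counts = Counter(f for features in group for f in features)
    return {f: counts[f] / len(group) for f in counts}

convict_rates = feature_rates(records, 1)
exoneree_rates = feature_rates(records, 0)

def rate_gap(feature):
    """Difference in prevalence between convicts and wrongful convicts."""
    return convict_rates.get(feature, 0.0) - exoneree_rates.get(feature, 0.0)

for feature in sorted({"A", "B", "C"}):
    print(feature, round(rate_gap(feature), 2))
```

In this toy data, feature A is common among convicts, so a convicts-only view would treat it as a strong indicator. Its gap between the two groups is zero, so a model trained on the combined data has no reason to rely on it.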
What it does
In our report, we consider only individuals who have been arrested. While the investigative process prior to an arrest has many inherent biases, we feel that it would be more impactful to focus on the last mile, which is the conviction process. Once someone is arrested, we would like to reduce the chance that they are wrongfully convicted and have to go through the stress and suffering of a sentence.
Here, we define a wrongfully convicted individual as one who was arrested and convicted but exonerated afterwards. Similarly, a correctly convicted individual is defined as one who was arrested and did commit the crime. We acknowledge that it is a sweeping statement to say that every individual who is not exonerated afterwards did indeed commit a crime. However, without further analysis of the investigative process and a detailed understanding of each case, we can only assume that the professional judgement of the judiciary is correct and thus that these individuals were correctly convicted.
With that, we hope to create an AI that helps to detect whether an arrested individual is correctly convicted, using data from past correctly convicted and wrongfully convicted individuals, so that even if someone is wrongfully arrested, they will not be wrongfully convicted.
Additionally, we would like to understand the causes of wrongful convictions.
How we plan to build it
The machine learning model is written in Python. Data preparation is performed using the pandas library. The appropriate machine learning algorithm will be chosen by the PyCaret library, which will also run the training process.
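A minimal sketch of this pipeline is shown below, under stated assumptions. The column name `wrongfully_convicted`, the helper names `build_training_frame` and `choose_model`, and the cleaning steps are illustrative, not our exact code. The PyCaret calls (`setup` and `compare_models` from `pycaret.classification`) are kept inside a function so that the data preparation can be run without PyCaret installed.

```python
import pandas as pd

TARGET = "wrongfully_convicted"  # assumed name for the label column

def build_training_frame(convicted: pd.DataFrame,
                         exonerated: pd.DataFrame) -> pd.DataFrame:
    """Stack both groups into one frame and add a binary target column."""
    convicted = convicted.assign(**{TARGET: 0})
    exonerated = exonerated.assign(**{TARGET: 1})
    combined = pd.concat([convicted, exonerated], ignore_index=True)
    # Drop exact duplicate rows and rows with no feature values at all.
    feature_cols = [c for c in combined.columns if c != TARGET]
    combined = combined.drop_duplicates().dropna(subset=feature_cols, how="all")
    return combined.reset_index(drop=True)

def choose_model(training_frame: pd.DataFrame):
    """Let PyCaret compare candidate classifiers and return the best one."""
    # Imported here so the preparation step above does not require PyCaret.
    from pycaret.classification import setup, compare_models
    setup(data=training_frame, target=TARGET, session_id=42)
    return compare_models()
```

A call such as `choose_model(build_training_frame(convicted_df, exonerated_df))` would then return the classifier that PyCaret ranks highest on its default metric, where `convicted_df` and `exonerated_df` stand for the two source datasets.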
Challenges we ran into
Difficulty in choosing the correct family of variables. Many variables can affect an individual's probability of being convicted, so we had to choose the set of variables that produces a model best able to predict whether the individual should be convicted.
The dataset has many variables, which makes it difficult to produce models. Initially, the dataset we obtained had too many variables, which would have made generating the model inefficient. Hence, we had to reduce the number of variables fed into the algorithm in order to generate the best model efficiently.
Creating a coherent and ethical solution.
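One simple way to cut down the number of variables can be sketched as follows. This is an illustrative assumption rather than the exact procedure we followed: the helper `reduce_variables`, the 50% missing-value threshold, and the column names in the usage note are all hypothetical. It drops columns that are mostly empty and columns that take only a single value, since neither can help a classifier.

```python
import pandas as pd

def reduce_variables(df: pd.DataFrame, target: str,
                     max_missing: float = 0.5) -> pd.DataFrame:
    """Keep only feature columns that are reasonably complete and non-constant."""
    features = df.drop(columns=[target])
    # Drop columns where the share of missing values exceeds the threshold.
    missing_share = features.isna().mean()
    features = features[missing_share[missing_share <= max_missing].index]
    # Drop columns that take a single value, as they carry no information.
    varying = [c for c in features.columns
               if features[c].nunique(dropna=True) > 1]
    return df[varying + [target]]
```

For example, given a frame with columns `a`, `const`, `sparse` and a label `y`, where `const` is always 1 and `sparse` is 75% missing, `reduce_variables(df, "y")` keeps only `a` and `y`.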
Accomplishments that we're proud of
Our model is able to make predictions with high accuracy.
What we learned
There are numerous ways to improve the fairness of an AI model aside from adjusting the proportions of the dataset. Let A be a feature. Most models can only determine whether a particular input exhibits feature A. Often these models treat the absence of feature A as equivalent to the presence of another feature B. This conclusion is erroneous, because an input that lacks A does not necessarily exhibit B.
What's next for Crime Watch
Since Crime Watch is able to accurately determine whether an individual is correctly convicted (given sufficient data), we intend to run the model on all prisoners. For prisoners whom the model finds to be wrongly convicted, additional investigations can be conducted to ascertain whether the prisoner should truly be exonerated.
We are confident that our model will improve the accuracy of the justice system and in doing so, free up more resources that can be used for other meaningful purposes.