In the age of AI, it's easy to fall into the trap of thinking that quantitative and automated methods remove bias. This is far from the truth: AI can amplify the biases of those who design the models. Predictive policing algorithms can be incredibly harmful when they are trained on biased data stemming from discriminatory policing. In response, Equity in Action was created to analyze and identify potential bias in policing using data from the Boston Police Department.
Since May 25, 2021, Boston Police Department (BPD) has adopted a policy aimed at building and strengthening trust with all members of the community. The policy acknowledges that bias can occur at both individual and institutional levels and that biased practices are unfair, ineffective, promote mistrust, and perpetuate negative and harmful stereotypes. As per the department's policy, all individuals who have contact with BPD personnel will be treated fairly, impartially, and without bias, with no consideration of specified characteristics. Officers are prohibited from discriminating based on personal prejudices or partiality towards classes of people based on specified characteristics.
Equity in Action analyzes the data to determine whether bias has been eliminated from policing post-policy implementation. The focus is on neighborhood bias, as demographic factors such as race, ethnicity, gender identity, religion, socioeconomic status, and others often influence the policing district or neighborhood where people live. The aim is to provide evidence-based insights into the effectiveness of BPD's commitment to fair and impartial policing for all members of the community.
What it does
Equity In Action is a comprehensive policing data dashboard that utilizes cutting-edge visualizations, natural language processing, and tree-based models to help users identify and analyze potential bias in policing. The dashboard features three datasets from the BPD, all from 2022, with descriptions of the columns and the first five rows displayed on the home page.
A user-friendly navigation bar allows easy switching between pages, including Correlation Heat Maps, Frequency Plots, Scatterplots, Pie Charts, Bias Buster: Best AI Detective Around, Decision Trees, and Support Vector Machines. Each page provides a unique form of analysis, with a detailed methodology outlined at the top of the page explaining my analytical approach and how the results shown were produced.
How I built it
Prior to deploying the web application, I performed all data cleaning and preprocessing using Pandas. This involved extracting time variables from strings, converting categorical variables into binary variables through one-hot encoding, and grouping data by policing district to obtain summary statistics.
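The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the column names (`occurred_on`, `offense_type`, `district`) and the sample rows are assumptions standing in for the real BPD data layout.

```python
import pandas as pd

# Hypothetical rows mimicking the BPD data layout (column names are assumptions)
df = pd.DataFrame({
    "occurred_on": ["2022-03-01 14:35:00", "2022-03-02 09:10:00"],
    "offense_type": ["Larceny", "Assault"],
    "district": ["B2", "C11"],
})

# Extract time variables from the timestamp strings
df["occurred_on"] = pd.to_datetime(df["occurred_on"])
df["hour"] = df["occurred_on"].dt.hour
df["day_of_week"] = df["occurred_on"].dt.day_name()

# One-hot encode a categorical variable into binary columns
df = pd.get_dummies(df, columns=["offense_type"], prefix="offense")

# Group by policing district to obtain summary statistics
per_district = df.groupby("district")["hour"].agg(["count", "mean"])
print(per_district)
```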
To create dynamic and personalized visualizations, I utilized seaborn, as well as Plotly's graph_objects and express packages. With these tools, I could customize and create interactive visualizations that allowed users to zoom in and pull up new tabs for a better view. Users are also able to choose which variables they would like to visualize, providing a more personalized journey towards uncovering potential bias in policing data.
To develop the tree-based machine learning models, I used the scikit-learn package to split the data into training and test sets. After training and testing the models, I generated predictions and calculated key metrics such as accuracy, precision, recall, and F1 scores to showcase the efficacy of the models to users. In addition, I created bar graph visualizations for each model, Decision Trees and SVM, to highlight the correlation between specific features and the outcome variable, which in this case was policing district. These visualizations emphasize the features associated with policing and how they can reflect biases that officers may hold depending on the neighborhood.
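The train/test/metrics workflow can be sketched as below. Note this uses a synthetic dataset from `make_classification` as a stand-in for the encoded BPD features, and the hyperparameters are illustrative, not those of the deployed models.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Synthetic stand-in for the encoded BPD features (target plays the role of district)
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = DecisionTreeClassifier(max_depth=5, random_state=0)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)

# Key metrics reported on the dashboard
print("accuracy:", accuracy_score(y_test, preds))
print("precision:", precision_score(y_test, preds))
print("recall:", recall_score(y_test, preds))
print("f1:", f1_score(y_test, preds))

# Feature importances feed the bar-graph visualizations
print(clf.feature_importances_)
```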
To enhance the project, I integrated natural language processing to create a chatbot called Bias Buster. This AI Bias Detective aims to identify potential biases in police interactions with civilians. I used OpenAI's API to analyze the notes taken by police officers during these interactions and detect underlying biases. The chatbot has two main features: "Bias Buster Knows All" and "Find the Bias". With "Bias Buster Knows All", users can engage in conversation with the chatbot by asking questions and receiving responses. "Find the Bias" allows users to input a specific case number, which serves as the primary key for a police interaction, and view the corresponding notes along with Bias Buster's analysis of whether bias was present or not.
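A "Find the Bias" request boils down to building chat messages around an officer's notes and sending them to OpenAI's chat completions endpoint. The sketch below only shows the prompt construction; the function name, system prompt wording, and sample case number are all hypothetical, and the commented-out API call would need a configured OpenAI client and key.

```python
# Hypothetical sketch of how "Find the Bias" might assemble its prompt
def build_bias_prompt(case_number: str, notes: str) -> list:
    """Build the chat messages that would be sent to the chat completions API."""
    return [
        {"role": "system",
         "content": "You are Bias Buster, an AI that flags potential bias in police field notes."},
        {"role": "user",
         "content": f"Case {case_number}: analyze these notes for potential bias.\n\n{notes}"},
    ]

messages = build_bias_prompt("2201234", "Stopped individual matching description; appeared nervous.")
# These messages would then be passed to the API, e.g.:
# client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
```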
To develop and launch my web application, I utilized the Streamlit framework. One of Streamlit's unique features is the ability to display balloons on the home page, which adds a fun and creative touch to the dashboard.
Additionally, the dashboard's website includes detailed information about the methodology used at each stage of the project.
Challenges I ran into
The initial challenge I faced was dealing with mostly categorical variables, since most of the packages I explored required numerical or encoded categorical inputs. I therefore first determined the encoding method and then selected the features to include in the model. Initially, I randomized the features or used all of them, but model performance was low, which was discouraging. This led me to consider pivoting away from using machine learning to identify which features most influence crime and policing.
To address this issue, I explored scikit-learn's feature selection utilities, such as SelectPercentile and SelectKBest with the f_classif, mutual_info_classif, and chi2 scoring functions, to determine the best features to use. Despite comprehensive feature selection, model performance remained low. Given more time, I would explore boosting and stacking to improve model outputs. For now, I found a way to demonstrate the models' value through feature importance and visualizations.
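Comparing those scoring functions looks roughly like the sketch below. Again, the data is a synthetic stand-in (chi2 requires non-negative features, hence the `np.abs`), and `k=4` is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif

# Synthetic encoded features standing in for the one-hot BPD columns
X, y = make_classification(n_samples=300, n_features=10, n_informative=3, random_state=0)
X = np.abs(X)  # chi2 requires non-negative feature values

# Score the features three ways and keep the top k under each criterion
for name, score_fn in [("f_classif", f_classif),
                       ("chi2", chi2),
                       ("mutual_info", mutual_info_classif)]:
    selector = SelectKBest(score_func=score_fn, k=4).fit(X, y)
    kept = selector.get_support(indices=True)
    print(name, "keeps features:", kept)
```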
Accomplishments that I'm proud of
I am extremely proud of the level of comprehensiveness achieved in the Equity in Action dashboard, despite working alone. It was challenging to bring everything together, but it eventually came together beautifully, and I am ecstatic for the future of this project. I utilized every variable, column, and data value provided to me, including NULL values (which I am interested in analyzing further for patterns, to see how a lack of documentation may relate to bias in policing).
Initially, my focus was solely on developing a machine learning project to detect bias in a given neighborhood, but I overcame several obstacles to arrive at a much more robust and diverse approach to detecting bias. The inclusion of interactive visualizations, the AI detective Bias Buster, and tree-based models makes this dashboard considerably more advanced than the dashboards currently available on the Boston Police Department and other police department websites.
One of the most empowering features of this product is its ability to educate users on statistical and data analytics tools while providing them with control over how the tools are curated and implemented. As I further develop this project and introduce it to community leaders and the police department, I believe it can have significant implications for the tools we use to identify bias, as well as how the average citizen can become more involved and aware of the policing taking place in their community.
What I learned
Through building this web dashboard, I have gained valuable experience in creating interactive and comprehensive data visualization tools. Specifically, I have become proficient in using Streamlit for my dashboarding needs, and I plan to use this tool for future projects. I have also expanded my knowledge of Support Vector Machines, including their functionality and practical applications in machine learning. Additionally, I have discovered various methods of visualizing data to better communicate insights and tell a story, such as the use of Seaborn and Plotly's graphing packages. Overall, this project has allowed me to grow my skills in data analysis and presentation, and I am excited to continue building on these skills in future projects.
What's next for Equity In Action
Equity in Action has a bright future ahead, with the potential to address systemic issues that still affect marginalized communities. As someone passionate about AI ethics and the future of community-centered policing, and as a marginalized individual who has been affected by over-policing and bias in policing, I see this project as just the first step.
Based on my analysis, temporal features were consistently correlated with policing district. To build on this, I want to implement time series analysis using methods such as animations, ARIMA, and ACF plots to visualize how policing in different neighborhoods changed over the course of 2022.
I also plan to include a user-customizable map of each district that visualizes attributes such as the most common crime, the most common victim gender, and the officer responsible for the most frisks. To look more closely at intersectional identities and how they affect the outcomes of police-civilian interactions, I will implement feature engineering.
To further improve the project, I want to make all the models and visualizations more customizable and advanced. I also plan to incorporate additional natural language processing techniques with the chatbot, such as sentiment analysis for policing notes and how it varies based on civilian characteristics.
I believe that gathering more data from civilians about their opinions and experiences with the police would be invaluable, especially from those who are homeless or incarcerated, as homelessness is a significant issue in Boston.
Overall, I am committed to building something special that will change the lives of those affected by biased policing. If we want to use predictive models to predict crimes, we must first ask ourselves: have we fully committed to creating datasets that won't amplify the biases of individual police officers? Can we detect bias in these datasets, and in policing itself, before we deploy them into models that will have a systemic effect?