Title:

Multimodal Lie Detection | No Cap

Introduction: What problem are you trying to solve and why?

Lie detection is a task we perform in our daily lives, and it is also a powerful tool in law enforcement and various other fields. In many situations where lie detection is crucial, we need to detect lies in natural language and determine the validity of a statement retroactively. Current leading systems for lie detection require physical, real-time monitoring and have been shown to work well only on answers to specifically worded questions. Many of these systems also require human administration, which is prone to human error and bias. Our multimodal system explores one possible approach to lie detection that can be applied retroactively to natural language without human administration.

This is a supervised learning problem for classification and an unsupervised learning problem for feature extraction.

Related Work: Are you aware of any, or is there any prior work that you drew on to do your project?

Deep Neural Networks for Lie Detection with Attention on Bio-signals | IEEE Conference Publication

The state-of-the-art method for lie detection is the polygraph, but automatic detection is gaining momentum. This paper outlines a stacked model for lie detection that focuses on facial landmark detection and voice pitch frequency analysis. The implementation uses the dlib library in Python to detect facial features and matches them to feature mappings from a training dataset of faces. For the audio frequency analysis, an inverse Fourier transform is performed on the audio signal with an adjusted spectrum for standard deviation. Since the signal is one-dimensional, it can be matched to other signals using a one-dimensional CNN. The final detection strings these two methods together into a stacked learning approach that combines the results to determine whether the subject is telling the truth.

Helpful resources for understanding the data we are working with:

Deception Detection using Real-life Trial Data

K-Means Clustering

Data: What data are you using (if any)?

https://lit.eecs.umich.edu/downloads.html#Real-life%20 Deception This is a dataset that was constructed for a deception detection paper at the University of Michigan. It contains 121 video clips with audio from real trials. Each video is labeled either ‘Deceptive’ or ‘Truthful’, according to whether the statements made in the clip are false or true.

Preprocessing:

Video: sampling image frames and detecting facial features with the dlib Python library

Audio: separating it from the video, possibly denoising, and isolating the subject’s responses
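
The frame-sampling step above could be sketched as follows. This is a minimal illustration only: the function name `sample_frame_indices` and the choice of evenly spaced frames are assumptions, not the project's actual sampling strategy. Decoding the frames and running dlib's facial feature detection on them are not shown.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return up to num_samples frame indices spread evenly over a clip.

    Hypothetical helper: even spacing is an assumed sampling strategy.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Short clip: keep every frame.
        return list(range(total_frames))
    # Split the clip into num_samples equal segments and take the middle
    # frame of each one (integer arithmetic avoids rounding surprises).
    return [
        ((2 * i + 1) * total_frames) // (2 * num_samples)
        for i in range(num_samples)
    ]


if __name__ == "__main__":
    # A 300-frame clip sampled 5 times.
    print(sample_frame_indices(300, 5))  # [30, 90, 150, 210, 270]
```

Taking the middle of each segment avoids always sampling the first frame of a clip, which may show the subject before they begin speaking.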

Methodology: What is the architecture of your model?

Extract visual features for deception / truth (k-means clustering)

Backup: Consider exploring different distance metrics

Extract audio features for deception / truth (k-means clustering)

Backup: Consider exploring different distance metrics

Backup: Text analysis of transcript, finding text features to use

Linear regression for classification on data pre-processed and reshaped to have n visual features and m audio features.

Backup: Consider Logistic Regression or explore other classification algorithms

Explore different loss metrics and optimizers

Testing, Analysis, and Tuning parameters
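
The feature-extraction steps above could be sketched roughly as follows. The proposal does not say how the k-means clusters become features, so this sketch assumes one common option: describe each sample by its Euclidean distance to every centroid, then concatenate the n visual and m audio distances into one vector for the classifier. All function names, the number of clusters, and the distance metric are illustrative assumptions.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns a list of k centroids."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k), key=lambda c: squared_distance(p, centroids[c])
            )
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [
                    sum(dim) / len(members) for dim in zip(*members)
                ]
    return centroids


def centroid_distance_features(sample, centroids):
    """Assumed feature map: distance from the sample to each centroid."""
    return [math.sqrt(squared_distance(sample, c)) for c in centroids]


def fuse(visual_sample, audio_sample, visual_centroids, audio_centroids):
    """Concatenate n visual and m audio cluster-distance features."""
    return (
        centroid_distance_features(visual_sample, visual_centroids)
        + centroid_distance_features(audio_sample, audio_centroids)
    )
```

Swapping `squared_distance` for another metric is one way to explore the backup options listed above. The fused vector is what a linear or logistic regression classifier would take as input.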

Metrics: What constitutes “success?”

Testing: We plan on having a subset of validation data to measure our model’s performance. We also plan on testing on some portion of external data (videos from other trials or datasets) to gauge real world performance.
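
One way to hold out the validation subset is a stratified split, which keeps the ratio of ‘Deceptive’ to ‘Truthful’ clips the same in both parts. The sketch below is an assumption about how the split might be done: the function name, the 20% validation fraction, and the toy label counts are illustrative and are not taken from the dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, validation_fraction=0.2, seed=0):
    """Return (train_indices, validation_indices), preserving class ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, validation = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Hold out the same fraction of every class.
        n_val = round(len(indices) * validation_fraction)
        validation.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(validation)


if __name__ == "__main__":
    # Toy labels only; the real class counts are not assumed here.
    toy_labels = ["Deceptive"] * 10 + ["Truthful"] * 10
    train_idx, val_idx = stratified_split(toy_labels)
    print(len(train_idx), len(val_idx))  # 16 4
```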

We plan on measuring accuracy as the percentage of validation data properly classified. In addition to classification accuracy, we will look at F1-score and the confidence interval of our predictions (e.g., “we’re 40% confident this is a lie”).
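
The metrics above can be computed as in the sketch below. It assumes the model outputs, for each clip, a probability that the statement is deceptive, and reads the per-prediction confidence as that probability. The 0.5 threshold, the function names, and the toy numbers are illustrative assumptions.

```python
def classify(probabilities, threshold=0.5):
    """Turn predicted probabilities of 'Deceptive' into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(y_true, y_pred):
    """Fraction of clips whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy values: 1 = Deceptive, 0 = Truthful.
    y_true = [1, 1, 0, 0, 1]
    probabilities = [0.9, 0.4, 0.2, 0.6, 0.7]
    y_pred = classify(probabilities)
    print(accuracy(y_true, y_pred))  # 0.6
    print(f1_score(y_true, y_pred))
```

In the toy example, the second clip has a predicted probability of 0.4, which corresponds to the "40% confident this is a lie" reading. At a 0.5 threshold it is classified as truthful.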

What are your base, target, and stretch goals?

Base Goal: > 50% validation accuracy

Target Goal: ~60% validation accuracy

Stretch Goal: > 65% validation accuracy

Ethics:

Lying is a problem for society because it makes trusting others more challenging, particularly in cases where knowing the truth is important. It is also tied to the spread of misinformation, which can have a major impact on society because people may make choices based on inaccurate information (such as vaccine misinformation). Deep learning may be a good approach to this problem because it may more easily capture microexpressions that humans miss, as well as features such as subtle changes in tone in audio data.

One major issue with this setup, however, is the data. The available dataset is fairly small, which may make the model less generalizable. Furthermore, signs of lying are not necessarily consistent across different people, so relying too heavily on specific traits may lead to overfitting. This appears to be borne out by the other models we have observed, meaning we should be prepared for low accuracy, at least on our validation set. Depending on how high-impact the situations the model is applied to are, this could be problematic. In the case of false negatives, we may not catch misinformation in time to correct it, allowing it to spread quickly. In the case of false positives, we may assume someone is lying when they are not. This could have consequences in criminal trials, particularly if other biases are present. Thus, concerns about overfitting and inaccuracy must be stated clearly in any future application. These worries have also motivated our use of confidence interval measures, both to make our concerns clear and to suggest that a holistic judgment of the situation is needed (i.e., the model should not be the only data point).

Division of labor: Briefly outline who will be responsible for which part(s) of the project.

Preprocessing - Jadey

Visual Feature Extraction - Whitney

Audio Feature Extraction - Julianne

Linear Regression - Henry

Testing / Tuning parameters - Everyone

Written work - Everyone (TBD)

Check in 2

Intro: Lie detection is a task we perform in our daily lives, and it is also a powerful tool in law enforcement and various other fields. In many situations where lie detection is crucial, we need to detect lies in natural language and determine the validity of a statement retroactively. Current leading systems for lie detection require physical, real-time monitoring and have been shown to work well only on answers to specifically worded questions. Many of these systems also require human administration, which is prone to human error and bias. Our multimodal system explores one possible approach to lie detection that can be applied retroactively to natural language without human administration.

Challenges: One challenge we encountered was becoming familiar with several new libraries used in combination and determining the ideal project structure. While ensuring that the different features were labeled, we used the dlib package to extract facial features. This package was new to us, so applying it properly involved a learning curve. The most challenging part of creating the model structure was making it general enough to adapt to the varying needs and data types we are incorporating in our project.

Insights: Currently we have functioning code for data preprocessing, audio feature extraction, video/image feature extraction, and a model structure. These have been implemented separately according to code contracts we developed. These components have been individually tested during development. We are now working to put these pieces together to create a complete model.

Plan: We are currently on track with our project, as we plan on spending more time in the next day or so combining the different elements of our model. The biggest obstacle remaining for us is integrating the various components that individual members of the group have written into one working pipeline that we can then use to run experiments.

Going Forward: A basic dlib-inspired model was used to ensure we have data to extract, meaning this step is complete enough for us to run our model. However, the dlib model was trained on a prelabeled dataset, which may not be representative of our data. Labeling a subset of the data collected specifically for our model may therefore be worth exploring. In the future, we would like to try other activation functions, such as ReLU and sigmoid, to determine which is best for interpreting the outputs of our feature extraction. We will also dedicate time to tuning hyperparameters to optimize model performance.
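
For reference, the two activation functions mentioned above behave quite differently, as the minimal sketch below shows. It is a plain-Python illustration, not the project's model code, and the function names are chosen here for clarity.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


if __name__ == "__main__":
    for value in (-2.0, 0.0, 3.0):
        print(value, relu(value), sigmoid(value))
```

Sigmoid squashes any input into the range 0 to 1, so its output can be read as a probability of deception. ReLU is unbounded above, which typically makes it better suited to hidden layers than to a final probability output.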

Final Report: https://docs.google.com/document/d/1OhZVjsFf1G-J7kJ9PMI3r4wpL3xvoo6DLd6H1AsZ2dE/edit?usp=sharing

Final Poster: https://docs.google.com/presentation/d/1z5qU39i77QAZ3ysL6lkzSqtejqah_ZhtU9FgQhQwPpo/edit?usp=share_link

Final Video: https://drive.google.com/file/d/1SYrvxPK-3mAkqpYS29ea_uQ6daNK_v-w/view?usp=sharin