Inspiration

Both members of our team have been fortunate enough to work in management over the last few years. One of the most stressful times of year for supervisors and their direct reports is annual performance review time. Even those rare managers who have been diligent note-takers need to find their notes, compile them and communicate with the users. The feedback loop is too slow and too much information is lost for meaningful improvements to be made, even when everyone commits that "it will be different this year".

Our project explores an approach to performance appraisal that is participative, transparent and almost instantaneous, all without having to fill out annoying forms or send emails back and forth.

What We Built

Generating Synthetic Report Data

As we didn't have access to real performance appraisal material, we elected to generate some using ChatGPT. We chose the Nuclear Power Plant from 'The Simpsons' as the setting to give the model extra context to draw on. The model was given variants of the following prompt to generate lists in the language of performance appraisal notes:

Pretend you are Mr Burns from the TV show 'The Simpsons'. In 2021 you observed Homer Simpson's work and kept a log of his performance. What are 10 entries from that log? Each entry should be of the form:

On <date> @name was observed to perform to a <poor / satisfactory / good / very good / excellent> standard. This was evidenced by  <summary of action>. Their actions show <list attributes here that match the action, each prepended with a #>

For each summary of action in the log, fabricate some specific examples, using other characters or locations from around the nuclear power plant in the show 'the simpsons'

Small corrections, modifications and specific event prompts were made to generate approximately 210 observations across 10 Nuclear Power Plant Employees.

See Chat-GPT for more

Parsing

Structured data is more useful than unstructured data. It is typically also more laborious to produce, which makes it a deterrent to on-the-fly note-taking. Our approach considered user ergonomics, on the understanding that anything we can do to reduce the difficulty of use will increase the likelihood of adoption. People already write in very informally structured language that uses symbols like '@' for entities and '#' for themes, so we designed our approach to embrace that habit rather than try to pull users towards a less natural one.

Our parsing approach aims to maximize the amount of information extracted from the logs themselves, as well as expose managers and employees to meta aspects like the sentiment of language used by individuals, changes in performance over time and themes that might cut across entire sectors of the workforce. Bringing together this information without drowning anyone in it is a key step towards an improved performance appraisal system.
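A minimal sketch of how notes in this style could be turned into structured records is shown below. It is not our actual parser: the `Observation` fields, the regular expressions and the ISO date format (`YYYY-MM-DD`) are all assumptions made for illustration, and fields that a note does not mention are simply left empty.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional

# Hypothetical patterns; the real notes may use other conventions.
MENTION_RE = re.compile(r"@(\w+)")                   # '@' marks an entity
THEME_RE = re.compile(r"#(\w+)")                     # '#' marks a theme
DATE_RE = re.compile(r"\bOn (\d{4}-\d{2}-\d{2})\b")  # assumes ISO dates
# "very good" is listed before "good" so the longer rating wins.
RATING_RE = re.compile(
    r"\b(very good|excellent|satisfactory|good|poor) standard\b",
    re.IGNORECASE,
)


@dataclass
class Observation:
    text: str
    employees: List[str]
    themes: List[str]
    rating: Optional[str] = None
    observed_on: Optional[date] = None


def parse_note(text: str) -> Observation:
    """Extract whatever structure the note offers; missing parts stay None."""
    date_match = DATE_RE.search(text)
    rating_match = RATING_RE.search(text)
    return Observation(
        text=text,
        employees=MENTION_RE.findall(text),
        themes=[theme.lower() for theme in THEME_RE.findall(text)],
        rating=rating_match.group(1).lower() if rating_match else None,
        observed_on=(
            datetime.strptime(date_match.group(1), "%Y-%m-%d").date()
            if date_match
            else None
        ),
    )


formal = parse_note(
    "On 2021-03-14 @homer was observed to perform to a very good standard. "
    "This was evidenced by averting a meltdown. "
    "Their actions show #Initiative #safety"
)
casual = parse_note("@lenny covered the late shift again #teamwork")
```

Because every field is optional, a quick message with only an '@' and a '#' still produces a usable record, which keeps the barrier to note-taking low.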

Report Summarization

Text summarization comes in two forms: Extractive and Abstractive. Extractive summarization involves selecting a few key sentences or phrases from a long text to produce a shorter summary, while leaving the rest of the information out. Abstractive summarization involves synthesizing a summary similar to an abstract, constructed of different sentences or phrases than appear in the long text being summarized. This method often involves combining sentences that have redundant information, and skipping details that aren't key to the overall message of the text. For this project, we perform Abstractive Summarization on synthetically generated Employee Performance Reports, as part of a tool that is aimed at helping companies, supervisors, and their employees understand key performance metrics at a glance.

The Kibana Dashboard 'Report Summaries' were generated using T5, a well-established Natural Language Processing Text-to-Text Transformer model that was trained on Common Crawl data. We used the pre-trained model and modified it to operate on nested batches of report data, allowing us to summarize reports chronologically, based on when they were generated.

The model condenses the entire collection of reports for each employee into a few key takeaways about his or her performance. The original reports are stored in the database and can be accessed to verify the content of the summaries for accuracy at the time of each employee's evaluation.
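The nested, chronological batching described above can be sketched roughly as follows. This is an illustration under assumptions rather than our actual code: the `(employee, ISO date, text)` tuple layout, the `batch_size` parameter and the function names are hypothetical, and `first_sentence` is a trivial stand-in for the model so the example runs without downloading anything.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Report = Tuple[str, str, str]  # (employee, ISO date, report text); assumed layout


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def summarize_nested(
    reports: Iterable[Report],
    summarize: Callable[[str], str],
    batch_size: int = 4,
) -> Dict[str, str]:
    """Summarize each employee's reports in date order, batch by batch,
    then summarize the batch summaries into one takeaway per employee."""
    by_employee = defaultdict(list)
    for employee, iso_date, text in reports:
        by_employee[employee].append((iso_date, text))

    takeaways = {}
    for employee, entries in by_employee.items():
        entries.sort(key=lambda entry: entry[0])  # ISO dates sort as strings
        texts = [text for _, text in entries]
        partials = [summarize(" ".join(batch)) for batch in chunk(texts, batch_size)]
        if len(partials) == 1:
            takeaways[employee] = partials[0]
        else:
            takeaways[employee] = summarize(" ".join(partials))
    return takeaways


def first_sentence(text: str) -> str:
    """Placeholder summarizer (NOT a real model): keep the first sentence."""
    return text.split(".")[0].strip() + "."


demo_reports = [
    ("homer", "2021-05-01", "Slept at the console. Alarm ignored."),
    ("homer", "2021-01-10", "Saved the plant. By accident."),
    ("homer", "2021-03-02", "Ate donuts. Many."),
    ("lenny", "2021-02-01", "Fixed the valve. Quickly."),
]
demo_takeaways = summarize_nested(demo_reports, first_sentence, batch_size=2)
```

Passing the summarizer in as a callable keeps the batching logic independent of the model. In practice the placeholder would be swapped for a wrapper around a pre-trained T5 checkpoint, for example through the Hugging Face `transformers` summarization pipeline.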

Report Sentiment Analysis

To further aid in understanding trends across reports, we used Python's Natural Language Toolkit (NLTK) to perform Part of Speech (POS) Tagging and Sentiment Analysis on the content of the reports. Linguistically, adjectives and adverbs are typically most indicative of the sentiment of a piece of text, so we extracted those words from the reports and flagged them, creating an additional field in the Elasticsearch schema to allow for further analysis of sentiment trends via the Kibana dashboards. We also used VADER (Valence Aware Dictionary and sEntiment Reasoner), an open-source lexicon and rule-based sentiment analysis tool, to compute overall sentiment scores for each report entered in the database.
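To give a feel for what "lexicon and rule-based" means, here is a deliberately tiny, standard-library-only scorer. It is a simplified stand-in, not VADER and not what we ran: the word list, the intensifier and negation values and the `ALPHA` constant are all made up for illustration. A real lexicon is far larger and handles punctuation, capitalization and many more rules.

```python
import math
import re

# Hypothetical mini-lexicon: word -> valence (positive or negative strength).
LEXICON = {
    "excellent": 3.0, "good": 1.9, "diligent": 1.5,
    "poor": -2.0, "careless": -1.8, "dangerous": -2.5,
}
# Hypothetical rule tables.
INTENSIFIERS = {"very": 0.3, "extremely": 0.3, "slightly": -0.3}
NEGATIONS = {"not", "never", "no"}
NEGATION_FACTOR = -0.75  # illustrative: flip and dampen a negated word
ALPHA = 15.0             # illustrative smoothing constant for normalization


def sentiment_score(text: str) -> float:
    """Return a score in (-1, 1): sum word valences, apply simple rules,
    then squash the total so long texts do not grow without bound."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        valence = LEXICON[token]
        previous = tokens[i - 1] if i > 0 else ""
        if previous in INTENSIFIERS:
            # Boosters push the valence away from zero, dampeners toward it.
            direction = 1.0 if valence > 0 else -1.0
            valence += INTENSIFIERS[previous] * direction
        if any(word in NEGATIONS for word in tokens[max(0, i - 3):i]):
            valence *= NEGATION_FACTOR
        total += valence
    return total / math.sqrt(total * total + ALPHA)


positive = sentiment_score("Homer did excellent work")
negative = sentiment_score("Homer was careless and dangerous")
negated = sentiment_score("That was not good")
```

A score like this can be stored as one more field on each report document, which is what makes sentiment trends chartable over time. With NLTK itself, the equivalent step would go through `SentimentIntensityAnalyzer.polarity_scores` after downloading the VADER lexicon.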

What we learned

We took a deep dive into NLP techniques like text summarization and sentiment analysis, and learned a ton about data engineering, including technologies that were new to us like Elastic and Kibana, AWS, and Twilio. We developed our Python skills and connected everything together to solve the problem we set out to address at the start of Bitcamp.

What's next for Mr. FAT (Management and Reporting Feedback Analysis Tool)

Future work begins from a position where we have achieved almost all of the goals we set for the hackathon. We want to maximize usability. That means finding ways to keep the interface (plain text with familiar language) simple while improving the integrity of the data. We see improvements to our parsing suite, deduplication of the database and cleaning up the codebase as essential. We also want to improve availability, and we see the first steps as extending the SMS functionality to an 'always-on' system and migrating the whole system to the cloud.
