Large projects and teams often struggle to keep their issues organized. Sometimes duplicate issues are created; other times related issues are hard to locate because they are described with different vocabulary, which traditional keyword search can't match. I wanted to create a simple interface that automatically displays similar issues based on the concepts they describe, rather than the exact words they use.
What it does
Similar Issues AI uses word vectors (embeddings) to calculate how similar two issues are at the conceptual level. Similar issues are presented as an issue Glance, ordered by similarity. Users can include Closed tickets with a simple filter.
How I built it
A Django API ties everything together, communicating with PostgreSQL, Elasticsearch, and a word embedding RPC server. Web panels are built in React and proxied through the Django server.
Once the app is installed, all the issues in the account are indexed as vectors into Elasticsearch. When a user views an issue, we fetch similar results by extracting the tokens with the highest TF-IDF scores from that issue, generating a vector from them, and then calculating the cosine similarity against all other issues.
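The lookup described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the app's actual code: the function names, the smoothed IDF formula, the choice to average the selected token vectors, and the toy two-dimensional embeddings are all hypothetical. In the real app the vectors come from the embedding RPC server and the comparison runs inside Elasticsearch.

```python
import math
from collections import Counter


def top_tfidf_tokens(doc_tokens, corpus, k=3):
    """Return the k tokens of doc_tokens with the highest TF-IDF score."""
    n_docs = len(corpus)
    counts = Counter(doc_tokens)
    scores = {}
    for token, count in counts.items():
        df = sum(1 for doc in corpus if token in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF (an assumption)
        scores[token] = (count / len(doc_tokens)) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def mean_vector(tokens, embeddings):
    """Average the embeddings of the tokens that have one; None if none do."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_similar(query_tokens, corpus, indexed_vectors, embeddings, k=3):
    """Rank already-indexed issue vectors by similarity to the viewed issue."""
    query_vec = mean_vector(top_tfidf_tokens(query_tokens, corpus, k), embeddings)
    if query_vec is None:
        return []
    ranked = [(issue_id, cosine_similarity(query_vec, vec))
              for issue_id, vec in indexed_vectors.items()]
    return sorted(ranked, key=lambda pair: -pair[1])


# Toy data: hypothetical 2-d embeddings and two indexed issues.
embeddings = {"login": [0.9, 0.1], "signin": [0.85, 0.15], "invoice": [0.1, 0.9]}
corpus = [["cannot", "login", "password"], ["signin", "fails"], ["invoice", "wrong"]]
indexed_vectors = {"AUTH-2": embeddings["signin"], "BILL-1": embeddings["invoice"]}

ranked = rank_similar(["cannot", "login", "password"], corpus, indexed_vectors,
                      embeddings, k=2)
print(ranked[0][0])  # AUTH-2 ranks first despite sharing no words with the query
```

The toy query and the "AUTH-2" issue share no tokens, yet they rank as similar because "login" and "signin" sit close together in the (made-up) embedding space.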
Challenges I ran into
Word embeddings work less well when a vector has to be averaged over many sentences or phrases. Due to limitations in Elasticsearch's cosineSimilarity function, it wasn't feasible to construct and search a separate vector for each sentence or phrase within an issue. Instead, we select the tokens with the highest TF-IDF scores from the input issue and use them to generate a single vector for calculating distances.
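For illustration, a single-vector search of this kind might be expressed as an Elasticsearch `script_score` query like the one built below. This is a hedged sketch rather than the app's actual query: the helper name `similar_issues_query` and the field names `issue_vector` and `status` are hypothetical, and the exact `cosineSimilarity` call syntax varies between Elasticsearch versions (some 7.x releases expect `doc['issue_vector']` instead of the field name as a string).

```python
def similar_issues_query(query_vector, include_closed=False, size=10):
    """Build a script_score query body that ranks issues by cosine similarity.

    Assumes each issue document stores one dense vector in a hypothetical
    `issue_vector` field and its workflow state in a hypothetical `status` field.
    """
    if include_closed:
        base_query = {"match_all": {}}
    else:
        base_query = {"bool": {"must_not": [{"term": {"status": "Closed"}}]}}
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": base_query,
                "script": {
                    # Adding 1.0 keeps scores non-negative, which
                    # script_score requires.
                    "source": "cosineSimilarity(params.query_vector, 'issue_vector') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


body = similar_issues_query([0.9, 0.1])
```

Because the script compares the query against one stored vector per document, an issue is represented by a single vector here, which is why the text above builds that vector from only the highest-scoring tokens.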
I also struggled a bit with Atlaskit components, since many of them are not truly intended to be used outside Atlassian. I wanted to use SmartCards, and would have loved a canonical Issue component, but ended up building my own.
I'd also like to find a way to keep the Glance status continuously up to date, so users can see up front how many potentially similar tickets there are. Currently this is an expensive operation, since every time an issue is added or updated, it could change the count for every other ticket.
Accomplishments that I'm proud of
The first time I was able to display related tickets that didn't share any words with each other, but that were linked conceptually, was gratifying. I wasn't sure if it would work.
What I learned
I had an opportunity to try about 10 different natural language and word embedding libraries. It was also good to get acquainted with Atlassian's API for future projects.
What's next for Similar Issues AI
I'd like to create a general search page so that users can find issues conceptually rather than using traditional stemming methods. If there's a way to plug into the existing advanced search, that would be preferable. After some more iteration I'd like to make it available on the marketplace.