Every year, thousands of Americans are misquoted on their health insurance because ICD (International Classification of Diseases) codes are monitored ineffectively. Mistakes are difficult to flag automatically because there are so many codes. At the same time, the field of Natural Language Processing has produced advances in “embedding,” which make classification problems with many possible outputs more tractable. We believe such advances are crucial to any personalized medicine informatics workflow.

What it does

ICD-Codex creates vector embeddings for ICD codes. With just a few lines of code, practitioners can adapt their algorithms to work with these embeddings and take advantage of model architectures suited to problems with many output classes.
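As a rough illustration of why embeddings help here, a model can predict a point in the embedding space and then look up the closest known code, instead of choosing among thousands of separate labels. The sketch below uses only the Python standard library. The three-dimensional vectors, the code labels, and the helper names (cosine_similarity, nearest_code) are hypothetical placeholders, not the ICD-Codex API or its trained embeddings.

```python
import math

# Hypothetical toy embeddings: real vectors would come from a trained model
# and would have many more dimensions.
TOY_EMBEDDINGS = {
    "A00.0": [0.9, 0.1, 0.0],
    "A00.1": [0.8, 0.2, 0.1],
    "B00.0": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_code(vector, embeddings):
    """Return the code whose embedding is most similar to `vector`."""
    return max(
        embeddings,
        key=lambda code: cosine_similarity(vector, embeddings[code]),
    )


# A hypothetical model output: a point in the embedding space.
predicted_vector = [0.9, 0.05, 0.0]
closest = nearest_code(predicted_vector, TOY_EMBEDDINGS)
print(closest)
```

With this framing, a downstream model only needs to output a short vector, and related codes sit near each other, so a near miss is still a clinically similar code.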

How we built it

We used the networkx library to build a graph representation of the ICD coding structure, which was fed into a word2vec implementation. We followed the scikit-learn API conventions and used Twine to deploy to PyPI. Our website was built using Sphinx.
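One common way to feed a graph into a word2vec-style model is to generate random walks over it and treat each walk as a "sentence" of node names. The sketch below shows that idea with the standard library only. It is an assumption about the general approach, not the project's actual code: the toy hierarchy, the code labels, and the function names (build_adjacency, random_walks) are all hypothetical, and the real hierarchy is parsed from the ICD XML.

```python
import random

# Hypothetical toy slice of an ICD-style hierarchy (parent -> children).
# Labels are illustrative placeholders, not data parsed from the ICD XML.
TOY_HIERARCHY = {
    "root": ["chapter_A", "chapter_B"],
    "chapter_A": ["A00", "A01"],
    "chapter_B": ["B00"],
    "A00": ["A00.0", "A00.1"],
    "A01": ["A01.0"],
    "B00": ["B00.0", "B00.1"],
}


def build_adjacency(hierarchy):
    """Turn parent -> children lists into an undirected adjacency map."""
    adjacency = {}
    for parent, children in hierarchy.items():
        for child in children:
            adjacency.setdefault(parent, set()).add(child)
            adjacency.setdefault(child, set()).add(parent)
    return adjacency


def random_walks(adjacency, walks_per_node=5, walk_length=6, seed=0):
    """Generate fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                # Sort neighbors so results are reproducible for a given seed.
                neighbors = sorted(adjacency[walk[-1]])
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


adjacency = build_adjacency(TOY_HIERARCHY)
walks = random_walks(adjacency)
print(len(walks), walks[0])
```

Each walk is a list of code names, so the full list has the same shape as a tokenized text corpus. Codes that sit close together in the hierarchy co-occur in walks more often, which is what lets a word2vec-style model place them near each other in the embedding space.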

Challenges we ran into

Our project was based on a simple idea, so its success hinged on execution and polish. Despite the difficulty of implementing our neural network models and cleaning XML data (which is how the ICD hierarchy itself is documented), our workflow had to be seamless from the perspective of the user. It was also a challenge to coordinate developers working on different aspects of analysis and design, especially remotely.

Accomplishments that we're proud of

We are proud to have brought our vision to reality. Today, anyone in the world can run pip install icdcodex and use our software to build healthcare informatics applications for personalized medicine in just a few lines of code. Furthermore, they can access well-designed documentation, which keeps the barrier to entry low.

What we learned

Our team learned the importance of data representation in personalized medicine. Thoughtfully designed algorithms and data structures for representing a patient’s health pave the way for more automation, fewer chances for error, and a better allocation of hospital resources. This experience also taught us the importance of high-level documentation for making our work clear and understandable, which we provided through a Sphinx website.

What's next for ICD-Codex

Going forward, ICD-Codex will serve as an easy-to-use and impactful API for personalized medicine workflows. Our next step is to partner with hospitals to double-check their medical coding so that errors are pointed out in a more timely and accurate manner.
