The story

Artificial intelligence (AI) can help provide remote health-care services. But AI is not a silver bullet: it only assists humans. It does not call an ambulance, nor does it prescribe medical treatment. The value of aiDiagnose is to fast-forward steps for two of the actors involved:

  • health care professionals, by preparing the facts for making a diagnosis
  • patients, as using a remote service usually involves far too many form-filling questionnaires

aiDiagnose stands at the intersection of these two use cases.


We have a background in the telecom domain, in recognizing types of errors in syslog records, and we aim to apply this knowledge to text recognition in medical data sets.

Overview of problems by actors

Taking the perspective of each actor, we identified two key problems:

  1. The patient cannot be expected to provide a good medical description alone
  2. The doctor invests a lot of time in finding out symptoms from the patient (in person or over live stream)

Hybrid approach

We evaluate existing approaches to supporting remote diagnosis and mark their positive sides (+) and negative sides (-).

Questionnaires approach

  • (+) Systematically guided by expert knowledge
  • (+) A number of flow charts exist in medical handbooks
  • (-) Strict form, many user actions and steps, bad UX

Textual description symptom extraction

  • (+) Natural language, better UX
  • (-) No guidance from expert knowledge
  • (-) More difficult implementation of symptom extraction, requires data science, AI, etc

In order to benefit from both approaches, aiDiagnose applies a hybrid approach described in a high-level usage scenario.

High-level usage scenario decomposition

  1. Patient inputs a textual description
  2. aiDiagnose extracts symptoms from the text
  3. aiDiagnose lists a family of best-matching diagnoses
  4. Questionnaires or a live stream with a healthcare professional can take over

The scenario starts with the patient inputting a textual description, which is more natural to the patient than filling in forms. Next, aiDiagnose applies data science algorithms and AI to extract symptoms from the text. After the symptoms are extracted, the algorithm determines the point at which to switch to the questionnaire approach and ask for missing information before proceeding to the live stream with the doctor.
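The scenario can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the aiDiagnose implementation: the knowledge base, the function names (extract_symptoms, rank_diagnoses, follow_up_questions) and the overlap-based scoring are all hypothetical, and plain keyword matching stands in for the data science and AI step.

```python
import re

# Hypothetical toy knowledge base: diagnosis -> associated symptoms.
# Illustrative entries only, not medical advice.
KNOWLEDGE_BASE = {
    "common cold": {"cough", "sore throat", "runny nose"},
    "influenza": {"fever", "cough", "headache", "muscle pain"},
    "migraine": {"headache", "nausea", "sensitivity to light"},
}


def extract_symptoms(text, vocabulary):
    """Step 2: return the known symptom phrases that occur in the free text."""
    normalized = re.sub(r"\s+", " ", text.lower())
    return {
        phrase
        for phrase in vocabulary
        if re.search(r"\b" + re.escape(phrase) + r"\b", normalized)
    }


def rank_diagnoses(symptoms, kb):
    """Step 3: rank diagnoses by the fraction of their symptoms that were found."""
    scored = [
        (len(symptoms & expected) / len(expected), name)
        for name, expected in kb.items()
    ]
    matching = [pair for pair in scored if pair[0] > 0]
    # Best score first; ties are broken alphabetically for stable output.
    return sorted(matching, key=lambda pair: (-pair[0], pair[1]))


def follow_up_questions(symptoms, kb, candidates, top_n=2):
    """Step 4: symptoms of the top candidates the patient has not mentioned yet."""
    missing = set()
    for _, name in candidates[:top_n]:
        missing |= kb[name] - symptoms
    return sorted(missing)


if __name__ == "__main__":
    text = "I have had a bad cough and a fever since Monday, plus a headache."
    vocabulary = set().union(*KNOWLEDGE_BASE.values())
    found = extract_symptoms(text, vocabulary)
    ranked = rank_diagnoses(found, KNOWLEDGE_BASE)
    print(found)
    print(ranked)
    print(follow_up_questions(found, KNOWLEDGE_BASE, ranked))
```

The list returned by follow_up_questions is one possible way to decide what the questionnaire should ask before the live stream starts: it only asks about symptoms that would help separate the best-matching diagnoses.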


  • annotation of keywords and phrases as symptoms, and association of symptoms with diagnoses

The solution overview

The solution consists of a machine learning engine capable of text processing, a web component that supports highlighting words in text and annotating symptoms, a component for pairing symptoms with diagnoses, and a backend database for storing data sets and learned data.

During the hackathon

In the scope of the hackathon we build a PoC system. We reuse our knowledge of template recognition to build a system that can be applied to medical data sets.

  • Find an adequate data set for marking symptoms
    • Import the data set into the clustered database Apache Druid
    • Front-end (FE) tokenizer and highlighter of keywords
    • Solr back-end (BE) support for the FE tokenizer and highlighter
    • Storage of annotated symptoms and diagnoses
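The tokenizer and highlighter idea can be illustrated with a small in-memory sketch. It is an assumption-laden stand-in: the Annotation record, the function names and the bracket markers are hypothetical, and nothing here uses Solr or Druid. The point it shows is that keeping character offsets lets the same annotation be both highlighted in the text and stored next to a diagnosis.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset where the phrase starts
    end: int    # offset one past the last character of the phrase
    label: str  # e.g. "symptom"


def tokenize(text):
    """Split text into (token, start, end) triples, keeping character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def annotate(text, phrases, label="symptom"):
    """Find each phrase (case-insensitive, whole words) and record its span."""
    spans = []
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append(Annotation(m.start(), m.end(), label))
    return sorted(spans, key=lambda a: a.start)


def highlight(text, annotations, open_mark="[", close_mark="]"):
    """Wrap annotated spans in markers; assumes the spans do not overlap."""
    parts, cursor = [], 0
    for a in annotations:
        parts.append(text[cursor:a.start])
        parts.append(open_mark + text[a.start:a.end] + close_mark)
        cursor = a.end
    parts.append(text[cursor:])
    return "".join(parts)


if __name__ == "__main__":
    note = "Sore throat and mild fever."
    spans = annotate(note, ["sore throat", "fever"])
    print(highlight(note, spans))
```

In a web front end the markers would presumably be replaced by markup for highlighting, while the offsets are what gets stored.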

After the hackathon

  • Fix UX issues and bugs
  • Train the ML engine to infer diagnoses from symptoms
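As a rough idea of what inferring diagnoses from annotated symptoms could look like, here is a minimal sketch using a multinomial naive Bayes classifier with add-one smoothing. This technique is our own choice for illustration, not a decision the project has made, and the function names and training records are hypothetical toy data with no medical validity.

```python
import math
from collections import Counter, defaultdict


def train(records):
    """Count symptoms per diagnosis from (symptom_set, diagnosis) pairs."""
    diagnosis_counts = Counter()
    symptom_counts = defaultdict(Counter)
    vocabulary = set()
    for symptoms, diagnosis in records:
        diagnosis_counts[diagnosis] += 1
        for symptom in symptoms:
            symptom_counts[diagnosis][symptom] += 1
            vocabulary.add(symptom)
    return diagnosis_counts, symptom_counts, vocabulary


def predict(model, symptoms):
    """Return diagnoses sorted by naive Bayes log score, best first."""
    diagnosis_counts, symptom_counts, vocabulary = model
    total = sum(diagnosis_counts.values())
    scores = {}
    for diagnosis, count in diagnosis_counts.items():
        score = math.log(count / total)  # log prior
        # Add-one (Laplace) smoothing over the known symptom vocabulary.
        denominator = sum(symptom_counts[diagnosis].values()) + len(vocabulary)
        for symptom in symptoms:
            if symptom in vocabulary:  # unseen symptoms are ignored
                score += math.log((symptom_counts[diagnosis][symptom] + 1) / denominator)
        scores[diagnosis] = score
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical annotated records, as the annotation storage might provide them.
    records = [
        ({"fever", "cough"}, "influenza"),
        ({"fever", "headache", "cough"}, "influenza"),
        ({"headache", "nausea"}, "migraine"),
        ({"headache"}, "migraine"),
    ]
    model = train(records)
    print(predict(model, {"fever", "cough"}))
```

A counting model like this can be retrained cheaply as more annotations are crowdsourced, which is why it is a convenient first baseline before trying anything heavier.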


Data sets for medical purposes: we have been looking at the FHIR standard, and it is a big specification. It would help if we could get hints on which data sets and which attributes to analyse.

We encourage all health care professionals to engage in crowdsourcing of data sets.

Feel free to post a comment if you know of any data sets useful for our case or for any other project out there.


Let us know what you think


We created a Slack channel, #open-medical-datasets, during the hackathon to help gather knowledge on public medical data sets.

We are looking for

  • standardized data sets (like FHIR, but not excluding others)
  • flowcharts of diagnostic procedures
