Note about the demo: the demo is available at http://www.astr.xyz, but I couldn't afford a server capable of running the deep learning models. Both "Astr Screening" and "Astr Mind" are available, however.
Despite machine learning showing amazing results on many tasks, it will be a long time before it can be deployed in real diagnostic systems, due to concerns about both accuracy and public trust. Based on both the shortcomings of machine learning and the state of the art in research, I'm proposing a platform that uses machine learning in an assistive role, building upon hospital infrastructure instead of replacing it.
What it does
It consists of three parts: Astr Insight - A modular pipeline that uses deep learning to provide better preprocessing, from image super-resolution to anomaly detection, for both human-based and machine-learning-based diagnoses. It also uses the Grad-CAM technique to provide insight into the "thought process" of the neural networks and why a certain classification was made, assisting doctors in diagnosis instead of just producing a separate classification value.
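The core of Grad-CAM is small: pool the gradients of the class score with respect to the last convolutional feature maps into per-channel weights, take the weighted sum of the feature maps, and keep only the positive evidence. A minimal NumPy sketch of that computation (framework-agnostic; the array shapes are illustrative, and in practice the activations and gradients come from the trained classifier):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """feature_maps: (H, W, C) activations of the last conv layer.
    gradients: (H, W, C) gradient of the class score w.r.t. those activations."""
    # Global-average-pool the gradients to get one importance weight per channel.
    weights = gradients.mean(axis=(0, 1))
    # Weighted sum of the feature maps across channels.
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))
    # ReLU: keep only features that positively support the class.
    cam = np.maximum(cam, 0)
    if cam.max() > 0:
        cam = cam / cam.max()  # normalize to [0, 1] for heatmap overlay
    return cam
```

The resulting low-resolution heatmap is then upsampled to the input image size and overlaid on it, which is what lets a doctor see *where* the network was looking.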
Astr Screening - It uses existing information from hospital databases or routine checkups to automatically detect diseases that a given patient may be at risk for. Because the input features are readily available, it can scan large numbers of patients and serve as an automatic "early warning" system that then leads to consults with physicians. The main interface is a REST API designed to be integrated into hospital software.
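As a rough illustration of such an API's request/response shape: a screening call might carry routine-checkup fields and return a list of risk flags. The field names and the toy rule below are assumptions for illustration only, not the actual schema or the trained models:

```python
import json

def handle_screening_request(body: str) -> str:
    """Hypothetical handler behind a REST endpoint like POST /screen.
    Parses a patient record and returns any "early warning" flags."""
    record = json.loads(body)
    flags = []
    # Toy rule standing in for the trained classifiers (illustrative only):
    if record.get("fasting_glucose", 0) > 125:
        flags.append({"condition": "diabetes", "action": "refer to physician"})
    return json.dumps({"patient_id": record["patient_id"], "flags": flags})
```

A hospital system would call this endpoint in bulk over its patient records, surfacing only the flagged cases for physician review.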
Astr Mind - Connects patients with therapists in a more personal way through a chatbot interface that mimics natural conversation. It alters its mode of speaking to reflect, or "empathize" with, the sentiment of the patient. Finally, it uses keyword-processing algorithms to automatically detect what kind of therapist the patient needs from natural conversation alone.
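The sentiment-mirroring idea reduces to mapping a polarity score in [-1, 1] (the range returned by libraries like TextBlob) to a conversational register. The thresholds and register names here are illustrative assumptions, not the ones used in the project:

```python
def response_style(polarity: float) -> str:
    """Map a sentiment polarity score in [-1, 1] to a chatbot register.
    Cutoffs below are assumed for illustration."""
    if polarity < -0.3:
        return "supportive"  # mirror distress with empathy
    if polarity > 0.3:
        return "upbeat"      # match positive sentiment
    return "neutral"
```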
How I built it
Astr Insight - There are four deep learning models in this module. The super-resolution and denoising models are Residual Dense Networks, which are designed to better capture local information in images. Anomaly detection is done by a variational autoencoder that probabilistically models the image distribution. With a trained VAE, you can calculate a lower bound on the log-likelihood of seeing a given image. Based on the histogram of lower bounds, boundaries were determined to classify images as "anomalous". The skin cancer detection model was very simple: just a ResNet50 with a dense classifier head, fine-tuned on the HAM10000 dataset. I didn't spend too much time on that one, as pure classification was not the focus.
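The thresholding step can be sketched independently of the VAE itself: given the per-image lower bounds (ELBOs) computed on normal data, pick a low quantile of their histogram as the boundary, and flag any image whose bound falls below it. The quantile value here is an assumption, not the one used in the project:

```python
import numpy as np

def fit_threshold(elbos, quantile=0.01):
    """elbos: per-image ELBO (lower bound on log-likelihood) from a trained
    VAE, evaluated on normal training data. Images the model finds unlikely
    get low ELBOs, so the decision boundary is a low quantile of this set."""
    return np.quantile(elbos, quantile)

def is_anomalous(elbo, threshold):
    # An image with a lower bound below the boundary is flagged as anomalous.
    return elbo < threshold
```

In practice the ELBO for each image is the reconstruction log-likelihood minus the KL divergence between the encoder's posterior and the prior, which is exactly the quantity the VAE was trained to maximize.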
Astr Screening - Because of the time restrictions, I was only able to implement four detection systems. The classifiers used were Random Forest and Gaussian Naive Bayes, trained on publicly available datasets of common diseases, with features chosen for both predictive accuracy and availability in regular hospital data.
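With scikit-learn, each detection system is a short pipeline: load a tabular dataset, split it, and fit the two classifiers. The synthetic data below stands in for a real public dataset, and the feature/label construction is purely illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for a public disease dataset: rows are patients,
# columns are routine-checkup features (e.g. glucose, BMI, age).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy "at risk" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "random_forest": RandomForestClassifier(random_state=0),
    "gaussian_nb": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
```

Held-out accuracy (`model.score(X_te, y_te)`) is then compared across the two classifiers per disease, and the better one backs that screening endpoint.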
Astr Mind - I scraped Wikipedia articles on various psychological conditions and preprocessed them with stopword removal, stemming, and lemmatization. The articles were then passed through the Rapid Automatic Keyword Extraction (RAKE) algorithm, and classification is made based on the extracted keywords. As a backup for ambiguous text, a general therapist search is made instead. Sentiment classification is based on the polarity score given by TextBlob.
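RAKE itself is simple enough to sketch in pure Python: candidate phrases are the runs of words between stopwords and punctuation, each word is scored by its degree divided by its frequency across all candidates, and a phrase's score is the sum of its word scores. A minimal sketch with a tiny illustrative stopword list (a real implementation would use a full list):

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real RAKE run uses a full one.
STOPWORDS = {"is", "a", "an", "of", "the", "and", "or", "in", "to", "for", "with", "that"}

def rake_keywords(text):
    """Return candidate keyword phrases, best-scoring first (core RAKE idea)."""
    words = re.findall(r"[a-zA-Z]+", text.lower())
    # Split into candidate phrases at stopwords.
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    # Score words by degree / frequency; a phrase scores the sum of its words.
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores, key=scores.get, reverse=True)
```

Matching the top-scoring phrases from a patient's messages against the per-condition keyword sets extracted from the scraped articles is what drives the therapist classification.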
Challenges I ran into
The preprocessing and classification pipeline was difficult due to resource constraints, both on my own computer and on the server. There are partial workarounds - e.g. I used TFLite for the variational autoencoder - but Grad-CAM requires computing gradients, which TFLite does not support. For that reason, the online demo does not include the "Astr Insight" feature.
Accomplishments that I'm proud of
I definitely got better at web design in the process: I became more familiar with a CSS framework, which will be very useful for later projects. The screening models, along with the keyword-based classification, also turned out better than I expected.
What I learned
I became a lot more familiar with many different aspects of both machine learning and web design, and I feel much more confident taking on something similar in the future.
What's next for Astr
I want to find a way to reduce the resource costs of the different models. Getting the memory usage of the RDN models down shouldn't be too difficult, but the classifier is harder because it has to remain compatible with Grad-CAM.