Hackathon Results.


The advancement of medical research has contributed immensely to the well-being of people globally. This is particularly evident in the way contemporary medical approaches span a wider range of possibilities, bringing solutions to health problems that were quite mysterious in the past. For this effort, I heartily applaud the medical scientists and researchers who work so hard to bring these benefits to citizens of the world. I truly am grateful.

It is a high privilege for anyone to facilitate, in one way or another, the noble mandate of medical scientists. I think that the rapid advancement of medical solutions greatly depends on people's collective contribution; and if, as a data analyst or a geospatial specialist, I can participate in this growth, then I gladly devote my time, thought and skills.

What it does

Using software distributed under permissive open source licenses, I have developed a web application that

  • reads raw data from CSV files, and for each file,
  • represents the data in an appropriate format, tabular or plotted, depending on the contents of the file and the expected analysis output, or in whatever way best presents useful information.

In many instances, there are Python functions that receive data (in this case the CSV contents, through a DataFrame) and perform more or less the same analysis on each file, e.g. working out quantiles.
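A minimal sketch of this idea, assuming pandas is available: one function computes a quantile table and is reused for every file. The function name `quantile_table` and the two small tables are made up for illustration and are not taken from the project.

```python
import pandas as pd


def quantile_table(df, probs=(0.25, 0.5, 0.75)):
    """Return the requested quantiles for every numeric column of df."""
    numeric = df.select_dtypes(include="number")
    return numeric.quantile(list(probs))


# Two small made-up tables standing in for the contents of two CSV files.
frames = {
    "file_a": pd.DataFrame({"x": [1, 2, 3, 4, 5], "label": list("abcde")}),
    "file_b": pd.DataFrame({"y": [10.0, 20.0, 30.0, 40.0]}),
}

# The same function is applied to every file.
tables = {name: quantile_table(df) for name, df in frames.items()}
```

Non-numeric columns such as `label` are dropped before the quantiles are computed, so the same call works on files with mixed column types.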

Also, using GDAL (a raster reading and manipulation library distributed by OSGeo), with QGIS as the interface, I have identified the AOIs (areas of interest) in the scanned rasters by querying for a particular cell value or a range of values, in order to digitize these particular tissues. The result is then visualized in a mapping interface that allows alterations such as labelling or deletion.

This information may be relayed through:

  • Maps: for data with spatial attributes
  • Statistical charts: for data with numerical records (e.g. bar charts, pie charts, box plots)
  • Tabular summaries: e.g. the description of a DataFrame, quantile tables, etc.

How we built it

This app essentially functions by reading the various CSV datasets, passing them to Python functions as *args or **kwargs, and displaying some output.

The whole project runs on a Python server, built with the Django web framework. The web pages that serve results, take user input and display maps and charts are built with Django, backed by a PostgreSQL database.

I have used Python's urllib module to download the data, since they are initially hosted on a remote server. This approach was chosen for two main advantages:

  1. These data can be downloaded easily at any time, which is convenient when records are dynamic.

  2. There is no need to share already downloaded data amongst colleagues, because anyone can readily access them remotely.
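The download step could look roughly like the sketch below, which uses only the standard library. The function name `download_csv` and the URL in the usage comment are hypothetical; the project's actual data locations are not given here.

```python
import shutil
import urllib.request
from pathlib import Path


def download_csv(url, destination):
    """Download url to destination and return the destination path."""
    destination = Path(destination)
    # Create the target folder if it does not exist yet.
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # Stream the response to disk instead of holding it all in memory.
        shutil.copyfileobj(response, out)
    return destination


# Hypothetical usage (placeholder URL, not the project's real server):
# download_csv("https://example.org/data/sample.csv", "data/sample.csv")
```

Because the file is fetched on demand, rerunning the function picks up whatever version of the records the server currently holds.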

Using pandas, these data are then written into the CSV format, which is common among data analysis tools.

The scikit-learn Python library has been used to perform KMeans clustering.
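As an illustration only, a KMeans run with scikit-learn might look like the following. The points, the number of clusters and the other parameters are made up; the settings used in the project are not stated.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated, made-up groups of 2-D points.
points = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
    [8.0, 8.2], [8.1, 7.9], [7.8, 8.0],
])

# n_clusters=2 is an assumption that matches this toy data.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(points)  # one cluster index per row of points
```

`model.cluster_centers_` then holds one centre per cluster, which can be plotted alongside the labelled points.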

Still with the pandas library (and occasionally SQL), I have accessed these data and extracted vital information such as statistical summaries, dataset descriptions, quantiles, standard deviations and other measures of dispersion.

With this information extracted, I have used the Bokeh plotting library to visualize the outputs.

All this was done by writing a specialized function for every desired result, i.e. there is a single Python function for every category of output displayed. For example, one function plots the box plot: it takes the data and a few other parameters and gives results matching the dataset fed to it. The same is true for the function that works out the quantiles, the one that plots a relation graph, etc.
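To make the "one function per output" idea concrete, here is a hedged sketch of the numbers a box plot function would need before any drawing happens. The name `boxplot_stats`, the 1.5 x IQR whisker rule and the sample values are assumptions for illustration; the plotting call itself is left out.

```python
import pandas as pd


def boxplot_stats(values, whisker=1.5):
    """Return the quartiles, whisker ends and outliers for a box plot."""
    s = pd.Series(values, dtype="float64").dropna()
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme values that are still inside the fences.
    inside = s[(s >= lower_fence) & (s <= upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": inside.min(),
        "upper_whisker": inside.max(),
        "outliers": s[(s < lower_fence) | (s > upper_fence)].tolist(),
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 100])
```

For this sample the value 100 falls outside the upper fence and is reported as an outlier, while the whiskers stop at 1 and 5.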

A sample of the Disease2BScan and Normal2BScan raster image scans has been digitized by identifying the portions highlighted as AOIs.

I used QGIS to extract the digitized areas from the raw scans. This basically involved:

  • georeferencing the scan images, i.e. assigning the images coordinate information,
  • working out the cumulative statistics at every pixel, to turn the 3-band RGB scans into a single-band image,
  • specifying the portions of the image to keep by applying an algebraic formula based on the identified cell values of the now single-band image,
  • generating the corresponding vector data.
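The two middle steps above can be sketched with NumPy on a tiny made-up array. This is not the QGIS workflow itself: summing the bands is only one possible reading of "cumulative statistics", and the function names, the pixel values and the 250 to 300 range are assumptions. Georeferencing and vector generation are not covered.

```python
import numpy as np


def collapse_bands(rgb):
    """Sum the three colour bands into one band (height x width)."""
    # Cast first so that sums above 255 do not overflow uint8.
    return rgb.astype(np.int32).sum(axis=2)


def select_cells(band, low, high):
    """Boolean mask of cells whose value lies in [low, high]."""
    return (band >= low) & (band <= high)


# A made-up 2 x 3 RGB scan; real scans would be read with GDAL or QGIS.
scan = np.array([
    [[250, 10, 10], [240, 20, 15], [30, 30, 30]],
    [[20, 25, 30], [255, 5, 0], [40, 40, 40]],
], dtype=np.uint8)

band = collapse_bands(scan)          # single-band image
mask = select_cells(band, 250, 300)  # True where a cell is kept
```

The resulting mask marks the cells that would go on to be converted into vector polygons.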

These digitized data require a database with geometry support, hence the use of PostGIS. They have been stored in the database, and reading them is made possible by the OpenLayers JavaScript library. OpenLayers is a free and open source library that makes it possible to view spatial data and interact with them in a number of useful ways.
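How the geometries travel from the database to the map is not described here. One common option, shown purely as an assumption, is to serialize them as GeoJSON, a format OpenLayers can read. The helper names, the coordinates and the `label` property below are hypothetical.

```python
import json


def polygon_feature(ring, properties):
    """Build a GeoJSON Feature for one polygon from its exterior ring."""
    ring = [list(point) for point in ring]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON rings must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": dict(properties),
    }


def feature_collection(features):
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# One made-up digitized area with an editable label.
aoi = polygon_feature([(0, 0), (4, 0), (4, 3), (0, 3)], {"label": "AOI 1"})
payload = json.dumps(feature_collection([aoi]))
```

A web view could return `payload` as its response body for the map layer to load.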

Challenges we ran into

In extracting the desired cells from the raster scans, sometimes the method that successfully handles scan A doesn't work quite as well with scan B. Exploring the varied options would almost always give a working alternative.

Software upgrades were released by the maintainers during this development, and it was quite intimidating that some of the packages used would not be available remotely, so transferring the app might raise some issues. An example is the GDAL library, version 3.1.4.

Also, understanding the specific implications of the data was a little intimidating, I think because it comes from the advanced academic fields of genomics and tissue scanning. As a data scientist, I understand my role as performing analysis on the large datasets and, where possible, reducing and visualizing them in comprehensible dimensions that would be beneficial to the medical scientists.

Accomplishments that we're proud of

The application offers a considerably decent interface to work with. Data can be downloaded, written to local storage and even uploaded to the app for analysis. We have an array of descriptive analysis options that cut across many of the datasets, thanks to the functional possibilities of the Python language. I am proud that my initial plan for how the project would ideally be implemented was attained and even surpassed, because I was able to incorporate spatial mapping and a reliable process for identifying AOIs in scanned images, regardless of size.

What we learned

I've learned the various practical use cases of the PyData libraries that facilitated the delivery of this project. I have also come to greatly appreciate the other tools used, like the Django web framework and Python as a whole, and I believe that using these resources throughout this development deepened my comprehension of them.

What's next for Data Analysis & Tissue Spatial Mapper

I intend to delve even further into the fields of data science and geospatial technology, in the hope and confidence that a proper understanding of their concepts will serve in an array of disciplines in this era of information and data.
