Spatial transcriptomics has proved to be a powerful method for profiling gene expression across tissue sections without losing spatial information. The technology reveals the complex composition of cellular structures and the previously underappreciated heterogeneity of tissues. Unlike conventional RNA-seq methods, spatial transcriptomics technologies such as the NanoString Digital Spatial Profiler produce much more data, which must be aggregated and analysed together to gain deeper insight into biological processes. The data include multi-dimensional gene expression measurements, the coordinates of the profiled tissue segments, and the color intensities of several marker proteins that make up the image of the profiled tissue. This multi-dimensionality poses a challenge to scientists and to developers of scientific tools.
In addition, our project is strongly inspired by applications in remote sensing, an established field that deals with the interaction between multi-spectral data and spatial patterns. Both fields, spatial omics and remote sensing, face similar problems, such as the sparsity of localized measurements: the gene expression profile in a region of interest (ROI), for example, or the precipitation in a small delimited region. An interesting task is to infer properties outside these measured regions, which could be done through interactive visualizations or statistical methods. In particular, our front-end design borrows elements from Google Maps and from previous NanoString visualization tools.
What it does
In our project we attempted to solve the challenging problem of integrating gene expression information with immunofluorescence (IF) images. We were inspired by geospatial image analysis and by prior work on tissue image processing using CNNs.
How we built it
Our approach combines spatial features extracted from the IF images by the pre-trained neural network ResNet50 with ssGSEA enrichment scores. ssGSEA can be viewed as a dimension-reduction procedure for gene expression data that retains biologically meaningful information about the molecular processes in specific tissue regions. The neural network produces more than 2,000 features, many of which are redundant and need filtering. We decided to filter the spatial features by coefficient of variation, since the most variable features should reflect the heterogeneity of the tissue and the difference between DKD and healthy samples. The selected spatial features allowed us to map the correlated enriched gene sets onto the image and to see the tissue regions with elevated activity of specific biological processes.
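The filtering and correlation steps described above can be sketched roughly as follows. This is a minimal illustration in plain Python on made-up toy values, not the project's actual code: the function names, the `top_k` cutoff, and the choice of Pearson correlation are all assumptions, since the write-up does not state which correlation measure or threshold was used.

```python
import math
from typing import Dict, List, Sequence, Tuple


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Return std / |mean| for one feature across ROIs (0.0 if the mean is zero)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance) / abs(mean)


def select_variable_features(features: Dict[str, List[float]], top_k: int) -> List[str]:
    """Return the names of the top_k features ranked by coefficient of variation."""
    ranked = sorted(
        features,
        key=lambda name: coefficient_of_variation(features[name]),
        reverse=True,
    )
    return ranked[:top_k]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def map_gene_sets(
    features: Dict[str, List[float]],
    scores: Dict[str, List[float]],
    selected: List[str],
) -> Dict[str, Tuple[str, float]]:
    """For each selected feature, find the gene set with the largest |correlation|."""
    result = {}
    for name in selected:
        best = max(scores, key=lambda gs: abs(pearson(features[name], scores[gs])))
        result[name] = (best, pearson(features[name], scores[best]))
    return result


# Toy data (hypothetical): one value per ROI for each image feature and gene set.
toy_features = {
    "feat_0": [1.0, 1.0, 1.0, 1.0],   # constant, so CV is zero
    "feat_1": [0.1, 2.0, 0.2, 3.0],   # highly variable
    "feat_2": [5.0, 5.5, 5.2, 5.1],   # nearly constant
}
toy_scores = {
    "gene_set_A": [0.2, 4.0, 0.4, 6.0],
    "gene_set_B": [3.0, 1.0, 2.5, 0.5],
}

kept = select_variable_features(toy_features, top_k=1)
print(kept)
print(map_gene_sets(toy_features, toy_scores, kept))
```

In a real pipeline, each feature vector would hold one ResNet50 output per ROI and each score vector one ssGSEA score per ROI, in the same ROI order. A rank-based correlation might be more robust to outliers, but that choice is also an assumption.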
Challenges we ran into were:
- In the time available we implemented only the first steps before realizing that our spatial features were not very selective, informative, or easy to interpret. As a result, the mapping of ssGSEA results may not enrich our understanding of transcriptomic changes in different parts of the tissue.
- Defining a goal was also difficult. Despite having a decent understanding of the data and taking part in the weekly office hours, we found the hackathon objectives too open-ended, and it was hard to guess which kind of tool a scientist might find useful.
However, we believe our approach is promising, though it requires substantial refinement.
What we learned:
We have a few ideas for improving our approach, along with some broader lessons.
- Use a CNN pre-trained specifically on an IF image dataset, select spatial features more thoroughly, and design better metrics for finding ssGSEA scores that correlate with spatial features.
- We tested many tools, some of which were not part of our final deliverable but will nonetheless be useful to know in future projects, among them: Heroku, QGIS, Leaflet, Terracotta, UMAP, Bokeh, Flask, R, R Shiny.
- As lessons in project management, we learnt to keep goals more concrete and narrow and to estimate better the time required to implement application features. We also learnt that adding a new member towards the end of a project does not help to meet deadlines.
What's next for Nanostring Data Visualization:
Though our project at the time of submission does not represent a successful solution to the stated problem, we still think that the approach we developed has value and can be used by scientists. A larger-scale project could enhance collaboration between scientists through a web application that displays public datasets, offers all the tools required for conventional analysis, and runs on computing resources provided by third-party cloud services. On the pattern-mining side, we look forward to the development of diverse techniques for extracting spatial features from immunofluorescence images and correlating them with the gene expression profile. Nonetheless, several challenges must be addressed: How can the extraction of spatial features be standardized? How should they be interpreted? How confident can we be in generalizations made outside the regions of interest?