Sequence to Sequence learning is a common paradigm in machine learning that has been successfully applied to many real-world problems, such as machine translation, automated video captioning, and text summarization. Functional MRI also captures sequential information in the brain; however, it remains largely unknown how sequential sensory input such as text, video, or audio can be used to predict those dynamics. In this work, I apply Sequence to Sequence learning to the problem of predicting functional MRI connectivity.
I hope this can be useful in clinical, entertainment, and research applications as we all continue to uncover the mysteries of the human brain.
What it does
The model currently takes text data as input and predicts how connectivity in the brain will change in response to that input. With a simple web-based widget, users can explore how different kinds of text input influence brain connectivity.
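One way to represent "how connectivity will change" is as a symmetric matrix of deltas over component pairs, with the model predicting only the unique upper-triangle values. This is a minimal sketch of that output representation; the function name and the 5-component toy example are my own illustration, not part of the actual model.

```python
import numpy as np

def vector_to_connectivity_delta(pred, n_components):
    """Reshape a flat vector of predicted upper-triangle values into a
    symmetric connectivity-change matrix (diagonal left at zero)."""
    delta = np.zeros((n_components, n_components))
    iu = np.triu_indices(n_components, k=1)  # indices above the diagonal
    delta[iu] = pred
    delta += delta.T  # mirror into the lower triangle for symmetry
    return delta

# A 5-component parcellation has 5 * 4 / 2 = 10 unique component pairs.
pred = np.arange(10, dtype=float)
delta = vector_to_connectivity_delta(pred, 5)
print(delta.shape)                  # (5, 5)
print(np.allclose(delta, delta.T))  # True
```

Predicting only the upper triangle keeps the output dimension at C(C-1)/2 instead of C², which matters once the number of ICA components grows.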
How I built it
The model is a Sequence to Sequence transformer, pretrained on a natural language translation task. I accumulated several datasets of subjects reading, watching video, and listening to audio from the OpenNeuro open-access database, preprocessed these images on an AWS instance, and then trained the model on a Google Cloud instance with an NVIDIA GPU.
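Building Seq2Seq training pairs from this data requires aligning the stimulus text with the fMRI connectivity timeline. The sketch below shows one simple alignment scheme I could use, assuming word onset times and connectivity window boundaries in seconds; the function name and toy stimulus are hypothetical, not the actual pipeline code.

```python
def align_text_to_windows(words, onsets, windows):
    """For each (start, end) connectivity window, gather the stimulus words
    whose onset falls inside it. Each word list can then be paired with that
    window's connectivity state as one Seq2Seq training example."""
    return [
        [w for w, t in zip(words, onsets) if start <= t < end]
        for start, end in windows
    ]

# Toy stimulus: five words with onset times (seconds) and two 5 s windows.
words = ["the", "brain", "is", "a", "network"]
onsets = [0.5, 1.5, 4.0, 6.0, 8.5]
windows = [(0.0, 5.0), (5.0, 10.0)]
print(align_text_to_windows(words, onsets, windows))
# → [['the', 'brain', 'is'], ['a', 'network']]
```

Hemodynamic lag would argue for shifting the windows a few seconds relative to the stimulus before pairing, which this sketch leaves out.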
The trained model is then served via a React app that lets users explore how text data, and perhaps other sequential data if my remaining models finish training, influences changes in brain connectivity.
All of the data was processed entirely through AWS Batch, with about 1,000 MRI volumes ultimately processed by the service. I implemented a Docker image for the TReNDS center's Group ICA and dFNC analysis toolbox in order to compute ICA components and functional connectivity states for all of the volumes.
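For intuition, the core of a dFNC-style analysis is sliding-window correlation over ICA component timecourses. This is a minimal NumPy sketch of that idea, not the TReNDS toolbox's actual implementation; the function name, window length, and stride are assumptions for illustration.

```python
import numpy as np

def sliding_window_fnc(timecourses, window=30, stride=5):
    """Dynamic functional network connectivity sketch: correlate ICA
    component timecourses within each sliding window, keeping the upper
    triangle of every windowed correlation matrix.

    timecourses: (T, C) array of T timepoints for C components.
    Returns an array of shape (n_windows, C*(C-1)/2).
    """
    T, C = timecourses.shape
    iu = np.triu_indices(C, k=1)  # unique component pairs
    out = []
    for start in range(0, T - window + 1, stride):
        corr = np.corrcoef(timecourses[start:start + window].T)
        out.append(corr[iu])
    return np.array(out)

rng = np.random.default_rng(0)
tc = rng.standard_normal((200, 10))  # 200 timepoints, 10 components
fnc = sliding_window_fnc(tc)
print(fnc.shape)  # (35, 45): 35 windows, 45 component pairs
```

The toolbox then clusters these windowed connectivity vectors into a small set of recurring connectivity states, which is what the Seq2Seq model is trained to predict.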
Challenges I ran into
Memory and CPU restrictions on AWS make processing a large amount of neuroimaging data in a short time span difficult. Deep neural networks are also notoriously difficult to train, especially under time and resource constraints. I started training a bit too late, since I first had to gather and preprocess all of the data on AWS.
The model needs a lot of tweaking, and I had grand plans for elaborate architectures, pretraining schemes, etc.; however, these went out the window due to time constraints. Still, the model works well enough as a simple adaptation of Seq2Seq that I think extending and improving it would make a good research paper.
I wish I had more time to work on visualization and presentation.
Accomplishments that I'm proud of
The sleekest part of my pipeline is the preprocessing, handled through AWS Batch, ECR, and ECS. Getting ICA and FNC measures for this many subjects with a day of coding and a night of running is awesome.
I'm also glad I got a model actually training and reaching a decent loss.
What I learned
I basically learned AWS from the ground up, and I'm glad I got something working so well in the time available.
What's next for NeuroSeq
Extending the model, making it presentable, and improving results.