Characterizing Normal Pregnancy in Relation to In Utero Arsenic Exposure
The goal of this project is to characterize gene expression in the context of pregnancy and environmental exposures. I chose to focus on arsenic, correlating urine and drinking-water arsenic levels with transcriptomic data to identify genes whose expression might be perturbed by arsenic exposure.
Cross-sectional and longitudinal data were drawn from studies in the NCBI GEO database that matched the criteria above. Searching with the keywords preterm, pregnancy, prenatal, and postnatal returned a total of 165 studies. Of these, 16 studies with an identifiable number of normal/term samples were selected, for a total of 793 samples. Seven of these studies were longitudinal (196 samples) and nine were cross-sectional (597 samples).
From there, I analyzed one experimental study to identify genes associated with in utero arsenic exposure. I extracted the GSE data using GEOmetadb, a GEO microarray search tool, and normalized the Affymetrix data from the .CEL files. Using the R programming language, I looped through every gene row in the gene annotation file and computed the correlation between that gene's expression and the arsenic values across all 38 patients. The coefficients were stored in an array, and the largest one identified the gene whose expression tracks arsenic levels most closely, that is, the gene most strongly correlated with arsenic exposure.
The highest correlation coefficient is 0.6419006, found at gene index 21662, which corresponds to FAXC, a protein-coding gene. FAXC is therefore the gene most strongly correlated with in utero arsenic exposure in this dataset.
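The correlation search described above can be sketched in a few lines. The original analysis was done in R; the version below is a minimal Python illustration of the same idea, not the code used in the project. The function name, the toy expression values, and the gene labels are all hypothetical, and Pearson correlation is assumed because the text does not say which coefficient was used.

```python
import numpy as np

def most_correlated_gene(expression, arsenic, gene_names):
    """Return (index, name, r) for the gene whose expression has the
    highest Pearson correlation with the arsenic measurements."""
    expression = np.asarray(expression, dtype=float)  # rows = genes, columns = samples
    arsenic = np.asarray(arsenic, dtype=float)        # one value per sample
    correlations = np.full(expression.shape[0], np.nan)
    for i, row in enumerate(expression):
        # Correlation is undefined for a gene with constant expression; leave it as NaN.
        if np.std(row) == 0:
            continue
        correlations[i] = np.corrcoef(row, arsenic)[0, 1]
    best = int(np.nanargmax(correlations))  # ignores the NaN entries
    return best, gene_names[best], float(correlations[best])

# Toy data (hypothetical): 3 genes measured in 5 samples.
toy_expression = [
    [5.0, 5.1, 4.9, 5.0, 5.2],  # roughly flat
    [1.0, 2.0, 3.0, 4.0, 5.0],  # rises linearly with arsenic
    [9.0, 7.0, 6.0, 4.0, 2.0],  # falls as arsenic rises
]
toy_arsenic = [0.5, 1.0, 1.5, 2.0, 2.5]
toy_names = ["GENE_A", "GENE_B", "GENE_C"]

index, name, r = most_correlated_gene(toy_expression, toy_arsenic, toy_names)
print(index, name, round(r, 3))
```

Python indexes from 0 and R from 1, so an index reported by this sketch would be one lower than the corresponding R row number.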
In the future, I would like to rank the genes by correlation value and perform a meta-analysis using the remaining identified studies. This would allow gene correlations to be compared across different conditions, such as pre-eclampsia and choline intake, with samples taken from a range of body tissues, and would give a more complete understanding of how these genes relate to term pregnancy.
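As a rough illustration of the proposed ranking step, the sketch below computes a Pearson coefficient for every gene at once and sorts the genes by the absolute value of that coefficient. It is written in Python rather than R, uses hypothetical names and toy data, and sorting by absolute value is an assumption on my part: it treats strong negative correlations as being just as interesting as strong positive ones.

```python
import numpy as np

def rank_genes_by_correlation(expression, exposure, gene_names):
    """Return [(gene_name, r), ...] sorted by |r|, strongest first.
    Genes with constant expression are dropped (correlation undefined)."""
    expression = np.asarray(expression, dtype=float)  # rows = genes, columns = samples
    exposure = np.asarray(exposure, dtype=float)      # one value per sample
    # Center each gene row and the exposure vector.
    x = expression - expression.mean(axis=1, keepdims=True)
    y = exposure - exposure.mean()
    denom = np.sqrt((x ** 2).sum(axis=1) * (y ** 2).sum())
    valid = denom > 0
    r = np.full(expression.shape[0], np.nan)
    r[valid] = (x @ y)[valid] / denom[valid]
    # NaN entries sort last and are then filtered out.
    order = [int(i) for i in np.argsort(-np.abs(r), kind="stable") if valid[i]]
    return [(gene_names[i], float(r[i])) for i in order]

# Toy data (hypothetical): 4 genes measured in 5 samples.
toy_expression = [
    [5.0, 5.1, 4.9, 5.0, 5.2],  # weak positive association
    [1.0, 2.0, 3.0, 4.0, 5.0],  # perfect positive association
    [9.0, 7.0, 6.0, 4.0, 2.0],  # strong negative association
    [3.0, 3.0, 3.0, 3.0, 3.0],  # constant, so it is dropped
]
toy_exposure = [0.5, 1.0, 1.5, 2.0, 2.5]
toy_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

ranked = rank_genes_by_correlation(toy_expression, toy_exposure, toy_names)
for gene, coefficient in ranked:
    print(gene, round(coefficient, 3))
```

The same ranking could then be repeated for each study before combining the results in a meta-analysis.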
I worked on this project for 12 weeks, from early June to the end of August. The opportunity came about when I emailed Professor Marina Sirota at the University of California, San Francisco (UCSF). I first needed to pass a test: I had two weeks to learn the R programming language and to demonstrate it by performing a statistical analysis of data. I worked under both Professor Sirota and her postdoc, Hongtai Huang. Under their guidance, I collected the datasets and performed the statistical analysis of the gene expression correlations. I then presented my findings to the UCSF Sirota Lab team, who encouraged me to continue the research toward publication of a paper.