Who:
Bitao Jin (bjin8) James Ro (jro3) Christopher Tripp (crtripp)
Introduction:
We can deduce important information about a galaxy by studying its morphology. In particular, its shape and features can tell us about its star-forming activity, its evolutionary history, and its interaction with its local environment. In the past, astronomers have manually examined images of galaxies taken by astronomical surveys. However, modern large-scale surveys capture so many images that it is no longer possible for professional astronomers to manually inspect them all. This has led to the “citizen- science” project Galaxy Zoo, which allows anyone to analyze random images of galaxies from a given data set and record information that they perceive about the galaxy’s visual features. This has in turn led to the establishment of large, labeled (with respect to morphology) sets of galaxy images. In this project we hope to design a deep neural network which is able to classify images in the Galaxy10: DECaLS data set to a high level of accuracy.
Related Work:
Most previous studies of galaxy image classification have used fewer than six classes, but in a 2022 study (https://arxiv.org/abs/2211.00397), Gharat and Dandawate noted that new images are of sufficiently high quality that using a greater number of classes is warranted, since that could take advantage of finer-grain distinctions among morphologies. In their study, they designed a CNN which achieved test accuracy of 84.73% on 21785 images from the Sloan Digital Sky Survey (SDSS) using Galaxy Zoo labels.
In another 2022 study (https://arxiv.org/abs/2211.00677), Ciprijanovic et al. applied off-the-shelf ResNet50 on a subset of the Galaxy10: DECaLS data and achieved 43% classification accuracy.
One of our goals is to build upon these recent results and design a deep neural network which, when applied to the Galaxy10: DECaLS data, achieves a high classification accuracy.
Data:
The Galaxy Zoo Data Release 2 (GZ DR2) dataset consisted of ~270,000 images taken from the Sloan Digital Sky Survey (SDSS), alongside corresponding labels (with respect to 10 broad classes of galaxy morphologies) established through the Galaxy Zoo project. Later, the Galaxy Zoo also established labels for a large number of images taken from the Dark Energy Camera Legacy Survey (DECaLS), which are much higher resolution and image quality compared to the images from SDSS.
We will be using the Galaxy10 DECaLS data set, which contains 17,736 galaxy images (256x256 pixels, colored) from DECaLS, labeled (through Galaxy Zoo) according to 10 classes: Class 0 (1081 images): Disturbed Galaxies Class 1 (1853 images): Merging Galaxies Class 2 (2645 images): Round Smooth Galaxies Class 3 (2027 images): In-between Round Smooth Galaxies Class 4 ( 334 images): Cigar Shaped Smooth Galaxies Class 5 (2043 images): Barred Spiral Galaxies Class 6 (1829 images): Unbarred Tight Spiral Galaxies Class 7 (2628 images): Unbarred Loose Spiral Galaxies Class 8 (1423 images): Edge-on Galaxies without Bulge Class 9 (1873 images): Edge-on Galaxies with Bulge
This data should require relatively little pre-processing, and has been provided at https://github.com/henrysky/Galaxy10
Methodology: We anticipate using a convolutional neural network, given its appropriateness for image classification tasks. We have not yet established the specifics of the architecture.
We will be dividing the 17,736 images from the data set into training, validation, and test sets.
Metrics:
Our primary metric will be classification accuracy on the test set, which we will calculate using argmax and categorical cross-entropy. One goal is to surpass the 43% accuracy achieved by Ciprijanovic et al. using ResNet50 on a modified version of the Galaxy 10 DECaLS data. Another goal is to surpass the 84.73% accuracy achieved by Gharat and Dandawate using a CNN; although Gharat and Dandawate used a data set from SDSS, their data set is of comparable size to ours, and used some of the same Galaxy Zoo labels (ours just uses higher quality images from DECaLS in such cases), so it seems like a somewhat informative comparison.
Ethics:
The results of our project should have little, if any, obvious societal impact, given the object of study.
One issue worth considering is that our labels were provided by crowd-sourcing through a citizen science project. While this shouldn’t introduce any bias issues, it is worth considering the ethics of profiting (in the intellectual sense, not the financial sense) from the labor of hundreds if not thousands of volunteers.
Division of labor:
We anticipate that labor will be divided equally and organically among the three group members as the project proceeds.
Second Check-in | Reflection
The reflection was posted as an update. But another link: https://docs.google.com/document/d/1SooQjTE2KhFK_M0pQXFnBMc5w1mIR7DI0lEbQYEgDKg/edit?usp=sharing
Final Reflection | Write-Up
https://docs.google.com/document/d/1lzMJgbuC2rYswJVTbebGH2BO0BVxImgPpkoxlVQUPLI/edit
Our Video Recording
https://drive.google.com/file/d/1qp8j2NSI3WMJsBWva-VLUZj7xK5G9fjR/view?ts=63964e57
Our Poster
https://docs.google.com/presentation/d/1RHx1ts7_qiU5ITfbLAocp5uhFEa9LutLCd1K_QUXffE/edit#slide=id.p
Built With
- tensorflow
Log in or sign up for Devpost to join the conversation.