Introduction We can deduce important information about a galaxy by studying its morphology. In particular, its shape and features can tell us about its star-forming activity, its evolutionary history, and its interaction with its local environment. In the past, astronomers manually examined images of galaxies taken by astronomical surveys. However, modern large-scale surveys capture so many images that it is no longer possible for professional astronomers to inspect them all by hand. This led to the "citizen-science" project Galaxy Zoo, which allows anyone to analyze random images of galaxies from a given data set and record the visual features they perceive in each galaxy. This has in turn produced large sets of galaxy images labeled by morphology. In this project we aim to design a deep neural network that can classify images in the Galaxy10 DECaLS data set to a high level of accuracy.
Challenge The biggest problem we encountered during this project is that CNN models tend to overfit, a tendency inherent to the nature of neural networks. We observed that the classification error on the training set was significantly lower than the error on the test set, a strong indicator of overfitting: the model was incapable of generalizing its predictions beyond the scope of the training set. We applied data augmentation to artificially expand the size of the training set and also adjusted the CNN architecture, after which the validation error dropped, signaling a better fit.
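The augmentation idea above can be sketched in a few lines. This is a minimal NumPy illustration, not our actual pipeline (which may use library transforms): galaxy morphology is invariant to flips and 90-degree rotations, so each image yields several augmented copies with the same label. The function names `augment` and `expand_dataset` are illustrative, not from our code base.

```python
import numpy as np

def augment(image, rng):
    """Randomly flip and rotate an (H, W) image by a multiple of 90 degrees.
    Real pipelines often add crops, zooms, and brightness jitter as well."""
    if rng.random() < 0.5:
        image = np.fliplr(image)   # horizontal flip
    k = int(rng.integers(0, 4))    # rotate by 0, 90, 180, or 270 degrees
    return np.rot90(image, k)

def expand_dataset(images, labels, copies, seed=0):
    """Expand the training set with `copies` augmented variants per image.
    Labels are unchanged because the transforms preserve morphology."""
    rng = np.random.default_rng(seed)
    aug_images, aug_labels = [], []
    for img, lab in zip(images, labels):
        aug_images.append(img)
        aug_labels.append(lab)
        for _ in range(copies):
            aug_images.append(augment(img, rng))
            aug_labels.append(lab)
    return np.stack(aug_images), np.array(aug_labels)

# tiny demo: 2 fake 4x4 "galaxy" images, 3 augmented copies each
imgs = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
X, y = expand_dataset(imgs, [0, 1], copies=3)
print(X.shape)  # (8, 4, 4)
```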
Insight There are various factors that affect the efficiency, accuracy, and performance of CNN models. By implementing the proposed methods, we gained practical experience in two areas. Firstly, we gained insight into optimizer selection. A common choice for image classification with CNNs is a gradient-based optimization method: the loss function is minimized to yield the optimal parameter set. The original version of gradient descent, however, requires evaluating the gradient over the whole data set at every step, which is prohibitively expensive in our case: our data set is large and the loss function has a complicated analytical form, making full-batch gradient descent impractical. An alternative is stochastic gradient descent, which uses only one instance at a time to estimate the gradient; but convergence is not ensured, since the stochastic gradient only points in the steepest direction on average, and the iterate can bounce back and forth near the optimum. Our insight on optimizer selection is to use a method that balances the fully stochastic and full-batch extremes, such as mini-batch gradient descent. Secondly, we learned more about selecting enhancement methods. To obtain better-quality results, we found it helpful to use methods such as AdaDelta, Adagrad, and momentum. In practice, the combination of AdaDelta with batch gradient descent achieved the best performance for us.
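The trade-off above can be illustrated with a small NumPy sketch of mini-batch gradient descent with momentum. For clarity it minimizes a least-squares loss rather than a CNN loss; the loss shares the relevant structure (an average over examples), so the per-batch gradient estimate and the momentum update carry over. The function name `minibatch_sgd` and all hyperparameter values are illustrative assumptions, not our project's settings.

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=32, lr=0.1, momentum=0.9, epochs=50, seed=0):
    """Mini-batch gradient descent with momentum on the least-squares loss
    L(w) = ||Xw - y||^2 / (2n). Each step estimates the gradient on a batch:
    cheaper than the full data set, less noisy than a single example."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)   # parameters
    v = np.zeros(d)   # momentum (velocity) term
    for _ in range(epochs):
        perm = rng.permutation(n)             # reshuffle each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            v = momentum * v - lr * grad      # heavy-ball momentum update
            w = w + v
    return w

# demo on synthetic noise-free data: the iterate should approach w_true
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = minibatch_sgd(X, y)
print(np.round(w, 2))
```

With `batch_size=n` this reduces to full-batch gradient descent, and with `batch_size=1` to pure SGD, which is exactly the spectrum described above.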
Plan Up to this point, we consider our project roughly on track. The data is processed, the proposed method is largely implemented, and various methods and architectures are being compared to obtain the best model performance. We still need time to polish our presentation and report, and to fix the remaining bugs in our code. We also think the data-analysis code can be optimized for better time complexity. One thing we want to revisit is the data representation, because it is closely tied to data augmentation and even feature selection. We believe batch normalization could be helpful if we have more time.