Deep Image clustering of Mars Orbital Survey (DeIMOS)

Project Members: George Hu (ghu5), Joe Han (qhan3), Lee Ding (lding9)


Introduction:

As humans turn their sights toward exploring worlds beyond Earth, understanding the landscapes of other planetary bodies will play a crucial role in our understanding of our planetary neighbors and our prospects for future missions to them. NASA has provided 73,031 Martian landmark images from the High Resolution Imaging Science Experiment (HiRISE). While some of the images are labeled, many of the landmark images are unclassified but thought to be of importance. Given this data, we are interested in developing a deep learning-based clustering model for generating class labels for these unclassified images.

Related Work:

Chang et al., "Deep Adaptive Image Clustering" (ICCV 2017): https://openaccess.thecvf.com/content_ICCV_2017/papers/Chang_Deep_Adaptive_Image_ICCV_2017_paper.pdf

Data:

We will be using a NASA High Resolution Imaging Science Experiment (HiRISE) dataset that has been made accessible to the general public. The dataset contains 73,031 images of various Martian geographic landmarks taken by the HiRISE camera aboard NASA's Mars Reconnaissance Orbiter. The images should be comprehensive and representative of the Martian surface, as the orbiter occupies a near-polar orbit and is therefore able to see all areas of Mars at some point during its mission. In the dataset, 10,433 of the landmark images were extracted from HiRISE browse images, while the remaining 62,598 landmark images were derived from those 10,433 originals. While some of the landmark images are already labeled with defined classes, the majority are classified as "other," meaning they fit none of the defined classes of interest. Each landmark image was resized to 227x227 pixels, and additional landmark images were produced using rotation, flipping, and random brightness adjustment. Thus, we do not need extra preprocessing to augment the images, but we still plan to take random samples from them using a package like PIL or OpenCV.
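As a sketch of that sampling step, the following takes a random square patch from an image array using NumPy slicing (PIL's `Image.crop` or OpenCV indexing would work the same way). The `random_crop` helper and the 64x64 patch size are illustrative assumptions, not fixed choices:

```python
import numpy as np

def random_crop(image: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """Take a random square crop of side `size` from an H x W (x C) image array."""
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError("crop size exceeds image dimensions")
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return image[top:top + size, left:left + size]

# Example: sample a 64x64 patch from a 227x227 HiRISE-sized image.
rng = np.random.default_rng(0)
patch = random_crop(np.zeros((227, 227)), 64, rng)
print(patch.shape)  # (64, 64)
```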

Methodology:

We plan to divide our model into several parts in order to develop a sufficiently explanatory clustering model. The main architecture we will implement is Deep Adaptive Clustering (DAC), which uses a standard convolutional network to map each image to a feature vector; the model's prediction for a pair of images is the cosine similarity between their feature vectors, which is then used to decide whether the two images belong in the same cluster. To generate the final clusters, this pairwise comparison is performed between randomly sampled pairs across all the images, and images in clusters that are too small are reassigned to the clusters with the greatest similarity.
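The pairwise-similarity step described above can be sketched as follows, assuming feature vectors have already been produced by the convolutional network; `cosine_similarity_matrix` is a hypothetical helper, not part of the DAC reference code:

```python
import numpy as np

def cosine_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between rows of an (n, d) feature matrix."""
    # Normalize each row to unit length, then a dot product is cosine similarity.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T

# Three toy 2-d "image features": the third is equally similar to the first two.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sim = cosine_similarity_matrix(feats)
print(np.round(sim, 3))
```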

To train this, however, we need initial ground-truth labels for whether images belong in the same cluster. One approach is to initialize our model as a pre-trained (or even randomly initialized) convolutional network and periodically generate feature vectors from it. Pairs of vectors that are very similar or very dissimilar then serve as examples of images in the same cluster or in different clusters, respectively. Generating these training examples and training on them will be done in an alternating fashion. Another addition we are considering is making use of our labeled images: we will include labeled landmark image pairs as a supplement to our training data.
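A minimal sketch of turning similarities into training pairs, assuming an upper and a lower similarity threshold (the specific threshold values here are illustrative; in practice they would be tuned or adapted over the course of training):

```python
import numpy as np

def select_training_pairs(sim: np.ndarray, upper: float, lower: float):
    """Label pairs with similarity >= `upper` as same-cluster and pairs with
    similarity <= `lower` as different-cluster; ambiguous pairs are left out."""
    iu = np.triu_indices(sim.shape[0], k=1)  # each unordered pair once
    pos = [(int(i), int(j)) for i, j in zip(*iu) if sim[i, j] >= upper]
    neg = [(int(i), int(j)) for i, j in zip(*iu) if sim[i, j] <= lower]
    return pos, neg

sim = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
pos, neg = select_training_pairs(sim, upper=0.8, lower=0.3)
print(pos, neg)  # [(0, 1)] [(0, 2), (1, 2)]
```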

Most of the difficulties that we expect to encounter will be related to implementing this relatively complex architecture, along with the training pipeline that requires alternating between generating training pairs and then doing the actual training. There could be some issues related to our final goal of predicting interpretable clusters; our model is quite expensive and does not have guarantees in terms of interpretable clusters, but we think that even if the results are poor there are many insights to be gained.

Metrics:

For clustering, various metrics are commonly used to evaluate the "quality" of a clustering when ground-truth labels are unavailable. These include the Davies-Bouldin index, the Dunn index, and the silhouette coefficient, and we plan to use all three to make quantitative evaluations of our clusters. For the final experiments, we will use either a large subset or all of the unlabeled data (depending on memory and efficiency constraints) and generate our predicted clusters. In the reference paper for Deep Adaptive Clustering, the authors used the Adjusted Rand Index and mutual information measures, but these depend on ground-truth labels, which we do not have.
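Two of these metrics could be computed with scikit-learn as sketched below (the Dunn index has no scikit-learn implementation and would need to be coded by hand); the two-blob toy features stand in for real image features:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Toy stand-in for extracted image features: two well-separated 10-d blobs.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0, 0.1, (50, 10)),
                      rng.normal(5, 0.1, (50, 10))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

db = davies_bouldin_score(features, labels)   # lower is better
sil = silhouette_score(features, labels)      # higher is better, in [-1, 1]
print(db, sil)
```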

Our goals are more related to interpretability than to thresholds for quantitative metrics, but a baseline we can use is applying K-Means clustering naively to the dataset. Our target goal is a modest improvement over K-Means, and our stretch goal is a significantly larger improvement. For interpretability, we hope that images within a cluster share notable geological similarities and that images in different clusters show geological differences.

Ethics:

Why is Deep Learning a good approach to this problem?

The publicly available NASA HiRISE dataset alone contains over 70,000 images. As future missions gather more data, the task of analyzing all of it becomes increasingly labor-intensive and tedious. Compounding the difficulty, the Martian landscape is unfamiliar to humans and less intuitively differentiable. A deep learning approach will take advantage of existing data to efficiently classify new surface features as they are documented.

Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm?

Currently, Mars researchers have the primary interest in solutions to this problem. Within the next few decades, though, we could conceivably see interest from various groups in both the private sector and at the nation-state level with goals such as resource extraction or colonization on Mars.

Division of Labor:

While we are still fleshing out the specific details of this project, we plan to divide up the development of our model such that George will be tasked with building and training the model and Joe and Lee will be in charge of building the data pipeline and preprocessing. These divisions are not completely set in stone, and we will adjust our roles accordingly as we continue to work on the project.

Updates:


Challenges:

The k-means clustering algorithm we are using as a baseline comparison for our model is rather weak. Although it will certainly provide a better-than-random comparison point for our performance metrics, it might be hard to quantify model improvements that are significantly better than k-means. In addition, the deep adaptive clustering model's performance will vary strongly with the cluster thresholds and other hyperparameters. Given the seemingly arbitrary nature of the initial clusters and the large computational cost, choosing these parameters will pose challenges.

Insights:

So far, we have extracted parts of our dataset to gain a better understanding of how it is formatted and what modifications we might need to make for it to work with our model. Although the preprocessing stage is not yet done, we have laid most of the groundwork for completing it.

We have also run a k-means clustering algorithm on part of our dataset, so we have a better idea of what our minimum performance baseline looks like and what number of clusters best fits the unlabeled data. For this, we implemented a feature extractor using MobileNet followed by PCA, which reduces the dimensionality to 10 before running k-means. With a small sample of ~800 unlabeled images, the silhouette score, which ranges from -1 to 1 (-1 means the worst possible cluster fit, 0 means similar to random noise, and 1 means an optimal fit), falls between 0.2 and 0.3 for most values of k, indicating a difficult task ahead. The highest score occurs at k=3, corresponding to three broadly defined clusters. This basic intuition will be quite helpful in guiding our full model implementation using deep adaptive clustering.
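That k-sweep can be sketched as follows, with random vectors standing in for the MobileNet embeddings (the real pipeline would substitute the extracted features; the dimensions and the range of k here are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Random features stand in for MobileNet embeddings of the image sample.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 128))

# Reduce to 10 dimensions with PCA, then score k-means for a range of k.
reduced = PCA(n_components=10, random_state=0).fit_transform(embeddings)
scores = {k: silhouette_score(reduced,
                              KMeans(n_clusters=k, n_init=10,
                                     random_state=0).fit_predict(reduced))
          for k in range(2, 8)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```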

Plan:

As of now, we believe that we are on track with our project. We plan to stick to our existing project plan, where we compare our model against a baseline k-means clustering algorithm to determine a minimal performance threshold. Once we meet that threshold in various metrics such as the Silhouette score, we will proceed by qualitatively examining our clustering for effectiveness.

While we are currently working on our project on local machines, we are likely going to move our work to a cloud provider like Google Cloud Platform or AWS once we have some concrete results. Currently, we plan to devote more time to laying the groundwork for an easy transition to cloud computing once our model is ready.
