Geodesic Flow: An Incremental Approach to Dense Semantic Correspondence
Cole Foster: cfoste18
Links:
- Github Repo: https://github.com/cole-foster/geodesic-flow
- Initial Project Proposal (2021-11-12): Check-In #1
- Project Update (2021-11-30): Check-In #2
- Final Project Reflection (2021-12-10): Final Check-In
Introduction
Geodesic Flow is a continuation of work done by Berk Sevilmis within the Laboratory of Engineering Man and Machine Systems (LEMS) in the Department of Electrical and Computer Engineering at Brown University.
The dense correspondence task takes two images, a source and a target, and produces a flow field such that each pixel in the source points to its matching pixel in the target. Semantic correspondence involves two images of the same class, where the goal is to align the two semantic objects.
Deep-learning-based approaches currently dominate the field, as they are able to learn semantic features. This model aims to improve any existing dense correspondence method by providing a series of intermediate images that can be used to incrementally warp the source image to the target image. For this, we construct a Latent Image Manifold by organizing a dataset of images, and we return the geodesic path as the shortest path between two images along the manifold.
In order to construct our manifold, we need a way to measure the similarity of two images. For this, we train an Image Embedding Network. We leverage the features extracted by SFNet to provide semantic awareness, and we add additional convolution and fully-connected layers to reduce the image to a 1024D embedding. The Euclidean distance between these embeddings can then be used as a metric of similarity between the images they were extracted from.
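As a minimal sketch of this idea (plain NumPy, with hypothetical placeholder vectors; the real embeddings come from EmbeddingNet):

```python
import numpy as np

def embedding_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean (L2) distance between two 1024-D image embeddings.
    Smaller distance = more similar images."""
    return float(np.linalg.norm(a - b))

# Two placeholder embeddings: identical vectors have distance 0.
e1 = np.zeros(1024)
e2 = np.ones(1024)
print(embedding_distance(e1, e1))  # 0.0
print(embedding_distance(e1, e2))  # 32.0 (sqrt(1024))
```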
With the image embeddings, we construct the Latent Image Manifold over the SPair-71k train/validation dataset. We extract the embeddings from each image and create the manifold using the Relative Neighborhood Graph. Finally, given a source and target image, we can return the geodesic path by finding their nearest neighbors in the manifold and computing the shortest path along it. The images can then be incrementally warped along this path to compute the Geodesic Flow.
Related Work
The dense correspondence problem is an active field, with state-of-the-art methods including Transformers (CATs, 2021) and graph matching (Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers, 2020).
SFNet is a slightly older CNN-based approach to the dense semantic correspondence problem. It uses ResNet-101 feature extraction and trains adaptation layers using binary foreground masks. These masks restrict the loss to the semantic objects, allowing SFNet to learn semantic features. SFNet then builds a 20x20x20x20 correlation map that is matched to produce a 20x20 flow field, which is bilinearly interpolated to return the final flow field. Github: https://github.com/cvlab-yonsei/SFNet
This project uses SFNet for a variety of purposes:
- Feature extraction for the Embedding Network
- Its loss term is used to train the Embedding Network
- Pretrained SFNet is used for the incremental warps
Data
- VOC2012. SFNet was trained on the segmentation set from VOC2012. It provides 2,791 images (training and validation) with segmentations. These segmentations are used to produce the binary foreground masks. This data is also used to train the Embedding Network.
- SPair-71k. This is a popular dataset used for evaluating dense semantic correspondences. It contains 1,800 images, and the 1,304 training/validation images are used to construct the Latent Image Manifold.
- PF-PASCAL. This is a popular benchmark for dense semantic correspondence evaluation. It is used to evaluate our method's accuracy by PCK (percentage of correct keypoints).
Methodology
Image Embedding Network
To provide a measure of image similarity, we decided to produce an Image Embedding Network (creatively named EmbeddingNet). The similarity (or dissimilarity) of two images can be measured by the Euclidean distance between their embeddings. When approaching the architecture of this EmbeddingNet, we wanted the following characteristics:
- A 1024D embedding vector is produced.
- This embedding is dependent on semantic features, not background noise.
EmbeddingNet was trained in a potentially novel way. Typically, embedding networks are trained with triplet loss. This requires specifying three images: two of the same class (anchor and positive) and one of a different class (negative). The loss function enforces that the distance between the anchor and positive embeddings is less than the distance between the anchor and negative embeddings.
However, our semantic flow problem inherently deals with images of the same class, so this training scheme is not directly applicable. The following solution was proposed:
- Take three images of the same class. Choose one image as the anchor image.
- The other two images are the positive and negative images, but the assignment is unknown.
- Use an existing image similarity metric to decide which of the two images is more similar to the anchor image. This image becomes the positive; the other becomes the negative.
- Use triplet loss on these three images.
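The procedure above can be sketched as follows. This is a NumPy stand-in, not the released code: `warp_loss` is a hypothetical callable playing the role of SFNet's mask-based loss, and the triplet loss is the standard margin formulation.

```python
import numpy as np

def triplet_loss(ea, ep, en, margin=1.0):
    """Standard triplet margin loss on embedding vectors: push the
    anchor-positive distance below the anchor-negative distance."""
    d_pos = np.linalg.norm(ea - ep)
    d_neg = np.linalg.norm(ea - en)
    return max(d_pos - d_neg + margin, 0.0)

def assign_positive_negative(anchor, img1, img2, warp_loss):
    """Decide positive/negative among two same-class candidates using an
    external warp loss (a stand-in for SFNet's mask-based loss):
    the candidate with the lower warp loss is treated as the positive."""
    if warp_loss(anchor, img1) <= warp_loss(anchor, img2):
        return img1, img2
    return img2, img1
```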
In this project, pretrained SFNet was used to provide the image similarity metric. In SFNet training, binary foreground masks are used to compute the loss of a correspondence. Here, the intuition is that the lower the loss, the better the warp, and thus the more similar the images. This loss was computed for each of the two candidate images against the anchor to decide the assignment.
Latent Image Manifold
Once the image embeddings are extracted, they can be compared by Euclidean distance, or the 2-norm. The nearest neighbor of a query image can be returned as the image whose embedding is closest to the query embedding.
The latent image manifold was constructed over the 1,304 images from the SPair-71k train/validation set. Embeddings were extracted from each image, and each embedding acts as a node in the 1024-D latent space. The Relative Neighborhood Graph was used to construct the manifold.
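The Relative Neighborhood Graph keeps an edge (i, j) only when no third point is strictly closer to both endpoints than they are to each other. A brute-force sketch of the construction (O(n^3), which is tolerable for roughly 1,300 embeddings; the actual pipeline may use a faster method):

```python
import numpy as np

def relative_neighborhood_graph(X: np.ndarray) -> set:
    """Build the Relative Neighborhood Graph over embeddings X (n, d).
    Edge (i, j) exists iff no third point k satisfies
    max(d(i, k), d(j, k)) < d(i, j)."""
    n = X.shape[0]
    # Pairwise Euclidean distance matrix.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            blocked = any(max(D[i, k], D[j, k]) < D[i, j]
                          for k in range(n) if k != i and k != j)
            if not blocked:
                edges.add((i, j))
    return edges
```

For three collinear points, the middle point blocks the long edge, so only consecutive points are connected.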
Geodesic Flow
Since the Relative Neighborhood Graph is a connected graph, we can always define a shortest path between two embeddings on the manifold. Given a source and target image, we find their nearest neighbors on the manifold and return the shortest path between them as the geodesic path.
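The shortest path can be found with Dijkstra's algorithm, weighting each edge by the Euclidean distance between the embeddings it connects. A sketch, assuming the manifold is given as a set of undirected edges over embedding indices:

```python
import heapq
import numpy as np

def geodesic_path(edges: set, X: np.ndarray, src: int, dst: int) -> list:
    """Dijkstra shortest path from src to dst over an undirected edge set,
    with edge weights = embedding distances. Assumes dst is reachable."""
    adj = {}
    for i, j in edges:
        w = float(np.linalg.norm(X[i] - X[j]))
        adj.setdefault(i, []).append((j, w))
        adj.setdefault(j, []).append((i, w))
    dist = {src: 0.0}
    prev = {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    # Walk predecessors back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```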
We warp consecutive images along the path with SFNet (or any other method) to produce a series of incremental warps, which are composed in order to produce the final flow field. This flow is called the Geodesic Flow.
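A minimal sketch of composing two consecutive flow fields: the total offset at a pixel is its first offset plus the second flow sampled at the warped location. Nearest-neighbor sampling and an `(dy, dx)` offset convention are simplifying assumptions here; the actual pipeline uses SFNet's bilinearly interpolated flows.

```python
import numpy as np

def compose_flows(flow_ab: np.ndarray, flow_bc: np.ndarray) -> np.ndarray:
    """Compose two dense flow fields of shape (H, W, 2), each storing a
    (dy, dx) offset to the matching pixel. The result maps A -> C."""
    H, W, _ = flow_ab.shape
    out = np.zeros_like(flow_ab)
    for y in range(H):
        for x in range(W):
            dy, dx = flow_ab[y, x]
            # Sample flow_bc at the warped location (clamped, nearest pixel).
            yb = int(round(min(max(y + dy, 0), H - 1)))
            xb = int(round(min(max(x + dx, 0), W - 1)))
            out[y, x] = flow_ab[y, x] + flow_bc[yb, xb]
    return out
```

Composing two constant flows of (1, 0) and (2, 0) yields a constant flow of (3, 0), as expected.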
Metrics
The metric used to measure success in dense semantic correspondence is the Percentage of Correct Keypoints (PCK). In the test set, hand annotations identify semantic keypoints in the images. The keypoints of the warped source image are then compared to the keypoints of the target image, and PCK measures the fraction of keypoints that land within a given distance threshold of their targets.
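A sketch of PCK under the usual convention, where the threshold is alpha times a reference size (typically the larger dimension of the image or the object bounding box; exact choices vary by benchmark):

```python
import numpy as np

def pck(pred_kps: np.ndarray, gt_kps: np.ndarray,
        ref_size: float, alpha: float = 0.1) -> float:
    """Percentage of Correct Keypoints: a predicted keypoint counts as
    correct if it lies within alpha * ref_size of its ground truth.
    pred_kps and gt_kps are (n, 2) arrays of (x, y) coordinates."""
    dists = np.linalg.norm(pred_kps - gt_kps, axis=1)
    return float(np.mean(dists <= alpha * ref_size))
```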
Ideally, success of this project would show higher PCK on PF-PASCAL, PF-WILLOW, and SPair-71k when using Geodesic Flow compared to not using it. We directly compare SFNet with and without Geodesic Flow.
Ethics
Deep learning is a good fit for producing image embeddings because models can learn semantic features from large sets of semantic images. Image embeddings are a low-risk area, as they are mostly used for retrieval. However, prior work on word embeddings has shown that bias can exist in embeddings; this bias could surface in the geodesic paths between images.
We measure the success of this algorithm by PCK. Since we use SFNet as a base, success requires outperforming plain SFNet on PCK. Fortunately, semantic correspondence is a low-risk topic, so the implications of error or success do not extend far.
Division of Labor
Cole
- Creating EmbeddingNet Architecture.
- Training EmbeddingNet based on Triplet Loss Function.
Cole Foster
- Manifold Creation using EmbeddingNet on SPair-71k
- Full Pipeline from dataset to embeddings to manifold, and Saving/Loading it.
- Nearest Neighbor Search and Geodesic Path Along Images
Cole Riley Foster
- Incremental Warping via Geodesic Path with normal SFNet
- Parent Geodesic Flow Class to handle NNS, Geodesic Path, and Incremental Warp
- Evaluation on PF-PASCAL