We implemented a published image-captioning model that builds on the conventional Transformer. The paper argues that the decoder of a conventional Transformer has no way to judge whether, or how well, the encoder's attention result is related to the query: it processes the attention output as is, even when the candidate vectors carry no worthwhile information at all. The proposed model addresses this with an "Attention on Attention" (AoA) module, which adds an extra attention-gating structure on top of the standard attention. The authors claim that this structure achieves results superior to the conventional baseline Transformer.
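To make the idea concrete, here is a minimal NumPy sketch of the AoA module as described in the paper: standard attention first produces a result, then an "information vector" and a sigmoid "attention gate" are both computed from the query and that result, and the gate filters the information element-wise. All weight names (`Wq_i`, `Wv_i`, `Wq_g`, `Wv_g`, etc.) and the toy dimensions are our own illustrative choices, not taken from the project's actual TensorFlow code.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Standard scaled dot-product attention.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

def attention_on_attention(Q, K, V, params):
    # Conventional attention result (what a plain Transformer would pass on as is).
    V_hat = attention(Q, K, V)
    # Information vector: a linear transform of the query and the attention result.
    I = Q @ params["Wq_i"] + V_hat @ params["Wv_i"] + params["b_i"]
    # Attention gate: sigmoid scores measuring how relevant the result is to the query.
    G = 1.0 / (1.0 + np.exp(-(Q @ params["Wq_g"] + V_hat @ params["Wv_g"] + params["b_g"])))
    # Gated output: irrelevant attention results are suppressed channel-wise.
    return G * I

d = 8  # toy feature dimension
params = {k: rng.normal(scale=0.1, size=(d, d))
          for k in ["Wq_i", "Wv_i", "Wq_g", "Wv_g"]}
params["b_i"] = np.zeros(d)
params["b_g"] = np.zeros(d)

Q = rng.normal(size=(4, d))  # 4 queries
K = rng.normal(size=(6, d))  # 6 candidate vectors
V = rng.normal(size=(6, d))

out = attention_on_attention(Q, K, V, params)
print(out.shape)  # (4, 8)
```

Because the gate is a sigmoid in (0, 1), a near-zero gate lets the model effectively discard an attention result that does not match the query, which is the failure mode of the plain Transformer the paper targets.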

Please find links to the documentation and writeup for this project in the updates section below.

Built With

  • TensorFlow

Updates


We completed building/training our model and submitted our final poster, writeup, and presentation video as a .zip file to Devpost.

The Google Slides deck for our poster can be found here: https://docs.google.com/presentation/d/1ds9ogIzO1TE5ecZWnDjqTGU2qbTA2qJtOVjrak3dJaU/edit?usp=sharing

The writeup can be found here:
https://docs.google.com/document/d/1dUnxpMmp-c9EyahydT1V839UsS9wHCscxJJOEuAq6Ig/edit?usp=sharing
