We implemented a published paper on an image-captioning neural network model that builds on the conventional Transformer. The paper argues that the decoder of the conventional model has no measure of whether, or how well, the encoder's attention result is related to the query: it processes the attention output as is, even when the candidate vectors contain no worthwhile information at all. The proposed model addresses this with an "Attention on Attention" (AoA) module, which adds an extra attention-related structure that gates the attention result according to its relevance to the query. The authors claim that this structure achieves results superior to those of the conventional baseline Transformer.
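The core AoA idea can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the weight matrices are random placeholders standing in for learned parameters, and the function and variable names are ours, not the authors'. The module concatenates the attention result with the query, derives an "information vector" and a sigmoid "attention gate" from the concatenation, and multiplies them elementwise so that irrelevant attention output can be suppressed:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_on_attention(query, att_result, seed=0):
    """Gate the attention result by its relevance to the query.

    query:      (d,) query vector
    att_result: (d,) output of a conventional attention module
    The weights below are random placeholders; in the actual model
    they are learned parameters.
    """
    d = query.shape[0]
    rng = np.random.default_rng(seed)
    # Information vector: linear transform of [attention result; query]
    W_i = rng.standard_normal((d, 2 * d)) * 0.1
    b_i = np.zeros(d)
    # Attention gate: a second linear transform followed by a sigmoid
    W_g = rng.standard_normal((d, 2 * d)) * 0.1
    b_g = np.zeros(d)

    concat = np.concatenate([att_result, query])
    info = W_i @ concat + b_i
    gate = sigmoid(W_g @ concat + b_g)
    # Elementwise gating: gates near zero suppress attention output
    # that carries little information relevant to the query.
    return gate * info
```

When the candidate vectors carry nothing useful, the learned gate can drive the output toward zero instead of forwarding a meaningless attention result to the decoder, which is the failure mode the paper identifies in the conventional Transformer.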
Please find links to the documentation and writeup for this project in the updates section below.
Built With
- tensorflow

