Project Check-in 3
Introduction: This can be copied from the proposal.
For our project, we are implementing a Cornell University paper that introduces a token-based image recognition model reported to be both more accurate and faster than CNNs. (The paper can be found here.) We chose this model because CNNs are so widely used and are considered a current industry standard, so implementing something faster and more effective was very exciting to all of us.
Challenges: What has been the hardest part of the project you’ve encountered so far?
So far, the hardest part of the project has been understanding concepts that we did not encounter in class. One example is creating a distillation token from a pre-trained CNN: the token extracts the most important patterns the CNN has learned, so that our Visual Transformer (VT) can stay small while still reaching high accuracy. The coding itself does not look overly complicated, so most of our struggles have come from understanding the machinery underneath the model. On the implementation side, our main difficulty is converting the PyTorch implementation into a TensorFlow one.
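To make the distillation idea concrete for ourselves, here is a minimal sketch of one way a distillation token can be trained. It assumes a hard-label objective: the class token is scored against the true label, and the distillation token is scored against the label the pre-trained CNN predicts. The function names, the equal 0.5 weighting, and the choice of the hard-label variant are our assumptions for illustration, not necessarily what the paper or our final code uses. It is written in plain Python so it runs without PyTorch or TensorFlow.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy between softmax(logits) and a one-hot target."""
    return -math.log(softmax(logits)[target_index])


def hard_distillation_loss(class_logits, distill_logits, teacher_logits, true_label):
    """Average of two losses (assumed hard-label variant).

    class_logits:   student output from the class token
    distill_logits: student output from the distillation token
    teacher_logits: output of the pre-trained CNN (teacher)
    true_label:     index of the ground-truth class
    """
    # The teacher's most confident class becomes the target for the distillation token.
    teacher_label = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    loss_true = cross_entropy(class_logits, true_label)
    loss_teacher = cross_entropy(distill_logits, teacher_label)
    return 0.5 * loss_true + 0.5 * loss_teacher


# Hypothetical logits for a 3-class problem.
example_loss = hard_distillation_loss(
    class_logits=[2.0, 0.5, -1.0],
    distill_logits=[0.1, 1.5, 0.3],
    teacher_logits=[0.2, 3.0, 0.1],
    true_label=0,
)
```

When the teacher and the true label disagree, as in the example above, the two tokens are pulled toward different classes, which is how the student can pick up the patterns the CNN has learned in addition to the labels.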
Insights: Are there any concrete results you can show at this point? How is your model performing compared with expectations?
Plan: Are you on track with your project?
What do you need to dedicate more time to?
What are you thinking of changing, if anything?
So far we have created a CNN model with 70% accuracy, which will be the benchmark that we try to beat with the Visual Transformer. We have also started converting the existing PyTorch implementation into TensorFlow and have translated most of the classes. Since the translation is not yet finished, we cannot say how well the VT performs. Right now we feel that we are on track to finish the project on time: we got a somewhat late start, but we have made good progress every time we meet to work on the project. As long as we keep committing the same amount of time until the project’s completion, we should be fine. If we had to pick something specific to focus on, it would be finishing the VT model so that we can start debugging it in case we did not translate everything from PyTorch to TensorFlow correctly. As of right now, we do not have anything that requires changing.
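One translation detail we expect to check while debugging is the weight layout of fully connected layers: PyTorch's nn.Linear stores its weight as (out_features, in_features), while a Keras Dense layer stores its kernel as (in_features, out_features), so copied weights need a transpose. The sketch below imitates both conventions with plain Python lists and compares the outputs. The helper names and the sample numbers are hypothetical and only illustrate the kind of check we have in mind.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def torch_style_linear(x, weight, bias):
    """y = x @ W^T + b, with weight shaped (out_features, in_features)."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weight, bias)]


def keras_style_dense(x, kernel, bias):
    """y = x @ K + b, with kernel shaped (in_features, out_features)."""
    return [
        sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


def outputs_match(a, b, tol=1e-6):
    """True when both outputs have equal length and agree within tol."""
    return len(a) == len(b) and all(abs(p - q) <= tol for p, q in zip(a, b))


# Hypothetical layer: 3 inputs, 2 outputs.
torch_weight = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
bias = [0.5, -0.5]
x = [1.0, 0.0, -1.0]

keras_kernel = transpose(torch_weight)  # the step that is easy to forget
reference = torch_style_linear(x, torch_weight, bias)
translated = keras_style_dense(x, keras_kernel, bias)
layers_agree = outputs_match(reference, translated)
```

Running the same comparison on each translated class, using identical inputs and copied weights, should help us find which layer diverges if the TensorFlow model underperforms.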