The architecture of our application


Short videos are now a popular way to entertain and to show off one's charisma on social media such as Instagram. For example, many face-effect apps have emerged and drawn great public attention. In contrast, few apps are designed to apply effects to the whole body. Apps of this kind could encourage more social interaction, as TikTok does, and create more fun among friends. However, people can spend a lot of time editing their videos before posting them to social media, which reduces their willingness to shoot a video in the first place. It is therefore vital to speed up the process, and even to replace some clips with funny ones automatically.

Source from

What it does

For this reason, we developed a system that calculates motion similarity. Motion, represented as a series of keypoints, is first collected through PosNet, built on PyTorch. The video clip whose motion is most similar then replaces the original clip, giving the edited video an amazing transition effect.
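The matching step can be sketched roughly as follows. This is only an illustration, not our actual implementation: the function names (`frame_similarity`, `clip_similarity`, `most_similar_clip`), the choice of cosine similarity, and the clip names in the usage example are all assumptions, and the keypoints are assumed to be already normalized (x, y) pairs.

```python
import math


def frame_similarity(a, b):
    """Cosine similarity between two frames, each a list of (x, y) keypoints.

    Hypothetical helper: cosine similarity is an assumed metric.
    """
    va = [c for point in a for c in point]
    vb = [c for point in b for c in point]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def clip_similarity(clip_a, clip_b):
    """Mean per-frame similarity over the frames that both clips share."""
    n = min(len(clip_a), len(clip_b))
    if n == 0:
        return 0.0
    return sum(frame_similarity(clip_a[i], clip_b[i]) for i in range(n)) / n


def most_similar_clip(query, database):
    """Return the name of the database clip most similar to the query clip.

    `database` maps a clip name to its list of frames.
    """
    return max(database, key=lambda name: clip_similarity(query, database[name]))


# Usage with made-up two-keypoint clips of two frames each.
query = [[(1, 0), (0, 1)], [(1, 1), (0, 1)]]
database = {
    "wave": [[(2, 0), (0, 2)], [(2, 2), (0, 2)]],  # same motion, larger scale
    "jump": [[(0, 1), (1, 0)], [(0, 1), (1, 1)]],  # different motion
}
best = most_similar_clip(query, database)
```

Averaging frame by frame keeps the sketch short, but it only works when the clips have comparable lengths and frame rates.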

How we built it

The system is divided into a frontend and a backend. We developed an iOS app as the user interface, which lets users shoot their own videos. The video is then sent to the backend for further processing, and the result is eventually displayed in the frontend (the iOS app). The backend is responsible for two tasks: pose prediction and automatic video editing. The PosNet weights come from this GitHub repo, which is built on PyTorch. All image-processing functions, including basic I/O, are implemented in Python. The system is deployed through ngrok and can be accessed through the iOS app.

Challenges we ran into

First of all, we had to test and determine a proper video clip length to serve as the basic unit for both data transmission and image processing. In addition, preparing the video database was time-consuming, because we kept only videos with few people and a clean background. Finally, the biggest challenge was the similarity calculation itself, because targets in different videos differ in scale and angle. An additional transform matrix has to be applied to align their keypoints before comparison.
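One way to handle the scale and angle differences is a least-squares similarity transform (translation, rotation, and uniform scale) between two sets of keypoints. The sketch below is an assumption about how such an alignment could look, not the code we shipped. The names `align_keypoints` and `alignment_error` are hypothetical, and a single complex factor stands in for the 2D rotation-and-scale matrix.

```python
import math


def align_keypoints(source, target):
    """Map `source` keypoints onto `target` with the best-fit similarity transform.

    Both arguments are equal-length lists of (x, y) tuples. Points are treated
    as complex numbers, so one complex factor encodes rotation and scale.
    """
    src = [complex(x, y) for x, y in source]
    tgt = [complex(x, y) for x, y in target]
    n = len(src)
    src_mean = sum(src) / n
    tgt_mean = sum(tgt) / n
    # Centering both sets removes the translation.
    src_c = [p - src_mean for p in src]
    tgt_c = [q - tgt_mean for q in tgt]
    denom = sum(abs(p) ** 2 for p in src_c)
    if denom == 0:
        raise ValueError("source keypoints must not all coincide")
    # Least-squares solution of a * p = q over all centered point pairs.
    a = sum(p.conjugate() * q for p, q in zip(src_c, tgt_c)) / denom
    aligned = [a * p + tgt_mean for p in src_c]
    return [(p.real, p.imag) for p in aligned]


def alignment_error(source, target):
    """Mean distance between aligned source keypoints and target keypoints."""
    aligned = align_keypoints(source, target)
    return sum(math.dist(p, q) for p, q in zip(aligned, target)) / len(target)


# Usage: the target is the source rotated by 90 degrees, scaled by 2,
# and shifted by (3, 4), so the error after alignment should be near zero.
source = [(0, 0), (1, 0), (0, 1)]
target = [(3, 4), (3, 6), (1, 4)]
error = alignment_error(source, target)
```

A small error after alignment suggests the two poses match up to position, size, and in-plane rotation. This transform does not model mirroring or out-of-plane camera angles.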

By the way, since we decided to do the work "hackathon" style, we only started two days before the deadline. Although we came up with the idea a few weeks ago, we left ourselves an extremely tight schedule for programming😂🖖

Accomplishments that we're proud of

We believe the application can strengthen connections between people in a brand-new way! The idea can be extended to existing social media apps or video-sharing services🇹🇼

What we learned

  • More familiar with PyTorch🔥
  • New Knowledge
    While surveying ways to deploy the PyTorch model, we found TorchServe, which was released just 4 months ago. Although we did not adopt it in the end, the tool has great potential for practical use. Besides, by implementing PosNet, we learned more about how the keypoints of the human body are predicted.
  • Great TeamWork Leads To Success
    Although the schedule was quite tight, splitting the work appropriately in advance helped a lot. It let us finish the application on time!

What's next

To better achieve our goal, the application should be compatible with popular social media so that edited videos can be shared with friends. Next, we should expand our database for similarity calculation by collecting more videos. It might also be fascinating to let users define and customize their own video database. The current database is quite small, so a more efficient way of calculating similarity will be required to handle a huge volume of data in the future. Person segmentation is also an interesting direction, because its result could help extend facial effects to the whole body.
