🦋 Hermes Video Editor

Video editors are notoriously complex: hundreds of tiny buttons with tiny symbols on them, one for every editing function imaginable. It doesn't have to be that way. Hermes is a web-based, cloud-powered, voice-controlled video editor.

Hermes makes editing easy.

✨ Mission + Inspiration

Make video editing easy.

I love to record videos about programming, but I hate to edit them - it takes forever and never feels natural. I often re-watched a large, raw video 5+ times, and editing became an all-day thing. I wanted to build an app where editing feels natural, where commands are inserted naturally and I don't have to struggle with a complex UI.

📈 Features

VOICE COMMANDS

Entirely controlled by voice commands, Hermes can add cuts, mutes, fast forwards, and much more to segments of your video.

TWO WAYS TO INSERT COMMANDS

By clicking and dragging on the track, you can create a tethered voice command that's tied to the segment you selected. You can also click on the track, hold "s" on your keyboard, and just say exactly what you want done!

ROBUST LANGUAGE PROCESSING

"I don't want to hear the next 10 seconds" is a valid command, and will mute the next 10 seconds of your audio. You don't need to memorize exact phrasing: "Remove the next 10 seconds", "Delete the next minute", and "Get rid of the last 30 seconds" are all valid and will do exactly what you would expect.

CLOUD PROCESSING

"Do you hear that? It's your computer's fans thanking you." One of my personal problems with video editors is that they make my computer unusable for however long it takes to render the video. The computer slows to a crawl and the cooling fan makes it hard to think. That's why Hermes does the editing in the cloud. Just enter an email address, and Hermes will put you in the queue. When your video is ready, you'll get an email with a download link.

COMMANDS

Hermes comes with the following commands:

  • cut = removes a part of the video.
  • fast forward = fast forwards a part of the video.
  • mute = mutes a part of the video.
  • [type] music = adds [type] background music to a segment of the video. [type] is one of: Epic, Sad, Happy, Background, Calm.
  • add a [color] caption that says [caption text] = adds a caption to that part of the video, with the text in the given [color].

There are two ways to insert the commands:

  • Click on the track where you want to insert the command, and say something similar to: "add a [command] for [duration] [after/before]".
  • Click and drag your mouse over the track. While dragging, say your command: "[command]".
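
To make the two insertion modes concrete, here is a minimal sketch of how each one could resolve to a time span on the track. It is an illustration only, not Hermes's actual code (which runs in the browser): the `Segment` type, both function names, and the clamping behavior are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A command applied to a span of the video, in seconds (hypothetical type)."""
    command: str
    start: float
    end: float


def segment_from_click(command: str, click_time: float, duration: float,
                       direction: str, video_length: float) -> Segment:
    """Anchor a command at the clicked time and extend it before or after."""
    if direction == "after":
        start, end = click_time, click_time + duration
    elif direction == "before":
        start, end = click_time - duration, click_time
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    # Assumption: clamp to the video bounds so a card never hangs off the track.
    start = max(0.0, start)
    end = min(video_length, end)
    return Segment(command, start, end)


def segment_from_drag(command: str, drag_start: float, drag_end: float) -> Segment:
    """A dragged selection already defines the span; only the command is spoken."""
    low, high = sorted((drag_start, drag_end))
    return Segment(command, low, high)
```

In the click mode the spoken sentence has to carry the duration and direction; in the drag mode the mouse supplies both, which is why the spoken part shrinks to just "[command]".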

Other good to know things while operating:

  • Dragging on the LEFT side of a command card moves it.
  • Dragging on the RIGHT side resizes it.
  • Holding down the command card and pressing BACKSPACE on your keyboard deletes it.

🧱 Architecture

For schema, refer to schema image.

HOW WIT.AI IS USED

Wit.ai is the core of the app. Each command is processed and analyzed into JSON, from which JavaScript extracts the information needed to create a command. Hermes leans heavily on Wit.ai's flexibility, extracting a lot of information from natural speech to render commands on the screen. After training the model on hundreds of inputs, Wit.ai became accurate at dissecting the intents and features of speech, which lets the app be a truly natural-language experience.
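
As a rough illustration of the "JSON in, command out" step, the sketch below picks the top intent and a duration out of a Wit.ai-style response. Hermes does this in JavaScript; Python is used here only for illustration. The payload shape is modeled on Wit.ai's message response as I understand it, and the `mute` intent name, the sample payload, and the `parse_wit_response` helper are all assumptions, so check the field names against real responses before relying on them.

```python
from typing import Any, Dict, Optional


def parse_wit_response(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Pull the top intent and a duration in seconds out of a Wit.ai-style response."""
    intents = response.get("intents", [])
    if not intents:
        return None  # nothing recognized; the caller can ask the user to repeat
    top = max(intents, key=lambda intent: intent["confidence"])

    seconds = None
    # Assumption: durations arrive under the built-in duration entity key.
    for entity in response.get("entities", {}).get("wit$duration:duration", []):
        normalized = entity.get("normalized", {})
        if normalized.get("unit") == "second":
            seconds = normalized["value"]
            break

    return {"command": top["name"], "seconds": seconds}


# Hypothetical payload for "mute the next 10 seconds"; the intent name is an assumption.
sample = {
    "text": "mute the next 10 seconds",
    "intents": [{"name": "mute", "confidence": 0.98}],
    "entities": {
        "wit$duration:duration": [
            {"value": 10, "unit": "second",
             "normalized": {"unit": "second", "value": 10}}
        ]
    },
}

command = parse_wit_response(sample)
```

Returning `None` when no intent is found keeps the "did not understand" case explicit instead of creating a half-filled command card.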

DESIGN CHOICES

Frontend: React. React does a good job with web apps, and the result feels like a native app rather than a webpage. This was my first time using React's Context API (my internship used Redux), and I liked it a lot - it made large-scope state easy to manage.

Backend: FastAPI, a Python framework, is quickly becoming my favorite way to build APIs. It supports a lot out of the box and is easily extensible, which makes for a truly agile development experience. Since I'm a one-man team, it lets me iterate quickly without getting lost in the weeds.

FFmpeg: FFmpeg is a command-line tool for processing video and audio. Commands are converted into an FFmpeg-readable format and executed by the server.
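
To give a feel for what "an FFmpeg-readable format" can look like, here is a minimal sketch that turns a mute command into an ffmpeg argument list. It is not Hermes's actual translation layer: the `mute_args` helper and the file names are hypothetical, and it relies on the common idiom of enabling the `volume` filter only inside a time window.

```python
from typing import List


def mute_args(src: str, dst: str, start: float, end: float) -> List[str]:
    """Build an ffmpeg argument list that silences audio between start and end."""
    if end <= start:
        raise ValueError("end must be greater than start")
    # The volume filter is only enabled inside the window, where it sets volume to 0.
    audio_filter = f"volume=enable='between(t,{start},{end})':volume=0"
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-af", audio_filter,
        "-c:v", "copy",  # video is untouched, so it can be copied without re-encoding
        dst,
    ]


args = mute_args("in.mp4", "out.mp4", 5, 15)
```

The list can be passed to `subprocess.run(args, check=True)` on a machine with ffmpeg installed. Building a list instead of a shell string avoids shell quoting problems with the filter expression.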

SECURITY

Hermes is light on security because there's not much to secure!

  • Email addresses are stored in a password-protected Redis queue and discarded once the email is sent.
  • Downloading the files requires a specific link, and once that link is clicked the video is discarded.
  • No passwords are sent, and bruteforcing for video files is not viable (due to bruteforce slowdown).
  • Logs are cronjobbed to be wiped after 5 days.
  • SSL is used for all communication, so no MITM either.
  • SSH is locked down to only use SSH keys.
  • No need for JWTs or any auth at all - Hermes is free and open to use.

Hermes is a tight ship!
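
The single-use download link described above can be sketched as follows. This README does not say how Hermes generates or stores its links, so everything here is an assumption: the `OneTimeLinks` class is hypothetical, and an in-memory dict stands in for whatever storage the server really uses.

```python
import secrets
from typing import Dict, Optional


class OneTimeLinks:
    """Maps unguessable tokens to stored video paths; each token works once."""

    def __init__(self) -> None:
        self._paths: Dict[str, str] = {}

    def issue(self, video_path: str) -> str:
        """Create a token for a finished video; it would be embedded in the emailed link."""
        token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoded
        self._paths[token] = video_path
        return token

    def redeem(self, token: str) -> Optional[str]:
        """Return the video path for a token, or None if unknown or already used."""
        # pop() removes the entry, so a second click on the same link finds nothing.
        return self._paths.pop(token, None)
```

A long random token makes guessing links impractical even before rate limiting is considered, and removing the entry on first use matches the "clicked once, then discarded" behavior.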

OTHER INFO

  • Hermes is hosted on a linux server (supported by linode.com).
  • Nginx is used as a reverse proxy into the local app.
  • Uncomplicated Firewall (UFW) is used as a firewall.
  • Hermes uses a Redis queue to queue up ffmpeg executions. Concurrent ffmpeg commands split the machine's resources and take an exorbitant amount of time; by running them consecutively, I can make sure that files get processed in a reasonable amount of time without taking up too many resources.
  • Hermes uses a cloud storage bucket because video files are generally large, and storing them on a small machine is not viable.
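
The "one ffmpeg job at a time" idea from the list above can be sketched with a single worker draining a queue. To keep the example self-contained, Python's in-process `queue.Queue` stands in for the Redis queue, and plain callables stand in for ffmpeg invocations; the `worker` and `process_all` names are hypothetical.

```python
import queue
import threading
from typing import Callable, List, Optional

Job = Callable[[], str]  # stand-in for one ffmpeg run; returns the output file name


def worker(jobs: "queue.Queue[Optional[Job]]", finished: List[str]) -> None:
    """Run queued jobs strictly one after another until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        finished.append(job())  # the next job does not start until this returns


def process_all(job_list: List[Job]) -> List[str]:
    """Queue every job, let a single worker process them in order, return results."""
    jobs: "queue.Queue[Optional[Job]]" = queue.Queue()
    finished: List[str] = []
    thread = threading.Thread(target=worker, args=(jobs, finished))
    thread.start()
    for job in job_list:
        jobs.put(job)
    jobs.put(None)  # sentinel: tell the worker to stop
    thread.join()
    return finished
```

With exactly one worker, renders never compete for CPU, at the cost of later requests waiting in line. That matches the trade-off described above.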

✔️ To Do

  • Add more filters! Ideas include transitions, color filters, and zooming. (Perhaps "zoom in on my face" should be a viable command?)
  • Allow the user to input many files and stitch them together
  • Standardize the ffmpeg process
  • Polish up emails
  • Create a CI/CD pipeline
  • Write unit tests

📚 Learning

I learned that voice control feels great - being able to say what I want to do when I edit, and have the computer process it for me, makes editing a much more intuitive experience. Wit.ai itself is incredibly easy to operate. The UI is slick enough that it doesn't feel like I'm working with a state-of-the-art machine learning algorithm, and adding intents, entities, and traits is a quick and easy process.

This was my first time working with ffmpeg, and it was a great experience - although it was slightly confusing at times, I quickly picked up the gist of it, and when I got stuck there was more than enough documentation to look through and figure out what filter, command, or flag I needed.

React's Context API is probably how I'll build React apps from now on - it's much easier than Redux, and nothing happens "behind the scenes", which was one of my biggest issues with Redux.

👺 Extra

  • "Hermes" is the Greek god of speech and communication, and there is a lot of that going on in this project.
  • This was a fun one to build! I had a good time working with the APIs.
  • Building this solo was difficult, and I had to scrap a lot of feature ideas I had to meet the deadline - but I'm excited to continue work on Hermes.