Try it out here:

1 Intro Demo (2 min):

  1. Complete Demo:
  2. Download pipeline here:
  3. Documentation to use this pipeline:
  4. Complete source code (1.44 GB):
  5. Website:


Video lectures are abundant, but motion-capture (mocap) data recorded from those lectures is roughly ten times more valuable, because it is precise numerical data. Large amounts of high-quality data are a prerequisite for accurate machine learning models, so we use mocap data here. Even with such promising data, generating bone transforms from audio is extremely difficult. This is partly due to the technical challenge of mapping a 1D signal to 3D transform values (translation, rotation, and scale floats), but also because humans are extremely attuned to subtle details in how emotions are expressed; many previous attempts at simulating talking characters have produced results that look uncanny (for example, those from two companies, Neon and Soul Machines).

In addition to generating realistic results, this project is, to our knowledge, the first attempt to solve the problem of predicting character bone transforms from speech audio by analyzing a large corpus of mocap data of a single person. As such, it opens the door to modeling other public figures, or any 3D character, by analyzing their mocap data.

Text-to-audio-to-bone-transform synthesis, aside from being interesting from a purely scientific standpoint, has a range of important practical applications. The ability to generate a high-quality, textured, animated 3D character from audio could significantly reduce the bandwidth needed for video coding and transmission (which makes up a large percentage of current internet bandwidth). For hearing-impaired people, animation synthesized from bone transforms could enable lip-reading from over-the-phone audio. And digital humans are central to entertainment applications such as movie special effects and games.
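To make the 1D-to-3D mapping more concrete, here is a minimal Python sketch; it is not the project's model. It assumes 16 kHz mono audio split into 25 ms frames with a 10 ms hop, a hypothetical `BoneTransform` holding three floats each for translation, rotation (Euler angles in degrees), and scale, and a toy loudness-to-jaw-rotation rule (`jaw_transform_from_frame`, also hypothetical) standing in for a trained network.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BoneTransform:
    """Hypothetical per-bone output: 3 translation, 3 rotation (Euler, degrees), 3 scale floats."""
    translation: Vec3
    rotation: Vec3
    scale: Vec3

    def as_vector(self) -> List[float]:
        # Flatten to the 9 floats a model would regress for this bone.
        return [*self.translation, *self.rotation, *self.scale]


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a 1D audio signal into overlapping frames; an incomplete tail is dropped."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, hop)]


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square loudness of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def jaw_transform_from_frame(frame: Sequence[float], max_open_deg: float = 20.0) -> BoneTransform:
    """Toy stand-in for a trained model: louder frames open the jaw further (rotation about x)."""
    energy = min(rms(frame), 1.0)
    return BoneTransform(
        translation=(0.0, 0.0, 0.0),
        rotation=(max_open_deg * energy, 0.0, 0.0),
        scale=(1.0, 1.0, 1.0),
    )


if __name__ == "__main__":
    sr = 16_000  # assumed sample rate (Hz)
    # One second of a 440 Hz tone at amplitude 0.5 as placeholder "speech".
    audio = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    frames = frame_signal(audio, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
    transforms = [jaw_transform_from_frame(f) for f in frames]
    print(len(frames), transforms[0].as_vector())
```

A real model would predict such a vector for every bone in the rig at every frame, from richer features than loudness; the point here is only the shape of the problem: a sequence of 1D audio frames in, a sequence of per-bone transform floats out.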

What it does

Cutting-edge technologies such as machine learning (ML) and deep learning (DL) have solved many of society's problems, with far better accuracy than even an ideal human could achieve. We are using these technologies to improve how students learn in the education system.

The problem every university student faces is that they have to pay a large amount of money to keep studying at a college, and they have to interact with lecturers and professors to keep improving. We are solving the problem of cost. Our solution is a machine learning model that maps e-text data to the sparse points of a human AR character, so that our AI bots can replace professors and teach the same material in a far more interactive and intuitive way than professors ever could. Students can also learn on their own with the AR characters.

How we built it

This project explores the opportunities that AI and deep learning offer for character animation and control. Over the last two years, it has become a modular and stable framework for data-driven character animation, covering data processing, network training, and runtime control, developed in Unity3D, Unreal Engine 4, TensorFlow, and PyTorch. The project uses neural networks to animate character locomotion, facial sparse-point movements, and character-scene interactions with objects and the environment. Further advances will continue to be added to this pipeline.
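The three stages named above (data processing, network training, runtime control) can be illustrated at toy scale. The Python sketch below is not the project's Unity/TensorFlow/PyTorch pipeline: it assumes a single hypothetical input feature (frame loudness) and a single target (jaw rotation in degrees), uses synthetic placeholder pairs rather than real mocap data, and fits a linear model with full-batch gradient descent in place of a neural network.

```python
from typing import List, Tuple


def make_pairs() -> List[Tuple[float, float]]:
    """Data processing stand-in: synthetic (loudness, jaw_deg) pairs following jaw_deg = 20 * loudness."""
    xs = [i / 10 for i in range(11)]
    return [(x, 20.0 * x) for x in xs]


def train(pairs: List[Tuple[float, float]], lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float]:
    """Training stage: fit jaw_deg ~ w * loudness + b by minimizing mean squared error."""
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(epochs):
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w: float, b: float, loudness: float) -> float:
    """Runtime stage: turn a live loudness value into a jaw rotation for the engine to apply."""
    return w * loudness + b


if __name__ == "__main__":
    w, b = train(make_pairs())
    print(f"w={w:.3f}, b={b:.3f}, jaw at loudness 0.35: {predict(w, b, 0.35):.2f} deg")
```

Keeping the stages as separate functions mirrors the modular split described above: the data step can be swapped for real mocap preprocessing and the linear model for a neural network without changing how the runtime consumes predictions.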

Challenges we ran into

To build a studio-like environment, we first had to collect equipment, software, and their prerequisites. Some of them are listed below.

  1. Mocap suit: Smartsuit Pro (single: $2,495) + extra textile ($395)
  2. GPU + CPU: $5,000
  3. Office premises: $2,000
  4. Data preprocessing
  5. Prerequisite software licenses: Unity3D, Unreal Engine 4.24, Maya, MotionBuilder
  6. Model building
  7. AWS SageMaker and AWS Lambda inference
  8. Database management system
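The priced items in the list above can be totaled with a few lines of Python. This covers only the items with a stated price; data preprocessing, software licenses, AWS usage, model building, and the database are excluded because no costs are given for them.

```python
# Costs (USD) of the equipment-list items that have a stated price.
priced_items = {
    "Mocap suit (single)": 2_495,
    "Extra textile": 395,
    "GPU + CPU": 5_000,
    "Office premises": 2_000,
}

total = sum(priced_items.values())
print(f"Total for priced items: ${total:,}")  # prints "Total for priced items: $9,890"
```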

Then we started building.

Accomplishments that we're proud of

Being able to join a virtual class, host a class, interact in real time with your colleagues, talk with them, ask questions, visualize an augmented view of any equipment, and create a solution is in itself an accomplishment.

  1. Asking questions of your avatar professors,
  2. Discussing with your colleagues,
  3. Learning at your own pace with these avatar professors, and much more. Detailed descriptions are given in the submitted files.

What we learned

This section is entirely technical. We learned the C++ and Blueprint sides of multiplayer game development. We have also started developing some of our designs in MotionBuilder; previously we had all been using Maya and Blender.

What's next for castme

  1. We are looking to partner with colleges and universities, for example Galgotias University, Abdul Kalam Technical University (AKTU), IIT Roorkee, and IIT Delhi.
    1. Record large amounts of lecture motion-capture data to better train our question-answering-to-motion-capture machine learning model.
