Talking in Science with GeminiAI-based SCABO robot

Talking in science with the cardboard robot - SCABO
3D car mechanics, with a different color for each part so that Gemini AI can distinguish them
3D simulation of the human heart. The heart keeps its physically based rendering (PBR) texture.
3D simulation of the differential gear in cars in AR. Each part of the 3D model is textured in its own color so that the AI can distinguish it.
3D simulation of a projector. The light bulb is the yellow sphere. The color wheel is RGB. The grey plane is the mirror and the lens is grey.
Students assembling SCABO cardboard robot at a science camp
Assembling time for students to make their own cardboard toys from recycled cardboard
SCABO robot with students in a science camp

Inspiration

The main goal of the project is to integrate Gemini AI and Vertex AI as an AI scientific assistant for SCABO cardboard toys, so that children and students can talk to the toys in AR scenarios and learn from 3D scientific simulations in subjects such as mechanics, biology, math and chemistry. Using a mobile device with ARCore/ARKit, they can ask their cardboard toys open-ended questions. The SCABO cardboard robot or toy plays the role of an AI scientific assistant and displays its answers in AR. The toys can also act as an AI science buddy that poses multiple-choice questions about the 3D simulations shown in AR. Furthermore, Gemini AI is integrated into the authoring tool, which lets teachers generate AR edutainment content faster and make it more entertaining for children playing with the cardboard toys. Last but not least, we strongly believe that cardboard toys with AI assistant functions demonstrate an eco-friendly direction for the toy industry, one in which students and teachers can enjoy learning and teaching.

What it does

Our cardboard toys consist of recycled cardboard sheets and electronics boxes. Children and students can assemble their own toys at home with their families or at school with friends and teachers, following step-by-step instructions in 3D. They then use an Android/iOS mobile device to scan the toy in a physical play area, which displays AR content such as Apollo 11, a human heart or a jet engine beside the toy. They can ask the cardboard toy questions about the 3D models directly, and the toy answers in AR. The AI assistant can even point out the parts of the 3D simulation that relate to the answer and explain them in detail.

For teachers, our AR platform can generate quiz-based games in 3D from text prompts. Much like generative AI in image editing, the authoring tool lets teachers edit their content either manually or with support from Gemini AI. Teachers can manually edit AR scenarios on a wide variety of topics such as history, mechanics, biology or chemistry. The AI assistant in the authoring tool can pick and select 3D objects based on the teacher's prompts, or generate multiple-choice questions to challenge students. Students can then enjoy the story-based quiz games with their cardboard toys.

How we built it

We have been developing the hardware for SCABO cardboard toys for the last two years, with IP registrations. To track the position and orientation of the cardboard robots and toys in real time and integrate them into virtual scenarios, the Android/iOS apps are implemented in native code (Java and Swift) with the ARCore/ARKit SDKs. Our 3D scientific inventory system, which includes thousands of 3D models and simulations, lets children and students play with 3D models according to their preferences. Gemini 1.5 Pro powers the AI scientific assistant that lets the cardboard robot answer children's questions about 3D science models. Gemini 1.5 Pro also gives teachers a wide variety of options for generating edutainment content in the authoring tool, which is part of the same AR app that students use. To help the AI assistant understand the 3D polygonal meshes, Vertex AI is used to deploy OWL-ViT v2 models on Google Cloud Run. The bounding boxes from the object detection module are then processed on the Android/iOS device to interact with the 3D models and display the illustrative content.

Challenges we ran into

Currently, Gemini 1.5 Pro supports multimodal data including images, videos and text. However, it has NO support for 3D polygon meshes as input, while our AR simulation platform includes thousands of 3D models and simulations in human anatomy, mechanics (cars, transmission gears, etc.) and biology. Our goal is to use Gemini AI's multimodal capabilities with our new 3D data format.

Hallucination in AI. For example, in the AR scenario with the 3D simulation of the human heart, only yellow arrows show the flow of blood, yet Gemini AI made up both yellow and blue arrows when describing the video clips. To overcome this weakness in understanding 3D simulations, our team designed and modeled our 3D models and simulations with a different color for each module, so that Gemini AI can tell the parts of a simulation apart by color. Surprisingly, Gemini AI explains 3D simulations to teenagers much better with this kind of data.

For example, given an input image taken from the AR scenario, Gemini AI explained the differential gear in cars very well. Gold: axle shafts; these shafts connect the differential gear to the wheels, transmitting power to them. Grey: pinion gears and bevel gears. Black: wheels and ring gear.

Another example is a 3D simulation of light in a projector. We simulated the projector's working mechanism to show how light travels inside it. Gemini AI understands the scene very well and explains the parts of the projector in detail as follows. Light source (yellow): a bright light bulb that generates white light. Color wheel (red, green and blue): a spinning wheel that is the heart of the projector. Mirrors (grey): help direct the light from the lamp through the color wheel and then towards the lens. Lens (white cylinder): the lens is like a magnifying glass. We do NOT explicitly label the parts in the 3D polygon mesh, but our method helps the AI understand the simulation and explain it in detail for children, especially teenagers.
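The color-coding idea can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not code from the SCABO app: the palette, the `PartColor` type, the `assign_colors` and `build_prompt` helpers and the prompt wording are all hypothetical, and the actual call that sends the rendered image and the prompt to Gemini AI is omitted. Note that the prompt lists only colors and never part names, in line with the unlabeled meshes described above.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical palette of easily named, visually distinct flat colors.
PALETTE: List[Tuple[str, Tuple[int, int, int]]] = [
    ("gold", (255, 215, 0)),
    ("grey", (128, 128, 128)),
    ("black", (0, 0, 0)),
    ("red", (255, 0, 0)),
    ("green", (0, 128, 0)),
    ("blue", (0, 0, 255)),
]


@dataclass(frozen=True)
class PartColor:
    submesh_id: str              # internal sub-mesh identifier (hypothetical)
    color_name: str              # name used in the prompt
    rgb: Tuple[int, int, int]    # flat material color applied to the sub-mesh


def assign_colors(submesh_ids: Sequence[str]) -> List[PartColor]:
    """Give every sub-mesh its own palette color, in order."""
    if len(submesh_ids) > len(PALETTE):
        raise ValueError("more sub-meshes than distinct palette colors")
    return [
        PartColor(sid, name, rgb)
        for sid, (name, rgb) in zip(submesh_ids, PALETTE)
    ]


def build_prompt(subject: str, parts: Sequence[PartColor],
                 audience: str = "teenagers") -> str:
    """Build a text prompt that refers to parts by color only."""
    colors = ", ".join(p.color_name for p in parts)
    return (
        f"The image shows a 3D simulation of {subject}. "
        f"Each part is rendered in its own flat color ({colors}). "
        f"For each color, name the part and explain what it does, "
        f"for {audience}. Describe only what is visible in the image."
    )


# Usage: three unlabeled sub-meshes of a differential gear model.
parts = assign_colors(["mesh_0", "mesh_1", "mesh_2"])
print(build_prompt("a differential gear in a car", parts))
```

Keeping the palette small and using common color names is a deliberate choice in this sketch: the model has to refer back to the colors in its answer, so names like "gold" or "grey" are easier to match than arbitrary RGB values.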

In real use cases, children or learners ask questions about the parts of a scientific subject. For example: Where is the atrium of the human heart? Please show me the location of the differential gear! Would you please show me the jet engine fan? Localizing parts in a 3D polygon mesh is NOT a trivial problem, since Gemini AI cannot understand the wide range of 3D data types involved: polygon meshes edited in CAD tools, implicit representations such as signed distance functions, and polygonal meshes from 3D reconstruction algorithms. Our team found that the best way to answer this kind of question is to combine 2D object detection on the input image with ray casting onto the 3D polygon mesh. Given a question and a 2D input image whose foreground object is a scientific simulation, the OWL-ViT v2 model is used to detect items in the image that are not labeled in advance. We then cast a ray from the camera position through the 2D detection box in the image plane until it collides with the 3D polygon mesh. The collision point approximately gives the answer in the 3D simulation for the learner. The results are shown in the pictures.
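The 2D-to-3D step can be sketched in Python as follows. This is a simplified illustration under several assumptions that the text above does not specify: a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, mesh triangles already expressed in camera coordinates (camera at the origin, looking along +z, with y pointing down as in image space), a single ray through the center of the detection box, and the standard Moller-Trumbore ray-triangle test as the intersection routine. All function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pixel_to_ray(u: float, v: float, fx: float, fy: float,
                 cx: float, cy: float) -> Vec3:
    """Unit ray direction through pixel (u, v) for a pinhole camera."""
    d = ((u - cx) / fx, (v - cy) / fy, 1.0)
    n = math.sqrt(dot(d, d))
    return (d[0] / n, d[1] / n, d[2] / n)


def ray_triangle(origin: Vec3, direction: Vec3, tri: Triangle,
                 eps: float = 1e-9) -> Optional[float]:
    """Moller-Trumbore test: distance along the ray to the hit, or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:            # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv       # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(t_vec, e1)
    v = dot(direction, q) * inv   # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # ignore hits behind the camera


def locate_part(box: Tuple[float, float, float, float],
                intrinsics: Tuple[float, float, float, float],
                triangles: Sequence[Triangle]) -> Optional[Vec3]:
    """Cast a ray through the box center; return the nearest mesh hit."""
    x_min, y_min, x_max, y_max = box
    u, v = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    origin: Vec3 = (0.0, 0.0, 0.0)
    direction = pixel_to_ray(u, v, *intrinsics)
    hits = [t for t in (ray_triangle(origin, direction, tri)
                        for tri in triangles) if t is not None]
    if not hits:
        return None
    t = min(hits)
    return (origin[0] + t * direction[0],
            origin[1] + t * direction[1],
            origin[2] + t * direction[2])


# Usage: a 640x480 image, a box around the image center, two triangles.
near = ((-1.0, -1.0, 2.0), (1.0, -1.0, 2.0), (0.0, 1.0, 2.0))
far = ((-1.0, -1.0, 5.0), (1.0, -1.0, 5.0), (0.0, 1.0, 5.0))
hit = locate_part((300, 220, 340, 260), (500.0, 500.0, 320.0, 240.0),
                  [far, near])
print(hit)
```

Taking the nearest hit matters because a ray usually crosses several surfaces of a closed mesh, and only the first one is visible in the image that the detector saw. A real app would more likely rely on the hit-testing or physics facilities of its rendering engine and on an acceleration structure rather than a linear scan over triangles.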

Accomplishments that we're proud of

We have had the chance to demonstrate our cardboard toys with the Gemini AI assistant to many students in Vietnam and Thailand. We have seen the children's creativity and curiosity about science and about interacting with AI. We also received a lot of feedback from teachers and parents on how to improve the solution. We are looking for opportunities to collaborate with more schools in other countries. If you have any schools in rural areas in mind, please let us know. Our team is happy to send our FREE cardboard toys to students and teachers.

What we learned

We have learned the strengths and weaknesses of Gemini Pro Vision and Gemini Pro by testing how well they understand 3D scientific content. We were impressed with how well Gemini 1.5 Pro understands images or videos containing foreground scientific objects (projectors, differential gears and suspensions in cars, or the human heart). However, understanding 3D polygonal models currently seems infeasible for most AI engines. It requires segmentation of new 3D input data types, whether implicit representations such as NeRF or signed distance functions, or explicit representations such as meshes. Therefore, our team built a 2D-to-3D mapping solution that partially solves the problem and improves the experience of interacting with 3D models and simulations. We strongly believe that the next step for AI will be multimodal models that accept arbitrary types of input data.

What's next for Talking in Science with GeminiAI-based SCABO robot

Firstly, our goal is to improve the object detection module, implemented with OWL-ViT v2, using our own dataset. Currently, the results are NOT stable when detecting domain-specific terms such as automatic transmission or manual transmission.

Secondly, visual demonstration is an intuitive way to learn scientific subjects like physics, biology, chemistry or math. However, the ultimate goal of the AI assistant is to support learners and practitioners in quantitative learning as well. The next scope for the project is to upgrade the AI learning assistant with physics simulation for force analysis (collisions between 3D objects, then force decomposition based on Newton's laws of motion, with the AI drawing 3D force vectors), human anatomy analysis with visual inspection of 3D models, and chemistry simulation that uses quantitative methods to simulate reactions, along with visual inspection to draw force fields in mechanics.

In addition, our goal is to make the prompts more narrative, so that the assistant tells science stories that promote science learning in children. The best examples of narrative storytelling in science are the Cartoon Guide series by Larry Gonick and How Machines Work by David Macaulay.

As for the authoring tool, it currently supports only a basic mode that generates terrain via procedural generation or generates quizzes for teachers. Our goal is to integrate a state-of-the-art text-to-3D AI model to encourage users to be the kings of their own virtual worlds.

READ ME - before testing

To use our SCABO toy, you need to activate the Android/iOS application and have a physical toy. If you are from the Devpost team or the Google AI team, please contact us via email for support on using the app, and we can send you a cardboard toy kit. We can then send you the user manual for the AI features in the application. In addition, Vertex AI is costly: the forecasted cost for our deployment is $10K USD by the end of this month. Therefore, please let us know when you plan to test our SCABO cardboard toy, and we will enable our Vertex AI API for testing purposes. Since the product has not launched yet, please feel free to give us your feedback, and forgive any inconvenience in the UI/UX and in playing with the cardboard robot.

Built With

arcore
arkit
filament
geminiai
google-cloud-run
tensorflowlite
vertexai

Updates

Hoang Phuong started this project — May 03, 2024 02:52 AM EDT
