Inspiration

Robots and AR devices still struggle to “see” the world in 3D without bulky, expensive LiDARs, especially in tight indoor spaces and on small rovers. We wanted to see how far we could push plain webcams plus geometry, inspired by pseudo‑LiDAR research that turns stereo depth into LiDAR‑like point clouds. Goobvision was born from the question: can two cheap cameras give us dense, reliable 3D vision that feels like a Kinect strapped onto any robot?

What it does

Goobvision is a stereo‑vision camera that converts a pair of webcams into a real‑time pseudo‑LiDAR, streaming dense RGB 3D point clouds of the scene. It computes per‑pixel disparity, converts it to depth using triangulation, then reprojects every pixel into 3D using the camera calibration to form a live point cloud. A dual‑view UI shows a colored depth map and a “ghost view” 3D scatter plot you can orbit while moving your hand or objects in front of the rig.
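To make that per-frame loop concrete, here is a minimal sketch of the disparity-to-point-cloud step, assuming already-rectified frames, a tuned SGBM matcher, and the Q matrix from rectification; the function and variable names are illustrative, not our exact source:

```python
import cv2
import numpy as np

# Hypothetical per-frame step, assuming rectified left/right frames,
# a tuned SGBM matcher, and the Q matrix from stereo rectification.
def frame_to_point_cloud(left_rect, right_rect, matcher, Q):
    gray_l = cv2.cvtColor(left_rect, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(right_rect, cv2.COLOR_BGR2GRAY)

    # SGBM returns fixed-point disparity scaled by 16.
    disparity = matcher.compute(gray_l, gray_r).astype(np.float32) / 16.0

    # Reproject every pixel into 3D using the calibration-derived Q matrix.
    points_3d = cv2.reprojectImageTo3D(disparity, Q)

    # Keep only pixels with a valid (positive) disparity.
    mask = disparity > matcher.getMinDisparity()
    xyz = points_3d[mask]   # N x 3 coordinates, in the calibration's units
    rgb = left_rect[mask]   # matching colours for the point cloud
    return xyz, rgb
```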

How we built it

We rigidly mounted two cameras with a fixed baseline and ran stereo calibration in OpenCV to recover intrinsics, distortion, and extrinsics R, T. After stereo rectification, we used a tuned Semi‑Global Block Matching pipeline to generate dense disparity maps and converted disparity d to depth Z = f*B/d, where f is focal length and B is baseline. Using the Q reprojection matrix, we lifted each pixel into 3D to generate a point cloud, then rendered it in real time with OpenGL/Matplotlib‑style visualization and simple controls to rotate, zoom, and toggle color modes.
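A condensed sketch of that setup is below; the chessboard correspondences and per-camera intrinsics are assumed to have been collected beforehand, and the SGBM parameters shown are placeholder values rather than our final tuned configuration:

```python
import cv2

def build_stereo_pipeline(obj_points, img_pts_l, img_pts_r,
                          K1, D1, K2, D2, image_size):
    """Calibrate, rectify, and return rectification maps, Q, and an SGBM matcher."""
    # Recover the extrinsics R, T between the two rigidly mounted cameras,
    # keeping the per-camera intrinsics fixed from single-camera calibration.
    _, K1, D1, K2, D2, R, T, _, _ = cv2.stereoCalibrate(
        obj_points, img_pts_l, img_pts_r,
        K1, D1, K2, D2, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)

    # Rectify so epipolar lines become horizontal; Q is the reprojection matrix.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(
        K1, D1, K2, D2, image_size, R, T, alpha=0)
    maps_l = cv2.initUndistortRectifyMap(K1, D1, R1, P1, image_size, cv2.CV_32FC1)
    maps_r = cv2.initUndistortRectifyMap(K2, D2, R2, P2, image_size, cv2.CV_32FC1)

    # Semi-Global Block Matching; block size and disparity range are illustrative.
    matcher = cv2.StereoSGBM_create(
        minDisparity=0, numDisparities=128, blockSize=5,
        P1=8 * 3 * 5 ** 2, P2=32 * 3 * 5 ** 2,
        uniquenessRatio=10, speckleWindowSize=100, speckleRange=2)

    return maps_l, maps_r, Q, matcher
```

Each maps_* pair feeds cv2.remap to rectify incoming frames, and cv2.reprojectImageTo3D then applies the Z = f*B/d relation per pixel via Q, as in the per-frame loop sketched above.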

Challenges we ran into

The first fight was calibration: tiny misalignments gave huge reprojection error, so depth would “swim” whenever the rig moved even slightly. We had to design a sturdier mount, repeat chessboard captures under different poses, and iterate until reprojection error dropped to an acceptable threshold. Getting clean, dense disparity on low‑texture or shiny surfaces forced us to experiment with SGBM parameters, pre‑filters, and post‑processing (speckle removal, median blurs) to balance noise vs detail. Finally, we hit the classic stereo limitation: error grows with distance, so we had to explicitly focus Goobvision on near‑field (roughly 0.4–2 m) applications instead of pretending to be a highway‑range LiDAR.
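The post-processing pass ended up roughly like the sketch below; the speckle and blur parameters here are illustrative, since the real values came from trial and error on our rig:

```python
import cv2
import numpy as np

# Illustrative clean-up pass on the raw 16-bit SGBM disparity map.
def clean_disparity(raw_disp_16s, min_disp=0, num_disp=128):
    # Remove small isolated blobs ("speckles") directly on the fixed-point map.
    cv2.filterSpeckles(raw_disp_16s, newVal=(min_disp - 1) * 16,
                       maxSpeckleSize=200, maxDiff=32)

    # Back to real disparity values, then a light median blur to kill
    # single-pixel noise without washing out object edges.
    disp = raw_disp_16s.astype(np.float32) / 16.0
    disp = cv2.medianBlur(disp, 5)

    # Mark everything outside the valid disparity range as invalid.
    disp[(disp < min_disp) | (disp >= min_disp + num_disp)] = np.nan
    return disp
```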

Accomplishments that we're proud of

We built a fully working stereo pipeline—from calibration to live point cloud—in a hackathon timeframe, not just offline scripts. Seeing our own hands appear as a dense 3D “ghost” cloud in real time was a huge milestone and made the math feel tangible. We also implemented a simple evaluation harness to log depth error vs distance against LiDAR or tape‑measure ground truth, so we can talk about Goobvision with actual numbers rather than just cool visuals.
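The harness itself is only a few lines; this is a simplified sketch with a hypothetical helper name and patch size, not our exact logging code:

```python
import csv
import numpy as np

# Compare the median predicted depth in a small patch against a
# tape-measured ground-truth distance, and append the error to a CSV log.
def log_depth_error(depth_map, patch_center, true_dist_m, log_path="depth_eval.csv"):
    r, c = patch_center
    patch = depth_map[r - 5:r + 6, c - 5:c + 6]   # 11x11 window around the target
    pred = float(np.nanmedian(patch))             # median is robust to speckle holes

    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([true_dist_m, pred,
                                pred - true_dist_m,
                                abs(pred - true_dist_m) / true_dist_m])
    return pred
```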

What we learned

We learned how critical good calibration, rigid hardware, and rectification are to any stereo system: software alone cannot rescue a wobbly rig. We got hands-on with the full pseudo-LiDAR pipeline described in research papers (disparity estimation, depth mapping, and 3D reconstruction) and saw how design choices such as baseline, resolution, and matching algorithm change accuracy. We also came to appreciate the trade-offs between LiDAR and stereo: LiDAR dominates at long range, but stereo can match or approach LiDAR accuracy at close range while providing much denser, RGB-aware point clouds for manipulation tasks.
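The close-range claim follows from the standard stereo error model: depth error grows roughly as Z^2, i.e. dZ ≈ (Z^2 / (f * B)) * dd, where dd is the disparity matching error in pixels. A quick back-of-the-envelope check, using assumed (not measured) focal length, baseline, and matching error:

```python
# Depth resolution vs distance: dZ ≈ Z^2 * dd / (f * B).
# f, B, and the matcher's disparity error are assumed example values.
f_px = 700.0        # focal length in pixels
baseline_m = 0.10   # 10 cm baseline
disp_err = 0.5      # typical sub-pixel matching error (pixels)

for z in (0.5, 1.0, 2.0, 5.0):
    dz = z ** 2 * disp_err / (f_px * baseline_m)
    print(f"Z = {z:4.1f} m  ->  depth error ~ {dz * 100:5.1f} cm")
```

With these numbers the error stays around a centimetre inside our 0.4-2 m working range but balloons past 15 cm by 5 m, which is exactly why near-field tasks suit stereo and long range does not.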

What's next for Goobvision: StereoVision Camera

Next, we want to mount Goobvision on a small rover/arm and use the 3D point cloud for real tasks like obstacle avoidance, grasp planning, and human‑aware navigation. On the algorithm side, we plan to integrate learning‑based stereo / depth completion to improve performance on low‑texture surfaces and extend range while keeping the same cameras. Longer term, we see Goobvision evolving into a drop‑in, open‑source stereo depth module with ROS integration and calibration tools so any team can add dense 3D perception to their robot without needing a LiDAR.
