Hardware Platform
Custom mobile robot platform featuring RealSense D455 depth camera, NVIDIA Jetson compute module, and omnidirectional drive system.
The robot is in active development and targets autonomous navigation and object detection. Initial applications are household cleaning and delivery tasks. A GUI shows detections and point clouds in 3D, along with each object's predicted 3D center point, which is used as the pick point.
cd /path/to/CleaningRobot
# Build
docker compose -f docker/docker-compose.yml build
# Run (this automatically runs entrypoint.sh inside the container)
docker compose -f docker/docker-compose.yml up
I'm noticing consistent vibration from the small, hard mecanum wheels. I expected some because of the omnidirectional design, but as you can see in the video there is quite a bit. It will be an interesting stress test for visual SLAM and point cloud stability once those are implemented. Realistically, though, this mechanical noise will likely degrade pose estimation and depth consistency, so I will need a mechanical fix soon as well.
Short captures demonstrating picking and point-cloud visualization during runtime.
Picking demo (bloopers for now)
Point cloud demo
RealSense D455 streams 720p RGB + aligned depth at 30 FPS via /color, /depth, and /imu topics.
YOLOv8, running on TensorRT, detects custom objects (clothes, dirt, debris) and keeps detections with a confidence of 0.83 or higher. Bounding boxes are published on /detections_output.
For each YOLO bounding box, crop the corresponding depth ROI (100-5000mm range). Compute median depth from central 40% window to establish object plane. Filter pixels within ±50mm of median to isolate object from floor/background. Project filtered pixels to 3D using camera intrinsics (fx, fy, cx, cy).
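The depth filtering and projection steps above can be sketched as follows. This is a minimal illustration and not the project's actual node. The function name `roi_to_points` and its parameters are hypothetical. It assumes that "central 40% window" means 40% of the box width and height, and that output points are in metres. It uses only the standard library so it runs anywhere; a real-time node would more likely use NumPy.

```python
from statistics import median


def roi_to_points(depth_mm, box, intrinsics,
                  min_mm=100, max_mm=5000, window=0.4, band_mm=50):
    """Return 3D points (metres, camera frame) for the object in one box.

    depth_mm   : 2D list of depth readings in millimetres, indexed [row][col]
    box        : (x_min, y_min, x_max, y_max) in pixels, max values exclusive
    intrinsics : (fx, fy, cx, cy) pinhole parameters in pixels
    """
    x0, y0, x1, y1 = box
    fx, fy, cx, cy = intrinsics

    def valid(d):
        # Readings outside the working range are treated as missing.
        return min_mm <= d <= max_mm

    # Central window: `window` of the box width and height (assumption).
    win_w = max(1, round((x1 - x0) * window))
    win_h = max(1, round((y1 - y0) * window))
    wx0 = x0 + ((x1 - x0) - win_w) // 2
    wy0 = y0 + ((y1 - y0) - win_h) // 2

    central = [depth_mm[v][u]
               for v in range(wy0, wy0 + win_h)
               for u in range(wx0, wx0 + win_w)
               if valid(depth_mm[v][u])]
    if not central:
        return []  # no usable depth near the centre of the box
    plane_mm = median(central)

    points = []
    for v in range(y0, y1):
        for u in range(x0, x1):
            d = depth_mm[v][u]
            # Keep only pixels close to the object plane.
            if valid(d) and abs(d - plane_mm) <= band_mm:
                z = d / 1000.0  # millimetres to metres
                # Pinhole back-projection of pixel (u, v) at depth z.
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

Taking the median over a central window, instead of the whole box, makes the plane estimate less sensitive to floor pixels near the box edges.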
Compute object center from median point cloud coordinates. Publish via /detected_objects_pointcloud and /detected_objects_3d. Stream to WebSocket server for real-time Three.js visualization at 15+ FPS.
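As a rough sketch of the center computation, the per-axis median can be taken over the filtered points and serialised for a WebSocket client. The function names and the JSON field names (`label`, `center`, `count`) are hypothetical; the message schema used by the actual server is not documented here.

```python
import json
from statistics import median


def object_center(points):
    """Per-axis median of (x, y, z) points; None when the list is empty."""
    if not points:
        return None
    xs, ys, zs = zip(*points)
    return (median(xs), median(ys), median(zs))


def center_message(label, points):
    """Serialise one detection as JSON text (hypothetical schema)."""
    return json.dumps({
        "label": label,
        "center": object_center(points),  # becomes a JSON array or null
        "count": len(points),
    })
```

A median is less affected than a mean by the few stray depth pixels that survive filtering, which helps when the center is used as a pick point.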
Isaac ROS Visual SLAM processes camera feed and IMU data to generate real-time odometry (/odom) and global pose estimates (/pose).
Isaac ROS nvblox constructs real-time 3D occupancy maps and 2D costmaps from depth data, enabling obstacle-aware path planning.
Nav2 computes collision-free trajectories from the current pose to goal positions using costmaps and dynamic replanning.
Robot controller translates path commands to motor velocities via /cmd_vel, actuating the mecanum wheel base for omnidirectional movement.
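The translation from a /cmd_vel command to wheel speeds is not shown here. A minimal sketch of the standard mecanum inverse kinematics follows, assuming x points forward, y points left, and positive angular velocity is counter-clockwise. The function name, wheel radius, and chassis dimensions are placeholder values rather than measurements of this robot, and the signs depend on how the rollers and motors are mounted.

```python
def mecanum_wheel_speeds(vx, vy, wz,
                         wheel_radius=0.04, half_length=0.10, half_width=0.12):
    """Convert a body velocity command to wheel angular speeds in rad/s.

    vx, vy : linear velocity in m/s (x forward, y left)
    wz     : angular velocity in rad/s (counter-clockwise positive)
    Returns (front_left, front_right, rear_left, rear_right).
    The default geometry values are placeholders.
    """
    k = half_length + half_width  # lever arm for the rotation term
    front_left = (vx - vy - k * wz) / wheel_radius
    front_right = (vx + vy + k * wz) / wheel_radius
    rear_left = (vx + vy - k * wz) / wheel_radius
    rear_right = (vx - vy + k * wz) / wheel_radius
    return front_left, front_right, rear_left, rear_right
```

With this convention, a pure forward command spins all four wheels equally, while a pure sideways command spins diagonal wheel pairs in opposite directions.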
Tools for collecting training data and fine-tuning YOLOv8:
collectData2DRGB.py captures calibrated images at 1-second intervals.
# Can also test the custom model standalone with the ros2 example:
ros2 launch isaac_ros_examples isaac_ros_examples.launch.py \
launch_fragments:=realsense_mono_rect_depth,yolov8 \
model_file_path:=${ISAAC_ROS_WS}/models/yolov8/clothes.onnx \
engine_file_path:=${ISAAC_ROS_WS}/models/yolov8/clothes.plan \
confidence_threshold:=0.83
Custom WebSocket-based viewer built with Three.js for real-time 3D visualization:
web_viewer_server_3d.py subscribes to ROS2 topics and serves data via WebSocket endpoints (/3d, /center, /full_scene). The HTML viewer connects and renders using Three.js with efficient buffer geometry updates at 15+ FPS.
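One common way to keep buffer geometry updates cheap is to send point positions as a flat binary array of 32-bit floats, which a browser can wrap in a Float32Array and hand to a Three.js buffer attribute without parsing. The sketch below illustrates that packing step only; the function name `pack_positions` is hypothetical, and it is an assumption that the server uses a binary layout like this rather than JSON.

```python
import struct


def pack_positions(points):
    """Flatten (x, y, z) tuples into little-endian float32 bytes.

    The result is laid out as x0, y0, z0, x1, y1, z1, ... with 12 bytes
    per point, matching a position attribute with item size 3.
    """
    flat = [float(c) for p in points for c in p]
    return struct.pack("<%df" % len(flat), *flat)
```

The payload size is 12 bytes per point, so a cloud of 10,000 points is about 120 KB per frame before any downsampling.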