MobileRobot

A mobile robot, currently in development, capable of autonomous navigation and object detection. Initial applications are household cleaning and delivery. The included GUI shows detections and point clouds in 3D, along with each object's predicted 3D center point, which is used as the pick point.

In Development • 2024-present

Hardware Platform

Mobile Robot Platform

Custom mobile robot platform featuring a RealSense D455 depth camera, an NVIDIA Jetson compute module, and an omnidirectional drive system.

How to Run

cd /path/to/CleaningRobot

# Build
docker compose -f docker/docker-compose.yml build

# Run (this automatically runs entrypoint.sh inside the container)
docker compose -f docker/docker-compose.yml up

Robot in Motion

I'm noticing consistent vibration from the small, hard mecanum wheels. I expected some because of the omnidirectional design, but as the video shows, there is quite a bit. It will be an interesting stress test for visual SLAM and point cloud stability once those are implemented. Realistically, though, this mechanical noise will likely degrade pose estimation and depth consistency, so I will need a mechanical fix soon as well.

Demo Clips

Short captures demonstrating picking and point-cloud visualization during runtime.

Picking demo (bloopers for now)

Point cloud demo

System Architecture

3D Object Detection & Visualization

1. Sensor Input

The RealSense D455 streams 720p RGB and aligned depth at 30 FPS, plus IMU data, on the /color, /depth, and /imu topics.

2. Object Detection

YOLOv8, running as a TensorRT engine, detects custom objects (clothes, dirt, debris). Detections are kept at a confidence of 0.83 or higher, and their bounding boxes are published on /detections_output.

3. Point Cloud Generation

For each YOLO bounding box, crop the corresponding depth ROI, keeping only depths in the 100-5000 mm range. Compute the median depth over the central 40% window of the box to establish the object plane. Keep only pixels within ±50 mm of that median to isolate the object from the floor and background. Project the remaining pixels to 3D using the camera intrinsics (fx, fy, cx, cy).
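The steps above can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the project's actual code: the function name `roi_to_points`, the `(x0, y0, x1, y1)` box layout, and the reading of "central 40% window" as 40% of the box width and height are all hypothetical. The projection itself is the standard pinhole model.

```python
import numpy as np

def roi_to_points(depth_mm, box, fx, fy, cx, cy,
                  min_mm=100, max_mm=5000, window=0.4, tol_mm=50):
    """Return an (N, 3) float32 array of XYZ points (metres) for one box.

    depth_mm: 2D depth image in millimetres, aligned to the color image.
    box: (x0, y0, x1, y1) pixel bounds, end-exclusive (assumed layout).
    """
    x0, y0, x1, y1 = box
    roi = depth_mm[y0:y1, x0:x1].astype(np.float32)
    valid = (roi >= min_mm) & (roi <= max_mm)

    # Median depth over the central `window` fraction of the box.
    h, w = roi.shape
    mh = int(h * (1 - window) / 2)
    mw = int(w * (1 - window) / 2)
    centre = roi[mh:h - mh, mw:w - mw]
    centre_valid = valid[mh:h - mh, mw:w - mw]
    if not centre_valid.any():
        return np.empty((0, 3), dtype=np.float32)
    median = np.median(centre[centre_valid])

    # Keep pixels close to the object plane, dropping floor and background.
    keep = valid & (np.abs(roi - median) <= tol_mm)
    v, u = np.nonzero(keep)            # row, column inside the ROI
    z = roi[keep] / 1000.0             # millimetres to metres
    x = (u + x0 - cx) * z / fx         # pinhole back-projection
    y = (v + y0 - cy) * z / fy
    return np.stack([x, y, z], axis=1).astype(np.float32)
```

Taking the median from the box center rather than the whole box makes the plane estimate less sensitive to the floor pixels that usually fill the box corners.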

4. Pose Estimation & Visualization

Compute the object center as the median of the point cloud coordinates. Publish the results on /detected_objects_pointcloud and /detected_objects_3d, and stream them to a WebSocket server for real-time Three.js visualization at 15+ FPS.
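As a small illustration of the center estimate, the sketch below takes the per-axis median of an (N, 3) point array. Reading "median point cloud coordinates" as an independent median on each axis is an assumption, and `object_center` is a hypothetical name.

```python
import numpy as np

def object_center(points):
    """Per-axis median of an (N, 3) point array, or None for an empty cloud."""
    points = np.asarray(points, dtype=np.float64)
    if points.size == 0:
        return None
    return np.median(points, axis=0)

# One stray background point barely moves the median.
cloud = [
    [0.0, 0.0, 1.0],
    [0.1, 0.0, 1.0],
    [0.2, 0.1, 1.1],
    [0.1, 0.0, 1.0],
    [5.0, 5.0, 5.0],   # outlier that survived filtering
]
center = object_center(cloud)
```

A median is a reasonable choice for a pick point because a few leftover floor or background points would pull a mean noticeably off the object.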

Autonomous Navigation (Planned)

1. Visual SLAM

Isaac ROS Visual SLAM processes camera feed and IMU data to generate real-time odometry (/odom) and global pose estimates (/pose).

2. 3D Mapping

Isaac ROS nvblox constructs real-time 3D occupancy maps and 2D costmaps from depth data, enabling obstacle-aware path planning.

3. Path Planning

Nav2, the ROS 2 navigation stack, computes collision-free trajectories from the current pose to goal positions using the costmaps and dynamic replanning.

4. Motion Control

Robot controller translates path commands to motor velocities via /cmd_vel, actuating the mecanum wheel base for omnidirectional movement.
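For reference, the usual inverse kinematics for a four-wheel mecanum base looks like the sketch below. It assumes the common "X" roller arrangement and the ROS axis convention (x forward, y left, counter-clockwise yaw positive). The function name and the dimensions in the example are hypothetical, and the signs may need flipping to match the actual roller orientation and motor wiring.

```python
def mecanum_wheel_speeds(vx, vy, wz, wheel_radius, half_length, half_width):
    """Map a body twist to wheel angular velocities in rad/s.

    vx: forward velocity (m/s), vy: leftward velocity (m/s),
    wz: counter-clockwise yaw rate (rad/s).
    half_length / half_width: distance from the base center to the wheel
    axles along x and y (m).
    Returns (front_left, front_right, rear_left, rear_right).
    """
    k = half_length + half_width
    front_left = (vx - vy - k * wz) / wheel_radius
    front_right = (vx + vy + k * wz) / wheel_radius
    rear_left = (vx + vy - k * wz) / wheel_radius
    rear_right = (vx - vy + k * wz) / wheel_radius
    return front_left, front_right, rear_left, rear_right

# Example with made-up dimensions: strafe left at 0.2 m/s.
speeds = mecanum_wheel_speeds(0.0, 0.2, 0.0,
                              wheel_radius=0.04,
                              half_length=0.1, half_width=0.1)
```

In a pure strafe the diagonal wheel pairs spin in opposite directions, which is what lets the base move sideways without turning.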

Custom Model Training

Tools for collecting training data and fine-tuning YOLOv8:

  • Data Collection: collectData2DRGB.py captures calibrated images at 1-second intervals
  • Annotation: Images labeled with bounding boxes in Roboflow
  • Training: YOLOv8 models trained on domain-specific datasets (clothes, dirt)
  • Optimization: Models converted to ONNX and compiled to TensorRT for ~10x speedup on Jetson
  • Deployment: Dynamic model loading via ROS2 launch parameters

# The custom model can also be tested standalone with the Isaac ROS example launch file:
ros2 launch isaac_ros_examples isaac_ros_examples.launch.py \
  launch_fragments:=realsense_mono_rect_depth,yolov8 \
  model_file_path:=${ISAAC_ROS_WS}/models/yolov8/clothes.onnx \
  engine_file_path:=${ISAAC_ROS_WS}/models/yolov8/clothes.plan \
  confidence_threshold:=0.83

3D Visualization

Custom WebSocket-based viewer built with Three.js for real-time 3D visualization:

Features:

  • Dual-stream rendering: object-only clouds + full scene context
  • Live 3D center point markers with wireframe spheres
  • Adjustable point size (0.1-15px) via slider
  • Orbit controls (rotate, zoom, pan) with damping
  • Coordinate axes and ground grid for reference
  • RGB color preservation with sRGB conversion
  • Debug logging panel for status and statistics
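To illustrate the sRGB conversion mentioned in the list, the sketch below applies the standard sRGB transfer function to turn an 8-bit camera color channel into a linear value. Whether the viewer does this on the server or in the browser, and in which direction, is not stated here, so treat the placement and the name `srgb_to_linear` as assumptions.

```python
def srgb_to_linear(value_8bit):
    """Convert one 8-bit sRGB channel (0-255) to a linear value in [0, 1]."""
    c = value_8bit / 255.0
    if c <= 0.04045:
        return c / 12.92                      # linear toe segment
    return ((c + 0.055) / 1.055) ** 2.4       # power-law segment

def pixel_to_linear(rgb):
    """Convert an (r, g, b) tuple of 8-bit sRGB values to linear floats."""
    return tuple(srgb_to_linear(channel) for channel in rgb)

mid_gray = pixel_to_linear((128, 128, 128))
```

Mid-gray in sRGB (128) maps to roughly 0.22 in linear space, which is why skipping the conversion tends to make point colors look washed out or too dark, depending on the renderer's output settings.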

Architecture:

web_viewer_server_3d.py subscribes to the ROS2 topics and serves the data over WebSocket endpoints (/3d, /center, /full_scene). The HTML viewer connects to these endpoints and renders with Three.js, using efficient buffer geometry updates to sustain 15+ FPS.
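One way such a server could hand point data to the browser is as a compact binary message. The sketch below packs a cloud with Python's `struct` module. The wire format (a uint32 point count, then float32 XYZ triples, then uint8 RGB triples, all little-endian) is a hypothetical layout chosen for illustration, not the format web_viewer_server_3d.py actually uses.

```python
import struct

def pack_cloud(points, colors):
    """Pack XYZ floats and 8-bit RGB colors into one binary payload."""
    if len(points) != len(colors):
        raise ValueError("points and colors must have the same length")
    n = len(points)
    flat_xyz = [value for point in points for value in point]
    flat_rgb = [value for color in colors for value in color]
    # "<" selects little-endian with no padding between fields.
    return struct.pack(f"<I{3 * n}f{3 * n}B", n, *flat_xyz, *flat_rgb)

def unpack_cloud(payload):
    """Inverse of pack_cloud; returns (points, colors) as lists of tuples."""
    (n,) = struct.unpack_from("<I", payload, 0)
    xyz = struct.unpack_from(f"<{3 * n}f", payload, 4)
    rgb = struct.unpack_from(f"<{3 * n}B", payload, 4 + 12 * n)
    points = [tuple(xyz[i:i + 3]) for i in range(0, 3 * n, 3)]
    colors = [tuple(rgb[i:i + 3]) for i in range(0, 3 * n, 3)]
    return points, colors

payload = pack_cloud([(0.0, 0.5, 1.0), (1.5, -2.0, 0.25)],
                     [(255, 0, 0), (0, 128, 255)])
```

With this layout the position block starts at byte offset 4, so a browser client could view it directly as a Float32Array and copy it into a position attribute without parsing individual values.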