MobileRobot

A robot for autonomous navigation and object detection, currently in development. Initial applications are household cleaning and delivery tasks. It includes a GUI that shows detections and point clouds in 3D, along with each object's predicted 3D center point, which is used as the pick point.

In Development • 2024-present

Role

Software/Hardware Engineer

Tech

• ROS2 + Isaac ROS

• YOLOv8 TensorRT

• Three.js

• Python + OpenCV

Hardware Platform

Custom mobile robot platform featuring an Intel RealSense D455 depth camera, an NVIDIA Jetson Orin Nano compute module, and an omnidirectional mecanum drive system.

How to Run

cd /path/to/CleaningRobot

# Build
docker compose -f docker/docker-compose.yml build

# Run (this automatically runs entrypoint.sh inside the container)
docker compose -f docker/docker-compose.yml up

Robot in Motion

There is consistent vibration from the small, hard mecanum wheels. I expected some because of the omnidirectional design, but as the video shows, there is quite a bit. It will be an interesting stress test for visual SLAM and point cloud stability once those are implemented. Realistically, though, this mechanical noise will likely degrade pose estimation and depth consistency, so I will need a mechanical fix soon as well.

Demo Clips

Short captures demonstrating picking and point-cloud visualization during runtime.

Picking demo (bloopers for now)

Point cloud demo

System Architecture

3D Object Detection & Visualization

1. Sensor Input

The RealSense D455 streams 720p RGB and aligned depth at 30 FPS, along with IMU data, via the /color, /depth, and /imu topics.

2. Object Detection

YOLOv8 TensorRT detects custom objects (clothes, dirt, debris), keeping detections at or above a 0.83 confidence threshold. It outputs bounding boxes via /detections_output.

3. Point Cloud Generation

For each YOLO bounding box, crop the corresponding depth ROI and keep only pixels in the 100-5000 mm range. Compute the median depth of the central 40% window to establish the object plane. Keep pixels within ±50 mm of that median to isolate the object from the floor and background. Project the remaining pixels to 3D using the camera intrinsics (fx, fy, cx, cy).
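The steps above can be sketched with NumPy roughly as follows. This is a minimal illustration, not the project's actual code: the function name roi_to_points, the (x0, y0, x1, y1) box layout, and the reading of "central 40% window" as 40% of the box width and height are all assumptions. The projection itself is the standard pinhole model, x = (u - cx) * z / fx and y = (v - cy) * z / fy.

```python
import numpy as np


def roi_to_points(depth_mm, bbox, fx, fy, cx, cy,
                  min_mm=100, max_mm=5000, center_frac=0.4, band_mm=50):
    """Return an (N, 3) float32 array of camera-frame points in meters.

    depth_mm: HxW depth image in millimeters, aligned to the color image.
    bbox: (x0, y0, x1, y1) in pixels, already clipped to the image,
          with x1 and y1 exclusive (hypothetical layout for this sketch).
    """
    x0, y0, x1, y1 = bbox
    roi = depth_mm[y0:y1, x0:x1].astype(np.float32)
    valid = (roi >= min_mm) & (roi <= max_mm)

    # Median depth of the central window approximates the object plane.
    h, w = roi.shape
    dh, dw = int(h * center_frac / 2), int(w * center_frac / 2)
    rows = slice(h // 2 - dh, h // 2 + dh + 1)
    cols = slice(w // 2 - dw, w // 2 + dw + 1)
    window = roi[rows, cols][valid[rows, cols]]
    if window.size == 0:
        return np.empty((0, 3), dtype=np.float32)
    median_mm = np.median(window)

    # Keep only pixels close to that plane; this drops floor and background.
    keep = valid & (np.abs(roi - median_mm) <= band_mm)
    v, u = np.nonzero(keep)          # row and column indices inside the ROI
    z = roi[v, u] / 1000.0           # millimeters to meters

    # Pinhole back-projection, using full-image pixel coordinates.
    x = (u + x0 - cx) * z / fx
    y = (v + y0 - cy) * z / fy
    return np.stack([x, y, z], axis=1).astype(np.float32)
```

Taking the median over the central window, rather than the mean over the whole box, keeps the plane estimate from being pulled toward floor pixels near the box edges.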

4. Pose Estimation & Visualization

Compute the object center as the per-axis median of the point cloud coordinates. Publish the results via /detected_objects_pointcloud and /detected_objects_3d, and stream them to a WebSocket server for real-time Three.js visualization at 15+ FPS.
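As a rough sketch of the center computation, the per-axis median can be taken with the standard library alone. The function names and the JSON message layout below are hypothetical and only illustrate the idea; the actual message format used on the ROS2 topics and WebSocket endpoints is not shown in this document.

```python
import json
from statistics import median


def object_center(points):
    """Per-axis median of (x, y, z) points, or None if there are no points."""
    pts = list(points)
    if not pts:
        return None
    xs, ys, zs = zip(*pts)
    return (median(xs), median(ys), median(zs))


def center_message(label, points):
    """Build a JSON string for one object (hypothetical message layout)."""
    pts = list(points)
    return json.dumps({
        "label": label,
        "center": object_center(pts),
        "count": len(pts),
    })
```

The median is a reasonable choice for a pick point because a few stray depth pixels that survive filtering barely move it, whereas they would shift a mean.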

Autonomous Navigation (Planned)

1. Visual SLAM

Isaac ROS Visual SLAM processes camera feed and IMU data to generate real-time odometry (/odom) and global pose estimates (/pose).

2. 3D Mapping

Isaac ROS nvblox constructs real-time 3D occupancy maps and 2D costmaps from depth data, enabling obstacle-aware path planning.

3. Path Planning

Nav2, the ROS 2 navigation stack, computes collision-free trajectories from the current pose to goal positions using the costmaps and dynamic replanning.

4. Motion Control

Robot controller translates path commands to motor velocities via /cmd_vel, actuating the mecanum wheel base for omnidirectional movement.
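For reference, the usual inverse kinematics for a four-wheel mecanum base can be sketched as below. It assumes the common layout with rollers at 45 degrees and the ROS axis convention (x forward, y left, positive wz counterclockwise). The class and function names are hypothetical, the geometry values must be measured on the actual chassis, and the signs may need flipping depending on how the wheels and motors are mounted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MecanumGeometry:
    wheel_radius: float  # meters
    half_length: float   # meters, half the front-to-rear axle distance
    half_width: float    # meters, half the left-to-right wheel distance


def wheel_speeds(geom, vx, vy, wz):
    """Map a body twist to wheel angular velocities in rad/s.

    vx, vy are linear velocities in m/s and wz is the yaw rate in rad/s,
    as carried by a geometry_msgs/Twist on /cmd_vel.
    Returns (front_left, front_right, rear_left, rear_right).
    """
    k = geom.half_length + geom.half_width
    r = geom.wheel_radius
    front_left = (vx - vy - k * wz) / r
    front_right = (vx + vy + k * wz) / r
    rear_left = (vx + vy - k * wz) / r
    rear_right = (vx - vy + k * wz) / r
    return front_left, front_right, rear_left, rear_right
```

With this convention, a pure forward command spins all four wheels equally, while a pure sideways command spins diagonal wheel pairs in opposite directions.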

Custom Model Training

Tools for collecting training data and fine-tuning YOLOv8:

Data Collection: collectData2DRGB.py captures calibrated images at 1-second intervals
Annotation: Images labeled using Roboflow for bounding box annotation
Training: YOLOv8 models trained on domain-specific datasets (clothes, dirt)
Optimization: Models converted to ONNX and compiled to TensorRT for ~10x speedup on Jetson
Deployment: Dynamic model loading via ROS2 launch parameters

# The custom model can also be tested standalone with the Isaac ROS example launch file:
ros2 launch isaac_ros_examples isaac_ros_examples.launch.py \
  launch_fragments:=realsense_mono_rect_depth,yolov8 \
  model_file_path:=${ISAAC_ROS_WS}/models/yolov8/clothes.onnx \
  engine_file_path:=${ISAAC_ROS_WS}/models/yolov8/clothes.plan \
  confidence_threshold:=0.83

3D Visualization

Custom WebSocket-based viewer built with Three.js for real-time 3D visualization:

Features:

Dual-stream rendering: object-only clouds + full scene context
Live 3D center point markers with wireframe spheres
Adjustable point size (0.1-15px) via slider
Orbit controls (rotate, zoom, pan) with damping
Coordinate axes and ground grid for reference
RGB color preservation with sRGB conversion
Debug logging panel for status and statistics

Architecture:

web_viewer_server_3d.py subscribes to ROS2 topics and serves data via WebSocket endpoints (/3d, /center, /full_scene). The HTML viewer connects and renders using Three.js with efficient buffer geometry updates at 15+ FPS.
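One way to keep buffer geometry updates cheap is to send each cloud as a single binary frame that the browser can view as typed arrays without parsing. The sketch below shows a possible packing on the Python side. The wire layout (a little-endian uint32 point count, then float32 x, y, z triples, then uint8 r, g, b triples) and the name pack_cloud are assumptions for illustration, not the format web_viewer_server_3d.py actually uses.

```python
import struct


def pack_cloud(points, colors):
    """Pack points and colors into one binary frame (hypothetical layout).

    points: sequence of (x, y, z) floats in meters.
    colors: sequence of (r, g, b) integers in 0-255, one per point.
    Layout: uint32 count | count * 3 float32 | count * 3 uint8, little-endian.
    """
    if len(points) != len(colors):
        raise ValueError("points and colors must have the same length")
    n = len(points)
    flat_xyz = [c for p in points for c in p]
    flat_rgb = [c for col in colors for c in col]
    return struct.pack(f"<I{3 * n}f{3 * n}B", n, *flat_xyz, *flat_rgb)
```

Because the header is 4 bytes, the position data starts at a 4-byte-aligned offset, so the browser could wrap it in a Float32Array over the received ArrayBuffer and pass that to a Three.js BufferAttribute without copying.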

Hardware (parts list)

NVIDIA Jetson Orin Nano Developer Kit
Waveshare RoArm M2-S 4-axis robotic arm
Intel RealSense D455
HiWonder Black Mecanum Chassis Kit
- 4× JGB37-528 encoded geared motors
- 4× Mecanum omni-directional wheels
- Layered aluminum alloy frame
- Motor driver (4-channel)
Logic Level Shifter
11.1V LiPo Battery
Power Distribution Board
Adjustable Buck Converter
3D Printed Bracket
Main Power Switch

Tech Stack

• ROS2 Humble

• Isaac ROS (NVIDIA)

• YOLOv8 + TensorRT

• Python 3.10

• OpenCV + NumPy

• Three.js (WebGL)

• WebSockets

Performance

Detection: ~15 FPS
Points/object: 500-5000
Full scene: 25k pts @ 15 FPS
Floor plane: 2000+ inliers
3D pose latency: <100ms

Trained Models

clothes.onnx — Clothes on carpet
clean.onnx — Dirt/Debris/Dust on wood
trash.onnx — Trash on wood

ROS2 Topics

📥 /detections_output

📥 /aligned_depth_to_color

📥 /image_rect

📤 /detected_objects_pointcloud

📤 /detected_objects_3d

📤 /full_scene_pointcloud

Future Work

Visual SLAM integration
nvblox 3D mapping
Nav2 autonomous navigation
Multi-object tracking
Grasp pose estimation
