Visual Odometry (VO) is a technique used in robotic navigation in which the position and orientation of a robot are estimated by analyzing the sequence of images captured by an onboard camera. It was notably used in NASA's Mars Exploration Rover mission in 2004, demonstrating its practical value in the field.
The early developments in VO, including technologies like MonoSLAM, have been foundational in the advancement of robotic navigation. These technologies provide a basis for robots to understand and move through their environment using visual data.
This project centers on applying and exploring ORB-SLAM3, a SLAM/visual odometry system, in a robotics context.
 
Direct methods of visual odometry estimate motion from raw pixel intensity values, without explicitly extracting features: the motion between frames is found by comparing pixel intensities across successive images. This approach is particularly robust in low-texture environments where feature extraction is difficult, although it is sensitive to lighting changes and can require more computation. A closely related example is Semi-direct Visual Odometry (SVO), which combines feature-based and direct approaches.
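The idea of comparing intensities directly can be illustrated with a deliberately simplified sketch. It assumes the motion between two small grayscale images is a pure integer 2D translation and finds it by exhaustive search over the photometric error. Real direct systems instead optimize a full camera pose with sub-pixel warps. The function names, the `max_shift` parameter, and the synthetic test image are all hypothetical, chosen only for illustration.

```python
def photometric_error(ref, cur, dx, dy):
    """Mean squared intensity difference between ref and cur shifted by (dx, dy).

    Only pixels where the two images overlap are compared.
    """
    height, width = len(ref), len(ref[0])
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            u, v = x + dx, y + dy  # where pixel (x, y) of ref is expected in cur
            if 0 <= u < width and 0 <= v < height:
                diff = cur[v][u] - ref[y][x]
                total += diff * diff
                count += 1
    return total / count if count else float("inf")


def estimate_shift(ref, cur, max_shift=2):
    """Try every integer shift up to max_shift and keep the lowest-error one."""
    candidates = [(dx, dy)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates, key=lambda s: photometric_error(ref, cur, s[0], s[1]))


# Synthetic textured image, and a copy moved one pixel right and one pixel down.
reference = [[(x * x + 3 * y * y + x * y) % 17 for x in range(8)] for y in range(8)]
current = [[reference[y - 1][x - 1] if x >= 1 and y >= 1 else 0 for x in range(8)]
           for y in range(8)]
print(estimate_shift(reference, current))
```

No keypoints are detected anywhere in this sketch: the estimate comes entirely from intensity differences, which is also why a global brightness change between the two frames would distort the error.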
Indirect methods, also known as feature-based methods, extract distinct features such as points or edges from images and then match them across frames to estimate motion. This approach tends to be more robust to lighting changes and is generally more efficient, but it can struggle in environments with few distinct features or in highly dynamic scenes. ORB-SLAM, which uses Oriented FAST and Rotated BRIEF (ORB) features, is a widely used example of an indirect method.
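The matching step of a feature-based pipeline can be sketched as follows. Binary descriptors such as those produced by ORB are compared with the Hamming distance (the number of differing bits). The sketch below does brute-force matching with a mutual nearest-neighbour check. It is an illustrative assumption, not ORB-SLAM's actual matcher: the descriptors are toy 8-bit integers (real ORB descriptors have 256 bits), and the function names and `max_distance` threshold are hypothetical.

```python
def hamming_distance(a, b):
    """Number of differing bits between two descriptors stored as Python ints."""
    return bin(a ^ b).count("1")


def match_descriptors(desc_a, desc_b, max_distance=2):
    """Brute-force matching with a mutual nearest-neighbour (cross-check) test.

    Returns a list of (index_in_a, index_in_b, distance) tuples.
    """
    if not desc_a or not desc_b:
        return []

    def nearest(query, candidates):
        # Index of the candidate with the smallest Hamming distance to query.
        return min(range(len(candidates)),
                   key=lambda j: hamming_distance(query, candidates[j]))

    matches = []
    for i, descriptor in enumerate(desc_a):
        j = nearest(descriptor, desc_b)
        if nearest(desc_b[j], desc_a) != i:
            continue  # not mutual: some other descriptor in A is closer to B[j]
        distance = hamming_distance(descriptor, desc_b[j])
        if distance <= max_distance:
            matches.append((i, j, distance))
    return matches


# Toy 8-bit descriptors from two frames.
frame_a = [0b10110010, 0b01001101, 0b11110000]
frame_b = [0b01001111, 0b10110011, 0b00001111]
print(match_descriptors(frame_a, frame_b))
```

The matched keypoint pairs are what a feature-based system then feeds into geometric motion estimation. When a scene offers few distinctive features, this list becomes short or unreliable, which is the weakness noted above.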
In addition to purely direct or indirect approaches, hybrid methods combine elements of both, aiming to leverage the strengths of each to improve accuracy and robustness. Semi-direct Visual Odometry (SVO) is a prime example: it pairs feature extraction with pixel intensity analysis, so that features provide accurate correspondences while motion is estimated directly from pixel data. For more on SVO, see Semi-direct Visual Odometry (SVO) at the University of Zurich.
The ORB-SLAM3 paper reviews recent advances in visual SLAM and VO, highlighting the role of loop closing in bridging the two. It underscores the importance of maximum a posteriori (MAP) estimation and bundle adjustment (BA), in both geometric and photometric forms. The work goes beyond simple ego-motion tracking and focuses on building and reusing comprehensive maps of the environment. It describes three types of data association (short-term, mid-term, and long-term) and builds on the foundations set by ORB-SLAM and ORB-SLAM Visual-Inertial. The paper also introduces multi-map data association, which aims to improve map accuracy and usability, a core objective of SLAM systems.
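To make the geometric side of BA concrete: geometric bundle adjustment minimizes the reprojection error, the pixel distance between where a mapped 3D point is predicted to appear and where its keypoint was actually observed. Photometric variants minimize intensity differences instead. The sketch below evaluates a single such residual under a pinhole camera model. It is not ORB-SLAM3 code: the function names and intrinsics values are hypothetical, the point is assumed to be already expressed in the camera frame, and no optimization is performed.

```python
def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (X, Y, Z) in the camera frame to pixels."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(point_cam, observed_px, fx, fy, cx, cy):
    """Squared pixel distance between the predicted and the observed keypoint."""
    u, v = project(point_cam, fx, fy, cx, cy)
    du, dv = u - observed_px[0], v - observed_px[1]
    return du * du + dv * dv


# Illustrative intrinsics (focal lengths and principal point, in pixels).
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0
landmark = (1.0, 0.5, 2.0)   # metres, camera frame
keypoint = (573.0, 361.0)    # observed pixel position
print(project(landmark, FX, FY, CX, CY))
print(reprojection_error(landmark, keypoint, FX, FY, CX, CY))
```

A full BA problem sums this residual over every observation of every map point and adjusts the camera poses and point positions together to reduce the total.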
This work is grounded in the principles established by two key papers, ORB-SLAM2 and ORB-SLAM VI (Visual-Inertial Monocular SLAM with Map Reuse), which have been instrumental in advancing SLAM technologies in varied environments.
Feature extraction plays a pivotal role in the implementation of ORB-SLAM. It involves identifying and using key points in images to track the movement and orientation of a camera through an environment. ORB-SLAM leverages the ORB (Oriented FAST and Rotated BRIEF) algorithm for this purpose due to its efficiency and effectiveness in real-time applications. This is crucial for ORB-SLAM, which requires quick and reliable feature detection and matching to accurately map and navigate through spaces.
SIFT (Scale-Invariant Feature Transform): rich feature detection, ideal for scale and rotation variations; high computational resource requirement.
SURF (Speeded Up Robust Features): faster than SIFT, with similar feature richness; effective in image matching and 3D reconstruction.
ORB (Oriented FAST and Rotated BRIEF): fast and efficient, suitable for real-time applications; rotation invariant and less resource-intensive.
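The rotation invariance listed for ORB comes from assigning each keypoint an orientation using the intensity centroid: the angle from the patch centre to the intensity-weighted centre of mass of the patch. The sketch below computes that angle for a small square patch. It is a simplification (ORB uses a circular patch around the keypoint), and the function name and toy patches are hypothetical.

```python
import math


def patch_orientation(patch):
    """Angle in radians from the patch centre to its intensity centroid.

    patch is a square list of rows of grayscale intensities. Image rows grow
    downward, so a positive angle points toward the bottom of the patch.
    """
    centre = (len(patch) - 1) / 2.0
    m10 = m01 = 0.0  # first-order image moments about the patch centre
    for row, values in enumerate(patch):
        for col, intensity in enumerate(values):
            m10 += (col - centre) * intensity
            m01 += (row - centre) * intensity
    return math.atan2(m01, m10)


bright_right = [[0, 0, 9],
                [0, 0, 9],
                [0, 0, 9]]
bright_bottom = [[0, 0, 0],
                 [0, 0, 0],
                 [9, 9, 9]]
print(patch_orientation(bright_right))
print(patch_orientation(bright_bottom))
```

The BRIEF sampling pattern is then rotated by this angle before the descriptor is computed, so the same corner seen at different in-plane rotations yields similar bits.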
In this phase of the project, I began by setting up a test environment in Gazebo with ROS. The chosen environment was an office world from the collection available at Gazebo Models and Worlds Collection by leonhartyao. This selection provided a realistic and complex setting, well suited to testing the Boston Dynamics Spot model in Gazebo.
To integrate the Spot model, I used CHAMP, an open-source framework found at CHAMP by chvmp. CHAMP is designed for building quadrupedal robots and developing control algorithms, drawing inspiration from the hierarchical control implementation on the MIT Cheetah robot. This framework was crucial in setting up the Spot model, enabling dynamic locomotion control and effective simulation.
The control of the robot's movement was facilitated by champ_teleop, a fork of teleop_twist_keyboard adapted for quadruped robots, which can be found at champ_teleop by chvmp. This software modification allowed for the control of the robot's entire body pose, including roll, pitch, and yaw.
For seamless ROS integration, the Spot ROS Driver from chvmp/spot_ros (branch: gazebo) was employed. This integration enabled effective communication between the Gazebo simulation and the ROS environment.
Lastly, to acquire the specific Boston Dynamics' Spot model needed for the project, I used resources from chvmp/robots. This repository provided a range of pre-configured robots, including the Spot model, complete with instructions for running demonstrations in Gazebo.
This setup formed the foundation of my project, allowing for a comprehensive simulation of the Boston Dynamics Spot model in a realistic office environment and paving the way for further development and experimentation in autonomous robotics.
ORB-SLAM3, released in 2021 by Carlos Campos, Richard Elvira, Juan J. Gómez Rodríguez, José M. M. Montiel, and Juan D. Tardós, represents a significant advancement in SLAM technology. The system supports monocular, stereo, and RGB-D cameras, with both pinhole and fisheye lens models. According to its authors, in all sensor configurations ORB-SLAM3 is as robust as the best systems available and markedly more accurate.
Examples provided allow for running ORB-SLAM3 on datasets like EuRoC and TUM-VI, with configurations ranging from stereo or monocular setups to those with or without IMU. The software is a continuation of the work done in ORB-SLAM2, with significant enhancements.
For my project, I utilized the source code of ORB-SLAM3, available at UZ-SLAMLab/ORB_SLAM3 on GitHub. Integrating it into my test environment proved to be a learning experience, as it required a detailed understanding of the necessary nodes and adapting the system to fit my specific needs. This process involved not only integrating but also re-implementing parts of the ORB-SLAM3 code, particularly the tracking components. This effort was aimed at gaining a deeper insight into the workings of the ORB algorithms and enhancing my practical skills in advanced SLAM technologies.
The teleoperation test was conducted using Telo Key, enabling remote control of the robot. This test demonstrated the robot's responsiveness and maneuverability under direct human control.