University of Essex
Background
Visual simultaneous localisation and mapping (SLAM) is one of the most important research areas in machine vision and robotics, with a broad range of vertical applications spanning from autonomous vehicles to virtual reality. SLAM systems estimate the robot pose and build maps of the environment; their uses also include inspection robots, for example for the visual inspection and assessment of industrial equipment and infrastructure in harsh environments.
The majority of visual SLAM techniques are based on vision geometry and optimisation algorithms and use stereo images. These systems cannot learn automatically from raw images or benefit from continually growing datasets. Some visual SLAM techniques are based on deep neural networks, but these systems are trained on predefined, labelled datasets. Labelling large amounts of data is difficult and expensive, which limits the potential application scenarios. Furthermore, visual SLAM systems have typically suffered from reduced accuracy in challenging conditions such as low light. These limitations have significantly impeded the adoption of SLAM technologies in key application industries.
Researchers at the University of Essex have recently made a significant breakthrough. Their system outperforms existing SLAM systems in both accuracy and robustness, and it has the potential to improve further as more training data becomes available.
A team at the University of Essex have developed a novel monocular SLAM approach, based on deep neural networks, which generates a pose trajectory, depth map, and 3D point cloud simultaneously. This patented system is trained using unsupervised deep learning, so it can benefit from continually growing datasets and can update and improve its performance in real time as image data is received from each new environment. It is therefore not restricted to the finite, pre-set environment of the training data.
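To illustrate how depth and pose can be learned without labels, the sketch below shows a view-synthesis (photometric) loss of the kind commonly used for unsupervised depth and pose learning. The listing does not specify the training objective used in the Essex system, so the choice of loss, the function name photometric_loss, the pinhole intrinsics K, and the nearest-neighbour sampling are all assumptions made for illustration. The idea is that a correct depth map and camera motion let one image be re-rendered from its neighbour, so the image difference itself serves as the training signal.

```python
import numpy as np

def photometric_loss(target, source, depth, K, T_target_to_source):
    """Mean absolute intensity error after warping `source` into the target view.

    Hypothetical illustration, not the patented training scheme.
    target, source: (H, W) grayscale images.
    depth: (H, W) predicted depth of the target image (> 0).
    K: (3, 3) pinhole intrinsics, assumed known.
    T_target_to_source: (4, 4) rigid transform from target to source camera frame.
    """
    h, w = target.shape
    v, u = np.mgrid[0:h, 0:w]
    pixels = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # (3, N) homogeneous

    # Back-project target pixels to 3D points using the predicted depth.
    points = (np.linalg.inv(K) @ pixels) * depth.ravel()

    # Move the points into the source camera frame and project them.
    R, t = T_target_to_source[:3, :3], T_target_to_source[:3, 3:4]
    proj = K @ (R @ points + t)
    z = proj[2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)
    u_src = np.rint(proj[0] / safe_z).astype(int)  # nearest-neighbour sampling
    v_src = np.rint(proj[1] / safe_z).astype(int)

    # Compare only pixels that land inside the source image.
    valid = in_front & (u_src >= 0) & (u_src < w) & (v_src >= 0) & (v_src < h)
    if not valid.any():
        return 0.0  # no overlap between the two views, nothing to compare
    warped = source[v_src[valid], u_src[valid]]
    return float(np.mean(np.abs(target.ravel()[valid] - warped)))

# Toy example: intensity equals the column index, and the scene appears
# one pixel further right in the source image.
K = np.array([[10.0, 0.0, 2.0], [0.0, 10.0, 2.0], [0.0, 0.0, 1.0]])
target = np.tile(np.arange(6, dtype=float), (4, 1))
source = target - 1.0
depth = np.full(target.shape, 2.0)

correct_motion = np.eye(4)
correct_motion[0, 3] = 0.2  # shifts the projection by fx * tx / z = 1 pixel

loss_correct = photometric_loss(target, source, depth, K, correct_motion)
loss_identity = photometric_loss(target, source, depth, K, np.eye(4))
```

In this toy case the loss is zero for the correct motion and one intensity level for the identity guess, which is the property a training procedure would exploit. A practical implementation would use differentiable bilinear sampling inside a deep learning framework rather than nearest-neighbour lookup.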
The essential elements of the invention are the system architecture, the system training scheme, and the computer code.
The system architecture includes two main deep neural networks: Mapping-Net and Tracking-Net. The system input is a monocular image sequence; for each image, Mapping-Net estimates the depth and Tracking-Net estimates the pose. Additionally, the system includes a loop detection network (Loop-Net) that assesses spatial and temporal image losses to reduce the accumulated drift in pose estimation and fine-tune the pose accuracy, thus continuously re-training the system in each new environmental context.
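The data flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: mapping_net_stub and tracking_net_stub are hypothetical stand-ins that return fixed values instead of running trained networks, the intrinsics K are assumed known, and Loop-Net is not modelled. The sketch only shows how per-image depth and pose estimates combine into a pose trajectory and a 3D point cloud.

```python
import numpy as np

def mapping_net_stub(image):
    """Hypothetical stand-in for Mapping-Net: a constant 2.0 depth map."""
    return np.full(image.shape, 2.0)

def tracking_net_stub(prev_image, image):
    """Hypothetical stand-in for Tracking-Net: constant 0.1 forward motion,
    returned as the 4x4 pose of the current camera in the previous camera frame."""
    T = np.eye(4)
    T[2, 3] = 0.1
    return T

def backproject(depth, K):
    """Lift every pixel to a 3D point in the camera frame (pinhole model)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pixels = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    return ((np.linalg.inv(K) @ pixels) * depth.ravel()).T  # (N, 3)

def run_sequence(images, K):
    """Return the camera-to-world trajectory and a fused world-frame point cloud."""
    pose = np.eye(4)  # the first camera frame defines the world frame
    trajectory = [pose.copy()]
    clouds = []
    prev = None
    for image in images:
        if prev is not None:
            # Chain the relative pose onto the accumulated pose.
            pose = pose @ tracking_net_stub(prev, image)
            trajectory.append(pose.copy())
        depth = mapping_net_stub(image)
        points_cam = backproject(depth, K)
        # Express the points in the world frame using the current pose.
        clouds.append(points_cam @ pose[:3, :3].T + pose[:3, 3])
        prev = image
    return trajectory, np.vstack(clouds)

# Three blank 4x4 frames are enough to exercise the data flow.
K = np.array([[10.0, 0.0, 2.0], [0.0, 10.0, 2.0], [0.0, 0.0, 1.0]])
frames = [np.zeros((4, 4)) for _ in range(3)]
trajectory, cloud = run_sequence(frames, K)
```

Because each relative pose is chained onto the previous estimate, small per-frame errors accumulate along the trajectory. That accumulated drift is what the loop detection step is described as reducing.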
Unsupervised learning reduces the system’s reliance on annotated ground-truth datasets, decreasing the demand for labelled data, which is limited in availability and costly and labour-intensive to generate
Application environments are not restricted to those with annotated ground-truth datasets, greatly expanding the range of applications compared with supervised SLAM systems
Performance is continuously improved in each new environment
Enhanced performance in challenging environments such as low light, using just a single, cheap camera
This technology could be employed in the following areas:
Unmanned aerial vehicles
Autonomous underwater vehicles
Space exploration vehicles and robots
Healthcare (invasive surgeries)
Augmented and Virtual Reality
The university is looking for partners who wish to license this technology and/or collaborate in the development of novel systems trained on environments for specific industry applications.