Monday, September 26, 2011

The next step: loop detection and graph optimization

Last week I summarized the work I had been doing during the summer for my Final Year Project. In that entry I also published a video showing the reconstruction of a map with the latest version of my FYP. The problem with that implementation was that it relied on the alignment of consecutive frames (pairwise alignment), so it suffered from error accumulation.

In today's entry I will briefly talk about this week's progress, as well as the steps I'm taking to accomplish the next milestone: loop detection and graph optimization to avoid error accumulation.

Loop detection and graph optimization: in this phrase we can distinguish two different, though closely linked, tasks. The first task consists in identifying a previously visited place (a loop). When a loop is detected, a new relation is added to the graph, linking the current pose with the past pose at which we visited the same place. The second task tries to reduce the error accumulated by pose estimation based on pairwise alignment.

Thus, each time a certain place is revisited (a loop is detected), a new relation is added to the graph and the graph optimization process is launched to minimize the accumulated error. This yields more consistent maps, especially when a place is revisited after a long trajectory.
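The interplay between the two tasks can be illustrated with a toy example. This is only a sketch, not my implementation: in the real project poses are 6D and I plan to use a dedicated framework, whereas here the poses are scalars along a line, the odometry and loop-closure values are made up, and the graph is solved as a plain linear least-squares problem with NumPy.

```python
import numpy as np

def optimize_pose_graph_1d(num_poses, edges):
    """Solve a tiny 1D pose graph by linear least squares.

    Each edge (i, j, z) is a constraint x_j - x_i ≈ z. Pose 0 is fixed
    at the origin to anchor the graph, so the unknowns are x_1..x_{n-1}.
    """
    A = np.zeros((len(edges), num_poses - 1))
    b = np.zeros(len(edges))
    for row, (i, j, z) in enumerate(edges):
        if i > 0:
            A[row, i - 1] = -1.0
        if j > 0:
            A[row, j - 1] = 1.0
        b[row] = z
    x_rest, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.concatenate(([0.0], x_rest))

# Odometry claims +1.1 m per step (drifting), but a loop-closure edge
# says pose 4 is back where pose 0 was.
edges = [(0, 1, 1.1), (1, 2, 1.1), (2, 3, 1.1), (3, 4, 1.1),
         (4, 0, 0.0)]  # the loop-closure constraint
poses = optimize_pose_graph_1d(5, edges)
print(poses)  # ≈ [0, 0.22, 0.44, 0.66, 0.88]: the drift is redistributed
```

The point of the example is that the loop-closure edge does not just correct the last pose: the optimization spreads the accumulated error over the whole trajectory, which is exactly the behaviour I want for the map.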

I haven't integrated the graph optimization process into my project yet; nevertheless, I'm working in that direction. I first decided to implement a new version of my project that can detect loops. To this end, I opted for a simple implementation based on visual feature matching to determine whether an area has been visited before.

Broadly speaking, what I did was the following:
  • I have created a keyframe chain that stores a subset of previously grabbed frames.
  • In each iteration, if the camera has shifted substantially, feature matching is performed against the stored keyframes. If the number of resulting inliers is above a certain threshold, I consider that the current frame and the keyframe against which pairwise alignment is performed correspond to the same place.
Furthermore, to avoid the performance penalty of matching features against every stored keyframe, I take the pose information of each keyframe into account. Thus, feature matching is only performed with keyframes whose poses are close enough to the current pose.
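The procedure above can be sketched roughly as follows. This is a simplified stand-in for my actual code: it uses synthetic NumPy descriptors and plain nearest-neighbour matching with a ratio test instead of SURF features with RANSAC inlier counting, and the function names and threshold values are made up for illustration.

```python
import numpy as np

POSE_RADIUS = 1.0      # metres; gate for candidate keyframes (illustrative value)
MATCH_THRESHOLD = 30   # matches needed to declare a loop (illustrative value)
RATIO = 0.8            # Lowe-style ratio test

def count_matches(desc_a, desc_b):
    """Nearest-neighbour descriptor matching with a ratio test."""
    matches = 0
    for d in desc_a:
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        if dists[order[0]] < RATIO * dists[order[1]]:
            matches += 1
    return matches

def detect_loop(current_pose, current_desc, keyframes):
    """Return the index of a keyframe seen at the same place, or None.

    keyframes is a list of (pose, descriptors) pairs; only keyframes whose
    stored pose lies within POSE_RADIUS of the current pose are matched.
    """
    for idx, (kf_pose, kf_desc) in enumerate(keyframes):
        if np.linalg.norm(current_pose - kf_pose) > POSE_RADIUS:
            continue  # too far away: skip the expensive matching
        if count_matches(current_desc, kf_desc) >= MATCH_THRESHOLD:
            return idx
    return None

# Tiny demo with synthetic descriptors: keyframe 0 plays "the same place".
rng = np.random.default_rng(0)
kf0_desc = rng.normal(size=(40, 64))
keyframes = [(np.array([0.0, 0.0, 0.0]), kf0_desc),
             (np.array([5.0, 0.0, 0.0]), rng.normal(size=(40, 64)))]
current_desc = kf0_desc + rng.normal(scale=1e-3, size=kf0_desc.shape)
loop_idx = detect_loop(np.array([0.1, 0.0, 0.0]), current_desc, keyframes)
print(loop_idx)  # keyframe 0 is recognised
```

Note how the pose gate in `detect_loop` means the second keyframe is never matched at all, which is where the performance saving comes from.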

I have run some tests in several rooms and the loop detection results are acceptable. In the coming days I will try to integrate the graph optimization process to see if I manage to reduce the accumulated error this way. The framework I intend to use for this task is g2o, which has also been used in "Realtime Visual and Point Cloud SLAM" by Nicola Fioraio and Kurt Konolige, which can be found here:

Sunday, September 18, 2011

My Final Year Project: a brief overview of the summer work

One of the reasons I created this blog is to report the progress of my Final Year Project (FYP). I started working on my FYP in early June 2011, although I had been thinking about the subject of my project since mid-March. The title of my FYP is "Development and implementation of a method of generating 3D maps using the Kinect sensor", the aim of which is to develop a 6D SLAM application using RGB images and point clouds captured from the Kinect.

As I was saying, I started the project in early June, and since then I have been working hard to finish it as soon as possible. The first stages of my FYP consisted of reading a series of articles related to my project, as well as becoming familiar with the open source libraries that I would use to implement it.

Some of the articles I have read and on which I am basing my project are:
  • Simultaneous Localization and Mapping (SLAM): Part I, by Hugh Durrant-Whyte and Tim Bailey
  • Simultaneous Localization and Mapping (SLAM): Part II, by Hugh Durrant-Whyte and Tim Bailey
  • RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments, by Peter Henry, Michael Krainin, Evan Herbst, Xiaofeng Ren and Dieter Fox.
  • Generalized-ICP, by Aleksandr V. Segal, Dirk Haehnel and Sebastian Thrun.
  • Real-time 3D visual SLAM with a hand-held RGB-D camera, by Nikolas Engelhard, Felix Endres, Jürgen Hess, Jürgen Sturm and Wolfram Burgard.
  • Scene reconstruction from Kinect motion, by Marek Šolony.
  • Realtime Visual and Point Cloud SLAM, by Nicola Fioraio and Kurt Konolige.
The main open source libraries I am using for my project are the following:
  • OpenCV:
  • PCL:
  • MRPT:
  • CUDA:
I am also using the original implementation of GICP, which relies on the ANN and GSL libraries. You can find the original GICP code at the following link:

The first developments were some tests, such as 2D feature matching and 3D point cloud alignment.

2D feature matching
ICP alignment (only)

The next step consisted of developing a first SLAM application using point clouds only. In this first approach I used the PCL ICP implementation to align consecutive point clouds. The main problem with this version was that ICP wasn't initialised, so both the convergence time and the estimated poses were poor.

To reduce the execution time of ICP and get better convergence, I started to work on the visual pose approximation. The first thing I did was to get the 3D points corresponding to the 2D features of the RGB image.

3D points corresponding to 2D visual features
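The mapping from 2D features to 3D points follows the standard pinhole back-projection model. The sketch below is not my actual code: the intrinsic parameters are commonly quoted approximations for the Kinect rather than calibrated values, and the function names are invented for illustration.

```python
import numpy as np

# Approximate Kinect RGB camera intrinsics (illustrative, not calibrated).
FX, FY = 525.0, 525.0   # focal lengths in pixels
CX, CY = 319.5, 239.5   # principal point

def backproject(u, v, z):
    """Back-project pixel (u, v) with depth z (metres) to a 3D camera-frame point."""
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.array([x, y, z])

def features_to_3d(keypoints, depth_map):
    """Map 2D feature pixels to 3D points, discarding pixels without depth."""
    points = []
    for u, v in keypoints:
        z = depth_map[v, u]
        if z > 0:  # the Kinect reports 0 where depth is unknown
            points.append(backproject(u, v, z))
    return np.array(points)

depth_map = np.full((480, 640), 2.0)   # fake flat scene 2 m away
depth_map[0, 0] = 0.0                  # one invalid depth reading
pts = features_to_3d([(0, 0), (320, 240)], depth_map)
print(pts)  # only the pixel with valid depth survives
```

Filtering out features that fall on invalid depth pixels matters in practice, since the Kinect leaves holes in the depth map around edges and reflective surfaces.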
Getting a visual pose approximation good enough to let ICP converge to good solutions took me quite some time. In late August I got a first version that could align two consecutive RGB-D frames using RGB images and point clouds.

Pairwise alignment with visual approximation and ICP refinement
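The idea of the pipeline, in simplified Python rather than my actual C++/PCL code: the transform estimated from the visual features seeds a point-to-point ICP loop, which alternates nearest-neighbour correspondences with a closed-form Kabsch/SVD rigid alignment. In the real project the refinement is done with PCL's ICP (and later GICP); everything below, including the demo values, is an illustrative sketch.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form least-squares rigid transform (Kabsch): dst ≈ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(src, dst, R_init, t_init, iters=20):
    """Point-to-point ICP seeded with an initial guess (e.g. the visual estimate)."""
    R, t = R_init, t_init
    for _ in range(iters):
        moved = src @ R.T + t
        # Brute-force nearest neighbours (a KD-tree would be used in practice).
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        R, t = best_rigid_transform(src, dst[np.argmin(d2, axis=1)])
    return R, t

# Demo: a cloud, a known rigid motion, and a near-truth seed (playing the
# role of the visual approximation); ICP then recovers the exact transform.
rng = np.random.default_rng(1)
src = rng.uniform(size=(40, 3))
theta = np.deg2rad(10)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.3, -0.1, 0.2])
dst = src @ R_true.T + t_true
R, t = icp(src, dst, R_true, t_true + 0.001, iters=10)
```

This also shows why the uninitialised version behaved so badly: without a good seed, the nearest-neighbour correspondences are mostly wrong and ICP can converge to a poor local minimum, whereas a seed close to the true motion makes the correspondences, and hence the refined transform, essentially exact.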
Over the last month I have been making multiple optimizations to my code. I introduced GPU SURF for faster feature detection and descriptor extraction, I integrated the original implementation of GICP to improve the alignment between point clouds, I replaced the MRPT frame capture functionality with the PCL functions to avoid copies between data structures, and much more.

The video that follows shows a map generated with the latest version I've implemented. It is still under development, but this is one of the first functional versions. Future releases will introduce loop detection and graph optimization to avoid cumulative error.

Wednesday, September 14, 2011


Hi reader, my name is Miguel Algaba. I am a Computer Science student in Málaga (Spain). I started my engineering studies in September 2006 and finished my last courses in June 2011.

I have always been attracted to the idea that someday we will be able to build machines that can perceive their environment the way we do. One day, in one of the subjects that has fascinated me most in my degree, a good teacher told us that "Most of the sensory information we perceive is visual". Since then, I have always wanted to explore ways to make a machine perceive its environment with its own eyes.

I have created this blog to relate my experiences and to present what I am learning. This is a space to share my efforts with people like you who have the same motivations. I invite you to join me here and talk about computer vision.