The Computer Vision: 2011

Wednesday, December 21, 2011

Kinect RGB-Depth GraphSLAM 6D

It has been a long time since my last entry, and it's time to write a new one summarizing the steps I have taken. In the last entry, I talked about the refactoring I was carrying out in order to detect bugs and give the project a better structure. Since then, I have been working hard to add loop closure.

In one of my first entries, I spoke of the need to avoid the accumulation of errors caused by odometry, and I very briefly discussed the concepts of loop detection and loop closure in a typical GraphSLAM application. During this time, I've been implementing this functionality.

For the loop detection step (front-end), I've made a simple implementation based on the number of inliers resulting from keyframe matching. This implementation has an advantage and a drawback. The main advantage is that it is easy to implement. The drawback is that it only gives acceptable results for small graphs and for environments with abundant, distinctive textures. I chose this implementation because detecting loops efficiently and robustly is a research field in itself, and I preferred a simple solution that would let me address the complete GraphSLAM problem.

For the pose-graph optimization (back-end), I've integrated two different implementations. The first uses the graph-slam module of the MRPT library, while the second uses the g2o library, which represents the state of the art in graph optimization.

Here's a video showing the maps resulting from the pose-graph optimization. As can be seen, the maps are more consistent after optimization, especially in the area that is revisited after a while.

During this time I've also been doing other equally important tasks. In recent weeks I've been writing part of my Final Year Project Report and I've read several articles and technical reports related to GraphSLAM. I've also added the ability to reject visual outliers using the fundamental matrix. In my tests, however, the outlier rejection seems to perform better using the homography matrix instead of the fundamental matrix.
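The two models lead to different inlier tests, which may help make sense of that observation. The sketch below is my own pure-Python illustration, not the project's code; the function names and pixel thresholds are assumptions. It counts inliers for a given homography H using the transfer error, and for a given fundamental matrix F using the distance to the epipolar line. Estimating H or F itself (for example with RANSAC) is left out.

```python
import math

# Matrices are 3x3 nested lists; points are (x, y) pixel coordinates.

def apply_homography(H, pt):
    """Map a point through H using homogeneous coordinates."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def homography_inliers(H, matches, thresh_px=3.0):
    """Indices of matches (p, q) whose transfer error |H p - q| is below thresh_px."""
    inliers = []
    for i, (p, q) in enumerate(matches):
        hx, hy = apply_homography(H, p)
        if math.hypot(hx - q[0], hy - q[1]) < thresh_px:
            inliers.append(i)
    return inliers

def epipolar_distance(F, p, q):
    """Distance (pixels) from q to the epipolar line l = F p in the second image."""
    x = (p[0], p[1], 1.0)
    a, b, c = (sum(F[r][k] * x[k] for k in range(3)) for r in range(3))
    return abs(a * q[0] + b * q[1] + c) / math.hypot(a, b)

def fundamental_inliers(F, matches, thresh_px=1.0):
    """Indices of matches (p, q) lying within thresh_px of their epipolar line."""
    return [i for i, (p, q) in enumerate(matches)
            if epipolar_distance(F, p, q) < thresh_px]

# H: pure image translation by (5, -2). F: camera translating along x,
# so epipolar lines are horizontal (y' = y).
H = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
print(homography_inliers(H, [((10, 10), (15, 8)), ((0, 0), (50, 50))]))     # [0]
print(fundamental_inliers(F, [((10, 10), (40, 10)), ((10, 10), (12, 20))]))  # [0]
```

Note that the match ((10, 10), (40, 10)) passes the epipolar test despite a 30-pixel shift: F only constrains a match to lie on a line, while H predicts a single point. That weaker constraint is one plausible reason a fundamental-matrix filter lets more outliers through, although the entry does not confirm it.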

Also, this week my Preliminary Project Report was accepted, so I hope to present my Final Year Project before February. After many months of hard work, I now see the end of an era and the beginning of a new one.

Sunday, October 30, 2011

Better design and organization are good news:

As I said in my previous post, I decided to restructure my entire project so that I could isolate problems. In the last two weeks I have been working hard to get back to where I was before. Yesterday I finally managed to do so, but now with one advantage: the project is better organized and problems will be easier to detect.

I haven't integrated the loop detection and graph optimization functionality yet; however, I now have the necessary classes to perform visual odometry. During these past few days I have also made many optimizations and added the option of using ORB (Oriented FAST and Rotated BRIEF) for the feature detection and descriptor extraction step.
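ORB descriptors are 256-bit binary strings, so they are compared with the Hamming distance (the number of differing bits) rather than the Euclidean distance used for SURF's floating-point descriptors, which is part of why ORB matching is cheap. The pure-Python sketch below stores each descriptor as an integer and applies brute-force matching with a nearest/second-nearest ratio test; the 0.8 ratio is an assumed value, and with OpenCV one would normally use its brute-force matcher with a Hamming norm instead.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_binary(query, train, ratio=0.8):
    """Brute-force matching with a ratio test.

    Returns (query_index, train_index) pairs for which the best distance is
    clearly smaller than the second-best one; ambiguous matches are dropped.
    """
    matches = []
    for qi, d in enumerate(query):
        dists = sorted((hamming(d, t), ti) for ti, t in enumerate(train))
        if not dists:
            continue
        best_dist, best_idx = dists[0]
        if len(dists) == 1 or best_dist < ratio * dists[1][0]:
            matches.append((qi, best_idx))
    return matches

# 8-bit toy descriptors (real ORB descriptors have 256 bits).
train = [0b00000000, 0b11111111, 0b11110000]
query = [0b00000001, 0b11111110, 0b00111100]
print(match_binary(query, train))  # [(0, 0), (1, 1)]; the third query is ambiguous
```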

I have reconstructed a room so you can get an idea of the results I am getting. This room was a challenge because it was poorly lit and lacked visual features; however, the latest implementation of my project was able to reconstruct it quite accurately. Here is a video of the process:

Reconstruction of a room using a handheld Kinect (visual odometry). This approach is based on pairwise alignment and uses SURF-GPU for 2D feature matching and ICP for pose refinement.

In my experiments, ORB has proven considerably faster than SURF-GPU at feature detection and descriptor extraction. At first I thought it would be an excellent alternative to SURF-GPU, since it could significantly reduce computation time. The problem with ORB is that it detects few 2D features when there are not many "corners" in the image. This lack of features makes the visual pose approximation less accurate, and ICP in turn converges to worse solutions. On the other hand, SURF-GPU is considerably slower than ORB, but in many situations it produces a large set of features ("blobs"), which leads to good pose approximations. Hence, SURF-GPU+ICP converges to good solutions even when there are few "corners" in the image.

In this version I have also added the option of using the original Stanford GICP implementation. GICP gives better results than ICP when the point clouds are relatively far apart, and similar results when they are close enough. Since the visual pose approximation is usually good, GICP and ICP produce very similar results, so I decided to use ICP, which takes less computation time.

[Updated]

This is another video, using SURF-GPU for 2D feature matching and Generalized ICP for pose refinement:

Wednesday, October 12, 2011

Take three steps backwards to take a leap forward:

It has been more than two weeks since my last entry and I have decided to write a new one to summarize what I have been doing during those days.

As I mentioned in my previous entry, the next milestone of my FYP is the construction of a graph with keyframe poses as nodes and the rigid transformations between them as edges. The main objective is to optimize that graph to avoid the error accumulation of visual odometry.
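To make that graph structure concrete, here is a minimal sketch in 2D: poses are (x, y, theta) rather than the full 6D poses the project uses, and the class and function names are hypothetical. Odometry edges store the measured relative transform between consecutive keyframes, node poses are obtained by chaining those transforms, and any extra edge, such as a loop closure, can be checked by comparing its measurement with the relative transform predicted by the current node poses.

```python
import math
from dataclasses import dataclass, field

def wrap(t):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(t), math.cos(t))

def compose(a, b):
    """a (+) b: apply the relative motion b expressed in the frame of pose a."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, wrap(at + bt))

def inverse(a):
    """Pose that undoes a, so that compose(a, inverse(a)) is the identity."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-c * ax - s * ay, s * ax - c * ay, wrap(-at))

def between(a, b):
    """Relative transform that takes pose a to pose b."""
    return compose(inverse(a), b)

@dataclass
class PoseGraph:
    nodes: list = field(default_factory=lambda: [(0.0, 0.0, 0.0)])  # keyframe poses
    edges: list = field(default_factory=list)                         # (i, j, z_ij)

    def add_odometry(self, z):
        """Append a keyframe by chaining the measured relative transform z."""
        i = len(self.nodes) - 1
        self.nodes.append(compose(self.nodes[i], z))
        self.edges.append((i, i + 1, z))
        return i + 1

    def add_loop(self, i, j, z):
        """Add a loop-closure relation between two existing keyframes."""
        self.edges.append((i, j, z))

    def edge_error(self, i, j, z):
        """Residual between measurement z and the prediction from nodes i and j."""
        return between(z, between(self.nodes[i], self.nodes[j]))

# Drive around a unit square with slightly biased turns, then close the loop.
g = PoseGraph()
for _ in range(4):
    g.add_odometry((1.0, 0.0, math.pi / 2 + 0.02))
g.add_loop(4, 0, (0.0, 0.0, 0.0))            # "we are back at the start"
print(g.edge_error(4, 0, (0.0, 0.0, 0.0)))   # non-zero: accumulated drift
```

The non-zero residual on the loop edge is exactly what a graph optimizer tries to shrink by adjusting the node poses.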

The first task was to integrate the g2o library into my project so that I could build and optimize the graph. This task was not too difficult, although I must confess that it took more time than I originally expected.

Once this part was integrated, I started testing the application by comparing the global map resulting from the optimization process with the unoptimized global map. The bad news was that the results were pretty bad, and the fault, obviously, was not the g2o developers' but mine.

After a few days trying to fix those problems with the g2o library, I decided to implement the graph optimization with the MRPT library to see if I could get better results this way. This task also took me some days and, unfortunately, didn't work as I expected. Curiously, the global maps obtained by optimizing small graphs with the MRPT library seemed slightly better than the unoptimized maps. However, when I tried to optimize the graph of a room reconstruction, the optimized results were much worse than plain odometry.

Global map without graph optimization:

Global map with graph optimization (small MRPT graph):

At this point, I decided that the best thing I could do was to clean up the code and restructure it into classes to help me find the problem. This is what I have been doing during the last four days, and I think refactoring the whole project will take at least one or two more weeks.

Therefore, in the coming weeks my work will not consist of adding new functionality but of improving what I have and rebuilding the project on a new base. Perhaps this way I will be able to find the problem and fix it; in any case, this won't be work in vain. I'll take three steps backwards to take a big leap forward!

Monday, September 26, 2011

The next step: loop detection and graph optimization

Last week I summarized the work I did during the summer for my Final Year Project. In that entry I also published a video showing the reconstruction of a map with the latest version of my FYP. The problem with that implementation was that it relied on aligning consecutive frames (pairwise alignment), so it suffered from error accumulation.

In today's entry I will briefly talk about this week's progress and the steps I'm taking toward the next milestone: loop detection and graph optimization to avoid error accumulation.

Loop detection and graph optimization: in this phrase we can distinguish two different, though closely linked, tasks. The first task consists of identifying a previously visited place (a loop). When a loop is detected, a new relation is added to the graph linking the current pose with the past pose at which the same place was visited. The second task tries to reduce the error accumulated by pose estimation based on pairwise alignment.

Thus, each time a place is revisited (a loop is detected), a new relation is added to the graph and the graph optimization process is launched to minimize the accumulated error. This yields more consistent maps, especially when a place is revisited after a long trajectory.
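The effect of adding a loop-closure relation and then optimizing can be shown with a deliberately tiny example. The sketch below is a 1D pose graph (each pose is a single coordinate, not a 6D pose) solved as a linear weighted least-squares problem through its normal equations; back-ends such as g2o solve the nonlinear 6D version iteratively, so treat this only as an illustration. The odometry values and weights are made up: the camera moves forward two steps and back two steps, odometry underestimates the way back, and a strongly weighted loop edge states that it ended where it started.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (fine for tiny systems)."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def optimize_1d(n, edges, anchor=0):
    """Minimise sum w * (x_j - x_i - z)^2 over poses x_0..x_{n-1}, with x_anchor = 0."""
    H = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, j, z, w in edges:
        # Residual r = x_j - x_i - z has Jacobian -1 w.r.t. x_i and +1 w.r.t. x_j.
        H[i][i] += w
        H[j][j] += w
        H[i][j] -= w
        H[j][i] -= w
        b[i] -= w * z
        b[j] += w * z
    # Remove the gauge freedom (the whole graph could slide) by pinning one pose.
    H[anchor] = [0.0] * n
    H[anchor][anchor] = 1.0
    b[anchor] = 0.0
    return solve(H, b)

odometry = [1.0, 1.0, -0.9, -0.9]      # biased: the true motion back is -1.0 per step
edges = [(k, k + 1, z, 1.0) for k, z in enumerate(odometry)]
dead_reckoning = [sum(odometry[:k]) for k in range(5)]
edges.append((0, 4, 0.0, 100.0))       # loop closure: pose 4 is back at pose 0
optimized = optimize_1d(5, edges)
print(dead_reckoning)
print(optimized)
```

Dead reckoning ends 0.2 away from the start. After optimization the last pose is about 0.0005 from the start, and each of the four odometry edges absorbs roughly 0.05 of the discrepancy: the error is spread along the trajectory instead of piling up at the end.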

I haven't integrated the graph optimization process into my project yet; nevertheless, I'm working in that direction. I first decided to implement a new version of my project that could detect loops. To this end, I opted for a simple implementation based on visual feature matching to determine whether an area has been visited before.

Broadly speaking, what I did was the following:

I have created a keyframe chain that stores a subset of previously grabbed frames.
In each iteration, if the camera has shifted substantially, feature matching is performed against the stored keyframes. If the number of resulting inliers is above a certain threshold, I consider that the current frame and the matched keyframe correspond to the same place.

Furthermore, to avoid the performance penalty of matching features against every stored keyframe, I take the pose of each keyframe into account: feature matching is only performed with keyframes whose poses are close enough to the current pose.
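The keyframe-based loop detection described above could be organised roughly as follows. This is a sketch, not the project's code: the thresholds (minimum shift, search radius, inlier count) are assumed values, only keyframe positions are used for the proximity test, and the feature-matching step is passed in as a function (`count_inliers`, hypothetical) because it depends on the detector and outlier-rejection method used. It also skips the most recent keyframe, on the assumption that it is already linked to the current frame by pairwise alignment.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

@dataclass
class Keyframe:
    kf_id: int
    position: tuple   # (x, y, z) in metres, from visual odometry
    features: Any     # descriptors; the type depends on the feature detector

class LoopDetector:
    def __init__(self, count_inliers: Callable[[Any, Any], int],
                 min_shift=0.3, search_radius=1.0, min_inliers=40):
        self.count_inliers = count_inliers  # hypothetical matching + outlier rejection step
        self.min_shift = min_shift          # metres moved before a new keyframe is stored
        self.search_radius = search_radius  # only keyframes this close are matched
        self.min_inliers = min_inliers      # inliers needed to accept a loop
        self.keyframes: List[Keyframe] = []

    def process(self, position, features) -> Optional[int]:
        """Return the id of a keyframe that closes a loop with this frame, or None."""
        if self.keyframes and math.dist(position, self.keyframes[-1].position) < self.min_shift:
            return None  # camera has not moved enough: no new keyframe
        loop_id, best = None, 0
        for kf in self.keyframes[:-1]:  # skip the previous keyframe (already aligned pairwise)
            if math.dist(position, kf.position) > self.search_radius:
                continue  # too far away to be the same place: skip costly matching
            n = self.count_inliers(features, kf.features)
            if n > self.min_inliers and n > best:
                loop_id, best = kf.kf_id, n
        self.keyframes.append(Keyframe(len(self.keyframes), position, features))
        return loop_id

# Toy stand-in for feature matching: features are sets of ids, inliers = shared ids.
det = LoopDetector(lambda a, b: len(a & b))
print(det.process((0.0, 0.0, 0.0), set(range(1, 51))))     # None (first keyframe)
print(det.process((1.0, 0.0, 0.0), set(range(100, 151))))  # None
print(det.process((1.0, 1.0, 0.0), set(range(200, 251))))  # None
print(det.process((0.1, 0.1, 0.0), set(range(1, 46))))     # 0: back near keyframe 0
```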

I have done some tests in several rooms and the loop detection results are acceptable. In the coming days I will try to integrate the graph optimization process to see if I manage to reduce the accumulated error this way. The framework I intend to use for this task is g2o, which was also used in "Realtime Visual and Point Cloud SLAM" by Nicola Fioraio and Kurt Konolige. g2o can be found here: http://openslam.org/g2o.

Sunday, September 18, 2011

My Final Year Project: a brief overview of the summer work

One of the reasons I created this blog was to report the progress of my Final Year Project (FYP). I started working on my FYP in early June 2011, although I had been thinking about the subject since mid-March. The title of my FYP is "Development and implementation of a method of generating 3D maps using the Kinect sensor", and its goal is to develop a 6D SLAM application using the RGB images and point clouds captured by the Kinect.

As I was saying, I started the project in early June, and since then I have been working hard to finish it as soon as possible. The first stages of my FYP consisted of reading a series of articles related to my project and getting familiar with the open source libraries I would use to implement it.

Some of the articles I have read and on which I am basing my project are:

Simultaneous Localization and Mapping (SLAM): Part I, by Hugh Durrant-Whyte and Tim Bailey.
Simultaneous Localization and Mapping (SLAM): Part II, by Hugh Durrant-Whyte and Tim Bailey.
RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments, by Peter Henry, Michael Krainin, Evan Herbst, Xiaofeng Ren and Dieter Fox.
Generalized-ICP, by Aleksandr V. Segal, Dirk Haehnel and Sebastian Thrun.
Real-time 3D visual SLAM with a hand-held RGB-D camera, by Nikolas Engelhard, Felix Endres, Jürgen Hess, Jürgen Sturm and Wolfram Burgard.
Scene reconstruction from Kinect motion, by Marek Šolony.
Realtime Visual and Point Cloud SLAM, by Nicola Fioraio and Kurt Konolige.

The main open source libraries I am using for my project are the following:

OpenCV: http://opencv.willowgarage.com/wiki/Welcome
PCL: http://pointclouds.org/
MRPT: http://www.mrpt.org/
CUDA: http://developer.nvidia.com/cuda-toolkit-40

I am also using the original implementation of GICP that uses the ANN and GSL libraries. You can find the original GICP code in the following link: http://www.stanford.edu/~avsegal/generalized_icp.html

The first things I developed were some tests, such as 2D feature matching and 3D point cloud alignment.

2D feature matching

ICP alignment (only)

The next step consisted of developing the first SLAM application using point clouds only. In this first approach I used the PCL ICP implementation to align consecutive point clouds. The main problem with this version was that ICP was not given an initial guess, so both the convergence time and the estimated poses were horrible.

ICP only SLAM
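For reference, the core of point-to-point ICP fits in a few lines. The sketch below is a 2D, brute-force version written for illustration (PCL's implementation works in 3D and uses a k-d tree for the nearest-neighbour search); the `init` parameter plays the role of the initial guess that the ICP-only version lacked. ICP alternates between pairing each source point with its nearest target point and solving for the rigid transform that best aligns those pairs, so a poor starting pose can produce wrong pairings and a wrong or slow convergence.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation + translation mapping paired points src -> dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    num = den = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        num += ax * by - ay * bx   # cross terms -> sin(theta)
        den += ax * bx + ay * by   # dot terms   -> cos(theta)
    th = math.atan2(num, den)
    c, s = math.cos(th), math.sin(th)
    return th, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def icp_2d(src, dst, init=(0.0, 0.0, 0.0), max_iters=50, tol=1e-10):
    """Return (theta, tx, ty) aligning src onto dst, starting from init."""
    th, tx, ty = init
    for _ in range(max_iters):
        c, s = math.cos(th), math.sin(th)
        moved = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in src]
        # Data association: nearest target point for every moved source point.
        pairs = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                 for p in moved]
        new = best_rigid_2d(src, pairs)
        step = abs(new[0] - th) + abs(new[1] - tx) + abs(new[2] - ty)
        th, tx, ty = new
        if step < tol:
            break
    return th, tx, ty

# An L-shaped scan moved by 0.6 rad and (2.0, 1.0); start from a nearby guess.
src = [(float(x), 0.0) for x in range(10)] + [(0.0, float(y)) for y in range(1, 6)]
c, s = math.cos(0.6), math.sin(0.6)
dst = [(c * x - s * y + 2.0, s * x + c * y + 1.0) for x, y in src]
print(icp_2d(src, dst, init=(0.58, 1.95, 1.02)))  # close to (0.6, 2.0, 1.0)
```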

To reduce the ICP execution time and obtain better convergence, I started working on the visual pose approximation. The first thing I did was to obtain the 3D points corresponding to the 2D features of the RGB image.

3D points corresponding to 2D visual features
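Recovering the 3D point behind a 2D feature amounts to inverting the pinhole projection using the depth at that pixel. A minimal sketch, assuming the depth image is registered to the RGB image and stored in millimetres, and using commonly quoted default Kinect intrinsics (fx = fy = 525, cx = 319.5, cy = 239.5) in place of a proper calibration; the function names are my own:

```python
def backproject(u, v, depth_mm, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Pixel (u, v) with depth in millimetres -> 3D point (X, Y, Z) in metres,
    in the camera frame (X right, Y down, Z forward). Returns None for missing depth."""
    if depth_mm <= 0:
        return None  # the Kinect reports 0 where it could not measure depth
    z = depth_mm / 1000.0
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def keypoints_to_3d(keypoints, depth_image, **intrinsics):
    """Pair each 2D keypoint with its 3D point, dropping keypoints without depth.
    depth_image is a list of rows, indexed as depth_image[v][u]."""
    result = []
    for u, v in keypoints:
        row, col = int(round(v)), int(round(u))
        if not (0 <= row < len(depth_image) and 0 <= col < len(depth_image[0])):
            continue  # keypoint falls outside the depth image
        p = backproject(u, v, depth_image[row][col], **intrinsics)
        if p is not None:
            result.append(((u, v), p))
    return result

print(backproject(844.5, 239.5, 2000))  # (2.0, 0.0, 2.0)
```

Keypoints without a valid depth reading are simply discarded, so the number of usable 3D correspondences can be noticeably smaller than the number of 2D matches.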

Getting a visual pose approximation good enough for ICP to converge to good solutions took me quite some time. In late August I got a first version that could align two consecutive RGB-D frames using RGB images and point clouds.

Pairwise alignment with visual approximation and ICP refinement
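The entry does not detail how the visual pose approximation is computed. A common approach in RGB-D mapping is RANSAC over the 3D feature correspondences: repeatedly fit a rigid transform to a minimal random sample, keep the hypothesis with the most inliers, and refit on those inliers to obtain ICP's starting pose. The sketch below shows that idea in 2D to stay short (a minimal sample is 2 correspondences here, versus 3 in 3D, where the fit is usually done with an SVD-based method); the iteration count and the 0.05 inlier threshold are assumed values.

```python
import math
import random

def fit_rigid_2d(src, dst):
    """Least-squares rotation + translation mapping paired points src -> dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    num = sum((a[0] - sx) * (b[1] - dy) - (a[1] - sy) * (b[0] - dx) for a, b in zip(src, dst))
    den = sum((a[0] - sx) * (b[0] - dx) + (a[1] - sy) * (b[1] - dy) for a, b in zip(src, dst))
    th = math.atan2(num, den)
    c, s = math.cos(th), math.sin(th)
    return th, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def transform_point(T, p):
    th, tx, ty = T
    c, s = math.cos(th), math.sin(th)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def ransac_rigid_2d(src, dst, iters=100, thresh=0.05, seed=0):
    """Return (transform, inlier indices), or (None, []) if no consensus is found."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        i, j = rng.sample(range(len(src)), 2)              # minimal sample
        T = fit_rigid_2d([src[i], src[j]], [dst[i], dst[j]])
        inliers = [k for k in range(len(src))
                   if math.dist(transform_point(T, src[k]), dst[k]) < thresh]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 3:
        return None, []
    T = fit_rigid_2d([src[k] for k in best], [dst[k] for k in best])  # refit on inliers
    return T, best

# Eight good correspondences plus two wrong matches.
truth = (0.3, 1.0, -0.5)
src = [(0, 0), (1, 0), (2, 1), (3, 3), (0, 2), (-1, 1), (2, -2), (4, 0)]
dst = [transform_point(truth, p) for p in src]
src += [(5, 5), (-3, 2)]
dst += [(0.0, 0.0), (10.0, 10.0)]
T, inliers = ransac_rigid_2d(src, dst)
print(inliers)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The refined transform would then be handed to ICP as its initial guess, which is what keeps ICP inside its convergence basin.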

Over the last month I have made multiple optimizations to my code. I introduced GPU SURF for faster feature detection and descriptor extraction, I integrated the original GICP implementation to improve the alignment between point clouds, I replaced the MRPT frame capture functionality with the PCL functions to avoid copies between data structures, and much more.

The video that follows shows a map generated with the latest version I've implemented. It is still under development, but this is one of the first functional versions. Future releases will introduce loop detection and graph optimization to avoid cumulative error.

Wednesday, September 14, 2011

Introduction

Hi reader, my name is Miguel Algaba. I am a Computer Science student in Málaga (Spain). I started my engineering studies in September 2006 and finished my last courses in June 2011.

I have always been attracted to the idea that someday we will be able to build machines that can perceive their environment the way we do. One day, in one of the courses that fascinated me the most during my degree, a good teacher told us that "most of the sensory information we perceive is visual". Since then, I have always wanted to explore ways to make a machine perceive its environment with its own eyes.

I have created this blog to relate my experiences and present what I am learning. This is a space to share my efforts with people like you who have the same motivations. I invite you to join me here and talk about computer vision.