Sunday, October 30, 2011

Better design and organization are good news:

As I said in my previous post, I decided to restructure my entire project so that I could isolate problems more easily. Over the last two weeks I have been working hard to get back to where I was before. Yesterday I finally reached that point, but now with one advantage: the project is better organized, so problems will be easier to detect.

I haven't integrated the loop detection and graph optimization functionality yet; however, I now have the necessary classes to perform visual odometry. During these past few days I have also made many optimizations and added the option of using ORB (Oriented FAST + Rotated BRIEF) for the feature detection and descriptor extraction step.

I have reconstructed a room so you can get an idea of the results I am getting. This room was a challenge because it was poorly lit and lacked visual features; however, the latest implementation of my project has been able to reconstruct the room quite accurately. I leave you a video of the process below:

Reconstruction of a room using a handheld Kinect (visual odometry). This approach is based on pairwise alignment and uses SURF-GPU for 2D feature matching and ICP for pose refinement.
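The 2D feature matches, once back-projected with the Kinect depth map, give 3D point correspondences from which an initial pose can be recovered in closed form. Here is a minimal numpy sketch of that closed-form rigid alignment (the Kabsch/Umeyama method); this is my own illustrative code, not the project's implementation:

```python
import numpy as np

def rigid_transform(src, dst):
    """Closed-form least-squares rigid transform (Kabsch/Umeyama):
    find R, t such that R @ src[i] + t ~= dst[i]."""
    c_src = src.mean(axis=0)
    c_dst = dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)   # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in the recovered rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = c_dst - R @ c_src
    return R, t

# Toy check: recover a known rotation about z plus a translation.
theta = np.deg2rad(10.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.05, 0.3])
src = np.random.default_rng(0).random((50, 3))
dst = src @ R_true.T + t_true
R, t = rigid_transform(src, dst)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```

In the noiseless case the transform is recovered exactly; in practice the matches contain outliers, which is why this estimate is only an approximation that ICP then refines.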

In my experiments, ORB has proven to be considerably faster than SURF-GPU at feature detection and descriptor extraction. At first I thought it would be an excellent alternative to SURF-GPU, since it could significantly reduce the computation time. The problem with ORB is that it detects few 2D features when there are not many "corners" in the image. This lack of features makes the visual pose approximation less accurate and, ultimately, ICP converges to worse solutions. On the other hand, SURF-GPU is considerably slower than ORB, but it produces a large set of features ("blobs") in many situations, leading to good pose approximations. Hence, SURF-GPU+ICP converges to good solutions even when there are few "corners" in the image.
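Part of the speed gap comes from the descriptors themselves: ORB descriptors are 256-bit binary strings compared with the Hamming distance, while SURF descriptors are 64- or 128-dimensional float vectors compared with the L2 distance. A toy numpy sketch of brute-force Hamming matching (illustrative only; the 32-byte descriptor size matches ORB's default):

```python
import numpy as np

def hamming_match(query, train):
    """Brute-force nearest neighbour for binary descriptors.
    query, train: (N, 32) uint8 arrays, i.e. 256-bit descriptors as in ORB.
    Returns, for each query descriptor, the index of the closest train
    descriptor and the Hamming distance to it."""
    # XOR exposes the differing bits; count them per descriptor pair.
    xor = query[:, None, :] ^ train[None, :, :]    # (Nq, Nt, 32)
    dist = np.unpackbits(xor, axis=2).sum(axis=2)  # (Nq, Nt) Hamming distances
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(len(query)), idx]

rng = np.random.default_rng(1)
train = rng.integers(0, 256, size=(100, 32), dtype=np.uint8)
query = train[[3, 42]].copy()
query[0, 0] ^= 0b00000101   # corrupt two bits of descriptor 3
idx, d = hamming_match(query, train)
print(idx, d)
```

On real hardware the XOR-and-popcount pattern maps to a handful of machine instructions, which is why binary descriptors match so much faster than float ones.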

In this version I have also added the ability to use the original Stanford GICP (Generalized ICP) implementation. GICP gives better results than ICP when the point clouds are relatively far apart, yet produces results similar to ICP when the point clouds are close enough. Since the visual pose approximation is usually quite good, GICP and ICP produce very similar results, so I decided to use ICP, which takes less computation time.
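For reference, the point-to-point ICP I am comparing against alternates two steps: nearest-neighbour correspondence search and a closed-form rigid update. Below is a minimal, unoptimized numpy sketch of that loop (my own illustration; the real implementations live in PCL and in the Stanford GICP code, and GICP generalizes the point-to-point error to a plane-to-plane one built from local covariances):

```python
import numpy as np

def icp(src, dst, iters=20):
    """Minimal point-to-point ICP: returns src rigidly aligned onto dst.
    src, dst: (N, 3) arrays; brute-force O(N^2) nearest neighbours."""
    cur = src.copy()
    for _ in range(iters):
        # 1. Correspondences: closest dst point for each current src point.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matched = dst[d2.argmin(axis=1)]
        # 2. Closed-form rigid update (Kabsch) from those correspondences.
        c_cur, c_m = cur.mean(axis=0), matched.mean(axis=0)
        U, _, Vt = np.linalg.svd((cur - c_cur).T @ (matched - c_m))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        cur = cur @ R.T + (c_m - R @ c_cur)
    return cur

# Toy check: a slightly rotated/translated copy of a cloud snaps back.
rng = np.random.default_rng(2)
dst = rng.random((100, 3))
theta = np.deg2rad(2.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
src = dst @ R.T + np.array([0.01, -0.005, 0.02])
err = np.abs(icp(src, dst) - dst).max()
print(err)
```

The toy example starts close to the solution, which is exactly the regime described above: when the initial pose is already good, the simple point-to-point variant converges just as well as GICP.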


This is another video, this time using SURF-GPU for 2D feature matching and Generalized ICP for pose refinement:


  1. Wow.
    How much registration FPS can you get using this implementation?

  2. This implementation runs at approximately 2 fps, which is not ready for real-time. However, it is running on an Intel Core 2 Duo 2.0 GHz with an integrated NVIDIA GeForce 9400M and 4 GB of 1066 MHz DDR3 SDRAM. Furthermore, RGB-D frame grabbing and VoxelGrid point cloud downsampling take half of that time; the useful computation takes 0.25 seconds per loop on my machine. I would like to test this implementation on a better machine.
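For readers unfamiliar with the VoxelGrid step mentioned above: it downsamples a cloud by binning points into a regular 3D grid and keeping one representative (the centroid) per occupied voxel. A simplified numpy sketch of what PCL's VoxelGrid filter does (the 5 cm leaf size here is made up for the example):

```python
import numpy as np

def voxel_grid(points, leaf=0.05):
    """Simplified VoxelGrid filter: one centroid per occupied voxel.
    points: (N, 3) float array; leaf: voxel edge length in metres."""
    keys = np.floor(points / leaf).astype(np.int64)  # voxel index per point
    _, inv, counts = np.unique(keys, axis=0,
                               return_inverse=True, return_counts=True)
    inv = inv.ravel()
    # Accumulate and average the points that fall into each voxel.
    sums = np.zeros((len(counts), 3))
    np.add.at(sums, inv, points)
    return sums / counts[:, None]

# 1000 points in a 10 cm cube collapse to 8 voxels with a 5 cm leaf.
pts = np.random.default_rng(3).random((1000, 3)) * 0.1
down = voxel_grid(pts, leaf=0.05)
print(len(down))
```

Downsampling like this is what makes the later nearest-neighbour-heavy ICP stage tractable, at the cost of the grid pass itself, which is part of the overhead discussed above.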