Friday, August 12, 2011

GPU-based Image Processing on Tegra 2 and Microsoft Kinect

In this tutorial, we will focus on real-time image processing of the depth data and RGB data we collect from the Microsoft Kinect (see the last tutorial if you want to configure your Tegra 2 to interface with the Kinect). In particular, we will show how to write custom fragment shader programs that perform the image transformation (i.e., mapping the color image back onto the depth map data) using the GPU on the Tegra 2.

When we mention GPGPU (General-Purpose computation on Graphics Processing Units) nowadays, one may immediately think of CUDA C/C++, ATI Stream, or OpenCL. In fact, prior to these tools (about 7 years ago), people had already devised ways to use GPUs for computationally demanding tasks.

To harness the GPU processing power on the Tegra 2, we will first introduce vertex shader and fragment shader programs in OpenGL ES2.

OpenGL ES2 and Shader Programs: 
The Nvidia Development Kit (NDK) provides an easy-to-use shader wrapper (nv_shader.cpp) to interface with the GPU. The idea behind GPGPU is simple: we create a texture (a piece of memory on the GPU, either shared with the CPU or not), then write a custom vertex shader and a fragment shader to perform the image transformation on that texture. Similar to CUDA's kernel calls, the fragment program executes in parallel across pixels and provides a reasonable speedup if we can utilize it properly. Of course, all this comes with limitations that we will explore next.
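For this kind of image-space GPGPU work, the vertex shader side can be a simple pass-through that draws a full-screen quad and forwards the texture coordinates to the fragment shader, where all the real work happens. A minimal sketch (attribute and varying names here are illustrative; any matching pair works with the fragment program below):

```glsl
//Minimal pass-through vertex shader for a full-screen quad.
//a_position covers the quad in clip space; a_texCoord is
//forwarded to the fragment shader as v_texCoord.
attribute vec4 a_position;
attribute vec2 a_texCoord;
varying vec2 v_texCoord;

void main()
{
    v_texCoord = a_texCoord;
    gl_Position = a_position;
}
```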

RGB+D? (Simple calibration on GPU)
The Kinect RGB camera and depth camera are physically located in two different places. In addition, the depth camera and RGB camera have different focal lengths, principal points, etc. Explaining the underlying camera calibration methods and camera models is beyond the scope of this tutorial. Feel free to check out the work from others below.

Important: You may need to run this and extract the intrinsic and extrinsic parameters of your Kinect. Just update the parameters in our fragment program accordingly.

Fragment Shader for Mapping RGB values to depth map
A fragment program is a small program that runs on the graphics card on a per-pixel basis. Here is the fragment program we created to map the RGB (640x480) color image back onto the depth map (640x480). Notice that we store the depth value in the alpha channel of the RGBA texture. The details of the calibration are shown below. The source code should be straightforward to anyone who has done GPU programming.

precision mediump float;
varying vec2 v_texCoord;  
uniform sampler2D s_texture;

//my own calibrated data.. replace the following with your own.
//xxx_rgb - parameters for the color camera
//xxx_d - parameters for the depth camera
const float fx_rgb = 5.0995627581871054e+02;
const float fy_rgb = 5.1009672745532589e+02;
const float cx_rgb = 3.2034728823678967e+02;
const float cy_rgb = 2.5959956434787728e+02;

const float fx_d = 5.5879981950414015e+02;
const float fy_d = 5.5874227168094478e+02;
const float cx_d = 3.1844162327317980e+02;
const float cy_d = 2.4574257294583529e+02;

//Size of the image, used to transform from texture coordinates to image coordinates.
vec2 img_coord = vec2(640.0,480.0);

//Rotation and translation matrix
vec3 T = vec3(2.7127130138943419e-02, -1.0041314603411674e-03,-5.6746227781378283e-03);
vec3 R1 = vec3(9.9996078957902945e-01, -8.5633968850082568e-03,-2.2555571980713987e-03);
vec3 R2 = vec3(8.5885385454046812e-03, 9.9989832404109968e-01, 1.1383258999693677e-02);
vec3 R3 = vec3(2.1578484974712269e-03, -1.1402184597253283e-02, 9.9993266467111286e-01);

void main()
{
 //get the depth data from the alpha channel of the texture.
 float depth = texture2D(s_texture, v_texCoord).w;
 //no reading (0.0) or saturated reading (1.0): output black and bail out.
 if(depth == 0.0 || depth == 1.0){
  gl_FragColor = vec4(0.0, 0.0, 0.0, 1.0);
  return;
 }
 //transform to image coordinates first; texture coordinates run from 0 to 1.
 float x_d = v_texCoord.x * img_coord.x;
 float y_d = v_texCoord.y * img_coord.y;
 vec3 P3D;
 vec3 P3D_1;
 vec2 P2D_rgb;
 //approximate conversion from the packed depth value to metric depth.
 float real_depth = (2.0 * depth) + 0.35;
 //back-project into metric 3D space (depth camera coordinates).
 P3D.x = real_depth * (x_d - cx_d) / fx_d;
 P3D.y = -real_depth * (y_d - cy_d) / fy_d; //negative because +y is up.
 P3D.z = real_depth;

 //transform into the color camera's frame using the extrinsic parameters.
 P3D_1 = vec3(dot(R1, P3D) - T.x, dot(R2, P3D) - T.y, dot(R3, P3D) - T.z);
 //now map this back onto the image using the intrinsic parameters of the color camera.
 float inv_z = 1.0 / P3D_1.z;
 P2D_rgb.x = (P3D_1.x * fx_rgb * inv_z) + cx_rgb;
 P2D_rgb.y = -(P3D_1.y * fy_rgb * inv_z) + cy_rgb; //negative because +y is down.
 //transform back to texture coordinates.
 P2D_rgb = P2D_rgb / img_coord;
 //extract the RGB value (linearly interpolated) and store the final result;
 //1.0-depth inverts the depth so objects are brighter when closer to the camera.
 gl_FragColor = vec4(1.0, 1.0, 1.0, 0.0) * texture2D(s_texture, P2D_rgb)
              + vec4(0.0, 0.0, 0.0, 1.0 - depth);
}

I/Render Loop:(  842): Display loop 0.029735 (s)
I/Render Loop:(  842): Display loop 0.031630 (s)
I/Render Loop:(  842): Display loop 0.032471 (s)
~ about 0.03 s per frame!
It takes about 0.03 s (roughly 33 fps) to perform this computation on the GPU. This includes the time required to update the texture, to perform the RGB-to-depth mapping (which includes pixel interpolation), and to display the result on screen. Pretty impressive! What's next? We are now ready to render this in 3D, and perhaps allow the user to change the perspective using the touchscreen.

Source Code:
svn co multitouch

Demo Video:
Non-calibrated RGBD (using the fragment program to adjust the brightness of each pixel). Notice the misalignment.

With the proper calibration, now the RGB (color) image will map to the depth image.

Special Thanks:
James Fung, Nvidia Technology Development for supplying the Ventana Development Kit.

Related Articles:
OpenGL ES2 Tutorial
Camera Matrix Tutorial
To be continued....