## Project Summary

A fixed object is observed with a fixed camera under varying light directions. See the figure below for the setup. The aim of the project is to infer a depth model of the object from this set of images.

## Acknowledgement

This project is a part of the Computer Vision course (COMP 5421) during spring 2014. The project partners were Manohar Kuse (myself) and Sunil Jaiswal. The course instructor was Prof. C. K. Tang.

I further acknowledge that part of the text and some of the images presented here are borrowed from the course webpage.

## Downloads

Source Code: [ZIP] (MATLAB)

Input Dataset : [Data02] (~48MB), [Data04] (~167MB)

Results : 3D Depth Models [ZIP], Reconstructed Normal [ZIP]

## Results

## Method

**Input**

The input consists of a set of images (~2000) of the same object, taken with a fixed camera but under different illumination directions. In addition to the images, the dataset provides the light direction vector for each image, stored in a text file (named ‘lightVec.txt’). The data was acquired by Wu and Tang [1] in a dark room (see the picture below). For details on the setup and how the light direction vectors were determined, refer to Wu and Tang [1].
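As an illustration of loading the light directions, here is a minimal NumPy sketch. It assumes ‘lightVec.txt’ stores one whitespace-separated (lx, ly, lz) triple per line, one per image; the actual file format in the dataset may differ.

```python
import numpy as np

def load_light_dirs(path="lightVec.txt"):
    """Load per-image light direction vectors and normalize them.

    Assumes one whitespace-separated 3-vector per line (hypothetical
    format; check the dataset's actual layout)."""
    L = np.loadtxt(path)                                  # (num_images, 3)
    return L / np.linalg.norm(L, axis=1, keepdims=True)   # unit vectors
```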

**Uniform Re-sampling**

The data collected from the above setup is unevenly scattered over light directions (see the figure below), which would bias the results: the raw dataset contains a large number of images for some light directions and very few for others. To resolve this, the surface of a hemisphere is sampled uniformly, and the images are interpolated so as to illuminate the object along these new light directions. For details on how this is performed, see Section 4.2 of Wu and Tang [1]. In our code, this is implemented in “resamplingDataSet.m”.
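The uniform-sampling step can be illustrated as follows. This is not the project’s “resamplingDataSet.m” or the method of Wu and Tang [1]; it is a sketch of one common way (a Fibonacci spiral) to generate near-uniform directions on a hemisphere, assuming the z-axis points toward the camera. Each such direction would then get an image interpolated from the nearest observed light directions.

```python
import numpy as np

def fibonacci_hemisphere(n):
    """Generate n near-uniformly distributed unit directions on the
    upper hemisphere (z > 0) using a Fibonacci spiral."""
    i = np.arange(n)
    golden = (1 + 5 ** 0.5) / 2
    z = (i + 0.5) / n                  # uniform in z => uniform in area
    r = np.sqrt(1 - z ** 2)
    phi = 2 * np.pi * i / golden
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

dirs = fibonacci_hemisphere(256)
print(dirs.shape)                                        # (256, 3)
print(np.allclose(np.linalg.norm(dirs, axis=1), 1.0))    # True
```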

**Estimation of Surface Normal for each Pixel Position**

The object is assumed to be a Lambertian surface. According to Lambert’s cosine law, the observed pixel intensity (I) is related to the surface normal (N) and the light direction (L) as

I = ρ (N · L)

ρ is the constant of proportionality, also known as the albedo. Note that N and ρ are the same for a given pixel across all images. To eliminate ρ, we choose a denominator image (I_d), selected so that it has the minimum number of shadowed pixels. The final objective is to determine the surface normal at each pixel location.

Define N = [x y z]’, L = [lx ly lz] and L_d = [ldx ldy ldz]. Substituting into the above equation and cross-multiplying gives a linear equation; note that x, y, z are the only unknowns here. Each image yields one linear equation per pixel.
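Writing this step out per pixel, with I_i and L_i denoting the intensity and light direction of image i:

```latex
\begin{aligned}
I_i &= \rho\,(N \cdot L_i), \qquad I_d = \rho\,(N \cdot L_d),\\[4pt]
\frac{I_i}{I_d} &= \frac{N \cdot L_i}{N \cdot L_d}
\quad\Longrightarrow\quad
I_i\,(N \cdot L_d) - I_d\,(N \cdot L_i) = 0,\\[4pt]
&\text{i.e.}\quad \big(I_i\,L_d - I_d\,L_i\big) \cdot N = 0,
\end{aligned}
```

which is linear in the unknowns x, y, z; the albedo ρ cancels in the ratio.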

Thus, if we have ‘K’ uniformly sampled images and choose the denominator image from this set, we have ‘K-1’ (K > 3) linear equations in 3 unknowns. This can be represented in matrix notation as A N = 0, where A is the (K-1) x 3 matrix whose rows collect the coefficients of these linear equations.

The unknowns (x, y, z) can easily be estimated from this over-determined system of equations using the Singular Value Decomposition (SVD). The motivation for using SVD is that it naturally enforces the unit-norm constraint on the unknown (it forces the solution to be a unit vector). The solution is the right-singular vector of the above (K-1) x 3 matrix corresponding to the smallest singular value. This operation is performed for every pixel, so we obtain the surface normal at each position. Note that the surface normal is a 3-tuple for each pixel.
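The per-pixel estimation can be sketched in NumPy as follows (the project’s actual implementation is the MATLAB file “localNormalEstimation.m”; the function and variable names here are illustrative only). The toy check at the end uses synthetic Lambertian intensities with shadows ignored.

```python
import numpy as np

def estimate_normal(I, L, d):
    """Estimate the unit surface normal at one pixel.
    I : (K,) intensities at this pixel over the K resampled images
    L : (K, 3) light direction vectors
    d : index of the chosen denominator image
    """
    # Row i of the system: I_i * L_d - I_d * L_i  (drop the trivial row d)
    rows = np.delete(I[:, None] * L[d] - I[d] * L, d, axis=0)  # (K-1, 3)
    # Normal = right-singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(rows)
    n = Vt[-1]
    return n if n[2] >= 0 else -n      # orient toward the camera

# Toy check: recover a known normal from synthetic Lambertian data
rng = np.random.default_rng(0)
true_n = np.array([0.0, 0.6, 0.8])
L = rng.normal(size=(20, 3))
L /= np.linalg.norm(L, axis=1, keepdims=True)
L[:, 2] = np.abs(L[:, 2])              # lights in front of the surface
I = 0.7 * (L @ true_n)                 # albedo 0.7
print(np.allclose(estimate_normal(I, L, d=0), true_n, atol=1e-6))  # True
```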

In our code, this is implemented in “localNormalEstimation.m”.

**Visualization of Surface Normals**

One possible way to visualize the shape of the object from surface normals is shape-from-shapelets. An implementation is available from Peter Kovesi (link) [2], and our implementation makes use of his code. It has been tailored to visualize the surface normals and can be found in “visualizeSurfaceNormals.m”. Note that this script depends on some other files in the folder.
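For readers without the shapelet code, here is a NumPy sketch of the classic Frankot–Chellappa Fourier-domain integration, a different (and simpler) normals-to-depth method than the shapelet reconstruction used in the project. It assumes the normals point toward the camera (positive z) and a roughly periodic surface.

```python
import numpy as np

def integrate_normals(N):
    """Recover a zero-mean depth map from an (H, W, 3) field of surface
    normals using Frankot-Chellappa Fourier-domain integration."""
    H, W, _ = N.shape
    # Surface gradients p = dz/dx, q = dz/dy from the normals
    p = -N[..., 0] / N[..., 2]
    q = -N[..., 1] / N[..., 2]
    wx = np.fft.fftfreq(W) * 2 * np.pi
    wy = np.fft.fftfreq(H) * 2 * np.pi
    WX, WY = np.meshgrid(wx, wy)
    denom = WX ** 2 + WY ** 2
    denom[0, 0] = 1.0                  # avoid division by zero at DC
    Z = (-1j * WX * np.fft.fft2(p) - 1j * WY * np.fft.fft2(q)) / denom
    Z[0, 0] = 0.0                      # fix the free constant: zero mean
    return np.real(np.fft.ifft2(Z))
```

The least-squares integrable surface is obtained in closed form in the frequency domain, which makes the method fast but sensitive to the periodic-boundary assumption.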

Example surface reconstruction is shown in the results section above. You can download other 3D depth models from the downloads section of this page.

## References

[1] Wu, Tai-Pang, and Chi-Keung Tang. “Dense photometric stereo using a mirror sphere and graph cut.” *Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on*. Vol. 1. IEEE, 2005.

[2] Kovesi, Peter. “Shapelets correlated with surface normals produce surfaces.” *Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on*. Vol. 2. IEEE, 2005.

[3] Jens Ackermann and Michael Goesele (2015), “A Survey of Photometric Stereo Techniques”, Foundations and Trends® in Computer Graphics and Vision: Vol. 9: No. 3-4, pp 149-254. http://dx.doi.org/10.1561/0600000065

One cool idea for future work could be to generate synthetic images with shadows. These can be produced with three.js, based on this example: http://threejs.org/examples/#webgl_lights_physical. The camera has to stay at a fixed position while the light source moves, in order to estimate the 3D shape of the object.