CS585: Image and Video Computing

Active Appearance Models

Lukas Lepicovsky

Problem Definition:

Often in the fields of medical image processing and computer vision, the need arises to fit the shape of an object (i.e., a set of 2D or 3D points) to an image. When the object is rigid, active appearance models are not needed. However, when the object is non-rigid, such as a heart or a human face, active appearance models can be used to match a user-defined set of points to images, using their texture information as the matching criterion.

Method & Implementation:

The Training Set
Suppose that we have some initial estimate for the shape of the object in an image. In order to improve that estimate toward the real shape in the image, we need a model that uses the error of the current shape to produce a new, better-fitting shape. To build such a model we use a set of training images and shapes (2D points).

(a sample from the training set: the input image with its shape vector overlaid,
courtesy of the IMM Face Database)

The Average Shape Vector and Shape Alignment
Using this dataset we can compute the average shape of the training set,

Xave = sum(Xi)/n, where Xi is the ith shape vector of the dataset, (xi1, yi1, ..., xim, yim), and n is the number of training images / shape vectors.

(average shape vector)
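As a minimal sketch of this step (assuming each shape is stored as an interleaved std::vector<double> of the form (x1, y1, ..., xm, ym)):

#include <vector>

// Compute the average shape Xave = sum(Xi)/n from n training shapes,
// each stored as an interleaved (x1, y1, ..., xm, ym) vector.
std::vector<double> averageShape(const std::vector<std::vector<double> >& shapes)
{
    std::vector<double> xave(shapes[0].size(), 0.0);
    for (size_t i = 0; i < shapes.size(); ++i)
        for (size_t j = 0; j < shapes[i].size(); ++j)
            xave[j] += shapes[i][j];
    for (size_t j = 0; j < xave.size(); ++j)
        xave[j] /= shapes.size();
    return xave;
}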

In order to minimize the effect of translation, scaling, and in-plane rotation on our model, we align our data using the iterative approach described by Cootes. We record the translation parameters tx, ty and the combined rotation and scaling parameters s1, s2.
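One alignment step can be written in closed form (the least-squares similarity transform given in Cootes' report); the sketch below is an illustration of that step, not the project's alignData code:

#include <vector>

// Align shape x to a reference shape with a similarity transform
// (a = s1, b = s2, plus translation). Both shapes are interleaved
// (x1, y1, ..., xm, ym) vectors. Each point is mapped as:
//   x' = a*x - b*y + tx,   y' = b*x + a*y + ty
void alignToReference(std::vector<double>& x, const std::vector<double>& ref)
{
    size_t m = x.size() / 2;
    // Center both shapes so translation decouples from rotation/scale.
    double cx = 0, cy = 0, rx = 0, ry = 0;
    for (size_t i = 0; i < m; ++i) {
        cx += x[2*i];   cy += x[2*i+1];
        rx += ref[2*i]; ry += ref[2*i+1];
    }
    cx /= m; cy /= m; rx /= m; ry /= m;

    // Least-squares rotation/scale:
    // a = (x . ref)/|x|^2,  b = sum(xi*yi' - yi*xi')/|x|^2 (centered data)
    double dot = 0, cross = 0, norm = 0;
    for (size_t i = 0; i < m; ++i) {
        double xi = x[2*i] - cx,    yi = x[2*i+1] - cy;
        double ri = ref[2*i] - rx,  si = ref[2*i+1] - ry;
        dot   += xi*ri + yi*si;
        cross += xi*si - yi*ri;
        norm  += xi*xi + yi*yi;
    }
    double a = dot / norm, b = cross / norm;

    // Apply the transform; the centroid lands on the reference centroid.
    for (size_t i = 0; i < m; ++i) {
        double xi = x[2*i] - cx, yi = x[2*i+1] - cy;
        x[2*i]   = a*xi - b*yi + rx;
        x[2*i+1] = b*xi + a*yi + ry;
    }
}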

Computing the Shape-Free Texture Vector
Note that we wish to train our model only with pixels inside the convex hull of the shape vector. To do this, we apply Delaunay triangulation to the average shape vector Xave; the union of the resulting triangles covers exactly the convex hull. We then use Xave as the position coordinates and Xi as the texture coordinates in OpenGL texture mapping to compute a shape-free texture vector.


(left) A triangulated shape vector and source image
(right) resulting shape-free texture vector
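The same warp can be sketched without OpenGL using OpenCV's per-triangle affine warping; this is an alternative illustration of the technique, not the project's renderTriangles code, and it assumes a triangle index list produced by the Delaunay step:

#include <opencv2/opencv.hpp>
#include <vector>

// Warp each triangle of the source shape Xi onto the matching triangle
// of the mean shape Xave, producing a shape-free texture image.
// tris holds index triples from the Delaunay triangulation of Xave.
// (Warping the whole image once per triangle is inefficient but clear.)
cv::Mat shapeFreeTexture(const cv::Mat& srcImg,
                         const std::vector<cv::Point2f>& Xi,
                         const std::vector<cv::Point2f>& Xave,
                         const std::vector<cv::Vec3i>& tris,
                         cv::Size texSize)
{
    cv::Mat out = cv::Mat::zeros(texSize, srcImg.type());
    for (size_t t = 0; t < tris.size(); ++t) {
        cv::Point2f src[3] = { Xi[tris[t][0]],   Xi[tris[t][1]],   Xi[tris[t][2]]   };
        cv::Point2f dst[3] = { Xave[tris[t][0]], Xave[tris[t][1]], Xave[tris[t][2]] };
        cv::Mat A = cv::getAffineTransform(src, dst);   // source -> mean-shape triangle
        cv::Mat warped;
        cv::warpAffine(srcImg, warped, A, texSize);
        cv::Mat mask = cv::Mat::zeros(texSize, CV_8U);  // restrict copy to this triangle
        cv::Point poly[3] = { dst[0], dst[1], dst[2] };
        cv::fillConvexPoly(mask, poly, 3, cv::Scalar(255));
        warped.copyTo(out, mask);
    }
    return out;
}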

The Average Texture Vector and Texture Alignment
After recovering the texture vectors we compute the average texture vector and align each texture vector to the average, allowing a scaling and an offset so as to minimize the effect of illumination; we recover the parameters u1 and u2 (as described by Cootes).

(average texture vector)
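A minimal sketch of this photometric alignment, assuming (as in Cootes' report) that the mean texture has already been normalized to zero mean, and using u1/u2 as the scale/offset named above:

#include <vector>
#include <numeric>

// Normalize a texture vector g against the mean texture gmean:
// remove the offset u2 (the mean of g) and the scale u1 (g . gmean),
// so that global lighting changes do not enter the model.
void alignTexture(std::vector<double>& g, const std::vector<double>& gmean)
{
    size_t n = g.size();
    double u2 = std::accumulate(g.begin(), g.end(), 0.0) / n;   // offset
    double u1 = 0.0;                                            // scale
    for (size_t i = 0; i < n; ++i)
        u1 += (g[i] - u2) * gmean[i];
    for (size_t i = 0; i < n; ++i)
        g[i] = (g[i] - u2) / u1;
}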

Principal Component Analysis
Note that in the case of face matching we used 58 points for the shape vector and up to 60,000 values for the texture vector. This is an extremely large number of dimensions to work with, so we use principal component analysis (PCA) to reduce the dimensionality of the data.

To perform PCA we first calculate the covariance matrix.

CovMat = sum[(Xi - Xave)*transpose(Xi - Xave)]/(n-1), where the sum ranges over i = 1 to n (the size of the training set)

We next perform singular value decomposition on the covariance matrix to recover the matrices U and E, where the columns of U hold the eigenvectors of the covariance matrix and the diagonal of E holds the corresponding eigenvalues. U and E should be sorted by eigenvalue, largest first. Note that when n (the size of the training set) is smaller than the size of X, only the first n eigenvalues will be non-zero. Also note that we do not need to keep all n eigenvalues / eigenvectors; we can select only as many as we need to represent a desired percentage of the variance. The fraction of the variance captured by the first k eigenvectors is the sum of the first k eigenvalues divided by the sum of all eigenvalues.
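Because n is far smaller than the texture dimension, the eigenvectors can be found from the small n x n matrix transpose(D)*D rather than the huge covariance D*transpose(D), where D holds one centered sample per column; this trick is presumably what calcEigenScrambled implements. A sketch with OpenCV, under that assumption:

#include <opencv2/opencv.hpp>

// PCA via the small-matrix trick: if v is an eigenvector of
// T = transpose(D)*D/(n-1) with eigenvalue e, then D*v (normalized)
// is an eigenvector of the covariance D*transpose(D)/(n-1) with the
// same eigenvalue. D is CV_64F with one centered sample per column.
void smallSamplePCA(const cv::Mat& D, cv::Mat& eigVecs, cv::Mat& eigVals)
{
    int n = D.cols;
    cv::Mat T = (D.t() * D) / (n - 1);     // small n x n symmetric matrix
    cv::Mat smallVals, smallVecs;
    cv::eigen(T, smallVals, smallVecs);    // eigenvectors as rows, sorted
                                           // by descending eigenvalue
    eigVals = smallVals;                   // same non-zero eigenvalues
    eigVecs.create(D.rows, n, CV_64F);
    for (int i = 0; i < n; ++i) {
        cv::Mat u = D * smallVecs.row(i).t();   // lift back to full space
        cv::normalize(u, u);                    // unit length
        u.copyTo(eigVecs.col(i));
    }
}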

The Shape Models
Let Px equal the orthonormal eigenvector matrix obtained by applying PCA to the aligned shape vectors. Each of the eigenvectors describes a mode of variation.


(First 3 modes of shape model)

Since the eigenvectors of Px serve as an orthonormal basis for the shape vectors, we can rewrite each shape vector as a function of Px and a shape parameter Bx.

Xi ~= Xave + Px*Bxi

Therefore each training-set shape vector Xi has a corresponding shape model parameter vector Bxi.

Bxi ~= transpose(Px)*(Xi - Xave)
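In code these two relations are a single projection and a single reconstruction; a sketch with OpenCV column vectors (the identical two lines serve the texture model below with Pg and Gave):

#include <opencv2/opencv.hpp>

// Px holds one unit eigenvector per column; Xi and Xave are CV_64F
// column vectors of equal length.
cv::Mat shapeParams(const cv::Mat& Px, const cv::Mat& Xi, const cv::Mat& Xave)
{
    return Px.t() * (Xi - Xave);    // Bxi = transpose(Px)*(Xi - Xave)
}

cv::Mat reconstructShape(const cv::Mat& Px, const cv::Mat& Bx, const cv::Mat& Xave)
{
    return Xave + Px * Bx;          // Xi ~= Xave + Px*Bxi
}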

The Texture Model
Similarly, the texture vectors can be decomposed using PCA, and we obtain the matrix Pg, which serves as an orthonormal basis for the space spanned by the texture vectors. Its columns are the modes of variation found in the texture vectors.


(First 3 modes of the texture model)

Let the value nImg represent the size of a texture vector (up to 60,000). One might expect Pg to be extremely large, (nImg x nImg). But since the vectors (Gi - Gave) span only a small portion of the nImg-dimensional space, at most n eigenvalues are non-zero, so Pg will actually be much smaller: (nImg x n).

The training set texture vectors can be described as a linear combination of the eigenvectors in Pg,

Gi ~= Gave + Pg*Bgi

The texture model parameter Bg is solved for as,

Bgi ~= transpose(Pg)*(Gi - Gave)

The Combined Appearance Model
Since there is a correlation between the shape and texture models, we want to control both models using a single parameter vector. For example, when the shape model parameter creates a smile, the texture model parameter will likely produce a white texture in the mouth area.

To create this model we concatenate the shape and texture model parameters into one vector Bi = transpose(Bxi | Bgi). We perform PCA one more time on the vectors Bi, which gives the combined appearance model matrix Pc. Furthermore, we have Pc = transpose([Pcx | Pcg]). Therefore each shape and texture can be described by a single appearance model parameter vector Ci.

Bi = Pc*Ci

Xi ~= Xave + Px*Pcx*Ci = Xave + Px*Bx (shape)

Gi ~= Gave + Pg*Pcg*Ci = Gave + Pg*Bg (texture)

The vector Ci controls the modes of appearance variation.

In order to obtain the appearance parameter C for a shape and texture when we already have the combined parameter B, we apply the following:

C = transpose(Pc)*B
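A sketch of this bookkeeping with OpenCV matrices, assuming Pc is stored with its shape rows Pcx stacked above its texture rows Pcg:

#include <opencv2/opencv.hpp>

// Combine the shape and texture parameters into B and project to C;
// reconstruct() then recovers a shape (ave=Xave, P=Px, Pc_part=Pcx)
// or a texture (ave=Gave, P=Pg, Pc_part=Pcg) from C.
cv::Mat combinedParams(const cv::Mat& Pc, const cv::Mat& Bx, const cv::Mat& Bg)
{
    cv::Mat B;
    cv::vconcat(Bx, Bg, B);    // B = transpose(Bx | Bg)
    return Pc.t() * B;         // C = transpose(Pc)*B
}

cv::Mat reconstruct(const cv::Mat& ave, const cv::Mat& P,
                    const cv::Mat& Pc_part, const cv::Mat& C)
{
    return ave + P * (Pc_part * C);
}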


(Image reconstruction using the appearance model)
(left) original (right) reconstruction

The Residual Vector
Once we are ready to match an appearance model to an image, we need a method to calculate the error between our current estimate of the appearance and the actual appearance. Using the model parameter L = (C, tx, ty, s1, s2, u1, u2), which controls the shape, texture, position, scaling, rotation, and illumination of the model, we can compute the current model texture vector, Gm = Gave + Pg*Pcg*C. The image texture vector Gi is obtained by sampling the source image at the shape X = Xave + Px*Pcx*C after applying the inverse of the pose transform (tx, ty, s1, s2) to X; Gi must then be scaled and offset by the inverse of (u1, u2). The residual is then R(L) = Gi - Gm, and the error is transpose(R(L))*R(L).
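A sketch of this computation, with Gi assumed to be the already-normalized texture sampled from the image:

#include <opencv2/opencv.hpp>

// Residual R(L) = Gi - Gm and scalar error transpose(R)*R, where
// Gm = Gave + Pg*Pcg*C is the texture predicted by the model.
double residualError(const cv::Mat& Gave, const cv::Mat& Pg,
                     const cv::Mat& Pcg, const cv::Mat& C,
                     const cv::Mat& Gi, cv::Mat& R)
{
    cv::Mat Gm = Gave + Pg * (Pcg * C);   // model texture
    R = Gi - Gm;                          // residual
    return R.dot(R);                      // error
}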

Iterative Search Algorithm
Our goal is now to drive R(L) to the zero vector, meaning that there is no residual. Given a current L, we want to choose dL such that R(L+dL) = O, where O is the zero vector. By taking a first-order Taylor expansion we get

R(L+dL) ~= R(L) + (dR/dL)*dL = 0, whose solution is dL = -D*R(L), where D is the pseudo-inverse of dR/dL.

dR/dL is treated as approximately constant, since it is defined in a normalized reference frame, and so it is computed only once. It is estimated from the training set by displacing the correct L of each training sample by a small amount and measuring the resulting change in the residual.

dRi/dLj = sum over k of weight(cjk)*(R(L + cjk*ej)i - R(L)i)/cjk, where ej is the unit vector for parameter j and cjk is the kth displacement of that parameter; the sum runs over all displacements and is then averaged over all training samples.
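A sketch of this numeric-differentiation step; residualAt is a hypothetical helper that evaluates R(L) on one training image, and for brevity the weighting weight(cjk) is taken as uniform:

#include <opencv2/opencv.hpp>
#include <vector>

// Estimate the Jacobian dR/dL column by column: displace each parameter
// of each training sample's known-correct L by the amounts in
// displacements, and average the finite differences.
cv::Mat estimateJacobian(const std::vector<cv::Mat>& trueParams,
                         const std::vector<double>& displacements,
                         int resSize,
                         cv::Mat (*residualAt)(const cv::Mat&, int))
{
    int nParams = trueParams[0].rows;
    cv::Mat J = cv::Mat::zeros(resSize, nParams, CV_64F);
    for (int s = 0; s < (int)trueParams.size(); ++s) {
        cv::Mat R0 = residualAt(trueParams[s], s);
        for (int j = 0; j < nParams; ++j) {
            for (size_t k = 0; k < displacements.size(); ++k) {
                cv::Mat L = trueParams[s].clone();
                L.at<double>(j) += displacements[k];   // displace parameter j
                J.col(j) += (residualAt(L, s) - R0) / displacements[k];
            }
        }
    }
    J /= (double)(trueParams.size() * displacements.size());
    return J;   // D is then the pseudo-inverse, J.inv(cv::DECOMP_SVD)
}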

The iterative algorithm works as follows: first obtain some initial model parameters L0. Then calculate the residual R(L0) and the error e0 = transpose(R(L0))*R(L0). Use the matrix D to estimate dl: dl = -D*R(L0).

Then calculate the new error e1 = transpose(R(L0+k*dl))*R(L0+k*dl), where k=1. If this error is less than e0, then use L0+k*dl as the new L0 and loop. Otherwise try k = 1.5, 0.75, 0.5, etc. If none of these k's gives a lower error, then declare convergence.
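A sketch of this loop for one resolution level; computeResidual is a hypothetical helper that samples the image under the current parameters and returns R(L):

#include <opencv2/opencv.hpp>

// Damped Gauss-Newton style update: predict dL = -D*R(L), then try a
// few step sizes k and keep the first one that lowers the error.
void aamIterate(cv::Mat& L, const cv::Mat& D,
                cv::Mat (*computeResidual)(const cv::Mat&))
{
    const double ks[] = { 1.0, 1.5, 0.75, 0.5, 0.25 };
    for (;;) {
        cv::Mat R0 = computeResidual(L);
        double e0 = R0.dot(R0);
        cv::Mat dL = -D * R0;                  // predicted update
        bool improved = false;
        for (int i = 0; i < 5; ++i) {
            cv::Mat Lnew = L + ks[i] * dL;
            cv::Mat R = computeResidual(Lnew);
            if (R.dot(R) < e0) { L = Lnew; improved = true; break; }
        }
        if (!improved) return;                 // no k lowered the error
    }
}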

This algorithm is performed at multiple resolutions and allowed to converge at each resolution before the result L0 is passed to the next level up.

Slow Search Algorithm
Unfortunately, I was unable to get the above algorithm working correctly; it simply would not decrease the error by a significant amount. I tried adjusting the weights, the disparities cjk, and the constant k, but none of these solved the problem. I devised a modification to the above algorithm that made it work better, though still not as well as described in the paper.

The slow search algorithm is as follows,

Get initial model parameters L0.
Calculate the residual R(L0) and the error e0.
Get dl = -D*R(L0)
For(i=1; i<=size(dl); i++)
{

    Set temp = L0
    Set temp(i) = L0(i) + k*dl(i)    (perturb only the ith parameter)
    Calculate R(temp)
    if(error(temp) > e0)
        then out(i) = 0
        else out(i) = k*dl(i)

}

Calculate R(L0+out); if the error > e0 then try a different k; once all k's have been tried, declare convergence.
Otherwise set L0 = L0+out and loop.

Clearly this algorithm is very slow, but it was the only way I could get any results.

Experimental Results:

Here are some results of my slow search method; the fast method yielded no convergence, so I do not show any results for it.

The first two images were in my training set; the last one was not. The first and third converged to a satisfactory result.

(Top Left) Source Image with initial position of shape vector overlayed
(Top Right) Convergence at 128x128 resolution
(Bottom Left) Convergence at 256x256 resolution
(Bottom Right) Convergence at 512x512 resolution

The slow search method yields somewhat acceptable results; however, it is at least 100 times slower than the actual method. To correctly implement active appearance models, the fast search method would have to work.

 

Source Code & Executable:

The project trains with the IMM face database (included on the submitted CD).
To compile, the OpenGL, OpenCV, and GLUT libraries must be installed.

Important Functions
calcMatrices - the main training function; the program should be re-executed with different TRAIN_IMAGE_SIZE resolutions such as 64-128-256-512

aamSearch - searches the AAM_IMAGE using the input P0 as the initial appearance estimate

calcEigenScrambled - calculates the scrambled eigenvectors of a set of arrays Vs

calcResidual - calculates the residual of the current appearance model with the given image

xyTrans - multiplies all points (x,y) in a shape vector by the matrix (row 1) [a -b x], (row 2) [b a y]

renderTriangles - renders the triangle list at the given position using the current UVs

renderTriFlat - renders the triangle list at the given position with no texture (for the alpha channel)

drawTexture - draws an image vector model to the GL buffer (for debugging / display only)

calcAlphaBuff - returns an image that is the alpha buffer (255 for pixels inside the convex hull of the shape, 0 for outside)

calcImgVector - calculates the image vector from a position and a source image

alignData - aligns the shape data

alignImages - aligns the image data

Conclusion:

It is very unfortunate that the fast iterative search did not work in my project. I wanted to use this program as an input for facial animation of 3D characters, but because the search method is so slow it has no chance of running in real time.

References / Credit:

Tim Cootes, Statistical Models of Appearance for Computer Vision http://www.wiau.man.ac.uk/~bim/Models/app_models.pdf

IMM Face Database http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=3160

TriNet Delaunay Triangulation, www.pixtopo.com

Special Thanks To:

Margrit Betke for helping with convergence problems

Rui Li for help with debugging and with some alignment questions

Stan Sclaroff for help with some questions about PCA