Matrix Palette Skinning Using Vertex Shader 1.1
Created: Oct 9, ‘04
Last update: Dec 3, ‘06
Download Example Source [Includes C++ code snippets and .fx files]
This paper discusses how to create a skeletal animation system that uses the CPU for bone transformations and interpolation, while the vertex transformations take place on the GPU, which results in a huge speed gain. To see this in action you can download the latest version of the GODZ Engine Demo and view the benchmarks (make sure UseHWSkinning is set to true in the SkeletalMeshes.lua config file). This tutorial builds upon the foundation we laid in the Software Skinning tutorial. We will actually use the exact same code. The only difference is that we can now move all of our vertex transformation code to the GPU.
Let’s take a moment before we get to the action and clearly define how Matrix Palette Skinning is different from Indexed Vertex Blending. They both achieve the same thing and the HLSL code is nearly identical. The big difference between Matrix Palette Skinning (MPS) and Indexed Vertex Blending (IVB) is that any card that supports Vertex Shader (VS) 1.1 supports Matrix Palette Skinning. Indexed Vertex Blending, on the other hand, is not supported on all cards (not even the GeForce 6800 line), although it is part of the fixed function pipeline in DirectX 8+.
The first structure that I will present is BlendVertex. This is our basic building block for our skeletal models.
static const int MAX_MATRIX_INDEX=4;
//skinned vertex
struct BlendVertex
{
float x,y,z; //position
float nx,ny,nz; //normals
float u,v; //texture coords
float weights[MAX_MATRIX_INDEX]; //weights
float matrixIndicies[MAX_MATRIX_INDEX]; //bones
};
Every skin vertex has a weight associated with each bone it’s influenced by. So, if a vertex is only influenced by 1 bone, then weights[0] will be 1.0 and matrixIndicies[0] will point to the matrix that deforms it. All the other components of these arrays are initialized to zero. Keep in mind that the Vertex Shader 1.1 standard has no flow-control instructions, so if statements are not directly supported and have to be emulated. Setting a weight to zero will null out an operation, so this way we can keep things simple within our shader and avoid using a compare instruction. In case you were wondering, there isn’t an FVF code for a skinned vertex. You have to create your own Vertex Declaration. Below is the DirectX 9 representation of this declaration:
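To make the zero-weight trick concrete, here is a small CPU-side sketch of the same blend the shader performs. It is not the shader from the example source: BoneOffset is a hypothetical translation-only stand-in for a full bone matrix, and BlendPosition and Vec3 are names invented for this illustration. The point is only that the loop has no branch, because unused slots contribute nothing.

```cpp
#include <cassert>
#include <cstddef>

static const int MAX_MATRIX_INDEX = 4;

// Hypothetical stand-in for a bone matrix: translation only, to keep the sketch short.
struct BoneOffset { float x, y, z; };

struct Vec3 { float x, y, z; };

// Blends a position across MAX_MATRIX_INDEX palette entries.
// There is no "if": slots that are not used carry weight 0 and index 0,
// so their term is multiplied by zero and drops out of the sum.
Vec3 BlendPosition(const Vec3& p,
                   const float weights[MAX_MATRIX_INDEX],
                   const float matrixIndicies[MAX_MATRIX_INDEX],
                   const BoneOffset* palette)
{
    Vec3 out = { 0.0f, 0.0f, 0.0f };
    for (int k = 0; k < MAX_MATRIX_INDEX; ++k)
    {
        // Indices are stored as floats in the vertex, so convert before indexing.
        const BoneOffset& b = palette[static_cast<std::size_t>(matrixIndicies[k])];
        out.x += (p.x + b.x) * weights[k];
        out.y += (p.y + b.y) * weights[k];
        out.z += (p.z + b.z) * weights[k];
    }
    return out;
}
```

For a vertex split evenly between two bones, the two active slots each contribute half of their transformed position and the two empty slots add zero.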
//skinning declaration for the vertex shader
static const D3DVERTEXELEMENT9 DVE_MATRIX_PALETTE[] =
{
{ 0, 0, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
D3DDECLUSAGE_POSITION, 0 },
{ 0, 12, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
D3DDECLUSAGE_NORMAL, 0 },
{ 0, 24, D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT,
D3DDECLUSAGE_TEXCOORD, 0 },
{ 0, 32, D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT,
D3DDECLUSAGE_TEXCOORD, 1 },
{ 0, 48, D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT,
D3DDECLUSAGE_TEXCOORD, 2 },
D3DDECL_END()
};
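The byte offsets in the declaration have to agree with the layout of BlendVertex, and a mismatch tends to show up only as a scrambled mesh. A compile-time check along the following lines can catch that early. This is a sketch that assumes 4-byte floats and a compiler with static_assert (C++11); the struct is repeated here only so the snippet stands alone.

```cpp
#include <cassert>
#include <cstddef>  // offsetof

static const int MAX_MATRIX_INDEX = 4;

// Same layout as the BlendVertex structure shown earlier.
struct BlendVertex
{
    float x, y, z;                           // position
    float nx, ny, nz;                        // normal
    float u, v;                              // texture coords
    float weights[MAX_MATRIX_INDEX];         // weights
    float matrixIndicies[MAX_MATRIX_INDEX];  // bones
};

// Each offset below mirrors the second column of DVE_MATRIX_PALETTE.
static_assert(offsetof(BlendVertex, x) == 0, "POSITION starts at byte 0");
static_assert(offsetof(BlendVertex, nx) == 12, "NORMAL starts at byte 12");
static_assert(offsetof(BlendVertex, u) == 24, "TEXCOORD0 starts at byte 24");
static_assert(offsetof(BlendVertex, weights) == 32, "TEXCOORD1 (weights) starts at byte 32");
static_assert(offsetof(BlendVertex, matrixIndicies) == 48, "TEXCOORD2 (indices) starts at byte 48");

// The vertex stride passed to the device must equal the struct size.
static_assert(sizeof(BlendVertex) == 64, "stride is 64 bytes");
```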
Like I said before, we can use all the code from software skinning and perform all of our vertex transformations on the GPU via HLSL, Microsoft’s high level shading language. Do you remember the Bone::Transform method used in the previous article? We just need to update this method to only transform the bones. We no longer need to transform the vertices on the CPU. This is a significant speed gain because we reduce the amount of information we need to send from the CPU to the GPU. Additionally, this frees up the CPU to do other things.
void Bone::Transform(float dt, AnimController *animControl)
{
animControl->Update(this, dt);
if (parentIndex > -1)
{
Bone* parent = skMesh->GetBone(parentIndex);
m_final = m_offset * parent->m_final;
}
else
{
}
//update child transformations
vector<udword>::iterator childIter;
for(childIter=children.begin();childIter!=children.end();childIter++)
{
udword childIndex = (*childIter);
//update child transformations
Bone* child = skMesh->GetBone(childIndex);
child->Transform(dt, animControl);
}
}
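The recursion in Bone::Transform can be hard to picture without the rest of the engine, so here is a stripped-down sketch of the same parent-then-children walk. It is not GODZ Engine code: SketchBone, Offset and TransformBone are hypothetical names, the bones use plain translation offsets instead of matrices, and the root case (using its own offset as the final transform) is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Offset { float x, y, z; };

// Hypothetical, translation-only bone used only for this illustration.
struct SketchBone
{
    int parentIndex;                    // -1 marks a root bone
    Offset local;                       // offset relative to the parent
    Offset world;                       // accumulated result, filled in by TransformBone
    std::vector<std::size_t> children;  // indices of child bones
};

// Updates one bone from its parent, then recurses into its children,
// so a parent is always final before any child reads it.
void TransformBone(std::vector<SketchBone>& bones, std::size_t index)
{
    assert(index < bones.size());
    SketchBone& bone = bones[index];
    if (bone.parentIndex > -1)
    {
        const Offset& p = bones[static_cast<std::size_t>(bone.parentIndex)].world;
        bone.world.x = bone.local.x + p.x;
        bone.world.y = bone.local.y + p.y;
        bone.world.z = bone.local.z + p.z;
    }
    else
    {
        // Assumption for this sketch: a root bone's final transform is its own offset.
        bone.world = bone.local;
    }

    for (std::size_t c = 0; c < bone.children.size(); ++c)
    {
        TransformBone(bones, bone.children[c]);
    }
}
```

Calling TransformBone once on the root index is enough to update the whole skeleton, since every bone is reached through its parent.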
Another important step is setting up the initial vertex position. Vertex shaders work a bit differently than the software solution presented in my Software Skinning article. If you step through a pass, you’ll notice that the vertex always stays at the initial frame. Transformations done on the GPU are not retained. To illustrate this, let’s say the vertex was located at (0, 25, -75) in frame 200. In the shader, we transform the vertex using the bone matrices. In frame 201, the vertex is once again positioned at (0, 25, -75). This is radically different from software skinning, in which we always have to spend time restoring the initial location of the vertex.
So, this simplifies our task. Now all we need is a vertex position that only needs 1 matrix to transform it to its final position. One might conclude that it is enough to set the vertex to a position relative to all the bones that deform it, using a method similar to the code below:
void SkeletalMesh::TransformVerts()
{
//Save relative transformation
size_t size = duplicates.size();
MeshInstance* mi = this->GetMeshInstance(0);
SkelMeshInstance *skMesh = SafeCast<SkelMeshInstance>(mi);
ModelResource *model = skMesh->GetModelResource(0);
for(int i=0;i<size;i++)
{
VertexDuplication *vd = duplicates[i];
//Transform the vertex relative to the bone
size_t num = vd->indices.size();
WVector tempPos, tempNorm;
if(num > 0)
{
BlendVertex *bv =
(BlendVertex*)model->GetVertex(vd->indices[0]);
for(int k=0;k<MAX_MATRIX_INDEX;k++)
{
if (bv->weights[k] < 0.000001f)
continue;
Bone *bone =
skMesh->GetBone(bv->matrixIndicies[k]);
WVector pos(vd->pos.x,vd->pos.y,vd->pos.z);
WVector norm(vd->normal.x,vd->normal.y,
vd->normal.z);
bone->m_init.InverseTranslateVect(&pos.x);
bone->m_init.RotateVect(&pos.x);
bone->m_init.RotateVect(&norm.x);
tempPos += pos * bv->weights[k];
tempNorm += norm * bv->weights[k];
} //loop weights
vd->pos.x = tempPos.x;
vd->pos.y = tempPos.y;
vd->pos.z = tempPos.z;
vd->normal.x = tempNorm.x;
vd->normal.y = tempNorm.y;
vd->normal.z = tempNorm.z;
//loop through duplicates
for(udword k=0;k<num;k++)
{
BlendVertex *bv = (BlendVertex*)
model->GetVertex(vd->indices[k]);
bv->x=tempPos.x;
bv->y=tempPos.y;
bv->z=tempPos.z;
bv->nx=tempNorm.x;
bv->ny=tempNorm.y;
bv->nz=tempNorm.z;
}
} //loop actual verts
else
{
//huh? why no indicies????
_asm nop;
}
}
}
The code takes the vertex, which has an absolute position from the initial pose of the character, and transforms it relative to all the bones that influence it. This logic works just fine for vertices that reference only 1 bone. For more complicated meshes whose verts reference multiple bones, however, the above code can cause distortions, such as areas with a lot of influences caving in.
The correct method is to simply use the verts from the initial pose (aka reference pose). When we upload the bones to the GPU, we take the final transform of each bone and combine it with the inverse of the bone’s initial transform. The resulting matrix moves the vertex to its position relative to the bone and then to its final position, all in one operation. The following code sample demonstrates this concept (taken from D3DXSkinMeshShader.cpp, which is included in the example zip file):
for (udword i=0;i<size;i++)
{
Bone* bone = skMesh->GetBone(i);
WMatrix mat = bone->m_init.Inverse() * bone->m_final;
m_pBones[i] = &mat._11;
}
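A tiny worked example may help show why inverse(initial) * final is the right matrix to upload. The sketch below is independent of the engine's WMatrix class: Mat4, Translation, Multiply and TransformPoint are hypothetical helpers, the bone only translates (so its inverse is just the negated translation), and vectors are treated as rows multiplied on the left of the matrix, as Direct3D does.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };

// Row-vector convention: the translation lives in the fourth row.
Mat4 Translation(float x, float y, float z)
{
    Mat4 r = {{ { 1.0f, 0.0f, 0.0f, 0.0f },
                { 0.0f, 1.0f, 0.0f, 0.0f },
                { 0.0f, 0.0f, 1.0f, 0.0f },
                { x,    y,    z,    1.0f } }};
    return r;
}

// Standard 4x4 product; with row vectors, "a * b" applies a first, then b.
Mat4 Multiply(const Mat4& a, const Mat4& b)
{
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Transforms a point (w = 1) as a row vector: v * m.
Vec3 TransformPoint(const Vec3& v, const Mat4& m)
{
    Vec3 r;
    r.x = v.x * m.m[0][0] + v.y * m.m[1][0] + v.z * m.m[2][0] + m.m[3][0];
    r.y = v.x * m.m[0][1] + v.y * m.m[1][1] + v.z * m.m[2][1] + m.m[3][1];
    r.z = v.x * m.m[0][2] + v.y * m.m[1][2] + v.z * m.m[2][2] + m.m[3][2];
    return r;
}
```

Suppose a bone sits at (0, 10, 0) in the reference pose and has moved to (5, 10, 0) in the current frame. Multiply(Translation(0, -10, 0), Translation(5, 10, 0)) yields a pure translation of (5, 0, 0), so a reference-pose vertex at (1, 12, 0) lands at (6, 12, 0): it keeps its offset from the bone and follows the bone's motion.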
The code for matrix palette skinning is pretty straightforward. However, there is a catch: there is a limit on the number of bones we can render in one pass. Normally this maximum is around 60 or more bones (it is limited by the number of constants allowed on the card). You could probably get around this by using quaternions for rotation, but what I do is simply upload the entire transformation matrix. Below I will present one solution, although there are many workarounds. You can always fall back to CPU skinning if you can’t fit everything into the constants. The good thing about falling back to the CPU is that you are no longer bound by restrictions such as the number of bones (for instance, the BlendVertex structure above allows a maximum of 4 bones per vertex).
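The bone limit follows from simple arithmetic on the constant registers. The helper below is a sketch, not part of the example source: MaxBonesPerPass is a hypothetical name and the number of reserved registers is an assumption that depends on what else your shader needs (world-view-projection matrix, lights and so on). In Direct3D 9 the register count for a given card is reported in D3DCAPS9::MaxVertexShaderConst, and vs_1_1 guarantees at least 96.

```cpp
#include <cassert>

// Estimates how many bone matrices fit into the vertex shader constants.
//   totalConstants    - float4 registers available on the card
//   reservedConstants - registers kept for non-bone data (an assumption per shader)
//   registersPerBone  - 4 for a full 4x4 matrix, 3 for a 4x3 matrix
unsigned MaxBonesPerPass(unsigned totalConstants,
                         unsigned reservedConstants,
                         unsigned registersPerBone)
{
    assert(registersPerBone > 0);
    if (reservedConstants >= totalConstants)
    {
        return 0;  // nothing left for bones
    }
    return (totalConstants - reservedConstants) / registersPerBone;
}
```

With 256 registers and 16 of them reserved, full 4x4 matrices give 60 bones, which lines up with the figure quoted above. A card that exposes only the vs_1_1 minimum of 96 registers fits far fewer (22 with 8 reserved), which is one more reason to query the caps rather than hard-code the limit.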
Another solution, if shader model 3.0 or greater is available, is to pass all the matrices through a texture, since VS 3.0 lets you read a texture from within a vertex shader. This is something I haven’t tried personally.
The solution presented in this article assumes the worst case scenario: we can only upload about 60 bones to the card and you’d rather just chop up the mesh. Meshes under this limit are sent to the card in one shot. Any model that contains more bones is split into groups of bones, and each group requires its own pass through the shader. So, if you have a mesh that contains 93 bones, this method will make multiple passes through the shader. The technique first divides the mesh by material id. Then it further breaks down those sections according to the MAX_BONES constant. Below I will present the structures required for these operations and explain each in detail:
struct FaceInfo
{
WORD index[3];
};
//bone remap
struct BoneMap
{
//index within the matrix array
udword m_nOrigBoneIndex;
udword m_nNewIndex; //vertex shader
};
A bone map simply maps a bone to its remapped index. Let’s say the mesh contains 93 bones. Only 60 can fit within a pass. However, the vertices in the mesh all point to bones numbered 0 to 92. When we split the mesh into groups, we remap the vertices contained within each group, using the bone map to tell each vert which matrix it really points to during a pass. This is the major hurdle we have to overcome with this solution: making sure we know which group a vert belongs to, and remapping that vert to point to the right bone within a pass.
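The remapping itself can be sketched in a few lines. This is not the GODZ Engine implementation: SketchSection and SketchVertex are hypothetical, cut-down versions of SkinSection and BlendVertex, udword is assumed to be an unsigned 32-bit integer, and skipping zero-weight slots (rather than looking up their index) is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned int udword;  // assumption: the engine's unsigned 32-bit type

static const int MAX_MATRIX_INDEX = 4;

struct BoneMap
{
    udword m_nOrigBoneIndex;  // index within the full skeleton
    udword m_nNewIndex;       // index within this pass's palette
};

// Hypothetical cut-down vertex: only the skinning fields.
struct SketchVertex
{
    float weights[MAX_MATRIX_INDEX];
    float matrixIndicies[MAX_MATRIX_INDEX];
};

struct SketchSection
{
    std::vector<BoneMap> m_pMap;

    // Returns the map for an original bone index, or 0 if the bone is not in this section.
    const BoneMap* GetMap(udword origIndex) const
    {
        for (std::size_t i = 0; i < m_pMap.size(); ++i)
            if (m_pMap[i].m_nOrigBoneIndex == origIndex)
                return &m_pMap[i];
        return 0;
    }

    // Adds a bone once; its new index is simply its position in the section.
    void AddBone(udword origIndex)
    {
        if (GetMap(origIndex) != 0)
            return;
        BoneMap map = { origIndex, static_cast<udword>(m_pMap.size()) };
        m_pMap.push_back(map);
    }

    // Rewrites the vertex's bone indices to the section-local palette slots.
    void RemapVertex(SketchVertex& v) const
    {
        for (int k = 0; k < MAX_MATRIX_INDEX; ++k)
        {
            if (v.weights[k] <= 0.0f)
            {
                v.matrixIndicies[k] = 0.0f;  // unused slot: any valid index will do
                continue;
            }
            const BoneMap* map = GetMap(static_cast<udword>(v.matrixIndicies[k]));
            assert(map != 0);  // the section must already contain every bone the vertex uses
            v.matrixIndicies[k] = static_cast<float>(map->m_nNewIndex);
        }
    }
};
```

If a section holds bones 70 and 92 (in that order), a vertex that referenced bones 92 and 70 ends up referencing palette slots 1 and 0 for that pass.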
//Stores a section of bones and related faces and vertices. We need
//a new vertex buffer for each section because sections can not share
//verts (bone matrices indicies change)
struct SkinSection
{
IDirect3DIndexBuffer9* m_IB;//index buffer
IDirect3DVertexBuffer9* m_VB;
udword m_nMinVertex;//min vertex index (unused)
udword m_nNumVertices;
udword m_nNumIndicies; //needed to get primitive count
std::vector<BoneMap> m_pMap; //bone remapping
std::vector<FaceInfo> m_pFaces; //temp data - stores faces
std::vector<BlendVertex*> m_pVerts;
void AddBone(int boneIndex);
bool ContainsOrigBoneIndex(int boneIndex);
BoneMap* GetMap(int boneIndex);
size_t GetNumBones();
void RemapVertex(BlendVertex* bv);
int FindVertex(BlendVertex *bv);
SkinSection()
{
m_IB=0;
m_VB=0;
}
~SkinSection()
{
if (m_IB)
{
m_IB->Release();
m_IB=0;
}
if (m_VB)
{
m_VB->Release();
m_VB=0;
}
releaseVector<BlendVertex>(m_pVerts);
}
};
//for each material we have a meshChunk
struct MeshChunk
{
std::vector<SkinSection*> m_pSections;
SkinSection* m_pCurr;
// returns the map for a bone
// [out] section - sections that stores the map
BoneMap* GetBone(int boneIndex, SkinSection **section);
//returns a section that will store the
//required amount of bones
SkinSection* GetNextSection();
SkinSection* GetSection(std::vector<int> bones);
MeshChunk()
{
m_pCurr=0;
}
~MeshChunk()
{
releaseVector<SkinSection>(m_pSections);
}
};
Above I just presented two structures: SkinSection and MeshChunk. For every material in the mesh we have a corresponding MeshChunk. This structure divides the mesh into material regions, so if a mesh uses 3 materials, there are 3 mesh chunks. Every mesh chunk contains at least one SkinSection, which stores a bone map for every bone displayed in the group. We also store every vertex that we will need to display.
Below we review the algorithm that creates the mesh chunks, which is the heart of this article. Keep in mind that all of the structures I have just shown you are only used for meshes whose bone count exceeds the limit (MAX_BONES).
- Iterate through all of the faces.
- For every triangle in the mesh, determine if the bones used by the vertices are already appended to the current section we are building.
- If all of the bones fit, we append this vertex to the section and all of the bones it requires.
- Create bone maps for all of the new bones getting added.
- However, if the vertex has more bones than we can squeeze into the section, we proceed to create a new section and append this to the mesh chunk.
- After we have appended the vertex to a section, remap the vertex matrixIndicies array to point to the m_nNewIndex member of the corresponding bone map (this is found by inquiring the skin section for the matching bone map related to the vertex).
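The steps above can be sketched as follows. This is a simplified illustration and not the code shipped with the example: SketchFace, SketchSection and SplitIntoSections are hypothetical names, each face is assumed to arrive with the list of distinct bones its three vertices use, and only the most recently created section is considered when placing a face.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical input: the distinct bones referenced by one triangle's vertices.
struct SketchFace { std::vector<int> bones; };

struct SketchSection
{
    std::vector<int> bones;          // position in this vector is the remapped index
    std::vector<std::size_t> faces;  // indices into the input face list
};

// Counts how many of the needed bones are not yet in the section.
static std::size_t CountMissing(const SketchSection& s, const std::vector<int>& needed)
{
    std::size_t missing = 0;
    for (std::size_t i = 0; i < needed.size(); ++i)
        if (std::find(s.bones.begin(), s.bones.end(), needed[i]) == s.bones.end())
            ++missing;
    return missing;
}

// Walks the faces in order and starts a new section whenever the current
// one cannot hold the bones a face needs without exceeding maxBones.
std::vector<SketchSection> SplitIntoSections(const std::vector<SketchFace>& faces,
                                             std::size_t maxBones)
{
    std::vector<SketchSection> sections;
    for (std::size_t f = 0; f < faces.size(); ++f)
    {
        const std::vector<int>& needed = faces[f].bones;
        assert(needed.size() <= maxBones);  // a single face must fit in one pass

        if (sections.empty() ||
            sections.back().bones.size() + CountMissing(sections.back(), needed) > maxBones)
        {
            sections.push_back(SketchSection());
        }

        SketchSection& cur = sections.back();
        for (std::size_t i = 0; i < needed.size(); ++i)
            if (std::find(cur.bones.begin(), cur.bones.end(), needed[i]) == cur.bones.end())
                cur.bones.push_back(needed[i]);  // this is where a bone map would be created
        cur.faces.push_back(f);
    }
    return sections;
}
```

Because the sketch never looks back at earlier sections, a face can open a new section even though an older one already holds its bones. Searching all existing sections first would reduce the number of passes at the cost of a little more bookkeeping.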
As you can see, the algorithm is fairly simple and straightforward. After we have constructed all of the mesh chunks and their sections, we can begin to build the index and vertex buffers. I have included code demonstrating this entire process. However, it is code used by the GODZ Engine, so you will have to alter it to work for you.