Chr0n1x
« Reply #2 on: 2007/04/16 23:41:47 »
ShaderPerf is a shader profiling tool that helps you optimize the shader; it doesn't do it for you.
There isn't really a general way to speed up shaders; it all depends on what the shader does and the capabilities the technique requires. One simple way is to use half precision instead of full float precision: on hardware that supports it, you can fit 2 variables in every register instead of only one, but it limits the range and precision of the values available. There are other ways, like reducing the number of texture lookups or the number of textures needed.
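To get a feel for what half precision gives up, here is a minimal sketch in Python rather than shader code. It uses the standard library's 16-bit float format (`struct` format code `e`), which is IEEE 754 half precision; GPU half types may behave differently in detail, so treat it as an illustration only. The helper names `to_half` and `fits_in_half` are made up for this example.

```python
import struct


def to_half(value: float) -> float:
    """Round-trip a float through IEEE 754 half precision (struct code 'e')."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fits_in_half(value: float) -> bool:
    """Return True if the value can be stored as a finite half-precision float."""
    try:
        struct.pack("<e", value)
        return True
    except (OverflowError, struct.error):
        # Values beyond the half-precision range cannot be packed.
        return False


if __name__ == "__main__":
    print(to_half(0.5))           # exactly representable, comes back unchanged
    print(to_half(0.1))           # comes back slightly off: precision is lost
    print(to_half(65504.0))       # largest finite half-precision value
    print(fits_in_half(70000.0))  # out of range for half precision
```

Colors and normalized directions usually survive this fine; large world-space positions or texture coordinates on big textures are where the limited range and precision tend to bite.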
Also move unnecessary calculations out of the shader. You don't need to calculate the viewprojection for every vertex, so why do it? Do it once at the beginning of the scene render call and pass it in; it's much faster and you only do the calculation once. As for worldviewproj, compute it per model rather than per vertex; that way all you need to do per vertex is multiply the object-space position by the worldviewproj that is passed in.
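The hoisting described above can be sketched on the CPU side like this. It is plain Python standing in for application code, not a real graphics API; the function names are hypothetical, and it assumes 4x4 row-major matrices with column vectors (so the combined matrix is projection * view * world). Many Direct3D code bases use the opposite row-vector order.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    """Build a 4x4 translation matrix (column-vector convention)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def render_frame(view, proj, models):
    """models is a list of (world_matrix, object_space_vertices) pairs."""
    view_proj = mat_mul(proj, view)           # once per frame
    transformed = []
    for world, vertices in models:
        wvp = mat_mul(view_proj, world)       # once per model
        # Per vertex: a single matrix * vector multiply, nothing else.
        transformed.append([transform(wvp, v) for v in vertices])
    return transformed
```

In a real renderer the per-vertex multiply is the only part that stays in the vertex shader; `wvp` would be uploaded as a shader constant before each draw call.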
Another, more advanced technique is to replace complex math functions with texture lookups. If you have a calculation that doesn't NEED to be done every pixel/frame, precompute it in an earlier pass, store the results in a texture, and then sample that texture to get the values. That way you cut down the cost of the shader and only sacrifice some interactivity, which in some cases isn't needed.
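As a rough illustration of the lookup idea, the sketch below precomputes a function into a table and samples it with linear interpolation, which is roughly what a filtered 1D texture fetch does. The specular-style function `pow(x, 32)`, the 256-entry size, and all names are assumptions picked for the example, not something from the original shader.

```python
import math

TABLE_SIZE = 256  # hypothetical resolution, like a 256-texel 1D texture


def expensive_specular(n_dot_h: float) -> float:
    """Stand-in for a costly per-pixel function."""
    return math.pow(n_dot_h, 32.0)


def build_table(func, size=TABLE_SIZE):
    """Evaluate func once per texel over the range [0, 1]."""
    return [func(i / (size - 1)) for i in range(size)]


def sample_linear(table, u):
    """Look up u in [0, 1] with linear interpolation between neighbours."""
    u = min(max(u, 0.0), 1.0)       # clamp, like a clamped texture address mode
    pos = u * (len(table) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(table) - 1)
    frac = pos - lo
    return table[lo] * (1.0 - frac) + table[hi] * frac


if __name__ == "__main__":
    table = build_table(expensive_specular)   # done once, up front
    print(sample_linear(table, 0.9), expensive_specular(0.9))
```

The trade-off is the one already mentioned: the table costs memory and a texture fetch, and the result is only as accurate as the table resolution allows.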
Finally, the simplest thing: optimize your algorithms. Take a look through and see what can be changed and what is not required. You might find you are doing something that could be replaced with something much simpler. Screw code readability, this is 3D programming; use comments to make your code readable. To use an analogy: you need to get to work quickly, so do you take the long scenic route or the quick and direct route?
Hmm, that's all I can think of now. Those are some basic steps you can take to make your shaders run a bit faster. In some cases they will work, in some cases not; it all depends on the shader, so shader optimization is something you can only do once you know the details of the shader and its technique. If you post your shader we could help in this case, but most of the general things you can do are covered by the compiler as well, as long as shader optimization is turned on (the default).
[EDIT] Some small stats on the WorldViewProjection multiplication. For a simple 4x4 Matrix*Matrix, there are 64 multiplication operations and 48 addition operations (16 output elements, each needing 4 multiplications and 3 additions). To get the WVP, we do 2 Matrix*Matrix multiplications, using the result of the first one as a parameter for the second one. So in total we have 128 multiplication operations and 96 addition operations. Note that these are floating-point operations (FLOPs), so they are not as simple as a normal integer multiplication or addition.
Now if we do this per model, that's fine, but if we do this per vertex with, say, a 4000-vertex model, that's 512,000 multiplication operations and 384,000 addition operations for that model. In total that's 896,000 FLOPs for the model, and this is a rather small model, and only one. Most complex scenes today have many more vertices; even with culling enabled there can still be more than that on screen at once.
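As a rough check on the arithmetic, the counts can be tallied in a few lines of Python. This assumes 4x4 matrices and the textbook row-by-column product, where each of the 16 output elements takes 4 multiplications and 3 additions; the function names are made up for illustration.

```python
def matmul_op_counts(n: int = 4):
    """Return (multiplications, additions) for one n x n matrix product."""
    return n * n * n, n * n * (n - 1)


def wvp_op_counts(recomputations: int, n: int = 4):
    """Ops for building world * view * projection (two matrix products),
    repeated `recomputations` times."""
    mul, add = matmul_op_counts(n)
    return 2 * mul * recomputations, 2 * add * recomputations


if __name__ == "__main__":
    print(matmul_op_counts())    # one 4x4 product
    print(wvp_op_counts(1))      # WVP built once per model
    print(wvp_op_counts(4000))   # WVP rebuilt for each of 4000 vertices
```

Whatever the exact numbers, the per-vertex case scales with the vertex count while the per-model case stays constant, which is the point of hoisting the multiplication.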
(Note that I don't know exactly what the compiler does when optimizing, so it might calculate this once and store it in a register rather than recalculating it for every vertex. Maybe, maybe not.)