How to push your PC to its limit with cardboard
07 July 2018
I recently went through all my old projects and was reminded of one of them: a stop-motion film made for my high school, for which my team decided to use only cardboard. It was an interesting limitation, and it made me think about how interesting a game made entirely of cardboard would be. That brought my mind to the original inspiration for the stop-motion project's limitation: Lumino City.
Its most obvious artistic trait is that everything in it was built in the real world. That doesn't stop me from trying to create a shader that mimics the effect. I don't expect the shader to be cheap, but it's a nice experiment nonetheless.
Around the same time I got in the mood to play Minecraft again, where anything you hold is a PNG texture extruded to a given depth.
These things together are the reason why I decided to make this shader.
My goals for this shader are the following:
First, while I know that I'll eventually have to push multiple points, I'll try to keep that count low.
Second, because I'll base everything on a PNG texture, I want to make it possible to use either a piece of a sprite sheet or a whole sprite. I'll try to do as much as possible on the GPU, so don't expect great fps with high-quality images.
Setting vertex limit
One of the more important things was knowing how big I could make the textures, given that a geometry shader's output is limited to 1024 floats (32-bit scalars) per run. Because I want to make a shelled version of any texture provided, the worst possible shape would be a checkerboard.
This would mean that every provided pixel creates 6 sides, or 24 vertices, each with a position (4 floats), a texture coordinate (2 floats) and a normal (3 floats).
So every vertex is 9 floats, and one pixel costs 216 floats. That means I could only push 4.7 filled pixels at a time. Since only every other pixel of a checkerboard is filled, that is a 2x4 sprite.
This isn't a lot, so I started exploring ways to raise that amount. One thing I discovered was geometry shader instancing. The other was realizing that I could render in multiple passes, one per side direction. That also meant the normal is fixed within a pass and doesn't need to be output. This removes 3 floats per vertex and 20 vertices per pixel from every geometry shader run. With 6 floats per vertex, one run can output 170 vertices, or 42 pixels at 4 vertices each, which is enough for a 6x7 sprite.
The downside is that everything that comes after this gets calculated 6 times, once per pass.
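The budget arithmetic above can be checked with a few lines of C++. This is a sketch, not shader code. It assumes only the D3D11 limit of 1024 32-bit scalars of geometry shader output per invocation and the two vertex layouts described in the text. The helper names are made up for illustration.

```cpp
#include <cassert>

// D3D11 geometry shaders may output at most 1024 32-bit scalars per
// invocation (maxvertexcount * floats per vertex).
constexpr int kMaxOutputScalars = 1024;

// Vertices that fit in one run for a layout of `floatsPerVertex` floats.
int MaxVertices(int floatsPerVertex) {
    assert(floatsPerVertex > 0);
    return kMaxOutputScalars / floatsPerVertex;
}

// Worst-case pixels that fit in one run, where every pixel emits
// `verticesPerPixel` vertices of `floatsPerVertex` floats each.
int MaxPixels(int floatsPerVertex, int verticesPerPixel) {
    assert(floatsPerVertex > 0 && verticesPerPixel > 0);
    return kMaxOutputScalars / (floatsPerVertex * verticesPerPixel);
}

// Single pass: position (4) + texcoord (2) + normal (3) floats,
// 6 sides * 4 vertices for an isolated (checkerboard) pixel.
constexpr int kSinglePassFloats = 4 + 2 + 3;     // 9
constexpr int kSinglePassVertsPerPixel = 6 * 4;  // 24

// One pass per side direction: the normal is implied by the pass,
// leaving position (4) + texcoord (2) floats and one quad per pixel.
constexpr int kPerPassFloats = 4 + 2;            // 6
constexpr int kPerPassVertsPerPixel = 4;
```

`MaxPixels(9, 24)` gives 4 for the single-pass layout. For the per-side passes, `MaxVertices(6)` gives 170 and `MaxPixels(6, 4)` gives 42.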
With instances, each geometry shader instance can still only handle a 2x4 texture, but that limit now applies per instance. The biggest sprite grid possible in those 32 instances is 6x5, giving me a 48x20 texture per point.
At first I didn't want to get into geometry instances, so I started with the passes. Once that was done, I decided to also try making it work with instances. From here on I'll use the instanced version, since it also fixed some errors that I had never fixed in the old shader.
When you select a draw rect size, it gets subdivided twice: once into multiple points in the material, and once more in every geometry instance.
The draw rect can be any size; a point gets output for every 48x20 block or smaller remainder. Every instance then first checks whether it is working outside of the draw rect. The same check happens again once we loop over the pixels, to find the pixels that lie outside our draw rect.
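The subdivision and the "outside the draw rect" checks can be sketched on the CPU in C++. This is not the shader itself. `Block`, `SubdivideDrawRect` and `InsideDrawRect` are hypothetical names. Rather than skipping work inside a fixed grid, this sketch clips the edge blocks directly, which covers the same set of pixels.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One block of the draw rect, in pixels; (x, y) is its top-left corner.
struct Block { int x, y, w, h; };

// Split a drawW x drawH rect into blocks of at most blockW x blockH pixels.
// Blocks on the right and bottom edges are clipped so that nothing lies
// outside the draw rect.
std::vector<Block> SubdivideDrawRect(int drawW, int drawH, int blockW, int blockH) {
    assert(blockW > 0 && blockH > 0);
    std::vector<Block> blocks;
    for (int y = 0; y < drawH; y += blockH) {
        for (int x = 0; x < drawW; x += blockW) {
            blocks.push_back({x, y, std::min(blockW, drawW - x), std::min(blockH, drawH - y)});
        }
    }
    return blocks;
}

// Per-pixel check used while looping over a block's pixels.
bool InsideDrawRect(int px, int py, int drawW, int drawH) {
    return px >= 0 && py >= 0 && px < drawW && py < drawH;
}
```

`SubdivideDrawRect(100, 50, 48, 20)` yields 9 blocks (3x3), one per point, and the last one is clipped to 4x10. The same function could then split each point's block into per-instance blocks.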
We force our shape into a 1x1 scale. To get this, we divide the object size by our draw rect and take the smaller of the two values. This makes sure the pixels become uniform (square) quads.
I also use a texture size, but there I don't take the smallest value, because that would force our texture to be square, which isn't necessary.
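Here is a minimal C++ sketch of the two scale calculations. It assumes the object size and the draw rect are 2D sizes and that the texture-size step is one texel in UV space, i.e. 1 / texture size per axis. That second part is my assumption, and `Float2`, `UniformQuadSize` and `TexelUvSize` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

struct Float2 { float x, y; };

// Uniform quad size: divide the object size by the draw rect (in pixels)
// and keep the smaller component, so every pixel quad is square.
float UniformQuadSize(Float2 objectSize, Float2 drawRect) {
    assert(drawRect.x > 0 && drawRect.y > 0);
    return std::min(objectSize.x / drawRect.x, objectSize.y / drawRect.y);
}

// UV size of one texel, kept per axis (no min), so a non-square texture
// is not forced into a square.
Float2 TexelUvSize(Float2 textureSize) {
    assert(textureSize.x > 0 && textureSize.y > 0);
    return {1.0f / textureSize.x, 1.0f / textureSize.y};
}
```

For a 4x4 object over an 8x16 draw rect, the quad size is min(0.5, 0.25) = 0.25. A 4x8 texture keeps separate UV steps of 0.25 and 0.125.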
For most of the shader our texCoord is a float4, which we use as the bounds of our pixel.
// texCoord is a 'Rect' type
// xy = xMin yMin
// zw = xMax yMax
// (x, y) is the pixel index; texCoordSize is the UV size of one pixel (1 / gTextureSize)
float4 texCoord;
texCoord.x = x / gTextureSize.x;
texCoord.y = y / gTextureSize.y;
texCoord.z = texCoord.x + texCoordSize.x;
texCoord.w = texCoord.y + texCoordSize.y;
When looping over all the pixels of our instance rect, I first check whether the pixel is outside of the needed draw rect; if so, we ignore it.
Next, we check whether the current pixel has an alpha value below our cutoff point. This is done in AreaAlphaToLow, which will be explained later.
If the point is valid, it checks the surrounding pixels to know which sides it needs to draw. The GetPixelBorders function executes AreaAlphaToLow for the immediate surrounding pixels (top, bottom, left, right) and returns an int4 used as a bool for each side.
After that, it draws every allowed side.
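The border test can be sketched in C++ against a CPU copy of the alpha channel. `AlphaImage`, `AlphaTooLow` and the rule that pixels outside the image count as transparent are assumptions for illustration. `AlphaTooLow` is a simplified stand-in for AreaAlphaToLow without smoothing. The front and back faces are not part of the int4, since a valid pixel always needs both.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical CPU stand-in for the texture's alpha channel (row-major, y down).
struct AlphaImage {
    int width, height;
    std::vector<float> alpha;
    // Pixels outside the image count as fully transparent.
    float At(int x, int y) const {
        if (x < 0 || y < 0 || x >= width || y >= height) return 0.0f;
        return alpha[y * width + x];
    }
};

// Simplified AreaAlphaToLow (no smoothing): true if the pixel is below the cutoff.
bool AlphaTooLow(const AlphaImage& img, int x, int y, float cutoff) {
    return img.At(x, y) < cutoff;
}

// Mirrors the int4: one flag per side (top, bottom, left, right).
// A side needs a quad when the neighbour on that side is too transparent.
std::array<int, 4> GetPixelBorders(const AlphaImage& img, int x, int y, float cutoff) {
    assert(!AlphaTooLow(img, x, y, cutoff));  // only called for valid pixels
    return {{AlphaTooLow(img, x, y - 1, cutoff) ? 1 : 0,    // top
             AlphaTooLow(img, x, y + 1, cutoff) ? 1 : 0,    // bottom
             AlphaTooLow(img, x - 1, y, cutoff) ? 1 : 0,    // left
             AlphaTooLow(img, x + 1, y, cutoff) ? 1 : 0}};  // right
}
```

In a 3x1 row with alphas 1, 1, 0, the first pixel needs every side except its right one, and the middle pixel needs every side except its left one.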
AreaAlphaToLow is where most of the heavy lifting happens, so returning out of it as soon as possible is always a good idea. With that in mind, I always sample the center of the current pixel first. If its alpha isn't too low we return; otherwise we continue with the function.
Next, I check whether any smoothing is happening. When smoothing, it also checks surrounding pixels to find a valid point, which gives the result a more cut-out feeling.
The higher the gSmoothing, the more neighboring pixels we check. This gets GPU-heavy really quickly, since it also happens when we get the pixel borders. In a real-world application this should be preprocessed for every pixel and stored as a separate texture. But this isn't a real-world application, so let's just continue. 🙃
Next, we loop over every pixel in our smoothing range. To optimize this a bit, I sample 4 pixels at a time. This is done by sampling at the top-left corner of a pixel instead of its middle, and by using a linear sampler state instead of a point sampler state.
If we find a single pixel with a high enough alpha value, then the current pixel is considered valid.
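Here is a C++ sketch of how AreaAlphaToLow could be structured. It assumes the texture is a CPU array and that pixels outside it are transparent. `LinearAtCorner` reproduces what a linear sampler returns at a pixel's top-left corner: the average of the 4 pixels sharing that corner. Names other than AreaAlphaToLow are hypothetical, and the real shader's sampling offsets may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU stand-in for the texture's alpha channel (row-major, y down).
struct AlphaImage {
    int width, height;
    std::vector<float> alpha;
    // Point sample; pixels outside the image count as fully transparent.
    float Point(int x, int y) const {
        if (x < 0 || y < 0 || x >= width || y >= height) return 0.0f;
        return alpha[y * width + x];
    }
    // What a linear sampler returns at the shared top-left corner of pixel
    // (x, y): the average of the 2x2 pixels touching that corner.
    float LinearAtCorner(int x, int y) const {
        return 0.25f * (Point(x - 1, y - 1) + Point(x, y - 1) +
                        Point(x - 1, y) + Point(x, y));
    }
};

// Sketch of AreaAlphaToLow: returns true when the pixel should be skipped.
bool AreaAlphaToLow(const AlphaImage& img, int x, int y, float cutoff, int smoothing) {
    assert(smoothing >= 0);
    // Early out: sample the pixel center first; most pixels stop here.
    if (img.Point(x, y) >= cutoff) return false;
    if (smoothing == 0) return true;
    // Step two pixels at a time so each linear sample covers 4 pixels.
    // Corners x-s+1 .. x+s+1 cover pixels x-s .. x+s+1 (one extra row and
    // column, because 2x2 steps cannot cover an odd width exactly).
    for (int cy = y - smoothing + 1; cy <= y + smoothing + 1; cy += 2) {
        for (int cx = x - smoothing + 1; cx <= x + smoothing + 1; cx += 2) {
            if (img.LinearAtCorner(cx, cy) >= cutoff) return false;
        }
    }
    return true;
}
```

Each sample is an average, so a block counts as found only when its average reaches the cutoff. That is stricter than testing each of the 4 pixels on its own: a single opaque pixel among four transparent neighbors only reaches 0.25.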
Creating cube sides
To avoid redundancy, I have a single function that creates every side. This is possible because a flat plane always spans only 2 directions (XY, XZ or YZ).
Because of this, it's possible to combine everything in a single position function. Its x and y positions are passed not as single floats but as float2s, with only one value of each filled according to the schema above.
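One way to express a single side-creating function, sketched in C++: each side is built from an origin and two axis vectors that each have exactly one non-zero component, which selects the XY, XZ or YZ plane. This is my own illustration of the idea, not the shader's actual position function. `Float3`, `MakeSide` and `OnSingleAxis` are hypothetical.

```cpp
#include <array>
#include <cassert>

struct Float3 { float x, y, z; };

Float3 operator+(Float3 a, Float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

// True when exactly one component is non-zero (the vector lies on one axis).
bool OnSingleAxis(Float3 a) {
    int nonZero = (a.x != 0.0f) + (a.y != 0.0f) + (a.z != 0.0f);
    return nonZero == 1;
}

// One side of a pixel cube as the 4 corners of a triangle strip.
// `u` and `v` are the two axes the flat plane spans (XY, XZ or YZ).
std::array<Float3, 4> MakeSide(Float3 origin, Float3 u, Float3 v) {
    assert(OnSingleAxis(u) && OnSingleAxis(v));
    return {{origin, origin + u, origin + v, origin + u + v}};
}
```

For a pixel of size s at (px, py) with depth d, the front face would be `MakeSide({px, py, 0}, {s, 0, 0}, {0, s, 0})` and a top side `MakeSide({px, py, 0}, {s, 0, 0}, {0, 0, d})`. Only the axis vectors change between sides.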
Also, some of my underlying math acts strangely at times. This shows up most often with high smoothing values, and I haven't found what is causing it.
It's exceptionally heavy on the GPU. A 4x2 texture with full alpha and no smoothing already takes 20% of my GTX 1070.
The obligatory teapot (152x116) puts me at 40%. With smoothing at 10 it goes up to 90%.
The DAE logo (843x596) goes up to 99%, and with smoothing at 10 I hit 100% GPU usage.
Furthermore, I only drop below 60 fps with the DAE logo at smoothing 4; at smoothing 10 I get 13 fps.
A friend of mine, who was nice enough to let me fry his GTX 960M, hit 1 fps with everything maxed out.
Basically, this is unnecessarily heavy.
This idea might become practical with a simplified outline of the texture, so that there aren't 1 to 6 quads for every pixel, or with low pixel counts.