How to render perfect pixel art with Apple’s Metal Shading Language
If you create games on any of Apple’s platforms, there’s a decent chance your game uses Metal as the renderer. This applies not only to games you create with a third-party engine like Unity or Unreal, but also to games built with Apple’s own tools like SpriteKit and SceneKit.
Even if you’re nutty enough to create your own game engine like I have, and you’re shipping to one of Apple’s platforms, you will likely be working with Metal as well.
In this article I will show you how to render perfect pixel art that scales by any arbitrary multiplier, using Apple’s Metal shading language as the renderer. You will learn why it is important to scale pixel art when shipping a modern game, and why it isn’t as straightforward as it may seem.
This article will also provide code examples you can use in your own game projects. I’m putting together a sample project so you can see the entire render pipeline for a professional pixel art game, from start to finish. You will see everything in context so there’s no confusion.
I would not classify this article as “beginner”. It is for people who are a little further along in developing their own game and have run into the same problems I have.
Back when I had to solve this problem, I managed to find some great articles addressing it. This article is more or less a port of other peoples’ work, focused on addressing the Metal shading language specifically. I wouldn’t have been able to do this without standing on their shoulders.
Aside from the Metal port, my work isn’t much of anything new. I’m just attempting to explain it to other people who are a little less inclined to thinking directly in terms of mathematical abstractions. It’s more for people who think concretely with specific examples.
Here are those articles, for reference:
- Scaling Pixel Art Without Destroying It (Cole Cecil)
- Manual texture filtering for pixelated games in WebGL
Most modern pixel art games need to scale pixel art by a non-integer multiplier
It’s hard to make pixel art look good. Let me rephrase that. It’s hard to make pixel art look good and make a game with modern design sensibilities.
Modern games look best in fullscreen mode. That’s how you get the most immersive experience. Unfortunately, that also means your game needs to look good in fullscreen on a variety of different devices with different screen sizes (and orientations if you’re making a mobile game).
They didn’t have these concerns back in the heyday of pixel art games. The NES had a single, predictable screen size. You could design your sprites to fit inside that screen without any scaling at all.
In today’s world, there is simply no avoiding the need to scale pixel art. If you want a fullscreen experience on a variety of devices, you need to scale your pixel art.
The alternative is to letterbox your game inside of a fixed width and height. You can do that, but I think you will be sacrificing immersiveness and a certain degree of professionalism.
It’s okay for early prototyping, but you probably wouldn’t want to publish a letterboxed game and try to make money on it. Or you could just be a rebel and prove me wrong!
Why is it hard to scale pixel art?
On the surface, it seems like this should be easy. Can’t you just tell the GPU to draw a bigger quad and then have the fragment shader read your texture at the scaled up size? The nearest neighbor filtering option ought to make sure those pixels have hard edges, right?
Not quite. It’s actually way more complicated than that, for several reasons. To understand why, you need to learn about the two texture filtering algorithms at your disposal: nearest neighbor and bilinear filtering.
Nearest neighbor and fixed integer multiples
First off, if all you’re doing is scaling pixel art by a fixed integer multiple (2x, 3x, 4x, etc.), you shouldn’t have a problem using the nearest neighbor method.
That’s because each texture pixel (a.k.a. texel) the shader reads will map onto a whole number of screen pixels. That is to say, you will never encounter a situation where the source texels and the destination pixels aren’t perfectly aligned.
So let’s say you have a source texture which is 3x3 pixels and a destination texture that is 6x6 pixels. Nearest neighbor will always pick the correct pixel matching the source texture.
Nearest neighbor and non-integer multiples
This isn’t the case when you need to multiply the size of your sprite art by a non-integer multiple. When that happens, you run into situations where you get distortions in the resulting image.
When the scale factor isn’t an integer, destination pixel centers no longer line up evenly with texel boundaries, so nearest neighbor ends up duplicating some texels more often than others. This is difficult to understand intuitively, so pay close attention to the following graphic.
When presented in this way, the distortion is readily apparent. You can see how a clean 3x3 graphic has been transformed into a slightly distorted 7x7. Instead of the pattern being two blue, two red like in the 6x6 example, there is an alternating pattern of two blue, three red, two blue.
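The duplication pattern is easy to verify with a few lines of host-side code. This is purely an illustration of how nearest neighbor picks texels, not part of the render pipeline, and `nearestSourceTexel` is a name I made up for this sketch:

```cpp
#include <cmath>

// Map a destination pixel index back to the source texel that nearest
// neighbor picks when scaling srcSize up to dstSize along one axis.
// The sample point is the center of the destination pixel.
int nearestSourceTexel(int dstIndex, int srcSize, int dstSize) {
    float u = (dstIndex + 0.5f) / dstSize; // normalized 0..1 coordinate
    return (int)std::floor(u * srcSize);   // texel containing the sample
}
```

Scaling 3 texels up to 6 pixels yields the clean mapping 0, 0, 1, 1, 2, 2. Scaling 3 texels up to 7 pixels yields 0, 0, 1, 1, 1, 2, 2 — the middle texel gets three copies while its neighbors get two, which is exactly the distortion in the graphic above.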
What does nearest neighbor look like at scale?
It’s hard to predict exactly how nearest neighbor will make your pixel art look when scaled up by an arbitrary multiplier. Sometimes the end result is mostly okay, but usually if you look really closely, you can see some funny business like a pixel for a character’s eye being in the wrong place or somewhat incorrectly sized.
Nearest neighbor distortion is especially visible when you’re scaling somewhere in between the 1x to 2x range. There usually aren’t enough pixels at the destination image to prevent game character faces from looking kind of weird and grotesque.
You can get away with nearest neighbor when you’ve got a nice big Retina iMac, but you will certainly notice it on older, lower resolution screens.
Bilinear texture sampling
If you aren’t using nearest neighbor sampling in your fragment shader, it would seem that your only other option is bilinear sampling. I am guessing you have already tried this and have noticed it has taken your beautiful pixel art and rendered it blurry, like so.
Why the bilinear filter makes pixel art look blurry
When scaling pixel art from a small size to a larger size, you are effectively creating a space between the original texel values. Bilinear filtering fills in this space by blending the texels together on both the x and y axes. It’s no different from the typical gradient blends you are used to.
Close to the original texels, the resulting color is mostly the same as the texel nearest to it. But as you get to the midpoint between texels, the resulting color becomes a blend of both texels.
You can see why this would create a blurry result. Each intermediate pixel value is a constantly changing blend of the four neighboring texels (remember that bilinear filtering works on both x and y axes), and it never quite converges on a crisp and distinct pixel color like nearest neighbor does.
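To make the blend concrete, here is the standard bilinear formula on a single color channel. This is just an illustration of the math the GPU sampler performs for you, not Metal code:

```cpp
// Linear blend between two neighboring texel values; t is the fractional
// distance between their centers (0 = first texel, 1 = second).
float lerp(float a, float b, float t) {
    return a + (b - a) * t;
}

// Bilinear filtering: blend four neighboring texels, first along x,
// then along y, using the fractional sample position (tx, ty).
float bilerp(float c00, float c10, float c01, float c11, float tx, float ty) {
    return lerp(lerp(c00, c10, tx), lerp(c01, c11, tx), ty);
}
```

At the midpoint (tx = ty = 0.5), the result is the plain average of all four texels, which is precisely the muddy in-between color that makes scaled pixel art look blurry.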
Here’s the same 3x3 to 6x6 diagram from earlier, but this time the resulting pixels have gone through a bilinear blend. Behold how blurry it is!
What you actually want is a combination of the two
As other game developers have stated in their articles, you want a fragment shader that combines the best of nearest neighbor with the best of bilinear filtering. Ideally, it preserves the crisp pixelated quality of the original artwork while carefully blending along a thin edge to avoid the distortions that come with nearest neighbor.
Let’s try to visualize this.
The area inside of the black bordered squares will use the nearest neighbor filter while the spaces between the squares will use bilinear filtering to blend the edges and avoid the hard uneven distortions.
It doesn’t matter how much you scale pixel art when using this approach. The majority of the texel is rendered with nearest neighbor, and blending only occurs along the edges.
Implementing the custom pixel art shader in Metal
If you have spent some time playing around with fragment shaders in Metal (or most graphics libraries for that matter), you will likely have noticed you don’t have many options when it comes to texture sampling filters. Metal only offers bilinear and nearest neighbor.
You can spot the problem. If I just told you that neither of those will work, and Metal doesn’t offer us a third option, how can we implement the custom pixel art shader?
Getting bilinear to act like nearest neighbor for a small region
As it turns out, there is a way to get bilinear filtering to act like nearest neighbor. We just need to modify the texture coordinates to match what the nearest neighbor algorithm does. This is the approach Cláudio Santos takes in the Manual Texture Filtering For Pixelated Games in WebGL article.
If you snap the sample position to the center of a texel whenever the texture coordinate falls inside a region you define around that texel, the bilinear filter will “blend” that one texel with itself for every sampled coordinate in the region. The end result is effectively the same as nearest neighbor for that region.
This is basically UV coordinate hacking, where you tweak the input texture coordinates in such a way as to force the fragment shader to sample at the center of texels instead of sampling in the normal linear fashion.
Working out the math, starting with a specific example
Let’s say you have a source texture that is 16x16 pixels. Texture coordinates are typically UV mapped in Metal, which means the texture starts at zero in the x/y direction and ends at one. The centers of consecutive texels will then be 0.5/16, 1.5/16, 2.5/16, and so on, up to the end of the texture.
That’s nice, but it can get annoying to work with UV coordinates this way. Since it is implied that the width of the texture will be divided at each point, perhaps we can express each texel step more concisely by factoring the division out.
UV Position(x) = (texel position + 0.5)/texture width
UV Position(y) = (texel position + 0.5)/texture height
That way, texel positions can go from (0, 0) to (texture width, texture height). This mapping is a little more natural to reason about.
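The mapping above is easy to sanity check in plain code. This helper exists only for illustration:

```cpp
// UV coordinate of the center of texel i along one axis of a texture.
// Texel positions run from 0 to size - 1; the result runs from 0 to 1.
float texelCenterUV(int texel, int size) {
    return (texel + 0.5f) / size;
}
```

For the 16x16 example, texel 0 maps to 0.5/16 = 0.03125 and texel 15 maps to 15.5/16 = 0.96875, matching the pattern described above.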
Now that texel positions range from 0 to the texture width (and 0 to the texture height), we can start to think about the texture sampling behavior we might want to see as the shader program samples over the entire texture.
As it turns out, the specific texel position isn’t the important part. That is to say, it doesn’t matter if the texture coordinate we are sampling is close to the first or the sixteenth texel. On the order of each individual texel step, the behavior is the same.
What matters is the fractional portion of the uv coordinate after it is scaled by the width or height. To get the center of the first texel, we use texel zero (0) plus a fractional part of (0.5). To get the second, it’s (1) plus a fractional part of (0.5) yet again.
Important observation: at each texel step, the fractional portion of the uv coordinate after scaling by the width or height is 0.5
If we could control texture sampling in such a way as to always return 0.5 as the fractional portion of the scaled uv coordinate, we would get nearest neighbor filtering while having bilinear filtering enabled.
Let’s say X is the fractional portion of the scaled UV coordinate, and we are sampling over the region surrounding the first texel. X prime, then, is the modified fractional component, or the desired result if we want to use bilinear filtering to achieve a “nearest neighbor” like effect.
Adding bilinear filtering at the margins
Of course, we don’t simply want to re-create nearest neighbor. We still want to do some bilinear filtering, just at the margins of each texel. That means we don’t always want to return 0.5 as the fractional portion of the scaled uv coordinate. We want to start at zero, rise linearly to 0.5, stay constant through the middle, and then rise linearly up to 1.
It looks like this
Here the margins around the texel are represented with the value alpha. This value basically determines how much smoothing we want to do around the edges of each texel.
If alpha is zero, then the function turns back into a constant line, and we get nearest neighbor behavior.
If alpha is 0.5, the function will turn into a straight line with a slope of one, and we will get pure bilinear filtering behavior.
The math equations
The line in the graph above can be modeled as the sum of two clamped linear equations. It is effectively a piecewise equation expressed in terms of x and the alpha parameter.
The first part of the equation is given by
x’ = clamp(x/(2a), 0.0, 0.5)
The second part is given by
x’ = clamp((x - 1)/(2a) + 0.5, 0.0, 0.5)
The Metal shader will sum these equations together, resulting in
x’ = clamp(x/(2a), 0.0, 0.5) + clamp((x - 1)/(2a) + 0.5, 0.0, 0.5)
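Written out as a plain function, the math looks like this. It’s a C++ sketch of the equation before we move it into the shader, with a hand-rolled clamp so it doesn’t depend on anything else:

```cpp
// Clamp a value into [lo, hi]; same behavior as the shader clamp().
float clampf(float v, float lo, float hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// Modified fractional coordinate x'. x is the fractional part of the
// scaled uv coordinate and alpha is the half-width of the blended margin.
float sharpFraction(float x, float alpha) {
    return clampf(x / (2.0f * alpha), 0.0f, 0.5f)
         + clampf((x - 1.0f) / (2.0f * alpha) + 0.5f, 0.0f, 0.5f);
}
```

With alpha = 0.5 the function reduces to the identity (pure bilinear filtering), and with a small alpha it stays pinned at 0.5 across most of the texel (nearest neighbor behavior with thin blended edges), matching the two special cases described above.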
As others have noted in their articles, the alpha value that you pick depends on several factors, most important of which is how much you need to scale your pixel art.
The greater you magnify your source texture, the smaller the alpha value can be. That’s because you have more pixels to work with, meaning there’s less of a chance for nearest neighbor distortions like the ones I pointed out at the beginning of this article.
Similarly, if you don’t magnify your source texture all that much, you will need to use a slightly larger alpha value and therefore use a little more bilinear filtering, since you have fewer onscreen pixels to work with.
Again, I’m just trying to say the same things other authors have already said, just with a slightly different take. I would suggest reading their articles a few times and working through their math to get the idea to sink in. I needed to do that to properly understand what’s going on here.
Now for some Metal shader code
Before we write some shader code, I just want to point out that what you are about to see is my preferred way to do 2D games in Metal. Some of it may be unconventional, but I will walk you through it.
First of all, I like to make it so I can express coordinates in terms of the raw pixel values on the screen, and I prefer to use the bottom left corner as the origin point.
Metal doesn’t do this by default. The x-axis is left to right, but the y-axis is top to bottom, so I needed to make a few small adjustments.
Metal also uses normalized device coordinates by default, however when rendering 2D games, I don’t work with them directly.
I prefer to map the normalized device coordinates to the monitor screen rectangle, meaning a 100x100 rectangle at (0,0) will be 100 pixels wide and tall, starting at the bottom left corner of the screen and going up.
With that out of the way, let’s take a look at the RasterizerData struct we will be using
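Since I haven’t published the full source yet, here is a sketch of what that struct looks like. The first two member names are just my own convention; only vUv and the texture size are specific to this technique.

```metal
struct RasterizerData {
    float4 position [[position]];  // clip-space position for the rasterizer
    float2 textureCoordinate;      // standard uv in the 0..1 range
    float2 vUv;                    // uv scaled by the texture size
    float2 textureSize;            // width/height of the sampled texture
};
```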
The first two parameters are pretty standard in most Metal shaders that do 2D game graphics, but the third and fourth are special parameters we need to pass into the fragment shader.
vUv, or the scaled texture coordinates, can be calculated inside of the vertex shader. The texture size, on the other hand, needs to be passed into the vertex shader as an external parameter.
This confused me for quite some time, since the documentation on it is pretty murky. That said, I eventually figured out how to do it using MTLRenderCommandEncoder.
The code I’ve written is Objective-C, but you could just as well use Swift if you prefer.
The key thing to note is the usage of the setVertexBytes:length:atIndex: method on MTLRenderCommandEncoder. This is how you pass small chunks of parameter data into the vertex shader.
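As a sketch, passing the texture size might look like the following; `renderEncoder`, the hard-coded size, and the buffer index are placeholders for whatever your own pipeline uses.

```objc
// setVertexBytes:length:atIndex: copies a small blob of data (it must be
// under 4KB) straight into the shader's argument table, no MTLBuffer needed.
vector_float2 textureSize = { 16.0f, 16.0f }; // hypothetical atlas tile size
[renderEncoder setVertexBytes:&textureSize
                       length:sizeof(textureSize)
                      atIndex:2]; // must match [[buffer(2)]] in the shader
```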
Plenty of 2D games use 2D texture arrays or texture atlases, and mine are no exception. You usually know the texture width/height from the size you use in the atlas, and you can just hard code that in. Nothing fancy there.
If none of this is familiar to you, I would recommend taking a look at some of Apple’s demo projects to gain some understanding of how they set up rendering pipelines in Metal.
Apple’s examples are a little too OOP Kool-Aid for my tastes, but we don’t have time to go over that in this article and I haven’t yet posted the full source code for a working project that uses this shading technique.
The 2D scaling pixel art vertex shader
In Metal, the vertex shader is just a program to convert from your own coordinate system to normalized device coordinates and to pass in whatever other parameters the fragment shader might need to use.
Here is the vertex shader I use for 2D scaling pixel art
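Since the full listing isn’t published yet, here is a sketch of that vertex shader. The Vertex layout, function name, and buffer indices are my own assumptions about how the pipeline is set up; RasterizerData is the four-member struct described earlier.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical per-vertex input: pixel-space position plus standard uv.
struct Vertex {
    float2 position;
    float2 textureCoordinate;
};

vertex RasterizerData
pixelArtVertexShader(uint vertexID [[vertex_id]],
                     constant Vertex *vertices [[buffer(0)]],
                     constant float2 &viewportSize [[buffer(1)]],
                     constant float2 &textureSize [[buffer(2)]])
{
    RasterizerData out;

    // Bottom-left-origin pixel coordinates to normalized device coordinates.
    float2 pixelPosition = vertices[vertexID].position;
    out.position = float4(pixelPosition / (viewportSize / 2.0) - 1.0, 0.0, 1.0);

    out.textureCoordinate = vertices[vertexID].textureCoordinate;
    out.vUv = out.textureCoordinate * textureSize; // scaled uv for the filter
    out.textureSize = textureSize;                 // copied for the fragment shader
    return out;
}
```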
You can see that this shader performs the vUv calculation and then sets it on the out parameter. You also need to remember to copy the texture size parameter, since it will be used in the fragment shader.
The viewport size parameter is something I use when doing 2D games in Metal. You can use it to convert from screen coordinates to the normalized device coordinates the rasterizer expects.
The scaling pixel art fragment shader
Once you have prepared the data for the fragment shader, you can use it to perform the necessary calculations. I will walk you through those so you can get a clearer picture of what they are doing.
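Here’s a sketch of that fragment shader, adapted from the WebGL article’s technique. The hard-coded alpha and the sampler setup are my own choices, and in a real project you would likely pass alpha in as a uniform:

```metal
fragment float4
pixelArtFragmentShader(RasterizerData in [[stage_in]],
                       texture2d<float> colorTexture [[texture(0)]])
{
    // Bilinear filtering must be enabled; it does the blending at the margins.
    constexpr sampler textureSampler(mag_filter::linear, min_filter::linear);

    const float alpha = 0.1; // margin width; tune based on your scale factor

    // Split the scaled uv into its texel index and fractional part.
    float2 texel = floor(in.vUv);
    float2 x = fract(in.vUv);

    // x' = clamp(x/(2a), 0, 0.5) + clamp((x - 1)/(2a) + 0.5, 0, 0.5)
    float2 xPrime = clamp(x / (2.0 * alpha), 0.0, 0.5)
                  + clamp((x - 1.0) / (2.0 * alpha) + 0.5, 0.0, 0.5);

    // Convert back to 0..1 uv space and let the bilinear sampler finish.
    return colorTexture.sample(textureSampler, (texel + xPrime) / in.textureSize);
}
```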
That’s it. Aside from figuring out how to pass the texture size into the vertex shader as a parameter, it’s pretty much the same fragment shader used in the Manual Texture Filtering For Pixelated Games in WebGL article.
I am currently putting together an article series comprising some of the things I have learned from making my own 2D game that uses Metal. I’m calling it my Mac Game Engine series.
You can also check out the sample project on GitHub. It not only shows you all of the code for the 2D pixel art shader, it also shows you how to set up a rendering pipeline from scratch on macOS with minimal object oriented garbage and few external dependencies.
My 2D game Cove Kid uses this shader
Check it out. Cove Kid is a puzzle platformer with 25 unique challenges, a game where you solve puzzles by editing the level. If you like it, leave a comment.
I am currently revisiting the gameplay and improving the learning curve so more people can get into it. I’m also making the game less linear.
Thanks for reading, and if you found this useful be sure to like and subscribe.