After watching the amazing video from Sebastian Lague over on youtube, I decided to finally implement my own Rasterizer instead of "just" using OpenGL to better understand the fundamentals on how pixels come to live; and ofc so I might one day run Doom on a custom Kernel/OS without the need of reimplementing any graphiccard drivers directly!


First things first, I needed to decide on an language. Since I'm currently very much in love with Rust, the coice wasn't hard what I choose here :D

Of vectors and dots

After some very simple bootstraping (honestly just a cargo init), it was time to implement the first structure: a Vector. I decided to use f32 as the basis for all operations, as it should be faily widespread available for when I'm trying to integrate it into anything low-level:

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Float3 {
	pub x: f32,
    pub y: f32,
    pub z: f32,
}

Add some quick initializers (a new and also zero and one for convienience), and also a macro-based system to generate getter and setter for each component, with optional aliases to r, g and b so it also could act as an color, and you have a basic vector. Next would be ofc some basic operations, such as the dot product, and also the length (or magnitude), as well as some methods for vector normalization. The cherry on top is then some overlads thanks to rust's std::ops module, so Vectors could be multiplied and added together, as well as multiplied and devided by an single f32.

The color buffer

Next would come a basic color buffer! Or atleast that what it will be used for soon. Now it's only called Texture and it's job is to hold pixeldata. Let's get right to it:

pub struct Texture {
    width: u32,
    height: u32,
    data: *mut u8,
}

impl Drop for Texture {
    fn drop(&mut self) {
        let layout = unsafe { Layout::from_size_align_unchecked((self.width * self.height * 4) as usize, 1) };
        unsafe { alloc::alloc::dealloc(self.data, layout) };
    }
}

Note that we'l use an *mut u8 as the data here, which as a couple of reasons: First, since an Texture ideally never or only very rarely changes, the overhead of Vec<u8> is quite noticeable, espc for when we need to fill large areas in, which is most noteable when we try to "clear" the colorbuffer later as it needs to iterate over the whole buffer. Secondly, an u8 is used since we'll still need an way to display what we're actually using and I decided on using Raylib via the same-named crate for this task, and Raylib only accepts a &[u8] for texture updates. This is also the reason we allocate the data region as a multiple of 4, as raylib also wants RGBA textures.

At the start, the rasterizer will not support alpha colors, and I want it to be like OpenGL, so we'll be using our Float3 as an way of expressing an RGB color. For less memory fences on a write, we'll be writing each pixel as a packed 32bit value:

let rgba = color.into_packed_rgba(1.0);
let mut ptr = self.data as *mut u32;
unsafe { ptr = ptr.add( ((y * self.width) + x) as usize ) };
unsafe { ptr.write(rgba) };

Which ofc needs an new function on our vector to pack the color tightly into the values raylib expects. This method will convert the f32 from an range of 0.0 - 1.0 to the correct range of an 8bit value 0 - 255, using a lerp (also called linear interpolation). For simplicity we simply set the alpha channel to always be fully opacue (1.0 / 255). After that, we simply create a u32 from all four u8 via u32::from_ne_bytes.

Like mentioned above, we will need an method to clear the entire buffer in the future, so I'll also threw in a handy fill method that takes a Float3 as argument for which color to clear to / fill in as.

The Heart of it all

Now comes the trickiest part: Rasterization. For starters, I defined a simple Surface type that holds a texture for the color buffer (i.e. where we want to draw to):

pub struct Surface {
    texture: Texture,
}

There are a couple of methods out there how to do this, and I decided on using barycentric coordinates. Essentially, they are 3 values that represent if any given point (x,y,z) is inside the triangle or outside, which is determined by the sign of the value. I also has the added benefit that each of the 3 values recieved also express how "close" our chosen point is to any of the 3 points of a triangle by the values magnitude.

These properties stem from the fact that they represent the area of a smaller triangle between two points / an edge and the point we're searching for.

Calculating them is rather simple: all we need is an so-called "edge function":

pub fn edge_function(a: Float3, b: Float3, p: Float3) -> f32 {
    (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x)
}

All we need to do with it, is to calculcate it for each pair of points (also called an edge (shocking i know!!)), and some point to check; since we want to draw a triangle on an grid, we can simply create an point for every position in the grid:

const COLOR_RED: Float3 = Float3::new(1.0, 0.0, 0.0);

for y in 0..self.texture.height() {
    for x in 0..self.texture.width() {
    	let p = Float3::new(x as f32, y as f32, 0.0);
    	let u = edge_function(point1, point2, p);
        let v = edge_function(point2, point0, p);
        let w = edge_function(point0, point1, p);

        if (u >= 0.0 && v >= 0.0 && w >= 0.0) {
        	self.texture.set_pixel(x, y, COLOR_RED);
        }
    }
}

And voilá, we have an very basic, verrry crude way of rasterizing triangles!

For convinience, I optet to pass an custom struct as reference to the rendering method, which also could hold a color:

pub struct Triangle {
    pub pos: [vector::Float3; 3],
    pub color: vector::Float3,
}

Glue code

Whats left is some gluecode (to get Raylib up and running) and also an example. I figured a good way was to re-implement the first stresstest from the video of Sebastian, where he draw 250 triangles moving in random directions.

So I used the rand crate to generate a Vec<Triangle> with 250 elements with random coordinates (except for z, which was just a static value), and a random color. Additionally I also generate a same-sized vec of velocities for both the x and y axis in the range -5.0 .. 5.0.

The raylib code was also relatively easy, just create a texture from an image

let tex = {
	let image = Image::gen_image_color(width, height, Color::BLACK);
	rl.load_texture_from_image(&thread, &image).unwrap()
};

and in the main loop we first call our rasterizer, and then use the data as a slice to update the texture:

tex.update_texture(&pixel_data).unwrap();

Throw in some drawing code (and an FPS counter)...

let mut d = self.rl.begin_drawing(&self.thread);
d.clear_background(Color::WHITE);
d.draw_texture(&self.tex, 0, 0, Color::WHITE);
d.draw_fps(5, 5);

... and we're done! Let's run it :D

Well, atleast there are SOME triangles present :D But an FPS of 1 isn't going to cut it (in reality it's even less but the raylib code cannot handle anything less it seems). Running it in release mode changes this to around 5 FPS consistently. So let's start optimizing shall we?

Optimizing it

Backfaces

The first check we can do is for backfaces; our rasterizer currently only works correctly if the points of a triangle are in a clockwise order. Try for example these both triangles:

let mut tri_cw = Triangle::new();
tri_cw.pos[0] = Float3::new(  20.0, 700.0, 1.0); // Bottom Left
tri_cw.pos[1] = Float3::new( 640.0,  20.0, 1.0); // Top
tri_cw.pos[2] = Float3::new(1260.0, 700.0, 1.0); // Bottom Right
tri_cw.color = Float3::new(1.0, 0.0, 0.0);

let mut tri_ccw = Triangle::new();
tri_ccw.pos[0] = Float3::new(1260.0, 700.0, 1.0); // Bottom Right
tri_ccw.pos[1] = Float3::new( 640.0,  20.0, 1.0); // Top
tri_ccw.pos[2] = Float3::new(  20.0, 700.0, 1.0); // Bottom Left
tri_ccw.color = Float3::new(1.0, 0.0, 0.0);

You will notice that the counter-clockwise (tri_ccw) will not render at all! To fix this we can utilize another property about our edge function: it can also use it with all three points to detect if we're clockwise or not:

let area = edge_function(tri.pos[0], tri.pos[1], tri.pos[2]);
if area <= 0.0 {
	// Is counter-clockwise so we cull the backface.
	return;
}

Note: you can do the same check in your triangle generation, and just reverse the position array if they are counter-clockwise!

Ofc this dosnt help us for 250 front faceing triangles, so lets continue!

Bounding boxes

We are currently iterating over the whole pixelgrid for each triangle we want to render; Thats 921.600 unique positions! Thats a lot, espc if you consider that we're doing it 250 times for each frame.

To help with this we can create a bounding box (also sometimes called AABB) around each triangle to only ever get the range of posible coordinates the triangle will be inside of:

pub struct AABB {
    pub min_x: f32,
    pub max_x: f32,
    pub min_y: f32,
    pub max_y: f32,
}

To fill it in, we simply use the combined minimum and maximum of the x and y coordinate for each of the 3 points of a triangle: min_x: f32::min(self.pos[0].x, f32::min(self.pos[1].x, self.pos[2].x)),. We can then change our for headers to use aabb.min_y as u32..aabb.max_y as u32 (and similarily for x) to massively reduce the amount of iterations done.

A keen eye will have spotted now a new problem we introduced with this change: If we just would let it run, it would crash when the first triangle reaches the border of the window. Which makes sense: the AABB just makes a box around the our coodrinates, which can be either negative or above the size of our color buffer! The solution is to add clamping (also called clipping) to our AABB to restrict all values:

  • min_x and max_x to the range 0.0 .. width and,
  • min_y and max_y to 0.0 .. height

AABB

Woha! 5 FPS in debug, and a freacking 100 in release!! But we can push it just a little bit further...

Incremental edge-functions

The most costly calculation we're doing right now is the edge function, it's fairly complex to calculcate on each pixel. Even if it's now only a few hundred times per triangle.

Let's look at it again:

(b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x)
If we look closely, the only thing changing in each iteration is p.y and p.x! Everything else stays the same. We can utilize this to calculate a base value and only updating the parts on each loop that actually changed.

To do this we first pick the lowest point to calculate all three coordinates as usual; for this we'll use the min_x and min_y in our AABB:

let minp = Float3::new(aabb.min_x, aabb.min_y, 0.0);
let mut e1 = edge_function(tri.pos[1], tri.pos[2], minp);
let mut e2 = edge_function(tri.pos[2], tri.pos[0], minp);
let mut e3 = edge_function(tri.pos[0], tri.pos[1], minp);

We then calculate the "steps" we need to take on each iteration:

let A01 = tri.pos[0].y - tri.pos[1].y; let B01 = tri.pos[1].x - tri.pos[0].x;
let A12 = tri.pos[1].y - tri.pos[2].y; let B12 = tri.pos[2].x - tri.pos[1].x;
let A20 = tri.pos[2].y - tri.pos[0].y; let B20 = tri.pos[0].x - tri.pos[2].x;

Disclaimer: I got these from another great article here, since when I tried it din't worked for some reason.

We now need to redesign our loop quite drastically:

for y in ... {
    let mut e1_row = e1;
    let mut e2_row = e2;
    let mut e3_row = e3;
    for x in ... {
        if e1_row >= 0.0 && e2_row >= 0.0 && e3_row >= 0.0 {
            ...
        }
        e1_row += A12;
        e2_row += A20;
        e3_row += A01;
    }
    e1 += B12;
    e2 += B20;
    e3 += B01;
}

If we run that now, we're looking at 10FPS debug, and something like 170 FPS release (you might need to change the velocity here; a range of -1.0 .. 1.0 makes it a lot better).

Conclusion & Goals

Rasterization is hard but not impossible. It took me personally just 1-2 days to get this to work, and that even with some basic shadeing in form of color interpolation! (stay tuned for the next articel in this series to see that)

The next steps include talking about a lot of things, like vertex attribute interpolation, the depth buffer, how we can interpolate the Z coordinate, perspective correction and so on. And ofc about how we can write an OpenGL compatible API for it so applications can use "consume" it.