Ever wondered how you can simulate a physical camera in a 3D web app? In this blog post, I’ll show you how it can be done using Three.js, a popular JavaScript library for building 3D web applications, and OpenCV. We’ll start by simulating the pinhole camera model and then we’ll add realistic lens distortion. Specifically, we’re going to take a close look at the two distortion models of OpenCV and replicate them using post-processing shaders.

Having a realistic simulated camera allows you to render a 3D scene on top of an image captured by a real camera. This can for example be used for augmented reality, but also for robotics and autonomous vehicles. That’s because robots and autonomous vehicles often have a combination of 3D sensors (like lidars) and cameras, and visualizing the 3D data on top of the camera images is important to verify the sensor calibration. It can also be very helpful when creating and reviewing 3D annotations, which is why I’ve tackled this issue at Segments.ai.

To test our camera simulation, we’ll use a frame from the nuScenes dataset, placing a 3D point cloud captured by a lidar on top of a camera image. Whether you’re working in robotics/AV, developing visualization tools, working on an AR application, or just interested in computer vision and 3D graphics, this guide will hopefully teach you something new. So let’s get started!

## The Pinhole Camera Model

To replicate a camera in 3D, we first need a way of mathematically representing a camera, i.e., a camera model. Fundamentally, a camera maps 3D world points to a 2D image plane. We’re thus looking for a function that takes in a 3D point $[x\ y\ z]$ and outputs a 2D point $[u\ v]$ (usually defined in pixel coordinates).

The simplest camera model is the pinhole camera model. A pinhole camera does not have a lens; light simply enters through a single point (the “pinhole”) and forms an image on the image plane. This type of camera – also known as a *camera obscura* – has been constructed for thousands of years (chances are you’ve made one yourself as a child).

The pinhole model can be represented mathematically as a simple linear transformation if we use homogeneous coordinates. This transformation can be written as a 3 x 4 matrix called the camera matrix $M$. Usually, we split this matrix up into two matrices: a 3 x 3 intrinsic camera matrix $K$ and a 3 x 4 extrinsic matrix $[R\ T]$. The camera pose, i.e. its position and rotation in the world, is encoded in the extrinsic matrix. The intrinsic matrix contains the focal length, pixel size, and image origin of the camera.

$M = K [R\ T] = \begin{bmatrix} f_x & s & o_x & 0 \\ 0 & f_y & o_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}\begin{bmatrix}R_{3\times 3}&T_{3\times 1}\\0_{1\times 3}&1\end{bmatrix}$

- $f_x$ and $f_y$ are the focal lengths in pixels ($f_x = f_y$ for square pixels)
- $s$ represents the skew coefficient between the x and the y axis, and is often 0
- $o_x$ and $o_y$ are the (absolute) offsets of the principal point from the top-left corner of the image frame (in pixels)
- $[R\ T]$ is the transformation from world coordinates to camera coordinates. $R$ is a rotation matrix, and $T$ is a translation vector.
- Because we’re working in homogeneous coordinates, we add an extra column of zeros to $K$ and a row of zeros ending with a one to the $[R\ T]$ matrix.
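
To make the mapping concrete, here is a minimal TypeScript sketch of the pinhole projection described above. The type names, the `projectPoint` helper, and the calibration values are assumptions made for illustration; they are not the calibration of the example image.

```typescript
type Vec3 = [number, number, number];
type Mat3 = [Vec3, Vec3, Vec3];

interface Intrinsics {
  fx: number; // focal length along x, in pixels
  fy: number; // focal length along y, in pixels
  s: number; // skew coefficient
  ox: number; // principal point x, in pixels
  oy: number; // principal point y, in pixels
}

/** Projects a world point to pixel coordinates; returns null if it is not in front of the camera. */
function projectPoint(p: Vec3, K: Intrinsics, R: Mat3, T: Vec3): [number, number] | null {
  // Extrinsic part: world coordinates -> camera coordinates.
  const xc = R[0][0] * p[0] + R[0][1] * p[1] + R[0][2] * p[2] + T[0];
  const yc = R[1][0] * p[0] + R[1][1] * p[1] + R[1][2] * p[2] + T[1];
  const zc = R[2][0] * p[0] + R[2][1] * p[1] + R[2][2] * p[2] + T[2];
  if (zc <= 0) return null;

  // Intrinsic part, followed by the perspective divide.
  const u = (K.fx * xc + K.s * yc) / zc + K.ox;
  const v = (K.fy * yc) / zc + K.oy;
  return [u, v];
}

// Hypothetical calibration: identity pose, 1600 x 900 image.
const exampleK: Intrinsics = { fx: 1000, fy: 1000, s: 0, ox: 800, oy: 450 };
const identityR: Mat3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]];
const zeroT: Vec3 = [0, 0, 0];
const pixel = projectPoint([1, 0.5, 10], exampleK, identityR, zeroT);
console.log(pixel); // u = 1000 * 1 / 10 + 800 = 900, v = 1000 * 0.5 / 10 + 450 = 500
```

The divide by `zc` is the step that makes the projection perspective: points twice as far away land half as far from the principal point.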

The intrinsic and extrinsic parameters can be estimated by a process known as camera calibration. This typically involves capturing images of a known calibration pattern (e.g. a checkerboard) from different viewpoints. OpenCV includes functions that estimate the intrinsic and extrinsic camera parameters, as well as distortion coefficients (more on those later). Check out this OpenCV tutorial to learn how to calibrate a camera using a checkerboard pattern.

For this example image, the calibration parameters are:

## The Pinhole Camera Model in Three.js

After calibrating the camera, we can now simulate the camera in the browser. Browsers have two major APIs for efficiently rendering 3D content: WebGL and the newer WebGPU. However, these APIs are very low-level, so instead of using them directly, we’ll be using the popular Three.js library.

We’ll start by creating a web page with the image and our 3D app overlaid on it:

Next, we’ll create the *index.ts* file, where we’ll set up a basic Three.js scene with the camera we’ll be making and a renderer. By setting the *alpha* value of the renderer to *true*, we can see the image under the 3D scene.

We’ll use the *PCDLoader* from Three.js to load the point cloud. When it’s loaded, we’ll give it a color and add it to the scene.

To create our pinhole camera, we’ll start by creating a new class that extends the *PerspectiveCamera* class from Three.js:

When we call the constructor of *PerspectiveCamera*, we have to pass in a field of view (FOV) value. This value is used by Three.js in the *updateProjectionMatrix* method, but we’ll override this method and use the focal length from the intrinsic matrix instead, so the initial FOV won’t be used.

### Setting the Extrinsic Camera Parameters

We can set the camera pose (position + orientation) based on the extrinsic camera parameters as follows:

Note that we have to invert the extrinsic matrix before using it to set the camera position and heading. This is because $[R\ T]$ represents a transformation from world to camera, and we need the transformation from camera to world (which is equivalent to the position/heading of the camera in world coordinates).
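
Because $R$ is a rotation matrix, this inverse has a simple closed form: the rotation becomes $R^T$ and the camera position becomes $-R^T T$. The sketch below shows that computation on plain arrays. The `invertExtrinsic` helper and the example values are hypothetical; in Three.js, the same result can be obtained by inverting a `Matrix4`.

```typescript
type Vec3 = [number, number, number];
type Mat3 = [Vec3, Vec3, Vec3];

/** Inverts a rigid world-to-camera transform [R T]; the inverse is [R^T, -R^T T]. */
function invertExtrinsic(R: Mat3, T: Vec3): { rotation: Mat3; position: Vec3 } {
  // For a rotation matrix, the inverse is the transpose.
  const rotation: Mat3 = [
    [R[0][0], R[1][0], R[2][0]],
    [R[0][1], R[1][1], R[2][1]],
    [R[0][2], R[1][2], R[2][2]],
  ];
  // Camera position in world coordinates: -R^T T.
  const position: Vec3 = [
    -(rotation[0][0] * T[0] + rotation[0][1] * T[1] + rotation[0][2] * T[2]),
    -(rotation[1][0] * T[0] + rotation[1][1] * T[1] + rotation[1][2] * T[2]),
    -(rotation[2][0] * T[0] + rotation[2][1] * T[1] + rotation[2][2] * T[2]),
  ];
  return { rotation, position };
}

// Hypothetical extrinsics: a 90 degree rotation about z plus a translation.
const worldToCameraR: Mat3 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]];
const worldToCameraT: Vec3 = [1, 2, 3];
const cameraPose = invertExtrinsic(worldToCameraR, worldToCameraT);
console.log(cameraPose.position); // camera position in world coordinates
```

A quick sanity check is that the world-to-camera transform maps the returned position to the origin of the camera frame, i.e. $R\,p + T = 0$.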

### Setting the Intrinsic Camera Parameters

Setting the intrinsic camera parameters is a bit more complicated. Three.js doesn’t use the same intrinsic matrix as we obtained during our camera calibration. Instead, it uses the same matrices as WebGL, and our intrinsic matrix roughly corresponds to the “projection matrix” in WebGL. Luckily for us, Kyle Simek wrote a blog post explaining how to turn an intrinsic matrix into a valid projection matrix.

We’ll use the *glOrtho* method described in the blog post to obtain the projection matrix. However, we don’t have direct access to OpenGL functions, so we have to reimplement *glOrtho* in the *makeNdcMatrix* function. For the *makePerspectiveMatrix* method, we’ll also make a small change: we do not have to negate the third column of the intrinsic matrix, as the camera looks down the positive z-axis in OpenCV.
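
The following is a sketch of what these two functions could look like, written on plain row-major arrays rather than Three.js matrices. The function names come from the text above, but the signatures, the `transform` and `projectToNdc` helpers, and the example intrinsics are assumptions. The sign of the depth row is my reading of "not negating the third column", chosen so that the near and far planes map to -1 and 1.

```typescript
type Vec4 = [number, number, number, number];
type Mat4 = [Vec4, Vec4, Vec4, Vec4]; // row-major

/** Reimplementation of glOrtho: maps the given box to normalized device coordinates. */
function makeNdcMatrix(left: number, right: number, bottom: number, top: number, near: number, far: number): Mat4 {
  const tx = -(right + left) / (right - left);
  const ty = -(top + bottom) / (top - bottom);
  const tz = -(far + near) / (far - near);
  return [
    [2 / (right - left), 0, 0, tx],
    [0, 2 / (top - bottom), 0, ty],
    [0, 0, -2 / (far - near), tz],
    [0, 0, 0, 1],
  ];
}

/** Intrinsic matrix plus a depth row; the third column is not negated (camera looks down +z). */
function makePerspectiveMatrix(s: number, alpha: number, beta: number, x0: number, y0: number, near: number, far: number): Mat4 {
  const A = near + far;
  const B = near * far;
  return [
    [alpha, s, x0, 0],
    [0, beta, y0, 0],
    [0, 0, -A, B],
    [0, 0, 1, 0],
  ];
}

/** Applies a 4x4 matrix to a homogeneous vector. */
function transform(m: Mat4, v: Vec4): Vec4 {
  const dot = (r: Vec4): number => r[0] * v[0] + r[1] * v[1] + r[2] * v[2] + r[3] * v[3];
  return [dot(m[0]), dot(m[1]), dot(m[2]), dot(m[3])];
}

/** Projects a camera-space point (+z forward) to normalized device coordinates. */
function projectToNdc(p: [number, number, number], persp: Mat4, ndc: Mat4): [number, number, number] {
  // Equivalent to applying the combined projection matrix ndc * persp.
  const clip = transform(ndc, transform(persp, [p[0], p[1], p[2], 1]));
  return [clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]];
}

// Hypothetical intrinsics for a 1600 x 900 image.
const imageWidth = 1600, imageHeight = 900, nearPlane = 0.1, farPlane = 100;
const perspMatrix = makePerspectiveMatrix(0, 1000, 1000, 800, 450, nearPlane, farPlane);
// Image y grows downward, so bottom = height and top = 0.
const ndcMatrix = makeNdcMatrix(0, imageWidth, imageHeight, 0, nearPlane, farPlane);
console.log(projectToNdc([0, 0, 10], perspMatrix, ndcMatrix)); // x and y are close to 0, the image center
```

With these conventions, a point that projects to pixel (0, 0) lands at NDC (-1, 1), the top-left corner of the viewport.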

Now we can override the *updateProjectionMatrix* method of the *PerspectiveCamera* class.

The *relAspect* is necessary to account for the difference in aspect ratio between the original camera image and the browser window.

Bringing it all together, we can see the point cloud overlaid on the camera image.

## Lens distortion

Most camera lenses distort the image (special rectilinear lenses excepted). The distortion can be especially strong when working with fisheye cameras. The simulated pinhole camera does not take this lens distortion into account, so if you used it on images straight from a camera, the point cloud would not line up perfectly with the image. The images in the nuScenes dataset are rectified, i.e. the lens distortion has been removed, which is why the point cloud did line up with the image in the previous section.

You could follow the approach of nuScenes by estimating the lens distortion during camera calibration (e.g. following the OpenCV tutorial mentioned earlier) and then undistorting the image using the distortion coefficients. However, when undistorting a fisheye image, a large part of the image is discarded. Thus, in this section, we’ll show how to simulate lens distortion with the distortion coefficients in Three.js. This way, we can overlay the 3D scene directly on a distorted camera image.

### Lens distortion using shaders

Before we can start writing code, we first need to know how distortion models work and how we can implement them using shaders. In the OpenCV documentation, we can find multiple distortion models. The default camera model uses the following distortion coefficients:

- $k_1, \dots, k_6$ for radial distortion
- $p_1, p_2$ for tangential distortion
- $s_1, \dots, s_4$ for thin prism distortion
- $\tau_x, \tau_y$ to account for tilted image sensors

The lens model with just the $k_1, k_2, p_1, p_2, k_3$ distortion coefficients is called the Brown-Conrady or “plumb bob” model, after papers by Brown (1966) and Conrady (1919). This is the most popular distortion model, and it is the first kind of distortion we’ll be replicating in Three.js.
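
For reference, the forward Brown-Conrady mapping from the OpenCV docs (undistorted to distorted) can be sketched in TypeScript as below. It works on normalized image coordinates, i.e. $x = X/Z$ and $y = Y/Z$ in the camera frame. The interface, the function name, and the coefficient values are assumptions for illustration.

```typescript
interface BrownConradyCoefficients {
  k1: number;
  k2: number;
  p1: number;
  p2: number;
  k3: number;
}

/** Maps undistorted normalized coordinates (x, y) to distorted normalized coordinates. */
function distortBrownConrady(x: number, y: number, c: BrownConradyCoefficients): [number, number] {
  const r2 = x * x + y * y;
  // Radial term: scales the point toward or away from the principal point.
  const radial = 1 + c.k1 * r2 + c.k2 * r2 * r2 + c.k3 * r2 * r2 * r2;
  // Tangential terms: account for a lens that is not parallel to the sensor.
  const xDistorted = x * radial + 2 * c.p1 * x * y + c.p2 * (r2 + 2 * x * x);
  const yDistorted = y * radial + c.p1 * (r2 + 2 * y * y) + 2 * c.p2 * x * y;
  return [xDistorted, yDistorted];
}

// Hypothetical coefficients: mild barrel distortion only.
const barrel: BrownConradyCoefficients = { k1: -0.1, k2: 0, p1: 0, p2: 0, k3: 0 };
const distortedPoint = distortBrownConrady(0.5, 0, barrel);
console.log(distortedPoint); // x shrinks by the factor 1 - 0.1 * 0.25 = 0.975
```

To get pixel coordinates, the distorted normalized coordinates are multiplied by the focal lengths and offset by the principal point, exactly as in the pinhole model.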

The second distortion model we’ll replicate is the fisheye model described on this page of the OpenCV docs. This model is based on the Kannala-Brandt model, which can model wide-angle lenses better than the Brown-Conrady model. The fisheye camera model has four distortion coefficients: $k_1, \dots, k_4$.

To implement lens distortion in Three.js, we’ll write a post-processing shader in GLSL (OpenGL Shading Language). A shader is a function that’s run in parallel for every vertex (= vertex shader) or every pixel (= fragment shader) when rendering a scene. This parallel execution happens on the GPU, which is specifically designed for this sort of computation. Normally, different shaders are used to render objects with different materials in the 3D scene. For our use case, we want to apply the lens distortion shader to the whole rendered 3D scene in a post-processing step.

To emulate lens distortion, we could use either a vertex shader or a fragment shader. The advantage of using a vertex shader is that we can directly use the distortion formulas to determine where each vertex should end up in the distorted image. The downside is that edges between vertices remain straight, while in real life lens distortion would curve them. If you’re working with high-resolution 3D models where each edge is very short, this might not be a problem. If you just want to overlay point clouds on the camera image, this approach also works great (as there are no edges). The following table taken from “Realistic Lens Distortion Rendering” by Lambers et al. contains some further pros and cons:

| | Vertex shader | Fragment shader |
|---|---|---|
| Distortion model completeness | full | limited to radial and tangential |
| Prerequisites | finely detailed geometry | none |
| Result completeness | full | may have unfilled areas |
| Rendered data types | all | limited to interpolatable relocatable data |
| Complexity | geometry-dependent | resolution-dependent |

In this tutorial, we’ll be using a fragment (or pixel) shader to emulate lens distortion. The advantage of this approach is that it works regardless of what’s in the 3D scene. We can also overcome the problem of unfilled areas by zooming out the pinhole camera and zooming back in inside the shader (see zoomForDistortionFactor later).

Using a fragment shader does make implementing the shader a bit more complex, as we can’t directly use the formulas from the OpenCV docs. To see why, you can imagine applying the shader as looping over an empty image and filling each pixel with a certain color, like in this pseudo-code:
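
A minimal TypeScript sketch of that loop is shown here, using the names `i`, `j`, and `renderedImage` that the rest of this post refers to. The types, the `applyShader` helper, and the mirroring stand-in shader are assumptions for illustration; on the GPU, the loop body runs in parallel for every pixel.

```typescript
type Rgba = [number, number, number, number];
type PixelGrid = Rgba[][]; // grid[j][i]: row j, column i

type FragmentShader = (i: number, j: number, renderedImage: PixelGrid) => Rgba;

/** Fills an empty output image by calling the shader once per output pixel. */
function applyShader(renderedImage: PixelGrid, fragmentShader: FragmentShader): PixelGrid {
  const outputImage: PixelGrid = [];
  for (let j = 0; j < renderedImage.length; j++) {
    const outputRow: Rgba[] = [];
    for (let i = 0; i < renderedImage[j].length; i++) {
      // The shader only decides the color of output pixel (i, j).
      outputRow.push(fragmentShader(i, j, renderedImage));
    }
    outputImage.push(outputRow);
  }
  return outputImage;
}

// Stand-in shader: each output pixel copies the color of a different input pixel.
// A distortion shader has the same shape, with a different source location.
const mirrorShader: FragmentShader = (i, j, renderedImage) => {
  const row = renderedImage[j];
  return row[row.length - 1 - i];
};

const redPixel: Rgba = [255, 0, 0, 255];
const bluePixel: Rgba = [0, 0, 255, 255];
const mirroredDemo = applyShader([[redPixel, bluePixel]], mirrorShader);
console.log(mirroredDemo); // the red and blue pixels have swapped places
```

Note that the shader never writes to another pixel's position; it can only choose where to read from.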

The purpose of the fragment shader function is thus to output the color of a single pixel, given the previously rendered image as input. For lens distortion, the previously rendered image is the undistorted 3D scene (i.e. the render we obtained in the Pinhole Camera section), and the output image should be the distorted 3D scene. Thus, for every pixel in the output image, we have to figure out which pixel from the input image ends up there and copy its color. That is, given output coordinates $i$ and $j$, we want to find the undistorted coordinates $i'$ and $j'$ and take the color at these undistorted coordinates. This is the inverse of the formulas on the OpenCV page, which map undistorted coordinates to distorted coordinates.

Now we’re ready to write the actual GLSL shaders for the two distortion models introduced earlier. I won’t go over all the details of GLSL. If you’ve never written a shader before, you might want to check out this blog post by Maxime Heckel before continuing, so you’ll have no trouble understanding the code.

### Brown-Conrady (plumb bob) distortion

As explained in the previous section, we need to find a way to calculate the undistorted coordinates in the shader. For the Brown-Conrady model, we can use formula 2 from the “Realistic Lens Distortion Rendering” paper. This formula is only an approximation and it does not use the $k_3$ distortion coefficient. If you’re interested in a more precise camera simulation, you can use the technique in the next section on fisheye distortion.

A couple of notes about the shader code:

- The vUv vector contains the output image coordinates corresponding to i and j in the pseudo-code. The tDiffuse texture corresponds to the renderedImage in the pseudo-code and is automatically set by Three.js.
- We again need the relAspect to account for the difference in aspect ratio between the camera image and the browser window, as we do not want our lens distortion to be stretched.
- Shaders work with normalized coordinates called “UV coordinates”. However, the distortion formulas work with pixel coordinates, so we need to multiply the coordinates by the image width and height, and divide again at the end.
- We need to account for the uZoomForDistortionFactor at the end (which is used to avoid unfilled areas in the distorted image).
- The texture2D function is used to look up the color at the undistorted coordinates in the (undistorted) input image.
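
If you want to check the output of the approximate inverse on the CPU, one option is a different technique: a fixed-point iteration on the forward formula, similar in spirit to what OpenCV does when undistorting points. This is not formula 2 from the paper and not the shader code of this post. The sketch below works on normalized coordinates; all names and coefficient values are hypothetical.

```typescript
interface BrownConradyCoefficients {
  k1: number;
  k2: number;
  p1: number;
  p2: number;
  k3: number;
}

/** Forward model (undistorted -> distorted), used here only to check the round trip. */
function distort(x: number, y: number, c: BrownConradyCoefficients): [number, number] {
  const r2 = x * x + y * y;
  const radial = 1 + c.k1 * r2 + c.k2 * r2 * r2 + c.k3 * r2 * r2 * r2;
  return [
    x * radial + 2 * c.p1 * x * y + c.p2 * (r2 + 2 * x * x),
    y * radial + c.p1 * (r2 + 2 * y * y) + 2 * c.p2 * x * y,
  ];
}

/** Inverse model (distorted -> undistorted) by fixed-point iteration. */
function undistortIterative(xd: number, yd: number, c: BrownConradyCoefficients, iterations = 20): [number, number] {
  // Start from the distorted point and repeatedly remove the estimated distortion.
  let x = xd;
  let y = yd;
  for (let n = 0; n < iterations; n++) {
    const r2 = x * x + y * y;
    const radial = 1 + c.k1 * r2 + c.k2 * r2 * r2 + c.k3 * r2 * r2 * r2;
    const dx = 2 * c.p1 * x * y + c.p2 * (r2 + 2 * x * x);
    const dy = c.p1 * (r2 + 2 * y * y) + 2 * c.p2 * x * y;
    x = (xd - dx) / radial;
    y = (yd - dy) / radial;
  }
  return [x, y];
}

// Hypothetical coefficients and point.
const lensCoefficients: BrownConradyCoefficients = { k1: -0.1, k2: 0.01, p1: 0.001, p2: -0.0005, k3: 0 };
const distortedSample = distort(0.3, -0.2, lensCoefficients);
const recoveredSample = undistortIterative(distortedSample[0], distortedSample[1], lensCoefficients);
console.log(recoveredSample); // close to the original point (0.3, -0.2)
```

The iteration converges quickly for mild distortion, but it is a poor fit for a fragment shader, where a closed-form approximation or a lookup table is cheaper per pixel.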

### Fisheye (Kannala-Brandt) distortion

For fisheye distortion, there is no inverse function that we can evaluate in the shader. Instead, we’ll use a lookup table (LUT). A LUT is a matrix where we can store some pre-calculated values. We’ll store the undistorted coordinates in the LUT. In the shader, we simply have to “look up” the undistorted coordinates using the distorted coordinates as the index.

Hold on, how does that solve our problem? How can we calculate the values of the LUT if we have no inverse distortion formula? The trick is to use the normal distortion formula that maps undistorted points to distorted points. Here’s how we’ll do it:

- Loop over the undistorted image pixels.
- For each pixel, calculate the distorted coordinates using the formula from the OpenCV docs.
- Save the undistorted coordinates in the LUT at the distorted coordinates.

Here’s the code:
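
As a simplified illustration of the three steps above, here is a TypeScript sketch that fills a LUT with the forward fisheye formula from the OpenCV docs. It is not the implementation used in this post: it ignores the zoom factor, stores the undistorted coordinates as UV values in a flat `Float32Array`, and uses hypothetical names and parameters (`buildUndistortLut`, `samplesPerCell`, the example intrinsics).

```typescript
interface FisheyeCoefficients { k1: number; k2: number; k3: number; k4: number; }
interface PinholeIntrinsics { fx: number; fy: number; cx: number; cy: number; }

/** Forward fisheye distortion on normalized coordinates (undistorted -> distorted). */
function distortFisheye(x: number, y: number, c: FisheyeCoefficients): [number, number] {
  const r = Math.hypot(x, y);
  if (r < 1e-8) return [x, y]; // a point on the optical axis does not move
  const theta = Math.atan(r);
  const t2 = theta * theta;
  const thetaD = theta * (1 + c.k1 * t2 + c.k2 * t2 * t2 + c.k3 * t2 * t2 * t2 + c.k4 * t2 * t2 * t2 * t2);
  return [(x * thetaD) / r, (y * thetaD) / r];
}

/** Builds a lutSize x lutSize table with two floats (undistorted u, v as UV) per cell. */
function buildUndistortLut(
  K: PinholeIntrinsics, c: FisheyeCoefficients, width: number, height: number,
  lutSize = 256, sampleDomainExtension = 0.3, samplesPerCell = 2,
): Float32Array {
  const lut = new Float32Array(lutSize * lutSize * 2); // zero-initialized: 0 marks "not filled"
  const steps = lutSize * samplesPerCell;
  const span = 1 + 2 * sampleDomainExtension;
  for (let sj = 0; sj < steps; sj++) {
    for (let si = 0; si < steps; si++) {
      // 1. Undistorted pixel, sampled on a domain that extends past the image borders.
      const u = (-sampleDomainExtension + ((si + 0.5) / steps) * span) * width;
      const v = (-sampleDomainExtension + ((sj + 0.5) / steps) * span) * height;
      // 2. Distort it with the forward formula.
      const [xd, yd] = distortFisheye((u - K.cx) / K.fx, (v - K.cy) / K.fy, c);
      const ud = K.fx * xd + K.cx;
      const vd = K.fy * yd + K.cy;
      if (ud < 0 || ud >= width || vd < 0 || vd >= height) continue;
      // 3. Store the undistorted coordinates at the distorted location.
      const col = Math.min(lutSize - 1, Math.floor((ud / width) * lutSize));
      const row = Math.min(lutSize - 1, Math.floor((vd / height) * lutSize));
      const cell = row * lutSize + col;
      lut[2 * cell] = u / width;
      lut[2 * cell + 1] = v / height;
    }
  }
  return lut;
}

// Hypothetical 640 x 480 camera with all coefficients set to zero, and a small 64 x 64 LUT.
const fisheyeLut = buildUndistortLut(
  { fx: 400, fy: 400, cx: 320, cy: 240 }, { k1: 0, k2: 0, k3: 0, k4: 0 }, 640, 480, 64,
);
console.log(fisheyeLut.length); // 64 * 64 * 2 = 8192
```

Sampling more densely than the LUT resolution (`samplesPerCell`) reduces the number of cells that stay empty, at the cost of more computation when the table is built.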

More code notes:

- We do not create a LUT as big as the image, but we use a matrix of 256×256 instead. Increasing the LUT size will improve the accuracy of the distortion simulation, but will also increase the computation time and memory usage.
- We have to take into account the zoom again.
- We extend the sample domain outside of the image dimensions (sampleDomainExtension) as undistorted points outside of the image can still end up in the distorted image bounds.
- We use a DataTexture to pass the LUT to the shader. This will also give us interpolation for free in the shader.

This function is adapted from the initUndistortRectifyMap method in OpenCV. The source code is available here. Note that the inverted intrinsic matrix is not simply the inverse of the intrinsic matrix. This is because the former also has to take the zoomForDistortionFactor and a principal point offset into account. We compute this adjusted inverse intrinsic matrix once, since it stays the same throughout the loop.

Two small notes:

- The pixels on the edge of the distorted image will be repeated across the margins left/right or above/below of the image. To avoid streaking effects, we set these border pixels to a black overlay with an opacity of 40%.
- If the value in the LUT is zero, it means it probably wasn’t filled in, so we ignore these pixels and simply return a black overlay.

### Implementing a post-processing shader in Three.js

Now that we have our shaders, it’s time to use them in a post-processing pass. To use the pass, we first render the scene to a “render target” (a buffer), then we apply the pass to this render target, and finally, we render the result to the screen.

To set up this pipeline, we’ll use the EffectComposer from Three.js. Rendering the scene to a render target is achieved by using a RenderPass. We also need to tweak our animate function.

Now we need to create a pass for our distortion shader. We can use the ShaderPass from Three.js for that. Afterward, we can pass variables to our custom shader using the uniforms object that exists on the pass.

#### Shader Pass Set-Up for Brown-Conrady Distortion

After updating the calibration.json with the Brown-Conrady distortion coefficients and implementing the zoomForDistortionFactor in the PinholeCamera as well, we can now overlay the point cloud on the original image, i.e. the one that has not been undistorted.

#### Shader Pass Set-Up for Fisheye Distortion

## Conclusion

In conclusion, simulating real cameras allows us to overlay 3D scenes on top of camera images in a realistic way. In this blog post, we showed you how to simulate the pinhole camera model in Three.js and add realistic lens distortion by implementing OpenCV’s distortion models using post-processing shaders.

At Segments.ai, we’ve integrated these simulated cameras into our multi-sensor data labeling tools. We’ve even gone further and implemented a synced viewer that follows your pointer, as well as a 2D-3D viewer with zooming and panning. If you like working at the intersection of computer vision, computer graphics, and web development, you can always check out our job openings or apply for an internship.

Hope you’ve learned something new! Feel free to ask me questions on Twitter (@tobiascornille) or via email.