By Tobias Cornille on August 21st, 2023

Ever wondered how you can simulate a physical camera in a 3D web app? In this blog post, I’ll show you how it can be done using Three.js, a popular JavaScript library for building 3D web applications, and OpenCV. We’ll start by simulating the pinhole camera model and then we’ll add realistic lens distortion. Specifically, we’re going to take a close look at the two distortion models of OpenCV and replicate them using post-processing shaders.

Having a realistic simulated camera allows you to render a 3D scene on top of an image captured by a real camera. This can for example be used for augmented reality, but also for robotics and autonomous vehicles. That’s because robots and autonomous vehicles often have a combination of 3D sensors (like lidars) and cameras, and visualizing the 3D data on top of the camera images is important to verify the sensor calibration. It can also be very helpful when creating and reviewing 3D annotations, which is why I’ve tackled this issue at Segments.ai.

To test our camera simulation, we’ll use a frame from the nuScenes dataset, placing a 3D point cloud captured by a lidar on top of a camera image. Whether you’re working in robotics/AV, developing visualization tools, working on an AR application, or just interested in computer vision and 3D graphics, this guide will hopefully teach you something new. So let’s get started!

To replicate a camera in 3D, we first need a way of mathematically representing a camera, i.e., a camera model. Fundamentally, a camera maps 3D world points to a 2D image plane. We’re thus looking for a function that takes in a 3D point and outputs a 2D point position (usually defined in pixel coordinates).

A pinhole camera

The simplest camera model is the pinhole camera model. A pinhole camera does not have a lens; light simply enters through a single point (the “pinhole”) and forms an image on the image plane. This type of camera - also known as a *camera obscura* - has been constructed for thousands of years (chances are you’ve made one yourself as a child).

The pinhole model can be represented mathematically as a simple linear transformation if we use homogeneous coordinates. This transformation can be written as a 3 x 4 matrix called the camera matrix. Usually, we split this matrix up into two matrices: a 3 x 3 intrinsic camera matrix, and a 3 x 4 extrinsic matrix. The camera pose, i.e., its position and rotation in the world, is encoded in the extrinsic matrix. The intrinsic matrix contains the focal length, pixel size, and image origin of the camera.
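In standard notation, the projection of a homogeneous world point (X, Y, Z, 1) onto homogeneous pixel coordinates (u, v, 1) is:

```latex
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} \sim
K \, [R \mid t]
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
K = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
```

Here K holds the intrinsic parameters and [R|t] the extrinsic ones, matching the parameters listed below.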

- f_x and f_y are the focal lengths in pixels (f_x = f_y for square pixels)
- s represents the skew coefficient between the x and the y axis, and is often 0
- c_x and c_y are the (absolute) offsets of the principal point from the top-left corner of the image frame (in pixels)
- [R|t] is the transformation from world coordinates to camera coordinates. R is a rotation matrix, and t is a translation vector.
- Because we’re working in homogeneous coordinates, we add an extra column of zeros to K and a row of zeros ending with a one to the [R|t] matrix.
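To make the mapping concrete, here is a minimal sketch in plain TypeScript (no Three.js; `projectPoint` and its row-major array layout are illustrative assumptions, not code from this tutorial):

```typescript
// Project a 3D world point to pixel coordinates with a pinhole model.
// K and R are row-major 3x3 matrices, t is a 3-vector.
function projectPoint(
  K: number[], R: number[], t: number[],
  p: [number, number, number]
): [number, number] {
  // Camera coordinates: X_c = R * X_w + t
  const xc = R[0] * p[0] + R[1] * p[1] + R[2] * p[2] + t[0];
  const yc = R[3] * p[0] + R[4] * p[1] + R[5] * p[2] + t[1];
  const zc = R[6] * p[0] + R[7] * p[1] + R[8] * p[2] + t[2];
  // Apply intrinsics, then divide by depth (homogeneous -> pixel)
  const u = (K[0] * xc + K[1] * yc + K[2] * zc) / zc;
  const v = (K[4] * yc + K[5] * zc) / zc;
  return [u, v];
}
```

For example, with f_x = f_y = 800, principal point (640, 360), identity rotation, and zero translation, the point (1, 2, 10) lands at pixel (720, 520).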

The intrinsic and extrinsic parameters can be estimated by a process known as camera calibration. This typically involves capturing images of a known calibration pattern (e.g. a checkerboard) from different viewpoints. OpenCV includes functions that estimate the intrinsic and extrinsic camera parameters, as well as distortion coefficients (more on those later). Check out this OpenCV tutorial to learn how to calibrate a camera using a checkerboard pattern.

For this example image, the calibration parameters are:

```json
{
  "K": [
    809.2209905677063, 0, 829.2196003259838,
    0, 809.2209905677063, 481.77842384512485,
    0, 0, 1
  ],
  "R": [
    -0.99994107, -0.00469355, 0.00978885,
    -0.00982374, 0.0074685, -0.99992385,
    0.00462008, -0.9999611, -0.00751417
  ],
  "T": [-0.00526441, -0.27648432, -0.91085728],
  "imageWidth": 1600,
  "imageHeight": 900
}
```

After calibrating the camera, we can now simulate the camera in the browser. Browsers have two major APIs for efficiently rendering 3D content: WebGL and the newer WebGPU. However, these APIs are very low-level, so instead of using them directly, we’ll be using the popular Three.js library.

We’ll start by creating a web page with the image and our 3D app overlaid on it:

```html
<html>
  <head>
    <title>PinholeCamera</title>
    <style>
      body {
        margin: 0;
        overflow: hidden;
      }
      canvas {
        width: 100%;
        height: 100%;
        position: absolute;
        top: 0;
        left: 0;
      }
      img {
        width: 100%;
        height: 100%;
        object-fit: contain;
      }
    </style>
  </head>
  <body>
    <img src="https://segmentsai-prod.s3.eu-west-2.amazonaws.com/assets/admin-tobias/353346e3-1d10-4343-94d2-95c826755ab9.jpg">
    <div id="app"></div>
    <script src="src/index.ts"></script>
  </body>
</html>
```

Next, we’ll create the `index.ts` file, where we’ll set up a basic Three.js scene with the camera we’ll be making and a renderer. By setting the renderer’s `alpha` value to `true`, we can see the image under the 3D scene.

We’ll use the `PCDLoader` from Three.js to load the point cloud. When it’s loaded, we’ll give it a color and add it to the scene.

```typescript
import {
  WebGLRenderer,
  Scene,
  Matrix3,
  Vector3,
  PointsMaterial,
  Color,
} from "three";
import calibration from "./calibration.json";
import PinholeCamera from "./PinholeCamera";
import { PCDLoader } from "three/examples/jsm/loaders/PCDLoader";

const { K, R, T, imageWidth, imageHeight } = calibration;

// fromArray reads in column-major order
const matrixK = new Matrix3().fromArray(K).transpose();
const matrixR = new Matrix3().fromArray(R).transpose();
const vectorT = new Vector3().fromArray(T);

const scene = new Scene();
const camera = new PinholeCamera(
  matrixK,
  matrixR,
  vectorT,
  imageWidth,
  imageHeight,
  window.innerWidth / window.innerHeight,
  0.1,
  1000
);

const loader = new PCDLoader();
loader.load(
  "https://segmentsai-prod.s3.eu-west-2.amazonaws.com/assets/admin-tobias/41089c53-efca-4634-a92a-0c4143092374.pcd",
  function (points) {
    (points.material as PointsMaterial).size = 2;
    (points.material as PointsMaterial).color = new Color(0x00ffff);
    scene.add(points);
  },
  function (xhr) {
    console.log((xhr.loaded / xhr.total) * 100 + "% loaded");
  },
  function (e) {
    console.error("Error when loading the point cloud", e);
  }
);

const renderer = new WebGLRenderer({
  alpha: true,
});
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

function animate() {
  requestAnimationFrame(animate);
  renderer.render(scene, camera);
}
animate();
```

To create our pinhole camera, we’ll start by creating a new class that extends the `PerspectiveCamera` class from Three.js:

```typescript
import { Matrix3, Matrix4, PerspectiveCamera, Vector3 } from "three";

export default class PinholeCamera extends PerspectiveCamera {
  K: Matrix3;
  imageWidth: number;
  imageHeight: number;

  constructor(
    K: Matrix3,
    R: Matrix3,
    T: Vector3,
    imageWidth: number,
    imageHeight: number,
    aspect: number,
    near: number,
    far: number
  ) {
    super(45, aspect, near, far);
    this.setExtrinsicMatrix(R, T);
    this.K = K;
    this.imageWidth = imageWidth;
    this.imageHeight = imageHeight;
    this.updateProjectionMatrix();
  }

  setExtrinsicMatrix(R: Matrix3, T: Vector3) {
    // TODO
  }

  updateProjectionMatrix() {
    // TODO
  }
}
```

When we call the constructor of `PerspectiveCamera`, we have to pass in a field of view (FOV) value. This value is used by Three.js in the `updateProjectionMatrix` method, but we’ll override this method and use the focal length from the intrinsic matrix instead, so the initial FOV won’t be used.

We can set the camera pose (position + orientation) based on the extrinsic camera parameters as follows:

```typescript
setExtrinsicMatrix(R: Matrix3, T: Vector3) {
  const rotationMatrix4 = new Matrix4().setFromMatrix3(R);
  rotationMatrix4.setPosition(T);
  rotationMatrix4.invert();
  this.quaternion.setFromRotationMatrix(rotationMatrix4);
  this.position.setFromMatrixPosition(rotationMatrix4);
}
```

Note that we have to invert the extrinsic matrix before using it to set the camera position and heading. This is because [R|t] represents a transformation from world to camera, and we need the transformation from camera to world (which is equivalent to the position/heading of the camera in world coordinates).
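A quick way to see the inversion: for a rigid transform, the inverse of [R|t] has rotation Rᵀ and translation -Rᵀt, so the camera’s world position can be computed directly. A small sketch in plain TypeScript (`cameraPositionInWorld` is a hypothetical helper, not part of the tutorial code):

```typescript
// Invert a world->camera extrinsic [R|t]: the camera's position in world
// coordinates is C = -R^T * t (and the rotation part becomes R^T).
function cameraPositionInWorld(R: number[], t: number[]): [number, number, number] {
  // R is row-major 3x3; the rows of R^T are the columns of R.
  return [
    -(R[0] * t[0] + R[3] * t[1] + R[6] * t[2]),
    -(R[1] * t[0] + R[4] * t[1] + R[7] * t[2]),
    -(R[2] * t[0] + R[5] * t[1] + R[8] * t[2]),
  ];
}
```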

Setting the intrinsic camera parameters is a bit more complicated. Three.js doesn’t use the same intrinsic matrix as we obtained during our camera calibration. Instead, it uses the same matrices as WebGL, and our intrinsic matrix roughly corresponds to the “projection matrix” in WebGL. Luckily for us, Kyle Simek wrote a blog post explaining how to turn an intrinsic matrix into a valid projection matrix.

We’ll use the `glOrtho` method described in the blog post to obtain the perspective matrix. However, we don’t have direct access to OpenGL functions, so we have to reimplement `glOrtho` in the `makeNdcMatrix` function. In the `makePerspectiveMatrix` function, we’ll also make a small change: we do *not* have to negate the third column of the intrinsic matrix, as the camera looks down the positive z-axis in OpenCV.

```typescript
function makeNdcMatrix(
  left: number,
  right: number,
  bottom: number,
  top: number,
  near: number,
  far: number
) {
  const tx = -(right + left) / (right - left);
  const ty = -(top + bottom) / (top - bottom);
  const tz = -(far + near) / (far - near);

  const ndc = new Matrix4();
  // prettier-ignore
  ndc.set(
    2 / (right - left), 0, 0, tx,
    0, 2 / (top - bottom), 0, ty,
    0, 0, -2 / (far - near), tz,
    0, 0, 0, 1,
  );
  return ndc;
}

function makePerspectiveMatrix(
  s: number,
  alpha: number,
  beta: number,
  x0: number,
  y0: number,
  near: number,
  far: number
) {
  const A = near + far;
  const B = near * far;

  const perspective = new Matrix4();
  // prettier-ignore
  perspective.set(
    alpha, s, x0, 0,
    0, beta, y0, 0,
    0, 0, -A, B,
    0, 0, 1, 0,
  );
  return perspective;
}
```

Now we can override the `updateProjectionMatrix` method of the `PerspectiveCamera` class.

```typescript
updateProjectionMatrix() {
  if (!this.K) {
    return;
  }

  // column-major order
  const fx = this.K.elements[0 + 0 * 3];
  const fy = this.K.elements[1 + 1 * 3];
  const ox = this.K.elements[0 + 2 * 3];
  const oy = this.K.elements[1 + 2 * 3];
  const s = this.K.elements[0 + 1 * 3];

  const imageAspect = this.imageWidth / this.imageHeight;
  const relAspect = this.aspect / imageAspect;
  const relAspectFactorX = Math.max(1, relAspect);
  const relAspectFactorY = Math.max(1, 1 / relAspect);
  const relAspectOffsetX = ((1 - relAspectFactorX) / 2) * this.imageWidth;
  const relAspectOffsetY = ((1 - relAspectFactorY) / 2) * this.imageHeight;
  const left = relAspectOffsetX;
  const right = this.imageWidth - relAspectOffsetX;
  const top = relAspectOffsetY;
  const bottom = this.imageHeight - relAspectOffsetY;

  const persp = makePerspectiveMatrix(s, fx, fy, ox, oy, this.near, this.far);
  const ndc = makeNdcMatrix(left, right, bottom, top, this.near, this.far);
  const projection = ndc.multiply(persp);

  this.projectionMatrix.copy(projection);
  this.projectionMatrixInverse.copy(this.projectionMatrix).invert();
}
```

The `relAspect` is necessary to account for the difference in aspect ratio between the original camera image and the browser window.
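Concretely, the factors and offsets implement letterboxing: when the window is wider than the image, the frustum is widened symmetrically, and vice versa. A sketch of the same arithmetic outside the class (`letterboxOffsets` is an illustrative helper, not tutorial code):

```typescript
// Compute the frustum offsets that letterbox an image inside a window.
// A negative offset widens (or heightens) the frustum on both sides.
function letterboxOffsets(
  windowAspect: number, imageWidth: number, imageHeight: number
) {
  const relAspect = windowAspect / (imageWidth / imageHeight);
  const factorX = Math.max(1, relAspect);     // > 1 when window is wider
  const factorY = Math.max(1, 1 / relAspect); // > 1 when window is taller
  return {
    offsetX: ((1 - factorX) / 2) * imageWidth,
    offsetY: ((1 - factorY) / 2) * imageHeight,
  };
}
```

For a 1600×900 image in a window twice as wide (aspect 32:9), the frustum is extended by 800 image pixels on each side horizontally and not at all vertically.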

Bringing it all together, we can see the point cloud overlaid on the camera image.

Refresh the browser inside the CodeSandbox if you don't see the point cloud.

An image with fisheye lens distortion

Most camera lenses distort the image (special rectilinear lenses being the exception). The distortion can be especially strong when working with fisheye cameras. The simulated pinhole camera does not take this lens distortion into account, so if you used it on images straight from a camera, the point cloud would not line up perfectly with the image. The images in the nuScenes dataset are rectified, i.e. the lens distortion has been eliminated, which is why the point cloud did line up with the image in the previous section.

You could follow the approach of nuScenes by estimating the lens distortion during camera calibration (e.g. following the OpenCV tutorial mentioned earlier) and then undistorting the image using the distortion coefficients. However, when undistorting a fisheye image, a large part of the image is discarded. Thus, in this section, we’ll show how to simulate lens distortion with the distortion coefficients in Three.js. This way, we can overlay the 3D scene directly on a distorted camera image.

Before we can start writing code, we first need to know how distortion models work and how we can implement them using shaders. In the OpenCV documentation, we can find multiple distortion models. The default camera model uses the following distortion coefficients:

- k1, k2, k3, k4, k5, and k6 for radial distortion
- p1 and p2 for tangential distortion
- s1, s2, s3, and s4 for thin prism distortion
- τx and τy to account for tilted image sensors

The lens model with just the k1, k2, k3, p1, and p2 distortion coefficients is called the Brown-Conrady or “plumb bob” model, after papers by Brown (1966) and Conrady (1919). This is the most popular distortion model, and it is the first kind of distortion we’ll be replicating in Three.js.

The second distortion model we’ll replicate is the fisheye model described on this page of the OpenCV docs. This model is based on the Kannala-Brandt model, which can model wide-angle lenses better than the Brown-Conrady model. The fisheye camera model has four distortion coefficients: k1, k2, k3, and k4.
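The forward mapping of that model takes a normalized image point at radius r, computes θ = atan(r), and scales the point by θ_d/r, where θ_d = θ(1 + k1·θ² + k2·θ⁴ + k3·θ⁶ + k4·θ⁸). A minimal sketch in plain TypeScript (`distortFisheye` is an illustrative helper):

```typescript
// Kannala-Brandt / OpenCV-fisheye forward distortion of a normalized
// image point (x, y), i.e. one already divided by the focal length.
function distortFisheye(
  x: number, y: number,
  k1: number, k2: number, k3: number, k4: number
): [number, number] {
  const r = Math.hypot(x, y);
  const theta = Math.atan(r);
  const t2 = theta * theta;
  const thetaD =
    theta * (1 + k1 * t2 + k2 * t2 ** 2 + k3 * t2 ** 3 + k4 * t2 ** 4);
  const scale = r === 0 ? 1 : thetaD / r; // guard against division by zero
  return [x * scale, y * scale];
}
```

With all coefficients zero, the model reduces to the pure equidistant projection θ_d = θ = atan(r).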

To implement lens distortion in Three.js, we’ll write a post-processing shader in GLSL (OpenGL Shader Language). A shader is a function that’s run in parallel for every vertex (= vertex shader) or every pixel (= fragment shader) when rendering a scene. This parallel execution happens on the GPU, which is specifically designed for this sort of computation. Normally, different shaders are used to render objects with different materials in the 3D scene. For our use case, we want to apply the lens distortion shader to the whole rendered 3D scene in a post-processing step.

To emulate lens distortion, we could use either a vertex shader or a fragment shader. The advantage of using a vertex shader is that we can directly use the distortion formulas to determine where each vertex should end up in the distorted image. The downside is that edges between vertices remain straight, while in real life lens distortion would curve them. If you’re working with high-resolution 3D models where each edge is very short, this might not be a problem. If you just want to overlay point clouds on the camera image, this approach also works great (as there are no edges). The following table taken from “Realistic Lens Distortion Rendering” by Lambers et al. contains some further pros and cons:

| | Vertex shader | Fragment shader |
| --- | --- | --- |
| Distortion model completeness | full | limited to radial and tangential |
| Prerequisites | finely detailed geometry | none |
| Result completeness | full | may have unfilled areas |
| Rendered data types | all | limited to interpolatable relocatable data |
| Complexity | geometry-dependent | resolution-dependent |

In this tutorial, we’ll be using a fragment (or pixel) shader to emulate lens distortion. The advantage of this approach is that it works regardless of what’s in the 3D scene. We can also overcome the problem of unfilled areas by zooming out the pinhole camera and zooming back in within the shader (see `zoomForDistortionFactor` later).

Using a fragment shader does make implementing the shader a bit more complex, as we can’t directly use the formulas from the OpenCV docs. To see why, you can imagine applying the shader as looping over an empty image and filling each pixel with a certain color, like in this pseudo-code:

```
function applyShader(renderedImage)
  outputImage = new Image(imageWidth, imageHeight)
  for i in [0, imageWidth)
    for j in [0, imageHeight)
      outputImage[i, j] = distortionShader(i, j, renderedImage)
```

The purpose of the fragment shader function is thus to output the color of a single pixel, given the previously rendered image as an input. For lens distortion, the previously rendered image is the undistorted 3D scene (i.e. the render we obtained in the Pinhole Camera section), and the output image should be the distorted 3D scene. Thus, for every pixel in the output image, we have to figure out which pixel from the input image ends up there and copy its color. That is, given output coordinates i and j, we want to find the undistorted coordinates i′ and j′ and take the color at these undistorted coordinates. You can see that this is the opposite of the formulas on the OpenCV page (since those map undistorted coordinates to distorted coordinates).

```
function distortionShader(i, j, renderedImage)
  iPrime, jPrime = calculateUndistortedCoordinates(i, j)
  return renderedImage[iPrime, jPrime]
```

Now we’re ready to write the actual GLSL shaders for the two distortion models introduced earlier. I won’t go over all the details of GLSL. If you’ve never written a shader before, you might want to check out this blog post by Maxime Heckel before continuing, so you’ll have no trouble understanding the code.

As explained in the previous section, we need to find a way to calculate the undistorted coordinates in the shader. For the Brown-Conrady model, we can use formula 2 from the “Realistic Lens Distortion Rendering” paper. This formula is only an approximation, and it does not use the k3 distortion coefficient. If you’re interested in a more precise camera simulation, you can use the technique in the next section on fisheye distortion.
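For reference, that approximate inverse can also be written outside a shader. A sketch in plain TypeScript mirroring the same formula (only k1, k2, p1, p2; `undistortApprox` is an illustrative name):

```typescript
// Approximate inverse of Brown-Conrady distortion (formula 2 in Lambers
// et al.): given distorted normalized coordinates, estimate the
// undistorted ones. k3 is ignored, as in the shader.
function undistortApprox(
  x: number, y: number,
  k1: number, k2: number, p1: number, p2: number
): [number, number] {
  const r2 = x * x + y * y;
  const r4 = r2 * r2;
  const inv = 1 / (4 * k1 * r2 + 6 * k2 * r4 + 8 * p1 * y + 8 * p2 * x + 1);
  const dx = x * (k1 * r2 + k2 * r4) + 2 * p1 * x * y + p2 * (r2 + 2 * x * x);
  const dy = y * (k1 * r2 + k2 * r4) + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y;
  return [x - inv * dx, y - inv * dy];
}
```

For mild distortion, running the forward model and then this approximation recovers the original point to within a small error.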

```glsl
uniform sampler2D tDiffuse;
uniform float uCoefficients[5];
uniform vec2 uPrincipalPoint;
uniform vec2 uFocalLength;
uniform float uImageWidth;
uniform float uImageHeight;
uniform float uRelAspect;
uniform float uZoomForDistortionFactor;

varying vec2 vUv;

void main() {
  float relAspectFactorX = max(1.0, uRelAspect);
  float relAspectFactorY = max(1.0, 1.0 / uRelAspect);
  float relAspectOffsetX = ((1.0 - relAspectFactorX) / 2.0);
  float relAspectOffsetY = ((1.0 - relAspectFactorY) / 2.0);
  vec2 inputCoordinatesWithAspectOffset = vec2(vUv.x * relAspectFactorX + relAspectOffsetX, vUv.y * relAspectFactorY + relAspectOffsetY);

  float k1 = uCoefficients[0];
  float k2 = uCoefficients[1];
  float p1 = uCoefficients[2];
  float p2 = uCoefficients[3];

  vec2 imageCoordinates = (inputCoordinatesWithAspectOffset * vec2(uImageWidth, uImageHeight) - uPrincipalPoint) / uFocalLength;
  float x = imageCoordinates.x;
  float y = imageCoordinates.y;
  float r2 = x * x + y * y;
  float r4 = r2 * r2;

  float invFactor = 1.0 / (4.0 * k1 * r2 + 6.0 * k2 * r4 + 8.0 * p1 * y + 8.0 * p2 * x + 1.0);
  float dx = x * (k1 * r2 + k2 * r4) + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x);
  float dy = y * (k1 * r2 + k2 * r4) + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y;
  x -= invFactor * dx;
  y -= invFactor * dy;
  vec2 coordinates = vec2(x, y);

  vec2 principalPointOffset = vec2((uImageWidth / 2.0) - uPrincipalPoint.x, (uImageHeight / 2.0) - uPrincipalPoint.y) * (1.0 - uZoomForDistortionFactor);
  vec2 outputCoordinates = (coordinates * uFocalLength * uZoomForDistortionFactor + uPrincipalPoint + principalPointOffset) / vec2(uImageWidth, uImageHeight);
  vec2 coordinatesWithAspectOffset = vec2((outputCoordinates.x - relAspectOffsetX) / relAspectFactorX, (outputCoordinates.y - relAspectOffsetY) / relAspectFactorY);

  gl_FragColor = texture2D(tDiffuse, coordinatesWithAspectOffset);
}
```

A couple of notes about the shader code:

- The `vUv` vector contains the output image coordinates corresponding to `i` and `j` in the pseudo-code. The `tDiffuse` texture corresponds to the `renderedImage` in the pseudo-code and is automatically set by Three.js.
- We again need the `relAspect` to account for the difference in aspect ratio between the camera image and the browser window, as we do not want our lens distortion to be stretched.
- Shaders work with normalized coordinates called “UV coordinates”. However, the distortion formulas work with pixel coordinates, so we need to multiply the coordinates by the image width and height, and divide again at the end.
- We need to account for the `uZoomForDistortionFactor` at the end (which is used to avoid unfilled areas in the distorted image).
- The `texture2D` function is used to look up the color at the undistorted coordinates in the (undistorted) input image.

For fisheye distortion, there is no inverse function that we can evaluate in the shader. Instead, we’ll use a lookup table (LUT). A LUT is a matrix where we can store some pre-calculated values. We’ll store the undistorted coordinates in the LUT. In the shader, we simply have to “look up” the undistorted coordinates using the distorted coordinates as the index.

Hold on, how does that solve our problem? How can we calculate the values of the LUT if we have no inverse distortion formula? The trick is to use the normal distortion formula that maps undistorted points to distorted points. Here’s how we’ll do it:

- Loop over the undistorted image pixels.
- For each pixel, calculate the distorted coordinates using the formula from the OpenCV docs.
- Save the undistorted coordinates in the LUT at the distorted coordinates.
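The three steps above can be sketched as follows (plain TypeScript, heavily simplified: a square LUT, no zoom factor, no extended sample domain; `buildLUT` and the identity placeholder distortion are illustrative, not the tutorial’s actual implementation):

```typescript
// Build a tiny inverse-distortion LUT: iterate over *undistorted* UV
// coordinates, run the forward distortion, and store the undistorted UV
// at the distorted cell.
function buildLUT(
  size: number,
  distort: (u: number, v: number) => [number, number]
): Float32Array {
  const lut = new Float32Array(size * size * 2); // [u, v] per cell
  for (let i = 0; i < size; i++) {
    for (let j = 0; j < size; j++) {
      const u = j / (size - 1);
      const v = i / (size - 1);
      const [du, dv] = distort(u, v);            // forward distortion
      const x = Math.round(du * (size - 1));     // distorted cell index
      const y = Math.round(dv * (size - 1));
      if (x < 0 || x >= size || y < 0 || y >= size) continue;
      lut[(y * size + x) * 2] = u;               // store undistorted UV
      lut[(y * size + x) * 2 + 1] = v;
    }
  }
  return lut;
}
```

With an identity “distortion”, every cell simply stores its own UV, which makes the indexing easy to verify.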

Here’s the code:

```typescript
import {
  DataTexture,
  FloatType,
  LinearFilter,
  Matrix3,
  RGBAFormat,
} from "three";

export interface FisheyeCoefficients {
  k1: number;
  k2: number;
  k3: number;
  k4: number;
}

export function computeFisheyeLUT(
  intrinsicMatrix: Matrix3,
  coefficients: FisheyeCoefficients,
  imageWidth: number,
  imageHeight: number,
  zoomForDistortionFactor: number
) {
  const resolutionOfLUT = 256;
  const rgbaDistortionLUT = Array.from(
    { length: resolutionOfLUT * resolutionOfLUT * 4 },
    () => 0
  );

  const newIntrinsicMatrixInverse =
    computeIntrinsicMatrixInverseWithZoomForDistortion(
      intrinsicMatrix,
      zoomForDistortionFactor,
      imageWidth,
      imageHeight
    );

  const sampleDomainExtension = 0.3;
  const minSampleDomain = 0 - sampleDomainExtension;
  const maxSampleDomain = 1 + sampleDomainExtension;
  const sampleStep = 1 / (resolutionOfLUT * 4);
  for (let i = minSampleDomain; i < maxSampleDomain; i += sampleStep) {
    for (let j = minSampleDomain; j < maxSampleDomain; j += sampleStep) {
      const undistortedCoordinate = { x: i * imageHeight, y: j * imageWidth };
      const { x: distortedX, y: distortedY } = distortCoordinateFisheye(
        undistortedCoordinate,
        intrinsicMatrix,
        coefficients,
        newIntrinsicMatrixInverse
      );
      const distortionLUTIndexX = Math.round(
        (distortedX / imageWidth) * (resolutionOfLUT - 1)
      );
      const distortionLUTIndexY = Math.round(
        (1 - distortedY / imageHeight) * (resolutionOfLUT - 1)
      );
      if (
        distortionLUTIndexX < 0 ||
        distortionLUTIndexX >= resolutionOfLUT ||
        distortionLUTIndexY < 0 ||
        distortionLUTIndexY >= resolutionOfLUT
      ) {
        continue;
      }
      const u = j;
      const v = 1 - i;
      rgbaDistortionLUT[
        distortionLUTIndexY * resolutionOfLUT * 4 + distortionLUTIndexX * 4
      ] = u;
      rgbaDistortionLUT[
        distortionLUTIndexY * resolutionOfLUT * 4 + distortionLUTIndexX * 4 + 1
      ] = v;
      // Blue and alpha channels will remain 0.
    }
  }

  const distortionLUTData = new Float32Array(rgbaDistortionLUT);
  const distortionLUTTexture = new DataTexture(
    distortionLUTData,
    resolutionOfLUT,
    resolutionOfLUT,
    RGBAFormat,
    FloatType
  );
  distortionLUTTexture.minFilter = LinearFilter;
  distortionLUTTexture.magFilter = LinearFilter;
  distortionLUTTexture.needsUpdate = true;
  return distortionLUTTexture;
}
```

More code notes:

- We do not create a LUT as big as the image; we use a 256x256 matrix instead. Increasing the LUT size improves the accuracy of the distortion simulation, but also increases computation time and memory usage.
- We have to take the zoom into account again.
- We extend the sample domain beyond the image dimensions (`sampleDomainExtension`), as undistorted points outside of the image can still end up within the distorted image bounds.
- We use a `DataTexture` to pass the LUT to the shader. This also gives us interpolation for free in the shader.
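That “free interpolation” comes from `LinearFilter`: the GPU bilinearly blends the four nearest LUT cells. Conceptually it does something like this (a sketch ignoring texel-center conventions; `sampleBilinear` is an illustrative helper, not Three.js code):

```typescript
// Bilinear sampling of a single-channel size x size LUT at normalized
// coordinates (u, v), blending the four surrounding cells.
function sampleBilinear(
  lut: number[], size: number, u: number, v: number
): number {
  const x = u * (size - 1);
  const y = v * (size - 1);
  const x0 = Math.floor(x), y0 = Math.floor(y);
  const x1 = Math.min(x0 + 1, size - 1), y1 = Math.min(y0 + 1, size - 1);
  const fx = x - x0, fy = y - y0; // fractional position inside the cell
  const top = lut[y0 * size + x0] * (1 - fx) + lut[y0 * size + x1] * fx;
  const bot = lut[y1 * size + x0] * (1 - fx) + lut[y1 * size + x1] * fx;
  return top * (1 - fy) + bot * fy;
}
```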

```typescript
interface Coordinate {
  x: number;
  y: number;
}

function distortCoordinateFisheye(
  undistortedCoordinate: Coordinate,
  intrinsicMatrix: Matrix3,
  coefficients: FisheyeCoefficients,
  newIntrinsicMatrixInverse: Matrix3
): Coordinate {
  const { x, y } = undistortedCoordinate;
  const { k1, k2, k3, k4 } = coefficients;
  const fx = intrinsicMatrix.elements[0 + 0 * 3];
  const fy = intrinsicMatrix.elements[1 + 1 * 3];
  const cx = intrinsicMatrix.elements[0 + 2 * 3];
  const cy = intrinsicMatrix.elements[1 + 2 * 3];
  const iR = newIntrinsicMatrixInverse;

  let distortedX: number, distortedY: number;
  const _x =
    x * iR.elements[1 * 3 + 0] +
    y * iR.elements[0 * 3 + 0] +
    iR.elements[2 * 3 + 0];
  const _y =
    x * iR.elements[1 * 3 + 1] +
    y * iR.elements[0 * 3 + 1] +
    iR.elements[2 * 3 + 1];
  const _w =
    x * iR.elements[1 * 3 + 2] +
    y * iR.elements[0 * 3 + 2] +
    iR.elements[2 * 3 + 2];

  if (_w <= 0) {
    distortedX = _x > 0 ? -Infinity : Infinity;
    distortedY = _y > 0 ? -Infinity : Infinity;
  } else {
    const r = Math.sqrt(_x * _x + _y * _y);
    const theta = Math.atan(r);
    const theta2 = theta * theta;
    const theta4 = theta2 * theta2;
    const theta6 = theta4 * theta2;
    const theta8 = theta4 * theta4;
    const theta_d =
      theta * (1 + k1 * theta2 + k2 * theta4 + k3 * theta6 + k4 * theta8);
    const scale = r === 0 ? 1.0 : theta_d / r;
    distortedX = fx * _x * scale + cx;
    distortedY = fy * _y * scale + cy;
  }
  return { x: distortedX, y: distortedY };
}
```

This function is adapted from the `initUndistortRectifyMap` function in OpenCV (the source code is available here). Note that the adjusted inverse intrinsic matrix is not simply the inverse of the intrinsic matrix: the former also takes the `zoomForDistortionFactor` into account, as well as a principal point offset. We compute this adjusted inverse intrinsic matrix once, since it stays the same throughout the loop.

```typescript
function computeIntrinsicMatrixInverseWithZoomForDistortion(
  intrinsicMatrix: Matrix3,
  zoomForDistortionFactor: number,
  width: number,
  height: number
) {
  const principalPointOffsetX =
    (width / 2 - intrinsicMatrix.elements[0 + 2 * 3]) *
    (1 - zoomForDistortionFactor);
  const principalPointOffsetY =
    (height / 2 - intrinsicMatrix.elements[1 + 2 * 3]) *
    (1 - zoomForDistortionFactor);
  const newIntrinsicMatrix = [
    [
      intrinsicMatrix.elements[0 + 0 * 3] * zoomForDistortionFactor,
      0,
      intrinsicMatrix.elements[0 + 2 * 3] + principalPointOffsetX,
    ],
    [
      0,
      intrinsicMatrix.elements[1 + 1 * 3] * zoomForDistortionFactor,
      intrinsicMatrix.elements[1 + 2 * 3] + principalPointOffsetY,
    ],
    [0, 0, 1],
  ];
  const newIntrinsicMatrixInverse = new Matrix3()
    .fromArray(newIntrinsicMatrix.flat())
    .transpose()
    .invert();
  return newIntrinsicMatrixInverse;
}
```

Finally, we can implement the fisheye distortion shader itself. This one is very easy since it just has to look up the undistorted coordinate in the LUT. However, the same normalization as in the Brown-Conrady shader is necessary.

```glsl
uniform sampler2D tDiffuse;
uniform sampler2D uDistortionLUT;
uniform float uRelAspect;

varying vec2 vUv;

void main() {
  float relAspectFactorX = max(1.0, uRelAspect);
  float relAspectFactorY = max(1.0, 1.0 / uRelAspect);
  float relAspectOffsetX = ((1.0 - relAspectFactorX) / 2.0);
  float relAspectOffsetY = ((1.0 - relAspectFactorY) / 2.0);
  vec2 inputCoordinatesWithAspectOffset = vec2(vUv.x * relAspectFactorX + relAspectOffsetX, vUv.y * relAspectFactorY + relAspectOffsetY);

  // discard pixels on the edge to avoid streaking
  float threshold = 0.001;
  if (
    inputCoordinatesWithAspectOffset.x <= 0.0 + threshold ||
    inputCoordinatesWithAspectOffset.x >= 1.0 - threshold ||
    inputCoordinatesWithAspectOffset.y <= 0.0 + threshold ||
    inputCoordinatesWithAspectOffset.y >= 1.0 - threshold
  ) {
    // show black overlay
    gl_FragColor = vec4(0.0, 0.0, 0.0, 0.4);
    return;
  }

  // look up distortion in LUT
  vec2 outputCoordinates = texture2D(uDistortionLUT, inputCoordinatesWithAspectOffset).rg;
  if (outputCoordinates.x == 0.0 && outputCoordinates.y == 0.0) {
    // show black overlay
    gl_FragColor = vec4(0.0, 0.0, 0.0, 0.4);
    return;
  }

  vec2 coordinatesWithAspectOffset = vec2((outputCoordinates.x - relAspectOffsetX) / relAspectFactorX, (outputCoordinates.y - relAspectOffsetY) / relAspectFactorY);
  gl_FragColor = texture2D(tDiffuse, coordinatesWithAspectOffset);
}
```

Two small notes:

- The pixels on the edge of the distorted image would otherwise be repeated across the margins to the left/right or above/below the image. To avoid such streaking effects, we set these border pixels to a black overlay with an opacity of 40%.
- If the value in the LUT is zero, it means it probably wasn’t filled in, so we ignore these pixels and simply return a black overlay.

Now that we have our shaders, it’s time to use them in a post-processing pass. To use the pass, we first render the scene to a “render target” (a buffer), then we apply the pass to this render target, and finally, we render the result to the screen.

To set up this pipeline, we’ll use the `EffectComposer` from Three.js. Rendering the scene to a render target is achieved by using a `RenderPass`. We also need to tweak our `animate` function.

```typescript
...
const composer = new EffectComposer(renderer);
const renderPass = new RenderPass(scene, camera);
composer.addPass(renderPass);
composer.setPixelRatio(1 / zoomForDistortionFactor);

function animate() {
  requestAnimationFrame(animate);
  composer.render();
}
animate();
```
animate();

Now we need to create a pass for our distortion shader. We can use the `ShaderPass` from Three.js for that. Afterward, we can pass variables to our custom shader using the `uniforms` object that exists on the pass.

**Shader Pass Set-Up for Brown-Conrady Distortion**

```typescript
const distortionPass = new ShaderPass(BrownConradyDistortionShader);
distortionPass.uniforms.uCoefficients.value = [
  distortionCoefficients.k1,
  distortionCoefficients.k2,
  distortionCoefficients.p1,
  distortionCoefficients.p2,
  distortionCoefficients.k3,
];
distortionPass.uniforms.uPrincipalPoint.value = new Vector2(
  matrixK.elements[0 + 2 * 3],
  matrixK.elements[1 + 2 * 3]
);
distortionPass.uniforms.uFocalLength.value = new Vector2(
  matrixK.elements[0 + 0 * 3],
  matrixK.elements[1 + 1 * 3]
);
distortionPass.uniforms.uImageWidth.value = imageWidth;
distortionPass.uniforms.uImageHeight.value = imageHeight;
distortionPass.uniforms.uZoomForDistortionFactor.value =
  zoomForDistortionFactor;
distortionPass.uniforms.uRelAspect.value =
  window.innerWidth / window.innerHeight / (imageWidth / imageHeight);
composer.addPass(distortionPass);
```

After updating the `calibration.json` file with the Brown-Conrady distortion coefficients and implementing the `zoomForDistortionFactor` in the `PinholeCamera` as well, we can now overlay the point cloud on the original, distorted image.

Refresh the browser inside the CodeSandbox if you don't see the point cloud.

**Shader Pass Set-Up for Fisheye Distortion**

```typescript
const distortionPass = new ShaderPass(FisheyeDistortionShader);
const distortionLUTTexture = computeFisheyeLUT(
  matrixK,
  distortionCoefficients,
  imageWidth,
  imageHeight,
  zoomForDistortionFactor
);
distortionPass.uniforms.uDistortionLUT.value = distortionLUTTexture;
distortionPass.uniforms.uRelAspect.value =
  window.innerWidth / window.innerHeight / (imageWidth / imageHeight);
composer.addPass(distortionPass);
```

Refresh the browser inside the CodeSandbox if you don't see the point cloud.

In conclusion, simulating real cameras allows us to overlay 3D scenes on top of camera images in a realistic way. In this blog post, we showed you how to simulate the pinhole camera model in Three.js and add realistic lens distortion by implementing OpenCV’s distortion models using post-processing shaders.

At Segments.ai, we’ve integrated these simulated cameras into our multi-sensor data labeling tools. We’ve even gone further and implemented a synced viewer that follows your pointer, as well as a 2D-3D viewer with zooming and panning. If you like working at the intersection of computer vision, computer graphics, and web development, you can always check out our job openings or apply for an internship.


Hope you’ve learned something new! Feel free to ask me questions on Twitter (@tobiascornille) or via email.
