3D Transformations: What are they used for in data annotation?

May 27th, 2024

Use of 3D Transformations in 3D Data Annotation

3D transformations play a significant role in 3D data annotation, which in turn is essential for training high-quality, safe deep learning models.

Camera Calibration

The extrinsic parameters of a camera describe its position and orientation in space, and working with them requires 3D transformations. They are used during camera calibration to establish correspondences between the 3D world and the 2D image. By transforming the coordinates of points in the 3D world into the camera's frame and onto the 2D image plane, we can determine where each point appears from the camera's perspective.
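As a rough illustration of the idea above, the sketch below projects one 3D world point into pixel coordinates. The extrinsic matrix, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the helper names are made up for this example. It also assumes the camera looks along its own +Z axis, a common camera convention that this post does not specify.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def project_point(point_world, extrinsic, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates (u, v).

    Returns None when the point lies behind the camera.
    """
    # Step 1: world frame -> camera frame, using the 4x4 extrinsic matrix.
    x, y, z, _ = mat_vec(extrinsic, list(point_world) + [1.0])
    if z <= 0:
        return None
    # Step 2: camera frame -> image plane, using an assumed pinhole model.
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical extrinsics: no rotation, camera located at world (0, 0, -5),
# so world points are shifted by +5 along Z to land in the camera frame.
EXTRINSIC = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 5.0],
    [0.0, 0.0, 0.0, 1.0],
]

pixel = project_point((1.0, 2.0, 5.0), EXTRINSIC,
                      fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
print(pixel)
```

Projecting every point of a point cloud this way is, in essence, how a cloud can be overlaid on a camera image. Real pipelines usually also model lens distortion, which is left out here.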

For example, Segments.ai has a feature that projects the point cloud on the corresponding 2D images, making it easier to label your data in 3D.


Figure: 3D LiDAR point cloud of a city street, overlaid with color-coded bounding boxes and segmentation for vehicles, trees, and buildings.

Ego Pose

Ego pose refers to the position and orientation of a sensor (like a camera or LIDAR) with respect to a fixed reference point. In self-driving cars, for example, the ego pose is often used to represent the car’s position and orientation. Accurate ego pose estimation is a crucial prerequisite for navigation, and it greatly depends on 3D transformations, which provide the mathematical groundwork for calculating and updating the pose.

For example, you can merge point clouds captured by a driving car if you know its ego pose. Each point cloud is expressed relative to the LIDAR sensor, whereas the car’s pose is given in world coordinates. Transforming each point cloud by the pose at which it was captured places all points in the same world frame, where they can be combined.
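The merging step can be sketched as follows. This is a minimal example under simplifying assumptions: the sensor frame is taken to coincide with the vehicle frame (no sensor-to-vehicle calibration), the car only moves in the XY plane, and the `frames` data and helper names are invented for illustration.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def pose_matrix(x, y, yaw):
    """4x4 ego pose in world coordinates: rotation about Z by yaw, then translation (x, y, 0)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, x],
        [s, c, 0.0, y],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def to_world(points_sensor, ego_pose):
    """Transform sensor-frame points into the world frame."""
    return [tuple(mat_vec(ego_pose, [px, py, pz, 1.0])[:3])
            for px, py, pz in points_sensor]


# Three hypothetical frames that all observe the same static object.
frames = [
    {"pose": pose_matrix(0.0, 0.0, 0.0), "points": [(10.0, 0.0, 0.5)]},
    {"pose": pose_matrix(4.0, 0.0, 0.0), "points": [(6.0, 0.0, 0.5)]},
    {"pose": pose_matrix(10.0, -3.0, math.pi / 2), "points": [(3.0, 0.0, 0.5)]},
]

merged = []
for frame in frames:
    merged.extend(to_world(frame["points"], frame["pose"]))

print(merged)
```

Although the object has different coordinates in each sensor frame, all three observations land on (approximately) the same world position after the transformation, which is what makes the merged cloud consistent.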


How to Represent 3D Transformations

Each 3D transformation is typically represented by a matrix. This allows for a unified representation of translation, scaling, rotation, and other transformations.

Note. We distinguish vectors and matrices from scalars by capitalizing their names, for example matrix M versus scalar a.

  • Translation. Moving an object in 3D space along the X, Y, and Z axes. A point moves from position P1 = (X1, Y1, Z1) to P2 = (X2, Y2, Z2) by adding a translation vector T. Then P2 = P1 + T.
  • Rotation. Changing the orientation of an object by rotating it around one or more axes. A rotation R is defined by an axis (such as the X, Y, or Z axis) and an angle. Then P2 = R · P1.
  • Scaling. Resizing an object in 3D space using scaling factors along the X, Y, and Z directions. Then P2 = S · P1.

By utilizing transformation matrices, various 3D transformations can be efficiently represented and applied to objects in three-dimensional space.

Note. Translation and rotation are the core operations (scaling and shearing are not used as often in robotic use cases). For this reason, we will focus on translation and rotation. Often, the X-axis is used as the forward driving direction, the Y-axis points to the vehicle’s left, and the Z-axis points up. We follow this convention in this blog.

Matrix Translations

Translation is the most straightforward 3D transformation. It moves an object along the x, y, or z axis from one position to another without changing its size or orientation.

Mathematically, a translation is described by adding a translation vector to a point.

$ \begin{bmatrix}
x_2 \\
y_2 \\
z_2
\end{bmatrix}
=
\begin{bmatrix}
x_1 + t_x \\
y_1 + t_y \\
z_1 + t_z
\end{bmatrix}
\Leftrightarrow
P_2 = P_1 + T$

where

$P_1 = \begin{bmatrix}
x_1 \\ y_1 \\ z_1
\end{bmatrix},
\quad
T = \begin{bmatrix}
t_x \\
t_y \\
t_z
\end{bmatrix},
\quad
P_2 = \begin{bmatrix}
x_2 \\ y_2 \\ z_2
\end{bmatrix}$

We can use a simple trick to write the translation as a multiplication: append a 1 to each point and use a 4×4 matrix. This is handy when combining several transformations into one matrix (for more information, see homogeneous coordinates).

$\begin{bmatrix}
x_2 \\ y_2 \\ z_2 \\ 1
\end{bmatrix}
=
\begin{bmatrix}
x_1 + t_x \\ y_1 + t_y \\ z_1 + t_z \\ 1
\end{bmatrix}
\Leftrightarrow
P_2 = T P_1$

where

$\text{Matrix Translation } T = \begin{bmatrix}
1 & 0 & 0 & t_x \\
0 & 1 & 0 & t_y \\
0 & 0 & 1 & t_z \\
0 & 0 & 0 & 1 \\
\end{bmatrix},
\quad
P_1 = \begin{bmatrix}
x_1 \\ y_1 \\ z_1 \\ 1
\end{bmatrix},
\quad
P_2 = \begin{bmatrix}
x_2 \\ y_2 \\ z_2 \\ 1
\end{bmatrix} $
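To make the homogeneous form concrete, here is a small sketch that builds a 4×4 translation matrix and applies it to a point. The function names and the example values are chosen for illustration only, and plain Python lists are used to keep the example free of dependencies.

```python
def translation_matrix(tx, ty, tz):
    """4x4 homogeneous translation: identity with (tx, ty, tz) in the last column."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# The point (1, 2, 3) in homogeneous coordinates (note the trailing 1).
p1 = [1.0, 2.0, 3.0, 1.0]
p2 = mat_vec(translation_matrix(10.0, 0.0, -1.0), p1)
print(p2)

# Two translations chained by matrix multiplication collapse into one matrix.
combined = mat_mul(translation_matrix(1.0, 0.0, 0.0),
                   translation_matrix(0.0, 2.0, 0.0))
print(combined)
```

The ones on the diagonal keep the original coordinates, and the trailing 1 in the point is what lets the last column add the offset.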

Matrix Rotations

On the other hand, rotation involves changing an object’s orientation around a fixed point or axis. This is typically achieved using trigonometric functions, namely sine and cosine.

When an object makes a U-turn in the XY plane, that is, a 180° rotation around the Z-axis, we can describe this rotation as follows.

$
\begin{bmatrix}
-1 & 0 & 0 \\
0 & -1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}
\begin{bmatrix}
x \\
y \\
z \\
\end{bmatrix}
=
\begin{bmatrix}
-x \\
-y \\
z \\
\end{bmatrix}
$

We can apply the same trick as we applied to the translation matrix (use a 4×4 matrix) to make the rotation matrix compatible with a 4×4 translation matrix $T$.

$
\begin{bmatrix}
-1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \\ \end{bmatrix}
=
\begin{bmatrix} -x \\ -y \\ z \\ 1 \\ \end{bmatrix}
$

We can generally describe a rotation $R$ with a 4×4 matrix multiplication.

$\text{Matrix Rotation} =
\begin{bmatrix}
r_{x_1} & r_{y_1} & r_{z_1} & 0 \\
r_{x_2} & r_{y_2} & r_{z_2} & 0 \\
r_{x_3} & r_{y_3} & r_{z_3} & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}$
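As a sketch of where the sines and cosines end up, the function below builds the 4×4 matrix for a rotation by an arbitrary angle around the Z-axis. The helper names are illustrative. With an angle of π it corresponds to the U-turn case, up to floating-point rounding.

```python
import math


def rotation_z(theta):
    """4x4 homogeneous rotation by theta radians around the Z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, -s, 0.0, 0.0],
        [s, c, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# U-turn: a 180 degree rotation flips the sign of x and y and keeps z.
u_turn_result = mat_vec(rotation_z(math.pi), [2.0, 1.0, 0.5, 1.0])
print(u_turn_result)

# Quarter turn: with X forward and Y to the left, a positive angle
# turns the forward direction towards the left.
left_turn_result = mat_vec(rotation_z(math.pi / 2), [1.0, 0.0, 0.0, 1.0])
print(left_turn_result)
```

Because `math.pi` is only an approximation of π, the printed values are very close to, but not exactly, the ideal results (-2, -1, 0.5) and (0, 1, 0).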

Euler Angles

Describing a rotation as a matrix multiplication is not the most intuitive way in most cases. Another option is to use so-called Euler angles. They are a sequence of three rotations performed one after the other.

Euler angles are easy to understand and implement, but they suffer from a notorious problem known as gimbal lock: when two of the rotational axes align, an entire degree of freedom is lost. This leads to erratic behavior in motion control, which is undesirable in many applications, notably in 3D graphics and robotics.

Mathematically, we can represent Euler angles as a product of three rotations:

$
\text{Euler Rotation} = \alpha \beta \gamma
$

where $\alpha$, $\beta$, and $\gamma$ are each a rotation around a single axis, applied one after the other.

Figure: the X, Y, and Z axes of a 3D coordinate system, with the rotation angles α (alpha), β (beta), and γ (gamma) drawn around them.

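You can choose the order of rotations however you like, but you must stick to it: the same three angles applied in a different order generally produce a different orientation. One popular order is Yaw → Pitch → Roll.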

Figure: the roll, pitch, and yaw axes of an airplane.
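The sketch below composes yaw, pitch, and roll into one rotation matrix and uses it to illustrate two points made above: the order of rotations matters, and gimbal lock appears at a pitch of 90°. It assumes one particular convention, namely yaw around Z, pitch around Y, and roll around X, multiplied as Rz · Ry · Rx. Other conventions exist, and the function names are illustrative.

```python
import math


def rot_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_mul(a, b):
    """Multiply two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_to_matrix(yaw, pitch, roll):
    """Assumed convention: R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
    return mat_mul(rot_z(yaw), mat_mul(rot_y(pitch), rot_x(roll)))


def max_abs_diff(a, b):
    """Largest element-wise difference between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))


quarter = math.pi / 2

# Order matters: swapping two rotations gives a different matrix.
order_gap = max_abs_diff(mat_mul(rot_z(quarter), rot_x(quarter)),
                         mat_mul(rot_x(quarter), rot_z(quarter)))

# Gimbal lock: at a pitch of 90 degrees only (roll - yaw) matters, so two
# different (yaw, roll) pairs describe the same orientation.
lock_gap = max_abs_diff(euler_to_matrix(0.3, quarter, 0.1),
                        euler_to_matrix(0.5, quarter, 0.3))

print(order_gap, lock_gap)
```

`order_gap` is clearly non-zero, while `lock_gap` is zero up to floating-point rounding: at that pitch, yaw and roll rotate about the same effective axis, so one degree of freedom is lost.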

Quaternions

Quaternions are another option next to matrices and Euler angles, and they avoid the gimbal lock problem. A quaternion is a four-component entity that extends the complex numbers with three imaginary units instead of one, and a quaternion of unit length encodes a rotation in 3D space. Because a quaternion describes the rotation as a whole rather than as three chained axis rotations, it provides consistent rotational movements, which makes quaternions common in 3D applications.

Mathematically, you can describe a quaternion using four floating-point numbers: a, b, c, and d.

$\text{Quaternion} = a+bi+cj+dk$

where

$ i^2=j^2=k^2=ijk = -1 $
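The defining identities can be checked with a few lines of code. The sketch below stores a quaternion as a tuple (a, b, c, d) and implements the quaternion (Hamilton) product; the tuple layout and the function name are choices made for this example.

```python
def quat_mul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


# The three imaginary units and minus one, in (a, b, c, d) form.
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)
MINUS_ONE = (-1, 0, 0, 0)

print(quat_mul(I, I) == MINUS_ONE)               # i^2 = -1
print(quat_mul(J, J) == MINUS_ONE)               # j^2 = -1
print(quat_mul(K, K) == MINUS_ONE)               # k^2 = -1
print(quat_mul(quat_mul(I, J), K) == MINUS_ONE)  # ijk = -1

# Unlike ordinary numbers, quaternions do not commute: ij = k but ji = -k.
print(quat_mul(I, J), quat_mul(J, I))
```

The non-commutativity mirrors the fact that 3D rotations themselves depend on the order in which they are applied.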

PyQuaternion is a useful package for working with quaternions in Python. It allows you to quickly obtain the 4×4 transformation matrix representing the rotation and combine it with a 4×4 translation matrix.
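To show what such a conversion involves, the sketch below builds a unit quaternion from an axis and an angle, converts it to a rotation matrix with the standard formula for unit quaternions, and places a translation in the last column. It deliberately avoids any third-party package so that it runs on its own; the function names are illustrative and are not PyQuaternion's API.

```python
import math


def quaternion_from_axis_angle(axis, angle):
    """Unit quaternion (a, b, c, d) for a rotation of `angle` radians around `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    scale = math.sin(angle / 2) / norm
    return (math.cos(angle / 2), ax * scale, ay * scale, az * scale)


def transform_from_quaternion(q, translation):
    """4x4 matrix combining the rotation of unit quaternion q with a translation."""
    a, b, c, d = q
    tx, ty, tz = translation
    return [
        [1 - 2 * (c * c + d * d), 2 * (b * c - a * d), 2 * (b * d + a * c), tx],
        [2 * (b * c + a * d), 1 - 2 * (b * b + d * d), 2 * (c * d - a * b), ty],
        [2 * (b * d - a * c), 2 * (c * d + a * b), 1 - 2 * (b * b + c * c), tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(x * y for x, y in zip(row, vector)) for row in matrix]


# Quarter turn around Z, followed by a shift of 1 along X.
q = quaternion_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
transform = transform_from_quaternion(q, (1.0, 0.0, 0.0))

# The point (1, 0, 0) rotates to roughly (0, 1, 0) and then shifts to (1, 1, 0).
moved = mat_vec(transform, [1.0, 0.0, 0.0, 1.0])
print(moved)
```

The formula is only valid for quaternions of unit length, which is why the axis is normalized when the quaternion is built.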

Rule of Thumb for Working With 3D Transformations

As you can see, there are several ways to deal with rotations and translations in 3D. In my experience, it is often easiest to work with 4×4 matrices that contain both the translation and the rotation components.

$
\text{Rotation \& Translation} = \begin{bmatrix}
r_{x_1} & r_{y_1} & r_{z_1} & t_x \\
r_{x_2} & r_{y_2} & r_{z_2} & t_y \\
r_{x_3} & r_{y_3} & r_{z_3} & t_z \\
0 & 0 & 0 & 1 \\
\end{bmatrix}
$

This representation takes away the pain of reversing a 3D transformation. With a single 4×4 matrix, you take the inverse of the matrix, and you are done. The mathematics ensures that translation and rotation are undone in the correct order.

However, if you keep a separate translation matrix and rotation matrix, you must undo them in the right order yourself: first the translation, then the rotation. The rotation represents the orientation of the object in its local reference frame. When dealing with many transformations, taking this order into account quickly becomes a source of bugs and will cost you precious developer time.
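The difference can be sketched as follows. The example assumes a rigid transform (a pure rotation plus a translation, no scaling), for which the inverse has the closed form [Rᵀ | −Rᵀt]. The pose values and function names are made up for illustration.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def invert_rigid(transform):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    r_t = [[transform[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t = [transform[i][3] for i in range(3)]
    t_inv = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# Hypothetical pose: quarter turn around Z, then a shift by (4, 2, 0).
c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
pose = [
    [c, -s, 0.0, 4.0],
    [s, c, 0.0, 2.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

local_point = [1.0, 0.0, 0.0, 1.0]
world_point = mat_vec(pose, local_point)

# One matrix inverse brings the point back, with no ordering to think about.
recovered = mat_vec(invert_rigid(pose), world_point)

# Doing it by hand in the wrong order (rotation first, then translation)
# does not recover the original point.
rotation_inverse = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]
rotated_first = mat_vec(rotation_inverse, world_point[:3])
wrong = [rotated_first[0] - 4.0, rotated_first[1] - 2.0, rotated_first[2] - 0.0]

print(recovered, wrong)
```

`recovered` matches the original local point up to rounding, whereas `wrong` ends up somewhere else entirely, which is exactly the kind of bug a single 4×4 matrix helps avoid.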