Mastering Point Clouds: A Complete Guide to Lidar Data Annotation

9 min read · May 2nd, 2024

Point clouds are collections of data points in a three-dimensional coordinate system. They are used in the automotive, construction, and robotics industries to capture detailed information about objects and environments.
Lidar sensors are commonly used to generate these point clouds. They emit laser pulses and measure the reflected light to determine distances, producing highly accurate 3D representations of the surroundings. Other sensors, such as radar and sonar, also generate point clouds.


[Figure: a multi-sensor scene shown as a raw point cloud and as a labeled camera image]

Working with point clouds, however, comes with its own set of challenges:

  • The amount of data points can be overwhelming.
  • When a point cloud is sparse, it can be hard to recognize an object.
  • Noise and outliers can make analysis difficult.
  • Precise annotation requires specialized tools and expertise.

To overcome these challenges, we need effective storage, preprocessing, and annotation solutions.

Understanding Point Clouds

[Figure: a Cartesian coordinate system with axes x, y, and z, showing point P(x, y, z) and the vector r from origin O to P]

A 3D point cloud is a collection of data points defined in a three-dimensional coordinate system. Each point represents a specific location in space, characterized by its Cartesian X, Y, and Z coordinates. These points collectively depict the external surface of an object or scene, providing a detailed spatial representation.

Lidars, radars, or stereo camera pairs generate a 3D view of the surrounding environment. The data can come from a single sensor or multiple sensors combined into a single point cloud.

A point cloud can also contain additional values per point, such as reflection intensity or RGB color, which enrich the representation.

Multi-Sensor Fusion for Additional Context

[Figure (AI-generated illustration): a person walking through an airport terminal, pulling a suitcase on one side and holding a child's hand on the other]

Multi-sensor fusion techniques combine information from multiple sensors, such as radar, cameras, and GPS, to provide a more comprehensive understanding of the environment. This results in more accurate and comprehensive datasets.

One example is annotating traffic at an airport terminal. When you see a human-shaped point cloud with a smaller blob of points on its left side, that blob could be a suitcase or a child. In a camera image, the difference is instantly visible, but in a point cloud alone it can be nearly impossible to tell.

Combining camera images with lidar data improves object detection accuracy and provides richer contextual information for machine learning models.

Working with Different Point Cloud Formats

Point cloud data is stored in various formats, each with unique characteristics and specific use cases. In robotics and autonomous navigation, most point clouds are stored as .pcd files.

See all supported file formats.

Choosing the Right Format

  • Compatibility: Ensure the chosen format is compatible with your annotation tools. For instance, formats like PCD are well-supported in open-source libraries like PCL.
  • Data Complexity: For tasks requiring extensive metadata or high precision, consider using LAS or FLS formats.
  • Large Scale: LAS files are well suited to large-scale mapping applications because they store large datasets efficiently.

Choosing the suitable point cloud format can impact the efficiency of your lidar data annotation process. To ensure consistency across sensors during annotation, you might find it helpful to refer to labeling guides and templates provided by Segments.ai.

PCD

An open file format developed by the Point Cloud Library (PCL). It supports many point cloud processing tasks, making it popular in research and development.

XYZ

A simple format that lists points using x, y, and z coordinates. Often used for quick visualization or initial data inspection.

ASC

Similar to XYZ, ASC files store point cloud data as plain text. They are straightforward but may require more processing power due to their uncompressed nature.

PLY

Supports both ASCII and binary encoding. PLY files can store additional attributes like color and normals, providing more detailed representations of objects.

LAS

A widely used binary format for lidar data. LAS files are especially prevalent in geospatial applications due to their ability to store large datasets efficiently. The LAS format is currently only recommended for huge point clouds (e.g., merged maps) that cannot be tiled otherwise.

PTS

Another text-based format that stores point cloud data in a structured manner, often used for exchanging data between different software platforms.

OBJ

While primarily a 3D graphics format, OBJ files can store point clouds alongside mesh information, making them versatile for various 3D modeling applications.

E57

A flexible and compact binary format designed for 3D imaging systems. E57 files are excellent for long-term storage and interoperability across different platforms.
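Many of these formats can be loaded and converted with off-the-shelf tooling. As a minimal sketch, assuming the open-source Open3D library and a hypothetical file named scan.pcd:

```python
import open3d as o3d
import numpy as np

# Load a point cloud; Open3D infers the format (.pcd, .ply, .xyz, ...) from the extension.
pcd = o3d.io.read_point_cloud("scan.pcd")

# Access the raw coordinates as an (N, 3) NumPy array.
points = np.asarray(pcd.points)
print(f"Loaded {points.shape[0]} points")

# Convert to another format simply by writing with a different extension.
o3d.io.write_point_cloud("scan.ply", pcd)
```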

Storing and Sharing Multi-Sensor Data

For recording and replaying multi-sensor data in robotics, ROSBags have been the traditional choice, especially with ROS 1. However, a newer format called MCAP (Message Capture) by Foxglove is gaining traction. MCAP offers several advantages for multi-sensor data management.

MCAP is designed specifically for efficient storage and access of data streams from various sensors. It acts like a container, allowing you to store different data types (point clouds, images, sensor readings) within a single file. MCAP also supports different message formats for broader compatibility with various tools. Compared to ROSBags (especially with SQLite storage in ROS 2), MCAP offers faster data access, even over remote connections. This makes MCAP a compelling choice for new robotics projects or users transitioning to ROS 2, especially when dealing with complex multi-sensor datasets.
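As a rough sketch of what recording to MCAP can look like with the official mcap Python library (the /imu topic and the JSON payload are illustrative assumptions; real robots would record point clouds and images the same way, each on its own channel):

```python
import json
import time

from mcap.writer import Writer

with open("recording.mcap", "wb") as stream:
    writer = Writer(stream)
    writer.start()

    # Register a schema and a channel (topic). MCAP supports multiple
    # message encodings; JSON is used here for simplicity.
    schema_id = writer.register_schema(
        name="ImuReading",
        encoding="jsonschema",
        data=json.dumps({"type": "object"}).encode(),
    )
    channel_id = writer.register_channel(
        topic="/imu",  # hypothetical topic name
        message_encoding="json",
        schema_id=schema_id,
    )

    # Append a timestamped message to the channel.
    now = time.time_ns()
    writer.add_message(
        channel_id=channel_id,
        log_time=now,
        publish_time=now,
        data=json.dumps({"accel": [0.0, 0.0, 9.81]}).encode(),
    )

    writer.finish()
```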

Read more on MCAP vs ROS bags here.

Preprocessing Point Cloud Data

Preprocessing point cloud data is important for ensuring its quality and usability. High-quality data leads to more accurate annotations and, subsequently, better-performing models. Key preprocessing techniques include noise removal, outlier detection, downsampling, coordinate transformation, and filtering.

Noise Removal

Raw point cloud data often contains noise caused by various environmental factors or sensor inaccuracies. Noise can distort the spatial representation, leading to errors in annotation. Techniques such as statistical outlier removal or radius outlier removal can be employed to eliminate these inconsistencies.
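A minimal sketch of both techniques with Open3D (the neighbor counts, ratio, and radius are illustrative values that depend on your sensor and scene):

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.pcd")  # hypothetical input file

# Statistical outlier removal: points whose mean distance to their
# neighbors deviates too far from the global average are dropped.
clean_pcd, inlier_indices = pcd.remove_statistical_outlier(
    nb_neighbors=20,  # neighbors considered per point
    std_ratio=2.0,    # threshold in standard deviations
)

# Alternative: radius outlier removal keeps only points that have at
# least `nb_points` neighbors within `radius` meters.
clean_pcd2, _ = pcd.remove_radius_outlier(nb_points=16, radius=0.5)
```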

Downsampling

Downsampling is a technique for reducing the number of points in a point cloud. It helps manage data size while maintaining significant detail. If you upload a dataset of millions of points to a data annotation platform, loading each frame into the browser might take a long time.
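A common technique is voxel-grid downsampling, which replaces all points inside each 3D grid cell with a single representative point. A minimal Open3D sketch, assuming a 10 cm voxel size:

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.pcd")  # hypothetical input file

# Replace every 10 cm voxel with the centroid of the points inside it.
down_pcd = pcd.voxel_down_sample(voxel_size=0.1)
print(f"{len(pcd.points)} points -> {len(down_pcd.points)} points")
```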

Another solution for large point clouds is tiling. Think of Google Maps. When you zoom in on the map, the tiles load progressively, allowing for a smoother user experience. Similarly, you can divide your point cloud into smaller tiles or chunks, making it easier to manage and process. This way, you only load the necessary tiles for annotation or analysis, significantly reducing computational resources and time.
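One way to implement this is to bin points by their XY coordinates. A NumPy sketch, assuming a 50 m tile size and a simple (N, 3) point array:

```python
import numpy as np

def split_into_tiles(points: np.ndarray, tile_size: float = 50.0) -> dict:
    """Group an (N, 3) point array into tiles keyed by (ix, iy) grid indices."""
    # Integer tile index of each point in the XY plane.
    tile_ids = np.floor(points[:, :2] / tile_size).astype(int)
    tiles = {}
    for ix, iy in np.unique(tile_ids, axis=0):
        mask = (tile_ids[:, 0] == ix) & (tile_ids[:, 1] == iy)
        tiles[(ix, iy)] = points[mask]
    return tiles

# Example: only the tiles around the region of interest need to be loaded.
points = np.random.uniform(-100, 100, size=(100_000, 3))
tiles = split_into_tiles(points)
```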

Coordinate Transformation

Transforming coordinates to a common reference frame is important to ensure consistency across different datasets. This transformation process typically involves translation, rotation, and scaling.

In various scientific and technological fields, particularly computer graphics and robotics, 3D transformations are essential mathematical operations. These operations encompass three primary actions: translation, rotation, and scaling. Each action serves a specific purpose in moving, rotating, or resizing an object within three-dimensional space.

For instance, consider modeling a robot driving around a scene. Its motion can be represented as a 3D translation and rotation: driving along the road corresponds to a translation within the XY plane, while making a U-turn can be described mathematically as a 180-degree rotation around the z-axis.
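As a small NumPy sketch of this example (the pose values are made up), a rigid transform combines a rotation and a translation in a single 4x4 matrix:

```python
import numpy as np

def make_pose(yaw_rad: float, translation: np.ndarray) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a yaw rotation and a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    pose = np.eye(4)
    pose[:3, :3] = np.array([[c, -s, 0],
                             [s,  c, 0],
                             [0,  0, 1]])  # rotation around the z-axis
    pose[:3, 3] = translation
    return pose

# The robot drives 10 m along x (translation in the XY plane) and makes
# a U-turn (180-degree rotation around z).
pose = make_pose(np.pi, np.array([10.0, 0.0, 0.0]))

# Transform a point from the robot frame into the world frame.
point_robot = np.array([1.0, 0.0, 0.0, 1.0])  # homogeneous coordinates
point_world = pose @ point_robot
```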

Read more about 3D transformations.

Camera Calibration

Camera extrinsic parameters, which determine the camera’s position and orientation in space, often involve 3D transformations. These transformations are employed during camera calibration to establish correspondences between the 3D world and the 2D image. We can derive camera perspectives by converting the coordinates of points from the 3D world to the 2D image plane.
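A hedged sketch of the pinhole projection this describes (the intrinsic and extrinsic values below are placeholders; in practice they come from calibration):

```python
import numpy as np

# Assumed intrinsic matrix K (focal lengths and principal point).
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])

# Extrinsics: rotation R and translation t from world to camera frame.
R = np.eye(3)
t = np.array([0.0, 0.0, 0.0])

def project(point_world: np.ndarray) -> np.ndarray:
    """Project a 3D world point onto the 2D image plane (pinhole model)."""
    point_cam = R @ point_world + t  # world -> camera coordinates
    uvw = K @ point_cam              # camera -> homogeneous pixel coordinates
    return uvw[:2] / uvw[2]          # perspective division

pixel = project(np.array([2.0, 0.5, 10.0]))  # made-up 3D point
```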

Ego Poses

Ego pose refers to the precise position and orientation of a sensor, such as a camera or lidar, in relation to a fixed reference frame. In the context of self-driving cars, the ego pose is commonly used to represent the car’s exact location and orientation. Accurate ego-pose estimation is essential for navigation, and the pose must be recalculated and continuously updated as the vehicle moves.

A practical application of ego-pose estimation is merging the point clouds captured by a driving car: knowing the ego pose at each timestamp, the individual lidar scans can be transformed into a common world frame and combined. This capability allows for enhanced analysis and understanding of the surrounding environment.
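Building on the NumPy sketch above (the scans and poses are hypothetical inputs), merging amounts to transforming each scan with its ego pose and concatenating the results:

```python
import numpy as np

def merge_scans(scans: list[np.ndarray], ego_poses: list[np.ndarray]) -> np.ndarray:
    """Merge per-frame (N, 3) lidar scans into one world-frame point cloud.

    Each ego pose is a 4x4 transform from the sensor frame at that
    timestamp to the world frame.
    """
    merged = []
    for scan, pose in zip(scans, ego_poses):
        # Promote to homogeneous coordinates, apply the pose, drop back to 3D.
        homo = np.hstack([scan, np.ones((scan.shape[0], 1))])
        merged.append((homo @ pose.T)[:, :3])
    return np.vstack(merged)
```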

Filtering

Filtering is a preprocessing step in lidar data annotation that helps to remove unwanted points from a point cloud dataset. By applying specific criteria, such as height or intensity, filtering allows us to focus on relevant features and remove noise or outliers that could affect the annotation process’s accuracy.

Filtering can also enhance the visibility of specific features within the point cloud. By setting criteria that highlight specific characteristics, such as height thresholds or intensity ranges, we can emphasize features of interest while de-emphasizing less important areas. This selective highlighting makes it easier for annotators to focus on the relevant information during the annotation process.

Finally, filtering lowers the computational load by shrinking the dataset. By removing unnecessary points, we can streamline subsequent processing steps, such as feature extraction or object detection algorithms, improving overall efficiency and reducing processing time.
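A minimal NumPy sketch of criterion-based filtering (the height and intensity thresholds are illustrative):

```python
import numpy as np

# points: (N, 3) coordinates; intensity: (N,) per-point return strength.
points = np.random.uniform(-50, 50, size=(10_000, 3))  # placeholder data
intensity = np.random.uniform(0, 1, size=10_000)       # placeholder data

# Keep points between 0.2 m and 3 m height with sufficient intensity,
# e.g. to isolate objects above the ground plane.
mask = (points[:, 2] > 0.2) & (points[:, 2] < 3.0) & (intensity > 0.1)
filtered = points[mask]
```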

Annotating Point Cloud Data for ML

Lidar data annotation is the process of converting raw point cloud data into organized information. Annotated lidar data, often called ground truth data, serves as the benchmark for training machine learning models, enabling them to identify and respond to different objects and obstacles in real-world situations.

Two primary techniques are widely used in this field: segmentation and object detection.

Segmentation

Segmentation involves classifying each point in a point cloud into predefined categories. This process allows for a detailed understanding of the environment by labeling points as belonging to specific objects like cars, pedestrians, roads, and buildings. Here are some key aspects:

  • Granularity: Each point is individually classified, providing high-resolution information about the scene.
  • Applications: Highly detailed tasks, such as weed removal or fruit picking in an orchard.
  • Challenges: Requires substantial computational resources and precise manual labeling to ensure accuracy.

Object Detection

Object detection focuses on identifying and locating objects within the point cloud data. Unlike semantic segmentation, it groups points to form bounding boxes around objects of interest. Key considerations include:

  • Bounding Boxes: 3D bounding boxes encapsulate detected objects, providing spatial information about their location and dimensions.
  • Use Cases: Essential for collision avoidance systems, warehouse inventory management, and robotic navigation.
  • Complexity: Handling overlapping objects and varying point densities can be challenging.
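A common way to represent such a box is a center, dimensions, and a yaw angle; here is a sketch of that data structure (the field names and conventions are assumptions and vary between datasets):

```python
from dataclasses import dataclass

@dataclass
class BoundingBox3D:
    """A 7-DoF 3D box (center, size, yaw), as used in many lidar datasets."""
    cx: float      # center x (m)
    cy: float      # center y (m)
    cz: float      # center z (m)
    length: float  # extent along the heading direction (m)
    width: float
    height: float
    yaw: float     # rotation around the z-axis (rad)
    label: str     # object class, e.g. "car" or "pedestrian"

box = BoundingBox3D(cx=12.4, cy=-3.1, cz=0.9,
                    length=4.5, width=1.9, height=1.6,
                    yaw=0.1, label="car")  # made-up values
```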

Tools and Software Solutions

To streamline the annotation process, several software solutions offer advanced features:

  • Segments.ai: Provides tools for efficient labeling, including support for semantic segmentation and object detection.
  • Deepen: Specializes in data labeling services and multi-sensor calibration.
  • Kognic: Provides enterprises with a flexible toolset for sensor-fusion annotation.

Leveraging annotation technologies enables annotators to achieve greater efficiency and precision in their work. Additionally, improved image viewers offered by certain software solutions can further augment the annotation process by providing enhanced visualization capabilities.

Read more on the top 8 labeling tools in 2024 for point cloud annotations.

Model-Assisted Labeling

Object tracking, segmentation, and trajectory labeling in lidar data annotation are time-consuming and complex tasks. However, leveraging model predictions can significantly accelerate the point cloud annotation process, optimizing workflow efficiency and precision.

Pre-trained models can predict object classes, positions, and dimensions within the point cloud data. This allows annotators to focus their efforts on refining these predictions rather than starting from scratch.

The use of model predictions reduces the amount of manual annotation required, saving valuable time and resources. Annotators can quickly validate or correct the predicted labels instead of annotating every object from scratch.
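Schematically, such a workflow might look like the sketch below, where pretrained_detector and the pre-label dictionary layout are hypothetical stand-ins for your own model and annotation platform:

```python
import numpy as np

def pretrained_detector(points: np.ndarray) -> list[dict]:
    """Stand-in for a real 3D detection model; returns made-up detections."""
    return [{"class_name": "car",
             "box_3d": [12.4, -3.1, 0.9, 4.5, 1.9, 1.6, 0.1],  # center, size, yaw
             "confidence": 0.87}]

def model_assisted_prelabels(points: np.ndarray) -> list[dict]:
    """Run a detector over a point cloud and format its output as pre-labels."""
    prelabels = []
    for det in pretrained_detector(points):
        prelabels.append({
            "category": det["class_name"],
            "box": det["box_3d"],
            # Keeping the score lets annotators review uncertain labels first.
            "score": det["confidence"],
        })
    return prelabels

# Annotators then validate or correct these pre-labels in the labeling tool
# instead of drawing every box from scratch.
prelabels = model_assisted_prelabels(np.zeros((1, 3)))
```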

There are three types of model-assisted labeling setups:

[Figure: bar chart showing the evolution of machine learning in labeling tools, starting with ML-powered labeling tools, followed by zero-shot model-assisted labeling, then domain-specific model-assisted labeling, and peaking at customer-specific model-assisted labeling]

Zero-shot model-assisted labeling

This method involves utilizing pre-existing models trained on large amounts of data, but not specifically for similar tasks. One example is a zero-shot implementation of SAM. While zero-shot models can yield satisfactory results, their output quality is often insufficient for a successful model-assisted labeling workflow: annotators typically spend more time correcting the predictions than they would starting from scratch.

Domain-specific model-assisted labeling

Instead of general models, the models derive knowledge from the particular domain or use case. For instance, you can use public automotive object detection models to pre-label your data, even if your data comes from different hardware or has slightly different labeling guidelines.

Customer-specific model-assisted labeling

In this approach, you utilize your own models trained on your dataset, with your own ontology and labeling guidelines. You apply the models to new data and direct the annotation workforce to correct the pre-labels. This method offers the highest ROI as human efforts in the loop are most effective, but it requires a large labeled dataset and a high-performing model.

Public Data & Models

For those interested in exploring real-world applications of these techniques, resources such as these lidar driving datasets and state-of-the-art 3D point cloud models for autonomous driving are a good starting point.

Ensuring Quality Assurance in Point Cloud Annotation

Balancing Speed and Efficiency Versus Quality and Accuracy

It is essential to have a clear definition of “quality.” Once the desired level of quality is determined, efforts should be made to achieve and assess the output against this standard.

While it is useful to include specific metrics, relying solely on an overall accuracy percentage may not capture what matters in the context of street scenery. Accuracy encompasses factors like annotation tightness and categorization precision.

Although starting with quality seems obvious, many companies prioritize the price and quantity of annotations or frames, emphasizing throughput over quality. Requests for Proposals (RFPs) and Requests for Quotations (RFQs) often inquire about labeling turnaround time and cost comparisons between annotating different quantities of frames before even discussing the actual labeling requirements.

Establish Clear and Consistent Labeling Guidelines

Develop and follow strict labeling guidelines to maintain uniformity across the dataset. For instance, define specific criteria for what constitutes an object boundary or how to handle overlapping objects.

Download your free labeling specs guide with pre-filled defaults.

Importance of Rigorous Quality Control Measures

Reliable annotations are the baseline for training effective perception models. Implementing rigorous quality control measures ensures that the labeled data meets the high standards necessary for model training. This includes regular audits, consistency checks, and adherence to predefined guidelines.

Addressing Common Challenges

Maintaining consistency and accuracy throughout the annotation pipeline is challenging due to varying annotator expertise, complex scenes, and evolving guidelines. Continuous training for annotators, updating guidelines regularly based on feedback, and using advanced tools that assist with complex annotations can help address these challenges.

Regular check-ins with all parties involved can have a significant impact. These meetings don’t have to occur weekly; monthly or on-demand sessions can suffice. The goal is to cover topics such as throughput, priorities, and any unique cases the annotators observe.

One aspect that often goes unnoticed is the utilization of features. Segments.ai consistently develops new features based on customer requests, which are then made available to users. However, not all annotators on your team may immediately understand how these features can enhance labeling speed or annotation accuracy. Discussing this with the team can improve the overall quality and efficiency of annotations.