By Tobias Cornille on April 25th, 2022
Lidar sensors use laser beams to capture the world in 3D. The sensors output 3D point clouds, which are simply collections of points in 3D. Machine learning models can be used to detect and track objects in these point clouds, or even to classify every single point (segmentation). This enables autonomous vehicles to understand their surroundings, and can also be used to make cities smarter, to create AR/VR applications, and for indoor design/real estate applications.
In this article, we give an overview of 10 public labeled lidar datasets that you can use in your autonomous driving projects. The mentioned datasets contain either 3D bounding box (cuboid) labels or segmentation labels. We’ll also show how you can create your own 3D point cloud dataset, in case the open datasets do not fit your use case or their licenses are too restrictive (only 2 of the 10 datasets can be used commercially).
KITTI is a dataset of lidar sequences of street scenes in Karlsruhe, Germany. The dataset was launched in 2012 and different labels have been added over the years.
License: Creative Commons Attribution-NonCommercial-ShareAlike 3.0
Lidar sensor: Velodyne HDL-64E
Tasks
Object detection
7,481 labeled frames, 8 categories
Information + download (account required)
Label format
Multi-object tracking
21 labeled sequences, 8 categories
Information + download (account required)
Label format
Semantic + panoptic segmentation (SemanticKITTI)
11 labeled sequences, 28 categories
Information + label format (email required)
Download
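As a concrete reference point, KITTI’s Velodyne scans are stored as flat binary files of float32 values, four per point (x, y, z, reflectance), and SemanticKITTI’s label files store one uint32 per point, with the semantic class in the lower 16 bits and the instance id in the upper 16 bits. A minimal loader with numpy:

```python
import numpy as np

def load_kitti_scan(bin_path):
    """Load a KITTI Velodyne scan as an (N, 4) float32 array
    of (x, y, z, reflectance) per point."""
    return np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)

def load_semantickitti_labels(label_path):
    """SemanticKITTI stores one uint32 per point:
    lower 16 bits = semantic class id, upper 16 bits = instance id."""
    raw = np.fromfile(label_path, dtype=np.uint32)
    semantic = raw & 0xFFFF
    instance = raw >> 16
    return semantic, instance
```

The scan and label files are index-aligned: point i in the .bin file corresponds to entry i in the .label file.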
nuScenes is a large-scale autonomous driving dataset consisting of urban street scenes captured in Singapore and Boston, U.S.
Download (account required)
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International, or acquire a commercial license
Tasks
Multi-object tracking
850 labeled sequences, 23 categories
Information
Semantic + panoptic segmentation (nuScenes-lidarseg)
850 labeled sequences, 32 categories
Information
Waymo Open is a diverse autonomous driving dataset. It includes scenes captured in 6 U.S. areas in a wide variety of environments and weather conditions.
Dataset information
Download (Google account required)
License: custom non-commercial license
Lidar sensors: 1x mid-range + 4x short-range
Tasks
Multi-object tracking
1,200 labeled sequences, 4 categories
Semantic segmentation
1,150 labeled sequences, 23 categories
A2D2 (Audi Autonomous Driving Dataset) is an autonomous driving dataset captured in 3 German cities.
Dataset information
Download
License: Creative Commons Attribution-NoDerivatives 4.0 International
Lidar sensors: 5 sensors
Tasks
Multi-object tracking
12,499 labeled frames, 14 categories
Semantic segmentation
41,280 labeled frames, 38 categories
Labels obtained from 2D semantic segmentation on camera images
Argoverse 2 is a collection of open-source autonomous driving data from 6 U.S. cities.
License: Attribution-NonCommercial-ShareAlike 4.0 International
Lidar sensors: 2x Velodyne VLP-32C
Tasks
Multi-object tracking
1,000 labeled sequences, 30 categories
Information
Download
ApolloScape is an autonomous driving dataset created by Baidu research. The dataset was collected under various lighting conditions and traffic densities in Beijing, China.
License: academic use only
Lidar sensors: 2x Riegl VMX-1HA
Tasks
Multi-object tracking
53 labeled sequences, 5 categories
Information
Download (Google/Baidu/Github account required)
PandaSet is a high-quality dataset for autonomous driving created by lidar producer Hesai. Its 100+ scenes are selected from two routes in Silicon Valley.
Dataset information
Download (account required)
License: Creative Commons Attribution 4.0 International
Lidar sensors: 1x Pandar64 + 1x PandarGT
Tasks
Multi-object tracking
28 categories
Semantic segmentation
37 categories
WADS is a dataset of 20 scenes for autonomous driving collected in severe winter weather in Michigan, U.S.
Dataset information + download
License: Attribution-NonCommercial-ShareAlike 4.0 International
Tasks
Semantic + panoptic segmentation
22 categories
Seeing Through Fog is a driving dataset that is part of the DENSE project. The data includes different weather conditions like fog, snow, and rain, and was captured in northern Europe.
Dataset information
Download (registration required)
License: custom academic license
Lidar sensor: Velodyne HDL-64E S3
Tasks
Object detection
12,000 labeled frames, 28 categories
Toronto-3D is a detailed dataset of 1km of road in Toronto, Canada.
Dataset information
Download
License: Attribution-NonCommercial 4.0 International
Lidar sensor: Teledyne Optech Maverick
Tasks
Semantic segmentation
8 categories
JRDB is a dataset collected by a social robot called JackRabbot. It features sequences from different indoor and outdoor locations on the Stanford University campus. Since the robot’s size is comparable to that of a human, the data has a different perspective than the other, car-based datasets.
Dataset information
Download (account required)
License: Attribution-NonCommercial-ShareAlike 3.0 Unported
Lidar sensors: 2x Velodyne VLP-16
Tasks
Pedestrian tracking
57,600 labeled frames
Label format
Every machine learning system needs the right data to perform well. Public datasets can help you experiment quickly, but they are often not suited for training your final models. The ML models might perform worse when you deploy them, because the data in public datasets might be recorded in a different environment (country, weather conditions), or because your sensor set-up is different. To avoid this performance loss, we’ll show you how you can create your own 3D point cloud dataset.
Creating a dataset requires three steps:
Data collection
Data collection involves acquiring the right tool to capture new data, e.g. a vehicle with a lidar sensor, and then going out to capture it. You should capture data in an environment that matches the real production environment as closely as possible.
Data selection/curation
Next, you often have to select which captured data you want to include in the dataset, as it can be infeasible and inefficient to use all captured data. Here, it is important to choose diverse data that covers all the different scenarios you captured. Discarding redundant or uninteresting data can speed up labeling and improve model performance.
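One simple way to pick diverse frames (a sketch of our own, not a prescribed method) is greedy farthest-point sampling over per-frame feature vectors, which could be basic scene statistics or embeddings from a pretrained model. Each step selects the frame farthest from everything already chosen:

```python
import numpy as np

def select_diverse_frames(features, k):
    """Greedily pick k frames whose feature vectors are maximally
    spread out (farthest-point sampling).
    `features` is an (n_frames, dim) array of per-frame descriptors."""
    selected = [0]  # seed the selection with the first frame
    # distance from every frame to its closest already-selected frame
    dists = np.linalg.norm(features - features[0], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))  # farthest remaining frame
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return selected
```

Near-duplicate frames (e.g. a car waiting at a red light) end up close together in feature space and are naturally skipped.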
Data labeling
Finally, you have to label your selected data. For object detection/tracking, this means drawing 3D bounding boxes (cuboids) around the objects you want to detect. For segmentation, you have to annotate the individual points in your point clouds. This can be a tedious and time-consuming process, but with the right tools you can speed up your labeling significantly.
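To illustrate how cuboid labels relate to per-point labels, here is a small sketch (our own illustration, not any specific tool’s implementation) that marks the points falling inside a cuboid defined by a center, size, and yaw rotation:

```python
import numpy as np

def points_in_cuboid(points, center, size, yaw):
    """Return a boolean mask of the (N, 3) points that lie inside a
    cuboid with center (x, y, z), size (length, width, height), and
    rotation `yaw` around the vertical z-axis."""
    # translate points into the box frame, then undo the box's rotation
    c, s = np.cos(-yaw), np.sin(-yaw)
    shifted = points - np.asarray(center, dtype=float)
    local = shifted.copy()
    local[:, 0] = c * shifted[:, 0] - s * shifted[:, 1]
    local[:, 1] = s * shifted[:, 0] + c * shifted[:, 1]
    half = np.asarray(size, dtype=float) / 2.0
    return np.all(np.abs(local) <= half, axis=1)
```

Masks like this are how cuboid annotations can be turned into coarse per-point labels, e.g. to bootstrap a segmentation dataset from tracking labels.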
Segments.ai has dedicated labeling interfaces for 3D point cloud data. If you work with sequential data, you can use our interpolation feature to label faster. To speed up your labeling even further, Segments.ai also allows you to set up model-assisted workflows, where you train an initial model on a small set of labeled data, and then use the model to help label the complete dataset. Finally, you can choose whether to label the data in-house or work with an external workforce.
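As a rough sketch of what interpolating between keyframes involves (a simplification, not the platform’s actual implementation), a cuboid’s center and yaw can be linearly interpolated between two manually labeled keyframes, taking the shortest angular path for the yaw:

```python
import numpy as np

def interpolate_cuboid(key_a, key_b, t):
    """Linearly interpolate a cuboid between two keyframes.
    Each keyframe is (center, yaw); t in [0, 1] is the position
    between them. Yaw follows the shortest angular path."""
    center = (1 - t) * np.asarray(key_a[0], dtype=float) \
           + t * np.asarray(key_b[0], dtype=float)
    # wrap the yaw difference into (-pi, pi] before interpolating
    d = (key_b[1] - key_a[1] + np.pi) % (2 * np.pi) - np.pi
    yaw = key_a[1] + t * d
    return center, yaw
```

With keyframes every n frames, the in-between cuboids only need a quick review instead of being drawn from scratch.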
Check out SemanticKITTI on Segments.ai. You can also try the platform for free for 14 days, or book a demo. We’re always happy to see if Segments.ai is the right fit for your use case, so do not hesitate to get in touch.
A showcase of Segments.ai’s lidar interface
Autonomous vehicles use lidar sensors to see the world around them in 3D. To detect objects and understand the scene, we need 3D point cloud datasets. In this article, we highlighted 10 lidar datasets for autonomous driving. The datasets can be used for tasks such as 3D object detection, 3D multi-object tracking and segmentation (MOTS), and 3D point cloud segmentation.
If you want to create a machine learning model for a different application, if you want to use different categories, or if your data in production differs from the data in the public datasets, you’ll have to create your own dataset. For this, you need to collect, curate, and label data. For lidar data, Segments.ai is the best tool for labeling your data and managing the labeling workforce.
Hope this was useful! If you have any questions or suggestions, feel free to send us an email at support@segments.ai