In this article, we’ll look at what panoptic segmentation is, which public datasets exist, and how you can create your own panoptic segmentation dataset.
Panoptic segmentation is the combination of instance segmentation and semantic segmentation. It was first introduced in a 2018 paper by Kirillov et al.
Instance segmentation means detecting and masking each distinct object of interest in a scene. For example, in this image, you can see that each car and each person is detected and has a segmentation mask.
Semantic segmentation is the task of assigning a class label (car, person, vegetation, ...) to each point/pixel in the scene. As you can see in this image, all cars now share a single “car” label and all persons a single “person” label, with no distinction between individual instances.
Thus, panoptic segmentation means each point is assigned a class label and an instance label. A point can mean a pixel in a regular 2D image or a point in a 3D point cloud.
In panoptic segmentation, an instance can either represent a distinct thing or a region of stuff. Things are countable objects such as pedestrians, animals, or cars, while stuff represents uncountable amorphous regions such as the sky or grass. For example, in this image, the different cars are labeled as different things and are thus separate instances. Meanwhile, the road is seen as stuff and is thus labeled as a single instance.
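In practice, the class label and instance label are often packed into a single panoptic id per pixel. Here is a minimal sketch in NumPy, assuming a hypothetical encoding of `class_id * 1000 + instance_id` (the COCO panoptic API uses a similar offset-based scheme):

```python
import numpy as np

# Assumed encoding for illustration: panoptic_id = class_id * OFFSET + instance_id
OFFSET = 1000

# A tiny 2x2 "image": 7 = road (stuff), 1 = car (thing); ids are made up
semantic = np.array([[7, 7], [1, 1]])
instance = np.array([[0, 0], [1, 2]])  # the stuff region shares instance id 0

panoptic = semantic * OFFSET + instance

# Both labels can be recovered from the combined id
assert np.array_equal(panoptic // OFFSET, semantic)
assert np.array_equal(panoptic % OFFSET, instance)
```

Because stuff regions form a single instance per class, they all get the same instance id, while each thing gets its own.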
Panoptic segmentation labels provide more context than instance segmentation labels and are more detailed than semantic segmentation labels. This makes them very useful for scene understanding applications in ML systems.
Panoptic segmentation can be used for autonomous vehicles (AV), healthcare, smart cities, geoscience, and more. Autonomous vehicles are a great application for panoptic segmentation, as driving data naturally includes distinct objects such as cars, as well as regions like the road and sidewalks. Recently, Andrej Karpathy, Tesla’s director of AI, shared this example of their panoptic segmentation model.
“1/3 Some panoptic segmentation eye candy 🌈🤩 from a new project we are bringing up. These are too raw to run in the car, but feed into auto labelers. Collaboration of data labeling a large (100K+), clean, diverse, multicam+video dataset and engineers who train the models” pic.twitter.com/RTERAxyRO0 — Andrej Karpathy (@karpathy), November 30, 2021
In order to train a panoptic segmentation model, we first need labeled training data. We can either use a publicly available dataset or create our own dataset. We’ll first look at which public datasets are available for both 2D images and 3D point cloud data.
COCO 2020 Panoptic Segmentation Task
COCO is a large dataset of common objects in their context. It features over 200K labeled images of objects such as different kinds of animals, appliances, food, and much more. The panoptic task uses 80 thing categories as well as several stuff categories.
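COCO distributes its panoptic annotations as PNG files in which each segment id is spread across the color channels (id = R + 256·G + 256²·B, as implemented in the official panopticapi). A small sketch of decoding such pixels, using a fake two-pixel array in place of a real annotation file:

```python
import numpy as np

def rgb2id(color):
    """Decode COCO panoptic PNG pixels (R, G, B) into segment ids.

    COCO stores each segment id across the three color channels:
    id = R + 256 * G + 256**2 * B.
    """
    color = np.asarray(color, dtype=np.uint32)
    return color[..., 0] + 256 * color[..., 1] + 256 * 256 * color[..., 2]

# A fake 1x2 "panoptic PNG": two pixels belonging to segments 42 and 300
png = np.array([[[42, 0, 0], [44, 1, 0]]], dtype=np.uint8)
ids = rgb2id(png)
```

Each decoded segment id then maps to a class and an is-thing flag via the accompanying JSON annotation file.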
BDD100K Panoptic Segmentation
BDD is a large driving video dataset captured in different cities in the US. It consists of 100,000 videos of roughly 40 seconds each, of which 10,000 have pixel-wise annotations. The annotations use 10 thing categories (mainly for non-stationary objects) and 30 stuff categories.
Cityscapes Panoptic Semantic Labeling Task
Cityscapes is a dataset of urban street scenes captured by a vehicle in 50 German cities. The dataset includes 5000 diverse frames with high-quality pixel-level annotation, taken from stereo video sequences. There are 10 thing categories including cars, persons, etc., and 20 stuff categories such as ground, sky, and vegetation.
Mapillary Vistas Dataset v2.0
The Mapillary Vistas Dataset consists of street-level imagery captured on 6 different continents. It features 25,000 images annotated with 70 thing and 54 stuff categories. An important difference from the other datasets is that the annotations consist of polygons rather than bitmaps.
Pastis: Panoptic Agricultural Satellite TIme Series
Pastis is a dataset of agricultural satellite images. It contains 2,433 variable-length time series of multispectral images. In the images, 18 different kinds of parcels are annotated with their respective crop types.
Panoptic segmentation can also be used with 3D point cloud data. This data is often gathered using a lidar sensor or a stereo camera.
SemanticKITTI Panoptic Segmentation
SemanticKITTI is a dataset of lidar sequences of street scenes in Karlsruhe (Germany). It contains 11 driving sequences with panoptic segmentation labels. The labels use 6 thing and 16 stuff categories.
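SemanticKITTI stores one uint32 label per lidar point, with the semantic class in the lower 16 bits and the instance id in the upper 16 bits. A minimal sketch of unpacking such labels (the packed example value is made up for illustration):

```python
import numpy as np

def split_label(label):
    """Unpack SemanticKITTI point labels.

    Each point's label is a uint32: lower 16 bits = semantic class,
    upper 16 bits = instance id.
    """
    semantic = label & 0xFFFF
    instance = label >> 16
    return semantic, instance

# e.g. a point belonging to instance 5 of class 10 ("car" in SemanticKITTI)
packed = np.array([(5 << 16) | 10], dtype=np.uint32)
sem, inst = split_label(packed)
```

In a real pipeline the packed array would come from `np.fromfile(path, dtype=np.uint32)` on a `.label` file.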
nuScenes-lidarseg

nuScenes is a large-scale autonomous driving dataset. It consists of 1000 urban street scenes of 20 seconds each, captured in Singapore and Boston. The dataset includes point clouds captured by a lidar sensor, as well as synchronized camera data. The nuScenes-lidarseg annotations use 23 thing and 9 stuff classes.
ScanNet

ScanNet is an RGB-D video dataset of indoor scenes containing 2.5 million views in 1513 scans. It uses 38 thing categories for items and furniture in the rooms and 2 stuff categories (wall and floor). It is not a complete panoptic dataset, as the labels only cover about 90% of all surfaces.
Using the right dataset is essential in order to build a performant ML system. Public datasets can help you quickly experiment, but they might not be suited to train your final models. This can be because the type of data in the public datasets is different, because you need to use different categories, or because of domain shift. Therefore, it is often necessary to build your own dataset.
Creating a dataset requires three steps: data collection, data curation, and data labeling.
Data collection involves acquiring the right tool to capture new data, e.g. a vehicle with a lidar sensor, and going out and capturing the data. You should capture data in an environment that matches the real production environment as closely as possible.
Next, you often have to select which captured data you want to include in the dataset, as it can be infeasible and inefficient to use all captured data. Here, it is important to choose diverse data that covers all the different scenarios you captured.
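One way to implement this selection step is farthest-point sampling over per-frame feature embeddings, so that each newly chosen frame is as different as possible from the ones already picked. A hedged sketch, where the 2D embeddings are a toy stand-in for e.g. image features from a pretrained network:

```python
import numpy as np

def select_diverse(embeddings, k):
    """Greedily pick k diverse frames via farthest-point sampling."""
    chosen = [0]  # start from an arbitrary frame
    # Distance from every frame to the nearest chosen frame
    dist = np.linalg.norm(embeddings - embeddings[0], axis=1)
    for _ in range(k - 1):
        idx = int(dist.argmax())  # frame farthest from the chosen set
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(embeddings - embeddings[idx], axis=1))
    return chosen

# Four toy frames: two near-duplicate pairs; picking 2 should span both clusters
frames = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
subset = select_diverse(frames, 2)
```

With real data you would swap the toy coordinates for embeddings computed by a feature extractor of your choice.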
Finally, you have to label your selected data. For panoptic segmentation, this means creating segmentation masks for each instance and each background region. This can be a tedious and time-consuming process, but with the right tools you can speed up your labeling significantly.
Segments.ai was built specifically for segmentation, so it’s the perfect tool for creating panoptic segmentation labels. Using Segments.ai’s superpixel tool, you can create segmentation masks with a single click. Segments.ai also allows you to set up a model-assisted workflow, where you train an initial model on a small set of labeled data, and then use the model to help label the complete dataset. Finally, you can choose whether to label the data in-house or work with an external workforce.
A showcase of Segments.ai’s superpixel tool
Panoptic segmentation is a segmentation task that combines instance segmentation and semantic segmentation. Panoptic segmentation labels provide holistic information about a scene and thus help ML models understand it. Panoptic segmentation can be used both for 2D image data and for 3D point clouds (lidar or RGB-D).
There are a number of public panoptic segmentation datasets. Most of them consist of urban driving imagery and are thus suited for autonomous vehicle applications. There are also datasets of common everyday objects.
You’ll have to create your own dataset if you want to create a panoptic segmentation model for a different application, if you want to use different categories, or if your data in production differs from the data in the public datasets. For this, you need to collect, curate, and label data. For panoptic segmentation, Segments.ai is the best tool for labeling your data and managing the labeling workforce.
Hope this was useful! If you have any questions or suggestions, feel free to send me an email at firstname.lastname@example.org