Annotating Ground Truth in Computer Vision: Vector vs. Segmentation Annotations

April 16th, 2024

Accurately interpreting the environment is crucial in computer vision, especially in fields like autonomous vehicles and robotics. This requires assigning meaning to dataset elements, known as annotations or data labels. The two primary methods to annotate ground truth data are vector and segmentation (or pixel) annotations, each with unique strengths and applications. But which one should you go with?

The TL;DR is that it doesn’t matter much: you can easily convert one format into the other. The real trade-off is between budget (bounding boxes are cheaper) and accuracy (polygons or segmentation masks).

Don’t choose an annotation type because your ML model said so.

A common misunderstanding is that the choice of annotation type should directly reflect a machine-learning model’s input or output format. But with the right tools, you can easily convert a vector to a segmentation bitmap and vice versa. Don’t believe us? Just peek at how easy it is to export to different formats.

This flexibility in converting between annotation formats means that the initial choice of annotation type does not limit the machine learning model’s design or data requirements. Instead, the focus should be on selecting the annotation method that best captures the necessary detail and accuracy for the task at hand, considering the ease of conversion for model feeding or visualization purposes.
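As an illustration, converting between a bounding box and a binary mask is a few lines of NumPy. The function names here are our own sketch, not a specific library’s API:

```python
import numpy as np

def bbox_to_mask(bbox, height, width):
    """Rasterize an axis-aligned box (x_min, y_min, x_max, y_max) into a binary mask."""
    x0, y0, x1, y1 = bbox
    mask = np.zeros((height, width), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

def mask_to_bbox(mask):
    """Recover the tightest axis-aligned box around the foreground pixels of a mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    # +1 so the box follows the half-open (x_min, y_min, x_max, y_max) convention
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)

mask = bbox_to_mask((2, 3, 7, 9), height=12, width=12)
print(mask_to_bbox(mask))  # round-trips to (2, 3, 7, 9)
```

Going from a free-form mask back to a polygon is more involved (contour tracing), but the point stands: the formats are interchangeable.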


When To Use Vector Annotations?

What are vector annotations?

With vector annotations, we label the outline of an object with a series of connected points, forming shapes such as keypoints, polylines, bounding boxes, and polygons. That list is also roughly ordered from cheapest to most expensive in terms of labeling effort.

For example, when annotating with bounding boxes, unintended objects can be included, leading to potential misclassification by the machine learning model. Using polygons can prevent this by precisely masking the target object at the pixel level. But this added detail comes at a price in working hours.

Default output: A list of coordinates assigned to a class or category.
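Concretely, such an output can be modeled as a small record. The field names below are illustrative, not a specific export format:

```python
from dataclasses import dataclass

@dataclass
class VectorAnnotation:
    category: str           # class label, e.g. "car"
    points: list            # (x, y) coordinates defining the shape
    shape: str = "polygon"  # "keypoint", "polyline", "bbox", or "polygon"

# A bounding box is just two corner points; a polygon lists every vertex.
box = VectorAnnotation(category="car", points=[(10, 20), (50, 80)], shape="bbox")
poly = VectorAnnotation(category="tree", points=[(3, 4), (9, 4), (6, 12)])
```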


Bounding boxes simplify object tracking across sequences.

When to annotate data with vectors?

  1. To track objects: When you must distinguish and track objects within the same class, such as counting and tracking objects in the scene.

  2. High-level abstraction: When you need a high-level abstraction of the scene, making it easy to understand the spatial relations between objects.

  3. Object detection: When the goal is to identify and locate objects within a scene without needing pixel-level detail, vector annotation with bounding boxes or polygons is often sufficient.

  4. Path planning: In robotics and autonomous vehicles, vector annotations can define paths, lanes, and navigable spaces without the computational overhead of segmentation.

  5. Simplified representations: When models need to run on edge devices or in real-time applications where computational resources are limited, vector annotations provide a balance between detail and efficiency.
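As a concrete example of the tracking use case: a box annotated on two keyframes can be linearly interpolated for the frames in between, so annotators only touch a fraction of the frames. This toy function is our own sketch, not a specific tool’s API:

```python
def interpolate_boxes(box_a, box_b, num_frames):
    """Linearly interpolate a tracked box between two keyframes.

    box_a / box_b are (x_min, y_min, x_max, y_max) at the first and last
    frame; returns one box per frame, endpoints included.
    """
    boxes = []
    for i in range(num_frames):
        t = i / (num_frames - 1)
        boxes.append(tuple(round(a + t * (b - a), 2) for a, b in zip(box_a, box_b)))
    return boxes

# The box slides smoothly from one keyframe to the other over 5 frames.
track = interpolate_boxes((0, 0, 10, 10), (20, 20, 30, 30), num_frames=5)
```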

You can do instance segmentation through polygon annotations to identify and pinpoint every distinct object within an image or video. This goes beyond traditional object detection with bounding box coordinates by capturing precise object boundaries.
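To turn such a polygon into a per-instance pixel mask, you rasterize it. This sketch assumes matplotlib is available and uses its `Path.contains_points` point-in-polygon test:

```python
import numpy as np
from matplotlib.path import Path  # assumption: matplotlib is installed

def polygon_to_instance_mask(polygon, height, width):
    """Rasterize a polygon (list of (x, y) vertices) into a boolean pixel mask."""
    # Sample at pixel centers so boundary behaviour is unambiguous
    xs, ys = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pts = np.column_stack([xs.ravel(), ys.ravel()])
    inside = Path(polygon).contains_points(pts)
    return inside.reshape(height, width)

square = [(2, 2), (6, 2), (6, 6), (2, 6)]
mask = polygon_to_instance_mask(square, height=10, width=10)
print(int(mask.sum()))  # 16 pixels: the 4x4 block of centers strictly inside
```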

When to use Segmentation Annotation?

What is segmentation annotation?

Segmentation, or pixel annotation, assigns a label to each image pixel, accurately detailing the scene. Instead of drawing the boundaries of objects, each pixel is classified according to the object or category it belongs to.

In data segmentation, we often speak of instance, semantic and panoptic segmentation.

With instance segmentation, we identify individual objects. In contrast, semantic segmentation classifies each pixel into a category without distinguishing between objects. Panoptic segmentation combines both: each individual object is identified and assigned a category.
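That combination can be sketched numerically. A common convention (COCO’s panoptic format uses a similar idea) is to encode category and instance ids into a single id per pixel; the exact scheme below is illustrative:

```python
import numpy as np

def combine_panoptic(semantic, instance, max_instances=1000):
    """Encode a panoptic mask as category_id * max_instances + instance_id.

    `semantic` holds a category id per pixel, `instance` an object id per
    pixel (0 for "stuff" regions without distinct instances).
    """
    return semantic * max_instances + instance

semantic = np.array([[1, 1], [2, 2]])   # e.g. 1 = car, 2 = road
instance = np.array([[1, 2], [0, 0]])   # two distinct cars; road has no instances
panoptic = combine_panoptic(semantic, instance)
print(panoptic)  # [[1001 1002] [2000 2000]]
```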

Default output: A pixel mask with each pixel colored according to its class.
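Producing such a colored visualization from a mask of class ids is a simple palette lookup. The class ids and colors below are made-up examples:

```python
import numpy as np

# Hypothetical palette: one RGB colour per class id
PALETTE = np.array([
    [0, 0, 0],        # 0: background
    [128, 64, 128],   # 1: road
    [0, 0, 142],      # 2: vehicle
], dtype=np.uint8)

def colorize(mask):
    """Turn an (H, W) array of class ids into an (H, W, 3) RGB image."""
    return PALETTE[mask]

mask = np.array([[0, 1], [2, 1]])
print(colorize(mask).shape)  # (2, 2, 3)
```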

When to annotate data with segmentation masks?

  1. Precise object identification: It is ideal for precisely identifying diverse objects in a scene, like roads, trees, and vehicles; each pixel is linked to exactly one class.

  2. Detailed scene understanding: For autonomous vehicles, understanding the exact outline of roads, pedestrians, vehicles, and obstacles is crucial for safe navigation.

  3. Object interaction: Knowing the precise boundaries of objects, especially for manipulative tasks, helps plan how to interact with them physically.

  4. Complex environments: In cluttered or highly dynamic environments, segmentation provides the nuanced understanding needed to make informed decisions.

Both approaches are not mutually exclusive.

In fact, both approaches are often combined in the development process of autonomous systems.

However, with the advancements of new machine learning techniques, you can achieve better, faster, and more accurate results with data segmentation. The question is how far your labeling budget allows for precise data segmentation.

Tools that can speed up segmentation

… and limit the budgetary constraints.

Segmentation annotation, which involves classifying each pixel, is a significant challenge due to its meticulous nature. This drives up the price charged by labeling workforces, or the time internal teams spend on annotation.

The tools below were created to semi-automate the creation of labels that can then be easily corrected by expert annotators.

Superpixel 2.0

Superpixel 2.0 takes the traditional approach of Superpixel a step further with the help of machine learning. Where Superpixel groups pixels with similar properties, such as colors, Superpixel 2.0 also recognizes shapes to improve accuracy further.

This works great for larger objects in the scene, though the granularity can be increased when needed.

Read more on the importance of Superpixel 2.0 here.
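To give an intuition for what “grouping pixels with similar properties” means, here is a toy k-means clustering on color and position. This is only a didactic sketch, not the actual Superpixel 2.0 algorithm:

```python
import numpy as np

def toy_superpixels(image, k=8, iters=10, pos_weight=0.5):
    """Group pixels into k clusters by colour and (weighted) position."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Feature vector per pixel: RGB plus scaled x/y coordinates
    feats = np.column_stack([
        image.reshape(-1, 3).astype(float),
        pos_weight * xs.ravel(),
        pos_weight * ys.ravel(),
    ])
    centers = feats[[0, len(feats) - 1]]  # deterministic init: two corner pixels
    for _ in range(iters):
        # Assign each pixel to its nearest cluster center, then update centers
        dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return labels.reshape(h, w)

# A half-black, half-white image splits cleanly into two groups.
image = np.zeros((16, 16, 3), dtype=np.uint8)
image[:, 8:] = 255
labels = toy_superpixels(image, k=2)
```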


Autosegment

Superpixel 2.0 is useful for larger objects because it automatically recognizes shapes. Autosegment is an excellent complement to that because you can draw a box around an object, and it will create a mask for the (smaller) object with more accuracy. Autosegment also works better on high-resolution images.

More info on Autosegment can be found here

Segment Anything Model (SAM)

Trained on a massive dataset of over 11 million images, SAM by Meta makes it easy to segment any object in a single click. Although that might look like the holy grail, tests by our team and customers show that, while SAM is useful in some cases, the right combination of the two ML-assisted tools above gets you faster and higher-quality results.


Do you already have a more specific model trained for your dataset? You can also use your own models to automatically create the labels. After uploading the prelabeled data, you can easily correct any prediction errors with the smart labeling tools mentioned earlier.


Accurately annotating ground truth data in computer vision is essential for training robust and effective models. The decision between vector annotations and segmentation annotations hinges not on the capabilities of machine learning models but on the specific requirements of the project. Vector annotations balance efficiency and detail, which is ideal for object tracking or high-level abstractions. Segmentation annotations, on the other hand, provide detailed scene understanding and precise object identification and are invaluable in complex environments.

The choice between these methods should be guided by the project’s goals, the required level of detail, and budget constraints. With advancements in tools and technologies, the boundaries between these annotation types blur, offering more flexibility and efficiency in the annotation process. Whether you opt for vector annotations for their simplicity and speed or segmentation annotations for their precision and depth, the key is to leverage the strengths of each method to meet your project’s unique needs.