Evaluating SAM for image segmentation labeling

June 9th, 2023 - 2 min

In robotics and autonomous driving, accurate image segmentation is crucial for extracting meaningful information from visual data. As computer vision experts, we understand the challenges faced when labeling vast amounts of data and the need for efficient tools that streamline the process without compromising quality.

In April 2023, Meta unveiled SAM (Segment Anything Model), a model developed specifically for image segmentation. True to our company name, we immediately integrated SAM into our labeling interface as an edit mode in the form of a hover-and-click tool, alongside our existing suite of tools for smooth and efficient image segmentation.

After two months of real-world usage, it is time to review SAM’s performance and compare it with our established Superpixel and Autosegment features.

SAM’s performance

SAM generates accurate segmentation masks for everyday objects in images, such as vehicles. This simplifies the annotation process for computer vision engineers and data labeling teams. However, as we delved deeper into SAM, some areas for improvement emerged.

A single click with SAM rarely produced a refined outline of an object; getting there often required considerable effort. When using the tool, you can hover over an object to grow or shrink the exact part you wish to select. While this feature intuitively seems like it would offer enhanced control over the segmentation process, attempts to improve precision often led to unnecessary pixels being included in the object’s mask.

Because these unnecessary pixels can appear anywhere in the image, they risk going unnoticed and can have a large impact on the quality of the labeling.

Source: sidewalk-semantic dataset
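One way to catch such stray pixels is to split a mask into connected components and flag every component that does not contain the clicked point. The sketch below is illustrative only and is not part of our labeling interface: the function names, the 4-connectivity choice, and the tiny hand-written mask are all assumptions made for the example.

```python
from collections import deque


def connected_components(mask):
    """Split a binary mask (list of rows of 0/1) into 4-connected components.

    Returns a list of components, each a list of (row, col) pixels.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited mask pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def stray_fragments(mask, click):
    """Return the components that do not contain the clicked (row, col) pixel."""
    return [comp for comp in connected_components(mask) if click not in comp]


# Hypothetical mask: a 2x2 object plus one isolated pixel in the far corner.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
strays = stray_fragments(mask, click=(1, 1))
print(strays)  # [[(3, 5)]]
```

A check like this only surfaces disconnected fragments for review. It does not help when the extra pixels are attached to the object itself.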

A potential improvement here could be the new SAM-HQ model, which equips SAM with the ability to detect the outlines of an object more accurately, while maintaining SAM’s original strengths.

Class distinction

Another area for improvement is the tool’s ability to differentiate between similar-looking objects or regions in an image. SAM sometimes struggled to distinguish between items such as grass and mulch, a tree trunk and its canopy, or the road and the sidewalk. While this might appear to be an isolated issue given the wide variety of classes we deal with, such nuances are vital for high-quality image segmentation.

Source: cityscapes dataset

Efficiency considerations

In terms of time efficiency, SAM lagged behind the tools already in our arsenal. Cleaning up inaccurate object edges after using the hover-and-click feature almost doubled the labeling time compared with tools such as Superpixel and Autosegment.

Segment Anything Model
Time to label: 4m15, 7m08, 3m40, 4m20

Superpixel and Autosegment
Time to label: 2m30, 2m40, 1m50, 3m30
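As a quick sanity check on the "almost doubled" claim, the timings listed above can be summed per tool and compared. This is a minimal sketch: the helper name `to_seconds` is ours, and the comparison uses totals only, without assuming which SAM timing corresponds to which Superpixel and Autosegment timing.

```python
def to_seconds(label):
    """Convert a timing such as '4m15' into a number of seconds."""
    minutes, seconds = label.split("m")
    return int(minutes) * 60 + int(seconds)


# Timings as reported in this article.
sam_times = ["4m15", "7m08", "3m40", "4m20"]
baseline_times = ["2m30", "2m40", "1m50", "3m30"]

sam_total = sum(to_seconds(t) for t in sam_times)
baseline_total = sum(to_seconds(t) for t in baseline_times)
ratio = sam_total / baseline_total

print(f"SAM: {sam_total} s")                             # SAM: 1163 s
print(f"Superpixel and Autosegment: {baseline_total} s")  # Superpixel and Autosegment: 630 s
print(f"Ratio: {ratio:.2f}")                              # Ratio: 1.85
```

Over these four images, labeling with SAM took roughly 1.85 times as long in total.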

Like any tool in its early stages, SAM shows immense promise but also has room for improvement. Still, integrating SAM into our labeling interface is a significant step towards providing computer vision developers with a powerful and efficient tool for their labeling needs.

Special thanks to Jose Rendon Leyva, Field Operations Manager at Scythe Robotics, for sharing his feedback with us on data labeling with the Segment Anything Model.
