Introducing Text Labeling

February 8th, 2024

Our platform has great labeling tools for computer vision. As multimodal learning becomes increasingly important, even computer vision teams sometimes need to label other kinds of data, such as text or audio. To support our users' labeling needs beyond computer vision, we added two new interfaces for text labeling: named entity recognition and span categorization.

Named Entity Recognition (NER)

Named entity recognition is the task of locating words and phrases in a text and classifying them into non-overlapping categories such as names, organizations, and locations. Each word can belong to at most one category. For example, the sentence “James bought 30 shares of Apple in 2020.” contains the following named entities:

  • James: person
  • Apple: organization
  • 2020: time
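To make the idea concrete, here is a minimal Python sketch of how these entities could be stored as character-offset spans and checked for overlap. The `Entity` class and both helper functions are hypothetical names chosen for illustration; they are not part of our SDK or export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # index of the first character of the span
    end: int    # index one past the last character of the span
    label: str


def find_entity(text: str, phrase: str, label: str) -> Entity:
    """Locate the first occurrence of `phrase` in `text` and tag it."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return Entity(start, start + len(phrase), label)


def has_overlap(entities: list[Entity]) -> bool:
    """Return True if any two spans share at least one character."""
    ordered = sorted(entities, key=lambda e: e.start)
    # Once sorted by start, any overlap shows up between neighbours.
    return any(a.end > b.start for a, b in zip(ordered, ordered[1:]))


sentence = "James bought 30 shares of Apple in 2020."
entities = [
    find_entity(sentence, "James", "person"),
    find_entity(sentence, "Apple", "organization"),
    find_entity(sentence, "2020", "time"),
]

for e in entities:
    print(sentence[e.start:e.end], "->", e.label)
print("overlapping spans:", has_overlap(entities))
```

Because NER categories do not overlap, `has_overlap` should report no overlapping spans for a valid annotation like this one.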

Span Categorization

When words can have multiple overlapping labels, you can use our more general span categorization interface. For example, if we also want to label parts of speech and grammatical roles, the annotated sentence becomes:

  • James: person, noun, subject
  • Apple: organization, noun, object
  • 2020: time
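The difference from NER can be sketched in a few lines of Python: the same character range may now carry several labels. The tuple layout below is an assumption made for illustration, not the format our platform exports.

```python
from collections import defaultdict

sentence = "James bought 30 shares of Apple in 2020."

# (phrase, label) pairs taken from the example above
annotations = [
    ("James", "person"), ("James", "noun"), ("James", "subject"),
    ("Apple", "organization"), ("Apple", "noun"), ("Apple", "object"),
    ("2020", "time"),
]

# Convert each phrase into a (start, end, label) span.
spans = []
for phrase, label in annotations:
    start = sentence.index(phrase)
    spans.append((start, start + len(phrase), label))

# Group labels by character range: one range can hold many labels.
labels_by_range = defaultdict(list)
for start, end, label in spans:
    labels_by_range[(start, end)].append(label)

for (start, end), labels in labels_by_range.items():
    print(sentence[start:end], "->", ", ".join(labels))
```

Storing one span per label keeps the data flat, and grouping by range recovers the per-word view shown in the list above.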

How to start

To start labeling data with our text interfaces:

  1. Upload your dataset through the web platform or the Python SDK. The web platform supports bulk uploads in TXT, CSV, and JSON formats.
  2. Label the text with your mouse or using the (customizable) hotkeys.
  3. Create a release and download your labeled dataset.
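As a rough illustration of preparing data for step 1, the sketch below turns a list of text samples into CSV content using only the Python standard library. The exact column layout the platform expects is not specified here, so the single `text` column and the `texts_to_csv` helper are assumptions; check the upload dialog or documentation for the real layout.

```python
import csv
import io


def texts_to_csv(texts: list[str]) -> str:
    """Serialize text samples as CSV, one sample per row.

    The header name "text" is a placeholder, not a documented requirement.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["text"])
    for sample in texts:
        writer.writerow([sample])  # csv handles quoting of commas and quotes
    return buffer.getvalue()


samples = [
    "James bought 30 shares of Apple in 2020.",
    'She said "hello", then left.',
]
csv_content = texts_to_csv(samples)
print(csv_content)
```

Using the `csv` module rather than joining strings by hand means samples that contain commas or quotation marks stay in a single cell.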

Let us know what you think of our text interfaces so we can make them even better in the future! And don’t hesitate to contact us if you have any questions.