
OCR Invoices Pre-NER BIO Format

This template covers OCR text extraction and tokenization in BIO format for invoice documents. Every token is initially tagged ‘O’ (Outside), so that entity tags can be assigned in a subsequent NER tagging pass.
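As a rough illustration of the pre-NER step described above, the following Python sketch turns a list of OCR words into token lines that all carry the ‘O’ tag. The function name to_bio_lines, the whitespace tokenization, and the "token tag" line layout are assumptions made for this example, not something the template prescribes.

```python
def to_bio_lines(ocr_words):
    """Return one 'token O' line per whitespace-separated token.

    ocr_words is assumed to be a list of strings produced by an OCR engine.
    """
    lines = []
    for word in ocr_words:
        # str.split() with no arguments drops empty and whitespace-only pieces.
        for token in word.split():
            # Every token starts as 'O' (Outside); entity tags come later.
            lines.append(f"{token} O")
    return lines


sample = ["Invoice", "No. 1042", "Total:", "  ", "$250.00"]
print("\n".join(to_bio_lines(sample)))
```

Starting every token at ‘O’ keeps the file valid BIO from the outset, so a later NER pass only needs to overwrite the tags of tokens that belong to an entity.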

Labeling Configuration

<View>
  <!-- The image to annotate -->
  <Image name="image" value="$image" zoomControl="true"/>

  <!-- Bounding-box control that will receive the "rectanglelabels" results
       coming from your OCR model (from_name = "label") -->
  <RectangleLabels name="label" toName="image" choice="single">
    <!-- The OCR model emits only the generic "O" class; add more labels here if needed -->
    <Label value="O" background="#FFA500"/>
  </RectangleLabels>

  <!-- Per-region transcription box (from_name = "transcription").
       Because perRegion="true", one TextArea is linked to each rectangle. -->
  <TextArea name="transcription"
            toName="image"
            perRegion="true"
            editable="true"
            rows="1"
            required="true"
            placeholder="Type or correct OCR text…"/>
</View>
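The comments in the configuration refer to results coming from an OCR model. A minimal Python sketch of how such pre-annotations might be assembled is shown here. It assumes each OCR token arrives as a (text, (x, y, width, height)) tuple in pixels; the helper names, the example URL, and the model_version string are hypothetical. The from_name and to_name values match the configuration above, and coordinates are converted to percentages of the image size, which is how Label Studio expresses region geometry.

```python
import json


def to_percent_box(box, img_w, img_h):
    """Convert a pixel box (x, y, width, height) to percentage units."""
    x, y, w, h = box
    return {
        "x": 100.0 * x / img_w,
        "y": 100.0 * y / img_h,
        "width": 100.0 * w / img_w,
        "height": 100.0 * h / img_h,
        "rotation": 0,
    }


def build_task(image_url, tokens, img_w, img_h):
    """Build one task with a rectangle and a transcription per OCR token."""
    results = []
    for i, (text, box) in enumerate(tokens):
        geometry = to_percent_box(box, img_w, img_h)
        # Both results share one id so the text is attached to its rectangle.
        common = {
            "id": f"token_{i}",
            "to_name": "image",
            "original_width": img_w,
            "original_height": img_h,
            "image_rotation": 0,
        }
        results.append({
            **common,
            "from_name": "label",
            "type": "rectanglelabels",
            "value": {**geometry, "rectanglelabels": ["O"]},
        })
        results.append({
            **common,
            "from_name": "transcription",
            "type": "textarea",
            "value": {**geometry, "text": [text]},
        })
    return {
        "data": {"image": image_url},
        # "ocr-sketch" is a placeholder version label for this example.
        "predictions": [{"model_version": "ocr-sketch", "result": results}],
    }


task = build_task(
    "https://example.com/invoice.png",  # placeholder URL
    [("Invoice", (100, 50, 200, 40))],
    img_w=1000,
    img_h=500,
)
print(json.dumps(task, indent=2))
```

Sharing the same id between the rectangle result and the textarea result is what links each transcription to its region when perRegion="true" is set.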

About the labeling configuration

All labeling configurations must be wrapped in View tags.

This configuration uses the following tags: View, Image, RectangleLabels, Label, and TextArea.

Usage Instructions

This configuration provides a streamlined interface for OCR text verification and correction: