OCR Invoices Pre-NER BIO Format

This template covers OCR text extraction and tokenization in BIO format for invoice documents. Every token is initially tagged ‘O’ (Outside), so that a later NER pass can assign entity labels.
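The "all tokens start as O" convention can be sketched in a few lines of Python. This is only an illustration under assumptions: whitespace tokenization stands in for whatever tokenizer the OCR pipeline actually uses, and the helper names `init_bio` and `tag_entity` as well as the `INVOICE_ID` label are hypothetical, not part of the template.

```python
def init_bio(tokens):
    """Pre-NER state: every token gets the 'O' (Outside) tag."""
    return ["O"] * len(tokens)


def tag_entity(tags, start, end, label):
    """Return a copy of tags with tokens[start:end] marked as one entity.

    The first token of the span gets 'B-<label>' (Begin), the remaining
    tokens get 'I-<label>' (Inside). The label name is hypothetical.
    """
    tagged = list(tags)
    for i in range(start, end):
        prefix = "B-" if i == start else "I-"
        tagged[i] = prefix + label
    return tagged


# Assumption: plain whitespace splitting as a stand-in tokenizer.
tokens = "Invoice No. 12345".split()
tags = init_bio(tokens)

# A later NER step might mark "No. 12345" as a single entity.
ner_tags = tag_entity(tags, 1, 3, "INVOICE_ID")

for token, tag in zip(tokens, ner_tags):
    print(token, tag)
```

Starting from a uniform ‘O’ sequence keeps the pre-NER output valid BIO, so downstream tagging only ever has to overwrite spans.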
Labeling Configuration
<View>
  <!-- The image to annotate -->
  <Image name="image" value="$image" zoomControl="true"/>

  <!-- Bounding-box control that will receive the "rectanglelabels" results
       coming from your OCR model (from_name = "label") -->
  <RectangleLabels name="label" toName="image" choice="single">
    <!-- You only emit the generic "O" class, but feel free to add more labels -->
    <Label value="O" background="#FFA500"/>
  </RectangleLabels>

  <!-- Per-region transcription box (from_name = "transcription").
       Because perRegion="true", one TextArea is linked to each rectangle. -->
  <TextArea name="transcription"
            toName="image"
            perRegion="true"
            editable="true"
            rows="1"
            required="true"
            placeholder="Type or correct OCR text…"/>
</View>
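To show how OCR output could feed this configuration, here is a minimal Python sketch that turns word boxes into a pre-annotation task. It assumes the OCR step yields pixel coordinates under the keys `text`, `left`, `top`, `width`, and `height` (a hypothetical shape); the function name `build_prediction`, the `model_version` string, and the image URL are placeholders. Each word produces two results that share an `id`: a `rectanglelabels` result for `from_name = "label"` and a `textarea` result for `from_name = "transcription"`, with coordinates expressed as percentages of the image size.

```python
import json


def build_prediction(words, image_width, image_height, model_version="ocr-sketch"):
    """Convert OCR word boxes (in pixels) into a prediction dict for this config."""
    results = []
    for index, word in enumerate(words):
        # The shared id is what links a rectangle to its per-region TextArea.
        region_id = f"word-{index}"
        # Region coordinates are percentages of the image dimensions.
        box = {
            "x": 100.0 * word["left"] / image_width,
            "y": 100.0 * word["top"] / image_height,
            "width": 100.0 * word["width"] / image_width,
            "height": 100.0 * word["height"] / image_height,
            "rotation": 0,
        }
        common = {
            "id": region_id,
            "to_name": "image",
            "original_width": image_width,
            "original_height": image_height,
            "image_rotation": 0,
        }
        # Bounding box, pre-tagged with the generic "O" label.
        results.append({
            **common,
            "from_name": "label",
            "type": "rectanglelabels",
            "value": {**box, "rectanglelabels": ["O"]},
        })
        # OCR text for the same region, editable by the annotator.
        results.append({
            **common,
            "from_name": "transcription",
            "type": "textarea",
            "value": {**box, "text": [word["text"]]},
        })
    return {"model_version": model_version, "result": results}


# Hypothetical OCR output for a 200x100 pixel image.
words = [{"text": "Invoice", "left": 10, "top": 20, "width": 50, "height": 10}]
task = {
    "data": {"image": "https://example.com/invoice.png"},  # placeholder URL
    "predictions": [build_prediction(words, image_width=200, image_height=100)],
}
print(json.dumps(task, indent=2))
```

The `data.image` key matches `value="$image"` in the configuration, and the two `from_name` values match the `name` attributes of the `RectangleLabels` and `TextArea` tags.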
About the labeling configuration
All labeling configurations must be wrapped in View tags.
This configuration uses the following tags: View, Image, RectangleLabels, Label, and TextArea.
Usage Instructions
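This configuration provides a streamlined interface for OCR text verification and correction: each OCR bounding box on the invoice image is paired with an editable, single-line transcription field.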