Language model (LM) pretraining has led to consistent improvements in many NLP downstream tasks, including named entity recognition (NER). In this paper, we present T-NER (Transformer-based Named Entity Recognition), a Python library for NER LM finetuning. In addition to its practical utility, T-NER facilitates the study of the cross-domain and cross-lingual generalization ability of LMs finetuned on NER. Our library also provides a web app where users can get model predictions interactively for arbitrary text, which facilitates qualitative model evaluation for non-expert programmers. We show the potential of the library by compiling nine public NER datasets into a unified format and evaluating the cross-domain and cross-lingual performance across the datasets. The results from our initial experiments show that in-domain performance is generally competitive across datasets. However, cross-domain generalization is challenging even with a large pretrained LM, which nevertheless has the capacity to learn domain-specific features if finetuned on a combined dataset. To facilitate future research, we also release all our LM checkpoints via the Hugging Face model hub.
2021-04, EACL, T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition
1. T-NER
An All-Round Python Library for Transformer-based
Named Entity Recognition
Asahi Ushio
Jose Camacho-Collados
Cardiff University
School of Computer Science and Informatics
Presented at EACL 2021
https://github.com/asahi417/tner
https://pypi.org/project/tner
2. Language Model Pretraining & Finetuning
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2020)
- Improving Language Understanding by Generative Pre-Training (Radford et al., 2018)
3. Named Entity Recognition
Jacob Collier is an English artist.
Jacob Collier: Person
English: Location
4. Named Entity Recognition
Jacob Collier is an English artist.
5. Named Entity Recognition
Jacob Collier is an English artist.
Tokenization: Jacob | Collier | is | an | English | artist.
6. Named Entity Recognition
Jacob Collier is an English artist.
Tokenization: Jacob | Collier | is | an | English | artist.
BERT + Linear Projection: P_Jacob, P_Collier, P_is, P_an, P_English, P_artist
Jacob Collier: Person
English: Location
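The last step of this pipeline, turning per-token predictions into entity mentions, can be sketched as follows. This is a minimal illustration and not T-NER's own code: the function name `decode_entities` is hypothetical, and the tag sequence is an assumed IOB2 output for the example sentence (the slide does not spell out the predicted tags).

```python
def decode_entities(tokens, tags):
    """Group IOB2-tagged tokens into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        # An I- tag of the same type extends the entity that is currently open.
        if tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
            continue
        # Any other tag closes the open entity, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        # A non-O tag opens a new entity (a stray I- tag is treated leniently).
        if tag != "O":
            current_type, current_tokens = tag[2:], [token]
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Assumed model output for the example sentence (illustrative only).
tokens = ["Jacob", "Collier", "is", "an", "English", "artist", "."]
tags = ["B-Person", "I-Person", "O", "O", "B-Location", "O", "O"]
entities = decode_entities(tokens, tags)
print(entities)
```

With the assumed tags above, the decoder recovers the two mentions shown on the slide: "Jacob Collier" as Person and "English" as Location.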
7. Implement NER System
Unify Tagging Scheme
- IOB, IOB2, IOBES, etc
Jean Auguste Dominique Ingres (Person)
IOB: B-Person I-Person I-Person I-Person
IOBES: B-Person I-Person I-Person E-Person
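The difference between the two schemes can be made concrete with a small converter. This is a sketch under the assumption that the input is valid IOB2 (every entity starts with a `B-` tag); the function name `iob2_to_iobes` is hypothetical and is not taken from the T-NER API.

```python
def iob2_to_iobes(tags):
    """Convert a sequence of IOB2 tags into IOBES tags."""
    iobes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            iobes.append(tag)
            continue
        prefix, entity = tag.split("-", 1)
        # The entity continues only if the next tag is I- with the same type.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{entity}"
        if prefix == "B":
            # A B- tag with no continuation is a single-token entity (S-).
            iobes.append(f"B-{entity}" if continues else f"S-{entity}")
        else:
            # An I- tag with no continuation marks the end of the entity (E-).
            iobes.append(f"I-{entity}" if continues else f"E-{entity}")
    return iobes


iob = ["B-Person", "I-Person", "I-Person", "I-Person"]
print(iob2_to_iobes(iob))
# ['B-Person', 'I-Person', 'I-Person', 'E-Person']
```

Going the other way (IOBES to IOB2) only requires mapping `S-` to `B-` and `E-` to `I-`, which is why a single internal scheme is enough to unify datasets.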
8. Implement NER System
Unify Tagging Scheme
- IOB, IOB2, IOBES, etc
Fix Sequence Mismatch
- Align label sequence to model tokenization
Dataset:
Jean Auguste Dominique Ingres was a French painter.
B-Person I-Person I-Person I-Person (Jean Auguste Dominique Ingres), B-Location (French)
RoBERTa Tokenization:
Jean August e Dominique Ingres was a French painter.
B-Person I-Person I-Person I-Person I-Person I-Person I-Person (person name), B-Location (French)
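One way to align word-level labels with subword tokens is sketched below. It is an illustration under stated assumptions, not T-NER's implementation: `toy_tokenize` is a hypothetical stand-in for a real subword tokenizer, it only reproduces the "August" + "e" split from the slide, and the final period is treated as its own word for simplicity.

```python
def align_labels(words, labels, tokenize):
    """Expand word-level IOB2 labels so that there is one label per subword."""
    tokens, aligned = [], []
    for word, label in zip(words, labels):
        for i, piece in enumerate(tokenize(word)):
            tokens.append(piece)
            if i == 0 or label == "O":
                # The first piece keeps the word's original label.
                aligned.append(label)
            else:
                # Later pieces continue the same entity, so B- becomes I-.
                aligned.append("I-" + label.split("-", 1)[1])
    return tokens, aligned


def toy_tokenize(word):
    # Hypothetical tokenizer: splits only "Auguste", as in the slide example.
    return ["August", "e"] if word == "Auguste" else [word]


words = ["Jean", "Auguste", "Dominique", "Ingres", "was", "a", "French", "painter", "."]
labels = ["B-Person", "I-Person", "I-Person", "I-Person", "O", "O", "B-Location", "O", "O"]
tokens, aligned = align_labels(words, labels, toy_tokenize)
for token, label in zip(tokens, aligned):
    print(token, label)
```

Labelling continuation pieces with `I-` is only one convention; another common choice is to mask them out of the loss and keep just the first piece of each word.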
9. Implement NER System
Unify Tagging Scheme
- IOB, IOB2, IOBES, etc
Fix Sequence Mismatch
- Align label sequence to model tokenization
Evaluate in Cross-domain
- Dataset specific entity definition
BioNLP2004
● Protein
● Cell type
● RNA
WNUT2017
● Person
● Corporation
● Creative work
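Because each dataset defines its own entity types, a model trained on one dataset may find the right span but name it with a label the target dataset does not use. One possible way to separate span detection from type agreement is to score spans with and without their types, as sketched below. This is an assumption for illustration only: the helper names are hypothetical, and "Organization" stands for a label from some other dataset.

```python
def spans(tags, keep_type=True):
    """Return the set of (start, end, type) entity spans in an IOB2 sequence."""
    found, start, etype = set(), None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                found.add((start, i, etype if keep_type else "ENT"))
                start, etype = None, None
            if tag != "O":
                start, etype = i, tag[2:]
    return found


def span_f1(gold, pred, keep_type=True):
    """Exact-match span F1, optionally ignoring entity types."""
    g, p = spans(gold, keep_type), spans(pred, keep_type)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


gold = ["B-Person", "I-Person", "O", "B-Corporation"]
pred = ["B-Person", "I-Person", "O", "B-Organization"]  # label from another dataset
print(span_f1(gold, pred))                   # 0.5: one of two typed spans matches
print(span_f1(gold, pred, keep_type=False))  # 1.0: both spans match once types are ignored
```

The gap between the two scores indicates how much of a cross-domain error comes from mismatched label definitions rather than from missed spans.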
10. NLP Open Source Software
12. Overall T-NER Design
Datasets: OntoNotes 5, CoNLL 2003, BioNLP 2004, WNUT 2017, WikiAnn, ...
● IOB format
● Sequence mismatch fixed
LM finetuning
NER model
● Upload/download model
● 46 finetuned NER models released in model hub
LM evaluation
● cross-domain
● cross-lingual
web APP
Notebook link
● Finetuning
● Evaluation
● Model prediction
● Multilingual NER
13. Web Application
# SETUP
>>> git clone https://github.com/asahi417/tner
>>> cd tner
>>> pip install .
# RUN APPLICATION at http://0.0.0.0:8000/
>>> export NER_MODEL='asahi417/tner-xlm-roberta-large-ontonotes5'
>>> uvicorn app:app --reload --log-level debug --host 0.0.0.0 --port 8000
14. T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition
15. Experimental Results