RoSeq: Robust Sequence Labeling

Abstract

In this paper, we investigate two issues in sequence labeling, label imbalance and noisy data, which are common in Named Entity Recognition and largely ignored in existing work. To address them, we propose a new method termed robust sequence labeling (RoSeq). To handle label imbalance, we first incorporate label statistics into a novel CRF loss. We then augment the CRF loss with an additional loss that reduces the weight of the overwhelming number of easy tokens. To handle noisy training data, we adopt an adversarial training strategy to improve model generalization. In experiments, RoSeq achieves state-of-the-art performance on the CoNLL and English Twitter NER benchmark datasets without using additional data.
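
The idea of reducing the weight of easy tokens can be illustrated with a focal-style token loss. The abstract does not give the exact form of the additional loss, so the sketch below is an assumption rather than the paper's formulation: the function names `softmax` and `focal_token_loss` and the parameter `gamma` are hypothetical, and the scaling factor `(1 - p) ** gamma` is borrowed from the standard focal loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's label scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_token_loss(token_logits, gold_labels, gamma=2.0):
    """Mean focal-style loss over a sequence of tokens.

    Assumption: a standard focal loss stands in for the paper's
    additional loss. Each token's cross-entropy is scaled by
    (1 - p_gold) ** gamma, so confidently correct (easy) tokens
    contribute little. With gamma = 0 this is plain cross-entropy.
    """
    losses = []
    for logits, gold in zip(token_logits, gold_labels):
        p_gold = softmax(logits)[gold]
        losses.append(-((1.0 - p_gold) ** gamma) * math.log(p_gold))
    return sum(losses) / len(losses)


# One easy token (gold label scored high) and one hard token (gold label scored low).
easy = focal_token_loss([[5.0, 0.0]], [0])
hard = focal_token_loss([[0.0, 5.0]], [0])
print(easy < hard)
```

In a setup like the one the abstract describes, a term of this kind would be added to the CRF loss rather than replace it; how the two are combined is not specified here.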

Publication
IEEE Transactions on Neural Networks and Learning Systems (TNNLS 2019)
Hao Zhang
Staff Algorithm Engineer