Machine learning for NLP generally requires resources for training. Tasks in a low-resource language usually rely on labeled data in another, typically resource-rich, language. However, there might not be enough labeled data even in a resource-rich language such as English. In such cases, one option is a hand-crafted approach that uses only a small bilingual dictionary, with minimal manual verification, to create distantly supervised data. Another is to explore typical machine learning techniques, for example adversarial training of bilingual word representations. We find that on the event-type detection task (classifying [parts of] documents into a fixed set of labels), the two approaches give about the same performance. We explore ways in which the two methods can be complementary, and how best to use a limited budget for manual annotation to maximize performance gain.
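To make the dictionary-based idea concrete, the following is a minimal sketch of how distantly supervised labels might be produced from a small bilingual dictionary. It is an illustration under stated assumptions, not the authors' pipeline: the event types, seed keywords, dictionary entries, and the function names `translate_keywords` and `distant_label` are all hypothetical, and the matching rule (simple token overlap with a `min_hits` threshold) is an assumed simplification.

```python
from collections import Counter

# Hypothetical English seed keywords for each event type (assumed labels).
SEED_KEYWORDS = {
    "food": ["food", "hunger"],
    "medical": ["doctor", "medicine"],
}

# Hypothetical toy bilingual dictionary: English word -> target-language words.
BILINGUAL_DICT = {
    "food": ["chakula"],
    "hunger": ["njaa"],
    "doctor": ["daktari"],
    "medicine": ["dawa"],
}


def translate_keywords(seeds, dictionary):
    """Map each event type to the set of target-language keywords
    obtained by looking up its English seeds in the dictionary."""
    return {
        label: {t for word in words for t in dictionary.get(word, [])}
        for label, words in seeds.items()
    }


def distant_label(document, target_keywords, min_hits=1):
    """Assign every event type whose keywords occur at least
    `min_hits` times in the whitespace-tokenized document."""
    counts = Counter(document.lower().split())
    return {
        label
        for label, keywords in target_keywords.items()
        if sum(counts[k] for k in keywords) >= min_hits
    }


target_keywords = translate_keywords(SEED_KEYWORDS, BILINGUAL_DICT)
noisy_labels = distant_label("daktari alileta dawa", target_keywords)
```

Labels produced this way are noisy by construction, which is why a small amount of manual verification of the translated keywords can matter; a classifier trained on such labels would be a separate step not shown here.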
Recommended citation: Muis, A. O., Otani, N., Vyas, N., Xu, R., Yang, Y., Mitamura, T., & Hovy, E. (2018). Low-resource Cross-lingual Event Type Detection in Documents via Distant Supervision with Minimal Effort. In COLING (pp. 70–82). Retrieved from http://aclweb.org/anthology/C18-1007v2