Page Not Found
Page not found. Your pixels are in another canvas.
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
NLP researcher, mathematician, puzzle-solver
This is a page not in the main menu
Published in NAACL, 2016
This paper introduces a new annotated corpus based on an existing informal text corpus: the NUS SMS Corpus (Chen and Kan, 2013). The new corpus includes 76,490 noun phrases from 26,500 SMS messages, annotated by university students. We then explored several graphical models, including a novel variant of the semi-Markov conditional random fields (semi-CRF) for the task of noun phrase chunking. We demonstrated through empirical evaluations on the new dataset that the new variant yielded similar accuracy but required significantly less running time than the conventional semi-CRF.
Recommended citation: Muis, A. O., & Lu, W. (2016). Weak Semi-Markov CRFs for Noun Phrase Chunking in Informal Text. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 714–719). Stroudsburg, PA, USA: Association for Computational Linguistics. https://aclanthology.org/N16-1085/ https://arxiv.org/abs/1810.08567
Published in EMNLP, 2016
This paper focuses on the study of recognizing discontiguous entities. Motivated by a previous work, we propose to use a novel hypergraph representation to jointly encode discontiguous entities of unbounded length, which can overlap with one another. To compare with existing approaches, we first formally introduce the notion of model ambiguity, which defines the difficulty level of interpreting the outputs of a model, and then formally analyze the theoretical advantages of our model over previous approaches based on linear-chain CRFs. Our empirical results also show that our model is able to achieve significantly better results when evaluated on standard data with many discontiguous entities.
Recommended citation: Muis, A. O., & Lu, W. (2016). Learning to Recognize Discontiguous Entities. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 75–84). Stroudsburg, PA, USA: Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1008 https://arxiv.org/abs/1810.08579
Published in AAAI, 2017
Named entity recognition (NER), which focuses on the extraction of semantically meaningful named entities and their semantic classes from text, serves as an indispensable component for several downstream natural language processing (NLP) tasks such as relation extraction and event extraction. Dependency trees, on the other hand, also convey crucial semantic-level information. It has been shown previously that such information can be used to improve the performance of NER (Sasano and Kurohashi 2008; Ling and Weld 2012). In this work, we investigate how to better utilize the structured information conveyed by dependency trees to improve the performance of NER. Specifically, unlike existing approaches which only exploit dependency information for designing local features, we show that certain global structured information of the dependency trees can be exploited when building NER models, where such information can provide guided learning and inference. Through extensive experiments, we show that our proposed novel dependency-guided NER model performs competitively with models based on conventional semi-Markov conditional random fields, while requiring significantly less running time.
Recommended citation: Jie, Z., Muis, A. O., & Lu, W. (2017). Efficient Dependency-guided Named Entity Recognition. In The 31st AAAI Conference on Artificial Intelligence (AAAI’17). Retrieved from http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14741 https://arxiv.org/abs/1810.08436
Published in ACL, 2017
Cybersecurity risks and malware threats are becoming increasingly dangerous and common. Despite the severity of the problem, there have been few NLP efforts focused on tackling cybersecurity. In this paper, we discuss the construction of a new database for annotated malware texts. An annotation framework is introduced based around the MAEC vocabulary for defining malware characteristics, along with a database consisting of 39 annotated APT reports with a total of 6,819 sentences. We also use the database to construct models that can potentially help cybersecurity researchers in their data collection and analytics efforts.
Recommended citation: Lim, S. K., Muis, A. O., Lu, W., & Ong, C. H. (2017). MalwareTextDB: A Database for Annotated Malware Articles. In ACL 2017. http://aclweb.org/anthology/P17-1143
Published in EMNLP, 2017
In this paper, we propose a new model that is capable of recognizing overlapping mentions. We introduce a novel notion of mention separators that can be effectively used to capture how mentions overlap with one another. On top of a novel multigraph representation that we introduce, we show that efficient and exact inference can still be performed. We present some theoretical analysis on the differences between our model and a recently proposed model for recognizing overlapping mentions, and discuss the possible implications of the differences. Through extensive empirical analysis on standard datasets, we demonstrate the effectiveness of our approach.
Recommended citation: Muis, A. O., & Lu, W. (2017). Labeling Gaps Between Words: Recognizing Overlapping Mentions with Mention Separators. In EMNLP. Copenhagen. https://doi.org/10.18653/v1/D17-1276 https://arxiv.org/abs/1810.09073
Published in COLING, 2018
The use of machine learning for NLP generally requires resources for training. Tasks performed in a low-resource language usually rely on labeled data in another, typically resource-rich, language. However, there might not be enough labeled data even in a resource-rich language such as English. In such cases, one approach is to use a hand-crafted approach that utilizes only a small bilingual dictionary with minimal manual verification to create distantly supervised data. Another is to explore typical machine learning techniques, for example adversarial training of bilingual word representations. We find that in the event-type detection task, the task of classifying [parts of] documents into a fixed set of labels, they give about the same performance. We explore ways in which the two methods can be complementary and also see how to best utilize a limited budget for manual annotation to maximize performance gain.
Recommended citation: Muis, A. O., Otani, N., Vyas, N., Xu, R., Yang, Y., Mitamura, T., & Hovy, E. (2018). Low-resource Cross-lingual Event Type Detection in Documents via Distant Supervision with Minimal Effort. In COLING (pp. 70–82). Retrieved from https://aclanthology.org/C18-1007/ https://aclanthology.org/C18-1007v2.pdf
Undergraduate Course, National University of Singapore, School of Computing, 2011
I was a lab assistant from January 2011 to April 2012 for entry-level programming courses.
Undergraduate Course, National University of Singapore, School of Computing, 2012
I was an in-class teaching assistant from August 2012 to November 2012 for the entry-level course CS1231: Discrete Structures. Duties included in-class homework discussion and grading.
Graduate Course, Carnegie Mellon University, Language Technologies Institute, 2018
I was a teaching assistant for the graduate course Algorithms for Natural Language Processing (11-711), taught by Yulia Tsvetkov and Robert Frederking.