Talk Info: He He
Title: Guarding Against Spurious Correlations in Natural Language Understanding
Abstract: While we have made great progress in natural language understanding, transferring the success from benchmark datasets to real applications has not always been smooth. Notably, models sometimes make mistakes that are confusing and unexpected to humans. In this talk, I will discuss shortcuts in NLP tasks and present our recent works on guarding against spurious correlations in natural language understanding tasks (e.g. textual entailment and paraphrase identification) from the perspectives of both robust learning algorithms and better data coverage. Motivated by the observation that our data often contains a small amount of “unbiased” examples that do not exhibit spurious correlations, we present new learning algorithms that better exploit these minority examples. On the other hand, we may want to directly augment such “unbiased” examples. While recent works along this line are promising, we show several pitfalls in the data augmentation approach.
Bio: He He is an assistant professor in the Center for Data Science and Courant Institute at New York University. Before joining NYU, she spent a year at Amazon Web Services and was a postdoc at Stanford University. She received her PhD from University of Maryland, College Park. She is broadly interested in machine learning and natural language processing. Her current research interests include text generation, dialogue systems, and robust language understanding.
Video:
Slides: