Abstract

Recent studies indicate that NLU models are prone to rely on shortcut features for prediction, without achieving true language understanding. As a result, these models fail to generalize to real-world out-of-distribution (OOD) data. In this work, we show that the words in the NLU training set can be modeled as a long-tailed distribution. There are two findings: 1) NLU models have a strong preference for features located at the head of the long-tailed distribution, and 2) shortcut features are picked up during the first few iterations of model training. These two observations are further employed to formulate a measurement that quantifies the shortcut degree of each training sample. Based on this shortcut measurement, we propose a shortcut mitigation framework, LGTR, to discourage the model from making overconfident predictions for samples with a large shortcut degree. Experimental results on three NLU benchmarks demonstrate that our long-tailed distribution explanation accurately reflects the shortcut learning behavior of NLU models. Experimental analysis further indicates that LGTR can improve the generalization accuracy on OOD data, while preserving the accuracy on in-distribution data.
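To make the long-tailed picture concrete, the following is a minimal sketch that ranks the words of a toy training set by frequency and measures how much of the token mass sits in the head of the distribution. It is only an illustration of the idea: the helper names `word_rank_frequency` and `head_mass`, the whitespace tokenization, and the toy sentences are assumptions made for this example, and plain frequency is a stand-in rather than the shortcut measurement proposed in the work.

```python
from collections import Counter


def word_rank_frequency(sentences):
    """Return (word, count) pairs sorted from most to least frequent.

    Hypothetical helper: uses lowercase whitespace tokenization only.
    """
    counts = Counter(
        token
        for sentence in sentences
        for token in sentence.lower().split()
    )
    return counts.most_common()


def head_mass(ranked, head_size):
    """Fraction of all token occurrences covered by the top `head_size` words."""
    total = sum(count for _, count in ranked)
    head = sum(count for _, count in ranked[:head_size])
    return head / total if total else 0.0


# Toy data invented for illustration; not taken from any benchmark.
toy_training_set = [
    "the movie is not good",
    "the movie is good",
    "the plot is not bad",
    "the acting is not convincing",
]

ranked = word_rank_frequency(toy_training_set)
print(ranked[:3])                      # the most frequent words (the "head")
print(head_mass(ranked, head_size=3))  # share of tokens covered by the head
```

Even on this tiny example, a handful of words accounts for most token occurrences, while the remaining words each appear only once or twice. On a real NLU training set the same ranking would be expected to show a much longer tail.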