Subjective NLP tasks and complex human values

The flexible input structure of text-to-text pretrained models has enabled few-shot adaptation to new tasks via prompts. I have recently started exploring how the same flexibility can improve model performance on subjective NLP tasks such as emotion classification, hate speech detection, stance detection, and elaboration of concepts in domain-specific text. In my latest AAAI 2023 submission, I used this flexibility to let users selectively simplify the contents of medical texts.
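To make the prompt-based adaptation concrete, here is a minimal, model-agnostic sketch of how a few-shot prompt for a subjective task like emotion classification can be assembled for a text-to-text model; the instruction wording, demonstration texts, and labels are illustrative, not drawn from my submissions.

```python
def build_few_shot_prompt(task_instruction, examples, query):
    """Assemble a few-shot prompt for a text-to-text model (e.g. a T5-style model).

    `examples` is a list of (input_text, label) pairs used as in-context
    demonstrations; `query` is the new input to classify.
    """
    lines = [task_instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The model is expected to continue the prompt with the missing label.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the emotion expressed in each text.",
    [("I can't believe they cancelled my flight again!", "anger"),
     ("We finally got the grant!", "joy")],
    "I miss how things used to be.",
)
```

Because the task is expressed entirely in the input text, swapping the instruction and demonstrations repurposes the same pretrained model for a different subjective task without retraining.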

Obtaining quality labels has always been a challenge for subjective NLP tasks because no well-defined standard exists. Our recent project on mining human values from online reviews, submitted to CHI 2023, exposed us to some of these challenges. We found that low inter-annotator agreement aligns directly with the low performance of pretrained language models like RoBERTa. One interesting source of subjectivity in the data labels turned out to be the lexical relationships among the labels themselves, especially when one label was a hypernym of another, belonged to the same WordNet synset as another, or correlated strongly (co-occurred in a multi-label setting) with another label.
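The lexical checks described above can be sketched as follows. This is a minimal illustration using a hand-coded toy hierarchy and synonym sets standing in for WordNet; the labels and relations are invented for demonstration, and in practice one would query WordNet itself (e.g. via NLTK).

```python
# Toy lexical hierarchy standing in for WordNet (illustrative only):
# each label maps to its direct hypernym (the more general term).
HYPERNYMS = {
    "rage": "anger",
    "anger": "emotion",
    "joy": "emotion",
}

# Toy synonym sets standing in for WordNet synsets (illustrative only).
SYNSETS = [{"anger", "ire"}, {"joy", "delight"}]

def is_hypernym(general, specific, hierarchy=HYPERNYMS):
    """Return True if `general` appears on `specific`'s hypernym chain."""
    label = specific
    while label in hierarchy:
        label = hierarchy[label]
        if label == general:
            return True
    return False

def share_synset(a, b, synsets=SYNSETS):
    """Return True if the two labels belong to the same synonym set."""
    return any(a in s and b in s for s in synsets)

def potentially_confusable(a, b):
    """Flag label pairs whose lexical relation may invite annotator disagreement."""
    return is_hypernym(a, b) or is_hypernym(b, a) or share_synset(a, b)
```

Running such a check over all label pairs before annotation begins can surface pairs like ("emotion", "rage"), where one label subsumes the other, so the annotation guidelines can disambiguate them explicitly.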