Binary Classifier for Experimental Search of Triggers in Jokes in English
https://doi.org/10.25205/1818-7935-2024-22-3-98-111
Abstract
This paper describes the development of a binary classifier that distinguishes humorous from non-humorous texts. The proposed seq2seq model consists of a pre-trained BERT embedding layer and a Bi-LSTM layer used for sequence classification. The training and validation corpora comprise 76,000 jokes and non-jokes with identical vocabulary, which is essential to prevent vocabulary choice from serving as a distinguishing factor between humor and non-humor. The paper also describes the application of the trained neural network in a series of experiments on linguistic transformations of humorous and non-humorous texts. The purpose of these experiments is to identify the essential parts and words without which a joke ceases to be humorous. Some interdisciplinary theories of humor refer to such expressions as triggers [Attardo S., 1994]. According to the quantitative and qualitative analyses, 78 of the jokes from the validation dataset changed their label to the opposite at least once when the text was transformed. At the same time, 16 of the remaining 22 jokes contain explicit or implicit extralinguistic information. A t-test comparing the probabilistic estimates of the original and modified texts for each type of linguistic transformation revealed the most common types of transformation: deletion of the punchline, deletion of the setup, deletion of 1 to 3 tokens from the beginning of the text, deletion of 1 to 3 tokens from the middle of the text, and deletion of all the nouns.
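The transformation experiments summarized in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the 0.5 decision threshold, the choice of a paired t statistic, and the `score` callable (a stand-in for the trained BERT + Bi-LSTM classifier, returning the probability that a text is a joke) are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev
from typing import Callable, List, Sequence


def delete_from_start(tokens: List[str], n: int) -> List[str]:
    """Drop the first n tokens of the text."""
    return tokens[n:]


def delete_from_middle(tokens: List[str], n: int) -> List[str]:
    """Drop n tokens centred on the middle of the text."""
    start = max((len(tokens) - n) // 2, 0)
    return tokens[:start] + tokens[start + n:]


def label_flipped(score: Callable[[List[str]], float],
                  original: List[str], modified: List[str],
                  threshold: float = 0.5) -> bool:
    """True if the predicted label differs between the two texts.

    `score` is a hypothetical stand-in for the trained classifier.
    """
    return (score(original) >= threshold) != (score(modified) >= threshold)


def paired_t_statistic(before: Sequence[float],
                       after: Sequence[float]) -> float:
    """Paired-samples t statistic on per-text score differences.

    Assumption: the abstract does not say which t-test variant was used.
    """
    diffs = [b - a for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```

For each transformation type, one would score every text before and after the change, count label flips, and pass the two score lists to `paired_t_statistic`. Deleting the setup, the punchline, or all nouns would additionally need sentence segmentation and part-of-speech tags, which this sketch leaves out.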
About the Author
E. M. Zakovorotnaia
Russian Federation
Eugeniia M. Zakovorotnaia, Postgraduate Student of the Faculty of Humanities
Moscow
References
1. Annamoradnejad I. ColBERT: Using BERT Sentence Embedding for Humor Detection. 2022, URL: https://arxiv.org/abs/2004.12765
2. Attardo S. Linguistic Theories of Humor. Mouton de Gruyter, 1994.
3. Blinov V., Bolotova-Baranova V., Braslavski P. Large Dataset and Language Model Fun-Tuning for Humor Recognition // In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, p. 4027–4032.
4. Chen Y., Shi B., Si M. Prompt to GPT-3: Step-by-Step Thinking Instructions for Humor Generation. 2023, URL: https://arxiv.org/abs/2306.13195
5. Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding // In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, vol. 1 (Long and Short Papers), p. 4171–4186.
6. Epstein B. The Internal and the External in Linguistic Explanation // Croatian Journal of Philosophy, 2008, vol. 8(22), p. 77–111.
7. Hasan M. K., Rahman W., Zadeh A. B., Zhong J., Tanveer M. I., Morency L.-P., Hoque M. UR-FUNNY: A Multimodal Language Dataset for Understanding Humor // In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, p. 2046–2056.
8. He H., Peng N., Liang P. Pun Generation with Surprise // In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, p. 1734–1744.
9. IberLEF2019, URL: https://sites.google.com/view/iberlef-2019/
10. Karande A. What Humour Tells Us About Discourse Theories // Conference of the European Chapter of the Association for Computational Linguistics, 2006, p. 31–38.
11. Liu Y., Ott M., Goyal N., Du J., Joshi M., Chen D., Levy O., Lewis M., Zettlemoyer L., Stoyanov V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. URL: https://arxiv.org/abs/1907.11692
12. Morreall J. “Philosophy of Humor”, The Stanford Encyclopedia of Philosophy (Fall 2020 Edition). Edward N. Zalta (ed.), Metaphysics Research Lab, Stanford University, 2020, vol. 2. URL: https://plato.stanford.edu/archives/fall2020/entries/humor/
13. Pritchett B. L. Garden Path Phenomena and the Grammatical Basis of Language Processing // Language 64, 1988, p. 539–576.
14. Raskin V. Semantic Mechanisms of Humor, vol. 24. Springer Netherlands, Dordrecht, 1984, p. 99–147.
15. Raskin V., Attardo S. Script theory revis(it)ed: joke similarity and joke representation model // Humor – International Journal of Humor Research, vol. 4(3–4), 2020, p. 293–348.
16. SemEval2020, URL: https://alt.qcri.org/semeval2020/
17. SemEval2021, URL: https://semeval.github.io/SemEval2021/
18. spaCy model “en_core_web_trf”, URL: https://huggingface.co/spacy/en_core_web_trf
19. Tang L., Cai A., Li S., Wang J. The Naughtyformer: A Transformer Understands Offensive Humor, 2023, URL: https://arxiv.org/abs/2211.14369
20. Toplyn J. Witscript 3: A Hybrid AI System for Improvising Jokes in a Conversation. 2023, URL: https://arxiv.org/abs/2301.02695
21. Veale T. Figure-Ground Reversal in Linguistic Humour: A multimodal perspective // Lodz Papers in Pragmatics 4.1, Special Issue on Humour, 2008, p. 63–81.
22. Wang M., Yang H., Qin Y., Sun S., Deng Y. Unified Humor Detection Based on Sentence-pair Augmentation and Transfer Learning // In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, 2020, p. 53–59.
23. Weller O., Seppi K. Humor Detection: A Transformer Gets the Last Laugh // Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, p. 3621–3625.
Review
For citations:
Zakovorotnaia E.M. Binary Classifier for Experimental Search of Triggers in Jokes in English. NSU Vestnik. Series: Linguistics and Intercultural Communication. 2024;22(3):98-111. (In Russ.) https://doi.org/10.25205/1818-7935-2024-22-3-98-111