
NSU Vestnik. Series: Linguistics and Intercultural Communication


Automatic Methods for Detecting Cultural Bias in Social Media (Based on Telegram’s Dialogs)

https://doi.org/10.25205/1818-7935-2021-19-2-54-72

Abstract

In this paper, we describe and test several ways of using machine learning to analyze large collections of text data from social networks (namely, public Telegram chats), to retrieve relevant social or cultural information from them, and to visualize the results. The advantage of the proposed approach is that, by covering large amounts of data, it can reveal hidden patterns of social, political or cultural behavior. It can complement the standard methodology of social surveys. Automatic detection of cultural bias in social media requires methods for measuring and visualizing its different kinds, such as cultural shifts, specific national or group refractions, mutations and stereotypes. We argue that cultural bias is a result of nonrandom errors in thinking. It is based, firstly, on a person's understanding of themselves and the world around them and, secondly, on the translation of this understanding into abstractions in the form of common misconceptions, ideologemes, narratives and slogans. In society, bias inevitably leads to the separation of one social group or subculture from another. Social networks (both classic ones and new formats, for example, messengers with public chat options) are the most active ground for the manifestation of this phenomenon. Since sociopolitical and cultural topics are discussed in chats in public, the participants in such a communicative act tend to seek the approval of the social group to which they are ideologically close. It is this phenomenon that gives rise to comparisons of the “friend-foe” type, which in turn lead to unconscious cultural shifts. Thus, mastering methods for properly identifying cultural shifts is not only relevant but crucial for intra- and intercultural communication, for controlling the level of aggressiveness in society and for understanding its mood.
As illustrations, the paper presents the semantic associations elicited by the words “freedom”, “democracy” and “Internet”, together with their visualization, and a sociocultural analysis of several topical clusters (e.g. Россия “Russia”, страна “country”, Путин “Putin”, русский “Russian”, православный “Orthodox”).

About the Authors

Y. A. Zherebtsova
ITMO University
Russian Federation


A. V. Chizhik
ITMO University; Saint Petersburg State University
Russian Federation


A. P. Sadokhin
Russian State Social University
Russian Federation




For citations:


Zherebtsova Y.A., Chizhik A.V., Sadokhin A.P. Automatic Methods for Detecting Cultural Bias in Social Media (Based on Telegram’s Dialogs). NSU Vestnik. Series: Linguistics and Intercultural Communication. 2021;19(2):54-72. (In Russ.) https://doi.org/10.25205/1818-7935-2021-19-2-54-72



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7935 (Print)