Keyphrase Generation for Abstracts of the Russian-Language Scientific Articles
https://doi.org/10.25205/1818-7935-2023-21-1-54-66
Abstract
In this paper, we attempted to adapt several well-known keyword selection algorithms to a highly specific text corpus: abstracts of Russian academic papers in mathematics and computer science. We faced several challenges, including the lack of research on keyword extraction for Russian, the absence of large corpora of academic abstracts, and the short length of the abstracts themselves. Keywords often occur in the full text of a paper and can simply be highlighted there, whereas an abstract may not contain them in explicit form. At the same time, abstracts are usually the part of a paper that is publicly available, so selecting keywords from them automatically would make searching for papers considerably easier. Moreover, automatic keyword selection would be useful even for papers whose authors have already specified keywords. During the study, we found that authors often use keywords unique to their own papers, which makes it harder to group papers by topic. To visualize the results, we created a web resource, keyphrases.mca.nsu.ru, where early-career scholars can draw up an approximate list of keywords for their first research paper.
About the Authors
D. A. Morozov
Russian Federation
Dmitry A. Morozov, junior researcher, Laboratory of Applied Digital Technologies, Mathematical Center in Akademgorodok
Novosibirsk
A. V. Glazkova
Russian Federation
Anna V. Glazkova, Cand. Sc. (Technology), Associate Professor, Department of Software, Institute of Mathematics and Computer Science
Tyumen
M. A. Tyutyulnikov
Russian Federation
Mikhail A. Tyutyulnikov, engineer, Laboratory of Applied Digital Technologies, Mathematical Center in Akademgorodok
Novosibirsk
B. L. Iomdin
Russian Federation
Boris L. Iomdin, Cand. Sc. (Philology), Leading Researcher
Moscow
For citations:
Morozov D.A., Glazkova A.V., Tyutyulnikov M.A., Iomdin B.L. Keyphrase Generation for Abstracts of the Russian-Language Scientific Articles. NSU Vestnik. Series: Linguistics and Intercultural Communication. 2023;21(1):54-66. (In Russ.) https://doi.org/10.25205/1818-7935-2023-21-1-54-66