Assessment of the Reliability of Lexical Lists for Automated Evaluation of Proficiency in Russian as a Foreign Language
https://doi.org/10.25205/1818-7935-2024-22-3-84-97
Abstract
The assessment of language proficiency plays a crucial role in education, but it often relies on subjective evaluation methods, which can result in bias and inconsistencies in the results. To address this challenge, many researchers advocate for the use of automated and semi-automated assessment methods based on linguistic characteristics of texts. In this study, we explore the applicability of available vocabulary lists as tools for automatically evaluating the proficiency levels of students learning Russian.
Several types of lexical lists exist, including frequency-based word lists and minimum vocabulary lists (lexical minimums). In this research, we analyze four popular Russian-language lexical lists commonly used in education and in the analysis of lexical knowledge. We hypothesize that texts produced by students assessed at a lower proficiency level will consist predominantly of high-frequency words and low-level vocabulary items, in line with how these lists distribute lexical items by frequency or by proficiency level. Conversely, students assessed at a higher proficiency level are expected to use less frequent and more complex lexical units. By examining the correlation between these resources and student-generated texts, we aim to determine how suitable lexical lists are for evaluating proficiency in Russian.
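The hypothesis can be illustrated with a minimal Python sketch. It is not the study's code: the miniature word list `LEVEL_LIST`, the function `level_profile`, and the two toy texts are hypothetical, and the texts are assumed to be lemmatized already. The sketch computes the share of tokens in a text that falls into each level of a list.

```python
from collections import Counter

# Hypothetical miniature lexical minimum: lemma -> level at which it is listed.
# A real list would contain thousands of entries.
LEVEL_LIST = {
    "я": "A1", "дом": "A1", "жить": "A1", "большой": "A1",
    "город": "A2", "решать": "B1", "обстоятельство": "B2",
}

def level_profile(lemmas, level_list):
    """Return the share of tokens that falls into each list level.

    Lemmas absent from the list are counted under "off-list".
    An empty text yields an empty profile.
    """
    counts = Counter(level_list.get(lemma, "off-list") for lemma in lemmas)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: n / total for level, n in counts.items()}

# Toy, already lemmatized texts (assumed input format).
beginner_text = ["я", "жить", "большой", "дом"]
advanced_text = ["я", "решать", "обстоятельство", "город"]

beginner_profile = level_profile(beginner_text, LEVEL_LIST)
advanced_profile = level_profile(advanced_text, LEVEL_LIST)
```

Under the hypothesis, profiles of lower-level texts would be concentrated in the lowest list levels, while profiles of higher-level texts would be spread across higher levels and off-list items.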
To analyze the correlations between the selected lexical lists and student texts, we use custom Python scripts. In addition, we apply Principal Component Analysis (PCA), a dimensionality-reduction method often used to reveal groupings in data, to test the hypothesis that students at the same proficiency level tend to use a similar basic vocabulary with some degree of variation.
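As a rough illustration of the PCA step, the sketch below projects a small text-by-lemma count matrix onto its first two principal components using NumPy's singular value decomposition. The abstract does not specify the features or the library used, so the matrix `toy_counts`, the function `pca_project`, and the choice of raw lemma counts as features are all assumptions.

```python
import numpy as np

def pca_project(matrix, n_components=2):
    """Project rows of `matrix` onto the first principal components.

    Returns the projected scores and the share of variance explained
    by each retained component. Assumes the data are not all identical.
    """
    X = np.asarray(matrix, dtype=float)
    X_centered = X - X.mean(axis=0)            # center each feature (column)
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ Vt[:n_components].T  # coordinates in component space
    explained = (S ** 2) / np.sum(S ** 2)      # variance share per component
    return scores, explained[:n_components]

# Hypothetical counts: rows are texts, columns are lemmas.
# Texts 0-1 share one vocabulary, texts 2-3 share another.
toy_counts = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
]

scores, explained = pca_project(toy_counts)
```

In this toy setup, texts with similar vocabulary end up on the same side of the first component. Whether real learner texts group by proficiency level in this way is exactly what the study sets out to test.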
This study raises important questions about the effectiveness of using lexical lists for assessing language proficiency. The findings may serve as a foundation for developing more accurate and comprehensive methods for evaluating lexical proficiency among students learning the Russian language.
About the Author
A. Y. Vakhranev
Russian Federation
Anton Y. Vakhranev, PhD postgraduate student
Moscow
For citations:
Vakhranev A.Y. Assessment of the Reliability of Lexical Lists for Automated Evaluation of Proficiency in Russian as a Foreign Language. NSU Vestnik. Series: Linguistics and Intercultural Communication. 2024;22(3):84-97. (In Russ.) https://doi.org/10.25205/1818-7935-2024-22-3-84-97