A New Approach to Automatic Detection and Correction of Derivational Errors in L2 Russian

A. S. Vyrenkova; I. Yu. Smirnov

doi:10.25205/1818-7935-2021-19-3-57-68

Abstract

Learner corpora are among the most valuable sources of statistical data on learners' errors. For instance, data from corpora of foreign-language learners can be used in Second Language Acquisition research. However, the representativeness of a corpus depends strongly on the quality of its error markup, which is most often done manually and is therefore a time-consuming and painstaking routine for annotators. To make annotation easier, additional tools such as spellcheckers are usually used. This paper focuses on developing a program for the automatic correction of derivational errors made by learners of Russian as a foreign language. Derivational errors are uncommon among adult native speakers of Russian (L1) but occur quite often in the written texts and speech of learners of Russian as a foreign language (L2) [Chernigovskaya, Gor, 2000]. We chose them as the scope of our research because correcting such errors is a formidable challenge for existing spellcheckers. Using data from the Russian Learner Corpus (http://www.web-corpora.net/RLC/), we tested two existing approaches to this kind of problem. The first, based on the finite-state automaton principle developed by Dickinson and Herring [2008], was tested as an algorithm for detecting derivational errors. The second, which relies on the noisy channel model of Brill and Moore [2000], was tested for error correction. After analyzing the effectiveness of both, we developed our own system for the autocorrection of derivational errors. In our program, the algorithm of Dickinson and Herring serves as the module for detecting word-formation errors. We rejected the noisy channel model and instead used a Continuous Bag of Words FastText model, which is based on Harris's theory of distributional semantics [1954]. In addition, we developed filtering rules to correct frequent errors that the model is unable to handle. To restore the correct grammatical word form automatically, a dictionary of word paradigms is used. The results of the model were validated on data from the Russian Learner Corpus.
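The detect-then-correct idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the morpheme lists, the vocabulary, and the function names `parse`, `is_derivational_error`, and `correct` are all hypothetical, the finite-state walk is only loosely inspired by the approach attributed to Dickinson and Herring, and cosine similarity over character trigram counts is used as a simple stand-in for FastText vectors. The filtering rules and the paradigm dictionary are omitted.

```python
import math
from collections import Counter

# Toy morpheme inventories and vocabulary (hypothetical, for illustration only).
PREFIXES = ("за", "на", "пере")
ROOTS = ("пис", "чит")
SUFFIXES = ("а", "к", "ник", "тель")
ENDINGS = ("", "а", "ы")
VOCAB = ("писатель", "читатель", "читка")


def parse(word):
    """Walk prefix* -> root -> suffix* -> ending; return morphemes or None."""
    def walk(rest, state, acc):
        if state == "prefix":
            for p in PREFIXES:
                if rest.startswith(p):
                    found = walk(rest[len(p):], "prefix", acc + [p])
                    if found is not None:
                        return found
            return walk(rest, "root", acc)
        if state == "root":
            for r in ROOTS:
                if rest.startswith(r):
                    found = walk(rest[len(r):], "suffix", acc + [r])
                    if found is not None:
                        return found
            return None
        # state == "suffix": accept if the remainder is a valid ending
        if rest in ENDINGS:
            return acc + ([rest] if rest else [])
        for s in SUFFIXES:
            if rest.startswith(s):
                found = walk(rest[len(s):], "suffix", acc + [s])
                if found is not None:
                    return found
        return None
    return walk(word, "prefix", [])


def is_derivational_error(word):
    """Flag words built from known morphemes that are not attested words."""
    return word not in VOCAB and parse(word) is not None


def char_ngrams(word, n=3):
    """Count character n-grams of the word wrapped in boundary markers."""
    marked = f"<{word}>"
    return Counter(marked[i:i + n] for i in range(len(marked) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def correct(word):
    """Pick the attested word with the same root that is closest to the error."""
    morphemes = parse(word)
    if morphemes is None:
        return None
    root = next(m for m in morphemes if m in ROOTS)
    candidates = [w for w in VOCAB if root in w]
    if not candidates:
        return None
    target = char_ngrams(word)
    return max(candidates, key=lambda w: cosine(target, char_ngrams(w)))
```

On this toy data, the invented form `читаник` parses into known morphemes but is absent from the vocabulary, so it is flagged, and `correct("читаник")` selects `читатель`. A word such as `кошка`, which does not parse with the toy inventories at all, is left for an ordinary spellchecker.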

Keywords

derivational errors, machine learning, automatic error detection, automatic error correction, Russian as a foreign language, learner corpus, corpus annotation

About the Authors

A. S. Vyrenkova

HSE University
Russian Federation

I. Yu. Smirnov

HSE University
Russian Federation

References

1. Копотев М. Введение в корпусную лингвистику: электрон. учеб. пособие для студентов филологических и лингвистических специальностей университетов. Praha: Animedia, 2014.

2. Amaral, L., Meurers, D. Where does ICALL Fit into Foreign Language Teaching? In: Talk given at CALICO Conference. University of Hawaii, 2006.

3. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 2017, vol. 5, p. 135–146.

4. Brill, E., Moore, R. An Improved Error Model for Noisy Channel Spelling Correction. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000, p. 286–293.

5. Chernigovskaya T., Gor K. The Complexity of Paradigm and Input Frequencies in Native and Second Language Verbal Processing: Evidence from Russian. Language and Language Behavior (Eds. Erling Wande & Tatiana Chernigovskaya), 2000, p. 20–37.

6. Church, K., Gale, W. Probability scoring for spelling correction. Statistics and Computing, 1991, vol. 1, p. 93–103.

7. Dickinson, M., Herring, J. Developing Online ICALL Resources for Russian. The 3rd workshop on innovative use of NLP for building educational applications, Columbus, OH, 2008, p. 1–9.

8. Granger, S. From CA to CIA and back: An integrated contrastive approach to computerized bilingual and learner corpora. In: Languages in Contrast. Text-based cross-linguistic studies, Lund University Press, 1996, p. 37–51.

9. Granger, S. Learner Corpora in Foreign Language Education. In: Language, Education and Technology, 2017, p. 1–14. DOI 10.1007/978-3-319-02328-1_33-1.

10. Harris, Z. Distributional Structure. WORD, 1954, vol. 10, iss. 2–3, p. 146–162. DOI 10.1080/00437956.1954.11659520

11. Heift, T., Nicholson, D. Web delivery of adaptive and interactive language tutoring. International Journal of Artificial Intelligence in Education, 2001, vol. 12 (4), p. 310–325.

12. Kernighan, M., Church, K., Gale, W. A Spelling Correction Program Based on a Noisy Channel Model. COLING-90, 1990, p. 205–210. DOI 10.3115/997939.997975.

13. Kopotev, M. Introduction to Corpus linguistics: Course-book for students of arts subjects with emphasis on the Russian language. Praha, Animedia, 2014. (in Russ.)

14. Kutuzov, A., Kuzmenko, E. WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models. Ignatov D. et al. (eds) Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, 2017, vol. 661. Springer, Cham.

15. Leacock, C., Chodorow, M., Gamon, M., Tetreault, J. Automated Grammatical Error Detection for Language Learners, 2nd ed. Synthesis Lectures on Human Language Technologies. 2014, vol. 7, p. 1–185. DOI 10.2200/S00562ED1V01Y201401HLT025

16. Nagata, N. An Effective Application of Natural Language Processing in Second Language Instruction. CALICO Journal, 1995.

17. Paquot, M., Jarvis, S. Learner corpora and native language identification, 2015. DOI 10.1017/CBO9781139649414.027.

18. Rudzewitz, B., Ziai, R., De Kuthy, K., Möller, V., Nuxoll, F., Meurers, D. Generating Feedback for English Foreign Language Exercises. Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 2018, p. 127–136.

19. Shannon, C. A Mathematical Theory of Communication. Bell System Technical Journal, 1948, vol. 27, p. 379–423.

20. Shavrina, T., Shapovalova, O. To the methodology of corpus construction for machine learning: «Taiga» syntax tree corpus and parser. Proceedings of international conference CORPORA2017, 2017, p. 78–84.

21. Sorokin, A., Baytin, A., Galinskaya, I., Rykunova, E., Shavrina, T. SpellRuEval: the First Competition on Automatic Spelling Correction for Russian. Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference “Dialogue”, 2016, p. 660–673.

22. Valdes, G. The teaching of heritage languages: an introduction for Slavic-teaching professionals. Slavica, Bloomington, 2000, p. 375–403.

For citations:

Vyrenkova A.S., Smirnov I.Yu. A New Approach to Automatic Detection and Correction of Derivational Errors in L2 Russian. NSU Vestnik. Series: Linguistics and Intercultural Communication. 2021;19(3):57-68. (In Russ.) https://doi.org/10.25205/1818-7935-2021-19-3-57-68

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 1818-7935 (Print)
