NSU Vestnik. Series: Linguistics and Intercultural Communication

Using Semantic Relatedness Measures in Coreference Resolution for Russian

https://doi.org/10.25205/1818-7935-2019-17-1-65-77

Abstract

This paper examines the role of semantic information, in the form of semantic relatedness measures, in coreference resolution for Russian. It describes a series of experiments that compute semantic relatedness metrics on Russian material and evaluate both whether these metrics can be used in natural language processing systems and how such systems perform. The first stage of experiments aimed to determine which semantic relatedness measures correspond best to coreference relations between referential expressions. To this end, several metrics calculated from different parameters were selected and evaluated on a test set derived from the Russian coreference corpus RuCor. Semantic data for the metrics came from two sources: Russian Wikipedia and the RuThes thesaurus. The results showed that RuThes provided more reliable data for common nouns, whereas Wikipedia data correlated better with named entities. The metrics that corresponded most closely to coreference relations were selected for implementation in the second stage. For that stage, a machine-learning coreference resolution system was developed, based on the decision tree classification algorithm and able to use semantic relatedness measures as features. Four versions of the system were tested: one without any features derived from semantic information, versions with features derived from only one of the sources (one per source), and one with features derived from both sources. Tests were run on a subset of the RuCor corpus that already includes gold-standard mark-up, which served as the basis for evaluation. The version that used semantic information from both data sources showed a noticeable improvement. The experiments demonstrate that features based on semantic information improve the quality of coreference resolution. The results are comparable to or better than those reported in similar work on Russian coreference resolution.
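
As an illustration of what a thesaurus-based relatedness feature might look like, the sketch below computes Wu-Palmer similarity over a toy hypernym hierarchy. The abstract does not say which measures were selected, so the choice of Wu-Palmer, the hierarchy itself, and the names PARENT, wu_palmer and pair_features are all assumptions made for this example; none of them are taken from the paper, RuThes or RuCor.

```python
# Toy hypernym hierarchy (child -> parent). Hypothetical data for
# illustration only; it is not taken from RuThes.
PARENT = {
    "entity": None,
    "person": "entity",
    "organization": "entity",
    "politician": "person",
    "president": "politician",
    "minister": "politician",
    "company": "organization",
}


def path_to_root(concept):
    """Return the concepts from `concept` up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_of_a = set(path_to_root(a))
    # Walking up from b, the first concept that also subsumes a is the
    # lowest common subsumer (LCS).
    lcs = next(c for c in path_to_root(b) if c in ancestors_of_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def pair_features(head_a, head_b):
    """Assumed feature dictionary for one mention pair (head nouns only)."""
    return {
        "wu_palmer": wu_palmer(head_a, head_b),
        "same_head": head_a == head_b,
    }


if __name__ == "__main__":
    print(pair_features("president", "minister"))
    print(pair_features("president", "company"))
```

In a mention-pair setup, a dictionary like the one returned by pair_features could be passed, alongside other features, to a decision tree classifier that decides whether two mentions corefer.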

About the Author

I. L. Azerkovich
National Research University Higher School of Economics
Russian Federation


For citations:


Azerkovich I.L. Using Semantic Relatedness Measures in Coreference Resolution for Russian. NSU Vestnik. Series: Linguistics and Intercultural Communication. 2019;17(1):65-77. (In Russ.) https://doi.org/10.25205/1818-7935-2019-17-1-65-77


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7935 (Print)