Using Semantic Relatedness Measures in Coreference Resolution for Russian

I. L. Azerkovich

doi:10.25205/1818-7935-2019-17-1-65-77


Abstract

The article describes a series of experiments investigating the role of semantic information in coreference resolution for Russian, its use in automatic text analysis systems, and the evaluation of their performance. The first stage of the experiments aimed to determine which measures of semantic relatedness between referring expressions correspond most closely to the coreference links between them. The measures were computed on data from the Russian Wikipedia and the RuThes thesaurus. In the second stage, an automatic coreference resolution system was developed that uses semantic relatedness measures as features for machine learning, and its performance was evaluated. The results of the experiments identify the semantic relatedness measures suitable for use in coreference resolution systems and show that using semantic information improves the performance of such systems.

Keywords

natural language processing, coreference resolution, semantic relatedness measures, machine learning, Russian language

About the Author

I. L. Azerkovich

National Research University Higher School of Economics
Russia



For citation:

Azerkovich I.L. Using Semantic Relatedness Measures in Coreference Resolution for Russian. NSU Vestnik. Series: Linguistics and Intercultural Communication. 2019;17(1):65-77. (In Russ.) https://doi.org/10.25205/1818-7935-2019-17-1-65-77


Content is available under the Creative Commons Attribution 4.0 License.

ISSN 1818-7935 (Print)


NSU Vestnik. Series: Linguistics and Intercultural Communication
