Vol 17, No 1 (2019)
APPLIED AND COMPUTATIONAL LINGUISTICS
Combinatorics of Rhyming Variants in Sonnets by R. M. Rilke: A Quantitative and Typological Approach
5-20 209
Abstract
A combinatorial analysis of a German sonnet corpus, grounded in the mathematical study of verse, was carried out on the poetry of Rainer Maria Rilke. A classification of sonnet rhyme schemes was constructed from the symmetry relations in their quatrain and tercet parts. Within this classification, which leaves aside the so-called irregular sonnets, the concepts of symmetric, partially symmetric and asymmetric sonnets are introduced, and the possible number of sonnets of each type is calculated combinatorially. These notions reveal the poet’s unique combinatorial potential and make it possible to trace the evolution of Rilke’s sonnet structure from “New Poems” (“Neue Gedichte”) to “Sonnets to Orpheus” (“Sonette an Orpheus”). The notions of symmetric and independent sonnets are introduced. Using these notions, it is shown that “New Poems” contains only independent sonnets, that is, sonnets whose rhyming variants do not recur, whereas “Sonnets to Orpheus” does not observe this principle. The nucleus of the corpus, consisting of those independent sonnets that generate the whole body of Rilke’s sonnets (i.e., the entire collection of independent and dependent sonnets), has been determined. Eight sonnets adjacent to “Sonnets to Orpheus” are also included in the corresponding calculations. Special attention is paid to symmetric sonnets. The structure of all 18 of Rilke’s symmetric sonnets is described; 11 of them are independent (four in “New Poems” and seven in “Sonnets to Orpheus”). As an example, a stylometric analysis of the rhythmic field of one symmetric sonnet was performed, showing that its internal rhythm is close to the structure of the golden section. A hypothesis is put forward concerning the influence of rhyming variants on the external and internal rhythms of the sonnet.
21-48 256
Abstract
Linguistic corpora and computer technologies make it possible to search and analyze large amounts of unstructured text. This paper describes in detail the method we used to extract adjectives of color from poetic texts in the Poetry corpus of the Russian National Corpus (RNC) and from various Internet sources. Using a base of 180 lexical units extracted from the poetic texts of 36 authors, we devised a categorization scheme for adjectives of color; this scheme also incorporated data obtained from the Hermitage Museum information system. It includes five categories, with the largest (“Derivatives”) broken down into three subcategories. The paper further provides quantitative data on how well the elements of the different categories are represented in the texts, and from these data we were able to draw a preliminary conclusion about the use of adjectives of color by various authors. Specifically, we compared the frequency of use of basic adjectives of color (белый - white; чёрный - black; красный - red; зелёный - green; жёлтый - yellow; синий - blue; голубой - light blue; коричневый - brown; оранжевый - orange; розовый - pink; and фиолетовый - violet) in the texts of four corpora of the Russian National Corpus, i.e., the Basic corpus, Newspaper corpus, Oral corpus, and Poetry corpus. The paper describes some patterns of frequency behavior for adjectives denoting colors in Russian poetic texts. We arranged the adjectives retrieved from each corpus in order of decreasing frequency and found that, in all four corpora, the order of the adjectives of color largely correlates with the evolutionary theory of Berlin and Kay, which describes the order in which color terms appear in the historical development of different languages. The comparison showed that the frequency of adjectives of color in the Poetry corpus is significantly higher than in the other three corpora.
In addition, we searched the information system of the State Hermitage Museum and established that the correlation between the frequency of adjectives of color and the Berlin and Kay evolutionary model is weaker there than in the RNC corpora. In the course of our study, we also found a few semantic tagging errors in the Russian National Corpus. The patterns of frequency behavior of color adjectives revealed in Russian may become a reliable basis for further research. Their classification also needs more attention.
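The comparison between a frequency-ranked list and a reference order, as described in the abstract above, can be illustrated with a rank correlation. The following is a minimal sketch, not the authors' procedure: the abstract does not say which correlation measure was used, so Spearman's coefficient is an assumption, the function names are hypothetical, the counts are invented placeholders rather than corpus figures, and the reference order is only one possible strict linearization of the Berlin and Kay stages.

```python
def rank_by_frequency(counts):
    """Return the items sorted by decreasing count (ties broken alphabetically)."""
    return sorted(counts, key=lambda word: (-counts[word], word))


def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two strict orderings of the same items."""
    if len(set(order_a)) != len(order_a) or set(order_a) != set(order_b):
        raise ValueError("orderings must contain the same distinct items")
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two items")
    position_in_b = {item: i for i, item in enumerate(order_b)}
    # Sum of squared rank differences between the two orderings.
    d_squared = sum((i - position_in_b[item]) ** 2 for i, item in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Invented placeholder counts, NOT figures from the Russian National Corpus.
toy_counts = {"белый": 50, "чёрный": 40, "красный": 30, "зелёный": 10, "жёлтый": 20}
# Assumed linearization of the first Berlin and Kay stages (green/yellow order is arbitrary).
reference_order = ["белый", "чёрный", "красный", "зелёный", "жёлтый"]

rho = spearman_rho(rank_by_frequency(toy_counts), reference_order)
```

A value of rho close to 1 would indicate that the frequency ranking largely follows the reference order; the formula used here assumes no tied ranks.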
49-64 267
Abstract
Articulatory and acoustic variation has recently become one of the most prominent topics in phonetics. The acoustics of fricatives, sibilants in particular, is nevertheless known to be very difficult to study. As far as we know, no algorithm has yet been created to estimate the degree of acoustic variation of fricatives, either between speakers of one language or between languages. In this article, we try to estimate the degree of acoustic variability of the alveolar sibilant s. We aimed to create and evaluate an algorithm that estimates interlanguage variability and makes data from different languages comparable. We were also interested in estimating the correlation between the phonological complexity of the sibilant subsystem and the variability of s. We analyze s in similar contexts pronounced by several female speakers of five unrelated languages of Russia: Adyghe, Nanai, Russian, Udmurt and Chukchi. All pronunciations were manually annotated, and spectral information was then automatically extracted and transformed via Linear Predictive Coding. The resulting spectral slices were analyzed and ten different features were extracted: frequency of the first peak (in Hertz and Bark), slope of the linear regression over the values from the global minimum before the peak (annotated manually) to the peak itself, center of gravity (in Hertz and Bark), standard deviation (in Hertz and Bark), skewness and kurtosis. Since it is hard to analyze all these features separately, we used Principal Component Analysis to reduce the number of variables. The first two principal components together explain 65 % of the variance. Finally, we show how the obtained principal components can be used to measure variability and to compare different utterances of alveolar sibilants.
As a result, we achieved the goals we had set: 1) we developed an algorithm for variability analysis that could be used in other fields of acoustics; 2) our analysis shows that some speakers can be more variable than whole languages; 3) applying the algorithm to our data shows that Nanai and Chukchi are more variable than the other languages. These two languages also have the least complex sibilant subsystems.
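Four of the features named in the abstract above (center of gravity, standard deviation, skewness and kurtosis) are the standard spectral moments of a spectral slice. The sketch below shows one common way to compute them, treating the normalized power spectrum as a probability distribution over frequency. It is an illustration under assumptions, not the authors' implementation: the function name is hypothetical, the weighting by power and the use of excess kurtosis (minus 3) are choices the abstract does not specify, and the LPC smoothing and Bark conversion are omitted.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (center of gravity, standard deviation, skewness, excess kurtosis).

    freqs_hz: frequencies of the spectral bins in Hertz.
    power:    non-negative power value for each bin (same length).
    """
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum must contain some energy")
    # Normalize the power values so that they sum to 1.
    weights = [p / total for p in power]
    cog = sum(f * w for f, w in zip(freqs_hz, weights))
    variance = sum((f - cog) ** 2 * w for f, w in zip(freqs_hz, weights))
    sd = math.sqrt(variance)
    if sd == 0:
        raise ValueError("spectrum has zero spread; higher moments are undefined")
    skewness = sum((f - cog) ** 3 * w for f, w in zip(freqs_hz, weights)) / sd ** 3
    # Excess kurtosis: 0 for a Gaussian-shaped spectrum (an assumed convention).
    kurtosis = sum((f - cog) ** 4 * w for f, w in zip(freqs_hz, weights)) / sd ** 4 - 3
    return cog, sd, skewness, kurtosis
```

Feature vectors of this kind, one per utterance, would then be the input to the Principal Component Analysis step mentioned in the abstract.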
65-77 182
Abstract
The paper examines the role of semantic information (in the form of semantic relatedness measures) in coreference resolution for Russian. It describes a series of experiments that calculate semantic relatedness metrics on Russian material and evaluate both the possibility of using them in natural language processing systems and the performance of such systems. The goal of the first stage of experiments was to find out which semantic relatedness measures correspond better to coreference relations between referential expressions. For this purpose, several metrics calculated from different parameters were chosen and evaluated on a test set derived from the Russian coreference corpus RuCor. Semantic data for the metrics were obtained from two sources: Russian Wikipedia and the RuThes thesaurus. The results showed that RuThes provided more reliable data for common nouns, while Wikipedia data correlated better with named entities. Based on these results, the metrics that corresponded most closely to coreference relations were selected for the next stage of experiments. For the second stage, a machine-learning coreference resolution system that can use semantic relatedness measures as features was developed on the basis of the decision tree classification algorithm. Four versions of the system were tested: one without any features derived from semantic information, two with features derived from only one of the sources, and one with features derived from both sources. Tests were performed on a subset of the RuCor corpus that already included gold-standard mark-up as the basis for evaluation. The tests showed a noticeable improvement for the version that used semantic information from both data sources. The experiments demonstrate that features based on semantic information increase the quality of coreference resolution.
The results obtained are comparable to or exceed the ones described in similar papers on the topic of Russian coreference resolution.
78-89 245
Abstract
This paper presents an overview of the existing approaches, in Russia and abroad, to the compilation of minimal vocabulary lists. Special attention is paid to the English and German scholarly traditions. The purpose of the overview is to track and compare the underlying lexicographical and linguo-didactic trends from the beginning of the 20th century until now and to define the criteria for a list that would meet the expectations of the modern user. The first section of the article defines the notion of a lexical minimum and introduces the parameters of comparison for the wordlists under discussion. By lexical minimum we understand not only a list for foreign learners but, more broadly, any wordlist compiled by minimizing the lexicon on the basis of statistical, pragmatic or mixed criteria. The wordlists are compared across four parameters: purpose of the list (general service list, theoretical, linguo-didactic), approach to compilation (statistical, pragmatic, mixed), sources of data (corpora, textbooks, questionnaires etc.) and text coverage in percent. The second section discusses the existing approaches to compiling lexical minima in Russia, with emphasis on the pedagogical aspects that are prominent in the Russian tradition. The third section discusses the approaches used in the German and English lexicographical traditions, focusing on the problem of defining core vocabulary and compiling general service lists. The closing section of the article compares the Russian and foreign traditions and summarizes the overview. The overview suggests that creating minimal vocabulary lists requires a combination of statistical and communicatively oriented methods. In addition, given the recent development of large corpora, a new challenge arises: to provide a stylistically diverse and balanced corpus, or a number of corpora, that would serve as a proper basis for a vocabulary list.
Thus, in addition to the fiction texts, authors should include oral corpus data, as well as newspaper, art and academic sources, and internet speech.
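Text coverage in percent, one of the four comparison parameters named in the abstract above, is usually understood as the share of running words in a text that belong to the wordlist. A minimal sketch under that assumption follows; the function name is hypothetical, matching is done on lowercased word forms without lemmatization, and the surveyed wordlists may define coverage differently.

```python
def text_coverage(tokens, wordlist):
    """Percentage of running tokens whose lowercased form is in the wordlist."""
    if not tokens:
        raise ValueError("the text must contain at least one token")
    vocabulary = {word.lower() for word in wordlist}
    covered = sum(1 for token in tokens if token.lower() in vocabulary)
    return 100.0 * covered / len(tokens)


# Toy example: three of the five tokens ("the", "cat", "the") are in the list.
coverage = text_coverage("the cat saw the dog".split(), ["the", "cat"])
```

In practice the same wordlist would be evaluated against several corpora of different styles, which is the balance problem the abstract raises.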
90-101 200
Abstract
Traditionally, the analysis of texts and literary extracts is associated with the individual perception and creative interpretation of the researcher. Although many software applications are being created for multidimensional text analysis (authorship verification, identification of word compatibility, the emotional coloring of the text, etc.), the conservative approach prevails. However many types of software exist for the Humanities (such as sociology, psychology and management), they are still rarely used in philological research to interpret a text. The article provides a critical review of technology-enhanced philological analysis. Various computer programs were successfully tested on Russian and foreign literature and later used to train future specialists at the Humanities department (National Research University “Higher School of Economics”, Nizhny Novgorod). The accumulated experience allowed us to obtain reliable data and perform a comparative analysis of computer programs such as the AntConc concordancer, LEKTA, LF Aligner, and TextAnalyst SDK, a multifunctional content analyzer. The article describes a survey conducted at NRU HSE to address the need to move from the traditional collection of authentic materials and analysis of various discourses to the widespread use of web-based tools designed to accelerate this process and meet the contemporary requirements of the Digital Humanities. Brief summaries of term papers and theses devoted to various software-based methods of text analysis are considered and described. Specific examples emphasize the need for a selective approach to publicly available linguistic corpora and highly specialized concordance programs, as well as to tools for aligning translated texts and frequency analyzers.
However, the use of computer tools in text analysis should be combined with and complemented by traditional methods of linguistic and stylistic interpretation. The purpose is to achieve the necessary balance in the use of software resources, which can mean optimizing the comparative analysis of original or translated texts and defining new criteria for assessing the representativeness and validity of the results of literary or linguistic studies. Although the proposed bridging of the gap between traditional and innovative techniques of text interpretation is rather novel, computer-assisted text analysis is promising and is moving into the mainstream of comparative linguistic and stylistic research.
LINGUOCULTURAL COGNITIVE STUDIES AND PSYCHOLINGUISTICS
102-114 203
Abstract
The article examines lyubopytstvo ‘curiosity’ as a language-specific word in order to show its specific conceptual configuration in Russian linguistic consciousness. The Russian National Corpus is an appropriate source for this purpose, because the conceptual configuration of a concept is not present in a “finished” form in any single utterance but can be reconstructed only from the totality of all possible utterances. It can be manifested in many different ways: distribution, the ability to accumulate some Russian “key ideas”, and a predisposition to be associated with certain emotional attitudes, concepts, and propositional and metaphorical models. According to the Russian National Corpus, curiosity is usually felt for everything that may be of interest and that defies the imagination: another person’s life, news and politics, death, foreign countries and foreigners, the origins and workings of the universe, the salary of a friend’s husband, danger and suspense, someone’s life stories, scientific discoveries, etc. In different contexts, curiosity is defined in relation to interest, surprise, excitement, hope, desire, idleness, sin, etc. This allows us to reconstruct some conventional situations of curiosity, as well as related feelings, acts, opinions and axiological norms, in conformity with different “conceptual schemas” of curiosity as a cognitive interest, boredom, idleness, or sin. The propositional model shows that the predicates applied to lyubopytstvo ‘curiosity’ vary with its position in the syntactic structure of the proposition. As a semantic object, curiosity is felt, constrained, excited, masked, and satisfied; as a semantic subject, it appears, covers, grows, and encourages. In the metaphoric mapping, lyubopytstvo ‘curiosity’ is redefined across categorical boundaries in terms of a propositional model appropriate to an inner voice, a human being, a living creature, an inevitable force, or a flammable mixture.
By analogy with an inner voice, it calls, tells and counsels; by analogy with a living creature, it wakes up and emerges; by analogy with a beast, it gnaws and bites; by analogy with an inevitable force, it covers, overcomes, leads and wins; by analogy with a highly flammable mixture, it inflames and burns. Such use becomes so common that native speakers no longer notice metaphorical expressions like curiosity killed someone or to burn and consume oneself in curiosity but take them almost for literal characteristics of curiosity.
115-124 195
Abstract
The article aims to reveal some features of the Ostrogothic language of the 6th century that are preserved in the Latin text of the Getica at the mental level of text formation. Written by a Goth who was not a native speaker of Latin, this text is bound to contain Ostrogothic language features that can be observed thanks to the phenomenon of linguistic interference. The mother tongue inevitably influences text formation in a foreign language, and its manifestations can be traced in accordance with the principles of linguistic reconstruction. Purpose. The article seeks to show that the procedure of linguistic reconstruction can be extended in line with the achievements of contemporary linguistics, including the theory of text and its mental foundations, such as textual mentality. The article deals with one reconstructed manifestation of the Ostrogothic language of the 6th century, namely Jordanes’ tropology. Tropology seems to be most sensitive to mental interpretation, which results in its specific realization in the text. It is argued that the tropology of the Latin text of the Getica is conditioned by the native Gothic mentality and could equally have been verbalized with the corresponding Gothic lexical equivalents had Jordanes written the Getica in Gothic. According to the article, this type of mentality is located at the preverbal level of text formation: when describing a non-verbal situation, the author selects the elements of the situation that are important from his or her viewpoint and verbalizes them by means of the lexical resources of the language in use. Jordanes’ tropology can thus be regarded as his choice of elements for the figurative description of a textual situation. Results. The analysis shows that Jordanes’ tropology is rather restricted in scope and is confined to a small number of spheres of the outer world: the human body, plant resources, inanimate objects, facts relating to animals, landscape, water, a few natural phenomena such as lightning and storms, and very few abstract phenomena. Nearly all the Latin lexemes that Jordanes uses as the foundation of his tropology have Gothic lexical equivalents in the Gothic Bible, which to this day remains the only source of Gothic language material. The Gothic lexemes equivalent to the Latin text variants are presented in the article as a structure relating to the so-called linguistic picture of the world, which could underlie the figurative manner in which native Gothic speakers, and Jordanes in particular, interpreted the outer world. Conclusion. The article emphasizes that the described level of Jordanes’ tropology may primarily be indicative of his linguistic personality; on the other hand, his linguistic personality cannot be isolated from the speech practice characteristic of the Ostrogothic language of that period. The facts presented in the article should be regarded as one sphere of Gothic text formation inherent in its natural mode of functioning, since the Gothic Bible, being a word-for-word translation from Ancient Greek, is not illustrative in this regard.
125-133 204
Abstract
The article explores the potential of an English language corpus for developing intercultural awareness, a topical issue discussed in various domains of the humanities. Successful intercultural communication depends on mastery of both language and speech etiquette. For speech etiquette standards, non-native speakers who teach foreign languages may refer to teaching support kits, dictionaries, reference textbooks, and academic research. Communicating with native speakers and reading general fiction may also be of use. Nevertheless, contradictory recommendations raise concerns about the reliability of information sources on the stylistic and cultural marking of linguistic units. Because it reflects the actual usage of linguistic units rather than the biased opinions of native speakers, and because it is a representative, unified, and well-structured body of data, a language corpus may serve as a reliable information source for foreign language teaching. For this study, the British National Corpus was chosen, as it represents the British English of the late 20th century. The apology formula pardon was used as an example to test whether corpus data can identify the social class marking of discourse. The research aimed to confirm or refute the opinion that the apology formula pardon is characteristic of the lower-middle and upper-middle classes; its frequency was therefore analyzed. The corpus shows that the upper social classes use pardon more than twice as frequently as the other classes (303.47 vs 139.54 cases per million). In addition, there is a distinct trend: the formula is used more frequently with each successive social class.
Although polysemy and insufficient annotation of the materials impose some constraints, using a language corpus as a tool for teaching intercultural communication is considered quite practicable.
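The figures quoted in the abstract above are normalized frequencies (cases per million). The sketch below shows how such a normalization and the comparison between two groups can be computed. The function names are hypothetical; only the two per-million rates come from the abstract, and the raw hit counts and subcorpus sizes behind them are not given there.

```python
def per_million(hits, corpus_tokens):
    """Normalized frequency: occurrences per one million tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus size must be positive")
    return hits * 1_000_000 / corpus_tokens


def rate_ratio(rate_a, rate_b):
    """How many times more frequent rate_a is than rate_b."""
    if rate_b <= 0:
        raise ValueError("the reference rate must be positive")
    return rate_a / rate_b


# Rates reported in the abstract (cases per million): upper classes vs other classes.
ratio = rate_ratio(303.47, 139.54)
```

Normalizing by subcorpus size matters here because the social class subcorpora of a corpus are rarely the same size, so raw counts alone would not be comparable.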
134-148 447
Abstract
Despite the abolition of discriminatory laws and practices in the United States and the subsequent social and legal measures, racism continues to be one of the most acute problems of American society. Modern racism is expressed in new, specific forms that are often veiled and difficult to distinguish (covert, subtle racism), including excessive politeness and the specific use of euphemisms. As a means of expressing basic ideological positions, language also serves as a powerful tool for influencing public consciousness. The article reveals the ways in which racial ideology is reproduced and enforced through language. The opposition of the social constructs “whiteness” and “blackness” is directly reflected in linguistic phenomena, for example in political correctness, which has become an indispensable part of language practice in Western society. The author also examines the content of colorblind ideology. In contrast to the policies of multiculturalism, the concept of colorblindness does not recognize any differences between racial and ethnic groups. However, in American discourse this ideology is often associated with covert racism, since it leads to the silencing of existing racial problems and makes any mention of race and racism taboo. It is suggested that American English is ideologically linked to the categories of race in the minds of its speakers. Using the example of African-American culture and African American English (AAE), the author conducts a sociolinguistic analysis of such phenomena as “linguistic appropriation” and “linguistic discrimination”. Diametrically opposed problems characterize the American reality: on the one hand, preserving the linguistic and cultural identity of ethnic minorities; on the other, integrating them into American society. Drawing on W. E. B. Du Bois’s concept of “double consciousness”, the author argues that this dualism, as one of the most important consequences of slavery and segregation, characterizes both African Americans and European Americans.
149-160 212
Abstract
The article presents a psycholinguistic analysis of the changes that the image of war has undergone in Russian linguistic consciousness. Studying the word war through psycholinguistic associative experiments gives access to the meanings that typical native speakers assign to the word, as well as to the images that arise in their linguistic consciousness. A comparative analysis of Russian associative databases compiled in different years made it possible to reveal the dynamics of the connections formed by the word war at the macro level (in the core of linguistic consciousness) and at the micro level (in the associative field that includes reactions to the stimulus word war). A comparison of Russian and European (British, American, French and Spanish) associative databases revealed the common and the specifically Russian features of the semantic structure of the image of war in the linguistic consciousness of the nations under analysis. An associative experiment with Russian military servicemen helped reveal professional features of the associations evoked by the word war, primarily a higher relevance of the concept war, evidenced by a large number of associative connections. In all the associative databases under analysis, the most frequent reactions to the stimulus war are the words peace and death. However, the European associative databases show a stable high frequency of the reaction peace over the years, while in Russian linguistic consciousness the war - death association has become more frequent over the last forty years. This phenomenon may be related to the variable meaningfulness of both members of the universal dichotomy war - peace, as evidenced by the etymological analysis undertaken in this research. The word peace in various languages originally meant “harmony, integrity” rather than “the absence of war”.
This original meaning seems to correlate more closely with the primary meaning of the concept of war in Romance and Germanic languages (“confusion”) than with the term’s historical denotation in Slavic and Baltic languages (“hunting, stalking”). The results of the psycholinguistic analysis do not reveal any signs of heightened militarism in Russian mass consciousness. War is perceived by native Russian speakers as a negative phenomenon, which suggests that peace-loving traits predominate in the Russian national character.
ISSN 1818-7935 (Print)