A
Based on parallel-corpus data, the report aims to reveal and analyze the main correspondence patterns between the Russian concept of “dusha” and its English equivalents, as well as the translation techniques used in both Russian-English and English-Russian renditions of fiction.
The paper examines Russian words and expressions which are used to speak about external forces affecting events and situations – such as sud’ba, promysel, providenie, rok, ne suzhdeno, ne sud’ba etc. They are compared to their English counterparts and certain parallels are drawn. The Russian concept of ne sud’ba is shown to be language-specific on the basis of linguistic criteria.
In constructing the computer thesaurus RussNet, valency frames are specified for lexicon units. The attributes of valencies make it possible to distinguish thesaurus synonymic sets and to disambiguate analyses in the text parser. Valency frame features are based on the statistically stable context markers that accompany the realisation of a lexical meaning in the text corpus. These features are morphological, syntactic, and semantic.
The paper discusses the automatic classification of corpus samples with unambiguous morphological annotation. The rough sorting of word contexts into lexical groups, i.e. semantic trees of the RussNet thesaurus, is a pre-processing stage that facilitates valency frame specification. The procedure relies on the distribution of morphology tags in the context “window” for lemmas from particular trees and on gathering them into distinguishable clusters. Preliminary results are discussed.
The automatic text processing system IDEOGRAPH is presented. It involves a formal grammar description of Russian (Rus4IR) and the computer thesaurus RussNet. A special extension of RussNet, valency frames, is used for syntactic and lexical disambiguation. These frames comprise descriptions of context markers that are statistically consistent in the text corpus.
The text fragments are interpreted in terms of proposition structures whose core component is a predicate with subject and object arguments; these refer to synonymic set ids in RussNet, which are linked by hyponymy into semantic trees. The inheritance of valency frame attributes is described with reference to the structure of three semantic trees. This device may be used for specifying phrase analysis, ranking output structures, and unifying arguments in inference.
B
A new approach to document clustering is described in the paper. A modified LSA/LSI algorithm underlies our clustering method, implemented in the "Galaktika-Zoom" search and analysis system. The main problem addressed by the approach is to separate a document corpus into groups (clusters) on the basis of topic similarity, i.e., the similarity of the documents' feature vectors. In contrast to the traditional LSA implementation, the base units of the clustering process are sets of words and word combinations (information portraits) preselected on a statistical basis. Elements of information portraits are lingual invariants that statistically distinguish a document sample.
Important semantic and pragmatic features of the hint as a language phenomenon are considered. To hint at something, a speaker can use both linguistic forms and non-verbal actions with non-standard semantics. It is necessary to distinguish between a genuine hint and a regular hint. In contrast to indirect speech acts, the use of hints presupposes an implicit way of communication.
The paper discusses the application of an instrumental environment for experiments with surface-syntactic analysis algorithms. The practice of rapidly debugging and implementing a set of surface-syntactic analysis algorithms in this environment is described.
The article concerns the typology and functions of illustrations in traditional Russian explanatory dictionaries and the role of the WWW in the selection of illustrative examples for the Dictionary of the varieties of urban Russian.
The development of a navigation system for information resources search on the basis of mutual mapping of their classification systems is described. A database is generated which contains all classes of seven wide-spread classifications. Tools for establishing semantic interconnections of classes have been developed and implemented.
The paper describes an attempt to calculate phonetic, morphological and lexical distances between Latvian dialects. An experiment using Levenshtein distance is followed by one with Wagner-Fischer distance. The results are compared, allowing for some important conclusions.
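As an illustration of the kind of measure involved, the following is a minimal sketch of unit-cost Levenshtein distance computed with the standard dynamic-programming table. The abstract does not state the cost settings, the transcription used, or how the second measure was configured, so the unit costs, the length normalization and the function names here are assumptions made for the example only.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Distance divided by the longer length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0
```

A dialect-to-dialect distance could then be obtained, for instance, by averaging `normalized_distance` over a list of corresponding word forms; that aggregation step is likewise an assumption rather than something the abstract specifies.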
The analytical search engine “Galaktika-ZOOM” provides automated extraction of key words from textual data, the so-called “information portrait”. Algorithmically, an infoportrait represents words and word combinations which are characteristic of the query text. In fact, the infoportrait is the query’s paradigmatic context, or the sample’s hypertext. Inside this information paradigm one can define sense syntagms which do not exist in the syntagmatic context; we refer to these as subtext.
Malapropism is a semantic error that replaces one content word with another one close in sound but having a different meaning. The paper discusses the results of an extended experiment that tests the earlier proposed method of malapropism detection and correction based on Internet statistics and a numerical Index of Semantic Compatibility.
The interactive approach is part of the dynamic models of speech. It is based on pragmatic principles and makes it possible to take into account the activity of both the Speaker and the Hearer in choosing proper words and forms. It is supposed that the Speaker imagines how the Hearer may understand the different variants that can express the necessary sense and chooses those that are easiest to understand.
The paper shows that the approach can be useful for describing rules of usage for some synonyms or for grammatical categories such as the Russian aspect. Still, linguists should turn to it rather seldom.
The paper describes four methods for automatic two-word term extraction from raw text based on occurrence frequencies and morphological templates. The paper reports on the results of the methods applied to texts from two different domains. A combined evaluation methodology is proposed; comparative evaluation results are provided.
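The combination of occurrence frequencies and morphological templates can be sketched as follows. The abstract does not describe the four methods themselves, so this is only one plausible baseline: it assumes the text has already been tokenized and part-of-speech tagged, and the tag names, the two templates and the `min_count` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical part-of-speech templates for two-word term candidates.
TEMPLATES = {("ADJ", "NOUN"), ("NOUN", "NOUN")}


def extract_two_word_terms(tagged, min_count=2, templates=TEMPLATES):
    """Return (pair, frequency) for adjacent word pairs matching a template.

    tagged: list of (word, pos_tag) tuples in text order.
    Pairs seen fewer than min_count times are dropped.
    """
    counts = Counter()
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1, t2) in templates:
            counts[(w1.lower(), w2.lower())] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


sample = [("neural", "ADJ"), ("network", "NOUN"), ("learns", "VERB"),
          ("a", "DET"), ("neural", "ADJ"), ("network", "NOUN"),
          ("model", "NOUN")]
terms = extract_two_word_terms(sample)
```

Raw frequency is the simplest ranking criterion; an association measure could be substituted for the count without changing the template filter.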
D
Deviatology is defined as a cognitive science that deals with deliberate and unintentional deviations from the norm within a wide field of human activity. Language deviatology is part of general deviatology, which includes the study of:
* planned deviations from the norm, such as neologisms, jokes, stylistic tropes
* non-planned deviations, such as slips of the tongue, lapses and speech errors.
This classification of deviations applies equally to errors in the mother tongue and to errors in foreign languages. Speech errors in the mother tongue, which, theoretically, should not occur at all, are the object of stylistics, while errors in foreign languages, where there are few occurrences of conscious deviation from the norm, are studied in interlanguage deviatology.
The problem of the relation between concepts and lexical senses became very practical for the development of ontologies intended for natural language processing. In the paper we consider the existing approaches to description of concepts and senses in various ontologies.
Experimental data presented in this study show that individual differences in working memory can account for variance in relative clause attachment preference in a three-site context. We discuss how parsing strategies can be affected by working memory constraints.
F
The paper presents a contrastive analysis of English, French and Russian intonation. Phonological markers of focus, topic, contrast and emotional emphasis are discussed. The analysis of the three languages reveals similar and different intonation patterns.
The paper describes webpage ranking algorithms based on page content, which are used to compute relevance in the Search@Mail.Ru system. Their effectiveness has been tested experimentally, and the results are given. The feasibility of using these algorithms to build full-scale text Web search systems is considered.
G
Issues of constructing functional semantic models for the transformations of nominative structures are considered within the framework of solving problems of French-Russian (and Russian-French) machine translation.
The analysis of structures and a block of multiple logical semantic rules are being developed, taking into account functional similarity and syntactic polysemy of nominative constructions, on the material of a focal sample of parallel texts in Russian and French.
The problem of meaning transfer is solved on the basis of an analysis of cognitive structures.
The modelling is conducted as part of a project to create a multilingual linguistic processor based on the functional semantic approach.
Aligned parallel corpora are very important linguistic resources that help in many computational linguistic tasks such as machine translation, automatic dictionary compilation, linguistic machine learning, etc. Nevertheless, there are very few available linguistic resources of this type, especially for fiction texts, due to the difficulty of obtaining the texts and the high cost of alignment. In this paper, we describe an English-Spanish parallel corpus compiled from fiction texts and evaluate how an alignment method based on linguistic data, namely on the use of bilingual dictionaries to calculate similarity, performs on fiction texts. The basic idea of the method is that if a meaningful word is present in the source text, then one of its dictionary translations should be present in the target text. Experimental results of alignment at paragraph level are given. The results show that this type of method is applicable to fiction texts as well.
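The basic idea stated above can be sketched as a simple similarity score: the share of source words with a dictionary entry for which at least one listed translation occurs in the candidate target paragraph. This is a minimal sketch under stated assumptions, not the authors' actual formula; the toy dictionary, the function names and the choice of the best-scoring paragraph are all hypothetical, and a real system would also lemmatize words and filter out function words.

```python
def dictionary_similarity(source_tokens, target_tokens, bilingual_dict):
    """Fraction of dictionary-covered source words translated in the target.

    bilingual_dict maps a lowercase source word to a list of lowercase
    target-language translations (a toy structure assumed for this sketch).
    """
    target = {t.lower() for t in target_tokens}
    known = [w.lower() for w in source_tokens if w.lower() in bilingual_dict]
    if not known:
        return 0.0
    hits = sum(1 for w in known if target & set(bilingual_dict[w]))
    return hits / len(known)


def best_match(source_par, target_pars, bilingual_dict):
    """Index of the target paragraph with the highest similarity score."""
    scores = [dictionary_similarity(source_par, t, bilingual_dict)
              for t in target_pars]
    return max(range(len(target_pars)), key=scores.__getitem__)


toy_dict = {"dog": ["perro"], "house": ["casa", "hogar"],
            "white": ["blanco", "blanca"]}
source = ["The", "white", "dog"]
targets = [["la", "casa", "grande"], ["el", "perro", "blanco"]]
aligned_index = best_match(source, targets, toy_dict)
```

Picking the single best candidate independently for each paragraph ignores ordering constraints; a full aligner would normally combine such scores with a search over monotone alignments.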
Some applications that work with natural language texts need a form of text representation that is a reasonable compromise between the wish to make the text shorter while preserving its main themes and the wish to retell the source text in more detail. Some degree of this compromise should be achieved in text abstracting when creating various stores of textual information, such as archives, personal libraries and so on. The paper discusses possible ways of achieving this compromise. The method has been implemented in the KONSPEKT software system.
A new version of the Nomenclature Analyzer software is presented. The software translates the systematic names of chemical compounds, given in the IUPAC nomenclature, into molecular graphs. The algorithm is based on the morphemic segmentation of the compound names into chemically meaningful components (morphemes).
The paper is devoted to the constructions of the topic in the Nominative case and those of chto kasaetsa and chto do types. Theoretical research in this field as well as the data of corpora research are taken into consideration. The results of psycholinguistic experiments show that even the smallest-scale formalization of the topic concerned considerably increases the agreement between the answers of the examinees in its identification.
The paper examines the differences between the government patterns of Russian prefixal derivatives and their non-prefixed counterparts. It discusses the causes that trigger changes of government patterns in prefixal derivation and offers a hypothesis that partly explains the transformation.
An approach to linguistic analysis is presented that assumes that the description of sentences should be viewed as a demonstration of how they can act as variations of previously produced sentences. Five principles play a central role in such demonstrations: substitution of arguments, permutation, lexical functions, grammatical functions and predicate-argument schema identity. Among the conclusions we draw is the observation that a grammar is not so much a system of grammatical rules (of the phrase structure variety) but rather a set of operations that allow us to relate arbitrary sentences to other sentences and ultimately a set of "elementary" predicate-argument structures.
I
In Russian, combinations of the demonstrative pronoun этот ‘this’ with proper nouns are of interest, since this determiner marks lower definiteness of a completely definite referent of the noun in the speaker’s world. Semantic and pragmatic properties of this construction are discussed.
An analysis is offered of syntactic properties of the Russian polysemous idiom ВСЁ РАВНО: всё равно 1 ≈ ‘all the same’, as in Я всё равно сижу дома ‘I am staying at home all the same’; всё равно 2 ≈ ‘makes no difference’, as in Нам всё равно, куда ехать ‘We don’t care where we’ll be going’; and всё равно 3 ≈ ‘tantamount’, as in Жаловаться на народ – всё равно что на климат ‘To complain about one’s people is equivalent to raving about the climate’.
In the structure of linguistic expertise dealing with the text, a key role is played by the notion of the communicatory utterance. The need for distinguishing between facts and evaluative data has shaped the author’s view of the modality as a pragmatic category.
K
The formalism of R-attributes permits representing and processing structural relations and linguistic rules associated with them as relational attributes of the entities they are relevant to. It is efficient for a wide range of syntactic generation problems, from computing valency models for occasional lexemes to lexico-syntactic transformations.
Automatic keyword spotting in continuous speech is of great importance for a number of applied tasks, most of which are connected with security systems and phone services. A keyword spotting system based on dynamic programming and speech synthesis is presented. We use the one-pass method, which ensures both a high rate of correct recognition and a low level of false alarms.
A new approach to initial signal processing of speech is presented. The approach enables the extraction and measurement of signal parameters responsible for the perception of sounds of speech.
Co-ordination analysis is a required constituent of automatic syntactic analysis. We discuss sentence structure properties in conjunction reduction, i.e. co-ordination projectivity and recursiveness in Russian sentence structure.
These properties are of great importance for delimiting analysis zones when constructing co-ordinative and subordinating links between main and dependent clauses, dangling participles and other isolated sentence parts, when resolving the ambiguity of punctuation marks and coordinating conjunctions, and when constructing the segment graph.
A model of a conversational agent is introduced which consists of several modules and implements various kinds of knowledge. Knowledge representation is considered, including the definition of dialogue acts as frames and regular expressions that represent the structure of an information dialogue.
Forensic semantics as a type of forensic linguistics (FL) is aimed at revealing senses in the given text and analyzing them from different points of view. We propose to use Segmented Discourse Representation Theory [Asher, Lascarides 2003] to resolve problems of this type of FL.
In a computer role-playing game a player operates a virtual agent (game hero). During the game the player and his virtual hero experience successes and failures. In this paper we study a theoretical model for simulating the speech behaviour of the virtual agent, which enables the production of possible utterances in different game situations. We study the selection of utterances from a database and the semantic synthesis of utterances in emotional situations.
Direct and reverse linguistic processors for autobiographical data (job requests, Curriculum Vitae) written as natural language texts are considered. In such texts, a person provides information about himself or herself in a free form: first name, middle name, surname, birthday, address, time and place of education, job experience with its periods, positions, responsibilities etc. These data may be expressed in different ways. The objective of the direct linguistic processor is to extract the data, standardize them and link the objects: organizations with dates, job positions etc. This activity underlies the construction of knowledge structures. The objective of the reverse linguistic processor is to present these structures as natural language units (such as word combinations and sentences) and to map them onto the fields of a formalized questionnaire or a structured site.
A generalized communication act model is proposed, which includes a detailed description of all basic factors affecting the preparation and realization of verbal utterances. The model is aimed at the solution of applied problems of speaker identification and diagnostics by speech, reconstruction of the circumstances of verbal activity, and authenticity validation of phonograms.
Two alternative syntactic annotation schemes applied in the Helsinki annotated corpus of Russian texts HANCO are discussed in the presentation. Some problems arising during the application of one of them (viz. the traditional part-of-sentences doctrine) are discussed; some practical and theoretical deductions following from this experience are formulated.
The paper considers idioms of the semantic field IMPORTANCE – UNIMPORTANCE in Russian. Elements of meaning that are common for the whole semantic field, as well as those allowing a distinction between quasi-synonymic idioms are examined. The impact of the inner form of an idiom on its meaning is also considered. Definitions of some idioms of the semantic field are given. Statements on semantics of idioms are illustrated with plentiful examples of idiom usage in contemporary texts of various genres, as well as in conversation and on the Internet.
The work was aimed at designing and implementing a prototype intelligent system for semantic dictionary expansion. The dictionary is expanded through learning by examples, the primary procedure of the JSM method. A COM object that normalizes words in sentences is applied to text processing.
The problem of the equivalence of language structures in the source text and the translated text is considered. The main research objectives are to work out translation techniques for a number of basic language phenomena characteristic of scientific discourse and to create correct algorithms for semantic alignment of parallel texts and machine translation. The studies are based on material from Russian and English scientific periodicals. Emphasis is placed on the translation of Russian impersonal and indefinite personal constructions into English, of English nonfinite verbal constructions into Russian, and of other structures most frequent in scientific texts. Translation of expressive means, including metaphors, is also considered.
The paper formulates principles of evaluation of contemporary Internet information retrieval systems. The results of testing of six information retrieval systems by the method of depth of user search are given.
The paper is devoted to automated analysis of Russian verse with the STARLING software package. The aim is to describe the software tools and algorithms implemented in the system.
The academic lecture regarded as a kind of dialog is a suitable testing ground for the recognition of some peculiarities of gesture-speech interaction. Gesture strokes in lecturing organize the text, accentuate its units, represent some cognitive and psychological processes and thereby facilitate the audience's apprehension of the lecture.
Tasks of corpus linguistics being solved in the StarLing environment are: (1) converting a written text into a multi-level textual database (DB); (2) automatic and manual marking (tagging) of the DBs; (3) creating and correcting primary and secondary lexical DBs (supported by outer sources of data).
An adapted HITS algorithm for synonym search, the program architecture, and an evaluation of the program on test examples are presented. A program for the search of synonyms (and related terms) in a specifically structured text corpus (Wikipedia), Synarcher, was developed. Search results are presented in the form of a graph. It is possible to explore the graph and search graph elements interactively. The proposed algorithm could be applied to expand search requests and to compile synonym dictionaries.
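For orientation, the following is a minimal sketch of the standard HITS iteration on a small link graph. It shows the textbook algorithm only; the adaptation used in Synarcher is not described in the abstract and is not reproduced here, and the toy graph, the iteration count and the function name are assumptions.

```python
from math import sqrt


def hits(links, iterations=50):
    """Standard HITS on a directed graph.

    links: dict mapping a node to the list of nodes it links to.
    Returns (hub, authority) score dicts, each normalized to unit length.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages linking to a node.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum of authority scores of the pages a node links to.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical graph: two articles both link to a third one.
toy_links = {"A": ["C"], "B": ["C"], "C": []}
hub_scores, auth_scores = hits(toy_links)
```

In a synonym-search setting, candidate terms would presumably be drawn from the high-scoring pages around the query article, but how the scores are turned into a synonym list is not stated in the abstract.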
The paper deals with two types of adjective arguments and constructions. Arguments of the 1st type are common for the whole class of adjectives, arguments of the 2nd type are characteristic of concrete words.
The paper deals with the language structure, genre and communication acts as factors which influence the discourse-representation structure. The analysis is based on Selkup and German texts (in Russian translations) and on Russian texts.
Though discussed in various linguistic works, the factors and conditions causing secondary stress (SS) in Russian are still not quite clear. Moreover, certain aspects of SS are interpreted by different linguists in quite contradictory ways. To avoid and understand such contradictions it is important to analyze the nature and functions of SS in the Russian language.
L
The paper describes the approach, model and implementation of a multilevel qualifier-navigator built on responses of a full-text information retrieval system. An interface that makes it possible to refine the query is proposed. The interface, which implements the principle of Custom Search Folders, is designed on the basis of a definition of word affinity.
The paper considers the stability of information sources, focussing on news websites. A formula and an algorithm for computing the level of disorder of information from a source are offered. The practical importance of this parameter is validated.
This article presents part of a collective study of external possessor constructions in Russian. We claim that the use of these constructions (there are about ten of them) is determined by a combination of factors. In this paper we analyze one of the most important factors, namely the possessive relations (different semantic relations between possessor and possessee). Since there is no generally recognized classification of possessive relations, we propose a new one based on corpus research. We also present some important conclusions about the nature of the possessive construction and the semantics of the genitive.
To bring domain knowledge to bear when analysing natural text, we need to build a semantic representation (SemR) comparable with the given domain knowledge structures. Does this mean that linguistic analysis has to be different for every text specific to the given domain? Not necessarily. In our approach, the transition from linguistic SemR to conceptual units and relations specific to the domain passes through binary semantic relations (SemRel). The intended grammar consists of the basic list of SemRels plus transition rules.
Russian labile verbs (verbs that can be both transitive and intransitive) are analyzed. I show that, although labile lexemes are rare in Russian, it is possible to note certain regularities in their meaning. I also analyze the mechanisms which can make a verb labile.
An approach to automating the creation of bilingual dictionaries is considered. This approach reuses the work of translators: a bilingual corpus of parallel texts.
The paper is devoted to different strategies used by children while reporting someone’s thoughts and speech in narrative spoken discourse. The paper examines direct speech, indirect speech and some intermediate types of reported speech depending on syntax, grammar and intonation of those contexts in children’s night dream stories.
The paper offers a typological analysis of the peculiarities of the phonetic systems of the Belorussian, Polish and Russian languages. The results of this study are used as the basis for an approach to creating a unified phonetic-acoustical database for Multilanguage Slavonic Text-to-Speech Synthesis. Principles of creating and processing text and speech corpora for each of the languages are described.
M
The phenomenon of personal identity construction in Internet communication is addressed. An elaborate analysis of marginal semiotic elements in e-communication is developed. The main speech strategies of identity presentation are highlighted and exemplified within the genres of “chat” and the on-line diary, the so-called “blog”.
Formal methods of creating keyword sets for VINITI rubrics are discussed, including structural representation of statistical data, normalization of terms, synchronization of keyword lists compiled by the different experts, development of DB structure. Subject description of rubrics and term clusterization may be useful in the construction of search thesaurus for scientific and technical issues.
The paper aims to analyze, from an ontological perspective, the semantics of “naïve mechanics” in the Russian language. The research is centred on the Force Dynamics theory introduced by L. Talmy, which presents a specific semantic category. This category, being a generalization over the traditional linguistic notion of “causation”, is seen as a theoretical basis for building a piece of lexical ontology.
Most existing software for teaching foreign languages seems to consist of traditional exercises in computerized form. The aim of this paper is to show that a well-structured lexical database improves the use and performance of teaching materials of this kind.
This paper has two topics. First, the difficulty of translating Dostoevskij's prose from Russian into German and consideration of the differing translation strategies thereby adopted. Second, the problems that arise when translating Platonov from Russian into French, German, and English. In both cases, the issue is the necessity to transmit authorial combinatorial deviations, i.e. "estranged" utterances, where the sense of various common expressions merge and meld together.
The paper offers a discussion of “pattern” contexts exhibiting the use of lexemes in various meanings and the combinatorial properties of lexemes. Special attention is drawn to a comparative analysis of linguistic data presented in explanatory dictionaries and corpora of Russian. The results of the experiment allow procedures of syntagmatic analysis and semantic information extraction to be elaborated.
N
The paper is devoted to a feasibility study of the method of functional homonymy disambiguation on the basis of contextual rules in Russian. The state of the art of lexicographical resources and complicated cases of functional homonymy disambiguation are among the topics discussed.
O
The paper presents an experimental study that uses a corpus from the web-based communication portal www.rate.ee. This is the most popular website in Estonia, used by approximately one third of the Estonian population. Users can present themselves through special personal webpages, rate each other's pictures and create virtual social networks. Their motivation factors are communication and self-presentation with social feedback. The Rate.ee environment supports different social actions, calculates the "fame" (popularity) of users etc.
The author focuses on identity design and language characteristics in this environment, that is to say: which markers and features can be used for promoting a "virtual face" in the context of web-based communication.
P
There is a zero sign with deictic meaning which is called Observer and serves as the subject of secondary deixis. The Observer, as well as the speaker, has the right to identify objects, places and time points through their relation to himself and his present moment. Examples are given of verbs, adverbs, nouns and grammatical categories with semantics that presupposes the Observer.
The paper discusses negation in Russian, expressed by the prefix ne- within deverbal nouns (e.g., nejavka ‘non-appearance’, nevmeshatel’stvo ‘non-intervention’). We identify three semantic types of negated nouns, depending on the aspectual properties of the negated event and the context in which the negated nominal occurs. Negation within deverbal nominals is in many substantial characteristics close to the typical verbal negation in Russian.
The paper concerns the problem of the tools which can be used in a semantic metalanguage designed for describing the semantics of natural language (NL). Is it necessary to base such a semantic metalanguage on the natural language described, or may one rely on some universal inventory of meanings? Some doubts are cast upon the thesis that the semantics of an NL must be described on the basis of a limited sublanguage of that NL; this thesis is upheld in a number of semantic theories. The semantic metalanguage may be built on universal meanings, and this possibility is supported by the fact that even semantic metalanguages constructed on the basis of sublanguages of an NL cannot stay within the limits of that NL and include artificial elements. In the scientific apparatus for describing the surface levels of language structure – syntax, morphology and phonology – the specific character of language entities is not supposed to be mirrored by means of specific metalinguistic units oriented to the NL described.
The paper considers a linguistic processor for formalization of English text information in a natural language as a network component of an Internet project. Objectives of the linguistic processor, particularities of its English version, and network integration into the Internet portal are discussed.
Hesitation in Armenian can be expressed by a semantically bleached noun BAN ‘thing, deal, word’. BAN can serve as a “placeholder” that mirrors a grammatical marking of a temporarily postponed nominal or verbal constituent, thus showing that a speaker may narrow a paradigmatic class of the upcoming lexeme before the search for the particular word is completed.
The paper deals with the analysis of lexico-syntactic repetition as a way of coordinating speech behavior. The kind of repetition (direct vs. indirect) appears to correspond to the speaker’s social position with respect to the addressee.
The choice of the type of coordination (modal vs. cognitive) is found to depend on the position of a speech act in the structure of the discourse.
R
The lexico-semantic annotation in RNC is considered in the light of other semantically-labeled corpora, such as WordNet-oriented corpora or FrameNet. In order to reduce “noise” in semantic search we propose some agreements that concern the traditional concepts of lexical semantics and lexicography: polysemy, homonymy, and the hierarchy of word meanings.
The NLP Lab at Purdue University (NLPL) has co-founded and tested, in a number of applications, a knowledge- and meaning-based approach to NLP called ontological semantics (OS). Since 1999, NLPL has cooperated with CERIAS in applying the approach to information assurance and security (IAS) tasks. This paper addresses the question of why most researchers in NLP today, and the entire Semantic Web enterprise, are still pursuing non-semantic methodologies, even in response to RFPs with explicitly semantic and even ontological-semantic objectives. The paper offers some sociological, educational, and academic explanations for the "fear of semantics."
The paper looks at instances of divergence between a synchronic pattern of semantic extension resulting in slang, and the real history of slang meaning. The author arrives at the conclusion that multiple motivation of slang should be reflected in lexicography.
Quantitative data recognition is discussed. We describe an information extraction technology that is currently under development. The following topics are discussed: what counts as quantitative data in a text document; methods of presenting numerical data; the tasks that the analyzing algorithm is expected to accomplish; the dictionary support; software implementation and results.
The idea of a cue dictionary method for extracting information on various aspects of scientific text content (purpose, novelty, etc.) was formulated as early as the 1970s. The bottleneck of this technique is that compiling the dictionaries is a very cumbersome procedure. An automation technique is proposed for this process which substantially reduces the use of manual labor.
S
We present a system designed for use by a psychologist in the analysis of a specific type of text: emotional autoreflexive writing. On the basis of linguistic analysis, the psychologist can draw conclusions about a person's emotional state or personality type. The system has the following features: automatic morphological analysis and calculation of various statistical parameters (frequencies, lexical richness, etc.). Data on words with emotional connotations are given separately because these words reflect the person’s current condition. We implemented a mechanism for synchronizing body temperature measurements taken during writing with the resulting text. We also describe the application of the system in another field: the analysis of political discourse in Mexico.
The paper discusses an approach to text analysis based on an ontology of the subject domain. The main components of the ontology, in particular fact schemes, are described. The authors consider the construction of facts to be the primary goal of semantic analysis. A fact joins dictionary lexical objects found in the text and/or objects corresponding to ontology concepts already identified in the text. The semantic and syntactic compatibility of elements is used for the construction of facts.
The paper tackles principles for semantic annotation of image content. Annotations in XML form represent objects and their static composition in the image. They were written manually for several outdoor photos on the basis of a small fragment of the ontology developed by the authors. The ontology describes conceptual knowledge about objects within an image. The annotation schemes and the ontology proposed in this paper can be used for data mining in image collections or for natural language generation of image content descriptions.
The article introduces a formal model of syntax description. This model is a combination of two different approaches to syntax description: phrase structure grammar and dependency grammar (in the spirit of A.V. Gladky). The "Treeton" morpho-syntactic analysis system, working within this formal model, is described. The paper also deals with the syntactic analysis algorithm implemented in the system. To lower the number of hypotheses produced during the analysis, the algorithm uses a mechanism that penalizes syntax structures for undesirable elements. This mechanism is also described.
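The statistical parameters mentioned above (frequencies, lexical richness) can be illustrated with a short Python sketch. It makes two assumptions the abstract does not confirm: lexical richness is measured as the type-token ratio, and word forms are counted as they appear, without the morphological analysis the described system performs. The function name `text_statistics` is hypothetical.

```python
import re
from collections import Counter


def text_statistics(text):
    """Return token count, word-form frequencies and type-token ratio."""
    # Lowercase the text and keep runs of letters only; the pattern is
    # Unicode-aware, so Cyrillic or Spanish text is tokenized as well.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    frequencies = Counter(tokens)
    # Type-token ratio: number of distinct word forms over number of tokens.
    ttr = len(frequencies) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "frequencies": frequencies, "ttr": ttr}


stats = text_statistics("I was afraid, and I was angry.")
```

The type-token ratio depends on text length, so in practice it is only comparable across texts of similar size.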
A new interpretation of emotional interjections is proposed. The interjections are regarded as transliterations of the sounds of vocal gestures. A broad analysis of contexts reveals the set of symptomatic situations, the basic list of vocal gestures, and the lists of interjections that convey each vocal gesture in texts. This approach provides a basis for linguistic and lexicographic descriptions of emotive interjections in Russian and other languages.
This paper deals with idiomatic Russian phrases that estimate the sizes of objects, such as шириной в ладонь (the size of a palm), высотой с человеческий рост (as tall as a man), размером с дом (the size of a house) and others. More precise estimates use the preposition В ‘in’, whereas rough estimates and comparisons use the preposition С ‘like’. The paper describes the inner structure and usage of these two constructions and sets them apart from some similar expressions with the same prepositions.
The paper discusses various types of intertextuality in Russian jokes: direct quotations (among them modified quotations), “spot reference”, reference to complex plot units, reference to non-verbal semiotic objects. The most common sources of intertextuality are outlined.
T
A lexico-grammatical database (30 000 wordforms) of the dialect of Pustosha village (Moscow region, Shatura district) was created in the STARLING environment. The nuclear dialectal corpus (NDC), with full lexico-grammatical annotation (lemmatization), serves as a base for secondary databases (indexes).
A method of automatic text generation for real-time commentary on the dynamic sports competitions is described. The key features are flexible selection of the event to be commented upon and synthesis of the commenting string based on the appropriate phrase templates. An automatic commenting system developed for “Formula-1” races is overviewed.
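The two key features named above, event selection and template-based synthesis, can be sketched in Python as follows. This is an illustration under stated assumptions, not the system from the abstract: the event types, the priorities in `PRIORITY`, the phrase templates in `TEMPLATES`, and the function names are all hypothetical.

```python
import random

# Hypothetical priorities: a higher value means the event is more worth a comment.
PRIORITY = {"overtake": 3, "pit_stop": 2, "fastest_lap": 1}

# Hypothetical phrase templates for each event type.
TEMPLATES = {
    "overtake": [
        "{driver} passes {rival} on lap {lap}!",
        "Lap {lap}: {driver} gets ahead of {rival}.",
    ],
    "pit_stop": ["{driver} comes into the pits on lap {lap}."],
    "fastest_lap": ["{driver} sets the fastest lap so far."],
}


def select_event(events):
    """Pick the pending event with the highest priority."""
    return max(events, key=lambda event: PRIORITY.get(event["type"], 0))


def make_comment(event, rng=random):
    """Fill a randomly chosen template for the event type with event data."""
    template = rng.choice(TEMPLATES[event["type"]])
    return template.format(**event["data"])


pending = [
    {"type": "pit_stop", "data": {"driver": "Driver A", "lap": 12}},
    {"type": "overtake", "data": {"driver": "Driver B", "rival": "Driver C", "lap": 12}},
]
chosen = select_event(pending)
comment = make_comment(chosen, random.Random(0))
```

Keeping several templates per event type and choosing among them at random is one simple way to avoid repetitive commentary; a real-time system would also need to discard events that have become stale.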
An approach to automated third-person anaphora resolution is considered. Reference rules were obtained with the aid of machine learning methods. An accuracy of more than 60% was achieved.
The study has been carried out within the framework of research on personal voice cloning. The paper presents the results of an experiment evaluating the effect of compilation elements of different phonetic types (stressed/unstressed vowels, consonants) and of different levels (allophones and multi-phones) on the perception of the personal phonetic-acoustic characteristics of the voice in text-to-speech synthesis. Universal methods of subjective evaluation of synthesized speech quality (so-called MOS evaluation) are used in the experiment. The paper reviews the prospects for using compilation elements of various levels in speech synthesis systems.
The paper discusses the application of the discourse-oriented transcription developed for the corpora of Russian texts to the texts in Kuwaiti Arabic. The paper focuses mainly on the cases of non-standard division of the text into discourse units as well as on the grammatical features which cause such division.
The problem of computing the meaning of prepositional-case forms within the formal lexicographic definitions of Russian words, as prescribed by the semantic dictionary, is discussed. Adding a database of subject domain information to the dictionary makes it possible to compute automatically the meanings of all prepositional-case forms of the Russian language. As a rule, the problem is reduced to the choice of an attribute for the object connecting the prepositional-case form. A possible structure for such a database is considered.
It is proposed to describe the organization knowledge model in the form of a system of ontologies supplementing each other. The model consists of a basic ontology of the enterprise and a set of knowledge domain ontologies. An approach to the construction of knowledge model is described and the structure of a knowledge management system on its basis is proposed.
U
The main feature that determines the semantics of Russian conjunctions I ‘and’, A ‘and, but’, NO ‘but’ is contrariety/agreement-to-expectation. A hypothesis explaining the character of this distinction is proposed. A semantic invariant is proposed for every conjunction under consideration. The nature of this invariant as well as semantic metalanguage in general is discussed.
Y
The contact-making function of Russian computer jargon in comparison to literary language has certain peculiarities and appears in different variants. By considering the contact-making function of Russian computer jargon as part of a global computer sublanguage and a source of language convergence we can point to so-called “computerese generalities” in which the contact-making function is realized.
Local contexts are shown to be insufficient for disambiguation when translating from verbal language to sign language. Methods of concept comparison based on syntactic and semantic analysis are discussed. A method of automated search for documents unknown to the user in the Internet is proposed.
Fundamental frequency and its role in speech perception are analyzed with reference to professional and fiction texts. Subjects were exposed to the texts under white-noise masking and in the clear, with the original words replaced by their nonsense (artificial) ‘equivalents’. Recognition scores were correlated with the Topic-Comment structure, type of phonetic reduction, etc. One of the most important findings is a change in perceptual strategy depending on the text type (professional or fiction in our case). Fundamental frequency cues seem to be actively used to enhance word recognition, counterbalancing, to some extent, the poor quality of segmentals.
The Russian intonation of text incompleteness has been analyzed. Text incompleteness is considered in combination with contrast, emotional emphasis, and verification. The fundamental frequency (F0) contours and the accent placement proved to be the means of expressing text incompleteness and its combinations with contrast and other meanings. The text functions of a variety of intonation strategies have been described.
The paper investigates the categorial status of Russian kakoj-based pronouns: are they adjectives or determiners? It is argued that these pronouns exist in two variants differing in meaning. The proposed solution allows capturing observed semantic and syntactic facts.
We present new data showing that grammatical gender affects subject-verb agreement in Russian. The hypothesis that this effect is due to the level of markedness of different gender features in Russian is shown to be borne out.
The paper is devoted to strategies of syntactic ambiguity resolution (based on an investigation of high vs. low attachment) from the point of view of production and comprehension. The purpose of our research was to test whether the high-attachment preference, which was established in previous comprehension studies on Russian material, persists when sentences of this type are produced.
Z
The paper discusses an approach to text analysis based on an ontology of the subject domain. The main components of the ontology, in particular fact schemes, are described. The authors consider the construction of facts to be the primary goal of semantic analysis. A fact joins dictionary lexical objects found in the text and/or objects corresponding to ontology concepts already identified in the text. The semantic and syntactic compatibility of elements is used for the construction of facts.
The paper considers the problem of fixing sentence boundaries in speech in languages without a stable written tradition. In modern written texts the borders between sentences are distinctly marked, so it is easy to tell where one sentence ends and another begins. A quite different situation arises as soon as we have to fix sentence borders in an oral text, especially in a language without a stable written tradition. Analyzing material from two practically unwritten languages of Siberia (Selkup and Ket), we examine the possibility of using some prosodic features as sentence boundary markers in speech.
The paper deals with the history and the current status of the word problema ‘problem’ in Russian. In contemporary Russian it has acquired a meaning, roughly, ‘something that creates an obstacle for the normal course of events’ (U X-a problemy s Y-om), which appears to be a semantic calque from English. It is closely linked to one of the key ideas of Western culture and a series of key words expressing it (such as happy, OK, enjoy).
E-mail correspondence is considered as a communicative genre characterized by a number of specific features that distinguish it from other cognate speech genres. The analysis of e-mail correspondence in Russian reveals some important linguistic and psycholinguistic regularities of spontaneous written speech production. It is argued that Russian e-mail correspondence in Latin transliteration constitutes an important and stable variant of Internet correspondence in Russian: this variant possesses its own specific features and may be responsible for the loosening of Russian language norms.
Three types of arguments (apodictic, eristic, and sophistic) are considered, taking into account the motivation and speech behavior of opponents. The structure of a public text is viewed as a set comprising seven (eight) elements: address, thesis, narration, description, proof, disproof, appeal (conclusion). The categories of persuasiveness and argumentativeness are grounded both logically and emotionally. Verbal confrontation devices are described.
Models of intelligent systems intended for monitoring and evaluating the innovative potential and performance of research are considered. The models considered combine lexico-semantic, information, algorithmic, mathematical, and some other components.
Our work is focused on the properties of German compound adjectives conveying the idea of comparison, the source of the empirical data being a large corpus of newspaper discourse. The number of such compounds occurring in the corpus amounts to 412, and only one third of them can be found in the Big German-Russian Dictionary. This proportion needs explanation, and we try to determine the relevant formal, semantic and stylistic-pragmatic factors. Finally, prognostic conclusions are drawn concerning lexicographic applications.
Language L is defined as having free word order if the relative order of any two sentence categories X, Y can be inverted: [X + Y] ⇒ [Y + X]. This definition does not exclude languages with constraints on the placement of elements attached to the 1st, 2nd or 3rd sentence positions from the left boundary. At the same time, many languages with one statistically prevailing order, such as SOV, SVO, VSO etc., lack constraints that block less frequent orders. Presumably, all natural languages have pairs (or sets with n elements, n ≥ 2) of sentences with one and the same structure but different linear orders. We proceed from the assumption that for each pair/set of such sentences it is possible to establish the variant representing the basic order and to derive the other orders from it. The derived order can be obtained from the basic one by singling out the element that moves: {a + b + c} ⇒ {b + a + t_b + c}. The analysis in terms of Movement is preferable to the traditional description, where e.g. Subject-Verb order is chosen as ‘basic’ and Verb-Subject order as ‘inverted’ and no attempt is made to prove that either of the elements in the group can move. Movement of elements can be formalized in different ways. The generative account (Fiengo, Chomsky) is counterfactual, since it does not explain the contexts with left-to-right Movement patterns: Movement patterns of this type are especially productive in languages with the so-called Wackernagel’s law.