Proceedings 2007

A
Azarova I. V., St. Petersburg State University
SEMANTIC INTERPRETATION OF RUSSIAN PREPOSITION PHRASES BASED ON CORPUS FREQUENCIES
The main parameters of semantic description for Russian prepositional phrases are discussed. The proposed model of data structuring will be integrated into the automatic text analysis procedure using a formal grammar parser, Russ4IR, and a Wordnet-type thesaurus, RussNet. Two random samples of contexts from the corpus of modern texts (21 million words) were used for a multi-parameter investigation of characteristic prepositional distributions.
Apresyan V.Yu. Russian Language Institute n.a. V.V.Vinogradov, Russian Academy of Sciences
SET EXPRESSIONS WITH ADVERBS OF SMALL QUANTITY: MALO LI
The article focuses on syntactic phrasemes with adverbs of small quantity. In particular, the expression malo li X 'there's no saying what X' is considered. Arguments are offered in support of its analysis as a syntactically bound phraseological expression. It is contrasted with free adverbial collocations. This syntactic phraseme possesses two distinct meanings - one of quantification and one of concession. Their semantic, syntactic and combinatorial properties are analyzed.
B
Baglei S.G. Antonov A.V. Meshkov V.S. Titov A.V. Galaktika Corporation, Moscow
A PROBABILISTIC APPROACH TO LEXICAL AMBIGUITY RESOLUTION OF WORDS AND WORD PAIRS
A probabilistic approach to lexical ambiguity resolution of words and word pairs is described. Words and word pairs form an Information Portrait generated in the Galaktika-Zoom search and analysis system. The method is based on frequency analysis of problematic linguistic units and uses statistical data on text collections and their elements.
Batalina A.M. Epifanov M.E. Kobzareva T.J. Kushnareva E.V. Lakhuti D.G. Russian State University for Humanities
EXPERIMENTAL IMPLEMENTATION OF RUSSIAN SENTENCE SEGMENTATION ANALYSIS
The paper describes the construction and debugging of Russian sentence segmentation analysis by means of an instrumental environment for experiments with algorithms of surface-syntactic analysis.
Belov A. A. Volovich M. M. «Ashmanov i Partnery», «Poiskovyje technologiji», Moscow
AUTOMATIC CLASSIFICATION OF VERY SHORT TEXTS
The approach implemented by the companies «Ashmanov i Partnery» (Ashmanov & Partners) and «Poiskovyje technologii» (Search Technologies) makes it possible to effectively classify search queries, headings and other very short texts by means of the same term base that is used for automatic classification of ordinary texts.
Mira B. Bergelson Moscow State University
SOCIOCULTURAL MOTIVATION IN NARRATIVES
Successful interpretation of a narrative depends on the ability of the Narrator to adequately define a corresponding discourse community, and the Addressee's readiness to introduce changes to his/her linguacultural schemas involved in interpretation of the story. The paper deals with the ways relevant parts of the schemas can be deduced through their manifestation with linguistic means in the discourse.
Alexander Berdichevsky * Boris Iomdin ** * Moscow State University n.a. M. V.Lomonosov, ** Russian Language Institute n.a. V.V.Vinogradov, Russian Academy of Sciences
THE ROLE OF PUNCTUATION IN DISAMBIGUATION
The role of punctuation in ambiguity resolution is discussed. Punctuation marks do not only organize the text but also convey certain information. Sometimes punctuation is an easily accessible and effective means of disambiguation. A classification and an analysis of such cases are proposed.
Birialtcev E.V., Gusenkov A.M. Kazan State University
A RELATIONAL DATABASE ONTOLOGY. THE LINGUISTIC ASPECT
The task of representing relational database structure in an ontology formalism aimed at processing search queries to such databases is considered. A basic ontology of relational databases is proposed which includes concepts, relations and interpretation functions. It is demonstrated that processing queries in real databases requires an ontology expansion by lexical-semantic relations between the column definitions of database tables. Types of lexical-semantic relations that exist in real databases are considered.
Bogdanov A.V. Moscow Lomonosov State University
THE STUDY OF LOCAL URBAN DIALECT VOCABULARY BY MEANS OF SEARCH ENGINES
In the paper we discuss the study of the vocabulary of local urban dialects by means of Internet search engines. Examples of such studies are given and related problems are discussed. Finally, we describe the prospects of our method.
Igor M. Boguslavsky Leonid L. Iomdin Victor G. Sizov Institute for Information Transmission Problems, Russian Academy of Sciences
STANDARD TESTS FOR NATURAL TEXT PROCESSING TASKS FOR RUSSIAN AND REGRESSION TESTING
Approaches to the construction of tests for the evaluation of certain parameters of automatic natural language processing systems, primarily the quality and stability of the parser, are considered. A method of creating such a test is described: it is created for the evaluation of the multipurpose linguistic processor ETAP-3 working with Russian as the source language. The system is in the making and is only partially implemented. The authors expect that these tests could be reused for evaluation of other systems of automatic processing of Russian texts.
E.I. Bolshakova N.V. Baeva E.A. Bordachenkova N. E. Vasilieva S. S. Morozov Moscow State University, Faculty of Computational Mathematics and Cybernetics
LEXICOSYNTACTIC PATTERNS FOR AUTOMATIC TEXT PROCESSING
The paper compares methods of declarative specification of NL text units, which are used for recognition of the text units via surface syntactic analysis. The concept of lexicosyntactic pattern of NL expression is discussed, and a formal language for template description is proposed.
Bonch-Osmolovskaya A.A., Rakhilina E.V., Reznikova T.I.
CONCEPTUALIZATION OF PAIN IN RUSSIAN: A TYPOLOGICAL PERSPECTIVE
The paper presents the first results from a typological project on linguistic conceptualization of PAIN in the languages of the world. Russian, being the mother tongue for the participants of the project, provided a starting point for the study. A list of verbs used to describe unpleasant bodily sensations was compiled. The metaphoric source domains and basic syntactic constructions were analyzed. Semantic parameters underlying the use of a specific pain verb were revealed. The results allowed for a preliminary analysis of the data collected from several European languages and were used to compile some questionnaires which contributed to further cross-linguistic investigation.
E.G.Borisova Moscow State University of Press
ON MEASURING THE PERLOCUTIVE EFFECTIVITY OF LANGUAGE ENTITIES
The article deals with persuasive functions of advertising texts. Such concepts as semantic representation of the situation, emotional state of the Hearer, associations, sociolinguistic and pragmalinguistic peculiarities are to be taken into consideration. These characteristics are to be measured in order to get the total estimation of the efficiency of texts.
Braslavski P. Sokolov E. Institute of Engineering Science, UD RAS, Ekaterinburg
AUTOMATIC TERM EXTRACTION USING INTERNET SEARCH ENGINES
In this paper we describe several methods aimed at automatic extraction of two-word terms from an individual document or a text corpus using Internet search engines. We consider five different options of computing the degree of terminological character of word pairs. The experiments have been performed with three data sets originating from different subject domains. A combined evaluation metric is proposed. Results of comparative evaluation of the methods are presented.
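As a rough illustration of how search-engine hit counts could be turned into a termhood score for a word pair, the sketch below uses a pointwise-mutual-information style measure. The abstract does not name its five options, so this measure, the function name `pmi_score` and the assumed index size are hypothetical and are not claimed to be among the authors' methods; hit counts are passed in as plain numbers rather than fetched from any real search API.

```python
import math


def pmi_score(pair_hits: int, first_hits: int, second_hits: int,
              total_pages: int) -> float:
    """PMI-style association score computed from page hit counts.

    pair_hits   -- pages containing the two-word phrase (assumed input)
    first_hits  -- pages containing the first word
    second_hits -- pages containing the second word
    total_pages -- assumed size of the search engine index
    """
    # Without any hits the score is undefined; treat it as the worst value.
    if min(pair_hits, first_hits, second_hits) <= 0:
        return float("-inf")
    return math.log2(pair_hits * total_pages / (first_hits * second_hits))
```

A higher score suggests that the two words co-occur far more often than chance would predict, which is one possible signal that the pair behaves like a term.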
Brykina M. Moscow State University
"POSSESSIVE ANCHORING" OF RUSSIAN NOUN PHRASES DENOTING BODY PARTS
This paper investigates Russian possessive constructions with noun phrases denoting body parts as possessees. For each syntactic position of a possessee NP, a list of possible possessor positions is compiled. It is then possible to propose a hierarchy of most probable possessors for an arbitrary body part lexeme in the text ("possessive anchoring" of a lexeme).
Budyanskaya E.M., Kotov A.A. Institute of Linguistics, Russian State University for Humanities
MODELING OF WISECRACKS AND SUBSEQUENT DIALOGUE STEPS FOR ANIMATION OF VIRTUAL AGENTS
We study a model of automatic usage of witty remarks in a dialog, applied to animation of computer agents interacting with a user in a natural language. We consider witty remarks as a means of emotional interaction. Communicative functions of witty remarks and ways of their incorporation into a dialogue structure are listed.
Buras M. M. Applied Communications Centre Krongauz M.A. Russian State University for the Humanities
THE LANGUAGE OF CORPORATE WEB SITES: GAME, PARODY, PROVOCATION
A comparative analysis of corporate web sites of Russian companies specializing in Internet advertising is proposed. The focus is on the language and communication intentions of the respective texts. New trends have been observed which are in principle uncharacteristic of business communication outside of the Internet and seemingly contradict its basic purposes, namely the presence of elements of game, parody, and provocation. The pragmatic effect of using such techniques in business communication is considered.
C
Christian Chiarcos University of Potsdam
AN ONTOLOGY OF LINGUISTIC ANNOTATION: WORD CLASSES AND MORPHOLOGY
In this paper, I describe the conceptual and technical structure of an ontology of linguistic terminology. As it is linked with existing annotation schemes for several languages, it can be used for the formulation of language-independent, cross-tag-set corpus queries. In addition to its technical relevance, the ontology provides a standardised repertoire for the formal specification of annotation schemes in general. Due to its modular architecture, further annotation schemes may be integrated with minimal effort, and thus, another field of application can be seen in the development of portable, i.e. tagger-independent, language processing tools as well. Primarily, the ontology is intended to provide integrated representation and access to terminologically heterogeneous resources. It will be applied as part of a sustainable archive of linguistic resources to be developed by the project "Sustainability of Linguistic Data", a joint initiative by three German collaborative research centers (CRC) started in 2006. The project hosts a wide variety of corpora of different languages, including better documented languages such as German, English and Russian, but also resources from several African languages, historical corpora and further material. In the first phase, the focus of the ontology development has been on terminology for part-of-speech (POS) tagging; at the moment, the extension to morphological annotation is under way.
K.Chubinidze A.Ezhov A.Gromov A.Kusova CONVERA LLC
THE DEVELOPMENT OF LANGUAGE PROCESSING FOR SEARCH ENGINES: EXPERIENCE AND APPROACH
The paper attempts to define up-to-date requirements for language processing in search engines. It describes the search dictionaries and algorithms used in the Convera RetrievalWare search system. Our experience of creating the Russian Language Processor for the Convera system is discussed.
D
Dobrovol'skij D.O. Russian Academy of Sciences, Russian Language Institute
POLYSEMY STRUCTURE IN CROSS-LINGUISTIC PERSPECTIVES (VERBS OF MOTION IN RUSSIAN AND GERMAN)
With verbs such as бежать - laufen, ехать - fahren, лететь - fliegen, плыть - schwimmen, I investigated the structure of polysemy typical of words with multiple meanings. The analysis showed, first, that regular polysemy is a typical phenomenon for verbs of this semantic class. Secondly, this kind of polysemy is specific for individual languages. Third, systematic polysemy in this domain ranges over restricted verb groups rather than over the semantic class as a whole.
E
Yermakov M.V. Russian State University for the Humanities
CORRECTION OF SEMANTIC RELATIONS AS A STAGE OF SEMANTIC ANALYSIS
An important problem of automatic text analysis is the transition from its semantic representation to the conceptual structure, which imitates the knowledge. We suggest using corrections of semantic relations as a method of this transition. Possible rules of this stage of analysis are considered.
G
Gelbukh A.F. * Sidorov G.O. * Chanona-Hernandez L. ** * Natural Language and Text Processing Laboratory, Center for Research in Computer Science, National Polytechnic Institute, Mexico City, Mexico, ** Faculty of Electric and Mechanical Engineering, National Polytechnic Institute, Mexico City, Mexico
DYNAMIC PROGRAMMING WITH LEXICAL SIMILARITY CALCULATION IN ALIGNMENT OF PARALLEL TEXTS AND ITS EVALUATION
For a pair of texts, one of which is the translation of the other into a different language, the problem of alignment consists in establishing correspondences of their structural units (paragraphs, sentences, words). In this paper, we describe an optimization algorithm for automatic alignment on the paragraph level, based on calculation of similarity on the basis of the lexical correspondences between paragraphs, i.e., the fact that one of the texts contains dictionary translations of the words from its counterpart. We present experimental data comparing different similarity measures on fiction texts that present alignment problems. In addition, we propose a new method of evaluation of alignment algorithms based on the reconstruction of the global text structure from lower level units: in our case, we restore paragraph structure in one of the texts from sentences. The advantages of this method of evaluation are elimination of dependency on the existing corpora, where the paragraphs are usually aligned in a trivial way, or avoiding the manual markup for evaluation, because we can use the already known paragraph structure. In the latter case, there may be some error percentage because of the fact that alignment is asymmetric.
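The idea of paragraph alignment by dynamic programming over a dictionary-based similarity can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the similarity measure (share of source words with a dictionary translation in the target paragraph), the `skip_penalty` value, the restriction to 1:1 links and skips, and all function names are hypothetical choices made for the example.

```python
from typing import Dict, List, Set, Tuple


def similarity(src: List[str], tgt: List[str],
               dictionary: Dict[str, Set[str]]) -> float:
    """Share of source words that have a dictionary translation in tgt."""
    if not src:
        return 0.0
    tgt_words = set(tgt)
    hits = sum(1 for w in src if dictionary.get(w, set()) & tgt_words)
    return hits / len(src)


def align(src_pars: List[List[str]], tgt_pars: List[List[str]],
          dictionary: Dict[str, Set[str]],
          skip_penalty: float = 0.1) -> List[Tuple[int, int]]:
    """Return 1:1 paragraph links (src index, tgt index) with maximal total score."""
    n, m = len(src_pars), len(tgt_pars)
    # score[i][j]: best score for the first i source and first j target paragraphs
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], back[i][0] = score[i - 1][0] - skip_penalty, (i - 1, 0)
    for j in range(1, m + 1):
        score[0][j], back[0][j] = score[0][j - 1] - skip_penalty, (0, j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = similarity(src_pars[i - 1], tgt_pars[j - 1], dictionary)
            candidates = [
                (score[i - 1][j - 1] + sim, (i - 1, j - 1)),   # link the pair
                (score[i - 1][j] - skip_penalty, (i - 1, j)),  # skip source paragraph
                (score[i][j - 1] - skip_penalty, (i, j - 1)),  # skip target paragraph
            ]
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
    # Trace back from the end, collecting diagonal moves as links.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if (pi, pj) == (i - 1, j - 1):
            pairs.append((pi, pj))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

With a toy dictionary, a target text that contains one extra paragraph is aligned by skipping that paragraph and linking the remaining ones.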
Gordeev S. S. Azarova I. V. St. Petersburg State University
CHARACTERISTIC RELATIONS BETWEEN WORD-ORDER AND COMMUNICATION PERSPECTIVE PATTERNS IN RUSSIAN SCIENTIFIC TEXTS
Regular patterns of subject-object-predicate arrangements in clauses were examined in a random context sample from the working corpus of modern texts compiled at the Department of Mathematical Linguistics of St. Petersburg State University. Manual markup of the topical structure (TS) of scientific texts made it possible to identify its core and peripheral components as well as the dominant schemes interrelating clause word order and its topic/comment parts. These schemes are to be exploited in the syntactic module of the formal-grammar parser Russ4IR for distinguishing actual and novel zones of information in a text.
Gornostay T. Vasiljevs A. Skadina I. Skadins R. Tilde Company, Riga
LATVIAN <-> RUSSIAN MACHINE TRANSLATION EXPERIENCE
The article presents a pilot version of a multilingual dictionary with elements of machine translation. The architecture of the dictionary and basic steps of language processing are described. Linguistic difficulties of Latvian<->Russian translation are highlighted.
Grishina Elena Institute of Russian Language, RAS
THE SPOKEN RUSSIAN MARKERS
The paper presents a list of markers of spoken Russian, i.e. words, forms and constructions which allow the listener or reader to interpret a particular text as a spoken one rather than written. Transcripts of some Russian films (which are part of the Russian National Corpus) were compared with the original texts (plots and scripts) and subtitles made for people with hearing problems. It was revealed that during the transformation from the original text to transcript to subtitles practically the same sets of units and constructions appear and disappear. It is these elements that should be considered as markers of spoken Russian. We propose to use this set of units for the determination of the degree of spokenness of a text.
I
Iagounova E. V. St. Petersburg State University
TOPIC / COMMENT, GIVEN / NEW DISTINCTIONS AND AUDIBLE TEXT PERCEPTION
It is argued that the choice of genre (in our case, fiction vs. business Russian) is crucial for how the text is structured in terms of Topic-Comment articulation (Functional Sentence Perspective), which types of Topics are chosen, which markers for indicating Topics are preferred, etc. This hypothesis is tested using a battery of speech perception and other experiments.
K
Karvovskaja E.A. Russian State University for the Humanities
RUSSIAN PARTICLE -TO: MORPHEME AND LEXEME
The paper discusses certain issues of ambiguity resolution. Two linguistic units in Russian may be referred to as the particle -то: the lexeme -то1 (in a word-combination Ivan-to prishel 'as for Ivan, he did come') and the morpheme -то2 (as part of the word что-то 'something'). Certain words and phrases built with the help of -то2 are considered. An attempt is made to outline their lexicographic definitions and properties.
Andrej A. Kibrik Institute of Linguistics, Russian Academy of Sciences Evgenija V. Prozorova Moscow State University
REFERENTIAL CHOICE IN RUSSIAN SIGN LANGUAGE
We compare the referential system of Russian Sign Language (RSL) with that of spoken languages. Besides the referential mechanisms of deixis and anaphora, an additional process is important for RSL, termed quasi-deixis: the signer creates analogs of imagined referents in his/her signing area, and their loci are used thereafter for quasi-deictic mentions of the referents.
Kibrik A. E. Arkhipov A. V. Daniel M. A. Kodzasov S. V. Moscow State University Myers Tom N-Topus Software Nakhimovsky A. D. Colgate University
DIGITAL PROCESSING OF LINGUISTIC DATA FOR MINORITY LANGUAGES DOCUMENTATION
The paper presents a new standard for a digital format of language documentation. A unified format of linguistic data presentation along with an integrated computer platform for creating and accessing multimedia linguistic resources are being developed within a project of documenting several minority languages of Russia.
Vitali Kiselov Ivan Tampel Marina Tatarnikova Yuri Khokhlov Speech Technology Center, St. Petersburg, Russia
OPEN-VOCABULARY HMM-BASED ISOLATED WORD RECOGNITION SYSTEM FOR THE RUSSIAN LANGUAGE
The paper describes a method of training context-dependent and context-independent acoustic models for the Russian language. The results obtained with these acoustic models applied to the HMM-based isolated word recognizer are presented.
Kobozeva I. M. Moscow State University
AMBIGUITY OF DISCOURSE MARKERS — CAN IT BE RESOLVED IN CLAUSAL CONTEXT? (THE CASE OF VOT)
In the paper we discuss the possibility of resolving syntactic and semantic ambiguity of discourse markers in a clause that has undergone morphological and partial syntactic analysis and partial semantic tagging in terms of the semantic features of the National Corpus of Russian. The Russian particle vot is used to illustrate the approach.
Koval S.L., Labutin P.V., Pehovsky T.S., Proschina E.A., Smirnova N.S., Talanov A.O. Speech Technology Center, St. Petersburg, Russia
COMPOSITE SPEAKER IDENTIFICATION METHOD
The forensic speaker identification method includes several stages requiring different types of speech analysis and is applicable to speech examination in languages unknown to the expert. The method comprises comparison of Gaussian Mixture Models of speech signals, statistical and structural analyses of formants and pitch, the "formant matching" method, and linguistic, aural and psychological speech analyses.
Koval S.L. Panova E.A. Speech Technology Center, St. Petersburg, Russia
THE EXPERT METHOD OF THE DIAGNOSTICS OF SPEAKER
The expert method of diagnosing speakers' biological parameters from speech is presented. The main focus is on the requirements for representative speech databases. During the diagnostics an expert provides optimally selected speech templates that illustrate the auditory characteristics used. The results of experts' diagnostics of speakers' biological parameters were checked on a speech database (289 speakers). The accuracy of the experts' estimation of speakers' biological parameters is acceptable for some practical applications.
Kodzasov S.V. Arkhipov A.V. Bonch-Osmolovskaya A.A. Zakharov L.M. Krivnova O.F. Moscow State University
"INTONATION OF RUSSIAN DIALOGUE" DATABASE: DECLARATIVE UTTERANCES
An overview of the final development stage of the database "Intonation of Russian Dialogue" is given. A database entry contains a sound file, a pitch graph, and a multi-parametrical description of the utterance prosody. A detailed classification of declarative utterances is proposed.
Kozhunova O.S. Zatsman I.M. Institute for Informatics Problems of the Russian Academy of Sciences
PRAGMATIC ASPECTS OF CREATION OF THE SEMANTIC DICTIONARY FOR INFORMATION MONITORING
Development issues of research programs and project evaluation systems financed on a competitive basis are considered. The statement of the problem concerning experts' agreed acceptance of the meaning of performance indicators is offered (indicators are defined within these systems). A semantic dictionary for information monitoring is offered to solve the problem. Its role and functions to be implemented when developing the dictionary are discussed.
Kozerenko E. B. Institute for Informatics Problems of the Russian Academy of Sciences
VERBAL AND NOMINAL TRANSFORMATIONS IN THE ENGLISH-RUSSIAN MACHINE TRANSLATION
The paper focuses on the problem of developing formal linguistic presentations of verbal - nominal transformations for the English-Russian and Russian-English machine translation. The correlations of nominal and verbal functionality in the Russian and English scientific discourse are considered. The formal presentations of language phenomena in the linguistic processor under discussion are based on the Cognitive Transfer Grammar designed and developed for machine translation systems.
Koit M. Roosmaa T. Oim H. University of Tartu
FROM SYNTAX TO SEMANTICS - CHOOSING FORMALISMS AND LANGUAGE RESOURCES
The paper considers formalisms, methods and linguistic resources used in Computational Linguistics in order to model syntax and semantics. The second part of the paper gives an overview of work on automatic syntactic and semantic analysis of Estonian carried out at the University of Tartu.
Kondratenko N.V. Odessa I. I. Mechnikov National University
THE NEW YEAR ADDRESS AS A RITUAL GENRE OF POLITICAL DISCOURSE: THE MAIN MACROSTRUCTURAL COMPONENT AND ITS REPRESENTATION
The article is dedicated to the analysis of a ritual genre of political communication, the New Year address. The structure, semantics and style of the address are considered. The place of the New Year address in the typology of political discourse is determined. Emphasis is laid on the particular realization of political rhetoric in the New Year addresses of the presidents of Russia, Ukraine and Belarus.
Kopotev M. V. University of Helsinki, Finland Gurin G. B. Petrozavodsk State University, Russia
MARKING OF SYNTACTIC INCOMPLETENESS IN A CORPUS
The paper is devoted to ways of marking syntactic zeros and similar phenomena in a Russian corpus. A composite classification of zero and zero-like units based on the reference papers on the topic is offered. Two approaches to marking and searching for such units are also discussed.
Olga Krasavina Moscow State University / Humboldt University of Berlin, Department of German Language and Linguistics
CHOICE OF THIRD-PERSON PRONOUNS IN DISCOURSE
Choice of referential expressions in discourse is highly dependent on contextual characteristics of referents. The current work analyses conditions under which prototypical (e.g. actant) vs. peripheral (e.g. possessive) pronouns are used. For this study, two German corpora annotated for discourse structure with co-reference mark-up have been used, Potsdam Commentary Corpus, PCC (Stede 2004) and NEGRA (Skut et al. 1997). The results of our study indicate that the use of different pronominal types is sensitive to distance. Furthermore, the effect of animacy, syntactic parallelism, discourse prominence, position in a sentence and discourse structure has been investigated. We came to a conclusion that there is no ultimate strategy responsible for all uses of referential forms, but rather there are a number of interacting mechanisms applicable to different discourse configurations. As for the pronoun use, the data revealed a set of compensating factors and complementary mechanisms, such as referential and rhetorical distance, animacy and distance, advantage of first mention and recency, topic persistence and distance.
Kreydlin G.E. Russian State University for the Humanities
MECHANISMS OF INTERACTION BETWEEN VERBAL AND NONVERBAL UNITS IN A DIALOG II A. DEICTIC GESTURES AND THEIR TYPES
An academic lecture regarded as a kind of dialog is a suitable testing ground for the recognition of certain peculiarities of gesture-speech interrelation and interaction. In this paper (part II A) certain classes of deictic gestures are described. Later on, (part II B) I plan to demonstrate that deictic gestures of different classes have different relations with the vocal and representational nonverbal signs in a dialog.
Sergej A. Krylov Sergej A. Starostin Institute of Oriental Studies of Russian Academy of Sciences, Moscow; Institute of System Analysis of Russian Academy of Sciences, Moscow; Russian State University for the Humanities
CREATION AND PROCESSING OF LEXICAL DATABASES IN THE ENVIRONMENT OF THE STARLING INTEGRATED INFORMATIONAL SYSTEM
Tasks of computational lexicography solved in the StarLing environment are: (1) creation of lexical databases (LDBs); (2) automatic and manual delimitation of the fields of the LDBs; (3) restructuring of LDBs so as to bring their formal structure as close as possible to their informative content.
Kuznetsov I.P. Matskevich A.G. Institute for Informatics Problems of the Russian Academy of Sciences
LINGUISTIC AND ALGORITHMIC ASPECTS OF OBJECT EXTRACTION FROM SUBJECT-DOMAIN-ORIENTED NATURAL LANGUAGE TEXTS
A semantic linguistic processor which extracts the objects and their links from natural language texts is considered. The paper analyzes the experience of using the processor for formalization of Russian and English texts in various subject fields: criminal actions, mass media, terrorist activities. Peculiarities of the texts are taken into account by linguistic knowledge of the processor.
Kustova G.I. VINITI RAN
POLYSEMY OF TEMPORAL ADJECTIVES
This paper describes the meanings of Russian temporal adjectives davnij 'of long standing', nedavnij 'recent', vcherashnij 'yesterday's' and their interaction with the meanings of nouns they modify.
L
Lande D.V. Brajchevskiy S.M. Grigorjev A.N. Darmokhval A.T. Radetskiy A.B. ElVisti Information Center, Kiev
DETECTION OF NEW EVENTS FROM NEWS FLOW
The paper deals with current issues of new event detection from news flow, tracking, and clustering. An overview of theoretical and practical developments in the field is given. An innovative multicriteria algorithm of new event detection is presented. Retrospective analysis and technology of formation of subject chains that has been created within the framework of the InfoStream content monitoring system are used for algorithm parameter tuning.
N. Laufer
PREDICATIVES OF NECESSITY: STATISTICS AND SEMANTICS (A CORPUS-BASED RESEARCH)
Frequency characteristics of phrases with Russian predicative words надо and нужно 'it is necessary' are analyzed on the basis of the Russian National Corpus. An attempt is made to use statistical data to find semantic differences between the two words, which are usually considered synonymous.
Levontina I.B. Russian Language Institute — Vinogradov Institute
THE LANGUAGE OF CONSUMPTION (ON SOME NEW PHENOMENA IN RUSSIAN)
A considerable number of new Russian words are connected, not least, with changes in the Russian linguistic "picture of the world". In particular, some new phenomena are determined by the spread of consumer-society values.
A. P. Leontyev Moscow State University / ABBYY Software House
CORRELATION BETWEEN EXTERNAL POSSESSOR CONSTRUCTIONS AND GENITIVE RELATIONS; PROBLEMS THAT ARISE DURING THE RESEARCH
The paper is devoted to the correlation between external possessor constructions and the so-called genitive relations. I demonstrate that the semantics of the external possessor imposes certain restrictions on the range of possible genitive relations. I also claim that it is the semantics of external possessor constructions that determines their syntactic form, while the semantics of the genitive relation is responsible for the other component of external possessor constructions: the possessive relation between their components.
Leontyeva N.N. Research Computing Center of Moscow State University
ON THE LEVELS AND EVALUATION OF SEMANTIC INCOMPLETENESS OF THE TEXTS
Redundancy and coherence are global properties of any natural text. These properties, as well as local semantic incompleteness, are made explicit in the semantic representation (SR). All these parameters affect the information value of the text and explain semantic compression. The compressed SR is text knowledge that also includes the component of ignorance (incompleteness of the text as a whole).
Alexander Letuchiy Vinogradov Russian Language Institute, Moscow
RUSSIAN CONSTRUCTION OF THREAT AND ITS RELATIVES
The Russian construction of threat as illustrated by examples like Ja jemu spoju 'I will do something bad to him if he sings' is analyzed. I examine formal and semantic properties of the construction and its relationship with valency derivations, and finally show that the construction itself can be regarded as a type of valency derivation.
Lobanov B.M. Tsirulnik L.I. United Institute of Informatics Problems, National Academy of Science of the Republic of Belarus
RULES OF SPEECH CORPUS SEGMENTATION INTO PHONETIC UNITS AND THE STRATEGY OF UNIT SELECTION IN SPEECH SYNTHESIS
Variants of speech corpus segmentation into word-internal and phrase-internal phonetic units such as allophones, di-allophones, and three types of allo-syllables are considered. Algorithms of speech corpus segmentation into phonetic units are described. Statistical characteristics of phonetic units of different types are discussed. The strategy of unit selection in speech synthesis is given.
Loukachevitch N. V. Dobrov B. V. Research Computing Center of Moscow State University, Center for Information Research
LEXICAL DISAMBIGUATION BASED ON DOMAIN SPECIFIC THESAURUS
The paper describes the means of the representation of senses and a procedure of lexical disambiguation based on a socio-political thesaurus. We also describe results of evaluation of the proposed algorithm.
Lashevskaja O.N. VINITI RAN, Moscow
TOWARD THE LEMMATIZATION OF WORD FORMS ABSENT FROM THE DICTIONARY
The paper deals with lemmatization of text tokens that dictionary-based morphological analyzers are unable to derive from their built-in dictionary. We evaluate an algorithm that establishes paradigmatic connections within the array of unknown forms, weighing up alternative hypotheses about the length of the stem for each word form. The combination of light and more elaborate clustering routines proves to be highly effective for morphological post-processing of large text collections.
M
Mitrofanova O.A. Mukhin A.S. Panicheva P.V. St. Petersburg State University
AUTOMATIC WORD CLUSTERING IN RUSSIAN TEXTS BASED ON LATENT SEMANTIC ANALYSIS
The paper deals with the development and application of an automatic word clustering tool for processing raw Russian texts. Special attention is given to experimental results on clustering with varying parameters for various types of texts.
N
Nosenko N. Moscow State University
RUSSIAN CONSONANT SUBSTITUTION MODELS (PERCEPTION UNDER CONDITIONS OF NOISE)
Russian consonant substitution models are presented as binary block schemes which describe characteristics of initial and perceived consonants. An attempt is made to classify consonant substitutions by schemes describing the types of transitions and to reduce the number of such models to a finite set.
Nevzorova O.A. Nevzorov V.N. Pjatkin N.V. Zin'kina J.V. Chebotarev Research Institute of Mathematics and Mechanics, Kazan State Technical University
INTEGRAL TECHNOLOGY OF HOMONYMY DISAMBIGUATION IN THE LOTA TEXT MINING SYSTEM
An integral technology of homonymy disambiguation in the text mining system "LoTA" is described. The technology includes a collection of methods of homonymy disambiguation and their cooperation procedure.
O
Ovchinnikova T.E. MSLU
FRAGMENTATION OF THE MENTAL SPACE ACCORDING TO MODAL PARTICLES
The article deals with Russian modal particles VOT and VON that are homonymous with demonstrative adverbs. That fact makes it possible to use the concept of mental spaces where these particles are used as demonstratives. Particle senses are studied.
P
Paducheva E. V. VINITI RAN, Moscow
QUEST FOR THE OBSERVER: RUSSIAN VERBS VYGLJADET
The Russian verb vyglyadet’ 'look like' belongs to the class of perception verbs, but has the following peculiarity: its subject position is occupied by the Object of perception, so that the Experiencer, obligatory for perception verbs, has no corresponding syntactic argument. Such a participant is called the Observer. The function of the Observer is usually fulfilled by the speaker. Hence the co-occurrence restriction: the subject position of vyglyadet’ cannot be occupied by the 1st person pronoun (*Ja vygljadela dovol'no stranno 'I looked rather strange'). There are contexts in which this restriction does not hold (Otec skazal, chto ja vygljadela dovol'no stranno 'the father said that I looked rather strange'): the function of the Observer is delegated to some person other than the speaker. The verb vyglyadet’ is semantically related to byt’ 'be': vyglyadet’ may acquire the contextual meaning 'be', while byt’ may have a diathesis that includes the Observer.
Anna G. Pazelskaya ABBYY Software House
NUMBER AGREEMENT IN RUSSIAN NOUN PHRASES
The paper presents unusual uses of plural forms of Russian event nouns. In these uses event nouns stand in the plural not for semantic reasons (because they denote a set of situations), but for syntactic reasons. These uses can be treated as a sort of number agreement in noun phrases in Russian.
Petrova M.A. ABBYY Software House; Institute of Linguistics, Russian Academy of Sciences
ON INTERCHANGEABILITY OF VERBS EXPRESSING KNOWING AND ABILITY (ON SLAVIC AND GERMAN LANGUAGES)
The work describes contexts where verbs normally meaning 'know' express some meanings of ability, and, on the contrary, modal verbs meaning 'ability' express some kinds of knowing. We isolate specific meanings of knowing and ability which can be expressed both by verbs of knowing and ability.
Pirogova J. Plekhanov Russian Academy of Economics, State University Higher School of Economics
DISCOURSE PRESSURE AND PERSUASIVE STRATEGY SELECTION IN MARKETING COMMUNICATIONS
The article discusses the generation of marketing communications arranged into some sort of unity, a marketing communication campaign. The paper investigates various discourse factors affecting persuaders and determining their selection of persuasive strategy combinations.
Podlesskaya Vera Russian State University for the Humanities
A FAMILY OF CHTO ‘WHAT’ + ZA ‘FOR’ + NP CONSTRUCTIONS IN RUSSIAN: A CORPUS ANALYSIS
The paper presents a corpus analysis of a family of chto 'what' + za 'for' + NP constructions in Russian. Pragmatically, they are shown to have both interrogative and exclamative functions. Syntactically, they are unique in being transparent for the external nominative and accusative case. Semantically, they presuppose the existence of their referent.
R
Rogov A.A. Sidorov Yu.V. Solopova A.I. Surovtsova T.G. Petrozavodsk State University
THE INFORMATION SYSTEM "SMALT"
The work presents the information system "SMALT". Its main tasks are collection, integrated storage of literary works including their grammatical and syntactic structures, and statistical processing and analysis of these structures aimed at detection of regularities.
Rozina R.I. Vinogradov's Russian Language Institute
THE DERIVATION OF EXISTENTIAL AND LINK-VERB MEANINGS: THE CASE OF IDTI
The paper addresses the new meanings of the verb idti 'to go', namely the existential one and the meaning of a link-verb. Patterns of their derivation are suggested, and an attempt is made to account for their colloquial coloring. The derivation of the meaning of the link-verb idti is compared to that of the link-verb meanings of other Russian verbs.
Valery Sh. Rubashkin St. Petersburg State University
ONTOLOGY - PROBLEMS AND SOLUTIONS. THE DESIGNER
We discuss current problems of ontological modeling. The author's R&D experience with an ontology environment, as well as with ontology proper, is presented.
S
Semenova S.Yu. Institute of Scientific Information on Humane Sciences of the Russian Academy of Sciences
IF THE SEMANTIC CLASS IS TOO BROAD FOR A LEXEME (TOWARDS MEANING REPRESENTATION IN A COMPUTER DICTIONARY)
The basic way of lexical meaning description in a semantic dictionary aimed at NLP is to ascribe to a word some semantic class (or a conjunction of classes). The accuracy and completeness of meaning representation depends on the set of classes a lexicographer is allowed to use in the descriptions. Selection of classes makes an essential problem at the meeting-point of linguistics and information science. It is obvious that any finite set of bulk classes cannot cover satisfactorily the whole lexicon including lexemes with rather individual semantic characters, lexemes making groups that are smaller than the classes chosen, and lexemes that can be placed only at the periphery of the classes. Methods of sense representation of the lexemes above by means of the semantic classes of the RUSLAN machine dictionary are discussed. Lexicographical experiments are associated necessarily with the definition of the classes' intension and of the boundaries between the neighboring classes. These issues are to some extent considered as well.
Serge Sharoff University of Leeds
CENTRAL PLANNING VS. FREE MARKET: COMPARING THE DISTRIBUTION OF TOPICS AND GENRES IN THE RUSSIAN NATIONAL CORPUS AND INTERNET
This study compares traditional representative corpora, such as the British or Russian National Corpora, against corpora extracted from the Internet. One method implies human annotation of a sample from an Internet corpus, which can be compared against a traditional corpus in the same language. The second method uses statistical models, which use automatic text clusterisation to estimate the variation in their domains and genres.
Shemanaeva O.Ju. Kustova G.I. Lashevskaja O.N. Rakhilina E.V. VINITI RAN
SEMANTIC FILTERS FOR THE WORD SENSE DISAMBIGUATION IN RNC: ADJECTIVES
The paper demonstrates how the lexico-semantic annotation in RNC is used to make semantic filters for the word sense disambiguation. Most of the meanings of polysemous adjectives and other words have tags of semantic classes in the RNC semantic dictionary. In the corpus each instance of the word in the text receives all the semantic tags automatically. However, the system of semantic filters helps to delete the unnecessary tags and leave only relevant ones.
Tatiana Sherstinova Gregory Martynenko St. Petersburg State University
A STATISTIC DESCRIPTION OF INTONATION IN NENETS
The paper presents a methodology to study intonation in minority languages, which aims at description of the main prosodic models and revelation of general regularities of the intonation system. The proposed method is tested on the material of the Nenets language.
Elena G. Sokolova Russian State University for the Humanities Michael V. Boldasov Luxoft
SEMANTIC ANNOTATION OF AN IMAGE AS THE INPUT FOR NATURAL LANGUAGE GENERATION
In this paper we describe our investigation of NLG of image description texts. The input to NLG is a formal XML representation of image content - photo of open-air space: landscapes, city views etc. Means for the formal representations are discussed - objects, properties and relations. The XML representation consists of two parts - objects and spatial relations. The first part presents the elements of a photo composition, the second - spatial relations between these elements. We also discuss an ontology for the NL representation of objects and sources of verbs in the generated texts.
A.S. Starostin M.G. Malkovsky Moscow State University
ALGORITHM OF SYNTAX ANALYSIS EMPLOYED BY THE TREETON MORPHO-SYNTACTIC ANALYSIS SYSTEM
This paper continues presenting the project introduced in a previous paper by the authors. We discuss the algorithm of analysis employed by the syntax analyzer "Treevial", which is part of the "Treeton" morpho-syntactic analysis system. In the first three sections we describe the mathematical model on which "Treevial" is based. In the next two sections we state formally the task of syntax analysis, propose an algorithm which performs this task and discuss various features of this algorithm.
Shmeleva E. Vinogradov Institute of Russian Language, Russian Academy of Sciences Shmelev A. Moscow Pedagogical State University
POST-SOVIET RUSSIAN JOKES: NEW CHARACTERS
The paper describes new characters of Russian jokelore (such as new Russians, Estonians, computer programmers, drug addicts) that have emerged since 1990. In particular, it will discuss their "linguistic masks", which correlate with their "behavior masks".
T
Tsirulnik L.I. Zhadinets D.V. Lobanov B.M. Sizonov O.G. United Institute of Informatics Problems, National Academy of Science of the Republic of Belarus
ALGORITHMS OF SPEECH PROSODIC CHARACTERISTICS SYNTHESIS IN "MULTIPHONE" TTS SYNTHESIS SYSTEM
The Accent Unit Portraits model (AUP-model) of prosodic characteristics synthesis is presented. The principles of creation of phrase accent units portraits are described. The structure of prosodic characteristics synthesis subsystem is shown. The implementation of an AUP-model in the "Multiphone" multi-language TTS synthesis system is outlined.
Tsirulnik L.I. Lobanov B.M. United Institute of Informatics Problems, National Academy of Science of the Republic of Belarus
THE TECHNOLOGY OF COMPUTER CLONING AND SYNTHESIS OF PERSONAL SPEECH CHARACTERISTICS
The problems and technology of computer cloning of personal speech characteristics are outlined. The "PhonoCloner" computer system is presented. The system automatically creates a DB of compilation elements for speech synthesis, which constitutes the nucleus of the speech clone, i.e. the nucleus of a personalized Text-to-Speech system.
U
Uryson E.V. V.V.Vinogradov Institute of Russian Language, RAS
RUSSIAN PARTICLES UZHE AND UZH: VARIANTS, HOMONYMS, OR RELATED WORDS?
Semantics of Russian particles UZHE and UZH is described. In general, these particles have similar sets of meanings, but there are also contexts specific only for UZH. The particles under discussion share the common structure of polysemy, but UZH follows this structure more regularly.
V
Voskresenskij A.L. ANO «College of management, law & information technologies MESI», Moscow Khakhalin G.K. Independent researcher, Moscow
A MULTIMEDIA EXPLANATORY DICTIONARY OF RUSSIAN SIGN LANGUAGE
A description of an electronic explanatory dictionary of the Russian sign language is given. Problems of the conceptual "mapping" of the natural language onto the sign language are considered. The development of this dictionary is extrapolated to include a system of automatic translation into sign language.
Y
Yanko T.E. Institute of Linguistics, Russian Academy of Sciences
PERFORMATIVE INTONATION. IS IT POLYSEMANTIC, OR HIGHLY ABSTRACT IN MEANING?
In spite of the general view that the L+H*LH% intonational pattern (in Pierrehumbert's terminology) in English indicates contrast, it has been shown that this pattern undifferentiatedly denotes a wide range of performative meanings which oppose the speaker to the hearer.
Yanovich I. ABBYY, MSU, Gruntova L. ABBYY
THE DISTRIBUTION OF RUSSIAN RELATIVE PRONOUNS KTO (CHTO...) VS. KOTORYJ
The paper analyses the distribution of Russian relative pronouns kto (chto...) vs. kotoryj and suggests that the distribution of these pronouns depends on the presence of a lexical N in the DP modified by the relative. If present, N allows for the usage of kotoryj in the relative clause.
Yudina M. Moscow State University
SYNTACTIC AMBIGUITY RESOLUTION: IS THERE ANY PRIMING?
The paper is devoted to the first experience of adapting the experiment on syntactic priming of relative clause attachment to the Russian material; certain difficulties and unexpected results are discussed.
Yudina M.V. Yanovich I.S. Fedorova O.V.
SYNTACTIC AMBIGUITY IN THE EXPERIMENT AND IN LIFE
The paper is devoted to the distinctions between four experimental methodologies aimed at the study of syntactic ambiguity, from the point of view of results and cognitive operations required. An attempt was made to compare the experimental activity of the participants with ambiguity resolution in real communication.
Z
Zhuravleva A., Koval S.L. Speech Technology Center, St. Petersburg, Russia
DIAGNOSTICS OF PSYCHOLOGICAL FEATURES OF THE SPEAKER BY ORAL SPEECH
The method proposed enables trained experts to establish basic psychological features of the speaker. The operational psychological model of the speaker includes individual life priorities, temperament, socionic type, personal character features.
Zalizniak Anna A. Institute of Linguistics, Russian Academy of Sciences
THE SEMANTICS OF INVERTED COMMAS
The paper deals with the semantics of the quotation marks. A list of possible semantic functions of this punctuation character is given. The proposed invariant semantic definition explains individual meanings of the quotation mark as context variations. The quotation mark signals a violation of a standard semiotic act.
Zaretskaya E.N. Academy for National Economy under the Government of the Russian Federation
PERSUASIVE SPEECH
Speech behaviour of people has been a subject of special interest for linguistics in recent years. Justification and persuasiveness are important not only as cogitative but also as communicative properties. It is the projection of interrelation, internal mutual conditioning of subjects and phenomena in our consciousness. Any attempts to work with text irrespective of the content level are senseless and futile. Semantics and pragmatics come to the foreground in speech proof. The mechanism of persuasion is built on the consecutive use of two logical speech procedures: extrusion and replacement. Hence, persuasion is a system of two consecutive proofs.
Zakharov V.P. Institute for Linguistic Studies St. Petersburg State University
DICTIONARY CARD FILES AS AN OBJECT FOR AUTOMATION
The paper deals with issues of the computerization of card files of the Institute for Linguistic Studies comprising about 8 million cards. The place and the role of card files and corpora in lexicography are discussed. Compiling of specialized corpora aimed at creation of dictionaries is emphasized. The idea of an open online card index is discussed.
Zakharov L.M. Kazakevich O.A. Moscow State University
INTONATION OF DIALOGUE
The paper presents results of instrumental analysis of phrasal intonation in Ket, Selkup and Evenki dialogue speech. Our previous research into the intonation in Ket and Selkup narrative revealed that the tone at the end of phrases is practically always falling. The authors expected to find a richer spectrum of intonation contours in dialogue, and they discuss what they managed to find.
Anton Zimmerling
LOCATIVE INVERSION IN FREE WORD ORDER LANGUAGES
Locative Inversion, i.e. the transformation SVLoc → LocVS, is characteristic of a class of languages, including Russian, Lithuanian, Spanish, Greek and Albanian. In all these languages the position of the Verb is not fixed; in most of them the position of the Subject is not fixed either. Therefore, the mechanism changing the placement of S and V in the context where Loc takes sentence-initial position is a challenge for the theory of word order. The author argues that Locative Inversion, contrary to claims made elsewhere, is triggered by Subject Movement to Focus position and not by Verb Movement to second position. The current versions of the EPP-driven analysis make wrong predictions about Locative Inversion in Russian and typologically similar European languages and cannot account for the placement of postverbal subjects in free word order languages.
