Proceedings 2007

Additional

Online articles

Azarova I. V., St. Petersburg State University

SEMANTIC INTERPRETATION OF RUSSIAN PREPOSITION PHRASES BASING ON CORPUS FREQUENCIES

The main parameters of semantic description for Russian prepositional phrases are discussed. The proposed model of data structuring will be implemented into the automatic text analysis procedure using a formal grammar parser, Russ4IR, and a Wordnet-type thesaurus, RussNet. Two random samples of contexts from the corpus of modern texts (21 million words) were used for a multi-parameter investigation of characteristic prepositional distributions.

Apresyan V.Yu. Russian Language Institute n.a. V.V.Vinogradov, Russian Academy of Sciences

SET EXPRESSIONS WITH ADVERBS OF SMALL QUANTITY: MALO LI

The article focuses on syntactic phrasemes with adverbs of small quantity. In particular, the expression malo li X 'there's no saying what X' is considered. Arguments are offered in support of its analysis as a syntactically bound phraseological expression. It is contrasted with free adverbial collocations. This syntactic phraseme possesses two distinct meanings - one of quantification and one of concession. Their semantic, syntactic and combinatorial properties are analyzed.

Baglei S.G. Antonov A.V. Meshkov V.S. Titov A.V. Galaktika Corporation, Moscow

A PROBABILISTIC APPROACH TO LEXICAL AMBIGUITY RESOLUTION OF WORDS AND WORD PAIRS

A probabilistic approach to lexical ambiguity resolution of words and word pairs is described. Words and word pairs form an Information Portrait generated in the Galaktika-Zoom search and analysis system. The method is based on frequency analysis of problematic linguistic units and uses statistical data on text collections and their elements.

Batalina A.M. Epifanov M.E. Kobzareva T.J. Kushnareva E.V. Lakhuti D.G. Russian State University for Humanities

EXPERIMENTAL IMPLEMENTATION OF RUSSIAN SENTENCE SEGMENTATION ANALYSIS

The paper describes the construction and debugging of Russian sentence segmentation analysis by means of an instrumental environment for experiments with algorithms of surface-syntactic analysis.

Belov A. A. Volovich M. M. «Ashmanov i Partnery», «Poiskovyje technologiji», Moscow

AUTOMATIC CLASSIFICATION OF VERY SHORT TEXTS

The approach implemented by the companies «Ashmanov i Partnery» (Ashmanov & Partners) and «Poiskovyje technologii» (Search Technologies) makes it possible to classify search queries, headings and other very short texts effectively by means of the same term base that is used for automatic classification of ordinary texts.

Mira B. Bergelson Moscow State University

SOCIOCULTURAL MOTIVATION IN NARRATIVES

Successful interpretation of a narrative depends on the ability of the Narrator to adequately define a corresponding discourse community, and the Addressee's readiness to introduce changes to his/her linguacultural schemas involved in interpretation of the story. The paper deals with the ways relevant parts of the schemas can be deduced through their manifestation with linguistic means in the discourse.

Alexander Berdichevsky * Boris Iomdin ** * Moscow State University n.a. M. V.Lomonosov, ** Russian Language Institute n.a. V.V.Vinogradov, Russian Academy of Sciences

THE ROLE OF PUNCTUATION IN DISAMBIGUATION

The role of punctuation in ambiguity resolution is discussed. Punctuation marks do not only organize the text but also convey certain information. Sometimes punctuation is an easily accessible and effective means of disambiguation. A classification and an analysis of such cases are proposed.

Birialtcev E.V., Gusenkov A.M. Kazan State University

A RELATIONAL DATABASE ONTOLOGY. THE LINGUISTIC ASPECT

The task of representing the structure of relational databases in an ontology formalism, aimed at processing search queries to such databases, is considered. A basic ontology of relational databases is proposed which includes concepts, relations and interpretation functions. It is demonstrated that processing queries in real databases requires an ontology expansion by lexical-semantic relations between the column definitions of database tables. Types of lexical-semantic relations that exist in real databases are considered.

Bogdanov A.V. Moscow Lomonosov State University

THE STUDY OF LOCAL URBAN DIALECTS VOCABULARY BY MEANS OF SEARCH ENGINES

In the paper we discuss the study of the vocabulary of local urban dialects by means of Internet search engines. Examples of such studies are given and related problems are discussed. Finally, we describe the prospects of our method.

Igor M. Boguslavsky Leonid L. Iomdin Victor G. Sizov Institute for Information Transmission Problems, Russian Academy of Sciences

STANDARD TESTS FOR NATURAL TEXT PROCESSING TASKS FOR RUSSIAN AND REGRESSION TESTING

Approaches to the construction of tests for the evaluation of certain parameters of automatic natural language processing systems, primarily the quality and stability of the parser, are considered. A method of creating such a test is described: it is created for the evaluation of the multipurpose linguistic processor ETAP-3 working with Russian as the source language. The system is in the making and is only implemented partially. The authors expect that these tests could be reused for evaluation of other systems of automatic processing of Russian texts.

E.I. Bolshakova N.V. Baeva E.A. Bordachenkova N. E. Vasilieva S. S. Morozov Moscow State University, Faculty of Computational Mathematics and Cybernetics

LEXICOSYNTACTIC PATTERNS FOR AUTOMATIC TEXT PROCESSING

The paper compares methods of declarative specification of NL text units, which are used for recognition of the text units via surface syntactic analysis. The concept of lexicosyntactic pattern of NL expression is discussed, and a formal language for template description is proposed.

Bonch-Osmolovskaya A.A., Rakhilina E.V., Reznikova T.I.

CONCEPTUALIZATION OF PAIN IN RUSSIAN: A TYPOLOGICAL PERSPECTIVE

The paper presents the first results from a typological project on linguistic conceptualization of PAIN in the languages of the world. Russian, being the mother tongue for the participants of the project, provided a starting point for the study. A list of verbs used to describe unpleasant bodily sensations was compiled. The metaphoric source domains and basic syntactic constructions were analyzed. Semantic parameters underlying the use of a specific pain verb were revealed. The results allowed for a preliminary analysis of the data collected from several European languages and were used to compile some questionnaires which contributed to further cross linguistic investigation.

E.G. Borisova Moscow State University of Press

ON MEASURING THE PERLOCUTIVE EFFECTIVITY OF LANGUAGE ENTITIES

The article deals with persuasive functions of advertising texts. Such concepts as semantic representation of the situation, emotional state of the Hearer, associations, sociolinguistic and pragmalinguistic peculiarities are to be taken into consideration. These characteristics are to be measured in order to get the total estimation of the efficiency of texts.

Braslavski P. Sokolov E. Institute of Engineering Science, UD RAS, Ekaterinburg

AUTOMATIC TERM EXTRACTION USING INTERNET SEARCH ENGINES

In this paper we describe several methods aimed at automatic extraction of two-word terms from an individual document or a text corpus using Internet search engines. We consider five different options of computing the degree of terminological character of word pairs. The experiments have been performed with three data sets originating from different subject domains. A combined evaluation metric is proposed. Results of comparative evaluation of the methods are presented.

Brykina M. Moscow State University

"POSSESSIVE ANCHORING" OF RUSSIAN NOUN PHRASES DENOTING BODY PARTS

This paper investigates Russian possessive constructions with noun phrases denoting body parts as possessees. For each syntactic position of a possessee NP, a list of possible possessor positions is compiled. It is then possible to propose a hierarchy of most probable possessors for an arbitrary body part lexeme in the text ("possessive anchoring" of a lexeme).

Budyanskaya E.M., Kotov A.A. Institute of Linguistics, Russian State University for Humanities

MODELING OF WISECRACKS AND SUBSEQUENT DIALOGUE STEPS FOR ANIMATION OF VIRTUAL AGENTS

We study a model of automatic usage of witty remarks in a dialog, applied to animation of computer agents, interacting with a user in a natural language. We consider witty remarks as a means of emotional interaction. Communicative functions of witty remarks and ways of their incorporation into a dialogue structure are listed.

Buras M. M. Applied Communications Centre Krongauz M.A. Russian State University for the Humanities

THE LANGUAGE OF CORPORATE WEB SITES: GAME, PARODY, PROVOCATION

A comparative analysis of corporate web sites of Russian companies specializing in advertising on the Internet is proposed. The focus is on the language and communication intentions of the respective texts. New trends have been observed which are in principle uncharacteristic of business communication outside of the Internet and seemingly contradict its basic purposes, namely the presence of elements of game, parody, and provocation. The pragmatic effect of using such techniques in business communication is considered.

Christian Chiarcos University of Potsdam

AN ONTOLOGY OF LINGUISTIC ANNOTATION: WORD CLASSES AND MORPHOLOGY

In this paper, I describe the conceptual and technical structure of an ontology of linguistic terminology. As it is linked with existing annotation schemes for several languages, it can be used for the formulation of language-independent, cross-tag-set corpus queries. In addition to its technical relevance, the ontology provides a standardised repertoire for the formal specification of annotation schemes in general. Due to its modular architecture, further annotation schemes may be integrated with minimal effort, and thus another field of application can be seen in the development of portable, i.e. tagger-independent, language processing tools. Primarily, the ontology is intended to provide integrated representation of and access to terminologically heterogeneous resources. It will be applied as part of a sustainable archive of linguistic resources to be developed by the project "Sustainability of Linguistic Data", a joint initiative by three German collaborative research centers (CRC) started in 2006. The corpora hosted by the project cover a wide variety of languages, including better documented languages such as German, English and Russian, but also resources from several African languages, historical corpora and further material. In the first phase, ontology development has focused on terminology for part-of-speech (POS) tagging; at the moment, the extension to morphological annotation is under way.

K.Chubinidze A.Ezhov A.Gromov A.Kusova CONVERA LLC

THE DEVELOPMENT OF LANGUAGE PROCESSING FOR SEARCH ENGINES: EXPERIENCE AND APPROACH

The paper attempts to define up-to-date requirements for language processing in search engines. It describes the search dictionaries and algorithms used in the Convera RetrievalWare search system. Our experience of creating the Russian Language Processor for the Convera system is discussed.

Dobrovol'skij D.O. Russian Academy of Sciences, Russian Language Institute

POLYSEMY STRUCTURE IN CROSS-LINGUISTIC PERSPECTIVES (VERBS OF MOTION IN RUSSIAN AND GERMAN)

With verbs such as бежать - laufen, ехать - fahren, лететь - fliegen, плыть - schwimmen, I investigated the structure of polysemy typical of words with multiple meanings. The analysis showed, first, that regular polysemy is a typical phenomenon for verbs of this semantic class. Secondly, this kind of polysemy is specific for individual languages. Third, systematic polysemy in this domain ranges over restricted verb groups rather than over the semantic class as a whole.

Emashova O.A. Malkovsky M.G. Lomonosov Moscow State University, Department of Computational Mathematics and Cybernetics

FUNCTIONAL STYLES OF RUSSIAN LANGUAGE AS APPLIED TO AUTOMATIC TEXT SUMMARIZATION

A new summarization method is proposed. Texts in Russian are summarized according to their functional subtypes. Five subtypes have been isolated and investigated.

Ermakov A.E. RCO LLC

AUTOMATICAL EXTRACTION OF FACTS FROM TEXTS OF PERSONAL FILES: EXPERIENCE IN ANAPHORA RESOLUTION

The paper is devoted to the extraction of facts from the texts of specialized personal files (dossiers). Technical solutions used in fact extraction based on a syntactic parser and syntactic-semantic templates are described. Special attention is given to regularities of discourse arrangement used for anaphora resolution.

Yermakov M.V. Russian State University for the Humanities

CORRECTION OF SEMANTIC RELATIONS AS A STAGE OF SEMANTIC ANALYSIS

An important problem of automatic text analysis is the transition from its semantic representation to the conceptual structure, which imitates the knowledge. We suggest using corrections of semantic relations as a method of this transition. Possible rules of this stage of analysis are considered.

Gelbukh A.F. 1, Sidorov G.O.1, Chanona-Hernandez L.2 1Natural Language and Text processing Laboratory, Center for Research in Computer Science National Polytechnic Institute Mexico City, Mexico 2Faculty of Electric and Mechanical Engineering National Polytechnic Institute Mexico City, Mexico

DYNAMIC PROGRAMMING WITH LEXICAL SIMILARITY CALCULATION IN ALIGNMENT OF PARALLEL TEXTS AND ITS EVALUATION

For a pair of texts, one of which is the translation of the other into a different language, the problem of alignment consists in establishing correspondences between their structural units (paragraphs, sentences, words). In this paper, we describe an optimization algorithm for automatic alignment at the paragraph level, based on a similarity measure computed from the lexical correspondences between paragraphs, i.e., the fact that one of the texts contains dictionary translations of the words from its counterpart. We present experimental data comparing different similarity measures on fiction texts that present alignment problems. In addition, we propose a new method of evaluation of alignment algorithms based on the reconstruction of the global text structure from lower level units: in our case, we restore paragraph structure in one of the texts from sentences. The advantages of this method of evaluation are that it eliminates the dependency on existing corpora, where the paragraphs are usually aligned in a trivial way, and that it avoids manual markup for evaluation, because we can use the already known paragraph structure. In the latter case, there may be some error percentage because alignment is asymmetric.

Gordeev S. S. Azarova I. V. St. Petersburg State University

CHARACTERISTIC RELATIONS BETWEEN WORD-ORDER AND COMMUNICATION PERSPECTIVE PATTERNS IN RUSSIAN SCIENTIFIC TEXTS

Regular patterns of subject-object-predicate arrangements in clauses were examined in a random context sample from the working corpus of modern texts compiled at the Department of Mathematical Linguistics of St. Petersburg State University. Manual markup of the topical structure (TS) of scientific texts made it possible to identify its core and peripheral components as well as the dominant schemes relating clause word order to its topic/comment parts. These schemes are to be exploited in the syntactic module of the formal-grammar parser Russ4IR for distinguishing actual and novel zones of information in a text.

Gornostay T. Vasiljevs A. Skadina I. Skadins R. Tilde Company, Riga

LATVIAN <-> RUSSIAN MACHINE TRANSLATION EXPERIENCE

The article presents a pilot version of a multilingual dictionary with elements of machine translation. The architecture of the dictionary and basic steps of language processing are described. Linguistic difficulties of Latvian<->Russian translation are highlighted.

Grishina Elena Institute of Russian Language, RAS

THE SPOKEN RUSSIAN MARKERS

The paper presents a list of Russian spoken markers, i.e. words, forms and constructions which allow the listener or reader to interpret a particular text as spoken rather than written. Transcripts of some Russian films (which are part of the Russian National Corpus) were compared with the original texts (plots and scripts) and subtitles made for people with hearing problems. It was revealed that during the transformation from the original text to transcript to subtitles practically the same sets of units and constructions appear and disappear. It is these elements that should be considered as markers of spoken Russian. We propose to use this set of units for the determination of the degree of spokenness of a text.

Gruntov I.A. Institute of Linguistics, Russian Academy of Sciences

«THE CATALOGUE OF SEMANTIC SHIFTS»: A DATABASE FOR THE TYPOLOGY OF SEMANTIC EVOLUTION

The paper describes «The catalogue of semantic shifts», a database comprising regular semantic shifts reproduced in various languages of the world. The main purpose of the database is the investigation of the typology of semantic changes.

Iagounova E. V. St. Petersburg State University

TOPIC / COMMENT, GIVEN / NEW DISTINCTIONS AND AUDIBLE TEXT PERCEPTION

It is argued that the choice of genre (in our case, fiction vs. business Russian) is crucial for how the text is structured in terms of Topic-Comment articulation (Functional Sentence Perspective), which types of Topics are chosen, which markers for indicating Topics are preferred, etc. This hypothesis is tested using a battery of speech perception and other experiments.

Karvovskaja E.A. Russian State University for the Humanities

RUSSIAN PARTICLE -TO: MORPHEME AND LEXEME

The paper discusses certain issues of ambiguity resolution. Two linguistic units in Russian may be referred to as the particle -то: the lexeme -то1 (in a word-combination Ivan-to prishel 'as for Ivan, he did come') and the morpheme -то2 (as part of the word что-то 'something'). Certain words and phrases built with the help of -то2 are considered. An attempt is made to outline their lexicographic definitions and properties.

Andrej A. Kibrik Institute of Linguistics, Russian Academy of Sciences Evgenija V. Prozorova Moscow State University

REFERENTIAL CHOICE IN RUSSIAN SIGN LANGUAGE

We compare the referential system of Russian Sign Language (RSL) with that of spoken languages. Besides the referential mechanisms of deixis and anaphora, an additional process is important for RSL, termed quasi-deixis: the signer creates analogs of imagined referents in his/her signing area, and their loci are used thereafter for quasi-deictic mentions of the referents.

Kibrik A. E. Arkhipov A. V. Daniel M. A. Kodzasov S. V. Moscow State University Myers Tom N-Topus Software Nakhimovsky A. D. Colgate University

DIGITAL PROCESSING OF LINGUISTIC DATA FOR MINORITY LANGUAGES DOCUMENTATION

The paper presents a new standard for a digital format of language documentation. A unified format of linguistic data presentation along with an integrated computer platform for creating and accessing multimedia linguistic resources are being developed within a project of documenting several minority languages of Russia.

Vitali Kiselov Ivan Tampel Marina Tatarnikova Yuri Khokhlov Speech Technology Center, St. Petersburg, Russia

OPEN-VOCABULARY HMM-BASED ISOLATED WORD RECOGNITION SYSTEM FOR THE RUSSIAN LANGUAGE

The paper describes a method of training context-dependent and context-independent acoustic models for the Russian language. The results obtained with these acoustic models applied to the HMM-based isolated word recognizer are presented.

Kobzareva T.Yu. Russian State University for the Humanities

BUILDING AND USE OF PROJECTIVE FRAGMENTS OF ATTRIBUTIVE NOUN AND PREPOSITIONAL PHRASES (SURFACE SYNTACTICAL ANALYSIS OF RUSSIAN SENTENCE)

Syntagmatic links delimiting projective fragments of attributive noun and prepositional phrases in the linear structure of the Russian sentence are examined from the point of view of (1) syntactic analysis strategy and (2) their analysis procedure.

Kobozeva I. M. Moscow State University

AMBIGUITY OF DISCOURSE MARKERS — CAN IT BE RESOLVED IN CLAUSAL CONTEXT? (THE CASE OF VOT)

In the paper we discuss the possibility of resolving syntactic and semantic ambiguity of discourse markers in a clause that has undergone morphological and partial syntactic analysis and partial semantic tagging in terms of the semantic features of the National Corpus of Russian. The Russian particle vot is used to illustrate the approach.

Koval S.L., Labutin P.V., Pehovsky T.S., Proschina E.A., Smirnova N.S., Talanov A.O. Speech Technology Center, St. Petersburg, Russia

COMPOSITE SPEAKER IDENTIFICATION METHOD

The forensic speaker identification method includes several stages that require different types of speech analysis and is applicable to speech examination in languages unknown to the expert. The method comprises comparison of Gaussian Mixture Models of speech signals, statistical and structural analyses of formants and pitch, the "formant matching" method, and linguistic, aural and psychological speech analyses.

Koval S.L. Panova E.A. Speech Technology Center, St. Petersburg, Russia

THE EXPERT METHOD OF THE DIAGNOSTICS OF SPEAKER

An expert method for diagnosing speakers' biological parameters from speech is presented. The main focus is on the requirements for representative speech databases. During the diagnostics, the expert is provided with optimally selected speech templates that illustrate the auditory characteristics used. The results of the experts' diagnostics of speakers' biological parameters were checked on a speech database of 289 speakers. The accuracy of the experts' estimation of speakers' biological parameters is acceptable for some practical applications.

Kodzasov S.V. Arkhipov A.V. Bonch-Osmolovskaya A.A. Zakharov L.M. Krivnova O.F. Moscow State University

"INTONATION OF RUSSIAN DIALOGUE" DATABASE: DECLARATIVE UTTERANCES

An overview of the final development stage of the database "Intonation of Russian Dialogue" is given. A database entry contains a sound file, a pitch graph, and a multi-parametrical description of the utterance prosody. A detailed classification of declarative utterances is proposed.

Kozhunova O.S. Zatsman I.M. Institute for Informatics Problems of the Russian Academy of Sciences

PRAGMATIC ASPECTS OF CREATION OF THE SEMANTIC DICTIONARY FOR INFORMATION MONITORING

Issues in the development of evaluation systems for research programs and projects financed on a competitive basis are considered. The problem of reaching expert agreement on the meaning of the performance indicators defined within these systems is stated. A semantic dictionary for information monitoring is proposed to solve the problem. Its role and the functions to be implemented when developing the dictionary are discussed.

Kozerenko E. B. Institute for Informatics Problems of the Russian Academy of Sciences

VERBAL AND NOMINAL TRANSFORMATIONS IN THE ENGLISH-RUSSIAN MACHINE TRANSLATION

The paper focuses on the problem of developing formal linguistic presentations of verbal - nominal transformations for the English-Russian and Russian-English machine translation. The correlations of nominal and verbal functionality in the Russian and English scientific discourse are considered. The formal presentations of language phenomena in the linguistic processor under discussion are based on the Cognitive Transfer Grammar designed and developed for machine translation systems.

Koit M. Roosmaa T. Oim H. University of Tartu

FROM SYNTAX TO SEMANTICS - CHOOSING FORMALISMS AND LANGUAGE RESOURCES

The paper considers formalisms, methods and linguistic resources used in Computational Linguistics in order to model syntax and semantics. The second part of the paper gives an overview of work on automatic syntactic and semantic analysis of Estonian carried out at the University of Tartu.

Kondratenko N.V. Odessa I. I. Mechnikov National University

THE NEW YEAR ADDRESS AS A RITUAL GENRE OF POLITICAL DISCOURSE: THE MAIN MACROSTRUCTURAL COMPONENT AND ITS REPRESENTATION

The article is dedicated to the analysis of a ritual genre of political communication, the new year address. Structure, semantics and style of the address are considered. The place of the new year address in political discourse typology is determined. Emphasis is laid on particular realization of political rhetoric in the new year addresses of the presidents of Russia, Ukraine and Belarus.

Kopotev M. V. University of Helsinki, Finland Gurin G. B. Petrozavodsk State University, Russia

MARKING OF SYNTACTIC INCOMPLETENESS IN A CORPUS

The paper is devoted to ways of marking syntactic zeros and similar phenomena in a Russian corpus. A composite classification of zero and zero-like units based on the reference papers on the topic is offered. Two approaches to marking and searching for such units are also discussed.

Olga Krasavina Moscow State University / Humboldt University of Berlin, Department of German Language and Linguistics

Choice of third-person pronouns in discourse

Choice of referential expressions in discourse is highly dependent on contextual characteristics of referents. The current work analyses conditions under which prototypical (e.g. actant) vs. peripheral (e.g. possessive) pronouns are used. For this study, two German corpora annotated for discourse structure with co-reference mark-up have been used, Potsdam Commentary Corpus, PCC (Stede 2004) and NEGRA (Skut et al. 1997). The results of our study indicate that the use of different pronominal types is sensitive to distance. Furthermore, the effect of animacy, syntactic parallelism, discourse prominence, position in a sentence and discourse structure has been investigated. We came to a conclusion that there is no ultimate strategy responsible for all uses of referential forms, but rather there are a number of interacting mechanisms applicable to different discourse configurations. As for the pronoun use, the data revealed a set of compensating factors and complementary mechanisms, such as referential and rhetorical distance, animacy and distance, advantage of first mention and recency, topic persistence and distance.

Kreydlin G.E. Russian State University for the Humanities

MECHANISMS OF INTERACTION BETWEEN VERBAL AND NONVERBAL UNITS IN A DIALOG II A. DEICTIC GESTURES AND THEIR TYPES

An academic lecture regarded as a kind of dialog is a suitable testing ground for the recognition of certain peculiarities of gesture-speech interrelation and interaction. In this paper (part II A) certain classes of deictic gestures are described. Later on, (part II B) I plan to demonstrate that deictic gestures of different classes have different relations with the vocal and representational nonverbal signs in a dialog.

Sergej A. Krylov Sergej A. Starostin Institute of Oriental Studies of Russian Academy of Sciences, Moscow; Institute of System Analysis of Russian Academy of Sciences, Moscow; Russian State University for the Humanities

CREATION AND PROCESSING OF LEXICAL DATABASES IN THE ENVIRONMENT OF THE STARLING INTEGRATED INFORMATIONAL SYSTEM

The tasks of computational lexicography being solved in the StarLing environment are: (1) creation of lexical databases (LDBs); (2) automatic and manual delimitation of the fields of the LDBs; (3) restructuring of LDBs so as to bring their formal structure as close as possible to their informative content.

Kuznetsov I.P. Matskevich A.G. Institute for Informatics Problems of the Russian Academy of Sciences

LINGUISTIC AND ALGORITHMIC ASPECTS OF OBJECT EXTRACTION FROM SUBJECT-DOMAIN-ORIENTED NATURAL LANGUAGE TEXTS

A semantic linguistic processor which extracts the objects and their links from natural language texts is considered. The paper analyzes the experience of using the processor for formalization of Russian and English texts in various subject fields: criminal actions, mass media, terrorist activities. Peculiarities of the texts are taken into account by linguistic knowledge of the processor.

Kustova G.I. VINITI RAN

POLYSEMY OF TEMPORAL ADJECTIVES

This paper describes the meanings of Russian temporal adjectives davnij 'of long standing', nedavnij 'recent', вчерашний 'yesterday's' and their interaction with the meanings of nouns they modify.

Lande D.V. Brajchevskiy S.M. Grigorjev A.N. Darmokhval A.T. Radetskiy A.B. ElVisti Information Center, Kiev

DETECTION OF NEW EVENTS FROM NEWS FLOW

The paper deals with current issues of new event detection from news flow, tracking, and clustering. An overview of theoretical and practical developments in the field is given. An innovative multicriteria algorithm of new event detection is presented. Retrospective analysis and the technology of forming subject chains, created within the framework of InfoStream, a content monitoring system, are used for algorithm parameter tuning.

N. Laufer

PREDICATIVES OF NECESSITY: STATISTICS AND SEMANTICS (A CORPUS-BASED RESEARCH)

Frequency characteristics of phrases with the Russian predicative words надо and нужно 'it is necessary' are analyzed on the basis of the Russian National Corpus. An attempt is made to use statistical data to find semantic differences between the two words, which are usually considered synonymous.

Levontina I.B. Russian Language Institute — Vinogradov Institute

THE LANGUAGE OF CONSUMPTION (ON SOME NEW PHENOMENA IN RUSSIAN)

A considerable number of new Russian words are connected, not least, with certain changes in the Russian linguistic "picture of the world". In particular, there are some new phenomena determined by the dissemination of the values of the consumer society.

A. P. Leontyev Moscow State University / ABBYY Software House

CORRELATION BETWEEN EXTERNAL POSSESSOR CONSTRUCTIONS AND GENITIVE RELATIONS; PROBLEMS THAT ARISE DURING THE RESEARCH

The paper is devoted to the correlation between external possessor constructions and the so-called genitive relations. I demonstrate that the semantics of the external possessor imposes certain restrictions on the range of possible genitive relations. I also claim that it is the semantics of external possessor constructions that determines their syntactic form, while the semantics of the genitive relation is responsible for the other component of external possessor constructions: the possessive relation between their components.

Leontyeva N.N. Research Computing Center of Moscow State University

ON THE LEVELS AND EVALUATION OF SEMANTIC INCOMPLETENESS OF THE TEXTS

Redundancy and coherence are global properties of any natural text. These properties, as well as local semantic incompleteness, are made explicit in the semantic representation (SR). All these parameters affect the information value of the text and explain semantic compression. The compressed SR is text knowledge that also includes the component of ignorance (incompleteness of the text as a whole).

Alexander Letuchiy Vinogradov Russian language Institute, Moscow

RUSSIAN CONSTRUCTION OF THREAT AND ITS RELATIVES

The Russian construction of threat, as illustrated by examples like Ja jemu spoju 'I will do something bad to him if he sings', is analyzed. I examine formal and semantic properties of the construction and its relationship with valency derivations, and finally show that the construction itself can be regarded as a type of valency derivation.

Li I.V.

THE LOCAL AND THE GLOBAL LEVELS OF DIALOGUE MANAGEMENT

A frame-based dialogue model for an information desk on air flights is outlined. A model of dialogue management represented by local and global managers is considered.

Lobanov B.M. Davydov A.G. United Institute of Informatics Problems, National Academy of Science of the Republic of Belarus

A HIGH PRECISION PITCH MARKER OF COMPILATION UNITS FOR TEXT-TO-SPEECH SYNTHESIS

The paper describes a high precision pitch marker of compilation units for text-to-speech synthesis. The program and test results on male and female voices are presented.

Lobanov B.M. Tsirulnik L.I. United Institute of Informatics Problems, National Academy of Science of the Republic of Belarus

RULES OF SPEECH CORPUS SEGMENTATION INTO PHONETIC UNITS AND THE STRATEGY OF UNIT SELECTION IN SPEECH SYNTHESIS

Variants of speech corpus segmentation into word-internal and phrase-internal phonetic units such as allophones, di-allophones, and three types of allo-syllables are considered. Algorithms of speech corpus segmentation into phonetic units are described. Statistical characteristics of phonetic units of different types are discussed. The strategy of unit selection in speech synthesis is given.

Loukachevitch N. V. Dobrov B. V. Research Computing Center of Moscow State University, Center for Information Research

LEXICAL DISAMBIGUATION BASED ON DOMAIN SPECIFIC THESAURUS

The paper describes the means of the representation of senses and a procedure of lexical disambiguation based on a socio-political thesaurus. We also describe results of evaluation of the proposed algorithm.

Lashevskaja O.N. VINITI RAN, Moscow

TOWARD THE LEMMATIZATION OF WORD FORMS ABSENT FROM THE DICTIONARY

The paper deals with lemmatization of text tokens that dictionary-based morphological analyzers are unable to induce from their built-in dictionary. We evaluate an algorithm that establishes paradigmatic connections inside the unknown forms array, weighing up alternative hypotheses about the length of the stem for each word form. The composition of light and more elaborated clusterization routines proves to be highly effective for morphological post-processing of large text collections.

Mitrofanova O.A. Mukhin A.S. Panicheva P.V. St. Petersburg State University

AUTOMATIC WORD CLUSTERING IN RUSSIAN TEXTS BASED ON LATENT SEMANTIC ANALYSIS

The paper deals with the development and application of an automatic word clustering tool aimed at processing raw Russian texts. Special attention is given to experimental results on clustering with varying parameters for various types of texts.

Nosenko N. Moscow State University

RUSSIAN CONSONANT SUBSTITUTION MODELS (PERCEPTION UNDER CONDITIONS OF NOISE)

Russian consonant substitution models are presented as binary block schemes that describe the characteristics of initial and perceived consonants. An attempt is made to classify consonant substitutions by schemes describing the types of transitions and to reduce the number of such models to a finite set.

Nevzorova O.A. Nevzorov V.N. Pjatkin N.V. Zin'kina J.V. Chebotarev Research Institute of Mathematics and Mechanics, Kazan State Technical University

INTEGRAL TECHNOLOGY OF HOMONYMY DISAMBIGUATION IN THE LOTA TEXT MINING SYSTEM

An integral technology of homonymy disambiguation in the text mining system "LoTA" is described. The technology includes a collection of methods of homonymy disambiguation and their cooperation procedure.

Ovchinnikova T.E. MSLU

FRAGMENTATION OF THE MENTAL SPACE ACCORDING TO MODAL PARTICLES

The article deals with Russian modal particles VOT and VON that are homonymous with demonstrative adverbs. That fact makes it possible to use the concept of mental spaces where these particles are used as demonstratives. Particle senses are studied.

Paducheva E. V. VINITI RAN, Moscow

QUEST FOR THE OBSERVER: RUSSIAN VERBS VYGLJADET

The Russian verb vyglyadet’ 'look like' belongs to the class of perception verbs, but has the following peculiarity: its subject position is occupied by the Object of perception, so that the Experiencer, obligatory for perception verbs, has no corresponding syntactic argument. Such a participant is called the Observer. The function of the Observer is usually fulfilled by the speaker. Hence the co-occurrence restriction: the subject position of vyglyadet’ cannot be occupied by the 1st person pronoun (*Ja vygljadela dovol'no stranno 'I looked rather strange'). There are contexts in which this restriction does not hold (Otec skazal, chto ja vygljadela dovol'no stranno 'the father said that I looked rather strange'): there the function of the Observer is delegated to some person other than the speaker. The verb vyglyadet’ is semantically related to byt’ 'be': vyglyadet’ may acquire the contextual meaning 'be', while byt’ may have a diathesis that includes the Observer.

Anna G. Pazelskaya ABBYY Software House

NUMBER AGREEMENT IN RUSSIAN NOUN PHRASES

The paper presents unusual uses of plural forms of Russian event nouns. In these uses event nouns stand in the plural not for semantic reasons (because they denote a set of situations), but for syntactic reasons. These uses can be treated as a sort of number agreement in noun phrases in Russian.

Petrova M.A. ABBYY Software House; Institute of Linguistics, Russian Academy of Sciences

ON INTERCHANGEABILITY OF VERBS EXPRESSING KNOWING AND ABILITY (ON SLAVIC AND GERMAN LANGUAGES)

The work describes contexts where verbs normally meaning 'know' express some meanings of ability, and, on the contrary, modal verbs meaning 'ability' express some kinds of knowing. We isolate specific meanings of knowing and ability which can be expressed both by verbs of knowing and ability.

Pirogova J. Plekhanov Russian Academy of Economics, State University Higher School of Economics

DISCOURSE PRESSURE AND PERSUASIVE STRATEGY SELECTION IN MARKETING COMMUNICATIONS

The article discusses the generation of marketing communications arranged into a unified whole, a marketing communication campaign. The paper investigates various discourse factors affecting persuaders and determining their selection of persuasive strategy combinations.

Podlesskaya Vera Russian State University for the Humanities

A FAMILY OF CHTO ‘WHAT’ + ZA ‘FOR’ + NP CONSTRUCTIONS IN RUSSIAN: A CORPUS ANALYSIS

The paper presents a corpus analysis of a family of chto 'what' + za 'for' + NP constructions in Russian. Pragmatically, they are shown to have both interrogative and exclamative functions. Syntactically, they are unique in being transparent for the external nominative and accusative case. Semantically, they presuppose the existence of their referent.

Rogov A.A. Sidorov Yu.V. Solopova A.I. Surovtsova T.G. Petrozavodsk State University

THE INFORMATION SYSTEM "SMALT"

The work presents the information system "SMALT". Its main tasks are collection, integrated storage of literary works including their grammatical and syntactic structures, and statistical processing and analysis of these structures aimed at detection of regularities.

Rozina R.I. Vinogradov's Russian Language Institute

THE DERIVATION OF EXISTENTIAL AND LINK-VERB MEANINGS: THE CASE OF IDTI

The paper addresses the new meanings of the verb idti 'to go', namely the existential one and the meaning of a link-verb. Patterns of their derivation are suggested, and an attempt is made to account for their colloquial coloring. The derivation of the meaning of the link-verb idti is compared to that of the link-verb meanings of other Russian verbs.

Valery Sh. Rubashkin St. Petersburg State University

ONTOLOGY - PROBLEMS AND SOLUTIONS. THE DESIGNER

We discuss current problems of ontological modeling. The author's R&D experience with an ontology environment, as well as with ontology proper, is presented.

Salomatina N.V. Gusev V.D. Institute of Mathematics

EDITING AND ENRICHMENT OF CUE DICTIONARIES FOR AUTOMATIC INFORMATION EXTRACTION FROM SCIENTIFIC TEXTS

The paper is a continuation of the authors' research in the field of Cue Dictionary formation and its application for extraction of various content aspects of scientific texts. Issues of dictionary editing and enrichment without the increase of training material and automatic text summarization are considered.

Semenova S.Yu. Institute of Scientific Information on Humane Sciences of the Russian Academy of Sciences

IF THE SEMANTIC CLASS IS TOO BROAD FOR A LEXEME (TOWARDS MEANING REPRESENTATION IN A COMPUTER DICTIONARY)

The basic way of describing lexical meaning in a semantic dictionary aimed at NLP is to ascribe to a word some semantic class (or a conjunction of classes). The accuracy and completeness of meaning representation depend on the set of classes a lexicographer is allowed to use in the descriptions. The selection of classes poses an essential problem at the meeting-point of linguistics and information science. It is obvious that no finite set of bulk classes can satisfactorily cover the whole lexicon, which includes lexemes with rather individual semantic characters, lexemes forming groups that are smaller than the classes chosen, and lexemes that can be placed only at the periphery of the classes. Methods of representing the senses of such lexemes by means of the semantic classes of the RUSLAN machine dictionary are discussed. Lexicographical experiments necessarily involve defining the intension of the classes and the boundaries between neighboring classes. These issues are to some extent considered as well.

Smirnova N.S. Speech Technology Center

SPEAKER IDENTIFICATION BASED ON THE COMPARISON OF UTTERANCE PITCH CONTOUR PARAMETERS

A formalized approach to speaker identification based on utterance pitch contour parameters is presented. Experimental results allowing preliminary evaluation of the method are provided. Further research directions are discussed.

Serge Sharoff University of Leeds

CENTRAL PLANNING VS. FREE MARKET: COMPARING THE DISTRIBUTION OF TOPICS AND GENRES IN THE RUSSIAN NATIONAL CORPUS AND INTERNET

This study compares traditional representative corpora, such as the British or Russian National Corpora, against corpora extracted from the Internet. The first method involves human annotation of a sample from an Internet corpus, which can then be compared against a traditional corpus in the same language. The second method uses statistical models based on automatic text clusterisation to estimate the variation in domains and genres.

Suleymanov D.Sh. Nevzorova O.A. Gatiatullin A.R. Gilmullin R.A. Ayupov M.M., Pyatkin N. V.

MAIN COMPONENTS OF APPLIED GRAMMATICAL MODEL OF TATAR LANGUAGE

The main components of an applied grammatical model of the Tatar language for information search tasks are presented.

Shemanaeva O.Ju. Kustova G.I. Lashevskaja O.N. Rakhilina E.V. VINITI RAN

SEMANTIC FILTERS FOR THE WORD SENSE DISAMBIGUATION IN RNC: ADJECTIVES

The paper demonstrates how the lexico-semantic annotation in RNC is used to make semantic filters for the word sense disambiguation. Most of the meanings of polysemous adjectives and other words have tags of semantic classes in the RNC semantic dictionary. In the corpus each instance of the word in the text receives all the semantic tags automatically. However, the system of semantic filters helps to delete the unnecessary tags and leave only relevant ones.

Tatiana Sherstinova Gregory Martynenko St. Petersburg State University

A STATISTIC DESCRIPTION OF INTONATION IN NENETS

The paper presents a methodology to study intonation in minority languages, which aims at description of the main prosodic models and revelation of general regularities of the intonation system. The proposed method is tested on the material of the Nenets language.

Elena G. Sokolova Russian State University for the Humanities Michael V. Boldasov Luxoft

SEMANTIC ANNOTATION OF AN IMAGE AS THE INPUT FOR NATURAL LANGUAGE GENERATION

In this paper we describe our investigation of NLG of image description texts. The input to NLG is a formal XML representation of image content - photo of open-air space: landscapes, city views etc. Means for the formal representations are discussed - objects, properties and relations. The XML representation consists of two parts - objects and spatial relations. The first part presents the elements of a photo composition, the second - spatial relations between these elements. We also discuss an ontology for the NL representation of objects and sources of verbs in the generated texts.

A.S. Starostin M.G. Malkovsky Moscow State University

ALGORITHM OF SYNTAX ANALYSIS EMPLOYED BY THE TREETON MORPHO-SYNTACTIC ANALYSIS SYSTEM

This paper continues presenting the project introduced in a previous paper by the authors. We discuss the algorithm of analysis employed by the syntax analyzer "Treevial", which is part of the "Treeton" morpho-syntactic analysis system. In the first three sections we describe the mathematical model on which "Treevial" is based. In the next two sections we state formally the task of syntax analysis, propose an algorithm which performs this task and discuss various features of this algorithm.

Shmeleva E. Vinogradov Institute of Russian Language, Russian Academy of Sciences Shmelev A. Moscow Pedagogical State University

POST-SOVIET RUSSIAN JOKES: NEW CHARACTERS

The paper describes new characters of Russian jokelore (such as new Russians, Estonians, computer programmers, drug addicts) that have emerged since 1990. In particular, it will discuss their "linguistic masks", which correlate with their "behavior masks".

Tsirulnik L.I. Zhadinets D.V. Lobanov B.M. Sizonov O.G. United Institute of Informatics Problems, National Academy of Science of the Republic of Belarus

ALGORITHMS OF SPEECH PROSODIC CHARACTERISTICS SYNTHESIS IN "MULTIPHONE" TTS SYNTHESIS SYSTEM

The Accent Unit Portraits model (AUP-model) of prosodic characteristics synthesis is presented. The principles of creation of phrase accent units portraits are described. The structure of prosodic characteristics synthesis subsystem is shown. The implementation of an AUP-model in the "Multiphone" multi-language TTS synthesis system is outlined.

Tsirulnik L.I. Lobanov B.M. United Institute of Informatics Problems, National Academy of Science of the Republic of Belarus

THE TECHNOLOGY OF COMPUTER CLONING AND SYNTHESIS OF PERSONAL SPEECH CHARACTERISTICS

The problems and technology of computer cloning of personal speech characteristics are outlined. The "PhonoCloner" computer system is presented. The system automatically creates a DB of compilation elements for speech synthesis, which constitutes the nucleus of the speech clone, i.e. the nucleus of a personalized Text-to-Speech system.

Uryson E.V. V.V.Vinogradov Institute of Russian Language, RAS

RUSSIAN PARTICLES UZHE AND UZH: VARIANTS, HOMONYMS, OR RELATED WORDS?

The semantics of the Russian particles UZHE and UZH is described. In general, these particles have similar sets of meanings, but there are also contexts specific only to UZH. The particles under discussion share a common structure of polysemy, but UZH follows this structure more regularly.

Voskresenskij A.L. ANO «College of management, law & information technologies MESI», Moscow Khakhalin G.K. Independent researcher, Moscow

A MULTIMEDIA EXPLANATORY DICTIONARY OF RUSSIAN SIGN LANGUAGE

A description of an electronic explanatory dictionary of the Russian sign language is given. Problems of the conceptual "mapping" of the natural language onto the sign language are considered. The development of this dictionary is extrapolated to include a system of automatic translation into sign language.

Yanko T.E. Institute of Linguistics, Russian Academy of Sciences

PERFORMATIVE INTONATION. IS IT POLYSEMANTIC, OR HIGHLY ABSTRACT IN MEANING?

In spite of the general view that the L+H*LH% intonational pattern (in Pierrehumbert's terminology) in English indicates contrast, it has been shown that this pattern denotes, without differentiation, a wide range of performative meanings which oppose the speaker to the hearer.

Yanovich I. ABBYY, MSU, Gruntova L. ABBYY

THE DISTRIBUTION OF RUSSIAN RELATIVE PRONOUNS KTO (CHTO...) VS. KOTORYJ

The paper analyses the distribution of Russian relative pronouns kto (chto...) vs. kotoryj and suggests that the distribution of these pronouns depends on the presence of a lexical N in the DP modified by the relative. If present, N allows for the usage of kotoryj in the relative clause.

Yudina M. Moscow State University

SYNTACTIC AMBIGUITY RESOLUTION: IS THERE ANY PRIMING?

The paper is devoted to the first attempt to adapt the experiment on syntactic priming of relative clause attachment to Russian material. Certain difficulties and unexpected results are discussed.

Yudina M.V. Yanovich I.S. Fedorova O.V.

SYNTACTIC AMBIGUITY IN THE EXPERIMENT AND IN LIFE

The paper is devoted to the distinctions between four experimental methodologies aimed at the study of syntactic ambiguity, considered from the point of view of the results and the cognitive operations required. An attempt is made to compare the participants' experimental activity with ambiguity resolution in real communication.

Zhuravleva A., Koval S.L. Speech Technology Center, St. Petersburg, Russia

DIAGNOSTICS OF PSYCHOLOGICAL FEATURES OF THE SPEAKER BY ORAL SPEECH

The proposed method enables trained experts to establish basic psychological features of the speaker. The operational psychological model of the speaker includes individual life priorities, temperament, socionic type, and personal character features.

Zalizniak Anna A. Institute of Linguistics, Russian Academy of Sciences

THE SEMANTICS OF INVERTED COMMAS

The paper deals with the semantics of the quotation marks. A list of possible semantic functions of this punctuation character is given. The proposed invariant semantic definition explains individual meanings of the quotation mark as context variations. The quotation mark signals a violation of a standard semiotic act.

Zaretskaya E.N. Academy for National Economy under the Government of the Russian Federation

PERSUASIVE SPEECH

People's speech behaviour has been a subject of special interest in linguistics in recent years. Justification and persuasiveness are important not only as a cogitative property but also as a communicative one. They are the projection of the interrelation and internal mutual conditioning of subjects and phenomena in our consciousness. Any attempt to work with text irrespective of the content level is senseless and futile. Semantics and pragmatics come to the foreground in speech proof. The mechanism of persuasion is built on the consecutive use of two logical speech procedures: extrusion and replacement. Hence, persuasion is a system of two consecutive proofs.

Zakharov V.P. Institute for Linguistic Studies St. Petersburg State University

DICTIONARY CARD FILES AS AN OBJECT FOR AUTOMATION

The paper deals with issues of the computerization of card files of the Institute for Linguistic Studies comprising about 8 million cards. The place and the role of card files and corpora in lexicography are discussed. Compiling of specialized corpora aimed at creation of dictionaries is emphasized. The idea of an open online card index is discussed.

Zakharov L.M. Kazakevich O.A. Moscow State University

INTONATION OF DIALOGUE

The paper presents results of instrumental analysis of phrasal intonation in Ket, Selkup and Evenki dialogue speech. Our previous research into the intonation in Ket and Selkup narrative revealed that the tone at the end of phrases is practically always falling. The authors expected to find a richer spectrum of intonation contours in dialogue, and they discuss what they managed to find.

Anton Zimmerling

LOCATIVE INVERSION IN FREE WORD ORDER LANGUAGES

Locative Inversion, i.e. the transformation SVLoc → LocVS, is characteristic of a class of languages, including Russian, Lithuanian, Spanish, Greek and Albanian. In all these languages the position of the Verb is not fixed, and in most of them the position of the Subject is not fixed either: therefore, the mechanism changing the placement of S and V in the context where Loc takes sentence-initial position is a challenge for the theory of word order. The author argues that Locative Inversion, contrary to claims made elsewhere, is triggered by Subject Movement to Focus position and not by Verb Movement to second position. The current versions of the EPP-driven analysis make wrong predictions about Locative Inversion in Russian and typologically similar European languages and cannot account for the placement of postverbal subjects in free word order languages.
