Formal models of nouns in the Kazakh language

 

Assel MUKANOVA, Banu YERGESH, Gulmira BEKMANOVA, Bibigul RAZAKHOVA, Altynbek SHARIPBAY

 

L.N. Gumilyov Eurasian National University, Astana, Kazakhstan

E-mails: comyouth@mail.ru, asel_ms@bk.ru, b.yergesh@gmail.com, gulmira-r@yandex.ru, utalina@mail.ru, sharalt@mail.ru

* Corresponding author, Phone: +77014295966

 

 

Abstract

This paper explains how semantic hypergraphs are used to construct ontological models of morphological rules in the Kazakh language. The nodes of these graphs represent semantic features (morphological concepts) and the edges represent the relationships between these features. Word forms within the hypergraph structure are described by trees, which are converted into linear parenthesis notation; each tree corresponds to exactly one linear parenthesis notation and vice versa. Linear parenthesis notations are the formal models of morphological rules, and their software implementation automates the synthesis and morphological analysis of word forms in the Kazakh language.

Keywords

Ontology; Morphological rules; Hypergraph; Semantic hypergraph; Linear parenthesis notation; Word form synthesis; Morphological analysis

 

 

Introduction

 

Agglutinative languages (Lat. agglutinatio, "to combine, to stick") are languages in which the dominant type of inflection is the agglutination ("sticking") of different formants; a formant can be either a prefix or a suffix and carries only one meaning [1].

The Kazakh language belongs to the Turkic group of languages, which are agglutinative. Kazakh words have many inflected forms, which are built by adding suffixes and endings to the word. Suffixes and endings are attached in a strict sequence, and words vary in number, case, and person. Kazakh also has a possessive form, as English does [2-3].

Ontologies are currently a powerful and widely used tool for modeling the relationships between objects of different subject fields. Ontologies are commonly classified by their degree of dependence on the task or application area, by the model of ontological knowledge representation and its expressiveness, and by other parameters [4]. Applied ontologies describe concepts that depend on both the task and the subject field of the ontology.

Our applied ontology follows the general principles of ontology building and uses semantic hypergraphs as the model for the representation of knowledge. This formalism defines an ontology O as a triple (V, R, K), where V is a set of concepts of the subject field (hypergraph nodes), R is a set of relationships between these concepts (hypergraph edges), and K is a set of the names of concepts and relationships in the given subject field.

The semantic hypergraph language is a formal means of the representation of knowledge in which it is possible to implement classifying, functional, situational, and structural networks and scenarios, depending on the relationship types. This language is an extension of semantic networks where N-ary relations are represented naturally; these relations not only allow for the specification of the objects’ attributes but also permit a representation of their structural, "holistic" descriptions [5].

Several papers address the use of semantic hypergraphs [6-7]. Zhen and Jiang [7] describe the semantic hypergraph model as a 'hyper-graph based semantic network' (Hy-SN), which can represent more complex semantic relationships and has a more efficient data structure for storing knowledge in repositories.

In [8-9] the hypergraph H(V, E) is defined by the pair (V, E), where V is the set of vertices V = {v_i}, i ∈ I, I = {1, 2, …, n}, and E is the set of edges E = {e_j}, j ∈ J, J = {1, 2, …, m}; each edge is a subset of V. A vertex v is said to be incident to an edge e if v ∈ e. For v ∈ V, d(v) denotes the number of edges incident to the vertex v and is called the degree of v. The degree of an edge e, that is, the number of vertices incident to this edge, is denoted by r(e).
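The definitions of d(v) and r(e) can be illustrated with a short Python sketch. It is only an illustration, not the authors' software: the edge names e1 to e3 and the grouping of vertices into edges are hypothetical, with vertex labels borrowed from the notation used later in the paper.

```python
from typing import Dict, FrozenSet

# Hypothetical toy hypergraph: each edge is a subset of the vertex set V.
EDGES: Dict[str, FrozenSet[str]] = {
    "e1": frozenset({"N", "Anim", "Inanim"}),
    "e2": frozenset({"N", "Cases"}),
    "e3": frozenset({"Cases", "Nom", "Gen"}),
}


def vertex_degree(edges: Dict[str, FrozenSet[str]], v: str) -> int:
    """d(v): the number of edges incident to vertex v."""
    return sum(1 for members in edges.values() if v in members)


def edge_degree(edges: Dict[str, FrozenSet[str]], e: str) -> int:
    """r(e): the number of vertices incident to edge e."""
    return len(edges[e])


if __name__ == "__main__":
    print("d(N) =", vertex_degree(EDGES, "N"))
    print("r(e3) =", edge_degree(EDGES, "e3"))
```

In this toy graph the vertex N belongs to two edges, so d(N) = 2, and the edge e3 contains three vertices, so r(e3) = 3.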

Using the ontology model to represent morphological rules allows the morphological model to be translated almost one to one into the object-oriented data model, where the classes are the parts of speech of the Kazakh language and the objects refer to their semantic categories, for example, animateness and inanimateness.

Representing the morphological rules of a part of speech with the ontology model also makes it possible to describe the complete morphological model together with its relationships. We use a semantic hypergraph to represent the morphological rules of a part of speech and a structure (frame) to represent each concept. In the resulting object-oriented data model, the vertices of the semantic hypergraph are classes.

The purpose of this research is the automated generation of word forms and new words in the Kazakh language as well as the morphological analysis of the Kazakh language.

The research problem lies in the difficulty of formalizing any natural language.

The authors believe that the model proposed below handles the problem of formalizing the Kazakh language well. In this paper we describe the noun in detail, in the Material and method section.

 

 

Material and method

 

The semantic features of the initial forms of nouns (N) are animateness (anim) and inanimateness (inanim); this feature determines the inflection path of the noun. Nouns in the Kazakh language take personal endings (pers_end), vary in case (cases) and number (number), and have a possessive form (poss_end).

We used the ontology editor Protégé [10] to build the ontology. It is a free and open source ontology editor and framework for building knowledge bases, developed at Stanford University in collaboration with the University of Manchester. Figure 1 shows the ontological model of the noun with its semantic features.

 


Figure 1. Ontological model of noun


Figure 2. Visualization of noun as a graph

 


Figure 3. Graphical representation of ontology using semantic hypergraph

 

Table 1 describes the concepts and relationships used in the ontology.

Table 1. Concepts and relationships

ID | Notation | Description
N | Noun |
Part_of_speech | Part_of_speech |
Item | Item |
Anim | Animate | Sign of animateness
Inanim | Inanimate | Sign of inanimateness
Cases | Cases |
Nom | Nominative case |
Gen | Genitive case |
Dat | Directional-dative case |
Acc | Accusative case |
Loc | Locative case |
Abl | Ablative case |
Ins | Instrumental case |
Pers_end | Personal endings |
1 pr | 1st person |
2 pr | 2nd person |
3 pr | 3rd person |
Poss_end | Possessive endings |
1 ps | 1st person |
2 ps | 2nd person |
3 ps | 3rd person |
Number | Number |
Pl | Plural |
Sg | Singular |
is_a | |
denotes | |
has_feature | |
has | |
devided | |
change | |
add | |

 

Hyper-arcs will be called semantic arcs to distinguish semantic hypergraphs from other types of graphs. It is also assumed that the vertex set of the semantic hypergraph includes a set of classes, each of which consists of a set of class instances [11]. Thus, a vertex-class can be represented by a triple consisting of the set of class properties, the set of semantic arcs incident to the class, and the set of instances of the class.
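A vertex-class triple of this kind can be sketched as a small Python data structure. This is a minimal illustration under stated assumptions: the class name VertexClass, its field names, and the sample values for the noun class are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class VertexClass:
    """A vertex-class: (class properties, incident semantic arcs, instances)."""
    name: str
    properties: Set[str] = field(default_factory=set)
    arcs: Set[str] = field(default_factory=set)
    instances: Set[str] = field(default_factory=set)

    def as_triple(self) -> Tuple[Set[str], Set[str], Set[str]]:
        """Return the triple in the order used in the text."""
        return (self.properties, self.arcs, self.instances)


# Hypothetical example values for the noun class N.
noun = VertexClass(
    name="N",
    properties={"number", "cases"},
    arcs={"is_a", "has_feature"},
    instances={"bala"},
)

if __name__ == "__main__":
    print(noun.as_triple())
```

Keeping the three sets as separate fields mirrors the triple directly and maps naturally onto the object-oriented data model discussed in the Introduction.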

The noun vertex-classes:

We can represent the noun morphological model with the semantic hypergraph model:

Hypergraph H (V, E), where

 

 

Results and discussion

 

Our base of initial forms contains 40,000 words annotated with semantic features; 25,660 of these words are nouns. From the semantic hypergraph described above we obtain formal rules in parenthesis notation. The number of formal rules for nouns is 4,500.

Applying these formal rules generates 1,605,725 word forms of nouns; it is also possible to generate nouns from other parts of speech.

As an example, the inflection of the animate noun "bala" ("child") includes all word forms of this noun together with their morphological information, which in abbreviated notation records the number, the case, the personal ending, and the possessive ending of each form. The example below shows the inflection of the noun "bala" by case. Figure 4 shows the program implementation.
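The correspondence between trees and linear parenthesis notation can be illustrated with a short Python sketch. The paper does not spell out the concrete syntax of its notation, so the format used here, label(child,child,...), is an assumption, and the sample tree built from Table 1 identifiers is hypothetical.

```python
from typing import List, Tuple

# A tree node is (label, list of child nodes).
Tree = Tuple[str, List["Tree"]]


def to_parens(tree: Tree) -> str:
    """Serialize a tree into linear parenthesis notation."""
    label, children = tree
    if not children:
        return label
    return label + "(" + ",".join(to_parens(c) for c in children) + ")"


def from_parens(text: str) -> Tree:
    """Parse linear parenthesis notation back into a tree."""
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        label = text[start:pos]
        children: List[Tree] = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            while True:
                children.append(parse())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1  # move on to the next sibling
                elif text[pos] == ")":
                    pos += 1  # close this node
                    break
                else:
                    raise ValueError("unexpected character")
        return (label, children)

    tree = parse()
    if pos != len(text):
        raise ValueError("trailing characters after the tree")
    return tree


# Hypothetical fragment of a noun rule tree.
SAMPLE: Tree = ("N", [("Anim", [("Sg", [("Nom", []), ("Gen", [])])])])

if __name__ == "__main__":
    print(to_parens(SAMPLE))
```

Serializing and then parsing returns the original tree, which is the sense in which the two representations correspond to each other.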

Example. Inflection of the noun "bala"

S=bala

{bala (бала), balanyn (баланың), balagha (балаға), balany (баланы), balada (балада), baladan (баладан), balamen (баламен)} 
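The case forms listed in this example can be reproduced with a minimal Python sketch. It is not the authors' program: the suffix strings are read off the transliterated forms above and are hard-coded for the stem "bala" only. In Kazakh the choice of suffix variant depends on vowel harmony and on the final sound of the stem, which this sketch does not model.

```python
from typing import List, Tuple

# Case suffixes as they appear in the transliterated forms of "bala".
# Assumption: valid for this stem only; other stems take other variants.
CASE_SUFFIXES_BALA: List[Tuple[str, str]] = [
    ("Nom", ""),
    ("Gen", "nyn"),
    ("Dat", "gha"),
    ("Acc", "ny"),
    ("Loc", "da"),
    ("Abl", "dan"),
    ("Ins", "men"),
]


def inflect_by_case(stem: str, suffixes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Attach each case suffix to the stem and keep the case label."""
    return [(case, stem + suffix) for case, suffix in suffixes]


if __name__ == "__main__":
    for case, form in inflect_by_case("bala", CASE_SUFFIXES_BALA):
        print(case, form)
```

Each generated form keeps its case label, which corresponds to the abbreviated morphological information attached to every word form.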

 


Figure 4. Program implementation of the inflection

 

On the basis of these rules a morphological analyzer for the Kazakh language was created. It can be used to build spell checking technology for the Kazakh language and can serve as a cornerstone for translators, semantic search engines, speech technologies, etc.

Many methods of formalizing the morphological rules of a natural language do not allow the semantic properties of words to be described. This paper elaborates on the possibility of using semantic hypergraphs as a tool for formalizing the morphological rules of a natural language on the basis of the semantic features of words. Although this paper uses the Kazakh language to illustrate the concept, the semantic hypergraph can be applied to any natural language.

Earlier results were obtained using a semantic neural network: 2.8 million word forms were generated from 40,000 initial word forms; these results were reported in [12]. Applying the semantic hypergraph increased the number of word forms by 400,000. This was achieved through a complete description of the semantic features of words, which exploits the expressive power of the semantic hypergraph.

In the future we plan to apply the proposed method to other Turkic languages.

 

Conclusion

 

The construction of ontological models of the morphological rules of the Kazakh language allowed for the creation of formal rules of inflection and word formation for each part of speech. Software implementation of these rules made it possible to automatically generate more than 3.2 million word forms (dictionary entries) from 40,000 initial word forms with marked semantic features.

 

 

References

 

1. Eifring H., Theil R. (online), Linguistics for students of Asian and African languages. Available at: http://www.uio.no/studier/emner/hf/ikos/EXFAC03-AAS/h05/larestoff/linguistics/ (accessed 19/08/2014)

2. Kazakh grammar. Phonetics, word formation, morphology, syntax, Astana, 2002. In Kazakh.

3. Batayeva Z., Colloquial Kazakh, Routledge, 2012.

4. Gruber T.R., Toward principles for the design of ontologies used for knowledge sharing, International Journal of Human-Computer Studies, 1995, 43(5-6), p. 907-928.

5. Khakhalin G., Applied ontology in the language of hypergraphs, Proceedings of the IInd All-Russian Conference "Knowledge - Ontology - Theory" (KONT-09), 2009, p. 223-231. In Russian.

6. Lian R., Goertzel B., Ke S., O'Neill J., Sadeghi K., Shiu S., Wang D., Watkins O., Yu G., Syntax-semantic mapping for general intelligence: language comprehension as hypergraph homomorphism, language generation as constraint satisfaction, Artificial General Intelligence, Lecture Notes in Computer Science, 2012, 7716, p. 158-167.

7. Zhen L., Jiang Z., Hy-SN: Hyper-graph based semantic network, Knowledge-Based Systems, 2010, 23(8), p. 809-816.

8. Bretto A., Hypergraph theory, Springer International Publishing Switzerland, 2013.

9. Berge C.C., Graphs and hypergraphs, Elsevier Science Ltd., 1985.

10. Protégé. Available at: http://protege.stanford.edu (accessed 19/08/2014)

11. Potchinskii I., Formal representation of semantic hypergraphs and their operations, 2012. Available at: http://rgu-penza.ru/mni/content/files/2012_Pochinskii.pdf.

12. Sharipbaev A.A., Bekmanova G.T., Buribayeva A.K., Yergesh B.Z., Mukanova A.S., Kaliyev A.K., Semantic neural network model of morphological rules of the agglutinative languages, 6th International Conference on Soft Computing and Intelligent Systems, and 13th International Symposium on Advanced Intelligence Systems, SCIS/ISIS, 2012, p. 1094-1099.