The Kazakh Language Requires Reform of its Writing
Linguistics | Ethnomedicine | Computational Chemistry
Received 25 Jan 2024 | Accepted 08 Feb 2024 | Published online 09 Feb 2024
Focusing on Biology, Medicine and Engineering disciplines | ISSN: 2995-8067
The article discusses current problems of Kazakh writing and the importance of its role and development in the context of mass digitization using artificial intelligence technologies and computational linguistics methods. It shows that the current Cyrillic-based alphabet of the Kazakh language is incorrect because it includes Cyrillic letters denoting phonemes that are not part of the Kazakh sound system, and it substantiates the need to reform Kazakh writing by replacing this alphabet. Errors and contradictions are identified in the approved Latin-based version of the Kazakh alphabet, as well as in the alphabet proposed as its replacement, which repeats some of the earlier errors. In both cases, the sound system of the Kazakh language, which is the basis of any alphabet, was neither analyzed nor clarified. In this study, to clarify the sound system of the Kazakh language, experiments were carried out to determine the articulatory and acoustic features of Kazakh sounds with the help of computer programs used for many natural languages. In the articulatory analysis, special attention was paid to vowels, which give rise to various contradictions in Kazakh writing. A new classification of vowels according to four binary features is proposed in place of the traditional classification according to three binary features. The acoustic analysis uses formant analysis, which is aimed at identifying formants in the spectrogram obtained with a spectrograph. Quantitatively, formants correspond to maxima in the speech spectrum and usually appear on spectrograms as horizontal bands.
After determining the composition and classification of the sound system of the Kazakh language, two variants of a Latin-based alphabet are proposed: the first is based on the Turkish alphabet and uses diacritical marks; the second is based on the English alphabet and uses digraphs. For the second variant, ways are offered to solve the problems that arise when using digraphs. In conclusion, information is provided on the completed and ongoing work in Kazakhstan on the creation of smart systems in the Kazakh language based on the methods and technologies of artificial intelligence and computational linguistics, the results of which are reflected in the list of sources.
The Kazakh language is the national language of Kazakhs living in Kazakhstan and in many other countries of the world [1]. Like any national language that serves as the main means of communication for its native speakers, it must be studied and developed for use in the daily life of the Kazakh people.
The Kazakh language is also the state language of the Republic of Kazakhstan [2]. Like any state language, it should serve as a means of communication for the peoples inhabiting the territory of Kazakhstan, be supported by the state and be used in all spheres of its activity, including external relations with other countries. As Kazakhstan integrates into the world community, it must ensure that the state language is developed to a level comparable with world standards.
Kazakh writing originates from the ancient Turkic runic scripts of the 6th to 10th centuries. The key to one variant of the ancient Turkic alphabet, whose symbols were engraved in the Orkhon-Yenisei inscription on the Kul-Tegin tombstone, was found by the Danish scholar Vilhelm Thomsen in 1893 (he determined the values of the letter signs) [3]. Runic inscriptions close to the Orkhon-Yenisei ones were found 50 kilometers from Almaty, in the Issyk mound, where the “Golden Man” was buried. These inscriptions are called the “Issyk inscriptions” and consist of 26 characters; in appearance they resemble the Orkhon-Yenisei script. Some researchers therefore accept that the later classical Orkhon-Yenisei script may be a descendant of the Issyk script [4].
With the spread of Islam, the Arabic alphabet, in which the holy book of Muslims, the Koran, is written, was used as the script of the Turkic world from the 9th to the 20th centuries. For more than a thousand years, many scientists, teachers and poets created in this alphabet their immortal works, which are included in the world treasury of science, education and culture.
However, the Arabic alphabet, created for Semitic languages and well adapted to the requirements of the Arabic language, did not fully reflect the rich phonetic system of the Turkic languages: a number of its characters were not needed by the Turkic languages, while many sounds present in the Turkic languages had no representation in it. As a result, it became necessary to modify the Arabic alphabet used by the Turkic languages. A new method of Turkic writing based on Arabic script, called “Usul al-Jadid”, was first devised by the prominent Crimean Tatar educator Ismail Gaspraly. The essence of his method is a phonetic reworking of the Arabic alphabet that puts sounds and letters in correspondence, in contrast to the old “Usul al-Qadim” method, which taught the language syllabically: individual letters were merged into syllables, and words were then formed from syllables. The new method made it possible to minimize the shortcomings of the basic Arabic pronunciation and to shorten literacy training by a factor of two to three [5].
In the 20th century, the Kazakh language changed its alphabet three times [6]. The first change came in 1912, when the founder of the modern theory of the Kazakh language, Akhmet Baitursynov, used the “Usul al-Jadid” method to transfer Kazakh writing to a new alphabet based on Arabic script. To do this, he systematized and specified the sound system of the Kazakh language as 28 sounds: 5 vowels, 19 consonant phonemes and 4 soft vowels, which were considered allophones (in fact, they were phonemes). He then excluded all Arabic letters denoting non-Kazakh sounds and developed a new alphabet of the Kazakh language, which contained 24 letters based on Arabic script and 1 special character (an apostrophe). Soft vowels were denoted by digraphs in which the first element was a letter denoting a hard consonant and the second element was an apostrophe. The second change came in 1929, when the Arabic-based alphabet was replaced by a Latin-based alphabet of 31 letters. The third came in 1940, when a 42-letter alphabet based on Cyrillic was adopted. In it, 31 letters represent Kazakh sounds, 9 letters represent Russian sounds that are not typical of the Kazakh language, and 2 signs are used to indicate the softness and hardness of consonant sounds.
Currently, Kazakhs living in different countries use different alphabets: those in China, Afghanistan, Pakistan and Iran use an alphabet based on Arabic letters; those in Europe, America and Turkey use one based on Latin letters; and those in Central Asian countries use one based on Cyrillic letters. This hinders written communication and the formation of a single information and cultural space for the Kazakhs.
In the context of mass digitization, many countries are discussing the use of a single alphabet in the global information space, for which the English alphabet is proposed, in order to reduce the costs of searching, processing and combining information resources. In addition, all computers and other electronic devices produced in the world support the basic Latin alphabet. If the alphabet of a language differs from the basic Latin alphabet by even one letter, then creating the additional fonts, drivers, and sorting and information retrieval programs needed to work with that alphabet on these devices requires significant intellectual, labor, time and financial costs.
Currently, intelligent technologies are being introduced into all areas of human intellectual activity with the possibility of written and oral communication in the languages of sovereign countries. Such technologies require a large amount of complex scientific and practical work to create ontologies of subject areas, formalize the grammatical (morphological and syntactic) rules of natural languages, and develop language processors for the analysis and generation of written units (words and sentences).
It is known that the alphabet of any natural language is compiled in accordance with its sound system, specifying all phonemes and allophones as represented in the International Phonetic Alphabet (IPA) [7]. The current Cyrillic-based alphabet of the Kazakh language and all subsequent officially proposed Latin-based variants were compiled without taking its sound system into account and are therefore erroneous. These facts show the need for a reform of Kazakh writing based on in-depth linguistic research using the models and methods of computational linguistics and information technology.
Errors in the alphabet of Kazakh language based on Cyrillic: The current Cyrillic-based alphabet of the Kazakh language has 42 letters, of which 15 (Аа, Әә, Ее, Ёё, Ии, Оо, Өө, Уу, Ұұ, Үү, Ыы, Іі, Ээ, Юю, Яя) denote vowel phonemes, 25 (Бб, Вв, Гг, Ғғ, Дд, Жж, Зз, Йй, Кк, Ққ, Лл, Мм, Нн, Ңң, Пп, Рр, Сс, Тт, Фф, Хх, Һһ, Цц, Чч, Шш, Щщ) denote consonant phonemes, and 2 letters, Ьь and Ъъ, are used to denote the softness and hardness of consonant phonemes, respectively [8].
It should be noted that the letters Ии, Уу, Ёё, Ээ, Юю, Яя, Щщ, Ьь, Ъъ have nothing to do with the Kazakh language, i.e. in native Kazakh words there are no sounds that can be indicated by these letters. But in the modern Kazakh language, these letters are mainly involved in terms borrowed from the Russian language.
When changing the Kazakh alphabet, the letters Ёё, Ээ, Юю, Яя, Щщ, Ьь, Ъъ can easily be eliminated by changing only the spelling rules, since they do not occur in word affixes (suffixes, endings). The letters Ии and Уу, however, pose certain problems, since in the modern Kazakh language they are part of affixes. In the current Cyrillic alphabet, the letters Ии and Уу denote the vowel phonemes of the Russian language (и) - [i] and (у) - [u], where the symbols in square brackets are those of the International Phonetic Alphabet [8].
Note that including the vowel phonemes (и) - [i] and (у) - [u] in the sound system of the Kazakh language violates the following norms:
1) Law of synharmonism [9]:
In Kazakh root words, vowels must alternate with consonants, and only soft or only hard vowels may be used within a word; the use of these letters violates this condition. For example, in the words “қиа (cliff)”, “иә (yes)”, “ауа (air)”, “әуе (sky)”, “уақыт (time)”, “уәде (promise)”, “кие (shrine)”, “қиуа (away)”, “қия (obliquely)”, “саяхат (journey)”, “сүю (love)”, “сұю (liquefy)”, vowel letters occur in a row without regard to their softness and hardness;
2) Possessive endings, third person [10]:
- If the stem of a word (root or root + suffix) ends with a hard (or soft) vowel, then the third-person possessive ending “сы” (or “сі”) is added to it. For example, “ана + сы (mother)”, “әже + сі (grandmother)”;
- If a hard (or soft) stem ends in a consonant, then the third-person possessive ending “ы” (or “і”) is added to it. For example, “отан + ы (homeland)”, “ел + і (country)”.
But when the letters Ии and Уу, which denote vowels of the Russian language, appear at the end of a stem, these rules are violated. For example, “би + і (dance)”, “ми + ы (brain)”, “ту + ы (flag)” and “гу + і (hum)” are what is actually written, whereas the rule for vowel-final stems would give “би + сі”, “ми + сы”, “ту + сы” and “гу + сі”, which have no meaning in the Kazakh language.
These examples prove the fallacy of the current alphabet of Kazakh language based on Cyrillic alphabet. Therefore, a reform of writing of Kazakh language is required.
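The two possessive rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `third_person_possessive` and the `hard` parameter are hypothetical, hardness is inferred from the last native vowel of the stem, and the vowel sets are the hard and soft vowels named in this article.

```python
# Minimal sketch of the third-person possessive rules described in the text.
# Assumption: hardness is read off the last native Kazakh vowel of the stem.
HARD_VOWELS = set("аоұы")      # hard (back) vowels
SOFT_VOWELS = set("әөүіе")     # soft (front) vowels, including е
RUSSIAN_VOWELS = set("иу")     # letters the article treats as non-native
ALL_VOWEL_LETTERS = HARD_VOWELS | SOFT_VOWELS | RUSSIAN_VOWELS


def third_person_possessive(stem, hard=None):
    """Attach сы/сі after a vowel-final stem and ы/і after a consonant-final stem."""
    if hard is None:
        native = [ch for ch in stem if ch in HARD_VOWELS or ch in SOFT_VOWELS]
        if not native:
            # Stems such as "ми" contain no native vowel letter at all.
            raise ValueError("hardness cannot be inferred; pass hard=True or hard=False")
        hard = native[-1] in HARD_VOWELS
    if stem[-1] in ALL_VOWEL_LETTERS:
        return stem + ("сы" if hard else "сі")
    return stem + ("ы" if hard else "і")


# The regular examples from the text.
regular = [third_person_possessive(s) for s in ("ана", "әже", "отан", "ел")]
print(regular)

# Treating и as a vowel letter yields the rule-based form, not the written one.
by_rule = third_person_possessive("ми", hard=True)
print(by_rule)
```

For the stem “ми” the sketch produces the rule-based form with “сы”, whereas the form actually written is “ми + ы”, which is the mismatch the text describes. Note also that the hardness of such stems cannot be read from their letters and has to be supplied by the caller.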
Analysis of projects of the alphabet of Kazakh language based on Latin graphics: Variants of alphabets of the Kazakh language based on the Latin alphabet are presented in the following tables [8]:
Table 1 shows the alphabet with apostrophes, approved by Decree of the President of the Republic of Kazakhstan dated October 26, 2017 No. 569.
The alphabet in Table 1 has the following shortcomings and errors:
- In line 2 (A’ a’), 8 (G’ g’), 11 (I’ i’), 17 (N’ n’), 19 (O’ o’), 24 (S’ s’), 25 (C’ c’), 28 (U’ u’) and 31 (Y’ y’), digraphs are used to indicate one sound - a sequence of two characters: one of them is a Latin letter, and the second is an apostrophe, which has a separate encoding in the computer presentation.
- In line 11, one Latin letter represents two sounds, one of them is a vowel (и) - [i], and the other is a consonant (й) - [j], which will allow you to write words without syllables (vowelless).
- In line 31, the signs Y’y’ indicate the vowel sound of the Russian language (у) - [u], which is not included in the sound system of the Kazakh language.
Table 2 presents a new version of the Kazakh language alphabet, amended by Presidential Decree No. 637 of February 19, 2018, in which the apostrophe is replaced with diacritics and digraphs: Á á - (ә), Ó ó - (ө), Ú ú - (ү), Ǵ ǵ - (ғ), Ń ń - (ң), Sh sh - (ш), Ch ch - (ч).
The alphabet in Table 2 has the following shortcomings and errors:
- In line 11, one Latin letter represents two sounds, one of them is a vowel (и) - [i], and the other is a consonant (й) - [j], which will allow you to write words without syllables.
- In line 28, the letter Y y denotes the vowel sound (ы) - [ɯ], while in line 29 the same letter with a diacritic, Ý ý, denotes the vowel sound of the Russian language (у) - [u], which does not correspond to (ы) and is not included in the sound system of the Kazakh language.
- In line 31, the digraph Sh sh denotes the sound (ш) - [ʃ], which generates spelling errors in some words: for example, асхана (kitchen) is written ashana, where the sequence s + h is indistinguishable from the digraph sh.
- In line 32, the digraph Ch ch should not denote the phoneme (ч) - [tʃ], since the Latin letter Cc is not included in the alphabet presented in Table 1.
Table 3 presents the version of the Kazakh Latin alphabet developed in January 2021 by the A. Baitursynov Institute of Linguistics, into which the letters Ä ä - (ә), Ö ö - (ө), Ü ü - (ү), Ğ ğ - (ғ), Ū ū - (ұ), Ŋ ŋ - (ң) and Ş ş - (ш) were introduced.
The alphabet in Table 3 has the following shortcomings and errors:
- Some corresponding hard and soft vowel phonemes of the Kazakh language are designated by different Latin letters. For example, in line 11 the soft vowel phoneme (і) - [i] is designated by the Latin letter Ii, and in line 13 its hard counterpart (ы) - [ɯ] is designated by a different Latin letter, Yy, in contrast to the designations of the other vowel pairs presented in lines 1 and 2, 21 and 22, 28 and 29.
- In line 10, two consonant sounds, [x] and [h], are designated by the same Latin letter H h;
- In line 11, the letter İ i marks the soft vowel phoneme (і) - [ɪ], and in line 13, the letter Yy marks the hard vowel phoneme (ы) - [ɯ];
- In line 12, two different sounds, the vowel (и) - [i] and the consonant (й) - [j], are indicated by the same Latin letter I ı;
- In line 20 the Kazakh consonant sound (ң) - [ŋ] is indicated by the extended Latin symbol Ŋ ŋ.
- In line 31, the Latin letter W w denotes the vowel sound of the Russian language (у) - [u], which is not part of the sound system of the Kazakh language.
Table 4 presents a version developed by the A. Baitursynov Institute of Linguistics in April 2021, in which the letter Ŋ ŋ is replaced by the letter Ñ ñ, and the letters C, X, W will be used in foreign words according to the principle of quotation marks.
The alphabet in Table 4 has the following shortcomings and errors:
In general, the errors in the alphabet of Table 4 are the same as in the alphabets of Tables 1, 2 and 3, that is, the errors associated with vowels of the Russian language are repeated, but there are also new shortcomings. For example, in line 28, the letter Y y denotes the hard vowel (ы) - [ɯ], the counterpart of the soft vowel (і) - [i], which is indicated by the unrelated letter I ı in line 11.
Further, taking into account the above, first in paragraph 3 the sound system of the Kazakh language is clarified, and then in paragraph 4 new variants of the Kazakh alphabet in the Latin script are given based on the sound system.
Vowel sounds: It is known that in any natural language, oral speech is primary, and written speech is secondary. Therefore, to build an alphabet, one must first clarify the sound system of the Kazakh language, and only then select letters (signs) to designate phonemes. In some cases, allophones can also be used.
However, there is currently no generally accepted view of the phonetics of the Kazakh language, and there is still no phonetic standard that specifies the Kazakh sounds and their classification. No other state language in the world is in such a situation. This is because the existing teaching on the phonetics of the Kazakh language is inherited from the Russian language and does not reflect the exact characteristics of the Kazakh sounds. It is therefore unclear what to do with the vowels of the Russian language, which were introduced into the Kazakh language without taking its phonetic patterns into account. One way to solve this problem is to exclude these sounds from the sound system of the Kazakh language, as was done, for example, in the Azerbaijani language, and to develop appropriate spelling rules.
In the Kazakh language, vowel phonemes, unlike consonant phonemes, play an important role in writing, since the feasibility of the law of synharmonism and spelling rules for the addition of affixes (suffixes and endings) depends on their classification features.
In the Cyrillic alphabet of the Kazakh language, 11 vowel phonemes are indicated: (a) - [ɑ], (ә) - [æ], (e) - [e], (и) - [i], (o) - [ɔ], (ө) - [ɵ], (ұ) - [ʊ], (ү) - [ʏ], (у) - [u], (ы) - [ɯ], (і) - [ɪ, i]. Of these, the sounds (и) - [i] and (у) - [u] were borrowed from the Russian language in 1940, when Kazakh writing was transferred from the Latin to the Cyrillic script. To check whether these sounds genuinely belong to the Kazakh language, we used the methods of “Experimental Phonetics” [10] and conducted a perceptual analysis of Kazakh speech in order to determine its sound units by ear. For this purpose, a phonetically rich test (sound, syllable, word, phrase) was compiled and recorded by native Kazakh speakers of different regions, genders and ages.
As a result, it was found that the phonemes (и) - [i] and (у) - [u] are absent from native Kazakh words. Instead of the phoneme (и) - [i], the diphthongs (ій) - [ij] and (ый) - [ɯj] are pronounced, depending on the softness or hardness of the syllable. Instead of the hard vowel phoneme (у) - [u], the Kazakh language has a semivowel phoneme, which is absent in the Russian language but exists in the English language. This semivowel can be denoted, as in the IPA, by the symbol ʊ, i.e. [ʊ]. Figure 1 shows the result of detecting the semivowel phoneme [ʊ]. In the Kazakh language we will denote this semivowel phoneme with the Latin letter Ww.
In Figure 1A, the semivowel sound [ʊ] can be seen between the vowels (a) - [ɑ] and (ы) - [ɯ]. The sound [ʊ] meets very little obstruction during articulation and is therefore similar to vowels in its spectrogram and sound wave, but it differs from them in the absence of energy in the frequency region of 11-15 kHz, in the change of formants in the regions of 2 kHz and 5 kHz, and in its amplitude, which is significantly lower than the amplitudes of the vowels (a) - [ɑ] and (ы) - [ɯ], as can be seen in Figure 1B. Based on the results of this experiment, it is assumed that the vowels of the Russian language (и) - [i] and (у) - [u] will not be included in the Kazakh sound system as phonemes. They will be used as allophones of the Kazakh vowel phonemes (і) - [ɪ, i] and (ұ) - [ʊ, u] and pronounced when reading terms borrowed from other languages. Thus, the sound system of the Kazakh language will include 9 vowel phonemes, (a) - [ɑ], (ә) - [æ], (e) - [e], (o) - [ɔ], (ө) - [ɵ], (ұ) - [ʊ], (ү) - [ʏ], (ы) - [ɯ], (і) - [ɪ], each of which is denoted by a separate letter in the Latin-based alphabet.
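Since the article describes formants as maxima of the speech spectrum, the basic operation of locating spectral maxima can be illustrated with a toy Python sketch. It is not the authors' software and not a formant tracker for real speech: the signal is a synthetic sum of two sine waves, and the sample rate, frame length, test frequencies and function names are all assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of a plain discrete Fourier transform (first half of the bins)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def spectral_peaks(samples, sample_rate, rel_threshold=0.1):
    """Frequencies (Hz) of local spectral maxima above a relative threshold."""
    mags = magnitude_spectrum(samples)
    floor = rel_threshold * max(mags)
    n = len(samples)
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            peaks.append(k * sample_rate / n)
    return peaks


# Hypothetical test signal: two tones standing in for two spectral maxima.
SAMPLE_RATE = 8000   # Hz, assumed
FRAME = 256          # samples, assumed; both tones fall exactly on DFT bins
signal = [
    math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE)
    for t in range(FRAME)
]
peaks = spectral_peaks(signal, SAMPLE_RATE)
print(peaks)
```

On recorded speech the spectrum is far less clean, so practical formant estimation relies on windowing, smoothing or dedicated phonetic software rather than on raw peak picking.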
In many sources on the phonetics of the Kazakh language, 3 binary features are used for the articulatory classification of Kazakh vowels. Three binary features are not enough for a correct and complete description of nine objects, since each feature has 2 values and the maximum number of distinguishable objects is 2^3 = 8, which is less than 9. Therefore, in these sources, the classification of Kazakh vowel sounds is incorrect and does not correspond to the classification of sounds in the IPA, which has 4 binary features: tongue position, jaw position and lips position with the values shown in Table 3, as well as the vertical position of the tongue with two values (upper, lower). The IPA refers to “upper” as “closed” and to “lower” as “open”, closeness meaning that the tongue is raised towards the roof of the mouth. Table 5 shows four binary articulatory features of Kazakh vowel phonemes.
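The counting argument can be checked directly: k binary features can distinguish at most 2^k objects, so three features cover at most 8 vowels while four cover up to 16. The short Python sketch below only enumerates the combinations; the function name `max_distinguishable` is an illustrative assumption.

```python
from itertools import product


def max_distinguishable(num_binary_features):
    """Number of distinct value combinations of the given binary features."""
    return len(set(product((0, 1), repeat=num_binary_features)))


VOWEL_COUNT = 9  # number of Kazakh vowel phonemes stated in the article

for k in (3, 4):
    capacity = max_distinguishable(k)
    verdict = "enough" if capacity >= VOWEL_COUNT else "not enough"
    print(f"{k} binary features: {capacity} combinations, {verdict} for {VOWEL_COUNT} vowels")
```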
It can be seen that the articulatory features of the vowels (ә) - [æ] and (e) - [e] in Table 5 completely coincide. This means that the traditional classification cannot distinguish between these two vowels. Therefore, the articulatory and acoustic characteristics of the vowel system of the Kazakh language must be analyzed carefully. For this purpose we recommend the following principle: “The tongue takes two values according to its vertical position, and this should be added as one more feature.”
Let us present the articulatory classification of the vowel sounds of the Kazakh language according to 4 binary features in accordance with the IPA. The classification adds a fourth feature, the vertical position of the tongue, which takes two values, upper and lower. In the IPA, “high”, which refers to the proximity of the tongue to the palate, is called “closed”, and “low” is called “open”. Table 6 presents the four binary articulatory features of Kazakh vowels.
Now, using Boolean algebra [11-13], we will construct a mathematical model of the vowel system in the form of an algebraic expression based on the use of the values of the 4 articulatory binary features shown in Table 6.
Let us denote the 4 articulatory features of Kazakh vowels, namely the vertical position of the tongue, the horizontal position of the tongue, the position of the lips and the position of the jaw, by the logical variables x1, x2, x3 and x4, respectively. These variables take only the values 1 (true) and 0 (false):
- If tongue is in upper position, then x1 = 1, otherwise x1 = 0;
- If tongue is in back position, then x2 = 1, otherwise x2 = 0;
- If lips are in a rounded position, then x3 = 1, otherwise x3 = 0;
- If the jaw is in an open position, then x4 = 1, otherwise x4 = 0.
Based on set values for variables, it is possible to build a Boolean model of the system of Kazakh vowels presented in Table 7.
Further, writing out the features of each of the 9 vowels as a conjunction and combining these conjunctions, we obtain the following disjunctive normal form of 9 terms:
(2.1)
Applying axioms of Boolean algebra, we obtain a simplified expression:
(2.2)
Expression (2.2) is called the membership function; it characterizes the system of Kazakh vowels in the theorem below.
The membership theorem. A vowel λ belongs to the Kazakh vowel system if and only if its articulatory features x1, x2, x3 and x4, defined above, satisfy the disjunctive normal form [13]:
(2.3)
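The construction of a membership function from a feature table can be sketched in Python. Table 7 and expressions (2.1)-(2.3) are not reproduced in this text, so the feature values below are placeholders, not the authors' data: x1 to x3 for eight vowels follow the closed/open, back/front and rounded/unrounded groupings described for the geometric model, while every x4 value and the whole row for (е) are assumptions chosen only so that the nine vowels are distinct and (ә) and (е) coincide under the three traditional features. All names are hypothetical.

```python
# Placeholder feature table (x1 tongue upper, x2 tongue back, x3 lips rounded,
# x4 jaw open). NOT the article's Table 7: x4 and the row for "е" are assumed.
FEATURES = {
    "а": (0, 1, 0, 1),
    "ә": (0, 0, 0, 1),
    "е": (1, 0, 0, 1),  # assumed: differs from "ә" only in x1
    "о": (0, 1, 1, 1),
    "ө": (0, 0, 1, 1),
    "ұ": (1, 1, 1, 0),
    "ү": (1, 0, 1, 0),
    "ы": (1, 1, 0, 0),
    "і": (1, 0, 0, 0),
}


def conjunction(bits):
    """One DNF term: xi for a 1 bit, ~xi for a 0 bit."""
    return " & ".join(f"x{i}" if b else f"~x{i}" for i, b in enumerate(bits, start=1))


def dnf(table):
    """Disjunction of one conjunction per vowel (unsimplified form)."""
    return " | ".join(f"({conjunction(bits)})" for bits in table.values())


def is_member(x1, x2, x3, x4):
    """Membership function: true exactly for feature vectors of the listed vowels."""
    return (x1, x2, x3, x4) in set(FEATURES.values())


def collisions(table, keep):
    """Groups of vowels that coincide when only the feature indices in keep are used."""
    groups = {}
    for vowel, bits in table.items():
        groups.setdefault(tuple(bits[i] for i in keep), []).append(vowel)
    return [group for group in groups.values() if len(group) > 1]


print(dnf(FEATURES))
print(is_member(1, 0, 0, 1), is_member(0, 0, 0, 0))
traditional_collisions = collisions(FEATURES, keep=(1, 2, 3))  # drop x1
print(traditional_collisions)
```

With these assumed values, dropping x1 leaves exactly one colliding pair, which mirrors the coincidence of (ә) and (е) reported for the traditional three-feature classification; a different Table 7 would give a different normal form.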
On the basis of four Articulatory features, a geometric model of vowel sounds of Kazakh language, shown in Figure 2, was built in [14].
In Figure 2, plane [ɑ] [ɔ] [ʊ] [ɯ] represents back-lingual (hard) vowels (a) (o) (ұ) (ы), plane [æ] [ɵ] [ʏ] [ɪ] - front-lingual (soft) vowels (ә) (ө) (ү) (і), plane [ɑ] [ɔ] [ɵ] [æ] - open vowels (a) (o) (ө) (ә), plane [ɯ] [ʊ] [ʏ] [ɪ] - closed vowels (ы) (ұ) (ү) (і), plane [ɔ] [ʊ] [ʏ] [ɵ] - rounded vowels (o) (ұ) (ү) (ө), plane [ɑ] [ɯ] [ɪ] [æ] are unrounded vowels (а) (ы) (і) (ә), and peak [e] represents a special sound (е).
Based on this geometric model, following facts (axioms) can be established:
1. Vowels (vertices) from the lower and upper planes, including [e], cannot appear together in a native Kazakh word. This fact is called palatal vowel harmony, or synharmonism.
2. In the Kazakh language there is no native word containing more than three different vowels, and if a word contains three different vowels, then these are soft vowels, one of which is necessarily [e]. This can be interpreted in Figure 1 as follows: no 3 vowels of the lower or upper planes occur in a Kazakh word. A word containing 3 different vowels includes vertices of the plane passing through [e] and any 2 vertices of the upper plane.
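The first axiom can be checked mechanically for a given spelling. The Python sketch below is a minimal illustration with hypothetical function names: it classifies vowel letters as hard or soft using the groupings given for the geometric model (with е counted as soft), ignores all other letters, and reports whether a word mixes the two classes.

```python
HARD_VOWELS = set("аоұы")    # back-lingual (hard) vowels
SOFT_VOWELS = set("әөүіе")   # front-lingual (soft) vowels, with е counted as soft


def vowel_classes(word):
    """Set of harmony classes ("hard", "soft") found among the word's vowel letters."""
    classes = set()
    for ch in word.lower():
        if ch in HARD_VOWELS:
            classes.add("hard")
        elif ch in SOFT_VOWELS:
            classes.add("soft")
    return classes


def is_synharmonic(word):
    """True if the word does not mix hard and soft vowel letters."""
    return len(vowel_classes(word)) <= 1


def distinct_vowels(word):
    """Number of different native vowel letters in the word."""
    return len({ch for ch in word.lower() if ch in HARD_VOWELS | SOFT_VOWELS})


words = ["ана", "әже", "отан", "ел", "кітапхана"]
results = {w: is_synharmonic(w) for w in words}
for w, ok in results.items():
    print(w, "harmonic" if ok else "mixed", distinct_vowels(w))
```

The compound “кітапхана”, built on a borrowed root, mixes soft і with hard а and is therefore reported as mixed, while the native examples used earlier in the article pass the check.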
Now we can classify Kazakh vowels based on 4 binary Articulatory features (according to horizontal, vertical position of tongue, jaw position, lips position), shown in Figure 3.
Here, signs are shown on top of horizontal position of tongue, on left - signs on position of jaw, on right - signs on vertical position of tongue, and signs on position of lips are taken into account when two sounds are located in one line - unrounded on left, rounded on right.
Acoustic analysis of the vowels of the Kazakh language is based on synharmonic timbres: hard low, soft low, hard high and soft high.
Table 8 shows the system of synharmonic vowel timbres of the Kazakh language.
In a computer experiment [13], we mixed the oscillograms of the hard (back-lingual) vowels (a) - [ɑ], (o) - [ɔ], (ұ) - [ʊ] and (ы) - [ɯ] with the oscillogram of the sound (e) - [e] and discovered the properties [15] shown in Figure 4:
These properties prove that in the root words of the Kazakh language there are no pairs of vowel phonemes: (а) - [ɑ] and [e] - (е); (о) - [ɔ] and [e] - (е); (ұ) - [ʊ] and [e] - (е).
Based on the above research, the sound system of the Kazakh language will include:
- 9 vowel phonemes and among them 2 phonemes (colored in red) will have allophones: (a) - [ɑ], (ә) - [æ], (e) - [e], (і) - [ɪ, i], (ы) - [ɯ], (o) - [ɔ], (ө) - [ɵ], (ұ) - [ʊ, u], (ү) - [ʏ];
- 24 consonant phonemes and among them 1 semivowel phoneme (colored in red): (б) - [b], (в) - [v], (г) - [g], (ғ) - [ɣ], (д) - [d], (ж) - [ʒ], (з) - [z], (й) - [j], (к) - [k], (қ) - [q], (л) - [l], (м) - [m], (н) - [n], (ң) - [ŋ], (п) - [p], (р) - [r], (с) - [s], (т) - [t], (ц) - [ts], (ф) - [f], (х) - [h], (ч) - [tʃ], (ш) - [ʃ], (w) - [w].
These phonemes allow you to write international untranslatable terms for personal names, country names, geographic names, etc.
Two variants of the Kazakh language alphabet based on the Latin alphabet
The proposed variants of the Latin alphabet of the Kazakh language based on the sound system updated in paragraph 9 will include 9 vowel phonemes and 2 allophones, as well as 24 consonant phonemes and 1 semivowel phoneme.
1. The phonemes of the Kazakh language in round brackets are indicated by lowercase Cyrillic letters, and in square brackets - by the symbols of the international phonetic alphabet.
2. Accepting the vowel sounds (и) - [i] and (у) - [u] of the Russian language as allophones of the Kazakh vowel phonemes (і) - [ɪ, i] and (ұ) - [ʊ, u] makes it possible to write and read international terms (for example, internet, institute, university, supremum, Ural, India, etc.) as in the original and does not cause any contradictions when endings are added to parts of speech containing these sounds.
3. In the current alphabet of the Kazakh language based on the Cyrillic alphabet, there is no letter denoting the semivowel phoneme (w) - [w]. The presence of the phoneme (w) - [w] in the Kazakh language is proved by acoustic analysis using a computer program [16] and is given above in subparagraph 3.1.
In the new alphabet, phonemes for which there are no adequate Latin letters will create problems. These include soft vowel phonemes (ә) - [æ], (ө) - [ɵ] and (ү) - [ʏ], as well as consonant phonemes (ғ) - [ɣ], (ң) - [ŋ], (ч) - [tʃ] and (ш) - [ʃ].
The first option is based on the Turkish alphabet:
˗ Soft vowel phonemes (ә) - [æ], (ө) - [ɵ] and (ү) - [ʏ] are denoted by the Latin letters Aa, Oo and Uu with a superscript “two dots” diacritic Ä ä, Ö ö and Ü ü, respectively;
˗ The soft vowel (і) - [i] is denoted by the letter with a superscript “one dot” diacritic, İ i, and its hard counterpart (ы) is denoted by the letter without the diacritic, I ı;
- Consonant phonemes (ғ) - [ɣ] and (ң) - [ŋ] are denoted by the Latin letters Gg and Nn with accented diacritic signs “brevis” Ǧǧ and Ňň, respectively;
˗ Consonant phonemes (ч) - [tʃ] and (ш) - [ʃ] are denoted by Latin letters with subscript diacritic signs “cedilla” Çç and Ş ş, respectively.
The first version of the proposed alphabet does not give rise to any contradictions in the Kazakh orthoepy and spelling, since it is based on the refined sound system of the Kazakh language and on the principle of “One sound and one letter”.
The second option is based on the English alphabet:
˗ Soft vowel phonemes (ә) - [æ], (ө) - [ɵ] and (ү) - [ʏ] are denoted by digraphs consisting of two Latin letters, Ae ae, Oe oe and Ue ue, respectively. The rationale for using these digraphs to denote soft phonemes is provided by the results of the computer experiment presented in Figure 2 “Properties of vowel sounds of the Kazakh language”, in paragraph 3.1 of this article.
˗ The consonant phonemes (ғ) - [ɣ], (ң) - [ŋ], (ч) - [tʃ] and (ш) - [ʃ] are denoted by the digraphs Gh gh, Ng ng, Ch ch and Sh sh, respectively.
Note 2: Digraphs are not part of the alphabet, they are essentially the spelling rules of a particular language in which they are used.
It is proposed to include in both versions of the alphabet the letter of the basic Latin alphabet Xx, which will denote not a specific phoneme, but combinations of consonant phonemes (к) - [k] and (с) - [s]. The use of this letter will not generate any contradictions in the Kazakh script and will allow writing international scientific, technical and other terms in the same way as the original in English or closer to it. For example: axis, axiom, axelerat, box, exel, experience, expert, export, context, maximum, mixer, Oxford, taxi, xerox.
When using the second version of the alphabet, some problems of orthoepy and spelling of the Kazakh language arise, associated with the use of digraphs. But you can get rid of them if you take into account the following properties and requirements:
1) In native Kazakh words, there are no consonants before the consonant sound (ң), therefore, in any word, the digraph “ng” should be written in only one syllable.
2) When directly encoding words that are different in meaning and written in Cyrillic in Latin letters, they are spelled the same way:
Therefore, in order to avoid such cases, it is necessary:
- Take into account the properties of assimilation (similarity) of sounds and introduce a special spelling rule: “If in a word the sound (н) - [n] is immediately followed by the sound (г) - [g], then when reading such a word, the sound (н) - [n] is pronounced like the sound (ң) - [ŋ], i.e. the sound (н) - [n] is assimilated to the sound (ң) - [ŋ].” For example, the words “angime - conversation” and “tungi - night” when reading are pronounced as “angime” and “tungi”.
- Instead of one Latin letter n, you need to write the digraph ng, for example, “kүngі = kuengi = kuenggi”, which will correspond to the pronunciation patterns in the Kazakh language and not give rise to ambiguity in its writing.
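One possible reading of this assimilation rule is sketched below in Python. The letter map is deliberately partial and covers only the letters of the example word; the helper names `assimilate` and `transliterate` are hypothetical, and the sketch assumes the rule is applied to the Cyrillic form before conversion to Latin letters.

```python
import re

# Partial letter map, limited to the letters of the example word.
# This is an assumption for illustration, not the full proposed alphabet.
LATIN = {"к": "k", "ү": "ue", "н": "n", "ң": "ng", "г": "g", "і": "i"}

def assimilate(word: str) -> str:
    """Rewrite н as ң wherever it is immediately followed by г."""
    return re.sub("н(?=г)", "ң", word)

def transliterate(word: str) -> str:
    """Map each Cyrillic letter to Latin; unknown letters are kept as-is."""
    return "".join(LATIN.get(ch, ch) for ch in word)

direct = transliterate("күнгі")                 # without the rule
with_rule = transliterate(assimilate("күнгі"))  # with the rule applied

print(direct)     # kuengi
print(with_rule)  # kuenggi
```

With the rule applied, the written form keeps the digraph ng and the following g as separate units, so the spelling follows the pronunciation.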
3) In the word “хана - room”, the phoneme (х) - [h] must be replaced by the phoneme (қ) - [q]. Compound words formed with it will then not create any spelling problems. For example, «кітапхана → кітапқана = kitapqana, қымызхана → қымызқана».
4) In compound words or personal names, the consonant phoneme (х) - [h] that immediately follows the consonant phoneme (с) - [s] or (з) - [z] must be replaced by the phoneme (қ) - [q]. For example, «асхана → асқана = asqana, Асхат → Асқат = Asqat, Досхан → Досқан = Dosqan, Оразхан → Оразқан = Orazqan». Otherwise, the letter pairs sh and zh would be read as digraphs: «асхана = ashana = ашана, Асхат = Ashat = Ашат, Досхан = Doshan = Дошан, Оразхан = Orazhan = Оражан».
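Requirements 3) and 4) can be sketched as two substitutions on the Cyrillic form of a word. The Python snippet below is a minimal illustration under stated assumptions: the function name `replace_kh` is hypothetical, “хана” is matched only at the end of a word, and only lower-case х is handled.

```python
import re

def replace_kh(word: str) -> str:
    """Apply the two х -> қ replacements described in the text.

    Hypothetical helper; it works on the Cyrillic spelling only.
    """
    # Requirement 3: the word "хана" (also as the last part of a compound).
    word = re.sub("хана$", "қана", word)
    # Requirement 4: х immediately after с or з.
    word = re.sub("(?<=[сз])х", "қ", word)
    return word

for w in ["кітапхана", "асхана", "Асхат", "Досхан", "Оразхан"]:
    print(w, "->", replace_kh(w))
```

Applying the replacement before conversion to Latin letters means the sequences s + h and z + h never appear in the Latin spelling, so they cannot be misread as digraphs.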
Thus, having eliminated the shortcomings of the variants of the alphabet of the Kazakh language given in Tables 9-11, based on the refined Kazakh sound system, two variants of the alphabet in the Latin script are proposed: the first version is based on the Turkish alphabet and uses diacritical marks, and the second version is based on the English alphabet and uses digraphs.
The proposed variants of the Kazakh alphabet have the following properties:
1. Creating and processing information resources in the Kazakh language does not require additional software tools that would raise production costs:
- Fonts and drivers, since the alphabet consists only of letters of the English alphabet, which are placed on the keyboards of all types of computers and supported by the fonts and drivers of all types of input and output devices;
- Programs for sorting and searching for information, since the sequence of letters in the alphabet corresponds to the sequence of letters of the English alphabet;
2. The alphabet makes it possible to incorporate internationally accepted terms that are not translated into the Kazakh language, written as in the original, in accordance with orthographic rules and without changing the sound system of the Kazakh language;
3. If complex words contain the consecutive pairs of phonemes (а) - [ɑ] and (е) - [e]; (o) - [ɔ] and (e) - [e]; or (u) - [ʊ] and (е) - [e], then the simple words forming them should be written separated by a space or a hyphen. For example, the complex word “Karaemel” is formed from two simple words “Kara” and “emel”.
4. The alphabet allows Kazakh text to be typed quickly and easily on the keyboard of any computer or smartphone.
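Property 3 can be checked mechanically when two simple words are joined. The Python sketch below is one possible illustration: the function name `join_compound` is hypothetical, and the choice of a hyphen rather than a space is an assumption made only for the example.

```python
def join_compound(first: str, second: str) -> str:
    """Join two simple words written in the digraph-based Latin alphabet.

    If the boundary would produce the letter pairs ae, oe or ue, which
    coincide with the digraphs for soft vowels, the parts are kept
    apart with a hyphen (a space would serve equally well).
    """
    if first and second and first[-1].lower() in "aou" and second[0].lower() == "e":
        return first + "-" + second
    return first + second

print(join_compound("Kara", "emel"))   # Kara-emel
print(join_compound("kitap", "qana"))  # kitapqana
```

Without the separator, the boundary letters a and e in “Karaemel” would look the same as the digraph ae.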
Table 9 shows the Kazakh alphabet using diacritics based on the Turkish alphabet. Table 10 presents the alphabet of the Kazakh language using digraphs based on the English alphabet. Table 11 lists the spelling rules for representing phonemes using digraphs.
In the era of digitalization of society, many natural languages are being developed with the help of computer programs: electronic dictionaries, multimedia question-answer systems, intelligent learning and knowledge assessment systems, machine translators from one language to another, recognition and synthesis systems for written and oral speech, etc. These works are based on mathematical models of the grammatical rules of these languages.
Similar problems can be posed and successfully solved for the Kazakh language. However, the errors in the existing alphabets described above make it impossible to formalize the morphological rules of the Kazakh language and to automate the morphological analysis and synthesis of Kazakh words. The proposed new alphabet makes it possible to solve these problems successfully.
In Kazakhstan, such work is carried out under the guidance of the author of this article. Some of the results of these works have been published in collections of proceedings of international conferences and journals indexed in the Scopus and Web of Science databases [17–27].
Altynbek S. The Kazakh Language Requires Reform of its Writing. IgMin Res. 09 Feb, 2024; 2(2): 073-083. IgMin ID: igmin148; DOI: 10.61927/igmin148; Available at: www.igminresearch.com/articles/pdf/igmin148.pdf
Doctor of Technical Sciences, Professor of Department of Artificial Intelligence Technology, L. N. Gumilyov Eurasian National University, Astana, Kazakhstan
Address Correspondence:
Sharipbay Altynbek, Doctor of Technical Sciences, Professor of Department of Artificial Intelligence Technology, L. N. Gumilyov Eurasian National University, Astana, Kazakhstan, Email: [email protected]
Copyright: © 2024 Altynbek S. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.