All You Need Is A Wordbank
It is funny how interesting ideas come to us from unexpected places. While I am in Japan, I really struggle to speak Kazakh. For some reason, while I’m away from home, I don’t seem to have a command of the language I have been trying to learn for almost half of my life. Yet Japanese seems to come to me far more easily… Then again, I noticed that I actually speak Kazakh much better when the person I am talking to is also Kazakh. Somehow my thoughts come out in another language, and my fingers are already typing away in it before I can say “What the hell is going on?”.
The idea came to me when I noticed that I was typing to her in Kazakh, or what seemed like Kazakh, but was actually my attempt at writing it using only the Russian alphabet.
Kazakh has more sounds than Russian, so the Russian alphabet is missing some letters, and we substitute letters for similar sounds, with intermittent success.
Every self-respecting Kazakh person types with an Official Standardized Kazakh Layout™.
I guess I don’t respect myself nearly enough.
The problem with the Kazakh keyboard layout is that it’s bad. Utterly bad. You must sacrifice your sanity to use it on a regular basis, and be on something to defend it from criticism. I have seen many examples where programming was supposed to make our lives better, yet we actually became slaves to it.
So the idea is embarrassingly simple: take words that are written with Russian letters, and swap some of those letters for Kazakh ones to end up with a Kazakh word. It’s like an IME for Japanese (e.g. Anthy), but WAY simpler.
The problem? I don’t know how to code it. While the core of the algorithm is really easy, I have problems with building the interface. I wanted to integrate it with the display server, but I am too stupid at the moment.
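Here is a minimal sketch of that substitution idea in Python. The letter mapping and the tiny word set are assumptions made for illustration, not a real data set or the extension's actual code: every Russian letter that could stand in for a Kazakh one is tried both ways, and only the spellings found in the wordbank survive.

```python
from itertools import product

# Assumed mapping: Russian letter -> Kazakh letters it often stands in for.
SUBSTITUTIONS = {
    "а": ["ә"],
    "г": ["ғ"],
    "к": ["қ"],
    "н": ["ң"],
    "о": ["ө"],
    "у": ["ұ", "ү"],
    "х": ["һ"],
    "и": ["і"],
    "ы": ["і"],
}

# Tiny stand-in for a real wordbank (hypothetical sample).
WORDBANK = {"қазақ", "қала", "сәлем", "күн"}


def candidates(typed):
    """Yield every spelling reachable by swapping in Kazakh letters."""
    options = [[ch] + SUBSTITUTIONS.get(ch, []) for ch in typed.lower()]
    for combo in product(*options):
        yield "".join(combo)


def restore(typed, wordbank=WORDBANK):
    """Return the candidate spellings that are known Kazakh words."""
    return sorted(c for c in candidates(typed) if c in wordbank)


print(restore("казак"))
print(restore("кун"))
```

The number of candidates grows quickly with word length, so a real implementation would probably walk a trie of the wordbank instead of enumerating everything. Either way, the quality of the result is only as good as the wordbank behind it.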
So I created a small browser extension instead. It is utterly bad. But as a proof-of-concept, it works quite nicely. It is exactly what I expected it to be. Now I just need to make it work on every website and iron out bugs. Probably will do it in the next year. But this is not what I want to talk about.
As I said, the algorithm is very simple. The problem is that it obviously depends on a Kazakh language dictionary. The bigger problem is that I can’t find such a dictionary anywhere.
If you know Japanese language dictionaries, there are tons of them, and yet they are all the same. That’s because they use the same database. And by database I mean a file with all the words and meta-information necessary.
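To make "a file with all the words and meta-information" concrete, here is a rough sketch of what such a file could look like and how it could be loaded. The tab-separated format, the column names (`word`, `pos`, `gloss`) and the three sample entries are all assumptions for illustration; no existing Kazakh database is being described here.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    word: str   # the headword in Kazakh
    pos: str    # part of speech
    gloss: str  # short English meaning


# Hypothetical sample of a wordbank file, kept inline so the sketch runs as is.
SAMPLE = (
    "word\tpos\tgloss\n"
    "қала\tnoun\tcity\n"
    "күн\tnoun\tday; sun\n"
    "қазақ\tnoun\tKazakh\n"
)


def load_wordbank(handle):
    """Read a tab-separated wordbank into a dict keyed by headword."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["word"]: Entry(row["word"], row["pos"], row["gloss"]) for row in reader}


wordbank = load_wordbank(io.StringIO(SAMPLE))
print(wordbank["қала"])
```

A single plain file like this is what lets many different apps share the same data, which is exactly why the format matters less than the fact that it exists.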
I can’t find anything similar for the Kazakh language.
Okay, so all we need is a wordbank, right? Yeah. Since I am a young adult who tries to solve every problem by himself, even when I should probably ask for help or advice, the next obvious thing would be trying to build it myself.
Now, you remember how I said that I don’t have a command of the Kazakh language? Yeah, it’s difficult. There are many nuances of Kazakh that I don’t know. So because of that, I am most likely not making the database. At least not now.
Now, the whole reason I wanted to write this post: having a word database would actually be really, really useful.
One of the things that I came up with is a new IME for the Kazakh language, but there’s more.
We could have a nice dictionary, like Takoboto, on Android, for free (as in freedom of speech AND free beer). This would put the Kazakh language practically at your fingertips. One can add a nice fuzzy search and an IME for the Kazakh layout, and you could actually let people make Anki flashcards, or even have preset Anki decks for differing levels of difficulty.
Another useful thing we could have: an extension that lets you translate the selected Kazakh word on a webpage. Just like Yomitan!
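As a rough idea of how little code a basic fuzzy search needs once a wordbank exists, here is a sketch using Python's standard `difflib` module. The word list is a hypothetical sample and the 0.6 similarity cutoff is an arbitrary assumption, not a tuned value.

```python
from difflib import get_close_matches

# Hypothetical sample; a real app would load these from the wordbank file.
WORDS = ["қала", "күн", "кітап", "қазақ"]


def fuzzy_search(query, words=WORDS, limit=3, cutoff=0.6):
    """Return up to `limit` words that look similar to `query`, best match first."""
    return get_close_matches(query, words, n=limit, cutoff=cutoff)


# A misspelled query still finds the intended word.
print(fuzzy_search("кітаб"))
```

This compares strings by shared character runs, so it knows nothing about Kazakh phonetics. A dictionary app would likely want something smarter, but it shows that the search is the easy part and the data is the hard part.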
Third, we could have more alternatives for websites such as Sozdik.kz.
Fourth, just having a database by itself is nice. One can do data mining or research with it.
Those are just some ideas that I came up with. But a very interesting observation is that all of them help with the preservation of the Kazakh language. The problem is not that fewer people speak Kazakh; I don’t think that’s the case. The problem is that fewer people know how to speak proper Kazakh. With all those small ideas we can:
- help preserve the state of the Kazakh language
- make it more accessible for the general Kazakh population
The funniest thing, however, is that all of them depend on one thing: having a solid wordbank (or database). So if you have ideas on how to build one, I am all ears.