Some thoughts about language detection modules in Android

This is the story of my involvement in a task to identify which languages are present in a given text in an Android application.

A while ago I started learning about language detection. I had to investigate how language detection works and how best to use an existing language detection module in an open source Android app I was involved in.

The objective was to identify right-to-left languages in the text of the application's chat module, so that each message could be aligned to the right or left side of the screen. When I took on this task, the app decided whether a text was in a right-to-left language by running a regex test on the first character of the text against the letters of Arabic and Hebrew. This was obviously not a very good approach: a user can write a message that starts with an English word followed by Hebrew or Arabic, and that message would be aligned to the left instead of to the right.
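To make the weakness concrete, here is a minimal sketch of what such a first-character check might look like. The class name, method name, and the exact Unicode ranges are assumptions for illustration, not the project's actual code.

```java
import java.util.regex.Pattern;

final class FirstCharRtlCheck {
    // Assumed ranges: Hebrew block (U+0590 to U+05FF) and Arabic block (U+0600 to U+06FF).
    private static final Pattern RTL_FIRST_CHAR =
            Pattern.compile("^[\\u0590-\\u05FF\\u0600-\\u06FF]");

    // Returns true only when the very first character is a Hebrew or Arabic one.
    static boolean startsWithRtlLetter(String text) {
        return RTL_FIRST_CHAR.matcher(text).find();
    }

    public static void main(String[] args) {
        // Hebrew word first, then an English word.
        String hebrewFirst = "\u05E9\u05DC\u05D5\u05DD world";
        // English word first, then two Hebrew words.
        String englishFirst = "OK \u05E9\u05DC\u05D5\u05DD \u05DC\u05DB\u05D5\u05DC\u05DD";

        System.out.println(startsWithRtlLetter(hebrewFirst));  // true
        System.out.println(startsWithRtlLetter(englishFirst)); // false, although the message is mostly Hebrew
    }
}
```

The second message is mostly Hebrew, yet the check reports it as left-to-right because only the first character is inspected.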

The objective of the task was to identify the languages using a more reliable mechanism, and also to identify other right-to-left languages such as Urdu, Armenian, Farsi, Dhivehi, and Kurdish.
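One way to make the alignment decision less fragile, without detecting the language at all, is to look at every character instead of only the first one. The sketch below counts strongly right-to-left and strongly left-to-right characters using the standard `Character.getDirectionality` method and picks the majority. The class and method names are hypothetical, and this is only an illustration of the idea, not the solution the project ended up with.

```java
final class DirectionGuess {
    // Returns true when strong right-to-left characters outnumber strong left-to-right ones.
    static boolean isMostlyRtl(String text) {
        int rtl = 0;
        int ltr = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            byte dir = Character.getDirectionality(cp);
            if (dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                rtl++;
            } else if (dir == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                ltr++;
            }
            // Digits, spaces and punctuation are neutral or weak, so they are not counted.
            i += Character.charCount(cp);
        }
        return rtl > ltr;
    }

    public static void main(String[] args) {
        // English word first, then two Hebrew words: 2 LTR letters against 9 RTL letters.
        String mixed = "OK \u05E9\u05DC\u05D5\u05DD \u05DC\u05DB\u05D5\u05DC\u05DD";
        System.out.println(isMostlyRtl(mixed));   // true
        System.out.println(isMostlyRtl("hello")); // false
    }
}
```

This tells you the dominant writing direction of the characters, not which language the text is in, so it covers any script the platform knows to be right-to-left but says nothing about, say, Urdu versus Farsi.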

Well, it turns out that this subject is big, much bigger than I thought. Language detection is a complex subject.

How do you detect which languages are present in a given text?
Check the letters? Not a good solution. The text might simply be meaningless gibberish in English letters, so which language is it? No, not a good solution.
Check the words? Hm… this means you need a dictionary of words, in many languages (as many as you'd want to support). That's a lot of space for each language, and it will push your app's package size beyond an acceptable limit. Aside from that, what happens if your text contains words that exist in different languages but are written using the same letters? Not a good solution.
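The "check the letters" idea can be sketched with Java's standard `Character.UnicodeScript` class, which maps a character to its script. The class and method names below are hypothetical. As far as I know, `Character.UnicodeScript` is only available on Android from API level 24, so treat this as a plain Java illustration rather than something to drop into any app.

```java
import java.util.EnumMap;
import java.util.Map;

final class ScriptCounter {
    // Counts how many letters of each Unicode script appear in the text.
    static Map<Character.UnicodeScript, Integer> countScripts(String text) {
        Map<Character.UnicodeScript, Integer> counts =
                new EnumMap<>(Character.UnicodeScript.class);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum);
            }
            i += Character.charCount(cp);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Gibberish, an English word and a French word all come out as LATIN only.
        System.out.println(countScripts("asdf qwer"));
        System.out.println(countScripts("hello"));
        System.out.println(countScripts("bonjour"));
    }
}
```

All three inputs report nothing but Latin letters, which is exactly the problem: the script narrows the candidates down, but it cannot tell English from French, or either of them from gibberish.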

I had to take another approach. As I researched it more, I found interesting projects that tried to tackle this problem, each with its own approach to solving it. Since I was developing the solution in the context of an Android app, I focused on Java-based solutions.

To be honest, I didn't think any of the options were good ones. Each had its drawbacks, such as increasing the size of the APK or being available only in a specific version of Android, and I thought all of them were partial solutions at best.

I would love your feedback and comments on this project or on any similar projects you were involved in!
