Posts Tagged ‘internationalization’

I’m constantly approached by developers working on new mobile apps and by translators localizing existing ones, asking me for guidelines and best practices on how to write or translate for mobile. I’ve given the topic a lot of thought and came up with the rule of the 3 magic Fs. No, the 3 Fs don’t stand for Francesco, Francesco and Francesco; they stand for Fast, Focus and Fun.

When writing for mobile, keep in mind the intended audience: mobile users have less time and a shorter attention span than regular web users. And they want to have fun. So here it goes:

1. Fast: Keep it brief. Be concise and precise. Try to use the same number of characters as in the English source (including spaces), and don’t use more unless absolutely necessary. Describe only what’s necessary, and no more. Don’t try to explain subtle differences. They will be lost on most users.

2. Focus: Keep it simple. Pretend you’re speaking to someone who’s smart and competent, but doesn’t understand technical jargon. Use short words, active verbs, and common nouns. Put the most important thing first. The first words in a sentence should include at least a hint of the most important information in the phrase.

3. Fun: Be friendly. Talk directly to the reader using second person “you”*. If your text doesn’t read the way you’d say it in a casual conversation, it’s probably not the way you should write it. Don’t be abrupt or annoying. Make the user feel safe, happy and energized. Don’t use abbreviations to shorten a word or a phrase. Abbreviations as a shortcut for space restrictions must be avoided at all times.

Examples:

1. Keep it brief.

No: Too formal
Consult the documentation that came with your phone for further instructions.

Yes: Preferred
Read the instructions that came with your phone.

2. Keep it simple.

No: Confusing
You cannot perform this action with this app because this feature is not supported for your country. Please use the main website instead.

Yes: Crystal clear
This feature is not supported in your country yet. Please use the website.

3. Be friendly.

No: Confusing
Sorry! The app is not responding. Please close it and reopen it.

Yes: Shorter, more direct, no fake apology
The app isn’t responding. Please restart.

4. Put the most important thing first.

No: Task last
Tap Next to complete setup.

Yes: Task first
To complete the setup, tap Next.

5. Describe only what’s necessary, and no more.

No: Too wordy
The app needs to communicate with our servers to sign in to your account. This may take a few moments.

Yes: Short and to the point
The app is connecting to the server. This can take a few moments.

6. Don’t use abbreviations to shorten a word or phrase

No: Abbreviation
Go to Intl. settings.

Yes: Spelled out
Go to International settings.

Being fast means that features are fast to use, so the text needs to be fast to read. Focus is about simplicity, so the text needs to be easy to read. Fun is about engagement, so the text needs to be friendly.

 

Introduction to AI

AI (Artificial Intelligence) is the study and design of intelligent agents. AI programs are called Intelligent Agents. Here is how it works:


The Intelligent Agent (on the left) interacts with an Environment (on the right). The Agent perceives the state of the Environment through its sensors and at the same time it affects its state through its actuators.

The real challenge in AI is the function that maps sensor readings to actuator commands: it is called the Control Policy of the Agent.

Based on the data received from its sensors, the agent makes decisions and passes them on to its actuators. This happens over and over, and the loop of environment state, sensor feedback, agent decision and actuator action on the environment is called the Perception-Action Cycle.
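
To make the loop concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the toy room-temperature Environment, the 21-degree target and the names control_policy and run_cycle are assumptions, not part of any real AI library. It only shows the shape of the cycle: sense, decide, act, repeat.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Environment:
    """Toy environment (hypothetical): a room whose temperature can be nudged."""
    temperature: float

    def sense(self) -> float:
        # What the agent's sensor reports about the state of the environment.
        return self.temperature

    def apply(self, action: str) -> None:
        # The agent's actuator changes the state of the environment.
        if action == "heat":
            self.temperature += 1.0
        elif action == "cool":
            self.temperature -= 1.0


def control_policy(reading: float, target: float = 21.0) -> str:
    """Map a sensor reading to an actuator command."""
    if reading < target:
        return "heat"
    if reading > target:
        return "cool"
    return "idle"


def run_cycle(env: Environment, steps: int) -> List[str]:
    """Repeat perceive -> decide -> act for a fixed number of steps."""
    actions = []
    for _ in range(steps):
        reading = env.sense()             # perception
        action = control_policy(reading)  # decision
        env.apply(action)                 # action on the environment
        actions.append(action)
    return actions
```

Starting the room at 18 degrees and running five steps, the agent heats three times and then stays idle once the target is reached. A real control policy is of course the hard part; this one is a trivial rule chosen only to keep the loop readable.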

AI is used in many fields, among which:

  • Finance
  • Robotics
  • Games
  • Medicine
  • And of course: the Web

AI and uncertainty

AI is all about uncertainty management. In other words, we use AI if we want to know what to do when we don’t know what to do. There could be many reasons for uncertainty in a computer program:

  • Sensor limits
  • Adversaries that make it hard for you to understand what’s happening
  • Stochastic environment (where behaviors are intrinsically non-deterministic)
  • Laziness
  • Plain ignorance (many people who don’t know what’s going on could easily learn, but they just don’t care)

All of the above are possible causes of uncertainty, and AI is how we deal with it.

Example of AI in practice

One of the many key applications of AI techniques is Machine Translation. How does Machine Translation work?

Machine Translation generates translations using AI techniques based on bilingual text corpora. Where such corpora are available, impressive results can be achieved translating texts of a very similar kind. Unfortunately, such corpora of bilingual texts are still very rare and the size of the available corpora varies significantly from one language combination to the other.

So what does Machine Translation look like? For a large-scale Machine Translation system, examples of bilingual text are found on the web. On a small scale, they can be found anywhere. This example was found in a Chinese restaurant in Cupertino:

In this type of text, a line in Chinese corresponds to a line in English. To learn from it, we need to find the correspondence between words in Chinese and words in English. For example, we can highlight the word “wonton” in English. It appears 3 times throughout the text. In each of those lines there is also one Chinese character that appears: 雲. So there is a high probability that this character corresponds to the word “wonton” in English. Please note that we are talking about probabilities here. As a matter of fact, “wonton” in Chinese is 雲吞 and not just 雲. For some reason the word 雲吞 on line 65 is abbreviated to just 雲, and that is not a common abbreviation.

You can go further and try to find out which Chinese ideogram corresponds to the word “chicken” in English:


Please note that we aren’t 100% sure that 雞 is the ideogram for “chicken” in Chinese but we do know that there is a good chance because each time the word “chicken” appears in English this ideogram appears in Chinese.

Now let’s see if we can find a correspondence for the word Soup:


As you can see, the word “soup” occurs in most of these phrases but not in all of them. On the English side of the menu it is missing in 1 place (65. Egg Drop Wonton Mix). Similarly, on the Chinese side of the menu it is missing in 1 different place (廣東雲吞 60).

The correspondence doesn’t have to be 100% to tell us that there is still a good chance of a correlation.

In Machine Translation this type of alignment is used to create probability tables, hence the name Statistical Machine Translation. In other words, each entry in a table is the probability that a phrase in one language corresponds to a given phrase in another language.
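
Here is a rough sketch of how such a table could be built from line-aligned text by simple counting. The three aligned lines are invented toy data, not the restaurant menu, and the function name cooccurrence_table is an assumption for illustration. Real Statistical Machine Translation systems use more refined alignment models; this sketch only counts how often a Chinese character shows up on lines that contain a given English word.

```python
from collections import Counter, defaultdict

# Hypothetical line-aligned toy data (English line, Chinese line).
PARALLEL_LINES = [
    ("wonton soup", "雲吞湯"),
    ("chicken soup", "雞湯"),
    ("chicken wonton", "雞雲吞"),
]


def cooccurrence_table(pairs):
    """Estimate P(character on the line | English word on the line) by counting."""
    word_count = Counter()            # lines containing each English word
    pair_count = defaultdict(Counter) # lines containing both word and character
    for english, chinese in pairs:
        for word in set(english.lower().split()):
            word_count[word] += 1
            for char in set(chinese):
                pair_count[word][char] += 1
    return {
        word: {char: n / word_count[word] for char, n in chars.items()}
        for word, chars in pair_count.items()
    }


table = cooccurrence_table(PARALLEL_LINES)
# Most likely character for "chicken" according to the toy data.
best_for_chicken = max(table["chicken"], key=table["chicken"].get)
```

With this toy data 雞 appears on every line that contains “chicken”, so it gets probability 1.0, while 湯 appears on only half of them and gets 0.5. Note that 雲 and 吞 tie for “wonton”, which is exactly the kind of ambiguity described above: counting gives probabilities, not certainties.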

More on Machine Translation in future posts. Stay tuned.

Francesco Pugliano

 

The end of free Machine Translation API

Last June, Adam Feldman (API Product Manager at Google) announced they were pulling the plug on their Google Translate API, causing a lot of concern and some protests in the developer and localization communities. You can read the announcement here.

Then in August, Jeff Chin (International Product Manager at Google) took that back and announced that they would offer the Translate API at a cost instead of free of charge. You can read the announcement here.

Here is Google’s pricing model:
$20 per million characters of text translated.

In September, Vikram Dendi (Director of Product Management at Microsoft) announced something very similar, but not many people took notice. You can read the announcement here.

Here’s Microsoft’s pricing model:
No cost up to 4M characters a month. Then $10 per million characters.

Unlike Google, Microsoft only charges you once you pass the threshold of 4M characters a month, and then charges half as much ($10 per million characters instead of $20).
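
To see what the two pricing models mean for a monthly bill, here is a small back-of-the-envelope calculation in Python. It uses only the prices quoted above and makes two assumptions that the announcements may or may not match: billing is pro rata per character, and Microsoft bills only the characters beyond the free 4M. The function names are made up for this sketch.

```python
GOOGLE_USD_PER_MILLION = 20.0
MICROSOFT_USD_PER_MILLION = 10.0
MICROSOFT_FREE_CHARS = 4_000_000  # free monthly allowance quoted above


def google_monthly_cost(chars: int) -> float:
    """Every character is billed (assumes pro-rata billing)."""
    return chars / 1_000_000 * GOOGLE_USD_PER_MILLION


def microsoft_monthly_cost(chars: int) -> float:
    """Assumption: only characters beyond the free allowance are billed."""
    billable = max(0, chars - MICROSOFT_FREE_CHARS)
    return billable / 1_000_000 * MICROSOFT_USD_PER_MILLION


# Example: 10 million characters translated in one month.
google_bill = google_monthly_cost(10_000_000)
microsoft_bill = microsoft_monthly_cost(10_000_000)
```

Under these assumptions, 10 million characters in a month come to $200 with Google and $60 with Microsoft, and anything under 4 million characters costs nothing with Microsoft.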

Quality of Google and Bing Machine Translation services

Now that the technology is mature, the quality of the Google and Bing Statistical Machine Translation systems depends heavily on the quality of the parallel text found on the web and crawled by their MT engines. Before the advent of Google and Bing Translate, parallel text found on the web was, more often than not, produced by professional translators and therefore of good quality.

Now, translating content professionally is expensive. Depending on the domain of translation and the language pair, professional translation can cost as much as $0.50 per word for a language such as Japanese and between $0.18 and $0.21 per word for European languages.

During the financial crunch of 2008, many web publishers needed to cut costs. It’s no surprise that they started to abuse the free Google Translate and Bing Translate APIs to translate content and then publish it as is, with no professional review.

This is a common technique that SEO companies have been applying to bring more users to a website and then steer them to premium (professionally translated) content.

The problem is that no algorithm is (yet) capable of telling whether content has been translated by a Machine Translation system or by a professional translator. Only trained human translators who speak the language can do that.

Today, both the Microsoft and Google Machine Translation engines are crawling and processing web content that may have been published without any human proofreading after being translated with the very same Google or Microsoft translation APIs.

In other words, these two companies are “polluting their own drinking water”.

I hope that by starting to charge for their Machine Translation services, both Google and Microsoft can decrease or at least control the amount of sub-standard translation published on the web, so that in turn their MT engines can produce more reliable translations. Feeding their engines with United Nations and European Union bilingual documents is not enough to produce high-quality translation.

Size doesn’t matter without quality

Many publishers in recent years have started to build their own corpora of bilingual texts to feed their Machine Translation engines. It’s a given that an ad-hoc Machine Translation database fed only with high-quality, human-translated and proofread bilingual text in a specific domain can produce higher quality than Bing and Google Translate.

Unfortunately, at some point these publishers may start to pollute their MT systems with content that has been machine translated and not carefully reviewed by professional translators.

We have seen this happen in the past, for example when the hype was all about Translation Memories rather than MT engines, as it appears to be today. Some companies saw their Translation Memories grow bigger and bigger with little or no control over the quality of the content they were fed, thus polluting their TMs and making them almost unusable.

Francesco Pugliano