Can a machine replace a human interpreter?

Many people assume that simultaneous interpretation machines must be available by now. Here are some of the reasons they don't work for meetings and conferences yet.


If you are like many of our customers, you are looking for a machine that will automatically interpret from one language to another.

Sorry to be the bearer of bad news, but the short version of our answer is: you will be disappointed – there really is no such thing. Yet.

For more information, read on:

Many people assume that by now there must be a machine that can translate (or more accurately, interpret) in spoken form between two languages. And to a certain extent, they are correct. There are some good talking phrase books on the market. Logbar ili is an example of new technology that offers that kind of function. Ectaco talking phrasebooks have been around for quite a long time. Google Translate is a free app for computers, tablets and smartphones that is really fun and does a great job. The mobile versions include voice recognition in multiple languages, and text-to-voice, so in a limited sense it really is a simultaneous interpretation machine. But it's best with simple sentences, and with the understanding that it will sometimes make mistakes.

By the way, if you want to test Google Translate, or any translation device, just translate a phrase, then translate the result back into English. It's a fun game. Often, you will be impressed. Other times, you will have a good laugh.
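That round-trip game can even be mimicked with a toy word-for-word 'translator'. The tiny dictionary below is a deliberate simplification (a real engine works on whole sentences, not single words), but it shows exactly how meaning leaks on the way back: German 'sicher' means both 'safe' and 'sure', so one of those two English words cannot survive the trip.

```python
# A toy English->German word dictionary. 'safe' and 'sure' both map to
# 'sicher', so the mapping is many-to-one and cannot be inverted cleanly.
en_to_de = {"safe": "sicher", "sure": "sicher", "dog": "Hund"}

# Build the reverse dictionary; when two English words collapse onto one
# German word, whichever was inserted first wins.
de_to_en = {}
for en, de in en_to_de.items():
    de_to_en.setdefault(de, en)

def round_trip(word: str) -> str:
    """Translate a word to German and back to English."""
    return de_to_en.get(en_to_de.get(word, word), word)

print(round_trip("dog"))   # 'dog' survives the round trip intact
print(round_trip("sure"))  # comes back as 'safe' -- the meaning has drifted
```

The round trip looks perfect for some words and silently wrong for others, which is exactly the "sometimes impressed, sometimes laughing" experience described above.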

When it comes to interpreting a conference, meeting, sermon, or class, none of these devices can keep up, and even if they could, their rate of errors is too great for the interpretation to be useful.

In theory, an automatic interpretation machine (interpretation refers to spoken words, while translation means written words) needs three components:

  1. First of all, voice recognition is needed to change the spoken word into written form.
  2. Translation software then translates the written text from one language to the other.
  3. Text-to-voice software then pronounces the words in the new language.
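The three steps above can be sketched as a pipeline. Everything here is a stand-in: the stage functions below are stubs wired together to show the architecture, not real speech-recognition or translation engines, and the 'audio' is just text pretending to be a waveform.

```python
def speech_to_text(audio: bytes) -> str:
    """Stub for a speech-recognition engine: audio in, transcript out."""
    # A real engine would decode a waveform; here the fake 'audio'
    # simply carries its own transcript.
    return audio.decode("utf-8")

def translate(text: str, source: str, target: str) -> str:
    """Stub for a translation engine, using a tiny fixed phrase table."""
    phrase_table = {("en", "fr"): {"hello": "bonjour", "thank you": "merci"}}
    return phrase_table.get((source, target), {}).get(text.lower(), text)

def text_to_speech(text: str) -> bytes:
    """Stub for a speech synthesizer: text in, 'audio' out."""
    return text.encode("utf-8")

def interpret(audio: bytes, source: str, target: str) -> bytes:
    """Chain the three stages: recognize, then translate, then speak."""
    transcript = speech_to_text(audio)
    translated = translate(transcript, source, target)
    return text_to_speech(translated)

print(interpret(b"hello", "en", "fr"))  # b'bonjour'
```

The chaining itself is trivial; as the rest of this article argues, the hard part is that each of the three stages makes its own mistakes.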

All of these components are readily available. So how come they can't be combined into a useful interpretation machine?
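Part of the answer is simple arithmetic: each stage makes its own mistakes, and the mistakes compound across the chain. A quick back-of-the-envelope sketch, using per-stage accuracy figures that are purely illustrative assumptions, not measurements of any real system:

```python
# Illustrative, made-up per-stage accuracies for the three-stage pipeline:
# speech recognition -> machine translation -> text-to-speech.
stage_accuracies = {
    "speech_recognition": 0.95,
    "machine_translation": 0.90,
    "text_to_speech": 0.98,
}

# Accuracies multiply: a sentence only comes out right if every stage
# handled it correctly.
end_to_end = 1.0
for stage, accuracy in stage_accuracies.items():
    end_to_end *= accuracy

print(f"End-to-end accuracy: {end_to_end:.1%}")  # 83.8%
print(f"End-to-end error rate: {1 - end_to_end:.1%}")  # 16.2%
```

Even with stages that sound impressively accurate on their own, roughly one sentence in six comes out wrong somewhere, which is far too high for a conference, sermon, or class.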

The weakest link in the chain may be the voice recognition software. Of course, this software is improving all the time, but anyone who has tried dictating a text through their smartphone, or read the transcript of a voicemail message, knows voice recognition still makes a lot of mistakes, especially if the software has not had the chance to 'train with' and 'learn from' the particular person who is speaking. I get transcripts of many of my voicemails, and sometimes the transcript is perfect, and I don't need to listen to the actual message. But sometimes the transcript is so unintelligible that I either have no idea what is being said, or I completely misunderstand the message. Most of the time, though, I get the gist of the message but miss some finer details.

The second problem with voice recognition, beyond mis-hearing words, is the problem of homonyms. These are words that sound the same but mean different things. Take the phrase 'walking down the aisle', which in English we understand means 'getting married'. The software could be forgiven for transcribing it as 'walking down the isle', which would mean 'walking down the island'. And suddenly the meaning is lost.

The process of translating the written words from one language to another also has risks. Take the phrase 'I cannot lie'. This would probably be someone saying they have to tell the truth, but they might be saying they have a medical condition
that stops them from lying down. While it is usually easy for us to determine which meaning is intended because we understand context, it's much harder for software to figure this out.

If we put these two example phrases together, we might have an English sentence that reads 'I cannot lie, I want to walk down the aisle.' For us English-speaking humans, that's easy to understand. But imagine if our software believed the sentence was 'I am unable to lie down, I want to walk down the island.' Would you guess the person was saying they wanted to get married?

Then there is the problem of idioms. These are the phrases that can't be translated literally – the words together mean something different from their normal individual meanings. An example might be that something costs 'an arm and a leg'. We know this just means something is expensive. But software might be forgiven for taking it more literally.

Let's add this to our sample sentence. How about: 'I cannot lie, I found out that it cost an arm and a leg to walk down the aisle'.

Now you see why the task is so difficult.

The third part of the equation – converting the newly translated text to voice – is probably the easiest part. Text-to-voice has come a long way in recent years: no more robotic monotones. When I listen to the foreign-language pronunciation in Google Translate, I am amazed at the authenticity (of the languages I understand). Of course, there are still hazards that the software must be aware of. In this case, one of the biggest concerns is heteronyms. These are words that are spelled the same but mean different things, and are therefore pronounced differently. 'Tear' is a good example. Is it a 'teardrop' or a 'rip'? The pronunciation is different.

Going back to our example of a challenging sentence, let's add to it: 'I cannot lie, when I found out it would cost an arm and a leg to walk down the aisle, I felt a little tear in my eye'. Translation software might mix up three of those items (one at each stage) to come up with the meaning of: 'I am unable to lie down, when I found out I would be walking down the island, I felt a small rip in my eye'. We've transformed an honest admission into a medical nightmare!

Just for fun, I spoke this sentence into Google Translate on my phone, and asked it for the Chinese equivalent. Since I don't speak Chinese, I then 'reverse translated' it back to English to see how well it did. The resulting sentence (translated to Chinese and then back into English) was:

'I can not lie, when I found out it cost an arm and a leg in my eyes, and I felt a little tear aisle.'

I tried the same experiment with German, and the result was:

'I can not do it if I find it, would an arm and a leg cost the aisle, I felt a little tear in my eye go down.'

Obviously, it would be impossible to understand a speech if it was translated that way. To be fair to Google Translate, when I did the same experiment with French, the result was a perfect spoken and written interpretation. But then I tried a different (simpler) sentence from English to French and the software mis-heard me so the result didn't make much sense. 

And we haven't even addressed the issue of how you would place this machine so it would be able to hear a public speech clearly without picking up extraneous noises that would cause problems.

So, you begin to see the challenges. They are not insurmountable, but they are significant.

I've been in this business for 21 years, and people have been asking for automatic translation machines since the very beginning. My answer has always been 'not yet, but in the next 5-10 years'. So far, I've always been wrong about that. But eventually, as long as I keep saying it, I will be right! And when the machines are able to accurately and simultaneously interpret a speech, sermon or class, you can bet we will offer them.

In the meantime, enjoy Google Translate, and talking phrasebooks – they are great tools for enabling simple one-on-one conversations.
