from Hacker News

Google Translate invented its own language to help it translate more effectively

by alexkadis on 1/25/17, 12:41 PM with 8 comments

  • by jdmichal on 1/25/17, 1:36 PM

    No, it didn't. Or, rather, if it did, then so does every human. The neural network is doing what NNs do and associating particular input patterns with particular neurons / pathways. So the same or highly similar concepts end up in the same place, which is how they are connected for the purpose of translation.

    All this likely has similar analogues in the human brain. That is, I would be rather surprised if there wasn't a dedicated neural pathway identifying a banana, which fires whenever the thought of a banana is invoked. This is also where banana is associated with yellow and food and delicious etc.

    Also, don't forget that in the human brain reading and listening may as well be two separate languages processed by entirely different portions of the brain. I would have to see pretty convincing evidence to believe that reading "banana" and hearing it don't at some point touch the same part of the brain where the concept and associations of "banana" "live".
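
    A toy illustration of that "same place" idea, with made-up vectors (nothing here is from Google's model; in a trained network these embeddings would be learned):

      import math

      def cosine(a, b):
          # angle-based similarity between two vectors in the shared space
          dot = sum(x * y for x, y in zip(a, b))
          norm_a = math.sqrt(sum(x * x for x in a))
          norm_b = math.sqrt(sum(y * y for y in b))
          return dot / (norm_a * norm_b)

      # hypothetical embeddings; a trained network would learn these
      banana_en = [0.91, 0.10, 0.33]   # "banana"
      banana_es = [0.89, 0.12, 0.35]   # "plátano"
      car_en    = [0.05, 0.88, 0.40]   # "car"

      print(cosine(banana_en, banana_es))  # high: same concept, different language
      print(cosine(banana_en, car_en))     # low: unrelated concepts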

  • by xbmcuser on 1/25/17, 1:35 PM

    I disagree with the author's conclusion that it invented a new language. I know a few languages; if you could see into my brain, you would probably find that it has saved meanings of words that only my brain could understand. That is not a new language. What Google Translate is doing now is comprehending multiple languages and then articulating the output in the required language.

  • by balabaster on 1/25/17, 2:20 PM

    This is fascinating... I wonder how it copes with the words and phrases that don't have any meaningful translation. Figures of speech only work within a culture because of that culture; remove it, and you remove the context in which the phrase makes sense.

    I remember a past girlfriend who had cute little phrases (which I'd love to remember right now by way of example, but they escape me) that made no sense when translated to English, because the context that made them make sense didn't exist outside of her language.

  • by jasode on 1/25/17, 1:43 PM

    For technical HN readers, I think the article[1] that the author linked is better.

    After reading Google's explanation, I don't think his comment is accurate:

    >Google Translate invented its own language to help it translate more effectively.

    >What’s more, nobody told it to. It didn’t develop a language (or interlingua, as Google call it) because it was coded to. It developed a new language because the software determined over time that this was the most efficient way to solve the problem of translation.

    That makes it sound like the middle GNMT box (alternating in blue and orange) was automatically fabricated by the algorithm. Instead, what seems to have happened is that the existence of an "intermediate" representation was a deliberate architecture choice by human Google programmers. What got "learned by machine" was the build-up of internal data (filling the vectors with numbers to find mappings of "meaning").

    Google programmers can chime in on this, but as an outsider, I'm guessing the previous incarnations of Translate were more "point-to-point" instead of "hub-and-spoke".
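
    A toy sketch of that contrast in Python (all names are hypothetical stubs; the real GNMT is a single shared network steered by a target-language token, not per-language modules):

      # Hub-and-spoke: each language needs only an encoder into and a decoder
      # out of a shared representation, instead of one model per language pair.
      # These are hypothetical stubs, not Google's actual architecture or API.

      def make_encoder(lang):
          return lambda sentence: ("hub", sentence)     # stand-in for a learned encoder

      def make_decoder(lang):
          return lambda hub: f"<{lang}> {hub[1]}"       # stand-in for a learned decoder

      langs = ["en", "ja", "ko"]
      encoders = {l: make_encoder(l) for l in langs}    # n encoders...
      decoders = {l: make_decoder(l) for l in langs}    # ...and n decoders, not n*(n-1) models

      def translate(src, dst, sentence):
          return decoders[dst](encoders[src](sentence))

      print(translate("ja", "ko", "konnichiwa"))        # never routes through English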

    With the 103 languages, point-to-point translation, computed as "n choose 2"[2], means 5253[3] possible direct mappings. (Although an example pair such as Swahili to an Australian Aboriginal language would probably not be filled with much translation data.)

    With the new GNMT (the intermediate hub), you don't need 5253 mappings. Instead of n!/(k!(n-k)!) combinations, it's just n. (However, I'm not saying that reducing the mathematical combinations was the main motivator for the re-architecture.)
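
    To make that arithmetic concrete:

      from math import comb

      n = 103                      # languages supported at the time
      point_to_point = comb(n, 2)  # unordered pairs: (103^2 - 103) / 2
      hub_and_spoke = n            # one mapping per language to the hub

      print(point_to_point)        # 5253
      print(hub_and_spoke)         # 103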

    An analogy would be LLVM IR. Compiler frontends can target an "intermediate hub" language like LLVM IR, which reduces the combinatorial complexity of every frontend programming language having to understand every backend machine language. Otherwise, languages like Rust & Julia would each have to write point-to-point backends for specific machine architectures like x86 & ARM & SPARC. The difference with Google's GNMT is that, unlike LLVM IR's keywords, the intermediate language was not pre-specified by humans.
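
    The same counting argument, with hypothetical compiler counts:

      m, k = 10, 8      # say, 10 frontend languages and 8 backend architectures
      print(m * k)      # 80 direct compilers without a shared IR
      print(m + k)      # 18 components (frontends + backends) with one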

    [1] https://research.googleblog.com/2016/11/zero-shot-translatio...

    [2] https://en.wikipedia.org/wiki/Combination#Number_of_k-combin...

    [3] https://www.google.com/search?q=(103%5E2-103)%2F2

  • by wooot on 1/25/17, 5:20 PM

    What is the difference between how this neural network translates from Japanese to Korean, and just translating from Japanese to English and then from English to Korean?

  • by Hnrobert42 on 1/25/17, 3:33 PM

    Is it just me, or do you have to have a LinkedIn account to read the article?