Zero-shot transfer across 93 languages

code.fb.com

283 points by moneil971 5 years ago

> The encoder is five-layer bidirectional LSTM (long short-term memory) network. In contrast with neural machine translation, we do not use an attention mechanism but instead have a 1,024-dimension fixed-size vector to represent the input sentence.

5 layers of 1024-cell bidirectional LSTMs (edit: actually 512-cells x2?)? Can consumer GPUs even fit that (+ the decoder) into RAM?
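
A rough back-of-the-envelope check on the memory question, as a minimal sketch. It assumes the parameter layout of PyTorch's `nn.LSTM` (an input matrix, a recurrent matrix and two bias vectors per gate and direction), 512 hidden units per direction as guessed in the edit above, and a hypothetical 320-dimensional input embedding, which the post does not state. Activations, optimizer state and the decoder are ignored.

```python
def lstm_param_count(input_size, hidden_size, num_layers, bidirectional=True):
    """Count weights and biases of a stacked LSTM laid out like PyTorch's nn.LSTM."""
    directions = 2 if bidirectional else 1
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        # Four gates, each with an input matrix, a recurrent matrix and two bias vectors.
        per_direction = 4 * hidden_size * (layer_input + hidden_size + 2)
        total += directions * per_direction
        # Later layers consume the concatenated outputs of both directions.
        layer_input = hidden_size * directions
    return total


EMBED_DIM = 320  # assumption: not given in the post
params = lstm_param_count(EMBED_DIM, hidden_size=512, num_layers=5)
size_mib = params * 4 / 2**20  # 32-bit floats
print(f"{params:,} parameters, about {size_mib:.0f} MiB of weights")
```

Under these assumptions the encoder weights come to roughly 29 million parameters, a little over 100 MiB in 32-bit floats, so the weights alone should fit on a consumer GPU. How much memory training needs would depend mostly on batch size and sequence length.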

isoprophlex 5 years ago

Notwithstanding the memory complexity, I love that they reach such impressive results with relatively straightforward components: there's no attention mechanism, and encoding into latent space is simply max pooling over the last LSTM layer.
I don't fully understand how the training step works though. They train only on translation to English and Spanish?
Anyway very cool stuff.
pmalynin 5 years ago

It’s not 1024 cells. The size of the hidden vector is 1024, which is on the order of a megabyte per cell (a 1024x1024 matrix). Here they have a cell per word, which is reasonable.
- minimaxir 5 years ago
  
  Max-Pooling a 1024-cell output LSTM will result in a 1024-sized vector.
  Looking at the code for the Encoder (https://github.com/facebookresearch/LASER/blob/fec5c7d63daa2...), each LSTM has the same number of hidden cells (although the default parameters of that class don't quite match the ones used in the post, so I assume it's 512x2x5).
  
  pmalynin 5 years ago
  
  Yes but they're not maxpooling in the last dimension. They're max pooling over the sequence length [0], (the other way doesn't really make sense in this context).
  The output size is 1024. The hidden vector size is 512, but they're using bidirectional LSTMs, which concatenate the outputs of each direction, so the total is 1024 [1].
  [0] https://github.com/facebookresearch/LASER/blob/fec5c7d63daa2...
  [1] https://pytorch.org/docs/stable/nn.html#lstm
  
  minimaxir 5 years ago
  
  Gotcha, that makes sense. (I'm less familiar with dimension ordering in PyTorch)
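
The pooling discussed in this sub-thread (concatenate the two directions per token, then take the maximum over the sequence length rather than over the feature dimension) can be sketched in plain Python. This is a toy illustration with made-up numbers, not the LASER code; the function names are hypothetical.

```python
def concat_directions(forward_states, backward_states):
    """Join per-token forward and backward hidden states into one vector per token."""
    return [f + b for f, b in zip(forward_states, backward_states)]


def max_pool_over_time(token_vectors):
    """Take the element-wise maximum across tokens, leaving one fixed-size vector."""
    return [max(column) for column in zip(*token_vectors)]


# Toy sentence of 3 tokens with 2 hidden units per direction
# (the thread's reading of the code is 512 per direction).
forward = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5]]
backward = [[0.7, 0.0], [0.2, 0.6], [0.8, 0.1]]

token_vectors = concat_directions(forward, backward)  # 3 tokens x 4 features
sentence_vector = max_pool_over_time(token_vectors)   # 4 features, any sentence length
print(sentence_vector)
```

The output length equals the concatenated feature size regardless of how many tokens the sentence has, which is why 2 x 512 hidden units give a 1,024-dimension sentence vector.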

nograpes 5 years ago

I think the English translations of the Hindi and the Bulgarian are mixed up. The Hindi should be "Their destination was secret", and the Bulgarian should be "Nobody knew where they went". Also, the Devanagari script is not rendered properly; the diacritical marks should be directly over (or under) the related character, and conjunct characters are not "squished" together.

statguy 5 years ago

Also the Hindi translation is very formal; hardly anyone speaks like that colloquially. Which is understandable, because the translation probably reflects the data trained on, but it's something to keep in mind.
dmurray 5 years ago

There are also half a dozen typos in the third graphic (the one showing relationships between languages). Hopefully the sloppiness here isn't a reflection on the overall quality of the work.
thaumasiotes 5 years ago

> The Hindi should be "Their destination was secret", and the Bulgarian should be "Nobody knew where they went".
Those aren't good translations of each other (barring a helpful context); to me, "their destination was secret" states that the goal was for nobody to know where they went (with a weak implication that the goal was achieved), whereas "nobody knew where they went" states that, in reality, nobody knew where they went (and says nothing about whether that was intentional).
- yorwba 5 years ago
  
  > Those aren't good translations of each other
  That's because they're not translations. The table in the article makes it pretty clear that they're arbitrary sentences that were classified as "related" based on their embedding vectors.
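
To illustrate what "classified as related based on their embedding vectors" could look like, here is a minimal sketch using cosine similarity. The article does not say this is the exact metric used, and the three-dimensional vectors below are invented for illustration; real sentence embeddings here would have 1,024 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_related(query, candidates):
    """Return the candidate sentence whose embedding is closest to the query."""
    return max(candidates, key=lambda text: cosine_similarity(query, candidates[text]))


# Hypothetical embeddings, made up for this example.
candidates = {
    "Their destination was secret": [0.9, 0.1, 0.2],
    "My hovercraft is full of eels": [0.0, 0.8, 0.6],
}
query = [0.8, 0.2, 0.1]  # stand-in for the embedding of "Nobody knew where they went"
print(most_related(query, candidates))
```

Two sentences can score as close neighbours in such a space without being translations of each other, which matches the point made above.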

raldi 5 years ago

Can someone explain what "zero shot" means? The link doesn't explain, and some basic googling doesn't either.

avinium 5 years ago

It's a catch-all term for "learning to solve a task without seeing any specific training examples for that task."
In this context, it's learning to translate from (say) Czech to Urdu, without ever having been explicitly provided Czech-Urdu sentence pairs.
- raldi 5 years ago
  
  Is it just a synonym for https://en.wikipedia.org/wiki/Unsupervised_learning ?
  
  avinium 5 years ago
  
  Not quite.
  Unsupervised learning implies that there are no human-annotated labels whatsoever (in this context, meaning that the model had no paired translations at all).
  Zero-shot learning (usually) means that the model can generalize learning from seen labels to unseen labels.
  That being said, conceptually I guess there could be an "unsupervised zero-shot learning" model: say, a language model that learns word embeddings from English Wikipedia and then tries to use those embeddings to generate French sentences. My guess is that it simply doesn't work.
  
  nl 5 years ago
  
  To expand on this (completely correct) response, unsupervised training is often part of the training process for a zero-shot prediction task.
  For example it's pretty common to use unsupervised learning to build embeddings for each target language, align the embeddings somehow (noting that you don't have labels, so you are using the multi-dimensional shapes within the embeddings to try to match them) and then finally test against labelled data (the zero-shot thing).
  Zero shot, cross modal transfer is something humans do really well. You can read a description of a Platypus and then label it correctly even if you have never seen one before.
  A seminal paper in this was Richard Socher's Zero-Shot Learning Through Cross-Modal Transfer[1]. It's the paper that earmarked him as a star, and look at the co-authors (Chris Manning and Andrew Ng).
  [1] https://arxiv.org/abs/1301.3666
  
  raldi 5 years ago
  
  Okay, so unsupervised learning would be if you had never seen any Earth animals before, and were presented with 99 photos of fish and one giraffe and noticed that the latter was the oddball, whereas zero-shot would be like if you were told that a giraffe was yellow and brown with four legs and a long neck and then said, "That must be a giraffe!" the first time you ever saw a photo of one.
  
  nl 5 years ago
  
  Yeah that's a reasonable way to look at it.

XaspR8d 5 years ago

Tangent: I've always wondered if there would be utility in humans gaining expertise in writing "for" translation, i.e. knowing what kinds of semantic and syntactic constructs are the least lossy when localized, or perhaps even learning to write in some intermediate, non-native-human language whose reduced feature set guarantees a certain level of translatability.

I suppose the answer might be that machine translation will improve fast enough that such a field wouldn't have time to emerge. But I always think that using humans to intelligently fill gaps in machine competency is a neat solution!

radarsat1 5 years ago

Living the last several years as an ex-pat, I can at least attest that one tends to lean towards using expressions and words that are easy to understand, or even towards expressions that are translations of expressions in the target language, instead of native expressions or word choices. One also sort of subconsciously starts to repeat the typical English mistakes that non-native speakers make when speaking English to you, which is bad because it reinforces their mistakes, but it's hard to avoid. (E.g. dropping articles or pronouns.)
And yes, I do the same when using Google Translate, I will very often write things in English in a way that I know will translate better to the target language, similarly to how I will write a search query using words that I think will be more likely to return useful results even if they aren't completely "natural". I just consider it part of a skill of knowing how to use automatic translation to my benefit. You literally learned to be "skilled" at using something that's supposed to be auto-adaptive, which is interesting in itself. To start to learn and adapt to its dynamics. This is also interesting just for the fact that I have learned to depend on automatic translation as a way of living. Many things related to living abroad would not have been as easy or even possible without the help of Google Translate, despite having learned the language colloquially.
- Cerium 5 years ago
  
  About five years back I had to do a good deal of communication with the aid of Google Translate. I would frequently copy what I wrote into Google Translate, then copy the result into another tab to translate back to English. I found that if the text could make a round trip without losing anything important, then it could be understood.
- visarga 5 years ago
  
  Have you tried DeepL? It is a translation service with surprising quality.
  https://www.deepl.com/translator
- peteretep 5 years ago
  
  Also mimicking the way non-native speakers pronounce words is useful. You can get very very far with English in Thailand in almost every situation if you know how to Thai-ify words ("Apple" -> "Ah-Poon", "Stereo" -> "Sah-teh-lee-oh", for example).
  
  wingerlang 5 years ago
  
  It gets extra interesting when you learn to read Thai script: you can really see how they have tried to write the English words in Thai script and how the Thai speech/reading rules "break" the word.
  For example Apple, in Thai it is written แอปเปิ้ล with each Thai character trying to be 1:1, for example แอ=ae ป=p (loosely). The interesting part is at the end, ล this character on its own is generally pronounced as an L. So for the English word, they put the L at the end to make the word apple. But - in Thai, characters have different sounds depending on their locations, ล is pronounced N when it is placed at the end of a syllable - so "apple" in Thai is generally pronounced "appen" instead.
  
  peteretep 5 years ago
  
  And the nickname "Ple" is pronounced "Pun"!
  
  wingerlang 5 years ago
  
  Can you write it in Thai?
  
  peteretep 5 years ago
  
  Sure, it's just the last syllable of what you posted, เปิ้ล. See also: http://www.thai-language.com/id/152113
  
  wingerlang 5 years ago
  
  Oh right, yeah that's a tricky one. I've always read it as 'pen' but looking closer there is technically the consonant cluster present.
jobigoud 5 years ago

> or perhaps even learning to write in some intermediate, non-native-human language whose reduced feature set guarantees a certain level of translatability
I cannot pass an opportunity to mention Europanto (not to be confused with Esperanto), a language where you arbitrarily mix and match words from various European languages.
If you know a Romance language and a Germanic language (in addition to English) you can craft sentences that will be understood by many, because of the shared roots and vocabulary.
When you write a sentence you select among synonyms and origin based on how close the word is to its variants in the other languages.
Unfortunately mon Deutsch est too bad pour escribir full paragraphes en cette technik.
https://en.wikipedia.org/wiki/Europanto
- IggleSniggle 5 years ago
  
  “Unfortunately mon Deutsch est too bad pour escribir full paragraphes en cette technik.”
  Wow, this is beautiful! I love it! Thank you for exposing me to this idea. It really optimizes for the kind of pattern matching most humans are adapted for.
- posterboy 5 years ago
  
  that works on word level, but unfortunately is the syntax too different to full paragraphs adapt (this is German word order). too different is an exaggeration but highlights the problem the GP mentions. por mir zu traduire weird idiomatics sono (sonet?) malade to het capo/tete/Kopf--For me to translate weird idiomatics sounds crazy in the head (that's along the lines of for them to ... and not ... sounds crazy for me, although the former might stem from a corruption of the latter if I'm any judge).
aidenn0 5 years ago

Tangent from your tangent: people are really bad at this for their native language. They usually tend towards using language that would be understandable by a young child.
This biases them towards using simple, concrete words that are used in everyday spoken language, despite the fact that many abstract terms and jargon can be very easy to translate, while many terms that are used in everyday language (most notably common idioms) can be very hard to translate, and which concrete terms are used as analogies for abstract concepts can be very culturally specific.
- yongjik 5 years ago
  
  This, a hundred times.
  To a native English speaker, "I turned the light off" is a perfectly straightforward sentence. To a foreign learner... can you imagine how many different meanings "turn off" can have? (Not to mention "turn" and "light" can both have a gazillion meanings.)
  
  gnulinux 5 years ago
  
  In my native language you "close" the lights to turn them off and "open" the lights to turn them on. It was really hard for me to explain this to my American roommates; one of them kept saying that this is "illogical" because when you turn lights on you're actually closing the circuit, not opening it...
  
  chrismorgan 5 years ago
  
  My favourite example of such variation between languages and locales concerns examinations. The examples that follow are of different English locales; in the case of India, their form of English tends to match the words that would be used in their local languages.
  In different places as a student you might: give an exam; take an exam; write an exam; do an exam. Give and take are opposites, and some of these terms can be used for the examiner as well, again with variation between countries, languages and locales.
  If without context you said “give an exam” in India you’d mean you were the student performing an exam, but if you said “give an exam” in Australia it’d probably sound a little odd but indicate you were the lecturer presenting an exam to your students. Similarly with “write an exam”: in India that sounds perfectly normal and indicates you’re the student, performing the exam; but in Australia it sounds perfectly normal and means you’re the examiner, preparing the exam.
  
  yongjik 5 years ago
  
  In English you turn off the switch, set off the alarm, and switch off the alarm again. And they have the nerve to tell others what's logical? :P
  (Also, I hope you're not turned off by my remarks. Or turned on. That would be awkward.)
  
  scrollaway 5 years ago
  
  Your remarks may not be a turn-on, but as it turns out, I wouldn't turn them down. Or away, unless they turned up again. They set me off. It's clear that it was a set up; in fact it's starting to set in. They're spot-on; but if they weren't, they still wouldn't be spot-off.
  It's fun to think about how we sit down, we sit up, we stand down, we stand up, we lie down, we don't lie up. Except according to Merriam-Webster, who claims that we do when we stay in bed. (https://www.merriam-webster.com/dictionary/lie%20up)
  
  oasisbob 5 years ago
  
  That's a fun set of examples. I think lie up is used naturally sometimes, but I may have just been thinking about it hard enough to fool myself.
  "She had obviously not slept, the events of the previous day being so disturbing. 'You were lying up all night,' I said the next morning."
  
  scrollaway 5 years ago
  
  Ah, interesting. I may have heard it in that context before, hard to tell.
  
  maccam94 5 years ago
  
  You can be laid up sick in bed, but I've never heard "lie up."
  
  yorwba 5 years ago
  
  It kind of makes sense for lamps that are constantly burning and where you regulate the brightness by opening or closing a small door. I guess ancient China must have had such lamps for that usage to develop. On the other hand, I can't find a rationalization for the use of "open" to mean "drive a car".
  
  TheSpiceIsLife 5 years ago
  
  > constantly burning
  The sun has this property, so we open and close the blinds / curtains to regulate light flow.
  Small step from there to electric lights.
  
  SquishyPanda23 5 years ago
  
  When my daughter was first learning to talk, she kept asking us to close and open the lights. We've only spoken English to her, but somehow she decided that lights should be opened and closed.
  I wonder if this is a more natural metaphor.
  
  goodcanadian 5 years ago
  
  Français? I know that even English speakers in Quebec will often "open" and "close" the lights.
  
  netheril96 5 years ago
  
  I don't know what language your parent commenter is speaking, but in Chinese, we also "open" and "close" the light.
  
  _-___________-_ 5 years ago
  
  Italians often say this too. They also "open" and "close" air conditioners where I would turn/switch them on/off.
  
  pmontra 5 years ago
  
  This is for current technology, but maybe your "opening the light" has an older origin. Going back in time we had gas lamps (open gas?), candles, fireplaces. In my language (Italian) we use "accendere" and "spegnere", which we also use with fire, as in light a fire or extinguish a fire.
  
  QuercusMax 5 years ago
  
  I remember some old adventure games (SCUMM engine maybe?) where you picked a verb (from a menu) and then an object. Open and Close were apparently synonyms for Turn On and Turn Off, so you could Close the Light or Turn Off the Door.
  
  baddox 5 years ago
  
  That's very interesting, and especially confusing to anyone with a bit of electronics education. A light switch in a simple circuit needs to close the circuit to allow electricity to flow through the light source.
  
  IggleSniggle 5 years ago
  
  English is my first language, and yet my wife and I tell each other to “open the blinds” and “close the blinds” with far more frequency than “turn on/off the light” (which is on a timer).
  
  gnulinux 5 years ago
  
  In my language you "close the circuit" and when you do so you "open lights" and vice versa. It's not confusing (to me) since "the circuit" and "lights" (in plural) are different words.
  
  princeofwands 5 years ago
  
  An apocryphal story on commercial translation for aircraft maintenance manuals stated how aircraft engineers were instructed to "take out the broken object and place it back in". The original sentence was "to replace the broken object".
  
  dsr_ 5 years ago
  
  Particularly bad: some objects will be fixed by removing them from a socket and then putting them back in. Loose connections, wrong orientation, dirty contacts...
owenversteeg 5 years ago

Like radarsat1 mentioned, after living as an expat you start to optimize speaking for the understanding of the locals. At first you're pretty bad at this, but after understanding the local language more you get much better. I'm both Dutch and American, but when in the Netherlands (speaking English) I optimize my English for easier understanding by Dutch people. (Which is not really necessary, but does help in many situations.) That takes the form of simpler words and speaking slower and clearly, but also more subtle things, like modifying word order, using as many cognates as possible, and avoiding false cognates.
- oasisbob 5 years ago
  
  Some of the things are really subtle.
  My wife grew up in Tamil Nadu, India - and English was her primary language growing up. When we travel, her English is infinitely easier for people to understand than mine, despite how hard I try. (In India, especially, but elsewhere too.) I think, "speak simply, deliberately, clearly, not-quickly" and get blank stares. She can mumble and speak quickly - comprehension is instantaneous.
  Some things are obvious, like your example of modifying word order. Front-loading objects in sentences seems critical for understanding. Hindi and Tamil are both subject-object-verb languages, though I'm not convinced this is the reason it works, and suspect it's more universal than that.
  Her pronunciation also changes dramatically. It doesn't feel so much like speaking broken English as being fluent in a local pidgin. After a while, sentences like "Ma, what you take?" start to sound so natural that I catch myself doing it unconsciously.
Cognitron 5 years ago

Software translation does often use an intermediate language, and one of them (mentioned in the link) is Esperanto, which is an artificial/constructed language.
https://en.m.wikipedia.org/wiki/Pivot_language
Sounds like your idea for a constructed language specifically designed for translation might be a good one. Maybe it could even help improve the accuracy of machine translation?
- ben_w 5 years ago
  
  I’ve been learning Esperanto for a while, and although it’s fun, I really don’t expect good things from having a machine translation system use it as an intermediary. Something else, sure, but Esperanto was designed with (1890s European) human translators in mind, and A.I. works best when you don’t limit it to thinking like you.
bibinou 5 years ago

Look at Simple English: https://en.wikipedia.org/wiki/Simple_English_Wikipedia
sprt 5 years ago

https://en.wikipedia.org/wiki/Interlingual_machine_translati...
oevi 5 years ago

In the field of Technical Writing there are whole books written on how to write for human and machine translation. Main takeaway: consistency and unambiguity.
Controlled languages try to solve this with constraints. There is quite a bit of research around this topic: https://scholar.google.de/scholar?q=controlled+language+tran...
ergothus 5 years ago

I teach part time, and my students are largely ESL, while my non-English skills are decidedly limited. They speak English, but there's a big gap between "mostly-not-broken" and "can handle arbitrary nuance".
I regularly struggle to (1) find language to explain concepts that doesn't require strong fluency in English and (2) to recognize if I've achieved number 1.
thekyle 5 years ago

There are languages like Lojban that are designed to be an ideal intermediary language for machine translation.
https://en.m.wikipedia.org/wiki/Lojban
Ajedi32 5 years ago

You can get a pretty good idea today of how close a specific text is to that ideal by pasting it into Google translate, translating it to a few different languages, then translating it back and comparing the results to your intended meaning.
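
The round-trip check described here can be sketched as a small script. The `translate` argument is a placeholder for whatever translation service you actually call; `toy_translate` below is a hypothetical stand-in so the example runs on its own, and the word-overlap score is only a crude proxy for comparing the result with your intended meaning by eye.

```python
def round_trip(text, translate, source, pivots):
    """Translate text through each pivot language and back, collecting the results."""
    return {lang: translate(translate(text, source, lang), lang, source) for lang in pivots}


def word_overlap(original, returned):
    """Jaccard overlap of lower-cased word sets; a crude proxy for 'meaning survived'."""
    a, b = set(original.lower().split()), set(returned.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0


def toy_translate(text, src, dst):
    """Hypothetical stand-in for a real translation service: it only swaps one idiom."""
    table = {("en", "xx"): {"turn off": "close"}, ("xx", "en"): {"close": "close"}}
    for old, new in table.get((src, dst), {}).items():
        text = text.replace(old, new)
    return text


sentence = "please turn off the light"
for lang, back in round_trip(sentence, toy_translate, "en", ["xx"]).items():
    print(lang, repr(back), round(word_overlap(sentence, back), 2))
```

A low score flags sentences worth rewording before sending them through a real translator; a high score does not prove the meaning survived, so a human read of the round-tripped text is still the actual test.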
NegativeLatency 5 years ago

https://xkcd.com/1133/

sideral 5 years ago

The license is CC non-commercial. Does anyone here know if this means that it cannot be used to train models that will be used commercially?

jahewson 5 years ago

The data they provide is a test set (not a training set) and is from https://tatoeba.org which is in turn under the permissive CC-BY license. You can use that data for anything you like, but you can't use the code in the GitHub repository for commercial purposes - that includes running it to train models.
Academic code often used to be under licenses like this, though it's much less common now.

etiam 5 years ago

Wish they'd stayed away from LASER. That acronym's already taken...

fernly 5 years ago

They did redundant work to make it LASER (Language-Agnostic SEntence Representations). If they'd just gone with initials they'd have had LASR, which is a cooler-looking acronym anyway.
jholloway7 5 years ago

Are you referring to the Alan Parsons Project?
- owenversteeg 5 years ago
  
  They're referring to the term laser itself, which was originally "light amplification by stimulated emission of radiation" :)
  
  throwaway427 5 years ago
  
  And your parent is referring to this clip:
  https://www.youtube.com/watch?v=2Duj2oZIC8U
- jszymborski 5 years ago
  
  I think they're referring to the Townes project ;)
  https://encyclopedia2.thefreedictionary.com/Light+Amplificat...
hirundo 5 years ago

I propose "debabelizer".

nahh 5 years ago

If we get babelfish from Facebook, was it worth it?

DoctorOetker 5 years ago

if we didn't have copyright, we'd eventually or already have it too... so IMHO, no

vladislav 5 years ago

"LASER achieves these results by embedding all languages jointly in a single shared space (rather than having a separate model for each)". There could be a good reason for why the mutual embedding of several languages works better than individual, beyond the extra data. If human languages share some minimal representation (universality so to say), training on multiple languages may be required to extract it with today's techniques, since training on just one language is bound to overfit to its particulars.

stephanimal 5 years ago

That graphic of the language families seems to misspell Estonian (as Estinain) and Finnish (as Finish) ? Seems like an odd oversight for such a project.

jechamt 5 years ago

I came here to post this (and add little to any substantial discussion). I'm glad to see someone else beat me to it! Also: Slovak (Solvak) and Slovenian? (Solvene). There were so many I thought maybe I misunderstood how they were being represented in that graphic.

MikusR 5 years ago

It even works on Latavian language (as shown on the top animation)

ahurmazda 5 years ago

I have been pleasantly surprised by FB's suggested translation even when the messages are written in seemingly (to me) complicated transliteration of Bengali.

vectorEQ 5 years ago

pretty nice release. just note that i find it a bit silly to refer to the berber language as if it's 1 language. it's a group of languages, and moreover they are phonetic, so the text you train on can vary greatly between the languages and even the writers on how you would write it.

bregma 5 years ago

My hovercraft is full of eels.

yesenadam 5 years ago

That was one of the first phrases I learnt in Spanish: Mi aerodeslizador está lleno de anguilas.
p.s. Is there any truth to the story that many decades ago, an early machine translator, going from English to Chinese and back again, rendered "Out of sight, out of mind" as "Invisible idiot"?

oska 5 years ago

I find it interesting that this page has already been 'snapshotted' on the Internet Archive 24 times [1], less than a day after it appeared. Is this because, like me, people are wary of visiting any facebook domain? Or is it because people consider it an important research result? (Obviously it can also be both).

[1] https://web.archive.org/web/*/https://code.fb.com/ai-researc...

munk-a 5 years ago

By default I'm quite wary of clicking on any links to Facebook's domain. It is a bit silly, since their tendrils cover pretty much every corner of the web, but I still hesitate.

mlamat 5 years ago

Solvak, Solvene???