by pesenti on 7/6/22, 7:52 PM with 159 comments
by Etheryte on 7/6/22, 9:17 PM
by pesenti on 7/6/22, 8:35 PM
by jkw on 7/6/22, 10:21 PM
As well as in the research paper: https://research.facebook.com/publications/no-language-left-...
by mikewarot on 7/6/22, 9:00 PM
We're at a point where it's now possible to determine the shape of every language, provided there are enough speakers of the language left who are both able and willing to help.
<Snark> Once done, Facebook can then commodify their dissent, and sell it back to them in their native language. </Snark>
by Groxx on 7/7/22, 5:38 AM
>Translating Wikipedia for everyone
Hmmm.
While there is very definitely utility in doing things like this, I do kinda fear "poisoning the well"-like effects of feeding (even partially-) AI-generated-data into extremely common AI-data-sources.
There's some info on it in a blog post[1] and the MediaWiki "Content translation" page[2], but does anyone know of any studies on the quality of the translations produced? I can absolutely see it being a huge time-saver for people who are essentially fluent in both (there's a lot of semi-mechanical drudgery in translating stuff like this that could be mostly eliminated)... but people are pretty darn good at choosing the easy option of trusting whatever they're given rather than being as careful as they should be. It kinda feels like it runs the risk of passively encouraging people to trust the machine's choice over their own, as long as it isn't obviously nonsense, and the cumulative effect could be rather large after a while.
[1]: https://diff.wikimedia.org/2021/11/16/content-translation-to...
by jw4ng on 7/6/22, 9:06 PM
by kgeist on 7/7/22, 5:23 AM
>The affinity of languages allows one common model to be trained for their translation. That is, “under the hood” of the translator, the same neural network translates into Russian from Yakut, Tatar, Chuvash and other Turkic languages. This approach is called many-to-one, that is, "from many languages into one." This is a more versatile tool than the classic bilingual neural network. And most importantly, it is the many-to-one approach that makes it possible to use knowledge about the structure and vocabulary of the Turkic languages, learned on the rich material of Turkish or Tatar, to translate languages like Chuvash or Yakut, which are less “resource-rich”, but no less important for the cultural diversity of the planet.
>In order to create a unified model for translating Turkic languages, Yandex developed a synthetic common script. Any Turkic language is translated into it, so that, for example, the Tatar “дүрт” (“four”) written in Cyrillic becomes similar to the Turkish dört (“four”), not only from the point of view of a person, but also at the level of similarity of lines for a computer.
This way they added support for Turkic and Uralic languages, which are very underrepresented on the Internet. But I don't know what the quality of their translation is: even though I live in a region where Mari (an indigenous Uralic language) is spoken and my wife is Mari, neither of us, sadly, speaks the language.
[0] https://techno-yandex-ru.translate.goog/machine-translation/...
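To make the common-script idea from the quote a bit more concrete, here is a toy sketch. The mapping table is purely hypothetical (it is not Yandex's actual script, which the quote doesn't describe in detail); it only shows how mapping Cyrillic letters onto shared symbols makes cognates look alike at the string level.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical mapping, invented for illustration only:
# a few Tatar Cyrillic letters mapped to Latin-based symbols.
CYRILLIC_TO_COMMON = {
    "д": "d",
    "ү": "ü",
    "ө": "ö",
    "р": "r",
    "т": "t",
}


def to_common_script(word: str) -> str:
    """Lowercase, normalize, and map known Cyrillic letters to the shared script.

    Characters without an entry in the table pass through unchanged.
    """
    word = unicodedata.normalize("NFC", word.lower())
    return "".join(CYRILLIC_TO_COMMON.get(ch, ch) for ch in word)


def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


tatar = "дүрт"    # "four", written in Cyrillic
turkish = "dört"  # "four", written in Latin script

raw_score = similarity(tatar, turkish)
common_score = similarity(to_common_script(tatar), to_common_script(turkish))
```

With this toy table the raw strings share no characters, so `raw_score` is 0.0, while the mapped forms ("dürt" vs "dört") differ in one letter and `common_score` is 0.75. A real system would presumably need a much more careful mapping per language.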
by microtherion on 7/6/22, 10:08 PM
So you have a language with some economic opportunity (a few million speakers in a fairly wealthy country) but no clearly defined written interface, and an ambivalent attitude of many speakers towards the very idea of writing the language.
by otreblatercero on 7/7/22, 12:09 AM
by albertzeyer on 7/6/22, 10:13 PM
The Facebook paper has some direct comparison to that work.
by yellowapple on 7/7/22, 12:13 AM
by btheshoe on 7/6/22, 8:07 PM
by labrador on 7/6/22, 10:40 PM
"Skills required: United Nations translators are required to have a perfect command of their main language and an excellent knowledge of, in most cases, two other official languages"
by thamer on 7/7/22, 8:42 AM
Full story: https://abcnews.go.com/Business/wireStory/kill-facebook-fail...
These were submitted to test Facebook's systems, because there's a good reason not to trust their promises on this front. Facebook was used extensively to propagate hate speech in Myanmar during the crisis of 2017, with their moderation tools and hate speech detection system letting through a ton of hateful content with real-world consequences, in the course of an actual ethnic cleansing campaign.
Other references: "Facebook Admits It Was Used to Incite Violence in Myanmar" https://www.nytimes.com/2018/11/06/technology/myanmar-facebo... (2018)
"Violent hate speech continues to thrive on Facebook in Myanmar, AP report finds" https://www.cbsnews.com/news/myanmar-facebook-violent-hate-s... (9 months ago)
by vjerancrnjak on 7/6/22, 9:27 PM
I see the mixture model is ~ 300 GB and was trained on 256 GPUs.
I assume distilled versions can easily be run on one GPU.
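A rough back-of-the-envelope sketch of why that seems plausible. The parameter count for the distilled model below is a made-up placeholder, not a figure from the release, and the estimate covers weights only (no activations, no optimizer state, and a checkpoint file may hold more than raw weights).

```python
def weights_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def params_from_size(size_gb: float, bytes_per_param: int) -> float:
    """Upper bound on parameter count if the whole file were weights."""
    return size_gb * 1e9 / bytes_per_param


# ~300 GB checkpoint, assuming 4 bytes per parameter (fp32): at most ~75B parameters.
max_params_fp32 = params_from_size(300, 4)

# Hypothetical distilled model with 1B parameters stored in fp16 (2 bytes each).
distilled_gb = weights_size_gb(1e9, 2)
```

Under these assumptions the hypothetical distilled model needs about 2 GB for its weights, which fits on a single consumer GPU, whereas the full checkpoint clearly does not.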
by kwhitefoot on 7/6/22, 8:39 PM
by Tabular-Iceberg on 7/6/22, 9:19 PM
We shrug off all the little quirks of machine-translated text because it usually gets the point across, and we recognize them as quirks because most of what we read was written by real people with no such quirks. But when most of what you read contains those quirks, I fear they will quickly become the standard way of writing and even speaking in those languages.
by TaupeRanger on 7/6/22, 8:53 PM
by account42 on 7/7/22, 8:44 AM
> These cookies are required to use Meta Products. They’re necessary for these sites to work as intended.
What cookies does Facebook "need" to serve a simple article?
by LtWorf on 7/6/22, 10:05 PM
by NoInkling on 7/7/22, 2:15 AM
by enos_feedler on 7/6/22, 10:13 PM
by schoen on 7/7/22, 12:43 AM
(Edit: and speech-to-text models.)
by _nalply on 7/7/22, 11:43 AM
Did the people at Meta think about the Signed Languages of the Deaf?
I didn't find a mention. Even Ctrl-F deaf didn't yield anything.
by langsoul-com on 7/7/22, 12:37 PM
by pdonis on 7/6/22, 11:26 PM
by zzzeek on 7/6/22, 11:12 PM
by bvanderveen on 7/6/22, 9:40 PM
Understanding foreign culture is about reading automated translations of online comments into your native language. It has nothing to do with putting the effort into learning a language and understanding the nuances and current events and issues of the culture it embeds.
The ESL (English as a single language) speakers over at Facebook don't even need to understand foreign cultures, because they already know everyone in the world needs to spend their lives staring into the Metaverse. So grateful that they are working on the world's fattest pipeline for exporting Anglophone culture to every corner of the planet!