by goy on 8/9/22, 1:46 PM with 3 comments
by PaulHoule on 8/9/22, 2:02 PM
https://metatext.io/datasets-list/translation-task
They might not have exactly what you're looking for, but you'll probably need something similar to one of the datasets they list.
Most systems now are trained on parallel corpora. For instance, that site lists a collection of 30,000 sentence pairs in English and Japanese. If you've got enough training examples, you don't need a dictionary, a grammar specification, or anything else. You do need a lot of text, though.
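To make "parallel corpus" concrete: it is just a list of aligned sentence pairs, one sentence in the source language and its translation in the target language. A minimal sketch, assuming a hypothetical tab-separated `source<TAB>target` file format (real datasets vary, so check each one's layout), of turning raw lines into training pairs:

```python
from typing import List, Tuple


def load_parallel_corpus(lines: List[str]) -> List[Tuple[str, str]]:
    """Turn 'source<TAB>target' lines into (source, target) sentence pairs.

    Assumes a hypothetical tab-separated format; lines without exactly one
    tab, or with an empty side, are skipped rather than guessed at.
    """
    pairs: List[Tuple[str, str]] = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # malformed line: no tab, or too many tabs
        src, tgt = parts[0].strip(), parts[1].strip()
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    raw = ["Hello.\tこんにちは。", "Thank you.\tありがとう。\n", "no tab here"]
    for src, tgt in load_parallel_corpus(raw):
        print(src, "->", tgt)
```

Each pair is one training example; the model learns the mapping purely from many such pairs, which is why the volume of aligned text matters more than any hand-written grammar.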
by mikewarot on 8/9/22, 2:35 PM
The amount of text a machine needs to grind through millions of times to tease out the shape of a language doesn't sound like something you have. If you have access to native speakers' time, it might be possible to build tools that let them interactively correct the most "off" parts of the model.
by he11ow on 8/9/22, 8:29 PM
[1] https://www.youtube.com/playlist?list=PLtmWHNX-gukKocXQOkQju...
[2] https://github.com/huggingface/notebooks/blob/main/examples/...