by shriphani on 2/4/20, 1:01 AM with 5 comments
by pattusk on 2/4/20, 3:46 AM
Instead, it looks like this just performs language detection. Is there a significant advantage to that method over reusing one of the many existing open source solutions based on simpler models, such as [1], and retraining it with a corpus that includes the language(s) that weren't supported? (A sketch of what that retraining might look like is below.) You offer a comparative table for FastText & GCP; how do you explain FastText's abysmal precision on English? The value just seems way too low not to be a bug of some sort.
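For reference, a minimal sketch of the kind of retraining the comment alludes to, using fastText's supervised mode. The training-file name, hyperparameters, and example sentences here are illustrative assumptions, not anything from the post; the assumed input format is fastText's standard one, with each line prefixed by a __label__xx language tag:

    import fasttext  # pip install fasttext

    # Hypothetical training file: one sentence per line, prefixed with a
    # language label, e.g. "__label__en This is an English sentence."
    # A previously unsupported language is added simply by including
    # labelled examples for it in the corpus.
    model = fasttext.train_supervised(
        input="langid_train.txt",  # illustrative path
        minn=2, maxn=4,            # character n-grams, useful for language ID
        dim=16,                    # small vectors keep the model compact
        loss="hs",                 # hierarchical softmax scales to many labels
    )

    # Predict the most likely language of a sentence.
    labels, probs = model.predict("Bonjour tout le monde")
    print(labels[0], probs[0])     # e.g. ('__label__fr', 0.98)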
by nl on 2/4/20, 3:17 AM
The authors knew this, because they compare it in the paper, but they don't call it out in the post!
Edit: just realised the link on popular "open source" goes to the FastText post I linked below. Still, I think it would have been good to note this explicitly!