by julien_c on 1/13/20, 4:40 PM with 42 comments
by julien_c on 1/13/20, 5:09 PM
Main features:
- Encodes 1GB in 20 seconds
- Provides BPE/Byte-Level-BPE/WordPiece/SentencePiece...
- Computes an exhaustive set of outputs (offset mappings, attention masks, special token masks...)
- Written in Rust with bindings for Python and Node.js
GitHub repository and docs: https://github.com/huggingface/tokenizers/tree/master/tokeni...
To install:
- Rust: https://crates.io/crates/tokenizers
- Python: pip install tokenizers
- Node: npm install tokenizers
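For readers new to BPE, here is a toy, pure-Python sketch of the idea behind the first algorithm in the feature list: learn merge rules by repeatedly fusing the most frequent adjacent symbol pair, then tokenize a word by replaying those merges and recording character offsets (the "offset mappings" mentioned above). This is an illustrative assumption, not the library's Rust implementation or its Python API; `learn_bpe`, `merge_symbols`, and `encode_with_offsets` are hypothetical names, and the word list is a made-up toy corpus.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

def merge_symbols(symbols: List[str], pair: Pair) -> List[str]:
    """Replace each left-to-right, non-overlapping occurrence of `pair` with one merged symbol."""
    out: List[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_bpe(words: List[str], num_merges: int) -> List[Pair]:
    """Greedily learn merge rules: repeatedly merge the most frequent adjacent pair."""
    # Each distinct word starts as a tuple of characters, weighted by its frequency.
    vocab: Dict[Tuple[str, ...], int] = dict(Counter(tuple(w) for w in words))
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)
        new_vocab: Dict[Tuple[str, ...], int] = {}
        for symbols, freq in vocab.items():
            key = tuple(merge_symbols(list(symbols), best))
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges

def encode_with_offsets(word: str, merges: List[Pair]) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Replay the learned merges in order, then record (start, end) character offsets per token."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    offsets = []
    start = 0
    for s in symbols:
        offsets.append((start, start + len(s)))
        start += len(s)
    return symbols, offsets

if __name__ == "__main__":
    corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    merges = learn_bpe(corpus, num_merges=2)
    print(merges)  # [('e', 's'), ('es', 't')]
    print(encode_with_offsets("newest", merges))
    # (['n', 'e', 'w', 'est'], [(0, 1), (1, 2), (2, 3), (3, 6)])
```

Because every token keeps its (start, end) span, you can always map a subword back to the exact slice of the input string, which is what makes offset mappings useful for tasks like span extraction.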
by mark_l_watson on 1/13/20, 6:33 PM
I had my own NLP libraries for about 20 years. The simple ones were examples in my books; the more complex and less understandable ones I sold as products, and they pulled in lots of consulting work.
I have completely given up developing my own NLP tools. I generally use the Python bindings for spaCy, huggingface, TensorFlow, and Keras, via the Hy language (hylang), a Lisp that sits on top of Python. I am retired now, but my personal research is in hybrid symbolic and deep learning AI.
by screye on 1/13/20, 7:08 PM
They seem to have found the ideal balance of software engineering capability and neural network knowledge, in a team of highly effective and efficient employees.
I don't know what their monetization plan is as a startup, but it is 100% undervalued at $20 million, and that is based on the quality of the team alone. Now, if only I could figure out how to put a few thousand dollars into a Series A startup as just some guy.
by ZeroCool2u on 1/13/20, 5:50 PM
by LunaSea on 1/13/20, 6:00 PM
Is this possible using HuggingFace (or another word embedding based library)?
I know that there are some simple heuristics, like merging sequences of noun tokens to extract n-grams, but they are too simplistic and very error-prone.
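To make the "too simplistic" point concrete, here is a minimal sketch of the naive heuristic described above: join every maximal run of consecutive noun-tagged tokens into one candidate n-gram. The POS tags are assigned by hand and assumed to follow the Universal POS tag names (`NOUN`, `PROPN`); in practice they would come from a tagger. `merge_noun_runs` and `min_len` are hypothetical names, and this is not a recommended solution, just the baseline being criticized.

```python
from typing import List, Tuple

# Tags treated as "noun-like". Assumption: Universal POS tag names,
# assigned by hand here instead of by a real POS tagger.
NOUN_TAGS = {"NOUN", "PROPN"}

def merge_noun_runs(tagged: List[Tuple[str, str]], min_len: int = 2) -> List[str]:
    """Join each maximal run of consecutive noun-tagged tokens into one candidate n-gram."""
    phrases: List[str] = []
    run: List[str] = []
    for token, tag in tagged:
        if tag in NOUN_TAGS:
            run.append(token)
            continue
        # A non-noun token ends the current run.
        if len(run) >= min_len:
            phrases.append(" ".join(run))
        run = []
    if len(run) >= min_len:  # flush a run that ends the sentence
        phrases.append(" ".join(run))
    return phrases

if __name__ == "__main__":
    sentence = [
        ("the", "DET"), ("machine", "NOUN"), ("learning", "NOUN"), ("model", "NOUN"),
        ("runs", "VERB"), ("on", "ADP"), ("New", "PROPN"), ("York", "PROPN"),
        ("servers", "NOUN"),
    ]
    print(merge_noun_runs(sentence))
    # ['machine learning model', 'New York servers']
```

The output shows the failure mode: "New York servers" fuses a place name with an unrelated head noun, and "machine learning" is never extracted on its own because the run only breaks at a non-noun token.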
by useful on 1/13/20, 6:42 PM
SentencePiece must make it possible to shrink the memory requirements of your indexes for search and typeahead.
by hnaccy on 1/13/20, 5:50 PM
by orestis on 1/13/20, 6:45 PM
by echelon on 1/13/20, 6:27 PM
What problems can you solve with NLP? Sentiment analysis? Semantic analysis? Translation?
What cool problems are there?
by m0zg on 1/14/20, 7:15 AM
by virtuous_signal on 1/13/20, 4:45 PM
by manojlds on 1/13/20, 9:00 PM
by rsp1984 on 1/13/20, 5:41 PM
by tarr11 on 1/13/20, 8:26 PM