by miket on 1/17/20, 12:27 AM with 7 comments
by miket on 1/17/20, 1:18 AM
NLP is now good enough that we can explicitly measure how well a system reads text in terms of the knowledge extracted from it. This task is called Knowledge Base Population, and we've released KnowledgeNet, the first reproducible benchmark for this task, along with an open-source state-of-the-art baseline.
Direct link to the GitHub repo: https://github.com/diffbot/knowledge-net
EMNLP paper: https://www.aclweb.org/anthology/D19-1069.pdf
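For readers unfamiliar with the task: a KBP system's output is typically a set of (subject, property, object) facts, scored against gold-standard facts with precision, recall, and F1. A minimal sketch of that scoring idea (this is not the official KnowledgeNet scorer, and the property name and matching scheme here are illustrative assumptions):

```python
# Illustrative KBP-style scoring: exact-match F1 over fact triples.
# (Hypothetical example; the real benchmark defines its own matching rules.)

def fact_f1(predicted, gold):
    """F1 between two collections of (subject, property, object) triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [("Gwilym Lloyd George", "CHILD_OF", "David Lloyd George")]
pred = [("Gwilym Lloyd George", "CHILD_OF", "David Lloyd George"),
        ("Gwilym Lloyd George", "SPOUSE", "David Lloyd George")]  # one wrong fact

print(round(fact_f1(pred, gold), 3))  # -> 0.667 (precision 0.5, recall 1.0)
```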
by g82918 on 1/17/20, 2:51 AM
by bhl on 1/17/20, 9:35 AM
by nl on 1/17/20, 3:01 AM
This is moderately surprising.
In question answering (QA) style tasks (SQuAD, SQuAD 2.0) we see state-of-the-art models approach human performance. QA is similar to Knowledge Base Population in the sense that the answers are usually extracted directly from the text.
I'd imagine there is potential for fairly rapid improvement on this (Knowledge Base Population) task.
by sdan on 1/17/20, 1:28 AM