by dbieber on 12/19/22, 1:59 AM with 8 comments
by tgbugs on 12/19/22, 7:11 AM
If all you want is search and retrieval, that might be ok. Otherwise, you are right back where you started with multiple implicit ontologies that individuals and groups struggle to reconcile.
However, there is a much deeper problem faced by ontologies, whether they are implicit or explicit, trapped in an LLM, or written in OWL: ontologies are best thought of as collections of hypotheses, which must be testable and tested in order to be useful and verifiable. In the life sciences we are only now starting to think about how to get formal ontologies into the loop for validation based on observable data (beyond, say, pointing to the literature).
LLMs can generate so much garbage that validating any latent ontology they may contain is likely to be both absolutely critical for them to be remotely useful and extremely difficult and labor intensive, which brings them right back down to reality when it comes to the cost of validation and verification.
In the end, formal manual ontologies look hard because they tend to put the validation of the model first. LLM pseudo ontologies might look easy, but the cost of validating and verifying them will likely be almost exactly the same in the end (if not worse).
The reason is that the real cost lies in reconciling a model with reality: having strong control over what constitutes valid data about reality, or making the measurements on the real world needed to verify a given statement.
LLMs might help when it comes to coverage of a domain, but if that coverage comes with 80% of all statements being demonstrably false, leaving users to work out which 20% are true, then the coverage probably isn't worth it.
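To make the validation point concrete, here is a minimal sketch (with made-up statements and names, not real data) of what putting the model's hypotheses to the test looks like in practice: every LLM-extracted statement is checked against an independently curated set of observations, and anything unsupported lands in a manual-review queue, which is where the real cost shows up.

    # Hypothetical example: treat each extracted (subject, predicate, object)
    # statement as a hypothesis to be checked against curated observations.
    llm_statements = {
        ("aspirin", "treats", "headache"),
        ("aspirin", "treats", "influenza"),   # plausible-sounding but unsupported
        ("ibuprofen", "is_a", "nsaid"),
    }

    # Stand-in for independently verified, observation-backed facts
    verified = {
        ("aspirin", "treats", "headache"),
        ("ibuprofen", "is_a", "nsaid"),
    }

    supported = llm_statements & verified
    needs_review = llm_statements - verified   # the labor-intensive part

    print(f"supported: {len(supported)}/{len(llm_statements)}")
    print(f"needs manual review: {sorted(needs_review)}")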
by KingOfCoders on 12/19/22, 6:06 AM
We went into large companies to solve their problem: different ontologies in every department (e.g. engineering vs. marketing vs. sales).
Mapping didn't help because their fundamental world views were different.
We failed.
(My wiki engine made it into Atlassian Confluence though, and Confluence users had to put up with my horrible {...} wiki macro syntax for years.)
by kovezd on 12/19/22, 5:29 AM
There used to be fancier applications of ontologies, like question answering, but I agree with the author that LLMs could replace most of them. The more interesting question is how to auto-generate ontologies.
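One possible shape of an answer, sketched below: prompt the model to emit candidate (subject, relation, object) triples from raw text, parse them, and treat the result as a draft ontology that still needs validation. The call_llm function is a hypothetical placeholder for whatever model API you use, and the prompt format and parsing are assumptions, not a fixed recipe.

    # Rough sketch of auto-generating ontology candidates with an LLM.
    # call_llm is a hypothetical placeholder, not a real API.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model/API of choice")

    def extract_candidate_triples(text: str) -> list[tuple[str, str, str]]:
        prompt = (
            "Extract the concepts and relationships in the text below as one "
            "'subject | relation | object' triple per line.\n\n" + text
        )
        raw = call_llm(prompt)
        triples = []
        for line in raw.splitlines():
            parts = [p.strip() for p in line.split("|")]
            if len(parts) == 3 and all(parts):
                triples.append((parts[0], parts[1], parts[2]))
        return triples  # candidates only; they still have to be validated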
by xtiansimon on 12/19/22, 12:43 PM
“These models can understand and extract relevant concepts and relationships from unstructured text…”
Models “understand”? Is the author being cheeky? Why so careful, and then drop such a blatant anthropomorphism? Sheesh.
Writing is hard.
by coretx on 12/19/22, 11:59 AM
by 082349872349872 on 12/19/22, 6:38 AM
Edit: the same idea applies to delta'ing two databases