by bussetta on 10/17/17, 5:25 PM with 153 comments
by sgentle on 10/17/17, 11:50 PM
Humans work well with ambiguity and context. When your coworker says "Bob's birthday is this weekend", you know she means her husband Bob, not Bob from accounting who nobody likes. And you even prefer that system to having an unambiguous human identifier, even a friendly one like "Bob-4592-daring-weasel-horseradish".
Machines, on the other hand, hate ambiguity and context. Every bit of context is an extra bit of state that has to be stored somewhere, and now all your results are actually statistical guesses - how inelegant!
In the early days of computing, there was no separation between the internals of the machine and its interface. If you worked on a computer, you were as much the mechanic as the driver. We got used to usernames, filenames, and hostnames because they were a decent compromise; they were meaningful enough to humans, and unambiguous enough for machines, so we could use them as a kind of human-computer pidgin.
But we don't need them anymore, and they were never really very good at either job anyway. Google's (probably accidental) discovery was that we were using the web wrong. Everyone was building web directories and portals because they thought that URLs weren't discoverable, but the real problem was that they weren't usable. Search was the first human interface to the web.
So Google's going to kill the URL, Facebook's going to kill the username, and someone (apparently not Microsoft) is going to kill the filename. There'll be much wailing and gnashing of teeth from the old guard while it happens, but someday our grandchildren will grow up never having to memorise an arbitrary sequence of characters for a computer, and I think that's a future to look forward to.
by yathern on 10/17/17, 6:30 PM
This allows for easy URL readability, while also having a unique ID.
In the context of this post (the library example) that would look like
library.com/books/1as03jf08e/Moby-Dick/
by nayuki on 10/17/17, 7:57 PM
This problem about naming URLs is also present in file system design. File names can be short, meaningful, context-sensitive, and human-friendly; or they can be long, unique, and permanent. For example, a photo might be named IMG_1234.jpg or Mountain.jpg, or it can be named 63f8d706e07a308964e3399d9fbf8774d37493e787218ac055a572dfeed49bbe.jpg. The problem with the short names is that they can easily collide, and often change at the whim of the user. The article highlights the difference between the identity of an object (the permanent long name) versus searching for an object (the human-friendly path, which could return different results each time).
For decades, the core assumption in file system design has been to provide hierarchical paths that refer to mutable files. A number of alternative systems have sprouted which upend this assumption by making all files immutable, addressed by hash, and searchable through other mechanisms. Examples include Git version control, BitTorrent, IPFS, Camlistore, and my own unnamed proposal: https://www.nayuki.io/page/designing-a-better-nonhierarchica... . (Previous discussion: https://news.ycombinator.com/item?id=14537650 )
Personally, I think immutable files present a fascinating opportunity for exploration, because they make it possible to create stable metadata. In a mutable hierarchical file system, metadata (such as photo tags or song titles) can be stored either within the file itself, or in a separate file that points to the main file. But "pointers" in the form of hard links or symlinks are brittle, hence storing metadata as a separate file is perilous. Moreover, the main file can be overwritten with completely different data, and the metadata can become out of date. By contrast, if the metadata points to the main data by hash, then the reference is unambiguous, and the metadata can never accidentally point to the "wrong" file in the future.
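To make the "metadata points to the data by hash" idea concrete, here is a minimal sketch. It assumes SHA-256 as the hash and a toy in-memory store; the names `ContentStore`, `put`, and `tag` are made up for illustration and are not taken from any of the systems listed above.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return the SHA-256 hex digest that serves as the blob's permanent name."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy in-memory store (hypothetical): immutable blobs keyed by hash."""

    def __init__(self):
        self.blobs = {}     # hash -> bytes
        self.metadata = {}  # hash -> dict of tags

    def put(self, data: bytes) -> str:
        key = content_id(data)
        self.blobs[key] = data  # identical bytes always land on the same key
        return key

    def tag(self, key: str, **tags: str) -> None:
        if key not in self.blobs:
            raise KeyError(key)
        self.metadata.setdefault(key, {}).update(tags)


store = ContentStore()
original = store.put(b"mountain photo bytes")
store.tag(original, title="Mountain")

# "Overwriting" the photo produces a new key; the old metadata stays attached
# to the old bytes and cannot silently describe the new ones.
edited = store.put(b"different bytes entirely")
```

Because the key is derived from the bytes, there is no way to change the data without also changing what the metadata would have to point at.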
by wyndham on 10/17/17, 6:37 PM
by andrewstuart2 on 10/17/17, 7:35 PM
Natural keys, meaning entity identification by some unique combination of properties, are hard to get right (oops, your email address isn't unique, or it's a mailing list) and a pain to translate into a name (`where x = x' and y = y' and z = z'`, or `/x/x'/y/y'/z/z'`, etc.).
Surrogate keys, on the other hand, make it easy to identify one and only one object forever, but only so long as everybody uses the same key for the same thing.
And as mentioned in the article, the most appropriate is usually both. Often you don't have the surrogate key, so you need to look up by the natural key, but when you do have the surrogate key, it's fastest and most likely to be correct if you use that in your naming scheme.
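A rough sketch of the "use both" approach, assuming email as the natural key and a random UUID as the surrogate key. `UserDirectory` and its methods are hypothetical names used only for this example.

```python
import uuid


class UserDirectory:
    """Hypothetical directory: look up by natural key, identify by surrogate key."""

    def __init__(self):
        self.by_id = {}     # surrogate key -> record
        self.by_email = {}  # natural key (lowercased email) -> surrogate key

    def add(self, email, name):
        user_id = str(uuid.uuid4())  # surrogate key, never changes
        self.by_id[user_id] = {"email": email, "name": name}
        self.by_email[email.lower()] = user_id
        return user_id

    def find_id(self, email):
        """Natural-key lookup: returns the surrogate key, or None if unknown."""
        return self.by_email.get(email.lower())

    def change_email(self, user_id, new_email):
        record = self.by_id[user_id]
        del self.by_email[record["email"].lower()]
        record["email"] = new_email
        self.by_email[new_email.lower()] = user_id


directory = UserDirectory()
uid = directory.add("Ishmael@example.com", "Ishmael")
directory.change_email(uid, "ishmael@pequod.example")
```

After the email changes, anything that stored `uid` still finds the same record, while anything that stored the old email has to search again.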
by jey on 10/17/17, 8:17 PM
There are only two hard things in Computer Science: cache invalidation and
naming things.
-- Phil Karlton
https://martinfowler.com/bliki/TwoHardThings.html
by bo1024 on 10/17/17, 11:55 PM
The article is largely based on a misguided premise: the idea that URLs should be conceptualized as either names or identifiers. URLs are neither: they are addresses of web pages. The things located at the URL may have names or identifiers, but by design of the web the stuff located at an address is mutable while the address is immutable.
This is an important point because it breaks the analogies to books or bank accounts. A physical copy of Moby Dick is a thing that may be located at a given address, or not. The work of fiction "Moby Dick" has an ISBN number, but the ISBN number is metadata, not an address. A bank account number is also metadata, not an address.
So I get the feeling that URLs should be conceptualized as addresses first and foremost. This isn't a magic bullet for the problem the blog post addresses (how to design URLs) but I think it gives some perspective:
* If the "thing" at the URL will always be conceptually the same "thing", but its name or other metadata may change, it makes sense to assign that thing a unique identifier and use this as part of the URL. (Because the thing with this ID will always be found at this address.)
* If the name of the stuff located at the URL is never going to change, it makes sense to use the name as part of the URL. (Because the stuff with this name will always be found there.)
* "Search results" as discussed in the blog post are a special case of the previous point: if a URL will always contain search results for a certain query, it makes sense to use the name of the query as part of the URL.
* There are also URLs that fall outside the name or identifier paradigms. http://www.ycombinator.com/about/ is the address of a bunch of stuff, which is not necessarily a single coherent thing with either an ID number or a name, but is a very reasonable address at which some content may be located.
Maybe this is all obvious, but to me it really helps to think about the issue this way, whereas the blog post confused some things for me, so I thought I'd share.
by spiralpolitik on 10/17/17, 7:39 PM
The author appears to have forgotten about 3xx redirection codes which were intended to solve that very problem.
by tejtm on 10/17/17, 8:13 PM
Abstract
In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
claimer: I am one of the many authors.
by bvrmn on 10/17/17, 8:22 PM
by jgrodziski on 10/17/17, 9:13 PM
I started changing my way of looking at identity by reading the rationale of clojure (https://clojure.org/about/state#_working_models_and_identity) -> "Identities are mental tools we use to superimpose continuity on a world which is constantly, functionally, creating new values of itself."
The timeless book "Data and reality" is also priceless: https://www.amazon.com/Data-Reality-Perspective-Perceiving-I....
More specifically concerning the article, I do agree with the point of view of the author distinguishing access by identifier and hierarchical compound name better represented as a search. On the id stuff, I find the amazon approach of using URN (in summary: a namespaced identifier) very appealing: http://philcalcado.com/2017/03/22/pattern_using_seudo-uris_w.... And of course, performance matters concerning IDs and UUID: https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-ca....
Happy data modeling :)
EDIT: - add an excerpt from the clojure rationale
by lwansbrough on 10/17/17, 6:44 PM
For example, we ingest gamertags and IDs from players of Xbox Live, PSN, Steam, Origin, Battle.net, etc. - each have their own requirements in terms of what is allowed in a username, and even whether or not they're unique. Often you can't ensure a user is unique by their gamertag alone. You can't even ensure uniqueness based on gamertag and platform name. Reality is that search is almost always required in these cases, and that's why we've implemented search in the way described in this article, with each result pointing to a GUID representing a gamer persona.
by jlg23 on 10/17/17, 7:36 PM
by HumanDrivenDev on 10/18/17, 2:51 AM
by buro9 on 10/18/17, 8:42 AM
Books in a library are seldom renamed, if ever. The named URL would be almost as permanent as the canonical URL.
However in their earlier example of a bank account, a personal account name is typically the account holder name and the type of account, and both of these could be subject to change as a result of marriage, death, or the change in products offered by a bank. Even then, the rate of change is low.
A better example that the author could have (should have?) used is that of a news website where the article title may change frequently and yet there is a desire to make the link indicate the type of content at the destination... this is the real crux of the issue.
On a news site a canonical identifier driven URL may be correct... but does not sell or communicate the story behind the link and the link is likely to be shared without context. Sure you may see `example.com/news/a49a9762-3790-4b4f-adbf-4577a35b1df7` but this could be any news... it is far less obvious what is behind the link than the banking example as diversity in news stories is huge.
Yet the named URL would likely fail too, as once created and shared it should not mutate or at least should remain working... and yet the story title is likely to be sub-edited multiple times as news evolves.
The best scheme was not even mentioned in the article... combining both an identifier with a vanity named part: `example.org/news/a49a9762-3790-4b4f-adbf-4577a35b1df7_choosing_between_names_identifiers_URLs` . The named part can vary as it is not actually used for lookup, only the prefix identifier is used for lookup.
Though that has its own downside... one can conjure up misleading named sections for valid identifiers to misdirect and mislead.
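One possible way to blunt that downside, sketched under assumptions: look up by the identifier prefix only, then answer any non-canonical named part with a redirect to the canonical one. The `ARTICLES` table, the `slugify` rule, and the (status, path) return shape are all invented for this example.

```python
import re

# Hypothetical article table: identifier -> current title.
ARTICLES = {
    "a49a9762-3790-4b4f-adbf-4577a35b1df7":
        "Choosing between names and identifiers in URLs",
}


def slugify(title):
    """Lowercase the title and join runs of letters/digits with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


def resolve(segment):
    """Return (status, canonical_segment) for the last path segment of a URL."""
    # UUIDs contain no underscores, so the first "_" separates id from name.
    article_id, _, _ignored_name = segment.partition("_")
    title = ARTICLES.get(article_id)
    if title is None:
        return 404, None
    canonical = f"{article_id}_{slugify(title)}"
    if segment != canonical:
        return 301, canonical  # wrong, stale, or missing name: redirect
    return 200, canonical


status, target = resolve(
    "a49a9762-3790-4b4f-adbf-4577a35b1df7_bank_gives_away_free_money"
)
```

The misleading link still works, but the reader ends up on a URL whose named part matches the current title.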
by dreamfactored on 10/17/17, 11:33 PM
by baradas on 10/18/17, 4:10 PM
by DelightOne on 10/18/17, 4:37 AM
What does this mean? Is it just to say don't use the name hierarchy but rather the permalink-key as identity in the database?
by mcdan on 10/17/17, 6:49 PM
by nazri1 on 10/18/17, 2:26 AM
Those who do not understand UNIX are condemned to reinvent it, poorly. -- Henry Spencer
Hard links, symlinks and inodes.
by monkeycantype on 10/18/17, 4:50 PM
/shelf/{something}
{something} could be a name - 'american literature'
{something} could be an identifier - '20211fcf-0116-4217-9816-be11a4954344'
if someone calls:
https://library.com/locations: { "kind": "Shelf", "name": "20211fcf-0116-4217-9816-be11a4954344", }
now we have a shelf named with the id of a different shelf
and the meaning of
/shelf/20211fcf-0116-4217-9816-be11a4954344/book
is now ambiguous
i don't know a great way to avoid this
this is unambiguous, but i don't think my co-workers would like it:
/shelf/name/{id}/books
/shelf/id/{id}/books
I think this would only be slightly more popular
/shelf/name/{id}/books
/shelf/{id}/books
because the thing after shelf/ would not consistently be an id
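A small sketch of the collision, with made-up data: the second shelf's own id is invented for the example, and `find_ambiguous` / `find_explicit` are hypothetical helpers standing in for the two routing styles.

```python
SHELF_ID = "20211fcf-0116-4217-9816-be11a4954344"

# Hypothetical data: the second shelf was *named* with the first shelf's id.
SHELVES = [
    {"id": SHELF_ID, "name": "american literature"},
    {"id": "7d9f1c1e-5b0a-4c55-9a59-0c1d2e3f4a5b", "name": SHELF_ID},
]


def find_ambiguous(something):
    """Models /shelf/{something}: matches on either the id or the name."""
    return [s for s in SHELVES if something in (s["id"], s["name"])]


def find_explicit(kind, value):
    """Models /shelf/id/{value} and /shelf/name/{value}: one namespace only."""
    if kind not in ("id", "name"):
        raise ValueError(f"unknown lookup kind: {kind}")
    return [s for s in SHELVES if s[kind] == value]


both = find_ambiguous(SHELF_ID)            # two shelves match
by_id = find_explicit("id", SHELF_ID)      # only the original shelf
by_name = find_explicit("name", SHELF_ID)  # only the oddly named shelf
```

Another option, not shown here, would be to reject names that parse as identifiers when a shelf is created, which keeps the shorter URL at the cost of a validation rule.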
by amelius on 10/17/17, 6:36 PM
That way, you have the best of both worlds in all cases.
If another object tries to use the same URL as another object (which was used first), then a new URL must be generated (just add something at the end of the name).
by a13n on 10/17/17, 6:35 PM
https://react-native.canny.io/feature-requests/p/headless-js...
For example, a post with title "post title" will get url "post-title".
Then a second post with title "post title" will get url "post-title-1".
Since there's only one URL part associated with each post, it's a unique identifier.
This gets rid of the ugly id in the URL, for epic URL awesomeness.
Furthermore, if you edit the first post to have "new post title" then its URL will update to "new-post-title", but "post-title" will still redirect to "new-post-title".
Someday I'm gonna open source a lib that lets you easily add awesome URLs to your app. :)
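Until then, a rough sketch of the scheme as described above, with the details guessed: `SlugRegistry` is a hypothetical name, the slug rule (lowercase, hyphens) is an assumption, and old slugs are simply never released so they can keep redirecting.

```python
import re


class SlugRegistry:
    """Hypothetical registry: unique slugs per post, old slugs keep resolving."""

    def __init__(self):
        self.slug_to_post = {}  # every slug ever issued -> post id
        self.current = {}       # post id -> current slug

    def _unique(self, post_id, title):
        base = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
        slug, n = base, 0
        # Append -1, -2, ... until the slug is free (or already ours).
        while self.slug_to_post.get(slug, post_id) != post_id:
            n += 1
            slug = f"{base}-{n}"
        return slug

    def assign(self, post_id, title):
        """Call on create and on every title edit; returns the current slug."""
        slug = self._unique(post_id, title)
        self.slug_to_post[slug] = post_id
        self.current[post_id] = slug
        return slug

    def resolve(self, slug):
        """Return (post_id, redirect_to) or None; redirect_to is None if current."""
        post_id = self.slug_to_post.get(slug)
        if post_id is None:
            return None
        current = self.current[post_id]
        return post_id, (None if current == slug else current)


registry = SlugRegistry()
first = registry.assign(1, "post title")        # "post-title"
second = registry.assign(2, "post title")       # "post-title-1"
renamed = registry.assign(1, "new post title")  # "new-post-title"
```

A web handler would turn a non-empty `redirect_to` into a 3xx response pointing at the current slug.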
by joshzilla2017 on 10/18/17, 1:26 AM
by mirko22 on 10/18/17, 2:57 PM
by afandian on 10/17/17, 6:32 PM
But the sheer arrogance of serving a webpage that doesn't render any text unless you execute their JavaScript really annoys me. It's not a fancy interactive web-app, it's a webpage with some text on it.