by UkiahSmith on 6/2/19, 2:49 PM with 68 comments
by erichanson on 6/3/19, 4:49 AM
The root of the problem with files is that they lack an information model beyond just a sequence of bytes. They are unopinionated to a fault. All files have structure. Even if that structure is a "non-structure" like "all these files are just a random sequence of meaningless bytes", that is still their structure. But this information isn't present in the system, nor can it be enforced or constrained when that is desirable.
To me, the obvious alternative is the database, aka "everything is a row". We have used the database (relational or otherwise, but mostly relational) to successfully model many, many domains, and to bring coherence and clarity to them. The cool thing about the relational database is that it's based on an underlying relational algebra. The syntax of data in an RDBMS is really just one manifestation of a deeper layer of structure that is syntax-free, and these abstract structures can be (and are) manifested in multiple coexisting syntaxes.
I'm exploring this pattern ("datafication", headshake) with Aquameta (http://aquameta.org/) and have written a lot more about why the file-centric approach is holding us back (http://blog.aquameta.com/intro-chpater2-filesystem/). Boot to PostgreSQL! :)
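The "everything is a row" contrast can be sketched with sqlite3 (the schema and names here are illustrative, not Aquameta's actual model): unlike an opaque byte stream, the information model is declared, enforceable, and queryable.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE document (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,             -- structure the system knows about
        body  TEXT NOT NULL,
        CHECK (length(title) > 0)        -- ...and can enforce
    )""")
db.execute("INSERT INTO document (title, body) VALUES (?, ?)",
           ("notes", "hello"))
# unlike a bag of bytes, the structure is queryable after the fact
row = db.execute("SELECT title, body FROM document").fetchone()
print(row)                               # ('notes', 'hello')
```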
by zokier on 6/2/19, 8:14 PM
But at least in Linux there are a ton of files that are not exposed to the "same nexus", i.e. the filesystem. The most common example would be network sockets. They are files, but do not exist anywhere in the filesystem. In Linux, a file is more of an object handle.
https://yarchive.net/comp/linux/everything_is_file.html
http://events17.linuxfoundation.org/sites/events/files/slide...
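The point can be illustrated in a few lines of Python (names are mine): a socket is a real file descriptor that fstat understands, yet no path in the filesystem namespace refers to it.

```python
import os
import socket
import stat

s = socket.socket()                 # a "file" in the handle sense
fd = s.fileno()
st = os.fstat(fd)                   # fstat works: it is a file descriptor
is_sock = stat.S_ISSOCK(st.st_mode)
print(is_sock)                      # True: the handle is a socket...
# ...but no filesystem path names it; on Linux, /proc/self/fd/N merely
# exposes the handle, it is not a regular file anywhere on disk.
s.close()
```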
by Lowkeyloki on 6/2/19, 6:56 PM
by EmilStenstrom on 6/3/19, 4:51 AM
body {
width: 40em;
margin: 0 auto;
font-size: 1.4em;
line-height: 1.4em;
}
by OJFord on 6/2/19, 8:11 PM
Would be kind of interesting to call methods on objects rather than read/write files, but it's not immediately obvious to me that that really gains anything over the status quo.
And now that I've written that, I wonder whether that's what PowerShell's verb-noun commands do anyway? I've never come close to being proficient enough (nor wanted to!) to know.
by mpweiher on 6/2/19, 8:36 PM
https://github.com/mpw/MPWFoundation/blob/master/Documentati...
and Polymorphic Identifiers:
Hierarchical paths were a good idea, let's use them. Objects were also a good idea, let's use those. A small set of verbs (GET, PUT, POST, DELETE) was also a good idea. Let's combine these!
Abstract from:
Path + File + POSIX I/O
URI + Resource + REST Verbs
Get:
1. Polymorphic Identifiers, which subsume paths, URIs, variables, dictionary keys, etc.
2. Stores, which resolve URIs and subsume filesystems, HTTP servers, dictionaries, etc.
3. A small protocol that essentially mirrors REST verbs in-process
See also: In-process REST, https://link.springer.com/chapter/10.1007/978-1-4614-9299-3_...
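The combination can be sketched in a few lines of Python (class and method names are mine, not MPWFoundation's API): two very different stores resolve the same identifiers behind one small verb protocol.

```python
import pathlib
import tempfile

class DictStore:
    """Resolves identifiers against an in-memory dictionary."""
    def __init__(self):
        self._data = {}
    def get(self, ident):
        return self._data[ident]
    def put(self, ident, value):
        self._data[ident] = value
    def delete(self, ident):
        del self._data[ident]

class FileStore:
    """Resolves the same identifiers against a filesystem directory."""
    def __init__(self, root):
        self.root = pathlib.Path(root)
    def get(self, ident):
        return (self.root / ident).read_text()
    def put(self, ident, value):
        (self.root / ident).write_text(value)
    def delete(self, ident):
        (self.root / ident).unlink()

# the same client code runs unchanged against either store
results = []
for store in (DictStore(), FileStore(tempfile.mkdtemp())):
    store.put("greeting", "hello")
    results.append(store.get("greeting"))
print(results)                        # ['hello', 'hello']
```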
by syn0byte on 6/3/19, 4:36 PM
From a security/reliability standpoint it sounds like a nightmare, combining the worst of NTFS alternate data streams and shared library loading into one.
by leoc on 6/3/19, 12:04 AM
Lotus Agenda/Chandler https://en.wikipedia.org/wiki/Chandler_(software) is another part of this long Grail quest.
by bayareanative on 6/3/19, 12:22 AM
Also, programs should be able to dynamically serve the contents of "files" as well with an "activation symlink", i.e.,
/etc/resolv ->* resolvconf
The "everything must be plain text" refrain is obsolete and unnecessary, because it's trivial to serialize anything to any format; it would already be a universally supported data structure in both tools and code. It's not 1978 anymore.
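The "activation symlink" idea, a program serving a file's contents on demand, can be approximated today with a named pipe. A minimal sketch (Linux/POSIX; the path and contents are invented):

```python
import os
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "resolv")
os.mkfifo(path)                     # the "file" has no stored contents

def serve():
    # the resolvconf-like side: generate contents only when someone reads
    with open(path, "w") as f:
        f.write("nameserver 10.0.0.1\n")

t = threading.Thread(target=serve)
t.start()
with open(path) as f:               # open blocks until the server connects
    content = f.read()
t.join()
print(content)                      # produced by a program, not read off disk
```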
by O_H_E on 6/3/19, 6:38 AM
TMSU - tag your files, then access them through a virtual filesystem from any other application
https://tmsu.org -- https://github.com/oniony/TMSU
Tagsistant - Semantic filesystem for Linux, with relation reasoner, autotagging plugins and deduplication
https://www.tagsistant.net -- https://github.com/StrumentiResistenti/Tagsistant
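The core of this kind of tag-based lookup can be sketched in a few lines (purely illustrative; this is not TMSU's or Tagsistant's actual code): an index from tags to paths, where a "virtual directory" for a tag query is just a set intersection.

```python
from collections import defaultdict

index = defaultdict(set)   # tag -> set of file paths

def tag(path, *tags):
    for t in tags:
        index[t].add(path)

def files_with(*tags):
    # querying several tags at once intersects their file sets
    sets = [index[t] for t in tags]
    return set.intersection(*sets) if sets else set()

tag("/photos/cat.jpg", "animal", "cat")
tag("/photos/dog.jpg", "animal", "dog")
print(sorted(files_with("animal")))   # both photos
print(files_with("animal", "cat"))    # only the cat
```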
by solidsnack9000 on 6/3/19, 5:00 AM
by ubrpwnzr on 6/3/19, 12:27 AM
<style xmlns="http://www.w3.org/1999/xhtml"> body{ max-width: 600px; font-family: "Calibri"; margin-left: auto; margin-right: auto; }
</style>
by tgbugs on 6/3/19, 12:26 AM
In direct response to the suggestion about file paths for verbs: Alan Kay says in one (possibly many) of his talks something along the lines of 'every function should have a URL.' One of surely many challenges is how to ensure that the mechanism used to populate file system paths with nested functionality (e.g. /usr/bin/ls/all for `ls -a`) doesn't trigger malicious behavior during service/capability discovery. Being able to more deeply introspect file data and metadata as if the file were a folder could potentially be implemented as a plugin, but I worry about the complexity of requiring a file system to know about the contents of the files that it hosts, or of requiring the files themselves to know how to tell the file system about themselves. Existing file systems adhere to a fairly strict separation of concerns, since who knows what new file format or language will appear, and who knows what file system a file will need to exist on.
Said another way I think that the primary issue with the suggested approach is that it is hard to extend. The file system itself needs to know about the new type of object that it is going to represent, rather than simply acting as an index of paths to all objects. If there is a type of object that is opaque to the current version of the file system that object either has to implement a file-system-specific discovery protocol (which surely would have fun security considerations if it were anything other than a static manifest) or the user has to wait for a new version of the file system that knows what to do with that file type.
Some thoughts from my own work. (partially in the context of OJFord's comment below)
Treating files and URLs as objects that have identifier, metadata, and data portions, where the data portion is treated as a generator, is very powerful, but the affordances around the expression local_file.data = remote_file.data make me hesitate. When an assignment can trigger a network-saturating read operation, or when a setter doesn't know anything about how much space is left on a disk, there are significant footguns waiting to be fired, and I have already shot myself a couple of times.
The more homogeneous the object interface, the better. However, this comes with a major risk. If the underlying systems you are wrapping have different operational semantics (think file system vs. database transactions) and there is no way to distinguish between them based solely on the interface (because it is homogeneous), then disaster will strike at some point due to a mismatch. To avoid this, everything built on top of the object representation has to be implemented under the assumption of the worst possible behavior, making it difficult to leverage the features of more advanced systems.
As with the affordances around local.data = remote.data, if I have a socket, a local file, a remote web page that I own, a handle to an LED, a handle to a stop light, a database row in a table that has triggers set, the stdin of an open ssh session, and a network ring buffer all represented in the same object system, I have as many meanings for file_object.write('something') as I have types of objects, and the consequences and side effects of calling write are so diverse (from flipping bits on a hard drive to triggering arbitrary code execution) that it is all but guaranteed that something will go horribly wrong.
At the very least there would need to be a distinction between operations where all side effects can be accounted for beforehand (e.g. writing a file of known length to disk has the side effect of reducing free disk space, but that is known before the operation starts) and operations where the consequences depend on the contents of the message (e.g. DROP TABLES), with perhaps a middle ground for cases with static side effects (e.g. the database trigger) that would not be immediately visible to the caller and that might change from time to time.
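The write() hazard described above can be made concrete with a tiny sketch (the class names are invented): the interface is homogeneous, but the side effects are not.

```python
class LocalFile:
    def __init__(self):
        self.buf = []
    def write(self, data):
        self.buf.append(data)       # bounded, locally visible side effect

class TriggeredTable:
    def write(self, data):
        # side effects depend on the *contents* of the message:
        # a trigger may run arbitrary code on the other side
        if "DROP" in data:
            raise RuntimeError("trigger fired: arbitrary consequences")

f, t = LocalFile(), TriggeredTable()
f.write("hello")
t.write("hello")                    # same homogeneous interface...
try:
    t.write("DROP TABLE users")     # ...very different worst cases
except RuntimeError as e:
    caught = str(e)
print(caught)
```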
The distinction between files and folders is quite annoying (non-homogeneous), especially if you want to require that certain pieces of metadata always 'follow' a file. This comes from working with xattrs, which are extremely easy to lose if you aren't careful. Xattrs are a great engineering optimization to make use of dead space in the file system data structure, but they aren't quite the full abstraction one would want. It is also not entirely clear what patterns to use when you have a file that is also a folder -- do you make the metadata the outer file and the data the inner file, or the other way around? Having the metadata as the outer file means that you can change the metadata without changing the data, but the metadata will always 'update' when its contents (the data) change. However, when I first thought about using such a system, I had it the other way around, and I suspect a system with that much flexibility would have even more footguns than the current one.
Another issue is the long standing question around what constitutes an atomic operation. Everything is simple if only a single well behaved program is ever going to touch the files, but trying to build a full object-like system on top of existing systems is a recipe for leaky abstraction nightmares.
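A classic concrete instance of the atomicity question is updating a file in place. A minimal sketch of the standard write-to-temp-then-rename workaround (relying on POSIX rename atomicity within one filesystem):

```python
import os
import tempfile

def atomic_write(path, data):
    # write to a temp file in the same directory, then rename over the
    # target; on POSIX, rename/replace is atomic within one filesystem
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d)
    with os.fdopen(fd, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)   # readers see old or new contents, never partial

p = os.path.join(tempfile.mkdtemp(), "config")
atomic_write(p, "v1")
atomic_write(p, "v2")
print(open(p).read())       # v2
```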
While I was working on this I came across debates from before I was born. For example hardlinks vs symlinks. There are real practical engineering tradeoffs that I can't even begin to comment on because I don't understand the use cases for hardlinks well enough to say why we didn't just get rid of them entirely.
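The practical tradeoff can be seen in a few lines (illustrative paths): a hardlink is a second name for the same inode and keeps the data alive, while a symlink is a path pointing at another path and dangles when its target goes away.

```python
import os
import tempfile

d = tempfile.mkdtemp()
orig = os.path.join(d, "orig")
with open(orig, "w") as f:
    f.write("data")

hard = os.path.join(d, "hard")
soft = os.path.join(d, "soft")
os.link(orig, hard)        # hardlink: a second name for the same inode
os.symlink(orig, soft)     # symlink: a path that points at another path

os.remove(orig)            # drop the original name
survived = open(hard).read()         # "data": lives while any name remains
dangling = not os.path.exists(soft)  # True: the symlink points at nothing
print(survived, dangling)
```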
0. https://github.com/SciCrunch/sparc-curation/blob/master/spar...