from Hacker News

Don't write just in plain text (longevity vs. authenticity)

by yumiris on 3/2/22, 10:59 AM with 128 comments

  • by dang on 3/2/22, 6:20 PM

    Related large thread from yesterday:

    Write plain text files - https://news.ycombinator.com/item?id=30521545 - March 2022 (345 comments)

  • by nonrandomstring on 3/2/22, 1:08 PM

    This essay is actually deeper than its surface appearance, about text versus other formats. It's about semantics and richness of content, although I am not sure Miris fully grasps what s/he is wrestling with.

    The author invokes the concept of "authenticity", and that's where it gets interesting.

    I used to set my students a question about information content in a class on the philosophy of procedural representation.

    We had a very high resolution photo of the aviation pioneer Amelia Earhart, and a short grainy video clip of her getting into a plane and smiling and waving.

    My question was: Which one of these two media conveys more information about Amelia?

    One gave extraordinary detail of her face and eyes, and seemed to many a much better "fidelity" document. Others noticed that although you couldn't see her face in the video, you could feel from her gait, waving, body language and the way she shook hands _much more_ about her than from the static photo.

    Both files are the same size in bytes.

    So which one has more "information"? Which one is more "authentic"?

    Not to attempt to answer here with a deep dive into phenomenology, but each carries a different kind of information, which can be static, dynamic, or meta-dynamic in higher orders relative to a matrix of assumptions that must be carried forward in parallel by the culture that wants to decode the message later.

    I like that Miris tries to explore this by questioning the richness of text. But maybe the question doesn't hold up well under those conditions of investigation - because one might say that a great poet using only a few words might capture a landscape better than a painting, but if our culture drifts toward a visual one where poetry is no longer understood we cannot say that the medium itself degraded.

  • by thomascgalvin on 3/2/22, 2:43 PM

    This argument feels ... not quite like a strawman, but more pedantic than I think it needs to be.

    I don't think anyone really argues that everything should be plain text, even if that's an easy shorthand. The real argument is "use the simplest, most open format possible."

    Nobody is suggesting you go through all of your photos, transcribe your emotional reaction to each picture, and then delete the image. But, if you want to view those same photos when you're fifty years old, or seventy-five, you're better off storing them as a JPEG than a PSD, and you're better off storing them on a hard drive you have access to in addition to whatever cloud they're currently occupying.

    "Write plain text" is a shorthand for "use open formats." Because so much of what this audience does is text-based, plain text is the most common format we use, from source code to journaling, but that message applies to pretty much anything: if you lock yourself into a proprietary format, or a proprietary editor, you will almost certainly lose data over the long term.

  • by llarsson on 3/2/22, 12:23 PM

    That "some" proprietary formats from the 80's and 90's are still readable is already causing real problems: because not *all* are. So text, possibly with Markdown or similar hints regarding emphasis and structure, is still vastly better than any alternative I can think of.
  • by eatmygodetia on 3/2/22, 2:27 PM

    I feel like a lot of us plain text proponents forget that outside of ASCII and now UTF-8, lots of alleged plain text documents with diacritics or non-Latin characters are at least slightly difficult to open because of their somewhat esoteric encodings. Plain text isn't as universal as is often claimed, although it is immensely simpler than some other formats.
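
    To make the encoding problem concrete, here is a minimal sketch of how one might open an old "plain text" file of unknown encoding. The function name and the list of candidate encodings are assumptions for illustration; real legacy files may need a different list, and a wrong guess can still decode without error and produce garbage.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding_used).

    Assumption: the candidate list is illustrative only. latin-1 maps
    every byte to a character, so it acts as a last-resort fallback.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


if __name__ == "__main__":
    legacy = "caf\u00e9".encode("cp1252")   # bytes as an old Windows editor might have saved them
    print(decode_with_fallback(legacy))

    # Decoding with the wrong encoding does not always fail; it can silently
    # produce mojibake, which is why such files are "slightly difficult" to open.
    print("caf\u00e9".encode("utf-8").decode("latin-1"))
```

    The second print shows the failure mode that matters most in practice: no exception, just wrong characters.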

    But maybe we should all use monochrome bitmap files for everything? That would be very simple.

  • by yumiris on 3/2/22, 11:06 AM

    This was concocted at 5AM -- my apologies for any peculiar sentence structures or odd phrasing.

    Will re-re-re-revise it again with fresh eyes after resting 'em!

  • by aasasd on 3/2/22, 12:39 PM

    I got quite a lot of use out of metadata over the years, such that now I'll probably get a nervous itch and tremors all over my body if I attempt to use just plain text. Specifically, the creation and modification times for each addition to my notes are rather valuable, especially with the work-from-home lifestyle aka ‘day fades into night into day’—with which more people are gonna be familiarized in these years.

    Thankfully I'm using Org-mode these days, which is reasonably ‘plain text’ under practical definitions -- but I make dozens of new headings every week, and each of them is stamped with the creation time. But boy do I miss having modification times too -- should probably finally set up automatic commits to Git. Also need to mess with Orgzly so that it marks notes that are created on the phone.

  • by brians on 3/2/22, 2:46 PM

    “all the binary formats of the 1990s can be opened today”

    Oh, sweet summer child. Scribe/mss. Koalapad. A bunch of Apple 2GS, Apple 3, and Lisa formats. Lotus Improv.

    The points about semantics and authenticity are wonderful, but I think the presumption that all formats can be opened is mistaken exactly because those that can’t be opened become effectively invisible and lost.

  • by ggm on 3/2/22, 11:51 AM

    he said.. in courier, monospaced paragraphs format, morally as close to "plaintext" as you can be with a couple of diagrams which could have been ASCII art...
  • by briandoll on 3/2/22, 5:02 PM

    I assume this is a response to Derek Sivers's post: Write Plain Text Files https://sive.rs/plaintext

    I've been using computers daily for about 35 years now and I have a _lot_ of plain text files that I regularly use -- notes, lists, outlines, quotes, links, etc. Does anyone who has been around a while, have a large multi-decade collection of texts that are _not_ plain text? What formats do you use? How do you maintain access to those files over time?

  • by titzer on 3/2/22, 4:09 PM

    > What ultimately matters is that information is captured and preserved as thoroughly as possible. Between a picture that expresses a thousand words, and plain text file that sacrifices its detail and authenticity, why wouldn't we choose the former? Indeed, this question applies even the choice may sacrifice the longevity. What's the point of longevity, when the pursuit of it can compromise our ability to capture the information we may be afraid of possibly losing?

    I would contend that capturing a picture is absolutely a massive distortion of reality because reality is three dimensional, exists in many spectra beyond visible light, has sounds, smells, taste, and feeling, and exists in a historical context. The selection of framing, distance, focus, all of these are biases of the photographer. A photo is a lie, too. Just because it's higher resolution doesn't mean it has indeed captured the right information.

    Text is a lie too, granted. But in our current digitization zeitgeist, we have forgotten that our media (pictures, video, recordings, not just the TV, cable, and internet) lie to us. Our own bias towards slicing apart the world into computer-digestible bits is just us lying more convincingly to ourselves.

  • by orzig on 3/2/22, 1:10 PM

    Render to ASCII, everyone wins! (e.g. https://ascii-generator.site/)
  • by copperx on 3/2/22, 11:49 AM

    > but dismissing or abandoning media files is a much more guaranteed potential loss of information – information which plain text cannot capture due to its limitations.

    Some examples are sorely needed. How is a Word/InDesign file more authentic than a plain text file? Or is the author talking about media? Is a ProTools session more authentic than Wav files?

  • by jauco on 3/2/22, 2:57 PM

    Real archivists (as in people that have archivist as a job description and work at places that have “storing data forever” as a mission statement) tend to store the data in multiple formats. The source + a few derivations. They also store a bunch of copies to ward against bitrot. And they periodically compare the copies.

    Real archivists use a lot of data :)
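
    The "periodically compare the copies" step can be sketched roughly like this. It is a simplified illustration, not a description of any particular archive's tooling; the function names are made up, and the majority vote is an assumption that only makes sense with three or more copies.

```python
import hashlib
from collections import Counter


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def suspect_copies(paths):
    """Return the copies whose checksum differs from the most common one.

    Assumption: the majority of copies is intact. With only two copies a
    mismatch can be detected but not attributed to either one.
    """
    digests = {str(path): sha256_of(path) for path in paths}
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    return [path for path, digest in digests.items() if digest != majority_digest]
```

    Run on a schedule against every stored copy of the same file, anything it returns is a candidate for replacement from a good copy.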

  • by davbryn1 on 3/2/22, 5:02 PM

    "Prioritising the longevity of data can sacrifice the authenticity of what it tries to capture and preserve. When I say authenticity, I refer to how accurate and detailed the data in question preserves a particular state. An original raw image, for example, will capture a landscape much more authentically than written text would. Written text will inevitably comprise of ambiguity and even bias, if not distortion."

    Or, you need to become a better writer.

  • by Annatar on 3/2/22, 2:03 PM

    "This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."

    http://catb.org/~esr/writings/taoup/html/ch01s06.html

  • by nicbou on 3/2/22, 8:00 PM

    There doesn't need to be a compromise. You can have both if you keep your data in multiple formats. Storage is cheap and text files are small.

    My timeline thing [0] keeps the original archives, stores the timeline entries in a database, and exports them hourly as JSON + files. If the code stops working or the database crashes, the files are still there. The automated backups are there too. No information is lost.
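
    A rough sketch of what a JSON export of that kind could look like. The `entries` table and its columns are invented for illustration and are not taken from the timeline project itself; only the general idea (database rows dumped to a plain, tool-independent file) comes from the description above.

```python
import json
import sqlite3


def export_entries(conn):
    """Dump every row of a hypothetical `entries` table as a JSON string."""
    cursor = conn.execute(
        "SELECT id, created_at, kind, body FROM entries ORDER BY id"
    )
    columns = [column[0] for column in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor]
    # ensure_ascii=False keeps non-ASCII text readable in the exported file.
    return json.dumps(rows, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (id INTEGER PRIMARY KEY, created_at TEXT, kind TEXT, body TEXT)"
    )
    conn.execute(
        "INSERT INTO entries (created_at, kind, body) VALUES (?, ?, ?)",
        ("2022-03-02T20:00:00", "note", "Storage is cheap and text files are small."),
    )
    print(export_entries(conn))
```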

    However, the richness is not lost in the process. This timeline has geolocation history, notebook scans and a bunch of other things that don't really translate to plain text.

    The most important difference is that I can write to my timeline from my phone. Managing text files across devices is quite troublesome by comparison. If I want plain text out of it, I can write a new Destination that pipes entries to plain text files or to a fax machine.

    [0] https://nicolasbouliane.com/projects/timeline

  • by dorfsmay on 3/2/22, 2:43 PM

    Whenever choosing a markup, image format, or other technologies, keep the Lindy effect in mind. A boring technology that has been around for a long time will survive a lot longer than a brand new shiny one.

    https://en.wikipedia.org/wiki/Lindy_effect

  • by writegit on 3/2/22, 5:13 PM

    Or both?

    I have a daemon that watches for binary changes in writing documents.

    If changes are identified then it runs:

        $ libreoffice --headless --convert-to txt <CHANGED_FILES>
    
    Then commits the plaintext to a git repo.

    Allows for diffs, text search, and "longevity" across "authentic" docs.
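
    For anyone who wants to try the same thing, the convert-and-commit step might be wrapped like this. It is only a sketch of the step described above, not the actual daemon: the file watching is left out, the function names and commit message are made up, and it assumes `libreoffice` and `git` are on the PATH and that `repo_dir` is already a Git repository.

```python
import subprocess
from pathlib import Path


def build_commands(changed_files, repo_dir):
    """Return the argv lists for one convert-and-commit pass."""
    repo = str(Path(repo_dir))
    files = [str(path) for path in changed_files]
    convert = ["libreoffice", "--headless", "--convert-to", "txt",
               "--outdir", repo, *files]
    stage = ["git", "-C", repo, "add", "-A"]
    commit = ["git", "-C", repo, "commit", "-m",
              f"Update plain-text copies of {len(files)} document(s)"]
    return [convert, stage, commit]


def convert_and_commit(changed_files, repo_dir):
    """Run the commands in order, stopping at the first failure.

    Note: `git commit` exits non-zero when nothing changed, so a caller
    may want to catch subprocess.CalledProcessError for that case.
    """
    for argv in build_commands(changed_files, repo_dir):
        subprocess.run(argv, check=True)
```

    Keeping the command construction separate from execution makes it easy to print the commands first and check them before letting anything touch the repo.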

  • by VariableStar on 3/2/22, 12:50 PM

    IMO the question is more about which standards are used, rather than specifying a specific format. In particular, using open and free standards and formats increases the chance of retrieving and using data after long-term storage. Different formats suit different data types.

  • by highspeedbus on 3/2/22, 2:53 PM

    Obsidian/Markdown file structure is great for this. It could become a standard "offline hypertext" format.

    Despite text being fully portable, it is limited when you need to link an image or other files. People often forget how useful this concept is.

    HTML is not a viable option as it is awfully verbose for taking a simple note.

    Markdown adds just enough semantics and stays perfectly readable everywhere, from a hex editor to Microsoft Word.

    We're in a somewhat critical moment, where markdown can either stay as it is, then dominate and become a godsend format of solid usability for decades, or a harmful feature is added that would slowly drag the whole thing down until the next Just Write Plain Text blog post.

  • by ad404b8a372f2b9 on 3/2/22, 12:34 PM

    I think longevity is not just an issue of the data format but more so of its organization. It so happens that text files organized using the file system are the most easily producible, maintainable and queryable data organization tool. But other media can have the same properties if they're organized using the file system rather than any complex tools. I have graphs and datasheets that have endured decades that I refer to often and are easily findable because they are well-named files in well-named folders, even though the formats are comparatively much more complex.

  • by Beldin on 3/2/22, 11:51 PM

    It seems the author overlooked the possibility of writing out the full binary string of whatever format he'd like (i.e., "zero one one ..."), prefaced by instructions on how to parse that.

    That would give you great "authenticity" (in his definition) and great longevity.

    Not practical for reading back, but that was not the point. With the help of a few simple scripts, writing is easy. So, in the end, not really an argument against storing information exclusively in plaintext.
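
    The "few simple scripts" could be as small as the sketch below. It uses the characters 0 and 1 rather than the spelled-out words, and the header wording and function names are made up for illustration. Note that the text form is at least eight times the size of the original, since every bit becomes one character.

```python
# Assumed convention for this sketch: lines starting with "#" are
# human-readable parsing instructions, everything else is bit characters.
HEADER = (
    "# Each character below is one bit. Every 8 bits form one byte, "
    "most significant bit first.\n"
)


def to_bit_text(data):
    """Write bytes out as a plain-text string of 0s and 1s with a header."""
    return HEADER + "".join(f"{byte:08b}" for byte in data) + "\n"


def from_bit_text(text):
    """Parse the text produced by to_bit_text back into bytes."""
    bits = "".join(
        line.strip() for line in text.splitlines() if not line.startswith("#")
    )
    if len(bits) % 8 != 0:
        raise ValueError("bit string length is not a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    original = b"JPEG or PSD, it is all bits"
    as_text = to_bit_text(original)
    assert from_bit_text(as_text) == original
    print(len(original), "bytes became", len(as_text), "characters")
```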

  • by jjice on 3/2/22, 3:58 PM

    We use Google Docs for pretty much all of our docs since they're easy to create, share, and modify, and it works pretty well. I just (selfishly) want a good integrated plain text editor as part of GSuite. Sharing code via Google Docs isn't great, and sometimes I don't want to think about headers and formatting, I just want to use tabs to separate my pieces. That said, I'm definitely in the minority of users and I'll deal with it, not that big of a deal.
  • by thematrixadmin on 3/2/22, 1:09 PM

    What about writing data in Markdown format, stored physically on the HDD? You can use a bunch of different tools, both online and local, which will probably stay supported in the future. There is also no problem with implementing your own Markdown editor (a nice side pet project as well). I store my files on a small server that I run on my RPi, accessible through my phone and desktop. If I'd like to show the text to somebody, I can easily copy it as plain text or Word format, or export it to HTML or PDF.

  • by happyglands on 3/2/22, 4:38 PM

    I've struggled with this for quite some time now, and tried almost every tool out there. At the moment, I'm settling with Bear, writing my notes in Markdown. I prefer the ease of using nvAlt but I need the ability to store images and PDFs and I like the fact that it has some very nice export options should I eventually move to another tool, so I don't feel like I'm "locked in".
  • by m348e912 on 3/2/22, 10:21 PM

    This might be off topic but in terms of communication such as email, plain text seems the most authentic format to me. For example, if you are one of those sales guys that bolds and highlights the important parts of an email that you send, it's off-putting. The only exception I would give is if you wanted to add an inline image or an emoji -- everything else, plain text.
  • by amiga1200 on 3/2/22, 1:02 PM

    The Epic of Gilgamesh was written in plain text.
  • by jdvh on 3/2/22, 1:07 PM

    Plain text is so compelling because it's as simple as it gets, you can bring your own editor, you own your own data, and you can use version control.

    Text+ is compelling because you can have images and some kind of formatting. You want to store metadata and have backlinks and tags. Ideally with the possibility of collaborative editing.

    There should be a way to fuse these two.

  • by quasarj on 3/2/22, 5:04 PM

    Wrote a whole article about not using plain text. Used plain text for everything except a useless image. A+++
  • by chaxor on 3/2/22, 6:42 PM

    I like the idea of making a binary file into a plaintext file - but you could store it as the ASCII characters "0000110100111011110001111100101..."

    This would be great for many reasons. At the top of that list for example, is getting a lot more use out of those hard drives you paid for.

  • by dade_ on 3/2/22, 1:02 PM

    MD for all things text and SVG journals for handwritten notes, diagrams, sketches, screenshots. Works great, but haven’t found a way to integrate them beyond using a common set of folders.
  • by anotherevan on 3/3/22, 9:45 PM

    Reminds me of the Einstein quote: Make something as simple as possible, but no more so.

    Paraphrased: Make your information capture format as simple as possible, but no more so.

  • by gandalfff on 3/2/22, 3:07 PM

    Plain text is fine for some things but lacking for others. I like GUIs for formatting. I wouldn't be surprised if my ODTs could be opened a thousand years from now.
  • by a1445c8b on 3/2/22, 2:39 PM

    s/comprise of/comprise/g