from Hacker News

My favourite Git commit (2019)

by karagenit on 2/1/24, 3:46 PM with 398 comments

by schacon on 2/1/24, 5:26 PM
For better or worse, my experience as a GitHub cofounder and author of several Git books (Pro Git, etc) is that the Git commit message is a unique vector for code documentation that is highly sub-optimal.
The main issue is that most of the tooling (in Git or GitHub or whatever) generally only shows the first line. So in the case of this commit example, that would be the very simple message of a generic "US-ASCII error" problem. Everything they talk about in this article is what is great about the _rest_ of the commit message, which, given modern tools, is _almost never_ seen by anyone.
The main problem is that Git was built so that the commit message is the _email body_, meant to be read by everyone in the project. But for better or worse, that is not generally the role of this text today. Almost nobody ever sees it. Unless it's discussed in a bunch of patch series over a mailing list, nobody reads anything other than the first 50 chars of the headline. It's actively difficult to do, by nearly every tool built around the Git ecosystem.
Even if you're _very good_ at Git, the correct invocation of "git blame" (is it "-w -C -C -C"? Or just _two_ dash C's?) to find the messages relevant to the code blocks you care about is not widely known, and even if you find it, it still only shows the first line. Then you need to "git show" the identified commit SHA to get the long-form message. There is just no good way to find this information, even if it's well written.
This is one of my biggest complaints with Git (or, indeed, any VCS before it), and I think why people just don't care much about good commit messages. It's just not easy to get this data back once it's written.
If you want an example of this, search through the Git project's history. Run a blame on any file. It's _so hard_ to figure out a story of any function implementation in any file, but the commit messages are _pristine_. Paragraphs and paragraphs of high quality explanation for almost every single commit. Look at any single commit that Jeff King has done for the last decade. Hundreds of hours of amazing documentation from a true genius that almost nobody will ever appreciate. It's horrifying.
I don't know exactly what the answer is, but the sad truth of Git is that writing amazing documentation via commit message, for most communities, is almost entirely a waste of time. It's just too difficult to find them.
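The two-step lookup described above (blame, then "git show" per commit) can be sketched as a small script. This is a minimal sketch, not a recommended tool: the function names `commits_in_blame` and `full_messages` are made up for illustration, and it assumes `git` is on the PATH and the script runs inside a repository.

```python
import re
import subprocess

# Header lines in `git blame --porcelain` output start with the commit SHA
# followed by the original and final line numbers.
HEADER = re.compile(r"^([0-9a-f]{40,64}) \d+ \d+")


def commits_in_blame(porcelain_output):
    """Return the distinct commit SHAs in porcelain blame output, first seen first."""
    seen = []
    for line in porcelain_output.splitlines():
        match = HEADER.match(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def full_messages(path, start, end):
    """Map each commit touching lines start..end of path to its full message."""
    blame = subprocess.run(
        ["git", "blame", "-w", "-C", "-C", "-C", "--porcelain",
         "-L", f"{start},{end}", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    messages = {}
    for sha in commits_in_blame(blame):
        if set(sha) == {"0"}:
            continue  # all-zero SHA marks lines that are not committed yet
        shown = subprocess.run(
            ["git", "show", "-s", "--format=%B", sha],
            check=True, capture_output=True, text=True,
        )
        messages[sha] = shown.stdout
    return messages
```

Calling `full_messages("some/file", 10, 20)` (a placeholder path and range) would return the complete messages rather than only the first lines, which is the part the comment says most tools hide.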
by rpsw on 2/1/24, 4:49 PM
Overall agree with the sentiment, but I would add a more specific Bottom Line Up Front (BLUF) such as: "Fix test issues caused by non-breaking space character \xa0".
Tells me exactly what the problem was straight away, but I'm still free to choose to read more if I want to know more.
by spenczar5 on 2/1/24, 4:22 PM
I have felt that pride in writing a great commit message, but I am less sure of the value to others. I don’t think most people search commit messages when they encounter an unusual error message, or when adding a new feature, or really almost ever.
It’s a bit sad, but I have a growing suspicion that beautiful commit messages are a bit of vanity by the programmer. The person primarily impressed is often the author; others will walk on by without noticing.
There is room sometimes for those aesthetic flourishes but I am not convinced they have much practical value, and I have stopped really being bothered by commit messages of “fix whitespace issue” from others. I think I am a better colleague for that.
Things might be different on a project like Git or Linux with huge distributed teams and tons of commits, versus the projects I am used to which have between 1 and 100 contributors, mostly from the same organization.
by gumby on 2/1/24, 5:30 PM
That first line of the commit message is most important so that `git log` can address Chesterton's fence. And IMHO in this case the committer whiffed.
The key is not to put what you did in that first line, but why. Anyone interested in what can just look at the code, perhaps via a diff.
So something like "nginx .conf files must be in us-ascii"
Then "changed blahblah.erb to remove nonbreaking space character"
Then the rest of the commit message which is quite good.
Think of it as a news article: write in decreasing levels of importance and increasing levels of detail, assuming the reader could stop reading at any point.
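The "why first, then what, then detail" ordering above can be sketched as a tiny helper. This is only an illustration: `layered_message` is a hypothetical name, the 72-column wrap is an assumed convention rather than anything the comment specifies, and the detail text is invented.

```python
import textwrap


def layered_message(why, what, details, width=72):
    """Build a commit message ordered from most to least important."""
    body = textwrap.fill(details, width=width)  # wrap only the long-form part
    return f"{why}\n\n{what}\n\n{body}\n"


message = layered_message(
    why="nginx .conf files must be in us-ascii",
    what="changed blahblah.erb to remove nonbreaking space character",
    details=(
        "Hypothetical detail paragraph: how the problem was noticed, what "
        "was tried, and how a reader could check the fix for themselves."
    ),
)
print(message)
```

A reader who stops after the first line still learns the constraint that motivated the change, which is the point of the news-article analogy.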
by adrianmsmith on 2/1/24, 4:46 PM
I think the disadvantage with this style of documentation is you can't really alter the commit message after it's written.
(I mean you could obviously with "rebase" but are you really going to alter something written one year ago, already merged to "main", and cause a bunch of pain with everyone's feature branch etc.?)
Compare that with documentation stored in a .md file, or even a Wiki or even Confluence. My colleague can write something and if I see a way to improve it I can go ahead and do that, and other colleagues can improve on what I've written.
In this particular case I suppose the bug is fixed and won't come up again. But I also myself find it tempting to describe the design of a particular component when I commit that component, and that's something I now avoid. What about when that component needs to be changed by a future commit e.g. due to the business requirements changing? Will the commit documentation just describe the differences? Then in order for a new team member to find out how the system works by reading the documentation they've got to read multiple commit messages and "merge" them in their head.
by OJFord on 2/1/24, 4:19 PM
One thing I disagree with is:
> I wouldn’t expect all commits (especially ones of this size) to have this level of detail.
(emphasis added) - actually in my experience it's often the little ones, innocuous looking things that might really need a relatively longer explanation.
Yesterday I wrote three paragraphs on why I added `--limit=999` to a `gh pr list` because it's confusing: there's already a `limit(` in the `--jq` argument, and the higher it is (given say infinite PRs in total) the lower the end result will actually be. (Yes I wrote a comment too. And probably spent even longer thinking about and working it up than writing about it; hopefully I'll recall it as an example the next time someone implies the job is about churning out code!)
by macspoofing on 2/1/24, 7:14 PM
It's not a great git commit.
1) For all that text, the first line "Convert template to US-ASCII to fix error" - could be better. Maybe a couple of extra words to state what whitespace character caused the error, and what the error was. That comment plus the diff is all the context you need.
2) Honestly, everything else is kind of pointless. It doesn't hurt, but there's not a lot of value here. The author documented their journey in tracking this bug .. who cares?
by bhasi on 2/1/24, 5:06 PM
For great commit messages, just browse the git history of the Linux kernel where this is the standard.
The first line always mentions the subsystem affected by the change, followed by a one-line imperative-mood summary of the change. Subsequently, three questions are answered in as much detail as possible:
1. What is the current behaviour?
2. What led to this change?
3. What is the new behaviour after applying this change?
Example:
"Currently, code does X. When running test case T, unexpected behaviour U was observed. This is because of reason R. Fix this by doing F."
by RustyRussell on 2/1/24, 10:43 PM
I was told by a recent contributor that my approach (i.e. requirement) to git messages is "unique". Apparently my Linux kernel background is showing, but all my commit messages look like the one shown here!
If about existing code, the comment belongs with the code. If it's a process thing (e.g. code that is removed or didn't work), it belongs in the commit.
Most importantly, while commit messages can reference issues for convenience, they MUST reproduce the critical details: GitHub is transient, git messages are not!
by keybored on 2/1/24, 4:38 PM
Here is a context-full commit message.[1]
This is so common that the maintainer wrote this[2]
[1] https://github.com/git/git/commit/d70f554cdf38b0b05cfaa8e8eb...
[2] https://lore.kernel.org/git/xmqqedevo8ps.fsf@gitster.g/
by simonw on 2/2/24, 1:20 AM
I used to write really long, essay style commit messages like this one.
Then a friend pointed out that I was effectively writing documentation and hiding it in commit messages.
Instead, I switched a lot of that effort to updating actual documentation (in a docs/ folder) that was relevant to the commit - so the commit would still have the information in it, it's just it was in an actual file and not just the commit message.
I also make sure my commits almost always link to an issue thread, as that's a great place to put all kinds of extra context around the commit that can be updated independently of the commit itself.
by ryandrake on 2/1/24, 6:09 PM
Git commit message aside, the described debug session raises a lot of questions about the crappy tooling developers rely on.
"ArgumentError: Invalid byte sequence in US-ASCII" is a terrible, hard-to-action error message. What file? What line? What byte sequence? This "let's give the user another problem to solve" style of error messages is pervasive in our tools.
Also, why does the tool even require US-ASCII as input in the first place? Are we still living in 1995?
Also, if only ASCII characters are allowed, why does the code editing tooling allow non-breaking spaces in source code? Is there a good reason for having such a character in this file? This problem could have been avoided if the editor could have been smarter or highlighted the "bad" character better.
This developer lost an hour of his life because of a cascading chain of defective tools.
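The questions above (what file, what line, what byte sequence) are cheap to answer in principle. As a minimal sketch, assuming the input has already been decoded to text, the following reports every non-ASCII character with its position and code point. The function name `find_non_ascii` and the sample config are made up for illustration.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, code point, name) for each non-ASCII character."""
    hits = []
    # Split on "\n" only, so unusual Unicode line separators are reported too.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col_no, f"U+{ord(ch):04X}", name))
    return hits


# Sample with a non-breaking space hiding after the comment marker.
sample = "server {\n  #\u00a0listen on port 80\n}\n"
for hit in find_non_ascii(sample):
    print(hit)  # (2, 4, 'U+00A0', 'NO-BREAK SPACE')
```

An error message carrying this tuple plus the file name would have pointed straight at the offending character.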
by mrinterweb on 2/1/24, 6:45 PM
This is the reason I dislike automatically squashing branches with rebase. Squashing discourages thoughtful and meaningful commit messages. What is the point of making a meaningful commit message for some specific change when it is all going to be smashed together into a single commit on merge? I feel like rebasing is something that should be done intentionally by the dev to clean things up, not a default pattern on merge.
by krmbzds on 2/1/24, 5:49 PM
I would just go with "Remove non-breaking space characters" instead of writing a Russian novel.
Also, if you're on macOS just use a Karabiner rule [0] that converts all non-breaking space characters to regular space characters to prevent yourself from accidentally typing it out.
[0] https://ke-complex-modifications.pqrs.org/#nonbreaking_space
by gtirloni on 2/1/24, 5:22 PM
Great commit indeed. Lots of context information. That's gold.
The worst I've seen are dozens of tiny commits pushed to the master branch directly. If you want to find out what it took to implement a feature, good luck.
I'm a fan of tiny commits during code review but afterwards I prefer to squash everything in a functionally relevant commit. It makes git archeology much easier.
by dgunay on 2/1/24, 11:00 PM
I put less importance on commit messages being thorough, though I do admire when people write detailed information in the body. What's more important to me is to have good commit hygiene. It's something the industry is also generally terrible at, but has slightly more immediate value. For example, if your PRs have clean, atomic commits that can stand on their own, I can "rescue" chunks of useful functionality from review hell by cherry picking them out. I do this several times a month to help my teammates burn down huge PRs or take good ideas out of doomed branches.
by aeurielesn on 2/1/24, 4:18 PM
I had a terrible time when someone used "smart quotes" (beautified Office quotation marks) in a configuration file. I believe this was only possible because they copied it from Outlook.
by nickm12 on 2/2/24, 3:51 AM
I'm not at all a fan of this commit message. The summary line is vague (what template? what error?) and then the body spends 250 words explaining all the steps it took to get to this fix. What is this, a recipe on the web?
The commit message should explain the change being made, what impact it will have, and why it is being made. The audience is the developers reviewing the change or someone looking through the logs to determine why this line changed.
by Forge36 on 2/1/24, 7:00 PM
Previous discussion https://news.ycombinator.com/item?id=22519632
by AeroNotix on 2/1/24, 5:34 PM
Essentially zero people read complex commit messages.
Do with this information what you will.
9 times out of 10, the code is already documentation enough for what the code currently does; if you need to go back through history, then look at the commits. The messages are generally noise.
I've literally never cared _why_ someone made a change, I can see the change, I can see the effect of the old and new code. Rarely, if ever, has the thought process ever changed how I will interact with the code in question.
If I am at the level of debugging or history spelunking that the _commit message_ is the thing that saves me - I've already lost and there are other glaring organizational or design issues that are the actual problem.
by tehnub on 2/1/24, 7:29 PM
I try to write useful commit messages. Sometimes they're as expansive as this example, but not always. On GitHub, at least the way my team uses it, the PR is the more visible unit of code change. If you have a single-commit PR, GitHub will automatically make your commit message the PR description, which is nice. It does not do that if you have more than one commit, in which case I write a general overview of the changes and write "See individual commit messages for more detail" in bold.
by thrdbndndn on 2/1/24, 5:55 PM
I'm surprised the author (of the git commit) put that much effort into the message, but did not mention what exactly that character is (its Unicode code point).
by WalterBright on 2/1/24, 4:58 PM
At the DLF, our pull requests are usually accompanied by a link to the bugzilla entry, which usually have a detailed explanation.
P.S. Having multiple Unicode values that look identical when displayed is a huge veer-into-the-ditch mistake. I.e. the notion that code points should have semantic value is simply wrong.
by eduction on 2/1/24, 5:47 PM
It's true that giving a little potted history like this is "good" (other than he should have made a nice informative first line for summaries)
BUT it's not super useful to say this is good, the hard part is knowing WHEN to put this much effort in and when you can skip it.
I have many instances where I could do a longer story like this but it would be exhausting to do it every time. I try to do it when the commit might look unclear in intent or effect to an outsider, when the change is being made for an important reason, /and/ where there is a potentially negative consequence (like naively reverting or writing bad code) if the change is not explained.
I think this is a decent example of that but not great, because no one is intentionally going to go in and start introducing nonbreaking spaces.
by fl0ki on 2/1/24, 6:59 PM
A complementary virtue is that the commit is tightly scoped to exactly one change. I still see most engineers commit whatever they had in their working directory as a sort of blanket Save Point, without any thought to how those changes can be captured as individual commits that can be commented and reviewed on their own merits.
This will typically also involve completely unnecessary changes, because when you're merging unrelated changes anyway, the unnecessary ones are swept up in the noise. At best they complicate rebases for other contributors, but too often they also cause outright regressions.
It goes without saying that the commit messages are a write-off at this point, because even if they felt motivated to take the time to comment it clearly, the change is so messy and nebulous that it becomes hard to comment on. If their code gets reviewed it's more likely the reviewer gives up and stamps it so they no longer have to look at it.
Most people still don't seem to know that `git add -p`, `git reset HEAD`, `git stash`, `git rebase --interactive`, etc. are even available. They never learn what git is capable of, so they act like version control is a bureaucratic obligation rather than the peerless superpower that it can be. The problems they cause don't end at their terminal though, because now they've made a mess of the repository for every other contributor as well.
by 2devnull on 2/1/24, 8:19 PM
Big fan of the straussian commit message/commentary style. You read it once and think you understand, but you come back to it much later and understand it in a second, deeper way. There’s an art to this that some people seem to have. Maybe it correlates to taste.
by mo_42 on 2/1/24, 5:28 PM
I have a different opinion about favourite Git commit messages.
I think commits should be small steps that display the thought process of the author. Every individual commit should be self-explanatory. So the commit message should not describe (again) what the changes are but why it’s necessary. Sometimes the change is not self-explanatory and then I'd put a longer description below.
Somehow I came up with this on my own, so I'd be interested if it really makes sense or if others have a similar style.
by tomcam on 2/1/24, 11:34 PM
Not completely on topic (if you read TFA) but my favorite Git commit is by compiler badass and HN frequenter, where he checks in an entire C compiler to the D language repo:
https://github.com/dlang/dmd/pull/12507
https://news.ycombinator.com/item?id=27102584
by schnatterer on 2/2/24, 2:06 PM
This post has a similar tune:
How to Write a Git Commit Message / The seven rules of a great Git commit message https://cbea.ms/git-commit/#seven-rules
by MeteorMarc on 2/1/24, 6:17 PM
When a developer admits that a single wrong character cost them one hour, it probably took 3 hours.
by daitangio on 2/1/24, 8:21 PM
A good commit message must explain the reason for the commit (i.e. fix nasty char encoding issue). This commit is nice but far too long in my humble opinion (and I like to write!) The what is already in the commit diff. Explain the why, trust me.
by eternityforest on 2/3/24, 8:06 AM
It would be cool to have a tool to add this stuff to a logfile, that gets auto dumped to the next commit message.
Something like a vs code command that would add to the message for the commit I'm working on.
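One way to approximate this with stock Git is a `prepare-commit-msg` hook, which Git runs with the path of the message file as its first argument. The following is a rough sketch under assumptions: the notes file `.git/COMMIT_NOTES` and the function name `fold_notes_into_message` are hypothetical, and an editor command would simply append lines to that file while you work.

```python
import sys
from pathlib import Path

# Hypothetical scratch file that you (or an editor command) append notes to.
NOTES_FILE = Path(".git") / "COMMIT_NOTES"


def fold_notes_into_message(message_path, notes_path):
    """Append pending notes to the commit message file, then clear the notes."""
    notes_path = Path(notes_path)
    if not notes_path.exists():
        return False
    notes = notes_path.read_text(encoding="utf-8").strip()
    if not notes:
        return False
    message_path = Path(message_path)
    existing = ""
    if message_path.exists():
        existing = message_path.read_text(encoding="utf-8")
    # Keep whatever Git already put in the file, add the notes as the body.
    combined = existing.rstrip("\n") + "\n\n" + notes + "\n"
    message_path.write_text(combined, encoding="utf-8")
    notes_path.write_text("", encoding="utf-8")
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Git passes the message file path as the first hook argument.
    fold_notes_into_message(sys.argv[1], NOTES_FILE)
```

Saved as an executable `.git/hooks/prepare-commit-msg`, this would pre-fill the next commit message with whatever was jotted down along the way, leaving the author free to edit it before committing.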
by chjj on 2/1/24, 5:12 PM
Just an aside: is there a vim syntax command to highlight weird unicode whitespace as an error?
Something like:
```
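" Match common non-ASCII whitespace: NBSP, U+1680, U+2000 to U+200A, U+202F, U+205F, U+3000
syn match unicodeWhitespace /[\u00a0\u1680\u2000-\u200a\u202f\u205f\u3000]/
hi def link unicodeWhitespace Error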
```
by Zambyte on 2/1/24, 4:39 PM
I guess it makes sense that it's a common name, but I was expecting this David Thompson[0] instead :-)
[0] https://dthompson.us/
by pawelmi on 2/1/24, 10:00 PM
I would prefer to channel that energy that went into writing this lengthy description into actually fixing the toolchain to at least fail with more actionable error message.
by dlvhdr on 2/1/24, 4:21 PM
I do appreciate them but they’re just a pain to write sometimes
by darioush on 2/1/24, 7:50 PM
In my opinion one of the most important features of a commit message is that it fits in a single line so it can be read in git log.
by andrewfromx on 2/1/24, 4:57 PM
I started using gofakeit's "hackerphrase" for all commit messages.
https://github.com/andrewarrow/feedback/commits/main/
hp | git commit -a -F -
hp is a golang binary that just spits out a hacker phrase. I have this aliased with the letter q for "quick" so I'm always checking in stuff with q return push done.
by jeffrallen on 2/2/24, 6:51 AM
This is a nice example of Neves' Law: the harder the bug is to find, the smaller the diff will be.
by merdaverse on 2/2/24, 10:37 AM
I really hate that commit message. It is extremely verbose and doesn't allow you to easily understand what was done in the commit in a single sentence or paragraph. It mixes a very narrative explanation in there that is hard to skim. It is in desperate need of a clear TLDR version.
If you really like the descriptive, verbose message, then the most important description should be at the top and it should gradually go into less interesting details as you read down, like in a news article.
by eviks on 2/1/24, 6:07 PM
The uberdetailed exploration is worthy of a blog post; those are for stories! But a commit is way too obscure a spot to put it in, so it's a waste of effort on the writer's side and a waste of the readers' attention. A short message describing why a change in a space character resolved which bug, and where, would be more efficient.
by excitom on 2/1/24, 4:39 PM
It's a great commit message, but could be even better with "TL;DR non-breaking space in a comment can cause parser problems".
by hartator on 2/1/24, 7:51 PM
“Fix mis-encoded white space” is clearer to me.
No need for a very lengthy explanation.
by gfiorav on 2/1/24, 6:48 PM
My rule for any developer doc is: don't tell me what, tell me WHY
by ajuc on 2/1/24, 5:01 PM
I agree commit messages are the most important form of documentation.
But I disagree about the format. I prefer commit messages like:
```
   JIRA-123 one-line 80-char-at-most description
   
   Long description if needed (but preferably keep it in JIRA).
```
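A format like the one above is easy to check mechanically. Here is a minimal sketch of such a check; `check_message` is a hypothetical helper, and the issue-key pattern is an assumption about what a tracker key looks like (uppercase project code, dash, number).

```python
import re

# Assumed shape of an issue key followed by a description, e.g. "JIRA-123 text".
SUBJECT = re.compile(r"^[A-Z][A-Z0-9]*-\d+ \S.*$")


def check_message(message, max_subject=80):
    """Return a list of problems with the commit message (empty if none)."""
    problems = []
    lines = message.split("\n")
    subject = lines[0]
    if not SUBJECT.match(subject):
        problems.append("subject must start with an issue key such as JIRA-123")
    if len(subject) > max_subject:
        problems.append(f"subject is longer than {max_subject} characters")
    if len(lines) > 1 and lines[1] != "":
        problems.append("second line must be blank")
    return problems


print(check_message("JIRA-123 Remove non-breaking space from template\n"))
print(check_message("fix stuff"))
```

Something like this could run from a `commit-msg` hook or in CI, so the convention is enforced rather than remembered.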
by smashah on 2/1/24, 7:10 PM
Every country should have a GDS. They do great work.
by vonwoodson on 2/1/24, 7:37 PM
My favorite git commit is "Bug fixes"
by gavinhoward on 2/1/24, 6:39 PM
I'm about to release 1500+ commits. I hope people will be able to praise some of my commit messages like this one.
by fusslo on 2/1/24, 4:47 PM
great commits are great. This is fantastic
As an aside, I'm tired of documenting:
- in code
- in commits
- in jira
- in confluence
- in daily standups
- in release notes
by jiveturkey on 2/1/24, 10:41 PM
love those guys at the UK digital service!
by declan_roberts on 2/1/24, 6:59 PM
Is it a trend now to write the blog post as the commit message?
I must be getting old because I hate this.