from Hacker News

Message threading (1997-2002)

by tdonia on 9/8/13, 8:35 PM with 39 comments

  • by songgao on 9/8/13, 9:45 PM

    I was interning at Rackspace this summer. For a while we worked on an email notification sender, and we wanted email clients to group messages related to the same event into a single conversation. However, different events might have the same message subject, so we wanted to tell clients explicitly how messages should be grouped. So we looked into the "In-Reply-To" (RFC-822) and "References" (RFC-2822) fields. We ended up implementing RFC-2822, since it obsoletes RFC-822, and we figured that if we wanted our message grouping to work on most email clients, the safest way was to use the up-to-date standard (2822).
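    A minimal sketch of the header approach described above, using Python's standard `email` package. It assumes a hypothetical sending domain and a made-up `event-<id>@domain` Message-ID scheme: the first notification for an event carries a deterministic Message-ID, and each follow-up gets its own unique ID plus `In-Reply-To` and `References` pointing back at that root. Whether a given client actually honors these headers is what the rest of this comment is about.

```python
from email.message import EmailMessage
from email.utils import make_msgid

DOMAIN = "notify.example.com"  # hypothetical sending domain


def event_root_id(event_id: str) -> str:
    """Deterministic Message-ID for the first notification about an event.

    The "event-<id>@domain" scheme is an assumption for illustration.
    """
    return f"<event-{event_id}@{DOMAIN}>"


def build_notification(event_id: str, subject: str, body: str, first: bool) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = f"alerts@{DOMAIN}"
    msg["To"] = "ops@example.com"  # hypothetical recipient
    msg["Subject"] = subject
    root = event_root_id(event_id)
    if first:
        # The first notification for an event becomes the thread root.
        msg["Message-ID"] = root
    else:
        # Follow-ups get their own unique Message-ID but point back at the
        # root, so a References-aware client can file them together even
        # when another event happens to use the same subject.
        msg["Message-ID"] = make_msgid(domain=DOMAIN)
        msg["In-Reply-To"] = root
        msg["References"] = root
    msg.set_content(body)
    return msg


opened = build_notification("1234", "Server down", "web01 is unreachable", first=True)
resolved = build_notification("1234", "Server down", "web01 is back up", first=False)
print(resolved.as_string())
```

    Pointing every follow-up at the root keeps the thread flat. A strict reply chain would instead build each message's `References` from its parent's `References` plus the parent's own Message-ID, as RFC 2822 describes.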

    An interesting fact is that, of the three clients we tested, only mutt faithfully implemented the standard. It grouped all messages referencing the same ID under the same parent, regardless of subject or sending time. However, neither Gmail nor Outlook respects the "References" field.

    In Gmail, it seems that the message subject plus [either <time the message was sent> or <References>] is used for grouping. But it certainly doesn't rely exclusively on "References", since messages referencing the same parent message ended up in different conversations.

    In Outlook, the "References" field is ignored completely; it relies only on message subjects. Messages for different events, sent more than 10 days apart, were grouped into the same conversation.

  • by greenyoda on 9/8/13, 9:22 PM

    1. I'm impressed by the amount of analysis and the clarity of thought that went into designing this algorithm. It's not just something you can sit down at the keyboard and pound out.

    2. This is a great example of the perils of re-writing code that you don't completely understand:

    4.0 eliminated the "dummy thread parent" step, which is an absolute necessity to get threading right in the case where you don't have every message (e.g., because one has expired, or was never sent to you at all.) The best explanation I was able to get from them for why they did this was, "it looked ugly and I didn't understand why it was there."
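    A simplified sketch of the container-linking step of jwz's algorithm, showing what the dummy parent buys you. It is written from the article's description but omits the later passes (pruning empty containers, grouping by subject); `Container`, `thread`, and the `(msgid, subject, references)` tuple shape are illustrative names, not taken from any real implementation. A container whose `subject` is `None` is a dummy: an ID seen only in some other message's `References`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(eq=False)  # identity comparison, so list.remove() removes the exact node
class Container:
    msgid: str
    subject: Optional[str] = None  # None: dummy, the message itself was never seen
    parent: Optional[Container] = field(default=None, repr=False)
    children: List[Container] = field(default_factory=list)

    def is_ancestor_of(self, other: Container) -> bool:
        """True if self is other or one of other's ancestors."""
        node: Optional[Container] = other
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


def _link(parent: Container, child: Container) -> None:
    """Make child a child of parent, detaching it from any previous parent."""
    if child.parent is not None:
        child.parent.children.remove(child)
    child.parent = parent
    parent.children.append(child)


def thread(messages: Iterable[Tuple[str, str, List[str]]]) -> List[Container]:
    """Build containers from (msgid, subject, references) tuples; return the roots."""
    table: Dict[str, Container] = {}

    def get(mid: str) -> Container:
        if mid not in table:
            table[mid] = Container(mid)  # starts out as a dummy
        return table[mid]

    for msgid, subject, refs in messages:
        this = get(msgid)
        this.subject = subject
        # Chain the References IDs parent -> child, creating dummies for any
        # message we don't have. Keep existing links and never create a loop.
        prev: Optional[Container] = None
        for ref in refs:
            cur = get(ref)
            if prev is not None and cur.parent is None and not cur.is_ancestor_of(prev):
                _link(prev, cur)
            prev = cur
        # The last reference is this message's real parent; it replaces any
        # parent guessed earlier from other messages' References.
        if prev is not None:
            if not this.is_ancestor_of(prev):
                _link(prev, this)
        elif this.parent is not None:
            this.parent.children.remove(this)
            this.parent = None

    return [c for c in table.values() if c.parent is None]


# Two replies to a message that expired or was never received.
roots = thread([
    ("<b@x>", "Re: plan", ["<a@x>"]),
    ("<c@x>", "Re: plan", ["<a@x>"]),
])
for root in roots:
    label = "dummy" if root.subject is None else root.subject
    print(root.msgid, label, [c.msgid for c in root.children])
# prints: <a@x> dummy ['<b@x>', '<c@x>']
```

    Without the dummy `<a@x>` container, the two replies would come out as two unrelated roots, which is the breakage described in the quote.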

  • by gregschlom on 9/9/13, 3:32 AM

    I implemented jwz's algorithm for my now-defunct email client (http://betterinbox.com).

    It was fun and worked extremely well, though it did give different results from Gmail in some instances.

  • by mfincham on 9/9/13, 3:42 AM

    For what it's worth, Balsa (http://pawsa.fedorapeople.org/balsa/) implements this as a threading option.

    Edit: pointed to correct URL

  • by hendry on 9/9/13, 1:08 AM

    You could use Dovecot's "thread references" to produce an appropriate data structure from a variety of mail stores.

    See "Write a decent mailing list Web archive system" on http://suckless.org/project_ideas for an example.
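    As a rough illustration of consuming that output: the IMAP THREAD extension (RFC 5256) returns threads as nested parenthesized lists of message numbers, and per its grammar a list that begins with nested lists groups siblings whose common parent is not in the mailbox. The sketch below is a hand-written parser that assumes the payload after `* THREAD ` has already been extracted as a string (Python's `imaplib` can issue the command via `IMAP4.thread()`); `Node` and `parse_thread` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    msg: Optional[int]  # None marks a dummy parent that is not in the mailbox
    children: List["Node"] = field(default_factory=list)


def parse_thread(payload: str) -> List[Node]:
    """Parse an RFC 5256 THREAD payload such as "(2)(3 6 (4 23)(44 7 96))"."""
    tokens = re.findall(r"\(|\)|\d+", payload)
    pos = 0

    def parse_list() -> Node:
        # Called right after an opening "(" has been consumed.
        nonlocal pos
        root: Optional[Node] = None
        tail: Optional[Node] = None
        while tokens[pos] != ")":
            tok = tokens[pos]
            pos += 1
            if tok == "(":
                child = parse_list()
                if tail is None:
                    # List opens with nested lists: siblings under a dummy root.
                    root = tail = Node(None)
                tail.children.append(child)
            else:
                # A bare number continues the parent -> child chain.
                node = Node(int(tok))
                if tail is None:
                    root = node
                else:
                    tail.children.append(node)
                tail = node
        pos += 1  # consume the closing ")"
        return root

    threads = []
    while pos < len(tokens):
        if tokens[pos] != "(":
            raise ValueError(f"unexpected token {tokens[pos]!r}")
        pos += 1
        threads.append(parse_list())
    return threads


print(parse_thread("((3)(5))"))
# prints: [Node(msg=None, children=[Node(msg=3, children=[]), Node(msg=5, children=[])])]
```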

  • by pestaa on 9/8/13, 9:58 PM

    Very insightful article. I do wonder, though, whether "say no to databases" still holds today. I agree that, performance-wise, files are hard to beat for most problems, but we store data in databases because they provide guarantees a filesystem doesn't, ease deployment and configuration, etc.

  • by jbverschoor on 9/9/13, 12:31 PM

    I'm always frikking annoyed that Gmail, Mail.app and Airmail try to guess threads.

    Messages with the same subjects are not threads!

  • by longlivedeath on 9/9/13, 12:58 AM

    I read the title as an epitaph.

  • by taeric on 9/8/13, 10:03 PM

    Can we look forward to this coming to twitter soon? :)

  • by frozenport on 9/9/13, 9:33 AM

    It tickles my fancy thinking about an era when C++ was compared to C.