from Hacker News

Ask HN: how to visualize a mailing list as a threaded conversation?

by shurane on 11/14/13, 3:08 PM with 1 comment

I liked the threaded conversation style of Hacker News and Reddit, especially the online interfaces for them. Google Groups, however, isn't as nice. I would like to not migrate off of Google Groups if possible. I'd rather consume a website or service that visualizes Google Groups. The emails are there, anyway. Any suggestions?

Reference mailing list: https://groups.google.com/forum/#!topic/coderdojonyc

I have threaded conversations disabled for the time being. I figured that would be a better way to track conversations, but it hasn't really been.

  • by malandrew on 11/18/13, 7:57 AM

    The algorithm you end up with for a particular message format is going to be very dependent on the metadata you have attached to each message. Given that "It depends..." disclaimer, I would start here:

    http://www.jwz.org/doc/threading.html
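
    For a sense of what header-based threading looks like when the metadata is available, here is a much-simplified sketch. It is not the full algorithm described at that link; it only links each message to the last ID in its References header, falling back to In-Reply-To. The function names `parent_id` and `build_children` are made up for illustration.

```python
from email import message_from_string


def parent_id(raw_message):
    """Return the Message-ID this message most likely replies to, or None."""
    msg = message_from_string(raw_message)
    # References lists ancestors oldest-first, so the last entry is the direct parent.
    refs = (msg.get("References") or "").split()
    if refs:
        return refs[-1]
    in_reply_to = (msg.get("In-Reply-To") or "").split()
    return in_reply_to[0] if in_reply_to else None


def build_children(raw_messages):
    """Map each parent Message-ID (None for thread roots) to its child IDs."""
    children = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        message_id = (msg.get("Message-ID") or "").strip()
        children.setdefault(parent_id(raw), []).append(message_id)
    return children
```

    Messages filed under the `None` key are thread roots; walking the mapping from there gives a tree.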

    With respect to GG in particular, you can't really "rethread" an unthreaded medium via message metadata, but you may be able to do so via "replied to message data" in the form of quoting.

    I would go about parsing messages to extract quoted text, then I would use the quoted text to determine the "parent" message being replied to. 98% of the time, there will only be one parent, but you need to make sure that your data format allows more than one parent in case someone has replied to two messages at once via cut-and-paste quoting.
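
    A minimal sketch of that idea, assuming quoted lines start with ">" and that quoted text survives line-for-line (rewrapped quotes would need fuzzier matching). `find_parents` and the `earlier` mapping of message ID to body are hypothetical names, and the result is a list so that more than one parent is allowed.

```python
import re

MIN_LEN = 10  # ignore very short quoted lines such as "ok" or "thanks"


def normalize(line):
    """Collapse whitespace and case so small formatting changes still match."""
    return re.sub(r"\s+", " ", line).strip().lower()


def quoted_lines(body):
    """Normalized text of lines that are quoted with one or more '>' markers."""
    out = set()
    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            text = normalize(re.sub(r"^[>\s]+", "", line))
            if len(text) >= MIN_LEN:
                out.add(text)
    return out


def own_lines(body):
    """Normalized text of lines the author wrote (not quoted)."""
    return {
        normalize(line)
        for line in body.splitlines()
        if line.strip() and not line.lstrip().startswith(">")
    }


def find_parents(reply_body, earlier):
    """Return IDs of earlier messages whose own text is quoted in the reply."""
    quoted = quoted_lines(reply_body)
    return [mid for mid, body in earlier.items() if quoted & own_lines(body)]
```

    Matching against each candidate's own (unquoted) lines avoids crediting a message that merely re-quoted the same text.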

    In such a case, determining quoting is going to be far more challenging and computationally intensive. You'll have to figure out a few heuristics, such as maybe limiting the search space to "only search messages posted between the post time of the current reply and the last message a user posted to a particular original ancestor message."
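
    One possible reading of that heuristic, sketched under assumptions: only consider messages in the same topic that were posted after the replier's own previous post and before the reply itself. The `Message` fields and `candidate_parents` are hypothetical, and timestamps are assumed to be naive `datetime` values.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    id: str
    author: str
    topic: str
    posted_at: datetime


def candidate_parents(reply, messages):
    """Messages in the reply's topic posted since the replier's last post."""
    same_topic = [
        m for m in messages
        if m.topic == reply.topic and m.posted_at < reply.posted_at
    ]
    # Window opens at the replier's most recent earlier post in this topic,
    # or at the beginning of time if they have not posted there before.
    previous = [m.posted_at for m in same_topic if m.author == reply.author]
    window_start = max(previous) if previous else datetime.min
    in_window = [m for m in same_topic if m.posted_at > window_start]
    return sorted(in_window, key=lambda m: m.posted_at)
```

    The narrowed list can then be fed to whatever quote-matching step you use, with a fallback to the full topic if nothing in the window matches.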

    You may also try to figure out the replying style of each user to predict which text in their messages is likely to be quoted, i.e. are they a top replier, a bottom replier or an inline replier? Labeling each user according to their reply style could speed up the identification of the parent message(s).
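
    A rough sketch of such labeling, assuming quoted lines start with ">" and treating any line ending in "wrote:" as an attribution line rather than the author's own text. Both rules are guesses, and `reply_style` and `user_style` are made-up names.

```python
from collections import Counter


def reply_style(body):
    """Classify one message as 'top', 'bottom', 'inline', or 'none'."""
    quoted, own = [], []
    for i, line in enumerate(body.splitlines()):
        text = line.strip()
        if not text or text.endswith("wrote:"):
            continue  # skip blank lines and "On ... wrote:" attributions
        (quoted if text.startswith(">") else own).append(i)
    if not quoted or not own:
        return "none"
    if max(own) < min(quoted):
        return "top"      # all of the author's text precedes the quote
    if min(own) > max(quoted):
        return "bottom"   # all of the author's text follows the quote
    return "inline"       # author's text is interleaved with quotes


def user_style(bodies):
    """Most common style across a user's messages, ignoring unquoted ones."""
    counts = Counter(s for s in map(reply_style, bodies) if s != "none")
    return counts.most_common(1)[0][0] if counts else "none"
```

    With a label per user, a matcher could look for quoted text only at the bottom of a top replier's messages, for example.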