from Hacker News

I've been writing ring buffers wrong all these years

by b3h3moth on 12/14/16, 1:40 PM with 167 comments

by jgrahamc on 12/14/16, 2:50 PM
This is of course not a new invention. The earliest instance I could find with a bit of searching was from 2004, with Andrew Morton mentioning in it a code review so casually that it seems to have been a well established trick. But the vast majority of implementations I looked at do not do this.
I was doing this in 1992 so it's at least 12 years older than the 2004 implementation. I suspect it was being done long before that. Back then the read and write indexes were being updated by separate processors (even more fun, processors with different endianness) with no locking. The only assumption being made was that updates to the read/write pointers were atomic (in this case 'atomic' meant that the two bytes that made up a word, counters were 16 bits, were written in atomically). Comically, on one piece of hardware this was not the case and I spent many hours inside the old Apollo works outside Boston with an ICE and a bunch of logic analyzers figuring out what the hell was happening on some weird EISA bus add on to some HP workstation.
It's unclear to me why the focus on a 2^n sized buffer just so you can use & for the mask.
Edit: having had this discussion I've realized that Juho's implementation is different from the 1992 implementation I was using because he doesn't ever reset the read/write indexes. Oops.
by phaemon on 12/14/16, 2:31 PM
> Join me next week for the exciting sequel to this post, "I've been tying my shoelaces wrong all these years".
Probably. Use the Ian Knot: http://www.fieggen.com/shoelace/ianknot.htm
Seriously, spend 20 mins practising this, and you'll never go back to the clumsy old way again.
by cannam on 12/14/16, 8:30 PM
I love the way this discussion has divided neatly into thirds: history of ringbuffers; digression on shoelaces; fragmentary, widely ignored, replies about everything else (this one included, I'm sure).
I like this kind of article and enjoyed this particular one, but the long discussion above about the "right" way to do it goes some way to justifying why so many people are happy to do it the "wrong" way.
I've implemented and used ring buffers the "wrong" way many times (with the modulus operator as well!) and the limitations of this method have never been a problem or bottleneck for me, while its simplicity means that it's easier to write and understand than almost any other data structure.
In most practical applications, it's memory barriers that you really have to worry about.
by planckscnst on 12/14/16, 3:58 PM
This is another interesting ring buffer implementation that uses mmap. https://github.com/willemt/cbuffer

by tveita on 12/14/16, 6:25 PM

The Linux kernel seems to leave one element free, which surprised me, but it does have this interesting note about it:

https://www.kernel.org/doc/Documentation/circular-buffers.tx...

  Note that wake_up() does not guarantee any sort of barrier unless something
  is actually awakened.  We therefore cannot rely on it for ordering.  However,
  there is always one element of the array left empty.  Therefore, the
  producer must produce two elements before it could possibly corrupt the
  element currently being read by the consumer.  Therefore, the unlock-lock
  pair between consecutive invocations of the consumer provides the necessary
  ordering between the read of the index indicating that the consumer has
  vacated a given element and the write by the producer to that same element.

by ChuckMcM on 12/14/16, 3:39 PM
I have always considered these "double ring" buffers. Along the same lines as how you figure out which race car is in the race is in lead by their position and lap count. You run your indexes in the range 0 .. (2 * SIZE) and then empty is
```
    EMPTY -> (read == write)
    FULL -> (read == (write + SIZE) % (2 * SIZE))
```
Basically you're full if you're at the same relative index and your on different laps, you are empty if you at the same relative index on the same lap. If you do this with power of 2 size then the 'lap' is just the bit 2 << SIZE.
by ams6110 on 12/14/16, 2:22 PM
Why do people use the version that's inferior and more complicated?
Because it's easier to understand at first glance, has no performance penalty, and for most busy programmers that often wins.
by dom0 on 12/14/16, 2:59 PM
A very related post by ryg: https://fgiesen.wordpress.com/2010/12/14/ring-buffers-and-qu...
by falcolas on 12/14/16, 2:49 PM
Usually when I'm writing a ring buffer, it's for tasks where the loss of an item is acceptable (even desirable - a destructive ring buffer for debugging messages is a fantastic tool). As such, I simply push the read indicator when I get to the r=1, w=1 case.
Using the mask method is slick (I'd cache that mask with the array to reduce runtime calculations), but it's definitely going to add cognitive overhead and get messy if you want to make it lockless with CAS semantics.
by RossBencina on 12/14/16, 3:31 PM
From what I understand, this is the way you'd do it with hardware registers (maintain the read and write indices each with one extra MSB to detect the difference between full/empty).
We've been using similar code in PortAudio since the late 90s[0]. I'm pretty sure Phil Burk got the idea from his hardware work.
[0] https://app.assembla.com/spaces/portaudio/git/source/master/...
by pawadu on 12/14/16, 3:44 PM
> This is of course not a new invention
No, this is a well known construct in digital design. Basically, for a 2^N deep queue you only need two N+1 bit variables:
http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_FIF...
by tankfeeder on 12/14/16, 4:32 PM
PicoLisp: last function here as circular buffer task https://bitbucket.org/mihailp/tankfeeder/src/3258edaded514ef...
build in dynamic fifo function http://software-lab.de/doc/refF.html#fifo
by kazinator on 12/14/16, 4:28 PM
> don't squash the indices into the correct range when they are incremented, but when they are used to index into the array.
Great! Just don't use it if the indices are N bits wide and the array has 2N elements. :)
Not unheard of. E.g. tiny embedded system. 8 bit variables, 256 element buffer.
by jstanley on 12/14/16, 2:27 PM
I had to pause for a second to convince myself that the version relying on integer wrap-around is actually correct.
I guess that's the reason most people don't do it: they'd rather waste O(1) space than waste mental effort on trying to save it.
by hzhou321 on 12/14/16, 3:39 PM
He keeps stating the case of one-element ring buffer. Is that a real concern ever?
by phkahler on 12/14/16, 10:49 PM
I find the headline very interesting. It's very inviting because of the way it expresses a sort of epiphany about doing it wrong on a mundane programming task. One is tempted to read it in order to see if there is some great insight to this problem. just maybe it's applicable outside this one problem. It begs the question: if he's been doing it wrong on a fairly mundane thing, maybe I am too. I need to see what this is about.
by ansible on 12/14/16, 3:16 PM
Hmm..., interesting.
I've always been doing it the "wrong" way, mostly on embedded systems. My classic application is a ring buffer for the received characters over a serial port. What's nice is that this sort of data structure doesn't need a mutex or such to protect access. Only the ISR changes the head, and only the main routine changes the tail.
by noiv on 12/14/16, 10:50 PM
Just in case, StackOverflow has some variations for JavaScript, although not that much optimized ;)
http://stackoverflow.com/questions/1583123/circular-buffer-i...
by falcolas on 12/14/16, 4:02 PM
My C is rusty, but won't this act... oddly... on integer overflow?
```
    size()     { return write - read; }
```
0 - UINT_MAX -1 = ?
[EDIT] Changed constant to reflect use of unsigned integers, which I forgot to specify initially.
by ared38 on 12/14/16, 10:23 PM
Dumb question: why use power of two sized rings? If I know the reader won't be more than 100 behind the writer, isn't it better to waste one element of a 101 sized rings instead of 28 of a 128 sized ring?
by ts330 on 12/14/16, 2:37 PM
i love that he has 20 different shoelace knots! life was too simple before now.
by geophile on 12/14/16, 6:27 PM
His favored solution introduces subtlety and complexity. Remember that 20-year old binary search bug in the JDK a few years ago? That is the sort of bug that could be lurking in this solution.
I understand not wanting to waste one slot. A third variable (first, last, count) isn't too bad. But if you really hate that third variable, why not just use first and count variables? You can then compute last from first and count, and the two boundary cases show up as count = 0 and count = capacity.
by zimpenfish on 12/14/16, 2:34 PM
If you use modulus instead of bitmasking, it doesn't have to be power-of-2 size, does it?
by blauditore on 12/14/16, 6:31 PM
> I've must have written a dozen ring buffers over the years
Why would someone do this instead of re-using previous (or third-party) implementations? Of course unless it's all in different languages, but I don't think that's the case here.
by doktrin on 12/14/16, 5:34 PM
> So there I was, implementing a one element ring buffer. Which, I'm sure you'll agree, is a perfectly reasonable data structure.
I didn't even know what a ring buffer was
where do I dispose of my programmer membership card?
edit : lol, what a hostile reaction...