from Hacker News

The surprisingly difficult problem of user-defined order in SQL

by ccmcarey on 1/15/21, 10:57 PM with 93 comments

  • by dmurray on 1/16/21, 10:00 AM

    There's a no-free-lunch theorem here that's being ignored.

    If you have a space of 64 bits to represent elements of some ordered field, and you insert items one at a time, there is some ordering of the items that allows you to make only 64 inserts.

    Claiming that numbering the items with integers only allows you 16 insertions, but "It would take virtually forever to run out of space through repeated list reordering" using rational numbers just can't be correct. The author cherry-picked a bad implementation for the integers and a favourable insertion order for his pet approach.

    Where exactly did he go wrong? By picking the first number in the sequence of integers to be 65536, one of the smallest of the unsigned 64-bit integers, and then deciding all the inserts would happen in descending order. If he had picked a number like 2^63 instead, he would have got at least 63 inserts no matter what the insertion order. Using this integer strategy also lends itself to easy reindexing if you do need to: the row of rank N gets reindexed to 2^64 / N. Or if you want to increase the size of the index field to 128 bits, bitshift everything left.

    Note, just because we have this NFL theorem doesn't mean one approach can't outperform another in typical use patterns, but I don't see evidence for that here. The most logical integer-index strategy isn't considered and the insertion orders that exhaust the rational strategy are not particularly pathological.

  • by tzs on 1/16/21, 1:38 AM

    There was a discussion of this before on HN almost three years ago [1], with 134 comments.

    This seems to have exposed a bug in HN. Clicking on the "past" link for the present submission does not turn up that past submission, although the links are identical.

    However, clicking on the "past" link for that three year old submission does turn up the present submission.

    [1] https://news.ycombinator.com/item?id=16635440

  • by Animats on 1/16/21, 2:30 AM

    This is the same problem seen in algorithms for concurrent remote document editing, and the author seems to have re-invented some of the same solutions. See [1]

    This belongs to the class of problems involving trying to represent a tree in a relational database. There are many solutions, all with some problem.

    [1] https://news.ycombinator.com/item?id=24617542

  • by spiffytech on 1/16/21, 2:41 AM

    I've been reviewing this problem lately for a project, and I've settled on a solution that feels elegant and versatile which this article doesn't cover: lexicographical sort.

    I'm using Mudder.js to do it, but the algorithm is straightforward. Define an alphabet (e.g., A-Z). Every time an item is inserted/repositioned, its sort order is given the midpoint between the sort values of its two neighbors. So the first item is given the sort value M (midpoint between A and Z). Inserting before that item gets G (between A and M). Inserting between these two items gets J (between G and M).

    Once you run out of letters between two sort values, the algorithm tacks a digit onto the sort value of the item ahead of it. So inserting between M and N yields MM. If you do this enough times in a pathological pattern and wind up with some long string of characters you can reflow your list and rebalance everything out evenly (though that's strictly an optimization for storage space/bandwidth, and not a requirement for the algorithm to function).

    This all sorts perfectly with ORDER BY etc., supports an number of repositions bounded only by your storage space, and doesn't require arbitrary-precision decimal datatypes or fraction handling.

  • by putzdown on 1/16/21, 1:47 PM

    A linked list approach would also be worth considering. Give each item a permanent, unique id. Make each item remember the id of the next and/or previous item in the list. Reordering touches just two or three items, and there is no problem with running out of space or needing to re-index anything, ever. Of course, basic ordered query is murder from an efficiency standpoint, and the approach might be firmly rejected on that basis. But if we’re looking at approaches with a variety of tradeoffs, the linked list is worth mentioning.
  • by ramraj07 on 1/16/21, 8:26 AM

    I actually solved this exact problem recently (who is NOT trying to recreate Roam Research amirite?) And surprised no one else suggested the solution I came up with: just use a text column? When a new list is created the items get incrementing chars as text. If a user reorders an item, the items new sort order value is the value of the item above it concatenated to an "a". This is infinitely extensible, it's only slightly a drag on storage, and very trivial to clean up with some periodic code if storage is actually getting to e a problem. Works like a charm and Dont need to modify postgres itself for this!
  • by exabrial on 1/16/21, 4:25 AM

    Seems like the easiest way would be to emulate a linked list. Have a pointer to the next surrogate id.
  • by dizzystar on 1/16/21, 7:20 AM

    I would suggest picking up Bill Karwin's book on SQL Antipatterns. He goes through a few ideas.

    I found this slide show from him. It covers the same ideas:

    https://www.slideshare.net/billkarwin/models-for-hierarchica...

  • by swagasaurus-rex on 1/16/21, 1:35 AM

    I find similar issues with z-index in CSS.

    Early in a project, people stick to some heuristic, like using small z-index values like 0, 1, 2, 3... etc. But this has drawbacks because you can't add new z-index items in between without also shifting everything with a higher z-index

    Using z-index values with a gap is a better bet. 0, 10, 20, 30, etc. Of course, you are limited to only a few items in between the original. You've also lost an expectation of ordering in the values.

    Larger gaps make more sense, as long as you don't exceed the size of signed integers ±2147483647.

    The author of the post mentions how floating point has a similar problem in the amount of floating point precision you can wring out before you end up with rounding errors.

    The true crux of the problem is one of items related to one another, in other words it's a graph problem. I want these items to show up underneath these other items.

  • by asddubs on 1/16/21, 2:05 AM

    IMO the first approach is the best, unless there's a very high number of items (which usually in these scenarios, there's not, even a couple thousand wouldn't be catastrophically bad to update if it's not a common operation). they all kind of suck but everything else seems worse.
  • by teddyh on 1/16/21, 1:29 AM

    Why not (assuming that every entry has a unique ID) add a “next_id” field, and treat it like a linked list?
  • by smolder on 1/16/21, 3:09 AM

    Any of these methods can be combined with a process that re-indexes the order column once things get too pathological, or even does it on a schedule.
  • by conistonwater on 1/16/21, 9:32 AM

    > However with our choice of 2^16 blanks between each item, we can support no more than sixteen consecutive insertions between the first and next item. After reaching this limit we would have revert to the previous approach of shifting items forward.

    I do not see why this is necessary, you could also reindex the positions because only the relative order of integers matters. How hard can it be to recreate a bunch of integers once every 16 insertions in the same place in the list?

  • by geekpowa on 1/16/21, 1:16 PM

    Rationals are to decimals as arbitrary precision decimals are to binary floats.

    Drill the junior devs: use decimals if you want accuracy.

    What is unsaid with this: if you want accuracy with monetary representation.

    Lots of problems out there where decimals are simply inappropriate.

    One day, rational as a core type in postgres would be a nice addition. Rationals should belong as a core type in most every platform where general purpose compute can happen.

  • by coding123 on 1/16/21, 2:11 AM

    Every time I do this I get frustrated thinking about the fractions getting smaller and smaller. I don't think I ever solved it perfectly but I was always doing this with tiny pet projects where the order didn't matter much. Next time I'm in this situation and using postgres, I'll try this.
  • by throwaway189262 on 1/16/21, 11:19 AM

    You can emulate a tree easily by just putting new items half way between adjacent ones. If two items are next to each other have a fallback that regenerates the "tree" by re spacing all the items. Re generating the spacing is expensive but rare if you pick a big enough range.
  • by tester756 on 1/16/21, 12:29 PM

    yolo hack - this way we only update 1 row, but when obtaining items, then we have to join/load entry from this table too

    ItemsOrderConfiguration Table

    Id | Json (varchar(MAX) or proper JSON type in Relational DBs)

    1 | [

         { 
         
          "ItemId": 1, 
          
          "Order": 2
          
         },
         
         { 
         
          "ItemId": 2, 
          
          "Order": 1
          
         },
         
        ]
    
    
    2 | [

         { 
         
          "ItemId": 15, 
          
          "Order": 1
          
         },
         
         { 
         
          "ItemId": 32, 
          
          "Order": 2
          
         },
         
        ]
    
    the bad thing about this is that you lose constrains e.g that ItemId actually exists, but it shouldn't be a problem.
  • by renatovico on 1/16/21, 12:29 PM

    For me the final solution is the Left, right and depth approach Ex https://github.com/collectiveidea/awesome_nested_set
  • by _3u10 on 1/16/21, 4:51 AM

    There are two problems here:

    1. The position of an element in an array is not a property of the element but of the array.

    2. Don't use sorted sets when you need the properties of an array / linked list.

    You can store the todo list as an array of todos (use a user defined datatype), you can create another table called todo_list that contains an array of references to the todos. You can also create a linked list. Note that you'll have to use modern SQL to retrieve the linked list 'efficiently'.

    The problem only becomes difficult when we dislike the obvious solutions because it violates some property of 'elegance' which apparently means restricting yourself to the 1980s versions of SQL implementations.

    If the language we were writing this in was called JS and the code has the order of the array as a property of the object and we were continually sorting it every time we accessed the array instead of just reordering the array we'd recognize this as the anti-pattern it is. Similarly if for some reason using JS functions was considered 'inelegant', or for some reason using only JS features of 1996 instead of 2020.

    Modern SQL has CTEs, functions, and arrays. They are very elegant when solving the problem of arrays in SQL.