from Hacker News

C "clockwise/spiral" rule to understand declarations

by galkk on 1/1/25, 9:03 AM with 75 comments

  • by palotasb on 1/1/25, 10:06 AM

    The spiral rule works only if there is no pointer to pointer or array of array in the type. In other words, it is an incorrect rule. Take this for example:

            +----------------------------+
            | +-----------------------+  |
            | | +------------------+  |  |
            | | | +-------------+  |  |  |
            | | | | +--------+  |  |  |  |
            | | | | |  +--+  |  |  |  |  |
            | | | | |  ^  |  |  |  |  |  |
        int * * ¦ ¦ ¦ VAR[1][2][3] |  |  |
         ^  | | | | |     |  |  |  |  |  |
         |  | | | | +-----+  |  |  |  |  |
         |  | | | +----------+  |  |  |  |
         |  | | +---------------+  |  |  |
         |  | +--------------------+  |  |
         |  +-------------------------+  |
         +-------------------------------+
    
    The type of VAR is a [1-element] array of [2-element] array of [3-element] array of pointer to pointer to ints. I drew a spiral that passes through each specifier in the correct order. To make the spiral correct it has to skip the pointer specifiers in the first three loops. This is marked by ¦.

    The Right-Left Rule is quoted less frequently on HN but it's a correct algorithm for deciphering C types: http://cseweb.ucsd.edu/~ricko/rt_lt.rule.html

    The spiral rule can be modified to process all array specifiers before all pointer specifiers, but then you'd have to specify that the order to do so is right and then left. At that point it's just the Right-Left Rule.

  • by LysPJ on 1/1/25, 10:29 AM

    As a C programmer, declarations in Go appeared "backwards" to me when I first saw them.

    IMO the Go syntax is a vast improvement as it's much simpler and avoids the clockwise/spiral issue: https://appliedgo.com/blog/go-declaration-syntax

  • by HexDecOctBin on 1/1/25, 10:14 AM

    I just spent a week trying to write a parser for C declarations. Recursive Descent didn't work, Top Down Operator Precedence didn't work, I finally found a simple implementation of cdecl in K&R2, but it printed out a textual description instead of creating an AST. Converting it to create an AST was a whole different challenge. In the end, I got it working by adding the idea of a blank-type and doing some tree inversions (almost using each AST as a program that executes over the nested chunk of AST).

    It was only about 200 lines of code, and yet never have I been happier to finish a solution and never have to think about it again.

  • by wruza on 1/1/25, 10:00 AM

    The proper way is to use /usr/bin/cdecl or https://cdecl.org and extract as many typedefs as reasonable from the gibberish, because in C most of the time you’ll need these anyway to address lifetimes and ownership/borrowing points.
  • by fc417fc802 on 1/1/25, 10:07 AM

    It's unfortunate that we're stuck with syntax that so many people struggle to accurately decipher.
  • by f1shy on 1/1/25, 10:43 AM

    I find this description way better: http://unixwiz.net/techtips/reading-cdecl.html
  • by sylware on 1/1/25, 12:44 PM

    People should stick to the specifications...

    That said, that makes me think of perl5. I thought the perl5 coders were in some kind of sick competition over who could write the most implicit code: to read it, you would need a complete, perfect, permanent understanding of the full perl5 syntax parser just to know what some code is actually doing. I hate perl5 for that, but languages with ultra-complex syntax like C++ and similar (Java, Rust, etc.) are worse. In advanced OOP, you have no idea what's going on unless you embrace the full OOP model of a complex program, not to mention that it exponentially increases the syntax complexity, which is a liability if you want a sane spectrum of alternative "real-life" compilers, since those become insane to implement correctly.

    If a computer language has implicit behavior at all, it must be very little and very simple, and it should be avoided as much as possible. Does that mean more verbose code? Well, most of the time yes, and this is usually much better in the long run. For instance, in C I try to avoid complex expressions (often I fail, because I am too used to some operators like '++' and '--'); many operators should not be around, as they are not pertinent enough (like a ? b : c) and only increase compiler complexity.

  • by immibis on 1/1/25, 11:42 AM

    The actual rule is "declaration follows use".

    int p[5] means the type of p[5] is int (but you still have to remember valid elements are 0-4).

    void (*signal(void(*)(int)))(int) means the type of (*signal(something that is a void(*)(int)))(42) is void. And void(*p)(int) means the type of (*p)(42) is void.

    If you can remember the precedence of these operators, you automatically remember the precedence of their "type operators" as well.

  • by Ferret7446 on 1/4/25, 12:16 PM

    I've found it much easier to learn that C declarations are based on how they are used than to try to remember this (IMO very unhelpful) "spiral" rule.

    If you have int on the left and some kind of declaration for foo on the right, that means that when you use foo with all of the present syntax, you get an int.

    For example, in

        char *(*fp)( int, float *)
    
    if you use fp like this

        *(*fp)( int, float *)
    
    you get a char.
  • by casenmgreen on 1/1/25, 11:03 AM

    For non-array types, I write mine reversed, and read them right-to-left.

    int long long unsigned number_of_days; - read it right to left, an unsigned long long int

    float *fraction; - read it right to left, "*" reads as "pointer to", so pointer to float

  • by dapperdrake on 1/1/25, 11:07 AM

    UnixWiz put it well: "Go right if you can. Go left if you must." [1]

    [1] http://unixwiz.net/techtips/reading-cdecl.html

  • by alkonaut on 1/1/25, 11:44 AM

    I never understood why the asterisk is with the variable name in C and not by the type. Apart from declaring multiple variables of pointer and non-pointer type at the same time is there any other reason for it?
  • by foldr on 1/1/25, 12:53 PM

    After implementing (parts of) a recursive descent parser for C, I had a sudden enlightenment about how easy it is to understand C declarations once you get the basic principle. There's already a comment on this (https://news.ycombinator.com/item?id=42565459), but I'll try to go into a little more detail.

    A lot of people start with the idea that the syntax of C declarations is '<type> <variable_name>'. This works fine for simple cases, but it's completely wrong. What a declaration like 'int x' actually means is the following:

        declare a variable x of a type such that the expression x is of type int
    
    In such a simple case this seems unnecessarily long-winded, but now let's look at a more complex case:

        int (*p[4]) (int x, int y);
        
        declare a variable p of a type such that (*p[i])(x,y) is of type int
    
    If dereferencing the ith element of p and then calling the result with two arguments gives us an int, then p must be an array of pointers to functions that take two arguments and return int. If you saw the expression '(*p[i])(x,y)' in some code, you'd have no difficulty figuring out that p must be an array of function pointers. So you needn't have any difficulty when reading the declaration either.

    One slightly confusing thing here is that the nesting of the expression syntax is the opposite of the nesting of the type. The expression is

        funcall(deref(array_index(p, i)), [x, y])
    
    whereas the type is

        array_of(pointer_to(function(args: [int, int], returning: int)))
    
    This makes sense once you understand that the expression is to be interpreted just as a normal expression. The first thing you do with an array of something is index into it. So indexation is going to be the most deeply nested part of the expression, even though the outermost layer of the type is 'array of ___'.

    One additional source of confusion is the '*' operator and the need for additional parentheses in function pointer declarations. In C, function pointers dereference to themselves, so 'p()' and '(*p)()' are equivalent if p is a function pointer. However, in a type declaration you need something to distinguish a function pointer from a function, so the '*' has to be present. Why can't we just write 'int *p[4](int x, int y)' in the example above? Because of how the operator precedence rules work. That expression is equivalent to '*(p[4](x,y))', so it would declare an array of functions returning pointers to integers. (You can't declare an array of functions in C, so that's invalid.)

    Ad-hoc rules for interpreting C declarations miss the genius of their underlying concept. You already know all the syntax you need to understand a C type declaration! It's just C expression syntax.

  • by weinzierl on 1/1/25, 12:08 PM

    Are there any tricks to make Rust type declarations easier?
  • by pwdisswordfishz on 1/1/25, 9:52 AM

    Eh, this misconception again.
  • by the_gipsy on 1/1/25, 10:33 AM

    In example #2 it skips arbitrary "tokens": the first right parenthesis is visited, but its matching left parenthesis is skipped.
  • by keyle on 1/1/25, 10:51 AM

    (1994) but oh, still relevant.