by sean_pedersen on 5/4/21, 9:32 PM with 33 comments
by mountainriver on 5/5/21, 12:12 AM
by pama on 5/5/21, 3:56 AM
by chriswarbo on 5/5/21, 1:34 AM
There's actually a much nicer alternative, sometimes called Levin Complexity, which minimises the length of a program (in bits) plus the logarithm of its running time (base 2). What's really cool about Levin Complexity is that the time required to calculate it only differs from the running time of the minimal program by a constant factor.
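To spell out the quantity being minimised: a candidate program p which prints the data and halts after t steps costs length(p) + log2(t) bits, and the Levin Complexity is the minimum of that cost over all such programs. For example, a (hypothetical) 10-bit program that takes 1024 steps costs 10 + 10 = 20:

import math

# Cost of one candidate program under Levin's measure: its length in bits
# plus the base-2 logarithm of its running time. (Illustrative helper only.)
def levin_cost(length_in_bits, steps):
    return length_in_bits + math.log2(steps)

print(levin_cost(10, 1024))  # 10 + 10 = 20.0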
For example, let's say we have some data X and we want to find its Levin Complexity, using some pre-defined language L (e.g. the tape of a universal turing machine, or binary lambda calculus). For simplicity, we'll assume L is 'prefix-free' (e.g it has an EOF (end of file) symbol):
# Assumed helpers (described above): enumeratePrograms(length) yields every
# length-bit program of L, and run(program, steps=n) executes one for at most
# n steps, returning its output (or None if it hasn't produced one yet).
def levinComplexity(x):
    # Start by trying complexity 1
    currentComplexity = 1
    # Keep trying larger complexities until we return
    while True:
        # Try different lengths, to trade off length and time
        for length in range(1, currentComplexity + 1):
            for program in enumeratePrograms(length):
                result = run(program, steps=2**(currentComplexity - length))
                if result == x:
                    # Found a program which outputs x
                    return (currentComplexity, program)
        # x wasn't found, try a larger complexity
        currentComplexity += 1
This will start by enumerating all data of complexity 1 (the output of programs of length 1 bit, run for 2^0 = 1 step), checking if any is equal to the input x. If not, the 'while' loop iterates and checks all data of complexity 2 (outputs of programs of length 1 bit, run for 2^1 = 2 steps; and programs of length 2 bits, run for 2^0 = 1 step); the next iteration checks complexity 3 (length 1 bit, run for 2^2 = 4 steps; length 2 bits, run for 2^1 = 2 steps; and length 3 bits, run for 2^0 = 1 step); and so on until we find x and return.

The iteration which checks complexity N needs to re-run all the same programs as for N-1, but for twice as long, doubling the amount of time needed. It also needs to run all programs of length N for one step; since there are up to 2^N programs of length N, this adds roughly as much work again. Each iteration of the 'while' loop therefore takes a constant factor longer than the previous (at most 4 times as long), around O(2^N) time in total. Hence the overall algorithm is around O(2^C), where C is the input's Levin Complexity (this grows so quickly that the final iteration dominates the running time).
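To make that schedule concrete, here's a tiny standalone helper (illustrative only, not part of the search) that lists the (length, step budget) pairs tried at each complexity level:

# Mirrors the loop structure of levinComplexity: at complexity C, programs of
# length L are run for 2^(C - L) steps.
def budgets(complexity):
    return [(length, 2**(complexity - length)) for length in range(1, complexity + 1)]

for c in range(1, 5):
    print(c, budgets(c))
# 1 [(1, 1)]
# 2 [(1, 2), (2, 1)]
# 3 [(1, 4), (2, 2), (3, 1)]
# 4 [(1, 8), (2, 4), (3, 2), (4, 1)]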
If we focus on any particular program P we find something interesting: our search will eventually start executing P; once currentComplexity = length(P), it will be run for 1 step. The next iteration will run P again, but for 2 steps; the next iteration will run P for 4 steps; then 8 steps, and so on, doubling each time. Hence iteration N will run program P for around O(2^N) steps. Since iteration N takes around O(2^N) time, we're executing O(2^N) steps of P per O(2^N) steps of levinComplexity. In other words, this algorithm is running every program at O(1) speed!
What's the catch? There are two things going on:
- Restarting at exponentially-spaced intervals lets us switch between programs without having to save/restore any state, and surprisingly it slows us down by at most a factor of three (i.e. at most two-thirds of our steps are spent on runs that get restarted; see the short check after this list). It's easiest to work this out backwards: the first time we execute a program's Tth step, we must have taken T steps plus all of the previous executions which got restarted. The previous execution must have restarted before reaching the Tth step, so it took fewer than T steps; since the budgets double each iteration, the execution before that took fewer than T/2 steps; the one before that fewer than T/4 steps; and so on. Hence we reach the Tth step after at most T + T + T/2 + T/4 + T/8 + ... = T(2 + 1/2 + 1/4 + 1/8 + ...) = T(2 + 1) = 3T steps.
- Each program gets a constant fraction of the execution time, but longer programs get exponentially smaller fractions: programs of length N+1 get half as much execution time as those of length N, so the overall execution time is divided up between program lengths like 1/2 + 1/4 + 1/8 + ... Since we're using a fixed language, the number of programs of any given length is fixed, so each individual program gets a constant fraction. (Note: the prefix-free property is useful for making this work.)
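Here's the quick numerical check of that 3T bound mentioned in the first point (again illustrative only): re-run a single program with doubling step budgets and count the total steps spent before its Tth step is first executed.

# Runs that restart before reaching step T each cost their full budget;
# the final run costs T useful steps.
def steps_to_reach(T):
    total = 0
    budget = 1
    while budget < T:
        total += budget
        budget *= 2
    return total + T

for T in [1, 3, 10, 100, 1000]:
    print(T, steps_to_reach(T), steps_to_reach(T) / T)
# The ratio of total steps to useful steps stays below 3 for every T.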
Those tiny fractions of time (from the second point above) are the main reason this is an impractical algorithm: although they are exponentially small, big-O only cares about functions of the input size, whilst these exponentials depend only on the program lengths.
Note that our search finishes when we find the minimal program for x. Since that program (whatever it is) is being run at O(1) speed, our search will finish within a constant factor of the time that program takes to output x.
by hakuseki on 5/5/21, 1:11 AM
> Computing the Kolmogorov complexity for a sequence X of N bits length is an NP-hard problem, since the search space of possible programs producing any sequence of length N is growing exponentially with respect to the input size N
NP-hardness would actually mean that any NP problem, e.g. a traveling salesman problem, could be "reduced" within polynomial time to calculating the Kolmogorov Complexity of a given string.
by amelius on 5/5/21, 9:57 AM
This seems false, as you'd have to account for the header of the program. Otherwise, how can you tell programs and texts apart?
by sischoel on 5/4/21, 11:42 PM
For some interesting applications one might look at the Wikipedia article for the normalized compression distance: https://en.wikipedia.org/wiki/Normalized_compression_distanc...
This paper shows some interesting applications - it is not the first one, but it's the first non-paywalled one that I found: https://arxiv.org/pdf/cs/0312044.pdf
There is also a book by Paul Vitanyi - it has been a while since I looked at it, but if I remember correctly it also discusses some of these applications: https://www.amazon.com/Introduction-Kolmogorov-Complexity-Ap...
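For a concrete feel for NCD, here's a minimal sketch using zlib as the compressor; real compressors only approximate Kolmogorov complexity and the exact scores depend on the compressor, so treat the numbers as rough:

import zlib

def compressed_len(data: bytes) -> int:
    # Compressed size as a crude stand-in for Kolmogorov complexity
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance:
    # (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    cx, cy, cxy = compressed_len(x), compressed_len(y), compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

text_a = b"the quick brown fox jumps over the lazy dog " * 20
text_b = b"the quick brown fox leaps over the lazy cat " * 20
noise = bytes(range(256)) * 4
print(ncd(text_a, text_b))  # similar texts: noticeably smaller score
print(ncd(text_a, noise))   # unrelated data: score much closer to 1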