from Hacker News

Understanding Shannon's Entropy metric for Information [pdf]

by micouay on 10/7/21, 1:24 PM with 42 comments

  • by jlpom on 10/8/21, 12:05 PM

    In 1939, when Shannon had been working on his equations for some time, he happened to visit the mathematician John von Neumann. During their discussions, regarding what Shannon should call the "measure of uncertainty" or attenuation in phone-line signals with reference to his new information theory, according to one source:[10]

    > My greatest concern was what to call it. I thought of calling it ‘information’, but the word was overly used, so I decided to call it ‘uncertainty’. When I discussed it with John von Neumann, he had a better idea: ‘You should call it entropy, for two reasons: In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.’

    [10]: M. Tribus, E. C. McIrvine, "Energy and information", Scientific American, 224 (1971); quoted via https://en.wikipedia.org/wiki/History_of_entropy

  • by dswilkerson on 10/7/21, 4:12 PM

    I read your arxiv.org article on Shannon Entropy: https://arxiv.org/abs/1405.2061

    What I am amazed at is that arXiv accepted this. I submitted exactly the same argument to them on 27 Jan 2009. They rejected it.

    Here it is on my home page: http://dsw.users.sonic.net/entropy.html

    Entropy is (1) the expected value of (2) the information of an event.
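
    For concreteness, here is a minimal Python sketch of that reading (an illustration only, not code from either write-up): the information, or surprisal, of an outcome with probability p is -log2(p), and entropy is its probability-weighted average.

      import math

      def surprisal(p):
          """Information content of one outcome with probability p, in bits."""
          return -math.log2(p)

      def entropy(dist):
          """Entropy = the expected value of the surprisal over a distribution."""
          return sum(p * surprisal(p) for p in dist if p > 0)

      print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
      print(entropy([0.25] * 4))   # uniform over 4 outcomes: 2.0 bits
      print(entropy([0.9, 0.1]))   # biased coin: ~0.47 bits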

    You might want to post a link to your answer here and see which of our explanations is more popular: https://math.stackexchange.com/questions/331103/intuitive-ex...

  • by soyiuz on 10/7/21, 9:00 PM

    I have several major problems with this explanation (problems that are endemic to the broader discussion as well):

    1. Shannon's original paper concerns "the capacity [of a channel] to transmit information" as well as the potential of a system (e.g. the English language) to generate information. In other words, your pipe needs to be able to accommodate the potential volume coming from the source ("the entropy of the source determines the channel capacity"). The amount of information in a message, or the amount of surprise, as the author has it, should not be confused with the amount of potential information (see the sketch after this list).

    2. Instead of "system" and "channel" the author uses the word "variable", which I find misleading. Channel (like a telegraph cable) and system (the English language) are specifically relevant to Shannon's discussion.

    3. The discussion of "surprise", as the author has it, is misleading. Shannon wrote his paper in conversation with Hartley and Nyquist, all three specifically attempting to bracket out subjective psychological factors such as surprise in order to describe the capacity for information transmission in terms of quantitative measures, based on "physical considerations alone" (Hartley). Surprise reintroduces the subjective, relative, psychological understanding of information that the original authors wanted to avoid.
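
    To make the distinction in point 1 concrete, here is a small Python sketch (the toy source and its probabilities are assumed, not taken from Shannon or the article): the entropy of the source is an average over everything the source could emit and is what sizes the channel, while the information in one particular message is just that message's own surprisal.

      import math

      # A toy source with assumed symbol probabilities.
      source = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}

      # Entropy of the source: the potential information, averaged over
      # everything the source could emit; the channel must handle this
      # many bits per symbol on average.
      H = -sum(p * math.log2(p) for p in source.values())
      print(H)   # 1.75 bits/symbol

      # Information (surprisal) of one particular message under that source.
      msg = "aad"
      print(-sum(math.log2(source[s]) for s in msg))   # 1 + 1 + 3 = 5 bits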

  • by amelius on 10/8/21, 8:46 AM

    Shannon's original paper is very readable.

  • by arketyp on 10/8/21, 7:33 AM

    Compression is a good entry point to Shannon entropy. What was also eye-opening for me was that the metric is characterized by a short list of requirements that match an intuitive notion of information [1] (one of these requirements is checked numerically in the sketch after this comment), much like how the Kolmogorov axioms were chosen to capture the intuitive notion of probability, or how the Church-Turing thesis pins down the intuitive notion of computation.

    [1] https://en.wikipedia.org/wiki/Entropy_(information_theory)#C...
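
    As a quick, self-contained check of one of those requirements (this is the grouping/additivity condition, illustrated with the 1/2, 1/3, 1/6 example from Shannon's paper; the code itself is just a sketch):

      import math

      def H(*probs):
          """Shannon entropy, in bits, of a finite distribution."""
          return -sum(p * math.log2(p) for p in probs if p > 0)

      # Grouping: splitting one choice into successive choices must not change
      # the total, with each sub-choice weighted by its probability.
      lhs = H(1/2, 1/3, 1/6)
      rhs = H(1/2, 1/2) + 1/2 * H(2/3, 1/3)
      print(lhs, rhs)   # both ~1.459 bits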

  • by hdjjhhvvhga on 10/8/21, 1:54 PM

    If someone is interested in this, I highly recommend J. R. Pierce's Symbols, Signals, and Noise. The book puts Shannon's work in perspective and gives extremely useful context, so that one can better appreciate its value. One can also clearly understand, for example, why the original title of Shannon's seminal paper was "A Mathematical Theory of Communication" rather than a theory of information.

  • by stuartjbray on 10/8/21, 8:52 PM

      To store the result of a coin toss requires 1 bit of information: I can either give you a 0 or a 1. But implicit in me communicating that to you is that it is 1 'out of' something, namely 1 'out of' 2.
      To store a trit, i.e. a 0, 1 or 2, requires 2 bits. You will receive either a 00, 01, or 10. You will never receive a 11, because that is not a valid trit.
      Huffman coding reduces the number of bits needed to send a block of data by using codewords that can terminate early: once a complete codeword has arrived, the receiver knows the symbol is finished. So the Huffman code for a trit value of 0 would not be '00', it would be '0'. We pass fewer bits because some sequences carry an implication that this is the end of the sequence; the receiver has to know which sequences indicate a termination.
      A trit stored as 2 bits wastes about 0.4 bits (2 - log2 3 ≈ 0.415); see the sketch after this comment. The concept of 'compression' only makes sense when person A throws a sequence of bits at you, and when they stop, you have to interpret what they just said in order to work out what it meant. In truth, the extra bits may not have been sent, but they are implied.
      When I write '1' on a piece of paper and slide it across the table to you, I am cheating, I am being ambiguous. "1 out of what?" should be the correct response. A bit is more accurately communicated as 1 out of 2. A 64-bit 'double' would be 1 out of roughly 2^64 possible values. In a binary computer you can only store a number as one value out of some fixed range. A 1 stored in a bit field is a different quantity than a 1 out of 2^64. In lazy human parlance a '1' represents 1 out of infinity, but this is not technologically possible to store. No computer hard drive can store enough digits to contain the information that 1 out of infinity represents.
      
      If I toss a coin and tell you it is a 1 out of 2, this is valid. But if I have cheated and tossed a double-headed coin, then from my perspective the '1' aka 'heads' is 100% predictable and therefore, for me, does not contain any information. For you, ignorant of my cheating coin, the data I communicate to you is 1 out of 2. What is the entropy of that information? Is it 1 bit, as you would believe, or 0 bits, as I would believe? The answer is that MEASUREMENTS ARE NOT PROPERTIES OF OBJECTS, BUT OF RELATIONSHIPS BETWEEN A MEASURING SYSTEM AND AN OBJECT. The entropy of that coin toss does not live inside the data communicated; it lives inside my head (where the entropy is 0 bits) and also in your head (where the answer is 1 bit). The coin toss measurement of '1' does not contain either 0 or 1 bits, because a piece of data in isolation does not have any meaning. Only in my head or yours can its meaning be measured.
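
    A quick numerical sketch of those points (my own illustration, with assumed probabilities): the two observers assign different distributions to the same toss, so they compute different entropies; and a trit carries log2(3) ≈ 1.585 bits, which a prefix code such as {0 → '0', 1 → '10', 2 → '11'} approaches without spending a full 2 bits per trit on average.

      import math

      def H(probs):
          """Shannon entropy in bits; it depends on the probability model, not on the data alone."""
          return sum(p * math.log2(1 / p) for p in probs if p > 0)

      # The same coin toss, seen through two different models.
      print(H([0.5, 0.5]))   # your model (fair coin): 1.0 bit
      print(H([1.0]))        # my model (two-headed coin): 0.0 bits

      # A trit: information content vs. a fixed 2-bit encoding vs. a prefix code.
      print(math.log2(3))                  # ~1.585 bits of information per trit
      code = {0: '0', 1: '10', 2: '11'}    # prefix-free, so no terminator needed
      # Average codeword length, assuming the three trit values are equally likely.
      print(sum(len(w) for w in code.values()) / 3)   # ~1.67 bits/trit, vs. 2.0 fixed
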
  • by dr_dshiv on 10/8/21, 9:22 AM

    Is there a distinction between the entropy of a message and the impact of a message on the entropy of the system? Where inputs tend to increase system entropy?

  • by kasperset on 10/8/21, 12:41 PM

    Shannon entropy is also used to calculate one of the measures of alpha-diversity in ecological and microbiome studies.
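
    For readers coming from that side, here is a minimal sketch of the Shannon diversity index computed from species counts (the counts below are made up):

      import math

      # Hypothetical species (or taxon) counts from one sample.
      counts = [50, 30, 15, 5]
      total = sum(counts)

      # Shannon diversity index H' = -sum(p_i * ln p_i); ecology conventionally
      # uses the natural log (nats) rather than log base 2 (bits).
      H_prime = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
      print(H_prime)   # ~1.14 nats for this sample
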
  • by dr_dshiv on 10/8/21, 9:20 AM

    The thing about Shannon entropy is that it depends upon alphabets and symbol systems. I want to understand how it might be used to describe presymbolic computational systems.