John Baez, Tobias Fritz, and Tom Leinster wrote a nice paper [1] that shows Shannon entropy is the only measure of information loss that acts as you’d expect, assuming of course you have the right expectations. See their paper for all the heavy lifting. All I offer here is an informal rewording of the hypotheses.
The authors say that
We show that Shannon entropy gives the only concept of information loss that is functorial, convex-linear and continuous.
You could reword this by saying
Shannon entropy is the only measure of information loss that composes well, mixes well, and is robust.
Saying that a measure of information loss composes well means that the information lost by doing a process f followed by a process g is the sum of the information lost by doing f and the information lost by doing g. This is what the authors describe as functorial.
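To see this concretely, here's a quick Python sketch. The helper names (entropy, pushforward, info_loss) are mine, not the paper's, and I'm only treating the simplest case: a deterministic map applied to a random variable with finitely many outcomes, where the information loss is the entropy of the input minus the entropy of the output. In that setting the losses telescope under composition.

    import numpy as np

    def entropy(p):
        """Shannon entropy, in bits, of a finite probability distribution p."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]               # use the convention 0 log 0 = 0
        return -np.sum(p * np.log2(p))

    def pushforward(p, f, m):
        """Distribution of f(X) when X ~ p and f maps {0,...,len(p)-1} into {0,...,m-1}."""
        q = np.zeros(m)
        for i, pi in enumerate(p):
            q[f(i)] += pi
        return q

    def info_loss(p, f, m):
        """Entropy of the input minus entropy of the output: the information f throws away."""
        return entropy(p) - entropy(pushforward(p, f, m))

    # A fair 8-sided die. f reduces the roll mod 4, g reduces mod 2.
    p = np.full(8, 1/8)
    f = lambda i: i % 4
    g = lambda j: j % 2

    loss_f  = info_loss(p, f, 4)                       # 1 bit
    loss_g  = info_loss(pushforward(p, f, 4), g, 2)    # 1 bit
    loss_gf = info_loss(p, lambda i: g(f(i)), 2)       # 2 bits

    print(loss_f + loss_g, loss_gf)                    # the losses add up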
When I refer to something mixing well, I have in mind mixtures in the sense of probability. A mixture is a random choice of random variables, such as flipping a coin to decide which of two dice to roll.
Going back to our two processes f and g, if we choose to do f with probability p and to do g with probability (1 − p), then the information loss from this mixture process should be p times the information loss from f plus (1 − p) times the information loss from g. Instead of saying this mixes well, you could say it behaves as expected, where “as expected” is a pun on expected value. The authors describe this as convex-linear because the expected information loss is a convex combination of the two possible losses.
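Here's a similar numerical sketch for mixing, again with my own helper names and in the same simple finite, deterministic setting. The mixture process acts on the disjoint union of the two sample spaces, with the input and output distributions weighted by the mixing probability (called lam in the code to avoid clashing with the distributions). The entropy of the coin flip itself appears in both the input and the output of the mixture, so it cancels, which is why the losses combine linearly.

    import numpy as np

    def entropy(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def loss(p_in, p_out):
        """Information lost by a process whose input has distribution p_in and output p_out."""
        return entropy(p_in) - entropy(p_out)

    # Process f: a fair 6-sided die, reported as even or odd.
    f_in  = np.full(6, 1/6)
    f_out = np.array([0.5, 0.5])

    # Process g: a loaded 4-sided die, reported as "low" (1 or 2) or "high" (3 or 4).
    g_in  = np.array([0.4, 0.3, 0.2, 0.1])
    g_out = np.array([0.7, 0.3])

    lam = 0.25   # do f with probability lam, g with probability 1 - lam

    # The mixture process lives on the disjoint union of the two sample spaces,
    # with the input and output distributions scaled by lam and 1 - lam.
    mix_in  = np.concatenate([lam * f_in,  (1 - lam) * g_in])
    mix_out = np.concatenate([lam * f_out, (1 - lam) * g_out])

    print(loss(mix_in, mix_out))                                    # loss of the mixture
    print(lam * loss(f_in, f_out) + (1 - lam) * loss(g_in, g_out))  # same number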
Robustness is a more colloquial term for continuity. Something is robust if small changes in input lead to small changes in output: if we change a process f a little, then our measure of information loss should change only a little. Make that statement rigorous and you end up with the definition of continuity.
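And one last sketch for robustness, using the same entropy helper as above: nudge the input distribution of a process by a small amount and the information loss moves by a correspondingly small amount. This is only a numerical illustration of continuity, not a proof.

    import numpy as np

    def entropy(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Process: roll a fair 6-sided die and report whether the result is even or odd.
    p = np.full(6, 1/6)
    q = np.array([0.5, 0.5])                 # distribution of the parity
    base_loss = entropy(p) - entropy(q)

    # Nudge the input distribution a little and push it through the same map.
    eps = 1e-3
    p2 = p + eps * np.array([1, -1, 1, -1, 1, -1]) / 3
    q2 = np.array([p2[0] + p2[2] + p2[4], p2[1] + p2[3] + p2[5]])
    perturbed_loss = entropy(p2) - entropy(q2)

    print(abs(perturbed_loss - base_loss))   # small: a small change in the process
                                             # makes a small change in the loss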
[1] John Baez, Tobias Fritz, and Tom Leinster. A characterization of entropy in terms of information loss. On arXiv.
“… right expectations.” -> “… wrong expectation.” ;-)
Snark aside, there is a very large audience who need to understand these concepts but who are not taught them. To this audience, “information loss” has an entirely different connotation, appearing to refer not to bits in the context of line encodings, but to analogues at much higher syntactic levels, such as the elements of database tuples. Yet the same principles apply, because data/information is being transformed in order to accommodate the characteristics of storage and transmission media, and those transformations had better be faithful. What are the analogues of Shannon entropy, or of error-correcting codes, at the level of the relational algebra?