In the paper Specification: The Pattern That Signifies Intelligence (2005), William Dembski provides his latest (as of this writing) definition of specified complexity. My purpose with this post is to clarify what Dembski means by this concept as it is explained in that paper. Of course, my exposition is a personal interpretation, and it may well be far from what William Dembski intended.
Where's the problem, if anywhere?
Is specified complexity information?
Can specified complexity increase?
The definition of the context-dependent specified complexity of a pattern T given a (chance) hypothesis H is given in section 7, "Specified Complexity", p. 21, as:
χ = −log2[M·N·φS(T)·P(T | H)].
We won't worry about the context-independent version, in which M·N is replaced by 10^120.
Let T be some observed event, such as the poker hand Ten, Jack, Queen, King, and Ace of the same suit, also known as a Royal Flush.
The hypothesis H here could be the assumption that the deck of cards was thoroughly shuffled, such that each particular position in the deck can be assigned a probability of 1/52 of holding any particular card, each particular position in the deck with the first card removed can be assigned a probability of 1/51 of holding any particular of the remaining 51 cards, and so on.
Subject to H, the probability of being dealt Ten, Jack, Queen, King, and Ace of a specific suit in that order is 1/52·1/51·1/50·1/49·1/48, or around 3.2·10^−9. Since there are four suits, we multiply by four, and since the order of the cards makes no difference, we additionally multiply by 5! = 120 to get a probability of around 1.5·10^−6, somewhat higher than one in a million.
That is, in the case at hand, P(T | H) = 1.5·10^−6, or at least very close to that value.
If there are M = 20 groups of people playing poker, and each group has played N = 10 games, the probability of at least one Royal Flush having been dealt in the first round of a game is therefore at most M·N·P(T | H) = 3.1·10^−4. That is, M is the number of independent observers, and N is the number of times that each observer checks for an event.
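These numbers are easy to check. A minimal sketch of the arithmetic (the exact values, which the text rounds):

```python
from math import factorial

# Probability, under H, of one specific suit's Ten, Jack, Queen, King, Ace
# being dealt in one fixed order
p_ordered = 1 / (52 * 51 * 50 * 49 * 48)

# 4 suits, and 5! = 120 possible orders of the five cards
p_royal_flush = 4 * factorial(5) * p_ordered
print(p_royal_flush)             # ~1.54e-06, somewhat higher than one in a million

# Replicational resources: M observing groups, N games each
M, N = 20, 10
print(M * N * p_royal_flush)     # ~3.08e-04
```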
Now, say that a hand with the Deuce and Five of Hearts, the Nine of Spades, the King of Diamonds, and the Six of Spades had been dealt. The probability of this hand is actually lower than the probability of a Royal Flush; but even if such a hand had been dealt, no-one would have noticed, since it's not a remarkable poker hand. If any cheating is going on, we would not expect any increase in the occurrence of hands like that, but rather in the high-value hands, such as Royal Flush and Four Aces.
This leads us to the term φS(T), the specificational resources associated by S with T. The subscript S denotes a semiotic agent, which is simply anyone or anything that can communicate using some symbolic language. An event such as our T must conform to some pattern P for S to be able to communicate its occurrence, and such a pattern can be described using a string of symbols such as "Royal Flush", "Ten, Jack, Queen, King, and Ace of the same suit", or "Ten to Ace of the same suit". The descriptive complexity or semiotic cost φ′S(P) of a pattern P is the number of symbols used in the shortest description of P available to S. Conceptually, we can think of it as S having a dictionary of descriptions relevant to the subject area, beginning with descriptions of length one, continuing with descriptions of length two, and so on; S goes through this dictionary until a matching description of P is found. Assuming S has found a description for P, yet continues through the dictionary to the last entry of the same length, the number of descriptions checked is the number of all descriptions with a length shorter than or equal to the length of the shortest description of P.
The formal definition of φS(T) can be found in section 6, "Specificity", p. 17:
φS(T) = the number of patterns for which S’s semiotic description of them is at least as simple as S’s semiotic description of T.
So, it's not actually the number of descriptions available, but the number of patterns whose shortest description is shorter than or of the same length as the shortest description of T; or, put differently, whose descriptive complexity is at most the descriptive complexity of T.
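As a toy illustration, assume a hypothetical dictionary in which a "symbol" is a word and the listed description lengths are invented for the example; φS(T) is then computed by the counting rule just described:

```python
# A toy, invented dictionary: pattern -> length (in words) of its shortest
# description available to S. Entries and lengths are hypothetical.
shortest_description_length = {
    "Royal Flush": 2,
    "Four Aces": 2,
    "Straight Flush": 2,
    "Full House": 2,
    "Poker Hand": 2,
    "Deuce and Five of Hearts, Nine of Spades, King of Diamonds, Six of Spades": 14,
}

def phi_S(T):
    """Specificational resources of T: the number of patterns whose shortest
    description is at most as long as the shortest description of T."""
    d = shortest_description_length[T]
    return sum(1 for n in shortest_description_length.values() if n <= d)

print(phi_S("Royal Flush"))   # 5: all five two-word patterns
```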
That is, the patterns Four Aces and Royal Flush have the same specificational resources, as does the pattern Poker Hand; but these three patterns have different probabilities subject to the hypothesis H.
What is the point of the specificational resources? Dembski's claim is that a simple pattern, that is, a pattern with a short description, is a stronger indicator of design than a complex pattern. The 'complexity' in 'specified complexity' refers primarily to the low probability of an event occurring by chance (what Dembski calls 'statistically complex'). A pattern such as Poker Hand is as simple as Royal Flush, but, of course, any poker hand is a Poker Hand, so simplicity of the pattern is not sufficient to conclude design. A pattern such as Deuce and Five of Hearts, Nine of Spades, King of Diamonds, and Six of Spades has a very low probability of occurring; but it's not really a pattern we are concerned about, if by 'design' we mean 'cheating', although someone might claim that it's not every day you see exactly this poker hand. It's the combination of a simple pattern and a low probability that should arouse our suspicion, according to Dembski.
Why the subscript S? Because different observers may not have the same descriptions at their disposal; for instance, a person unfamiliar with poker might not know what a "Royal Flush" is, and might not know that it has special significance within the game. Therefore, specified complexity is a subjective measure.
If we look at the product φS(T)·P(T | H), then it is an upper bound on the probability that S observes an event that is at most as descriptively complex as T and has at most the same probability (cf. p. 18).
In short, the whole product M·N·φS(T)·P(T | H) is an upper bound on the probability, subject to H, that at least one of M independent observers during one of N observations will report to the semiotic agent S at least one event that is at most as descriptively complex as T and has at most the same probability.
Converting with the binary logarithm reverses the scale and turns the product into a number of bits. If M·N·φS(T)·P(T | H) < 1/2, then χ > 1. That is, if χ > 1, it can be considered more reasonable to conclude design than to conclude chance.
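Putting the pieces together for the poker example: the paper gives no value of φS(T) for poker patterns, so the φ below is purely hypothetical; the sketch only shows how χ is computed from the four factors.

```python
from math import log2

def chi(M, N, phi, p):
    """Context-dependent specified complexity: chi = -log2(M*N*phi_S(T)*P(T|H))."""
    return -log2(M * N * phi * p)

p_royal_flush = 480 / (52 * 51 * 50 * 49 * 48)   # P(T | H), ~1.54e-06
phi = 100      # hypothetical stand-in for phi_S(T); the paper fixes no number here

print(chi(20, 10, phi, p_royal_flush))   # ~5.02, i.e. chi > 1
```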
Where's the problem, if anywhere?
First off, to make specified complexity of any use in a given situation, it is necessary to know the value of P(T | H). We don't always know that value, nor do we necessarily have much of a way to estimate it.
Second, as many critics, for instance Elliott Sober, have pointed out: this one-sided approach is not actually used in design detection. There is always an implicit assumption about the capabilities of a designer, and those capabilities are assumed to be the same as those of humans. Especially in criminal cases, a motive is also required for a design conclusion, and again, what can count as a motive depends on assumptions about the designer.
Third, we are dealing with a moving target: what counts as 'design' varies from context to context. If someone writes a book, that's design, and if someone else later writes a book that is suspiciously similar to the first book, it may be a case of plagiarism. We here have two designed objects, no matter what; yet, Dembski wants design here to mean that the second author plagiarized the first author's book.
Is specified complexity information?
Dembski doesn't anywhere in the Specification paper claim that specified complexity is information, though on pp. 11-12 he discusses Fisher's eliminative approach and algorithmic information theory, also known as Kolmogorov-Chaitin information theory, or simply K-C information theory.
For any event E with probability p, the value I(E) = −log2(p) can be considered information, called by Claude Shannon the self-information. It was not really much used by Shannon, who instead used the average value of the self-information, called entropy.
Dembski, however, frequently uses the self-information, so we'll do so here as well.
Now, if M·N·φS(T)·P(T | H) < 1/2, then P(T | H) < 1/[2·M·N·φS(T)], and therefore
(EQ1) I(T | H) > log2(M) + log2(N) + log2(φS(T)) + 1.
Let's present part of what Dembski writes on pp. 11-12 in a slightly different way than his.
A fundamental result in algorithmic information theory is that for any natural number N there exist bit strings of length N that cannot be compressed to a shorter string by a given compression algorithm. We can formulate this as
Theorem 1: For any natural number N and any compression algorithm P there exists a string S of length N such that |P(S)| ≥ N.
Proof: For any natural number d the number of strings of length exactly d is 2^d. The number of strings of length at most d is then
Σ0≤i≤d 2^i = 2^0 + 2^1 + … + 2^d = 2^(d+1) − 1 < 2^(d+1)
If d < N, then 2^(d+1) − 1 ≤ 2^N − 1 < 2^N, the number of strings of length N. There are therefore more strings of length N than strings of length less than N. It is therefore not possible for a (lossless, hence injective) P to map every string of length N to a string of length less than N.
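The counting step can be checked directly. A minimal sketch (the choice N = 8 is arbitrary):

```python
from itertools import product

N = 8

# All bit strings of length 0, 1, ..., N-1
shorter = [s for d in range(N) for s in product("01", repeat=d)]

# 2^0 + 2^1 + ... + 2^(N-1) = 2^N - 1, which is less than 2^N,
# the number of strings of length exactly N
assert len(shorter) == 2**N - 1
print(len(shorter), "<", 2**N)
```

By the pigeonhole principle, no injective compressor can map every length-N string to a strictly shorter one.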
In EQ1 there is a clear similarity between the term log2(φS(T)) + 1 and log2(2^(d+1)) = d + 1 in the above.
As mentioned, the product φS(T)·P(T | H) is an upper bound on the probability that S observes an event that is at most as descriptively complex as T and has at most the same probability, or, to add some confusion, at most as descriptively complex as T and at least as statistically complex. Let U be any such event. Then we have that
(EQ2) I(U | H) ≥ I(T | H) > log2(M) + log2(N) + log2(φS(T)) + 1 ≥ log2(M) + log2(N) + log2(φS(U)) + 1.
To give an intuition for what this means, consider a communication system with M senders that each send N messages to a receiver S, which in turn uses some compression algorithm φS to compress all received messages. Let T be some message, and assume S knows the probability of receiving any particular message, just as in Shannon's model. Then S discards all messages with a higher probability than T and all messages whose compressed version is longer than the compressed version of T. If any message is kept, it will have a specified complexity at least the same as T's.
That is, if any message U is kept, and it satisfies I(U | H) > log2(M) + log2(N) + log2(φS(U)) + 1, then a design inference should be triggered; that message is not generated by chance.
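A minimal sketch of this filtering setup, with zlib standing in for S's compression algorithm and an invented i.i.d. probability model playing the role of H (the alphabet, probabilities, and message lengths are all assumptions for illustration):

```python
import random
import zlib

random.seed(1)

def prob(msg, p_a=0.9):
    """P(msg | H) under a toy i.i.d. hypothesis: 'a' with prob 0.9, 'b' with 0.1."""
    return p_a ** msg.count("a") * (1 - p_a) ** msg.count("b")

def clen(msg):
    """Compressed length in bytes, standing in for S's descriptive complexity."""
    return len(zlib.compress(msg.encode()))

M, N = 5, 20            # M senders, each sending N messages to S
T = "ab" * 20           # reference message: highly compressible, low probability

kept = []
for _ in range(M * N):
    msg = "".join(random.choice("ab") for _ in range(40))
    # S keeps msg only if it is at most as probable and at most as
    # descriptively complex as T
    if prob(msg) <= prob(T) and clen(msg) <= clen(T):
        kept.append(msg)

print(len(kept), "of", M * N, "messages kept")
```

Randomly generated messages rarely pass both tests at once, which is the point: anything that does pass is at least as "specified" as T under this toy measure.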
While it may be debatable whether specified complexity itself can be considered a kind of information, information-theoretical concepts do enter into it.
Can specified complexity increase?
Assume M people during N rounds each toss a coin 100 times and after each round send a message with the resulting sequence to S, who in turn measures the specified complexity of each sequence. Do we have any particular reason to expect an increase in specified complexity over time? Not really, since each round simply starts out in the same place as the first round.
Now assume instead that after each round, those K people, 0 ≤ K ≤ M, who scored the highest specified complexity copy their sequence to the next round. If K = 0, we have the same as above, and if K = M, nothing new will come up in the remaining rounds, so we assume 0 < K < M; that is, in each round after the first, some people will recast their sequence, and some will retain theirs. Then, obviously, we can expect the maximum amount of specified complexity to increase.
This isn't much of a model of evolution, of course, but it does illustrate that natural selection with some variation to work with can increase specified complexity, as long as there is neither complete randomization nor complete stasis.
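A rough simulation of this selection game, using zlib compression length as a stand-in proxy for specified complexity (every 100-toss sequence has the same probability 2^−100 under H, so only descriptive simplicity varies; the proxy is an assumption, not Dembski's measure):

```python
import random
import zlib

random.seed(0)

def toss(n=100):
    """One round's worth of coin tosses."""
    return "".join(random.choice("HT") for _ in range(n))

def sc_proxy(seq):
    """Toy proxy for specified complexity: all sequences are equally improbable,
    so a shorter compressed description (roughly, a smaller phi_S) means a
    higher specified complexity."""
    return -len(zlib.compress(seq.encode()))

M, K, rounds = 20, 5, 50
population = [toss() for _ in range(M)]
best_per_round = []

for _ in range(rounds):
    best_per_round.append(max(sc_proxy(s) for s in population))
    survivors = sorted(population, key=sc_proxy, reverse=True)[:K]  # keep the K best
    population = survivors + [toss() for _ in range(M - K)]         # the rest recast

# The per-round maximum never decreases, since the best sequence always survives
assert all(b <= c for b, c in zip(best_per_round, best_per_round[1:]))
print(best_per_round[0], "->", best_per_round[-1])
```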
Sure, it can be said that S, as the selector, infuses intelligence into the game; but S does not pursue any particular purpose, only the per-round selection of the sequences with the highest specified complexity.
See also: Dembski vs. Hume