The fundamental theorem of logic

John Bjarne Grover

This article was published on this internet page on 12 December 2021 under the title 'The fundamental theorem of statistics' and has in the meantime changed into a fundamental theorem of logic. Those are two sides of the same coin - the same maya screen of cognition seen from history or from eternity. Logic has, if I have understood it right, reduced its number of logical operators until there was only one left - the negation - and with the present fundamental theorem even that disappears and only 'pure logic' remains - providing the tools for the 'eternal' counting which 'historic' statistics needs in order to function.

The theorem is the formula which has been on my homepage in this article since 2007. It was the 'ditch' into which my further theoretic studies in distributional linguistics ran when I embarked on the theme in 1991, and I could not solve the paradox in those days. Today it seems that this is the paradox which everything in current history is about. Entropy measures information flux, and with the expansion (or 'expanse') given below (or 'above'?) in the multiplication with unity it turns out that the commutative law of conditional entropies - H(A) + H(B|A) = H(B) + H(A|B) = H(AB) - is valid only on the condition of absolute equiprobability of the occurring symbols. But that is not a realistic corpus of symbols - that is an abstract theoretic condition defining a 'semiotic' unit of cognition or 'philosophy'. It really means that for a logic to function, each logical unit must be 'equivalent' or 'equally important' in some semiotic sense - so that none of them counts as 'cursed' or 'privileged' - when the information flux goes through the maya screen either this way or that.

The entropy of a category A = a1, a2... an is computed as

H(A) = - ∑i p(ai)/p(A) log p(ai)/p(A)
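
As a minimal sketch, the formula can be computed from raw counts as below. Two things are assumptions of the sketch and not of the article: the logarithm is taken to base 2, and the name category_entropy and the sample words are hypothetical. The relative frequency n / total plays the role of p(ai)/p(A), the probability of ai normalised within the category A.

```python
import math
from collections import Counter

def category_entropy(occurrences):
    """Entropy of a category A from a list of its observed members.

    Each relative frequency n / total stands for p(a_i)/p(A).
    Base-2 logarithm is an assumption; the text only writes 'log'.
    """
    counts = Counter(occurrences)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Four equally frequent members give the maximal entropy for four symbols.
uniform = category_entropy(["dog", "cat", "fish", "bird"])
# A skewed category carries less entropy than the uniform one.
skewed = category_entropy(["dog", "dog", "dog", "cat"])
print(uniform, skewed)
```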

The entropy of a compound aibj, when for example A = noun roots and B = grammatical morphs which can be attached to it, or A = noun and B = verb, is consequently the simple

H(AB) = - ∑i,j p(aibj)/p(AB) log p(aibj)/p(AB)

Conditional entropy is a measure on the probability distribution in a category B when it occurs dependently relative to a category A, such as morphs relative to roots. A certain morph category B can occur in many contexts in a corpus, but the entropy measure of B in the context of the root category A will not be the same as the entropy of the category B (or A) considered independently. The conditional entropy of B relative to A is written H(B|A) and is computed by stopping at all A in the corpus and then counting how many times a B follows in the neighbourhood of A. The probability of the compound p(AB) globally in the corpus is normally defined as the probability of A = p(A) multiplied by the probability of B in the context of A = p(B|A). That is, p(AB) = p(A) p(B|A). One can count the number of AB's in a corpus and compute their probability, and one can count the number of B's in the neighbourhood of A's and compute the relative probability independently - and then counting discrete symbols in a finite corpus will prove that the mathematical relation H(A) + H(B|A) = H(AB) is valid. That is when the symbols are discrete and the corpus is finite.
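
The counting described here can be sketched on a toy corpus. The sketch assumes that every occurrence of a root is followed by exactly one morph, so that the corpus is simply a list of (root, morph) pairs. The corpus, the function names and the base-2 logarithm are all hypothetical choices made for illustration. Under that assumption the counts give H(A) + H(B|A) = H(AB), and the same holds with the roles of A and B exchanged.

```python
import math
from collections import Counter, defaultdict

def entropy(counter):
    """Shannon entropy in bits of the relative frequencies in a Counter."""
    total = sum(counter.values())
    return -sum((n / total) * math.log2(n / total) for n in counter.values())

def conditional_entropy(pairs, given):
    """H(other | given): given=0 conditions on the root, given=1 on the morph."""
    contexts = defaultdict(Counter)
    for pair in pairs:
        contexts[pair[given]][pair[1 - given]] += 1
    total = len(pairs)
    # Weight the entropy inside each context by that context's relative frequency.
    return sum((sum(c.values()) / total) * entropy(c) for c in contexts.values())

# Hypothetical corpus: every root occurrence carries exactly one morph.
corpus = [("dog", "-s"), ("dog", "-s"), ("dog", "-ish"),
          ("cat", "-s"), ("cat", "-like"), ("cat", "-like"), ("fish", "-s")]

h_a = entropy(Counter(root for root, _ in corpus))
h_b = entropy(Counter(morph for _, morph in corpus))
h_ab = entropy(Counter(corpus))
left = h_a + conditional_entropy(corpus, given=0)   # H(A) + H(B|A)
right = h_b + conditional_entropy(corpus, given=1)  # H(B) + H(A|B)
print(left, right, h_ab)
```

The three printed values agree up to floating-point rounding. The sketch says nothing about corpora in which some roots occur without any morph; that case needs a separate count of the bare roots.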

This is one of the laws which apply here - the addition and commutative laws of conditional entropies. The addition law is H(A) + H(B|A) = H(AB), which means that the entropy of category A plus the conditional entropy of category B in the context of A equals the entropy of the compound AB. The commutative law says that H(A) + H(B|A) = H(B) + H(A|B), which means that the addition law applies equally when the dependencies of the relation are imposed on either A or B. Now since

H(B|A) = - ∑i p(ai) ∑j p(bj|ai)/p(B|ai) log p(bj|ai)/p(B|ai)

and

H(A) = - ∑i p(ai)/p(A) log p(ai)/p(A)

and since ∑j p(bj|ai)/p(B|ai) = 1.0 for all ai, this can be inserted into the expression for H(A), which then gives

H(B|A) + H(A) = - ∑i,j p(ai)p(bj|ai)/p(A)p(B|ai) log p(bj|ai)/p(B|ai) - ∑i,j p(ai)p(bj|ai)/p(A)p(B|ai) log p(ai)/p(A)
= - ∑i,j p(aibj)/p(A)p(B|ai) log p(aibj)/p(A)p(B|ai)

while the entropy of the compound AB is given by

H(AB) = - ∑i,j p(aibj)/p(AB) log p(aibj)/p(AB)
= - ∑i,j p(aibj)/p(A)p(B|A) log p(aibj)/p(A)p(B|A)

This gives - if I have got the math right here and it is not an error - the formula of addition of conditional subset entropies, which tells that H(A) + H(B|A) = H(AB) means that

 - ∑i∑j p(aibj)/p(A)p(B|ai) log p(aibj)/p(A)p(B|ai) = - ∑i∑j p(aibj)/p(A)p(B|A) log p(aibj)/p(A)p(B|A)

It means that the formula is valid if p(B|ai) = p(B|A) for all ai, which I suppose it is normally not. It is so only for the case of equiprobability of all p(B|ai) - or maybe one can find a strange distribution wherein it happens to be equal against expectations. That distribution should be interesting.
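
The condition can be tried out numerically. The following is only a sketch under stated assumptions: p(B|ai) is estimated as the fraction of the occurrences of ai that are followed by some B, the weights in H(B|A) are p(ai)/p(A), the logarithm is base 2, and the counts and function names are invented for the illustration. In the first toy corpus both roots take a B half of the time and the two sides agree; in the second the rates are 0.8 and 0.2 and the two sides differ.

```python
import math

def entropy(weights):
    """Shannon entropy in bits of weights normalised to sum to 1."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)

def addition_law_sides(root_counts, compound_counts):
    """Return (H(A) + H(B|A), H(AB)) for a toy corpus.

    root_counts[i]        -- occurrences of a_i, with or without a following B
    compound_counts[i][j] -- occurrences of the compound a_i b_j
    """
    n_a = sum(root_counts)
    h_a = entropy(root_counts)
    # For each a_i: entropy of b_j among the B's that follow a_i,
    # weighted by p(a_i)/p(A).
    h_b_given_a = sum((n_i / n_a) * entropy(row)
                      for n_i, row in zip(root_counts, compound_counts))
    # Compounds normalised within the set of all compounds, i.e. by p(AB).
    h_ab = entropy([c for row in compound_counts for c in row])
    return h_a + h_b_given_a, h_ab

# p(B|a_1) = 5/10 and p(B|a_2) = 15/30: the same rate for both roots.
equal_rates = addition_law_sides([10, 30], [[2, 3], [6, 9]])
# p(B|a_1) = 8/10 but p(B|a_2) = 6/30: different rates.
unequal_rates = addition_law_sides([10, 30], [[4, 4], [3, 3]])
print(equal_rates, unequal_rates)
```

The reason for the difference in the second corpus is that the distribution of the roots among the compounds (8 against 6) is no longer the distribution of the roots in the corpus as a whole (10 against 30).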

This phenomenon can be called 'the fundamental theorem of statistics' - which tells that symbolic manipulation in theoretic space can give an adequate understanding of a distribution in historic time if the historic distribution behaves as regularly as the semantic assignment of the symbols does. Conditional entropies concern the amount of dependency between occurrences, and clearly for a symbolic manipulation to be valid there must be some dependency but not too much, so to speak. 'The fundamental theorem of statistics' proposes that the adherence of occurrences in history is about as sticky as it is among theoretic symbols and therefore statistical computation is possible. But it must be admitted, the theorem tells, that it is a strange idea.

Statistics is probably an inherently applied science - and this theorem tells that symbolic language takes shape when the distributions have become sufficiently regular.

I discovered the puzzling phenomenon in the spring of 1991 when I was about to write my MA thesis on non-discrete probabilistic grammar. I was soon left by the roadside with this riddle and could not really get off the spot, so I wrote a thesis with some phonemic corpus investigations with phonetic duration values attached. The thesis and degree were accepted, but I later got into difficulties and was outright ditched through some years after I wrote a doctoral dissertation on the reasons for assuming a poetic logic in natural language. Considering how massively important statistics is for the administration of society, one could ponder whether this riddle still is the problem.
