Generalized numbers and natural language

John Bjarne Grover

I wrote the following in Oslo, probably in 1993, while I lived in Norderhovgata on Kampen. The 'paper' is not entirely finished, and it is not really my field of interest or ability. It can be seen as quite simple and can basically be reduced to the trivial idea that if the base for numeral notation is the same as the number of alphabetic letters, then the only difference between numbers and words is the redundancy of distribution in natural language - and that cannot be seen as very original. However, today this 'paper' can be seen as the theoretical basis for the chapter '20 Gedichte - in September und Oktober' in my 'SNEEFT COEIL' (2014-2018): That chapter, as the theory goes, implements the idea that the 20 poems tell, in each and every part of themselves, the secret of the corresponding number, such that the words in poem 3 at the point 0,718 (that is, poem 2,718) will tell the character of the constant 'e' (line 12 of the 16-liner is "dass jeder Gott war ewig mein") and in poem 4 about 0,1428 through there is the character of the constant 'π' (line 3 of the 16-liner is "Geht es schnell über den Ort" - 'snelle' is Norwegian for 'reel'). See this poem #11 from the chapter in the book - on the number '11'. Now the interesting discovery - told in this file - is that the 20 poems constitute a long string of alphabetic symbols that seem (as the theory goes) to follow quite closely the progression of digits in the so-called 'Catalan constant' G = 0,91596... = the integral from 0 to 1 of arctan(x)/x. Indeed, if my poems describe the nature of number for each instant (philosophically, poetically, semiotically, psychologically), and these poems also follow the progression of the digits in the Catalan constant, then it is far beyond any trivial idea of a relation between letters and numbers. Then one can also assume something very interesting for language in the definition of the constant.
Line 15 in the first 16-line poem - #1 - is "Der Mensch wenn er ist selbst einsam". It is probably this that constitutes the poetic implication of the theoretical basis in this paper.
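The constant G referred to above can be evaluated numerically. The following Python sketch is an illustration only, not part of the original notes, and its function names are hypothetical. It computes G in two independent ways: from the standard alternating series G = 1 - 1/3^2 + 1/5^2 - 1/7^2 + ..., and from the integral of arctan(x)/x over [0, 1] using Simpson's rule. The two results should agree to about nine decimals.

```python
import math


def catalan_series(terms: int) -> float:
    """Partial sum of G = sum over k >= 0 of (-1)**k / (2k + 1)**2.

    For an alternating series the truncation error is below the first
    omitted term, 1 / (2 * terms + 1)**2.
    """
    return sum((-1) ** k / (2 * k + 1) ** 2 for k in range(terms))


def catalan_integral(n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of arctan(x)/x on [0, 1].

    arctan(x)/x tends to 1 as x -> 0, so that limit is used at x = 0.
    n (the number of subintervals) must be even.
    """
    def f(x: float) -> float:
        return 1.0 if x == 0.0 else math.atan(x) / x

    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3


print(f"{catalan_series(100_000):.9f}")  # 0.915965594
print(f"{catalan_integral():.9f}")       # 0.915965594
```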

I do not know when knowledge of 'Al Qaida' (Arabic for 'the base') first surfaced in western news, but I think it was the name used for the bombings of the US embassies in Dar-es-Salaam and Nairobi on 7 August 1998. See this file for more about this. If the US embassies were bombed for lending punch to the rejection of my PhD dissertation, one could also speculate that the name of Al Qaida could be conceived in reference to the following article, which has remained unpublished until the present day (26 May 2021). I found it again some months ago when I went through some old papers. Of course my computer could have been surveilled and the 'paper' stolen from it already in 1993 or later. I also gave a copy of what was probably an earlier draft to Herman Ruge Jervell, and I think he read through it - and I think he mentioned uncountability ('overtellbarhet') as a relevant aspect - which was nice of him - I think I was not even formally attached to the institute any longer but was planning a project and an application for it. Today I can recognize the theory of rapes of me (in drugged condition, that would have been, and the theory has little or no empirical basis) at the age of puberty - including a sanitary towel with menstrual blood over mouth and nose - in the names of the two most famous leaders of Al Qaida. If that were the reason for the western news on the terror, even after 11 September 2001, it could have been for the idea 'where did the ideas come from?' - or, simply, the bombings could have been conceived as the view of certain 'secret intelligence services' that my ideas could come from their intrigues on my life and work. Which is nonsense, as my TEQ book 14 seems to prove as well.

When I was hospitalized against my will in the psychiatric hospital 'Sandviken Sykehus' in Bergen in 1999, I was forced to take the psychiatric medication called 'olanzapin' - which for my personal story could mean 'U-land-seminaret' with Olav Skundberg in 1973 and the woman in the hotel room. I drank my first red wine there. If one considers this red wine a 'numb-er', there is also the possible idea of the 'generalized number' in the recent military coup in Burma.

Is my 2019 discovery - or call it a 'theory', since I have not had the chance to study it more thoroughly - of the Catalan constant relative to my '20 Gedichte' verified? If it is, then there could be quite interesting corollaries there. Whether this would suffice for larger-scale intrigues, I don't know. Perhaps it could. The following article looks quite innocent and was perhaps considered harmless and could safely be launched as a 'real bomb', the PTRSIM PIK (a la 'Wittgenstein'?) striking back against the empire of suppression with this 'proof' of his genius - and some could perhaps say that the US embassies of 1998 or the twin towers of Manhattan 2001 could have been such 'proof-making' - in which case it could perhaps have been felt as more alarming if the Catalan material should contain more interesting material relative to my book or chapter.

If this also should lead to new semiotic theories with potentially far-reaching (technological or other) consequences, it is interesting to observe also the fraction of the line in the introductory abstract below, how "to give the broad outlines of a unified theory of signification in number and natural language" could even have given rise to constructions of 'submorphemic' (not 'submarinic') kind in "road outlines of a unified theory of signification in number" - see also these comments on the apparent female body in the 'road outlines' - in which case the program could have been to go for a mock heralding of (me as) the PTRSIM PIK in the form of a new 'nazi' - as a disciple of Adolf Hitler. Which would be nonsense, of course. But it is not impossible that somebody is trying to turn some of my ideas into their own.

Added on 28 May 2021: In the APP-s-tract to the paper below there is also the line "approaching a common theory for the natural and the social sciences as well as the humanities". Why is it called APPs today, while some years ago it was called 'computer program'? APP-roading a common theory - and the 'road/raw-Ding' is in Hütteldorfer 'Arabe'? When I was hospitalized against my will at 'Sandviken sykehus' 1999-2000, I explained the differences in evaluation of my condition as a matter of 'faculties': A psychiatrist may consider it a disease and a sign of bad health while a poet considers it inspiration (and a mystic a revelation) and hence a sign of good health. If there is a so-called 'intelligence project' on my person - regulating matters like these - it is of course a strictly illegal project that must be stopped.

Here is the article in its original unfinished form from I think 1993:

Generalized numbers and natural language

John Grøver

The paper is an attempt to bring the signification of number and natural language into a common form. It is suggested that the notion of countability, as it was developed by Cantor, must be replaced by a semantically based definition, and a revised generalized continuum hypothesis is suggested following from a revised definition of number. The main interest in the paper is nevertheless to give the broad outlines of a unified theory of signification in number and natural language, with the further goal of approaching a common theory for the natural and the social sciences as well as the humanities.

1. The arbitrary signification of natural language.

It is a generally acknowledged trait of human natural language that it signifies arbitrarily. This may well be considered a dogma of modern linguistics, at least since de Saussure, although the history of the idea of arbitrariness in natural language is long and complex - often considered as going back at least to Aristotle or even Democritus, and contrasting with an equally perennial view that language signifies in a motivated way, through some sort of natural iconicity with inherent meanings in its smaller parts. I will enter neither the problem of iconicity in language nor the history of this thought here, but restrict this important and complex topic to the observation that the idea of arbitrariness seems generally to have been confused with the idea that the distribution of speech sounds is irrelevant for linguistic signification.

2. Redundancy in submorphemic structure.

This has led to a situation in which the well-known redundancy patterns have largely been ignored in questions of semantics. On the one hand, one has readily acknowledged that there are large distributional constraints on speech sounds, as described in phonotactics and in observations on the universal presence of morpheme structure constraints. On the other hand, the pervasive idea of arbitrariness seems to prohibit any assumption of a systematic significational function of these distributional constraints, with the result that these phenomena are described in isolation from the signification of language.

To assert a symbolic level in linguistics is to maintain the double articulation of language, whereby there is a phonological level which is in principle independent of the signification on the morphological (symbolic) level. This conception of language as doubly articulated allows morpheme distribution to be acknowledged as relevant for grammatical signification, while phoneme distribution is seen as irrelevant for lexical signification. It is here that we find the systematic split between lexicon and grammar, which is what the notion of a double articulation with a symbolic level is all about. It means that the distributional constraints are recognized as significative down to morpheme level, but not further. Submorphemic redundancy is seen as relevant for perception and segmentation of natural language, but not for signification.

3. The 'glue' in the symbol.

This means that there is internal structure in linguistic symbols which is ignored as far as signification is concerned, while the redundancy (= structure) above the morphemic level is considered relevant for signification (by syntactic structure). Hence the notion of a symbolic level, on which signification takes place arbitrarily in an unmotivated labelling, is closely associated with a lower bound on the scope for relevance of redundancy for signification. The symbol thus emerges as that which signifies unmotivatedly. This leaves open the old question of how the semantic content is 'glued' to the formal expression of the symbol. On the symbolist view, the 'glue' between the formal representation and the content of the symbol may remain mysterious: A symbolic level says nothing about how the two sides of the symbol came to be united, as long as they are tied to each other.

I will here suggest that the glue in linguistic symbols is the redundancy itself. That is to say, there must be internal structure in terms of distributional constraints for a string to be a symbolic carrier of meaning in natural language. By this, we do not have to propose that there must be a consistency in the mapping from the internal structure of the formal expression onto the internal structure of the entity referred to, although there are probably reasons to assume such a submorphemic mapping. All we say is that there must be redundancy on both sides of the symbol for signification to be established in natural language semantics. This is, of course, not much more than to say that both the formal expression and the content must be perceptually and cognitively distinguishable - which makes the claim rather weak. It rests on the assumption that there must be redundancy in perceptual input for this to be singled out as an entity on the basis of experience. The assumption is perhaps almost too self-evident to call for scrutiny. Hence we do not have to assume motivatedness in the sense of a submorphemic signification, which has been so frequently sought in the history of linguistic thought ever since Greek antiquity. All we say is that the two sides of the linguistic symbol are glued to each other by redundancy in the distribution of perceptual data, and, therefore, that the signification of natural language strictly presupposes redundancy patternings.

4. The general nature of submorphemic signification.

But although we maintain that we need not propose the reality of any submorphemic signification in natural language for this definition, we may still consider what will have to be the general nature of such signification on a formal basis, without any assumptions as to its empirical reality.

In general, the longer a string is, the higher its semantic specificity will be on average: Its capacity for selecting subsets from the set of facts-of-reality increases with its length, and, conversely, the shorter the string is, the larger the selected subsets will become. (For example, the set of 'red houses' is smaller than the set of 'houses'.) When a basic symbolic level is hypothesized as coinciding with the morphemic, it means that one assumes a limit of generality, which is reached at the morpheme level: Below this, there cannot be linguistic signification.

Nevertheless, it is clear that if a submorphemic signification should be hypothesized, one would have to expect far more general meanings than are found on the morphemic level. Since we do not know where these meanings should be found, the question must remain open. But we may still assume that semantic specificity correlates with string length, such that a submorphemic signification would pertain to properties more general than those found on the symbolic/morphemic level.

In a forthcoming work I discuss the possible nature of submorphemic signification in more detail, and suggest that the denotata of such signification are to be found jointly in the social and the perceptual spaces, to the effect that submorphemic signification must refer largely to social events, and as such deviate markedly from the signification we ordinarily associate with linguistic naming. The referents of subsymbolic signification are not strictly in the sensorially perceptual domain, which explains why it has been so difficult to recognize.

5. The semantics of natural number.

Since, then, it is by its constrained strings and its particular redundancy patterning that natural language refers to the actual world, we will assume that, as a general principle, the degree of redundancy in the string somehow correlates with the degree of context-dependency in the reference, such that a highly constrained string is associated with a contextually constrained referent, and thus that the reference is constrained as to the number of possible worlds. This is, basically, the content of the claim when we assert that the linguistic symbol and its referent must be cognitively and perceptually distinguishable.

However, on this assumption, we will find that the maximal semantic generality appears on the single symbol level in a language in which there are no recurring constraints on the distribution of symbol sequences, and the number of distinct referents will be equal to the number of symbols in the alphabet. Evidently, this will not be a natural language, but it will be the semantics of the symbols occurring in natural numbers.

Natural number can be defined in this way: It is that property which can be assigned semantically to the symbols of a completely unconstrained language, wherein the conditional probabilities of all symbols equal the unconditional, and the symbols are equifrequent.
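The definition can be illustrated with Shannon's notion of redundancy. As an illustrative assumption not found in the text, the sketch below measures only first-order constraints (each symbol given its predecessor), and its function names are hypothetical. Redundancy is taken as one minus the conditional entropy divided by its maximum, log2 of the alphabet size. A code in which conditional probabilities equal the unconditional ones and the symbols are equifrequent then scores near 0, while a code in which each symbol fully determines the next scores 1.

```python
import math
import random
from collections import Counter


def conditional_entropy(text: str) -> float:
    """Plug-in estimate, in bits, of H(next symbol | previous symbol)."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])  # marginal counts of the conditioning symbol
    n = len(text) - 1            # number of adjacent pairs
    # H(B|A) = sum over (a, b) of p(a, b) * log2(p(a) / p(a, b))
    return sum((c / n) * math.log2(firsts[a] / c) for (a, _), c in pairs.items())


def redundancy(text: str, alphabet_size: int) -> float:
    """1 - H(next | previous) / log2(alphabet_size); near 0 for an unconstrained code."""
    return 1 - conditional_entropy(text) / math.log2(alphabet_size)


rng = random.Random(0)
digits = "".join(rng.choice("0123456789") for _ in range(100_000))
print(round(redundancy(digits, 10), 3))  # close to 0: unconstrained, 'number-like'
print(redundancy("ab" * 1000, 2))        # 1.0: each symbol fully predicts the next
```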

Now, if we think of languages in terms of degrees of redundancy, we will find that a highly constrained language, such as natural language typically is, will possess this property by virtue of referring to the actual world. When, for example, generative grammar isolates the syntactic component to let it generate syntactically well-formed sentences which are not really found in actual use ('colourless green ideas sleep furiously'), we find that the redundancy of the linguistic code (here on the syntactic level) is reduced, while the set of worlds referred to is increased. In general, we will expect to find that the more the language approaches random distribution by relaxation of distributional constraints, that is, the more the conditional probability approaches the unconditional, the larger the set of worlds referred to will become. This means that numbers refer to the entire set of possible worlds, while natural languages refer to a restricted set of possible worlds (approaching the actual).

If the definition of rigid designators is that they refer uniformly to all possible worlds, then we see that natural numbers will be rigid designators par excellence.

6. The reference of natural language.

Natural language has redundancy, and we can see the redundancy patterning of a specific language, which as a whole completely determines its lexicon and grammar, and therefore each sentence as well, as idiosyncratically referring to a specific world by virtue of this particular patterning. By this, the reference of natural language can be seen as a narrowing of the reference to the entire set of possible worlds down to the reference to the actual world, by means of imposing a particular redundancy patterning on the language. Any such selectional constraint is possible, and there is an infinity of worlds (or languages) expressible by such redundancy imposition. To the extent that natural language refers to the actual world, we can see the ontology of the actual world as described by the way the redundancy is imposed on the language. The actual world is restricted relative to the entire set of possible worlds in the same way as the natural language is restricted distributionally relative to the equivalent number system without redundancy.

7. Real numbers.

I have suggested, in line with common views, that the natural numbers should be interpreted as referring to the set of possible worlds and thus be rigid designators. This is done by means of a semantically based definition of number. I will now suggest that this definition should be refined to yield real numbers when the alphabet is infinite, and natural numbers when it is finite. The basic idea behind this is, in contradistinction to Cantor's classical account, not a matter of countability, but rather a property of undefined signification in strings over infinite alphabets. To make this clearer

However, this should be modified to yield reference to discrete entities, since they will refer to countables. The specific semantic content of each symbol (beyond its general property of referring to a discrete entity) in its position in a string will be a function of alphabet size, and we have tacitly assumed a finite alphabet and equifrequent symbols. We will now define the real numbers as the extension of number strings over an infinite alphabet (but still with equifrequent symbols). However, we will here, contrary to what is suggested by Cantor's principle of diagonalization, maintain that the real numbers do not differ from the naturals in terms of countability, but rather in the semantic impossibility of assigning a numerical interpretation to strings over an infinite alphabet. But this, again, depends on a revised definition of infinity, on the basis of the following argument:

We can enumerate all finite subsets of N by means of an indexation in binary notation as follows: We let the first digit position in a number string represent the number 1, the second digit the number 2, and so forth, such that each digit position represents the respective natural number. Then we let '1' in a digit position represent that the respective number is present in the subset and '0' that it is absent, such that 110 represents the subset {3,2}, while 1001 encodes {4,1}, and so forth. Then each finite string representing a distinct natural number in binary notation will encode a certain finite set of natural numbers, and the index, in binary notation, of the set will be identical with the encoding of the indexed set. In this way, any finite subset of N can be listed, which means that every finite set will receive a finite index in binary notation.
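The encoding can be checked mechanically. The Python sketch below (function names hypothetical) follows the convention of the examples in the text, where the rightmost digit stands for 1, so that 110 is {3,2} and 1001 is {4,1}; it assumes subsets of positive integers only. Because every binary numeral decodes to exactly one finite subset and every finite subset encodes to exactly one numeral, running through the natural numbers in order runs through all finite subsets.

```python
def encode(subset: set[int]) -> str:
    """Binary index of a finite subset of N: digit position p (counted from the right,
    starting at 1) is '1' if p is in the subset and '0' otherwise."""
    if not subset:
        return "0"
    return "".join("1" if p in subset else "0" for p in range(max(subset), 0, -1))


def decode(index: str) -> set[int]:
    """Inverse of encode: collect the positions (from the right) that hold a '1'."""
    return {p for p, digit in enumerate(reversed(index), start=1) if digit == "1"}


print(encode({3, 2}), encode({4, 1}))  # 110 1001
# Reading each natural number k in binary as an index lists every subset of {1, 2, 3}:
print([sorted(decode(format(k, "b"))) for k in range(8)])
# [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
```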

The indexes of this list are now rewritten into ternary notation, and we enter the finite subsets on the same indexes as they had in the former list. We can then use all indexes containing the digit '2' to enter infinite subsets. Now, if N can be exhaustively enumerated in binary as well as ternary notation, we face the following paradox: If we now take this list and rewrite it into binary notated indexes, all finite subsets can receive the same index as they had in the first list, but we will now not have any finite indexes left for the infinite subsets. Then one of two things must be the case: Either the cardinality of the set of numbers which can be written out in a certain base will depend on the base, or the infinite subsets which were in the list with ternary notated indexes must now receive infinite indexes in binary notation. Neither of these possibilities is in line with traditional assumptions. Since we will here assume that both possibilities express the same underlying point, we will adhere to the second before we revert to a discussion of both.
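The first step of the paradox can be made concrete on a finite segment. The sketch below assumes, as a reading of the text, that entering the finite subsets "on the same indexes" means keeping each binary digit string and reading it as a ternary numeral; the function names are hypothetical. Below 3^n, the subsets of {1, ..., n} then occupy only the 2^n indexes written with the digits 0 and 1 alone, and the remaining 3^n - 2^n indexes, all containing the digit '2', are left free for further entries.

```python
def to_base(n: int, base: int) -> str:
    """Digits of a non-negative integer n in the given base (2..10), most significant first."""
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(str(r))
        if n == 0:
            return "".join(reversed(digits))


def ternary_slot(binary_index: str) -> int:
    """Keep the digit string of a binary index but read it as a ternary numeral."""
    return int(binary_index, 3)


# The 8 finite subsets of {1, 2, 3} have binary indexes 0 .. 111.
used = sorted(ternary_slot(format(k, "b")) for k in range(8))
free = [m for m in range(27) if "2" in to_base(m, 3)]
print(used)       # [0, 1, 3, 4, 9, 10, 12, 13]
print(len(free))  # 19 ternary indexes below 3**3 contain a '2'
```

Only finite segments can be inspected in this way; the passage from these segments to the infinite subsets is what the argument itself has to carry.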

So, when we rewrite the indexes of this list into binary notation, we propose that the infinite subsets in the ternary notated indexes must now receive infinite indexes in binary notation: There is no other possibility. We then have a list which contains finite and infinite indexes, on which finite and infinite subsets are entered. In principle, we can rewrite a list with indexes in any base in the same manner. Since we can choose a base which contains all letters and digits, all finitely definable infinite subsets can be entered in a list with indexes in expanded base, and then rewritten into binary notation with finite and infinite indexes.

There are, in this context, three interesting groups of subsets of the natural numbers:

1) The finite subsets
2) The finitely definable infinite subsets
3) The (by finite means) undefinable infinite subsets

The union of the first two of these is of course countable according to Cantor's concept of enumerability, but the crucial question is whether the third is a countable set or not. We here propose that it is countable, for the following reason: Both 2) and 3) consist of infinite strings (they jointly add up to the set of infinite strings over a binary alphabet), and the strings in 2) represent natural numbers. If only 2) were countable, it would imply that only some infinite strings could be uniquely interpreted as natural numbers, and therefore that not all digit sequences could be interpreted as natural numbers. In short, if 3) is uncountable, the infinite string representation of natural numbers would be restricted to only a subset of the infinite strings, and therefore infinite strings, as natural numbers, would have to exhibit distributional constraints on permissible symbol occurrences.

If, now, we consider all infinite strings representing natural numbers as numbers written backwards, we find that all digit positions will have a definite semantic value, and the semantic principle will be identical to the principle for interpretation of the finite strings. Since any finite substring of an infinite string will have a finite string correlate, it is clear that there cannot be constraints against certain digit sequences if the strings are to be interpreted as numbers. Hence if some infinite strings can be interpreted as (unique) natural numbers, then all infinite strings must be interpretable in this way.

This means, in short, that all strings, finite or infinite, over a finite alphabet can be interpreted as natural numbers, and the set of infinite strings must therefore, contrary to what is asserted by Cantor's diagonal proof, be countable.

Since, then, the set of finite and infinite strings over a binary alphabet is countable, and there is a bijection from this set to the power set of the natural numbers by the encoding principle we have established, it follows that the power set of N must be countable as well. Indeed, if this is the case, we may well ask whether we can assume the existence of any set larger than N in terms of countability.

This suggests that there must be something wrong with the basis for claiming the uncountability of the power set, that is, with Cantor's proof by diagonalization. On the one hand, I think it can be shown that the traditional Cantorian approach can be maintained, just as Newtonian physics has not been invalidated for practical physics by Einsteinian relativity theory, and therefore that Cantor's proof can be seen as valid for any conception of number which proceeds from the successor function and ignores the impact of the symbolic representation of number. This will, then, be a different conception of number and arithmetic from the one that emerges from the present account. The paradox nevertheless points to a true incompatibility between the approaches. I think it can be shown that this amounts to a distinction between the inner and the outer: When number is defined in terms of its symbolic representation, it arises from the intramental interpretation of the extramental existence of symbols. When, however, it is defined by means of the successor function, an intramental construct (without any independent extramental existence) is projected onto an extramental symbolic representation. The former approach, which is advocated here, gives a finitist priority to the extramental existence, in which there cannot be true infinity, while the successor function, which is a mental construct, can of course produce any theoretical infinity without having to make it representable extramentally. This seems to be the difference, and it suggests that the incompatibility we have pointed to here concerns the ontological status of extramental as compared to intramental existences. Hence if we acknowledge the validity of both approaches, we do in fact recognize a true schism between the infinity we can represent externally and the one we can recognize internally.

This suggests that we must redefine the notion of infinity in order to account for this peculiarity. If we now revert to the list with infinite indexes, we may ask where in the list an infinite subset is entered. If, say, we have the index i = 010101..., which indexes the set of even numbers, we ask: Where is it in the list? The answer is, of course, that we do not know where it is. We know a lot about it: For example, we can say a lot about where it can be, and where it cannot be (we can specify uniquely all positions in the list at which it cannot be found), but we do not have enough knowledge about the index to settle upon one specific position in the list. This is, we will contend, what makes this index different from a finite one: For the latter, we have enough knowledge to specify its position in the list uniquely, but an infinite index differs from this in that we have incomplete knowledge about it. It is, we will propose, not different from the finite index in terms of string length per se, but in terms of completeness of knowledge. We have in-finite knowledge about the index: Therefore we do not know exactly where it is in the list.
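The situation of the index i = 010101... can be sketched through its finite truncations. Using the encoding introduced for finite subsets and a hypothetical helper, the first n digits of i (read from position 1, as for infinite strings written backwards) give the subset {2, 4, ..., n}, which has a definite finite position in the list. These positions grow without bound as n increases, and none of them is the position of the full set of even numbers: each truncation adds knowledge about i without completing it.

```python
from itertools import cycle, islice


def truncated_index(pattern: str, n: int) -> int:
    """Finite index of the subset given by the first n digits of a periodic infinite index.

    The infinite index is read left to right from position 1 (the 'written backwards'
    reading of infinite strings); position p contributes 2**(p - 1), as in the
    binary encoding of finite subsets.
    """
    digits = islice(cycle(pattern), n)
    return sum(2 ** (p - 1) for p, d in enumerate(digits, start=1) if d == "1")


# i = 010101... picks out the even numbers; each truncation {2, 4, ..., n} has a finite index.
for n in (2, 4, 8, 16, 32):
    print(n, truncated_index("01", n))
# 2 2
# 4 10
# 8 170
# 16 43690
# 32 2863311530
```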

Infinity on this account concerns, then, the intramental knowledge about the extramental string. When the cognitive mapping from the external string to the internal interpretation is complete, then the knowledge (or index, if you like) is finite, but if the mapping is faulty, then the index is infinite.
Hence an infinite string is not necessarily longer than a finite one: Rather, the question of length is irrelevant for an infinite string, since we have incomplete knowledge about its length. Hence if the difference between finite and infinite strings is not in terms of length (but in knowledge about the length), then we see that the underlying (that is, on this account, the extramental) set of infinite strings is identical to the set of (extramental) finite strings. Therefore the set of infinite strings will be countable, since it is identical to the set of finite strings, and this explains the apparent paradox.

And, having reached this conclusion, we can return to the beginning of the argument, which now will look slightly different: We stated that when we switched from ternary back to binary notation of the indexes, there were many infinite strings which we could not get into the list, even though we stated that they had to be there, since there was space for them within N, and this was the reason for assuming that infinite strings could denote unique natural numbers. But since we now state that the set of infinite strings is identical to the set of finite strings, we may well ask whether this is a valid implication at all: On the present account, the infinite strings do not come in addition to the finite ones, but are simply incomplete representations of them. Hence there is no need to assume that we have a host of additional strings which we cannot get into the list of binary notated indexes: In fact, the finite strings exhaust all possibilities. But then: What comes out of our argument? Does it not collapse under its own weight? Far from it. If we give up the idea of infinity as incomplete knowledge about the finite strings, then we are thrown back to the beginning, and we can once again generate the same argument validly. Hence our argument is a ladder which we, to use Wittgenstein's (TLP 6.54) metaphor, throw away after we have used it: It takes us from the intramental to the extramental priority for the definition of number, and we leave it there.

Thus, the set of finite strings is identical with the set of infinite strings: The only difference is in the degree of specificity with which we can select a member from the set, which is a difference in knowledge about the set. If the string is finite, then we know which string we are choosing from the set. If, however, the string is infinite, then, due to incomplete knowledge, we do not know precisely which string we are selecting, even if we may be able to delimit the set of possibilities somewhat.

Similarly, the infinite set of finite strings is also underlyingly finite: It is infinite because we do not know its size. Therefore it is also one single set. But since we do not know its size, we cannot determine any upper limit to this size. Therefore it is infinite in the good old sense of it, and yet we must assume that this infinity arises from a cognitively faulty mapping from the underlying finite set onto our incomplete knowledge about it. For example, if the set is larger than can be enumerated in a lifetime or in the span of the human race, it may well be finite, but is of course still completely infinite in any cognitive sense of it. This is a reasonable perspective to adopt in the present context.

8. The semantics of the real numbers.

On this background, we can attempt a definition of the real numbers as based on the semantics of an infinite alphabet. That the alphabet is infinite means, with our definition of infinity, that we do not know its size. This suggests that we cannot assign any interpretation to a string (that is, to more than one symbol in succession). The string '11' can mean the natural number 'three' if the alphabet is binary, or 'eleven' if it is decimal, or, in general, it can mean any natural number, at least any natural number equal to or larger than its own length. The problem for our interpretation is that we do not know the base.
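The dependence of a string's value on the base can be listed directly. The sketch below uses Python's built-in int(numeral, base), which accepts positional bases from 2 to 36; the helper name is hypothetical, and the unary (tally) reading, in which '11' would be two, is not covered. In base b, '11' denotes b + 1, so without knowledge of the base the string is compatible with every value from 3 upwards.

```python
def readings(numeral: str, bases: range) -> dict[int, int]:
    """Value of a numeral in each base (2..36) in which all of its digits are legal."""
    values = {}
    for base in bases:
        try:
            values[base] = int(numeral, base)
        except ValueError:  # a digit is too large for this base, e.g. '9' in base 8
            continue
    return values


print(readings("11", range(2, 11)))
# {2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11}
print(readings("29", range(2, 13)))
# {10: 29, 11: 31, 12: 33}
```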

A geometric illustration: We let a digit position amount to unity on a number line. The number of symbols in the alphabet tells us how many parts we shall divide the unit into. Binary yields two halves, decimal ten parts: Infinity means that we do not know how many parts it is divided into. This is how the continuum can appear from this code: If we know not the number of parts which the line is divided into, it is clear that any point on the line can be the address of a division line between two parts. If, though, we know the base (it is finite), then there is a limited number of points which can be on the boundary between parts of the unit length: Not any point can appear in this position.
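The geometric picture can be tested for rational points. The sketch below (a hypothetical helper, with exact arithmetic via Python's fractions.Fraction) asks whether a point of the unit line ever lies on a division line when the unit is subdivided repeatedly in a fixed base. With a known base, some points are excluded for good, as 1/3 is in base 2; letting the base vary, every rational point p/q is reached, at the latest by base q.

```python
from fractions import Fraction
from math import gcd


def on_some_division_line(x: Fraction, base: int) -> bool:
    """True if x is a boundary point once the unit is cut into base**d equal parts for some d.

    With x = p/q in lowest terms this holds exactly when q divides some power of
    the base, i.e. when every prime factor of q also divides the base.
    """
    q = x.denominator
    while (g := gcd(q, base)) > 1:
        q //= g
    return q == 1


third = Fraction(1, 3)
print(on_some_division_line(third, 2))  # False: no binary subdivision hits 1/3
print(on_some_division_line(third, 3))  # True: 1/3 is 0.1 in base 3
print([b for b in range(2, 15) if on_some_division_line(Fraction(1, 7), b)])  # [7, 14]
```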

This is the sense in which rational numbers will be distinct from the real numbers, on this view. We can define rational numbers as pairs of natural number strings for which we know the base, and real numbers as such pairs in a language in which we do not know the base. This means that there will not be more reals than rationals (in particular, both are equally countable, in the Cantorian sense), but the real numbers, by being infinite, can nevertheless describe the continuum, while the rationals cannot. They do so by virtue of their indefiniteness: Since the line can be divided into any number of parts, the real numbers cover all points on the line, in contrast to the rational numbers, which are based on finite alphabets. Whatever base we choose, there will be indefinitely many points which fall outside the division. But as soon as we do not know the base, there will be no point which cannot happen to fall on a division line.
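Below is a minimal sketch of the proposed definition of a rational number as a pair of natural number strings with a known base. The function name rational_from_strings is hypothetical. The sketch relies only on Python's built-in int(string, base) and on fractions.Fraction. It shows that one fixed pair of strings names a different rational for each base.

```python
from fractions import Fraction


def rational_from_strings(numerator: str, denominator: str, base: int) -> Fraction:
    """Read a pair of digit strings as a rational number, given the base."""
    return Fraction(int(numerator, base), int(denominator, base))


# With the base known, the pair ('1', '11') has one definite value ...
print(rational_from_strings("1", "11", 10))  # 1/11
# ... but the same pair names a different rational for every base.
print([str(rational_from_strings("1", "11", b)) for b in range(2, 6)])
# ['1/3', '1/4', '1/5', '1/6']
```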

So, if this can count as a demarcation of the real numbers as against the rationals, then we have an interesting philosophical issue in the fact that the real numbers will be distinct from the rational numbers essentially in their semantics, and not in their enumerability.

9. Oral language as derived from real numbers.

The relation suggested above between natural number and natural language, whereby the latter possesses redundancy patterns which are absent from the former, will evidently hold for the written language. The written language can indeed be seen as distinct from the oral language primarily in that it has a finite and known number of symbols. But for the continuous oral language, a comparison with this definition of the real numbers is clearly more relevant. The continuous acoustic signal can be segmented (parsed) into any number of discrete symbols, and it may well be a central property of this code that it can simultaneously be interpreted in several distinct ways. For example, the semantic content of the written representation of an oral utterance is evidently one possible interpretation of it, but an oral utterance is clearly also susceptible to a large number of interpretations on various other levels (for example, through prosodic information). This property of oral language, that it possesses an indefinite number of discrete segments, is what motivates our comparison with the real numbers here. It suggests that oral language relates to the real numbers in the same way as written language relates to the natural numbers. The redundancy patterning of the oral language can then be seen as conveying an ontology of the actual world, determined by its particular redundancy patterning, which may differ from the ontology conveyed by the written language.
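As a rough illustration of how one continuous signal admits many discrete parsings, the sketch below segments a synthetic sine wave into a chosen number of equal spans. The sine wave is a stand-in for an acoustic signal, not speech. The mean of each span is then quantized into a chosen number of symbols. Equal-width segmentation and mean quantization are assumptions made for the example, and the function name segment is hypothetical. The only point is that different choices yield different symbol strings from the same signal.

```python
import math


def segment(signal: list[float], n_segments: int, n_levels: int) -> str:
    """Cut a sampled signal into n_segments equal spans and quantize the mean of
    each span into one of n_levels symbols, written as digits 0..n_levels-1."""
    if not 1 <= n_segments <= len(signal) or not 1 <= n_levels <= 10:
        raise ValueError("need 1 <= n_segments <= len(signal) and 1 <= n_levels <= 10")
    lo, hi = min(signal), max(signal)
    width = (hi - lo) or 1.0  # avoid dividing by zero on a flat signal
    symbols = []
    for i in range(n_segments):
        # Integer bounds give non-empty spans that together cover the whole signal.
        start = i * len(signal) // n_segments
        end = (i + 1) * len(signal) // n_segments
        part = signal[start:end]
        mean = sum(part) / len(part)
        level = min(int((mean - lo) / width * n_levels), n_levels - 1)
        symbols.append(str(level))
    return "".join(symbols)


# A synthetic stand-in for an acoustic signal: one period of a sine wave.
signal = [math.sin(2 * math.pi * t / 100) for t in range(100)]
for n_segments, n_levels in [(2, 2), (4, 2), (5, 3), (10, 4)]:
    print(n_segments, n_levels, segment(signal, n_segments, n_levels))
```

With two spans and two levels the sine wave reads as '10', and with four spans and two levels it reads as '1100'. Finer settings produce longer strings over larger alphabets from the same samples.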

10. Generalized numbers.

Real numbers appear when we allow for infinite (in the sense of unknown as to size) symbol inventories. Now, an interesting question concerns what happens if we allow for non-equifrequent symbols, provided that their conditional and unconditional probabilities nevertheless coincide (that is, each symbol occurs independently of its context). Since we have suggested that the languages with equiprobable symbols can be interpreted as the natural and real numbers, it is natural to ask whether we are now approaching the complex numbers. We probably cannot arrive at the complex numbers directly by this approach. On the other hand, the difficulties inherent in the definition of complex numbers, as they are traditionally conceived, suggest that an alternative definition may be welcome. In the following, I will refer to these hypothesized numbers as generalized numbers, to avoid too strict a comparison with the traditional complex numbers.
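The case considered here, unequal frequencies with no dependence on context, can be simulated with a memoryless source. This minimal sketch assumes a binary alphabet with the hypothetical probabilities 0.7 for '0' and 0.3 for '1'. It estimates p(1) and the conditional probability of '1' after each symbol, and these should come out approximately equal. The function name probabilities is hypothetical.

```python
import random


def probabilities(text: str, symbol: str, previous: str) -> tuple[float, float]:
    """Estimate p(symbol) and p(symbol | preceding symbol == previous) from text."""
    p_unconditional = text.count(symbol) / len(text)
    followers = [b for a, b in zip(text, text[1:]) if a == previous]
    p_conditional = followers.count(symbol) / len(followers)
    return p_unconditional, p_conditional


random.seed(1)
# Hypothetical memoryless source: '0' with probability 0.7, '1' with probability 0.3.
text = "".join(random.choices("01", weights=[0.7, 0.3], k=100_000))
for previous in "01":
    print(previous, probabilities(text, "1", previous))
```

All estimates should lie close to 0.3. The symbols are not equifrequent, yet knowing the preceding symbol tells us nothing about the next one.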

Nevertheless, we search for a two-component analysis of these numbers, which we find in the two non-coinciding semantic components of finitude vs. infinity and of equifrequency vs. non-equifrequency. It is in the latter component that these numbers deviate from the natural and real numbers, and we will therefore attempt to recognize the imaginary part of a complex number in the deviation from equifrequency. Hence, if the symbols are equifrequent, the imaginary part will be zero, and any difference between symbol frequencies will amount to imaginarity in the number.

Thus, if we attempt to approach the traditional complex numbers, we can consider a binary alphabet, whose frequency distribution then has two components. We let the relation between the two frequencies p(0) and p(1) (of the two symbols '0' and '1') determine an angle of inclination, for example $\theta = \arctan\big(p(0)/p(1)\big) - \pi/4$ in polar notation. ($\pi/4$ is subtracted since $\arctan\big(p(0)/p(1)\big) = \pi/4$ when $p(0) = p(1)$.) The absolute value $r$ in the pair $(r, \theta)$ would then be the corresponding number (natural or real) under the assumption of equifrequency of the symbols.
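The angle defined above can be computed directly. This sketch assumes a binary alphabet with p(1) = 1 - p(0). It uses math.atan2(p0, p1), which equals arctan(p(0)/p(1)) for p(1) > 0 and also handles p(1) = 0. Encoding (r, θ) as a Python complex number via cmath.rect, and the function names, are assumptions made for illustration.

```python
import cmath
import math


def inclination(p0: float) -> float:
    """theta = arctan(p(0)/p(1)) - pi/4 for a binary alphabet with p(1) = 1 - p(0).

    math.atan2(p0, p1) equals arctan(p0 / p1) whenever p1 > 0, and it also
    handles p1 == 0, where the ratio is undefined."""
    p1 = 1.0 - p0
    return math.atan2(p0, p1) - math.pi / 4


def generalized_number(r: float, p0: float) -> complex:
    """Hypothetical encoding of the polar pair (r, theta) as a complex number."""
    return cmath.rect(r, inclination(p0))


for p0 in (0.5, 0.75, 1.0):
    print(p0, inclination(p0), generalized_number(5, p0))
```

Equifrequent symbols (p(0) = 0.5) give θ = 0, which places the number on the real axis. With p(0) = 0.75 the angle is arctan(3) - π/4, which equals arctan(1/2). As p(0) ranges from 0 to 1, θ ranges from -π/4 to π/4.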

Within this definition, each language will correspond to a certain frequency distribution and thus, geometrically interpreted, to a line with a certain inclination going through the origin of the complex plane. The infinity of possible languages would yield the complex plane defined for integer values. The language with equifrequent symbols will constitute what corresponds to the real axis in the complex plane.

This is, though, on the presupposition that we have a binary alphabet. If, then, we expand the alphabet, we no longer have the simple doubly-parametrized ratio, but a multivariate frequency distribution, which means that we get in fact a high-dimensional space rather than the simple Cartesian plane, and we should expect to find that the properties of the generalized numbers in binary notation differ from the properties of the numbers in ternary notation. The mathematical properties of these numbers will, on this view, be dependent on the base, which indeed also follows from the importance of the base in the discussion of the countability of the power set of N above.
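One possible way to make the higher-dimensional space concrete is to represent a language by the deviation of its symbol frequencies from the equifrequent value 1/k. This representation is an assumption of the sketch, not a construction given in the text. Because the deviations always sum to zero, a k-symbol alphabet leaves k - 1 free components. The binary case has one, matching the single angle θ, and the ternary case has two. The function name is hypothetical.

```python
import math


def deviation_from_equifrequency(freqs: list[float]) -> list[float]:
    """Difference between each symbol frequency and the equifrequent value 1/k."""
    if not math.isclose(sum(freqs), 1.0):
        raise ValueError("frequencies must sum to 1")
    k = len(freqs)
    return [p - 1.0 / k for p in freqs]


for freqs in ([0.75, 0.25], [0.5, 0.3, 0.2]):
    deviation = deviation_from_equifrequency(freqs)
    # The components always sum to zero, so only k - 1 of them are free.
    print(f"k={len(freqs)}: deviation={deviation}, free components={len(freqs) - 1}")
```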

The numbers we are dealing with here arise from the assignment of a semantics to a set of symbols, and therefore their mathematical properties will be dependent on the symbol inventory to a degree unknown in the traditional understanding of numbers.

11. Natural language as extensions from generalized numbers.

We have loosely suggested that natural language can be seen as related to the natural numbers in its written form, and to the real numbers in its oral form. But a typical natural language, if we analyze it into a finite set of phonemes, will not consist of equifrequent symbols. On the contrary, the phonemes of the written language and the 'sound segments' of the oral language, however these are defined, are highly non-equifrequent. This suggests that the worlds which natural languages refer to are realizations of the set of possible worlds denoted in the space of generalized numbers. This is an interesting possibility in light of the fact that complex numbers are of crucial importance in the analysis of the acoustic signals of oral language.
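The non-equifrequency of letters can be measured, for instance, by comparing the Shannon entropy of their distribution with the maximum entropy the same letters would have if they were equifrequent. The sketch below does this for the line 'Der Mensch wenn er ist selbst einsam' from the poem discussed in the introduction. Letters stand in for phonemes here, which is a simplifying assumption, and the function name is hypothetical.

```python
import math
from collections import Counter


def letter_entropy(text: str) -> tuple[float, float]:
    """Shannon entropy (bits) of the letter distribution in text, and the maximum
    entropy the same set of letters would have if they were equifrequent."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy, math.log2(len(counts))


h, h_max = letter_entropy("Der Mensch wenn er ist selbst einsam")
print(f"{h:.3f} bits observed, {h_max:.3f} bits if equifrequent")
```

The observed entropy is lower than the maximum because letters such as 'e', 's' and 'n' recur more often than the others. A perfectly balanced string such as 'abcabc' reaches the maximum.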

Hence we suggest that natural language in its oral form is indeed an extension (by conditional probabilities) from these generalized numbers, and this suggests that an ontology can be erected from the joint study of such generalized numbers and natural oral language.

12. Grammatical levels as transfinite numbers.

We have asserted that the notion of transfinity based on countability, as established by Cantor, is insufficient, since countability cannot serve to distinguish the real numbers from the rationals. But this does not necessarily mean that Cantor was wrong in assuming transfinity in general. We will still assume that his idea of transfinity derives from a genuine intuition, which, however, received an unhappy interpretation in the form of countability criteria. We also need a way to capture the semantic difference between real and rational numbers.

We have suggested that the distinction between rationals and reals amounts to a difference in the semantic interpretation of the strings. We have provisionally outlined this distinction as a matter of alphabet finitude, or of knowledge of the size of the symbol inventory. The continuum arises from the absence of knowledge of the specific number of symbols in the alphabet. Suppose, then, that we redefine the words of a language to count as uncompounded symbols in a symbol inventory, and that we impose no limit on the possible length of the words. We can then create the real numbers by extension from the natural numbers, even when the latter arise from a finite number of symbols, by reinterpreting every sentence (a sequence of such words, delimited in some way) in this language as a real number. Since there will be indefinitely many words (they can be indefinitely long), and we can allow for indefinitely long sentences, this yields the set of real numbers by extension from the natural numbers over a finite alphabet.
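The claim that a finite alphabet yields an inventory of words without any upper bound can be checked by counting. Over an alphabet of k letters there are k + k^2 + ... + k^n distinct non-empty words of length at most n. Here is a minimal sketch, in which the function name is hypothetical:

```python
def word_inventory_size(alphabet_size: int, max_length: int) -> int:
    """Count the distinct non-empty words of length at most max_length."""
    return sum(alphabet_size ** length for length in range(1, max_length + 1))


for n in (1, 2, 5, 10):
    print(n, word_inventory_size(2, n))
# prints the pairs 1 2, 2 6, 5 62 and 10 2046, one per line
```

Even a binary alphabet gives a word inventory that grows without limit as the permitted word length grows. A fixed "alphabet of words" therefore has no size that can be known in advance.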

In fact, this means that both natural and real numbers will appear as different grammatical levels: The natural numbers will be composed of 'phonemes', the real numbers of 'words'. This suggests that we can redefine Cantor's transfinite numbers, wherein the continuum is aleph-one, in terms of hierarchies of units in written languages. Aleph-null will be what corresponds to the phoneme level, aleph-one to the word (or morpheme) level, aleph-two to the sentence level, aleph-three to the paragraph level, and so forth.

We can then take an infinite string of symbols over a finite alphabet (with sufficient symbols for the purpose), and define a subset of the alphabet as 'phrase' markers. One of these symbols marks a word boundary, another a sentence boundary, a third a 'paragraph' boundary, and so forth. (To avoid overlap, we can let every marker higher up in the hierarchy implicitly denote all lower boundaries as well.) If we want to ensure an infinity of such phrase markers, we can let (possibly infinitely long) words of phrase markers denote these as well, creating yet another dimension of infinity, and so forth. In this way, we can, on a semantic basis, generate successively larger transfinite numbers from one and the same infinite string simply by inserting phrase markers. There is no end to the level of infinity we can define semantically from this infinite string with a completely random distribution of its finite set of symbols.
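The insertion of phrase markers can be sketched as a recursive split. The sketch assumes three hypothetical marker symbols: '|' for a word boundary, '/' for a sentence boundary and '#' for a paragraph boundary. Splitting on the highest marker first makes each higher marker implicitly close all lower units, as proposed above. Empty units (for example after a trailing marker) are dropped, which is a choice made for this sketch.

```python
def parse_levels(stream: str, markers: str = "|/#"):
    """Split a symbol stream into nested units using hierarchical phrase markers.

    markers[0] ends a word, markers[1] a sentence, markers[2] a paragraph.
    Splitting on the highest marker first means every higher marker also closes
    all lower units. A plain string is returned once no markers remain."""
    if not markers:
        return stream
    top = markers[-1]
    return [parse_levels(unit, markers[:-1]) for unit in stream.split(top) if unit]


print(parse_levels("ab|ba/aab#b|a"))
# [[['ab', 'ba'], ['aab']], [['b', 'a']]]
```

Appending a fourth marker to the marker string adds a further level of nesting without changing the underlying stream of symbols.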

In the text metaphor, these transfinite numbers will mark the knowledge obtained on the levels of word, sentence, paragraph, chapter, book volume, and so forth.

This is therefore the meaning of grammatical level as abstracted from any embodiment in any particular syntactic structure: Grammatical levels are semantically defined transfinite cardinal numbers.

This emerges, therefore, from the assignment of a semantics to a language (written out as an infinite string) in the absence of a structure in the syntagmatic dimension. Hence the infinity of transfinite cardinal numbers is a semantic property of the infinite string of symbols which is completely devoid of any syntactic structure.

Syntactic structure is, then, the imposition of a redundancy patterning on this language, to restrict the reference to an actual world rather than to the set of possible worlds referred to by the language without syntactic structure.

13. The double articulation of natural language.

Now, there is an apparent paradox here. We have defined the random language and the 'phrase' markers such that the language takes the form of one infinite string of symbols. We maintain that the different levels of transfinity arise from the different semantics which we assign to this language, rather than from the cardinality of its symbol inventory or from the countability of the set of strings. However, consider, for example, the language which has words as the basic unit (that is, the language of real numbers, or the transfinite number one). Each such word will also be a (long) natural number, since it is one long string of equifrequent and non-redundantly distributed symbols which receive their semantics from their distribution. Then why is this not a language of natural numbers instead? This argument applies to all levels of transfinity. In order to arrive at higher levels of infinity, we must therefore deliberately disregard the internal composition of the strings which perform the function of basic symbols. That is, we must impose a double articulation on the language before we assign a semantics to it. Hence, to create the real numbers, we must deliberately disregard the internal composition of each word when we interpret it semantically, although we must of course take notice of this composition in order to distinguish one word from another. We must state that the distribution above the word level is relevant for the semantics, while the composition below it is relevant for 'perception' only. Only then will we obtain a language of real and not natural numbers.
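The difference between reading a word through its composition and reading it as an opaque symbol can be illustrated with a small sketch. Both function names and the two-letter alphabet are hypothetical. The first reading treats the word as a natural number in a base equal to the alphabet size. The second assigns each new word the next free index in a lexicon, so that its composition serves only to tell it apart from other words.

```python
def as_natural_number(word: str, alphabet: str) -> int:
    """Positional reading: the word is a numeral in base len(alphabet)."""
    base = len(alphabet)
    value = 0
    for ch in word:
        value = value * base + alphabet.index(ch)
    return value


def as_opaque_symbol(word: str, lexicon: dict[str, int]) -> int:
    """Arbitrary reading: each new word gets the next free index in the lexicon.
    Its composition is used only to tell it apart from other words."""
    return lexicon.setdefault(word, len(lexicon))


alphabet = "ab"
lexicon: dict[str, int] = {}
for word in ["bab", "baa", "abb", "bab"]:
    print(word, as_natural_number(word, alphabet), as_opaque_symbol(word, lexicon))
```

In the positional reading, 'bab' and 'baa' are neighboring numbers (5 and 4) because they share most of their composition. Their lexicon indices (0 and 1) simply record the order in which the words were first met, and 'bab' keeps index 0 when it recurs.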

This is precisely what linguistic theory states when it asserts that the linguistic symbol is arbitrary, and that submorphemic structure is relevant for perception and segmentation, but irrelevant for signification. The double articulation in natural language will then serve to generate a higher transfinite level than can be achieved on the phonemic level. Suppose, however, that we maintain that written language is an extension from natural numbers, while oral language extends from real numbers. We may then also suggest that the pervasive notion of double articulation in natural language stems from its written representation, and that oral language exhibits a submorphemic signification which is not immediately traceable in the written form. This is an important conclusion from a linguistic point of view.

The double articulation of natural language can be interpreted mathematically as follows. The infinite set of all infinite strings over a finite alphabet has the transfinite cardinal number zero. But if we let every string in a language count as an uncompounded and discrete symbol, the infinite set of languages possessing infinitely long strings over finite alphabets will nevertheless have the cardinality of the transfinite number one. This holds even though these languages can all be written out as infinite strings and should therefore have the same cardinality as the set of infinite strings. The difference derives from semantic considerations, and has nothing to do with countability in the ordinary sense.

We can now interpret Cantor's century-old intuition of a necessary transfinity in the following way: It derives from the sense of a qualitative difference in the epistemic properties of the phonological and the morphological levels of language. That is, it is his interpretation of the double articulation in language. It is interesting to observe that this happened at about the same time as Ferdinand de Saussure was working out the principle of arbitrariness in natural language, as it is found in modern linguistics. In this sense, Cantor's intuition was profoundly sound, but we suggest that the way he spelled it out in terms of countability was unhappy in that it led into paradoxes.

On the present view, this concept of transfinity is equivalent to the insight that the semantics of the word level cannot be arrived at through the composition of the semantics of the phoneme (or submorphemic) level. The present account in fact also suggests that the semantics of the sentence level cannot be appropriately obtained through the composition of the part-meanings of the word level, but will contain a semantics which cannot be traced to any lower level. We suggest that this intuition underlies Cantor's concept of transfinity, when he asserts that the continuum cannot by any means be exhaustively described by a set which is enumerable by the natural numbers. The intuition derives genuinely from the hierarchical structure of natural language, with essentially different kinds of knowledge represented on each level.

14. Natural language as an arithmetics.

I will not attempt to define an arithmetics on the basis of the present definition of number in this context, but it is clear that such an arithmetics must be based on the distributional properties of the symbols and of the units on the various levels of transfinity. The language of arithmetics, as traditionally conceived, exhibits strings which are constrained in the sense that not all strings occur, and there will be conditional probabilities which deviate from the unconditional ones. And yet, we propose that it is a characteristic of the language of arithmetics that it maintains the overall equiprobability of the symbols. This means that we can set up the following taxonomy:
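The combination of overall equiprobability with conditional probabilities that deviate from the unconditional ones can be seen in a toy sequence. The sketch assumes the maximally constrained string '012' repeated. This is not a language of arithmetic, but it shares the property in question: each symbol has unconditional probability 1/3, while the symbol after '0' is always '1'. The function names are hypothetical.

```python
from collections import Counter


def unigram_probs(text: str) -> dict[str, float]:
    """Unconditional probability of each symbol."""
    return {s: c / len(text) for s, c in Counter(text).items()}


def conditional_probs(text: str, previous: str) -> dict[str, float]:
    """Probability of each symbol given that the preceding symbol is `previous`."""
    followers = Counter(b for a, b in zip(text, text[1:]) if a == previous)
    total = sum(followers.values())
    return {s: c / total for s, c in followers.items()}


text = "012" * 1000  # hypothetical toy sequence, maximally constrained
print(unigram_probs(text))           # every symbol has probability 1/3
print(conditional_probs(text, "0"))  # {'1': 1.0}
```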

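1) Numbers: equiprobable symbols, free distribution
2) Arithmetics: equiprobable symbols, constrained distribution
3) Generalized numbers: non-equiprobable symbols, free distribution
4) Natural language: non-equiprobable symbols, constrained distribution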

Hence, as far as the central properties of equiprobability and distributional constraints are concerned, natural language relates to generalized numbers as arithmetics relates to traditional numbers (in terms of probability distribution and redundancy). This means that natural language is an arithmetics over generalized numbers and contains propositions over these numbers. These particular propositions, which amount to the words and sentences of natural language, will thus be extensions from the set of possible worlds onto those aspects of the actual world of reference to which we assume these generalized meanings apply.

In sum, a definition of number and an arithmetics based on the distributional properties of the symbols in the digit inventory may provide us with access to a unified theory which comprises the theory of linguistic signification and mathematical computation in one joint account.

© John Bjarne Grover
On the web 26 May 2021
Last updated 28 May 2021