The 1997 PhD dissertation part 5
John Bjarne Grover
Introductory remark
The following is part 5 of my 1997 PhD dissertation 'A waist of time' - see this file for the contents, which apply to the edition contained in volume 3, approximately the first half of the volume (p.583-612). This part 5 is a summary of the three main parts of the dissertation (the first three books, each of which could have been published as an independent book) plus a brief and sketchy outline of what an acoustic analysis could have been if the poetic logic had been elaborated. It contains an analysis of an acoustic source (John Betjeman reading a poem in the MARSEC corpus), but this must at best be considered suggestive. The theoretic value of part 5 is that it suggests a theoretic basis for the idea of grammatical structure based on measurement records where featural valuations are considered the same - that is in accordance with my fundamental theorem of linguistics - and the discussion suggests that these measurement points can be expected to approximate a symmetric distribution in phrasal segments. However, a serious study of acoustic data would have to be at least as big as the rest of the dissertation and be based on principled discussions. The dissertation led - on the basis of a principled rationality contained in the theoretic constructs and empirical data available at that time - to the conclusion that a grammatical structure or poetic grammar probably exists in acoustic data (in natural language speech), and that it is therefore permitted to start the big work. How it would eventually come to look could in 1997 be at best guesswork only. The data file I present is of course illustrative only, and my study was in addition based on all too unsophisticated tools and material.
It is only by now, after having written TEQ ('The Endmorgan Quartet') and the blue, red, yellow and white metres, that I could have started approaching the question of how such an acoustic-poetic grammar could come to look. I would say it is still early for that but it is perhaps possible to start thinking of it.
The dissertation was (perhaps not entirely unexpected?) rejected for the PhD degree in 1998 - see this file for mention of the context - and it is not impossible that this rejection was scientifically unserious and a matter of political abuse that could have had an aftermath into current covid times. Is it WHO that should come out with their hands above their heads - or maybe at least with the story as it was? I quote the below lines: "Proceeding from the assumption of a sentential sign residing in the collective consciousness with a rooting in the primordial signification in early childhood..." it is clear that this can explain Hitler's ideas of a 'dritte Reich' in terms of a theoretically interesting and culturally appropriate field of study converted into a bodily representation as a pretext for exploitative abuse - however, if 'dritte Reich' is an intended parody on 'PTRSIM PIK', that does not mean that the theoretically interesting and culturally appropriate field of study is the same as PTRSIM PIK.
A waist of time
This is a continuation of the first four books in this work. The background can be summed up as follows.
Book 1 interprets early child language in terms of a triadic sign with one social, one observational, and one phonological component. The single-word stage is submorphemic, presumably with a featural rooting of the signification. At the transition from the single-word stage to the onset of syntax, a transitory period of attachment to caretaker is characteristic. In this period, the child relegates control of signification to the joint attention, which I here interpret as an instantiation of the collective consciousness. It is through the guiding of this collective consciousness that the new level of knowledge is attained, which suggests that the significational consistency on the new level allows for erasure of the internal consistency of the submorphemic level, which comes to be replaced by the new knowledge. It is because this development into the arbitrary morphemic signification seems to require a transitional attachment that we can hypothesize that the two levels are significationally incommensurable. This does not imply that submorphemic signification vanishes from adult language, but it implies that it loses internal consistency. Its cultural manifestation is shown in Grover (1995) in the presence of universals in submorphemic signification in personal pronouns and in the cross-linguistic patterns shown by Greenberg (1987).
Book 2 is a historical study which concludes that there is a parallel development of the formal description of the levels of language, invention of new information technology and the nature of logical paradoxes. In addition, it is shown that Christianity (in extension from the Old Testament origin in the invention of the alphabetic script) represents the morphemic signification, while the preceding cuneiform episteme represents a submorphemic signification on some level or other. It is also suggested that cuneiform predominantly encodes cognitive categories which need not have a consistent phonetic representation. An account which rests on a fundamentally mirror-symmetric distribution is also suggested to have considerable explanatory potential, and it is this which will be pursued here. The present study does, therefore, support the analysis in book 2 as well. Another central idea discussed in that book is the presumption that an arbitrary sentential sign will develop in the continuation of the computer as a revolutionary advance in information technology, with a shift of computational domain from the symbolic, which it shares with the alphabetic script, to the social domain. It follows, in the continuation from book 1, that there will be a corresponding period of social attachment to the shared consciousness (a stage of puppets? or apes?) which eventually will output an arbitrary sentential level which is significationally consistent, and which leaves the sentential level, such as we know it today, internally inconsistent, even if word syntax is retained developmentally and therefore must be interpretable in language. This arbitrary sentential sign will refer to complexes distributed in space and time. Book 2 suggests that it is likely that this also will run in parallel with a religious revolution, and that the alphabetic script eventually goes out of use when the essential contents of it is relegated to this new religious representation.
Book 3 investigates what probably are instantiations of the new arbitrary sign in empirical data covering certain correlations with the narrative structure in Rilke's fourth Duino elegy. It is found that units of signification in these correlations vary from a few words up to segments of more than 60 words, normally covering one or a few sentences. The dominant segment sizes are found in the present data as well. The important result in book 3 is that it provides empirical support for the assumption of a sentence level sign. In addition, it presents data for the relevance of parallel-reading of related narratives - a way of reading which also recurs as relevant in the present data as a regulative principle for parameter values.
Book 4 presents further empirical data which investigates the interface between the individual signifying mind and the collective consciousness - that interface where the arbitrary sentential sign is supposed to find its reference. The study supports empirically the conclusions in book 2 for the sign wherein the diachronic and synchronic dimensions receive a unified account, and this, again, supports the empirical investigation in book 3. An argument is presented and analyzed which shows that it reaches internal logical consistency only on basis of the postulation of a signification on a universal featural level which allows for communication between the subjective mind and the collective consciousness.
The present study attempts to show how the arbitrary sentential sign can be detected in signal files. It rests essentially on the conception of the split triadic sign from book 1, such as this is interpreted in book 2 with the split in the phonological component, and on the role of the mirror structure.
The sentential level signs work in the collective consciousness. In the narrative analyses I have carried out in book 3, it seems convenient to conceive of these distributed signs as shared through their potential for swapping among various parts of the social space - which comes out as another way of stating that they subsist as shared knowledge in terms of a mirror-symmetric narrative structure.
This is supported also by the submorphemic level of knowledge in early childhood, in the period preceding the word spurt and the development of morphemic syntax (around the age of 18 months), which is a strong developmental parallel to the present stage relative to the arbitrary sentential sign. This is the period which Lacan (1949) calls the mirror phase. The distributed objects which are the referents for the sentential signs can conveniently be compared with the primordial objects of knowledge in this mirror stage, even if a potentially infinite set of arbitrary sentential signs cannot be identified with these developmentally rooted complex signs constrained by culture (associative complexes, such as conceived of by Vygotsky 1962).
This suggests that we should assign importance to mirror symmetry as a metric for boundary valuation when we search for the representation of arbitrarily signifying segments. This can be motivated also by the following argument:
Proceeding from the assumption of a sentential sign residing in the collective consciousness with a rooting in the primordial signification in early childhood, we can conveniently conceive of this as represented in the triadic sign discussed in books 1, 2, 3. It turned out to be convenient to work with the concept of mirror symmetry in these narratives. That also motivates the concept for the lower levels of instantiation of the same triadic sign.
The submorphemic stage falls apart as the child starts developing a word syntax around the age of 18 months. This coincides also with the word spurt and with the significational differentiation of the child's self from its caretakers after a period of transitional attachment. We can conceive of this differentiation as a distancing of the frontal social cognition from the cognition in the posterior regions including the temporal lobes, such that the primordial unity falls apart into the child's SELF on the one hand and the OTHERS on the other, along with a differentiation of the ACOUSTIC processing in the temporal region from the ARTICULATORY encoding of speech in the frontal regions. In this development, we must think of an initial image in the frontal region (the OTHERS) gradually differentiating away from its mirror image in the posterior regions (the child's SELF). Concomitantly with this, a social structure arises, along with its linguistic counterpart in syntactic structure.
In this development, it is the very moment of split from the primordial unity onto the differentiated self which is of interest. If we are going to search for a defining clue to this earliest cleft in the phonological component in the triadic sign, that should emerge in the form of a mirror structure in the speech signals.
This, finally, supports the contention that the boundary to the collective consciousness, such as this is represented linguistically in sentential signs, must be traced in the mirror structure of speech.
This particular development is recognized as parallel to the lateralization which follows as a function of the linguistic development of the child. It is the front/back differentiation of the processing of phonological material and its interpretation that the well-known 'FIS phenomenon' in childhood is about:
CHILD: I don't like fis.
ADULT: Don't you like fis?
CHILD: No, not fis. FIS!!!
The child has two different codes: The one belongs to the acoustic encoding of the adult's signal, to be located in Wernicke's area, the other belongs to the articulatory encoding of the child's own utterance - an encoding which the child cannot recognize in other persons' utterances. The word FISH is the adult's code, perceived and encoded by the child in Wernicke's area. The word FIS is the child's code, articulatorily encoded in the frontal area. The FISH paradox (cp. the story of Noah and the question whether he should take fish with him in the ark) says that the adult word FISH for the child in the process of attaining independence is the same word as the child's own word FIS. These two are MIRROR IMAGES, but with a MIRROR AMBIGUITY which the social PDP device is about.
It is likely that this is the most important aspect of the so-called FIS PHENOMENON in child language. It pertains to the child's growing independence from the shared consciousness wherein parallel distributed processing is the processing mode. The child strives towards gaining independence from this and to develop its own consistent grammar, more limited in computational capacity, but with the encapsulation which allows for rule processing (production systems) in isolation from the PDP of the shared consciousness. This development takes place through the front/back differentiation which sets the semantics to the phonology of the articulatory code off from the semantics to the phonology of the acoustic code.
We should, consequently, be able to trace the necessary information for modelling the grammar in the differentiation between the acoustic and the articulatory encoding of phonology. In short, if we study the universal definition of DISTINCTIVE FEATURES based on ACOUSTIC signals, canonically in Jakobson/Halle (1956), in comparison with the universal DISTINCTIVE FEATURES based on ARTICULATORY properties, canonically in Chomsky/Halle (1968), we should be able to find the key to the grammar in the essential difference between these two codes. The cue to the interpretation will be that this difference represents the differentiation of finite STRINGS (in the acoustic interpretation) from infinite LANGUAGES in the socially oriented cognition of the frontal regions. That is, when it comes to considering the string representation only, it represents the difference between strings constituting symbols (representing infinite theories), that is, the demarcation of boundaries between such symbols. Since this differentiation is supposed to take on a symmetric form in the transitional period of new symbol formation on a higher level, we should also expect to find that the series of symbols outputted from the analysis should be symmetric. In extension from Grover (1997e), this should obtain for the stage of arbitrary morpheme formation as well as for the arbitrary sentential sign, when signals are analyzed in terms of this interface.
It is out of reach for the present study to work with articulations empirically, so I am left with the acoustic speech signals only. The task for a practical assessment of the hypothesis will be to define the essential differences between the two phonological codes in such a manner that these be traceable in the signal. The optimal path will be to search not for the acoustic properties which carry the full-fledged differences in themselves, but rather those properties of the signal which represent the very first traces of a differentiation between the acoustic and the articulatory code. To the extent that we find traces of these very first splits, we will have an inherent access to the articulatory domain in its offset from the acoustic domain, even if the full articulatory code itself is out of reach for comparison with the acoustic signal in itself.
There seem to be reasons to believe that the Fourier transform of the signal is relevant for the perception of speech. At least, this is the transform phonetics has been concerned with, and it also seems as if the neural cortical map is organized in striations which recognize frequency spectra (Cook 1986). The neural map is obviously more complex than what a simple FFT can provide us with, but the transform is still probably relevant for the cortical representation of speech sounds. The fact that the VOWEL SQUARE for a long time was thought of as being a good representation of the HIGHEST POINT OF THE TONGUE during the articulatory formation of the vowels, while it later was discovered that this was not the case, but, rather, these 'highest peaks' consisted in the relationship between the first and the second formants in the FREQUENCY SPECTRUM only (Ladefoged 1975), strongly supports the assumption that the Fourier transform is important for the relationship between the acoustic and the articulatory encoding of phonology. In addition, this interesting fact also points to the FIS PHENOMENON in the sense of concerning the subjective articulatory interpretation of acoustic material - not only for the child, but for the adult phonetician as well.
This observation may be of some help in the analysis of the essential differences between the Jakobson/Halle (1956) and the Chomsky/Halle (1968) definitions of phonological features. The first is acoustically oriented, the latter in important respects articulatorily oriented. What seem to be the most obvious and interesting differences are found around the COMPACT/DIFFUSE as well as GRAVE/ACUTE oppositions (Jakobson/Halle) in comparison with the corresponding oppositions in Chomsky/Halle.
COMPACT/DIFFUSE: Compacts exhibit "concentration of energy in a relatively narrow, central region of the auditory spectrum" (Jakobson/Halle 1956:41). Diffuses are concentrated or spread out elsewhere, only not concentrated in the middle of the spectrum. The articulatory correlates are the BACK (post-alveolar) consonants and the LOW vowels. 'Diffuse' articulations are consonants which are either DENTAL (including alveolars) or LABIAL as well as HIGH VOWELS. According to Hyman (1975:45f.), the DIFFUSE, when compared with the HIGH feature, varies across vowels and consonants. Also, Chomsky/Halle's ANTERIORS (from labials back to dentals) are [+diff] consonants, but are unspecified for vowels in terms of diffuseness. In terms of the feature BACK, both dentals and post-alveolars are [-back]. This leaves us with an area of conflicting classes in the post-alveolar region as compared to the dental or alveolar region in terms of DIFFUSENESS of consonants. There are not so many such consonants in English: The SH as in 'fish', the CH as in 'church', and the J as in 'John'. This is what is contained in the FIS PHENOMENON, or the FISH PARADOX: The post-alveolar fricative in FISH is ambiguous in the first beginning of the differentiation of the articulatory as against the acoustic phonological feature inventory. When the child has two encodings of FIS, this is because one of them resides in the acoustic, the other in the articulatory interpretation of the sound.
Consequently, we may conjecture that the DIFFUSENESS is a highly important acoustic cue to the very beginning of the differentiation. Such diffuseness can be defined in its most simple fashion as being a matter of spreading of energy over the spectrum, in such a manner that the narrow middle region of the spectrum is not prominent as far as energy concentration is concerned. Generally, this should indicate that there is a certain dichotomy of the two halves of the spectrum, vs. the opposite of this dichotomy.
According to Hyman (1975:45f.), the other feature which is prominent in the essential difference between the systems is the GRAVE/ACUTE feature. GRAVE articulations are labial and velar consonants together with back vowels. GRAVE sounds exhibit a "predominance of the low (vs. high) part of the spectrum" (Jakobson/Halle 1956:43).
Consequently, in both the DIFFUSE/COMPACT and the GRAVE/ACUTE feature opposition, the essential difference between the two systems is tied up to the MIDDLE POINT of the spectrum. That is, the VERTICAL MIDDLE POINT, when we think of the spectrum along a vertical dimension over the horizontal speech flow.
The following chart is based on a comparison in Hyman (1975:46):
JAKOBSON ET AL. [+diff] | JAKOBSON ET AL. [-diff] | CHOMSKY AND HALLE [-high] | CHOMSKY AND HALLE [+high] |
labials | palatals | labials | palatals |
dentals | velars | dentals | velars |
high V's | nonhigh V's | nonhigh V's | high V's |
The classes differ in terms of VOCALITY as to the HIGH feature. The same ambiguity as to VOCALITY is found in my discussion of NASALITY in Grover (1995). Also, some GRAVE sounds are classified differently for consonants as compared to vowels.
In short, when we define the DIFFUSE and the GRAVE features to be the most important for the spectral recognition of the very beginning differentiation of the featural systems, we find that there is an obvious correlate to this FREQUENCY DIMENSION in the TIME DIMENSION which characterizes the CONSONANTAL and the VOCALIC FEATURES.
Jakobson/Halle (1956:40) define the VOCALIC (vs. NON-VOCALIC) feature as follows: "Presence (vs. absence) of a sharply defined formant structure". This means, in a simple sense of it, that VOCALIC sounds exhibit PERIODICITY. That is, the measured energy over a certain TIME INTERVAL is constant in the sense that two comparable chunks of the time interval possess roughly the same amount of overall energy. (We can take this to be a rough approximation to the definition since the formants carry, by far, the larger part of the energy in the spectrum). Now this means that we here have a very obvious counterpart in the TIME DIMENSION (the horizontal dimension) to the essential role of the MIDDLE POINT in the FREQUENCY DIMENSION (the vertical dimension): If we take an arbitrary finite time interval and compare the first half of it with the second half of it, then we would call it a VOCALIC sound in this interval if the difference between the two halves is less than a certain threshold value. Obviously, this is the most general definition we can find of vocality, and there are many ways to construct non-vocalic sounds spread over a longer time interval in such a manner that the two halves in the time dimension are roughly equal as to overall amount of energy. This is, though, unimportant in comparison with the HIGH GENERALITY which is obtained: In such cases, we would simply call the interval VOCALIC in this abstract sense of it, without paying any attention to what is the actual contents of the two halves of the time interval. Rather, we are concerned with the hypothesized state of the small child who is about to set its own significational performance off from the competence shared by the linguistic community.
VOCALITY defined in this manner is an abstract property which is assigned to a part of the signal without any considerations as to the later systematic function of VOCALITY in the linguistic system which is addressed with a more elaborate differentiation between the two featural systems.
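The half-against-half comparison in the time domain can be made concrete in a minimal Python sketch. It is illustrative only: the function names, the sum-of-squares energy measure and the relative threshold of 0.1 are assumptions of the sketch and are not taken from the study itself.

```python
import math

def energy(samples):
    """Sum of squared amplitudes, used here as a stand-in for overall energy."""
    return sum(x * x for x in samples)

def is_vocalic(interval, threshold=0.1):
    """Return True when the two halves of `interval` carry roughly equal energy.

    `threshold` is a hypothetical relative tolerance (0..1); the text only
    speaks of "a certain threshold value".
    """
    half = len(interval) // 2
    first = energy(interval[:half])
    second = energy(interval[half:2 * half])  # drops the last sample if the length is odd
    total = first + second
    if total == 0:
        return True  # silence: the halves do not differ
    return abs(first - second) / total < threshold

# A steady tone versus a burst confined to the first half of the interval
steady = [math.sin(2 * math.pi * k / 8) for k in range(64)]
burst = [1.0] * 32 + [0.0] * 32
print(is_vocalic(steady), is_vocalic(burst))  # prints: True False
```

As the text notes, the test says nothing about the contents of the two halves: any interval with balanced halves counts as VOCALIC in this abstract sense.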
It is this generality of definition which should be adopted also for the feature GRAVE in the frequency spectrum: Since GRAVE is defined to be a case of predominance of energy in the lower half of the spectrum, we may compare the two halves of the spectrum and conclude that if there is a difference which exceeds a certain threshold value, then the feature GRAVE is significant for the analysis. Conversely, if the difference in overall energy between the two halves does not exceed a certain threshold value, then we have a case wherein the GRAVE/ACUTE opposition is not developed. That is, these marginal cases will be instances wherein the feature which carries the role of signifying the systematic differences between acoustic and articulatory encoding is not 'developed' - that is, the difference between the halves is as yet insignificant. This is the counterpart in the frequency domain to the vocalic feature in the time domain. Assuming that articulation cannot really come about without consonantality, VOCALITY carries inherently in it the state of undeveloped signification - a state of fused spaces and absence of differentiation. The frequency counterpart to this is the state wherein the difference between a GRAVE and an ACUTE cannot be determined. This is the case where the difference between the overall amount of energy in the two halves is smaller than a certain threshold value.
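The same comparison in the frequency domain might be sketched as follows. This is a hedged illustration, not the tool used in the study: the naive DFT, the names `power_spectrum`, `grave_valuation` and `tone`, the frame length of 32 samples and the threshold of 0.1 are all assumptions made for the example.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power for bins 0 .. N/2 - 1 (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2
        for k in range(n // 2)
    ]

def grave_valuation(frame, threshold=0.1):
    """Compare energy in the lower and the upper half of the spectrum.

    Returns "grave", "acute", or "undeveloped" when the relative difference
    stays below the (hypothetical) threshold.
    """
    spectrum = power_spectrum(frame)
    mid = len(spectrum) // 2
    low, high = sum(spectrum[:mid]), sum(spectrum[mid:])
    total = low + high
    if total == 0:
        return "undeveloped"
    balance = (low - high) / total
    if abs(balance) < threshold:
        return "undeveloped"
    return "grave" if balance > 0 else "acute"

def tone(bin_index, n=32):
    """A sine wave falling exactly on one DFT bin of an n-sample frame."""
    return [math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]

low_tone, high_tone = tone(2), tone(13)
mixed = [a + b for a, b in zip(low_tone, high_tone)]
print(grave_valuation(low_tone), grave_valuation(high_tone), grave_valuation(mixed))
# prints: grave acute undeveloped
```

It is the third outcome, where the two halves balance, which corresponds to the marginal cases of interest here.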
The MIDDLE POINT is no less important for the DIFFUSE feature: Its opposite in the COMPACT valuation is the case where there is concentration of energy in the middle of the spectrum, in a region sufficiently narrow to allow us to identify it as THE MIDDLE POINT. This is the point we also use for dividing the frequency interval for assessing the GRAVE feature. Since COMPACTS exhibit a concentration of energy in this point, and DIFFUSES do not, we can carry the same way of reasoning over to this pair of feature valuations and conclude that the interesting observation to be made is in the relationship between the amount of energy in the middle point of the spectrum as compared to the average amount of energy in the entire spectrum. In a similar manner, we conclude that if the difference between these two energy measures does not exceed a certain threshold value, then the valuation of the COMPACT vs. DIFFUSE is as yet indeterminate, that is, it is yet about to develop. It is the differences which come out below this threshold value which will be telling for that initial state wherein the individual mind is as yet not encapsulated from the shared mind of the community, wherein the linguistic competence is encoded. It is consequently conjectured that these threshold cases provide for recognition of the significant boundaries between the FINITE (of the individual) and the INFINITE (of the community) which surface in the boundaries between the discrete symbols.
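The mid-point comparison can be sketched in the same spirit. The sketch assumes that a spectrum is already available as a list of non-negative energies per frequency bin; the function name, the width of the "narrow" middle region (`band`) and the threshold are hypothetical choices, not values given in the study.

```python
def compact_valuation(spectrum, band=1, threshold=0.1):
    """Compare energy around the spectral mid point with the spectrum average.

    `spectrum` is a non-empty list of non-negative energies per frequency bin.
    `band` (bins on each side of the centre) and `threshold` are hypothetical.
    Returns "compact", "diffuse", or "indeterminate" for the threshold cases.
    """
    centre = len(spectrum) // 2
    region = spectrum[max(0, centre - band):centre + band + 1]
    mid_energy = sum(region) / len(region)
    average = sum(spectrum) / len(spectrum)
    if average == 0:
        return "indeterminate"
    difference = (mid_energy - average) / average
    if abs(difference) < threshold:
        return "indeterminate"
    return "compact" if difference > 0 else "diffuse"

print(compact_valuation([0, 1, 8, 9, 8, 1, 0]))  # energy piled up in the middle: compact
print(compact_valuation([6, 5, 1, 0, 1, 5, 6]))  # energy at the edges: diffuse
print(compact_valuation([3, 3, 3, 3, 3, 3, 3]))  # flat spectrum: indeterminate
```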
It is this measurement technique which can be carried over to the time domain, such that we can obtain a similar counterpart to the DIFFUSE in this domain as we obtained to the GRAVE in the VOCALIC valuation. This should, expectedly, be the CONSONANTAL feature. Accordingly, we should be able to find a representation of the primordial differentiation which subsequently comes to define the (abstract) CONSONANTAL feature by measuring the energy in the MIDDLE POINT of a certain time interval and compare this with the AVERAGE ENERGY in the interval: If the difference is below a certain threshold value, then we should have found a measurement point which is significant for the developmental relationship between the acoustic and the articulatory way of conceiving the CONSONANTAL feature.
How do Jakobson/Halle define the consonantal feature? As follows: "Presence (vs. absence) of a characteristic lowering in frequency of the first formant, a lowering which results in a reduction of the overall intensity of the sound and/or of only certain frequency regions" (1956:40). Jakobson et al. (1952) define it as follows: "Phonemes possessing the consonantal feature are acoustically characterised by the presence of zeroes that affect the entire spectrum".
It is difficult to decide whether this is something which is detected by the algorithm suggested here. It may be. I don't know. I may perhaps share this ignorance with Hume & Odden (1996:370f), when they conclude after having investigated the character of the CONSONANTAL feature in various phonological systems:
"With respect to phonemic contrasts, we have shown that reference to [consonantal] is unnecessary since contrasts which seem to exist can be characterised by other features, while those that we might expect to find are not attested. The role of [consonantal] in characterising natural classes of sounds is also shown to be superfluous. Moreover, our study has revealed no compelling evidence for the natural classes which [consonantal] is intended to capture. [...] We conclude that the feature [consonantal] is superfluous and can therefore be eliminated from feature theory".
This means, in short, that the distinguishing force of this most profound phonological feature is blurred in adult language. Still, it intuitively has relevance, even if the function of it need not be to establish non-intersecting classes of phonemic elements. Rather, if its function is something else, such as partaking in the constitution of the mirror-symmetric patterns which are investigated here, then we can understand why the acoustic definition fails to distinguish phonemic classes, and also why it may be open to revision.
Anyhow, the point is not to approximate such a definition too strictly, but rather to capture a HIGHLY GENERAL counterpart in the time domain to the obviously relevant procedure in the frequency domain. To compare the earliest consonantal articulations with the later refined classes may anyhow be futile.
Consequently, I arrive at a very general algorithm for detecting what may be seen as pivotal occurrences of boundaries in the speech signal, resting on binary oppositions in TIME and FREQUENCY domains:
1) MID POINT vs. AVERAGE OF TOTAL
2) FIRST HALF vs. SECOND HALF
Again, we are left with the typical four classes which represent the four feature valuations:
CLASS | MID | HALF |
1. | – | – |
2. | – | + |
3. | + | – |
4. | + | + |
There is an obvious similarity with the classification of Zeno's paradoxes discussed in Grover (1997e). I find that this adds to the generality of the procedure.
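The two oppositions and the four classes can be put together in one small Python sketch, applicable alike to a series of energy values along the time axis and to a spectrum along the frequency axis. Two things are assumptions of the sketch and not statements of the study: that "+" means "the difference exceeds the threshold", and that the difference is measured relative to the sum of the two quantities compared. The function names are likewise hypothetical.

```python
def exceeds(a, b, threshold):
    """True when a and b differ by more than `threshold`, relative to their sum."""
    scale = abs(a) + abs(b)
    return scale > 0 and abs(a - b) / scale > threshold

def classify(values, threshold=0.1):
    """Return (mid, half, class_number) for a non-empty list of energy values.

    mid  : MID POINT vs. AVERAGE OF TOTAL
    half : FIRST HALF vs. SECOND HALF (the centre value is left out when the
           length is odd)
    "+" is read here as "difference exceeds threshold"; that reading is an
    assumption. Class numbers follow the table: 1 = (-,-) ... 4 = (+,+).
    """
    n = len(values)
    half = n // 2
    mid_flag = exceeds(values[half], sum(values) / n, threshold)
    half_flag = exceeds(sum(values[:half]), sum(values[n - half:]), threshold)
    return mid_flag, half_flag, 1 + 2 * mid_flag + half_flag

print(classify([2, 2, 2, 2]))        # (False, False, 1)
print(classify([1, 1, 2, 3, 3]))     # (False, True, 2)
print(classify([1, 1, 5, 1, 1]))     # (True, False, 3)
print(classify([1, 1, 1, 3, 3, 3]))  # (True, True, 4)
```

On this reading, class 1, where both differences are levelled out, collects the pivotal observations.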
The key is to be found in the cases where the differences between the binary oppositions are levelled out and approach zero. The conjecture is that the pattern of such PIVOTAL observations should exhibit MIRROR STRUCTURES in the (conjecturally arbitrarily signifying) segments in the acoustic signals of utterances. This is indeed testable. If we find that these feature observations conspire to a mirror pattern, and we can conclude that we have found a symbol demarcation of some kind, then these symbols can be compared with the convenient segments of some narrative.
In the following, I report on such a test, but without making any comparisons with symbols defined differently.
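What it means for pivotal observations to form a mirror pattern in a segment might be sketched as follows. This is an illustration under assumptions: the pivotal points are given as positions (for example times in seconds) inside a segment, symmetry is taken about the centre of the segment, and the tolerance of 0.05 as well as the function names are hypothetical.

```python
def mirror_deviation(points, start, end):
    """Largest departure from mirror symmetry about the centre of [start, end].

    `points` are positions of pivotal observations inside the segment.
    A perfectly mirror-symmetric pattern gives 0: the i-th point from the
    left and the i-th point from the right then sum to start + end.
    """
    ordered = sorted(points)
    return max(
        (abs(a + b - (start + end)) for a, b in zip(ordered, reversed(ordered))),
        default=0.0,
    )

def is_mirror_segment(points, start, end, tolerance=0.05):
    """True when no pair of points departs from symmetry by more than `tolerance`."""
    return mirror_deviation(points, start, end) <= tolerance

print(mirror_deviation([1, 3, 7, 9], 0, 10))  # 0: symmetric about 5
print(mirror_deviation([1, 3, 6, 9], 0, 10))  # 1: the inner pair is off by one
print(is_mirror_segment([0.2, 0.5, 0.8], 0.0, 1.0))  # True
```

A search for segment boundaries could then try different values of `start` and `end` and keep those for which the deviation is smallest, but that step is not shown here.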
There is one prominent grammatical model from recent years (Optimality theory) which seems to lend itself particularly well to an interpretation in the present framework. Optimality theory makes use of the following basic concepts:
INPUT | - the input string to the grammar |
OUTPUT | - the outputted string from the grammar |
GEN | - the generator creating alternative outputs |
EVAL | - the evaluator selecting the optimal output |
CON | - the universal set of (innate) constraints |
VIOLABILITY | - the universal constraints in CON are violable in the individual language |
RANKING | - each language ranks the universal constraints, and violations are assessed for ranking |
OPTIMALITY | - is a case of selecting the least violated output |
FAITHFULNESS CONSTRAINTS | - the requirement that OUTPUT should be maximally similar (or faithful) to INPUT. |
The flexibility of this model lies in the role of the EVAL component, which, as is my interpretation of optimality theory, allows for incorporation of a concept of AMBIGUITY in the processing, thereby making space for the PDP approach. To clarify what I mean by this: PDP and PRODUCTION systems are equivalent if the former is controlled by processors in such a manner that the order of the processes can be unambiguously determined. That is, if they can be enumerated. If they cannot, such as in the case of human processors working in parallel in a social space, then we have a potentially uncountable domain, according to the enumerability criteria. If, therefore, there is more than one processor depending on interaction with humans, and the processors work distributed in parallel, then we can propose a level of control which ultimately must be relegated to the collective consciousness. It is this which allows for a PDP system, with the processing essentially relegated to the collective consciousness, which transcends the Turing-computability boundary which all enumerable processes will meet. The non-denumerability of the social PDP system derives from the impossibility of ordering the human choices (in interaction with the computers) when these ultimately are dependent on the collective consciousness. It is this and only this moment of indeterminacy in social PDP processing which can allow for a processing beyond the Turing boundary, and it is this which the EVAL component in the optimality framework is suitable for handling. It is an essential property of optimality theory that it is not rule-based (Archangeli & Langendoen 1997). This must of course be taken to mean that the productions are not entirely predictable from the cognitive endowment, and not from the input itself. If they were, then the system would be equivalent to a rule-based production system.
Consequently, if the talk about optimality theory as non-rule-based has substance, then we must assume that the EVAL component implies a moment of indeterminacy, which makes of the system a PDP system. This must be the contents of the assertion that optimality theory does not entail rule-governedness.
The present model lends itself eminently to an interpretation in terms of optimality theory in this sense of it. It goes as follows:
1) The INPUT is a CONTINUOUS (non-discrete) acoustic signal.
2) The OUTPUT is a CONTINUOUS (non-discrete) series of articulatory gestures (which also produces a signal).
3) The 'OPTIMALITY' GRAMMAR is a DISCRETE component which works as follows:
4) GEN makes use of the set of constraints in CON to generate DISCRETE IMITATIONS of the non-discrete signal, as the latter would be parsed by means of the parameters determined by CON. Each approximation entails a certain selection of parameter values which all bring various CONSTRAINTS into play. In the present case, GEN generates various discrete mirror-symmetric symbol forms.
5) EVAL selects the optimal approximation by evaluating the amount and nature of the violations of the constraints required to generate the OUTPUT as faithfully similar to the INPUT as possible. These violations are finite and measurable because the grammar is discrete. In the present case, the input can be analyzed into discrete mirror-symmetric strings by deleting and inserting certain points. By this, the strings generated by GEN can be approximated. The distance between the parsed input and the generated strings, that is, the amount of deletion and insertion required, is measurable in terms of violation of the universal constraints (CON) which are at work in the interpretation of the input. The EVAL component therefore selects the violations of the CON constraints, and it is the EVAL component which takes care of the important INDETERMINACY parameter.
6) This means that optimality theory in this case works in the discrete interface between the continuous acoustic INPUT and the continuous articulatory OUTPUT, and in this interface, ambiguous derivations allow for INDETERMINACY and MIRROR AMBIGUITY.
SYMMETRY is the simple fundamental assumption that the addressee understands the communication in the same way as the speaker: The OTHER SIDE must be the same as THIS SIDE. It is this fundamental communicational presupposition which is invoked when the parser of the input signal IMPOSES SYMMETRY onto it, and selects the parsing or generated string which is closest (in the sense of least violated in terms of universals) to a symmetric interpretation of the signal. In this way, the small child learns to establish a MIRROR IMAGE of itself on the OTHER SIDE of the symbol, with the presupposition that this mirror image is virtually identical to itself: It is this mirror image which gradually emancipates itself in the course of the second year, and attains distance from the small child by being subjected to syntactic transformations (movements). Through the DISCRETE ARCHITECTURE which is at the heart of the grammar, a TURING-computable overview of the OTHER PERSON's knowledge can be maintained, even after the child has liberated itself from the other person and acquired its own individuality. The well-known FIS phenomenon is about this liberation of the other person.
This model is eminently applicable to the analysis which is suggested here. I will here go through some selected parts of an analysis of a signal file: JOHN BETJEMAN reading his own poem "The sunlit weeks between...", which was available as a signal file from the MARSEC corpus of spoken English. Before I enter into this, some more general remarks.
My procedure is the following:
I read samples from the signal file at regular intervals (the intervals are the resolution). The number of sample points in each reading is an integer power of 2, to suit the Fast Fourier Transform (FFT) requirements. In the present case, I have used 2048 sample points in each reading from the file.
I make the following four 'feature' analyses:
1) CONS (= 'consonantal') is recorded if the value of the middle sample point (in the present case, sample no.1024) is close enough to the average value in the set. (That is, if the difference is below a certain threshold value). For various practical reasons, I used the two middle points balanced against the doubled average.
2) VOC is recorded if the two sums of energy in the two halves are sufficiently similar - that is, if the difference between the two sums is below a certain threshold value.
3) DIFF is recorded if the energy in the average middle point of the FFT array is sufficiently close to the average energy in the total spectrum. To find the average middle point, one must first make one run through the signal file only to find this middle point. The difference between the energy in the average middle point and the overall average energy in the spectrum must be below a certain threshold value to make a record of it.
4) GRAVE is recorded if the energy in the lower part of the spectrum (below the average middle point) is sufficiently close to the energy in the upper part of the spectrum, - with the difference below a certain threshold value.
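The four measurements above can be sketched roughly as follows. This is a minimal illustration and not the program used for the analysis: the names `fft`, `features` and `mid_bin` are hypothetical, "energy" is assumed to mean squared amplitude (time domain) or squared magnitude (spectrum), and the "average middle point" of the spectrum is simply passed in as a bin index, since the text does not spell out how it is computed. The function returns the four difference values; a record would be made wherever a value falls below the chosen threshold.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def features(frame, mid_bin):
    """Return the four difference values for one frame (0 means perfect balance).

    mid_bin is an assumed stand-in for the 'average middle point' of the spectrum.
    """
    n = len(frame)
    half = n // 2
    mean = sum(frame) / n
    # CONS: the two middle samples balanced against the doubled average.
    cons = abs((frame[half - 1] + frame[half]) - 2 * mean)
    # VOC: summed energy of the first half versus the second half.
    voc = abs(sum(s * s for s in frame[:half]) - sum(s * s for s in frame[half:]))
    # Power spectrum over the non-negative frequencies.
    power = [abs(c) ** 2 for c in fft(frame)[:half]]
    # DIFF: energy at the middle bin versus the average spectral energy.
    diff = abs(power[mid_bin] - sum(power) / len(power))
    # GRAVE: energy below the middle bin versus energy from it upwards.
    grave = abs(sum(power[:mid_bin]) - sum(power[mid_bin:]))
    return {"CONS": cons, "VOC": voc, "DIFF": diff, "GRAVE": grave}
```

For a frame of pure silence all four values come out as zero, which is the limiting case discussed in the rationale for recording only values close to zero.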
The rationale for recording only differences below a small threshold value, that is, distinctions which approach zero, is the following:
All the four features I make use of in this exemplifying test have absolute zero values (that is, the differences are zero) only in the case of SILENCE and in some idealized averaged RANDOM NOISE. The outputted values rise above zero (to positive values) when the signal carries information. This means that the patterns of points are representing the boundaries to information transmission. When this also is taken to represent the boundaries to the OTHER'S SELF (compared to the child's self), it means that we may conceive of theoretical information as that which is transmitted between selves.
It is this boundary which also is taken to represent the boundary between the frontal and the occipital parts of the neural representation of the sign, with its social and observational components, and it is in the creation of this boundary that the child acquires its linguistic capacity. Consequently, it is also here that we must search for the mirror symmetry which lets the child face its OTHER as a mirror image of itself - exactly on the boundary to liberating the OTHER in an essentially different knowledge-space. In the space between SILENCE and RANDOM NOISE, there is the space of linguistic redundancy which any linguistic system must be situated in. In the midst of this space, we need not find any notable symmetry: Rather, transformations on all the different levels yield a surface which is not immediately symmetric, and where a symmetry can be regained only by a meticulous reconstruction through reversal of transformations. This is the space of the well-known grammars of the various levels.
Consequently, in the midst of the grammatical space, there is no surface symmetry, and outside it, in the SILENCE and in the RANDOM NOISE, there are no traces of symmetry (or, if you like, it is perfectly symmetrical without boundaries). It is only on the very boundary to the grammatical space that we will find the characteristic signal symmetry of the linguistic symbols. It is this boundary which I hypothesize that the small child addresses in its beginning lateralization.
Consequently, to trace the mirror-symmetric symbol strings in the signal, we must search only along the boundary to silence and noise. That will be the boundary to the faintest articulations and acoustic features.
Or, as Rilke puts it in his fourth Duino elegy:
Da wird für eines Augenblickes Zeichnung
ein Grund von Gegenteil bereitet, mühsam,
dass wir sie sehen; denn man ist sehr deutlich
mit uns. Wir kennen den Kontur
des Fühlens nicht: nur, was ihn formt von aussen.
Wer sass nicht bang vor seines Herzens Vorhang?
This 'Herzens Vorhang' here represents the boundary to a symbol which enters into a string representation of speech.
This is the rationale for recording only values close to zero.
I output the recordings according to a scheme as in the following example:
CONS | VOC | DIFF | GRAVE | TIME | WORD |
x | | | | t1 | |
| x | | | t2 | w1 |
| | | x | t3 | |
| x | | | t4 | |
x | | x | | t5 | w2 |
The x's represent records in some hypothetical file, and the t's represent time (that is, position in the file) of the record, successively from the beginning to the end of the file. The w's are WORDS occurring in the file here and there, positioned at the onset of the word. In the results discussed here, there are normally many more records in the course of one word, but this is typically a variable parameter.
An important CONSTRAINT is consequently the THRESHOLD values. Variation of these values leads, of course, to variation in the outputted pattern.
It is here that we can see the close correspondence with optimality theory. The essential guiding principle is the search for mirror-symmetric patterns. In the present example, there is one such symmetric pattern which can be found in the interval from t2 to t4: VOC-GRAVE-VOC is a mirror-symmetric pattern. However, we may expand the size of this symbol in two ways: Either we can delete the DIFF record in t5, which produces the string CONS-VOC-GRAVE-VOC-CONS, or we can insert a DIFF in t1, to yield the CONS/DIFF-VOC-GRAVE-VOC-CONS/DIFF series.
The question is now which solution is the best. Seen from the point of view of FAITHFULNESS CONSTRAINTS, the optimal solution may be to restrict the symbol size to the three records in t2-t4 only. However, this leaves us with single records fluttering in the air, and it is obvious (or, let us say, it seems intuitively likely) that the longer the mirror-symmetric string, the stronger the explanatory potential of the grammar which the child attempts to construct. So, from the point of view of the overall explanatory potential of the grammar, or in view of the difficulty of incorporating the 'loose ends' in t1 and t5 into the surrounding elements, it may be desirable to delete the DIFF value in t5 or insert one in t1. It is the EVAL component which must take care of this evaluation.
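The three candidate readings of the example can be checked mechanically. The sketch below is only an illustration under my own assumptions: each time point is represented as a set of feature names (so that a double record such as CONS/DIFF is one element), and `is_mirror` is a hypothetical helper, not part of any existing program.

```python
def is_mirror(records):
    """True if the sequence of feature sets reads the same in both directions."""
    records = list(records)
    return records == records[::-1]

# Records t1..t5 of the example scheme, one set of features per time point.
t = [{"CONS"}, {"VOC"}, {"GRAVE"}, {"VOC"}, {"CONS", "DIFF"}]

core = t[1:4]                          # t2..t4 left untouched
deleted = t[:4] + [t[4] - {"DIFF"}]    # one deletion: DIFF removed in t5
inserted = [t[0] | {"DIFF"}] + t[1:]   # one insertion: DIFF added in t1

print(is_mirror(t), is_mirror(core), is_mirror(deleted), is_mirror(inserted))
```

The unedited five-point string is not symmetric, while the core and both expanded candidates are. Each expansion costs exactly one violation, so the check alone cannot choose between them.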
Lowering or heightening of the threshold values may provide solutions without violations: A lowering may cause the DIFF in t5 to disappear altogether by itself, and a heightening may cause a DIFF in t1 to occur, without violation. However, these changes in the threshold values have repercussions for the overall patterns in the particular language which is specified by its RANKING of the constraints. (I here speak about ranking in terms of distance from zero of the records, which is not exactly the optimality theory sense of it, but can illustrate the point). We may therefore think of the following general architecture: In the individual case, symmetry is obtained by optimal solutions of violations, and in the collective case, these individual solutions conspire to create a certain ranking pattern which characterizes the language, and which takes the place of individual violations.
In addition to these threshold value variations, there are a host of other possible parameters which can enter into such evaluations. For the present example, we can think of 1) number of samples in the FFT, 2) size of the middle point for CONS and DIFF, 3) filtering techniques for the FFT, 4) paradigmatic resolution (8-, 12-, 16-, 32-bit resolution), 5) syntagmatic resolution.
In the present example, I include two parallel readings which differ only in the last parameter. The one has a 5-point resolution (there are 5 sample points between each reading), the second has a 20-point resolution. As will be seen, the differences between them are considerable indeed.
To avoid the problem of setting some arbitrary threshold value, I have taken recourse to the method of selecting a specified number of records from a file: That is, if N is this number, I select the N smallest values of CONS, VOC, DIFF and GRAVE from the file. The records are RANKED in a sorting array. Since it happens that records can have the same value, to avoid that the last of such identical records in a file is lost when the number N is reached earlier in the file, there is the additional constraint imposed on the analysis that at least N records from each feature must be selected, and no records should have a higher value than the record ranked as the N'th record (in the sorting array).
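A minimal sketch of this selection rule, assuming the difference values of one feature are held in a list indexed by position in the file. The function name `select_records` and the tie-breaking by file position are my illustrative choices, not a description of the original program.

```python
def select_records(values, n):
    """Select the n smallest values, keeping every value tied with the n-th.

    Returns (rank, position, value) triples; equal values are ranked by
    their position in the file.
    """
    if n <= 0 or not values:
        return []
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    # Value of the record ranked n-th (or of the last record if fewer exist).
    cutoff = values[order[min(n, len(order)) - 1]]
    chosen = [i for i in order if values[i] <= cutoff]
    return [(rank, i, values[i]) for rank, i in enumerate(chosen, start=1)]

# Asking for 4 records yields 5 here, because two records share the 4th value.
print(select_records([3, 0, 0, 5, 0, 1, 1], 4))
```

With coarse amplitude resolution many values coincide, so the number of records returned can be far larger than N.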
For reasons of paradigmatic resolution, the VOC is particularly vulnerable to this. If the syntagmatic resolution is good (small intervals), the number of identical zeros or close to zeros may be high. The CONS is second to VOC in vulnerability in this sense of it. For this reason, the number of VOC records is much higher than DIFF and GRAVE, and there are, in general, more CONS records than DIFF and GRAVE.
In the below analysis, the records are printed out not as x's, but as the RANKING NUMBER in the sorted array of records. It will be seen that all the VOC records in the analysis with 5 as syntagmatic resolution (the column to the left) have successively rising ranking number: This means that they are ranked according to TIME position in the file only, which again means that all the VOC records have the same value. In effect, this means that all of them have zero value, and all zero values in the tested points are included. The reason for this large heap of VOC's is to be found in the paradigmatic resolution, which is a poor 8-bit resolution with only 256 possible amplitude values. This makes the differences a matter of integer differences, and it lends a certain rigidity to the VOC value due to the poor paradigmatic resolution in the time domain. This poverty is flattened out in the DIFF and GRAVE cases due to the intermediate FFT which outputs floating point values which are vulnerable to filtering techniques. I have made use of no filters at all. Varying the filters or the paradigmatic resolution will no doubt produce differences in the outputted patterns.
This exemplifies the role of the technical constraints which indeed must be considered UNIVERSAL and INNATE in the neural interpretation of them. The system seems indeed to be quite 'nervous' and behave somewhat 'chaotically' in terms of symmetry properties when the parameters are varied (even if this of course only is a matter of computational resources as long as the data are finite). Still, the 'nervousness' of the system is an appealing property when we consider the suggested detachment of the GEN, the CON and the EVAL components in the optimality theory framework.
In the following analyses, I have selected at least the 40 smallest values from each of the features. (VOC ran to 141 values - all of them zero - in the lefthand analysis, and to 120 in the righthand - with zero and one as values: The value one started exactly on the 40th record and ran to the 120th).
In the discussion below, it is implied that repeated occurrences of the same feature should be deleted in the interpretation - which attenuates the noise from such variation.
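As a small illustration of this convention, consecutive repetitions of a feature can be collapsed into a single occurrence before symmetry is sought. The helper name `collapse_repeats` is hypothetical.

```python
from itertools import groupby

def collapse_repeats(features):
    """Collapse each run of identical consecutive features into one occurrence."""
    return [key for key, _ in groupby(features)]

print(collapse_repeats(["GRAVE", "GRAVE", "VOC", "GRAVE", "CONS", "CONS"]))
```

Only adjacent repetitions are merged; a feature that returns after another record has intervened is kept.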
The two analyses are in the two main columns, and the columns in each analysis are as given in the example above, that is, in the order:
CONS - VOC - DIFF - GRAVE - TIME - WORD
The two analyses are time-aligned, but the vertical position in the columns does not give a true impression of the position in the file. The text common to both columns is given in the middle, between them.
John Betjeman: The sunlit weeks between...
A | B | ||||||||||
5 ms | 20 ms | ||||||||||
C | V | D | G | TIME | WORD | C | V | D | G | TIME | |
33 | 0:00 | 0:00 | |||||||||
13 | 0:15 | ||||||||||
0:25 | the | ||||||||||
1 | 0:40 | sunlit | 18 | 0:40 | |||||||
8 | 0:42 | ||||||||||
11 | 0:42 | ||||||||||
2 | 0:47 | ||||||||||
3 | 0:47 | ||||||||||
4 | 0:53 | ||||||||||
15 | 0:57 | ||||||||||
14 | 0:68 | ||||||||||
8 | 0:68 | ||||||||||
33 | 0:69 | ||||||||||
5 | 0:70 | ||||||||||
34 | 0:73 | ||||||||||
25 | 0:76 | ||||||||||
38 | 0:94 | weeks | 26 | 0:94 | |||||||
6 | 1:15 | ||||||||||
37 | 1:30 | ||||||||||
7 | 1:40 | 40 | 1:40 | ||||||||
between | |||||||||||
35 | 1:42 | ||||||||||
24 | 1:50 | ||||||||||
29 | 1:59 | ||||||||||
28 | 1:60 | ||||||||||
12 | 1:62 | ||||||||||
41 | 1:64 | ||||||||||
13 | 1:76 | 2 | 1:70 | ||||||||
8 | 1:76 | 38 | 1:76 | ||||||||
9 | 1:78 | 42 | 1:78 | ||||||||
39 | 10 | 1:79 | |||||||||
43 | 1:87 | ||||||||||
11 | 1:90 | 44 | 1:90 | ||||||||
40 | 1:99 | ||||||||||
18 | 2:00 | ||||||||||
15 | 2:11 | ||||||||||
45 | 2:12 | ||||||||||
12 | 2:25 | ||||||||||
46 | 2:27 | ||||||||||
13 | 2:28 | ||||||||||
14 | 2:30 | were | 4 | 2:30 | |||||||
32 | 2:37 | ||||||||||
41 | 2:39 | 47 | 2:39 | ||||||||
0 | 2:41 | 48 | 2:41 | ||||||||
16 | 2:42 | ||||||||||
49 | 2:44 | ||||||||||
50 | 2:45 | ||||||||||
51 | 2:46 | ||||||||||
0 | 2:51 | ||||||||||
15 | 2:53 | ||||||||||
37 | 2:55 | full | 52 | 2:55 | |||||||
53 | 2:58 | ||||||||||
25 | 2:72 | of | |||||||||
14 | 2:85 | maids | 1 | 2:85 | |||||||
16 | 2:90 | ||||||||||
17 | 3:28 | 2 | 3:28 | ||||||||
27 | 3:43 | ||||||||||
1 | 3:44 | ||||||||||
15 | 3:50 | ||||||||||
42 | 3:65 : | 54 | 3:65 | ||||||||
23 | 3:65 | ||||||||||
3 | 3:66 | ||||||||||
18 | 3:80 | ||||||||||
19 | 3:83 | ||||||||||
20 | 3:88 | ||||||||||
21 | 4:00 | Sarah | 5 | 4:00 | |||||||
7 | 4:02 | ||||||||||
2 | 4:04 | ||||||||||
6 | 4:05 | ||||||||||
2 | 4:06 | ||||||||||
31 | 4:06 | ||||||||||
4 | 4:07 | ||||||||||
22 | 4:09 | 39 | 4:09 | ||||||||
28 | 4:09 | ||||||||||
11 | 4:10 | ||||||||||
11 | 4:11 | ||||||||||
43 | 4:12 | 28 | 4:12 | ||||||||
29 | 4:13 | ||||||||||
16 | 4:14 | 5 | 4:14 | ||||||||
27 | 4:14 | ||||||||||
23 | 4:16 | 26 | 4:16 | ||||||||
2 | 4:16 | ||||||||||
44 | 4:17 | ||||||||||
29 | 4:65 | , | 55 | 4:65 | |||||||
5 | 4:76 | ||||||||||
24 | 4:80 | ||||||||||
34 | 4:80 | ||||||||||
25 | 4:82 | ||||||||||
45 | 4:83 | ||||||||||
8 | 4:88 | ||||||||||
26 | 4:92 | ||||||||||
17 | 4:93 | ||||||||||
18 | 4:94 | 56 | 4:94 | ||||||||
46 | 4:94 | ||||||||||
47 | 4:95 | ||||||||||
27 | 5:00 | with | 6 | 5:00 | |||||||
57 | 5:05 | ||||||||||
28 | 5:06 | ||||||||||
19 | 5:06 | 9 | 5:12 | ||||||||
48 | 5:12 | orange | 31 | 5:14 | |||||||
29 | 5:15 | 3 | 5:17 | ||||||||
7 | 5:23 | ||||||||||
5:70 | wig | 38 | 5:70 | ||||||||
58 | 5:75 | ||||||||||
59 | 5:83 | ||||||||||
30 | 5:98 | and | |||||||||
6:08 | |||||||||||
31 | 6:20 | horsy | 60 | 6:20 | |||||||
32 | 6:22 | ||||||||||
10 | 6:33 | 61 | 6:33 | ||||||||
6:33 | |||||||||||
33 | 6:62 | teeth | 62 | 6:62 | |||||||
34 | 6:67 | ||||||||||
49 | 6:72 | ||||||||||
63 | 6:73 | ||||||||||
3 | 6:76 | ||||||||||
35 | 6:77 | ||||||||||
20 | 6:80 | ||||||||||
50 | 6:82 | 30 | 6:82 | ||||||||
10 | 6:85 | ||||||||||
8 | 6:86 | ||||||||||
36 | 6:87 | 64 | 6:87 | ||||||||
21 | 6:87 | ||||||||||
6:97 | |||||||||||
37 | 7:05 , | ||||||||||
3 | 7:05 | 27 | 7:05 | ||||||||
38 | 7:07 | ||||||||||
31 | 7:13 | ||||||||||
9 | 7:19 | ||||||||||
39 | 7:20 | ||||||||||
40 | 7:24 | ||||||||||
22 | 7:31 | 19 | 7:31 | ||||||||
23 | 7:32 | ||||||||||
20 | 7:33 | ||||||||||
21 | 7:35 | ||||||||||
40 | 7:35 | ||||||||||
41 | 7:36 | ||||||||||
20 | 7:38 | ||||||||||
22 | 7:38 | ||||||||||
42 | 7:40 | ||||||||||
10 | 7:41 | ||||||||||
43 | 7:43 | 65 | 7:43 | ||||||||
17 | 7:46 | ||||||||||
44 | 7:50 | was | 1 | 7:50 | |||||||
30 | 7:51 | ||||||||||
66 | 7:53 | ||||||||||
51 | 7:54 | ||||||||||
67 | 7:56 | ||||||||||
45 | 7:58 | 11 | 7:58 | ||||||||
46 | 7:60 | ||||||||||
38 | 7:67 | so | 22 | 7:67 | |||||||
9 | 7:83 | ||||||||||
10 | 7:84 | ||||||||||
47 | 8:00 | bad | 68 | 8:00 | |||||||
tempered | |||||||||||
48 | 8:17 | ||||||||||
49 | 8:18 | ||||||||||
5 | 8:54 | ||||||||||
50 | 8:57 | ||||||||||
8:75 | that | ||||||||||
32 | 8:85 | she | 69 | 8:85 | |||||||
4 | 8:88 | ||||||||||
51 | 9:05 | 70 | 9:05 | ||||||||
scarcely | |||||||||||
52 | 9:10 | ||||||||||
37 | 9:12 | ||||||||||
52 | 9:21 | ||||||||||
53 | 9:31 | ||||||||||
12 | 9:33 | ||||||||||
54 | 9:34 | ||||||||||
24 | 9:38 | ||||||||||
55 | 9:41 | ||||||||||
56 | 9:49 | ||||||||||
57 | 9:52 | ||||||||||
58 | 9:54 | ||||||||||
59 | 9:70 | spoke | 71 | 9:70 | |||||||
60 | 9:75 | ||||||||||
61 | 9:77 | ||||||||||
72 | 9:80 | ||||||||||
23 | 9:92 | ||||||||||
53 | 10:05 | ||||||||||
11 | 10:07 | ||||||||||
14 | 10:10 | 73 | 10:10 | ||||||||
62 | 10:11 | ||||||||||
63 | 10:20 | ; | 13 | 10:20 | |||||||
64 | 10:24 | 33 | 10:24 | ||||||||
65 | 10:28 | ||||||||||
39 | 10:39 | 14 | 10:32 | ||||||||
66 | 10:41 | 15 | 10:39 | ||||||||
18 | 10:41 | 74 | 10:41 | ||||||||
75 | 10:43 | ||||||||||
19 | 10:45 | ||||||||||
67 | 10:47 | ||||||||||
76 | 10:52 | ||||||||||
25 | 10:53 | ||||||||||
4 | 10:54 | ||||||||||
6 | 10:55 | 77 | 10:55 | ||||||||
54 | 10:60 | Maud | 12 | 10:60 | |||||||
55 | 10:62 | ||||||||||
56 | 10:62 | ||||||||||
57 | 10:66 | ||||||||||
26 | 10:68 | 37 | 10:68 | ||||||||
16 | 10:69 | ||||||||||
68 | 10:70 | ||||||||||
32 | 10:71 | 78 | 10:71 | ||||||||
17 | 10:75 | ||||||||||
69 | 10:78 | ||||||||||
70 | 11:12 | was | |||||||||
71 | 11:30 | my | 18 | 11:30 | |||||||
72 | 11:31 | 79 | 11:31 | ||||||||
80 | 11:33 | ||||||||||
73 | 11:60 | 81 | 11:60 | ||||||||
hateful | |||||||||||
19 | 11:76 | ||||||||||
74 | 11:90 | 82 | 11:90 | ||||||||
75 | 11:97 | ||||||||||
76 | 12:10 | nurse | 20 | 12:10 | |||||||
26 | 12:12 | 7 | 12:12 | ||||||||
77 | 12:27 | ||||||||||
30 | 12:29 | ||||||||||
17 | 12:40 | ||||||||||
30 | 12:47 | who | 83 | 12:47 | |||||||
78 | 12:60 | 29 | 12:60 | ||||||||
smelled | 25 | 12:63 | |||||||||
79 | 12:72 | ||||||||||
80 | 12:73 | ||||||||||
81 | 12:76 | 84 | 12:81 | ||||||||
85 | 12:81 | ||||||||||
21 | 12:92 | of | 86 | 12:92 | |||||||
58 | 13:04 | soap | 24 | 13:04 | |||||||
26 | 13:06 | ||||||||||
82 | 13:07 | ||||||||||
83 | 13:10 | 20 | 13:10 | ||||||||
87 | 13:12 | ||||||||||
84 | 13:23 | ||||||||||
85 | 13:30 | ||||||||||
86 | 13:52 | ||||||||||
87 | 13:55 | , | 88 | 13:55 | |||||||
27 | 13:56 | ||||||||||
31 | 13:61 | ||||||||||
24 | 13:69 | ||||||||||
89 | 13:71 | ||||||||||
88 | 13:72 | ||||||||||
15 | 13:72 | ||||||||||
16 | 13:74 | 90 | 13:74 | ||||||||
59 | 13:76 | ||||||||||
39 | 13:78 | ||||||||||
28 | 13:79 | ||||||||||
89 | 13:81 | ||||||||||
90 | 13:87 | and | 91 | 13:87 | |||||||
21 | 13:90 | ||||||||||
92 | 13:91 | ||||||||||
91 | 13:92 | ||||||||||
21 | 13:93 | ||||||||||
27 | 13:94 | ||||||||||
93 | 13:99 | ||||||||||
60 | 14:05 | 10 | 14:05 | ||||||||
forced | |||||||||||
9 | 14:07 | ||||||||||
94 | 14:09 | ||||||||||
92 | 14:23 | ||||||||||
93 | 14:50 | me | 22 | 14:50 | |||||||
4 | 14:58 | 95 | 14:58 | ||||||||
61 | 14:67 | ||||||||||
29 | 14:68 | to | 8 | 14:68 | |||||||
41 | 14:75 | ||||||||||
14:85 | eat | 35 | 14:85 | ||||||||
96 | 14:97 | ||||||||||
97 | 15:00 | ||||||||||
6 | 15:03 | ||||||||||
94 | 15:05 | chewy | 23 | 15:05 | |||||||
23 | 15:06 | ||||||||||
11 | 15:16 | 98 | 15:16 | ||||||||
27 | 15:22 | ||||||||||
95 | 15:25 | ||||||||||
24 | 15:30 | ||||||||||
21 | 15:34 | ||||||||||
96 | 15:35 | ||||||||||
23 | 15:47 | bits | 25 | 15:47 | |||||||
97 | 15:47 | ||||||||||
98 | 15:53 | ||||||||||
99 | 15:59 | ||||||||||
100 | 15:71 | ||||||||||
101 | 15:75 | of | 7 | 15:75 | |||||||
12 | 15:79 | ||||||||||
62 | 15:82 | ||||||||||
42 | 15:85 | ||||||||||
26 | 15:85 | ||||||||||
102 | 15:87 | 99 | 15:87 | ||||||||
10 | 15:87 | ||||||||||
1 | 15:93 | fish | 14 | 15:93 | |||||||
29 | 15:95 | 100 | 15:95 | ||||||||
5 | 16:03 | ||||||||||
63 | 16:13 | ||||||||||
30 | 16:19 | ||||||||||
103 | 16:22 | 101 | 16:22 | ||||||||
31 | 16:35 | , | 6 | 16:35 | |||||||
37 | 16:36 | ||||||||||
5 | 16:37 | 32 | 16:37 | ||||||||
104 | 16:52 | ||||||||||
43 | 16:56 | ||||||||||
44 | 16:61 | ||||||||||
23 | 16:65 | ||||||||||
105 | 16:72 | 102 | 16:72 | ||||||||
thrusting | |||||||||||
103 | 16:75 | ||||||||||
106 | 16:79 | 13 | 16:79 | ||||||||
64 | 16:80 | 104 | 16:80 | ||||||||
6 | 16:80 | ||||||||||
32 | 16:81 | ||||||||||
105 | 16:82 | ||||||||||
106 | 16:84 | ||||||||||
14 | 16:90 | ||||||||||
107 | 16:92 | ||||||||||
108 | 16:99 | ||||||||||
1 | 17:23 | ||||||||||
109 | 17:30 | me | 39 | 17:30 | |||||||
110 | 17:43 | back | 27 | 17:43 | |||||||
111 | 17:56 | 28 | 17:56 | ||||||||
35 | 17:65 | to | 29 | 17:65 | |||||||
112 | 17:66 | ||||||||||
113 | 17:82 | ||||||||||
65 | 17:85 | 36 | 17:85 | ||||||||
babyhood | |||||||||||
114 | 17:95 | ||||||||||
107 | 17:96 | ||||||||||
12 | 18:03 | ||||||||||
30 | 18:07 | ||||||||||
20 | 18:10 | ||||||||||
115 | 18:45 | , | |||||||||
116 | 18:45 | 31 | 18:45 | ||||||||
26 | 18:51 | ||||||||||
7 | 18:52 | ||||||||||
21 | 18:58 | ||||||||||
117 | 18:61 | ||||||||||
19 | 18:68 | ||||||||||
108 | 18:70 | ||||||||||
66 | 18:75 | with | 32 | 18:75 | |||||||
118 | 18:80 | ||||||||||
119 | 18:81 | ||||||||||
120 | 18:83 | ||||||||||
109 | 18:84 | ||||||||||
33 | 18:85 | ||||||||||
110 | 18:98 | ||||||||||
19:00 | 33 | 19:00 | |||||||||
threats | 33 | 19:00 | |||||||||
121 | 19:00 | ||||||||||
122 | 19:02 | 111 | 19:02 | ||||||||
123 | 19:02 | ||||||||||
112 | 19:16 | ||||||||||
113 | 19:28 | ||||||||||
45 | 19:37 | ||||||||||
8 | 19:45 | of | |||||||||
124 | 19:55 | 11 | 19:55 | ||||||||
nappies | |||||||||||
125 | 19:56 | ||||||||||
19 | 19:58 | ||||||||||
126 | 19:70 | 34 | 19:70 | ||||||||
127 | 19:75 | 22 | 19:75 | ||||||||
128 | 19:75 | ||||||||||
129 | 19:76 | ||||||||||
13 | 19:76 | ||||||||||
17 | 19:90 | ||||||||||
3 | 19:92 | ||||||||||
24 | 19:93 | ||||||||||
20:10 | , | 34 | 20:10 | ||||||||
114 | 20:19 | ||||||||||
130 | 20:30 | 32 | 20:30 | ||||||||
dummies | |||||||||||
131 | 20:42 | ||||||||||
4 | 20:44 | ||||||||||
9 | 20:45 | ||||||||||
39 | 20:47 | ||||||||||
2 | 20:49 | ||||||||||
3 | 20:49 | ||||||||||
115 | 20:50 | ||||||||||
132 | 20:64 | ||||||||||
20:75 | , | ||||||||||
36 | 20:80 | and | 19 | 20:80 | |||||||
133 | 20:83 | ||||||||||
2 | 20:89 | ||||||||||
134 | 20:93 | the | 116 | 20:93 | |||||||
135 | 21:05 | ||||||||||
34 | 21:10 | ||||||||||
67 | 21:20 | 117 | 21:20 | ||||||||
feeding | 14 | 21:20 | |||||||||
36 | 21:53 | bottle | 5 | 21:53 | |||||||
118 | 21:56 | ||||||||||
13 | 21:57 | ||||||||||
18 | 21:58 | ||||||||||
28 | 21:63 | ||||||||||
6 | 21:64 | ||||||||||
136 | 21:64 | ||||||||||
22 | 21:64 | ||||||||||
137 | 21:65 | ||||||||||
68 | 21:66 | ||||||||||
138 | 21:68 | ||||||||||
7 | 21:68 | ||||||||||
3 | 21:73 | ||||||||||
139 | 21:73 | 119 | 21:73 | ||||||||
140 | 21:74 | ||||||||||
141 | 21:77 | ||||||||||
69 | 21:78 | ||||||||||
120 | 21:80 | ||||||||||
4 | 21:83 | ||||||||||
In the following, I call the left-hand analysis A (the one with 5 ms as syntagmatic resolution) and the right-hand analysis B (with 20 ms as syntagmatic resolution).
First of all, the output is simplified by the restriction against counting repeated occurrences of a feature record, at least when no other records come inbetween. The two GRAVEs (ranked 33 and 13) in the beginning of A therefore count as only one GRAVE. The beginning of A then comes out as follows:
GRAVE-VOC-GRAVE-CONS-VOC-DIFF-VOC-CONS-VOC
going from time 0.00 to 1.15 (or, to 1.40 if we want). If we now want to make changes in this pattern (that is, letting the EVAL component work on it) in order to obtain a symmetric pattern in the NP "the sunlit weeks", we can do so in one of two ways, either of which involves disregarding the two first GRAVEs before the beginning of the NP:
1) We can insert a GRAVE inbetween 0.94 and 1.15
2) We can delete the GRAVE at 0.42.
Which is the optimal solution must be up to the EVAL component. Inspection of the overall structure is needed to see if heightening of the GRAVE threshold can insert a record between 0.94 and 1.15. It seems unlikely that, in the framework we have set up here, a lowering of the threshold can delete the GRAVE at 0.42, since it has RANK number 8 (out of roughly 40 ranked records). Consequently, it is possible that solution 1) can be obtained by variation of the threshold value, thereby avoiding violation, but it is, in the terrain we are into here, not possible to obtain 2) without violating a constraint. Consequently, a MIRROR-SYMMETRIC symbol covering the first NP can be obtained by violating the constraint in 2) or by violating the one in 1) or by varying the overall threshold value determining 1).
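The amount of violation needed to reach a mirror-symmetric string can be counted mechanically. The sketch below is an assumption of mine about how such a count could be made. It treats every insertion or deletion as one violation of equal weight and finds the minimum number by dynamic programming. It does not model the ranking of constraints or the threshold variation that EVAL would also have to weigh. The function name `mirror_violations` is hypothetical.

```python
def mirror_violations(seq):
    """Minimum number of single-record insertions or deletions that
    turns seq into a mirror-symmetric (palindromic) string."""
    n = len(seq)
    if n == 0:
        return 0
    # cost[i][j] holds the number of edits needed for seq[i..j].
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if seq[i] == seq[j]:
                cost[i][j] = cost[i + 1][j - 1] if length > 2 else 0
            else:
                cost[i][j] = 1 + min(cost[i + 1][j], cost[i][j - 1])
    return cost[0][n - 1]

# The beginning of A with the leading GRAVE disregarded.
np_string = "VOC GRAVE CONS VOC DIFF VOC CONS VOC".split()
print(mirror_violations(np_string))
```

For this string a single edit suffices, in agreement with the two alternatives above: either the GRAVE after the first VOC is deleted, or a matching GRAVE is inserted before the last VOC.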
Pursuing the same way of reasoning, we can obtain a mirror-symmetric pattern covering the whole of the first sentence "The sunlit weeks between were full of maids", going from 0.25 to 3.98 (we must, consequently, include the pause preceding the word 'Sarah'), by the following deletions in A:
1) Delete GRAVE at 0.42
2) Delete VOC at 1.79 (leaving only CONS there)
3) Delete CONS at 2.85
As appears from the parallel column in B, a different spread of the measurement points can have fairly radical effects. Even if the GRAVE at 0.42 in A has a rank close to zero and therefore hardly can be deleted by threshold variation, it is of course well possible that it is a sole occurrence of a low value in a surrounding ocean of high values, such as column B may indicate. In such a case, a slight displacement of the offset from the beginning of the file could solve the problem. These are also among the remedies which the EVAL component could measure.
For the segment 'Sarah' (4.00-4.65), deleting the two last records (GRAVE and CONS at 4.16 and 4.17) yields a mirror-symmetric pattern VOC-GRAVE-CONS-GRAVE-VOC-GRAVE-CONS-GRAVE-VOC.
'With orange wig (and)' (5.00-5.97/6.19) is symmetric as it is. The full epithet 'with orange wig and horsy teeth' from 5.00-7.03 can be made symmetric by deleting the CONS at 6.72 as well as the CONS at 6.87. The value at 6.72 has a ranking number above the upper limit 40, which means that it is a maximal value and therefore is sensitive to threshold variation if a finer paradigmatic resolution is introduced. The last value at 6.87 can be skipped by moving the boundary.
If the two adjacent values at 10.05 and 10.10 are juxtaposed (assuming that some displacement changes the order of them), we get a mirror pattern in the phrase 'scarcely spoke' in 9.05-10.13.
'Smelled of soap' 12.60-13.52 is symmetric.
The phrase 'me to eat chewy bits of fish' in 14.50-16.34 is symmetric as it is.
'Thrusting me back to babyhood' 16.72-18.44 is also symmetric if only we confine it to 16.72-18.09, that is, if we delete the last record in it.
The last phrase 'and the feeding bottle' becomes symmetric if we confine it to 20.80-21.73 (to the GRAVE) and delete VOC records 133 (at 20.83) and 137 (at 21.65).
So, most of the utterance can be segmented into mirror-symmetric chunks which also seem to coincide fairly well with nice phrase boundaries by a reasonably small amount of violations of the constraints.
Analysis B seems to provide some mirror patterns which are much longer. Most of the phrase 'were full of maids' plus trailing silence 2.39-3.99 is symmetric. 'Sarah' 4.00-4.64 is symmetric. Next there is a symmetric segment 4.65-5.11 (roughly the preposition 'with'), followed by 'orange', and finally the long segment 'wig and horsy teeth, was so bad-tempered that she scarcely spoke' from 5.70 to 10.24, which can be made mirror-symmetric by inserting a VOC inbetween CONS and GRAVE at 7.35/7.38. The segmentation suggests the childish interpretation that the maid had 1) an orange, and 2) wig and horsy teeth, and the 'wig and horsy teeth' was bad-tempered.
There are strong tendencies to a mirror-symmetry in B over the long segment from 10.24 to around 16.36. This is the segment 'Maud was my hateful nurse who smelled of soap, and forced me to eat chewy bits of fish'. The mid point around which the symmetry must be sought is in the GRAVE records at 13.93/13.94. Insertions can remedy the weaknesses, as can the following deletions: The VOC at 13.12 and 13.30, the DIFF at 14.05 and 14.07, and the DIFF at 14.68. Also, there is the DIFF element at 15.03/15.05 which should have been GRAVE or vice versa.
'Thrusting me back to babyhood with threats' in the interval 16.72-19.28 is symmetric as it is.
And so forth.
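The search for symmetry around a given mid point, as described for the long segment in analysis B, can be sketched as follows. Records are paired outward from the centre, and every pair with differing labels is reported as a candidate for emendation. The record list, its time stamps and the function name are hypothetical and serve only to illustrate the procedure; they do not reproduce the data file.

```python
def mirror_mismatches(records, centre):
    """Pair records outward from a centre and list the pairs whose labels differ.

    records: list of (time_in_seconds, label) tuples in temporal order.
    centre:  index of a single central record, or a (left, right) pair of
             indices when the mid point lies between two records.
    """
    if isinstance(centre, tuple):
        left, right = centre            # the two central records form the first pair
    else:
        left, right = centre - 1, centre + 1
    mismatches = []
    while left >= 0 and right < len(records):
        if records[left][1] != records[right][1]:
            mismatches.append((records[left], records[right]))
        left -= 1
        right += 1
    return mismatches


# Hypothetical records (illustration only); the mid point lies in two GRAVE records.
records = [
    (0.10, "VOC"),
    (0.20, "DIFF"),
    (0.30, "GRAVE"),
    (0.31, "GRAVE"),
    (0.40, "GRAVE"),
    (0.50, "VOC"),
]
print(mirror_mismatches(records, (2, 3)))
# [((0.2, 'DIFF'), (0.4, 'GRAVE'))]
```

A mismatching pair of this kind corresponds to the case where one element 'should have been' the other; the sketch only locates such pairs and does not decide which emendation to apply.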
The analyses can be pursued in more or less detail, and the segmentations can be carried out in various ways. There is no need to go into more detail here, as long as the EVAL emendations cannot be seen in a wider perspective of the overall architecture (which is a large project, of course), and as long as the further discrepancies between acoustic and articulatory/motoric features have not been incorporated. The analyses which I have carried out suggest that there is a natural tendency to mirror symmetry in phrasal segments of the utterances, which in general supports the hypothesis. It is also of some interest to observe that the model of analysis discussed here was arrived at through theoretical prediction only, and was not discovered through experimentation: I predicted, on the basis of Grover (1995) and (1997e), that a certain mirror symmetry pattern would be found under these conditions, and found it on the first attempt. However, the purpose of the present analysis has been mainly to show how the suggested grammar is supposed to work, and how it can conveniently be conceived of in terms of optimality theory (or something like that).
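To indicate how the choice between alternative segmentations might be conceived in optimality-theoretic terms, the following sketch selects the candidate whose violation profile is smallest under strict ranking of the constraints. The constraint names, their ranking and the candidate violation counts are all hypothetical; the sketch is a deterministic simplification and makes no claim about the actual constraint set or about how EVAL operates in the proposed grammar.

```python
# Hypothetical constraint ranking, highest-ranked first (names are illustrative only).
RANKING = ("NO_SUBSTITUTION", "NO_INSERTION", "NO_DELETION", "ALIGN_PHRASE")


def eval_optimal(candidates):
    """Return the name of the candidate with the best violation profile.

    candidates: dict mapping a candidate name to a dict of violation counts
    per constraint. Profiles are compared as tuples ordered by RANKING, so a
    single violation of a higher constraint outweighs any number of
    violations of lower ones (strict domination).
    """
    def profile(violations):
        return tuple(violations.get(constraint, 0) for constraint in RANKING)

    return min(candidates, key=lambda name: profile(candidates[name]))


# Hypothetical candidate segmentations with invented violation counts.
candidates = {
    "A": {"NO_DELETION": 2, "ALIGN_PHRASE": 1},
    "B": {"NO_INSERTION": 1},
    "C": {"NO_DELETION": 3},
}
print(eval_optimal(candidates))  # A
```

Under this assumed ranking, candidate A wins because it avoids insertion altogether and uses fewer deletions than C, even though it alone violates the lowest-ranked constraint.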
The chaotic behavior of the mirror patterns when the parameters change is a quite interesting property of this system. I don't know if it is possible to describe the mirror patterns as a function of the values in the high-dimensional parameter space (it looks difficult to me), but even if this were possible, there is the additional EVAL component which must be presumed to behave somewhat unpredictably here (otherwise the claim that the OT system is not rule-governed makes no sense). Taken together, this makes the behavior of the mirror patterns rather unpredictable from the input, which is a strong expression of the rooting of this system in the collective consciousness (Grover 1997e, 1997t, 1997p). It means that the new epistemological stuff predicted in Grover (1997e) (the new substance), which the collective consciousness imposes on the new arbitrary sentential (or phrasal) signs, is here directly incorporated into the grammatical system.
This, together with the high degree of economy (the simplicity), is what makes this model appealing for processing the new arbitrary sentential (phrasal) sign which signifies on the interface between the subjective mind and the collective consciousness.
Finally, the similarity with the parallel-readings discussed in Grover (1997t) suggests how this can be brought into a very general theory of narrative structure.
The dissertation was written while I was a research fellow at the University of Bergen in 1995-1998.
References:
Archangeli, D. & Langendoen, D.T., eds. (1997): Optimality theory. An overview. Blackwell, Oxford.
Chomsky, N. & Halle, M. (1968): The sound pattern of English. Harper & Row, New York.
Cook, N.D. (1986): The brain code. Methuen, London.
Greenberg, J. (1987): Language in the Americas. Stanford University Press, Stanford.
Grover, J. (1995): Submorphemic signification. PO Box 13867, London N4 2WB = The PhD dissertation Book 1
Grover, J. (1997e): Epistemes, language and information technology. PO Box 13867, London N4 2WB = The PhD dissertation Book 2
Grover, J. (1997p): A pilot study for a poetic science. PO Box 13867, London N4 2WB = The PhD dissertation Book 4
Grover, J. (1997t): The theatre of the heart. PO Box 13867, London N4 2WB = The PhD dissertation Book 3
Hume, E. & Odden, D. (1996): Reconsidering [consonantal]. Phonology 13: 345-376.
Hyman, L. (1975): Phonology. Theory and analysis. Holt, Rinehart and Winston, New York.
Jakobson, R., Fant, G. & Halle, M. (1952): Preliminaries to speech analysis. MIT Press, Cambridge, Mass.
Jakobson, R. & Halle, M. (1956): Fundamentals of language. Mouton, The Hague.
Lacan, J. (1949): The mirror-phase. Translated by Jean Roussel. In: New Left Review (1968), no.51, pp.71-77.
Ladefoged, P. (1975): A course in phonetics. Harcourt Brace Jovanovich, New York.
MARSEC = MAchine-Readable Spoken English Corpus
http://www.reading.ac.uk/AcaDepts/II/speechlab/marsec/
Vygotsky, L. (1962/1986): Thought and language. MIT Press, Cambridge, Mass.
© John Bjarne Grover
On the web 29 May 2022