A Study of Optimality Theory and the Human Sentence Processing Mechanism

Rajvinder Singh

Abstract: From a computational perspective, parsing is a very interesting phenomenon. All people do it quickly, and all people do it well. The history of cognitive science has been filled with attempts to explain the mechanisms that guide the human sentence processing mechanism (hf. HSPM). These have typically made underlying assumptions about other cognitive faculties, such as working memory and modularity. Our work will be no different. In what follows, we develop a theory of parsing that is grounded in the framework of optimality theory (hf. OT).

The goals of this work may be explicitly stated as follows. First, we hope to construct a theory of the HSPM that is both descriptive and explanatory. This will require us to use both rational analysis and empirical data in order to enumerate all and only those constraints that are involved in the functioning of the HSPM. Second, we want to implement these constraints in an OT system.

Before doing so, however, it is important to ask: why should we want to marry parsing theory with OT? There are two main reasons. First, the notion of 'optimality' has been assumed throughout the history of research on parsing. When faced with local ambiguities, listeners/readers must choose between a set of candidate structures, guided by interacting constraints. At an abstract level, this process of resolving syntactic ambiguities is almost identical to the process of determining grammaticality in optimality theoretic systems. At the very least, the structural similarity suggests that OT might prove useful in modelling observed parsing behaviour. Second, although OT's origins (and greatest successes) lie in the domain of phonology, it has recently undergone extensive expansion into the domains of syntax, semantics, and pragmatics. This expansion, coupled with its infamous narrowing of the competence-performance distinction, leads very naturally to the question: can OT encompass the domain of language processing as well? Part of our quest is to answer this question in the affirmative, thereby continuing the expansion of OT qua linguistic theory qua theory of the language faculty.

However, we are not the first to attempt to merge OT and the HSPM. Gibson and Broihier (1998, hf. GB) provide OT implementations of various prominent theories of parsing from the current literature. They translate three different constraint sets into OT systems. The first is the famous 'garden path theory of sentence processing', which consists of the constraints Minimal Attachment and Late Closure. The second contains constraints involving thematic role assignments and preferences to attach locally. The third set consists of constraints that indicate a preference for attachments that are local and near a predicate.[1] Unfortunately, it is found that none of the OT implementations is able to account for the data. GB use this to argue that 'standard OT', which is used here to mean OT with the property of strict domination,[2] is unable to accommodate observed parsing preferences. They claim that a weighted constraint theory that allows lower-ranked constraints to additively outweigh higher-ranked constraints would yield greater empirical coverage. In this thesis, we take issue with GB's conclusion by attempting to demonstrate the effectiveness of 'standard OT' in accounting for the experimental data.
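Since the contrast between strict domination and additive weighting recurs throughout this work, a minimal sketch may help fix ideas. The following Python fragment is illustrative only: the candidates and violation profiles are hypothetical (they mirror the worked X/Y example in note [2]), and it is not a reconstruction of GB's actual model.

```python
# A minimal sketch of OT evaluation under strict domination versus
# additive weighting. Candidates and violation counts are hypothetical,
# mirroring the X/Y example in note [2]; this is not GB's actual model.

candidates = {
    "X": {"C": 0, "D": 5},  # no violations of C, five of D
    "Y": {"C": 1, "D": 0},  # one violation of C, none of D
}

ranking = ["C", "D"]  # C >> D: C strictly dominates D

def optimal_strict(cands, ranking):
    """Strict domination: compare violation vectors lexicographically,
    highest-ranked constraint first, so no number of D-violations can
    outweigh a single C-violation."""
    return min(cands, key=lambda c: tuple(cands[c][k] for k in ranking))

def optimal_weighted(cands, weights):
    """Additive weighting: violations of lower-ranked constraints can
    sum to outweigh a violation of a higher-ranked constraint."""
    return min(cands, key=lambda c: sum(w * cands[c][k] for k, w in weights.items()))

print(optimal_strict(candidates, ranking))             # -> X
print(optimal_weighted(candidates, {"C": 3, "D": 1}))  # -> Y (cost 3 beats 5)
```

Under strict domination, X wins, exactly as in note [2]; under an additive scheme of the sort GB advocate, the five D-violations can gang up to overturn the ranking and select Y.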
To accomplish our task, it will be necessary to turn to the psycholinguistic literature to help guide the development of our system. We claim that the theories of the HSPM posed in GB are unsuccessful precisely because they fail to incorporate well-known psycholinguistic results into their theoretical formalism, not because standard OT is not up to the task. By incorporating the experimental results into our OT system, we hope to have a more psychologically plausible theoretical construction that improves current descriptions of the HSPM. Upon developing our system and analysing its adequacy in describing and explaining the data, we will come back to the issue of whether or not standard OT is able to account for the observed parsing preferences.

So, here is the overall structure of our story. We begin in Chapter 1 with a gloss of the most influential account of parsing to date, viz., the 'garden path theory of sentence processing'. Almost all of the research in language processing has been a response to the garden path theory (hf. GPT). Each response has served either to contribute to the GPT or to provide a critique of its axioms and theorems. Indeed, the psycholinguistic literature we examine is a debate on the status of the constraints making up the GPT. Thus, in order to appreciate the experimental results examined in this work, it will be necessary to have the GPT as the backdrop against which we interpret the psycholinguistic data. Hence, we begin with a brief overview of the GPT.

In Chapter 2 we examine a broad range of psycholinguistic results with two goals in mind. First, we will use the experimental data to test the adequacy of the GPT. Second, by analysing the GPT's successes and, more importantly, its failures, it is hoped that important insights will be revealed that will guide us to a more accurate theory of the HSPM. The experimental results reveal that, indeed, the GPT is flawed in fundamental respects. First, it includes constraints that are not 'doing anything', in that they do not seem to be involved in the determination of parsing preferences. Second, it is missing certain constraints that are necessary to capture the data. These results can be thought of as remarks on the 'soundness' and 'completeness' of the garden path theory.[3] Suppose G is a theory (constraint set) for a particular domain X (e.g., language processing). Suppose further that some subset of G may be expanded (by adding constraints) to the 'right' theory T of domain X.[4] We say that G is 'sound' if G ⊆ T. We say that G is 'complete' if G ⊇ T. According to these definitions, G will be sound only if it does not contain any superfluous or vacuous constraints. G will be 'complete' only if it includes all the constraints that are involved in some domain X. In Chapter 2, we will find that the GPT is not 'sound', because it includes constraints that are not reflective of the actual constraints guiding the HSPM. It is not 'complete', because it is missing constraints that are necessary to explain the computations of the HSPM.
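The set-theoretic content of these definitions can be made concrete with a small sketch. The constraint inventories below are hypothetical placeholders, not claims about the actual contents of the GPT or of the 'right' theory T.

```python
# 'Soundness' (G ⊆ T) and 'completeness' (G ⊇ T) as subset relations
# between constraint sets. The inventories are hypothetical placeholders.

def is_sound(G, T):
    return G <= T   # every constraint in G also belongs to T

def is_complete(G, T):
    return G >= T   # G contains every constraint in T

G = {"ConstraintA", "ConstraintB"}                   # a candidate theory
T = {"ConstraintB", "ConstraintC", "ConstraintD"}    # the stipulated 'right' theory

print(is_sound(G, T))     # False: ConstraintA is superfluous relative to T
print(is_complete(G, T))  # False: ConstraintC and ConstraintD are missing
```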
In Chapter 3, using these results as our guide, we implement the constraints that are involved in parsing preferences into a very natural OT system. The constructed system is a standard OT system whose constraints satisfy the property of strict domination. This is important, for, contra GB, we are attempting to demonstrate that standard OT is well suited to the task of predicting parsing data. The system we develop has many nice properties. First, it consists of a small set of constraints that carry both rational and empirical support. Second, the constraints are clearly motivated by the need to make the HSPM computationally efficient, which we consider to be its most pervasive feature. Third, not only does the theory incorporate well-known psycholinguistic results into its formalism, it makes very explicit the nature of the cognitive architecture that allows the observed psycholinguistic phenomena to take place at all. This takes our work further than most other theories of the HSPM, for they often have little to say about the architectures that allow the cognitive phenomena they describe to arise in the first place (but see Lewis 2000). By being precise about both the architectural and computational properties of the phenomenon under consideration, we are able to make clear, falsifiable predictions. Our theory is further constrained by the fact that architectural assumptions and computational assumptions mutually restrict each other; certain kinds of architectures rule out (and imply) certain kinds of computations, and certain kinds of computations rule out (and imply) certain kinds of architectures.

In Chapter 4, we test the adequacy of the system by comparing its predictions with the observed data. We demonstrate that it is able to capture a large array of experimental results, predicting observed parsing preferences in English and Spanish ambiguities. Additionally, the system is able to predict differences in the processing complexity of unambiguous sentences, where there are no 'preferences'. For example, it is well known that centre-embedded structures are more difficult to process than right- (and left-) branching structures. Our system is able to predict this difference in processing complexity. One of the factors contributing to the theory's descriptive power is that the notion of structural ambiguity becomes much more streamlined in our work. We illustrate that a set of structural ambiguities previously thought to be unrelated can in fact be reduced to a smaller set of ambiguity types. This result follows almost directly from our architectural assumptions. We demonstrate that there is actually a redundancy in the kinds of ambiguities faced by the HSPM, allowing it to repeatedly use general resolution methods rather than construction-specific mechanisms to resolve the ambiguities it comes across. This serves to add efficiency to the HSPM's computations and adds a touch of elegance to the theory being developed.

In Chapter 5, we offer some remarks on the impact of our work on broader issues in linguistic theory, theories of the HSPM, and cognitive science in general. These include, inter alia, discussions of topics such as the relation between the parser and the grammar, general cognitive architecture, language acquisition, and the adequacy of standard OT as a framework for language processing. Such discussions will also be found interspersed throughout the text, often to help motivate or justify various assumptions or conclusions that we make or draw.

Ultimately, we hope to have developed a sophisticated theory of the human sentence processing mechanism that is descriptively powerful, theoretically sound, and consistent with what we know about human psychology. Furthermore, by providing a successful translation of the theory of the HSPM into an OT constraint system, we hope to expand OT to encompass linguistic performance in addition to its coverage of linguistic competence.
As mentioned at the outset, the determination of parsing preferences and the determination of optimality in OT are remarkably similar in structure. Hence, the prospects for a successful merger between OT and language processing are a priori quite promising. This work examines the extent to which these prospects may be formally realised. Enough with the introductory remarks! A story is waiting to be told, and so to the garden paths we go.

Notes

[1] The first constraint set is outlined in Frazier, L. (1978). On comprehending sentences: Syntactic parsing strategies. Ph.D. dissertation, University of Connecticut. The second is outlined in Pritchett, B. (1988). Garden path phenomena and the grammatical basis of language processing. Language 64: 539-576. The third is outlined in Gibson, E., N. Pearlmutter, E. Canseco-Gonzales, and G. Hickok. (1996). Recency preference in the human sentence processing mechanism. Cognition 59: 23-59.

[2] The property of strict domination says that, for any two constraints C and D such that C is ranked higher than D (notationally represented as C >> D), no number of violations of constraint D is as destructive as a single violation of constraint C. For example, if two candidate representations X and Y are such that X incurs no violations of C and five violations of D, whereas Y incurs one violation of C and none of D, and neither X nor Y violates any other constraint, then X is more optimal than Y. This is also represented as X >> Y.

[3] According to standard usage, grammars are theories, and hence are subject to adequacy conditions. Here we propose 'soundness' and 'completeness' as two such conditions. I use scare quotes because these terms are also used in the logical literature as remarks on logical theories. Although there are clear parallels between the way the terms are used in that well-established tradition and the way they are used here, I do not want to conflate the two. Hence, as a sign of respect to the logical tradition, the scare quotes highlight that these terms are not new, are borrowed from a rich tradition, and have been modified in their usage here.

[4] Of course, this abstracts away from the fact that, for any given data set, there is an unbounded number of theories that can correctly describe it. The discussion here assumes that we are 'within' a Kuhnian paradigm, and that, within this paradigm, there is some 'right' theory T that G may or may not approximate. The 'soundness' and 'completeness' of G are measures of how well it approximates T. This will not please philosophers, but we do not import any realist assumptions here. We are simply making terminological definitions so that we may discuss the 'goodness' of theories in the processing literature, such as the GPT.