A Study of Optimality Theory and the Human Sentence Processing Mechanism

Rajvinder Singh

Abstract: From a computational perspective, parsing is a very interesting phenomenon. All people do it quickly, and all people do it well. The history of cognitive science has been filled with attempts to explain the mechanisms that guide the human sentence processing mechanism (hf. HSPM). These have typically made underlying assumptions about other cognitive faculties, such as working memory and modularity. Our work will be no different. In what follows, we develop a theory of parsing that is grounded in the framework of optimality theory (hf. OT).

The goals of this work may be explicitly stated as follows. First, we hope to construct a theory of the HSPM that is both descriptive and explanatory. This will require us to use both rational analysis and empirical data in order to enumerate all and only those constraints that are involved in the functioning of the HSPM. Second, we want to implement these constraints in an OT system.

Before doing so, however, it is important to ask: why should we want to marry parsing theory with OT? There are two main reasons. First, the notion of 'optimality' has been assumed throughout the history of research on parsing. When faced with local ambiguities, listeners/readers must choose between a set of candidate structures, guided by interacting constraints. At an abstract level, this process of resolving syntactic ambiguities is almost identical to the process of determining grammaticality in optimality theoretic systems. At the very least, the structural similarity suggests that OT might prove useful in modelling observed parsing behaviour. Second, although OT's origins (and greatest successes) lie in the domain of phonology, it has recently undergone extensive expansion into the domains of syntax, semantics, and pragmatics. This expansion, coupled with its infamous narrowing of the competence-performance distinction, leads very naturally to the question: can OT encompass the domain of language processing as well? Part of our quest is to answer this question in the affirmative, thereby continuing the expansion of OT qua linguistic theory qua theory of the language faculty.

However, we are not the first to attempt to merge OT and the HSPM. Gibson and Broihier (1998, hf. GB) provide OT implementations of various prominent theories of parsing from the current literature. They translate three different constraint sets into OT systems. The first is the famous 'garden path theory of sentence processing', which consists of the constraints Minimal Attachment and Late Closure. The second contains constraints involving thematic role assignments and preferences to attach locally. The third set consists of constraints that indicate a preference for attachments that are local and near a predicate.[1] Unfortunately, it is found that none of the OT implementations is able to account for the data. GB use this to argue that 'standard OT', which is used here to mean OT with the property of strict domination,[2] is unable to accommodate observed parsing preferences. They claim that a weighted constraint theory that allows lower-ranked constraints to additively outweigh higher-ranked constraints would yield greater empirical coverage. In this thesis, we take issue with GB's conclusion by attempting to demonstrate the effectiveness of 'standard OT' in accounting for the experimental data.
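Since the contrast between strict domination and additive weighting recurs throughout this work, a minimal sketch may help fix ideas. The following Python fragment is illustrative only: the candidates and violation profiles are hypothetical (they mirror the worked X/Y example in note [2]), and it is not a reconstruction of GB's actual model.

```python
# A minimal sketch of OT evaluation under strict domination versus
# additive weighting. Candidates and violation counts are hypothetical,
# mirroring the X/Y example in note [2]; this is not GB's actual model.

candidates = {
    "X": {"C": 0, "D": 5},  # no violations of C, five of D
    "Y": {"C": 1, "D": 0},  # one violation of C, none of D
}

ranking = ["C", "D"]  # C >> D: C strictly dominates D

def optimal_strict(cands, ranking):
    """Strict domination: compare violation vectors lexicographically,
    highest-ranked constraint first, so no number of D-violations can
    outweigh a single C-violation."""
    return min(cands, key=lambda c: tuple(cands[c][k] for k in ranking))

def optimal_weighted(cands, weights):
    """Additive weighting: violations of lower-ranked constraints can
    sum to outweigh a violation of a higher-ranked constraint."""
    return min(cands, key=lambda c: sum(w * cands[c][k] for k, w in weights.items()))

print(optimal_strict(candidates, ranking))             # -> X
print(optimal_weighted(candidates, {"C": 3, "D": 1}))  # -> Y (cost 3 beats 5)
```

Under strict domination, X wins, exactly as in note [2]; under an additive scheme of the sort GB advocate, the five D-violations can gang up to overturn the ranking and select Y.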
To accomplish our task, it will be necessary to turn to the psycholinguistic literature to help guide the development of our system. We claim that the theories of the HSPM posed in GB are unsuccessful precisely because they fail to incorporate well-known psycholinguistic results into their theoretical formalism, not because standard OT is not up to the task. By incorporating the experimental results into our OT system, we hope to have a more psychologically plausible theoretical construction that improves current descriptions of the HSPM. Upon developing our system and analysing its adequacy in describing and explaining the data, we will come back to the issue of whether or not standard OT is able to account for the observed parsing preferences.

So, here is the overall structure of our story. We begin in Chapter 1 with a gloss of the most influential account of parsing to date, viz., the 'garden path theory of sentence processing'. Almost all of the research in language processing has been a response to the garden path theory (hf. GPT). Each response has served either to contribute to the GPT or to provide a critique of its axioms and theorems. Indeed, the psycholinguistic literature we examine is a debate on the status of the constraints making up the GPT. Thus, in order to appreciate the experimental results examined in this work, it will be necessary to have the GPT as the backdrop against which we interpret the psycholinguistic data. Hence, we begin with a brief overview of the GPT.

In Chapter 2 we examine a broad range of psycholinguistic results with two goals in mind. First, we will use the experimental data to test the adequacy of the GPT. Second, by analysing the GPT's successes and, more importantly, its failures, it is hoped that important insights will be revealed that will guide us to a more accurate theory of the HSPM. The experimental results reveal that, indeed, the GPT is flawed in fundamental respects. First, it includes constraints that are not 'doing anything', in that they do not seem to be involved in the determination of parsing preferences. Second, it is missing certain constraints that are necessary to capture the data. These results can be thought of as remarks on the 'soundness' and 'completeness' of the garden path theory.[3] Suppose G is a theory (constraint set) for a particular domain X (e.g., language processing). Suppose further that some subset of G may be expanded (by adding constraints) to the 'right' theory T of domain X.[4] We say that G is 'sound' if G ⊆ T. We say that G is 'complete' if G ⊇ T. According to these definitions, G will be sound only if it does not contain any superfluous or vacuous constraints. G will be 'complete' only if it includes all the constraints that are involved in some domain X. In Chapter 2, we will find that the GPT is not 'sound', because it includes constraints that are not reflective of the actual constraints guiding the HSPM. It is not 'complete', because it is missing constraints that are necessary to explain the computations of the HSPM.
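The set-theoretic content of these definitions can be made concrete with a small sketch. The constraint inventories below are hypothetical placeholders, not claims about the actual contents of the GPT or of the 'right' theory T.

```python
# 'Soundness' (G ⊆ T) and 'completeness' (G ⊇ T) as subset relations
# between constraint sets. The inventories are hypothetical placeholders.

def is_sound(G, T):
    return G <= T   # every constraint in G also belongs to T

def is_complete(G, T):
    return G >= T   # G contains every constraint in T

G = {"ConstraintA", "ConstraintB"}                   # a candidate theory
T = {"ConstraintB", "ConstraintC", "ConstraintD"}    # the stipulated 'right' theory

print(is_sound(G, T))     # False: ConstraintA is superfluous relative to T
print(is_complete(G, T))  # False: ConstraintC and ConstraintD are missing
```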
In Chapter 3, using these results as our guide, we implement the constraints that are involved in parsing preferences into a very natural OT system. The constructed system is a standard OT system whose constraints satisfy the property of strict domination. This is important, for, contra GB, we are attempting to demonstrate that standard OT is well suited to the task of predicting parsing data. The system we develop has many nice properties. First, it consists of a small set of constraints that carry both rational and empirical support. Second, the constraints are clearly motivated by the need to make the HSPM computationally efficient, which we consider to be its most pervasive feature. Third, not only does the theory incorporate well-known psycholinguistic results into its formalism, it makes very explicit the nature of the cognitive architecture that allows the observed psycholinguistic phenomena to take place at all. This takes our work further than most other theories of the HSPM, for they often have little to say about the architectures that allow the cognitive phenomena they describe to arise in the first place (but see Lewis 2000). By being precise about both the architectural and computational properties of the phenomenon under consideration, we are able to make clear, falsifiable predictions. Our theory is further constrained by the fact that architectural assumptions and computational assumptions mutually restrict each other; certain kinds of architectures rule out (and imply) certain kinds of computations, and certain kinds of computations rule out (and imply) certain kinds of architectures.

In Chapter 4, we test the adequacy of the system by comparing its predictions with the observed data. We demonstrate that it is able to capture a large array of experimental results, predicting observed parsing preferences in English and Spanish ambiguities. Additionally, the system is able to predict differences in the processing complexity of unambiguous sentences, where there are no 'preferences'. For example, it is well known that centre-embedded structures are more difficult to process than right- (and left-) branching structures. Our system is able to predict this difference in processing complexity. One of the factors contributing to the theory's descriptive power is that the notion of structural ambiguity becomes much more streamlined in our work. We illustrate that a set of structural ambiguities previously thought to be unrelated can in fact be reduced to a smaller set of ambiguity types. This result follows almost directly from our architectural assumptions. We demonstrate that there is actually a redundancy in the kinds of ambiguities faced by the HSPM, allowing it to repeatedly use general resolution methods rather than construction-specific mechanisms to resolve the ambiguities it comes across. This serves to add efficiency to the HSPM's computations and adds a touch of elegance to the theory being developed.

In Chapter 5, we offer some remarks on the impact of our work on broader issues in linguistic theory, theories of the HSPM, and cognitive science in general. These include, inter alia, discussions of topics such as the relation between the parser and the grammar, general cognitive architecture, language acquisition, and the adequacy of standard OT as a framework for language processing. Such discussions will also be found interspersed throughout the text, often to help motivate or justify various assumptions or conclusions that we make or draw.

Ultimately, we hope to have developed a sophisticated theory of the human sentence processing mechanism that is descriptively powerful, theoretically sound, and consistent with what we know about human psychology. Furthermore, by providing a successful translation of the theory of the HSPM into an OT constraint system, we hope to expand OT to encompass linguistic performance in addition to its coverage of linguistic competence.
As mentioned at the outset, the determination of parsing preferences and the determination of optimality in OT are remarkably similar in structure. Hence, the prospects for a successful merger between OT and language processing are a priori quite promising. This work examines the extent to which these prospects may be formally realised. Enough with the introductory remarks! A story is waiting to be told, and so to the garden paths we go.

Notes

[1] The first constraint set is outlined in Frazier, L. (1978). On comprehending sentences: Syntactic parsing strategies. Ph.D. dissertation, University of Connecticut. The second is outlined in Pritchett, B. (1988). Garden path phenomena and the grammatical basis of language processing. Language 64: 539-576. The third is outlined in Gibson, E., N. Pearlmutter, E. Canseco-Gonzales, and G. Hickok. (1996). Recency preference in the human sentence processing mechanism. Cognition 59: 23-59.

[2] The property of strict domination says that, for any two constraints C and D such that C is ranked higher than D (notationally represented as C >> D), no number of violations of constraint D is as destructive as a single violation of constraint C. For example, if two candidate representations X and Y are such that X incurs no violations of C and five violations of D, whereas Y incurs one violation of C and none of D, and neither X nor Y violates any other constraint, then X is more optimal than Y. This is also represented as X >> Y.

[3] According to standard usage, grammars are theories, and hence are subject to adequacy conditions. Here we propose 'soundness' and 'completeness' as two such conditions. I use scare quotes because these terms are also used in the logical literature as remarks on logical theories. Although there are clear parallels between the way the terms are used in that well-established tradition and the way they are used here, I do not want to conflate the two. Hence, as a sign of respect to the logical tradition, the scare quotes highlight that these terms are not new, are borrowed from a rich tradition, and have been modified in their usage here.

[4] Of course, this abstracts away from the fact that, for any given data set, there is an unbounded number of theories that can correctly describe it. The discussion here assumes that we are 'within' a Kuhnian paradigm, and that, within this paradigm, there is some 'right' theory T that G may or may not approximate. The 'soundness' and 'completeness' of G are measures of how well it approximates T. This will not please philosophers, but we do not import any realist assumptions here. We are simply making terminological definitions so that we may discuss the 'goodness' of theories in the processing literature, such as the GPT.