Projects in Previous Years

1st Semester 2019/20: Language complexity: From part to whole

Instructors

Tom Lentz and Jan Hulstijn.

If you are interested in this project, please contact the instructors by email.

Registration is open until 17 December 2019, 13:00, through https://datanose.nl/#specialenrol, using the course code 5314RLCF6Y.

ECTS

6

Description

Language in a person’s mind/brain, as well as language as it can be perceived in spoken or written discourse, has the characteristics of a Complex Adaptive System (CAS).

This means that language production is probabilistic and exhibits inherent variance. The challenge for the study of individual differences in language proficiency is to empirically tease apart (a) the inherent, probabilistic, within-person variance (CAS) in every person’s language production from (b) differences in language production that reflect genuine differences in linguistic cognition between persons. Towards this goal, several researchers have tried to compute the syntactic complexity and lexical richness of language production (e.g., in written essays) using part-of-speech (POS) taggers, and then to compare essays written by different people (or by the same person at different points in time in the course of language development) on these measures. For Dutch, linguists have constructed a tool called T-Scan, which currently computes 420 (!) text features of syntactic complexity and lexical richness (visit https://webservices-lst.science.ru.nl and scroll down to T-Scan). We would like to explore the possibility of moving from measuring many different syntactic and lexical features separately to measuring the complexity (interconnectedness) of the system as a whole.[1]
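To make the feature-based approach concrete, here is a minimal sketch of how two such features might be computed from already POS-tagged text and compared across texts. It assumes the input is a list of (word, tag) pairs using Universal Dependencies tag names ("SCONJ", "PUNCT"); this format, the function names, and the choice of features (type-token ratio for lexical richness, subordinating conjunctions per word as a crude proxy for syntactic complexity) are illustrative assumptions, not T-Scan's interface or output. The sketch covers only the "separate features" step; measuring the interconnectedness of the system as a whole is the open question of the project.

```python
# Hypothetical input format: each text is a list of (word, POS tag) pairs,
# with Universal Dependencies tag names. This is an assumption for
# illustration; it is not T-Scan's output format.
Tagged = list[tuple[str, str]]


def type_token_ratio(text: Tagged) -> float:
    """Lexical richness: distinct word forms / total words (punctuation excluded)."""
    words = [w.lower() for w, tag in text if tag != "PUNCT"]
    return len(set(words)) / len(words) if words else 0.0


def subordination_rate(text: Tagged) -> float:
    """Crude syntactic-complexity proxy: subordinating conjunctions per word."""
    tags = [tag for _, tag in text if tag != "PUNCT"]
    if not tags:
        return 0.0
    return sum(1 for tag in tags if tag == "SCONJ") / len(tags)


def profile(text: Tagged) -> dict[str, float]:
    """Feature profile of one text; profiles of different texts can be compared."""
    return {"ttr": type_token_ratio(text), "sconj_rate": subordination_rate(text)}


if __name__ == "__main__":
    a = [("Ik", "PRON"), ("denk", "VERB"), ("dat", "SCONJ"),
         ("het", "PRON"), ("regent", "VERB"), (".", "PUNCT")]
    b = [("De", "DET"), ("kat", "NOUN"), ("ziet", "VERB"),
         ("de", "DET"), ("kat", "NOUN"), (".", "PUNCT")]
    print(profile(a))  # {'ttr': 1.0, 'sconj_rate': 0.2}
    print(profile(b))  # {'ttr': 0.6, 'sconj_rate': 0.0}
```

Note that such per-text scores, taken one at a time, cannot by themselves separate within-person variance from between-person differences; that would require, for example, several texts per person.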

Prerequisites

Basic knowledge of linguistics and of complexity science.

Assessment

To be discussed, e.g., an article, an essay, or an algorithm or computational tool.

[1] Two small corpora are available for this project. One corpus contains transcripts of language produced by 110 second-language learners of Dutch at two different levels of proficiency. The other consists of language produced by 98 adult native speakers of Dutch, differing in (a) level of education and profession and (b) age (18–76 years).