2nd Semester 2012/13: Collective Annotation of Linguistic Resources

Ulle Endriss and Raquel Fernández
If you are interested in this project, please contact the instructors by email.
Crowdsourcing provides new ways of cheaply and quickly gathering large amounts of information contributed by volunteers online. This method has revolutionised the collection of labelled data, in computational linguistics and elsewhere. However, creating annotated linguistic resources from crowdsourced data requires combining the judgements of a potentially large group of annotators. In a recent paper we put forward the idea of using principles of social choice theory (which has traditionally dealt with aggregating the preferences of individual voters in an election) to design new methods for aggregating the linguistic annotations provided by individuals into a single collective annotation.

At this point, this is only a nice idea. In this project we want to explore it further, by designing new concrete aggregation methods and testing them on a wide variety of real data from linguistic annotation tasks.
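
To make the aggregation problem concrete, here is a minimal sketch of the simplest baseline one might compare new methods against: a per-item plurality vote. It is not a method proposed in the project description; the function name `plurality_aggregate`, the nested-dictionary input format, and the alphabetical tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def plurality_aggregate(annotations):
    """Combine individual annotations into one collective annotation.

    `annotations` maps each annotator to a dict from item to label
    (a hypothetical format chosen for this sketch). Annotators need
    not label every item.
    """
    # Count, for every item, how often each label was chosen.
    votes = {}
    for labels in annotations.values():
        for item, label in labels.items():
            votes.setdefault(item, Counter())[label] += 1

    # Pick the most frequent label per item.
    collective = {}
    for item, counter in votes.items():
        top = max(counter.values())
        # Assumption: ties are broken alphabetically to keep the result deterministic.
        collective[item] = min(l for l, c in counter.items() if c == top)
    return collective


example = {
    "annotator1": {"s1": "yes", "s2": "no"},
    "annotator2": {"s1": "yes", "s2": "yes"},
    "annotator3": {"s1": "no", "s2": "no"},
}
print(plurality_aggregate(example))
```

A rule like this treats all annotators alike and every item independently; methods inspired by social choice theory could relax either assumption, for example by accounting for systematic biases of individual annotators.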

In the first week we will cover some of the background knowledge required for this project, regarding both computational linguistics and social choice theory. The rest of the project period will be devoted to original research: identifying useful data sets, defining interesting methods of aggregation, and testing the latter on the former. Ideally, we would like to end up with a jointly written paper that provides a thorough study of different aggregation methods for different types of annotated data, on the basis of which we can make recommendations to researchers in computational linguistics on which method to use under which circumstances.
This is an interdisciplinary project and no single person will have all the technical background required for its success. What we are looking for is someone with some background in (computational) linguistics, someone familiar with the basics of social choice theory, and someone with a certain amount of programming experience.
The following paper is the starting point for this project:

