2nd Semester 2025/26: Teaching Synthetic Personae to Reason about Political Stimuli like Humans
- Instructors
- Roberto Cerina
- ECTS
- 6
- Description
Large language models and recent work on synthetic polling raise the question of whether open-source models can be fine-tuned to represent temporally evolving synthetic voters. This project investigates that question using longitudinal tweet histories, dated news data, and repeated interview transcripts with Dutch participants. The aim is to build a model that, conditioned on demographic information, prior history, date, and contemporaneous news, can generate plausible out-of-sample opinions or short responses for synthetic personae.
Students will compare several open-source language models before fine-tuning, design a suitable representation of persona state over time, optimize the final training format, and explore whether data augmentation improves generalization. The central technical objective is to fine-tune a shared model using managed training infrastructure, with a particular focus on temporal coherence, responsiveness to news, and the believability of generated responses.
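To make the training-format question concrete, the sketch below shows one plausible shape for a single supervised example. Every field name and the prompt template are illustrative assumptions, not the schema the project will settle on.

```python
import json
from datetime import date

# Illustrative only: one way a persona-conditioned training example could be
# laid out. Field names and the prompt template are assumptions, not the
# project's final format.
record = {
    "persona_id": "p-0042",  # pseudonymous identifier
    "demographics": {"age_band": "35-44", "region": "Noord-Holland"},
    "history_summary": (
        "Tweets 2022-2024: sceptical of coalition politics, pro-climate "
        "policy, more active around election debates."
    ),
    "date": date(2025, 3, 14).isoformat(),  # the 'as of' date for the persona
    "news": "Cabinet announces revised nitrogen targets.",
    "target": "Feels the revision is overdue but doubts enforcement.",
}

# Rendered into a prompt/completion pair for supervised fine-tuning.
prompt = (
    f"Persona {record['persona_id']} ({record['demographics']['age_band']}, "
    f"{record['demographics']['region']}). History: {record['history_summary']} "
    f"Date: {record['date']}. News: {record['news']}\nResponse:"
)
completion = " " + record["target"]
print(json.dumps({"prompt": prompt, "completion": completion}, indent=2))
```

Whether persona history enters as a running summary (as here), as raw recent tweets, or via retrieval is exactly the representation question the project must investigate.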
The project is designed as a coordinated team effort. Different student groups will work on base-model evaluation, data representation, fine-tuning, and final evaluation, but all groups contribute to a single final model and a shared experimental analysis.
The project also asks students to reflect critically on what such models can and cannot capture about real voters, especially when the available data are social-media traces and interview material rather than direct access to beliefs themselves.
- Organisation
Up to 12 Master of Logic students can participate.
We work together as a single team, but students are divided into groups based on tasks. Below are some indicative group tasks / goals, but these can be expanded / changed depending on the project's needs.
1. Base-model evaluation group
Compares several candidate open-source LLMs before fine-tuning. Designs the pre-fine-tuning benchmark and identifies which base model gives the most believable and temporally coherent outputs, including on Dutch or multilingual material where relevant.
2. Data representation and augmentation group
Builds the final data format for tweets, dated news, demographics, and interview transcripts. Reviews and substitutes or conceals personally identifiable information prior to training (a first-pass redaction sketch appears after this list). Designs train/dev/test splits, studies whether summarization or retrieval should be used, and explores data augmentation where appropriate.
3. Fine-tuning and infrastructure group
Sets up the actual training runs, manages prompts and schemas for the final dataset, runs LoRA or other parameter-efficient fine-tuning experiments (see the illustrative sketch after this list), and tracks checkpoints and experimental results using Tinker.
4. Evaluation and analysis group
Designs held-out evaluation, including realism, temporal consistency, persona fidelity, and response-to-news sensitivity. Performs error analysis and helps determine which checkpoint should count as the final model.
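For the redaction step in group 2's task, a first-pass sweep over the text data might look like the following. This is a minimal sketch using regular expressions; the patterns and placeholder tokens are assumptions, and real redaction of tweets and interview transcripts would also need named-entity tools and manual review.

```python
import re

# Minimal first-pass PII masking: emails, @-handles, and Dutch-style phone
# numbers are replaced with placeholder tokens. The patterns are illustrative
# assumptions and deliberately narrow; they will not catch names, addresses,
# or indirect identifiers.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"@\w{1,15}"), "<HANDLE>"),           # Twitter-style handle
    (re.compile(r"(\+31|0)[\s-]?\d[\s\d-]{7,}"), "<PHONE>"),
]

def mask_pii(text: str) -> str:
    """Replace obvious PII spans with placeholders, leaving the rest intact."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(mask_pii("Mail jan.devries@example.nl or call 06-12345678, says @jan_dv"))
# -> Mail <EMAIL> or call <PHONE>, says <HANDLE>
```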
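And for group 3, the skeleton below shows the general shape of a parameter-efficient LoRA run using the Hugging Face peft library. It illustrates the technique only: the project's actual runs go through Tinker's managed infrastructure, and the base model (a small GPT-2 stand-in here), hyperparameters, and toy dataset are placeholder assumptions.

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "gpt2"  # stand-in only; group 1's benchmark picks the real base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Attach low-rank adapters; only these small matrices are trained.
lora = LoraConfig(
    r=16,                        # adapter rank: capacity vs. parameter count
    lora_alpha=32,               # scaling of the adapter update
    lora_dropout=0.05,
    target_modules=["c_attn"],   # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights

# Toy in-memory dataset of rendered prompt/completion strings.
texts = [
    "Persona p-0042 (35-44, Noord-Holland). Date: 2025-03-14. News: revised "
    "nitrogen targets.\nResponse: Overdue, but doubts enforcement."
]
ds = Dataset.from_list([{"text": t} for t in texts]).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="checkpoints/persona-lora",  # checkpoints for evaluation
        per_device_train_batch_size=1,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```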
Weekly structure
We meet twice a week for 1 to 2 hours in room 3.40, JK building, Roeterseilandcampus, starting Tuesday June 2nd. We meet on Tuesdays and Fridays at 11am. The project's last meeting is Tuesday June 30th. Students can work in the room or in the Computational Social Science common space (3rd floor of the JK building), which has purpose-built rooms for "project-based learning". Students are expected to work full time on the project during those 4 weeks. Remote attendance at meetings is typically not possible.
- Prerequisites
- No prior background in polling or social science is required.
- Each group needs some students with basic Python experience.
- Familiarity with machine learning, large language models, data analysis, or experimental evaluation is a plus.
- Reading knowledge of Dutch is helpful, though not mandatory, given that part of the material includes interviews with Dutch participants.
- Assessment
The coordinated project is graded PASS/FAIL. A pass will be awarded if students:
1. Deliver a functioning fine-tuned model and training pipeline.
The final system must:
- include a documented comparison of several base open-source models before fine-tuning;
- produce at least one completed fine-tuning run on the agreed dataset;
- yield a final checkpoint or runnable inference setup that can be inspected by the instructor;
- be reproducible enough that the training and evaluation pipeline can be understood and rerun.
2. Provide clear documentation of the modelling and evaluation design.
The written report should:
- define the final data representation for persona history, demographics, date, news, and interview material;
- explain the rationale for the chosen base model and fine-tuning strategy;
- describe any augmentation or preprocessing steps;
- explain the held-out evaluation protocol and the criteria for believability, temporal consistency, and persona fidelity;
- discuss limitations, risks, and possible failure modes.
3. Give a coherent final presentation and demonstration.
Students will jointly present:
- the overall project architecture;
- the pre-fine-tuning benchmark and why a particular base model was chosen;
- the final fine-tuned model and representative outputs;
- a critical discussion of what the model captures, what it misses, and what ethical constraints should apply.
A well-documented negative or mixed result is acceptable, provided the project delivers a serious pipeline, evaluation, and critical analysis.
- References
Cerina, R., & Duch, R. M. (2026). Artificially intelligent opinion polling. Royal Society Open Science, 13(3), 251150.
Sun, C., Li, J., Chan, H. P., Zhai, C., & Ji, H. (2023, July). Measuring the effect of influential messages on varying personas. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 554-562).
Yang, F., Dragut, E., & Mukherjee, A. (2020, December). Predicting personal opinion on future events with fingerprints. In Proceedings of the 28th International Conference on Computational Linguistics (pp. 1802-1807).
Liu, D., Wu, Z., Song, D., & Huang, H. Y. (2025, July). A Persona-Aware LLM-Enhanced Framework for Multi-Session Personalized Dialogue Generation. In Findings of the Association for Computational Linguistics: ACL 2025 (pp. 103-123).
Yang, W., Li, Y., Fang, M., & Chen, L. (2025, April). MTPChat: A multimodal time-aware persona dataset for conversational agents. In Findings of the Association for Computational Linguistics: NAACL 2025 (pp. 5815-5826).
Jin, X., Zhang, D., Zhu, H., Xiao, W., Li, S. W., Wei, X., ... & Ren, X. (2022, July). Lifelong pretraining: Continually adapting language models to emerging corpora. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 4764-4780).