Projects

1st Semester 2025/26: Building and Surveying Synthetic Voters with LLMs

Instructors
Roberto Cerina
ECTS
6
Description

Large language models and recent work on “silicon sampling” suggest that, under some conditions, synthetic agents can be used as stand‑ins for human respondents in public opinion research. This project focuses on developing the logical and epistemic underpinnings of such synthetic polling and on building a working prototype of an AI pollster.

Students will design a simple formal representation for synthetic polling personae (including demographic traits and attitude predicates), use a language model to answer polling questions on their behalf, and aggregate those answers into opinion estimates with explicit uncertainty. At the same time, they will implement an “ask‑the‑persona” interface: a conversational front‑end where users can query individual personae about the reasons behind their answers.
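
As a rough illustration of what "opinion estimates with explicit uncertainty" could mean, the sketch below computes a post-stratified share of "yes" answers with a normal-approximation interval. It is a minimal sketch under stated assumptions, not the method the project prescribes: the function name, the stratum labels, the population shares, and the choice of a 95% normal interval are all hypothetical. The interval reflects only sampling variability among personae; it says nothing about bias introduced by the language model itself.

```python
import math
from collections import defaultdict


def poststratified_share(answers, population_shares, z=1.96):
    """Estimate the population share answering 'yes'.

    answers: list of (stratum, said_yes) pairs, one per synthetic persona.
    population_shares: dict mapping stratum -> population share (sums to 1).
    Returns (estimate, lower, upper) using a normal approximation.
    All names and the interval method are illustrative assumptions.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [yes count, total count]
    for stratum, said_yes in answers:
        counts[stratum][0] += int(said_yes)
        counts[stratum][1] += 1

    estimate, variance = 0.0, 0.0
    for stratum, weight in population_shares.items():
        yes, n = counts[stratum]
        if n == 0:
            raise ValueError(f"no personae sampled for stratum {stratum!r}")
        p = yes / n
        estimate += weight * p
        # Variance of a weighted sum of independent stratum proportions.
        variance += weight ** 2 * p * (1 - p) / n

    half_width = z * math.sqrt(variance)
    return estimate, max(0.0, estimate - half_width), min(1.0, estimate + half_width)


# Hypothetical toy data: two age strata with four personae each.
toy_answers = [("young", True), ("young", True), ("young", True), ("young", False),
               ("old", True), ("old", False), ("old", False), ("old", False)]
toy_shares = {"young": 0.4, "old": 0.6}
estimate, lower, upper = poststratified_share(toy_answers, toy_shares)
```

With so few personae per stratum the interval is very wide, which is itself a useful thing to show alongside a headline estimate.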

Participants will be divided into two collaborating groups:

  1. Synthetic persona group
    • specifies a small logical language for persona profiles and constraints;
    • implements a constrained sampler that generates persona profiles;
    • connects these profiles to a language model to answer polling questions.
  2. Ask‑the‑persona group
    • designs a chat interface that conditions on persona profiles;
    • formulates logical/semantic constraints on acceptable answers (faithfulness to the profile, no outright contradictions, explicit hedging under uncertainty);
    • exposes basic uncertainty information and an audit trail to the user.
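
To make the persona group's tasks more concrete, the following sketch shows one possible shape for a persona profile, a handful of constraints written as predicates, and a rejection sampler that only returns profiles satisfying all of them. Everything here is an assumption for illustration: the trait names, the attitude predicates, the specific constraints, and the prompt wording are hypothetical, and a real design might use a constraint solver or population-calibrated marginals instead of uniform draws.

```python
import random
from dataclasses import dataclass

# Hypothetical trait domains and attitude predicates.
EDUCATION = ["primary", "secondary", "bachelor", "postgraduate"]
EMPLOYMENT = ["student", "employed", "unemployed", "retired"]
ATTITUDES = ["supports_tax_cut", "opposes_tax_cut", "trusts_parliament"]


@dataclass(frozen=True)
class Persona:
    age: int
    education: str
    employment: str
    attitudes: frozenset


# Each constraint is a predicate that must hold for an admissible persona.
CONSTRAINTS = [
    lambda p: not (p.age < 22 and p.education == "postgraduate"),
    lambda p: not (p.employment == "retired" and p.age < 50),
    # No persona may hold two mutually exclusive attitudes.
    lambda p: not {"supports_tax_cut", "opposes_tax_cut"} <= p.attitudes,
]


def sample_persona(rng, max_tries=1000):
    """Rejection sampling: draw uniformly, keep the first admissible profile."""
    for _ in range(max_tries):
        candidate = Persona(
            age=rng.randint(18, 90),
            education=rng.choice(EDUCATION),
            employment=rng.choice(EMPLOYMENT),
            attitudes=frozenset(a for a in ATTITUDES if rng.random() < 0.5),
        )
        if all(check(candidate) for check in CONSTRAINTS):
            return candidate
    raise RuntimeError("no admissible persona found; constraints may be too strict")


def persona_prompt(persona, question):
    """Render a profile and a polling question as plain text for a language model."""
    attitudes = ", ".join(sorted(persona.attitudes)) or "none stated"
    return (
        f"You are a {persona.age}-year-old respondent; education: {persona.education}; "
        f"employment: {persona.employment}; attitudes: {attitudes}.\n"
        f"Question: {question}\nAnswer with 'yes' or 'no'."
    )


rng = random.Random(0)
personae = [sample_persona(rng) for _ in range(200)]
```

The same predicates could later be reused by the ask-the-persona group to check that a chat answer does not contradict the profile it is conditioned on.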

The two groups jointly produce a single, minimal web application that integrates both components: it should display headline polling estimates and allow interactive conversations with selected personae that make up the poll. Throughout, students are expected to make their logical, semantic and probabilistic assumptions explicit and to reflect on what kind of evidence such a synthetic pollster can, and cannot, provide.

Organisation

Up to 10 Master of Logic students can participate.

We have a physical space where we will work together at least twice per week: room JK3.02, Roeterseiland Campus.

Students will work in two fixed groups of roughly equal size:

  • a persona / polling backend group;
  • an ask‑the‑persona interface group.

The project runs over four weeks (standard 6 EC workload, ~168 hours):

  • Week 1 – Common foundations.
    Introductory meetings on public‑opinion polling, stratification, and recent work on silicon sampling. We fix shared data formats (e.g. a JSON schema for persona profiles), interfaces between the two groups, and a minimal list of polling questions. Each group writes a short design note outlining its formal framework and technical plan.
  • Weeks 2–3 – Group work with regular integration.
    Each group works semi‑independently on its component (representation and sampling vs. dialogue and interface), with at least two joint meetings per week to synchronise interfaces, discuss logical choices, and test partial integrations. Students with stronger programming skills can focus more on implementation; others can focus more on formal specification, semantics, and evaluation design.
  • Week 4 – Integration, testing, and presentation.
    The groups are expected to integrate their components into a single runnable web application (local or simple server deployment), test it on a small set of example questions, and prepare:
    • a short joint report (approx. 8–10 pages) explaining the logical framework, implementation, and limitations;
    • a final demonstration talk where they walk through the web application, its assumptions, and typical use‑cases.
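
The shared data format fixed in Week 1 could be as simple as a JSON object per persona plus a small validation function that both groups call at their interface boundary. The sketch below uses only the Python standard library; the field names and the example profile are hypothetical, and a full JSON Schema with a dedicated validator would be an equally reasonable choice.

```python
import json

# Hypothetical required fields and their expected JSON types.
REQUIRED_FIELDS = {
    "persona_id": str,
    "age": int,
    "education": str,
    "attitudes": list,
}


def parse_persona(raw):
    """Parse a JSON persona profile and check the agreed fields and types."""
    profile = json.loads(raw)
    if not isinstance(profile, dict):
        raise ValueError("persona profile must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in profile:
            raise ValueError(f"missing field: {name}")
        if not isinstance(profile[name], expected_type):
            raise ValueError(f"field {name} must be of type {expected_type.__name__}")
    if not all(isinstance(a, str) for a in profile["attitudes"]):
        raise ValueError("attitudes must be a list of strings")
    return profile


example = (
    '{"persona_id": "p-001", "age": 34, '
    '"education": "secondary", "attitudes": ["supports_tax_cut"]}'
)
profile = parse_persona(example)
```

Agreeing on such a check early lets the backend and the chat interface be developed and tested separately during Weeks 2–3.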

Meetings will take place primarily in the AI POP‑Up Lab space (JK3.02) on the CSSci floor at Roeterseiland Campus (3rd floor), with additional online coordination if needed.

Prerequisites

No prior background in polling or social science is required.

Students should:

  • be comfortable with basic propositional and first‑order logic;
  • have an interest in at least one of:
    • logic & computation (formal specification, constraint solving, verification),
    • logic & language (semantics/pragmatics of dialogue, meaning representation),
    • logical aspects of probability and uncertainty;

In addition, each group needs students with basic programming experience, preferably in Python (or a similar language), and a willingness to work with simple web technologies (e.g. a lightweight web framework or notebook‑based interface).

Familiarity with large language models, statistics, or web development is a plus but not strictly required; the approach can be tailored to students’ backgrounds.



Assessment

The coordinated project is graded PASS/FAIL. A pass will be awarded only if students:

  1. Deliver a functioning prototype web application.
    The final system must:
    • generate synthetic personae using the agreed formalism;
    • obtain answers to polling questions via a language model;
    • provide an “ask‑the‑persona” chat interface that is constrained by the persona profile;
    • run reliably enough for the instructor to inspect and interact with it.
  2. Provide clear documentation of the logical and probabilistic design.
    The written report should:
    • define the formal language or data structures used for persona profiles;
    • explain the aggregation and uncertainty scheme used for opinion estimates;
    • describe the semantic/epistemic constraints imposed on persona dialogue;
    • discuss limitations and possible failure modes.
  3. Give a coherent final presentation and demonstration.
    Students will jointly present:
    • the overall architecture of the pollster;
    • a live demonstration of the web application;
    • a critical discussion of when, if at all, synthetic polling can be informative for real‑world public opinion.

The emphasis is on the quality of the actual system the students develop (its functionality, robustness, and conceptual clarity), not solely on a theoretical write‑up. A purely theoretical project or a partially implemented prototype will not be sufficient for a pass.



References

Cerina, R., & Duch, R. (2023). Artificially intelligent opinion polling. arXiv. https://doi.org/10.48550/arXiv.2309.06029

Cerina, R. (2025). PoSSUM: A protocol for surveying social-media users with multimodal LLMs. arXiv. https://doi.org/10.48550/arXiv.2503.05529

Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J. R., Rytting, C., & Wingate, D. (2023). Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3), 337–351.

Bisbee, J., Clinton, J. D., Dorff, C., Kenkel, B., & Larson, J. M. (2024). Synthetic replacements for human survey data? The perils of large language models. Political Analysis, 32(4), 401–416.

Peters, H., & Matz, S. C. (2024). Large language models can infer psychological dispositions of social media users. PNAS Nexus, 3(6), pgae231.