“Synthetic sample” has recently emerged as a term describing survey responses generated by artificial intelligence (AI) models instead of real people. Put simply, synthetic sample means using AI to predict what a human’s response would be to a given question, rather than asking actual individuals.
While this approach can offer speed and lower costs, it is crucial to recognise that a synthetic sample has only a very limited and carefully defined role in social research. For public sector and policy work, relying on a synthetic sample risks undermining the integrity and democratic value of research. Those risks are significant: as we set out below, synthetic samples lack transparency, are difficult to reproduce, flatten the variance of real opinion, and break the statistical link between findings and the population they claim to represent.
For social research, where understanding authentic public opinion is essential, synthetic sample cannot replace the richness and legitimacy of real human responses. At Verian, we are committed to ensuring that real voices remain at the heart of our work, especially when research informs public policy or democratic decisions.
Synthetic sample involves training an AI model to simulate how real human respondents might answer survey questions. This means the AI is predicting responses based on patterns it has learned from existing data. At first glance, the difference from direct prompting seems minor: in both cases, a language model provides answers to questions we would otherwise ask humans.
In practice, however, the expectations and applications differ. Unlike simple prompting of general-purpose AI foundation models, suppliers of synthetic sample claim that they can train an AI model specifically on historical survey data so that it simulates real human respondents in a survey and gives the answers they would have given. This supposedly sets it apart from more general AI models, which simulate the average human response.
While AI-driven augmentation of existing data can supplement traditional statistical data augmentation methods, and AI can support certain technical tasks, the use of a synthetic sample on its own gives cause for concern.
AI prompting in general, and synthetic sample specifically, both struggle with a central issue: lack of transparency in training data, prompt strategies, and model parameters. Good research thrives on maximum transparency, which is often missing here.
Even small prompt variations can lead to drastically different results. Different models, temperature settings, and sampling parameters affect the results, but the direction and size of each parameter’s effect are rarely documented. Without clear disclosure, trial-and-error replaces methodological precision.
Large Language Models (LLMs) naturally produce diversity – useful for creative tasks, but problematic for empirical research. Even supposedly “representative” synthetic samples show high variance across runs; a robust “best setting” doesn’t exist. Correlations between samples across quality metrics are often low, and relationships between questionnaire variables fluctuate. This complicates replication and raises the question: What is the “true” synthetic distribution?
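To make this concrete, the sketch below shows what a minimal sensitivity check could look like: vary the prompt wording and the temperature, repeat each setting several times, and report the run-to-run spread. It is illustrative only – `ask_model`, the prompts, and the drift and noise assumptions are hypothetical placeholders rather than any supplier’s real API – but this is exactly the kind of documentation that is usually missing.

```python
# Illustrative only: a minimal sensitivity sweep over prompt wording and
# temperature. `ask_model` is a hypothetical stand-in for a supplier API;
# here it simulates noisy answer shares so the script runs end to end.
import random
import statistics

PROMPTS = [
    "You are a 45-year-old voter. Do you support policy X? Answer yes or no.",
    "Imagine you are a typical citizen. Would you back policy X? Reply yes/no.",
]
TEMPERATURES = [0.2, 0.7, 1.0]
N_RUNS = 20           # repeated runs per setting
N_RESPONDENTS = 500   # synthetic "respondents" per run


def ask_model(prompt: str, temperature: float, n: int) -> float:
    """Hypothetical model call: returns one run's share of 'yes' answers.

    We simulate a run-level probability that drifts with prompt wording and
    temperature, then draw n yes/no answers from it.
    """
    base = 0.55 if "voter" in prompt else 0.48     # prompt wording shifts the result
    drift_sd = 0.02 + 0.05 * temperature           # higher temperature, more drift
    p_run = min(max(random.gauss(base, drift_sd), 0.0), 1.0)
    yes = sum(random.random() < p_run for _ in range(n))
    return yes / n


for prompt in PROMPTS:
    for temp in TEMPERATURES:
        shares = [ask_model(prompt, temp, N_RESPONDENTS) for _ in range(N_RUNS)]
        print(f"temp={temp:<4} prompt={prompt[:25]!r:28} "
              f"mean yes-share={statistics.mean(shares):.3f} "
              f"run-to-run SD={statistics.stdev(shares):.3f}")
```

In a real pipeline, the same grid would be run against the actual model, and the resulting spread reported alongside any headline finding.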
If someone wants to highlight a particular story in the data, they can tweak prompts and parameters until the result fits. Without versioned prompts, model IDs, timestamps, and sensitivity analyses, it is impossible to assess how reliable a finding really is. Even small methodological missteps or deliberate tweaking can have major substantive consequences.
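As a sketch of what such disclosure could look like in practice, the snippet below records the metadata of a single synthetic-sample run. The field names are illustrative, not an established schema; the point is that every published figure should be traceable to a versioned prompt, an exact model identifier, the sampling parameters, and the sensitivity checks that were run.

```python
# A minimal sketch of the run metadata that would need to be disclosed for a
# synthetic-sample finding to be auditable. Field names are illustrative.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class SyntheticRunRecord:
    model_id: str            # exact model version, not just the product name
    prompt_version: str      # versioned prompt text, or a hash of it
    temperature: float
    top_p: float
    random_seed: int | None  # if the provider exposes one
    n_responses: int
    timestamp: str
    sensitivity_runs: int    # how many prompt/parameter variants were checked


record = SyntheticRunRecord(
    model_id="example-llm-2025-06-01",
    prompt_version="persona-prompt-v3",
    temperature=0.7,
    top_p=0.95,
    random_seed=12345,
    n_responses=1000,
    timestamp=datetime.now(timezone.utc).isoformat(),
    sensitivity_runs=12,
)

# Persist alongside the results so every figure can be traced back to a run.
print(json.dumps(asdict(record), indent=2))
```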
The quality of synthetic responses depends heavily on training data, which remains largely undisclosed in foundation models.
Moreover, an LLM can never consider all characteristics of human respondents: their memories, emotions, experiences, personality, and other unique factors which fundamentally influence the answers individuals might give in a survey. This limitation means that synthetic responses, no matter how sophisticated, cannot fully capture the richness and diversity of real human perspectives.
Even if suppliers of synthetic samples claim to have trained their model on real historical survey data, it remains unclear how they gained access to often proprietary surveys, how many surveys of what quality they fed into the model, and whether the amount of survey data they could source is enough to train it adequately. Synthetic samples also represent a complete departure from standard, transparent imputation methods, which retain the covariance structure of the real data used to ‘train’ the imputation model.
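The contrast can be illustrated with simulated data. The sketch below is not anyone’s production method; on made-up numbers, it simply shows how a standard stochastic regression imputation roughly preserves the covariance between two variables, while naive mean imputation already flattens it – the transparent benchmark against which opaque synthetic generation has to be judged.

```python
# Sketch: standard imputation aims to preserve the covariance structure of the
# observed data. Simulated values only; the comparison is the point.
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Two correlated "survey" variables: x fully observed, y missing at random (30%).
x = rng.normal(0, 1, n)
y = 0.8 * x + rng.normal(0, 0.6, n)
missing = rng.random(n) < 0.30

cov_full = np.cov(x, y)[0, 1]

# Mean imputation: flattens the relationship between x and y.
y_mean = y.copy()
y_mean[missing] = y[~missing].mean()
cov_mean = np.cov(x, y_mean)[0, 1]

# Stochastic regression imputation: roughly preserves the covariance.
b1, b0 = np.polyfit(x[~missing], y[~missing], 1)
resid_sd = np.std(y[~missing] - (b0 + b1 * x[~missing]))
y_reg = y.copy()
y_reg[missing] = b0 + b1 * x[missing] + rng.normal(0, resid_sd, missing.sum())
cov_reg = np.cov(x, y_reg)[0, 1]

print(f"cov(x, y), complete data:          {cov_full:.3f}")
print(f"cov(x, y), mean imputation:        {cov_mean:.3f}")
print(f"cov(x, y), stochastic regression:  {cov_reg:.3f}")
```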
Traditional survey research draws conclusions about a population from samples with defined selection probabilities, error margins, and significance tests.
Synthetic samples break this logic: there is no reliable link between model-generated responses and real population parameters. Bootstrapping can test stability within the simulation, but the bridge to the real population remains speculative. The fundamental problem is not just lack of defined selection probabilities. It is that LLMs do not represent a sample from any real population. They represent patterns in training data, which systematically underrepresents marginalized groups, non-English speakers, and offline populations. Claiming results are 'representative' without valid anchor data is methodologically indefensible, and for seldom heard groups where no reliable data exists, synthetic samples compound rather than solve equity problems.
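The following sketch, on entirely made-up numbers, illustrates the limits of that kind of check: a percentile bootstrap over a synthetic run yields a reassuringly tight interval, but the interval only describes the simulation’s internal stability, not any real population.

```python
# Sketch: bootstrapping a synthetic dataset only measures stability *within*
# the simulation; the "data" never came from a real population.
import random
import statistics

random.seed(0)

# Pretend these are 2,000 synthetic yes/no answers from a single LLM run.
synthetic_answers = [1 if random.random() < 0.62 else 0 for _ in range(2000)]


def bootstrap_ci(data, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    n = len(data)
    means = []
    for _ in range(n_boot):
        resample = [data[random.randrange(n)] for _ in range(n)]
        means.append(statistics.mean(resample))
    means.sort()
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


lo, hi = bootstrap_ci(synthetic_answers)
print(f"Share 'yes' in synthetic run: {statistics.mean(synthetic_answers):.3f}")
print(f"95% bootstrap interval (within the simulation only): [{lo:.3f}, {hi:.3f}]")
```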
From a statistical perspective, if you simply generate a large enough synthetic sample, every difference becomes statistically significant, and significance thus loses its meaning. The illusion of significance in massive synthetic datasets can mask the absence of true population insights and undermine the interpretability of results.
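A small worked example, with illustrative numbers only, shows the mechanism: a trivial half-point difference between two groups is nowhere near significant at a realistic sample size, but becomes “highly significant” once the synthetic sample is made large enough.

```python
# Sketch: why "everything becomes significant" with a large enough synthetic n.
# Two-proportion z-test on a trivial 0.5-point difference (0.505 vs 0.500).
import math


def two_prop_z(p1: float, p2: float, n: int) -> float:
    """z statistic for the difference of two proportions with equal group sizes n."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return (p1 - p2) / se


for n in (1_000, 100_000, 1_000_000):
    z = two_prop_z(0.505, 0.500, n)
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    print(f"n per group = {n:>9,}: z = {z:5.2f}, p = {p:.4f}")
```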
Models also tend to flatten variance (answers are clustered around the mean) and overlook nuances, such as value orientations or emerging topics. Yet this is precisely where social research seeks new insights, not reproducing the known.
Social and market research serve a democratic function: to make citizens’ voices heard. If these voices are replaced by AI outputs, two effects threaten the industry and, more generally, evidence-based decision-making:
Using synthetic sample for political decision-making harms three forms of democratic legitimacy simultaneously:
Inclusion of all citizens has always been central to political and social research as a normative principle of research ethics. For decisions with major impact or involving high budgets, therefore, real people must be surveyed. If politicians base policy decisions on synthetic data, citizens may ask: what is the difference between that and simply asking ChatGPT for the “most popular” political decision?
Political decisions are often made in an environment characterised by new challenges and unprecedented crises. Only when surveying real humans do truly new insights emerge – turning points in opinions, new topics, or subtle dynamics not found in training corpora. Election and political research illustrate this challenge well: the work is real-time, context-rich, and challenging even for experienced research teams. Synthetic samples do not reliably capture this complexity, especially when offline discourse or non-articulated sentiments shape voting decisions.
Despite its limitations, there are useful and justifiable applications of synthetic sample, especially where the goal is not statistical inference about a population but exploratory, iterative, or supplementary tasks, such as refining research instruments, iterating on study designs, and preparing hypotheses.
However, even in these areas, synthetic data does not replace validation with real respondents. It shortens paths, improves instruments, and helps allocate resources more effectively.
While synthetic data therefore has legitimate applications, these succeed precisely because they do not claim to measure actual public opinion. Synthetic data serves auxiliary functions within larger research processes anchored in real-world observation. It is never the sole or definitive source for claims about human behaviour, preferences, or opinions.
Synthetic samples are an exciting tool in the methodological toolbox – especially for faster iteration, refining instruments, and preparing hypotheses. However, they are not a substitute for human data collection when it comes to reliable, legitimate, and democratically relevant insights. Even if AI models advance to address issues with reproducibility and training data, challenges around transparency, democratic legitimacy and statistical inference remain.
At Verian, our mission remains to make new societal developments visible – especially those that are not (yet) in the training corpus. That is why we use AI wisely, document transparently, validate robustly, and put real people front and centre.