“Synthetic sample” has recently emerged as a term describing survey responses generated by artificial intelligence (AI) models instead of real people. Put simply, synthetic sample means using AI to predict what a human’s response would be to a given question, rather than asking actual individuals.
While this approach can offer speed and lower costs, it is crucial to recognise that a synthetic sample has only a very limited and carefully defined role in social research. For public sector and policy work, relying on a synthetic sample risks undermining the integrity and democratic value of research. Those risks are significant: as we set out below, synthetic samples lack transparency, are difficult to reproduce, flatten the variance of real opinion, and break the statistical link between findings and the population they claim to represent.
For social research, where understanding authentic public opinion is essential, synthetic sample cannot replace the richness and legitimacy of real human responses. At Verian, we are committed to ensuring that real voices remain at the heart of our work, especially when research informs public policy or democratic decisions.
Synthetic sample involves training an AI model to simulate how real human respondents might answer survey questions. This means the AI is predicting responses based on patterns it has learned from existing data. At first glance, the difference from direct prompting seems minor: in both cases, a language model provides answers to questions we would otherwise ask humans.
In practice, however, the expectations and applications differ. Unlike simple prompting of general-purpose AI foundation models, suppliers of synthetic sample claim that they can train an AI model specifically on historical survey data so that it simulates real human respondents in a survey and gives the answers they would have given. This supposedly sets it apart from more general AI models, which simulate the average human response.
While AI-driven augmentation of existing data can supplement traditional statistical data augmentation methods, and AI can support certain technical tasks, the use of a synthetic sample on its own gives cause for concern.
AI prompting in general, and synthetic sample specifically, both struggle with a central issue: lack of transparency in training data, prompt strategies, and model parameters. Good research thrives on maximum transparency, which is often missing here.
Even small prompt variations can lead to drastically different results. Different models, temperature settings, and sampling parameters affect the results, but the direction and size of each parameter’s effect are rarely documented. Without clear disclosure, trial-and-error replaces methodological precision.
Large Language Models (LLMs) naturally produce diversity – useful for creative tasks, but problematic for empirical research. Even supposedly “representative” synthetic samples show high variance across runs; a robust “best setting” doesn’t exist. Correlations between samples across quality metrics are often low, and relationships between questionnaire variables fluctuate. This complicates replication and raises the question: What is the “true” synthetic distribution?
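To make this concrete, the sketch below shows what a minimal sensitivity check could look like: vary the prompt wording and the temperature, repeat each setting several times, and report the run-to-run spread. It is illustrative only – `ask_model`, the prompts, and the drift and noise assumptions are hypothetical placeholders rather than any supplier’s real API – but this is exactly the kind of documentation that is usually missing.

```python
# Illustrative only: a minimal sensitivity sweep over prompt wording and
# temperature. `ask_model` is a hypothetical stand-in for a supplier API;
# here it simulates noisy answer shares so the script runs end to end.
import random
import statistics

PROMPTS = [
    "You are a 45-year-old voter. Do you support policy X? Answer yes or no.",
    "Imagine you are a typical citizen. Would you back policy X? Reply yes/no.",
]
TEMPERATURES = [0.2, 0.7, 1.0]
N_RUNS = 20           # repeated runs per setting
N_RESPONDENTS = 500   # synthetic "respondents" per run


def ask_model(prompt: str, temperature: float, n: int) -> float:
    """Hypothetical model call: returns one run's share of 'yes' answers.

    We simulate a run-level probability that drifts with prompt wording and
    temperature, then draw n yes/no answers from it.
    """
    base = 0.55 if "voter" in prompt else 0.48     # prompt wording shifts the result
    drift_sd = 0.02 + 0.05 * temperature           # higher temperature, more drift
    p_run = min(max(random.gauss(base, drift_sd), 0.0), 1.0)
    yes = sum(random.random() < p_run for _ in range(n))
    return yes / n


for prompt in PROMPTS:
    for temp in TEMPERATURES:
        shares = [ask_model(prompt, temp, N_RESPONDENTS) for _ in range(N_RUNS)]
        print(f"temp={temp:<4} prompt={prompt[:25]!r:28} "
              f"mean yes-share={statistics.mean(shares):.3f} "
              f"run-to-run SD={statistics.stdev(shares):.3f}")
```

In a real pipeline, the same grid would be run against the actual model, and the resulting spread reported alongside any headline finding.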
If someone wants to highlight a particular story in the data, they can tweak prompts and parameters until the result fits. Without versioned prompts, model IDs, timestamps, and sensitivity analyses, it is impossible to assess how reliable a finding really is. Even small methodological missteps or deliberate tweaking can have major substantive consequences.
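As a sketch of what such disclosure could look like in practice, the snippet below records the metadata of a single synthetic-sample run. The field names are illustrative, not an established schema; the point is that every published figure should be traceable to a versioned prompt, an exact model identifier, the sampling parameters, and the sensitivity checks that were run.

```python
# A minimal sketch of the run metadata that would need to be disclosed for a
# synthetic-sample finding to be auditable. Field names are illustrative.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class SyntheticRunRecord:
    model_id: str            # exact model version, not just the product name
    prompt_version: str      # versioned prompt text, or a hash of it
    temperature: float
    top_p: float
    random_seed: int | None  # if the provider exposes one
    n_responses: int
    timestamp: str
    sensitivity_runs: int    # how many prompt/parameter variants were checked


record = SyntheticRunRecord(
    model_id="example-llm-2025-06-01",
    prompt_version="persona-prompt-v3",
    temperature=0.7,
    top_p=0.95,
    random_seed=12345,
    n_responses=1000,
    timestamp=datetime.now(timezone.utc).isoformat(),
    sensitivity_runs=12,
)

# Persist alongside the results so every figure can be traced back to a run.
print(json.dumps(asdict(record), indent=2))
```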
The quality of synthetic responses depends heavily on training data, which remains largely undisclosed in foundation models.
Moreover, an LLM can never consider all characteristics of human respondents: their memories, emotions, experiences, personality, and other unique factors which fundamentally influence the answers individuals might give in a survey. This limitation means that synthetic responses, no matter how sophisticated, cannot fully capture the richness and diversity of real human perspectives.
Even if suppliers of synthetic samples claim to have trained their model on real historical survey data, it remains unclear how they gained access to often proprietary surveys, how many surveys of what quality they fed into the model, and whether the amount of survey data they could source is enough to train it adequately. Synthetic samples also represent a complete departure from standard, transparent imputation methods, which retain the covariance structure of the real data used to ‘train’ the imputation model.
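The contrast can be illustrated with simulated data. The sketch below is not anyone’s production method; on made-up numbers, it simply shows how a standard stochastic regression imputation roughly preserves the covariance between two variables, while naive mean imputation already flattens it – the transparent benchmark against which opaque synthetic generation has to be judged.

```python
# Sketch: standard imputation aims to preserve the covariance structure of the
# observed data. Simulated values only; the comparison is the point.
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Two correlated "survey" variables: x fully observed, y missing at random (30%).
x = rng.normal(0, 1, n)
y = 0.8 * x + rng.normal(0, 0.6, n)
missing = rng.random(n) < 0.30

cov_full = np.cov(x, y)[0, 1]

# Mean imputation: flattens the relationship between x and y.
y_mean = y.copy()
y_mean[missing] = y[~missing].mean()
cov_mean = np.cov(x, y_mean)[0, 1]

# Stochastic regression imputation: roughly preserves the covariance.
b1, b0 = np.polyfit(x[~missing], y[~missing], 1)
resid_sd = np.std(y[~missing] - (b0 + b1 * x[~missing]))
y_reg = y.copy()
y_reg[missing] = b0 + b1 * x[missing] + rng.normal(0, resid_sd, missing.sum())
cov_reg = np.cov(x, y_reg)[0, 1]

print(f"cov(x, y), complete data:          {cov_full:.3f}")
print(f"cov(x, y), mean imputation:        {cov_mean:.3f}")
print(f"cov(x, y), stochastic regression:  {cov_reg:.3f}")
```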
Traditional survey research draws conclusions about a population from samples with defined selection probabilities, error margins, and significance tests.
Synthetic samples break this logic: there is no reliable link between model-generated responses and real population parameters. Bootstrapping can test stability within the simulation, but the bridge to the real population remains speculative. The fundamental problem is not just lack of defined selection probabilities. It is that LLMs do not represent a sample from any real population. They represent patterns in training data, which systematically underrepresents marginalized groups, non-English speakers, and offline populations. Claiming results are 'representative' without valid anchor data is methodologically indefensible, and for seldom heard groups where no reliable data exists, synthetic samples compound rather than solve equity problems.
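The following sketch, on entirely made-up numbers, illustrates the limits of that kind of check: a percentile bootstrap over a synthetic run yields a reassuringly tight interval, but the interval only describes the simulation’s internal stability, not any real population.

```python
# Sketch: bootstrapping a synthetic dataset only measures stability *within*
# the simulation; the "data" never came from a real population.
import random
import statistics

random.seed(0)

# Pretend these are 2,000 synthetic yes/no answers from a single LLM run.
synthetic_answers = [1 if random.random() < 0.62 else 0 for _ in range(2000)]


def bootstrap_ci(data, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    n = len(data)
    means = []
    for _ in range(n_boot):
        resample = [data[random.randrange(n)] for _ in range(n)]
        means.append(statistics.mean(resample))
    means.sort()
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


lo, hi = bootstrap_ci(synthetic_answers)
print(f"Share 'yes' in synthetic run: {statistics.mean(synthetic_answers):.3f}")
print(f"95% bootstrap interval (within the simulation only): [{lo:.3f}, {hi:.3f}]")
```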
From a statistical perspective, if you simply generate a large enough synthetic sample, every difference becomes statistically significant, and significance thus loses its meaning. The illusion of significance in massive synthetic datasets can mask the absence of true population insights and undermine the interpretability of results.
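A small worked example, with illustrative numbers only, shows the mechanism: a trivial half-point difference between two groups is nowhere near significant at a realistic sample size, but becomes “highly significant” once the synthetic sample is made large enough.

```python
# Sketch: why "everything becomes significant" with a large enough synthetic n.
# Two-proportion z-test on a trivial 0.5-point difference (0.505 vs 0.500).
import math


def two_prop_z(p1: float, p2: float, n: int) -> float:
    """z statistic for the difference of two proportions with equal group sizes n."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return (p1 - p2) / se


for n in (1_000, 100_000, 1_000_000):
    z = two_prop_z(0.505, 0.500, n)
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    print(f"n per group = {n:>9,}: z = {z:5.2f}, p = {p:.4f}")
```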
Models also tend to flatten variance (answers are clustered around the mean) and overlook nuances, such as value orientations or emerging topics. Yet this is precisely where social research seeks new insights, not reproducing the known.
Social and market research serve a democratic function: to make citizens’ voices heard. If these voices are replaced by AI outputs, two effects threaten the industry and, more generally, evidence-based decision-making:
Using synthetic sample for political decision-making harms three forms of democratic legitimacy simultaneously:
Inclusion of all citizens has always been central to political and social research as a normative principle of research ethics. For decisions with major impact or involving high budgets, therefore, real people must be surveyed. If politicians base policy decisions on synthetic data, citizens may ask: what is the difference between that and simply asking ChatGPT for the “most popular” political decision?
Political decisions are often made in an environment characterised by new challenges and unprecedented crises. Only when surveying real humans do truly new insights emerge – turning points in opinions, new topics, or subtle dynamics not found in training corpora. Election and political research illustrate this challenge well: the work is real-time, context-rich, and challenging even for experienced research teams. Synthetic samples do not reliably capture this complexity, especially when offline discourse or non-articulated sentiments shape voting decisions.
Despite its limitations, there are useful and justifiable applications of synthetic sample, especially where the goal is not statistical inference about a population but exploratory, iterative, or supplementary tasks, such as refining research instruments, iterating on study designs, and preparing hypotheses.
However, even in these areas, synthetic data does not replace validation with real respondents. It shortens paths, improves instruments, and helps allocate resources more effectively.
While synthetic data therefore has legitimate applications, these succeed precisely because they do not claim to measure actual public opinion. Synthetic data serves auxiliary functions within larger research processes anchored in real-world observation. It is never the sole or definitive source for claims about human behaviour, preferences, or opinions.
Synthetic samples are an exciting tool in the methodological toolbox – especially for faster iteration, refining instruments, and preparing hypotheses. However, they are not a substitute for human data collection when it comes to reliable, legitimate, and democratically relevant insights. Even if AI models advance to address issues with reproducibility and training data, challenges around transparency, democratic legitimacy and statistical inference remain.
At Verian, our mission remains to make new societal developments visible – especially those that are not (yet) in the training corpus. That is why we use AI wisely, document transparently, validate robustly, and put real people front and centre.