Building and Using a Synthetic Expert Panel for National Capability Analysis

The use of synthetic simulation in the early development of the National Capability Framework required more than isolated, one-off model outputs. To meaningfully stress-test assumptions, compare perspectives, and observe patterns across judgments, we needed a coherent and internally consistent synthetic panel, one that could approximate the diversity, structure, and variance found in real expert communities. This article describes how that synthetic panel was constructed, how it was used, and how its outputs were analysed.

The work is exploratory by design. The purpose is not to assert truth, but to create a disciplined environment in which hypotheses about national capability measurement, evaluation techniques, and expert judgment can be examined.

A. Panel Construction

The first step was the deliberate construction of synthetic experts. Rather than treating “the expert” as a generic respondent, we built structured biographical personas that reflect the diversity of how real experts think, decide, and assess risk.

Across the full framework, spanning roughly 250 distinct capabilities, we generated a cohort of more than 50,000 unique biographical personas. From this cohort, synthetic panels are drawn to ensure continuity across survey rounds while still allowing controlled variation. For the prototype described here, we assume a panel size of 100 respondents.
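As a concrete illustration, the sketch below shows one way such a draw could be implemented, assuming a retained core of respondents plus a rotating remainder. The 70/30 split, the seeding scheme, and the function names are illustrative assumptions, not the framework's actual procedure.

    import random

    PANEL_SIZE = 100
    CORE_SHARE = 0.7  # assumed fraction of the panel retained across rounds

    def draw_panel(persona_ids, capability_id, round_number):
        """Return persona IDs: a stable core plus a remainder that rotates by round."""
        core_size = int(PANEL_SIZE * CORE_SHARE)

        # The core is seeded only by the capability, so it is identical in every
        # round, giving continuity across survey waves.
        core_rng = random.Random(f"{capability_id}-core")
        core = core_rng.sample(sorted(persona_ids), core_size)

        # The remainder is reseeded each round, providing controlled variation.
        pool = sorted(set(persona_ids) - set(core))
        round_rng = random.Random(f"{capability_id}-round-{round_number}")
        rotating = round_rng.sample(pool, PANEL_SIZE - core_size)

        return core + rotating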

Three core design principles guided panel construction.

1. Geographic and cultural diversity.

No single country is permitted to account for more than 5% of the panel. Country weights are informed by population and GDP, but moderated to avoid dominance by large economies. Representation rotates across regions, including Africa, the Middle East, South Asia, East Asia, Europe, and Latin America, to ensure that assessments reflect diverse geopolitical and cultural frames of reference rather than a narrow transatlantic or OECD-centric lens.
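A minimal sketch of how the 5% cap could be enforced is shown below. The equal blend of population share and GDP share is an assumption; the article states only that weights are informed by both.

    def capped_country_weights(pop_share, gdp_share, blend=0.5, cap=0.05):
        """pop_share, gdp_share: dicts of country -> global share (each summing to 1.0)."""
        weights = {c: blend * pop_share[c] + (1 - blend) * gdp_share[c] for c in pop_share}
        # Iteratively trim any country above the cap and hand the trimmed mass
        # proportionally to the countries still below it.
        while max(weights.values()) > cap + 1e-12:
            trimmed = {c: min(w, cap) for c, w in weights.items()}
            free = {c: w for c, w in trimmed.items() if w < cap}
            if not free:
                break  # cap is infeasible with this few countries
            slack = 1.0 - sum(trimmed.values())
            free_total = sum(free.values())
            weights = {c: w + slack * w / free_total if c in free else w
                       for c, w in trimmed.items()}
        return weights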

2. Decision-making personas.

Each respondent is assigned a structured set of persona attributes that shape how they evaluate evidence, frame uncertainty, and form judgments.

Synthetic Panel Persona Attributes

  1. Respondent Country. The country providing the respondent’s social, cultural, and geopolitical frame of reference.
  2. Epistemic Orientation. Qualitative, Mixed, or Quantitative.
  3. Judgement Method. Analytic, Technical-Operational, Political-Strategic, or Normative-Standards.
  4. Risk Orientation. Resilience-First, Adaptive Base-Case, or Expansionist.
  5. Institutional Vantage. State Policy, State Operational, Market Financial, Market Industrial, or Civic Knowledge.

These attributes are not intended to represent personality, but rather the dominant analytical lenses through which experts in practice tend to interpret complex capability questions. The intent is to simulate not only differences in what experts know, but systematic differences in how they reason.

Epistemic orientation captures how a respondent primarily forms beliefs and evaluates evidence. Respondents with a qualitative orientation privilege case studies, institutional narratives, historical analogies, and lived experience, often placing weight on context and path dependence over formal metrics. Those with a quantitative orientation prioritise models, datasets, trend analysis, and forecast-based reasoning, and tend to be more sensitive to scale, rates of change, and measurable performance. A mixed orientation reflects the reality of many expert communities, blending structured data with contextual interpretation and using each to cross-check the other. This dimension is particularly important for capabilities where data availability is uneven or contested.

Judgement method reflects the primary analytical toolkit a respondent uses to interpret a capability. An analytic method emphasises legal frameworks, economic incentives, policy design, and institutional arrangements. A technical-operational method focuses on engineering feasibility, system reliability, logistics, operational constraints, and execution risk. A political-strategic method situates capability within geopolitical competition, alliance structures, conflict scenarios, and strategic leverage. A normative-standards method emphasises regulation, rule-setting, intellectual property, safety regimes, and international norms. This attribute shapes not only conclusions, but which factors are treated as decisive versus secondary.

Risk orientation captures a respondent’s baseline stance toward uncertainty, disruption, and future trajectories. A resilience-first orientation prioritises redundancy, robustness, safety margins, and performance under stress, often leading to more conservative assessments and cautious outlooks. An adaptive base-case orientation reflects a balanced posture, weighting central estimates and plausible scenarios while acknowledging both upside and downside risks. An expansionist orientation places greater emphasis on opportunity, frontier advancement, and growth potential, and may assign greater weight to emerging trajectories even when current performance is uneven. This dimension strongly influences outlook assessments and the framing of upward and downward triggers.

Institutional vantage represents the organisational position from which a respondent views the capability landscape, shaping incentives, constraints, and evaluative priorities. Respondents with a state policy vantage focus on regulation, national strategy, public investment, and long-term coordination. A state operational vantage emphasises delivery, maintenance, system integration, and real-world performance of infrastructure and services. A market financial vantage centres on capital allocation, risk-adjusted returns, scalability, and financing structures. A market industrial vantage reflects the concerns of firms embedded in supply chains, production systems, and operational markets. A civic knowledge vantage, common in academia and civil society, emphasises evidence quality, institutional accountability, social impact, and long-term public value. This attribute often explains why equally informed experts reach different conclusions.

Taken together, these attributes allow the synthetic panel to reproduce patterned disagreement rather than random noise. They make it possible to observe how different ways of knowing, judging, and managing risk interact with the same underlying capability evidence—an essential step in understanding both the strengths and limitations of any national capability framework.
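For illustration, these four lenses plus respondent country could be encoded as a simple record. The class, field, and enum names below are assumptions introduced only to show the structure; the attribute values mirror the list above.

    from dataclasses import dataclass
    from enum import Enum

    class EpistemicOrientation(Enum):
        QUALITATIVE = "Qualitative"
        MIXED = "Mixed"
        QUANTITATIVE = "Quantitative"

    class JudgementMethod(Enum):
        ANALYTIC = "Analytic"
        TECHNICAL_OPERATIONAL = "Technical-Operational"
        POLITICAL_STRATEGIC = "Political-Strategic"
        NORMATIVE_STANDARDS = "Normative-Standards"

    class RiskOrientation(Enum):
        RESILIENCE_FIRST = "Resilience-First"
        ADAPTIVE_BASE_CASE = "Adaptive Base-Case"
        EXPANSIONIST = "Expansionist"

    class InstitutionalVantage(Enum):
        STATE_POLICY = "State Policy"
        STATE_OPERATIONAL = "State Operational"
        MARKET_FINANCIAL = "Market Financial"
        MARKET_INDUSTRIAL = "Market Industrial"
        CIVIC_KNOWLEDGE = "Civic Knowledge"

    @dataclass(frozen=True)
    class Persona:
        respondent_country: str
        epistemic_orientation: EpistemicOrientation
        judgement_method: JudgementMethod
        risk_orientation: RiskOrientation
        institutional_vantage: InstitutionalVantage
        biography: str  # short narrative used verbatim in the assessment prompt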

3. Capability-specific expertise.

Every respondent is treated as an expert in the specific capability they are assessing. This avoids the artefact of generalist responses and allows persona differences to express themselves through interpretation, emphasis, and risk framing rather than through gaps in domain knowledge.

To illustrate the diversity this produces, example biographies include respondents framed as: a US-based infrastructure financier with a quantitative bias; a Chinese state-operational planner focused on scale and delivery; an Indian policy analyst balancing development constraints and institutional capacity; an Iranian technical specialist shaped by sanctions and resilience constraints; and respondents from other regions whose perspectives are informed by smaller markets, import dependence, or regulatory exposure.

The result is not a claim about who should be on a panel, but a controlled approximation of how diverse expert communities often look in practice.

B. Synthetic Expert Assessments

Once the panel is constructed, each synthetic expert is asked to provide a full assessment for a specific capability, country, and year. All respondents operate against a shared generational rubric, ranging from Planning to Frontier, which anchors judgments in capability maturity rather than outputs or outcomes.

Capability Evaluation Rubric

  • Frontier. Global leader at the cutting edge, with indigenous innovation, standard-setting influence, and robust, resilient capability.
  • Advanced. Highly developed and competitive, using state-of-the-art methods; next-generation leadership often adopted rather than led.
  • Developed. Mature, reliable, and institutionalised capability meeting current best practice, but not at the global frontier.
  • Intermediate. Mixed capability with uneven performance; some modern elements alongside outdated or transitional systems.
  • Foundation. Basic capability structures in place, but limited in scope, fragile, or reliant on older generations.
  • Emerging. Early pilots or prototypes only; small-scale, fragile, and not yet institutionalised.
  • Planning. Capability exists in strategy or intent only, with no operational deployment.

Each synthetic expert assessment consists of five components (a minimal record structure is sketched after the list below):

  • Capability level, selected from seven rubric categories.
  • Justification, a 100–200 word narrative written in a style consistent with the respondent’s persona attributes.
  • Outlook, selected from Positive, Stable, or Negative, reflecting the balance of risks over a 2–3 year horizon.
  • Upward triggers, describing conditions or developments that could plausibly lead to a higher capability level.
  • Downward triggers, describing risks that could lead to deterioration.
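A minimal sketch of such a record, assuming illustrative class and field names:

    from dataclasses import dataclass
    from typing import List

    CAPABILITY_LEVELS = ["Planning", "Emerging", "Foundation", "Intermediate",
                         "Developed", "Advanced", "Frontier"]
    OUTLOOKS = ["Positive", "Stable", "Negative"]

    @dataclass
    class Assessment:
        capability_level: str         # one of the seven CAPABILITY_LEVELS
        justification: str            # 100-200 word narrative in the persona's voice
        outlook: str                  # one of OUTLOOKS, over a 2-3 year horizon
        upward_triggers: List[str]    # developments that could raise the level
        downward_triggers: List[str]  # risks that could lower the level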

Prompts are structured to ensure consistency while still allowing interpretation. Respondents are explicitly instructed to answer as unique human experts embedded within a larger panel, rather than as detached analysts. Persona attributes are included verbatim in the prompt to anchor tone, emphasis, and framing.
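The sketch below shows one way persona attributes could be embedded verbatim in such a prompt; the template wording and parameter names are assumptions, not the framework's actual instrument.

    PROMPT_TEMPLATE = """You are one expert on a larger panel. Answer as a unique
    human expert embedded in that panel, not as a detached analyst.

    Respondent country: {respondent_country}
    Epistemic orientation: {epistemic_orientation}
    Judgement method: {judgement_method}
    Risk orientation: {risk_orientation}
    Institutional vantage: {institutional_vantage}

    Assess the capability "{capability}" for {country} in {year}.
    Provide: (1) a capability level from the rubric, (2) a 100-200 word justification
    in your own voice, (3) an outlook (Positive, Stable, or Negative) over 2-3 years,
    (4) upward triggers, and (5) downward triggers."""

    def build_prompt(persona_attrs, capability, country, year):
        """persona_attrs: dict keyed by the persona placeholders in the template."""
        return PROMPT_TEMPLATE.format(capability=capability, country=country,
                                      year=year, **persona_attrs)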

This design allows the same capability to be assessed multiple times through different lenses. For example, a state-operational expert may emphasise delivery risk and system reliability, while a market-financial respondent may focus on capital discipline and project economics, even when selecting the same capability level.

C. Evaluation and Synthesis of Results

Following data collection, assessments are transformed and analysed in several stages.

First, categorical capability levels are mapped onto a 0–20 scale to allow quantitative comparison and downstream analysis. This is not treated as a definitive score, but as a convenience for exploring distributions, variance, and sensitivity.
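Since the article does not state the exact numeric assignment, the sketch below assumes an evenly spaced mapping of the seven rubric levels onto the 0 to 20 range.

    CAPABILITY_LEVELS = ["Planning", "Emerging", "Foundation", "Intermediate",
                         "Developed", "Advanced", "Frontier"]

    # Evenly spaced scores: Planning -> 0.0, Emerging -> 3.33, ..., Frontier -> 20.0
    LEVEL_SCORES = {level: round(20 * i / (len(CAPABILITY_LEVELS) - 1), 2)
                    for i, level in enumerate(CAPABILITY_LEVELS)}

    def score(level: str) -> float:
        return LEVEL_SCORES[level]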

Second, textual justifications are analysed collectively. Rather than averaging narratives, we examine recurring themes, points of convergence, and notable divergences. This allows us to identify where consensus forms quickly and where judgments are highly sensitive to framing or assumptions.

Third, upward and downward triggers are synthesised to surface common pathways of improvement and shared risk factors. This often reveals asymmetries: many respondents may agree on what could drive improvement, but differ sharply on which risks are most salient.
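As a simple illustration, once triggers have been tagged with theme labels (a step not shown here), recurring pathways and risks can be tallied; the field names are assumptions.

    from collections import Counter

    def trigger_theme_counts(assessments, direction="upward"):
        """assessments: iterable of dicts with 'upward_themes' and 'downward_themes' lists."""
        counts = Counter()
        for a in assessments:
            counts.update(a[f"{direction}_themes"])
        return counts.most_common()

    # Comparing the two tallies makes the asymmetry visible: upward themes may
    # concentrate on a few shared pathways while downward themes scatter widely.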

Importantly, this stage moves beyond simple aggregation. Techniques such as frontier distance and dominance relationships are used to compare multi-dimensional capability profiles, allowing us to observe whether a country’s strength in one dimension meaningfully offsets weakness in another, or whether certain deficits dominate overall assessments.
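A minimal sketch of the two comparisons named above, assuming capability profiles are dicts of dimension names to 0-20 scores and taking Euclidean distance to an all-Frontier corner as the frontier-distance metric (the metric choice is an assumption):

    import math

    def dominates(a, b):
        """True if profile a scores at least as high as b on every dimension
        and strictly higher on at least one (Pareto dominance)."""
        return all(a[d] >= b[d] for d in a) and any(a[d] > b[d] for d in a)

    def frontier_distance(profile, frontier_score=20.0):
        """Euclidean distance from the all-Frontier corner; lower is closer."""
        return math.sqrt(sum((frontier_score - v) ** 2 for v in profile.values()))

    # Example: neither profile dominates the other, but distance still ranks them.
    a = {"infrastructure": 17.0, "skills": 10.0}
    b = {"infrastructure": 13.0, "skills": 13.0}
    print(dominates(a, b), dominates(b, a))                                # False False
    print(round(frontier_distance(a), 2), round(frontier_distance(b), 2))  # 10.44 9.9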

D. Outlier and Bias Analysis

A final layer of analysis focuses explicitly on outliers and potential bias.

We examine whether assessments systematically differ by respondent attributes such as country of origin, epistemic orientation, or institutional vantage. For example, do expansionist respondents consistently assign more positive outlooks? Do state-policy experts weight institutional coherence more heavily than market-industrial respondents?
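A sketch of how such attribute-level differences could be tabulated, assuming a pandas DataFrame with illustrative column names:

    import pandas as pd

    def attribute_effects(df: pd.DataFrame):
        """df columns assumed: 'risk_orientation', 'institutional_vantage',
        'score' (0-20), and 'outlook' ('Positive' / 'Stable' / 'Negative')."""
        score_by_risk = df.groupby("risk_orientation")["score"].agg(["mean", "std", "count"])
        positive_share = (df.assign(is_positive=df["outlook"].eq("Positive"))
                            .groupby("risk_orientation")["is_positive"].mean())
        score_by_vantage = df.groupby("institutional_vantage")["score"].mean()
        return score_by_risk, positive_share, score_by_vantage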

Outliers are not treated as errors to be removed, but as signals to be understood. In some cases, they reveal genuine alternative readings of the same evidence. In others, they expose assumptions embedded in the rubric itself.

This analysis is essential for evaluating the synthetic simulation approach. If results collapse into uniformity, the panel is too homogeneous. If variance is unstructured and extreme, the personas are not sufficiently constrained. The goal is neither consensus nor noise, but interpretable diversity.

E. Planned Evaluation of Synthetic Simulation Efficacy

A critical next step in this research program is the explicit evaluation of the efficacy of synthetic simulation itself. While the current phase uses synthetic panels to support hypothesis development and methodological stress-testing, the subsequent phase will introduce human expert panels assessing the same capabilities, countries, and time horizons using identical rubrics and instruments.

This design creates an empirical foundation for comparing synthetic and human-generated assessments. Rather than treating alignment as a binary success criterion, the analysis will examine where synthetic simulations converge with human judgment, where they systematically diverge, and under what conditions those differences arise. Particular attention will be paid to distributions of scores, variance across respondent attributes, framing of justifications, and the structure of upward and downward triggers.
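As one possible instrument for that comparison, the sketch below contrasts synthetic and human score distributions for the same capability, country, and year; the choice of the Kolmogorov-Smirnov test is an assumption, not a commitment of the research design.

    import numpy as np
    from scipy.stats import ks_2samp

    def compare_score_distributions(synthetic_scores, human_scores):
        """Both inputs: 1-D arrays of 0-20 scores from the two panels."""
        synthetic = np.asarray(synthetic_scores, dtype=float)
        human = np.asarray(human_scores, dtype=float)
        stat, p_value = ks_2samp(synthetic, human)
        return {
            "mean_gap": float(synthetic.mean() - human.mean()),
            "std_ratio": float(synthetic.std(ddof=1) / human.std(ddof=1)),
            "ks_statistic": float(stat),
            "ks_p_value": float(p_value),
        }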

The introduction of human experts allows several methodological questions to be tested directly. First, it enables assessment of whether synthetic panels reproduce the shape of expert disagreement observed in real panels, rather than merely central tendencies. Second, it allows examination of whether certain persona attributes—such as epistemic orientation or institutional vantage—exert similar effects in both synthetic and human samples. Third, it provides a basis for evaluating whether dominance-based and frontier-oriented evaluation techniques behave consistently across synthetic and empirical inputs.

Importantly, this exercise is not framed as an attempt to “validate” synthetic simulation as a replacement for human expertise. Instead, it is designed to clarify the appropriate role and limits of synthetic methods within national capability research. Areas of close alignment may indicate where synthetic simulation can be safely used for early-stage exploration, scenario testing, or instrument refinement. Areas of divergence may highlight where lived experience, contextual intuition, or tacit knowledge remain essential.

By sequencing synthetic and human panels in this way, the framework itself is subjected to a higher standard of scrutiny. If synthetic simulations consistently reinforce fragile assumptions or mask important disagreements later revealed by human experts, those weaknesses can be addressed early. Conversely, if synthetic outputs reliably anticipate expert patterns, this strengthens confidence in their use as a methodological scaffold.

This planned efficacy exercise therefore serves two purposes simultaneously: it strengthens the empirical grounding of the National Capability Framework, and it contributes to a broader methodological understanding of when and how synthetic simulation can be responsibly used in expert-driven domains.

Closing Perspective

The synthetic panel described here is not a substitute for real experts. It is a methodological instrument designed to help refine frameworks, improve measurement techniques, and clarify where disagreement is structural rather than accidental.

For the Global Institute for National Capability, the value of this approach lies in what it makes visible: hidden assumptions, fragile constructs, and the conditions under which evaluation techniques succeed or fail. The ultimate test of national capability analysis will always rest on real data and real judgment. Synthetic panels simply allow that work to begin with greater discipline, transparency, and humility.