Virtual Trading Plus

Experiment settings

Trade settings (N = 6)
0
0
Risk preferences
33%
34%
33%
33%
34%
33%
AI endpoint Plan II · LLM + utility forms

Required for Plan II. At every period boundary, each Utility agent calls the LLM for a direct trading action (BUY_NOW, SELL_NOW, BID_1, BID_3, ASK_1, ASK_3, HOLD). The structured prompt includes market rules, decision-making principles, role-specific guidance, and the explicit closed-form utility function for the agent's risk type ($U_L$, $U_N$, or $U_A$). Fields are read fresh from the DOM on every run (no localStorage); if the key is empty, the Start button refuses to launch.

AI endpoint Plan III · LLM + risk label only

Required for Plan III. At every period boundary, each Utility agent calls the LLM for a direct trading action. The structured prompt supplies market rules, decision-making principles, and only a natural-language risk-preference label (risk-loving, risk-neutral, or risk-averse); no closed-form utility function is provided. Fields are read fresh from the DOM on every run (no localStorage); if the key is empty, the Start button refuses to launch.

Paper constants Dufwenberg, Lindqvist & Moore (2005), §I. Design
N = 6
Subjects per session
“At each session, six subjects participated in a sequence of four consecutive markets for an experimental asset.”§I, p. 1733
rounds / session = 4
Consecutive markets played by the same subjects
“A session involved four consecutive markets. In the following, we shall talk in terms of four different rounds. Note the distinction between rounds and periods; a round (being a market) consists of ten periods.”§I, p. 1733
T = 10
Asset life, in periods (per round)
“An asset's life span is ten periods.” §I, p. 1732
dividend ∈ {0, 20}¢
Per-period draw, equiprobable
“In each period, it pays a dividend of 0 or 20 U.S. cents, with equal probability.” §I, p. 1732
E[d] = 10¢
Expected dividend per period
“The expected dividend in each period is 10 cents (= ½ × 0 cents + ½ × 20 cents).” §I, footnote 5
FV = k × 10¢
Fundamental value, by backward induction
“With k periods remaining, the fundamental value is k × 10 cents.” §I, p. 1732
endowment A 200¢, 6 shares
One of two discrete starting bundles
“Before a market opened, half of the traders started with 200 cents and six assets, while each of the other traders started with 600 cents and two assets.”§I, p. 1733
endowment B 600¢, 2 shares
The other discrete starting bundle
Both bundles have an identical buy-and-hold value of 800¢ under the risk-neutral fundamental $\mathrm{FV}_1 = 100$¢ (200 + 6 × 100 = 600 + 2 × 100 = 800); the two types differ only in inventory/cash mix, so the split is distributional, not value-creating. §I, p. 1733
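The equal-value claim for the two bundles can be checked directly from the constants quoted above. The sketch below is illustrative only; the function and constant names are hypothetical and are not taken from the simulator source.

```javascript
// Constants quoted from DLM (2005), §I. Identifiers are hypothetical.
const EXPECTED_DIVIDEND_CENTS = 10; // 1/2 * 0 + 1/2 * 20
const ASSET_LIFE_PERIODS = 10;

// Risk-neutral value of holding a bundle from period 1 until the asset expires.
function buyAndHoldValue(cashCents, shares) {
  const fv1 = EXPECTED_DIVIDEND_CENTS * ASSET_LIFE_PERIODS; // 100 cents per share
  return cashCents + shares * fv1;
}

const bundleA = buyAndHoldValue(200, 6);
const bundleB = buyAndHoldValue(600, 2);
console.log(bundleA, bundleB); // 800 800
```

Both bundles evaluate to 200 + 6 × 100 = 600 + 2 × 100 = 800¢, so the endowment split changes only the cash/inventory mix.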
round-4 replacement R4-⅔ or R4-⅓
Two-treatment design
“In the fourth round, depending on treatment, two or four experienced subjects who had participated in the first three rounds were randomly selected, removed, and replaced by the same number of inexperienced subjects.” The paper labels these conditions by the fraction of experienced subjects remaining in round 4: R4-⅔ (four veterans + two fresh, shorthand T2) and R4-⅓ (two veterans + four fresh, shorthand T4); the R4-⅔ / R4-⅓ notation appears in the hypothesis row of Table 2.§I, p. 1733; Table 2, p. 1735
sessions = 10
Five per treatment (R4-⅔ and R4-⅓)
The multi-session batch runner in the DLM panel reproduces DLM's 10-session design by sequencing 5 × T2 (R4-⅔) then 5 × T4 (R4-⅓) through the simulator in one click; each session uses a fresh engine seed and a fresh two-type endowment draw.§I, Table 1
payoff Σ final cash + 500¢
Session payoff per subject
“Subjects were privately paid, in cash, the amount of their final cash holdings from each round. They were also paid a show-up fee of $5.” All four rounds count; shares held at the end of a round are worth nothing (the asset’s life span has ended).§I, p. 1735
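As a small worked example of the payoff rule, the sketch below sums final cash over the four rounds and adds the 500¢ show-up fee. The function name and the sample cash figures are hypothetical, chosen only to illustrate the arithmetic.

```javascript
const SHOW_UP_FEE_CENTS = 500; // the $5 show-up fee quoted above

// finalCashByRound: final cash holdings (in cents) at the end of each round.
// Shares left over at the end of a round contribute nothing.
function sessionPayoffCents(finalCashByRound) {
  const cashTotal = finalCashByRound.reduce((sum, cash) => sum + cash, 0);
  return cashTotal + SHOW_UP_FEE_CENTS;
}

// Hypothetical subject with four round-end cash balances.
const examplePayoff = sessionPayoffCents([820, 760, 905, 800]);
console.log(examplePayoff); // 3785
```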
Hidden Constants
ticks / period = 18
Agent decision rounds inside one period
DLM 2005 runs a continuous 2-minute z-Tree double auction per period; this simulator discretizes that window into 18 decision rounds (≈ one agent turn every 6.7 real-time seconds) so the engine loop can step deterministically. 18 is dense enough to reproduce the bubble-crash pattern while keeping the replay buffer compact.engine.js — period-boundary trigger
naive prior weight = 0.60
Belief blend for naive Utility agents
Weight on the agent's own prior when blending incoming peer messages: $V_i^{\text{post}} = 0.60 \cdot V_i^{\text{prior}} + 0.40 \cdot \bar{m}$, i.e. $w = 0.60$ in the Plan I formula (see Architecture Figure 3). Not specified by DLM 2005, which studies human subjects and has no belief-update model. Chosen so naive agents move noticeably toward peers without collapsing onto them.agents.js — UTILITY_DEFAULTS.naivePriorWeight
skeptical prior weight = 0.90
Belief blend for skeptical Utility agents
Same convex combination as the naive weight but $w = 0.90$: $V_i^{\text{post}} = 0.90 \cdot V_i^{\text{prior}} + 0.10 \cdot \bar{m}$, so a skeptical agent hears messages but is barely moved by them. Not in DLM 2005; introduced so the strategy cube contains a "listen but don't trust" archetype.agents.js — UTILITY_DEFAULTS.skepticalPriorWeight
adaptive weight cap = 0.50
Max one-period belief shift toward peers
Upper bound on the fraction of belief an adaptive agent can shift toward the trust-weighted message mean $\bar{m}$ in a single period: even with fully-trusted senders, $w \geq 0.50$ so $V_i^{\text{post}}$ is at most 50% $\bar{m}$ + 50% $V_i^{\text{prior}}$. Not in DLM 2005; guards against runaway over-update from a single high-trust period.agents.js — UTILITY_DEFAULTS.adaptiveWeightCap
valuation noise = ±3%
Per-tick uniform noise on the Utility-agent prior
Before any message update, each Utility agent draws $\text{prior}_t = \mathrm{FV}_t \cdot (1 + b_i + \varepsilon)$, $\varepsilon \sim \mathcal{U}[-n,\, n]$, $n = 0.03$ (see Architecture Figure 2). Not in DLM 2005; added so trade decisions do not degenerate to lockstep when agents with the same bias would otherwise start each tick from the identical prior. (agents.js, UTILITY_DEFAULTS.valuationNoise)
trust λ = 0.30
EMA learning rate for the pairwise trust update
Pairwise trust is updated as $\tau_{r \to s} \leftarrow (1 - \lambda)\,\tau_{r \to s} + \lambda \cdot \text{closeness}$, where closeness $= \max(0,\, 1 - |\hat{v}_s - \mathrm{VWAP}_t| / \mathrm{VWAP}_t)$. $\lambda = 0.30$ weights each new observation at 30%. Not in DLM 2005, which has no messaging layer; chosen for a balance between responsiveness and stability.engine.js — TrustTracker period close-out
passive fill probability = 0.30
$p_{\text{fill}}$ heuristic for scoring non-crossing quotes
Expected-utility score for a passive quote is $\mathrm{EU}(\alpha) = p_{\text{fill}} \cdot U(w_1) + (1 - p_{\text{fill}}) \cdot U(w_0)$ with $p_{\text{fill}} = 0.30$ (see Architecture Figure 2). A full model would estimate $p_{\text{fill}}$ from order-book state; this is a deliberate constant placeholder and is not proposed by DLM 2005.agents.js — UtilityAgent scoring loop
bias magnitude = 15%
Persistent over/under-valuation of biased Utility agents
Applied as $b_i = \delta_i \cdot \beta$ with $\beta = 0.15$; sign set by the per-slot bias direction $\delta_i \in \{-1, 0, +1\}$ (see Architecture Figure 2). Drives the biased U-agent slots in the default strategy cube (U2, U4, U5). Not in DLM 2005; chosen large enough to perturb the market without dominating the risk-preference split.agents.js — UTILITY_DEFAULTS.biasAmount
Session— / 10
Round1 / 4
Period1 / 10
Tick0
Price
Fundamental
Mispricing
Volume · period0

Agents Pre-run draft · editable before the simulation starts

Note

Cash
experimental-currency balance held by agent i at tick t, used to finance bids and grown by realized sales plus end-of-period dividend receipts. The pre-run editable value is the initial endowment; in Dufwenberg, Lindqvist & Moore (2005) subjects were seeded with either 200¢ or 600¢, while this simulator draws each slot uniformly from [800, 1200]¢.
Shares
holding of the finite-life asset at tick t (the pre-run value is the initial endowment). Each held share pays a random dividend drawn from {0, 20}¢ at the end of every trading period (DLM 2005), so the theoretical risk-neutral fundamental value at the start of period t is $\mathrm{FV}_t = \mathbb{E}[d] \cdot (T - t + 1)$. DLM endowment classes held 6 or 2 shares; this simulator draws from {2, 3, 4}.
Wealth
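mark-to-fundamental total wealth, defined as $c_i + q_i \cdot \mathrm{FV}_t$, or for Utility agents as $c_i + q_i \cdot \hat{V}_i$ (Lopez-Lira 2025). The Normalized Agent Utility plot is the utility of this running wealth divided by the agent's own initial utility.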
P&L
running change in total wealth relative to the initial endowment, reported in experimental cents. Positive values render in green, losses in red. Aggregated across all agents, P&L equals the cumulative dividends paid, so the market is zero-sum up to the dividend stream, as in the Smith–Suchanek–Williams design replicated by DLM.
Subj V
the Utility agent's private subjective valuation per share — the posterior $V_i^{\text{post}}$ from the active plan (Architecture Figure 3), updated each tick from $\text{prior}_t = \mathrm{FV}_t \cdot (1 + b_i + \varepsilon)$ via the Plan I/II/III belief-revision protocol. Corresponds to the valuation field in Lopez-Lira's (2025) TradeDecisionSchema.
Report
the valuation the Utility agent broadcasts to peers in its messages. Under communication strategy $\sigma_m = D$ (deceptive), the report $\hat{v}_m = \max(0,\, V_m \cdot \phi_m)$ departs from the private valuation $V_m$ via the distortion multiplier $\phi_m$ (see Architecture Figure 3, Plan I card); the lie-gap magnitude drives the trust EMA update $\tau_{r \to s} \leftarrow (1 - \lambda)\,\tau + \lambda \cdot \text{closeness}$ and the mean-lie-magnitude statistic in the Experiment Metrics table.
Last action
the most recent decision taken by agent i at tick t, displayed as a coloured tag on the card. In Plan I the agent selects $\alpha^\star_{i,t} = \arg\max_\alpha \mathrm{EU}(\alpha)$ over $\alpha_{i,t} \in \{\text{hold},\, \text{buy@}A_t,\, \text{sell@}B_t,\, \text{bid},\, \text{ask}\}$ scored under the risk-typed utility functional (see Architecture Figure 2).
Subtitle
for classic agents, the strategy class (Fundamentalist, Trend follower, Random ZI, Experienced) together with set membership (, , , ). For Utility agents, the risk preference and its functional: Risk-loving (convex, upside-seeking), Risk-neutral (linear expected value), and Risk-averse (concave, downside-sensitive).

Trade & Dividend Feed

    Figure 1

    Transaction Price Trajectory versus Risk-Neutral Fundamental Value

    risk-neutral fundamental value at the start of period t

    Tick-level transaction prices $P_t$ (accent line, one dot per executed trade) plotted against the deterministic step function $\mathrm{FV}_t$ (amber dashes). Alternating vertical bands delimit the ten trading periods. In the Dufwenberg, Lindqvist & Moore (2005) design a rational market should track the step line exactly; persistent excursions above it are the bubble, and the crash toward $\mathrm{FV}_T = \mathbb{E}[d]$ in the final period is the collapse.

    Note

    $P_t$: observed transaction price at tick t
    $\mathrm{FV}_t$: theoretical fundamental value at the start of period t
    $\mathbb{E}[d]$: expected per-period dividend (dividend drawn uniformly from {0, 20}¢)
    $T$: terminal period of the finite-life asset

    Order Book

    BIDS
    Price · Qty · Agent
      ASKS
      Price · Qty · Agent
        Figure 2

        Mispricing Magnitude and Price-to-Fundamental Ratio

        · absolute and relative mispricing

        Absolute departure of the observed price from the theoretical fundamental value, $|P_t - \mathrm{FV}_t|$, filled as a red area. In Lopez-Lira (2025) the same information is expressed as the price-to-fundamental ratio $P_t / \mathrm{FV}_t$: values above one mark an overvaluation regime, values below one mark an undervaluation regime, and a ratio ≈ 1 is consistent with rational pricing. The Experiment Metrics panel reports the normalized-deviation and amplitude statistics derived from this series.

        Note

        $|P_t - \mathrm{FV}_t|$: absolute mispricing at tick t
        $P_t / \mathrm{FV}_t$: price-to-fundamental ratio (Lopez-Lira 2025)
        Figure 3

        Trade Volume per Period

        shares transacted in period t

        Sum of share quantities exchanged within each trading period. High and persistent bars indicate active speculation; the classic Smith–Suchanek–Williams bubble is typically associated with a volume peak in the inflation phase followed by a cliff as the asset approaches expiry.

        Note

        total share volume traded in period t
        order quantity of a single executed trade
        Figure 4

        Transaction Density over Price × Period

        two-dimensional trade histogram

        Two-dimensional histogram of share quantity binned by transaction price (vertical axis) and trading period (horizontal axis). Warm cells concentrate the market's liquidity. Comparing the heat cloud against the downward-sloping fundamental staircase reveals whether the market is trading near rational value or persistently above it.

        Note

        cumulative share volume in the (price, period) bin
        Figure 5

        Agent Action Timeline

        per-tick agent decision
        Bid Ask Hold Executed

        One row per agent, one mark per decision. Column colour encodes the action type and a small accent dot below the mark records whether the submitted order was filled on the same tick. Contiguous green runs identify accumulators; contiguous red runs identify distributors; holds are the market's waiting population.

        Note

        action taken by agent i at tick t
        Figure 6

        Subjective Valuation: True versus Reported

        · lie gap = private belief versus broadcast claim

        Solid lines trace each Utility agent's private belief $V_i$ over time. Filled dots mark broadcast messages carrying a reported valuation $\hat{v}_m$; deceptive reports are ringed red and connected to the sender's true belief by a dotted segment, and the vertical distance between ring and line is the lie gap. The amber step line is the fundamental value for reference.

        Note

        agent i's private (true) subjective valuation at tick t
        valuation reported in a broadcast message
        lie gap for deceptive messages
        Figure 7

        Normalized Agent Utility over Time

        risk-adjusted wealth, normalized to initial endowment

        Per-agent expected utility evaluated at the running wealth $w_{i,t} = c_{i,t} + q_{i,t} \cdot \mathrm{FV}_t$, divided by the agent's own initial utility so every trajectory starts at 1.0. Lines above the dashed baseline indicate positive risk-adjusted P&L; lines below indicate loss. The risk preference attached to each agent (convex, linear, concave) determines how aggressively a given wealth change is penalised or rewarded.

        Note

        risk-typed utility: $U_L(w) = (w/w_0)^2$, $U_N(w) = w/w_0$, $U_A(w) = \sqrt{w/w_0}$
        mark-to-fundamental wealth at tick t
        Figure 8

        Asset Ownership over Time

        shares held; total supply conserved

        Stacked area of each agent's inventory across ticks. Because the double auction conserves shares, the total height is always the aggregate endowment $Q = \sum_i q_i$. Widening bands identify agents who are accumulating, shrinking bands identify distributors, and any dramatic redistribution in the last few periods is typically the experienced trader liquidating before the asset expires worthless.

        Note

        shares held by agent i at tick t
        total shares outstanding (conserved across time)
        Figure 9

        Broadcast Message Log

        per-tick public broadcast to all other agents
        Buy signal Sell signal Hold signal Deceptive

        One dot per broadcast message, placed on the sender's row at the tick the message was sent. Dot colour encodes the signal (buy/sell/hold) and a red ring flags messages whose reported valuation diverges sufficiently from the sender's private belief to be classified as deceptive by the logger. Reading a column shows the instantaneous rumour mill; reading a row shows each agent's rhetorical stance over time.

        Note

        broadcast from agent i to the population at tick t
        Figure 10

        Pairwise Trust Matrix

        exponential-moving-average update

        Heatmap of receiver-to-sender trust values in [0, 1]. The diagonal is masked. Each off-diagonal cell records how well sender s's recent valuation claims aligned with the period's volume-weighted average price, as seen by receiver r. Warm rows identify agents who tend to trust broadly; warm columns identify agents whose claims the population finds credible.

        Note

        $\tau_{r \to s}$: trust held by receiver r in sender s
        $\lambda$: trust learning rate (exponential-moving-average weight)
        closeness: 1 − |claim − VWAP| / VWAP, clipped to [0, 1]
        Table 1

        Market-Quality Statistics (Current Session)

        Quantitative summary in the notation of Dufwenberg, Lindqvist & Moore (2005) and Lopez-Lira (2025). Haessel R² measures fit of the per-period mean price to fundamental value; the two normalized deviations capture total and average mispricing per share outstanding; amplitude is the peak-to-trough excursion of the mean-price residual normalized by the initial fundamental; turnover is the total shares traded divided by shares outstanding. The lower group reports allocative efficiency, aggregate welfare, and the deception statistics unique to the Utility population.

        Table 2

        10-Session Batch Results

        Per-round market-quality metrics across the 10-session DLM batch (5 × first treatment + 5 × second treatment). Each row is labelled Rr_Ss (Round r of Session s). dev = mean absolute deviation |P − FV| in ¢; turn = shares traded / shares outstanding; vol = total shares exchanged; payoff = aggregate agent cash at round end.

        Replay & Trace Inspector

        Live — tick 0

        Decisions recorded at this tick

        System Design

        Figures 1–4: the four-stage pipeline from asset fundamentals through information aggregation to price discovery. A single Start press runs 10 sessions ($R = 4$ rounds each, 720 ticks/session) with per-round data collected as $\texttt{R\{r\}\_S\{s\}}$.
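The batch dimensions stated above (10 sessions, 4 rounds, 10 periods, 18 ticks per period) and the per-round label format can be laid out as a small plan. This is a sketch under those stated constants; `buildBatchPlan` and the field names are hypothetical and do not reflect the simulator's actual batch runner.

```javascript
// Constants taken from the text; identifiers are illustrative only.
const SESSIONS = 10;
const ROUNDS_PER_SESSION = 4;
const PERIODS_PER_ROUND = 10;
const TICKS_PER_PERIOD = 18;

const ticksPerSession = ROUNDS_PER_SESSION * PERIODS_PER_ROUND * TICKS_PER_PERIOD;

function buildBatchPlan() {
  const plan = [];
  for (let s = 1; s <= SESSIONS; s++) {
    // First five sessions run T2 (two subjects replaced in round 4),
    // last five run T4 (four subjects replaced in round 4).
    const treatment = s <= SESSIONS / 2 ? 'T2' : 'T4';
    const labels = [];
    for (let r = 1; r <= ROUNDS_PER_SESSION; r++) {
      labels.push(`R${r}_S${s}`); // per-round data key, e.g. R4_S1
    }
    plan.push({
      session: s,
      treatment,
      replacedInRound4: treatment === 'T2' ? 2 : 4,
      labels,
    });
  }
  return plan;
}

const batchPlan = buildBatchPlan();
console.log(ticksPerSession);        // 720
console.log(batchPlan[0].labels[3]); // R4_S1
```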

        Figure 1 — Agent Decision Pipeline
        Asset Fundamentals
        $\mathrm{FV}_t = \mathbb{E}[d]\cdot(T - t + 1)$,   $d \in \{0,20\}$¢,   $T = 10$
        Prior Formation
        $\text{prior}_t = \mathrm{FV}_t\!\cdot\!(1 + b_i + \varepsilon)$
        $\mathrm{EU}(\alpha) = p_{\text{fill}}\!\cdot\!U(w_1) + (1\!-\!p_{\text{fill}})\!\cdot\!U(w_0)$
        Information Aggregation
        Plan I · algorithmic
        Plan II · LLM + forms
        Plan III · LLM + label
        Price Discovery
        CDA: $\text{bid} \geq \text{ask} \Rightarrow$ trade  |  $R^2 \;\cdot\; \mathrm{ND} \;\cdot\; A \;\cdot\; \mathrm{TO}$

        Fundamental Value

        Market constant

        $$ \mathrm{FV}_t \;=\; \mathbb{E}[d] \cdot (T - t + 1), \qquad \mathbb{E}[d] = \tfrac{1}{2}(0) + \tfrac{1}{2}(20) = 10\text{¢} $$

        Risk-neutral price of one share at the start of period $t$: $T = 10$ periods of remaining asset life, $d \in \{0, 20\}$¢ i.i.d. dividend draws, yielding a deterministic staircase from $\mathrm{FV}_1 = 100$¢ to $\mathrm{FV}_{10} = 10$¢ that resets at every round boundary. The DLM (2005) market substrate lives underneath every plan.
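The staircase can be generated in a few lines. The sketch below follows the formula $\mathrm{FV}_t = \mathbb{E}[d] \cdot (T - t + 1)$ as stated; the function names are hypothetical.

```javascript
const T = 10;                  // asset life in periods
const DIVIDEND_DRAWS = [0, 20]; // equiprobable per-period dividend, in cents

function expectedDividend(draws = DIVIDEND_DRAWS) {
  return draws.reduce((sum, d) => sum + d, 0) / draws.length;
}

// Risk-neutral fundamental value at the start of period t (1-indexed).
function fundamentalValue(t, horizon = T) {
  if (!Number.isInteger(t) || t < 1 || t > horizon) {
    throw new RangeError(`period must be an integer in 1..${horizon}`);
  }
  return expectedDividend() * (horizon - t + 1);
}

const staircase = Array.from({ length: T }, (_, i) => fundamentalValue(i + 1));
console.log(staircase); // [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
```

Because the value depends only on the period index within a round, calling `fundamentalValue(1)` at each round boundary reproduces the reset to 100¢.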

        Figure 2 — Prior Elicitation and Expected-Utility Scoring
        Prior Formation
        $T = 10$ periods   $d \in \{0,20\}$¢   $\mathbb{E}[d] = 10$¢
        $\text{prior}_t = \mathrm{FV}_t\!\cdot\!(1 + b_i + \varepsilon)$
        Expected-Utility Maximization
        $U_L(w) = (w/w_0)^2$
        $U_N(w) = w/w_0$
        $U_A(w) = \sqrt{w/w_0}$
        $\mathrm{EU}(\alpha) = p_{\text{fill}}\!\cdot\!U(w_1) + (1 - p_{\text{fill}})\!\cdot\!U(w_0)$
        $\alpha^\star_{i,t} = \arg\max_\alpha \mathrm{EU}(\alpha)$,   $\alpha \in \{\text{hold},\, \text{buy@}A_t,\, \text{sell@}B_t,\, \text{bid},\, \text{ask}\}$

        Prior Elicitation

        Shared across plans

        $$ \text{prior}_t \;=\; \max\!\bigl(0,\; \mathrm{FV}_t \cdot (1 + b_i + \varepsilon)\bigr) $$
        $$ b_i \;=\; \delta_i \cdot \beta, \quad \delta_i \in \{-1,\, 0,\, +1\}, \quad \beta = 0.15 $$
        $$ \varepsilon \sim \mathcal{U}[-n,\, n], \quad n = 0.03 $$

        $\mathrm{FV}_t$ is the fundamental value at period $t$ (see Figure 1). $b_i$ is a persistent per-agent bias: $\delta_i$ is the bias direction drawn at birth ($+1$ = optimistic, $-1$ = pessimistic, $0$ = unbiased) and $\beta$ is the bias magnitude. $\varepsilon$ is i.i.d. per-tick noise drawn from a uniform distribution on $[-n, n]$ via the seeded PRNG. When bias and noise are disabled, $\text{prior}_t = \mathrm{FV}_t$ exactly. The result is clamped to $\geq 0$. All three plans start from this prior and update it to $V_i^{\text{post}}$ over the course of a period.
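A minimal sketch of the prior draw, assuming a small seeded generator. The text does not say which PRNG the simulator uses; mulberry32 is substituted here purely for illustration, and `drawPrior` is a hypothetical name.

```javascript
// mulberry32: a compact seeded PRNG returning floats in [0, 1).
// ASSUMPTION: stand-in for the simulator's seeded PRNG, which is not specified.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// prior_t = max(0, FV_t * (1 + b_i + eps)), b_i = delta_i * beta, eps ~ U[-n, n].
function drawPrior(fv, biasDirection, rng, { beta = 0.15, noise = 0.03 } = {}) {
  const bias = biasDirection * beta;
  const eps = (rng() * 2 - 1) * noise; // uniform on [-noise, noise)
  return Math.max(0, fv * (1 + bias + eps));
}

const rng = mulberry32(42);
const optimisticPrior = drawPrior(100, +1, rng);
const unbiasedNoNoise = drawPrior(100, 0, rng, { noise: 0 });
console.log(unbiasedNoNoise); // 100
```

With bias direction +1 and FV = 100¢, the draw always falls between 112¢ and 118¢, and reusing the same seed reproduces the same sequence.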

        Expected-Utility Scoring

        Shared across plans

        $$ \mathrm{EU}(\alpha) \;=\; p_{\text{fill}} \cdot U(w_1) \;+\; (1 - p_{\text{fill}}) \cdot U(w_0), \qquad \alpha^\star_{i,t} = \arg\max_\alpha \mathrm{EU}(\alpha) $$
        $$ U_L(w) = (w/w_0)^2,\qquad U_N(w) = w/w_0,\qquad U_A(w) = \sqrt{w/w_0} $$

        $\hat{V}_i \equiv V_i^{\text{post}}$ is agent $i$'s subjective valuation, the posterior output of the active plan (see Figure 3). $c_i$ is cash; $q_i$ is share inventory. $w_0 = c_i + q_i \cdot \hat{V}_i$ is current wealth; $w_1 = (c_i \mp p_{\text{order}}) + (q_i \pm 1) \cdot \hat{V}_i$ is wealth after a hypothetical fill at order price $p_{\text{order}}$ (upper signs for a buy, which spends cash and adds a share; lower signs for a sell). For crossing actions (buy@$A_t$ and sell@$B_t$), $p_{\text{fill}} = 1$ (deterministic); for passive quotes (bid, ask), $p_{\text{fill}} = 0.30$ (tunable). In Plan I, agent $i$ maximizes EU over the five-element action set $\alpha_{i,t} \in \{\text{hold},\, \text{buy@}A_t,\, \text{sell@}B_t,\, \text{bid},\, \text{ask}\}$, where $A_t$ is the best ask and $B_t$ is the best bid on the order book at tick $t$. In Plans II/III the LLM selects directly from $\{\text{BUY\_NOW, SELL\_NOW, BID\_1, BID\_3, ASK\_1, ASK\_3, HOLD}\}$. Risk-loving / neutral / averse pick $U_L$, $U_N$, or $U_A$.
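The Plan I scoring rule can be sketched as follows. A buy lowers cash by the order price and adds one share; a sell does the reverse. The function name, the input field names, and the feasibility checks (enough cash to buy, at least one share to sell) are assumptions made for this illustration and are not taken from agents.js.

```javascript
const UTILITY = {
  loving: (w, w0) => (w / w0) ** 2,
  neutral: (w, w0) => w / w0,
  averse: (w, w0) => Math.sqrt(w / w0),
};

// Scores the five-element action set and returns the argmax.
// bidPrice / askPrice are the agent's candidate passive quote prices (hypothetical inputs).
function chooseAction({ cash, shares, valuation, bestAsk, bestBid,
                        bidPrice, askPrice, riskType, pFillPassive = 0.30 }) {
  const U = UTILITY[riskType];
  const w0 = cash + shares * valuation;
  const afterBuy = (p) => (cash - p) + (shares + 1) * valuation;
  const afterSell = (p) => (cash + p) + (shares - 1) * valuation;
  const eu = (pFill, w1) => pFill * U(w1, w0) + (1 - pFill) * U(w0, w0);

  const scores = { hold: U(w0, w0) }; // always 1 under the w/w0 normalization
  // ASSUMPTION: infeasible actions are simply left out of the score table.
  if (bestAsk != null && cash >= bestAsk) scores.buy = eu(1, afterBuy(bestAsk));
  if (bestBid != null && shares >= 1) scores.sell = eu(1, afterSell(bestBid));
  if (bidPrice != null && cash >= bidPrice) scores.bid = eu(pFillPassive, afterBuy(bidPrice));
  if (askPrice != null && shares >= 1) scores.ask = eu(pFillPassive, afterSell(askPrice));

  let best = 'hold';
  for (const [action, score] of Object.entries(scores)) {
    if (score > scores[best]) best = action; // ties keep the earlier action
  }
  return { best, scores };
}

const decision = chooseAction({
  cash: 500, shares: 2, valuation: 80,
  bestAsk: 70, bestBid: 60, bidPrice: 65, askPrice: 90,
  riskType: 'neutral',
});
console.log(decision.best); // buy
```

In this example the agent values a share at 80¢ and the best ask is 70¢, so a certain fill at 70¢ outscores a passive bid at 65¢ that fills only 30% of the time.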

        Figure 3 — Information Aggregation Protocols
        Belief Revision Protocols
        Plan I · algorithmic
        $V^{\text{post}} = w \!\cdot\! V^{\text{prior}} + (1\!-\!w)\!\cdot\!\bar{m}$
        $w = 0.6 + 0.1\,\min(3,k)$
        Plan II · LLM + forms
        prompt $\supseteq \{U_L, U_N, U_A\}$
        $\alpha^\star_{i,t} \!\leftarrow\! \text{LLM}(\pi^{\text{II}})$
        Plan III · LLM + label
        prompt $\supseteq$ risk label only
        $\alpha^\star_{i,t} \!\leftarrow\! \text{LLM}(\pi^{\text{III}})$
        Trust EMA   $\tau_{r \to s} \!\leftarrow\! (1\!-\!\lambda)\,\tau + \lambda \!\cdot\! \text{closeness}$   $\lambda\!=\!0.30$

        Plan I — Algorithmic Posterior

        Deterministic

        $$ V_i^{\text{post}} \;=\; w \cdot V_i^{\text{prior}} \;+\; (1 - w) \cdot \underbrace{\tfrac{1}{|M|}\sum_{m \in M} \hat v_m}_{\bar{m}\;\text{(peer-message mean)}}, \qquad w \;=\; 0.6 + 0.1\,\min(3,\,k_i) $$
        $$ \hat v_m \;=\; \max\!\bigl(0,\; V_m \cdot \phi_m\bigr), \qquad \sigma_m \in \{H,\, B,\, D\} $$
        $$ \phi_m \;=\; \begin{cases} 1 + \mathcal{U}[-h,\,h] & \sigma_m = H \\[4pt] 1 + \delta_m \cdot \gamma & \sigma_m = B \\[4pt] \kappa^+\,\mathbf{1}_{q_m > q_m^0} \;+\; \kappa^-\,\mathbf{1}_{q_m < q_m^0} \;+\; (1 + \delta_m \cdot \gamma)\,\mathbf{1}_{q_m = q_m^0} & \sigma_m = D \end{cases} $$
        $$ h = 0.02, \quad \gamma = 0.10, \quad \kappa^+ = 1.25, \quad \kappa^- = 0.75 $$

        $V_i^{\text{prior}} = \text{prior}_t$ from Figure 2 (the bias- and noise-adjusted fundamental). $\hat{v}_m$ is the claimed valuation that peer $m$ broadcasts, derived from $m$'s subjective valuation $V_m$ via a distortion multiplier $\phi_m$ determined by $m$'s communication strategy $\sigma_m$: $H$ (truthful) adds small uniform jitter of half-width $h$; $B$ (biased) applies a fixed-sign tilt of magnitude $\gamma$ with direction $\delta_m \in \{-1, 0, +1\}$; $D$ (deceptive) scales the claim up by $\kappa^+ = 1.25$ when inventory $q_m$ exceeds initial endowment $q_m^0$ (to inflate price for selling) and down by $\kappa^- = 0.75$ when below (to depress price for buying), with a bias-like fallback at $q_m = q_m^0$. $\bar{m}$ averages the $\hat{v}_m$ over the set $M$ of non-self messages received this period. Weight $w$ ramps from $0.60$ to $0.90$ over the first three rounds of experience ($k_i$ starting at $0$, incremented at each round boundary), so novices listen freely and veterans rely on their own prior. If $|M| = 0$, $V_i^{\text{post}} = V_i^{\text{prior}}$ (no blend). No network calls are made, so the run is deterministic under the seeded PRNG.
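The Plan I update can be sketched as three small functions: the distortion multiplier $\phi_m$, the claimed valuation $\hat{v}_m$, and the experience-weighted blend. Function names and the sender object shape are hypothetical; the numeric parameters are those given in the formulas above.

```javascript
const PARAMS = { h: 0.02, gamma: 0.10, kappaPlus: 1.25, kappaMinus: 0.75 };

// phi_m for strategies H (truthful), B (biased), D (deceptive).
function distortion(sender, rng = Math.random, p = PARAMS) {
  const { strategy, biasDirection = 0, shares, initialShares } = sender;
  switch (strategy) {
    case 'H': return 1 + (rng() * 2 - 1) * p.h;
    case 'B': return 1 + biasDirection * p.gamma;
    case 'D':
      if (shares > initialShares) return p.kappaPlus;  // long: talk the price up
      if (shares < initialShares) return p.kappaMinus; // short: talk it down
      return 1 + biasDirection * p.gamma;              // bias-like fallback
    default: throw new Error(`unknown strategy: ${strategy}`);
  }
}

function claimedValuation(sender, rng = Math.random) {
  return Math.max(0, sender.valuation * distortion(sender, rng));
}

// w = 0.6 + 0.1 * min(3, k): weight on the agent's own prior.
function priorWeight(roundsPlayed) {
  return 0.6 + 0.1 * Math.min(3, roundsPlayed);
}

function posterior(prior, claims, roundsPlayed) {
  if (claims.length === 0) return prior; // no messages, no blend
  const mean = claims.reduce((s, v) => s + v, 0) / claims.length;
  const w = priorWeight(roundsPlayed);
  return w * prior + (1 - w) * mean;
}

const inflatedClaim = claimedValuation(
  { strategy: 'D', valuation: 100, shares: 5, initialShares: 3 });
const novicePosterior = posterior(100, [120, 80, 130], 0);
console.log(inflatedClaim);   // 125
console.log(novicePosterior); // approximately 104
```

A novice ($k_i = 0$) with prior 100¢ who hears claims averaging 110¢ moves to about 104¢; the same agent after three rounds would move only to about 101¢.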

        Pairwise Trust Dynamics

        Social learning

        $$ \tau_{r \to s} \;\leftarrow\; (1 - \lambda)\, \tau_{r \to s} \;+\; \lambda \cdot \underbrace{\max\!\Bigl(0,\,1 - \tfrac{|\hat v_s - \mathrm{VWAP}_t|}{\mathrm{VWAP}_t}\Bigr)}_{\text{closeness}_{r,s}}, \quad \lambda = 0.30 $$

        $\lambda$ is the EMA learning rate (not to be confused with the action variable $\alpha_{i,t}$). Receiver $r$'s trust in sender $s$ is reinforced when $s$'s claimed valuation $\hat{v}_s$ (see Plan I card) tracks the period's volume-weighted average price $\mathrm{VWAP}_t = \sum_j p_j q_j / \sum_j q_j$, where $p_j$ and $q_j$ are the price and quantity of trade $j$ in period $t$. Closeness is clipped to $[0, 1]$; $\tau$ is initialized at $0.5$ (neutral) with self-trust fixed at $1.0$. Logged on every plan, read by Plan II/III prompts for context.
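A minimal sketch of the trust update, following the EMA and closeness formulas above. The function names are hypothetical, and leaving trust unchanged when a period has no trades is an assumption of this sketch rather than documented behaviour.

```javascript
// Volume-weighted average price over one period's trades: sum(p*q) / sum(q).
function vwap(trades) {
  const volume = trades.reduce((s, t) => s + t.qty, 0);
  if (volume === 0) return null;
  return trades.reduce((s, t) => s + t.price * t.qty, 0) / volume;
}

// closeness = max(0, 1 - |claim - VWAP| / VWAP); never exceeds 1 by construction.
function closeness(claim, vwapPrice) {
  return Math.max(0, 1 - Math.abs(claim - vwapPrice) / vwapPrice);
}

// tau <- (1 - lambda) * tau + lambda * closeness
function updateTrust(tau, claim, vwapPrice, lambda = 0.30) {
  // ASSUMPTION: with no trades (no VWAP) the trust value is carried forward.
  if (vwapPrice == null || vwapPrice <= 0) return tau;
  return (1 - lambda) * tau + lambda * closeness(claim, vwapPrice);
}

const periodVwap = vwap([{ price: 100, qty: 1 }, { price: 110, qty: 3 }]);
const rewarded = updateTrust(0.5, 107.5, periodVwap); // claim matched VWAP exactly
const penalized = updateTrust(0.5, 300, periodVwap);  // claim far off, closeness 0
console.log(periodVwap); // 107.5
```

Starting from the neutral value 0.5, a perfectly accurate claim lifts trust to about 0.65 and a wildly inaccurate one drops it to about 0.35.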

        Plan II — LLM Posterior with Utility Forms

        LLM-based

        $$ \alpha^\star_{i,t} \;\leftarrow\; \mathrm{LLM}\bigl(\pi_i^{\text{II}}\bigr), \qquad \pi_i^{\text{II}} \supset \{\,U_L(w) = (w/w_0)^2,\; U_N(w) = w/w_0,\; U_A(w) = \sqrt{w/w_0}\,\} $$
        $$ \alpha^\star_{i,t} \in \{\text{BUY\_NOW},\,\text{SELL\_NOW},\,\text{BID\_1},\,\text{BID\_3},\,\text{ASK\_1},\,\text{ASK\_3},\,\text{HOLD}\} $$

        $\pi_i^{\text{II}}$ is a structured prompt injected with: market rules ($T$, dividend structure), the agent's private state ($c_i$, $q_i$, $\hat{V}_i$, $w_0 = c_i + q_i \cdot \hat{V}_i$), the explicit utility formula for the agent's risk type ($U_L$, $U_N$, or $U_A$ with $w_0$ as the normalization base), current book state (best bid/ask), peer trust scores $\tau_{r \to s}$, and received messages $\{\hat{v}_m\}$. The LLM returns a discrete action $\alpha^\star_{i,t}$ from the seven-element set; the engine translates it to an order at the current book price. Fire-and-forget: a failed or invalid action falls back to the Plan I EU evaluation.
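The fallback path can be sketched as a validation step around the LLM reply. This assumes the reply arrives as a bare action token in a string; the actual response format and the function names below are not specified in the text and are hypothetical.

```javascript
const ALLOWED_ACTIONS = new Set([
  'BUY_NOW', 'SELL_NOW', 'BID_1', 'BID_3', 'ASK_1', 'ASK_3', 'HOLD',
]);

// Returns the LLM's action if it is one of the seven allowed tokens;
// otherwise calls the supplied fallback (e.g. the Plan I EU evaluation).
function resolveAction(llmReply, fallback) {
  const token = typeof llmReply === 'string' ? llmReply.trim().toUpperCase() : '';
  if (ALLOWED_ACTIONS.has(token)) {
    return { action: token, source: 'llm' };
  }
  return { action: fallback(), source: 'fallback' };
}

// Hypothetical stand-in for the Plan I evaluation.
const planOneFallback = () => 'HOLD';

const accepted = resolveAction(' bid_3 ', planOneFallback);
const rejected = resolveAction('It depends on the spread', planOneFallback);
console.log(accepted.action, accepted.source); // BID_3 llm
console.log(rejected.action, rejected.source); // HOLD fallback
```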

        Plan III — LLM Posterior with Risk Label

        LLM-based

        $$ \alpha^\star_{i,t} \;\leftarrow\; \mathrm{LLM}\bigl(\pi_i^{\text{III}}\bigr), \qquad \pi_i^{\text{III}} \supset \{\text{risk-loving} \mid \text{risk-neutral} \mid \text{risk-averse}\} $$

        Identical wiring to Plan II but the prompt omits the closed-form $U_L / U_N / U_A$ expressions and only names the risk-preference category. Isolates the effect of giving the LLM an explicit functional form versus just a label. Same seven-action output set.

        Figure 4 — Price Discovery and Market-Quality Diagnostics
        Continuous Double Auction
        Order book (price–time priority)   $\text{bid} \geq \text{ask} \Rightarrow$ trade at resting price
        Haessel $R^2$
        fit of $\bar{p}_t$ to $\mathrm{FV}_t$
        Norm. deviation
        $\Sigma|p_j\!-\!\mathrm{FV}_{t(j)}|\!\cdot\!q_j\,/\,Q$
        Amplitude
        peak–trough / $\mathrm{FV}_1$
        TO · AE
        $\Sigma q_j/Q$ · $\sum V_i q_i\,/\,V_{\max} Q$
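The matching rule named in Figure 4 (price–time priority, bid ≥ ask ⇒ trade at the resting price) can be sketched with a toy order book. This sketch assumes single-share orders and does not prevent self-trades; the class and method names are hypothetical and are not the engine's.

```javascript
class ToyOrderBook {
  constructor() {
    this.bids = []; // sorted: highest price first, then earliest arrival
    this.asks = []; // sorted: lowest price first, then earliest arrival
    this.seq = 0;
  }

  // Submits a one-share limit order; returns a trade or null if it rests.
  submit(side, price, agent) {
    const opposite = side === 'bid' ? this.asks : this.bids;
    const best = opposite[0];
    const crosses = best !== undefined &&
      (side === 'bid' ? price >= best.price : price <= best.price);

    if (crosses) {
      opposite.shift();
      // The trade executes at the resting order's price, not the incoming one.
      return {
        price: best.price,
        buyer: side === 'bid' ? agent : best.agent,
        seller: side === 'bid' ? best.agent : agent,
      };
    }

    const own = side === 'bid' ? this.bids : this.asks;
    own.push({ price, agent, seq: this.seq++ });
    own.sort((a, b) =>
      (side === 'bid' ? b.price - a.price : a.price - b.price) || a.seq - b.seq);
    return null;
  }
}

const book = new ToyOrderBook();
book.submit('ask', 105, 'A');
book.submit('ask', 100, 'B');
book.submit('bid', 95, 'C');               // does not cross, rests
const trade = book.submit('bid', 102, 'D'); // crosses the best ask (100)
console.log(trade); // { price: 100, buyer: 'D', seller: 'B' }
```

The incoming bid at 102 trades at 100 because the resting ask sets the price; the ask at 105 stays on the book.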

        Mispricing Measures

        Diagnostic

        $$ R^2_{\text{Haessel}} \;=\; 1 - \frac{\sum_t (\bar p_t - \mathrm{FV}_t)^2}{\sum_t (\bar p_t - \bar{\bar{p}})^2}, \qquad \mathrm{ND} = \frac{\sum_j |p_j - \mathrm{FV}_{t(j)}| \cdot q_j}{Q} $$

        $\bar{p}_t$ is the mean trade price in global period $t$ (averaged over all trades in that period); $\bar{\bar{p}}$ is the grand mean of $\bar{p}_t$ across all traded periods; $\mathrm{FV}_t$ is the fundamental value at period $t$. Haessel $R^2$ measures how closely per-period mean prices fit the fundamental staircase — it can be negative if mispricing exceeds the sample variance of $\bar{p}_t$. ND sums the absolute deviation $|p_j - \mathrm{FV}_{t(j)}|$ of each individual trade $j$ weighted by its quantity $q_j$, divided by total shares outstanding $Q = \sum_i q_i$; here $t(j)$ is the period in which trade $j$ occurred.
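Both mispricing measures can be computed from a flat list of trades. The sketch below follows the definitions above, using an unweighted mean of trade prices within each period; the function name and the trade record shape are hypothetical.

```javascript
// trades: [{ period, price, qty }]; fvAt(period) returns FV for that period.
function mispricingMeasures(trades, fvAt, sharesOutstanding) {
  const pricesByPeriod = new Map();
  let weightedAbsDeviation = 0;

  for (const { period, price, qty } of trades) {
    if (!pricesByPeriod.has(period)) pricesByPeriod.set(period, []);
    pricesByPeriod.get(period).push(price);
    weightedAbsDeviation += Math.abs(price - fvAt(period)) * qty;
  }

  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const periodMeans = [...pricesByPeriod.entries()]
    .map(([period, prices]) => ({ period, pBar: mean(prices) }));
  const grandMean = mean(periodMeans.map((m) => m.pBar));

  const ssResidual = periodMeans
    .reduce((s, m) => s + (m.pBar - fvAt(m.period)) ** 2, 0);
  const ssTotal = periodMeans
    .reduce((s, m) => s + (m.pBar - grandMean) ** 2, 0);

  return {
    // Undefined (NaN) when mean prices do not vary across traded periods.
    haesselR2: ssTotal > 0 ? 1 - ssResidual / ssTotal : NaN,
    normalizedDeviation: weightedAbsDeviation / sharesOutstanding,
  };
}

const fvAt = (t) => 10 * (10 - t + 1);
const measures = mispricingMeasures(
  [
    { period: 1, price: 110, qty: 1 },
    { period: 2, price: 100, qty: 2 },
    { period: 3, price: 90, qty: 1 },
  ],
  fvAt,
  20, // hypothetical total shares outstanding
);
console.log(measures.haesselR2);           // -0.5
console.log(measures.normalizedDeviation); // 2
```

The example prices sit a constant 10¢ above fundamental value, which gives a residual sum of squares (300) larger than the variation of the mean prices (200) and hence a negative $R^2$.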

        Volume and Efficiency Measures

        Diagnostic

        $$ A \;=\; \frac{\max_t (\bar p_t - \mathrm{FV}_t) - \min_t (\bar p_t - \mathrm{FV}_t)}{\mathrm{FV}_1}, \qquad \mathrm{TO} \;=\; \frac{\sum_j q_j}{Q}, \qquad \mathrm{AE} \;=\; \frac{\sum_i \hat{V}_i \cdot q_i}{\hat{V}_{\max} \cdot Q} $$

        $A$ (amplitude) measures the peak-to-trough excursion of the mean-price residual $\bar{p}_t - \mathrm{FV}_t$, normalized by the initial fundamental $\mathrm{FV}_1 = \mathbb{E}[d] \cdot T = 100$¢. $\mathrm{TO}$ (turnover) sums the quantity $q_j$ across all trades $j$ and divides by total shares outstanding $Q = \sum_i q_i$ (conserved under double-auction trades); a value of $1.0$ means traded volume equals the shares outstanding, i.e. each share changed hands once on average. $\mathrm{AE}$ (allocative efficiency) is the ratio of realized to optimal aggregate valuation: $\hat{V}_i$ is agent $i$'s subjective valuation ($= V_i^{\text{post}}$), $q_i$ is agent $i$'s current inventory, and $\hat{V}_{\max} = \max_i \hat{V}_i$; the optimal allocation assigns all $Q$ shares to the highest-valuation agent. All three are reported in the Market-Quality Statistics panel and the batch results table.
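The three statistics reduce to short array computations. The sketch below follows the formulas above; the function names and the sample inputs are hypothetical.

```javascript
// A = (max residual - min residual) / FV_1, residual_t = pBar_t - FV_t.
function amplitude(meanPrices, fundamentals, fv1) {
  const residuals = meanPrices.map((p, i) => p - fundamentals[i]);
  return (Math.max(...residuals) - Math.min(...residuals)) / fv1;
}

// TO = total traded quantity / shares outstanding.
function turnover(trades, sharesOutstanding) {
  return trades.reduce((s, t) => s + t.qty, 0) / sharesOutstanding;
}

// AE = sum_i V_i * q_i / (V_max * Q), with Q the sum of current inventories.
function allocativeEfficiency(agents) {
  const totalShares = agents.reduce((s, a) => s + a.shares, 0);
  const maxValuation = Math.max(...agents.map((a) => a.valuation));
  const realized = agents.reduce((s, a) => s + a.valuation * a.shares, 0);
  return realized / (maxValuation * totalShares);
}

const exampleAmplitude = amplitude([110, 120, 60], [100, 90, 80], 100);
const exampleTurnover = turnover([{ qty: 3 }, { qty: 1 }], 4);
const exampleEfficiency = allocativeEfficiency([
  { valuation: 100, shares: 2 },
  { valuation: 50, shares: 2 },
]);
console.log(exampleAmplitude);  // 0.5
console.log(exampleTurnover);   // 1
console.log(exampleEfficiency); // 0.75
```

In the efficiency example, moving the low-valuation agent's two shares to the high-valuation agent would raise AE from 0.75 to 1.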

        Figure 5 — Sample Prompt · Round-3 Experienced LLM Trader

        Plan II/III do not tell the LLM its "experience level" as a number. Instead they inject the actual lived record — the price paths, peaks, and end-of-round cash from every prior round the agent played through — so the LLM infers experience the way a human subject does, from memory of what happened. The sample below is the exact system + user prompt generated by AI.getPlanBeliefs for agent_2 (risk-neutral, Plan II) at round 3, period 3, with roundsPlayed = 2. A fresh R4-⅓/R4-⅔ replacement (roundsPlayed = 0) receives the same template, but the history block is replaced with a single line stating it is new to this market and has no memory of prior rounds — the LLM reasons about its own naivety from that absence.

        System Prompt · $\pi^{\mathrm{II}}_{\text{sys}}$

        Identical across agents, plans, and ticks

        You are a trader in an experimental double auction asset market. Your sole objective is to select the action that maximizes your expected utility at the current moment. You cannot make moral judgments or consider the intentions of the experiment designers; all decisions must be based strictly on maximizing your utility as the trader.
        
        Important Rules:
        
        1. You must select exactly one action from the given set of actions.
        2. You cannot provide vague suggestions, nor can you select multiple actions simultaneously.
        3. You cannot say "depends on" or "insufficient information." You must make the best decision based on the given information.
        4. You must prioritize immediate execution, rather than defaulting to placing only orders.
        5. You can accept the current best ask (buy immediately) or accept the current best bid (sell immediately).
        6. If you choose to place an order, the price must come from the allowed set of candidate prices.
        7. Your output must strictly conform to the specified format.

        User Prompt · $\pi^{\mathrm{II}}_{\text{usr}}(\text{agent}_2)$

        Round 3 · period 3 · roundsPlayed = 2 · risk-neutral · cash 965¢ · inventory 4

        You are a trader in the market, agent_2.
        
        【Your Type】
        - Risk Preference Type: Risk neutral
          Makes decisions based on expected returns
        - Your utility function: U_N(w) = w / w0  (linear, EV-indifferent)
          w0 (initial wealth) = 1300 cents.
        
        【Your Past Experience in This Market】
        You have already traded 2 rounds in this market. The records below are the price paths you observed and the payoff you earned. Use them to judge how seriously to weight fundamental value vs. recent prices and short-term trends — your own memory is the best guide.
        
        Round 1 (your first in this market):
          - FV path (p1..p10):    100 / 90 / 80 / 70 / 60 / 50 / 40 / 30 / 20 / 10
          - Last-trade price path:        110 / 145 / 170 / 185 / 165 / 130 / 90 / 55 / 28 / 12
          - Peak price: 185 at p4 (FV then = 70, deviation +164%)
          - Round-end last price: 12 (FV at p10 = 10; gap +2)
          - Your end-of-round cash: 1480¢  (round-start mark-to-market wealth = 1300¢ = cash + shares × FV₁)
        
        Round 2 (most recent):
          - FV path (p1..p10):    100 / 90 / 80 / 70 / 60 / 50 / 40 / 30 / 20 / 10
          - Last-trade price path:        102 / 115 / 125 / 118 / 95 / 72 / 50 / 32 / 18 / 10
          - Peak price: 125 at p3 (FV then = 80, deviation +56%)
          - Round-end last price: 10 (FV at p10 = 10; gap +0)
          - Your end-of-round cash: 1365¢  (round-start mark-to-market wealth = 1300¢ = cash + shares × FV₁)
        
        【Market Rules】
        1. This is a 10-period asset market.
        2. Each asset pays a dividend of 0 or 20 in each remaining period, with a 50% probability of each.
        3. Therefore, the expected dividend for each remaining period is 10.
        4. If the current remaining period is k, then the fundamental value = 10 × k.
        5. All traders know how this fundamental value is calculated.
        6. Double Auction Rules:
           - You can buy the lowest ask immediately.
           - You can sell the highest bid immediately.
           - You can submit a new bid.
           - You can submit a new ask.
           - You can also choose not to trade.
        7. If you buy the current ask immediately, the transaction will be executed instantly at the lowest ask price.
        8. If you sell the current bid immediately, the transaction will be executed instantly at the highest bid price.
        9. The last price is only updated when a transaction occurs.
        
        【Your Status】
        - Current Cash: 965
        - Current Asset Holdings: 4
        
        【Current Market Status】
        - Current Period: 3
        - Current Remaining Periods k: 8
        - Current Fundamental Value (FV): 80
        - Last Price: 95
        - Highest Bid: 88
        - Lowest Ask: 94
        - Previous Reference Price: 95
        - This round so far (last trade per period): p1=98 (FV 100), p2=95 (FV 90)
        
        【Your Decision-Making Principles】
        You want to maximize the following intuitive utilities:
        1. The higher the wealth, the better;
        2. You evaluate expected returns linearly;
        3. Buying at a price lower than the last traded price increases utility; buying at a price higher than the last traded price decreases utility;
        4. Selling at a price higher than the last traded price increases utility; selling at a price lower than the last traded price decreases utility;
        5. Holding too many positions increases inventory risk;
        
        【Additional Requirements】
        1. You cannot mechanically favor holding.
        2. If the utility of immediate execution is similar to holding, you should prioritize actions that facilitate the trade.
        3. You must consider "execution opportunities" valuable because not executing means you cannot improve your position.
        4. When you hold a lot of assets, you should seriously consider selling; when you hold a lot of cash and fewer assets, you should seriously consider buying.
        5. Towards the later stages, you should focus more on fundamental value than short-term resale opportunities.
        
        【Role-Specific Guidance】
        - As a risk-neutral trader, you should focus more on expected returns.
        
        【Peer Messages from Last Period】
        - agent 1: claimed value 85 cents
        - agent 3: claimed value 78 cents
        - agent 4: claimed value 92 cents
        - agent 5: claimed value 80 cents
        - agent 6: claimed value 74 cents
        
        【You must choose one of the following actions】
        1. BUY_NOW: Immediately buy 1 unit at the current lowest ask price (94).
        2. SELL_NOW: Immediately sell 1 unit at the current highest bid price (88).
        3. BID_1: Submit bid = best_bid + 1 = 89.
        4. BID_3: Submit bid = best_bid + 3 = 91.
        5. ASK_1: Submit ask = best_ask - 1 = 93.
        6. ASK_3: Submit ask = best_ask - 3 = 91.
        7. HOLD: Do not trade.
        
        【Your Task】
        Please briefly compare the available actions to determine which is most advantageous to you:
        - Buy immediately
        - Sell immediately
        - Place a more aggressive bid
        - Place a more aggressive ask
        - Do not trade
        Then output only one final action.
        
        【Strict Output Format】
        Reason: <Explain in 3-6 sentences why this action maximizes your utility>
        Action: <BUY_NOW / SELL_NOW / BID_1 / BID_3 / ASK_1 / ASK_3 / HOLD>
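A reply in this format still has to be reduced to one of the seven action tokens before the engine can act on it. The following is a minimal sketch of such a parser, not the simulator's actual code: the name `parse_action` and the choice to return `None` on malformed output (leaving any fallback to the caller) are assumptions made for illustration.

```python
import re

# The seven action tokens listed in the prompt.
ACTIONS = {"BUY_NOW", "SELL_NOW", "BID_1", "BID_3", "ASK_1", "ASK_3", "HOLD"}

# Matches a line of the form "Action: TOKEN" (angle brackets tolerated).
ACTION_LINE = re.compile(r"^\s*Action:\s*<?\s*([A-Z_0-9]+)\s*>?\s*$", re.MULTILINE)


def parse_action(reply: str):
    """Return the chosen action token, or None if the reply is malformed.

    Hypothetical helper: if several "Action:" lines appear, the last one wins.
    """
    matches = ACTION_LINE.findall(reply)
    if not matches:
        return None
    token = matches[-1]
    return token if token in ACTIONS else None
```

A reply that merely echoes the template line (`Action: <BUY_NOW / SELL_NOW / ...>`) does not match the pattern and is therefore treated as malformed.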

        How experience changes the prompt

        Difference across roundsPlayed

        The history block scales with roundsPlayed — one entry per round the agent has lived through. An agent with roundsPlayed = 3 at the start of round 4 sees three blocks (rounds 1, 2, 3). A fresh R4 replacement (roundsPlayed = 0) sees no history block at all; in its place the prompt contains a single declarative sentence: "This is your first round in this market. You have never traded this asset before and have no memory of prior rounds — you only see the rules, the fundamental value, and whatever trading has happened so far in the current round." Nothing else changes. The rule-based experience labels that used to appear in 【Your Type】 ("Experience level: N"), 【Your Decision-Making Principles】 (principle #6), and 【Role-Specific Guidance】 ("As a highly experienced trader ...") are gone — the LLM decides how to weight fundamentals versus recent prices from its own observed history, not from an instruction telling it to do so. Plan III differs from Plan II only by omitting the explicit $U_L / U_N / U_A$ formula; the history block and all other sections are identical.
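The scaling rule can be sketched as a function from per-round records to prompt text. This is an illustrative reconstruction under stated assumptions: the record keys (`fv`, `prices`, `cash`) and the function name `history_block` are hypothetical, and only a subset of the lines from the sample prompt is rendered.

```python
def history_block(rounds):
    """Render a 'Past Experience' section from a list of per-round records.

    Each record is a dict with hypothetical keys: 'fv' and 'prices'
    (one value per period) and 'cash' (end-of-round cash in cents).
    """
    if not rounds:  # fresh replacement: roundsPlayed = 0
        return ("This is your first round in this market. You have never "
                "traded this asset before and have no memory of prior rounds.")
    lines = [f"You have already traded {len(rounds)} rounds in this market."]
    for r, rec in enumerate(rounds, start=1):
        # Peak of the last-trade price path and the FV in that same period.
        peak = max(rec["prices"])
        p = rec["prices"].index(peak)  # 0-based period index
        fv_then = rec["fv"][p]
        dev = round(100 * (peak - fv_then) / fv_then)
        lines.append(f"Round {r}:")
        lines.append(f"  - Peak price: {peak} at p{p + 1} "
                     f"(FV then = {fv_then}, deviation {dev:+d}%)")
        lines.append(f"  - Your end-of-round cash: {rec['cash']}¢")
    return "\n".join(lines)
```

Fed the two price paths from the sample prompt, this reproduces the peak lines shown there (185 at p4, +164%; 125 at p3, +56%).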

        Glossary & Reference

        Abbreviations & indices

        | Term | Expansion | Meaning |
        | --- | --- | --- |
        | Plan I | Algorithmic posterior | Deterministic baseline: $V_i^{\text{post}} = w\cdot V_i^{\text{prior}} + (1-w)\cdot\bar{m}$ with $w = 0.6 + 0.1\,\min(3, k_i)$. |
        | Plan II | LLM posterior · utility forms | One chat completion per Utility agent per period. LLM returns a discrete action from {BUY_NOW, SELL_NOW, BID_1, BID_3, ASK_1, ASK_3, HOLD}. Prompt includes the closed-form $U_L / U_N / U_A$ expressions for the agent's risk type. |
        | Plan III | LLM posterior · risk label only | Same wiring as Plan II but the prompt only names the risk-preference category; no functional form is supplied. Same seven-action output set. |
        | DLM | Dufwenberg, Lindqvist & Moore (2005) | Source paper for the shared market substrate: $T$, $\mathbb{E}[d]$, $\mathrm{FV}_t$, and the four-round session loop. |
        | U | Utility | EU-maximising agent; the sole agent class ($N = 6$). Per-period belief update is what Plans I, II, and III compare. |
        | FV | Fundamental value | $\mathrm{FV}_t = \mathbb{E}[d] \cdot (T - t + 1)$: risk-neutral value at the start of period $t$. |
        | EU | Expected utility | $\mathrm{EU}(\alpha) = p_{\text{fill}} \cdot U(w_1) + (1-p_{\text{fill}})\cdot U(w_0)$: the Utility agent's scoring functional over $\alpha \in \{\text{hold},\, \text{buy@}A_t,\, \text{sell@}B_t,\, \text{bid},\, \text{ask}\}$. |
        | VWAP | Volume-weighted average price | Per-period average trade price weighted by quantity; baseline for the trust EMA update. |
        | ND | Normalized deviation | Total absolute mispricing: $\mathrm{ND} = \sum_j \lvert p_j - \mathrm{FV}_{t(j)} \rvert \cdot q_j \,/\, Q$, where $j$ indexes trades, $q_j$ is trade quantity, and $Q$ is total shares outstanding. |
        | Haessel R² | Coefficient of determination | Computed for per-period mean price against fundamental value. |
        | TO | Turnover | Total shares traded divided by total shares outstanding; reports speculative intensity. |
        | AE | Allocative efficiency | Realized aggregate valuation divided by the theoretical maximum: $\mathrm{AE} = \sum_i \hat{V}_i q_i \,/\, (\hat{V}_{\max} \cdot Q)$, where $\hat{V}_i = V_i^{\text{post}}$. |
        | Session | 10-session DLM batch | One click of Start runs 10 sessions (5 × first treatment + 5 × second treatment). Each session is a complete $R = 4$ round game; data is collected per round with labels $\texttt{R\{r\}\_S\{s\}}$. |
        | Rr_Ss | Round–session label | Identifies Round $r$ of Session $s$ in the batch results table. Example: R3_S7 = round 3 of session 7. |
        | T2 / T4 | DLM treatment sizes | T2 (R4-⅔): 2 agents replaced in R4, 4 veterans remain. T4 (R4-⅓): 4 replaced, 2 veterans remain. First 5 sessions use the selected treatment, last 5 use the other. |

        Mathematical notation

        | Symbol | Definition | Where it appears |
        | --- | --- | --- |
        | $\mathrm{FV}_t$ | Fundamental value at the start of period $t$. $\mathrm{FV}_t = \mathbb{E}[d]\cdot(T - t + 1)$, with $\mathbb{E}[d] = \tfrac{1}{2}(0) + \tfrac{1}{2}(20) = 10$¢ and $T = 10$. Yields a staircase from $\mathrm{FV}_1 = 100$¢ to $\mathrm{FV}_{10} = 10$¢, resetting at every round boundary. | Shared substrate: drives every agent's prior (Figures 1–4) |
        | $\text{prior}_t$ | Agent $i$'s pre-update valuation at period $t$. $\text{prior}_t = \mathrm{FV}_t \cdot (1 + b_i + \varepsilon)$, clamped to $\geq 0$. Identical across all three plans. | Prior Formation stage (Figures 1–2) |
        | $b_i = \delta_i \cdot \beta$ | Persistent per-agent valuation bias. $\delta_i \in \{-1, 0, +1\}$ is the bias direction drawn at birth (pessimistic, unbiased, optimistic) and $\beta = 0.15$ is the bias magnitude. Enters the prior as $\text{prior}_t = \mathrm{FV}_t \cdot (1 + b_i + \varepsilon)$. | Prior formation (Figure 2) |
        | $\varepsilon \sim \mathcal{U}[-n, n]$ | Per-tick i.i.d. valuation noise, $n = 0.03$. Drawn fresh each decision via the seeded PRNG. | Prior formation: jitter term in $\text{prior}_t$ (Figure 2) |
        | $k_i$ | Agent $i$'s experience counter. Starts at $0$; incremented by $1$ at every round boundary. Controls the Plan I blend weight: $w = 0.6 + 0.1\,\min(3, k_i)$, so $w \in \{0.6, 0.7, 0.8, 0.9\}$ for $k_i = 0, 1, 2, \geq 3$. | Plan I posterior weight; DLM experience channel (Figure 3) |
        | $\hat{v}_m$ | Claimed valuation reported by peer agent $m$. Computed as $\hat{v}_m = \max(0,\, V_m \cdot \phi_m)$ where $\phi_m$ is a distortion multiplier determined by $m$'s communication strategy $\sigma_m \in \{H, B, D\}$ (see Figure 3). The peer-message mean is $\bar{m} = \tfrac{1}{\lvert M \rvert}\sum_{m \in M} \hat{v}_m$ where $M$ is the set of non-self messages received this period. | Plan I posterior: blended with prior via weight $w$ (Figure 3) |
        | $\sigma_m \in \{H, B, D\}$ | Communication strategy of agent $m$: $H$ = truthful (small uniform jitter), $B$ = biased (fixed-sign tilt), $D$ = strategic (inventory-dependent over/understatement). Assigned at birth and persistent across rounds. | Distortion multiplier $\phi_m$ in $\hat{v}_m$ (Figure 3) |
        | $\phi_m$ | Communication distortion multiplier. $\phi_m = 1 + \mathcal{U}[-h, h]$ if $\sigma_m = H$; $\phi_m = 1 + \delta_m \gamma$ if $\sigma_m = B$; $\phi_m = \kappa^+$ or $\kappa^-$ if $\sigma_m = D$ (depending on $q_m$ vs $q_m^0$). Parameters: $h = 0.02$, $\gamma = 0.10$, $\kappa^+ = 1.25$, $\kappa^- = 0.75$. | $\hat{v}_m = \max(0,\, V_m \cdot \phi_m)$ (Figure 3) |
        | $V_i^{\text{post}}$ | Agent $i$'s period-end valuation: the output of the active plan. Plan I (algorithmic): $V_i^{\text{post}} = w \cdot V_i^{\text{prior}} + (1 - w) \cdot \bar{m}$, or $V_i^{\text{prior}}$ if no messages. Plans II/III: set directly by the LLM's chosen action. | Becomes next period's prior in all three plans (Figure 3) |
        | $U_L, U_N, U_A$ | Risk-typed utility families. $U_L(w) = (w/w_0)^2$ (risk-loving, strictly convex); $U_N(w) = w/w_0$ (risk-neutral, linear); $U_A(w) = \sqrt{w/w_0}$ (risk-averse, strictly concave). All normalized by initial wealth $w_0$. | EU scoring; formulas appear explicitly in Plan II prompts (Figures 2–3) |
        | $w_0, w_1$ | Wealth states for EU evaluation. $w_0 = c_i + q_i \cdot \hat{V}_i$ (wealth if no trade); $w_1 = (c_i \mp p_{\text{order}}) + (q_i \pm 1) \cdot \hat{V}_i$ (wealth if the order fills at price $p_{\text{order}}$; upper signs for a buy, lower signs for a sell), where $c_i$ is cash, $q_i$ is inventory, and $\hat{V}_i \equiv V_i^{\text{post}}$ is the agent's subjective valuation. | EU scoring: $\mathrm{EU}(\alpha) = p_{\text{fill}} \cdot U(w_1) + (1 - p_{\text{fill}}) \cdot U(w_0)$ (Figure 2) |
        | $p_{\text{fill}} = 0.30$ | Assumed fill probability for a non-crossing (passive) quote. Used in the EU functional: $\mathrm{EU}(\alpha) = p_{\text{fill}} \cdot U(w_1) + (1 - p_{\text{fill}}) \cdot U(w_0)$. For crossing actions (buy@$A_t$, sell@$B_t$), $p_{\text{fill}} = 1$ (deterministic); for passive actions (bid, ask), $p_{\text{fill}} = 0.30$ (tunable). | EU scoring: $\alpha^\star_{i,t}$ action evaluation (Figure 2) |
        | $\alpha^\star_{i,t}$ | Optimal action for agent $i$ at tick $t$. $\alpha^\star_{i,t} = \arg\max_\alpha \mathrm{EU}(\alpha)$ over the five-element set $\alpha \in \{\text{hold},\, \text{buy@}A_t,\, \text{sell@}B_t,\, \text{bid},\, \text{ask}\}$, where $A_t$ is the current best ask and $B_t$ is the current best bid. buy@$A_t$ crosses the book at the resting ask (deterministic fill, $p_{\text{fill}} = 1$); sell@$B_t$ hits the resting bid (deterministic fill); bid and ask post passive quotes ($p_{\text{fill}} = 0.30$). Plans II/III use a seven-element LLM action set: $\{\text{BUY\_NOW, SELL\_NOW, BID\_1, BID\_3, ASK\_1, ASK\_3, HOLD}\}$. | Action selection: output of EU maximization (Figures 2–3) |
        | $\tau_{r \to s}$ | Trust of receiver $r$ in sender $s$. Updated by exponential moving average: $\tau_{r \to s} \leftarrow (1 - \lambda)\,\tau_{r \to s} + \lambda \cdot \text{closeness}_{r,s}$, where $\lambda = 0.30$ is the EMA learning rate and $\text{closeness} = \max\!\bigl(0,\, 1 - \lvert \hat{v}_s - \text{VWAP}_t \rvert\,/\,\text{VWAP}_t\bigr)$. Initialized at $0.5$; self-trust fixed at $1.0$. | Messaging diagnostic; context for Plan II/III prompts (Figure 3) |
        | $\pi_i^{\text{II}}, \pi_i^{\text{III}}$ | Structured LLM prompts for Plans II and III. $\pi^{\text{II}}$ includes market rules, agent state, and the explicit utility formula $U_L/U_N/U_A$ for the agent's risk type. $\pi^{\text{III}}$ omits the formula and supplies only the risk-preference label. | LLM posterior: input to $\alpha^\star_{i,t} \leftarrow \text{LLM}(\pi_i)$ (Figure 3) |
        | $Q$ | Total shares outstanding, $Q = \sum_i q_i$, conserved under double-auction trades (shares transfer, never created or destroyed). | Normalized deviation $\mathrm{ND}$, turnover $\mathrm{TO}$ (Figure 4) |
        | $\bar{p}_t$ | Mean trade price in global period $t$. $\bar{p}_t = \sum_{j \in \mathcal{T}_t} p_j \,/\, \lvert \mathcal{T}_t \rvert$ where $\mathcal{T}_t$ is the set of trades in period $t$. Used as the basis for Haessel $R^2$ and amplitude. | Market-quality diagnostics (Figure 4, Table 1) |
        | $R^2_{\text{Haessel}}$ | Haessel (1978) coefficient of determination. $R^2 = 1 - \sum_t (\bar{p}_t - \mathrm{FV}_t)^2 \,/\, \sum_t (\bar{p}_t - \overline{\bar{p}})^2$. Measures how closely per-period mean prices fit the fundamental staircase; can be negative if mispricing exceeds sample variance. | Market-quality diagnostics (Figure 4, Table 1) |
        | $\mathrm{ND}$ | Normalized absolute price deviation. $\mathrm{ND} = \sum_j \lvert p_j - \mathrm{FV}_{t(j)} \rvert \cdot q_j \,/\, Q$, summing over all trades $j$ weighted by quantity, divided by total shares outstanding. | Market-quality diagnostics (Figure 4, Table 1) |
        | $A$ | Price amplitude. $A = \bigl(\max_t (\bar{p}_t - \mathrm{FV}_t) - \min_t (\bar{p}_t - \mathrm{FV}_t)\bigr) \,/\, \mathrm{FV}_1$. Peak-to-trough excursion of the mean-price residual, normalized by the initial fundamental. | Market-quality diagnostics (Figure 4, Table 1) |
        | $\mathrm{TO}$ | Turnover. $\mathrm{TO} = \sum_j q_j \,/\, Q$: total shares traded (summing quantity $q_j$ over all trades $j$) divided by total shares outstanding $Q$. A value of $1.0$ means every share changed hands once. | Market-quality diagnostics (Figure 4, Table 1) |
        | $\rho_t$ | Price-to-fundamental ratio. $\rho_t = p_t \,/\, \mathrm{FV}_t$ (Lopez-Lira 2025), where $p_t$ is the most recent trade price at tick $t$. Values $> 1$ indicate overpricing; persistent $\rho_t \gg 1$ signals a bubble. | Market-quality diagnostics (Table 1) |
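The four market-quality formulas in the notation table translate directly into code. The sketch below is an illustration, not the simulator's implementation: function names are hypothetical, trades are assumed to be `(price, period, quantity)` tuples with 1-based periods, and the example uses the Round 1 last-trade path from the sample prompt as a stand-in for per-period mean prices.

```python
def haessel_r2(mean_prices, fv):
    """1 - sum (p_t - FV_t)^2 / sum (p_t - mean(p))^2."""
    p_bar = sum(mean_prices) / len(mean_prices)
    ss_res = sum((p - f) ** 2 for p, f in zip(mean_prices, fv))
    ss_tot = sum((p - p_bar) ** 2 for p in mean_prices)
    return 1 - ss_res / ss_tot


def amplitude(mean_prices, fv):
    """Peak-to-trough excursion of the price residual, normalized by FV_1."""
    resid = [p - f for p, f in zip(mean_prices, fv)]
    return (max(resid) - min(resid)) / fv[0]


def normalized_deviation(trades, fv, total_shares):
    """Quantity-weighted absolute mispricing divided by shares outstanding Q."""
    return sum(abs(p - fv[t - 1]) * q for p, t, q in trades) / total_shares


def turnover(trades, total_shares):
    """Total shares traded divided by shares outstanding Q."""
    return sum(q for _, _, q in trades) / total_shares


fv = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
round1 = [110, 145, 170, 185, 165, 130, 90, 55, 28, 12]
r2 = haessel_r2(round1, fv)   # negative: mispricing exceeds price variance
amp = amplitude(round1, fv)   # (115 - 2) / 100
```

On this path $R^2$ comes out negative, which is the case the table flags: squared mispricing is larger than the variance of the prices themselves.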

        Figures

        1

        Transaction Price Trajectory vs Fundamental Value

        Tick-level price plotted against the deterministic step function $\mathrm{FV}_t$. Persistent excursions above the staircase mark the bubble; the final-period collapse is the crash.

        2

        Mispricing Magnitude

        Absolute departure $|p_t - \mathrm{FV}_t|$ filled as a red area; the level-based counterpart of Lopez-Lira's price-to-fundamental ratio $\rho_t = p_t / \mathrm{FV}_t$, which expresses the same mispricing as a ratio.

        3

        Trade Volume per Period

        Bar chart of per-period share quantity exchanged. A volume peak in the inflation phase followed by a cliff near $T$ is the SSW signature.

        4

        Transaction Density Heatmap

        Two-dimensional trade histogram over (price, period). Warm cells mark where trading concentrates; comparing them against the $\mathrm{FV}_t$ staircase separates rational from speculative regimes.

        5

        Agent Action Timeline

        One row per agent, one cell per decision. Encodes {bid, ask, hold} and whether the submitted order was filled on the same tick.

        6

        Subjective Valuation · Per agent

        Each Utility agent's period-end belief $V_i^{\text{post}}$ — the same trace regardless of which plan produced it, so trajectories can be compared across Plans I/II/III.

        7

        Pairwise Trust Matrix

        Heatmap of $\tau_{r \to s}$ on $[0, 1]$ with the diagonal masked. Warm columns identify agents whose claims the population finds credible.

        T1

        Market-Quality Statistics (Table 1)

        Live session metrics: Haessel $R^2$, normalized price deviations, amplitude, turnover, allocative efficiency, welfare, and deception statistics. Updates every render tick.

        T2

        10-Session Batch Results (Table 2)

        Per-round market-quality metrics across the 10-session DLM batch. Each row is labelled $\texttt{R\{r\}\_S\{s\}}$ with mean deviation, turnover, trade count, volume, and aggregate payoff. Per-treatment aggregates summarise T2 vs T4 performance.

        Source papers

        | Tag | Citation | Role in this simulator |
        | --- | --- | --- |
        | DLM 2005 | Dufwenberg, Lindqvist & Moore, Bubbles and Experience: An Experiment, AER 95(5), 1731–1737 | Market substrate: asset life, dividend shape, $\mathrm{FV}_t$, session loop |
        | LL 2025 | Lopez-Lira, AI-Agent Expected-Utility Market Makers (working paper) | Utility agent, EU scoring, risk functionals, trust EMA |
        | SSW 1988 | Smith, Suchanek & Williams, Bubbles, Crashes and Endogenous Expectations in Experimental Spot Asset Markets, Econometrica 56(5) | Canonical experimental-bubble design; the asset-life and dividend structure that DLM 2005 inherits |

        AI-Agent Prior Elicitation in Experimental Asset Markets

        Algorithmic, LLM-Augmented, and Label-Only Belief Formation in a Continuous Double Auction

        Plan I · Algorithmic Plan II · LLM + Utility Forms Plan III · LLM + Risk Label

        Browser-based experimental platform · Reproducible via seeded PRNG · Open-source

        Motivation

        Asset price bubbles are among the most robust phenomena in experimental economics. Smith, Suchanek & Williams (1988) demonstrated that even under common knowledge of fundamentals, laboratory markets consistently produce price trajectories that deviate from risk-neutral fundamental value. Dufwenberg, Lindqvist & Moore (2005) showed that experience — repeated participation in the same market structure — is a powerful bubble-suppressing channel.

        Two developments motivate this project: (i) the emergence of large language models as plausible artificial economic agents (Horton, 2023; Brand et al., 2023), and (ii) the open question of whether LLM-driven belief formation reproduces, amplifies, or dampens the bubble dynamics that arise under algorithmic updating rules.

        $$ \underbrace{\text{SSW (1988)}}_{\text{bubbles exist}} \;\longrightarrow\; \underbrace{\text{DLM (2005)}}_{\text{experience kills bubbles}} \;\longrightarrow\; \underbrace{\text{This paper}}_{\text{LLM belief update}\,\overset{?}{=}\,\text{algorithmic}} $$

        SSW: Econometrica 56(5); DLM: AER 95(5), 1731–1737; Horton: arXiv 2301.07543; Brand et al.: NBER w31122

        Literature & positioning

        Experimental asset markets

        Smith, Suchanek & Williams (1988) — canonical bubble design

        Dufwenberg, Lindqvist & Moore (2005) — experience effect, 4-round session

        Hussam, Porter & Smith (2008) — bubble persistence under parameter shocks

        Haruvy, Lahav & Noussair (2007) — expectation formation and price dynamics

        LLMs as economic agents

        Horton (2023) — LLMs as simulated economic agents (homo silicus)

        Park et al. (2023) — generative agents and social simulation

        Aher et al. (2023) — using LLMs to simulate survey responses

        Lopez-Lira (2025) — expected-utility scoring framework for AI agents

        Gap

        No controlled factorial comparison of algorithmic vs. LLM belief update within the same continuous double auction substrate, holding market structure, endowments, and information sets constant.

        Research questions

        $$ \text{Plan I} \;\;\overset{?}{\approx}\;\; \text{Plan II} \;\;\overset{?}{\approx}\;\; \text{Plan III} $$

        RQ 1. Does an LLM-driven belief update (Plan II) produce market dynamics statistically equivalent to the deterministic algorithmic baseline (Plan I), as measured by Haessel $R^2$, normalised deviation, amplitude, and turnover?

        RQ 2. Does providing the LLM with the explicit closed-form utility function (Plan II: $U_L, U_N, U_A$) yield different posterior valuations than providing only a risk-preference label (Plan III)?

        RQ 3. How do risk composition $(\alpha_L, \alpha_N, \alpha_A)$, strategic deception, and endogenous experience $(k_i)$ interact with the belief-formation channel across all three plans?

        Key idea · three-plan factorial

        We hold the market microstructure constant and vary only the belief-update mechanism. Each Utility agent $i$ forms a prior $V_i^{\text{prior}}$ at every period boundary, then updates it to a posterior $V_i^{\text{post}}$ through exactly one of three channels:

        Plan I · Algorithm

        $V_i^{\text{post}} = f(V_i^{\text{prior}}, M, k_i)$

        Deterministic weighted blend. No stochasticity beyond the seeded PRNG. Reproducible baseline.

        Plan II · LLM + Forms

        $V_i^{\text{post}} \leftarrow \text{LLM}(\pi_i^{\text{II}})$

        Prompt includes the explicit utility functional $U_L, U_N, U_A$ for agent $i$'s risk type, plus wealth and peer claims.

        Plan III · LLM + Label

        $V_i^{\text{post}} \leftarrow \text{LLM}(\pi_i^{\text{III}})$

        Same wiring as Plan II, but prompt contains only the risk-preference label — no functional form.

        Identification: same seed, same endowments, same market order → differences arise solely from the update channel.

        Market substrate · DLM (2005)

        The shared environment replicates the Dufwenberg, Lindqvist & Moore (2005) continuous double auction exactly. A session consists of $R = 4$ rounds, each a complete $T = 10$-period market. Dividends are i.i.d. draws from $\{0, 20\}$¢ with $\mathbb{E}[d] = 10$¢.

        $$ \mathrm{FV}_t \;=\; \mathbb{E}[d] \cdot (T - t + 1) \;=\; 10\,(T - t + 1), \qquad \mathrm{FV}_1 = 100\text{¢},\;\; \mathrm{FV}_{10} = 10\text{¢} $$
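As a quick check of the staircase, the formula can be evaluated for every period. This is a minimal sketch; the constant and function names are chosen here for illustration.

```python
T = 10            # periods per round
E_DIVIDEND = 10   # expected dividend in cents: 0.5 * 0 + 0.5 * 20


def fundamental_value(t: int) -> int:
    """FV at the start of period t (1-based): E[d] * (T - t + 1)."""
    if not 1 <= t <= T:
        raise ValueError("period must be in 1..T")
    return E_DIVIDEND * (T - t + 1)


# One value per period, restarting from the top at every round boundary.
staircase = [fundamental_value(t) for t in range(1, T + 1)]
```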

        Round boundary protocol

        1. Snapshot cash → $\texttt{roundFinalCash}[r]$

        2. Increment $k_i \leftarrow k_i + 1$ for all survivors

        3. (Round 3 → 4) Run R4 replacement: T2 or T4

        4. Reset cash, inventory to endowment; clear order book
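The four boundary steps can be sketched as one function. The `Agent` fields and the `replace` callback below are hypothetical stand-ins for whatever the engine actually stores; only the ordering of the steps is taken from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    cash: int
    inventory: int
    endow_cash: int
    endow_inventory: int
    rounds_played: int = 0                        # k_i
    round_final_cash: list = field(default_factory=list)


def round_boundary(agents, order_book, finished_round, replace=None):
    """Apply the boundary steps after `finished_round` (1-based)."""
    for a in agents:
        a.round_final_cash.append(a.cash)         # 1. snapshot cash
        a.rounds_played += 1                      # 2. k_i <- k_i + 1
    if finished_round == 3 and replace is not None:
        replace(agents)                           # 3. R4 replacement (T2 or T4)
    for a in agents:
        a.cash = a.endow_cash                     # 4. reset to endowment ...
        a.inventory = a.endow_inventory
    order_book.clear()                            #    ... and clear the book
```

Running the replacement before the reset means a replaced agent also starts round 4 from its endowment, without a separate code path.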

        Session payoff & batch structure

        $\pi_i = \sum_{r=1}^{R} c_{i,r}^{\text{final}} + 500$¢

        Show-up fee: 500¢. One Start click runs 10 sessions (5 × T2 + 5 × T4). Per-round data is labelled $\texttt{R\{r\}\_S\{s\}}$ in the batch results table.
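A worked instance of the payoff formula, as a sketch: the first two round-end cash values come from the sample prompt, while the values for rounds 3 and 4 are invented here purely to complete the example.

```python
SHOW_UP_FEE = 500  # cents


def session_payoff(round_final_cash):
    """pi_i = sum of end-of-round cash over the session + show-up fee."""
    return sum(round_final_cash) + SHOW_UP_FEE


# Rounds 1-2 from the sample prompt; rounds 3-4 are hypothetical.
payoff = session_payoff([1480, 1365, 1310, 1295])
```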

        Agent design · $N = 6$ Utility agents

        U · Utility agent (sole agent class)

        $\text{prior}_t = \mathrm{FV}_t \cdot (1 + b_i + \varepsilon),\quad \alpha^\star_{i,t} = \arg\max_{\alpha \in \mathcal{A}} \mathrm{EU}(\alpha)$

        All six agents are EU-maximising Utility agents following the Lopez-Lira (2025) framework. Per-period belief update is the experimental variable — the only dimension that varies across Plans I, II, and III.

        Risk composition

        $(\alpha_L, \alpha_N, \alpha_A)$ summing to 100%

        Risk-loving / neutral / averse mix is the sole composition knob. Controlled by three linked sliders.

        Strategy cube

        $\text{bias} \times \text{belief} \times \text{risk}$

        Each agent draws a bias $b_i \in \{-0.15, 0, +0.15\}$, a belief mode (honest/deceptive), and a risk preference.

        Endogenous experience

        $k_i = \texttt{roundsPlayed} \in \{0,1,2,\ldots\}$

        Experience is procedural: $k_i$ starts at 0 and is incremented at each round boundary. Plan I blend weight $w(k_i)$ ramps from 0.60 to 0.90.

        DLM 2005 uses $N = 6$ homogeneous human subjects. This simulator replaces them with $N = 6$ heterogeneous Utility agents whose belief formation is the experimental treatment.

        Expected-utility framework

        $$ \mathrm{EU}(\alpha) \;=\; p_{\text{fill}} \cdot U\!\bigl(w_1(\alpha)\bigr) \;+\; (1 - p_{\text{fill}}) \cdot U\!\bigl(w_0\bigr), \qquad \alpha^\star_{i,t} = \arg\max_{\alpha \,\in\, \mathcal{A}} \mathrm{EU}(\alpha) $$
        $$ \mathcal{A} \;=\; \{\text{hold},\; \text{buy@}A_t,\; \text{sell@}B_t,\; \text{bid},\; \text{ask}\}, \qquad p_{\text{fill}}^{\text{cross}} = 1,\;\; p_{\text{fill}}^{\text{passive}} = 0.30 $$

        $U_L$ · Risk-loving

        $U_L(w) = (w/w_0)^2$

        Convex. Overweights upside. Bubble-amplifying.

        $U_N$ · Risk-neutral

        $U_N(w) = w/w_0$

        Linear. Prices at expected value. Baseline.

        $U_A$ · Risk-averse

        $U_A(w) = \sqrt{w/w_0}$

        Concave. Penalises downside. Dampening.

        Normalized: $u_{i,t} = U_i(w_{i,t})\,/\,U_i(w_{i,0})$ so all agents start at $u = 1$. Risk composition controlled by $(\alpha_L, \alpha_N, \alpha_A)$ summing to 100%.
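The EU scoring rule can be sketched for the five-action set as follows. Two assumptions are made explicit because the text uses $w_0$ in two roles: `w_init` is the initial wealth that normalizes $U$, and `w_stay` is the wealth if no trade occurs. Function names and the passive quote prices are hypothetical.

```python
import math

UTILITY = {
    "L": lambda w, w_init: (w / w_init) ** 2,      # risk-loving, convex
    "N": lambda w, w_init: w / w_init,             # risk-neutral, linear
    "A": lambda w, w_init: math.sqrt(w / w_init),  # risk-averse, concave
}
P_FILL = {"cross": 1.0, "passive": 0.30}


def expected_utility(risk, w_fill, w_stay, w_init, kind):
    """EU = p_fill * U(wealth if filled) + (1 - p_fill) * U(wealth if not)."""
    u, p = UTILITY[risk], P_FILL[kind]
    return p * u(w_fill, w_init) + (1 - p) * u(w_stay, w_init)


def score_actions(risk, cash, qty, v_hat, best_bid, best_ask, bid_px, ask_px, w_init):
    w_stay = cash + qty * v_hat

    def w_after(dq, px):
        # Buying (dq = +1) pays px; selling (dq = -1) receives px.
        return (cash - dq * px) + (qty + dq) * v_hat

    scores = {
        "hold": UTILITY[risk](w_stay, w_init),
        "buy@ask": expected_utility(risk, w_after(+1, best_ask), w_stay, w_init, "cross"),
        "sell@bid": expected_utility(risk, w_after(-1, best_bid), w_stay, w_init, "cross"),
        "bid": expected_utility(risk, w_after(+1, bid_px), w_stay, w_init, "passive"),
        "ask": expected_utility(risk, w_after(-1, ask_px), w_stay, w_init, "passive"),
    }
    return max(scores, key=scores.get), scores


# State from the sample prompt, valuing shares at FV = 80 (an assumption).
best, scores = score_actions("N", cash=965, qty=4, v_hat=80, best_bid=88,
                             best_ask=94, bid_px=89, ask_px=93, w_init=1300)
```

With the valuation set to 80 and a resting bid of 88, selling into the bid scores highest for the risk-neutral agent, since it locks in 8¢ above the valuation with certainty.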

        Shared prior & endogenous experience

        $$ V_i^{\text{prior}}(t) \;=\; \mathrm{FV}_t \cdot \bigl(1 + b_i + \varepsilon_t\bigr), \qquad b_i \in \{-0.15,\,0,\,+0.15\}, \qquad \varepsilon_t \sim \mathcal{U}[-0.03,\,0.03] $$

        All Utility agents share the Lopez-Lira prior parametrisation. Persistent bias $b_i$ induces heterogeneous beliefs; per-tick noise $\varepsilon_t$ is drawn fresh every decision via the seeded PRNG. The three plans differ only in how this prior maps to a posterior $V_i^{\text{post}}$.

        $$ k_i \;=\; \texttt{roundsPlayed}_i \;\in\; \{0, 1, 2, \ldots\}, \qquad k_i \leftarrow k_i + 1 \;\text{at each round boundary} $$

        Experience is purely endogenous: every agent enters with $k_i = 0$ and the engine increments the counter after each round. No agent is ever instantiated with $k_i > 0$. Plan I reads $k_i$ to ramp its blend weight; Plans II/III receive it only indirectly, through the per-round history block in the prompt.
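The prior formula can be sketched with a seeded generator. Python's `random.Random` is used here only as a stand-in for the simulator's seeded PRNG, so the draws will not match the simulator's; the function name is hypothetical.

```python
import random

BETA = 0.15    # bias magnitude
NOISE = 0.03   # half-width of the uniform jitter


def prior(fv_t, delta_i, rng):
    """V_prior = FV_t * (1 + b_i + eps), clamped at 0; delta_i in {-1, 0, +1}."""
    b_i = delta_i * BETA
    eps = rng.uniform(-NOISE, NOISE)   # fresh draw on every call
    return max(0.0, fv_t * (1 + b_i + eps))


rng = random.Random(42)                # same seed gives the same sequence
optimist = prior(80, +1, rng)          # lies in [80 * 1.12, 80 * 1.18]
```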

        Plan I · algorithmic belief update

        $$ V_i^{\text{post}} \;=\; w(k_i) \cdot V_i^{\text{prior}} \;+\; \bigl(1 - w(k_i)\bigr) \cdot \frac{1}{|M|}\sum_{m \in M} \hat{v}_m $$
        $$ w(k_i) \;=\; 0.6 + 0.1 \cdot \min(3,\, k_i) \;\in\; \{0.60,\, 0.70,\, 0.80,\, 0.90\} $$

        Novice · $k=0$

        $w = 0.60$

        40% weight on peer messages. High social influence.

        Intermediate · $k \in \{1,2\}$

        $w \in \{0.70, 0.80\}$

        Declining openness to external signals.

        Veteran · $k \geq 3$

        $w = 0.90$

        90% self-anchored. Minimal social updating.

        Deterministic — no network calls, no stochasticity beyond the seeded PRNG. Plan I is the offline baseline both LLM plans are compared against.
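The Plan I update is short enough to write out in full. The sketch below follows the two formulas above and the no-message rule from the notation table; the function names are illustrative, and the example reuses the five peer claims from the sample prompt.

```python
def blend_weight(k_i):
    """w(k_i) = 0.6 + 0.1 * min(3, k_i)."""
    return 0.6 + 0.1 * min(3, k_i)


def plan_one_posterior(v_prior, peer_claims, k_i):
    """Blend the prior with the mean peer claim; keep the prior if no claims."""
    if not peer_claims:
        return v_prior
    w = blend_weight(k_i)
    m_bar = sum(peer_claims) / len(peer_claims)
    return w * v_prior + (1 - w) * m_bar


# Prior of 80 with k_i = 2 (w = 0.8) and the peer claims from the sample prompt.
post = plan_one_posterior(80, [85, 78, 92, 80, 74], k_i=2)
```

The mean claim is 81.8, so the posterior moves only 0.36¢ above the prior: at $w = 0.8$ the agent absorbs a fifth of the gap.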

        Social learning · trust EMA & strategic deception

        $$ \tau_{r \to s} \;\leftarrow\; (1 - \lambda)\,\tau_{r \to s} \;+\; \lambda \cdot \max\!\Bigl(0,\; 1 - \frac{|\hat{v}_s - \mathrm{VWAP}_t|}{\mathrm{VWAP}_t}\Bigr), \qquad \lambda = 0.30 $$

        Trust of receiver $r$ in sender $s$ is updated by exponential moving average. Closeness is measured against the period's volume-weighted average price, clamped to $[0,1]$. Trust matrices persist across round boundaries — they constitute part of the agent's cumulative experience.
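One step of the trust update, as a minimal sketch. It assumes a positive VWAP (at least one trade in the period); how the engine handles a period with no trades is not specified here.

```python
LAMBDA = 0.30  # EMA learning rate


def closeness(claim, vwap):
    """max(0, 1 - |claim - VWAP| / VWAP); assumes vwap > 0."""
    return max(0.0, 1 - abs(claim - vwap) / vwap)


def update_trust(tau, claim, vwap):
    """tau <- (1 - lambda) * tau + lambda * closeness."""
    return (1 - LAMBDA) * tau + LAMBDA * closeness(claim, vwap)


tau = 0.5                                   # initial trust
tau = update_trust(tau, claim=92, vwap=95)  # a claim within about 3% of VWAP
```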

        $$ \hat{v}_s^{\text{report}} \;\neq\; V_s^{\text{true}} \;\implies\; \text{lie gap} \;=\; |\hat{v}_s^{\text{report}} - V_s^{\text{true}}| \;>\; 0 $$

        Under a deceptive strategy, the agent broadcasts a claim diverging from its private valuation. The lie-gap magnitude endogenously erodes trust via the EMA and feeds the mean-lie-magnitude diagnostic. This connects to the cheap talk literature (Crawford & Sobel, 1982): signals are credible only when senders' and receivers' incentives align.
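The lie gap follows from the claimed-valuation rule $\hat{v}_m = \max(0, V_m \cdot \phi_m)$ given in the notation table. In this sketch the distortion multiplier is passed in directly, because the rule that selects $\kappa^+$ or $\kappa^-$ from inventory is not spelled out here; the function names are hypothetical.

```python
def claimed_value(v_true, phi):
    """Broadcast claim: the private valuation scaled by the distortion phi."""
    return max(0.0, v_true * phi)


def lie_gap(v_true, phi):
    """Absolute distance between the broadcast claim and the private valuation."""
    return abs(claimed_value(v_true, phi) - v_true)


def mean_lie_magnitude(pairs):
    """Average lie gap over (v_true, phi) pairs; 0.0 for an empty list."""
    gaps = [lie_gap(v, phi) for v, phi in pairs]
    return sum(gaps) / len(gaps) if gaps else 0.0


overstated = claimed_value(80, 1.25)   # strategic sender using kappa+ = 1.25
```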

        Plan II · LLM update with explicit utility forms

        $$ V_i^{\text{post}} \;\leftarrow\; \mathrm{LLM}\bigl(\pi_i^{\text{II}}\bigr), \qquad \pi_i^{\text{II}} \;\supset\; \bigl\{\,U_L(w)=(w/w_0)^2,\;\; U_N(w)=w/w_0,\;\; U_A(w)=\sqrt{w/w_0}\,\bigr\} $$

        At every period boundary the engine fires one parallel chat completion per Utility agent. The prompt $\pi_i^{\text{II}}$ includes:

        Prompt context $\pi_i^{\text{II}}$

        Explicit risk-utility formula for agent $i$'s type

        Current mark-to-market wealth $w_{i,t}$

        Recent peer claims $\{\hat{v}_m : m \in M\}$

        Trust vector $\{\tau_{i \to s}\}$ for context weighting

        Per-round history (one block per round played, in place of a numeric experience label) and market state

        Execution semantics

        Fire-and-forget: failed LLM call → silent fallback to Plan I for that agent

        Returned valuation $\in [\mathrm{FV}_t \cdot 0.5,\; \mathrm{FV}_t \cdot 2.0]$ (clamped)

        Parallel dispatch: $N_U$ completions per period boundary

        LLM temperature and provider configurable at runtime
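The execution semantics above (clamp, silent fallback, parallel dispatch) can be sketched as follows. The callables standing in for the LLM request and the Plan I computation are hypothetical, and a thread pool is only one possible way to issue the calls in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def clamp_valuation(v, fv_t):
    """Clamp a returned valuation to [0.5 * FV_t, 2.0 * FV_t]."""
    return min(2.0 * fv_t, max(0.5 * fv_t, v))


def posterior_with_fallback(call_llm, plan_one, fv_t):
    """Try the LLM; on any failure fall back silently to the Plan I value."""
    try:
        return clamp_valuation(float(call_llm()), fv_t)
    except Exception:
        return plan_one()


def update_all(llm_calls, plan_one_calls, fv_t):
    """Dispatch one call per agent in parallel; results keep agent order."""
    with ThreadPoolExecutor(max_workers=len(llm_calls)) as pool:
        futures = [pool.submit(posterior_with_fallback, c, p, fv_t)
                   for c, p in zip(llm_calls, plan_one_calls)]
        return [f.result() for f in futures]
```

Because the fallback is resolved per agent, one failed request does not affect the other agents' updates in the same period.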

        Plan III · LLM update with risk label only

        $$ V_i^{\text{post}} \;\leftarrow\; \mathrm{LLM}\bigl(\pi_i^{\text{III}}\bigr), \qquad \pi_i^{\text{III}} \;\supset\; \{\,\text{risk-loving} \mid \text{risk-neutral} \mid \text{risk-averse}\,\} $$

        Identical wiring and execution semantics to Plan II. The sole difference: the prompt $\pi_i^{\text{III}}$ omits the closed-form $U_L / U_N / U_A$ expressions and supplies only a natural-language risk-preference label.

        Identification argument

        Plan II $\setminus$ Plan III $=$ the causal effect of providing an explicit functional form. If Plan II $\approx$ Plan III, the LLM has already internalised the mapping from "risk-loving" to convex preferences — the formula is redundant. If Plan II $\neq$ Plan III, the functional form carries information the label does not, and the LLM's implicit risk model diverges from the specified one.

        Both Plans II and III require an API key. Fallback semantics, clamping, and parallel dispatch are shared.

        Experimental design · treatments & parameters

        Factorial structure

        Belief channel: Plan I / Plan II / Plan III

        Risk composition: $(\alpha_L, \alpha_N, \alpha_A)$ summing to 100%

        Deception: honest / strategic misreporting

        Population: $N = 6$ Utility agents; risk split $(\alpha_L, \alpha_N, \alpha_A)$

        Session structure: $R = 4$ rounds of $T = 10$ periods

        Controls & reproducibility

        Seeded mulberry32 PRNG: identical $(\text{pop.}, \text{seed})$ → identical run under Plan I

        Endowments: $c_i \sim \mathcal{U}[800, 1200]$, $q_i \sim \mathcal{U}\{2,3,4\}$

        DLM strict mode: $N = 6$, type A/B, T2/T4 R4 replacement

        Tick resolution: $K$ ticks/period $\times\, T \times R$

        $$ \text{Session} \;=\; R \times T \times K \;=\; 4 \times 10 \times 18 \;=\; 720 \;\text{ticks}, \qquad \text{Batch} = 10 \;\text{sessions} \;=\; 7200 \;\text{ticks} $$
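
        The seeded-reproducibility control can be illustrated with a Python port of the commonly circulated mulberry32 routine. This is a sketch: the platform's own implementation may differ in detail, `draw_endowments` is a hypothetical helper, and drawing cash as integer cents on 800..1200 is an assumption (the text only states $c_i \sim \mathcal{U}[800, 1200]$).

```python
MASK32 = 0xFFFFFFFF

def mulberry32(seed):
    """Return a generator function yielding floats in [0, 1) from a 32-bit seed."""
    state = seed & MASK32

    def next_float():
        nonlocal state
        state = (state + 0x6D2B79F5) & MASK32
        t = state
        t = ((t ^ (t >> 15)) * (t | 1)) & MASK32
        t ^= (t + ((t ^ (t >> 7)) * (t | 61))) & MASK32
        return (t ^ (t >> 14)) / 4294967296

    return next_float

def draw_endowments(n_agents, seed):
    """Draw (cash, shares) per agent: cash on 800..1200, shares on {2, 3, 4}."""
    rng = mulberry32(seed)
    agents = []
    for _ in range(n_agents):
        cash = 800 + int(rng() * 401)   # assumption: integer cents, inclusive range
        shares = 2 + int(rng() * 3)     # uniform on {2, 3, 4}
        agents.append((cash, shares))
    return agents

run_a = draw_endowments(6, seed=12345)
run_b = draw_endowments(6, seed=12345)
```

        Because every draw flows from the single seed, two runs with the same population and seed produce the same endowments, which is the property the Plan I reproducibility claim relies on.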

        Data collection. Per-round metrics are labelled $\texttt{R\{r\}\_S\{s\}}$ (Round $r$ of Session $s$). One Start press runs all 10 sessions (5 × first treatment + 5 × other), collecting 40 round-level rows (4 rounds × 10 sessions) with mean deviation, turnover, trades, volume, and payoff.
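
        The row labels and tick counts can be checked with a few lines of Python. This is an illustrative sketch: `round_labels` and `treatment_for` are hypothetical helpers, and assigning sessions 1 to 5 to the first treatment is an assumption about ordering that the text does not state.

```python
R, T, K, SESSIONS = 4, 10, 18, 10  # rounds, periods, ticks per period, sessions

def round_labels(rounds=R, sessions=SESSIONS):
    """Labels of the form R{r}_S{s}, grouped session by session."""
    return [f"R{r}_S{s}" for s in range(1, sessions + 1) for r in range(1, rounds + 1)]

def treatment_for(session, sessions=SESSIONS):
    # Assumption: the first half of the batch runs the first treatment.
    return "first" if session <= sessions // 2 else "other"

labels = round_labels()
ticks_per_session = R * T * K
ticks_per_batch = ticks_per_session * SESSIONS
```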

        DLM replication · Strict-DLM mode

        The Strict-DLM paradigm replicates the DLM (2005) protocol: $N = 6$ agents, 3 type A ($c = 200$¢, $q = 6$) + 3 type B ($c = 600$¢, $q = 2$), so both types have the same expected buy-and-hold value $V_{\text{BH}} = c + q \cdot \mathrm{FV}_1 = 800$¢ at $\mathrm{FV}_1 = 100$¢. At the round 3→4 boundary, the engine runs Fisher-Yates replacement:

        T2 treatment (R4-⅔)

        $k = 2$ replaced, 4 veterans remain

        Two-thirds of the R4 population is experienced. Expected: bubble suppression carries over from R3.

        T4 treatment (R4-⅓)

        $k = 4$ replaced, 2 veterans remain

        One-third experienced. Expected: fresh agents reignite the bubble in R4 despite veteran presence.
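
        The replacement step for both treatments can be sketched as a Fisher-Yates shuffle followed by taking the first $k$ agents. This is a hedged illustration: `replace_agents` and the `experience` field are hypothetical, Python's `random.Random` stands in for the platform's PRNG, and the sketch does not try to preserve the type A/B balance among replacements.

```python
import random

def fisher_yates(items, rng):
    """Return a uniformly shuffled copy of items (Fisher-Yates, back to front)."""
    items = list(items)
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)          # 0 <= j <= i
        items[i], items[j] = items[j], items[i]
    return items

def replace_agents(agent_ids, k, rng):
    """Replace k randomly chosen agents with fresh ones before round 4."""
    replaced = set(fisher_yates(agent_ids, rng)[:k])
    # Assumption: veterans carry 3 rounds of experience, fresh agents carry 0.
    return [{"id": a, "experience": 0 if a in replaced else 3} for a in agent_ids]

ids = ["A1", "A2", "A3", "B1", "B2", "B3"]
t2 = replace_agents(ids, k=2, rng=random.Random(7))
t4 = replace_agents(ids, k=4, rng=random.Random(7))
```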

        One click of Start runs 10 animated sessions (5 $\times$ first treatment + 5 $\times$ second) at the Speed slider rate. Per-round metrics are collected with $\texttt{R\{r\}\_S\{s\}}$ labels (40 rows total) into the batch results table, with per-treatment aggregates for T2 vs T4 comparison.

        Market-quality metrics

        Haessel $R^2$

        $R^2_H = 1 - \dfrac{\sum_t (\bar{p}_t - \mathrm{FV}_t)^2}{\sum_t (\mathrm{FV}_t - \overline{\mathrm{FV}})^2}$

        Goodness-of-fit of mean transaction price to the fundamental staircase. $R^2_H \to 1$ indicates efficient pricing.

        Normalized deviation

        $\mathrm{ND} = \dfrac{\sum_j |p_j - \mathrm{FV}_{t(j)}| \cdot q_j}{Q}$

        Total absolute mispricing, weighted by trade size and normalised by shares outstanding $Q$. Comparable across $N$.

        Amplitude

        $A = \dfrac{\max_t (\bar{p}_t - \mathrm{FV}_t) - \min_t (\bar{p}_t - \mathrm{FV}_t)}{\mathrm{FV}_1}$

        Peak-to-trough excursion of the mean-price residual. Captures bubble height.

        Turnover & Allocative efficiency

        $\mathrm{TO} = \dfrac{\sum_t q_t}{Q}, \qquad \mathrm{AE} = \dfrac{U_{\text{realized}}}{U_{\max}}$

        TO measures speculative intensity; AE measures whether assets flow to highest-valuation holders.
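
        The price-based metrics above translate directly into code. The sketch below follows the formulas as written on this slide; the function names and the `(period, price, quantity)` trade tuple are assumptions, and AE is omitted because it depends on the utility definitions given elsewhere. The fundamental staircase uses the paper constants ($T = 10$, expected dividend 10¢), and $Q = 24$ is the Strict-DLM share count ($3 \times 6 + 3 \times 2$).

```python
def fundamental_values(T=10, expected_dividend=10):
    """FV_t for t = 1..T: remaining periods times the expected dividend."""
    return [expected_dividend * (T - t) for t in range(T)]

def haessel_r2(mean_prices, fv):
    """1 - sum (p_t - FV_t)^2 / sum (FV_t - mean(FV))^2, as defined above."""
    fv_bar = sum(fv) / len(fv)
    sse = sum((p - f) ** 2 for p, f in zip(mean_prices, fv))
    sst = sum((f - fv_bar) ** 2 for f in fv)
    return 1 - sse / sst

def normalized_deviation(trades, fv, Q):
    """Sum of |price - FV| * quantity over trades, divided by shares outstanding."""
    return sum(abs(p - fv[t - 1]) * q for t, p, q in trades) / Q

def amplitude(mean_prices, fv):
    """Peak-to-trough range of the mean-price residual, scaled by FV_1."""
    resid = [p - f for p, f in zip(mean_prices, fv)]
    return (max(resid) - min(resid)) / fv[0]

def turnover(trades, Q):
    """Total traded quantity divided by shares outstanding."""
    return sum(q for _, _, q in trades) / Q

fv = fundamental_values()
prices = [f + 10 for f in fv]              # illustrative: constant 10-cent premium
trades = [(1, 110, 2), (2, 80, 1)]         # illustrative (period, price, quantity)
r2 = haessel_r2(prices, fv)
nd = normalized_deviation(trades, fv, Q=24)
amp = amplitude(prices, fv)
to = turnover(trades, Q=24)
```

        A constant premium leaves the amplitude at zero while still lowering $R^2_H$, which is one reason the diagnostics are reported together rather than singly.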

        Hypotheses

        H1 (LLM–algorithmic equivalence). Under matched seeds and endowments, Plan II produces market-quality metrics $(R^2_H, \mathrm{ND}, A, \mathrm{TO})$ not significantly different from Plan I: $\Delta_{\text{II-I}} \approx 0$.

        H2 (Form–label divergence). Plan II posterior trajectories $\{V_i^{\text{post}}\}$ differ systematically from Plan III: the explicit functional form carries information that the risk label alone does not convey to the LLM.

        H3 (Risk composition effect). Increasing $\alpha_L$ (risk-loving share) amplifies bubble magnitude (higher $A$, lower $R^2_H$), while increasing $\alpha_A$ dampens it — this effect holds across all three plans.

        H4 (Deception–trust interaction). Strategic deception increases ND and decreases AE by corrupting the social-learning signal. The trust EMA partially mitigates this as $\tau_{r \to s} \to 0$ for persistent liars.

        Testable via within-seed paired comparisons (H1, H2) and across-seed Monte Carlo sweeps over the $(\alpha_L, \alpha_N, \alpha_A)$ simplex (H3, H4).

        Results · analysis framework

        The platform supports three levels of analysis, each targeting different hypotheses:

        Within-seed comparison

        Run identical $(\text{seed}, N, \alpha)$ under Plans I, II, III. Compare price trajectories tick-by-tick and metric vectors period-by-period. Directly tests H1 and H2.
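
        One way to summarise a within-seed comparison is a paired difference per seed and a paired $t$-statistic over seeds. The sketch below is an assumption about the analysis, not the platform's code; the input numbers are placeholders chosen for arithmetic clarity, not results.

```python
import math
import statistics

def paired_differences(metric_a, metric_b):
    """Per-seed differences between two plans run on the same seeds."""
    if len(metric_a) != len(metric_b):
        raise ValueError("both plans must be run on the same list of seeds")
    return [a - b for a, b in zip(metric_a, metric_b)]

def paired_t_statistic(diffs):
    """Mean difference divided by its standard error (needs >= 2 seeds)."""
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("zero variance in differences; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))

# Placeholder values only: one metric per seed under two plans.
plan_ii = [2.0, 4.0, 6.0]
plan_i = [1.0, 2.0, 3.0]
diffs = paired_differences(plan_ii, plan_i)
t_stat = paired_t_statistic(diffs)
```

        Pairing on the seed removes between-seed variance in endowments and dividend draws, so the statistic reflects only the change in belief channel.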

        Monte Carlo sweep

        Vary seeds and risk compositions across the $(\alpha_L, \alpha_N, \alpha_A)$ simplex. Aggregate metric distributions to test H3 and H4 under repeated sampling.
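
        With $N = 6$ agents the simplex is a finite grid, so a sweep can enumerate it exactly. The sketch below lists every integer split of six agents into risk-loving, risk-neutral, and risk-averse counts and crosses it with a seed list; `risk_compositions` and `sweep_plan` are hypothetical names.

```python
def risk_compositions(n_agents=6):
    """All (n_L, n_N, n_A) with non-negative integer counts summing to n_agents."""
    return [
        (n_l, n_n, n_agents - n_l - n_n)
        for n_l in range(n_agents + 1)
        for n_n in range(n_agents + 1 - n_l)
    ]

def sweep_plan(seeds, n_agents=6):
    """Cartesian product of seeds and risk compositions, one entry per run."""
    return [(seed, comp) for seed in seeds for comp in risk_compositions(n_agents)]

grid = risk_compositions()
runs = sweep_plan(seeds=range(5))
```

        For six agents the grid has 28 points, so five seeds give 140 runs per plan; the balanced split $(2, 2, 2)$ corresponds to the roughly one-third shares used as defaults.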

        DLM batch replication

        10-session animated batch (5 $\times$ T2 + 5 $\times$ T4). 40 round-level rows ($\texttt{R\{r\}\_S\{s\}}$) with per-treatment aggregates. Validate against DLM 2005 Table 2.

        $$ \text{Diagnostic vector:} \quad \mathbf{d} \;=\; \bigl(R^2_H,\; \mathrm{ND},\; A,\; \mathrm{TO},\; \mathrm{AE},\; \bar{\tau},\; \overline{|\text{lie}|}\bigr) $$

        Robustness & validation

        Internal validity

        Reproducibility: seeded PRNG guarantees identical runs under Plan I. Endowment edits preserve the engine seed.

        Replay system: append-only history arrays enable exact state reconstruction at any tick via $\texttt{buildViewAt}(t)$.

        Sensitivity: all tunables exposed as sliders with safe defaults via $\texttt{ctx.tunables}$; parameter sweeps are first-class.
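
        The replay idea can be sketched as follows: state is never mutated in place, trades are appended to a history, and the view at any tick is rebuilt by replaying the prefix of that history. The `History` class, its event tuple, and `build_view_at` are hypothetical Python analogues of the platform's `buildViewAt`, whose actual data layout is not shown here.

```python
class History:
    """Append-only trade log with state reconstruction at an arbitrary tick."""

    def __init__(self, initial):
        # initial: {agent: (cash, shares)}; stored immutably.
        self._initial = {a: tuple(v) for a, v in initial.items()}
        self._events = []

    def record_trade(self, tick, buyer, seller, price, qty):
        # Assumption: trades are recorded in non-decreasing tick order.
        self._events.append((tick, buyer, seller, price, qty))

    def build_view_at(self, tick):
        """Replay every trade with event tick <= tick on top of the initial state."""
        view = {a: list(v) for a, v in self._initial.items()}
        for ev_tick, buyer, seller, price, qty in self._events:
            if ev_tick > tick:
                break
            view[buyer][0] -= price * qty
            view[buyer][1] += qty
            view[seller][0] += price * qty
            view[seller][1] -= qty
        return {a: tuple(v) for a, v in view.items()}

h = History({"A": (200, 6), "B": (600, 2)})
h.record_trade(5, buyer="B", seller="A", price=100, qty=1)
h.record_trade(9, buyer="B", seller="A", price=120, qty=1)
```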

        External validity & limitations

        LLM stochasticity: temperature $> 0$ introduces irreducible noise; fallback to Plan I on failure introduces survivorship bias, since only successful calls contribute LLM-driven decisions.

        Single CDA environment: results may not generalise to call markets, limit-order books, or other auction formats.

        API latency: async LLM calls may interact with tick timing; clamped valuations bound extreme outputs.

        The platform's no-dependency, browser-native architecture eliminates environment-configuration confounds and enables full portability.

        Contributions

        Theoretical

        Formal expected-utility framework for heterogeneous belief formation in a CDA, connecting the SSW/DLM experimental-markets tradition to the Lopez-Lira EU-scoring approach. First controlled isolation of the LLM belief channel.

        Methodological

        Open-source, browser-native simulation platform with seeded PRNG reproducibility, replay, and multi-paradigm support (Strict-DLM, Lopez-Lira, AIPE). No build step, no dependencies.

        Empirical

        First within-substrate factorial comparison of algorithmic vs. LLM-augmented vs. label-only belief updating. Three-plan design enables causal identification of the information content of utility-function representations.

        Limitations & future work

        Current limitations

        LLM output is non-deterministic: temperature, context window, and model version affect reproducibility.

        Single auction format (CDA); generalisability to other market mechanisms untested.

        Agent heterogeneity limited to the strategy cube ($\text{bias} \times \text{belief} \times \text{risk}$); richer preference spaces unexplored.

        No human subjects — all agents are artificial; ecological validity requires lab validation.

        Extensions

        Multi-provider benchmarking (GPT-4o/5.4, Claude Opus/Sonnet 4.6, Gemini 3/3.1) to test model-dependence of H2.

        Hybrid markets: mix LLM agents with human subjects for ecological validity.

        Richer communication: multi-round dialogue and explicit reasoning chains in Plan II/III prompts.

        Field-data calibration: estimate $(b_i, \varepsilon)$ distributions from market microstructure data.

        Takeaway

        Can an LLM replicate — or improve upon — a calibrated algorithmic belief-update rule in an experimental asset market?
        $$ \text{Plan I} \;\;\overset{?}{\approx}\;\; \text{Plan II} \;\;\overset{?}{\approx}\;\; \text{Plan III} $$

        This platform provides the controlled experimental environment to answer that question: same market, same endowments, same seed — only the belief channel varies.

        Plan I · Algorithmic  |  Plan II · LLM + Utility Forms  |  Plan III · LLM + Risk Label
