Detailed Methodology: Sequential Decisions Study

Understanding the Research Methods Behind Our Rock-Paper-Scissors Experiment

📖 Study Overview

This page explains the detailed methodology from Mohammadi Sepahvand et al. (2014), which forms the foundation for our web experiment. The study compared two theories of how humans learn in uncertain environments using a Rock-Paper-Scissors task.

🎯 The Original Experiment

Participants

  • Healthy Controls (HC): Neurologically normal participants
  • Left Brain Damaged (LBD): Patients with left hemisphere brain damage
  • Right Brain Damaged (RBD): Patients with right hemisphere brain damage

Experimental Design

Task: Participants played 600 rounds of Rock-Paper-Scissors against a computer opponent.

Three Phases (participants were not told about these phases):

  1. Phase 1 (Trials 1-200): Computer plays randomly (33% each choice)
  2. Phase 2 (Trials 201-400): Computer plays rock 50% of the time (mildly biased)
  3. Phase 3 (Trials 401-600): Computer plays paper 80% of the time (heavily biased)

Analysis Focus: Only the last 200 trials (Phase 3) were analyzed, as no learning was detected in the first two phases. This heavy bias toward paper provided the clearest test of whether participants could detect and exploit patterns.
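
As a concrete illustration, the three-phase opponent described above can be implemented as a simple probability table keyed on trial number. This is a minimal sketch: the phase boundaries and bias values come from the summary above, the function names are ours, and we assume the remaining probability in the biased phases is split evenly between the other two moves, which the study summary does not spell out.

```typescript
type Move = "rock" | "paper" | "scissors";

// Phase-dependent move probabilities matching the design above.
// Trials 1-200: uniform; 201-400: rock-biased (50%); 401-600: paper-biased (80%).
function opponentProbabilities(trial: number): Record<Move, number> {
  if (trial <= 200) return { rock: 1 / 3, paper: 1 / 3, scissors: 1 / 3 };
  if (trial <= 400) return { rock: 0.5, paper: 0.25, scissors: 0.25 };
  return { rock: 0.1, paper: 0.8, scissors: 0.1 };
}

// Sample one move from the probability table for the given trial number (1-600).
function opponentMove(trial: number): Move {
  const probs = opponentProbabilities(trial);
  let r = Math.random();
  for (const move of ["rock", "paper", "scissors"] as Move[]) {
    r -= probs[move];
    if (r <= 0) return move;
  }
  return "scissors"; // guard against floating-point rounding
}
```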

🤖 The Two Computational Models

The researchers developed two competing models to explain how humans might learn in this task:

Model 1: ELPH (Statistical Learning)

Core Idea: Learn to predict what the opponent will play next, then choose the winning response.

How It Works:

  1. Memory: Keeps track of the last n moves in Short Term Memory (STM)
  2. Hypotheses: Generates all possible patterns from recent moves (e.g., "rock followed by paper")
  3. Prediction: Each hypothesis tracks what usually comes next and how often
  4. Selection: Uses entropy (a measure of randomness) to pick the most reliable, i.e. lowest-entropy, hypothesis
  5. Pruning: Deletes hypotheses that are too random or unreliable

Example: If the last two moves were "paper, rock", ELPH creates hypotheses like:

  • "After rock → usually see paper (60% of time)"
  • "After paper, rock → usually see rock (80% of time)"
  • "After paper → usually see rock (70% of time)"

ELPH picks the lowest-entropy (most consistent) hypothesis and predicts accordingly, as in the sketch below.
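
The following sketch captures the flavor of steps 2-5 in code. It is a simplification under our own assumptions (a single lowest-entropy hypothesis is chosen and its most frequent follow-up is predicted); the paper's exact update, pruning, and selection rules are not reproduced here.

```typescript
type Move = "rock" | "paper" | "scissors";
const MOVES: Move[] = ["rock", "paper", "scissors"];

// One hypothesis: a context pattern from STM plus counts of which
// opponent move followed that pattern.
interface Hypothesis {
  context: string;              // e.g. "paper,rock"
  counts: Record<Move, number>; // observed follow-up moves
}

// Shannon entropy of the hypothesis's prediction distribution (lower = more reliable).
function entropy(h: Hypothesis): number {
  const total = MOVES.reduce((sum, m) => sum + h.counts[m], 0);
  if (total === 0) return Infinity;
  let H = 0;
  for (const m of MOVES) {
    const p = h.counts[m] / total;
    if (p > 0) H -= p * Math.log2(p);
  }
  return H;
}

// Prune hypotheses above the entropy threshold, keep the most reliable one,
// and predict the opponent move it has seen most often.
function predictOpponent(hypotheses: Hypothesis[], entropyThreshold: number): Move | null {
  const reliable = hypotheses.filter((h) => entropy(h) <= entropyThreshold);
  if (reliable.length === 0) return null; // no usable hypothesis yet
  reliable.sort((a, b) => entropy(a) - entropy(b));
  const best = reliable[0];
  return MOVES.slice().sort((a, b) => best.counts[b] - best.counts[a])[0];
}
```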

Model 2: RELPH (Reinforcement Learning)

Core Idea: Learn directly which moves lead to wins in different situations.

How It Works:

  1. Memory: Same STM system as ELPH
  2. Hypotheses: Track which of your own moves worked well in each situation (rather than predicting the opponent's moves, as ELPH does)
  3. Rewards: +1 for wins, 0 for ties, -1 for losses
  4. Learning: Updates the value of each move based on recent outcomes
  5. Selection: Uses soft-max rule to balance trying the best move vs. exploring alternatives

Example: When the last two moves were "paper, rock", RELPH remembers:

  • "In this situation, playing scissors gave me +3 total reward"
  • "Playing rock gave me -1 total reward"
  • "Playing paper gave me +1 total reward"

It's most likely to choose scissors, but might sometimes explore other options.

Key Parameters:

  • Learning rate (α): How much weight to give recent vs. old experiences
  • Soft-max temperature: Balance between exploitation (best move) and exploration (trying alternatives)
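
A minimal sketch of these ideas, assuming an incremental value update (the example above describes summed rewards; the paper's exact rule may differ) and a standard soft-max readout. The function and variable names are ours.

```typescript
type Move = "rock" | "paper" | "scissors";
const MOVES: Move[] = ["rock", "paper", "scissors"];

// Value of each of our own moves in a given STM context (e.g. "paper,rock").
const values = new Map<string, Record<Move, number>>();

function getValues(context: string): Record<Move, number> {
  if (!values.has(context)) {
    values.set(context, { rock: 0, paper: 0, scissors: 0 });
  }
  return values.get(context)!;
}

// Learning: nudge the chosen move's value toward the observed reward
// (+1 win, 0 tie, -1 loss). alpha is the learning rate.
function update(context: string, move: Move, reward: number, alpha: number): void {
  const v = getValues(context);
  v[move] += alpha * (reward - v[move]);
}

// Selection: soft-max over values. Higher-valued moves are chosen more often,
// but alternatives are still explored; temperature controls the balance.
function chooseMove(context: string, temperature: number): Move {
  const v = getValues(context);
  const weights = MOVES.map((m) => Math.exp(v[m] / temperature));
  const total = weights.reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (let i = 0; i < MOVES.length; i++) {
    r -= weights[i];
    if (r <= 0) return MOVES[i];
  }
  return MOVES[MOVES.length - 1]; // floating-point guard
}
```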

📊 Model Comparison & Parameter Estimation

Fitting Models to Human Data

For each human participant, the researchers found the best parameter settings that made each model play most similarly to that person's actual choices.

Parameters Tested:

  • n (STM length): How many recent moves to remember (typically 1-3)
  • Hthr (entropy threshold): The maximum entropy (unpredictability) a hypothesis can have before it is pruned
  • α (learning rate): RELPH only - how quickly to update from new experiences
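
One straightforward way to perform this kind of fitting is a grid search that scores each candidate parameter setting by how likely the model is to reproduce the participant's actual choice sequence. The sketch below illustrates that idea only; the parameter grids, the Params shape, and the modelLogLikelihood helper are hypothetical placeholders, not the paper's actual procedure.

```typescript
interface Trial {
  humanMove: string;
  opponentMove: string;
}

interface Params {
  n: number;                // STM length
  entropyThreshold: number; // Hthr
  alpha: number;            // learning rate (RELPH only)
}

// Hypothetical helper: summed log-probability the model assigns to the
// participant's actual moves under the given parameters.
declare function modelLogLikelihood(trials: Trial[], params: Params): number;

// Exhaustively try each parameter combination and keep the best-scoring one.
function fitParameters(trials: Trial[]): { best: Params; logLik: number } {
  let best: Params = { n: 1, entropyThreshold: 1.0, alpha: 0.1 };
  let bestLogLik = -Infinity;
  for (const n of [1, 2, 3]) {
    for (const entropyThreshold of [0.5, 1.0, 1.5]) {
      for (const alpha of [0.05, 0.1, 0.2, 0.4]) {
        const logLik = modelLogLikelihood(trials, { n, entropyThreshold, alpha });
        if (logLik > bestLogLik) {
          bestLogLik = logLik;
          best = { n, entropyThreshold, alpha };
        }
      }
    }
  }
  return { best, logLik: bestLogLik };
}
```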

Model Evaluation

Models were compared using:

  • Win Rates: How well each model performed against the computer
  • Learning Curves: How quickly performance improved over time
  • Choice Sequences: How closely models matched human choice patterns
  • Statistical Measures: AIC and Bayes factors to compare model quality
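
For reference, AIC trades off goodness of fit against model complexity: AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood; lower AIC is better. A one-line helper:

```typescript
// Akaike Information Criterion: k = number of free parameters,
// logLik = maximized log-likelihood. Lower AIC indicates a better
// trade-off between fit and model complexity.
function aic(k: number, logLik: number): number {
  return 2 * k - 2 * logLik;
}
```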

🔍 Key Findings

RELPH vs. ELPH Performance

The researchers used both models to predict human behavior patterns by fitting them to the actual choice sequences participants made.

  • ELPH Problem: When fitted to the same data, ELPH learned the 80% paper bias faster than humans actually did
  • RELPH Success: RELPH's learning curve matched human learning rates and final performance
  • Implication: Humans use reinforcement learning (reward-based) rather than statistical learning (prediction-based)

Brain Damage Effects

  • Healthy Controls: Best modeled by standard RELPH with exploration/exploitation balance
  • Left Brain Damage: Best modeled by "greedy" RELPH - always chose the best option without exploration
  • Right Brain Damage: Best modeled by RELPH with poor exploration strategy - gave up on hypotheses too quickly

💡 Implications for Our Web Experiment

Recreating the Original Experiment

First, we'll implement the core experimental design from the original study:

  • Three-Phase Structure: Random play (33% each), mild bias (50% rock), heavy bias (80% paper)
  • Biased Computer Opponent: Simple probabilistic opponent matching the original study's design
  • Data Collection: Track participant choices, reaction times, and win rates to analyze learning patterns (see the example record after this list)
  • Learning Analysis: Apply RELPH/ELPH models to analyze participant behavior patterns
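
As a concrete example of the data collection point above, each round could be logged as a record along these lines. The field names are our choice, not from the original study.

```typescript
// Per-trial record the web experiment could log for later model-based analysis.
interface TrialRecord {
  trial: number;                               // 1-600
  phase: 1 | 2 | 3;                            // which bias phase the trial falls in
  playerMove: "rock" | "paper" | "scissors";
  opponentMove: "rock" | "paper" | "scissors";
  outcome: "win" | "tie" | "loss";
  reactionTimeMs: number;                      // stimulus onset to choice
  timestamp: number;                           // e.g. Date.now() at choice time
}
```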

Extensions: AI Opponents & Detection

Beyond the original experiment, we can explore new research questions:

  • ELPH AI Opponent: Computer opponent that tries to predict player patterns (statistical learning)
  • RELPH AI Opponent: Computer opponent that learns from wins/losses (reinforcement learning)
  • AI Detection Task: Can participants identify when they're playing against different AI learning strategies?
  • Strategy Comparison: How do participants adapt differently to various AI learning approaches?

Research Questions We Can Explore

Original Study Questions:

  • Do healthy web participants show RELPH-like learning patterns?
  • Can we detect individual differences in exploration vs. exploitation strategies?
  • How quickly do participants adapt to biased opponents?
  • What factors influence learning rate and final performance?

Extension Questions:

  • How do participants perform against AI opponents using different learning strategies?
  • Can participants detect whether they're playing against statistical vs. reinforcement learning AI?
  • Do participants change their strategies when they think they're playing against AI vs. humans?
  • Which AI learning approach creates more engaging or challenging gameplay?

🔧 Technical Implementation Notes

Algorithm Implementation

For our web experiment, we'll implement:

  • RELPH AI: A computer opponent that learns from wins/losses and balances exploration/exploitation
  • ELPH AI: A computer opponent that tries to predict player patterns
  • Biased AIs: Simple algorithms that favor specific moves with varying probabilities
  • Data Logging: Track all choices, outcomes, reaction times, and game states
  • Analysis Tools: Apply RELPH/ELPH models to analyze participant behavior patterns
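
To show how these pieces might fit together, here is a hedged sketch of a single round of the game loop. AiOpponent is a hypothetical interface that a biased, ELPH-style, or RELPH-style opponent could implement; the outcome logic is standard Rock-Paper-Scissors.

```typescript
type Move = "rock" | "paper" | "scissors";

// Hypothetical interface any of the opponents above could implement.
interface AiOpponent {
  chooseMove(): Move;                              // opponent's next move
  observe(playerMove: Move, ownMove: Move): void;  // learn from the round
}

// Standard RPS outcome from the player's perspective.
function outcome(player: Move, opponent: Move): "win" | "tie" | "loss" {
  if (player === opponent) return "tie";
  const beats: Record<Move, Move> = { rock: "scissors", paper: "rock", scissors: "paper" };
  return beats[player] === opponent ? "win" : "loss";
}

// One round: get the AI's move, score it, let a learning opponent update,
// and return the result for the data logger.
function playRound(ai: AiOpponent, playerMove: Move) {
  const aiMove = ai.chooseMove();
  const result = outcome(playerMove, aiMove);
  ai.observe(playerMove, aiMove);
  return { playerMove, aiMove, result };
}
```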