Detailed Methodology: Sequential Decisions Study

Understanding the Research Methods Behind Our Rock-Paper-Scissors Experiment

📖 Study Overview

This page explains the detailed methodology from Mohammadi Sepahvand et al. (2014), which forms the foundation for our web experiment. The study compared two theories of how humans learn in uncertain environments using a Rock-Paper-Scissors task.

🎯 The Original Experiment

Participants

  • Healthy Controls (HC): Neurologically normal participants
  • Left Brain Damaged (LBD): Patients with left hemisphere brain damage
  • Right Brain Damaged (RBD): Patients with right hemisphere brain damage

Experimental Design

Task: Participants played 600 rounds of Rock-Paper-Scissors against a computer opponent.

Three Phases (participants were not told about these phases):

  1. Phase 1 (Trials 1-200): Computer plays randomly (33% each choice)
  2. Phase 2 (Trials 201-400): Computer plays rock 50% of the time (mildly biased)
  3. Phase 3 (Trials 401-600): Computer plays paper 80% of the time (heavily biased)

Analysis Focus: Only the last 200 trials (Phase 3) were analyzed, as no learning was detected in the first two phases. This heavy bias toward paper provided the clearest test of whether participants could detect and exploit patterns.
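
As a concrete illustration, the three-phase opponent described above can be implemented as a simple probability table keyed on trial number. This is a minimal sketch: the phase boundaries and bias values come from the summary above, the function names are ours, and we assume the remaining probability in the biased phases is split evenly between the other two moves, which the study summary does not spell out.

```typescript
type Move = "rock" | "paper" | "scissors";

// Phase-dependent move probabilities matching the design above.
// Trials 1-200: uniform; 201-400: rock-biased (50%); 401-600: paper-biased (80%).
function opponentProbabilities(trial: number): Record<Move, number> {
  if (trial <= 200) return { rock: 1 / 3, paper: 1 / 3, scissors: 1 / 3 };
  if (trial <= 400) return { rock: 0.5, paper: 0.25, scissors: 0.25 };
  return { rock: 0.1, paper: 0.8, scissors: 0.1 };
}

// Sample one move from the probability table for the given trial number (1-600).
function opponentMove(trial: number): Move {
  const probs = opponentProbabilities(trial);
  let r = Math.random();
  for (const move of ["rock", "paper", "scissors"] as Move[]) {
    r -= probs[move];
    if (r <= 0) return move;
  }
  return "scissors"; // guard against floating-point rounding
}
```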

🤖 The Two Computational Models

The researchers developed two competing models to explain how humans might learn in this task:

Model 1: ELPH (Statistical Learning)

Core Idea: Learn to predict what the opponent will play next, then choose the winning response.

How It Works:

  1. Memory: Keeps track of the last n moves in Short Term Memory (STM)
  2. Hypotheses: Generates all possible patterns from recent moves (e.g., "rock followed by paper")
  3. Prediction: Each hypothesis tracks what usually comes next and how often
  4. Selection: Uses entropy (a measure of randomness) to pick the most reliable, i.e. lowest-entropy, hypothesis
  5. Pruning: Deletes hypotheses that are too random or unreliable

Example: If the last two moves were "paper, rock", ELPH creates hypotheses like:

  • "After rock → usually see paper (60% of time)"
  • "After paper, rock → usually see rock (80% of time)"
  • "After paper → usually see rock (70% of time)"

ELPH picks the lowest-entropy (most consistent) hypothesis and predicts accordingly, as in the sketch below.
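
The following sketch captures the flavor of steps 2-5 in code. It is a simplification under our own assumptions (a single lowest-entropy hypothesis is chosen and its most frequent follow-up is predicted); the paper's exact update, pruning, and selection rules are not reproduced here.

```typescript
type Move = "rock" | "paper" | "scissors";
const MOVES: Move[] = ["rock", "paper", "scissors"];

// One hypothesis: a context pattern from STM plus counts of which
// opponent move followed that pattern.
interface Hypothesis {
  context: string;              // e.g. "paper,rock"
  counts: Record<Move, number>; // observed follow-up moves
}

// Shannon entropy of the hypothesis's prediction distribution (lower = more reliable).
function entropy(h: Hypothesis): number {
  const total = MOVES.reduce((sum, m) => sum + h.counts[m], 0);
  if (total === 0) return Infinity;
  let H = 0;
  for (const m of MOVES) {
    const p = h.counts[m] / total;
    if (p > 0) H -= p * Math.log2(p);
  }
  return H;
}

// Prune hypotheses above the entropy threshold, keep the most reliable one,
// and predict the opponent move it has seen most often.
function predictOpponent(hypotheses: Hypothesis[], entropyThreshold: number): Move | null {
  const reliable = hypotheses.filter((h) => entropy(h) <= entropyThreshold);
  if (reliable.length === 0) return null; // no usable hypothesis yet
  reliable.sort((a, b) => entropy(a) - entropy(b));
  const best = reliable[0];
  return MOVES.slice().sort((a, b) => best.counts[b] - best.counts[a])[0];
}
```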

Model 2: RELPH (Reinforcement Learning)

Core Idea: Learn directly which moves lead to wins in different situations.

How It Works:

  1. Memory: Same STM system as ELPH
  2. Hypotheses: Track which of your own moves worked well in each situation (rather than predicting the opponent's moves, as ELPH does)
  3. Rewards: +1 for wins, 0 for ties, -1 for losses
  4. Learning: Updates the value of each move based on recent outcomes
  5. Selection: Uses soft-max rule to balance trying the best move vs. exploring alternatives

Example: When the last two moves were "paper, rock", RELPH remembers:

  • "In this situation, playing scissors gave me +3 total reward"
  • "Playing rock gave me -1 total reward"
  • "Playing paper gave me +1 total reward"

It's most likely to choose scissors, but might sometimes explore other options.

Key Parameters:

  • Learning rate (α): How much weight to give recent vs. old experiences
  • Soft-max temperature: Balance between exploitation (best move) and exploration (trying alternatives)
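
A minimal sketch of these ideas, assuming an incremental value update (the example above describes summed rewards; the paper's exact rule may differ) and a standard soft-max readout. The function and variable names are ours.

```typescript
type Move = "rock" | "paper" | "scissors";
const MOVES: Move[] = ["rock", "paper", "scissors"];

// Value of each of our own moves in a given STM context (e.g. "paper,rock").
const values = new Map<string, Record<Move, number>>();

function getValues(context: string): Record<Move, number> {
  if (!values.has(context)) {
    values.set(context, { rock: 0, paper: 0, scissors: 0 });
  }
  return values.get(context)!;
}

// Learning: nudge the chosen move's value toward the observed reward
// (+1 win, 0 tie, -1 loss). alpha is the learning rate.
function update(context: string, move: Move, reward: number, alpha: number): void {
  const v = getValues(context);
  v[move] += alpha * (reward - v[move]);
}

// Selection: soft-max over values. Higher-valued moves are chosen more often,
// but alternatives are still explored; temperature controls the balance.
function chooseMove(context: string, temperature: number): Move {
  const v = getValues(context);
  const weights = MOVES.map((m) => Math.exp(v[m] / temperature));
  const total = weights.reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (let i = 0; i < MOVES.length; i++) {
    r -= weights[i];
    if (r <= 0) return MOVES[i];
  }
  return MOVES[MOVES.length - 1]; // floating-point guard
}
```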

📊 Model Comparison & Parameter Estimation

Fitting Models to Human Data

For each human participant, the researchers found the best parameter settings that made each model play most similarly to that person's actual choices.

Parameters Tested:

  • n (STM length): How many recent moves to remember (typically 1-3)
  • Hthr (entropy threshold): The maximum entropy (unpredictability) a hypothesis can have before it is pruned
  • α (learning rate): RELPH only - how quickly to update from new experiences
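
One straightforward way to perform this kind of fitting is a grid search that scores each candidate parameter setting by how likely the model is to reproduce the participant's actual choice sequence. The sketch below illustrates that idea only; the parameter grids, the Params shape, and the modelLogLikelihood helper are hypothetical placeholders, not the paper's actual procedure.

```typescript
interface Trial {
  humanMove: string;
  opponentMove: string;
}

interface Params {
  n: number;                // STM length
  entropyThreshold: number; // Hthr
  alpha: number;            // learning rate (RELPH only)
}

// Hypothetical helper: summed log-probability the model assigns to the
// participant's actual moves under the given parameters.
declare function modelLogLikelihood(trials: Trial[], params: Params): number;

// Exhaustively try each parameter combination and keep the best-scoring one.
function fitParameters(trials: Trial[]): { best: Params; logLik: number } {
  let best: Params = { n: 1, entropyThreshold: 1.0, alpha: 0.1 };
  let bestLogLik = -Infinity;
  for (const n of [1, 2, 3]) {
    for (const entropyThreshold of [0.5, 1.0, 1.5]) {
      for (const alpha of [0.05, 0.1, 0.2, 0.4]) {
        const logLik = modelLogLikelihood(trials, { n, entropyThreshold, alpha });
        if (logLik > bestLogLik) {
          bestLogLik = logLik;
          best = { n, entropyThreshold, alpha };
        }
      }
    }
  }
  return { best, logLik: bestLogLik };
}
```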

Model Evaluation

Models were compared using:

  • Win Rates: How well each model performed against the computer
  • Learning Curves: How quickly performance improved over time
  • Choice Sequences: How closely models matched human choice patterns
  • Statistical Measures: AIC and Bayes factors to compare model quality
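
For reference, AIC trades off goodness of fit against model complexity: AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood; lower AIC is better. A one-line helper:

```typescript
// Akaike Information Criterion: k = number of free parameters,
// logLik = maximized log-likelihood. Lower AIC indicates a better
// trade-off between fit and model complexity.
function aic(k: number, logLik: number): number {
  return 2 * k - 2 * logLik;
}
```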

🔍 Key Findings

RELPH vs. ELPH Performance

The researchers used both models to predict human behavior patterns by fitting them to the actual choice sequences participants made.

  • ELPH Problem: When fitted to the same data, ELPH learned the 80% paper bias faster than humans actually did
  • RELPH Success: RELPH's learning curve matched human learning rates and final performance
  • Implication: Humans use reinforcement learning (reward-based) rather than statistical learning (prediction-based)

Brain Damage Effects

  • Healthy Controls: Best modeled by standard RELPH with exploration/exploitation balance
  • Left Brain Damage: Best modeled by "greedy" RELPH - always chose the best option without exploration
  • Right Brain Damage: Best modeled by RELPH with poor exploration strategy - gave up on hypotheses too quickly

💡 Implications for Our Web Experiment

Recreating the Original Experiment

First, we'll implement the core experimental design from the original study:

  • Three-Phase Structure: Random play (33% each), mild bias (50% rock), heavy bias (80% paper)
  • Biased Computer Opponent: Simple probabilistic opponent matching the original study's design
  • Data Collection: Track participant choices, reaction times, and win rates to analyze learning patterns (see the example record after this list)
  • Learning Analysis: Apply RELPH/ELPH models to analyze participant behavior patterns
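
As a concrete example of the data collection point above, each round could be logged as a record along these lines. The field names are our choice, not from the original study.

```typescript
// Per-trial record the web experiment could log for later model-based analysis.
interface TrialRecord {
  trial: number;                               // 1-600
  phase: 1 | 2 | 3;                            // which bias phase the trial falls in
  playerMove: "rock" | "paper" | "scissors";
  opponentMove: "rock" | "paper" | "scissors";
  outcome: "win" | "tie" | "loss";
  reactionTimeMs: number;                      // stimulus onset to choice
  timestamp: number;                           // e.g. Date.now() at choice time
}
```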

Extensions: AI Opponents & Detection

Beyond the original experiment, we can explore new research questions:

  • ELPH AI Opponent: Computer opponent that tries to predict player patterns (statistical learning)
  • RELPH AI Opponent: Computer opponent that learns from wins/losses (reinforcement learning)
  • AI Detection Task: Can participants identify when they're playing against different AI learning strategies?
  • Strategy Comparison: How do participants adapt differently to various AI learning approaches?

Research Questions We Can Explore

Original Study Questions:

  • Do healthy web participants show RELPH-like learning patterns?
  • Can we detect individual differences in exploration vs. exploitation strategies?
  • How quickly do participants adapt to biased opponents?
  • What factors influence learning rate and final performance?

Extension Questions:

  • How do participants perform against AI opponents using different learning strategies?
  • Can participants detect whether they're playing against statistical vs. reinforcement learning AI?
  • Do participants change their strategies when they think they're playing against AI vs. humans?
  • Which AI learning approach creates more engaging or challenging gameplay?

🔧 Technical Implementation Notes

Algorithm Implementation

For our web experiment, we'll implement:

  • RELPH AI: A computer opponent that learns from wins/losses and balances exploration/exploitation
  • ELPH AI: A computer opponent that tries to predict player patterns
  • Biased AIs: Simple algorithms that favor specific moves with varying probabilities
  • Data Logging: Track all choices, outcomes, reaction times, and game states
  • Analysis Tools: Apply RELPH/ELPH models to analyze participant behavior patterns
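
To show how these pieces might fit together, here is a hedged sketch of a single round of the game loop. AiOpponent is a hypothetical interface that a biased, ELPH-style, or RELPH-style opponent could implement; the outcome logic is standard Rock-Paper-Scissors.

```typescript
type Move = "rock" | "paper" | "scissors";

// Hypothetical interface any of the opponents above could implement.
interface AiOpponent {
  chooseMove(): Move;                              // opponent's next move
  observe(playerMove: Move, ownMove: Move): void;  // learn from the round
}

// Standard RPS outcome from the player's perspective.
function outcome(player: Move, opponent: Move): "win" | "tie" | "loss" {
  if (player === opponent) return "tie";
  const beats: Record<Move, Move> = { rock: "scissors", paper: "rock", scissors: "paper" };
  return beats[player] === opponent ? "win" : "loss";
}

// One round: get the AI's move, score it, let a learning opponent update,
// and return the result for the data logger.
function playRound(ai: AiOpponent, playerMove: Move) {
  const aiMove = ai.chooseMove();
  const result = outcome(playerMove, aiMove);
  ai.observe(playerMove, aiMove);
  return { playerMove, aiMove, result };
}
```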