Sigil Harvesting minigame testing - elanthia-online/dr-scripts GitHub Wiki

Sigil Harvesting: Developer Reference

Development guide for sigilharvest.lic (v2.0.0)
Tests: spec/sigilharvest_spec.rb (129 examples)
Analyzer: sigilharvest_analyzer.rb

1. Quick Reference

Files

| File | Purpose |
| --- | --- |
| sigilharvest.lic | Script source, v2.0.0 |
| sigilharvest_analyzer.rb | Log analyzer (parses banners + per-sigil summaries) |
| spec/sigilharvest_spec.rb | RSpec test suite (129 examples, current with v2.0.0) |
| session_splitter.rb | Log session extractor (handles split sessions across files) |
| spec/spec_helper.rb | Shared test mocks (Lich, XMLData, Script, etc.) |
| data/sigils.yaml | Room lists by [City][SigilType][Season] |

Commands

# Run tests
bundle exec rspec spec/sigilharvest_spec.rb

# Lint after changes
rubocop sigilharvest.lic
rubocop spec/sigilharvest_spec.rb

# Extract sessions from game logs (handles split sessions)
ruby session_splitter.rb --version X.Y.Z path/to/*.log
ruby session_splitter.rb --dry-run --version X.Y.Z path/to/*.log

Version

Always bump VERSION in sigilharvest.lic:6 when changing the script. Current: VERSION = '2.0.0'

Script Invocation

;sigilharvest <city> <sigil> <precision> [minutes] [debug]
;sigilharvest Shard permutation 90 60 debug

2. Code Architecture

Class: SigilHarvest

Single class. initialize (line 8) sets up all state and runs the main loop. Tests bypass initialize entirely using SigilHarvest.allocate + instance_variable_set.

Method Map

| Method | Line | Visibility | Purpose |
| --- | --- | --- | --- |
| initialize | 8 | public | Setup state, parse args, start main loop |
| find_sigils(city, sigil) | 89 | public | Outer loop: iterate rooms, call harvest_sigil |
| harvest_sigil(sigil) | 114 | public | Find sigil in room, run improvement loop |
| check_sigil(sigil) | 162 | public | Verify sigil type matches target |
| improve_sigil(precision) | 178 | public | Core algorithm: one iteration of action selection |
| sigil_info(command) | 296 | public | Send perc sigil <cmd>, parse response via serial waitfor |
| scribe_sigils | 371 | public | Scribe sigil onto scroll, manage inventory |
| get_season | 392 | public | Query game for current season |
| get_techniques | 397 | private | Detect active harvesting techniques |
| get_scrolls | 416 | public | Buy blank scrolls if below stock level |
| log_startup_banner | 457 | private | Print session config + techniques |
| log_sigil_summary(sigil, result) | 475 | private | Log per-sigil result line (with total elapsed) |
| log_exit_summary | 484 | private | Print session statistics |
| format_techniques(techniques) | 513 | private | Format technique array for display |
| elapsed_minutes | 519 | private | Minutes since session start |
| sigil_elapsed_minutes | 523 | private | Minutes since current sigil started |
| time_expired? | 527 | private | True when elapsed >= time limit |
| contest_stat_for(resource) | 532 | private | Map resource name to level ivar |
| precision_action_viable?(action, contest_stat, precision) | 544 | private | Viability gate for precision actions |
| select_repair_action(action, ...) | 559 | private | Check if action qualifies as resource repair; yields if yes |

Call Flow

initialize
  → get_techniques()         # detect Inspired/Enlightened
  → log_startup_banner()
  → find_sigils(city, sigil)
      → harvest_sigil(sigil)
          → sigil_info('improve')     # first call: parse initial menu
          → improve_sigil(precision)   # loop: returns true to continue, false to stop
              → precision_action_viable?()
              → select_repair_action()
              → sigil_info(verb)       # execute chosen action
              → scribe_sigils()        # if target reached
          → log_sigil_summary(sigil, result)
  → log_exit_summary()

Instance Variables (State)

| Variable | Type | Default | Set By | Purpose |
| --- | --- | --- | --- | --- |
| @sigil_precision | Integer | 0 | sigil_info | Current sigil precision (0-100) |
| @sigil_clarity | Integer | 0 | sigil_info | Current sigil clarity (0-100) |
| @danger_lvl | Integer | 0 | sigil_info | Danger meter (0-20 stars) |
| @sanity_lvl | Integer | 15 | sigil_info | Sanity resource (0-20 stars) |
| @resolve_lvl | Integer | 15 | sigil_info | Resolve resource (0-20 stars) |
| @focus_lvl | Integer | 15 | sigil_info | Focus resource (0-20 stars) |
| @num_iterations | Integer | 0 | harvest_sigil | Iterations used this sigil (cap: 15) |
| @num_aspect_repairs | Integer | 0 | improve_sigil | Repair actions taken this sigil |
| @sigil_improvement | Array[Hash] | [] | sigil_info | Current action menu (3-8 actions) |
| @sigil_count | Integer | 0 | harvest_sigil | Total sigils encountered |
| @sigil_results | Array[Hash] | [] | log_sigil_summary | Per-sigil outcome records |
| @scribed_in_session | Boolean | false | find_sigils | True after first successful scribe |
| @rooms_visited | Integer | 0 | find_sigils | Rooms traversed |
| @start_time | Time | Time.now | initialize | Session start |
| @sigil_start_time | Time | Time.now | harvest_sigil | Current sigil start |
| @time_limit | Integer | 30 | initialize | Minutes before auto-stop |

Action Hash Structure

Each entry in @sigil_improvement is a Hash:

{
  "difficulty" => Integer,  # 1-5 (trivial..formidable)
  "resource"   => String,   # "sanity" | "resolve" | "focus"
  "impact"     => Integer,  # 1-3 (taxing..destroying)
  "verb"       => String,   # game verb to execute (e.g., "analyze", "study")
  "target"     => String,   # "sigil" (improve) | "your" (repair)
  "aspect"     => String,   # "precision" | "quality" | resource name (repair)
  "risk"       => Integer   # difficulty + impact
}

3. Core Algorithm: improve_sigil (line 178)

Called once per iteration. Returns true to continue loop, false to stop.

v1.5.0 algorithm = v1.2.0 algorithm (reverted from v1.4.1, see §9 for why). Key differences from v1.4.1: no resource floor check, no high_target mode, formidable actions allowed, risk-based action selection. v1.5.0 itself also shipped without the move budget check; that check was restored in v1.5.2 and appears in the decision tree below.

Decision Tree

improve_sigil(precision)
│
├─ Phase 1: Pre-scan (lines 187-203)
│  Scan @sigil_improvement for precision actions with tight margin (stat - diff < 2)
│  and difficulty >= 3. Store as best_repair_aspect / second_best_repair_aspect.
│  Purpose: identify which resource to repair proactively.
│
├─ Phase 2: Select action (lines 205-244)
│  For each action in @sigil_improvement:
│  │
│  ├─ Precision action? (aspect == "precision")
│  │  ├─ precision_action_viable?() → check margin, accept formidable
│  │  └─ Selection priority (lines 217-229):
│  │     ├─ First viable action → stored unconditionally
│  │     ├─ precision < (target - 20) → prefer LOWEST risk (far from goal)
│  │     └─ precision >= (target - 20) → prefer HIGHEST risk (close to goal)
│  │
│  └─ Repair candidate? → select_repair_action() (lines 234-243)
│
├─ Phase 3: Bail-out checks (lines 248-288)
│  ├─ Scribe check (line 251): precision >= target, OR near cap + within 5
│  ├─ Iteration cap (line 261): iterations >= 15
│  ├─ Resource exhaustion (line 270): (san+res+foc)*2.25 + prec < target-5
│  └─ Move budget (line 283): (14-iters)*15 < (target-prec-5), when prec <= 80
│
├─ Phase 4: Execute or refresh (lines 277-291)
│  ├─ Repair override (lines 278-284): use repair if no precision action found
│  └─ Execute action or refresh (lines 287-291)
│
└─ return true (continue loop)

Selection Priority Matrix

Given two precision actions that both pass viability:

| Condition | Prefers | Rationale |
| --- | --- | --- |
| precision < target - 20 | Lowest risk | Far from goal, conserve resources |
| precision >= target - 20 | Highest risk | Close to goal, sprint to finish |

Note: No danger-based switching. No high_target mode. This simpler strategy outperforms the v1.4.1 approach in production data (see §9).
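
The matrix can be sketched as a small chooser. This is an illustration of the rule as written in this section, not the code in sigilharvest.lic: `pick_by_risk` is a hypothetical name, and the actions are assumed to be hashes shaped like the entries in @sigil_improvement that have already passed the viability gate.

```ruby
# Hypothetical helper illustrating the risk-based preference described above.
# Not part of sigilharvest.lic.
def pick_by_risk(viable_actions, current_precision, target)
  return nil if viable_actions.empty?

  if current_precision < target - 20
    viable_actions.min_by { |a| a['risk'] } # far from goal: conserve resources
  else
    viable_actions.max_by { |a| a['risk'] } # close to goal: sprint to finish
  end
end

actions = [
  { 'verb' => 'study',   'risk' => 3 },
  { 'verb' => 'analyze', 'risk' => 6 }
]

far_choice  = pick_by_risk(actions, 40, 90) # 40 < 70, so lowest risk wins
near_choice = pick_by_risk(actions, 75, 90) # 75 >= 70, so highest risk wins
```

With a target of 90 the switch-over point is precision 70: below it the sketch returns the `study` action, at or above it the `analyze` action.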


4. Key Formulas

Viability Filter (line 544)

def precision_action_viable?(action, contest_stat, precision)
  difficulty = action['difficulty'].to_i
  margin = contest_stat - difficulty

  # Path 1: comfortable margin (>1): accept any difficulty, including trivial
  return true if margin > 1

  # Path 2: any margin (>0) AND challenging+ difficulty
  return true if margin > 0 && difficulty > 2

  false
end

Key properties:

  • Allows formidable (difficulty=5), unlike v1.4.1, which blocked them.
  • Accepts trivial actions (since v1.5.4/EXP-2): any action with margin > 1 is viable.
  • Path 2 still requires challenging+ for tight margins.
  • precision parameter retained for API stability but not currently used by either path.
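
To see how the two paths interact at the edges, the predicate can be copied into a free function and probed with a few stat/difficulty pairs. This is a stand-alone sketch for experimentation: `viable?` is a hypothetical name and it drops the unused precision parameter.

```ruby
# Stand-alone copy of the two viability paths shown above, for experimentation only.
def viable?(contest_stat, difficulty)
  margin = contest_stat - difficulty
  return true if margin > 1                   # Path 1: comfortable margin
  return true if margin > 0 && difficulty > 2 # Path 2: tight margin, challenging+

  false
end

checks = {
  'stat 5 vs trivial (1)'         => viable?(5, 1), # margin 4: Path 1
  'stat 3 vs straightforward (2)' => viable?(3, 2), # margin 1, difficulty 2: rejected
  'stat 4 vs challenging (3)'     => viable?(4, 3), # margin 1, difficulty 3: Path 2
  'stat 6 vs formidable (5)'      => viable?(6, 5), # margin 1, difficulty 5: Path 2
  'stat 5 vs formidable (5)'      => viable?(5, 5)  # margin 0: rejected
}
```

The second and third rows show the asymmetry: at margin 1 a straightforward action is rejected while a challenging one is accepted.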

Bail-Out Formulas

| Check | Formula | Applies When |
| --- | --- | --- |
| Scribe | precision >= target OR ((iters >= 15 OR (iters == 14 AND no action)) AND precision >= target - 5) | Always |
| Iteration cap | iterations >= 15 | Always |
| Resource exhaustion | (san + res + foc) * 2.25 + precision < target - 5 | precision <= 80 |
| Move budget | (14 - iterations) * 13 < (target - precision - 5) | precision <= 80 |

Removing the move budget was tested in v1.5.0 (EXP-1) and the results were worse: mishap rate rose from 49.9% to 74.8% and minutes per 80+ sigil from 58 to 114. The check was restored in v1.5.2, then tightened from * 15 to * 13 in v1.5.8/EXP-5 (kept). It acts as a protective early bail-out: sigils falling behind pace are cut loose before they accumulate danger and mishap. See EXP-1 in §13 for full data.
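
The two precision <= 80 checks can be written as predicates and tried on sample states. This is a sketch using the coefficients from the Bail-Out Formulas table; the method names and keyword arguments are illustrative and not taken from sigilharvest.lic.

```ruby
# Illustrative predicates for the two "precision <= 80" bail-out checks.
# Names and signatures are assumptions, not the script's own.
def resources_exhausted?(sanity:, resolve:, focus:, precision:, target:)
  precision <= 80 && (sanity + resolve + focus) * 2.25 + precision < target - 5
end

def move_budget_exceeded?(iterations:, precision:, target:)
  precision <= 80 && (14 - iterations) * 13 < (target - precision - 5)
end

# Behind pace: 4 moves left * 13 = 52, but 55 precision still needed.
late_and_low = move_budget_exceeded?(iterations: 10, precision: 30, target: 90)

# On pace: 9 moves left * 13 = 117, well above the 55 still needed.
early_and_low = move_budget_exceeded?(iterations: 5, precision: 30, target: 90)

# Drained: (2 + 3 + 3) * 2.25 + 50 = 68, below the 85 threshold.
drained = resources_exhausted?(sanity: 2, resolve: 3, focus: 3, precision: 50, target: 90)

# Fresh sigil: 45 * 2.25 + 10 = 111.25, above the threshold.
fresh = resources_exhausted?(sanity: 15, resolve: 15, focus: 15, precision: 10, target: 90)
```

Both predicates return false once precision passes 80, which matches the "Applies When" column.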

Repair Qualification (line 559)

def select_repair_action(action, contest_stat, precision, repair_target, current_repair)
  return unless action['difficulty'].to_i <= 3
  return unless repair_target.key?("difficulty")
  return unless (contest_stat - action['difficulty'].to_i) >= 2
  return unless @sigil_precision >= (precision - 15)
  return unless action['aspect'] == repair_target['resource']
  # ...
end

Repair is only considered when:

  1. Action difficulty <= 3 (not difficult/formidable)
  2. A repair target was identified in Phase 1
  3. Comfortable margin (>= 2)
  4. Close to target (within 15)
  5. Action's aspect matches the repair target's resource

5. Minigame Mechanics

Game Flow

  1. perc sigil: search for sigils (repeats until found)
  2. perc sigil improve: begin improvement / reroll action menu
  3. perc sigil <VERB>: execute chosen action
  4. scribe sigil: scribe when target precision reached

Resources (0-20 stars each)

| Resource | Start | Direction | Role |
| --- | --- | --- | --- |
| Danger | 0 | Increases | Mishap probability. Rises ~1-2 per 3 iterations |
| Sanity | 15 | Decreases | Consumed by sanity-cost actions |
| Resolve | 15 | Decreases | Consumed by resolve-cost actions |
| Focus | 15 | Decreases | Consumed by focus-cost actions |

Resources are parsed by counting * characters in game output: "***-----" → 3
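
In Ruby the star count is a one-liner with String#count. A minimal sketch: `star_level` is a hypothetical name, and the meter strings beyond the example above are assumed to follow the same star-and-dash layout.

```ruby
# Count the filled stars in a resource meter string.
# star_level is an illustrative name, not a method in sigilharvest.lic.
def star_level(meter)
  meter.count('*')
end

three = star_level('***-----') # => 3
empty = star_level('--------') # => 0
```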

Action Properties

| Property | Values | Parsed From |
| --- | --- | --- |
| Difficulty | trivial(1), straightforward(2), challenging(3), difficult(4), formidable(5) | 1st word |
| Resource | sanity, resolve, focus | 2nd word |
| Impact | taxing(1), disrupting(2), destroying(3) | 3rd word |
| Verb | game-specific (e.g., FORM, METHOD, STUDY) | 4th word |
| Target | "your" (repair) or "sigil" (improve) | 5th word |
| Aspect | precision, quality, or resource name | 6th word |

Risk = difficulty + impact. Impact also equals resource drain in stars.
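
The word-to-number mapping and the risk sum can be sketched as below. This assumes the six properties have already been isolated into six whitespace-separated words in the order given in the table; the real game line and the parsing in sigil_info are not shown in this guide, so `build_action_from_words` and its input format are hypothetical.

```ruby
DIFFICULTY_LEVELS = {
  'trivial' => 1, 'straightforward' => 2, 'challenging' => 3,
  'difficult' => 4, 'formidable' => 5
}.freeze

IMPACT_LEVELS = { 'taxing' => 1, 'disrupting' => 2, 'destroying' => 3 }.freeze

# Hypothetical helper: turns six pre-isolated words into an action hash
# shaped like the entries in @sigil_improvement.
def build_action_from_words(words)
  difficulty_word, resource, impact_word, verb, target, aspect = words.split
  difficulty = DIFFICULTY_LEVELS.fetch(difficulty_word)
  impact     = IMPACT_LEVELS.fetch(impact_word)

  {
    'difficulty' => difficulty,
    'resource'   => resource,
    'impact'     => impact,
    'verb'       => verb,
    'target'     => target,
    'aspect'     => aspect,
    'risk'       => difficulty + impact
  }
end

action = build_action_from_words('difficult sanity disrupting STUDY sigil precision')
# action['difficulty'] is 4, action['impact'] is 2, action['risk'] is 6
```

Using Hash#fetch makes an unknown difficulty or impact word raise a KeyError instead of silently producing a nil risk.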

Precision Gains (Empirical, N=142)

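| Difficulty | Avg Gain | Min | Max | Zero-Gain Rate |
| --- | --- | --- | --- | --- |
| trivial(1) | 2.0 | 0 | 3 | 15.4% |
| straightforward(2) | 3.9 | 0 | 6 | 13.5% |
| challenging(3) | 9.6 | 0 | 14 | 3.2% |
| difficult(4) | 13.2 | 11 | 16 | 0.0% |
| formidable(5) | N/A | N/A | N/A | (never taken) |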

Critical finding: Gains are constant regardless of current precision level. A difficult action at precision 10 gains the same ~13 as at precision 50.

Mishap System

Mishaps end improvement prematurely. They are stochastic: they occur at all danger levels (observed at danger 0 through 11). Higher danger increases mishap probability, but no danger level is safe.

| Type | Pattern | Effect |
| --- | --- | --- |
| Stumble | "About the area you wander" | Sigil lost |
| Lose Track | "You lose track of your surroundings" | Sigil lost |
| Sneeze | "A sudden sneeze" | Sigil lost |
| Chills | "Chills creep down your spine" | Sigil lost |
| Resource Collapse | "Your resolve/sanity/focus collapses" | Sigil lost + stun |
| Other-planar | "rouse the attention of some other-planar entity" | Action fails, 0 gain (non-terminal) |
| Vanished | "The sigil has vanished" | Sigil despawned (non-algorithmic) |
| Combat | "You are too distracted" | Combat interrupted session |

Resource collapse is a distinct failure mode: it happens when an individual resource is critically depleted, even at danger 0.

Starting Precision

Sigils spawn with random starting precision (observed range: 1-15, roughly uniform). Skip filters: target >= 80 → skip if prec < 10; target >= 65 → skip if prec < 5.
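
The skip filters amount to a two-tier threshold. The sketch below uses the thresholds exactly as stated in this section; `skip_sigil?` is a hypothetical name, and the version history records later changes to the skip threshold, so the numbers here should be read as this section's values rather than the current constants.

```ruby
# Two-tier starting-precision filter, using the thresholds quoted above.
# skip_sigil? is an illustrative name, not a method in sigilharvest.lic.
def skip_sigil?(starting_precision, target)
  return starting_precision < 10 if target >= 80
  return starting_precision < 5 if target >= 65

  false
end

skip_low_start_high_target = skip_sigil?(3, 90)  # => true
keep_good_start            = skip_sigil?(12, 90) # => false
skip_low_start_mid_target  = skip_sigil?(4, 70)  # => true
keep_any_start_low_target  = skip_sigil?(4, 50)  # => false
```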

Iteration Budget

Hard cap: 15 iterations. Each action OR refresh costs 1 iteration. Typical productive path: ~9-10 actions + ~5-6 refreshes.

Path to 90 (Theoretical)

Starting at precision 12 (typical when filtering >= 10), need ~78 gain. At 13.2 avg per difficult action: 6 difficult actions required. With refreshes, that's ~12 iterations. Achievable but requires favorable RNG.

Iter 1:  improve (reroll) → get menu with difficult action
Iter 2:  difficult action → precision=25 (+13)
Iter 3:  improve (reroll)
Iter 4:  difficult action → precision=38 (+13)
Iter 5:  repair (restore resource)
Iter 6:  difficult action → precision=51 (+13)
Iter 7:  improve (reroll)
Iter 8:  difficult action → precision=64 (+13)
Iter 9:  repair
Iter 10: difficult action → precision=77 (+13)
Iter 11: improve (reroll)
Iter 12: difficult action → precision=90 (+13) → SCRIBE

90+ is a viable target. The math works: 6 difficult actions fit within the 15-iteration cap with room for refreshes and repairs. Success rate is governed by game randomness (action menu RNG, mishap rolls), not by an algorithmic ceiling. The script's job is to maximize the probability by making optimal decisions with whatever actions the game offers.
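
The arithmetic behind the walkthrough can be checked directly. The inputs come from this section (start 12, target 90, 13.2 average gain per difficult action, cap of 15); pairing each difficult action with one reroll or repair is the same simplification the walkthrough uses.

```ruby
start_precision = 12
target          = 90
avg_gain        = 13.2 # empirical average for difficult(4) actions
iteration_cap   = 15

needed           = target - start_precision        # 78
difficult_needed = (needed / avg_gain).ceil        # 78 / 13.2 is about 5.9, so 6
# One reroll or repair ahead of each difficult action, as in the walkthrough.
iterations_used  = difficult_needed * 2            # 12
spare_iterations = iteration_cap - iterations_used # 3
```

The three spare iterations are the only slack for zero-gain results or extra rerolls, which is why the outcome depends so heavily on the action menu.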


6. Known Weaknesses & Improvement Opportunities

6A. High Refresh Rate (37% of iterations wasted)

Problem: The viability filter rejects actions too aggressively, causing refreshes that yield zero precision. A trivial action (+2) is always better than a refresh (+0).

Location: precision_action_viable? (line 544), and the fallback chain.

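Improvement: Consider loosening the viability filter for low-difficulty actions. This was done in EXP-2 (v1.5.4, kept); EXP-18 (v1.5.27, kept) later added a minimum difficulty threshold that skips trivial actions in favor of a refresh.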

6B. Low-Value Actions Not Compared to Repair Value

Problem: When only trivial/straightforward precision actions are available, the script takes them rather than repairing a resource to enable a difficult action next turn. +2 now vs enabling +13 later is always worse.

Improvement: Compare expected value: selected_action.difficulty * ~3.3 vs best_repair_aspect.difficulty * ~3.3 (next turn). If repair enables a substantially better action, prefer repair. This was tested as EXP-3 (v1.5.5) and reverted.

6C. No Composite Resource Health Check

Problem: The script checks danger level but not the minimum across individual resources. Observed: Sigil #84 collapsed at danger=0 because one resource was critically depleted.

Improvement: Add a guard: if [sanity, resolve, focus].min <= 1, bail out or avoid actions consuming the depleted resource. A guard of this kind was tested as EXP-4 (v1.5.6) and reverted.

6D. Danger Thresholds May Be Misdirected

Problem: Mishaps occurred at danger 0-11 (not just high danger). Conservative play at high danger doesn't reliably prevent mishaps.

Data: Mishaps observed at danger 0-3 (3 events), 5 (3), 7 (3), 11 (3).

Improvement: Consider whether conservative/aggressive modes should be driven by remaining iterations and distance to target rather than danger level alone.

6E. Skip Filter Overhead

The starting-precision skip filter (< 10 for target >= 80) is justified by the gain data: a sigil starting at precision 3 has no realistic path to 90 in 15 iterations. The 61% skip rate is an inherent property of the game's random starting precision distribution, not an algorithm deficiency. The script correctly discards unwinnable sigils early.


7. Testing Guide

Test Setup Pattern

Tests bypass initialize (which calls game APIs) using allocate:

obj = SigilHarvest.allocate
obj.instance_variable_set(:@sigil_precision, 50)
obj.instance_variable_set(:@danger_lvl, 5)
# ... set all required ivars

The helper build_sigilharvest (spec line 241) handles all default ivars. Override only what your test needs:

let(:obj) { build_sigilharvest }

before do
  allow(obj).to receive(:sigil_info).and_return(false)
  allow(obj).to receive(:scribe_sigils)
end

it 'does something' do
  obj.instance_variable_set(:@sigil_precision, 80)
  obj.instance_variable_set(:@danger_lvl, 7)
  obj.instance_variable_set(:@sigil_improvement, [action1, action2])
  obj.send(:improve_sigil, 90)
  expect(obj).to have_received(:sigil_info).with('analyze')
end

Default Resource Levels in build_sigilharvest

The helper sets resources to 5 (not the game's starting 15):

obj.instance_variable_set(:@sanity_lvl, 5)
obj.instance_variable_set(:@resolve_lvl, 5)
obj.instance_variable_set(:@focus_lvl, 5)

This is low. Always set explicit resource levels in tests that involve action selection.

Building Actions

action = build_improvement(
  "difficulty" => 4,
  "resource"   => "sanity",
  "impact"     => 2,
  "verb"       => "analyze",
  "target"     => "sigil",
  "aspect"     => "precision",
  "risk"       => 6
)

Defaults: difficulty=3, resource="sanity", impact=2, verb="analyze", target="sigil", aspect="precision", risk=5.

Mock Modules

Tests define these mock modules at top level (not via spec_helper):

| Module | Mocks | Key Global |
| --- | --- | --- |
| DRC | message, bput | $sigil_messages, $sigil_bput_log/responses |
| DRCA | do_buffs | $sigil_actions |
| DRCI | stow_hands, get_item?, stow_item?, count_item_parts | $sigil_actions, $sigil_scroll_count |
| DRCC | get_crafting_item, stow_crafting_item | $sigil_actions |
| DRCT | walk_to, order_item | $sigil_walks, $sigil_actions |
| DRCM | ensure_copper_on_hand | $sigil_actions |
| DRStats | trader?, circle | Internal @trader, @circle |
| Flags | add, delete, reset, [], []= | Internal @flags |
| Room | current, [] | Internal @current_id |

reset_test_state! (spec line 301) clears all globals before each test.

Script Loading

source = File.read(SIGILHARVEST_SOURCE_PATH)
source = source.sub(/\A=begin.*?=end\s*/m, '')           # strip doc block
source = source.sub(/^before_dying do.*?end\s*SigilHarvest\.new\s*\z/m, '')  # strip entry point
eval(source, TOPLEVEL_BINDING, SIGILHARVEST_SOURCE_PATH, 1)

The =begin/=end block and the final SigilHarvest.new + before_dying are stripped so the class is loaded without executing.

Test Compatibility

Historical note (v1.5.0 revert): the suite at that time (151 examples) was written for v1.4.1 and failed against v1.5.0. The v1.4.1-specific behaviors it tested included formidable blocking, the resource floor check, high_target mode, danger_threshold switching, scribe at iteration 8, skip threshold 13, and batch get capture. All of these were removed or reverted in v1.5.0, so the tests had to be rewritten to match the v1.2.0 algorithm. The suite listed in §1 (129 examples) is the one current with v2.0.0.


8. Log Analysis Infrastructure

Analyzer: sigilharvest_analyzer.rb

Parses structured log output from sigilharvest sessions. Works with v1.2.0+ log format.

Key data structures:

  • SessionInfo: metadata from the startup banner (version, city, sigil, precision target, techniques)
  • SigilRun: per-sigil outcome record with fields:
    • number, sigil_type, result, target_precision, final_precision
    • starting_precision, iterations, final_danger, room, elapsed_minutes
    • precision_history, actions_taken, refresh_count, repair_count
    • failed_action_count, mishap_type, stop_reason, danger_history
    • resource_snapshots, session_index, session_elapsed_minutes

Log format parsed:

== SigilHarvest v1.5.0 ==
[Sigil #1] type=permutation result=mishap precision=42/90 iterations=8 danger=7 room=3 elapsed=2.1m total=5.3m
== End SigilHarvest v1.5.0 ==

The total= field (session elapsed at sigil completion) is optional for backward compatibility.
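
A per-sigil summary line in this format can be matched with one regex using named captures. This is a sketch written against the sample line above, not the analyzer's actual pattern: `SUMMARY_LINE` and `parse_summary` are hypothetical names, and the field order is assumed to be fixed as shown.

```ruby
# Illustrative pattern for the per-sigil summary line; total= is optional.
SUMMARY_LINE = %r{
  \[Sigil\ \#(?<number>\d+)\]\s+
  type=(?<type>\S+)\s+
  result=(?<result>\S+)\s+
  precision=(?<final>\d+)/(?<target>\d+)\s+
  iterations=(?<iterations>\d+)\s+
  danger=(?<danger>\d+)\s+
  room=(?<room>\d+)\s+
  elapsed=(?<elapsed>[\d.]+)m
  (?:\s+total=(?<total>[\d.]+)m)?
}x

# Returns a hash of typed fields, or nil when the line is not a summary line.
def parse_summary(line)
  m = SUMMARY_LINE.match(line)
  return nil unless m

  {
    number: m[:number].to_i,
    sigil_type: m[:type],
    result: m[:result],
    final_precision: m[:final].to_i,
    target_precision: m[:target].to_i,
    iterations: m[:iterations].to_i,
    final_danger: m[:danger].to_i,
    room: m[:room].to_i,
    elapsed_minutes: m[:elapsed].to_f,
    session_elapsed_minutes: m[:total]&.to_f # nil for logs without total=
  }
end

with_total = parse_summary(
  '[Sigil #1] type=permutation result=mishap precision=42/90 ' \
  'iterations=8 danger=7 room=3 elapsed=2.1m total=5.3m'
)
without_total = parse_summary(
  '[Sigil #2] type=permutation result=mishap precision=30/90 ' \
  'iterations=5 danger=4 room=6 elapsed=1.4m'
)
```

The optional trailing group is what keeps older logs parseable: `without_total[:session_elapsed_minutes]` is nil rather than a parse failure.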

Log File Collection Procedure

Game logs are stored at: /Users/grocha/angua/lich-5-mine/logs/DR-<CharName>/<year>/<month>/

Steps to collect session logs:

  1. Identify which characters ran sessions: the user provides character names, start times, and whether sessions span log file boundaries (log rotation at midnight or size threshold).

  2. Handle multi-file sessions: some sessions span two log files (e.g., started in 2026-02-01-0627.log, continued in 2026-02-01-0712.log). These must be concatenated before analysis: cat file1.log file2.log > CharName_HHMM.log

  3. Copy to version-specific directory: store extracted logs under ~/SH_logs/<version>/:

    ~/SH_logs/v1.2.0/   - 10 v1.2.0 session logs
    ~/SH_logs/v1.4.1/   - 10 v1.4.1 session logs
    ~/SH_logs/v1.5.3/   - 10 v1.5.3 baseline session logs
    ~/SH_logs/v1.5.4/   - 9 v1.5.4 EXP-2 session logs
    ~/SH_logs/v1.5.5/   - 10 v1.5.5 EXP-3 session logs
    ~/SH_logs/v1.5.6/   - 10 v1.5.6 EXP-4 session logs
    ~/SH_logs/v1.5.7/   - 10 v1.5.7 baseline restore session logs
    ~/SH_logs/v1.5.8/   - 10 v1.5.8 EXP-5 session logs
    ~/SH_logs/v1.5.9/   - 10 v1.5.9 Illuminated technique test logs (4 truncated)
    ~/SH_logs/permutation/ - 23 v1.3.x baseline logs
    
  4. Naming convention: CharName_HHMM.log (start time of session, 24h format).

  5. Verify completeness: each log file should contain both == SigilHarvest v<X> == (banner) and == End SigilHarvest v<X> == (exit summary). If the exit summary is missing, the session was interrupted.

  6. Run analyzer: write a Ruby script that iterates over the log directory and calls the analyzer's parse methods. Example pattern:

    require_relative 'sigilharvest_analyzer'
    Dir.glob('/Users/grocha/SH_logs/v1.5.0/*.log').each do |f|
      analyzer = SigilHarvestAnalyzer.new
      analyzer.parse_file(f)
      # ... aggregate results
    end
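
The completeness check in step 5 can be automated before running the analyzer. A minimal sketch, assuming the banner and exit-summary markers appear in the log text exactly as shown in the log format above; `BANNER`, `FOOTER`, and `session_complete?` are hypothetical names and not part of sigilharvest_analyzer.rb.

```ruby
# Markers as shown in the log format section; names are illustrative.
BANNER = /== SigilHarvest v\S+ ==/
FOOTER = /== End SigilHarvest v\S+ ==/

# True when the text contains both the startup banner and the exit summary marker.
def session_complete?(log_text)
  log_text.match?(BANNER) && log_text.match?(FOOTER)
end

complete_log = <<~LOG
  == SigilHarvest v1.5.0 ==
  [Sigil #1] type=permutation result=mishap precision=42/90 iterations=8 danger=7 room=3 elapsed=2.1m total=5.3m
  == End SigilHarvest v1.5.0 ==
LOG

interrupted_log = <<~LOG
  == SigilHarvest v1.5.0 ==
  [Sigil #1] type=permutation result=mishap precision=42/90 iterations=8 danger=7 room=3 elapsed=2.1m total=5.3m
LOG

complete    = session_complete?(complete_log)    # => true
interrupted = session_complete?(interrupted_log) # => false
```

To apply it to files, read each one with File.read and list the paths for which `session_complete?` returns false.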

Gotchas:

  • find command can hang/timeout on large log directory trees. Use explicit ls with known directory paths instead.
  • Some sessions show techniques as ["Inspired", "and Enlightened"]: the "and" isn't stripped by the split regex. Cosmetic only; does not affect analysis.
  • The session_index field tracks which session a run belongs to within a multi-session file. When analyzing across files, track the filename alongside each run for per-character breakdown.

9. Observed Session Statistics

v1.2.0 vs v1.4.1 Head-to-Head (2026-02-01)

Both groups: Shard / permutation / target=90 / 60 minutes / Inspired+Enlightened techniques. 10 sessions each, across 10 different characters.

| Metric | v1.2.0 (10 sessions) | v1.4.1 (10 sessions) |
| --- | --- | --- |
| Sigils worked | 339 | 194 |
| Avg precision | 54.3 | 47.8 |
| Max precision | 86 | 86 |
| Scribed (>=90) | 1 | 0 |
| Sigils >= 80 | 9 (2.7%) | 2 (1.0%) |
| Mishap rate | 49.9% | 76.8% |
| Avg danger at mishap | 8.2 | 12.1 |
| Minutes per 80+ sigil | 67 | 300 |

v1.2.0 matches or outperforms v1.4.1 on every metric (max precision is tied at 86).

Key Findings

  1. v1.4.1's filtering over-restricted action selection. The resource floor check, formidable blocking, and high_target mode collectively forced more refreshes and lower precision outcomes. The simpler v1.2.0 selection strategy works better.

  2. v1.4.1 pushes danger to ceiling then mishaps. Danger-at-mishap distribution: v1.4.1 clusters at 17-18 (mean 12.1), while v1.2.0 keeps danger distributed (mean 8.2) and reaches higher precision before mishapping.

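  3. Move budget check prematurely terminated 37.5% of v1.2.0 sigils at avg precision 55.3. These sigils had remaining iterations that could have gained more precision. The check was removed in v1.5.0 on this basis, then restored in v1.5.2 after EXP-1 showed removal was harmful (see §13).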

  4. Mishaps are stochastic at all danger levels. Conservative danger-based strategy switching provides less value than expected. Aggressive play that reaches high precision quickly (before mishap occurs) appears more effective.

v1.3.2 Baseline (23 sessions, earlier data)

| Metric | Value |
| --- | --- |
| Minutes per 80+ sigil | ~241 |
| Compared to v1.2.0 (75 min reported, 67 min measured) | 3.2x regression |

10. Version History

| Version | Algorithm | Key Changes |
| --- | --- | --- |
| v1.2.0 | Original "best" | Risk-based selection, allows formidable, serial waitfor, move budget check |
| v1.3.2 | Modified | Various changes from upstream; 3.2x regression vs v1.2.0 |
| v1.4.0 | Redesigned | Formidable blocking, resource floor, high_target mode, scribe at iter 8, skip threshold 13, no move budget |
| v1.4.1 | Patched v1.4.0 | Added sigil_vanished/combat_distracted stop reasons, per-sigil elapsed time |
| v1.5.0 | Reverted to v1.2.0 | v1.2.0 algorithm + v1.4.1 logging infrastructure + move budget removed |
| v1.5.1 | Patch | Added validate_tools pre-flight check (burin/bag/settings + inventory) |
| v1.5.2 | Baseline candidate | Restored move budget check (v1.5.0 data proved removal harmful) |
| v1.5.3 | Clean baseline | Version tick only; separates clean runs from v1.5.2 burin-retry noise |
| v1.5.4 | EXP-2 (kept) | Accept trivial actions (Path 1 change) + burin validate_tools fix |
| v1.5.5 | EXP-3 (reverted) | Prefer repair over trivial/straightforward precision actions |
| v1.5.6 | EXP-4 (reverted) | Composite resource health guard (skip actions on near-depleted resources) |
| v1.5.7 | Baseline restore | EXP-4 reverted, back to v1.5.4 algorithm |
| v1.5.8 | EXP-5 (kept) | Tighten move budget formula (15→13 precision/move) |
| v1.5.9 | Technique test | Illuminated Sigil Comprehension enabled (no algorithm change) |
| v1.5.10 | EXP-6 | Fix difficulty ordering, filter ACTION verb |
| v1.5.11 | EXP-10+11 (reverted) | Raise skip threshold < 13 + velocity bail-out < 4/iter after 5 |
| v1.5.12 | EXP-10 (kept) | Skip threshold < 13 (standalone retest) |
| v1.5.13 | EXP-7 (kept) | Difficulty-based action selection (decouple risk from cost) |
| v1.5.14 | EXP-12 (reverted) | Loosen viability margin (accept margin=0 for challenging+) |
| v1.5.15 | EXP-9 (kept) | Recalibrate resource exhaustion coefficient (2.25→1.75) |
| v1.5.16 | EXP-12r (reverted) | Loosen viability margin (retest with corrected baseline) |
| v1.5.17 | TECH-AWK (kept) | Awakened Sigil Comprehension: >=80 improvement observed, mechanism unknown |
| v1.5.18 | EXP-13 (revert) | Remove iteration cap + move budget; resource-only bail-out; skip <15 for target 90 |
| v1.5.19 | EXP-14 (revert) | Equalize action costs per Urbaj (all cost labels = 1); mishap rate +11.4pp vs EXP-13 base |
| v1.5.20 | Baseline restore | Revert to v1.5.17 algorithm + C1 fix (@actually_scribed). Missing EXP-9 resource check. |
| v1.5.21 | Corrected baseline | Restores EXP-9 resource exhaustion check. Baseline confirmed: all metrics match v1.5.17. |
| v1.5.22 | EXP-15 (revert) | Align move budget max with iteration cap (14→15). Mishap +11.7pp (Z=2.67), 0 scribes, >=80 7→6. |
| v1.5.23 | EXP-16 (reverted) | Tighten resource exhaustion coefficient (1.75→1.5). Mathematically impossible at target 90. |
| v1.5.24 | EXP-14r (kept) | Cost equalization clean retest ({1,2,3}→{1,1,1}). Neutral (mishap -1.6pp, n.s.). |
| v1.5.25 | EXP-17 (kept) | Resource-aware tiebreaker on tied difficulty+cost. Prefer most-available resource. |
| v1.5.26 | D7 fix (infra) | Unconditional repair logging. Repairs=0 in 2606 iters. Phase 3/4 CLOSED. |
| v1.5.27 | EXP-18 (kept) | Min difficulty threshold. Skip trivial, refresh for better menu. Avg gain 7.18→8.54. |
| v2.0.0 | Release | v1.5.27 promoted to v2.0.0 for upstream PR. 22 experiments validated, 100% decision agreement. |

Infrastructure (all versions v1.5.0+)

Common infrastructure from v1.4.1, present in all v1.5.x versions:

  • Time-based sessions with @time_limit, time_expired?
  • Structured logging: log_startup_banner, log_sigil_summary, log_exit_summary
  • Per-sigil timing (elapsed= and total= in summary line)
  • Technique detection via get_techniques / format_techniques
  • Analyzer-compatible output format
  • Pre-flight tool validation (validate_tools)

Infrastructure Changes (v1.5.18+)

  • Scribe counting (v1.5.18): scribe_sigils now counts individual scribes and logs "Scribes: N". Analyzer parses this via SCRIBES_COUNT regex and populates scribe_count field on SigilRun. Falls back to counting raw SCRIBE_SUCCESS lines for older logs. Enables per-sigil scribe yield tracking for Awakened analysis.
  • Terminology rename (v1.5.18→v1.5.19): scroll_count → scribe_count across all files (sigilharvest.lic, sigilharvest_analyzer.rb, reanalyze_all.rb). "Scrolls scribed" → "Scribes" in log output. Aligns terminology with game mechanics (scribing, not scrolling).
  • Banner cleanup (v1.5.18): Removed belt line from startup banner.
  • Cross-version analysis (v1.5.17+): reanalyze_all.rb uses all_session_runs(version) to handle merged log files with multiple sessions of the same version (e.g., batch 1 + batch 2 in v1.5.17). Tracks scribe yield metrics: total scribes, avg scribes/sigil, scribes/session.
  • Session splitter (v1.5.18+): session_splitter.rb extracts SigilHarvest sessions from full game logs. Handles the key problem of sessions split across two log files from the same character (reconnects mid-session). Groups files by character, sorts chronologically, reads as a virtual concatenation so split sessions are seamlessly joined. Usage: ruby session_splitter.rb --version 1.5.18 path1.log path2.log ... Output goes to ~/SH_logs/vX.Y.Z/DR-CharName_timestamp.log. Options: --dry-run, --output DIR, --keep-temp, --version VER.

11. Game Context

Seasonal / City Factors

Sigils appear based on city, sigil type, and season. Room lists loaded from data/sigils.yaml keyed by [City][SigilType][Season].

Cities: Crossing, Riverhaven, Shard. Seasons: spring, summer, autumn, winter.

Scroll Management

  • Stacks of 25. Auto-buys when below stock level (default 25).
  • Prices: Crossing=125kr, Riverhaven=100lr, Shard=90dok.

Trader Luck Mechanic

For Trader guild at circle 65+: speculate luck on first iteration when starting precision >= 14 (line 422). May improve RNG outcomes.

Precision / Clarity Flavor Text

| Precision | Description | Clarity | Description |
| --- | --- | --- | --- |
| 0-29 | broad strokes | 85-89 | exquisite |
| 30-49 | thick strands | 90-94 | flawless |
| 50-69 | many fibers | 95-97 | flawless |
| 70-89 | thin lines | 98-99 | immaculate |
| 90+ | (scribing target) | | |

Mishap Patterns (for @mishaps regex)

@mishaps = /Chills creep down your spine|About the area you wander|A sudden sneeze|You lose track|You prepare yourself for continued exertion|You are too distracted/

12. Development Practices

Version Bumping

Every change to sigilharvest.lic must include a version bump to the VERSION constant (line 6). The analyzer parses this from log banners, so version changes are how we distinguish data collected under different script behavior.

Patch bump (e.g., 1.5.0 → 1.5.1): Bug fixes, new metadata/logging, minor tuning of thresholds or constants, adding new banner fields.

Minor bump (e.g., 1.5.x → 1.6.0): Algorithm changes that affect sigil outcomes (action selection logic, danger thresholds, skip filters, resource management), new game command integrations, structural refactors.

Major bump (e.g., 1.x → 2.0.0): Fundamental redesign of the improvement loop, breaking changes to log format that require analyzer updates, new operating modes.

Data-Driven Development

All algorithm changes require empirical validation:

  1. Run 10 sessions (same params, same techniques, Inspired+Enlightened) with the change
  2. Analyze with sigilharvest_analyzer.rb; filter by version when log files contain multiple sessions
  3. Compare head-to-head against current baseline
  4. Key metric: minutes per 80+ precision sigil (lower is better)
  5. Supporting metrics: mishap rate, avg precision, sigils worked per session
  6. Only one algorithm change per version โ€” isolate variables
  7. If worse: revert the change, restore baseline, document the result
  8. If better: the new version becomes the baseline for the next experiment

Analysis Script Notes

When writing analysis scripts, use these patterns (learned from prior bugs):

  • Filter by version: parser.sessions.each_with_index → skip sessions where session.version != target
  • Classify skipped: iterations == 0 (not result == 'SKIPPED'; the parser sets result to "FAILED" for all)
  • Classify mishaps: stop_reason == :mishap (not result == 'mishap')
  • Per-character breakdown: extract character name from filename, track file alongside each run

13. Experimental Testing Plan

Protocol

  • Test params: Shard / permutation / target=90 / 60 minutes / Inspired+Enlightened
  • Sample size: 10 sessions per experiment (9 minimum if a character has a config issue)
  • Baseline: v1.5.2 (= v1.2.0 algorithm + v1.4.1 infrastructure + tools check)
  • Procedure: One algorithm change per version. Run 10 sessions. Analyze. Decide keep/revert.
  • Log storage: ~/SH_logs/v<version>/

Current Baseline: v1.5.4 (EXP-2)

v1.2.0 algorithm with EXP-2 (accept trivial actions), move budget, v1.4.1 logging infrastructure, pre-flight tool validation with multi-attempt burin resolution.

v1.5.4 measured performance (9 sessions, 391 worked, 561 skipped):

  • Avg precision: 52.4 | Max: 85 | Scribed: 1
  • Sigils >= 80: 5 (1.3%) | Mishap rate: 55.0%
  • Avg danger at mishap: 8.5 | Total minutes: 546
  • Worked/session: 43.4 | >=80/session: 0.6
  • Min per 80+ sigil: 109
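The normalized figures above follow directly from the session totals. The sketch below shows the arithmetic; the session_metrics helper and its keyword names are illustrative assumptions, not part of the analyzer.

```ruby
# Derive per-session and per-80+ metrics from raw totals.
# All keyword names here are hypothetical, chosen for this example.
def session_metrics(sessions:, worked:, high_precision:, total_minutes:)
  {
    worked_per_session: (worked.to_f / sessions).round(1),
    high_per_session: (high_precision.to_f / sessions).round(1),
    # nil when no sigil reached 80+, to avoid dividing by zero
    minutes_per_high: high_precision.zero? ? nil : (total_minutes.to_f / high_precision).round
  }
end

# v1.5.4 totals as reported above: 9 sessions, 391 worked, 5 sigils >= 80, 546 minutes.
p session_metrics(sessions: 9, worked: 391, high_precision: 5, total_minutes: 546)
```

With the v1.5.4 totals this yields 43.4 worked/session, 0.6 >=80/session, and 109 minutes per 80+ sigil, matching the list above.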

v1.5.3 previous baseline (10 sessions, 335 worked, 495 skipped):

  • Avg precision: 53.5 | Max: 90 | Scribed: 2
  • Sigils >= 80: 4 (1.2%) | Mishap rate: 52.8%
  • Worked/session: 33.5 | >=80/session: 0.4
  • Min per 80+ sigil: 137

Note: Min-per-80+ has high variance at these sample sizes (~1-3% of sigils reach 80). Per-session normalized metrics (worked/session, >=80/session) are more stable.

Completed Experiments

EXP-1: Remove move budget check (v1.5.0)

  • Hypothesis: The move budget formula (14 - iters) * 15 < (target - prec - 5) bails out too early. 37.5% of v1.2.0 sigils hit moves_exhausted at avg precision 55.3. Removing it lets those sigils play their full iterations.
  • Change: Deleted the move budget check entirely.
  • Result: WORSE. Mishap rate jumped 49.9% → 74.8%. Min per 80+ sigil: 58 → 114. Without the early bail-out, doomed sigils kept playing, accumulated danger, and mishapped. The move budget was protective, not wasteful.
  • Action: REVERTED in v1.5.2. Move budget restored.
Metric | v1.2.0 baseline | v1.5.0 (no budget) | Delta
Worked | 339 | 333 | -2%
Avg precision | 54.3 | 54.6 | +0.6%
Sigils >= 80 | 9 (2.7%) | 5 (1.5%) | -44%
Mishap rate | 49.9% | 74.8% | +50%
Min per 80+ | 58 | 114 | +97%

EXP-2: Accept trivial actions when margin > 1 (v1.5.4)

  • Hypothesis: 37% of iterations are refreshes (zero precision gain). The viability filter rejects trivial actions (difficulty=1) unless danger > 17 or within 5 of target. A trivial action (+2 avg) is always better than a refresh (+0).
  • Change: In precision_action_viable?, Path 1 simplified from margin > 1 && (difficulty > 1 || @danger_lvl > 17 || @sigil_precision >= (precision - 5)) to just margin > 1. Also includes multi-attempt burin resolution fix in validate_tools.
  • Sessions: 9 (Barrask, Byd, Fidon, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve)
  • Logs: ~/SH_logs/v1.5.4/
Metric | v1.5.3 baseline | v1.5.4 (EXP-2) | Delta
Sessions | 10 | 9 | -1
Worked | 335 | 391 | +56
Worked/session | 33.5 | 43.4 | +29.6%
Avg precision | 53.5 | 52.4 | -1.1
Max precision | 90 | 85 | -5
Avg iterations | 10.8 | 10.7 | -0.1
Sigils >= 80 | 4 (1.2%) | 5 (1.3%) | +1
>=80/session | 0.4 | 0.6 | +50%
Mishap rate | 52.8% | 55.0% | +2.2pp
Avg danger@mishap | 8.0 | 8.5 | +0.5
Min per 80+ | 137 | 109 | -20%
Scribed (>=90) | 2 | 1 | -1
  • Verdict: KEEP. Throughput up ~30%, efficiency up ~20%, quality flat within noise. The extra sigils worked per session and the improved min-per-80+ more than compensate for the marginal precision delta (-1.1), which is within statistical variance.
  • Action: v1.5.4 becomes new baseline for EXP-3.

EXP-3: Prefer repair over trivial/straightforward precision actions (v1.5.5)

  • Hypothesis: Taking a trivial (+2) or straightforward (+4) action when a repair could enable a difficult (+13) action next turn is suboptimal. Expected value of repair → difficult is ~13 over 2 turns (6.5/turn) vs trivial's 2/turn.
  • Change: Between Phase 2 and Phase 3, if selected precision action has difficulty <= 2 and a repair action is available that would enable a harder action, prefer the repair. Same guards as Phase 4: repair budget (< 2 without override), danger <= 18.
  • Sessions: 10 (Barrask, Byd, Fidon, Gnarta, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve)
  • Logs: ~/SH_logs/v1.5.5/
Metric | v1.5.4 baseline | v1.5.5 (EXP-3) | Delta
Sessions | 9 | 10 | +1
Worked | 391 | 436 | +45
Worked/session | 43.4 | 43.6 | +0.2
Avg precision | 52.4 | 51.9 | -0.5
Max precision | 85 | 85 | 0
Avg iterations | 10.7 | 10.8 | +0.1
Sigils >= 80 | 5 (1.3%) | 9 (2.1%) | +4
>=80/session | 0.6 | 0.9 | +0.3
Mishap rate | 55.0% | 49.1% | -5.9pp
Avg danger@mishap | 8.5 | 8.3 | -0.2
Min per 80+ | 109 | 121 | +12
Scribed (>=90) | 1 | 0 | -1
  • Verdict: REVERT. Mishap rate improvement (-5.9pp) is borderline significant (z≈1.7, p≈0.09). However, resource_exhausted stop reason jumped from 2 to 8, and min-per-80+ regressed from 109 to 121. Fundamental assumption flawed: repair doesn't guarantee the same difficult action next turn because menus are re-rolled each iteration.
  • Action: REVERTED in v1.5.6. EXP-3 code removed, baseline restored to v1.5.4.
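The significance figure quoted in the verdict can be approximated with a pooled two-proportion z-test. The sketch below assumes that is the test that was used (the source does not say) and treats the mishap rates as proportions of worked sigils; the helper names are illustrative.

```ruby
# Two-proportion z-test with a pooled standard error (normal approximation).
def two_proportion_z(p1, n1, p2, n2)
  pooled = (p1 * n1 + p2 * n2) / (n1 + n2).to_f
  se = Math.sqrt(pooled * (1 - pooled) * (1.0 / n1 + 1.0 / n2))
  (p1 - p2) / se
end

# Two-sided p-value for a standard normal z statistic.
def two_sided_p(z)
  Math.erfc(z.abs / Math.sqrt(2))
end

# Mishap rate: 55.0% of 391 worked (v1.5.4) vs 49.1% of 436 worked (v1.5.5).
z = two_proportion_z(0.550, 391, 0.491, 436)
puts format('z=%.2f p=%.2f', z, two_sided_p(z))
```

With these inputs the result is close to the z≈1.7, p≈0.09 reported above, which is why the improvement is described as borderline rather than conclusive.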

EXP-4: Composite resource health guard (v1.5.6)

  • Hypothesis: Resource collapse (a single resource hitting 0) causes mishaps even at danger 0. Adding a guard that skips actions consuming a near-depleted resource (<=1 star) could prevent these collapses.
  • Change: In Phase 2 action selection, add next if contest_stat <= 1 before considering any action.
  • Sessions: 10 (Barrask, Byd, Fidon, Gnarta, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve)
  • Logs: ~/SH_logs/v1.5.6/
Metric | v1.5.4 baseline | v1.5.6 (EXP-4) | Delta
Sessions | 9 | 10 | +1
Worked | 391 | 346 | -45
Worked/session | 43.4 | 34.6 | -20%
Avg precision | 52.4 | 51.1 | -1.3
Max precision | 85 | 84 | -1
Avg iterations | 10.7 | 10.9 | +0.2
Sigils >= 80 | 5 (1.3%) | 2 (0.6%) | -3
>=80/session | 0.6 | 0.2 | -67%
Mishap rate | 55.0% | 47.4% | -7.6pp
Avg danger@mishap | 8.5 | 8.1 | -0.4
Min per 80+ | 109 | 302 | +177%
Scribed (>=90) | 1 | 0 | -1
  • Verdict: REVERT. Mishap rate improved (-7.6pp) but at severe cost. Throughput collapsed (-20% worked/session), quality collapsed (-67% >=80/session), min-per-80+ nearly tripled (109→302). The guard blocks too many actions, forcing iterations into refreshes (zero precision gain). moves_exhausted rose (133→142) despite fewer total sigils. The existing resource floor (contest_stat <= risk) already handles this more surgically.
  • Action: REVERTED in v1.5.7. EXP-4 code removed, baseline restored to v1.5.4.

EXP-5: Tighten move budget formula (v1.5.8)

  • Hypothesis: The current formula uses 15 precision/move which is optimistic (difficult actions average 13.2). The 37.5% bail-out rate at avg precision 55.3 suggests the formula is roughly calibrated, but tightening to 13 precision/move might bail out slightly earlier on truly hopeless sigils, saving time for new sigils.
  • Change: (14 - @num_iterations) * 13 < (precision - @sigil_precision - 5)
  • Sessions: 10 (Barrask, Byd, Fidon, Gnarta, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve)
  • Logs: ~/SH_logs/v1.5.8/
Metric | v1.5.4 baseline | v1.5.8 (EXP-5) | Delta
Sessions | 9 | 10 | +1
Worked | 391 | 473 | +82
Worked/session | 43.4 | 47.3 | +9%
Avg precision | 52.4 | 51.6 | -0.8
Max precision | 85 | 88 | +3
Avg iterations | 10.7 | 10.7 | 0.0
Sigils >= 80 | 5 (1.3%) | 7 (1.5%) | +2
>=80/session | 0.6 | 0.7 | +0.1
Mishap rate | 55.0% | 47.4% | -7.6pp
Avg danger@mishap | 8.5 | 8.1 | -0.4
Min per 80+ | 109 | 87 | -20%
Scribed (>=90) | 1 | 2 | +1
  • Verdict: KEEP. Mishap rate dropped 7.6pp (55.0% → 47.4%) as the tighter budget bails out earlier, converting would-be mishaps into moves_exhausted exits (133→189). Throughput up 9% (47.3 worked/session). Efficiency improved 20% (87 min per 80+ sigil vs 109). Top-end quality slightly improved (max 88 vs 85, 2 scribes vs 1). Avg precision delta (-0.8) is within noise, as expected, since bailing earlier on low-potential sigils lowers the average. No regression on any key metric.
  • Action: KEPT. v1.5.8 becomes new baseline for technique tests.
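To make the effect of the coefficient change concrete, the sketch below wraps the move budget inequality from EXP-1 and EXP-5 in a standalone method. The method name and the per_move keyword are illustrative assumptions; the script itself works on instance variables such as @num_iterations and @sigil_precision.

```ruby
# True when the remaining moves cannot plausibly close the precision gap.
# per_move is the assumed precision gain per remaining move (15 before EXP-5, 13 after).
def move_budget_exhausted?(iterations, sigil_precision, target, per_move: 15)
  (14 - iterations) * per_move < (target - sigil_precision - 5)
end

# Same sigil state under both coefficients: iteration 11, precision 45, target 90.
puts move_budget_exhausted?(11, 45, 90, per_move: 15) # 45 < 40 is false: keep working
puts move_budget_exhausted?(11, 45, 90, per_move: 13) # 39 < 40 is true: bail out
```

The example state is one where the old formula keeps working and the tightened formula bails out, which is the class of sigil EXP-5 was meant to release earlier.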

In-Progress Experiments

EXP-13 (v1.5.18): complete, REVERT. 11 sessions analyzed. Removed iteration cap + move budget, resource-only bail-out, skip <15 for target 90. All key metrics regressed: worked/session 25.2→9.2, >=80/session 0.64→0.18, scribes 3→0, mishap rate 42%→63.4%. The aggressive skip threshold (<15) eliminated too many sigils, and removing the iteration cap increased mishap exposure without producing higher precision outcomes. Post-simulation analysis (SIM-7) additionally proved skip <15 is catastrophic: ALL 3 v1.5.17 scribes started at precision 13. See full results and simulation analysis below.

EXP-14 (v1.5.19): complete, REVERT. 11 sessions analyzed. Equalized action costs (all cost labels = 1) per Urbaj's observation that difficulty determines resource cost. Ran on EXP-13 code base (inherits removed iteration cap, resource-only bail-out, skip <15). Results vs v1.5.18 (isolating cost equalization): mishap rate 63.4%→74.8% (+11.4pp), 0 real scribes (1 C1 fake), >=80/session 0.18→0.27 (noise). Cost equalization removed the disincentive for dangerous actions, causing more mishaps without compensating gains. See full results below.

v1.5.20: complete, partial baseline. 11 sessions. Reverted EXP-13+14 to v1.5.17 algorithm. C1 bug fixed (@actually_scribed flag). D7 analyzer fix deployed. However, the EXP-9 resource exhaustion check was accidentally dropped during the revert: 0% resource_exhausted vs 12.1% in v1.5.17 baseline. Mishap rate 46.9% (vs 42.0% baseline), moves_exhausted 46.1% (vs 37.7%; it absorbed the missing resource exits). 1 real scribe (Mahtra, prec=88, 2 scrolls). C1 fix validated: zero fake SCRIBEDs. D7 fix validated: code correct but 0 repairs in sample.

v1.5.21: complete, BASELINE CONFIRMED. 11 sessions. Restores EXP-9 resource exhaustion check. All metrics match v1.5.17 within normal variance:

  • Worked/session: 23.8 (vs 25.2), within variance
  • >=80/session: 0.64 (vs 0.64), exact match
  • Mishap rate: 40.8% (vs 42.0%), match
  • resource_exhausted: 13.4% (vs 12.1%), restored (was 0% in v1.5.20)
  • moves_exhausted: 39.7% (vs 37.7%), match
  • 0 scribes in 262 worked (expected ~0.5 at baseline rate, within Poisson variance)
  • 7 sigils >=80: 4 stopped by moves_exhausted at 80-84, 3 by mishap at 82-87
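The "within Poisson variance" remark about scribes can be checked with a one-line probability. The sketch below assumes scribe counts follow a Poisson distribution with the quoted expectation of about 0.5 per 262 worked sigils; poisson_pmf is an illustrative helper, not part of the analyzer.

```ruby
# Probability of observing exactly k events when the expected count is mean.
def poisson_pmf(k, mean)
  factorial = (1..k).reduce(1, :*) # evaluates to 1 when k is 0
  Math.exp(-mean) * mean**k / factorial
end

# Chance of seeing zero scribes when about 0.5 are expected.
puts poisson_pmf(0, 0.5).round(2)
```

Under that assumption, zero scribes is the single most likely outcome (roughly a 61% chance), so the empty result says nothing about a regression.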

EXP-15 (v1.5.22): complete, REVERT. 11 sessions. Aligned move budget max with iteration cap (14→15). Mechanically worked: moves_exhausted 39.7%→18.1%, 4 sigils reached iteration_cap (3 at precision 83, gap=2 from scribe). But cost was severe: mishap rate 40.8%→52.5% (+11.7pp, Z=2.67, p<0.01), sigil_vanished 6.1%→12.4%. The ~57 freed sigils mostly ended in mishaps (29) or vanishes (16). >=80 count 7→6 (noise). 0 scribes. Key insight: the old formula's off-by-one was functioning as a safety guardrail. Extending sigils into the high-danger zone costs more than it gains.

v1.5.26: complete, Phase 3 CLOSED. 11 sessions, 255 worked sigils. D7 fix validated: repair logging is unconditional, but zero repairs in 2606 iterations. The repair code path never triggers because difficulty-first selection (v1.5.20+) always finds a viable precision action. Historical v1.5.17 data (5 repairs) shows repairs are actively harmful (60% mishap rate). Phases 3 and 4 (repair experiments) are closed: repairs are a non-factor. Confirmed neutral vs v1.5.25: mishap +8.1pp (n.s. p=0.06), 1 real scribe (Fidon, 3 scrolls).

v1.5.27: complete, KEPT. EXP-18: minimum difficulty threshold. Skip trivial-difficulty precision actions, refresh for better menu. Avg gain 7.18→8.54 (+1.36, exceeded v1.2.0's 8.39). Trivial-range gains 25.8%→3.6%. Effective gain/iter 3.67→3.75. 60+ rate +6.9pp. Resource exhausted halved (11.8%→5.9%). Most effective change since EXP-6.

Status: v2.0.0 released. Promoted to upstream PR as sigilharvest_overhaul.

  • Gain distribution now matches v1.2.0 โ€” gain optimization lever exhausted.
  • Remaining levers: mishap reduction (1.3x at -50%), moves_exhausted optimization.
  • Retrospective simulation: 100% decision agreement (15/15). Net: v1.2.0 2.9% → v2.0.0 3.0%.

EXP-6: Difficulty fix + ACTION filter (v1.5.10) - Completed, KEPT

  • Sessions: 10 (all complete 60min)
  • Logs: ~/SH_logs/v1.5.10/
Metric | v1.5.8 (baseline) | v1.5.10 (EXP-6) | Delta
Worked | 473 | 432 | -41
Avg precision | 51.6 | 50.8 | -0.8
Max precision | 88 | 85 | -3
Sigils >= 80 | 7 (1.5%) | 3 (0.7%) | -0.8pp
Mishap rate | 47.4% | 37.5% | -9.9pp
moves_exhausted | 189 | 224 | +35
Min per 80+ | 87 | 202 | +115
Refresh rate | 40.6% | 44.2% | +3.6pp
  • EXP-6 verification:
    • ACTION verb usage: v1.5.8=292, v1.5.10=0. Filter working perfectly.
    • Difficulty ordering confirmed: median gains trivial(2) < straight(5) < formidable(7) < challenging(9) < difficult(13).
    • Per-difficulty gains unchanged: the fixes affect selection, not outcomes per action.
  • Verdict: KEEP. Both bug fixes confirmed working. Mishap rate dropped 9.9pp (largest single-experiment improvement). Sigils that previously mishapped now exhaust move budget instead. 80+ count dip (7→3) is statistically insignificant at these sample sizes. The fixes are objectively correct and required for all downstream experiments.

EXP-10+11: Skip threshold + velocity bail-out (v1.5.11) - Completed, REVERTED

  • Sessions: 10 (all complete 60min)
  • Logs: ~/SH_logs/v1.5.11/
Metric | v1.5.10 (baseline) | v1.5.11 (EXP-10+11) | Delta
Worked | 432 | 298 | -134
Skipped | 700 | 1092 | +392
Skip rate | 61.8% | 78.6% | +16.8pp
Avg precision | 50.8 | 38.2 | -12.6
Max precision | 85 | 82 | -3
Avg iterations | 10.8 | 7.0 | -3.8
Scribed (>=90) | 1 | 0 | -1
Sigils >= 80 | 3 (0.7%) | 1 (0.3%) | -2
Mishap rate | 37.5% | 13.4% | -24.1pp
>=80/session | 0.3 | 0.1 | -0.2
Min per 80+ | 202 | 602 | +400
  • EXP-10 (skip threshold): 1035 triggers. Zero false positives on baseline data. Mechanically correct, but effect swamped by EXP-11.
  • EXP-11 (velocity bail-out): 220 bail-outs out of 298 worked sigils (74%). The simulation predicted ~3.7% (16/432). Root cause: simulation only checked velocity at iteration 5; live code checked at every iteration >= 5. A sigil passing at iter 5 can dip below 4.0/iter at iters 6-12, triggering late bail-outs. The unknown stop reason (221 sigils, avg_prec=31.8) = velocity bail-outs.
  • Low mishap rate is misleading: Sigils are bailed before reaching high enough danger/precision to mishap. Not a real safety improvement.
  • Verdict: REVERTED. EXP-11 catastrophically over-triggered due to flawed simulation methodology. EXP-10 (skip threshold alone) remains viable for standalone testing. EXP-11 killed: the continuous velocity check is fundamentally broken. A single-check-at-iter-5 variant could be revisited, but the effect size is small (16/432 = 3.7%, all avg final 38) and would need fresh simulation.
  • Action: Reverted to v1.5.10 algorithm. Version bumped to v1.5.12. 121 tests passing.
  • Lesson: Always simulate the exact check logic (every-iteration vs single-check).
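The mismatch between the simulation and the live code can be illustrated with a toy example. The sketch below is not the script's implementation: it assumes velocity means cumulative precision gained divided by iterations played, and the gains array is made-up sample data. Only the 4.0/iter floor and the iteration-5 starting point come from the description above.

```ruby
VELOCITY_FLOOR = 4.0 # precision per iteration, from the EXP-11 description
FIRST_CHECK = 5      # first iteration at which velocity is examined

# gains[i] is the cumulative precision gained after iteration i + 1 (assumed definition).
def velocity(gains, iteration)
  gains[iteration - 1].to_f / iteration
end

# Simulated logic: test velocity once, at iteration 5 only.
def bails_single_check?(gains)
  gains.size >= FIRST_CHECK && velocity(gains, FIRST_CHECK) < VELOCITY_FLOOR
end

# Live logic: test velocity at every iteration from 5 onward.
def bails_continuous?(gains)
  (FIRST_CHECK..gains.size).any? { |i| velocity(gains, i) < VELOCITY_FLOOR }
end

# Passes at iteration 5 (22 / 5 = 4.4) but stalls afterwards (26 / 7 is below 4.0).
gains = [4, 9, 13, 18, 22, 24, 26, 34]
puts bails_single_check?(gains)
puts bails_continuous?(gains)
```

The same sigil survives the single check and is killed by the continuous one, which is the over-triggering pattern described in the EXP-11 root cause.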

EXP-10: Skip threshold < 13 standalone (v1.5.12) - Completed, KEPT

  • Background: EXP-10 was bundled with EXP-11 in v1.5.11 but EXP-11 catastrophically over-triggered, swamping EXP-10's effect. Retested standalone after reverting EXP-11.
  • Change: Raise skip threshold for target >= 80 from < 10 to < 13.
  • Sessions: 10 (Barrask, Byd, Fidon, Gnarta, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve)
  • Logs: ~/SH_logs/v1.5.12/
Metric | v1.5.10 (baseline) | v1.5.12 (EXP-10) | Delta
Sessions | 10 | 10 | 0
Worked | 432 | 326 | -106
Skipped | 700 | 1169 | +469
Skip rate | 61.8% | 78.2% | +16.4pp
Avg precision | 50.8 | 48.3 | -2.5
Max precision | 85 | 84 | -1
Sigils >= 80 | 3 (0.7%) | 3 (0.9%) | 0
>=80/session | 0.3 | 0.3 | 0.0
Mishap rate | 37.5% | 37.1% | -0.4pp
Min per 80+ | 202 | 201 | -1
Scribed (>=90) | 1 | 0 | -1
Encountered/session | 113.2 | 149.5 | +36.3
  • EXP-10 verification:
    • Skip triggers: 875 "below 13" in v1.5.12 vs 593 "below 10" in baseline. Working correctly.
    • False positives: 0 of 229 baseline sigils starting at 10-12 reached 80+. Zero FPs confirmed across all cumulative data (1050+ eligible sigils).
    • Time saved: 229 baseline sigils × 10.8 avg iters = ~2464 iterations eliminated per 10 sessions.
    • Encountered +36 more sigils per session from faster skipping.
  • Avg precision drop (-2.5): Unexpected but explained by session variance. The 13-14 band dropped from 51.1 to 47.0 (removing 10-12 starters should have raised the average). Natural character/session variation, not a threshold effect.
  • Verdict: KEEP. Neutral on the key metric (>=80/session = 0.3 in both). Mechanically correct with zero false positives across all data ever collected. Eliminates ~246 wasted iterations per session (~2464 per 10 sessions) on provably unproductive sigils. Safe, conservative filter.
  • Action: KEPT. v1.5.12 becomes new baseline. EXP-7 staged as v1.5.13.

EXP-7: Difficulty-based action selection (v1.5.13) - Completed, KEPT (neutral)

  • Background: EXP-6 fixed difficulty ordering; EXP-7 replaces risk-based action selection with difficulty-first, cost-as-tiebreaker. Data shows gain determined entirely by difficulty (trivial=2.3 to difficult=13.3), cost has zero correlation.
  • Change: Phase 2 action selection: prefer highest difficulty, break ties by lowest impact.
  • Sessions: 10 (Barrask, Fidon, Throve + 7 others)
  • Logs: ~/SH_logs/v1.5.13/
Metric | v1.5.12 (baseline) | v1.5.13 (EXP-7) | Delta
Sessions | 10 | 10 | 0
Worked | 326 | 244 | -82
Avg precision | 48.3 | 53.2 | +4.9
Avg iterations | 9.7 | 11.0 | +1.3
Sigils >= 80 | 3 (0.9%) | 2 (0.8%) | -1
>=80/session | 0.3 | 0.2 | -0.1
Mishap rate | 37.1% | 41.0% | +3.9pp
Scribed | 0 | 1 | +1

Note: v1.5.13 worked count corrected from 327→244 by session-filtered re-analysis (Feb 2026). The v1.5.13 logs contained sessions from v1.5.11/v1.5.12/v1.5.13; only v1.5.13 sessions are now counted. v1.5.12 baseline (326) was already correct.

  • Key finding (viability filter is the binding constraint): Difficulty distribution did NOT shift between versions (~20% each difficulty in both). The viability filter typically leaves only 1 precision action viable per iteration, making selection preference irrelevant. The algorithm change is correct (better heuristic), but the practical effect is masked by the filter bottleneck.
  • Worked count drop (-82): v1.5.13 worked fewer sigils due to session variance (fewer encountered: 130.7 vs 149.5/session). The avg precision increase (+4.9) and higher avg iterations (11.0 vs 9.7) are consistent with spending more time per sigil.
  • Avg precision artifact: 65 "unknown" stop reasons in v1.5.12 (avg_prec=33.2) disappeared in v1.5.13 (parser improvement). Adjusting for these, the real v1.5.12 avg was ~52.1, making the actual delta ~+1.1 (within noise).
  • Verdict: KEEP (neutral). Correct heuristic with no downside. Practical effect masked by viability filter constraint. The important outcome is the architectural insight: the filter, not selection, controls outcomes. This redirects optimization to EXP-12 (viability loosening).
  • Action: KEPT. v1.5.13 becomes new baseline.

Viability Filter Analysis (post-EXP-7 investigation)

EXP-7 revealed the viability filter as the binding constraint. Full analysis across 8,165 iterations (v1.5.10 + v1.5.12 + v1.5.13):

  • Menu composition: 93.5% of menus have 1+ precision actions (6.5% have none, a game constraint). 69.4% have 2+ precision actions. Menus are NOT the bottleneck.
  • Viability filter acceptance: 91.9% of precision actions pass viability (using post-action resource values). Only 1,408 rejections across 8,165 iterations.
  • IMPORTANT timing bias: The analysis used POST-action resource values. For IMPROVE iterations, these are inflated (IMPROVE restores resources). The actual script checks viability with PRE-action (depleted) values. This means the 91.9% acceptance rate overstates reality. The true acceptance rate during refresh iterations is much lower.
  • Refresh rate breakdown (44.5% = 3,630 / 8,165):
    • ~530 (14.6%): Menu had zero precision actions (game constraint, unfixable)
    • ~3,100 (85.4%): Menu had precision, but viability rejected all (filter constraint)
    • The viability filter IS the primary bottleneck, not a separate "recover mode."
  • Rejection reasons (of 1,408 post-action rejections):
    • 87.1% margin <= 0 (stat too low for difficulty)
    • 12.9% margin=1 with trivial/straightforward (filter disallows low-difficulty at tight margin)
  • Counterfactual scenarios (biased low due to timing issue):
    • Scenario A (accept margin=1 for all difficulties): converts 106 refreshes, +0.5 prec/sigil
    • Scenario B (accept margin=0 for challenging+): converts 163 refreshes, +1.8 prec/sigil
    • Scenario C (both): converts 262 refreshes, -3.3pp refresh rate
    • Real impact likely 2-3x these estimates after correcting for timing bias.
  • Implication: EXP-12 (loosen viability to margin >= 0 for challenging+) is the next highest-leverage experiment. EXP-8 (repair window) dropped: repairs are too rare (0-4 per 10 sessions) and the viability filter is the real bottleneck.

EXP-12: Loosen viability margin (v1.5.14) - Completed, REVERTED

  • Change: precision_action_viable? Path 2: margin > 0 → margin >= 0 for challenging+. Accepts actions where stat == difficulty for formidable/challenging/difficult.
  • Sessions: 10 (Throve/Refia/Byd split)
  • Logs: ~/SH_logs/v1.5.14/

Layer 2 metrics (raw log parsing, reliable):

Metric | v1.5.13 (baseline) | v1.5.14 (EXP-12) | Delta
Refresh rate | 63.8% | 62.9% | -0.9pp
Gain/action | 7.86 | 8.21 | +0.35
Mishap/iter | 3.84% | 4.25% | +0.41pp
Difficulty shift | - | +3pp difficult, +1.4pp challenging, -2.6pp formidable | -

Layer 1 metrics (session-filtered, corrected Feb 2026):

Metric | v1.5.13 (baseline) | v1.5.14 (EXP-12) | Delta
Sessions | 10 | 10 | 0
Worked | 244 | 233 | -11
Scribed | 1 | 2 | +1
>=80 | 2 | 6 | +4
>=80/session | 0.2 | 0.6 | +0.4
Avg precision | 53.2 | 54.6 | +1.4
Mishap rate | 41.0% | 47.6% | +6.6pp
Avg iterations | 11.0 | 10.8 | -0.2

Previous analysis used overcounted v1.5.13 baseline (593 worked, all versions in directory). Corrected analysis filters to v1.5.13 sessions only (244 worked).

  • Analysis (corrected): Session-filtered Layer 1 reveals a stronger positive signal than originally assessed. >=80 tripled (2→6), >=80/session tripled (0.2→0.6), scribes doubled (1→2), avg precision up +1.4. Layer 2 confirms per-iteration metrics are near-flat (refresh rate -0.9pp, gain/action +0.35, mishap/iter +0.41pp). The per-sigil mishap rate increased +6.6pp (41.0→47.6%), but this was partially masked in the original analysis by the inflated baseline denominator (593 worked → artificial 39.3% mishap rate).
  • Revert decision context: The revert was made based on the overcounted analysis showing +8.3pp mishap with only modest positive signals. With corrected data, the >=80 improvement is substantial (+4 sigils, 3x improvement) and the mishap delta is +6.6pp. Consider re-testing EXP-12 with corrected measurement infrastructure.
  • Verdict: REVERTED (decision made with overcounted data). Corrected analysis suggests the experiment may warrant re-testing.
  • Action: REVERTED. v1.5.13 remains baseline. EXP-9 staged as v1.5.15.

EXP-12 Retest: Loosen viability margin (v1.5.16) - Completed, REVERTED

  • Change: Identical to v1.5.14: precision_action_viable? Path 2: margin > 0 → margin >= 0 for challenging+. Re-test with session-filtered baseline after corrected analysis suggested the original positive signal (>=80 tripled) may have warranted keeping.
  • Baseline: v1.5.15 (includes EXP-9 resource exhaust coeff 1.75)
  • Sessions: 10
  • Logs: ~/SH_logs/v1.5.16/
Metric | v1.5.15 (baseline) | v1.5.16 (EXP-12r) | Delta
Sessions | 10 | 10 | 0
Worked | 265 | 271 | +6
Skipped | 1025 | 1104 | +79
Scribed | 1 | 0 | -1
>=80 | 1 | 0 | -1
>=80/session | 0.1 | 0.0 | -0.1
Avg precision | 51.2 | 51.6 | +0.4
Max precision | 85 | 79 | -6
Avg iterations | 10.4 | 10.4 | 0.0
Mishaps | 114 | 129 | +15
Mishap rate | 43.0% | 47.6% | +4.6pp

Stop reasons:

Reason | v1.5.15 | v1.5.16 | Delta
mishap | 114 | 129 | +15
moves_exhausted | 107 | 92 | -15
resource_exhausted | 26 | 32 | +6
scribed | 1 | 0 | -1
sigil_vanished | 17 | 18 | +1
  • Analysis: The retest against the correct baseline (v1.5.15, which includes EXP-9) confirms the original revert decision. The positive signal seen in v1.5.14 (>=80 tripled 2→6 vs v1.5.13) does not reproduce against v1.5.15: zero sigils reached 80+, max precision dropped to 79, and mishap rate increased +4.6pp (43.0→47.6%). The relaxed viability margin allows marginal actions that produce more mishaps without compensating precision gains.
  • Verdict: REVERTED. Confirmed harmful. The original v1.5.14 positive signal was likely noise or an artifact of the v1.5.13 baseline lacking EXP-9's resource exhaustion change.
  • Action: REVERTED. v1.5.15 remains current baseline for future experiments.

Technique Test: Illuminated Sigil Comprehension (v1.5.9) - Completed

  • Background: Per Elanthipedia, all Sigil Comprehension technique bonuses are "globally disabled." There are 4 technique levels: Inspired, Enlightened, Illuminated, Awakened. Inspired and Enlightened have been enabled throughout all experiments (base effects only, bonuses disabled). Illuminated and Awakened are listed as "NOT enabled" on the wiki.
  • Goal: Determine if enabling Illuminated Sigil Comprehension has any measurable effect on sigil harvesting outcomes.
  • Change: Version tick only (v1.5.8 → v1.5.9). No algorithm change. All characters trained Illuminated Sigil Comprehension before running.
  • Sessions: 20 (10 original + 10 additional)
  • Logs: ~/SH_logs/v1.5.9/ (20 files: *_1353.log + *_1507.log)
Metric | v1.5.8 (10 sess) | v1.5.9 (20 sess) | Delta
Worked | 473 | 790 | +317
Worked/session | 47.3 | 39.5 | -7.8
Avg precision | 51.6 | 52.0 | +0.4
Max precision | 88 | 89 | +1
Avg iterations | 10.7 | 10.7 | 0.0
Scribed (>=90) | 2 | 5 | +3
Sigils >= 80 | 7 (1.5%) | 11 (1.4%) | -0.1pp
Mishap rate | 47.4% | 45.4% | -2.0pp
Min per 80+ | 87 | 107 | +20
  • Verdict: No effect. Illuminated Sigil Comprehension is confirmed disabled, as the wiki states. Doubled sample size (20 sessions, 790 worked sigils) confirms all key metrics within noise of v1.5.8. No systematic shift attributable to the technique.

Session-Filtering Fix (Feb 2026)

Log files contain output from entire game sessions, which may include multiple SigilHarvest invocations across different versions. For example, the v1.5.13 logs contain sessions from v1.5.11, v1.5.12, and v1.5.13. Analysis scripts must filter to only the correct version's session per file.

Fix applied: Added last_session_runs(version) helper to LogParser. All analysis scripts updated to use session filtering. Re-analysis of all experiments with corrected methodology.

Validated (numbers unchanged): EXP-6, EXP-10+11, EXP-10, EXP-5. These earlier experiments used analysis scripts that already had session filtering (or had clean single-version logs).

Corrected: EXP-7 test (v1.5.13 worked: 327→244), EXP-12 baseline+test (v1.5.13 593→244, v1.5.14 364→233), EXP-9 baseline (v1.5.13 593→244). The overcounting originated from flat_map(&:sigil_runs) without version filtering, counting all sessions in the log file regardless of version.

Impact on decisions: EXP-12 revert decision was made with inflated baseline (593 worked, artificial 39.3% mishap rate). Corrected data showed >=80 tripled (2→6) with +6.6pp mishap (41.0→47.6%), suggesting a possible positive signal. Re-test completed (v1.5.16): the positive signal did not reproduce against the correct baseline (v1.5.15). Zero sigils reached 80+, max precision dropped to 79, mishap rate +4.6pp. Original revert decision confirmed.

Queued Experiments

Ordered by expected impact and dependency chain. One experiment per version, no bundling.

Version | Experiment | Description | Status
v1.5.10 | EXP-6 (kept) | Fix difficulty ordering + filter ACTION verb | Complete
v1.5.11 | EXP-10+11 (reverted) | Skip threshold + velocity bail-out (bundled) | Complete - over-triggered
v1.5.12 | EXP-10 (kept) | Skip threshold < 13 (standalone) | Complete
v1.5.13 | EXP-7 (kept) | Difficulty-based action selection (decouple risk) | Complete
v1.5.14 | EXP-12 (reverted) | Loosen viability margin (accept margin=0 for challenging+) | Complete - revert may need re-evaluation (see corrected data)
v1.5.15 | EXP-9 (kept) | Recalibrate resource exhaustion coefficient (2.25→1.75) | Complete - KEPT (neutral)
v1.5.16 | EXP-12 retest (reverted) | Loosen viability margin (re-test with session-filtered baseline) | Complete - confirmed harmful
v1.5.17 | Awakened technique (kept) | Confirm if technique is active (target 90 test) | Complete - >=80 improvement observed (22 sessions), mechanism unknown
v1.5.18 | EXP-13 (revert) | Remove iteration cap + move budget check (resource-only bail-out) | Complete - all metrics regressed (0 scribes, mishap 63%, worked/sess 9.2)
v1.5.19 | EXP-14 (revert) | Equalize action costs (Urbaj) | Complete - REVERT, but confounded (tested on EXP-13 broken base, not v1.5.17). Needs clean retest as v1.5.24.
v1.5.20 | Baseline restore | Revert EXP-13+14, add C1 fix | Complete - missing EXP-9 resource check (0% resource_exhausted vs 12.1% baseline). C1 fix validated.
v1.5.21 | Corrected baseline | Restore EXP-9 resource check | Complete - baseline confirmed, all metrics match v1.5.17
v1.5.22 | EXP-15 (revert) | Align move budget max with iteration cap (14→15) | Complete - REVERT. Mishap +11.7pp, >=80 7→6, 0 scribes
v1.5.23 | EXP-16 (revert) | Tighten resource exhaustion coefficient (1.75→1.5) | Complete - REVERT. Coefficient mathematically impossible: need prec>=18 at max resources. 0 worked, 100% skip.
v1.5.24 | EXP-14 retest (kept) | Equalize action costs: clean standalone test ({1,2,3}→{1,1,1}) vs v1.5.21 baseline | Complete - KEPT. Neutral: mishap -1.6pp (n.s.), 2 real scribes, original "harmful" verdict was confounded
v1.5.25 | EXP-17 (kept) | Resource-aware tiebreaker: when 2+ actions share highest difficulty, prefer action draining most-available resource | Complete - KEPT (neutral). Mishap -5.3pp (n.s. p=0.21), resource_exhausted -1.6pp, mechanically coherent stop-reason shift
v1.5.26 | D7 fix (infrastructure) | Make repair logging unconditional; repairs confirmed non-existent | Complete - Phase 3 CLOSED
v1.5.27 | EXP-18 (kept) | Minimum difficulty threshold: skip trivial (difficulty=1) precision actions, refresh for better menu | Complete - KEPT. Avg gain +1.36 (7.18→8.54), trivial-range 25.8%→3.6%, 60+ rate +6.9pp

EXP-6: Fix difficulty ordering + filter ACTION verb (v1.5.10)

  • Hypothesis: Two confirmed bugs compound to reduce precision gains.
    1. Difficulty ordering bug: formidable is ranked 5 (highest) when it should be 3. Measured median gain: formidable=6, challenging=8, difficult=12. The algorithm selects formidable over difficult when close to target (precision >= 70), losing ~6 median precision per affected iteration at the most critical stage. Confirmed consistent across all v1.5.2 through v1.5.8 data (8-9% of iterations affected per version).
    2. ACTION verb bug: Per Elanthipedia, with ACTION "there is a good chance nothing will happen but the danger level will rise." Confirmed: 21% zero-gain rate vs 0.0% for all other verbs. ~550 ACTION executions per 10 sessions, ~120 completely wasted (zero gain + danger increase). No other verb ever produces zero gain.
  • Changes:
    • @action_difficulty: formidable => 3, challenging => 4, difficult => 5
    • Skip actions where verb == "ACTION" during action selection
  • Risk: Low. Both are bug fixes backed by empirical data. Combined because neither is an algorithm hypothesis; they are corrections to known-wrong behavior.
  • Expected impact: Better end-game precision (difficult selected over formidable when close to target), ~120 fewer wasted iterations per 10 sessions, lower cumulative danger.
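As a rough illustration of the two fixes, the sketch below pairs a corrected difficulty ranking with an ACTION filter. The constant, the hash-based action records, and the helper name are assumptions for this example; the script itself stores the ranking in @action_difficulty, and the values for trivial (1) and straightforward (2) are inferred from the difficulty references elsewhere on this page.

```ruby
# Corrected ranking: formidable sits below challenging and difficult.
ACTION_DIFFICULTY = {
  'trivial' => 1,
  'straightforward' => 2,
  'formidable' => 3,
  'challenging' => 4,
  'difficult' => 5
}.freeze

# Drop ACTION-verb entries before any selection logic runs.
# Each action is a hypothetical hash with :verb and :difficulty keys.
def selectable_actions(actions)
  actions.reject { |action| action[:verb] == 'ACTION' }
end

menu = [
  { verb: 'ACTION', difficulty: 'difficult' },
  { verb: 'OTHER', difficulty: 'formidable' } # 'OTHER' is a placeholder verb
]
p selectable_actions(menu).map { |action| ACTION_DIFFICULTY[action[:difficulty]] }
```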

EXP-7: Difficulty-based action selection (decouple risk) - v1.5.13, STAGED

  • Hypothesis: The current risk = difficulty + cost composite conflates reward potential with resource drain. The algorithm picks ~20% each difficulty level regardless of distance. Decoupling reveals: gain is determined entirely by difficulty (trivial=2.3 to difficult=13.3), cost has zero correlation with gain (taxing=7.0, disrupting=6.8, destroying=7.0). This holds across all distances and all difficulty x cost combinations.
  • Calibration data (v1.5.10, 1971 actions post-difficulty-fix):
    • Gain by difficulty: trivial=2.3, straightforward=4.5, formidable=6.8, challenging=9.6, difficult=13.3
    • Gain by cost: taxing=7.0, disrupting=6.8, destroying=7.0 (no signal)
    • Current selection: ~20% each difficulty (nearly uniform, ineffective)
    • Theoretical uplift: +6.27 gain/iter if always picking difficult (13.3 vs 7.0 current avg)
    • Over 10 iterations: +62.7 precision (obviously bounded by resource constraints)
  • Change: Replace risk-based comparison in Phase 2 action selection:
    • Always prefer highest difficulty (maximize precision gain per iteration)
    • Break ties by lowest cost/impact (conserve resources when gain is equal)
    # Before (risk composite):
    if far_from_target: prefer lowest risk
    if close_to_target: prefer highest risk
    # After (EXP-7):
    prefer highest difficulty, then lowest impact as tiebreaker
  • Risk: Low-medium. Changes the core selection heuristic. Data strongly supports the change across all 1971 observed actions. EXP-6 difficulty fix must be in place (it is).
  • Depends on: EXP-6 (satisfied)
  • Status: STAGED in v1.5.13 code (123 tests passing). Ready to run after v1.5.12 analysis.
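
A minimal Ruby sketch of the EXP-7 rule (highest difficulty first, lowest impact as tiebreaker). The struct, method name, and sample candidates are hypothetical; the real action records in the script may be shaped differently.

```ruby
# Hypothetical candidate shape; the real script's action records may differ.
SketchAction = Struct.new(:verb, :difficulty, :impact)

# Highest difficulty wins; among equal difficulty, lowest impact wins.
# Array keys compare element by element, so negating impact makes
# max_by prefer the smaller impact value.
def sketch_pick_action(candidates)
  candidates.max_by { |action| [action.difficulty, -action.impact] }
end

# Illustrative menu with placeholder verbs.
SKETCH_CANDIDATES = [
  SketchAction.new('VERB_A', 3, 1),
  SketchAction.new('VERB_B', 5, 3),
  SketchAction.new('VERB_C', 5, 2)
].freeze

chosen = sketch_pick_action(SKETCH_CANDIDATES)
puts "#{chosen.verb} (difficulty #{chosen.difficulty}, impact #{chosen.impact})"
```

With an empty candidate list the sketch returns nil, which a caller would presumably treat as "no viable action, refresh instead".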

EXP-12: Loosen viability margin for challenging+ (v1.5.14, REVERTED)

  • Hypothesis: The viability filter is the primary bottleneck controlling the 44.5% refresh rate. Currently, Path 2 requires margin > 0 (stat > difficulty) for challenging+ actions. Loosening to margin >= 0 (stat >= difficulty) extends the productive phase by 1 resource point, allowing precision actions when resources are at the difficulty threshold instead of forcing IMPROVE. EXP-7's viability analysis showed ~85% of refreshes occur when the menu has precision actions that the filter rejects, so the bottleneck is the filter, not menu RNG.
  • Change: In precision_action_viable?, Path 2:
    # Before:
    return true if margin > 0 && difficulty > 2
    # After:
    return true if margin >= 0 && difficulty > 2
    One-character change: > to >= in the margin comparison. Accepts margin=0 (stat == difficulty) for formidable, challenging, and difficult actions.
  • What this means practically:
    • Formidable (difficulty=3): viable at stat >= 3 (was >= 4)
    • Challenging (difficulty=4): viable at stat >= 4 (was >= 5)
    • Difficult (difficulty=5): viable at stat >= 5 (was >= 6)
    • Trivial/straightforward: unchanged (still require margin > 1, i.e., stat >= difficulty + 2)
  • Risk: Low-medium. Accepting margin=0 reduces the safety buffer.
  • Depends on: EXP-7 (satisfied)
  • Result (v1.5.14): Modest positive signals (refresh -0.9pp, gain +0.35/action, +3pp difficult shift) but per-sigil mishap rate 47.6%. Originally reverted due to overcounted baseline showing only modest gains. Session-filtered re-analysis revealed >=80 tripled (2→6). REVERTED then RE-TESTED as v1.5.16 with corrected measurement infrastructure.
  • Status: Re-staged as v1.5.16 (125 tests passing). Identical code change to v1.5.14.
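
The thresholds listed under "What this means practically" can be reproduced with a small Ruby sketch. The method below is a hypothetical stand-in, not the script's precision_action_viable?; it assumes Path 1 accepts any difficulty at margin > 1 and Path 2 covers difficulty > 2.

```ruby
# Hypothetical stand-in for the viability check; not the script's precision_action_viable?.
def sketch_viable?(stat, difficulty, loosened: false)
  margin = stat - difficulty
  return true if margin > 1            # Path 1: any difficulty with a two-point buffer
  return false unless difficulty > 2   # Path 2 only covers formidable and above

  loosened ? margin >= 0 : margin > 0  # EXP-12 swaps > for >=
end

# Lowest stat at which each Path 2 difficulty becomes viable, before and after.
[3, 4, 5].each do |difficulty|
  before = (0..10).find { |stat| sketch_viable?(stat, difficulty) }
  after = (0..10).find { |stat| sketch_viable?(stat, difficulty, loosened: true) }
  puts "difficulty #{difficulty}: min stat #{before} before, #{after} after"
end
```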

EXP-8: Tune repair eligibility window

  • Hypothesis: Repairs currently only trigger when @sigil_precision >= (precision - 15) (within 15 of target). After the difficulty fix, repairs target the resource consumed by difficult actions (reward 12-15) instead of formidable (reward 4-9). The window could be expanded to start repairs earlier (enabling more difficult actions sooner) or tightened to reserve iterations for direct precision work.
  • Change: Adjust the precision - 15 threshold in select_repair_action. Test values: precision - 20 (wider) or precision - 10 (narrower).
  • Risk: Low. Only affects when repairs are attempted, not core precision selection.
  • Depends on: EXP-6
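
A short Ruby sketch of the window comparison, assuming the check is a plain threshold on current precision. The helper name and the keyword argument are hypothetical; in the script the threshold lives inside select_repair_action.

```ruby
# Hypothetical helper; the real threshold lives inside select_repair_action.
def sketch_repair_window_open?(sigil_precision, target, window: 15)
  sigil_precision >= target - window
end

# First precision at which repairs become eligible for a target of 90.
[20, 15, 10].each do |window|
  first = (0..90).find { |precision| sketch_repair_window_open?(precision, 90, window: window) }
  puts "window #{window}: repairs allowed from precision #{first}"
end
```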

EXP-9: Recalibrate resource exhaustion coefficient (v1.5.15): Completed, KEPT (neutral)

  • Hypothesis: The resource exhaustion check uses (san + res + foc) * 2.25 + precision < target - 5. The 2.25 coefficient assumes each resource star is worth ~2.25 precision.
  • Calibration data (v1.5.10, 762 actions with resource consumption data):
    • Actual overall gain/star: 1.60 (current coefficient 2.25 is at P90)
    • By difficulty: trivial=1.16, straightforward=1.48, formidable=1.55, challenging=1.70, difficult=1.72
    • By cost: taxing=1.61, disrupting=1.56, destroying=1.62 (no significant variation)
    • Distribution: P25=1.17, P50=1.50, P75=2.00, P90=2.25
    • Current 2.25 is extremely optimistic: only 10% of iterations achieve this rate
  • Change: Lower coefficient from 2.25 to 1.75. This is between P50 (1.50) and P75 (2.00), and aligns with the difficult-action gain/star of 1.72 (which dominates under EXP-7's difficulty-first selection). At 1.75, the check exits sigils where even median-to-good performance per remaining star can't reach the target. At 2.25, it only exited when P90+ performance couldn't reach target, which is far too optimistic.
  • Risk: Low. Only affects bail-out timing. More sigils exit earlier (redirecting time to fresh sigils), potentially lower avg precision but higher throughput to 80+.
  • Depends on: EXP-7 (satisfied); coefficient calibrated to post-EXP-7 difficulty preference.
  • Sessions: 10 (Barrask, Byd, Fidon, Gnarta, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve)
  • Logs: ~/SH_logs/v1.5.15/
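
To make the effect of the coefficient concrete, here is a minimal Ruby sketch of the exhaustion formula quoted in the hypothesis. The helper name and the sample state (16 stars, precision 50, target 90) are illustrative assumptions, not values from the logs.

```ruby
# Hypothetical helper mirroring the formula quoted in the hypothesis.
def sketch_resource_exhausted?(san, res, foc, precision, target, coefficient: 1.75)
  (san + res + foc) * coefficient + precision < target - 5
end

# Illustrative state (not from the logs): 16 stars left, precision 50, target 90.
old_verdict = sketch_resource_exhausted?(6, 5, 5, 50, 90, coefficient: 2.25) # 36 + 50 = 86, not below 85
new_verdict = sketch_resource_exhausted?(6, 5, 5, 50, 90)                    # 28 + 50 = 78, below 85
puts "2.25 bails: #{old_verdict}, 1.75 bails: #{new_verdict}"
```

In this example the old coefficient keeps working the sigil while the new one exits, which is the kind of case behind the rise in resource exhaustion exits.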

Layer 1 metrics (session-filtered):

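| Metric | v1.5.13 (baseline) | v1.5.15 (EXP-9) | Delta |
| --- | --- | --- | --- |
| Sessions | 10 | 10 | 0 |
| Worked | 244 | 265 | +21 |
| Scribed | 1 | 1 | 0 |
| >=80 | 2 | 1 | -1 |
| >=80/session | 0.2 | 0.1 | -0.1 |
| Avg precision | 53.2 | 51.2 | -2.0 |
| Mishap rate | 41.0% | 43.0% | +2.0pp |
| Avg iterations | 11.0 | 10.4 | -0.6 |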

Layer 2 metrics (raw log parsing):

| Metric | v1.5.13 (baseline) | v1.5.15 (EXP-9) | Delta |
| --- | --- | --- | --- |
| Refresh rate | 63.8% | 63.1% | -0.7pp |
| Gain/action | 7.13 | 7.09 | -0.04 |
| Mishap/iter | 1.62% | 2.09% | +0.47pp |

EXP-9 specific metrics:

| Metric | v1.5.13 | v1.5.15 | Delta |
| --- | --- | --- | --- |
| Resource exhaustion exits | 1 | 26 | +25 |
| moves_exhausted | 120 | 107 | -13 |
| Available stars median | 109 | 92.5 | -16.5 |

  • Primary mechanism confirmed: Resource exhaustion exits increased from 1→26 (+25). The stop-reason shift (moves_exhausted -13, resource_exhausted +25) shows the tighter coefficient is catching sigils that would have exhausted moves anyway.
  • Per-iteration metrics flat: Layer 2 confirms the coefficient change doesn't affect per-iteration behavior: refresh rate, gain/action, and mishap/iter are all within noise. EXP-9 only changes when to give up, not how to play.
  • Outcome neutral: >=80 slightly down (2→1), avg precision -2.0, but these are within sample variance for 10 sessions. The +2.0pp mishap rate is noise (100→114 out of ~2700 iterations). No regression strong enough to justify revert.
  • Verdict: KEEP (neutral). Correct heuristic: aligns coefficient with observed gain/star distribution. No measurable benefit yet but no regression. The mechanism is sound (exits happening where predicted) and enables future experiments to build on a more realistic resource model.
  • Action: KEPT. v1.5.15 becomes new baseline.

EXP-10+11: Skip threshold + velocity bail-out (v1.5.11, bundled, STAGED)

  • Hypothesis: Sigils starting at precision 10-12 reach 80+ at only 0.45-0.61% rate vs 1.01-3.38% for starting precision 13+. These low-start sigils consume iterations (avg 9.7 per sigil) but almost never succeed. Skipping them frees those iterations for fresh sigils with higher expected value. Across 4097 worked sigils (v1.5.2-v1.5.9), raising the skip threshold from < 10 to < 13 yields an estimated net +15.6 additional 80+ sigils from redirected time.
  • Change: In improve_sigil (line 324), raise the skip threshold for target >= 80 from < 10 to < 13:
    # Before:
    if @args.precision.to_i >= 80 && @sigil_precision < 10
    # After:
    if @args.precision.to_i >= 80 && @sigil_precision < 13
  • Data basis: Starting precision distribution and outcomes (all versions pooled):
    • Start 10: 1040 sigils, avg final 40.1, 0.58% reach 80+
    • Start 11: 880 sigils, avg final 41.5, 0.45% reach 80+
    • Start 12: 621 sigils, avg final 44.3, 0.61% reach 80+
    • Start 13: 396 sigils, avg final 45.4, 1.01% reach 80+
    • Start 14: 254 sigils, avg final 49.8, 1.57% reach 80+
    • Breakpoint at 12→13 is consistent across all 8 versions individually.
  • Risk: Low. Only affects which sigils are attempted, not the algorithm itself. Lost 80+ sigils (those starting 10-12 that would have made it) are offset ~3:1 by new 80+ sigils from the saved iterations.
  • Depends on: EXP-6 (so difficulty fix is in place; the data holds regardless, but testing should be sequential)
  • Bundling rationale: EXP-10 and EXP-11 are bundled because they affect orthogonal code paths (start-of-sigil skip vs mid-run bail-out), neither changes the core algorithm, and both had zero false positives across 4097 sigils. Combined simulation: net +21.0.
  • EXP-11 component (precision velocity bail-out): After 5 iterations, sigils with average gain per iteration < 4 have a 0.0% rate of reaching 80+ (0 out of 1399 across all v1.5.2-v1.5.9 data). These "slow grinder" sigils start above the skip threshold but never gain momentum. Code adds @start_precision tracking and a velocity check after the move budget check:
    if @num_iterations >= 5 && @start_precision
      velocity = (@sigil_precision - @start_precision).to_f / @num_iterations
      if velocity < 4.0
        return false
      end
    end
  • Combined data basis (4097 sigils, v1.5.2-v1.5.9):
    • EXP-10 alone: skip 1990, lose 11 80+, save 19333 iters, net +15.6
    • EXP-11 alone: bail 1399, lose 0 80+, save 6933 iters, net +9.5
    • Combined: skip+bail 2784, lose 11 80+, save 23298 iters, net +21.0
  • Status: TESTED and REVERTED. Velocity bail-out (EXP-11) over-triggered at 74% of worked sigils due to continuous checking (every iter >= 5) vs simulation's single check at iter 5. EXP-10 (skip threshold) mechanically correct but untested in isolation. EXP-11 moved to killed ideas. See results above.
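
The gap between the simulation (one check at iteration 5) and the shipped code (a check on every iteration >= 5) can be illustrated with a Ruby sketch. The helper names and the precision trace are hypothetical; the trace is made up to show a sigil that passes at iteration 5 and then stalls.

```ruby
# Hypothetical sketch contrasting the two check schedules; not the script's code.
def sketch_velocity(precision, start_precision, iterations)
  (precision - start_precision).to_f / iterations
end

# precisions[i] is the precision after iteration i + 1.
# Returns the iteration at which the bail-out fires, or nil if it never does.
def sketch_bail_iteration(precisions, start_precision, continuous:)
  last = continuous ? precisions.length : 5
  (5..last).find do |iteration|
    next false if iteration > precisions.length

    sketch_velocity(precisions[iteration - 1], start_precision, iteration) < 4.0
  end
end

# Illustrative trace (not from the logs): strong start, then a stall.
SKETCH_TRACE = [20, 27, 33, 38, 44, 45, 45, 45, 46, 46].freeze

single = sketch_bail_iteration(SKETCH_TRACE, 13, continuous: false)
continuous = sketch_bail_iteration(SKETCH_TRACE, 13, continuous: true)
puts "single check at 5: #{single.inspect}, continuous check: #{continuous.inspect}"
```

Because the average is taken over all iterations so far, any late stall drags the velocity under 4.0 eventually, so a continuous check fires on sigils the single-check simulation never counted.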

External Feedback: Game Mechanic Corrections (Feb 2026)

Feedback from experienced player (Urbaj) identified three incorrect assumptions in Matt's original script (our starting point). All three claims validated against our existing calibration data (762 actions from EXP-9, 566 worked sigils from v1.5.15+v1.5.17).

1. No hard iteration cap in the game. The game allows unlimited PERC SIGIL IMPROVE iterations as long as you have resources. The script's hard cap of 15 iterations (line 271) and move budget check (line 291, which uses 14 - iterations as remaining moves) are artificial limits. Validation: only 10/566 sigils (1.8%) reach iteration 14-15, but the move budget check stops 215/566 (38%) of worked sigils. 59 of those 215 had precision >= 60, and 22 had >= 70. These are sigils still climbing that get cut off by an assumption that doesn't match the game.

2. Action cost labels describe WHICH resource, not HOW MUCH. The labels "destroying"/"disrupting"/"taxing" are static descriptors for which resource is consumed: sanity→destroying, focus→disrupting, resolve→taxing. They do NOT predict how much resource is consumed. The current @action_cost = { "taxing" => 1, "disrupting" => 2, "destroying" => 3 } is wrong. Validation: gain/star by cost label is flat (taxing=1.61, disrupting=1.56, destroying=1.62). If destroying consumed 3x more than taxing, gain/star for destroying would be ~2.33 and taxing would be ~7.0.

3. Difficulty is the sole predictor for both gain AND resource consumption. Both precision gain and resource consumption are determined by difficulty, not cost label. Validation from EXP-7/EXP-9 calibration data:

  • Gain by difficulty: trivial=2.3, straightforward=4.5, formidable=6.8, challenging=9.6, difficult=13.3 (strong monotonic signal)
  • Gain by cost: taxing=7.0, disrupting=6.8, destroying=7.0 (no signal)
  • Gain/star by difficulty: 1.16→1.72 (higher difficulty = more efficient per star)
  • Gain/star by cost: 1.61, 1.56, 1.62 (no signal)

Higher difficulty actions are actually MORE resource-efficient (gain/star increases with difficulty), meaning EXP-7's preference for highest difficulty is even more correct than originally justified: it maximizes both precision gain and resource efficiency.
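
The arithmetic behind claim 2 can be checked with a few lines of Ruby. The sketch treats the {1,2,3} weights as relative star consumption (the assumption being tested) and divides the observed average gain by them; constant and method names are hypothetical.

```ruby
# Average gain per action by cost label (from the calibration data above).
SKETCH_GAIN_BY_LABEL = { 'taxing' => 7.0, 'disrupting' => 6.8, 'destroying' => 7.0 }.freeze
# Relative star consumption IF @action_cost were a true cost multiplier (the assumption under test).
SKETCH_ASSUMED_STARS = { 'taxing' => 1, 'disrupting' => 2, 'destroying' => 3 }.freeze
# Gain/star actually observed by cost label.
SKETCH_OBSERVED_GAIN_PER_STAR = { 'taxing' => 1.61, 'disrupting' => 1.56, 'destroying' => 1.62 }.freeze

def sketch_predicted_gain_per_star(label)
  (SKETCH_GAIN_BY_LABEL.fetch(label) / SKETCH_ASSUMED_STARS.fetch(label)).round(2)
end

SKETCH_GAIN_BY_LABEL.each_key do |label|
  predicted = sketch_predicted_gain_per_star(label)
  observed = SKETCH_OBSERVED_GAIN_PER_STAR.fetch(label)
  puts "#{label}: predicted #{predicted}, observed #{observed}"
end
```

The predicted values spread by a factor of three across labels while the observed values are flat, which is the mismatch the validation relies on.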

EXP-13: Remove iteration cap + move budget check

  • Hypothesis: The hard iteration cap of 15 and the move budget check (which assumes 14 max useful iterations) are artificial limits not imposed by the game. 38% of worked sigils are stopped by these iteration-based limits. Removing them and relying solely on the resource exhaustion check (EXP-9) allows sigils with remaining resources to continue gaining precision. The resource exhaustion check directly measures whether remaining resources can reach the target; it doesn't need an iteration count proxy.
  • Changes:
    1. Remove hard cap at iteration 15 (line 271-274)
    2. Remove move budget check (14 - @num_iterations) * 13 < ... (line 291-295)
    3. Adjust scribe-near-cap logic (line 261): remove @num_iterations >= 15 condition, keep scribing at target or target-5 based on resource exhaustion proximity
  • What the resource exhaustion check already handles: (san + res + foc) * 1.75 + precision < target - 5 exits when remaining resources can't plausibly reach target. This is a direct measurement, not an iteration-count proxy.
  • Data basis: 181/566 (32%) stopped by move budget. Monte Carlo projection (5000 sims per sigil using observed gain/productive-rate/mishap-rate distributions):
    • 37 of 181 (20%) have >50% probability of reaching 85 with more iterations
    • 15 of 181 (8%) have >50% probability of reaching 90
    • Sigils at 70+ with 10+ remaining stars have 55-72% chance of reaching 90
    • No infinite loop risk: 76% of refreshes produce viable action next iteration; max observed refresh streak was 6. Resource drain provides natural termination.
  • Additional change: Raise skip threshold to <15 for target 90. Analysis shows start=13 (209 sigils, max=83, 0 scribes) and start=14 (162 sigils, max=78, 0 scribes) never reach 85. Only start=15 has scribe potential (1 scribe, 6 >=80 in 194 sigils). Saves 3,851 iterations with zero lost scribes.
  • Risk: Medium. Without iteration limits, sigils burn through more resources per run. Fewer sigils attempted per session (each takes longer). Net effect depends on whether extended sigils convert to 80+ at a higher rate than fresh sigils would. Resource exhaustion check provides the safety net. No loop risk (confirmed by refresh analysis).
  • Depends on: EXP-9 (satisfied; resource exhaustion check in place)
  • Expected impact: HIGHEST of any remaining experiment. 32% of sigils currently stopped may continue. Simulation projects 15-37 additional scribes per 566 worked. Combined with skip threshold (fewer wasted attempts), net throughput should increase.
  • Sessions: 11 (all complete 60min, 11 characters)
  • Logs: ~/SH_logs/v1.5.18/ (extracted via session_splitter.rb; 4 split sessions)

| Metric | v1.5.17 (22 sess) | v1.5.18 (11 sess) | Delta |
| --- | --- | --- | --- |
| Worked | 555 | 101 | -454 |
| Worked/session | 25.2 | 9.2 | -16.0 |
| Skipped | 2249 | 1428 | -821 |
| Scribed | 3 | 0 | -3 |
| Scribes/session | 0.18 | 0.0 | -0.18 |
| >=80 | 14 | 2 | -12 |
| >=80/session | 0.64 | 0.18 | -0.46 |
| Avg precision | 52.8 | 55.0 | +2.2 |
| Max precision | 88 | 80 | -8 |
| Avg iterations | 10.5 | 11.0 | +0.5 |
| Max iterations | 15 | 18 | +3 |
| Iters > 15 | 0 | 1 | +1 |
| Mishap rate | 42.0% | 63.4% | +21.4pp |

| Stop reason | v1.5.17 | v1.5.18 | Delta |
| --- | --- | --- | --- |
| mishap | 233 (42%) | 64 (63%) | -169 |
| moves_exhausted | 209 (38%) | 0 (0%) | -209 |
| resource_exhausted | 67 (12%) | 17 (17%) | -50 |
| scribed | 3 | 0 | -3 |
| sigil_vanished | 41 | 20 | -21 |

Per-character breakdown (all 11 characters, 0 scribes universally):

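| Character | Sigils | Worked | Skipped | Mishap% |
| --- | --- | --- | --- | --- |
| Barrask | 142 | 8 | 134 | 62.5% |
| Byd | 143 | 8 | 135 | 75.0% |
| Christus | 141 | 8 | 133 | 50.0% |
| Fidon | 139 | 10 | 129 | 70.0% |
| Gnarta | 146 | 8 | 138 | 50.0% |
| Jazriel | 143 | 7 | 136 | 57.1% |
| Kythkani | 133 | 9 | 124 | 66.7% |
| Mahtra | 137 | 9 | 128 | 55.6% |
| Nelis | 133 | 11 | 122 | 63.6% |
| Refia | 133 | 12 | 121 | 75.0% |
| Throve | 139 | 11 | 128 | 63.6% |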

Note: skip rate is ~93% overall (1428 of 1529 sigils; 91-95% per character) vs ~80% in the v1.5.17 baseline. Mishap rate ranges 50-75% per character (vs 42% baseline), with no character showing improvement.

  • Analysis: Every key metric regressed. Two compounding problems:
    1. Skip threshold <15 for target 90: Eliminated 63% of worked sigils (25.2→9.2 per session). The Monte Carlo projection was correct that start <15 rarely reaches 90, but the throughput cost is devastating: far fewer sigils attempted means far fewer chances at any high-precision outcome.
    2. Removed iteration cap: Sigils now run up to 18 iterations, but the extra iterations mostly produce mishaps. Mishap rate jumped 42%→63.4%. The simulation's projection of 15-37 additional scribes did not materialize; extended sigils hit mishaps before converting. Max precision actually dropped (88→80). The slight avg precision increase (+2.2) is a selection artifact: only high-starting sigils (15+) are worked, so the floor is higher. But this doesn't compensate for the catastrophic loss of throughput and high-precision outcomes.
  • Post-mortem: The Monte Carlo model overestimated scribe potential because it assumed uniform mishap probability per iteration. In practice, mishap risk likely compounds as danger accumulates over extended runs. This experiment bundled THREE changes (remove iteration cap, remove move budget, raise skip threshold to <15 for target 90), violating the one-change-per-version protocol. It is impossible to isolate which change caused which portion of the regression. Worse, EXP-14 was then tested on top of this broken base, making its results confounded as well (see EXP-14 confounding note).
  • Verdict: REVERT. Zero scribes, 63% mishap rate, 0.18 >=80/session. All changes from EXP-13 must be reverted. The individual components (skip threshold alone, cap removal alone) could be re-tested as separate experiments if desired.
  • Action: REVERT. v1.5.17 remains baseline. EXP-14 (v1.5.19, cost equalization) also REVERT (see EXP-14 results below).
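
The post-mortem's point about uniform vs compounding mishap risk can be illustrated analytically in Ruby. Both probability curves below are hypothetical (a flat 5% per iteration vs 1% times the iteration number); they are not fitted to the logs and only show how the two assumptions diverge on extended runs.

```ruby
# Probability of finishing `iterations` iterations with no mishap, given a
# per-iteration mishap probability supplied by the block.
def sketch_survival(iterations)
  (1..iterations).reduce(1.0) { |alive, i| alive * (1.0 - yield(i)) }
end

# Hypothetical flat risk: 5% on every iteration.
def sketch_uniform_survival(iterations)
  sketch_survival(iterations) { |_i| 0.05 }
end

# Hypothetical compounding risk: 1% on iteration 1, 2% on iteration 2, and so on.
def sketch_compounding_survival(iterations)
  sketch_survival(iterations) { |i| [0.01 * i, 1.0].min }
end

[10, 15, 18].each do |n|
  puts format('%2d iters: uniform %.3f, compounding %.3f',
              n, sketch_uniform_survival(n), sketch_compounding_survival(n))
end
```

Under these assumed curves the two models roughly agree over 10 iterations, but the compounding model loses far more sigils by iteration 18, the range that only opened up once the cap was removed.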

EXP-14: Equalize action costs (v1.5.19): Complete, REVERT

  • Hypothesis: The @action_cost mapping is wrong (Urbaj's claim 2). Confirmed with 100% correlation from 2,416 action iterations: destroying=sanity (771/772), disrupting=focus (843/844), taxing=resolve (800/800). Each consumes ~4.2 stars of exactly one resource. The labels are static resource descriptors, not cost predictors. Difficulty is the sole independent variable for both gain and resource consumption.
  • Change: Equalize @action_cost from { taxing: 1, disrupting: 2, destroying: 3 } to { taxing: 1, disrupting: 1, destroying: 1 }. This is the minimal, isolated change.
  • Downstream effects:
    • impact field is now always 1 (no cost differentiation between actions)
    • risk composite = difficulty + 1 (same ordering as difficulty alone)
    • EXP-7 tie-breaking at equal difficulty becomes a no-op (first encountered wins)
    • Repair selection (line 671) becomes purely difficulty-based
  • What was NOT changed (deferred to future experiments if warranted):
    • The resource-aware tiebreaker concept (prefer action consuming most-available resource) was considered but deferred. One change per version. The viability filter typically leaves only 1 option per iteration anyway, making tiebreakers rarely exercised.
    • The risk composite still computed as difficulty + cost but with equal costs it equals difficulty + 1, which preserves correct ordering without a code change.
  • Risk: Low. Viability filter typically leaves only 1 option per iteration.
  • Depends on: EXP-7 (satisfied)
  • Expected impact: LOW. Correct in principle but rarely exercised in practice.
  • Tests: 131 examples, 0 failures. Updated @action_cost setup, build_improvement defaults, 12 fixture impact/risk values, reworked tie-breaking test to verify first-encountered behavior when costs are equal.
  • Sessions: 11 (all complete 60min, 11 characters)
  • Logs: ~/SH_logs/v1.5.19/ (extracted via session_splitter.rb; 4 split sessions)
  • Note: v1.5.19 inherits ALL EXP-13 changes (removed iteration cap, resource-only bail-out, skip <15 for target 90). Since EXP-13 is REVERT, these results reflect both the catastrophic EXP-13 base AND the cost equalization. Compare vs v1.5.18 to isolate EXP-14's effect, and vs v1.5.17 for true baseline.
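
The downstream effect on the risk composite can be sketched in Ruby. The cost tables come from the change description; the helper names and the sample pair of actions are hypothetical.

```ruby
SKETCH_OLD_COST = { 'taxing' => 1, 'disrupting' => 2, 'destroying' => 3 }.freeze
SKETCH_NEW_COST = { 'taxing' => 1, 'disrupting' => 1, 'destroying' => 1 }.freeze

# risk composite = difficulty + cost, as described above.
def sketch_risk(difficulty, cost_label, cost_table)
  difficulty + cost_table.fetch(cost_label)
end

# Illustrative pair: a formidable/destroying action vs a challenging/taxing one.
SKETCH_PAIR = [
  { name: 'formidable+destroying', difficulty: 3, cost: 'destroying' },
  { name: 'challenging+taxing', difficulty: 4, cost: 'taxing' }
].freeze

def sketch_highest_risk(cost_table)
  SKETCH_PAIR.max_by { |a| sketch_risk(a[:difficulty], a[:cost], cost_table) }[:name]
end

puts "old {1,2,3}: #{sketch_highest_risk(SKETCH_OLD_COST)}"
puts "new {1,1,1}: #{sketch_highest_risk(SKETCH_NEW_COST)}"
```

Under {1,2,3} the lower-difficulty action can outrank the higher-difficulty one on risk (6 vs 5); under {1,1,1} risk is always difficulty + 1, so risk ordering and difficulty ordering coincide.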

Cross-version comparison (11 sessions each for fair comparison):

| Metric | v1.5.17 (baseline) | v1.5.18 (EXP-13) | v1.5.19 (EXP-14) | EXP-14 delta |
| --- | --- | --- | --- | --- |
| Worked | 555 | 101 | 103 | +2 |
| Worked/session | 50.5 | 9.2 | 9.4 | +0.2 |
| Skipped | 2249 | 1428 | 1429 | +1 |
| Real scribes | 1 | 0 | 0 | 0 |
| C1 fake scribes | 2 | 0 | 1 | +1 |
| >=80 | 14 | 2 | 3 | +1 |
| >=80/session | 1.27 | 0.18 | 0.27 | +0.09 |
| Avg precision | 52.8 | 55.0 | 52.8 | -2.2 |
| Mishap rate | 42.0% | 63.4% | 74.8% | +11.4pp |

Note on v1.5.17 session count: The v1.5.17 directory has 11 files containing 22 sessions (2 batches merged). Per-file metrics show 50.5 worked/file but the per-session baseline used for Z-score calculations elsewhere uses 22 sessions (25.2 worked/session). This comparison uses per-file numbers for apples-to-apples vs the 11-session EXP-14 data.
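
The two normalizations in the note can be reproduced with a short Ruby calculation; the helper name is hypothetical, and the totals (555 worked, 14 at >=80) are the v1.5.17 figures from the tables.

```ruby
# Divide a total by a unit count and round for display.
def sketch_per_unit(total, units, digits)
  (total.to_f / units).round(digits)
end

puts "worked per file (11):    #{sketch_per_unit(555, 11, 1)}"
puts "worked per session (22): #{sketch_per_unit(555, 22, 1)}"
puts ">=80 per file (11):      #{sketch_per_unit(14, 11, 2)}"
puts ">=80 per session (22):   #{sketch_per_unit(14, 22, 2)}"
```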

| Stop reason | v1.5.17 | v1.5.18 | v1.5.19 | EXP-14 delta |
| --- | --- | --- | --- | --- |
| mishap | 233 (42%) | 64 (63%) | 77 (75%) | +13 |
| resource_exhausted | 67 (12%) | 17 (17%) | 6 (6%) | -11 |
| sigil_vanished | 41 (7%) | 20 (20%) | 19 (18%) | -1 |
| scribed | 3 | 0 | 1 (C1 fake) | +1 |
| moves_exhausted | 209 (38%) | 0 | 0 | 0 |

The C1 fake scribe: Mahtra Sigil#127, precision 87, scribe_count=nil. Reached 87 but did not actually scribe (mishap or vanish), classified as SCRIBED by C1 bug.

>=80 sigils detail (all 3 ended in failure):

| Character | Sigil# | Start | Final | Iters | Stop |
| --- | --- | --- | --- | --- | --- |
| Barrask | #97 | 15 | 84 | 18 | mishap |
| Gnarta | #103 | 15 | 84 | 12 | sigil_vanished |
| Mahtra | #127 | 15 | 87 | 13 | C1 fake "scribed" |

All worked sigils start at exactly precision 15 (the skip <15 threshold from EXP-13).

  • Analysis: Cost equalization appeared to make things WORSE, not neutral:
    1. Mishap rate 74.8%, the highest of any version. The cost equalization removed the penalty for "destroying" actions (formerly cost=3). With all costs = 1, the algorithm no longer discriminates against resource-intensive actions, allowing more aggressive action selection. The result: more mishaps without compensating gains.
    2. Resource exhaustion dropped (17→6): fewer sigils run out of resources because they mishap before reaching resource exhaustion. This is not an improvement.
    3. No precision improvement: avg precision 52.8 (= v1.5.17 baseline), >=80 count 3 vs v1.5.18's 2 (noise range, all failed anyway).
    4. The hypothesis was correct but the effect is harmful: Urbaj's observation that cost labels are resource descriptors (not cost predictors) is confirmed. But the old cost weighting {1,2,3} provided an accidental benefit: it penalized "destroying" actions, which happen to consume sanity. This implicit conservation was better than no conservation.
  • Post-mortem: The expected "LOW impact" assessment was wrong. While the viability filter usually leaves 1 option, equalizing costs affects the RISK composite (difficulty + cost) used in action selection. With costs equalized, RISK = difficulty + 1 for all actions, making the algorithm select purely by difficulty. In cases where multiple actions have the same difficulty, the tiebreaker changes. More importantly, the repair selection logic (line 671) becomes purely difficulty-based, potentially accepting riskier repair attempts.
  • Verdict: REVERT (but see confounding note below).

⚠ CONFOUNDING NOTE (Feb 2026 retrospective)

EXP-14 was tested on EXP-13's broken code base (removed iteration cap, removed move budget, skip <15 for target 90). It was never tested against the real v1.5.17 baseline. This means the +11.4pp mishap increase attributed to cost equalization is confounded with EXP-13's already-catastrophic 63.4% mishap rate. The conclusion that cost equalization is "harmful" is not supported by clean data.

Why this matters:

  • EXP-13 removed the iteration cap, so sigils ran 15-18 iterations where mishap probability compounds. Cost equalization on extended runs has a different effect than cost equalization on capped runs.
  • The "accidental diversification" theory (point 4 above) is unvalidated post-hoc reasoning. The viability filter "typically leaves only 1 option per iteration"; a rarely-exercised tiebreaker cannot plausibly cause a +11.4pp mishap increase on baseline code where runs are capped at 15 iterations.
  • Urbaj's data is validated by ours (100% correlation, 2,416 actions). The {1,2,3} mapping is provably wrong. It deserves a standalone test against the confirmed v1.5.21 baseline.

Action: Schedule v1.5.24 as a clean standalone cost equalization test ({1,2,3} → {1,1,1}) against v1.5.21 baseline. See queued experiments.

v1.5.20 Baseline Restore: Complete (partial)

  • Changes: Reverted EXP-13+14 to v1.5.17 algorithm. C1 fix (@actually_scribed flag). D7 analyzer fix (repair_count increment). Version 1.5.20.
  • Sessions: 11 (all complete 60min, 11 characters)
  • Logs: ~/SH_logs/v1.5.20/

v1.5.20 vs v1.5.17 Comparison (per-session rates; v1.5.17 covers 22 sessions, v1.5.20 covers 11):

| Metric | v1.5.17 (22 sess) | v1.5.20 (11 sess) | Delta |
| --- | --- | --- | --- |
| Worked/session | 25.2 | 23.5 | -1.8 |
| Skip rate | 80.2% | 79.8% | -0.4pp |
| Real scribes | 1 | 1 | 0 |
| >=80/session | 0.64 | 0.27 | -0.36 |
| Avg precision | 52.8 | 53.5 | +0.7 |
| Max precision | 88 | 88 | 0 |
| Avg iterations | 10.5 | 10.8 | +0.3 |
| Mishap rate | 42.0% | 46.9% | +4.9pp |

Stop Reasons (key finding: resource_exhausted = 0%):

| Reason | v1.5.17 | v1.5.20 | Delta |
| --- | --- | --- | --- |
| mishap | 42.0% | 46.9% | +4.9pp |
| moves_exhausted | 37.7% | 46.1% | +8.4pp |
| resource_exhausted | 12.1% | 0.0% | -12.1pp |
| sigil_vanished | 7.4% | 6.6% | -0.8pp |

Bug discovered: EXP-9 resource exhaustion check ((san+res+foc)*1.75 + prec < target-5) was accidentally omitted during the EXP-13→v1.5.20 revert. The check lives in sigil_info (after resource parsing), not in the Phase 3 bail-out block that was restored. The 12.1% of sigils that should exit via resource_exhausted instead continued to moves_exhausted (+8.4pp) or mishap (+4.9pp).

Bug fix validations:

  • C1: 1 SCRIBED result, 1 real (scribe_count=2). Zero fakes. VALIDATED.
  • D7: Code correct, but 0 repairs observed in this batch (repairs are rare).

>=80 detail (3 sigils):

  • Barrask Sigil#67: start=14, final=84, iters=12, stop=mishap
  • Mahtra Sigil#33: start=14, final=88, iters=15, stop=scribed (2 scrolls), REAL scribe
  • Mahtra Sigil#62: start=15, final=81, iters=12, stop=mishap

Verdict: Partial baseline. Core algorithm matches v1.5.17 but missing resource check inflates mishap rate by ~5pp and moves_exhausted by ~8pp. Fixed in v1.5.21.

v1.5.21 Corrected Baseline: Complete, BASELINE CONFIRMED

  • Changes: Restores EXP-9 resource exhaustion check. No other changes. Algorithm identical to v1.5.17 + C1 fix + D7 analyzer fix.
  • Sessions: 11 (all complete 60min, 11 characters)
  • Logs: ~/SH_logs/v1.5.21/

v1.5.21 vs v1.5.17 vs v1.5.20 Comparison:

| Metric | v1.5.17 (22 sess) | v1.5.20 (11 sess) | v1.5.21 (11 sess) | 21 vs 17 |
| --- | --- | --- | --- | --- |
| Worked/session | 25.2 | 23.5 | 23.8 | -1.4 |
| Skip rate | 80.2% | 79.8% | 79.6% | -0.6pp |
| Real scribes | 1 | 1 | 0 | -1 |
| >=80 | 14 | 3 | 7 | |
| >=80/session | 0.64 | 0.27 | 0.64 | 0.0 |
| Avg precision | 52.8 | 53.5 | 51.6 | -1.2 |
| Max precision | 88 | 88 | 87 | |
| Avg iterations | 10.5 | 10.8 | 10.2 | -0.3 |
| Mishap rate | 42.0% | 46.9% | 40.8% | -1.2pp |

Stop Reasons (resource_exhausted restored):

| Reason | v1.5.17 | v1.5.20 | v1.5.21 | 21 vs 17 |
| --- | --- | --- | --- | --- |
| mishap | 42.0% | 46.9% | 40.8% | -1.2pp |
| moves_exhausted | 37.7% | 46.1% | 39.7% | +2.0pp |
| resource_exhausted | 12.1% | 0.0% | 13.4% | +1.3pp |
| sigil_vanished | 7.4% | 6.6% | 6.1% | -1.3pp |

>=80 detail (7 sigils: 4 budget-stopped, 3 mishapped):

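| Character | Sigil | Start | Final | Iters | Danger | Stop |
| --- | --- | --- | --- | --- | --- | --- |
| Barrask | #60 | 13 | 83 | 14 | 18 | moves_exhausted |
| Fidon | #23 | 13 | 84 | 14 | 18 | moves_exhausted |
| Gnarta | #112 | 14 | 80 | 14 | 18 | moves_exhausted |
| Mahtra | #102 | 15 | 82 | 14 | 18 | moves_exhausted |
| Jazriel | #37 | 14 | 87 | 12 | 17 | mishap |
| Kythkani | #12 | 15 | 84 | 10 | 7 | mishap |
| Throve | #27 | 15 | 83 | 10 | 7 | mishap |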

Key observation: 4 of 7 >=80 sigils were stopped by the move budget at iteration 14 with 1 iteration remaining before the cap. At precision 80-84 (gap of 1-5 to scribe threshold of 85), a single additional iteration at avg 7.3 gain would likely scribe them. This is the strongest signal yet for the "loosen move budget" experiment.

Verdict: BASELINE CONFIRMED. All metrics match v1.5.17 within normal variance. v1.5.21 is the corrected baseline with working C1/D7/EXP-9 instrumentation.

Assumption Audit: Comprehensive Data Analysis (Feb 2026)

Systematic audit of all script assumptions using 4,608 iterations from 566 worked sigils (v1.5.15 + v1.5.17 combined). Script: scratchpad/assumption_audit.rb.

Finding 1: Cost label → resource mapping is 100% confirmed

| Cost Label | Consumes | Hit Rate | Avg Stars Consumed |
| --- | --- | --- | --- |
| destroying | sanity | 771/772 (100%) | -4.32 |
| disrupting | focus | 843/844 (100%) | -4.23 |
| taxing | resolve | 800/800 (100%) | -4.15 |

Each action consumes ~4.2 stars of exactly one resource. The @action_cost mapping {taxing:1, disrupting:2, destroying:3} is provably wrong. Urbaj's claim confirmed with perfect correlation. Refreshes consume zero resources but increase danger by ~1.0.
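
A hit-rate tally like the one in the table could be computed along these lines. This is a sketch under assumptions: the record shape (a :cost label plus a :delta hash of per-resource changes) and all names are hypothetical, and the sample records are made up rather than taken from the logs.

```ruby
SKETCH_EXPECTED_RESOURCE = {
  'destroying' => :sanity, 'disrupting' => :focus, 'taxing' => :resolve
}.freeze

# Each record is assumed to look like:
#   { cost: 'taxing', delta: { sanity: 0, focus: 0, resolve: -4 } }
# A hit means the expected resource is the only one that dropped.
def sketch_hit_rates(records)
  records.group_by { |record| record[:cost] }.map do |label, rows|
    hits = rows.count do |record|
      dropped = record[:delta].select { |_resource, change| change < 0 }.keys
      dropped == [SKETCH_EXPECTED_RESOURCE.fetch(label)]
    end
    [label, [hits, rows.length]]
  end.to_h
end

# Illustrative records, not real log data.
SKETCH_RECORDS = [
  { cost: 'taxing', delta: { sanity: 0, focus: 0, resolve: -4 } },
  { cost: 'taxing', delta: { sanity: 0, focus: 0, resolve: -5 } },
  { cost: 'destroying', delta: { sanity: -4, focus: 0, resolve: 0 } },
  { cost: 'destroying', delta: { sanity: 0, focus: -3, resolve: 0 } }
].freeze

sketch_hit_rates(SKETCH_RECORDS).each do |label, (hits, total)|
  puts "#{label}: #{hits}/#{total}"
end
```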

Finding 2: Clarity is a degrading hidden variable

  • Clarity decreases over the course of a sigil in 474/485 sigils (97.7%) and never increases
  • Per-iteration: 38.3% of iterations decrease clarity, 0% increase it, mean -1.17
  • Refreshes degrade clarity 25x faster than actions (-2.52 vs -0.1 per iteration)
  • Weak positive correlation: clarity 70-79 → 6.7 avg gain; clarity 90-99 → 7.3 avg gain
  • Starting clarity: range 88-99, mean 96.3 (no signal from starting clarity binning)
  • Implication: Refreshes have a hidden cost: they degrade clarity much faster than actions. This reinforces the value of minimizing refreshes (already our strategy). Unclear if clarity directly affects game outcomes or is cosmetic.

Finding 3: Resource level directly affects gain per iteration

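| Total Stars | N | Avg Gain | Median | Zero% |
| --- | --- | --- | --- | --- |
| 0-5 | 3 | 5.0 | 4 | 33.3% |
| 11-15 | 17 | 4.8 | 5 | 5.9% |
| 16-20 | 92 | 6.9 | 6 | 0.0% |
| 21-30 | 626 | 6.5 | 5 | 0.0% |
| 31+ | 1831 | 7.5 | 7 | 0.1% |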

Per-resource: consuming a resource at level 3-5 gives ~3-5 gain; at level 10+ gives ~7-8 gain. Danger shows no signal (gain flat across 0-17). Resource depletion doesn't just limit iterations; it also reduces per-iteration effectiveness. The resource exhaustion coefficient (1.75) may understate the impact because it doesn't account for diminishing returns at low resource levels.

Finding 4: Skip threshold should be 15 for target 90 (INVALIDATED; see SIM-7 below)

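| Start Precision | Count | Avg Final | Max | >=80 | >=85 | Scribed |
| --- | --- | --- | --- | --- | --- | --- |
| 13 | 209 | 50.9 | 83 | 1 | 0 | 0 |
| 14 | 162 | 51.3 | 78 | 0 | 0 | 0 |
| 15 | 194 | 53.5 | 85 | 6 | 1 | 1 |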

All 566 worked sigils start at 13-15 (current threshold skips <13). Start=13 and start=14 never reach 85 in 371 attempts. Raising threshold to <15 saves 3,851 iterations (371 sigils × ~10.4 avg iters) with zero lost scribes. Only start=15 has any scribe potential. This should be part of EXP-13 or a standalone micro-experiment.

CORRECTION (Feb 2026): This analysis used combined v1.5.15+v1.5.17 data (566 worked). The v1.5.15 data was collected WITHOUT Awakened technique. SIM-7 (below), using v1.5.17 data only (555 worked, WITH Awakened), shows ALL 3 scribes started at precision 13. Awakened provides enough of a boost that start=13 CAN reach 90. The pre-Awakened data diluted this effect in the combined dataset. Skip <15 is DEAD for post-Awakened testing. The current skip <13 threshold is correct.

Finding 5: Iteration cap prevents 15-37 potential scribes per 566 worked

181 sigils (32% of worked) were stopped by the move budget check. Monte Carlo projection (5,000 simulations each, using observed gain distribution):

  • 37 of 181 (20%) have >50% probability of reaching 85 with more iterations
  • 15 of 181 (8%) have >50% probability of reaching 90
  • Several sigils at 70+ with 10-15 remaining stars have 55-72% chance of reaching 90
  • Confirms EXP-13 (remove cap) as highest-impact change

Finding 6: No infinite loop risk without cap

  • 76% of refreshes produce a viable action on the next iteration
  • Refresh streaks: mean 1.2, max 6. Only 3.9% are 3+ consecutive, 0.2% are 5+.
  • Resource drain provides natural termination: refreshes cost 0 resources but +1 danger, and resource exhaustion check catches depleted sigils
  • Safe to remove iteration cap with resource-based exit as primary guard
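
Refresh streak statistics of this kind can be derived from a per-iteration event sequence. The sketch below assumes each iteration is recorded as :action or :refresh; the method name and the sample sequence are hypothetical.

```ruby
# Lengths of each run of consecutive refreshes in an event sequence.
def sketch_refresh_streaks(events)
  events.chunk_while { |previous, current| previous == current }
        .select { |run| run.first == :refresh }
        .map(&:length)
end

# Illustrative sequence, not real log data.
SKETCH_EVENTS = %i[action refresh refresh action refresh action action refresh refresh refresh].freeze

streaks = sketch_refresh_streaks(SKETCH_EVENTS)
mean = streaks.sum.to_f / streaks.length
puts "streaks #{streaks.inspect}, mean #{mean.round(2)}, max #{streaks.max}"
```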

Finding 7: <= 80 guard โ€” minor impact at current volumes

6 sigils reached 80-84 but couldn't scribe (dead ends). However, all 6 had viable resource projections when crossing 80: they died to mishaps (4/6) or iteration cap. The resource check with <= 80 guard wouldn't have caught any of them earlier. 11 wasted iterations above 80 with zero gain. Low priority fix: mishaps, not the guard, are the primary cause of dead ends at 80+.

Queue updates from audit (updated post-EXP-13 results):

| Priority | Change | Experiment | Status |
| --- | --- | --- | --- |
| 1 | Remove iteration cap + move budget | EXP-13 | REVERTED: mishap rate 63.4%, 0 scribes |
| 2 | Raise skip threshold to <15 for target 90 | Was in EXP-13 | REVERTED as bundle: data valid but must be isolated |
| 3 | Equalize costs | EXP-14 | REVERTED: mishap rate 74.8%, 0 real scribes |
| 4 | Resource-aware action selection (prefer full resources) | Future | Untested |
| 5 | Monitor clarity degradation | Observational | Finding 2 above: refreshes degrade 25x faster |

Code-Level Assumption Audit (v1.5.19, Feb 2026)

Systematic line-by-line review of every hardcoded value, threshold, and decision point in the algorithm. Each entry identifies the assumption, its test status, and whether it can be isolated for experimentation.

A. Magic Numbers & Thresholds

ID Line Value Assumption Status
A1 264 1.75 Resource stars → precision conversion coefficient EXP-9 (kept). Sub-assumption: all 3 resources are fungible - untested
A2 162,270,283 target - 5 Minimum useful scribe precision / bail-out margin Game mechanic? Untested whether -3 or -7 is better
A3 323 < 15 Skip threshold for target 90 Data-confirmed (0 scribes from start<15). Part of EXP-13 revert - retest standalone
A4 328 < 13 Skip threshold for target 80 EXP-10 (kept)
A5 290 < 2 Max 2 aspect repairs per sigil Original design. Never tested.
A6 669 precision - 15 Only repair when within 15 of target Never tested.
A7 666 <= 3 Only trivial/straightforward/formidable repairs Never tested.
A8 668 >= 2 Repair margin requirement (stricter than precision's > 0) Never tested.
A9 291 <= 18 Don't repair when danger > 18 Near max, rarely reached
A10 379 >= 14 Trader luck threshold (guild-specific) Never tested. Hard to isolate
A11 46 1,2,3,4,5 Difficulty ordinal values EXP-6 (confirmed)
A12 43 1,1,1 Cost labels equalized EXP-14 (reverted). Data-confirmed but harmful - old {1,2,3} provided useful implicit diversification

B. Algorithm Decision Logic

ID Line Logic Assumption Status
B1 648-661 margin > 1 any, margin > 0 for challenging+ Viability cutoffs EXP-12 (margin>=0 reverted twice, +6.6pp mishap)
B2 233 Prefer highest difficulty Max difficulty → max gain EXP-7 (kept, confirmed)
B3 226 Skip ACTION verb ACTION has 24.8% zero-gain rate EXP-6 (confirmed)
B4 200-213 Repair when stat - difficulty < 2 AND difficulty >= 3 Pre-scan threshold for repair candidates Untested - the difficulty >= 3 filter means we never repair for trivial/straightforward
B5 301-303 Refresh when no action available Only alternative is quitting the sigil Game mechanic - but refreshes have hidden clarity cost (Finding 2)

C. Known Bug

ID Line Bug Impact
C1 162 SCRIBED misclassification ACTIVE - 50% of all "SCRIBED" results across all versions are fakes. After loop exit (including mishaps), any sigil with precision >= target-5 is classified as SCRIBED even if no scribing occurred. Confirmed from raw logs: 10 of 20 "SCRIBED" results have zero "You carefully scribe" game messages. Additionally, the analyzer (line 566) inherits the bug by assigning stop_reason=:scribed based on the script's result field. CRITICAL FIX: (1) Script: track @actually_scribed flag, (2) Analyzer: use scribe_count > 0 not result field.

D. Untested Game Mechanics

ID Question Current Position Measurable From Logs?
D1 Does danger affect mishap probability? Data says no (uniform 0-18) Yes - measured in Finding 3
D2 Does clarity affect precision gain? Weak signal (+0.6 from 70-79 to 90-99) Yes - measured in Finding 2
D3 Resource consumption per difficulty ~4.2 stars per action (Finding 1) Partially - need delta analysis per difficulty level
D4 Do refreshes cost resources? Finding 1 says 0 resources, +1 danger Yes - measured
D5 Is there a game iteration soft cap? No evidence (max seen: 18) Observational only
D6 Does repair restore the target resource? Game text implies yes Measurable from resource snapshots
D7 repair_count tracking Initialized but never incremented in analyzer Code gap - parser never counts repairs

E. Isolatable Experiments by Priority (post-EXP-13 revert)

Original priority list - superseded by simulation results below.

Priority What to test Which assumption How to isolate
1 Skip threshold <15 standalone A3 KILLED by SIM-7 - all scribes start at 13
2 Repair count limit (remove cap of 2) A5 BLOCKED - SIM-4 inconclusive, need D7 fix first
3 Repair proximity (15 → 20 or remove) A6 KILLED by SIM-8 - no sigils exhaust near target
4 Resource-specific projection (not sum-all) A1 sub SIM-3 validated fungibility - deprioritized
5 Fix SCRIBED misclassification C1 Add @actually_scribed flag - promoted to Phase 0
6 Resource consumption per difficulty level D3 SIM-2 validated flat rate - resolved, no experiment
NEW Resource bail-out threshold A1 SIM-3 finding: 38.5 precision wasted per sigil

See "Simulation-Based Testing Chronology" below for the updated experiment sequence.

Simulation-Based Hypothesis Validation (Feb 2026)

Eight simulations run against v1.5.17 data (22 sessions, 555 worked sigils, 3 scribed) to classify each hypothesis from the Code-Level Assumption Audit as:

  • Validated by logs: answered from existing data, no live experiment needed
  • Killed by simulation: simulation shows the hypothesis has no impact
  • Informs experiment: simulation guides how to design a live experiment
  • Inconclusive: data gap prevents reliable simulation

Script: scratchpad/assumption_simulations.rb

SIM-1: SCRIBED Misclassification Bug (C1) - CORRECTED: BUG IS ACTIVE

Metric SIM-1 result Corrected (raw log audit)
Total SCRIBED results 3 3
True scribes (scribe_count > 0) 3 1 (Fidon, 4 scrolls)
Misclassified (scribe_count = 0) 0 2 (Byd, Refia)

CORRECTION: SIM-1 reported 0 misclassifications because the analyzer's determine_stop_reason (line 566) assigns :scribed for ANY result=SCRIBED, inheriting the script's buggy classification. The simulation checked stop_reason != :scribed, which can never detect C1 because the analyzer trusts the script's result field.

A raw log audit using scribe_count (from actual "You carefully scribe" messages) reveals 2 of 3 v1.5.17 "SCRIBED" results are C1 fakes:

  • Byd #96: precision 85, "Sigil harvesting failed" (mishap), 0 scrolls produced
  • Refia #84: precision 88, "all traces of the sigil have vanished", 0 scrolls
  • Fidon #37: precision 86, "Final precision: 86, scribing", 4 scrolls (REAL)

Across ALL versions: 20 result=SCRIBED, 10 real, 10 C1 fakes. 50% misclassification rate. The bug is NOT dormant; it actively inflates scribe counts.

Classification: CRITICAL bug fix - actively corrupting data. Phase 0 priority.
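The difference between the buggy and corrected classification can be sketched as below. This is an illustration only, not the code in sigilharvest.lic or the analyzer: the method name `classify_result` and the `:lost_near_target` / `:not_scribed` labels are hypothetical, and the target - 5 margin is taken from the audit table.

```ruby
# Hypothetical sketch of the C1 fix: classify a finished sigil from the
# number of scribe messages actually seen, not from its final precision.
def classify_result(precision:, target:, scribe_count:)
  # At least one "You carefully scribe" message was observed.
  return :scribed if scribe_count > 0
  # The pre-fix logic reported SCRIBED for this case even with zero scrolls.
  return :lost_near_target if precision >= target - 5

  :not_scribed
end

puts classify_result(precision: 85, target: 90, scribe_count: 0) # Byd #96
puts classify_result(precision: 86, target: 90, scribe_count: 4) # Fidon #37
```

Keying the result on scribe_count means a mishap or vanish at 85+ can no longer be reported as a scribe.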

SIM-2: Resource Consumption Per Difficulty (D3)

Iteration Range N Avg cost/iter Total avg consumed
1-5 22 2.99 12.9
6-10 184 2.26 19.3
11-14 347 2.14 25.5
15 2 2.67 40.0

Resource consumption rate is roughly flat at ~2.1-3.0 stars/iter regardless of how deep into a sigil we are. Higher initial rate (1-5 iters) likely reflects higher starting resources enabling more expensive actions early. The per-action cost of ~4.2 stars (Finding 1) is consistent.

Classification: Validated by logs - no experiment needed.

SIM-3: Resource Fungibility (A1)

Resource Avg at exit Median Zero%
sanity 7.4 7 1.1%
resolve 7.2 7 2.0%
focus 7.4 7 1.1%
  • Imbalanced exits (one resource=0, another>=3): 22 (4.0%)
  • Balanced exits: 533
  • Avg remaining stars at exit: 22.0
  • 22.0 × 1.75 = 38.5 projected precision wasted per sigil

Resources deplete EVENLY, confirming the sum-all projection is valid. But the MAJOR finding is that sigils exit with 22 stars remaining on average. This means the bail-out formula (resource projection coefficient 1.75, line 264) is TOO AGGRESSIVE: it triggers the resource exhaustion exit while significant resources remain, leaving 38.5 projected precision on the table per sigil.

Classification: Validated (fungible) + MAJOR FINDING - bail-out aggressiveness is the highest-impact tuning target. See Testing Chronology Phase 2.

SIM-4: Repair Cap (A5)

The repair_count field is never populated in the analyzer (code gap D7). The simulation estimated repairs by counting action menu items where aspect = resource name. This methodology is FLAWED: it counts all menu items offered, not algorithm-selected repairs. Results (0-38 "repairs" per sigil, 99.8% "at cap") are misleading.

Classification: Inconclusive - must fix D7 first, collect 10+ sessions with real repair_count tracking, then re-simulate.

SIM-5: Fate of High-Precision Sigils (A2/C1)

Sigil Start Peak Final Result Iters Danger
#96 13 85 85 SCRIBED 12 17
#37 13 86 86 SCRIBED 14 18
#84 13 88 88 SCRIBED 13 18

3 sigils ever reached precision 85+ in 555 worked. The analyzer reports all 3 as "scribed" but raw log audit (SIM-1 correction) reveals only 1 actually scribed:

Sigil Start Peak Final Real? Actual outcome
#96 (Byd) 13 85 85 FAKE Mishap at 85 ("Sigil harvesting failed")
#37 (Fidon) 13 86 86 REAL Scribed, 4 scrolls produced
#84 (Refia) 13 88 88 FAKE Sigil vanished at 88

2 of 3 sigils that reached 85+ were LOST to mishap/vanish. Only 1 successfully scribed. The C1 bug window is NOT narrow: it's hitting 67% of 85+ sigils in our data.

All 3 started at precision 13 with 12-14 iterations and danger 17-18 at peak.

Classification: Partially validated, C1 impact severe - reaching 85+ does not guarantee scribing. The high danger (17-18) at 85+ means significant mishap risk remains.

SIM-6: Refresh Cost (D4)

Metric With refreshes (N=339) Without refreshes (N=216)
Avg refreshes per sigil 1.6 0
Resource cost/iter 2.08 stars 2.44 stars
Danger/iter 1.08 0.77

Refreshes consume 0 resources (lower per-iter cost because refresh iterations don't drain resources) but add danger (+0.31 danger/iter compared to non-refresh sigils). This confirms Finding 2 (refreshes have hidden cost through clarity/danger accumulation). No algorithmic change indicated; we already minimize refreshes.

Classification: Validated - no experiment needed.

SIM-7: Skip Threshold Sensitivity (A3) - CRITICAL FINDING

Threshold Work Skip >=80 Scribes Lost >=80 Lost scribes Iters saved
Skip <12 555 0 14 3 0 0 0
Skip <13 555 0 14 3 0 0 0
Skip <14 342 213 8 0 6 3 2259
Skip <15 183 372 7 0 7 3 3899
Skip <16 1 554 0 0 14 3 5818

ALL 3 scribes started at precision 13. Any skip threshold above <13 eliminates ALL scribes from the v1.5.17 dataset. Skip <14 also loses 6 of 14 >=80 sigils. Skip <15 (the threshold from EXP-13) loses 7 of 14 >=80 sigils AND all 3 scribes.

This INVALIDATES Finding 4 for post-Awakened testing. The original analysis used combined v1.5.15+v1.5.17 data where the pre-Awakened v1.5.15 data showed 0 scribes from start=13. With Awakened active, the precision boost is sufficient for start=13 sigils to reach 90. The current skip <13 threshold is correct and MUST NOT be raised.

Classification: KILLED - skip <15 hypothesis is dead. Current <13 is optimal.
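A sensitivity sweep like the one tabulated above can be sketched in a few lines. This is an assumption-laden illustration rather than the code in scratchpad/assumption_simulations.rb: the `Sigil` struct, the `skip_sweep` method, and the five-sigil sample are hypothetical toy data, not the v1.5.17 dataset.

```ruby
# Hypothetical record of one worked sigil (toy shape, not the analyzer's).
Sigil = Struct.new(:start_precision, :scribed, keyword_init: true)

# For each candidate threshold, count what would be worked, skipped,
# and how many real scribes would be lost by skipping.
def skip_sweep(sigils, thresholds)
  thresholds.map do |threshold|
    worked, skipped = sigils.partition { |s| s.start_precision >= threshold }
    { threshold: threshold,
      worked: worked.size,
      skipped: skipped.size,
      lost_scribes: skipped.count(&:scribed) }
  end
end

# Toy data only: one scribe that started at precision 13.
sample = [
  Sigil.new(start_precision: 13, scribed: true),
  Sigil.new(start_precision: 13, scribed: false),
  Sigil.new(start_precision: 14, scribed: false),
  Sigil.new(start_precision: 15, scribed: false),
  Sigil.new(start_precision: 12, scribed: false)
]

skip_sweep(sample, [13, 14, 15]).each { |row| puts row }
```

In the toy data, raising the threshold from 13 to 14 already drops the only scribe, which mirrors the pattern SIM-7 reports.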

SIM-8: Repair Proximity Threshold (A6)

Distance from target Count
0-5 (near target) 0
6-15 (in repair range) 0
16-30 (outside repair range) 2
31+ (far from target) 65

All 67 resource-exhausted sigils were 16+ precision from target. Zero were in the 6-15 range where the repair proximity threshold operates. The threshold is irrelevant because resource exhaustion only hits sigils far from target; sigils near target have been efficiently progressing and don't exhaust resources.

Classification: KILLED - widening/removing threshold has zero impact.

Hypothesis Classification Summary

ID Hypothesis SIM Classification Action
C1 SCRIBED misclassification bug SIM-1, SIM-5 ACTIVE - 50% misclass rate CRITICAL bug fix (Phase 0)
D3 Resource consumption varies by difficulty SIM-2 Flat ~2.1-3.0 stars/iter Validated - no experiment
A1 Resources are fungible (sum-all valid) SIM-3 Even depletion (4% imbalanced) Validated - no experiment
A1-sub Bail-out threshold too aggressive SIM-3 38.5 precision wasted/sigil Experiment (Phase 2)
A5 Repair cap of 2 is binding SIM-4 Estimation method flawed Inconclusive - fix D7 first
A2 target-5 scribe margin SIM-5 85+ does NOT guarantee scribe (1/3 real) Needs investigation
D4 Refreshes cost resources SIM-6 0 resources, +0.31 danger/iter Validated - no experiment
A3 Skip <15 for target 90 SIM-7 All 3 scribes start at 13 KILLED - stay at <13
A6 Repair proximity threshold matters SIM-8 0 sigils exhaust near target KILLED - no experiment

Score: 5 validated by logs, 2 killed by simulation, 1 informs experiment, 1 inconclusive.

Simulation-Based Testing Chronology (Feb 2026)

Ordered experiment sequence informed by simulation results. Each phase depends on the previous phase being complete.

Phase 0: Infrastructure & Bug Fixes - DONE (v1.5.20 + v1.5.21)

Item What Status
Revert EXP-13 Restore v1.5.17 algorithm as baseline Done (v1.5.20)
Fix C1 Add @actually_scribed flag Done (v1.5.20) - validated: 0 fake SCRIBEDs
Fix D7 Increment repair_count in analyzer parser Done (v1.5.20) - code correct, 0 repairs in sample
Fix EXP-9 omission Restore resource exhaustion check Done (v1.5.21) - was accidentally dropped in v1.5.20

v1.5.20 deployed and tested (11 sessions). Discovered missing EXP-9 resource check (0% resource_exhausted vs 12.1% baseline). Fixed in v1.5.21.

Phase 1: Complete EXP-14 Analysis - DONE, REVERT

  • Collected 11 v1.5.19 sessions (all 11 characters)
  • Ran on EXP-13 code base (not rebased; both EXP-13 and EXP-14 now REVERT)
  • Results: mishap rate 74.8% (+11.4pp vs EXP-13 base), 0 real scribes, no metric improvement
  • Cost equalization removed cost penalty for dangerous actions → more mishaps
  • See EXP-14 detailed results above

Phase 1.5: Corrected Baseline (v1.5.21) - DONE, CONFIRMED

  • 11 sessions, all metrics match v1.5.17 within normal variance
  • >=80/session: 0.64 (exact match), mishap: 40.8% (vs 42.0%), resource_exhausted: 13.4% (restored)
  • Key finding: 4 of 7 >=80 sigils stopped by move budget at 80-84 with 1 iter remaining
  • C1 validated (0 fakes), D7 code correct (0 repairs observed, genuinely rare)
  • Corrected baseline established. Ready for Phase 2.

Phase 2a: EXP-15 (v1.5.22) - DONE, REVERT

Change: (14 - @num_iterations) → (15 - @num_iterations) in move budget formula.

Metric v1.5.21 (baseline) v1.5.22 (EXP-15) Delta
Worked 262 259 -3
>=80 7 (0.64/sess) 6 (0.55/sess) -1
Scribes 0 0 0
Mishap rate 40.8% 52.5% +11.7pp (Z=2.67)
moves_exhausted 104 (39.7%) 47 (18.1%) -21.6pp
iteration_cap 0 (0%) 4 (1.5%) +1.5pp
resource_exhausted 35 (13.4%) 37 (14.3%) +0.9pp
sigil_vanished 16 (6.1%) 32 (12.4%) +6.3pp

The formula change freed ~57 sigils from budget exits. Of those: ~29 mishapped, ~16 vanished, 4 reached iteration cap (3 at precision 83, gap=2 from scribe). The old off-by-one was functioning as a safety guardrail: extending sigils costs more mishaps than it gains.

moves_exhausted distribution shifted rightward by 1 iteration (each cohort got 1 more iter):

  • v1.5.21: iter 10(5), 11(28), 12(42), 13(23), 14(6)
  • v1.5.22: iter 12(8), 13(23), 14(16)

Lesson: Don't extend sigils deeper into the danger zone. The bottleneck is mishap rate at high iterations (24% at iter 11, 20% at iter 12), not the budget formula.

Phase 2b: EXP-16 (v1.5.23) - DONE, REVERT

Change: resource exhaustion coefficient 1.75 → 1.5 in sigil_info.

Results: Total wipeout - 0 worked sigils, 1718 skipped (100%), 0 scribes.

The coefficient 1.5 is mathematically impossible for target 90:

  • Max starting resources: 15 + 15 + 15 = 45 stars
  • Available at coeff 1.5: 45 × 1.5 + precision = 67.5 + precision
  • Threshold: target - 5 = 85
  • Need: 67.5 + precision ≥ 85 → precision ≥ 17.5, so a starting precision of 18 or higher is required
  • Starting precision is almost never 18+, so ALL sigils bail on iteration 0

At coeff 1.75: 45 × 1.75 + 13 = 91.75 ≥ 85, which passes. Minimum viable coefficient: (85 - 13) / 45 = 1.6

Of 1718 total sigils: 1388 skipped by "below 13" threshold, 330 passed it but immediately hit the resource exhaustion exit. The resource check is evaluated on iteration 0 with full resources: at 1.5, even full resources + precision 13 gives only 80.5, below the 85 threshold.

Post-mortem: This was a calculation error in experiment design. The coefficient determines the minimum starting precision at full resources. The relationship should have been checked: (target - 5 - min_starting_precision) / max_resources = (85 - 13) / 45 = 1.6. Any coefficient below 1.6 makes it impossible for precision-13 sigils (the skip threshold) to even start. The 1.75 coefficient already provides minimal headroom (91.75 vs 85 threshold). Future coefficient experiments should target 1.65-1.70 range, not below 1.6.

Lesson: Always verify the boundary condition: can a sigil at the skip threshold (precision 13) with max resources (45 stars) pass the resource check? If not, the coefficient is too low.
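The boundary condition can be checked mechanically before any coefficient experiment is deployed. The sketch below is a standalone illustration, not code from sigilharvest.lic: the method names `resource_check_passes?` and `min_viable_coefficient` are hypothetical, while the constants (target 90, margin 5, skip threshold 13, 45 max stars) come from the analysis above.

```ruby
# Mirrors the bail-out rule quoted in this section:
# a sigil continues only if stars * coefficient + precision >= target - margin.
def resource_check_passes?(precision:, stars:, coefficient:, target: 90, margin: 5)
  stars * coefficient + precision >= target - margin
end

# Smallest coefficient at which a sigil at the skip threshold, holding
# maximum resources, still passes the check on iteration 0.
def min_viable_coefficient(min_precision:, max_stars:, target: 90, margin: 5)
  (target - margin - min_precision).to_f / max_stars
end

[1.5, 1.65, 1.75].each do |coefficient|
  ok = resource_check_passes?(precision: 13, stars: 45, coefficient: coefficient)
  puts "coefficient #{coefficient}: #{ok ? 'passes' : 'bails on iteration 0'}"
end
puts min_viable_coefficient(min_precision: 13, max_stars: 45)
```

Running a check like this against each proposed coefficient would have flagged 1.5 as unworkable before the sessions were run.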

Phase 2c: EXP-14 Retest - Clean Cost Equalization (v1.5.24) - DONE, KEPT

Change: @action_cost from { taxing: 1, disrupting: 2, destroying: 3 } to { taxing: 1, disrupting: 1, destroying: 1 }. Tested against confirmed v1.5.21 baseline.

Metric v1.5.21 (baseline) v1.5.24 (retest) Delta
Worked 262 263 +1
Worked/session 23.8 23.9 +0.1
Scribed (real) 0 2 +2
>=80 7 3 -4 (Fisher p=0.22, n.s.)
Mishap rate 40.8% 39.2% -1.6pp (Z=0.39, p=0.70, n.s.)
moves_exhausted 104 (39.7%) 98 (37.3%) -2.4pp
resource_exhausted 35 (13.4%) 32 (12.2%) -1.2pp
sigil_vanished 16 (6.1%) 28 (10.6%) +4.5pp (Z=1.88, p=0.06, marginal)
Avg gain/iter 7.2 7.3 +0.1
Danger at mishap 7.8 7.1 -0.7

Scribes (both C1-validated, 4 scrolls each):

  • Barrask #87: prec 92/90, 11 iters, danger 11, start 15 (efficient, low danger)
  • Refia #62: prec 86/90, 15 iters, danger 18, start 14

>=80 detail (3 total, 2 scribed):

  • Barrask #87: start 15 → 92, scribed (4 scrolls)
  • Refia #62: start 14 → 86, scribed (4 scrolls)
  • Refia #92: start 14 → 80, mishap at danger 11

Key findings:

  1. Mishap rate UNCHANGED (Z=0.39, p=0.70). The confounded EXP-14 showed +11.4pp. The clean test shows -1.6pp (noise). The original "harmful" conclusion was wrong. The "accidental diversification" theory is refuted: equalizing costs has no measurable effect on mishap rate when tested against baseline code.
  2. 2 real scribes in 11 sessions, the best single-test result since Awakened technique. Small counts (not statistically significant), but directionally positive.
  3. sigil_vanished marginally up (p=0.06). Monitor but no action needed: not significant at p<0.05, and no code change would explain this (only cost tiebreaking changed).
  4. >=80 down 7→3 but not significant (Fisher p=0.22). Notably, 2/3 >=80 sigils scribed (67% conversion) vs 0/7 in baseline (0% conversion).
  5. Code is now correct: @action_cost accurately reflects that each label describes WHICH resource, not HOW MUCH. The {1,2,3} mapping was provably wrong.

Verdict: KEPT. Cost equalization is neutral. v1.5.24 becomes the new baseline.

Phase 2d: EXP-17 - Resource-Aware Tiebreaker (v1.5.25) - KEPT

Experiment selection analysis (v1.5.24 data, 2,570 iterations with parsed actions):

Four candidate experiments were evaluated for v1.5.25:

Option Change Mechanism Effect size Risk
A: Resource-aware tiebreaker When 2+ actions share highest difficulty, prefer action draining most-available resource Preserves scarce resources, extending productive iterations 9.4% of iterations (241/2,570 ties) Low - only changes tiebreaking
B: Resource coefficient 1.75→1.65 Lower bail-out threshold Retains ~1 more sigil per session ~4.5 precision points headroom Low but tiny effect
C: Danger-aware throttling Reduce difficulty when danger is high Reduce mishap rate 68% of mishaps at danger <10 - weak signal Medium - requires model of mishap function
D: Move budget 13→11 Lower precision/move coefficient Bail fewer sigils Wrong direction - bails MORE sigils N/A - excluded

Why Option A (resource-aware tiebreaker):

  1. Measurable frequency: Fires in 9.4% of iterations (241 ties out of 2,570). That's ~22 tiebreaker decisions per session, or ~241 per 11-session test, enough to detect an effect.

  2. 100% heterogeneous cost profiles: Every observed tie involves actions that drain DIFFERENT resources (e.g., one taxing/resolve, one destroying/sanity). This means every tie offers a real choice, so the tiebreaker always has a meaningful preference to express.

  3. Tie distribution (balanced across all resource pairs):

    • taxing/destroying (resolve vs sanity): ~33%
    • disrupting/taxing (focus vs resolve): ~33%
    • disrupting/destroying (focus vs sanity): ~33%
    • 3-way ties: <1%
  4. Direct mechanism: Resource conservation extends the productive phase. When resources are asymmetric (e.g., sanity=12, focus=5, resolve=8), draining the abundant resource (sanity) instead of the scarce one (focus) avoids hitting the resource exhaustion bail-out prematurely. The bail-out check uses (sanity + resolve + focus) * 1.75 + precision < 85, so preserving total resource pool matters.

  5. Low risk: Only fires when two actions are already tied on difficulty (same expected gain) and cost (same impact weight). The change never overrides the primary selection criterion (highest difficulty) or the secondary (lowest cost). It only resolves what was previously an arbitrary first-encountered-wins tie.

Implementation (lines 243-258 of sigilharvest.lic):

The existing action selection has two levels:

  1. Prefer highest difficulty (EXP-7, determines precision gain)
  2. Break ties by lowest cost/impact (conserve resources)

With cost equalization (EXP-14 retest, all costs = 1), level 2 never fires. EXP-17 adds a third level: when difficulty AND cost are tied, prefer the action whose resource label corresponds to the highest current resource level (contest_stat_for).

# Level 3 tiebreaker (EXP-17):
elsif x['difficulty'] == sigil_action['difficulty'] && x['impact'] == sigil_action['impact']
  if contest_stat_for(x['resource']) > contest_stat_for(sigil_action['resource'])
    sigil_action = x
  end

Resource mapping: contest_stat_for('sanity') → @sanity_lvl, 'resolve' → @resolve_lvl, 'focus' → @focus_lvl. These are already parsed from the game's star display each iteration.
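The three-level preference order can also be expressed as a single comparison key. The sketch below is a standalone illustration and not the loop in sigilharvest.lic: the `pick_action` method and the `levels` hash are hypothetical, and only the 'difficulty', 'impact', and 'resource' keys are taken from the excerpt above.

```ruby
# Sketch of the three-level preference order:
#   1. highest difficulty, 2. lowest impact (cost), 3. most-available resource.
# `levels` maps a resource name to its current star count (hypothetical shape).
def pick_action(actions, levels)
  actions.max_by do |action|
    [action['difficulty'], -action['impact'], levels.fetch(action['resource'])]
  end
end

levels = { 'sanity' => 12, 'focus' => 5, 'resolve' => 8 }
actions = [
  { 'difficulty' => 5, 'impact' => 1, 'resource' => 'focus' },
  { 'difficulty' => 5, 'impact' => 1, 'resource' => 'sanity' },
  { 'difficulty' => 3, 'impact' => 1, 'resource' => 'resolve' }
]

puts pick_action(actions, levels)['resource']
```

With the asymmetric levels from the example in point 4 (sanity=12, focus=5, resolve=8), the two difficulty-5 actions tie on difficulty and cost, so the third level selects the one that drains sanity.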

Why not the other options:

  • B (coefficient): At 1.75, the formula gives 45 × 1.75 + 13 = 91.75 vs threshold 85. Changing to 1.65 gives 45 × 1.65 + 13 = 87.25, which is only 4.5 points less headroom (2.25 points above the threshold instead of 6.75). The effect is too small to measure reliably in 11 sessions.
  • C (danger-aware): 68% of mishaps occur at danger <10, suggesting danger doesn't strongly predict mishap probability. Without a validated mishap model, any throttling rule is speculative. Needs more data analysis before experimenting.
  • D (move budget): Lowering the coefficient from 13 to 11 means the formula bails MORE sigils (declares them hopeless earlier). This shrinks the candidate pool, which is the wrong direction.

EXP-17 Results (12 sessions, 11 characters, Shard/permutation/target=90/60min):

  • Sessions: 12 (11 complete, 1 incomplete: Kythkani fragment, 322 lines)
  • Logs: ~/SH_logs/v1.5.25/
  • Baseline: v1.5.24 (EXP-14 retest, 11 sessions)
  • C1 audit: 1 real scribe (Byd, 4 scrolls, precision 87). No C1 misclassifications.
Metric v1.5.24 (baseline) v1.5.25 (EXP-17) Delta
Sessions 11 12 +1
Worked 263 254 -9
Scribed 2 1 -1
Mishap rate (per sigil) 39.2% 33.9% -5.3pp
Mishap rate (per iter) 3.8% 3.2% -0.6pp
Avg gain/productive iter 7.3 7.1 -0.2
Avg iters/sigil 10.3 10.6 +0.3
Refresh rate 9.5% 8.9% -0.6pp
Failed actions 116 136 +20
Stop Reason v1.5.24 v1.5.25 Delta
moves_exhausted 98 (37.3%) 117 (46.1%) +8.8pp
mishap 103 (39.2%) 86 (33.9%) -5.3pp
resource_exhausted 32 (12.2%) 27 (10.6%) -1.6pp
sigil_vanished 28 (10.6%) 23 (9.1%) -1.5pp
scribed 2 (0.8%) 1 (0.4%) -0.4pp

Analysis:

  1. Mishap rate directionally improved (39.2%→33.9% per sigil, 3.8%→3.2% per iter). Two-proportion z-test: z = -1.25, p ≈ 0.21, not statistically significant at p<0.05. Consistent with hypothesis but insufficient sample size to confirm.

  2. Stop-reason shift is mechanically coherent: fewer resource-exhausted exits (32→27) and fewer mishaps (103→86), with more moves_exhausted exits (98→117). Sigils survive longer (avg iters 10.3→10.6), dying to the move budget instead of resource depletion or mishaps. This is exactly what resource conservation should produce.

  3. Scribe count (2→1) is in the noise. We've seen 0-2 real scribes per 11-session run consistently across all versions. Not a meaningful signal.

  4. Failed actions increased (116→136). The tiebreaker may sometimes pick an action whose resource is abundant but has a higher failure rate. Worth monitoring but not alarming at this sample size.

  5. Precision gain marginally lower (7.3→7.1). Expected: the tiebreaker resolves ties that were previously arbitrary, sometimes choosing a slightly different action. The tradeoff is resource conservation vs marginal per-iteration gain.
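The significance figure in point 1 can be reproduced from the stop-reason counts (103 mishaps in 263 worked vs 86 in 254). The sketch below assumes a pooled two-proportion z-test with a two-sided p-value; the method name `two_proportion_z` is hypothetical and the analyzer may compute this differently.

```ruby
# Pooled two-proportion z-test (two-sided), standard library only.
# Returns [z, p_value]; z is negative when group B has the lower rate.
def two_proportion_z(hits_a, n_a, hits_b, n_b)
  rate_a = hits_a.to_f / n_a
  rate_b = hits_b.to_f / n_b
  pooled = (hits_a + hits_b).to_f / (n_a + n_b)
  std_err = Math.sqrt(pooled * (1 - pooled) * (1.0 / n_a + 1.0 / n_b))
  z = (rate_b - rate_a) / std_err
  # Two-sided tail probability of the standard normal distribution.
  p_value = Math.erfc(z.abs / Math.sqrt(2))
  [z, p_value]
end

# v1.5.24 baseline vs v1.5.25 (EXP-17) mishap counts per worked sigil.
z, p_value = two_proportion_z(103, 263, 86, 254)
puts format('z = %.2f, p = %.2f', z, p_value) # prints: z = -1.25, p = 0.21
```

The same helper applies to the other Z values quoted in this chronology, provided the underlying counts are used rather than rounded percentages.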

Verdict: KEPT. The tiebreaker is a zero-risk third-level selection rule that fires in ~9.4% of iterations. No degradation in any critical metric. Directional improvement in mishap rate and resource exhaustion. Mechanically coherent stop-reason shift. The change is too small to achieve significance in 11 sessions, but there is no signal of harm and the mechanism is sound. v1.5.25 becomes the new baseline.

v1.2.0 Baseline Comparison (Head-to-Head)

To validate the cumulative effect of all changes since the original script, an instrumented v1.2.0 baseline was created (sigilharvest-v120-baseline.lic) and run for 11 sessions under identical conditions (Shard/permutation/target=90/60min/Inspired+Enlightened+Illuminated+Awakened).

What the v1.2.0 baseline includes (instrumentation only, no algorithm impact):

  • C1 fix (@actually_scribed flag): required for accurate scribe counting
  • resolve_burin/get_burin/stow_burin: infrastructure parity
  • Difficulty fix (formidable=3, challenging=4, difficult=5): confirmed bug fix
  • Cost equalization ({1,2,3}→{1,1,1}): confirmed correct mapping

What the v1.2.0 baseline retains (original algorithm):

  • Risk-based action selection (low risk far from target, high risk near target)
  • ACTION verb accepted (not filtered)
  • Skip threshold <10 (not <13)
  • No iteration cap (removed per Urbaj's correction)
  • Original bail-out coefficients (2.25/15) with <=80 guards
  • No resource-aware tiebreaker

Logs: ~/SH_logs/v1.2.0/DR-*.log (11 sessions, Feb 4 2026)

Core Metrics

Metric v1.2.0 (original algo) v1.5.25 (current) Delta
Sessions 11 12 +1
Total sigils found 983 1,284 +301
Worked 414 254 -160
Skipped 569 (57.9%) 1,030 (80.2%) +22.3pp
Scribed 0 1 +1
>=80 precision 7 (1.7%) 6 (2.4%) +0.7pp
Avg precision 54 51 -3
Best precision 86 87 +1

Efficiency

Metric v1.2.0 v1.5.25 Delta
Avg iters/sigil 10.9 10.6 -0.3
Avg gain/productive iter 8.4 7.1 -1.3
Refresh rate 13.9% 8.9% -5.0pp
Productivity rate 45.1% 50.7% +5.6pp
Failed actions 172 136 -36
Repairs detected 2 0 -2

Mishap & Stop Reasons

Metric v1.2.0 v1.5.25 Delta
Mishap rate (per sigil) 50.7% 33.9% -16.8pp
Mishap rate (per iter) 4.6% 3.2% -1.4pp
Danger at mishap (avg) 8.6 8.1 -0.5
Stop Reason v1.2.0 v1.5.25
mishap 210 (50.7%) 86 (33.9%)
moves_exhausted 167 (40.3%) 117 (46.1%)
sigil_vanished 35 (8.5%) 23 (9.1%)
resource_exhausted 2 (0.5%) 27 (10.6%)
scribed 0 1 (0.4%)

Analysis

  1. v1.5.25 wins on quality, v1.2.0 wins on quantity. v1.2.0 works 63% more sigils (414 vs 254) because skip<10 attempts everything starting at precision 10+. But v1.5.25's skip<13 filters low-value sigils, yielding a higher 80+ rate (2.4% vs 1.7%) and the only scribe. v1.5.25 also finds more total sigils per session (107 vs 89) because it moves through rooms faster.

  2. Mishap rate is the dominant difference: 50.7% vs 33.9% per sigil, 4.6% vs 3.2% per iter. The v1.2.0 risk-based selection exposes sigils to more danger: picking low-difficulty actions early wastes iterations without reducing danger, then switching to high-difficulty late increases exposure at peak danger. v1.5.25's always-highest strategy is more efficient.

  3. v1.2.0 gets higher per-action gain (8.4 vs 7.1) but wastes more on refreshes (13.9% vs 8.9%). The risk-based selection picks high-difficulty near target (gain 13+) but low-difficulty far from target (gain 2-5), producing more refreshes when low-risk actions don't yield viable follow-ups. v1.5.25's constant difficulty preference is more consistent with fewer wasted iterations.

  4. Resource exhaustion check validates EXP-9. v1.2.0 uses the original 2.25 coefficient and sees only 2 exits (0.5%). v1.5.25's 1.75 coefficient catches 27 sigils (10.6%) that would burn resources without reaching target.

  5. C1 audit: The only SCRIBED result found in the v1.2.0 directory was a C1 fake from an old file (Saelia, Feb 1). Our 11 new sessions produced 0 real scribes.

Change Validation Summary

Every change from v1.2.0 to v1.5.25 is empirically validated:

Change Version Mechanism Measured Effect
Difficulty fix v1.5.10 Correct formidable ranking Eliminates wrong-action near target
ACTION verb filter v1.5.10 Skip zero-gain verb -120 wasted iters/10 sessions
Skip <13 v1.5.12 Filter low-value sigils +0.7pp 80+ rate, +301 sigils found
Difficulty-first selection v1.5.13 Always pick highest gain -5.0pp refresh rate, +5.6pp productivity
Resource coeff 2.25→1.75 v1.5.15 Earlier bail-out on hopeless Catches 10.6% vs 0.5% resource exits
Iteration cap v1.5.17 Limit mishap exposure -16.8pp mishap rate
C1 fix v1.5.20 Accurate scribe classification 50% of old SCRIBEDs were fakes (10 of 20)
Cost equalization v1.5.24 Correct resource mapping Neutral (labels != amount)
Resource tiebreaker v1.5.25 Preserve scarce resources Directional mishap improvement

Verdict: The original algorithm works harder but less efficiently. v1.5.25 works smarter: fewer sigils attempted, but each one has better odds, lower mishap exposure, and more accurate instrumentation.

D7 Fix: Repair Logging (v1.5.26) - Infrastructure - PHASE 3 CLOSED

Bug: The DRC.message('Executing aspect repair') log line was gated behind if @debug (line 321). Since production runs don't use debug mode, the analyzer's REPAIR_ACTION pattern (line 171 of sigilharvest_analyzer.rb) never matched anything. Result: repair_count was always 0, making Phase 3 repair analysis impossible.

Fix: Removed if @debug from the repair log message. One-line change. The analyzer already has the detection code; it just never found the pattern in non-debug logs.

v1.5.26 Results (11 sessions, 255 worked sigils, 2606 iterations):

Metric v1.5.26 v1.5.25 (baseline) Delta
Worked 255 254 +1
Scribed 1 (Fidon, 3 scrolls, prec=93) 1 (Byd, 4 scrolls, prec=87) -
Mishap/sigil 42.0% 33.9% +8.1pp (p=0.06, n.s.)
Mishap/iter 4.1% 3.2% +0.9pp
Productivity 51.1% 50.7% +0.4pp
Avg gain 7.18 7.09 +0.09
Reached 80+ 2.0% 2.4% -0.4pp
Repairs 0 0 0

No algorithm change. Mishap uptick is run-to-run variance (z=1.88, p=0.06). 3 combat_distracted exits (new stop reason, enemies in sigil rooms).

D7 Validation: Repairs are non-existent. Zero repairs in 255 worked sigils (2606 iterations). Grep confirms zero "Executing aspect repair" messages across all v1.5.26 logs.

Why repairs don't trigger: The repair path requires !sigil_action.key?("difficulty"), meaning no precision action is available. With difficulty-first selection (v1.5.20+), the algorithm virtually always finds a viable precision action. The 224 refreshes (8.6%) happen at the execution level (game RNG), not the selection level.

Historical comparison: v1.5.17 (risk-based selection, debug mode) triggered 5 repairs across ~300 worked sigils (~1.7% of sigils). All occurred at high precision (75-81) in late iterations (10-13). Results:

  • 2/5 succeeded: recovered a resource, no precision change
  • 3/5 caused mishaps: sigil destroyed (60% mishap rate on repairs)

Phase 3: CLOSED. Repairs are a non-factor with the current algorithm. They don't happen, and when they did (v1.5.17), they were actively harmful (60% mishap rate). No experiment needed.

Phase 4: CLOSED. The repair difficulty filter (requires difficulty >= 3) is moot: loosening it would allow more repairs, but repairs themselves are counterproductive.

Gain Optimization Analysis (Post-Phase 3)

With repairs closed, the next question: where do we get precision gains? Monte Carlo simulation (100k sigils per scenario) comparing gain and mishap levers.

Why v1.2.0 has higher avg gain (8.39 vs 7.18):

The gain-per-difficulty-level is identical between versions (trivial=2-3, difficult=13-14). The difference is in how often each difficulty level is selected:

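| Gain range (est. difficulty) | v1.2.0 | v1.5.26 | Delta |
|------------------------------|--------|---------|-------|
| 1-3 (trivial) | 4.5% | 25.8% | +21.3pp |
| 4-5 (straightforward) | 25.2% | 19.3% | -5.9pp |
| 6-8 (formidable) | 26.2% | 19.1% | -7.1pp |
| 9-11 (challenging) | 18.5% | 13.7% | -4.8pp |
| 12-16 (difficult) | 25.6% | 22.1% | -3.5pp |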

v1.5.26 produces 5.7x more trivial-range gains. Both algorithms pick "highest difficulty available," but v1.5.26's skip<13 threshold works more low-starting-precision sigils where the game may offer weaker action menus. The effective gain per iteration (avg gain × productivity) is similar: v1.2.0 = 3.78, v1.5.26 = 3.67. v1.2.0's higher per-action gain is partially offset by lower productivity (45.1% vs 51.1%).
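The effective-gain figures follow directly from the two reported metrics. A small sketch (the helper name `effective_gain` is hypothetical):

```ruby
# Effective gain per iteration: average gain per productive action
# multiplied by the share of iterations that are productive.
def effective_gain(avg_gain, productivity)
  (avg_gain * productivity).round(2)
end

puts effective_gain(8.39, 0.451) # v1.2.0  -> 3.78
puts effective_gain(7.18, 0.511) # v1.5.26 -> 3.67
```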

Gain optimization - scribe rate by avg gain:

| Scenario | Scribe% | >=80% | Multiplier |
|----------|---------|-------|------------|
| Current v1.5.26 (7.2) | 5.9% | 8.6% | 1.0x |
| +0.5 gain (7.7) | 8.4% | 11.8% | 1.4x |
| +1.0 gain (8.2) | 11.9% | 15.9% | 2.0x |
| v1.2.0 actual gains (8.4) | 12.4% | 16.9% | 2.1x |
| +1.5 gain (8.7) | 15.0% | 19.4% | 2.6x |
| +2.0 gain (9.2) | 18.7% | 23.5% | 3.2x |

Each +1.0 avg gain → ~2.0x scribe rate improvement.

Mishap reduction - scribe rate by mishap rate:

| Scenario | Scribe% | >=80% | Multiplier |
|----------|---------|-------|------------|
| Current (4.1%/iter) | 6.1% | 8.9% | 1.0x |
| -25% mishaps (3.1%/iter) | 7.0% | 10.0% | 1.15x |
| -50% mishaps (2.1%/iter) | 7.8% | 11.2% | 1.28x |
| -75% mishaps (1.0%/iter) | 9.0% | 12.9% | 1.48x |
| No mishaps (0%/iter) | 10.3% | 14.6% | 1.69x |

Even eliminating ALL mishaps gives only 1.69x. Halving mishaps gives 1.28x.

Combined analysis:

| Scenario | Scribe% | Multiplier |
|----------|---------|------------|
| Baseline | 5.8% | 1.0x |
| Gain +1.0 alone | 11.7% | 2.0x |
| Mishap -50% alone | 7.8% | 1.3x |
| Both: gain+1.0 & mishap-50% | 15.3% | 2.7x (super-additive) |
| v1.2.0 gains & no mishaps | 20.8% | 3.6x |

Conclusions:

  • Gain optimization is ~3.0x more impactful than mishap reduction
  • They stack super-additively (combined 2.7x vs additive 2.4x)
  • Priority: gain optimization first, mishap reduction second
  • Recovering v1.2.0's gain level (+1.2) without its mishap penalty is the ideal target
  • The mishap rate difference between v1.2.0 and v1.5.26 is small (4.6% vs 4.1%/iter), so the gain gap is not caused by risk tolerance; it's caused by action menu composition
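The super-additivity claim can be checked from the combined-analysis table. The sketch below recomputes the multipliers from the rounded scribe percentages, so the last digit can differ slightly from the multipliers listed in the table. The helper name `combine_levers` is hypothetical.

```ruby
# Compare additive and multiplicative combinations of two levers against
# the observed combined result. Inputs are scribe rates in percent.
def combine_levers(baseline:, gain_only:, mishap_only:, both:)
  gain_mult   = gain_only / baseline   # effect of the gain lever alone
  mishap_mult = mishap_only / baseline # effect of the mishap lever alone
  {
    additive:       (1 + (gain_mult - 1) + (mishap_mult - 1)).round(2),
    multiplicative: (gain_mult * mishap_mult).round(2),
    observed:       (both / baseline).round(2)
  }
end

result = combine_levers(baseline: 5.8, gain_only: 11.7, mishap_only: 7.8, both: 15.3)
puts result
```

The observed multiplier lands above the additive estimate and close to the multiplicative one, which is what "super-additive" means in the conclusions above.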

EXP-18: Minimum difficulty threshold (v1.5.27) - Complete, KEPT

  • Hypothesis: Trivial-difficulty (1) precision actions produce avg gain of 2.3, far below the 6.8-13.3 for formidable-difficult. v1.5.26 gets 25.8% trivial-range gains vs v1.2.0's 4.5%. Skipping trivial actions and refreshing for a better menu is +EV: the probability of getting a non-trivial action next iteration is ~74%, and the expected gain from that (74% * 8.0 = 5.9) greatly exceeds the trivial gain (2.3).
  • Change: Add return false if difficulty < 2 at the top of precision_action_viable? (line 693). When the highest-difficulty action in the menu is trivial, the algorithm will refresh (analyze the sigil) instead of taking the trivial action.
  • Sessions: 11 (all characters, Shard, permutation, target=90, 60min)
  • Logs: ~/SH_logs/v1.5.27/
  • Baseline: v1.5.26 (11 sessions)
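A minimal sketch of the EXP-18 guard and the decision it produces. The real precision_action_viable? has further checks that are not shown in this section, so the body below is a placeholder. The menu shape (array of hashes), the action names, and the helpers `choose_step` and `refresh_expected_gain` are assumptions for illustration.

```ruby
# Hypothetical stand-in for the viability check in sigilharvest.lic.
def precision_action_viable?(difficulty)
  return false if difficulty < 2 # EXP-18 guard: reject trivial actions

  true # placeholder for the remaining viability checks
end

# Choose the highest-difficulty action, or refresh when it is not viable.
def choose_step(menu)
  best = menu.max_by { |action| action[:difficulty] }
  return :refresh if best.nil? || !precision_action_viable?(best[:difficulty])

  best[:name]
end

# Expected gain of refreshing, using the numbers from the hypothesis.
def refresh_expected_gain(p_non_trivial: 0.74, avg_non_trivial_gain: 8.0)
  (p_non_trivial * avg_non_trivial_gain).round(1)
end

puts choose_step([{ name: "trace", difficulty: 1 }])
puts choose_step([{ name: "trace", difficulty: 1 }, { name: "bind", difficulty: 4 }])
puts refresh_expected_gain # compare with the trivial gain of 2.3
```

The expected-value comparison is a back-of-envelope figure; it does not price in the iteration that the refresh itself consumes.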

Results (11 sessions, 253 worked sigils, 2686 iterations):

| Metric | v1.5.27 | v1.5.26 | Delta | v1.2.0 |
|--------|---------|---------|-------|--------|
| Worked | 253 | 255 | -2 | 414 |
| Avg gain | 8.54 | 7.18 | +1.36 | 8.39 |
| Productivity | 43.9% | 51.1% | -7.2pp | 45.1% |
| Effective gain/iter | 3.75 | 3.67 | +0.08 | 3.78 |
| Mishap/sigil | 38.7% | 42.0% | -3.3pp (n.s. p=0.46) | 50.7% |
| Mishap/iter | 3.65% | 4.11% | -0.46pp | 4.64% |
| Reached 60+ | 32.0% | 25.1% | +6.9pp | 33.3% |
| Reached 70+ | 10.7% | 7.8% | +2.9pp | 10.6% |
| Reached 80+ | 2.0% | 2.0% | 0.0pp | 1.7% |
| Resource exhausted | 5.9% | 11.8% | -5.9pp | 0.5% |
| Refresh rate | 17.2% | 8.6% | +8.6pp | 13.9% |
| Repairs | 3 | 0 | +3 | 2 |
| Scribed | 1 (Barrask, 4 scrolls, prec=92) | 1 (Fidon, 3 scrolls, prec=93) | - | 0 |

Gain distribution by range:

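| Gain range | v1.5.27 | v1.5.26 | v1.2.0 |
|------------|---------|---------|--------|
| Trivial (1-3) | 3.6% | 25.8% | 4.5% |
| Straightfwd (4-5) | 24.9% | 19.3% | 25.2% |
| Formidable (6-8) | 24.9% | 19.1% | 26.2% |
| Challenging (9-11) | 20.0% | 13.7% | 18.5% |
| Difficult (12+) | 26.6% | 22.1% | 25.6% |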
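A distribution like the one above can be built by bucketing per-action gains into the same ranges. This is a sketch only: the analyzer's actual implementation is not shown here, the helper name `gain_distribution` is hypothetical, and the sample gains are made up.

```ruby
# Gain ranges as labelled in the table above.
GAIN_RANGES = {
  "Trivial (1-3)"      => (1..3),
  "Straightfwd (4-5)"  => (4..5),
  "Formidable (6-8)"   => (6..8),
  "Challenging (9-11)" => (9..11),
  "Difficult (12+)"    => (12..)
}.freeze

# Percentage of gains falling in each range, rounded to one decimal.
def gain_distribution(gains)
  return {} if gains.empty?

  GAIN_RANGES.transform_values do |range|
    (100.0 * gains.count { |g| range.cover?(g) } / gains.size).round(1)
  end
end

sample = [2, 4, 5, 7, 8, 10, 13, 14, 3, 6] # hypothetical gains, not log data
p gain_distribution(sample)
```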

Stop reasons:

| Reason | v1.5.27 | v1.5.26 | Delta |
|--------|---------|---------|-------|
| moves_exhausted | 45.8% | 39.2% | +6.6pp |
| mishap | 38.7% | 42.0% | -3.3pp |
| sigil_vanished | 9.1% | 5.5% | +3.6pp |
| resource_exhausted | 5.9% | 11.8% | -5.9pp |
| scribed | 0.4% | 0.4% | 0.0pp |

Analysis:

  1. Gain distribution transformed: Trivial-range gains dropped from 25.8% to 3.6%, almost exactly matching v1.2.0's 4.5%. All other brackets rebalanced proportionally. The v1.5.27 gain distribution is now essentially identical to v1.2.0's.

  2. Avg gain exceeded v1.2.0: 8.54 vs 8.39. The trivial filter plus difficulty-first selection produces slightly higher gains than v1.2.0's risk-based selection because difficulty-first more reliably picks the highest-difficulty action when one is available.

  3. Effective gain/iter improved: Despite 7.2pp lower productivity (more refreshes), the +1.36 avg gain more than compensates. Effective gain: 3.75 vs 3.67 (+0.08). Now nearly matches v1.2.0's 3.78.

  4. Resource exhaustion halved: 11.8% → 5.9%. Refreshes cost 0 resources, so more refreshing = less resource drain per iteration. This is a significant structural improvement: fewer sigils bail out due to resource depletion.

  5. 60+ rate +6.9pp: More sigils reaching high-precision tiers (32.0% vs 25.1%). Now close to v1.2.0's 33.3%.

  6. 3 repairs detected: The trivial filter creates conditions where no precision action passes viability, allowing the repair path to trigger. First repairs in v1.5.20+. Confirms D7 fix is working and repairs can happen when the filter is stricter.

  7. Min gain = 3: Confirms trivial-difficulty (gain 2-3) actions are being filtered. The remaining gain=3 entries are likely straightforward actions rolling low.

  8. moves_exhausted +6.6pp: More sigils reaching the move budget limit at higher avg precision (55.6 vs 53.3). These are sigils that got further but couldn't finish.

Verdict: KEEP. Most measurably effective change since EXP-6 (difficulty fix). Achieved exactly what the gain optimization analysis predicted: recovered v1.2.0's gain level while maintaining lower mishap rate. The gain distribution is now structurally optimal.

Killed / No Experiment Needed

| Hypothesis | Why killed | SIM |
|------------|------------|-----|
| Skip <15 for target 90 (A3) | ALL 3 scribes start at 13; any raise above <13 loses all scribes | SIM-7 |
| Repair proximity threshold (A6) | 0 sigils exhaust resources near target; threshold never matters | SIM-8 |
| Resource fungibility test (A1) | Even depletion confirmed (4% imbalanced); sum-all is valid | SIM-3 |
| Resource consumption by difficulty (D3) | Flat ~2.1-3.0 stars/iter; no variation to exploit | SIM-2 |
| Refresh cost experiment (D4) | 0 resources, +0.31 danger/iter; already minimize refreshes | SIM-6 |

Ideas evaluated and killed (deep exploration pass #2, v1.5.2-v1.5.9)

The following ideas were simulated against 4097 worked sigils / 40018 iterations and found to be neutral or harmful:

  • Scribe at target-5 from iteration 8+: Only 1 of 4097 sigils ever peaked at 85+ and then fell below (Jazriel #15, peak=85 at iter 13, mishapped to final=3). The scenario this addresses is vanishingly rare. No expected benefit.
  • Consecutive refresh limit: All thresholds (2-5 max streak) had strongly negative net impact (-8.7 to -26.6). Refresh streaks don't predict failure; they're temporary bad menu RNG, and sigils recover from them.
  • Hard iteration cap reduction: Any cap below 13 costs more 80+ sigils than it creates (net -1.8 to -21.3). Current effective cap of ~14-15 is already optimal. Note (Feb 2026): External feedback indicates the game has no hard iteration cap. EXP-13 tests REMOVING the cap entirely (the opposite direction). This killed idea tested LOWERING the cap; it is still correct that lower caps are harmful.
  • Single-resource floor bail-out: Catastrophically negative at all thresholds (net -36 to -46). Individual resource depletion doesn't predict failure, because the game uses different resources for different actions.
  • Quality actions as refresh fallback (from pass #1): Quality actions cost resources but give zero precision gain; refreshes cost nothing. Strictly worse.
  • Total resource bail-out threshold (from pass #1): 60% of affected sigils still improve after hitting low resources. Hard cutoff harms more sigils than it helps.

Technique Test: Awakened Sigil Comprehension (v1.5.17) - CONFIRMED ACTIVE

  • Background: Awakened requires Illuminated as prerequisite. Wiki says all technique bonuses are globally disabled; Illuminated confirmed no effect in v1.5.9. Tested last to keep a clean isolation: the per-difficulty median gain analysis (consistent across all v1.5.2-v1.5.9 data) provides a technique-sensitive metric unaffected by algorithm changes.
  • Wiki description: "Enchanters will find Awakened Sigil Comprehension allows the scribing of many more sigils from a single perception, vastly simplifying the harvesting process." This implies the technique increases the number of SCROLLS producible per scribed sigil, not precision gains.
  • Change: Version tick only (v1.5.15 → v1.5.17). No algorithm change. All characters trained Awakened Sigil Comprehension before running.
  • Baseline: v1.5.15 (same algorithm, without Awakened)
  • Depends on: All algorithm experiments completed first.
  • Sessions: 22 total (11 characters × 2 batches). Batch 1: 11 sessions. Batch 2: 11 sessions. Characters: Barrask, Byd, Christus, Fidon, Gnarta, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve.
  • Logs: ~/SH_logs/v1.5.17/ (7 merged files for split reconnections + 4 single files)

Raw results (22 sessions combined):

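| Metric | v1.5.15 (10 sess) | v1.5.17 (22 sess) | Delta |
|--------|-------------------|-------------------|-------|
| Worked | 265 | 555 | +290 |
| result=SCRIBED (reported) | 1 | 3 | +2 |
| Real scribes (log audit) | 0 | 1 | +1 |
| C1 misclassifications | 1 | 2 | +1 |
| >=80 | 1 | 14 | +13 |
| >=80/session | 0.10 | 0.64 | +0.54 |
| Avg precision | 51.2 | 52.8 | +1.6 |
| Max precision | 85 | 88 | +3 |
| Mishap rate | 43.0% | 42.0% | -1.0pp |
| resource_exhausted | 26 | 67 | +41 |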

SCRIBE COUNT CORRECTION (C1 bug audit): Raw log audit checking for actual "You carefully scribe" game messages reveals the reported scribe counts are inflated by the C1 misclassification bug (line 162):

v1.5.15: 1 reported SCRIBED: Throve #112 (precision 85, mishapped, "Sigil harvesting failed"). 0 real scribes, 1 C1 fake.

v1.5.17: 3 reported SCRIBED:

  • Byd #96 (precision 85, mishapped, "Sigil harvesting failed"): C1 FAKE
  • Fidon #37 (precision 86, "Final precision: 86, scribing", 4 scrolls): REAL
  • Refia #84 (precision 88, sigil vanished): C1 FAKE

Corrected: 0 real scribes (v1.5.15) → 1 real scribe (v1.5.17). The "Scribed: 1 → 3" delta in the original analysis was entirely a C1 artifact.

Scrolls-per-scribe analysis (the metric the wiki implies Awakened affects):

The game mechanic: after scribing, the game says "Remnants of the sigil pattern linger, allowing for additional scribing"; each "Remnants" message enables one more scribe attempt. The last scribe does NOT produce a "Remnants" message.
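The rule implies that a complete scribe event has exactly one more scribe message than "Remnants" messages. A sketch of that count for one event follows. The two message prefixes come from this page; the helper name `scrolls_scribed` and the sample log lines (including everything after the prefixes) are assumptions.

```ruby
SCRIBE_MSG   = "You carefully scribe"
REMNANTS_MSG = "Remnants of the sigil pattern linger"

# Count scrolls for one scribe event and check the "remnants + 1" rule.
def scrolls_scribed(lines)
  scribes  = lines.count { |line| line.include?(SCRIBE_MSG) }
  remnants = lines.count { |line| line.include?(REMNANTS_MSG) }
  consistent = scribes.zero? ? remnants.zero? : scribes == remnants + 1
  { scrolls: scribes, remnants: remnants, consistent: consistent }
end

# Hypothetical log excerpt for a 3-scroll event.
event = [
  "You carefully scribe the sigil.",
  "Remnants of the sigil pattern linger, allowing for additional scribing.",
  "You carefully scribe the sigil.",
  "Remnants of the sigil pattern linger, allowing for additional scribing.",
  "You carefully scribe the sigil."
]
p scrolls_scribed(event)
```

An event flagged as inconsistent would suggest a truncated log or a session split across files.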

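| Version | Character | Precision | Scrolls | Awakened? |
|---------|-----------|-----------|---------|-----------|
| v1.5.3 | Mahtra | 90 | 2 | No |
| v1.5.7 | Kythkani | 90 | 4 | No |
| v1.5.8 | Throve | 88 | 4 | No |
| v1.5.9 | Barrask | 85 | 3 | No |
| v1.5.9 | Gnarta | 89 | 2 | No |
| v1.5.9 | Jazriel | 88 | 2 | No |
| v1.5.14 | Kythkani | 86 | 3 | No |
| v1.5.17 | Fidon | 86 | 4 | Yes |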

Pre-Awakened (7 events): mean 2.86 scrolls, median 3, range 2-4
Post-Awakened (1 event): 4 scrolls
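The pre-Awakened summary can be recomputed from the Scrolls column of the rows marked "No". A short sketch (the helper name `summarize` is hypothetical):

```ruby
# Mean, median, and range of a list of scroll counts.
def summarize(values)
  sorted = values.sort
  mid = sorted.size / 2
  median = sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  { mean: (values.sum.to_f / values.size).round(2), median: median,
    min: sorted.first, max: sorted.last }
end

pre_awakened = [2, 4, 4, 3, 2, 2, 3] # Scrolls column, rows with Awakened = No
p summarize(pre_awakened)
```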

Insufficient data to determine whether Awakened increases scrolls-per-scribe. The post-Awakened sample has N=1, and 4 scrolls already occurred in 2 of 7 pre-Awakened events (Kythkani in v1.5.7, Throve in v1.5.8). More real scribe events are needed to measure this.

Batch consistency:

| Metric | Batch 1 (11 sess) | Batch 2 (11 sess) |
|--------|-------------------|-------------------|
| Worked | 301 | 254 |
| >=80 | 6 (2.0%) | 8 (3.1%) |
| >=80/session | 0.55 | 0.73 |

Statistical significance (>=80 metric, NOT affected by C1 correction):

  • Baseline >=80 rate: 1/265 = 0.38%
  • Test >=80 rate: 14/555 = 2.52% (6.7x improvement)
  • Two-proportion Z-test: Z = 2.14 (p ≈ 0.016, one-tailed)
  • Poisson model (treating baseline rate as known): Z ≈ 8.2, but this overstates confidence by not accounting for uncertainty in the baseline rate estimate
  • Both batches individually above baseline; batch 2 slightly stronger
  • The 14 >=80 sigils include the 2 C1 fakes (precision 85, 88): they DID reach >=80 precision, they just didn't successfully scribe. The metric is valid.
  • Caveat: The improvement is statistically significant but the mechanism is unknown. Gain distributions, starting precisions, iteration counts, and work rates are all identical between v1.5.15 and v1.5.17. See "What Awakened actually does" section below.
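Both statistics in the list above can be recomputed from the raw counts (1/265 and 14/555). The sketch assumes the standard pooled two-proportion z-test and a simple Poisson approximation that treats the baseline rate as exact; the method names are hypothetical, and whether the original analysis used exactly these formulas is an assumption.

```ruby
# Two-proportion z-test with a pooled standard error.
def two_proportion_z(x1, n1, x2, n2)
  p1 = x1.to_f / n1
  p2 = x2.to_f / n2
  pooled = (x1 + x2).to_f / (n1 + n2)
  se = Math.sqrt(pooled * (1 - pooled) * (1.0 / n1 + 1.0 / n2))
  (p2 - p1) / se
end

# Upper-tail p-value of a standard normal z score.
def one_tailed_p(z)
  0.5 * Math.erfc(z / Math.sqrt(2))
end

# Poisson-style z score: expected count under the baseline rate, with the
# baseline treated as known (no uncertainty), which inflates confidence.
def poisson_z(x1, n1, x2, n2)
  expected = n2 * x1.to_f / n1
  (x2 - expected) / Math.sqrt(expected)
end

z = two_proportion_z(1, 265, 14, 555)
puts format("Z = %.2f, one-tailed p = %.3f", z, one_tailed_p(z))
puts format("Poisson Z = %.1f", poisson_z(1, 265, 14, 555))
```

With one baseline event, the pooled test is itself approximate; an exact test (for example Fisher's) would be a reasonable cross-check.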

Decision: KEEP Awakened technique on all characters โ€” but mechanism is UNCERTAIN.

Despite no algorithm changes between v1.5.15 and v1.5.17, the >=80 rate improved from 0.38% to 2.52%. However, the wiki describes Awakened as a scrolls-per-scribe effect ("allows scribing of many more sigils from a single perception"), which does NOT predict precision improvement. Detailed comparison shows gain distributions, starting precisions, iteration counts, and work rates are all identical. The mechanism of the >=80 improvement is unexplained: it could be from Awakened (undocumented effect), or an uncontrolled confound (different test dates, server conditions). The two-proportion Z-test gives Z=2.14 (p~0.016), significant but not overwhelming. All future experiments should continue with Awakened trained (no downside risk).

What Awakened actually does (mechanism unknown):

The wiki says: "allows scribing of many more sigils from a single perception". This describes a scrolls-per-scribe effect (more copies from each scribed sigil), NOT a precision improvement. Yet we observe more sigils reaching 80+ with Awakened trained.

Detailed mechanism analysis (comparing v1.5.15 vs v1.5.17 directly):

  • Gain-per-action distribution: IDENTICAL. v1.5.15 avg=8.2, v1.5.17 avg=8.4. Same bimodal shape (peaks at 2-3 and 13-15). Awakened does NOT boost gain per action.
  • Starting precision distribution: IDENTICAL. Both avg=14.0, median=14. Same proportions across buckets. Awakened does NOT change starting positions.
  • Iteration counts: IDENTICAL. Both avg ~10.4-10.5, median 11, same distribution. Awakened does NOT grant more iterations.
  • Work rate: IDENTICAL. 20.5% vs 19.8%. Same skip threshold, same behavior.
  • Mishap rate by bracket: Similar overall (~42%), but v1.5.17 has slightly HIGHER mishap rates at 60-79 precision (51-53% vs 39-44%). Not a protective effect.
  • >=80 rate: 0.38% → 2.52%. This is the ONLY metric that differs.

The attribution to Awakened was based on process-of-elimination reasoning: "no algorithm changed between v1.5.15 and v1.5.17, so the improvement must be from training Awakened." However, the wiki description does not predict this effect, and no per-action metric shows any change. Possible explanations:

  1. Awakened has an undocumented effect we can't measure at the per-action level
  2. Confound: sessions were run on different dates, so server conditions, seasonal effects, or undocumented game patches could contribute
  3. Statistical power: the two-proportion Z-test gives Z=2.14 (p≈0.016), significant but not overwhelming. The earlier Z=8.5 used a Poisson model that may overstate confidence.

Status: KEEP Awakened trained (no downside). Attribution UNCERTAIN: correlation observed but mechanism unexplained by wiki description or per-action data. Continue collecting scribe data to test the wiki's scrolls-per-scribe claim.

Experiment Log Template

When running a new experiment, record results here:

#### EXP-N: <name> (v<version>)
- Sessions: <count> (<list of characters>)
- Logs: ~/SH_logs/v<version>/

| Metric | Baseline | This exp | Delta |
|--------|----------|----------|-------|
| Worked |          |          |       |
| Avg precision |   |          |       |
| Sigils >= 80 |    |          |       |
| Mishap rate |     |          |       |
| Min per 80+ |     |          |       |

- **Verdict**: KEEP / REVERT
- **Action**: <what was done>
โš ๏ธ **GitHub.com Fallback** โš ๏ธ