Sigil Harvesting minigame testing - elanthia-online/dr-scripts GitHub Wiki
Development guide for `sigilharvest.lic` (v2.0.0).
Tests: `spec/sigilharvest_spec.rb` (129 examples). Analyzer: `sigilharvest_analyzer.rb`.
| File | Purpose |
|---|---|
| `sigilharvest.lic` | Script source, v2.0.0 |
| `sigilharvest_analyzer.rb` | Log analyzer (parses banners + per-sigil summaries) |
| `spec/sigilharvest_spec.rb` | RSpec test suite (129 examples, current with v2.0.0) |
| `session_splitter.rb` | Log session extractor (handles split sessions across files) |
| `spec/spec_helper.rb` | Shared test mocks (Lich, XMLData, Script, etc.) |
| `data/sigils.yaml` | Room lists by `[City][SigilType][Season]` |
```bash
# Run tests
bundle exec rspec spec/sigilharvest_spec.rb

# Lint after changes
rubocop sigilharvest.lic
rubocop spec/sigilharvest_spec.rb

# Extract sessions from game logs (handles split sessions)
ruby session_splitter.rb --version X.Y.Z path/to/*.log
ruby session_splitter.rb --dry-run --version X.Y.Z path/to/*.log
```

Always bump `VERSION` in `sigilharvest.lic:6` when changing the script.
Current: `VERSION = '2.0.0'`
```text
;sigilharvest <city> <sigil> <precision> [minutes] [debug]
;sigilharvest Shard permutation 90 60 debug
```
Single class. `initialize` (line 8) sets up all state and runs the main loop.
Tests bypass `initialize` entirely using `SigilHarvest.allocate` + `instance_variable_set`.
| Method | Line | Visibility | Purpose |
|---|---|---|---|
| `initialize` | 8 | public | Set up state, parse args, start main loop |
| `find_sigils(city, sigil)` | 89 | public | Outer loop: iterate rooms, call `harvest_sigil` |
| `harvest_sigil(sigil)` | 114 | public | Find sigil in room, run improvement loop |
| `check_sigil(sigil)` | 162 | public | Verify sigil type matches target |
| `improve_sigil(precision)` | 178 | public | Core algorithm: one iteration of action selection |
| `sigil_info(command)` | 296 | public | Send `perc sigil <cmd>`, parse response via serial `waitfor` |
| `scribe_sigils` | 371 | public | Scribe sigil onto scroll, manage inventory |
| `get_season` | 392 | public | Query game for current season |
| `get_techniques` | 397 | private | Detect active harvesting techniques |
| `get_scrolls` | 416 | public | Buy blank scrolls if below stock level |
| `log_startup_banner` | 457 | private | Print session config + techniques |
| `log_sigil_summary(sigil, result)` | 475 | private | Log per-sigil result line (with total elapsed) |
| `log_exit_summary` | 484 | private | Print session statistics |
| `format_techniques(techniques)` | 513 | private | Format technique array for display |
| `elapsed_minutes` | 519 | private | Minutes since session start |
| `sigil_elapsed_minutes` | 523 | private | Minutes since current sigil started |
| `time_expired?` | 527 | private | True when elapsed >= time limit |
| `contest_stat_for(resource)` | 532 | private | Map resource name to level ivar |
| `precision_action_viable?(action, contest_stat, precision)` | 544 | private | Viability gate for precision actions |
| `select_repair_action(action, ...)` | 559 | private | Check if action qualifies as resource repair; yields if yes |
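Three of the small private helpers in the table above are simple enough to sketch. The bodies below are hypothetical and are not copied from `sigilharvest.lic`. Only the ivar names come from the table. Returning 0 for an unknown resource is an assumption. The sketch also makes the methods public and injects `now` so they are easy to exercise.

```ruby
# Hypothetical sketch of three private helpers; ivar names follow the tables
# in this guide, but the method bodies are illustrative only.
class HelperSketch
  def initialize(sanity:, resolve:, focus:, start_time:, time_limit:)
    @sanity_lvl = sanity
    @resolve_lvl = resolve
    @focus_lvl = focus
    @start_time = start_time
    @time_limit = time_limit # minutes
  end

  # Map a resource name from an action hash to the matching level ivar.
  def contest_stat_for(resource)
    case resource
    when 'sanity' then @sanity_lvl
    when 'resolve' then @resolve_lvl
    when 'focus' then @focus_lvl
    else 0 # assumption: unknown resource means no headroom
    end
  end

  # Minutes since session start; `now` is injectable so tests need no clock.
  def elapsed_minutes(now = Time.now)
    (now - @start_time) / 60.0
  end

  # True once the session has run for at least @time_limit minutes.
  def time_expired?(now = Time.now)
    elapsed_minutes(now) >= @time_limit
  end
end
```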
```text
initialize
  → get_techniques()                 # detect Inspired/Enlightened
  → log_startup_banner()
  → find_sigils(city, sigil)
      → harvest_sigil(sigil)
          → sigil_info('improve')    # first call: parse initial menu
          → improve_sigil(precision) # loop: returns true to continue, false to stop
              → precision_action_viable?()
              → select_repair_action()
              → sigil_info(verb)     # execute chosen action
          → scribe_sigils()          # if target reached
          → log_sigil_summary(sigil, result)
  → log_exit_summary()
```
| Variable | Type | Default | Set By | Purpose |
|---|---|---|---|---|
| `@sigil_precision` | Integer | 0 | `sigil_info` | Current sigil precision (0-100) |
| `@sigil_clarity` | Integer | 0 | `sigil_info` | Current sigil clarity (0-100) |
| `@danger_lvl` | Integer | 0 | `sigil_info` | Danger meter (0-20 stars) |
| `@sanity_lvl` | Integer | 15 | `sigil_info` | Sanity resource (0-20 stars) |
| `@resolve_lvl` | Integer | 15 | `sigil_info` | Resolve resource (0-20 stars) |
| `@focus_lvl` | Integer | 15 | `sigil_info` | Focus resource (0-20 stars) |
| `@num_iterations` | Integer | 0 | `harvest_sigil` | Iterations used this sigil (cap: 15) |
| `@num_aspect_repairs` | Integer | 0 | `improve_sigil` | Repair actions taken this sigil |
| `@sigil_improvement` | Array[Hash] | `[]` | `sigil_info` | Current action menu (3-8 actions) |
| `@sigil_count` | Integer | 0 | `harvest_sigil` | Total sigils encountered |
| `@sigil_results` | Array[Hash] | `[]` | `log_sigil_summary` | Per-sigil outcome records |
| `@scribed_in_session` | Boolean | false | `find_sigils` | True after first successful scribe |
| `@rooms_visited` | Integer | 0 | `find_sigils` | Rooms traversed |
| `@start_time` | Time | `Time.now` | `initialize` | Session start |
| `@sigil_start_time` | Time | `Time.now` | `harvest_sigil` | Current sigil start |
| `@time_limit` | Integer | 30 | `initialize` | Minutes before auto-stop |
Each entry in `@sigil_improvement` is a Hash:

```ruby
{
  "difficulty" => Integer, # 1-5 (trivial..formidable)
  "resource"   => String,  # "sanity" | "resolve" | "focus"
  "impact"     => Integer, # 1-3 (taxing..destroying)
  "verb"       => String,  # game verb to execute (e.g., "analyze", "study")
  "target"     => String,  # "sigil" (improve) | "your" (repair)
  "aspect"     => String,  # "precision" | "quality" | resource name (repair)
  "risk"       => Integer  # difficulty + impact
}
```

`improve_sigil` is called once per iteration. It returns `true` to continue the loop and `false` to stop.
v1.5.0 algorithm = v1.2.0 algorithm (reverted from v1.4.1; see §11 for why). Key differences from v1.4.1: no resource floor check, no high_target mode, allows formidable actions, risk-based action selection, no move budget check.
```text
improve_sigil(precision)
│
├─ Phase 1: Pre-scan (lines 187-203)
│    Scan @sigil_improvement for precision actions with tight margin (stat - diff < 2)
│    and difficulty >= 3. Store as best_repair_aspect / second_best_repair_aspect.
│    Purpose: identify which resource to repair proactively.
│
├─ Phase 2: Select action (lines 205-244)
│    For each action in @sigil_improvement:
│    │
│    ├─ Precision action? (aspect == "precision")
│    │    ├─ precision_action_viable?() → check margin, accept formidable
│    │    └─ Selection priority (lines 217-229):
│    │         ├─ First viable action → stored unconditionally
│    │         ├─ precision < (target - 20) → prefer LOWEST risk (far from goal)
│    │         └─ precision >= (target - 20) → prefer HIGHEST risk (close to goal)
│    │
│    └─ Repair candidate? → select_repair_action() (lines 234-243)
│
├─ Phase 3: Bail-out checks (lines 248-288)
│    ├─ Scribe check (line 251): precision >= target, OR near cap + within 5
│    ├─ Iteration cap (line 261): iterations >= 15
│    ├─ Resource exhaustion (line 270): (san+res+foc)*2.25 + prec < target-5
│    └─ Move budget (line 283): (14-iters)*15 < (target-prec-5), when prec <= 80
│
├─ Phase 4: Execute or refresh (lines 277-291)
│    ├─ Repair override (lines 278-284): use repair if no precision action found
│    └─ Execute action or refresh (lines 287-291)
│
└─ return true (continue loop)
```
Given two precision actions that both pass viability:
| Condition | Prefers | Rationale |
|---|---|---|
| `precision < target - 20` | Lowest risk | Far from goal, conserve resources |
| `precision >= target - 20` | Highest risk | Close to goal, sprint to finish |
Note: No danger-based switching. No high_target mode. This simpler strategy outperforms the v1.4.1 approach in production data (see §11).
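The selection rule above can be sketched as a small pure function. This is a minimal illustration, assuming the caller has already filtered the menu with `precision_action_viable?`. The method name `pick_precision_action` is hypothetical. Per the version history, v1.5.13/EXP-7 later replaced risk-based selection with difficulty-based selection, so this sketch only models the v1.5.0-era rule.

```ruby
# Hypothetical sketch of the v1.5.0-era risk preference between viable precision actions.
# `viable_actions` are action hashes that already passed the viability gate.
def pick_precision_action(viable_actions, precision, target)
  return nil if viable_actions.empty?

  if precision < target - 20
    viable_actions.min_by { |a| a['risk'] } # far from goal: conserve resources
  else
    viable_actions.max_by { |a| a['risk'] } # close to goal: sprint to finish
  end
end

safe = { 'verb' => 'study', 'risk' => 3 }
bold = { 'verb' => 'analyze', 'risk' => 7 }
pick_precision_action([safe, bold], 40, 90)['verb'] # => "study"
pick_precision_action([safe, bold], 75, 90)['verb'] # => "analyze"
```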
```ruby
def precision_action_viable?(action, contest_stat, precision)
  difficulty = action['difficulty'].to_i
  margin = contest_stat - difficulty
  # Path 1: comfortable margin (>1): accept any difficulty, including trivial
  return true if margin > 1
  # Path 2: any positive margin (>0) AND challenging+ difficulty
  return true if margin > 0 && difficulty > 2

  false
end
```

Key properties:
- Allows formidable (difficulty=5), unlike v1.4.1, which blocked them.
- Accepts trivial actions (since v1.5.4/EXP-2) โ any action with margin > 1 is viable.
- Path 2 still requires challenging+ for tight margins.
- The `precision` parameter is retained for API stability but is not currently used by either path.
| Check | Formula | Applies When |
|---|---|---|
| Scribe | `precision >= target OR ((iters >= 15 OR (iters == 14 AND no action)) AND precision >= target - 5)` | Always |
| Iteration cap | `iterations >= 15` | Always |
| Resource exhaustion | `(san + res + foc) * 2.25 + precision < target - 5` (coefficient recalibrated to 1.75 in v1.5.15/EXP-9) | `precision <= 80` |
| Move budget | `(14 - iterations) * 13 < (target - precision - 5)` | `precision <= 80` |
Removing the move budget was tested in v1.5.0, and the results were worse (mishap rate 49.9% → 74.8%, min per 80+ sigil 58 → 114). The check was restored in v1.5.2 and later tightened from `* 15` to `* 13` in v1.5.8/EXP-5 (kept). It acts as a protective early bail-out: sigils that fall behind pace are cut loose before they accumulate danger and mishap.
See EXP-1 in §13 for full data.
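The bail-out checks in the table above, evaluated in Phase 3 order, can be sketched as one pure function. This is a sketch under stated assumptions, not the script's code. The method name and `action_found` flag are hypothetical, and `action_found` stands in for "no action" in the scribe formula. The stop-reason symbols other than `:moves_exhausted` are made up. The exhaustion coefficient and per-move budget are parameters because the version history records changes to both.

```ruby
# Hypothetical sketch: returns the first bail-out reason that fires, or nil to keep going.
def bail_out_reason(precision:, target:, iterations:, sanity:, resolve:, focus:,
                    action_found: true, exhaustion_coeff: 2.25, budget_per_move: 13)
  near_cap = iterations >= 15 || (iterations == 14 && !action_found)
  return :scribe if precision >= target || (near_cap && precision >= target - 5)
  return :iteration_cap if iterations >= 15
  return nil if precision > 80 # the remaining checks only apply at precision <= 80

  return :resources_exhausted if (sanity + resolve + focus) * exhaustion_coeff + precision < target - 5
  return :moves_exhausted if (14 - iterations) * budget_per_move < target - precision - 5

  nil
end
```

For example, a sigil at precision 40 (target 90) with 10 stars in each resource passes at iteration 10 (52 budget vs 45 needed) but hits `:moves_exhausted` at iteration 11 (39 vs 45).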
```ruby
def select_repair_action(action, contest_stat, precision, repair_target, current_repair)
  return unless action['difficulty'].to_i <= 3
  return unless repair_target.key?("difficulty")
  return unless (contest_stat - action['difficulty'].to_i) >= 2
  return unless @sigil_precision >= (precision - 15)
  return unless action['aspect'] == repair_target['resource']
  # ...
end
```

Repair is only considered when:
- Action difficulty <= 3 (not difficult/formidable)
- A repair target was identified in Phase 1
- Comfortable margin (>= 2)
- Close to target (within 15)
- Action's aspect matches the repair target's resource
Game commands:
- `perc sigil`: search for sigils (repeats until found)
- `perc sigil improve`: begin improvement / reroll action menu
- `perc sigil <VERB>`: execute chosen action
- `scribe sigil`: scribe when target precision reached
| Resource | Start | Direction | Role |
|---|---|---|---|
| Danger | 0 | Increases | Mishap probability. Rises ~1-2 per 3 iterations |
| Sanity | 15 | Decreases | Consumed by sanity-cost actions |
| Resolve | 15 | Decreases | Consumed by resolve-cost actions |
| Focus | 15 | Decreases | Consumed by focus-cost actions |
Resources are parsed by counting `*` in game output: `***-----` → 3.
| Property | Values | Parsed From |
|---|---|---|
| Difficulty | trivial(1), straightforward(2), challenging(3), difficult(4), formidable(5) | 1st word |
| Resource | sanity, resolve, focus | 2nd word |
| Impact | taxing(1), disrupting(2), destroying(3) | 3rd word |
| Verb | game-specific (e.g., FORM, METHOD, STUDY) | 4th word |
| Target | "your" (repair) or "sigil" (improve) | 5th word |
| Aspect | precision, quality, or resource name | 6th word |
Risk = difficulty + impact. Impact also equals resource drain in stars.
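The word-position table above maps directly onto the action Hash shown earlier. The sketch below makes two assumptions: the six descriptor words have already been pulled out of the game text, and they appear in the column order of the table. The script's real parsing regex is not shown in this guide, and the example descriptor strings are made up.

```ruby
DIFFICULTY = {
  'trivial' => 1, 'straightforward' => 2, 'challenging' => 3,
  'difficult' => 4, 'formidable' => 5
}.freeze
IMPACT = { 'taxing' => 1, 'disrupting' => 2, 'destroying' => 3 }.freeze

# Hypothetical: build an action hash from six descriptor words (1st..6th word order).
def parse_action(descriptor)
  difficulty, resource, impact, verb, target, aspect = descriptor.split
  d = DIFFICULTY.fetch(difficulty) # KeyError on an unknown word surfaces parse bugs early
  i = IMPACT.fetch(impact)
  {
    'difficulty' => d, 'resource' => resource, 'impact' => i,
    'verb' => verb, 'target' => target, 'aspect' => aspect,
    'risk' => d + i # risk = difficulty + impact
  }
end

parse_action('difficult sanity disrupting STUDY sigil precision')['risk'] # => 6
```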
| Difficulty | Avg Gain | Min | Max | Zero-Gain Rate |
|---|---|---|---|---|
| trivial(1) | 2.0 | 0 | 3 | 15.4% |
| straightforward(2) | 3.9 | 0 | 6 | 13.5% |
| challenging(3) | 9.6 | 0 | 14 | 3.2% |
| difficult(4) | 13.2 | 11 | 16 | 0.0% |
| formidable(5) | N/A | N/A | N/A | (never taken) |
Critical finding: Gains are constant regardless of current precision level. A difficult action at precision 10 gains the same ~13 as at precision 50.
Mishaps end improvement prematurely. They are stochastic: they occur at all danger levels (observed at danger 0 through 11). Higher danger increases the probability of a mishap, but no danger level guarantees safety.
| Type | Pattern | Effect |
|---|---|---|
| Stumble | "About the area you wander" | Sigil lost |
| Lose Track | "You lose track of your surroundings" | Sigil lost |
| Sneeze | "A sudden sneeze" | Sigil lost |
| Chills | "Chills creep down your spine" | Sigil lost |
| Resource Collapse | "Your resolve/sanity/focus collapses" | Sigil lost + stun |
| Other-planar | "rouse the attention of some other-planar entity" | Action fails, 0 gain (non-terminal) |
| Vanished | "The sigil has vanished" | Sigil despawned (non-algorithmic) |
| Combat | "You are too distracted" | Combat interrupted session |
Resource collapse is a distinct failure mode โ it happens when an individual resource is critically depleted, even at danger 0.
Sigils spawn with random starting precision (observed range: 1-15, roughly uniform).
Skip filters: target >= 80 โ skip if prec < 10; target >= 65 โ skip if prec < 5.
Hard cap: 15 iterations. Each action OR refresh costs 1 iteration. Typical productive path: ~9-10 actions + ~5-6 refreshes.
Starting at precision 12 (typical when filtering >= 10), need ~78 gain. At 13.2 avg per difficult action: 6 difficult actions required. With refreshes, that's ~12 iterations. Achievable but requires favorable RNG.
```text
Iter 1:  improve (reroll) → get menu with difficult action
Iter 2:  difficult action → precision=25 (+13)
Iter 3:  improve (reroll)
Iter 4:  difficult action → precision=38 (+13)
Iter 5:  repair (restore resource)
Iter 6:  difficult action → precision=51 (+13)
Iter 7:  improve (reroll)
Iter 8:  difficult action → precision=64 (+13)
Iter 9:  repair
Iter 10: difficult action → precision=77 (+13)
Iter 11: improve (reroll)
Iter 12: difficult action → precision=90 (+13) → SCRIBE
```
90+ is a viable target. The math works โ 6 difficult actions fit within the 15-iteration cap with room for refreshes and repairs. Success rate is governed by game randomness (action menu RNG, mishap rolls), not by an algorithmic ceiling. The script's job is to maximize the probability by making optimal decisions with whatever actions the game offers.
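The arithmetic behind the 12-iteration plan above can be checked with a back-of-envelope helper. It is not part of the script. It assumes every precision action is difficult (avg +13.2) and that each one is preceded by exactly one setup iteration (reroll or repair), as in the trace.

```ruby
# Back-of-envelope: difficult actions needed, and iterations if each needs one setup turn.
def plan_iterations(start, target, avg_gain: 13.2, setup_per_action: 1)
  actions = ((target - start) / avg_gain).ceil
  { actions: actions, iterations: actions * (1 + setup_per_action) }
end

plan = plan_iterations(12, 90) # 6 difficult actions, 12 iterations (cap is 15)
```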
Problem: The viability filter rejects actions too aggressively, causing refreshes that yield zero precision. A trivial action (+2) is always better than a refresh (+0).
Location: precision_action_viable? (line 544), and the fallback chain.
Improvement: Consider loosening the viability filter for low-difficulty actions.
Problem: When only trivial/straightforward precision actions are available, the script takes them rather than repairing a resource to enable a difficult action next turn. Taking +2 now instead of enabling +13 next turn is always worse.
Improvement: Compare expected value: `selected_action.difficulty * ~3.3` vs
`best_repair_aspect.difficulty * ~3.3` (next turn). If repair enables a substantially
better action, prefer repair.
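The expected-value comparison above can be sketched as follows. This is a hypothetical rule, not script code. The per-turn normalization follows the EXP-3 hypothesis (a repair plus the action it unlocks costs 2 turns). The ~3.3 gain-per-difficulty-level factor is the rough estimate quoted above. EXP-3 tested this idea in v1.5.5 and it was reverted.

```ruby
GAIN_PER_DIFFICULTY = 3.3 # rough precision gain per difficulty level (estimate from above)

# Hypothetical: prefer a repair when the action it unlocks, averaged over the two
# turns it costs (repair + action), beats the gain available right now.
def prefer_repair?(selected_difficulty, enabled_difficulty)
  gain_now = selected_difficulty * GAIN_PER_DIFFICULTY
  gain_via_repair_per_turn = enabled_difficulty * GAIN_PER_DIFFICULTY / 2.0
  gain_via_repair_per_turn > gain_now
end

prefer_repair?(1, 4) # => true  (3.3 now vs 6.6 per turn)
prefer_repair?(3, 4) # => false (9.9 now vs 6.6 per turn)
```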
Problem: The script checks danger level but not the minimum across individual resources. Observed: Sigil #84 collapsed at danger=0 because one resource was critically depleted.
Improvement: Add a guard: if `[sanity, resolve, focus].min <= 1`, bail out or
avoid actions consuming the depleted resource.
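A minimal sketch of the proposed guard follows. The method names and the `floor: 1` default are hypothetical, and resource levels are passed as a plain Hash rather than read from ivars. A related composite resource health guard was tested as EXP-4 (v1.5.6) and reverted, so this only illustrates the idea.

```ruby
RESOURCES = %w[sanity resolve focus].freeze

# Hypothetical: which resources are at or below `floor` stars.
def depleted_resources(levels, floor: 1)
  RESOURCES.select { |r| levels.fetch(r) <= floor }
end

# Hypothetical: drop actions that would spend a depleted resource.
# An empty result is the signal to bail out instead.
def avoid_depleted(actions, levels, floor: 1)
  depleted = depleted_resources(levels, floor: floor)
  actions.reject { |a| depleted.include?(a['resource']) }
end
```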
Problem: Mishaps occurred at danger 0-11 (not just high danger). Conservative play at high danger doesn't reliably prevent mishaps.
Data: Mishaps observed at danger 0-3 (3 events), 5 (3), 7 (3), 11 (3).
Improvement: Consider whether conservative/aggressive modes should be driven by remaining iterations and distance to target rather than danger level alone.
The starting-precision skip filter (< 10 for target >= 80) is mathematically correct:
a sigil starting at precision 3 cannot reach 90 in 15 iterations. The 61% skip rate is
an inherent property of the game's random starting-precision distribution, not an
algorithm deficiency. The script correctly discards unwinnable sigils early.
Tests bypass `initialize` (which calls game APIs) using `allocate`:

```ruby
obj = SigilHarvest.allocate
obj.instance_variable_set(:@sigil_precision, 50)
obj.instance_variable_set(:@danger_lvl, 5)
# ... set all required ivars
```

The helper `build_sigilharvest` (spec line 241) handles all default ivars.
Override only what your test needs:
```ruby
let(:obj) { build_sigilharvest }

before do
  allow(obj).to receive(:sigil_info).and_return(false)
  allow(obj).to receive(:scribe_sigils)
end

it 'does something' do
  obj.instance_variable_set(:@sigil_precision, 80)
  obj.instance_variable_set(:@danger_lvl, 7)
  # action1, action2: improvement hashes (e.g., from build_improvement)
  obj.instance_variable_set(:@sigil_improvement, [action1, action2])
  obj.send(:improve_sigil, 90)
  expect(obj).to have_received(:sigil_info).with('analyze')
end
```

The helper sets resources to 5 (not the game's starting 15):
```ruby
obj.instance_variable_set(:@sanity_lvl, 5)
obj.instance_variable_set(:@resolve_lvl, 5)
obj.instance_variable_set(:@focus_lvl, 5)
```

This is low. Always set explicit resource levels in tests that involve action selection.
```ruby
action = build_improvement(
  "difficulty" => 4,
  "resource" => "sanity",
  "impact" => 2,
  "verb" => "analyze",
  "target" => "sigil",
  "aspect" => "precision",
  "risk" => 6
)
```

Defaults: difficulty=3, resource="sanity", impact=2, verb="analyze", target="sigil", aspect="precision", risk=5.
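To make the defaults above concrete, here is a hypothetical re-creation of the helper as a plain merge. It is for reading tests only, and the real `build_improvement` in the spec may differ. In particular, this sketch does not recompute `risk` when `difficulty` or `impact` is overridden, which is why the example above passes `risk` explicitly.

```ruby
# Hypothetical stand-in for the spec helper: defaults merged with per-test overrides.
IMPROVEMENT_DEFAULTS = {
  'difficulty' => 3, 'resource' => 'sanity', 'impact' => 2, 'verb' => 'analyze',
  'target' => 'sigil', 'aspect' => 'precision', 'risk' => 5
}.freeze

def build_improvement(overrides = {})
  IMPROVEMENT_DEFAULTS.merge(overrides) # returns a new hash; defaults stay frozen
end
```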
Tests define these mock modules at top level (not via spec_helper):
| Module | Mocks | Key Global |
|---|---|---|
| `DRC` | `message`, `bput` | `$sigil_messages`, `$sigil_bput_log/responses` |
| `DRCA` | `do_buffs` | `$sigil_actions` |
| `DRCI` | `stow_hands`, `get_item?`, `stow_item?`, `count_item_parts` | `$sigil_actions`, `$sigil_scroll_count` |
| `DRCC` | `get_crafting_item`, `stow_crafting_item` | `$sigil_actions` |
| `DRCT` | `walk_to`, `order_item` | `$sigil_walks`, `$sigil_actions` |
| `DRCM` | `ensure_copper_on_hand` | `$sigil_actions` |
| `DRStats` | `trader?`, `circle` | Internal `@trader`, `@circle` |
| `Flags` | `add`, `delete`, `reset`, `[]`, `[]=` | Internal `@flags` |
| `Room` | `current`, `[]` | Internal `@current_id` |

`reset_test_state!` (spec line 301) clears all globals before each test.
```ruby
source = File.read(SIGILHARVEST_SOURCE_PATH)
source = source.sub(/\A=begin.*?=end\s*/m, '') # strip doc block
source = source.sub(/^before_dying do.*?end\s*SigilHarvest\.new\s*\z/m, '') # strip entry point
eval(source, TOPLEVEL_BINDING, SIGILHARVEST_SOURCE_PATH, 1)
```

The `=begin`/`=end` block and the final `SigilHarvest.new` + `before_dying` are
stripped so the class is loaded without executing.
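The two stripping regexes from the loader above can be exercised on a toy script to see what survives. The sample source below is made up. If the entry point were left in, `before_dying` and `Flags` would be undefined outside Lich and loading would fail. Because the doc block is removed before `eval(..., 1)`, backtraces from the loaded class report line numbers relative to the stripped source rather than the original file.

```ruby
# Toy script with the same shape: doc block, class, entry point.
sample = <<~'SRC'
  =begin
    documentation block
  =end
  class SigilHarvest
    def greeting
      'ready'
    end
  end
  before_dying do
    Flags.delete('toy')
  end
  SigilHarvest.new
SRC

source = sample.sub(/\A=begin.*?=end\s*/m, '') # strip doc block
source = source.sub(/^before_dying do.*?end\s*SigilHarvest\.new\s*\z/m, '') # strip entry point
eval(source, TOPLEVEL_BINDING, 'toy_sigilharvest.lic', 1)

SigilHarvest.allocate.greeting # class is defined, but SigilHarvest.new never ran
```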
At the v1.5.0 transition, the tests (151 at the time) were written for v1.4.1 and failed against v1.5.0.
The v1.4.1-specific behaviors they covered included formidable blocking, the resource floor check,
high_target mode, danger_threshold switching, scribe at iteration 8, skip threshold 13, and
batch get capture. All of these were removed or reverted in v1.5.0, so the tests had to be rewritten
to match the v1.2.0 algorithm. The suite has since been rewritten (129 examples, current with v2.0.0).
Parses structured log output from sigilharvest sessions. Works with the v1.2.0+ log format.

Key data structures:
- `SessionInfo`: metadata from the startup banner (version, city, sigil, precision target, techniques)
- `SigilRun`: per-sigil outcome record with fields:
  - `number`, `sigil_type`, `result`, `target_precision`, `final_precision`
  - `starting_precision`, `iterations`, `final_danger`, `room`, `elapsed_minutes`
  - `precision_history`, `actions_taken`, `refresh_count`, `repair_count`
  - `failed_action_count`, `mishap_type`, `stop_reason`, `danger_history`
  - `resource_snapshots`, `session_index`, `session_elapsed_minutes`
Log format parsed:

```text
== SigilHarvest v1.5.0 ==
[Sigil #1] type=permutation result=mishap precision=42/90 iterations=8 danger=7 room=3 elapsed=2.1m total=5.3m
== End SigilHarvest v1.5.0 ==
```

The `total=` field (session elapsed at sigil completion) is optional for backward compatibility.
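The `[Sigil #N]` summary line shown above can be parsed with a single regex that has an optional `total=` group. This is a sketch, not the analyzer's actual regex. The output keys mirror the `SigilRun` field names listed above, and treating `room` as an opaque string is an assumption.

```ruby
# Hypothetical regex for the per-sigil summary line; total= is optional (older logs).
SUMMARY_LINE = %r{
  \[Sigil\s\#(?<number>\d+)\]\s+
  type=(?<type>\S+)\s+
  result=(?<result>\S+)\s+
  precision=(?<precision>\d+)/(?<target>\d+)\s+
  iterations=(?<iterations>\d+)\s+
  danger=(?<danger>\d+)\s+
  room=(?<room>\S+)\s+
  elapsed=(?<elapsed>[\d.]+)m
  (?:\s+total=(?<total>[\d.]+)m)?
}x

def parse_summary(line)
  m = SUMMARY_LINE.match(line)
  return nil unless m

  {
    number: m[:number].to_i,
    sigil_type: m[:type],
    result: m[:result],
    final_precision: m[:precision].to_i,
    target_precision: m[:target].to_i,
    iterations: m[:iterations].to_i,
    final_danger: m[:danger].to_i,
    room: m[:room],
    elapsed_minutes: m[:elapsed].to_f,
    session_elapsed_minutes: m[:total]&.to_f # nil when total= is absent
  }
end
```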
Game logs are stored at: `/Users/grocha/angua/lich-5-mine/logs/DR-<CharName>/<year>/<month>/`
Steps to collect session logs:

1. Identify which characters ran sessions. The user provides character names, start times, and whether sessions span log file boundaries (log rotation at midnight or at a size threshold).
2. Handle multi-file sessions. Some sessions span two log files (e.g., started in `2026-02-01-0627.log`, continued in `2026-02-01-0712.log`). These must be concatenated before analysis: `cat file1.log file2.log > CharName_HHMM.log`
3. Copy to a version-specific directory. Store extracted logs under `~/SH_logs/<version>/`:
   - `~/SH_logs/v1.2.0/`: 10 v1.2.0 session logs
   - `~/SH_logs/v1.4.1/`: 10 v1.4.1 session logs
   - `~/SH_logs/v1.5.3/`: 10 v1.5.3 baseline session logs
   - `~/SH_logs/v1.5.4/`: 9 v1.5.4 EXP-2 session logs
   - `~/SH_logs/v1.5.5/`: 10 v1.5.5 EXP-3 session logs
   - `~/SH_logs/v1.5.6/`: 10 v1.5.6 EXP-4 session logs
   - `~/SH_logs/v1.5.7/`: 10 v1.5.7 baseline restore session logs
   - `~/SH_logs/v1.5.8/`: 10 v1.5.8 EXP-5 session logs
   - `~/SH_logs/v1.5.9/`: 10 v1.5.9 Illuminated technique test logs (4 truncated)
   - `~/SH_logs/permutation/`: 23 v1.3.x baseline logs
4. Naming convention: `CharName_HHMM.log` (start time of session, 24h format).
5. Verify completeness. Each log file should contain both `== SigilHarvest v<X> ==` (banner) and `== End SigilHarvest v<X> ==` (exit summary). If the exit summary is missing, the session was interrupted.
6. Run the analyzer. Write a Ruby script that iterates over the log directory and calls the analyzer's parse methods. Example pattern:

   ```ruby
   require_relative 'sigilharvest_analyzer'

   Dir.glob('/Users/grocha/SH_logs/v1.5.0/*.log').each do |f|
     analyzer = SigilHarvestAnalyzer.new
     analyzer.parse_file(f)
     # ... aggregate results
   end
   ```
Gotchas:
- The `find` command can hang or time out on large log directory trees. Use explicit `ls` with known directory paths instead.
- Some sessions show techniques as `["Inspired", "and Enlightened"]`: the "and" isn't stripped by the split regex. Cosmetic only; it does not affect analysis.
- The `session_index` field tracks which session a run belongs to within a multi-session file. When analyzing across files, track the filename alongside each run for per-character breakdown.
Both groups: Shard / permutation / target=90 / 60 minutes / Inspired+Enlightened techniques. 10 sessions each, across 10 different characters.
| Metric | v1.2.0 (10 sessions) | v1.4.1 (10 sessions) |
|---|---|---|
| Sigils worked | 339 | 194 |
| Avg precision | 54.3 | 47.8 |
| Max precision | 86 | 86 |
| Scribed (>=90) | 1 | 0 |
| Sigils >= 80 | 9 (2.7%) | 2 (1.0%) |
| Mishap rate | 49.9% | 76.8% |
| Avg danger at mishap | 8.2 | 12.1 |
| Minutes per 80+ sigil | 67 | 300 |
v1.2.0 outperforms v1.4.1 on every metric.
- v1.4.1's filtering over-restricted action selection. The resource floor check, formidable blocking, and high_target mode collectively forced more refreshes and lower precision outcomes. The simpler v1.2.0 selection strategy works better.
- v1.4.1 pushes danger to the ceiling, then mishaps. Danger-at-mishap distribution: v1.4.1 clusters at 17-18 (mean 12.1), while v1.2.0 keeps danger distributed (mean 8.2) and reaches higher precision before mishapping.
- The move budget check prematurely terminated 37.5% of v1.2.0 sigils at avg precision 55.3. These sigils had remaining iterations that could have gained more precision. Removed in v1.5.0 (restored in v1.5.2 after EXP-1 showed the removal was harmful).
- Mishaps are stochastic at all danger levels. Conservative danger-based strategy switching provides less value than expected. Aggressive play that reaches high precision quickly (before a mishap occurs) appears more effective.
| Metric | Value |
|---|---|
| Minutes per 80+ sigil | ~241 |
| Compared to v1.2.0 (75 min reported, 67 min measured) | 3.2x regression |
| Version | Algorithm | Key Changes |
|---|---|---|
| v1.2.0 | Original "best" | Risk-based selection, allows formidable, serial waitfor, move budget check |
| v1.3.2 | Modified | Various changes from upstream; 3.2x regression vs v1.2.0 |
| v1.4.0 | Redesigned | Formidable blocking, resource floor, high_target mode, scribe at iter 8, skip threshold 13, no move budget |
| v1.4.1 | Patched v1.4.0 | Added sigil_vanished/combat_distracted stop reasons, per-sigil elapsed time |
| v1.5.0 | Reverted to v1.2.0 | v1.2.0 algorithm + v1.4.1 logging infrastructure + move budget removed |
| v1.5.1 | Patch | Added validate_tools pre-flight check (burin/bag/settings + inventory) |
| v1.5.2 | Baseline candidate | Restored move budget check (v1.5.0 data proved removal harmful) |
| v1.5.3 | Clean baseline | Version tick only โ separates clean runs from v1.5.2 burin-retry noise |
| v1.5.4 | EXP-2 (kept) | Accept trivial actions (Path 1 change) + burin validate_tools fix |
| v1.5.5 | EXP-3 (reverted) | Prefer repair over trivial/straightforward precision actions |
| v1.5.6 | EXP-4 (reverted) | Composite resource health guard (skip actions on near-depleted resources) |
| v1.5.7 | Baseline restore | EXP-4 reverted, back to v1.5.4 algorithm |
| v1.5.8 | EXP-5 (kept) | Tighten move budget formula (15→13 precision/move) |
| v1.5.9 | Technique test | Illuminated Sigil Comprehension enabled (no algorithm change) |
| v1.5.10 | EXP-6 | Fix difficulty ordering, filter ACTION verb |
| v1.5.11 | EXP-10+11 (reverted) | Raise skip threshold < 13 + velocity bail-out < 4/iter after 5 |
| v1.5.12 | EXP-10 (kept) | Skip threshold < 13 (standalone retest) |
| v1.5.13 | EXP-7 (kept) | Difficulty-based action selection (decouple risk from cost) |
| v1.5.14 | EXP-12 (reverted) | Loosen viability margin (accept margin=0 for challenging+) |
| v1.5.15 | EXP-9 (kept) | Recalibrate resource exhaustion coefficient (2.25→1.75) |
| v1.5.16 | EXP-12r (reverted) | Loosen viability margin (retest with corrected baseline) |
| v1.5.17 | TECH-AWK (kept) | Awakened Sigil Comprehension: >=80 improvement observed, mechanism unknown |
| v1.5.18 | EXP-13 (revert) | Remove iteration cap + move budget; resource-only bail-out; skip <15 for target 90 |
| v1.5.19 | EXP-14 (revert) | Equalize action costs per Urbaj (all cost labels = 1) โ mishap rate +11.4pp vs EXP-13 base |
| v1.5.20 | Baseline restore | Revert to v1.5.17 algorithm + C1 fix (@actually_scribed). Missing EXP-9 resource check. |
| v1.5.21 | Corrected baseline | Restores EXP-9 resource exhaustion check. Baseline confirmed: all metrics match v1.5.17. |
| v1.5.22 | EXP-15 (revert) | Align move budget max with iteration cap (14→15). Mishap +11.7pp (Z=2.67), 0 scribes, >=80 7→6. |
| v1.5.23 | EXP-16 (reverted) | Tighten resource exhaustion coefficient (1.75→1.5). Mathematically impossible at target 90. |
| v1.5.24 | EXP-14r (kept) | Cost equalization clean retest ({1,2,3}→{1,1,1}). Neutral (mishap -1.6pp, n.s.). |
| v1.5.25 | EXP-17 (kept) | Resource-aware tiebreaker on tied difficulty+cost. Prefer most-available resource. |
| v1.5.26 | D7 fix (infra) | Unconditional repair logging. Repairs=0 in 2606 iters. Phase 3/4 CLOSED. |
| v1.5.27 | EXP-18 (kept) | Min difficulty threshold. Skip trivial, refresh for better menu. Avg gain 7.18→8.54. |
| v2.0.0 | Release | v1.5.27 promoted to v2.0.0 for upstream PR. 22 experiments validated, 100% decision agreement. |
Common infrastructure from v1.4.1, present in all v1.5.x versions:
- Time-based sessions with `@time_limit`, `time_expired?`
- Structured logging: `log_startup_banner`, `log_sigil_summary`, `log_exit_summary`
- Per-sigil timing (`elapsed=` and `total=` in summary line)
- Technique detection via `get_techniques` / `format_techniques`
- Analyzer-compatible output format
- Pre-flight tool validation (`validate_tools`)
- Scribe counting (v1.5.18): `scribe_sigils` now counts individual scribes and logs `"Scribes: N"`. The analyzer parses this via the `SCRIBES_COUNT` regex and populates the `scribe_count` field on `SigilRun`. It falls back to counting raw `SCRIBE_SUCCESS` lines for older logs. This enables per-sigil scribe yield tracking for Awakened analysis.
- Terminology rename (v1.5.18→v1.5.19): `scroll_count` → `scribe_count` across all files (`sigilharvest.lic`, `sigilharvest_analyzer.rb`, `reanalyze_all.rb`). "Scrolls scribed" → "Scribes" in log output. Aligns terminology with game mechanics (scribing, not scrolling).
- Banner cleanup (v1.5.18): Removed belt line from startup banner.
- Cross-version analysis (v1.5.17+): `reanalyze_all.rb` uses `all_session_runs(version)` to handle merged log files with multiple sessions of the same version (e.g., batch 1 + batch 2 in v1.5.17). Tracks scribe yield metrics: total scribes, avg scribes/sigil, scribes/session.
- Session splitter (v1.5.18+): `session_splitter.rb` extracts SigilHarvest sessions from full game logs. It handles the key problem of sessions split across two log files from the same character (reconnects mid-session): it groups files by character, sorts them chronologically, and reads them as a virtual concatenation so split sessions are seamlessly joined. Usage: `ruby session_splitter.rb --version 1.5.18 path1.log path2.log ...`. Output goes to `~/SH_logs/vX.Y.Z/DR-CharName_timestamp.log`. Options: `--dry-run`, `--output DIR`, `--keep-temp`, `--version VER`.
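The splitter's approach can be sketched in three steps: group by character, read in chronological order as one stream, and slice from banner to end marker. This is a simplified sketch, not `session_splitter.rb` itself. It assumes two things: the character name comes from the `DR-<CharName>/` directory in the path, and timestamped filenames sort chronologically. The method names are hypothetical.

```ruby
# Group log paths by character and sort each group by (timestamped) filename.
def group_by_character(paths)
  paths.group_by { |p| p[%r{DR-([^/]+)/}, 1] || 'unknown' }
       .transform_values { |group| group.sort_by { |p| File.basename(p) } }
end

# "Virtual concatenation": one character's files, in order, as a single line stream.
def read_virtual(files)
  files.flat_map { |f| File.readlines(f, chomp: true) }
end

# Slice out every session of one version, from banner to end marker.
# A session with no end marker (truncated log) is still returned.
def extract_sessions(lines, version)
  start_marker = "== SigilHarvest v#{version} =="
  end_marker = "== End SigilHarvest v#{version} =="
  sessions = []
  current = nil
  lines.each do |line|
    if line.include?(start_marker)
      sessions << current if current # previous session never closed
      current = [line]
    elsif current
      current << line
      if line.include?(end_marker)
        sessions << current
        current = nil
      end
    end
  end
  sessions << current if current
  sessions
end
```

Because `read_virtual` joins files before slicing, a session that starts in one file and ends in the next comes out as a single session.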
Sigils appear based on city, sigil type, and season. Room lists are loaded from
`data/sigils.yaml`, keyed by `[City][SigilType][Season]`.
Cities: Crossing, Riverhaven, Shard. Seasons: spring, summer, autumn, winter.
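Below is a sketch of the `[City][SigilType][Season]` lookup. The YAML excerpt is hypothetical: it assumes the file is nested mappings ending in lists of room IDs, and the room numbers are invented. Check `data/sigils.yaml` for the real layout.

```ruby
require 'yaml'

# Hypothetical excerpt; real room IDs and layout may differ.
SIGILS_YAML = <<~YAML
  Shard:
    permutation:
      winter: [1001, 1002, 1003]
      spring: [1004, 1005]
  Crossing:
    permutation:
      winter: [2001]
YAML

# Rooms for one city/sigil/season; any missing key yields an empty list.
def rooms_for(data, city, sigil, season)
  data.dig(city, sigil, season) || []
end

data = YAML.safe_load(SIGILS_YAML)
rooms_for(data, 'Shard', 'permutation', 'winter') # => [1001, 1002, 1003]
```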
- Stacks of 25. Auto-buys when below stock level (default 25).
- Prices: Crossing=125kr, Riverhaven=100lr, Shard=90dok.
For Trader guild members at circle 65+: the script uses `speculate luck` on the first iteration when
starting precision is >= 14 (line 422). This may improve RNG outcomes.
| Precision | Description | Clarity | Description |
|---|---|---|---|
| 0-29 | broad strokes | 85-89 | exquisite |
| 30-49 | thick strands | 90-94 | flawless |
| 50-69 | many fibers | 95-97 | flawless |
| 70-89 | thin lines | 98-99 | immaculate |
| 90+ | (scribing target) | | |
```ruby
@mishaps = /Chills creep down your spine|About the area you wander|A sudden sneeze|You lose track|You prepare yourself for continued exertion|You are too distracted/
```

Every change to `sigilharvest.lic` must include a version bump to the `VERSION`
constant (line 6). The analyzer parses the version from log banners, so version changes are
how we distinguish data collected under different script behavior.
Patch bump (e.g., 1.5.0 → 1.5.1): Bug fixes, new metadata/logging, minor tuning of thresholds or constants, adding new banner fields.
Minor bump (e.g., 1.5.x → 1.6.0): Algorithm changes that affect sigil outcomes (action selection logic, danger thresholds, skip filters, resource management), new game command integrations, structural refactors.
Major bump (e.g., 1.x → 2.0.0): Fundamental redesign of the improvement loop, breaking changes to log format that require analyzer updates, new operating modes.
All algorithm changes require empirical validation:
- Run 10 sessions (same params, same techniques, Inspired+Enlightened) with the change
- Analyze with `sigilharvest_analyzer.rb`; filter by version when log files contain multiple sessions
- Compare head-to-head against current baseline
- Key metric: minutes per 80+ precision sigil (lower is better)
- Supporting metrics: mishap rate, avg precision, sigils worked per session
- Only one algorithm change per version โ isolate variables
- If worse: revert the change, restore baseline, document the result
- If better: the new version becomes the baseline for the next experiment
When writing analysis scripts, use these patterns (learned from prior bugs):
- Filter by version: iterate `parser.sessions.each_with_index` and skip sessions where `session.version != target`.
- Classify skipped sigils by `iterations == 0` (not `result == 'SKIPPED'`; the parser sets result to `"FAILED"` for all).
- Classify mishaps by `stop_reason == :mishap` (not `result == 'mishap'`).
- Per-character breakdown: extract the character name from the filename and track the file alongside each run.
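The classification patterns above fit in one small summarizer. It uses a stand-in `Run` struct instead of the analyzer's `SigilRun`, keeps only the fields it needs, and carries the version on each run rather than on the session; all of these are simplifications. The mishap-rate denominator (worked sigils) is also an assumption. Note that `result` is deliberately ignored, since the parser reports `"FAILED"` regardless.

```ruby
# Stand-in for SigilRun, reduced to the fields used here (assumption).
Run = Struct.new(:version, :result, :iterations, :stop_reason, :final_precision, keyword_init: true)

def summarize(runs, target_version, total_minutes)
  runs = runs.select { |r| r.version == target_version }      # filter by version first
  skipped, worked = runs.partition { |r| r.iterations.zero? } # not result == 'SKIPPED'
  mishaps = worked.count { |r| r.stop_reason == :mishap }     # not result == 'mishap'
  high = worked.count { |r| r.final_precision >= 80 }
  {
    worked: worked.size,
    skipped: skipped.size,
    mishap_rate_pct: worked.empty? ? 0.0 : (100.0 * mishaps / worked.size).round(1),
    min_per_80: high.zero? ? nil : (total_minutes.to_f / high).round
  }
end
```

With v1.5.4's 546 total minutes and 5 sigils at 80+, `min_per_80` comes out to 109, matching the figure reported for that version.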
- Test params: Shard / permutation / target=90 / 60 minutes / Inspired+Enlightened
- Sample size: 10 sessions per experiment (9 minimum if a character has a config issue)
- Baseline: v1.5.2 (= v1.2.0 algorithm + v1.4.1 infrastructure + tools check)
- Procedure: One algorithm change per version. Run 10 sessions. Analyze. Decide keep/revert.
- Log storage: `~/SH_logs/v<version>/`
v1.2.0 algorithm with EXP-2 (accept trivial actions), move budget, v1.4.1 logging infrastructure, pre-flight tool validation with multi-attempt burin resolution.
v1.5.4 measured performance (9 sessions, 391 worked, 561 skipped):
- Avg precision: 52.4 | Max: 85 | Scribed: 1
- Sigils >= 80: 5 (1.3%) | Mishap rate: 55.0%
- Avg danger at mishap: 8.5 | Total minutes: 546
- Worked/session: 43.4 | >=80/session: 0.6
- Min per 80+ sigil: 109
v1.5.3 previous baseline (10 sessions, 335 worked, 495 skipped):
- Avg precision: 53.5 | Max: 90 | Scribed: 2
- Sigils >= 80: 4 (1.2%) | Mishap rate: 52.8%
- Worked/session: 33.5 | >=80/session: 0.4
- Min per 80+ sigil: 137
Note: Min-per-80+ has high variance at these sample sizes (~1-3% of sigils reach 80). Per-session normalized metrics (worked/session, >=80/session) are more stable.
- Hypothesis: The move budget formula `(14 - iters) * 15 < (target - prec - 5)` bails out too early. 37.5% of v1.2.0 sigils hit `moves_exhausted` at avg precision 55.3. Removing it lets those sigils play their full iterations.
- Change: Deleted the move budget check entirely.
- Result: WORSE. Mishap rate jumped 49.9% → 74.8%. Min per 80+ sigil: 58 → 114. Without the early bail-out, doomed sigils kept playing, accumulated danger, and mishapped. The move budget was protective, not wasteful.
- Action: REVERTED in v1.5.2. Move budget restored.
| Metric | v1.2.0 baseline | v1.5.0 (no budget) | Delta |
|---|---|---|---|
| Worked | 339 | 333 | -2% |
| Avg precision | 54.3 | 54.6 | +0.6% |
| Sigils >= 80 | 9 (2.7%) | 5 (1.5%) | -44% |
| Mishap rate | 49.9% | 74.8% | +50% |
| Min per 80+ | 58 | 114 | +97% |
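The move-budget check quoted above can be read as a small predicate. The sketch below is illustrative only. `move_budget_exhausted?` and its keyword arguments are hypothetical names, and `target` stands for the requested precision (the script's own variable is `precision`). The example shows how the per-move coefficient shifts the bail-out point for the same sigil.

```ruby
# Hypothetical standalone form of the move-budget bail-out; not the script's own method.
# remaining_moves * per_move is an optimistic estimate of the precision still reachable.
# The sigil is abandoned when that estimate cannot cover the gap to (target - 5).
def move_budget_exhausted?(iterations:, sigil_precision:, target:, per_move: 15, max_moves: 14)
  remaining_moves = max_moves - iterations
  remaining_moves * per_move < (target - sigil_precision - 5)
end

# A sigil at precision 45 after 11 iterations, target 90 (gap = 40):
move_budget_exhausted?(iterations: 11, sigil_precision: 45, target: 90)               # => false (3 * 15 = 45)
move_budget_exhausted?(iterations: 11, sigil_precision: 45, target: 90, per_move: 13) # => true  (3 * 13 = 39)
```

`per_move: 13` corresponds to the EXP-5 formula. `max_moves` is the constant 14 that EXP-15 later tried raising to 15.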
- Hypothesis: 37% of iterations are refreshes (zero precision gain). The viability filter rejects trivial actions (difficulty=1) unless danger > 17 or within 5 of target. A trivial action (+2 avg) is always better than a refresh (+0).
- Change: In `precision_action_viable?`, Path 1 simplified from `margin > 1 && (difficulty > 1 || @danger_lvl > 17 || @sigil_precision >= (precision - 5))` to just `margin > 1`. Also includes the multi-attempt burin resolution fix in `validate_tools`.
- Sessions: 9 (Barrask, Byd, Fidon, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve)
- Logs: `~/SH_logs/v1.5.4/`
| Metric | v1.5.3 baseline | v1.5.4 (EXP-2) | Delta |
|---|---|---|---|
| Sessions | 10 | 9 | -1 |
| Worked | 335 | 391 | +56 |
| Worked/session | 33.5 | 43.4 | +29.6% |
| Avg precision | 53.5 | 52.4 | -1.1 |
| Max precision | 90 | 85 | -5 |
| Avg iterations | 10.8 | 10.7 | -0.1 |
| Sigils >= 80 | 4 (1.2%) | 5 (1.3%) | +1 |
| >=80/session | 0.4 | 0.6 | +50% |
| Mishap rate | 52.8% | 55.0% | +2.2pp |
| Avg danger@mishap | 8.0 | 8.5 | +0.5 |
| Min per 80+ | 137 | 109 | -20% |
| Scribed (>=90) | 2 | 1 | -1 |
- Verdict: KEEP. Throughput up ~30%, efficiency up ~20%, quality flat within noise. The extra sigils worked per session and the improved min-per-80+ more than compensate for the marginal precision delta (-1.1), which is within statistical variance.
- Action: v1.5.4 becomes new baseline for EXP-3.
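For comparison, the two versions of Path 1 can be written side by side. This is a sketch under assumptions. The method names are hypothetical, and `margin` is taken to be contest stat minus action difficulty (the definition EXP-12 uses). The example shows the case EXP-2 targets: a trivial action that v1.5.3 rejected in favor of a refresh.

```ruby
# Hypothetical extraction of Path 1 of precision_action_viable?, before and after EXP-2.
# margin is assumed to be (contest stat - action difficulty).

# v1.5.3: a trivial action (difficulty 1) needs high danger or a near-target sigil to pass.
def path1_viable_before?(margin:, difficulty:, danger:, sigil_precision:, target:)
  margin > 1 && (difficulty > 1 || danger > 17 || sigil_precision >= target - 5)
end

# v1.5.4 (EXP-2): the margin alone decides.
def path1_viable_after?(margin:)
  margin > 1
end

# Trivial action, comfortable margin, low danger, far from target:
path1_viable_before?(margin: 2, difficulty: 1, danger: 5, sigil_precision: 50, target: 90) # => false (refresh instead)
path1_viable_after?(margin: 2)                                                            # => true
```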
- Hypothesis: Taking a trivial (+2) or straightforward (+4) action when a repair could enable a difficult (+13) action next turn is suboptimal. Expected value of repair → difficult is ~13 over 2 turns (6.5/turn) vs trivial's 2/turn.
- Change: Between Phase 2 and Phase 3, if selected precision action has difficulty <= 2 and a repair action is available that would enable a harder action, prefer the repair. Same guards as Phase 4: repair budget (< 2 without override), danger <= 18.
- Sessions: 10 (Barrask, Byd, Fidon, Gnarta, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve)
- Logs: `~/SH_logs/v1.5.5/`
| Metric | v1.5.4 baseline | v1.5.5 (EXP-3) | Delta |
|---|---|---|---|
| Sessions | 9 | 10 | +1 |
| Worked | 391 | 436 | +45 |
| Worked/session | 43.4 | 43.6 | +0.2 |
| Avg precision | 52.4 | 51.9 | -0.5 |
| Max precision | 85 | 85 | 0 |
| Avg iterations | 10.7 | 10.8 | +0.1 |
| Sigils >= 80 | 5 (1.3%) | 9 (2.1%) | +4 |
| >=80/session | 0.6 | 0.9 | +0.3 |
| Mishap rate | 55.0% | 49.1% | -5.9pp |
| Avg danger@mishap | 8.5 | 8.3 | -0.2 |
| Min per 80+ | 109 | 121 | +12 |
| Scribed (>=90) | 1 | 0 | -1 |
- Verdict: REVERT. The mishap rate improvement (-5.9pp) is borderline significant (z≈1.7, p≈0.09). However, the `resource_exhausted` stop reason jumped from 2 to 8, and min-per-80+ regressed from 109 to 121. The fundamental assumption is flawed: a repair doesn't guarantee the same difficult action next turn because menus are re-rolled each iteration.
- Action: REVERTED in v1.5.6. EXP-3 code removed, baseline restored to v1.5.4.
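The z and p values quoted for EXP-3 are consistent with a standard two-proportion z-test on per-sigil mishap rates. The sketch below assumes that test; this page doesn't say which test the analysis used. It plugs in the rounded rates and worked counts from the table, so it reproduces the quoted figures only approximately. `Math.erfc` supplies the two-sided normal tail probability.

```ruby
# Two-proportion z-test with a pooled standard error, applied to mishap rates.
# Returns [z, two_sided_p].
def two_proportion_z_test(rate_a, n_a, rate_b, n_b)
  pooled = (rate_a * n_a + rate_b * n_b) / (n_a + n_b).to_f
  se = Math.sqrt(pooled * (1 - pooled) * (1.0 / n_a + 1.0 / n_b))
  z = (rate_a - rate_b) / se
  p = Math.erfc(z.abs / Math.sqrt(2)) # two-sided tail of the standard normal
  [z, p]
end

# v1.5.4 (55.0% of 391 worked) vs v1.5.5 (49.1% of 436 worked)
z, p = two_proportion_z_test(0.550, 391, 0.491, 436)
puts format('z=%.2f p=%.3f', z, p)
```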
- Hypothesis: Resource collapse (a single resource hitting 0) causes mishaps even at danger 0. Adding a guard that skips actions consuming a near-depleted resource (<=1 star) could prevent these collapses.
- Change: In Phase 2 action selection, add `next if contest_stat <= 1` before considering any action.
- Sessions: 10 (Barrask, Byd, Fidon, Gnarta, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve)
- Logs: `~/SH_logs/v1.5.6/`
| Metric | v1.5.4 baseline | v1.5.6 (EXP-4) | Delta |
|---|---|---|---|
| Sessions | 9 | 10 | +1 |
| Worked | 391 | 346 | -45 |
| Worked/session | 43.4 | 34.6 | -20% |
| Avg precision | 52.4 | 51.1 | -1.3 |
| Max precision | 85 | 84 | -1 |
| Avg iterations | 10.7 | 10.9 | +0.2 |
| Sigils >= 80 | 5 (1.3%) | 2 (0.6%) | -3 |
| >=80/session | 0.6 | 0.2 | -67% |
| Mishap rate | 55.0% | 47.4% | -7.6pp |
| Avg danger@mishap | 8.5 | 8.1 | -0.4 |
| Min per 80+ | 109 | 302 | +177% |
| Scribed (>=90) | 1 | 0 | -1 |
- Verdict: REVERT. Mishap rate improved (-7.6pp) but at severe cost. Throughput collapsed (-20% worked/session), quality collapsed (-67% >=80/session), and min-per-80+ nearly tripled (109→302). The guard blocks too many actions, forcing iterations into refreshes (zero precision gain). `moves_exhausted` rose (133→142) despite fewer total sigils. The existing resource floor (`contest_stat <= risk`) already handles this more surgically.
- Action: REVERTED in v1.5.7. EXP-4 code removed, baseline restored to v1.5.4.
- Hypothesis: The current formula uses 15 precision/move which is optimistic (difficult actions average 13.2). The 37.5% bail-out rate at avg precision 55.3 suggests the formula is roughly calibrated, but tightening to 13 precision/move might bail out slightly earlier on truly hopeless sigils, saving time for new sigils.
- Change: `(14 - @num_iterations) * 13 < (precision - @sigil_precision - 5)`
- Sessions: 10 (Barrask, Byd, Fidon, Gnarta, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve)
- Logs: `~/SH_logs/v1.5.8/`
| Metric | v1.5.4 baseline | v1.5.8 (EXP-5) | Delta |
|---|---|---|---|
| Sessions | 9 | 10 | +1 |
| Worked | 391 | 473 | +82 |
| Worked/session | 43.4 | 47.3 | +9% |
| Avg precision | 52.4 | 51.6 | -0.8 |
| Max precision | 85 | 88 | +3 |
| Avg iterations | 10.7 | 10.7 | 0.0 |
| Sigils >= 80 | 5 (1.3%) | 7 (1.5%) | +2 |
| >=80/session | 0.6 | 0.7 | +0.1 |
| Mishap rate | 55.0% | 47.4% | -7.6pp |
| Avg danger@mishap | 8.5 | 8.1 | -0.4 |
| Min per 80+ | 109 | 87 | -20% |
| Scribed (>=90) | 1 | 2 | +1 |
- Verdict: KEEP. Mishap rate dropped 7.6pp (55.0% → 47.4%) as the tighter budget bails out earlier, converting would-be mishaps into `moves_exhausted` exits (133→189). Throughput up 9% (47.3 worked/session). Efficiency improved 20% (87 min per 80+ sigil vs 109). Top-end quality slightly improved (max 88 vs 85, 2 scribes vs 1). The avg precision delta (-0.8) is within noise, and expected, since bailing earlier on low-potential sigils lowers the average. No regression on any key metric.
- Action: KEPT. v1.5.8 becomes new baseline for technique tests.
EXP-13 (v1.5.18): complete, REVERT. 11 sessions analyzed. Removed iteration cap + move budget, resource-only bail-out, skip <15 for target 90. All key metrics regressed: worked/session 25.2→9.2, >=80/session 0.64→0.18, scribes 3→0, mishap rate 42%→63.4%. The aggressive skip threshold (<15) eliminated too many sigils, and removing the iteration cap increased mishap exposure without producing higher precision outcomes. Post-simulation analysis (SIM-7) additionally proved skip <15 is catastrophic: ALL 3 v1.5.17 scribes started at precision 13. See full results and simulation analysis below.
EXP-14 (v1.5.19): complete, REVERT. 11 sessions analyzed. Equalized action costs (all cost labels = 1) per Urbaj's observation that difficulty determines resource cost. Ran on the EXP-13 code base (inherits removed iteration cap, resource-only bail-out, skip <15). Results vs v1.5.18 (isolating cost equalization): mishap rate 63.4%→74.8% (+11.4pp), 0 real scribes (1 C1 fake), >=80/session 0.18→0.27 (noise). Cost equalization removed the disincentive for dangerous actions, causing more mishaps without compensating gains. See full results below.
v1.5.20: complete, partial baseline. 11 sessions. Reverted EXP-13+14 to the v1.5.17 algorithm. C1 bug fixed (`@actually_scribed` flag). D7 analyzer fix deployed. However, the EXP-9 resource exhaustion check was accidentally dropped during the revert: 0% `resource_exhausted` vs 12.1% in the v1.5.17 baseline. Mishap rate 46.9% (vs 42.0% baseline), `moves_exhausted` 46.1% (vs 37.7%; it absorbed the missing resource exits). 1 real scribe (Mahtra, prec=88, 2 scrolls). C1 fix validated: zero fake SCRIBEDs. D7 fix validated: code correct but 0 repairs in sample.
v1.5.21: complete, BASELINE CONFIRMED. 11 sessions. Restores the EXP-9 resource exhaustion check. All metrics match v1.5.17 within normal variance:
- Worked/session: 23.8 (vs 25.2), within variance
- \>=80/session: 0.64 (vs 0.64), exact match
- Mishap rate: 40.8% (vs 42.0%), match
- `resource_exhausted`: 13.4% (vs 12.1%), restored (was 0% in v1.5.20)
- `moves_exhausted`: 39.7% (vs 37.7%), match
- 0 scribes in 262 worked (expected ~0.5 at baseline rate; within Poisson variance)
- 7 sigils >=80: 4 stopped by `moves_exhausted` at 80-84, 3 by mishap at 82-87
EXP-15 (v1.5.22): complete, REVERT. 11 sessions. Aligned move budget max with iteration cap (14→15). Mechanically it worked: `moves_exhausted` 39.7%→18.1%, 4 sigils reached `iteration_cap` (3 at precision 83, gap=2 from scribe). But the cost was severe: mishap rate 40.8%→52.5% (+11.7pp, Z=2.67, p<0.01), `sigil_vanished` 6.1%→12.4%. The ~57 freed sigils mostly ended in mishaps (29) or vanishes (16). >=80 count 7→6 (noise). 0 scribes. Key insight: the old formula's off-by-one was functioning as a safety guardrail. Extending sigils into the high-danger zone costs more than it gains.
v1.5.26: complete, Phase 3 CLOSED. 11 sessions, 255 worked sigils. D7 fix validated: repair logging is unconditional, but zero repairs in 2606 iterations. The repair code path never triggers because difficulty-first selection (v1.5.20+) always finds a viable precision action. Historical v1.5.17 data (5 repairs) shows repairs are actively harmful (60% mishap rate). Phases 3 and 4 (repair experiments) are closed: repairs are a non-factor. Confirmed neutral vs v1.5.25: mishap +8.1pp (n.s., p=0.06), 1 real scribe (Fidon, 3 scrolls).
v1.5.27: complete, KEPT. EXP-18: minimum difficulty threshold. Skip trivial-difficulty precision actions and refresh for a better menu. Avg gain 7.18→8.54 (+1.36, exceeding v1.2.0's 8.39). Trivial-range gains 25.8%→3.6%. Effective gain/iter 3.67→3.75. 60+ rate +6.9pp. Resource exhausted halved (11.8%→5.9%). Most effective change since EXP-6.
Status: v2.0.0 released. Promoted to upstream PR as sigilharvest_overhaul.
- Gain distribution now matches v1.2.0 โ gain optimization lever exhausted.
- Remaining levers: mishap reduction (1.3x at -50%), moves_exhausted optimization.
- Retrospective simulation: 100% decision agreement (15/15). Net: v1.2.0 2.9% → v2.0.0 3.0%.
- Sessions: 10 (all complete 60min)
- Logs: `~/SH_logs/v1.5.10/`
| Metric | v1.5.8 (baseline) | v1.5.10 (EXP-6) | Delta |
|---|---|---|---|
| Worked | 473 | 432 | -41 |
| Avg precision | 51.6 | 50.8 | -0.8 |
| Max precision | 88 | 85 | -3 |
| Sigils >= 80 | 7 (1.5%) | 3 (0.7%) | -0.8pp |
| Mishap rate | 47.4% | 37.5% | -9.9pp |
| moves_exhausted | 189 | 224 | +35 |
| Min per 80+ | 87 | 202 | +115 |
| Refresh rate | 40.6% | 44.2% | +3.6pp |
EXP-6 verification:
- ACTION verb usage: v1.5.8=292, v1.5.10=0. Filter working perfectly.
- Difficulty ordering confirmed: median gains trivial(2) < straight(5) < formidable(7) < challenging(9) < difficult(13).
- Per-difficulty gains unchanged: the fixes affect selection, not outcomes per action.
- Verdict: KEEP. Both bug fixes confirmed working. Mishap rate dropped 9.9pp (largest single-experiment improvement). Sigils that previously mishapped now exhaust the move budget instead. The 80+ count dip (7→3) is statistically insignificant at these sample sizes. The fixes are objectively correct and required for all downstream experiments.
- Sessions: 10 (all complete 60min)
- Logs: `~/SH_logs/v1.5.11/`
| Metric | v1.5.10 (baseline) | v1.5.11 (EXP-10+11) | Delta |
|---|---|---|---|
| Worked | 432 | 298 | -134 |
| Skipped | 700 | 1092 | +392 |
| Skip rate | 61.8% | 78.6% | +16.8pp |
| Avg precision | 50.8 | 38.2 | -12.6 |
| Max precision | 85 | 82 | -3 |
| Avg iterations | 10.8 | 7.0 | -3.8 |
| Scribed (>=90) | 1 | 0 | -1 |
| Sigils >= 80 | 3 (0.7%) | 1 (0.3%) | -2 |
| Mishap rate | 37.5% | 13.4% | -24.1pp |
| >=80/session | 0.3 | 0.1 | -0.2 |
| Min per 80+ | 202 | 602 | +400 |
- EXP-10 (skip threshold): 1035 triggers. Zero false positives on baseline data. Mechanically correct, but effect swamped by EXP-11.
- EXP-11 (velocity bail-out): 220 bail-outs out of 298 worked sigils (74%). The simulation predicted ~3.7% (16/432). Root cause: the simulation only checked velocity at iteration 5, while the live code checked at every iteration >= 5. A sigil passing at iter 5 can dip below 4.0/iter at iters 6-12, triggering late bail-outs. The `unknown` stop reason (221 sigils, avg_prec=31.8) = velocity bail-outs.
- Low mishap rate is misleading: Sigils are bailed before reaching high enough danger/precision to mishap. Not a real safety improvement.
- Verdict: REVERTED. EXP-11 catastrophically over-triggered due to flawed simulation methodology. EXP-10 (skip threshold alone) remains viable for standalone testing. EXP-11 killed: the continuous velocity check is fundamentally broken. A single-check-at-iter-5 variant could be revisited, but the effect size is small (16/432 = 3.7%, all avg final 38) and would need fresh simulation.
- Action: Reverted to v1.5.10 algorithm. Version bumped to v1.5.12. 121 tests passing.
- Lesson: Always simulate the exact check logic (every-iteration vs single-check).
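The EXP-11 failure is easy to reproduce on a toy trajectory. This sketch assumes that velocity means precision gained per iteration since the sigil was found, checked against the 4.0/iter floor. The method names and the precision history are made up for illustration; they are not taken from logs.

```ruby
VELOCITY_FLOOR = 4.0 # precision per iteration, as in EXP-11

# Velocity after `iteration` iterations (1-based), measured from the starting precision.
def velocity(start, history, iteration)
  (history[iteration - 1] - start).fdiv(iteration)
end

# What the simulation modeled: a single check at iteration 5.
def bails_single_check?(start, history)
  history.size >= 5 && velocity(start, history, 5) < VELOCITY_FLOOR
end

# What the live code did: a check at every iteration >= 5.
def bails_continuous?(start, history)
  (5..history.size).any? { |i| velocity(start, history, i) < VELOCITY_FLOOR }
end

start = 10
history = [16, 22, 27, 32, 31, 33, 34] # precision after each iteration
bails_single_check?(start, history) # => false (iteration 5: 21 / 5 = 4.2)
bails_continuous?(start, history)   # => true  (iteration 6: 23 / 6, about 3.83)
```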
- Background: EXP-10 was bundled with EXP-11 in v1.5.11 but EXP-11 catastrophically over-triggered, swamping EXP-10's effect. Retested standalone after reverting EXP-11.
- Change: Raise skip threshold for target >= 80 from `< 10` to `< 13`.
- Sessions: 10 (Barrask, Byd, Fidon, Gnarta, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve)
- Logs: `~/SH_logs/v1.5.12/`
| Metric | v1.5.10 (baseline) | v1.5.12 (EXP-10) | Delta |
|---|---|---|---|
| Sessions | 10 | 10 | 0 |
| Worked | 432 | 326 | -106 |
| Skipped | 700 | 1169 | +469 |
| Skip rate | 61.8% | 78.2% | +16.4pp |
| Avg precision | 50.8 | 48.3 | -2.5 |
| Max precision | 85 | 84 | -1 |
| Sigils >= 80 | 3 (0.7%) | 3 (0.9%) | 0 |
| >=80/session | 0.3 | 0.3 | 0.0 |
| Mishap rate | 37.5% | 37.1% | -0.4pp |
| Min per 80+ | 202 | 201 | -1 |
| Scribed (>=90) | 1 | 0 | -1 |
| Encountered/session | 113.2 | 149.5 | +36.3 |
EXP-10 verification:
- Skip triggers: 875 "below 13" in v1.5.12 vs 593 "below 10" in baseline. Working correctly.
- False positives: 0 of 229 baseline sigils starting at 10-12 reached 80+. Zero FPs confirmed across all cumulative data (1050+ eligible sigils).
- Time saved: 229 baseline sigils × 10.8 avg iters = ~2464 iterations eliminated per 10 sessions.
- Encountered +36 more sigils per session from faster skipping.
- Avg precision drop (-2.5): Unexpected but explained by session variance. The 13-14 band dropped from 51.1 to 47.0 (removing 10-12 starters should have raised the average). Natural character/session variation, not a threshold effect.
- Verdict: KEEP. Neutral on the key metric (>=80/session = 0.3 in both). Mechanically correct with zero false positives across all data ever collected. Eliminates ~246 wasted iterations (~23 sigils) per session on provably unproductive sigils. Safe, conservative filter.
- Action: KEPT. v1.5.12 becomes new baseline. EXP-7 staged as v1.5.13.
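The EXP-10 change moves a single boundary. Below is a minimal sketch of the target >= 80 branch with a hypothetical helper name. The lower-target branch isn't described on this page, so it is left out.

```ruby
OLD_FLOOR = 10 # v1.5.10, target >= 80
NEW_FLOOR = 13 # v1.5.12 (EXP-10), target >= 80

# Hypothetical helper for the target >= 80 branch only:
# skip a sigil whose starting precision is below the floor.
def skip_high_target_sigil?(initial_precision, floor)
  initial_precision < floor
end

# Starting precisions that EXP-10 newly skips:
(8..14).select { |p| skip_high_target_sigil?(p, NEW_FLOOR) && !skip_high_target_sigil?(p, OLD_FLOOR) }
# => [10, 11, 12]

# EXP-13's floor of 15 also discards 13-starters, the case SIM-7 flagged:
skip_high_target_sigil?(13, NEW_FLOOR) # => false
skip_high_target_sigil?(13, 15)        # => true
```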
- Background: EXP-6 fixed difficulty ordering; EXP-7 replaces risk-based action selection with difficulty-first, cost-as-tiebreaker. Data shows gain determined entirely by difficulty (trivial=2.3 to difficult=13.3), cost has zero correlation.
- Change: Phase 2 action selection: prefer highest difficulty, break ties by lowest impact.
- Sessions: 10 (Barrask, Fidon, Throve + 7 others)
- Logs: `~/SH_logs/v1.5.13/`
| Metric | v1.5.12 (baseline) | v1.5.13 (EXP-7) | Delta |
|---|---|---|---|
| Sessions | 10 | 10 | 0 |
| Worked | 326 | 244 | -82 |
| Avg precision | 48.3 | 53.2 | +4.9 |
| Avg iterations | 9.7 | 11.0 | +1.3 |
| Sigils >= 80 | 3 (0.9%) | 2 (0.8%) | -1 |
| >=80/session | 0.3 | 0.2 | -0.1 |
| Mishap rate | 37.1% | 41.0% | +3.9pp |
| Scribed | 0 | 1 | +1 |
Note: v1.5.13 worked count corrected from 327→244 by session-filtered re-analysis (Feb 2026). The v1.5.13 logs contained sessions from v1.5.11/v1.5.12/v1.5.13; only v1.5.13 sessions are now counted. v1.5.12 baseline (326) was already correct.
- Key finding: the viability filter is the binding constraint. Difficulty distribution did NOT shift between versions (~20% each difficulty in both). The viability filter typically leaves only 1 precision action viable per iteration, making selection preference irrelevant. The algorithm change is correct (better heuristic), but the practical effect is masked by the filter bottleneck.
- Worked count drop (-82): v1.5.13 worked fewer sigils due to session variance (fewer encountered: 130.7 vs 149.5/session). The avg precision increase (+4.9) and higher avg iterations (11.0 vs 9.7) are consistent with spending more time per sigil.
- Avg precision artifact: 65 "unknown" stop reasons in v1.5.12 (avg_prec=33.2) disappeared in v1.5.13 (parser improvement). Adjusting for these, the real v1.5.12 avg was ~52.1, making the actual delta ~+1.1 (within noise).
- Verdict: KEEP (neutral). Correct heuristic with no downside. Practical effect masked by viability filter constraint. The important outcome is the architectural insight: the filter, not selection, controls outcomes. This redirects optimization to EXP-12 (viability loosening).
- Action: KEPT. v1.5.13 becomes new baseline.
EXP-7 revealed the viability filter as the binding constraint. Full analysis across 8,165 iterations (v1.5.10 + v1.5.12 + v1.5.13):
- Menu composition: 93.5% of menus have 1+ precision actions (6.5% have none โ game constraint). 69.4% have 2+ precision actions. Menus are NOT the bottleneck.
- Viability filter acceptance: 91.9% of precision actions pass viability (using post-action resource values). Only 1,408 rejections across 8,165 iterations.
- IMPORTANT timing bias: The analysis used POST-action resource values. For IMPROVE iterations, these are inflated (IMPROVE restores resources). The actual script checks viability with PRE-action (depleted) values. This means the 91.9% acceptance rate overstates reality. The true acceptance rate during refresh iterations is much lower.
Refresh rate breakdown (44.5% = 3,630 / 8,165):
- ~530 (14.6%): Menu had zero precision actions (game constraint, unfixable)
- ~3,100 (85.4%): Menu had precision, but viability rejected all (filter constraint)
- The viability filter IS the primary bottleneck, not a separate "recover mode."
Rejection reasons (of 1,408 post-action rejections):
- 87.1% margin <= 0 (stat too low for difficulty)
- 12.9% margin=1 with trivial/straightforward (filter disallows low-difficulty at tight margin)
Counterfactual scenarios (biased low due to timing issue):
- Scenario A (accept margin=1 for all difficulties): converts 106 refreshes, +0.5 prec/sigil
- Scenario B (accept margin=0 for challenging+): converts 163 refreshes, +1.8 prec/sigil
- Scenario C (both): converts 262 refreshes, -3.3pp refresh rate
- Real impact likely 2-3x these estimates after correcting for timing bias.
- Implication: EXP-12 (loosen viability to margin >= 0 for challenging+) is the next highest-leverage experiment. EXP-8 (repair window) dropped: repairs are too rare (0-4 per 10 sessions) and the viability filter is the real bottleneck.
- Change: `precision_action_viable?` Path 2: `margin > 0` → `margin >= 0` for challenging+. Accepts actions where stat == difficulty for formidable/challenging/difficult.
- Sessions: 10 (Throve/Refia/Byd split)
- Logs: `~/SH_logs/v1.5.14/`
Layer 2 metrics (raw log parsing, reliable):
| Metric | v1.5.13 (baseline) | v1.5.14 (EXP-12) | Delta |
|---|---|---|---|
| Refresh rate | 63.8% | 62.9% | -0.9pp |
| Gain/action | 7.86 | 8.21 | +0.35 |
| Mishap/iter | 3.84% | 4.25% | +0.41pp |
| Difficulty shift | n/a | +3pp difficult, +1.4pp challenging, -2.6pp formidable | n/a |
Layer 1 metrics (session-filtered, corrected Feb 2026):
| Metric | v1.5.13 (baseline) | v1.5.14 (EXP-12) | Delta |
|---|---|---|---|
| Sessions | 10 | 10 | 0 |
| Worked | 244 | 233 | -11 |
| Scribed | 1 | 2 | +1 |
| >=80 | 2 | 6 | +4 |
| >=80/session | 0.2 | 0.6 | +0.4 |
| Avg precision | 53.2 | 54.6 | +1.4 |
| Mishap rate | 41.0% | 47.6% | +6.6pp |
| Avg iterations | 11.0 | 10.8 | -0.2 |
Previous analysis used overcounted v1.5.13 baseline (593 worked, all versions in directory). Corrected analysis filters to v1.5.13 sessions only (244 worked).
- Analysis (corrected): Session-filtered Layer 1 reveals a stronger positive signal than originally assessed. >=80 tripled (2→6), >=80/session tripled (0.2→0.6), scribes doubled (1→2), avg precision up +1.4. Layer 2 confirms per-iteration metrics are near-flat (refresh rate -0.9pp, gain/action +0.35, mishap/iter +0.41pp). The per-sigil mishap rate increased +6.6pp (41.0→47.6%), but this was partially masked in the original analysis by the inflated baseline denominator (593 worked, giving an artificial 39.3% mishap rate).
- Revert decision context: The revert was made based on the overcounted analysis showing +8.3pp mishap with only modest positive signals. With corrected data, the >=80 improvement is substantial (+4 sigils, 3x improvement) and the mishap delta is +6.6pp. Consider re-testing EXP-12 with corrected measurement infrastructure.
- Verdict: REVERTED (decision made with overcounted data). Corrected analysis suggests the experiment may warrant re-testing.
- Action: REVERTED. v1.5.13 remains baseline. EXP-9 staged as v1.5.15.
- Change: Identical to v1.5.14: `precision_action_viable?` Path 2: `margin > 0` → `margin >= 0` for challenging+. Re-test with session-filtered baseline after corrected analysis suggested the original positive signal (>=80 tripled) may have warranted keeping.
- Baseline: v1.5.15 (includes EXP-9 resource exhaust coeff 1.75)
- Sessions: 10
- Logs: `~/SH_logs/v1.5.16/`
| Metric | v1.5.15 (baseline) | v1.5.16 (EXP-12r) | Delta |
|---|---|---|---|
| Sessions | 10 | 10 | 0 |
| Worked | 265 | 271 | +6 |
| Skipped | 1025 | 1104 | +79 |
| Scribed | 1 | 0 | -1 |
| >=80 | 1 | 0 | -1 |
| >=80/session | 0.1 | 0.0 | -0.1 |
| Avg precision | 51.2 | 51.6 | +0.4 |
| Max precision | 85 | 79 | -6 |
| Avg iterations | 10.4 | 10.4 | 0.0 |
| Mishaps | 114 | 129 | +15 |
| Mishap rate | 43.0% | 47.6% | +4.6pp |
Stop reasons:
| Reason | v1.5.15 | v1.5.16 | Delta |
|---|---|---|---|
| mishap | 114 | 129 | +15 |
| moves_exhausted | 107 | 92 | -15 |
| resource_exhausted | 26 | 32 | +6 |
| scribed | 1 | 0 | -1 |
| sigil_vanished | 17 | 18 | +1 |
- Analysis: The retest against the correct baseline (v1.5.15, which includes EXP-9) confirms the original revert decision. The positive signal seen in v1.5.14 (>=80 tripled 2→6 vs v1.5.13) does not reproduce against v1.5.15: zero sigils reached 80+, max precision dropped to 79, and mishap rate increased +4.6pp (43.0→47.6%). The relaxed viability margin allows marginal actions that produce more mishaps without compensating precision gains.
- Verdict: REVERTED. Confirmed harmful. The original v1.5.14 positive signal was likely noise or an artifact of the v1.5.13 baseline lacking EXP-9's resource exhaustion change.
- Action: REVERTED. v1.5.15 remains current baseline for future experiments.
- Background: Per Elanthipedia, all Sigil Comprehension technique bonuses are "globally disabled." There are 4 technique levels: Inspired, Enlightened, Illuminated, Awakened. Inspired and Enlightened have been enabled throughout all experiments (base effects only, bonuses disabled). Illuminated and Awakened are listed as "NOT enabled" on the wiki.
- Goal: Determine if enabling Illuminated Sigil Comprehension has any measurable effect on sigil harvesting outcomes.
- Change: Version tick only (v1.5.8 → v1.5.9). No algorithm change. All characters trained Illuminated Sigil Comprehension before running.
- Sessions: 20 (10 original + 10 additional)
- Logs: `~/SH_logs/v1.5.9/` (20 files: `*_1353.log` + `*_1507.log`)
| Metric | v1.5.8 (10 sess) | v1.5.9 (20 sess) | Delta |
|---|---|---|---|
| Worked | 473 | 790 | +317 |
| Worked/session | 47.3 | 39.5 | -7.8 |
| Avg precision | 51.6 | 52.0 | +0.4 |
| Max precision | 88 | 89 | +1 |
| Avg iterations | 10.7 | 10.7 | 0.0 |
| Scribed (>=90) | 2 | 5 | +3 |
| Sigils >= 80 | 7 (1.5%) | 11 (1.4%) | -0.1pp |
| Mishap rate | 47.4% | 45.4% | -2.0pp |
| Min per 80+ | 87 | 107 | +20 |
- Verdict: No effect. Illuminated Sigil Comprehension is confirmed disabled, as the wiki states. Doubled sample size (20 sessions, 790 worked sigils) confirms all key metrics within noise of v1.5.8. No systematic shift attributable to the technique.
Log files contain output from entire game sessions, which may include multiple SigilHarvest invocations across different versions. For example, the v1.5.13 logs contain sessions from v1.5.11, v1.5.12, and v1.5.13. Analysis scripts must filter to only the correct version's session per file.
Fix applied: Added a `last_session_runs(version)` helper to `LogParser`. All analysis scripts updated to use session filtering. Re-analysis of all experiments with corrected methodology.
Validated (numbers unchanged): EXP-6, EXP-10+11, EXP-10, EXP-5 โ earlier experiments used analysis scripts that already had session filtering (or had clean single-version logs).
Corrected: EXP-7 test (v1.5.13 worked: 327→244), EXP-12 baseline+test (v1.5.13 593→244, v1.5.14 364→233), EXP-9 baseline (v1.5.13 593→244). The overcounting originated from `flat_map(&:sigil_runs)` without version filtering, counting all sessions in the log file regardless of version.
Impact on decisions: The EXP-12 revert decision was made with an inflated baseline (593 worked, artificial 39.3% mishap rate). Corrected data showed >=80 tripled (2→6) with +6.6pp mishap (41.0→47.6%), suggesting a possible positive signal. Re-test completed (v1.5.16): the positive signal did not reproduce against the correct baseline (v1.5.15). Zero sigils reached 80+, max precision dropped to 79, mishap rate +4.6pp. Original revert decision confirmed.
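The overcount comes down to which runs get collected from a multi-session log. The sketch below uses hypothetical `Session` and `ParsedLog` stand-ins. Only the names `last_session_runs(version)` and `sigil_runs` come from this page, and the real `LogParser` signatures may differ. Picking the last matching session is an assumption based on the helper's name.

```ruby
# Hypothetical stand-ins; the real LogParser and its session objects may differ.
Session = Struct.new(:version, :sigil_runs)

class ParsedLog
  def initialize(sessions)
    @sessions = sessions
  end

  # The overcounting pattern: every SigilHarvest session in the file contributes runs.
  def all_runs
    @sessions.flat_map(&:sigil_runs)
  end

  # Session-filtered: runs from the last session that reported the requested version.
  def last_session_runs(version)
    session = @sessions.select { |s| s.version == version }.last
    session ? session.sigil_runs : []
  end
end

# One game log containing three SigilHarvest invocations, as in the v1.5.13 logs.
log = ParsedLog.new([
  Session.new('1.5.11', %i[run_a run_b]),
  Session.new('1.5.12', %i[run_c]),
  Session.new('1.5.13', %i[run_d run_e])
])
log.all_runs.size                    # => 5, includes the other versions
log.last_session_runs('1.5.13').size # => 2
```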
Ordered by expected impact and dependency chain. One experiment per version, no bundling.
| Version | Experiment | Description | Status |
|---|---|---|---|
| v1.5.10 | EXP-6 (kept) | Fix difficulty ordering + filter ACTION verb | Complete |
| v1.5.11 | EXP-10+11 (reverted) | Skip threshold + velocity bail-out (bundled) | Complete: over-triggered |
| v1.5.12 | EXP-10 (kept) | Skip threshold < 13 (standalone) | Complete |
| v1.5.13 | EXP-7 (kept) | Difficulty-based action selection (decouple risk) | Complete |
| v1.5.14 | EXP-12 (reverted) | Loosen viability margin (accept margin=0 for challenging+) | Complete: revert may need re-evaluation (see corrected data) |
| v1.5.15 | EXP-9 (kept) | Recalibrate resource exhaustion coefficient (2.25→1.75) | Complete: KEPT (neutral) |
| v1.5.16 | EXP-12 retest (reverted) | Loosen viability margin (re-test with session-filtered baseline) | Complete: confirmed harmful |
| v1.5.17 | Awakened technique (kept) | Confirm if technique is active (target 90 test) | Complete: >=80 improvement observed (22 sessions), mechanism unknown |
| v1.5.18 | EXP-13 (revert) | Remove iteration cap + move budget check (resource-only bail-out) | Complete: all metrics regressed (0 scribes, mishap 63%, worked/sess 9.2) |
| v1.5.19 | EXP-14 (revert) | Equalize action costs (Urbaj) | Complete: REVERT, but confounded (tested on the broken EXP-13 base, not v1.5.17). Needs clean retest as v1.5.24. |
| v1.5.20 | Baseline restore | Revert EXP-13+14, add C1 fix | Complete: missing EXP-9 resource check (0% resource_exhausted vs 12.1% baseline). C1 fix validated. |
| v1.5.21 | Corrected baseline | Restore EXP-9 resource check | Complete: baseline confirmed, all metrics match v1.5.17 |
| v1.5.22 | EXP-15 (revert) | Align move budget max with iteration cap (14→15) | Complete: REVERT. Mishap +11.7pp, >=80 7→6, 0 scribes |
| v1.5.23 | EXP-16 (revert) | Tighten resource exhaustion coefficient (1.75→1.5) | Complete: REVERT. Coefficient mathematically impossible: need prec>=18 at max resources. 0 worked, 100% skip. |
| v1.5.24 | EXP-14 retest (kept) | Equalize action costs: clean standalone test ({1,2,3}→{1,1,1}) vs v1.5.21 baseline | Complete: KEPT. Neutral: mishap -1.6pp (n.s.), 2 real scribes; original "harmful" verdict was confounded |
| v1.5.25 | EXP-17 (kept) | Resource-aware tiebreaker: when 2+ actions share highest difficulty, prefer the action draining the most-available resource | Complete: KEPT (neutral). Mishap -5.3pp (n.s., p=0.21), resource_exhausted -1.6pp, mechanically coherent stop-reason shift |
| v1.5.26 | D7 fix (infrastructure) | Make repair logging unconditional; repairs confirmed non-existent | Complete: Phase 3 CLOSED |
| v1.5.27 | EXP-18 (kept) | Minimum difficulty threshold: skip trivial (difficulty=1) precision actions, refresh for better menu | Complete: KEPT. Avg gain +1.36 (7.18→8.54), trivial-range 25.8%→3.6%, 60+ rate +6.9pp |
- Hypothesis: Two confirmed bugs compound to reduce precision gains.
- Difficulty ordering bug: `formidable` is ranked 5 (highest) when it should be 3. Measured median gain: formidable=6, challenging=8, difficult=12. The algorithm selects formidable over difficult when close to target (precision >= 70), losing ~6 median precision per affected iteration at the most critical stage. Confirmed consistent across all v1.5.2 through v1.5.8 data (8-9% of iterations affected per version).
- ACTION verb bug: Per Elanthipedia, ACTION means "there is a good chance nothing will happen but the danger level will rise." Confirmed: 21% zero-gain rate vs 0.0% for all other verbs. ~550 ACTION executions per 10 sessions, ~120 completely wasted (zero gain + danger increase). No other verb ever produces zero gain.
- Changes: set `@action_difficulty` to `formidable => 3, challenging => 4, difficult => 5`; skip actions where `verb == "ACTION"` during action selection.
- Risk: Low. Both are bug fixes backed by empirical data. Combined because neither is an algorithm hypothesis โ they're corrections to known-wrong behavior.
- Expected impact: Better end-game precision (difficult selected over formidable when close to target), ~120 fewer wasted iterations per 10 sessions, lower cumulative danger.
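The two EXP-6 fixes amount to a corrected rank table plus a verb filter. In this sketch, the string keys, the `MenuAction` struct and the placeholder verbs `VERB1`/`VERB2` are assumptions for illustration. The ranks for formidable, challenging and difficult come from the change list above. The ranks trivial=1 and straightforward=2 match the rest of this page.

```ruby
# Difficulty ranks after the EXP-6 fix (formidable moved from 5 down to 3).
# String keys are an assumption; the script's @action_difficulty may use other keys.
ACTION_DIFFICULTY = {
  'trivial' => 1, 'straightforward' => 2, 'formidable' => 3,
  'challenging' => 4, 'difficult' => 5
}.freeze

# Hypothetical menu entry; verbs other than ACTION are placeholders.
MenuAction = Struct.new(:verb, :difficulty)

# Drop ACTION, which can leave precision unchanged while raising danger.
def usable_actions(menu)
  menu.reject { |a| a.verb == 'ACTION' }
end

menu = [
  MenuAction.new('ACTION', 'difficult'),
  MenuAction.new('VERB1', 'formidable'),
  MenuAction.new('VERB2', 'difficult')
]
usable_actions(menu).map(&:verb) # => ["VERB1", "VERB2"]
# With the corrected ranks, difficult now outranks formidable:
usable_actions(menu).max_by { |a| ACTION_DIFFICULTY.fetch(a.difficulty) }.verb # => "VERB2"
```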
- Hypothesis: The current `risk = difficulty + cost` composite conflates reward potential with resource drain. The algorithm picks ~20% each difficulty level regardless of distance. Decoupling reveals that gain is determined entirely by difficulty (trivial=2.3 to difficult=13.3), while cost has zero correlation with gain (taxing=7.0, disrupting=6.8, destroying=7.0). This holds across all distances and all difficulty x cost combinations.
Calibration data (v1.5.10, 1971 actions post-difficulty-fix):
- Gain by difficulty: trivial=2.3, straightforward=4.5, formidable=6.8, challenging=9.6, difficult=13.3
- Gain by cost: taxing=7.0, disrupting=6.8, destroying=7.0 (no signal)
- Current selection: ~20% each difficulty (nearly uniform, ineffective)
- Theoretical uplift: +6.27 gain/iter if always picking difficult (13.2 vs 7.0 current avg)
- Over 10 iterations: +62.7 precision (obviously bounded by resource constraints)
- Change: Replace risk-based comparison in Phase 2 action selection:
- Always prefer highest difficulty (maximize precision gain per iteration)
- Break ties by lowest cost/impact (conserve resources when gain is equal)
```
# Before (risk composite):
if far_from_target:  prefer lowest risk
if close_to_target:  prefer highest risk

# After (EXP-7):
prefer highest difficulty, then lowest impact as tiebreaker
```
- Risk: Low-medium. Changes the core selection heuristic. Data strongly supports the change across all 1971 observed actions. EXP-6 difficulty fix must be in place (it is).
- Depends on: EXP-6 (satisfied)
- Status: STAGED in v1.5.13 code (123 tests passing). Ready to run after v1.5.12 analysis.
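The EXP-7 ordering can be expressed as a single composite sort key. The sketch assumes that each viable action carries an integer difficulty (1-5) and an integer impact rank, where lower means less resource drain. `Candidate` and `select_precision_action` are hypothetical names, not the script's.

```ruby
# Hypothetical viable precision action: difficulty 1-5 and an impact rank
# (lower means less resource drain). The real script's fields may differ.
Candidate = Struct.new(:name, :difficulty, :impact)

# EXP-7 rule: highest difficulty first; among equals, lowest impact.
# Arrays compare element by element, so [-difficulty, impact] encodes both keys.
def select_precision_action(viable)
  viable.min_by { |a| [-a.difficulty, a.impact] }
end

viable = [
  Candidate.new('challenging_cheap', 4, 1),
  Candidate.new('difficult_costly', 5, 3),
  Candidate.new('difficult_cheaper', 5, 2)
]
select_precision_action(viable).name # => "difficult_cheaper"
```

An empty list returns `nil`, which a caller in this sketch could treat as "refresh".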
- Hypothesis: The viability filter is the primary bottleneck controlling the 44.5% refresh rate. Currently, Path 2 requires `margin > 0` (stat > difficulty) for challenging+ actions. Loosening to `margin >= 0` (stat >= difficulty) extends the productive phase by 1 resource point, allowing precision actions when resources are at the difficulty threshold instead of forcing IMPROVE. EXP-7's viability analysis showed ~85% of refreshes occur when the menu has precision actions that the filter rejects; this is the filter, not menu RNG.
- Change: In `precision_action_viable?`, Path 2: a one-character change, `>` to `>=` in the margin comparison. Accepts margin=0 (stat == difficulty) for formidable, challenging, and difficult actions.

```ruby
# Before:
return true if margin > 0 && difficulty > 2
# After:
return true if margin >= 0 && difficulty > 2
```
What this means practically:
- Formidable (difficulty=3): viable at stat >= 3 (was >= 4)
- Challenging (difficulty=4): viable at stat >= 4 (was >= 5)
- Difficult (difficulty=5): viable at stat >= 5 (was >= 6)
- Trivial/straightforward: unchanged (still require margin > 1, i.e., stat >= difficulty + 2)
- Risk: Low-medium. Accepting margin=0 reduces the safety buffer.
- Depends on: EXP-7 (satisfied)
- Result (v1.5.14): Modest positive signals (refresh -0.9pp, gain +0.35/action, +3pp difficult shift) but a per-sigil mishap rate of 47.6%. Originally reverted because an overcounted baseline made the gains look only modest. Session-filtered re-analysis revealed that >=80 tripled (2→6). REVERTED, then RE-TESTED as v1.5.16 with corrected measurement infrastructure.
- Status: Re-staged as v1.5.16 (125 tests passing). Identical code change to v1.5.14.
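The thresholds listed above can be checked with a small Ruby sketch of the filter. `viable?` and `min_viable_stat` are hypothetical stand-ins for `precision_action_viable?`, whose real signature may read stats from instance state. Path 1 (`margin > 1` at any difficulty) is taken from the B1 row of the assumption review. Margin is assumed to be `stat - difficulty`, with ordinals trivial=1 through difficult=5.

```ruby
# Hypothetical restatement of the viability filter.
def viable?(stat, difficulty, loosened:)
  margin = stat - difficulty
  return true if margin > 1                      # Path 1: any difficulty, 2+ buffer
  path2_margin_ok = loosened ? margin >= 0 : margin > 0
  path2_margin_ok && difficulty > 2              # Path 2: formidable and above only
end

# Lowest stat at which an action of this difficulty passes the filter.
def min_viable_stat(difficulty, loosened:)
  (0..20).find { |stat| viable?(stat, difficulty, loosened: loosened) }
end

# [difficulty, min stat before (margin > 0), min stat after (margin >= 0)]
(1..5).map { |d| [d, min_viable_stat(d, loosened: false), min_viable_stat(d, loosened: true)] }
# => [[1, 3, 3], [2, 4, 4], [3, 4, 3], [4, 5, 4], [5, 6, 5]]
```

Only difficulties 3-5 move, each by exactly one stat point. Trivial and straightforward still need `stat >= difficulty + 2` through Path 1.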
Hypothesis: Repairs currently trigger only when `@sigil_precision >= (precision - 15)` (within 15 of target). After the difficulty fix, repairs target the resource consumed by `difficult` actions (reward 12-15) instead of `formidable` (reward 4-9). The window could be expanded to start repairs earlier (enabling more difficult actions sooner) or tightened to reserve iterations for direct precision work.
Change: Adjust the `precision - 15` threshold in `select_repair_action`. Test values: `precision - 20` (wider) or `precision - 10` (narrower).
- Risk: Low. Only affects when repairs are attempted, not core precision selection.
- Depends on: EXP-6
Hypothesis: The resource exhaustion check uses `(san + res + foc) * 2.25 + precision < target - 5`. The 2.25 coefficient assumes each resource star is worth ~2.25 precision.
Calibration data (v1.5.10, 762 actions with resource consumption data):
- Actual overall gain/star: 1.60 (current coefficient 2.25 is at P90)
- By difficulty: trivial=1.16, straightforward=1.48, formidable=1.55, challenging=1.70, difficult=1.72
- By cost: taxing=1.61, disrupting=1.56, destroying=1.62 (no significant variation)
- Distribution: P25=1.17, P50=1.50, P75=2.00, P90=2.25
- Current 2.25 is extremely optimistic: only 10% of iterations achieve this rate
- Change: Lower coefficient from 2.25 to 1.75. This sits between P50 (1.50) and P75 (2.00) and aligns with the difficult-action gain/star of 1.72, which dominates under EXP-7's difficulty-first selection. At 1.75, the check exits sigils where even median-to-good performance per remaining star can't reach the target. At 2.25, it only exited when P90+ performance couldn't reach target, which is far too optimistic.
- Risk: Low. Only affects bail-out timing. More sigils exit earlier (redirecting time to fresh sigils), potentially lower avg precision but higher throughput to 80+.
- Depends on: EXP-7 (satisfied); the coefficient is calibrated to the post-EXP-7 difficulty preference.
- Sessions: 10 (Barrask, Byd, Fidon, Gnarta, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve)
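The bail-out check is plain arithmetic, so a worked example shows what lowering the coefficient does. The method names and the `coeff:` keyword are hypothetical. Only the formula itself comes from the script.

```ruby
# Bail out when the remaining stars, converted at `coeff` precision per star,
# cannot lift the sigil to within 5 of the target.
def resource_exhausted?(san, res, foc, precision, target, coeff:)
  (san + res + foc) * coeff + precision < target - 5
end

# 12 stars left at precision 60, aiming for 90 (bail-out line is 85):
resource_exhausted?(4, 4, 4, 60, 90, coeff: 2.25) # => false (12 * 2.25 + 60 = 87.0)
resource_exhausted?(4, 4, 4, 60, 90, coeff: 1.75) # => true  (12 * 1.75 + 60 = 81.0)

# Break-even: the fewest stars that keep the sigil alive at each coefficient.
def min_stars_to_continue(precision, target, coeff:)
  ((target - 5 - precision) / coeff).ceil
end

min_stars_to_continue(60, 90, coeff: 2.25) # => 12
min_stars_to_continue(60, 90, coeff: 1.75) # => 15
```

At precision 60 the change raises the survival bar from 12 to 15 remaining stars. Sigils in that band now exit through `resource_exhausted` rather than grinding on.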
Logs: `~/SH_logs/v1.5.15/`
Layer 1 metrics (session-filtered):
| Metric | v1.5.13 (baseline) | v1.5.15 (EXP-9) | Delta |
|---|---|---|---|
| Sessions | 10 | 10 | 0 |
| Worked | 244 | 265 | +21 |
| Scribed | 1 | 1 | 0 |
| >=80 | 2 | 1 | -1 |
| >=80/session | 0.2 | 0.1 | -0.1 |
| Avg precision | 53.2 | 51.2 | -2.0 |
| Mishap rate | 41.0% | 43.0% | +2.0pp |
| Avg iterations | 11.0 | 10.4 | -0.6 |
Layer 2 metrics (raw log parsing):
| Metric | v1.5.13 (baseline) | v1.5.15 (EXP-9) | Delta |
|---|---|---|---|
| Refresh rate | 63.8% | 63.1% | -0.7pp |
| Gain/action | 7.13 | 7.09 | -0.04 |
| Mishap/iter | 1.62% | 2.09% | +0.47pp |
EXP-9 specific metrics:
| Metric | v1.5.13 | v1.5.15 | Delta |
|---|---|---|---|
| Resource exhaustion exits | 1 | 26 | +25 |
| moves_exhausted | 120 | 107 | -13 |
| Available stars median | 109 | 92.5 | -16.5 |
- Primary mechanism confirmed: Resource exhaustion exits increased from 1 to 26 (+25). The stop-reason shift (moves_exhausted -13, resource_exhausted +25) shows that the tighter coefficient is catching sigils that would have exhausted moves anyway.
- Per-iteration metrics flat: Layer 2 confirms that the coefficient change doesn't affect per-iteration behavior. Refresh rate, gain/action, and mishap/iter are all within noise. EXP-9 only changes when to give up, not how to play.
- Outcome neutral: >=80 slightly down (2→1) and avg precision -2.0, but both are within sample variance for 10 sessions. The +2.0pp mishap rate is noise (100→114 mishaps out of ~2700 iterations). No regression is strong enough to justify a revert.
- Verdict: KEEP (neutral). The heuristic is correct: it aligns the coefficient with the observed gain/star distribution. No measurable benefit yet, but no regression either. The mechanism is sound (exits happen where predicted) and gives future experiments a more realistic resource model to build on.
- Action: KEPT. v1.5.15 becomes new baseline.
- Hypothesis: Sigils starting at precision 10-12 reach 80+ at only a 0.45-0.61% rate, vs 1.01-3.38% for starting precision 13+. These low-start sigils consume iterations (avg 9.7 per sigil) but almost never succeed. Skipping them frees those iterations for fresh sigils with higher expected value. Across 4095 worked sigils (v1.5.2-v1.5.9), raising the skip threshold from < 10 to < 13 yields an estimated net +15.6 additional 80+ sigils from redirected time.
Change: In `improve_sigil` (line 324), raise the skip threshold for target >= 80 from `< 10` to `< 13`:
```ruby
# Before:
if @args.precision.to_i >= 80 && @sigil_precision < 10
# After:
if @args.precision.to_i >= 80 && @sigil_precision < 13
```
Data basis: Starting precision distribution and outcomes (all versions pooled):
- Start 10: 1040 sigils, avg final 40.1, 0.58% reach 80+
- Start 11: 880 sigils, avg final 41.5, 0.45% reach 80+
- Start 12: 621 sigils, avg final 44.3, 0.61% reach 80+
- Start 13: 396 sigils, avg final 45.4, 1.01% reach 80+
- Start 14: 254 sigils, avg final 49.8, 1.57% reach 80+
- The breakpoint between 12 and 13 is consistent across all 8 versions individually.
- Risk: Low. Only affects which sigils are attempted, not the algorithm itself. Lost 80+ sigils (those starting 10-12 that would have made it) are offset ~3:1 by new 80+ sigils from the saved iterations.
- Depends on: EXP-6 (so difficulty fix is in place; the data holds regardless, but testing should be sequential)
- Bundling rationale: EXP-10 and EXP-11 are bundled because they affect orthogonal code paths (start-of-sigil skip vs mid-run bail-out), neither changes the core algorithm, and both had zero false positives across 4097 sigils. Combined simulation: net +21.0.
EXP-11 component: Precision velocity bail-out. After 5 iterations, sigils with an
average gain per iteration < 4 have a 0.0% rate of reaching 80+ (0 out of 1399 across
all v1.5.2-v1.5.9 data). These "slow grinder" sigils start above the skip threshold
but never gain momentum. The code adds `@start_precision` tracking and a velocity check after the move budget check:
```ruby
if @num_iterations >= 5 && @start_precision
  velocity = (@sigil_precision - @start_precision).to_f / @num_iterations
  if velocity < 4.0
    return false
  end
end
```
Combined data basis (4097 sigils, v1.5.2-v1.5.9):
- EXP-10 alone: skip 1990, lose 11 80+, save 19333 iters, net +15.6
- EXP-11 alone: bail 1399, lose 0 80+, save 6933 iters, net +9.5
- Combined: skip+bail 2784, lose 11 80+, save 23298 iters, net +21.0
- Status: TESTED and REVERTED. Velocity bail-out (EXP-11) over-triggered at 74% of worked sigils due to continuous checking (every iter >= 5) vs simulation's single check at iter 5. EXP-10 (skip threshold) mechanically correct but untested in isolation. EXP-11 moved to killed ideas. See results above.
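The status note attributes the over-triggering to checking velocity on every iteration from 5 onward, whereas the simulation checked only once, at iteration 5. A small Ruby sketch makes the difference concrete. `bail_iteration` and the sample trajectory are hypothetical. Only the 5-iteration and 4.0 thresholds come from the EXP-11 description.

```ruby
MIN_ITERATIONS = 5
MIN_VELOCITY = 4.0

# precisions[i] is the sigil's precision after iteration i + 1.
# Returns the iteration at which the bail-out fires, or nil if it never does.
def bail_iteration(start_precision, precisions, continuous:)
  last = continuous ? precisions.size : [MIN_ITERATIONS, precisions.size].min
  (MIN_ITERATIONS..last).each do |iter|
    velocity = (precisions[iter - 1] - start_precision).to_f / iter
    return iter if velocity < MIN_VELOCITY
  end
  nil
end

# +6 per iteration for 5 iterations, then three zero-gain refreshes.
trajectory = [21, 27, 33, 39, 45, 45, 45, 45]
bail_iteration(15, trajectory, continuous: false) # => nil (velocity 6.0 at iteration 5)
bail_iteration(15, trajectory, continuous: true)  # => 8   (30 / 8 = 3.75)
```

Under the continuous rule, any refresh streak drags the running average down. A sigil that was well on pace at iteration 5 can therefore be cut later, which the single-check simulation never modeled.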
Feedback from an experienced player (Urbaj) identified three incorrect assumptions in Matt's original script (our starting point). All three claims were validated against our existing calibration data (762 actions from EXP-9, 566 worked sigils from v1.5.15+v1.5.17).
1. No hard iteration cap in the game.
The game allows unlimited PERC SIGIL IMPROVE iterations as long as you have resources.
The script's hard cap of 15 iterations (line 271) and move budget check (line 291, which
uses 14 - iterations as remaining moves) are artificial limits. Validation: only 10/566
sigils (1.8%) reach iteration 14-15, but the move budget check stops 215/566 (38%) of
worked sigils. 59 of those 215 had precision >= 60, and 22 had >= 70. These are sigils
still climbing that get cut off by an assumption that doesn't match the game.
2. Action cost labels describe WHICH resource, not HOW MUCH.
The labels "destroying"/"disrupting"/"taxing" are static descriptors for which resource
is consumed: destroying = sanity, disrupting = focus, taxing = resolve. They do NOT predict
how much of it is consumed. The current `@action_cost = { "taxing" => 1, "disrupting" => 2, "destroying" => 3 }` is wrong. Validation: gain/star by cost label is flat
(taxing=1.61, disrupting=1.56, destroying=1.62). If destroying consumed 3x as much as
taxing, gain per labeled cost unit would be ~2.33 for destroying versus ~7.0 for taxing.
3. Difficulty is the sole predictor for both gain AND resource consumption. Both precision gain and resource consumption are determined by difficulty, not cost label. Validation from EXP-7/EXP-9 calibration data:
- Gain by difficulty: trivial=2.3, straightforward=4.5, formidable=6.8, challenging=9.6, difficult=13.3 (strong monotonic signal)
- Gain by cost: taxing=7.0, disrupting=6.8, destroying=7.0 (no signal)
- Gain/star by difficulty: 1.16 to 1.72 (higher difficulty = more efficient per star)
- Gain/star by cost: 1.61, 1.56, 1.62 (no signal)
Higher difficulty actions are actually MORE resource-efficient (gain/star increases with difficulty). EXP-7's preference for highest difficulty is therefore even better justified than originally argued: it maximizes both precision gain and resource efficiency.
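The arithmetic behind claim 2 can be checked in a few lines of Ruby, using only the calibration numbers quoted above. If the `{1, 2, 3}` weights really measured stars consumed, gain per star would differ roughly threefold between taxing and destroying. The measured gain/star is instead flat. The constant names are illustrative.

```ruby
GAIN_PER_ACTION = { "taxing" => 7.0, "disrupting" => 6.8, "destroying" => 7.0 }.freeze
OLD_ACTION_COST = { "taxing" => 1, "disrupting" => 2, "destroying" => 3 }.freeze
MEASURED_GAIN_PER_STAR = { "taxing" => 1.61, "disrupting" => 1.56, "destroying" => 1.62 }.freeze

# Gain per star implied if the old weights were real consumption ratios.
implied = GAIN_PER_ACTION.to_h { |label, gain| [label, (gain / OLD_ACTION_COST[label]).round(2)] }
# taxing 7.0, disrupting 3.4, destroying 2.33

# Ratio between the best and worst label.
spread = ->(h) { (h.values.max / h.values.min).round(2) }
spread.(implied)                # => 3.0
spread.(MEASURED_GAIN_PER_STAR) # => 1.04
```

A 3.0x predicted spread against a 1.04x observed one is why the weights are read as resource descriptors rather than cost predictors.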
- Hypothesis: The hard iteration cap of 15 and the move budget check (which assumes 14 max useful iterations) are artificial limits not imposed by the game. 38% of worked sigils are stopped by these iteration-based limits. Removing them and relying solely on the resource exhaustion check (EXP-9) allows sigils with remaining resources to continue gaining precision. The resource exhaustion check directly measures whether remaining resources can reach the target โ it doesn't need an iteration count proxy.
Changes:
- Remove hard cap at iteration 15 (lines 271-274)
- Remove move budget check `(14 - @num_iterations) * 13 < ...` (lines 291-295)
- Adjust scribe-near-cap logic (line 261): remove the `@num_iterations >= 15` condition; keep scribing at target or target-5 based on resource exhaustion proximity
What the resource exhaustion check already handles:
`(san + res + foc) * 1.75 + precision < target - 5` exits when remaining resources can't plausibly reach target. This is a direct measurement, not an iteration-count proxy.
Data basis: 181/566 (32%) stopped by move budget. Monte Carlo projection (5000
sims per sigil using observed gain/productive-rate/mishap-rate distributions):
- 37 of 181 (20%) have >50% probability of reaching 85 with more iterations
- 15 of 181 (8%) have >50% probability of reaching 90
- Sigils at 70+ with 10+ remaining stars have 55-72% chance of reaching 90
- No infinite loop risk: 76% of refreshes produce viable action next iteration; max observed refresh streak was 6. Resource drain provides natural termination.
- Additional change: Raise skip threshold to <15 for target 90. Analysis shows start=13 (209 sigils, max=83, 0 scribes) and start=14 (162 sigils, max=78, 0 scribes) never reach 85. Only start=15 has scribe potential (1 scribe, 6 >=80 in 194 sigils). Saves 3,851 iterations with zero lost scribes.
- Risk: Medium. Without iteration limits, sigils burn through more resources per run. Fewer sigils are attempted per session (each takes longer). The net effect depends on whether extended sigils convert to 80+ at a higher rate than fresh sigils would. The resource exhaustion check provides the safety net. There is no loop risk, as confirmed by the refresh analysis.
- Depends on: EXP-9 (satisfied: resource exhaustion check in place)
- Expected impact: HIGHEST of any remaining experiment. 32% of sigils currently stopped may continue. Simulation projects 15-37 additional scribes per 566 worked. Combined with skip threshold (fewer wasted attempts), net throughput should increase.
- Sessions: 11 (all complete 60min, 11 characters)
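The projection above comes from a Monte Carlo over observed distributions. Below is a minimal Ruby sketch of that kind of simulation, not the model actually used. It rests on three stated assumptions: a constant per-iteration mishap probability (the uniform-risk assumption), a fixed `stars_per_action` of 4.2 (the audit's ~4.2 stars per action), and gains drawn uniformly from a supplied sample. All parameter names are hypothetical, and the example values are placeholders rather than calibrated figures.

```ruby
# Fraction of simulated runs in which a sigil reaches `target`.
def reach_probability(precision:, stars:, target:, gains:, productive_rate:, mishap_rate:,
                      stars_per_action: 4.2, sims: 5000, max_iters: 100, rng: Random.new(42))
  hits = 0
  sims.times do
    prec = precision
    left = stars.to_f
    max_iters.times do
      break if prec >= target                 # target reached
      break if rng.rand < mishap_rate         # constant mishap risk ends the sigil
      next unless rng.rand < productive_rate  # refresh: no gain, no stars spent
      break if left < stars_per_action        # too few stars left for an action
      prec += gains.sample(random: rng)       # productive action: draw a gain
      left -= stars_per_action
    end
    hits += 1 if prec >= target
  end
  hits.fdiv(sims)
end

# Illustrative placeholder inputs only.
reach_probability(precision: 72, stars: 20, target: 90, gains: [4, 6, 7, 9, 13],
                  productive_rate: 0.6, mishap_rate: 0.04)
```

Because `mishap_rate` is constant, every extra iteration carries the same risk. If real mishap risk rises as danger accumulates, a model like this one will overstate the value of extra iterations unless the rate is made danger-dependent.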
Logs: `~/SH_logs/v1.5.18/` (extracted via session_splitter.rb; 4 split sessions)
| Metric | v1.5.17 (22 sess) | v1.5.18 (11 sess) | Delta |
|---|---|---|---|
| Worked | 555 | 101 | -454 |
| Worked/session | 25.2 | 9.2 | -16.0 |
| Skipped | 2249 | 1428 | -821 |
| Scribed | 3 | 0 | -3 |
| Scribes/session | 0.18 | 0.0 | -0.18 |
| >=80 | 14 | 2 | -12 |
| >=80/session | 0.64 | 0.18 | -0.46 |
| Avg precision | 52.8 | 55.0 | +2.2 |
| Max precision | 88 | 80 | -8 |
| Avg iterations | 10.5 | 11.0 | +0.5 |
| Max iterations | 15 | 18 | +3 |
| Iters > 15 | 0 | 1 | +1 |
| Mishap rate | 42.0% | 63.4% | +21.4pp |
| Stop reason | v1.5.17 | v1.5.18 | Delta |
|---|---|---|---|
| mishap | 233 (42%) | 64 (63%) | -169 |
| moves_exhausted | 209 (38%) | 0 (0%) | -209 |
| resource_exhausted | 67 (12%) | 17 (17%) | -50 |
| scribed | 3 | 0 | -3 |
| sigil_vanished | 41 | 20 | -21 |
Per-character breakdown (all 11 characters, 0 scribes universally):
| Character | Sigils | Worked | Skipped | Mishap% |
|---|---|---|---|---|
| Barrask | 142 | 8 | 134 | 62.5% |
| Byd | 143 | 8 | 135 | 75.0% |
| Christus | 141 | 8 | 133 | 50.0% |
| Fidon | 139 | 10 | 129 | 70.0% |
| Gnarta | 146 | 8 | 138 | 50.0% |
| Jazriel | 143 | 7 | 136 | 57.1% |
| Kythkani | 133 | 9 | 124 | 66.7% |
| Mahtra | 137 | 9 | 128 | 55.6% |
| Nelis | 133 | 11 | 122 | 63.6% |
| Refia | 133 | 12 | 121 | 75.0% |
| Throve | 139 | 11 | 128 | 63.6% |
Note: ~92-93% skip rate across all characters (vs ~80% in v1.5.17 baseline). Mishap rate ranges 50-75% per character (vs 42% baseline), with no character showing improvement.
Analysis: Every key metric regressed. Two compounding problems:
- Skip threshold <15 for target 90: eliminated 63% of worked sigils (25.2→9.2 per session). The Monte Carlo projection was correct that start <15 rarely reaches 90, but the throughput cost is devastating: far fewer sigils attempted means far fewer chances at any high-precision outcome.
- Removed iteration cap: sigils now run up to 18 iterations, but the extra iterations mostly produce mishaps. Mishap rate jumped 42%→63.4%. The simulation's projection of 15-37 additional scribes did not materialize, because extended sigils hit mishaps before converting. Max precision actually dropped (88→80). The slight avg precision increase (+2.2) is a selection artifact: only high-starting sigils (15+) are worked, so the floor is higher. This does not compensate for the catastrophic loss of throughput and high-precision outcomes.
- Post-mortem: The Monte Carlo model overestimated scribe potential because it assumed uniform mishap probability per iteration. In practice, mishap risk likely compounds as danger accumulates over extended runs. This experiment bundled THREE changes (remove iteration cap, remove move budget, raise skip threshold to <15 for target 90), violating the one-change-per-version protocol. It is impossible to isolate which change caused which portion of the regression. Worse, EXP-14 was then tested on top of this broken base, making its results confounded as well (see EXP-14 confounding note).
- Verdict: REVERT. Zero scribes, 63% mishap rate, 0.18 >=80/session. All changes from EXP-13 must be reverted. The individual components (skip threshold alone, cap removal alone) could be re-tested as separate experiments if desired.
- Action: REVERT. v1.5.17 remains baseline. EXP-14 (v1.5.19, cost equalization) is also REVERT; see EXP-14 results below.
Hypothesis: The `@action_cost` mapping is wrong (Urbaj's claim 2). Confirmed with 100% correlation from 2,416 action iterations: destroying=sanity (771/772), disrupting=focus (843/844), taxing=resolve (800/800). Each action consumes ~4.2 stars of exactly one resource. The labels are static resource descriptors, not cost predictors. Difficulty is the sole independent variable for both gain and resource consumption.
Change: Equalize `@action_cost` from `{ taxing: 1, disrupting: 2, destroying: 3 }` to `{ taxing: 1, disrupting: 1, destroying: 1 }`. This is the minimal, isolated change.
Downstream effects:
- `impact` field is now always 1 (no cost differentiation between actions)
- `risk` composite = `difficulty + 1` (same ordering as difficulty alone)
- EXP-7 tie-breaking at equal difficulty becomes a no-op (first encountered wins)
- Repair selection (line 671) becomes purely difficulty-based
What was NOT changed (deferred to future experiments if warranted):
- The resource-aware tiebreaker concept (prefer the action consuming the most-available resource) was considered but deferred, keeping to one change per version. The viability filter typically leaves only 1 option per iteration anyway, so tiebreakers are rarely exercised.
- The `risk` composite is still computed as `difficulty + cost`, but with equal costs it equals `difficulty + 1`, which preserves correct ordering without a code change.
- Risk: Low. Viability filter typically leaves only 1 option per iteration.
- Depends on: EXP-7 (satisfied)
- Expected impact: LOW. Correct in principle but rarely exercised in practice.
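The downstream effects listed above can be seen in a small Ruby sketch. With `{1, 2, 3}` weights, the EXP-7 tiebreaker separates two equal-difficulty actions. With `{1, 1, 1}`, the risk composite collapses to `difficulty + 1` and the first-encountered action wins. `MenuAction`, `risk`, and `pick` are hypothetical names, not the script's methods.

```ruby
OLD_COST   = { "taxing" => 1, "disrupting" => 2, "destroying" => 3 }.freeze
EQUAL_COST = { "taxing" => 1, "disrupting" => 1, "destroying" => 1 }.freeze

MenuAction = Struct.new(:name, :difficulty, :cost_label)

def risk(action, costs)
  action.difficulty + costs.fetch(action.cost_label)
end

# EXP-7: highest difficulty, then lowest impact. Replacing the current best
# only on a strictly better key means the first-encountered action wins a tie.
def pick(actions, costs)
  key = ->(a) { [a.difficulty, -costs.fetch(a.cost_label)] }
  actions.reduce { |best, a| (key.(a) <=> key.(best)) > 0 ? a : best }
end

menu = [MenuAction.new(:first, 4, "destroying"), MenuAction.new(:second, 4, "taxing")]

pick(menu, OLD_COST).name            # => :second (impact 1 beats impact 3)
pick(menu, EQUAL_COST).name          # => :first  (tie; first encountered wins)
menu.map { |a| risk(a, OLD_COST) }   # => [7, 5]
menu.map { |a| risk(a, EQUAL_COST) } # => [5, 5]
```

The switch only matters when two viable actions share a difficulty, which is consistent with the "rarely exercised" expectation above.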
Tests: 131 examples, 0 failures. Updated the `@action_cost` setup, `build_improvement` defaults, and 12 fixture impact/risk values; reworked the tie-breaking test to verify first-encountered behavior when costs are equal.
- Sessions: 11 (all complete 60min, 11 characters)
Logs: `~/SH_logs/v1.5.19/` (extracted via session_splitter.rb; 4 split sessions)
- Note: v1.5.19 inherits ALL EXP-13 changes (removed iteration cap, resource-only bail-out, skip <15 for target 90). Since EXP-13 is REVERT, these results reflect both the catastrophic EXP-13 base AND the cost equalization. Compare vs v1.5.18 to isolate EXP-14's effect, and vs v1.5.17 for the true baseline.
Cross-version comparison (11 sessions each for fair comparison):
| Metric | v1.5.17 (baseline) | v1.5.18 (EXP-13) | v1.5.19 (EXP-14) | EXP-14 delta |
|---|---|---|---|---|
| Worked | 555 | 101 | 103 | +2 |
| Worked/session | 50.5 | 9.2 | 9.4 | +0.2 |
| Skipped | 2249 | 1428 | 1429 | +1 |
| Real scribes | 1 | 0 | 0 | 0 |
| C1 fake scribes | 2 | 0 | 1 | +1 |
| >=80 | 14 | 2 | 3 | +1 |
| >=80/session | 1.27 | 0.18 | 0.27 | +0.09 |
| Avg precision | 52.8 | 55.0 | 52.8 | -2.2 |
| Mishap rate | 42.0% | 63.4% | 74.8% | +11.4pp |
Note on v1.5.17 session count: The v1.5.17 directory has 11 files containing 22 sessions (2 batches merged). Per-file metrics show 50.5 worked/file but the per-session baseline used for Z-score calculations elsewhere uses 22 sessions (25.2 worked/session). This comparison uses per-file numbers for apples-to-apples vs the 11-session EXP-14 data.
| Stop reason | v1.5.17 | v1.5.18 | v1.5.19 | EXP-14 delta |
|---|---|---|---|---|
| mishap | 233 (42%) | 64 (63%) | 77 (75%) | +13 |
| resource_exhausted | 67 (12%) | 17 (17%) | 6 (6%) | -11 |
| sigil_vanished | 41 (7%) | 20 (20%) | 19 (18%) | -1 |
| scribed | 3 | 0 | 1 (C1 fake) | +1 |
| moves_exhausted | 209 (38%) | 0 | 0 | 0 |
The C1 fake scribe: Mahtra Sigil#127, precision 87, scribe_count=nil. It reached 87 but did not actually scribe (mishap or vanish), yet was classified as SCRIBED by the C1 bug.
>=80 sigils detail (all 3 ended in failure):
| Character | Sigil# | Start | Final | Iters | Stop |
|---|---|---|---|---|---|
| Barrask | #97 | 15 | 84 | 18 | mishap |
| Gnarta | #103 | 15 | 84 | 12 | sigil_vanished |
| Mahtra | #127 | 15 | 87 | 13 | C1 fake "scribed" |
All worked sigils start at exactly precision 15 (the skip <15 threshold from EXP-13).
Analysis: Cost equalization appeared to make things WORSE, not neutral:
- Mishap rate 74.8%: the highest of any version. The cost equalization removed the penalty for "destroying" actions (formerly cost=3). With all costs = 1, the algorithm no longer discriminates against resource-intensive actions, allowing more aggressive action selection. The result: more mishaps without compensating gains.
- Resource exhaustion dropped (17→6): fewer sigils run out of resources because they mishap before reaching resource exhaustion. This is not an improvement.
- No precision improvement: avg precision 52.8 (= v1.5.17 baseline), >=80 count 3 vs v1.5.18's 2 (noise range, all failed anyway).
- The hypothesis was correct but the effect is harmful: Urbaj's observation that cost labels are resource descriptors (not cost predictors) is confirmed. But the old cost weighting {1,2,3} provided an accidental benefit โ it penalized "destroying" actions, which happen to consume sanity. This implicit conservation was better than no conservation.
- Post-mortem: The expected "LOW impact" assessment was wrong. While the viability filter usually leaves 1 option, equalizing costs affects the RISK composite (difficulty + cost) used in action selection. With costs equalized, RISK = difficulty + 1 for all actions, making the algorithm select purely by difficulty. In cases where multiple actions have the same difficulty, the tiebreaker changes. More importantly, the repair selection logic (line 671) becomes purely difficulty-based, potentially accepting riskier repair attempts.
- Verdict: REVERT (but see confounding note below).
CONFOUNDING NOTE (Feb 2026 retrospective)
EXP-14 was tested on EXP-13's broken code base (removed iteration cap, removed move budget, skip <15 for target 90). It was never tested against the real v1.5.17 baseline. This means the +11.4pp mishap increase attributed to cost equalization is confounded with EXP-13's already-catastrophic 63.4% mishap rate. The conclusion that cost equalization is "harmful" is not supported by clean data.
Why this matters:
- EXP-13 removed the iteration cap, so sigils ran 15-18 iterations where mishap probability compounds. Cost equalization on extended runs has a different effect than cost equalization on capped runs.
- The "accidental diversification" theory (point 4 above) is unvalidated post-hoc reasoning. The viability filter "typically leaves only 1 option per iteration", so a rarely-exercised tiebreaker cannot plausibly cause a +11.4pp mishap increase on baseline code where runs are capped at 15 iterations.
- Urbaj's data is validated by ours (100% correlation, 2,416 actions). The {1,2,3} mapping is provably wrong. It deserves a standalone test against the confirmed v1.5.21 baseline.
Action: Schedule v1.5.24 as a clean standalone cost equalization test ({1,2,3} → {1,1,1}) against the v1.5.21 baseline. See queued experiments.
Changes: Reverted EXP-13+14 to the v1.5.17 algorithm. C1 fix (`@actually_scribed` flag). D7 analyzer fix (repair_count increment). Version 1.5.20.
- Sessions: 11 (all complete 60min, 11 characters)
Logs: `~/SH_logs/v1.5.20/`
v1.5.20 vs v1.5.17 Comparison (11 sessions each for fair comparison):
| Metric | v1.5.17 (22 sess) | v1.5.20 (11 sess) | Delta |
|---|---|---|---|
| Worked/session | 25.2 | 23.5 | -1.8 |
| Skip rate | 80.2% | 79.8% | -0.4pp |
| Real scribes | 1 | 1 | 0 |
| >=80/session | 0.64 | 0.27 | -0.36 |
| Avg precision | 52.8 | 53.5 | +0.7 |
| Max precision | 88 | 88 | 0 |
| Avg iterations | 10.5 | 10.8 | +0.3 |
| Mishap rate | 42.0% | 46.9% | +4.9pp |
Stop Reasons (key finding: resource_exhausted = 0%):
| Reason | v1.5.17 | v1.5.20 | Delta |
|---|---|---|---|
| mishap | 42.0% | 46.9% | +4.9pp |
| moves_exhausted | 37.7% | 46.1% | +8.4pp |
| resource_exhausted | 12.1% | 0.0% | -12.1pp |
| sigil_vanished | 7.4% | 6.6% | -0.8pp |
Bug discovered: the EXP-9 resource exhaustion check (`(san+res+foc)*1.75 + prec < target-5`)
was accidentally omitted during the EXP-13→v1.5.20 revert. The check lives in `sigil_info`
(after resource parsing), not in the Phase 3 bail-out block that was restored. The 12.1%
of sigils that should have exited via resource_exhausted instead continued to moves_exhausted (+8.4pp)
or mishap (+4.9pp).
Bug fix validations:
- C1: 1 SCRIBED result, 1 real (scribe_count=2). Zero fakes. VALIDATED.
- D7: Code correct, but 0 repairs observed in this batch (repairs are rare).
>=80 detail (3 sigils):
- Barrask Sigil#67: start=14, final=84, iters=12, stop=mishap
- Mahtra Sigil#33: start=14, final=88, iters=15, stop=scribed (2 scrolls), a REAL scribe
- Mahtra Sigil#62: start=15, final=81, iters=12, stop=mishap
Verdict: Partial baseline. Core algorithm matches v1.5.17 but missing resource check inflates mishap rate by ~5pp and moves_exhausted by ~8pp. Fixed in v1.5.21.
- Changes: Restores EXP-9 resource exhaustion check. No other changes. Algorithm identical to v1.5.17 + C1 fix + D7 analyzer fix.
- Sessions: 11 (all complete 60min, 11 characters)
Logs: `~/SH_logs/v1.5.21/`
v1.5.21 vs v1.5.17 vs v1.5.20 Comparison:
| Metric | v1.5.17 (22 sess) | v1.5.20 (11 sess) | v1.5.21 (11 sess) | 21 vs 17 |
|---|---|---|---|---|
| Worked/session | 25.2 | 23.5 | 23.8 | -1.4 |
| Skip rate | 80.2% | 79.8% | 79.6% | -0.6pp |
| Real scribes | 1 | 1 | 0 | -1 |
| >=80 | 14 | 3 | 7 | |
| >=80/session | 0.64 | 0.27 | 0.64 | 0.0 |
| Avg precision | 52.8 | 53.5 | 51.6 | -1.2 |
| Max precision | 88 | 88 | 87 | |
| Avg iterations | 10.5 | 10.8 | 10.2 | -0.3 |
| Mishap rate | 42.0% | 46.9% | 40.8% | -1.2pp |
Stop Reasons (resource_exhausted restored):
| Reason | v1.5.17 | v1.5.20 | v1.5.21 | 21 vs 17 |
|---|---|---|---|---|
| mishap | 42.0% | 46.9% | 40.8% | -1.2pp |
| moves_exhausted | 37.7% | 46.1% | 39.7% | +2.0pp |
| resource_exhausted | 12.1% | 0.0% | 13.4% | +1.3pp |
| sigil_vanished | 7.4% | 6.6% | 6.1% | -1.3pp |
>=80 detail (7 sigils: 4 budget-stopped, 3 mishapped):
| Character | Sigil | Start | Final | Iters | Danger | Stop |
|---|---|---|---|---|---|---|
| Barrask | #60 | 13 | 83 | 14 | 18 | moves_exhausted |
| Fidon | #23 | 13 | 84 | 14 | 18 | moves_exhausted |
| Gnarta | #112 | 14 | 80 | 14 | 18 | moves_exhausted |
| Mahtra | #102 | 15 | 82 | 14 | 18 | moves_exhausted |
| Jazriel | #37 | 14 | 87 | 12 | 17 | mishap |
| Kythkani | #12 | 15 | 84 | 10 | 7 | mishap |
| Throve | #27 | 15 | 83 | 10 | 7 | mishap |
Key observation: 4 of 7 >=80 sigils were stopped by the move budget at iteration 14 with 1 iteration remaining before the cap. At precision 80-84 (gap of 1-5 to scribe threshold of 85), a single additional iteration at avg 7.3 gain would likely scribe them. This is the strongest signal yet for the "loosen move budget" experiment.
Verdict: BASELINE CONFIRMED. All metrics match v1.5.17 within normal variance. v1.5.21 is the corrected baseline with working C1/D7/EXP-9 instrumentation.
Systematic audit of all script assumptions using 4,608 iterations from 566 worked sigils
(v1.5.15 + v1.5.17 combined). Script: scratchpad/assumption_audit.rb.
Finding 1: Cost label → resource mapping is 100% confirmed
| Cost Label | Consumes | Hit Rate | Avg Stars Consumed |
|---|---|---|---|
| destroying | sanity | 771/772 (100%) | -4.32 |
| disrupting | focus | 843/844 (100%) | -4.23 |
| taxing | resolve | 800/800 (100%) | -4.15 |
Each action consumes ~4.2 stars of exactly one resource. The @action_cost mapping
{taxing:1, disrupting:2, destroying:3} is provably wrong. Urbaj's claim confirmed with
perfect correlation. Refreshes consume zero resources but increase danger by ~1.0.
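A hit-rate table like the one above can be tallied from per-action resource deltas. The Ruby sketch below assumes a record format (a cost `label` plus a per-resource `delta` hash in stars) chosen for illustration, not the format `assumption_audit.rb` uses. The three sample records are toy data.

```ruby
LABEL_TO_RESOURCE = { "destroying" => :sanity, "disrupting" => :focus, "taxing" => :resolve }.freeze

# A "hit" is an action whose only decreasing resource is the mapped one.
def hit?(record)
  dropped = record[:delta].select { |_resource, stars| stars.negative? }.keys
  dropped == [LABEL_TO_RESOURCE.fetch(record[:label])]
end

# Returns "hits/total" per cost label.
def hit_rates(records)
  grouped = records.group_by { |r| r[:label] }
  grouped.transform_values do |rs|
    hits = rs.count { |r| hit?(r) }
    "#{hits}/#{rs.size}"
  end
end

records = [
  { label: "destroying", delta: { sanity: -4, focus: 0, resolve: 0 } },
  { label: "taxing",     delta: { sanity: 0, focus: 0, resolve: -5 } },
  { label: "taxing",     delta: { sanity: -1, focus: 0, resolve: -3 } }
]
hit_rates(records) # destroying "1/1", taxing "1/2"
```

Requiring the mapped resource to be the *only* one that drops is stricter than "the mapped resource dropped". That stricter reading matches the "exactly one resource" claim.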
Finding 2: Clarity is a degrading hidden variable
- Clarity ALWAYS decreases during a sigil (474/485 sigils), never increases
- Per-iteration: 38.3% of iterations decrease clarity, 0% increase it, mean -1.17
- Refreshes degrade clarity 25x faster than actions (-2.52 vs -0.1 per iteration)
- Weak positive correlation: clarity 70-79 → 6.7 avg gain; clarity 90-99 → 7.3 avg gain
- Starting clarity: range 88-99, mean 96.3 (no signal from starting clarity binning)
- Implication: Refreshes have a hidden cost: they degrade clarity much faster than actions. This reinforces the value of minimizing refreshes (already our strategy). It is unclear whether clarity directly affects game outcomes or is cosmetic.
Finding 3: Resource level directly affects gain per iteration
| Total Stars | N | Avg Gain | Median | Zero% |
|---|---|---|---|---|
| 0-5 | 3 | 5.0 | 4 | 33.3% |
| 11-15 | 17 | 4.8 | 5 | 5.9% |
| 16-20 | 92 | 6.9 | 6 | 0.0% |
| 21-30 | 626 | 6.5 | 5 | 0.0% |
| 31+ | 1831 | 7.5 | 7 | 0.1% |
Per-resource: consuming a resource at level 3-5 gives ~3-5 gain; at level 10+ gives ~7-8 gain. Danger shows no signal (gain flat across 0-17). Resource depletion doesn't just limit iterations โ it reduces per-iteration effectiveness. The resource exhaustion coefficient (1.75) may understate the impact because it doesn't account for diminishing returns at low resource levels.
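The gain-by-resource-level table bins iterations by total stars. Below is a minimal Ruby sketch of that binning and aggregation. The bucket edges are read off the table, with 6-10 assumed for the gap between 0-5 and 11-15. The `[stars, gain]` row format and the sample values are hypothetical.

```ruby
BUCKETS = [[0..5, "0-5"], [6..10, "6-10"], [11..15, "11-15"],
           [16..20, "16-20"], [21..30, "21-30"]].freeze

def bucket(total_stars)
  BUCKETS.each { |range, name| return name if range.cover?(total_stars) }
  "31+"
end

# rows: [[total_stars_before_action, precision_gain], ...]
def gain_table(rows)
  rows.group_by { |stars, _gain| bucket(stars) }.transform_values do |group|
    gains = group.map(&:last).sort
    {
      n: gains.size,
      avg: (gains.sum.to_f / gains.size).round(1),
      median: gains[gains.size / 2], # upper median for even counts
      zero_pct: (100.0 * gains.count(0) / gains.size).round(1)
    }
  end
end

gain_table([[4, 0], [4, 6], [33, 7], [35, 8]])
# "0-5" => n 2, avg 3.0, median 6, zero_pct 50.0
# "31+" => n 2, avg 7.5, median 8, zero_pct 0.0
```

Buckets this small (the 0-5 row has N=3) are dominated by single outcomes, which is worth keeping in mind when reading the low-star rows.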
Finding 4: Skip threshold should be 15 for target 90 (INVALIDATED; see SIM-7 below)
| Start Precision | Count | Avg Final | Max | >=80 | >=85 | Scribed |
|---|---|---|---|---|---|---|
| 13 | 209 | 50.9 | 83 | 1 | 0 | 0 |
| 14 | 162 | 51.3 | 78 | 0 | 0 | 0 |
| 15 | 194 | 53.5 | 85 | 6 | 1 | 1 |
All 566 worked sigils start at 13-15 (the current threshold skips <13). Start=13 and start=14 never reach 85 in 371 attempts. Raising the threshold to <15 saves 3,851 iterations (~370 sigils × 10.4 avg iters) with zero lost scribes. Only start=15 has any scribe potential. This should be part of EXP-13 or a standalone micro-experiment.
CORRECTION (Feb 2026): This analysis used combined v1.5.15+v1.5.17 data (566 worked). The v1.5.15 data was collected WITHOUT Awakened technique. SIM-7 (below), using v1.5.17 data only (555 worked, WITH Awakened), shows ALL 3 scribes started at precision 13. Awakened provides enough of a boost that start=13 CAN reach 90. The pre-Awakened data diluted this effect in the combined dataset. Skip <15 is DEAD for post-Awakened testing. The current skip <13 threshold is correct.
Finding 5: Iteration cap prevents 15-37 potential scribes per 566 worked
181 sigils (32% of worked) were stopped by the move budget check. Monte Carlo projection (5,000 simulations each, using observed gain distribution):
- 37 of 181 (20%) have >50% probability of reaching 85 with more iterations
- 15 of 181 (8%) have >50% probability of reaching 90
- Several sigils at 70+ with 10-15 remaining stars have 55-72% chance of reaching 90
- Confirms EXP-13 (remove cap) as highest-impact change
Finding 6: No infinite loop risk without cap
- 76% of refreshes produce a viable action on the next iteration
- Refresh streaks: mean 1.2, max 6. Only 3.9% are 3+ consecutive, 0.2% are 5+.
- Resource drain provides natural termination: refreshes cost 0 resources but +1 danger, and resource exhaustion check catches depleted sigils
- Safe to remove iteration cap with resource-based exit as primary guard
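Refresh-streak statistics like these can be computed from a per-iteration log of outcomes. The Ruby sketch below uses `Enumerable#chunk_while` and `each_cons`. The `:refresh`/`:action` symbols are a hypothetical encoding of the parsed log, not the analyzer's format.

```ruby
# outcomes: per-iteration sequence such as [:action, :refresh, :refresh, :action]
def refresh_streaks(outcomes)
  outcomes.chunk_while { |a, b| a == b }
          .select { |run| run.first == :refresh }
          .map(&:size)
end

def streak_summary(streaks)
  return { mean: 0.0, max: 0, pct_3_plus: 0.0 } if streaks.empty?

  {
    mean: (streaks.sum.to_f / streaks.size).round(2),
    max: streaks.max,
    pct_3_plus: (100.0 * streaks.count { |s| s >= 3 } / streaks.size).round(1)
  }
end

# Share of refreshes followed immediately by an action (a trailing refresh
# with no next iteration is not counted).
def recovery_rate(outcomes)
  pairs = outcomes.each_cons(2).select { |a, _b| a == :refresh }
  return 0.0 if pairs.empty?

  (100.0 * pairs.count { |_a, b| b == :action } / pairs.size).round(1)
end

seq = %i[action refresh action refresh refresh refresh action refresh]
refresh_streaks(seq)                 # => [1, 3, 1]
streak_summary(refresh_streaks(seq)) # mean 1.67, max 3, pct_3_plus 33.3
recovery_rate(seq)                   # => 50.0
```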
Finding 7: The <= 80 guard has minor impact at current volumes
6 sigils reached 80-84 but couldn't scribe (dead ends). However, all 6 had viable
resource projections when crossing 80 โ they died to mishaps (4/6) or iteration cap.
The resource check with <= 80 guard wouldn't have caught any of them earlier.
11 wasted iterations above 80 with zero gain. Low priority fix โ mishaps, not the guard,
are the primary cause of dead ends at 80+.
Queue updates from audit (updated post-EXP-13 results):
| Priority | Change | Experiment | Status |
|---|---|---|---|
| 1 | Remove iteration cap + move budget | EXP-13 | REVERTED: mishap rate 63.4%, 0 scribes |
| 2 | Raise skip threshold to <15 for target 90 | Was in EXP-13 | REVERTED as bundle: data valid but must be isolated |
| 3 | Equalize costs | EXP-14 | REVERTED: mishap rate 74.8%, 0 real scribes |
| 4 | Resource-aware action selection (prefer full resources) | Future | Untested |
| 5 | Monitor clarity degradation | Observational | Finding 2 above: refreshes degrade clarity 25x faster |
Systematic line-by-line review of every hardcoded value, threshold, and decision point in the algorithm. Each entry identifies the assumption, its test status, and whether it can be isolated for experimentation.
A. Magic Numbers & Thresholds
| ID | Line | Value | Assumption | Status |
|---|---|---|---|---|
| A1 | 264 | `1.75` | Resource stars → precision conversion coefficient | EXP-9 (kept). Sub-assumption: all 3 resources are fungible; untested |
| A2 | 162, 270, 283 | `target - 5` | Minimum useful scribe precision / bail-out margin | Game mechanic? Untested whether -3 or -7 is better |
| A3 | 323 | `< 15` | Skip threshold for target 90 | Data-confirmed (0 scribes from start<15). Part of EXP-13 revert; retest standalone |
| A4 | 328 | `< 13` | Skip threshold for target 80 | EXP-10 (kept) |
| A5 | 290 | `< 2` | Max 2 aspect repairs per sigil | Original design. Never tested. |
| A6 | 669 | `precision - 15` | Only repair when within 15 of target | Never tested. |
| A7 | 666 | `<= 3` | Only trivial/straightforward/formidable repairs | Never tested. |
| A8 | 668 | `>= 2` | Repair margin requirement (stricter than precision's `> 0`) | Never tested. |
| A9 | 291 | `<= 18` | Don't repair when danger > 18 | Near max, rarely reached |
| A10 | 379 | `>= 14` | Trader luck threshold (guild-specific) | Never tested. Hard to isolate |
| A11 | 46 | `1,2,3,4,5` | Difficulty ordinal values | EXP-6 (confirmed) |
| A12 | 43 | `1,1,1` | Cost labels equalized | EXP-14 (reverted). Data-confirmed but harmful: old {1,2,3} provided useful implicit diversification |
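The repair-related rows A5-A9 can be read together as one gate. The sketch below is an interpretation of the table, not the code at lines 290-291 and 666-669: the method name, the keyword arguments, and in particular how `precision - 15` and the repair margin (`stat - difficulty`) are applied are assumptions.

```ruby
# Hypothetical reading of audit rows A5-A9 as a single repair gate.
MAX_REPAIRS       = 2  # A5: fewer than 2 prior repairs on this sigil
REPAIR_WINDOW     = 15 # A6: only repair within 15 of the target
MAX_REPAIR_DIFF   = 3  # A7: trivial (1), straightforward (2), formidable (3)
MIN_REPAIR_MARGIN = 2  # A8: assumed margin = stat - difficulty, must be >= 2
MAX_REPAIR_DANGER = 18 # A9: no repairs above danger 18

def repair_allowed?(repair_count:, precision:, target:, difficulty:, stat:, danger:)
  repair_count < MAX_REPAIRS &&
    precision >= target - REPAIR_WINDOW &&
    difficulty <= MAX_REPAIR_DIFF &&
    stat - difficulty >= MIN_REPAIR_MARGIN &&
    danger <= MAX_REPAIR_DANGER
end

puts repair_allowed?(repair_count: 0, precision: 78, target: 90,
                     difficulty: 2, stat: 5, danger: 12) # => true
puts repair_allowed?(repair_count: 0, precision: 60, target: 90,
                     difficulty: 2, stat: 5, danger: 12) # => false (too far from target)
```

Writing the gate this way makes each threshold a single named constant, which is what an isolated experiment on one row would change.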
B. Algorithm Decision Logic
| ID | Line | Logic | Assumption | Status |
|---|---|---|---|---|
| B1 | 648-661 | `margin > 1` any, `margin > 0` for challenging+ | Viability cutoffs | EXP-12 (margin>=0 reverted twice, +6.6pp mishap) |
| B2 | 233 | Prefer highest difficulty | Max difficulty → max gain | EXP-7 (kept, confirmed) |
| B3 | 226 | Skip ACTION verb | ACTION has 24.8% zero-gain rate | EXP-6 (confirmed) |
| B4 | 200-213 | Repair when `stat - difficulty < 2` AND `difficulty >= 3` | Pre-scan threshold for repair candidates | Untested: the `difficulty >= 3` filter means we never repair for trivial/straightforward |
| B5 | 301-303 | Refresh when no action available | Only alternative is quitting the sigil | Game mechanic, but refreshes have a hidden clarity cost (Finding 2) |
C. Known Bug
| ID | Line | Bug | Impact |
|---|---|---|---|
| C1 | 162 | SCRIBED misclassification | ACTIVE: 50% of all "SCRIBED" results across all versions are fakes. After loop exit (including mishaps), any sigil with precision >= target-5 is classified as SCRIBED even if no scribing occurred. Confirmed from raw logs: 10 of 20 "SCRIBED" results have zero "You carefully scribe" game messages. Additionally, the analyzer (line 566) inherits the bug by assigning `stop_reason=:scribed` based on the script's result field. CRITICAL FIX: (1) Script: track an `@actually_scribed` flag, (2) Analyzer: use `scribe_count > 0`, not the result field. |
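A minimal sketch of the script side of the C1 fix, using a simplified end-of-sigil classification. Only `@actually_scribed` and the `target - 5` rule come from the table above; the `SigilOutcome` class, its methods, and the `'NOT_SCRIBED'` label are hypothetical stand-ins for the script's real result handling.

```ruby
# Hypothetical, simplified model of end-of-sigil classification (C1).
class SigilOutcome
  def initialize(target)
    @target = target
    @actually_scribed = false # set only when a real scribe happens
  end

  # Call from the branch that handles "You carefully scribe".
  def record_scribe
    @actually_scribed = true
  end

  # Buggy rule: anything that ends within 5 of target counts as SCRIBED,
  # even when the loop exited through a mishap or a vanished sigil.
  def buggy_result(precision)
    precision >= @target - 5 ? 'SCRIBED' : 'NOT_SCRIBED'
  end

  # Fixed rule: trust the flag, not the final precision.
  def fixed_result
    @actually_scribed ? 'SCRIBED' : 'NOT_SCRIBED'
  end
end

mishap_at_85 = SigilOutcome.new(90)
puts mishap_at_85.buggy_result(85) # => SCRIBED (a C1 fake)
puts mishap_at_85.fixed_result     # => NOT_SCRIBED
```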
D. Untested Game Mechanics
| ID | Question | Current Position | Measurable From Logs? |
|---|---|---|---|
| D1 | Does danger affect mishap probability? | Data says no (uniform 0-18) | Yes, measured in Finding 3 |
| D2 | Does clarity affect precision gain? | Weak signal (+0.6 from 70-79 to 90-99) | Yes, measured in Finding 2 |
| D3 | Resource consumption per difficulty | ~4.2 stars per action (Finding 1) | Partially; needs delta analysis per difficulty level |
| D4 | Do refreshes cost resources? | Finding 1 says 0 resources, +1 danger | Yes, measured |
| D5 | Is there a game iteration soft cap? | No evidence (max seen: 18) | Observational only |
| D6 | Does repair restore the target resource? | Game text implies yes | Measurable from resource snapshots |
| D7 | `repair_count` tracking | Initialized but never incremented in analyzer | Code gap: parser never counts repairs |
E. Isolatable Experiments by Priority (post-EXP-13 revert)
Original priority list, superseded by the simulation results below.
| Priority | What to test | Which assumption | How to isolate |
|---|---|---|---|
| 1 | | A3 | KILLED by SIM-7: all scribes start at 13 |
| 2 | | A5 | BLOCKED: SIM-4 inconclusive, need D7 fix first |
| 3 | | A6 | KILLED by SIM-8: no sigils exhaust near target |
| 4 | Resource-specific projection (not sum-all) | A1 sub | SIM-3 validated fungibility; deprioritized |
| 5 | Fix SCRIBED misclassification | C1 | Add `@actually_scribed` flag; promoted to Phase 0 |
| 6 | Resource consumption per difficulty level | D3 | SIM-2 validated flat rate; resolved, no experiment |
| NEW | Resource bail-out threshold | A1 | SIM-3 finding: 38.5 precision wasted per sigil |
See "Simulation-Based Testing Chronology" below for the updated experiment sequence.
Eight simulations were run against v1.5.17 data (22 sessions, 555 worked sigils, 3 scribed) to classify each hypothesis from the Code-Level Assumption Audit as:
- Validated by logs: answered from existing data, no live experiment needed
- Killed by simulation: simulation shows the hypothesis has no impact
- Informs experiment: simulation guides the design of a live experiment
- Inconclusive: a data gap prevents reliable simulation
Script: `scratchpad/assumption_simulations.rb`
SIM-1: SCRIBED Misclassification Bug (C1): CORRECTED, BUG IS ACTIVE
| Metric | SIM-1 result | Corrected (raw log audit) |
|---|---|---|
| Total SCRIBED results | 3 | 3 |
| True scribes (scribe_count > 0) | 3 | 1 (Fidon, 4 scrolls) |
| Misclassified (scribe_count = 0) | 0 | 2 (Byd, Refia) |
CORRECTION: SIM-1 reported 0 misclassifications because the analyzer's `determine_stop_reason` (line 566) assigns `:scribed` for ANY `result=SCRIBED`, inheriting the script's buggy classification. The simulation checked `stop_reason != :scribed`, which can never detect C1 because the analyzer trusts the script's result field.
A raw log audit using `scribe_count` (from actual "You carefully scribe" messages) reveals that 2 of 3 v1.5.17 "SCRIBED" results are C1 fakes:
- Byd #96: precision 85, "Sigil harvesting failed" (mishap), 0 scrolls produced
- Refia #84: precision 88, "all traces of the sigil have vanished", 0 scrolls
- Fidon #37: precision 86, "Final precision: 86, scribing", 4 scrolls (REAL)
Across ALL versions: 20 `result=SCRIBED`, 10 real, 10 C1 fakes, a 50% misclassification rate. The bug is NOT dormant; it actively inflates scribe counts.
Classification: CRITICAL bug fix, actively corrupting data. Phase 0 priority.
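The raw log audit judges each sigil by counted game messages rather than by the reported result. A minimal sketch, assuming one sigil's log lines have already been isolated (for example with `session_splitter.rb`); the `audit_sigil` helper and its output keys are hypothetical, and only the quoted game text comes from the audit above.

```ruby
SCRIBE_MESSAGE = 'You carefully scribe' # game text used by the audit

# Hypothetical audit helper: judge one sigil by its raw log lines
# instead of by the result field the script reported.
def audit_sigil(lines, reported_result)
  scribe_count = lines.count { |line| line.include?(SCRIBE_MESSAGE) }
  {
    scribe_count: scribe_count,
    real_scribe: scribe_count.positive?,
    c1_fake: reported_result == 'SCRIBED' && scribe_count.zero?
  }
end

# Illustrative input only; not copied from a real session log.
vanished = ['all traces of the sigil have vanished']
p audit_sigil(vanished, 'SCRIBED')[:c1_fake] # => true
```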
SIM-2: Resource Consumption Per Difficulty (D3)
| Iteration Range | N | Avg cost/iter | Total avg consumed |
|---|---|---|---|
| 1-5 | 22 | 2.99 | 12.9 |
| 6-10 | 184 | 2.26 | 19.3 |
| 11-14 | 347 | 2.14 | 25.5 |
| 15 | 2 | 2.67 | 40.0 |
Resource consumption is roughly flat at ~2.1-3.0 stars/iter regardless of how deep into a sigil we are. The higher initial rate (iterations 1-5) likely reflects higher starting resources enabling more expensive actions early. This is consistent with the per-action cost of ~4.2 stars (Finding 1).
Classification: Validated by logs; no experiment needed.
SIM-3: Resource Fungibility (A1)
| Resource | Avg at exit | Median | Zero% |
|---|---|---|---|
| sanity | 7.4 | 7 | 1.1% |
| resolve | 7.2 | 7 | 2.0% |
| focus | 7.4 | 7 | 1.1% |
- Imbalanced exits (one resource=0, another>=3): 22 (4.0%)
- Balanced exits: 533
- Avg remaining stars at exit: 22.0
- 22.0 × 1.75 = 38.5 projected precision wasted per sigil
Resources deplete EVENLY, confirming that the sum-all projection is valid. But the MAJOR finding is that sigils exit with 22 stars remaining on average. This means the bail-out formula (resource projection coefficient 1.75, line 264) is TOO AGGRESSIVE: it triggers the resource exhaustion exit while significant resources remain, leaving 38.5 projected precision on the table per sigil.
Classification: Validated (fungible) + MAJOR FINDING: bail-out aggressiveness is the highest-impact tuning target. See Testing Chronology Phase 2.
SIM-4: Repair Cap (A5)
The repair_count field is never populated in the analyzer (code gap D7). The simulation
estimated repairs by counting action menu items where aspect = resource name. This
methodology is FLAWED: it counts all menu items offered, not algorithm-selected repairs.
Results (0-38 "repairs" per sigil, 99.8% "at cap") are misleading.
Classification: Inconclusive. Must fix D7 first, collect 10+ sessions with real `repair_count` tracking, then re-simulate.
SIM-5: Fate of High-Precision Sigils (A2/C1)
| Sigil | Start | Peak | Final | Result | Iters | Danger |
|---|---|---|---|---|---|---|
| #96 | 13 | 85 | 85 | SCRIBED | 12 | 17 |
| #37 | 13 | 86 | 86 | SCRIBED | 14 | 18 |
| #84 | 13 | 88 | 88 | SCRIBED | 13 | 18 |
Only 3 of 555 worked sigils ever reached precision 85+. The analyzer reports all 3 as "scribed", but the raw log audit (SIM-1 correction) reveals that only 1 actually scribed:
| Sigil | Start | Peak | Final | Real? | Actual outcome |
|---|---|---|---|---|---|
| #96 (Byd) | 13 | 85 | 85 | FAKE | Mishap at 85 ("Sigil harvesting failed") |
| #37 (Fidon) | 13 | 86 | 86 | REAL | Scribed, 4 scrolls produced |
| #84 (Refia) | 13 | 88 | 88 | FAKE | Sigil vanished at 88 |
2 of 3 sigils that reached 85+ were LOST to mishap or vanish; only 1 successfully scribed. The C1 bug window is NOT narrow: it hits 67% of 85+ sigils in our data.
All 3 started at precision 13, took 12-14 iterations, and reached danger 17-18 at peak.
Classification: Partially validated, C1 impact severe. Reaching 85+ does not guarantee scribing; the high danger (17-18) at 85+ means significant mishap risk remains.
SIM-6: Refresh Cost (D4)
| Metric | With refreshes (N=339) | Without refreshes (N=216) |
|---|---|---|
| Avg refreshes per sigil | 1.6 | 0 |
| Resource cost/iter | 2.08 stars | 2.44 stars |
| Danger/iter | 1.08 | 0.77 |
Refreshes consume 0 resources (the lower per-iteration cost reflects refresh iterations that don't drain resources) but add danger (+0.31 danger/iter compared to non-refresh sigils). This confirms Finding 2: refreshes carry a hidden cost through clarity and danger accumulation. No algorithmic change is indicated, since we already minimize refreshes.
Classification: Validated; no experiment needed.
SIM-7: Skip Threshold Sensitivity (A3): CRITICAL FINDING
| Threshold | Work | Skip | >=80 | Scribes | Lost >=80 | Lost scribes | Iters saved |
|---|---|---|---|---|---|---|---|
| Skip <12 | 555 | 0 | 14 | 3 | 0 | 0 | 0 |
| Skip <13 | 555 | 0 | 14 | 3 | 0 | 0 | 0 |
| Skip <14 | 342 | 213 | 8 | 0 | 6 | 3 | 2259 |
| Skip <15 | 183 | 372 | 7 | 0 | 7 | 3 | 3899 |
| Skip <16 | 1 | 554 | 0 | 0 | 14 | 3 | 5818 |
ALL 3 scribes started at precision 13. Any skip threshold above <13 eliminates ALL scribes from the v1.5.17 dataset. Skip <14 also loses 6 of 14 >=80 sigils. Skip <15 (the threshold from EXP-13) loses 7 of 14 >=80 sigils AND all 3 scribes.
This INVALIDATES Finding 4 for post-Awakened testing. The original analysis used combined v1.5.15+v1.5.17 data where the pre-Awakened v1.5.15 data showed 0 scribes from start=13. With Awakened active, the precision boost is sufficient for start=13 sigils to reach 90. The current skip <13 threshold is correct and MUST NOT be raised.
Classification: KILLED. The skip <15 hypothesis is dead; the current <13 is optimal.
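SIM-7 is a replay: apply each candidate threshold to the recorded starting precision of every worked sigil and count what would have been lost. A minimal sketch, assuming each record carries `start`, `peak`, and `scribed` fields; the `Sigil` struct, the `replay_skip` method, and the sample records are hypothetical and only reproduce the shape of the table (the "Iters saved" column is omitted).

```ruby
# Hypothetical worked-sigil record parsed from session logs.
Sigil = Struct.new(:start, :peak, :scribed, keyword_init: true)

# Replay the rule "skip if starting precision < threshold".
def replay_skip(sigils, threshold)
  skipped, worked = sigils.partition { |s| s.start < threshold }
  {
    work: worked.size,
    skip: skipped.size,
    reached_80: worked.count { |s| s.peak >= 80 },
    scribes: worked.count(&:scribed),
    lost_80: skipped.count { |s| s.peak >= 80 },
    lost_scribes: skipped.count(&:scribed)
  }
end

# Made-up records for illustration, not the v1.5.17 dataset.
sample = [
  Sigil.new(start: 13, peak: 86, scribed: true),
  Sigil.new(start: 13, peak: 62, scribed: false),
  Sigil.new(start: 15, peak: 81, scribed: false)
]
[13, 14, 16].each { |t| puts "skip <#{t}: #{replay_skip(sample, t)}" }
```

Because the replay only uses the starting precision, it can be run for any threshold without new sessions, which is why SIM-7 could kill the <15 hypothesis offline.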
SIM-8: Repair Proximity Threshold (A6)
| Distance from target | Count |
|---|---|
| 0-5 (near target) | 0 |
| 6-15 (in repair range) | 0 |
| 16-30 (outside repair range) | 2 |
| 31+ (far from target) | 65 |
All 67 resource-exhausted sigils were 16+ precision from target. Zero were in the 6-15 range where the repair proximity threshold operates. The threshold is irrelevant because resource exhaustion only hits sigils far from target โ sigils near target have been efficiently progressing and don't exhaust resources.
Classification: KILLED. Widening or removing the threshold has zero impact.
| ID | Hypothesis | SIM | Classification | Action |
|---|---|---|---|---|
| C1 | SCRIBED misclassification bug | SIM-1, SIM-5 | ACTIVE: 50% misclassification rate | CRITICAL bug fix (Phase 0) |
| D3 | Resource consumption varies by difficulty | SIM-2 | Flat ~2.1-3.0 stars/iter | Validated; no experiment |
| A1 | Resources are fungible (sum-all valid) | SIM-3 | Even depletion (4% imbalanced) | Validated; no experiment |
| A1-sub | Bail-out threshold too aggressive | SIM-3 | 38.5 precision wasted/sigil | Experiment (Phase 2) |
| A5 | Repair cap of 2 is binding | SIM-4 | Estimation method flawed | Inconclusive; fix D7 first |
| A2 | target-5 scribe margin | SIM-5 | 85+ does NOT guarantee scribe (1/3 real) | Needs investigation |
| D4 | Refreshes cost resources | SIM-6 | 0 resources, +0.31 danger/iter | Validated; no experiment |
| A3 | Skip <15 for target 90 | SIM-7 | All 3 scribes start at 13 | KILLED; stay at <13 |
| A6 | Repair proximity threshold matters | SIM-8 | 0 sigils exhaust near target | KILLED; no experiment |
Score: 5 validated by logs, 2 killed by simulation, 1 informs experiment, 1 inconclusive.
Ordered experiment sequence informed by simulation results. Each phase depends on the previous phase being complete.
Phase 0: Infrastructure & Bug Fixes: DONE (v1.5.20 + v1.5.21)
| Item | What | Status |
|---|---|---|
| Revert EXP-13 | Restore v1.5.17 algorithm as baseline | Done (v1.5.20) |
| Fix C1 | Add `@actually_scribed` flag | Done (v1.5.20); validated: 0 fake SCRIBEDs |
| Fix D7 | Increment `repair_count` in analyzer parser | Done (v1.5.20); code correct, 0 repairs in sample |
| Fix EXP-9 omission | Restore resource exhaustion check | Done (v1.5.21); the check was accidentally dropped in v1.5.20 |
v1.5.20 was deployed and tested (11 sessions), which revealed the missing EXP-9 resource check (0% resource_exhausted vs 12.1% baseline). Fixed in v1.5.21.
Phase 1: Complete EXP-14 Analysis: DONE, REVERT
- Collected 11 v1.5.19 sessions (all 11 characters)
- Ran on the EXP-13 code base (not rebased; both EXP-13 and EXP-14 are now marked REVERT)
- Results: mishap rate 74.8% (+11.4pp vs EXP-13 base), 0 real scribes, no metric improvement
- Cost equalization removed the cost penalty for dangerous actions → more mishaps
- See EXP-14 detailed results above
Phase 1.5: Corrected Baseline (v1.5.21): DONE, CONFIRMED
- 11 sessions, all metrics match v1.5.17 within normal variance
- >=80/session: 0.64 (exact match), mishap: 40.8% (vs 42.0%), resource_exhausted: 13.4% (restored)
- Key finding: 4 of 7 >=80 sigils were stopped by the move budget at 80-84 with 1 iteration remaining
- C1 validated (0 fakes), D7 code correct (0 repairs observed; genuinely rare)
- Corrected baseline established. Ready for Phase 2.
Phase 2a: EXP-15 (v1.5.22): DONE, REVERT
Change: `(14 - @num_iterations)` → `(15 - @num_iterations)` in the move budget formula.
| Metric | v1.5.21 (baseline) | v1.5.22 (EXP-15) | Delta |
|---|---|---|---|
| Worked | 262 | 259 | -3 |
| >=80 | 7 (0.64/sess) | 6 (0.55/sess) | -1 |
| Scribes | 0 | 0 | 0 |
| Mishap rate | 40.8% | 52.5% | +11.7pp (Z=2.67) |
| moves_exhausted | 104 (39.7%) | 47 (18.1%) | -21.6pp |
| iteration_cap | 0 (0%) | 4 (1.5%) | +1.5pp |
| resource_exhausted | 35 (13.4%) | 37 (14.3%) | +0.9pp |
| sigil_vanished | 16 (6.1%) | 32 (12.4%) | +6.3pp |
The formula change freed ~57 sigils from budget exits (104 → 47). Of those, ~29 mishapped, ~16 vanished, and 4 reached the iteration cap (3 at precision 83, 2 short of the scribe threshold). The old off-by-one was functioning as a safety guardrail: extending sigils costs more in mishaps than it gains.
moves_exhausted distribution shifted rightward by 1 iteration (each cohort got 1 more iter):
- v1.5.21: iter 10(5), 11(28), 12(42), 13(23), 14(6)
- v1.5.22: iter 12(8), 13(23), 14(16)
Lesson: Don't extend sigils deeper into the danger zone. The bottleneck is mishap rate at high iterations (24% at iter 11, 20% at iter 12), not the budget formula.
Phase 2b: EXP-16 (v1.5.23): DONE, REVERT
Change: resource exhaustion coefficient 1.75 → 1.5 in `sigil_info`.
Results: total wipeout. 0 worked sigils, 1718 skipped (100%), 0 scribes.
The coefficient 1.5 is mathematically impossible for target 90:
- Max starting resources: 15 + 15 + 15 = 45 stars
- Available at coeff 1.5: 45 × 1.5 + precision = 67.5 + precision
- Threshold: target - 5 = 85
- Need: 67.5 + precision ≥ 85 → precision ≥ 17.5, i.e. a starting precision of 18 or more
- Starting precision is almost never 18+, so ALL sigils bail on iteration 0
At coeff 1.75: 45 × 1.75 + 13 = 91.75 ≥ 85, so the check passes. Minimum viable coefficient: (85 - 13) / 45 = 1.6
Of 1718 total sigils, 1388 were skipped by the "below 13" threshold and 330 passed it but immediately hit the resource exhaustion exit. The resource check is evaluated on iteration 0 with full resources; at 1.5, even full resources plus precision 13 give only 80.5, below the 85 threshold.
Post-mortem: This was a calculation error in experiment design. The coefficient determines
the minimum starting precision at full resources. The relationship should have been checked:
(target - 5 - min_starting_precision) / max_resources = (85 - 13) / 45 = 1.6. Any
coefficient below 1.6 makes it impossible for precision-13 sigils (the skip threshold) to
even start. The 1.75 coefficient already provides minimal headroom (91.75 vs 85 threshold).
Future coefficient experiments should target 1.65-1.70 range, not below 1.6.
Lesson: Always verify the boundary condition: can a sigil at the skip threshold (precision 13) with max resources (45 stars) pass the resource check? If not, the coefficient is too low.
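The boundary condition in this lesson can be checked mechanically before launching a coefficient experiment. A minimal sketch, assuming the bail-out condition quoted in the EXP-17 rationale, `(sanity + resolve + focus) * coeff + precision < target - 5`; the `resource_exhausted?` and `min_viable_coeff` method names are hypothetical.

```ruby
MAX_STARS = 45 # 15 + 15 + 15 at full resources

# Bail-out condition as quoted elsewhere on this page (true = bail out).
def resource_exhausted?(sanity:, resolve:, focus:, precision:, target:, coeff:)
  (sanity + resolve + focus) * coeff + precision < target - 5
end

# Smallest coefficient that lets a sigil at the skip threshold pass the
# check on iteration 0 with full resources.
def min_viable_coeff(target:, skip_threshold:, max_stars: MAX_STARS)
  (target - 5 - skip_threshold).to_f / max_stars
end

full = { sanity: 15, resolve: 15, focus: 15, precision: 13, target: 90 }
puts min_viable_coeff(target: 90, skip_threshold: 13) # => 1.6
puts resource_exhausted?(**full, coeff: 1.75)         # => false (91.75 >= 85)
puts resource_exhausted?(**full, coeff: 1.5)          # => true  (80.5 < 85)
```

Running the candidate coefficient through this check at the skip threshold would have flagged EXP-16 before any sessions were spent.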
Phase 2c: EXP-14 Retest, Clean Cost Equalization (v1.5.24): DONE, KEPT
Change: `@action_cost` from `{ taxing: 1, disrupting: 2, destroying: 3 }` to `{ taxing: 1, disrupting: 1, destroying: 1 }`. Tested against the confirmed v1.5.21 baseline.
| Metric | v1.5.21 (baseline) | v1.5.24 (retest) | Delta |
|---|---|---|---|
| Worked | 262 | 263 | +1 |
| Worked/session | 23.8 | 23.9 | +0.1 |
| Scribed (real) | 0 | 2 | +2 |
| >=80 | 7 | 3 | -4 (Fisher p=0.22, n.s.) |
| Mishap rate | 40.8% | 39.2% | -1.6pp (Z=0.39, p=0.70, n.s.) |
| moves_exhausted | 104 (39.7%) | 98 (37.3%) | -2.4pp |
| resource_exhausted | 35 (13.4%) | 32 (12.2%) | -1.2pp |
| sigil_vanished | 16 (6.1%) | 28 (10.6%) | +4.5pp (Z=1.88, p=0.06, marginal) |
| Avg gain/iter | 7.2 | 7.3 | +0.1 |
| Danger at mishap | 7.8 | 7.1 | -0.7 |
Scribes (both C1-validated, 4 scrolls each):
- Barrask #87: prec 92/90, 11 iters, danger 11, start 15 (efficient: low danger)
- Refia #62: prec 86/90, 15 iters, danger 18, start 14
>=80 detail (3 total, 2 scribed):
- Barrask #87: start 15 → 92, scribed (4 scrolls)
- Refia #62: start 14 → 86, scribed (4 scrolls)
- Refia #92: start 14 → 80, mishap at danger 11
Key findings:
- Mishap rate UNCHANGED (Z=0.39, p=0.70). The confounded EXP-14 showed +11.4pp; the clean test shows -1.6pp (noise). The original "harmful" conclusion was wrong, and the "accidental diversification" theory is refuted: equalizing costs has no measurable effect on mishap rate when tested against baseline code.
- 2 real scribes in 11 sessions: the best single-test result since the Awakened technique. Counts are small (not statistically significant) but directionally positive.
- sigil_vanished marginally up (p=0.06). Monitor, but no action needed: not significant at p<0.05, and no code change would explain this (only cost tiebreaking changed).
- >=80 down 7→3 but not significant (Fisher p=0.22). Notably, 2/3 >=80 sigils scribed (67% conversion) vs 0/7 in baseline (0% conversion).
- Code is now correct: `@action_cost` accurately reflects that each label describes WHICH resource, not HOW MUCH. The {1,2,3} mapping was provably wrong.
Verdict: KEPT. Cost equalization is neutral. v1.5.24 becomes the new baseline.
Phase 2d: EXP-17, Resource-Aware Tiebreaker (v1.5.25): KEPT
Experiment selection analysis (v1.5.24 data, 2,570 iterations with parsed actions):
Four candidate experiments were evaluated for v1.5.25:
| Option | Change | Mechanism | Effect size | Risk |
|---|---|---|---|---|
| A: Resource-aware tiebreaker | When 2+ actions share highest difficulty, prefer the action draining the most-available resource | Preserves scarce resources, extending productive iterations | 9.4% of iterations (241/2,570 ties) | Low: only changes tiebreaking |
| B: Resource coefficient 1.75→1.65 | Lower bail-out threshold | Retains ~1 more sigil per session | ~4.5 precision points headroom | Low, but tiny effect |
| C: Danger-aware throttling | Reduce difficulty when danger is high | Reduce mishap rate | 68% of mishaps at danger <10: weak signal | Medium: requires a model of the mishap function |
| D: Move budget 13→11 | Lower precision/move coefficient | Bail fewer sigils | Wrong direction: bails MORE sigils | N/A (excluded) |
Why Option A (resource-aware tiebreaker):
- Measurable frequency: Fires in 9.4% of iterations (241 ties out of 2,570). That's ~24 tiebreaker decisions per 11-session test, enough to detect an effect.
- 100% heterogeneous cost profiles: Every observed tie involves actions that drain DIFFERENT resources (e.g., one taxing/resolve, one destroying/sanity). This means every tie offers a real choice, and the tiebreaker always has a meaningful preference to express.
- Tie distribution (balanced across all resource pairs):
  - taxing/destroying (resolve vs sanity): ~33%
  - disrupting/taxing (focus vs resolve): ~33%
  - disrupting/destroying (focus vs sanity): ~33%
  - 3-way ties: <1%
- Direct mechanism: Resource conservation extends the productive phase. When resources are asymmetric (e.g., sanity=12, focus=5, resolve=8), draining the abundant resource (sanity) instead of the scarce one (focus) avoids hitting the resource exhaustion bail-out prematurely. The bail-out check uses `(sanity + resolve + focus) * 1.75 + precision < 85`, so preserving the total resource pool matters.
- Low risk: Only fires when two actions are already tied on difficulty (same expected gain) and cost (same impact weight). The change never overrides the primary selection criterion (highest difficulty) or the secondary (lowest cost). It only resolves what was previously an arbitrary first-encountered-wins tie.
Implementation (lines 243-258 of sigilharvest.lic):
The existing action selection has two levels:
- Prefer highest difficulty (EXP-7, determines precision gain)
- Break ties by lowest cost/impact (conserve resources)
With cost equalization (EXP-14 retest, all costs = 1), level 2 never fires. EXP-17 adds
a third level: when difficulty AND cost are tied, prefer the action whose resource label
corresponds to the highest current resource level (contest_stat_for).
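```ruby
# Level 3 tiebreaker (EXP-17). This branch continues the existing
# if/elsif chain that implements levels 1 and 2 (not shown).
elsif x['difficulty'] == sigil_action['difficulty'] && x['impact'] == sigil_action['impact']
  # Same difficulty and same cost: prefer the action whose resource is most abundant
  if contest_stat_for(x['resource']) > contest_stat_for(sigil_action['resource'])
    sigil_action = x
  end
```
Resource mapping: `contest_stat_for('sanity')` → `@sanity_lvl`, `'resolve'` → `@resolve_lvl`, `'focus'` → `@focus_lvl`. These are already parsed from the game's star display each iteration.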
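For readers without the source at hand, here is a self-contained sketch of all three selection levels over a plain array of candidate actions. It assumes each action is a hash with `'difficulty'`, `'impact'`, and `'resource'` keys, mirroring the fragment above; `select_action` and the `levels` hash are hypothetical stand-ins for the script's loop and `contest_stat_for`, and levels 1 and 2 are reconstructed from the prose, not copied from lines 243-258.

```ruby
# Hypothetical stand-alone version of the three-level action selection.
def select_action(candidates, levels)
  candidates.reduce(nil) do |best, x|
    next x if best.nil?

    if x['difficulty'] > best['difficulty']
      x # Level 1: highest difficulty wins
    elsif x['difficulty'] == best['difficulty'] && x['impact'] < best['impact']
      x # Level 2: lower cost/impact breaks the tie
    elsif x['difficulty'] == best['difficulty'] && x['impact'] == best['impact'] &&
          levels.fetch(x['resource']) > levels.fetch(best['resource'])
      x # Level 3 (EXP-17): drain the most abundant resource
    else
      best
    end
  end
end

levels = { 'sanity' => 12, 'focus' => 5, 'resolve' => 8 }
menu = [
  { 'difficulty' => 5, 'impact' => 1, 'resource' => 'focus' },
  { 'difficulty' => 5, 'impact' => 1, 'resource' => 'sanity' },
  { 'difficulty' => 3, 'impact' => 1, 'resource' => 'resolve' }
]
puts select_action(menu, levels)['resource'] # => sanity
```

With all costs equalized to 1, level 2 never fires, so in this example the choice between the two difficulty-5 actions falls straight through to level 3.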
Why not the other options:
- B (coefficient): At 1.75, the formula gives `45 × 1.75 + 13 = 91.75` vs threshold 85. Changing to 1.65 gives `45 × 1.65 + 13 = 87.25`, leaving only 2.25 points of headroom (4.5 points less). The effect is too small to measure reliably in 11 sessions.
- C (danger-aware): 68% of mishaps occur at danger <10, suggesting danger doesn't strongly predict mishap probability. Without a validated mishap model, any throttling rule is speculative. Needs more data analysis before experimenting.
- D (move budget): Lowering the coefficient from 13 to 11 means the formula bails MORE sigils (declares them hopeless earlier). This shrinks the candidate pool, which is the wrong direction.
EXP-17 Results (12 sessions, 11 characters, Shard/permutation/target=90/60min):
- Sessions: 12 (11 complete, 1 incomplete: Kythkani fragment, 322 lines)
- Logs: `~/SH_logs/v1.5.25/`
- Baseline: v1.5.24 (EXP-14 retest, 11 sessions)
- C1 audit: 1 real scribe (Byd, 4 scrolls, precision 87). No C1 misclassifications.
| Metric | v1.5.24 (baseline) | v1.5.25 (EXP-17) | Delta |
|---|---|---|---|
| Sessions | 11 | 12 | +1 |
| Worked | 263 | 254 | -9 |
| Scribed | 2 | 1 | -1 |
| Mishap rate (per sigil) | 39.2% | 33.9% | -5.3pp |
| Mishap rate (per iter) | 3.8% | 3.2% | -0.6pp |
| Avg gain/productive iter | 7.3 | 7.1 | -0.2 |
| Avg iters/sigil | 10.3 | 10.6 | +0.3 |
| Refresh rate | 9.5% | 8.9% | -0.6pp |
| Failed actions | 116 | 136 | +20 |
| Stop Reason | v1.5.24 | v1.5.25 | Delta |
|---|---|---|---|
| moves_exhausted | 98 (37.3%) | 117 (46.1%) | +8.8pp |
| mishap | 103 (39.2%) | 86 (33.9%) | -5.3pp |
| resource_exhausted | 32 (12.2%) | 27 (10.6%) | -1.6pp |
| sigil_vanished | 28 (10.6%) | 23 (9.1%) | -1.5pp |
| scribed | 2 (0.8%) | 1 (0.4%) | -0.4pp |
Analysis:
- Mishap rate directionally improved (39.2%→33.9% per sigil, 3.8%→3.2% per iter). Two-proportion z-test: z = -1.25, p ≈ 0.21, not statistically significant at p<0.05. Consistent with the hypothesis, but the sample is too small to confirm.
- Stop-reason shift is mechanically coherent: fewer resource-exhausted exits (32→27) and fewer mishaps (103→86), with more moves_exhausted exits (98→117). Sigils survive longer (avg iters 10.3→10.6), dying to the move budget instead of resource depletion or mishaps. This is exactly what resource conservation should produce.
- Scribe count (2→1) is in the noise. We've consistently seen 0-2 real scribes per 11-session run across all versions. Not a meaningful signal.
- Failed actions increased (116→136). The tiebreaker may sometimes pick an action whose resource is abundant but which has a higher failure rate. Worth monitoring, but not alarming at this sample size.
- Precision gain marginally lower (7.3→7.1). Expected: the tiebreaker resolves ties that were previously arbitrary, sometimes choosing a slightly different action. The tradeoff is resource conservation vs marginal per-iteration gain.
Verdict: KEPT. The tiebreaker is a zero-risk third-level selection rule that fires in ~9.4% of iterations. No degradation in any critical metric. Directional improvement in mishap rate and resource exhaustion. Mechanically coherent stop-reason shift. The change is too small to achieve significance in 11 sessions, but there is no signal of harm and the mechanism is sound. v1.5.25 becomes the new baseline.
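The significance figures on this page (for example z = -1.25 above, and Z=0.39 in the EXP-14 retest) are consistent with a pooled two-proportion z-test. A minimal sketch, assuming that test and a two-sided normal p-value computed with Ruby's built-in `Math.erfc`; the `two_proportion_z` name is hypothetical, and the counts come from the EXP-17 stop-reason table.

```ruby
# Pooled two-proportion z-test with a two-sided p-value.
def two_proportion_z(hits_a, n_a, hits_b, n_b)
  p_a = hits_a.to_f / n_a
  p_b = hits_b.to_f / n_b
  pooled = (hits_a + hits_b).to_f / (n_a + n_b)
  se = Math.sqrt(pooled * (1 - pooled) * (1.0 / n_a + 1.0 / n_b))
  z = (p_b - p_a) / se
  p_value = Math.erfc(z.abs / Math.sqrt(2)) # two-sided normal tail
  [z, p_value]
end

# Mishaps per worked sigil: v1.5.24 baseline (103/263) vs v1.5.25 (86/254).
z, p = two_proportion_z(103, 263, 86, 254)
puts format('z = %.2f, p = %.2f', z, p) # => z = -1.25, p = 0.21
```

The sign follows "new minus baseline", so a negative z means fewer mishaps in the new version.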
To validate the cumulative effect of all changes since the original script, an instrumented
v1.2.0 baseline was created (sigilharvest-v120-baseline.lic) and run for 11 sessions under
identical conditions (Shard/permutation/target=90/60min/Inspired+Enlightened+Illuminated+Awakened).
What the v1.2.0 baseline includes (instrumentation only, no algorithm impact):
- C1 fix (`@actually_scribed` flag): required for accurate scribe counting
- `resolve_burin`/`get_burin`/`stow_burin`: infrastructure parity
- Difficulty fix (formidable=3, challenging=4, difficult=5): confirmed bug fix
- Cost equalization ({1,2,3}→{1,1,1}): confirmed correct mapping
What the v1.2.0 baseline retains (original algorithm):
- Risk-based action selection (low risk far from target, high risk near target)
- ACTION verb accepted (not filtered)
- Skip threshold <10 (not <13)
- No iteration cap (removed per Urbaj's correction)
- Original bail-out coefficients (2.25/15) with <=80 guards
- No resource-aware tiebreaker
Logs: ~/SH_logs/v1.2.0/DR-*.log (11 sessions, Feb 4 2026)
| Metric | v1.2.0 (original algo) | v1.5.25 (current) | Delta |
|---|---|---|---|
| Sessions | 11 | 12 | +1 |
| Total sigils found | 983 | 1,284 | +301 |
| Worked | 414 | 254 | -160 |
| Skipped | 569 (57.9%) | 1,030 (80.2%) | +22.3pp |
| Scribed | 0 | 1 | +1 |
| >=80 precision | 7 (1.7%) | 6 (2.4%) | +0.7pp |
| Avg precision | 54 | 51 | -3 |
| Best precision | 86 | 87 | +1 |
| Metric | v1.2.0 | v1.5.25 | Delta |
|---|---|---|---|
| Avg iters/sigil | 10.9 | 10.6 | -0.3 |
| Avg gain/productive iter | 8.4 | 7.1 | -1.3 |
| Refresh rate | 13.9% | 8.9% | -5.0pp |
| Productivity rate | 45.1% | 50.7% | +5.6pp |
| Failed actions | 172 | 136 | -36 |
| Repairs detected | 2 | 0 | -2 |
| Metric | v1.2.0 | v1.5.25 | Delta |
|---|---|---|---|
| Mishap rate (per sigil) | 50.7% | 33.9% | -16.8pp |
| Mishap rate (per iter) | 4.6% | 3.2% | -1.4pp |
| Danger at mishap (avg) | 8.6 | 8.1 | -0.5 |
| Stop Reason | v1.2.0 | v1.5.25 |
|---|---|---|
| mishap | 210 (50.7%) | 86 (33.9%) |
| moves_exhausted | 167 (40.3%) | 117 (46.1%) |
| sigil_vanished | 35 (8.5%) | 23 (9.1%) |
| resource_exhausted | 2 (0.5%) | 27 (10.6%) |
| scribed | 0 | 1 (0.4%) |
- v1.5.25 wins on quality, v1.2.0 wins on quantity. v1.2.0 works 63% more sigils (414 vs 254) because skip<10 attempts everything starting at precision 10+. But v1.5.25's skip<13 filters out low-value sigils, yielding a higher 80+ rate (2.4% vs 1.7%) and the only scribe. v1.5.25 also finds more total sigils per session (107 vs 89) because it moves through rooms faster.
- Mishap rate is the dominant difference: 50.7% vs 33.9% per sigil, 4.6% vs 3.2% per iter. The v1.2.0 risk-based selection exposes sigils to more danger: picking low-difficulty actions early wastes iterations without reducing danger, then switching to high-difficulty actions late increases exposure at peak danger. v1.5.25's always-highest strategy is more efficient.
- v1.2.0 gets higher per-action gain (8.4 vs 7.1) but wastes more on refreshes (13.9% vs 8.9%). The risk-based selection picks high-difficulty actions near target (gain 13+) but low-difficulty ones far from target (gain 2-5), producing more refreshes when low-risk actions don't yield viable follow-ups. v1.5.25's constant difficulty preference is more consistent, with fewer wasted iterations.
- Resource exhaustion check validates EXP-9. v1.2.0 uses the original 2.25 coefficient: only 2 exits (0.5%). v1.5.25's 1.75 coefficient catches 27 sigils (10.6%) that would burn resources without reaching target.
- C1 audit: The only SCRIBED result found in the v1.2.0 directory was a C1 fake from an old file (Saelia, Feb 1). Our 11 new sessions produced 0 real scribes.
Every change from v1.2.0 to v1.5.25 is empirically validated:
| Change | Version | Mechanism | Measured Effect |
|---|---|---|---|
| Difficulty fix | v1.5.10 | Correct formidable ranking | Eliminates wrong-action near target |
| ACTION verb filter | v1.5.10 | Skip zero-gain verb | -120 wasted iters/10 sessions |
| Skip <13 | v1.5.12 | Filter low-value sigils | +0.7pp 80+ rate, +301 sigils found |
| Difficulty-first selection | v1.5.13 | Always pick highest gain | -5.0pp refresh rate, +5.6pp productivity |
| Resource coeff 2.25→1.75 | v1.5.15 | Earlier bail-out on hopeless sigils | Catches 10.6% vs 0.5% resource exits |
| Iteration cap | v1.5.17 | Limit mishap exposure | -16.8pp mishap rate |
| C1 fix | v1.5.20 | Accurate scribe classification | 44% of old SCRIBEDs were fakes |
| Cost equalization | v1.5.24 | Correct resource mapping | Neutral (labels != amount) |
| Resource tiebreaker | v1.5.25 | Preserve scarce resources | Directional mishap improvement |
Verdict: The original algorithm works harder but less efficiently. v1.5.25 works smarter: fewer sigils attempted, but each one has better odds, lower mishap exposure, and more accurate instrumentation.
Bug: The DRC.message('Executing aspect repair') log line was gated behind if @debug
(line 321). Since production runs don't use debug mode, the analyzer's REPAIR_ACTION pattern
(line 171 of sigilharvest_analyzer.rb) never matched anything. Result: repair_count was
always 0, making Phase 3 repair analysis impossible.
Fix: Removed if @debug from the repair log message. One-line change. The analyzer
already has the detection code โ it just never found the pattern in non-debug logs.
v1.5.26 Results (11 sessions, 255 worked sigils, 2606 iterations):
| Metric | v1.5.26 | v1.5.25 (baseline) | Delta |
|---|---|---|---|
| Worked | 255 | 254 | +1 |
| Scribed | 1 (Fidon, 3 scrolls, prec=93) | 1 (Byd, 4 scrolls, prec=87) | 0 |
| Mishap/sigil | 42.0% | 33.9% | +8.1pp (p=0.06, n.s.) |
| Mishap/iter | 4.1% | 3.2% | +0.9pp |
| Productivity | 51.1% | 50.7% | +0.4pp |
| Avg gain | 7.18 | 7.09 | +0.09 |
| Reached 80+ | 2.0% | 2.4% | -0.4pp |
| Repairs | 0 | 0 | 0 |
No algorithm change. Mishap uptick is run-to-run variance (z=1.88, p=0.06).
3 combat_distracted exits (new stop reason, enemies in sigil rooms).
D7 Validation: Repairs are non-existent. Zero repairs in 255 worked sigils (2606 iterations). Grep confirms zero "Executing aspect repair" messages across all v1.5.26 logs.
Why repairs don't trigger: The repair path requires `!sigil_action.key?("difficulty")`, meaning no precision action is available. With difficulty-first selection (v1.5.20+), the algorithm virtually always finds a viable precision action. The 224 refreshes (8.6%) happen at the execution level (game RNG), not the selection level.
Historical comparison: v1.5.17 (risk-based selection, debug mode) triggered 5 repairs across ~300 worked sigils (~1.7% of sigils). All occurred at high precision (75-81) in late iterations (10-13). Results:
- 2/5 succeeded: recovered a resource, no precision change
- 3/5 caused mishaps: sigil destroyed (60% mishap rate on repairs)
Phase 3: CLOSED. Repairs are a non-factor with the current algorithm. They don't happen, and when they did (v1.5.17), they were actively harmful (60% mishap rate). No experiment needed.
Phase 4: CLOSED. The repair difficulty filter (requires difficulty >= 3) is moot: loosening it would allow more repairs, but repairs themselves are counterproductive.
With repairs closed, the next question: where do we get precision gains? Monte Carlo simulation (100k sigils per scenario) comparing gain and mishap levers.
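A minimal Monte Carlo sketch of the kind of lever comparison used below. It is not the simulator behind the tables: the starting precision (13), target (90), iteration cap, productivity, and the uniform gain model are all illustrative assumptions, so its absolute rates will not match the tables.

```ruby
# Minimal Monte Carlo sketch of the gain vs. mishap levers.
# ASSUMPTIONS (illustrative only, not the simulator behind the tables):
#   - each sigil starts at precision 13 and is scribed on reaching 90
#   - each iteration: a mishap with probability mishap_per_iter ends the sigil;
#     otherwise, with probability productivity, precision rises by a gain drawn
#     uniformly from [0, 2 * avg_gain] (mean avg_gain)
#   - at most max_iters iterations per sigil
def simulate_scribe_rate(avg_gain:, mishap_per_iter:, productivity: 0.5,
                         start: 13, target: 90, max_iters: 15,
                         n: 100_000, seed: 1)
  rng = Random.new(seed)
  scribed = 0
  n.times do
    # Draw this sigil's random numbers up front so runs with the same seed
    # share identical draws (common random numbers); scenario comparisons are
    # then not swamped by independent sampling noise.
    draws = Array.new(max_iters) { [rng.rand, rng.rand, rng.rand] }
    precision = start.to_f
    draws.each do |u_mishap, u_productive, u_gain|
      break if u_mishap < mishap_per_iter # sigil destroyed

      precision += 2.0 * avg_gain * u_gain if u_productive < productivity
      next if precision < target

      scribed += 1
      break
    end
  end
  scribed.fdiv(n)
end

if __FILE__ == $PROGRAM_NAME
  base = simulate_scribe_rate(avg_gain: 7.2, mishap_per_iter: 0.041, n: 20_000)
  gain = simulate_scribe_rate(avg_gain: 8.2, mishap_per_iter: 0.041, n: 20_000)
  safe = simulate_scribe_rate(avg_gain: 7.2, mishap_per_iter: 0.0205, n: 20_000)
  puts format('base %.3f, +1.0 gain %.3f, -50%% mishaps %.3f', base, gain, safe)
end
```

With common random numbers, raising `avg_gain` or lowering `mishap_per_iter` can never lower the scribe rate for a fixed seed, so the direction of each lever is not an artifact of noise.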
Why v1.2.0 has higher avg gain (8.39 vs 7.18):
The gain-per-difficulty-level is identical between versions (trivial=2-3, difficult=13-14). The difference is in how often each difficulty level is selected:
| Gain range (est. difficulty) | v1.2.0 | v1.5.26 | Delta |
|---|---|---|---|
| 1-3 (trivial) | 4.5% | 25.8% | +21.3pp |
| 4-5 (straightforward) | 25.2% | 19.3% | -5.9pp |
| 6-8 (formidable) | 26.2% | 19.1% | -7.1pp |
| 9-11 (challenging) | 18.5% | 13.7% | -4.8pp |
| 12-16 (difficult) | 25.6% | 22.1% | -3.5pp |
v1.5.26 produces 5.7x more trivial-range gains. Both algorithms pick "highest difficulty available," but v1.5.26's skip<13 threshold works more low-starting-precision sigils where the game may offer weaker action menus. The effective gain per iteration (avg gain × productivity) is similar: v1.2.0 = 3.78, v1.5.26 = 3.67. v1.2.0's higher per-action gain is partially offset by lower productivity (45.1% vs 51.1%).
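A small sketch of the bucketing and the effective-gain arithmetic above. The bucket boundaries come from the gain-range table (with the difficult bucket left open-ended, as in the later "12+" row); `GAIN_BUCKETS`, `gain_bucket` and `effective_gain` are hypothetical helpers, not functions from `sigilharvest_analyzer.rb`.

```ruby
# Gain ranges from the table above; the names are estimated difficulties.
GAIN_BUCKETS = {
  trivial: 1..3,
  straightforward: 4..5,
  formidable: 6..8,
  challenging: 9..11,
  difficult: 12..Float::INFINITY
}.freeze

# Returns the bucket name for an observed gain, or nil if none matches.
def gain_bucket(gain)
  GAIN_BUCKETS.find { |_name, range| range.cover?(gain) }&.first
end

# Effective gain per iteration = avg gain of productive actions
# × fraction of iterations that were productive.
def effective_gain(avg_gain, productivity)
  avg_gain * productivity
end

puts format('v1.2.0 %.2f, v1.5.26 %.2f',
            effective_gain(8.39, 0.451), effective_gain(7.18, 0.511))
# prints: v1.2.0 3.78, v1.5.26 3.67
```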
Gain optimization (scribe rate by avg gain):
| Scenario | Scribe% | >=80% | Multiplier |
|---|---|---|---|
| Current v1.5.26 (7.2) | 5.9% | 8.6% | 1.0x |
| +0.5 gain (7.7) | 8.4% | 11.8% | 1.4x |
| +1.0 gain (8.2) | 11.9% | 15.9% | 2.0x |
| v1.2.0 actual gains (8.4) | 12.4% | 16.9% | 2.1x |
| +1.5 gain (8.7) | 15.0% | 19.4% | 2.6x |
| +2.0 gain (9.2) | 18.7% | 23.5% | 3.2x |
+1.0 avg gain roughly doubles the scribe rate (2.0x); +2.0 gives 3.2x.
Mishap reduction (scribe rate by mishap rate):
| Scenario | Scribe% | >=80% | Multiplier |
|---|---|---|---|
| Current (4.1%/iter) | 6.1% | 8.9% | 1.0x |
| -25% mishaps (3.1%/iter) | 7.0% | 10.0% | 1.15x |
| -50% mishaps (2.1%/iter) | 7.8% | 11.2% | 1.28x |
| -75% mishaps (1.0%/iter) | 9.0% | 12.9% | 1.48x |
| No mishaps (0%/iter) | 10.3% | 14.6% | 1.69x |
Even eliminating ALL mishaps gives only 1.69x. Halving mishaps gives 1.28x.
Combined analysis:
| Scenario | Scribe% | Multiplier |
|---|---|---|
| Baseline | 5.8% | 1.0x |
| Gain +1.0 alone | 11.7% | 2.0x |
| Mishap -50% alone | 7.8% | 1.3x |
| Both: gain+1.0 & mishap-50% | 15.3% | 2.7x (super-additive) |
| v1.2.0 gains & no mishaps | 20.8% | 3.6x |
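A quick check of the super-additivity claim using the rounded values in the table: summing the individual improvements predicts about 2.4x, compounding them predicts about 2.7x, and the combined scenario lands near 2.6x, close to the compounding prediction. `combined_multipliers` is a hypothetical helper.

```ruby
# Scribe rates (%) from the Combined analysis table above (rounded values).
def combined_multipliers(baseline:, gain_only:, mishap_only:, both:)
  g = gain_only / baseline
  m = mishap_only / baseline
  {
    additive: g + m - 1,   # individual improvements simply summed
    multiplicative: g * m, # levers compounding independently
    observed: both / baseline
  }
end

r = combined_multipliers(baseline: 5.8, gain_only: 11.7, mishap_only: 7.8, both: 15.3)
puts format('additive %.2fx, multiplicative %.2fx, observed %.2fx',
            r[:additive], r[:multiplicative], r[:observed])
# prints: additive 2.36x, multiplicative 2.71x, observed 2.64x
```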
Conclusions:
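- Gain optimization is roughly 3x more impactful than mishap reduction: +1.0 avg gain raises the multiplier by ~1.0 (to 2.0x), while halving mishaps raises it by ~0.3 (to 1.3x)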
- They stack super-additively (combined 2.7x vs additive 2.4x)
- Priority: gain optimization first, mishap reduction second
- Recovering v1.2.0's gain level (+1.2) without its mishap penalty is the ideal target
- The mishap rate difference between v1.2.0 and v1.5.26 is small (4.6% vs 4.1%/iter); the gain gap is not caused by risk tolerance, it's caused by action menu composition
- Hypothesis: Trivial-difficulty (1) precision actions produce avg gain of 2.3, far below the 6.8-13.3 for formidable-difficult. v1.5.26 gets 25.8% trivial-range gains vs v1.2.0's 4.5%. Skipping trivial actions and refreshing for a better menu is +EV: the probability of getting a non-trivial action next iteration is ~74%, and the expected gain from that (74% * 8.0 = 5.9) greatly exceeds the trivial gain (2.3).
- Change: Add `return false if difficulty < 2` at the top of `precision_action_viable?` (line 693). When the highest-difficulty action in the menu is trivial, the algorithm will refresh (analyze the sigil) instead of taking the trivial action.
- Sessions: 11 (all characters, Shard, permutation, target=90, 60min)
- Logs: `~/SH_logs/v1.5.27/`
- Baseline: v1.5.26 (11 sessions)
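A simplified sketch of the change and of the expected-value reasoning in the hypothesis. The guard line is quoted from the change description; everything else (the one-argument `precision_action_viable?`, the `choose` and `refresh_ev` helpers, modeling the menu as a list of difficulties) is a hypothetical stand-in for the real code around line 693.

```ruby
# Hypothetical, simplified stand-in: the real precision_action_viable?
# (line 693) takes different arguments and has further checks.
def precision_action_viable?(difficulty)
  return false if difficulty < 2 # v1.5.27: reject trivial (difficulty 1)

  true # remaining viability checks elided
end

# Difficulty-first: only the hardest precision action on the menu is
# considered; if it is not viable, refresh (analyze) instead of acting.
def choose(menu_difficulties)
  best = menu_difficulties.max
  return :refresh if best.nil? || !precision_action_viable?(best)

  [:precision, best]
end

# One-step expected value from the hypothesis: refreshing yields a
# non-trivial action with probability p_non_trivial, averaging avg_gain.
# Like the hypothesis, this ignores the iteration a refresh consumes.
def refresh_ev(p_non_trivial, avg_gain)
  p_non_trivial * avg_gain
end

p choose([1])    # => :refresh
p choose([1, 4]) # => [:precision, 4]
puts format('refresh EV %.1f vs trivial gain 2.3', refresh_ev(0.74, 8.0))
# prints: refresh EV 5.9 vs trivial gain 2.3
```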
Results (11 sessions, 253 worked sigils, 2686 iterations):
| Metric | v1.5.27 | v1.5.26 | Delta | v1.2.0 |
|---|---|---|---|---|
| Worked | 253 | 255 | -2 | 414 |
| Avg gain | 8.54 | 7.18 | +1.36 | 8.39 |
| Productivity | 43.9% | 51.1% | -7.2pp | 45.1% |
| Effective gain/iter | 3.75 | 3.67 | +0.08 | 3.78 |
| Mishap/sigil | 38.7% | 42.0% | -3.3pp (n.s. p=0.46) | 50.7% |
| Mishap/iter | 3.65% | 4.11% | -0.46pp | 4.64% |
| Reached 60+ | 32.0% | 25.1% | +6.9pp | 33.3% |
| Reached 70+ | 10.7% | 7.8% | +2.9pp | 10.6% |
| Reached 80+ | 2.0% | 2.0% | 0.0pp | 1.7% |
| Resource exhausted | 5.9% | 11.8% | -5.9pp | 0.5% |
| Refresh rate | 17.2% | 8.6% | +8.6pp | 13.9% |
| Repairs | 3 | 0 | +3 | 2 |
| Scribed | 1 (Barrask, 4 scrolls, prec=92) | 1 (Fidon, 3 scrolls, prec=93) | 0 | 0 |
Gain distribution by range:
| Gain range | v1.5.27 | v1.5.26 | v1.2.0 |
|---|---|---|---|
| Trivial (1-3) | 3.6% | 25.8% | 4.5% |
| Straightfwd (4-5) | 24.9% | 19.3% | 25.2% |
| Formidable (6-8) | 24.9% | 19.1% | 26.2% |
| Challenging (9-11) | 20.0% | 13.7% | 18.5% |
| Difficult (12+) | 26.6% | 22.1% | 25.6% |
Stop reasons:
| Reason | v1.5.27 | v1.5.26 | Delta |
|---|---|---|---|
| moves_exhausted | 45.8% | 39.2% | +6.6pp |
| mishap | 38.7% | 42.0% | -3.3pp |
| sigil_vanished | 9.1% | 5.5% | +3.6pp |
| resource_exhausted | 5.9% | 11.8% | -5.9pp |
| scribed | 0.4% | 0.4% | 0.0pp |
Analysis:
- Gain distribution transformed: Trivial-range gains dropped from 25.8% to 3.6%, close to v1.2.0's 4.5%. The other brackets rebalanced roughly proportionally, and the v1.5.27 gain distribution is now essentially identical to v1.2.0's.
- Avg gain exceeded v1.2.0: 8.54 vs 8.39. The trivial filter plus difficulty-first selection produces slightly higher gains than v1.2.0's risk-based selection, likely because difficulty-first more reliably picks the highest-difficulty action when one is available.
- Effective gain/iter improved: Despite 7.2pp lower productivity (more refreshes), the +1.36 avg gain more than compensates. Effective gain: 3.75 vs 3.67 (+0.08), now nearly matching v1.2.0's 3.78.
- Resource exhaustion halved: 11.8% → 5.9%. Refreshes cost 0 resources, so more refreshing means less resource drain per iteration. This is a significant structural improvement: fewer sigils bail out due to resource depletion.
- 60+ rate +6.9pp: More sigils reach high-precision tiers (32.0% vs 25.1%), close to v1.2.0's 33.3%.
- 3 repairs detected: The trivial filter creates conditions where no precision action passes viability, allowing the repair path to trigger. These are the first repairs seen in v1.5.20+, confirming that the D7 fix works and that repairs can happen when the filter is stricter.
- Min gain = 3: Confirms trivial-difficulty (gain 2-3) actions are being filtered. The remaining gain=3 entries are likely straightforward actions rolling low.
- moves_exhausted +6.6pp: More sigils reach the move budget limit, at a higher avg precision (55.6 vs 53.3). These are sigils that got further but couldn't finish.
Verdict: KEEP. The most measurably effective change since EXP-6 (difficulty fix). It achieved exactly what the gain optimization analysis predicted: it recovered v1.2.0's gain level while maintaining a lower mishap rate. The gain distribution is now structurally optimal.
Killed / No Experiment Needed
| Hypothesis | Why killed | SIM |
|---|---|---|
| Skip <15 for target 90 (A3) | ALL 3 scribes start at 13; raising the threshold above <13 would skip every sigil that scribed | SIM-7 |
| Repair proximity threshold (A6) | 0 sigils exhaust resources near target; threshold never matters | SIM-8 |
| Resource fungibility test (A1) | Even depletion confirmed (4% imbalanced); sum-all is valid | SIM-3 |
| Resource consumption by difficulty (D3) | Flat ~2.1-3.0 stars/iter; no variation to exploit | SIM-2 |
| Refresh cost experiment (D4) | 0 resources, +0.31 danger/iter; already minimize refreshes | SIM-6 |
The following ideas were simulated against 4097 worked sigils / 40018 iterations and found to be neutral or harmful:
- Scribe at target-5 from iteration 8+: Only 1 of 4097 sigils ever peaked at 85+ and then fell back below 85 (Jazriel #15, peak=85 at iter 13, mishapped to final=3). The scenario this addresses is vanishingly rare. No expected benefit.
- Consecutive refresh limit: All thresholds (2-5 max streak) had strongly negative net impact (-8.7 to -26.6). Refresh streaks don't predict failure: they're temporary bad menu RNG, and sigils recover from them.
- Hard iteration cap reduction: Any cap below 13 costs more 80+ sigils than it creates (net -1.8 to -21.3). Current effective cap of ~14-15 is already optimal. Note (Feb 2026): External feedback indicates the game has no hard iteration cap. EXP-13 tests REMOVING the cap entirely (the opposite direction). This killed idea tested LOWERING the cap โ still correct that lower caps are harmful.
- Single-resource floor bail-out: Catastrophically negative at all thresholds (net -36 to -46). Individual resource depletion doesn't predict failure โ the game uses different resources for different actions.
- Quality actions as refresh fallback (from pass #1): Quality actions cost resources but give zero precision gain; refreshes cost nothing. Strictly worse.
- Total resource bail-out threshold (from pass #1): 60% of affected sigils still improve after hitting low resources. Hard cutoff harms more sigils than it helps.
- Background: Awakened requires Illuminated as a prerequisite. The wiki says all technique bonuses are globally disabled; Illuminated was confirmed to have no effect in v1.5.9. Tested last to keep the isolation clean: the per-difficulty median gain analysis (consistent across all v1.5.2 to v1.5.9 data) provides a technique-sensitive metric unaffected by algorithm changes.
- Wiki description: "Enchanters will find Awakened Sigil Comprehension allows the scribing of many more sigils from a single perception, vastly simplifying the harvesting process." This implies the technique increases the number of SCROLLS producible per scribed sigil, not precision gains.
- Change: Version tick only (v1.5.15 → v1.5.17). No algorithm change. All characters trained Awakened Sigil Comprehension before running.
- Baseline: v1.5.15 (same algorithm, without Awakened)
- Depends on: All algorithm experiments completed first.
- Sessions: 22 total (11 characters × 2 batches). Batch 1: 11 sessions. Batch 2: 11 sessions. Characters: Barrask, Byd, Christus, Fidon, Gnarta, Jazriel, Kythkani, Mahtra, Nelis, Refia, Throve.
- Logs: `~/SH_logs/v1.5.17/` (7 merged files for split reconnections + 4 single files)
Raw results (22 sessions combined):
| Metric | v1.5.15 (10 sess) | v1.5.17 (22 sess) | Delta |
|---|---|---|---|
| Worked | 265 | 555 | +290 |
| result=SCRIBED (reported) | 1 | 3 | +2 |
| Real scribes (log audit) | 0 | 1 | +1 |
| C1 misclassifications | 1 | 2 | +1 |
| >=80 | 1 | 14 | +13 |
| >=80/session | 0.10 | 0.64 | +0.54 |
| Avg precision | 51.2 | 52.8 | +1.6 |
| Max precision | 85 | 88 | +3 |
| Mishap rate | 43.0% | 42.0% | -1.0pp |
| resource_exhausted | 26 | 67 | +41 |
SCRIBE COUNT CORRECTION (C1 bug audit): Raw log audit checking for actual "You carefully scribe" game messages reveals the reported scribe counts are inflated by the C1 misclassification bug (line 162):
v1.5.15: 1 reported SCRIBED, Throve #112 (precision 85, mishapped, "Sigil harvesting failed"). 0 real scribes, 1 C1 fake.
v1.5.17: 3 reported SCRIBED:
- Byd #96 (precision 85, mishapped, "Sigil harvesting failed") → C1 FAKE
- Fidon #37 (precision 86, "Final precision: 86, scribing", 4 scrolls) → REAL
- Refia #84 (precision 88, sigil vanished) → C1 FAKE
Corrected: 0 real scribes (v1.5.15) → 1 real scribe (v1.5.17). Half of the "Scribed: 1 → 3" delta (+2) in the original analysis was a C1 artifact; the real change is +1.
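The audit rule above (a reported SCRIBED only counts if the game's "You carefully scribe" text appears) can be sketched as a small classifier. `classify_scribe` is a hypothetical helper, and grouping raw log lines per sigil is assumed to happen elsewhere (for example in the analyzer).

```ruby
# Hypothetical audit helper. reported: the result the script logged for a
# sigil; lines: that sigil's raw game-log lines (grouping assumed).
SCRIBE_TEXT = 'You carefully scribe'

def classify_scribe(reported, lines)
  return :not_reported unless reported == 'SCRIBED'

  lines.any? { |line| line.include?(SCRIBE_TEXT) } ? :real : :c1_fake
end

p classify_scribe('SCRIBED', ['Sigil harvesting failed']) # => :c1_fake
```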
Scrolls-per-scribe analysis (the metric the wiki implies Awakened affects):
The game mechanic: after scribing, the game says "Remnants of the sigil pattern linger, allowing for additional scribing" โ each "Remnants" message enables one more scribe attempt. The last scribe does NOT produce a "Remnants" message.
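A sketch of counting scrolls for one scribe event under that mechanic. It assumes each "You carefully scribe" line is one scroll and that a complete event has exactly one fewer "Remnants" line than scrolls; `scribe_event_summary` and the per-event grouping of lines are hypothetical.

```ruby
# Hypothetical helper. lines: the log lines of one scribe event.
SCRIBE_LINE = 'You carefully scribe'
REMNANTS_LINE = 'Remnants of the sigil pattern linger'

# Each scribe message is assumed to be one scroll. Every scribe except the
# last is followed by a Remnants message, so a complete event should have
# remnants == scrolls - 1.
def scribe_event_summary(lines)
  scrolls = lines.count { |l| l.include?(SCRIBE_LINE) }
  remnants = lines.count { |l| l.include?(REMNANTS_LINE) }
  { scrolls: scrolls, remnants: remnants,
    complete: scrolls.positive? && remnants == scrolls - 1 }
end
```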
| Version | Character | Precision | Scrolls | Awakened? |
|---|---|---|---|---|
| v1.5.3 | Mahtra | 90 | 2 | No |
| v1.5.7 | Kythkani | 90 | 4 | No |
| v1.5.8 | Throve | 88 | 4 | No |
| v1.5.9 | Barrask | 85 | 3 | No |
| v1.5.9 | Gnarta | 89 | 2 | No |
| v1.5.9 | Jazriel | 88 | 2 | No |
| v1.5.14 | Kythkani | 86 | 3 | No |
| v1.5.17 | Fidon | 86 | 4 | Yes |
Pre-Awakened (7 events): mean 2.86 scrolls, median 3, range 2-4
Post-Awakened (1 event): 4 scrolls
Insufficient data to determine whether Awakened increases scrolls-per-scribe. The post-Awakened sample has N=1, and 4 scrolls already occurred in 2 of 7 pre-Awakened events. More real scribe events are needed to measure this.
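The pre-Awakened summary can be recomputed from the table's scroll column; `scroll_stats` is a hypothetical helper. The same counts show 4 scrolls in 2 of the 7 pre-Awakened events.

```ruby
# Scroll counts for the 7 pre-Awakened scribe events in the table above.
PRE_AWAKENED_SCROLLS = [2, 4, 4, 3, 2, 2, 3].freeze

def scroll_stats(counts)
  sorted = counts.sort
  mid = sorted.size / 2
  median = sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  { mean: counts.sum.fdiv(counts.size), median: median,
    min: sorted.first, max: sorted.last, fours: counts.count(4) }
end

s = scroll_stats(PRE_AWAKENED_SCROLLS)
puts format('mean %.2f, median %s, range %d-%d, events with 4 scrolls: %d',
            s[:mean], s[:median], s[:min], s[:max], s[:fours])
# prints: mean 2.86, median 3, range 2-4, events with 4 scrolls: 2
```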
Batch consistency:
| Metric | Batch 1 (11 sess) | Batch 2 (11 sess) |
|---|---|---|
| Worked | 301 | 254 |
| >=80 | 6 (2.0%) | 8 (3.1%) |
| >=80/session | 0.55 | 0.73 |
Statistical significance (>=80 metric, NOT affected by the C1 correction):
- Baseline >=80 rate: 1/265 = 0.38%
- Test >=80 rate: 14/555 = 2.52% (6.7x improvement)
- Two-proportion Z-test: Z = 2.14 (p ≈ 0.016, one-tailed)
- Poisson model (treating baseline rate as known): Z ≈ 8.2, but this overstates confidence by not accounting for uncertainty in the baseline rate estimate
- Both batches individually above baseline; batch 2 slightly stronger
- The 14 >=80 sigils include the 2 C1 fakes (precision 85, 88) โ they DID reach >=80 precision, they just didn't successfully scribe. The metric is valid.
- Caveat: The improvement is statistically significant but the mechanism is unknown. Gain distributions, starting precisions, iteration counts, and work rates are all identical between v1.5.15 and v1.5.17. See "What Awakened actually does" section below.
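Both test statistics can be reproduced from the counts above (1/265 baseline, 14/555 test). This is a sketch using the pooled two-proportion z-test and a Poisson z that treats the baseline rate as exact; both rely on normal approximations, which are rough with a single baseline event. Function names are hypothetical.

```ruby
# Pooled two-proportion z statistic for rates x1/n1 (baseline) vs x2/n2 (test).
def two_proportion_z(x1, n1, x2, n2)
  p1 = x1.fdiv(n1)
  p2 = x2.fdiv(n2)
  pooled = (x1 + x2).fdiv(n1 + n2)
  se = Math.sqrt(pooled * (1 - pooled) * (1.0 / n1 + 1.0 / n2))
  (p2 - p1) / se
end

# Upper-tail probability of the standard normal distribution.
def one_tailed_p(z)
  0.5 * Math.erfc(z / Math.sqrt(2))
end

# Poisson approximation that treats the baseline rate as exactly known.
def poisson_z(x_base, n_base, x_test, n_test)
  expected = n_test * x_base.fdiv(n_base)
  (x_test - expected) / Math.sqrt(expected)
end

z = two_proportion_z(1, 265, 14, 555)
puts format('two-proportion z=%.2f (one-tailed p=%.3f), Poisson z=%.1f',
            z, one_tailed_p(z), poisson_z(1, 265, 14, 555))
# prints: two-proportion z=2.14 (one-tailed p=0.016), Poisson z=8.2
```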
Decision: KEEP Awakened technique on all characters, but the mechanism is UNCERTAIN.
Despite no algorithm changes between v1.5.15 and v1.5.17, the >=80 rate improved from 0.38% to 2.52%. However, the wiki describes Awakened as a scrolls-per-scribe effect ("allows scribing of many more sigils from a single perception"), which does NOT predict precision improvement. Detailed comparison shows gain distributions, starting precisions, iteration counts, and work rates are all identical. The mechanism of the >=80 improvement is unexplained โ it could be from Awakened (undocumented effect), or an uncontrolled confound (different test dates, server conditions). The two-proportion Z-test gives Z=2.14 (p~0.016), significant but not overwhelming. All future experiments should continue with Awakened trained (no downside risk).
What Awakened actually does (mechanism unknown):
The wiki says: "allows scribing of many more sigils from a single perception", which describes a scrolls-per-scribe effect (more copies from each scribed sigil), NOT a precision improvement. Yet we observe more sigils reaching 80+ with Awakened trained.
Detailed mechanism analysis (comparing v1.5.15 vs v1.5.17 directly):
- Gain-per-action distribution: IDENTICAL. v1.5.15 avg=8.2, v1.5.17 avg=8.4. Same bimodal shape (peaks at 2-3 and 13-15). Awakened does NOT boost gain per action.
- Starting precision distribution: IDENTICAL. Both avg=14.0, median=14. Same proportions across buckets. Awakened does NOT change starting positions.
- Iteration counts: IDENTICAL. Both avg ~10.4-10.5, median 11, same distribution. Awakened does NOT grant more iterations.
- Work rate: IDENTICAL. 20.5% vs 19.8%. Same skip threshold, same behavior.
- Mishap rate by bracket: Similar overall (~42%), but v1.5.17 has slightly HIGHER mishap rates at 60-79 precision (51-53% vs 39-44%). Not a protective effect.
- >=80 rate: 0.38% → 2.52%. This is the ONLY metric that differs.
The attribution to Awakened was based on process-of-elimination reasoning: "no algorithm changed between v1.5.15 and v1.5.17, so the improvement must be from training Awakened." However, the wiki description does not predict this effect, and no per-action metric shows any change. Possible explanations:
- Awakened has an undocumented effect we can't measure at the per-action level
- Confound: sessions were run on different dates, so server conditions, seasonal effects, or undocumented game patches could contribute
- Statistical power: the two-proportion Z-test gives Z=2.14 (p≈0.016), significant but not overwhelming. The Poisson-model figure (Z≈8.2) may overstate confidence.
Status: KEEP Awakened trained (no downside). Attribution UNCERTAIN: correlation observed but mechanism unexplained by wiki description or per-action data. Continue collecting scribe data to test the wiki's scrolls-per-scribe claim.
When running a new experiment, record results here:
#### EXP-N: <name> (v<version>)
- Sessions: <count> (<list of characters>)
- Logs: ~/SH_logs/v<version>/
| Metric | Baseline | This exp | Delta |
|--------|----------|----------|-------|
| Worked | | | |
| Avg precision | | | |
| Sigils >= 80 | | | |
| Mishap rate | | | |
| Min per 80+ | | | |
- **Verdict**: KEEP / REVERT
- **Action**: <what was done>