Study 1: The Descriptor Lorebook that never was (any good) - TravelingRobot/NAI_Community_Research GitHub Wiki

A cautionary tale about confirmation bias.

Motivation

The original goals of this study were:

  • Show how my guide & template for semi-blind testing could be used to answer interesting questions about optimal usage of NAI.
  • Explore how to optimize my Descriptor Lorebook. What placement works best, whether to use separators, how many tokens long it should be etc.
  • Investigate whether the lorebook should be structured differently for Euterpe than for Krake.

Some background about the Descriptor Lorebook first:

The Descriptor Lorebook

The descriptor lorebook originates from the "Summaries, Sentiments, and Scenes" lorebook by Magenta.Darkstar. Part of that lorebook is a "Summary" entry of 681 tokens, which activated on Summary: and looked like this:

Summary: she glanced at her love and smiles.
The mahogany-haired adolescent girl glanced fleetingly at her rugged paramour, a crystalline sparkle in her eyes as she gazed happily upon his countenance. It was filled with an expression as enigmatic as shadows in the night. She pondered thoughtfully whether it would behoove her to request that she continue to follow him on his noble mission.
Summary: Saphira was angry.
Saphira’s muscled sides expanded and contracted as the great bellows of her lungs forced air through her scaled nostrils. Eragon thought of the raging inferno that she could now summon at will and send roaring out of her maw. It was an awesome sight when flames hot enough to melt metal rushed past her tongue and ivory teeth without harming them.
Summary: I watch a the tide.
An alder leaf, loosened by wind, is drifting out with the tide. As it drifts, it bumps into the slender leg of a great blue heron staring intently through the rippled surface, then drifts on. The heron raises one leg out of the water and replaces it, a single step. As I watch, I, too, am drawn into the spread of silence. Slowly a bank of cloud approaches, slipping its bulged and billowing texture over the earth, folding the heron and the alder trees and my gazing body into the depths of a vast breathing being, enfolding us all within a common flesh, a common story now bursting with rain.
Summary: Humans are animals.
For much of their history and all of prehistory, humans did not see themselves as being any different from the other animals among which they lived. Hunter-gatherers saw their prey as equals, if not superiors, and animals were worshipped as divinities in many traditional cultures. The humanist sense of a gulf between ourselves and other animals is an aberration. Feeble as it is today, the feeling of sharing a common destiny with other living things is embedded in the human psyche. Those who struggle to conserve what is left of the natural environment are moved by the love of living things, biophilia, the frail bond of feeling that ties humankind to the Earth.
Summary: Feeling at peace, I walked toward the beach.
I waited another couple of minutes, then started on a walk toward the highway and the beach. The air was peculiar, the way it just hung, motionless, drifting off the water, and the only sound was the faint hiss of little breakers running over rock jetties. There weren’t any cars on Highway 90, and only one streetlamp burned about 150 yards down the road. I stood on the corner in front of the condos and looked up at our place, the dark bedroom where Cheryl was sleeping, then walked out into the middle of the empty highway and crossed to the beach side where the sand was gritty under my shoes, then came back, looking all around, soaking up everything. With the lights out things seemed to have lost their power. It was like nothing was holding anything, the resistance was gone, that little pressure that’s always against you, obliging you, keeping you in place.

The entry was intended for scene instructions: writing Summary: A fight breaks out would give you a block of purple prose describing a fight breaking out.

My idea was to build on placebomancer's experiments with Description of: (described here) and simply replace "Summary: ..." with "Description of ...:".

So now the entry activated on Description <...>: and looked like this:

Description of how she glanced at her love and smiles:
The mahogany-haired adolescent girl glanced fleetingly at her rugged paramour, a crystalline sparkle in her eyes as she gazed happily upon his countenance. It was filled with an expression as enigmatic as shadows in the night. She pondered thoughtfully whether it would behoove her to request that she continue to follow him on his noble mission.
Description of Saphira being angry:
Saphira’s muscled sides expanded and contracted as the great bellows of her lungs forced air through her scaled nostrils. Eragon thought of the raging inferno that she could now summon at will and send roaring out of her maw. It was an awesome sight when flames hot enough to melt metal rushed past her tongue and ivory teeth without harming them.
Description of how I watch the tide:
An alder leaf, loosened by wind, is drifting out with the tide. As it drifts, it bumps into the slender leg of a great blue heron staring intently through the rippled surface, then drifts on. The heron raises one leg out of the water and replaces it, a single step. As I watch, I, too, am drawn into the spread of silence. Slowly a bank of cloud approaches, slipping its bulged and billowing texture over the earth, folding the heron and the alder trees and my gazing body into the depths of a vast breathing being, enfolding us all within a common flesh, a common story now bursting with rain.
Description of Humans being animals:
For much of their history and all of prehistory, humans did not see themselves as being any different from the other animals among which they lived. Hunter-gatherers saw their prey as equals, if not superiors, and animals were worshipped as divinities in many traditional cultures. The humanist sense of a gulf between ourselves and other animals is an aberration. Feeble as it is today, the feeling of sharing a common destiny with other living things is embedded in the human psyche. Those who struggle to conserve what is left of the natural environment are moved by the love of living things, biophilia, the frail bond of feeling that ties humankind to the Earth.
Description of how, feeling at peace, I walked toward the beach:
I waited another couple of minutes, then started on a walk toward the highway and the beach. The air was peculiar, the way it just hung, motionless, drifting off the water, and the only sound was the faint hiss of little breakers running over rock jetties. There weren’t any cars on Highway 90, and only one streetlamp burned about 150 yards down the road. I stood on the corner in front of the condos and looked up at our place, the dark bedroom where Cheryl was sleeping, then walked out into the middle of the empty highway and crossed to the beach side where the sand was gritty under my shoes, then came back, looking all around, soaking up everything. With the lights out things seemed to have lost their power. It was like nothing was holding anything, the resistance was gone, that little pressure that’s always against you, obliging you, keeping you in place.

I tried this out and compared the outputs to using the "Summary:" entry or no entry at all. I was happy with the results, so I shared this lorebook as an alternative.

I later renamed that entry to "General Descriptor" and added entries for descriptions that focus on a specific sense.

Optimizing The General Descriptor?

However, I never thoroughly tested how to optimize this lorebook. Were over 600 tokens a bit much for the entry? Should I separate the descriptions from each other? Where should I place the entry? And so on.

If I am honest, I never even rigorously tested whether that lorebook worked as intended at all. By "rigorously" I mean doing robust testing, not just looking at a few outputs, deciding "I feel like this looks better" and moving on. Robust testing would mean:

  1. Generate a large number of outputs from each condition
  2. Blind myself to them so I no longer know which output was generated with which entry
  3. Rate the outputs
  4. Conduct statistical tests on those ratings to make sure differences in ratings are not just random noise

I provide a way to do this without any statistical knowledge here.
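
As a rough illustration of the four steps above, here is a minimal Python sketch. It is not the testing template itself: the function names (`blind`, `unblind`, `permutation_test`) are made up for this example, and the permutation test is just one simple option I swapped in for step 4, not necessarily the test the template runs.

```python
import random
from statistics import mean


def blind(outputs_by_condition, seed=0):
    """Shuffle all outputs and hide their condition behind a numeric id."""
    rng = random.Random(seed)
    items = [(cond, text)
             for cond, texts in outputs_by_condition.items()
             for text in texts]
    rng.shuffle(items)
    key = {i: cond for i, (cond, _) in enumerate(items)}      # keep hidden while rating
    blinded = {i: text for i, (_, text) in enumerate(items)}  # what the rater gets to see
    return blinded, key


def unblind(ratings, key):
    """Group ratings (id -> score) by condition once rating is finished."""
    grouped = {}
    for i, score in ratings.items():
        grouped.setdefault(key[i], []).append(score)
    return grouped


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean rating.

    Returns an estimated p-value: how often a random relabeling of the
    ratings gives a difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)
```

In use, you would pass `blind` a dict mapping each condition name to its list of generated outputs, rate the blinded texts without looking at `key`, then call `unblind` and compare two conditions with `permutation_test`. A small p-value suggests the difference in ratings is unlikely to be pure noise.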

I was fairly certain from comparing the outputs that my version worked as intended, and better than the "Summary" entry or no entry. However, it wouldn't hurt to make sure. After all, I knew that at least in theory I might have unintentionally made myself see only what I wanted to see (aka confirmation bias).

Also, I knew the fine-tune data for Krake contained a few instances of Summary:. So maybe for Krake Summary: ... might actually work better, and in Euterpe Description of ...: might be the way to go?

So as a first step, I wanted to do thorough testing on whether my "General Descriptor" actually worked as intended and was better than the "Summary:" entry or no entry, for both Euterpe and Krake. This would also be a way to demonstrate how my testing template could be used to test relevant questions about optimizing NAI usage and to draw confident conclusions from the results.

I started with Euterpe.

Study 1a: Euterpe & The Descriptor

Method

I used my own Semi Blind Testing Template to demonstrate how it could be used to conduct studies like this.

Test Context

I wanted to test the lorebook in full context, at least somewhat close to a typical use case. It did not need to be the most elaborately written prose, however. A full context with slightly suboptimal lorebook entries and mediocre prose might be a better test case to see whether a utility lorebook works robustly under many conditions.

Here is how I built the context: I used the Mass Effect lorebook from the sharing channel in the NAI discord. This lorebook turned out to be further from a well-constructed lorebook than I expected. The entries seem to be mostly mindlessly copy-pasted from a wiki, many with token counts of 900 or more. So I needed to make some adjustments along the way. Before every generation I checked the current context and whether any new lorebook entries had been activated. If so, I manually edited those entries to be below 150 tokens.

I prompted fairly minimally, with [ Tags: Mass Effect; Genre: science fiction ] in memory, a dinkus (***) positioned just before the story, and the following text in the story window:

[ The Citadel, Presidium ]
Shepard

I kept generating until the context was full, using Krake with the Redjack preset. I made edits and retries as necessary to keep the story at least somewhat coherent. It wasn't the most exciting, well-written or lore-friendly story, but for this purpose it did not need to be. The resulting scenario that I used for testing (including lorebook entries) can be accessed here.

The context ends with Shepard entering the hangar bay of the Citadel (a space station):

They made their way through the hangar bay. The walls and ceiling were lined with rows of fighter jets and assault ships that would be used for combat missions. In the back of the bay Shepard saw several N7 frigates that were being readied for deployment. The Spectre agents led Shepard past them and through a pair of blast doors that led to another hangar bay with a much smaller number of ships inside. This bay was full of shuttles, along with a few larger transports and cargo ships. The starships that lined the walls looked different from when he was last there. Some had been replaced by newer ships, but others looked exactly like they'd always been there, unchanged over the years since they'd been launched.

Test Conditions

The task would be to generate a description of Shepard thinking of his ship, the Normandy. I wanted to test 4 conditions:

  • Generation using Description of Shepard thinking of the Normandy: with no utility lorebook
  • Generation using Summary: Shepard is thinking of the Normandy. with no utility lorebook
  • Generation using Description of Shepard thinking of the Normandy: with my Descriptor lorebook
  • Generation using Summary: Shepard is thinking of the Normandy. with the Summary: entry¹

1: The Summary: entry was identical to the one from the "Summaries, Sentiments, and Scenes" lorebook by Magenta.Darkstar, but had slightly different positioning. The insertion order was 200 with insertion position -1. This would cause it to be positioned just before the dinkus. Pilot testing suggested that this might have a slight positive effect on the output quality. The Descriptor lorebook used the same placement.
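
To make the design explicit, the four conditions can be written down as a small table in code. This is only a sketch for bookkeeping: the `Condition` class, the condition names and `generation_plan` are invented for this example and are not part of NAI or of the testing template. The prompts and placement values are the ones stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Condition:
    name: str              # hypothetical label, used only for bookkeeping
    prompt: str            # text used to request the description
    entry: Optional[str]   # utility lorebook entry that is enabled, or None


# Placement shared by both utility entries (see the note above).
INSERTION_ORDER = 200
INSERTION_POSITION = -1

DESCRIPTION_PROMPT = "Description of Shepard thinking of the Normandy:"
SUMMARY_PROMPT = "Summary: Shepard is thinking of the Normandy."

CONDITIONS = [
    Condition("description_no_entry", DESCRIPTION_PROMPT, None),
    Condition("summary_no_entry", SUMMARY_PROMPT, None),
    Condition("description_with_descriptor", DESCRIPTION_PROMPT, "Descriptor"),
    Condition("summary_with_summary", SUMMARY_PROMPT, "Summary"),
]


def generation_plan(n_per_condition):
    """List every (condition name, run index) pair that needs a generation."""
    return [(c.name, run) for c in CONDITIONS for run in range(n_per_condition)]
```

Written this way, it is easy to see that the design crosses two prompt styles with "utility entry absent" versus "matching utility entry present", and to check that every condition gets the same number of generations.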
