Lorebook Formats for v3 - TravelingRobot/NAI_Community_Research GitHub Wiki

Table of Contents

For new users (the tl;dr)
General Notes
    Which format is better?
List of Lore Formats
    Full Prose
    Simple Prose
    NAI Caveman
    NAI Featherlite
    Tokensafe
    Cat<nip>
    JSON-Style & Dict-Style

For new users (the tl;dr)

For now, it looks like you will not be doing much wrong no matter which format you use. For beginners, I would recommend writing your entries in simple prose or full prose, depending on what feels more natural to you.

Can I just import my old AID world info in zaltys/catnip/featherlite, etc.?

In short: Yes, it just will not be ideal for NAI.

Slightly longer answer (careful: mostly just TravelingRobot's own opinion): In general, most formats seem to work okayish, but will not be ideal for NAI. NAI's models are likely to have their own strengths and weaknesses, and formats that played to Dragon's strengths might not play to Calliope's/Sigurd's. (Dragon: a much bigger base model, so better at "understanding" implied semantic relations, and a tight token/character limit, so formats had to use as few characters as possible. NAI models, by contrast: things need to be spelled out more explicitly, but there is no need to be super stingy with your characters.)

General notes on Lore Formats

A word of caution by zaltys:

I'd recommend not going overboard with format research, as NAI is always working on new models and the way the 'formats' work is likely to change without much warning.

An analogy by _Gnurro_ that I think is a good way to think about Lore entries (and context manipulation in general):

the AI can be primed like people is kinda what this whole transformers thing is about.
It's like saying "green, lush, fresh" to make people more likely to say "plants".
All the shortened formats basically do that, but people think they do some hacking/computer thing, while it's actually a language thing. Transformers are just that good at simulating how people associate words.

Which format is better?

Initial empirical results have so far found no significant difference in performance between the different formats.

List of Lore Formats

Full Prose

There is a bit of an ongoing debate about full prose versus condensed formats, with OccultSage being the main proponent of using full prose for Lore entries. While I disagree with some of his points, I'll try to present the arguments of his that I think are worth considering:

  • So far no conclusive evidence has been presented against either full prose or the more condensed formats
  • Well-written full prose entries have the distinct advantage that they can be used to guide the style of the AI. (In that case you obviously drop the [] around your lore entries)
  • In theory, the style of condensed lore entries might leak into your output (but so far this has not been observed to be a big problem for the most common condensed formats enclosed in [], even when testing in an empty prompt). The following are things I (TravelingRobot) suspect to be true (but take them with a grain of salt):
    • Full prose gives you plenty of syntactic sugar that might help make relationships between entities a bit more explicit. This could help with defining more complex relationships/concepts.
    • Condensed formats, on the other hand, let you focus on just the keywords you want to associate with a topic word. This might help make the relationship between topic word and trait stronger (less "baggage" from connecting words, as Rinter has put it).

So, full prose or a condensed format? My take is that currently it seems like you will not be doing much wrong either way. I would recommend writing your entries in whatever way makes the most intuitive sense to you, and feel free to experiment with other formats as you see fit.

Simple Prose

[ Mark is a 35 year old witty man. He is strong and has red hair. ]

Simple prose seems to work quite well in NAI (simple prose in this context = very short, simple sentences). It is also well suited for beginners. A good approach is to start with the topic word and try not to let a sentence run longer than 10 tokens. Then begin the next sentence with either the topic word of the entry or a signifier pronoun (he/she/it/his/her/its). This way the AI hopefully does not "forget" what you are talking about. Put [ ] around your entry, with a space after the opening [ and before the closing ] (so: [ Mark words words. ]). If you want to experiment with more condensed formats, transitioning to NAI caveman from concise prose is quite easy.
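The simple-prose recipe can be turned into a small checker. This is a minimal sketch in Python, not an official NAI tool: `simple_prose_entry` and `rough_token_count` are hypothetical helpers, and the token count is a crude approximation (one token per word or punctuation mark), so the "10 tokens" warning is only a hint. For exact counts you would need the model's actual tokenizer.

```python
import re


def rough_token_count(text: str) -> int:
    # Hypothetical stand-in for the model's tokenizer: one token per word or
    # punctuation mark. Real BPE tokenizers often split words further, so this
    # tends to under-count.
    return len(re.findall(r"\w+|[^\w\s]", text))


PRONOUNS = {"he", "she", "it", "his", "her", "its"}


def simple_prose_entry(topic: str, sentences: list[str], max_tokens: int = 10) -> tuple[str, list[str]]:
    """Wrap short sentences in [ ... ] and collect warnings for the rules above."""
    warnings = []
    for sentence in sentences:
        words = sentence.split()
        first = words[0].strip(".,").lower() if words else ""
        # Each sentence should open with the topic word or a signifier pronoun.
        if first not in PRONOUNS and first != topic.split()[0].lower():
            warnings.append(f"does not start with topic or pronoun: {sentence!r}")
        if rough_token_count(sentence) > max_tokens:
            warnings.append(f"may exceed {max_tokens} tokens: {sentence!r}")
    body = " ".join(s.strip() for s in sentences)
    # Space after the opening [ and before the closing ].
    return f"[ {body} ]", warnings


entry, warnings = simple_prose_entry("Mark", [
    "Mark is a 35 year old witty man.",
    "He is strong and has red hair.",
])
print(entry)     # [ Mark is a 35 year old witty man. He is strong and has red hair. ]
print(warnings)  # []
```

The usage example rebuilds the Mark entry from above without warnings; a sentence like "The tall man with the red hair walks slowly into the old tavern." would be flagged both for its opening word and its length.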

NAI Caveman

[ Kizzy plant based extraterrestrial woman nicknamed 'Kiz' ]

(all text by Monky)

  • Use the default settings for v3, except: rep penalty of 4, rep range of 2048 (if your subscription tier allows it; if not, use 1024), and rep slope of 3.06 or thereabouts.
  • It is HIGHLY RECOMMENDED to set up a Cue Card entry to replace Author's Note:
***

:[ Writing style: Descriptive and creative ];
  • Open the advanced settings for this lorebook entry and set insertion order to -400 and insertion position to -3.
  • Next, go to Options and block the tokens *** and * * * (you can still insert these manually if you want; they still function, the AI is just less likely to occasionally generate them on its own).
  • Notes: The initial run of a story may seem slightly spacey if you have a lot of LBs triggered, as the rep penalty will be at its strongest. I suggest a few paragraphs of story to start off with, to give the AI something to build off of. Around 200 words/250 tokens seems fine in my testing.

For the caveman entries themselves, the current technique I endorse is monolithic blocks, with lines set up similarly to old-school Neanderthal in AID, with some changes:

  1. Token length per line of 18 or less (INCLUDING the encapsulation in that count).
  2. Encapsulation with [] or :[]; (both variants are single tokens; I feel :[]; performs slightly better, but both are usable).
  3. Structure your blocks with the subject first, then the property of the subject being remarked upon, then details related to that property, i.e.:
[ Kizzy plant based extraterrestrial woman nicknamed 'Kiz' ]
[ Kizzy species Zellan has orange eyes and blue skin ]
[ Kizzy physiology human-like features attractive ]
[ Kizzy hair Red style short Pixie-Cut ]
[ Kizzy opaque flesh blue build slender short ]
[ Kizzy eyes orange her face human-like ]
[ Kizzy weight 100 pounds height 4'2" ]
[ Kizzy roles qualified medic pilot engineer mechanic ]
[ Kizzy behavior blunt and diction factual ]
[ Kizzy wears V5 Sky Maidens jumpsuit ]
[ Kizzy jumpsuit colored Silver and Grey with medic insignia ]
  4. Use capitalized words for attributes that are not coming through clearly. Colors, for example, often need to be capitalized so the repetition penalty doesn't wholesale substitute another color. This frees up the non-capitalized form of the word to be repeated by the AI, which is what it wants to use most of the time anyway. One exception: don't put capitalized words right next to the subject (i.e. the name), or the AI may assume it is their last name. Hairstyles, among other things, also seem to benefit from capitalization. If something isn't hitting, try capitalizing it and see if that works.
  5. Leave the settings for your general LB entries at their defaults unless you want to experiment or know what you're doing; with the settings above, this appears to work fine in Sigurd v3. If you wish, increase the amount of context reserved to 1538 tokens (or half that for 1024 users).
  6. To avoid unprompted/unwanted furry output, I recommend using hairstyle instead of hair (hair leads to fur, and hairstyle tokenizes differently) and flesh instead of skin (skin leads to scales or fur, depending on how the AI takes it). This is partly due to Sigurd v3's finetune, so it must be worked around if you see it crop up in ways you don't want. Additionally, for non-human entities (say, an Asari from Mass Effect or a Vulcan from Star Trek), use human-like, humanlike, Human-like, or Humanlike instead of humanoid or Humanoid, as those two also lead down the furry/scaly lane.
  7. Addressing bleed between characters: one common problem currently is that if one character has a trait and another doesn't have it defined, the trait sometimes bleeds onto the character lacking it. One quick fix is simply defining that trait for that character; with 2048 context you can withstand the bloat pretty easily.
  8. Testing vs. play. Testing inevitably turns up some failures, which can lead you to think an entry doesn't work; if you're getting NO hits at all, that might actually be the case. I recommend testing whether the AI can grasp the entry on a blank scenario with rep penalty 1, no slope, and no range, using a prompt like "You look at x, taking in their appearance" or similar, to see if the AI produces a correct or mostly correct output that you expect. If it does, the entry is probably suitable for play. Be wary of testing to failure: the AI is a probability machine, not an intelligence, and it sometimes decides on outputs we don't like just due to funky AI math on what should come next. It will fail sometimes; the key question is whether it gets the entry the way we expect the majority of the time. If it does, it will likely play fine.
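Points 1 to 4 lend themselves to a small line builder. The Python below is a rough sketch, not Monky's own tooling: `caveman_block` and `rough_token_count` are hypothetical names, and the default token counter is a crude word-and-symbol count that under-counts real BPE tokens. If you have access to the actual model tokenizer, pass a function wrapping it as `count_tokens`.

```python
import re
from typing import Callable


def rough_token_count(text: str) -> int:
    # Hypothetical stand-in: one token per word or symbol. Swap in a real
    # tokenizer for accurate counts; this one tends to under-count.
    return len(re.findall(r"\w+|[^\w\s]", text))


def caveman_block(
    subject: str,
    facts: list[tuple[str, str]],
    count_tokens: Callable[[str], int] = rough_token_count,
    limit: int = 18,
    wrap: tuple[str, str] = ("[ ", " ]"),
) -> tuple[str, list[str]]:
    """Build one line per (property, details) pair: subject, property, details."""
    open_, close = wrap
    lines, problems = [], []
    for prop, details in facts:
        line = f"{open_}{subject} {prop} {details}{close}"
        lines.append(line)
        # Point 1: the limit includes the encapsulation characters.
        if count_tokens(line) > limit:
            problems.append(f"over {limit} tokens: {line}")
        # Point 4: a capitalized word right after the name may read as a surname.
        if prop[:1].isupper():
            problems.append(f"capitalized word next to subject: {line}")
    return "\n".join(lines), problems


block, problems = caveman_block("Kizzy", [
    ("hairstyle", "Red short Pixie-Cut"),
    ("behavior", "blunt and diction factual"),
], wrap=(":[ ", " ];"))
print(block)
print(problems)  # []
```

The `wrap` parameter switches between the [] and :[]; encapsulations from point 2; the check for a capitalized word right after the subject mirrors the exception in point 4.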

NAI Featherlite

(Rinter)

[ Cheeps:♀avianPink feathers ]
  • Rinter is already working on revising featherlite for NAI here; revision notes here; notes on experimental ideas here

There's not much to say, really. Replace the bullet with [ ], and you don't need to worry about removing the spaces between words and symbols. That's about it 😄

One other change (mainly to prevent the word-mashing format from leaking) would be to dial back the amount of wordmashing, since it's not really that important anymore.

The tl;dr of modern featherlite nowadays is [ pointer topic filter: spectrumofattributes topic moreattributes ], with some wordmashing only between associated words....

  • Kalmarr also regularly uses and tests featherlite. His information and examples can be found here.

Tokensafe

(Pause)

A format designed around minimal token use.

[Gordon James Devaux: male accountant/ brown hair/ stubble/ ponytail/ round glasses/ white shirt/ red tie/ black slacks/ black loafers/ civil shut-in/ Gordon's eyes glowing yellow color.]
[ Clint: male/ late twenties/ dense/ butt of Nyuu's jokes/ loyal/ brave/ claims to be knight/ unofficial knight/ skilled/ favorite weapon is steel sword/ Clint's best friend and companion is Nyuu. ]
[ Karen: character archetype/ typically blonde/ general inconvenience/ behavior example: sues council because stop sign blocks her view, refuses to wear face mask during pandemic, complains Karen's favorite parking space replaced with wheelchair ramp/ Karen is a character archetype. ]
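The Tokensafe examples share one pattern: name, colon, attributes separated by "/ ", and a closing sentence that repeats the name. The Python helper below is an illustrative sketch inferred from those examples, not Pause's own tool; `tokensafe_entry` and its `closing`/`padded` parameters are hypothetical.

```python
def tokensafe_entry(name: str, attributes: list[str], closing: str = "", padded: bool = True) -> str:
    """Assemble '[ Name: attr/ attr/ ... closing sentence. ]' as in the examples above."""
    parts = list(attributes)
    if closing:
        # The examples end with a short sentence that repeats the name.
        parts.append(closing)
    body = f"{name}: " + "/ ".join(parts)
    if not body.endswith("."):
        body += "."
    # The first example omits the spaces inside the brackets; the others keep them.
    return f"[ {body} ]" if padded else f"[{body}]"


print(tokensafe_entry(
    "Clint",
    ["male", "late twenties", "dense", "butt of Nyuu's jokes", "loyal", "brave",
     "claims to be knight", "unofficial knight", "skilled", "favorite weapon is steel sword"],
    closing="Clint's best friend and companion is Nyuu.",
))
```

This call reproduces the Clint entry above character for character; `padded=False` gives the bracket style of the Gordon example.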

Cat<nip> (tested by rando, Cass)

[ Jurģis description: < full name≡ Jurģis Ozols>/< age≡ 40>/< male>.
    Jurģis wearing: < blue citizen's jumpsuit& brown shoes>.
    Jurģis appearance: < hair≡ short& straight& brown>/< eyes≡ green>/< skin≡ somewhat pale>.
    Jurģis situation: < fought in French Foreign Legion during Seven Hour War>.
    Jurģis traits: < skills≡ combat& field medicine>.
    Jurģis mental: < frustrated& nervous>.]

Catnip SFW-Doc (AID), NSFW Doc (AID)

  • Still seems to work fine in NAI
  • Writing the entry on a single line (separated with .) also seems to work okay
  • You can get some rare leaks of & and <> into the output; you might want to ban <>, just in case
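Cat<nip> entries are regular enough to generate. The sketch below is in Python and uses hypothetical helper names (`catnip_item`, `catnip_entry`). It copies the separators from the Jurģis example (`< key≡ value>` items joined by "/", multiple values joined by "& ") and supports both the multi-line layout and the single-line variant noted above.

```python
def catnip_item(value, key: str = "") -> str:
    """One < ... > item; several values are joined with '& ', a key with '≡ '."""
    values = [value] if isinstance(value, str) else list(value)
    joined = "& ".join(values)
    return f"< {key}≡ {joined}>" if key else f"< {joined}>"


def catnip_entry(name: str, sections: list[tuple[str, list[str]]], single_line: bool = False) -> str:
    """Build '[ Name category: <...>/<...>. ... ]' lines as in the Jurģis example."""
    lines = [f"{name} {category}: " + "/".join(items) + "." for category, items in sections]
    # Multi-line as in the example above, or one line separated by '.' (noted as also working).
    separator = " " if single_line else "\n    "
    return "[ " + separator.join(lines) + "]"


print(catnip_entry("Jurģis", [
    ("description", [catnip_item("Jurģis Ozols", "full name"), catnip_item("40", "age"), catnip_item("male")]),
    ("wearing", [catnip_item(["blue citizen's jumpsuit", "brown shoes"])]),
]))
```

Since & and <> occasionally leak into output, generating entries this way mainly helps keep the syntax consistent; it does nothing to prevent the leaks themselves.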

Special Use Cases: JSON-Style and Python Dict-Style

The following formats are no longer generally recommended, as they seem to have some weaknesses. However, these have been reported to work for specific use cases. So for certain things these might turn out to still be useful.

JSON-Style

...

Python Dict-Style

...
