Anki Statistical Reports - ghrgriner/anki-stats GitHub Wiki

This document is based on v25.02 of the Anki desktop application.

Relevant Data Structures

Anki Database Structure

The data for an Anki collection is stored in an SQLite database. Relatively complete documentation of the Anki database schema (subject to the caveats listed below) is available elsewhere in the AnkiDroid repository; that documentation was based on the work of Shawn A. Williams.

We will not duplicate that documentation here. However, it is not up to date, so we note several corrections and clarifications in this document. Furthermore, we have not investigated whether AnkiDroid and the Anki desktop application use identical database schemas; our comments in this document are based on the desktop application.

Lastly, note that information for users (and sometimes technical information) is also available in the Anki Manual.

Cards

Cards have several fields that track their state. The information stored in the type and queue attributes is sometimes redundant. Separate fields exist so that type always stores which of the four learning states the card is in, while queue changes when a card is buried or suspended. When the card is unburied or unsuspended, queue is reset based on the value of type.
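The reset of queue from type can be illustrated with a minimal Rust sketch. It mirrors the CardType and CardQueue enums quoted below; the function restored_queue and its due_is_intraday flag are hypothetical, standing in for whatever logic the backend uses to choose between the Learn and DayLearn queues (which this page does not describe).

```rust
// Local copies of the CardType and CardQueue enums quoted on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CardType {
    New = 0,
    Learn = 1,
    Review = 2,
    Relearn = 3,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i8)]
pub enum CardQueue {
    New = 0,
    Learn = 1,
    Review = 2,
    DayLearn = 3,
    PreviewRepeat = 4,
    Suspended = -1,
    SchedBuried = -2,
    UserBuried = -3,
}

/// Hypothetical helper: the queue a card returns to after being unburied or
/// unsuspended. `due_is_intraday` is an assumed input that stands in for the
/// backend's choice between the Learn and DayLearn queues.
pub fn restored_queue(ctype: CardType, due_is_intraday: bool) -> CardQueue {
    match ctype {
        CardType::New => CardQueue::New,
        CardType::Review => CardQueue::Review,
        CardType::Learn | CardType::Relearn => {
            if due_is_intraday {
                CardQueue::Learn
            } else {
                CardQueue::DayLearn
            }
        }
    }
}
```

The point of the sketch is that the restored queue is a function of type alone (plus, for learning cards, when they are due), which is why type must survive burying and suspending unchanged.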

Card Type

A card can be one of four learning types. The valid types are listed in the enum below:

// code excerpt from Anki repository
pub enum CardType {
    New = 0,
    Learn = 1,
    Review = 2,
    Relearn = 3,
}

The general idea is that a card starts as New and is set to Learn at the first review. When it graduates from the Learn phase, it becomes Review. If it is answered wrong in Review, it becomes Relearn. It can then graduate from Relearn back to Review, with this process of switching between Review and Relearn continuing indefinitely. Additional details are on the Card Type page.
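As a rough sketch, the simplified lifecycle just described can be written as a transition function. The Event enum and next_type function are hypothetical illustrations, not part of Anki, and the edge cases covered on the Card Type page are deliberately left out (the function returns None for them).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CardType {
    New,
    Learn,
    Review,
    Relearn,
}

/// Hypothetical events, used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    FirstReview,
    Graduate,
    AnsweredWrong,
}

/// Simplified transitions from the paragraph above; `None` means the
/// combination is not modelled in this sketch.
pub fn next_type(current: CardType, event: Event) -> Option<CardType> {
    match (current, event) {
        (CardType::New, Event::FirstReview) => Some(CardType::Learn),
        (CardType::Learn | CardType::Relearn, Event::Graduate) => Some(CardType::Review),
        (CardType::Review, Event::AnsweredWrong) => Some(CardType::Relearn),
        _ => None,
    }
}
```

Feeding the events FirstReview, Graduate, AnsweredWrong, Graduate to a New card walks it through Learn, Review, Relearn and back to Review.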

Card State

The Rust backend defines a structure called CardState that incorporates (in most cases) the type of the card as well as other information. Therefore, care should be taken when referring to a card's 'state' to make clear whether state is meant in the general sense or in this specific sense. An understanding of CardState is not needed to create the statistical reports; however, additional detail is provided on the Card State page.

Queue

The queue variable is used when ordering cards for study. Generally, cards of New type are in the New queue, cards of Learn or Relearn type are in the Learn or DayLearn queue, and cards of Review type are in the Review queue, with negative queue values used to indicate suspended or buried cards. The queue variable is sometimes used in the standard statistical reports to exclude suspended or buried cards. Additional details are on the Card Queue page.

// code excerpt from Anki repository.
pub enum CardQueue {
    New = 0,
    Learn = 1,
    Review = 2,
    DayLearn = 3,
    PreviewRepeat = 4,
    Suspended = -1,
    SchedBuried = -2,
    UserBuried = -3,
}
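Because the suspended and buried queues are exactly the negative values, a stats filter that drops inactive cards can simply test the sign of queue. The sketch below assumes an i8 representation for the enum; is_inactive and count_active are hypothetical helper names, and count_active works on raw queue column values as read from the cards table.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i8)]
pub enum CardQueue {
    New = 0,
    Learn = 1,
    Review = 2,
    DayLearn = 3,
    PreviewRepeat = 4,
    Suspended = -1,
    SchedBuried = -2,
    UserBuried = -3,
}

/// Suspended and buried cards are exactly those with a negative queue value.
pub fn is_inactive(queue: CardQueue) -> bool {
    (queue as i8) < 0
}

/// Hypothetical 'excluding inactive' count over raw `queue` column values.
pub fn count_active(raw_queues: &[i8]) -> usize {
    raw_queues.iter().filter(|&&q| q >= 0).count()
}
```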

Card (Deck) Preset

Every non-filtered deck has an associated property called the preset. Recall that a deck can have subdecks. For example, there can be a deck Math and a subdeck Math::Algebra. This defines a tree structure for all (non-filtered) decks, where Math is the parent of Math::Algebra and Default is the parent of Math.

If the deck options for a deck or subdeck are never changed, then its preset is the preset of its parent. The "Default" preset will still exist even if the default deck (i.e., the deck named Default that is created along with the collection) is deleted. (TBD: Check previous sentence.)
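A minimal sketch of the inheritance rule stated above, assuming (subject to the same TBD) that a deck whose options were never changed takes its parent's preset. Deck names are split on '::'; the map of explicitly chosen presets is a hypothetical input, and falling back to "Default" for a top-level deck is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Parent deck name, e.g. "Math::Algebra" -> Some("Math"); None at the top level.
pub fn parent_name(deck: &str) -> Option<&str> {
    deck.rfind("::").map(|i| &deck[..i])
}

/// Hypothetical resolver. `explicit` maps deck names to the preset chosen in
/// their options; decks missing from the map never had their options changed.
pub fn effective_preset<'a>(deck: &str, explicit: &HashMap<&str, &'a str>) -> &'a str {
    let mut current = Some(deck);
    // Walk up the deck tree until a deck with an explicitly chosen preset is found.
    while let Some(name) = current {
        if let Some(&preset) = explicit.get(name) {
            return preset;
        }
        current = parent_name(name);
    }
    // Assumption: a top-level deck that was never changed uses "Default".
    "Default"
}
```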

If a card is moved to a filtered deck, the non-filtered deck it was moved from is stored and called the 'original deck'. In the browser, cards can be searched by the preset of their deck (or of their original deck, if they are in a filtered deck) using preset:[some_preset_name].

This is relevant for our purposes because, when setting the FSRS parameters for a deck, a search box defines which cards will have their memory state reset, and the default text in that box is preset:"preset_of_current_deck" -is:suspended.
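The selection described here, combined with the exclusions listed in footnote [a] to the reports table below, can be sketched as a predicate. CardSummary and gets_memory_state are hypothetical; the search itself is not evaluated (its result is passed in as a boolean), and comparing the card's most recent review against the 'Ignore cards last reviewed before' cut-off is our reading of that footnote, not Anki's code.

```rust
/// Default text of the FSRS search box for a given preset name.
pub fn default_fsrs_search(preset_name: &str) -> String {
    format!("preset:\"{}\" -is:suspended", preset_name)
}

/// Hypothetical per-card summary used only in this sketch.
pub struct CardSummary {
    /// Whether the card matches the search box text (evaluated elsewhere).
    pub matches_search: bool,
    /// Whether the card is of CardType::New.
    pub is_new_type: bool,
    /// Time of the card's most recent review in epoch milliseconds, if any.
    pub last_review_ms: Option<i64>,
}

/// Matched cards, excluding New-type cards and cards whose reviews all fall
/// before the optional 'Ignore cards last reviewed before' cut-off.
pub fn gets_memory_state(card: &CardSummary, ignore_before_ms: Option<i64>) -> bool {
    if !card.matches_search || card.is_new_type {
        return false;
    }
    match ignore_before_ms {
        None => true,
        Some(cutoff) => card.last_review_ms.map_or(false, |last| last >= cutoff),
    }
}
```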

Review Log

Information about reviews that were performed is stored in the revlog table, which also contains information about reviews that were rescheduled.

Terminology

If a review was rescheduled manually, it's possible the user actually reviewed the material on the card before deciding to reschedule it. It's also possible the user set the due date in the browser without looking at the card contents. For reviews rescheduled when FSRS parameters were initialized or changed, the user would not have seen the card at all during the rescheduling. Nevertheless, for brevity, we refer to all entries in the revlog table as 'reviews' rather than 'review log entries' in this documentation. This should not cause confusion, since the charts that analyze data in the revlog table always exclude these Manual and Rescheduled reviews either explicitly using the 'review kind' variable described below, or implicitly by limiting to reviews where the answer button was pressed.

Review Kind

In general, the review kind aligns with the type of the card at the time of the review. However, a card of CardType::New has Learning as the review kind for its first review. In addition, there are review kinds Filtered, Manual, and Rescheduled, with the meanings described in the comments below.

// code excerpt from Anki repository, our comments start with two '//'
// while repository comments start with '///'
pub enum RevlogReviewKind {
    Learning = 0,
    Review = 1,
    Relearning = 2,
    /// Old Anki versions called this "Cram" or "Early". It's assigned when
    /// reviewing cards before they're due, or when rescheduling is
    /// disabled.
    //  In particular, note that the above applies even if the card is in a
    //  Filtered deck.
    //  If scheduling is disabled, then the `factor` field will be set to 0.
    //  Otherwise, `factor` is set to a non-zero value.
    Filtered = 3,
    // By (1) selecting 'Set Due Date' for a card. This is the entry made
    // at the time the due date is set. Once the due date is reached and the
    // review occurs another entry will be made in `revlog`.
    // Or by (2) selecting 'Reset card' for a card. In this case, `factor`
    // in the database (= `ease_factor` in the Rust code) will be 0.
    Manual = 4,
    // Set after selecting 'Reschedule cards on change' in the FSRS options
    // for a deck
    Rescheduled = 5,
} 
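Most review-based charts filter on this enum. The sketch below converts the raw integer stored in the type column and applies the two filters described in the reports table further down (Manual and Rescheduled dropped everywhere, Filtered also dropped in Hourly Breakdown); from_db, counts_in_most_charts, and counts_in_hourly_breakdown are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RevlogReviewKind {
    Learning = 0,
    Review = 1,
    Relearning = 2,
    Filtered = 3,
    Manual = 4,
    Rescheduled = 5,
}

impl RevlogReviewKind {
    /// Hypothetical conversion from the raw `type` column; None if unknown.
    pub fn from_db(value: u8) -> Option<Self> {
        match value {
            0 => Some(Self::Learning),
            1 => Some(Self::Review),
            2 => Some(Self::Relearning),
            3 => Some(Self::Filtered),
            4 => Some(Self::Manual),
            5 => Some(Self::Rescheduled),
            _ => None,
        }
    }
}

/// Filter used by most review-based charts: drop Manual and Rescheduled.
pub fn counts_in_most_charts(kind: RevlogReviewKind) -> bool {
    !matches!(kind, RevlogReviewKind::Manual | RevlogReviewKind::Rescheduled)
}

/// Hourly Breakdown additionally drops Filtered reviews.
pub fn counts_in_hourly_breakdown(kind: RevlogReviewKind) -> bool {
    counts_in_most_charts(kind) && kind != RevlogReviewKind::Filtered
}
```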

Impact of Resetting or Deleting a Card on the Review Log

If a card is reset, its existing records remain in the revlog table. As noted in the comments for the enum above, an entry with RevlogReviewKind::Manual is added to the table. The impact on the card is discussed on the Card Type page.

If a card is deleted, all of its records are deleted from revlog (as the card's own record is also deleted).

Database Field Names and Rust Variable Names

Listed below are the database field names in the revlog table and the corresponding Rust field names in the RevlogEntry structure. Refer to the documentation linked above for additional details on the database fields.

Database Field Name | Rust Data Type | Rust Field Name (member of RevlogEntry)
--- | --- | ---
id | RevlogId (i64) | id
cid | CardId (i64) | cid
usn | Usn (i32) | usn
ease | u8 | button_chosen
ivl | i32 | interval
lastIvl | i32 | last_interval
factor | u32 | ease_factor
time | u32 | taken_millis
type | RevlogReviewKind (enum) | review_kind
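The mapping above can be written as a plain Rust struct. This is a simplified stand-in for Anki's RevlogEntry: newtypes such as RevlogId and CardId are replaced by their underlying integers, and review_kind is kept as the raw column value. last_interval is declared as i32 because negative values occur (see the True Retention population below). The helper interval_in_days assumes the Anki convention that positive intervals are in days and negative intervals are in seconds.

```rust
/// Simplified stand-in for RevlogEntry (database column in each comment).
#[derive(Debug, Clone, PartialEq)]
pub struct RevlogEntry {
    pub id: i64,            // `id`
    pub cid: i64,           // `cid`
    pub usn: i32,           // `usn`
    pub button_chosen: u8,  // `ease`
    pub interval: i32,      // `ivl`
    pub last_interval: i32, // `lastIvl` (signed: negative values occur)
    pub ease_factor: u32,   // `factor`
    pub taken_millis: u32,  // `time`
    pub review_kind: u8,    // `type`, raw value
}

/// Interval in days, assuming positive values are days and negative values
/// are seconds.
pub fn interval_in_days(ivl: i32) -> f64 {
    if ivl < 0 {
        -(ivl as f64) / 86_400.0
    } else {
        ivl as f64
    }
}
```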

Anki Statistical Reports

The table below lists the statistical reports generated in the Anki statistics window (obtained by clicking the Stats button in the main window).

Title | Population | Comments
--- | --- | ---
Today | Reviews | Omits RevlogReviewKind::Manual and RevlogReviewKind::Rescheduled reviews.
Future Due | Cards | Omits all CardType::New and CardQueue::Suspended cards. Buried cards (CardQueue::SchedBuried or CardQueue::UserBuried) due on or before the current day are also omitted.
Calendar | Reviews | Omits RevlogReviewKind::Manual and RevlogReviewKind::Rescheduled reviews.
Reviews | Reviews | Omits RevlogReviewKind::Manual and RevlogReviewKind::Rescheduled reviews. Counts are stratified by RevlogReviewKind, except that reviews of kind RevlogReviewKind::Review are reported as 'Young' or 'Mature' based on whether last_interval < 21.
Card Counts | Cards | If 'excluding inactive' is checked, suspended and buried cards (CardQueue::Suspended, CardQueue::SchedBuried, or CardQueue::UserBuried) are omitted.
Review Intervals | Cards | Limited to CardType::Review or CardType::Relearn cards.
Card Ease (non-FSRS decks) | Cards | Analysis of card.factor / 10, where factor is the name used in the database and in Python. In Rust, this is called card.ease_factor.
Card Stability (FSRS decks) | Cards | Limited to cards where card.memory_state is not null. [a] This is an analysis of card.memory_state.stability (the Rust and Python name). In the database, this is stored in the card.data field with other FSRS information. Stability is rounded to the nearest integer in the back end before binning. [b]
Difficulty (FSRS decks) | Cards | Limited to cards where card.memory_state is not null. [a] The code that extracts this also filters by (CardType::Review or CardType::Relearn), but this is redundant. This is an analysis of card.memory_state.difficulty (the Rust and Python name). In the database, this is stored in the card.data field with other FSRS information.
Retrievability (FSRS decks) | Cards | Limited to cards where card.memory_state is not null. [a, c]
Hourly Breakdown | Reviews | Omits RevlogReviewKind::Filtered, RevlogReviewKind::Manual, and RevlogReviewKind::Rescheduled reviews. The hour is calculated as the time of the review (stored in the id field as epoch time, i.e., milliseconds since 1/1/1970 UTC) plus the current time zone offset (allowing for daylight saving time). [d]
Answer Buttons | Reviews | Omits RevlogReviewKind::Manual and RevlogReviewKind::Rescheduled reviews. RevlogReviewKind::Learning, RevlogReviewKind::Relearning, and RevlogReviewKind::Filtered reviews are all reported as 'Learning', while RevlogReviewKind::Review reviews are reported as 'Young' or 'Mature' based on whether last_interval < 21.
Added | Cards | No cards excluded.
True Retention | Reviews | Population: button_chosen > 0 (not rescheduled) and (not RevlogReviewKind::Filtered or ease_factor != 0) and (RevlogReviewKind::Review or last_interval <= -86400 or last_interval >= 1). Note that here a 'Young' review is any review where last_interval < 21, and all remaining reviews are 'Mature'. This differs from the 'Reviews' and 'Answer Buttons' charts, which partition only the RevlogReviewKind::Review reviews into 'Young' and 'Mature'.
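The True Retention population and the two Young/Mature conventions can be transcribed directly into Rust. The Review struct and the function names are hypothetical; the conditions themselves are copied from the table above, with last_interval treated as signed because the population test compares it against -86400.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RevlogReviewKind {
    Learning,
    Review,
    Relearning,
    Filtered,
    Manual,
    Rescheduled,
}

/// Minimal review record using the Rust field names from the table.
pub struct Review {
    pub button_chosen: u8,
    pub last_interval: i32,
    pub ease_factor: u32,
    pub review_kind: RevlogReviewKind,
}

/// Population of the True Retention chart.
pub fn in_true_retention(r: &Review) -> bool {
    r.button_chosen > 0
        && (r.review_kind != RevlogReviewKind::Filtered || r.ease_factor != 0)
        && (r.review_kind == RevlogReviewKind::Review
            || r.last_interval <= -86_400
            || r.last_interval >= 1)
}

/// True Retention: every review in the population is either Young or Mature.
pub fn true_retention_is_mature(r: &Review) -> bool {
    r.last_interval >= 21
}

/// 'Reviews' and 'Answer Buttons': only Review-kind reviews are split into
/// Young/Mature; other kinds return None.
pub fn review_chart_is_mature(r: &Review) -> Option<bool> {
    (r.review_kind == RevlogReviewKind::Review).then(|| r.last_interval >= 21)
}
```

For example, a Learning review whose last interval was a 10-minute step (last_interval = -600) falls outside the True Retention population, whereas a Learning review with last_interval = 2 (days) is included and counted as Young there, yet it is not split into Young/Mature by the 'Reviews' chart.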

[a] Once FSRS is enabled, a card has its memory state set once the card is answered (i.e., once it is no longer of New type). If an existing deck is converted to FSRS, the free-text box containing the default text preset:"preset_of_current_deck" -is:suspended defines the cards that have their FSRS state assigned, except that New-type cards are always excluded, as are cards whose reviews all occurred before the date set in the Advanced > Ignore cards last reviewed before option.

[b] This is unusual. For other histograms in the report, the variable is binned without rounding.

[c] The retrievability calculated by this package will not always match the retrievability presented in Anki. See here for details.

[d] In particular, note that the time zone or time zone offset is not stored for each review. If a user always reviews between 6 and 7 in the morning local time and then moves to a time zone 5 hours behind, those earlier reviews will appear in the 1:00 - 2:00 am bin of the histogram. A similar issue occurs if the local time zone offset changes due to daylight saving time.
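The hour binning used by the Hourly Breakdown chart, and the time zone effect described in footnote [d], can be reproduced with a short sketch. It assumes the review time is the revlog id in epoch milliseconds and that the caller supplies a single current UTC offset in seconds; hour_bin is a hypothetical name, and Anki's own implementation may differ in detail.

```rust
/// Hour-of-day bin (0..=23): review time plus one current UTC offset.
/// `revlog_id_ms` is the review's epoch time in milliseconds;
/// `utc_offset_secs` is an assumed input such as -18_000 for UTC-5.
pub fn hour_bin(revlog_id_ms: i64, utc_offset_secs: i64) -> u32 {
    let local_secs = revlog_id_ms.div_euclid(1_000) + utc_offset_secs;
    // rem_euclid keeps the bin in 0..=23 even for negative local times.
    (local_secs.rem_euclid(86_400) / 3_600) as u32
}
```

A review at 06:30 UTC binned with offset 0 lands in hour 6; binning the same stored review with an offset of -18_000 (5 hours behind) moves it into hour 1, which is the shift described in footnote [d].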

Filtering By Last 12 Months or All History

There is a radio button at the top of the statistics window where users can switch between viewing data from the last 12 months and all history. This option only affects figures / charts where the analysis population is 'Reviews'.

Rollover Hour and Filtering Review History

When calculating the past or (scheduled) future day of a review, or when filtering review history, the rollover hour is used if it was set in the application (i.e., by setting the Next day starts at option in the Tools > Preferences > Review menu to something other than 0 hours past midnight). For example, when the 'Hourly Breakdown' or 'Answer Buttons' chart is filtered by amount of review history (1 month, 3 months, 1 year), the start of the next review day is calculated, and reviews that occurred more than 30, 90, or 365 days before it are ignored. (Here a day is defined as 86400 seconds.)

Other charts that filter past or expected future reviews by time period behave similarly, although the exact cut-offs sometimes vary by chart. In particular, the 'Reviews' chart cuts off the '3 months' and '1 year' charts at 89 and 364 days before the next review day, respectively; the 'Added' chart cuts off the '1 month' chart at 31 days before the start of the next review day; and the 'Future Due' chart cuts off the '3 months' chart at 89 days after the current day.
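The rollover-based cut-off can be sketched as follows, assuming epoch-second inputs, a fixed UTC offset in seconds, and a rollover hour between 0 and 23. The function names are hypothetical, and treating a review exactly at the cut-off as inside the window is an assumption; as noted above, the number of days per period varies by chart, so `days` is a parameter.

```rust
/// Start of the next review day in epoch seconds. `now_secs` is the current
/// epoch time, `utc_offset_secs` the local offset, and `rollover_hour` the
/// 'Next day starts at' setting.
pub fn next_day_start(now_secs: i64, utc_offset_secs: i64, rollover_hour: i64) -> i64 {
    // Shift time so that each review day begins at a multiple of 86400.
    let shift = utc_offset_secs - rollover_hour * 3_600;
    let day_index = (now_secs + shift).div_euclid(86_400);
    (day_index + 1) * 86_400 - shift
}

/// Earliest epoch second kept when the window covers `days` days.
pub fn history_cutoff(next_day_start_secs: i64, days: i64) -> i64 {
    next_day_start_secs - days * 86_400
}

/// Whether a review (revlog id in epoch milliseconds) is inside the window.
pub fn in_window(revlog_id_ms: i64, cutoff_secs: i64) -> bool {
    revlog_id_ms >= cutoff_secs * 1_000
}
```

For instance, at 02:00 UTC on 1970-01-02 with offset 0 and a rollover hour of 4, the next review day starts at 04:00 the same morning; at 05:00 it starts at 04:00 the following day.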