Basic concepts - aburillo/WikiChron GitHub Wiki

Here we explain the basic concepts used in WikiChron, so that you can understand what exactly we mean by certain terms, what the data we work with looks like, and which assumptions we have made.

Glossary

  • Page. Any page of the wiki. Pages can be content pages (articles), but also User pages, Talk pages, Help pages, etc. For our research, we currently analyze a subset of pages from the most relevant MediaWiki namespaces. More on this later.
  • Namespace. MediaWiki organizes pages into what are called "namespaces". Every namespace has an integer number associated with it and a name that describes what its pages are about. The aim is to group pages with the same purpose in the same namespace and to keep different purposes in different namespaces. You can read more about it in the Wikia Help page or in the MediaWiki Help page.
  • Article. An article, as defined by MediaWiki, is a page of content. More technically speaking, it is any page that belongs to the (Main) namespace of a wiki. Note, however, that Wikia sometimes refers to articles as "pages": for instance, in the "pages" counter at the top right of any Wikia wiki.
  • Talk pages. Talk pages are pages dedicated to discussion, communication and coordination between users. You can find more info about them in the MediaWiki Help page.
  • Edition. The atomic unit of change in a wiki (an edit). It records the text changed, the date and time of the change, the author (either anonymous or registered) and the page changed. An edition is made by exactly one user on exactly one page.
  • Wiki dump. A record of all the edits made in a wiki (see Wikia help for more info). WikiChron uses a processed CSV version of this dump to generate the plots.
  • User. There are two types of user and, unless one is specified, we mean both: a registered user, who has a user account with an account name and a user page, and an anonymous user, who is identified by an IP address. Treating each IP address as one user adds noise, but we consider that not aggregating edits from the same IP would be more misleading and less informative. Furthermore, an anonymous user can edit from different IPs, can become a registered user at some point, or can edit anonymously while holding an account; WikiChron does not attempt this kind of identity merging. Read about a study about anonymous editors in Wikipedia here.
  • Active users. We use MediaWiki's definition, which states that an active user is any user who has performed an action (edit) on any page during the last 30 days.
  • Article:Talk and User:Talk pages. These are discussion pages that wiki users use to coordinate and communicate with each other. Previous research has focused on these pages because they help to measure coordination within a community. This is why we have created some specific metrics to show edits on them.
  • Monthly and cumulative metrics. For many metrics we are interested in both the value on a per-month basis (Monthly) and the sum of the values for the current month and all the previous months (Cumulative).
  • Ratio metric. A metric that consists of the quotient of two metrics.
  • Calendar dates. Time expressed in natural calendar dates, e.g. Jan 2010, Feb 2010.
  • Months from birth. Time expressed as a whole number of months since the date when the wiki was created.
  • Contribution. As of today, whenever we say contribution we are referring to an edition. However, we do not rule out a more complex measure in the future, based on the number of bytes, words, editions, etc.
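
The relationship between monthly, cumulative and ratio metrics can be illustrated with a minimal Python sketch. The sample records, the `monthly_counts` helper and the "talk editions over all editions" ratio are hypothetical and chosen only for illustration; they are not WikiChron's actual code or metric definitions.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample: one (month, page kind) pair per edition.
editions = [
    ("2011-12", "article"),
    ("2011-12", "talk"),
    ("2012-01", "article"),
    ("2012-01", "article"),
    ("2012-01", "talk"),
    ("2012-02", "article"),
]

def monthly_counts(records, months):
    """Number of editions per month, in the order given by `months`."""
    counts = Counter(month for month, _ in records)
    return [counts[m] for m in months]  # a Counter yields 0 for missing months

months = sorted({month for month, _ in editions})

# Monthly metrics: one value per month.
monthly_all = monthly_counts(editions, months)
monthly_talk = monthly_counts([e for e in editions if e[1] == "talk"], months)

# Cumulative metric: current month plus all previous months.
cumulative_all = list(accumulate(monthly_all))

# Ratio metric: quotient of two metrics (None where the denominator is 0).
ratio_talk = [t / a if a else None for t, a in zip(monthly_talk, monthly_all)]

for row in zip(months, monthly_all, cumulative_all, ratio_talk):
    print(row)
```

The cumulative series is just the running sum of the monthly series, so any monthly metric can be turned into its cumulative counterpart in the same way.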

Assumptions

  • We generate the dumps from the page history of the following namespaces: (-2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 110, 111, 500, 501, 502, 503, 1200, 1201, 1202, 2000, 2001, 2002). All these namespaces are automatically added to any wiki created in Wikia, but a wiki can have pages in other, non-default namespaces. Wiki admins can request extra extensions, which may add more namespaces, and they can also ask Wikia for a custom namespace. Pages in these nonstandard namespaces aren't processed and, therefore, aren't taken into account in WikiChron's metric visualizations. This is why the number of edits and pages that WikiChron analyzes may differ from the totals shown in Special:Statistics.
  • Bot edits. In WikiChron, we remove all bot editions in order to eliminate artificial noise. To do this, we need the user ids of those bots, which should be in the wikis.json file. To retrieve the bot ids, we query for all the users in the bot or bot-global user groups.
  • New users and user number. Since we are using a dump of wiki editions, a user has to have made at least one edition during the whole life of the wiki to be counted in WikiChron stats. Hence, registered users with no editions are not counted.
  • Anonymous users. We always count anonymous editions, but we don't do any identity merging beyond assuming that anonymous users with the same IP are the same user. Treating every IP as one anonymous editor has been done in other studies, such as this work by Aaron Halfaker. As a result, the actual number of users and of editions per user may differ slightly from what WikiChron shows.
  • Redirects. Some pages are in fact redirects to other pages. Redirect pages are actual pages whose content is a statement that indicates a redirection to another wiki page. We don't delete any of these pages because they represent activity and better content structuring within the wiki.
  • Distribution of participation metrics.
      • As of today, for distribution of work metrics we use the cumulative data up to any given date. However, we know that this approach can lead to very inflexible values as the wiki grows, so we are exploring better time ranges for these metrics.
      • We have set a minimum number of users that a wiki must have for each metric to be calculated. You can find those values here: https://github.com/Grasia/WikiChron/blob/master/lib/metrics/stats.py#L17
      • For more information, there is a wiki page with a detailed description of the metrics about distribution of participation.
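
To make the preprocessing assumptions above more concrete, here is a minimal Python sketch that keeps only editions from the listed namespaces, drops bot editions, and counts users by treating each IP address as one anonymous user. The record fields (`namespace`, `user_id`, `ip`), the sample data and the bot id are hypothetical; the real column names of the processed dump and the contents of wikis.json may differ.

```python
# Namespaces listed in the assumptions above.
PROCESSED_NAMESPACES = {
    -2, -1, *range(0, 16), 110, 111, 500, 501, 502, 503,
    1200, 1201, 1202, 2000, 2001, 2002,
}

# Hypothetical bot user ids; the real ones are expected in wikis.json.
BOT_IDS = {"99"}

# Hypothetical edition records (field names are assumptions).
editions = [
    {"namespace": 0, "user_id": "7", "ip": None},
    {"namespace": 0, "user_id": "99", "ip": None},           # bot edition
    {"namespace": 1, "user_id": None, "ip": "203.0.113.5"},
    {"namespace": 1, "user_id": None, "ip": "203.0.113.5"},  # same IP, same user
    {"namespace": 3000, "user_id": "8", "ip": None},         # non-default namespace
    {"namespace": 2, "user_id": None, "ip": "198.51.100.2"},
]

def keep(edition):
    """True for editions in a processed namespace that were not made by a bot."""
    return (
        edition["namespace"] in PROCESSED_NAMESPACES
        and edition["user_id"] not in BOT_IDS
    )

def contributor_key(edition):
    """Identify the author: account id if registered, IP address if anonymous."""
    if edition["user_id"] is not None:
        return ("registered", edition["user_id"])
    return ("anonymous", edition["ip"])

kept = [e for e in editions if keep(e)]
users = {contributor_key(e) for e in kept}

print(len(kept), "editions kept,", len(users), "users counted")
```

In this sample, user "8" is not counted at all because their only edition is in a namespace that is not processed, which mirrors why the user and page totals can differ from Special:Statistics.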

Time axis options

There are two ways to display the time axis (x axis) of the time series graphs:

  • Calendar dates. This option plots the data at the date it was generated, e.g. the axis will show the dates Dec 2011, Jan 2012, Feb 2012 and so on. This option is especially useful for single-wiki analysis.
  • Months from birth. This option sets the time axis to a count of natural numbers starting from 1, where 1 is the month when the wiki was born and the following numbers are the offset in months relative to that month. This option is more useful for multi-wiki analysis; in that case, the count starts with the birth month of the oldest wiki (the one with the earliest birth date). Note that we do not take into account that the birth month may cover fewer days and hence is expected to show lower values than the rest of the months. For instance, if foowiki was born on 9 December 2011, month 1 refers to the interval from 9 to 31 December, which includes only 23 days, while the next month, month 2, includes all 31 days of January, month 3 includes the 29 days of February 2012, and so on.
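
The "months from birth" conversion and the shorter birth month can be sketched in Python as follows. The function names are hypothetical and this is only an illustration of the arithmetic described above, not WikiChron's implementation.

```python
import calendar
from datetime import date

def months_from_birth(birth, when):
    """1-based month index of `when`; the wiki's birth month is month 1."""
    return (when.year - birth.year) * 12 + (when.month - birth.month) + 1

def days_covered(birth, index):
    """Number of days that month `index` (1-based) can contain activity."""
    total = (birth.month - 1) + (index - 1)        # months since January of the birth year
    year, month = birth.year + total // 12, total % 12 + 1
    days_in_month = calendar.monthrange(year, month)[1]
    if index == 1:
        # The birth month only counts from the birth day onwards.
        return days_in_month - birth.day + 1
    return days_in_month

birth = date(2011, 12, 9)  # the foowiki example from the text

print(months_from_birth(birth, date(2011, 12, 25)))  # edition made in the birth month
print(months_from_birth(birth, date(2012, 1, 15)))   # edition made in the following month
print([days_covered(birth, i) for i in (1, 2, 3)])   # days covered by months 1 to 3
```

Because month 1 covers fewer days than a full month, its values are expected to be lower than those of later months even if the daily activity is the same.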