First steps (tutorial) - juergenlerner/eventnet GitHub Wiki

Event network analyzer (eventnet): statistical analysis of networks of relational events.

Overview

A single relational event encodes who does when what to whom. Examples include a person sending an email to another person, a Wikipedia user editing a Wikipedia article, a customer buying a certain product, or a country signing an agreement with another country.

A relational event model, consequently, seeks to explain who is going to do when what to whom. Events that happen between two nodes often depend on events that happened earlier on other pairs of nodes. For instance, Alice might send an email to Bob because, previously, Bob contacted Charlie who then called Alice. Such network effects are typical in the examples mentioned above. Computing explanatory variables for these network effects, along with maintaining the pairs of nodes on which events might happen (the risk set), is one of the main challenges when specifying and estimating relational event models.

The event network analyzer software tackles this by taking one or several input files containing lists of relational events and computing explanatory variables (statistics) for all observations, that is, for events and non-events. These can then be analyzed by standard statistical software for time-to-event data, such as the package survival in the R software for statistical computing.

Starting eventnet

To use eventnet, download the JAR file of the latest version and start the program by double-clicking it or via the command java -jar eventnet-x.y.jar. (The command-line option is preferable: it shows more detailed error messages, if any, and lets you increase the memory size. For instance, the command java -Xmx4g -jar eventnet-x.y.jar starts eventnet with up to 4GB of memory.)

Eventnet is written in Java and requires the Java Runtime Environment (JRE), version 8 or higher. To obtain the JRE, you may search for "download java runtime environment <my operating system>".

A graphical user interface (GUI) opens in which you can specify the input data and what eventnet should do with it. All of these specifications can be saved in a configuration file, which can be opened in eventnet on the same or a different computer. A saved configuration, thus, ensures reproducibility and can save a lot of work.

(Known bug on macOS: at least on some computers running macOS, the file browser opened from eventnet does not let you create a new file. A workaround is to save the configuration to an already existing file, for instance, the example configuration file human_migration_configuration.xml. Be sure to save configurations only to files that you no longer need, since their old content will be overwritten.)

If you cannot start eventnet at all, you may have a look at the troubleshooting help page, which lists some common problems and tries to come up with solutions.

Eventnet configurations

A configuration, together with the input data, determines the result. This ensures reproducibility. It is also possible to create or modify the configuration file (which is in XML format) "by hand" or by other means, and then let eventnet execute the computations. Configurations can be saved, loaded, renamed, or executed via the file menu and/or in the area on the left-hand side of the eventnet GUI. The content of a configuration is specified in six tabs in the larger right-hand side of the eventnet GUI. These tabs should be filled in order (files, events, time, attributes, statistics, observations), since the options for later settings might depend on earlier ones.

The six parts of a configuration, each specified in one of the six tabs, are described in the following. (The description below is very short; more details can be found in the basic tutorial.)

(files) In the "files" tab you can specify one or several input files containing lists of events. It is also possible to specify directories and let eventnet process all files in these directories (optionally, only the files with a given ending). The input files must be CSV files, for which the cell delimiter and similar options can be chosen. If several files are analyzed with the same configuration, they must have a compatible structure. The output base directory gives the location where the output files (containing the computed explanatory variables for all observations) are stored. It is advisable to choose an empty directory for this, or a directory whose content may be overwritten.
(events) The different components of an event are mapped to column names in the input files. Only the source and target of events are required; the other components can be missing (set to <implied>). It is possible to analyze one-mode or multi-mode networks. In the latter case, the different node sets should be given intuitive names, and each event type should be assigned its source node set and target node set.
(time) The time information, if present, can be given as integers, as decimal numbers, or as date-time strings; in the latter case, the pattern of these strings must be given. If time is implied, the event numbers are used as event times. Events in input files must be sorted by time from the past to the future; events happening at the same time are possible. The time interval type specifies which events are considered simultaneous.
(attributes) Attributes define which information is recorded from past events. This information is later used to compute explanatory variables (statistics) and, potentially, to define the risk set. You can specify dyad-level attributes (for instance, recording the number of events from a particular node to a particular other node), node-level attributes (for instance, recording incoming or outgoing events for each node, or setting externally given node attributes), and network-level attributes (for instance, recording events in the whole network or the last event time).
(statistics) Statistics are explanatory variables that determine the predicted rate of future events on dyads or nodes. They are specified as functions of the attributes. Eventnet provides statistics for dyadic effects (repetition or reciprocation), degree effects, triadic effects, four-cycle effects, and statistics dependent on node attributes (the latter can also define degree effects if degrees are recorded in node-level attributes).
(observations) The observation settings specify which observations to consider for analysis and define the risk set (that is, which events could have been observed). It is possible to efficiently sample from the non-events in this risk set ("case-control sampling") and/or to sample uniformly from the risk set. Case-control sampling is advisable if the number of non-events is huge compared to the number of events.
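Since the (time) settings require input events to be sorted from the past to the future, it can help to sort a raw event file before selecting it in the files tab. The following R sketch does this under stated assumptions: the function name sort_events_by_time is hypothetical, and so are the default column name "time" and the date-time pattern; adapt both to your data.

```r
# Sort a relational event file by time (oldest first) and write the result.
# ASSUMPTIONS (hypothetical): the file has a header row, a column named by
# 'time_col' holding date-time strings that match 'format', and no field
# contains a comma (the output is written without quotation marks).
sort_events_by_time <- function(infile, outfile, time_col = "time",
                                format = "%Y-%m-%d %H:%M:%S") {
  ev <- read.csv(infile, stringsAsFactors = FALSE)
  if (!time_col %in% names(ev)) stop("time column not found: ", time_col)
  t <- as.POSIXct(ev[[time_col]], format = format, tz = "UTC")
  if (anyNA(t)) stop("some time stamps do not match the given format")
  # order() leaves ties in their original order, so events with equal
  # time stamps keep their relative order from the input file
  ev <- ev[order(t), , drop = FALSE]
  write.csv(ev, outfile, row.names = FALSE, quote = FALSE)
  invisible(ev)
}
```

For example, sort_events_by_time("my_events_unsorted.csv", "my_events.csv") would write a sorted copy (both file names are placeholders).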

A completed configuration can be executed by clicking on the process button in the GUI or - without opening the GUI - via the command java -jar eventnet-x.y.jar configuration_filename.xml. (The latter option is preferable, at least when analyzing large files or when searching for errors in the configuration file.)
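When several configurations have to be run, the command above can also be called from R. This is a minimal sketch assuming that java is on the system PATH; the helper names eventnet_args and run_eventnet are hypothetical, and the JAR and configuration file names are placeholders for your own files.

```r
# Build the command-line arguments for a headless eventnet run.
# JVM options such as -Xmx go before -jar; the configuration file is the
# argument that follows the JAR file.
eventnet_args <- function(jar, config, memory = NULL) {
  c(if (!is.null(memory)) paste0("-Xmx", memory), "-jar", jar, config)
}

# Run each configuration in turn and return the exit status of each run
# (0 usually means that the java process terminated normally).
# ASSUMPTION: 'java' can be found on the PATH.
run_eventnet <- function(jar, configs, memory = NULL) {
  vapply(configs, function(cfg) {
    # shQuote() protects file names that contain spaces
    as.integer(system2("java", shQuote(eventnet_args(jar, cfg, memory))))
  }, integer(1))
}
```

For instance, run_eventnet("eventnet-x.y.jar", c("config_a.xml", "config_b.xml"), memory = "4g") would process two (hypothetical) configuration files one after the other.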

The results are written to the output directory (or sub-directories thereof). The output files can be conveniently analyzed with standard software for time-to-event data, such as provided by the package survival in the R software for statistical computing, as illustrated below.

A tiny example

We illustrate the use of eventnet on a tiny list of just 10 input events. These data are completely made up and will not yield any empirical insight; their sole purpose is to illustrate a minimal set of tasks needed to specify and estimate a relational event model with eventnet. The sequence of "observed" events is

sender,receiver
A,B
A,D
B,A
B,C
A,B
D,C
C,B
A,C
A,D
A,C

If you want to replicate this example you could copy and paste these events into a file tiny_example.csv, or simply download the file tiny_example.csv.
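The input file can also be created directly from R. This sketch writes the ten events listed above to tiny_example.csv in the current working directory; quote = FALSE keeps the plain format shown above, without quotation marks.

```r
# The ten "observed" events of the tiny example, in their original order
tiny <- data.frame(
  sender   = c("A", "A", "B", "B", "A", "D", "C", "A", "A", "A"),
  receiver = c("B", "D", "A", "C", "B", "C", "B", "C", "D", "C")
)
# Write a header line "sender,receiver" followed by one line per event
write.csv(tiny, "tiny_example.csv", row.names = FALSE, quote = FALSE)
```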

Open the eventnet GUI. In the files tab browse to the input event file (tiny_example.csv), set the delimiter to COMMA, and browse to an output base directory to which you want to write the results. These settings are shown in the screenshot below.

In the events tab specify that the SOURCE of events is in the column labeled "sender" and the TARGET of events is in the column labeled "receiver". No other settings are needed in this tab. (Indeed, in our tiny example we have no event types or weights; the event time is given implicitly by the order of events in the input file; and the relational events are from a one-mode network.) These settings are shown in the screenshot below.

In the time tab specify that the event interval type is EVENT. This means that each event happens "alone", that is, there are no simultaneous events. No other settings are needed in this tab. These settings are shown in the screenshot below.

In the attributes tab click on the create attribute button. In the next dialog, set the attribute class to DYAD_LEVEL and the type name to DEFAULT_DYAD_LEVEL_ATTRIBUTE and click on ok. In the create or edit attribute dialog (see below), give this attribute a name (like "past_interaction") and click on the add event type button; the add event response dialog opens. Set the event type to EVENT and click on set; then, back in the create attribute dialog, click on ok. The resulting attribute counts, for each dyad, the number of past events that happened on that dyad. There is no scaling, no decay, etc.
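To see what this attribute records, the following R sketch recomputes it by hand for the tiny example: for event i it counts, for every ordered pair of nodes, the events on that pair among events 1 to i-1 (no scaling, no decay). This is an illustration of the idea, not eventnet's implementation; the function name past_interaction merely mirrors the attribute name chosen above.

```r
# Events of the tiny example, in order
sender   <- c("A", "A", "B", "B", "A", "D", "C", "A", "A", "A")
receiver <- c("B", "D", "A", "C", "B", "C", "B", "C", "D", "C")
nodes    <- sort(unique(c(sender, receiver)))

# Dyad-level counts of all events strictly before event i,
# as a matrix with rows = source and columns = target
past_interaction <- function(i) {
  past <- seq_len(i - 1)
  unclass(table(factor(sender[past], levels = nodes),
                factor(receiver[past], levels = nodes)))
}

past_interaction(10)  # counts of events 1-9, seen by the 10th event (A, C)
```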

In the statistics tab click on the create statistic button. In the create statistic dialog create a statistic named "repetition" by the settings shown below. Create a similar statistic "reciprocation" by changing the dyad direction from OUT to IN.

Create a third statistic named "outdegree_sender" by the settings shown below. Create a similar (fourth) statistic "indegree_receiver" by changing the dyad direction from OUT to IN and the endpoint from SOURCE to TARGET.

Create a fifth statistic named "transitive_tie" by the settings shown below. Create a similar statistic "cyclical_tie" by interchanging the dyad direction from OUT to IN (direction of the first dyad) and from IN to OUT (direction of the second dyad).

The statistics tab now displays a list of six statistics as shown below. In the following, eventnet will compute the values of these six statistics for each observation (events and non-event dyads).
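The six statistics can be sketched in R as functions of a matrix a of past event counts (rows = source, columns = target), such as the past_interaction attribute. The sketch rests on assumptions, since the screenshots with the exact settings are not reproduced in code: degree statistics are taken as sums of the attribute over all dyads of the endpoint, and triadic statistics combine the two dyads through a third node w with min() and sum over all w. Eventnet offers other choices, so the values in tiny_example_OBS.csv may differ depending on the settings. The function name dyad_statistics is hypothetical.

```r
# Six statistics for the dyad (u, v) from the count matrix 'a'.
# ASSUMPTION: "first dyad OUT, second dyad IN" is read as u -> w -> v
# (transitive), and the interchanged directions as v -> w -> u (cyclical).
dyad_statistics <- function(a, u, v) {
  w <- setdiff(rownames(a), c(u, v))  # all third nodes
  c(repetition        = a[u, v],                      # past u -> v
    reciprocation     = a[v, u],                      # past v -> u
    outdegree_sender  = sum(a[u, ]),                  # events sent by u
    indegree_receiver = sum(a[, v]),                  # events received by v
    transitive_tie    = sum(pmin(a[u, w], a[w, v])),  # u -> w -> v
    cyclical_tie      = sum(pmin(a[v, w], a[w, u])))  # v -> w -> u
}

# Past event counts before the 10th event of the tiny example (events 1-9)
nodes <- c("A", "B", "C", "D")
a <- matrix(c(0, 2, 1, 2,
              1, 0, 1, 0,
              0, 1, 0, 0,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE, dimnames = list(nodes, nodes))

dyad_statistics(a, "A", "C")
# repetition 1, reciprocation 0, outdegree_sender 5,
# indegree_receiver 3, transitive_tie 2, cyclical_tie 1
```

Calling dyad_statistics for the observed dyad and for every non-event dyad of the risk set gives one row per observation, which is the kind of table eventnet writes to its output file.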

In the observations tab create a single observation (click on the create observation button) with settings as in the screenshot below. You have to specify a name, like "OBS", uncheck the box apply case-control sampling, and check the box exclude loops. No other settings are needed.
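The risk set defined here, and what case-control sampling would change, can be illustrated with a small R sketch. It assumes that all four nodes are at risk at every event, which may not match eventnet's exact handling of nodes that have not appeared yet; the function names risk_set and sample_observations and the number of sampled non-events m are hypothetical. The column name IS_OBSERVED is borrowed from eventnet's output as used in the R code further below.

```r
# All ordered pairs of distinct nodes ("exclude loops")
risk_set <- function(nodes) {
  rs <- expand.grid(source = nodes, target = nodes, stringsAsFactors = FALSE)
  rs[rs$source != rs$target, , drop = FALSE]
}

# Observations for one event (src, tgt): the event itself plus all
# non-events, or, if m is given, only m uniformly sampled non-events
# (the idea behind case-control sampling)
sample_observations <- function(rs, src, tgt, m = NULL) {
  is_event <- rs$source == src & rs$target == tgt
  if (!any(is_event)) stop("the observed event is not in the risk set")
  controls <- rs[!is_event, , drop = FALSE]
  if (!is.null(m) && m < nrow(controls)) {
    controls <- controls[sample(nrow(controls), m), , drop = FALSE]
  }
  obs <- rbind(rs[is_event, , drop = FALSE], controls)
  obs$IS_OBSERVED <- as.integer(obs$source == src & obs$target == tgt)
  obs
}

rs <- risk_set(c("A", "B", "C", "D"))  # 4 * 3 = 12 ordered pairs
```

Without sampling, each event of the tiny example is thus compared with 11 non-event dyads under this assumption; with m = 3, only three of them would be kept.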

Click on the process button on the left-hand side of the eventnet GUI to start the computation. With this small example, it will finish almost immediately and report the number of processed items (for instance, 10 events). The computed statistics of all observations (events and non-event dyads) are written to the file tiny_example_OBS.csv in the output directory.
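Before fitting a model, it can be worth checking the output file. With the interval type EVENT, we would expect every stratum (column EVENT) to contain exactly one observed event (IS_OBSERVED equal to 1) plus at least one non-event. The R helper below, check_observations, is a hypothetical sketch that relies only on the two columns EVENT and IS_OBSERVED used in the R code further below.

```r
# Per stratum (EVENT): number of observed events, total number of
# observations, and whether the stratum looks as expected
check_observations <- function(obs) {
  n_observed <- tapply(obs$IS_OBSERVED, obs$EVENT, sum)
  n_total    <- tapply(obs$IS_OBSERVED, obs$EVENT, length)
  data.frame(EVENT    = names(n_observed),
             observed = as.vector(n_observed),
             total    = as.vector(n_total),
             ok       = as.vector(n_observed == 1 & n_total > 1))
}
```

For example, all(check_observations(read.csv("tiny_example_OBS.csv"))$ok) should be TRUE if every stratum has this structure.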

If this does not work at all, we recall that there is a troubleshooting help page, which lists some common problems and tries to come up with solutions.

To fit a relational event model on the precomputed statistics with the coxph function of the "survival" package in the R software for statistical computing you can use the following code. Models could be estimated with different explanatory variables (for instance, including interaction effects), different functions, different packages, or different statistical software.

# set the working directory
setwd("<output directory of eventnet>")
# read the explanatory variables of all observations
events <- read.csv("tiny_example_OBS.csv")

## install the R package 'survival' if necessary (uncomment the following line)
#install.packages("survival")
# attach the library
library(survival)

## specify and estimate a Cox proportional hazard model
my.surv <- Surv(time = rep(1,dim(events)[1]), event = events$IS_OBSERVED)
my.model <- coxph(my.surv ~ repetition + reciprocation 
                  + outdegree_sender 
                  + indegree_receiver
                  + transitive_tie 
                  + cyclical_tie
                  + strata(EVENT)
                  , data = events)
# print estimated parameters, standard errors, ...
summary(my.model)
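Setting all times to 1 and stratifying by EVENT means that each observed event is compared only with the non-events computed for the same event. With exactly one observed event per stratum, all tie-handling methods of coxph coincide, and the partial likelihood that is maximized takes the conditional-logit form below. Here N is the number of events (strata), e_i is the observed dyad of event i, R_i is the set of observations in stratum i (the event plus its non-event dyads, or only the sampled ones under case-control sampling), x_i(r) is the vector of the six statistics of dyad r at event i, and θ holds the coefficients reported by summary(my.model).

```latex
L(\theta) \;=\; \prod_{i=1}^{N}
  \frac{\exp\!\big(\theta^{\top} x_i(e_i)\big)}
       {\sum_{r \in R_i} \exp\!\big(\theta^{\top} x_i(r)\big)}
```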

In this particular example, not a single effect is significant. Indeed, we had only 10 observed events which apparently is not enough. Depending on the distribution of the explanatory variables, some hundred events might be sufficient for producing significant results.

A much larger case study, using real empirical data, is given in the basic tutorial.