Troubleshooting (help) - juergenlerner/eventnet GitHub Wiki

You tried to apply eventnet in your own project, or just to repeat the illustrative analyses from the tutorials, but it did not work.

This page goes systematically through typical problems and suggests solutions.

We assume that you are at least familiar with the first steps tutorial, or with the RHEM first steps tutorial if your interest is in applying RHEM.

Recommendation: start eventnet from the command line

In general, eventnet can be started from a command line (e.g., java -jar eventnet-x.y.jar) or by double-clicking on the JAR file. However, when you start it from the command line, you will get more detailed error messages, which can help a lot in tracking down errors.

On Windows, click on the start menu and type cmd in the search field. Then click on the suggested "console" app. Alternative names include "terminal", "shell", or "PowerShell". Alternatively, search for these terms on the Internet. A window, usually with a black background, opens in which you can type commands. On MacOS or Linux, search for any of the terms "console", "terminal", "shell" - or just start it, if you know how.

It is often easier if the eventnet JAR file, the configuration file, and the data file (CSV) are all in one directory and if you change the console's working directory to that directory. Use the command cd <directory name> to change the directory.

The next few sections go through possible problems with java and are only relevant if you could not start eventnet at all (e.g., the eventnet GUI does not open at all). If you can open the eventnet GUI (but other things do not work), you may directly jump to the section on processing an eventnet configuration.

Ensure that java is installed

Eventnet is written in Java and needs the Java runtime environment (JRE), version 8 or higher. You may search for "download java runtime environment <my operating system>" to find ways to get the JRE. An alternative to the JRE by Oracle is the Eclipse Temurin JRE.

Type the command java -version in the console. The version should be 8 or higher (version 8 might also be denoted as 1.8). If you get an error message saying that the command java is unknown or misspelled, go through the installation process of the JRE again. (At this point, the problem does not yet have anything to do with eventnet.)
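As a minimal sketch of how to read the version string, the following Python snippet extracts the major Java version from a line as printed by java -version. The function name java_major_version is made up for this illustration, and the two sample lines only mimic typical output.

```python
import re

def java_major_version(version_line):
    """Return the major Java version from a line such as: java version "1.8.0_281"."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_line)
    if match is None:
        raise ValueError("no version string found in: " + version_line)
    first = int(match.group(1))
    second = match.group(2)
    # Old releases report "1.8", newer ones report "11", "17", ...
    if first == 1 and second is not None:
        return int(second)
    return first

# Illustrative sample lines, not captured from a real installation.
for line in ['java version "1.8.0_281"', 'openjdk version "17.0.2" 2022-01-18']:
    major = java_major_version(line)
    print(major, "OK" if major >= 8 else "too old for eventnet")
```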

Permission to start an unsigned app

Even if the JRE is installed, your system might still prevent the execution of the eventnet JAR file. This will usually happen on a computer running MacOS since eventnet is not signed by any certificate from Apple. Search for "MacOS run unsigned app" to see how you can allow the execution of eventnet.

If your computer is administered by someone else (e.g., a system administrator), it might be that you cannot run an unsigned app. In this case, ask your system administrator, or use eventnet on a different computer (e.g., a private laptop).

Starting eventnet

Once the points above have been solved, it should be possible to start eventnet. The command java -jar eventnet-x.y.jar (where "x.y" is to be replaced by the version number of the JAR file) should open the eventnet graphical user interface (GUI). If this succeeds, the remaining errors are likely due to problems with reading/writing/locating files, formatting of the input data (CSV file), or misspecification of the eventnet configuration.

Processing an eventnet configuration

We assume now that you can open the eventnet GUI. Processing a configuration can be started in two ways: by loading a configuration file in the GUI (or putting together the configuration in the GUI from scratch) and clicking on the process button, or from a command line with a command like java -jar eventnet-x.y.jar <configuration_filename>. The command line is preferred because it gives more detailed error messages, if any. The problems discussed below may appear when processing an example configuration from the tutorials, or when processing your own configuration.

The following symptoms may indicate that something does not work properly.

(1) The configuration file cannot be loaded into the GUI.
(2) The processing did not produce any output file, or just an empty output file, or an output file that has a header (column labels) but nothing else.
(3) The processing resulted in some output - but then it crashed or it gets seemingly stuck before processing the entire input.

If eventnet has processed the entire input, then the progress report (in the console from which you started eventnet and/or in the small window that opened when you clicked the process button in the GUI) ends with

DONE
NUMBER OF PROCESSED ITEMS:
...

followed by some additional lines giving the number of files, time units, rows of any type, rows containing events of specific types, etc. If no DONE appears, then the processing has not ended properly.

(4) The processing ended properly (the DONE appears) but the output or the model estimation does not make sense. For instance, a computed statistic is constantly zero even though a simple argument demonstrates that it has to be non-zero on some instances, or model estimation does not converge, or produces some parameters that are NA, or ...

In the following we go through some possible sources of errors that might lead to these symptoms.

The configuration file cannot be loaded into the GUI

An eventnet configuration is written in XML; however, the filename suffix does not matter for eventnet. A usual suffix for XML files is .xml, but some email servers seem to block such files, which makes it more inconvenient to share them. Moreover, word processing software (such as MS Word, Open Office Writer, Libre Office Writer) often tries to interpret XML files, or searches them for layout information. Such problems can be avoided by using the filename suffix for plain text files: .txt.

Such a configuration file (with the suffix .txt) can be opened and edited with word processing software. However, it must not be saved in a formatted document format (such as .docx, .odt, etc.); it must be saved as a plain text file (.txt). I use a text editor called Emacs (available for all common operating systems) to open, read, edit, and save eventnet configuration files, which works fine and can even recognize and highlight formatting errors in XML files (provided that the filename suffix is .xml).

XML has a specific syntax which is too complex to be discussed here. If the configuration file is not well-formed, it cannot be opened or processed in eventnet. This will be revealed in an error message in the console from which you execute the command java -jar eventnet-x.y.jar <configuration_filename>. If this happens, we recommend editing eventnet configurations in the GUI and then saving them to a file - which will produce well-formed XML. Editing an XML configuration file in an editor is never really needed (although it might be faster when creating very large and complex configurations).
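Whether a file is well-formed XML can also be checked outside of eventnet. The following sketch uses Python's standard XML parser; the function name check_wellformed is made up for this illustration, and the check says nothing about whether the configuration is meaningful to eventnet.

```python
import xml.etree.ElementTree as ET

def check_wellformed(xml_text):
    """Return None if xml_text is well-formed XML, else a short error description."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        line, column = err.position
        return "line %d, column %d: %s" % (line, column, err)

good = '<event.types><type name="edit"/></event.types>'
bad = '<event.types><type name="edit"></event.types>'  # <type> is never closed

print(check_wellformed(good))  # None
print(check_wellformed(bad))   # a message pointing at the mismatched tag
```

For a configuration file on disk, ET.parse(filename) raises the same ParseError if the file is not well-formed.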

Directories and files

It might be that the directory of the input (CSV) files (or their filenames), or the output directory is misspecified. In the configuration files from the eventnet tutorials, the input directory is most often specified as a single dot (.) which represents the current working directory, which is the working directory of the console in which you typed java -jar eventnet-x.y.jar. It is also possible to specify a directory via a so-called "absolute path", such as C:\Users\my_username\my_data_directory. The given filenames (or input sub-directory names) are then appended to this "input base directory". A similar remark applies to the "output base directory". When processing a configuration, the first few lines in the progress report give the output base directory, as an "absolute path", and the directory and name of the first input file (CSV file), again as an absolute path. You may check whether these directories and filenames, as reported by eventnet, are correct. If not, update them in the configuration, change the working directory, or move the files - whatever seems most appropriate.

Even if the directory names and filenames are correct, it might be that eventnet has no permission to read (the input files) or to create and write (the output files). This is a tricky issue that depends on the specific computer, operating system, and settings. In most operating systems there is a user home directory, user document directory, or download directory, which is the most likely to be readable and writable.

You can check whether eventnet can read at least the first input file in the "files" tab in the eventnet GUI. If a small excerpt of a table is displayed under csv settings and above output base directory, it can read the first input file. If no table appears, then either it cannot read the first input file, or the csv settings are misspecified (see below). You can check whether eventnet can create and write to the output file(s) after clicking on the process button and then looking with a file browser in the output directory. Even if the processing crashes or gets stuck, it should still create a number of output files equal to the number of eventnet "observation generators" (see the tutorials) in the configuration.
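Independently of eventnet, the path resolution and the permissions described in this section can be checked with a few lines of Python. This is only a sketch: the function name check_paths and the filename events.csv are hypothetical, and the operating system may still block a Java process for reasons that this check does not see.

```python
import os

def check_paths(input_base, first_input_file, output_base):
    """Resolve the paths against the current working directory and test access."""
    input_path = os.path.abspath(os.path.join(input_base, first_input_file))
    output_dir = os.path.abspath(output_base)
    return {
        "input_path": input_path,
        "input_readable": os.path.isfile(input_path) and os.access(input_path, os.R_OK),
        "output_dir": output_dir,
        "output_writable": os.path.isdir(output_dir) and os.access(output_dir, os.W_OK),
    }

# "." stands for the working directory of the console, as in the tutorial configurations.
for key, value in check_paths(".", "events.csv", ".").items():
    print(key, "=", value)
```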

Formatting of the CSV files

If directory and file names are correct and files are readable - but still no excerpt of a table, or a malformed table, is displayed (see above) - it might be that eventnet cannot understand the formatting of the CSV input file. First, the delimiter and, if needed, the quote character need to be set correctly. If this does not fix the problem, the issue becomes tricky. It might be that the file encoding is not understandable by eventnet - but this is hard to even see, or change. You might try to open the CSV file in some software for statistical analysis or data analysis, such as in the R software for statistical computing via the command read.csv(), check whether it reads the table correctly, and then write it out into a new file (e.g., write.csv() in R). This might fix issues with the encoding - or you might see that the file is also not readable by other software.
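The same read-and-rewrite round trip can be sketched in Python. The snippet below tries a few candidate encodings and writes a UTF-8 copy of the file. Both the list of candidates and the choice of UTF-8 as the target are assumptions made for this illustration; the function names are made up.

```python
def decode_with_fallback(raw_bytes, encodings=("utf-8-sig", "cp1252", "latin-1")):
    """Decode bytes with the first encoding that works; return (text, encoding)."""
    for encoding in encodings:
        try:
            return raw_bytes.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings worked")

def rewrite_as_utf8(source_path, target_path):
    """Read a CSV file in an unknown encoding and write a UTF-8 copy."""
    with open(source_path, "rb") as source:
        text, encoding = decode_with_fallback(source.read())
    # newline="" keeps the line endings of the original file unchanged.
    with open(target_path, "w", encoding="utf-8", newline="") as target:
        target.write(text)
    return encoding
```

Note that "utf-8-sig" also removes a byte order mark at the start of the file, if there is one. A successful decode does not prove that the guessed encoding is the right one, so it is worth looking at non-ASCII characters in the rewritten file.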

Processing crashes, gets stuck, or produces an empty output file

Assuming now that the input file can be read properly (the preview table is displayed, see above) and eventnet can create and write to the output file(s), there might still be some errors in the configuration. Next we try to find possible sources of errors that could lead to a processing that crashes, gets stuck (the progress report does not advance anymore), or produces an output file with no rows, except perhaps the header.

Specification of event components

Every event (that is, every row in the input table) needs to specify a non-empty source and a non-empty target - even if the row represents a "dummy event" that sets a node-level attribute or if the row specifies a participant in an undirected hyperevent. In such cases, the source and the target column contain the same node id, that is, the row specifies a "loop", and the respective event type specification (in the "events" tab) has to allow loops. Eventnet removes leading and trailing whitespace characters (e.g., space and tab) from the node ids given in the input file. If this results in the empty string for either the source id or the target id, the respective row is ignored.
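Rows that would be ignored for this reason can be located before processing. The following sketch lists the data rows whose source or target id is empty after trimming whitespace; the column names SOURCE and TARGET and the function name are hypothetical and have to be adapted to your file.

```python
import csv
import io

def rows_with_empty_ids(csv_text, source_col, target_col, delimiter=","):
    """Return the 1-based data row numbers whose source or target id is empty after trimming."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    bad_rows = []
    for number, row in enumerate(reader, start=1):
        source = (row.get(source_col) or "").strip()
        target = (row.get(target_col) or "").strip()
        if not source or not target:
            bad_rows.append(number)
    return bad_rows

# The second data row has a target that consists of a single space.
sample = "SOURCE,TARGET,TIME\nalice,bob,1\ncarol, ,2\ndave,dave,3\n"
print(rows_with_empty_ids(sample, "SOURCE", "TARGET"))  # [2]
```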

If a column for the event WEIGHT is specified in the "events" tab (that is, if the weight is not "implied"), then these entries in the input file have to represent decimal numbers. To the best of my knowledge, the decimal separator (if any) always has to be a dot (.), and not a comma. Moreover, commas as thousands separators (as in 1,234,567) should not be used. A decimal number might look like 1234.987, or 1.0, or even without a decimal separator, like 12. It might also use "scientific notation", like 3.7e5 (which means moving the decimal point five digits to the right), or 1.0e-7 (which means moving the decimal point seven digits to the left). But no other sophistication is possible. If an entry in the weight column cannot be interpreted as a decimal number, the respective row is ignored.
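A conservative check for the number formats described above could look as follows. The regular expression accepts only plain decimals and scientific notation; it is an assumption made for this sketch and not eventnet's actual parser, which might accept or reject additional forms.

```python
import re

# Plain decimals and scientific notation only; no thousands separators, no decimal comma.
DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

def looks_like_decimal(entry):
    """Return True if the whole entry matches one of the described number formats."""
    return DECIMAL.fullmatch(entry) is not None

for entry in ["1234.987", "1.0", "12", "3.7e5", "1.0e-7", "1,5", "1,234.5", ""]:
    print(repr(entry), looks_like_decimal(entry))
```

The first five entries pass the check; the entries with a comma and the empty entry do not.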

The formatting of the entries representing the event TIME depends on the given time format type (see the "time" tab and, for instance, the basic tutorial). The simplest format is INTEGER, in which case the entries must be interpretable as integers (no decimal point, not even .0). Another common format is DATE_TIME, where the entries are strings like 2023-07-18 16:46:21, or 18.07.2023, or 07/18/2023. In this case a time format pattern has to be specified - see the basic tutorial and/or a rather lengthy treatment of all possible time format patterns. An entry that cannot be interpreted as a date/time according to the given pattern will likely cause the processing to crash. At the very least, the respective row would be ignored.

All rows in the input file have to be ordered in time from older to younger events when going through the file from top to bottom. Consecutive rows may have the same time - but it must never be the case that an older event is listed after a younger event. This would cause the processing to stop (or crash or result in undefined behavior).
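To find entries that do not fit one common date format, a check along the following lines can help. Note that the format codes used here (such as %Y-%m-%d) are those of Python's strptime and are not eventnet time format patterns; the function name is made up for this sketch.

```python
from datetime import datetime

def unparsable_times(entries, python_format):
    """Return the entries that cannot be parsed with the given strptime format."""
    bad = []
    for entry in entries:
        try:
            datetime.strptime(entry, python_format)
        except ValueError:
            bad.append(entry)
    return bad

# The second entry uses a different date format than the other two.
times = ["2023-07-18 16:46:21", "18.07.2023", "2023-07-19 08:00:00"]
print(unparsable_times(times, "%Y-%m-%d %H:%M:%S"))  # ['18.07.2023']
```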
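The ordering requirement can be verified on the time column before processing. The following sketch assumes that the times have already been converted to comparable values (integers or datetime objects); the function name is made up for this illustration.

```python
def first_out_of_order(times):
    """Return the index of the first entry that is older than its predecessor, or None."""
    for index in range(1, len(times)):
        if times[index] < times[index - 1]:
            return index
    return None

print(first_out_of_order([1, 2, 2, 3]))  # None: equal consecutive times are allowed
print(first_out_of_order([1, 3, 2, 4]))  # 2: the entry 2 is listed after the entry 3
```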

If a column for the event TYPE is specified, then all different event types (that is, all different character strings in the type column) must be explicitly declared in the configuration. See the "events" tab in the GUI or the <event.types> element in the configuration file, which may look like this:

  <event.types>
    <type name="edit" implied="false" admits.loops="false" source.node.set="users" target.node.set="articles"/>
    <type name="talk" implied="false" admits.loops="false" source.node.set="users" target.node.set="articles"/>
  </event.types>

If the network is a multi mode network, then all event types must declare their source node set and target node set. If an event type is not declared in the configuration, the respective row will be ignored.
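Undeclared event types can be found by comparing the strings in the type column with the names declared in the <event.types> element. The sketch below does this for the fragment shown above; the function names and the type string "revert" are made up, and names are compared as exact strings, which is an assumption about how eventnet matches them.

```python
import xml.etree.ElementTree as ET

def declared_event_types(xml_text):
    """Collect the name attribute of every <type> directly inside <event.types>."""
    root = ET.fromstring(xml_text)
    names = set()
    for container in root.iter("event.types"):
        for element in container.findall("type"):
            names.add(element.get("name"))
    return names

def undeclared_types(types_in_data, xml_text):
    """Return the type strings that occur in the data but are not declared."""
    return sorted(set(types_in_data) - declared_event_types(xml_text))

config_fragment = """
<event.types>
  <type name="edit" implied="false" admits.loops="false" source.node.set="users" target.node.set="articles"/>
  <type name="talk" implied="false" admits.loops="false" source.node.set="users" target.node.set="articles"/>
</event.types>
"""
print(undeclared_types(["edit", "talk", "edit", "revert"], config_fragment))  # ['revert']
```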

Reference to undeclared attributes

A common source of error is that the declaration of some eventnet statistic refers to an attribute that does not exist (that is, is not declared in the configuration). This will result in a clear error message or it will cause the processing to crash. Note that both the name and the class of the attribute must match the declaration in the statistic. For instance, if a statistic refers to a node-level attribute called activity, then it would not help if there is only a dyad-level attribute of that name. Moreover, two attributes in the same configuration must never have the same name, even if they are from different classes (e.g., node-level vs dyad-level).

Out-of-memory error

A java application always runs with a strict limit on the memory consumption, which might be exceeded when processing large data files with eventnet. This will result in an error message that you can see in the console from which you start eventnet. You can increase the memory limit with the -Xmx argument of the java command. For instance,

java -Xmx1024m -jar eventnet-x.y.jar [<configuration_filename>]

starts the java application with 1024 MB, the argument -Xmx16g sets it to 16 GB, etc. It is rarely useful to set a limit that is larger than your computer's main memory (RAM).

Excessive risk set size

In networks with many nodes, the risk set size (e.g., the number of possible events at a given time point) can be so large that processing them all would never finish. For instance, in a dyadic REM with a million nodes, we get a risk set containing about one trillion pairs of nodes. In relational hyperevent models (RHEM), the entire risk set size is excessive already with a much smaller number of nodes. For instance, just 40 nodes already yield more than a trillion subsets of nodes, which are candidates for possible hyperevents.
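The two numbers mentioned above follow from simple counting, as this short calculation shows. It counts ordered source-target pairs without loops for the dyadic case and all subsets of the node set for the hyperevent case.

```python
nodes = 1_000_000
ordered_pairs = nodes * (nodes - 1)  # source-target pairs without loops
print(ordered_pairs)                  # 999999000000, about one trillion

participants = 40
subsets = 2 ** participants           # all subsets of 40 nodes, including the empty set
print(subsets)                        # 1099511627776, more than a trillion
```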

In most cases (except when analyzing small networks) we do not process the entire risk set, but a sample from it. This is done by checking the apply case-control sampling option in the specification of the observation generators. In the XML configuration files, this is done by the option apply.case.control.sampling="true" and specifying a number.of.non.events="1" per event.

Processing a huge risk set without sampling would never finish, or would eventually produce an out-of-memory error.

No events or no non-events in the output

If processing finishes successfully (the DONE appears in the progress report) but the output files contain either no rows at all, or they contain no non-events, it might be that the respective observation generator is specified in a way that no event or no non-event satisfies the given constraints. (For further causes, see the comments above on the specification of the various event components.)

Eventnet offers several possibilities to constrain the risk set (which can also be used to define dynamically changing risk sets). For instance, it can require that sources, targets, nodes, dyads, or hyperedges have to be non-zero on a specified attribute. Then it is good to check whether the respective attribute is indeed set to a non-zero value at the appropriate moment in time (check the specification of that attribute). Recall that events are ignored if their type is not specified in the "events" tab (the <event.types> element in the configuration file). Thus, attributes would not be set to any value by events of unknown type. A hyperedge is non-zero on a given hyperedge attribute only if exactly the identical hyperedge is mapped to a non-zero value (not if it overlaps with another hyperedge mapped to a non-zero value).

In multi-mode networks it is possible to require that the source, target, or node of events is from a specified set of nodes. It is good to check whether the event-type specification adds the right nodes to the various node sets. Loops can be excluded in the observation specification, and it is good to know that a directed hyperevent counts as a loop if the source set and the target set have one or more nodes in common.

It is further possible to condition the risk set on the observed source, target, both source and target, or hyperedge. Conditioning on the observed source and target will define a risk set of size one, which consequently contains no non-event but only the one observed event. (This setting can still make sense in a model seeking to explain variation of the type or weight of observed events.)

Moreover, the analysis can be restricted to a given interval whose borders can be specified via time, event counter, event interval counter, time unit counter, and time point counter. It is good to check whether there are any events in the specified range.

Confusion between dyadic events, undirected hyperevents, and directed hyperevents

The most frequently used types of observations in eventnet are for dyadic events (having one source and one target), undirected hyperevents (having one arbitrarily large set of event participants without any distinction between sources and targets), and directed hyperevents (having an arbitrarily large set of source nodes and an arbitrarily large set of target nodes). In some cases, the same data could be interpreted in different ways. For instance, paper publication events might be considered as undirected hyperevents among the set of authors - or as directed hyperevents linking the set of authors to the paper they publish. An important remark is that an observation only calls statistics of the compatible type. For instance, an observation for undirected hyperevents only calls the undirected hyperedge statistics and not any of the directed hyperedge statistics. This could result in an output file in which (seemingly) some or all columns for the statistics are missing. If some or all statistics are missing in the output file, one possibility could be a mismatch between the type of observations and statistics.

Processing with eventnet finishes but models cannot be estimated

Last we deal with the symptom that the processing with eventnet finishes (that is, the DONE appears in the progress report) but estimating model parameters from the output table (in R or in other statistical software) does not work. This might have several causes.

The case of no events or no non-events in the output table has already been discussed in the subsection above. If there are no non-events in the output table, it is impossible to estimate a Cox proportional hazards model (e.g., by the coxph() function in the R package survival).

If a statistic is constantly zero, or more generally if it is constant over all instances, it cannot be used as explanatory variable. If some argument suggests that the statistic should not, or cannot, be constantly zero, it might be due to misspecification of the configuration. Most statistics are functions of eventnet attributes and attributes change their values by events of given types - at a time point right after the time of the event that changes the attribute value. For instance, "dummy" events that set exogenous node covariates have to appear in the input file before these values should actually have any effect on statistics. If there is an all-zero statistic (which however should have some variation), you may check the whole pipeline from the respective events in the input data, event type declaration in the configuration, definition of attributes, and definition of statistics. Recall that event types must be declared in the configuration (see above) - otherwise these events will be ignored.
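Constant statistics can be detected in the output table before model estimation. The following sketch works on rows that have already been read into dictionaries; the column names repetition and popularity and the function name are hypothetical.

```python
def constant_columns(rows, statistic_names):
    """Return the statistics that take a single value over all rows."""
    return [name for name in statistic_names
            if len({row[name] for row in rows}) <= 1]

# Made-up rows standing in for an output table.
rows = [
    {"repetition": 0.0, "popularity": 2.0},
    {"repetition": 0.0, "popularity": 1.0},
    {"repetition": 0.0, "popularity": 3.0},
]
print(constant_columns(rows, ["repetition", "popularity"]))  # ['repetition']
```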

A bit more tricky is a statistic that shows some variation overall but that is constantly zero on all non-events. Such a statistic could not be used in a CoxPH model. (The reason is that such a statistic would be a "perfect predictor", since any instance with a non-zero value is an event with 100% certainty - and the CoxPH model does not allow for perfect predictors since the associated parameters would be plus/minus infinity.) The reason why a statistic could be constantly zero on all non-events could be a misspecification of the configuration (similar to the above) - but it could also be a problem of network sparsity. The latter can easily happen with some RHEM statistics, for instance, subset repetition of higher order, in connection with a large and sparse network. The probability that a randomly chosen subset of nodes has participated in any previous event might be so small that it does not happen for any of the randomly selected non-events. The same problem might arise in huge, sparse networks of dyadic events, as illustrated with the repetition statistic in the tutorial on large event networks. A possible solution is to increase the number of non-events in the case-control sampling. But this is only possible up to a certain point (preliminary experience shows that it is rarely helpful to increase the number of instances, events and non-events to more than 10 million). Another possible solution is to reduce or partition the network into smaller but denser subnetworks - if this makes sense in the given application. We emphasize that the problem of statistics that are zero over all non-events is not necessarily an "error" in the specification of the model, but it may simply be a characteristic of the data. If it cannot be solved in any other way, these statistics have to be dropped from the model. For instance, subset repetition in a RHEM can be included only up to some order that depends on the data characteristics.
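A statistic of this kind can be flagged with a small check on the output table. In the sketch below, the indicator column that distinguishes events (1) from non-events (0) is called is_event; this name, the statistic names, and the function name are hypothetical and have to be replaced by the column names in your output file.

```python
def zero_on_non_events(rows, statistic_names, indicator):
    """Return statistics that vary overall but are zero on every non-event row."""
    non_events = [row for row in rows if row[indicator] == 0]
    flagged = []
    for name in statistic_names:
        varies = len({row[name] for row in rows}) > 1
        if varies and all(row[name] == 0 for row in non_events):
            flagged.append(name)
    return flagged

# Made-up rows: repetition is non-zero only on one observed event.
rows = [
    {"is_event": 1, "repetition": 1.0, "popularity": 2.0},
    {"is_event": 0, "repetition": 0.0, "popularity": 1.0},
    {"is_event": 1, "repetition": 0.0, "popularity": 3.0},
    {"is_event": 0, "repetition": 0.0, "popularity": 0.0},
]
print(zero_on_non_events(rows, ["repetition", "popularity"], "is_event"))  # ['repetition']
```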

In a CoxPH model the information which events are associated with which non-events is conveyed with a strata variable (compare the R code examples in most or all of the tutorials), for instance the event time or the event counter. A CoxPH model cannot be estimated with a statistic that always takes the same values on events and their associated non-events. Examples of such statistics are "global" variables, such as the day of the week or the cumulative number of previous events in the whole network. In a RHEM that conditions on the size of the observed events, the event size would be another example of such a variable. If a variable always takes the same value on associated events and non-events, then its parameter is undefined since its effect cancels out in the likelihood function of the CoxPH model. Again, such a situation is not due to any "error" in the model specification. It just points to some properties of the CoxPH model. The only solution is not to include such a variable in the models. Note that such a variable could still be interacted with other variables that do vary between events and associated non-events. For instance, it might be that a repetition effect is stronger or weaker depending on the day of the week, or that a RHEM effect is stronger on large events than on small events.
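Whether a variable varies between events and their associated non-events can be checked per stratum. The sketch below groups rows by a strata column and reports whether a statistic takes a single value within every stratum; the column names event_id, weekday, and repetition, as well as the function name, are hypothetical.

```python
from collections import defaultdict

def constant_within_strata(rows, statistic, stratum):
    """Return True if the statistic takes one value within every stratum."""
    values = defaultdict(set)
    for row in rows:
        values[row[stratum]].add(row[statistic])
    return all(len(seen) == 1 for seen in values.values())

# Made-up rows: two strata with one event and one non-event each.
rows = [
    {"event_id": 1, "weekday": 2, "repetition": 1.0},
    {"event_id": 1, "weekday": 2, "repetition": 0.0},
    {"event_id": 2, "weekday": 5, "repetition": 3.0},
    {"event_id": 2, "weekday": 5, "repetition": 0.0},
]
print(constant_within_strata(rows, "weekday", "event_id"))     # True
print(constant_within_strata(rows, "repetition", "event_id"))  # False
```

In this example, weekday varies over the whole table but not within any stratum, so it could only enter the model through an interaction with a variable such as repetition.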

Some REM or RHEM statistics, e.g., counts of previous events, are often very skewed on typical event networks. Sometimes this may lead to a non-convergent model estimation since parameters are numerically as good as infinite. Sometimes it helps to scale these statistics with a sublinear function such as the square-root or log(1+x) (i.e., taking the logarithm of one plus the value of the statistic), before model estimation.
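The two transformations mentioned above can be sketched as follows. The function name sublinear and the sample values are made up; in practice the transformation would be applied to the respective column of the output table, for instance in R before calling the estimation function.

```python
import math

def sublinear(values, kind="log1p"):
    """Scale non-negative statistic values with log(1+x) or the square root."""
    if kind == "log1p":
        return [math.log1p(value) for value in values]
    if kind == "sqrt":
        return [math.sqrt(value) for value in values]
    raise ValueError("unknown transformation: " + kind)

counts = [0, 1, 4, 10000]  # a skewed, made-up count statistic
print(sublinear(counts, "sqrt"))
print(sublinear(counts, "log1p"))
```

Both functions map zero to zero and compress large values, so that a few very active nodes or dyads dominate the statistic less.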