09 10 22: Data Analysis Overview and Hypothesis - rolandgriggs/URE2022_XGXaviGriggs GitHub Wiki

Project 4. Impact of the number of database updates on long-term database availability

Research Aim :

Investigate whether the number of database updates impacts the availability of databases.

Research questions:

  1. How many databases have a very high number of published updates (more than 5 updates), a medium number of updates (2 to 5 updates), or no updates?

Hypothesis : The majority of databases will have a medium number of updates.

Methods : Count the databases with more than 5 updates, with 2 to 5 updates, and with no updates.

Potential visualizations: Column chart
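As a rough illustration of the counting step, the sketch below bins databases by their number of updates, taken as `Nb_of_articles - 1`. The function name `update_category`, the label `unclassified` and the sample values are assumptions made for this example. The research question does not say where a database with exactly 1 update belongs, so such databases are kept in a separate bucket here rather than assigned to a category.

```python
from collections import Counter


def update_category(nb_of_articles: int) -> str:
    """Map an article count to the update categories named in the research question."""
    updates = nb_of_articles - 1  # the first article is the original publication
    if updates > 5:
        return "high"
    if 2 <= updates <= 5:
        return "medium"
    if updates == 0:
        return "none"
    # Exactly 1 update is not covered by the stated categories (assumption: keep it apart)
    return "unclassified"


# Made-up Nb_of_articles values, not taken from the real dataset
sample_nb_of_articles = [1, 1, 3, 4, 8, 2, 6, 7]

# One count per category, ready to feed a column chart
counts = Counter(update_category(n) for n in sample_nb_of_articles)
print(counts)
```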

  2. How many databases (published more than 10 years ago) have recent updates (updates in the past 10 years)?

Hypothesis : The majority of databases published more than 10 years ago will not have recent updates.

Methods : Count the databases that are more than 10 years old and have at least one update in the past 10 years.

Potential visualizations: Line graph
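One possible way to express this filter uses only `first_publication` and `last_publication`: if a database was first published before the 10-year cutoff and its last publication falls on or after that cutoff, it has at least one update in the window. The reference date of 2022-01-01, the helper names and the sample rows below are assumptions for illustration, not values from the dataset.

```python
from datetime import date

REFERENCE = date(2022, 1, 1)  # assumed reference date for "today"
CUTOFF = date(REFERENCE.year - 10, REFERENCE.month, REFERENCE.day)  # 10 years earlier


def is_old(first_publication: date) -> bool:
    """True if the database was first published more than 10 years before REFERENCE."""
    return first_publication < CUTOFF


def has_recent_update(first_publication: date, last_publication: date) -> bool:
    """True if there is a later publication dated within the past 10 years."""
    return last_publication > first_publication and last_publication >= CUTOFF


# Made-up (first_publication, last_publication) pairs
rows = [
    (date(2005, 1, 1), date(2015, 1, 1)),  # old, with a recent update
    (date(2005, 1, 1), date(2005, 1, 1)),  # old, never updated
    (date(2003, 1, 1), date(2008, 1, 1)),  # old, updated but not recently
    (date(2018, 1, 1), date(2020, 1, 1)),  # too recent to count as old
]

old = [r for r in rows if is_old(r[0])]
old_with_recent_update = [r for r in old if has_recent_update(*r)]
print(len(old_with_recent_update), "of", len(old), "old databases have a recent update")
```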

  3. What is the proportion of available and unavailable databases among those with a high number of updates, a medium number of updates, or no updates?

Hypothesis : Databases with a high or medium number of updates are more likely to be available.

Methods : Compute the proportion of available and unavailable databases within each update category.

Potential visualizations: Scatterplot
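A minimal sketch of the proportion calculation follows, assuming each record is reduced to a pair of `Nb_of_articles` and `available_2022`. The thresholds follow the categories in the research questions; the names `category` and `other` and the sample pairs are hypothetical.

```python
from collections import defaultdict


def category(nb_of_articles: int) -> str:
    """Update category from the article count (updates = articles - 1)."""
    updates = nb_of_articles - 1
    if updates > 5:
        return "high"
    if updates >= 2:
        return "medium"
    return "none" if updates == 0 else "other"  # "other": exactly 1 update (assumption)


# Made-up (Nb_of_articles, available_2022) pairs
rows = [
    (1, False), (1, True), (1, False), (1, False),
    (4, True), (3, False),
    (9, True), (8, True),
]

totals = defaultdict(int)
available = defaultdict(int)
for nb_of_articles, is_available in rows:
    cat = category(nb_of_articles)
    totals[cat] += 1
    if is_available:
        available[cat] += 1

# Share of databases still available in each category; the unavailable share is 1 minus this
proportion_available = {cat: available[cat] / totals[cat] for cat in totals}
print(proportion_available)
```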

Dataset description:

2343 entries (1 entry per database). Databases never published online were excluded.

Variables included:

  • db_id : Unique identifier for the database in the JL_DB dataset
  • resource_name : Name of the database
  • first_publication : Date of the first article publication of the database
  • Nb_of_articles : Number of publications for that database. If equal to 1, the database had no published updates; if greater than 1, the database was updated.
  • last_publication : Date of the last publication. Equal to first publication if only one article was published for that database.
  • available_2022 : TRUE if the database is available online in 2022, FALSE if not
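To make the variable definitions concrete, here is a hedged sketch of how one entry might be parsed into a typed record. It assumes the data is stored as CSV with ISO-formatted dates (`YYYY-MM-DD`) and the literal strings `TRUE`/`FALSE`; the actual file format is not stated here. The `DatabaseEntry` class, the `nb_of_updates` property and the sample rows are invented for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class DatabaseEntry:
    db_id: str
    resource_name: str
    first_publication: date
    nb_of_articles: int
    last_publication: date
    available_2022: bool

    @property
    def nb_of_updates(self) -> int:
        # One article means no published update, so updates = articles - 1
        return self.nb_of_articles - 1


def parse_row(row: dict) -> DatabaseEntry:
    """Convert one CSV row (all strings) into a typed DatabaseEntry."""
    return DatabaseEntry(
        db_id=row["db_id"],
        resource_name=row["resource_name"],
        first_publication=date.fromisoformat(row["first_publication"]),
        nb_of_articles=int(row["Nb_of_articles"]),
        last_publication=date.fromisoformat(row["last_publication"]),
        available_2022=row["available_2022"].strip().upper() == "TRUE",
    )


# Made-up rows in the assumed CSV layout
SAMPLE_CSV = """db_id,resource_name,first_publication,Nb_of_articles,last_publication,available_2022
1,ExampleDB,2004-01-01,3,2016-01-01,TRUE
2,OtherDB,2009-01-01,1,2009-01-01,FALSE
"""

entries = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for e in entries:
    print(e.resource_name, e.nb_of_updates, e.available_2022)
```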