Data Analysis Part 1 ‐ overview and hypothesis - OscarO-0/BAT102_oscarosuna GitHub Wiki

# Project 2. Impact of the number of citations on long-term database availability

## Research aim: Investigate whether the popularity of a database (its number of citations) affects its availability.

## Research questions:

  1. How many databases have a very high number of citations (more than 100 citations), a medium number of citations (10 to 100 citations), a low number of citations (fewer than 10 citations), and no citations?

hypothesis I think that there might be a trend of having more lower cited databases than high or medium, because lower cited databases will most likely be newer ones and the newer one's are probably the most collected in our datasets.

methods number of citations for databases and grouping those databases in groups of less than 10, 10-100, and 100+

potential visualization bar graph
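
The grouping step for this question can be sketched in plain Python. This is only an illustration: the function names `citation_bin` and `count_by_bin` and the example citation counts are made up, and the bin edges follow the wording of the research question (0, fewer than 10, 10 to 100, more than 100).

```python
from collections import Counter

def citation_bin(citations):
    """Map a citation count to one of the four groups in the research question."""
    if citations == 0:
        return "none"
    if citations < 10:
        return "low"
    if citations <= 100:
        return "medium"
    return "high"

def count_by_bin(citation_counts):
    """Count how many databases fall into each citation group."""
    counts = Counter(citation_bin(c) for c in citation_counts)
    # Always report all four groups, even when a group is empty.
    return {name: counts.get(name, 0) for name in ("none", "low", "medium", "high")}

# Made-up citation counts, for illustration only (not real data).
example_counts = [0, 3, 9, 10, 55, 100, 101, 2500]
bin_totals = count_by_bin(example_counts)
print(bin_totals)
```

The resulting dictionary has one entry per group, so it maps directly onto the bars of the planned bar graph.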

  2. How many databases with a very high number of citations (more than 100 citations) are old databases (published more than 10 years ago)?

Hypothesis: I expect a large number of databases to be both highly cited and more than 10 years old, since an older database has had more time to accumulate citations.

Methods: Count the databases that are both highly cited (more than 100 citations) and more than 10 years old.

Potential visualization: pie chart
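
A minimal sketch of this count, assuming each database is a record with a citation count and a publication year. The field names `citations` and `year_published`, the function `count_high_and_old`, the reference year, and the example records are all hypothetical.

```python
def count_high_and_old(databases, reference_year, min_citations=100, min_age_years=10):
    """Return (number of highly cited AND old databases, number of highly cited databases)."""
    # "Highly cited" means strictly more than min_citations.
    high = [d for d in databases if d["citations"] > min_citations]
    # "Old" means published strictly more than min_age_years before reference_year.
    high_and_old = [
        d for d in high
        if reference_year - d["year_published"] > min_age_years
    ]
    return len(high_and_old), len(high)

# Made-up records, for illustration only (not real data).
example_databases = [
    {"citations": 150, "year_published": 2005},
    {"citations": 300, "year_published": 2020},
    {"citations": 120, "year_published": 2010},
    {"citations": 50, "year_published": 2000},
    {"citations": 101, "year_published": 2014},
]
old_high, all_high = count_high_and_old(example_databases, reference_year=2024)
print(f"{old_high} of {all_high} highly cited databases are more than 10 years old")
```

Returning both numbers gives the two slices of the pie chart: highly cited and old, versus highly cited and not old.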

  3. Are databases with a high or medium number of citations less susceptible to being discontinued than databases with low or no citations?

Hypothesis: I think low- and medium-cited databases might be more susceptible to being discontinued, because it may not be feasible to keep a database up and running if it is not being used.

Methods: Count the databases with low citations that are also discontinued, and compare that count with the discontinued databases in the other citation groups.

Potential visualization: bar graph
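
One possible way to make the groups comparable is to compute the share of discontinued databases in each citation group, rather than only raw counts. The sketch below assumes each record has a citation count and an availability flag. The field names `citations` and `available`, both function names, and the example records are hypothetical.

```python
def citation_bin(citations):
    """Map a citation count to one of the four groups in the research questions."""
    if citations == 0:
        return "none"
    if citations < 10:
        return "low"
    if citations <= 100:
        return "medium"
    return "high"

def discontinued_share_by_bin(databases):
    """Return the fraction of discontinued databases within each citation group."""
    totals = {}
    discontinued = {}
    for d in databases:
        group = citation_bin(d["citations"])
        totals[group] = totals.get(group, 0) + 1
        if not d["available"]:
            discontinued[group] = discontinued.get(group, 0) + 1
    # Only groups that contain at least one database appear in the result.
    return {group: discontinued.get(group, 0) / totals[group] for group in totals}

# Made-up records, for illustration only (not real data).
example_databases = [
    {"citations": 0, "available": False},
    {"citations": 0, "available": False},
    {"citations": 4, "available": False},
    {"citations": 7, "available": True},
    {"citations": 40, "available": True},
    {"citations": 80, "available": False},
    {"citations": 500, "available": True},
    {"citations": 1200, "available": True},
]
shares = discontinued_share_by_bin(example_databases)
print(shares)
```

Using shares instead of counts keeps a large group from looking more prone to discontinuation simply because it contains more databases.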