Data Analysis Part 2 - Analysis and Visualizations Project 4

Research Aim:

Investigate whether the number of database updates affects the availability of databases.

Research questions:

  1. How many databases have a high number of published updates (more than 5 updates), a medium number of updates (2 to 5 updates), or no updates?

Data Calculation: Number of databases by update frequency:

- No updates: 1,597 databases
- 2-5 updates: 672 databases
- More than 5 updates: 86 databases

Visualization Used: Pie Chart

Main Results: The majority of databases (approximately 66%) have no updates, while a smaller proportion (4%) have more than 5 updates. Databases with 2-5 updates account for 28% of the total.

Conclusion: Most databases have never been updated, and only a small minority are updated frequently. The distribution shows that limited updating is the norm for databases in this dataset.
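The percentages can be rechecked from the counts listed above. The sketch below assumes the three listed categories make up the whole dataset, which the page does not state; the names `update_counts` and `percentage_shares` are illustrative only. Under that assumption the shares come out at roughly 67.8%, 28.5%, and 3.7%, slightly different from the reported figures, which may mean the reported percentages use a somewhat larger total.

```python
# Counts copied from the breakdown above.
# Assumption: these three categories are the full dataset.
update_counts = {
    "No updates": 1597,
    "2-5 updates": 672,
    "More than 5 updates": 86,
}


def percentage_shares(counts):
    """Return each category's share of the summed counts, in percent."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


shares = percentage_shares(update_counts)
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```

The same function can be applied to any of the other breakdowns on this page by swapping in their counts.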

  2. How many databases published more than 10 years ago have recent updates (updates within the past 10 years)?

Data Calculation: Breakdown of older databases (published over 10 years ago) by update status:

- No updates: 892 databases
- At least one recent update (within the past 10 years): 286 databases
- At least one old update (older than 10 years): 342 databases

Visualization Used: Pie Chart

Main Results: More than half of the older databases (approximately 57%) have no updates at all. Only 18% have recent updates, indicating continued relevance, while 22% have only updates that are more than 10 years old.

Conclusion: Many older databases are not actively maintained, suggesting a potential decline in relevance or usage over time. However, a small portion of older databases continue to receive updates, indicating that they still hold value for current users.

  3. What is the proportion of available and unavailable databases among those with a high number of updates, a medium number of updates, or no updates?

Data Calculation: Availability of databases in relation to update frequency:

- No updates: 716 available, 880 discontinued
- 2-5 updates: 395 available, 276 discontinued
- More than 5 updates: 64 available, 21 discontinued

Visualization Used: Bar Chart

Main Results:

- Databases with no updates have the highest discontinuation rate (880 of 1,596, about 55%).
- Databases with 2-5 updates are more balanced, with a majority still available (395 available vs. 276 discontinued, about 41% discontinued).
- Databases with more than 5 updates have the lowest discontinuation rate, with most still available (64 available vs. 21 discontinued, about 25% discontinued).

Conclusion: Databases that are updated more often are more likely to remain available. This suggests that ongoing updates contribute positively to the longevity and continued availability of databases.
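The trend can be made explicit by turning the available/discontinued counts into a discontinuation rate per update category. This is a minimal sketch using only the counts given for this research question; the names `availability` and `discontinuation_rate` are illustrative.

```python
# (available, discontinued) counts per update category, copied from above.
availability = {
    "No updates": (716, 880),
    "2-5 updates": (395, 276),
    "More than 5 updates": (64, 21),
}


def discontinuation_rate(available, discontinued):
    """Fraction of databases in a category that are discontinued."""
    return discontinued / (available + discontinued)


rates = {
    label: discontinuation_rate(available, discontinued)
    for label, (available, discontinued) in availability.items()
}
for label, rate in rates.items():
    print(f"{label}: {rate:.1%} discontinued")
```

The rate falls from about 55% for databases with no updates to about 41% for 2-5 updates and about 25% for more than 5 updates.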

Overall Conclusion:

The analysis suggests that the number of updates a database receives is positively associated with its availability. Databases with frequent updates are more likely to remain available, whereas those with no updates are more likely to be discontinued. This points to the importance of regular updates for maintaining the relevance and accessibility of databases over time.
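One way to check whether the association between update frequency and availability could be due to chance is a chi-square test of independence on the counts from research question 3. The original analysis does not report such a test; the sketch below is an optional check, and the function name and the 0.05 significance level are assumptions made for illustration.

```python
# Rows: no updates, 2-5 updates, more than 5 updates.
# Columns: available, discontinued (counts from research question 3).
observed = [
    [716, 880],
    [395, 276],
    [64, 21],
]


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if update category and availability were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


chi2 = chi_square_statistic(observed)

# Critical value for 2 degrees of freedom, (3 - 1) * (2 - 1), at alpha = 0.05.
CRITICAL_VALUE_DF2_ALPHA_05 = 5.991
print(f"chi-square = {chi2:.1f}")
print("association unlikely to be chance:", chi2 > CRITICAL_VALUE_DF2_ALPHA_05)
```

A statistic well above the critical value indicates that availability and update category are not independent in this dataset. It does not show that updates cause a database to stay available.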