Database Primary Keys, Practices and Data Assumptions - SeanGrady/life_tracker GitHub Wiki

Currently there are several types of data which are sourced from external CSVs and therefore require careful ETL to prevent gaps and duplicates, for example the various exports from nutrition/exercise apps like Cronometer, LoseIt and MyFitnessPal. Wherever possible, when creating tables for these data types, we are trying to define composite primary keys that ensure the data can't have duplicates. This makes it much easier to work with the csv exports, as in a pinch you can simply export all the data for a given user from the external service and load it into the database at once. Any gaps will be filled, and any data that would be duplicated will instead not be added since a record with that primary key will already exist.

For some of these data types, it makes sense to restrict users to only one data point per day (for example, weight or bodyfat percentage; if this changes meaningfully in the course of one day, you should go see a doctor). For these data types, since there's often little in the way of unique information, I'm using the date as part of the composite primary key. So for a weigh-in, for example, the key is the app user's ID and the date. This is fine for the MVP, but eventually we will want to be careful with assumptions here. For example, one weigh-in per day seems reasonable, but what if you'd like to update your weight more than once per day and keep only the most recent? Or the maximum or minimum?

There might be use cases where multiple weights a day are useful, like in the case of a marathon runner that wants to make sure they're staying well hydrated. In that case, it may be best to allow duplicates of these kinds of data and give the user control over which if any are kept. Care will need to be taken to make sure changes users make in external apps update properly rather than adding more duplicates to the mess.