considerations - rmoehn/theatralia GitHub Wiki

Considerations

There is a scenario in which the user sits in an archive, a library or at a conference and wants to quickly add materials to her collection.
- Do people always have internet access in archives, libraries and at conferences? Do we need an offline part of the application? (UO: Well, yes. I assume that offline locations are becoming rarer, so an offline part would count as a luxury, a problem for later on.)
- If the emphasis is on speed, would it be useful to provide functionality where you first add lots of materials and add the metadata later? (UO: Yes, both should be possible; quickly adding unstructured data for later rework is an important option.)
- Would it be useful the other way around, i.e. adding lots of metadata first and the materials themselves later? I could imagine a situation where you scan some things, the scanner saves them on a pen drive, and later you want to add those files in a batch and associate them with the metadata you entered during scanning. (UO: Yes, correct. A similar case: you enter some metadata about the information you need and ask an assistant to do a web search and link the materials to it.)
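A minimal sketch of the data side of the two workflows above, assuming that an item may exist with only its material or only its metadata filled in. The `Item` class, the `scan_id` field and `associate_batch()` are hypothetical names, not part of Theatralia, and matching scanned files to metadata records by file name is just one assumed convention for the pen-drive case.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Optional


@dataclass
class Item:
    """Hypothetical collection entry: either half may be filled in first."""
    scan_id: Optional[str] = None   # assumed key noted down while scanning
    material: Optional[str] = None  # path to the file, e.g. on the pen drive
    metadata: dict = field(default_factory=dict)

    def is_complete(self) -> bool:
        return self.material is not None and bool(self.metadata)


def associate_batch(items: list[Item], files: list[str]) -> list[str]:
    """Attach files to metadata-only items whose scan_id equals the file stem.

    Returns the files that matched no record, so they can still be added
    as material-only items and get their metadata later.
    """
    waiting = {it.scan_id: it for it in items if it.scan_id and it.material is None}
    unmatched = []
    for f in files:
        item = waiting.pop(PurePath(f).stem, None)
        if item is None:
            unmatched.append(f)
        else:
            item.material = f
    return unmatched


# Illustrative session: two records typed in while scanning, files copied later.
items = [Item(scan_id="scan_001", metadata={"title": "Playbill"}),
         Item(scan_id="scan_002", metadata={"title": "Letter"})]
left = associate_batch(items, ["/media/usb/scan_001.jpg", "/media/usb/scan_003.jpg"])
items += [Item(material=f) for f in left]  # unmatched files: material first, metadata later
```

Keeping both halves optional on one record type is what lets the "materials first" and "metadata first" workflows share a single code path; `is_complete()` then marks what still needs rework.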
The development page says that we should aim for a high degree of distribution from the start. It sounds as if the normal case would be to have only one or a few users per node, with the nodes somehow connected to each other. I wonder what benefits we would gain from such an architecture. They would have to be great enough to outweigh the resulting difficulties and complexities:
- Users are much less inclined to use a system if they have to set it up on their own than if they just have to register somewhere on the internet (cf. Diaspora, Friendica).
- Users would hardly ever set up the system on their private computers: such a computer is offline most of the time, does no backups and sits behind firewalls.
- A multi-node system makes development (including testing) more complex.
  - Communication and synchronisation between nodes.
  - Searching for items in distributed databases!
  - We would need those discipline registries.
- The requirements say that there should be an incentive for libraries, archives and publishers to publish their information in the system. How do they know which node to use? Should they set up their own? But they are not researchers; they are a completely different user group.
- The requirements say that the system should give an impression of the state and direction of research in a discipline area. If you want to get that impression, which node do you go to? Where do you start looking?

(**UO: There seems to be a misunderstanding here, caused by the use of the word "node". The idea is the following:

One instance of the system (e.g. hildesheim/theatralia.de) generally serves lots of users. It can develop into a widely known address, for example for theatre research, and users can simply sign up on the server to add their profile.
Internally, though, the users are not managed by overarching processes and a shared database, but are completely separated from each other (in both data and processes). So, for example, looking up public entities would not mean querying a shared database for items marked public, but sending messages to other users to get the data. For these per-user instances I used the word "node".
The main thinking behind this idea was that, in the end, it would make no big difference whether the users are all on one server or on different instances of the system: communication inside "theatralia.de" would be similar to a possible later communication between "theatralia.de" and "19thcentury.com".**)
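The per-user "node" idea can be sketched as follows, assuming each node keeps its data private and answers only to messages, and that lookups of public entities go out as messages to other nodes instead of a query against a shared database. `Node`, `Transport`, `LocalTransport`, `search_public()` and the `"find_public"` message type are hypothetical names chosen for illustration; the addresses are made up.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Entity:
    title: str
    public: bool = False


class Transport(Protocol):
    """Delivers a message to the node at an address (hypothetical interface)."""
    def send(self, address: str, message: dict) -> list: ...


@dataclass
class Node:
    """One user's node: its data is reachable only through messages."""
    address: str
    _entities: list = field(default_factory=list)

    def add(self, entity: Entity) -> None:
        self._entities.append(entity)

    def handle(self, message: dict) -> list:
        # The node itself decides what to reveal: only public entities match.
        if message.get("type") == "find_public":
            term = message["term"].lower()
            return [e.title for e in self._entities
                    if e.public and term in e.title.lower()]
        return []


class LocalTransport:
    """In-process delivery for nodes that happen to live on the same server."""
    def __init__(self) -> None:
        self._nodes: dict = {}

    def register(self, node: Node) -> None:
        self._nodes[node.address] = node

    def send(self, address: str, message: dict) -> list:
        return self._nodes[address].handle(message)


def search_public(transport: Transport, peers: list, term: str) -> dict:
    """Ask every known peer for matching public entities; no shared database."""
    return {addr: transport.send(addr, {"type": "find_public", "term": term})
            for addr in peers}


# Illustrative session: two users on the same (hypothetical) server.
net = LocalTransport()
alice, bob = Node("alice@theatralia.de"), Node("bob@theatralia.de")
alice.add(Entity("Playbill scan", public=True))
alice.add(Entity("Playbill notes, draft"))  # private: never returned
bob.add(Entity("Playbill collection", public=True))
for n in (alice, bob):
    net.register(n)
results = search_public(net, ["alice@theatralia.de", "bob@theatralia.de"], "playbill")
```

The search code depends only on `send()`, so a transport that carried the same messages between servers (for instance between "theatralia.de" and "19thcentury.com") could be swapped in without changing the nodes, which is the equivalence the design aims for. That remote transport is not shown here.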

Requirements say that the system should be neither a complete database nor a publishing platform. They also say that there should be an incentive for libraries, archives and publishers to publish their information in the system. How do those two fit together?

(UO: Well, big question; some answers: a) If some academic community actually uses it widely, publishers might be interested in having their publications found. Without too much work, they could enter new bibliographic data into the system, so if there is a new dissertation on some subject, it will be linked there. Publishers would probably rather ask the authors to do so, but that amounts to the same thing. b) Libraries will not use the system for managing or publishing data, since they have their own systems, but cooperation should be possible that links the system to library contents and catalogues and allows users to put metadata onto those library entities. c) The same goes for archives, but most archives are too small and underfinanced to have good data systems of their own (some still print digital articles on paper and file them in folders to be safe) ... here, cooperation might become conceivable later on. But again, I think it would focus on interfacing and adding metadata, with the "real" data entities residing in statically referenceable databases that do not offer much more than safe storage.)
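The split described in b) and c), where the "real" entity stays in an external catalogue and the system only stores metadata pointing at it, might look like the following minimal sketch. It assumes every external record can be addressed by a catalogue name plus a stable record identifier; `ExternalRef`, `Annotation`, `annotations_for()` and the sample identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExternalRef:
    """Stable pointer to a record kept in a library or archive catalogue."""
    catalogue: str  # assumed catalogue name, e.g. "library-opac"
    record_id: str  # the catalogue's own, unchanging identifier


@dataclass
class Annotation:
    """Metadata a user attaches to an external entity without copying it."""
    target: ExternalRef
    tags: set = field(default_factory=set)
    note: str = ""


def annotations_for(ref: ExternalRef, annotations: list) -> list:
    """All annotations that point at the same external record."""
    return [a for a in annotations if a.target == ref]


# Illustrative session with made-up identifiers.
ref = ExternalRef("library-opac", "12345")
anns = [Annotation(ref, {"staging"}),
        Annotation(ExternalRef("archive", "A-7"), {"letters"}),
        Annotation(ref, note="see act 2")]
found = annotations_for(ref, anns)
```

Because `ExternalRef` is frozen, it compares by value and is hashable, so annotations from different users on the same catalogue record can be grouped without the system ever holding the record itself.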