Rework tags - agrodet/tatoeba2 GitHub Wiki

Introductory notes - what are tags?

Tags are supposed to be metadata that give extra information about a sentence.
As such, tags should not be confused with lists.

REWORK HOW TAGS ARE ADDED / DELETED TO A SENTENCE

Add a "Tag sentence" icon to the sentence block menu. The "Add tag" feature would work like the "Add to list" feature. The equivalent of the "Your last selected list (if any) and last updated lists" suggestions would be split into two sections: "Your most recently applied tags" and "Most useful utility tags" (@-tags). The usefulness of the latter is open to discussion: would it be faster to scroll to "@change" and click on it, or simply to type "@ch" in the input field, get the suggested tag names, and click on "@change"? Regular users would not see the second section.
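As a rough illustration of the "Your most recently applied tags" section, here is a minimal sketch of a move-to-front list. The function name `record_applied_tag` and the cap of 5 entries are assumptions made for this example; they do not describe existing Tatoeba code.

```python
def record_applied_tag(recent: list[str], tag: str, max_size: int = 5) -> list[str]:
    """Return a new 'recently applied' list with `tag` moved to the front.

    `recent` is ordered from most to least recently applied.
    `max_size` is a hypothetical cap on how many suggestions are shown.
    """
    # Put the tag first and drop any older occurrence of it.
    updated = [tag] + [t for t in recent if t != tag]
    # Keep only the newest entries.
    return updated[:max_size]
```

Each time a user applies a tag, the stored list would be replaced by the returned one, so the suggestions always start with the tag used last.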

On the sentence page, we would no longer have an input field as we do now. Instead, we would have a "Tags" section similar to the "Lists" section. We would need a way to delete a tag from that section, just as we need one for lists (issue number?). Removing a tag from that section shouldn't reload the entire page (#1237).

Auto-suggestion needs to use loose matching (ignoring lowercase / uppercase differences, accents, etc.). (#302)
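One possible way to do this kind of loose matching is to normalize both the query and the tag names before comparing them. The sketch below uses Python's standard `unicodedata` module; the function names `normalize` and `suggest_tags`, the prefix-matching rule, and the limit of 10 suggestions are assumptions for illustration, not the current implementation.

```python
import unicodedata


def normalize(text: str) -> str:
    """Fold case and strip combining accents, e.g. 'Éléphant' -> 'elephant'."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return stripped.casefold()


def suggest_tags(query: str, tag_names: list[str], limit: int = 10) -> list[str]:
    """Return tag names whose normalized form starts with the normalized query."""
    key = normalize(query.strip())
    if not key:
        return []
    matches = [name for name in tag_names if normalize(name).startswith(key)]
    return matches[:limit]
```

With this approach, typing "ele" would suggest both "Éléphant" and "elegant", and typing "@CH" would suggest "@change". Stripping combining marks is a simplification: in some languages an accented letter is a distinct letter, so the exact rule would need to be discussed.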

However, on the "Browse by tags" page, the input field is needed for search. There, we need to add an indicator showing that the suggestion search is still running (#298).

Finally, one last problem is left to solve: how do we provide the option to tag translations on the fly? Simple use case: I search for sentences tagged "animal" in order to translate them. Obviously, I also want to tag my translations "animal" (if relevant).

MERGING OF TAGS (#961)

Provide corpus maintainers with a tool that lets them rename tags, merge them, etc.

  • Merging should be a once-in-a-while operation.
  • It should NOT be a one-person operation. This should be discussed among corpus maintainers.
  • The community should be notified of which tag(s) would be merged into which tag(s). Users could argue against or rebut the merging of some tags.
  • URL of the removed tag should somehow redirect to the URL of the tag it was merged into.
  • Once tags are translatable, there is a non-negligible risk that translations will prevent a merge operation. For example, Tag A and Tag B have similar meanings in English, but not in Finnish. How do we merge Tag A and Tag B? If tags were used strictly as metadata, this wouldn't be a problem. But because translations would be up to users, it will be.
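The merge-and-redirect idea from the list above could be sketched as follows. This is an in-memory illustration only: the `TagStore` class, its fields, and its methods are hypothetical names invented for the example, and a real tool would operate on the database and would run only after the discussion and notification steps described above.

```python
from dataclasses import dataclass, field


@dataclass
class TagStore:
    # Tag name -> ids of the sentences carrying that tag.
    sentences_by_tag: dict[str, set[int]] = field(default_factory=dict)
    # Removed tag name -> name of the tag it was merged into.
    redirects: dict[str, str] = field(default_factory=dict)

    def resolve(self, name: str) -> str:
        """Follow redirects until reaching a tag that has not been merged away."""
        seen = set()
        while name in self.redirects:
            if name in seen:  # defensive check; merge() never creates a cycle
                raise ValueError(f"redirect cycle at {name!r}")
            seen.add(name)
            name = self.redirects[name]
        return name

    def merge(self, source: str, target: str) -> None:
        """Move every sentence from `source` to `target` and leave a redirect."""
        source, target = self.resolve(source), self.resolve(target)
        if source == target:
            return  # already the same tag, nothing to do
        moved = self.sentences_by_tag.pop(source, set())
        self.sentences_by_tag.setdefault(target, set()).update(moved)
        self.redirects[source] = target
```

A request for the URL of a removed tag would first call `resolve()` on the tag name and then redirect to the page of the returned tag, so old links keep working even after several successive merges.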

TRANSLATION OF TAGS (#54)

  • Adding a tag shouldn't be reserved for English speakers.
  • If User A adds a tag in Finnish, that's fine. When the tag gets translated, it can be linked to the "group" of translations of the same object.
  • It may happen that translating Tag A from Finnish to English appears to produce a new tag, simply because the translation of Tag A differs from "Tag B", although Tag A and Tag B are actually the same. This could be mitigated by merging tags (or not, see above).
  • Tags should be displayed in the interface language.

In summary, using English as common ground and allowing a tag to be used only after it has been added in English is intrinsically wrong and unfair. The same goes for requesting a new tag.
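The "group of translations" idea could be modeled by treating a tag as a language-independent object that carries one name per language. The sketch below is only an assumption about how that might look: the `Tag` class, its fields, the `create_tag` helper, and the use of three-letter language codes such as "fin" and "eng" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    tag_id: int
    original_lang: str  # language in which the tag was first added
    names: dict[str, str] = field(default_factory=dict)  # language code -> name

    def display_name(self, interface_lang: str) -> str:
        """Return the name in the interface language, else the original name."""
        if interface_lang in self.names:
            return self.names[interface_lang]
        return self.names[self.original_lang]


def create_tag(tag_id: int, lang: str, name: str) -> Tag:
    """Create a tag in any language; no English name is required."""
    return Tag(tag_id, lang, {lang: name})
```

For example, a tag created as `create_tag(1, "fin", "eläin")` is usable immediately. Once someone adds `names["eng"] = "animal"`, English-interface users see "animal", while users of a language with no translation yet still see the original "eläin".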

CATEGORIZATION (#333)

TRANG asked the following questions:

  • What categories do we want exactly?
  • What would be the process for adding a new category? For deleting a category? For renaming a category?
  • How do we decide which tag belongs to which category?

In a little more detail:

  • Who adds a category? When?
  • How to notify users?

And arguably the most important: What IS a category and how do we use it?
It's easy to say that @-tags belong to the "utility" category and the "by xxx" tags to the "authored" one. Those are trivial cases.
But suppose a sentence is tagged "animal": what category does that tag belong to?
And now suppose there are a dozen sentences talking about snakes. It makes sense to tag them all "snake". Shouldn't they also be tagged "animal"? Shouldn't "animal" include "snake"?
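One way to picture the "animal includes snake" question is a parent/child relation between tags, where browsing a tag also returns the sentences of its sub-tags. This is a sketch of one possible answer, not a decision: the `children` mapping and both function names are hypothetical.

```python
def descendants(tag: str, children: dict[str, list[str]]) -> set[str]:
    """Return `tag` together with all of its direct and indirect sub-tags."""
    result = {tag}
    stack = [tag]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in result:  # also protects against accidental cycles
                result.add(child)
                stack.append(child)
    return result


def sentences_tagged(
    tag: str,
    children: dict[str, list[str]],
    sentences_by_tag: dict[str, set[int]],
) -> set[int]:
    """Collect the sentences tagged with `tag` or with any of its sub-tags."""
    ids: set[int] = set()
    for name in descendants(tag, children):
        ids |= sentences_by_tag.get(name, set())
    return ids
```

With `children = {"animal": ["snake"]}`, a sentence tagged only "snake" would show up when browsing "animal" without anyone having to add the broader tag by hand.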

Categorization could be used as a means of control to "Allow users to tag their own sentences without special permissions" (#1198).

MAINTENANCE OF TAGS

Problem: "Today, tagging is a free-for-all activity. Contributors are not consulting each other before creating new tags. We have many duplicate tags and many "personal" tags."

Simple possibility: Display tags ordered by "creation date" and allow corpus maintainers to merge / delete them (see above). This should not be a one-person decision. The creator of the tag should be notified, with an explanation of how tags are meant to be used.

MISCELLANEOUS

  • Correctly deal with sentences merged by Horus. (#1622)

  • Add an admin page to remove tags (#330). This overlaps with the previous section. Whether these functions should be limited to admins is open to debate. In any case, maintenance of tags should not be a one-person decision.