Spec: Related Items - ckan/ckan GitHub Wiki

Update - Discussion has been moved to the ckan/ideas-and-roadmap repository: https://github.com/ckan/ideas-and-roadmap/tree/master/specs/showcase

Summary of the current implementation

This section summarizes the current (CKAN 2.2, Feb 2014) implementation of related items, both from the user's point of view and in its technical details. The point is to get an idea of what the feature currently does and any problems with the current implementation.

Note that related items is not currently implemented as an extension! It's a core feature, and always on. Maybe it should be moved to an extension?

model

In the model for related items (see ckan/model/related.py), each related item has:

id
type (API, application, idea, news article, paper, post or visualization)
title
description
image_url
url
created date
owner_id (the user who created the item)
view_count
featured (true or false)

The way the UI and much of the code is implemented currently, each related item is related to exactly one dataset. You can't have an item related to multiple datasets. You can use the API to create items related to no datasets.
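To make the shape of a related item concrete, here is a minimal sketch of the fields listed above as a plain Python dataclass. It is not the actual class in ckan/model/related.py: the `RelatedItem` name, the `dataset_ids` field and the type labels (copied from the list above, not necessarily the values stored in the database) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Type labels as listed in this spec (assumed; stored values may differ).
RELATED_TYPES = (
    "API", "application", "idea", "news article",
    "paper", "post", "visualization",
)


@dataclass
class RelatedItem:
    id: str
    type: str
    title: str
    url: str
    owner_id: str  # the user who created the item
    description: str = ""
    image_url: Optional[str] = None
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    view_count: int = 0
    featured: bool = False
    # Hypothetical field: zero or one dataset ID with the current UI and API.
    dataset_ids: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject anything outside the fixed list of types.
        if self.type not in RELATED_TYPES:
            raise ValueError("unknown related item type: %r" % self.type)
```

With a fixed list like this, adding an "Other" type would be a one-line change to `RELATED_TYPES`.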

The type field seems a bit pointless: there's no actual difference between the different types of related items, they're all just objects with a title, description and link. What if someone wants to add a related item that doesn't come under those 7 types? On the other hand, you can filter related items by type and e.g. have a page showing just the apps, and having a list of types helps to suggest what sorts of things the feature might be used for. Maybe it just needs a final type "Other"?

AaronM: being able (with the API) to create a related item with no dataset seems a bit pointless/illogical. By definition a reuse would require the existence of something (a dataset) to be reused in the first place?

I agree that at present Type is not very useful. Other as a choice would be helpful, but the likes of "idea" I think is pretty woolly, and "application" is open to different interpretations. E.g. I presume you mean "I have developed an App that uses the data", but in my area of work it would more commonly be along the lines of "I have applied the information in this dataset to help me solve my (real world) problem". On the other hand, as a sysadmin being able to categorise/filter reported reuses would be a good thing. I think it just needs some thought on what you call this feature (see also comment below), and what type choices are presented...

`/related` page (the "related dashboard")

This page is not yet linked to from anywhere in the default CKAN theme.

This page shows all of the related items on the site, paginated. You can filter the related items by type, and also show only featured related items, and you can sort the related items by created date or by view count.

The default sort order is just labelled "Default" and you don't know what the ordering actually is.

All you can do with the related items on this page is click on them, and you'll be taken to the item's URL (i.e. the external URL that the item links to). You can't create or edit or delete related items from this page, you can't see the datasets that the items are related to, and there's no way to get to the dataset pages.

From a user point of view I don't think this page is very well thought through yet. Both the URL and name of this page seem very weird to me: related to what? The datasets that the related items are related to can't be seen or reached from this page. (Also, not every related item is related to a dataset anyway. Via the API you can create related items with no datasets; all of the related items on publicdata.eu are this way, for example.)

I think some clarification is needed about what the purpose of this page is / what the use-cases for it are and then we can consider how it should be re-designed.

The page title is "Apps & Ideas" but this is different from the page URL (/related). The feature description in the page sidebar also just talks about apps and ideas:

What are applications?

These are applications built with the datasets as well as ideas for things that could be done with them.

(Btw, "ideas for what could be done with them", does them refer to the datasets? Or the applications?)

But when you create a related item, there are 7 possible types: API, app, idea, news article, paper, post, visualization, so it's not just apps and ideas.

Also when related items are shown on dataset pages, the name "Apps & Ideas" is not used (see below).

Viewing related items

You can view a list of a dataset's related items on the dataset's page, or a list of all related items on the related dashboard page. Clicking on a related item just redirects you to the item's external URL (e.g. the blog post or whatever) though. Related items don't appear to have their own pages.

On the dataset page, the URL ends in /related and the tab is just titled "Related", the button is "Add Related Item", etc. (There's no use of the "Apps & Ideas" title from the dashboard page.) When adding or updating a related item, a third name "related media" appears:

What are related items?

Related Media is any app, article, visualisation or idea related to this dataset.

For example, it could be a custom visualisation, pictograph or bar chart, an app using all or part of the data or even a news story that references this dataset.

This is quite different from the description on the related dashboard page.

If we want to support items that are related to more than one dataset, then related items will probably need to get their own pages, because somewhere we need to show a list of all the datasets that the item is related to.

You can also view individual related items using the API with the related_show action.
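As a rough illustration of the API route, the sketch below builds a `related_show` request URL and unwraps the response envelope. It assumes the standard `/api/3/action/` route and the usual `success`/`result` envelope of the action API; the helper names and the example site URL are made up for this sketch, and no network request is made.

```python
import json
from urllib.parse import urlencode


def related_show_url(site_url, related_id):
    """Build the action API URL for showing one related item."""
    query = urlencode({"id": related_id})
    return "{}/api/3/action/related_show?{}".format(
        site_url.rstrip("/"), query)


def unwrap_action_response(body):
    """Return the 'result' of an action API response, or raise on failure."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError("API call failed: %r" % (payload.get("error"),))
    return payload["result"]
```

The URL can be fetched with any HTTP client and the response body passed to `unwrap_action_response()`.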

Creating related items

It is possible to create a related item that isn't related to any dataset, using the related_create API. In the web interface though, the only way to create a related item is by going to a dataset's related items tab, and then the item will be related to that one dataset.

After creating the user is redirected to the dataset's list of related items.

View counting

Related items have their own view counting feature. Whenever someone clicks on a related item to follow its link, it increments the count (implemented in the controller class).

It looks like there's no limiting (e.g. of the same user repeatedly clicking), and viewing an item over the API doesn't count. I'm not sure whether clicking on an item on the related dashboard page counts, or clicking on custom pages that use the API to show related items (e.g. publicdata.eu's front page).
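One possible shape for limited view counting, as a sketch only: count at most one view per visitor and item within a time window. Nothing like this exists in the current implementation; the `ViewCounter` class, the in-memory storage and the one-hour default are all assumptions for illustration.

```python
import time


class ViewCounter:
    """Count clicks, ignoring repeats by the same visitor within a window."""

    def __init__(self, window_seconds=3600, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.counts = {}            # item_id -> view count
        self._last_seen = {}        # (item_id, visitor_key) -> last counted

    def record_click(self, item_id, visitor_key):
        """Return True if the click was counted, False if it was a repeat."""
        now = self.clock()
        key = (item_id, visitor_key)
        last = self._last_seen.get(key)
        if last is not None and now - last < self.window:
            return False
        self._last_seen[key] = now
        self.counts[item_id] = self.counts.get(item_id, 0) + 1
        return True
```

The `visitor_key` could be a user ID or an IP address. Which one to use is exactly the kind of open question the builtin page view tracking would also need to answer.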

This view counting could maybe be integrated with CKAN's builtin page view tracking feature, although that feature also has problems.

Activity streams

Activity streams are created when you create, update or delete a related item. These could do with a little quality control though, e.g. I think I saw "api" spelled in lower-case. (API is also mis-spelled as "Api" in related item tooltips.) Also, what activity streams should these activities appear in? It looks like currently they appear in the user's activity stream, but not the dataset's? They probably don't appear in the group or organization's stream either.

Featured related items

A sysadmin can mark certain related items as "featured". (Not sure whether this is doable in the web UI, or only the API.) The only place this is currently used is on the related dashboard page, where there's a checkbox to show only the featured items.

In extensions you could quite easily do things like show the featured items on the site front page, or have one page showing the featured apps and another page showing featured visualizations, etc. An example extension showing how this can be done might be a good idea.

Authorization

The way the auth functions are currently implemented is:

Anyone can see any related item or list of related items. (So private datasets with private related items isn't supported.)
Anyone who is logged-in can add a related item to any dataset. (It doesn't seem to matter whether they have permission to edit the dataset, although I didn't test it with organizations.)
Only a sysadmin or the "owner" of a related item (the person who created the item) can edit a related item.
The owner of a related item can't change, it's always the person who created it. (Maybe it can be changed via the API?)
Anyone who has permission to delete a dataset can delete an individual related item from the dataset (and this will completely remove the item from the site, not just remove it from the dataset). The creator of a related item can also delete it.
Only sysadmins can create featured related items or mark a related item as featured.
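The rules above can be written down as a handful of predicates. This is a sketch of the observed behaviour, not CKAN's actual auth functions: users are plain dicts with hypothetical `id` and `sysadmin` keys, `None` stands for an anonymous visitor, and `may_delete_dataset` is supplied by the caller.

```python
def is_sysadmin(user):
    return bool(user and user.get("sysadmin"))


def can_view(user, item):
    # Anyone can see any related item, even for private datasets.
    return True


def can_create(user):
    # Any logged-in user can add a related item to any dataset.
    return user is not None


def can_update(user, item):
    # Only a sysadmin or the item's creator can edit it.
    if user is None:
        return False
    return is_sysadmin(user) or user["id"] == item["owner_id"]


def can_delete(user, item, may_delete_dataset=False):
    # Dataset deleters and the item's creator can delete the item.
    if user is None:
        return False
    return may_delete_dataset or user["id"] == item["owner_id"]


def can_set_featured(user):
    # Only sysadmins can create or mark featured items.
    return is_sysadmin(user)
```

Writing the rules this way makes the gap obvious: `can_view()` ignores both the user and the item, which is why private related items aren't supported.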

Technical details

Model

There's a separate table related_dataset that maps related items to datasets, so in theory the model supports a many-many relationship, but I think the rest of the implementation only allows a related item to be related to one dataset. (And allowing items to be related to multiple datasets would probably raise a lot of questions about authorization and user interface.)
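To show what the mapping table allows in principle, here is a small self-contained sketch using an in-memory SQLite database. The schema is a simplified assumption (the real tables have more columns), and the two query helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE related (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL
);
-- One row per (related item, dataset) pair: a many-to-many mapping.
CREATE TABLE related_dataset (
    related_id TEXT NOT NULL REFERENCES related(id),
    dataset_id TEXT NOT NULL,
    PRIMARY KEY (related_id, dataset_id)
);
""")
conn.execute("INSERT INTO related VALUES ('r1', 'Hospital ratings app')")
conn.executemany(
    "INSERT INTO related_dataset VALUES (?, ?)",
    [("r1", "d1"), ("r1", "d2")],
)


def datasets_for(conn, related_id):
    """Dataset IDs that a related item is linked to."""
    rows = conn.execute(
        "SELECT dataset_id FROM related_dataset "
        "WHERE related_id = ? ORDER BY dataset_id", (related_id,))
    return [row[0] for row in rows]


def related_for(conn, dataset_id):
    """Related item IDs linked to a dataset."""
    rows = conn.execute(
        "SELECT related_id FROM related_dataset "
        "WHERE dataset_id = ? ORDER BY related_id", (dataset_id,))
    return [row[0] for row in rows]
```

Here `r1` is linked to two datasets, which the table permits even though the UI never creates such rows.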

I think we should move more code into the model, e.g. methods for returning lists of related items filtered by dataset, type, featured, etc. These methods can then be unit-tested in the model, and the logic can just call the model methods instead of doing its own sqlalchemy.

Controller

dashboard()

The related dashboard page is implemented by the dashboard() method in the related controller. It calls the related_list() action function to actually get the related items to show. It looks like the pagination is done in the controller, this should be moved into the action function so that the API supports pagination and so that it can be tested more easily.
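A sketch of what filtering, sorting and pagination could look like if they all lived in one function that the action calls. The function name, parameter names and sort keys are hypothetical, and items are plain dicts rather than model objects.

```python
def related_list_page(items, page=1, limit=20, type_=None,
                      featured=False, sort=None):
    """Filter, sort and paginate related items (dicts).

    Returns the total number of matches plus one page of results, so
    that API clients and the dashboard get the same pagination.
    """
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive integers")

    selected = [
        item for item in items
        if (type_ is None or item["type"] == type_)
        and (not featured or item["featured"])
    ]

    if sort == "view_count_desc":
        selected.sort(key=lambda item: item["view_count"], reverse=True)
    elif sort == "created_desc":
        selected.sort(key=lambda item: item["created"], reverse=True)
    # Any other value keeps the input order (the unlabelled "Default").

    start = (page - 1) * limit
    return {"count": len(selected), "results": selected[start:start + limit]}
```

Because the function takes and returns plain data, it can be unit-tested without a running web application.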

read()

The related controller's read() method (which redirects the browser to the related item's external URL) accesses the model directly to get the related item. It should be going through the related_show() action function. It also does its own call to check_access() in the controller. Again, related_show() should be doing this, and the controller should just call related_show(). The view counting feature should not be implemented in the controller either.

_edit_or_new()

The related controller's _edit_or_new() method does its own check_access() call, that should be done by the related_create() action (there shouldn't be any auth stuff in controllers).
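The intended layering can be sketched without any CKAN imports. Everything below is a stand-in written for this example (it only borrows the names `check_access` and `related_create` and the `{'success': ...}` return convention of auth functions); the point is simply that the controller contains no auth logic of its own.

```python
class NotAuthorized(Exception):
    pass


def related_create_auth(context, data_dict):
    # Stand-in auth function: any logged-in user may create an item.
    return {"success": context.get("user") is not None}


def check_access(auth_function, context, data_dict):
    # Stand-in for the real check_access(): raise if auth fails.
    if not auth_function(context, data_dict)["success"]:
        raise NotAuthorized("not authorized to perform this action")


def related_create(context, data_dict):
    # The action, not the controller, enforces authorization.
    check_access(related_create_auth, context, data_dict)
    return {"title": data_dict["title"], "owner_id": context["user"]}


def new_related_controller(context, form_data):
    # The controller only calls the action and maps errors to a status.
    try:
        return 200, related_create(context, form_data)
    except NotAuthorized:
        return 401, None
```

With this split, API clients and the web UI go through exactly the same authorization path.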

There's some other weird stuff going on in the controller here too, like some unflattening and tuplizing and putting a package in c.pkg_dict that may not be used anywhere.

It shows a flash message after creating the item, which I don't think we normally do in CKAN, so maybe it shouldn't be there? (Updating and deleting related items also does this.)

The method docstring of the related controller's _edit_or_new() method looks more like a git commit message.

It adds "related" and "id" to the context at the end, not sure why.

Actions

related_list

The related_list() action function (returns a list of related items, optionally filtered by dataset, type and whether featured) is for some reason putting a package in c.pkg and c.pkg_dict, but I'm not sure if this is actually used (and it doesn't seem like something an action function should do).

related_list() can accept either a dataset ID (in a param named id, which would be clearer if it was called dataset_id) or a dataset dict (in a param called dataset). If the dataset param is not present, it falls back to the id param. This precedence isn't documented or tested. This seems unnecessary to me, much simpler to accept just the id.
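The precedence described above amounts to something like the following sketch. The helper name is hypothetical and the real action does more than this; it only illustrates the order in which the two params are consulted.

```python
def dataset_id_from_params(data_dict):
    """Resolve which dataset a related_list-style call refers to.

    A 'dataset' dict wins if present; otherwise fall back to 'id'.
    Returns None when neither is given, meaning "all related items".
    """
    dataset = data_dict.get("dataset")
    if dataset is not None:
        return dataset["id"]
    return data_dict.get("id")
```

For example, `dataset_id_from_params({"dataset": {"id": "d1"}, "id": "d2"})` resolves to `"d1"`, which is the kind of surprise that accepting only a single `dataset_id` param would avoid.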

If no dataset id or dict is given, it returns all related items, but I don't think this is documented either. Also it looks like all the sorting and filtering features are only applied when returning all related items, and not when returning a dataset's related items (again not mentioned in the docstring).

related_list() calls check_access('related_show'), i.e. it calls the related_show auth function, it should have its own related_list auth function.

It doesn't appear to do much validation of params, e.g. what happens if the given dataset dict or id is invalid?

It also makes a JSON dump of all of the package's resources for some reason?

related_create

After calling model_save.related_dict_save() the related_create() action seems to do its own thing to append the related item to the dataset's list of related items. Should this be done in model_save?

Schemas

The related_show() action uses the default_related_schema(), which is also used by related_create and related_update. I think each action should have its own schema: default_related_show_schema(), default_related_create_schema(), default_related_update_schema(). If they're all the same, then make a _default_related_schema() helper function and have them all call it.
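A minimal sketch of that arrangement, assuming schemas are dicts mapping field names to lists of validators. The `not_empty` function here is a simplified stand-in (real CKAN validators take different arguments), and the field list is taken from the model description earlier in this page.

```python
def not_empty(value):
    # Simplified stand-in validator for this sketch.
    if not value:
        raise ValueError("Missing value")
    return value


def _default_related_schema():
    # Shared base: a fresh dict on every call, so callers can modify
    # their own copy without affecting the other schemas.
    return {
        "id": [],
        "title": [not_empty],
        "description": [],
        "type": [not_empty],
        "image_url": [],
        "url": [],
        "owner_id": [],
        "created": [],
        "featured": [],
    }


def default_related_show_schema():
    return _default_related_schema()


def default_related_create_schema():
    return _default_related_schema()


def default_related_update_schema():
    return _default_related_schema()
```

If, say, the update schema later needs an extra validator on `id`, it can be added in `default_related_update_schema()` alone.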

In the case of related items, I'm not sure I see the point of validating the data coming out of the database anyway, since I don't think any conversions are done. With packages and resources for example, the schemas can be customized by IDatasetForm and the custom schemas may include converter functions like convert_from_tags() and convert_from_extras(), so that's why data coming out of the database in package_show() needs to be converted/validated. But in related_show(), this seems pointless?

It looks like user_dictize(), if passed the 'with_related' option in the context, will also dictize all of the user's related items. When showing a user profile page, the user controller does pass this option. The dictized items are never used though.

I think this (and several other instances of odd, apparently unused stuff being put into contexts and Pylons template contexts by various related items functions) shows why we should be avoiding things like the template context, and instead using template helper functions. These context variables were presumably once used by the templates, but the templates have since been changed and no longer use them, so they're wasting CPU cycles and cluttering up the code. If the old templates had been calling helper functions instead, those helper function calls would have been deleted along with the old template code.

Some of these template context variables may be used by the legacy templates (and the tests!) but not used in the new templates.

Tests

There are related items tests in tests/functional/test_related.py, tests/functional/api/test_activity.py and tests/logic/test_action.py. I haven't looked into these but it would probably be really good to write a complete set of new-style tests for the feature and delete all of these old ones.

New implementation

Aims

The "related items" feature (to be renamed, probably) is about re-uses of data, e.g. apps, visualizations, stories, etc. that use the data from a CKAN site.
seanh: It may not have been clear previously that "related items" were only meant for reuses of data. You might have posted, for example, a document about how the data was collected as a related item. In this new spec, we've decided to make the feature specifically about data reuses. We should probably rename the feature to "data reuses" or something to that effect.

We want to let:

Site maintainers promote data reuses on their CKAN sites
Data reusers relate their data reuses to the relevant datasets within the CKAN site
Site visitors search for and find valuable reuses
Site admins and dataset owners moderate reuses
Site admins and dataset owners showcase the best reuses

Note that site admins or organization admins may add their own data reuses to the site, so they may be playing the role of data reusers as well.

What do we call the feature?

seanh: I don't think related items is a very good name, it's too abstract and doesn't suggest what the feature is for or what it does. Related media is even worse. Apps & ideas doesn't seem right because related items can be more than just apps and ideas. Also I don't think an "idea" is a very good type for a related item, concrete types are better e.g. blog post, news article, application. If the feature is meant to be about showing reuses of data, could we call it something like Data reuses?

AaronM: now that it has been explained that related items is meant for only reuses of the data I would agree that related items is not a suitable name. For my organisation we are interested to know not only how other people reuse data we make public (i.e. use it for a different purpose than we originally did the research for), but also in how they use/apply/implement the results of the original research (without changing/reinterpreting/repurposing the data they take some action). Tracking direct use as well as reuse is valuable for us in validating for ourselves and our funding bodies that we are doing work that is useful - so it is a carrot for researchers to share data if they can capture/record this sort of feedback. So much of our work relies on us gathering and interpreting data, but other external people implementing the changes our data suggests.

Maybe heading "Data Uptake and Use/Re-use" or "Dataset Utilisation" or...

Brook: I like 'Show & Tell' at the Dataset/Org/Group level, with 'Showcase' as the top-level navigation item showing featured items.

Show & Tell is informal enough to encourage users to share all instances of data reuse, not just what they might consider to be their best efforts.
It doesn't make a value judgement on a particular reuse
It's a verb rather than noun and doesn't specify any particular application (it's not a specific type of reuse like 'Apps & Ideas')
We can distinguish the best S&T items with the 'featured' parameter. Featured items are promoted to the top-level site Showcase (currently /related).

What should the types of related item be?

They are currently API, application, idea, news article, paper, post or visualization. I'd suggest removing idea from that list as it's not concrete. Also rename paper to research paper, since I think that's what it's supposed to be for. Also add "other" as a type.

AaronM: definitely need other. Agree remove idea. Paper might be research paper/report, since a paper tends to refer to an externally peer-reviewed submission to a journal vs a report that may well be peer reviewed (or not) but is likely self-published by the organisation. Application I'd possibly change to "App development" (to make quite distinct from interpretation as application of the research findings). Post seems a bit vague as it stands, and in some respects you might consider its intent similar to a news article (just a different medium)?

Brook: Perhaps this opens up a can of worms, but rather than a predefined list of types, why not allow user-defined tagging? There could be a predefined list of tags as top-level suggestions, but allow users to define their own if need be. This would need design work.

User stories

Note: the format of a user story is:

As a ROLE, I want to DESIRE so that BENEFIT

seanh: I've removed several phrases like "on dataset page", "site wide & on dataset page", etc. from the user stories below because I think that's a concrete user interface decision that probably doesn't belong in the user stories. But let it be known that Ira would like data reuses to appear both on dataset pages and on a site-wide marketplace page!

Data reuser user stories

As a data reuser I want to show the cool things I've done with a site's data so that my reuses reach a wider audience.
As a reuser I want to be able to associate multiple datasets with each of my data reuses so that I can represent all of the datasets that each of my reuses uses.

AaronM: Would this case be met by the interested party clicking on the link (on the reuse page) to the reused 'thing', and on the reuser's own page they would have full details of what they did and where they got the data (which could include outside CKAN)?

As a reuser I want to be able to associate my data reuse with multiple datasets so that I can showcase my reuse in all the relevant contexts.

seanh: The last two user stories are very similar but I think there are two different cases here. The first one is: someone is looking at a data reuse, and they want to see what datasets it was made from. So going from the reuse to the datasets. The second one is: someone is looking at a dataset (or multiple datasets, e.g. a group, organization, tag, dataset search result...), and they want to see what reuses have been made from that/those dataset(s). So going from the dataset(s) to the data reuse(s). Together, these two user stories imply a many-many relationship between datasets and data reuses.

AaronM: note with regards to multiple datasets and its technical implementation, you need to consider that of the multiple datasets used they may or may not all be housed on a single CKAN instance, and may combine with data from disparate (non CKAN) sources - no doubt this puts some scope around just how all encompassing you can plan to be in terms of linking to/from every dataset related to the reuse.

Re the third use case above, could the reuser create a 'reuse' entry from the main dataset within the ckan repository that they used, and then if they used other datasets (in the same or other CKAN instance) they could create a 'reuse' and link it to the URL of the 'main' entry? That would be a manual workaround, but would enable you to keep it simple for a start.

I can see value in linking reuses to multiple datasets, but I would weigh that against any added complexity (both for frontend users and behind the scenes) before you make a decision (maybe it is a staged process: 1st rejig the basic model, then consider multiple dataset issues).

Site visitor user stories

As a visitor I want to see any re-uses that have been made with this/these dataset(s) since they may be more useful to me than just the data itself.

seanh: This could apply to the page for an individual dataset (i.e. a dataset's reuses should be shown somewhere on the dataset's page), but it could also apply to any pages where multiple datasets are listed, e.g. group and organization pages, dataset search results.

As a visitor, I want to be able to see what datasets this data reuse was made from, so that... ?
As a visitor I want to see what data re-uses exist so that I can get ideas for myself or because I'm looking for interesting re-uses.
As a visitor I want to be able to search through all the data re-uses so that I can see if what I need (e.g. an app about hospital ratings in London) already exists.

Data publisher user stories

As a data publisher I want to see what reuses have been made from my data so that I can see what value is being made of my data.
As an organization or group admin, I want to see all the reuses that have been made of the datasets in my organization or group so that I can see how valuable and interesting the data is to re-users and be motivated to open up more data in better quality!

seanh: Those last two are kind of the same user story, I think one of them was meant to be about showing data reuses on the dataset pages, the other about showing them on the group and organization pages.

As a data publisher (and/or organisation admin or sysadmin) I want to be able to filter the reuses made of datasets to particular time periods so I can use the information in reports to management or funding agencies to demonstrate the value of our research data to external users. [AaronM]
As an organization admin I want to be able to delete data reuses from my datasets if they are irrelevant or spam so that they don't pollute my dataset pages.
As an organization admin I want to be able to moderate new data reuses that are added to my datasets in a queue where I can approve or delete the items so that I can quality check any additions.

seanh: Presumably they will also need to moderate data reuse edits. What about deletions?

As a sysadmin I want a way to moderate related item additions centrally for all items being added to the portal.

seanh: ...so that? I don't have to visit every organization's page one-by-one? It seems like the data reuse moderation page for a given user should show a queue of all the data reuses that user can moderate. If the user is an admin of multiple organizations, they'd see reuses from each of those orgs. If they're a sysadmin, they'd see all reuses.

As a data publisher I want to see what reuses have been made from my data so that I might get ideas for other research I can do to add further value. [AaronM]
As a data publisher I want to see what reuses have been made from my data to make connections for future research collaboration (with the reuser). [AaronM]
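The moderation queue that seanh describes above (one queue per user, containing every reuse that user can moderate) could be computed roughly as follows. This is a sketch under assumed data shapes: users are dicts with hypothetical `sysadmin` and `admin_of` keys, and each pending reuse carries the `org_id` of the dataset it was added to.

```python
def moderation_queue(user, pending_reuses):
    """Return the pending reuses that this user is allowed to moderate."""
    if user.get("sysadmin"):
        # Sysadmins see every pending reuse on the site.
        return list(pending_reuses)
    # Organization admins see reuses from each organization they admin.
    admin_orgs = set(user.get("admin_of", ()))
    return [r for r in pending_reuses if r["org_id"] in admin_orgs]
```

A user who administers no organizations simply gets an empty queue.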

Specific changes from current state

seanh: I've tried to order these with the easiest changes (those that have fewer open questions) first and the more complicated ones (with lots of open questions that need to be answered before they can be implemented) last. Hopefully we can get to work on the first changes, and not let the whole thing be held back by specification problems with the later ones.

Refactoring and tests

Update: Since it sounds like we may be rewriting related items as an extension and using a group type to implement the related items, that would be a complete rewrite so there wouldn't be any point in refactoring or testing the existing implementation (which would be deleted). In that case ignore this section.

Various technical refactorings, tests, etc. (See the description of the current implementation above.)

This would be some time investment for no immediate gain (in terms of user-visible feature changes) but I think it's worth doing first because it'll make the following steps go smoother and the final result better.

Move the feature into a core extension

Speaks for itself. Note there has been talk of implementing related items as either a package type or a group type, so that would be a rewrite from scratch not just moving the existing code, see implementation details below.

Documentation

If the feature is moved into an extension, then some docs will need to be added to the maintainer's guide to explain how to enable it. This would also be the place to document any configuration settings (e.g. moderation on/off). This may be a very short doc!

Update: Actually there is already a small Apps & Ideas page in the docs so we should update that (and probably rename it).

Docs for the feature should also be added to the user guide as well.

Add related items to group and organization pages

I guess this means adding a tab to the group and org pages, listing all of the related items related to any of the group or org's datasets. I guess it would look much like the related items pages that datasets already have. (Design mockup needed?)

One thing that comes to mind is that a group or org may have a much larger number of related items than a dataset has, so their pages may need extra features like search and pagination that the dataset pages don't have.

This would mean adding new API actions to get all the related items of a group or organization.

Show which dataset a related item is related to

On the /related page there's no way to see what datasets the items are related to, or get to the datasets.

How can this be shown? Design mockups needed.

I wonder if individual related items should get their own pages (currently clicking on a related item just redirects you straight to the item's external URL), and the dataset can be shown there. Items having their own pages is also useful for some more of the features below. If so a design mockup for related item pages is needed.

Allow editing related items from anywhere

Wherever related items are listed (on the /related page, on group or org pages...) there should be an edit button for each item (that you are allowed to edit).

If related items are going to get their own individual pages, then the edit buttons could go there, which might be better than cluttering up pages that are showing lists of related items.

Hide related items that are related to private datasets

Don't show related items of private datasets on the /related page or on group or organization related items pages. Organization members should see the organization's private items on the organization's related items page, and on the related pages of the datasets. (I think this is consistent with the way private datasets work, iirc they never appear in search results on the /dataset page, but org members can see them on the org's page.)

Note that if we allow related items to be related to more than one dataset, then this feature becomes a little more complicated (see below).

Implement searching for related items

On the /related page. Also on group and org pages?

Design mockups needed to show exactly how the page will look, what filters and sorting options are wanted, etc.

Note that the /group page already implements simple searching for groups, so if related items are going to be reimplemented as a group type (see below) they would get this for free.

Allow multiple datasets to be related to a single related item.

This one is complicated. I think when you start to consider how the user interface will work and especially how authorization to create and edit related items and moderation will work, then there are a lot of open questions here.

The create/edit related item form will need to handle multiple related datasets
There need to be two ways to connect a related item to a dataset:
1. Go to the dataset's page and from there search for related items and add them to the dataset.
  
  For example you have to edit the dataset, and somewhere on the form there is a related items section where you can add one or more related items. But since the people who are allowed to add related items might not be allowed to edit the dataset itself, I think we need a separate "add some related items to this dataset" button and form.
2. Go to the related item's page, and from there search for datasets to add to the related item.
Note that if we're going to implement related items as a group type, then groups have this same problem (not yet solved) of wanting an interface to add groups to datasets and an interface to add datasets to groups. So we could solve the problem for both groups and related items at once.
There needs to be a site-wide add a related item to this site button, not just the add a related item to this dataset button on the dataset page.
When related items are displayed (or on the related item pages, if we add those) it will have to show the list of datasets the item is related to, not just one
What about items that are related to some public datasets and some private datasets? Will we allow this? If so, we'll have to hide the private datasets from the item's list of related datasets (unless the viewer is allowed to see those datasets). If the item isn't related to any dataset that the user is allowed to see, then we should hide the item from the user entirely.

seanh: I think this might be a bad idea, it's too complicated, it won't be clear to users creating and editing related items whether the items will be public or private or exactly who will be able to see them. With datasets it's simple: anyone can see a public dataset, only members of a dataset's organization can see a private dataset, and each dataset belongs to one organization. If related items can be related to different datasets from different organizations then I think it becomes too complicated from the user's point of view.
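To make the visibility rule under discussion concrete, here is a sketch of how an item related to a mix of public and private datasets could be filtered per viewer. The function name and data shapes are hypothetical, and `can_see_dataset` stands for whatever dataset-level authorization check is in use.

```python
def visible_view(item, can_see_dataset):
    """Return the item as a given viewer should see it, or None to hide it.

    Datasets the viewer may not see are dropped from the item's list.
    If none remain, the item is hidden entirely. Note that, read
    literally, this rule also hides items related to no datasets at
    all; that case would need its own decision.
    """
    visible = [d for d in item["dataset_ids"] if can_see_dataset(d)]
    if not visible:
        return None
    view = dict(item)  # copy, so the stored item is left untouched
    view["dataset_ids"] = visible
    return view
```

Even in this small form, the same item looks different to different viewers, which illustrates the complexity concern raised above.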
How do items related to multiple datasets interact with authorization to create and edit the items, and moderation of related item creations and edits?

We can't have dataset owners moderating the creation and editing of related items as such when a related item may be related to more than one dataset, because then either:
1. You have a situation where a dataset owner from organization A can approve a related item, and it also shows on datasets of organizations B and C even though they didn't approve it, or
2. You have a situation where every related organization has to approve an item before it can appear.
Instead we propose to let dataset owners moderate whether the related items appear on their dataset's pages.

Note that groups currently have this same problem (currently unsolved), so if related items are implemented as a group type then we can fix this for both related items and groups at once.

The suggested solution is:
- We need some authorization to decide who is allowed to create related items. For example this could be any logged-in user, or sysadmins only, etc. We probably want configuration options for this. How is this done with creating groups currently?

AaronM: If you want to gain the benefit for the data publisher of knowing who and how their datasets are being reused, then you need to have the ability to create a related item reasonably open. I think this means any logged-in user, perhaps with a recaptcha type requirement to stop spam robots posting lots of rubbish that needs moderating (maybe that could be a config setting so that you can leave it off, but if spam becomes an issue turn it on).

When a user (who is authorized to do so) creates a related item, it appears on the /related page right away, but it does not yet appear on the pages of any of the datasets it's related to.
Optionally we could have the related items be moderated before they appear on the site at all, but who moderates them? Sysadmins? A special group of moderator users?
The dataset owners for each related dataset (the organization editors and admins) get a notification that someone would like to add a related item to their dataset. They get to moderate this. If they approve it, then the item appears on their dataset's page (as well as on the /related page that it's already shown on).
- If they deny it, do they have a way to send a message to the creator saying why it was denied?
- Can the creator resubmit their item, maybe after editing it?
Note that the creator of a related item could relate it to multiple datasets, and some of the dataset owners could approve this while others deny it. The item would then appear on some of the datasets' pages and not on others.
This lets each dataset owner control what appears on their own dataset's pages.
Even if a dataset owner denies adding a related item, if someone goes to the related item's page (e.g. via the /related page, or via another dataset's page) they will still see the dataset in the item's list of related datasets. The dataset owner only gets to control what appears on their dataset's page.
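
As a rough sketch of the proposal above, approval can be tracked per (related item, dataset) pair rather than per item. The class and method names below are hypothetical and are not part of CKAN. The point is only that a denial affects the dataset's page and not the item's own list of related datasets.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


class RelatedItem:
    def __init__(self, title, dataset_names):
        self.title = title
        # One moderation status per related dataset; all start as pending.
        self.status = {name: Status.PENDING for name in dataset_names}

    def moderate(self, dataset_name, approve):
        """Record the decision of the owner of one related dataset."""
        self.status[dataset_name] = Status.APPROVED if approve else Status.DENIED

    def related_datasets(self):
        """Datasets listed on the item's own page, regardless of approval."""
        return sorted(self.status)

    def shown_on_dataset_page(self, dataset_name):
        """The item appears on a dataset's page only once that owner approves it."""
        return self.status.get(dataset_name) is Status.APPROVED
```

For example, an item related to two datasets can be approved by one owner and denied by the other. It then shows on the first dataset's page only, while `related_datasets()` still lists both.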

What about editing related items?

We need some authorization to decide who is allowed to edit a related item. Just the person who created the item and sysadmins?
When someone edits a related item, the edited version appears on the site right away. Related item edits are not moderated by dataset owners.

AaronM: if that is the case then the person who created the item should not be able to edit it as of right - they should need to request the sysadmin to open it to them. Otherwise it could be abused - a person creates an acceptable related item that the dataset owner approves, but then the creator comes back, edits it and puts in something unacceptable, and it appears as though the dataset owner has sanctioned the content.

Optionally we could have the edits be moderated before they appear on the site at all, but who moderates them? Sysadmins? A special group of moderator users?
If the edit involves adding new datasets to the related item, then it's the same as above: the datasets will appear in the related item, but the related item won't appear in the datasets until approved by the dataset owner.
What about edits that remove a dataset from a related item? Do the removals have to be approved by the dataset owners as well?

This leaves open one possible problem: someone creates a good related item, gets the dataset owner to approve it for their dataset page, then edits the related item and changes it to something unwanted. I think this situation is not likely to come up too often, and to mitigate it we suggest:

Dataset owners can remove related items from their datasets at any time (and add them back later as well).
Dataset owners get notified whenever someone edits a related item that's related to one of their datasets.
- They just get a notification, they don't get to moderate the edit.
- Do they get notified only for items that are showing on their dataset pages? Or for any items related to their datasets?

How do the notifications work?

Suggest reusing the existing activity streams, dashboard and email notifications system.
But separate it out, so that notifications that require an action from the user appear on a different tab on the dashboard, and in separate emails, from normal activity notifications.
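
A minimal sketch of that separation, assuming each notification carries a flag saying whether it needs a response. The `Notification` class and the tab names are made up for illustration and do not correspond to CKAN's activity stream code.

```python
from dataclasses import dataclass


@dataclass
class Notification:
    message: str
    requires_action: bool  # e.g. a request to approve a related item


def split_for_dashboard(notifications):
    """Group notifications into the two dashboard tabs (or the two kinds of email)."""
    return {
        "actions": [n for n in notifications if n.requires_action],
        "activity": [n for n in notifications if not n.requires_action],
    }
```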

Implement related item moderation

Where will the moderation interface go, and how will it be designed?

Note that if we want to allow new and edited related items to be moderated before they appear on the site at all, that is quite different from allowing dataset owners to moderate which items appear on their dataset's pages. The two kinds of moderation will need different designs and implementations.

Brook: with regard to moderation and approval, I would suggest a simplified approach that will hopefully negate the need for a moderation/approval system, but still allow some level of control for Dataset owners. Instead of thinking about Related Items as essentially belonging to one (or more) datasets (and hence to the owners of those datasets), we treat them as simply belonging to the user who created the Related Item.

Anyone can create a Related Item that associates with your dataset, but you as a dataset owner can choose to hide all Related Items from your dataset page (no 'Related' tab). So while you do control what is displayed with your dataset, you don't control another user's Related Item that happens to be associated with your dataset.

If necessary, we could provide a further control for dataset owners that disallows associating their dataset with any Related Item, but I'd prefer to side with Open. If the dataset is public, others should be allowed to create Related Items for it.

If a dataset owner particularly wants to highlight a Related Item, perhaps they can set a Dataset specific 'featured' flag that promotes the item within the context of their Dataset. Related Items could be promoted at a Dataset or site-wide context (perhaps also within Groups and Orgs).

If they (or anyone) believe a particular Related Item is abusive and requires moderation, perhaps they can flag it for sysadmins to deal with.
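
To make the simplified approach concrete, here is a small sketch of what a dataset page might display under it. The dictionary keys (`show_related`, `featured_ids`, `datasets`) are hypothetical and chosen only for this example. There is no approval step: the owner either shows the 'Related' tab or hides it, and may promote some items within the context of their dataset.

```python
def items_for_dataset_page(dataset, items):
    """Related items for one dataset page, with dataset-featured items listed first."""
    if not dataset["show_related"]:
        return []  # the owner has hidden the 'Related' tab
    related = [i for i in items if dataset["name"] in i["datasets"]]
    # sorted() is stable, so non-featured items keep their original order.
    return sorted(related, key=lambda i: i["id"] not in dataset["featured_ids"])
```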

Implementation details

We've talked about implementing related items as a group type:

There's already a many-many relationship between groups and datasets
Groups have uploading of logo files, search, IGroupForm, and other features that we want for related items as well
Easier to do from an extension
As noted above, some of the new problems that would have to be solved would be useful for groups as well