Moderation queue for objects #67

nigelbabu · 2014-07-04T15:11:52Z

Some parts of CKAN will need to be open to the community like related items or perhaps groups. This will need moderation. A flexible moderation that will let admins to look through an approve/reject things will be helpful.

rossjones · 2014-09-23T05:33:26Z

We've a use-case for doing this with packages too. Which is unfortunate as it'd need something very like revisions to allow reviewers to revert changes, unless the moderation is only on creation.

This has been attempted before with moderated edits which was unsuccessful. I think taking the simplest approach possible should be the way forward, perhaps adding a single new state (pending?) and deciding what the rules are for something going to 'pending' instead of 'active'.

wardi · 2014-10-01T13:09:59Z

@rossjones that works for initial posts, but edits would be posted immediately or disallowed completely. I despair that clients will demand arbitrarily complicated workflows that include multi-step processes for creating, editing and deleting like they do for dataset changes.

rossjones · 2014-10-01T13:26:39Z

In my recent experience, they do ask for arbitrarily complex workflows, but at present we're managing it through a combination of status and visibility, with email notifications on deletion/creation.

But yes, it might get worse gulps

davidread · 2014-10-01T13:35:19Z

At the model level, 'moderated edits' worked exactly how you want it. It's just at the UI level we never nailed it. But that was trying to do a really complicated workflow. You could do a simple UI that allowed a package to have one draft (rather than multiple) then it could work fine.

And you can add logic and notifications to do all those complexities you mention.

The feeling has been to get rid of it because no-one's looked at it much and we don't use it currently, but if this is functionality is what's wanted then maybe it is the best method and we should keep it.

The reason that we designed CKAN to have a package and package_revision table closely bonded is that it is easy to change between different revisions - i.e. unlimited undo functionality, and having 'draft' versions all ready to go. You can't do that so easily when you store a revision/change as a blob, as is suggested in #41 because when the database is migrated, the blob doesn't get migrated. So if we do #41 then we lose the 'draft state' functionality.

wardi · 2014-10-01T13:42:44Z

@davidread I see what you mean. The #41 blob approach is based on the data that's available from the logic layer, which would break if we break API compatibility. API users would be unhappy if that started happening, though.

Since we're not actually using the revisions for undo/redo how do know that it actually works (or would continue to work after a migration)?

The other problem that #41 tries to address is our silent overwriting of updates that happen close to the same time and the inability to have "suggested updates" from people that don't have permission to edit the one draft version.

wardi · 2014-10-01T13:49:40Z

And for even more fun, we also have requests for multiple stages of approvals, where each stage gets handed to a different group, and scheduled updates (publish this change at this time in the future)

davidread · 2014-10-02T09:15:43Z

The schema of a package you push/pull over the API certainly changes field by field all the time for us. Sure it is a bit of disturbance for API users, but that is the price of innovation.

Compare this to the changes to the package table - when columns are added/removed or change type then the database migration has to also change the package_revision table. CKAN just won't work unless they are the same - the copy-on-write will fail for every db write. So you're guaranteed that these 'draft' or 'unapproved' versions of a package get migrated at the same time.

Regarding features like multiple stages of approvals, multiple "suggested updates" for a package, or scheduled updates, I think these fit the existing model well. You just write these into the logic layer as you'd expect, perhaps with its own table to keep track. When the logic decides that a different 'revision' of a package should now be the 'current' one you just move the 'current' flag in the package_revision table. I think the model matches your use case really well.

wardi · 2014-10-02T14:04:41Z

Yes, I can see how to implement most of what I need on top of revisions except for anything that branches.

For example I couldn't implement a suggested change feature allowing things like "here's a spelling correction to the notes" because it would become out of date as soon as another change like "add this new resource" or "time to publish the last update" happens, and it would have to be manually resubmitted.

Another issue for us is package_dictize and its queries against the revision tables takes something approaching 1s to run. Pulling complete datasets as JSON blobs would be much easier on the DB.

The revisions tables are also growing continuously. On an internal site we need to periodically throw out the whole DB and reload as a way to manage this.

davidread · 2014-10-02T15:27:10Z

For patching spelling corrections, there's no reason you couldn't store which package_revision it is based on, and calculate the patch and apply that to another. I guess the alternative you're suggesting is a custom 'patch' blob, which I agree is a little less faff, but still prone to the db migration issue. So once again it's a weigh-up.

Complete datasets as JSON blobs - now you're talking. That's what your SOLR cache gave you and is nice and fast - how come you are finding it takes 1s? BTW on JSON storage - have you been warming to Mongo?! #82

Revisions being 'copy on write' means they are a pretty wasteful history, I agree. Storing JSON diffs has got to be a better way (or Mongo...). Certainly for us though we want to retain the accountability to know which user did what change.

wardi · 2014-10-02T15:43:59Z

Yes, a new column on the revision linking to a parent revision could represent a spelling correction or anything you could determine from a json diff.

A JSON patch can do more than that, though. With JSON patch two different people can say "add a resource" and the those patches can be applied in any order. You can say "I expect these values to be exactly this or my patch won't apply", which is useful for approving publishing of something without change, but requiring a new approval if specific fields you care about have changed.

The super slow package_dictize is partially due to lots of extras (each with their own revisions), tags in vocabularies (with their own revisions), the resulting mega-join in the DB that the planner has a hard time chewing on, and all the extra work package_dictize likes to do to pull in generated fields like organization details (we would need to do this in any solution). Creating all sorts of intermediate dictionaries and lists actually shows up pretty high in the profile too.

I'm not worried about storage of individuals bits of history. It makes sense to store the complete state of an object so you can figure out what the state was without re-running all the changes from the beginning of time. I just want a way to drop history older than some date, and having all that history in a single table seems easier than cleaning tens of _revision tables.

davidread · 2014-10-02T16:11:49Z

You can say "I expect these values to be exactly this or my patch won't apply"

I imagine you'd can achieve that in a similar way with revisions. In your logic you'll have to decide what 'context' is stored in the patch, whichever method you use.

Yes, package_dictize is slow. It doesn't seem right that the revision tables are consulted for the normal case - the 'current' package is simply in the package table. Hmmm seems simple enough to fix. Then the revisions wouldn't slow things up.

wardi · 2014-10-02T16:13:58Z

I thought so too, but the data in the non _revision tables is the latest update, not the current published version.

davidread · 2014-10-02T16:35:34Z

Ah. Well that is pointless. Probably due to no-one wanting to touch vdm. Ok, revisions suck.

wardi · 2014-12-03T17:56:05Z

comment and other moderation tasks could fit as a type of workflow plugin #108

rossjones mentioned this issue Dec 8, 2014

Dataset workflow plugins #108

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Moderation queue for objects #67

Moderation queue for objects #67

nigelbabu commented Jul 4, 2014

rossjones commented Sep 23, 2014

wardi commented Oct 1, 2014

rossjones commented Oct 1, 2014

davidread commented Oct 1, 2014

wardi commented Oct 1, 2014

wardi commented Oct 1, 2014

davidread commented Oct 2, 2014

wardi commented Oct 2, 2014

davidread commented Oct 2, 2014

wardi commented Oct 2, 2014

davidread commented Oct 2, 2014

wardi commented Oct 2, 2014

davidread commented Oct 2, 2014

wardi commented Dec 3, 2014

Moderation queue for objects #67

Moderation queue for objects #67

Comments

nigelbabu commented Jul 4, 2014

rossjones commented Sep 23, 2014

wardi commented Oct 1, 2014

rossjones commented Oct 1, 2014

davidread commented Oct 1, 2014

wardi commented Oct 1, 2014

wardi commented Oct 1, 2014

davidread commented Oct 2, 2014

wardi commented Oct 2, 2014

davidread commented Oct 2, 2014

wardi commented Oct 2, 2014

davidread commented Oct 2, 2014

wardi commented Oct 2, 2014

davidread commented Oct 2, 2014

wardi commented Dec 3, 2014