Conflict resolution with C3PO
This document describes the basic steps of the conflict resolution process built into C3PO. For details, please refer to our paper: Kulmukhametov, Artur, Andreas Rauber, and Christoph Becker. "Improving data quality in large-scale repositories through conflict resolution." International Journal on Digital Libraries (2021).
We start on the overview page. Here we see an identification conflict rate of 17%, which we want to reduce.
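To make the conflict rate concrete, here is a minimal sketch of one way such a figure can be computed: the share of objects for which the characterisation tools report more than one distinct value for a property. This is not the C3PO implementation. The nested-dictionary data model, the tool names `toolA` and `toolB`, and the functions `has_conflict` and `conflict_rate` are assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical data model: property -> {tool name: reported value}.
Props = Dict[str, Dict[str, str]]


def has_conflict(props: Props, properties: List[str]) -> bool:
    """True if, for any listed property, the tools report more than one distinct value."""
    return any(len(set(props.get(p, {}).values())) > 1 for p in properties)


def conflict_rate(records: Dict[str, Props], properties: List[str]) -> float:
    """Fraction of objects that have a conflict in at least one listed property."""
    if not records:
        return 0.0
    conflicted = sum(1 for props in records.values() if has_conflict(props, properties))
    return conflicted / len(records)


# Illustrative records only; the values are made up.
records = {
    "a.pdf": {"format": {"toolA": "PDF", "toolB": "PDF"}},
    "b.doc": {"format": {"toolA": "MS Word", "toolB": "OLE2 Compound"}},
    "c.txt": {"format": {"toolA": "Plain text"}},
    "d.jpg": {"format": {"toolA": "JPEG", "toolB": "JPEG"}},
}

rate = conflict_rate(records, ["format"])
print(f"{rate:.0%}")  # one of the four objects has conflicting 'format' values
```

An object with a single reported value, such as `c.txt` here, is counted as conflict-free in this sketch.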
Firstly, we navigate to the "Conflicts" tab and select the properties of interest: 'format', 'format version' and 'mimetype'. Then we click the button to generate the table.
The conflict overview table lists the conflicts that occur in the characterisation results for the selected properties.
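As a rough illustration of what such a table holds, the sketch below groups objects by the exact combination of disagreeing values and counts how many objects share each combination, largest group first. It is an assumption about the general idea, not the C3PO code: the data model, the tool names `toolA` and `toolB`, and the functions `conflict_signature` and `conflict_table` are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical data model: property -> {tool name: reported value}.
Props = Dict[str, Dict[str, str]]


def conflict_signature(props: Props, properties: List[str]) -> Optional[Tuple]:
    """Describe the disagreeing values as a hashable tuple, or None if there is no conflict."""
    signature = []
    for p in properties:
        values = props.get(p, {})
        if len(set(values.values())) > 1:
            signature.append((p, tuple(sorted(values.items()))))
    return tuple(signature) or None


def conflict_table(records: Dict[str, Props], properties: List[str]) -> List[Tuple[Tuple, int]]:
    """Count objects per conflict signature, most frequent first."""
    counts: Counter = Counter()
    for props in records.values():
        signature = conflict_signature(props, properties)
        if signature is not None:
            counts[signature] += 1
    return counts.most_common()


# Illustrative records only; the values are made up.
word = {
    "format": {"toolA": "MS Word", "toolB": "OLE2 Compound"},
    "mimetype": {"toolA": "application/msword", "toolB": "application/msword"},
}
records = {
    "b.doc": word,
    "e.doc": word,
    "f.xml": {"format": {"toolA": "XML", "toolB": "Plain text"}},
    "a.pdf": {"format": {"toolA": "PDF", "toolB": "PDF"}},
}

table = conflict_table(records, ["format", "mimetype"])
for signature, count in table:
    print(count, signature)
```

Sorting by count puts the conflict that affects the most records in the first row, which is the row to start with.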
Secondly, we select the first conflict in the table, since it affects the largest number of records. We navigate to the subset using the "Overview" hyperlink. Now we can analyze the subset, try out sampling, find the root cause of the conflict and work out how to fix it.
Thirdly, we navigate to "Objects" and open one object of the subset to create a conflict resolution rule. Here, we check the boxes in the "Rule Trigger" column for those characterisation results that will act as the rule trigger. Then we check the boxes of the values that we want to remove. Removing these values resolves the conflict. At the bottom of the page we give our rule a name and save it (or execute it immediately).
Finally, we navigate to the "Conflicts" tab again. We find our conflict resolution rule in the table. We select the rule and execute the conflict resolution by clicking the button. A pop-up with the number of affected records will appear.
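The rule created and executed in the last two steps can be pictured as a named pair of sets: the characterisation results that must all be present for the rule to fire (the trigger), and the values to remove when it does. The sketch below is one possible reading of that idea, not the C3PO implementation. The `Rule` class, the `matches` and `execute` functions, the data model and the tool names `toolA` and `toolB` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Result = Tuple[str, str, str]        # hypothetical: (property, tool, value)
Props = Dict[str, Dict[str, str]]    # hypothetical: property -> {tool: value}


@dataclass(frozen=True)
class Rule:
    name: str
    trigger: FrozenSet[Result]  # results that must all be present for the rule to fire
    remove: FrozenSet[Result]   # results to delete when the rule fires


def matches(rule: Rule, props: Props) -> bool:
    """True if every trigger result is present in the object's characterisation results."""
    return all(props.get(p, {}).get(tool) == value for p, tool, value in rule.trigger)


def execute(rule: Rule, records: Dict[str, Props]) -> int:
    """Apply the rule in place to every matching object and return the number affected."""
    affected = 0
    for props in records.values():
        if not matches(rule, props):
            continue
        changed = False
        for p, tool, value in rule.remove:
            if props.get(p, {}).get(tool) == value:
                del props[p][tool]
                changed = True
        if changed:
            affected += 1
    return affected


# Illustrative rule and records only; the values are made up.
rule = Rule(
    name="keep toolA format for Word documents",
    trigger=frozenset({("format", "toolA", "MS Word"), ("format", "toolB", "OLE2 Compound")}),
    remove=frozenset({("format", "toolB", "OLE2 Compound")}),
)
records = {
    "b.doc": {"format": {"toolA": "MS Word", "toolB": "OLE2 Compound"}},
    "e.doc": {"format": {"toolA": "MS Word", "toolB": "OLE2 Compound"}},
    "a.pdf": {"format": {"toolA": "PDF", "toolB": "PDF"}},
}

affected = execute(rule, records)
print(affected)  # number of records the rule changed
```

Requiring the full trigger to match keeps the rule from deleting a value on objects where the same tool output appears in a different, possibly legitimate, context.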
We can then go back to the main overview page (do not forget to unset any filters) and check that the conflict rate has gone down.
Well done! This is an example of the prototypical implementation of conflict resolution in C3PO. If you experience any issues or have questions, please contact us.