Network to Policy Consistency - UCI-Networking-Group/OVRseen GitHub Wiki
This page explains OVRseen's workflow for checking network-to-policy consistency.
Dependencies
The following dependencies have been installed in the provided VM.
- build-essential
- fonts-dejavu
- gdown 4.0.1
- python3-dev
$ apt-get install python3-dev build-essential fonts-dejavu
Please also run the following commands to activate a Python virtual environment (with the right dependencies) before using OVRseen.
OVRseen/virtualenv $ ./python3_venv.sh
OVRseen/virtualenv $ source python3_venv/bin/activate
Setup
1) Please create a directory named ext/. This is the working directory for network-to-policy consistency analysis and purpose extraction. We will copy input files, run scripts, and obtain output files here.
OVRseen/privacy_policy/network-to-policy_consistency $ mkdir ext/
2) Please download the NLP model of PoliCheck. For convenience, we provide an exact copy of the NLP model distributed with the original PoliCheck. Next, extract the tar.gz file into ext/.
OVRseen/privacy_policy/network-to-policy_consistency $ tar xvf NlpFinalModel.tar.gz -C ext/
3) Please copy the ontologies, synonym lists, and the domain-to-entity mapping list into ext/data/.
OVRseen/privacy_policy/network-to-policy_consistency $ mkdir -p ext/data
OVRseen/privacy_policy/network-to-policy_consistency $ cp ontology/*.{gml,yml} ext/data/
These data and entity ontology files contain the structures that correspond to Figure 3 and Section 4.1.2 in our paper.
4) Please find privacy_policies.zip in our datasets. Our extract_datasets.sh should find and copy the right files into the right locations in OVRseen's directory structure. If this script has not been run, please copy privacy_policies.zip into network-to-policy_consistency and run the following command to unzip it.
OVRseen/privacy_policy/network-to-policy_consistency $ unzip privacy_policies.zip
Once the privacy policies have been copied and extracted, run process_zipped_policies.py to extract and copy the HTML files of the privacy policies into ext/html_policies.
OVRseen/privacy_policy/network-to-policy_consistency $ python3 process_zipped_policies.py privacy_policies ext/html_policies
5) Please generate policheck_flows.csv (i.e., the input CSV file that contains data flow tuples) from the all-merged-with-esld-engine-privacy-developer-party.csv file obtained from OVRseen's post-processing step. For convenience, all-merged-with-esld-engine-privacy-developer-party.csv is also available in the intermediate_outputs folder of our datasets. Our extract_datasets.sh should find and copy the right files into the right locations in OVRseen's directory structure. If this script has not been run, please copy the file into network-to-policy_consistency.
OVRseen/privacy_policy/network-to-policy_consistency $ python3 preprocess_policheck_flows.py all-merged-with-esld-engine-privacy-developer-party.csv ext/data/policheck_flows.csv
Please ignore any warning messages (usually about some extracted data types that we do not consider). This policheck_flows.csv can also be found in intermediate_outputs of our datasets.
After running the above steps, the ext/ folder should look like this:
ext
├── data
│   ├── data_ontology.gml     // data ontology + synonym list
│   ├── data_synonyms.yml
│   ├── domains.yml           // domain to entity mapping
│   ├── entity_ontology.gml   // entity ontology + synonym list
│   ├── entity_synonyms.yml
│   └── policheck_flows.csv
├── NlpFinalModel             // Extract the NLP model here
│   └── ...
└── html_policies             // Put privacy policy webpages here
    ├── <app.package.name>.html
    └── ...
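Before moving on, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of OVRseen: the helper name missing_entries is hypothetical, and the expected paths are simply transcribed from the listing above.

```python
from pathlib import Path

# Expected entries, transcribed from the directory listing above.
EXPECTED_FILES = [
    "data/data_ontology.gml",
    "data/data_synonyms.yml",
    "data/domains.yml",
    "data/entity_ontology.gml",
    "data/entity_synonyms.yml",
    "data/policheck_flows.csv",
]
EXPECTED_DIRS = ["NlpFinalModel", "html_policies"]


def missing_entries(root):
    """Return the expected paths (relative to root) that do not exist."""
    root = Path(root)
    missing = [f for f in EXPECTED_FILES if not (root / f).is_file()]
    missing += [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing
```

Calling missing_entries("ext") from the network-to-policy_consistency directory should return an empty list once steps 1) to 5) have completed.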
Analyzing Network-to-Policy Consistency
Preparing PoliCheck
The following steps prepare PoliCheck before we perform the actual steps for network-to-policy consistency analysis.
6) Please run PoliCheck's HTML pre-processor to generate the plain texts for privacy policies. These plain texts can also be found in intermediate_outputs of our datasets.
OVRseen/privacy_policy/network-to-policy_consistency $ python3 Preprocessor.py -i ext/html_policies -o ext/plaintext_policies
7) Please execute the following command to generate an empty privacy policy for each app that we did not find a privacy policy for. In this case, PoliCheck will eventually classify data flows of these apps as omitted disclosures.
OVRseen/privacy_policy/network-to-policy_consistency $ awk -F, 'NR > 1 { print $1 }' ext/data/policheck_flows.csv | sort -u | xargs -I{} touch ext/plaintext_policies/{}.txt
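For readers who prefer Python to the shell pipeline, the same step can be sketched as follows. This is an illustrative equivalent, not a script shipped with OVRseen. It assumes, as the awk command does, that the first column of policheck_flows.csv identifies the app and matches the policy file name; the function name create_empty_policies is hypothetical.

```python
import csv
from pathlib import Path


def create_empty_policies(flows_csv, policy_dir):
    """Create <app>.txt for every app in the first CSV column that lacks one.

    Returns the sorted list of file names that were newly created.
    """
    policy_dir = Path(policy_dir)
    policy_dir.mkdir(parents=True, exist_ok=True)
    with open(flows_csv, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row, like NR > 1 in awk
        apps = {row[0] for row in reader if row and row[0]}
    created = []
    for app in sorted(apps):
        target = policy_dir / f"{app}.txt"
        if not target.exists():
            target.touch()  # existing policies are left as they are
            created.append(target.name)
    return created
```

Unlike awk -F, this version parses quoted CSV fields, and it reports which apps received an empty policy, which makes it easy to see whose data flows will end up as omitted disclosures.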
8) Please run PoliCheck's NLP pattern extraction code (this may take a while).
OVRseen/privacy_policy/network-to-policy_consistency $ python3 PatternExtractionNotebook.py ext/
9) Please run our first-party name extractor to resolve non-pronoun first-party entity names.
OVRseen/privacy_policy/network-to-policy_consistency $ python3 CollectFirstPartyNames.py ext/
Running the Analysis
10) At this point, please make a copy of the ext/ folder, because the next steps run the network-to-policy consistency analysis in two scenarios: (1) without, and (2) with third-party privacy policies, as explained in Section 4.1.3 in our paper.
OVRseen/privacy_policy/network-to-policy_consistency $ cp -r ext/ ext2/
11) Please run the following command to detect references to third-party privacy policies specified in the app's privacy policy.
OVRseen/privacy_policy/network-to-policy_consistency $ python3 detect_third_party_policies.py ext/
12) Please run the following commands to perform the network-to-policy consistency analysis.
OVRseen/privacy_policy/network-to-policy_consistency $ python3 ConsistencyAnalysis.py ext/
OVRseen/privacy_policy/network-to-policy_consistency $ python3 RemoveSameSentenceContradictions.py ext/
OVRseen/privacy_policy/network-to-policy_consistency $ python3 DisclosureClassification.py ext/
After this step, we will get the output file ext/policheck_results.csv. This output file contains the network-to-policy consistency analysis results in which third-party privacy policies are included only if the app's privacy policy references them.
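Because the three analysis scripts must run in a fixed order, and the same sequence is later repeated for a second working directory, a small driver can be convenient. The sketch below is an assumption-laden illustration rather than part of OVRseen: the script names and their order come from step 12), while analysis_commands, run_analysis, and the runner parameter are hypothetical helpers.

```python
import subprocess
import sys

# Script names and their order are taken from step 12).
ANALYSIS_SCRIPTS = [
    "ConsistencyAnalysis.py",
    "RemoveSameSentenceContradictions.py",
    "DisclosureClassification.py",
]


def analysis_commands(workdir, python=sys.executable):
    """Build one argv list per analysis script for the given working directory."""
    return [[python, script, workdir] for script in ANALYSIS_SCRIPTS]


def run_analysis(workdir, runner=subprocess.run):
    """Run the scripts in order; check=True stops at the first failing script."""
    for argv in analysis_commands(workdir):
        runner(argv, check=True)
```

Run it from the network-to-policy_consistency directory, for example run_analysis("ext/"). Stopping at the first failure avoids classifying disclosures from an incomplete consistency analysis.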
Referencing Oculus and Unity Privacy Policies
The "Referencing Oculus and Unity privacy policies" part of Section 4.1.3 in our paper discusses our results when including third-party privacy policies by default (even when they are not specified by the app's privacy policy). To reproduce these results, we have copied the content of ext/ into ext2/ in step 10) above.
13) Please re-run the following command to add references to the Oculus privacy policy (and the Unity privacy policy for Unity apps) by default, in addition to the existing references to other third-party privacy policies already specified in the app's privacy policy.
OVRseen/privacy_policy/network-to-policy_consistency $ python3 detect_third_party_policies.py ext2/ append
14) Please re-run the network-to-policy consistency analysis.
OVRseen/privacy_policy/network-to-policy_consistency $ python3 ConsistencyAnalysis.py ext2/
OVRseen/privacy_policy/network-to-policy_consistency $ python3 RemoveSameSentenceContradictions.py ext2/
OVRseen/privacy_policy/network-to-policy_consistency $ python3 DisclosureClassification.py ext2/
The output is in ext2/policheck_results.csv. This output file contains the network-to-policy consistency analysis result when including the Oculus privacy policy (and Unity privacy policy for Unity apps) by default.
Section 4.1.3 in our paper was written based on the statistics reported in ext/policheck_results.csv and ext2/policheck_results.csv. These two output CSV files can also be found in intermediate_outputs in our datasets.
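To compare the two scenarios quickly, one option is to tally the values of a single column in each results file. The sketch below is generic and makes two assumptions that this page does not confirm: that policheck_results.csv starts with a header row, and that you pick the column name yourself after inspecting that header. The helper names csv_header and tally_column are hypothetical.

```python
import csv
from collections import Counter


def csv_header(csv_path):
    """Return the first row of the CSV file (assumed to be the header)."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return next(csv.reader(fh), [])


def tally_column(csv_path, column):
    """Count how often each value occurs in the named column."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return Counter(row[column] for row in csv.DictReader(fh))
```

A typical use is to print csv_header("ext/policheck_results.csv"), choose the column of interest, and then call tally_column on both ext/policheck_results.csv and ext2/policheck_results.csv to see how the counts shift between the two scenarios.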
Generating Plots
15) Please run make_plots.py to regenerate the plots presented in our paper based on the outputs in ext/ and ext2/.
OVRseen/privacy_policy/network-to-policy_consistency $ python3 make_plots.py ext/ ext2/
After executing this script, the plots in Figures 4, 5, and 6 in our paper can be found as PDF files inside ext/plots/.