App Corpus - UCI-Networking-Group/OVRseen GitHub Wiki
Dependencies
The following dependencies have been installed in the provided VM.
- ChromeDriver 94.0.4606.41 (with Google Chrome 94)
Please also run the following commands to create and activate a Python virtual environment (with the right dependencies) before using OVRseen.
OVRseen/virtualenv $ ./python3_venv.sh
OVRseen/virtualenv $ source python3_venv/bin/activate
Crawling App Stores
We use Selenium and ChromeDriver to crawl the official Oculus and SideQuest app stores. The crawler scripts produce the most up-to-date list of all apps in each app store in the form of CSV files.
1) First, we go to the directory that contains the crawler scripts and run each script with the --links_only option. This produces a JSON file that contains the links to every app's webpage on that app store. /usr/bin/chromedriver is the path to the installed ChromeDriver binary.
$ cd OVRseen/supplementary_code/store_extraction/
OVRseen/supplementary_code/store_extraction $ python3 extract_oculus.py --links_only /usr/bin/chromedriver . # This produces oculus_links.json.
OVRseen/supplementary_code/store_extraction $ python3 extract_sidequest.py --links_only /usr/bin/chromedriver . # This produces sidequest_links.json.
2) Then, we re-run each crawler script with the corresponding JSON file of app webpage links as input.
OVRseen/supplementary_code/store_extraction $ python3 extract_oculus.py -url_file oculus_links.json /usr/bin/chromedriver oculus_apps # This saves each webpage as a JSON file in the oculus_apps folder.
OVRseen/supplementary_code/store_extraction $ python3 extract_sidequest.py -url_file sidequest_links.json /usr/bin/chromedriver sidequest_apps # This saves each webpage as a JSON file in the sidequest_apps folder.
If a script raises an error and stops crawling, we can restart it, edit the input JSON file so that crawling starts from the problematic link, or increase the values passed to the time.sleep() calls (the webpages are sometimes slow to respond). For SideQuest, the script may need changes because the website structure changes often.
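The two recovery options above (resuming from the problematic link and waiting longer for slow pages) can be sketched as follows. This is only an illustration, not code from the OVRseen crawler: it assumes the links file is a flat JSON array of URL strings, which may not match the real format, and the helper names links_from, write_resume_file, and retry are hypothetical.

```python
import json
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def links_from(links: List[str], failed_url: str) -> List[str]:
    """Return the tail of `links` starting at `failed_url` (inclusive).

    If the URL is not in the list, a copy of the full list is returned.
    """
    try:
        start = links.index(failed_url)
    except ValueError:
        return list(links)
    return links[start:]


def write_resume_file(src: str, dst: str, failed_url: str) -> None:
    """Write a trimmed links file (assumed format: JSON array of URLs)."""
    with open(src, encoding="utf-8") as f:
        links = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(links_from(links, failed_url), f, indent=2)


def retry(action: Callable[[], T], attempts: int = 3, base_delay: float = 2.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `action` up to `attempts` times, waiting longer after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * attempt)  # e.g. 2 s, then 4 s, for slow pages
    raise ValueError("attempts must be at least 1")
```

The trimmed file written by write_resume_file could then be passed to the crawler in place of the original links file, provided the format assumption holds.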
3) Finally, we combine all the JSON files for an app store into a CSV file. At the end, we will have two CSV files, one for each app store.
OVRseen/supplementary_code/store_extraction $ python3 third_party_converter.py --variable_headers oculus_apps/ oculus_apps.csv
OVRseen/supplementary_code/store_extraction $ python3 third_party_converter.py --variable_headers sidequest_apps/ sidequest_apps.csv
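To illustrate the idea behind step 3, here is a minimal sketch of merging per-app JSON files into one CSV when the files do not all share the same keys. It is not the implementation of third_party_converter.py; it assumes each JSON file holds a single flat object, and the function names load_records, union_headers, and write_csv are hypothetical.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def load_records(folder: str) -> List[Dict[str, object]]:
    """Load every *.json file in `folder` (assumed: one flat object per file)."""
    records = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            records.append(json.load(f))
    return records


def union_headers(records: List[Dict[str, object]]) -> List[str]:
    """Collect every key seen in any record, in first-seen order."""
    headers: List[str] = []
    seen = set()
    for record in records:
        for key in record:
            if key not in seen:
                seen.add(key)
                headers.append(key)
    return headers


def write_csv(records: List[Dict[str, object]], out_path: str) -> List[str]:
    """Write all records to one CSV; fields missing from a record stay empty."""
    headers = union_headers(records)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=headers, restval="")
        writer.writeheader()
        writer.writerows(records)
    return headers
```

Taking the union of keys means an app page that lacks a field (for example, no rating yet) still gets a row, with that column left blank.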
Using the generated CSV files, we can update the lists of apps in our datasets (i.e., lists_of_apps), which were themselves generated with these crawler scripts. Our lists of top apps were selected from the crawled lists: the top apps were curated based on popularity/reviews.
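As a rough illustration of shortlisting candidates by popularity, the sketch below ranks CSV rows by a review-count column. The column name rating_count is an assumption (the real CSV headers may differ), the helpers parse_count and top_apps are hypothetical, and the actual curation of top apps may have involved criteria this sketch does not capture.

```python
import csv
from typing import Dict, Iterable, List


def parse_count(value: object) -> int:
    """Parse a count such as "1,204"; unparsable or missing values become 0."""
    try:
        return int(str(value).replace(",", "").strip())
    except ValueError:
        return 0


def top_apps(rows: Iterable[Dict[str, str]],
             count_column: str = "rating_count",  # hypothetical column name
             n: int = 10) -> List[Dict[str, str]]:
    """Return the `n` rows with the largest value in `count_column`."""
    ranked = sorted(rows,
                    key=lambda row: parse_count(row.get(count_column, "")),
                    reverse=True)
    return ranked[:n]


def top_apps_from_csv(path: str, count_column: str = "rating_count",
                      n: int = 10) -> List[Dict[str, str]]:
    """Read a crawled CSV file and return its top `n` rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return top_apps(csv.DictReader(f), count_column, n)
```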