4_BASICS: Building data sets, scrape instagram and bing & zipping and unzipping - jamahun/fakelab.studio GitHub Wiki
The following instructions install pip on your local machine. Pip is the Python package installer and is used to install and upgrade packages on your machine.
- Download the latest version of Python onto your machine
- Modify the installation to include pip
- You are done; the pip command should now work in your terminal
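To confirm the result from Python itself, a minimal sketch like the one below checks whether the interpreter you are running can see the pip module. The function name `pip_available` is hypothetical; it only reports availability and does not install anything.

```python
import importlib.util


def pip_available() -> bool:
    """Return True if the running interpreter can import the pip module."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    print("pip found" if pip_available() else "pip not found")
```

If this prints "pip not found", re-run the Python installer and make sure the pip option is ticked.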
The following installs instaLooter on a Linux machine from scratch. Most of these instructions are a simplified version of the instaLooter git repository and its online documentation.
pip install --user instalooter
Before scraping any content you have to log in to Instagram through instalooter. (If you don't have an Instagram account, you can sign up here.)
Once you have signed up to Instagram, execute the command instalooter login and follow the prompts in the terminal to log in with your Instagram username and password.
instalooter <username> [<directory>] [options]
instalooter hashtag <hashtag> <directory> [options]
instalooter post <post_token> <directory> [options]
instalooter batch <batch_file>
To download all pictures from the uglybelgianhouses Instagram profile into a folder named scrape01 in the current directory:
instalooter uglybelgianhouses scrape01
To download all pictures from the #concrete hashtag into a folder named scrape01 in the current directory:
instalooter hashtag concrete scrape01
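When a data set needs several hashtags, the hashtag command can be repeated with one folder per hashtag. The sketch below only assembles the command lines in the `hashtag <hashtag> <directory>` form listed above. The helper names `build_hashtag_commands` and `run_all`, the numbered folder scheme, and the second example hashtag are assumptions made for illustration, not part of instaLooter.

```python
import subprocess


def build_hashtag_commands(hashtags, base_dir="scrape"):
    """Build one `instalooter hashtag <hashtag> <directory>` command per hashtag."""
    commands = []
    for index, tag in enumerate(hashtags, start=1):
        directory = f"{base_dir}{index:02d}"  # scrape01, scrape02, ...
        # Drop a leading '#' so both 'concrete' and '#concrete' are accepted.
        commands.append(["instalooter", "hashtag", tag.lstrip("#"), directory])
    return commands


def run_all(commands):
    """Run each command in turn; assumes instalooter is installed and logged in."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # 'brutalism' is a made-up second hashtag, used only as an example.
    for command in build_hashtag_commands(["concrete", "brutalism"]):
        print(" ".join(command))
```

Printing the commands first makes it easy to check the folder names before calling `run_all` on the same list.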
We will be using the documentation from this git repository: https://github.com/ultralytics/google-images-download
git clone https://github.com/ultralytics/google-images-download
cd google-images-download
pip install -U -r requirements.txt
To install pip on macOS:
sudo easy_install pip
- Update Chrome by clicking the Chrome kebab menu button > About Chrome
- Go to https://chromedriver.chromium.org/downloads and download the version that matches your Chrome version
- Save chromedriver_win32.zip to a location on your local machine. In this example we will use C:\Software
- Unzip chromedriver_win32.zip; the extracted file should be chromedriver.exe
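Before running the scraper it can help to confirm that chromedriver.exe really sits where you expect. The following is a minimal sketch of such a check; the function name `chromedriver_ready` is hypothetical, and the path is the example location used in the steps above.

```python
from pathlib import Path


def chromedriver_ready(path: str) -> bool:
    """Return True if a file exists at the given chromedriver path."""
    return Path(path).is_file()


if __name__ == "__main__":
    location = r"C:\Software\chromedriver.exe"  # example location from the steps above
    print(location, "found" if chromedriver_ready(location) else "missing")
```

The check only looks for the file; it does not verify that the driver version matches your Chrome version.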
When specifying the keyword, use +'s instead of spaces, i.e. 'gaudi+facades' instead of 'gaudi facades'
cd google-images-download
python bing_scraper.py --search '[enter+keyword]' --limit [enter amount of images you want to download] --download --chromedriver "C:\Software\chromedriver.exe"
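The command above can also be assembled in Python, which avoids typing the +'s by hand. This is a sketch under stated assumptions: the helper `build_bing_command` is hypothetical, and it uses only the flags that appear in the command above (`--search`, `--limit`, `--download`, `--chromedriver`).

```python
def build_bing_command(keyword: str, limit: int, chromedriver: str) -> list:
    """Build the argument list for bing_scraper.py using the flags shown above."""
    search = "+".join(keyword.split())  # 'gaudi facades' -> 'gaudi+facades'
    return [
        "python", "bing_scraper.py",
        "--search", search,
        "--limit", str(limit),
        "--download",
        "--chromedriver", chromedriver,
    ]


if __name__ == "__main__":
    command = build_bing_command("gaudi facades", 50, r"C:\Software\chromedriver.exe")
    print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from inside the google-images-download folder.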
To add more specificity to your search include arguments from the following list:
You just need to add -- in front of the argument you wish to use, e.g. to request a specific file format:
--format jpg
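Use the tar command to extract a .zip file into the current working directory. This works where tar is bsdtar, as on macOS and recent Windows 10; GNU tar on most Linux distributions does not read .zip archives.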
tar xvf file.zip
Use the following syntax to extract to a specific destination directory
tar xvf file.zip -C /destination/directory/
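If the tar command on your machine cannot read .zip files, Python's standard `zipfile` module offers a portable alternative. The sketch below mirrors the two commands above: with no destination it extracts into the current working directory, otherwise into the given folder. The function name `extract_zip` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_zip(archive: str, destination: str = ".") -> list:
    """Extract every member of a .zip archive into destination; return member names."""
    target = Path(destination)
    target.mkdir(parents=True, exist_ok=True)  # create the destination if needed
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        return zf.namelist()
```

For example, `extract_zip("file.zip", "/destination/directory/")` corresponds to the second tar command above.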
Install unrar to extract .rar files
sudo apt-get install unrar
Use unrar to extract a .rar file in the current working directory
unrar e file.rar
Use unrar to extract a .rar file to a specific destination directory
unrar e file.rar /destination/directory/