4_BASICS: Building data sets, scrape instagram and bing & zipping and unzipping - jamahun/fakelab.studio GitHub Wiki

Install Pip

The following instructions install pip on your local machine. Pip is the package installer for Python; use it to install, update, and upgrade Python packages on your machine.

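  1. Download the latest version of Python for your operating system from python.org
  2. Run the installer and modify (customize) the installation so that pip is included; on Windows, also tick the option to add Python to PATH
  3. You are done: open a new terminal and the pip command should now work (check with pip --version)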
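To confirm that step 3 worked, the sketch below looks for a working pip command and reports which form to use. It is a minimal sketch that assumes a bash-compatible shell (Git Bash on Windows) and that Python is installed as python3 or python; find_pip_command is a hypothetical helper name, not part of pip.

```shell
#!/usr/bin/env bash
# Report the first pip command that works on this machine.
# Assumption: Python 3 is installed as "python3" (macOS/Linux) or "python" (Windows).
find_pip_command() {
  if command -v pip >/dev/null 2>&1; then
    echo "pip"
  elif command -v pip3 >/dev/null 2>&1; then
    echo "pip3"
  elif python3 -m pip --version >/dev/null 2>&1; then
    echo "python3 -m pip"
  elif python -m pip --version >/dev/null 2>&1; then
    echo "python -m pip"
  else
    return 1   # no pip found at all
  fi
}

if pip_cmd=$(find_pip_command); then
  echo "pip is available as: $pip_cmd"
else
  echo "pip not found; re-run the Python installer and include pip" >&2
fi
```

If the result is "python3 -m pip" rather than "pip", use that form wherever this page says pip.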

Scrape Instagram using instaLooter

Note that most of these instructions are a simplified version of the instaLooter documentation on GitHub.

Installation

The following command installs instaLooter on a Linux machine from scratch:

pip install --user instalooter
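With --user, pip puts the instalooter command in a per-user scripts folder (on Linux usually ~/.local/bin), which is not always on PATH. The sketch below checks for the command and, if it is missing, prints the folder to add; it assumes bash and a python3 on PATH, and check_instalooter is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Check that the instalooter command is reachable after "pip install --user".
# User-level scripts live in "$(python3 -m site --user-base)/bin".
check_instalooter() {
  if command -v instalooter >/dev/null 2>&1; then
    echo "instalooter found at $(command -v instalooter)"
    return 0
  fi
  local user_bin
  user_bin="$(python3 -m site --user-base 2>/dev/null)/bin"
  echo "instalooter is not on PATH; try adding this line to ~/.bashrc:" >&2
  echo "export PATH=\"$user_bin:\$PATH\"" >&2
  return 1
}

check_instalooter || echo "After updating PATH, open a new terminal and retry." >&2
```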

Logging in and out

Before scraping any content, you have to log in to Instagram through instaLooter. (If you don't have an Instagram account, you can sign up at instagram.com.)

Once you have signed up to Instagram, run the command instalooter login and follow the prompts in the terminal to log in with your Instagram username and password. To log out again, run instalooter logout.

Usage

instalooter <username> [<directory>] [options]
instalooter hashtag <hashtag> <directory> [options]
instalooter post <post_token> <directory> [options]
instalooter batch <batch_file>

Examples

To download all pictures from the uglybelgianhouses Instagram profile into a folder named scrape01 in the current directory:

instalooter uglybelgianhouses scrape01

To download all pictures from the #concrete hashtag into a folder named scrape01 in the current directory:

instalooter hashtag concrete scrape01
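To build one larger data set from several hashtags, the hashtag command can be run in a loop with one sub-folder per hashtag. This is a minimal sketch that only uses the "instalooter hashtag <hashtag> <directory>" form from the usage list; scrape_hashtags and the DRY_RUN switch are hypothetical helpers, not instaLooter features.

```shell
#!/usr/bin/env bash
# Build one data set from several hashtags, one sub-folder per hashtag.
# Hypothetical helper: set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

scrape_hashtags() {
  local dataset_dir=$1
  shift
  local tag
  for tag in "$@"; do
    tag=${tag#"#"}   # accept both "concrete" and "#concrete"
    run mkdir -p "$dataset_dir/$tag"
    run instalooter hashtag "$tag" "$dataset_dir/$tag"
  done
}

# Example: DRY_RUN=1 scrape_hashtags scrape01 concrete brutalism
```

Running it once with DRY_RUN=1 shows exactly which folders and commands would be used before any download starts.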

Use the Bing scraper to download large numbers of images

These instructions follow the documentation in this GitHub repository: https://github.com/ultralytics/google-images-download

Download the Bing scraper

git clone https://github.com/ultralytics/google-images-download

Install the required python packages

cd google-images-download
pip install -U -r requirements.txt
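If you come back to this page later, the two steps above can be combined so that the repository is cloned only once and simply updated afterwards. This is a sketch under assumptions: bash, Python 3 available as python3 (often python on Windows), and hypothetical helper names setup_bing_scraper and DRY_RUN. Installing with python3 -m pip ties the requirements to the same interpreter that will run bing_scraper.py.

```shell
#!/usr/bin/env bash
# Clone the Bing scraper if it is not present yet (otherwise update it),
# then install its requirements with the same Python that will run it.
# Hypothetical helper: set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_bing_scraper() {
  local repo_url="https://github.com/ultralytics/google-images-download"
  local repo_dir="google-images-download"
  if [ -d "$repo_dir/.git" ]; then
    run git -C "$repo_dir" pull   # already cloned: fetch the latest version
  else
    run git clone "$repo_url" "$repo_dir"
  fi
  run python3 -m pip install -U -r "$repo_dir/requirements.txt"
}

# Example: setup_bing_scraper
```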

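Mac Users

If pip is not available on macOS, install it with Python's bundled ensurepip module (the older sudo easy_install pip method relies on easy_install, which is deprecated):

python3 -m ensurepip --upgrade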

Download chromedriver

  1. Update Chrome via the Chrome menu (the three-dot "kebab" button) > Help > About Google Chrome, and note the version number shown there
  2. Go to https://chromedriver.chromium.org/downloads and download the chromedriver version that matches your Chrome version
  3. Save chromedriver_win32.zip to a location on your local machine. In this example we will use C:\Software
  4. Unzip chromedriver_win32.zip; it contains chromedriver.exe
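The scraper fails to start when the chromedriver major version differs from Chrome's, so it is worth comparing the two numbers. The sketch below compares the leading number of two version strings: paste the Chrome version from About Google Chrome and the output of chromedriver --version. It assumes bash (Git Bash on Windows), and major_version and versions_match are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Compare the major version of Chrome with that of chromedriver; they must match.
major_version() {
  # Print the first run of digits followed by a dot, e.g. "114" from "114.0.5735.90".
  printf '%s\n' "$1" | grep -oE '[0-9]+\.' | head -n 1 | tr -d '.'
}

versions_match() {
  [ "$(major_version "$1")" = "$(major_version "$2")" ]
}

# Example (Git Bash on Windows, using the path from step 3):
#   versions_match "<your Chrome version>" "$(/c/Software/chromedriver.exe --version)" \
#     && echo "versions match" || echo "download a matching chromedriver"
```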

Run the python script to download images

When specifying the keyword, use + instead of spaces, e.g. "gaudi+facades" instead of "gaudi facades". Wrap the keyword in double quotes: the Windows command prompt does not strip single quotes.

cd google-images-download
python bing_scraper.py --search "[enter+keyword]" --limit [number of images to download] --download --chromedriver "C:\Software\chromedriver.exe"
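To collect images for several keywords in one go, the command above can be repeated per keyword, with spaces replaced by + automatically. This minimal sketch uses only the flags shown above and passes anything after "--" (such as the --format argument) straight through. It assumes bash (Git Bash on Windows), that you run it from inside google-images-download, and that CHROMEDRIVER points at your chromedriver; scrape_keywords, to_plus_keyword and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Run bing_scraper.py once per keyword, replacing spaces with +.
# Anything after "--" (for example --format jpg) is passed through to the scraper.
# Hypothetical helper: set DRY_RUN=1 to print the commands instead of running them.
DEFAULT_DRIVER='C:\Software\chromedriver.exe'   # the path used in this guide
CHROMEDRIVER=${CHROMEDRIVER:-$DEFAULT_DRIVER}

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

to_plus_keyword() {
  printf '%s' "$1" | tr ' ' '+'
}

scrape_keywords() {
  local limit=$1
  shift
  local keywords=()
  local keyword
  while [ $# -gt 0 ] && [ "$1" != "--" ]; do
    keywords+=("$1")
    shift
  done
  if [ "${1:-}" = "--" ]; then
    shift   # what remains are extra scraper options
  fi
  for keyword in "${keywords[@]}"; do
    run python bing_scraper.py --search "$(to_plus_keyword "$keyword")" \
      --limit "$limit" --download --chromedriver "$CHROMEDRIVER" "$@"
  done
}

# Example: scrape_keywords 100 "gaudi facades" "concrete stairs" -- --format jpg
```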

List of arguments:

To add more specificity to your search, include extra arguments from the scraper's argument list in the repository documentation.

Add -- in front of the argument you wish to use. For example, to restrict the download to a specific file format:

--format jpg

Unzip

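Use the tar command to extract a .zip file into the current working directory. This works with the bsdtar-based tar that ships with macOS and recent versions of Windows 10 and later; GNU tar, the default on most Linux distributions, cannot read .zip archives, so on Linux use unzip file.zip instead (install it with sudo apt-get install unzip).

tar xvf file.zip

Use the following syntax to extract to a specific destination directory (the unzip equivalent is unzip file.zip -d /destination/directory/):

tar xvf file.zip -C /destination/directory/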
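Because the right .zip tool depends on the machine, the sketch below picks one: unzip if it is installed, otherwise tar only if it reports itself as bsdtar, since GNU tar cannot read .zip files. It assumes bash, and extract_zip and DRY_RUN are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Extract a .zip archive with whichever suitable tool is installed.
# Hypothetical helper: set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

extract_zip() {
  local archive=$1
  local dest=${2:-.}   # default: current working directory
  if command -v unzip >/dev/null 2>&1; then
    run mkdir -p "$dest"
    run unzip "$archive" -d "$dest"
  elif tar --version 2>/dev/null | grep -q bsdtar; then
    run mkdir -p "$dest"   # tar -C needs an existing directory
    run tar xvf "$archive" -C "$dest"
  else
    echo "No .zip extractor found; install unzip (sudo apt-get install unzip)" >&2
    return 1
  fi
}

# Example: extract_zip file.zip /destination/directory/
```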

Install unrar to extract .rar files

sudo apt-get install unrar

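Use unrar to extract a .rar file into the current working directory. The e command places every file directly in the target folder; use unrar x instead to keep the folder structure stored in the archive.

unrar e file.rar

Use unrar to extract a .rar file to a specific destination directory:

unrar e file.rar /destination/directory/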
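When a data set arrives as many .rar files, each archive can be extracted into its own sub-folder so files with the same name do not overwrite each other. This sketch uses unrar x to keep each archive's internal folders; it assumes bash and single-volume archives (multi-part .part1.rar sets need different handling), and extract_all_rars and DRY_RUN are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Extract every .rar file in a folder into its own sub-folder of a destination.
# Hypothetical helper: set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

extract_all_rars() {
  local src_dir=$1
  local dest_dir=$2
  local archive name
  for archive in "$src_dir"/*.rar; do
    [ -e "$archive" ] || continue   # no .rar files: the glob stays unexpanded
    name=$(basename "$archive" .rar)
    run mkdir -p "$dest_dir/$name"
    run unrar x "$archive" "$dest_dir/$name/"
  done
}

# Example: extract_all_rars ~/Downloads /destination/directory
```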