1. Using iterators in PythonLand - upalr/Python-camp GitHub Wiki

1. Introduction to Iterators

1.1 Iterable:

Lists, strings, dictionaries, and file connections are iterables, as are the objects returned by enumerate() and zip().

INFO: for loops are used to step through these iterables.

INFO 2: You can call pd.read_csv(filename, chunksize=100). This creates an iterable reader object, which means that you can use next() on it.

iterable -> iter() -> iterator -> next()
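The chain above can be sketched in a few lines. This is only an illustration: the string and the variable names are made up for the example and are not from the original notes.

```python
letters = "abc"              # a string is an iterable
letters_it = iter(letters)   # iter() returns an iterator over it

first = next(letters_it)     # 'a'
second = next(letters_it)    # 'b'
third = next(letters_it)     # 'c'

# Once the iterator is exhausted, next() raises StopIteration.
try:
    next(letters_it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```

Note that the iterator remembers its position: each call to next() continues where the previous one stopped.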

[Figure 1: iterators vs. iterables]

[Figure 2: the iterable -> iter() -> iterator -> next() process]

1.1.1 Breaking down the for loop

  for i in range(0, 11):
      print(i)

range(0, 11): the iterable.

i: the iterator variable (it represents each member of the iterable in turn).
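As a rough sketch of what the for loop does behind the scenes, the same iteration can be written by hand with iter() and next(). The names numbers_it and collected are hypothetical and exist only for this illustration.

```python
collected = []

numbers_it = iter(range(0, 11))   # the for loop first asks the iterable for an iterator
while True:
    try:
        i = next(numbers_it)      # then it fetches one member at a time
    except StopIteration:
        break                     # and it stops quietly when the iterator is exhausted
    collected.append(i)

print(collected)
```

The for loop is the idiomatic form. The hand-written version is shown only to make the iter() and next() steps visible.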

2. Playing with Iterators (enumerate() and zip())

2.1 enumerate()


enumerate() is a function that takes any iterable, such as a list, as an argument and returns a special enumerate object. This object consists of pairs, each containing an element of the original iterable along with its index within the iterable.

We can use the function list() to turn this enumerate object into a list of tuples.
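A minimal sketch of this, assuming a made-up list of names (the list avengers is only an example and is not taken from the original figures):

```python
avengers = ["hawkeye", "iron man", "thor"]   # hypothetical example data

e = enumerate(avengers)      # a lazy enumerate object, not yet a list
e_type = type(e).__name__    # 'enumerate'

e_list = list(e)             # materialize the (index, element) pairs
print(e_type)
print(e_list)                # [(0, 'hawkeye'), (1, 'iron man'), (2, 'thor')]
```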

[Figure 3: enumerate()]

The enumerate object is itself an iterable, so we can loop over it.
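One way this might look, with each pair unpacked directly in the loop header. The data is again hypothetical. The second loop uses the start parameter of enumerate() to begin counting from a value other than 0.

```python
avengers = ["hawkeye", "iron man", "thor"]   # hypothetical example data

# Unpack each (index, value) pair directly in the for statement.
for index, value in enumerate(avengers):
    print(index, value)

# start shifts the first index; here counting begins at 10.
shifted = [(index, value) for index, value in enumerate(avengers, start=10)]
print(shifted)   # [(10, 'hawkeye'), (11, 'iron man'), (12, 'thor')]
```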

[Figure 4: unpacking enumerate()]

2.2 zip()

zip() accepts an arbitrary number of iterables and returns a zip object, which is an iterator of tuples. We can turn this iterator of tuples into a list using list().
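A short sketch with two made-up lists (names and aliases are assumptions for illustration only):

```python
names = ["hawkeye", "iron man", "thor"]      # hypothetical example data
aliases = ["barton", "stark", "odinson"]     # hypothetical example data

z = zip(names, aliases)   # a lazy zip object (an iterator of tuples)
z_list = list(z)          # consume it into a list of tuples
print(z_list)             # [('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson')]

# Because z is an iterator, it is now exhausted.
leftover = list(z)
print(leftover)           # []
```

If the inputs differ in length, zip() stops at the shortest one.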

[Figure 5: zip()]

We could use a for loop to iterate over the zip object and print the tuples.
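For instance, with the same kind of hypothetical data, the loop can unpack each tuple as it goes:

```python
names = ["hawkeye", "iron man", "thor"]      # hypothetical example data
aliases = ["barton", "stark", "odinson"]     # hypothetical example data

lines = []
for name, alias in zip(names, aliases):
    # Each iteration yields one tuple, unpacked into name and alias.
    lines.append(f"{name}: {alias}")
    print(name, alias)
```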

[Figure 6: unpacking zip(), part 1]

We could also use the splat operator (*) to unpack the zip object and print all of its elements at once.
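A possible illustration, again with invented data. The last two lines go one step beyond the text above: passing the unpacked pairs back into zip() is a common idiom for "unzipping" them into separate tuples.

```python
names = ["hawkeye", "iron man", "thor"]      # hypothetical example data
aliases = ["barton", "stark", "odinson"]     # hypothetical example data

# * unpacks the zip object into separate positional arguments for print().
print(*zip(names, aliases))

# The same operator can "unzip" pairs back into separate tuples.
pairs = list(zip(names, aliases))
unzipped_names, unzipped_aliases = zip(*pairs)
print(unzipped_names)     # ('hawkeye', 'iron man', 'thor')
print(unzipped_aliases)   # ('barton', 'stark', 'odinson')
```

A zip object can be consumed only once, so the example builds a fresh zip() for each use.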

[Figure 7: unpacking zip(), part 2]

3. Using iterators to load large files into memory

[Figure 8: chunk]

The object created by pd.read_csv('data.csv', chunksize=1000) is an iterable, so we can iterate over it with a for loop in which each chunk is a DataFrame.
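The following sketch shows the pattern on a tiny in-memory CSV so that it runs without a file on disk. The column names, the values, and the chunk size of 2 are all assumptions made for the example. With a real file you would pass its path to pd.read_csv() instead of the StringIO object.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file.
csv_text = "country,population\nA,10\nB,20\nC,30\nD,40\nE,50\n"

total = 0
chunk_sizes = []

# Each chunk is a DataFrame holding at most `chunksize` rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk_sizes.append(len(chunk))
    total += int(chunk["population"].sum())

print(chunk_sizes)   # [2, 2, 1]
print(total)         # 150
```

Only one chunk is held in memory at a time, which is why this approach suits files too large to load in a single call.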

[Figure 9: chunk]

INFO 2 again: You can call pd.read_csv(filename, chunksize=100). This creates an iterable reader object, which means that you can use next() on it. (Source: DataCamp)

# Import the pandas package
import pandas as pd

# Initialize reader object: df_reader
df_reader = pd.read_csv('ind_pop.csv', chunksize=10)

# Print two chunks
print(next(df_reader))                  # print first chunk (row 0 to 9)
print(next(df_reader))                  # print second chunk (row 10 to 19)

Output: