1. Using iterators in PythonLand - upalr/Python-camp GitHub Wiki
1. Introduction to Iteretors
1.1 Iterable:
lists, strings, dictionaries, file connections are iterable + enumrrate() and zip()
INFO: for loop are used on those iterables
INFO 2 : you can do pd.read_csv(filename, chunksize=100). This creates an iterable reader object, which means that you can use next() on it.
iterable -> iter() -> iterator -> next()
iterable -> iter() -> iterator -> next() process
1.1.1 Breaking down the for loop
for i in rage(0, 11):
print(i)
rage(0, 11) : Iterable i: Iterator variabls (represent members of iterable)
2. Playing with Iterators (enumerate() and zip())
2.1 Enumerate()
Enumerate() is a function that takes any iterable as an argument such as a list and returns a special enumerate object which consists of pairs containing the elements of original iterable along with the index within the iterable.
We can use the function list() to turn this enumerate object to a list of tuples.
Enumarate object itself is also a iterable and we can loop over it.
2.2 Zip()
zip() accepts an arbitrary number of iterables and returns an zip object which is actually iterator of tuples we can turn this iterator of tuples into a list using list()
we could use a for loop to iterate over the zip object and print the tuples
we could also use the (*) operator to print all the elements
3. Using iterators to load large files into memory
The object created by pd.readcsv('data.csv', chunksize = 1000) is an iterable. So we can iterate over it using a for loop in which each chunk will be a DataFrame
INFO 2 Again : you can do pd.read_csv(filename, chunksize=100). This creates an iterable reader object, which means that you can use next() on it. DataCamp
# Import the pandas package
import pandas as pd
# Initialize reader object: df_reader
df_reader = pd.read_csv('ind_pop.csv', chunksize=10)
# Print two chunks
print(next(df_reader)) # print first chunk (row 0 to 9)
print(next(df_reader)) # print second chunk (row 10 to 19)
Output: