09 01 Introduction and Flat Files - HannaAA17/Data-Scientist-With-Python-datacamp GitHub Wiki

Text Files

Importing entire text files

# Open a file: file
file = open('moby_dick.txt', 'r')

# Print it
print(file.read())

# Close file
file.close()
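A minimal, self-contained sketch of the same open/read/close pattern. It assumes a small hypothetical sample file written to a temporary directory in place of moby_dick.txt. It shows that read() returns the whole file as one string, and that the file object only reports itself as closed after close() is called.

```python
import os
import tempfile

# Write a small hypothetical sample file (a stand-in for moby_dick.txt)
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
out = open(path, 'w')
out.write('first line\nsecond line\n')
out.close()

# Open the file in read mode and read its entire contents into one string
file = open(path, 'r')
text = file.read()
print(file.closed)  # False: the file is still open

# Close the file explicitly when done
file.close()
print(file.closed)  # True after close()
print(text)
```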

Context Manager and Importing text files line by line

You can bind a variable file by using a context manager construct: with open('huck_finn.txt') as file:
While still within this construct, the variable file is bound to the file object returned by open('huck_finn.txt'); thus, to print the first line of the file to the shell, all the code you need to execute is:

with open('huck_finn.txt') as file:
    print(file.readline())

There is no need to close the file explicitly: it is closed automatically when the with block exits.
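To check that the context manager really does close the file, here is a small sketch that inspects file.closed inside and after the with block. The sample file and its contents are hypothetical stand-ins for huck_finn.txt.

```python
import os
import tempfile

# Hypothetical sample file standing in for huck_finn.txt
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as out:
    out.write('first line\nsecond line\n')

with open(path) as file:
    first = file.readline()
    print(file.closed)  # False: still inside the with block

# Leaving the with block closed the file automatically
print(file.closed)  # True
print(first, end='')
```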

# Read & print the first 3 lines
with open('moby_dick.txt') as file:
    print(file.readline())
    print(file.readline())
    print(file.readline())
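Note that readline() keeps each line's trailing newline and print() adds another, so the output above shows a blank line between lines. A sketch of an alternative, assuming a hypothetical four-line sample file: iterate over the file object directly and take the first three lines with itertools.islice (a swapped-in technique, not part of the original notes), stripping the newlines.

```python
import itertools
import os
import tempfile

# Hypothetical sample file standing in for moby_dick.txt
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as out:
    out.write('line 1\nline 2\nline 3\nline 4\n')

# Iterating over a file object yields one line at a time, newline included;
# islice stops after the first three lines without reading the rest
with open(path) as file:
    first_three = [line.rstrip('\n') for line in itertools.islice(file, 3)]

print(first_three)  # ['line 1', 'line 2', 'line 3']
```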

Import Flat Files

Common file extensions:
- .csv (comma-separated values)
- .txt (plain text)
- Delimiters: commas, tabs
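To illustrate why the delimiter matters, here is a minimal sketch using NumPy's loadtxt() on two in-memory strings standing in for a comma-delimited .csv file and a tab-delimited .txt file. The data is hypothetical; io.StringIO is used only so the example needs no files on disk.

```python
import io
import numpy as np

# The same hypothetical two-row table, once comma-delimited and once tab-delimited
csv_text = "1,2,3\n4,5,6\n"
tsv_text = "1\t2\t3\n4\t5\t6\n"

# The delimiter argument must match the character that separates the fields
from_csv = np.loadtxt(io.StringIO(csv_text), delimiter=',')
from_tsv = np.loadtxt(io.StringIO(tsv_text), delimiter='\t')

print(np.array_equal(from_csv, from_tsv))  # True
```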

Using NumPy to import flat files

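NumPy arrays: the standard for storing numerical data
Essential for other packages, e.g. scikit-learn
loadtxt(): fails (raises a ValueError with the default float dtype) if the file mixes data types, such as strings and numbers
genfromtxt(): handles mixed data types and can return a structured array in which each column keeps its own type (e.g. with dtype=None)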
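A sketch contrasting the two functions on hypothetical mixed-type data held in memory. loadtxt() cannot convert the string column to float, while genfromtxt() with dtype=None infers a type per column; names=True reads the header row as field names, and encoding='utf-8' makes the string column come back as str rather than bytes.

```python
import io
import numpy as np

# Hypothetical mixed-type data: a header row, one string column, two numeric columns
text = "name,age,score\nalice,30,88.5\nbob,25,92.0\n"

# loadtxt with its default float dtype cannot parse the 'name' column
try:
    np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1)
except ValueError as err:
    print('loadtxt failed:', err)

# genfromtxt infers a dtype per column and returns a structured array
data = np.genfromtxt(io.StringIO(text), delimiter=',', names=True,
                     dtype=None, encoding='utf-8')
print(data.dtype.names)  # ('name', 'age', 'score')
print(data['score'])     # access a column by field name
```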

# Import numpy
import numpy as np

# Assign the filename: file
file = 'digits_header.txt'

# Load the data, skipping the header row and keeping only columns 0 and 2: data
data = np.loadtxt(file, delimiter='\t', skiprows=1, usecols=[0,2])

# Print data
print(data)
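A self-contained sketch of the same call, assuming a tiny hypothetical tab-delimited table with a one-line header in place of digits_header.txt. skiprows=1 drops the header (which loadtxt could not parse as numbers) and usecols=[0, 2] keeps only the first and third columns.

```python
import io
import numpy as np

# Hypothetical tab-delimited data with a header row (stand-in for digits_header.txt)
text = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"

# Skip the header and keep columns 0 and 2 only
data = np.loadtxt(io.StringIO(text), delimiter='\t', skiprows=1, usecols=[0, 2])
print(data)
# [[1. 3.]
#  [4. 6.]]
```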

Using pandas to import flat files as DataFrames

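Useful pd.read_csv() arguments:
- sep='\t': the field delimiter (the default is ',')
- comment='#': ignore everything after '#' on a line
- na_values='nothing': treat the string 'nothing' as a missing value (NaN)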
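A minimal sketch combining these three arguments on hypothetical in-memory data: the fields are tab-separated, one row carries a trailing '#' comment, and 'nothing' marks a missing value.

```python
import io
import pandas as pd

# Hypothetical tab-separated data with an inline comment and a missing-value marker
text = "x\ty\n1\t2#first row\n3\tnothing\n"

df = pd.read_csv(io.StringIO(text), sep='\t', comment='#', na_values='nothing')
print(df)
# Column y holds 2 and NaN, so pandas stores it as float
```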

# Import pandas
import pandas as pd

# Assign the filename: file
file = 'digits.csv'

# Read the first 5 rows of the file into a DataFrame: data
data = pd.read_csv(file, header=None, nrows=5)

# Build a numpy array from the DataFrame: data_array
data_array = data.values

# Print the datatype of data_array to the shell
print(type(data_array))  # <class 'numpy.ndarray'>
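A self-contained sketch of the same steps on a hypothetical headerless CSV with ten rows held in memory (a stand-in for digits.csv). With header=None, pandas labels the columns 0, 1, ...; nrows=5 reads only the first five rows. DataFrame.to_numpy() is the newer equivalent of .values and returns the same NumPy array.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical headerless CSV: ten rows of "i,i*10"
text = "\n".join(f"{i},{i * 10}" for i in range(10)) + "\n"

# header=None: the first line is data; nrows=5: read only five rows
data = pd.read_csv(io.StringIO(text), header=None, nrows=5)

# Convert the DataFrame to a NumPy array
data_array = data.to_numpy()
print(data_array.shape)                    # (5, 2)
print(isinstance(data_array, np.ndarray))  # True
```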