Python code solutions - mdepasca/miniature-adventure GitHub Wiki
Pandas
To read an ASCII file
I used read_csv, a function with many parameters.
The file is SIMGEN_PUBLIC_DES.DUMP from the Supernova Photometric Classification Challenge. I want to read just the supernova ID and its type. Here is a chunk of it:
NVAR: 36
VARNAMES: CID GENTYPE SNTYPE NON1A_INDEX GENZ HOSTZ....
SN: 857494 3 -9 1 3.9452e-01 3.7062e-01 3.6300e-02 ....
SN: 238436 3 -9 1 2.5966e-01 2.6396e-01 3.0200e-02 ....
SN: 870263 3 -9 1 8.2972e-01 8.2412e-01 2.0200e-02 ....
SN: 419054 3 -9 1 7.2242e-01 7.3212e-01 1.5300e-02 ....
SN: 463559 3 -9 1 8.8864e-01 9.0714e-01 1.7600e-02 ....
SN: 368063 3 -9 1 7.5189e-01 7.5669e-01 1.4300e-02 ....
SN: 105238 3 -9 1 6.0254e-01 5.9714e-01 2.2000e-02 ....
SN: 812941 3 -9 1 4.7147e-01 5.0657e-01 2.4900e-02 ....
Things to do are:
- skip the first line
- keep the second as header (i.e. the column names)
- select only the CID and GENTYPE columns.
import pandas as pd
dump = pd.read_csv(path, sep=' ', skiprows=0, header=1,
usecols=[1,2], skipinitialspace=True, engine='c')
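- sep=' ' is specified because its default is ','.
- skiprows is the number of lines to skip at the start of the file (or a list of zero-indexed line numbers to skip). With skiprows=0 nothing is skipped.
- header is the zero-indexed number of the row to use as header. With header=1 the second line supplies the column names and the line before it is discarded.
- usecols specifies the indices of the columns to read.
- skipinitialspace skips the spaces that follow a delimiter. It is used because there is more than one space before the CID column.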
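As a quick check, the call above can be tried on a small in-memory excerpt instead of the real file. This is only a sketch: `sample` is a shortened, four-column stand-in built from the lines shown earlier, and `io.StringIO` takes the place of `path`.

```python
import io
import pandas as pd

# Shortened stand-in for SIMGEN_PUBLIC_DES.DUMP.
# Assumption: only the first four columns are kept for illustration.
sample = (
    "NVAR: 36\n"                                  # discarded, it sits before the header row
    "VARNAMES: CID GENTYPE SNTYPE NON1A_INDEX\n"  # header row (header=1)
    "SN:  857494 3 -9 1\n"
    "SN:  238436 3 -9 1\n"
)

# Same parameters as the call on the real file; column 0 holds the "SN:" tag,
# so columns 1 and 2 are CID and GENTYPE.
dump = pd.read_csv(io.StringIO(sample), sep=' ', header=1,
                   usecols=[1, 2], skipinitialspace=True)

print(dump.columns.tolist())
print(dump.dtypes)
```

On an excerpt this clean both columns are read as integers, so the object dtype seen with the full file presumably comes from lines further down that are not purely numeric.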
The problem is that the type associated with each column is not numeric but the generic object.
In [1]: dump.dtypes
Out[1]:
CID object
GENTYPE object
dtype: object
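One way to find out why a column ended up as object is to look for the entries that cannot be parsed as numbers. The sketch below uses a hypothetical stand-in for `dump`; the `'END:'` entry is invented for illustration and is not taken from the real file.

```python
import pandas as pd

# Hypothetical stand-in for the object-typed frame.
# The last row is an invented example of a non-numeric line.
dump = pd.DataFrame({'CID': ['857494', '238436', 'END:'],
                     'GENTYPE': ['3', '3', None]})

# errors='coerce' turns anything that is not a number into NaN,
# so the NaN positions mark the offending rows.
as_number = pd.to_numeric(dump['CID'], errors='coerce')
offending = dump[as_number.isna()]

print(offending)
```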
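The object type is generic: pandas falls back to it when a column holds values it cannot read as numbers. To convert it to a numeric type a specific function has to be used. Here that function is convert_objects, which belongs to older pandas versions and has been removed from recent releases.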
In [60]: dump = dump.convert_objects(convert_numeric=True, copy=False)
In [61]: dump.dtypes
Out[61]:
CID float64
GENTYPE float64
dtype: object
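On a current pandas installation a similar conversion can be sketched with `pd.to_numeric`. The frame below is again a hypothetical stand-in for `dump`, with an invented non-numeric row. A float64 result like the one above would be consistent with some entries having been turned into NaN, but that is an assumption about the real file.

```python
import pandas as pd

# Hypothetical stand-in for the object-typed frame read from the dump file.
dump = pd.DataFrame({'CID': ['857494', '238436', 'END:'],
                     'GENTYPE': ['3', '3', None]})

# Convert every column; values that are not numbers become NaN,
# which forces the columns to float64.
converted = dump.apply(pd.to_numeric, errors='coerce')
print(converted.dtypes)

# Optionally drop the rows that failed to convert and go back to integers.
clean = converted.dropna().astype('int64')
print(clean.dtypes)
```

Dropping the NaN rows is only appropriate if those rows carry no data, so it is worth inspecting them first.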