python_pycharm_graph - 8BitsCoding/RobotMentor GitHub Wiki

PyCharm -> File -> Settings -> Project Interpreter -> '+' -> install the required packages

Install pandas and matplotlib.

Prerequisites

The examples use the following four files: mtcars.csv, csv_exam.csv, split.csv, and gpgga_exam.csv.

Printing CSV information

from pandas.io.parsers import read_csv
import matplotlib.pyplot as plt

df = read_csv('csv_exam.csv')
#df = read_csv('csv_exam.csv', names=['id', 'class', 'math', 'english', 'science'])

print('Type:', type(df))             # type of the object (a pandas DataFrame)
print('Shape:', df.shape)            # (rows, columns) of the DataFrame
print('Head:\n', df.head(3))         # first 3 rows
print('tail:\n', df.tail(3))         # last 3 rows
print('Values:\n', df.values)        # all of the data as a NumPy array
print('Describe:\n', df.describe())  # summary statistics (count, mean, std, min, quartiles, max)

print('-'*30)

Output

df = read_csv('csv_exam.csv', names=['id', 'class', 'math', 'english', 'science'])
print('Head:\n', df.head(3))

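The names given in names= become the column labels. Because csv_exam.csv already has a header line, that line is then read as the first data row (row 0 below):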

Head:
    id  class  math  english  science
0  id  class  math  english  science
1   1      1    50       98       50
2   2      1    60       97       60
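To make the effect of names= concrete, here is a minimal sketch that reads the same kind of data from an in-memory string instead of csv_exam.csv. The sample text and the replacement label student_id are assumptions made for illustration. Passing header=0 together with names= tells pandas to discard the file's own header line rather than keep it as data.

```python
import io
import pandas as pd

# Hypothetical stand-in for csv_exam.csv: one header line plus three data rows
csv_text = (
    "id,class,math,english,science\n"
    "1,1,50,98,50\n"
    "2,1,60,97,60\n"
    "3,1,45,86,78\n"
)

# names= alone: the header line is kept as an ordinary data row
with_names = pd.read_csv(
    io.StringIO(csv_text),
    names=['id', 'class', 'math', 'english', 'science'],
)

# names= with header=0: the header line is replaced by the new labels
renamed = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    names=['student_id', 'class', 'math', 'english', 'science'],
)

print(with_names.shape)   # one row more than the data, because of the header row
print(renamed.shape)
print(renamed.head(3))
```

Since the header row is text, every column of with_names is read as strings, which is usually not what a later numeric filter expects.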

print('Type:', type(df))

Type: <class 'pandas.core.frame.DataFrame'>

print('Shape:', df.shape)

Shape: (20, 5)

20 rows, 5 columns

print('Head:\n', df.head(3))

Prints the first 3 rows

Head:
    id  class  math  english  science
0   1      1    50       98       50
1   2      1    60       97       60
2   3      1    45       86       78

print('tail:\n', df.tail(3))

Prints the last 3 rows

tail:
     id  class  math  english  science
17  18      5    80       78       90
18  19      5    89       68       87
19  20      5    78       83       58

print('Values:\n', df.values)

Prints all of the data (as a NumPy array)

print('Describe:\n', df.describe())

Prints summary statistics for the data

Describe:
              id      class       math    english    science
count  20.00000  20.000000  20.000000  20.000000  20.000000
mean   10.50000   3.000000  57.450000  84.900000  59.450000
std     5.91608   1.450953  20.299015  12.875517  25.292968
min     1.00000   1.000000  20.000000  56.000000  12.000000
25%     5.75000   2.000000  45.750000  78.000000  45.000000
50%    10.50000   3.000000  54.000000  86.500000  62.500000
75%    15.25000   4.000000  75.750000  98.000000  78.000000
max    20.00000   5.000000  90.000000  98.000000  98.000000

Parsing data (adding/removing)

print('df.loc[0:1]\n', df.loc[0:1])

Prints rows 0 through 1

    id  class  math  english  science
0   1      1    50       98       50
1   2      1    60       97       60

print("df.loc[:, 'id':'math']\n", df.loc[:, 'id':'math'])

Prints the id through math columns for all rows

    id  class  math
0    1      1    50
1    2      1    60
...
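Label-based slicing with loc includes both endpoints, which is why df.loc[0:1] returns two rows. The following sketch contrasts it with position-based iloc, whose end is excluded. The three-row DataFrame is typed in by hand from the output shown earlier and is only an illustration.

```python
import pandas as pd

# Hand-typed sample rows (an assumption, not read from csv_exam.csv)
df = pd.DataFrame({
    'id': [1, 2, 3],
    'class': [1, 1, 1],
    'math': [50, 60, 45],
    'english': [98, 97, 86],
    'science': [50, 60, 78],
})

# loc slices by label and includes both ends: rows 0 and 1, columns id..math
by_label = df.loc[0:1, 'id':'math']

# iloc slices by position and excludes the end: row 0 only, columns 0 and 1
by_position = df.iloc[0:1, 0:2]

print(by_label)
print(by_position)
```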

new_df = df[df.math >= 50]
print('new_df(math>=50)\n', new_df)

Filtering rows by a condition on a value (1)

new_df(math>=50)
     id  class  math  english  science
0    1      1    50       98       50
1    2      1    60       97       60
5    6      2    50       89       98
...

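new_df = df[df.math >= 50]
new_df = new_df.sort_values(by='math', ascending=False)
print('new_df(math>=50)\n', new_df)

Sorting by a column in descending order. sort_values returns a new DataFrame instead of sorting in place, so the result has to be assigned back. Without the assignment the rows stay in their original order, as in this output:

new_df(math>=50)
     id  class  math  english  science
0    1      1    50       98       50
1    2      1    60       97       60
5    6      2    50       89       98
...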
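A small sketch of the difference between calling sort_values and keeping its result. The three rows are copied by hand from the output above; the variable names are made up for illustration.

```python
import pandas as pd

# Hand-typed rows with math >= 50 (an assumption, not read from csv_exam.csv)
df = pd.DataFrame({'id': [1, 2, 6], 'math': [50, 60, 50]})

passed = df[df.math >= 50]

# The return value is discarded here, so passed keeps its original order
passed.sort_values(by='math', ascending=False)

# Assigning the result is what actually gives a sorted DataFrame
sorted_passed = passed.sort_values(by='math', ascending=False)

print(passed.math.tolist())
print(sorted_passed.math.tolist())
```

The original index labels travel with the rows; reset_index(drop=True) can be chained if a fresh 0..n-1 index is wanted.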

new_df = df[(df.math >= 50) & (df.english >= 20)]
print('new_df(math>=50 & english>=20)\n', new_df)

Filtering rows by a condition on a value (2): several conditions combined with &
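When combining conditions, each comparison needs its own parentheses because & and | bind more tightly than >=. The sketch below shows and, or, and not on hand-typed sample rows (an assumption for illustration).

```python
import pandas as pd

# Hand-typed sample rows (an assumption, not read from csv_exam.csv)
df = pd.DataFrame({
    'id': [1, 2, 3],
    'math': [50, 60, 45],
    'english': [98, 97, 86],
})

both = df[(df.math >= 50) & (df.english >= 98)]     # both conditions hold
either = df[(df.math >= 50) | (df.english >= 98)]   # at least one holds
below = df[~(df.math >= 50)]                        # negation of a condition

print(both.id.tolist(), either.id.tolist(), below.id.tolist())
```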

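new_df2 = df[df['class'].astype(str).str.contains('test')]

Checking whether a value contains a given string. class is a Python keyword, so the column has to be accessed as df['class'] rather than df.class, and astype(str) is needed here because the class column holds integers.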
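As a minimal sketch of substring filtering, the example below uses a made-up DataFrame whose class column holds text, followed by the integer case. All values are hypothetical.

```python
import pandas as pd

# Hypothetical text column
text_df = pd.DataFrame({'id': [1, 2, 3], 'class': ['test-a', 'prod', 'test-b']})
mask = text_df['class'].str.contains('test')   # boolean Series, one entry per row
matches = text_df[mask]

# Hypothetical integer column: convert to strings before using .str
num_df = pd.DataFrame({'class': [1, 12, 3]})
num_matches = num_df[num_df['class'].astype(str).str.contains('1')]

print(matches)
print(num_matches)
```

str.contains treats the pattern as a regular expression by default; regex=False can be passed when a literal match is intended.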

df['new_column'] = 'new column'
print('new_column\n', df.head(3))

Adding a column

new_column
    id  class  math  english  science  new_column
0   1      1    50       98       50  new column
1   2      1    60       97       60  new column
2   3      1    45       86       78  new column
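The section title also mentions removing data, so here is a short sketch that adds a computed column and then drops a column again. The column name total and the hand-typed rows are assumptions for illustration.

```python
import pandas as pd

# Hand-typed sample rows (an assumption, not read from csv_exam.csv)
df = pd.DataFrame({
    'id': [1, 2, 3],
    'math': [50, 60, 45],
    'english': [98, 97, 86],
    'science': [50, 60, 78],
})

# Add a column computed row by row from existing columns
df['total'] = df.math + df.english + df.science

# Add a constant column, then remove it; drop returns a new DataFrame
df['new_column'] = 'new column'
trimmed = df.drop(columns=['new_column'])

print(trimmed)
```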

from pandas.io.parsers import read_csv
import matplotlib.pyplot as plt

df = read_csv('split.csv', names=['id'])

split = df.id.str.split(',',expand=True)
#split[['id', '1', '2']] = df.id.str.split(',', expand=True)

print('df\n', df)
print('split\n', split)

Splitting strings

df
       id
0  a,b,c
1  1,2,3
split
    0  1  2
0  a  b  c
1  1  2  3
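The split result gets integer column labels 0, 1, 2 by default. A sketch of two ways to give the parts readable names follows; the names first, second, and third are made up, and the DataFrame is built by hand rather than read from split.csv.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame read from split.csv
df = pd.DataFrame({'id': ['a,b,c', '1,2,3']})

# Option 1: split into a separate DataFrame and rename its columns
parts = df['id'].str.split(',', expand=True)
parts.columns = ['first', 'second', 'third']

# Option 2: assign the parts directly to new columns of df
df[['first', 'second', 'third']] = df['id'].str.split(',', expand=True)

print(parts)
print(df)
```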

Drawing graphs

from pandas.io.parsers import read_csv
import matplotlib.pyplot as plt

df = read_csv('mtcars.csv')

# Scatter plot of engine displacement (disp) against fuel economy (mpg)
plt.scatter(df.disp, df.mpg)
plt.xlabel('disp')
plt.ylabel('mpg')
plt.show()

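# Car weight (wt) against fuel economy (mpg), drawn as red squares joined by a dashed line ('rs--'); the scatter version is commented out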
#plt.scatter(df.wt, df.mpg)
plt.plot(df.wt,df.mpg,'rs--')
plt.xlabel('weight')
plt.ylabel('miles per gallon')
plt.grid()
plt.savefig('wt-mpg.png')
plt.show()
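plt.plot joins the points in row order, so with unsorted weights the dashed line zigzags across the figure. One possible approach, sketched below, is to sort by the x column before plotting. The three rows are hand-typed sample values rather than data read from mtcars.csv, and the Agg backend and temporary output path are assumptions that let the sketch run without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hand-typed sample values (an assumption, not read from mtcars.csv)
df = pd.DataFrame({'wt': [2.620, 3.215, 2.320], 'mpg': [21.0, 21.4, 22.8]})

# Sort by weight so the line runs left to right
ordered = df.sort_values(by='wt')

fig, ax = plt.subplots()
ax.plot(ordered.wt, ordered.mpg, 'rs--')  # red squares, dashed line
ax.set_xlabel('weight')
ax.set_ylabel('miles per gallon')
ax.grid()

# Save to a temporary location instead of showing the figure
out_path = os.path.join(tempfile.gettempdir(), 'wt-mpg-sketch.png')
fig.savefig(out_path)
plt.close(fig)
```

For a pure scatter plot the row order does not matter, so sorting is only needed when the points are connected.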

Parsing real GPS data

Use gpgga_exam.csv.

# Basic imports
from pandas.io.parsers import read_csv
import pandas as pd
import matplotlib.pyplot as plt

df = read_csv('gpgga_exam.csv', names=['gps_raw_data'])
# Read the CSV file.

split = df.gps_raw_data.str.split(',', expand=True )
# Split each sentence on ','.

split.columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19']
# The names do not have to be typed out by hand: split.columns = split.columns.astype(str) produces the same labels.

split1 = split[split['0'].str.contains('GPGGA')]
# Keep only the rows whose column '0' contains 'GPGGA'.

split2 = split1.loc[:, ['2', '4']].copy()
# Take columns '2' (latitude) and '4' (longitude) of split1; .copy() makes split2 independent, so the assignments below do not trigger SettingWithCopyWarning.

split2['2'] = pd.to_numeric(split2['2'])
split2['4'] = pd.to_numeric(split2['4'])
# Convert columns '2' and '4' from strings to numbers.

# Convert NMEA ddmm.mmmm values to decimal degrees: subtract the degrees part
# (37 and 127 are hard-coded for this data), divide the minutes by 60, add the degrees back.
split2['2'] = split2['2'].sub(3700, fill_value=0)
split2['4'] = split2['4'].sub(12700, fill_value=0)
split2['2'] = split2['2'].div(60, fill_value=0)
split2['4'] = split2['4'].div(60, fill_value=0)
split2['2'] = split2['2'].add(37, fill_value=0)
split2['4'] = split2['4'].add(127, fill_value=0)

# Print the result
print('split2\n', split2)

split2
             2          4
8   37.583465  127.02741
14  37.600132  127.02741
15  37.616798  127.02741
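The conversion above only works while the receiver stays at 37 degrees latitude and 127 degrees longitude. A more general sketch splits each value into degrees and minutes arithmetically. The helper name nmea_to_decimal_degrees is hypothetical, and the raw strings are back-computed from the first output row above rather than read from gpgga_exam.csv. Hemisphere letters (N/S, E/W) are ignored, as in the original code.

```python
import pandas as pd

def nmea_to_decimal_degrees(raw):
    """Convert NMEA (d)ddmm.mmmm values to decimal degrees (hypothetical helper)."""
    values = pd.to_numeric(raw, errors='coerce')  # unparsable entries become NaN
    degrees = values // 100                       # leading digits are whole degrees
    minutes = values - degrees * 100              # remaining mm.mmmm part is minutes
    return degrees + minutes / 60

# Assumed raw field values, back-computed from the printed result
lat = nmea_to_decimal_degrees(pd.Series(['3735.0079']))
lon = nmea_to_decimal_degrees(pd.Series(['12701.6446']))

print(lat.iloc[0], lon.iloc[0])
```

Applied to the DataFrame above, the helper could replace the six sub/div/add lines with one call per column.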
