pythonDataChartToolNote - juedaiyuer/researchNote GitHub Wiki
#Comparing Python Data Charting Tools#
Notes on how to use Python's data visualization tools, digging into the Dataquest material
##Datasets##
GitHub repository used
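The data comes from openflights
Routes dataset: each row corresponds to a flight path between two airports
Airports dataset: each row corresponds to one airport somewhere in the world
Airlines dataset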
##Reading the data##
The files have no header row, so we add column names by assigning to the `columns` attribute, and we read every column as a string
# Import the pandas library.
import pandas
# Read in the airports data.
airports = pandas.read_csv("airports.csv", header=None, dtype=str)
airports.columns = ["id", "name", "city", "country", "code", "icao", "latitude", "longitude", "altitude", "offset", "dst", "timezone"]
# Read in the airlines data.
airlines = pandas.read_csv("airlines.csv", header=None, dtype=str)
airlines.columns = ["id", "name", "alias", "iata", "icao", "callsign", "country", "active"]
# Read in the routes data.
routes = pandas.read_csv("routes.csv", header=None, dtype=str)
routes.columns = ["airline", "airline_id", "source", "source_id", "dest", "dest_id", "codeshare", "stops", "equipment"]
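The effect of `header=None` and `dtype=str` can be checked on a small in-memory sample. The sketch below is only an illustration: the two CSV lines are rebuilt from the `airports.head()` output shown in this note (the exact quoting in the real file is an assumption), and the names `sample_text`, `sample`, and `inferred_header` are made up for the example.

```python
import io
import pandas

# Two header-less rows, rebuilt from the airports.head() output in this note.
# The quoting style of the real airports.csv is an assumption.
sample_text = (
    '1,"Goroka","Goroka","Papua New Guinea","GKA","AYGA",'
    '-6.081689,145.391881,5282,10,"U","Pacific/Port_Moresby"\n'
    '2,"Madang","Madang","Papua New Guinea","MAG","AYMD",'
    '-5.207083,145.7887,20,10,"U","Pacific/Port_Moresby"\n'
)

# header=None keeps the first line as data; dtype=str keeps every value as text.
sample = pandas.read_csv(io.StringIO(sample_text), header=None, dtype=str)
sample.columns = ["id", "name", "city", "country", "code", "icao",
                  "latitude", "longitude", "altitude", "offset", "dst", "timezone"]

# Without header=None, pandas treats the first line as column names,
# so one airport silently disappears from the data.
inferred_header = pandas.read_csv(io.StringIO(sample_text))

print(sample[["id", "name", "latitude", "longitude"]])
print(len(sample), len(inferred_header))
```

Because everything is read as a string, numeric columns such as `latitude` have to be converted with `float()` before doing arithmetic on them.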
Inspect the data that has been loaded into memory
>>> routes.head()
airline airline_id source source_id dest dest_id codeshare stops equipment
0 2B 410 AER 2965 KZN 2990 NaN 0 CR2
1 2B 410 ASF 2966 KZN 2990 NaN 0 CR2
2 2B 410 ASF 2966 MRV 2962 NaN 0 CR2
3 2B 410 CEK 2968 KZN 2990 NaN 0 CR2
4 2B 410 CEK 2968 OVB 4078 NaN 0 CR2
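##Making a histogram##
Show the distribution of route lengths for the different airlines
We use the haversine formula to compute the distance between two points given by their longitude and latitude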
import math
def haversine(lon1, lat1, lon2, lat2):
    # Convert coordinates to floats.
    lon1, lat1, lon2, lat2 = [float(lon1), float(lat1), float(lon2), float(lat2)]
    # Convert to radians from degrees.
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])
    # Compute distance.
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.asin(math.sqrt(a))
    km = 6367 * c
    return km
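A few quick checks can make the formula easier to trust. The sketch below restates the same function in compact form (same 6367 km radius as in the note) and tries it on cases where the answer is known in advance; the variable names are made up for the example, and the two coordinate pairs are the Goroka and Madang rows from the airports output in this note.

```python
import math

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km, using the same 6367 km radius as the note."""
    lon1, lat1, lon2, lat2 = [math.radians(float(v)) for v in (lon1, lat1, lon2, lat2)]
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6367 * 2 * math.asin(math.sqrt(a))

# A point is zero km away from itself.
same_point = haversine("145.391881", "-6.081689", "145.391881", "-6.081689")

# 90 degrees of longitude along the equator is a quarter of a great circle,
# so the result should equal radius * pi / 2.
quarter_equator = haversine(0, 0, 90, 0)

# Swapping source and destination should not change the distance.
goroka_madang = haversine("145.391881", "-6.081689", "145.7887", "-5.207083")
madang_goroka = haversine("145.7887", "-5.207083", "145.391881", "-6.081689")

print(same_point, quarter_equator, goroka_madang, madang_goroka)
```

String arguments work because the function converts each coordinate with `float()`, which matters here since the CSV files were read with `dtype=str`.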
Use a function to compute the one-way distance between the source airport and the destination airport
def calc_dist(row):
    dist = 0
    try:
        # Match source and destination to get coordinates.
        source = airports[airports["id"] == row["source_id"]].iloc[0]
        dest = airports[airports["id"] == row["dest_id"]].iloc[0]
        # Use coordinates to compute distance.
        dist = haversine(dest["longitude"], dest["latitude"], source["longitude"], source["latitude"])
    except (ValueError, IndexError):
        pass
    return dist
Routes data
>>> routes.head()
airline airline_id source source_id dest dest_id codeshare stops equipment
0 2B 410 AER 2965 KZN 2990 NaN 0 CR2
1 2B 410 ASF 2966 KZN 2990 NaN 0 CR2
2 2B 410 ASF 2966 MRV 2962 NaN 0 CR2
3 2B 410 CEK 2968 KZN 2990 NaN 0 CR2
4 2B 410 CEK 2968 OVB 4078 NaN 0 CR2
Airports DataFrame
>>> airports.head()
id name city country code icao \
0 1 Goroka Goroka Papua New Guinea GKA AYGA
1 2 Madang Madang Papua New Guinea MAG AYMD
2 3 Mount Hagen Mount Hagen Papua New Guinea HGU AYMH
3 4 Nadzab Nadzab Papua New Guinea LAE AYNZ
4 5 Port Moresby Jacksons Intl Port Moresby Papua New Guinea POM AYPY
latitude longitude altitude offset dst timezone
0 -6.081689 145.391881 5282 10 U Pacific/Port_Moresby
1 -5.207083 145.7887 20 10 U Pacific/Port_Moresby
2 -5.826789 144.295861 5388 10 U Pacific/Port_Moresby
3 -6.569828 146.726242 239 10 U Pacific/Port_Moresby
4 -9.443383 147.22005 146 10 U Pacific/Port_Moresby
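With both frames in view, one plausible way to get a length for every route is to run `calc_dist` row by row with `DataFrame.apply(..., axis=1)`. The sketch below is an assumption about how the pieces fit together, not something the note itself shows. It uses two airports from the output above and three hypothetical routes (they are not rows of routes.csv), one of which points at a made-up airport id so that the fallback to 0 is visible.

```python
import math
import pandas

# Two airports taken from the airports.head() output (values kept as strings).
airports = pandas.DataFrame({
    "id": ["1", "2"],
    "name": ["Goroka", "Madang"],
    "latitude": ["-6.081689", "-5.207083"],
    "longitude": ["145.391881", "145.7887"],
})

# Hypothetical routes for illustration only; "9999" matches no airport id.
routes = pandas.DataFrame({
    "source_id": ["1", "2", "1"],
    "dest_id": ["2", "1", "9999"],
})

def haversine(lon1, lat1, lon2, lat2):
    # Same formula and 6367 km radius as in the note.
    lon1, lat1, lon2, lat2 = [math.radians(float(v)) for v in (lon1, lat1, lon2, lat2)]
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6367 * 2 * math.asin(math.sqrt(a))

def calc_dist(row):
    dist = 0
    try:
        # Look up both airports by id; .iloc[0] raises IndexError if none match.
        source = airports[airports["id"] == row["source_id"]].iloc[0]
        dest = airports[airports["id"] == row["dest_id"]].iloc[0]
        dist = haversine(dest["longitude"], dest["latitude"],
                         source["longitude"], source["latitude"])
    except (ValueError, IndexError):
        pass
    return dist

# axis=1 passes each route (one row) to calc_dist and collects the results.
route_lengths = routes.apply(calc_dist, axis=1)
print(route_lengths)
```

Routes whose airport cannot be found come back as 0, so they would pile up in the first bin of a histogram unless they are filtered out first. Filtering the airports frame once per route is also slow on the full dataset; indexing or merging on `id` would be a faster alternative.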
##source##