pythonDataChartToolNote - juedaiyuer/researchNote GitHub Wiki

# Comparison of Python Data Charting Tools #

How to use Python's data visualization tools: an in-depth walkthrough based on Dataquest.

## Dataset ##

GitHub repository used

The data comes from OpenFlights:

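- Routes dataset: each row is a flight path between two airports.
- Airports dataset: each row is one airport somewhere in the world.
- Airlines dataset: each row is one airline.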

## Reading the data ##

The data files have no header row, so we add column names by assigning to the `columns` attribute. We also read every column in as a string.

```python
# Import the pandas library.
import pandas
# Read in the airports data.
airports = pandas.read_csv("airports.csv", header=None, dtype=str)
airports.columns = ["id", "name", "city", "country", "code", "icao", "latitude", "longitude", "altitude", "offset", "dst", "timezone"]
# Read in the airlines data.
airlines = pandas.read_csv("airlines.csv", header=None, dtype=str)
airlines.columns = ["id", "name", "alias", "iata", "icao", "callsign", "country", "active"]
# Read in the routes data.
routes = pandas.read_csv("routes.csv", header=None, dtype=str)
routes.columns = ["airline", "airline_id", "source", "source_id", "dest", "dest_id", "codeshare", "stops", "equipment"]
```
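
To see what `header=None` and `dtype=str` do in practice, here is a minimal sketch that parses two rows copied from the `routes.head()` output in this note, read from an in-memory string rather than `routes.csv`. It passes the column labels through the `names=` parameter of `pandas.read_csv`, an equivalent alternative to assigning `.columns` afterwards, and uses `pandas.to_numeric` to turn a text column into numbers only at the point where arithmetic needs it. The variable names `sample` and `stops` are introduced here for illustration.

```python
import io

import pandas

# Two rows copied from the routes.head() output; field 7 (codeshare) is empty.
sample = io.StringIO(
    "2B,410,AER,2965,KZN,2990,,0,CR2\n"
    "2B,410,ASF,2966,KZN,2990,,0,CR2\n"
)

routes = pandas.read_csv(
    sample,
    header=None,  # the data has no header row
    names=["airline", "airline_id", "source", "source_id", "dest",
           "dest_id", "codeshare", "stops", "equipment"],  # labels set in the same call
    dtype=str,  # keep ids such as "2965" as text
)

print(type(routes.loc[0, "source_id"]))  # <class 'str'>
print(routes["codeshare"].isna().all())  # True: empty fields still become NaN

# Convert a column to numbers only when arithmetic is needed;
# errors="coerce" turns unparseable values into NaN instead of raising.
stops = pandas.to_numeric(routes["stops"], errors="coerce")
print(stops.sum())  # 0
```

Reading everything as text keeps identifiers exactly as they appear in the file; the cost is that numeric columns (coordinates, stops) must be converted explicitly, which is why `haversine` later calls `float()` on its inputs.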

Now we can look at the data that has been loaded into memory:

```
>>> routes.head()
  airline airline_id source source_id dest dest_id codeshare stops equipment
0      2B        410    AER      2965  KZN    2990       NaN     0       CR2
1      2B        410    ASF      2966  KZN    2990       NaN     0       CR2
2      2B        410    ASF      2966  MRV    2962       NaN     0       CR2
3      2B        410    CEK      2968  KZN    2990       NaN     0       CR2
4      2B        410    CEK      2968  OVB    4078       NaN     0       CR2
```

## Making a histogram ##

We want to show the distribution of route lengths across airlines.

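To get route lengths, we first need the distance between two points given by latitude and longitude. We compute this great-circle distance with the haversine formula: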
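
For reference, the steps in the `haversine` function of this section correspond to the standard haversine formula below, where $\varphi$ denotes latitude and $\lambda$ longitude (both in radians) and $R = 6367$ km is the Earth radius the code uses:

$$
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2 \arcsin\sqrt{a} \\
d &= R\,c
\end{aligned}
$$

As a quick sanity check, two points on the equator that are 90° of longitude apart give:

$$
\varphi_1 = \varphi_2 = 0,\ \lambda_2 - \lambda_1 = \tfrac{\pi}{2}:\quad a = \sin^2\tfrac{\pi}{4} = \tfrac{1}{2},\quad c = 2\arcsin\tfrac{\sqrt{2}}{2} = \tfrac{\pi}{2},\quad d = 6367 \cdot \tfrac{\pi}{2} \approx 10001\ \text{km}
$$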

```python
import math

def haversine(lon1, lat1, lon2, lat2):
    # Convert coordinates to floats (every column was read in as a string).
    lon1, lat1, lon2, lat2 = [float(lon1), float(lat1), float(lon2), float(lat2)]
    # Convert to radians from degrees.
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])
    # Compute distance.
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.asin(math.sqrt(a))
    # 6367 km is the Earth radius used to convert the central angle to kilometres.
    km = 6367 * c
    return km
```

Next, we write a function that computes the one-way distance between a route's source airport and destination airport:

```python
def calc_dist(row):
    dist = 0
    try:
        # Match source and destination to get coordinates.
        source = airports[airports["id"] == row["source_id"]].iloc[0]
        dest = airports[airports["id"] == row["dest_id"]].iloc[0]
        # Use coordinates to compute distance.
        dist = haversine(dest["longitude"], dest["latitude"], source["longitude"], source["latitude"])
    except (ValueError, IndexError):
        # IndexError: no airport with that id; ValueError: a coordinate float() cannot parse.
        # In either case the route keeps a distance of 0.
        pass
    return dist
```
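
`calc_dist` filters the whole `airports` table twice for every route, which can be slow when applied to the full routes file. One alternative, sketched here as a swapped-in vectorized technique rather than the approach of this note, is to parse the coordinates once, attach them to every route with `DataFrame.join`, and evaluate the haversine formula on whole NumPy arrays; the resulting lengths then go straight into a matplotlib histogram, the chart this section is building toward. The five airports use the coordinates from the `airports.head()` output; the routes between them are hypothetical, and `haversine_np`, `coords`, `merged` and `route_lengths` are names introduced for illustration. Unlike `calc_dist`, unmatched routes come out as NaN and are dropped rather than counted as 0 km.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas

# Airports 1-5 with the coordinates from the airports.head() output,
# kept as strings to match how the CSV files were read.
airports = pandas.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "latitude": ["-6.081689", "-5.207083", "-5.826789", "-6.569828", "-9.443383"],
    "longitude": ["145.391881", "145.7887", "144.295861", "146.726242", "147.22005"],
})
# Hypothetical routes between those airports (not real OpenFlights rows);
# "99" is an id with no matching airport.
routes = pandas.DataFrame({
    "source_id": ["1", "1", "2", "5", "99"],
    "dest_id": ["2", "5", "3", "4", "1"],
})


def haversine_np(lon1, lat1, lon2, lat2):
    """The same haversine formula as haversine(), evaluated element-wise on arrays."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))


# Parse the coordinates once and index them by airport id.
coords = airports.set_index("id")[["latitude", "longitude"]].apply(
    pandas.to_numeric, errors="coerce"
)
# Attach source and destination coordinates to every route in two joins;
# ids with no match get NaN coordinates.
merged = routes.join(coords.add_prefix("src_"), on="source_id").join(
    coords.add_prefix("dst_"), on="dest_id"
)
route_lengths = pandas.Series(
    haversine_np(
        merged["src_longitude"].to_numpy(),
        merged["src_latitude"].to_numpy(),
        merged["dst_longitude"].to_numpy(),
        merged["dst_latitude"].to_numpy(),
    ),
    index=merged.index,
)

# Unmatched routes are NaN here (calc_dist would return 0); drop them before plotting.
valid = route_lengths.dropna()
fig, ax = plt.subplots()
counts, bins, _ = ax.hist(valid, bins=3)
ax.set_xlabel("Route length (km)")
ax.set_ylabel("Number of routes")
# fig.savefig("route_lengths.png")  # uncomment to write the chart to disk
```

With the full data, the same join-and-compute pattern applies to all routes at once; whether to drop unmatched routes or keep them as 0 km, as `calc_dist` does, changes the left end of the histogram, so it is worth deciding explicitly.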

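The `airports` DataFrame, whose `latitude` and `longitude` columns supply the coordinates used by `calc_dist`: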

```
>>> airports.head()
  id                        name          city           country code  icao  \
0  1                      Goroka        Goroka  Papua New Guinea  GKA  AYGA
1  2                      Madang        Madang  Papua New Guinea  MAG  AYMD
2  3                 Mount Hagen   Mount Hagen  Papua New Guinea  HGU  AYMH
3  4                      Nadzab        Nadzab  Papua New Guinea  LAE  AYNZ
4  5  Port Moresby Jacksons Intl  Port Moresby  Papua New Guinea  POM  AYPY

    latitude   longitude altitude offset dst              timezone
0  -6.081689  145.391881     5282     10   U  Pacific/Port_Moresby
1  -5.207083    145.7887       20     10   U  Pacific/Port_Moresby
2  -5.826789  144.295861     5388     10   U  Pacific/Port_Moresby
3  -6.569828  146.726242      239     10   U  Pacific/Port_Moresby
4  -9.443383   147.22005      146     10   U  Pacific/Port_Moresby
```

## Source ##

A comparison of Python data charting tools (Python 数据图表工具的比较)