Usage_of_OpenSky_Query_Module - Flight-Path-Analysis/FlightPathAnalysis GitHub Wiki

Usage of src.data.opensky_query module.

This tutorial outlines the basic usage of the src.data.opensky_query module, which is used to query flight data and state vector data from the OpenSky database. It is accompanied by the Jupyter notebook located at tutorials/Usage Of OpenSky_Query Module.ipynb.

Definitions

  • Flight Data: Information about a flight: icao24 number, callsign, and first-seen and last-seen times.
  • State Vector Data: Information about a flight's position and movement at given points in time.

Requirements

  • An OpenSky login and access to the OpenSky database. Credentials can be acquired by contacting the administrators of the website.
  • A .yml file containing the credentials to access the OpenSky database.
  • A .yml file that is a copy of config/config_template.yml, containing information about the flights to be downloaded as well as the path to the credentials file.
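
As a quick sanity check on the requirements above, the sketch below verifies that a loaded config dictionary contains the path to the credentials file. The key names `base-configs` and `opensky-credentials` are taken from the code later in this tutorial; the helper name `get_credentials_path` and the sample path are hypothetical, and the real config template likely contains other keys that are not shown here.

```python
import os


def get_credentials_path(config, root_path="."):
    """Return the path to the credentials file named in the config.

    Hypothetical helper: raises KeyError with a readable message if the
    expected keys are missing from the config dictionary.
    """
    try:
        relative_path = config["base-configs"]["opensky-credentials"]
    except (KeyError, TypeError) as error:
        raise KeyError(
            "config is missing ['base-configs']['opensky-credentials']"
        ) from error
    return os.path.join(os.path.normpath(root_path), os.path.normpath(relative_path))


# Illustrative config dictionary; the path value is made up for this example.
example_config = {"base-configs": {"opensky-credentials": "config/credentials.yml"}}
credentials_path = get_credentials_path(example_config, root_path="..")
```

Failing early with a clear message can be easier to debug than a bare KeyError raised deep inside the setup code.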

Setting up access to database

This section sets up basic access to the database by loading the credentials and configuration files into a custom Querier object.

import sys
# This variable should indicate the path from this Jupyter Notebook to the root directory of the repo.
root_path = '../'
# Adds the repo's root to the list of paths
sys.path.append(root_path)

# Package to define and interpret dates
import datetime
# Package to read yml files
import yaml
# Package to handle file paths
import os
# Package for downloading opensky data 
from src.data import opensky_query
# Utilities package
from src.common import utils

# Normalizing all paths to work on all operating systems

root_path = os.path.normpath(root_path) # Path from this notebook to the root directory
config_path_from_root = os.path.normpath('config/config_tutorial.yml') # Path from root to the desired config file
config_path = os.path.join(root_path, config_path_from_root) # Defining path from this notebook to config file

# Loading config file as a dictionary
with open(config_path, 'r') as file:
    config = yaml.safe_load(file)
        
# Defining credentials filepath
credentials_file_from_root = os.path.normpath(config['base-configs']['opensky-credentials'])
credentials_file = os.path.join(root_path, credentials_file_from_root) # Defining path from this notebook to credentials file

# Loading credentials file
with open(credentials_file, 'r') as file:
    credentials = yaml.safe_load(file)

# Creates an instance of a logger class to log all that happens, optional (but encouraged).
logger = utils.Logger(config)

# Creates an instance of the Querier class used for querying the OpenSky database
opensky_querier = opensky_query.Querier(
    credentials,
    config,
    logger=logger)

Loading the Flight Data for flights in the airports and dates specified

The code below downloads the flight data for all flights found between the specified dates and between the specified airports. The result is returned as a pandas DataFrame.

Each row of the DataFrame describes one flight. Based on the definitions above and the columns used later in this tutorial, it includes at least:

  • icao24: ICAO 24-bit address of the aircraft
  • callsign: Callsign of the flight
  • firstseen: Time at which the flight was first seen
  • lastseen: Time at which the flight was last seen

# Looking at all flights from Baton Rouge to Dallas from `Jan 1, 2022` to `Jan 2, 2022`
departure_airport = 'KBTR'
arrival_airport = 'KDFW'
start_date = datetime.date(2022, 1, 1)
end_date = datetime.date(2022, 1, 2)

# Query the database and return the requested flights.
# As mentioned in the wiki, data files are sometimes missing for some days;
# the code handles this by adding those days as exceptions and repeating the query.
flights = opensky_querier.query_flight_data(
    {'departure_airport': departure_airport,
     'arrival_airport': arrival_airport},
    {'start': start_date,
     'end': end_date})
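
A common next step is to turn the first-seen and last-seen times of a flight into readable dates. The sketch below assumes that `firstseen` and `lastseen` are Unix timestamps in seconds (an assumption based on how the state vector `time` column is described; the query may return another type). The helper name `describe_flight_times` is hypothetical, and the sample values are illustrative rather than real query output.

```python
from datetime import datetime, timezone


def describe_flight_times(flight):
    """Return first-seen and last-seen UTC datetimes and the duration in minutes.

    `flight` can be any mapping with 'firstseen' and 'lastseen' keys, such as
    a dict or a single row of the flights DataFrame. Assumes Unix seconds.
    """
    first_seen = datetime.fromtimestamp(int(flight["firstseen"]), tz=timezone.utc)
    last_seen = datetime.fromtimestamp(int(flight["lastseen"]), tz=timezone.utc)
    duration_minutes = (last_seen - first_seen).total_seconds() / 60
    return first_seen, last_seen, duration_minutes


# Illustrative values, not real query output.
example_flight = {"icao24": "abc123", "firstseen": 1641038400, "lastseen": 1641043800}
first_seen, last_seen, duration_minutes = describe_flight_times(example_flight)
```

Because the helper only indexes by key, it should also accept a row selected with `flights.iloc[0]`, provided the assumption about the timestamp format holds.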

Loading State Vector data for a single flight

The code below downloads the state vectors of a specific flight and returns them in a pandas DataFrame.

The columns of the dataframe are:

  • time: Recorded Unix UTC time of when the data was gathered
  • lat: Latitude of the aircraft at the given time
  • lon: Longitude of the aircraft at the given time
  • baroaltitude: Altitude of the aircraft according to the internal barometer at the given time
  • geoaltitude: Altitude of the aircraft according to the internal gps at the given time

# Let's choose a random flight out of the dataframe we got
flight = flights.sample(1).iloc[0]

# As with querying flight data, some hours can have missing data ('bad hours'); these are handled by the code.
state_vectors = opensky_querier.query_state_vectors(
    flight['icao24'],
    flight['firstseen'],
    flight['lastseen'])
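
To illustrate what can be done with the `lat` and `lon` columns, the sketch below estimates the length of a flight path by summing great-circle (haversine) distances between consecutive samples. This is not part of the opensky_query module: the function names are hypothetical, the coordinates are assumed to be decimal degrees with no missing values, and the sample points are illustrative rather than real query output.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def path_length_km(lats, lons):
    """Sum the great-circle distances between consecutive (lat, lon) samples."""
    points = list(zip(lats, lons))
    return sum(
        haversine_km(lat1, lon1, lat2, lon2)
        for (lat1, lon1), (lat2, lon2) in zip(points, points[1:])
    )


# Illustrative samples along the equator, not real query output.
example_lats = [0.0, 0.0, 0.0]
example_lons = [0.0, 1.0, 2.0]
total_km = path_length_km(example_lats, example_lons)
```

With real data, a call such as `path_length_km(state_vectors['lat'], state_vectors['lon'])` should work once rows with missing positions have been dropped, again under the assumptions stated above.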