The Open Performance Data Initiative (OPDI) is a project launched in 2022 by the Performance Review Commission (PRC) to promote transparency, reproducibility, and accessibility of performance data in the European Air Traffic Management (ATM) system. The goal is to create a harmonized, open data environment that fosters data-driven decisions and ensures accountability across the aviation industry. For more information, see the OPDI Portal.
This repository contains the core extract, transform, and load (ETL) pipelines, SQL scripts, and scratch notebooks that prepare data for the OPDI platform.
python/v2.0.0/00_etl_ourairports.py
: Downloads and processes datasets from OurAirports (e.g., airports, runways) and loads the data into Hive.
python/v2.0.0/01_osn_statevectors_etl.py
: Downloads OpenSky Network (OSN) data and uploads it to Hive.
python/v2.0.0/02_tracks_etl.py
: Identifies and allocates IDs to each track in the OSN dataset using custom heuristics.
python/v2.0.0/03_osn_flight_table.py
: Maps flights to ADEP (departure) and ADES (arrival) airports and generates the flight table.
python/v2.0.0/04_osn_flight_events_etl.py
: Extracts flight events using OpenAP & PySpark algorithms.
python/v2.0.0/05_extract_opdi.py
: Extracts datasets for upload to the OPDI platform.
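The track-identification step (02_tracks_etl.py) is described above only as "custom heuristics". To illustrate the general idea, the sketch below starts a new track whenever an aircraft has been silent for longer than a threshold. This is not the repository's actual logic: the field names (`icao24`, `time`), the 30-minute gap, the ID format, and the function name `assign_track_ids` are all assumptions, and the real pipeline runs on PySpark rather than plain Python.

```python
from collections import defaultdict

# Assumed threshold for illustration only; not taken from the repository.
GAP_SECONDS = 30 * 60


def assign_track_ids(state_vectors, gap_seconds=GAP_SECONDS):
    """Return copies of the input records with a 'track_id' field added.

    state_vectors: iterable of dicts with 'icao24' (str) and 'time'
    (epoch seconds). Both field names are hypothetical.
    A new track starts when the gap between two consecutive observations
    of the same aircraft exceeds gap_seconds.
    """
    # Group observations per aircraft.
    by_aircraft = defaultdict(list)
    for record in state_vectors:
        by_aircraft[record["icao24"]].append(record)

    labelled = []
    for icao24, records in by_aircraft.items():
        records.sort(key=lambda r: r["time"])
        track_number = 0
        previous_time = None
        for record in records:
            gap_exceeded = (
                previous_time is not None
                and record["time"] - previous_time > gap_seconds
            )
            if gap_exceeded:
                track_number += 1
            labelled.append({**record, "track_id": f"{icao24}_{track_number}"})
            previous_time = record["time"]
    return labelled
```

A time-gap rule is only one possible heuristic; a production implementation would likely also consider position jumps, altitude, or callsign changes.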
SQL/OurAirports/create_oa_*.sql
: Creates Hive tables for the OurAirports datasets (airports, runways, etc.).
SQL/create_osn_*.sql
: Creates Hive tables for storing the OSN data.
SQL/create_airport_*.sql
: Creates airport grid tables in Hive for milestone tracking.
Airport_coverage_data.ipynb
: Exploratory notebook for airport coverage analysis.
Untitled1.ipynb
: Scratchbook for ad hoc calculations and explorations.
airport_grid_creation.ipynb
: Notebook used for creating Python scripts related to airport grid generation.
data-download.ipynb
: Notebook handling the download of data from OurAirports and OSN.
- Python 3.x
- Hive
- PySpark 3.x.x
- OpenSky Network API key (request one via the OpenSky Network website)
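As a quick sanity check of the Python-side requirements, a small pre-flight function along these lines could be run before starting the pipeline. It is a sketch rather than part of the repository: the function name `check_prerequisites` is hypothetical, and it only verifies the interpreter version and that the listed modules can be imported. It does not check Hive access or the OpenSky credentials.

```python
import importlib.util
import sys


def check_prerequisites(modules=("pyspark",)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # The requirements list Python 3.x.
    if sys.version_info < (3,):
        problems.append("Python 3.x is required")
    # find_spec returns None when a top-level module cannot be found.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Module '{name}' is not installed")
    return problems


for problem in check_prerequisites():
    print(problem)
```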
- Clone the repository:
  git clone https://github.com/your-org/opdi-etl.git
- Install required dependencies:
  pip install -r requirements.txt
- Set up your Hive environment by ensuring the correct permissions and configurations for storing the downloaded data.
- Run the ETL scripts in sequence, starting with 00_etl_ourairports.py:
  python python/v2.0.0/00_etl_ourairports.py
Each script can be run individually. Below is an example of how to run the OSN raw data ETL:
python python/v2.0.0/01_osn_statevectors_etl.py
Ensure the configuration files are set up properly for each step, especially paths for input/output directories and API keys where applicable.
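To run the whole pipeline in one go rather than script by script, a small driver such as the one below could be used. It is a sketch and not part of the repository: the script paths are the ones listed in this README, while the function name `run_pipeline` and its `runner` parameter (which exists only so the loop can be exercised without launching anything) are assumptions. It expects to be called from the repository root.

```python
import subprocess
import sys

# Script paths as listed in this README, in pipeline order.
PIPELINE = [
    "python/v2.0.0/00_etl_ourairports.py",
    "python/v2.0.0/01_osn_statevectors_etl.py",
    "python/v2.0.0/02_tracks_etl.py",
    "python/v2.0.0/03_osn_flight_table.py",
    "python/v2.0.0/04_osn_flight_events_etl.py",
    "python/v2.0.0/05_extract_opdi.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script with the current interpreter, in order.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit code, so later steps never run on top of a failed one.
    """
    for script in scripts:
        print(f"Running {script}")
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline()` runs all six steps; passing a shorter list, for example `run_pipeline(PIPELINE[2:])`, resumes from a later step after an earlier one has already completed.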
We welcome contributions to the project. Please follow these steps:
- Fork the repository.
- Create a feature branch (git checkout -b feature/my-new-feature).
- Commit your changes (git commit -am 'Add some feature').
- Push to the branch (git push origin feature/my-new-feature).
- Create a new pull request.
This project is licensed under the MIT License; see the LICENSE file for details.
- ATM: Air Traffic Management
- OPDI: Open Performance Data Initiative
- PRC: Performance Review Commission
- OSN: OpenSky Network
- ADEP/ADES: Aerodrome of Departure / Aerodrome of Destination (departure and arrival airports)