introduction - gabriel-milan/smtr_challenge GitHub Wiki

The goal of this challenge was to build a data pipeline with Dagster, as a way to evaluate both my ability to adapt to a new framework and my software quality skills.

The pipeline consists of three solids (pipelines and solids are Dagster concepts):

  • fetch_json_data, which makes GET requests to an arbitrary API URL, parsing the response JSON data into a Python dictionary
  • generate_dataframe, which generates a Pandas dataframe from a Python dictionary
  • save_dataframe_to_csv, which exports a Pandas dataframe to a CSV file
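
The three steps above can be sketched as plain Python functions. This is a minimal illustration, not the repository's code: it assumes the standard-library `urllib` for the GET request, assumes the dictionary maps column names to equal-length lists, and adds a hypothetical `run_pipeline` helper to show the order in which the steps connect. In the Dagster releases that provide solids, each function would additionally be decorated with `@solid` and wired together inside a `@pipeline` function.

```python
import json
from urllib.request import urlopen

import pandas as pd


def fetch_json_data(url: str) -> dict:
    """GET `url` and parse the JSON response body into a Python dictionary."""
    # Assumption: the standard library is used here; the real solid may use
    # another HTTP client.
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def generate_dataframe(data: dict) -> pd.DataFrame:
    """Build a DataFrame from a dictionary of column name -> list of values."""
    # Assumption: all lists have the same length; pandas raises ValueError
    # otherwise.
    return pd.DataFrame(data)


def save_dataframe_to_csv(df: pd.DataFrame, path: str) -> str:
    """Write the DataFrame to `path` as CSV (without the index) and return the path."""
    df.to_csv(path, index=False)
    return path


def run_pipeline(url: str, path: str) -> str:
    """Hypothetical helper: chain the three steps in pipeline order."""
    return save_dataframe_to_csv(generate_dataframe(fetch_json_data(url)), path)
```

Keeping each step as a small function with one input and one output is what lets Dagster connect them: the output of one solid becomes the input of the next.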

This repository contains two main directories:

  • smtr_challenge contains the implementation
    • solids contains the three solids previously described
    • pipelines contains a single pipeline which connects the three solids
  • test contains the test cases
    • test_solids.py contains test cases that ensure all solids work as expected
    • test_pipelines.py contains test cases that ensure the pipeline works as expected
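
To give an idea of what a solid-level test case might look like, here is a hypothetical sketch using the standard-library `unittest` module. The repository's actual tests are not reproduced here: `generate_dataframe` below is a local stand-in for the real solid's logic, and the test names and sample data are assumptions. In the Dagster releases that provide solids, helpers such as `execute_solid` and `execute_pipeline` can run the decorated versions inside a test.

```python
import unittest

import pandas as pd


def generate_dataframe(data: dict) -> pd.DataFrame:
    # Stand-in for the solid's logic (assumption): dictionary of
    # column name -> list of values.
    return pd.DataFrame(data)


class TestGenerateDataframe(unittest.TestCase):
    def test_columns_and_rows(self):
        # A well-formed dictionary becomes one column per key, one row per item.
        df = generate_dataframe({"id": [1, 2, 3], "name": ["a", "b", "c"]})
        self.assertEqual(list(df.columns), ["id", "name"])
        self.assertEqual(len(df), 3)

    def test_mismatched_lengths_raise(self):
        # Columns of different lengths cannot form a table, so pandas rejects them.
        with self.assertRaises(ValueError):
            generate_dataframe({"id": [1, 2], "name": ["a"]})
```

Covering both a valid input and a malformed one is a cheap way to check that a solid works as expected and fails loudly when it should.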

In the root directory you'll find two files describing the project dependencies:

  • requirements.txt contains dependencies for code execution
  • test_requirements.txt contains dependencies for executing test cases (all requirements for code execution are also needed)

Please read the next sections of this wiki to understand how the pipeline is implemented and how to run it.
