Programming and build (data engineering) - ONSdigital/DRAFT_DE_learning_roadmap GitHub Wiki

The GDD framework only describes Working, Practitioner, and Expert levels for Programming and build (data engineering). We have also included a description of the Awareness level here because of the wide range of data engineering skill levels at the ONS, so that apprentices and colleagues coming from other professions, such as data science, are represented and can find resources.

  • Awareness: show a basic understanding of software development principles and write simple scripts under supervision.
  • Working: design, code, test, correct and document simple programs or scripts under the direction of others.
  • Practitioner: use agreed standards and tools to design, code, test, correct and document moderate-to-complex programs and scripts from agreed specifications and subsequent iterations. Collaborate with others to review specifications where appropriate.
  • Expert: set local or team-based standards for programming tools and techniques and select appropriate development methods. Advise on the application of standards and methods and ensure compliance. Take technical responsibility for all stages and iterations in a software development project, providing method-specific technical advice and guidance to project stakeholders.

If you are looking for beginner tutorials on using Python, please have a look at Data analysis and synthesis awareness.

Working in data engineering isn't just about writing code that works; it's about writing code that works efficiently. On this page we include some resources detailing techniques and methods that can help with this.
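As a small illustration of the difference between code that works and code that works efficiently, here is a minimal sketch. The function names and the sample codes are made up for this example. Both functions return the same result, but the second builds a set first, so each membership check takes roughly constant time instead of scanning the whole list.

```python
def filter_with_list(records, valid_codes):
    """Keep records whose code is valid, scanning the list for every record."""
    return [r for r in records if r in valid_codes]


def filter_with_set(records, valid_codes):
    """Same result, but membership checks use a set built once up front."""
    valid = set(valid_codes)
    return [r for r in records if r in valid]


# Hypothetical sample data, for illustration only.
records = ["A1", "B2", "Z9", "A1"]
valid_codes = ["A1", "B2", "C3"]

print(filter_with_list(records, valid_codes))
print(filter_with_set(records, valid_codes))
```

On small inputs the difference is negligible. It tends to matter when both the records and the list of valid codes are large.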

If you are looking for training that leads to certification on GCP you may be able to take part in the Google Get Certified Program.

For general training across a range of Git and GitHub topics, please see GitHub Skills.

Awareness

Awareness: Concepts

An overview of the philosophy of Python programming can be found in PEP 20: The Zen of Python.

The ability to write object-oriented code is a key feature of Python. Here is a short video comparing functional programming with object-oriented programming: YouTube: FP vs OOP.
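To make the comparison concrete, here is a minimal sketch that computes the same mean in both styles. It is not taken from the video, and the names `mean_functional` and `ReadingSet` are invented for illustration. The functional version passes data through pure functions, while the object-oriented version keeps the data and the behaviour together in a class.

```python
from dataclasses import dataclass
from functools import reduce


# Functional style: pure functions, no shared state.
def add_reading(total: float, reading: float) -> float:
    return total + reading


def mean_functional(readings: list[float]) -> float:
    return reduce(add_reading, readings, 0.0) / len(readings)


# Object-oriented style: the data and the method that uses it live in one class.
@dataclass
class ReadingSet:
    readings: list[float]

    def mean(self) -> float:
        return sum(self.readings) / len(self.readings)


sample = [2.0, 4.0, 6.0]
print(mean_functional(sample))
print(ReadingSet(sample).mean())
```

Python lets you mix both styles in one codebase, so the choice is usually about which one keeps a given piece of pipeline code easiest to read and test.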

Learning Hub: Command line basics. You can also use this list if you often switch between Windows and Linux commands and need to translate between them.

Infrastructure as code is often considered a more advanced topic. We often first learn to manage infrastructure by pointing and clicking in the GCP console or equivalent. We include a short introduction to Terraform here: YouTube: Getting started with Terraform for Google Cloud.

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. As data engineers in the ONS we mostly use Docker for containerisation.

YouTube: Containerize Python Applications with Docker

YouTube: Logging

Learning Hub: Introduction to Continuous Integration

At the ONS we use Git, along with GitHub and GitLab, for version control. Version control is part of writing reproducible analytical pipelines (RAPs).

Unit testing is also an important part of pipeline building with Python. You can find more about this on the Testing page.
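As a rough idea of what a unit test for a pipeline step can look like, here is a minimal sketch. The `clean_postcode` function and the sample values are hypothetical, and the `test_` naming assumes a test runner such as pytest that collects functions by that prefix. The Testing page may recommend different conventions.

```python
def clean_postcode(raw: str) -> str:
    """Normalise a postcode-like string: trim, upper-case, collapse spaces."""
    return " ".join(raw.strip().upper().split())


def test_clean_postcode_strips_and_uppercases():
    # Leading and trailing whitespace is removed and letters are upper-cased.
    assert clean_postcode("  ab1 2cd ") == "AB1 2CD"


def test_clean_postcode_collapses_internal_spaces():
    # Runs of spaces inside the value become a single space.
    assert clean_postcode("ab1   2cd") == "AB1 2CD"
```

Each test checks one behaviour of one small function, which makes a failure easy to trace back to the step that caused it.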

Awareness: Cloud

There is a great series of YouTube videos called Cloud Bytes where you can get a quick overview of GCP services. We include some here:

Working

Learning Hub: DataFrames, Manipulation and Cleaning

Learning Hub: Introduction to Object Orientated Programming in Python

CS50P: Week 8 OOP

More on RAPs

Working: Cloud

Cloud Skills Boost (10 credits): Dev Apps with Cloud Functions

Practitioner

DAP CATS: Coding as a team