Programming and build (data engineering) - ONSdigital/DRAFT_DE_learning_roadmap GitHub Wiki
The GDD framework only describes the Working, Practitioner and Expert levels for Programming and build (data engineering). We have also included a description of the Awareness level here because of the wide range of data engineering skill levels at the ONS, so that apprentices and colleagues coming from other professions, such as data science, are represented and can find resources.
- Awareness: show a basic understanding of software development principles and write simple scripts under supervision.
- Working: design, code, test, correct and document simple programs or scripts under the direction of others.
- Practitioner: use agreed standards and tools to design, code, test, correct and document moderate-to-complex programs and scripts from agreed specifications and subsequent iterations. Collaborate with others to review specifications where appropriate.
- Expert: set local or team-based standards for programming tools and techniques and select appropriate development methods. Advise on the application of standards and methods and ensure compliance. Take technical responsibility for all stages and iterations in a software development project, providing method-specific technical advice and guidance to project stakeholders.
If you are looking for beginner tutorials on using Python, please have a look at Data analysis and synthesis awareness.
Working in data engineering isn't just about writing code that works, it's about writing code that works efficiently. On this page we include some resources detailing techniques and methods that can help with this.
If you are looking for training that leads to certification on GCP you may be able to take part in the Google Get Certified Program.
For general training across a range of Git and GitHub topics, please see GitHub Skills.
Awareness
Awareness: Concepts
An overview of the philosophy of Python programming can be found in PEP 20: The Zen of Python
The ability to write object-oriented code is a key feature of Python. Here is a short video comparing functional programming with object-oriented programming, YouTube: FP vs OOP.
Learning Hub: Command line basics. You can also use this list if you often switch between Windows and Linux commands and need to translate between them.
Infrastructure as code is often considered a more advanced topic. We often first learn to manage infrastructure by pointing and clicking in the GCP console or equivalent. We include a short introduction to Terraform here, YouTube: Getting started with Terraform for Google Cloud.
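To make the comparison concrete, here is a minimal sketch of the same task, summing a list of numbers, written in both styles. The names `RunningTotal`, `total_oop` and `total_fp` are made up for illustration and do not come from the video.

```python
from dataclasses import dataclass
from functools import reduce


# Object-oriented style: the data (total) and the behaviour (add)
# live together in a class, and the object's state changes over time.
@dataclass
class RunningTotal:  # hypothetical class, for illustration only
    total: float = 0.0

    def add(self, value: float) -> None:
        self.total += value


def total_oop(values):
    """Sum values by mutating a RunningTotal object."""
    accumulator = RunningTotal()
    for value in values:
        accumulator.add(value)
    return accumulator.total


# Functional style: a pure function with no mutable state.
# reduce folds the list into one value, starting from 0.0.
def total_fp(values):
    """Sum values by folding them with reduce."""
    return reduce(lambda acc, value: acc + value, values, 0.0)


if __name__ == "__main__":
    numbers = [1, 2, 3]
    print(total_oop(numbers), total_fp(numbers))
```

Both functions return the same result. The difference is where the state lives: inside an object that is updated step by step, or nowhere at all, with each intermediate value passed along as a function argument.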
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. As data engineers in the ONS we mostly use Docker for containerisation.
YouTube: Containerize Python Applications with Docker
Learning Hub: Introduction to Continuous Integration
At the ONS we use Git, along with GitHub and GitLab, for version control. Version control is part of writing reproducible analytical pipelines (RAPs).
- Learning Hub: Packaging and Documentation
- Article: What is Git? Our beginner’s guide to version control
- Learning Hub: Introduction to RAP
- Learning Hub: Loops and Functions
Unit testing is also an important part of pipeline building with Python. You can find more about this on the Testing page.
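As a rough illustration of what a unit test looks like, the sketch below tests a small, hypothetical cleaning function, `clean_postcode`, which is not taken from any ONS pipeline. The tests are plain functions containing `assert` statements, a layout that test runners such as pytest can discover. This is an assumption about tooling and the Testing page may recommend something different.

```python
def clean_postcode(raw: str) -> str:
    """Hypothetical helper: upper-case a UK postcode and put a single
    space before the final three characters."""
    compact = "".join(raw.split()).upper()
    return f"{compact[:-3]} {compact[-3:]}"


def test_clean_postcode_normalises_case_and_spacing():
    # Extra spaces are removed and letters are upper-cased.
    assert clean_postcode(" np10 8xg ") == "NP10 8XG"


def test_clean_postcode_inserts_missing_space():
    # A postcode typed without a space gets one inserted.
    assert clean_postcode("sw1a1aa") == "SW1A 1AA"


def test_clean_postcode_is_idempotent():
    # Cleaning an already clean postcode should change nothing.
    assert clean_postcode("NP10 8XG") == "NP10 8XG"


if __name__ == "__main__":
    # Run the tests directly when no test runner is available.
    test_clean_postcode_normalises_case_and_spacing()
    test_clean_postcode_inserts_missing_space()
    test_clean_postcode_is_idempotent()
    print("all tests passed")
```

Each test checks one behaviour and has a descriptive name, so a failure points straight at the rule that was broken.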
Awareness: Cloud
Cloud Bytes is a series of short YouTube videos that give a quick overview of GCP services. We include some here:
- YouTube: Cloud Storage in a minute
- YouTube: Cloud Functions in a minute. You can also learn more about Cloud Tasks and Cloud Scheduler in this video.
- YouTube: Pub/Sub in a minute. Many data engineers in the ONS will have no need for Pub/Sub because they won't be streaming data.
- YouTube: BigQuery in a minute
- YouTube: BigQuery ML in a minute
- YouTube: Cloud IAM in a minute
- YouTube: Cloud Logging in a minute
- YouTube: Cloud Monitoring in a minute
- YouTube: Dataproc in a minute
- YouTube: Dataplex in a minute
Working
Learning Hub: DataFrames, Manipulation and Cleaning
Learning Hub: Introduction to Object Orientated Programming in Python
More on RAPs
- Learning Hub: Introduction to Git
- Presentation: Make Git Happen
- Game: Oh My Git (for use on an off-network laptop only)
- Cheat Sheet: Git
- Learning Hub: Packaging Code with Python
Working: Cloud
Cloud Skills Boost (10 credits): Dev Apps with Cloud Functions