# Getting Started with Rust
## Overview
This document provides guidelines, best practices, and how-to guides for developing with Rust at ThinkParQ. In general, it does not attempt to recreate external documentation, but rather provides links and goes into detail on how things relate to our development process.
Table of Contents:
- Overview
- Setting up Rust
- Learning Rust
- Coding Standards
- Dependency Management
## Setting up Rust
- Download and install Rust from rustup. Alternatively, you can use your package manager if it provides `rustup`. Avoid installing a `rust` package - you would miss the tooling for updating to the latest version and switching toolchains.
- Install the nightly toolchain: `rustup toolchain install nightly`. We currently (May 2024) use it only for running the formatter, as nightly allows for a more extensive configuration.
- Install clippy: `rustup component add clippy`. `clippy` is a Rust linter which helps catch common mistakes and bad practices. The CI expects the code to pass it without warnings.
- Set up your editor/IDE: rust-analyzer is a powerful LSP server that provides all of the good editing tools for Rust and is highly recommended.
  - If you are using VSCode, you can just install the rust-analyzer extension. Do not use the extension named "Rust"; it is outdated and unmaintained.
  - Other editors/IDEs have different ways to set up and configure Rust LSP support. In some cases you need to install rust-analyzer manually, in others the editor might provide automatic setup.
- Configure rust-analyzer in your editor to run `cargo clippy` on save to immediately get feedback on mistakes while writing code. It also helps prevent accidentally checking in code with warnings - or at least lowers the chance. To use clippy, set rust-analyzer's `check.command` option to `clippy`.
  - In VSCode, the setting ID is `rust-analyzer.check.command`.
- Configure your editor to auto-format on save (might be enabled by default). We only accept properly formatted code; by enabling format-on-save, your code is always in the right shape. To set it up:
  - Find your editor's "format on save" setting and enable it. Note that this is not a rust-analyzer setting but belongs to your editor.
    - In VSCode, the setting ID is `editor.formatOnSave`.
  - As mentioned above, we use the nightly version of `rustfmt` for formatting. To use it instead of the default/stable one, set `rustfmt.extraArgs` to `+nightly`.
    - In VSCode, the setting ID is `rust-analyzer.rustfmt.extraArgs` (a combined example `settings.json` is shown after this list).
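For reference, a minimal VSCode `settings.json` combining the options mentioned above might look like this (a sketch - adapt it to your own setup):

```json
{
    // Run clippy instead of the default check command on save.
    "rust-analyzer.check.command": "clippy",
    // Use the nightly rustfmt for the extended formatter configuration.
    "rust-analyzer.rustfmt.extraArgs": ["+nightly"],
    // Format on save; this is an editor setting, not a rust-analyzer one.
    "editor.formatOnSave": true
}
```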
### Rust Version

In `beegfs-rust`, we use a fixed version of Rust. It is set in `rust-toolchain.toml` and is updated at irregular intervals. Cargo/rustup should download the required toolchain automatically when running a `cargo` command on the repo.
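For illustration, such a pin looks roughly like this (the channel value below is hypothetical - the real one is defined in `rust-toolchain.toml` in the repository):

```toml
# Sketch of a rust-toolchain.toml - check the repository for the actual pinned version.
[toolchain]
channel = "1.78.0"
```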
Rust releases a new version every six weeks - see the Rust Release Notes.
## Learning Rust

Rust is a complex language with a steep learning curve. It is not recommended to dive right into the BeeGFS code; get to know the basics first - the core principles, the borrow checker and so on. Key resources for starting off can be found on the official website.
Some starting points:
- A fresh beginner is strongly advised to first read through The Book. It contains a thorough and excellent introduction to the language.
- If you would rather learn by doing, the Rustlings course or Rust by Example might be great starting points.
- You might want to look through the Rust standard library.
## Coding Standards

### Code Hygiene

Before submitting a pull request, it is expected that the following tools have already been run:

- `cargo +nightly fmt` to format the code. We use the nightly version since it provides more features.
- `cargo clippy` to check for errors and lints.
With rust-analyzer integrated into the editor/IDE, these can be run automatically as you change/save files (see the setup procedure above).
Warnings are treated as errors and are thus not accepted.
### Code style and best practice

Coding style in general should follow the Rust API Guidelines. These provide rules for and guidance on higher-level design and implementation topics not covered by clippy. Following them helps keep the code base in a consistent and readable state.

We are not overly strict with most of the items covered within the guidelines (see the Checklist), but they should still be taken into account.

For now, special attention should be paid to the first chapter: Naming. Using consistent names increases code readability. The naming guidelines cover the general topics of casing, naming conversion functions, getters and more. In addition to that, the BeeGFS related naming is defined below.
### Specific naming conventions
The following naming rules specify and complement the generic ones from Naming:
#### Word order

If a name describes an action (meaning it contains a verb), use the "natural" verb-object order. Example: `fn set_value()`, not `fn value_set()`. Exception: when items in the same namespace belong to categories, the category comes first. For example, in the mgmtd command line options, `auth_enable` and `auth_file` belong together in the `auth` category, so `auth` comes first.
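A small illustration of both rules (the names below are made up, not actual BeeGFS identifiers):

```rust
use std::path::PathBuf;

// Actions use the natural verb-object order: set_value, not value_set.
fn set_value(_value: u64) { /* ... */ }

// Items of the same category share the category as prefix, so they group together:
struct MgmtdOptions {
    auth_enable: bool,
    auth_file: PathBuf,
}
```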
#### Descriptive interface parameters

Parameters that are part of an interface (e.g. function arguments) should not just be named `id` but (a bit) more descriptively, e.g. `target_id`. This follows the convention used in the mgmtd database schema and lets a reader immediately see which kind of ID it is.
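A minimal sketch (hypothetical function and type names):

```rust
type TargetId = u16;

// Preferred: the parameter name tells the reader which kind of ID is expected ...
fn remove_target(target_id: TargetId) { /* ... */ }

// ... instead of a bare `id`, which could refer to anything:
// fn remove_target(id: u16) { /* ... */ }
```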
#### BeeGFS related naming

BeeGFS related naming conventions should be followed internally in the code as well as when communicating with others (log messages, documentation, ...). Referring to the same thing always by the same name makes everything easier to comprehend. Internally, omitting a word for a shorter variable name or function argument is allowed, though, if the type makes clear what is expected.

- A buddy group is always called `buddy group`, not `buddy mirror group`, `mirror group` or `mirror buddy group` (as they are used randomly throughout the old code).
- A storage pool is called a `storage pool`. Since `storage_pool_id` is fairly long and used a lot, it can be (and usually is) abbreviated as `pool_id`.
- A capacity pool is called `capacity pool` in free text and, as part of names, `cap pool`.
- `meta` is used for metadata related names, not `metadata`, since it is shorter.
### Logging

Logging should be used sparingly to avoid hard-to-read logs (at least at the higher levels, `INFO` and above). In general, one combined log message for a whole operation should be preferred over logging several times. For example, when multiple failures can occur in a loop without the function returning an error, do not log on each iteration but collect the failures and log once after the loop ends.
This also prevents mixing up related log messages with unrelated ones from other tasks/threads.
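A sketch of that pattern (the `log` crate's `warn!` macro and the helper below are just examples, not the actual BeeGFS code):

```rust
use log::warn;

/// Tries to apply an update to each target and reports all failures at once.
fn apply_updates(targets: &[u16]) {
    let mut failed: Vec<u16> = Vec::new();

    for &target_id in targets {
        if try_update(target_id).is_err() {
            // Do NOT log here on every iteration ...
            failed.push(target_id);
        }
    }

    // ... instead emit one combined message after the loop.
    if !failed.is_empty() {
        warn!("Updating the following targets failed: {:?}", failed);
    }
}

// Hypothetical fallible operation, only here to make the sketch self-contained.
fn try_update(_target_id: u16) -> Result<(), std::io::Error> {
    Ok(())
}
```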
### Message handlers

Incoming requests should be handled as single, atomic operations that either complete successfully or fail as a whole. There are currently one or two exceptions where forwarding to other nodes is required.
The following should be taken into account when writing message handlers:
#### Only one `ERROR` log

When the request fails, make exactly one log entry containing the error chain leading to the failure. Usually, this is done automatically when returning a `Result::Err` from the message handler.
#### Only one `INFO` log on success, and only if system state changes

If a request succeeds and changes the state of the system (e.g. by writing to the database) - then, and only then, make an `INFO` level log entry telling the user what has been changed.
Info messages on read-only requests are superfluous since the system state doesn't change and the requestor already knows that the request succeeded by receiving the expected response. They would just clog the log file.
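Putting both rules together, a message handler might look roughly like this (the request/response types, `update_alias` and the use of `anyhow` and `log` are placeholders for illustration, not the actual API):

```rust
use anyhow::Result;
use log::info;

/// Handles a hypothetical "set alias" request as one atomic operation.
fn handle_set_alias(req: SetAliasRequest) -> Result<SetAliasResponse> {
    // Do the actual work. On failure, just return the error with `?` - the
    // caller logs the full error chain exactly once at ERROR level.
    update_alias(&req.entity, &req.alias)?;

    // The request changed system state, so emit exactly one INFO entry.
    info!("Alias of {} changed to {}", req.entity, req.alias);

    Ok(SetAliasResponse)
}

// Placeholder types and helpers to keep the sketch self-contained.
struct SetAliasRequest { entity: String, alias: String }
struct SetAliasResponse;
fn update_alias(_entity: &str, _alias: &str) -> Result<()> { Ok(()) }
```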
### Database access

#### Transactions

When accessing the database, the handle automatically starts a transaction which is committed after the provided closure/function has been processed. So, all operations executed in one call to `read_tx` / `write_tx` are automatically atomic. This ensures that data read using multiple statements is always consistent and also prevents partially successful operations.
Database interaction in a message handler should therefore usually be made within a single `read_tx` / `write_tx` call.
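Schematically, the pattern looks like this (the `Db`/`Tx` types and the `write_tx` signature are simplified stand-ins, not the real beegfs-rust API):

```rust
use anyhow::Result;

struct Db;
struct Tx;

impl Db {
    /// Runs the closure inside one transaction: commit on `Ok`, roll back on `Err`.
    fn write_tx<T>(&self, f: impl FnOnce(&Tx) -> Result<T>) -> Result<T> {
        let tx = Tx; // begin transaction (details omitted)
        let result = f(&tx);
        // commit or roll back depending on `result` (details omitted)
        result
    }
}

impl Tx {
    fn execute(&self, _sql: &str) -> Result<()> { Ok(()) }
}

/// All database work for one request goes into a single write_tx call,
/// so the statements below succeed or fail as one atomic unit.
fn handle_request(db: &Db) -> Result<()> {
    db.write_tx(|tx| {
        tx.execute("UPDATE targets SET ...")?;
        tx.execute("INSERT INTO events ...")?;
        Ok(())
    })
}
```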
#### Logging
Since database transactions are "all or nothing", logging in the database thread should usually be avoided and instead happen outside, after the transaction succeeds or fails. If something goes wrong, an appropriate error should be returned instead, which can then be caught and logged by the requestor.
#### User friendly errors

While the database enforces its integrity itself using constraints and some triggers, errors that occur due to a constraint violation are technical and possibly hard for a user to understand. In particular, they do not tell the user in clear language what went wrong.
To improve that, queries/operations that rely on incoming data satisfying constraints should explicitly check the data against these constraints and return a descriptive error in case it doesn't. For example, when a new unique alias is set but it already exists, we rather want to log an error like `Alias {} already exists` instead of `UNIQUE constraint failed: entities.alias: Error code 2067: A UNIQUE constraint failed`.
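A sketch of such an explicit check (the transaction helpers are hypothetical; `anyhow::bail!` is used here just as one way to return a descriptive error):

```rust
use anyhow::{bail, Result};

/// Sets a new alias, reporting a human-readable error instead of relying on
/// the database UNIQUE constraint to fail with a technical message.
fn set_alias(tx: &Tx, alias: &str) -> Result<()> {
    // Explicitly check the constraint-relevant input first ...
    if tx.alias_exists(alias)? {
        bail!("Alias {alias} already exists");
    }

    // ... then do the actual insert; the constraint remains as a safety net.
    tx.insert_alias(alias)
}

// Hypothetical transaction type to keep the sketch self-contained.
struct Tx;
impl Tx {
    fn alias_exists(&self, _alias: &str) -> Result<bool> { Ok(false) }
    fn insert_alias(&self, _alias: &str) -> Result<()> { Ok(()) }
}
```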
### Error handling

If an operation fails, the error should either be passed upwards by using `?` or handled by matching on the result. Panics might be caught but should be avoided. `panic!`, `.unwrap()`, `.expect()`, `assert!()` and other functions that fail by panicking must usually not be used.
There are some exceptions:
- They can (and have to) be used in tests.
- If an error shouldn't happen during normal operation and cannot easily be recovered from, panicking is allowed. This includes `assert!()` or `debug_assert!()` for checking invariants.
- If a value demands an `.unwrap()` of an `Option` and it is clear from the surrounding code that it cannot be `None`, unwrapping is also ok as a last resort. It is highly preferred, though, to restructure the code so that this is not necessary anymore. In almost all cases this is possible (e.g. by using the inner value before putting it in the `Option`, or by using one of the countless helper functions).
If `.unwrap()` has to be used, consider using `.expect()` instead and provide additional information.
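The following sketch (hypothetical functions, using `anyhow` for error context) contrasts the preferred styles:

```rust
use anyhow::{Context, Result};

/// Preferred: propagate errors upwards with `?`, adding context where useful.
fn load_config(path: &std::path::Path) -> Result<String> {
    let raw = std::fs::read_to_string(path)
        .with_context(|| format!("Reading config file {} failed", path.display()))?;
    Ok(raw)
}

/// Preferred over unwrapping: restructure so no `.unwrap()` is needed at all.
fn first_word(line: &str) -> Option<&str> {
    line.split_whitespace().next()
}

/// Last resort: if panicking is acceptable because the value "cannot" be missing,
/// prefer `.expect()` with a helpful message over a bare `.unwrap()`.
fn port_from_env() -> u16 {
    std::env::var("APP_PORT")
        .expect("APP_PORT must be set")
        .parse()
        .expect("APP_PORT must be a valid port number")
}
```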
### Documentation

Ensure you provide quality documentation using Rustdoc and reasonable commentary for all new/updated code. Rustdoc can auto-generate documentation from the Rustdoc comments in the source code; it is generally the preferred way to document code / an API.
If necessary, a README file can be provided, but these should generally be limited to providing step-by-step instructions or examples for a particular use case to help users understand generally how to use the package. API documentation should always be done using Rustdoc.
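A minimal Rustdoc example (the function itself is made up):

```rust
/// Calculates the free capacity of a target in bytes.
///
/// Returns `None` if `used` is larger than `total`, which indicates
/// inconsistent capacity reports.
pub fn free_capacity(total: u64, used: u64) -> Option<u64> {
    total.checked_sub(used)
}
```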
### Testing
Wherever it makes sense, include appropriate tests with your PR for newly added / modified code.
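A typical unit test lives in a `test` submodule next to the code, e.g. for the hypothetical `free_capacity` function from the documentation example above:

```rust
#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn free_capacity_handles_overuse() {
        assert_eq!(free_capacity(1000, 250), Some(750));
        assert_eq!(free_capacity(250, 1000), None);
    }
}
```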
## Dependency Management

`cargo`, Rust's package manager, has a very nice dependency management system built in. Unlike the Go package management, it usually does not (directly) rely on a git repo and git metadata to fetch the code and determine the version. Instead, a `Cargo.toml` file is provided with each package/crate that explicitly defines the version and other metadata.
When pulling in dependencies, they are generally sourced from crates.io. But it is also possible to source dependencies from git repositories, which is what you need to do to include our internal crates, like `protobuf` (see below).
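For illustration, a `[dependencies]` section mixing both sources might look like this (the version and branch values are placeholders, not the ones actually used):

```toml
[dependencies]
# Regular dependency, resolved via crates.io.
anyhow = "1"
# Internal crate, pulled directly from its git repository.
protobuf = { git = "https://github.com/thinkparq/protobuf", branch = "main" }
```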
### How To: Coordinate changes requiring updates in multiple repositories

If you are working on multiple repositories that depend on each other at the same time, it can get really tedious to make changes to a dependency, push them and pull them into the consumer repository. One way around that would be to temporarily modify the project's `Cargo.toml` to use a local repository instead. That, however, comes with the risk of accidentally committing and pushing the temporary modification and should be avoided. Fortunately, Cargo provides another way outside of the source tree: you can tell cargo to use local paths instead of remote repositories by defining patches. Add the following to your `$HOME/.cargo/config.toml` (create the file if it doesn't exist yet):
```toml
[patch.'https://github.com/thinkparq/protobuf']
protobuf = { path = "/path/to/local/repo/protobuf" }
```
The dependency's code (`protobuf` in this example) will now be used directly from the local source instead of the remote repository. You can now make changes to it and have them applied immediately. Note that this ignores the tag/branch required in `Cargo.toml` - you are responsible for checking out the required branch in your local copy of the dependency repository.
As a final note, the configuration made in `$HOME/.cargo/config.toml` applies globally. That might be undesired - fortunately, cargo can also be configured per project or per workspace (or, in general, per directory tree) by providing a `.cargo/config.toml` at the desired location. See the Cargo configuration documentation for more info.
### Module vs Package vs Crate

To make the terminology clear and avoid confusion, take a look at the following Rust terms and their meanings:
- Module: An isolated unit of code, defined by using the `mod` keyword. Most of the time, you will probably face one module per file, plus a potential submodule called `test` for tests. Modules exist in hierarchies and everything defined in a module is private to this module unless explicitly exported using `pub` (see the sketch after this list). A module is NOT a compilation unit.
- Package: A (sub-)structure of Rust files with a `Cargo.toml` at the root level. A complete, enclosed piece of software, providing a library and/or binaries when built. If you want to use `cargo`, you have to create a package to work in.
- Crate: Often used synonymously with Package, but actually refers to a compilation unit within a package. Most packages only have one compilation unit (a library or a binary), but they can actually contain one library crate and an arbitrary number of binary crates. So there is a slight difference compared to package - which most of the time doesn't matter.
  - An example would be the `mgmtd` package: It actually contains a library crate, which provides the actual management server, and two binary crates: one that provides the management server binary, calling into the library, and one that is generated by `cargo test` and runs the tests (by calling functions from the library).
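A tiny sketch of modules and visibility (made-up names):

```rust
mod capacity {
    // Private: only visible inside the `capacity` module.
    fn raw_free(total: u64, used: u64) -> u64 {
        total.saturating_sub(used)
    }

    // Exported with `pub`, so code outside the module can call it.
    pub fn free_percent(total: u64, used: u64) -> u64 {
        if total == 0 { 0 } else { raw_free(total, used) * 100 / total }
    }

    // The typical `test` submodule living next to the code it tests.
    #[cfg(test)]
    mod test {
        use super::*;

        #[test]
        fn percent() {
            assert_eq!(free_percent(200, 50), 75);
        }
    }
}

fn main() {
    // Only the `pub` item is reachable from outside the module.
    println!("{}", capacity::free_percent(200, 50));
}
```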
The meaning of Package and Module is quite different compared to Go, for example: in Go, a module defines the project and a package is a unit of code.