Getting Started with Rust - ThinkParQ/beegfs-rust GitHub Wiki

Overview

This document serves to provide guidelines, best practices, and how-to guides for developing using Rust at ThinkParQ. In general this document will not attempt to recreate external documentation, but rather provide links and go into detail on how things relate to our development process.

Setting up Rust

  1. Download and install Rust from rustup. Alternatively, you can use your package manager if it provides rustup. Avoid installing a rust package - you would miss the tooling for updating to the latest version and switching toolchains.
  2. Install the nightly toolchain: rustup toolchain install nightly. We currently (May 2024) use it only for running the formatter as nightly allows for a more extensive configuration.
  3. Install clippy: rustup component add clippy. clippy is a Rust linter that helps catch common mistakes and bad practices. The CI expects the code to pass it without warnings.
  4. Setup your editor/IDE: Rust analyzer is a powerful LSP server that provides all of the good editing tools for Rust and is highly recommended.
    • If you are using VSCode, you can just install the rust-analyzer extension. Do not use the extension named "Rust", it is outdated and unmaintained.
    • Other editors/IDEs have different ways to set up and configure Rust LSP support - it depends. In some cases you need to install rust-analyzer manually, in others the editor might provide auto setup.
  5. Configure rust-analyzer in your editor to run cargo clippy on save to immediately get feedback on mistakes while writing code. This also helps prevent accidentally checking in code with warnings - or at least lowers the chance. To use clippy, set rust-analyzer's check.command option to clippy.
    • In VSCode, the setting ID is rust-analyzer.check.command
  6. Configure your editor to auto-format on save (might be enabled by default). We only accept properly formatted code; with format-on-save enabled, your code is always in the right shape. To set it up:
    1. Find your editor's "format on save" setting and enable it. Note that this is not a rust-analyzer setting but belongs to your editor.
      • In VSCode, the setting ID is editor.formatOnSave
    2. As mentioned above, we use the nightly version of rustfmt for formatting. To use it instead of the default / stable one, set rustfmt.extraArgs to +nightly
      • In VSCode, the setting ID is rust-analyzer.rustfmt.extraArgs
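
For VSCode, the settings from steps 5 and 6 could be consolidated in settings.json roughly like this (the setting IDs are the ones named above; VSCode's settings.json tolerates comments):

```json
{
    // run clippy instead of the default `cargo check` on save
    "rust-analyzer.check.command": "clippy",
    // auto-format on save ...
    "editor.formatOnSave": true,
    // ... using the nightly rustfmt
    "rust-analyzer.rustfmt.extraArgs": ["+nightly"]
}
```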

Rust Version

In beegfs-rust, we use a fixed version of Rust. It is set in rust-toolchain.toml and is updated in irregular intervals. Cargo/rustup should download the required toolchain automatically when running a cargo command on the repo.
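
A rust-toolchain.toml pinning a version might look roughly like this (the channel shown is illustrative; check the repository for the actual pinned version):

```toml
[toolchain]
channel = "1.77.2"                  # illustrative; the repo pins its own version
components = ["clippy", "rustfmt"]  # tooling expected by the CI
```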

Rust releases a new version every six weeks - see the Rust Release Notes.

Learning Rust

Rust is a complex language with a steep learning curve. It is not recommended to dive right into the BeeGFS code; get to know the basics first - the core principles, the borrow checker and so on. Key resources for starting off can be found on the official website.

Coding Standards

Code Hygiene

Before submitting a pull request, the following tools are expected to have been run:

  • cargo +nightly fmt to format the code. We use the nightly version since it provides more features.
  • cargo clippy to check for errors and lints.

With Rust analyzer integrated in the Editor/IDE, these can be run automatically as you change/save files (see the setup procedure above).

Warnings are treated as errors and are thus not accepted.

Code style and best practice

Coding style in general should follow the Rust API Guidelines. These provide rules for and guidance on higher level design and implementation topics not covered by clippy. Following these helps to keep the code base in a consistent and readable state.

We are not overly strict about most of the items covered within the guidelines (see the Checklist), but they should still be taken into account.

For now, special attention should be paid to the first chapter: Naming. Using consistent names improves code readability. The naming guidelines cover the general topics of casing, naming conversion functions, getters and more. In addition to that, the BeeGFS related naming is defined below.

Specific naming conventions

The following naming rules specify and complement the generic ones from Naming:

Word order

If a name describes an action (meaning, it contains a verb), use the "natural" verb-object order. Example: fn set_value(), not fn value_set(). Exception: When items in the same namespace belong to categories, the category comes first. For example, in the mgmtd command line options: auth_enable and auth_file belong together to the auth category, so auth comes first.
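
Both rules can be sketched in a few lines (the Config and Options types are invented for illustration):

```rust
// Verb-object order for actions:
struct Config {
    value: i32,
}

impl Config {
    // Good: verb first ("set_value"), not "value_set".
    fn set_value(&mut self, value: i32) {
        self.value = value;
    }
}

// Category-first order for grouped items (hypothetical mgmtd-style options):
struct Options {
    auth_enable: bool, // "auth" category comes first ...
    auth_file: String, // ... so related options sort and group together
}
```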

Descriptive interface parameters

Parameters that are part of an interface (e.g. function arguments) should not just be named id but (a bit) more descriptively, e.g. target_id. This follows the convention used in the mgmtd database schema and lets a reader immediately see which kind of ID it is.
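
For example, a hypothetical signature (the function and its behavior are made up; only the parameter naming is the point):

```rust
// "target_id" tells the reader which kind of ID is meant,
// where a bare "id" would be ambiguous.
fn remove_target(target_id: u16) -> bool {
    // stub body for illustration: pretend every target except 0 exists
    target_id != 0
}
```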

BeeGFS related naming

BeeGFS related naming convention should be followed internally in the code as well as when communicating with others (log messages, documentation, ...). Referring to the same thing always with the same name makes everything easier to comprehend. Internally, omitting a word for a shorter variable name or function argument is allowed though if it is clear by the type what is expected.

  • A buddy group is always called a buddy group - not buddy mirror group, mirror group or mirror buddy group (as it is, inconsistently, throughout the old code)
  • A storage pool is called a storage pool. Since storage_pool_id is fairly long and used a lot, it can be (and usually is) abbreviated as pool_id.
  • A capacity pool is called capacity pool in free text, and, as part of names, cap pool.
  • meta is used in metadata-related names, not metadata, since it is shorter.

Logging

Logging should be used sparingly to avoid hard to read logs (at least at the higher levels, INFO and above). In general, one combined log message for a whole operation should be preferred over logging several times. For example, when multiple failures can occur in a loop without the function returning an error, do not log on each iteration but collect the failures and log once after the loop ends.

This also prevents mixing up related log messages with unrelated ones from other tasks/threads.
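
The collect-then-log pattern can be sketched like this (the items and the failure condition are invented for illustration):

```rust
// Process all items, collecting failures instead of logging per iteration.
fn process_items(items: &[i32]) -> Vec<String> {
    let mut failures = Vec::new();
    for item in items {
        if *item < 0 {
            // Do NOT log here on every iteration ...
            failures.push(format!("item {item} is negative"));
        }
    }
    // ... instead, emit one combined message after the loop ends.
    if !failures.is_empty() {
        eprintln!("{} item(s) failed: {}", failures.len(), failures.join("; "));
    }
    failures
}
```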

Message handlers

Incoming requests should be handled as single, atomic operations that either complete successfully or fail as a whole. There are currently one or two exceptions where forwarding to other nodes is required.

The following should be taken into account when writing message handlers:

Only one ERROR log

When the request fails, make exactly one log entry containing the error chain leading to the failure. Usually, this is done automatically when returning a Result::Err from the message handler.

Only one INFO log on success if and only if system state changes

If a request succeeds and the request changes the state of the system (e.g. writing to the database) - then, and only then, make an INFO level log entry telling the user what has been changed.

Info messages on read-only requests are superfluous since the system state doesn't change and the requestor already knows the request succeeded by receiving the expected response. They would just clog the log file.
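
The two rules above can be condensed into a toy handler (the request shape and the "logging" via println!/eprintln! are simplified stand-ins, not the real message-handler API):

```rust
// Toy sketch of the message-handler logging rules.
fn handle_request(is_write: bool, fails: bool) -> Result<(), String> {
    if fails {
        // One ERROR per failed request: returning Err lets the caller
        // log the full error chain exactly once.
        return Err("operation failed".to_string());
    }
    if is_write {
        // State changed -> exactly one INFO entry describing the change.
        println!("INFO: system state updated");
    }
    // Read-only success: no log entry, the response is confirmation enough.
    Ok(())
}
```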

Database access

Transactions

When accessing the database, the handle automatically starts a transaction which is committed after the provided closure / function has been processed. So, all operations executed in one call to read_tx / write_tx are automatically atomic. This ensures that data read using multiple statements is always consistent and also prevents partially successful operations.

Database interaction in a message handler should therefore usually be made within a single read_tx / write_tx call.
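
The real read_tx / write_tx signatures are not shown in this document; the stub below only mimics the closure-based shape described above, to illustrate why grouping all statements into one call keeps them atomic:

```rust
// Minimal stand-in for a transactional database handle.
struct Db {
    committed: bool,
}

impl Db {
    // Closure-based transaction, roughly in the shape described above:
    // everything inside `f` succeeds or fails as one unit.
    fn write_tx<T>(&mut self, f: impl FnOnce(&mut Db) -> Result<T, String>) -> Result<T, String> {
        // BEGIN happens implicitly here
        let res = f(self);
        // COMMIT on success, ROLLBACK on error - also implicit
        self.committed = res.is_ok();
        res
    }
}
```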

Logging

Since database transactions are "all or nothing", logging in the database thread should usually be avoided and instead happen outside, after the transaction succeeds or fails. If something goes wrong, an appropriate error should be returned instead, which can then be caught and logged by the requestor.

User friendly errors

While the database enforces its integrity itself using constraints and some triggers, errors that occur due to a constraint violation are technical and possibly hard for a user to understand. In particular, they do not tell the user in clear language what went wrong.

To improve that, queries / operations that rely on incoming data satisfying constraints should explicitly check that the data fulfills them and return a descriptive error in case it doesn't. For example, when a new unique alias is set but it already exists, we would rather log an error like

Alias {} already exists

instead of

UNIQUE constraint failed: entities.alias: Error code 2067: A UNIQUE constraint failed
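
The alias example can be sketched as an explicit pre-check (the HashSet stands in for the entities table; the real code would query the database inside the transaction):

```rust
use std::collections::HashSet;

// Check the constraint-relevant input explicitly and return a descriptive
// error, instead of letting the UNIQUE constraint fail with a technical one.
fn set_alias(existing: &HashSet<String>, alias: &str) -> Result<(), String> {
    if existing.contains(alias) {
        return Err(format!("Alias {alias} already exists"));
    }
    // ... the INSERT would happen here ...
    Ok(())
}
```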

Error handling

If an operation fails, the error should either be passed upwards using ? or handled by matching on the result. Panics can technically be caught, but relying on this should be avoided. panic!, .unwrap(), .expect(), assert!() and other functions that fail by panicking should normally not be used.

There are some exceptions:

  • Panicking can (and has to) be used in tests
  • If an error shouldn't happen during normal operation and can not easily be recovered from, panicking is allowed. This includes assert!() or debug_assert!() for checking invariants.
  • If an Option demands an .unwrap() and it is clear from the surrounding code that it cannot be None, unwrapping is also ok as a last resort. It is highly preferred, though, to restructure the code so that it is no longer necessary. In almost all cases this is possible (e.g. by using the inner value before putting it in the Option, or by using one of the many helper functions).

If .unwrap() has to be used, consider using .expect() instead and provide additional information.
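
A minimal sketch of the preferred patterns, using standard-library parsing as the fallible operation:

```rust
// Pass the error upwards with `?` ...
fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
    let port: u16 = s.parse()?;
    Ok(port)
}

// ... or handle it by matching on the result instead of panicking.
fn port_or_default(s: &str) -> u16 {
    match parse_port(s) {
        Ok(p) => p,
        Err(_) => 8008, // handled locally; no .unwrap() needed
    }
}
```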

Documentation

Ensure you provide quality documentation using Rustdoc and reasonable commentary for all new/updated code. Rustdoc can auto-generate documentation from the Rustdoc comments in the source code - it is generally the preferred way to document code / an API.

If necessary, a README file can be provided, but these should generally be limited to providing step-by-step instructions or examples for a particular use case to help users understand generally how to use the package. API documentation should always be done using Rustdoc.

Testing

Wherever it makes sense, include appropriate tests with your PR for newly added / modified code.

Dependency Management

cargo, Rust's package manager, has a very nice dependency management system built in. Unlike Go's package management, it usually does not rely (directly) on a git repo and git metadata to fetch the code and determine the version. Instead, a Cargo.toml file is provided with each package/crate that explicitly defines the version and other metadata.

When pulling in dependencies, they are generally sourced from crates.io. But it is also possible to source dependencies from git repositories, which is what you need to do to include our internal crates, like protobuf (see below).
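
Such Cargo.toml entries could look roughly like this (the anyhow dependency and the tag name are purely illustrative):

```toml
[dependencies]
# sourced from crates.io by version
anyhow = "1"
# internal crate sourced from a git repository, pinned to a tag
protobuf = { git = "https://github.com/thinkparq/protobuf", tag = "v1.0.0" }
```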

How To: Coordinate changes requiring updates in multiple repositories

If you are working on multiple repositories that depend on each other at the same time, it can get really tedious to make changes to a dependency, push them, and pull them into the consumer repository. One way around that would be to temporarily modify the project's Cargo.toml to use a local repository instead. That, however, comes with the risk of accidentally committing and pushing the temporary modification and should be avoided. Fortunately, Cargo provides another way outside of the source tree: you can tell cargo to use local paths instead of remote repositories by defining patches. Add the following to your $HOME/.cargo/config.toml (create the file if it doesn't exist yet):

[patch.'https://github.com/thinkparq/protobuf']
protobuf = { path = "/path/to/local/repo/protobuf"}

The dependency's code (protobuf in this example) will now be used directly from the local source instead of the remote repository. You can now make changes to it and have them applied immediately. Note that this ignores the tag / branch required in Cargo.toml - you are responsible for checking out the required branch in your local copy of the dependency repository.

As a final note, the configuration made in $HOME/.cargo/config.toml applies globally. That might be undesired - fortunately, cargo can also be configured per project or per workspace (or, in general, per directory tree) by providing a .cargo/config.toml at the desired location. See the Cargo configuration documentation for more info.
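
For example, the protobuf patch shown earlier can be placed in a project-local file so it only applies within that directory tree:

```toml
# <project root>/.cargo/config.toml - scoped to this project and its
# subdirectories instead of applying globally.
[patch.'https://github.com/thinkparq/protobuf']
protobuf = { path = "/path/to/local/repo/protobuf" }
```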

Module vs Package vs Crate

To make this clear and avoid confusion, take a look at the following Rust terms and their meanings:

  • Module: An isolated unit of code, defined using the mod keyword. Most of the time, you will probably face one module per file plus a potential submodule called test for tests. Modules exist in hierarchies, and everything defined in a module is private to that module unless explicitly exported using pub. A module is NOT a compilation unit.
  • Package: A (sub-) structure of Rust files with a Cargo.toml on the root level. A complete, enclosed piece of software, providing a library and/or binaries when built. If you want to use cargo, you have to create a package to work in.
  • Crate: Often used synonymously with Package, but actually refers to a compilation unit within a package. Most packages only have one compilation unit (a library or a binary), but they can actually contain one library crate and an arbitrary number of binary crates. So there is a slight difference compared to package - which most of the time doesn't matter.
    • An example would be the mgmtd package: it contains a library crate, which provides the actual management server, and two binary crates: one that provides the management server binary, calling into the library, and one that is generated by cargo test and runs the tests (by calling functions from the library).
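
Such a library-plus-binary layout might be declared in Cargo.toml roughly as follows (names and paths are illustrative, not the actual mgmtd manifest; cargo would infer most of this from src/lib.rs and src/main.rs anyway):

```toml
[package]
name = "example"
version = "0.1.0"
edition = "2021"

# the one library crate of the package (src/lib.rs)
[lib]
name = "example"

# a binary crate calling into the library (src/main.rs)
[[bin]]
name = "example-bin"
path = "src/main.rs"
```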

The meanings of Package and Module are quite different compared to Go, for example: in Go, a module defines the project and a package is a unit of code.