Bazel - kamialie/knowledge_corner GitHub Wiki

Bazel

Bazel is an artifact-based build system: artifacts need to be created, and there are dependencies between them. Each target is composed of inputs, outputs and an action that generates the outputs from the inputs. If a tool is needed by an action, it is also considered an input.

Other Bazel features:

  • Incremental builds
  • Cache for artifacts
  • Cache for test results
  • Parallel builds
  • Remote execution

Hermetic builds:

  • toolchain definition
  • file sandboxing
  • network sandboxing

Build phases:

  • loading
  • analysis
  • execution

General concepts

The directory containing the WORKSPACE file is the root of the main repository. Its name can be defined with the workspace(name = 'custom_name') function. External repositories can be defined in the WORKSPACE file with http_archive, git_repository, etc. A repository contains packages and subpackages.

A directory containing a BUILD file (which can also be named BUILD.bazel) defines a package. A subdirectory belongs to that package only if it does not contain a BUILD file of its own.

Label

The name of a target is called a label. The canonical format is @repository_name//package_name/subpackage:target. Depending on the context, some parts can be omitted, with the following meaning:

  • //package/subpackage:target - same repository where label is located
  • @//package/subpackage:target - main repository
  • //package/subpackage - target with the same name as the package assumed
  • :target - same package where label is located

Wildcards can be used to refer to multiple targets:

  • //package/subpackage:all - all targets inside a package
  • //package/... - all targets from the package and subpackages

Targets marked with tags = ["manual"] are skipped when wildcards are used. Same applies to targets that are not compatible with the target platform, e.g. target_compatible_with = ["@platforms//cpu:arm"].
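The wildcard rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not Bazel's implementation: the function name expand_pattern and the dictionary of targets are hypothetical, and platform compatibility filtering is left out.

```python
from typing import Dict, List, Set


def expand_pattern(pattern: str, targets: Dict[str, Set[str]]) -> List[str]:
    """Expand a target pattern against a map of full label -> tags.

    Hypothetical helper that mimics ":all" and "/..." wildcards only.
    """
    if pattern.endswith("/..."):
        root = pattern[2:-4]  # "//a/b/..." -> "a/b", "//..." -> ""

        def in_scope(pkg: str) -> bool:
            return root == "" or pkg == root or pkg.startswith(root + "/")
    elif pattern.endswith(":all"):
        root = pattern[2:-4]  # "//a/b:all" -> "a/b"

        def in_scope(pkg: str) -> bool:
            return pkg == root
    else:
        # An explicit label is kept even if the target is tagged "manual".
        return [pattern]

    matched = []
    for label, tags in targets.items():
        pkg = label[2:].split(":")[0]
        # Wildcards skip targets tagged "manual".
        if in_scope(pkg) and "manual" not in tags:
            matched.append(label)
    return sorted(matched)
```

Naming a manual target explicitly still returns it, which matches the purpose of the tag: such targets are built only on request.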

// - root of the repo

  1. optional repo name
  2. path after slashes defines the package, then a colon
  3. target name (could be a rule or file)

Omitting the target name implies a default target that has the same name as the package (the last part of the path, i.e. the name of the directory).

If the label is used in the same directory as the target, the full path can be omitted (:main).
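As a rough illustration of these rules, the sketch below splits a label string into its three parts. The names Label and parse_label are hypothetical, and the code covers only the forms listed in this section; it is not how Bazel parses labels internally.

```python
from typing import NamedTuple, Optional


class Label(NamedTuple):
    # None: same repository as the referencing file; "": main repository
    repo: Optional[str]
    package: str
    target: str


def parse_label(label: str, current_package: str = "") -> Label:
    """Split a label string into repository, package and target name."""
    repo = None
    if label.startswith("@"):
        repo, sep, rest = label[1:].partition("//")
        if not sep:
            raise ValueError(f"missing '//' in label: {label!r}")
        label = "//" + rest
    if label.startswith("//"):
        package, colon, target = label[2:].partition(":")
        if not colon:
            # No explicit target: default to the last path component.
            target = package.rsplit("/", 1)[-1]
        return Label(repo, package, target)
    # Relative form (":main" or "main"): resolved against the current package.
    return Label(repo, current_package, label.lstrip(":"))
```

For example, parse_label("//package/subpackage") yields the target name "subpackage", and parse_label(":main", current_package="app") resolves to package "app".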

Visibility

Best practice is to limit visibility as much as possible, e.g. package(default_visibility = ["//visibility:private"]).

Dependency graph

Bazel query is used to show dependencies between targets. Depending on the build phase (loading, analysis, execution), different graphs are used.

  • query - after the loading phase; includes all branches of every select statement
  • cquery - after the analysis phase (preferred, but lacks some features of query; resolves select statements based on the build configuration)
  • aquery - also after the analysis phase, with a focus on actions

Usage:

  • Get dependencies of a specific target
    bazel cquery "deps(//target)"
  • Is a target depending on another target? How?
    # All possible paths between 2 artifacts
    # ... represents any target
    bazel cquery "allpaths(//my_target, //folder/...)"
    # Get a single path between targets
    bazel cquery "somepath(//my_target, //folder:my_other_target)"
  • What targets are depending on this target?
    bazel cquery "rdeps(..., //my_target)"

Use --notool_deps and --noimplicit_deps to filter out implicit dependencies, e.g. compiler.
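To make the semantics of deps, rdeps and somepath concrete, here is a small Python model over a toy dependency graph. It is a sketch under simplifying assumptions (a plain dictionary of direct dependencies, hypothetical function names that merely mirror the query functions) and does not reflect how Bazel evaluates queries.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set

Graph = Dict[str, List[str]]  # target -> direct dependencies


def deps(graph: Graph, target: str) -> Set[str]:
    """Transitive dependencies of target, including target itself."""
    seen = {target}
    stack = [target]
    while stack:
        for dep in graph.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def rdeps(graph: Graph, universe: Iterable[str], target: str) -> Set[str]:
    """Targets in universe that transitively depend on target."""
    return {t for t in universe if target in deps(graph, t)}


def somepath(graph: Graph, start: str, end: str) -> Optional[List[str]]:
    """One dependency path from start to end, or None if there is none."""
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for dep in graph.get(node, []):
            if dep not in parent:
                parent[dep] = node
                queue.append(dep)
    return None
```

The universe argument of rdeps plays the role of the first argument in the rdeps query: only targets inside it are considered as possible dependents.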

Commands

Build

Builds specified targets. bazel help build - get all available options; add --long to also include explanations.

Popular options:

  • --keep_going - the build command also builds all dependencies needed by the target; this option changes the default behavior of stopping at the first error, so that Bazel builds as many targets as possible. A non-zero exit code is still returned if an error occurred.
  • --jobs=<n> - the number of jobs to run in parallel is decided by Bazel based on available and consumed CPU and RAM. This parameter further limits the number of parallel jobs.
  • --platforms=//platforms:<name> - specify a target platform that differs from the host system.
  • --toolchain_resolution_debug=.* - reports the toolchains that were considered and the one that was selected. Accepts a regular expression; .* shows all information.
  • --verbose_failures - if a command fails, prints the full command line
  • --nobuild - to force running only the analysis phase. Helps to narrow down analysis phase errors.
  • --subcommands=pretty_print - useful to know what commands get executed during a build. Produces a lot of output, useful only for debugging.
  • --config=<name> - pull additional options from .bazelrc file
    build:new_config --option1
    build:new_config --option2
    
  • --announce_rc - show all options taken from .bazelrc file

Run

Runs the specified targets. Pass parameters to the executable after --, bazel run <path>:<target> -- --arg1 --arg2. All build options can be used with run command; this also means that all build options in the bazelrc file are applied to run command automatically.

Popular options:

  • --run_under=<path> - run the target through another program, e.g. an analyzer; --run_under="/usr/bin/valgrind --quiet". Any command that accepts a program can be specified (including scripts).

Test

Builds and runs specified test targets. All bazel build options can be specified to bazel test. If nothing has changed, bazel doesn't rerun a test that has passed before; in the summary it is marked as cached.

Popular options:

  • --test_output - change test output behaviour
    • summary (default) - list of failed and succeeded tests, without logs
    • streamed - real time test logs, but tests won't run in parallel
    • errors - print logs of failed tests
    • all - logs of all tests
  • --test_env=FOO=bar - customize test runner environment (runs all tests); can be specified multiple times.
  • --test_arg=arg1 - specify arguments to be passed to test executable; can be specified multiple times.
  • --flaky_test_attempts=4 - if a test fails, it is run up to N times; if it then passes, it is marked as flaky in the summary.
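The retry behaviour described for --flaky_test_attempts can be sketched as follows. This is a simplified model with a hypothetical function name; the status strings are chosen for illustration and are not claimed to match Bazel's summary output exactly.

```python
from typing import Callable


def run_with_attempts(test: Callable[[], bool], attempts: int) -> str:
    """Run a test up to `attempts` times and classify the outcome."""
    for attempt in range(1, attempts + 1):
        if test():
            # Passing on the first try is a clean pass; passing only
            # after an earlier failure marks the test as flaky.
            return "PASSED" if attempt == 1 else "FLAKY"
    return "FAILED"
```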

Speed up builds

Caching

Each artifact and action has a corresponding generated fingerprint (hash). Given the fingerprint of inputs and action, a fingerprint of output can be generated. A cached artifact can then be retrieved from a key value store using this fingerprint.
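The idea can be illustrated with a toy key-value cache in Python. This is a minimal sketch, assuming SHA-256 as the hash and an in-memory dictionary as the store; the names fingerprint, action_key and ActionCache are hypothetical, and Bazel's real action keys cover more than shown here (environment, toolchain and so on).

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(data: bytes) -> str:
    """Content hash of a single input artifact."""
    return hashlib.sha256(data).hexdigest()


def action_key(command: str, input_digests: List[str]) -> str:
    """Combine the action and the fingerprints of its inputs into one key."""
    h = hashlib.sha256()
    h.update(command.encode())
    for digest in input_digests:  # input order is part of the key
        h.update(b"\0")
        h.update(digest.encode())
    return h.hexdigest()


class ActionCache:
    """In-memory key-value store: action key -> output artifact."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self.executions = 0  # how many times an action really ran

    def run(self, command: str, inputs: List[bytes],
            action: Callable[[List[bytes]], bytes]) -> bytes:
        key = action_key(command, [fingerprint(i) for i in inputs])
        if key not in self._store:
            # Cache miss: execute the action and remember its output.
            self.executions += 1
            self._store[key] = action(inputs)
        return self._store[key]
```

Running the same command on the same inputs a second time returns the stored output without executing the action again; changing any input produces a different key and therefore a cache miss.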

A reproducible build produces bit-for-bit identical output given the same set of inputs, build environment and build command.

In Bazel an artifact should declare all its dependencies; common rules expose:

  • srcs - list of files that need to be processed by the rule; some rules might expose additional parameters for input files, e.g. header files
  • deps - list of targets that the target depends on, only direct dependencies, compile time
  • data - list of files that are needed at runtime, e.g. config files, other binaries, etc
filegroup(
    name = "config",
    srcs = ["config.json"],
)

cc_library(
    name = "lib",
    srcs = ["library.cpp"],
    hdrs = ["library.h"],
)

cc_binary(
    name = "hello_world",
    srcs = ["main.cpp"],
    data = [":config"],
    deps = [":lib"],
)

Different types of cache:

  • Bazel server
  • Output directory
  • Repository cache
  • Disk cache
  • Remote cache

The key-value store, which maps the fingerprint of an output (generated from the fingerprints of the set of inputs and the action) to the artifact, can be local (disk cache) or remote.

Bazel server

Starts when the first command is run in the workspace and keeps running in the background; follow-up commands use the same server. Caches information from the analysis phase. Stop the server with bazel shutdown.

Output directory

Directory used by actions; contains final outputs and all intermediate files (see the output directory layout documentation). bazel clean deletes the contents of execroot, while bazel clean --expunge deletes the workspace's entire output base under _bazel_$USER.

Repository cache

Stores the files downloaded when fetching external repositories; enabled by default. Location can be found with bazel info repository_cache. Change default location with --repository_cache=<absolute_path> (disable by setting it to empty string). Reused across workspaces.

Disk cache

Local key-value store of fingerprints and artifacts. Here Bazel writes actions and action outputs. Useful when multiple workspaces or branches of the same project are used. Disabled by default, enable with --disk_cache=<absolute_path>. Might use a lot of disk space, and is not automatically cleaned.

Remote cache

Same as disk cache, but remote. Disabled by default, enable with --remote_cache=<uri_to_remote_cache>. Disable uploading local results with --remote_upload_local_results=false.

Remote execution

Home page.

External dependencies

git_repository - clone external git repository. With new_git_repository one can also specify own BUILD and WORKSPACE files.

http_archive - download http archive, BUILD and WORKSPACE files can be provided.

http_file - download a single file from URL and expose it as a file group.

maybe - add a repository only if it is not already present. To be used in combination with git_repository and http_archive.

maybe(
    name = "bazel_skylib",
    repo_rule = git_repository,
    remote = "...",
    tag = "...",
)

Import external dependency. After @ specify the name of the dependency, after // specify location of the target.

load("@bazel_skylib//lib:versions.bzl", "versions")

Reference external dependency:

cc_binary(
    name = "foo",
    deps = ["@external_dep_name//:target_a"],
)

Language support by external rules

In general the following steps are needed to enable support for languages that are not supported natively:

  1. Import set of rules as external dependency from the WORKSPACE file
  2. (depends on the rule) Import any required dependencies and register toolchains
  3. Load the rules and use them in a BUILD file

Tools

Bazelisk

Bazelisk reads the .bazelversion file, then installs and runs the correct version of Bazel. Possible version values:

  • 4.2.1
  • 4.x
  • latest
  • last_rc (latest release candidate)
  • last_green (last commit with green pipeline)

Buildifier

Buildifier helps with formatting Bazel files and detects bad practices and deprecated functionality. It can also automatically fix some issues.

# By default runs in fix mode
buildifier --mode=check --lint=off -v -r .

# Run fix mode for lint issues
buildifier --mode=fix --lint=fix -v -r .
# View issues that were not fixed automatically
buildifier --mode=fix --lint=warn -v -r .

To enable bazel completions, run bazel build //scripts:fish_completion and copy the resulting script from bazel-bin/scripts/bazel.fish to ~/.config/fish/completions/bazel.fish.

VSCode extension

Bazel VSCode extension:

  • Bazel: Buildifier Fix On Format - enable
  • Bazel: Enable Code Lens - enable
  • Bazel: Queries Share Server - enable

Extending Bazel

Macros

Used to group rules and/or remove code duplication. Macros are expanded during the loading phase and no longer exist afterwards, which means the code created by a macro is treated by Bazel the same as all other code.

Defined in .bzl file, then imported in BUILD file to be used.

  • .bzl:
    def cc_google_test(name, srcs, deps, visibility = None, **kwargs):
        # Since cc_test rule is native, have to have a native prefix
        native.cc_test(
            name = name,
            srcs = srcs,
            deps = deps + [
                "@googletest//:gtest",
                "@googletest//:gtest_main",
            ],
            visibility = visibility,
            **kwargs
        )
    
    def cc_tested_library(name, srcs, hdrs, tests, lib_deps = None, test_deps = None, visibility = None):
        native.cc_library(
            name = name,
            srcs = srcs,
            hdrs = hdrs,
            deps = lib_deps,
            visibility = visibility,
        )
    
        cc_google_test(
            name = name + "_test",
            srcs = tests,
            deps = [name] + (test_deps if test_deps else []),
            visibility = ["//visibility:private"],
        )
  • BUILD
    cc_library(
        name = "my_lib",
        srcs = ["my_lib.cpp"],
        hdrs = ["my_lib.h"],
    )
    
    cc_test(
        name = "my_lib_test",
        srcs = ["my_lib_test.cpp"],
        deps = [
            "my_lib",
            "@googletest//:gtest",
            "@googletest//:gtest_main",
        ],
    )
    
    cc_google_test(
        name = "my_lib_test",
        srcs = ["my_lib_test.cpp"],
        deps = ["my_lib"],
    )
    
    cc_tested_library(
        name = "my_lib",
        srcs = ["my_lib.cpp"],
        hdrs = ["my_lib.h"],
        tests = ["my_lib_test.cpp"],
    )

To test a macro, print the code after macro expansion. Three additional attributes help to identify where the macro is located and which one was used: generator_name, generator_function, generator_location:

bazel query ... --output=build


Rule

In essence, a rule describes an action to be performed on some inputs to produce some outputs. Like macros, rules are defined in .bzl files.

Simple rule:

  • .bzl
    # Always one and only parameter - ctx
    def _sample_rule_impl(ctx):
        out = ctx.actions.declare_file(ctx.label.name)
        ctx.actions.write(
            output = out,
            content = "File generated by test rule",
        )
        return [DefaultInfo(files = depset([out]))]
    
    sample_rule = rule(
        implementation = _sample_rule_impl,
    )
  • BUILD
    sample_rule(
        name = "generated_file",
    )

More complex rule. Common attributes can always be provided, so name does not have to be declared in attrs.

  • .bzl
    def _configure_file_impl(ctx):
        ctx.actions.expand_template(
            output = ctx.outputs.out,
            template = ctx.file.template,
            substitutions = ctx.attr.substitutions,
        )
    
    configure_file = rule(
        implementation = _configure_file_impl,
        attrs = {
            # Could be an output of another rule, so mark it as label
            "template": attr.label(
                # Allow only single file to be used, and limit extension
                allow_single_file = [".tpl"],
                mandatory = True,
            ),
            # Since output file is declared in attributes,
            # no need to use DefaultInfo provider to express
            # that this file can be used by targets
            "out": attr.output(mandatory = True),
            "substitutions": attr.string_dict(mandatory = True),
        }
    )
  • BUILD
    configure_file(
        name = "configured_file",
        template = "template.tpl",
        out = "my_file.txt",
        substitutions = {
            "{NAME}": "foo",
            "{SURNAME}": "bar",
        }
    )

Common ctx actions:

  • declare_directory
  • declare_file
  • declare_symlink
  • do_nothing
  • expand_template
  • run
  • run_shell
  • symlink
  • write


Configure

Platform

A platform is a set of constraints; a custom constraint or set can also be created. Constraints can be used to skip incompatible targets, use right toolchain, select right dependency and more. Common constraints:

  • OS
  • CPU type
  • System libraries
  • Drivers

Canonical Bazel Platforms.

Bazel recognizes the following environments:

  • Host - where bazel is executed; --host_platform

  • Execution - where actions are run; for compiled languages this is also where code is compiled

  • Target - where final targets are executed, --platforms

  • simple custom platform:

    platform(
        name = "windows_32_bits",
        constraint_values = [
            "@platforms//os:windows",
            "@platforms//cpu:x86_32",
        ],
    )
  • simple constraint:

    constraint_setting(name = "rendering_drivers")
    constraint_value(
        name = "opengl",
        constraint_setting = ":rendering_drivers",
    )
    platform(
        name = "windows_32_bits_opengl",
        constraint_values = [
            "@platforms//os:windows",
            "@platforms//cpu:x86_32",
            ":opengl",
        ]
    )

Platforms can be used to express compatibility. Incompatible targets are skipped when building with wildcards (//...). The special constraint @platforms//:incompatible explicitly marks a target as incompatible; it makes sense only together with a select statement, either as the default branch or for a specific branch.

cc_library(
    name = "linux_only_lib",
    srcs = ["linux_only_lib.cpp"],
    target_compatible_with = [
        "@platforms//os:linux",
    ]
)

Config and select

Custom configurations allow building different versions of a product. These config settings can be used to conditionally select targets (select statements). Platform constraints can also be used in select statements.

Flags can be of type int, boolean or string.

Only one branch of a select statement can be chosen; if none of the branches match, Bazel throws an error. Use //conditions:default to provide an else clause. The flag is set on the command line by its label, e.g. bazel build :sum_lib --//<package>:sum=rounding.

load("@bazel_skylib//rules:common_settings.bzl", "string_flag")

string_flag(
    name = "sum",
    # Default value
    build_setting_default = "accurate",
)

# Possible values of the flag
config_setting(
    name = "accurate_config",
    flag_values = {
        ":sum": "accurate",
    },
)

config_setting(
    name = "rounding_config",
    flag_values = {
        ":sum": "rounding",
    },
)

cc_library(
    name = "sum_lib",
    srcs = select({
        ":accurate_config": ["sum_lib.cpp"],
        ":rounding_config": ["sum_rounding_lib.cpp"],
        # "//conditions:default": [],
    })
)
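The branch selection described above can be modelled in Python as follows. This is a simplified sketch with hypothetical names (resolve_select, the set of active conditions): it treats more than one matching branch as an error, whereas Bazel has additional rules for resolving overlapping conditions.

```python
from typing import Any, Dict, Set

DEFAULT = "//conditions:default"


def resolve_select(branches: Dict[str, Any], active: Set[str]) -> Any:
    """Pick the value of the single branch whose condition is active."""
    matches = [c for c in branches if c != DEFAULT and c in active]
    if len(matches) == 1:
        return branches[matches[0]]
    if len(matches) > 1:
        raise ValueError(f"ambiguous select, multiple matches: {matches}")
    if DEFAULT in branches:
        # No condition matched: fall back to the else clause.
        return branches[DEFAULT]
    raise ValueError("no matching conditions and no default branch")
```

With the sum_lib example, an active set of {":rounding_config"} would select ["sum_rounding_lib.cpp"]; with an empty active set and no default branch, the sketch raises an error, mirroring the failure Bazel reports.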

Toolchain

Toolchain describes a set of tools to be used by a rule. Contents depend on a language and/or the rule. Documentation.

Depending on a platform different tools might be needed, thus, toolchain compatibility can also be specified:

  • exec_compatible_with - execution environment constraints
  • target_compatible_with - target platform constraints

Custom toolchains can also be created. Provide one with the --extra_toolchains parameter on the command line or in the .bazelrc file, or with the register_toolchains function in the WORKSPACE file.

