Quick overview of MSBuild - KirillOsenkov/Bliki GitHub Wiki

MSBuild usage

  1. command-line builds (optionally with /r to restore NuGet packages)
  2. in-IDE builds (Visual Studio)
  3. programmatic usage via API

Command-line builds

Evaluation

Command-line builds primarily do two things: evaluation and execution. Evaluation reads, parses and caches the XML files, inlines all imported project and targets files, and then interprets the resulting XML. Parsing XML is not a significant cost: each XML file is parsed only once and then reused. Concatenating the import tree into a single large logical "XML file" is called preprocessing. You can view the preprocessed XML for a project by running msbuild /pp my.csproj or in the binlog viewer by clicking Preprocess in the context menu of a project.

Most of the time is spent on the actual evaluation of all expressions and conditions and on populating properties and items. The output of an evaluation is a set of properties, items (with metadata) and item definitions to use during the build.

The evaluator makes several passes:

  1. initial properties
  2. properties
  3. item definition groups
  4. items
  5. using tasks
  6. targets
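
One visible consequence of evaluating in passes is that an item can use a property that is defined later in the file, because all properties are evaluated before any item. The following is a minimal toy sketch of that idea, not MSBuild's evaluator: it models only the properties pass and the items pass, and the element list, the expand helper and the property names are all hypothetical.

```python
import re

# Toy project in document order: the item appears BEFORE the property it uses.
ELEMENTS = [
    ("item", "Compile", "$(SourceDir)/Program.cs"),
    ("property", "SourceDir", "src"),
    ("property", "OutputDir", "$(SourceDir)/bin"),
]

def expand(text, properties):
    """Replace $(Name) with the property value, or an empty string if undefined."""
    return re.sub(r"\$\((\w+)\)", lambda m: properties.get(m.group(1), ""), text)

def evaluate(elements):
    properties = {}
    for kind, name, value in elements:      # properties pass, in document order
        if kind == "property":
            properties[name] = expand(value, properties)
    items = {}
    for kind, name, value in elements:      # items pass, after all properties
        if kind == "item":
            items.setdefault(name, []).append(expand(value, properties))
    return properties, items

properties, items = evaluate(ELEMENTS)
print(properties["OutputDir"])  # src/bin
print(items["Compile"])         # ['src/Program.cs']
```

Within the properties pass, order still matters: a property only sees properties defined above it.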

Execution

The second phase of the build after evaluation is the execution of the target graph. The target graph is computed dynamically by topologically sorting BeforeTargets, DependsOnTargets and AfterTargets. As soon as the next target is determined, it is executed. When a target runs, tasks within the target are instantiated and executed. Tasks may include the MSBuild task, intrinsic tasks (add item/remove item/update item/set property, expressed in XML as ItemGroup and PropertyGroup inside targets) as well as user tasks such as Csc, ResolveAssemblyReference, etc.

The new static graph feature can help with correctness and determinism, but it is currently not on by default (see /graph and /isolate command-line switches). https://twitter.com/CodobanMihai is an expert on static graph.

Each project has its own evaluation and set of properties and items. A project starts execution for a given set of entry targets using a given evaluated project instance. Initial evaluation data (properties and items) can and will be further mutated by intrinsic tasks and user tasks (outputs of tasks can be assigned into properties and items). Each project can be built multiple times (when different entry targets are requested) and these builds may share the same evaluation (if the initial global properties are the same), in which case the changes to properties and items are seen and shared across builds of various targets of the same project. If a set of global properties changes, a new evaluation is made (e.g. two different target frameworks do not share an evaluation, each gets its own).
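
The sharing rule above can be pictured as a cache keyed by the project path plus its global properties. This is only an illustrative sketch under that assumption; the ProjectInstance class and the get_evaluation function are hypothetical stand-ins, not the MSBuild API.

```python
class ProjectInstance:
    """Hypothetical stand-in for an evaluated project with mutable state."""
    def __init__(self, path, global_properties):
        self.path = path
        self.properties = dict(global_properties)
        self.items = {}

_evaluations = {}

def get_evaluation(path, global_properties):
    """Reuse the evaluation when the path and global properties match."""
    key = (path, frozenset(global_properties.items()))
    if key not in _evaluations:
        _evaluations[key] = ProjectInstance(path, global_properties)
    return _evaluations[key]

# First build of the project for one target framework.
first = get_evaluation("my.csproj", {"TargetFramework": "net8.0"})
first.properties["Mutated"] = "true"   # e.g. a task output assigned to a property

# A later build of another target with the same global properties sees the change.
second = get_evaluation("my.csproj", {"TargetFramework": "net8.0"})

# Different global properties get their own, untouched evaluation.
other = get_evaluation("my.csproj", {"TargetFramework": "net48"})

print(second is first)                 # True
print("Mutated" in other.properties)   # False
```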

Target build ordering

  • One might think that BeforeTargets and AfterTargets enable topological sorting, but they do not. One of them takes precedence and the other might be simply ignored. Hence using both of these usually looks like a bug or is at least misleading.
  • If you imagine existing targets as pegs with bundles of other targets hanging off of them, BeforeTargets will just hang your target to the left of the target you specify, in an unpredictable and random order. AfterTargets will hang it to the right, but where it will go inside that bunch of other targets hanging off your target is not determined.
  • When MSBuild runs a target, it looks for all targets that name this target in their BeforeTargets or AfterTargets, and runs first the Before ones, then the target itself, then the After ones.
  • DependsOnTargets will force the other targets you specify to run before yours. This may change the order in which those targets would have run had your target not been present.
  • Specifying DependsOnTargets on your target doesn’t guarantee that your target will run. It will run only when someone else calls it.
  • Some common MSBuild targets specify the *DependsOn property that allows you to plug yourself into that list. It is similar to BeforeTargets conceptually, but DependsOn is less prescriptive than BeforeTargets (it just ensures that the target will run before any other target that depends on it, whereas BeforeTargets binds closer to a single target that you have specified)
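
The rules in this list can be sketched as a small simulation. It is a simplified model and not MSBuild's scheduler: it ignores conditions and Inputs/Outputs, and the order among several targets hooked onto the same target is just dictionary order here, which should not be relied on. The build_order function and the target names are hypothetical.

```python
def build_order(targets, entry):
    """Return the order in which targets would run for one entry target.

    targets maps a target name to a dict with optional 'depends_on',
    'before' and 'after' lists of target names.
    """
    order, started = [], set()

    def run(name):
        if name in started:                   # each target runs at most once
            return
        started.add(name)
        for dep in targets[name].get("depends_on", []):
            run(dep)                          # DependsOnTargets run first
        for other, spec in targets.items():
            if name in spec.get("before", []):
                run(other)                    # targets hooked in front of this one
        order.append(name)                    # the target itself
        for other, spec in targets.items():
            if name in spec.get("after", []):
                run(other)                    # targets hooked behind this one

    run(entry)
    return order

TARGETS = {
    "Build":    {"depends_on": ["Compile"]},
    "Compile":  {},
    "Generate": {"before": ["Compile"]},
    "Sign":     {"after": ["Compile"]},
    "Unhooked": {"depends_on": ["Compile"]},  # nothing requests it
}

order = build_order(TARGETS, "Build")
print(order)  # ['Generate', 'Compile', 'Sign', 'Build']
```

Note that Unhooked never runs: declaring DependsOnTargets on it does not cause anything to call it.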

Project System

The Visual Studio Project System is a component in Visual Studio responsible for reading the .sln, listing projects, and loading/unloading each project. Each .csproj can be loaded by one of two project systems in VS: legacy (old, written in C++) and CPS + Managed (modern, written in .NET). VS currently decides which project system to use when loading a project by the project GUID specified in the .sln file or other hints. CPS is the new Common Project System written in C# that uses MEF for composition and exposes immutable trees as data structures modeling the project contents and build results (not related to Roslyn trees). SDK-style projects open with CPS. Managed project system https://github.com/dotnet/project-system extends CPS (which is agnostic of the project language) to provide C# and VB specific logic.

Logically the managed project system is split into three components (simplified):

  1. CPS (Microsoft.VisualStudio.ProjectSystem.Implementation.dll) - core CPS, orchestrates everything, uses MEF with scoped composition, TPL DataFlow, immutable collections, everything is async. CPS is not open source but you can find docs here: https://github.com/microsoft/VSProjectSystem
  2. Project Services (Microsoft.VisualStudio.ProjectServices.dll) - independent component that hosts MSBuild and satisfies build requests (and does build caching as well). This component essentially hosts the MSBuild BuildManager API and is very similar to MSBuildWorkspace and Buildalyzer. This is the part mostly connected to MSBuild. It is not open source. https://twitter.com/panopticoncntrl is an expert on Project Services.
  3. Managed project system (Microsoft.VisualStudio.ProjectSystem.Managed.dll) - extensions to CPS to support C#/VB and .csproj/.vbproj specific logic, as well as populating the Roslyn workspace. This part is open-source at: https://github.com/dotnet/project-system. https://twitter.com/drewnoakes works on the managed project system. https://twitter.com/davkean is a knowledgeable alumnus who used to work on it too.

Visual Studio for Mac uses a different (but also managed) project system, not related to CPS. https://twitter.com/lastexitcode is the maintainer and expert on VSMac project system and NuGet integration. It is simplified compared to CPS but not as refined and may not be as tuned. It is not open source or rehostable outside of VSMac.

Visual Studio Code uses OmniSharp (which is similar to MSBuildWorkspace) for the project system.

Function of a project system

A project system does primarily three things: evaluation, design-time builds and "real" builds. The goal of evaluation is the same as for the command line: to establish a set of properties and items to use during the build (either design-time or "real"). VS runs builds out-of-proc and explicitly disables the in-proc node to make sure no build logic runs within the VS process. This protects the stability, memory consumption and responsiveness of the IDE: catastrophic build failures will not crash the IDE.

The goal of a design-time build is to establish the command-line arguments for each Csc invocation for each project. When using MSBuild programmatically via an API (MSBuildWorkspace, Buildalyzer, etc), the goal is the same. Once we know what sources, references and compiler switches are passed to Csc for each project (and each target framework), this information is used to populate the Roslyn workspace, which serves as a central repository of knowledge about the loaded solution and powers language services and editor experiences.

Almost no data structures are reused across different design-time builds, so technically you could kill the MSBuild nodes that performed a previous design-time build, then request a new one and it will spawn new nodes and build. There are caches for design-time builds, so in certain situations a design-time build may be avoided if we can reuse the information from the cache.

The project system is responsible for watching changes on the file system to the .csproj files, all imported targets files, and all source files (in case of wildcards, to update the set of sources when files are added or removed from disk). The project system is also responsible for modifying the .csproj on disk when the user changes options in the Project Properties, or changes files via Solution Explorer.

When a .csproj file changes on disk, a new evaluation and design-time build are run for that project only. They are run in full, because any single change can lead to drastic changes for the project (butterfly effect), so incremental reevaluation or an incremental design-time build is not feasible. If a targets file changes, all projects that import that targets file are reevaluated and a design-time build is rerun for them.
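
Deciding which projects to reevaluate after a file change amounts to asking which projects have the changed file in their import closure. The sketch below illustrates that with a hand-written import map; the file names, the IMPORTS table and both functions are hypothetical and say nothing about how the project system actually tracks imports.

```python
# Hypothetical import map: file -> files it imports directly.
IMPORTS = {
    "App.csproj": ["Directory.Build.props", "Common.targets"],
    "Lib.csproj": ["Directory.Build.props"],
    "Common.targets": ["Versions.props"],
}
PROJECTS = ["App.csproj", "Lib.csproj"]

def import_closure(path, imports):
    """Return the file itself plus everything it imports, transitively."""
    seen, stack = set(), [path]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(imports.get(current, []))
    return seen

def projects_to_reevaluate(changed_file, projects, imports):
    """Projects whose own file or any imported file changed."""
    return [p for p in projects if changed_file in import_closure(p, imports)]

print(projects_to_reevaluate("Lib.csproj", PROJECTS, IMPORTS))
# ['Lib.csproj']
print(projects_to_reevaluate("Versions.props", PROJECTS, IMPORTS))
# ['App.csproj']
print(projects_to_reevaluate("Directory.Build.props", PROJECTS, IMPORTS))
# ['App.csproj', 'Lib.csproj']
```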

Projects can be reloaded independently of each other. Some projects can be unloaded. Project references across two loaded projects can become metadata references when the source project is unloaded. A project system loads each project individually and they are relatively isolated/componentized, so there is little cross-project information/communication.

Fast up-to-date check

Since the project system tracks all inputs and outputs for each project, when a build is requested it compares the timestamps of all inputs and outputs. If any of the inputs are newer than any of the outputs, MSBuild is invoked. If, however, no input is newer than any output, MSBuild isn't even invoked for that project and the project is reported as up-to-date.
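
The timestamp comparison can be sketched as follows. This is a minimal illustration of the rule as described here, assuming that a missing input or output also counts as out of date; the real check in Visual Studio considers more than this. The is_up_to_date function and the file names are hypothetical.

```python
import os
import tempfile

def is_up_to_date(inputs, outputs):
    """True when all files exist and no input is newer than any output."""
    paths = list(inputs) + list(outputs)
    if not outputs or not all(os.path.exists(p) for p in paths):
        return False
    newest_input = max((os.path.getmtime(p) for p in inputs), default=0.0)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return newest_input <= oldest_output

with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "Program.cs")
    output = os.path.join(folder, "app.dll")
    for path in (source, output):
        open(path, "w").close()

    # Set explicit timestamps: the input is older than the output.
    os.utime(source, (1_600_000_000, 1_600_000_000))
    os.utime(output, (1_600_000_100, 1_600_000_100))
    fresh = is_up_to_date([source], [output])

    # The input is edited after the last build.
    os.utime(source, (1_600_000_200, 1_600_000_200))
    stale = is_up_to_date([source], [output])

print(fresh, stale)  # True False
```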

Miscellaneous links and references