Recovering protobuffer schemas from game files - HearthSim/Firestarter GitHub Wiki

"Protocol Buffers", "Protobuf" or "Protos" is a specification and framework for serializing data. It's fair to call it a protocol for exchanging data between two computer systems. More information about Protocol Buffers can be found here.
In this post I'll describe how to retrieve semantic information about the data sent to and from the HearthStone client, in the form of protocol buffer schemas.

BNet, and HearthStone by extension, uses a protocol that is defined on top of the Protobuf framework.
Building such a protocol starts with defining a schema for each 'message' you want to send between client and server. A message is data describing an action or a status, e.g. buying from the shop, starting a game, or the result of playing a card. This data could be encoded as a JSON payload, which is human-readable and easy to understand.
However, JSON has a significant overhead in total byte size because it pairs every value with a key. The textual representation is overhead in itself, too: the integer 15 fits in 4 bits, while the string "15" needs 16 bits.
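To put rough numbers on this overhead, here is a small Python comparison. The message and its field names are hypothetical, and the two-byte layout simply assumes both sides agree on field order and that each value fits in one byte; it is an illustration of the idea, not the protobuf encoding itself.

```python
import json
import struct

# Hypothetical message; the field names and values are made up for illustration.
message = {"card_id": 15, "cost": 3}

# Compact JSON (no whitespace) still spends bytes on keys, quotes, braces and digits.
json_bytes = json.dumps(message, separators=(",", ":")).encode("ascii")

# The integer 15 written as text is two ASCII characters, i.e. 16 bits.
text_15 = str(15).encode("ascii")

# If both sides already agree on field order and that each value fits in one
# unsigned byte, the same message needs only two bytes (a byte being the
# smallest practical unit, rather than the theoretical 4 bits).
packed = struct.pack("BB", message["card_id"], message["cost"])

print(len(json_bytes), json_bytes)  # 23 b'{"card_id":15,"cost":3}'
print(len(text_15) * 8)             # 16
print(len(packed), packed.hex())    # 2 0f03
```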

Because client and server both know what data to expect in each message, we can apply tricks to reduce the number of bytes needed to encode those messages. The protobuf encoding specification does exactly that.
The framework that implements this specification also makes developers' lives easier, because it encodes and decodes messages automatically. The result is a compact stream of bytes, called the wire format.
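A minimal sketch of how the wire format packs fields, written from the public protobuf encoding documentation rather than from anything HearthStone-specific: every field starts with a key, `(field_number << 3) | wire_type`, followed either by a base-128 varint (wire type 0) or by a length prefix and raw bytes (wire type 2). The function names are my own; in practice the protobuf framework generates this code for you.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # last byte
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Key with wire type 0 (varint), followed by the varint-encoded value."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Key with wire type 2 (length-delimited), then the byte length, then UTF-8 data."""
    data = text.encode("utf-8")
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(data)) + data


print(encode_varint_field(1, 150).hex())        # 089601
print(encode_string_field(2, "testing").hex())  # 120774657374696e67
```

These values match the examples in the protobuf encoding guide: field 1 holding 150 takes three bytes on the wire, where the JSON `{"1":150}` would take nine.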

We're attempting to build a server that can actually communicate with the HearthStone client, so it needs to be able to send and receive wire-encoded messages. To achieve this we can either hand-code the networking logic or reuse the protobuf schemas used by the developers and let the framework handle communication for us. The semantic information provided by the latter option also helps us implement server features more quickly.
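To see why the schemas matter, consider what you can decode without one. The sketch below (hypothetical helper names, handling only wire types 0, 1, 2 and 5) walks a raw wire-format buffer, much like `protoc --decode_raw` does. It recovers field numbers, wire types and raw values, but no field names or message types; that missing semantic information is exactly what the schemas provide.

```python
import struct
from typing import Iterator, Tuple, Union

Value = Union[int, bytes]


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Value]]:
    """Yield (field_number, wire_type, raw_value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # 64-bit little-endian
            value = struct.unpack_from("<Q", buf, pos)[0]
            pos += 8
        elif wire_type == 2:  # length-delimited: string, bytes or nested message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            if len(value) != length:
                raise ValueError("truncated length-delimited field")
            pos += length
        elif wire_type == 5:  # 32-bit little-endian
            value = struct.unpack_from("<I", buf, pos)[0]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


print(list(iter_fields(bytes.fromhex("089601120774657374696e67"))))
# [(1, 0, 150), (2, 2, b'testing')]
```

Note that a wire-type-2 value could be a string or a nested message; without a schema there is no way to tell which.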

The HearthSim community already extracts protocol buffer schemas from most client versions. See this repository.

Let's build protobuffer schemas then!
We're going to use the proto-extractor repository, https://github.com/HearthSim/proto-extractor, to recover proto schemas from client data files.
Download the repository as a zip file and unpack it on your computer before continuing!

The proto-extractor README suggests installing Visual Studio or the Mono suite, but we'll take a more lightweight approach using .NET Core. This requires installing the dotnet SDK from here.
After installing the SDK, open a new PowerShell or Command Prompt window and navigate to the root of the unpacked repository.

The extractor program must be built before we can use it. From the root of the repository, run the following command:

dotnet build .\extractor.sln

Dependencies are downloaded automatically before the program itself is compiled.

This guide assumes .NET Core SDK 2.0 or newer. With older SDKs you must run dotnet restore .\extractor.sln manually before building!

The build output tells you whether the build succeeded and where the compiled artifacts are stored. Look for a path similar to [..]proto-extractor\extractor\bin\Debug\netcoreapp1.1\extractor.dll and, in your command prompt, navigate to the folder containing that .dll file.
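The exact output folder depends on the build configuration and the target framework, so instead of hunting for it by hand you could let a small script find it. This sketch assumes the extractor\bin\<configuration>\<framework>\extractor.dll layout shown above; find_extractor_dll is a hypothetical helper, not part of proto-extractor.

```python
from pathlib import Path
from typing import Optional


def find_extractor_dll(repo_root) -> Optional[Path]:
    """Return the most recently built extractor.dll under extractor/bin, or None.

    Assumes the layout extractor/bin/<configuration>/<framework>/extractor.dll,
    e.g. extractor/bin/Debug/netcoreapp1.1/extractor.dll.
    """
    candidates = list(Path(repo_root).glob("extractor/bin/*/*/extractor.dll"))
    if not candidates:
        return None
    # If several configurations were built, prefer the newest file.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```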

In this case the output file ends with .dll, which means it's not directly executable.
Use the command dotnet [application dll] to run it.

Once you're in the directory containing extractor.dll, you can run it. For this you need HearthStone client data files, either from your current installation or from another copy of the client.
The extractor itself doesn't need many arguments, but you have to explicitly pass the paths of the files to extract: Assembly-CSharp.dll and Assembly-CSharp-firstpass.dll, located in %HS_INSTALL%\Hearthstone_Data\Managed. Check the proto-extractor README for more information about the other arguments.
%HS_INSTALL% is a placeholder for the path pointing to your local HearthStone client files.
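Before running the extractor it can help to check that both assemblies are where you expect them. A minimal Python sketch, assuming the Hearthstone_Data\Managed layout described above; the helper names are hypothetical. On case-sensitive file systems the folder casing has to match exactly.

```python
from pathlib import Path
from typing import List

# The two assemblies the extractor needs, as named above.
ASSEMBLIES = ("Assembly-CSharp.dll", "Assembly-CSharp-firstpass.dll")


def managed_dir(hs_install) -> Path:
    """Folder holding the managed assemblies, relative to the install root."""
    return Path(hs_install) / "Hearthstone_Data" / "Managed"


def find_assemblies(hs_install) -> List[Path]:
    """Return the paths of both assemblies, or raise if any are missing."""
    directory = managed_dir(hs_install)
    paths = [directory / name for name in ASSEMBLIES]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing assemblies: " + ", ".join(missing))
    return paths
```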

Running the extractor could look like the following (on Windows, using PowerShell); tailor the parameters to your needs.

# PS [..]\proto-extractor\extractor\bin\Debug\netcoreapp1.1>
# The trailing backtick (`) continues the command on the next line.
dotnet extractor.dll --libPath "%HS_INSTALL%\Hearthstone_Data\Managed" --outPath "./generated_protos" `
  "%HS_INSTALL%\Hearthstone_Data\Managed\Assembly-CSharp.dll" `
  "%HS_INSTALL%\Hearthstone_Data\Managed\Assembly-CSharp-firstpass.dll"

If extraction succeeded, the recovered proto files can be found in the folder called generated_protos!
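If you'd rather script the extraction than retype the command each time, the sketch below builds the same argument list as the PowerShell example, runs it with subprocess and lists the resulting .proto files. The script and helper names are hypothetical. It assumes dotnet is on your PATH and that you run it from the folder containing extractor.dll, and it searches generated_protos recursively because I don't assume how the extractor lays out its output.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

ASSEMBLIES = ("Assembly-CSharp.dll", "Assembly-CSharp-firstpass.dll")


def build_extractor_command(hs_install, out_path="./generated_protos",
                            extractor_dll="extractor.dll") -> List[str]:
    """Mirror the PowerShell invocation as an argument list (no shell quoting needed)."""
    managed = Path(hs_install) / "Hearthstone_Data" / "Managed"
    return [
        "dotnet", str(extractor_dll),
        "--libPath", str(managed),
        "--outPath", str(out_path),
        *(str(managed / name) for name in ASSEMBLIES),
    ]


def list_generated_protos(out_path) -> List[Path]:
    """Collect every .proto file below out_path, including any subfolders."""
    return sorted(Path(out_path).rglob("*.proto"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python run_extractor.py "C:\Games\Hearthstone"
    subprocess.run(build_extractor_command(sys.argv[1]), check=True)
    for proto in list_generated_protos("./generated_protos"):
        print(proto)
```

Passing a real path as the first argument also sidesteps the %HS_INSTALL% placeholder, which PowerShell would not expand anyway.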

Note: Recovering proto schemas is not always a lossless operation, especially for HearthStone, so the recovered schemas can differ slightly from the actual schemas used by the developers (the baseline).
A more accurate recovery can be achieved with the protobin_to_proto.py script from the proto-extractor repository, which can detect proto schemas in any (binary) file.
See https://github.com/HearthSim/proto-extractor/blob/master/README.md#binary-proto-extraction for usage information.
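To get a feel for how schemas can be found inside an arbitrary binary: generated protobuf code for some languages (C++, for example) embeds serialized FileDescriptorProto messages, and a serialized FileDescriptorProto typically starts with its file name (field 1, a string), so its first bytes are 0x0A, a length, then something like foo.proto. The heuristic below only looks for that pattern and reports candidate offsets. It is my own simplified sketch, not necessarily how protobin_to_proto.py works, and it assumes names shorter than 128 bytes so the length fits in a single varint byte.

```python
from typing import List, Tuple


def find_descriptor_names(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, name) for byte runs that look like the start of a
    serialized FileDescriptorProto: key 0x0A (field 1, wire type 2),
    a one-byte length, then a string ending in '.proto'."""
    marker = b".proto"
    hits = []
    start = 0
    while True:
        end = data.find(marker, start)
        if end == -1:
            break
        stop = end + len(marker)  # one past the last byte of the candidate name
        # Try every plausible single-byte length, shortest name first.
        for length in range(len(marker), min(127, stop - 2) + 1):
            begin = stop - length
            if data[begin - 2] == 0x0A and data[begin - 1] == length:
                name = data[begin:stop].decode("utf-8", "replace")
                hits.append((begin - 2, name))
                break
        start = stop
    return hits


sample = b"\x00junk" + b"\x0a\x0bhello.proto" + b"\x12\x03pkg"
print(find_descriptor_names(sample))  # [(5, 'hello.proto')]
```

A hit is only a candidate: to confirm it you would still have to parse the following bytes as a FileDescriptorProto (for instance with the descriptor_pb2 module from the protobuf Python package) and work out where the message ends.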
Try running this Python script on the BNet Update Agent/Launcher executable and see what happens.