Protocol buffers - U8NWXD/oppia-wiki GitHub Wiki
Introduction
At Oppia, we need to be able to store and transfer data in a language-agnostic way, for example between our frontend and backend code. We mostly use JSON, but we sometimes use protocol buffers instead. To quote from the protocol buffers documentation:
Protocol buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data.
Practically speaking, this means you can define a data structure in a *.proto file (called a proto file). Then, you can run a command to generate code for reading and writing that data structure. The magic comes from the fact that the generated code can be in many different languages. This lets you define a data structure once and use it in both Python and JavaScript, for example. We use buf to perform this code generation.
Below, we'll discuss how we use protocol buffers at Oppia.
Support for protocol buffers
Install proto files
We expect proto files to be defined in their own repositories, which we download and install as dependencies. In manifest.json, the proto section describes how to download proto files:

    "proto": {
      "oppiaMlProto": {
        "version": "0.0.0",
        "downloadFormat": "zip",
        "url": "https://github.com/oppia/oppia-ml-proto/archive/0.0.0.zip",
        "rootDirPrefix": "oppia-ml-proto-",
        "targetDirPrefix": "oppia-ml-proto-"
      }
    },

In this case, the oppiaMlProto entry says to download the version 0.0.0 archive of the oppia/oppia-ml-proto repository and unzip the contents, giving the unzipped folder the prefix oppia-ml-proto- (so the folder is named oppia-ml-proto-0.0.0). The folder will be placed in third_party/ by scripts/install_third_party.py.
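To make the mapping from a manifest entry to an installed folder concrete, here is a minimal sketch. The helper describe_proto_dependency is hypothetical and only mirrors the description above; it assumes that the folder name is the prefix followed by the version, and it is not the actual logic in scripts/install_third_party.py.

```python
import posixpath

# Folder named in the text above.
THIRD_PARTY_DIR = 'third_party'


def describe_proto_dependency(entry):
    """Return (download URL, install directory) for one "proto" entry.

    Hypothetical helper: it assumes the installed folder name is
    targetDirPrefix followed by the version.
    """
    if entry['downloadFormat'] != 'zip':
        raise ValueError('This sketch only handles zip archives.')
    target_name = entry['targetDirPrefix'] + entry['version']
    return entry['url'], posixpath.join(THIRD_PARTY_DIR, target_name)


oppia_ml_proto = {
    'version': '0.0.0',
    'downloadFormat': 'zip',
    'url': 'https://github.com/oppia/oppia-ml-proto/archive/0.0.0.zip',
    'rootDirPrefix': 'oppia-ml-proto-',
    'targetDirPrefix': 'oppia-ml-proto-',
}

url, install_dir = describe_proto_dependency(oppia_ml_proto)
print(install_dir)  # third_party/oppia-ml-proto-0.0.0
```

The resulting path is the one that later needs to be listed wherever buf is told to look for proto files.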
Generate code from proto files
Later in the process of installing Oppia's dependencies, install_third_party_libs calls buf generate to generate Python and JavaScript code from the proto files in the directories listed in the PROTO_FILES_PATHS constant in install_third_party_libs.py. The buf command reads from two configuration files:
- buf.yaml tells buf where to find proto files. For our example above, it would list third_party/oppia-ml-proto-0.0.0 since that folder contains proto files. The config file would look like this:

      version: v1beta1
      build:
        roots:
          - third_party/oppia-ml-proto-0.0.0

  Note that the version key specifies what version of the buf configuration language we are using, not the version of our proto files.

- buf.gen.yaml tells buf how to generate code from our proto files. For example, suppose you want to generate Python code to src/python and JavaScript code to src/javascript. Then you could use the following configuration:

      version: v1beta1
      plugins:
        - name: python
          out: src/python
        - name: js
          out: src/javascript

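As a rough sketch of the generation step, the snippet below builds one buf generate command per directory in PROTO_FILES_PATHS without running anything. The helper build_buf_commands is hypothetical, the value of PROTO_FILES_PATHS is illustrative, and passing each directory to buf as its positional input is an assumption rather than a description of what install_third_party_libs.py actually does.

```python
# PROTO_FILES_PATHS is the constant named above; this value is illustrative.
PROTO_FILES_PATHS = ['third_party/oppia-ml-proto-0.0.0']


def build_buf_commands(proto_paths, buf_executable='buf'):
    """Return one `buf generate` argument list per proto directory.

    Assumption: each directory is passed to buf as its positional input.
    """
    return [[buf_executable, 'generate', path] for path in proto_paths]


# Dry run: print the commands instead of executing them.
for command in build_buf_commands(PROTO_FILES_PATHS):
    print(' '.join(command))
```

To actually run a command, each argument list could be handed to subprocess.run with check=True from the directory that contains buf.yaml and buf.gen.yaml.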
Assuming that we use the above configurations, let's see what code would be generated. Suppose oppia-ml-proto-0.0.0 has two proto files: a.proto and b.proto. The following code files would be generated:
src/python/a_pb2.py
src/python/b_pb2.py
src/javascript/a.js
src/javascript/b.js
Note that for Python code, protobuf replaces .proto with _pb2.py to distinguish these files from those generated with protobuf version 1. For more information about how the file names of generated code are constructed, see the protobuf documentation for the programming languages you use.
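The naming pattern in this example can be sketched as a small function. The helper generated_paths and the SUFFIXES table are hypothetical and only encode the names shown above; the real names depend on each plugin and its options, so treat this as an illustration rather than a rule.

```python
import posixpath

# Suffix rules taken from the example above: Python output replaces
# ".proto" with "_pb2.py", and JavaScript output replaces it with ".js".
SUFFIXES = {'python': '_pb2.py', 'js': '.js'}


def generated_paths(proto_files, plugin_out_dirs):
    """List the files the example configuration would produce."""
    paths = []
    for plugin, out_dir in plugin_out_dirs.items():
        for proto_file in proto_files:
            stem, extension = posixpath.splitext(proto_file)
            if extension != '.proto':
                raise ValueError('Not a proto file: %s' % proto_file)
            paths.append(posixpath.join(out_dir, stem + SUFFIXES[plugin]))
    return paths


for path in generated_paths(
        ['a.proto', 'b.proto'],
        {'python': 'src/python', 'js': 'src/javascript'}):
    print(path)
```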
Adding your own proto files
To take advantage of Oppia's support for protocol buffers, you should follow these steps:
- Create and publish (or find) your proto files in a dedicated repository. The repository doesn't have to be on GitHub, which matters if, for example, you want to use someone else's proto files. See the protobuf docs for details on proto file syntax.
- Add an object under the proto key in manifest.json describing how to download your proto files. For details on the syntax used by manifest.json, check the code in scripts/install_third_party.py, which parses the manifest.
- Add the path to where your proto files will be downloaded to buf.yaml under the roots key.
- Also add the path to your proto files to the PROTO_FILES_PATHS constant in scripts/install_third_party_libs.py.
- If you need more languages than are currently in buf.gen.yaml, update buf.gen.yaml to add your languages.
- You're done! Now you can import classes representing your data structures from the code that buf generates.
Examples
Oppia ML
The primary way we use protocol buffers right now is for the Oppia ML project. Its proto files are defined in the oppia/oppia-ml-proto repository.
It uses the following buf.gen.yaml:

    version: v1beta1
    plugins:
      - name: python
        out: proto_files
      - name: js
        out: extensions/classifiers/proto
        opt: import_style=commonjs,binary
      - name: ts
        out: extensions/classifiers/proto

Otherwise, the files are as in the examples above.
Then we import from this generated code like this:
- TypeScript and JavaScript:

      import { TextClassifierFrozenModel } from 'classifiers/proto/text_classifier';

- Python:

      from proto_files import text_classifier_pb2

To learn how to use the generated code, check out the protobuf tutorial for your language.