protobuf - RicoJia/notes GitHub Wiki
========================================================================
========================================================================
-
Motivation: we want to transmit class objects by serializing them into binary. Protobuf generates serializable data structures in different languages
- generated files: `.pb.h` (C++), `_pb2.py` (Python)
- challenges: the sending and receiving code must agree on exactly the same memory layout and endianness
-
Choices
- CSV: types are inferred, not guaranteed. Good for visualization, though
- JSON: JavaScript Object Notation
- pros: data can take any form, widely accepted in the web, can be read by most languages,
- cons:
- no schema, so you can put anything anywhere and JSON won't complain.
- No documentation, no metadata
- Can be quite big, due to repeated keys
- protobuf
- defined in .proto textfile
```proto
// example.proto
syntax = "proto3";

message Msg {
    int32 id = 1;
    string first_name = 2;
}
```
- Pros:
- compressed (less CPU usage)
- 3x smaller, 20x faster than XML
- schema: needed to generate code and read data
- fully typed
- can be read by all main languages
- schema can evolve over time
- code is generated automatically!
- Cons:
- support for some languages might be lacking
- can't open serialized data with a text editor, because it's compressed and serialized
- Applications: RPC frameworks, such as gRPC to exchange data.
- proto3 was released by Google in 2016
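As a rough illustration of why repeated keys make JSON big, here's a standard-library-only sketch with hypothetical records. Protobuf's actual wire format is different; this just contrasts repeated key names against a fixed binary layout:

```python
import json
import struct

# Hypothetical records: every JSON object repeats the key names.
records = [{"id": i, "score": 3.14} for i in range(100)]

json_bytes = json.dumps(records).encode()

# A schema-based binary layout stores no key names at all:
# one little-endian int32 + one float32 per record (8 bytes each).
binary_bytes = b"".join(struct.pack("<if", r["id"], r["score"]) for r in records)

print(len(json_bytes), len(binary_bytes))  # JSON is several times larger
```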
-
Data types
- float: 32 bits; double: 64 bits
- sint32, sint64 are more efficient with negative numbers than int32, int64.
- They use a technique called "zigzag": 0→0, -1→1, 1→2, -2→3, 2→4, ... so small negative numbers stay small on the wire. A negative value stored in a plain int32/int64 is sign-extended, so it always takes 10 varint bytes.
- JSON would encode `26` as two bytes representing the characters 2 and 6.
- map
- `map<string, Result> results = 2;`
- Map cannot be repeated
- there's no ordering for map
- Time
- Google.Protobuf.WellKnownTypes.Duration
- Google.Protobuf.WellKnownTypes.Timestamp
- bytes: small image
- enum: if you know all the fields in advance
- in proto2, no default is needed because required tells you the field is set
- For proto3, you need to be careful with defaults
syntax = "proto3" message Msg{ enum EyeColor{ UNKNOWN_EYE_COLOR = 0; // this is the default value of "EyeColor" GREEN = 1; BLACK = 2; // good to have all caps } EyeColor eye_color = 8; // this is the "instance of enum" }
- C++: you just call the generated accessor, e.g. `msg->eye_color()`, to read the value
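The zigzag technique described in this section can be sketched in a few lines of Python. This is illustrative only (not the official implementation): shift left, then XOR in the sign bits.

```python
def zigzag_encode(n: int, bits: int = 32) -> int:
    """Map signed to unsigned: 0->0, -1->1, 1->2, -2->3, 2->4, ..."""
    return (n << 1) ^ (n >> (bits - 1))

def zigzag_decode(z: int) -> int:
    """Inverse mapping: recover the signed value."""
    return (z >> 1) ^ -(z & 1)

# Round-trip a few values to confirm the mapping.
for n in (0, -1, 1, -2, 2, 150, -150):
    assert zigzag_decode(zigzag_encode(n)) == n
```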
-
Tags
- tags from 1 to 15 use 1 byte
- tags from 16 to 2047 use 2 bytes
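These byte counts follow from how a field key is encoded: the key is `(tag << 3) | wire_type`, itself stored as a varint, so tags 1-15 fit in one byte and 16-2047 in two. A small Python sketch (hypothetical `key_size` helper, standard varint logic):

```python
def key_size(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes the varint-encoded field key takes."""
    key = (field_number << 3) | wire_type
    size = 1
    while key > 0x7F:   # each varint byte carries 7 payload bits
        key >>= 7
        size += 1
    return size

print(key_size(15), key_size(16), key_size(2047), key_size(2048))  # 1 2 2 3
```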
-
Fields:
- repeated: for an array or list; can hold any number of elements (0 included)
- (nanopb) a custom callback is needed if no max size is specified
- // starts a comment
-
Default values are defined in proto3, but not in proto2 (which has optional, required, etc.)
- bool: false
- string: empty
- enum: first value (the one with tag 0)
- repeated: empty list
-
Naming Conventions (these keep the generated code consistent):
- CamelCase for msgs
- underscore_separated_names for fields
- CamelCase for enums; CAPITAL_WITH_UNDERSCORES for value names
- Uber's proto guide
- You can have one message containing another:

```proto
message Person {
    ...
    Date birthday = 2; // Note the msg is defined underneath
}

message Date {
}
```
- Or
```proto
message Msg {
    message Address {
        string addr = 1;
    }
    repeated Address addr = 10;
}
```
- import a file:
import "dir/3-date.proto"
- "package: " is actually a namespace
//File 1 syntax = "proto3" package myData //File2 import "sub_dir/file.proto" message Person{ myData.Date date = 7; }
- "package: " is actually a namespace
========================================================================
========================================================================
- Installation
```sh
# Make sure you grab the latest version
curl -OL https://github.com/google/protobuf/releases/download/v3.5.1/protoc-3.5.1-linux-x86_64.zip

# Unzip
unzip protoc-3.5.1-linux-x86_64.zip -d protoc3

# Move protoc to /usr/local/bin/
sudo mv protoc3/bin/* /usr/local/bin/

# Move protoc3/include to /usr/local/include/
sudo mv protoc3/include/* /usr/local/include/

# Optional: change owner
sudo chown $USER /usr/local/bin/protoc
sudo chown -R $USER /usr/local/include/google
```
- protoc stands for protocol buffer compiler
- To compile the code, do:
```sh
protoc -I=PROTO_DIR --python_out=PYTHON_DIR PROTO_DIR/simple.proto
```
- -I is the directory to search for imports
- --python_out sets where the generated Python code goes
- use `PROTO_DIR/*.proto` to compile all proto files at once
- Then, you can see `file_pb2.py`
========================================================================
========================================================================
- if a field is set to its default value, then `print(msg)` won't show the field
- if you have two schemas, e.g. `example.proto` and `scalar_types.proto`:
- if they define the same fields in the same order (same tags and types), one can decode data serialized with the other
- if not (the fields differ), you will see an empty msg object, with no errors.
- oneof
- Callback
```c
bool oneof_dec_cb(pb_istream_t* stream, const pb_field_t* field, void** arg) {
    switch (field->tag) {
        case ShelfEvent_shelf_data_tag: {
            // EventData_camera_data_tag
            break;
        }
    }
    return true;
}
```
-
Nanopb would add 4 bytes to the beginning of each message to signify length.
- if you don't remove it, you will see `error parsing message`
-
we use nanopb because:
- otherwise you have to manually allocate memory to serialize and deserialize; nanopb handles that for you. Nanopb was created for protobuf:

```c
Example mymessage = {42};
uint8_t buffer[10];
pb_ostream_t stream = pb_ostream_from_buffer(buffer, sizeof(buffer));
pb_encode(&stream, Example_fields, &mymessage);
```

- it's plain C, compatible with 32-bit microcontrollers.
-
Varint (variable-length int): a small int takes up only 1 byte. `int32`, `sint32` are varints; fixed-size ints are `fixed32`, `sfixed64`.
- use `_DecodeVarint32(buf, starting_pos)` to decode
-
work with proto
```python
def __deserialize_msg(self, msg):
    n = 0
    data_len, data_pos = _DecodeVarint32(msg, n)
    n = data_pos
    data_buf = msg[n:n + data_len]
    n += data_len
    return shelf_event_pb2.ShelfEvent().FromString(data_buf)

def __reserialize_msg(self, proto):
    """Reserialize the proto and return it"""
    serialized = proto.SerializeToString()
    f = BytesIO()
    _EncodeVarint(f.write, len(serialized))
    f.write(serialized)
    return f.getbuffer().tobytes()
```
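As a sketch of what `_DecodeVarint32` does under the hood (illustrative pure-Python version, not the library's actual code): read 7 payload bits per byte, least-significant group first, until a byte with MSB 0.

```python
def decode_varint(buf: bytes, pos: int = 0):
    """Return (value, new_pos) - the same shape as _DecodeVarint32's result."""
    result = shift = 0
    while True:
        b = buf[pos]
        result |= (b & 0x7F) << shift   # low 7 bits carry payload
        pos += 1
        if not (b & 0x80):              # MSB 0 marks the last byte
            return result, pos
        shift += 7

# A length-prefixed frame: varint length, then the payload bytes.
frame = b"\xac\x02" + b"x" * 300        # 0xAC 0x02 is the varint for 300
length, start = decode_varint(frame)
payload = frame[start:start + length]
print(length, len(payload))  # 300 300
```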
========================================================================
========================================================================
- Simple_msg
- Simple_msg.proto
syntax = "proto3"; #must specify proto2 or 3 package example.simple; #This is more like a namespace message SimpleMessage{ #This is a very simple msg int32 id = 1; bool is_simple = 2; string name = 3; repeated int32 sample_list = 4; #repeated means list }
- Simple_demo.py
```python
import simple.simple_pb2 as simple_pb2  # generated python msg

simple_msg = simple_pb2.SimpleMessage()  # create a msg obj
simple_msg.id = 113
simple_msg.is_simple = True
simple_msg.name = "test"

# This is how to fill a repeated field.
# Method 1: append
sample_list = simple_msg.sample_list  # you must get a reference to the list object in simple_msg
sample_list.append(123)               # then you can append stuff
sample_list.append(456)

# Method 2: extend, works for whole sequences, not just appending single values
simple_msg.sample_list.extend([1, 2, 3])

with open("simple.bin", "wb") as f:  # wb means write, binary mode; output is called simple.bin
    byte_string = simple_msg.SerializeToString()
    f.write(byte_string)

with open("simple.bin", "rb") as f:
    simple_msg_read = simple_pb2.SimpleMessage().FromString(f.read())
print(simple_msg_read.sample_list)
```
- Complex Msg
- Complex.proto
syntax="proto3"; package example.complex; message ComplexMsg{ DummyMsg one_dummy = 1; repeated DummyMsg multiple_dummy = 2; } message DummyMsg{ int32 id = 1; string name = 2; }
- Complex.demo.py
```python
import complex_pb2

complex_msg = complex_pb2.ComplexMsg()

# Below is NOT correct: you cannot assign a message field to an object.
# one_dummy_msg = complex_pb2.DummyMsg()
# one_dummy_msg.id = 1
# complex_msg.one_dummy = one_dummy_msg

# Instead, set the nested fields directly:
complex_msg.one_dummy.id = 1
complex_msg.one_dummy.name = "hehe"

# Method 1: add() returns a reference to a new repeated element,
# but add() does not exist for repeated scalar fields
msg1 = complex_msg.multiple_dummy.add()
msg1.id = 1
msg1.name = "hehehe"

# Method 2: add() with keyword args
msg2 = complex_msg.multiple_dummy.add(id=2, name="rj")

# Method 3: not recommended: copying msg3
msg3 = complex_pb2.DummyMsg()
msg3.id = 3
msg3.name = "hehehehe"
complex_msg.multiple_dummy.extend([msg3])

print(complex_msg)
```
========================================================================
========================================================================
- Compilation
```sh
protoc --proto_path=src --cpp_out=build/gen src/foo.proto src/bar/baz.proto
```
========================================================================
========================================================================
-
oneof: set a member field directly: `msg.tag_inside_oneof = ...`
- Only the last field that was set gets to keep its value
- In generated code, the API is the same, except there's one more function for checking the last field.
- each time, a tag is generated in the received msg: `evt.which_event == Field_Tag`
-
Enum: each value is accessed on the generated module directly, no Enum scoping required: `metadata1.node_type = oct_command_pb2.WEIGHT`
-
Field checks
- see if any oneof field is set at all: `if proto.WhichOneof("event"):`
- see whether a specific field is set: `if proto.HasField("announcement"):`
- construct a proto with keyword args (the keywords are required): `new_time_info = shelf_event_pb2.TimeInfo(ts=new_ts, us=new_us)`
- No direct assignment of message fields is allowed in proto; use CopyFrom: `proto.event_data.camera_data.time_info_after.CopyFrom(new_time_info_after)`
- Optional compound object is empty by default.
```python
# TODO
test_proto = shelf_event_pb2.ShelfEvent()
print("===============")
print("test proto: ", test_proto)
test_proto.event_data.CopyFrom(shelf_event_pb2.EventData())  # will show empty object
test_proto.event_data.camera_data.CopyFrom(shelf_event_pb2.CameraData())
print("event_data: ", test_proto.event_data)
print("has camera_data: ", test_proto.event_data.HasField("camera_data"))  # if not initialized, False
print("===============")
```
========================================================================
========================================================================
-
why nanopb over protobuf?
- nanopb is written in ANSI C (the American National Standards Institute C standard), suitable for microcontrollers
- nanopb generator is built on top of Google's protoc
- Two ways to convert .proto to c:
```sh
# If you have downloaded the nanopb binary:
generator-bin/protoc --nanopb_out=. myprotocol.proto

# if not:
protoc --plugin=protoc-gen-nanopb=nanopb/generator/protoc-gen-nanopb ...
```
-
Key functions:
```c
SimpleMessage message = SimpleMessage_init_zero;

/* Create a stream that will write to our buffer. */
pb_ostream_t stream = pb_ostream_from_buffer(buffer, sizeof(buffer));
message.lucky_number = 13;

/* Write to the output stream */
status = pb_encode(&stream, SimpleMessage_fields, &message);
message_length = stream.bytes_written;
```
- nanopb's `pb_encode` encodes a message field by field: for each field it encodes the tag and then the actual value, so the output stream's write callback can fire multiple times per message. The problem is in the send function itself
- if the callback returns false, `pb_encode` stops.
- in one test the callback was called 5 times for a simple message, and 3 times for a shelf_event.
- What about repeated fields?
```cpp
// 1. Set up decoder params
struct decoder_params {
    Event* event;
    bool before;
    decoder_params() : event{}, before{} {}
};

// Decode the "before" packets
decoder_params bparam;
bparam.event = msg.get();
bparam.before = true;

// 2. Register the decode functions needed for camera_data
request->camera_data.packets_before.funcs.decode = &packets_decode;
request->camera_data.packets_before.arg = (void*)&bparam;

// 3. The callback function
bool ShelfConnection::packets_decode(pb_istream_t* stream, const pb_field_t* field, void** arg) {
    decoder_params* param = ((decoder_params*)*arg);
    Event* event = param->event;
    auto data = std::make_unique<PacketData>();
    data->packet_->extrinsics.funcs.decode = extrinsics_decode;
    data->packet_->extrinsics.arg = (void*)&data->extrinsics_;
    data->packet_->data.funcs.decode = packet_decode;
    data->packet_->data.arg = (void*)event;
    if (!pb_decode(stream, Packet_fields, data->packet_.get())) {
        ERROR("CameraPacket decode failure : %s\n", PB_GET_ERROR(stream));
        return false;
    }
    data->type_ = (param->before) ? packet_type::BEFORE : packet_type::AFTER;
    DEBUG("Camera Frame : ID:%d, type:%d, before:%d size:%d, cap ts:%u, enc ts:%u", data->packet_->camera_id,
          data->packet_->frame_type, data->type_, data->packet_->size, data->packet_->cap_ts, data->packet_->enc_ts);
    event->packets_.emplace_back(std::move(data));
    return true;
}
```
-
decoder
- Fun: how is the decode callback defined?

```c
bool (*callback)(pb_istream_t *stream, pb_byte_t *buf, size_t count);
```

- pb_callback_t has callback and arg (in pb_common.h):

```c
decoder_params bparam;
request->camera_data.packets_before.funcs.decode = &packets_decode;
msg.cb_event.arg = (void*)&bparam;

bool ShelfConnection::packets_decode(pb_istream_t* stream, const pb_field_t* field, void** arg) {
    decoder_params* param = ((decoder_params*)*arg); // why void**?
    Event* event = param->event;
}
```
-
Oneof decoding good example
-
we need special encoding & decoding callbacks only for: 1. sub-messages, 2. "array" types with unknown size, such as string and repeated fields, 3. a message-level callback for oneof msgs
- If any data remains in stream, normal decoding will continue.
- E.g
```c
// The proto file
message OneOfMessage {
    option (nanopb_msgopt).submsg_callback = true;
    int32 prefix = 1;
    oneof values {
        int32 intvalue = 5;
        string strvalue = 6 [(nanopb).max_size = 8];
    }
}

bool print_int32(pb_istream_t *stream, const pb_field_t *field, void **arg) {
    uint64_t value;
    if (!pb_decode_varint(stream, &value))
        return false;
    printf("%d", (int)value);
    return true;
}

// The main file
/* The callback below is a message-level callback which is called before each
 * submessage is encoded. It is used to set the pb_callback_t callbacks inside
 * the submessage. The reason we need this is that different submessages share
 * storage inside oneof union, and before we know the message type we can't set
 * the callbacks without overwriting each other. */
bool msg_callback(pb_istream_t *stream, const pb_field_t *field, void **arg) {
    /* Print the prefix field before the submessages.
     * This also demonstrates how to access the top level message fields
     * from callbacks. */
    OneOfMessage *topmsg = field->message;
    printf("prefix: %d\n", (int)topmsg->prefix);

    if (field->tag == OneOfMessage_submsg1_tag) {
        SubMsg1 *msg = field->pData;
        printf("submsg1 {\n");
        msg->array.funcs.decode = print_int32;
        msg->array.arg = " array: %d\n";
    }
    return true;
}

int main() {
    uint8_t buffer[256];
    OneOfMessage msg = OneOfMessage_init_zero;
    pb_istream_t stream;
    size_t count;

    // This is a msg-level callback, called when the submessage tag is known,
    // but before the actual msg is decoded. Set the decoding function there,
    // or simply decode right on the spot.
    msg.cb_values.funcs.decode = msg_callback;
    stream = pb_istream_from_buffer(buffer, count);
    if (!pb_decode(&stream, OneOfMessage_fields, &msg)) {
        return 1;
    }

    /* This is just printing for the test case logic */
    if (msg.which_values == OneOfMessage_intvalue_tag) {
        printf("prefix: %d\n", (int)msg.prefix);
        printf("intvalue: %d\n", (int)msg.values.intvalue);
    }
}
```
-
-
Python: `msg.WhichOneof('ONEOF_FIELD_NAME') == "ONEOF_FIELD_TAG"`
-
Forward compatibility vs backward compatibility. Forward compatibility: old code can read data written by new code; backward compatibility: new code can read data written by old code.
- Certain rules for both:
- Never change a field's tag after it's defined
- Fields can be removed, as long as the tag number is not used again (rename the field instead, e.g. with an "OBSOLETE_" prefix), so future users of your proto can't accidentally reuse the number
- Changing data types can be complicated, so add new fields instead
- When adding fields, think: does the default value make sense to the old code?
- Field name changes are trivial; the tag number is what matters to protobuf!
- When you delete fields, mark them as reserved, so there won't be code conflict.
- This is to prevent this scenario: you somehow still have an old proto, future user loads it.
- Certain rules for both:
- in nanopb 0.4.1, a required field right in front of an optional one will generate a "required field" error message
- not having decode callbacks set for every sub-message of a msg type will yield a segfault
- Bug 2: segfault
- to solve the missing field problem (bug 1), see https://jpa.kapsi.fi/nanopb/docs/whats_new.html
- byte decoding: we need extra decoding for bytes
```cpp
bool ShelfConnection::packet_decode(pb_istream_t* stream, const pb_field_t* field, void** arg) {
    // This is how we get the ptr to the struct for storing the camera data.
    // Event is not automatically generated.
    Event* msg = ((Event*)*arg);
    auto pdata = std::make_unique<std::vector<uint8_t>>(stream->bytes_left);
    if (!pb_read(stream, &(*pdata)[0], stream->bytes_left)) {
        ERROR("CameraPacket data read failure : %s\n", PB_GET_ERROR(stream));
        return false;
    }
    msg->packetsData_.emplace_back(std::move(pdata));
    return true;
}
```
- when printing optional fields, make sure `has_FIELD` is set to true. Otherwise, even though the field is filled, you still can't print it!
- When writing a fixed-size array, you have to set `FIELD_count` to a number.
========================================================================
========================================================================
- A service is a set of endpoints through which your application can be accessed
- you can define a service on top of msgs.
syntax = "proto3" message SearchRequest{ int person_id = 1; } message SearchResponse{ string person_name =1 ; } service SearchService{ rpc Search (SearchRequest) returns (SearchResponse) }
- Protocol Buffer Services need to be interpreted by a framework, to generate associated code.
- A main one is gRPC. Came about with proto3
- you can use any language to generate gRPC servers, clients, and the generated code will send proto request, proto response, etc.
- Micro Service:
- Each contains one function of your business, and they may be written in different languages.
- Micro-Services must agree on:
- data format, error patterns, load-balancing
- One popular choice is REST (HTTP + JSON); another is gRPC
- API: it's a contract, I send you a request, you send me a response
- But it's not easy to build API:
- Data model: JSON, XML, Binary?
- endpoint format: GET /api/v1... POST /api/v1/user..
- How much data in one call?
- Latency?
- Scalability to 1000 clients?
- gRPC:
-
Developed by Google, part of the Cloud Native Computing Foundation (CNCF), which also hosts Kubernetes.
-
Allows you to define a high-level request and response for RPCs (Remote Procedure Calls), and handles the rest for you
- Solves many RPC's problems
- responses and requests are in proto
- In client and server code, an RPC request, response will look JUST LIKE A FUNCTION CALL!
-
Fast, low latency; load balancing and logging are all handled for you
-
========================================================================
========================================================================
- Protobuf is universal because serialization works the same in every language
- Serialization and deserialization are built on top of Varint (variable-length integer).
- VarInt uses a variable number of bytes to represent arbitrarily large values.
- The magic is that only the last "octet" (byte) has MSB 0; all earlier bytes have MSB 1.
- Example: 300 = 0b1 0010 1100 → split into 7-bit groups, least-significant group first: (1)010 1100 (0)000 0010, i.e. bytes 0xAC 0x02
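The encoding direction, as an illustrative Python sketch (not the library's actual implementation): emit 7 bits at a time, setting the continuation bit on every byte except the last.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf-style varint."""
    out = bytearray()
    while True:
        b = value & 0x7F
        value >>= 7
        if value:
            out.append(b | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(b)         # last octet has MSB 0
            return bytes(out)

print(encode_varint(300).hex())  # ac02
```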
========================================================================
========================================================================
- Intro
- It's an IDL & underlying messaging format:
- IDL (Interface Description Language): allows programs written in different languages to talk to each other.
- e.g. protobuf, REST, JSON
- User Work flow
client.say_hello() -> client stub sends request -> server receives the request, processes it, sends SayHelloResponse back -> client stub receives it, and returns the value to client.say_hello()
-
`.proto` is a text file that defines the service:

```proto
// The greeter service definition.
service Greeter {
    // Sends a greeting
    rpc SayHello (HelloRequest) returns (HelloReply) {}
}

// The request message containing the user's name.
message HelloRequest {
    string name = 1;
}

// The response message containing the greetings
message HelloReply {
    string message = 1;
}
```
- This needs to be compiled by the `protoc` compiler
- `pb2` in the generated file names means protobuf Python API version 2
- A bit about the code:
- server:
- Launches a gRPC server with a thread pool
- Adds the user-supplied servicer code to the server
- Client:
- sets up an insecure channel to the server's IP
- Through the channel, call the server through gRPC.
- Prep steps
```sh
python -m pip install --upgrade pip
python -m pip install grpcio

# Need to install the latest protobuf
pip install --upgrade protobuf
python -m pip install grpcio-tools
```
- Run steps:
- compile
```sh
python -m grpc_tools.protoc -I <PROTO_PATH> --python_out=<MAINFILE_DIR> --pyi_out=<MAIN_FILE_DIR> --grpc_python_out=<MAIN_FILE_DIR> <PROTO_FILE_NAME>
```
- compile
- Functionalities
- Two types of gRPC responses / requests: `stream` vs `unary`
- `insecure_channel` doesn't use TLS (no HTTPS encryption)
- Both server and client can run in `synchronous` and `asynchronous` manner
- status check