# TfLite Survey
## Architecture Introduction
See the architecture diagram: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite

- Lite Converter: also called freeze graph, it merges the checkpoint values (weights) with the graph structure.
- Android App
- Java API
- C++ API
- Interpreter: the main execution engine
- Android Neural Networks API (NNAPI)
## What is the relationship between TensorFlow and TfLite?
At runtime there is no dependency between TensorFlow and TfLite: TfLite is a separate, lightweight inference framework with its own interpreter and kernels.
A minimal usage example looks like this:

```cpp
// 1. Load the model.
tflite::FlatBufferModel model(path_to_model);

// 2. Initialize and build the interpreter.
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(model, resolver)(&interpreter);

// 3. Resize input tensors, if desired.
// Allocate tensors and fill `input`.
interpreter->AllocateTensors();
float* input = interpreter->typed_input_tensor<float>(0);

// 4. Run inference.
interpreter->Invoke();

// 5. Read the output.
float* output = interpreter->typed_output_tensor<float>(0);
```
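If the model's input shape needs to change (for example a different batch size), the input can be resized before allocation. A minimal sketch, assuming a single NHWC image input and the `Interpreter::ResizeInputTensor` API; the shape values are illustrative only:

```cpp
// Sketch: resize input 0 to a 1x224x224x3 NHWC tensor before allocating
// buffers (the shape here is only an example, not tied to a real model).
int input_index = interpreter->inputs()[0];
interpreter->ResizeInputTensor(input_index, {1, 224, 224, 3});
interpreter->AllocateTensors();
```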
## BuiltinOpResolver
- Regular usage requires the developer to use the `BuiltinOpResolver`, which registers all of the built-in operators.
- Operator pruning (see the resolver sketch after this list)
- Optimized kernels
  - NEON
  - multi-threading
- All available kernels support int8 and float32.
- Pre-fused activations: for example, bias addition and the activation function are pre-fused: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/kernels/internal/optimized/optimized_ops.h#L281
- Uses FlatBuffers instead of protobuf-lite for serialization.
- There is a separate optimization for MobileNet: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/kernels/internal/optimized
- A simple arena memory manager: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/simple_memory_arena.h
- NHWC data format (see the indexing sketch after this list)
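As a sketch of what operator pruning can look like: instead of linking the full `BuiltinOpResolver`, a `MutableOpResolver` can register only the kernels a given model actually uses, so the remaining kernels can be stripped from the binary. This is an illustrative sketch, not TfLite's own pruning tooling; the header paths and the chosen operators (CONV_2D, DEPTHWISE_CONV_2D, SOFTMAX) are assumptions for the example.

```cpp
// Illustrative sketch of operator pruning: register only the kernels the
// model needs instead of the full BuiltinOpResolver. Header locations and
// the operator list depend on the model and the TfLite version.
#include <memory>

#include "tensorflow/contrib/lite/interpreter.h"
#include "tensorflow/contrib/lite/model.h"

// Forward-declare the builtin kernel registrations we intend to keep.
namespace tflite { namespace ops { namespace builtin {
TfLiteRegistration* Register_CONV_2D();
TfLiteRegistration* Register_DEPTHWISE_CONV_2D();
TfLiteRegistration* Register_SOFTMAX();
}}}  // namespace tflite::ops::builtin

std::unique_ptr<tflite::Interpreter> BuildPrunedInterpreter(
    const tflite::FlatBufferModel& model) {
  // Register only the three operators this example model needs.
  tflite::MutableOpResolver resolver;
  resolver.AddBuiltin(tflite::BuiltinOperator_CONV_2D,
                      tflite::ops::builtin::Register_CONV_2D());
  resolver.AddBuiltin(tflite::BuiltinOperator_DEPTHWISE_CONV_2D,
                      tflite::ops::builtin::Register_DEPTHWISE_CONV_2D());
  resolver.AddBuiltin(tflite::BuiltinOperator_SOFTMAX,
                      tflite::ops::builtin::Register_SOFTMAX());

  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(model, resolver)(&interpreter);
  return interpreter;
}
```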
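For the NHWC data format, the channel dimension is innermost, so the element at (batch n, row h, column w, channel c) of a tensor with shape [N, H, W, C] sits at the linear offset below. A small indexing sketch (the shape and indices are illustrative only):

```cpp
// NHWC layout: channels are the innermost (fastest-varying) dimension.
// Linear offset of element (n, h, w, c) in a tensor of shape [N, H, W, C].
inline int OffsetNHWC(int H, int W, int C, int n, int h, int w, int c) {
  return ((n * H + h) * W + w) * C + c;
}

// Example: for shape [1, 224, 224, 3], element (0, 1, 0, 2) is at
// OffsetNHWC(224, 224, 3, 0, 1, 0, 2) == 674.
```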
The Android Neural Networks API (NNAPI) is an Android C API designed for running computationally intensive operations for machine learning on mobile devices. TensorFlow Lite is designed to use NNAPI to perform hardware-accelerated inference operations on supported devices.
For details about NNAPI, refer to the Android NN survey. How NNAPI is integrated into TfLite is described below.
- In the C++ API, TfLite initializes and builds the `Interpreter`. This process detects whether NNAPI is available.
```cpp
// 2. Initialize and build the interpreter.
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(model, resolver)(&interpreter);
```
The `Interpreter` constructor shows that NNAPI is turned off by default (`UseNNAPI(false)`):

```cpp
Interpreter::Interpreter(ErrorReporter* error_reporter)
    : arena_(kDefaultArenaAlignment),
      persistent_arena_(kDefaultArenaAlignment),
      error_reporter_(error_reporter ? error_reporter : DefaultErrorReporter()) {
  context_.impl_ = static_cast<void*>(this);
  context_.ResizeTensor = ResizeTensor;
  context_.ReportError = ReportError;
  context_.AddTensors = AddTensors;
  context_.tensors = nullptr;
  context_.tensors_size = 0;
  context_.gemm_context = nullptr;
  // Reserve some space for the tensors to avoid excessive resizing.
  tensors_.reserve(kSlotsToReserve);
  nodes_and_registration_.reserve(kSlotsToReserve);
  next_allocate_node_id_ = 0;
  UseNNAPI(false);
}
```
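Since the constructor defaults NNAPI to off, it has to be enabled explicitly. A minimal sketch, assuming the `Interpreter::UseNNAPI(bool)` setter seen in the constructor above is called before inference:

```cpp
// Sketch: explicitly opt in to NNAPI before allocating tensors and invoking.
// Whether NNAPI is actually used still depends on device support.
interpreter->UseNNAPI(true);
interpreter->AllocateTensors();
interpreter->Invoke();  // dispatched through NNAPIDelegate when NNAPI is enabled
```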
When NNAPI is enabled, inference goes through `NNAPIDelegate::Invoke`, which builds an NNAPI model from the interpreter's graph on first use, binds the interpreter's input and output buffers, and runs a blocking NNAPI execution:

```cpp
TfLiteStatus NNAPIDelegate::Invoke(Interpreter* interpreter) {
  if (!nn_model_) {
    // Adds the operations and their parameters to the NN API model.
    TF_LITE_ENSURE_STATUS(BuildGraph(interpreter));
  }

  ANeuralNetworksExecution* execution = nullptr;
  CHECK_NN(ANeuralNetworksExecution_create(nn_compiled_model_, &execution));

  // Currently perform deep copy of input buffer
  for (size_t i = 0; i < interpreter->inputs().size(); i++) {
    int input = interpreter->inputs()[i];
    // TODO(aselle): Is this what we want or do we want input instead?
    // TODO(aselle): This should be called setInputValue maybe to be cons.
    TfLiteTensor* tensor = interpreter->tensor(input);
    CHECK_NN(ANeuralNetworksExecution_setInput(
        execution, i, nullptr, tensor->data.raw, tensor->bytes));
  }
  // Tell nn api where to place final data.
  for (size_t i = 0; i < interpreter->outputs().size(); i++) {
    int output = interpreter->outputs()[i];
    TfLiteTensor* tensor = interpreter->tensor(output);
    CHECK_NN(ANeuralNetworksExecution_setOutput(
        execution, i, nullptr, tensor->data.raw, tensor->bytes));
  }
  // Currently use blocking compute.
  ANeuralNetworksEvent* event = nullptr;
  CHECK_NN(ANeuralNetworksExecution_startCompute(execution, &event));
  CHECK_NN(ANeuralNetworksEvent_wait(event));
  ANeuralNetworksEvent_free(event);
  ANeuralNetworksExecution_free(execution);
  return kTfLiteOk;
}
```
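`BuildGraph` (not shown above) is where the interpreter's tensors and nodes are translated into NNAPI operands and operations. Below is a rough sketch of the kind of NNAPI C API calls involved, not the actual TfLite implementation; the single ADD operation and the tensor shape are placeholders chosen for illustration.

```cpp
#include <android/NeuralNetworks.h>

// Rough sketch: build and compile an NNAPI model containing one float32 ADD
// operation (placeholder shape, no error handling). TfLite's BuildGraph does
// the equivalent for every node in the interpreter's graph.
void BuildExampleNnapiModel() {
  ANeuralNetworksModel* model = nullptr;
  ANeuralNetworksModel_create(&model);

  uint32_t dims[2] = {1, 16};
  ANeuralNetworksOperandType tensor_type;
  tensor_type.type = ANEURALNETWORKS_TENSOR_FLOAT32;
  tensor_type.dimensionCount = 2;
  tensor_type.dimensions = dims;
  tensor_type.scale = 0.f;
  tensor_type.zeroPoint = 0;

  ANeuralNetworksOperandType scalar_int32 = {};
  scalar_int32.type = ANEURALNETWORKS_INT32;

  // Operand indices follow the order in which operands are added.
  ANeuralNetworksModel_addOperand(model, &tensor_type);   // 0: input a
  ANeuralNetworksModel_addOperand(model, &tensor_type);   // 1: input b
  ANeuralNetworksModel_addOperand(model, &scalar_int32);  // 2: fused activation
  ANeuralNetworksModel_addOperand(model, &tensor_type);   // 3: output

  int32_t fused_none = ANEURALNETWORKS_FUSED_NONE;
  ANeuralNetworksModel_setOperandValue(model, 2, &fused_none, sizeof(fused_none));

  uint32_t add_inputs[3] = {0, 1, 2};
  uint32_t add_outputs[1] = {3};
  ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_ADD,
                                    3, add_inputs, 1, add_outputs);

  uint32_t model_inputs[2] = {0, 1};
  uint32_t model_outputs[1] = {3};
  ANeuralNetworksModel_identifyInputsAndOutputs(model, 2, model_inputs,
                                                1, model_outputs);
  ANeuralNetworksModel_finish(model);

  // The compiled model is what Invoke() above uses to create executions.
  ANeuralNetworksCompilation* compilation = nullptr;
  ANeuralNetworksCompilation_create(model, &compilation);
  ANeuralNetworksCompilation_finish(compilation);
}
```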