TfLite Survey - PaddlePaddle/Mobile GitHub Wiki
Architecture Introduction
The architecture graph contains the following components:
- Lite Converter: converts a frozen graph into the TfLite model format. Freezing a graph merges the checkpoint values with the graph structure.
- Android App
- Java API
- C++ API
- Interpreter: the main execution engine
- Android Neural Networks API (NNAPI)
What is the relationship between TensorFlow and TfLite?
TfLite does not depend on the TensorFlow runtime. It is a separate, lightweight inference framework with its own model format and kernels.
Basic usage of the C++ API is as follows:
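#include "tensorflow/contrib/lite/interpreter.h"
#include "tensorflow/contrib/lite/kernels/register.h"
#include "tensorflow/contrib/lite/model.h"
// (Newer releases move these headers under tensorflow/lite/.)

// 1. Load Model (BuildFromFile returns nullptr if the file cannot be read)
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(path_to_model);
// 2. Init and Build Interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);
// 3. Resize input tensors, if desired, then allocate tensors and fill `input`.
interpreter->AllocateTensors();
float* input = interpreter->typed_input_tensor<float>(0);
// 4. Inference
interpreter->Invoke();
// 5. Read the output
float* output = interpreter->typed_output_tensor<float>(0);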
In this regular usage, the developer passes BuiltinOpResolver, which registers every built-in operator.
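BuiltinOpResolver maps each operator code found in the model to a kernel implementation. The toy resolver below is a minimal sketch of that lookup and of pruning it down; `OpCode`, `ToyOpResolver`, `AddKernel`, `ReluKernel` and the helper functions are hypothetical names, not TfLite's API. Only registered kernels can be found, and a model containing an unregistered operator is rejected, much as InterpreterBuilder reports an error when the resolver cannot find an operator.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical operator codes (TfLite's real enum is BuiltinOperator).
enum class OpCode { kAdd, kConv2D, kRelu, kSoftmax };

// In this sketch a "kernel" is just a function on two floats.
using Kernel = float (*)(float, float);

float AddKernel(float a, float b) { return a + b; }
float ReluKernel(float a, float /*unused*/) { return a > 0.0f ? a : 0.0f; }

// Toy resolver: maps operator codes to kernels. Only operators that were
// registered can be found, which is what a pruned resolver relies on.
class ToyOpResolver {
 public:
  void AddOp(OpCode code, Kernel kernel) { kernels_[code] = kernel; }

  // Returns nullptr if the operator was not registered.
  Kernel FindOp(OpCode code) const {
    auto it = kernels_.find(code);
    return it == kernels_.end() ? nullptr : it->second;
  }

  std::size_t NumOps() const { return kernels_.size(); }

 private:
  std::map<OpCode, Kernel> kernels_;
};

// Mimics the builder's check: every operator in the model must resolve.
bool AllOpsResolved(const ToyOpResolver& resolver,
                    const std::vector<OpCode>& model_ops) {
  for (OpCode op : model_ops) {
    if (resolver.FindOp(op) == nullptr) return false;
  }
  return true;
}

// A pruned resolver that registers only the two kernels a small model uses.
ToyOpResolver MakePrunedResolver() {
  ToyOpResolver resolver;
  resolver.AddOp(OpCode::kAdd, AddKernel);
  resolver.AddOp(OpCode::kRelu, ReluKernel);
  return resolver;
}
```

In a real build the size saving comes from the linker: kernels that no resolver references can be dropped from the binary. TfLite's MutableOpResolver is one way to register a chosen subset of operators instead of all built-ins.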
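TfLite relies on the following optimizations:
- Operator pruning: instead of BuiltinOpResolver, the developer can register only the operators the model uses, which keeps unused kernels out of the binary.
- Optimized kernels
- Multi-threading
- All available kernels support int8 and float32.
- Pre-fused activations: for example, the bias addition and the activation function are fused into the preceding operator's kernel.
- Models are serialized with FlatBuffers instead of the Protocol Buffers used by TensorFlow.
- There is a separate optimization for MobileNet.
- A simple arena-based memory manager.
- NHWC data format.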
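Pre-fusing means a kernel applies the bias and the activation while it writes each output value, instead of making extra passes over the output tensor. The sketch below assumes a float tensor that already holds convolution results in NHWC order, where the channel index varies fastest, so element `i` belongs to channel `i % C`. It compares an unfused version with a fused one; expressing the activation as a clamp to `[act_min, act_max]` (ReLU is `[0, +inf)`, ReLU6 is `[0, 6]`) is an illustrative choice, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Unfused: one pass adds the per-channel bias, a second pass applies ReLU.
// `acc` holds NHWC data, so channel = i % bias.size().
void AddBiasThenRelu(std::vector<float>& acc, const std::vector<float>& bias) {
  const std::size_t channels = bias.size();
  for (std::size_t i = 0; i < acc.size(); ++i) acc[i] += bias[i % channels];
  for (std::size_t i = 0; i < acc.size(); ++i) acc[i] = std::max(acc[i], 0.0f);
}

// Fused: bias add and activation happen in a single pass over the tensor.
// The activation is expressed as a clamp to [act_min, act_max].
void FusedBiasActivation(std::vector<float>& acc, const std::vector<float>& bias,
                         float act_min, float act_max) {
  const std::size_t channels = bias.size();
  for (std::size_t i = 0; i < acc.size(); ++i) {
    float v = acc[i] + bias[i % channels];
    acc[i] = std::min(std::max(v, act_min), act_max);
  }
}
```

Both versions produce the same values; the fused one touches each output element once, which matters most for memory-bound layers on mobile CPUs.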
The Android Neural Networks API (NNAPI) is an Android C API designed for running computationally intensive machine learning operations on mobile devices. TensorFlow Lite can use NNAPI to run inference with hardware acceleration on supported devices.
For details about NNAPI, refer to the Android NN survey. NNAPI is integrated into TfLite as follows.
- In the C++ API, TfLite initializes and builds the interpreter. This process detects whether NNAPI is available.
// 2. Init and Build Interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);
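The detection step amounts to a simple rule: NNAPI is used only if it was requested and the NNAPI runtime actually exists; otherwise inference stays on the CPU kernels. The sketch below shows that rule with hypothetical names (`Backend`, `SelectBackend`); the availability probe is injected as a function, an assumption made so the logic can be tested off-device (on Android the probe would check whether the NNAPI runtime library can be loaded).

```cpp
#include <cassert>
#include <functional>

// Hypothetical execution backends for this sketch.
enum class Backend { kCpuKernels, kNnapi };

// Chooses where inference runs. A request for NNAPI is silently
// downgraded to the CPU kernels when the probe reports no NNAPI runtime.
Backend SelectBackend(bool want_nnapi,
                      const std::function<bool()>& nnapi_available) {
  if (want_nnapi && nnapi_available()) return Backend::kNnapi;
  return Backend::kCpuKernels;
}
```

Falling back instead of failing keeps the same application binary usable on devices without NNAPI.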
The Interpreter constructor (abridged):
Interpreter::Interpreter(ErrorReporter* error_reporter)
    : arena_(kDefaultArenaAlignment),
      error_reporter_(error_reporter ? error_reporter
                                     : DefaultErrorReporter()) {
  // Point the C-style TfLiteContext callbacks back at this interpreter.
  context_.impl_ = static_cast<void*>(this);
  context_.ResizeTensor = ResizeTensor;
  context_.ReportError = ReportError;
  context_.AddTensors = AddTensors;
  context_.tensors = nullptr;
  context_.tensors_size = 0;
  context_.gemm_context = nullptr;
  // Reserve some space for the tensors to avoid excessive resizing.
  // ... (elided in this excerpt)
  next_allocate_node_id_ = 0;
  // ... (rest of the constructor elided)
}
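The constructor initializes `arena_` with an alignment because tensors are not allocated one by one: they receive offsets into one large buffer. The sketch below is a minimal bump-pointer arena under that assumption; `ToyArena` and its methods are hypothetical, the alignment passed in merely plays the role of `kDefaultArenaAlignment` (its real value is not assumed), and a real arena can also reuse space between tensors whose lifetimes do not overlap, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal arena: plan aligned offsets first, then make one heap allocation.
class ToyArena {
 public:
  explicit ToyArena(std::size_t alignment) : alignment_(alignment) {}

  // Reserves `bytes` and returns the aligned offset of the new block.
  std::size_t Allocate(std::size_t bytes) {
    std::size_t offset = AlignUp(used_);
    used_ = offset + bytes;
    return offset;
  }

  // One allocation for all planned blocks; the extra `alignment_` bytes
  // leave room to align the base pointer.
  void Commit() { buffer_.assign(used_ + alignment_, std::uint8_t{0}); }

  // Aligned pointer to the block at `offset`; valid only after Commit().
  std::uint8_t* Ptr(std::size_t offset) {
    auto base = reinterpret_cast<std::uintptr_t>(buffer_.data());
    std::uintptr_t aligned = (base + alignment_ - 1) / alignment_ * alignment_;
    return reinterpret_cast<std::uint8_t*>(aligned) + offset;
  }

  std::size_t used() const { return used_; }

 private:
  std::size_t AlignUp(std::size_t n) const {
    return (n + alignment_ - 1) / alignment_ * alignment_;
  }

  std::size_t alignment_;
  std::size_t used_ = 0;
  std::vector<std::uint8_t> buffer_;
};
```

With an alignment of 16, blocks of 10, 20 and 1 bytes land at offsets 0, 16 and 48, and a single 65-byte buffer backs all three.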
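When NNAPI is used, execution is handed to NNAPIDelegate::Invoke, which builds the NN API model on first use, binds the interpreter's input and output buffers, and runs the computation (abridged):
TfLiteStatus NNAPIDelegate::Invoke(Interpreter* interpreter) {
  if (!nn_model_) {
    // Adds the operations and their parameters to the NN API model.
    // ... (model construction elided in this excerpt)
  }
  ANeuralNetworksExecution* execution = nullptr;
  CHECK_NN(ANeuralNetworksExecution_create(nn_compiled_model_, &execution));
  // Currently perform deep copy of input buffer
  for (size_t i = 0; i < interpreter->inputs().size(); i++) {
    int input = interpreter->inputs()[i];
    // TODO(aselle): Is this what we want or do we want input instead?
    // TODO(aselle): This should be called setInputValue maybe to be cons.
    TfLiteTensor* tensor = interpreter->tensor(input);
    CHECK_NN(ANeuralNetworksExecution_setInput(
        execution, i, nullptr, tensor->data.raw, tensor->bytes));
  }
  // Tell nn api where to place final data.
  for (size_t i = 0; i < interpreter->outputs().size(); i++) {
    int output = interpreter->outputs()[i];
    TfLiteTensor* tensor = interpreter->tensor(output);
    CHECK_NN(ANeuralNetworksExecution_setOutput(
        execution, i, nullptr, tensor->data.raw, tensor->bytes));
  }
  // Currently use blocking compute.
  ANeuralNetworksEvent* event = nullptr;
  CHECK_NN(ANeuralNetworksExecution_startCompute(execution, &event));
  // Block until the outputs are written, then release the NNAPI objects.
  CHECK_NN(ANeuralNetworksEvent_wait(event));
  ANeuralNetworksEvent_free(event);
  ANeuralNetworksExecution_free(execution);
  return kTfLiteOk;
}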
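NNAPI's `startCompute` is asynchronous: it returns an event, and the output buffers are only valid after that event has been waited on. "Blocking compute" therefore means start, then wait, before returning to the caller. The sketch below shows the same pattern with `std::future`; `StartCompute` and `InvokeBlocking` are hypothetical stand-ins, the doubling "model" is made up, and `std::launch::deferred` is used so the example stays single-threaded (the work runs inside `wait()`, whereas a real accelerator would run it elsewhere).

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical asynchronous "device" call: returns a handle immediately.
// `output` must already have input.size() elements.
std::future<void> StartCompute(const std::vector<float>& input,
                               std::vector<float>& output) {
  return std::async(std::launch::deferred, [&input, &output] {
    for (std::size_t i = 0; i < input.size(); ++i) output[i] = 2.0f * input[i];
  });
}

// Blocking wrapper: bind buffers, start the computation, then wait so the
// caller can read `output` as soon as this function returns.
void InvokeBlocking(const std::vector<float>& input, std::vector<float>& output) {
  output.resize(input.size());
  std::future<void> event = StartCompute(input, output);
  event.wait();  // Reading `output` before this point would be a bug.
}
```

Keeping the wait inside the invoke call makes the NNAPI path look synchronous to the interpreter, at the cost of blocking the calling thread.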