2020-01-30
INT8 LINEAR, PerformanceOnly mode (C++ harness)
# sh run_harness.sh
[2020-01-30 09:45:23,548 main.py:291 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-large/SingleStream/config.json
[2020-01-30 09:45:23,548 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-large/SingleStream/config.json ...
[2020-01-30 09:45:23,548 main.py:295 INFO] Processing config "GeforceRTX2080Ti_ssd-large_SingleStream"
[2020-01-30 09:45:23,548 main.py:111 INFO] Running harness for ssd-large benchmark in SingleStream scenario...
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 29478000, 'input_dtype': 'int8', 'input_format': 'linear', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDResNet34/int8_linear', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-large', 'config_name': 'GeforceRTX2080Ti_ssd-large_SingleStream', 'test_mode': 'PerformanceOnly', 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.01.30-09.45.23'}
[2020-01-30 09:45:23,553 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.01.30-09.45.23/GeforceRTX2080Ti/ssd-large/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDResNet34/int8_linear" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-large/SingleStream/ssd-large-SingleStream-gpu-b1-int8.plan" --performance_sample_count=64 --max_dlas=0 --single_stream_expected_latency_ns=29478000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-large/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-large/SingleStream/user.conf" --scenario SingleStream --model ssd-large --response_postprocess coco
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-large/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-large/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-large/SingleStream/ssd-large-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.06288s.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : Single Stream
Mode : Performance
90th percentile latency (ns) : 4758055
Result is : INVALID
Min duration satisfied : NO
Min queries satisfied : Yes
Recommendations:
* Decrease the expected latency so the loadgen pre-generates more queries.
================================================
Additional Stats
================================================
QPS w/ loadgen overhead : 210.12
QPS w/o loadgen overhead : 212.94
Min latency (ns) : 4268744
Max latency (ns) : 10200442
Mean latency (ns) : 4696223
50.00 percentile latency (ns) : 4696862
90.00 percentile latency (ns) : 4758055
95.00 percentile latency (ns) : 4775386
97.00 percentile latency (ns) : 4791460
99.00 percentile latency (ns) : 4835215
99.90 percentile latency (ns) : 6043032
================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 33.9236
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1024
max_query_count : 0
qsl_rng_seed : 3133965575612453542
sample_index_rng_seed : 665484352860916858
schedule_rng_seed : 3622009729038561421
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
print_timestamps : false
performance_issue_unique : false
performance_issue_same : false
performance_issue_same_index : 0
performance_sample_count : 64
No warnings encountered during test.
1 ERROR encountered. See detailed log.
Device Device:0 processed:
4072 batches of size 1
Memcpy Calls: 0
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 4072
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-01-30 09:45:50,264 main.py:142 INFO] Result: 90th percentile latency (ns) : 4758055 and Result is : INVALID
======================= Perf harness results: =======================
GeforceRTX2080Ti-SingleStream:
ssd-large: 90th percentile latency (ns) : 4758055 and Result is : INVALID
======================= Accuracy results: =======================
GeforceRTX2080Ti-SingleStream:
ssd-large: No accuracy results in PerformanceOnly mode.
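The run above is reported INVALID because the minimum duration was not satisfied: the measured 90th percentile latency (about 4.76 ms) is far below the configured expected latency of 29.478 ms, so loadgen pre-generated too few queries to fill the 60 s minimum duration. As the summary recommends, lowering the expected latency raises the derived target QPS and hence the number of pre-generated queries. A minimal sketch of the relevant entries in measurements/GeforceRTX2080Ti/ssd-large/SingleStream/config.json, assuming the file uses the same key names that main.py echoes above; the 4800000 ns value is an assumption chosen near the measured 90th percentile latency, not something taken from this log, and only the relevant keys are shown:
{
  "gpu_batch_size": 1,
  "gpu_single_stream_expected_latency_ns": 4800000
}
With the lower expected latency the test should run past the 60 s minimum, and the "Min duration satisfied" check should then report Yes.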
INT8 LINEAR inference (Python, accuracy check)
# sh run_infer_large_geforcertx2080ti_int8_linear.sh
[2020-01-30 09:35:25,837 infer.py:137 INFO] Running accuracy test...
[TensorRT] VERBOSE: Plugin Creator registration succeeded - GridAnchor_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - NMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Reorg_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Region_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Clip_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - LReLU_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - PriorBox_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Normalize_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - RPROI_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - BatchedNMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - FlattenConcat_TRT
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[TensorRT] VERBOSE: Deserialize required 1393290 microseconds.
[2020-01-30 09:35:27,505 runner.py:38 INFO] Binding input
[2020-01-30 09:35:27,505 runner.py:38 INFO] Binding NMS_0
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
loading annotations into memory...
Done (t=0.45s)
creating index...
index created!
[2020-01-30 09:35:27,977 infer.py:68 INFO] Running validation on 100 images. Please wait...
[2020-01-30 09:35:27,998 infer.py:87 INFO] Batch 0 >> Inference time: 0.010165
[2020-01-30 09:35:28,010 infer.py:87 INFO] Batch 1 >> Inference time: 0.005486
[2020-01-30 09:35:28,022 infer.py:87 INFO] Batch 2 >> Inference time: 0.005503
[2020-01-30 09:35:28,034 infer.py:87 INFO] Batch 3 >> Inference time: 0.005512
[2020-01-30 09:35:28,045 infer.py:87 INFO] Batch 4 >> Inference time: 0.005523
[2020-01-30 09:35:28,057 infer.py:87 INFO] Batch 5 >> Inference time: 0.005453
[2020-01-30 09:35:28,069 infer.py:87 INFO] Batch 6 >> Inference time: 0.005463
[2020-01-30 09:35:28,080 infer.py:87 INFO] Batch 7 >> Inference time: 0.005533
[2020-01-30 09:35:28,091 infer.py:87 INFO] Batch 8 >> Inference time: 0.005446
[2020-01-30 09:35:28,102 infer.py:87 INFO] Batch 9 >> Inference time: 0.005482
...
[2020-01-30 09:35:28,974 infer.py:87 INFO] Batch 90 >> Inference time: 0.004283
[2020-01-30 09:35:28,985 infer.py:87 INFO] Batch 91 >> Inference time: 0.004278
[2020-01-30 09:35:28,995 infer.py:87 INFO] Batch 92 >> Inference time: 0.004291
[2020-01-30 09:35:29,005 infer.py:87 INFO] Batch 93 >> Inference time: 0.004247
[2020-01-30 09:35:29,016 infer.py:87 INFO] Batch 94 >> Inference time: 0.004285
[2020-01-30 09:35:29,026 infer.py:87 INFO] Batch 95 >> Inference time: 0.004282
[2020-01-30 09:35:29,037 infer.py:87 INFO] Batch 96 >> Inference time: 0.004242
[2020-01-30 09:35:29,047 infer.py:87 INFO] Batch 97 >> Inference time: 0.004283
[2020-01-30 09:35:29,058 infer.py:87 INFO] Batch 98 >> Inference time: 0.004273
[2020-01-30 09:35:29,068 infer.py:87 INFO] Batch 99 >> Inference time: 0.004268
Loading and preparing results...
DONE (t=0.11s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=1.50s).
Accumulating evaluation results...
DONE (t=0.53s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.266
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.471
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.261
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.138
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.362
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.349
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.248
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.368
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.391
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.206
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.497
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.484
[2020-01-30 09:35:31,560 infer.py:131 INFO] Get mAP score = 0.265670 Target = 0.200000
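The accuracy numbers above come from the standard COCO bbox evaluation: the "loading annotations into memory", "Running per image evaluation" and AP/AR lines are pycocotools output, and the reported mAP is the first entry of its summary statistics. A minimal, self-contained sketch of that evaluation step, with placeholder file paths (the actual annotation and detection file names used by infer.py are not shown in this log):
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

ann_file = "annotations/instances_val2017.json"   # COCO val2017 ground truth (placeholder path)
det_file = "detections.json"                      # detections written by the inference run (placeholder path)

coco_gt = COCO(ann_file)                  # "loading annotations into memory..." / "index created!"
coco_dt = coco_gt.loadRes(det_file)       # "Loading and preparing results..."

coco_eval = COCOeval(coco_gt, coco_dt, "bbox")    # "Evaluate annotation type *bbox*"
# To reproduce the 100-image run above, coco_eval.params.imgIds would be
# restricted to the ids of the images that were actually inferred.
coco_eval.evaluate()                      # "Running per image evaluation..."
coco_eval.accumulate()                    # "Accumulating evaluation results..."
coco_eval.summarize()                     # prints the AP/AR table

map_score = coco_eval.stats[0]            # AP @ IoU=0.50:0.95, area=all, maxDets=100
print("Get mAP score = %.6f Target = %.6f" % (map_score, 0.2))
On this 100-image subset the 0.2657 mAP clears the 0.2 target used by the script.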