GeforceRTX2080Ti SSDMobilenetV2 - wom-ai/inference_results_v0.5 GitHub Wiki

2020-02-06

INT8 CHW4 Performance Only (C++)

[2020-02-06 02:30:23,890 main.py:294 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-02-06 02:30:23,890 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-02-06 02:30:23,891 main.py:298 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-02-06 02:30:23,891 main.py:114 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
[2020-02-06 02:30:23,892 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.06-02.30.23/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'test_mode': 'PerformanceOnly', 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.06-02.30.23'}
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.01304s.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : Single Stream
Mode     : Performance
90th percentile latency (ns) : 1606876
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 667.89
QPS w/o loadgen overhead        : 682.53

Min latency (ns)                : 1225574
Max latency (ns)                : 11102517
Mean latency (ns)               : 1465147
50.00 percentile latency (ns)   : 1401742
90.00 percentile latency (ns)   : 1606876
95.00 percentile latency (ns)   : 1918794
97.00 percentile latency (ns)   : 2096869
99.00 percentile latency (ns)   : 2582034
99.90 percentile latency (ns)   : 3944440

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 616.903
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1024
max_query_count : 0
qsl_rng_seed : 3133965575612453542
sample_index_rng_seed : 665484352860916858
schedule_rng_seed : 3622009729038561421
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
print_timestamps : false
performance_issue_unique : false
performance_issue_same : false
performance_issue_same_index : 0
performance_sample_count : 256

No warnings encountered during test.

No errors encountered during test.
Device Device:0 processed:
  40075 batches of size 1
  Memcpy Calls: 0
  PerSampleCudaMemcpy Calls: 0
  BatchedCudaMemcpy Calls: 40075
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-02-06 02:31:30,999 main.py:145 INFO] Result: 90th percentile latency (ns) : 1606876 and Result is : VALID

======================= Perf harness results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: 90th percentile latency (ns) : 1606876 and Result is : VALID


======================= Accuracy results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: No accuracy results in PerformanceOnly mode.
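The Additional Stats above fit a few simple relationships, and the sketch below checks them against this run's numbers. It is a minimal sketch under stated assumptions that were inferred from the log values, not from LoadGen source. With one query in flight (max_async_queries : 1), target_qps appears to be 1e9 / single_stream_expected_latency_ns. "QPS w/o loadgen overhead" appears to be 1e9 / mean latency. Completed queries divided by "QPS w/ loadgen overhead" gives the implied run length. The helper name `qps_from_latency_ns` is hypothetical.

```python
# Hedged sketch: relationships inferred from the 2020-02-06 SingleStream
# PerformanceOnly log above; this is not LoadGen's actual implementation.
NS_PER_S = 1e9


def qps_from_latency_ns(latency_ns: float) -> float:
    """Queries per second when exactly one query is in flight at a time."""
    return NS_PER_S / latency_ns


expected_latency_ns = 1_621_000   # --single_stream_expected_latency_ns
mean_latency_ns = 1_465_147       # "Mean latency (ns)"
queries = 40_075                  # "40075 batches of size 1"
qps_with_overhead = 667.89        # "QPS w/ loadgen overhead"

target_qps = qps_from_latency_ns(expected_latency_ns)      # ~616.903
qps_no_overhead = qps_from_latency_ns(mean_latency_ns)     # ~682.53
# Implied wall-clock length of the timed run, in seconds.
implied_duration_s = queries / qps_with_overhead           # ~60.00

print(f"target_qps        ~ {target_qps:.3f}")
print(f"QPS w/o overhead  ~ {qps_no_overhead:.2f}")
print(f"implied duration  ~ {implied_duration_s:.2f} s")
```

The implied duration lands just above 60 s. That agrees with the "Min duration satisfied : Yes" line for min_duration (ms): 60000.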

INT8 CHW4 Accuracy Only (C++)

[2020-02-06 02:31:31,561 main.py:294 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-02-06 02:31:31,561 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-02-06 02:31:31,561 main.py:298 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-02-06 02:31:31,561 main.py:114 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
[2020-02-06 02:31:31,564 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.06-02.31.31/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="AccuracyOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'test_mode': 'AccuracyOnly', 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.06-02.31.31'}
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.01518s.

No warnings encountered during test.

No errors encountered during test.
Device Device:0 processed:
  5000 batches of size 1
  Memcpy Calls: 0
  PerSampleCudaMemcpy Calls: 0
  BatchedCudaMemcpy Calls: 5000
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-02-06 02:31:46,808 main.py:145 INFO] Result: Cannot find performance result. Maybe you are running in AccuracyOnly mode.
[2020-02-06 02:31:46,814 __init__.py:42 INFO] Running command: python3 build/inference/v0.5/classification_and_detection/tools/accuracy-coco.py --mlperf-accuracy-file /work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.06-02.31.31/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf_log_accuracy.json             --coco-dir /work/mlperf/inference_results_v0.5/closed/NVIDIA/build/preprocessed_data/coco --output-file build/ssd-small-results.json
loading annotations into memory...
Done (t=0.41s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.15s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=13.86s).
Accumulating evaluation results...
DONE (t=2.25s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.243
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.369
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.268
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.020
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.171
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.563
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.217
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.273
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.274
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.026
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.196
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.633
mAP=24.305%

======================= Perf harness results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: Cannot find performance result. Maybe you are running in AccuracyOnly mode.


======================= Accuracy results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: Accuracy = 24.305, Threshold = 21.780. Accuracy test PASSED.
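The AccuracyOnly run above passes because the COCO mAP computed by accuracy-coco.py (24.305) is at least the printed threshold (21.780). That mAP is the AP@[ IoU=0.50:0.95 | area=all | maxDets=100 ] line (0.243), expressed as a percentage with more digits. The sketch below reproduces the gate. It assumes the threshold is 99% of a 22.0 mAP FP32 reference. That assumption matches the printed 21.780, but the log does not state it.

```python
# Hedged sketch of the accuracy gate printed above.
# ASSUMPTION: threshold = 99% of a 22.0 mAP FP32 reference. This is consistent
# with "Threshold = 21.780" in the log, but the log does not state it.
REFERENCE_MAP = 22.0
TARGET_FRACTION = 0.99


def accuracy_passed(measured_map: float,
                    reference_map: float = REFERENCE_MAP,
                    fraction: float = TARGET_FRACTION) -> bool:
    """Return True if the measured mAP (in percent) meets fraction * reference."""
    return measured_map >= reference_map * fraction


threshold = REFERENCE_MAP * TARGET_FRACTION
print(f"threshold = {threshold:.3f}")
print(accuracy_passed(24.305))  # 2020-02-06 AccuracyOnly run
print(accuracy_passed(24.467))  # 2020-02-04 AccuracyOnly run
```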

INT8 CHW4 Inference

[2020-02-06 02:33:12,833 infer.py:144 INFO] Running accuracy test...
[2020-02-06 02:33:12,833 infer.py:58 INFO] Running SSDMobileNet functionality test for engine [ ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan ] with batch size 1
[TensorRT] VERBOSE: Plugin Creator registration succeeded - GridAnchor_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - NMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Reorg_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Region_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Clip_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - LReLU_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - PriorBox_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Normalize_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - RPROI_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - BatchedNMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - FlattenConcat_TRT
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[TensorRT] VERBOSE: Deserialize required 1323137 microseconds.
[2020-02-06 02:33:14,414 runner.py:38 INFO] Binding Input
[2020-02-06 02:33:14,414 runner.py:38 INFO] Binding Postprocessor
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[2020-02-06 02:33:14,843 infer.py:85 INFO] Running validation on 200 images. Please wait...
[2020-02-06 02:33:14,856 infer.py:95 INFO] Batch 0 >> Inference time:  0.006726
[2020-02-06 02:33:14,858 infer.py:95 INFO] Batch 1 >> Inference time:  0.001593
[2020-02-06 02:33:14,860 infer.py:95 INFO] Batch 2 >> Inference time:  0.001588
[2020-02-06 02:33:14,862 infer.py:95 INFO] Batch 3 >> Inference time:  0.001588
[2020-02-06 02:33:14,865 infer.py:95 INFO] Batch 4 >> Inference time:  0.001590
[2020-02-06 02:33:14,867 infer.py:95 INFO] Batch 5 >> Inference time:  0.001585
[2020-02-06 02:33:14,869 infer.py:95 INFO] Batch 6 >> Inference time:  0.001586
[2020-02-06 02:33:14,871 infer.py:95 INFO] Batch 7 >> Inference time:  0.001586
[2020-02-06 02:33:14,873 infer.py:95 INFO] Batch 8 >> Inference time:  0.001591
[2020-02-06 02:33:14,875 infer.py:95 INFO] Batch 9 >> Inference time:  0.001586

...

[2020-02-06 02:33:15,203 infer.py:95 INFO] Batch 180 >> Inference time:  0.001202
[2020-02-06 02:33:15,205 infer.py:95 INFO] Batch 181 >> Inference time:  0.001213
[2020-02-06 02:33:15,207 infer.py:95 INFO] Batch 182 >> Inference time:  0.001204
[2020-02-06 02:33:15,208 infer.py:95 INFO] Batch 183 >> Inference time:  0.001204
[2020-02-06 02:33:15,210 infer.py:95 INFO] Batch 184 >> Inference time:  0.001204
[2020-02-06 02:33:15,212 infer.py:95 INFO] Batch 185 >> Inference time:  0.001214
[2020-02-06 02:33:15,214 infer.py:95 INFO] Batch 186 >> Inference time:  0.001203
[2020-02-06 02:33:15,215 infer.py:95 INFO] Batch 187 >> Inference time:  0.001210
[2020-02-06 02:33:15,217 infer.py:95 INFO] Batch 188 >> Inference time:  0.001204
[2020-02-06 02:33:15,219 infer.py:95 INFO] Batch 189 >> Inference time:  0.001207
[2020-02-06 02:33:15,220 infer.py:95 INFO] Batch 190 >> Inference time:  0.001207
[2020-02-06 02:33:15,222 infer.py:95 INFO] Batch 191 >> Inference time:  0.001209
[2020-02-06 02:33:15,224 infer.py:95 INFO] Batch 192 >> Inference time:  0.001206
[2020-02-06 02:33:15,225 infer.py:95 INFO] Batch 193 >> Inference time:  0.001203
[2020-02-06 02:33:15,227 infer.py:95 INFO] Batch 194 >> Inference time:  0.001212
[2020-02-06 02:33:15,229 infer.py:95 INFO] Batch 195 >> Inference time:  0.001200
[2020-02-06 02:33:15,230 infer.py:95 INFO] Batch 196 >> Inference time:  0.001204
[2020-02-06 02:33:15,232 infer.py:95 INFO] Batch 197 >> Inference time:  0.001212
[2020-02-06 02:33:15,234 infer.py:95 INFO] Batch 198 >> Inference time:  0.001204
[2020-02-06 02:33:15,235 infer.py:95 INFO] Batch 199 >> Inference time:  0.001212
[2020-02-06 02:33:16,253 infer.py:139 INFO] Get mAP score = 0.276376 Target = 0.223860
loading annotations into memory...
Done (t=0.41s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.55s).
Accumulating evaluation results...
DONE (t=0.44s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.276
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.421
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.300
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.030
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.202
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.632
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.246
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.297
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.297
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.034
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.217
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.671
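The infer.py functionality test prints one "Batch N >> Inference time: T" line per image (batch size 1). Batch 0 (0.006726) is an outlier, presumably because of first-iteration overhead. Batches 1 to 9 run near 0.0016 and batches 180 to 199 run near 0.0012. The sketch below parses such lines and summarizes the steady state, dropping batch 0. The regex is fitted to the line format shown here, `summarize_batch_times` is a hypothetical helper, and times are assumed to be in seconds. This page elides batches 10 to 179, so run the sketch on the full log to get a complete figure.

```python
import re
from statistics import mean

# Matches lines such as:
# [2020-02-06 02:33:15,235 infer.py:95 INFO] Batch 199 >> Inference time:  0.001212
BATCH_RE = re.compile(r"Batch (\d+) >> Inference time:\s+([0-9.]+)")


def summarize_batch_times(lines, skip_first=1):
    """Return (count, mean_s, min_s, max_s) over steady-state batches.

    Batches with index < skip_first are dropped (batch 0 is a warm-up
    outlier in these logs). Times are assumed to be in seconds.
    """
    times = []
    for line in lines:
        m = BATCH_RE.search(line)
        if m and int(m.group(1)) >= skip_first:
            times.append(float(m.group(2)))
    if not times:
        raise ValueError("no steady-state batch lines found")
    return len(times), mean(times), min(times), max(times)


sample = [
    "[2020-02-06 02:33:14,856 infer.py:95 INFO] Batch 0 >> Inference time:  0.006726",
    "[2020-02-06 02:33:15,234 infer.py:95 INFO] Batch 198 >> Inference time:  0.001204",
    "[2020-02-06 02:33:15,235 infer.py:95 INFO] Batch 199 >> Inference time:  0.001212",
]
count, avg_s, lo_s, hi_s = summarize_batch_times(sample)
print(count, f"{avg_s * 1e3:.3f} ms", f"~{1 / avg_s:.0f} images/s")
```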

2020-02-04

INT8 CHW4 Inference

# sh ./run_infer_geforcertx2080ti_int8_chw4.sh 
[2020-02-03 04:47:01,267 infer.py:144 INFO] Running accuracy test...
[2020-02-03 04:47:01,267 infer.py:58 INFO] Running SSDMobileNet functionality test for engine [ ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan ] with batch size 1
[TensorRT] VERBOSE: Plugin Creator registration succeeded - GridAnchor_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - NMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Reorg_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Region_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Clip_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - LReLU_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - PriorBox_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Normalize_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - RPROI_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - BatchedNMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - FlattenConcat_TRT
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[TensorRT] VERBOSE: Deserialize required 1340332 microseconds.
[2020-02-03 04:47:02,861 runner.py:38 INFO] Binding Input
[2020-02-03 04:47:02,861 runner.py:38 INFO] Binding Postprocessor
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
loading annotations into memory...
Done (t=0.40s)
creating index...
index created!
[2020-02-03 04:47:03,282 infer.py:85 INFO] Running validation on 200 images. Please wait...
[2020-02-03 04:47:03,294 infer.py:95 INFO] Batch 0 >> Inference time:  0.006780
[2020-02-03 04:47:03,297 infer.py:95 INFO] Batch 1 >> Inference time:  0.002114
[2020-02-03 04:47:03,300 infer.py:95 INFO] Batch 2 >> Inference time:  0.002108
[2020-02-03 04:47:03,303 infer.py:95 INFO] Batch 3 >> Inference time:  0.002115
[2020-02-03 04:47:03,305 infer.py:95 INFO] Batch 4 >> Inference time:  0.002117
[2020-02-03 04:47:03,308 infer.py:95 INFO] Batch 5 >> Inference time:  0.002103
[2020-02-03 04:47:03,311 infer.py:95 INFO] Batch 6 >> Inference time:  0.002104
[2020-02-03 04:47:03,313 infer.py:95 INFO] Batch 7 >> Inference time:  0.002107
[2020-02-03 04:47:03,316 infer.py:95 INFO] Batch 8 >> Inference time:  0.002120
[2020-02-03 04:47:03,319 infer.py:95 INFO] Batch 9 >> Inference time:  0.002108

...

[2020-02-03 04:47:03,747 infer.py:95 INFO] Batch 180 >> Inference time:  0.001611
[2020-02-03 04:47:03,750 infer.py:95 INFO] Batch 181 >> Inference time:  0.001614
[2020-02-03 04:47:03,752 infer.py:95 INFO] Batch 182 >> Inference time:  0.001612
[2020-02-03 04:47:03,754 infer.py:95 INFO] Batch 183 >> Inference time:  0.001612
[2020-02-03 04:47:03,757 infer.py:95 INFO] Batch 184 >> Inference time:  0.001611
[2020-02-03 04:47:03,759 infer.py:95 INFO] Batch 185 >> Inference time:  0.001619
[2020-02-03 04:47:03,762 infer.py:95 INFO] Batch 186 >> Inference time:  0.001608
[2020-02-03 04:47:03,765 infer.py:95 INFO] Batch 187 >> Inference time:  0.001613
[2020-02-03 04:47:03,767 infer.py:95 INFO] Batch 188 >> Inference time:  0.001615
[2020-02-03 04:47:03,770 infer.py:95 INFO] Batch 189 >> Inference time:  0.001615
[2020-02-03 04:47:03,773 infer.py:95 INFO] Batch 190 >> Inference time:  0.001612
[2020-02-03 04:47:03,775 infer.py:95 INFO] Batch 191 >> Inference time:  0.001613
[2020-02-03 04:47:03,778 infer.py:95 INFO] Batch 192 >> Inference time:  0.001615
[2020-02-03 04:47:03,780 infer.py:95 INFO] Batch 193 >> Inference time:  0.001616
[2020-02-03 04:47:03,783 infer.py:95 INFO] Batch 194 >> Inference time:  0.001608
[2020-02-03 04:47:03,785 infer.py:95 INFO] Batch 195 >> Inference time:  0.001617
[2020-02-03 04:47:03,788 infer.py:95 INFO] Batch 196 >> Inference time:  0.001614
[2020-02-03 04:47:03,790 infer.py:95 INFO] Batch 197 >> Inference time:  0.001625
[2020-02-03 04:47:03,793 infer.py:95 INFO] Batch 198 >> Inference time:  0.001617
[2020-02-03 04:47:03,796 infer.py:95 INFO] Batch 199 >> Inference time:  0.001606
Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.54s).
Accumulating evaluation results...
DONE (t=0.44s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.278
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.421
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.310
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.029
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.209
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.641
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.249
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.299
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.299
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.034
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.224
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.676
[2020-02-03 04:47:04,810 infer.py:139 INFO] Get mAP score = 0.278008 Target = 0.223860

INT8 CHW4 Performance

# sh run_harness.sh 
[2020-02-04 07:33:29,904 main.py:293 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-02-04 07:33:29,904 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-02-04 07:33:29,904 main.py:297 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-02-04 07:33:29,904 main.py:113 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
[2020-02-04 07:33:29,906 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.33.29/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'test_mode': 'PerformanceOnly', 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.33.29'}
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.01332s.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : Single Stream
Mode     : Performance
90th percentile latency (ns) : 1904169
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 526.38
QPS w/o loadgen overhead        : 536.11

Min latency (ns)                : 1641304
Max latency (ns)                : 10676396
Mean latency (ns)               : 1865277
50.00 percentile latency (ns)   : 1822757
90.00 percentile latency (ns)   : 1904169
95.00 percentile latency (ns)   : 2119840
97.00 percentile latency (ns)   : 2645702
99.00 percentile latency (ns)   : 2892960
99.90 percentile latency (ns)   : 3605993

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 616.903
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1024
max_query_count : 0
qsl_rng_seed : 3133965575612453542
sample_index_rng_seed : 665484352860916858
schedule_rng_seed : 3622009729038561421
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
print_timestamps : false
performance_issue_unique : false
performance_issue_same : false
performance_issue_same_index : 0
performance_sample_count : 256

No warnings encountered during test.

No errors encountered during test.
Device Device:0 processed:
  31584 batches of size 1
  Memcpy Calls: 0
  PerSampleCudaMemcpy Calls: 0
  BatchedCudaMemcpy Calls: 31584
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-02-04 07:34:36,978 main.py:144 INFO] Result: 90th percentile latency (ns) : 1904169 and Result is : VALID

======================= Perf harness results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: 90th percentile latency (ns) : 1904169 and Result is : VALID


======================= Accuracy results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: No accuracy results in PerformanceOnly mode.
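The 2020-02-04 and 2020-02-06 PerformanceOnly runs use the same engine path, command-line flags, and test parameters, but the later run has lower latency. The sketch below computes the latency ratios from the two summaries. It also re-checks the "Min duration satisfied" and "Min queries satisfied" criteria against min_duration (ms): 60000 and min_query_count : 1024. Estimating duration as completed queries / "QPS w/ loadgen overhead" is an assumption inferred from the numbers. The `PerfRun` type and the `meets_minimums` helper are hypothetical. The log does not say what changed between the two runs.

```python
from dataclasses import dataclass


@dataclass
class PerfRun:
    """Numbers copied from an MLPerf SingleStream summary on this page."""
    label: str
    p90_latency_ns: int
    mean_latency_ns: int
    queries: int
    qps_with_overhead: float


MIN_DURATION_S = 60.0    # min_duration (ms): 60000
MIN_QUERY_COUNT = 1024   # min_query_count : 1024


def meets_minimums(run: PerfRun) -> bool:
    """Check both minimums; duration is estimated (assumption, see lead-in)."""
    est_duration_s = run.queries / run.qps_with_overhead
    return est_duration_s >= MIN_DURATION_S and run.queries >= MIN_QUERY_COUNT


older = PerfRun("2020-02-04", 1_904_169, 1_865_277, 31_584, 526.38)
newer = PerfRun("2020-02-06", 1_606_876, 1_465_147, 40_075, 667.89)

p90_speedup = older.p90_latency_ns / newer.p90_latency_ns
mean_speedup = older.mean_latency_ns / newer.mean_latency_ns
print(f"p90 latency ratio : {p90_speedup:.3f}")
print(f"mean latency ratio: {mean_speedup:.3f}")
print(meets_minimums(older), meets_minimums(newer))
```

On these numbers, the 2020-02-06 run is about 1.19x faster at the 90th percentile and about 1.27x faster on mean latency. Both runs clear the 60 s and 1024-query minimums.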

INT8 CHW4 Accuracy

[2020-02-04 07:34:37,501 main.py:293 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-02-04 07:34:37,501 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-02-04 07:34:37,501 main.py:297 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-02-04 07:34:37,501 main.py:113 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
[2020-02-04 07:34:37,503 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.34.37/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="AccuracyOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'test_mode': 'AccuracyOnly', 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.34.37'}
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.01447s.

No warnings encountered during test.

No errors encountered during test.
Device Device:0 processed:
  5000 batches of size 1
  Memcpy Calls: 0
  PerSampleCudaMemcpy Calls: 0
  BatchedCudaMemcpy Calls: 5000
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-02-04 07:34:54,886 main.py:144 INFO] Result: Cannot find performance result. Maybe you are running in AccuracyOnly mode.
[2020-02-04 07:34:54,892 __init__.py:42 INFO] Running command: python3 build/inference/v0.5/classification_and_detection/tools/accuracy-coco.py --mlperf-accuracy-file /work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.34.37/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf_log_accuracy.json             --coco-dir /work/mlperf/inference_results_v0.5/closed/NVIDIA/build/preprocessed_data/coco --output-file build/ssd-small-results.json
loading annotations into memory...
Done (t=0.41s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.14s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=13.78s).
Accumulating evaluation results...
DONE (t=2.36s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.245
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.371
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.270
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.021
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.172
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.566
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.218
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.275
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.276
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.027
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.198
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.637
mAP=24.467%

======================= Perf harness results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: Cannot find performance result. Maybe you are running in AccuracyOnly mode.


======================= Accuracy results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: Accuracy = 24.467, Threshold = 21.780. Accuracy test PASSED.
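The Average Precision / Average Recall block above matches the summary format printed by pycocotools' `COCOeval.summarize()`. Its first line (AP@[ IoU=0.50:0.95 | area=all | maxDets=100 ] = 0.245) is the headline mAP that accuracy-coco.py reports as 24.467%. The sketch below turns such lines into a lookup table keyed by metric, IoU range, area, and maxDets, which makes runs easy to compare (the 2020-02-06 block reports 0.243 for the same key). `parse_coco_summary` is a hypothetical helper, and the regex is fitted to the spacing shown in this log.

```python
import re

# Matches summary lines such as:
#  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.245
SUMMARY_RE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\) @\[ IoU=(\S+)\s*"
    r"\| area=\s*(\w+) \| maxDets=\s*(\d+) \] = ([0-9.]+)"
)


def parse_coco_summary(lines):
    """Map (metric, iou, area, max_dets) -> value for each summary line."""
    results = {}
    for line in lines:
        m = SUMMARY_RE.search(line)
        if m:
            metric, iou, area, max_dets, value = m.groups()
            results[(metric, iou, area, int(max_dets))] = float(value)
    return results


sample = [
    " Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.245",
    " Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.371",
    " Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.218",
]
summary = parse_coco_summary(sample)
headline = summary[("AP", "0.50:0.95", "all", 100)]
print(f"headline mAP ~ {headline * 100:.1f}%")
```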