Contents
2020-02-05: INT8 CHW4 Performance Only (C++), Accuracy Only (C++), Inference Only (Python)
2020-02-04: INT8 CHW4 Performance Only (C++), Accuracy Only (C++), Inference Only (Python)
2020-02-05
INT8 CHW4 Performance Only (C++)
[2020-02-05 09:28:47,056 main.py:293 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-02-05 09:28:47,057 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-02-05 09:28:47,057 main.py:297 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-02-05 09:28:47,057 main.py:113 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
[2020-02-05 09:28:47,059 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.05-09.28.46/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'test_mode': 'PerformanceOnly', 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.05-09.28.46'}
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.00764s.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : Single Stream
Mode : Performance
90th percentile latency (ns) : 926046
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Yes
================================================
Additional Stats
================================================
QPS w/ loadgen overhead : 1087.70
QPS w/o loadgen overhead : 1114.57
Min latency (ns) : 715923
Max latency (ns) : 11420220
Mean latency (ns) : 897208
50.00 percentile latency (ns) : 873828
90.00 percentile latency (ns) : 926046
95.00 percentile latency (ns) : 945089
97.00 percentile latency (ns) : 1271093
99.00 percentile latency (ns) : 2052013
99.90 percentile latency (ns) : 2900098
================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 616.903
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1024
max_query_count : 0
qsl_rng_seed : 3133965575612453542
sample_index_rng_seed : 665484352860916858
schedule_rng_seed : 3622009729038561421
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
print_timestamps : false
performance_issue_unique : false
performance_issue_same : false
performance_issue_same_index : 0
performance_sample_count : 256
No warnings encountered during test.
No errors encountered during test.
Device Device:0 processed:
65263 batches of size 1
Memcpy Calls: 0
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 65263
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-02-05 09:29:53,965 main.py:144 INFO] Result: 90th percentile latency (ns) : 926046 and Result is : VALID
======================= Perf harness results: =======================
GeforceRTX2080Ti-SingleStream:
ssd-small: 90th percentile latency (ns) : 926046 and Result is : VALID
======================= Accuracy results: =======================
GeforceRTX2080Ti-SingleStream:
ssd-small: No accuracy results in PerformanceOnly mode.
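The figures in the summary above are internally consistent and can be cross-checked by hand: QPS without loadgen overhead is roughly the reciprocal of the mean latency, QPS with loadgen overhead is roughly the processed query count divided by the 60 s minimum test duration, and target_qps follows from the configured --single_stream_expected_latency_ns. A minimal sanity-check sketch in Python, using only values copied from the log above (loadgen uses the exact measured duration, so small differences are expected):

# Values copied from the 2020-02-05 SingleStream performance summary above.
mean_latency_ns = 897_208
queries_processed = 65_263        # "65263 batches of size 1"
min_duration_s = 60.0             # min_duration : 60000 ms
expected_latency_ns = 1_621_000   # --single_stream_expected_latency_ns

print(1e9 / mean_latency_ns)               # ~1114.6 (log: QPS w/o loadgen overhead 1114.57)
print(queries_processed / min_duration_s)  # ~1087.7 (log: QPS w/ loadgen overhead 1087.70)
print(1e9 / expected_latency_ns)           # ~616.9  (log: target_qps 616.903)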
INT8 CHW4 Accuracy Only (C++)
[2020-02-05 09:29:54,513 main.py:293 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-02-05 09:29:54,513 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-02-05 09:29:54,513 main.py:297 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-02-05 09:29:54,513 main.py:113 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
[2020-02-05 09:29:54,515 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.05-09.29.54/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="AccuracyOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'test_mode': 'AccuracyOnly', 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.05-09.29.54'}
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.00796s.
No warnings encountered during test.
No errors encountered during test.
Device Device:0 processed:
5000 batches of size 1
Memcpy Calls: 0
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 5000
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-02-05 09:30:06,512 main.py:144 INFO] Result: Cannot find performance result. Maybe you are running in AccuracyOnly mode.
[2020-02-05 09:30:06,520 __init__.py:42 INFO] Running command: python3 build/inference/v0.5/classification_and_detection/tools/accuracy-coco.py --mlperf-accuracy-file /work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.05-09.29.54/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf_log_accuracy.json --coco-dir /work/mlperf/inference_results_v0.5/closed/NVIDIA/build/preprocessed_data/coco --output-file build/ssd-small-results.json
loading annotations into memory...
Done (t=0.56s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.13s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=13.85s).
Accumulating evaluation results...
DONE (t=2.26s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.237
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.350
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.263
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.018
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.160
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.559
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.214
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.266
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.267
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.022
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.184
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.627
mAP=23.689%
======================= Perf harness results: =======================
GeforceRTX2080Ti-SingleStream:
ssd-small: Cannot find performance result. Maybe you are running in AccuracyOnly mode.
======================= Accuracy results: =======================
GeforceRTX2080Ti-SingleStream:
ssd-small: Accuracy = 23.689, Threshold = 21.780. Accuracy test PASSED.
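The PASSED verdict above follows the MLPerf Inference v0.5 closed-division rule that measured accuracy must reach at least 99% of the FP32 reference accuracy; for ssd-small (SSD-MobileNet on COCO) the reference is 22.0 mAP, which yields the 21.780 threshold printed above. A minimal sketch of that gate (the 22.0 reference value is background knowledge about the v0.5 rules, not taken from this log):

# Accuracy gate for ssd-small in MLPerf Inference v0.5 (closed division).
reference_map = 22.0               # FP32 reference accuracy, percent mAP
threshold = 0.99 * reference_map   # 21.78, matches "Threshold = 21.780" above
measured_map = 23.689              # "mAP=23.689%" reported by accuracy-coco.py
print("PASSED" if measured_map >= threshold else "FAILED")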
INT8 CHW4 Inference Only (Python)
[2020-02-05 09:38:39,322 infer.py:144 INFO] Running accuracy test...
[2020-02-05 09:38:39,322 infer.py:58 INFO] Running SSDMobileNet functionality test for engine [ ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan ] with batch size 1
[TensorRT] VERBOSE: Plugin Creator registration succeeded - GridAnchor_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - NMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Reorg_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Region_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Clip_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - LReLU_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - PriorBox_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Normalize_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - RPROI_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - BatchedNMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - FlattenConcat_TRT
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[TensorRT] VERBOSE: Deserialize required 1209800 microseconds.
[2020-02-05 09:38:40,777 runner.py:38 INFO] Binding Input
[2020-02-05 09:38:40,777 runner.py:38 INFO] Binding Postprocessor
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[2020-02-05 09:38:41,193 infer.py:85 INFO] Running validation on 200 images. Please wait...
[2020-02-05 09:38:41,206 infer.py:95 INFO] Batch 0 >> Inference time: 0.006555
[2020-02-05 09:38:41,208 infer.py:95 INFO] Batch 1 >> Inference time: 0.000942
[2020-02-05 09:38:41,209 infer.py:95 INFO] Batch 2 >> Inference time: 0.000941
[2020-02-05 09:38:41,210 infer.py:95 INFO] Batch 3 >> Inference time: 0.000947
[2020-02-05 09:38:41,212 infer.py:95 INFO] Batch 4 >> Inference time: 0.000936
[2020-02-05 09:38:41,213 infer.py:95 INFO] Batch 5 >> Inference time: 0.000938
[2020-02-05 09:38:41,215 infer.py:95 INFO] Batch 6 >> Inference time: 0.000939
[2020-02-05 09:38:41,216 infer.py:95 INFO] Batch 7 >> Inference time: 0.000943
[2020-02-05 09:38:41,218 infer.py:95 INFO] Batch 8 >> Inference time: 0.000949
[2020-02-05 09:38:41,219 infer.py:95 INFO] Batch 9 >> Inference time: 0.000940
...
[2020-02-05 09:38:41,460 infer.py:95 INFO] Batch 181 >> Inference time: 0.000923
[2020-02-05 09:38:41,461 infer.py:95 INFO] Batch 182 >> Inference time: 0.000923
[2020-02-05 09:38:41,463 infer.py:95 INFO] Batch 183 >> Inference time: 0.000915
[2020-02-05 09:38:41,464 infer.py:95 INFO] Batch 184 >> Inference time: 0.000918
[2020-02-05 09:38:41,465 infer.py:95 INFO] Batch 185 >> Inference time: 0.000923
[2020-02-05 09:38:41,467 infer.py:95 INFO] Batch 186 >> Inference time: 0.000923
[2020-02-05 09:38:41,468 infer.py:95 INFO] Batch 187 >> Inference time: 0.000921
[2020-02-05 09:38:41,469 infer.py:95 INFO] Batch 188 >> Inference time: 0.000923
[2020-02-05 09:38:41,471 infer.py:95 INFO] Batch 189 >> Inference time: 0.000926
[2020-02-05 09:38:41,472 infer.py:95 INFO] Batch 190 >> Inference time: 0.000919
[2020-02-05 09:38:41,474 infer.py:95 INFO] Batch 191 >> Inference time: 0.000919
[2020-02-05 09:38:41,475 infer.py:95 INFO] Batch 192 >> Inference time: 0.000923
[2020-02-05 09:38:41,476 infer.py:95 INFO] Batch 193 >> Inference time: 0.000925
[2020-02-05 09:38:41,478 infer.py:95 INFO] Batch 194 >> Inference time: 0.000921
[2020-02-05 09:38:41,479 infer.py:95 INFO] Batch 195 >> Inference time: 0.000914
[2020-02-05 09:38:41,480 infer.py:95 INFO] Batch 196 >> Inference time: 0.000923
[2020-02-05 09:38:41,482 infer.py:95 INFO] Batch 197 >> Inference time: 0.000922
[2020-02-05 09:38:41,483 infer.py:95 INFO] Batch 198 >> Inference time: 0.000916
[2020-02-05 09:38:41,485 infer.py:95 INFO] Batch 199 >> Inference time: 0.000921
[2020-02-05 09:38:42,485 infer.py:139 INFO] Get mAP score = 0.268132 Target = 0.223860
loading annotations into memory...
Done (t=0.39s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.53s).
Accumulating evaluation results...
DONE (t=0.45s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.268
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.380
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.291
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.022
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.212
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.609
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.237
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.288
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.288
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.027
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.220
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.650
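Aside from batch 0, which absorbs one-time warm-up cost, the per-image inference times above settle near 0.92 ms, consistent with the SingleStream latencies measured by the C++ harness. A small hypothetical helper like the one below can aggregate those timings from a saved infer.py log; the regular expression simply matches the "Batch N >> Inference time: X" lines shown above:

import re
import statistics

def summarize_inference_times(log_path):
    """Summarize per-batch inference times from a saved infer.py log (hypothetical helper)."""
    pattern = re.compile(r"Batch \d+ >> Inference time: ([0-9.]+)")
    times = [float(m.group(1)) for line in open(log_path) for m in pattern.finditer(line)]
    steady = times[1:]  # drop batch 0, which pays the warm-up cost
    mean_s = statistics.mean(steady)
    return {"batches": len(times), "mean_s": mean_s, "approx_images_per_s": 1.0 / mean_s}

# Example (hypothetical file name): summarize_inference_times("infer_2080ti_int8_chw4.log")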
2020-02-04
INT8 CHW4 Performance Only (C++)
[2020-02-04 07:56:39,747 main.py:293 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-02-04 07:56:39,747 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-02-04 07:56:39,747 main.py:297 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-02-04 07:56:39,747 main.py:113 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
[2020-02-04 07:56:39,749 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.56.39/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'test_mode': 'PerformanceOnly', 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.56.39'}
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.0104s.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : Single Stream
Mode : Performance
90th percentile latency (ns) : 1354635
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Yes
================================================
Additional Stats
================================================
QPS w/ loadgen overhead : 715.26
QPS w/o loadgen overhead : 730.47
Min latency (ns) : 1134747
Max latency (ns) : 10589602
Mean latency (ns) : 1368976
50.00 percentile latency (ns) : 1312750
90.00 percentile latency (ns) : 1354635
95.00 percentile latency (ns) : 1568880
97.00 percentile latency (ns) : 2463886
99.00 percentile latency (ns) : 3157033
99.90 percentile latency (ns) : 4397460
================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 616.903
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1024
max_query_count : 0
qsl_rng_seed : 3133965575612453542
sample_index_rng_seed : 665484352860916858
schedule_rng_seed : 3622009729038561421
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
print_timestamps : false
performance_issue_unique : false
performance_issue_same : false
performance_issue_same_index : 0
performance_sample_count : 256
No warnings encountered during test.
No errors encountered during test.
Device Device:0 processed:
42918 batches of size 1
Memcpy Calls: 0
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 42918
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-02-04 07:57:46,751 main.py:144 INFO] Result: 90th percentile latency (ns) : 1354635 and Result is : VALID
======================= Perf harness results: =======================
GeforceRTX2080Ti-SingleStream:
ssd-small: 90th percentile latency (ns) : 1354635 and Result is : VALID
======================= Accuracy results: =======================
GeforceRTX2080Ti-SingleStream:
ssd-small: No accuracy results in PerformanceOnly mode.
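A SingleStream result is reported VALID only when the minimum duration and minimum query count from "Test Parameters Used" are both satisfied; this run issued 42,918 queries at 715.26 QPS with overhead, i.e. roughly 60 s of testing, well above min_query_count = 1024. A rough sketch of that check using the figures above (loadgen applies additional checks that are not reproduced here):

# Rough validity check for the 2020-02-04 SingleStream run, values copied from the summary above.
queries = 42_918                 # "42918 batches of size 1"
qps_with_overhead = 715.26
min_query_count = 1_024
min_duration_s = 60.0            # min_duration : 60000 ms

duration_s = queries / qps_with_overhead             # ~60.0 s
print("min queries satisfied :", queries >= min_query_count)
print("min duration satisfied:", duration_s >= min_duration_s)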
INT8 CHW4 Accuracy Only (C++)
[2020-02-04 07:57:47,296 main.py:293 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-02-04 07:57:47,296 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-02-04 07:57:47,297 main.py:297 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-02-04 07:57:47,297 main.py:113 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
[2020-02-04 07:57:47,299 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.57.46/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="AccuracyOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'test_mode': 'AccuracyOnly', 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.57.46'}
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.00992s.
No warnings encountered during test.
No errors encountered during test.
Device Device:0 processed:
5000 batches of size 1
Memcpy Calls: 0
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 5000
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-02-04 07:58:01,688 main.py:144 INFO] Result: Cannot find performance result. Maybe you are running in AccuracyOnly mode.
[2020-02-04 07:58:01,694 __init__.py:42 INFO] Running command: python3 build/inference/v0.5/classification_and_detection/tools/accuracy-coco.py --mlperf-accuracy-file /work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.57.46/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf_log_accuracy.json --coco-dir /work/mlperf/inference_results_v0.5/closed/NVIDIA/build/preprocessed_data/coco --output-file build/ssd-small-results.json
loading annotations into memory...
Done (t=0.42s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.09s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=13.17s).
Accumulating evaluation results...
DONE (t=2.20s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.240
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.354
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.265
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.018
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.163
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.566
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.216
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.269
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.269
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.023
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.189
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.635
mAP=23.957%
======================= Perf harness results: =======================
GeforceRTX2080Ti-SingleStream:
ssd-small: Cannot find performance result. Maybe you are running in AccuracyOnly mode.
======================= Accuracy results: =======================
GeforceRTX2080Ti-SingleStream:
ssd-small: Accuracy = 23.957, Threshold = 21.780. Accuracy test PASSED.
INT8 CHW4 Inference Only (Python)
# sh run_infer_geforcertx2080ti_int8_chw4.sh
[2020-02-04 08:05:41,176 infer.py:144 INFO] Running accuracy test...
[2020-02-04 08:05:41,176 infer.py:58 INFO] Running SSDMobileNet functionality test for engine [ ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan ] with batch size 1
[TensorRT] VERBOSE: Plugin Creator registration succeeded - GridAnchor_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - NMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Reorg_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Region_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Clip_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - LReLU_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - PriorBox_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Normalize_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - RPROI_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - BatchedNMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - FlattenConcat_TRT
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[TensorRT] VERBOSE: Deserialize required 1218964 microseconds.
[2020-02-04 08:05:42,639 runner.py:38 INFO] Binding Input
[2020-02-04 08:05:42,639 runner.py:38 INFO] Binding Postprocessor
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
loading annotations into memory...
Done (t=0.39s)
creating index...
index created!
[2020-02-04 08:05:43,058 infer.py:85 INFO] Running validation on 200 images. Please wait...
[2020-02-04 08:05:43,071 infer.py:95 INFO] Batch 0 >> Inference time: 0.006978
[2020-02-04 08:05:43,073 infer.py:95 INFO] Batch 1 >> Inference time: 0.001455
[2020-02-04 08:05:43,076 infer.py:95 INFO] Batch 2 >> Inference time: 0.001454
[2020-02-04 08:05:43,077 infer.py:95 INFO] Batch 3 >> Inference time: 0.001468
[2020-02-04 08:05:43,080 infer.py:95 INFO] Batch 4 >> Inference time: 0.001462
[2020-02-04 08:05:43,082 infer.py:95 INFO] Batch 5 >> Inference time: 0.001457
[2020-02-04 08:05:43,084 infer.py:95 INFO] Batch 6 >> Inference time: 0.001449
[2020-02-04 08:05:43,085 infer.py:95 INFO] Batch 7 >> Inference time: 0.001456
[2020-02-04 08:05:43,087 infer.py:95 INFO] Batch 8 >> Inference time: 0.001463
[2020-02-04 08:05:43,089 infer.py:95 INFO] Batch 9 >> Inference time: 0.001452
...
[2020-02-04 08:05:43,413 infer.py:95 INFO] Batch 180 >> Inference time: 0.001115
[2020-02-04 08:05:43,414 infer.py:95 INFO] Batch 181 >> Inference time: 0.001124
[2020-02-04 08:05:43,416 infer.py:95 INFO] Batch 182 >> Inference time: 0.001114
[2020-02-04 08:05:43,418 infer.py:95 INFO] Batch 183 >> Inference time: 0.001123
[2020-02-04 08:05:43,419 infer.py:95 INFO] Batch 184 >> Inference time: 0.001132
[2020-02-04 08:05:43,421 infer.py:95 INFO] Batch 185 >> Inference time: 0.001131
[2020-02-04 08:05:43,423 infer.py:95 INFO] Batch 186 >> Inference time: 0.001129
[2020-02-04 08:05:43,425 infer.py:95 INFO] Batch 187 >> Inference time: 0.001127
[2020-02-04 08:05:43,426 infer.py:95 INFO] Batch 188 >> Inference time: 0.001120
[2020-02-04 08:05:43,428 infer.py:95 INFO] Batch 189 >> Inference time: 0.001125
[2020-02-04 08:05:43,429 infer.py:95 INFO] Batch 190 >> Inference time: 0.001122
[2020-02-04 08:05:43,431 infer.py:95 INFO] Batch 191 >> Inference time: 0.001122
[2020-02-04 08:05:43,433 infer.py:95 INFO] Batch 192 >> Inference time: 0.001121
[2020-02-04 08:05:43,434 infer.py:95 INFO] Batch 193 >> Inference time: 0.001124
[2020-02-04 08:05:43,436 infer.py:95 INFO] Batch 194 >> Inference time: 0.001124
[2020-02-04 08:05:43,437 infer.py:95 INFO] Batch 195 >> Inference time: 0.001130
[2020-02-04 08:05:43,439 infer.py:95 INFO] Batch 196 >> Inference time: 0.001132
[2020-02-04 08:05:43,441 infer.py:95 INFO] Batch 197 >> Inference time: 0.001130
[2020-02-04 08:05:43,442 infer.py:95 INFO] Batch 198 >> Inference time: 0.001121
[2020-02-04 08:05:43,444 infer.py:95 INFO] Batch 199 >> Inference time: 0.001116
Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.53s).
Accumulating evaluation results...
DONE (t=0.45s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.271
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.393
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.294
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.023
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.228
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.606
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.239
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.296
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.297
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.028
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.237
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.656
[2020-02-04 08:05:44,444 infer.py:139 INFO] Get mAP score = 0.271346 Target = 0.223860
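Comparing the two performance runs recorded above, the 90th percentile SingleStream latency dropped from 1,354,635 ns on 2020-02-04 to 926,046 ns on 2020-02-05, with QPS rising from about 715 to about 1088. The arithmetic below only restates figures already present in the two summaries:

# Relative change between the two SingleStream performance runs logged above.
latency_0204_ns = 1_354_635
latency_0205_ns = 926_046
reduction = (latency_0204_ns - latency_0205_ns) / latency_0204_ns
print(f"90th percentile latency reduced by ~{reduction:.1%}")   # ~31.6%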