GeforceRTX2080Ti - wom-ai/inference_results_v0.5 GitHub Wiki

2020-06-11

INT8 CHW4 Performance mode only (C++)

# sh run_harness.sh 
[2020-06-11 11:18:46,721 main.py:303 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-06-11 11:18:46,721 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-06-11 11:18:46,721 main.py:307 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-06-11 11:18:46,721 main.py:117 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
BenchmarkHarness (
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'test_mode': 'PerformanceOnly', 'warmup_duration': 5.0, 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.06.11-11.18.46'}
BenchmarkHarness )
=========================================================
argstr:
--plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.06.11-11.18.46/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --warmup_duration=5.0 --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small
[2020-06-11 11:18:46,724 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.06.11-11.18.46/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --warmup_duration=5.0 --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
model: ssd-small
scenario: SingleStream
multi_stream_samples_per_query :1
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
model: ssd-small
scenario: SingleStream
multi_stream_samples_per_query :1
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.00774s.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : Single Stream
Mode     : Performance
90th percentile latency (ns) : 763428
Result is : INVALID
  Min duration satisfied : NO
  Min queries satisfied : Yes
Recommendations:
 * Decrease the expected latency so the loadgen pre-generates more queries.

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 1392.17
QPS w/o loadgen overhead        : 1429.70

Min latency (ns)                : 565842
Max latency (ns)                : 16154016
Mean latency (ns)               : 699449
50.00 percentile latency (ns)   : 676799
90.00 percentile latency (ns)   : 763428
95.00 percentile latency (ns)   : 796025
97.00 percentile latency (ns)   : 837434
99.00 percentile latency (ns)   : 1373752
99.90 percentile latency (ns)   : 1777017

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 616.903
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1024
max_query_count : 0
qsl_rng_seed : 3133965575612453542
sample_index_rng_seed : 665484352860916858
schedule_rng_seed : 3622009729038561421
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
print_timestamps : false
performance_issue_unique : false
performance_issue_same : false
performance_issue_same_index : 0
performance_sample_count : 256

No warnings encountered during test.

2 ERRORS encountered. See detailed log.
Device Device:0 processed:
  74030 batches of size 1
  Memcpy Calls: 0
  PerSampleCudaMemcpy Calls: 0
  BatchedCudaMemcpy Calls: 74030
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-06-11 11:19:46,487 main.py:154 INFO] Result: 90th percentile latency (ns) : 763428 and Result is : INVALID

======================= Perf harness results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: 90th percentile latency (ns) : 763428 and Result is : INVALID


======================= Accuracy results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: No accuracy results in PerformanceOnly mode.
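The first run above comes back INVALID ("Min duration satisfied : NO"). A rough, back-of-the-envelope check with the numbers that run actually reported (74030 batches of size 1, mean latency 699449 ns, min_duration 60000 ms) shows why: the queries loadgen pre-generated from the expected latency of 1621000 ns all complete well before 60 s of wall time, since the real per-query latency is only ~0.7 ms. This is only an illustrative sketch of the arithmetic, not part of the harness.

```python
# Sanity check on the INVALID SingleStream run, using the values from its log.
queries = 74030            # "74030 batches of size 1"
mean_latency_ns = 699_449  # "Mean latency (ns) : 699449"
min_duration_s = 60.0      # "min_duration (ms): 60000"

# Approximate wall time spent answering queries: short of the 60 s minimum,
# which is why loadgen flags the result INVALID and recommends decreasing
# --single_stream_expected_latency_ns so more queries are pre-generated.
run_time_s = queries * mean_latency_ns * 1e-9
print(f"approx. run time: {run_time_s:.1f} s")  # ~51.8 s < 60 s

# Cross-check: "QPS w/o loadgen overhead" is the reciprocal of mean latency.
qps_no_overhead = 1e9 / mean_latency_ns
print(f"1 / mean latency = {qps_no_overhead:.2f} QPS")  # matches 1429.70
```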
[2020-06-11 11:19:47,041 main.py:303 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB2/config.json
[2020-06-11 11:19:47,041 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB2/config.json ...
[2020-06-11 11:19:47,041 main.py:307 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStreamB2"
[2020-06-11 11:19:47,041 main.py:117 INFO] Running harness for ssd-small benchmark in SingleStreamB2 scenario...
BenchmarkHarness (
{'gpu_multi_stream_samples_per_query': 2, 'gpu_batch_size': 2, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStreamB2', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStreamB2', 'test_mode': 'PerformanceOnly', 'warmup_duration': 5.0, 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.06.11-11.19.46'}
BenchmarkHarness )
=========================================================
argstr:
--plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.06.11-11.19.46/GeforceRTX2080Ti/ssd-small/SingleStreamB2" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --warmup_duration=5.0 --use_graphs=false --gpu_batch_size=2 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStreamB2/ssd-small-SingleStreamB2-gpu-b2-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB2/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB2/user.conf" --scenario SingleStreamB2 --model ssd-small
[2020-06-11 11:19:47,043 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.06.11-11.19.46/GeforceRTX2080Ti/ssd-small/SingleStreamB2" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --warmup_duration=5.0 --use_graphs=false --gpu_batch_size=2 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStreamB2/ssd-small-SingleStreamB2-gpu-b2-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB2/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB2/user.conf" --scenario SingleStreamB2 --model ssd-small --response_postprocess coco
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB2/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB2/user.conf
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
path: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB2/mlperf.conf
model: ssd-small
scenario: SingleStreamB2
multi_stream_samples_per_query :1
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
path: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB2/user.conf
model: ssd-small
scenario: SingleStreamB2
multi_stream_samples_per_query :2
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStreamB2/ssd-small-SingleStreamB2-gpu-b2-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.00759s.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : Single Stream
Mode     : Performance
90th percentile latency (ns) : 865908
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 2326.76
QPS w/o loadgen overhead        : 1194.28

Min latency (ns)                : 668459
Max latency (ns)                : 11928798
Mean latency (ns)               : 837328
50.00 percentile latency (ns)   : 820278
90.00 percentile latency (ns)   : 865908
95.00 percentile latency (ns)   : 898003
97.00 percentile latency (ns)   : 1326685
99.00 percentile latency (ns)   : 1564436
99.90 percentile latency (ns)   : 2032594

================================================
Test Parameters Used
================================================
samples_per_query : 2
target_qps : 616.903
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1024
max_query_count : 0
qsl_rng_seed : 3133965575612453542
sample_index_rng_seed : 665484352860916858
schedule_rng_seed : 3622009729038561421
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
print_timestamps : false
performance_issue_unique : false
performance_issue_same : false
performance_issue_same_index : 0
performance_sample_count : 256

No warnings encountered during test.

1 ERROR encountered. See detailed log.
Device Device:0 processed:
  69804 batches of size 2
  Memcpy Calls: 0
  PerSampleCudaMemcpy Calls: 139098
  BatchedCudaMemcpy Calls: 255
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-06-11 11:20:53,660 main.py:154 INFO] Result: 90th percentile latency (ns) : 865908 and Result is : VALID

======================= Perf harness results: =======================

GeforceRTX2080Ti-SingleStreamB2:
    ssd-small: 90th percentile latency (ns) : 865908 and Result is : VALID


======================= Accuracy results: =======================

GeforceRTX2080Ti-SingleStreamB2:
    ssd-small: No accuracy results in PerformanceOnly mode.
[2020-06-11 11:20:54,248 main.py:303 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB4/config.json
[2020-06-11 11:20:54,248 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB4/config.json ...
[2020-06-11 11:20:54,248 main.py:307 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStreamB4"
[2020-06-11 11:20:54,248 main.py:117 INFO] Running harness for ssd-small benchmark in SingleStreamB4 scenario...
BenchmarkHarness (
{'gpu_multi_stream_samples_per_query': 4, 'gpu_batch_size': 4, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStreamB4', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStreamB4', 'test_mode': 'PerformanceOnly', 'warmup_duration': 5.0, 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.06.11-11.20.53'}
BenchmarkHarness )
=========================================================
argstr:
--plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.06.11-11.20.53/GeforceRTX2080Ti/ssd-small/SingleStreamB4" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --warmup_duration=5.0 --use_graphs=false --gpu_batch_size=4 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStreamB4/ssd-small-SingleStreamB4-gpu-b4-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB4/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB4/user.conf" --scenario SingleStreamB4 --model ssd-small
[2020-06-11 11:20:54,250 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.06.11-11.20.53/GeforceRTX2080Ti/ssd-small/SingleStreamB4" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --warmup_duration=5.0 --use_graphs=false --gpu_batch_size=4 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStreamB4/ssd-small-SingleStreamB4-gpu-b4-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB4/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB4/user.conf" --scenario SingleStreamB4 --model ssd-small --response_postprocess coco
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB4/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB4/user.conf
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
path: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB4/mlperf.conf
model: ssd-small
scenario: SingleStreamB4
multi_stream_samples_per_query :1
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
path: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB4/user.conf
model: ssd-small
scenario: SingleStreamB4
multi_stream_samples_per_query :4
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStreamB4/ssd-small-SingleStreamB4-gpu-b4-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.00964s.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : Single Stream
Mode     : Performance
90th percentile latency (ns) : 1048366
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 3798.38
QPS w/o loadgen overhead        : 977.39

Min latency (ns)                : 859597
Max latency (ns)                : 9310222
Mean latency (ns)               : 1023133
50.00 percentile latency (ns)   : 1017991
90.00 percentile latency (ns)   : 1048366
95.00 percentile latency (ns)   : 1060626
97.00 percentile latency (ns)   : 1076371
99.00 percentile latency (ns)   : 1420701
99.90 percentile latency (ns)   : 2157515

================================================
Test Parameters Used
================================================
samples_per_query : 4
target_qps : 616.903
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1024
max_query_count : 0
qsl_rng_seed : 3133965575612453542
sample_index_rng_seed : 665484352860916858
schedule_rng_seed : 3622009729038561421
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
print_timestamps : false
performance_issue_unique : false
performance_issue_same : false
performance_issue_same_index : 0
performance_sample_count : 256

No warnings encountered during test.

1 ERROR encountered. See detailed log.
Device Device:0 processed:
  56976 batches of size 4
  Memcpy Calls: 0
  PerSampleCudaMemcpy Calls: 227904
  BatchedCudaMemcpy Calls: 0
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-06-11 11:22:00,893 main.py:154 INFO] Result: 90th percentile latency (ns) : 1048366 and Result is : VALID

======================= Perf harness results: =======================

GeforceRTX2080Ti-SingleStreamB4:
    ssd-small: 90th percentile latency (ns) : 1048366 and Result is : VALID


======================= Accuracy results: =======================

GeforceRTX2080Ti-SingleStreamB4:
    ssd-small: No accuracy results in PerformanceOnly mode.
[2020-06-11 11:22:01,421 main.py:303 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB8/config.json
[2020-06-11 11:22:01,421 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB8/config.json ...
[2020-06-11 11:22:01,421 main.py:307 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStreamB8"
[2020-06-11 11:22:01,421 main.py:117 INFO] Running harness for ssd-small benchmark in SingleStreamB8 scenario...
BenchmarkHarness (
{'gpu_multi_stream_samples_per_query': 8, 'gpu_batch_size': 8, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStreamB8', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStreamB8', 'test_mode': 'PerformanceOnly', 'warmup_duration': 5.0, 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.06.11-11.22.01'}
BenchmarkHarness )
=========================================================
argstr:
--plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.06.11-11.22.01/GeforceRTX2080Ti/ssd-small/SingleStreamB8" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --warmup_duration=5.0 --use_graphs=false --gpu_batch_size=8 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStreamB8/ssd-small-SingleStreamB8-gpu-b8-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB8/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB8/user.conf" --scenario SingleStreamB8 --model ssd-small
[2020-06-11 11:22:01,423 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.06.11-11.22.01/GeforceRTX2080Ti/ssd-small/SingleStreamB8" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --warmup_duration=5.0 --use_graphs=false --gpu_batch_size=8 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStreamB8/ssd-small-SingleStreamB8-gpu-b8-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB8/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB8/user.conf" --scenario SingleStreamB8 --model ssd-small --response_postprocess coco
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB8/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB8/user.conf
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
path: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB8/mlperf.conf
model: ssd-small
scenario: SingleStreamB8
multi_stream_samples_per_query :1
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
path: measurements/GeforceRTX2080Ti/ssd-small/SingleStreamB8/user.conf
model: ssd-small
scenario: SingleStreamB8
multi_stream_samples_per_query :8
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStreamB8/ssd-small-SingleStreamB8-gpu-b8-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.01281s.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : Single Stream
Mode     : Performance
90th percentile latency (ns) : 1519506
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 5350.43
QPS w/o loadgen overhead        : 686.26

Min latency (ns)                : 1277011
Max latency (ns)                : 13146655
Mean latency (ns)               : 1457174
50.00 percentile latency (ns)   : 1453995
90.00 percentile latency (ns)   : 1519506
95.00 percentile latency (ns)   : 1553000
97.00 percentile latency (ns)   : 1582855
99.00 percentile latency (ns)   : 1856573
99.90 percentile latency (ns)   : 2858533

================================================
Test Parameters Used
================================================
samples_per_query : 8
target_qps : 616.903
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1024
max_query_count : 0
qsl_rng_seed : 3133965575612453542
sample_index_rng_seed : 665484352860916858
schedule_rng_seed : 3622009729038561421
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
print_timestamps : false
performance_issue_unique : false
performance_issue_same : false
performance_issue_same_index : 0
performance_sample_count : 256

No warnings encountered during test.

1 ERROR encountered. See detailed log.
Device Device:0 processed:
  40129 batches of size 8
  Memcpy Calls: 0
  PerSampleCudaMemcpy Calls: 321032
  BatchedCudaMemcpy Calls: 0
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-06-11 11:23:08,094 main.py:154 INFO] Result: 90th percentile latency (ns) : 1519506 and Result is : VALID

======================= Perf harness results: =======================

GeforceRTX2080Ti-SingleStreamB8:
    ssd-small: 90th percentile latency (ns) : 1519506 and Result is : VALID


======================= Accuracy results: =======================

GeforceRTX2080Ti-SingleStreamB8:
    ssd-small: No accuracy results in PerformanceOnly mode.
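Pulling the headline numbers from the four 2020-06-11 runs together: per-query latency grows sub-linearly with batch size, so effective sample throughput keeps climbing. Note that in the B2/B4/B8 runs the reported "QPS w/ loadgen overhead" counts individual samples, consistent with the logged batch counts (e.g. 69804 batches of 2 in ~60 s is roughly 2327 samples/s). The sketch below just tabulates the logged values; it is not harness output.

```python
# Batch-size sweep summarized from the four runs above:
# (gpu_batch_size, 90th percentile latency in ns, QPS w/ loadgen overhead).
runs = [
    (1, 763_428, 1392.17),
    (2, 865_908, 2326.76),
    (4, 1_048_366, 3798.38),
    (8, 1_519_506, 5350.43),
]

for batch, p90_ns, qps in runs:
    # Per-sample latency cost shrinks as the batch grows, even though
    # the per-query latency (and thus the SingleStream metric) rises.
    per_sample_us = p90_ns / batch / 1000
    print(f"b{batch}: p90 {p90_ns / 1e6:.3f} ms "
          f"({per_sample_us:.0f} us/sample), {qps:.0f} samples/s")
```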

2020-02-03

INT8 CHW4 Performance mode only (C++)

[2020-02-04 07:48:29,885 main.py:293 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-02-04 07:48:29,886 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-02-04 07:48:29,886 main.py:297 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-02-04 07:48:29,886 main.py:113 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
[2020-02-04 07:48:29,888 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.48.29/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'test_mode': 'PerformanceOnly', 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.48.29'}
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.00845s.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : Single Stream
Mode     : Performance
90th percentile latency (ns) : 744387
Result is : INVALID
  Min duration satisfied : NO
  Min queries satisfied : Yes
Recommendations:
 * Decrease the expected latency so the loadgen pre-generates more queries.

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 1456.14
QPS w/o loadgen overhead        : 1494.25

Min latency (ns)                : 565862
Max latency (ns)                : 13811889
Mean latency (ns)               : 669231
50.00 percentile latency (ns)   : 648871
90.00 percentile latency (ns)   : 744387
95.00 percentile latency (ns)   : 772777
97.00 percentile latency (ns)   : 793756
99.00 percentile latency (ns)   : 1139579
99.90 percentile latency (ns)   : 1980274

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 616.903
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1024
max_query_count : 0
qsl_rng_seed : 3133965575612453542
sample_index_rng_seed : 665484352860916858
schedule_rng_seed : 3622009729038561421
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
print_timestamps : false
performance_issue_unique : false
performance_issue_same : false
performance_issue_same_index : 0
performance_sample_count : 256

No warnings encountered during test.

1 ERROR encountered. See detailed log.
Device Device:0 processed:
  74030 batches of size 1
  Memcpy Calls: 0
  PerSampleCudaMemcpy Calls: 0
  BatchedCudaMemcpy Calls: 74030
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-02-04 07:49:27,719 main.py:144 INFO] Result: 90th percentile latency (ns) : 744387 and Result is : INVALID

======================= Perf harness results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: 90th percentile latency (ns) : 744387 and Result is : INVALID


======================= Accuracy results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: No accuracy results in PerformanceOnly mode.

INT8 CHW4 Accuracy mode only (C++)

[2020-02-04 07:49:28,257 main.py:293 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-02-04 07:49:28,257 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-02-04 07:49:28,257 main.py:297 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-02-04 07:49:28,257 main.py:113 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
[2020-02-04 07:49:28,259 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.49.27/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="AccuracyOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'test_mode': 'AccuracyOnly', 'log_dir': '/work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.49.27'}
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.00712s.

No warnings encountered during test.

No errors encountered during test.
Device Device:0 processed:
  5000 batches of size 1
  Memcpy Calls: 0
  PerSampleCudaMemcpy Calls: 0
  BatchedCudaMemcpy Calls: 5000
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-02-04 07:49:39,188 main.py:144 INFO] Result: Cannot find performance result. Maybe you are running in AccuracyOnly mode.
[2020-02-04 07:49:39,194 __init__.py:42 INFO] Running command: python3 build/inference/v0.5/classification_and_detection/tools/accuracy-coco.py --mlperf-accuracy-file /work/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.02.04-07.49.27/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf_log_accuracy.json             --coco-dir /work/mlperf/inference_results_v0.5/closed/NVIDIA/build/preprocessed_data/coco --output-file build/ssd-small-results.json
loading annotations into memory...
Done (t=0.41s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.14s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=13.84s).
Accumulating evaluation results...
DONE (t=2.27s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.229
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.346
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.253
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.017
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.164
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.525
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.207
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.261
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.261
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.021
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.189
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.598
mAP=22.911%

======================= Perf harness results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: Cannot find performance result. Maybe you are running in AccuracyOnly mode.


======================= Accuracy results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: Accuracy = 22.911, Threshold = 21.780. Accuracy test PASSED.
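The pass/fail line above is a simple threshold comparison. The 21.780 threshold is consistent with 99% of an assumed 22.0 reference mAP for ssd-small; the reference value and the 99% factor are assumptions here, not taken from this log:

```python
# Hypothetical reconstruction of the accuracy check reported above.
reference_map = 22.0              # assumed ssd-small reference accuracy
threshold = reference_map * 0.99  # assumed 99% passing criterion
measured_map = 22.911             # "mAP=22.911%" from the log

print(f"Accuracy = {measured_map}, Threshold = {threshold:.3f}")
print("PASSED" if measured_map >= threshold else "FAILED")  # PASSED
```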

2020-01-10

FP32 Performance mode only (C++)

  • Unfortunately, the NVIDIA C++ harness code does not support float32.
# sh run_harness.sh 
[2020-01-10 07:49:41,536 main.py:291 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-01-10 07:49:41,537 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-01-10 07:49:41,537 main.py:295 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-01-10 07:49:41,537 main.py:111 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'float32', 'input_format': 'linear', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'float32', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/float32_linear', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'test_mode': 'PerformanceOnly', 'log_dir': '/mnt/home/jylee/work2/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.01.10-07.49.41'}
[2020-01-10 07:49:41,539 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/mnt/home/jylee/work2/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.01.10-07.49.41/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/float32_linear" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-float32.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-float32.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.01566s.
F0110 07:49:48.296392  1091 qsl.hpp:145] Check failed: fs Unable to open: /mnt/home/jylee/work2/mlperf/inference_results_v0.5/closed/NVIDIA/build/preprocessed_data/coco/val2017/SSDMobileNet/float32_linear/000000108503.jpg.npy
*** Check failure stack trace: ***
    @     0x7f2d1c656362  google::LogMessage::Fail()
    @     0x7f2d1c6562aa  google::LogMessage::SendToLog()
    @     0x7f2d1c655beb  google::LogMessage::Flush()
    @     0x7f2d1c659066  google::LogMessageFatal::~LogMessageFatal()
    @     0x56179020b39f  qsl::SampleLibrary::LoadSamplesToRam()
    @     0x5617902932a3  mlperf::loadgen::RunPerformanceMode<>()
    @     0x56179027d512  mlperf::StartTest()
    @     0x561790205e95  doInference()
    @     0x56179020388f  main
    @     0x7f2d0e3cfb6b  __libc_start_main
    @     0x561790203f9a  _start
    @              (nil)  (unknown)
Aborted (core dumped)
Traceback (most recent call last):
  File "code/main.py", line 327, in <module>
    main()
  File "code/main.py", line 319, in main
    handle_run_harness(benchmark_name, benchmark_conf, need_gpu, need_dla)
  File "code/main.py", line 141, in handle_run_harness
    result = harness.run_harness()
  File "/mnt/home/jylee/work2/mlperf/inference_results_v0.5/closed/NVIDIA/code/common/harness.py", line 240, in run_harness
    output = run_command(cmd, get_output=True)
  File "/mnt/home/jylee/work2/mlperf/inference_results_v0.5/closed/NVIDIA/code/common/__init__.py", line 58, in run_command
    raise subprocess.CalledProcessError(ret, cmd)
subprocess.CalledProcessError: Command './build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/mnt/home/jylee/work2/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.01.10-07.49.41/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --test_mode="PerformanceOnly" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/float32_linear" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-float32.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco' returned non-zero exit status 134.
Makefile:303: recipe for target 'run_harness' failed
make: *** [run_harness] Error 1
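The abort above is a missing input tensor: the harness expects every COCO validation image as a preprocessed .npy file under the float32_linear directory, and those files were never generated. A sketch of the file format the loader expects; the 3x300x300 CHW shape is an assumption based on SSD-MobileNet's input size and the 'float32'/'linear' config, not NVIDIA's actual preprocessing code:

```python
import numpy as np

# Write one preprocessed sample in the layout implied by the config:
# float32 values, channel-major ("linear" CHW) layout, one file per image.
h, w = 300, 300                                   # SSD-MobileNet input size
img = np.random.rand(3, h, w).astype(np.float32)  # stand-in for real pixel data
np.save("000000108503.jpg.npy", img)              # filename pattern from the log

loaded = np.load("000000108503.jpg.npy")
print(loaded.dtype, loaded.shape)  # float32 (3, 300, 300)
```

In the reference setup these files come from the repository's preprocessing step, not from hand-written scripts like this.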

FP32 Inference Only (Python)

# sh ./run_infer_geforcertx2080ti_fp32.sh 
[2020-01-10 07:28:12,487 infer.py:144 INFO] Running accuracy test...
[2020-01-10 07:28:12,487 infer.py:58 INFO] Running SSDMobileNet functionality test for engine [ ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-float32.plan ] with batch size 1
[TensorRT] VERBOSE: Plugin Creator registration succeeded - GridAnchor_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - NMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Reorg_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Region_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Clip_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - LReLU_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - PriorBox_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Normalize_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - RPROI_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - BatchedNMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - FlattenConcat_TRT
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[TensorRT] VERBOSE: Deserialize required 1283583 microseconds.
[2020-01-10 07:28:14,026 runner.py:38 INFO] Binding Input
[2020-01-10 07:28:14,026 runner.py:38 INFO] Binding Postprocessor
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
loading annotations into memory...
Done (t=0.40s)
creating index...
index created!
[2020-01-10 07:28:14,445 infer.py:85 INFO] Running validation on 100 images. Please wait...
[2020-01-10 07:28:14,459 infer.py:95 INFO] Batch 0 >> Inference time:  0.007466
[2020-01-10 07:28:14,462 infer.py:95 INFO] Batch 1 >> Inference time:  0.001644
[2020-01-10 07:28:14,464 infer.py:95 INFO] Batch 2 >> Inference time:  0.001649
[2020-01-10 07:28:14,467 infer.py:95 INFO] Batch 3 >> Inference time:  0.001646
[2020-01-10 07:28:14,469 infer.py:95 INFO] Batch 4 >> Inference time:  0.001648
[2020-01-10 07:28:14,471 infer.py:95 INFO] Batch 5 >> Inference time:  0.001648
[2020-01-10 07:28:14,474 infer.py:95 INFO] Batch 6 >> Inference time:  0.001639
[2020-01-10 07:28:14,476 infer.py:95 INFO] Batch 7 >> Inference time:  0.001646
[2020-01-10 07:28:14,478 infer.py:95 INFO] Batch 8 >> Inference time:  0.001654
[2020-01-10 07:28:14,480 infer.py:95 INFO] Batch 9 >> Inference time:  0.001646
...
[2020-01-10 07:28:14,668 infer.py:95 INFO] Batch 90 >> Inference time:  0.001299
[2020-01-10 07:28:14,670 infer.py:95 INFO] Batch 91 >> Inference time:  0.001302
[2020-01-10 07:28:14,672 infer.py:95 INFO] Batch 92 >> Inference time:  0.001297
[2020-01-10 07:28:14,674 infer.py:95 INFO] Batch 93 >> Inference time:  0.001299
[2020-01-10 07:28:14,676 infer.py:95 INFO] Batch 94 >> Inference time:  0.001292
[2020-01-10 07:28:14,678 infer.py:95 INFO] Batch 95 >> Inference time:  0.001303
[2020-01-10 07:28:14,680 infer.py:95 INFO] Batch 96 >> Inference time:  0.001288
[2020-01-10 07:28:14,682 infer.py:95 INFO] Batch 97 >> Inference time:  0.001298
[2020-01-10 07:28:14,684 infer.py:95 INFO] Batch 98 >> Inference time:  0.001292
[2020-01-10 07:28:14,686 infer.py:95 INFO] Batch 99 >> Inference time:  0.001298
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.29s).
Accumulating evaluation results...
DONE (t=0.32s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.301
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.440
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.333
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.022
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.214
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.623
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.255
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.317
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.318
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.025
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.216
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.659
[2020-01-10 07:28:15,312 infer.py:139 INFO] Get mAP score = 0.300559 Target = 0.223860
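The steady-state per-batch times above imply a rough FP32 throughput; a quick conversion, ignoring the warm-up-skewed first batch:

```python
# Per-batch latency -> throughput at batch size 1.
batch_time_s = 0.001298       # a representative late-batch FP32 time above
throughput = 1.0 / batch_time_s
print(f"~{throughput:.0f} images/s")  # ~770 images/s
```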

2020-01-07

INT8 CHW4 Performance mode only (C++)

root@35192637cbac:/inference_results_v0.5/closed/NVIDIA# sh run_harness.sh 
[2020-01-07 12:03:52,643 main.py:291 INFO] Using config files: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json
[2020-01-07 12:03:52,643 __init__.py:142 INFO] Parsing config file measurements/GeforceRTX2080Ti/ssd-small/SingleStream/config.json ...
[2020-01-07 12:03:52,643 main.py:295 INFO] Processing config "GeforceRTX2080Ti_ssd-small_SingleStream"
[2020-01-07 12:03:52,643 main.py:111 INFO] Running harness for ssd-small benchmark in SingleStream scenario...
{'gpu_batch_size': 1, 'gpu_single_stream_expected_latency_ns': 1621000, 'input_dtype': 'int8', 'input_format': 'chw4', 'map_path': 'data_maps/coco/val_map.txt', 'precision': 'int8', 'tensor_path': '${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4', 'use_graphs': False, 'system_id': 'GeforceRTX2080Ti', 'scenario': 'SingleStream', 'benchmark': 'ssd-small', 'config_name': 'GeforceRTX2080Ti_ssd-small_SingleStream', 'log_dir': '/mnt/home/jylee/work2/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.01.07-12.03.52'}
[2020-01-07 12:03:52,645 __init__.py:42 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so" --logfile_outdir="/mnt/home/jylee/work2/mlperf/inference_results_v0.5/closed/NVIDIA/build/logs/2020.01.07-12.03.52/GeforceRTX2080Ti/ssd-small/SingleStream" --logfile_prefix="mlperf_log_" --use_graphs=false --gpu_batch_size=1 --map_path="data_maps/coco/val_map.txt" --tensor_path="${PREPROCESSED_DATA_DIR}/coco/val2017/SSDMobileNet/int8_chw4" --gpu_engines="./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan" --performance_sample_count=256 --max_dlas=0 --single_stream_expected_latency_ns=1621000 --mlperf_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf" --user_conf_path="measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf" --scenario SingleStream --model ssd-small --response_postprocess coco
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/mlperf.conf
[I] user.conf path: measurements/GeforceRTX2080Ti/ssd-small/SingleStream/user.conf
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Device:0: ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan has been successfully loaded.
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[W] [TRT] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.00763s.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : Single Stream
Mode     : Performance
90th percentile latency (ns) : 653294
Result is : INVALID
  Min duration satisfied : NO
  Min queries satisfied : Yes
Recommendations:
 * Decrease the expected latency so the loadgen pre-generates more queries.

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 1574.22
QPS w/o loadgen overhead        : 1604.91

Min latency (ns)                : 572597
Max latency (ns)                : 6675639
Mean latency (ns)               : 623088
50.00 percentile latency (ns)   : 618265
90.00 percentile latency (ns)   : 653294
95.00 percentile latency (ns)   : 666952
97.00 percentile latency (ns)   : 674364
99.00 percentile latency (ns)   : 695681
99.90 percentile latency (ns)   : 1104913

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 616.903
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1024
max_query_count : 0
qsl_rng_seed : 3133965575612453542
sample_index_rng_seed : 665484352860916858
schedule_rng_seed : 3622009729038561421
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
print_timestamps : false
performance_issue_unique : false
performance_issue_same : false
performance_issue_same_index : 0
performance_sample_count : 256

No warnings encountered during test.

1 ERROR encountered. See detailed log.
Device Device:0 processed:
  74030 batches of size 1
  Memcpy Calls: 0
  PerSampleCudaMemcpy Calls: 0
  BatchedCudaMemcpy Calls: 74030
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2020-01-07 12:04:46,713 main.py:142 INFO] Result: 90th percentile latency (ns) : 653294 and Result is : INVALID

======================= Perf harness results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: 90th percentile latency (ns) : 653294 and Result is : INVALID


======================= Accuracy results: =======================

GeforceRTX2080Ti-SingleStream:
    ssd-small: No accuracy results in PerformanceOnly mode.
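SingleStream results are judged on the 90th-percentile latency reported above. A sketch of a nearest-rank percentile over per-query latencies; loadgen's exact rounding may differ slightly, and the sample values below are a handful of latencies quoted from the summary, not the full trace:

```python
import math

def percentile(latencies_ns, p):
    """Nearest-rank percentile: smallest sample with at least p% of
    the data at or below it."""
    s = sorted(latencies_ns)
    rank = math.ceil(p / 100 * len(s))
    return s[rank - 1]

# A few latencies quoted from the summary above (illustrative only).
samples = [572597, 618265, 653294, 666952, 674364, 695681]
print(percentile(samples, 90))  # 695681
```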

INT8 Inference Only (Python)

root@35192637cbac:/inference_results_v0.5/closed/NVIDIA# sh run_infer_geforcertx2080ti.sh 
[2020-01-07 12:26:52,944 infer.py:144 INFO] Running accuracy test...
[2020-01-07 12:26:52,944 infer.py:58 INFO] Running SSDMobileNet functionality test for engine [ ./build/engines/GeforceRTX2080Ti/ssd-small/SingleStream/ssd-small-SingleStream-gpu-b1-int8.plan ] with batch size 1
[TensorRT] VERBOSE: Plugin Creator registration succeeded - GridAnchor_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - NMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Reorg_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Region_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Clip_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - LReLU_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - PriorBox_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Normalize_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - RPROI_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - BatchedNMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - FlattenConcat_TRT
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[TensorRT] VERBOSE: Deserialize required 1216182 microseconds.
[2020-01-07 12:26:54,406 runner.py:38 INFO] Binding Input
[2020-01-07 12:26:54,421 runner.py:38 INFO] Binding Postprocessor
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
loading annotations into memory...
Done (t=0.40s)
creating index...
index created!
[2020-01-07 12:26:54,845 infer.py:85 INFO] Running validation on 100 images. Please wait...
[2020-01-07 12:26:54,858 infer.py:95 INFO] Batch 0 >> Inference time:  0.006898
[2020-01-07 12:26:54,860 infer.py:95 INFO] Batch 1 >> Inference time:  0.000735
[2020-01-07 12:26:54,861 infer.py:95 INFO] Batch 2 >> Inference time:  0.000728
[2020-01-07 12:26:54,862 infer.py:95 INFO] Batch 3 >> Inference time:  0.000731
[2020-01-07 12:26:54,863 infer.py:95 INFO] Batch 4 >> Inference time:  0.000727
[2020-01-07 12:26:54,865 infer.py:95 INFO] Batch 5 >> Inference time:  0.000722
[2020-01-07 12:26:54,866 infer.py:95 INFO] Batch 6 >> Inference time:  0.000725
[2020-01-07 12:26:54,867 infer.py:95 INFO] Batch 7 >> Inference time:  0.000726
[2020-01-07 12:26:54,868 infer.py:95 INFO] Batch 8 >> Inference time:  0.000731
[2020-01-07 12:26:54,869 infer.py:95 INFO] Batch 9 >> Inference time:  0.000732

...

[2020-01-07 12:26:54,969 infer.py:95 INFO] Batch 90 >> Inference time:  0.000722
[2020-01-07 12:26:54,970 infer.py:95 INFO] Batch 91 >> Inference time:  0.000726
[2020-01-07 12:26:54,971 infer.py:95 INFO] Batch 92 >> Inference time:  0.000723
[2020-01-07 12:26:54,973 infer.py:95 INFO] Batch 93 >> Inference time:  0.000723
[2020-01-07 12:26:54,974 infer.py:95 INFO] Batch 94 >> Inference time:  0.000728
[2020-01-07 12:26:54,975 infer.py:95 INFO] Batch 95 >> Inference time:  0.000730
[2020-01-07 12:26:54,976 infer.py:95 INFO] Batch 96 >> Inference time:  0.000731
[2020-01-07 12:26:54,978 infer.py:95 INFO] Batch 97 >> Inference time:  0.000729
[2020-01-07 12:26:54,979 infer.py:95 INFO] Batch 98 >> Inference time:  0.000731
[2020-01-07 12:26:54,980 infer.py:95 INFO] Batch 99 >> Inference time:  0.000724
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.29s).
Accumulating evaluation results...
DONE (t=0.32s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.295
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.422
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.327
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.209
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.626
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.253
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.312
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.314
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.027
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.212
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.656
[2020-01-07 12:26:55,610 infer.py:139 INFO] Get mAP score = 0.295377 Target = 0.223860
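Comparing this 100-image INT8 result with the FP32 run from 2020-01-10 gives a rough feel for the quantization cost; both are 100-image spot checks, so the comparison is indicative only:

```python
fp32_map = 0.300559  # FP32 Python inference, 2020-01-10
int8_map = 0.295377  # INT8 result above

drop_pct = (fp32_map - int8_map) / fp32_map * 100
print(f"relative mAP drop from INT8: {drop_pct:.2f}%")  # ~1.72%
```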