Testing - tingxingdong/clBLAS-private GitHub Wiki
The primary correctness test tools for clBLAS are based on the googletest framework and are located in ./src/tests/correctness. Building the test tools produces three googletest-based executables that are similar in function and differ mainly in how long they take to run.
Test Name | Test Length |
---|---|
test-short | can run from minutes to hours |
test-medium | can run from hours up to a day |
test-correctness | can run for days |
The CMake-generated build projects define an INSTALL target. Building it compiles the sources and also creates a ./bin/clBLAS/develop/vs10x64/package subdirectory, into which all the built executables and libraries are copied. This greatly eases performance measurement and testing, because the executables and libraries are otherwise scattered across the build tree in their own build directories.
If the test executables are built using ACML as the reference library (the default), the test-short, test-medium, and test-correctness executables depend on ACML at run time. Because ACML is not built with CMake, the INSTALL target does not know to copy the libacml_dll.<PlatExt> file into the package directory, so you must copy it manually to run the tests successfully. Depending on the version of ACML used, ACML itself may depend on Fortran runtime files, which must also be copied into the ./bin/clBLAS/develop/vs10x64/package subdirectory alongside libacml_dll.<PlatExt>.
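The manual copy step can be scripted. The following is a minimal sketch, not part of clBLAS: the function name `copy_reference_libs` is hypothetical, both directories are supplied by the caller, and the glob pattern `libacml_dll.*` simply stands in for the platform-specific extension.

```python
import shutil
from pathlib import Path


def copy_reference_libs(acml_lib_dir, package_dir, patterns=("libacml_dll.*",)):
    """Copy ACML runtime libraries next to the test executables.

    Both directories are caller-supplied assumptions; this helper does not
    know where ACML is installed. Extra glob patterns (for example, for
    Fortran runtime files) can be passed in `patterns`.
    """
    src = Path(acml_lib_dir)
    dst = Path(package_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for lib in sorted(src.glob(pattern)):
            if lib.is_file():
                shutil.copy2(lib, dst / lib.name)  # preserves timestamps
                copied.append(lib.name)
    return copied
```

The returned list of file names makes it easy to check that something was actually copied before launching the tests.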
Executing the test programs with --help shows the gtest-related flags that control how tests are run:
F:\code\GitHub\kknox\bin\clBLAS\develop\vs10x64\package\bin64> .\test-short.exe --help
This program contains tests written using Google Test. You can use the
following command line flags to control its behavior:
Test Selection:
--gtest_list_tests
List the names of all tests instead of running them. The name of
TEST(Foo, Bar) is "Foo.Bar".
--gtest_filter=POSTIVE_PATTERNS[-NEGATIVE_PATTERNS]
Run only the tests whose name matches one of the positive patterns but
none of the negative patterns. '?' matches any single character; '*'
matches any substring; ':' separates two patterns.
--gtest_also_run_disabled_tests
Run all disabled tests too.
Test Execution:
--gtest_repeat=[COUNT]
Run the tests repeatedly; use a negative count to repeat forever.
--gtest_shuffle
Randomize tests' orders on every iteration.
--gtest_random_seed=[NUMBER]
Random number seed to use for shuffling test orders (between 1 and
99999, or 0 to use a seed based on the current time).
Test Output:
--gtest_color=(yes|no|auto)
Enable/disable colored output. The default is auto.
--gtest_print_time=0
Don't print the elapsed time of each test.
--gtest_output=xml[:DIRECTORY_PATH\|:FILE_PATH]
Generate an XML report in the given directory or with the given file
name. FILE_PATH defaults to test_details.xml.
Assertion Behavior:
--gtest_break_on_failure
Turn assertion failures into debugger break-points.
--gtest_throw_on_failure
Turn assertion failures into C++ exceptions.
--gtest_catch_exceptions=0
Do not report exceptions as test failures. Instead, allow them
to crash the program or throw a pop-up (on Windows).
Except for the --gtest_list_tests, you can alternatively set the corresponding
environment variable of a flag (all letters in upper-case). For example, to
disable colored text output, you can either specify --gtest_color=no or set
the GTEST_COLOR environment variable to no.
For more information, see the Google Test documentation at
http://code.google.com/p/googletest/. If you find a bug in Google Test
(not one in your own code or tests), report it to
<googletestframework@googlegroups.com>.
Initialize OpenCL and clAmdBlas...
---- Advanced Micro Devices, Inc.
SetUp: about to create command queues
Test environment:
Device name: Tahiti
Device vendor: Advanced Micro Devices, Inc.
Platform (bit): Windows (x64)
clAmdBlas version: 2.1.0
Driver version: 1124.2 (VM)
Device version: OpenCL 1.2 AMD-APP (1124.2)
Global mem size: 2048 MB
---------------------------------------------------------
The exact time the test executables take to finish depends on the hardware under test and varies widely. For day-to-day use, or for automation in a continuous integration setting, test-short is recommended. Gtest filters can additionally narrow testing to a fraction of the overall tests when recent code edits are known to be limited to a strict subset of functionality. The --gtest_filter flag takes a wildcard pattern rather than a full regular expression ('*' matches any substring and '?' matches any single character), which it matches against the test name, and each test name is unique.
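For the continuous integration case, the --gtest_output=xml flag listed in the help text above produces a report that a script can inspect. The sketch below assumes the usual googletest XML layout, in which each `testcase` element carries `classname` and `name` attributes and contains a `failure` child when it failed. The function name `summarize_gtest_report` is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_gtest_report(xml_data):
    """Return (total_test_count, list_of_failed_test_names).

    xml_data is the content of a googletest XML report (str or bytes).
    A test counts as failed if its <testcase> element has a <failure> child.
    """
    root = ET.fromstring(xml_data)
    total = 0
    failed = []
    for case in root.iter("testcase"):
        total += 1
        if case.find("failure") is not None:
            failed.append(f"{case.get('classname')}.{case.get('name')}")
    return total, failed
```

A CI job could run `.\test-short.exe --gtest_output=xml:report.xml`, pass the bytes of `report.xml` to this function, and fail the build when the second element of the result is non-empty.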
An example of running only the sgemm-related tests:
F:\code\GitHub\kknox\bin\clBLAS\develop\vs10x64\package\bin64> .\test-short.exe --gtest_filter=*sgemm*
Initialize OpenCL and clAmdBlas...
---- Advanced Micro Devices, Inc.
SetUp: about to create command queues
Test environment:
Device name: Tahiti
Device vendor: Advanced Micro Devices, Inc.
Platform (bit): Windows (x64)
clAmdBlas version: 2.1.0
Driver version: 1124.2 (VM)
Device version: OpenCL 1.2 AMD-APP (1124.2)
Global mem size: 2048 MB
---------------------------------------------------------
Note: Google Test filter = *sgemm*
[==========] Running 308 tests from 7 test cases.
[----------] Global test environment set-up.
[----------] 72 tests from ColumnMajor_SmallRange/GEMM
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/0
clAmdBlasColumnMajor, clAmdBlasNoTrans, clAmdBlasNoTrans
M = 63, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
Generating input data... Done
Calling reference xGEMM routine... Done
Calling clAmdBlas xGEMM routine... Done
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/0 (398 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/1
clAmdBlasColumnMajor, clAmdBlasNoTrans, clAmdBlasNoTrans
M = 63, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
Negative test filters are also available: the '-' operator filters tests out. For example, .\test-short.exe --gtest_filter=-*ztrmm* produces much the same output as above, except that no ztrmm tests run in the test pass.
A more complicated expression can combine positive and negative filters: ':' separates individual patterns, and every pattern after the '-' is treated as negative. This is standard googletest filter notation.
.\test-short.exe --gtest_filter=*ColumnMajor*:-*ztrmm*
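To make the filter rules concrete, here is a small Python sketch of the matching semantics described in the --help text: '*' and '?' wildcards, ':' between patterns, and negative patterns after the first '-'. It is an illustration of the documented behavior, not googletest's own code, and details may differ between googletest versions. The ztrmm test name in the usage note is hypothetical.

```python
import re


def _pattern_to_regex(pattern):
    # Translate one wildcard pattern: '*' = any substring, '?' = any one char.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def _matches_any(name, patterns):
    # 'patterns' is a ':'-separated list; the whole name must match one of them.
    return any(_pattern_to_regex(p).fullmatch(name) for p in patterns.split(":"))


def filter_selects(name, gtest_filter):
    """Return True if a test with this full name would run under the filter."""
    positive, dash, negative = gtest_filter.partition("-")
    if not positive:
        positive = "*"  # a filter that starts with '-' selects everything else
    if not _matches_any(name, positive):
        return False
    return not (dash and _matches_any(name, negative))
```

With these rules, `filter_selects("ColumnMajor_SmallRange/GEMM.sgemm/0", "*ColumnMajor*:-*ztrmm*")` is true, while a name such as `ColumnMajor_SmallRange/TRMM.ztrmm/0` is rejected by the negative pattern.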
The primary API and interface test tool for clBLAS is based on the googletest framework and is located in ./src/tests/functional. After building the INSTALL target, the binary is located alongside the other test-* executables in ./bin/clBLAS/develop/vs10x64/package and is named test-functional.
The scope of test-functional is to pass permutations of correct and incorrect parameters into the BLAS API and to validate the returned results, including error results. As part of testing the API, a subset of tests in test-functional create multiple OpenCL devices and call into clBLAS with multiple queues. Because the logic that validates input parameters is not likely to change often once written, this test program is usually only run manually by developers to sanity check their changes.
test-performance is based on the googletest framework and is located in ./src/tests/performance. After building the INSTALL target, the binary is located alongside the other test-* executables in ./bin/clBLAS/develop/vs10x64/package and is named test-performance.
test-performance is a performance testing program that compares the performance of a particular BLAS algorithm running on the CPU versus the GPU and reports the result as the speedup of GPU over CPU. Each individual test case reports the result for a particular matrix size and function. While this program was simple to create and served its purpose well early in clBLAS development, it is now deprecated because the Python scripts in scripts\perf are more flexible and can graph performance while sweeping the matrix size along the x-axis. However, the code is still available as an example of writing gtest-based performance tests. Example output of running test-performance:
F:\code\GitHub\kknox\bin\clBLAS\develop\vs10x64\package\bin64> .\test-performance.exe
Initialize OpenCL and CLBLAS...
---- Advanced Micro Devices, Inc.
SetUp: about to create command queues
[==========] Running 48124 tests from 92 test cases.
[----------] Global test environment set-up.
[----------] 2304 tests from Generic/GEMM
[ RUN ] Generic/GEMM.sgemm/0
clAmdBlasColumnMajor, clAmdBlasNoTrans, clAmdBlasNoTrans
M = 2048, N = 2048, K = 2048
offA = 0, offB = 0, offC = 0
lda = 2048, ldb = 2048, ldc = 2048
seed = 12345
queues = 1
Acml reference function has worked in 678 milliseconds, clBlas function has worked in 16 milliseconds, time ratio is 42.2137
clBlas GFLOPS : 1069.53
[ OK ] Generic/GEMM.sgemm/0 (5456 ms)
[ RUN ] Generic/GEMM.sgemm/1
clAmdBlasColumnMajor, clAmdBlasNoTrans, clAmdBlasNoTrans
M = 2048, N = 2048, K = 2800
offA = 0, offB = 0, offC = 0
lda = 2048, ldb = 2800, ldc = 2048
seed = 12345
queues = 1
Acml reference function has worked in 938 milliseconds, clBlas function has worked in 21 milliseconds, time ratio is 43.0902
clBlas GFLOPS : 1078.43
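The reported numbers can be sanity checked by hand. The sketch below assumes the conventional operation count of 2*M*N*K floating-point operations for GEMM; the exact formula used by test-performance is not shown in its output, so treat this as an approximation. Both function names are hypothetical.

```python
def gemm_gflops(m, n, k, seconds):
    """GFLOPS for one GEMM call, assuming 2*M*N*K floating-point operations."""
    flops = 2.0 * m * n * k
    return flops / seconds / 1e9


def speedup(reference_ms, device_ms):
    """Ratio of reference (CPU) time to device (GPU) time."""
    return reference_ms / device_ms
```

For the first test case above (M = N = K = 2048, 16 milliseconds), `gemm_gflops(2048, 2048, 2048, 0.016)` gives roughly 1074 GFLOPS. That is close to the logged 1069.53, and the small gap is consistent with the log printing times rounded to whole milliseconds.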
It can be convenient to automatically generate a standalone executable that executes a single BLAS call of interest. For instance, the googletest suite may find a failure in a particular function, and it would be useful to submit a bug report with an example test case without having to distribute the entire test application. Or a performance regression may have been found in a particular function with specific parameters. Whatever the reason, the make-ktest application is a tool that helps automate this process:
F:\code\GitHub\kknox\bin\clBLAS\develop\vs10x64\package\bin64>make-ktest.exe --help
Application Arguments:
--config arg (=ktest.cfg) Configuration file
-h [ --help ] Show this help message
Generator Arguments:
--cpp arg (=ktest.cpp) Output file name for C++ generated source
--cl arg Output file name for OpenCL generated source
--data arg (=random) Data generation pattern
Format: {random | unit | sawtooth}
--skip-accuracy Don't generate code for accuracy check. Applicable if
the program is needed only for performance measurement
OpenCL Arguments:
--platform arg (=AMD Accelerated Parallel Processing)
Platform name
--device arg (=Tahiti) Device name
--build-options arg Build options
BLAS Arguments:
-f [ --function ] arg Function name, mandatory
Format: {s | d | c | z}{BLAS function}
--order arg (=row) Data ordering
Format: {column | row}
--side arg (=left) The side matrix A is located relative to matrix B
Format: {left | right}
--uplo arg (=upper) Upper or lower triangle of matrix is being referenced
Format: {upper | lower}
--transA arg (=n) Matrix A transposition operation
Format: {n | t | c}
--transB arg (=n) Matrix B transposition operation
Format: {n | t | c}
--diag arg (=nonunit) Whether the matrix is unit triangular
Format: {unit | nonunit}
-M [ --M ] arg (=256)
-N [ --N ] arg (=256)
-K [ --K ] arg (=256)
--alpha arg (=1) Alpha multiplier
Format: real[,imag]
--beta arg (=1) Beta multiplier
Format: real[,imag]
--lda arg Leading dimension of the matrix A
--ldb arg Leading dimension of the matrix B
--ldc arg Leading dimension of the matrix C
--offA arg (=0) Start offset in buffer of matrix A
--offBX arg (=0) Start offset in buffer of matrix B or vector X
--offCY arg (=0) Start offset in buffer of matrix C or vector Y
--incx arg (=1) Increment in the array X
--incy arg (=1) Increment in the array Y
Decomposition Options:
-d [ --decomposition ] arg SubproblemDim
Format: {subdims[0].x},{subdims[0].y},
{subdims[0].bwidth},
{subdims[1].x},{subdims[1].y},
{subdims[1].bwidth}
--multikernel arg (=0) Allow division of one BLAS function between
several kernels
The tool saves its output in the directory it is launched from: a host application source file, named by the --cpp command line argument, and either one (typical) or several files containing kernels. The file <BLAS_SRC_ROOT>/src/library/tools/ktest/naive/naive_blas.cpp should be copied to the same directory as the generated files. It contains a naive BLAS implementation that is used to check accuracy and is referenced by the host application source. BLAS_SRC_ROOT is the root directory of the library's source code.
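When reproducing a failing case, such as the 63x63x63 column-major sgemm test logged earlier, it can help to assemble the make-ktest command line from the test parameters. The following sketch uses only options that appear in the --help output above. The helper name `make_ktest_command` is hypothetical, and it assumes the tool accepts the `--option value` form.

```python
def make_ktest_command(function, order="row", trans_a="n", trans_b="n",
                       m=256, n=256, k=256, cpp="ktest.cpp",
                       exe="make-ktest.exe"):
    """Build an argument list for make-ktest.

    Defaults mirror the defaults printed by `make-ktest.exe --help`.
    """
    return [
        exe,
        "--function", function,   # mandatory, e.g. "sgemm"
        "--order", order,         # "column" or "row"
        "--transA", trans_a,      # "n", "t" or "c"
        "--transB", trans_b,
        "-M", str(m),
        "-N", str(n),
        "-K", str(k),
        "--cpp", cpp,             # name of the generated host source file
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory where the generated files should be written.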
To add the ability to generate test cases for other functions, a class derived from amd::Step should be implemented.