opt helper - Joejiong/buddy-mlir GitHub Wiki

> /Workspace/buddy-mlir/llvm/build/bin/mlir-opt -h
OVERVIEW: MLIR modular optimizer driver

Available Dialects: acc, affine, amx, arith, arm_neon, arm_sve, async, bufferization, builtin, complex, dlti, emitc, gpu, linalg, llvm, math, memref, nvvm, omp, pdl, pdl_interp, quant, rocdl, scf, shape, sparse_tensor, spv, std, tensor, test, tosa, vector, x86vector
USAGE: mlir-opt [options] <input file>
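
As a minimal sketch of the usage line above, the script below chains a few of the passes listed further down and turns on IR printing. The file names `input.mlir` and `output.mlir` and the `MLIR_OPT` variable are hypothetical placeholders, not part of the tool; every flag is taken from this help output. The command is echoed first and only executed when the binary and the input file are actually present.

```shell
#!/bin/sh
# Assumptions: mlir-opt is on PATH (or MLIR_OPT points at it);
# input.mlir and output.mlir are hypothetical file names.
MLIR_OPT="${MLIR_OPT:-mlir-opt}"
INPUT="input.mlir"
OUTPUT="output.mlir"

# Passes run in the order given on the command line.
# The --print-ir-* flags only control debug printing of the IR.
CMD="$MLIR_OPT --lower-affine --canonicalize --cse --print-ir-after-all --print-ir-after-change -o $OUTPUT $INPUT"

# Show the command, and run it only when both the tool and the input exist.
echo "$CMD"
if command -v "$MLIR_OPT" >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  $CMD
fi
```

Swapping `--print-ir-after-all` for `--mlir-timing` or `--pass-statistics` gives a timing or statistics report instead of an IR dump.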

OPTIONS:

Color Options:

  --color                                               - Use colors in output (default=autodetect)

General options:

  --allow-unregistered-dialect                          - Allow operation with no registered dialects
  --disable-i2p-p2i-opt                                 - Disables inttoptr/ptrtoint roundtrip optimization
  --dot-cfg-mssa=<file name for generated dot file>     - file name for generated dot file
  --mlir-debug-counter=<string>                         - Comma separated list of debug counter skip and count arguments
  --mlir-disable-threading                              - Disable multi-threading within MLIR, overrides any further call to MLIRContext::enableMultiThreading()
  --mlir-elide-elementsattrs-if-larger=<uint>           - Elide ElementsAttrs with "..." that have more elements than the given upper limit
  --mlir-pretty-debuginfo                               - Print pretty debug info in MLIR output
  --mlir-print-debug-counter                            - Print out debug counter information after all counters have been accumulated
  --mlir-print-debuginfo                                - Print debug info in MLIR output
  --mlir-print-elementsattrs-with-hex-if-larger=<long>  - Print DenseElementsAttrs with a hex string that have more elements than the given upper limit (use -1 to disable)
  --mlir-print-op-on-diagnostic                         - When a diagnostic is emitted on an operation, also print the operation as an attached note
  --mlir-print-stacktrace-on-diagnostic                 - When a diagnostic is emitted, also print the stack trace as an attached note
  --mlir-timing                                         - Display execution times
  --mlir-timing-display=<value>                         - Display method for timing data
    =list                                               -   display the results in a list sorted by total time
    =tree                                               -   display the results in a nested tree view
  -o=<filename>                                         - Output filename
  --opaque-pointers                                     - Use opaque pointers
  --pass-pipeline-crash-reproducer=<string>             - Generate a .mlir reproducer file at the given output path if the pass manager crashes or fails
  --pass-pipeline-local-reproducer                      - When generating a crash reproducer, attempt to generate a reproducer with the smallest pipeline.
  --pass-statistics                                     - Display the statistics of each pass
  --pass-statistics-display=<value>                     - Display method for pass statistics
    =list                                               -   display the results in a merged list sorted by pass name
    =pipeline                                           -   display the results with a nested pipeline view
  --print-ir-after=<pass-arg>                           - Print IR after specified passes
  --print-ir-after-all                                  - Print IR after each pass
  --print-ir-after-change                               - When printing the IR after a pass, only print if the IR changed
  --print-ir-after-failure                              - When printing the IR after a pass, only print if the pass failed
  --print-ir-before=<pass-arg>                          - Print IR before specified passes
  --print-ir-before-all                                 - Print IR before each pass
  --print-ir-module-scope                               - When printing IR for print-ir-[before|after]{-all} always print the top-level operation
  --run-reproducer                                      - Append the command line options of the reproducer
  Compiler passes to run
    --pass-pipeline                                     -   A textual description of a pass pipeline to run
    Passes:
      --affine-data-copy-generate                       -   Generate explicit copying for affine memory operations
        --fast-mem-capacity=<ulong>                     - Set fast memory space capacity in KiB (default: unlimited)
        --fast-mem-space=<uint>                         - Fast memory space identifier for copy generation (default: 1)
        --generate-dma                                  - Generate DMA instead of point-wise copy
        --min-dma-transfer=<int>                        - Minimum DMA transfer size supported by the target in bytes
        --skip-non-unit-stride-loops                    - Testing purposes: avoid non-unit stride loop choice depths for copy placement
        --slow-mem-space=<uint>                         - Slow memory space identifier for copy generation (default: 0)
        --tag-mem-space=<uint>                          - Tag memory space identifier for copy generation (default: 0)
      --affine-loop-fusion                              -   Fuse affine loop nests
        --fusion-compute-tolerance=<number>             - Fractional increase in additional computation tolerated while fusing
        --fusion-fast-mem-space=<uint>                  - Faster memory space number to promote fusion buffers to
        --fusion-local-buf-threshold=<ulong>            - Threshold size (KiB) for promoting local buffers to fast memory space
        --fusion-maximal                                - Enables maximal loop fusion
        --mode=<value>                                  - fusion mode to attempt
          =greedy                                       -   Perform greedy (both producer-consumer and sibling) fusion
          =producer                                     -   Perform only producer-consumer fusion
          =sibling                                      -   Perform only sibling fusion
      --affine-loop-invariant-code-motion               -   Hoist loop invariant instructions outside of affine loops
      --affine-loop-normalize                           -   Apply normalization transformations to affine loop-like ops
      --affine-loop-tile                                -   Tile affine loop nests
        --cache-size=<ulong>                            - Set size of cache to tile for in KiB
        --separate                                      - Separate full and partial tiles
        --tile-size=<uint>                              - Use this tile size for all loops
        --tile-sizes=<uint>                             - List of tile sizes for each perfect nest (overridden by -tile-size)
      --affine-loop-unroll                              -   Unroll affine loops
        --unroll-factor=<uint>                          - Use this unroll factor for all loops being unrolled
        --unroll-full                                   - Fully unroll loops
        --unroll-full-threshold=<uint>                  - Unroll all loops with trip count less than or equal to this
        --unroll-num-reps=<uint>                        - Unroll innermost loops repeatedly this many times
        --unroll-up-to-factor                           - Allow unrolling up to the factor specified
      --affine-loop-unroll-jam                          -   Unroll and jam affine loops
        --unroll-jam-factor=<uint>                      - Use this unroll jam factor for all loops (default 4)
      --affine-parallelize                              -   Convert affine.for ops into 1-D affine.parallel
        --max-nested=<uint>                             - Maximum number of nested parallel loops to produce. Defaults to unlimited (UINT_MAX).
        --parallel-reductions                           - Whether to parallelize reduction loops. Defaults to false.
      --affine-pipeline-data-transfer                   -   Pipeline non-blocking data transfers between explicitly managed levels of the memory hierarchy
      --affine-scalrep                                  -   Replace affine memref accesses by scalars by forwarding stores to loads and eliminating redundant loads
      --affine-super-vectorize                          -   Vectorize to a target independent n-D vector abstraction
        --test-fastest-varying=<long>                   - Specify a 1-D, 2-D or 3-D pattern of fastest varying memory dimensions to match. See defaultPatterns in Vectorize.cpp for a description and examples. This is used for testing purposes
        --vectorize-reductions                          - Vectorize known reductions expressed via iter_args. Switched off by default.
        --virtual-vector-size=<long>                    - Specify an n-D virtual vector size for vectorization
      --affine-super-vectorizer-test                    -   Tests vectorizer standalone functionality.
      --arith-bufferize                                 -   Bufferize Arithmetic dialect ops.
      --arith-expand                                    -   Legalize Arithmetic ops to be convertible to LLVM.
      --arm-neon-2d-to-intr                             -   Convert Arm NEON structured ops to intrinsics
      --async-parallel-for                              -   Convert scf.parallel operations to multiple async compute ops executed concurrently for non-overlapping iteration ranges
        --async-dispatch                                - Dispatch async compute tasks using recursive work splitting. If `false` async compute tasks will be launched using simple for loop in the caller thread.
        --min-task-size=<int>                           - The minimum task size for sharding parallel operation.
        --num-workers=<int>                             - The number of available workers to execute async operations.
      --async-runtime-policy-based-ref-counting         -   Policy based reference counting for Async runtime operations
      --async-runtime-ref-counting                      -   Automatic reference counting for Async runtime operations
      --async-runtime-ref-counting-opt                  -   Optimize automatic reference counting operations for the Async runtime by removing redundant operations
      --async-to-async-runtime                          -   Lower high level async operations (e.g. async.execute) to the explicit async.runtime and async.coro operations
        --eliminate-blocking-await-ops                  - Rewrite functions with blocking async.runtime.await as coroutines with async.runtime.await_and_resume.
      --buffer-deallocation                             -   Adds all required dealloc operations for all allocations in the input program
      --buffer-hoisting                                 -   Optimizes placement of allocation operations by moving them into common dominators and out of nested regions
      --buffer-loop-hoisting                            -   Optimizes placement of allocation operations by moving them out of loop nests
      --buffer-results-to-out-params                    -   Converts memref-typed function results to out-params
      --canonicalize                                    -   Canonicalize operations
        --disable-patterns=<string>                     - Labels of patterns that should be filtered out during application
        --enable-patterns=<string>                      - Labels of patterns that should be used during application, all other patterns are filtered out
        --max-iterations=<long>                         - Maximum number of iterations between applying patterns and simplifying regions
        --region-simplify                               - Perform control flow optimizations to the region tree
        --top-down                                      - Seed the worklist in general top-down order
      --convert-affine-for-to-gpu                       -   Convert top-level AffineFor Ops to GPU kernels
        --gpu-block-dims=<uint>                         - Number of GPU block dimensions for mapping
        --gpu-thread-dims=<uint>                        - Number of GPU thread dimensions for mapping
      --convert-arith-to-llvm                           -   Convert Arithmetic dialect to LLVM dialect
        --index-bitwidth=<uint>                         - Bitwidth of the index type, 0 to use size of machine word
      --convert-arith-to-spirv                          -   Convert Arithmetic dialect to SPIR-V dialect
        --emulate-non-32-bit-scalar-types               - Emulate non-32-bit scalar types with 32-bit ones if missing native support
      --convert-async-to-llvm                           -   Convert the operations from the async dialect into the LLVM dialect
      --convert-bufferization-to-memref                 -   Convert operations from the Bufferization dialect to the MemRef dialect
      --convert-complex-to-llvm                         -   Convert Complex dialect to LLVM dialect
      --convert-complex-to-standard                     -   Convert Complex dialect to standard dialect
      --convert-elementwise-to-linalg                   -   Convert ElementwiseMappable ops to linalg
      --convert-gpu-launch-to-vulkan-launch             -   Convert gpu.launch_func to vulkanLaunch external call
      --convert-gpu-to-nvvm                             -   Generate NVVM operations for gpu operations
        --index-bitwidth=<uint>                         - Bitwidth of the index type, 0 to use size of machine word
      --convert-gpu-to-rocdl                            -   Generate ROCDL operations for gpu operations
        --index-bitwidth=<uint>                         - Bitwidth of the index type, 0 to use size of machine word
        --runtime=<value>                               - Runtime code will be run on (default is Unknown, can also use HIP or OpenCL)
          =unknown                                      -   Unknown (default)
          =HIP                                          -   HIP
          =OpenCL                                       -   OpenCL
      --convert-gpu-to-spirv                            -   Convert GPU dialect to SPIR-V dialect
      --convert-linalg-tiled-loops-to-scf               -   Lower linalg tiled loops to SCF loops and parallel loops
      --convert-linalg-to-affine-loops                  -   Lower the operations from the linalg dialect into affine loops
      --convert-linalg-to-llvm                          -   Convert the operations from the linalg dialect into the LLVM dialect
      --convert-linalg-to-loops                         -   Lower the operations from the linalg dialect into loops
      --convert-linalg-to-parallel-loops                -   Lower the operations from the linalg dialect into parallel loops
      --convert-linalg-to-spirv                         -   Convert Linalg dialect to SPIR-V dialect
      --convert-linalg-to-std                           -   Convert the operations from the linalg dialect into the Standard dialect
      --convert-math-to-libm                            -   Convert Math dialect to libm calls
      --convert-math-to-llvm                            -   Convert Math dialect to LLVM dialect
      --convert-math-to-spirv                           -   Convert Math dialect to SPIR-V dialect
      --convert-memref-to-llvm                          -   Convert operations from the MemRef dialect to the LLVM dialect
        --index-bitwidth=<uint>                         - Bitwidth of the index type, 0 to use size of machine word
        --use-aligned-alloc                             - Use aligned_alloc in place of malloc for heap allocations
      --convert-memref-to-spirv                         -   Convert MemRef dialect to SPIR-V dialect
        --bool-num-bits=<int>                           - The number of bits to store a boolean value
      --convert-openacc-to-llvm                         -   Convert the OpenACC ops to LLVM dialect
      --convert-openacc-to-scf                          -   Convert the OpenACC ops to OpenACC with SCF dialect
      --convert-openmp-to-llvm                          -   Convert the OpenMP ops to OpenMP ops with LLVM dialect
      --convert-parallel-loops-to-gpu                   -   Convert mapped scf.parallel ops to gpu launch operations
      --convert-pdl-to-pdl-interp                       -   Convert PDL ops to PDL interpreter ops
      --convert-scf-to-openmp                           -   Convert SCF parallel loop to OpenMP parallel + workshare constructs.
      --convert-scf-to-spirv                            -   Convert SCF dialect to SPIR-V dialect.
      --convert-scf-to-std                              -   Convert SCF dialect to Standard dialect, replacing structured control flow with a CFG
      --convert-shape-constraints                       -   Convert shape constraint operations to the standard dialect
      --convert-shape-to-std                            -   Convert operations from the shape dialect into the standard dialect
      --convert-spirv-to-llvm                           -   Convert SPIR-V dialect to LLVM dialect
      --convert-std-to-llvm                             -   Convert scalar and vector operations from the Standard to the LLVM dialect
        --data-layout=<string>                          - String description (LLVM format) of the data layout that is expected on the produced module
        --emit-c-wrappers                               - Emit wrappers for C-compatible pointer-to-struct memref descriptors
        --index-bitwidth=<uint>                         - Bitwidth of the index type, 0 to use size of machine word
        --use-bare-ptr-memref-call-conv                 - Replace FuncOp's MemRef arguments with bare pointers to the MemRef element types
      --convert-std-to-spirv                            -   Convert Standard dialect to SPIR-V dialect
        --emulate-non-32-bit-scalar-types               - Emulate non-32-bit scalar types with 32-bit ones if missing native support
      --convert-vector-to-gpu                           -   Lower the operations from the vector dialect into the GPU dialect
      --convert-vector-to-llvm                          -   Lower the operations from the vector dialect into the LLVM dialect
        --enable-amx                                    - Enables the use of AMX dialect while lowering the vector dialect.
        --enable-arm-neon                               - Enables the use of ArmNeon dialect while lowering the vector dialect.
        --enable-arm-sve                                - Enables the use of ArmSVE dialect while lowering the vector dialect.
        --enable-index-optimizations                    - Allows compiler to assume indices fit in 32-bit if that yields faster code
        --enable-x86vector                              - Enables the use of X86Vector dialect while lowering the vector dialect.
        --reassociate-fp-reductions                     - Allows llvm to reassociate floating-point reductions for speed
      --convert-vector-to-rocdl                         -   Lower the operations from the vector dialect into the ROCDL dialect
      --convert-vector-to-scf                           -   Lower the operations from the vector dialect into the SCF dialect
        --full-unroll                                   - Perform full unrolling when converting vector transfers to SCF
        --lower-permutation-maps                        - Replace permutation maps with vector transposes/broadcasts before lowering transfer ops
        --lower-tensors                                 - Lower transfer ops that operate on tensors
        --target-rank=<uint>                            - Target vector rank to which transfer ops should be lowered
      --convert-vector-to-spirv                         -   Convert Vector dialect to SPIR-V dialect
      --cse                                             -   Eliminate common sub-expressions
      --decorate-spirv-composite-type-layout            -   Decorate SPIR-V composite type with layout info
      --finalizing-bufferize                            -   Finalize a partial bufferization
      --fold-memref-subview-ops                         -   Fold memref.subview ops into consumer load/store ops
      --for-loop-canonicalization                       -   Canonicalize operations within scf.for loop bodies
      --for-loop-peeling                                -   Peel `for` loops at their upper bounds.
        --skip-partial                                  - Do not peel loops inside of the last, partial iteration of another already peeled loop.
      --for-loop-range-folding                          -   Fold add/mul ops into loop range
      --for-loop-specialization                         -   Specialize `for` loops for vectorization
      --func-bufferize                                  -   Bufferize func/call/return ops
      --gpu-async-region                                -   Make GPU ops async
      --gpu-kernel-outlining                            -   Outline gpu.launch bodies to kernel functions
      --gpu-to-llvm                                     -   Convert GPU dialect to LLVM dialect with GPU runtime calls
        --gpu-binary-annotation=<string>                - Annotation attribute string for GPU binary
      --inline                                          -   Inline function calls
        --default-pipeline=<string>                     - The default optimizer pipeline used for callables
        --max-iterations=<uint>                         - Maximum number of iterations when inlining within an SCC
        --op-pipelines=<string>                         - Callable operation specific optimizer pipelines (in the form of `dialect.op(pipeline)`)
      --launch-func-to-vulkan                           -   Convert vulkanLaunch external call to Vulkan runtime external calls
      --linalg-bufferize                                -   Bufferize the linalg dialect
      --linalg-comprehensive-module-bufferize           -   Bufferize (tensor into memref) for a Module.
        --allow-return-memref                           - Allows the return of memrefs (for testing purposes only)
        --allow-unknown-ops                             - Allows unknown (not bufferizable) ops in the input IR.
        --analysis-fuzzer-seed=<uint>                   - Analyze ops in random order with a given seed (fuzzer)
        --test-analysis-only                            - Only runs inplaceability analysis (for testing purposes only)
        --use-alloca                                    - Use stack allocations for memrefs (for testing purposes only)
      --linalg-detensorize                              -   Detensorize linalg ops
        --aggressive-mode                               - Detensorize all ops that qualify for detensoring along with branch operands and basic-block arguments.
      --linalg-fold-reshape-ops-by-linearization        -   Fold TensorReshapeOps with generic/indexed generic ops by linearization
        --allow-folding-unit-dim-reshapes               - Allow fusing linalg.tensor_reshape ops that perform unit dimension collapsing
      --linalg-fold-unit-extent-dims                    -   Remove unit-extent dimension in Linalg ops on tensors
        --fold-one-trip-loops-only                      - Only folds the one-trip loops from Linalg ops on tensors (for testing purposes only)
      --linalg-fuse-elementwise-ops                     -   Fuse elementwise operations on tensors
        --allow-folding-unit-dim-reshapes               - Allow fusing linalg.tensor_reshape ops that perform unit dimension collapsing
      --linalg-generalize-named-ops                     -   Convert named ops into generic ops
      --linalg-inline-scalar-operands                   -   Inline scalar operands into linalg generic ops
      --linalg-promote-subviews                         -   Promote subview ops to local buffers
        --test-promote-dynamic                          - Test generation of dynamic promoted buffers
        --test-use-alloca                               - Test generation of alloca'ed buffers.
      --linalg-strategy-decompose-pass                  -   Configurable pass to apply pattern-based generalization.
        --anchor-func=<string>                          - Which func op is the anchor to latch on.
      --linalg-strategy-enable-pass                     -   Configurable pass to enable the application of other pattern-based linalg passes.
        --anchor-func=<string>                          - Which func op is the anchor to latch on.
      --linalg-strategy-generalize-pass                 -   Configurable pass to apply pattern-based generalization.
        --anchor-func=<string>                          - Which func op is the anchor to latch on.
        --anchor-op=<string>                            - Which linalg op within the func is the anchor to latch on.
      --linalg-strategy-interchange-pass                -   Configurable pass to apply pattern-based iterator interchange.
        --anchor-func=<string>                          - Which func op is the anchor to latch on.
      --linalg-strategy-lower-vectors-pass              -   Configurable pass to lower vector operations.
        --anchor-func=<string>                          - Which func op is the anchor to latch on.
      --linalg-strategy-pad-pass                        -   Configurable pass to apply padding and hoisting.
        --anchor-func=<string>                          - Which func op is the anchor to latch on.
        --anchor-op=<string>                            - Which linalg op within the func is the anchor to latch on.
      --linalg-strategy-promote-pass                    -   Configurable pass to apply pattern-based linalg promotion.
        --anchor-func=<string>                          - Which func op is the anchor to latch on.
        --anchor-op=<string>                            - Which linalg op within the func is the anchor to latch on.
      --linalg-strategy-remove-markers-pass             -   Cleanup pass that drops markers.
        --anchor-func=<string>                          - Which func op is the anchor to latch on.
      --linalg-strategy-tile-and-fuse-pass              -   Configurable pass to apply pattern-based tiling and fusion.
        --anchor-func=<string>                          - Which func op is the anchor to latch on.
        --anchor-op=<string>                            - Which linalg op within the func is the anchor to latch on.
      --linalg-strategy-tile-pass                       -   Configurable pass to apply pattern-based linalg tiling.
        --anchor-func=<string>                          - Which func op is the anchor to latch on.
        --anchor-op=<string>                            - Which linalg op within the func is the anchor to latch on.
      --linalg-strategy-vectorize-pass                  -   Configurable pass to apply pattern-based linalg vectorization.
        --anchor-func=<string>                          - Which func op is the anchor to latch on.
        --anchor-op=<string>                            - Which linalg op within the func is the anchor to latch on.
        --vectorize-padding                             - Enable vectorization of padding ops.
      --linalg-tile                                     -   Tile operations in the linalg dialect
        --distribution-types=<string>                   - DistributionTypes (if loop-type=tiled_loop)
        --loop-type=<string>                            - Specify the type of loops to generate: for, parallel or tiled_loop
        --tile-sizes=<long>                             - Tile sizes
      --llvm-legalize-for-export                        -   Legalize LLVM dialect to be convertible to LLVM IR
      --loop-coalescing                                 -   Coalesce nested loops with independent bounds into a single loop
      --loop-invariant-code-motion                      -   Hoist loop invariant instructions outside of the loop
      --lower-affine                                    -   Lower Affine operations to a combination of Standard and SCF operations
      --lower-host-to-llvm                              -   Lowers the host module code and `gpu.launch_func` to LLVM
      --normalize-memrefs                               -   Normalize memrefs
      --parallel-loop-collapsing                        -   Collapse parallel loops to use fewer induction variables
        --collapsed-indices-0=<uint>                    - Which loop indices to combine into the position 0 loop index
        --collapsed-indices-1=<uint>                    - Which loop indices to combine into the position 1 loop index
        --collapsed-indices-2=<uint>                    - Which loop indices to combine into the position 2 loop index
      --parallel-loop-fusion                            -   Fuse adjacent parallel loops
      --parallel-loop-specialization                    -   Specialize parallel loops for vectorization
      --parallel-loop-tiling                            -   Tile parallel loops
        --no-min-max-bounds                             - Perform tiling with fixed upper bound with inbound check inside the internal loops
        --parallel-loop-tile-sizes=<long>               - Factors to tile parallel loops by
      --print-op-stats                                  -   Print statistics of operations
      --promote-buffers-to-stack                        -   Promotes heap-based allocations to automatically managed stack-based allocations
        --bitwidth-of-index-type=<uint>                 - Bitwidth of the index type. Used for size estimation.
        --max-alloc-size-in-bytes=<uint>                - Maximal size in bytes to promote allocations to stack.
        --max-rank-of-allocated-memref=<uint>           - Maximal memref rank to promote dynamic buffers.
      --quant-convert-const                             -   Converts constants followed by qbarrier to actual quantized values
      --quant-convert-simulated-quantization            -   Converts training-time simulated quantization ops to corresponding quantize/dequantize casts
      --reconcile-unrealized-casts                      -   Simplify and eliminate unrealized conversion casts
      --remove-shape-constraints                        -   Replace all cstr_ ops with a true witness
      --resolve-ranked-shaped-type-result-dims          -   Resolve memref.dim of result values of ranked shape type
      --resolve-shaped-type-result-dims                 -   Resolve memref.dim of result values
      --sccp                                            -   Sparse Conditional Constant Propagation
      --scf-bufferize                                   -   Bufferize the scf dialect.
      --scf-for-to-while                                -   Convert SCF for loops to SCF while loops
      --shape-bufferize                                 -   Bufferize the shape dialect.
      --shape-to-shape-lowering                         -   Legalize Shape dialect to be convertible to Standard
      --simplify-affine-structures                      -   Simplify affine expressions in maps/sets and normalize memrefs
      --slice-analysis-test                             -   Test Slice analysis functionality.
      --snapshot-op-locations                           -   Generate new locations from the current IR
        --filename=<string>                             - The filename to print the generated IR
        --tag=<string>                                  - A tag to use when fusing the new locations with the original. If unset, the locations are replaced.
      --sparse-tensor-conversion                        -   Apply conversion rules to sparse tensor primitives and types
      --sparsification                                  -   Automatically generate sparse tensor code from sparse tensor types
        --enable-simd-index32                           - Enable i32 indexing into vectors (for efficiency)
        --parallelization-strategy=<int>                - Set the parallelization strategy
        --vectorization-strategy=<int>                  - Set the vectorization strategy
        --vl=<int>                                      - Set the vector length
      --spirv-lower-abi-attrs                           -   Decorate SPIR-V composite type with layout info
      --spirv-rewrite-inserts                           -   Rewrite sequential chains of spv.CompositeInsert operations into spv.CompositeConstruct operations
      --spirv-update-vce                                -   Deduce and attach minimal (version, capabilities, extensions) requirements to spv.module ops
      --std-bufferize                                   -   Bufferize the std dialect
      --std-expand                                      -   Legalize std operations to be convertible to LLVM.
      --strip-debuginfo                                 -   Strip debug info from all operations
      --symbol-dce                                      -   Eliminate dead symbols
      --tensor-bufferize                                -   Bufferize the `tensor` dialect
      --tensor-constant-bufferize                       -   Bufferize tensor constants.
        --alignment=<uint>                              - Create global memrefs with a specified alignment
      --test-affine-data-copy                           -   Tests affine data copy utility functions.
        --for-memref-region                             - Test copy generation for a single memref region
        --memref-filter                                 - Enable memref filter testing in affine data copy optimization
      --test-affine-loop-unswitch                       -   Tests affine loop unswitching / if/else hoisting
      --test-affine-parametric-tile                     -   Tile affine loops using SSA values as tile sizes
      --test-alias-analysis                             -   Test alias analysis results.
      --test-alias-analysis-modref                      -   Test alias analysis ModRef results.
      --test-compose-subview                            -   Test combining composed subviews
      --test-comprehensive-function-bufferize           -   Test Comprehensive Bufferize of FuncOps (body only).
        --allow-return-memref                           - Allow returning/yielding memrefs from functions/blocks
        --allow-unknown-ops                             - Allows the return of memrefs (for testing purposes only)
        --analysis-fuzzer-seed=<uint>                   - Analyze ops in random order with a given seed (fuzzer)
        --dialect-filter=<string>                       - Bufferize only ops from the specified dialects
        --test-analysis-only                            - Only runs inplaceability analysis (for testing purposes only)
      --test-constant-fold                              -   Test operation constant folding
      --test-conv-vectorization                         -   Test vectorization of convolutions
        --tile-sizes=<long>                             - Vectorization sizes.
      --test-convert-call-op                            -   Tests conversion of `std.call` to `llvm.call` in presence of custom types
      --test-data-layout-query                          -   Test data layout queries
      --test-decompose-call-graph-types                 -   Decomposes types at call graph boundaries.
      --test-derived-attr                               -   Run test derived attributes
      --test-diagnostic-filter                          -   Test diagnostic filtering support.
        --filters=<string>                              - Specifies the diagnostic file name filters.
      --test-dynamic-pipeline                           -   Tests the dynamic pipeline feature by applying a pipeline on a selected set of functions
        --dynamic-pipeline=<string>                     - The pipeline description that will run on the filtered function.
        --op-name=<string>                              - List of function names to apply the pipeline to
        --run-on-nested-operations                      - This will apply the pipeline on nested operations under the visited operation.
        --run-on-parent                                 - This will apply the pipeline on the parent operation if it exists; this is expected to fail.
      --test-elements-attr-interface                    -   Test ElementsAttr interface support.
      --test-expand-tanh                                -   Test expanding tanh
      --test-extract-fixed-outer-loops                  -   test application of parametric tiling to the outer loops so that the ranges of outer loops become static
        --test-outer-loop-sizes=<long>                  - fixed number of iterations that the outer loops should have
      --test-func-erase-arg                             -   Test erasing func args.
      --test-func-erase-result                          -   Test erasing func results.
      --test-func-insert-arg                            -   Test inserting func args.
      --test-func-insert-result                         -   Test inserting func results.
      --test-func-set-type                              -   Test FuncOp::setType.
      --test-function-pass                              -   Test a function pass in the pass manager
      --test-gpu-greedy-parallel-loop-mapping           -   Greedily maps all parallel loops to gpu hardware ids.
      --test-gpu-memory-promotion                       -   Promotes the annotated arguments of gpu.func to workgroup memory.
      --test-gpu-rewrite                                -   Applies all rewrite patterns within the GPU dialect.
      --test-inline                                     -   Test inlining region calls
      --test-ir-visitors                                -   Test various visitors.
      --test-legalize-patterns                          -   Run test dialect legalization patterns
      --test-legalize-type-conversion                   -   Test various type conversion functionalities in DialectConversion
      --test-legalize-unknown-root-patterns             -   Test public remapped value mechanism in ConversionPatternRewriter
      --test-linalg-codegen-strategy                    -   Test Linalg Codegen Strategy.
        --anchor-func=<string>                          - Which single func op is the anchor for the codegen strategy to latch on.
        --anchor-op=<string>                            - Which single linalg op is the anchor for the codegen strategy to latch on:
                                                        linalg.matmul: anchor on linalg.matmul
                                                        linalg.matmul_column_major: anchor on linalg.matmul_column_major
                                                        linalg.copy: anchor on linalg.copy
                                                        linalg.fill: anchor on linalg.fill
        --decompose                                     - Decompose convolutions to lower dimensional ones.
        --fuse                                          - Fuse the producers after tiling the root op.
        --generalize                                    - Generalize named operations.
        --hoist-paddings=<long>                         - Operand hoisting depths when test-pad-pattern.
        --iterator-interchange=<long>                   - Specifies the iterator interchange.
        --pack-paddings=<long>                          - Operand packing flags when test-pad-pattern.
        --pad                                           - Pad the operands.
        --pad-inputs-only                               - Only pad input operands when test-pad-pattern
        --promote                                       - Promote the tile into a small aligned memory buffer.
        --promote-full-tile-pad                         - Pad the small aligned memory buffer to the tile sizes.
        --register-promote                              - Promote the register tile into a small aligned memory buffer.
        --register-promote-full-tile-pad                - Pad the small aligned memory buffer to the tile sizes.
        --register-tile-sizes=<long>                    - Specifies the size of the register tile that will be used to vectorize
        --run-enable-pass                               - Run the enable pass between transformations
        --split-transfers=<string>                      - Split vector transfers between slow (masked) and fast (unmasked) variants. Possible options are:
                                                        none: keep unsplit vector.transfer and pay the full price
                                                        linalg-copy: use linalg.fill + linalg.copy for the slow path
                                                        vector-transfers: use extra small unmasked vector.transfer for the slow path
        --tile-interchange=<long>                       - Specifies the tile interchange.
        --tile-sizes=<long>                             - Specifies the tile sizes.
        --unroll-vector-transfers                       - Enable full unrolling of vector.transfer operations
        --vectorize                                     - Rewrite the linalg op as a vector operation.
        --vectorize-contraction-to=<string>             - the type of vector op to use for linalg contractions
      --test-linalg-control-fusion-by-expansion         -   Test controlling of fusion of elementwise ops with reshape by expansion
      --test-linalg-distribution                        -   Test Linalg distribution.
      --test-linalg-elementwise-fusion-patterns         -   Test Linalg element wise operation fusion patterns
      --test-linalg-fusion-transform-patterns           -   Test Linalg fusion transformation patterns by applying them greedily.
      --test-linalg-greedy-fusion                       -   Test Linalg fusion by applying a greedy test transformation.
      --test-linalg-hoisting                            -   Test Linalg hoisting functions.
        --test-hoist-redundant-transfers                - Test hoisting transfer_read/transfer_write pairs
      --test-linalg-push-reshape                        -   Test Linalg reshape push patterns
      --test-linalg-tensor-fusion-transform-patterns    -   Test Linalg on tensor fusion transformation patterns by applying them greedily.
      --test-linalg-tile-and-fuse                       -   Test Linalg tiling and fusion of a sequence of Linalg operations.
        --tile-sizes=<long>                             - Tile sizes to use for ops
      --test-linalg-tiled-loop-fusion-transform-patterns -   Test Linalg on tensor fusion transformation patterns by applying them greedily.
      --test-linalg-transform-patterns                  -   Test Linalg transformation patterns by applying them greedily.
        --loop-type=<string>                            - Specify the type of loops to generate: for, parallel or tiled_loop
        --peeled-loops=<long>                           - Loops to be peeled when test-tile-pattern
        --skip-partial                                  - Skip loops inside partial iterations during peeling
        --test-generalize-pad-tensor                    - Test transform pad tensor by copying with generic ops
        --test-linalg-promotion-options                 - Test promotion options
        --test-linalg-to-vector-patterns                - Test a set of patterns that rewrite a linalg contraction in vector.contract form
        --test-matmul-to-vector-patterns-tile-1d        - Test a fused pass that applies patterns from matmul to vectors via 1-d tiling
        --test-matmul-to-vector-patterns-tile-2d        - Test a fused pass that applies patterns from matmul to vectors via 2-d tiling
        --test-patterns                                 - Test a mixed set of patterns
        --test-swap-subtensor-padtensor                 - Test rewrite of subtensor(pad_tensor) into pad_tensor(subtensor)
        --test-tile-and-distribute-options              - Test tile and distribute options
        --test-tile-pattern                             - Test tile pattern
        --test-tile-scalarize-dynamic-dims              - Test tiling of dynamic dims by 1
        --test-tiled-loop-peeling=<uint>                - Test peeling of linalg.tiled_loop ops
        --test-transform-pad-tensor                     - Test transform pad tensor by copying with generic ops
        --test-vector-transfer-forwarding-patterns      - Test a fused pass that forwards linalg.copy to vector.transfer
        --tile-sizes=<long>                             - Linalg tile sizes for test-tile-pattern
      --test-loop-fusion                                -   Tests loop fusion utility functions.
      --test-loop-permutation                           -   Tests affine loop permutation utility
        --permutation-map=<uint>                        - Specify the loop permutation
      --test-loop-unrolling                             -   Tests loop unrolling transformation
        --annotate                                      - Annotate unrolled iterations.
        --loop-depth=<uint>                             - Loop depth.
        --unroll-factor=<ulong>                         - Loop unroll factor.
        --unroll-up-to-factor                           - Loop unroll up to factor.
      --test-mapping-to-processing-elements             -   test mapping a single loop on a virtual processor grid
      --test-match-reduction                            -   Test the match reduction utility.
      --test-matchers                                   -   Test C++ pattern matchers.
      --test-math-algebraic-simplification              -   Test math algebraic simplification
      --test-math-polynomial-approximation              -   Test math polynomial approximations
        --enable-avx2                                   - Enable approximations that emit AVX2 intrinsics via the X86Vector dialect
      --test-memref-bound-check                         -   Check memref access bounds in a Function
      --test-memref-dependence-check                    -   Checks dependences between all pairs of memref accesses.
      --test-memref-stride-calculation                  -   Test operation constant folding
      --test-merge-blocks                               -   Test Merging operation in ConversionPatternRewriter
      --test-mlir-reducer                               -   Tests MLIR Reduce tool by generating failures
      --test-module-pass                                -   Test a module pass in the pass manager
      --test-opaque-loc                                 -   Changes all leaf locations to opaque locations
      --test-operations-equality                        -   Test operations equality.
      --test-options-pass                               -   Test options parsing capabilities
        --list=<int>                                    - Example list option
        --string=<string>                               - Example string option
        --string-list=<string>                          - Example string list option
      --test-pass-crash                                 -   Test a pass in the pass manager that always crashes
      --test-pass-failure                               -   Test a pass in the pass manager that always fails
      --test-pattern-selective-replacement              -   Test selective replacement in the PatternRewriter
      --test-patterns                                   -   Run test dialect patterns
      --test-pdl-bytecode-pass                          -   Test PDL ByteCode functionality
      --test-print-callgraph                            -   Print the contents of a constructed callgraph.
      --test-print-defuse                               -   Test various printing.
      --test-print-dominance                            -   Print the dominance information for multiple regions.
      --test-print-liveness                             -   Print the contents of a constructed liveness information.
      --test-print-nesting                              -   Test various printing.
      --test-print-number-of-block-executions           -   Print the contents of a constructed number of executions analysis for all blocks.
      --test-print-number-of-operation-executions       -   Print the contents of a constructed number of executions analysis for all operations.
      --test-print-topological-sort                     -   Print operations in topological order
      --test-recursive-types                            -   Test support for recursive types
      --test-remapped-value                             -   Test public remapped value mechanism in ConversionPatternRewriter
      --test-return-type                                -   Run return type functions
      --test-scf-for-utils                              -   test scf.for utils
      --test-scf-if-utils                               -   test scf.if utils
      --test-scf-pipelining                             -   test scf.forOp pipelining
      --test-shape-function-report                      -   Test pass to report associated shape functions
      --test-side-effects                               -   Test side effects interfaces
      --test-spirv-entry-point-abi                      -   Set the spv.entry_point_abi attribute on GPU kernel function within the module, intended for testing only
        --workgroup-size=<int>                          - Workgroup size to use for all gpu.func kernels in the module, specified with x-dimension first, y-dimension next and z-dimension last. Unspecified dimensions will be set to 1
      --test-spirv-glsl-canonicalization                -   Tests SPIR-V canonicalization patterns for GLSL extension.
      --test-spirv-module-combiner                      -   Tests SPIR-V module combiner library
      --test-spirv-op-availability                      -   Test SPIR-V op availability
      --test-spirv-target-env                           -   Test SPIR-V target environment
      --test-stats-pass                                 -   Test pass statistics
      --test-symbol-rauw                                -   Test replacement of symbol uses
      --test-symbol-uses                                -   Test detection of symbol uses
      --test-trait-folder                               -   Run trait folding
      --test-type-interfaces                            -   Test type interface support.
      --test-vector-contraction-lowering                -   Test lowering patterns that lower contract ops in the vector dialect
        --vector-filter-outerproduct                    - Lower vector.contract to vector.outerproduct but not for vectors of size 4.
        --vector-lower-matrix-intrinsics                - Lower vector.contract to llvm.intr.matrix.multiply
        --vector-outerproduct                           - Lower vector.contract to vector.outerproduct
      --test-vector-distribute-patterns                 -   Test lowering patterns to distribute vector ops in the vector dialect
        --distribution-multiplicity=<int>               - Set the multiplicity used for distributing vector
      --test-vector-multi-reduction-lowering-patterns   -   Test lowering patterns to lower vector.multi_reduction to other vector ops
        --use-outer-reductions                          - Move reductions to outer most dimensions
      --test-vector-reduction-to-contract-patterns      -   Test patterns to convert multireduce op to contract and combine broadcast/transpose to contract
      --test-vector-to-forloop                          -   Test lowering patterns to break up a vector op into a for loop
        --distribution-multiplicity=<int>               - Set the multiplicity used for distributing vector
      --test-vector-to-vector-lowering                  -   Test lowering patterns between ops in the vector dialect
        --unroll                                        - Include unrolling
      --test-vector-transfer-collapse-inner-most-dims   -   Test lowering patterns that reduce the rank of the vector transfer memory and vector operands.
      --test-vector-transfer-full-partial-split         -   Test lowering patterns to split transfer ops via scf.if + linalg ops
        --use-linalg-copy                               - Split using an unmasked vector.transfer + linalg.fill + linalg.copy operations.
      --test-vector-transfer-lowering-patterns          -   Test lowering patterns to lower transfer ops to other vector ops
      --test-vector-transfer-unrolling-patterns         -   Test lowering patterns to unroll transfer ops in the vector dialect
      --test-vector-transferop-opt                      -   Test optimization transformations for transfer ops
      --test-vector-transpose-lowering                  -   Test lowering patterns that lower contract ops in the vector dialect
        --avx2                                          - Lower vector.transpose to avx2-specific patterns
        --eltwise                                       - Lower 2-D vector.transpose to eltwise insert/extract
        --flat                                          - Lower 2-D vector.transpose to vector.flat_transpose
        --shuffle                                       - Lower 2-D vector.transpose to shape_cast + shuffle
      --test-vector-unrolling-patterns                  -   Test lowering patterns to unroll contract ops in the vector dialect
        --unroll-based-on-type                          - Set the unroll factor based on type of the operation
      --tosa-decompose-transpose-conv                   -   Decompose transpose convolutions into standard convolutions.
      --tosa-infer-shapes                               -   Propagate shapes across TOSA operations
      --tosa-make-broadcastable                         -   TOSA rank Reshape to enable Broadcasting
      --tosa-test-quant-utils                           -   TOSA Test: Exercise the APIs in QuantUtils.cpp.
      --tosa-to-linalg                                  -   Lower TOSA to LinAlg on tensors
      --tosa-to-scf                                     -   Lower TOSA to the SCF dialect
      --tosa-to-standard                                -   Lower TOSA to the Standard dialect
      --view-op-graph                                   -   Print Graphviz visualization of an operation
        --max-label-len=<uint>                          - Limit attribute/type length to number of chars
        --print-attrs                                   - Print attributes of operations
        --print-control-flow-edges                      - Print control flow edges
        --print-data-flow-edges                         - Print data flow edges
        --print-result-types                            - Print result types of operations
    Pass Pipelines:
      --test-dump-pipeline                              -   Dumps the pipeline build so far for debugging purposes
      --test-options-pass-pipeline                      -   Parses options using pass pipeline registration
        --list=<int>                                    - Example list option
        --string=<string>                               - Example string option
        --string-list=<string>                          - Example string list option
      --test-pm-nested-pipeline                         -   Test a nested pipeline in the pass manager
      --test-textual-pm-nested-pipeline                 -   Test a nested pipeline in the pass manager
  --show-dialects                                       - Print the list of registered dialects
  --split-input-file                                    - Split the input file into pieces and process each chunk independently
  --test-legalize-mode=<value>                          - The legalization mode to use with the test driver
    =analysis                                           -   Perform an analysis conversion
    =full                                               -   Perform a full conversion
    =partial                                            -   Perform a partial conversion
  --verify-diagnostics                                  - Check that emitted diagnostics match expected-* lines on the corresponding line
  --verify-each                                         - Run the verifier after each transformation pass

Generic Options:

  --help                                                - Display available options (--help-hidden for more)
  --help-list                                           - Display list of available options (--help-list-hidden for more)
  --version                                             - Display the version of this program

affine-super-vectorizer-test options:

  --backward-slicing                                    - Enable testing backward static slicing and topological sort functionalities
  --compose-maps                                        - Enable testing the composition of AffineMap where each AffineMap in the composition is specified as the affine_map attribute in a constant op.
  --forward-slicing                                     - Enable testing forward static slicing and topological sort functionalities
  --slicing                                             - Enable testing static slicing and topological sort functionalities
  --vector-shape-ratio=<int>                            - Specify the HW vector size for vectorization
  --vectorize-affine-loop-nest                          - Enable testing for the 'vectorizeAffineLoopNest' utility by vectorizing the outermost loops found

test-loop-fusion options:

  --test-loop-fusion-dependence-check                   - Enable testing of loop fusion dependence check
  --test-loop-fusion-slice-computation                  - Enable testing of loop fusion slice computation
  --test-loop-fusion-transformation    