> /Workspace/buddy-mlir/llvm/build/bin/mlir-opt -h    # pointwise_benchmark [18e39c6] modified
OVERVIEW: MLIR modular optimizer driver
Available Dialects: acc, affine, amx, arith, arm_neon, arm_sve, async, bufferization, builtin, complex, dlti, emitc, gpu, linalg, llvm, math, memref, nvvm, omp, pdl, pdl_interp, quant, rocdl, scf, shape, sparse_tensor, spv, std, tensor, test, tosa, vector, x86vector
USAGE: mlir-opt [options] <input file>
OPTIONS:
Color Options:
--color - Use colors in output (default=autodetect)
General options:
--allow-unregistered-dialect - Allow operation with no registered dialects
--disable-i2p-p2i-opt - Disables inttoptr/ptrtoint roundtrip optimization
--dot-cfg-mssa=<file name for generated dot file> - file name for generated dot file
--mlir-debug-counter=<string> - Comma separated list of debug counter skip and count arguments
--mlir-disable-threading - Disable multi-threading within MLIR, overrides any further call to MLIRContext::enableMultiThreading()
--mlir-elide-elementsattrs-if-larger=<uint> - Elide ElementsAttrs with "..." that have more elements than the given upper limit
--mlir-pretty-debuginfo - Print pretty debug info in MLIR output
--mlir-print-debug-counter - Print out debug counter information after all counters have been accumulated
--mlir-print-debuginfo - Print debug info in MLIR output
--mlir-print-elementsattrs-with-hex-if-larger=<long> - Print DenseElementsAttrs with a hex string that have more elements than the given upper limit (use -1 to disable)
--mlir-print-op-on-diagnostic - When a diagnostic is emitted on an operation, also print the operation as an attached note
--mlir-print-stacktrace-on-diagnostic - When a diagnostic is emitted, also print the stack trace as an attached note
--mlir-timing - Display execution times
--mlir-timing-display=<value> - Display method for timing data
  =list - display the results in a list sorted by total time
  =tree - display the results in a nested tree view
-o=<filename> - Output filename
--opaque-pointers - Use opaque pointers
--pass-pipeline-crash-reproducer=<string> - Generate a .mlir reproducer file at the given output path if the pass manager crashes or fails
--pass-pipeline-local-reproducer - When generating a crash reproducer, attempt to generate a reproducer with the smallest pipeline.
--pass-statistics - Display the statistics of each pass
--pass-statistics-display=<value> - Display method for pass statistics
  =list - display the results in a merged list sorted by pass name
  =pipeline - display the results with a nested pipeline view
--print-ir-after=<pass-arg> - Print IR after specified passes
--print-ir-after-all - Print IR after each pass
--print-ir-after-change - When printing the IR after a pass, only print if the IR changed
--print-ir-after-failure - When printing the IR after a pass, only print if the pass failed
--print-ir-before=<pass-arg> - Print IR before specified passes
--print-ir-before-all - Print IR before each pass
--print-ir-module-scope - When printing IR for print-ir-[before|after]{-all} always print the top-level operation
--run-reproducer - Append the command line options of the reproducer
Compiler passes to run
--pass-pipeline - A textual description of a pass pipeline to run
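The options listed under a pass (for example `tile-size` under `--affine-loop-tile`) are not standalone flags. They are passed either on the pass's own flag or inside a `--pass-pipeline` string. The Python sketch below builds both forms as `argv` entries for `subprocess`. It rests on three assumptions:

* The per-pass form is `--name=key=value key=value`. On a shell this is usually written `--name="key=value ..."`.
* The pipeline form is `anchor(pass{key=value ...},pass)`.
* `builtin.func` is the function anchor for this build. The anchor op name has changed between MLIR releases, so check it against your own IR.

The helper names are hypothetical.

```python
import shlex
from typing import Dict, List, Optional, Sequence, Tuple, Union

# Hypothetical helpers; the pass and option names come from the help text.
OptionValue = Union[bool, int, str, Sequence[int]]


def format_options(options: Dict[str, OptionValue]) -> str:
    """Render pass options as space-separated key=value pairs."""
    parts = []
    for key, value in options.items():
        if isinstance(value, bool):  # check bool before int (bool is an int)
            text = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            text = ",".join(str(v) for v in value)  # list options are comma-separated
        else:
            text = str(value)
        parts.append(f"{key}={text}")
    return " ".join(parts)


def pass_flag(name: str, options: Optional[Dict[str, OptionValue]] = None) -> str:
    """Per-pass flag, e.g. --affine-loop-tile=tile-size=32."""
    if not options:
        return f"--{name}"
    return f"--{name}={format_options(options)}"


def pipeline(anchor: str, passes: List[Tuple[str, Dict[str, OptionValue]]]) -> str:
    """Textual pipeline nested under `anchor`, e.g. builtin.func(cse)."""
    rendered = [
        f"{name}{{{format_options(opts)}}}" if opts else name for name, opts in passes
    ]
    return f"{anchor}({','.join(rendered)})"


if __name__ == "__main__":
    argv = ["mlir-opt", "input.mlir",
            pass_flag("affine-loop-tile", {"tile-size": 32, "separate": True}),
            "--mlir-timing"]
    # shlex.join shows how the list would be quoted on a shell command line.
    print(shlex.join(argv))
    print(pipeline("builtin.func",
                   [("affine-loop-tile", {"tile-sizes": [32, 32]}), ("cse", {})]))
```

When mlir-opt is launched through `subprocess.run(argv)`, each option string travels as a single `argv` element. The space-separated options therefore need no shell quoting.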
Passes:
--affine-data-copy-generate - Generate explicit copying for affine memory operations
  --fast-mem-capacity=<ulong> - Set fast memory space capacity in KiB (default: unlimited)
  --fast-mem-space=<uint> - Fast memory space identifier for copy generation (default: 1)
  --generate-dma - Generate DMA instead of point-wise copy
  --min-dma-transfer=<int> - Minimum DMA transfer size supported by the target in bytes
  --skip-non-unit-stride-loops - Testing purposes: avoid non-unit stride loop choice depths for copy placement
  --slow-mem-space=<uint> - Slow memory space identifier for copy generation (default: 0)
  --tag-mem-space=<uint> - Tag memory space identifier for copy generation (default: 0)
--affine-loop-fusion - Fuse affine loop nests
  --fusion-compute-tolerance=<number> - Fractional increase in additional computation tolerated while fusing
  --fusion-fast-mem-space=<uint> - Faster memory space number to promote fusion buffers to
  --fusion-local-buf-threshold=<ulong> - Threshold size (KiB) for promoting local buffers to fast memory space
  --fusion-maximal - Enables maximal loop fusion
  --mode=<value> - Fusion mode to attempt
    =greedy - Perform greedy (both producer-consumer and sibling) fusion
    =producer - Perform only producer-consumer fusion
    =sibling - Perform only sibling fusion
--affine-loop-invariant-code-motion - Hoist loop invariant instructions outside of affine loops
--affine-loop-normalize - Apply normalization transformations to affine loop-like ops
--affine-loop-tile - Tile affine loop nests
  --cache-size=<ulong> - Set size of cache to tile for in KiB
  --separate - Separate full and partial tiles
  --tile-size=<uint> - Use this tile size for all loops
  --tile-sizes=<uint> - List of tile sizes for each perfect nest (overridden by -tile-size)
--affine-loop-unroll - Unroll affine loops
  --unroll-factor=<uint> - Use this unroll factor for all loops being unrolled
  --unroll-full - Fully unroll loops
  --unroll-full-threshold=<uint> - Unroll all loops with trip count less than or equal to this
  --unroll-num-reps=<uint> - Unroll innermost loops repeatedly this many times
  --unroll-up-to-factor - Allow unrolling up to the factor specified
--affine-loop-unroll-jam - Unroll and jam affine loops
  --unroll-jam-factor=<uint> - Use this unroll jam factor for all loops (default 4)
--affine-parallelize - Convert affine.for ops into 1-D affine.parallel
  --max-nested=<uint> - Maximum number of nested parallel loops to produce. Defaults to unlimited (UINT_MAX).
  --parallel-reductions - Whether to parallelize reduction loops. Defaults to false.
--affine-pipeline-data-transfer - Pipeline non-blocking data transfers between explicitly managed levels of the memory hierarchy
--affine-scalrep - Replace affine memref accesses with scalars by forwarding stores to loads and eliminating redundant loads
--affine-super-vectorize - Vectorize to a target independent n-D vector abstraction
  --test-fastest-varying=<long> - Specify a 1-D, 2-D or 3-D pattern of fastest varying memory dimensions to match. See defaultPatterns in Vectorize.cpp for a description and examples. This is used for testing purposes
  --vectorize-reductions - Vectorize known reductions expressed via iter_args. Switched off by default.
  --virtual-vector-size=<long> - Specify an n-D virtual vector size for vectorization
--affine-super-vectorizer-test - Tests vectorizer standalone functionality.
--arith-bufferize - Bufferize Arithmetic dialect ops.
--arith-expand - Legalize Arithmetic ops to be convertible to LLVM.
--arm-neon-2d-to-intr - Convert Arm NEON structured ops to intrinsics
--async-parallel-for - Convert scf.parallel operations to multiple async compute ops executed concurrently for non-overlapping iteration ranges
  --async-dispatch - Dispatch async compute tasks using recursive work splitting. If `false`, async compute tasks are launched using a simple for loop in the caller thread.
  --min-task-size=<int> - The minimum task size for sharding parallel operation.
  --num-workers=<int> - The number of available workers to execute async operations.
--async-runtime-policy-based-ref-counting - Policy based reference counting for Async runtime operations
--async-runtime-ref-counting - Automatic reference counting for Async runtime operations
--async-runtime-ref-counting-opt - Optimize automatic reference counting operations for the Async runtime by removing redundant operations
--async-to-async-runtime - Lower high level async operations (e.g. async.execute) to the explicit async.runtime and async.coro operations
--eliminate-blocking-await-ops - Rewrite functions with blocking async.runtime.await as coroutines with async.runtime.await_and_resume.
--buffer-deallocation - Adds all required dealloc operations for all allocations in the input program
--buffer-hoisting - Optimizes placement of allocation operations by moving them into common dominators and out of nested regions
--buffer-loop-hoisting - Optimizes placement of allocation operations by moving them out of loop nests
--buffer-results-to-out-params - Converts memref-typed function results to out-params
--canonicalize - Canonicalize operations
  --disable-patterns=<string> - Labels of patterns that should be filtered out during application
  --enable-patterns=<string> - Labels of patterns that should be used during application, all other patterns are filtered out
  --max-iterations=<long> - Maximum number of iterations between applying patterns and simplifying regions
  --region-simplify - Perform control flow optimizations to the region tree
  --top-down - Seed the worklist in general top-down order
--convert-affine-for-to-gpu - Convert top-level AffineFor Ops to GPU kernels
  --gpu-block-dims=<uint> - Number of GPU block dimensions for mapping
  --gpu-thread-dims=<uint> - Number of GPU thread dimensions for mapping
--convert-arith-to-llvm - Convert Arithmetic dialect to LLVM dialect
  --index-bitwidth=<uint> - Bitwidth of the index type, 0 to use size of machine word
--convert-arith-to-spirv - Convert Arithmetic dialect to SPIR-V dialect
  --emulate-non-32-bit-scalar-types - Emulate non-32-bit scalar types with 32-bit ones if missing native support
--convert-async-to-llvm - Convert the operations from the async dialect into the LLVM dialect
--convert-bufferization-to-memref - Convert operations from the Bufferization dialect to the MemRef dialect
--convert-complex-to-llvm - Convert Complex dialect to LLVM dialect
--convert-complex-to-standard - Convert Complex dialect to standard dialect
--convert-elementwise-to-linalg - Convert ElementwiseMappable ops to linalg
--convert-gpu-launch-to-vulkan-launch - Convert gpu.launch_func to vulkanLaunch external call
--convert-gpu-to-nvvm - Generate NVVM operations for gpu operations
  --index-bitwidth=<uint> - Bitwidth of the index type, 0 to use size of machine word
--convert-gpu-to-rocdl - Generate ROCDL operations for gpu operations
  --index-bitwidth=<uint> - Bitwidth of the index type, 0 to use size of machine word
  --runtime=<value> - Runtime code will be run on (default is Unknown, can also use HIP or OpenCL)
    =unknown - Unknown (default)
    =HIP - HIP
    =OpenCL - OpenCL
--convert-gpu-to-spirv - Convert GPU dialect to SPIR-V dialect
--convert-linalg-tiled-loops-to-scf - Lower linalg tiled loops to SCF loops and parallel loops
--convert-linalg-to-affine-loops - Lower the operations from the linalg dialect into affine loops
--convert-linalg-to-llvm - Convert the operations from the linalg dialect into the LLVM dialect
--convert-linalg-to-loops - Lower the operations from the linalg dialect into loops
--convert-linalg-to-parallel-loops - Lower the operations from the linalg dialect into parallel loops
--convert-linalg-to-spirv - Convert Linalg dialect to SPIR-V dialect
--convert-linalg-to-std - Convert the operations from the linalg dialect into the Standard dialect
--convert-math-to-libm - Convert Math dialect to libm calls
--convert-math-to-llvm - Convert Math dialect to LLVM dialect
--convert-math-to-spirv - Convert Math dialect to SPIR-V dialect
--convert-memref-to-llvm - Convert operations from the MemRef dialect to the LLVM dialect
  --index-bitwidth=<uint> - Bitwidth of the index type, 0 to use size of machine word
  --use-aligned-alloc - Use aligned_alloc in place of malloc for heap allocations
--convert-memref-to-spirv - Convert MemRef dialect to SPIR-V dialect
  --bool-num-bits=<int> - The number of bits to store a boolean value
--convert-openacc-to-llvm - Convert the OpenACC ops to LLVM dialect
--convert-openacc-to-scf - Convert the OpenACC ops to OpenACC with SCF dialect
--convert-openmp-to-llvm - Convert the OpenMP ops to OpenMP ops with LLVM dialect
--convert-parallel-loops-to-gpu - Convert mapped scf.parallel ops to gpu launch operations
--convert-pdl-to-pdl-interp - Convert PDL ops to PDL interpreter ops
--convert-scf-to-openmp - Convert SCF parallel loop to OpenMP parallel + workshare constructs.
--convert-scf-to-spirv - Convert SCF dialect to SPIR-V dialect.
--convert-scf-to-std - Convert SCF dialect to Standard dialect, replacing structured control flow with a CFG
--convert-shape-constraints - Convert shape constraint operations to the standard dialect
--convert-shape-to-std - Convert operations from the shape dialect into the standard dialect
--convert-spirv-to-llvm - Convert SPIR-V dialect to LLVM dialect
--convert-std-to-llvm - Convert scalar and vector operations from the Standard to the LLVM dialect
  --data-layout=<string> - String description (LLVM format) of the data layout that is expected on the produced module
  --emit-c-wrappers - Emit wrappers for C-compatible pointer-to-struct memref descriptors
  --index-bitwidth=<uint> - Bitwidth of the index type, 0 to use size of machine word
  --use-bare-ptr-memref-call-conv - Replace FuncOp's MemRef arguments with bare pointers to the MemRef element types
--convert-std-to-spirv - Convert Standard dialect to SPIR-V dialect
  --emulate-non-32-bit-scalar-types - Emulate non-32-bit scalar types with 32-bit ones if missing native support
--convert-vector-to-gpu - Lower the operations from the vector dialect into the GPU dialect
--convert-vector-to-llvm - Lower the operations from the vector dialect into the LLVM dialect
  --enable-amx - Enables the use of AMX dialect while lowering the vector dialect.
  --enable-arm-neon - Enables the use of ArmNeon dialect while lowering the vector dialect.
  --enable-arm-sve - Enables the use of ArmSVE dialect while lowering the vector dialect.
  --enable-index-optimizations - Allows compiler to assume indices fit in 32-bit if that yields faster code
  --enable-x86vector - Enables the use of X86Vector dialect while lowering the vector dialect.
  --reassociate-fp-reductions - Allows llvm to reassociate floating-point reductions for speed
--convert-vector-to-rocdl - Lower the operations from the vector dialect into the ROCDL dialect
--convert-vector-to-scf - Lower the operations from the vector dialect into the SCF dialect
  --full-unroll - Perform full unrolling when converting vector transfers to SCF
  --lower-permutation-maps - Replace permutation maps with vector transposes/broadcasts before lowering transfer ops
  --lower-tensors - Lower transfer ops that operate on tensors
  --target-rank=<uint> - Target vector rank to which transfer ops should be lowered
--convert-vector-to-spirv - Convert Vector dialect to SPIR-V dialect
--cse - Eliminate common sub-expressions
--decorate-spirv-composite-type-layout - Decorate SPIR-V composite type with layout info
--finalizing-bufferize - Finalize a partial bufferization
--fold-memref-subview-ops - Fold memref.subview ops into consumer load/store ops
--for-loop-canonicalization - Canonicalize operations within scf.for loop bodies
--for-loop-peeling - Peel `for` loops at their upper bounds.
  --skip-partial - Do not peel loops inside of the last, partial iteration of another already peeled loop.
--for-loop-range-folding - Fold add/mul ops into loop range
--for-loop-specialization - Specialize `for` loops for vectorization
--func-bufferize - Bufferize func/call/return ops
--gpu-async-region - Make GPU ops async
--gpu-kernel-outlining - Outline gpu.launch bodies to kernel functions
--gpu-to-llvm - Convert GPU dialect to LLVM dialect with GPU runtime calls
  --gpu-binary-annotation=<string> - Annotation attribute string for GPU binary
--inline - Inline function calls
  --default-pipeline=<string> - The default optimizer pipeline used for callables
  --max-iterations=<uint> - Maximum number of iterations when inlining within an SCC
  --op-pipelines=<string> - Callable operation specific optimizer pipelines (in the form of `dialect.op(pipeline)`)
--launch-func-to-vulkan - Convert vulkanLaunch external call to Vulkan runtime external calls
--linalg-bufferize - Bufferize the linalg dialect
--linalg-comprehensive-module-bufferize - Bufferize (tensor into memref) for a Module.
  --allow-return-memref - Allows the return of memrefs (for testing purposes only)
  --allow-unknown-ops - Allows unknown (not bufferizable) ops in the input IR.
  --analysis-fuzzer-seed=<uint> - Analyze ops in random order with a given seed (fuzzer)
  --test-analysis-only - Only runs inplaceability analysis (for testing purposes only)
  --use-alloca - Use stack allocations for memrefs (for testing purposes only)
--linalg-detensorize - Detensorize linalg ops
  --aggressive-mode - Detensorize all ops that qualify for detensoring along with branch operands and basic-block arguments.
--linalg-fold-reshape-ops-by-linearization - Fold TensorReshapeOps with generic/indexed generic ops by linearization
  --allow-folding-unit-dim-reshapes - Allow fusing linalg.tensor_reshape ops that perform unit dimension collapsing
--linalg-fold-unit-extent-dims - Remove unit-extent dimension in Linalg ops on tensors
  --fold-one-trip-loops-only - Only folds the one-trip loops from Linalg ops on tensors (for testing purposes only)
--linalg-fuse-elementwise-ops - Fuse elementwise operations on tensors
  --allow-folding-unit-dim-reshapes - Allow fusing linalg.tensor_reshape ops that perform unit dimension collapsing
--linalg-generalize-named-ops - Convert named ops into generic ops
--linalg-inline-scalar-operands - Inline scalar operands into linalg generic ops
--linalg-promote-subviews - Promote subview ops to local buffers
  --test-promote-dynamic - Test generation of dynamic promoted buffers
  --test-use-alloca - Test generation of alloca'ed buffers.
--linalg-strategy-decompose-pass - Configurable pass to apply pattern-based decomposition.
  --anchor-func=<string> - Which func op is the anchor to latch on.
--linalg-strategy-enable-pass - Configurable pass to enable the application of other pattern-based linalg passes.
  --anchor-func=<string> - Which func op is the anchor to latch on.
--linalg-strategy-generalize-pass - Configurable pass to apply pattern-based generalization.
  --anchor-func=<string> - Which func op is the anchor to latch on.
  --anchor-op=<string> - Which linalg op within the func is the anchor to latch on.
--linalg-strategy-interchange-pass - Configurable pass to apply pattern-based iterator interchange.
  --anchor-func=<string> - Which func op is the anchor to latch on.
--linalg-strategy-lower-vectors-pass - Configurable pass to lower vector operations.
  --anchor-func=<string> - Which func op is the anchor to latch on.
--linalg-strategy-pad-pass - Configurable pass to apply padding and hoisting.
  --anchor-func=<string> - Which func op is the anchor to latch on.
  --anchor-op=<string> - Which linalg op within the func is the anchor to latch on.
--linalg-strategy-promote-pass - Configurable pass to apply pattern-based linalg promotion.
  --anchor-func=<string> - Which func op is the anchor to latch on.
  --anchor-op=<string> - Which linalg op within the func is the anchor to latch on.
--linalg-strategy-remove-markers-pass - Cleanup pass that drops markers.
  --anchor-func=<string> - Which func op is the anchor to latch on.
--linalg-strategy-tile-and-fuse-pass - Configurable pass to apply pattern-based tiling and fusion.
  --anchor-func=<string> - Which func op is the anchor to latch on.
  --anchor-op=<string> - Which linalg op within the func is the anchor to latch on.
--linalg-strategy-tile-pass - Configurable pass to apply pattern-based linalg tiling.
  --anchor-func=<string> - Which func op is the anchor to latch on.
  --anchor-op=<string> - Which linalg op within the func is the anchor to latch on.
--linalg-strategy-vectorize-pass - Configurable pass to apply pattern-based linalg vectorization.
  --anchor-func=<string> - Which func op is the anchor to latch on.
  --anchor-op=<string> - Which linalg op within the func is the anchor to latch on.
  --vectorize-padding - Enable vectorization of padding ops.
--linalg-tile - Tile operations in the linalg dialect
  --distribution-types=<string> - DistributionTypes (if loop-type=tiled_loop)
  --loop-type=<string> - Specify the type of loops to generate: for, parallel or tiled_loop
  --tile-sizes=<long> - Tile sizes
--llvm-legalize-for-export - Legalize LLVM dialect to be convertible to LLVM IR
--loop-coalescing - Coalesce nested loops with independent bounds into a single loop
--loop-invariant-code-motion - Hoist loop invariant instructions outside of the loop
--lower-affine - Lower Affine operations to a combination of Standard and SCF operations
--lower-host-to-llvm - Lowers the host module code and `gpu.launch_func` to LLVM
--normalize-memrefs - Normalize memrefs
--parallel-loop-collapsing - Collapse parallel loops to use fewer induction variables
  --collapsed-indices-0=<uint> - Which loop indices to combine into the position 0 loop index
  --collapsed-indices-1=<uint> - Which loop indices to combine into the position 1 loop index
  --collapsed-indices-2=<uint> - Which loop indices to combine into the position 2 loop index
--parallel-loop-fusion - Fuse adjacent parallel loops
--parallel-loop-specialization - Specialize parallel loops for vectorization
--parallel-loop-tiling - Tile parallel loops
  --no-min-max-bounds - Perform tiling with fixed upper bound with inbound check inside the internal loops
  --parallel-loop-tile-sizes=<long> - Factors to tile parallel loops by
--print-op-stats - Print statistics of operations
--promote-buffers-to-stack - Promotes heap-based allocations to automatically managed stack-based allocations
  --bitwidth-of-index-type=<uint> - Bitwidth of the index type. Used for size estimation.
  --max-alloc-size-in-bytes=<uint> - Maximal size in bytes to promote allocations to stack.
  --max-rank-of-allocated-memref=<uint> - Maximal memref rank to promote dynamic buffers.
--quant-convert-const - Converts constants followed by qbarrier to actual quantized values
--quant-convert-simulated-quantization - Converts training-time simulated quantization ops to corresponding quantize/dequantize casts
--reconcile-unrealized-casts - Simplify and eliminate unrealized conversion casts
--remove-shape-constraints - Replace all cstr_ ops with a true witness
--resolve-ranked-shaped-type-result-dims - Resolve memref.dim of result values of ranked shape type
--resolve-shaped-type-result-dims - Resolve memref.dim of result values
--sccp - Sparse Conditional Constant Propagation
--scf-bufferize - Bufferize the scf dialect.
--scf-for-to-while - Convert SCF for loops to SCF while loops
--shape-bufferize - Bufferize the shape dialect.
--shape-to-shape-lowering - Legalize Shape dialect to be convertible to Standard
--simplify-affine-structures - Simplify affine expressions in maps/sets and normalize memrefs
--slice-analysis-test - Test Slice analysis functionality.
--snapshot-op-locations - Generate new locations from the current IR
  --filename=<string> - The filename to print the generated IR
  --tag=<string> - A tag to use when fusing the new locations with the original. If unset, the locations are replaced.
--sparse-tensor-conversion - Apply conversion rules to sparse tensor primitives and types
--sparsification - Automatically generate sparse tensor code from sparse tensor types
  --enable-simd-index32 - Enable i32 indexing into vectors (for efficiency)
  --parallelization-strategy=<int> - Set the parallelization strategy
  --vectorization-strategy=<int> - Set the vectorization strategy
  --vl=<int> - Set the vector length
--spirv-lower-abi-attrs - Lower ABI attributes on SPIR-V entry-point functions
--spirv-rewrite-inserts - Rewrite sequential chains of spv.CompositeInsert operations into spv.CompositeConstruct operations
--spirv-update-vce - Deduce and attach minimal (version, capabilities, extensions) requirements to spv.module ops
--std-bufferize - Bufferize the std dialect
--std-expand - Legalize std operations to be convertible to LLVM.
--strip-debuginfo - Strip debug info from all operations
--symbol-dce - Eliminate dead symbols
--tensor-bufferize - Bufferize the `tensor` dialect
--tensor-constant-bufferize - Bufferize tensor constants.
  --alignment=<uint> - Create global memrefs with a specified alignment
--test-affine-data-copy - Tests affine data copy utility functions.
  --for-memref-region - Test copy generation for a single memref region
  --memref-filter - Enable memref filter testing in affine data copy optimization
--test-affine-loop-unswitch - Tests affine loop unswitching / if/else hoisting
--test-affine-parametric-tile - Tile affine loops using SSA values as tile sizes
--test-alias-analysis - Test alias analysis results.
--test-alias-analysis-modref - Test alias analysis ModRef results.
--test-compose-subview - Test combining composed subviews
--test-comprehensive-function-bufferize - Test Comprehensive Bufferize of FuncOps (body only).
  --allow-return-memref - Allow returning/yielding memrefs from functions/blocks
  --allow-unknown-ops - Allows unknown (not bufferizable) ops in the input IR.
  --analysis-fuzzer-seed=<uint> - Analyze ops in random order with a given seed (fuzzer)
  --dialect-filter=<string> - Bufferize only ops from the specified dialects
  --test-analysis-only - Only runs inplaceability analysis (for testing purposes only)
--test-constant-fold - Test operation constant folding
--test-conv-vectorization - Test vectorization of convolutions
  --tile-sizes=<long> - Vectorization sizes.
--test-convert-call-op - Tests conversion of `std.call` to `llvm.call` in presence of custom types
--test-data-layout-query - Test data layout queries
--test-decompose-call-graph-types - Decomposes types at call graph boundaries.
--test-derived-attr - Run test derived attributes
--test-diagnostic-filter - Test diagnostic filtering support.
  --filters=<string> - Specifies the diagnostic file name filters.
--test-dynamic-pipeline - Tests the dynamic pipeline feature by applying a pipeline on a selected set of functions
  --dynamic-pipeline=<string> - The pipeline description that will run on the filtered function.
  --op-name=<string> - List of function names to apply the pipeline to
  --run-on-nested-operations - This will apply the pipeline on nested operations under the visited operation.
  --run-on-parent - This will apply the pipeline on the parent operation if it exists; this is expected to fail.
--test-elements-attr-interface - Test ElementsAttr interface support.
--test-expand-tanh - Test expanding tanh
--test-extract-fixed-outer-loops - test application of parametric tiling to the outer loops so that the ranges of outer loops become static
  --test-outer-loop-sizes=<long> - fixed number of iterations that the outer loops should have
--test-func-erase-arg - Test erasing func args.
--test-func-erase-result - Test erasing func results.
--test-func-insert-arg - Test inserting func args.
--test-func-insert-result - Test inserting func results.
--test-func-set-type - Test FuncOp::setType.
--test-function-pass - Test a function pass in the pass manager
--test-gpu-greedy-parallel-loop-mapping - Greedily maps all parallel loops to gpu hardware ids.
--test-gpu-memory-promotion - Promotes the annotated arguments of gpu.func to workgroup memory.
--test-gpu-rewrite - Applies all rewrite patterns within the GPU dialect.
--test-inline - Test inlining region calls
--test-ir-visitors - Test various visitors.
--test-legalize-patterns - Run test dialect legalization patterns
--test-legalize-type-conversion - Test various type conversion functionalities in DialectConversion
--test-legalize-unknown-root-patterns - Test public remapped value mechanism in ConversionPatternRewriter
--test-linalg-codegen-strategy - Test Linalg Codegen Strategy.
  --anchor-func=<string> - Which single func op is the anchor for the codegen strategy to latch on.
  --anchor-op=<string> - Which single linalg op is the anchor for the codegen strategy to latch on:
    linalg.matmul: anchor on linalg.matmul
    linalg.matmul_column_major: anchor on linalg.matmul_column_major
    linalg.copy: anchor on linalg.copy
    linalg.fill: anchor on linalg.fill
  --decompose - Decompose convolutions to lower dimensional ones.
  --fuse - Fuse the producers after tiling the root op.
  --generalize - Generalize named operations.
  --hoist-paddings=<long> - Operand hoisting depths when test-pad-pattern.
  --iterator-interchange=<long> - Specifies the iterator interchange.
  --pack-paddings=<long> - Operand packing flags when test-pad-pattern.
  --pad - Pad the operands.
  --pad-inputs-only - Only pad input operands when test-pad-pattern
  --promote - Promote the tile into a small aligned memory buffer.
  --promote-full-tile-pad - Pad the small aligned memory buffer to the tile sizes.
  --register-promote - Promote the register tile into a small aligned memory buffer.
  --register-promote-full-tile-pad - Pad the small aligned memory buffer to the tile sizes.
  --register-tile-sizes=<long> - Specifies the size of the register tile that will be used to vectorize
  --run-enable-pass - Run the enable pass between transformations
  --split-transfers=<string> - Split vector transfers between slow (masked) and fast (unmasked) variants. Possible options are:
    none: keep unsplit vector.transfer and pay the full price
    linalg-copy: use linalg.fill + linalg.copy for the slow path
    vector-transfers: use extra small unmasked vector.transfer for the slow path
  --tile-interchange=<long> - Specifies the tile interchange.
  --tile-sizes=<long> - Specifies the tile sizes.
  --unroll-vector-transfers - Enable full unrolling of vector.transfer operations
  --vectorize - Rewrite the linalg op as a vector operation.
  --vectorize-contraction-to=<string> - The type of vector op to use for linalg contractions
--test-linalg-control-fusion-by-expansion - Test controlling of fusion of elementwise ops with reshape by expansion
--test-linalg-distribution - Test Linalg distribution.
--test-linalg-elementwise-fusion-patterns - Test Linalg element wise operation fusion patterns
--test-linalg-fusion-transform-patterns - Test Linalg fusion transformation patterns by applying them greedily.
--test-linalg-greedy-fusion - Test Linalg fusion by applying a greedy test transformation.
--test-linalg-hoisting - Test Linalg hoisting functions.
  --test-hoist-redundant-transfers - Test hoisting transfer_read/transfer_write pairs
--test-linalg-push-reshape - Test Linalg reshape push patterns
--test-linalg-tensor-fusion-transform-patterns - Test Linalg on tensor fusion transformation patterns by applying them greedily.
--test-linalg-tile-and-fuse - Test Linalg tiling and fusion of a sequence of Linalg operations.
  --tile-sizes=<long> - Tile sizes to use for ops
--test-linalg-tiled-loop-fusion-transform-patterns - Test Linalg on tensor fusion transformation patterns by applying them greedily.
--test-linalg-transform-patterns - Test Linalg transformation patterns by applying them greedily.
  --loop-type=<string> - Specify the type of loops to generate: for, parallel or tiled_loop
  --peeled-loops=<long> - Loops to be peeled when test-tile-pattern
  --skip-partial - Skip loops inside partial iterations during peeling
  --test-generalize-pad-tensor - Test transform pad tensor by copying with generic ops
  --test-linalg-promotion-options - Test promotion options
  --test-linalg-to-vector-patterns - Test a set of patterns that rewrite a linalg contraction in vector.contract form
  --test-matmul-to-vector-patterns-tile-1d - Test a fused pass that applies patterns from matmul to vectors via 1-d tiling
  --test-matmul-to-vector-patterns-tile-2d - Test a fused pass that applies patterns from matmul to vectors via 2-d tiling
  --test-patterns - Test a mixed set of patterns
  --test-swap-subtensor-padtensor - Test rewrite of subtensor(pad_tensor) into pad_tensor(subtensor)
  --test-tile-and-distribute-options - Test tile and distribute options
  --test-tile-pattern - Test tile pattern
  --test-tile-scalarize-dynamic-dims - Test tiling of dynamic dims by 1
  --test-tiled-loop-peeling=<uint> - Test peeling of linalg.tiled_loop ops
  --test-transform-pad-tensor - Test transform pad tensor by copying with generic ops
  --test-vector-transfer-forwarding-patterns - Test a fused pass that forwards linalg.copy to vector.transfer
  --tile-sizes=<long> - Linalg tile sizes for test-tile-pattern
--test-loop-fusion - Tests loop fusion utility functions.
--test-loop-permutation - Tests affine loop permutation utility
  --permutation-map=<uint> - Specify the loop permutation
--test-loop-unrolling - Tests loop unrolling transformation
  --annotate - Annotate unrolled iterations.
  --loop-depth=<uint> - Loop depth.
  --unroll-factor=<ulong> - Loop unroll factor.
  --unroll-up-to-factor - Loop unroll up to factor.
--test-mapping-to-processing-elements - test mapping a single loop on a virtual processor grid
--test-match-reduction - Test the match reduction utility.
--test-matchers - Test C++ pattern matchers.
--test-math-algebraic-simplification - Test math algebraic simplification
--test-math-polynomial-approximation - Test math polynomial approximations
  --enable-avx2 - Enable approximations that emit AVX2 intrinsics via the X86Vector dialect
--test-memref-bound-check - Check memref access bounds in a Function
--test-memref-dependence-check - Checks dependences between all pairs of memref accesses.
--test-memref-stride-calculation - Test operation constant folding
--test-merge-blocks - Test Merging operation in ConversionPatternRewriter
--test-mlir-reducer - Tests MLIR Reduce tool by generating failures
--test-module-pass - Test a module pass in the pass manager
--test-opaque-loc - Changes all leaf locations to opaque locations
--test-operations-equality - Test operations equality.
--test-options-pass - Test options parsing capabilities
  --list=<int> - Example list option
  --string=<string> - Example string option
  --string-list=<string> - Example string list option
--test-pass-crash - Test a pass in the pass manager that always crashes
--test-pass-failure - Test a pass in the pass manager that always fails
--test-pattern-selective-replacement - Test selective replacement in the PatternRewriter
--test-patterns - Run test dialect patterns
--test-pdl-bytecode-pass - Test PDL ByteCode functionality
--test-print-callgraph - Print the contents of a constructed callgraph.
--test-print-defuse - Test various printing.
--test-print-dominance - Print the dominance information for multiple regions.
--test-print-liveness - Print the contents of a constructed liveness information.
--test-print-nesting - Test various printing.
--test-print-number-of-block-executions - Print the contents of a constructed number of executions analysis for all blocks.
--test-print-number-of-operation-executions - Print the contents of a constructed number of executions analysis for all operations.
--test-print-topological-sort - Print operations in topological order
--test-recursive-types - Test support for recursive types
--test-remapped-value - Test public remapped value mechanism in ConversionPatternRewriter
--test-return-type - Run return type functions
--test-scf-for-utils - test scf.for utils
--test-scf-if-utils - test scf.if utils
--test-scf-pipelining - test scf.forOp pipelining
--test-shape-function-report - Test pass to report associated shape functions
--test-side-effects - Test side effects interfaces
--test-spirv-entry-point-abi - Set the spv.entry_point_abi attribute on GPU kernel function within the module, intended for testing only
  --workgroup-size=<int> - Workgroup size to use for all gpu.func kernels in the module, specified with x-dimension first, y-dimension next and z-dimension last. Unspecified dimensions will be set to 1
--test-spirv-glsl-canonicalization - Tests SPIR-V canonicalization patterns for GLSL extension.
--test-spirv-module-combiner - Tests SPIR-V module combiner library
--test-spirv-op-availability - Test SPIR-V op availability
--test-spirv-target-env - Test SPIR-V target environment
--test-stats-pass - Test pass statistics
--test-symbol-rauw - Test replacement of symbol uses
--test-symbol-uses - Test detection of symbol uses
--test-trait-folder - Run trait folding
--test-type-interfaces - Test type interface support.
--test-vector-contraction-lowering - Test lowering patterns that lower contract ops in the vector dialect
  --vector-filter-outerproduct - Lower vector.contract to vector.outerproduct but not for vectors of size 4.
  --vector-lower-matrix-intrinsics - Lower vector.contract to llvm.intr.matrix.multiply
  --vector-outerproduct - Lower vector.contract to vector.outerproduct
--test-vector-distribute-patterns - Test lowering patterns to distribute vector ops in the vector dialect
  --distribution-multiplicity=<int> - Set the multiplicity used for distributing vector
--test-vector-multi-reduction-lowering-patterns - Test lowering patterns to lower vector.multi_reduction to other vector ops
  --use-outer-reductions - Move reductions to outer most dimensions
--test-vector-reduction-to-contract-patterns - Test patterns to convert multireduce op to contract and combine broadcast/transpose to contract
--test-vector-to-forloop - Test lowering patterns to break up a vector op into a for loop
  --distribution-multiplicity=<int> - Set the multiplicity used for distributing vector
--test-vector-to-vector-lowering - Test lowering patterns between ops in the vector dialect
  --unroll - Include unrolling
--test-vector-transfer-collapse-inner-most-dims - Test lowering patterns that reduce the rank of the vector transfer memory and vector operands.
--test-vector-transfer-full-partial-split - Test lowering patterns to split transfer ops via scf.if + linalg ops
  --use-linalg-copy - Split using an unmasked vector.transfer + linalg.fill + linalg.copy operations.
--test-vector-transfer-lowering-patterns - Test lowering patterns to lower transfer ops to other vector ops
--test-vector-transfer-unrolling-patterns - Test lowering patterns to unroll transfer ops in the vector dialect
--test-vector-transferop-opt - Test optimization transformations for transfer ops
--test-vector-transpose-lowering - Test lowering patterns that lower contract ops in the vector dialect
  --avx2 - Lower vector.transpose to avx2-specific patterns
  --eltwise - Lower 2-D vector.transpose to eltwise insert/extract
  --flat - Lower 2-D vector.transpose to vector.flat_transpose
  --shuffle - Lower 2-D vector.transpose to shape_cast + shuffle
--test-vector-unrolling-patterns - Test lowering patterns to unroll contract ops in the vector dialect
  --unroll-based-on-type - Set the unroll factor based on type of the operation
--tosa-decompose-transpose-conv - Decompose transpose convolutions into standard convolutions.
--tosa-infer-shapes - Propagate shapes across TOSA operations
--tosa-make-broadcastable - TOSA rank Reshape to enable Broadcasting
--tosa-test-quant-utils - TOSA Test: Exercise the APIs in QuantUtils.cpp.
--tosa-to-linalg - Lower TOSA to LinAlg on tensors
--tosa-to-scf - Lower TOSA to the SCF dialect
--tosa-to-standard - Lower TOSA to the Standard dialect
--view-op-graph - Print Graphviz visualization of an operation
  --max-label-len=<uint> - Limit attribute/type length to number of chars
  --print-attrs - Print attributes of operations
  --print-control-flow-edges - Print control flow edges
  --print-data-flow-edges - Print data flow edges
  --print-result-types - Print result types of operations
Pass Pipelines:
--test-dump-pipeline - Dumps the pipeline build so far for debugging purposes
--test-options-pass-pipeline - Parses options using pass pipeline registration
  --list=<int> - Example list option
  --string=<string> - Example string option
  --string-list=<string> - Example string list option
--test-pm-nested-pipeline - Test a nested pipeline in the pass manager
--test-textual-pm-nested-pipeline - Test a nested pipeline in the pass manager
--show-dialects - Print the list of registered dialects
--split-input-file - Split the input file into pieces and process each chunk independently
--test-legalize-mode=<value> - The legalization mode to use with the test driver
  =analysis - Perform an analysis conversion
  =full - Perform a full conversion
  =partial - Perform a partial conversion
--verify-diagnostics - Check that emitted diagnostics match expected-* lines on the corresponding line
--verify-each - Run the verifier after each transformation pass
Generic Options:
--help - Display available options (--help-hidden for more)
--help-list - Display list of available options (--help-list-hidden for more)
--version - Display the version of this program
affine-super-vectorizer-test options:
--backward-slicing - Enable testing backward static slicing and topological sort functionalities
--compose-maps - Enable testing the composition of AffineMap where each AffineMap in the composition is specified as the affine_map attribute in a constant op.
--forward-slicing - Enable testing forward static slicing and topological sort functionalities
--slicing - Enable testing static slicing and topological sort functionalities
--vector-shape-ratio=<int> - Specify the HW vector size for vectorization
--vectorize-affine-loop-nest - Enable testing for the 'vectorizeAffineLoopNest' utility by vectorizing the outermost loops found
test-loop-fusion options:
--test-loop-fusion-dependence-check - Enable testing of loop fusion dependence check
--test-loop-fusion-slice-computation - Enable testing of loop fusion slice computation
--test-loop-fusion-transformation