models qwen3 embedding 8b generic gpu - Azure/azureml-assets GitHub Wiki

qwen3-embedding-8b-generic-gpu

Overview

Qwen3 Embedding 8B (WebGPU GPU)

This is the GPU (WebGPU)-optimized variant of qwen3-embedding-8b, a text embedding model from the Qwen3 family developed by Alibaba Cloud and optimized by Microsoft.

Model Details

  • Model Type: Text Embedding (ONNX)
  • Parameters: 8 billion
  • Context Length: 32K tokens
  • Embedding Dimension: Up to 4096
  • Quantization: KLD Gradient quantization
  • Target Device: GPU (WebGPU)
  • Execution Provider: WebGPUExecutionProvider
  • Supported Languages: 100+
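The "Up to 4096" embedding dimension suggests that shorter vectors can be used when storage or latency matters. The sketch below shows one common way to do that: keep the leading components and rescale to unit length. Whether this particular ONNX variant was exported to support truncated outputs is an assumption, and `truncate_and_normalize` is a hypothetical helper, not part of any Foundry Local API.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components of `vec` and rescale to unit L2 norm.

    Hypothetical helper for illustration; assumes the model's leading
    dimensions remain meaningful after truncation.
    """
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # A zero vector cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Toy 3-dimensional vector standing in for a real 4096-dimensional embedding.
short = truncate_and_normalize([3.0, 4.0, 12.0], 2)
```

Renormalizing after truncation keeps cosine similarity and dot product interchangeable for downstream search.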

Intended Use

This model is optimized for local execution on devices with GPU (WebGPU) hardware acceleration using Foundry Local.

Capabilities

  • Text retrieval and semantic search
  • Code retrieval
  • Text classification and clustering
  • Bitext mining
  • Multilingual and cross-lingual retrieval
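Retrieval and semantic search with an embedding model usually come down to comparing a query vector against document vectors. The following is a minimal sketch of cosine-similarity ranking. The vectors are toy values, not real model output, and `cosine_similarity` and `rank_documents` are illustrative names rather than functions provided by the model or by Foundry Local.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity is undefined for a zero vector; treat it as no match.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (document index, score) pairs, best match first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional vectors standing in for real embeddings.
query = [1.0, 0.0]
documents = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ranking = rank_documents(query, documents)
```

In practice the query and document vectors would come from the embedding model, and a vector index would replace the linear scan for large collections.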

License

This model is licensed under Apache 2.0. See the license terms at https://huggingface.co/Qwen/Qwen3-Embedding-8B/blob/main/LICENSE.

Source

Version: 1

Tags

  • foundryLocal: test
  • license: apache-2.0
  • licenseDescription: This model is provided under the License Terms available at https://huggingface.co/Qwen/Qwen3-Embedding-8B/blob/main/LICENSE
  • author: Microsoft
  • inputModalities: text
  • outputModalities: text
  • task: embeddings
  • maxOutputTokens: 1
  • alias: qwen3-embedding-8b
  • directoryPath: v1
  • promptTemplate: (empty)
  • supportsToolCalling: false
  • capabilities: embedding
  • supportsReasoning: false
  • reasoningStart: (empty)
  • reasoningEnd: (empty)
  • contextLength: 32768
  • minFLVersion: 0.0.0
  • disable-maap: true

View in Studio: https://ml.azure.com/registries/azureml/models/qwen3-embedding-8b-generic-gpu/version/1

License: apache-2.0
