models qwen3 embedding 0.6b cuda gpu - Azure/azureml-assets GitHub Wiki

qwen3-embedding-0.6b-cuda-gpu

Overview

Qwen3 Embedding 0.6B CUDA GPU

This is the NVIDIA CUDA GPU-optimized variant of qwen3-embedding-0.6b, a text embedding model from the Qwen3 family. The base model is developed by Alibaba Cloud; this variant is optimized by Microsoft.

Model Details

  • Model Type: Text Embedding (ONNX)
  • Parameters: 0.6 billion
  • Context Length: 32K tokens
  • Embedding Dimension: Up to 1024
  • Quantization: KLD Gradient quantization
  • Target Device: GPU (NVIDIA CUDA)
  • Execution Provider: CUDAExecutionProvider
  • Supported Languages: 100+
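The "Up to 1024" embedding dimension suggests that shorter vectors can be used when storage or latency matters. A minimal sketch of one common way to do this, assuming the model's embeddings tolerate Matryoshka-style truncation (keep the leading components, then renormalize): the helper name `truncate_embedding` and the placeholder vector are hypothetical, and whether this quantized ONNX variant keeps its retrieval quality at reduced dimensions is an assumption to verify on your own data.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero vector cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]

# Placeholder 1024-dimensional vector, NOT real model output.
full = [0.5] * 1024
small = truncate_embedding(full, 256)
print(len(small))
```

Renormalizing after truncation keeps cosine similarity and dot product interchangeable for the shortened vectors.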

Intended Use

This model is optimized for local execution with Foundry Local on devices that have an NVIDIA CUDA-capable GPU.

Capabilities

  • Text retrieval and semantic search
  • Code retrieval
  • Text classification and clustering
  • Bitext mining
  • Multilingual and cross-lingual retrieval
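Retrieval and semantic search with an embedding model usually come down to comparing a query vector against document vectors. The sketch below shows that ranking step with cosine similarity. It assumes the embeddings have already been produced by the model; the function names and the toy 3-dimensional vectors are illustrative only and stand in for real 1024-dimensional output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted by descending similarity."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for model embeddings (hypothetical values).
query = [1.0, 0.0, 0.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # partially aligned
]
ranking = rank_documents(query, docs)
print([index for index, _ in ranking])
```

The same ranking routine applies to code retrieval and cross-lingual retrieval; only the texts that were embedded differ.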

License

This model is licensed under Apache 2.0. See license details.

Source

Version: 1

Tags

  • capabilities: embedding
  • supportsReasoning: false
  • reasoningStart: (empty)
  • reasoningEnd: (empty)
  • contextLength: 32768
  • minFLVersion: 1.1.0
  • disable-maap: true
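The `minFLVersion` and `contextLength` tags can be read as client-side constraints. The following is a rough sketch of how a caller might check them, assuming `minFLVersion` denotes the minimum Foundry Local version and that versions are plain dotted integers. The helper names are hypothetical and are not part of any Foundry Local API.

```python
MIN_FL_VERSION = "1.1.0"   # from the minFLVersion tag
CONTEXT_LENGTH = 32768     # from the contextLength tag

def parse_version(text):
    """Turn a dotted version string such as '1.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def meets_minimum(installed, minimum=MIN_FL_VERSION):
    """True if the installed version is at least the minimum (assumed semantics)."""
    return parse_version(installed) >= parse_version(minimum)

def fits_context(token_count, limit=CONTEXT_LENGTH):
    """True if an input of `token_count` tokens fits in the context window."""
    return 0 <= token_count <= limit

print(meets_minimum("1.0.9"), meets_minimum("1.10.0"), fits_context(40000))
```

Comparing integer tuples instead of raw strings avoids the usual pitfall where "1.10.0" sorts before "1.9.0".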

View in Studio: https://ml.azure.com/registries/azureml/models/qwen3-embedding-0.6b-cuda-gpu/version/1
