23 Feb 2026

Configuring Local Qwen3 Embeddings for OpenClaw Memory

OpenClaw’s memory_search and memory_get tools enable semantic recall from MEMORY.md, memory/*.md, and session transcripts, so the agent can check prior work and decisions before answering. The defaults use cloud embeddings, but I’ve switched to a local Qwen3-Embedding-0.6B model (GGUF-quantized) for privacy, speed, and zero cost.

Why Qwen3-Embedding-0.6B?

  • Sweet Spot: 0.6B params, 20-100ms/chunk on CPU (10x faster than a 7B model).
  • Quality: Tops MTEB benchmarks for retrieval, beats E5-small.
  • GGUF: Q8_0 (~400MB, near-lossless) via llama.cpp.
  • Hybrid Mode: CPU + auto-GPU (CUDA/Metal/ROCm).

Model Specs

Setup (5 Mins)

  1. Download:

    mkdir -p ~/.openclaw/models/embeddings
    cd ~/.openclaw/models/embeddings
    huggingface-cli download Qwen/Qwen3-Embedding-0.6B-GGUF Qwen3-Embedding-0.6B-Q8_0.gguf --local-dir .
    
  2. Config (~/.openclaw/config.yaml or CLI):

    agents:
      defaults:
        embeddingModel: "hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf?provider=local&hybrid=true&cpu_threads=8"
    

    Params:

    Key              Desc
    provider=local   llama.cpp
    hybrid=true      GPU auto
    cpu_threads=8    Cores

    CLI: openclaw configure --section agents.defaults.embeddingModel '...'

  3. Restart: openclaw gateway restart

  4. Test:

    memory_search query="test prior decision"  # ~80ms top-5
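
To make the shape of the embeddingModel string from step 2 concrete, here is a small Python sketch that splits it into its parts with the standard library. It only illustrates how the string decomposes; how OpenClaw itself parses the value is not covered in this post, and the function name split_model_uri is made up for the example.

```python
from urllib.parse import urlsplit, parse_qs

MODEL_URI = (
    "hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"
    "?provider=local&hybrid=true&cpu_threads=8"
)

def split_model_uri(uri: str) -> dict:
    """Split the model string into source, repo, file, and option fields.

    Hypothetical helper: this mirrors the visible structure of the string,
    not OpenClaw's actual parser.
    """
    parts = urlsplit(uri)
    # Everything before the last "/" is the repo, the rest is the GGUF file.
    repo, _, filename = parts.path.rpartition("/")
    # parse_qs returns lists; keep the last value given for each key.
    options = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return {"source": parts.scheme, "repo": repo, "file": filename, "options": options}

info = split_model_uri(MODEL_URI)
print(info)
```

All option values come back as strings, so anything that consumes them would still need to convert cpu_threads to an integer and hybrid to a boolean.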
    

Benchmarks (i7-12700K, 32GB, No GPU)

Task                Time    RAM
Embed 512t chunk    45ms    350MB
Search 100 docs     80ms    <500MB
100 queries/min     ✅

GPU (RTX 3060): 15ms/embed.
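
As a rough sanity check on the 100 queries/min figure, the arithmetic below shows how much headroom an 80ms search leaves. It assumes queries run strictly one after another with no other overhead, which is a simplification; the function names are made up for the example.

```python
def max_queries_per_minute(latency_ms: float) -> float:
    """Upper bound on back-to-back queries per minute at a fixed latency."""
    return 60_000 / latency_ms

def busy_fraction(queries_per_minute: float, latency_ms: float) -> float:
    """Share of each minute spent searching at the given query rate."""
    return queries_per_minute * latency_ms / 60_000

SEARCH_MS = 80  # the "Search 100 docs" timing from the table above

ceiling = max_queries_per_minute(SEARCH_MS)
load = busy_fraction(100, SEARCH_MS)
print(f"ceiling: {ceiling:.0f} queries/min, load at 100/min: {load:.1%}")
```

Under those assumptions, 100 queries/min keeps a single worker busy for about 8 seconds out of every 60.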

How It Works

  1. Chunk: Files → 512t snippets (w/ lines).
  2. Embed Query: Qwen3 → vector.
  3. SimSearch: Cosine top-k (minScore filter).
  4. Cite: path#line for memory_get.
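
The four steps can be sketched end to end in plain Python. This is a toy under stated assumptions, not OpenClaw's implementation: tokens are approximated by whitespace-separated words, toy_embed is a bag-of-words stand-in for the Qwen3 model, and the file name and the top_k/min_score defaults are invented for the example.

```python
import math
import re
from collections import Counter

def chunk_lines(path, text, max_tokens=512):
    """Group consecutive lines into chunks of roughly max_tokens words,
    recording the 1-based line where each chunk starts as path#line."""
    chunks, current, count, start = [], [], 0, 1
    for number, line in enumerate(text.splitlines(), start=1):
        words = len(line.split())
        if current and count + words > max_tokens:
            chunks.append({"cite": f"{path}#{start}", "text": "\n".join(current)})
            current, count, start = [], 0, number
        current.append(line)
        count += words
    if current:
        chunks.append({"cite": f"{path}#{start}", "text": "\n".join(current)})
    return chunks

def toy_embed(text):
    """Stand-in for the real model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query, chunks, top_k=5, min_score=0.1):
    """Score every chunk, drop those below min_score, return the best top_k."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c["text"])), c["cite"]) for c in chunks]
    hits = [(score, cite) for score, cite in scored if score >= min_score]
    return sorted(hits, reverse=True)[:top_k]

notes = "Decision: use Postgres for storage.\nReason: team familiarity.\nTodo: write migration script."
chunks = chunk_lines("memory/2026-02-20.md", notes, max_tokens=6)  # hypothetical file
results = search("storage decision", chunks)
print(results)
```

The citation attached to each hit is the kind of path#line reference that step 4 hands to memory_get.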

On-demand (snippet cache).
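
One plausible reading of "on-demand with a snippet cache" is that a snippet is embedded only the first time its text is seen and reused afterwards. The sketch below shows that idea; the class name, the SHA-256 keying, and the fake_embed stand-in are all assumptions made for illustration.

```python
import hashlib

class SnippetCache:
    """Embed a snippet only the first time its exact text is seen."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # hypothetical: any callable text -> vector
        self._vectors = {}
        self.hits = 0
        self.misses = 0

    def get(self, snippet: str):
        # Key on a content hash so edited snippets are re-embedded.
        key = hashlib.sha256(snippet.encode("utf-8")).hexdigest()
        if key in self._vectors:
            self.hits += 1
        else:
            self.misses += 1
            self._vectors[key] = self._embed_fn(snippet)
        return self._vectors[key]

calls = []

def fake_embed(text):
    """Stand-in for the model call; records how often it is invoked."""
    calls.append(text)
    return [float(len(text))]

cache = SnippetCache(fake_embed)
cache.get("use Postgres")
cache.get("use Postgres")      # served from the cache, no second embed
cache.get("write migration")
print(cache.hits, cache.misses, len(calls))
```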

Tweaks & Troubleshooting

  • Faster: Q4_K_M.gguf.
  • GPU?: gpu=true.
  • OOM: context_length=2048.
  • Logs: journalctl -u openclaw-gateway | grep embedding.
  • Verify: session_status → embedding provider.

Alternatives

Model             Params    Speed    Use Case
Qwen3-0.6B-Q8     0.6B      45ms     General
NV-Embed-4B-Q5    4B        150ms    Code
all-minilm-L6     22M       10ms     Light

Pro Tips

  • Pre-cache: JSONL embeddings via cron.
  • Scale: memory/ subdirs.
  • Monitor: Query hit rates.
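
For the pre-cache and monitoring tips, a script along these lines could be run from cron. It is a sketch only: the JSONL record layout, the function names, and the definition of hit rate (share of queries returning at least one result) are assumptions, and the lambda stands in for the real embedding call.

```python
import io
import json

def write_embedding_cache(chunks, embed_fn, out):
    """Write one JSON object per line: citation, text, and vector."""
    count = 0
    for chunk in chunks:
        record = {"cite": chunk["cite"], "text": chunk["text"], "vector": embed_fn(chunk["text"])}
        out.write(json.dumps(record) + "\n")
        count += 1
    return count

def read_embedding_cache(source):
    """Load the JSONL cache back into a cite -> vector mapping."""
    cached = {}
    for line in source:
        if line.strip():
            record = json.loads(line)
            cached[record["cite"]] = record["vector"]
    return cached

def hit_rate(result_counts):
    """Fraction of queries that returned at least one result."""
    if not result_counts:
        return 0.0
    return sum(1 for n in result_counts if n > 0) / len(result_counts)

# An in-memory buffer stands in for a real .jsonl file on disk.
buffer = io.StringIO()
written = write_embedding_cache(
    [{"cite": "MEMORY.md#1", "text": "use Postgres"}, {"cite": "MEMORY.md#12", "text": "ship Friday"}],
    lambda text: [float(len(text))],  # stand-in for the model
    buffer,
)
buffer.seek(0)
cached = read_embedding_cache(buffer)
rate = hit_rate([3, 0, 5, 1])
print(written, cached, rate)
```

A falling hit rate would suggest either a minScore threshold that is too strict or memory files that no longer cover what is being asked.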

Local RAG + Claude = unbeatable. Questions?
