Get up and running with Llama 3, Mistral, Gemma, and other large language models.
jmorganca 8c2c9d4c89 llama/compat: extend gemma3 handler to cover 1B and 270M blobs
Previous handler only fired on vision-capable gemma3 (4B/12B/27B) because
its detection looked for `gemma3.mm.tokens_per_image` or embedded v.*/mm.*
tensors. The 1B blob has neither — but its old Ollama converter emitted:

  - gemma3.rope.global.freq_base  (upstream uses gemma3.rope.freq_base)
  - gemma3.rope.local.freq_base   (upstream uses gemma3.rope.freq_base_swa)
  - tokenizer.ggml.add_{padding,unknown}_token

so llama.cpp would fall back to the default rope_freq_base=10000 and produce
visibly worse output.

Also inject rope.scaling.factor=8.0 / type=linear on 4B/12B/27B — those
variants ship with that scaling in their HF config to extend the native
~16k trained context to 131072. Without this KV, llama.cpp uses factor=1.0
and the positional embeddings are subtly off everywhere.

Detection now flips on any Ollama-specific marker. All three variants
verified end-to-end via `ollama run gemma3:{latest,1b,270m}`.
2026-04-20 09:29:34 -07:00
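
The key handling described in the commit message above can be pictured with a small Python sketch. This is not the code in llama/compat, which may work quite differently. The function names, the `vision_capable` flag (a stand-in for the 4B/12B/27B check), and the `gemma3.` prefix on the scaling keys are assumptions made for illustration. Only the key names and values quoted in the message come from the source.

```python
# Illustrative sketch only; the real handler lives in llama/compat and may differ.

# Old Ollama converter key -> key name the commit message attributes to upstream.
RENAMED_KEYS = {
    "gemma3.rope.global.freq_base": "gemma3.rope.freq_base",
    "gemma3.rope.local.freq_base": "gemma3.rope.freq_base_swa",
}

# Keys the commit message lists as emitted by the old Ollama converter.
OLLAMA_MARKERS = set(RENAMED_KEYS) | {
    "tokenizer.ggml.add_padding_token",
    "tokenizer.ggml.add_unknown_token",
}


def is_ollama_format(metadata):
    """Return True if any Ollama-specific marker key is present."""
    return any(key in metadata for key in OLLAMA_MARKERS)


def translate_gemma3_metadata(metadata, vision_capable):
    """Return a copy of metadata with the rope keys renamed.

    vision_capable is a hypothetical flag standing in for the 4B/12B/27B check.
    """
    if not is_ollama_format(metadata):
        return dict(metadata)
    out = {RENAMED_KEYS.get(key, key): value for key, value in metadata.items()}
    if vision_capable:
        # Assumed full key names; the message only says rope.scaling.factor / type.
        out.setdefault("gemma3.rope.scaling.factor", 8.0)
        out.setdefault("gemma3.rope.scaling.type", "linear")
    return out


if __name__ == "__main__":
    # Placeholder values, not read from a real blob.
    old = {"gemma3.rope.global.freq_base": 1.0, "gemma3.rope.local.freq_base": 2.0}
    print(translate_gemma3_metadata(old, vision_capable=False))
    # Linear scaling by 8 maps a 131072-token window onto 16384 native positions.
    print(131072 / 8.0)
```

The division at the end is the arithmetic behind the "~16k trained context" figure: 131072 / 8 = 16384.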
.github runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
anthropic anthropic: fix empty inputs in content blocks (#15105) 2026-03-27 15:41:27 -07:00
api Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
app runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
auth auth: fix problems with the ollama keypairs (#12373) 2025-09-22 23:20:20 -07:00
cmd runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
convert runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
discover runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
docs docs: update hermes (#15655) 2026-04-17 14:20:59 -07:00
envconfig runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
format chore(all): replace instances of interface with any (#10067) 2025-04-02 09:44:27 -07:00
fs gemma4: enable flash attention (#15378) 2026-04-07 08:12:36 -07:00
harmony Parser for Cogito v2 (#13145) 2025-11-19 17:21:07 -08:00
integration runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
internal Reapply "don't require pulling stubs for cloud models" again (#14608) 2026-03-06 14:27:47 -08:00
kvcache model: support for qwen3.5 architecture (#14378) 2026-02-24 20:08:05 -08:00
llama llama/compat: extend gemma3 handler to cover 1B and 270M blobs 2026-04-20 09:29:34 -07:00
llm llm,server: route Ollama-format gemma3 blobs through llama/compat 2026-04-20 09:29:34 -07:00
logutil logutil: fix source field (#12279) 2025-09-16 16:18:07 -07:00
manifest create: avoid gc race with create (#15628) 2026-04-16 13:29:16 -07:00
middleware Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
ml runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
model runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
openai fix: improve error message for unknown input item type in responses API (#15424) 2026-04-08 17:41:12 -07:00
parser MLX: add header vendoring and remove go build tag (#14642) 2026-03-09 17:24:45 -07:00
progress Add z-image image generation prototype (#13659) 2026-01-09 21:09:46 -08:00
readline Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
runner runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
scripts runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
server llm,server: route Ollama-format gemma3 blobs through llama/compat 2026-04-20 09:29:34 -07:00
template runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
thinking thinking: fix double emit when no opening tag 2025-08-21 21:03:12 -07:00
tokenizer tokenizer: add byte fallback for SentencePiece BPE encoding (#15232) 2026-04-02 13:04:45 -07:00
tools runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
types Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
version add version 2023-08-22 09:40:58 -07:00
x mlx: apply repeat penalties in sampler (#15631) 2026-04-18 07:49:38 -07:00
.dockerignore next build (#8539) 2025-01-29 15:03:38 -08:00
.gitattributes .gitattributes: add app/webview to linguist-vendored (#13274) 2025-11-29 23:46:10 -05:00
.gitignore create: Clean up experimental paths, fix create from existing safetensor model (#14679) 2026-04-07 08:12:57 -07:00
.golangci.yaml ci: restore previous linter rules (#13322) 2025-12-03 18:55:02 -08:00
CMakeLists.txt runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
CMakePresets.json runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
CONTRIBUTING.md docs: fix typos in repository documentation (#10683) 2025-11-15 20:22:29 -08:00
Dockerfile runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
go.mod runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
go.sum runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
LICENSE proto -> ollama 2023-06-26 15:57:13 -04:00
LLAMA_CPP_VERSION runner: Remove CGO engines, use llama-server exclusively for GGML models 2026-04-20 08:44:02 -07:00
main.go lint 2024-08-01 17:06:06 -07:00
MLX_C_VERSION mlx: update as of 3/23 (#14789) 2026-03-23 11:28:44 -07:00
MLX_VERSION mlx: update as of 3/23 (#14789) 2026-03-23 11:28:44 -07:00
README.md cmd/launch: add Copilot CLI integration (#15583) 2026-04-15 17:22:53 -07:00
SECURITY.md docs: fix typos in repository documentation (#10683) 2025-11-15 20:22:29 -08:00

Ollama

Start building with open models.

Download

macOS

curl -fsSL https://ollama.com/install.sh | sh

or download manually

Windows

irm https://ollama.com/install.ps1 | iex

or download manually

Linux

curl -fsSL https://ollama.com/install.sh | sh

Manual install instructions

Docker

The official Ollama Docker image ollama/ollama is available on Docker Hub.

Libraries

Community

Get started

ollama

You'll be prompted to run a model or connect Ollama to your existing agents or applications such as Claude Code, OpenClaw, OpenCode, Codex, Copilot, and more.

Coding

To launch a specific integration:

ollama launch claude

Supported integrations include Claude Code, Codex, Copilot CLI, Droid, and OpenCode.

AI assistant

Use OpenClaw to turn Ollama into a personal AI assistant across WhatsApp, Telegram, Slack, Discord, and more:

ollama launch openclaw

Chat with a model

Run and chat with Gemma 3:

ollama run gemma3

See ollama.com/library for the full list.

See the quickstart guide for more details.

REST API

Ollama has a REST API for running and managing models.

curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'

See the API documentation for all endpoints.
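
The same request can be made from Python using only the standard library. The sketch below mirrors the curl example above. The helper names `build_chat_request` and `chat_once` are made up for illustration, and reading the reply from `message.content` assumes the non-streaming response has the same shape as in the client library examples in this README.

```python
import json
import urllib.request

# Endpoint taken from the curl example; assumes a local Ollama server.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(model, prompt, url=OLLAMA_CHAT_URL):
    """Build a non-streaming POST request mirroring the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_once(model, prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With a server running locally, `chat_once("gemma3", "Why is the sky blue?")` should return the reply as a string. Building the request in a separate function makes the payload easy to inspect without sending anything.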

Python

pip install ollama

from ollama import chat

response = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response.message.content)
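
For a multi-turn conversation, one common pattern is to keep the message list yourself and send it in full on each call. The sketch below assumes that approach. `add_turn` and `ask` are hypothetical helpers, not part of the `ollama` package; only `chat` and `response.message.content` come from the example above.

```python
def add_turn(history, role, content):
    """Return a new message list with one more turn appended."""
    return history + [{"role": role, "content": content}]


def ask(history, question, model="gemma3"):
    """Send the history plus a new question; return (reply, new_history)."""
    # Imported lazily so add_turn can be used without the package installed.
    from ollama import chat

    history = add_turn(history, "user", question)
    response = chat(model=model, messages=history)
    reply = response.message.content
    return reply, add_turn(history, "assistant", reply)
```

A follow-up question would then reuse the returned history, for example `reply, history = ask(history, "And why are sunsets red?")`.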

JavaScript

npm i ollama

import ollama from "ollama";

const response = await ollama.chat({
  model: "gemma3",
  messages: [{ role: "user", content: "Why is the sky blue?" }],
});
console.log(response.message.content);

Supported backends

  • llama.cpp project founded by Georgi Gerganov.

Documentation

Community Integrations

Want to add your project? Open a pull request.

Chat Interfaces

Web

Desktop

  • Dify.AI - LLM app development platform
  • AnythingLLM - All-in-one AI app for Mac, Windows, and Linux
  • Maid - Cross-platform mobile and desktop client
  • Witsy - AI desktop app for Mac, Windows, and Linux
  • Cherry Studio - Multi-provider desktop client
  • Ollama App - Multi-platform client for desktop and mobile
  • PyGPT - AI desktop assistant for Linux, Windows, and Mac
  • Alpaca - GTK4 client for Linux and macOS
  • SwiftChat - Cross-platform including iOS, Android, and Apple Vision Pro
  • Enchanted - Native macOS and iOS client
  • RWKV-Runner - Multi-model desktop runner
  • Ollama Grid Search - Evaluate and compare models
  • macai - macOS client for Ollama and ChatGPT
  • AI Studio - Multi-provider desktop IDE
  • Reins - Parameter tuning and reasoning model support
  • ConfiChat - Privacy-focused with optional encryption
  • LLocal.in - Electron desktop client
  • MindMac - AI chat client for Mac
  • Msty - Multi-model desktop client
  • BoltAI for Mac - AI chat client for Mac
  • IntelliBar - AI-powered assistant for macOS
  • Kerlig AI - AI writing assistant for macOS
  • Hillnote - Markdown-first AI workspace
  • Perfect Memory AI - Productivity AI personalized by screen and meeting history

Mobile

SwiftChat, Enchanted, Maid, Ollama App, Reins, and ConfiChat listed above also support mobile platforms.

Code Editors & Development

Libraries & SDKs

Frameworks & Agents

RAG & Knowledge Bases

  • RAGFlow - RAG engine based on deep document understanding
  • R2R - Open-source RAG engine
  • MaxKB - Ready-to-use RAG chatbot
  • Minima - On-premises or fully local RAG
  • Chipper - AI interface with Haystack RAG
  • ARGO - RAG and deep research on Mac/Windows/Linux
  • Archyve - RAG-enabling document library
  • Casibase - AI knowledge base with RAG and SSO
  • BrainSoup - Native client with RAG and multi-agent automation

Bots & Messaging

Terminal & CLI

Productivity & Apps

Observability & Monitoring

  • Opik - Debug, evaluate, and monitor LLM applications
  • OpenLIT - OpenTelemetry-native monitoring for Ollama and GPUs
  • Lunary - LLM observability with analytics and PII masking
  • Langfuse - Open source LLM observability
  • HoneyHive - AI observability and evaluation for agents
  • MLflow Tracing - Open source LLM observability

Database & Embeddings

Infrastructure & Deployment

Cloud

Package Managers