What Is Hermes
Hermes 3 (by Nous Research) is a fine-tune of Llama 3.1 trained specifically for function calling, structured outputs, and multi-turn agentic workflows. Unlike base Llama 3.1, Hermes follows a consistent tool-use format and rarely hallucinates tool names.
Key properties:
- Supports OpenAI-compatible tool schemas (JSON schema format)
- Deterministic <tool_call> tag output that is easy to parse
- Runs entirely offline via Ollama
- 8B parameter variant fits in 8 GB VRAM / 16 GB unified RAM
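When the `tools` parameter is passed to Ollama, the server returns tool calls already parsed. If you work with raw model text instead, the tag format can be parsed with a regular expression. The following is a minimal sketch, assuming each tag wraps a JSON object with `name` and `arguments` keys; `parse_tool_calls` is a hypothetical helper, not part of any library.

```python
import json
import re

# Matches one <tool_call>...</tool_call> block; DOTALL lets the JSON span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in raw model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of crashing the agent loop
        if isinstance(payload, dict) and "name" in payload:
            calls.append({
                "name": payload["name"],
                "arguments": payload.get("arguments", {}),
            })
    return calls


# Assumed example of raw output; real output may include extra text around the tag.
raw = (
    "Let me check.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Kyiv"}}\n</tool_call>'
)
print(parse_tool_calls(raw))
```

Malformed blocks are skipped rather than raised, so one bad call does not stop the loop.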
Prerequisites
- Ollama installed and running (ollama --version should report 0.3 or later)
- Python 3.11 or later
- 8 GB RAM minimum; 16 GB recommended for comfortable operation
Install Ollama if not present:
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download from https://ollama.com/download
Create a Python environment:
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
pip install requests # only stdlib + requests needed for the minimal agent
Pull the Model
ollama pull hermes3:8b
Download size: ~5.1 GB (Q4_K_M quantization). After download, verify:
ollama list | grep hermes
# hermes3:8b ... 5.1 GB
For a smaller footprint on machines with 8 GB RAM:
ollama pull hermes3:3b # ~2.1 GB
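The `ollama list` check can also be done from Python through the local server's `/api/tags` endpoint, which lists downloaded models. This is a minimal sketch using only the standard library; `has_model` and `model_is_pulled` are hypothetical helper names, and the server is assumed to be on the default port.

```python
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"  # lists locally available models


def has_model(tags_response: dict, wanted: str) -> bool:
    """True if `wanted` (e.g. "hermes3:8b") appears in an /api/tags response."""
    names = {m.get("name") for m in tags_response.get("models", [])}
    return wanted in names


def model_is_pulled(wanted: str = "hermes3:8b") -> bool:
    """Ask the local Ollama server whether the model has been downloaded."""
    with urllib.request.urlopen(TAGS_URL, timeout=5) as resp:
        return has_model(json.load(resp), wanted)
```

Calling `model_is_pulled()` before starting the agent gives a clearer failure than a 404 from the chat endpoint.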
Minimal Agent Script
The agent loop below handles tool dispatch manually. No LangChain required.
# agent.py
import json

import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "hermes3:8b"

# Tool schemas in the OpenAI-compatible format that Ollama accepts
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Returns current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]


def get_weather(city: str) -> str:
    # Replace with a real API call in production
    return json.dumps({"city": city, "temp_c": 18, "condition": "cloudy"})


# Maps tool names in the model's reply to local Python callables
TOOL_REGISTRY = {"get_weather": get_weather}


def chat(messages: list, tools: list) -> dict:
    """Send the conversation to Ollama and return the assistant message."""
    response = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "messages": messages,
        "tools": tools,
        "stream": False
    }, timeout=120)
    response.raise_for_status()
    return response.json()["message"]


def run_agent(user_input: str):
    messages = [
        {"role": "system", "content": "You are a helpful assistant with access to tools."},
        {"role": "user", "content": user_input}
    ]
    while True:
        reply = chat(messages, TOOLS)
        messages.append(reply)
        if reply.get("tool_calls"):
            # Run every requested tool and feed each result back to the model
            for call in reply["tool_calls"]:
                fn_name = call["function"]["name"]
                fn_args = call["function"]["arguments"]
                if isinstance(fn_args, str):
                    # Some backends send arguments as a JSON string
                    fn_args = json.loads(fn_args)
                result = TOOL_REGISTRY[fn_name](**fn_args)
                messages.append({
                    "role": "tool",
                    "content": result
                })
        else:
            # No tool calls left: this is the final answer
            print(reply["content"])
            break


if __name__ == "__main__":
    run_agent("What is the weather in Kyiv?")
Run it:
python agent.py
# Example output (exact wording varies between runs):
# The weather in Kyiv is 18°C and cloudy.
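The dispatch step in the loop raises if the model names a tool that is not registered or passes arguments the function does not accept. One way to make it more forgiving is to return the error to the model as the tool result, so it can correct itself on the next turn. The sketch below is one possible approach; `dispatch` is a hypothetical helper and the error format is an assumption.

```python
import json


def dispatch(registry: dict, name: str, arguments) -> str:
    """Run one tool call and always return a string the model can read."""
    if isinstance(arguments, str):
        try:
            arguments = json.loads(arguments)
        except json.JSONDecodeError:
            return json.dumps({"error": "arguments were not valid JSON"})
    fn = registry.get(name)
    if fn is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        return str(fn(**arguments))
    except Exception as exc:  # report the failure to the model instead of crashing
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})


# Stand-in registry for illustration only
registry = {"get_weather": lambda city: json.dumps({"city": city, "temp_c": 18})}
print(dispatch(registry, "get_weather", {"city": "Kyiv"}))
print(dispatch(registry, "get_time", {}))
```

In the agent loop, the tool result would become `dispatch(TOOL_REGISTRY, fn_name, fn_args)`. Replacing `while True` with a bounded `for` loop is a reasonable companion change, since a model that keeps requesting tools would otherwise never terminate.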
Tool Registration Pattern
For more than a handful of tools, use a decorator to auto-register:
from typing import get_type_hints

TOOLS = []
TOOL_REGISTRY = {}

# Python annotation -> JSON schema type; anything else falls back to "object"
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool(fn):
    """Register a function as an agent tool."""
    hints = get_type_hints(fn)
    properties = {}
    for name, hint in hints.items():
        if name == "return":
            continue
        properties[name] = {"type": JSON_TYPES.get(hint, "object")}
    TOOLS.append({
        "type": "function",
        "function": {
            "name": fn.__name__,
            # The docstring doubles as the description shown to the model
            "description": (fn.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": properties,
                # Every annotated parameter is treated as required
                "required": list(properties.keys())
            }
        }
    })
    TOOL_REGISTRY[fn.__name__] = fn
    return fn


@tool
def search_web(query: str) -> str:
    """Search the web for current information."""
    return f"Results for: {query}"


@tool
def read_file(path: str) -> str:
    """Read content of a local file."""
    with open(path) as f:
        return f.read()
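The decorator marks every parameter as required, which is inaccurate for tools that have default values. A possible refinement is to read the signature with `inspect` and list only parameters without defaults. This is a sketch of that idea, not the article's decorator; `parameters_schema` and the `max_results` parameter are hypothetical.

```python
import inspect

JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def parameters_schema(fn) -> dict:
    """Build a JSON-schema "parameters" object from a function signature."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unknown types fall back to "string" in this sketch
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default, so the model must supply it
    return {"type": "object", "properties": properties, "required": required}


def search_web(query: str, max_results: int = 5) -> str:
    """Hypothetical tool used only to illustrate the schema."""
    return f"Results for: {query} (top {max_results})"


print(parameters_schema(search_web))
```

Here only `query` ends up in `required`, so the model may omit `max_results` and the Python default applies.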
Optional: LangGraph Orchestration
For complex multi-step workflows with conditional branching, LangGraph adds minimal overhead:
pip install langgraph langchain-ollama
from langchain_core.tools import tool as lc_tool
from langchain_ollama import ChatOllama
from langgraph.prebuilt import create_react_agent

llm = ChatOllama(model="hermes3:8b")


# Define tools as LangChain Tool objects
@lc_tool
def get_weather(city: str) -> str:
    """Returns weather for a city."""
    return f"18°C, cloudy in {city}"


agent = create_react_agent(llm, tools=[get_weather])
result = agent.invoke({"messages": [("user", "Weather in Berlin?")]})
print(result["messages"][-1].content)
Optional: ChromaDB Memory Backend
Add persistent conversation memory in one install:
pip install chromadb
import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient(path=".agent_memory")
ef = embedding_functions.DefaultEmbeddingFunction()
collection = client.get_or_create_collection("conversations", embedding_function=ef)


def remember(text: str, doc_id: str):
    collection.upsert(documents=[text], ids=[doc_id])


def recall(query: str, n: int = 3) -> list[str]:
    results = collection.query(query_texts=[query], n_results=n)
    return results["documents"][0]
Pass recalled context into the system message before each agent run.
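One way to do that is to append the recalled snippets to the base system prompt as a short bullet list. The sketch below shows the string handling only; `build_system_message` is a hypothetical helper, the wording of the heading is an assumption, and the memory list is hard-coded where the agent would call `recall(user_input)`.

```python
def build_system_message(base_prompt: str, memories: list[str]) -> dict:
    """Return a system message that appends recalled snippets to the base prompt."""
    if not memories:
        return {"role": "system", "content": base_prompt}
    bullet_list = "\n".join(f"- {m}" for m in memories)
    content = f"{base_prompt}\n\nRelevant notes from earlier conversations:\n{bullet_list}"
    return {"role": "system", "content": content}


# In the agent this list would come from recall(user_input); it is hard-coded here.
memories = ["User lives in Kyiv.", "User prefers Celsius."]
message = build_system_message("You are a helpful assistant with access to tools.", memories)
print(message["content"])
```

Keeping the number of recalled snippets small helps the prompt stay well inside the model's context window.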
Cost
| Option | Cost |
|---|---|
| hermes3:8b via Ollama | $0 (local) |
| hermes3:8b via Groq | Free tier: 14,400 req/day |
| hermes3:70b via Groq | Free tier with rate limits |
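Groq exposes an OpenAI-compatible endpoint. To use it, replace OLLAMA_URL with https://api.groq.com/openai/v1/chat/completions and add an Authorization: Bearer <key> header. The minimal agent reads Ollama's native response shape (response.json()["message"]), so with an OpenAI-compatible endpoint the reply must be read from choices[0].message instead.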
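The two differences from the local setup, the auth header and the response shape, can be isolated in small helpers. This is a sketch under assumptions: `build_request` and `extract_message` are hypothetical names, and the model identifier is a placeholder to be replaced with whatever the provider lists.

```python
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"


def build_request(messages: list, tools: list, model: str, api_key: str):
    """Return (headers, payload) for an OpenAI-compatible chat completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"model": model, "messages": messages, "tools": tools, "stream": False}
    return headers, payload


def extract_message(response_json: dict) -> dict:
    """OpenAI-style responses nest the reply under choices[0].message."""
    return response_json["choices"][0]["message"]


# Usage with requests (not executed here; "MODEL_ID" is a placeholder):
#   headers, payload = build_request(messages, TOOLS, "MODEL_ID", api_key)
#   response = requests.post(GROQ_URL, headers=headers, json=payload, timeout=60)
#   response.raise_for_status()
#   reply = extract_message(response.json())
```

OpenAI-style APIs return tool-call arguments as a JSON string and generally expect each tool message to include the `tool_call_id` of the call it answers, so the tool message built in the agent loop would need that extra field.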
See Also
- AI Agents tools directory — filter by AI Agents category
- Text model tier list — compare Hermes against GPT-4o, Claude, and others