
Running a Hermes Agent Locally

Set up a function-calling Hermes agent with tool registration and optional memory backend in under 20 minutes.

Intermediate | 18 min read | Updated 5/30/2026

What Is Hermes

Hermes 3 (by Nous Research) is a family of fine-tunes of Meta's Llama 3.1 models (the 3B variant builds on Llama 3.2), trained specifically for function calling, structured outputs, and multi-turn agentic workflows. Unlike the base Llama models, Hermes emits tool calls in one consistent format and, in practice, rarely invents tool names that were not registered.

Key properties:

Supports OpenAI-compatible tool schemas (JSON schema format)
Consistent <tool_call> tag output that is easy to parse
Runs entirely offline via Ollama
8B parameter variant fits in 8 GB VRAM / 16 GB unified RAM
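
When the raw model text is consumed directly (rather than through Ollama's parsed tool_calls field), the <tool_call> blocks can be pulled out with a regular expression. The following is a minimal sketch, assuming each block wraps a single JSON object with name and arguments keys; parse_tool_calls is a hypothetical helper name, not part of any library.

```python
import json
import re

# Matches one <tool_call>...</tool_call> block; DOTALL lets the JSON span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in raw model output."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of crashing the loop
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls

# Hand-written sample in the assumed format (not captured from a real run)
sample = (
    "Let me check.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Kyiv"}}\n</tool_call>'
)
print(parse_tool_calls(sample))
```

Malformed blocks are dropped silently here; a stricter agent might instead feed the parse error back to the model so it can retry.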

Prerequisites

Ollama installed and running (ollama --version should report 0.3.0 or later, which is required for tool calling)
Python 3.11 or later
8 GB RAM minimum; 16 GB recommended for comfortable operation
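
The first two prerequisites can be checked from Python before going further. This is only a rough sketch: preflight is a hypothetical helper, and it confirms that an ollama binary is on the PATH without checking its version or whether the server is actually running.

```python
import shutil
import sys

def preflight() -> dict[str, bool]:
    """Report which prerequisites look satisfied on this machine."""
    return {
        "python_3_11_plus": sys.version_info >= (3, 11),
        "ollama_on_path": shutil.which("ollama") is not None,
    }

for check, ok in preflight().items():
    print(f"{check}: {'ok' if ok else 'missing'}")
```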

Install Ollama if not present:

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download from https://ollama.com/download

Create a Python environment:

python -m venv .venv
source .venv/bin/activate     # Linux/macOS
# .venv\Scripts\activate      # Windows

pip install requests          # only stdlib + requests needed for the minimal agent

Pull the Model

ollama pull hermes3:8b

Download size: ~5.1 GB (Q4_K_M quantization). After download, verify:

ollama list | grep hermes
# hermes3:8b    ...    5.1 GB
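
The same check can be done programmatically through Ollama's /api/tags endpoint, which lists locally available models. A small sketch, assuming the server listens on the default port 11434; has_model and fetch_tags are hypothetical helper names, and the example payload is hand-written to mirror the response shape rather than copied from a real server.

```python
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"  # assumes Ollama's default port

def has_model(tags_payload: dict, name: str) -> bool:
    """True if `name` appears in a parsed /api/tags response."""
    return any(m.get("name") == name for m in tags_payload.get("models", []))

def fetch_tags(url: str = TAGS_URL) -> dict:
    """Fetch the model list from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)

# Offline illustration with a payload of the same shape:
example = {"models": [{"name": "hermes3:8b"}, {"name": "llama3.2:3b"}]}
print(has_model(example, "hermes3:8b"))

# Against a live server: has_model(fetch_tags(), "hermes3:8b")
```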

For a smaller footprint on machines with 8 GB RAM:

ollama pull hermes3:3b   # ~2.1 GB

Minimal Agent Script

The agent loop below handles tool dispatch manually. No LangChain required.

# agent.py
import json

import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "hermes3:8b"

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Returns current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

def get_weather(city: str) -> str:
    # Replace with a real API call in production
    return json.dumps({"city": city, "temp_c": 18, "condition": "cloudy"})

TOOL_REGISTRY = {"get_weather": get_weather}

def chat(messages: list, tools: list) -> dict:
    response = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "messages": messages,
        "tools": tools,
        "stream": False
    }, timeout=120)  # fail instead of hanging forever if the server stalls
    response.raise_for_status()
    return response.json()["message"]

def run_agent(user_input: str):
    messages = [
        {"role": "system", "content": "You are a helpful assistant with access to tools."},
        {"role": "user",   "content": user_input}
    ]

    while True:
        reply = chat(messages, TOOLS)
        messages.append(reply)

        if reply.get("tool_calls"):
            for call in reply["tool_calls"]:
                fn_name = call["function"]["name"]
                fn_args = call["function"]["arguments"]
                if isinstance(fn_args, str):
                    fn_args = json.loads(fn_args)

                result = TOOL_REGISTRY[fn_name](**fn_args)
                messages.append({
                    "role": "tool",
                    "content": result
                })
        else:
            print(reply["content"])
            break

if __name__ == "__main__":
    run_agent("What is the weather in Kyiv?")

Run it:

python agent.py
# Example output (exact wording varies between runs):
# The weather in Kyiv is 18°C and cloudy.
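
The loop in agent.py looks tools up with TOOL_REGISTRY[fn_name], so an unregistered tool name or bad arguments would raise and end the run. One possible hardening, sketched here under the assumption that tool calls have the same shape as in agent.py, is to route every call through a helper that always returns a string the model can read; dispatch is a hypothetical name.

```python
import json

def dispatch(call: dict, registry: dict) -> str:
    """Run one tool call and always return a string result."""
    fn_name = call["function"]["name"]
    fn_args = call["function"].get("arguments") or {}
    if isinstance(fn_args, str):
        try:
            fn_args = json.loads(fn_args)
        except json.JSONDecodeError as exc:
            return json.dumps({"error": f"invalid JSON arguments: {exc}"})
    fn = registry.get(fn_name)
    if fn is None:
        return json.dumps({"error": f"unknown tool: {fn_name}"})
    try:
        return str(fn(**fn_args))
    except Exception as exc:  # report the failure to the model instead of crashing
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})

# Toy registry for illustration only
registry = {"echo": lambda text: text.upper()}
print(dispatch({"function": {"name": "echo", "arguments": {"text": "hi"}}}, registry))
print(dispatch({"function": {"name": "nope", "arguments": {}}}, registry))
```

Returning the error as tool output gives the model a chance to correct itself on the next turn. Capping the number of loop iterations is a sensible companion measure, since while True has no exit if the model keeps requesting tools.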

Tool Registration Pattern

For more than a handful of tools, use a decorator to auto-register:

import inspect
from typing import get_type_hints

TOOLS = []
TOOL_REGISTRY = {}

def tool(fn):
    """Register a function as an agent tool."""
    hints = get_type_hints(fn)
    properties = {}
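    for name, hint in hints.items():
        if name == "return":
            continue
        # Map Python annotations to JSON Schema types; anything else becomes "object"
        json_type = {str: "string", int: "integer", float: "number", bool: "boolean"}.get(hint, "object")
        properties[name] = {"type": json_type}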

    TOOLS.append({
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": properties,
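                # Only parameters without a default value are required
                "required": [
                    name for name, param in inspect.signature(fn).parameters.items()
                    if name in properties and param.default is inspect.Parameter.empty
                ]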
            }
        }
    })
    TOOL_REGISTRY[fn.__name__] = fn
    return fn

@tool
def search_web(query: str) -> str:
    """Search the web for current information."""
    return f"Results for: {query}"

@tool
def read_file(path: str) -> str:
    """Read content of a local file."""
    with open(path) as f:
        return f.read()
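
Arguments produced by the model are untrusted JSON, so it can help to check them against the registered schema before calling the function. The sketch below is one way to do that with the standard library only; validate_args and JSON_TYPES are hypothetical names, and the check covers just required keys and top-level types, not full JSON Schema validation.

```python
# Maps JSON Schema type names to the Python types accepted for them.
JSON_TYPES = {"string": str, "number": (int, float), "integer": int,
              "boolean": bool, "object": dict, "array": list}

def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    params = schema["function"]["parameters"]
    props = params.get("properties", {})
    problems = [f"missing required argument: {name}"
                for name in params.get("required", []) if name not in args]
    for name, value in args.items():
        if name not in props:
            problems.append(f"unexpected argument: {name}")
            continue
        expected = JSON_TYPES.get(props[name].get("type"))
        # bool is a subclass of int, so reject it explicitly for non-boolean types
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if expected and (is_stray_bool or not isinstance(value, expected)):
            problems.append(f"wrong type for {name}: expected {props[name]['type']}")
    return problems

# Schema in the same shape the decorator appends to TOOLS
schema = {"type": "function", "function": {
    "name": "search_web",
    "description": "Search the web for current information.",
    "parameters": {"type": "object",
                   "properties": {"query": {"type": "string"}},
                   "required": ["query"]}}}

print(validate_args(schema, {"query": "ollama"}))
print(validate_args(schema, {"q": 1}))
```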

Optional: LangGraph Orchestration

For multi-step workflows with conditional branching, LangGraph can replace the hand-written loop; its prebuilt ReAct agent handles tool dispatch for you:

pip install langgraph langchain-ollama

from langchain_core.tools import tool as lc_tool
from langchain_ollama import ChatOllama
from langgraph.prebuilt import create_react_agent

llm = ChatOllama(model="hermes3:8b")

# Define tools as LangChain tool objects

@lc_tool
def get_weather(city: str) -> str:
    """Returns weather for a city."""
    return f"18°C, cloudy in {city}"

agent = create_react_agent(llm, tools=[get_weather])
result = agent.invoke({"messages": [("user", "Weather in Berlin?")]})
print(result["messages"][-1].content)

Optional: ChromaDB Memory Backend

Add persistent conversation memory in one install:

pip install chromadb

import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient(path=".agent_memory")
ef = embedding_functions.DefaultEmbeddingFunction()
collection = client.get_or_create_collection("conversations", embedding_function=ef)

def remember(text: str, doc_id: str):
    collection.upsert(documents=[text], ids=[doc_id])

def recall(query: str, n: int = 3) -> list[str]:
    results = collection.query(query_texts=[query], n_results=n)
    return results["documents"][0]

Pass recalled context into the system message before each agent run.
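
One way to do this is to fold the recalled snippets into the system prompt as a short bulleted list. The sketch below assumes recalled text arrives as a list of strings (the shape recall returns); build_system_message and the wording of the context header are illustrative choices, not a required format.

```python
BASE_PROMPT = "You are a helpful assistant with access to tools."

def build_system_message(recalled: list[str], base: str = BASE_PROMPT) -> dict:
    """Fold recalled snippets into a single system message."""
    if not recalled:
        return {"role": "system", "content": base}
    context = "\n".join(f"- {snippet}" for snippet in recalled)
    return {
        "role": "system",
        "content": f"{base}\n\nRelevant notes from earlier conversations:\n{context}",
    }

# Hand-written snippets standing in for recall(user_input)
msg = build_system_message(["User lives in Kyiv.", "User prefers Celsius."])
print(msg["content"])
```

In run_agent, the returned dictionary would take the place of the hard-coded system message, with recall(user_input) supplying the snippets.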

Cost

Option	Cost
hermes3:8b via Ollama	$0 (local)
hermes3:8b via Groq	Free tier: 14,400 req/day
hermes3:70b via Groq	Free tier with rate limits

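Groq exposes an OpenAI-compatible endpoint at https://api.groq.com/openai/v1/chat/completions, authenticated with an Authorization: Bearer <key> header. The agent script in this guide targets Ollama's native /api/chat, so swapping OLLAMA_URL alone is not enough: an OpenAI-compatible response nests the reply under choices[0].message, tool-call arguments arrive as JSON strings, and each tool message must carry the matching tool_call_id. Hosted model availability and free-tier quotas change, so confirm them against Groq's current model list.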
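
The differences between the two response shapes are small enough to absorb in a pair of helpers. This is a sketch rather than a drop-in client: extract_message and tool_result_message are hypothetical names, and the payloads in the usage lines are hand-written to mirror each shape.

```python
def extract_message(payload: dict) -> dict:
    """Return the assistant message from either response shape."""
    if "choices" in payload:
        # OpenAI-compatible: {"choices": [{"message": {...}}]}
        return payload["choices"][0]["message"]
    # Ollama native /api/chat: {"message": {...}}
    return payload["message"]

def tool_result_message(call: dict, result: str) -> dict:
    """Build the tool-role reply, echoing the call id when one is present."""
    msg = {"role": "tool", "content": result}
    if "id" in call:
        msg["tool_call_id"] = call["id"]
    return msg

ollama_style = {"message": {"role": "assistant", "content": "hi"}}
openai_style = {"choices": [{"message": {"role": "assistant", "content": "hi"}}]}
print(extract_message(ollama_style) == extract_message(openai_style))
print(tool_result_message({"id": "call_1", "function": {"name": "get_weather"}}, "{}"))
```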