# Using Ollama with OpenRegister
OpenRegister supports Ollama for running Large Language Models (LLMs) locally, powering AI features such as chat, agents, and RAG (Retrieval Augmented Generation).
## What is Ollama?
Ollama allows you to run open-source LLMs locally on your machine, providing:
- Privacy: Your data never leaves your infrastructure
- Cost-effective: No API costs for inference
- Customization: Run any compatible model
- Offline capability: Works without internet connection
## Technical Implementation
OpenRegister uses LLPhant's native Ollama support, which provides:
- Direct API Integration: Uses Ollama's native `/api/` endpoints (not an OpenAI-compatible layer)
- Better Performance: Optimized for Ollama's specific API structure
- Simpler Configuration: No need for API key workarounds
- Full Feature Support: Native support for chat, embeddings, and function calling
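As a concrete illustration, here is a minimal request against the native chat endpoint (a sketch, assuming Ollama is reachable on port 11434 from your host and `llama3.2:latest` is already pulled):

```bash
# Native Ollama chat endpoint - no API key or OpenAI-compatibility layer involved
curl -s http://localhost:11434/api/chat -d '{
  "model": "llama3.2:latest",
  "messages": [{"role": "user", "content": "Say hello in one sentence."}],
  "stream": false
}'
```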
## Quick Start

### 1. Start Ollama Container
The Ollama container is included in the docker-compose configuration:
```bash
# Start all services including Ollama
docker-compose up -d

# Or specifically start Ollama
docker-compose up -d ollama
```
### 2. Pull a Model
Pull one of the supported models:
```bash
# Llama 3.2 - Recommended, best balance
docker exec openregister-ollama ollama pull llama3.2

# Llama 3.2 (3B) - Lighter alternative
docker exec openregister-ollama ollama pull llama3.2:3b

# Mistral (7B) - Fast and efficient
docker exec openregister-ollama ollama pull mistral:7b

# Phi-3 Mini (3.8B) - Lightweight option
docker exec openregister-ollama ollama pull phi3:mini

# CodeLlama (7B) - Optimized for code
docker exec openregister-ollama ollama pull codellama:latest
```
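After pulling, it is worth confirming that the models landed with the expected tags:

```bash
# List installed models; the NAME column shows the full name including tag
docker exec openregister-ollama ollama list
```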
### 3. Configure OpenRegister
- Navigate to Settings → OpenRegister → LLM Configuration
- Select Ollama as the chat provider
- Configure the settings based on your setup:
**Integrated Setup (docker-compose.yml):**

- Ollama URL: `http://openregister-ollama:11434` (⚠️ NOT `http://localhost:11434`)
- Chat Model: `llama3.2:latest` (or the model you pulled, full name including tag)
- Embedding Model: `nomic-embed-text:latest`
- Why not localhost? OpenRegister runs inside the Nextcloud container, so use the container name instead.
**Standalone Setup (using --profile ollama):**

- Via Docker Network: `http://openregister-ollama:11434` (from the Nextcloud container)
- Via Host IP: `http://YOUR_HOST_IP:11434` (e.g., `http://192.168.1.100:11434`)
- Different Host: `http://your-server-ip:11434`
- ❌ NOT: `http://localhost:11434` (won't work from inside a container)
- Chat Model: `llama3.2:latest` (use full name with tag)
- Embedding Model: `nomic-embed-text:latest`
### 4. Pull Embedding Model (for RAG)
If using RAG features, pull an embedding model:
```bash
# Nomic Embed Text - Recommended for embeddings
docker exec openregister-ollama ollama pull nomic-embed-text:latest

# Alternative: all-minilm (smaller, faster)
docker exec openregister-ollama ollama pull all-minilm:latest
```
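To sanity-check the embedding model, you can call Ollama's native embeddings endpoint directly; a minimal sketch (run from the host, where port 11434 is published; `jq` is assumed to be installed):

```bash
# Request an embedding and count its dimensions
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text:latest", "prompt": "hello world"}' \
  | jq '.embedding | length'
# nomic-embed-text should report 768 dimensions
```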
## Configuration Details

### Ollama Service Configuration

The Ollama service is configured in `docker-compose.yml`:
```yaml
ollama:
  image: ollama/ollama:latest
  container_name: openregister-ollama
  restart: always
  ports:
    - "11434:11434"
  volumes:
    - ollama:/root/.ollama
  environment:
    - OLLAMA_HOST=0.0.0.0
    - OLLAMA_NUM_PARALLEL=4
    - OLLAMA_KEEP_ALIVE=30m
  deploy:
    resources:
      limits:
        memory: 16G
      reservations:
        memory: 8G
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:11434/api/tags || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 3
```
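After changing this configuration, recreate the service and let the healthcheck above confirm the container is up; for example:

```bash
# Recreate the Ollama service with the new configuration
docker-compose up -d ollama

# Once the healthcheck has run, this should print "healthy"
docker inspect --format '{{.State.Health.Status}}' openregister-ollama
```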
### Accessing Ollama

**Important: Docker Container Communication**

When configuring Ollama in OpenRegister, you MUST use the Docker service name, not `localhost`:
- ✅ Correct (from the Nextcloud container): `http://openregister-ollama:11434` or `http://ollama:11434`
- ❌ Wrong: `http://localhost:11434` (this only works from your host machine, not from inside containers)
**Access Points:**

- From host machine (terminal, browser): `http://localhost:11434`
- From Nextcloud container (OpenRegister settings): `http://openregister-ollama:11434` or `http://ollama:11434`
- From other Docker containers: `http://openregister-ollama:11434`
Why? Inside a Docker container, `localhost` refers to the container itself, not your host machine. Containers communicate with each other using the service names defined in `docker-compose.yml`.
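The difference is easy to demonstrate from a terminal (assuming your Nextcloud container is named `master-nextcloud-1`, as in the troubleshooting examples later in this document):

```bash
# From the host: works because the compose file publishes port 11434
curl -s http://localhost:11434/api/tags

# From inside the Nextcloud container: localhost would fail, but the service name resolves
docker exec master-nextcloud-1 curl -s http://openregister-ollama:11434/api/tags
```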
### Model Naming

Important: Ollama models must be referenced with their full name, including the version tag:
- ✅ Correct: `mistral:7b`, `llama3.2:latest`, `phi3:mini`
- ❌ Wrong: `mistral`, `llama3.2`, `phi3` (without tags)
The OpenRegister UI dropdown shows full model names with tags to ensure compatibility.
## Recommended Models

### For Chat & Agents
| Model | Size | RAM Required | Speed | Quality | Use Case |
|---|---|---|---|---|---|
| llama3.2:latest ⭐ | 4.7GB | 8-16GB | Fast | Excellent | RECOMMENDED - Latest, best balance |
| llama3.2:3b | 2.0GB | 4-8GB | Very Fast | Very Good | Lighter, faster alternative |
| llama3.1:latest | 4.7GB | 8-16GB | Fast | Excellent | Previous gen, still great |
| llama3.1:70b 🔥 | 40GB | 64-80GB | Slow | Best | HIGH-END - Maximum quality |
| mistral:7b | 4.1GB | 8GB | Very Fast | Very Good | Fast responses |
| phi3:mini | 2.3GB | 4GB | Very Fast | Good | Low-resource environments |
| codellama:latest | 3.8GB | 8GB | Fast | Excellent (code) | Code generation/analysis |
**Memory Configuration Notes:**

- Our docker-compose files are configured with a 16GB limit (suitable for 8B models)
- For 70B models, increase the memory limit to 80GB in docker-compose (see the override sketch below)
- Shared memory (`shm_size`) is set to 2GB for efficient model loading
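For example, a `docker-compose.override.yml` along these lines raises the limit without editing the main file (a sketch; the reservation value is an assumption, so adjust both numbers to your hardware):

```yaml
# docker-compose.override.yml - example override for running 70B models
services:
  ollama:
    deploy:
      resources:
        limits:
          memory: 80G
        reservations:
          memory: 64G
```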
### For Embeddings (RAG)
| Model | Size | Dimensions | Use Case |
|---|---|---|---|
| nomic-embed-text:latest | 274MB | 768 | Recommended for most uses |
| all-minilm:latest | 45MB | 384 | Lightweight, faster |
| mxbai-embed-large:latest | 670MB | 1024 | Highest quality |
## GPU Support

GPU acceleration can provide a 10-100x speedup for model inference, dramatically improving response times and enabling larger models to run smoothly.
### Prerequisites
- NVIDIA GPU with CUDA support (check compatibility at https://developer.nvidia.com/cuda-gpus)
- NVIDIA drivers installed on Windows/Linux host
- WSL2 (if using Windows with WSL)
- NVIDIA Container Toolkit for Docker
### Verify WSL GPU Support (Windows + WSL)
First, check if your GPU is accessible from WSL2:
```bash
# Check GPU is visible in WSL
nvidia-smi

# Should show your GPU, driver version, and CUDA version
# Example output:
#   NVIDIA-SMI 546.30    Driver Version: 546.30    CUDA Version: 12.3
#   GPU Name: NVIDIA GeForce RTX 3070 Laptop GPU
```
If `nvidia-smi` is not found, you need to:

- Install the latest NVIDIA drivers on the Windows host
- Update the WSL2 kernel: `wsl --update`
- Restart WSL: `wsl --shutdown`, then reopen
### Verify Docker GPU Support
Check if Docker can access the GPU:
```bash
# Test GPU access from Docker
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu20.04 nvidia-smi

# Should show the same GPU information
# If this fails, install the NVIDIA Container Toolkit:
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
```
### Enable GPU in Docker Compose
The GPU configuration is already included in docker-compose files. Verify it contains:
```yaml
ollama:
  image: ollama/ollama:latest
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all # Use all available GPUs (or count: 1 for a single GPU)
            capabilities: [gpu]
```
### Apply GPU Configuration
If using docker-compose:
```bash
# Stop and remove the old container
docker stop openregister-ollama
docker rm openregister-ollama

# Recreate with GPU support
docker-compose up -d ollama
```
If using docker run (alternative method):
```bash
# Stop old container
docker stop openregister-ollama
docker rm openregister-ollama

# Start with GPU support
docker run -d \
  --name openregister-ollama \
  --restart always \
  -p 11434:11434 \
  -v openregister_ollama:/root/.ollama \
  -e OLLAMA_HOST=0.0.0.0 \
  -e OLLAMA_NUM_PARALLEL=4 \
  -e OLLAMA_KEEP_ALIVE=30m \
  --shm-size=2gb \
  --gpus all \
  ollama/ollama:latest
```
### Verify GPU is Working
After restarting Ollama with GPU support:
```bash
# 1. Check GPU is accessible inside the container
docker exec openregister-ollama nvidia-smi
# Should show GPU information from inside the container

# 2. Check Docker device configuration
docker inspect openregister-ollama --format '{{json .HostConfig.DeviceRequests}}'
# Should show: [{"Driver":"","Count":-1,"Capabilities":[["gpu"]],"Options":{}}]

# 3. Check Ollama logs for GPU detection
docker logs openregister-ollama | grep -i gpu
# Should show lines like:
#   inference compute id=GPU-xxx library=CUDA compute=8.6 name=CUDA0
#   description="NVIDIA GeForce RTX 3070 Laptop GPU"
#   total="8.0 GiB" available="6.2 GiB"

# 4. Test inference speed (should be much faster)
docker exec openregister-ollama ollama run llama3.2:latest "What is 2+2?"
```
### Performance Comparison
| Mode | Loading Time | First Token | Tokens/sec | Use Case |
|---|---|---|---|---|
| CPU | 30-60s | 5-10s | 2-5 | Testing only |
| GPU | 2-5s | 0.5-1s | 50-200 | Production use |
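You can measure these numbers on your own hardware; `ollama run --verbose` prints timing statistics (including the eval rate in tokens per second) after the response:

```bash
# Run a prompt and print timing statistics after the answer
docker exec openregister-ollama ollama run --verbose llama3.2:latest "What is 2+2?"
```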
### GPU Troubleshooting
#### Issue: 'nvidia-smi: command not found' in container

This means the GPU is NOT configured. Check:
```bash
# Verify DeviceRequests is not null
docker inspect openregister-ollama --format '{{json .HostConfig.DeviceRequests}}'

# If null, the container was created without GPU support
# Solution: Remove and recreate with the --gpus flag or proper docker-compose config
```
#### Issue: GPU not detected in Ollama logs
```bash
# Check Ollama startup logs
docker logs openregister-ollama 2>&1 | grep -A 5 "discovering available GPUs"

# If no GPU is found, check:
# 1. NVIDIA drivers are installed on the host
# 2. nvidia-smi works on the host
# 3. Docker GPU support is working (test with the nvidia/cuda image)
```
#### Issue: 'Failed to initialize NVML: Unknown Error'
```bash
# This usually means a driver mismatch
# Solution: Update the NVIDIA drivers on the host machine
# Then restart Docker and WSL:
wsl --shutdown
# Restart Docker Desktop
# Recreate the Ollama container
```
#### Issue: Out of VRAM
```bash
# Check GPU memory usage
nvidia-smi

# If VRAM is full:
# 1. Use a smaller model (3B instead of 8B)
# 2. Reduce OLLAMA_MAX_LOADED_MODELS
# 3. Set a shorter OLLAMA_KEEP_ALIVE (e.g., 5m)
# 4. Restart the container to clear VRAM
docker restart openregister-ollama
```
## Standalone Setup
Running Ollama standalone is ideal for:
- 🖥️ Separate Hardware: Run on a more powerful machine with more RAM/GPU
- 🔧 Production: Isolate AI workloads from application stack
- 📊 Scalability: Serve multiple OpenRegister instances from one Ollama
- 💰 Cost Optimization: Use a dedicated GPU machine only for AI
### Start Standalone Ollama
```bash
# From the openregister directory (using the ollama profile)
docker-compose -f docker-compose.dev.yml --profile ollama up -d

# Check status
docker-compose -f docker-compose.dev.yml --profile ollama ps

# View logs
docker-compose -f docker-compose.dev.yml --profile ollama logs -f ollama
```
### Configure OpenRegister for Standalone

#### Option 1: Use Host IP Address (Simplest)

- URL: `http://YOUR_HOST_IP:11434` (e.g., `http://192.168.1.100:11434`)
- Model: `llama3.2:latest`
- Find your IP: run `hostname -I | awk '{print $1}'` on Linux (`ipconfig getifaddr en0` on macOS) or `ipconfig` on Windows
#### Option 2: Connect via Docker Network (Recommended for Production)
First, connect the standalone Ollama container to your Nextcloud network:
```bash
# Find your Nextcloud network
docker network ls | grep master

# Connect standalone Ollama to that network
docker network connect master_default standalone-ollama

# Verify the connection from the Nextcloud container
docker exec master-nextcloud-1 curl -s http://standalone-ollama:11434/api/tags
```
Then in OpenRegister settings:

- URL: `http://standalone-ollama:11434`
- Model: `llama3.2:latest`
#### Option 3: Different Physical Machine

- URL: `http://your-ollama-server-ip:11434`
- Model: `llama3.2:latest`
- Ensure port 11434 is accessible (firewall rules); a quick reachability check is sketched below
## Using Ollama Features

### 1. Chat
Once configured, the chat feature works automatically:
- Go to any page in OpenRegister
- Click the chat icon in the sidebar
- Start a conversation with your local AI
### 2. RAG (Retrieval Augmented Generation)
Enable RAG for agents to search your documents:
- Create/edit an agent
- Enable RAG in agent settings
- Configure:
- Search Mode: Vector, Hybrid, or Full-text
- Number of Sources: How many documents to retrieve
- Include Files: Search uploaded files
- Include Objects: Search OpenRegister objects
### 3. Agents with Tools
Create agents that can interact with your data:
- Create a new agent
- Enable tools:
- Register Tool: Query registers
- Schema Tool: Access schemas
- Objects Tool: Manipulate objects
- Set the system prompt
- Chat with the agent
## Managing Models

### List Installed Models

```bash
docker exec openregister-ollama ollama list
```
### Remove a Model

```bash
docker exec openregister-ollama ollama rm mistral:7b
```
### Update a Model

```bash
docker exec openregister-ollama ollama pull llama3.2:latest
```
### Check Ollama Status

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# View running models
docker exec openregister-ollama ollama ps
```
## Performance Tuning

### Memory Configuration

Adjust generation settings in OpenRegister settings (see the request sketch below):

- Max Tokens: Controls response length (default: 2048)
- Temperature: Controls randomness (0.0-1.0, default: 0.7)
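For reference, these settings correspond to options in Ollama's native API (Max Tokens corresponds to Ollama's `num_predict` option); a hand-rolled request with equivalent parameters would look roughly like this:

```bash
# The same knobs expressed as native Ollama request options
curl -s http://localhost:11434/api/chat -d '{
  "model": "llama3.2:latest",
  "messages": [{"role": "user", "content": "Summarize Ollama in one line."}],
  "stream": false,
  "options": {"num_predict": 2048, "temperature": 0.7}
}'
```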
### Concurrent Requests

Ollama can handle multiple concurrent requests. Monitor resource usage:

```bash
# Check container resource usage
docker stats openregister-ollama
```
### Environment Variables

Edit `docker-compose.yml` to tune performance:

```yaml
environment:
  # Number of concurrent requests (default: 4)
  - OLLAMA_NUM_PARALLEL=8
  # How long to keep models in memory (default: 30m)
  - OLLAMA_KEEP_ALIVE=1h
  # Maximum models to keep loaded (default: 2)
  - OLLAMA_MAX_LOADED_MODELS=3
```
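Environment changes only take effect when the container is recreated; one way to apply and verify them:

```bash
# Recreate the container so the new environment takes effect
docker-compose up -d ollama

# Confirm the variables are set inside the container
docker exec openregister-ollama env | grep OLLAMA
```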
## Troubleshooting

### Connection Error: "Failed to connect to localhost port 11434"
**Symptoms:**

- Error: "cURL error 7: Failed to connect to localhost port 11434"
- Error: "Could not connect to server"
- Chat test fails with connection refused

**Cause:** You're using `http://localhost:11434` in OpenRegister settings, but inside the Nextcloud container, `localhost` refers to the container itself, not the host machine.
**Solution:**

1. Update the Ollama URL in OpenRegister settings:
   - Go to Settings → OpenRegister → LLM Configuration
   - Change the Ollama URL from `http://localhost:11434` to `http://openregister-ollama:11434`
   - Click Save and Test Connection

2. Verify the containers are on the same network:

   ```bash
   # Check if Nextcloud can reach Ollama
   docker exec master-nextcloud-1 curl -s http://openregister-ollama:11434/api/tags
   # Should return JSON with the model list
   ```

3. If it still fails, connect the containers to the same network:

   ```bash
   # Find your Nextcloud network
   docker network ls | grep master

   # Connect Ollama to the same network
   docker network connect master_default openregister-ollama

   # Test again
   docker exec master-nextcloud-1 curl -s http://openregister-ollama:11434/api/tags
   ```
Remember: Always use the container/service name, not localhost, when configuring services inside Docker.
### Model Not Found (404 Error)

**Cause:** The model name doesn't include a version tag.

**Solution:** Use the full model name with tag:

- ✅ `mistral:7b` (correct)
- ❌ `mistral` (wrong - missing tag)

Update the model name in OpenRegister settings to include the tag.
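To see the exact names OpenRegister should use, query the tags endpoint (from the host; `jq` assumed):

```bash
# List the full model names, including tags, that Ollama will accept
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
```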
### Ollama Container Not Starting

```bash
# Check logs
docker logs openregister-ollama

# Restart the container
docker-compose restart ollama
```
### Out of Memory
- Use a smaller model (phi3:mini, mistral:7b)
- Reduce max tokens in settings
- Allocate more RAM to Docker
- Stop other services temporarily
### Slow Responses

- Use a faster model (mistral:7b, phi3:mini)
- Reduce the context window
- Enable GPU support
- Check system resources: `docker stats`
## Model Recommendations by Use Case

### General Customer Support
- Model: llama3.2:latest or mistral:7b
- Why: Good balance of quality and speed
### Technical Documentation
- Model: codellama:latest
- Why: Better understanding of technical content
### Low Resource Environments
- Model: phi3:mini
- Why: Smallest footprint while maintaining quality
### Maximum Quality

- Model: llama3.1:70b (requires 64GB+ RAM)
- Why: Best possible responses (if you have the resources)
## Security Considerations

- Network Isolation: Containers reach Ollama over the internal Docker network; note that the published port also exposes it on the host (see the hardening sketch below)
- No External API Calls: All processing happens locally
- Data Privacy: Your data never leaves your infrastructure
- Model Safety: Use official models from the Ollama library
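If you don't need to reach Ollama from the host's network at all, one hardening option (a sketch; adjust to your compose file) is to bind the published port to the loopback interface only:

```yaml
# In the ollama service definition: publish the port on 127.0.0.1 only
ports:
  - "127.0.0.1:11434:11434"
```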
## Resources
- Ollama Documentation: https://ollama.ai/
- Model Library: https://ollama.ai/library
- GitHub: https://github.com/ollama/ollama
- OpenRegister AI Docs: See AI Features
## FAQ
Q: Can I use multiple models simultaneously?
A: Yes, configure different models for chat vs embeddings.
Q: How much disk space do I need?
A: Plan for 5-10GB per model. Embedding models are smaller (~500MB).
Q: Can I use custom models?
A: Yes, see Ollama documentation on creating custom Modelfiles.
Q: Does it work on ARM (Apple Silicon)?
A: Yes, Ollama supports ARM64 architecture.
Q: Can I use Ollama with other OpenRegister apps?
A: Yes, any app in the workspace can access the Ollama container.