
Hugging Face Integration

Integrate OpenRegister with Hugging Face Text Generation Inference (TGI) or vLLM to run Mistral and other Hugging Face models locally with an OpenAI-compatible API.

Overview

Hugging Face provides two options for running local LLMs with OpenAI-compatible APIs:

  • Text Generation Inference (TGI): Official Hugging Face solution, optimized for production
  • vLLM: Alternative with better throughput, full OpenAI API compatibility

Both provide:

  • OpenAI-Compatible API - Drop-in replacement for OpenAI
  • Privacy-First - All data stays local
  • Cost-Free - No API fees
  • Fast - Optimized inference engines
  • Flexible - Choose any Hugging Face model

Comparison

Feature     | TGI              | vLLM             | Ollama
OpenAI API  | ✅ Yes (v1.4.0+) | ✅ Yes           | ❌ No
Speed       | ⚡⚡ Fast          | ⚡⚡⚡ Very Fast    | ⚡ Good
Models      | Hugging Face     | Hugging Face     | Curated list
Setup       | Medium           | Medium           | Easy
Memory      | 8-16GB           | 8-16GB           | 8-16GB
Use Case    | Production       | High throughput  | Simple setup

Prerequisites

  • Nextcloud 28+ with OpenRegister installed
  • Docker and Docker Compose
  • GPU recommended (8GB+ VRAM) for optimal performance
  • At least 16GB RAM for larger models
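
If you plan to use a GPU, it is worth confirming Docker can see it before starting any services. A minimal check, assuming the NVIDIA Container Toolkit is installed and using a generic CUDA image tag as an example:

# GPU visible to containers? (requires NVIDIA Container Toolkit)
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Enough RAM for a 7B model? (16GB+ recommended)
free -h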

Quick Start

Option 1: TGI - Recommended

Pros:

  • Official Hugging Face solution
  • Well-maintained and documented
  • Optimized for production
  • Automatic quantization

Installation:

# Start TGI with Mistral (using huggingface profile)
docker-compose -f docker-compose.dev.yml --profile huggingface up -d tgi-mistral

# Wait for model download (~15GB for Mistral 7B)
docker logs -f openregister-tgi-mistral
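
Once the download completes, a quick readiness check from the host (assuming the default 8081:80 port mapping shown in the service configuration below) should return HTTP 200:

# TGI health endpoint; returns 200 once the model is loaded
curl -i http://localhost:8081/health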

Configuration in OpenRegister:

  1. Navigate to Settings → OpenRegister → LLM Configuration
  2. Select OpenAI as provider (TGI is OpenAI-compatible)
  3. Configure:
    • Base URL: http://tgi-mistral:80 (from Nextcloud container)
    • Model: mistral-7b-instruct
    • API Key: dummy (not used for local)
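
Before saving, you can verify that the Base URL is reachable from the Nextcloud side with a minimal chat request. This is a sketch that assumes both containers share a Docker network; replace <nextcloud-container> with your Nextcloud container name:

docker exec <nextcloud-container> curl -s http://tgi-mistral:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral-7b-instruct", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 5}'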

Option 2: vLLM - Alternative

Pros:

  • Faster inference
  • Better throughput for multiple requests
  • Full OpenAI API compatibility
  • PagedAttention optimization

Installation:

# Start vLLM with Mistral (if configured)
docker-compose -f docker-compose.dev.yml --profile huggingface up -d vllm-mistral

# Wait for model download
docker logs -f openregister-vllm-mistral

Configuration in OpenRegister:

  1. Navigate to Settings → OpenRegister → LLM Configuration
  2. Select OpenAI as provider
  3. Configure:
    • Base URL: http://vllm-mistral:8000 (from Nextcloud container)
    • Model: mistral-7b-instruct
    • API Key: dummy (not used for local)
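
The Model value must match the name vLLM serves (SERVED_MODEL_NAME). To double-check it from the Nextcloud side, replace <nextcloud-container> with your Nextcloud container name:

# The returned model id should be mistral-7b-instruct
docker exec <nextcloud-container> curl -s http://vllm-mistral:8000/v1/models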

Configuration Details

TGI Service Configuration

tgi-mistral:
  image: ghcr.io/huggingface/text-generation-inference:latest
  container_name: openregister-tgi-mistral
  restart: always
  ports:
    - "8081:80"
  volumes:
    - tgi-models:/data
  environment:
    - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1
    - MAX_INPUT_LENGTH=4096
    - MAX_TOTAL_TOKENS=8192
    - MAX_CONCURRENT_REQUESTS=128
  deploy:
    resources:
      limits:
        memory: 16G
      reservations:
        memory: 8G
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
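
TGI also exposes an /info endpoint that reports the loaded model and token limits, which is a convenient way to confirm these environment variables took effect (via the host port 8081 mapped above):

# Should show mistralai/Mistral-7B-Instruct-v0.1 and the configured limits
curl -s http://localhost:8081/info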

vLLM Service Configuration

vllm-mistral:
  image: vllm/vllm-openai:latest
  container_name: openregister-vllm-mistral
  restart: always
  ports:
    - "8082:8000"
  volumes:
    - vllm-models:/root/.cache/huggingface
  environment:
    - MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.1
    - TENSOR_PARALLEL_SIZE=1
    - GPU_MEMORY_UTILIZATION=0.9
    - SERVED_MODEL_NAME=mistral-7b-instruct
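
After startup, vLLM pre-allocates roughly GPU_MEMORY_UTILIZATION of the GPU's memory for weights and KV cache. A rough sanity check from inside the container, assuming an NVIDIA GPU with nvidia-smi available:

# With GPU_MEMORY_UTILIZATION=0.9, used memory should sit near 90% of total
docker exec openregister-vllm-mistral nvidia-smi --query-gpu=memory.used,memory.total --format=csv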

Available Models

Model                    | Size | Use Case               | Memory Required
Mistral-7B-Instruct-v0.2 | 7B   | General purpose, RAG   | 16GB
Mixtral-8x7B-Instruct    | 47B  | High quality, complex  | 48GB+
Llama-3-8B-Instruct      | 8B   | General purpose        | 16GB
Phi-3-mini-instruct      | 3.8B | Fast, lightweight      | 8GB
Qwen2-7B-Instruct        | 7B   | Multilingual, code     | 16GB

Changing the Model

Edit docker-compose.dev.yml:

For TGI:

tgi-mistral:
  environment:
    - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.2  # Change this

For vLLM (if configured):

vllm-mistral:
  environment:
    - MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2  # Change this
  command:
    - --model
    - mistralai/Mistral-7B-Instruct-v0.2  # Change this too

Then recreate the service so the new configuration is applied (a plain restart does not re-read docker-compose.dev.yml):

docker-compose -f docker-compose.dev.yml --profile huggingface up -d tgi-mistral
# or
docker-compose -f docker-compose.dev.yml --profile huggingface up -d vllm-mistral
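
Switching models triggers a fresh download on the next start, and the previous model's files remain in the named volume. If disk space is a concern, you can check how large the model volumes have grown:

# Volume sizes for downloaded models (volume names from docker-compose.dev.yml)
docker system df -v | grep -E 'tgi-models|vllm-models'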

API Usage

Testing the API

TGI (port 8081):

curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-7b-instruct",
    "messages": [
      {"role": "user", "content": "Hello! What is the capital of France?"}
    ],
    "max_tokens": 100
  }'

vLLM (port 8082):

curl http://localhost:8082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-7b-instruct",
    "messages": [
      {"role": "user", "content": "Hello! What is the capital of France?"}
    ],
    "max_tokens": 100
  }'
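
Both servers also accept the OpenAI streaming flag, which is useful for chat-style UIs. A sketch against the TGI endpoint (the same payload with "stream": true; the response arrives as Server-Sent Events):

curl -N http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-7b-instruct",
    "messages": [
      {"role": "user", "content": "Hello! What is the capital of France?"}
    ],
    "max_tokens": 100,
    "stream": true
  }'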

Using with LLPhant (PHP)

LLPhant's OpenAI client can point to TGI/vLLM:

use LLPhant\Chat\OpenAIChat;
use LLPhant\OpenAIConfig;

// Point LLPhant's OpenAI-compatible client at TGI or vLLM
$config = new OpenAIConfig();
$config->apiKey = 'dummy';                  // local servers do not validate the key
$config->url = 'http://tgi-mistral:80/v1';  // or http://vllm-mistral:8000/v1
$config->model = 'mistral-7b-instruct';

// Use with LLPhant
$chat = new OpenAIChat($config);
$response = $chat->generateText('What is the capital of France?');

Use Cases

1. AI Chat

Enable conversational AI using local models:

  1. Configure TGI or vLLM
  2. Set OpenAI provider with local base URL
  3. Use chat features in OpenRegister

2. RAG (Retrieval Augmented Generation)

Answer questions using your data:

  1. Configure embedding model (separate from chat)
  2. Vectorize your objects and files
  3. Ask questions - AI retrieves relevant context

3. Function Calling

Use Mistral with OpenRegister's function calling:

  • Search objects
  • Create objects
  • Update objects
  • Query registers
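
OpenRegister provides these functions; at the wire level they use the standard OpenAI tools format. The request below is only an illustrative sketch with a hypothetical search_objects tool, and whether the server parses tool calls depends on the TGI/vLLM version and startup flags:

# Hypothetical tool definition; real tool names and schemas are managed by OpenRegister
curl http://localhost:8082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-7b-instruct",
    "messages": [{"role": "user", "content": "Find objects about parking permits"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "search_objects",
        "description": "Search objects in a register",
        "parameters": {
          "type": "object",
          "properties": {"query": {"type": "string"}},
          "required": ["query"]
        }
      }
    }]
  }'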

Troubleshooting

Container Won't Start

# Check logs
docker logs openregister-tgi-mistral
# or
docker logs openregister-vllm-mistral

# Common issues:
# 1. Port already in use
sudo lsof -i :8081 # TGI
sudo lsof -i :8082 # vLLM

# 2. Insufficient memory
docker stats openregister-tgi-mistral

# 3. GPU not available
docker exec openregister-tgi-mistral nvidia-smi

Model Download Fails

# Check internet connection
docker exec openregister-tgi-mistral ping -c 3 huggingface.co

# For gated models, set Hugging Face token:
# Edit docker-compose.dev.yml:
environment:
  - HUGGING_FACE_HUB_TOKEN=your_token_here

Connection Errors from OpenRegister

Problem: OpenRegister can't connect to TGI/vLLM.

Solutions:

  1. Verify base URL uses container name: http://tgi-mistral:80
  2. Check containers are on same Docker network
  3. Test connection from Nextcloud container:
    docker exec <nextcloud-container> curl http://tgi-mistral:80/health

Slow Performance

Solutions:

  1. Use GPU acceleration (10-100x faster)
  2. Choose smaller model (3B instead of 7B)
  3. Increase MAX_CONCURRENT_REQUESTS for TGI
  4. Adjust GPU_MEMORY_UTILIZATION for vLLM

Performance Optimization

GPU Acceleration

For best performance, use GPU:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

Performance Gain: 10-100x faster inference with GPU

Concurrent Requests

TGI:

environment:
  - MAX_CONCURRENT_REQUESTS=128  # Increase for more parallel requests

vLLM:

vLLM sizes its KV cache from the GPU memory it is allowed to use, so a higher utilization allows more concurrent sequences:

environment:
  - GPU_MEMORY_UTILIZATION=0.9  # Use 90% of GPU memory

Further Reading

Support

For issues specific to: