SOLR Setup and Configuration Guide

This guide documents the complete SOLR setup process used by OpenRegister, including SolrCloud configuration requirements, authentication handling, and system field pre-configuration.

Overview

OpenRegister runs Apache SOLR in SolrCloud mode. Its setup process automatically creates a tenant-specific configSet and collection with the system fields already defined in the schema. This keeps tenants isolated from each other and removes the cost of creating fields at index time.

The system also includes a SOLR Management Dashboard for real-time monitoring, warmup operations, and index management.

SOLR Configuration Requirements

SolrCloud Mode

OpenRegister requires SOLR to run in SolrCloud mode with ZooKeeper coordination:

# docker-compose.yml
services:
  solr:
    image: solr:9-slim
    container_name: master-solr-1
    restart: always
    ports:
      - '8983:8983'
    volumes:
      - solr:/var/solr
    environment:
      - SOLR_HEAP=512m
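      # SolrCloud mode: with ZK_HOST unset, 'solr -c' starts an embedded ZooKeeper on port 9983.
      # Set ZK_HOST only to point Solr at an external ensemble, e.g. ZK_HOST=zookeeper:2181.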
    command:
      # Start in cloud mode (no precreate needed)
      - solr
      - -c
      - -f
    healthcheck:
      test: ['CMD-SHELL', 'curl -f http://localhost:8983/solr/admin/info/system || exit 1']
      interval: 30s
      timeout: 10s
      retries: 3
      
  # Optional: External ZooKeeper for production
  zookeeper:
    image: zookeeper:3.8
    container_name: master-zookeeper-1  
    restart: always
    ports:
      - '2181:2181'
      - '9983:9983'
    environment:
      - ZOO_MY_ID=1
      - ZOO_SERVERS=server.1=0.0.0.0:2888:3888;2181

Key SolrCloud Requirements

Cloud Mode: SOLR must start with '-c' flag
ZooKeeper: Required for configSet and collection management
No Authentication: Default setup without security (development)
ConfigSet API: Must support UPLOAD action for ZIP-based configSets
Collection API: Must support CREATE with configName reference

Authentication and Security

Development Configuration

For development environments, SOLR runs without authentication:

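# No authentication required: leave SOLR_AUTH_TYPE unset.
# Solr's start script only accepts 'basic' or 'kerberos' for this variable.
environment:
  - SOLR_HEAP=512m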

Production Considerations

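For production deployments, consider enabling Basic authentication. The variables below supply the credentials that Solr's own command-line tools and clients send; the server only enforces authentication once a security.json with an authentication plugin has been uploaded to ZooKeeper: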

# Basic authentication (if needed)
environment:
  - SOLR_AUTH_TYPE=basic
  - SOLR_AUTHENTICATION_OPTS='-Dbasicauth=admin:password'

Important: OpenRegister creates configSets by uploading a ZIP file. In SolrCloud mode, an unauthenticated request to CREATE a configSet from a trusted template fails, while a ZIP upload does not.

Setup Process Architecture

1. Tenant ID Generation

Each Nextcloud instance gets a unique tenant ID:

/**
 * Generate tenant-specific identifier
 * Format: nc_{8-character-hash}
 * Example: nc_f0e53393
 */
private function getTenantId(): string
{
    $instanceId = $this->systemConfig->getValue('instanceid');
    return 'nc_' . substr(hash('sha256', $instanceId), 0, 8);
}
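
The same derivation can be reproduced outside PHP, for example to predict a tenant ID from a known instance ID while debugging. The following is a minimal Python sketch, not part of OpenRegister; the function name 'tenant_id' is chosen here for illustration and the instance ID is passed in rather than read from the Nextcloud configuration.

```python
import hashlib


def tenant_id(instance_id: str) -> str:
    """Return 'nc_' followed by the first 8 hex characters of SHA-256(instance_id)."""
    # hexdigest() is lowercase hex, matching PHP's hash('sha256', ...).
    digest = hashlib.sha256(instance_id.encode("utf-8")).hexdigest()
    return "nc_" + digest[:8]


if __name__ == "__main__":
    # 'ocexample123' is a made-up instance ID used only for this demo.
    print(tenant_id("ocexample123"))
```

Because only 8 hex characters (32 bits) of the digest are kept, the ID is compact but not collision-proof across a very large number of instances.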

2. ConfigSet Creation Strategy

Base ConfigSet Download

OpenRegister downloads the working '_default' configSet as a foundation:

# Inside SOLR container
/opt/solr/bin/solr zk downconfig -n _default -d /tmp/default_config -z localhost:9983

System Fields Integration

The base schema is enhanced with 25 pre-configured system fields:

<!-- OpenRegister System Fields (self_*) - Always present for all tenants -->
<!-- Core tenant field -->
<field name="self_tenant" type="string" indexed="true" stored="true" required="true" multiValued="false" />

<!-- Metadata fields -->
<field name="self_object_id" type="pint" indexed="true" stored="true" multiValued="false" />
<field name="self_uuid" type="string" indexed="true" stored="true" multiValued="false" />

<!-- Context fields -->
<field name="self_register" type="pint" indexed="true" stored="true" multiValued="false" />
<field name="self_schema" type="pint" indexed="true" stored="true" multiValued="false" />
<field name="self_schema_version" type="string" indexed="true" stored="true" multiValued="false" />

<!-- Ownership and metadata -->
<field name="self_owner" type="string" indexed="true" stored="true" multiValued="false" />
<field name="self_organisation" type="string" indexed="true" stored="true" multiValued="false" />
<field name="self_application" type="string" indexed="true" stored="true" multiValued="false" />

<!-- Core object fields -->
<field name="self_name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="self_description" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="self_summary" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="self_image" type="string" indexed="false" stored="true" multiValued="false" />
<field name="self_slug" type="string" indexed="true" stored="true" multiValued="false" />
<field name="self_uri" type="string" indexed="true" stored="true" multiValued="false" />
<field name="self_version" type="string" indexed="true" stored="true" multiValued="false" />
<field name="self_size" type="string" indexed="false" stored="true" multiValued="false" />
<field name="self_folder" type="string" indexed="true" stored="true" multiValued="false" />

<!-- Timestamps -->
<field name="self_created" type="pdate" indexed="true" stored="true" multiValued="false" />
<field name="self_updated" type="pdate" indexed="true" stored="true" multiValued="false" />
<field name="self_published" type="pdate" indexed="true" stored="true" multiValued="false" />
<field name="self_depublished" type="pdate" indexed="true" stored="true" multiValued="false" />

<!-- Relation fields -->
<field name="self_relations" type="string" indexed="true" stored="true" multiValued="true" />
<field name="self_files" type="string" indexed="true" stored="true" multiValued="true" />
<field name="self_parent_uuid" type="string" indexed="true" stored="true" multiValued="false" />
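
To check that a schema file really contains these definitions before packaging it, the field elements can be read with a standard XML parser. The sketch below is an illustration in Python under stated assumptions: the function name 'system_fields' and the shortened sample schema are hypothetical, and only the 'self_' prefix convention comes from this guide.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Shortened sample; a real schema would hold all 25 self_* fields.
SAMPLE_SCHEMA = """
<schema name="example" version="1.6">
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="self_tenant" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="self_created" type="pdate" indexed="true" stored="true" multiValued="false" />
  <field name="self_relations" type="string" indexed="true" stored="true" multiValued="true" />
</schema>
"""


def system_fields(schema_xml: str, prefix: str = "self_") -> Dict[str, Dict[str, str]]:
    """Map each field whose name starts with the prefix to its XML attributes."""
    root = ET.fromstring(schema_xml.strip())
    return {
        field.get("name"): dict(field.attrib)
        for field in root.iter("field")
        if field.get("name", "").startswith(prefix)
    }


if __name__ == "__main__":
    for name, attrs in system_fields(SAMPLE_SCHEMA).items():
        print(name, attrs["type"], attrs.get("multiValued"))
```

Running the same function over the full schema and comparing the number of returned fields with 25 gives a quick pre-upload sanity check.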

ZIP Package Creation

The enhanced configSet is packaged into a ZIP file:

/**
 * Create configSet ZIP package
 * Location: resources/solr/openregister-configset.zip
 */
private function createConfigSetZip(): void
{
    $zipFile = $this->appPath . '/resources/solr/openregister-configset.zip';
    $configPath = $this->appPath . '/resources/solr/default_configset';
    
    $zip = new ZipArchive();
    $zip->open($zipFile, ZipArchive::CREATE | ZipArchive::OVERWRITE);
    
    // Add all configSet files
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($configPath),
        RecursiveIteratorIterator::LEAVES_ONLY
    );
    
    foreach ($files as $file) {
        if (!$file->isDir()) {
            $filePath = $file->getRealPath();
            $relativePath = substr($filePath, strlen($configPath) + 1);
            $zip->addFile($filePath, $relativePath);
        }
    }
    
    $zip->close();
}
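
For comparison, the same packaging step can be sketched in Python with the standard 'zipfile' module. This is an illustrative equivalent, not OpenRegister code; the function name and paths are assumptions. Note that Solr's configSet UPLOAD expects 'solrconfig.xml' at the top level of the archive, so the directory passed in should be the one that directly contains it.

```python
import zipfile
from pathlib import Path
from typing import List


def create_configset_zip(config_dir: Path, zip_path: Path) -> List[str]:
    """Zip every file under config_dir using paths relative to config_dir.

    Returns the archive member names in the order they were written.
    zip_path should live outside config_dir so the archive does not include itself.
    """
    names: List[str] = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(config_dir.rglob("*")):
            if path.is_file():
                # Forward slashes keep member names portable across platforms.
                arcname = path.relative_to(config_dir).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names
```

A call such as create_configset_zip(Path("resources/solr/default_configset"), Path("resources/solr/openregister-configset.zip")) mirrors the locations used in the PHP method above.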

3. ConfigSet Upload Process

Upload API Usage

OpenRegister uses SOLR's UPLOAD API to create configSets:

/**
 * Upload configSet via ZIP to bypass authentication
 */
private function uploadConfigSet(string $configSetName): bool
{
    $zipPath = $this->appPath . '/resources/solr/openregister-configset.zip';
    $zipContent = file_get_contents($zipPath);
    
    $url = $this->buildSolrUrl() . '/admin/configs';
    $params = [
        'action' => 'UPLOAD',
        'name' => $configSetName,
        'wt' => 'json'
    ];
    
    try {
        $response = $this->httpClient->post($url . '?' . http_build_query($params), [
            'body' => $zipContent,
            'headers' => [
                'Content-Type' => 'application/octet-stream'
            ]
        ]);
        
        return $response->getStatusCode() === 200;
    } catch (RequestException $e) {
        $this->logger->error('ConfigSet upload failed', [
            'error' => $e->getMessage(),
            'configSet' => $configSetName
        ]);
        return false;
    }
}
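
The shape of the upload request is easy to inspect without a running Solr instance. The Python sketch below only builds the request object and does not send it; the function name and the base URL are illustrative assumptions, while the query parameters and content type are the ones used in the PHP method above.

```python
import urllib.parse
import urllib.request


def build_upload_request(solr_url: str, configset_name: str,
                         zip_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST request for a configSet ZIP upload."""
    query = urllib.parse.urlencode(
        {"action": "UPLOAD", "name": configset_name, "wt": "json"}
    )
    return urllib.request.Request(
        url=f"{solr_url.rstrip('/')}/admin/configs?{query}",
        data=zip_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_upload_request("http://localhost:8983/solr", "test_config", b"PK...")
    print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen() would perform the actual upload.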

Why ZIP Upload vs CREATE

Method	Authentication	Trusted ConfigSets	Status
CREATE	Required	❌ Fails with 401	Not usable
UPLOAD	Not required	✅ Works	Used

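Solr rejects an unauthenticated CREATE that copies a trusted base configSet with HTTP 401, but accepts an unauthenticated UPLOAD. Uploading the ZIP therefore lets OpenRegister create configSets on an instance that has no authentication configured.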

4. Collection Creation

Tenant-Specific Naming

Collections use tenant-specific naming for isolation:

/**
 * Generate tenant-specific collection name
 * Format: openregister_{tenantId}
 * Example: openregister_nc_f0e53393
 */
public function getTenantSpecificCollectionName(): string
{
    return 'openregister_' . $this->getTenantId();
}

Collection API Call

/**
 * Create collection with tenant-specific configSet
 */
private function createCollection(string $collectionName, string $configSetName): bool
{
    $url = $this->buildSolrUrl() . '/admin/collections';
    $params = [
        'action' => 'CREATE',
        'name' => $collectionName,
        'collection.configName' => $configSetName,
        'numShards' => 1,
        'replicationFactor' => 1,
        'wt' => 'json'
    ];
    
    try {
        $response = $this->httpClient->get($url . '?' . http_build_query($params));
        $data = json_decode($response->getBody()->getContents(), true);
        
        return isset($data['responseHeader']['status']) && 
               $data['responseHeader']['status'] === 0;
    } catch (RequestException $e) {
        $this->logger->error('Collection creation failed', [
            'error' => $e->getMessage(),
            'collection' => $collectionName,
            'configSet' => $configSetName
        ]);
        return false;
    }
}
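
The request URL and the success check can be exercised in isolation, which helps when a CREATE call fails and the raw response needs inspecting. The following Python sketch mirrors the parameters of the PHP method above; the function names are hypothetical and no HTTP request is made.

```python
import json
from urllib.parse import urlencode


def build_create_url(solr_url: str, collection: str, configset: str,
                     num_shards: int = 1, replication_factor: int = 1) -> str:
    """Build the Collections API URL for creating a collection from a configSet."""
    params = {
        "action": "CREATE",
        "name": collection,
        "collection.configName": configset,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "wt": "json",
    }
    return f"{solr_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def creation_succeeded(response_body: str) -> bool:
    """Return True only when responseHeader.status is the integer 0."""
    try:
        data = json.loads(response_body)
    except ValueError:
        return False
    header = data.get("responseHeader") if isinstance(data, dict) else None
    if not isinstance(header, dict):
        return False
    status = header.get("status")
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(status, int) and not isinstance(status, bool) and status == 0
```

Checking the status field rather than only the HTTP code matches the PHP implementation, which treats a missing or non-zero status as failure.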

System Fields Architecture

Field Categories

OpenRegister pre-configures 25 system fields in 6 categories:

1. Core Identity (3 fields)

'self_tenant': Tenant isolation (required)
'self_object_id': Database object ID
'self_uuid': Unique identifier

2. Context Fields (3 fields)

'self_register': Register association
'self_schema': Schema reference
'self_schema_version': Schema version tracking

3. Ownership (3 fields)

'self_owner': Object owner
'self_organisation': Organization association
'self_application': Source application

4. Object Metadata (9 fields)

'self_name': Object name
'self_description': Full description
'self_summary': Short summary
'self_image': Image reference
'self_slug': URL-friendly identifier
'self_uri': Resource URI
'self_version': Object version
'self_size': Size information
'self_folder': Folder location

5. Timestamps (4 fields)

'self_created': Creation timestamp
'self_updated': Last modification
'self_published': Publication date
'self_depublished': Unpublication date

6. Relations (3 fields)

'self_relations': Related objects (multi-valued)
'self_files': Attached files (multi-valued)
'self_parent_uuid': Parent relationship

Field Type Mapping
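
The mapping from Solr field type to system field can be read off the schema listing in the System Fields Integration section. The Python sketch below restates it as a lookup table; the structure and the helper name 'solr_type' are illustrative, and the grouping is derived from that listing rather than from a separate specification.

```python
from typing import Dict, List

# Solr field type -> system fields declared with that type.
FIELD_TYPES: Dict[str, List[str]] = {
    # Exact-match, untokenized values
    "string": [
        "self_tenant", "self_uuid", "self_schema_version", "self_owner",
        "self_organisation", "self_application", "self_name", "self_image",
        "self_slug", "self_uri", "self_version", "self_size", "self_folder",
        "self_relations", "self_files", "self_parent_uuid",
    ],
    # Integer identifiers
    "pint": ["self_object_id", "self_register", "self_schema"],
    # Tokenized full text
    "text_general": ["self_description", "self_summary"],
    # Date and time values
    "pdate": ["self_created", "self_updated", "self_published", "self_depublished"],
}


def solr_type(field_name: str) -> str:
    """Return the Solr field type for a system field, or raise KeyError."""
    for type_name, fields in FIELD_TYPES.items():
        if field_name in fields:
            return type_name
    raise KeyError(field_name)


if __name__ == "__main__":
    print(sum(len(fields) for fields in FIELD_TYPES.values()))  # total system fields
    print(solr_type("self_created"))
```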

Benefits of Pre-configured Fields

Performance: No runtime field creation overhead
Consistency: All tenants have identical system field structure
Reliability: Eliminates schema validation errors during indexing
Maintenance: Centralized field definition management
Development: Predictable field availability for queries

Setup Validation Process

Automated Validation Steps

The setup process validates each stage before it reports success.

Validation Checkpoints

ConfigSet Validation: Verify configSet exists and contains system fields
Collection Validation: Confirm collection creation and accessibility
Schema Validation: Check all 25 system fields are present
Index Validation: Test document indexing with system fields
Search Validation: Verify query functionality and field access

Example Validation Output

{
  "setup_status": "success",
  "validation": {
    "configSet": {
      "name": "openregister_nc_f0e53393",
      "exists": true,
      "system_fields": 25
    },
    "collection": {
      "name": "openregister_nc_f0e53393", 
      "exists": true,
      "documents": 0
    },
    "schema": {
      "total_fields": 30,
      "system_fields": 25,
      "basic_fields": 5
    },
    "test_document": {
      "indexed": true,
      "searchable": true,
      "fields_populated": [
        "self_tenant",
        "self_uuid", 
        "self_name",
        "self_description",
        "self_created"
      ]
    }
  }
}
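
A report in this shape is straightforward to check automatically, for instance in a deployment script. The Python sketch below assumes the key names shown in the example output; the function name 'validation_problems' and the wording of the messages are hypothetical.

```python
from typing import List

EXPECTED_SYSTEM_FIELDS = 25


def validation_problems(report: dict) -> List[str]:
    """Return human-readable problems; an empty list means the report looks healthy."""
    problems: List[str] = []
    if report.get("setup_status") != "success":
        problems.append("setup_status is not 'success'")

    validation = report.get("validation", {})
    for section in ("configSet", "collection"):
        if not validation.get(section, {}).get("exists"):
            problems.append(f"{section} does not exist")

    schema = validation.get("schema", {})
    if schema.get("system_fields") != EXPECTED_SYSTEM_FIELDS:
        problems.append("unexpected number of system fields")
    if schema.get("system_fields", 0) + schema.get("basic_fields", 0) != schema.get("total_fields"):
        problems.append("field counts do not add up")

    test_doc = validation.get("test_document", {})
    if not (test_doc.get("indexed") and test_doc.get("searchable")):
        problems.append("test document was not indexed or is not searchable")
    return problems
```

For the example output above, the function returns an empty list: both resources exist, 25 system fields plus 5 basic fields equal the 30 total fields, and the test document is indexed and searchable.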

SOLR Management Dashboard

OpenRegister includes a comprehensive SOLR Management Dashboard accessible through the Settings page that provides:

Dashboard Features

Real-time Statistics

Connection Status: Live SOLR connectivity monitoring
Document Count: Total indexed documents (e.g., 13,489 objects)
Collection Information: Active tenant-specific collection names
Tenant ID: Current tenant identification

Interactive Operations

Warmup Index

Object Count Prediction: Displays total objects to be processed
Batch Calculation: Shows estimated batches (e.g., 14 batches for 13,489 objects)
Duration Estimation: Provides time estimates (e.g., ~21 seconds for parallel mode)
Execution Modes:
- Serial Mode (safer, slower)
- Parallel Mode (faster, more resource intensive)
Progress Tracking: Real-time warmup progress with loading states
Results Display: Comprehensive results with execution time and statistics
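
The batch figure shown in the dashboard is a ceiling division of the object count by the batch size. The Python sketch below illustrates the arithmetic; the default batch size of 1000 is an assumption that is consistent with the example numbers (13,489 objects in 14 batches) and is not confirmed by this guide.

```python
def batch_count(total_objects: int, batch_size: int = 1000) -> int:
    """Number of batches needed to process total_objects (ceiling division)."""
    if total_objects < 0:
        raise ValueError("total_objects must not be negative")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (total_objects + batch_size - 1) // batch_size


if __name__ == "__main__":
    print(batch_count(13489))  # 14
```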

Clear Index

Confirmation Dialog: Safety confirmation before clearing
API Integration: Calls the collection clear endpoint ('/api/solr/collections/{name}/clear')
Error Handling: Proper error feedback and recovery options

User Experience Features

Loading States: Spinners and disabled controls during operations
Error Feedback: Clear error messages with troubleshooting information
State Management: Proper modal state handling and cleanup
Debug Logging: Comprehensive logging for troubleshooting

Dashboard Architecture

Troubleshooting

Common Setup Issues

ConfigSet Upload Fails

Symptoms: HTTP 401 errors during configSet creation

Causes:

Attempting to use CREATE with trusted configSet
Authentication issues in SolrCloud

Solution: Use UPLOAD method with ZIP file

# Test configSet upload manually
curl -X POST -H "Content-Type:application/octet-stream" \
  --data-binary @openregister-configset.zip \
  "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=test_config"

Collection Creation Fails

Symptoms: "Underlying core creation failed" errors

Causes:

Invalid schema in configSet
Missing field type definitions
ZooKeeper connectivity issues

Solution: Validate configSet schema and ZooKeeper connection

# Check ZooKeeper connectivity
docker exec master-solr-1 /opt/solr/bin/solr healthcheck -c openregister_nc_f0e53393

# Validate schema
curl "http://localhost:8983/solr/openregister_nc_f0e53393/schema/fields?wt=json"

System Fields Missing

Symptoms: Fields not found in schema or search results

Causes:

ConfigSet ZIP doesn't contain updated schema
Collection created with wrong configSet
Schema not properly uploaded

Solution: Recreate configSet with system fields

# Verify system fields in collection
curl "http://localhost:8983/solr/openregister_nc_f0e53393/schema/fields?wt=json" | \
  grep -c "self_"

Diagnostic Commands

# Check SOLR health (node-level endpoint; in SolrCloud, /admin/ping exists only per collection)
docker exec master-solr-1 curl "http://localhost:8983/solr/admin/info/system?wt=json"

# List configSets
docker exec master-solr-1 curl "http://localhost:8983/solr/admin/configs?action=LIST&wt=json"

# List collections  
docker exec master-solr-1 curl "http://localhost:8983/solr/admin/collections?action=LIST&wt=json"

# Check collection schema
docker exec master-solr-1 curl "http://localhost:8983/solr/{collection}/schema/fields?wt=json"

# Test document indexing
docker exec master-solr-1 curl -X POST -H 'Content-Type: application/json' \
  "http://localhost:8983/solr/{collection}/update/json/docs" \
  -d '{"id": "test", "self_tenant": "nc_test", "self_name": "Test Document"}'

# Commit changes
docker exec master-solr-1 curl -X POST \
  "http://localhost:8983/solr/{collection}/update?commit=true"

# Search test
docker exec master-solr-1 curl \
  "http://localhost:8983/solr/{collection}/select?q=*:*&wt=json"

Performance Considerations

Setup Performance

ConfigSet Upload: ~500ms (one-time per tenant)
Collection Creation: ~800ms (one-time per tenant)
Schema Validation: ~100ms (per setup)
Total Setup Time: ~1.5 seconds

Runtime Performance Benefits

No Runtime Field Creation: System fields already exist in the schema, so indexing never waits on schema changes
Pre-optimized Schema: Faster query planning and execution
Consistent Structure: Predictable performance characteristics
Tenant Isolation: No cross-tenant query overhead

Memory Usage

ConfigSet Storage: ~200KB per tenant configSet
System Fields: Minimal indexing overhead
Schema Cache: Shared across all documents in collection

Production Deployment

Infrastructure Requirements

# Production SOLR configuration
services:
  zookeeper:
    image: zookeeper:3.8
    deploy:
      replicas: 3
    environment:
      - ZOO_MY_ID=1
      - ZOO_SERVERS=server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
      
  solr:
    image: solr:9-slim
    deploy:
      replicas: 2
    environment:
      - SOLR_HEAP=2g
      - ZK_HOST=zoo1:2181,zoo2:2181,zoo3:2181
      - SOLR_LOG_LEVEL=WARN
    command:
      - solr
      - -c
      - -f

Monitoring Setup

# Add monitoring for SOLR
# Note: the full image is used here because the slim image does not bundle the Prometheus exporter.
  solr-exporter:
    image: solr:9
    command:
      - /opt/solr/prometheus-exporter/bin/solr-exporter
      - -p
      - '9854'
      - -z
      - zoo1:2181,zoo2:2181,zoo3:2181
      - -f
      - /opt/solr/prometheus-exporter/conf/solr-exporter-config.xml

Backup Strategy

# Backup configSets
docker exec solr1 /opt/solr/bin/solr zk cp zk:/configs /backup/configs -r -z zoo1:2181

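# Backup collections (Collections API; 'location' must be a path every Solr node can write to)
docker exec solr1 curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=backup-$(date +%Y%m%d)&collection=openregister_nc_f0e53393&location=/backup"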

Migration and Upgrades

ConfigSet Updates

When system fields need updates:

Update local configSet files
Recreate ZIP package
Upload new configSet version
Reload collection configuration
Validate field changes

Version Management

# Tag configSet versions
docker exec master-solr-1 curl -X POST \
  -H "Content-Type: application/octet-stream" \
  "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=openregister_v2&wt=json" \
  --data-binary @openregister-configset-v2.zip

# Update collection to use new configSet
docker exec master-solr-1 curl \
  "http://localhost:8983/solr/admin/collections?action=MODIFYCOLLECTION&collection=openregister_nc_f0e53393&collection.configName=openregister_v2"

Best Practices

Development

Always test setup process with clean SOLR instance
Validate system fields after each schema change
Use version control for configSet files
Document field purpose and usage patterns

Production

Monitor setup success rates and performance
Implement automated configSet backups
Test disaster recovery procedures
Monitor system field usage and performance

Security

Restrict SOLR admin interface access
Use authentication in production environments
Implement network-level access controls
Regular security updates for SOLR and ZooKeeper

Dense Vector Configuration

OpenRegister supports Solr 9+ dense vector search for semantic similarity operations. The system automatically configures vector fields when setting up collections.

Requirements

Solr Version: 9.0+ (dense vector support introduced in Solr 9.0)
Field Type: knn_vector (DenseVectorField)
Field Name: _embedding_ (reserved system field, hardcoded)

Automatic Configuration

When running Solr setup, the system automatically:

Creates knn_vector field type with appropriate dimensions
Configures _embedding_ field in both file and object collections
Sets up supporting fields for vector metadata

Field Configuration

Vector Field Type:

<fieldType name="knn_vector" class="solr.DenseVectorField" 
           vectorDimension="4096" 
           similarityFunction="cosine" 
           knnAlgorithm="hnsw"/>

Vector Field:

<field name="_embedding_" type="knn_vector" indexed="true" stored="true" multiValued="false"/>

Important Notes:

_embedding_ is a reserved system field and cannot be changed
Field type must be knn_vector, not pfloat or other types
Vector dimensions should match your embedding model (default: 4096 for Ollama)
The field is single-valued (not multiValued) - one vector per document

Vector Storage

Vectors are stored directly in existing collections:

Files: Stored in fileCollection alongside file chunks
Objects: Stored in objectCollection alongside object data

This enables:

Single source of truth for each entity
Full document retrieval without additional lookups
Atomic updates to existing documents

KNN Search

Once configured, semantic search uses Solr's KNN query parser:

{!knn f=_embedding_ topK=10}[query_vector_array]

This returns the 10 most similar documents based on cosine similarity.
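
Building the query string from an embedding is mostly a formatting exercise. The Python sketch below shows one way to do it; the function name 'knn_query' and the optional dimension check are illustrative assumptions, while the '{!knn f=... topK=...}' syntax and the '_embedding_' field name are the ones described above.

```python
from typing import Optional, Sequence


def knn_query(vector: Sequence[float], field: str = "_embedding_",
              top_k: int = 10, dimension: Optional[int] = None) -> str:
    """Format a Solr KNN query string for the given query vector."""
    if not vector:
        raise ValueError("query vector must not be empty")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if dimension is not None and len(vector) != dimension:
        raise ValueError(f"expected {dimension} dimensions, got {len(vector)}")
    values = ", ".join(repr(float(v)) for v in vector)
    # Doubled braces produce literal '{' and '}' in an f-string.
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


if __name__ == "__main__":
    print(knn_query([0.1, 0.2, 0.3], top_k=5))
    # {!knn f=_embedding_ topK=5}[0.1, 0.2, 0.3]
```

Passing the configured vectorDimension as 'dimension' catches a mismatched embedding model before the request reaches Solr.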

Troubleshooting

Error: "multiple values encountered for non multiValued field _embedding_"

This indicates the field was incorrectly configured as pfloat instead of knn_vector. Solution:

# Run Solr setup to fix schema
docker exec -u 33 master-nextcloud-1 php occ openregister:solr:manage setup

Error: Field type not found

Ensure Solr version is 9.0+ and run setup to create the field type automatically.

Performance Issues

Verify HNSW indexing is enabled
Check vector dimensions match your embedding model
Monitor Solr performance metrics

File Warmup API

The File Warmup API provides endpoints for bulk file processing, text extraction, chunking, and SOLR indexing. These endpoints enable efficient batch operations for indexing large numbers of files.

Endpoints

1. Warmup Files

POST /api/solr/warmup/files

Bulk process and index files in SOLR file collection.

Request Body:

{
  "max_files": 1000,
  "batch_size": 100,
  "file_types": ["application/pdf", "text/plain"],
  "skip_indexed": true,
  "mode": "parallel"
}

Parameters:

max_files (optional): Maximum number of files to process (default: 1000)
batch_size (optional): Number of files to process per batch (default: 100)
file_types (optional): Array of MIME types to filter (e.g., ["application/pdf"])
skip_indexed (optional): Skip files already indexed in Solr (default: true)
mode (optional): Processing mode - "parallel" or "sequential" (default: "parallel")

Response:

{
  "success": true,
  "message": "File warmup completed",
  "files_processed": 847,
  "indexed": 844,
  "failed": 3,
  "errors": ["File 123: No extracted text available"],
  "mode": "parallel"
}
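
A client can assemble the request body from the documented parameters and defaults before posting it. The Python sketch below is one possible helper, assuming the parameter names and defaults listed above; the function name 'warmup_payload' and the client-side validation rules are hypothetical and may be stricter or looser than what the server enforces.

```python
from typing import Dict, Iterable, Optional

ALLOWED_MODES = ("parallel", "sequential")


def warmup_payload(max_files: int = 1000, batch_size: int = 100,
                   file_types: Optional[Iterable[str]] = None,
                   skip_indexed: bool = True, mode: str = "parallel") -> Dict[str, object]:
    """Build the JSON body for POST /api/solr/warmup/files."""
    if max_files < 1 or batch_size < 1:
        raise ValueError("max_files and batch_size must be positive")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}")
    payload: Dict[str, object] = {
        "max_files": max_files,
        "batch_size": batch_size,
        "skip_indexed": skip_indexed,
        "mode": mode,
    }
    # file_types is optional; omit it entirely when no filter is wanted.
    if file_types:
        payload["file_types"] = list(file_types)
    return payload
```

Serializing the result with json.dumps() yields a body equivalent to the request example above.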

2. Index Specific File

POST /api/solr/files/{fileId}/index

Index a single file in SOLR.

Response:

{
  "success": true,
  "message": "File indexed successfully",
  "file_id": 5213
}

3. Reindex All Files

POST /api/solr/files/reindex

Reindex all files that have completed text extraction.

Request Body:

{
  "max_files": 1000,
  "batch_size": 100
}

Response:

{
  "success": true,
  "message": "Reindex completed",
  "files_processed": 500,
  "indexed": 497,
  "failed": 3,
  "errors": []
}

4. Get File Index Statistics

GET /api/solr/files/stats

Get statistics about indexed files.

Response:

{
  "success": true,
  "total_chunks": 4235,
  "unique_files": 847,
  "mime_types": {
    "application/pdf": 500,
    "text/plain": 200,
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": 147
  },
  "collection": "openregister_files"
}
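
The statistics lend themselves to simple derived figures such as the average number of chunks per file. The Python sketch below assumes the response keys shown above and assumes that the 'mime_types' counts are per file, which the example suggests because they sum to 'unique_files'; the function name is hypothetical.

```python
from typing import Dict


def summarise_stats(stats: dict) -> Dict[str, object]:
    """Derive average chunks per file and check the MIME breakdown against the file count."""
    unique_files = stats.get("unique_files", 0)
    total_chunks = stats.get("total_chunks", 0)
    mime_total = sum(stats.get("mime_types", {}).values())
    return {
        "avg_chunks_per_file": total_chunks / unique_files if unique_files else 0.0,
        "mime_total": mime_total,
        "mime_matches_unique_files": mime_total == unique_files,
    }
```

For the example response, 4235 chunks over 847 files gives an average of 5 chunks per file, and the three MIME counts (500 + 200 + 147) add up to the 847 unique files.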

Usage Examples

cURL: Warmup Files

curl -X POST -u 'admin:admin' \
  -H 'Content-Type: application/json' \
  -d '{
    "max_files": 500,
    "batch_size": 50,
    "file_types": ["application/pdf"],
    "skip_indexed": true
  }' \
  http://master-nextcloud-1/index.php/apps/openregister/api/solr/warmup/files

cURL: Index Specific File

curl -X POST -u 'admin:admin' \
  http://master-nextcloud-1/index.php/apps/openregister/api/solr/files/5213/index

cURL: Get Stats

curl -u 'admin:admin' \
  http://master-nextcloud-1/index.php/apps/openregister/api/solr/files/stats

Error Handling

All endpoints return standard HTTP status codes:

200: Success
422: Unprocessable (e.g., file has no extracted text)
500: Internal server error

Error responses include:

{
  "success": false,
  "message": "Error description here"
}

Implementation Details

The warmup endpoints are implemented in SettingsController.php:

warmupFiles(): Gets files that need indexing, filters by MIME type if specified, processes in batches, returns comprehensive results
indexFile(int $fileId): Indexes a single file, returns success/failure
reindexFiles(): Gets all completed file texts, reindexes in batches, returns statistics
getFileIndexStats(): Queries SOLR for statistics, returns chunk counts and file counts

Integration with Frontend

These endpoints are used by:

SOLR Configuration Modal - File warmup UI
File Management Dialog - Individual file indexing
Dashboard - Statistics display

Published-Only Indexing Strategy

OpenRegister implements a published-only indexing strategy for Apache Solr search functionality. This means that only objects with a published date are indexed to Solr, ensuring that search results only contain publicly available content.

Implementation Details

Current Behavior

Single Object Indexing: The indexObject() method checks if an object has a published date before indexing
Bulk Indexing: Both bulkIndexFromDatabase() and bulkIndexFromDatabaseOptimized() methods filter out unpublished objects
Search Results: Only published objects appear in search results since unpublished objects are not indexed
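
The rule itself is a single predicate applied before indexing. The Python sketch below illustrates it on plain dictionaries; it is not the logic from GuzzleSolrService, and the function names and the dictionary representation of objects are assumptions made for the example.

```python
from typing import List, Tuple


def should_index(obj: dict) -> bool:
    """An object is indexed only when it has a published date."""
    return bool(obj.get("published"))


def split_for_indexing(objects: List[dict]) -> Tuple[List[dict], int]:
    """Return the objects to index and the number of unpublished objects skipped."""
    to_index = [obj for obj in objects if should_index(obj)]
    return to_index, len(objects) - len(to_index)


if __name__ == "__main__":
    batch = [
        {"uuid": "a", "published": "2024-01-15T10:00:00Z"},
        {"uuid": "b", "published": None},
        {"uuid": "c"},
    ]
    indexed, skipped = split_for_indexing(batch)
    print([obj["uuid"] for obj in indexed], skipped)
```

The skipped count corresponds to what the bulk methods report in their log entries.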

Code Locations

The published-only logic is implemented in:

lib/Service/GuzzleSolrService.php::indexObject() - Single object indexing
lib/Service/GuzzleSolrService.php::bulkIndexFromDatabase() - Bulk indexing (serial mode)
lib/Service/GuzzleSolrService.php::bulkIndexFromDatabaseOptimized() - Bulk indexing (optimized mode)

Database vs Solr Counts

The system tracks two different counts:

Published Count: Number of objects in the database with a published date (from oc_openregister_objects table)
Indexed Count: Number of documents actually indexed in Solr (should match published count)

These counts are displayed in the Solr Configuration dashboard to help administrators monitor indexing status.

Benefits

Relevant Search Results: Users only see content that is meant to be public
Performance: Smaller Solr index size improves search performance
Security: Unpublished/draft content is not accidentally exposed through search
Resource Efficiency: Reduced storage and memory usage in Solr

Monitoring

Dashboard Statistics

The Solr Configuration dashboard shows:

Indexed Documents: Number of documents in Solr
Published Objects Available: Total number of published objects in the database

If these numbers don't match, it indicates that some published objects haven't been indexed yet.

Logging

The system logs when unpublished objects are skipped:

Single objects: DEBUG level - 'Skipping indexing of unpublished object'
Bulk operations: INFO level - 'Skipped unpublished objects in batch'

Configuration

No additional configuration is required. The published-only indexing is enabled by default and works automatically based on the published field in object entities.

Troubleshooting

Common Issues

Mismatched Counts (indexed count < published count):
- Run a Solr warmup to re-index all published objects
- Check Solr logs for indexing errors

Objects Not Appearing in Search:
- Verify the object has a published date set
- Check if the object was indexed after being published
- Run a manual re-index if needed

Performance Issues:
- Monitor the published vs indexed ratio
- Consider batch size adjustments for bulk operations

Debugging

Enable debug logging to see which objects are being skipped:

// In your Nextcloud config
'loglevel' => 0, // Debug level

Look for log entries containing:

'Skipping indexing of unpublished object'
'Skipped unpublished objects in batch'

Future Considerations

TODO: Full Object Indexing

There are TODO comments in the code indicating that in the future, we may want to index all objects to Solr for comprehensive search capabilities. This would require:

Access Control: Implementing proper access control in search queries
Filtering: Adding published/unpublished filters to search results
Performance: Handling larger index sizes
Security: Ensuring unpublished content is properly protected

Collection-Specific Endpoints

OpenRegister uses RESTful collection-specific endpoints for Solr collection management operations. Collection names are specified as URL parameters, following REST principles.

Endpoints

1. Delete Specific Collection

DELETE /api/solr/collections/{name}

Controller Method: SettingsController::deleteSpecificSolrCollection(string $name)

Example:

curl -X DELETE "http://nextcloud.local/index.php/apps/openregister/api/solr/collections/nc_test_collection" \
  -u "admin:admin"

Response:

{
  "success": true,
  "message": "Collection deleted successfully",
  "collection": "nc_test_collection"
}

2. Clear Specific Collection

POST /api/solr/collections/{name}/clear

Controller Method: SettingsController::clearSpecificCollection(string $name)

Example:

curl -X POST "http://nextcloud.local/index.php/apps/openregister/api/solr/collections/nc_test_collection/clear" \
  -u "admin:admin"

Response:

{
  "success": true,
  "message": "Collection cleared successfully",
  "collection": "nc_test_collection"
}

3. Reindex Specific Collection

POST /api/solr/collections/{name}/reindex

Controller Method: SettingsController::reindexSpecificCollection(string $name)

Example:

curl -X POST "http://nextcloud.local/index.php/apps/openregister/api/solr/collections/nc_test_collection/reindex" \
  -u "admin:admin"

Response:

{
  "success": true,
  "message": "Reindex completed successfully",
  "stats": {
    "processed_objects": 1250,
    "duration_seconds": 4.5
  },
  "collection": "nc_test_collection"
}

Benefits

RESTful Design

Collection name is now part of the URL path, following REST principles
Resources are clearly identified by their URLs
HTTP verbs (DELETE, POST) indicate the action

Improved API Clarity

No ambiguity about which collection is being operated on
Collection name is explicit in every request
Easier to read API logs and debug issues

Better Error Handling

404 errors now correctly indicate "collection not found"
URL validation happens at the routing level
Clearer separation between route parameters and request body

Migration from Old Endpoints

The following old endpoints have been removed:

❌ POST /api/solr/reindex (replaced by /api/solr/collections/{name}/reindex)
❌ POST /api/settings/solr/clear (replaced by /api/solr/collections/{name}/clear)
❌ DELETE /api/solr/collection/delete (replaced by DELETE /api/solr/collections/{name})
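
Client code that previously called the old endpoints needs the collection name in the path. The Python sketch below shows one way to build the method and path for each operation; the helper name 'collection_route' is hypothetical, and the paths are relative to the app's base URL.

```python
from typing import Tuple
from urllib.parse import quote

# action -> (HTTP method, path suffix after the collection name)
ROUTES = {
    "delete": ("DELETE", ""),
    "clear": ("POST", "/clear"),
    "reindex": ("POST", "/reindex"),
}


def collection_route(name: str, action: str) -> Tuple[str, str]:
    """Return (HTTP method, path) for a collection-specific operation."""
    if action not in ROUTES:
        raise ValueError(f"unknown action: {action}")
    if not name:
        raise ValueError("collection name must not be empty")
    method, suffix = ROUTES[action]
    # Percent-encode the name so it cannot alter the path structure.
    return method, f"/api/solr/collections/{quote(name, safe='')}{suffix}"


if __name__ == "__main__":
    print(collection_route("nc_test_collection", "reindex"))
```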

Related Documentation

Vector Search Backends - Complete vector backend guide
Vectorization Architecture - How vectors are generated
Solr Development Troubleshooting - Development troubleshooting guide

This comprehensive setup process ensures reliable, performant, and maintainable SOLR integration with proper tenant isolation and pre-configured system fields for optimal runtime performance.
