SOLR Setup and Configuration Guide
This guide documents the complete SOLR setup process used by OpenRegister, including SolrCloud configuration requirements, authentication handling, and system field pre-configuration.
Overview
OpenRegister uses Apache SOLR in SolrCloud mode with a sophisticated setup process that automatically creates tenant-specific configSets and collections with pre-configured system fields. This approach ensures optimal performance, proper tenant isolation, and eliminates runtime field creation overhead.
The system includes a comprehensive SOLR Management Dashboard that provides real-time monitoring, warmup operations, and index management capabilities with proper loading states and error handling.
SOLR Configuration Requirements
SolrCloud Mode
OpenRegister requires SOLR to run in SolrCloud mode with ZooKeeper coordination:
# docker-compose.yml
services:
  solr:
    image: solr:9-slim
    container_name: master-solr-1
    restart: always
    ports:
      - '8983:8983'
    volumes:
      - solr:/var/solr
    environment:
      - SOLR_HEAP=512m
      # SolrCloud mode with embedded ZooKeeper
      - ZK_HOST=localhost:9983
    command:
      # Start in cloud mode (no precreate needed)
      - solr
      - -c
      - -f
    healthcheck:
      test: ['CMD-SHELL', 'curl -f http://localhost:8983/solr/admin/ping || exit 1']
      interval: 30s
      timeout: 10s
      retries: 3

  # Optional: External ZooKeeper for production
  zookeeper:
    image: zookeeper:3.8
    container_name: master-zookeeper-1
    restart: always
    ports:
      - '2181:2181'
      - '9983:9983'
    environment:
      - ZOO_MY_ID=1
      - ZOO_SERVERS=server.1=0.0.0.0:2888:3888;2181
Key SolrCloud Requirements
- Cloud Mode: SOLR must start with the '-c' flag
- ZooKeeper: Required for configSet and collection management
- No Authentication: Default setup without security (development)
- ConfigSet API: Must support UPLOAD action for ZIP-based configSets
- Collection API: Must support CREATE with configName reference
Authentication and Security
Development Configuration
For development environments, SOLR runs without authentication:
# No authentication required
environment:
- SOLR_AUTH_TYPE=none
Production Considerations
For production deployments, consider:
# Basic authentication (if needed)
environment:
- SOLR_AUTH_TYPE=basic
- SOLR_AUTHENTICATION_OPTS='-Dbasicauth=admin:password'
Important: OpenRegister's setup process uses ZIP upload to bypass authentication issues with trusted configSet creation in SolrCloud mode.
Setup Process Architecture
1. Tenant ID Generation
Each Nextcloud instance gets a unique tenant ID:
/**
 * Generate tenant-specific identifier
 * Format: nc_{8-character-hash}
 * Example: nc_f0e53393
 */
private function getTenantId(): string
{
    $instanceId = $this->systemConfig->getValue('instanceid');
    return 'nc_' . substr(hash('sha256', $instanceId), 0, 8);
}
2. ConfigSet Creation Strategy
Base ConfigSet Download
OpenRegister downloads the working '_default' configSet as a foundation:
# Inside SOLR container
/opt/solr/bin/solr zk downconfig -n _default -d /tmp/default_config -z localhost:9983
System Fields Integration
The base schema is enhanced with 25 pre-configured system fields:
<!-- OpenRegister System Fields (self_*) - Always present for all tenants -->
<!-- Core tenant field -->
<field name="self_tenant" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<!-- Metadata fields -->
<field name="self_object_id" type="pint" indexed="true" stored="true" multiValued="false" />
<field name="self_uuid" type="string" indexed="true" stored="true" multiValued="false" />
<!-- Context fields -->
<field name="self_register" type="pint" indexed="true" stored="true" multiValued="false" />
<field name="self_schema" type="pint" indexed="true" stored="true" multiValued="false" />
<field name="self_schema_version" type="string" indexed="true" stored="true" multiValued="false" />
<!-- Ownership and metadata -->
<field name="self_owner" type="string" indexed="true" stored="true" multiValued="false" />
<field name="self_organisation" type="string" indexed="true" stored="true" multiValued="false" />
<field name="self_application" type="string" indexed="true" stored="true" multiValued="false" />
<!-- Core object fields -->
<field name="self_name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="self_description" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="self_summary" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="self_image" type="string" indexed="false" stored="true" multiValued="false" />
<field name="self_slug" type="string" indexed="true" stored="true" multiValued="false" />
<field name="self_uri" type="string" indexed="true" stored="true" multiValued="false" />
<field name="self_version" type="string" indexed="true" stored="true" multiValued="false" />
<field name="self_size" type="string" indexed="false" stored="true" multiValued="false" />
<field name="self_folder" type="string" indexed="true" stored="true" multiValued="false" />
<!-- Timestamps -->
<field name="self_created" type="pdate" indexed="true" stored="true" multiValued="false" />
<field name="self_updated" type="pdate" indexed="true" stored="true" multiValued="false" />
<field name="self_published" type="pdate" indexed="true" stored="true" multiValued="false" />
<field name="self_depublished" type="pdate" indexed="true" stored="true" multiValued="false" />
<!-- Relation fields -->
<field name="self_relations" type="string" indexed="true" stored="true" multiValued="true" />
<field name="self_files" type="string" indexed="true" stored="true" multiValued="true" />
<field name="self_parent_uuid" type="string" indexed="true" stored="true" multiValued="false" />
ZIP Package Creation
The enhanced configSet is packaged into a ZIP file:
/**
 * Create configSet ZIP package
 * Location: resources/solr/openregister-configset.zip
 */
private function createConfigSetZip(): void
{
    $zipFile = $this->appPath . '/resources/solr/openregister-configset.zip';
    $configPath = $this->appPath . '/resources/solr/default_configset';

    $zip = new ZipArchive();
    $zip->open($zipFile, ZipArchive::CREATE | ZipArchive::OVERWRITE);

    // Add all configSet files
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($configPath),
        RecursiveIteratorIterator::LEAVES_ONLY
    );
    foreach ($files as $file) {
        if (!$file->isDir()) {
            $filePath = $file->getRealPath();
            $relativePath = substr($filePath, strlen($configPath) + 1);
            $zip->addFile($filePath, $relativePath);
        }
    }
    $zip->close();
}
3. ConfigSet Upload Process
Upload API Usage
OpenRegister uses SOLR's UPLOAD API to create configSets:
/**
 * Upload configSet via ZIP to bypass authentication
 */
private function uploadConfigSet(string $configSetName): bool
{
    $zipPath = $this->appPath . '/resources/solr/openregister-configset.zip';
    $zipContent = file_get_contents($zipPath);

    $url = $this->buildSolrUrl() . '/admin/configs';
    $params = [
        'action' => 'UPLOAD',
        'name' => $configSetName,
        'wt' => 'json'
    ];

    try {
        $response = $this->httpClient->post($url . '?' . http_build_query($params), [
            'body' => $zipContent,
            'headers' => [
                'Content-Type' => 'application/octet-stream'
            ]
        ]);
        return $response->getStatusCode() === 200;
    } catch (RequestException $e) {
        $this->logger->error('ConfigSet upload failed', [
            'error' => $e->getMessage(),
            'configSet' => $configSetName
        ]);
        return false;
    }
}
Why ZIP Upload vs CREATE
| Method | Authentication | Trusted ConfigSets | Status |
|---|---|---|---|
| CREATE | Required | ❌ Fails with 401 | Not usable |
| UPLOAD | Not required | ✅ Works | Used |
The UPLOAD method bypasses SolrCloud's authentication requirement for creating configSets from trusted templates.
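For manual experimentation, the same request the PHP service sends can be composed as a URL. A minimal Python sketch (the helper name and base URL are illustrative; the endpoint and parameters mirror the code above):

```python
from urllib.parse import urlencode

def build_configset_upload_url(base: str, name: str) -> str:
    """Compose the ConfigSet UPLOAD endpoint used by the setup process."""
    params = {"action": "UPLOAD", "name": name, "wt": "json"}
    return f"{base}/admin/configs?{urlencode(params)}"

url = build_configset_upload_url("http://localhost:8983/solr", "openregister_nc_f0e53393")
print(url)
# http://localhost:8983/solr/admin/configs?action=UPLOAD&name=openregister_nc_f0e53393&wt=json
```

The ZIP body is then POSTed to this URL with a `Content-Type: application/octet-stream` header, as shown in the PHP example.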
4. Collection Creation
Tenant-Specific Naming
Collections use tenant-specific naming for isolation:
/**
 * Generate tenant-specific collection name
 * Format: openregister_{tenantId}
 * Example: openregister_nc_f0e53393
 */
public function getTenantSpecificCollectionName(): string
{
    return 'openregister_' . $this->getTenantId();
}
Collection API Call
/**
 * Create collection with tenant-specific configSet
 */
private function createCollection(string $collectionName, string $configSetName): bool
{
    $url = $this->buildSolrUrl() . '/admin/collections';
    $params = [
        'action' => 'CREATE',
        'name' => $collectionName,
        'collection.configName' => $configSetName,
        'numShards' => 1,
        'replicationFactor' => 1,
        'wt' => 'json'
    ];

    try {
        $response = $this->httpClient->get($url . '?' . http_build_query($params));
        $data = json_decode($response->getBody()->getContents(), true);
        return isset($data['responseHeader']['status']) &&
            $data['responseHeader']['status'] === 0;
    } catch (RequestException $e) {
        $this->logger->error('Collection creation failed', [
            'error' => $e->getMessage(),
            'collection' => $collectionName,
            'configSet' => $configSetName
        ]);
        return false;
    }
}
System Fields Architecture
Field Categories
OpenRegister pre-configures 25 system fields across six categories:
1. Core Identity (3 fields)
- 'self_tenant': Tenant isolation (required)
- 'self_object_id': Database object ID
- 'self_uuid': Unique identifier
2. Context Fields (3 fields)
- 'self_register': Register association
- 'self_schema': Schema reference
- 'self_schema_version': Schema version tracking
3. Ownership (3 fields)
- 'self_owner': Object owner
- 'self_organisation': Organization association
- 'self_application': Source application
4. Object Metadata (9 fields)
- 'self_name': Object name
- 'self_description': Full description
- 'self_summary': Short summary
- 'self_image': Image reference
- 'self_slug': URL-friendly identifier
- 'self_uri': Resource URI
- 'self_version': Object version
- 'self_size': Size information
- 'self_folder': Folder location
5. Timestamps (4 fields)
- 'self_created': Creation timestamp
- 'self_updated': Last modification
- 'self_published': Publication date
- 'self_depublished': Unpublication date
6. Relations (3 fields)
- 'self_relations': Related objects (multi-valued)
- 'self_files': Attached files (multi-valued)
- 'self_parent_uuid': Parent relationship
Field Type Mapping
The system fields use four SOLR field types, visible in the schema excerpt above:
| SOLR Type | Used For | Example Fields |
|---|---|---|
| string | Exact-match identifiers | self_tenant, self_uuid, self_slug |
| text_general | Tokenized full-text search | self_description, self_summary |
| pint | Integer references | self_object_id, self_register, self_schema |
| pdate | Timestamps | self_created, self_updated, self_published |
Benefits of Pre-configured Fields
- Performance: No runtime field creation overhead
- Consistency: All tenants have identical system field structure
- Reliability: Eliminates schema validation errors during indexing
- Maintenance: Centralized field definition management
- Development: Predictable field availability for queries
Setup Validation Process
Automated Validation Steps
The setup process includes comprehensive validation:
Validation Checkpoints
- ConfigSet Validation: Verify configSet exists and contains system fields
- Collection Validation: Confirm collection creation and accessibility
- Schema Validation: Check all 25 system fields are present
- Index Validation: Test document indexing with system fields
- Search Validation: Verify query functionality and field access
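The schema checkpoint can be scripted against the Schema API response. A Python sketch that counts the self_* system fields in a parsed /schema/fields payload (the function name is illustrative and the sample data is abbreviated, not a real response):

```python
def count_system_fields(schema_fields: list[dict]) -> int:
    """Count OpenRegister system fields (self_* prefix) in a schema field listing."""
    return sum(1 for f in schema_fields if f.get("name", "").startswith("self_"))

# Abbreviated sample of what /schema/fields?wt=json returns under "fields"
sample = [
    {"name": "self_tenant", "type": "string"},
    {"name": "self_uuid", "type": "string"},
    {"name": "id", "type": "string"},
]
print(count_system_fields(sample))  # 2
```

A complete collection should report 25 system fields, matching the pre-configured schema.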
Example Validation Output
{
"setup_status": "success",
"validation": {
"configSet": {
"name": "openregister_nc_f0e53393",
"exists": true,
"system_fields": 25
},
"collection": {
"name": "openregister_nc_f0e53393",
"exists": true,
"documents": 0
},
"schema": {
"total_fields": 30,
"system_fields": 25,
"basic_fields": 5
},
"test_document": {
"indexed": true,
"searchable": true,
"fields_populated": [
"self_tenant",
"self_uuid",
"self_name",
"self_description",
"self_created"
]
}
}
}
SOLR Management Dashboard
OpenRegister includes a comprehensive SOLR Management Dashboard accessible through the Settings page that provides:
Dashboard Features
Real-time Statistics
- Connection Status: Live SOLR connectivity monitoring
- Document Count: Total indexed documents (e.g., 13,489 objects)
- Collection Information: Active tenant-specific collection names
- Tenant ID: Current tenant identification
Interactive Operations
Warmup Index
- Object Count Prediction: Displays total objects to be processed
- Batch Calculation: Shows estimated batches (e.g., 14 batches for 13,489 objects)
- Duration Estimation: Provides time estimates (e.g., ~21 seconds for parallel mode)
- Execution Modes:
- Serial Mode (safer, slower)
- Parallel Mode (faster, more resource intensive)
- Progress Tracking: Real-time warmup progress with loading states
- Results Display: Comprehensive results with execution time and statistics
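The batch and duration estimates shown in the dashboard reduce to a simple calculation. A sketch that reproduces the figures above, assuming the ~1,000-object batch size and ~1.5 s per-batch time implied by those numbers (both are inferred here, not the app's documented constants):

```python
import math

def estimate_warmup(total_objects: int, batch_size: int = 1000,
                    seconds_per_batch: float = 1.5) -> tuple[int, float]:
    """Estimate batch count and total duration for an index warmup run."""
    batches = math.ceil(total_objects / batch_size)
    return batches, batches * seconds_per_batch

batches, eta = estimate_warmup(13489)
print(batches, eta)  # 14 batches, ~21 seconds
```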
Clear Index
- Confirmation Dialog: Safety confirmation before clearing
- API Integration: Direct integration with '/api/settings/solr/clear' endpoint
- Error Handling: Proper error feedback and recovery options
User Experience Features
- Loading States: Spinners and disabled controls during operations
- Error Feedback: Clear error messages with troubleshooting information
- State Management: Proper modal state handling and cleanup
- Debug Logging: Comprehensive logging for troubleshooting
Dashboard Architecture
Troubleshooting
Common Setup Issues
ConfigSet Upload Fails
Symptoms: HTTP 401 errors during configSet creation
Causes:
- Attempting to use CREATE with trusted configSet
- Authentication issues in SolrCloud
Solution: Use UPLOAD method with ZIP file
# Test configSet upload manually
curl -X POST -H "Content-Type:application/octet-stream" \
--data-binary @openregister-configset.zip \
"http://localhost:8983/solr/admin/configs?action=UPLOAD&name=test_config"
Collection Creation Fails
Symptoms: "Underlying core creation failed" errors
Causes:
- Invalid schema in configSet
- Missing field type definitions
- ZooKeeper connectivity issues
Solution: Validate configSet schema and ZooKeeper connection
# Check ZooKeeper connectivity
docker exec master-solr-1 /opt/solr/bin/solr healthcheck -c openregister_nc_f0e53393
# Validate schema
curl "http://localhost:8983/solr/openregister_nc_f0e53393/schema/fields?wt=json"
System Fields Missing
Symptoms: Fields not found in schema or search results
Causes:
- ConfigSet ZIP doesn't contain updated schema
- Collection created with wrong configSet
- Schema not properly uploaded
Solution: Recreate configSet with system fields
# Verify system fields in collection
curl "http://localhost:8983/solr/openregister_nc_f0e53393/schema/fields?wt=json" | \
grep -c "self_"
Diagnostic Commands
# Check SOLR health
docker exec master-solr-1 curl "http://localhost:8983/solr/admin/ping"
# List configSets
docker exec master-solr-1 curl "http://localhost:8983/solr/admin/configs?action=LIST&wt=json"
# List collections
docker exec master-solr-1 curl "http://localhost:8983/solr/admin/collections?action=LIST&wt=json"
# Check collection schema
docker exec master-solr-1 curl "http://localhost:8983/solr/{collection}/schema/fields?wt=json"
# Test document indexing
docker exec master-solr-1 curl -X POST -H 'Content-Type: application/json' \
"http://localhost:8983/solr/{collection}/update/json/docs" \
-d '{"id": "test", "self_tenant": "nc_test", "self_name": "Test Document"}'
# Commit changes
docker exec master-solr-1 curl -X POST \
"http://localhost:8983/solr/{collection}/update?commit=true"
# Search test
docker exec master-solr-1 curl \
"http://localhost:8983/solr/{collection}/select?q=*:*&wt=json"
Performance Considerations
Setup Performance
- ConfigSet Upload: ~500ms (one-time per tenant)
- Collection Creation: ~800ms (one-time per tenant)
- Schema Validation: ~100ms (per setup)
- Total Setup Time: ~1.5 seconds
Runtime Performance Benefits
- No Dynamic Fields: Eliminates field creation overhead during indexing
- Pre-optimized Schema: Faster query planning and execution
- Consistent Structure: Predictable performance characteristics
- Tenant Isolation: No cross-tenant query overhead
Memory Usage
- ConfigSet Storage: ~200KB per tenant configSet
- System Fields: Minimal indexing overhead
- Schema Cache: Shared across all documents in collection
Production Deployment
Infrastructure Requirements
# Production SOLR configuration
services:
  zookeeper:
    image: zookeeper:3.8
    deploy:
      replicas: 3
    environment:
      - ZOO_MY_ID=1
      - ZOO_SERVERS=server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181

  solr:
    image: solr:9-slim
    deploy:
      replicas: 2
    environment:
      - SOLR_HEAP=2g
      - ZK_HOST=zoo1:2181,zoo2:2181,zoo3:2181
      - SOLR_LOG_LEVEL=WARN
    command:
      - solr
      - -c
      - -f
Monitoring Setup
# Add monitoring for SOLR
solr-exporter:
  image: solr:9-slim
  command:
    - /opt/solr/contrib/prometheus-exporter/bin/solr-exporter
    - -p 9854
    - -z zoo1:2181,zoo2:2181,zoo3:2181
    - -f /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml
Backup Strategy
# Backup configSets
docker exec solr1 /opt/solr/bin/solr zk cp zk:/configs /backup/configs -r -z zoo1:2181
# Backup collections via the Collections API
docker exec master-solr-1 curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=backup-$(date +%Y%m%d)&collection=openregister_nc_f0e53393&location=/backup"
Migration and Upgrades
ConfigSet Updates
When system fields need updates:
- Update local configSet files
- Recreate ZIP package
- Upload new configSet version
- Reload collection configuration
- Validate field changes
Version Management
# Tag configSet versions
docker exec master-solr-1 curl -X POST \
"http://localhost:8983/solr/admin/configs?action=UPLOAD&name=openregister_v2&wt=json" \
--data-binary @openregister-configset-v2.zip
# Update collection to use new configSet
docker exec master-solr-1 curl \
"http://localhost:8983/solr/admin/collections?action=MODIFYCOLLECTION&collection=openregister_nc_f0e53393&collection.configName=openregister_v2"
Best Practices
Development
- Always test setup process with clean SOLR instance
- Validate system fields after each schema change
- Use version control for configSet files
- Document field purpose and usage patterns
Production
- Monitor setup success rates and performance
- Implement automated configSet backups
- Test disaster recovery procedures
- Monitor system field usage and performance
Security
- Restrict SOLR admin interface access
- Use authentication in production environments
- Implement network-level access controls
- Regular security updates for SOLR and ZooKeeper
Dense Vector Configuration
OpenRegister supports Solr 9+ dense vector search for semantic similarity operations. The system automatically configures vector fields when setting up collections.
Requirements
- Solr Version: 9.0+ (dense vector support introduced in Solr 9.0)
- Field Type: knn_vector (DenseVectorField)
- Field Name: _embedding_ (reserved system field, hardcoded)
Automatic Configuration
When running Solr setup, the system automatically:
- Creates the knn_vector field type with the appropriate dimensions
- Configures the _embedding_ field in both the file and object collections
- Sets up supporting fields for vector metadata
Field Configuration
Vector Field Type:
<fieldType name="knn_vector" class="solr.DenseVectorField"
vectorDimension="4096"
similarityFunction="cosine"
knnAlgorithm="hnsw"/>
Vector Field:
<field name="_embedding_" type="knn_vector" indexed="true" stored="true" multiValued="false"/>
Important Notes:
- _embedding_ is a reserved system field and cannot be changed
- The field type must be knn_vector, not pfloat or another type
- The field is single-valued (not multiValued) - one vector per document
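Because the field is single-valued and dimension-bound, it is worth validating embeddings client-side before sending them to Solr. A minimal sketch (the function name is illustrative; the 4096 default comes from the note above):

```python
def validate_embedding(vector: list[float], expected_dim: int = 4096) -> None:
    """Reject vectors that would violate the DenseVectorField constraints."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"Vector has {len(vector)} dimensions, expected {expected_dim}"
        )
    if not all(isinstance(v, (int, float)) for v in vector):
        raise ValueError("Vector components must be numeric")

validate_embedding([0.0] * 4096)  # passes silently
```

Submitting a vector of the wrong length, or multiple vectors for one document, is what produces the indexing errors described in the troubleshooting section below.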
Vector Storage
Vectors are stored directly in existing collections:
- Files: Stored in fileCollection alongside file chunks
- Objects: Stored in objectCollection alongside object data
This enables:
- Single source of truth for each entity
- Full document retrieval without additional lookups
- Atomic updates to existing documents
KNN Search
Once configured, semantic search uses Solr's KNN query parser:
{!knn f=_embedding_ topK=10}[query_vector_array]
This returns the 10 most similar documents based on cosine similarity.
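The query-parser string can be assembled from a raw embedding before it is sent to the select handler. A sketch (the helper name and vector values are placeholders):

```python
def build_knn_query(vector: list[float], field: str = "_embedding_",
                    top_k: int = 10) -> str:
    """Render a Solr KNN query string for the given embedding."""
    vector_str = "[" + ", ".join(str(v) for v in vector) + "]"
    return f"{{!knn f={field} topK={top_k}}}{vector_str}"

print(build_knn_query([0.12, 0.43, 0.91]))
# {!knn f=_embedding_ topK=10}[0.12, 0.43, 0.91]
```

In practice the vector has the full 4096 dimensions of the embedding model, so the query is usually sent as a POST body rather than a URL parameter.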
Troubleshooting
Error: "multiple values encountered for non multiValued field embedding"
This indicates the field was incorrectly configured as pfloat instead of knn_vector. Solution:
# Run Solr setup to fix schema
docker exec -u 33 master-nextcloud-1 php occ openregister:solr:manage setup
Error: Field type not found
Ensure Solr version is 9.0+ and run setup to create the field type automatically.
Performance Issues
- Verify HNSW indexing is enabled
- Check vector dimensions match your embedding model
- Monitor Solr performance metrics
File Warmup API
The File Warmup API provides endpoints for bulk file processing, text extraction, chunking, and SOLR indexing. These endpoints enable efficient batch operations for indexing large numbers of files.
Endpoints
1. Warmup Files
POST /api/solr/warmup/files
Bulk process and index files in SOLR file collection.
Request Body:
{
"max_files": 1000,
"batch_size": 100,
"file_types": ["application/pdf", "text/plain"],
"skip_indexed": true,
"mode": "parallel"
}
Parameters:
- max_files (optional): Maximum number of files to process (default: 1000)
- batch_size (optional): Number of files to process per batch (default: 100)
- file_types (optional): Array of MIME types to filter (e.g., ["application/pdf"])
- skip_indexed (optional): Skip files already indexed in Solr (default: true)
- mode (optional): Processing mode - "parallel" or "sequential" (default: "parallel")
Response:
{
"success": true,
"message": "File warmup completed",
"files_processed": 847,
"indexed": 844,
"failed": 3,
"errors": ["File 123: No extracted text available"],
"mode": "parallel"
}
2. Index Specific File
POST /api/solr/files/{fileId}/index
Index a single file in SOLR.
Response:
{
"success": true,
"message": "File indexed successfully",
"file_id": 5213
}
3. Reindex All Files
POST /api/solr/files/reindex
Reindex all files that have completed text extraction.
Request Body:
{
"max_files": 1000,
"batch_size": 100
}
Response:
{
"success": true,
"message": "Reindex completed",
"files_processed": 500,
"indexed": 497,
"failed": 3,
"errors": []
}
4. Get File Index Statistics
GET /api/solr/files/stats
Get statistics about indexed files.
Response:
{
"success": true,
"total_chunks": 4235,
"unique_files": 847,
"mime_types": {
"application/pdf": 500,
"text/plain": 200,
"application/vnd.openxmlformats-officedocument.wordprocessingml.document": 147
},
"collection": "openregister_files"
}
Usage Examples
cURL: Warmup Files
curl -X POST -u 'admin:admin' \
-H 'Content-Type: application/json' \
-d '{
"max_files": 500,
"batch_size": 50,
"file_types": ["application/pdf"],
"skip_indexed": true
}' \
http://master-nextcloud-1/index.php/apps/openregister/api/solr/warmup/files
cURL: Index Specific File
curl -X POST -u 'admin:admin' \
http://master-nextcloud-1/index.php/apps/openregister/api/solr/files/5213/index
cURL: Get Stats
curl -u 'admin:admin' \
http://master-nextcloud-1/index.php/apps/openregister/api/solr/files/stats
Error Handling
All endpoints return proper HTTP status codes:
- 200: Success
- 422: Unprocessable (e.g., file has no extracted text)
- 500: Internal server error
Error responses include:
{
"success": false,
"message": "Error description here"
}
Implementation Details
The warmup endpoints are implemented in SettingsController.php:
- warmupFiles(): Gets files that need indexing, filters by MIME type if specified, processes in batches, returns comprehensive results
- indexFile(int $fileId): Indexes a single file, returns success/failure
- reindexFiles(): Gets all completed file texts, reindexes in batches, returns statistics
- getFileIndexStats(): Queries SOLR for statistics, returns chunk counts and file counts
Integration with Frontend
These endpoints are used by:
- SOLR Configuration Modal - File warmup UI
- File Management Dialog - Individual file indexing
- Dashboard - Statistics display
Published-Only Indexing Strategy
OpenRegister implements a published-only indexing strategy for Apache Solr search functionality. This means that only objects with a published date are indexed to Solr, ensuring that search results only contain publicly available content.
Implementation Details
Current Behavior
- Single Object Indexing: The indexObject() method checks whether an object has a published date before indexing
- Bulk Indexing: Both bulkIndexFromDatabase() and bulkIndexFromDatabaseOptimized() filter out unpublished objects
- Search Results: Only published objects appear in search results, since unpublished objects are never indexed
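The rule behind all three code paths is the same: an object is indexable only when its published date is set. A language-neutral sketch of that check (the actual implementation lives in GuzzleSolrService.php; field names here mirror the self_published semantics):

```python
from typing import Optional

def is_indexable(published: Optional[str]) -> bool:
    """An object is sent to Solr only when it carries a published date."""
    return published is not None and published != ""

objects = [
    {"uuid": "a1", "published": "2024-05-01T12:00:00Z"},
    {"uuid": "b2", "published": None},  # draft: skipped
    {"uuid": "c3", "published": "2024-06-10T09:30:00Z"},
]
indexable = [o for o in objects if is_indexable(o["published"])]
print([o["uuid"] for o in indexable])  # ['a1', 'c3']
```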
Code Locations
The published-only logic is implemented in:
- lib/Service/GuzzleSolrService.php::indexObject() - Single object indexing
- lib/Service/GuzzleSolrService.php::bulkIndexFromDatabase() - Bulk indexing (serial mode)
- lib/Service/GuzzleSolrService.php::bulkIndexFromDatabaseOptimized() - Bulk indexing (optimized mode)
Database vs Solr Counts
The system tracks two different counts:
- Published Count: Number of objects in the database with a published date (from the oc_openregister_objects table)
- Indexed Count: Number of documents actually indexed in Solr (should match the published count)
These counts are displayed in the Solr Configuration dashboard to help administrators monitor indexing status.
Benefits
- Relevant Search Results: Users only see content that is meant to be public
- Performance: Smaller Solr index size improves search performance
- Security: Unpublished/draft content is not accidentally exposed through search
- Resource Efficiency: Reduced storage and memory usage in Solr
Monitoring
Dashboard Statistics
The Solr Configuration dashboard shows:
- Indexed Documents: Number of documents in Solr
- Published Objects Available: Total number of published objects in the database
If these numbers don't match, it indicates that some published objects haven't been indexed yet.
Logging
The system logs when unpublished objects are skipped:
- Single objects: DEBUG level - 'Skipping indexing of unpublished object'
- Bulk operations: INFO level - 'Skipped unpublished objects in batch'
Configuration
No additional configuration is required. The published-only indexing is enabled by default and works automatically based on the published field in object entities.
Troubleshooting
Common Issues
- Mismatched Counts: If the indexed count is less than the published count:
  - Run a Solr warmup to re-index all published objects
  - Check Solr logs for indexing errors
- Objects Not Appearing in Search:
  - Verify the object has a published date set
  - Check if the object was indexed after being published
  - Run a manual re-index if needed
- Performance Issues:
  - Monitor the published vs indexed ratio
  - Consider batch size adjustments for bulk operations
Debugging
Enable debug logging to see which objects are being skipped:
// In your Nextcloud config
'loglevel' => 0, // Debug level
Look for log entries containing:
- 'Skipping indexing of unpublished object'
- 'Skipped unpublished objects in batch'
Future Considerations
TODO: Full Object Indexing
There are TODO comments in the code indicating that in the future, we may want to index all objects to Solr for comprehensive search capabilities. This would require:
- Access Control: Implementing proper access control in search queries
- Filtering: Adding published/unpublished filters to search results
- Performance: Handling larger index sizes
- Security: Ensuring unpublished content is properly protected
Collection-Specific Endpoints
OpenRegister uses RESTful collection-specific endpoints for Solr collection management operations. Collection names are specified as URL parameters, following REST principles.
Endpoints
1. Delete Specific Collection
DELETE /api/solr/collections/{name}
Controller Method: SettingsController::deleteSpecificSolrCollection(string $name)
Example:
curl -X DELETE "http://nextcloud.local/index.php/apps/openregister/api/solr/collections/nc_test_collection" \
-u "admin:admin"
Response:
{
"success": true,
"message": "Collection deleted successfully",
"collection": "nc_test_collection"
}
2. Clear Specific Collection
POST /api/solr/collections/{name}/clear
Controller Method: SettingsController::clearSpecificCollection(string $name)
Example:
curl -X POST "http://nextcloud.local/index.php/apps/openregister/api/solr/collections/nc_test_collection/clear" \
-u "admin:admin"
Response:
{
"success": true,
"message": "Collection cleared successfully",
"collection": "nc_test_collection"
}
3. Reindex Specific Collection
POST /api/solr/collections/{name}/reindex
Controller Method: SettingsController::reindexSpecificCollection(string $name)
Example:
curl -X POST "http://nextcloud.local/index.php/apps/openregister/api/solr/collections/nc_test_collection/reindex" \
-u "admin:admin"
Response:
{
"success": true,
"message": "Reindex completed successfully",
"stats": {
"processed_objects": 1250,
"duration_seconds": 4.5
},
"collection": "nc_test_collection"
}
Benefits
RESTful Design
- Collection name is now part of the URL path, following REST principles
- Resources are clearly identified by their URLs
- HTTP verbs (DELETE, POST) indicate the action
Improved API Clarity
- No ambiguity about which collection is being operated on
- Collection name is explicit in every request
- Easier to read API logs and debug issues
Better Error Handling
- 404 errors now correctly indicate "collection not found"
- URL validation happens at the routing level
- Clearer separation between route parameters and request body
Migration from Old Endpoints
The following old endpoints have been removed:
- ❌ POST /api/solr/reindex (replaced by /api/solr/collections/{name}/reindex)
- ❌ POST /api/settings/solr/clear (replaced by /api/solr/collections/{name}/clear)
- ❌ DELETE /api/solr/collection/delete (replaced by DELETE /api/solr/collections/{name})
Related Documentation
- Vector Search Backends - Complete vector backend guide
- Vectorization Architecture - How vectors are generated
- Solr Development Troubleshooting - Development troubleshooting guide
This comprehensive setup process ensures reliable, performant, and maintainable SOLR integration with proper tenant isolation and pre-configured system fields for optimal runtime performance.