Comprehensive Faceting System
This document describes the new Elasticsearch-inspired faceting system implemented for the OpenRegister application. The system provides powerful, flexible faceting capabilities that support both metadata and object field facets with enumerated values and range buckets.
Overview
The faceting system provides a modern, user-friendly approach to building faceted search interfaces. It supports:
- Disjunctive faceting - Facet options don't disappear when selected
- Multiple facet types - Terms, date histograms, and numeric ranges
- Metadata and object field facets - Both table columns and JSON data
- Facetable field discovery - Automatic detection of available faceting options
- Elasticsearch-style API - Familiar structure for developers
- Performance optimization - Efficient database queries with proper indexing
Key Features
1. Disjunctive Faceting
Each facet shows counts as if its own filter were not applied. This prevents facet options from disappearing when selected, providing a better user experience.
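The rule above can be illustrated with a small in-memory sketch. It is not the system's database implementation; the function name `disjunctiveTermCounts` and the array-based filter format are assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Count the values of each facet field while ignoring that field's own filter.
 * Hypothetical in-memory helper, for illustration only.
 *
 * @param array<int, array<string, mixed>> $objects     Rows to count over
 * @param array<string, array<int, mixed>> $filters     field => accepted values
 * @param array<int, string>               $facetFields Fields to facet on
 * @return array<string, array<string, int>> field => (value => count)
 */
function disjunctiveTermCounts(array $objects, array $filters, array $facetFields): array
{
    $facets = [];
    foreach ($facetFields as $facetField) {
        // Apply every filter except the one on the field being counted.
        $otherFilters = $filters;
        unset($otherFilters[$facetField]);

        $counts = [];
        foreach ($objects as $object) {
            foreach ($otherFilters as $field => $accepted) {
                if (!in_array($object[$field] ?? null, $accepted, true)) {
                    continue 2; // Object fails another filter: skip it.
                }
            }
            if (isset($object[$facetField])) {
                $key = (string) $object[$facetField];
                $counts[$key] = ($counts[$key] ?? 0) + 1;
            }
        }
        $facets[$facetField] = $counts;
    }
    return $facets;
}

$objects = [
    ['status' => 'active',   'priority' => 'high'],
    ['status' => 'active',   'priority' => 'low'],
    ['status' => 'inactive', 'priority' => 'high'],
];

// 'status' is filtered to 'active', yet its facet still lists 'inactive'.
$facets = disjunctiveTermCounts($objects, ['status' => ['active']], ['status', 'priority']);
```

The `status` facet keeps its `inactive` option because its own filter is left out of the count, while the `priority` facet only counts objects that pass the `status` filter.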
2. Multiple Facet Types
- Terms aggregation - For categorical data (status, priority, etc.)
- Date histogram - For time-based data with configurable intervals
- Range aggregation - For numeric data with custom buckets
3. Dual Data Sources
- Metadata facets - Based on ObjectEntity table columns (@self)
- Object field facets - Based on JSON object data
4. Enhanced Labels with Caching
Automatic resolution of register, schema, organisation IDs, and object UUIDs to human-readable names using an optimized caching mechanism. The system intelligently detects UUIDs in any facet and resolves them to object names (naam, name, title, etc.) using batch loading and multi-tier caching. Facet buckets are automatically sorted alphabetically by label for consistent, user-friendly display.
UUID Resolution Technical Implementation
The faceting system includes intelligent UUID-to-name resolution that works automatically:
Resolution Process:
- UUID Detection - Identifies bucket values containing hyphens (UUID format)
- Lazy Service Loading - ObjectCacheService loaded from container only when needed
- Batch Resolution - All UUIDs in facets resolved in a single database query
- Multi-Tier Caching - Checks in-memory cache → distributed cache → database
- Name Extraction - Searches common name fields (naam, name, title, contractNummer, achternaam)
- Alphabetical Sorting - Facets sorted by resolved labels (case-insensitive A-Z)
- Graceful Fallback - Uses UUID if name cannot be resolved
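The steps above can be sketched roughly as follows. This is a simplified illustration, not the actual service code: `resolveBucketLabels`, `looksLikeUuid`, and the `$batchLoader` callable are hypothetical names, and the sketch uses a strict UUID pattern where the text describes a looser hyphen check.

```php
<?php
declare(strict_types=1);

// Name fields searched in order, taken from the list above.
const NAME_FIELDS = ['naam', 'name', 'title', 'contractNummer', 'achternaam'];

/** Strict UUID test (assumption: the real check may be looser). */
function looksLikeUuid(string $value): bool
{
    return preg_match('/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i', $value) === 1;
}

/**
 * @param array<int, array{value: string, count: int, label: string}> $buckets
 * @param callable $batchLoader Receives a list of UUIDs, returns uuid => object data
 * @return array<int, array{value: string, count: int, label: string}>
 */
function resolveBucketLabels(array $buckets, callable $batchLoader): array
{
    // Collect every UUID first so they can be loaded in a single call.
    $uuids = [];
    foreach ($buckets as $bucket) {
        if (looksLikeUuid($bucket['value'])) {
            $uuids[] = $bucket['value'];
        }
    }
    $objects = $uuids === [] ? [] : $batchLoader(array_values(array_unique($uuids)));

    foreach ($buckets as &$bucket) {
        $object = $objects[$bucket['value']] ?? null;
        if ($object === null) {
            continue; // Graceful fallback: keep the existing label.
        }
        foreach (NAME_FIELDS as $field) {
            if (isset($object[$field]) && is_string($object[$field]) && $object[$field] !== '') {
                $bucket['label'] = $object[$field];
                break;
            }
        }
    }
    unset($bucket);

    // Case-insensitive A-Z sort on the resolved labels.
    usort($buckets, static fn (array $a, array $b): int => strcasecmp($a['label'], $b['label']));
    return $buckets;
}

// Stand-in for the batched database lookup.
$loader = static fn (array $uuids): array => [
    '01c26b42-e047-4322-95ba-46d53a1696c0' => ['naam' => 'Example Component'],
];

$resolved = resolveBucketLabels([
    ['value' => '01c26b42-e047-4322-95ba-46d53a1696c0', 'count' => 2, 'label' => '01c26b42-e047-4322-95ba-46d53a1696c0'],
    ['value' => 'draft', 'count' => 5, 'label' => 'draft'],
], $loader);
```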
Example Transformation:
Before UUID resolution:
{
    "value": "01c26b42-e047-4322-95ba-46d53a1696c0",
    "count": 2,
    "label": "01c26b42-e047-4322-95ba-46d53a1696c0"
}
After UUID resolution:
{
    "value": "01c26b42-e047-4322-95ba-46d53a1696c0",
    "count": 2,
    "label": "Component Name Here"
}
Performance Characteristics:
- Batch queries: All UUIDs resolved in one DB query (no N+1 problem)
- Cached: <10ms for cached names
- Uncached: <100ms for 100 UUIDs (batch DB query)
- Lazy loading: Service only loaded when facets contain UUIDs
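The multi-tier lookup behind these numbers might look like the sketch below, assuming PHP 8. The class `TieredNameResolver` and its three closures are hypothetical stand-ins for the in-memory cache, the distributed cache, and the batched database query; the real ObjectCacheService may be organised differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical three-tier name lookup: request-local array, then a shared
 * cache, then one batched loader call for whatever is still missing.
 */
final class TieredNameResolver
{
    /** @var array<string, string> uuid => name, kept for this instance's lifetime */
    private array $memory = [];

    public function __construct(
        private \Closure $sharedGet, // fn(string[] $uuids): array<string, string>
        private \Closure $sharedSet, // fn(array<string, string> $names): void
        private \Closure $loadBatch  // fn(string[] $uuids): array<string, string>
    ) {
    }

    /**
     * @param array<int, string> $uuids
     * @return array<string, string> uuid => name for every UUID that could be resolved
     */
    public function resolve(array $uuids): array
    {
        // Tier 1: in-memory cache.
        $found = array_intersect_key($this->memory, array_flip($uuids));
        $missing = array_values(array_diff($uuids, array_keys($found)));

        // Tier 2: shared (distributed) cache.
        if ($missing !== []) {
            $fromShared = ($this->sharedGet)($missing);
            $found += $fromShared;
            $missing = array_values(array_diff($missing, array_keys($fromShared)));
        }

        // Tier 3: one batched database call for the remainder.
        if ($missing !== []) {
            $fromDb = ($this->loadBatch)($missing);
            ($this->sharedSet)($fromDb);
            $found += $fromDb;
        }

        $this->memory += $found;
        return $found;
    }
}

$dbCalls = 0;
$shared = [];
$resolver = new TieredNameResolver(
    function (array $ids) use (&$shared): array { return array_intersect_key($shared, array_flip($ids)); },
    function (array $names) use (&$shared): void { $shared += $names; },
    function (array $ids) use (&$dbCalls): array { $dbCalls++; return array_fill_keys($ids, 'Example Name'); }
);

$first = $resolver->resolve(['uuid-1', 'uuid-2']);  // falls through to the loader
$second = $resolver->resolve(['uuid-1', 'uuid-2']); // served from memory
```

The second call never reaches the loader, which is the behaviour the cached timing above relies on.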
Service Integration:
// Lazy-loading pattern to avoid circular dependencies
private function getObjectCacheService(): ?ObjectCacheService
{
    // Only attempt the container lookup once per instance.
    if ($this->objectCacheServiceAttempted) {
        return $this->objectCacheService;
    }
    $this->objectCacheServiceAttempted = true;

    try {
        $this->objectCacheService = \OC::$server->get(ObjectCacheService::class);
        $this->logger->debug('ObjectCacheService loaded successfully');
    } catch (\Exception $e) {
        // Faceting continues without the service; UUIDs are then shown as-is.
        $this->logger->warning('ObjectCacheService not available for UUID resolution', [
            'error' => $e->getMessage()
        ]);
        return null;
    }

    return $this->objectCacheService;
}
This ensures the service is only loaded when facets containing UUIDs are encountered, avoiding performance overhead for regular facets.
5. Facetable Field Discovery
Automatic analysis of available fields and their characteristics to help frontends build dynamic facet interfaces.
Facetable Field Discovery
The system includes powerful discovery capabilities that analyze your data to determine which fields can be used for faceting and what types of facets are appropriate.
Discovery API
// Get facetable fields for a specific context
$facetableFields = $objectService->getFacetableFields($baseQuery, $sampleSize);
// Example with context filters
$baseQuery = [
'@self' => ['register' => 1],
'_search' => 'customer'
];
$facetableFields = $objectService->getFacetableFields($baseQuery, 100);
Discovery Response Structure
[
    '@self' => [
        'register' => [
            'type' => 'categorical',
            'description' => 'Register that contains the object',
            'facet_types' => ['terms'],
            'has_labels' => true,
            'sample_values' => [
                ['value' => 1, 'label' => 'Publications Register', 'count' => 150],
                ['value' => 2, 'label' => 'Events Register', 'count' => 75]
            ]
        ],
        'created' => [
            'type' => 'date',
            'description' => 'Date and time when the object was created',
            'facet_types' => ['date_histogram', 'range'],
            'intervals' => ['day', 'week', 'month', 'year'],
            'has_labels' => false,
            'date_range' => [
                'min' => '2023-01-01 00:00:00',
                'max' => '2024-12-31 23:59:59'
            ]
        ]
    ],
    'object_fields' => [
        'status' => [
            'type' => 'string',
            'description' => 'Object field: status',
            'facet_types' => ['terms'],
            'cardinality' => 'low', // ≤50 unique values
            'sample_values' => ['published', 'draft', 'archived'],
            'appearance_rate' => 85 // Count of objects containing this field
        ],
        'priority' => [
            'type' => 'integer',
            'description' => 'Object field: priority',
            'facet_types' => ['range', 'terms'],
            'cardinality' => 'numeric', // Numeric field type
            'sample_values' => ['1', '2', '3', '4', '5'],
            'appearance_rate' => 72 // Count of objects containing this field
        ]
    ]
]
Field Properties Explained
Key Terms
appearance_rate: The actual count of objects (from the analyzed sample) that contain this field. For example, if 100 objects were analyzed and 85 contained the 'status' field, the appearance_rate would be 85. This is not a percentage but an absolute count.
cardinality: Indicates the uniqueness characteristics of field values:
- 'low' - String fields with ≤50 unique values (suitable for terms facets)
- 'numeric' - Integer, float, or numeric string fields
- 'binary' - Boolean fields (true/false values only)
- Not set for date fields (they use intervals instead)
Field Types and Characteristics
Metadata Fields (@self)
Predefined fields from the ObjectEntity table:
- register - Categorical with labels from register table
- schema - Categorical with labels from schema table
- uuid - Identifier field (usually not suitable for faceting)
- owner - Categorical user field
- organisation - Categorical organisation field
- application - Categorical application field
- created/updated/published/depublished - Date fields with range support
Object Fields
Dynamically discovered from JSON object data:
- string - Text fields (suitable for terms facets when cardinality is low)
- integer/float - Numeric fields (suitable for range and terms facets)
- date - Date fields (suitable for date_histogram and range facets)
- boolean - Binary fields (suitable for terms facets)
Discovery Configuration
Field Analysis Parameters
- Sample Size - Number of objects to analyze (default: 100)
- Appearance Threshold - Minimum percentage of objects that must contain the field (default: 10%)
- Cardinality Threshold - Maximum unique values for terms facets (default: 50)
- Recursion Depth - Maximum nesting level to analyze (default: 2)
Field Filtering
The discovery system automatically filters out:
- System fields (starting with @ or _)
- Nested objects and arrays of objects
- High cardinality string fields (more than 50 unique values)
- Fields appearing in less than 10% of objects
- Fields with inconsistent types (less than 70% type consistency)
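To make the thresholds concrete, here is a simplified sketch of how such an analysis could work on a sample of decoded objects. `discoverFacetableFields` is a hypothetical function, not the system's own; it ignores date detection and recursion, and treats numeric strings as plain strings.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical field analysis using the documented defaults:
 * 10% appearance, 70% type consistency, at most 50 unique string values.
 *
 * @param array<int, array<string, mixed>> $sample Decoded object data
 * @return array<string, array<string, mixed>> field => discovery info
 */
function discoverFacetableFields(array $sample, int $maxCardinality = 50): array
{
    $stats = [];
    foreach ($sample as $object) {
        foreach ($object as $field => $value) {
            $field = (string) $field;
            // Skip system fields, nested structures, and empty values.
            if ($field === '' || $field[0] === '@' || $field[0] === '_' || is_array($value) || $value === null) {
                continue;
            }
            $type = is_bool($value) ? 'boolean' : (is_int($value) ? 'integer' : (is_float($value) ? 'float' : 'string'));
            $stats[$field]['count'] = ($stats[$field]['count'] ?? 0) + 1;
            $stats[$field]['types'][$type] = ($stats[$field]['types'][$type] ?? 0) + 1;
            $stats[$field]['values'][var_export($value, true)] = true;
        }
    }

    $total = count($sample);
    $cardinalityLabels = ['string' => 'low', 'integer' => 'numeric', 'float' => 'numeric', 'boolean' => 'binary'];
    $fields = [];
    foreach ($stats as $field => $info) {
        arsort($info['types']); // Most frequent type first.
        $type = (string) array_key_first($info['types']);
        $consistency = $info['types'][$type] / $info['count'];

        if ($info['count'] / $total < 0.10 || $consistency < 0.70) {
            continue; // Too rare, or types too mixed.
        }
        if ($type === 'string' && count($info['values']) > $maxCardinality) {
            continue; // High-cardinality strings make poor terms facets.
        }
        $isNumeric = $type === 'integer' || $type === 'float';
        $fields[$field] = [
            'type' => $type,
            'facet_types' => $isNumeric ? ['range', 'terms'] : ['terms'],
            'cardinality' => $cardinalityLabels[$type],
            'appearance_rate' => $info['count'], // Absolute count, not a percentage.
        ];
    }
    return $fields;
}

$sample = [];
for ($i = 0; $i < 20; $i++) {
    $sample[] = [
        '@self' => ['register' => 1],
        'status' => $i % 2 === 0 ? 'published' : 'draft',
        'priority' => $i % 5 + 1,
        'tags' => ['a', 'b'],
    ];
}
$sample[0]['rare'] = 'x'; // Present in 1 of 20 objects (5%), below the 10% threshold.

$fields = discoverFacetableFields($sample);
```

Only `status` and `priority` survive: `@self` is a system field, `tags` is an array, and `rare` appears too seldom.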
API Integration
Discovery Parameter
Add _facetable=true to any search endpoint to include facetable field information:
GET /api/objects?_facetable=true&limit=0
Response includes additional facetable property:
[
'results' => [],
'total' => 0,
'facetable' => [
'@self' => [...],
'object_fields' => [...]
]
]
Dynamic Facet Configuration
Use discovery results to build facet configurations:
// Frontend example: Build facet config from discovery
const buildFacetConfig = (facetableFields) => {
    const config = { _facets: { '@self': {} } };

    // Add metadata facets
    Object.entries(facetableFields['@self']).forEach(([field, info]) => {
        if (info.facet_types.includes('terms')) {
            config._facets['@self'][field] = { type: 'terms' };
        } else if (info.facet_types.includes('date_histogram')) {
            config._facets['@self'][field] = {
                type: 'date_histogram',
                interval: 'month'
            };
        }
    });

    // Add object field facets
    Object.entries(facetableFields.object_fields).forEach(([field, info]) => {
        if (info.facet_types.includes('terms')) {
            config._facets[field] = { type: 'terms' };
        } else if (info.facet_types.includes('range')) {
            config._facets[field] = {
                type: 'range',
                // generateRanges is an application-supplied helper (not shown here)
                ranges: generateRanges(info.sample_values)
            };
        }
    });

    return config;
};
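The frontend example calls a `generateRanges` helper that it does not define. One possible shape, sketched here in PHP to match the rest of this document, splits the span between the smallest and largest sample value into equal-width buckets and leaves the first and last bucket open-ended. The behaviour and the `$bucketCount` parameter are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Build a 'ranges' configuration from numeric sample values.
 * Hypothetical helper: equal-width buckets, open-ended at both extremes.
 *
 * @param array<int, int|float|string> $sampleValues Sample values from discovery
 * @return array<int, array<string, float>> Entries with optional 'from' and 'to'
 */
function generateRanges(array $sampleValues, int $bucketCount = 3): array
{
    // Discovery may return numbers as strings, so normalise to floats.
    $numbers = array_map('floatval', array_filter($sampleValues, 'is_numeric'));
    if ($numbers === [] || $bucketCount < 2) {
        return [];
    }

    $min = min($numbers);
    $max = max($numbers);
    if ($min === $max) {
        return []; // A single distinct value cannot be split into ranges.
    }
    $step = ($max - $min) / $bucketCount;

    $ranges = [];
    for ($i = 0; $i < $bucketCount; $i++) {
        $range = [];
        if ($i > 0) {
            $range['from'] = $min + $i * $step;
        }
        if ($i < $bucketCount - 1) {
            $range['to'] = $min + ($i + 1) * $step;
        }
        $ranges[] = $range;
    }
    return $ranges;
}

// Sample values as returned for the 'priority' field in the discovery response.
$ranges = generateRanges(['1', '2', '3', '4', '5'], 2);
```

Because sample values cover only part of the data, ranges built this way are a starting point; fixed, domain-specific boundaries are usually easier for users to read.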
API Structure
Basic Query Structure
$query = [
    // Search filters (same as searchObjects)
    '@self' => [
        'register' => 1,
        'schema' => 2
    ],
    'status' => 'active',
    '_search' => 'customer',

    // Facet configuration
    '_facets' => [
        // Metadata facets
        '@self' => [
            'register' => ['type' => 'terms'],
            'schema' => ['type' => 'terms'],
            'created' => [
                'type' => 'date_histogram',
                'interval' => 'month'
            ]
        ],

        // Object field facets
        'status' => ['type' => 'terms'],
        'priority' => ['type' => 'terms'],
        'price' => [
            'type' => 'range',
            'ranges' => [
                ['to' => 100],
                ['from' => 100, 'to' => 500],
                ['from' => 500]
            ]
        ]
    ]
];
$facets = $objectService->getFacetsForObjects($query);
Response Structure
[
    'facets' => [
        '@self' => [
            'register' => [
                'type' => 'terms',
                'buckets' => [
                    ['key' => 1, 'doc_count' => 150, 'label' => 'Register Name'],
                    ['key' => 2, 'doc_count' => 75, 'label' => 'Other Register']
                ]
            ],
            'created' => [
                'type' => 'date_histogram',
                'interval' => 'month',
                'buckets' => [
                    ['key' => '2024-01', 'doc_count' => 45],
                    ['key' => '2024-02', 'doc_count' => 67]
                ]
            ]
        ],
        'status' => [
            'type' => 'terms',
            'buckets' => [
                ['key' => 'active', 'doc_count' => 134],
                ['key' => 'inactive', 'doc_count' => 45]
            ]
        ],
        'price' => [
            'type' => 'range',
            'buckets' => [
                // Open-ended buckets omit the bound that was not requested
                ['key' => '0-100', 'to' => 100, 'doc_count' => 120],
                ['key' => '100-500', 'from' => 100, 'to' => 500, 'doc_count' => 80],
                ['key' => '500+', 'from' => 500, 'doc_count' => 15]
            ]
        ]
    ]
]
Facet Types
1. Terms Aggregation
For categorical data like status, priority, category, etc.
'_facets' => [
'status' => ['type' => 'terms'],
'priority' => ['type' => 'terms'],
'@self' => [
'register' => ['type' => 'terms'],
'schema' => ['type' => 'terms']
]
]
Response:
'status' => [
'type' => 'terms',
'buckets' => [
['key' => 'active', 'doc_count' => 134],
['key' => 'pending', 'doc_count' => 45],
['key' => 'inactive', 'doc_count' => 23]
]
]
2. Date Histogram
For time-based data with configurable intervals.
Supported intervals: day, week, month, year
'_facets' => [
'event_date' => [
'type' => 'date_histogram',
'interval' => 'month'
],
'@self' => [
'created' => [
'type' => 'date_histogram',
'interval' => 'week'
]
]
]
Response:
'event_date' => [
'type' => 'date_histogram',
'interval' => 'month',
'buckets' => [
['key' => '2024-01', 'doc_count' => 45],
['key' => '2024-02', 'doc_count' => 67],
['key' => '2024-03', 'doc_count' => 52]
]
]
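As a rough illustration of how values fall into histogram buckets, the sketch below groups date strings by interval in plain PHP. The function `dateHistogram` is hypothetical, and only the month key format (`2024-01`) is taken from the examples above; the day, week, and year formats are assumptions. Empty intervals are not filled in.

```php
<?php
declare(strict_types=1);

/**
 * Group date strings into buckets keyed by interval.
 * Hypothetical in-memory helper, not the database implementation.
 *
 * @param array<int, string> $dates    Date strings parseable by DateTimeImmutable
 * @param string             $interval One of day, week, month, year
 * @return array<string, mixed> Facet in the documented response shape
 */
function dateHistogram(array $dates, string $interval): array
{
    // Assumed key formats; 'o-\WW' yields ISO year and week, e.g. 2024-W05.
    $formats = ['day' => 'Y-m-d', 'week' => 'o-\WW', 'month' => 'Y-m', 'year' => 'Y'];
    if (!isset($formats[$interval])) {
        throw new InvalidArgumentException("Unsupported interval: {$interval}");
    }

    $counts = [];
    foreach ($dates as $date) {
        $key = (new DateTimeImmutable($date))->format($formats[$interval]);
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    ksort($counts); // Chronological order, since the keys sort naturally.

    $buckets = [];
    foreach ($counts as $key => $count) {
        // Year keys become integer array keys in PHP, so cast back to string.
        $buckets[] = ['key' => (string) $key, 'doc_count' => $count];
    }
    return ['type' => 'date_histogram', 'interval' => $interval, 'buckets' => $buckets];
}

$histogram = dateHistogram(
    ['2024-01-15 10:00:00', '2024-02-01 09:30:00', '2024-01-31 23:59:59'],
    'month'
);
```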
3. Range Aggregation
For numeric data with custom buckets.
'_facets' => [
'price' => [
'type' => 'range',
'ranges' => [
['to' => 100], // 0-100
['from' => 100, 'to' => 500], // 100-500
['from' => 500, 'to' => 1000], // 500-1000
['from' => 1000] // 1000+
]
]
]
Response:
'price' => [
'type' => 'range',
'buckets' => [
['key' => '0-100', 'to' => 100, 'doc_count' => 120],
['key' => '100-500', 'from' => 100, 'to' => 500, 'doc_count' => 80],
['key' => '500-1000', 'from' => 500, 'to' => 1000, 'doc_count' => 35],
['key' => '1000+', 'from' => 1000, 'doc_count' => 15]
]
]
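The mapping from a `ranges` configuration to response buckets can be sketched as below. `rangeBuckets` is a hypothetical helper; the key style follows the examples above, while the boundary rule (`from` inclusive, `to` exclusive, as in Elasticsearch) is an assumption about this system.

```php
<?php
declare(strict_types=1);

/**
 * Count numeric values per configured range.
 * Hypothetical in-memory helper, for illustration only.
 *
 * @param array<int, int|float>             $values Values of the faceted field
 * @param array<int, array<string, int|float>> $ranges Entries with optional 'from' and 'to'
 * @return array<int, array<string, mixed>> Buckets in the documented response shape
 */
function rangeBuckets(array $values, array $ranges): array
{
    $buckets = [];
    foreach ($ranges as $range) {
        $from = $range['from'] ?? null;
        $to = $range['to'] ?? null;

        // Key style copied from the examples: '0-100', '100-500', '1000+'.
        $key = $to === null ? "{$from}+" : ($from ?? 0) . "-{$to}";

        $count = 0;
        foreach ($values as $value) {
            // Assumption: 'from' is inclusive and 'to' is exclusive.
            if (($from === null || $value >= $from) && ($to === null || $value < $to)) {
                $count++;
            }
        }

        // Only echo back the bounds that were actually configured.
        $bucket = ['key' => $key];
        if ($from !== null) {
            $bucket['from'] = $from;
        }
        if ($to !== null) {
            $bucket['to'] = $to;
        }
        $bucket['doc_count'] = $count;
        $buckets[] = $bucket;
    }
    return $buckets;
}

$priceBuckets = rangeBuckets([20, 99, 100, 450, 500, 2500], [
    ['to' => 100],
    ['from' => 100, 'to' => 500],
    ['from' => 500],
]);
```

With half-open ranges a value of exactly 100 lands in `100-500` only, so adjacent buckets never double-count a boundary value.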
Usage Examples
Basic Enumerated Facets
$query = [
'@self' => ['register' => 1],
'status' => 'active',
'_facets' => [
'@self' => [
'register' => ['type' => 'terms'],
'schema' => ['type' => 'terms']
],
'status' => ['type' => 'terms'],
'priority' => ['type' => 'terms']
]
];
$facets = $objectService->getFacetsForObjects($query);
// Use facet data for UI checkboxes (escape values before printing them as HTML)
foreach ($facets['facets']['status']['buckets'] as $bucket) {
    $selected = ($bucket['key'] === 'active') ? 'checked' : '';
    $label = htmlspecialchars((string) $bucket['key'], ENT_QUOTES, 'UTF-8');
    echo "<input type='checkbox' {$selected}> {$label} ({$bucket['doc_count']})\n";
}
Date Timeline Facets
$query = [
'_facets' => [
'@self' => [
'created' => [
'type' => 'date_histogram',
'interval' => 'month'
]
]
]
];
$facets = $objectService->getFacetsForObjects($query);
// Use for timeline visualization
foreach ($facets['facets']['@self']['created']['buckets'] as $bucket) {
echo "{$bucket['key']}: {$bucket['doc_count']} objects\n";
}
Price Range Facets
$query = [
'_facets' => [
'price' => [
'type' => 'range',
'ranges' => [
['to' => 100],
['from' => 100, 'to' => 500],
['from' => 500]
]
]
]
];
$facets = $objectService->getFacetsForObjects($query);
// Use for price filter UI
foreach ($facets['facets']['price']['buckets'] as $bucket) {
$from = $bucket['from'] ?? 0;
$to = $bucket['to'] ?? '∞';
echo "€{$from} - €{$to}: {$bucket['doc_count']} items\n";
}
Integration with Search
Combined Search and Facets
$query = [
// Search filters
'@self' => [
'register' => [1, 2, 3],
'organisation' => 'IS NOT NULL'
],
'status' => ['active', 'pending'],
'_search' => 'important customer',
'_published' => true,
// Pagination
'_limit' => 25,
'_page' => 1,
// Facet configuration
'_facets' => [
'@self' => [
'register' => ['type' => 'terms'],
'schema' => ['type' => 'terms']
],
'status' => ['type' => 'terms'],
'priority' => ['type' => 'terms']
]
];
// Get complete paginated results with facets
$result = $objectService->searchObjectsPaginated($query);
// Result contains:
// - results: Array of objects
// - total: Total count
// - page/pages: Pagination info
// - facets: Facet data