Audit Trails
PDaaS provides comprehensive audit logging that automatically captures all API activity, security-sensitive operations, and business logic events. Audit trails enable compliance (GDPR, SOC2, HIPAA), security monitoring, and operational visibility.
Overview
The audit system captures:
- HTTP Requests/Responses: All API calls with full context
- Security Events: Authentication, authorization, policy changes
- Business Logic: Custom events from application code
- Operational Events: Background jobs, system operations
All audit events are:
- Multi-tenant Isolated: Separated by organization and account
- Immutable: Cannot be modified after creation
- Searchable: Indexed in OpenSearch for powerful queries
- Sanitized: Sensitive data automatically redacted
Automatic Request Auditing
How It Works
PDaaS automatically captures all HTTP requests and responses through the AuditMiddleware. This middleware:
- Captures request metadata (method, path, headers, body, IP)
- Measures request duration
- Captures response metadata (status, headers, body)
- Extracts tenant context (organization, account)
- Extracts actor information (user, service account)
- Emits audit event asynchronously (non-blocking)
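The steps above can be sketched in plain asyncio. This is a minimal sketch, not the actual AuditMiddleware. The `audited_call` helper, the dict-shaped request and response, and the drop-when-full policy are assumptions for illustration. It times the handler, builds an event, and hands it to a bounded queue with `put_nowait`, so the response never waits on OpenSearch.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict

# Hypothetical shapes: plain dicts stand in for framework request/response objects.
Handler = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


async def audited_call(handler: Handler, request: Dict[str, Any],
                       queue: "asyncio.Queue[Dict[str, Any]]") -> Dict[str, Any]:
    """Run the handler, then hand an audit event to a queue without awaiting I/O."""
    start = time.perf_counter()
    response = await handler(request)
    duration_ms = (time.perf_counter() - start) * 1000

    event = {
        "request_method": request["method"],
        "request_path": request["path"],
        "response_status_code": response["status"],
        "response_duration_ms": round(duration_ms, 1),
    }
    try:
        # Fire-and-forget: a separate writer task drains the queue in batches.
        queue.put_nowait(event)
    except asyncio.QueueFull:
        # Assumption: drop the event rather than delay the response.
        pass
    return response
```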
What Gets Captured
Every HTTP request generates an audit event with:
Request Information:
- HTTP method (GET, POST, PUT, DELETE, etc.)
- Request path and query parameters
- Request headers (sanitized)
- Request body (sanitized, truncated if large)
- Client IP address
- User agent
Response Information:
- HTTP status code
- Response headers (sanitized)
- Response body (sanitized, truncated if large)
- Request duration (milliseconds)
Context Information:
- Actor (who performed the action)
- Organization and account (tenant context)
- Trace ID (for distributed tracing)
- Timestamp (ISO 8601 UTC)
Operation Result:
- Success (2xx status codes)
- Failure (4xx, 5xx status codes)
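The classification above can be written as a small function. This is a sketch: the documentation only defines 2xx as success and 4xx/5xx as failure, so the "other" label for 1xx and 3xx responses is a hypothetical choice made here for illustration.

```python
def operation_result(status_code: int) -> str:
    """Map an HTTP status code to the audit event's operation_result."""
    if 200 <= status_code < 300:
        return "success"
    if status_code >= 400:
        return "failure"
    # Assumption: 1xx/3xx are not classified by the documentation; "other" is hypothetical.
    return "other"
```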
Path Exclusion
High-traffic, low-value endpoints are automatically excluded from auditing:
- /health: Health check endpoint
- /healthz: Kubernetes health check
- /metrics: Prometheus metrics
- /docs: API documentation
- /redoc: ReDoc documentation
- /openapi.json: OpenAPI schema
Additional paths can be excluded with the AUDIT_EXCLUDED_PATHS environment variable (see Configuration).
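Exclusion matching might look like the sketch below, which mirrors the `config.is_path_excluded()` check mentioned under Troubleshooting. It assumes configured patterns such as "/internal/*" use shell-style globbing; the function and constant names are hypothetical, not the PDaaS implementation.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Built-in exclusions listed above.
DEFAULT_EXCLUDED_PATHS = ("/health", "/healthz", "/metrics", "/docs", "/redoc", "/openapi.json")


def is_path_excluded(path: str, extra_patterns: Iterable[str] = ()) -> bool:
    """Return True if the request path matches a built-in or configured exclusion.

    Patterns use shell-style globbing; "*" also matches "/", so
    "/internal/*" covers nested paths such as "/internal/jobs/run".
    """
    patterns = (*DEFAULT_EXCLUDED_PATHS, *extra_patterns)
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

Exact patterns like "/health" do not match "/healthcheck", so only the listed endpoints and explicit globs are skipped.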
Manual Event Emission
For business logic events that automatic HTTP auditing does not describe on its own, such as which policy a request created, use the emit() function:
Basic Usage
from fastapi import FastAPI, Request
from backend.audit import emit
app = FastAPI()
@app.post("/policies")
async def create_policy(policy: PolicyCreate, request: Request):
    # Create the policy (PolicyCreate and policy_service are defined by your application)
    policy_id = await policy_service.create(policy)
    # Manually emit an audit event for the business action
    await emit(
        action="policy.create",
        target=f"policy:{policy_id}",
        metadata={"policy_name": policy.name},
        request=request,  # Auto-extracts tenant, actor, trace_id
    )
    return {"id": policy_id}
Background Jobs
For events from background jobs or system operations:
from backend.audit import emit
from backend.utils.actor import ActorInfo
async def cleanup_expired_sessions():
    # session_service is your application's session store
    deleted_count = await session_service.cleanup_expired()
    # There is no HTTP request here, so pass actor and tenant context explicitly
    await emit(
        action="session.cleanup",
        target="system",
        metadata={"deleted_count": deleted_count},
        actor=ActorInfo(actor_type="system", actor_id="system"),
        organization_id="system",
        account_id="system",
    )
Audit Event Model
Basic Audit Event
{
"occurred_at": "2025-09-30T12:34:56.789Z",
"actor_type": "user",
"actor_id": "user_123",
"organization_id": "org_abc",
"account_id": "acc_xyz",
"action": "user.login",
"target": "user:123",
"metadata": {"method": "password"},
"trace_id": "trace_456"
}
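The base fields above map naturally onto a frozen dataclass. This is a hypothetical model for illustration, not the PDaaS class; `AuditEvent` and `to_json` are assumed names. It shows how `occurred_at` can be rendered as ISO 8601 UTC with millisecond precision and a trailing "Z", as in the example.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)  # frozen blocks attribute reassignment (a shallow immutability guard)
class AuditEvent:
    actor_type: str
    actor_id: str
    organization_id: str
    account_id: str
    action: str
    target: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    trace_id: Optional[str] = None
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def to_json(self) -> str:
        data = asdict(self)
        # e.g. "2025-09-30T12:34:56.789Z"
        stamp = self.occurred_at.astimezone(timezone.utc).isoformat(timespec="milliseconds")
        data["occurred_at"] = stamp.replace("+00:00", "Z")
        return json.dumps(data)
```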
Enhanced HTTP Audit Event
{
// Base fields
"occurred_at": "2025-09-30T12:34:56.789Z",
"actor_type": "user",
"actor_id": "user_123",
"organization_id": "org_abc",
"account_id": "acc_xyz",
"action": "http.post",
"target": "/users",
"trace_id": "trace_789",
// Service context
"service": "api",
"service_version": "1.0.0",
"environment": "production",
// Request details
"request_method": "POST",
"request_path": "/users",
"request_query_params": {"invite": "true"},
"request_headers": {"content-type": "application/json"},
"request_body": "{\"email\": \"user@example.com\"}",
"request_client_ip": "192.168.1.1",
"request_user_agent": "Mozilla/5.0",
// Response details
"response_status_code": 201,
"response_headers": {"content-type": "application/json"},
"response_body": "{\"id\": \"user_456\"}",
"response_duration_ms": 45.6,
// Resource information
"resource_type": "user",
"resource_id": "user_456",
// Result
"operation_result": "success"
}
Data Sanitization
PDaaS automatically sanitizes sensitive data before persisting audit events.
Header Sanitization
These headers are automatically removed from audit events:
- Authorization
- Cookie / Set-Cookie
- X-API-Key
- X-Auth-Token
- Proxy-Authorization
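Header removal can be sketched as a case-insensitive filter, since HTTP header names are case-insensitive. The constant and function names are hypothetical; only the header list comes from this page.

```python
from typing import Dict, Mapping

# Header names listed above, lowercased for case-insensitive comparison.
SENSITIVE_HEADERS = frozenset(
    {"authorization", "cookie", "set-cookie", "x-api-key", "x-auth-token", "proxy-authorization"}
)


def sanitize_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the headers with sensitive entries removed."""
    return {name: value for name, value in headers.items()
            if name.lower() not in SENSITIVE_HEADERS}
```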
Body Sanitization
These fields are automatically redacted in request/response bodies:
- password, passwd, pwd
- token, access_token, refresh_token
- secret, client_secret
- api_key, apikey
- credit_card, card_number, cvv
- ssn, social_security
- private_key
Example:
// Original request body
{
"username": "alice",
"password": "secret123",
"email": "alice@example.com"
}
// Sanitized in audit event
{
"username": "alice",
"password": "[REDACTED]",
"email": "alice@example.com"
}
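The redaction in the example above can be sketched as a recursive walk over parsed JSON. This sketch assumes exact, case-insensitive key matching against the field list above and passes non-JSON bodies through unchanged. Both behaviors are assumptions, and the second would leak secrets in non-JSON payloads. `redact` and `sanitize_body` are hypothetical names.

```python
import json
from typing import Any

REDACTED = "[REDACTED]"
SENSITIVE_FIELDS = frozenset({
    "password", "passwd", "pwd",
    "token", "access_token", "refresh_token",
    "secret", "client_secret",
    "api_key", "apikey",
    "credit_card", "card_number", "cvv",
    "ssn", "social_security",
    "private_key",
})


def redact(value: Any) -> Any:
    """Recursively replace the values of sensitive keys in parsed JSON."""
    if isinstance(value, dict):
        return {key: REDACTED if key.lower() in SENSITIVE_FIELDS else redact(item)
                for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def sanitize_body(raw: str) -> str:
    try:
        return json.dumps(redact(json.loads(raw)))
    except json.JSONDecodeError:
        # Assumption: non-JSON bodies pass through unchanged in this sketch.
        return raw
```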
Body Truncation
Request and response bodies larger than 10KB (10,240 bytes, the default AUDIT_MAX_BODY_SIZE) are automatically truncated to prevent excessive storage usage. Truncated bodies include metadata indicating the original size.
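Truncation might be implemented roughly as follows. This is a sketch. The names of the size-metadata fields (`truncated`, `original_size_bytes`) are hypothetical, since the exact metadata shape is not specified here.

```python
from typing import Any, Dict

MAX_BODY_SIZE = 10240  # bytes; the AUDIT_MAX_BODY_SIZE default


def truncate_body(body: bytes, limit: int = MAX_BODY_SIZE) -> Dict[str, Any]:
    """Cap a body at `limit` bytes and record how large it originally was."""
    return {
        # errors="ignore" drops a multi-byte character split by the cut (and any other invalid UTF-8)
        "body": body[:limit].decode("utf-8", errors="ignore"),
        "truncated": len(body) > limit,
        "original_size_bytes": len(body),
    }
```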
Multi-Tenant Isolation
Audit events are stored in tenant-specific OpenSearch indices to ensure complete data isolation between organizations.
Index Naming Pattern
audit-{organization_id}-{account_id}-{service}-{date}
Examples:
- Daily rotation: audit-org123-acc456-api-2025-09-30
- Weekly rotation: audit-org123-acc456-api-2025-W39
- Monthly rotation: audit-org123-acc456-api-2025-09
This ensures:
- Complete isolation between organizations
- Efficient queries (organization-scoped indices)
- Flexible retention policies (per-organization)
- Index lifecycle management (automatic deletion)
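The naming pattern can be expressed as a function of the event date and rotation setting. This is a sketch under two assumptions. First, weekly indices use ISO 8601 week numbering, where early-January dates can belong to the previous ISO year. Second, OpenSearch requires lowercase index names, so the sketch lowercases the result and yields "...-2025-w39". How PDaaS itself handles the "W" is not stated on this page. `index_name` is a hypothetical helper.

```python
from datetime import date


def index_name(org_id: str, account_id: str, service: str, day: date,
               rotation: str = "daily", prefix: str = "audit") -> str:
    """Build audit-{organization_id}-{account_id}-{service}-{date} for a rotation period."""
    if rotation == "daily":
        suffix = day.strftime("%Y-%m-%d")
    elif rotation == "weekly":
        iso_year, iso_week, _ = day.isocalendar()
        suffix = f"{iso_year}-W{iso_week:02d}"
    elif rotation == "monthly":
        suffix = day.strftime("%Y-%m")
    else:
        raise ValueError(f"unknown rotation: {rotation!r}")
    # OpenSearch index names must be lowercase.
    return f"{prefix}-{org_id}-{account_id}-{service}-{suffix}".lower()
```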
Searching Audit Events
OpenSearch Dashboards
Access your organization's audit logs through OpenSearch Dashboards:
- Navigate to OpenSearch Dashboards
- Select your organization's indices: audit-{your_org_id}-*
- Use Discover to search and filter events
- Create visualizations and dashboards
Common Queries
Find all actions by a user:
{
"query": {
"term": {"actor_id": "user_123"}
}
}
Find failed operations:
{
"query": {
"bool": {
"must": [
{"term": {"operation_result": "failure"}},
{"range": {"response_status_code": {"gte": 400}}}
]
}
}
}
Find all policy changes in last 24 hours:
{
"query": {
"bool": {
"must": [
{"prefix": {"action": "policy."}},
{"range": {"occurred_at": {"gte": "now-24h"}}}
]
}
}
}
Trace a specific request:
{
"query": {
"term": {"trace_id": "trace_abc123"}
}
}
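The queries above can be combined programmatically. In this sketch, `build_audit_query` and `search_audit` are hypothetical helpers. It assumes `actor_id`, `action`, and `operation_result` are mapped as keyword fields, which `term` and `prefix` need for exact matching. It uses `filter` rather than `must` because none of these clauses need relevance scoring. The search call assumes an `opensearchpy.OpenSearch` client.

```python
from typing import Any, Dict, List, Optional


def build_audit_query(actor_id: Optional[str] = None, action_prefix: Optional[str] = None,
                      since: str = "now-24h", failures_only: bool = False) -> Dict[str, Any]:
    """Combine the common queries above into one bool query."""
    clauses: List[Dict[str, Any]] = [{"range": {"occurred_at": {"gte": since}}}]
    if actor_id:
        clauses.append({"term": {"actor_id": actor_id}})
    if action_prefix:
        clauses.append({"prefix": {"action": action_prefix}})
    if failures_only:
        clauses.append({"term": {"operation_result": "failure"}})
    # "filter" instead of "must": exact matches, no relevance scoring needed.
    return {"query": {"bool": {"filter": clauses}}}


def search_audit(client: Any, org_id: str, query: Dict[str, Any]) -> Any:
    # `client` is assumed to be an opensearchpy.OpenSearch instance.
    return client.search(index=f"audit-{org_id}-*", body=query)
```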
Configuration
Audit behavior can be configured via environment variables:
Enable/Disable Auditing
AUDIT_ENABLED=true # Set to false to disable all auditing
Service Information
AUDIT_SERVICE_NAME=api
AUDIT_ENVIRONMENT=production
AUDIT_SERVICE_VERSION=1.0.0
OpenSearch Connection
AUDIT_OPENSEARCH_HOST=localhost
AUDIT_OPENSEARCH_PORT=9200
AUDIT_OPENSEARCH_USE_SSL=true
AUDIT_OPENSEARCH_USERNAME=admin
AUDIT_OPENSEARCH_PASSWORD=secret
Performance Tuning
AUDIT_BATCH_SIZE=100 # Events per batch write
AUDIT_FLUSH_INTERVAL_SECONDS=5.0 # Max time before flush
AUDIT_MAX_BODY_SIZE=10240 # 10KB max body size
AUDIT_MAX_QUEUE_SIZE=10000 # Max events in queue
Path Exclusion
# JSON array format
AUDIT_EXCLUDED_PATHS='["/health","/metrics","/docs","/internal/*"]'
Index Settings
AUDIT_INDEX_PREFIX=audit
AUDIT_INDEX_ROTATION=daily # daily, weekly, or monthly
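Reading these variables can be sketched as follows. `AuditSettings` and `load_settings` are hypothetical stand-ins for the real `get_audit_config()`. The defaults mirror the example values on this page and the defaults named under Troubleshooting. Connection settings are omitted for brevity.

```python
import json
import os
from dataclasses import dataclass, field
from typing import List, Mapping, Optional

ROTATIONS = ("daily", "weekly", "monthly")


@dataclass
class AuditSettings:
    enabled: bool = True
    batch_size: int = 100
    flush_interval_seconds: float = 5.0
    max_body_size: int = 10240
    max_queue_size: int = 10000
    index_prefix: str = "audit"
    index_rotation: str = "daily"
    excluded_paths: List[str] = field(default_factory=list)


def load_settings(env: Optional[Mapping[str, str]] = None) -> AuditSettings:
    """Read AUDIT_* variables from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    rotation = env.get("AUDIT_INDEX_ROTATION", "daily").strip().lower()
    if rotation not in ROTATIONS:
        raise ValueError(f"AUDIT_INDEX_ROTATION must be one of {ROTATIONS}, got {rotation!r}")
    paths = json.loads(env.get("AUDIT_EXCLUDED_PATHS", "[]"))
    if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
        raise ValueError("AUDIT_EXCLUDED_PATHS must be a JSON array of strings")
    return AuditSettings(
        enabled=env.get("AUDIT_ENABLED", "true").strip().lower() == "true",
        batch_size=int(env.get("AUDIT_BATCH_SIZE", "100")),
        flush_interval_seconds=float(env.get("AUDIT_FLUSH_INTERVAL_SECONDS", "5.0")),
        max_body_size=int(env.get("AUDIT_MAX_BODY_SIZE", "10240")),
        max_queue_size=int(env.get("AUDIT_MAX_QUEUE_SIZE", "10000")),
        index_prefix=env.get("AUDIT_INDEX_PREFIX", "audit"),
        index_rotation=rotation,
        excluded_paths=paths,
    )
```

Passing an explicit mapping instead of `os.environ` keeps the loader easy to test.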
Performance Impact
The audit middleware is designed for minimal performance impact:
Latency Overhead
- Target: < 5ms added latency (p95)
- Actual: < 2ms in production
- Mechanism: Async event emission (fire-and-forget)
Throughput
- Capacity: 10,000+ events/second per instance
- Batching: Groups events for efficient bulk writes
- Buffering: In-memory queue with configurable size
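Batching and buffering can be sketched with an asyncio queue and a writer task that flushes when a batch reaches the batch size or the flush interval passes. This is an illustration, not the PDaaS writer. The `write` callable stands in for an OpenSearch bulk request, and the `None` shutdown sentinel is an assumption. Under this sketch, `batch_size` and `flush_interval` correspond to AUDIT_BATCH_SIZE and AUDIT_FLUSH_INTERVAL_SECONDS, and the queue's `maxsize` to AUDIT_MAX_QUEUE_SIZE.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional

Event = Dict[str, Any]
BulkWrite = Callable[[List[Event]], Awaitable[None]]


async def batch_writer(queue: "asyncio.Queue[Optional[Event]]", write: BulkWrite,
                       batch_size: int = 100, flush_interval: float = 5.0) -> None:
    """Drain `queue`, calling `write` when a batch is full or `flush_interval` has passed.

    Putting None on the queue flushes any remaining events and stops the writer.
    """
    loop = asyncio.get_running_loop()
    batch: List[Event] = []
    deadline = loop.time() + flush_interval
    while True:
        timeout = max(deadline - loop.time(), 0.001)
        try:
            event = await asyncio.wait_for(queue.get(), timeout)
        except asyncio.TimeoutError:
            pass  # no new event before the deadline
        else:
            if event is None:  # shutdown sentinel
                break
            batch.append(event)
        now = loop.time()
        if len(batch) >= batch_size or now >= deadline:
            if batch:
                await write(batch)  # one bulk request per batch
                batch = []
            deadline = now + flush_interval
    if batch:
        await write(batch)
```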
Resource Usage
- Memory: < 100MB for audit buffers
- CPU: < 5% overhead
- Network: Minimal (batched writes to OpenSearch)
Compliance Features
GDPR
- Right to Access: Search by user ID to retrieve all events
- Right to Erasure: Anonymization support (replace user ID with hash)
- Data Minimization: Configurable retention policies
- Privacy by Design: Automatic PII redaction
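The erasure item above says anonymization replaces the user ID with a hash. An unkeyed hash of a guessable ID can be reversed by hashing candidate IDs, so this sketch swaps in a keyed hash (HMAC-SHA256) instead. The same ID and key always give the same value, so one user's events stay correlatable. The `pseudonymize` and `anonymize_event` helpers, the `anon_` prefix, and key management are all hypothetical. The sketch only rewrites `actor_id`; other fields such as `target` may also contain the ID.

```python
import hashlib
import hmac
from typing import Any, Dict


def pseudonymize(user_id: str, key: bytes) -> str:
    """Replace a user ID with a truncated HMAC-SHA256 digest (128 bits as hex)."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"anon_{digest[:32]}"


def anonymize_event(event: Dict[str, Any], user_id: str, key: bytes) -> Dict[str, Any]:
    """Return a copy of the event with a matching actor_id pseudonymized."""
    if event.get("actor_id") == user_id:
        return {**event, "actor_id": pseudonymize(user_id, key)}
    return event
```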
SOC2
- Complete Audit Trail: All data access and changes logged
- Immutability: Events cannot be modified after creation
- Access Control: OpenSearch role-based access
- Retention: Configurable per-organization
HIPAA
- PHI Access Logging: All protected health information access logged
- Audit Reports: Generate compliance reports from OpenSearch
- 6-Year Retention: Configurable retention policies
- Encryption: At rest (OpenSearch) and in transit (TLS)
Retention and Cleanup
Default Retention
- Default: 90 days
- Configurable: Per organization
- Automatic: Daily cleanup job
Manual Cleanup
# CLI tool for manual cleanup
python -m backend.audit.cleanup --organization-id org_123 --older-than 90
Lifecycle Management
OpenSearch Index State Management (ISM) policies (OpenSearch's counterpart to Elasticsearch ILM) move indices through these phases:
- Hot: Recent indices (0-30 days) - High performance
- Warm: Medium-aged indices (31-60 days) - Reduced replicas
- Cold: Old indices (61-90 days) - Compressed, snapshot
- Delete: Indices older than retention period
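OpenSearch ISM applies these transitions server-side based on index age. The sketch below only mirrors the thresholds in Python, for example to predict which phase a daily index is in. `index_date` and `lifecycle_phase` are hypothetical names, and parsing assumes the daily `YYYY-MM-DD` suffix.

```python
from datetime import date, datetime


def index_date(index_name: str) -> date:
    """Parse the date suffix of a daily index, e.g. audit-org123-acc456-api-2025-09-30."""
    return datetime.strptime(index_name[-10:], "%Y-%m-%d").date()


def lifecycle_phase(age_days: int, retention_days: int = 90) -> str:
    """Return hot (0-30), warm (31-60), cold (61-90), or delete (past retention)."""
    if age_days > retention_days:
        return "delete"
    if age_days > 60:
        return "cold"
    if age_days > 30:
        return "warm"
    return "hot"
```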
Best Practices
When to Use Manual Emit
Use manual emit() for:
- Business logic events (policy changes, grants, etc.)
- Background job results
- System operations
- Security-sensitive actions
Don't use manual emit for:
- HTTP requests (automatic via middleware)
- Health checks or metrics
- Internal debugging (use logging instead)
Action Naming Conventions
Use dot-notation for hierarchical actions:
- user.login: User authentication
- user.create: User creation
- policy.create: Policy creation
- policy.update: Policy modification
- grant.attach: Grant attachment
- session.cleanup: Session cleanup
Target Format
Use resource type prefix:
- user:{user_id} for user resources
- policy:{policy_id} for policy resources
- grant:{grant_id} for grant resources
- organization:{org_id} for organization resources
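The two conventions above can be enforced with small helpers. This is a sketch. The regex encodes one assumed reading of "dot-notation": lowercase segments of letters, digits, and underscores, with at least one dot. `is_valid_action` and `make_target` are hypothetical names.

```python
import re

# Lowercase segments joined by dots, e.g. "policy.create"; underscores allowed within a segment.
ACTION_PATTERN = re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+")


def is_valid_action(action: str) -> bool:
    return ACTION_PATTERN.fullmatch(action) is not None


def make_target(resource_type: str, resource_id: str) -> str:
    """Format a target as "{type}:{id}", e.g. "policy:pol_42"."""
    if ":" in resource_type:
        raise ValueError("resource_type must not contain ':'")
    return f"{resource_type}:{resource_id}"
```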
Metadata Guidelines
Include relevant context in metadata:
- Keep metadata small (< 1KB)
- Use structured data (JSON-serializable)
- Avoid sensitive information
- Include business-relevant details
Good metadata:
{
"policy_name": "admin-access",
"resource_count": 5,
"changes": ["added_action", "modified_condition"]
}
Bad metadata:
{
"password": "secret123", // Sensitive data
"huge_list": [...], // Too large
"debug_info": "..." // Not business-relevant
}
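A pre-emit check for the size and serializability guidelines might look like the following sketch. `check_metadata` is a hypothetical helper, and it measures size as compact UTF-8 JSON, which is an assumption about how "< 1KB" is counted.

```python
import json
from typing import Any, Dict

MAX_METADATA_BYTES = 1024  # the "< 1KB" guideline above


def check_metadata(metadata: Dict[str, Any]) -> None:
    """Raise ValueError if metadata is not JSON-serializable or is 1KB or larger."""
    try:
        encoded = json.dumps(metadata, separators=(",", ":")).encode("utf-8")
    except (TypeError, ValueError) as exc:  # unsupported types, circular references
        raise ValueError(f"metadata is not JSON-serializable: {exc}") from exc
    if len(encoded) >= MAX_METADATA_BYTES:
        raise ValueError(f"metadata is {len(encoded)} bytes; keep it under {MAX_METADATA_BYTES}")
```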
Troubleshooting
Audit Events Not Appearing
- Check that auditing is enabled:
# Verify the environment variable
echo $AUDIT_ENABLED # Should print "true"
- Check OpenSearch connectivity:
from backend.audit.client import OpenSearchClientFactory
from backend.audit.config import get_audit_config
config = get_audit_config()
# Run inside an async context (an async function or an asyncio-aware REPL)
client = await OpenSearchClientFactory.get_client(config)
is_healthy = await client.health_check()
- Check path exclusion:
from backend.audit.config import get_audit_config
config = get_audit_config()
is_excluded = config.is_path_excluded("/your/path")
High Memory Usage
Reduce batch size and queue size:
AUDIT_BATCH_SIZE=50 # Reduce from default 100
AUDIT_MAX_QUEUE_SIZE=5000 # Reduce from default 10000
Slow Performance
- Enable batching (should be default)
- Increase batch size for higher throughput
- Use appropriate index rotation (daily for high volume)
- Ensure OpenSearch cluster is properly sized
Missing Context (Unknown Organization)
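Ensure the tenant middleware runs before the audit middleware. In FastAPI (Starlette), each add_middleware call wraps the existing stack, so the last middleware added is the outermost and runs first on every request:
# Correct order: register AuditMiddleware first so TenantMiddleware wraps it
app.add_middleware(AuditMiddleware) # Added first: inner, runs after tenant context is set
app.add_middleware(TenantMiddleware) # Added last: outermost, runs first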
Security Considerations
Audit Log Tampering
Audit logs are protected against tampering:
- Write-Once: Indices configured for append-only
- No Updates/Deletes: OpenSearch security policies prevent modifications
- Backup: Daily snapshots to S3 for disaster recovery
- Integrity: Checksums verify log integrity
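How PDaaS computes its checksums is not described here. One plausible approach, sketched below as an assumption, is a SHA-256 digest over a canonical JSON encoding of each event (sorted keys, fixed separators), so key order does not change the result. A bare digest only proves integrity if it is stored somewhere an attacker cannot also rewrite. `event_checksum` and `verify` are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def event_checksum(event: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(event: Dict[str, Any], expected: str) -> bool:
    # Constant-time comparison of the hex digests.
    return hmac.compare_digest(event_checksum(event), expected)
```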
Access Control
Restrict access to audit logs:
- Authentication: Require username/password or IAM
- SSL/TLS: Encrypt connections to OpenSearch
- RBAC: Role-based access control in OpenSearch
- Isolation: Organization-specific indices prevent cross-tenant access
Sensitive Data Leakage
Multiple layers of protection:
- Automatic Sanitization: Headers and body fields redacted
- Size Limits: Large bodies truncated
- Code Review: Security review of sanitization rules
- Testing: Comprehensive tests verify sanitization
Support
For audit-related questions or issues:
- Review the API Reference in the documentation
- Check the Troubleshooting section in the documentation
- Contact your PDaaS administrator