Audit Trails

PDaaS provides comprehensive audit logging that automatically captures API activity (apart from a small set of excluded paths), security-sensitive operations, and business logic events. Audit trails support compliance (GDPR, SOC2, HIPAA), security monitoring, and operational visibility.

Overview

The audit system captures:

HTTP Requests/Responses: All API calls with full context
Security Events: Authentication, authorization, policy changes
Business Logic: Custom events from application code
Operational Events: Background jobs, system operations

All audit events are:

Multi-tenant Isolated: Separated by organization and account
Immutable: Cannot be modified after creation
Searchable: Indexed in OpenSearch for powerful queries
Sanitized: Sensitive data automatically redacted

Automatic Request Auditing

How It Works

PDaaS automatically captures all HTTP requests and responses through the AuditMiddleware. This middleware:

Captures request metadata (method, path, headers, body, IP)
Measures request duration
Captures response metadata (status, headers, body)
Extracts tenant context (organization, account)
Extracts actor information (user, service account)
Emits audit event asynchronously (non-blocking)
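The steps above can be sketched as a framework-agnostic ASGI middleware. This is a minimal illustration and not the actual AuditMiddleware: the class name AuditMiddlewareSketch, the sink callable, and reading tenant fields from scope["state"] are assumptions made for the example, and body capture and sanitization are left out.

```python
import asyncio
import time
from datetime import datetime, timezone


class AuditMiddlewareSketch:
    """Illustrative ASGI middleware: one audit event per HTTP request."""

    def __init__(self, app, sink):
        self.app = app        # wrapped ASGI application
        self.sink = sink      # hypothetical async callable that stores one event
        self._tasks = set()   # keeps references so pending writes are not garbage-collected

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        started = time.perf_counter()
        status = {"code": None}

        async def send_wrapper(message):
            # The status code arrives with the first response message.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            state = scope.get("state") or {}   # assumed location of tenant context
            client = scope.get("client")
            event = {
                "occurred_at": datetime.now(timezone.utc).isoformat(),
                "action": f"http.{scope['method'].lower()}",
                "target": scope["path"],
                "request_client_ip": client[0] if client else None,
                "organization_id": state.get("organization_id"),
                "account_id": state.get("account_id"),
                "response_status_code": status["code"],
                "response_duration_ms": (time.perf_counter() - started) * 1000,
            }
            # Schedule the write without awaiting it, so the response is not delayed.
            task = asyncio.create_task(self.sink(event))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)
```

Scheduling the write as a task is what keeps emission non-blocking: the request finishes as soon as the inner application does, and the event is persisted afterwards.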

What Gets Captured

Every HTTP request generates an audit event with:

Request Information:

HTTP method (GET, POST, PUT, DELETE, etc.)
Request path and query parameters
Request headers (sanitized)
Request body (sanitized, truncated if large)
Client IP address
User agent

Response Information:

HTTP status code
Response headers (sanitized)
Response body (sanitized, truncated if large)
Request duration (milliseconds)

Context Information:

Actor (who performed the action)
Organization and account (tenant context)
Trace ID (for distributed tracing)
Timestamp (ISO 8601 UTC)

Operation Result:

Success (2xx status codes)
Failure (4xx, 5xx status codes)
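The mapping from status code to operation result can be written as a small function. This is a sketch only: the function name is hypothetical, and because this page does not say how 1xx and 3xx responses are classified, the example returns None for them rather than guessing.

```python
from typing import Optional


def operation_result(status_code: int) -> Optional[str]:
    """Classify an HTTP status code the way the list above describes."""
    if 200 <= status_code <= 299:
        return "success"
    if 400 <= status_code <= 599:
        return "failure"
    # 1xx and 3xx handling is not specified in this document.
    return None
```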

Path Exclusion

High-traffic, low-value endpoints are automatically excluded from auditing:

/health - Health check endpoint
/healthz - Kubernetes health check
/metrics - Prometheus metrics
/docs - API documentation
/redoc - ReDoc documentation
/openapi.json - OpenAPI schema

Additional paths can be configured via the AUDIT_EXCLUDED_PATHS environment variable (see Configuration).
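As an illustration of how exclusion matching might behave, the sketch below checks a path against the default list plus extra patterns, using shell-style wildcards so that a pattern such as /internal/* works. The function is a standalone example under that assumption; the real matching rules used by PDaaS are not described on this page.

```python
from fnmatch import fnmatchcase

# Default exclusions listed above.
DEFAULT_EXCLUDED = ("/health", "/healthz", "/metrics", "/docs", "/redoc", "/openapi.json")


def is_path_excluded(path, extra_patterns=()):
    """Return True if the path matches a default or configured exclusion pattern."""
    patterns = (*DEFAULT_EXCLUDED, *extra_patterns)
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

With this approach, is_path_excluded("/internal/jobs/1", ["/internal/*"]) is true, while "/users" is still audited.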

Manual Event Emission

For business logic events that don't map to HTTP requests, use the emit() function:

Basic Usage

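from fastapi import FastAPI, Request
from backend.audit import emit

app = FastAPI()
# PolicyCreate and policy_service are defined elsewhere in the application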

@app.post("/policies")
async def create_policy(policy: PolicyCreate, request: Request):
    # Create policy
    policy_id = await policy_service.create(policy)

    # Manually emit audit event
    await emit(
        action="policy.create",
        target=f"policy:{policy_id}",
        metadata={"policy_name": policy.name},
        request=request  # Auto-extracts tenant, actor, trace_id
    )

    return {"id": policy_id}

Background Jobs

For events from background jobs or system operations:

from backend.audit import emit
from backend.utils.actor import ActorInfo

async def cleanup_expired_sessions():
    deleted_count = await session_service.cleanup_expired()

    await emit(
        action="session.cleanup",
        target="system",
        metadata={"deleted_count": deleted_count},
        actor=ActorInfo(actor_type="system", actor_id="system"),
        organization_id="system",
        account_id="system"
    )

Audit Event Model

Basic Audit Event

{
  "occurred_at": "2025-09-30T12:34:56.789Z",
  "actor_type": "user",
  "actor_id": "user_123",
  "organization_id": "org_abc",
  "account_id": "acc_xyz",
  "action": "user.login",
  "target": "user:123",
  "metadata": {"method": "password"},
  "trace_id": "trace_456"
}
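For readers who prefer a typed view of the same fields, here is a minimal sketch of the basic event as a frozen dataclass. The class name and helper are hypothetical and are not the PDaaS model; freezing the dataclass only illustrates immutability at the object level.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def _utc_now_iso() -> str:
    # ISO 8601 UTC with millisecond precision and a trailing "Z".
    now = datetime.now(timezone.utc)
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")


@dataclass(frozen=True)
class AuditEventSketch:
    actor_type: str
    actor_id: str
    organization_id: str
    account_id: str
    action: str
    target: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    trace_id: Optional[str] = None
    occurred_at: str = field(default_factory=_utc_now_iso)

    def to_dict(self) -> Dict[str, Any]:
        """Return a JSON-ready dictionary with the field names shown above."""
        return asdict(self)
```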

Enhanced HTTP Audit Event

{
  // Base fields
  "occurred_at": "2025-09-30T12:34:56.789Z",
  "actor_type": "user",
  "actor_id": "user_123",
  "organization_id": "org_abc",
  "account_id": "acc_xyz",
  "action": "http.post",
  "target": "/users",
  "trace_id": "trace_789",

  // Service context
  "service": "api",
  "service_version": "1.0.0",
  "environment": "production",

  // Request details
  "request_method": "POST",
  "request_path": "/users",
  "request_query_params": {"invite": "true"},
  "request_headers": {"content-type": "application/json"},
  "request_body": "{\"email\": \"[email protected]\"}",
  "request_client_ip": "192.168.1.1",
  "request_user_agent": "Mozilla/5.0",

  // Response details
  "response_status_code": 201,
  "response_headers": {"content-type": "application/json"},
  "response_body": "{\"id\": \"user_456\"}",
  "response_duration_ms": 45.6,

  // Resource information
  "resource_type": "user",
  "resource_id": "user_456",

  // Result
  "operation_result": "success"
}

Data Sanitization

PDaaS automatically sanitizes sensitive data before persisting audit events.

Header Sanitization

These headers are automatically removed from audit events:

Authorization
Cookie / Set-Cookie
X-API-Key
X-Auth-Token
Proxy-Authorization
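Header removal along these lines could look like the sketch below. It assumes header names are compared case-insensitively, which this page does not state explicitly; the function name is hypothetical.

```python
# Lower-cased names of the headers listed above.
SENSITIVE_HEADERS = frozenset({
    "authorization",
    "cookie",
    "set-cookie",
    "x-api-key",
    "x-auth-token",
    "proxy-authorization",
})


def sanitize_headers(headers):
    """Return a copy of the headers without the sensitive entries."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in SENSITIVE_HEADERS
    }
```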

Body Sanitization

These fields are automatically redacted in request/response bodies:

password, passwd, pwd
token, access_token, refresh_token
secret, client_secret
api_key, apikey
credit_card, card_number, cvv
ssn, social_security
private_key

Example:

// Original request body
{
  "username": "alice",
  "password": "secret123",
  "email": "[email protected]"
}

// Sanitized in audit event
{
  "username": "alice",
  "password": "[REDACTED]",
  "email": "[email protected]"
}
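The redaction shown in the example can be sketched as a recursive walk over a parsed JSON body. This is an illustration only: it assumes case-insensitive key matching and that nested objects and lists are also scanned, neither of which is confirmed here.

```python
# Field names listed above.
SENSITIVE_FIELDS = frozenset({
    "password", "passwd", "pwd",
    "token", "access_token", "refresh_token",
    "secret", "client_secret",
    "api_key", "apikey",
    "credit_card", "card_number", "cvv",
    "ssn", "social_security",
    "private_key",
})


def redact(value):
    """Return a copy of a parsed JSON value with sensitive fields replaced."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_FIELDS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```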

Body Truncation

Request and response bodies larger than the configured limit (AUDIT_MAX_BODY_SIZE, 10 KB in the example configuration) are automatically truncated to prevent excessive storage usage. Truncated bodies include metadata indicating the original size.
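A truncation step of this kind might look like the following sketch. The output field names (body, truncated, original_size) are assumptions for illustration; the actual metadata format is not documented on this page.

```python
MAX_BODY_SIZE = 10240  # bytes, matching the 10 KB limit described above


def truncate_body(body: bytes, limit: int = MAX_BODY_SIZE) -> dict:
    """Return the body as text, cut to the limit, with the original size recorded."""
    if len(body) <= limit:
        return {"body": body.decode("utf-8", errors="replace"), "truncated": False}
    return {
        # errors="replace" copes with a multi-byte character cut in half.
        "body": body[:limit].decode("utf-8", errors="replace"),
        "truncated": True,
        "original_size": len(body),
    }
```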

Multi-Tenant Isolation

Audit events are stored in tenant-specific OpenSearch indices to ensure complete data isolation between organizations.

Index Naming Pattern

audit-{organization_id}-{account_id}-{service}-{date}

Examples:

Daily rotation: audit-org123-acc456-api-2025-09-30
Weekly rotation: audit-org123-acc456-api-2025-W39
Monthly rotation: audit-org123-acc456-api-2025-09

This ensures:

Complete isolation between organizations
Efficient queries (organization-scoped indices)
Flexible retention policies (per-organization)
Index lifecycle management (automatic deletion)
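The naming pattern can be reproduced with a short helper. This is a sketch with a hypothetical function name; the week-numbering scheme is not specified on this page, so the example assumes strftime's %W (Monday-based weeks), which yields the W39 suffix shown above for 2025-09-30.

```python
from datetime import date

# Date suffix per rotation mode; the weekly format is an assumption.
_SUFFIX_FORMATS = {
    "daily": "%Y-%m-%d",
    "weekly": "%Y-W%W",
    "monthly": "%Y-%m",
}


def index_name(organization_id, account_id, service, day, rotation="daily", prefix="audit"):
    """Build an index name following audit-{org}-{account}-{service}-{date}."""
    if rotation not in _SUFFIX_FORMATS:
        raise ValueError(f"unknown rotation: {rotation!r}")
    suffix = day.strftime(_SUFFIX_FORMATS[rotation])
    return f"{prefix}-{organization_id}-{account_id}-{service}-{suffix}"
```

For example, index_name("org123", "acc456", "api", date(2025, 9, 30)) gives audit-org123-acc456-api-2025-09-30.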

Searching Audit Events

OpenSearch Dashboards

Access your organization's audit logs through OpenSearch Dashboards:

Navigate to OpenSearch Dashboards
Select your organization's indices: audit-{your_org_id}-*
Use Discover to search and filter events
Create visualizations and dashboards

Common Queries

Find all actions by a user:

{
  "query": {
    "term": {"actor_id": "user_123"}
  }
}

Find failed operations:

{
  "query": {
    "bool": {
      "must": [
        {"term": {"operation_result": "failure"}},
        {"range": {"response_status_code": {"gte": 400}}}
      ]
    }
  }
}

Find all policy changes in last 24 hours:

{
  "query": {
    "bool": {
      "must": [
        {"prefix": {"action": "policy."}},
        {"range": {"occurred_at": {"gte": "now-24h"}}}
      ]
    }
  }
}

Trace a specific request:

{
  "query": {
    "term": {"trace_id": "trace_abc123"}
  }
}
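When queries like these are built in application code, a small helper keeps them consistent. The sketch below combines the patterns above into one query body; the function name and parameters are hypothetical. It uses filter clauses instead of must, since exact-match audit lookups do not need relevance scoring.

```python
def failed_operations_query(actor_id=None, since="now-24h"):
    """Build a query body for failed operations, optionally limited to one actor."""
    filters = [
        {"term": {"operation_result": "failure"}},
        {"range": {"occurred_at": {"gte": since}}},
    ]
    if actor_id is not None:
        filters.append({"term": {"actor_id": actor_id}})
    return {"query": {"bool": {"filter": filters}}}
```

The returned dictionary can be serialized to JSON and sent as the body of a search request against the organization's audit-{your_org_id}-* indices.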

Configuration

Audit behavior can be configured via environment variables:

Enable/Disable Auditing

AUDIT_ENABLED=true  # Set to false to disable all auditing

Service Information

AUDIT_SERVICE_NAME=api
AUDIT_ENVIRONMENT=production
AUDIT_SERVICE_VERSION=1.0.0

OpenSearch Connection

AUDIT_OPENSEARCH_HOST=localhost
AUDIT_OPENSEARCH_PORT=9200
AUDIT_OPENSEARCH_USE_SSL=true
AUDIT_OPENSEARCH_USERNAME=admin
AUDIT_OPENSEARCH_PASSWORD=secret

Performance Tuning

AUDIT_BATCH_SIZE=100                    # Events per batch write
AUDIT_FLUSH_INTERVAL_SECONDS=5.0        # Max time before flush
AUDIT_MAX_BODY_SIZE=10240               # 10KB max body size
AUDIT_MAX_QUEUE_SIZE=10000              # Max events in queue

Path Exclusion

# JSON array format
AUDIT_EXCLUDED_PATHS='["/health","/metrics","/docs","/internal/*"]'

Index Settings

AUDIT_INDEX_PREFIX=audit
AUDIT_INDEX_ROTATION=daily  # daily, weekly, or monthly
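To show how these variables might be read and validated, here is a minimal loader sketch. The class and function names are hypothetical and this is not the PDaaS configuration code; the defaults mirror the example values above, and treating them as defaults is an assumption except where this page says so (batch size 100, queue size 10000).

```python
import json
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class AuditSettingsSketch:
    enabled: bool
    batch_size: int
    flush_interval_seconds: float
    max_body_size: int
    max_queue_size: int
    excluded_paths: Tuple[str, ...]
    index_rotation: str


def load_settings(env: Mapping[str, str]) -> AuditSettingsSketch:
    """Parse AUDIT_* variables from a mapping such as os.environ."""
    rotation = env.get("AUDIT_INDEX_ROTATION", "daily")
    if rotation not in ("daily", "weekly", "monthly"):
        raise ValueError(f"invalid AUDIT_INDEX_ROTATION: {rotation!r}")
    return AuditSettingsSketch(
        enabled=env.get("AUDIT_ENABLED", "true").strip().lower() == "true",
        batch_size=int(env.get("AUDIT_BATCH_SIZE", "100")),
        flush_interval_seconds=float(env.get("AUDIT_FLUSH_INTERVAL_SECONDS", "5.0")),
        max_body_size=int(env.get("AUDIT_MAX_BODY_SIZE", "10240")),
        max_queue_size=int(env.get("AUDIT_MAX_QUEUE_SIZE", "10000")),
        # AUDIT_EXCLUDED_PATHS is a JSON array, as shown above.
        excluded_paths=tuple(json.loads(env.get("AUDIT_EXCLUDED_PATHS", "[]"))),
        index_rotation=rotation,
    )
```

Calling load_settings(os.environ) at startup surfaces malformed values early instead of at the first audited request.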

Performance Impact

The audit middleware is designed for minimal performance impact:

Latency Overhead

Target: < 5ms added latency (p95)
Actual: < 2ms in production
Mechanism: Async event emission (fire-and-forget)

Throughput

Capacity: 10,000+ events/second per instance
Batching: Groups events for efficient bulk writes
Buffering: In-memory queue with configurable size
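The batching and buffering described above can be sketched with an asyncio queue. This is an illustrative design and not the PDaaS implementation: the class name and the write_batch callable are hypothetical, and dropping events when the queue is full is an assumption about overflow behavior.

```python
import asyncio


class AuditBatcherSketch:
    """Bounded in-memory queue that flushes on batch size or elapsed time."""

    def __init__(self, write_batch, batch_size=100, flush_interval=5.0, max_queue_size=10000):
        self._write_batch = write_batch      # hypothetical async bulk writer
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._queue = asyncio.Queue(maxsize=max_queue_size)

    def enqueue(self, event) -> bool:
        """Add an event without blocking; return False if the queue is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except asyncio.QueueFull:
            return False

    async def run(self):
        """Consume events forever, writing them in batches."""
        loop = asyncio.get_running_loop()
        while True:
            # Wait for the first event, then collect more until full or timed out.
            batch = [await self._queue.get()]
            deadline = loop.time() + self._flush_interval
            while len(batch) < self._batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            await self._write_batch(batch)
```

Create the batcher inside a running event loop and start run() as a background task. A larger batch size reduces the number of bulk writes, while a smaller queue bounds memory use.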

Resource Usage

Memory: < 100MB for audit buffers
CPU: < 5% overhead
Network: Minimal (batched writes to OpenSearch)

Compliance Features

GDPR

Right to Access: Search by user ID to retrieve all events
Right to Erasure: Anonymization support (replace user ID with hash)
Data Minimization: Configurable retention policies
Privacy by Design: Automatic PII redaction
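Replacing a user ID with a hash could be done as in the sketch below. This page only says the ID is replaced with a hash; the keyed HMAC-SHA256 construction, the anon_ prefix, and the function name are assumptions chosen for the example. A keyed hash is used because short, guessable IDs can be recovered from a plain hash by brute force.

```python
import hashlib
import hmac


def pseudonymize_actor_id(actor_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an actor ID using HMAC-SHA256."""
    digest = hmac.new(secret_key, actor_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; the prefix marks the value as anonymized.
    return f"anon_{digest[:32]}"
```

The same ID and key always produce the same pseudonym, so events from one former user remain correlated without exposing the original identifier.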

SOC2

Complete Audit Trail: All data access and changes logged
Immutability: Events cannot be modified after creation
Access Control: OpenSearch role-based access
Retention: Configurable per-organization

HIPAA

PHI Access Logging: All protected health information access logged
Audit Reports: Generate compliance reports from OpenSearch
6-Year Retention: Supported through configurable retention policies (the 90-day default must be raised)
Encryption: At rest (OpenSearch) and in transit (TLS)

Retention and Cleanup

Default Retention

Default: 90 days
Configurable: Per organization
Automatic: Daily cleanup job

Manual Cleanup

# CLI tool for manual cleanup
python -m backend.audit.cleanup --organization-id org_123 --older-than 90

Lifecycle Management

OpenSearch Index State Management (ISM) policies move indices through these phases:

Hot: Recent indices (0-30 days) - High performance
Warm: Medium-aged indices (31-60 days) - Reduced replicas
Cold: Old indices (61-90 days) - Compressed, snapshot
Delete: Indices older than retention period
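The phase boundaries above translate into a simple age check. The function below is a hypothetical helper for reasoning about which phase an index should be in; it is not how OpenSearch evaluates its policies.

```python
def lifecycle_phase(age_days: int, retention_days: int = 90) -> str:
    """Map an index age in days to the phase described above."""
    if age_days > retention_days:
        return "delete"
    if age_days <= 30:
        return "hot"
    if age_days <= 60:
        return "warm"
    return "cold"
```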

Best Practices

When to Use Manual Emit

Use manual emit() for:

Business logic events (policy changes, grants, etc.)
Background job results
System operations
Security-sensitive actions

Don't use manual emit for:

HTTP requests (automatic via middleware)
Health checks or metrics
Internal debugging (use logging instead)

Action Naming Conventions

Use dot-notation for hierarchical actions:

user.login - User authentication
user.create - User creation
policy.create - Policy creation
policy.update - Policy modification
grant.attach - Grant attachment
session.cleanup - Session cleanup

Target Format

Use resource type prefix:

user:{user_id} - User resources
policy:{policy_id} - Policy resources
grant:{grant_id} - Grant resources
organization:{org_id} - Organization resources
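These conventions are easy to enforce with small helpers. The sketch below is illustrative: the function names and the exact character rules (lowercase letters, digits, underscores) are assumptions inferred from the examples above, not documented constraints.

```python
import re

# One or more dot-separated lowercase segments, e.g. "policy.create".
_ACTION_RE = re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+")
_RESOURCE_TYPE_RE = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_action(action: str) -> bool:
    """Check that an action follows the dot-notation convention."""
    return _ACTION_RE.fullmatch(action) is not None


def make_target(resource_type: str, resource_id: str) -> str:
    """Build a target string with the resource type prefix."""
    if _RESOURCE_TYPE_RE.fullmatch(resource_type) is None:
        raise ValueError(f"invalid resource type: {resource_type!r}")
    return f"{resource_type}:{resource_id}"
```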

Metadata Guidelines

Include relevant context in metadata:

Keep metadata small (< 1KB)
Use structured data (JSON-serializable)
Avoid sensitive information
Include business-relevant details

Good metadata:

{
  "policy_name": "admin-access",
  "resource_count": 5,
  "changes": ["added_action", "modified_condition"]
}

Bad metadata:

{
  "password": "secret123",  // Sensitive data
  "huge_list": [...],        // Too large
  "debug_info": "..."        // Not business-relevant
}
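A pre-flight check can catch the problems shown in the bad example before an event is emitted. This is a sketch with hypothetical names; the key list is a small subset of the redacted fields listed under Body Sanitization, and only top-level keys are inspected.

```python
import json

# Subset of sensitive field names, for illustration.
_SENSITIVE_KEYS = frozenset({"password", "token", "secret", "api_key", "private_key"})


def check_metadata(metadata: dict, max_bytes: int = 1024) -> list:
    """Return a list of guideline violations; an empty list means the metadata is fine."""
    try:
        encoded = json.dumps(metadata).encode("utf-8")
    except (TypeError, ValueError) as exc:
        return [f"not JSON-serializable: {exc}"]
    problems = []
    if len(encoded) > max_bytes:
        problems.append(f"too large: {len(encoded)} bytes (limit {max_bytes})")
    for key in metadata:
        if str(key).lower() in _SENSITIVE_KEYS:
            problems.append(f"sensitive key: {key}")
    return problems
```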

Troubleshooting

Audit Events Not Appearing

Check if auditing is enabled:

# Verify environment variable
echo $AUDIT_ENABLED  # Should be "true"

Check OpenSearch connectivity:

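from backend.audit.client import OpenSearchClientFactory
from backend.audit.config import get_audit_config

# Run inside an async function (or an asyncio-aware REPL such as `python -m asyncio`)
client = await OpenSearchClientFactory.get_client(get_audit_config())
is_healthy = await client.health_check()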

Check path exclusion:

from backend.audit.config import get_audit_config
config = get_audit_config()
is_excluded = config.is_path_excluded("/your/path")

High Memory Usage

Reduce batch size and queue size:

AUDIT_BATCH_SIZE=50          # Reduce from default 100
AUDIT_MAX_QUEUE_SIZE=5000    # Reduce from default 10000

Slow Performance

Enable batching (should be default)
Increase batch size for higher throughput
Use appropriate index rotation (daily for high volume)
Ensure OpenSearch cluster is properly sized

Missing Context (Unknown Organization)

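Ensure the tenant middleware has populated the request context by the time the audit middleware records the event. FastAPI (Starlette) makes the most recently added middleware the outermost layer, so in the order below AuditMiddleware wraps TenantMiddleware:

# Correct order
app.add_middleware(TenantMiddleware)  # Registered first (inner layer)
app.add_middleware(AuditMiddleware)   # Registered after tenant (outer layer)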

Security Considerations

Audit Log Tampering

Audit logs are protected against tampering:

Write-Once: Indices configured for append-only
No Updates/Deletes: OpenSearch security policies prevent modifications
Backup: Daily snapshots to S3 for disaster recovery
Integrity: Checksums verify log integrity
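One way a per-event checksum could work is sketched below. This page does not specify the checksum algorithm or what it covers; SHA-256 over a canonical JSON encoding, and the function names, are assumptions for the example.

```python
import hashlib
import hmac
import json


def event_checksum(event: dict) -> str:
    """Compute a SHA-256 checksum over a canonical JSON encoding of the event."""
    # Sorted keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_event(event: dict, expected_checksum: str) -> bool:
    """Return True if the event still matches its recorded checksum."""
    return hmac.compare_digest(event_checksum(event), expected_checksum)
```

Any change to a stored field alters the checksum, so recomputing it at read time reveals modified events, provided the checksums themselves are stored where an attacker cannot rewrite them.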

Access Control

Restrict access to audit logs:

Authentication: Require username/password or IAM
SSL/TLS: Encrypt connections to OpenSearch
RBAC: Role-based access control in OpenSearch
Isolation: Organization-specific indices prevent cross-tenant access

Sensitive Data Leakage

Multiple layers of protection:

Automatic Sanitization: Headers and body fields redacted
Size Limits: Large bodies truncated
Code Review: Security review of sanitization rules
Testing: Comprehensive tests verify sanitization

Support

For audit-related questions or issues:

Review the API Reference in the documentation
Check the Troubleshooting section in the documentation
Contact your PDaaS administrator
