Telemetry Configuration Guide¶
Comprehensive guide for configuring AgentiCraft's telemetry system for different environments and use cases.
Configuration Methods¶
AgentiCraft telemetry can be configured through multiple methods, with the following precedence:
- Direct Python configuration (highest priority)
- Environment variables
- Configuration files (
.env
,config.yaml
) - Default values (lowest priority)
Basic Configuration¶
Python Configuration¶
from agenticraft.telemetry import TelemetryConfig
# Minimal configuration
telemetry = TelemetryConfig(enabled=True)
telemetry.initialize()
# Full configuration
telemetry = TelemetryConfig(
enabled=True,
exporter_type="otlp",
service_name="my-agent-service",
service_version="1.0.0",
deployment_environment="production",
otlp_endpoint="localhost:4317",
otlp_headers={"api-key": "your-key"},
sample_rate=0.1,
auto_instrument=True,
batch_size=1024,
export_interval_ms=5000,
max_queue_size=2048,
resource_attributes={
"service.namespace": "ai-agents",
"cloud.provider": "aws",
"cloud.region": "us-east-1"
}
)
telemetry.initialize()
Environment Variables¶
# Basic settings
export AGENTICRAFT_TELEMETRY_ENABLED=true
export AGENTICRAFT_EXPORTER_TYPE=otlp
export AGENTICRAFT_SERVICE_NAME=my-agent-service
export AGENTICRAFT_SERVICE_VERSION=1.0.0
export AGENTICRAFT_DEPLOYMENT_ENVIRONMENT=production
# OTLP settings
export AGENTICRAFT_OTLP_ENDPOINT=localhost:4317
export AGENTICRAFT_OTLP_HEADERS='{"api-key": "your-key"}'
export AGENTICRAFT_OTLP_COMPRESSION=gzip
export AGENTICRAFT_OTLP_TIMEOUT=10000
export AGENTICRAFT_OTLP_PROTOCOL=grpc
# Sampling
export AGENTICRAFT_SAMPLE_RATE=0.1
export AGENTICRAFT_SAMPLE_PARENT_BASED=true
# Performance
export AGENTICRAFT_BATCH_SIZE=1024
export AGENTICRAFT_EXPORT_INTERVAL_MS=5000
export AGENTICRAFT_MAX_QUEUE_SIZE=2048
export AGENTICRAFT_MAX_EXPORT_ATTEMPTS=5
# Prometheus
export AGENTICRAFT_PROMETHEUS_PORT=8000
export AGENTICRAFT_PROMETHEUS_HOST=0.0.0.0
# Console exporter
export AGENTICRAFT_CONSOLE_PRETTY_PRINT=true
export AGENTICRAFT_CONSOLE_COLORS=true
# Debug
export AGENTICRAFT_TELEMETRY_DEBUG=true
export OTEL_LOG_LEVEL=debug
Configuration File (.env)¶
# .env file
AGENTICRAFT_TELEMETRY_ENABLED=true
AGENTICRAFT_EXPORTER_TYPE=otlp
AGENTICRAFT_SERVICE_NAME=agent-production
AGENTICRAFT_OTLP_ENDPOINT=telemetry.company.com:4317
AGENTICRAFT_SAMPLE_RATE=0.1
Configuration File (config.yaml)¶
# config.yaml
telemetry:
enabled: true
exporter_type: otlp
service:
name: my-agent-service
version: 1.0.0
environment: production
otlp:
endpoint: localhost:4317
headers:
api-key: your-key
compression: gzip
timeout: 10000
sampling:
rate: 0.1
parent_based: true
performance:
batch_size: 1024
export_interval_ms: 5000
max_queue_size: 2048
resource_attributes:
service.namespace: ai-agents
cloud.provider: aws
cloud.region: us-east-1
Environment-Specific Configurations¶
Development Environment¶
# Development configuration with console output
telemetry = TelemetryConfig(
enabled=True,
exporter_type="console",
console_pretty_print=True,
console_colors=True,
debug=True,
sample_rate=1.0, # Sample everything in dev
auto_instrument=True
)
Testing Environment¶
# Testing configuration - minimal overhead
telemetry = TelemetryConfig(
enabled=False, # Usually disabled in tests
# Or use in-memory exporter for testing
exporter_type="memory",
sample_rate=1.0
)
Staging Environment¶
# Staging configuration - similar to production
telemetry = TelemetryConfig(
enabled=True,
exporter_type="otlp",
service_name="agent-staging",
deployment_environment="staging",
otlp_endpoint="staging-telemetry.company.com:4317",
sample_rate=0.5, # Higher sampling than production
batch_size=512,
export_interval_ms=3000
)
Production Environment¶
# Production configuration - optimized for performance
telemetry = TelemetryConfig(
enabled=True,
exporter_type="otlp",
service_name="agent-production",
service_version=os.getenv("APP_VERSION", "1.0.0"),
deployment_environment="production",
otlp_endpoint="telemetry.company.com:4317",
otlp_headers={
"api-key": os.getenv("TELEMETRY_API_KEY"),
"x-service-auth": os.getenv("SERVICE_AUTH_TOKEN")
},
sample_rate=0.1, # Sample 10% in production
batch_size=2048,
export_interval_ms=10000,
max_queue_size=4096,
max_export_attempts=3,
resource_attributes={
"service.namespace": "ai-platform",
"cloud.provider": "aws",
"cloud.region": os.getenv("AWS_REGION", "us-east-1"),
"deployment.version": os.getenv("DEPLOYMENT_VERSION"),
"k8s.pod.name": os.getenv("HOSTNAME"),
"k8s.namespace": os.getenv("K8S_NAMESPACE")
}
)
Exporter Configurations¶
Console Exporter¶
Best for development and debugging.
telemetry = TelemetryConfig(
enabled=True,
exporter_type="console",
console_pretty_print=True, # Human-readable format
console_colors=True, # Colorized output
console_show_hidden=False, # Show internal spans
console_max_width=120, # Maximum line width
console_indent_size=2, # Indentation spaces
console_show_timestamps=True, # Include timestamps
console_show_duration=True # Show span duration
)
Output format:
[2025-06-14 10:23:45.123] agent.execute (1.234s)
├─ agent.name: ResearchAgent
├─ agent.operation: execute
├─ input.length: 45
├─ [10:23:45.200] tool.execute (0.800s)
│ ├─ tool.name: web_search
│ └─ results.count: 10
└─ output.length: 1250
OTLP Exporter¶
For production use with OpenTelemetry collectors.
telemetry = TelemetryConfig(
enabled=True,
exporter_type="otlp",
# Endpoint configuration
otlp_endpoint="localhost:4317", # gRPC endpoint
# or for HTTP:
# otlp_endpoint="http://localhost:4318/v1/traces",
# Authentication
otlp_headers={
"api-key": "your-api-key",
"x-service-name": "agent-service"
},
# Protocol settings
otlp_protocol="grpc", # or "http/protobuf"
otlp_compression="gzip", # or "none", "deflate"
otlp_timeout=10000, # milliseconds
# Retry configuration
otlp_retry_enabled=True,
otlp_retry_max_attempts=5,
otlp_retry_initial_interval=1000, # ms
otlp_retry_max_interval=32000, # ms
otlp_retry_max_elapsed_time=120000, # ms
# TLS configuration
otlp_insecure=False, # Use TLS
otlp_certificate_path="/path/to/cert.pem",
otlp_client_key_path="/path/to/key.pem",
otlp_client_certificate_path="/path/to/client-cert.pem"
)
Prometheus Exporter¶
For metrics scraping by Prometheus.
telemetry = TelemetryConfig(
enabled=True,
exporter_type="prometheus",
# Server configuration
prometheus_port=8000,
prometheus_host="0.0.0.0", # Bind to all interfaces
prometheus_path="/metrics", # Metrics endpoint path
# Metric configuration
prometheus_namespace="agenticraft",
prometheus_subsystem="agents",
# Histogram buckets
prometheus_histogram_buckets={
"latency": [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
"token_count": [10, 50, 100, 500, 1000, 5000, 10000]
},
# Labels to include
prometheus_default_labels={
"environment": "production",
"region": "us-east-1"
}
)
Prometheus configuration:
# prometheus.yml
scrape_configs:
- job_name: 'agenticraft'
static_configs:
- targets: ['localhost:8000']
scrape_interval: 15s
metrics_path: '/metrics'
Multiple Exporters¶
Export to multiple destinations simultaneously.
from agenticraft.telemetry import TelemetryConfig, CompositeExporter
telemetry = TelemetryConfig(
enabled=True,
exporter_type="composite",
exporters=[
{
"type": "console",
"config": {"pretty_print": True}
},
{
"type": "otlp",
"config": {
"endpoint": "localhost:4317",
"headers": {"api-key": "key1"}
}
},
{
"type": "prometheus",
"config": {"port": 8000}
}
]
)
Sampling Configuration¶
Basic Sampling¶
# Fixed rate sampling - 10% of traces
telemetry = TelemetryConfig(
sample_rate=0.1
)
# Always sample
telemetry = TelemetryConfig(
sample_rate=1.0
)
# Never sample (metrics still collected)
telemetry = TelemetryConfig(
sample_rate=0.0
)
Advanced Sampling¶
from agenticraft.telemetry import (
TelemetryConfig,
CompositeSampler,
RateLimitingSampler,
AttributeBasedSampler
)
# Rate limiting sampler - max 100 traces per second
rate_limiter = RateLimitingSampler(max_traces_per_second=100)
# Attribute-based sampling
attribute_sampler = AttributeBasedSampler(
rules=[
# Always sample errors
{"attribute": "error", "value": True, "sample_rate": 1.0},
# Sample 50% of high-priority operations
{"attribute": "priority", "value": "high", "sample_rate": 0.5},
# Sample 1% of low-priority operations
{"attribute": "priority", "value": "low", "sample_rate": 0.01},
# Default sampling rate
{"default": True, "sample_rate": 0.1}
]
)
# Composite sampler
telemetry = TelemetryConfig(
sampler=CompositeSampler([rate_limiter, attribute_sampler])
)
Parent-Based Sampling¶
# Honor parent sampling decision
telemetry = TelemetryConfig(
sample_parent_based=True,
sample_rate=0.1 # For root spans
)
Performance Tuning¶
Batch Processing¶
telemetry = TelemetryConfig(
# Batch configuration
batch_size=2048, # Spans per batch
export_interval_ms=10000, # Export every 10 seconds
max_queue_size=8192, # Maximum queued spans
# Export behavior
max_export_attempts=3, # Retry failed exports
export_timeout_ms=30000, # Export timeout
# Memory limits
max_span_attributes=128, # Max attributes per span
max_span_events=128, # Max events per span
max_span_links=128, # Max links per span
max_attribute_length=4096 # Max attribute value length
)
Resource Optimization¶
# Minimal resource usage
telemetry = TelemetryConfig(
# Reduce memory usage
batch_size=256,
max_queue_size=1024,
export_interval_ms=30000, # Export less frequently
# Limit span data
max_span_attributes=32,
max_attribute_length=1024,
# Aggressive sampling
sample_rate=0.01, # 1% sampling
# Disable features
auto_instrument_tools=False,
auto_instrument_memory=False,
record_token_usage=False
)
Security Configuration¶
API Key Management¶
# Using environment variables (recommended)
telemetry = TelemetryConfig(
otlp_headers={
"api-key": os.getenv("TELEMETRY_API_KEY"),
"x-api-secret": os.getenv("TELEMETRY_API_SECRET")
}
)
# Using key vault
from azure.keyvault.secrets import SecretClient
client = SecretClient(vault_url, credential)
api_key = client.get_secret("telemetry-api-key").value
telemetry = TelemetryConfig(
otlp_headers={"api-key": api_key}
)
Data Privacy¶
from agenticraft.telemetry import TelemetryConfig, AttributeFilter
# Filter sensitive attributes
telemetry = TelemetryConfig(
attribute_filters=[
# Remove PII
AttributeFilter(
action="remove",
attributes=["user.email", "user.id", "user.name"]
),
# Hash sensitive values
AttributeFilter(
action="hash",
attributes=["session.id", "request.ip"]
),
# Redact patterns
AttributeFilter(
action="redact",
pattern=r"\b\d{3}-\d{2}-\d{4}\b", # SSN pattern
replacement="***-**-****"
)
]
)
Kubernetes Configuration¶
ConfigMap¶
# telemetry-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: agenticraft-telemetry
data:
AGENTICRAFT_TELEMETRY_ENABLED: "true"
AGENTICRAFT_EXPORTER_TYPE: "otlp"
AGENTICRAFT_OTLP_ENDPOINT: "otel-collector.observability:4317"
AGENTICRAFT_SERVICE_NAME: "agent-service"
AGENTICRAFT_SAMPLE_RATE: "0.1"
Deployment¶
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: agent-service
spec:
template:
spec:
containers:
- name: agent
image: agent-service:latest
envFrom:
- configMapRef:
name: agenticraft-telemetry
env:
- name: AGENTICRAFT_SERVICE_VERSION
value: "1.0.0"
- name: AGENTICRAFT_OTLP_HEADERS
valueFrom:
secretKeyRef:
name: telemetry-secrets
key: api-key
Docker Configuration¶
Docker Compose¶
# docker-compose.yml
version: '3.8'
services:
agent-service:
image: agent-service:latest
environment:
AGENTICRAFT_TELEMETRY_ENABLED: "true"
AGENTICRAFT_EXPORTER_TYPE: "otlp"
AGENTICRAFT_OTLP_ENDPOINT: "otel-collector:4317"
AGENTICRAFT_SERVICE_NAME: "agent-service"
AGENTICRAFT_SAMPLE_RATE: "0.1"
depends_on:
- otel-collector
otel-collector:
image: otel/opentelemetry-collector:latest
ports:
- "4317:4317" # OTLP gRPC
- "4318:4318" # OTLP HTTP
volumes:
- ./otel-config.yaml:/etc/otel-collector-config.yaml
command: ["--config=/etc/otel-collector-config.yaml"]
Dockerfile¶
FROM python:3.11-slim
# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy application
COPY . /app
WORKDIR /app
# Set telemetry defaults
ENV AGENTICRAFT_TELEMETRY_ENABLED=true \
AGENTICRAFT_EXPORTER_TYPE=otlp \
AGENTICRAFT_SERVICE_NAME=agent-service
# Run application
CMD ["python", "-m", "agenticraft"]
Debugging Configuration¶
Enable Debug Logging¶
import logging
# Enable debug logging for telemetry
logging.getLogger("agenticraft.telemetry").setLevel(logging.DEBUG)
logging.getLogger("opentelemetry").setLevel(logging.DEBUG)
telemetry = TelemetryConfig(
enabled=True,
debug=True,
debug_include_internal_spans=True,
debug_print_exports=True
)
Troubleshooting Exporter¶
# Use troubleshooting exporter to diagnose issues
telemetry = TelemetryConfig(
enabled=True,
exporter_type="debug",
debug_export_path="/tmp/telemetry-debug.json",
debug_include_failed_exports=True
)
Configuration Validation¶
from agenticraft.telemetry import TelemetryConfig, validate_config
# Validate configuration before initialization
config = TelemetryConfig(
enabled=True,
exporter_type="otlp",
otlp_endpoint="localhost:4317"
)
errors = validate_config(config)
if errors:
print("Configuration errors:", errors)
else:
config.initialize()
Best Practices¶
-
Use environment variables for sensitive data
-
Set appropriate sampling rates
- Development: 100% (1.0)
- Staging: 50% (0.5)
-
Production: 1-10% (0.01-0.1)
-
Configure batch sizes based on traffic
- Low traffic: 256-512
- Medium traffic: 1024-2048
-
High traffic: 4096-8192
-
Use resource attributes for filtering
-
Monitor telemetry health