# Monitoring & Observability

When generating thousands of conversations, you need visibility into the process. Is it working? How fast is it? Are errors occurring? Afterimage provides a robust, thread-safe **Monitoring System** to track these metrics in real time, visualize them, and export them for analysis.

## `GenerationMonitor`

The central component is the `GenerationMonitor`. It collects performance, health, and quality metrics from the generator and routes them to various handlers (files, logs, or custom dashboards).

### Initialization

You can attach a monitor to any generator (`ConversationGenerator`, `PersonaGenerator`, etc.) or to **OpenSimula** via `OpenSimula(..., monitor=monitor)` so taxonomy, sampling, meta-prompt, critic, and task JSON calls are recorded with `component="opensimula"` metadata. See [OpenSimula](opensimula.md) for operation naming and how that fits alongside conversation generation. The monitor uses background threads to process metrics without blocking the main generation loop.

```python
from afterimage import ConversationGenerator, GenerationMonitor

# 1. Initialize the monitor.
# This writes metrics to metrics.jsonl and logs to afterimage.log under log_dir.
# If log_dir is omitted, a timestamped directory is created under
# ./.afterimage-monitoring/
monitor = GenerationMonitor(
    log_dir="./logs",
    metrics_interval=60,  # seconds between built-in alert rule runs
                          # (rolling windows stay 5 minutes)
)

# 2. Attach it to a generator.
generator = ConversationGenerator(
    ...,
    monitor=monitor,
)
```

### Metrics Tracked

The monitor automatically tracks a wide range of metrics:

* **Performance**:
    * `generation_time`: Time taken to generate one conversation.
    * `prompt_token_count`: Input tokens used.
    * `completion_token_count`: Output tokens generated.
    * `total_token_count`: Total token usage.
    * `conversation_length`: Number of turns in the generated conversation.
* **Health**:
    * `success_rate`: Binary tracking of successful generations (1.0) vs. failures (0.0).
    * `error_rate`: Binary tracking of errors.
    * `api_errors`: Specific API failures.
* **Quality** (if evaluation is running):
    * `evaluation_score_*`: Scores from evaluators (e.g., `evaluation_score_coherence`).
    * `evaluation_time`: Time taken for evaluation steps.

### Exporting Data

You can export collected metrics to various formats for external analysis (e.g., in Jupyter notebooks or Excel).

```python
# Export to JSON
monitor.export_metrics("metrics_export.json", format="json")

# Export to CSV (creates separate files for each metric type)
monitor.export_metrics("metrics_export.csv", format="csv")

# Export to Excel (creates a multi-sheet workbook)
monitor.export_metrics("metrics_report.xlsx", format="excel")

# Export to Parquet (efficient binary format)
monitor.export_metrics("metrics.parquet", format="parquet")
```

You can also filter exports by a time window:

```python
from datetime import timedelta

# Export only the last hour of data
monitor.export_metrics("last_hour.csv", format="csv", window=timedelta(hours=1))
```

## Visualization

The `GenerationMonitor` has built-in plotting capabilities using `matplotlib` and `seaborn`. It can generate a suite of plots to help you understand your generation run.

```python
# Generate and save all standard plots to the log directory
monitor.visualize_metrics()

# Or specify a custom directory
monitor.visualize_metrics(save_dir="./plots")
```

The standard visualizations include:

1. **Success/Error Rate Over Time**: Rolling averages of success and failure rates.
2. **Generation Time Distribution**: Histogram of latencies.
3. **Token Usage Over Time**: Trends for prompt, completion, and total tokens.
4. **Evaluation Scores Over Time**: Trends for quality metrics.
5. **Evaluation Time Distribution**: Histogram of evaluation latencies.
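Exported data can also be inspected without pandas or Excel. The sketch below summarizes one metric from a JSONL file using only the standard library; the record shape (`metric` and `value` fields) is an illustrative assumption — check your actual `metrics.jsonl` for the real schema.

```python
import json
import statistics
from pathlib import Path
from tempfile import TemporaryDirectory

def summarize_metric(path: Path, name: str) -> dict:
    """Collect all values recorded for one metric and summarize them."""
    values = []
    with path.open() as fh:
        for line in fh:
            record = json.loads(line)
            if record.get("metric") == name:
                values.append(record["value"])
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }

with TemporaryDirectory() as tmp:
    path = Path(tmp) / "metrics.jsonl"
    # Synthetic records standing in for a real metrics file.
    for v in [4.0, 5.0, 3.0, 6.0]:
        with path.open("a") as fh:
            fh.write(json.dumps({"metric": "generation_time", "value": v}) + "\n")
    summary = summarize_metric(path, "generation_time")
    print(summary)  # {'count': 4, 'mean': 4.5, 'max': 6.0}
```

For heavier analysis, the same file loads directly into a DataFrame with `pandas.read_json(path, lines=True)`.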
## Alerts

`GenerationMonitor` exposes `alert_handlers` plus threshold kwargs (`alert_min_success_rate`, `alert_max_generation_time_seconds`, `alert_max_error_rate`, token means, and `alert_max_conversation_length_mean`). A background **`_alert_worker`** thread wakes every **`metrics_interval`** seconds (minimum one second) and runs the same logic as **`check_alerts()`**, evaluating rolling **five-minute means** from `get_metrics`. The default rules fire on:

* Success rate below **0.8**
* Mean generation time above **30 s**
* Mean error rate above **0.2**
* Mean prompt / completion / total token counts above **4096 / 4096 / 8192**
* Mean `conversation_length` above **2** turns (`long_conversations`)

Set **`metrics_interval=0`** to disable only the periodic alert thread (metric and log workers are unchanged). To run the rules on demand from your own code, call **`monitor.check_alerts()`**.

### Custom alert handlers

Register callables that accept an `Alert` dataclass (`name`, `message`, `level`, `timestamp`, `data`). They are invoked whenever a built-in rule fires (including on each periodic pass while the condition remains true).

```python
def stop_on_critical_error(alert):
    if alert.level == "error":
        print(f"CRITICAL ALERT: {alert.name} - {alert.message}")

monitor = GenerationMonitor(
    log_dir="./logs",
    alert_handlers=[stop_on_critical_error],
)
```

## Internals & Extensibility (For Developers)

### Threading Model

The `GenerationMonitor` uses a producer-consumer architecture to ensure monitoring does not impact generation performance.

* **Producers**: `record_metric`, `log_info`, etc., simply put items into a thread-safe `queue.Queue`.
* **Consumers**: Background worker threads (`_metric_worker`, `_log_worker`) pull items from the queues and process them (writing to files, forwarding to handlers). When `metrics_interval > 0`, **`_alert_worker`** also runs and may invoke **`alert_handlers`**.
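The producer-consumer pattern described above can be sketched in isolation. This is a simplified stand-in for the worker loop, not Afterimage's actual code: a producer enqueues metrics without blocking on I/O, a consumer thread drains the queue, and a `None` sentinel triggers shutdown.

```python
import queue
import threading

metric_queue: "queue.Queue" = queue.Queue()
processed = []  # stand-in for file/handler output

def _metric_worker() -> None:
    # Consumer: drain the queue until a None sentinel arrives.
    while True:
        item = metric_queue.get()
        if item is None:
            break
        name, value = item
        processed.append((name, value))
        metric_queue.task_done()

worker = threading.Thread(target=_metric_worker, daemon=True)
worker.start()

def record_metric(name: str, value: float) -> None:
    # Producer: just enqueue; never touches the filesystem or network.
    metric_queue.put((name, value))

record_metric("generation_time", 4.2)
record_metric("success_rate", 1.0)

metric_queue.put(None)  # sentinel to shut the worker down
worker.join()
print(processed)  # [('generation_time', 4.2), ('success_rate', 1.0)]
```

Because producers only pay the cost of a lock-protected `put`, a slow disk or handler never stalls the generation loop — backpressure shows up as queue depth instead.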
### Custom Handlers

By default, the monitor uses `FileMetricHandler` and `FileLogHandler`. You can implement your own handlers (e.g., to send metrics to Datadog, Prometheus, or WandB) by implementing the `MetricHandler` or `LogHandler` protocols.

```python
from typing import Any, Dict

class WandBMetricHandler:
    def handle_metric(self, metric_name: str, value: float, metadata: Dict[str, Any]) -> None:
        import wandb
        wandb.log({metric_name: value, **metadata})

monitor = GenerationMonitor(
    metric_handlers=[WandBMetricHandler()],
)
```
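As a dependency-free variant, a handler with the same `handle_metric` signature can simply aggregate values in memory — useful in unit tests or ad-hoc scripts. The aggregation logic below is our own illustration, not part of Afterimage.

```python
from collections import defaultdict
from typing import Any, Dict, List

class InMemoryMetricHandler:
    """Collects metrics in memory for tests or quick inspection."""

    def __init__(self) -> None:
        self.values: Dict[str, List[float]] = defaultdict(list)

    def handle_metric(self, metric_name: str, value: float, metadata: Dict[str, Any]) -> None:
        # Record the raw sample; metadata is ignored in this sketch.
        self.values[metric_name].append(value)

    def mean(self, metric_name: str) -> float:
        samples = self.values[metric_name]
        return sum(samples) / len(samples)

handler = InMemoryMetricHandler()
handler.handle_metric("generation_time", 4.0, {})
handler.handle_metric("generation_time", 6.0, {})
print(handler.mean("generation_time"))  # 5.0
```

Such a handler can be passed alongside (not instead of) the default file handlers via `metric_handlers=[...]` to assert on recorded metrics in tests.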