# Architecture & Design

This document details the internal architecture of the Afterimage library. It is intended for advanced users who want to extend the library or understand its internals.

## System Overview

Afterimage is designed as a modular pipeline for synthetic data generation. The core philosophy is **composition over inheritance**—you build a generator by composing different strategies for prompts, instructions, and storage.

### Core Components

1. **Generators (`BaseGenerator`)**: The orchestrators. They manage the main loop, concurrency, and state.
    * `ConversationGenerator` (exported as `AsyncConversationGenerator` for backward compatibility): multi-turn dialogs.
    * `StructuredGenerator` (alias `AsyncStructuredGenerator`): single-turn structured output.
2. **Instruction Generators (`BaseInstructionGeneratorCallback`)**: Strategies for "What to ask".
    * Responsible for producing the initial user instruction/question.
    * Can have internal state (e.g., to ensure coverage of a document set).
3. **Prompt Modifiers (`BaseRespondentPromptModifierCallback`)**: Strategies for "What to know".
    * Responsible for modifying the system prompt of the assistant at runtime.
    * Used for RAG (injecting context) or Persona adoption.
    * **Session-scoped retrieval:** `WithRAGRespondentPromptModifier` runs once per sampled instruction (before the multi-turn `go()` loop). Retrieved text is fixed for that dialog unless you add per-turn hooks or a future session driver (see the *Conversation Generation* docs).
    * **Retriever protocol:** Implement `get_context(query) -> str`, optionally `aget_context`, and optionally `get_context_with_metadata` / `aget_context_with_metadata` returning `RetrievalResult` (`afterimage.retrievers`) so hit ids and scores can appear under `GeneratedResponsePrompt.metadata["retrieval"]`. The canonical empty-hit string is `NO_RETRIEVAL_CONTEXT`.
    * **Qdrant async I/O:** `QdrantRetriever` uses the sync client's `query_points` for the `get_context` paths and, when you pass `async_client` (`qdrant_client.AsyncQdrantClient`), awaits `query_points` on that client for `aget_context*`, so HTTP work stays off the event loop without relying on `asyncio.to_thread` for Qdrant calls.
4. **Storage (`BaseStorage`)**: Persistence layer.
    * Decoupled from generation logic.
    * Can be swapped (JSONL vs SQL) without changing the generator.
5. **LLM Abstraction Layer (`afterimage.providers.llm_providers`)**:
    * **Uniform Interface**: The `LLMProvider` protocol normalizes interactions across models (Gemini, OpenAI, etc.).
    * **Unified Responses**: Returns standardized `LLMResponse` or `StructuredLLMResponse` objects with consistent token counts and usage metadata.
    * **Chat Abstraction**: `ChatSession` manages conversation history statefully, independent of the underlying API's specific mechanics.
    * **Factory Creation**: `LLMFactory` allows dynamic instantiation of providers via strings.

## Extension Points

Afterimage is designed to be extended. Here are the common patterns:

### Custom Instruction Generator

If you want to generate instructions from a custom source (e.g., a live API or a specific algorithm), subclass `BaseInstructionGeneratorCallback`.

```python
from afterimage.base import BaseInstructionGeneratorCallback
from afterimage.common import GeneratedInstructions


class MyCustomInstructionGenerator(BaseInstructionGeneratorCallback):
    async def agenerate(self, original_prompt: str) -> GeneratedInstructions:
        # Your logic here — return at least one instruction string in `instructions`
        return GeneratedInstructions(
            instructions=["Tell me a joke about API limits."],
            context="System load is high.",
        )
```

### Custom Storage

To save data to a custom backend (e.g., S3, Mongo, or a specific API endpoint), implement the `BaseStorage` protocol.
```python
from afterimage.storage import BaseStorage


class MyCloudStorage(BaseStorage):
    """Implement every method on :class:`~afterimage.storage.BaseStorage` (sync + async + documents)."""

    def save_conversations(self, conversations):
        raise NotImplementedError

    async def asave_conversations(self, conversations):
        # Push to cloud
        pass

    def load_conversations(self, limit=None, offset=None):
        return []

    def load_documents(self, limit=None, offset=None):
        return []

    def save_documents(self, documents):
        raise NotImplementedError

    async def asave_documents(self, documents):
        pass
```

### Custom LLM Provider

To support a new model family (e.g., Anthropic, Mistral, or a local vLLM), implement the `LLMProvider` protocol. You must also implement a corresponding `ChatSession`.

```python
from afterimage.providers import ChatSession, LLMProvider
from afterimage.providers.llm_providers import LLMResponse


class MyCustomChat(ChatSession):
    async def asend_message(self, message, **kwargs) -> LLMResponse:
        # Implement stateful chat logic
        raise NotImplementedError


class MyCustomProvider:
    """Satisfy :class:`~afterimage.providers.llm_providers.LLMProvider` (structural typing)."""

    async def agenerate_content(self, prompt: str, **kwargs) -> LLMResponse:
        return LLMResponse(
            text="response",
            prompt_token_count=10,
            completion_token_count=10,
            total_token_count=20,
            finish_reason="stop",
            model_name="my-model",
            raw_response={},
        )

    def generate_content(self, prompt: str, **kwargs) -> LLMResponse:
        raise NotImplementedError

    async def agenerate_structured(self, prompt: str, schema, **kwargs):
        raise NotImplementedError

    def generate_structured(self, prompt: str, schema, **kwargs):
        raise NotImplementedError

    def start_chat(self, **kwargs) -> ChatSession:
        return MyCustomChat()

    async def astart_chat(self, **kwargs) -> ChatSession:
        return MyCustomChat()
```

**Developer Tips for LLM Providers:**

* **Async Support**: Always implement both sync and async methods. The library core relies heavily on `agenerate_content` for performance.
* **Token Counting**: Ensure you populate token counts in `LLMResponse`. This is critical for the `GenerationMonitor` to track costs and throughput.
* **Structured Output**: For `generate_structured`, leveraging Pydantic is highly recommended. If the underlying API doesn't support JSON schema natively, use a robust parser or a library such as Instructor.
* **Error Handling**: Wrap your API calls in try/except blocks and call `SmartKeyPool.report_error(key)` when an API error occurs, so the pool can rotate keys or back off.

## Design Patterns

* **Async-First**: The library is built from the ground up on `asyncio` for high throughput.
* **Callback Pattern**: Logic is injected via callbacks rather than by subclassing the generator itself.
* **Pydantic Models**: All data exchange (config, inputs, outputs) is validated using Pydantic models for type safety.
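The "structural typing" note on `MyCustomProvider` is worth unpacking: a provider never inherits from `LLMProvider`; any object that exposes the right methods satisfies the protocol. A minimal, self-contained sketch of that pattern using `typing.Protocol` (all names below are illustrative, not part of Afterimage):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Greeter(Protocol):
    """A minimal protocol: any class with a matching `greet` method satisfies it."""

    def greet(self, name: str) -> str: ...


class PlainGreeter:  # note: no inheritance from Greeter
    def greet(self, name: str) -> str:
        return f"hello, {name}"


g = PlainGreeter()
# isinstance() works because Greeter is @runtime_checkable (membership is
# checked by method name, not by signature).
print(isinstance(g, Greeter))  # True
```

This is why `MyCustomProvider` in the example above carries no base class: as long as it implements the protocol's methods, it can be passed anywhere an `LLMProvider` is expected.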
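The async-first pattern can be illustrated with a generic bounded fan-out loop, the usual shape of an `asyncio`-based generator's main loop. This sketch is pure stdlib and does not use Afterimage APIs; `generate_one`, the concurrency limit, and all names are illustrative:

```python
import asyncio


async def generate_one(i: int) -> str:
    """Stand-in for one generation task (e.g., one conversation)."""
    await asyncio.sleep(0)  # yield to the event loop, as a real API call would
    return f"sample-{i}"


async def generate_many(n: int, limit: int = 4) -> list[str]:
    # A semaphore caps the number of in-flight tasks while
    # asyncio.gather keeps results in submission order.
    sem = asyncio.Semaphore(limit)

    async def bounded(i: int) -> str:
        async with sem:
            return await generate_one(i)

    return await asyncio.gather(*(bounded(i) for i in range(n)))


results = asyncio.run(generate_many(8))
print(len(results))  # 8
```

Capping concurrency with a semaphore rather than batching keeps the pipeline saturated: as soon as one slow task finishes, the next one starts, which matters when individual LLM calls have highly variable latency.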