Architecture & Design

This document details the internal architecture of the Afterimage library. It is intended for advanced users who want to extend the library or understand its internals.

System Overview

Afterimage is designed as a modular pipeline for synthetic data generation. The core philosophy is composition over inheritance—you build a generator by composing different strategies for prompts, instructions, and storage.

Core Components

  1. Generators (BaseGenerator): The orchestrators. They manage the main loop, concurrency, and state.

    • ConversationGenerator (exported as AsyncConversationGenerator for backward compatibility): multi-turn dialogs.

    • StructuredGenerator (alias AsyncStructuredGenerator): single-turn structured output.

  2. Instruction Generators (BaseInstructionGeneratorCallback): Strategies for “What to ask”.

    • Responsible for producing the initial user instruction/question.

    • Can have internal state (e.g., to ensure coverage of a document set).

  3. Prompt Modifiers (BaseRespondentPromptModifierCallback): Strategies for “What to know”.

    • Responsible for modifying the system prompt of the assistant at runtime.

    • Used for RAG (injecting context) or Persona adoption.

    • Session-scoped retrieval: WithRAGRespondentPromptModifier runs once per sampled instruction (before the multi-turn go() loop). Retrieved text is fixed for that dialog unless you add per-turn hooks or a future session driver (see Conversation Generation docs).

    • Retriever protocol: Implement get_context(query) -> str, optionally aget_context, and optionally get_context_with_metadata / aget_context_with_metadata returning RetrievalResult (afterimage.retrievers) so hit ids and scores can appear under GeneratedResponsePrompt.metadata["retrieval"]. The canonical empty-hit string is NO_RETRIEVAL_CONTEXT.

    • Qdrant async I/O: QdrantRetriever uses the sync client’s query_points for get_context paths and, when you pass async_client (qdrant_client.AsyncQdrantClient), awaits query_points on that client for aget_context* so HTTP work stays off the event loop without relying on asyncio.to_thread for Qdrant calls.

  4. Storage (BaseStorage): Persistence layer.

    • Decoupled from generation logic.

    • Can be swapped (JSONL vs SQL) without changing the generator.

  5. LLM Abstraction Layer (afterimage.providers.llm_providers):

    • Uniform Interface: LLMProvider protocol normalizes interactions across models (Gemini, OpenAI, etc.).

    • Unified Responses: Returns standardized LLMResponse or StructuredLLMResponse objects with consistent token counts and usage metadata.

    • Chat Abstraction: ChatSession manages conversation history statefully, independent of the underlying API’s specific mechanics.

    • Factory Creation: LLMFactory allows dynamic instantiation of providers via strings.
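
The retriever protocol in point 3 is small enough to sketch. The snippet below is an illustrative stand-in, not library code: in the real library, NO_RETRIEVAL_CONTEXT and RetrievalResult come from afterimage.retrievers, and here they are replaced with local placeholders so the example is self-contained.

```python
from dataclasses import dataclass, field

# Local stand-ins for afterimage.retrievers.NO_RETRIEVAL_CONTEXT and
# RetrievalResult; the real library provides both.
NO_RETRIEVAL_CONTEXT = "NO_CONTEXT_FOUND"

@dataclass
class RetrievalResult:
    context: str
    hits: list = field(default_factory=list)  # e.g. [{"id": ..., "score": ...}]

class InMemoryRetriever:
    """Toy retriever: naive substring match over an in-memory document map."""

    def __init__(self, documents: dict[str, str]):
        self.documents = documents  # doc id -> text

    def get_context(self, query: str) -> str:
        return self.get_context_with_metadata(query).context

    def get_context_with_metadata(self, query: str) -> RetrievalResult:
        hits = [
            {"id": doc_id, "score": 1.0}
            for doc_id, text in self.documents.items()
            if query.lower() in text.lower()
        ]
        if not hits:
            return RetrievalResult(context=NO_RETRIEVAL_CONTEXT)
        context = "\n".join(self.documents[h["id"]] for h in hits)
        return RetrievalResult(context=context, hits=hits)

    async def aget_context(self, query: str) -> str:
        # Cheap in-memory lookup; a real retriever would await network I/O here.
        return self.get_context(query)
```

An object shaped like this can be handed to WithRAGRespondentPromptModifier; because it also implements get_context_with_metadata, hit ids and scores would surface under GeneratedResponsePrompt.metadata["retrieval"].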

Extension Points

Afterimage is designed to be extended. Here are the common patterns:

Custom Instruction Generator

If you want to generate instructions from a custom source (e.g., a live API or a specific algorithm), subclass BaseInstructionGeneratorCallback.

from afterimage.base import BaseInstructionGeneratorCallback
from afterimage.common import GeneratedInstructions

class MyCustomInstructionGenerator(BaseInstructionGeneratorCallback):
    async def agenerate(self, original_prompt: str) -> GeneratedInstructions:
        # Your logic here — return at least one instruction string in `instructions`
        return GeneratedInstructions(
            instructions=["Tell me a joke about API limits."],
            context="System load is high.",
        )

Custom Storage

To save data to a custom backend (e.g., S3, Mongo, or a specific API endpoint), implement the BaseStorage protocol.

from afterimage.storage import BaseStorage

class MyCloudStorage(BaseStorage):
    """Implement every method on :class:`~afterimage.storage.BaseStorage` (sync + async + documents)."""

    def save_conversations(self, conversations):
        raise NotImplementedError

    async def asave_conversations(self, conversations):
        # Push to cloud
        pass

    def load_conversations(self, limit=None, offset=None):
        return []

    def load_documents(self, limit=None, offset=None):
        return []

    def save_documents(self, documents):
        raise NotImplementedError

    async def asave_documents(self, documents):
        pass

Custom LLM Provider

To support a new model family (e.g., Anthropic, Mistral, or a local vLLM server), implement the LLMProvider protocol. You must also implement a corresponding ChatSession.

from afterimage.providers import ChatSession, LLMProvider
from afterimage.providers.llm_providers import LLMResponse


class MyCustomChat(ChatSession):
    async def asend_message(self, message, **kwargs) -> LLMResponse:
        # Implement stateful chat logic
        raise NotImplementedError


class MyCustomProvider:
    """Satisfy :class:`~afterimage.providers.llm_providers.LLMProvider` (structural typing)."""

    async def agenerate_content(self, prompt: str, **kwargs) -> LLMResponse:
        return LLMResponse(
            text="response",
            prompt_token_count=10,
            completion_token_count=10,
            total_token_count=20,
            finish_reason="stop",
            model_name="my-model",
            raw_response={},
        )

    def generate_content(self, prompt: str, **kwargs) -> LLMResponse:
        raise NotImplementedError

    async def agenerate_structured(self, prompt: str, schema, **kwargs):
        raise NotImplementedError

    def generate_structured(self, prompt: str, schema, **kwargs):
        raise NotImplementedError

    def start_chat(self, **kwargs) -> ChatSession:
        return MyCustomChat()

    async def astart_chat(self, **kwargs) -> ChatSession:
        return MyCustomChat()

Developer Tips for LLM Providers:

  • Async Support: Always implement both sync and async methods. The library core relies heavily on agenerate_content for performance.

  • Token Counting: Ensure you populate token counts in LLMResponse. This is critical for the GenerationMonitor to track costs and throughput.

  • Structured Output: For generate_structured, leveraging Pydantic is highly recommended. If the underlying API doesn’t support JSON schema natively, use a robust parser or a library such as instructor.

  • Error Handling: Wrap your API calls in try/except blocks and use SmartKeyPool.report_error(key) if an API error occurs, so the pool can rotate keys or back off.
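
The error-handling tip can be illustrated with a toy pool. This is a hedged sketch of the rotate-on-error pattern only, not the SmartKeyPool implementation; the class and function names below are hypothetical stand-ins.

```python
import itertools

class ToyKeyPool:
    """Minimal stand-in for a key pool: round-robin keys, bench ones that error."""

    def __init__(self, keys, max_errors: int = 3):
        self._errors = {k: 0 for k in keys}
        self._max_errors = max_errors
        self._cycle = itertools.cycle(keys)

    def get_key(self) -> str:
        # Skip keys that have exceeded their error budget.
        for _ in range(len(self._errors)):
            key = next(self._cycle)
            if self._errors[key] < self._max_errors:
                return key
        raise RuntimeError("all keys exhausted")

    def report_error(self, key: str) -> None:
        self._errors[key] += 1

    def report_success(self, key: str) -> None:
        self._errors[key] = 0

def call_with_rotation(pool: ToyKeyPool, call):
    """Try keys until one succeeds or the pool is exhausted."""
    while True:
        key = pool.get_key()  # raises RuntimeError once no key is usable
        try:
            result = call(key)
            pool.report_success(key)
            return result
        except ConnectionError:
            pool.report_error(key)
```

In a real provider, the `call(key)` body would be the API request made with that key, wrapped in try/except exactly as the tip describes.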

Design Patterns

  • Async-First: The library is built from the ground up using asyncio for high throughput.

  • Callback Pattern: Logic is injected via callbacks rather than subclassing the generator itself.

  • Pydantic Models: All data exchange (config, inputs, outputs) is validated using Pydantic models for type safety.
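
The three patterns above combine naturally. The following miniature, with entirely hypothetical names, shows the composition-over-inheritance style: the generator owns the async loop, while “what to ask”, “what to know”, and persistence are injected as callbacks rather than baked in via subclassing.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

@dataclass
class ToyGenerator:
    """Hypothetical miniature of the library's composition style."""
    make_instruction: Callable[[], Awaitable[str]]   # "what to ask"
    modify_prompt: Callable[[str], str]              # "what to know"
    save: Callable[[list], None]                     # persistence

    async def run(self, n: int) -> list:
        records = []
        for _ in range(n):
            instruction = await self.make_instruction()
            records.append(self.modify_prompt(instruction))
        self.save(records)
        return records

async def ask() -> str:
    return "Summarise the document."

store: list = []

gen = ToyGenerator(
    make_instruction=ask,
    modify_prompt=lambda s: f"[persona: pirate] {s}",
    save=store.extend,
)
results = asyncio.run(gen.run(2))
```

Swapping any strategy (a different instruction source, a different persona, a different storage backend) changes one constructor argument and leaves the loop untouched, which is the point of the callback pattern.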