Architecture & Design

This document details the internal architecture of the Afterimage library. It is intended for advanced users who want to extend the library or understand its internals.

System Overview

Afterimage is designed as a modular pipeline for synthetic data generation. The core philosophy is composition over inheritance—you build a generator by composing different strategies for prompts, instructions, and storage.

Core Components

  1. Generators (BaseGenerator): The orchestrators. They manage the main loop, concurrency, and state.

    • ConversationGenerator (exported as AsyncConversationGenerator for backward compatibility): multi-turn dialogs.

    • StructuredGenerator (alias AsyncStructuredGenerator): single-turn structured output.

  2. Instruction Generators (BaseInstructionGeneratorCallback): Strategies for “What to ask”.

    • Responsible for producing the initial user instruction/question.

    • Can have internal state (e.g., to ensure coverage of a document set).

  3. Prompt Modifiers (BaseRespondentPromptModifierCallback): Strategies for “What to know”.

    • Responsible for modifying the system prompt of the assistant at runtime.

    • Used for RAG (injecting context) or Persona adoption.

    • Session-scoped retrieval: WithRAGRespondentPromptModifier runs once per sampled instruction (before the multi-turn go() loop). Retrieved text is fixed for that dialog unless you add per-turn hooks or a future session driver (see Conversation Generation docs).

    • Retriever protocol: Implement get_context(query) -> str, optionally aget_context, and optionally get_context_with_metadata / aget_context_with_metadata returning RetrievalResult (afterimage.retrievers) so hit ids and scores can appear under GeneratedResponsePrompt.metadata["retrieval"]. The canonical empty-hit string is NO_RETRIEVAL_CONTEXT.

    • Qdrant async I/O: QdrantRetriever uses the sync client’s query_points for get_context paths and, when you pass async_client (qdrant_client.AsyncQdrantClient), awaits query_points on that client for aget_context* so HTTP work stays off the event loop without relying on asyncio.to_thread for Qdrant calls.

  4. Storage (BaseStorage): Persistence layer.

    • Decoupled from generation logic.

    • Can be swapped (JSONL vs SQL) without changing the generator.

  5. LLM Abstraction Layer (afterimage.providers.llm_providers):

    • Uniform Interface: LLMProvider protocol normalizes interactions across models (Gemini, OpenAI, etc.).

    • Unified Responses: Returns standardized LLMResponse or StructuredLLMResponse objects with consistent token counts and usage metadata.

    • Chat Abstraction: ChatSession manages conversation history statefully, independent of the underlying API’s specific mechanics.

    • Factory Creation: LLMFactory allows dynamic instantiation of providers via strings.
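
The retriever protocol in point 3 is small enough to sketch. The snippet below is an illustrative stand-in, not library code: in the real library, NO_RETRIEVAL_CONTEXT and RetrievalResult come from afterimage.retrievers, and here they are replaced with local placeholders so the example is self-contained.

```python
from dataclasses import dataclass, field

# Local stand-ins for afterimage.retrievers.NO_RETRIEVAL_CONTEXT and
# RetrievalResult; the real library provides both.
NO_RETRIEVAL_CONTEXT = "NO_CONTEXT_FOUND"

@dataclass
class RetrievalResult:
    context: str
    hits: list = field(default_factory=list)  # e.g. [{"id": ..., "score": ...}]

class InMemoryRetriever:
    """Toy retriever: naive substring match over an in-memory document map."""

    def __init__(self, documents: dict[str, str]):
        self.documents = documents  # doc id -> text

    def get_context(self, query: str) -> str:
        return self.get_context_with_metadata(query).context

    def get_context_with_metadata(self, query: str) -> RetrievalResult:
        hits = [
            {"id": doc_id, "score": 1.0}
            for doc_id, text in self.documents.items()
            if query.lower() in text.lower()
        ]
        if not hits:
            return RetrievalResult(context=NO_RETRIEVAL_CONTEXT)
        context = "\n".join(self.documents[h["id"]] for h in hits)
        return RetrievalResult(context=context, hits=hits)

    async def aget_context(self, query: str) -> str:
        # Cheap in-memory lookup; a real retriever would await network I/O here.
        return self.get_context(query)
```

An object shaped like this can be handed to WithRAGRespondentPromptModifier; because it also implements get_context_with_metadata, hit ids and scores would surface under GeneratedResponsePrompt.metadata["retrieval"].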

Extension Points

Afterimage is designed to be extended. Here are the common patterns:

Custom Instruction Generator

If you want to generate instructions from a custom source (e.g., a live API or a specific algorithm), subclass BaseInstructionGeneratorCallback.

from afterimage.base import BaseInstructionGeneratorCallback
from afterimage.common import GeneratedInstructions

class MyCustomInstructionGenerator(BaseInstructionGeneratorCallback):
    async def agenerate(self, original_prompt: str) -> GeneratedInstructions:
        # Your logic here — return at least one instruction string in `instructions`
        return GeneratedInstructions(
            instructions=["Tell me a joke about API limits."],
            context="System load is high.",
        )

Custom Storage

To save data to a custom backend (e.g., S3, Mongo, or a specific API endpoint), implement the BaseStorage protocol.

from afterimage.storage import BaseStorage

class MyCloudStorage(BaseStorage):
    """Implement every method on :class:`~afterimage.storage.BaseStorage` (sync + async + documents)."""

    def save_conversations(self, conversations):
        raise NotImplementedError

    async def asave_conversations(self, conversations):
        # Push to cloud
        pass

    def load_conversations(self, limit=None, offset=None):
        return []

    def load_documents(self, limit=None, offset=None):
        return []

    def save_documents(self, documents):
        raise NotImplementedError

    async def asave_documents(self, documents):
        pass

Custom LLM Provider

To support a new model family (e.g., Anthropic, Mistral, or a local vLLM server), implement the LLMProvider protocol. You must also implement a corresponding ChatSession.

from afterimage.providers import ChatSession, LLMProvider
from afterimage.providers.llm_providers import LLMResponse


class MyCustomChat(ChatSession):
    async def asend_message(self, message, **kwargs) -> LLMResponse:
        # Implement stateful chat logic
        raise NotImplementedError


class MyCustomProvider:
    """Satisfy :class:`~afterimage.providers.llm_providers.LLMProvider` (structural typing)."""

    async def agenerate_content(self, prompt: str, **kwargs) -> LLMResponse:
        return LLMResponse(
            text="response",
            prompt_token_count=10,
            completion_token_count=10,
            total_token_count=20,
            finish_reason="stop",
            model_name="my-model",
            raw_response={},
        )

    def generate_content(self, prompt: str, **kwargs) -> LLMResponse:
        raise NotImplementedError

    async def agenerate_structured(self, prompt: str, schema, **kwargs):
        raise NotImplementedError

    def generate_structured(self, prompt: str, schema, **kwargs):
        raise NotImplementedError

    def start_chat(self, **kwargs) -> ChatSession:
        return MyCustomChat()

    async def astart_chat(self, **kwargs) -> ChatSession:
        return MyCustomChat()

Developer Tips for LLM Providers:

  • Async Support: Always implement both sync and async methods. The library core relies heavily on agenerate_content for performance.

  • Token Counting: Ensure you populate token counts in LLMResponse. This is critical for the GenerationMonitor to track costs and throughput.

  • Structured Output: For generate_structured, leveraging Pydantic is highly recommended. If the underlying API doesn’t support JSON schema natively, use a robust parser or a library such as instructor.

  • Error Handling: Wrap your API calls in try/except blocks and use SmartKeyPool.report_error(key) if an API error occurs, so the pool can rotate keys or back off.
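
The error-handling tip can be illustrated with a toy pool. This is a hedged sketch of the rotate-on-error pattern only, not the SmartKeyPool implementation; the class and function names below are hypothetical stand-ins.

```python
import itertools

class ToyKeyPool:
    """Minimal stand-in for a key pool: round-robin keys, bench ones that error."""

    def __init__(self, keys, max_errors: int = 3):
        self._errors = {k: 0 for k in keys}
        self._max_errors = max_errors
        self._cycle = itertools.cycle(keys)

    def get_key(self) -> str:
        # Skip keys that have exceeded their error budget.
        for _ in range(len(self._errors)):
            key = next(self._cycle)
            if self._errors[key] < self._max_errors:
                return key
        raise RuntimeError("all keys exhausted")

    def report_error(self, key: str) -> None:
        self._errors[key] += 1

    def report_success(self, key: str) -> None:
        self._errors[key] = 0

def call_with_rotation(pool: ToyKeyPool, call):
    """Try keys until one succeeds or the pool is exhausted."""
    while True:
        key = pool.get_key()  # raises RuntimeError once no key is usable
        try:
            result = call(key)
            pool.report_success(key)
            return result
        except ConnectionError:
            pool.report_error(key)
```

In a real provider, the `call(key)` body would be the API request made with that key, wrapped in try/except exactly as the tip describes.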

Design Patterns

  • Async-First: The library is built from the ground up using asyncio for high throughput.

  • Callback Pattern: Logic is injected via callbacks rather than subclassing the generator itself.

  • Pydantic Models: All data exchange (config, inputs, outputs) is validated using Pydantic models for type safety.
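
The three patterns above combine naturally. The following miniature, with entirely hypothetical names, shows the composition-over-inheritance style: the generator owns the async loop, while “what to ask”, “what to know”, and persistence are injected as callbacks rather than baked in via subclassing.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

@dataclass
class ToyGenerator:
    """Hypothetical miniature of the library's composition style."""
    make_instruction: Callable[[], Awaitable[str]]   # "what to ask"
    modify_prompt: Callable[[str], str]              # "what to know"
    save: Callable[[list], None]                     # persistence

    async def run(self, n: int) -> list:
        records = []
        for _ in range(n):
            instruction = await self.make_instruction()
            records.append(self.modify_prompt(instruction))
        self.save(records)
        return records

async def ask() -> str:
    return "Summarise the document."

store: list = []

gen = ToyGenerator(
    make_instruction=ask,
    modify_prompt=lambda s: f"[persona: pirate] {s}",
    save=store.extend,
)
results = asyncio.run(gen.run(2))
```

Swapping any strategy (a different instruction source, a different persona, a different storage backend) changes one constructor argument and leaves the loop untouched, which is the point of the callback pattern.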