# Architecture & Design
This document details the internal architecture of the Afterimage library. It is intended for advanced users who want to extend the library or understand its internals.
## System Overview
Afterimage is designed as a modular pipeline for synthetic data generation. The core philosophy is composition over inheritance—you build a generator by composing different strategies for prompts, instructions, and storage.
## Core Components
- **Generators** (`BaseGenerator`): The orchestrators. They manage the main loop, concurrency, and state.
  - `ConversationGenerator` (exported as `AsyncConversationGenerator` for backward compatibility): multi-turn dialogs.
  - `StructuredGenerator` (alias `AsyncStructuredGenerator`): single-turn structured output.
- **Instruction Generators** (`BaseInstructionGeneratorCallback`): Strategies for "what to ask".
  - Responsible for producing the initial user instruction/question.
  - Can have internal state (e.g., to ensure coverage of a document set).
- **Prompt Modifiers** (`BaseRespondentPromptModifierCallback`): Strategies for "what to know".
  - Responsible for modifying the assistant's system prompt at runtime.
  - Used for RAG (injecting context) or persona adoption.
  - Session-scoped retrieval: `WithRAGRespondentPromptModifier` runs once per sampled instruction (before the multi-turn `go()` loop). Retrieved text is fixed for that dialog unless you add per-turn hooks or a future session driver (see the Conversation Generation docs).
  - Retriever protocol: implement `get_context(query) -> str`, optionally `aget_context`, and optionally `get_context_with_metadata` / `aget_context_with_metadata` returning `RetrievalResult` (`afterimage.retrievers`), so hit ids and scores can appear under `GeneratedResponsePrompt.metadata["retrieval"]`. The canonical empty-hit string is `NO_RETRIEVAL_CONTEXT`.
  - Qdrant async I/O: `QdrantRetriever` uses the sync client's `query_points` for the `get_context` paths and, when you pass `async_client` (a `qdrant_client.AsyncQdrantClient`), awaits `query_points` on that client for the `aget_context*` paths, so HTTP work stays off the event loop without relying on `asyncio.to_thread` for Qdrant calls.
- **Storage** (`BaseStorage`): Persistence layer.
  - Decoupled from generation logic.
  - Can be swapped (e.g., JSONL vs. SQL) without changing the generator.
- **LLM Abstraction Layer** (`afterimage.providers.llm_providers`):
  - Uniform interface: the `LLMProvider` protocol normalizes interactions across models (Gemini, OpenAI, etc.).
  - Unified responses: returns standardized `LLMResponse` or `StructuredLLMResponse` objects with consistent token counts and usage metadata.
  - Chat abstraction: `ChatSession` manages conversation history statefully, independent of the underlying API's specific mechanics.
  - Factory creation: `LLMFactory` allows dynamic instantiation of providers via strings.
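The retriever protocol described under Prompt Modifiers relies on structural typing, so a retriever needs no base class. Below is a minimal in-memory sketch, not part of Afterimage: the keyword-matching logic is purely illustrative, and `NO_RETRIEVAL_CONTEXT` is stubbed with a placeholder value so the sketch runs standalone (in practice, import the real constant from `afterimage.retrievers`).

```python
# Minimal retriever satisfying the protocol (structural typing: no base class).
# NO_RETRIEVAL_CONTEXT is a stand-in for the canonical constant in
# afterimage.retrievers so this sketch runs on its own.
NO_RETRIEVAL_CONTEXT = ""  # placeholder value; import the real constant in practice


class KeywordRetriever:
    """Naive in-memory retriever: returns documents that contain the query."""

    def __init__(self, documents: list[str]):
        self.documents = documents

    def get_context(self, query: str) -> str:
        hits = [d for d in self.documents if query.lower() in d.lower()]
        return "\n\n".join(hits) if hits else NO_RETRIEVAL_CONTEXT

    async def aget_context(self, query: str) -> str:
        # No real I/O here, so the async path just delegates to the sync one.
        return self.get_context(query)
```

A production retriever would back `get_context` with a vector store (such as the `QdrantRetriever` described above) and do real awaitable I/O in `aget_context`.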
## Extension Points
Afterimage is designed to be extended. Here are the common patterns:
### Custom Instruction Generator
If you want to generate instructions from a custom source (e.g., a live API or a specific algorithm), subclass `BaseInstructionGeneratorCallback`.
```python
from afterimage.base import BaseInstructionGeneratorCallback
from afterimage.common import GeneratedInstructions


class MyCustomInstructionGenerator(BaseInstructionGeneratorCallback):
    async def agenerate(self, original_prompt: str) -> GeneratedInstructions:
        # Your logic here: return at least one instruction string in `instructions`.
        return GeneratedInstructions(
            instructions=["Tell me a joke about API limits."],
            context="System load is high.",
        )
```
### Custom Storage
To save data to a custom backend (e.g., S3, Mongo, or a specific API endpoint), implement the `BaseStorage` protocol.
```python
from afterimage.storage import BaseStorage


class MyCloudStorage(BaseStorage):
    """Implement every method on :class:`~afterimage.storage.BaseStorage` (sync + async + documents)."""

    def save_conversations(self, conversations):
        raise NotImplementedError

    async def asave_conversations(self, conversations):
        # Push to cloud
        pass

    def load_conversations(self, limit=None, offset=None):
        return []

    def load_documents(self, limit=None, offset=None):
        return []

    def save_documents(self, documents):
        raise NotImplementedError

    async def asave_documents(self, documents):
        pass
```
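For tests or quick experiments, the same protocol can be satisfied entirely in memory. This hypothetical reference sketch (not part of Afterimage) treats conversations and documents as opaque records and shows the sync/async pairing the protocol expects:

```python
class InMemoryStorage:
    """Hypothetical in-memory storage satisfying the BaseStorage method set."""

    def __init__(self):
        self._conversations: list = []
        self._documents: list = []

    def save_conversations(self, conversations):
        self._conversations.extend(conversations)

    async def asave_conversations(self, conversations):
        # No real I/O, so the async path simply delegates to the sync one.
        self.save_conversations(conversations)

    def load_conversations(self, limit=None, offset=None):
        data = self._conversations[offset or 0:]
        return data[:limit] if limit is not None else data

    def save_documents(self, documents):
        self._documents.extend(documents)

    async def asave_documents(self, documents):
        self.save_documents(documents)

    def load_documents(self, limit=None, offset=None):
        data = self._documents[offset or 0:]
        return data[:limit] if limit is not None else data
```

A real backend would replace the lists with network or disk I/O, keeping the `limit`/`offset` pagination semantics intact.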
### Custom LLM Provider
To support a new model family (e.g., Anthropic, Mistral, or a local vLLM), implement the `LLMProvider` protocol. You must also implement a corresponding `ChatSession`.
```python
from afterimage.providers import ChatSession, LLMProvider
from afterimage.providers.llm_providers import LLMResponse


class MyCustomChat(ChatSession):
    async def asend_message(self, message, **kwargs) -> LLMResponse:
        # Implement stateful chat logic
        raise NotImplementedError


class MyCustomProvider:
    """Satisfy :class:`~afterimage.providers.llm_providers.LLMProvider` (structural typing)."""

    async def agenerate_content(self, prompt: str, **kwargs) -> LLMResponse:
        return LLMResponse(
            text="response",
            prompt_token_count=10,
            completion_token_count=10,
            total_token_count=20,
            finish_reason="stop",
            model_name="my-model",
            raw_response={},
        )

    def generate_content(self, prompt: str, **kwargs) -> LLMResponse:
        raise NotImplementedError

    async def agenerate_structured(self, prompt: str, schema, **kwargs):
        raise NotImplementedError

    def generate_structured(self, prompt: str, schema, **kwargs):
        raise NotImplementedError

    def start_chat(self, **kwargs) -> ChatSession:
        return MyCustomChat()

    async def astart_chat(self, **kwargs) -> ChatSession:
        return MyCustomChat()
```
**Developer Tips for LLM Providers:**

- **Async support**: Always implement both sync and async methods. The library core relies heavily on `agenerate_content` for performance.
- **Token counting**: Ensure you populate token counts in `LLMResponse`. This is critical for the `GenerationMonitor` to track costs and throughput.
- **Structured output**: For `generate_structured`, leveraging Pydantic is highly recommended. If the underlying API doesn't support JSON schema natively, use a robust parser or the `instructor` library.
- **Error handling**: Wrap your API calls in try/except blocks and call `SmartKeyPool.report_error(key)` when an API error occurs, so the pool can rotate keys or back off.
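The "robust parser" fallback from the structured-output tip can be as simple as stripping markdown fences and slicing out the outermost JSON object before validating. A stdlib-only sketch (the function name is illustrative, not part of Afterimage):

```python
import json


def extract_json(raw_text: str) -> dict:
    """Best-effort JSON extraction for models that wrap output in prose or fences."""
    cleaned = raw_text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line (e.g. ```json) and the trailing fence.
        cleaned = cleaned.split("\n", 1)[1].rsplit("```", 1)[0]
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(cleaned[start : end + 1])
```

The resulting dict can then be handed to a Pydantic schema (e.g. `Schema.model_validate(...)`) so `generate_structured` still returns a validated object.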
## Design Patterns

- **Async-first**: The library is built from the ground up on `asyncio` for high throughput.
- **Callback pattern**: Logic is injected via callbacks rather than by subclassing the generator itself.
- **Pydantic models**: All data exchange (config, inputs, outputs) is validated with Pydantic models for type safety.