Guardians of Generation

Dynamic Inference-Time Copyright Shielding with Adaptive Guidance for AI Image Generation

Soham Roy¹, Abhishek Mishra¹, Shirish Karande², Murari Mandal¹

¹RespAI Lab, KIIT Bhubaneswar, India  ²TCS Research, India

Correspondence: soham.respailab@gmail.com

Abstract

Modern text-to-image generative models can inadvertently reproduce copyrighted content memorized in their training data, raising serious concerns about potential copyright infringement. We introduce Guardians of Generation (GoG), a model-agnostic inference-time framework for dynamic copyright shielding in AI image generation. Our approach requires no retraining or modification of the generative model's weights, instead integrating seamlessly with existing diffusion pipelines. It augments the generation process with an adaptive guidance mechanism comprising three components: a detection module, a prompt rewriting module, and a guidance adjustment module. The detection module monitors user prompts and intermediate generation steps to identify features indicative of copyrighted content before they manifest in the final output. If such content is detected, the prompt rewriting mechanism dynamically transforms the user's prompt—sanitizing or replacing references that could trigger copyrighted material while preserving the prompt's intended semantics. The guidance adjustment module then steers the diffusion process away from flagged content by modulating the model's sampling trajectory. Together, these components form a robust shield that enables a tunable balance between preserving creative fidelity and ensuring copyright compliance. We validate our method on a variety of generative models (Stable Diffusion 2.1, SDXL, Flux), demonstrating substantial reductions in copyrighted content generation with negligible impact on output fidelity or alignment with user intent. This work provides a practical, plug-and-play safeguard for generative image models, enabling more responsible deployment under real-world copyright constraints.

Overview

Pipeline Architecture

Figure 1: The Dynamic Inference-Time Copyright Shielding Pipeline. Our approach integrates concept detection, prompt rewriting, and adaptive guidance to balance compliance and creative fidelity.

Our approach introduces a novel adaptive classifier-free guidance mechanism that dynamically balances copyright compliance and fidelity to the user's intent, addressing a critical challenge in responsible AI deployment. The GoG framework works with existing diffusion models without requiring retraining, making it a practical solution for real-world deployment.

Methodology

1. Protected Concept Detection

We implement semantic matching between user prompts and a database of protected concepts, with language-model verification to reduce false positives. The detection mechanism monitors both explicit references and indirect anchoring cues, flagging protected entities before infringing content can be generated.
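The matching step can be sketched as a similarity search over concept embeddings. The snippet below is a minimal illustration: it substitutes a toy bag-of-words embedding for the real text encoder, and the names `embed`, `cosine`, `detect_protected`, and the 0.3 threshold are assumptions for demonstration, not details from the paper (GoG additionally verifies matches with a language model).

```python
import math
from collections import Counter

# Toy bag-of-words "embedding" standing in for a real text encoder.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def detect_protected(prompt: str, concept_db: list[str],
                     threshold: float = 0.3) -> list[str]:
    """Return the protected concepts whose similarity to the prompt
    meets the threshold; a downstream LLM check would prune false positives."""
    p = embed(prompt)
    return [c for c in concept_db if cosine(p, embed(c)) >= threshold]
```

In a real pipeline the same structure applies, with the bag-of-words encoder replaced by semantic text embeddings and the flagged list passed on to the rewriting module.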

2. Prompt Rewriting

When protected concepts are detected, an LLM sanitizes the prompt while preserving non-protected elements and the user's intent. This maintains semantic similarity with the original prompt while removing potentially infringing content, ensuring the generated images remain aligned with user expectations.
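As a rough sketch of the rewriting contract, the rule-based stand-in below replaces flagged mentions with generic descriptors while leaving the rest of the prompt untouched. The `GENERIC` mapping and `sanitize_prompt` helper are illustrative assumptions; GoG derives replacements dynamically with an LLM rather than from a fixed table.

```python
# Illustrative replacement table; the actual system generates these
# descriptors with an LLM so that non-protected elements are preserved.
GENERIC = {
    "mario": "a cheerful cartoon plumber",
    "pikachu": "a small yellow cartoon creature",
}

def sanitize_prompt(prompt: str, flagged: list[str]) -> str:
    """Replace each flagged concept mention with a generic descriptor.

    Uses simple case-sensitive substring replacement; an LLM rewriter
    would instead rephrase the prompt while keeping its intent."""
    out = prompt
    for concept in flagged:
        if concept in GENERIC:
            out = out.replace(concept, GENERIC[concept])
    return out
```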

3. Adaptive CFG

Our key innovation blends embeddings from original and rewritten prompts during the diffusion process through adaptive classifier-free guidance (CFG). This mechanism provides a continuous control spectrum between strict copyright compliance and creative expression.

Formal Definition:

ε̂θ(xt) = εθ(xt) + λ[(1 − α)·εθ(xt, corig) + α·εθ(xt, csafe) − εθ(xt)]

where εθ(xt) is the unconditional noise prediction, εθ(xt, corig) and εθ(xt, csafe) are predictions conditioned on the original and rewritten prompt embeddings, λ is the guidance scale, and α ∈ [0, 1] is the mixing weight: higher α gives more weight to the rewritten (safe) prompt, lower α to the original.
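A minimal numpy sketch of this blended update, assuming the convention used throughout this page that higher α pulls generation toward the rewritten prompt (the function name and defaults are illustrative):

```python
import numpy as np

def adaptive_cfg(eps_uncond: np.ndarray,
                 eps_orig: np.ndarray,
                 eps_safe: np.ndarray,
                 lam: float = 3.0,
                 alpha: float = 0.7) -> np.ndarray:
    """Guided noise prediction blending original and rewritten conditioning.

    alpha = 0 recovers standard classifier-free guidance on the original
    prompt; alpha = 1 guides entirely with the sanitized prompt."""
    mixed = (1.0 - alpha) * eps_orig + alpha * eps_safe
    return eps_uncond + lam * (mixed - eps_uncond)
```

In a diffusion sampler this would replace the usual CFG combination at each denoising step, with `eps_orig` and `eps_safe` obtained from two conditional forward passes.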

Experimental Results

We evaluated our approach on three text-to-image diffusion models (Stable Diffusion 2.1, SDXL, and Flux) using a diverse dataset of 33 protected concepts across multiple categories including movie characters, animated figures, brand logos, and portraits. Our experiments demonstrate that GoG successfully balances copyright protection with visual quality, maintaining semantic fidelity while introducing sufficient visual variation to avoid infringement.

Indirect Anchoring Protection

A key challenge in copyright protection is addressing indirect anchoring—where prompts use descriptive cues rather than explicitly naming protected entities. Our experiments with cartoon and animated characters demonstrate that GoG effectively mitigates these vulnerabilities. At a guidance scale (λ) of 3.0 and mixing weight (α) of 0.7, we achieved a balanced protection profile with optimal CLIP-I (0.84), CLIP-T (0.22), and SSIM (0.50) scores, while significantly reducing detected instances of copyright infringement.

Indirect Anchoring Comparison

Figure 3: Without GoG, models produce copyrighted images even when prompts lack explicit references (top row). Our approach prevents generating protected content while preserving the prompt's semantics (bottom row).

Evaluation Metrics

We evaluated our method using six key metrics that measure both semantic fidelity and copyright protection.

Our experiments show that the GoG framework achieves a balanced range of metrics across different models, with moderate CLIP-I (0.66-0.85), CLIP-T (0.15-0.25), LPIPS (0.40-0.60), and SSIM (0.30-0.40) values. This indicates generated images that remain semantically aligned with user intent while avoiding direct copyright infringement.

Parametric Analysis: Effect of Mixing Weight (α)

The mixing weight parameter allows fine control over the balance between copyright protection and visual fidelity. Our experiments across different models show that optimal α values typically fall between 0.65-0.75, providing effective copyright protection while maintaining high image quality.
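The continuous control the mixing weight provides can be illustrated with a small sweep over α; the noise predictions below are toy two-dimensional vectors standing in for real model outputs:

```python
import numpy as np

# Toy noise predictions (not real model outputs).
eps_orig = np.array([1.0, 0.0])   # conditioned on the original prompt
eps_safe = np.array([0.0, 1.0])   # conditioned on the rewritten prompt

# As alpha rises from 0 to 1, the blended conditioning moves linearly
# from the original-prompt prediction toward the rewritten one.
for alpha in (0.0, 0.25, 0.5, 0.7, 1.0):
    blended = (1.0 - alpha) * eps_orig + alpha * eps_safe
    print(f"alpha={alpha:.2f} -> blended={blended}")
```

The reported sweet spot of α ≈ 0.65–0.75 corresponds to a blend dominated by, but not fully committed to, the sanitized prompt.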

Mixing Weight Comparison

Figure 2: Transition of generated images with different mixing weights (α). Higher values of α allow more influence from the rewritten prompt, while lower values favor the original prompt. Analysis conducted over 500 prompt pairs.

Our results indicate that the optimal parameter settings differ across models.

These findings highlight that while the general approach is consistent, fine-tuning parameters for specific models can further optimize the balance between copyright protection and image quality.

Limitations and Future Work

While our approach demonstrates promising results, several limitations remain. The method relies on a pre-defined database of protected concepts, which may not cover all copyrighted materials. Additionally, the effectiveness of prompt rewriting depends on the capabilities of the language model used.

Performance Considerations

GoG introduces computational overhead that increases generation time; our measurements were conducted on a single NVIDIA A6000 GPU. While the additional processing time is significant, the improved control over copyright compliance justifies this overhead for many applications.

Future Work

Future research will focus on addressing these limitations, including broader coverage of protected concepts beyond a pre-defined database and more robust LLM-based prompt rewriting.

Citation

@misc{roy2025guardians,
  title={Guardians of Generation: Dynamic Inference-Time Copyright Shielding with Adaptive Guidance for AI Image Generation},
  author={Soham Roy and Abhishek Mishra and Shirish Karande and Murari Mandal},
  year={2025},
  eprint={2503.16171},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.16171},
}