# LLM Training Data Poisoning

- ID: python-llm-training-data-poisoning
- Severity: HIGH
- CWE: Deserialization of Untrusted Data (CWE-502)
- Languages: Python
- Frameworks: flask, django, fastapi

## Description

Detects untrusted or unvalidated data flowing into AI/LLM fine-tuning or training processes. OWASP LLM03 - Training Data Poisoning.

Training data poisoning can:

- Introduce backdoors into model behavior
- Maliciously bias model outputs
- Embed harmful content that later appears in responses
- Compromise model accuracy and reliability
- Create security vulnerabilities in model behavior

## Detection Message

Untrusted data from {source} flows to AI training/fine-tuning at {sink}. This can lead to training data poisoning attacks.

## Remediation

Validate training data with Pydantic before it reaches the training pipeline, and screen the validated content with a moderation service.

```python
from openai import AsyncOpenAI
from pydantic import BaseModel, validator  # v1-style validator; deprecated but still available in Pydantic v2

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

class TrainingData(BaseModel):
    examples: list[dict]

    @validator('examples', each_item=True)
    def validate_example(cls, v):
        # Reject oversized examples before they reach the training set
        if len(v.get('content', '')) > 4000:
            raise ValueError('Content too long')
        return v

async def validate_and_moderate(payload: dict):
    data = TrainingData(**payload)  # raises ValidationError on malformed input
    # `await` is only valid inside a coroutine, so the call is wrapped here
    return await client.moderations.create(input=data.json())
```

Learn more: https://shoulder.dev/learn/python/cwe-502/llm-training-data-poisoning

## Related Rules

- **Insecure Deserialization** [HIGH]
- **LLM Training Data Poisoning** [HIGH]
- **Unsafe Deserialization** [CRITICAL]
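
## Example: Pre-filtering Untrusted Examples

To complement the Remediation guidance, the following is a minimal, standard-library-only sketch of a pre-filter that could run before any untrusted examples are written to a fine-tuning file. It is illustrative rather than a definitive implementation: the `role` field, the `ALLOWED_ROLES` set, and the helper names `is_clean_example`, `filter_training_examples`, and `to_jsonl` are assumptions introduced here, and the 4000-character limit simply mirrors the one used in the Remediation snippet.

```python
import json

# Assumed chat-style roles; adjust to the schema your training format uses.
ALLOWED_ROLES = {"system", "user", "assistant"}
# Same limit as the Remediation example above.
MAX_CONTENT_CHARS = 4000


def is_clean_example(example):
    """Return True if a single untrusted example passes basic structural checks."""
    if not isinstance(example, dict):
        return False
    content = example.get("content")
    role = example.get("role")
    if not isinstance(content, str) or not content.strip():
        return False
    if len(content) > MAX_CONTENT_CHARS:
        return False
    return isinstance(role, str) and role in ALLOWED_ROLES


def filter_training_examples(raw_examples):
    """Split untrusted input into (accepted, rejected), rejecting exact duplicates."""
    accepted, rejected, seen = [], [], set()
    for example in raw_examples:
        if not is_clean_example(example):
            rejected.append(example)
            continue
        key = (example["role"], example["content"])
        if key in seen:
            # Repeated identical examples are a cheap way to over-weight a payload.
            rejected.append(example)
            continue
        seen.add(key)
        # Copy only the expected fields so unexpected keys are dropped.
        accepted.append({"role": example["role"], "content": example["content"]})
    return accepted, rejected


def to_jsonl(examples):
    """Serialize accepted examples as one JSON object per line."""
    return "\n".join(json.dumps(e) for e in examples)
```

A typical use would be to call `filter_training_examples(payload["examples"])` inside the request handler, log or review the rejected items, and pass only `to_jsonl(accepted)` on to the training job. Structural checks like these do not detect semantically poisoned content, so they are best combined with moderation and provenance tracking.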