# LLM Training Data Poisoning

- ID: javascript-llm-training-data-poisoning
- Severity: HIGH
- CWE: Deserialization of Untrusted Data (CWE-502)
- Languages: JavaScript, TypeScript
- Frameworks: nodejs

## Description

Detects untrusted or unvalidated data flowing into AI/LLM fine-tuning or training processes. This corresponds to OWASP LLM03 (Training Data Poisoning).

Training data poisoning can:

- Introduce backdoors into model behavior
- Maliciously bias model outputs
- Embed harmful content that later appears in responses
- Compromise model accuracy and reliability
- Create security vulnerabilities in model behavior

This rule detects:

- User-provided data used directly in fine-tuning
- External data sources used without validation
- Training data loaded from untrusted URLs
- Missing data validation before training

## Detection Message

Untrusted data from {source} flows to AI training/fine-tuning at {sink}. This can lead to training data poisoning attacks.

## Remediation

Validate training data against a schema and run it through content moderation before fine-tuning.

```javascript
// `validate` stands for your schema check; reject the request if it fails.
if (!validate(trainingData)) {
  return res.status(400).json({ error: 'Invalid format' });
}
// Upload for fine-tuning only after validation succeeds.
await openai.files.create({ file: trainingData, purpose: 'fine-tune' });
```

Learn more: https://shoulder.dev/learn/javascript/cwe-502/llm-training-data-poisoning

## Related Rules

- **Insecure Deserialization** [HIGH]
- **LLM Training Data Poisoning** [HIGH]
- **Unsafe Deserialization** [CRITICAL]
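
As a rough illustration of the schema-validation step described under Remediation, the sketch below checks untrusted JSONL before it is used for fine-tuning. It assumes the chat-style format in which each line is a JSON object with a `messages` array of `{ role, content }` entries. The function name `validateTrainingRecords`, the allowed-role set, and the `MAX_CONTENT_LENGTH` limit are hypothetical choices for this example, not part of the rule or of any library. It covers structure only; content moderation would be a separate step.

```javascript
// Hypothetical helper: structural check for chat-format JSONL training data.
// The role list and length limit are assumptions; adjust them to your data.
const ALLOWED_ROLES = new Set(['system', 'user', 'assistant']);
const MAX_CONTENT_LENGTH = 8000;

function validateTrainingRecords(jsonlText) {
  const errors = [];
  const records = [];
  // Blank lines are skipped, so numbering below counts non-blank records.
  const lines = jsonlText.split('\n').filter((line) => line.trim() !== '');

  lines.forEach((line, index) => {
    let record;
    try {
      record = JSON.parse(line);
    } catch {
      errors.push(`record ${index + 1}: not valid JSON`);
      return;
    }

    const messages = record && record.messages;
    if (!Array.isArray(messages) || messages.length === 0) {
      errors.push(`record ${index + 1}: "messages" must be a non-empty array`);
      return;
    }

    // A message is rejected if its role is unknown or its content is
    // missing, empty, or longer than the assumed limit.
    const hasBadMessage = messages.some(
      (m) =>
        !m ||
        !ALLOWED_ROLES.has(m.role) ||
        typeof m.content !== 'string' ||
        m.content.length === 0 ||
        m.content.length > MAX_CONTENT_LENGTH
    );
    if (hasBadMessage) {
      errors.push(`record ${index + 1}: message has an invalid role or content`);
      return;
    }

    records.push(record);
  });

  return { ok: errors.length === 0, records, errors };
}
```

A caller would typically reject the whole upload when `ok` is `false` and only pass the data on to the fine-tuning API once every record has passed. Rejecting the full batch, rather than silently dropping bad records, keeps a partially poisoned file from slipping through.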