# Improper Handling of Unicode Encoding (CWE-176)

The product does not properly handle input that contains Unicode encoding.

**Stack:** Python

- Prevalence: Medium (3 languages covered)
- Impact: Medium (review recommended)
- Prevention: Documented (3 fix examples)

**OWASP:** Injection (A03:2021-Injection) - #3

## Description

A single piece of text can have several Unicode encodings or representations. For example, the fullwidth letter `ａ` (U+FF41) and ASCII `a` are distinct code points that NFKC normalization maps to the same character. If an application compares or filters such input without first reducing it to one canonical form, attackers may be able to bypass security filters or cause unexpected behavior by supplying an alternate encoding.

## Prevention

Prevention strategies for improper handling of Unicode, based on 1 Shoulder detection rule.

### Python

Normalize Unicode strings with NFKC before comparison or security-critical operations.

## Warning Signs

- [MEDIUM] Missing Unicode normalization leading to security bypasses

## Consequences

- Bypass Protection Mechanism
- Execute Unauthorized Code

## Mitigations

- Normalize Unicode input to a canonical form before processing
- Apply security checks after Unicode normalization
- Use Unicode-aware comparison functions

## Detection

- Total rules: 3
- Languages: go, javascript, typescript, python

## Rules by Language

### Python (1 rule)

- **Unicode Normalization Issues** [MEDIUM]: Detects missing Unicode normalization leading to security bypasses.
  - Remediation: Normalize Unicode strings before comparison using `unicodedata.normalize()`.

```python
import unicodedata

def normalize_username(username):
    # NFKC folds compatibility forms (e.g. fullwidth letters) into one canonical form.
    return unicodedata.normalize('NFKC', username).lower()

# input_name, stored_name, and grant_access() come from the surrounding application.
if normalize_username(input_name) == normalize_username(stored_name):
    grant_access()
```

Learn more: https://shoulder.dev/learn/python/cwe-176/unicode-normalization
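
The mitigation "apply security checks after Unicode normalization" can be illustrated with a minimal sketch. The deny-list `BLOCKED_NAMES` and the two helper functions are hypothetical names chosen for this example and are not part of any Shoulder rule. The sketch assumes the check is a simple reserved-name filter. It shows how a name written in fullwidth letters passes a check on the raw string but is caught once the input is normalized with NFKC first.

```python
import unicodedata

# Hypothetical deny-list, used only for illustration.
BLOCKED_NAMES = {"admin", "root"}

def is_blocked_naive(name: str) -> bool:
    """Check the raw input; alternate encodings are not recognized."""
    return name.lower() in BLOCKED_NAMES

def is_blocked_normalized(name: str) -> bool:
    """Normalize with NFKC and case-fold first, then apply the same check."""
    canonical = unicodedata.normalize("NFKC", name).casefold()
    return canonical in BLOCKED_NAMES

# "admin" spelled with fullwidth Latin letters (U+FF41, U+FF44, U+FF4D, U+FF49, U+FF4E).
spoofed = "\uff41\uff44\uff4d\uff49\uff4e"

print(is_blocked_naive(spoofed))       # the raw check misses the spoofed name
print(is_blocked_normalized(spoofed))  # the normalized check catches it
```

NFKC only folds compatibility equivalents such as fullwidth or ligature forms. It does not map look-alike characters from other scripts (for example, Cyrillic `а` to Latin `a`), so a filter that must resist homoglyphs likely needs additional confusable detection on top of normalization.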