# Improper Handling of Unicode Encoding (CWE-176)

The product does not properly handle when an input contains Unicode encoding.

- Prevalence: Medium (3 languages supported)
- Impact: Medium (review recommended)
- Prevention: Documented (3 fix examples)

**OWASP:** Injection (A03:2021-Injection) - #3

## Description

Unicode characters can have multiple encodings or representations. If an application does not properly handle Unicode, attackers may be able to bypass security filters or cause unexpected behavior using alternate encodings.

## Prevention

Prevention strategies for Improper Handling of Unicode Encoding, based on 3 Shoulder detection rules.

### Go

Normalize strings with NFKC before security-sensitive comparisons.

### JavaScript

Normalize Unicode strings with NFKC before security-sensitive comparisons.

### Python

Normalize Unicode strings with NFKC before comparison or security-critical operations.

## Warning Signs

- [MEDIUM] Missing Unicode normalization in security-sensitive string comparisons

## Consequences

- Bypass of protection mechanisms
- Unauthorized code execution

## Mitigations

- Normalize Unicode input to a canonical form before processing.
- Apply security checks after Unicode normalization.
- Use Unicode-aware comparison functions.

## Detection

- Total rules: 3
- Languages: go, javascript, typescript, python

## Rules by Language

### Go (1 rule)

- **Unicode Normalization Security Issues** [MEDIUM]: Security-sensitive string comparison without Unicode normalization.
- Remediation: Normalize strings with NFKC before security-sensitive comparisons.

```go
import (
	"strings"

	"golang.org/x/text/unicode/norm"
)

func isAdmin(username string) bool {
	// Normalize first, then lowercase: NFKC can itself yield uppercase letters.
	normalized := strings.ToLower(norm.NFKC.String(username))
	return normalized == "admin"
}
```

Learn more: https://shoulder.dev/learn/go/cwe-176/unicode-normalization

### JavaScript (1 rule)

- **Unicode Normalization Security Issues** [MEDIUM]: Detects missing Unicode normalization in security-sensitive string comparisons. Unicode allows multiple representations of visually identical characters, which attackers can exploit to bypass input validation, authentication, or access control.

  Common attack vectors:

  - Homograph attacks (using lookalike characters): "аdmin" vs "admin" (Cyrillic 'а'). NFKC does not fold cross-script lookalikes, so these also need a confusables or script check.
  - Case folding differences: "ß" (German sharp s) becomes "SS" when uppercased
  - Combining characters: "é" can be a single character or 'e' plus a combining acute accent (U+0301)

- Remediation: Normalize Unicode strings with NFKC before security-sensitive comparisons:

```javascript
const express = require('express');
const app = express();
app.use(express.json());

app.post('/login', (req, res) => {
  // Normalize to NFKC first, then lowercase, so equivalent forms compare equal.
  const username = String(req.body.username ?? '').normalize('NFKC').toLowerCase();
  if (username === 'admin') {
    return res.send('Admin access');
  }
  res.send('User access');
});
```

Learn more: https://shoulder.dev/learn/javascript/cwe-176/unicode-normalization

### TypeScript (1 rule)

- **Unicode Normalization Security Issues** [MEDIUM]: Detects missing Unicode normalization in security-sensitive string comparisons. Unicode allows multiple representations of visually identical characters, which attackers can exploit to bypass input validation, authentication, or access control.

  Common attack vectors:

  - Homograph attacks (using lookalike characters): "аdmin" vs "admin" (Cyrillic 'а'). NFKC does not fold cross-script lookalikes, so these also need a confusables or script check.
  - Case folding differences: "ß" (German sharp s) becomes "SS" when uppercased
  - Combining characters: "é" can be a single character or 'e' plus a combining acute accent (U+0301)

- Remediation: Normalize Unicode strings with NFKC before security-sensitive comparisons:

```javascript
const express = require('express');
const app = express();
app.use(express.json());

app.post('/login', (req, res) => {
  // Normalize to NFKC first, then lowercase, so equivalent forms compare equal.
  const username = String(req.body.username ?? '').normalize('NFKC').toLowerCase();
  if (username === 'admin') {
    return res.send('Admin access');
  }
  res.send('User access');
});
```

Learn more: https://shoulder.dev/learn/javascript/cwe-176/unicode-normalization

### Python (1 rule)

- **Unicode Normalization Issues** [MEDIUM]: Detects missing Unicode normalization leading to security bypasses.
- Remediation: Normalize Unicode strings before comparison using `unicodedata.normalize()`.
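
```python
import unicodedata

def normalize_username(username):
    # NFKC first, then casefold(), which also folds cases lower() misses (e.g. "ß" -> "ss").
    return unicodedata.normalize('NFKC', username).casefold()

# input_name, stored_name, and grant_access() are supplied by the application.
if normalize_username(input_name) == normalize_username(stored_name):
    grant_access()
```

Learn more: https://shoulder.dev/learn/python/cwe-176/unicode-normalization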
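
The attack vectors listed for the rules above can be checked directly. The following is a minimal sketch, not part of the Shoulder rules: the `canonical` helper is a hypothetical name introduced for illustration, and it assumes NFKC followed by case folding is the comparison form. It shows which representations NFKC unifies and which it leaves distinct.

```python
import unicodedata


def canonical(s: str) -> str:
    """Hypothetical helper: NFKC-normalize, then case-fold for caseless comparison."""
    return unicodedata.normalize("NFKC", s).casefold()


# Precomposed vs. decomposed "é": different code points, same text after NFKC.
composed = "caf\u00e9"
decomposed = "cafe\u0301"
print(composed == decomposed)                        # False
print(canonical(composed) == canonical(decomposed))  # True

# Fullwidth compatibility letters fold to ASCII under NFKC.
print(canonical("\uff21\uff44\uff4d\uff49\uff4e") == "admin")  # True

# Cyrillic "а" (U+0430) is a distinct letter, so NFKC leaves it alone.
print(canonical("\u0430dmin") == "admin")            # False
```

The last case is why normalization alone may not be enough for identifiers such as usernames: cross-script lookalikes survive NFKC, so a script restriction or confusables check is likely needed in addition.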