# Improper Handling of Unicode Encoding (CWE-176)

The product does not properly handle when an input contains Unicode encoding.

**Stack:** Python

- Prevalence: Medium; covers 3 languages
- Impact: Medium; review recommended
- Prevention: 3 fix examples documented

**OWASP:** Injection (A03:2021-Injection) - #3

## Description

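Unicode text can have multiple valid representations: the same visible character may be expressed as different code point sequences or encoded as different byte sequences. If an application validates or compares input without first converting it to a single canonical form, attackers may use an alternate representation to bypass security filters or cause unexpected behavior.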
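As an illustration, "é" can be stored as the single code point U+00E9 or as "e" followed by the combining acute accent U+0301, and the fullwidth letter "ａ" (U+FF41) is a compatibility variant of "a". The sketch below uses only the standard `unicodedata` module to show that these strings differ as raw text but compare equal after the appropriate normalization form is applied.

```python
import unicodedata

composed = "caf\u00e9"      # "café" with precomposed é (U+00E9)
decomposed = "cafe\u0301"   # "café" as "e" + combining acute accent (U+0301)
fullwidth = "\uff41dmin"    # fullwidth "a" (U+FF41) followed by "dmin"

# Raw comparisons see different code point sequences.
print(composed == decomposed)  # False
print(fullwidth == "admin")    # False

# NFC unifies canonically equivalent sequences.
print(unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed))  # True

# NFC leaves compatibility characters alone; NFKC folds them to their standard forms.
print(unicodedata.normalize("NFC", fullwidth) == "admin")   # False
print(unicodedata.normalize("NFKC", fullwidth) == "admin")  # True
```

This is why a filter that inspects only one representation can be sidestepped by submitting another.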

## Prevention

Prevention strategy for Improper Handling of Unicode, based on 1 Shoulder detection rule.

### Python

Normalize Unicode strings with NFKC before comparisons or other security-critical operations.
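A minimal sketch of the bypass this guards against, assuming a hypothetical blocklist check `contains_script_tag` (not part of any library) that rejects input containing `<script`. The fullwidth less-than and greater-than signs (U+FF1C, U+FF1E) slip past a check on the raw string but become `<` and `>` under NFKC, so the check has to run on the normalized text.

```python
import unicodedata

def contains_script_tag(text: str) -> bool:
    # Hypothetical blocklist check, for illustration only.
    return "<script" in text.lower()

def is_safe_naive(user_input: str) -> bool:
    # Vulnerable: inspects the raw input, which a later NFKC step may change.
    return not contains_script_tag(user_input)

def is_safe(user_input: str) -> bool:
    # Normalize with NFKC first, then run the security check on the result.
    normalized = unicodedata.normalize("NFKC", user_input)
    return not contains_script_tag(normalized)

payload = "\uff1cscript\uff1ealert(1)\uff1c/script\uff1e"  # fullwidth < and >
print(is_safe_naive(payload))  # True: the raw check misses the payload
print(is_safe(payload))        # False: NFKC maps U+FF1C to "<"
```

In a real application the normalized string, not the raw input, should also be what gets stored or passed downstream, so the value that was checked is the value that is used.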

## Warning Signs

- [MEDIUM] missing Unicode normalization leading to security bypasses

## Consequences

- Bypass of protection mechanisms
- Execution of unauthorized code

## Mitigations

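- Normalize Unicode input to a canonical form before processing it
- Perform security checks after Unicode normalization, not before
- Use Unicode-aware comparison functions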
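For the last point, Python's `str.casefold()` is more thorough than `str.lower()` for caseless matching (for example, it maps "ß" to "ss"). The sketch below combines it with NFKC; the helper names `nfkc_casefold` and `caseless_equal` are hypothetical, and the function is only a rough approximation of Unicode's NFKC_Casefold mapping (it does not, for instance, remove default-ignorable code points).

```python
import unicodedata

def nfkc_casefold(text: str) -> str:
    # Normalize, case-fold, then normalize again: casefold() can produce
    # sequences that are no longer in NFKC form.
    return unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", text).casefold())

def caseless_equal(a: str, b: str) -> bool:
    # Compare two strings only after both have been normalized and case-folded.
    return nfkc_casefold(a) == nfkc_casefold(b)

print("straße".lower() == "STRASSE".lower())  # False: lower() keeps "ß"
print(caseless_equal("straße", "STRASSE"))    # True: casefold() maps "ß" to "ss"
print(caseless_equal("\uff21DMIN", "admin"))  # True: fullwidth "A" folds to "a"
```

Whether such aggressive folding is appropriate depends on the field; identifiers such as usernames usually benefit from it, while passwords are typically normalized but not case-folded.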

## Detection

- Total rules: 3
- Languages: go, javascript, typescript, python
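Independently of static rules, a simple runtime heuristic can flag input that would change under normalization, which is often worth logging or reviewing. This is a sketch, not the Shoulder rule: `needs_normalization` is a hypothetical helper, and `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata

def needs_normalization(text: str, form: str = "NFKC") -> bool:
    # True when the input is not already in the given normalization form,
    # i.e. it contains decomposed sequences or compatibility characters.
    return not unicodedata.is_normalized(form, text)

print(needs_normalization("admin"))       # False
print(needs_normalization("\uff41dmin"))  # True: fullwidth "a"
print(needs_normalization("cafe\u0301"))  # True: "e" + combining accent composes under NFKC
```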

## Rules by Language

### Python (1 rule)

- **Unicode Normalization Issues** [MEDIUM]: Detects missing Unicode normalization leading to security bypasses.
  - Remediation: Normalize Unicode strings before comparison using unicodedata.normalize().

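```python
import unicodedata

def normalize_username(username: str) -> str:
    # NFKC folds compatibility variants (e.g. fullwidth letters) into their
    # standard forms; casefold() is a more complete caseless form than lower().
    return unicodedata.normalize('NFKC', username).casefold()

def grant_access():
    # Placeholder for the application's real authorization step.
    print("access granted")

stored_name = "admin"
input_name = "\uff41dmin"  # example input: fullwidth "a" (U+FF41) + "dmin"

# Compare only the normalized forms of both values.
if normalize_username(input_name) == normalize_username(stored_name):
    grant_access()
```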

Learn more: https://shoulder.dev/learn/python/cwe-176/unicode-normalization