# Improper Handling of Unicode Encoding (CWE-176)

The product does not properly handle input that contains Unicode encoding.

**Stack:** JavaScript

- Prevalence: Medium (covers 3 languages)
- Impact: Medium (review recommended)
- Prevention: Documented (3 fix examples)

**OWASP:** Injection (A03:2021-Injection) - #3

## Description

Unicode characters can have multiple encodings or representations. If an application does not properly handle Unicode, attackers may be able to bypass security filters or cause unexpected behavior using alternate encodings.
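As a minimal illustration of "multiple representations", the sketch below builds the same visible character two ways and compares them. It assumes a JavaScript runtime with `String.prototype.normalize` (any current Node.js or browser); the variable names are only for illustration.

```javascript
// Two visually identical strings built from different code points.
const composed = '\u00E9';    // "é" as one precomposed code point (U+00E9)
const decomposed = 'e\u0301'; // "e" followed by U+0301 COMBINING ACUTE ACCENT

console.log(composed === decomposed);            // false: the code unit sequences differ
console.log(composed.length, decomposed.length); // 1 2

// Once both sides are normalized to the same form, they compare equal.
console.log(composed.normalize('NFC') === decomposed.normalize('NFC')); // true
```

A filter or lookup that compares the raw strings would treat these as two different values even though a user sees the same text.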

## Prevention

Prevention measures for Improper Handling of Unicode, based on 1 Shoulder detection rule.

### JavaScript

Normalize Unicode strings with NFKC before security-sensitive comparisons
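One way to apply this consistently is to route every security-sensitive comparison through a single helper. The sketch below is an assumption about how that could look: `canonicalize` and `equalsNormalized` are hypothetical names, and the lowercasing step is an illustrative choice for case-insensitive identifiers rather than part of NFKC itself.

```javascript
// Hypothetical helper: canonicalize a string before a security-sensitive comparison.
function canonicalize(input) {
  if (typeof input !== 'string') {
    throw new TypeError('expected a string');
  }
  // NFKC folds compatibility variants (fullwidth letters, ligatures) and
  // composes combining sequences; lowercasing is an illustrative extra step.
  return input.normalize('NFKC').toLowerCase();
}

// Hypothetical helper: compare two strings after canonicalizing both sides.
function equalsNormalized(a, b) {
  return canonicalize(a) === canonicalize(b);
}

// Fullwidth "admin" (U+FF41 ...) folds to ASCII "admin".
console.log(equalsNormalized('\uFF41\uFF44\uFF4D\uFF49\uFF4E', 'admin')); // true
// The "fi" ligature (U+FB01) expands to the two letters "fi".
console.log(equalsNormalized('\uFB01le', 'file')); // true
```

Normalizing both operands, not only the user-supplied one, avoids surprises when the stored value was itself written in a non-normalized form.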

## Warning Signs

- [MEDIUM] Missing Unicode normalization in security-sensitive string comparisons

## Consequences

- Bypass of protection mechanisms
- Execution of unauthorized code

## Mitigations

- Normalize Unicode input to a canonical form before processing
- Apply security checks after Unicode normalization
- Use Unicode-aware comparison functions
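The ordering in the first two mitigations matters because normalization can create characters that a filter was meant to block. The sketch below shows this with a deliberately simple, hypothetical validator (`containsAngleBrackets`); a real application would likely use a more complete check.

```javascript
// Hypothetical validator: reports whether the input contains ASCII angle brackets.
function containsAngleBrackets(s) {
  return /[<>]/.test(s);
}

// Fullwidth "<script>": U+FF1C and U+FF1E are compatibility variants of "<" and ">".
const input = '\uFF1Cscript\uFF1E';

// Unsafe order: validate first, normalize afterwards.
const passedCheck = !containsAngleBrackets(input); // true, the filter sees nothing to block
const afterNormalize = input.normalize('NFKC');    // "<script>", created after the check ran

// Safer order: normalize first, then validate the normalized value.
const normalized = input.normalize('NFKC');
const rejected = containsAngleBrackets(normalized); // true

console.log(passedCheck, afterNormalize, rejected);
```

In the unsafe order the value that reaches later processing is not the value that was validated, which is the gap these mitigations aim to close.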

## Detection

- Total rules: 3
- Languages: go, javascript, typescript, python

## Rules by Language

### JavaScript (1 rule)

- **Unicode Normalization Security Issues** [MEDIUM]: Detects missing Unicode normalization in security-sensitive string comparisons.
Unicode allows multiple representations of visually identical characters, which
attackers can exploit to bypass input validation, authentication, or access control.

Common attack vectors:
- Homograph attacks (lookalike characters): "аdmin" vs "admin" (the first starts with Cyrillic 'а')
- Case mapping differences: "ß" (German sharp s) becomes "SS" when uppercased
- Combining characters: "é" can be a single code point or 'e' followed by a combining acute accent

Remediation: Normalize Unicode strings with NFKC before security-sensitive comparisons:

```javascript
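const express = require('express');
const app = express();
app.use(express.json()); // populate req.body from JSON payloads

app.post('/login', (req, res) => {
  // Normalize to NFKC before lowercasing so equivalent forms compare equal
  const username = String(req.body?.username ?? '').normalize('NFKC').toLowerCase();
  if (username === 'admin') {
    return res.send('Admin access');
  }
  res.send('User access');
});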
```

Learn more: https://shoulder.dev/learn/javascript/cwe-176/unicode-normalization
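The attack vectors listed for this rule behave differently under normalization, and the sketch below probes two of them. NFKC does not map lookalike letters from another script onto Latin, so homograph handling likely needs a separate check. `mixesLatinAndCyrillic` is a hypothetical helper that covers only these two scripts, and it assumes a runtime that supports Unicode property escapes in regular expressions.

```javascript
// First letter is CYRILLIC SMALL LETTER A (U+0430), not Latin "a".
const cyrillicAdmin = '\u0430dmin';

// NFKC leaves the Cyrillic letter untouched, so the strings still differ.
console.log(cyrillicAdmin.normalize('NFKC') === 'admin'); // false

// Hypothetical check: flag identifiers that mix Latin and Cyrillic letters.
function mixesLatinAndCyrillic(s) {
  return /\p{Script=Latin}/u.test(s) && /\p{Script=Cyrillic}/u.test(s);
}
console.log(mixesLatinAndCyrillic(cyrillicAdmin)); // true
console.log(mixesLatinAndCyrillic('admin'));       // false

// Case mapping can change string length: "ß" (U+00DF) uppercases to "SS".
console.log('\u00DF'.toUpperCase());            // "SS"
console.log('stra\u00DFe'.toUpperCase().length); // 7, one more than the input
```

A mixed-script check like this is only one possible policy; some applications instead restrict identifiers to a single allowed script.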

### TypeScript (1 rule)

- **Unicode Normalization Security Issues** [MEDIUM]: Detects missing Unicode normalization in security-sensitive string comparisons.
Unicode allows multiple representations of visually identical characters, which
attackers can exploit to bypass input validation, authentication, or access control.

Common attack vectors:
- Homograph attacks (lookalike characters): "аdmin" vs "admin" (the first starts with Cyrillic 'а')
- Case mapping differences: "ß" (German sharp s) becomes "SS" when uppercased
- Combining characters: "é" can be a single code point or 'e' followed by a combining acute accent

Remediation: Normalize Unicode strings with NFKC before security-sensitive comparisons:

```javascript
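const express = require('express');
const app = express();
app.use(express.json()); // populate req.body from JSON payloads

app.post('/login', (req, res) => {
  // Normalize to NFKC before lowercasing so equivalent forms compare equal
  const username = String(req.body?.username ?? '').normalize('NFKC').toLowerCase();
  if (username === 'admin') {
    return res.send('Admin access');
  }
  res.send('User access');
});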
```

Learn more: https://shoulder.dev/learn/javascript/cwe-176/unicode-normalization