# Improper Handling of Unicode Encoding (CWE-176)

The product does not properly handle input that contains Unicode encoding.

- Prevalence: Medium (3 languages covered)
- Impact: Medium (review recommended)
- Prevention: Documented (3 fix examples)

**OWASP:** Injection (A03:2021-Injection) - #3

## Description

Unicode characters can have multiple encodings or representations. If an application does not properly handle Unicode, attackers may be able to bypass security filters or cause unexpected behavior using alternate encodings.
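
To make "multiple representations" concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The sample strings are illustrative assumptions, not taken from any real system.

```python
import unicodedata

composed = "caf\u00e9"     # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Both render as "café", but the code point sequences differ.
raw_equal = composed == decomposed

# After canonical normalization (NFC) the two spellings compare equal.
nfc_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# Compatibility normalization (NFKC) additionally maps variants such as
# fullwidth letters (U+FF41 and friends) to their ordinary counterparts.
fullwidth = "\uff41\uff44\uff4d\uff49\uff4e"
nfkc_folded = unicodedata.normalize("NFKC", fullwidth)

print(raw_equal, nfc_equal, nfkc_folded)  # False True admin
```

A filter that compares raw strings sees three different values here, while a later component that normalizes them may treat them as identical.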

## Prevention

Prevention strategies for Improper Handling of Unicode Encoding, based on 3 Shoulder detection rules.

### Go

Normalize strings with NFKC before security-sensitive comparisons

### JavaScript

Normalize Unicode strings with NFKC before security-sensitive comparisons

### Python

Normalize Unicode strings with NFKC before comparison or security-critical operations

## Warning Signs

- [MEDIUM] Missing Unicode normalization in security-sensitive string comparisons

## Consequences

- Bypass protection mechanism
- Execute unauthorized code

## Mitigations

- Normalize Unicode input to a canonical form before processing
- Apply security checks after Unicode normalization
- Use Unicode-aware comparison functions
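
The ordering of normalization and checking matters. The following Python sketch assumes a hypothetical deny-list check for `<script` (the `BLOCKED` token and both function names are invented for illustration). It shows how a check applied to the raw input can miss a payload that only becomes dangerous once it is normalized.

```python
import unicodedata

BLOCKED = "<script"  # hypothetical deny-list token, for illustration only

def is_blocked_unsafe(value: str) -> bool:
    """Checks the raw input; normalization is assumed to happen later."""
    return BLOCKED in value.lower()

def is_blocked_safe(value: str) -> bool:
    """Normalizes to NFKC first, then applies the same check."""
    normalized = unicodedata.normalize("NFKC", value)
    return BLOCKED in normalized.casefold()

# U+FF1C and U+FF1E are fullwidth '<' and '>'; NFKC maps them to ASCII.
payload = "\uff1cscript\uff1ealert(1)\uff1c/script\uff1e"

print(is_blocked_unsafe(payload))  # False: the raw string has no ASCII '<'
print(is_blocked_safe(payload))    # True: the check sees the normalized form
```

A deny-list is used here only because it makes the bypass easy to see. The point being illustrated is the order of operations, not the filtering strategy.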

## Detection

- Total rules: 3
- Languages: go, javascript, typescript, python

## Rules by Language

### Go (1 rule)

- **Unicode Normalization Security Issues** [MEDIUM]: Security-sensitive string comparison without Unicode normalization.
  - Remediation: Normalize strings with NFKC before security-sensitive comparisons.

```go
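import (
    "strings"

    "golang.org/x/text/unicode/norm"
)

// isAdmin compares the NFKC-normalized, lowercased username to "admin".
func isAdmin(username string) bool {
    // Normalize before lowercasing so compatibility forms collapse first.
    normalized := strings.ToLower(norm.NFKC.String(username))
    return normalized == "admin"
}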
```

Learn more: https://shoulder.dev/learn/go/cwe-176/unicode-normalization

### JavaScript (1 rule)

- **Unicode Normalization Security Issues** [MEDIUM]: Detects missing Unicode normalization in security-sensitive string comparisons. Unicode allows multiple representations of visually identical characters, which attackers can exploit to bypass input validation, authentication, or access control.
  - Common attack vectors:
    - Homograph attacks (using lookalike characters): "аdmin" vs "admin" (Cyrillic 'а')
    - Case folding differences: "ß" (German sharp s) becomes "SS" when uppercased
    - Combining characters: "é" can be a single code point or 'e' + combining acute accent (U+0301)
  - Remediation: Normalize Unicode strings with NFKC before security-sensitive comparisons:

```javascript
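app.post('/login', (req, res) => {
  // Reject missing or non-string input before calling String.prototype.normalize.
  if (typeof req.body?.username !== 'string') {
    return res.status(400).send('Invalid username');
  }
  // Normalize first, then lowercase, so equivalent spellings compare equal.
  const username = req.body.username.normalize('NFKC').toLowerCase();
  if (username === 'admin') {
    return res.send('Admin access');
  }
  res.send('User access');
});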
```

Learn more: https://shoulder.dev/learn/javascript/cwe-176/unicode-normalization
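
The three attack vectors listed for this rule behave differently under normalization. The sketch below uses Python's standard `unicodedata` module to show which ones NFKC plus case folding addresses. The `canonical` helper is a hypothetical name introduced for this example, and the sample strings are assumptions.

```python
import unicodedata

def canonical(value: str) -> str:
    """NFKC-normalize, case-fold, then normalize again.

    The second pass is there because case folding can yield text that is
    no longer in normalized form.
    """
    folded = unicodedata.normalize("NFKC", value).casefold()
    return unicodedata.normalize("NFKC", folded)

# Combining characters: 'e' + U+0301 matches the precomposed U+00E9.
combining_ok = canonical("re\u0301sume\u0301") == canonical("r\u00e9sum\u00e9")

# Case folding: casefold() maps 'ß' to 'ss', whereas lower() leaves it as is.
folding_ok = canonical("STRASSE") == canonical("stra\u00dfe")
lower_differs = "STRASSE".lower() != "stra\u00dfe".lower()

# Homographs: U+0430 CYRILLIC SMALL LETTER A is a distinct letter, not a
# compatibility variant of Latin 'a', so normalization does not merge them.
homograph_merged = canonical("\u0430dmin") == canonical("admin")

print(combining_ok, folding_ok, lower_differs, homograph_merged)
# True True True False
```

Because the homograph case still compares unequal after normalization, lookalike characters from other scripts likely need a separate defense, such as restricting identifiers to an allowed character set.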

### TypeScript (1 rule)

- **Unicode Normalization Security Issues** [MEDIUM]: Detects missing Unicode normalization in security-sensitive string comparisons. Unicode allows multiple representations of visually identical characters, which attackers can exploit to bypass input validation, authentication, or access control.
  - Common attack vectors:
    - Homograph attacks (using lookalike characters): "аdmin" vs "admin" (Cyrillic 'а')
    - Case folding differences: "ß" (German sharp s) becomes "SS" when uppercased
    - Combining characters: "é" can be a single code point or 'e' + combining acute accent (U+0301)
  - Remediation: Normalize Unicode strings with NFKC before security-sensitive comparisons:

```javascript
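app.post('/login', (req, res) => {
  // Reject missing or non-string input before calling String.prototype.normalize.
  if (typeof req.body?.username !== 'string') {
    return res.status(400).send('Invalid username');
  }
  // Normalize first, then lowercase, so equivalent spellings compare equal.
  const username = req.body.username.normalize('NFKC').toLowerCase();
  if (username === 'admin') {
    return res.send('Admin access');
  }
  res.send('User access');
});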
```

Learn more: https://shoulder.dev/learn/javascript/cwe-176/unicode-normalization

### Python (1 rule)

- **Unicode Normalization Issues** [MEDIUM]: Detects missing Unicode normalization leading to security bypasses.
  - Remediation: Normalize Unicode strings before comparison using unicodedata.normalize().

```python
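import unicodedata

def normalize_username(username):
    # NFKC folds compatibility forms (e.g. fullwidth letters) before lowercasing.
    return unicodedata.normalize('NFKC', username).lower()

# input_name, stored_name, and grant_access() are supplied by the application.
# Normalize both sides so the comparison does not depend on how either was typed.
if normalize_username(input_name) == normalize_username(stored_name):
    grant_access()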
```

Learn more: https://shoulder.dev/learn/python/cwe-176/unicode-normalization