# Improper Handling of Unicode Encoding (CWE-176)

The product does not properly handle input that contains Unicode encoding.

**Stack:** Python

- Prevalence: Medium (3 languages supported)
- Impact: Medium (review recommended)
- Prevention: Documented (3 fix examples)

**OWASP:** Injection (A03:2021-Injection) - #3

## Description

Unicode characters can have multiple encodings or representations. If an application does not properly handle Unicode, attackers may be able to bypass security filters or cause unexpected behavior using alternate encodings.
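Here is a minimal sketch of that kind of bypass. It assumes a hypothetical reserved-name blocklist (`RESERVED_NAMES`, `is_reserved_naive`) that compares raw strings. A name spelled with fullwidth letters passes the filter, even though it becomes `admin` once any later component applies NFKC normalization.

```python
import unicodedata

# Hypothetical blocklist of reserved account names (illustrative only).
RESERVED_NAMES = {"admin", "root"}

def is_reserved_naive(name: str) -> bool:
    # Flawed: compares the raw string, so alternate code points slip through.
    return name.lower() in RESERVED_NAMES

# "admin" spelled with FULLWIDTH LATIN SMALL LETTER code points (U+FF41..U+FF5A range).
spoofed = "\uff41\uff44\uff4d\uff49\uff4e"

print(is_reserved_naive(spoofed))                         # False: filter bypassed
print(unicodedata.normalize("NFKC", spoofed) == "admin")  # True: same name after NFKC
```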

## Prevention

Prevention strategies for Improper Handling of Unicode, based on 1 Shoulder detection rule.

### Python

Normalize Unicode strings with NFKC before comparison or any other security-critical operation.
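The sketch below uses only the standard `unicodedata` module to suggest why NFKC, rather than NFC, fits security comparisons. NFC merges canonically equivalent sequences, such as the two spellings of "café". NFKC also folds compatibility characters like fullwidth letters and ligatures.

```python
import unicodedata

composed = "caf\u00e9"      # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT (U+0301)
fullwidth = "\uff41\uff44\uff4d\uff49\uff4e"  # fullwidth 'admin'

# Both spellings of "café" render identically but compare unequal as raw strings.
assert composed != decomposed

# NFC unifies canonically equivalent sequences...
assert unicodedata.normalize("NFC", decomposed) == composed
# ...but leaves compatibility variants such as fullwidth letters untouched.
assert unicodedata.normalize("NFC", fullwidth) == fullwidth

# NFKC also folds compatibility characters, so visually similar input converges.
assert unicodedata.normalize("NFKC", fullwidth) == "admin"
assert unicodedata.normalize("NFKC", "\ufb01le") == "file"  # 'ﬁ' ligature -> 'fi'
```

NFKC is lossy. For example, `²` becomes `2`. One reasonable design is therefore to keep the original string for display and use the normalized form only for comparisons and checks.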

## Warning Signs

- [MEDIUM] missing Unicode normalization leading to security bypasses

## Consequences

- Bypass of protection mechanisms
- Execution of unauthorized code or commands

## Mitigations

- Normalize Unicode input to a canonical form before processing it
- Apply security checks after Unicode normalization, not before
- Use Unicode-aware comparison functions
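The mitigations above can be sketched as follows. The path check `has_traversal` and the two wrapper functions are hypothetical illustrations, not Shoulder's rule logic. Validating before normalizing lets a fullwidth `．．／` payload through, while normalizing first catches it. The last lines contrast `lower()` with the Unicode-aware `str.casefold()`.

```python
import unicodedata

def has_traversal(path: str) -> bool:
    # Hypothetical, minimal check: reject any '..' path segment.
    return ".." in path.replace("\\", "/").split("/")

def vulnerable_check(raw: str) -> str:
    # Wrong order: validate first, normalize afterwards.
    if has_traversal(raw):
        raise ValueError("path traversal")
    return unicodedata.normalize("NFKC", raw)

def safe_check(raw: str) -> str:
    # Right order: normalize first, then run the security check on the result.
    normalized = unicodedata.normalize("NFKC", raw)
    if has_traversal(normalized):
        raise ValueError("path traversal")
    return normalized

# FULLWIDTH FULL STOP (U+FF0E) and FULLWIDTH SOLIDUS (U+FF0F) become '.' and '/' under NFKC.
payload = "\uff0e\uff0e\uff0fetc\uff0fpasswd"

print(vulnerable_check(payload))  # ../etc/passwd  (the check was bypassed)
try:
    safe_check(payload)
except ValueError as exc:
    print("rejected:", exc)       # rejected: path traversal

# Case-insensitive comparison: casefold() maps 'ß' to 'ss'; lower() does not.
print("Straße".lower() == "STRASSE".lower())        # False
print("Straße".casefold() == "STRASSE".casefold())  # True
```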

## Detection

- Total rules: 3
- Languages: go, javascript, typescript, python

## Rules by Language

### Python (1 rule)

- **Unicode Normalization Issues** [MEDIUM]: Detects missing Unicode normalization leading to security bypasses.
  - Remediation: Normalize Unicode strings before comparison using unicodedata.normalize().

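```python
import unicodedata

def normalize_username(username: str) -> str:
    # NFKC folds compatibility variants (e.g. fullwidth letters) into canonical
    # forms; casefold() is a more aggressive, Unicode-aware alternative to lower().
    return unicodedata.normalize('NFKC', username).casefold()

def usernames_match(input_name: str, stored_name: str) -> bool:
    # Normalize both sides the same way before comparing them.
    return normalize_username(input_name) == normalize_username(stored_name)
```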

Learn more: https://shoulder.dev/learn/python/cwe-176/unicode-normalization