URL Encoding Explained: Percent-Encoding and When You Need It
You've seen URLs like https://example.com/search?q=hello%20world. That %20 is URL encoding (also called percent-encoding) for a space. But why do URLs need this? And when should you encode or decode URLs in your own work?
The Problem: URLs Have Reserved Characters
A URL is composed of components — scheme, host, path, query, fragment — each separated by specific characters:
https://example.com/path/to/page?key=value&other=data#section
^^^^^   ^^^^^^^^^^^ ^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^ ^^^^^^^
scheme  host        path         query                fragment
Characters like /, ?, &, =, and # have structural meaning. If any of them appear in your data rather than as separators, a URL parser will split the URL in the wrong place.
For example: a search query for C++ vs C# contains + and #, both of which have special URL meanings (+ means space in query strings; # starts a fragment).
URL encoding solves this by replacing each unsafe character with % followed by two hexadecimal digits for its byte value. Non-ASCII characters are first converted to UTF-8 bytes, and each byte is encoded this way.
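To see the problem concretely, here is a minimal Python sketch using the standard library's urllib.parse. The example.com URL and the variable names are illustrative assumptions, not part of any real API.

```python
from urllib.parse import parse_qs, quote, urlsplit

search = "C++ vs C#"

# Naive concatenation: the "#" in the data is read as the start of a
# fragment, and each "+" in the query is read as a space.
naive = urlsplit("https://example.com/search?q=" + search)
naive_value = parse_qs(naive.query)["q"][0]

# Percent-encode the value first so no character is mistaken for structure.
encoded = urlsplit("https://example.com/search?q=" + quote(search, safe=""))
encoded_value = parse_qs(encoded.query)["q"][0]

print(repr(naive_value))    # 'C   vs C'
print(repr(encoded_value))  # 'C++ vs C#'
```

The naive version silently loses data: the server never sees the plus signs or the hash.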
Percent-Encoding Reference
| Character | Encoded | Why it needs encoding |
|---|---|---|
| Space | %20 (or + in query strings) | Invalid in URLs |
| + | %2B | Means "space" in query strings |
| # | %23 | Starts fragment |
| % | %25 | Starts percent-encoding |
| & | %26 | Separates query params |
| = | %3D | Separates key/value in query |
| ? | %3F | Starts query string |
| / | %2F | Separates path segments |
| : | %3A | Separates scheme from host |
| @ | %40 | Separates user info (username/password) from host |
| [, ] | %5B, %5D | IPv6 delimiters |
Safe Characters (Never Encoded)
RFC 3986 defines these as "unreserved" — they can appear in any URL component without encoding:
A-Z a-z 0-9 - _ . ~
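As a rule of thumb, encode everything else when it appears in a data value rather than as a structural separator. Some reserved characters, such as ( and ), are permitted unencoded in paths, but encoding them is always safe.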
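The rule can be sketched by hand in a few lines of Python. This is an illustration of the mechanism, not a replacement for a library function; percent_encode and UNRESERVED are hypothetical names chosen for this example.

```python
import string

# The RFC 3986 unreserved set: letters, digits, and - _ . ~
UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")


def percent_encode(text: str) -> str:
    """Encode every byte of the UTF-8 form that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)                # safe: copy through unchanged
        else:
            out.append(f"%{byte:02X}")    # unsafe: % plus two hex digits
    return "".join(out)


print(percent_encode("C++ vs C#"))   # C%2B%2B%20vs%20C%23
print(percent_encode("a-b_c.d~e"))   # a-b_c.d~e
```

In practice a standard library function such as Python's urllib.parse.quote(text, safe="") does the same job.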
Spaces: %20 vs. +
There are two common encodings for spaces:
- %20: strict percent-encoding, defined by RFC 3986. Used in paths and anywhere outside query strings.
- +: used in query strings only, based on the older HTML form encoding (application/x-www-form-urlencoded). A + in a path is a literal plus sign.
This distinction catches many developers off guard:
/search?q=hello+world → "hello world" (query string context)
/files/hello+world.pdf → file literally named "hello+world.pdf"
When in doubt, use %20. It's correct in all URL contexts.
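Python's urllib.parse exposes both conventions as separate functions, which makes the difference easy to check. A small sketch (the variable names are only for illustration):

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Encoding: quote() uses %20, quote_plus() uses the form-encoding "+".
path_form = quote("hello world")        # hello%20world
query_form = quote_plus("hello world")  # hello+world

# Decoding must match the context the string came from.
as_path = unquote("hello+world.pdf")        # hello+world.pdf (plus kept)
as_query = unquote_plus("hello+world")      # hello world
also_query = unquote_plus("hello%20world")  # hello world

print(path_form, query_form)
print(as_path, "|", as_query, "|", also_query)
```

Note that %20 decodes to a space under both functions, while + only does so under the query-string rules.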
Encoding in Different Languages
JavaScript (browser)
// For encoding a complete URL
encodeURI("https://example.com/path?q=hello world")
// → "https://example.com/path?q=hello%20world"
// For encoding a value within a URL
encodeURIComponent("C++ vs C#")
// → "C%2B%2B%20vs%20C%23"
// Decoding
decodeURIComponent("C%2B%2B%20vs%20C%23")
// → "C++ vs C#"
Key distinction:
- encodeURI(): does NOT encode structural characters (:, /, ?, =, &, #, among others)
- encodeURIComponent(): encodes everything except A-Z a-z 0-9 - _ . ~ and ! * ' ( )
For encoding individual query parameter values, always use encodeURIComponent().
Python
from urllib.parse import quote, unquote, urlencode
# Encode a path segment
quote("hello world") # "hello%20world"
quote("C++", safe="") # "C%2B%2B"
# Encode query string
urlencode({"q": "C++ vs C#", "page": 1})
# "q=C%2B%2B+vs+C%23&page=1" (uses + for spaces in query)
# Decode
unquote("hello%20world") # "hello world"
Command line (curl)
# curl sends most of the URL as you typed it, so encode query values explicitly:
curl -G "https://api.example.com/search" --data-urlencode "q=hello world" --data-urlencode "tags=C++ programming"
Internationalized URLs: Punycode and Percent-Encoding
URLs are technically ASCII-only. Non-ASCII characters (like Korean, Chinese, Arabic, emoji) are handled two ways:
Domain names: Converted to Punycode — a special ASCII encoding.
- münchen.de → xn--mnchen-3ya.de
Path and query values: Converted to UTF-8 bytes, then percent-encoded.
- /ko/검색?q=안녕 → /ko/%EA%B2%80%EC%83%89?q=%EC%95%88%EB%85%95
Modern browsers display the decoded form in the address bar, but the actual HTTP request uses the encoded form.
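Both conversions can be reproduced with the Python standard library. This is a sketch under one stated assumption: Python's built-in "idna" codec implements the older IDNA 2003 rules, which agree with the example above but may differ from modern browsers for some names.

```python
from urllib.parse import quote, urlencode

# Domain name: Unicode labels become Punycode with an "xn--" prefix.
host = "münchen.de".encode("idna").decode("ascii")

# Path and query: UTF-8 bytes, then percent-encoding.
# quote() leaves "/" unencoded by default, so the path structure survives.
path = quote("/ko/검색")
query = urlencode({"q": "안녕"})

print(host)               # xn--mnchen-3ya.de
print(f"{path}?{query}")  # /ko/%EA%B2%80%EC%83%89?q=%EC%95%88%EB%85%95
```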
Common Real-World Scenarios
Sharing URLs in emails or messages
Paste a URL into an email, and the email client might break it at special characters. If your URL contains & or other reserved characters in the path (not as separators), encode them first.
OAuth and API authentication
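OAuth 1.0 and similar signed-request schemes require query parameters to be percent-encoded in a specific, strict way before the signature is computed: everything except the unreserved characters is encoded, and a space is %20, never +. A single differently encoded character changes the signed string, so the signature check fails.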
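The sketch below follows the general shape of an OAuth 1.0 (RFC 5849) HMAC-SHA1 signature to show why encoding matters. It is simplified: the endpoint, parameter values, and secret are made up, the token secret is assumed empty, and a real request carries more oauth_* parameters than shown here.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, quote_plus


def strict(value: str) -> str:
    """Percent-encode everything except RFC 3986 unreserved characters."""
    return quote(value, safe="")


def base_string(method: str, url: str, params: dict, encode=strict) -> str:
    """Sort and encode the parameters, then join method, URL, and parameters."""
    pairs = sorted((encode(k), encode(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), strict(url), strict(normalized)])


def sign(text: str, secret: str) -> str:
    """HMAC-SHA1 over the base string, base64-encoded."""
    key = f"{strict(secret)}&".encode("ascii")  # empty token secret assumed
    digest = hmac.new(key, text.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


params = {"q": "hello world", "oauth_nonce": "abc123"}  # made-up values
url = "https://api.example.com/search"                   # placeholder endpoint

good = base_string("GET", url, params)
bad = base_string("GET", url, params, encode=quote_plus)  # "+" for the space

print(good)
print(sign(good, "secret") == sign(bad, "secret"))  # False
```

The two base strings differ only in how the space is encoded (%2520 versus %2B after the second encoding pass), which is enough to produce a different signature.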
Building URLs programmatically
Never concatenate user input directly into a URL string:
// ❌ Dangerous: breaks if userName contains /, ?, or #
const unsafeUrl = `https://api.example.com/user/${userName}`;
// ✅ Safe: the value is encoded before it is inserted
const safeUrl = `https://api.example.com/user/${encodeURIComponent(userName)}`;
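The same idea in Python, as a minimal sketch: encode each path segment and each query value separately, then assemble the URL from its parts. build_url is a hypothetical helper written for this example, and api.example.com is a placeholder host.

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit


def build_url(base: str, segments: list[str], params: dict[str, str]) -> str:
    """Append encoded path segments and an encoded query string to base."""
    scheme, netloc, base_path, _, _ = urlsplit(base)
    # safe="" so that a "/" inside a segment cannot create an extra segment.
    encoded = "/".join(quote(segment, safe="") for segment in segments)
    path = base_path.rstrip("/") + "/" + encoded
    return urlunsplit((scheme, netloc, path, urlencode(params), ""))


url = build_url(
    "https://api.example.com",
    ["user", "a/b c"],        # user input containing "/" and a space
    {"q": "C++ vs C#"},
)
print(url)
# https://api.example.com/user/a%2Fb%20c?q=C%2B%2B+vs+C%23
```

Encoding per component avoids having to decide after the fact which slashes and ampersands were structure and which were data.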
CSV exports with file paths
File names with spaces or symbols in download URLs need encoding:
/files/My Report (Final).pdf
→ /files/My%20Report%20(Final).pdf
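In Python, quote() encodes parentheses by default, so reproducing the form above means listing them as safe. A short sketch; the /files/ prefix is taken from the example and the variable names are illustrative:

```python
from urllib.parse import quote, unquote

filename = "My Report (Final).pdf"

# Strictest form: only unreserved characters are left alone.
strict_url = "/files/" + quote(filename, safe="")
# Lenient form: parentheses are allowed in a path, so keep them readable.
lenient_url = "/files/" + quote(filename, safe="()")

print(strict_url)   # /files/My%20Report%20%28Final%29.pdf
print(lenient_url)  # /files/My%20Report%20(Final).pdf

# Both forms decode to the same file name.
print(unquote(strict_url) == unquote(lenient_url))  # True
```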