What's in a hostname?

This article details the surprising complexity behind a seemingly simple hostname regex. It shows how common assumptions about valid hostname characters, case-insensitivity, and hyphen usage are often wrong, especially once IDNs and punycode are involved.

Visit netmeister.org →

Questions & Answers

What is the main topic of "What's in a hostname?"
The article "What's in a hostname?" debunks common misconceptions about valid characters and rules for hostnames, particularly concerning their use in regular expressions and DNS. It highlights the complexities introduced by case-insensitivity, hyphens, and Internationalized Domain Names.
Who would find the "What's in a hostname?" article most useful?
This article is most useful for network engineers, developers, system administrators, and anyone who needs to validate or process hostnames correctly, especially when writing parsers, regexes, or security-related code.
How does this article challenge common beliefs about hostname validation?
The article challenges the common belief that hostnames consist only of a-z, 0-9, and hyphens, and cannot start or end with a hyphen. It demonstrates that DNS names are case-insensitive and may contain multiple successive hyphens, and explains the complexities introduced by IDNs and punycode, which simple regexes often fail to handle.
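As a rough illustration of that gap, the sketch below builds a "simple" hostname regex and runs a few names through it. The pattern and the helper name naive_is_valid are assumptions made up for this example, not taken from the article.

```python
import re

# Hypothetical "simple" pattern: lowercase letters, digits, and hyphens,
# with no hyphen at the start or end of a label.
NAIVE_LABEL = r"[a-z0-9]([a-z0-9-]*[a-z0-9])?"
NAIVE_HOSTNAME = re.compile(rf"{NAIVE_LABEL}(\.{NAIVE_LABEL})*")


def naive_is_valid(name: str) -> bool:
    """Return True if the whole name matches the naive pattern."""
    return NAIVE_HOSTNAME.fullmatch(name) is not None


candidates = [
    "example.com",            # plain lowercase name
    "Example.COM",            # same name in DNS terms, different case
    "bücher.example",         # IDN in its Unicode form
    "xn--bcher-kva.example",  # the same IDN in its punycode form
]
for name in candidates:
    print(f"{name!r}: {naive_is_valid(name)}")
```

The pattern rejects the mixed-case and Unicode spellings even though they refer to names a resolver or browser would accept, which is the kind of mismatch the article describes.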
When is understanding the true complexity of hostnames particularly important?
Understanding the true complexity of hostnames is critical when normalizing hostnames in client certificates, SNI, or Host headers, especially in authorization contexts. It is also essential for anyone implementing hostname validation logic in applications or network services to prevent misconfigurations or security vulnerabilities.
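A minimal sketch of one possible normalization step, so that hostnames from different sources compare equal before an authorization decision. The function name normalize_hostname is hypothetical, and the sketch relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules and may disagree with IDNA 2008 clients on some names.

```python
def normalize_hostname(name: str) -> str:
    """Return a lowercase, ASCII-only form of a hostname for comparison.

    Assumptions of this sketch: a single trailing dot (the DNS root) is
    dropped, and Unicode labels are converted with the stdlib "idna" codec.
    """
    # "example.com." and "example.com" name the same host.
    if name.endswith("."):
        name = name[:-1]
    # Lowercase first: the codec leaves pure-ASCII labels untouched.
    name = name.lower()
    # Convert any Unicode labels to their "xn--" form; raises UnicodeError
    # for labels that are empty or too long.
    ascii_name = name.encode("idna").decode("ascii")
    return ascii_name.lower()


print(normalize_hostname("Example.COM."))
print(normalize_hostname("Bücher.example"))
```

Comparing the normalized forms, rather than the raw strings from a certificate, SNI value, or Host header, avoids treating two spellings of the same name as different hosts.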
What specific issue does the article raise regarding Internationalized Domain Names (IDNs)?
The article explains that IDNs, when converted to punycode, use the "xn--" prefix. This brings additional rules: apart from that prefix, a label is not supposed to contain "--" in the third and fourth character positions, and it must not start or end with a hyphen. These rules complicate validation and lead to inconsistent client behavior.
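The hyphen positions mentioned above can be checked per label. The sketch below is only an illustration: inspect_label and its result keys are hypothetical names, and decoding uses Python's built-in "punycode" codec.

```python
ACE_PREFIX = "xn--"


def inspect_label(label: str) -> dict:
    """Report the hyphen-related properties of a single hostname label."""
    lowered = label.lower()
    info = {
        "ace_prefix": lowered.startswith(ACE_PREFIX),
        # Positions are 1-based in prose, so "third and fourth" is [2:4].
        "hyphens_at_3_4": lowered[2:4] == "--",
        "edge_hyphen": lowered.startswith("-") or lowered.endswith("-"),
        "unicode_form": None,
    }
    if info["ace_prefix"]:
        try:
            payload = lowered[len(ACE_PREFIX):].encode("ascii")
            info["unicode_form"] = payload.decode("punycode")
        except UnicodeError:
            # Not decodable punycode; leave unicode_form as None.
            pass
    return info


for label in ["xn--bcher-kva", "ab--cd", "-abc", "plain"]:
    print(label, inspect_label(label))
```

A label such as "ab--cd" has hyphens in the third and fourth positions without being a punycode label, which is the case a validator has to decide how to treat.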