Recently, I wrote a blog post about Ada URL parser version 2.0.0. In that post, I mentioned that the parser is fully compatible with the WHATWG URL specification, passes all Web Platform Tests, and is currently used in Node.js 20.0.0.
A couple of days ago, I noticed significant differences in how Ada, Safari, Chrome, and Firefox handle the same URL. It seemed worth explaining why these differences occur and how each browser's own approach to URL parsing leads to inconsistencies and compatibility issues. In this blog post, I'll break down these variations and explain why a URL may not behave the same way in every browser.
Disclosure: In this blog post, I will focus on the WHATWG URL specification, not RFC 3986 or RFC 3987, even though those RFCs were published long before the WHATWG specification.
Is the WHATWG URL specification the only URL specification?
No. One notable alternative is RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, published by the Internet Engineering Task Force (IETF). It provides its own set of rules and guidelines for working with URLs.
Are there any differences between RFC 3986 and the WHATWG URL specification?
Yes. The WHATWG URL specification and RFC 3986 differ in several ways and have evolved separately. The WHATWG specification is the one targeted by all major web browsers, while RFC 3986 is used as a reference by many other web-related standards and tools, such as cURL.
Both the WHATWG URL specification and RFC 3986 are important references for web developers and are widely followed in the industry. Which one to follow may depend on factors such as the target environment (a browser or not), the specific requirements of a project, or the recommendations of relevant standards organizations.
What is the meaning of WHATWG?
The term WHATWG stands for Web Hypertext Application Technology Working Group. It is a community-driven organization that focuses on developing and maintaining web standards. The WHATWG was initially formed in response to the divergence between the World Wide Web Consortium (W3C) and the browser vendors of the time, who felt that the W3C process was too slow to address the evolving needs of web developers.
What is the URL specification?
The WHATWG URL specification is a set of rules and guidelines for working with URLs (Uniform Resource Locators) in web applications. A URL is a standardized way to identify and locate resources on the internet, such as web pages, images, or files. The specification provides a detailed definition of the URL syntax, parsing algorithms, and methods for manipulating URLs.
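To make the idea of URL components concrete, here is a small sketch that uses the standard URL class (available in browsers and in Node.js) to split a sample URL into the parts the specification defines. The sample URL and the helper name urlComponents are made up for this illustration; they are not part of Ada or the specification.

```typescript
// Hypothetical helper: collects the components defined by the WHATWG URL
// specification, as exposed by the standard URL class.
interface UrlComponents {
  protocol: string;
  username: string;
  password: string;
  hostname: string;
  port: string;
  pathname: string;
  search: string;
  hash: string;
}

function urlComponents(input: string): UrlComponents {
  // The URL constructor throws a TypeError if the input is not a valid URL.
  const url = new URL(input);
  return {
    protocol: url.protocol, // scheme followed by ":"
    username: url.username,
    password: url.password,
    hostname: url.hostname,
    port: url.port, // empty string when the port is absent or the default
    pathname: url.pathname,
    search: url.search, // includes the leading "?"
    hash: url.hash, // includes the leading "#"
  };
}

// Usage: print every component of a sample URL.
console.log(urlComponents("https://user:pass@example.com:8080/docs/index.html?lang=en#intro"));
```

For this input, a spec-compliant parser reports "https:" as the protocol, "example.com" as the hostname, "8080" as the port, "/docs/index.html" as the pathname, "?lang=en" as the search and "#intro" as the hash.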
The URL specification is one of the many standards maintained by the WHATWG. The group consists of a community of web developers, browser vendors, and other interested parties who collaborate to define and improve web technologies.
What is the difference between the WHATWG and the W3C?
While the W3C is another important organization involved in web standards, the WHATWG operates independently and maintains its own set of specifications, including the URL specification.
The URL Standard is a living document and is updated frequently. Whenever a change is introduced, WHATWG members notify URL implementors about it, typically by opening an issue on the relevant project's bug tracker or by sending an email to its mailing list.
Due to competing priorities, some implementors may not be able to update their codebases to the latest version of the URL Standard.
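Not every scheme gets special treatment in the URL Standard. The specification defines a small set of special schemes: ftp, file, http, https, ws, and wss. Every other scheme, such as ssh, is a non-special scheme, and URLs that use one are called non-special URLs. Special and non-special URLs are parsed differently; in particular, the host of a non-special URL is parsed as an opaque host.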
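A minimal sketch of the special-scheme table follows, with the default ports the URL Standard assigns to each special scheme (file has none). The helper names isSpecialScheme and defaultPort are hypothetical; real parsers such as Ada implement this logic internally rather than exposing these functions.

```typescript
// Special schemes and their default ports, as listed in the WHATWG URL Standard.
// null means the scheme has no default port (file).
const SPECIAL_SCHEMES: ReadonlyMap<string, number | null> = new Map<string, number | null>([
  ["ftp", 21],
  ["file", null],
  ["http", 80],
  ["https", 443],
  ["ws", 80],
  ["wss", 443],
]);

// Accepts a scheme with or without the trailing ":" (URL.protocol includes it)
// and compares it case-insensitively.
function normalizeScheme(scheme: string): string {
  return scheme.toLowerCase().replace(/:$/, "");
}

// Hypothetical helper: true for ftp, file, http, https, ws and wss only.
function isSpecialScheme(scheme: string): boolean {
  return SPECIAL_SCHEMES.has(normalizeScheme(scheme));
}

// Hypothetical helper: default port of a special scheme, or null if there is none.
function defaultPort(scheme: string): number | null {
  return SPECIAL_SCHEMES.get(normalizeScheme(scheme)) ?? null;
}

// Usage: classify a few URLs by the scheme the parser reports.
for (const input of ["https://example.com/", "ssh://example.com/", "FILE:///etc/hosts"]) {
  const { protocol } = new URL(input);
  console.log(input, protocol, isSpecialScheme(protocol));
}
```

One visible consequence of this table: because https is special with a default port of 443, new URL("https://example.com:443/").port is the empty string, while a non-special URL such as "ssh://example.com:22/" keeps its port "22".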
If you want to dig deeper into the output of the URL parser, we recently built a playground for Ada URL parser.
On December 28, 2016, the WHATWG added the notion of opaque hosts, which allows URLs with non-special schemes to have hostnames. This change was introduced in the "Add opaque hosts" pull request.
Unfortunately, for this particular example, browsers return different results.
Safari returns the same result as Ada URL parser and is kept in sync with the specification. I tested this on Safari 16.4.
Chrome returns a different hostname and pathname because it does not support opaque hosts. The Chromium bug report is still open, even though this change to the specification was introduced back in 2016.
Firefox returns the same result as Chrome, which suggests it is not keeping up with the specification either.
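If you want to check this behavior in your own environment, the sketch below parses a non-special URL and feature-detects opaque-host support. The git:// URL is my own example, not necessarily the one used in the comparison above, and the helper names parseNonSpecial and supportsOpaqueHosts are hypothetical. In a spec-compliant parser such as Ada (and therefore Node.js 20), the host after "//" is kept as an opaque host; parsers without opaque-host support typically return an empty hostname and leave "//github.com" in the pathname.

```typescript
// Hypothetical helper: reports how the current runtime parses a URL whose
// scheme ("git") is not one of the special schemes.
function parseNonSpecial(input: string) {
  const url = new URL(input);
  return {
    protocol: url.protocol,
    hostname: url.hostname,
    pathname: url.pathname,
    // Non-special URLs have an opaque origin, which serializes as "null".
    origin: url.origin,
  };
}

// Hypothetical feature detection: a parser that implements opaque hosts
// keeps the host after "//"; one that does not reports a different hostname.
function supportsOpaqueHosts(): boolean {
  return new URL("git://github.com/nodejs/node.git").hostname === "github.com";
}

// Usage: run this in a browser console or in Node.js and compare the results.
console.log(parseNonSpecial("git://github.com/nodejs/node.git"));
console.log("opaque hosts supported:", supportsOpaqueHosts());
```

Feature detection like this lets code branch on the parser's actual behavior instead of sniffing the browser name, which matters while implementations disagree.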
I'm afraid there are more differences than opaque hosts, and as web3 applications that rely on custom, non-special schemes such as ipfs:// become more common, we will see more of them.
I recommend reading the following blog posts by Daniel Stenberg, the author of cURL, on the differences between RFC 3986 and the WHATWG URL specification.