How a Web Browser Works: Inside Modern Browsers
Have you ever wondered what happens inside your browser when you type a URL and press Enter? From the outside it looks simple, but behind it lies a complex engineering process that spans several components, network protocols, and rendering stages. In this article, we’ll explore how browsers work and how their internal architecture is designed.
Components of a Browser
A browser can be divided into several main components:
User Interface (UI): The layer of interaction with the user, meaning everything you see and interact with directly, such as the address bar, navigation buttons, and tabs.
Browser Engine: The coordinator of the browser. It bridges the user interface and the Rendering Engine, passing along actions such as navigation and reload, and it works with the persistence layer (cookies, cache, localStorage).
Rendering Engine: Responsible for rendering content on screen. This component parses HTML and CSS and turns them into visible pixels, working alongside the JS interpreter and the networking layer.
Networking: Manages HTTP/HTTPS requests, DNS resolution, cookies, caching, data compression, and connection security (TLS).
JavaScript Interpreter: Parses and executes JS code. It builds the Abstract Syntax Tree (AST), converts it to bytecode, applies JIT optimizations, and runs the result on the JS VM. This enables DOM manipulation, event handling, and page interactivity.
Data Persistence: Allows the browser to store and retrieve data locally, ensuring user data, settings, and preferences are maintained across sessions.
It’s worth noting that, although modern browsers use several processes and threads, each page has a single main thread that runs JavaScript and handles style, layout, and much of painting. Long-running work on that thread blocks everything else on the page, which is crucial when discussing performance. To keep the page responsive, browsers schedule work through an event loop, very similar to what we know from Node.js.
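To make the event loop idea concrete, here is a toy model of it. This is only a sketch under simplifying assumptions: the function `runEventLoop` and the `queueTask`, `queueMicrotask`, and `log` helpers passed to each callback are hypothetical names invented for this example, not a browser API. It shows the one rule that matters for ordering: a task runs to completion, then all pending microtasks (such as promise callbacks) run, and only then does the next task (such as a timer callback) start.

```javascript
// Toy model of an event loop: one task runs at a time on a single thread,
// and all queued microtasks are drained before the next task starts.
function runEventLoop(initialTasks) {
  const taskQueue = [...initialTasks]; // e.g. timers, I/O callbacks, user events
  const microtaskQueue = [];           // e.g. promise reactions
  const log = [];

  // Helpers handed to every callback (hypothetical names for this sketch).
  const api = {
    log: (message) => log.push(message),
    queueTask: (fn) => taskQueue.push(fn),
    queueMicrotask: (fn) => microtaskQueue.push(fn),
  };

  while (taskQueue.length > 0) {
    const task = taskQueue.shift();
    task(api); // runs to completion; nothing can interrupt it

    // Drain microtasks, including any queued by other microtasks.
    while (microtaskQueue.length > 0) {
      microtaskQueue.shift()(api);
    }
  }
  return log;
}

const order = runEventLoop([
  (loop) => {
    loop.log('script start');
    loop.queueTask((l) => l.log('timer callback'));
    loop.queueMicrotask((l) => l.log('promise callback'));
    loop.log('script end');
  },
]);

console.log(order.join(' -> '));
// script start -> script end -> promise callback -> timer callback
```

The same ordering can be observed in a real browser or in Node.js with `setTimeout` and `Promise.resolve().then(...)`: the promise callback runs before the timer callback, even with a timeout of 0.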
What Happens When We Type a URL?
When a URL is entered and Enter is pressed, the following steps occur:
Domain Resolution (DNS):
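The first step is converting the domain name (e.g., google.com) into an IP address. The browser checks several caches along the way: its own, the operating system's, the router's, and the one kept by the ISP's resolver.
If no cache has the answer, a recursive DNS resolver queries the root, TLD, and authoritative name servers in turn and returns the IP address.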
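The cache-then-resolver lookup can be sketched in a few lines. This is an illustration only: `resolveHostname`, the layer objects, and `fakeResolver` are hypothetical, and the addresses come from the 192.0.2.x range reserved for documentation. Real resolvers also honor each record's TTL, which is left out here.

```javascript
// Walk the cache layers in order; fall back to the recursive resolver on a miss.
function resolveHostname(hostname, cacheLayers, queryResolver) {
  for (const layer of cacheLayers) {
    const cachedIp = layer.entries.get(hostname);
    if (cachedIp !== undefined) {
      return { ip: cachedIp, source: layer.name };
    }
  }

  const ip = queryResolver(hostname);
  // Store the answer in the closest cache so the next lookup is a hit.
  // (Real caches also expire entries based on the record's TTL.)
  if (cacheLayers.length > 0) {
    cacheLayers[0].entries.set(hostname, ip);
  }
  return { ip, source: 'resolver' };
}

// Hypothetical data for the example.
const cacheLayers = [
  { name: 'browser', entries: new Map() },
  { name: 'os', entries: new Map([['example.com', '192.0.2.10']]) },
];
const fakeResolver = (hostname) => '192.0.2.99'; // stands in for a real DNS query

const first = resolveHostname('example.com', cacheLayers, fakeResolver);
const second = resolveHostname('example.org', cacheLayers, fakeResolver);
const third = resolveHostname('example.org', cacheLayers, fakeResolver);

console.log(first.source, second.source, third.source);
// os resolver browser
```

The third lookup is answered by the browser cache because the second one stored the resolver's answer there, which is why repeat visits skip most of the DNS cost.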
Server Connection (TCP Handshake):
After obtaining the IP, the browser establishes a connection using the three-way TCP Handshake:
SYN → SYN-ACK → ACK.
Think of this as the digital equivalent of starting a phone conversation.
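The three segments can be modeled as plain objects to show how the sequence and acknowledgment numbers relate. This is a simulation, not real networking code: `threeWayHandshake` and the initial sequence numbers (1000 and 5000) are made up for the example. The rule it illustrates is that each ACK carries the other side's sequence number plus one.

```javascript
// Simulates the three segments of a TCP handshake.
// Each side picks an initial sequence number (ISN); an ACK always carries
// the other side's sequence number plus one.
function threeWayHandshake(clientIsn, serverIsn) {
  const syn = { from: 'client', flags: ['SYN'], seq: clientIsn };
  const synAck = {
    from: 'server',
    flags: ['SYN', 'ACK'],
    seq: serverIsn,
    ack: syn.seq + 1, // "I received your SYN"
  };
  const ack = {
    from: 'client',
    flags: ['ACK'],
    seq: syn.seq + 1,
    ack: synAck.seq + 1, // "I received your SYN-ACK"
  };
  return [syn, synAck, ack];
}

const segments = threeWayHandshake(1000, 5000);
for (const segment of segments) {
  const ackPart = segment.ack === undefined ? '' : ` ack=${segment.ack}`;
  console.log(`${segment.from}: ${segment.flags.join('-')} seq=${segment.seq}${ackPart}`);
}
// client: SYN seq=1000
// server: SYN-ACK seq=5000 ack=1001
// client: ACK seq=1001 ack=5001
```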
Security with TLS Handshake:
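When the URL uses HTTPS, the client and server run a TLS handshake before exchanging any application data. They agree on a protocol version and cipher suite, the server proves its identity with a certificate, and both sides derive shared encryption keys, so that traffic cannot be read or modified in transit.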
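How can two parties agree on a key over a network that others can observe? Modern TLS versions typically rely on a Diffie-Hellman style key agreement. The sketch below shows the arithmetic with deliberately tiny, insecure numbers. It is not TLS itself: `modPow` and all the values are illustrative assumptions, and real handshakes use very large numbers or elliptic curves, plus certificates and many other steps.

```javascript
// Modular exponentiation with BigInt: (base ** exponent) % modulus.
function modPow(base, exponent, modulus) {
  let result = 1n;
  let b = base % modulus;
  let e = exponent;
  while (e > 0n) {
    if ((e & 1n) === 1n) {
      result = (result * b) % modulus;
    }
    b = (b * b) % modulus;
    e >>= 1n;
  }
  return result;
}

// Public parameters, visible to anyone on the network (toy values).
const prime = 23n;
const generator = 5n;

// Private values, never sent over the wire (toy values).
const clientSecret = 6n;
const serverSecret = 15n;

// Each side sends only its public value.
const clientPublic = modPow(generator, clientSecret, prime); // 8n
const serverPublic = modPow(generator, serverSecret, prime); // 19n

// Each side combines its own secret with the other's public value.
const clientShared = modPow(serverPublic, clientSecret, prime);
const serverShared = modPow(clientPublic, serverSecret, prime);

console.log(clientShared === serverShared, clientShared); // true 2n
```

Both sides arrive at the same shared value without ever transmitting it. An eavesdropper sees only the public values, and recovering the secret from them is impractical at real key sizes.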
Edge Computing & CDNs:
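Every handshake above costs at least one network round trip. To shorten those round trips, companies like Google, Netflix, and Amazon serve content from CDNs and edge locations around the world, so that requests end at a server physically close to the user.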
First Response – TTFB:
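Once the connection is ready, the browser sends the HTTP request and waits for the response. Time to First Byte (TTFB) measures how long it takes for the first byte of that response to arrive. The browser does not wait for the whole document: it starts parsing the HTML as it streams in.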
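The phases described so far can be measured from inside a page. The sketch below assumes an object with the same field names as a `PerformanceNavigationTiming` entry, which a page can obtain with `performance.getEntriesByType('navigation')[0]`. The function name `summarizeTiming` and the sample numbers are made up so that the example also runs outside a browser.

```javascript
// `entry` mirrors the fields of a PerformanceNavigationTiming entry.
// All values are in milliseconds relative to the start of the navigation.
function summarizeTiming(entry) {
  return {
    dns: entry.domainLookupEnd - entry.domainLookupStart,
    connect: entry.connectEnd - entry.connectStart, // TCP handshake plus TLS, if any
    requestToFirstByte: entry.responseStart - entry.requestStart,
    ttfb: entry.responseStart - entry.startTime, // measured from navigation start
  };
}

// Hypothetical sample values, not measurements from a real page.
const sampleEntry = {
  startTime: 0,
  domainLookupStart: 5,
  domainLookupEnd: 25,
  connectStart: 25,
  connectEnd: 85,
  requestStart: 86,
  responseStart: 186,
};

const summary = summarizeTiming(sampleEntry);
console.log(summary);
// { dns: 20, connect: 60, requestToFirstByte: 100, ttfb: 186 }
```

Note that TTFB measured from the start of the navigation includes DNS, connection setup, and server processing, which is why CDNs and cached DNS answers improve it.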
Parsing and the Critical Render Path
When the browser receives the initial HTML, it begins parsing. The process includes:
DOM Construction: HTML is transformed into a tree structure called the Document Object Model.
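CSSOM Construction: CSS is also parsed into a tree, the CSS Object Model. Unlike the DOM, which can be built incrementally, the CSSOM is render-blocking: because later rules can override earlier ones, the browser waits for the complete CSSOM before rendering.
Render Tree: The DOM and CSSOM are combined into the Render Tree, which contains only the nodes that will be drawn, each with its computed styles. Elements with display: none and non-visual nodes such as head and script are left out.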
Layout: Determines each element’s position and size on screen.
Painting: Finally, pixels are drawn on screen.
This flow is known as the Critical Rendering Path (CRP). A regular script pauses DOM construction while it is downloaded and executed, and a stylesheet blocks rendering until the CSSOM is complete. That’s why the async and defer attributes help: they let the parser keep working while the script downloads.
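The Render Tree step can be illustrated with a small sketch. It makes heavy simplifying assumptions: the "DOM" is a plain object tree, the "CSSOM" is just a map from tag name to declarations (real CSS has selectors, the cascade, and inheritance), and `buildRenderTree` and `userAgentDefaults` are hypothetical names. The point it shows is that nodes computed as display: none, together with their whole subtree, never reach the Render Tree.

```javascript
// Browsers ship a default stylesheet that hides non-visual elements.
const userAgentDefaults = {
  head: { display: 'none' },
  script: { display: 'none' },
};

// Combine a DOM-like tree with style rules; drop anything that is not drawn.
function buildRenderTree(node, styleRules) {
  if (node.type === 'text') {
    return { text: node.content };
  }

  const style = {
    display: 'block', // simplification: everything defaults to block
    ...userAgentDefaults[node.tagName],
    ...styleRules[node.tagName], // author rules override the defaults
  };
  if (style.display === 'none') {
    return null; // the whole subtree is omitted
  }

  const children = (node.children || [])
    .map((child) => buildRenderTree(child, styleRules))
    .filter((child) => child !== null);
  return { tagName: node.tagName, style, children };
}

// Hypothetical page and styles.
const dom = {
  tagName: 'html',
  children: [
    { tagName: 'head', children: [{ tagName: 'title', children: [{ type: 'text', content: 'Page Title' }] }] },
    {
      tagName: 'body',
      children: [
        { tagName: 'h1', children: [{ type: 'text', content: 'Main Header' }] },
        { tagName: 'aside', children: [{ type: 'text', content: 'Hidden promo' }] },
      ],
    },
  ],
};
const cssRules = { h1: { color: 'navy' }, aside: { display: 'none' } };

const renderTree = buildRenderTree(dom, cssRules);
console.log(JSON.stringify(renderTree, null, 2));
```

In the result, html has a single child (body), and body has a single child (h1, with color navy). Both head and the hidden aside are absent.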
Preload Scanner
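While the main parser builds the DOM (or waits on a blocking script), the browser runs the Preload Scanner, a lightweight second pass over the HTML that looks for external resources (images, scripts, styles). These are requested early, so they are often already downloading or cached by the time the parser reaches them, which reduces overall render time.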
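A rough approximation of what such a scanner looks for can be written with a regular expression. This is a toy under clear assumptions: `scanForResources` is a hypothetical function, it only understands quoted src and href attributes on img, script, and link tags, and a real preload scanner uses a proper HTML tokenizer rather than a regex.

```javascript
// Scan raw HTML text for resource URLs without building a DOM.
function scanForResources(html) {
  const pattern = /<(img|script|link)\b[^>]*?\b(?:src|href)\s*=\s*["']([^"']+)["']/gi;
  const found = [];
  for (const match of html.matchAll(pattern)) {
    found.push({ tag: match[1].toLowerCase(), url: match[2] });
  }
  return found;
}

// Hypothetical page source.
const pageSource = `
  <link rel="stylesheet" href="/styles.css">
  <script src="/scripts/head.js" defer></script>
  <a href="/about">About</a>
  <img src="/img/photo.jpg" alt="Photo">
`;

const resources = scanForResources(pageSource);
console.log(resources.map((r) => `${r.tag}: ${r.url}`).join('\n'));
// link: /styles.css
// script: /scripts/head.js
// img: /img/photo.jpg
```

The anchor tag is ignored because a link the user might click is not a resource the page needs in order to render.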
JavaScript and AST
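JavaScript is shipped as source code rather than compiled ahead of time, so the engine has to process it at runtime. Modern engines do this in stages:
Parsing: Code is parsed into an Abstract Syntax Tree (AST).
Bytecode & JIT: The AST is converted into bytecode, which an interpreter in the JS Virtual Machine executes. Code that runs often is then compiled to optimized machine code by the Just-in-Time (JIT) compiler.
Execution: When a script runs depends on how it is loaded. A regular (blocking) script runs as soon as the parser reaches it, after any pending stylesheets have loaded, and HTML parsing pauses until it finishes. An async script runs as soon as it has downloaded. A deferred script runs only after the HTML has been fully parsed, which keeps it out of the way of the initial render.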
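The parse, bytecode, and execute stages can be sketched for a much smaller language than JavaScript: arithmetic expressions. Everything here (`tokenize`, `parse`, `compile`, `execute`, and the instruction names) is invented for illustration and says nothing about how any real engine is implemented. There is also no JIT step, since the sketch stops at interpreting bytecode.

```javascript
// 1. Tokenize: split the source into numbers, operators, and parentheses.
function tokenize(source) {
  return source.match(/\d+(?:\.\d+)?|[-+*/()]/g) || [];
}

// 2. Parse: build an AST in which * and / bind tighter than + and -.
function parse(tokens) {
  let position = 0;

  function parsePrimary() {
    const token = tokens[position++];
    if (token === '(') {
      const inner = parseAdditive();
      position++; // skip the closing ')'
      return inner;
    }
    return { type: 'Number', value: Number(token) };
  }

  function parseBinary(parseOperand, operators) {
    let left = parseOperand();
    while (operators.includes(tokens[position])) {
      const operator = tokens[position++];
      left = { type: 'Binary', operator, left, right: parseOperand() };
    }
    return left;
  }

  const parseMultiplicative = () => parseBinary(parsePrimary, ['*', '/']);
  const parseAdditive = () => parseBinary(parseMultiplicative, ['+', '-']);
  return parseAdditive();
}

// 3. Compile: walk the AST and emit bytecode for a stack machine.
const OPCODES = { '+': 'ADD', '-': 'SUB', '*': 'MUL', '/': 'DIV' };
function compile(node, bytecode = []) {
  if (node.type === 'Number') {
    bytecode.push({ op: 'PUSH', value: node.value });
  } else {
    compile(node.left, bytecode);
    compile(node.right, bytecode);
    bytecode.push({ op: OPCODES[node.operator] });
  }
  return bytecode;
}

// 4. Execute: a small interpreter loop over the bytecode.
function execute(bytecode) {
  const stack = [];
  for (const instruction of bytecode) {
    if (instruction.op === 'PUSH') {
      stack.push(instruction.value);
      continue;
    }
    const right = stack.pop();
    const left = stack.pop();
    if (instruction.op === 'ADD') stack.push(left + right);
    else if (instruction.op === 'SUB') stack.push(left - right);
    else if (instruction.op === 'MUL') stack.push(left * right);
    else stack.push(left / right); // DIV
  }
  return stack.pop();
}

const ast = parse(tokenize('1 + 2 * 3'));
const bytecode = compile(ast);
const listing = bytecode.map((i) => (i.op === 'PUSH' ? `PUSH ${i.value}` : i.op)).join(', ');
console.log(listing);           // PUSH 1, PUSH 2, PUSH 3, MUL, ADD
console.log(execute(bytecode)); // 7
```

The bytecode is a flat list of simple instructions, which is much cheaper to run repeatedly than walking the tree each time. A JIT compiler goes one step further and turns frequently executed bytecode into machine code.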
DOM in Detail
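Once parsed, an HTML document lives in memory as a tree of nodes, each with its children, which the browser exposes to JavaScript. This allows frameworks, libraries, and developers to inspect and manipulate the page in real time. To make the shape concrete, the tree can be approximated with plain JavaScript objects. This is a simplified model, not the real DOM API.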
Example:
// Simplified plain-object model of a parsed HTML page (not the real DOM API).
const domTree = {
  nodeType: 'document',
  children: [
    { nodeType: 'doctype', name: 'html' },
    {
      tagName: 'html',
      attributes: { lang: 'en' },
      children: [
        {
          tagName: 'head',
          children: [
            { tagName: 'meta', attributes: { charset: 'utf-8' }, children: [] },
            { tagName: 'meta', attributes: { name: 'viewport', content: 'width=device-width, initial-scale=1' }, children: [] },
            { tagName: 'title', children: [{ type: 'text', content: 'Page Title' }] },
            { tagName: 'link', attributes: { rel: 'stylesheet', href: '/styles.css' }, children: [] },
            { tagName: 'script', attributes: { src: '/scripts/head.js', defer: true }, children: [] }
          ]
        },
        {
          tagName: 'body',
          children: [
            {
              tagName: 'header',
              children: [
                { tagName: 'h1', children: [{ type: 'text', content: 'Main Header' }] },
                {
                  tagName: 'nav',
                  children: [
                    {
                      tagName: 'ul',
                      children: [
                        { tagName: 'li', children: [{ tagName: 'a', attributes: { href: '/' }, children: [{ type: 'text', content: 'Home' }] }] },
                        { tagName: 'li', children: [{ tagName: 'a', attributes: { href: '/about' }, children: [{ type: 'text', content: 'About' }] }] }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              tagName: 'main',
              children: [
                {
                  tagName: 'article',
                  attributes: { id: 'post-1' },
                  children: [
                    { tagName: 'h2', children: [{ type: 'text', content: 'Article Title' }] },
                    { tagName: 'p', children: [{ type: 'text', content: 'First paragraph of the article.' }] },
                    { tagName: 'img', attributes: { src: '/img/photo.jpg', alt: 'Photo' }, children: [] }
                  ]
                },
                {
                  tagName: 'section',
                  attributes: { id: 'features' },
                  children: [
                    { tagName: 'h3', children: [{ type: 'text', content: 'Features Section' }] },
                    {
                      tagName: 'ul',
                      children: [
                        { tagName: 'li', children: [{ type: 'text', content: 'Feature A' }] },
                        { tagName: 'li', children: [{ type: 'text', content: 'Feature B' }] }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              tagName: 'aside',
              children: [
                { tagName: 'h4', children: [{ type: 'text', content: 'Sidebar' }] },
                { tagName: 'p', children: [{ type: 'text', content: 'Supporting content / widgets.' }] }
              ]
            },
            {
              tagName: 'footer',
              children: [
                { tagName: 'p', children: [{ type: 'text', content: '© 2025 My Company' }] },
                { tagName: 'ul', children: [{ tagName: 'li', children: [{ tagName: 'a', attributes: { href: '/privacy' }, children: [{ type: 'text', content: 'Privacy Policy' }] }] }] }
              ]
            },
            { tagName: 'script', attributes: { src: '/scripts/bundle.js' }, children: [] }
          ]
        }
      ]
    }
  ]
};
The DOM is also not static. The browser builds it incrementally as HTML arrives, and scripts can change it at any time. This makes the DOM a live, dynamic structure, fully manipulable via JavaScript APIs.
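To show what reading and changing such a tree looks like, here is a small sketch that works on the same plain-object shape used in this article. The helpers `findAllByTag` and `textOf` are hypothetical names, and the small `featureList` tree exists only to keep the example self-contained. In a real page, the closest equivalents would be `document.querySelectorAll`, the `textContent` property, and `appendChild`.

```javascript
// Depth-first search for every node with the given tag name.
function findAllByTag(node, tagName, results = []) {
  if (node.tagName === tagName) {
    results.push(node);
  }
  for (const child of node.children || []) {
    findAllByTag(child, tagName, results);
  }
  return results;
}

// Concatenate all text found under a node.
function textOf(node) {
  if (node.type === 'text') {
    return node.content;
  }
  return (node.children || []).map(textOf).join('');
}

// A small tree in the same shape as the larger example.
const featureList = {
  tagName: 'ul',
  children: [
    { tagName: 'li', children: [{ type: 'text', content: 'Feature A' }] },
    { tagName: 'li', children: [{ type: 'text', content: 'Feature B' }] },
  ],
};

const before = findAllByTag(featureList, 'li').map(textOf);

// The tree is live: a later query sees nodes added after the first one.
featureList.children.push({ tagName: 'li', children: [{ type: 'text', content: 'Feature C' }] });
const after = findAllByTag(featureList, 'li').map(textOf);

console.log(before); // [ 'Feature A', 'Feature B' ]
console.log(after);  // [ 'Feature A', 'Feature B', 'Feature C' ]
```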
Conclusion
To render a single webpage, the browser goes through a huge process that involves:
- Resolving domains
- Performing multiple handshakes
- Handling security protocols
- Building DOM and CSSOM trees
- Generating the Render Tree, layout, and painting
- Executing optimized JavaScript
And all of this often happens within a fraction of a second. This article was inspired by Augusto Galego’s amazing video How Browsers Work?, where he explains these topics in a clear and simple way. Of course, many other layers are involved in browser functionality, and building one from scratch would be an incredibly complex task.
If you’ve read this far, thank you very much! Any feedback will be greatly appreciated.