How Email Works: SMTP, IMAP, and the Mail System
Every day, over 300 billion emails traverse the internet. Behind the simplicity of clicking "send" lies a stack of protocols, DNS lookups, authentication checks, and relay hops that have evolved over four decades. Understanding how email works — from the SMTP handshake to DKIM signature verification — reveals one of the most complex and heavily abused systems on the internet, and explains why so much engineering effort goes into making sure a legitimate message actually reaches the inbox.
The SMTP Protocol: How Email Gets Sent
The Simple Mail Transfer Protocol (SMTP), defined in RFC 5321, is the protocol used to send email between servers. SMTP operates over TCP, traditionally on port 25 (server-to-server relay), port 587 (authenticated submission from mail clients), or port 465 (implicit TLS submission). Despite its name, the protocol is anything but simple in modern practice.
An SMTP conversation is a text-based dialogue between a sending server (the client) and a receiving server. Here is what a typical session looks like:
S: 220 mx.example.com ESMTP ready
C: EHLO mail.sender.com
S: 250-mx.example.com Hello mail.sender.com
S: 250-SIZE 52428800
S: 250-STARTTLS
S: 250-AUTH PLAIN LOGIN
S: 250 8BITMIME
C: STARTTLS
S: 220 Ready to start TLS
[TLS handshake occurs]
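C: EHLO mail.sender.com
S: 250-mx.example.com Hello mail.sender.com
S: 250-SIZE 52428800
S: 250-AUTH PLAIN LOGIN
S: 250 8BITMIME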
C: MAIL FROM:<[email protected]>
S: 250 OK
C: RCPT TO:<[email protected]>
S: 250 OK
C: DATA
S: 354 Start mail input
C: From: Alice <[email protected]>
C: To: Bob <[email protected]>
C: Subject: Meeting tomorrow
C: Date: Fri, 24 Apr 2026 10:30:00 -0700
C: MIME-Version: 1.0
C: Content-Type: text/plain; charset=UTF-8
C:
C: Hi Bob, are we still on for tomorrow?
C: .
S: 250 OK: queued as ABC123
C: QUIT
S: 221 Bye
Each command serves a specific purpose. EHLO (Extended HELO) identifies the sending server and requests the list of supported extensions. MAIL FROM specifies the envelope sender, the address that bounces go to, which may differ from the From: header the recipient sees. RCPT TO specifies the envelope recipient. DATA begins the message content (header fields, a blank line, then the body), which is terminated by a lone dot on a line by itself.
The distinction between envelope addresses (used by SMTP for routing) and header addresses (displayed to the user) is critical to understanding email spoofing. SMTP itself places no restriction on what appears in the From: header — this is why authentication protocols like SPF, DKIM, and DMARC were invented.
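The envelope/header split can be sketched in Python. The snippet below builds only the client side of one SMTP transaction; the function name `client_commands` and all addresses are illustrative assumptions, no network connection is made, and server replies are ignored.

```python
from email.message import EmailMessage


def client_commands(envelope_from, envelope_rcpts, message):
    """Return the lines an SMTP client would send for one transaction.

    Illustrative only: no socket is opened and server replies are ignored.
    """
    lines = [f"MAIL FROM:<{envelope_from}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in envelope_rcpts]
    lines.append("DATA")
    for line in message.as_string().splitlines():
        # Dot-stuffing (RFC 5321 section 4.5.2): a content line that starts
        # with "." gets an extra "." so it is not read as the terminator.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")  # a lone dot ends the message content
    return lines


msg = EmailMessage()
msg["From"] = "Alice <alice@brand.example>"  # what the recipient sees
msg["To"] = "Bob <bob@example.com>"
msg["Subject"] = "Meeting tomorrow"
msg.set_content("Hi Bob,\n.hidden line\nSee you then.")

# The envelope sender is a different address; SMTP never compares the two.
commands = client_commands("bounces@mailer.example", ["bob@example.com"], msg)
```

Nothing in the transaction ties `bounces@mailer.example` to the `From:` header, which is exactly the gap the authentication protocols address.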
MX Records and DNS Lookup
Before an email can be delivered, the sending server must discover where to deliver it. This is where DNS comes in. The sending server extracts the domain part of the recipient address (the part after the @) and queries that domain's MX (Mail Exchanger) records to find the mail servers responsible for it.
For example, querying the MX records for gmail.com returns something like:
gmail.com. MX 5 gmail-smtp-in.l.google.com.
gmail.com. MX 10 alt1.gmail-smtp-in.l.google.com.
gmail.com. MX 20 alt2.gmail-smtp-in.l.google.com.
gmail.com. MX 30 alt3.gmail-smtp-in.l.google.com.
gmail.com. MX 40 alt4.gmail-smtp-in.l.google.com.
The number before each hostname is the preference value, often called the priority (lower is more preferred). The sending server tries the MX with the lowest number first; if it is unreachable, it falls back to the next one. This provides redundancy. If no MX record exists, the sender falls back to the domain's A or AAAA record as a last resort, per RFC 5321.
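As a rough sketch of that selection logic, assuming the MX lookup has already been done and returned (preference, hostname) pairs; the helper name `delivery_targets` is hypothetical:

```python
def delivery_targets(domain, mx_records):
    """Order candidate mail hosts for `domain`.

    `mx_records` is a list of (preference, hostname) pairs as a resolver
    might return them; an empty list models "no MX record exists".
    """
    if not mx_records:
        # Implicit MX: treat the domain itself as the mail host, so the
        # sender connects to its A/AAAA addresses.
        return [domain]
    # Lower preference value is tried first. sorted() is stable, so hosts
    # with equal preference keep their input order in this sketch.
    ordered = sorted(mx_records, key=lambda record: record[0])
    return [host.rstrip(".") for _, host in ordered]


records = [
    (20, "alt2.gmail-smtp-in.l.google.com."),
    (5, "gmail-smtp-in.l.google.com."),
    (10, "alt1.gmail-smtp-in.l.google.com."),
]
targets = delivery_targets("gmail.com", records)
```

A production MTA would also handle details this sketch ignores, such as spreading load across hosts that share a preference value.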
Each of those MX hostnames resolves to IP addresses via A/AAAA records, and those IP addresses are reachable via BGP routes. You can look up any mail server's IP to see which autonomous system hosts it — for instance, Google (AS15169) operates Gmail's mail exchangers. The security of this DNS lookup is where DNSSEC becomes important: without it, an attacker could poison DNS responses and redirect mail to a rogue server.
Mail Routing and Relaying
Email rarely travels directly from sender to recipient. It passes through a chain of Mail Transfer Agents (MTAs). The sending user's mail client (MUA — Mail User Agent) submits the message to their organization's outbound MTA via authenticated SMTP on port 587. That MTA performs DNS lookups, applies outbound policies (rate limits, DKIM signing, content scanning), and relays the message to the recipient's inbound MTA over port 25.
In many environments, additional relay hops exist. A large organization might route outbound mail through a dedicated gateway (like Proofpoint or Mimecast) for compliance scanning before it reaches the internet. On the receiving side, the MX record might point to a cloud security gateway that inspects the message before forwarding it to the actual mail server. Each relay adds a Received: header to the message, creating an auditable chain that records every server the message touched.
The concept of an open relay — a mail server that forwards messages for anyone, regardless of authentication — was once common and now represents a serious misconfiguration. Open relays are aggressively blocklisted because spammers exploit them to send mail that appears to originate from a trusted network.
IMAP vs POP3: Retrieving Email
SMTP handles sending and relaying. For retrieving email from a mailbox, two protocols exist:
IMAP (Internet Message Access Protocol), defined in RFC 9051, is the modern standard. IMAP keeps messages on the server and synchronizes state across multiple devices. When you read an email on your phone, IMAP marks it as read on the server so your laptop shows the same state. IMAP supports folders, server-side search, partial message fetching (downloading headers without the body), and flags. It operates on port 993 (implicit TLS) or 143 (STARTTLS).
POP3 (Post Office Protocol version 3), defined in RFC 1939, is the older and simpler protocol. POP3 clients download messages and, by default, delete them from the server. This made sense in the era of dial-up connections and single-device access. POP3 operates on port 995 (implicit TLS) or 110 (upgraded to TLS with the STLS command, POP3's equivalent of STARTTLS). While still supported by most servers, POP3 is increasingly rare in practice.
Modern email services like Gmail, Outlook, and Apple Mail primarily use IMAP (or proprietary sync protocols like Exchange ActiveSync and Microsoft Graph) to provide the seamless multi-device experience users expect. JMAP (JSON Meta Application Protocol), defined in RFC 8620 with its mail-specific data model in RFC 8621, is a newer alternative designed to replace IMAP with a more efficient, JSON-based API.
Message Format: RFC 5322 and MIME
The format of an email message itself is defined by RFC 5322 (Internet Message Format). A message consists of header fields followed by a blank line and then the body. The only required header fields are From: and Date:; the destination fields (To:, Cc:, Bcc:) are optional in RFC 5322, although nearly every real message carries at least one. Other common headers include Subject:, Message-ID:, Reply-To:, and In-Reply-To: (for threading).
The original RFC 822 (superseded by 5322) only supported 7-bit ASCII text. MIME (Multipurpose Internet Mail Extensions), defined across RFCs 2045-2049, extended email to support:
- Non-ASCII character sets — UTF-8 and other encodings, enabling email in every language
- Attachments — Binary files encoded as Base64 or quoted-printable inside the message body
- Multipart messages — A single email containing both plain text and HTML versions (multipart/alternative), or a message body plus attachments (multipart/mixed)
- Inline images — Referenced via Content-ID headers within multipart/related structures
A modern HTML email with an attachment might have a MIME structure like: multipart/mixed containing a multipart/alternative (with text/plain and text/html parts) plus an application/pdf attachment. This nesting is why email parsers are notoriously complex.
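The nesting described above can be reproduced with Python's standard `email` package. This is a minimal sketch; the addresses, subject, and placeholder PDF bytes are made up for illustration.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@sender.example"
msg["To"] = "bob@example.com"
msg["Subject"] = "Quarterly report"

# Start with a plain-text body, then add an HTML alternative; the message
# becomes multipart/alternative with two children.
msg.set_content("Plain-text version of the report summary.")
msg.add_alternative("<p>HTML version of the report summary.</p>", subtype="html")

# Adding an attachment wraps everything in multipart/mixed.
# The bytes below are a placeholder, not a valid PDF.
msg.add_attachment(b"%PDF-1.4 placeholder", maintype="application",
                   subtype="pdf", filename="report.pdf")

# walk() visits the tree depth-first, parents before children.
structure = [part.get_content_type() for part in msg.walk()]
for content_type in structure:
    print(content_type)
```

Printing the walk makes the tree visible: the outer container comes first, then the alternative container and its two text parts, then the attachment.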
Email Authentication: SPF, DKIM, and DMARC
SMTP's original design had no concept of sender authentication. Anyone could (and still can, at the protocol level) send a message claiming to be from any address. Three complementary standards now form the email authentication stack:
SPF (Sender Policy Framework)
SPF, defined in RFC 7208, lets a domain publish a DNS TXT record listing the IP addresses and servers authorized to send mail for that domain. When a receiving server gets a message with an envelope sender in @example.com, it queries the SPF record for example.com and checks whether the sending server's IP is listed.
A typical SPF record looks like:
example.com. TXT "v=spf1 ip4:198.51.100.0/24 include:_spf.google.com -all"
This says: accept mail from the 198.51.100.0/24 range, accept mail from servers authorized by Google's SPF record (for Google Workspace), and reject everything else (-all). SPF has a limitation: it only validates the envelope sender (MAIL FROM), not the From: header the user sees. A phisher can pass SPF by using their own domain as the envelope sender while spoofing a different From: header.
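To make the evaluation order concrete, here is a simplified sketch that handles only the `ip4`, `ip6`, `include`, and `all` mechanisms. The function `check_spf` and the `lookup_include` callback are assumptions for illustration (a real checker performs DNS TXT queries and supports more mechanisms), and the included record in the usage lines is invented, not Google's real one.

```python
import ipaddress


def check_spf(record, client_ip, lookup_include=None):
    """Evaluate a subset of SPF against the connecting IP.

    `lookup_include` is a hypothetical callback mapping a domain to its SPF
    record text; a real implementation would issue a DNS TXT query there.
    Returns "pass", "fail", "softfail", "neutral", or "none".
    """
    qualifiers = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    for term in terms[1:]:
        if term[0] in qualifiers:
            result, mechanism = qualifiers[term[0]], term[1:]
        else:
            result, mechanism = "pass", term  # "+" is the default qualifier
        name, _, value = mechanism.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            if ip in ipaddress.ip_network(value, strict=False):
                return result
        elif name == "include" and lookup_include is not None:
            nested = lookup_include(value)
            # include matches only when the nested evaluation is a pass
            if nested and check_spf(nested, client_ip, lookup_include) == "pass":
                return result
        elif name == "all":
            return result
    return "neutral"  # no mechanism matched


record = "v=spf1 ip4:198.51.100.0/24 include:_spf.google.com -all"
# Invented stand-in for the included record, using a documentation range.
fake_dns = {"_spf.google.com": "v=spf1 ip4:203.0.113.0/24 -all"}
```

With these inputs, `check_spf(record, "198.51.100.25")` matches the first mechanism, while an address outside both ranges falls through to `-all`.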
DKIM (DomainKeys Identified Mail)
DKIM, defined in RFC 6376, adds a cryptographic signature to outgoing messages. The sending server signs specific headers and the message body using a private key, and publishes the corresponding public key in a DNS TXT record. The receiving server retrieves the public key and verifies the signature.
A DKIM signature header looks like:
DKIM-Signature: v=1; a=rsa-sha256; d=example.com; s=selector1;
h=from:to:subject:date:message-id;
bh=abcdef123456...=;
b=GHIJKL789012...=
The d= field identifies the signing domain, s= is the selector used to look up the public key (at selector1._domainkey.example.com), h= lists the signed headers, bh= is the body hash, and b= is the signature itself. DKIM proves that the signed headers and the body were not modified in transit and that the message was signed by a server with access to the domain's private key.
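A small sketch of the parts of verification that need no cryptography library: splitting the tag list, deriving the DNS name of the public key, and computing a body hash. The helper names are hypothetical, and the body hash assumes the "simple" body canonicalization with SHA-256; signature verification itself (the `b=` tag) is left out.

```python
import base64
import hashlib


def parse_dkim_signature(header_value):
    """Split a DKIM-Signature value into its tag=value pairs."""
    tags = {}
    for item in header_value.split(";"):
        name, sep, value = item.partition("=")
        if sep:
            # Folding whitespace inside values is not significant.
            tags[name.strip()] = "".join(value.split())
    return tags


def dkim_key_name(tags):
    """DNS name whose TXT record holds the signer's public key."""
    return f"{tags['s']}._domainkey.{tags['d']}"


def body_hash_simple(body):
    """bh= value for `body` (bytes with CRLF line endings).

    "simple" canonicalization: drop empty lines at the end of the body,
    then make sure it ends with exactly one CRLF.
    """
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


tags = parse_dkim_signature(
    "v=1; a=rsa-sha256; d=example.com; s=selector1;\n"
    " h=from:to:subject:date:message-id;\n"
    " bh=abcdef123456...=;\n"
    " b=GHIJKL789012...="
)
key_name = dkim_key_name(tags)
```

A verifier would compare its own `body_hash_simple(...)` result with the `bh=` tag before spending effort on the signature check.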
DMARC (Domain-based Message Authentication, Reporting, and Conformance)
DMARC, defined in RFC 7489, ties SPF and DKIM together and adds a policy layer. It solves the alignment problem: DMARC requires that the domain in the From: header (which the user sees) matches either the SPF-validated envelope domain or the DKIM signing domain. This closes the gap that allowed phishers to pass SPF or DKIM while spoofing the From: header.
A DMARC record is published at _dmarc.example.com:
_dmarc.example.com. TXT "v=DMARC1; p=reject; rua=mailto:[email protected]; pct=100"
The p= field specifies the policy: none (monitor only), quarantine (send to spam), or reject (block entirely). The rua= field specifies where aggregate reports should be sent — these reports tell domain owners who is sending mail on their behalf, both legitimately and fraudulently.
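The alignment rule and policy lookup can be sketched as follows. The function names and the report address are hypothetical, and relaxed alignment is approximated with a subdomain test; real implementations compare organizational domains using the Public Suffix List, and `pct` sampling is ignored here.

```python
def parse_dmarc(record):
    """Parse a DMARC TXT record into a dict of tag -> value."""
    tags = {}
    for item in record.split(";"):
        name, sep, value = item.partition("=")
        if sep:
            tags[name.strip().lower()] = value.strip()
    return tags


def aligned(header_from_domain, authenticated_domain, strict=False):
    """Strict: exact match. Relaxed: approximated as a subdomain match."""
    a, b = header_from_domain.lower(), authenticated_domain.lower()
    if a == b:
        return True
    return not strict and (a.endswith("." + b) or b.endswith("." + a))


def dmarc_disposition(record, header_from, spf=None, dkim=None):
    """`spf` and `dkim` are the domains that passed each check, or None."""
    policy = parse_dmarc(record)
    if any(d and aligned(header_from, d) for d in (spf, dkim)):
        return "pass"
    return policy.get("p", "none")  # apply the published policy


# Report address is a made-up placeholder.
record = "v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com; pct=100"
```

Note that an SPF pass for an unrelated domain does not help: only an aligned pass counts.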
ARC (Authenticated Received Chain)
ARC, defined in RFC 8617, solves a problem that mailing lists and forwarding services create for DMARC. When a mailing list receives a message and re-sends it to subscribers, SPF fails (the list server's IP is not authorized for the original sender's domain) and DKIM may break (the list might modify headers or add a footer). DMARC then sees a failure and might reject the message.
ARC preserves the authentication results from each hop. Each intermediary adds three ARC headers: ARC-Authentication-Results (what checks passed at that hop), ARC-Message-Signature (a DKIM-like signature of the message), and ARC-Seal (a signature chain linking all ARC sets). The final receiver can inspect the ARC chain to decide whether to trust the message despite DMARC failure, based on whether the intermediaries in the chain are trusted.
STARTTLS and Encryption in Transit
SMTP was designed as a plaintext protocol. STARTTLS (RFC 3207) is an extension that upgrades an existing plaintext SMTP connection to TLS encryption. After the initial EHLO exchange, the client sends the STARTTLS command, the server responds with 220, and both sides perform a TLS handshake. Subsequent SMTP commands travel over the encrypted channel.
STARTTLS has a critical vulnerability: it is opportunistic. A man-in-the-middle attacker can strip the STARTTLS capability from the server's EHLO response, forcing the connection to remain in plaintext. The sending server has no way to know that encryption should have been available. This is a STRIPTLS attack.
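The downgrade is easy to see from the client's point of view. In this sketch (function names are assumptions, and the reply lines mirror the sample session earlier in the article), the client only ever sees the extension list it is handed:

```python
def parse_ehlo(reply_lines):
    """Map extension keyword -> parameter string from a 250 EHLO reply."""
    extensions = {}
    for line in reply_lines[1:]:  # the first line is the greeting
        # Skip the "250-" or "250 " prefix, then split keyword and params.
        keyword, _, params = line[4:].partition(" ")
        extensions[keyword.upper()] = params
    return extensions


def choose_transport(extensions, require_tls=False):
    if "STARTTLS" in extensions:
        return "tls"
    # Opportunistic mode cannot tell "server lacks TLS" from "an attacker
    # removed the STARTTLS line", so it carries on in plaintext.
    return "refuse" if require_tls else "plaintext"


honest = [
    "250-mx.example.com Hello mail.sender.com",
    "250-SIZE 52428800",
    "250-STARTTLS",
    "250-AUTH PLAIN LOGIN",
    "250 8BITMIME",
]
# What the client receives if an on-path attacker drops one line.
stripped = [line for line in honest if "STARTTLS" not in line]
```

Only a client that already knows TLS is mandatory (`require_tls=True` here) can treat the missing keyword as an error rather than as a server without TLS support.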
MTA-STS (Mail Transfer Agent Strict Transport Security)
MTA-STS (RFC 8461) closes this gap. A domain publishes a policy (via a well-known HTTPS URL and a DNS TXT record) declaring that all mail to that domain must use TLS with a valid certificate. Sending servers that support MTA-STS cache this policy and refuse to deliver mail over plaintext connections, even if STARTTLS is stripped.
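A sender-side check against a cached policy might look like the sketch below. The policy text follows the key/value layout of RFC 8461, but the host names and the helper functions are illustrative assumptions, and fetching, caching, and `max_age` expiry are omitted.

```python
def parse_mta_sts(policy_text):
    """Parse an MTA-STS policy file into a dict; `mx` collects all patterns."""
    policy = {"mx": []}
    for line in policy_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "mx":
            policy["mx"].append(value.lower())
        else:
            policy[key] = value
    return policy


def mx_allowed(policy, mx_host):
    host = mx_host.lower().rstrip(".")
    for pattern in policy["mx"]:
        if pattern.startswith("*."):
            # A wildcard covers exactly one left-most label.
            _, _, parent = host.partition(".")
            if parent == pattern[2:]:
                return True
        elif host == pattern:
            return True
    return False


def may_deliver(policy, mx_host, tls_verified):
    if policy.get("mode") != "enforce":
        return True  # "testing" and "none" never block delivery
    return tls_verified and mx_allowed(policy, mx_host)


policy = parse_mta_sts(
    "version: STSv1\nmode: enforce\nmx: mail.example.com\n"
    "mx: *.mx.example.net\nmax_age: 604800\n"
)
```

Because the policy is cached from an earlier HTTPS fetch, a stripped STARTTLS line shows up as `tls_verified=False` and delivery is refused instead of silently downgraded.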
The related DANE (DNS-Based Authentication of Named Entities) protocol uses DNSSEC-signed TLSA records to publish the expected TLS certificate for a mail server, providing even stronger protection against certificate manipulation.
TLS Reporting (TLSRPT)
RFC 8460 defines TLS Reporting, which lets domain owners receive reports about TLS negotiation failures when other servers try to deliver mail to them. This helps identify misconfigured certificates, downgrade attacks, and connectivity problems.
BIMI: Brand Indicators for Message Identification
BIMI is a specification that lets organizations display their brand logo next to authenticated emails in supporting mail clients. BIMI builds on DMARC — a domain must have a DMARC policy of quarantine or reject to qualify. The domain publishes a BIMI DNS record pointing to its logo (an SVG file in a specific format) and, for stronger verification, a Verified Mark Certificate (VMC) issued by a certificate authority that validates trademark ownership.
Gmail, Apple Mail, and Yahoo Mail support BIMI. When you see a brand logo next to a sender's name instead of a generic avatar, that is BIMI in action — it provides visual confirmation that the email was authenticated and the sender is who they claim to be.
Modern Email Infrastructure
The mail server landscape spans open-source software, commercial appliances, and cloud platforms:
Postfix is the most widely deployed open-source MTA. Written by Wietse Venema as a secure alternative to Sendmail, Postfix handles mail routing, relay, and delivery with a modular architecture where each function runs as a separate process with minimal privileges. It powers mail systems from small organizations to major ISPs.
Microsoft Exchange is the dominant enterprise mail platform, offering integrated calendaring, contacts, and mailbox management. Exchange Server runs on-premises, while Exchange Online is the backbone of Microsoft 365's email service. Exchange uses MAPI (Messaging Application Programming Interface) and Exchange ActiveSync for client connectivity, alongside standard IMAP and SMTP.
Google Workspace (formerly G Suite) runs Gmail's infrastructure for organizations. Google's mail servers handle delivery for millions of domains — when you look up the MX records for a Google Workspace customer, they all point to *.google.com mail exchangers running within Google's network (AS15169).
Other notable MTAs include Exim (default on Debian systems, highly configurable), OpenSMTPD (from the OpenBSD project, focused on simplicity and security), and Haraka (a Node.js-based MTA designed for high-volume sending). On the delivery/mailbox side, Dovecot is the dominant open-source IMAP/POP3 server, often paired with Postfix.
Spam Filtering Techniques
Spam accounts for roughly 45% of all email traffic. Modern spam filtering uses multiple layers:
- IP reputation — Each sending IP has a reputation score based on past behavior. IP addresses on blocklists (like Spamhaus ZEN, Barracuda, or SORBS) are rejected outright. Reputation is tied to the IP prefix and the autonomous system it originates from — some entire ASNs develop poor reputations.
- DNS-based blocklists (DNSBLs) — The receiving server queries blocklist DNS zones with the sending IP. A positive response means the IP is listed. This check happens at connection time, before the message body is even transmitted, saving bandwidth.
- Content analysis — Systems like SpamAssassin and rspamd apply hundreds of rules to the message headers and body, scoring characteristics like suspicious URLs, known spam phrases, header anomalies, and HTML formatting tricks. Machine learning models trained on billions of messages augment rule-based systems.
- Bayesian filtering — Statistical classifiers trained on each user's spam and legitimate mail. These personalized filters learn what each user considers spam based on their training data.
- Greylisting — Temporarily rejecting messages from unknown senders with a 4xx response code. Legitimate MTAs retry delivery after a delay; many spam-sending botnets do not. This technique is effective but adds latency to first-time message delivery.
- Rate limiting — Throttling connections and messages from IPs that send high volumes. Sudden spikes from a previously quiet IP are a strong spam signal.
- Authentication checks — SPF, DKIM, and DMARC results feed directly into spam scoring. Messages that fail authentication are far more likely to be spam or phishing.
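Two of the techniques in this list are simple enough to sketch directly: building a DNSBL query name and making a greylisting decision. The class and function names, the 300-second delay, and the reply text are illustrative assumptions; the sketch handles IPv4 only and keeps state in memory.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Name to look up in a DNSBL zone for an IPv4 address.

    The octets are reversed and the zone appended; a listed address
    resolves, while an unlisted one returns NXDOMAIN.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


class Greylist:
    """Defer the first attempt from each (ip, sender, recipient) triplet."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay  # seconds; an arbitrary example value
        self.first_seen = {}

    def check(self, ip, mail_from, rcpt_to, now):
        key = (ip, mail_from.lower(), rcpt_to.lower())
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "250 OK"
        return "451 4.7.1 Greylisted, please try again later"


query = dnsbl_query_name("192.0.2.99", "zen.spamhaus.org")

greylist = Greylist()
first_try = greylist.check("192.0.2.99", "a@sender.example", "bob@example.com", now=0)
retry = greylist.check("192.0.2.99", "a@sender.example", "bob@example.com", now=600)
```

The DNSBL lookup needs only the connecting IP, which is why it can run before any message data is transferred; the greylist needs the envelope as well, so it runs at RCPT TO time.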
Email Deliverability
For legitimate senders, getting mail to the inbox (not the spam folder) is a constant challenge. Deliverability depends on:
- Authentication — Publishing SPF, DKIM, and DMARC records is now the baseline. Without all three, major providers like Gmail and Yahoo will treat your mail with suspicion. As of 2024, both require DMARC for bulk senders.
- IP and domain reputation — New IP addresses start with neutral reputation and must be "warmed up" by gradually increasing sending volume. Shared IPs (used by email service providers like SendGrid or Mailgun) pool reputation across all senders — one bad actor can affect everyone.
- List hygiene — Sending to invalid addresses generates bounces, which damage reputation. High bounce rates signal to receivers that the sender is not maintaining their list. Hitting spam traps — addresses that were never opted in, or abandoned addresses repurposed as traps — is particularly damaging.
- Engagement signals — Gmail and other providers track whether recipients open, click, reply to, or mark messages as spam. Low engagement leads to inbox placement problems. This is why senders monitor open rates and actively remove unengaged subscribers.
- PTR records — The sending IP should have a valid reverse DNS (PTR) record that resolves back to the sending hostname. Many receivers reject mail from IPs without PTR records.
- Unsubscribe mechanism — RFC 8058 defines List-Unsubscribe-Post, enabling one-click unsubscribe. Gmail and Yahoo now require bulk senders to include this header and honor unsubscribe requests within two days.
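For the one-click unsubscribe item, the two headers involved can be set like this. The endpoint URL and token are hypothetical; per RFC 8058 the message also needs a valid DKIM signature covering both headers, which this sketch does not add.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "news@sender.example"
msg["To"] = "bob@example.com"
msg["Subject"] = "April newsletter"
# Hypothetical HTTPS endpoint; the opaque token identifies the subscription.
msg["List-Unsubscribe"] = "<https://sender.example/unsub?token=abc123>"
# Fixed value defined by RFC 8058; the mail client POSTs it to the URL above.
msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"
msg.set_content("This month's news...")

headers = dict(msg.items())
```

The receiving mail service, not the user's browser, sends the POST request, so the endpoint must unsubscribe without requiring a login or a confirmation page.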
How Phishing Exploits Email Trust
Phishing attacks exploit the fundamental trust model of email. Common techniques include:
- Display name spoofing — Setting the From: display name to "IT Support" or "CEO Jane Smith" while using a completely different email address. Many mobile clients show only the display name, hiding the actual address.
- Lookalike domains — Registering domains that visually resemble the target (e.g., exarnple.com with an 'rn' instead of 'm', or internationalized domain names using Cyrillic characters that look identical to Latin ones). These domains can pass SPF, DKIM, and DMARC because the attacker controls DNS for their lookalike domain.
- Compromised accounts — Sending phishing emails from a legitimate, compromised account bypasses all authentication checks because the mail genuinely originates from the claimed domain.
- Reply chain hijacking — Intercepting an existing email thread and inserting a malicious reply. The conversation history makes the message appear legitimate.
- Business email compromise (BEC) — Targeted attacks where the attacker impersonates an executive or vendor, requesting wire transfers or sensitive data. BEC caused over $2.9 billion in reported losses in 2023 according to the FBI's IC3 report.
DMARC with p=reject prevents direct domain spoofing but cannot stop lookalike domains or compromised accounts. BIMI helps by training users to expect a brand logo on legitimate messages — its absence on a lookalike domain is a visual cue that something is wrong.
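One class of lookalike domain, the mixed-script homograph, can be flagged mechanically. The sketch below is a rough heuristic with hypothetical function names: it infers a script from each character's Unicode name, expects the Unicode (not Punycode) form of the domain, and does nothing about ASCII tricks such as 'rn' for 'm'.

```python
import unicodedata


def scripts_in_label(label):
    """Rough script detection from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # For example "CYRILLIC SMALL LETTER A" yields "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def mixed_script_labels(domain):
    """Return the labels of `domain` that mix more than one script."""
    return [label for label in domain.split(".")
            if len(scripts_in_label(label)) > 1]


genuine = "example.com"
spoofed = "ex\u0430mple.com"  # U+0430 is CYRILLIC SMALL LETTER A
```

A label written entirely in one non-Latin script is not flagged, since that is how legitimate internationalized domains look.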
Email Headers and Tracing the Delivery Path
Every email carries a set of headers that document its journey. While most users never see them, headers are essential for debugging delivery issues, investigating phishing, and understanding mail routing. Key headers to examine:
- Received: — Each MTA that handles the message prepends a Received: header. Reading these from bottom to top traces the message's path from origin to destination. Each includes the server name, IP address, protocol (ESMTP, ESMTPS for TLS), and timestamp. Look up these IP addresses in the looking glass to identify which networks handled the message.
- Authentication-Results: — Added by the receiving server, this header shows the SPF, DKIM, and DMARC check results. A typical entry: spf=pass smtp.mailfrom=example.com; dkim=pass header.d=example.com; dmarc=pass.
- X-Originating-IP: — Some webmail services include the IP address of the user who composed the message, useful for investigating the geographic origin of suspicious mail.
- Return-Path: — The envelope sender address, where bounces are delivered. Compare this to the From: header — a mismatch is not inherently suspicious (mailing lists, forwarding services, and transactional email systems legitimately use different return paths) but is worth noting.
- Message-ID: — A globally unique identifier for the message, typically in the format <unique-string@sending-domain>. The domain in the Message-ID can reveal the true sending infrastructure.
To trace an email's path through the internet, extract the IP addresses from Received: headers and look them up. Each IP maps to a BGP prefix originated by an autonomous system. This reveals whether the message actually traversed the networks you would expect. A message claiming to be from a US-based company but routed through unexpected networks is a red flag.
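Extracting the hop addresses is straightforward with the standard library. This is a minimal sketch: the sample message, host names, and documentation-range IPs are invented, the regular expression only handles bracketed IPv4 or bare IPv6 literals, and Received: lines added before the first server you trust can be forged by the sender.

```python
import re
from email import message_from_string

RAW = """\
Received: from mx.example.com (mx.example.com [203.0.113.7])
\tby mailstore.example.com with ESMTPS; Fri, 24 Apr 2026 10:30:05 -0700
Received: from mail.sender.com (mail.sender.com [198.51.100.20])
\tby mx.example.com with ESMTPS; Fri, 24 Apr 2026 10:30:02 -0700
From: Alice <alice@sender.com>
Subject: Meeting tomorrow

Hi Bob
"""

IP_IN_BRACKETS = re.compile(r"\[([0-9A-Fa-f:.]+)\]")


def relay_path(raw_message):
    """Return the connecting IP of each hop, oldest hop first."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received", [])
    path = []
    # Each MTA prepends its header, so the last one in the file is the
    # first hop; reverse to get chronological order.
    for hop in reversed(hops):
        match = IP_IN_BRACKETS.search(hop)
        path.append(match.group(1) if match else None)
    return path


path = relay_path(RAW)
```

Each address in `path` can then be looked up to find the prefix and autonomous system that announced it.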
The Relationship Between Email and BGP
Email depends on BGP at every stage. The MX records for a domain resolve to IP addresses that are reachable only because BGP carries their routes across the internet. A BGP hijack targeting a mail server's prefix could redirect email to an attacker, intercept messages, or cause delivery failures. Attacks of this kind have happened in practice: in 2018, attackers hijacked BGP routes for address space used by Amazon's Route 53 DNS service and answered DNS queries themselves. That incident targeted a cryptocurrency wallet site, but the same technique could just as easily return forged MX records and redirect mail.
SPF validation depends on DNS: the receiving server must reach the authoritative name servers for the sender's domain (and for any included domains) to fetch the SPF record. If BGP routes to those name servers are disrupted, the lookup fails and legitimate mail may be deferred or rejected. The security of the entire email authentication system ultimately rests on the security of DNS (which depends on DNSSEC) and BGP (which depends on RPKI).
You can explore the infrastructure behind major email providers by looking up their mail server IPs and autonomous systems:
- AS15169 — Google (Gmail, Google Workspace)
- AS8075 — Microsoft (Outlook, Exchange Online)
- AS16509 — Amazon (SES, WorkMail)
- AS13335 — Cloudflare (Email Routing)
- AS14618 — Amazon (additional infrastructure)
Key RFCs and Standards
Email is defined by a deep stack of RFCs accumulated over decades:
- RFC 5321 — SMTP (Simple Mail Transfer Protocol)
- RFC 5322 — Internet Message Format (headers and body structure)
- RFCs 2045-2049 — MIME (attachments, character sets, multipart messages)
- RFC 9051 — IMAP (Internet Message Access Protocol, version 4rev2)
- RFC 1939 — POP3 (Post Office Protocol version 3)
- RFC 7208 — SPF (Sender Policy Framework)
- RFC 6376 — DKIM (DomainKeys Identified Mail)
- RFC 7489 — DMARC (Domain-based Message Authentication, Reporting, and Conformance)
- RFC 8617 — ARC (Authenticated Received Chain)
- RFC 3207 — SMTP STARTTLS
- RFC 8461 — MTA-STS (Mail Transfer Agent Strict Transport Security)
- RFC 8460 — TLS Reporting for email
- RFC 8058 — One-Click List-Unsubscribe
- RFC 8620 — JMAP (JSON Meta Application Protocol)