How IPFS Works: Content-Addressed Storage
The InterPlanetary File System, or IPFS, is a peer-to-peer protocol for storing and sharing data in a distributed network. Unlike the traditional web, where content is addressed by location (the server hosting it), IPFS addresses content by what it is — a cryptographic hash of the data itself. This fundamental shift from location-based to content-based addressing changes how files are found, verified, and distributed across the internet.
IPFS was created by Juan Benet and Protocol Labs, with its first public release in 2015. It draws on ideas from BitTorrent, Git, and distributed hash tables to build a content-addressable, peer-to-peer hypermedia distribution protocol. Today it underpins NFT metadata storage, decentralized application hosting, and censorship-resistant publishing — and its relationship with DNS and blockchain-based naming systems makes it a key piece of the decentralized web.
Location Addressing vs Content Addressing
The traditional web uses location addressing. When you request https://example.com/photo.jpg, your browser connects to a specific server (resolved via DNS to an IP address, then routed via BGP) and asks that server for the file at that path. If the server is down, the file is gone — even if an identical copy exists somewhere else. The URL points to a place, not to the content.
IPFS uses content addressing. Instead of asking "give me the file at this location," you ask the network "does anyone have the file with this hash?" The hash uniquely identifies the content. If anyone on the IPFS network has a copy — whether it is the original uploader, a caching node, or another user who pinned it — they can serve it to you. The identity of the content is independent of where it lives.
This distinction is the architectural foundation everything in IPFS builds upon. Content addressing makes files self-verifying (you can always check the hash), naturally deduplicated (identical files have the same address), and location-independent (any node can serve any content it holds).
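These three properties can be sketched as a minimal in-memory content-addressed store, where a blob's address is simply its SHA-256 digest. This is an illustrative toy, not IPFS's actual block store:

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: the address of a blob is its SHA-256 hex digest."""
    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self.blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self.blocks[address]
        # Self-verifying: re-hash the data and compare against the requested address.
        assert hashlib.sha256(data).hexdigest() == address, "corrupted block"
        return data

store = ContentStore()
addr = store.put(b"hello ipfs")
# Identical content always maps to the same address (natural deduplication),
# while changing even one byte yields a completely different address.
assert addr == store.put(b"hello ipfs")
assert addr != hashlib.sha256(b"hello ipfS").hexdigest()
```

Because retrieval re-derives the hash, any node can serve the block and the requester can still verify it, which is the location independence described above.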
Content Identifiers (CIDs)
Every piece of data in IPFS is identified by a Content Identifier (CID) — a compact, self-describing hash. A CID encodes three things: the hash function used (typically SHA-256), the codec that describes how the data is structured (such as dag-pb for IPFS's Merkle DAG format or raw for uninterpreted bytes), and the hash digest itself.
CIDs come in two versions:
- CIDv0 — The original format, starting with Qm. These always use SHA-256 and the dag-pb codec. Example: QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG
- CIDv1 — The current format, using multibase encoding (typically base32 or base36). CIDv1 is self-describing: it includes version, codec, and multihash information. Example: bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi
The bafy prefix you see on CIDv1 hashes is not arbitrary. b means base32 encoding, and afy encodes the CID version (1) and codec (dag-pb). This self-describing property means any IPFS node receiving a CID immediately knows how to interpret and verify it.
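Because a CIDv1 is self-describing, a few lines of code can unpack it. The sketch below decodes the base32 body of the example CID above and reads its leading bytes; it treats the version, codec, and hash-function fields as single bytes, which holds for common values like dag-pb and SHA-256 even though the spec encodes them as varints:

```python
import base64

def decode_cidv1(cid: str):
    """Unpack a base32 CIDv1 into (version, codec, hash_code, digest)."""
    assert cid.startswith("b"), "this sketch only handles base32 multibase"
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # b32decode requires padding
    raw = base64.b32decode(body)
    # Leading bytes: version, codec, hash function, digest length (varints,
    # but single bytes for these common values), then the digest itself.
    version, codec, hash_code, hash_len = raw[0], raw[1], raw[2], raw[3]
    return version, codec, hash_code, raw[4:4 + hash_len]

v, codec, hash_code, digest = decode_cidv1(
    "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi")
# version 1, codec 0x70 (dag-pb), hash 0x12 (SHA-256), 32-byte digest
assert (v, codec, hash_code, len(digest)) == (1, 0x70, 0x12, 32)
```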
The critical property of content addressing is immutability. If you change even a single byte of a file, its CID changes completely. This means a CID is both an address and a guarantee — if you retrieve data matching a CID, you know with cryptographic certainty it has not been tampered with.
Merkle DAGs: How IPFS Structures Data
IPFS does not store files as monolithic blobs. Instead, it breaks them into chunks (typically 256 KB) and organizes those chunks into a Merkle DAG (Directed Acyclic Graph). This is the same fundamental data structure used by Git for version control and by blockchains for transaction verification.
In a Merkle DAG, each node contains data and links to other nodes. Each link is a CID — the content hash of the child node. The root node's CID depends on all its children's CIDs, which depend on their children, and so on. This creates a cascade: changing any piece of data anywhere in the tree changes every CID up to the root.
This structure provides several properties that are fundamental to how IPFS works:
- Deduplication — If two files share identical chunks, those chunks are stored only once. The Merkle DAG naturally deduplicates at the chunk level.
- Parallel downloading — Different chunks can be fetched from different peers simultaneously, similar to BitTorrent.
- Incremental verification — Each chunk can be independently verified against its CID as it arrives, without waiting for the complete file.
- Efficient updates — If you modify a file, only the changed chunks and the ancestor nodes need new CIDs. Unchanged subtrees keep their existing hashes and data.
The Merkle DAG is not limited to flat files. IPFS can represent entire directory trees as DAGs, where directory nodes contain links to file nodes. This is how IPFS hosts complete websites — the root CID of a directory gives you access to every file within it.
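A simplified version of this chunk-and-link scheme can be written in a few lines. The sketch hashes raw chunks and then hashes the concatenated child addresses to form a root; real IPFS uses the dag-pb encoding and a more elaborate layout, but the cascade behaves the same way:

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # IPFS's typical default chunk size

def build_dag(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into chunks and derive a root address from the chunk addresses."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    leaf_addrs = [hashlib.sha256(c).hexdigest() for c in chunks]
    # The root links to its children by address, so its own address
    # depends on every byte of every chunk below it.
    root = hashlib.sha256("".join(leaf_addrs).encode()).hexdigest()
    return root, leaf_addrs

file_a = bytes(600_000)               # ~600 KB of zeros -> 3 chunks
file_b = bytes(599_999) + b"\x01"     # same file with the last byte changed

root_a, leaves_a = build_dag(file_a)
root_b, leaves_b = build_dag(file_b)
assert leaves_a[:2] == leaves_b[:2]   # unchanged chunks keep their addresses (deduplication)
assert leaves_a[2] != leaves_b[2]     # only the modified chunk gets a new address
assert root_a != root_b               # and the change cascades up to the root
```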
DHT-Based Peer Discovery
When you request content by its CID, IPFS needs to find which peers actually have the data. It does this using a Distributed Hash Table (DHT) — specifically, a Kademlia-based DHT adapted for content routing.
The DHT is a distributed key-value store spread across all participating IPFS nodes. It maps CIDs to the set of peers that have advertised they hold the corresponding data. There is no central server that tracks who has what. Instead, responsibility for storing provider records is distributed among nodes whose peer IDs are "close" to the CID in the Kademlia XOR metric — a mathematical distance function over the hash space.
The process of finding content works in three steps:
- Hash the content request — The client has a CID and needs to find providers. It computes which region of the hash space is responsible for this CID.
- Walk the DHT — The client queries nodes it knows about, asking them for nodes closer to the target. Each step brings it closer to the nodes responsible for tracking providers of that CID. This converges in O(log n) steps, where n is the number of nodes in the network.
- Get provider records — Once it reaches the responsible nodes, it retrieves the list of peers who have advertised they hold the data. The client then connects directly to one or more of those peers to download the content.
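The "closeness" in these steps is the Kademlia XOR metric: the distance between two IDs is the integer value of their bitwise XOR. A sketch using small integer IDs in place of real 256-bit peer IDs:

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: the bitwise XOR of two IDs, read as an integer."""
    return a ^ b

def closest_peers(target: int, peers: list[int], k: int = 3) -> list[int]:
    """The k peers whose IDs are closest to the target under XOR distance.
    These are the nodes responsible for the target's provider records."""
    return sorted(peers, key=lambda p: xor_distance(target, p))[:k]

peers = [0b0001, 0b0100, 0b0111, 0b1100, 0b1110]
# For target 0b0101, the closest peers are those sharing its high-order bits.
assert closest_peers(0b0101, peers, k=2) == [0b0100, 0b0111]
```

Because XOR distance is determined by the highest differing bit, each DHT hop can halve the remaining search space, which is where the O(log n) convergence comes from.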
When a node adds content to IPFS, it announces itself as a provider to the DHT — effectively publishing a record that says "I have the data for this CID." These provider records expire and must be periodically refreshed, which is why content disappears from IPFS when no node is actively providing it (more on this under pinning, below).
The DHT is also used for peer routing — finding the network addresses of specific peers by their peer ID. This is how two IPFS nodes that want to communicate discover each other's IP addresses and establish direct connections, often traversing NATs using hole-punching techniques.
Bitswap: The Data Exchange Protocol
Once an IPFS node knows which peers have the content it wants, it uses the Bitswap protocol to actually exchange data blocks. Bitswap is a message-based protocol in which peers exchange want lists (the CIDs of blocks they need) and the blocks themselves.
Bitswap is more sophisticated than a simple request-response protocol. It implements a ledger-based system inspired by BitTorrent's tit-for-tat strategy. Each pair of connected peers maintains a ledger of how much data each has sent to and received from the other. Peers that contribute more to the network are rewarded with faster service — free-riders who only download without sharing gradually receive lower priority.
The Bitswap exchange works like this:
- Node A connects to Node B and sends its want list — a set of CIDs it needs.
- Node B checks which of those CIDs it has locally. For any it has, it sends the blocks to Node A.
- Node B also sends its own want list to Node A, and A reciprocates if it holds any requested blocks.
- As blocks arrive, A verifies each one against its expected CID. Any block that does not hash to the expected CID is rejected.
Bitswap enables swarm downloading — a node can request different blocks from different peers simultaneously, assembling the complete file from multiple sources. This is particularly effective for popular content, where many peers hold copies of the same blocks.
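A stripped-down version of this exchange is sketched below: want lists, block transfer, and per-block verification against the expected CID. The real protocol adds sessions, ledgers, and finer-grained message types, none of which are modeled here:

```python
import hashlib

def cid_of(block: bytes) -> str:
    """Stand-in for a real CID: the SHA-256 hex digest of the block."""
    return hashlib.sha256(block).hexdigest()

class Peer:
    def __init__(self, blocks: list[bytes]):
        self.store = {cid_of(b): b for b in blocks}

    def respond(self, want_list: set[str]) -> dict[str, bytes]:
        """Return the requested blocks this peer holds locally."""
        return {c: self.store[c] for c in want_list if c in self.store}

def fetch(want_list: set[str], peers: list[Peer]) -> dict[str, bytes]:
    """Collect wanted blocks from multiple peers (swarm download),
    verifying each block against its CID before accepting it."""
    received = {}
    for peer in peers:
        for cid, block in peer.respond(want_list - received.keys()).items():
            if cid_of(block) == cid:  # reject anything that doesn't hash to the CID
                received[cid] = block
    return received

b1, b2 = b"block one", b"block two"
peer_a, peer_b = Peer([b1]), Peer([b2])
wanted = {cid_of(b1), cid_of(b2)}
assert fetch(wanted, [peer_a, peer_b]) == {cid_of(b1): b1, cid_of(b2): b2}
```

Note that each peer serves only the blocks it holds, and the complete file is assembled from both sources — the swarm-download behavior described above.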
IPNS: Mutable Naming for Immutable Content
Because CIDs are derived from content, they change whenever the content changes. This creates a problem for use cases that need a stable address pointing to changing content — like a website that gets updated, a profile that evolves, or a dataset that is periodically refreshed. You cannot use a CID as a permanent link to something that changes.
IPNS (InterPlanetary Name System) solves this by providing a mutable pointer to an immutable CID. An IPNS name is tied to a cryptographic key pair. The owner of the private key can publish a signed record that says "this IPNS name currently points to CID X." When the content changes, the owner publishes a new record pointing to the new CID. Anyone looking up the IPNS name gets the latest CID.
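IPNS records are signed with the name's private key and carry a sequence number so resolvers can pick the newest valid record. The toy model below uses HMAC as a stand-in for signatures (HMAC is symmetric, so the verifier shares the key — real IPNS uses Ed25519 public-key signatures); all names and key material here are hypothetical:

```python
import hmac, hashlib

def publish(key: bytes, cid: str, seq: int) -> dict:
    """Create a signed record saying 'this name currently points to cid'."""
    payload = f"{seq}:{cid}".encode()
    return {"seq": seq, "cid": cid,
            "sig": hmac.new(key, payload, hashlib.sha256).hexdigest()}

def resolve(key: bytes, records: list[dict]) -> str:
    """Pick the valid record with the highest sequence number."""
    valid = [r for r in records
             if hmac.new(key, f"{r['seq']}:{r['cid']}".encode(),
                         hashlib.sha256).hexdigest() == r["sig"]]
    return max(valid, key=lambda r: r["seq"])["cid"]

key = b"name-owner-private-key"            # hypothetical key material
records = [publish(key, "cid-of-site-v1", 1),   # original site
           publish(key, "cid-of-site-v2", 2)]   # updated site
assert resolve(key, records) == "cid-of-site-v2"  # the name follows the latest CID
```

Only the key holder can produce records that verify, so the mutable pointer stays under the owner's control even though anyone can relay the records.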
IPNS names look like: /ipns/k51qzi5uqu5dlvj2bv6qng... (using the peer's public key) or /ipns/example.com (when using DNS-linked IPNS records via DNSLink). The DNSLink approach is particularly practical: you add a TXT record to your domain's DNS zone like _dnslink.example.com TXT "dnslink=/ipfs/QmYwAP...", and IPFS nodes can resolve your domain name to IPFS content through standard DNS infrastructure.
DNSLink bridges the traditional web and IPFS, letting you use familiar domain names while serving content from IPFS. However, it reintroduces a dependency on DNS — which is centralized, censorable, and relies on the same BGP-routed infrastructure that IPFS aims to supplement. Blockchain domain systems like ENS (Ethereum Name Service) and Unstoppable Domains offer an alternative: domain names registered on-chain that resolve to IPFS CIDs without depending on ICANN, registrars, or DNS servers.
Pinning: Keeping Content Available
IPFS nodes have finite storage. By default, a node caches content it has recently accessed but may garbage-collect it when storage pressure increases. If no node on the network actively holds a file, it becomes unreachable — the CID still exists, but nobody can serve the data.
Pinning tells an IPFS node to keep specific content permanently and never garbage-collect it. When you pin a CID, you are committing to store and serve that data indefinitely (or until you unpin it). Pinning is the difference between content being transiently available and reliably persistent.
There are three types of pins:
- Direct pin — Pins a single block, without pinning any blocks it links to.
- Recursive pin — Pins a block and everything it links to, recursively. This is the most common type — pinning a root CID of a directory recursively ensures the entire directory tree stays available.
- Indirect pin — A block that is not explicitly pinned but is retained because it is linked to by a recursively pinned block.
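The interaction between these pin types and garbage collection can be modeled over a block graph. In the sketch below, recursively pinning a root protects everything reachable from it (the indirect pins), and GC removes only unreferenced blocks:

```python
def reachable(root: str, links: dict[str, list[str]]) -> set[str]:
    """All blocks reachable from root by following DAG links."""
    seen, stack = set(), [root]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(links.get(cid, []))
    return seen

def collect_garbage(blocks: set[str], recursive_pins: set[str],
                    links: dict[str, list[str]]) -> set[str]:
    """Return the blocks that survive GC: every recursively pinned root
    plus everything it links to (the indirect pins)."""
    keep = set()
    for root in recursive_pins:
        keep |= reachable(root, links)
    return blocks & keep

# A directory DAG (dir links to two files) plus one unpinned cached block.
links = {"dir": ["file1", "file2"]}
blocks = {"dir", "file1", "file2", "cached"}
assert collect_garbage(blocks, {"dir"}, links) == {"dir", "file1", "file2"}
```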
Running your own IPFS node and pinning content gives you direct control, but it means your content is only available when your node is online. For most real-world applications, this is insufficient.
Pinning Services
Pinning services are commercial providers that run IPFS infrastructure and pin content on your behalf, ensuring high availability without requiring you to run your own nodes. Major pinning services include:
- Pinata — One of the largest, widely used for NFT metadata and web3 applications.
- web3.storage — Backed by Protocol Labs, provides free storage with Filecoin persistence.
- Infura — Part of ConsenSys, offers IPFS pinning alongside Ethereum infrastructure.
- Filebase — S3-compatible storage backed by decentralized networks including IPFS.
Pinning services interact with your IPFS node via the IPFS Pinning Service API, a standardized interface (defined in the IPFS specification) that lets you manage pins across providers programmatically. You can use multiple pinning services for redundancy — pinning the same CID on three services means the content survives any single provider going down.
IPFS Gateways: Bridging IPFS and HTTP
Not everyone runs an IPFS node. Gateways bridge the gap by providing HTTP access to IPFS content. A gateway is an IPFS node that also runs an HTTP server — when it receives an HTTP request for an IPFS CID, it fetches the content from the IPFS network and serves it over standard HTTP.
The most widely used public gateways include:
- https://ipfs.io/ipfs/{CID} — Operated by Protocol Labs, the original public gateway.
- https://dweb.link/ipfs/{CID} — Also by Protocol Labs, using subdomain isolation for security.
- https://cloudflare-ipfs.com/ipfs/{CID} — Operated by Cloudflare (AS13335), leveraging their global CDN infrastructure for fast delivery.
- https://w3s.link/ipfs/{CID} — Operated by web3.storage.
There are two gateway URL patterns, and the difference matters for web security:
- Path-based: https://ipfs.io/ipfs/{CID}/path/to/file — All content shares the same origin, which creates security issues (cookies, localStorage, and service workers are shared).
- Subdomain-based: https://{CID}.ipfs.dweb.link/path/to/file — Each CID gets its own origin, providing proper security isolation. This is the recommended pattern for web applications.
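Both URL patterns are simple to construct. A sketch (note that the subdomain form works because base32 CIDv1s are lowercase, matching the case-insensitivity of DNS hostnames):

```python
def path_gateway_url(gateway: str, cid: str, path: str = "") -> str:
    """Path-based: every CID is served from the gateway's single origin."""
    return f"https://{gateway}/ipfs/{cid}/{path}".rstrip("/")

def subdomain_gateway_url(gateway: str, cid: str, path: str = "") -> str:
    """Subdomain-based: each CID becomes its own origin."""
    return f"https://{cid}.ipfs.{gateway}/{path}".rstrip("/")

cid = "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
assert path_gateway_url("ipfs.io", cid) == f"https://ipfs.io/ipfs/{cid}"
assert subdomain_gateway_url("dweb.link", cid, "index.html") == \
    f"https://{cid}.ipfs.dweb.link/index.html"
```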
Gateways are convenient but introduce centralization. When you access IPFS content through a gateway, you depend on that gateway's availability, its routing through BGP and the broader internet infrastructure, and its willingness to serve the content. Gateways can be blocked, rate-limited, or shut down. For true decentralization, running a local IPFS node (or using a browser with built-in IPFS support, like Brave) eliminates the gateway dependency.
How IPFS Compares to HTTP
IPFS and HTTP solve different problems and make fundamentally different architectural tradeoffs. Understanding these differences clarifies where each protocol fits.
| Property | HTTP | IPFS |
| --- | --- | --- |
| Addressing | Location (server + path) | Content (cryptographic hash) |
| Data integrity | Trust the server (TLS verifies the server, not the content) | Built-in (content hash = address) |
| Redundancy | Must be engineered (CDNs, replicas) | Natural (any peer can serve any content) |
| Mutability | Native (URLs point to changing content) | Requires IPNS or DNSLink layer |
| Performance | Optimized over decades, HTTP/2, HTTP/3 | DHT lookup overhead, improving with Bitswap optimizations |
| Offline support | Requires explicit caching strategies | Nodes naturally cache accessed content |
| Infrastructure | Servers, DNS, CAs, CDNs | Peer-to-peer, DHT, optional gateways |
| Censorship | Servers can be seized, DNS blocked | Content available from any peer holding it |
HTTP excels at dynamic, interactive web applications — real-time APIs, authenticated sessions, server-rendered pages. IPFS excels at distributing static and semi-static content where integrity, permanence, and censorship resistance matter. In practice, the two protocols are complementary rather than competing.
Use Cases
NFT Metadata and Digital Assets
When you buy an NFT, the blockchain token typically does not contain the artwork itself — it contains a URI pointing to the metadata and image. If that URI is an HTTP URL to a centralized server, the artwork could disappear if the server shuts down, making your token point to nothing. IPFS solves this: NFT metadata and images stored on IPFS are addressed by their content hash, so the URI is a permanent, verifiable reference to exactly the digital asset you purchased. Marketplaces like OpenSea and platforms like Zora store NFT assets on IPFS with Filecoin persistence for this reason.
Decentralized Application Hosting
Decentralized applications (dApps) can host their frontend interfaces on IPFS, making them accessible even if the development team's servers are taken offline. Combined with blockchain domain names (like ENS .eth domains resolving to IPFS CIDs), this creates a fully decentralized hosting stack where no single entity controls access to the application. Projects like Uniswap, Aave, and many DeFi protocols maintain IPFS-hosted frontends as censorship-resistant alternatives to their primary web deployments.
Censorship-Resistant Publishing
IPFS makes content difficult to censor because there is no single server to take down or domain to seize. Once content is pinned by multiple nodes across different jurisdictions, removing it requires convincing every single node to unpin it. Wikipedia has been mirrored on IPFS to ensure access from countries where it is blocked. Independent journalists and activists use IPFS to publish content that might be censored on traditional hosting platforms.
Scientific Data and Archives
Research datasets and digital archives benefit from content addressing because the CID serves as both an address and a verifiable checksum. Researchers can cite IPFS CIDs in papers, knowing that anyone who retrieves the data can verify it is exactly the dataset the paper analyzed. The Library of Congress, Internet Archive, and academic institutions have explored IPFS for long-term digital preservation.
Package Distribution
Software package managers can use IPFS to distribute packages in a decentralized fashion, reducing load on central registries and improving availability. The Go module proxy has experimented with IPFS, and npm packages have been mirrored on the network. Content addressing guarantees that a package fetched via IPFS is byte-identical to the original — supply chain integrity by design.
Filecoin: Incentivized Storage for IPFS
IPFS by itself has no built-in incentive for nodes to store other people's data. If you stop running your node and nobody else has pinned your content, it disappears. Filecoin is a companion protocol (also created by Protocol Labs) that adds economic incentives to storage.
Filecoin is a blockchain where storage providers (miners) earn FIL tokens by proving they are storing clients' data reliably over time. The protocol uses two cryptographic proofs:
- Proof of Replication (PoRep) — Proves the provider has created a unique copy of the data, not just claimed to store it.
- Proof of Spacetime (PoSt) — Proves the provider is continuing to store the data over time, not just storing it once and discarding it.
When you store data through Filecoin, you make a deal with storage providers who commit to storing your data for a specified duration (currently a minimum of 180 days, typically 1-3 years). The data is content-addressed using the same CID system as IPFS, creating a seamless bridge: store data via Filecoin, retrieve it via IPFS.
Services like web3.storage abstract this two-layer architecture: when you upload data, it is immediately available via IPFS for fast retrieval and simultaneously stored on Filecoin for long-term persistence. You interact with a single API, and the service handles the complexity of coordinating between the two networks.
The IPFS Network Stack and the Internet
IPFS is a protocol that runs on top of the existing internet infrastructure. It relies on the same IP routing, BGP path selection, and DNS resolution that powers HTTP. IPFS peers discover each other through the DHT, but the actual data transfer happens over TCP and QUIC connections that are routed through the same autonomous systems and internet exchange points as all other internet traffic.
This means IPFS inherits both the strengths and weaknesses of internet routing. A BGP hijack that intercepts traffic to an IPFS node would affect content retrieval from that node. An ISP that blocks IPFS-related ports could prevent its users from participating in the network (though IPFS can use standard HTTPS ports to blend in). Network partitions caused by submarine cable cuts or routing failures fragment the IPFS DHT just as they fragment any other global network.
However, IPFS's content-addressing model provides a layer of resilience that HTTP lacks. Even if an IPFS node is unreachable due to a routing problem, any other node holding the same content can serve it. Content integrity is verified cryptographically, not by trusting the transport path. If you can reach any peer with the data, you can verify and use it — regardless of which AS path your packets took.
Current Status and Limitations
IPFS has matured significantly since its inception but still faces practical challenges. DHT lookups can be slow compared to DNS resolution — finding content on the IPFS network can take seconds rather than milliseconds. The network depends on nodes voluntarily providing content; without pinning services or Filecoin incentives, content availability is unreliable. IPFS also adds complexity for developers accustomed to the simplicity of HTTP APIs and centralized hosting.
Performance continues to improve. Protocol Labs has developed IPNI (InterPlanetary Network Indexer), a centralized-but-federated index that accelerates content routing by supplementing the DHT with faster lookup services. The Saturn CDN project applies CDN architecture to IPFS, distributing content through a network of edge nodes for low-latency retrieval. And the reference Go implementation, go-ipfs (since renamed Kubo), has become more modular, configurable, and resource-efficient.
IPFS is unlikely to replace HTTP for real-time web applications, interactive services, or any use case that requires server-side computation. But for content distribution, archival storage, and applications where data integrity and censorship resistance are priorities, IPFS provides capabilities that the location-addressed web fundamentally cannot.
Explore the Infrastructure
The networks that power IPFS infrastructure are visible in the BGP routing table. You can look up the autonomous systems that operate IPFS gateways, pinning services, and related infrastructure: