How VPC Networking Works: Subnets, Route Tables, and Gateways in the Cloud
A Virtual Private Cloud (VPC) is a logically isolated section of a cloud provider's network where you deploy resources with full control over IP addressing, routing, and access policies. It is the cloud equivalent of a physical data center's network -- you define subnets, configure route tables, attach gateways, and enforce security rules -- except the infrastructure is software-defined, API-driven, and spans multiple availability zones. Every major cloud provider (AWS, GCP, Azure) implements VPCs, and while the terminology differs slightly, the core architecture is remarkably consistent. Understanding VPC networking is essential for anyone running production workloads in the cloud, because misconfigured VPCs are behind a large fraction of cloud security incidents and outages.
The VPC as an Isolated Network Domain
When you create a VPC, you assign it a CIDR block -- a contiguous range of private IP addresses from the RFC 1918 space (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16) or, in some cases, publicly routable addresses. This CIDR block defines the total address space available for all resources within the VPC. AWS allows VPCs with a primary CIDR between /16 and /28, and you can add secondary CIDR blocks later. GCP takes a different approach: VPCs are global objects, and subnets within them can have independent CIDR ranges in different regions.
The isolation is real. At the network layer, a VPC is a separate broadcast domain. Traffic between two VPCs does not flow unless you explicitly configure connectivity (peering, transit gateways, or VPN). This is enforced at the hypervisor level -- the virtual switch on each physical host only forwards packets to destinations within the same VPC, or to an explicitly connected gateway. The underlying mechanism varies by provider, but it typically involves some form of encapsulation: AWS uses a proprietary encapsulation similar to VXLAN to tunnel VPC traffic over the physical network, tagging packets with a VPC identifier so the hypervisor can enforce isolation.
Every VPC comes with an implicit router. You never see it as a discrete resource, but it exists at every subnet boundary, handles inter-subnet routing within the VPC, and consults route tables to decide where to send traffic. This implicit router is what makes the "virtual data center" abstraction work -- it provides Layer 3 connectivity between subnets without you having to deploy and manage router instances.
Subnets: Public, Private, and Isolated
A subnet is a partition of the VPC's CIDR block, associated with a specific availability zone (AZ). You carve the VPC CIDR into smaller blocks and assign each to a subnet. For example, a 10.0.0.0/16 VPC might be divided into:
- 10.0.1.0/24 -- Public subnet in AZ-a (251 usable IPs)
- 10.0.2.0/24 -- Public subnet in AZ-b
- 10.0.10.0/24 -- Private subnet in AZ-a
- 10.0.20.0/24 -- Private subnet in AZ-b
- 10.0.100.0/24 -- Isolated subnet for databases in AZ-a
- 10.0.200.0/24 -- Isolated subnet for databases in AZ-b
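The carve-up above can be verified with Python's standard ipaddress module. The sketch below checks that each subnet fits inside the VPC CIDR and computes usable addresses, subtracting the five addresses AWS reserves per subnet (the subnet names are illustrative, not cloud resource IDs):

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")

subnets = {
    "public-a":   ipaddress.ip_network("10.0.1.0/24"),
    "public-b":   ipaddress.ip_network("10.0.2.0/24"),
    "private-a":  ipaddress.ip_network("10.0.10.0/24"),
    "private-b":  ipaddress.ip_network("10.0.20.0/24"),
    "isolated-a": ipaddress.ip_network("10.0.100.0/24"),
    "isolated-b": ipaddress.ip_network("10.0.200.0/24"),
}

for name, net in subnets.items():
    assert net.subnet_of(vpc)          # every subnet must fit inside the VPC CIDR
    usable = net.num_addresses - 5     # AWS reserves 5 addresses per subnet
    print(f"{name}: {net} -> {usable} usable addresses")
```

Running this prints 251 usable addresses for each /24, matching the reserved-address rule described below.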
The distinction between public, private, and isolated subnets is not an inherent property of the subnet -- it is determined entirely by routing. A public subnet has a route table entry that sends 0.0.0.0/0 traffic to an internet gateway (IGW). Instances in this subnet can have public IPs and communicate directly with the internet. A private subnet routes 0.0.0.0/0 to a NAT gateway, allowing outbound internet access (for software updates, API calls) without exposing instances to inbound connections. An isolated subnet has no default route to the internet at all -- traffic can only flow within the VPC or to explicitly peered networks.
Each subnet is bound to a single AZ. This is a critical design constraint: if AZ-a goes down, subnets in AZ-a become unavailable. For high availability, you deploy redundant resources across multiple AZs, with subnets in each. AWS reserves the first four addresses (network address, VPC router, DNS, and one reserved for future use) and the last (broadcast) address of each subnet, so a /24 subnet provides 251 usable addresses, not 254. Other providers reserve slightly different sets of addresses.
Route Tables and the Implicit Router
Every subnet is associated with exactly one route table (though one route table can be shared across multiple subnets). A route table is a set of destination CIDR/target pairs. The VPC router evaluates routes using longest-prefix matching -- the most specific matching prefix wins, regardless of the order in which routes were added -- exactly as a hardware router would.
Every route table automatically includes a local route for the VPC CIDR (e.g., 10.0.0.0/16 -> local). This route cannot be deleted and ensures that all subnets within the VPC can communicate with each other by default. Additional routes can point to:
- Internet Gateway (IGW) -- For public internet access (both inbound and outbound)
- NAT Gateway -- For outbound-only internet access from private subnets
- VPC Peering Connection -- For traffic destined to a peered VPC's CIDR
- Transit Gateway -- For centralized routing across many VPCs and on-premises networks
- Virtual Private Gateway (VGW) -- For traffic going through a VPN or Direct Connect to on-premises
- VPC Endpoint -- For private connectivity to cloud services (S3, DynamoDB, etc.) without traversing the internet
- Network Interface (ENI) -- For routing traffic through a specific instance (e.g., a software firewall or NAT instance)
A common gotcha: if you add a more specific route that overlaps with the local route, the more specific route wins due to longest-prefix matching. This can inadvertently break intra-VPC connectivity if you are not careful. For example, adding a route for 10.0.10.0/24 -> some-appliance will intercept traffic to that subnet even from within the VPC.
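Longest-prefix matching, and the gotcha just described, can be modeled in a few lines. The target names here are illustrative, not real resource IDs:

```python
import ipaddress

# Route table: destination CIDR -> target (illustrative names)
routes = {
    "10.0.0.0/16":  "local",           # implicit, undeletable local route
    "10.0.10.0/24": "eni-appliance",   # more-specific route to an appliance
    "0.0.0.0/0":    "igw",             # default route to the internet gateway
}

def lookup(dst_ip: str) -> str:
    """Return the target of the most specific matching route."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [ipaddress.ip_network(cidr) for cidr in routes
               if dst in ipaddress.ip_network(cidr)]
    best = max(matches, key=lambda net: net.prefixlen)   # longest prefix wins
    return routes[str(best)]

print(lookup("10.0.10.5"))   # eni-appliance, NOT local -- the gotcha from above
print(lookup("10.0.20.5"))   # local
print(lookup("8.8.8.8"))     # igw
```

Note that traffic to 10.0.10.5 is diverted to the appliance even though the local route also matches -- the /24 beats the /16.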
Internet Gateways and NAT Gateways
An Internet Gateway (IGW) is a horizontally scaled, redundant, fully managed component that performs two functions: it acts as a target in route tables for internet-bound traffic, and it performs one-to-one NAT between an instance's private IP and its associated public or Elastic IP. Unlike a NAT gateway, an IGW does not perform port-address translation -- each instance gets a dedicated public IP mapping. The IGW is stateless and does not become a bottleneck; AWS does not impose bandwidth limits on it.
A NAT Gateway performs port-address translation (PAT), allowing many private instances to share a single public IP address for outbound connections. It maintains a connection tracking table, maps each outbound connection to a unique source port on the NAT IP, and rewrites return packets. NAT gateways are AZ-scoped -- you need one per AZ for high availability. AWS NAT gateways support up to 55,000 simultaneous connections per destination IP and can burst to 100 Gbps. The cost model (per-hour plus per-GB data processing) makes NAT gateways one of the most expensive networking components in a typical AWS bill.
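The PAT mechanics amount to a connection-tracking table keyed by NAT source port. This is a deliberately simplified model of the behavior, not how the managed gateway is implemented:

```python
import itertools

class NatGateway:
    """Minimal PAT sketch: many private sources share one public IP."""
    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.port_alloc = itertools.count(1024)   # next free NAT source port
        self.conntrack = {}                       # nat_port -> (src_ip, src_port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        # Rewrite the packet: gateway public IP + a unique source port
        nat_port = next(self.port_alloc)
        self.conntrack[nat_port] = (src_ip, src_port)
        return (self.public_ip, nat_port, dst_ip, dst_port)

    def inbound(self, dst_port):
        # Return traffic is rewritten back to the original private source;
        # unsolicited inbound packets have no conntrack entry and are dropped.
        return self.conntrack.get(dst_port)

nat = NatGateway("54.0.0.1")
pkt = nat.outbound("10.0.10.5", 43512, "93.184.216.34", 443)
print(pkt)                 # ('54.0.0.1', 1024, '93.184.216.34', 443)
print(nat.inbound(1024))   # ('10.0.10.5', 43512)
print(nat.inbound(9999))   # None -- no inbound connections without prior state
```

The last line is why a NAT gateway provides outbound-only access: without an existing conntrack entry, there is nothing to rewrite an inbound packet to.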
For cost optimization, many organizations use VPC endpoints to bypass NAT gateways entirely for traffic to AWS services. A gateway endpoint for S3, for instance, adds a route table entry that sends S3-bound traffic directly through the AWS backbone, avoiding NAT gateway data processing charges.
Security Groups: Stateful Instance-Level Firewalls
A security group is a stateful firewall attached to an elastic network interface (ENI). Every instance, RDS database, Lambda function (in a VPC), and load balancer has one or more security groups. Security group rules define allowed inbound and outbound traffic by protocol, port range, and source/destination. Crucially, security groups are default-deny inbound, default-allow outbound -- if you create a security group with no rules, no inbound traffic is permitted, but all outbound traffic is allowed.
The stateful nature means that if you allow inbound TCP port 443, the return traffic for those connections is automatically permitted regardless of outbound rules. This is implemented via connection tracking at the hypervisor level. The implication: you do not need to create explicit outbound rules for return traffic, and you do not need to worry about ephemeral port ranges.
Security groups have a powerful feature: self-referencing rules. You can create a rule where the source is another security group (or even the same group). For example, a "backend" security group can allow inbound port 8080 from the "load-balancer" security group. This decouples security policy from IP addresses -- instances can be added or removed from either group without updating firewall rules. This is a major advantage over traditional IP-based ACLs, especially in auto-scaling environments where instance IPs are ephemeral.
Security groups do come with constraints worth knowing:
- No deny rules -- Security groups only support allow rules. You cannot explicitly block a specific IP. If you need explicit deny rules, use NACLs.
- Evaluated as a set -- If an instance has multiple security groups, all rules across all groups are aggregated. A packet is allowed if any rule in any group permits it.
- Limits -- By default, each security group can have 60 inbound and 60 outbound rules, and each ENI can have up to 5 security groups. These quotas can be increased, but AWS constrains the product of rules per group and groups per interface (it cannot exceed 1,000).
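The evaluate-as-a-set behavior is a logical OR across every rule in every attached group, on top of default-deny inbound. A minimal sketch with made-up group contents (source matching is simplified to a string prefix check standing in for a CIDR or group reference):

```python
# Each security group is a list of allow rules: (protocol, port, source).
lb_sg      = [("tcp", 443, "0.0.0.0/0")]
backend_sg = [("tcp", 8080, "10.0.")]   # simplified stand-in for a group reference

def allowed(groups, protocol, port, src_ip):
    # Default-deny inbound: a packet passes if ANY rule in ANY group permits it.
    return any(
        proto == protocol and p == port
        and (src == "0.0.0.0/0" or src_ip.startswith(src))
        for group in groups
        for proto, p, src in group
    )

print(allowed([lb_sg, backend_sg], "tcp", 8080, "10.0.1.7"))   # True
print(allowed([lb_sg, backend_sg], "tcp", 22,   "10.0.1.7"))   # False -- no match
```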
Network ACLs: Stateless Subnet-Level Firewalls
Network Access Control Lists (NACLs) are stateless firewalls applied at the subnet level. Unlike security groups, NACLs evaluate traffic in both directions independently -- if you allow inbound port 443, you must separately allow the outbound ephemeral port range for return traffic. NACLs are processed in rule number order (lowest first), and the first matching rule determines the action. Each rule can either ALLOW or DENY traffic.
The default NACL allows all inbound and outbound traffic. Custom NACLs default to deny-all. In practice, most organizations use security groups for primary access control and reserve NACLs for coarse-grained subnet-level restrictions -- for example, blocking an IP range known to be malicious, or restricting traffic between subnet tiers as a defense-in-depth measure.
The stateless nature of NACLs is the most common source of configuration errors. Consider HTTPS traffic: you need an inbound rule allowing TCP 443 and an outbound rule allowing TCP on ephemeral ports 1024-65535 (the range where the OS allocates source ports for reply traffic). Forgetting the outbound rule breaks all HTTPS connections. This is why security groups (stateful) are preferred for most use cases -- they handle return traffic automatically.
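The ordered, first-match evaluation and the ephemeral-port pitfall can be sketched as follows (rule numbers and CIDRs are illustrative):

```python
import ipaddress

# NACL rules: (rule number, protocol, port range, CIDR, action), lowest number first.
inbound = [
    (100,   "tcp", (443, 443),   "0.0.0.0/0", "ALLOW"),
    (32767, "all", (0, 65535),   "0.0.0.0/0", "DENY"),   # implicit final deny
]
outbound = [
    (100,   "tcp", (1024, 65535), "0.0.0.0/0", "ALLOW"), # ephemeral ports for replies
    (32767, "all", (0, 65535),    "0.0.0.0/0", "DENY"),
]

def evaluate(rules, protocol, port, ip):
    addr = ipaddress.ip_address(ip)
    for _, proto, (lo, hi), cidr, action in sorted(rules):  # lowest rule number first
        if proto in (protocol, "all") and lo <= port <= hi \
                and addr in ipaddress.ip_network(cidr):
            return action                                    # first match wins
    return "DENY"

print(evaluate(inbound,  "tcp", 443,   "203.0.113.9"))   # ALLOW -- request gets in
print(evaluate(outbound, "tcp", 50123, "203.0.113.9"))   # ALLOW -- reply goes out
# Drop rule 100 from `outbound` and the reply hits the final DENY: HTTPS breaks.
print(evaluate(outbound[1:], "tcp", 50123, "203.0.113.9"))   # DENY
```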
VPC Peering
VPC peering creates a direct network connection between two VPCs, allowing resources in either VPC to communicate using private IP addresses as if they were on the same network. Peering works across accounts and across regions (inter-region peering). The connection is established by sending a peering request from one VPC to another; the owner of the target VPC must accept the request before traffic can flow.
VPC peering has important limitations:
- Non-transitive -- If VPC-A is peered with VPC-B, and VPC-B is peered with VPC-C, VPC-A cannot reach VPC-C through VPC-B. Each pair requires a separate peering connection. This leads to an O(n^2) scaling problem: connecting n VPCs requires n*(n-1)/2 peering connections.
- No overlapping CIDRs -- Two VPCs with overlapping IP address ranges cannot be peered. This is a frequent problem in organizations that used the same default CIDR (e.g., 10.0.0.0/16) across many VPCs.
- No edge-to-edge routing -- You cannot use a peered VPC's internet gateway, NAT gateway, or VPN connection to route traffic. Each VPC must have its own gateways.
- Route table entries required -- Peering alone does not enable traffic flow. You must add route table entries in both VPCs pointing the remote VPC's CIDR to the peering connection.
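The O(n^2) scaling is easy to quantify. A quick comparison of full-mesh peering against per-VPC hub attachments:

```python
def full_mesh_peerings(n: int) -> int:
    # Every VPC pair needs its own peering connection: n choose 2
    return n * (n - 1) // 2

for n in (3, 10, 50):
    print(f"{n} VPCs: {full_mesh_peerings(n)} peering connections "
          f"vs {n} hub attachments")
```

At 50 VPCs the full mesh needs 1,225 peering connections (each with route table entries on both sides), which is the scaling problem the transit gateway section below addresses.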
Despite these limitations, peering is useful for simple point-to-point connectivity -- connecting a production VPC to a shared services VPC (logging, monitoring, CI/CD), or enabling cross-account access between teams.
Transit Gateway: Hub-and-Spoke at Scale
AWS Transit Gateway (TGW) solves the peering scalability problem. It acts as a regional network hub -- you attach VPCs, VPN connections, Direct Connect gateways, and even peered transit gateways to it, and it handles routing between all attached networks. Instead of O(n^2) peering connections, you need O(n) attachments.
Transit gateways support route tables with route propagation. Attached VPCs and VPNs can automatically propagate their routes into the TGW route table, or you can define static routes for more control. Multiple route tables enable network segmentation: you might have a "production" route table and a "development" route table, with different VPCs attached to each, effectively creating isolated routing domains within the same transit gateway.
Transit gateways support inter-region peering, enabling you to build a global network backbone across AWS regions. Traffic between peered transit gateways stays on the AWS backbone -- it does not traverse the public internet. Combined with Direct Connect, a transit gateway can serve as the central hub for hybrid cloud networking, connecting dozens of VPCs and on-premises data centers through a single architecture.
The cost model is per-attachment (hourly) plus per-GB data processing, which can be significant at scale. For simple two-VPC connectivity, peering is cheaper. Transit gateways become cost-effective when you have more than 3-4 VPCs that need mutual connectivity.
VPC Endpoints and PrivateLink
VPC endpoints allow resources in your VPC to connect to AWS services without going through the internet, NAT gateway, or VPN. There are two types:
- Gateway endpoints -- Free. Available only for S3 and DynamoDB. Implemented as a route table entry that redirects traffic to the service through the AWS backbone. No DNS changes needed; the service endpoints resolve to the same IPs but traffic is routed internally.
- Interface endpoints (powered by AWS PrivateLink) -- Available for over 100 AWS services and any third-party services that publish PrivateLink endpoints. An interface endpoint creates an ENI in your subnet with a private IP address. DNS resolution for the service endpoint is overridden (via private hosted zone) to resolve to this ENI's IP. Traffic flows through the ENI directly to the service, never touching the internet.
AWS PrivateLink is the underlying technology for interface endpoints, but it is also a service exposure mechanism. You can create your own PrivateLink service by placing a Network Load Balancer in front of your application. Other AWS accounts can then create interface endpoints to connect to your service. Traffic flows over the AWS backbone, the consumer never sees your VPC's IP addresses, and you control access via endpoint policies. This is how many SaaS products (Datadog, Snowflake, MongoDB Atlas) offer private connectivity to their services.
VPC Flow Logs
VPC Flow Logs capture metadata about IP traffic flowing through network interfaces in your VPC. Each flow log record includes the source IP, destination IP, source port, destination port, protocol, packet count, byte count, action (ACCEPT/REJECT), and the interface ID. Flow logs can be attached at three levels: VPC (captures all traffic), subnet, or individual ENI.
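A record in the default version-2 format is a space-separated line whose field order is documented by AWS, so it parses directly. The sample line below is fabricated for illustration:

```python
# Default v2 flow log fields, in documented order
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()

def parse_flow_record(line: str) -> dict:
    """Map a default-format flow log line to named fields."""
    return dict(zip(FIELDS, line.split()))

# Fabricated sample: a rejected SSH attempt against a private instance
sample = ("2 123456789012 eni-0abc123 203.0.113.12 10.0.10.5 "
          "54321 22 6 1 44 1700000000 1700000060 REJECT OK")
rec = parse_flow_record(sample)
print(rec["action"], rec["dstport"], rec["protocol"])   # REJECT 22 6 (6 = TCP)
```

Filtering records where `action == "REJECT"` on an expected traffic path is the quickest way to spot a security group or NACL blocking legitimate traffic.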
Flow logs are essential for security analysis, troubleshooting connectivity issues, and compliance. Common uses include:
- Identifying security group rules that are too permissive by analyzing which connections are accepted
- Debugging connectivity failures by looking for REJECT actions on expected traffic paths
- Detecting anomalous traffic patterns (port scans, data exfiltration, unusual outbound connections)
- Feeding into SIEM systems for real-time security monitoring
Flow logs do not capture the packet payload -- only metadata. They are sampled (not every packet is logged), and there is a delay of several minutes before records are available. For full packet capture, you need traffic mirroring, which copies actual packets to a monitoring appliance.
Multi-Cloud VPC Equivalents
While the term "VPC" originated with AWS, all major cloud providers implement the same concept with slightly different names and architectural choices:
- AWS VPC -- Regional. Subnets are AZ-scoped. Supports secondary CIDR blocks. Uses security groups + NACLs. IGW, NAT GW, TGW, PrivateLink. VPC peering is non-transitive.
- GCP VPC -- Global. Subnets are regional (span all zones in a region). Firewall rules are VPC-scoped (not subnet-scoped). Shared VPC allows multiple projects to share a VPC. Cloud NAT is regional. VPC Network Peering is non-transitive, like AWS; Network Connectivity Center provides hub-and-spoke connectivity where transitive routing is needed.
- Azure VNet -- Regional. Subnets can span AZs. Uses Network Security Groups (NSGs) similar to AWS security groups. VNet peering, Virtual WAN (hub-and-spoke), Private Link. Azure uniquely supports "delegated subnets" where a subnet is exclusively assigned to a specific Azure service.
The most significant architectural difference is GCP's global VPC model. In AWS and Azure, a VPC/VNet is confined to a single region, and cross-region connectivity requires peering or transit gateways. In GCP, a single VPC spans all regions, and subnets in different regions can communicate without any additional configuration. This simplifies multi-region deployments but requires more careful CIDR planning since all subnets share the same routing domain.
Common VPC Design Patterns
Production VPC architectures typically follow one of these patterns:
- Three-tier -- Public subnets (load balancers), private subnets (application servers), isolated subnets (databases). This is the standard starting point for web applications.
- Hub-and-spoke -- A central "shared services" VPC connected via transit gateway to spoke VPCs for each team, environment, or application. The hub hosts DNS, logging, monitoring, and security appliances. Spoke VPCs are isolated from each other.
- Multi-account with landing zone -- Each AWS account gets its own VPC. A transit gateway connects accounts through a networking account. AWS Organizations SCPs enforce tagging, CIDR allocation, and security baseline policies. This is the pattern recommended by the AWS Well-Architected Framework for enterprise deployments.
- Hybrid cloud -- VPCs connected to on-premises data centers via Direct Connect (dedicated fiber) or Site-to-Site VPN. The on-premises network advertises its routes via BGP over the Direct Connect connection, and those routes propagate into the VPC route tables through the virtual private gateway or transit gateway.
VPC Networking and the Global Routing Table
VPC networking operates at a layer above the public internet routing system, but the two are deeply interconnected. When an instance in a public subnet communicates with the internet, its traffic exits through the IGW, which translates its private IP to a public IP that belongs to the cloud provider's autonomous system. That public IP is part of a prefix announced via BGP by the provider's edge routers to the global routing table. AWS, for instance, announces its prefixes from AS16509 and AS14618, which you can observe in the global routing table. The cloud provider's backbone handles routing between its regions, points of presence, and internet exchange points, but from the perspective of the rest of the internet, your VPC's public IPs are just addresses within the provider's BGP announcements. A public BGP looking glass lets you look up your cloud instances' public IPs and trace the AS path from an external vantage point back to the provider's network.