How NetFlow and sFlow Work: Flow-Based Traffic Analysis
NetFlow and sFlow are flow-based traffic analysis protocols that provide per-conversation visibility into network traffic -- who is talking to whom, over which protocols, and how much data is being exchanged. Unlike SNMP, which provides aggregate interface counters (total bytes in, total bytes out), flow protocols break traffic down into individual flows defined by common attributes such as source/destination IP, source/destination port, protocol, and interface. This granularity is essential for bandwidth accounting, capacity planning, security forensics, DDoS detection, and peering traffic analysis.
The core concept is straightforward: a network device (router, switch, or probe) observes packets traversing an interface, groups them into flows based on shared attributes, and periodically exports summarized flow records to an external collector for storage and analysis. The device does not mirror or capture full packets -- it exports metadata about the traffic. A single flow record might represent a 45-minute HTTP session that transferred 2.3 GB of data, condensed into a few hundred bytes of metadata. This compression is what makes flow monitoring practical at scale: even a core router forwarding millions of packets per second can export flow data without overwhelming the monitoring infrastructure.
There are three major flow protocol families in widespread use: Cisco's NetFlow (v5 and v9), the IETF's IPFIX (essentially NetFlow v10), and InMon's sFlow. Each takes a fundamentally different approach to flow tracking, and understanding the differences matters for choosing the right solution and interpreting the resulting data correctly.
NetFlow v5: The Original Flow Protocol
Cisco introduced NetFlow in the mid-1990s as a switching optimization -- the flow cache was originally used to accelerate forwarding decisions. The monitoring use case emerged as Cisco realized the flow cache contained valuable traffic analysis data and added an export mechanism. NetFlow v5, released with Cisco IOS 12.0, became the de facto standard for flow monitoring and remains in production use today.
A NetFlow v5 flow is defined by a 7-tuple:
- Source IP address
- Destination IP address
- Source port (TCP/UDP)
- Destination port (TCP/UDP)
- IP protocol number (6 = TCP, 17 = UDP, 1 = ICMP, etc.)
- Type of Service (ToS/DSCP) byte
- Input interface (SNMP ifIndex)
Any packet matching the same 7-tuple belongs to the same flow. The router maintains a flow cache in memory, and for each active flow it tracks: packet count, byte count, start time, end time, TCP flags (OR'd across all packets in the flow), source and destination AS numbers, source and destination prefix masks, and the output interface. When a flow expires (due to inactivity timeout, active timeout, or TCP FIN/RST), the router exports the flow record to the configured collector.
NetFlow v5 uses a fixed record format: each export packet contains a header followed by up to 30 flow records, each exactly 48 bytes. The fixed format is both a strength (simple to parse, fast to process) and a limitation (cannot be extended with new fields). The export is sent over UDP -- there is no acknowledgment, no retransmission, and no flow control. If the collector is down or the network drops export packets, flow data is lost silently.
NetFlow v5 Export Packet:
Header (24 bytes):
version = 5
count = number of flow records (1-30)
sysUptime = milliseconds since device boot
unix_secs / unix_nsecs = timestamp
flow_sequence = sequence number for loss detection
engine_type / engine_id = identify the flow cache instance
Flow Record (48 bytes each):
srcaddr, dstaddr = IP addresses
nexthop = next-hop router IP
input, output = SNMP ifIndex of ingress/egress interfaces
dPkts, dOctets = packet and byte counts
first, last = flow start/end time (ms since sysUptime)
srcport, dstport = L4 ports
tcp_flags = cumulative OR of TCP flags
prot = IP protocol number
tos = ToS byte
src_as, dst_as = source and destination AS numbers
src_mask, dst_mask = prefix length for src/dst
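Because the v5 layout is fixed, decoding it is a straightforward binary unpack. The sketch below is a hypothetical minimal parser (not from any real library); note the last two header bytes carry sampling information not listed above.

```python
import socket
import struct

V5_HEADER = struct.Struct("!HHIIIIBBH")                  # 24-byte export header
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")   # 48-byte flow record

def parse_v5(payload: bytes) -> list[dict]:
    """Decode the UDP payload of a NetFlow v5 export packet into flow dicts."""
    (version, count, sys_uptime, unix_secs, unix_nsecs,
     flow_seq, engine_type, engine_id, sampling) = V5_HEADER.unpack_from(payload, 0)
    if version != 5:
        raise ValueError(f"not NetFlow v5 (version={version})")
    flows = []
    for i in range(count):
        (src, dst, nexthop, in_if, out_if, pkts, octets, first, last,
         sport, dport, flags, proto, tos, src_as, dst_as,
         src_mask, dst_mask) = V5_RECORD.unpack_from(
            payload, V5_HEADER.size + i * V5_RECORD.size)
        flows.append({
            "srcaddr": socket.inet_ntoa(src), "dstaddr": socket.inet_ntoa(dst),
            "srcport": sport, "dstport": dport, "prot": proto,
            "dPkts": pkts, "dOctets": octets, "tcp_flags": flags,
            "src_as": src_as, "dst_as": dst_as,
        })
    return flows
```

Because the format is fixed, a full parser fits in a few dozen lines -- exactly the simplicity/extensibility trade-off described above.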
NetFlow v9: Template-Based Flexibility
NetFlow v9 (RFC 3954) replaced the fixed record format with templates. The exporting device first sends a template record that defines the field layout, then sends data records that conform to that template. This decoupling means v9 can carry any combination of fields -- IPv6 addresses, MPLS labels, VLAN tags, BGP next-hop information, or vendor-specific fields -- without protocol changes.
Template records are periodically re-sent (typically every few minutes) so that collectors can decode data records even if they missed the initial template. Each template has a numeric ID, and data records reference the template ID to indicate their format. A single exporter might define multiple templates -- one for IPv4 flows, another for IPv6, and a third for MPLS -- and interleave data records of each type in the export stream.
NetFlow v9 also introduced options templates, which carry metadata about the exporter itself -- sampling rate, interface names, VRF-to-table mappings, and other contextual information that collectors need to correctly interpret flow data.
IPFIX: The IETF Standard (NetFlow v10)
IPFIX (IP Flow Information Export), defined in RFC 7011, is the IETF standardization of NetFlow v9. Cisco participated in the standardization process, and the result is closely modeled on NetFlow v9 -- IPFIX export packets carry version number 10, hence the "NetFlow v10" nickname -- with several important enhancements:
- Variable-length fields: IPFIX supports fields with variable lengths, enabling efficient encoding of strings (e.g., HTTP host headers, application names) and variable-length addresses.
- SCTP and TCP transport: in addition to UDP, IPFIX can be exported over SCTP (RFC 4960) or TCP for reliable delivery. SCTP is particularly well-suited because it provides message-oriented, reliable transport with multiple streams -- template records can be sent on a separate stream from data records, preventing head-of-line blocking.
- Enterprise-specific Information Elements: vendors can define custom fields using their IANA-assigned enterprise number, similar to SNMP enterprise MIBs. The IANA IPFIX Information Element registry (over 500 standardized elements) provides a common vocabulary, and vendors extend it with proprietary elements as needed.
- Structured data: IPFIX supports lists and sub-templates, allowing complex data structures like multiple MPLS label stacks or multiple DNS answers within a single flow record.
Despite IPFIX being the official IETF standard, many network operators and vendors still refer to flow export generically as "NetFlow." Most modern flow collectors (nfdump, GoFlow2, Elastiflow, Kentik) accept NetFlow v5, v9, and IPFIX interchangeably.
sFlow: Sampling-Based Flow Monitoring
sFlow (originally specified in RFC 3176; the current version 5 is maintained by sFlow.org rather than the IETF) takes a fundamentally different approach from NetFlow. Where NetFlow tracks every flow through a stateful flow cache, sFlow uses statistical sampling: it captures one out of every N packets (where N is the sampling rate, typically 1-in-1000 to 1-in-10000) and exports a copy of the packet header along with interface counters. There is no flow cache, no flow state, and no flow timeout -- each sampled packet is exported independently.
The sFlow architecture has two components:
- Packet samples: the switching/routing ASIC samples packets at the configured rate. For each sampled packet, sFlow exports the first 128 bytes of the packet (capturing Ethernet, IP, and transport headers) along with metadata: the input/output interface, the packet length, the sampling rate, and VLAN information. The collector reconstructs traffic statistics by multiplying each sample by the sampling rate -- if you see one sampled packet of 1500 bytes at a 1:1000 rate, you estimate 1000 packets and 1.5 MB of traffic of that type during the sampling interval.
- Counter samples: periodically (typically every 20-30 seconds), sFlow exports interface counters similar to what SNMP provides -- ifInOctets, ifOutOctets, ifInErrors, etc. These provide the baseline traffic volumes that help calibrate the packet sampling data.
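Scaling packet samples into traffic estimates is simple arithmetic -- multiply what was seen by the sampling rate. A minimal sketch:

```python
def estimate_traffic(sample_lengths: list[int], sampling_rate: int) -> tuple[int, int]:
    """Estimate (packets, bytes) represented by a set of sFlow packet samples.

    sample_lengths: observed lengths (bytes) of the sampled packets.
    sampling_rate: the 1-in-N rate reported in each sample record.
    """
    est_packets = len(sample_lengths) * sampling_rate
    est_bytes = sum(sample_lengths) * sampling_rate
    return est_packets, est_bytes

# One sampled 1500-byte packet at 1:1000 -> ~1000 packets, ~1.5 MB of traffic
print(estimate_traffic([1500], 1000))  # (1000, 1500000)
```

Real collectors refine this with the counter samples described above, but the core estimator is exactly this multiplication.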
sFlow's key advantages over NetFlow:
- Hardware implementation: because sFlow does not maintain per-flow state, it can be implemented entirely in the switching ASIC. The ASIC simply copies one-in-N packets to the CPU, which adds the metadata and exports it. This means sFlow has zero performance impact on forwarding regardless of the number of concurrent flows. NetFlow, by contrast, requires a flow cache (often in software) that can consume significant CPU and memory -- a SYN flood attack creating millions of micro-flows can exhaust a router's flow cache and degrade forwarding performance.
- Constant resource usage: sFlow's CPU and memory usage is proportional to the sampling rate, not the number of flows. At 1:1000 sampling on a 100 Gbps link, sFlow exports approximately 8,000 samples per second regardless of whether the link carries 100 flows or 100 million flows.
- Layer 2 visibility: because sFlow captures raw packet headers, it provides visibility into Ethernet-layer information (MAC addresses, VLAN tags, 802.1Q headers) that NetFlow typically does not export. This makes sFlow valuable for monitoring switched networks and data center fabrics.
- Wide vendor support: sFlow is supported by virtually all switch vendors (Arista, Broadcom merchant-silicon platforms, Dell, HP/Aruba, Mellanox/NVIDIA, Juniper) because its ASIC-level implementation is straightforward. NetFlow support is strongest on Cisco and Juniper platforms.
sFlow's main limitation is the statistical nature of the data. At a 1:1000 sampling rate, short flows (a single DNS query, a TCP SYN scan, a small API call) may not be sampled at all. The accuracy of traffic estimates improves with the volume of traffic: a flow transferring 10 GB consists of roughly 6.7 million 1500-byte packets and will be sampled approximately 6,700 times at 1:1000, providing a very accurate estimate. A flow consisting of three packets might never be sampled. This means sFlow excels at volumetric traffic analysis (bandwidth accounting, top talkers, protocol distribution) but is less suitable for detecting low-volume anomalies or individual connections.
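The odds that a flow is ever sampled follow directly from the sampling rate, which quantifies this limitation:

```python
def p_sampled(n_packets: int, rate: int) -> float:
    """Probability that a flow of n_packets is sampled at least once at 1-in-rate.

    Each packet is independently sampled with probability 1/rate, so the flow
    is missed entirely with probability (1 - 1/rate)^n_packets.
    """
    return 1.0 - (1.0 - 1.0 / rate) ** n_packets

# A 3-packet flow at 1:1000 is almost never seen (~0.3% chance)
print(round(p_sampled(3, 1000), 4))
# A 10 GB flow (~6.7M packets) is effectively always seen, thousands of times
print(round(p_sampled(6_700_000, 1000), 4))
```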
Flow Timeouts and Cache Management
NetFlow and IPFIX flow caches use two timeout mechanisms to decide when to export a flow record:
- Inactive timeout (typically 15 seconds): if no packets matching a flow's 7-tuple arrive within this interval, the flow is considered finished. The flow record is exported, and the cache entry is freed. This timeout ensures that short-lived flows are exported promptly.
- Active timeout (typically 1800 seconds / 30 minutes): long-lived flows (a persistent SSH session, a continuous video stream) are exported periodically even while still active. The router exports a snapshot of the current counters, resets the flow's byte/packet counters, and keeps tracking. This ensures the collector sees data from long-running flows without waiting for them to end.
Additionally, TCP flows are exported when the router sees a FIN or RST flag, which signals the end of the connection. The tcp_flags field in the flow record contains the cumulative OR of all TCP flags seen during the flow -- a useful signal for the collector: a flow with only SYN flags (0x02) was likely a failed connection attempt; a flow with SYN+ACK+FIN (0x13) was a normal connection lifecycle; a flow with RST (0x04) was terminated abnormally.
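A collector-side classifier based on this cumulative flag field might look like the following sketch (the flag bit values are the standard TCP header bits; the category labels are illustrative):

```python
# Standard TCP flag bits as they appear in the flow record's tcp_flags byte
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10

def classify(tcp_flags: int) -> str:
    """Heuristic interpretation of a flow's cumulative OR of TCP flags."""
    if tcp_flags == SYN:
        return "failed connection attempt (SYN only)"
    if tcp_flags & RST:
        return "terminated abnormally (RST seen)"
    if tcp_flags & SYN and tcp_flags & ACK and tcp_flags & FIN:
        return "normal connection lifecycle"
    return "other"

print(classify(0x02))  # failed connection attempt (SYN only)
print(classify(0x13))  # normal connection lifecycle
```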
DDoS Detection via Flow Data
Flow data is one of the most effective tools for detecting and analyzing DDoS attacks. Unlike packet capture, which cannot scale to the bandwidth levels of modern attacks (hundreds of Gbps to Tbps), flow data provides a compressed, manageable view of traffic patterns that reveals attack signatures:
- Volumetric attacks (UDP flood, DNS amplification, NTP amplification, memcached amplification): flow data shows a sudden spike in traffic volume from many sources to a single destination IP/port. The flow collector can detect this by monitoring per-destination traffic rates and triggering alerts when a threshold is crossed. The source IPs are typically spoofed, but the destination IP and port, protocol, and packet size distribution provide a clear attack fingerprint.
- SYN floods: appear in flow data as a large number of short-lived flows with only the SYN flag set, all targeting the same destination. The flow cache itself may become stressed, as each SYN creates a new flow entry that expires quickly -- monitoring the flow cache utilization on the router is an early warning sign.
- Application-layer attacks (HTTP flood, Slowloris): harder to detect in flow data because they use legitimate-looking connections. However, an abnormal number of flows to a single destination on port 80/443, combined with unusual geographic or ASN distribution of sources, can indicate an application-layer attack.
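The per-destination thresholding described above reduces to a group-and-sum over incoming flow records. A minimal sketch, with an illustrative threshold and sampling rate (real detectors use sliding windows and adaptive baselines):

```python
from collections import defaultdict

def detect_volumetric(flows, sampling_rate: int = 1000,
                      threshold_bytes: int = 10 * 10**9) -> list[str]:
    """Return destination IPs whose estimated traffic in this window
    exceeds threshold_bytes.

    flows: iterable of (dst_ip, sampled_bytes) pairs from one analysis window.
    """
    per_dst = defaultdict(int)
    for dst, sampled_bytes in flows:
        per_dst[dst] += sampled_bytes * sampling_rate  # scale up by sampling rate
    return [dst for dst, total in per_dst.items() if total >= threshold_bytes]
```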
Many DDoS mitigation services and scrubbing centers use flow data as their primary detection mechanism. Services like Cloudflare Magic Transit, Akamai Prolexic, and NTT/GTT scrubbing ingest NetFlow/sFlow from customer edge routers, analyze it in real time, and automatically activate BGP-based traffic diversion (using BGP communities or anycast swings) when an attack is detected.
Sampled NetFlow and Scaling
On high-speed interfaces (40/100/400 Gbps), maintaining a per-packet flow cache is impractical -- the router would need to perform a flow cache lookup for every packet at line rate, which exceeds the CPU capacity of most platforms. The solution is sampled NetFlow: the router samples one out of every N packets (similar to sFlow) and only creates/updates flow cache entries for sampled packets.
Sampled NetFlow combines the statistical approach of sFlow with the flow-state tracking of NetFlow. The flow records still contain aggregated byte/packet counts, but these counts represent only the sampled subset. The collector must multiply all counters by the sampling rate to estimate true traffic volumes. The sampling rate is exported as an options template in NetFlow v9/IPFIX, allowing the collector to automatically apply the correct multiplier.
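Collector logic for this is a lookup-and-multiply; the sketch below assumes per-exporter sampling rates have already been extracted from received options records (the cache structure and names are illustrative):

```python
# Exporter IP -> sampling rate, populated from v9/IPFIX options records
sampling_rates: dict[str, int] = {}

def scale_record(exporter: str, record: dict) -> dict:
    """Multiply a flow record's counters by the exporter's sampling rate.

    Until an options record arrives, assume the exporter is unsampled (rate 1).
    """
    rate = sampling_rates.get(exporter, 1)
    scaled = dict(record)
    scaled["dPkts"] = record["dPkts"] * rate
    scaled["dOctets"] = record["dOctets"] * rate
    return scaled
```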
Cisco's implementation is called "Sampled NetFlow" or "Random Sampled NetFlow." Juniper calls theirs "Inline Jflow" with sampling. The sampling happens in hardware (the ASIC copies sampled packets to the CPU), and the flow cache management happens in software on the route processor. Typical sampling rates range from 1:100 on access/aggregation routers to 1:10000 on core routers handling multiple 100 Gbps links.
Flow Collectors and Analysis
The collector is where flow data becomes actionable intelligence. Collectors receive exported flow records, store them in a time-series database, and provide querying and visualization capabilities. Common open-source collectors include:
- nfdump/NfSen: the classic Unix-based flow collector and analysis toolkit.
nfcapd receives and stores flow data in compressed binary files (one file per 5-minute interval), and nfdump provides a powerful command-line query language for filtering, aggregating, and sorting flows. NfSen adds a web-based frontend. Despite its age, nfdump remains the most memory-efficient collector for large-scale deployments.
- GoFlow2: a modern Go-based collector that ingests NetFlow/IPFIX/sFlow and outputs structured data (JSON, protobuf) to Kafka, PostgreSQL, or other backends. Designed for cloud-native deployments and horizontal scaling.
- Elastiflow: ingests flow data into Elasticsearch/OpenSearch, providing Kibana dashboards for traffic analysis. Popular in environments that already run the Elastic stack.
- pmacct: a versatile collector that supports NetFlow, IPFIX, sFlow, and BGP. It can correlate flow data with BGP routing information, enriching flows with AS path, origin AS, and community data -- invaluable for ISP traffic engineering and peering analysis.
Commercial platforms like Kentik, Arbor Networks (now NETSCOUT), and Plixer Scrutinizer provide cloud-hosted or on-premises solutions with advanced analytics, machine learning-based anomaly detection, and integration with BGP routing data for traffic engineering.
Flow Data for Peering and Transit Analysis
ISPs and content providers use flow data extensively for peering analysis. By correlating flow records with BGP routing tables (using tools like pmacct or Kentik), operators can determine:
- Traffic matrix by AS: how much traffic is exchanged with each peer and transit provider. This directly informs peering decisions -- if you are sending 500 Gbps to AS 15169 (Google) via a transit provider, it may be cost-effective to establish direct peering at an IXP.
- Transit cost attribution: by mapping flows to the transit provider carrying them (based on BGP next-hop or output interface), operators can calculate per-provider traffic volumes and optimize cost by shifting traffic between providers.
- Prefix-level traffic analysis: identify which prefixes generate the most traffic, useful for traffic engineering and capacity planning on specific links.
- Geographic traffic distribution: by mapping destination IPs to geographic locations (using GeoIP databases), operators can plan CDN node placement and regional peering strategies.
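The AS-level aggregation underlying these analyses is a straightforward group-and-sum over enriched flow records. A toy sketch:

```python
from collections import Counter

def traffic_by_dst_as(flows) -> Counter:
    """Aggregate flow bytes per destination AS.

    flows: iterable of dicts with 'dst_as' and 'dOctets' keys
    (already scaled by the sampling rate).
    """
    matrix = Counter()
    for f in flows:
        matrix[f["dst_as"]] += f["dOctets"]
    return matrix

flows = [{"dst_as": 15169, "dOctets": 4000},
         {"dst_as": 3356, "dOctets": 1000},
         {"dst_as": 15169, "dOctets": 2000}]
print(traffic_by_dst_as(flows).most_common(1))  # [(15169, 6000)]
```

Production tools like pmacct perform this same aggregation continuously, with the dst_as values supplied by a live BGP feed rather than by the router.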
Practical Deployment Considerations
Deploying flow monitoring at scale involves several non-obvious trade-offs:
- Export bandwidth: on a busy core router, NetFlow export can consume significant bandwidth. A router forwarding 100 million packets per second at 1:1000 sampling generates on the order of 100,000 flow records per second (one per sampled packet in the worst case of all-unique flows). At ~100 bytes per IPFIX record, this is ~10 MB/s of export traffic. Ensure the management network or export path can handle this without congestion.
- Collector sizing: flow data accumulates rapidly. A medium-sized ISP might ingest 50,000 flows per second, totaling ~150 GB per day of compressed flow data. Plan storage for at least 30-90 days of retention for forensic analysis.
- Sampling rate selection: lower sampling rates (1:10000) reduce export volume and router CPU load but decrease accuracy for low-volume flows. Higher rates (1:100) provide better visibility but increase export traffic and collector load. The sweet spot depends on link speed, traffic diversity, and analysis requirements.
- Clock synchronization: flow timestamps are based on the exporter's system clock. If the router's clock drifts, flow timestamps will be inaccurate, making it difficult to correlate flows from multiple routers or to match flow data with external events. NTP synchronization to a stratum-1 source is essential.
- Asymmetric routing: in networks with multiple exit points, a single flow's packets may traverse different routers in each direction. The collector sees two separate unidirectional flow records. Correlating these into a single bidirectional conversation requires matching by the 5-tuple (swapping source and destination) -- a computationally expensive operation at scale.
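The standard matching trick is to order each flow's endpoints canonically so both directions of a conversation produce the same hash key. A sketch:

```python
def conversation_key(src: str, sport: int, dst: str, dport: int, proto: int) -> tuple:
    """Direction-independent key for a 5-tuple: both unidirectional flow
    records of one conversation map to the same key."""
    a, b = (src, sport), (dst, dport)
    return (proto, *min(a, b), *max(a, b))  # sort the endpoints canonically

k1 = conversation_key("10.0.0.1", 55000, "192.0.2.9", 443, 6)
k2 = conversation_key("192.0.2.9", 443, "10.0.0.1", 55000, 6)
assert k1 == k2  # both directions map to the same conversation
```

Hashing flows into buckets by this key turns bidirectional correlation into a single pass, though at ISP scale the memory footprint of the key table is still the dominant cost.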
NetFlow/sFlow and BGP Correlation
One of the most powerful applications of flow data is correlating it with BGP routing information. NetFlow v5 includes source and destination AS numbers derived from the router's BGP RIB at the time of flow creation. NetFlow v9/IPFIX can include the full BGP AS path, BGP next-hop, and BGP communities. This enrichment transforms raw IP-level flow data into AS-level traffic matrices.
For an ISP, this means answering questions like: "How much traffic from AS 32934 (Facebook) traverses our network?" "What percentage of our transit traffic to AS 3356 (Lumen) could be offloaded to a direct peering at DE-CIX?" "Which customer AS generates the most traffic toward AS 2906 (Netflix)?" These questions are unanswerable from SNMP interface counters alone -- they require flow-level data enriched with BGP routing context.
Tools like pmacct maintain a BGP session with the router (or a route reflector) and use the received routing table to enrich incoming flow records in real time. This approach is more accurate than relying on the AS numbers embedded in NetFlow records by the router, because the tool can re-lookup the routing table at analysis time and apply the most current BGP state.
See It in Action
Flow monitoring is the backbone of traffic engineering at every major ISP and content provider. The traffic statistics that inform peering decisions, transit capacity planning, and DDoS mitigation strategies are all derived from NetFlow, IPFIX, or sFlow data collected at the network edge.
Use the god.ad BGP Looking Glass to explore the networks of major flow collector and DDoS mitigation providers. Look up AS numbers to see how traffic analysis platforms are connected to the global routing table -- the same table whose traffic they are monitoring.