A fraud detection system runs batch processing nightly. It analyzes yesterday's transactions and flags suspicious accounts by morning. But a fraudster who steals credit card data at 9 AM has all day to drain accounts before the system catches them. By the time the batch job runs, the money is gone.
Stream processing handles data continuously. Events flow in, get processed, and results flow out with minimal delay. This article covers the core concepts you need for system design interviews: windowing, time semantics, and delivery guarantees.
What is Stream Processing?
In batch processing, input is bounded—a finite dataset that eventually ends. A batch job reads all input, processes it, and terminates.
Stream processing handles unbounded data—events that arrive continuously with no defined end. The system runs indefinitely, processing each event as it arrives.
The input typically comes from a message broker like Apache Kafka. Producers write events to topics (named categories of messages). Consumers read from topics and process events. The broker retains events for some configurable period, allowing replay if needed.
A stream processing application reads events from topics, applies transformations, and writes results—either to other topics for downstream processing or to databases for serving.
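A minimal sketch of that loop, with hypothetical `poll_event`, `publish`, and `commit_offset` methods standing in for a real broker client:

```python
def run_pipeline(consumer, producer):
    """Conceptual consume-transform-produce loop; the broker calls are illustrative stand-ins."""
    while True:                                # stream jobs run indefinitely
        event = consumer.poll_event()          # block until the next event arrives
        if event is None:
            continue
        result = transform(event)              # apply the application's logic
        producer.publish("results", result)    # write to a downstream topic
        consumer.commit_offset(event.offset)   # record progress with the broker

def transform(event):
    # Example transformation: keep only the fields downstream consumers need.
    return {"user_id": event.value["user_id"], "amount": event.value["amount"]}
```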
Windowing: Grouping Events by Time
Stream processing often needs to aggregate events: count page views per minute, sum sales per hour, detect patterns over user sessions. But you can't aggregate an infinite stream—you need boundaries.
Windows divide the continuous stream into finite chunks for aggregation. Three patterns appear in most systems.
Tumbling windows divide time into fixed, non-overlapping intervals. A 1-minute tumbling window groups all events from 10:00:00-10:00:59, then 10:01:00-10:01:59, and so on. Each event belongs to exactly one window. Simple and efficient for periodic aggregates like "page views per minute."
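Assigning an event to its tumbling window is just arithmetic on the timestamp. A minimal sketch, with timestamps in epoch seconds and a 1-minute window as the example size:

```python
def tumbling_window(event_ts: int, size: int = 60) -> tuple[int, int]:
    """Return the (start, end) of the single window this event belongs to."""
    start = event_ts - (event_ts % size)   # floor the timestamp to the window boundary
    return start, start + size

# An event at 10:00:42 maps to the window covering 10:00:00 up to (not including) 10:01:00.
```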
Sliding windows overlap. A 1-minute window sliding every 30 seconds produces windows 10:00:00-10:00:59, 10:00:30-10:01:29, 10:01:00-10:01:59. Events near boundaries appear in multiple windows. Use sliding windows when you need smooth, rolling aggregates—moving averages, rate limits over "the last N minutes."
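Sliding window assignment returns every window whose interval covers the event. A sketch with the same illustrative sizes (1-minute windows sliding every 30 seconds):

```python
def sliding_windows(event_ts: int, size: int = 60, slide: int = 30) -> list[tuple[int, int]]:
    """Return every (start, end) window that contains this event."""
    # Windows start at multiples of `slide`; the event belongs to each window whose
    # start lies in the half-open interval (event_ts - size, event_ts].
    last_start = event_ts - (event_ts % slide)
    starts = range(last_start, event_ts - size, -slide)
    return [(s, s + size) for s in starts]

# An event at 10:00:45 lands in two windows: 10:00:00-10:01:00 and 10:00:30-10:01:30.
```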
Session windows group events by activity gaps. If events arrive within a timeout (say, 30 minutes), they belong to the same session. When the gap exceeds the timeout, a new session starts. Sessions capture user behavior patterns where fixed intervals don't fit—a browsing session might last 5 minutes or 2 hours.
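The session rule reduces to "start a new session whenever the gap since the previous event exceeds the timeout." A simplified sketch over one user's timestamps, using the 30-minute timeout from above:

```python
def sessionize(event_times: list[int], gap: int = 30 * 60) -> list[list[int]]:
    """Split one user's event timestamps (epoch seconds) into sessions."""
    sessions: list[list[int]] = []
    for ts in sorted(event_times):
        # Open a new session if there is no prior event or the inactivity gap is exceeded.
        if not sessions or ts - sessions[-1][-1] > gap:
            sessions.append([ts])
        else:
            sessions[-1].append(ts)
    return sessions
```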
The choice depends on your query patterns. Tumbling windows work for dashboards refreshing on fixed intervals. Sliding windows work for rate limiting and anomaly detection. Session windows work for user journey analysis.
Event Time vs Processing Time
When did an event happen? Two answers exist, and the difference matters more than it might seem.
Event time is when the event actually occurred—timestamped at the source. A user clicked at 10:00:00 according to their device.
Processing time is when the system processes the event. Network delays, queue backlogs, or consumer restarts mean processing might happen at 10:00:05—or 10:05:00 if the system fell behind.
Consider counting clicks per minute. With processing-time windows, a traffic spike creates a backlog and delays processing. Some 10:00 clicks get processed at 10:05 and count toward the wrong window. A consumer restart replays events, double-counting them. The dashboard shows different numbers each time you look.
With event-time windows, clicks count based on when users clicked—regardless of processing delays. The same data always produces the same counts. The trade-off is complexity: events arrive out of order, so the system needs a way to decide when it has seen everything for a given window.
Most production systems use event time. Processing time is simpler but produces misleading results under load.
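The difference shows up in a few lines: the same events produce different per-minute counts depending on which timestamp keys the window. A sketch assuming each event carries both timestamps:

```python
from collections import Counter

def per_minute_counts(events: list[dict], use_event_time: bool) -> Counter:
    """Count events into 1-minute tumbling windows keyed by the chosen timestamp."""
    counts: Counter = Counter()
    for e in events:
        ts = e["event_ts"] if use_event_time else e["processing_ts"]
        counts[ts - ts % 60] += 1
    return counts

# A click made at 10:00:58 but processed at 10:05:02 lands in the 10:00 bucket
# under event time and the 10:05 bucket under processing time.
```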
Watermarks: Knowing When Windows Are Complete
With event time, when can you close a window? You see an event at 10:01:15. Can you finalize the 10:00-10:01 window? Maybe—or maybe an event at 10:00:58 is still in flight.
Watermarks track progress through event time. A watermark at 10:01:00 means: "I believe all events before 10:01:00 have arrived." When the watermark passes a window's end time, the system emits results for that window.
How does the system know? Two common approaches:
Source-based watermarks track the oldest unprocessed event in the input queue. If the oldest pending event has timestamp 10:01:00, all earlier events must already have been consumed, so the watermark can safely advance to 10:01:00.
Heuristic watermarks estimate based on observed delays. If 99% of events arrive within 30 seconds, advance the watermark to 30 seconds behind the latest event.
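A heuristic watermark of this kind boils down to tracking the largest event time seen so far and subtracting the allowed delay. A minimal sketch, with the 30-second bound as an illustrative assumption:

```python
class BoundedDelayWatermark:
    """Watermark = latest event time observed minus a fixed allowed delay."""

    def __init__(self, max_delay: int = 30):
        self.max_delay = max_delay
        self.max_event_ts = float("-inf")

    def on_event(self, event_ts: int) -> float:
        self.max_event_ts = max(self.max_event_ts, event_ts)
        return self.max_event_ts - self.max_delay   # the current watermark

# When the watermark passes a window's end time, that window's result can be emitted.
```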
Late events—those arriving after their watermark—require a policy:
- Drop late events (simplest, but loses data)
- Recompute and emit updated results (correct, but downstream must handle updates)
- Side-output late events for separate handling
The trade-off is latency vs completeness. Aggressive watermarks emit results quickly but miss late events. Conservative watermarks wait longer but produce more complete results.
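Tying the pieces together, here is a sketch of watermark-driven window emission with a side output for late events (per-minute counts and a 30-second allowed delay are assumptions for illustration):

```python
def process(event_times: list[int], max_delay: int = 30):
    """Count events per minute, emit windows once the watermark passes them,
    and side-output anything that arrives after its window was finalized."""
    open_counts, emitted, late = {}, {}, []
    max_ts = float("-inf")
    for ts in event_times:                              # ts = event time in seconds
        max_ts = max(max_ts, ts)
        watermark = max_ts - max_delay
        window_start = ts - ts % 60
        if window_start in emitted:
            late.append(ts)                             # late event: route to side output
        else:
            open_counts[window_start] = open_counts.get(window_start, 0) + 1
        for w in [w for w in open_counts if w + 60 <= watermark]:
            emitted[w] = open_counts.pop(w)             # watermark passed the window's end
    return emitted, open_counts, late
```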
So far we've covered how to group events (windowing) and when events happened (time semantics). The next question: what happens when processing fails?
Delivery Guarantees
When processing fails partway through, what happens to in-flight events? Stream systems offer different guarantees.
At-most-once: Process each event zero or one time. On failure, events may be lost. The simplest to implement—acknowledge receipt immediately, don't retry. Acceptable for metrics where occasional drops don't matter.
At-least-once: Process each event one or more times. On failure, replay from the last checkpoint and reprocess. No event loss, but duplicates possible. The consumer reads an event, processes it, then fails before acknowledging. On restart, it reprocesses the same event. Downstream systems must handle duplicates (idempotent writes—writes that produce the same result even if executed multiple times).
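Idempotence is typically achieved by keying writes on a unique event id, so a replayed event overwrites its earlier result instead of double-counting. A sketch assuming events carry such an id:

```python
results: dict[str, dict] = {}   # stand-in for an idempotent key-value sink

def handle(event: dict) -> None:
    """At-least-once consumer handler: safe to call twice with the same event."""
    # Keying the write by event id turns a redelivery into a harmless overwrite.
    results[event["event_id"]] = {"user": event["user"], "amount": event["amount"]}
```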
Exactly-once: Process each event exactly once, even across failures. The gold standard—no loss, no duplicates. Achieved through atomic commits: the processing result and the input offset (position in the event log) commit together or not at all. Systems like Flink and Kafka Streams support exactly-once semantics.
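One way to get that atomic commit is to store the output and the consumed offset in the same database transaction, so a crash persists both or neither. A sketch using SQLite purely as a stand-in sink:

```python
import sqlite3

conn = sqlite3.connect("sink.db")
conn.execute("CREATE TABLE IF NOT EXISTS results (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("CREATE TABLE IF NOT EXISTS offsets (partition_id INTEGER PRIMARY KEY, committed_offset INTEGER)")

def commit_atomically(key: str, value: str, partition_id: int, offset: int) -> None:
    """Write the processing result and the input offset in one transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT OR REPLACE INTO results VALUES (?, ?)", (key, value))
        conn.execute("INSERT OR REPLACE INTO offsets VALUES (?, ?)", (partition_id, offset))
    # On restart, resume from the stored offset: output and position never disagree.
```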
Exactly-once has performance overhead. Many systems use at-least-once processing with idempotent operations, achieving effectively-once results with lower complexity.
Common Use Cases
Real-time analytics. Dashboards showing live metrics: active users, transactions per second, error rates. Window aggregates emit to time-series databases for visualization.
Fraud detection. Score transactions as they happen. Aggregate user behavior over rolling windows—too many transactions in 5 minutes, unusual geographic jumps, spending pattern changes. Block suspicious activity before the money leaves the account.
Event-driven applications. Order placed triggers inventory check, triggers payment, triggers shipping notification. Each service processes events from the previous stage. Decoupled, scalable, resilient.
Alerting and monitoring. Aggregate error logs per minute. When the count exceeds a threshold, emit an alert. Correlate events across services to detect distributed failures.
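The alerting case is a tumbling count plus a threshold check when the window closes. A tiny sketch with an assumed threshold:

```python
ERROR_THRESHOLD = 100   # errors per minute; illustrative value

def on_window_close(window_start: int, error_count: int) -> None:
    """Called when a 1-minute error-count window is finalized."""
    if error_count > ERROR_THRESHOLD:
        print(f"ALERT: {error_count} errors in the window starting at {window_start}")
```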
Modern Stream Processing: Flink
Apache Flink has emerged as the leading stream processor for production use cases. Here's why:
True streaming. Unlike early systems that batched events internally (micro-batching), Flink processes each event individually with low latency. This matters for use cases where even sub-second delays are problematic.
Native event time. Flink handles event time, watermarks, and late events as first-class concepts. Windowing operations produce correct results even when events arrive out of order or late.
Exactly-once guarantees. Distributed snapshots enable exactly-once processing without sacrificing performance. Checkpoints commit atomically across the entire pipeline.
Stateful processing. Operators maintain state across events—session aggregates, pattern detection, windowed joins. State survives failures through checkpointing. A session window needs to track all events in the current session; Flink manages this state reliably.
Flink runs both batch and stream workloads. Batch is just a bounded stream—the same API handles both. This unification simplifies development when you need both processing modes.
What's Next
This article covered stream processing fundamentals: windowing for aggregation, event time for correctness, watermarks for progress tracking, and delivery guarantees for reliability.
The hybrid architectures article explores how production systems combine batch and stream processing—why maintaining both is sometimes necessary and how modern systems unify them.