Your Ad Platform Processes 1M+ Events Per Minute. Where Does the Data Go?
High volume doesn't mean high visibility. Most of that data disappears before anyone looks at it.
Programmatic ad platforms are some of the highest-throughput systems in tech. A million ad requests per minute is normal. But throughput and visibility are different things.
At most platforms, that event data lands in Elasticsearch or a similar hot store optimized for real-time serving — not analytics. Retention is measured in days, not months, because storing that volume durably is expensive.
The result: when something goes wrong with an ad network, the ops team finds out only when a client complains. The data that would have revealed the problem existed two days ago. It's gone now.
The Retention Trap
Elasticsearch is excellent at what it does — fast indexing, real-time search, low-latency reads. It's not a data warehouse.
At the scale of 1M+ events per minute, ES clusters are typically configured with aggressive retention policies. Two to three days is common. That's fine for real-time ad serving. It's useless for trend analysis, root-cause investigation, or anything that requires looking back more than 48 hours.
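That kind of retention is typically enforced with an index lifecycle management (ILM) policy. A sketch of what a two-day policy body looks like, written as the Python dict you would pass to the ILM put-lifecycle API; the rollover sizing values are illustrative, not a recommendation:

```python
# Illustrative ILM policy matching the aggressive retention described above:
# roll hot indices over daily, delete anything older than two days.
# Applied via the Elasticsearch ILM put-lifecycle API; values are examples.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                }
            },
            # After two days the index is deleted outright. This is the line
            # that makes the cluster cheap to run and useless for analytics.
            "delete": {"min_age": "2d", "actions": {"delete": {}}},
        }
    }
}
```

The delete phase is doing exactly what it was configured to do; the problem is that nothing downstream captures the data first.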
The data exists just long enough to serve its operational purpose, then ages out. Any analytical value it held disappears with it.
Extracting Before It Expires
The fix isn't replacing Elasticsearch — it's building an extraction layer that captures data before retention policies delete it.
A custom extractor that reads from ES on a schedule, lands the data in object storage like S3, and loads it into a warehouse gives teams durable access without interfering with the operational cluster.
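An extractor along those lines can be sketched in a few dozen lines of Python. The index pattern, field names, bucket layout, and hourly cadence here are assumptions for illustration; `scan` is the official elasticsearch-py paging helper, and `s3` stands in for a boto3 S3 client:

```python
# Sketch of a scheduled extractor: page one hour of events out of
# Elasticsearch and land them in S3 as newline-delimited JSON, before
# the retention policy deletes them. Names and schema are illustrative.
import json
from datetime import timedelta

def window_for(run_time, lookback=timedelta(hours=1)):
    """Time window [start, end) covered by one extraction run."""
    end = run_time.replace(minute=0, second=0, microsecond=0)
    return end - lookback, end

def s3_key(start):
    """Hour-partitioned object key so the warehouse can load incrementally."""
    return f"events/dt={start:%Y-%m-%d}/hr={start:%H}/part-0000.ndjson"

def extract(es, s3, bucket, start, end, index="ad-events-*"):
    """Read matching events from ES and write them to S3 as NDJSON."""
    # Imported here so the pure helpers above work without the ES client.
    from elasticsearch.helpers import scan  # official paging helper
    query = {"query": {"range": {"@timestamp": {
        "gte": start.isoformat(), "lt": end.isoformat()}}}}
    lines = [json.dumps(hit["_source"])
             for hit in scan(es, index=index, query=query)]
    s3.put_object(Bucket=bucket, Key=s3_key(start),
                  Body="\n".join(lines).encode())
    return len(lines)
```

Keeping the window and key logic as pure functions makes the schedule easy to test and easy to backfill: rerunning an hour overwrites exactly one object.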
The key design decision is what to extract. At 1M+ events per minute, warehousing everything is expensive and usually unnecessary. Selective extraction, filtering for the events with analytical value while excluding operational noise, keeps costs manageable while preserving everything worth analyzing.
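One way to make that selection is server-side, as part of the ES query itself, so the noise never leaves the cluster. The event types and field names below are assumptions, not the platform's actual schema:

```python
# Selective extraction as an Elasticsearch bool query: keep analytically
# useful event types inside the time window, drop high-volume noise.
# Types and the "event_type" field are illustrative assumptions.
KEEP_TYPES = {"impression", "click", "conversion", "bid_error", "timeout"}
DROP_TYPES = {"heartbeat", "healthcheck"}  # pure operational chatter

def selective_query(start_iso, end_iso):
    """Bool query: time window AND wanted event types, noise excluded."""
    return {"query": {"bool": {
        "filter": [
            {"range": {"@timestamp": {"gte": start_iso, "lt": end_iso}}},
            {"terms": {"event_type": sorted(KEEP_TYPES)}},
        ],
        "must_not": [{"terms": {"event_type": sorted(DROP_TYPES)}}],
    }}}
```

Filtering at the source, rather than after landing in S3, is what keeps both transfer and storage costs proportional to the data that matters.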
From Volume to Visibility
Having the data in a warehouse is step one. Making it useful is step two.
Clickstream analytics layered on top — session reconstruction, user behavior paths, conversion funnels — turns raw event logs into something product and ops teams can actually use. Monitoring dashboards that flag anomalies in near real-time give ops teams the ability to detect issues in minutes, not days.
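Session reconstruction, for example, can start as simply as splitting each user's event stream on an inactivity gap. Thirty minutes is a common heuristic, not a rule, and the (user_id, timestamp) schema here is an assumption:

```python
# Minimal sessionization sketch: group each user's events into sessions,
# where a gap of more than 30 minutes of inactivity closes a session.
# Events are (user_id, epoch_seconds) pairs; the schema is illustrative.
from collections import defaultdict

GAP = 30 * 60  # seconds of inactivity that ends a session

def sessionize(events):
    """Return {user_id: [session, ...]}, each session a sorted timestamp list."""
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)
    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        runs = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - runs[-1][-1] > GAP:
                runs.append([ts])      # gap exceeded: start a new session
            else:
                runs[-1].append(ts)    # still within the same session
        sessions[user] = runs
    return sessions
```

In practice this logic usually lives in SQL over the warehouse tables, but the gap-based definition is the same; behavior paths and funnels are then built per session rather than per raw event.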
The difference between a reactive ops team and a proactive one is usually just data access latency.
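The near-real-time anomaly flagging can also start simple: a rolling z-score over per-minute event counts catches the sudden drops and spikes that signal a misbehaving network. Window size and threshold below are illustrative, not tuned values:

```python
# Minimal anomaly flag for per-minute event counts: compare each new value
# to the rolling mean/stddev of a trailing window. A sketch; real systems
# would add seasonality handling and per-network baselines.
from collections import deque
from statistics import mean, stdev

def make_detector(window=60, threshold=3.0):
    """Return a check(count) -> bool that flags outliers vs. recent history."""
    history = deque(maxlen=window)
    def check(count):
        anomalous = False
        if len(history) >= 10:  # need some history before judging
            mu, sigma = mean(history), stdev(history)
            anomalous = sigma > 0 and abs(count - mu) > threshold * sigma
        history.append(count)
        return anomalous
    return check
```

Even this crude a detector, fed from the extraction pipeline, turns a two-day client-reported outage into a same-minute alert.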
In Practice
At a programmatic advertising platform processing over a million ad requests per minute, event data was landing in Elasticsearch with a two-day retention window. The ops team had basic alerting but routinely missed network issues until clients reported them.
After building a custom extraction pipeline that moved selected events from ES through S3 into a durable warehouse — and layering clickstream analytics and monitoring dashboards on top — network issue detection time dropped by 20%. For the first time, internal teams were catching problems before clients did.
Takeaway
At high volume, the default is data loss. Retention limits, ephemeral stores, and siloed systems mean most event data disappears before anyone can learn from it.
Intentional extraction and warehousing is the difference between having data and having visibility.
Dealing with something similar?
Let's talk about your situation and see if there's a fit.
Book a Discovery Call