Beyond the Data Lake: Why ‘Better Data’ is Only Half the Battle for AI-Driven Cyber Defense
In his latest piece for Forbes, Kolawole Samuel Adebayo hits on a fundamental truth that has haunted the machine learning community for years: "Garbage in, garbage out." As we move through 2026, the narrative in cybersecurity has shifted from "Do we have enough AI?" to "Is our AI fueled by the right data?" Adebayo argues that the missing link to unlocking AI’s true potential in defending our digital frontiers is the quality, context, and granularity of the data we feed it.
While Adebayo’s diagnosis is correct, his prescription of focusing on "better data" might oversimplify a much more chaotic reality. If we want to move from reactive defense to the "continuous readiness" that 2026 demands, we need to talk about more than just data hygiene. We need to talk about the semantic gap between raw telemetry and the intent behind it, and about the asymmetry of AI weaponization.
The "Better Data" Fallacy
Adebayo suggests that by refining our data pipelines, we can sharpen AI's predictive capabilities. On the surface, this is hard to dispute. High-fidelity logs, decrypted traffic inspection, and unified telemetry are the bedrock of any modern SOC (Security Operations Center). However, data volume yields diminishing returns.
The industry is currently drowning in "data lakes" that have turned into "data swamps." The problem isn't just that the data is messy; it’s that it is ephemeral. In the time it takes to clean, label, and ingest a dataset for training, a nation-state actor has already pivoted to a new exploit chain. "Better data" is often "yesterday’s data." To truly unlock AI, we shouldn't just be looking for better data. We should be looking for faster, agentic context.
The Case for Agentic Observability
Adebayo’s argument is worth supplementing with the shift from passive data to active agents. As recent developments at firms like Fabrix.ai suggest, the future isn’t just a centralized LLM crunching logs. It’s a swarm of autonomous AI agents that live at the edge of the network.
Instead of shipping terabytes of data to a central brain, these agents perform real-time "micro-inference." They don't just see a spike in traffic; they understand the intent of the service generating that traffic. By moving the intelligence to the data source, we sidestep much of the latency that Adebayo’s "better data" model inherently faces.
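To make "micro-inference" concrete, here is a minimal sketch of what one such edge agent might do. It is an illustration under stated assumptions, not how Fabrix.ai or any particular product works: the EdgeAgent class, its threshold and warmup values, and the sample readings are all hypothetical. The agent keeps a running mean and variance for a single metric (Welford's online algorithm) and forwards only the readings that look anomalous, rather than shipping every raw event to a central store.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeAgent:
    """Hypothetical edge agent that scores one metric locally."""
    threshold: float = 3.0   # z-score above which a reading is forwarded
    warmup: int = 10         # readings to observe before scoring begins
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0          # running sum of squared deviations (Welford)

    def observe(self, value: float):
        """Return an alert dict for an anomalous reading, else None."""
        if self.n >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0:
                z = (value - self.mean) / std
                if abs(z) > self.threshold:
                    # Anomalies are forwarded and kept out of the baseline.
                    return {"value": value, "z": round(z, 1)}
        # Welford's online update: no raw history is stored or shipped.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return None


# Illustrative requests-per-second readings with one obvious spike.
readings = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 100, 500, 99]
agent = EdgeAgent()
alerts = [a for r in readings if (a := agent.observe(r)) is not None]
print(f"forwarded {len(alerts)} of {len(readings)} readings: {alerts}")
```

A z-score on one metric is pattern matching, not an understanding of intent, so treat this as the plumbing only. The design point it shows is that the decision happens where the data is produced and only a small summary travels upstream.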
Challenging the Optimism: The Adversarial Counter-Move
We must also offer a counter-perspective to the idea that better data is a silver bullet. We are entering the era of adversarial AI. If the "good guys" are using better data to train their defenses, the "bad guys" are using that same logic to craft poisoning attacks. If an attacker knows your AI relies on specific telemetry patterns to identify a breach, they will spend months "grooming" your data, slowly introducing noise that desensitizes the model to the eventual attack. In this light, "better data" can actually create a more rigid, and therefore more fragile, defense system.
The more precisely an AI is tuned to "clean" data, the more easily it can be fooled by a sophisticated outlier that has been carefully camouflaged within that data.
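The "grooming" scenario can be shown with a toy example. The sketch below is a deliberately simplified assumption, not a model of any real detection product: adaptive_flags, pinned_flags, and all of the numbers are hypothetical. An adaptive detector flags any value above 1.5 times a baseline that it keeps relearning from accepted traffic. An attacker who raises traffic by 3% per step never trips the alarm, and by the end the baseline has drifted far enough that the real attack passes too. For contrast, a detector pinned to a trusted reference still catches both the drift and the attack, which is one possible mitigation among several.

```python
def adaptive_flags(stream, alpha=0.1, tolerance=1.5, initial=100.0):
    """Flag values above tolerance * baseline; the baseline is an
    exponential moving average that learns from every accepted value."""
    baseline = initial
    flags = []
    for value in stream:
        flagged = value > tolerance * baseline
        flags.append(flagged)
        if not flagged:
            baseline = (1 - alpha) * baseline + alpha * value
    return flags


def pinned_flags(stream, tolerance=1.5, reference=100.0):
    """Flag values above tolerance * a fixed, trusted reference."""
    return [value > tolerance * reference for value in stream]


attack = 600.0
quiet = [100.0] * 60                              # normal traffic
grooming = [100.0 * 1.03 ** k for k in range(60)]  # +3% per step

direct = adaptive_flags(quiet + [attack])     # attack with no grooming
groomed = adaptive_flags(grooming + [attack])  # attack after grooming
pinned = pinned_flags(grooming + [attack])     # same stream, fixed reference

print("direct attack flagged: ", direct[-1])
print("groomed attack flagged:", groomed[-1])
print("pinned detector fired during grooming:", any(pinned[:60]))
```

A pinned reference has its own cost, since it will also flag legitimate growth, so in practice the trade-off is between adaptability and resistance to slow manipulation.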
The Human-Centric Correction
Adebayo’s narrative focuses heavily on the technical unlock. However, the most critical data point in any cybersecurity ecosystem remains human intent. Advanced ML models are excellent at pattern recognition, but they fail at "low-probability, high-impact" events, the "Black Swans" of the cyber world. We should view the "Better Data" movement not as a way to replace human intuition, but as a way to filter the noise so that human creativity can tackle the small share of threats that AI will never see coming.
Final Thoughts
Kolawole Samuel Adebayo is right: we are leaving the "Big Data" era and entering the "Right Data" era. But "Right Data" is a moving target. To survive 2026, organizations must move beyond the dream of a perfect dataset.
We need adversarially robust models that expect the data to be compromised. We need agentic architectures that act before the data even reaches the lake. And most importantly, we need to remember that in the game of cat and mouse, the mouse is also using AI to study the cat.
Better data won't win the war; it just raises the stakes.
Written/published by Kevin Marshall with the help of AI models (AI Quantum Intelligence).
Source: Better Data Could Unlock AI’s Full Potential In Cybersecurity

