Evaluating Opportunities & Risks of AI Agents in On-Chain Yield Generation
One million financial transactions executed while users slept. That's the claim from Giza, whose AI agents now manage positions for 10,000 active users with a reported 2x profitability edge over static strategies. The catch: no disclosed methodology, no risk-adjusted metrics, no third-party verification. For allocators weighing autonomous yield products, the performance story is intriguing, but the diligence gap is the real headline.
🎟️ Yield Summit Series 2026
500+ institutional leaders. Five global venues: Cannes (Mar 28-29) • Miami (May 4) • Amsterdam (Jun 1) • Singapore (Oct 5-6) • Abu Dhabi (Dec).
Register as Attendee — Use code INSTITUTIONAL-YIELD for 10% off.
Become a Sponsor
Featuring
Julien Bouteloup — Founder & CEO, Stake Capital
Xavier Meegan — Founder & CIO, Frachtis VC
Anto Joseph — Principal Security Engineer, Eigen Labs
Renç Korzay — CEO, Giza
Yair Cleper — Contributor, Lava Network
Slashing Mechanisms as Institutional Insurance
EigenLayer has introduced a novel accountability framework for AI agents that functions, in their words, like deposit insurance. The mechanism works through restaking: Ethereum validators commit their staked ETH to back agent performance, and if an agent fails its commitments, that stake gets slashed. "You can think of it as insurance if you will. So like how you have FDIC insurance with Robinhood," explained Anto Joseph, principal security engineer at EigenLayer. Source
The comparison is provocative but incomplete. FDIC insurance has actuarial tables, coverage limits, claims procedures, and regulatory oversight. EigenLayer's slashing parameters? Undefined in public documentation. More than 5.5 million ETH has been staked on the platform, and AVS competition is accelerating. But for institutional adoption, banks need to verify: what triggers a slash, what's the coverage cap, and who adjudicates disputes. Without those answers, calling this "insurance" creates potential mis-selling exposure.
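To make that diligence ask concrete, here is a hypothetical disclosure schema — not anything EigenLayer publishes — capturing the fields an allocator would need filled in before modeling slashing-backed "insurance" as downside protection. Every field name and the structure itself are assumptions, sketched in Python purely as a checklist.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical disclosure schema: the minimum slashing parameters an allocator
# would need before treating restaked collateral as "insurance". None of these
# fields are published in public documentation today; this is a checklist
# expressed as code, not a real API.


@dataclass
class SlashingDisclosure:
    trigger_conditions: list[str]      # exactly which agent failures cause a slash
    coverage_cap_eth: float            # maximum stake at risk per incident
    payout_priority: str               # who is made whole first, and how
    adjudicator: str                   # entity or process that rules on disputes
    challenge_window_days: int         # time allowed to contest a slash
    stake_encumbrance: Optional[str]   # other AVSs sharing the same collateral


def is_modelable(d: SlashingDisclosure) -> bool:
    """Downside can only be modeled if every core field is actually specified."""
    return all([d.trigger_conditions, d.coverage_cap_eth > 0,
                d.payout_priority, d.adjudicator, d.challenge_window_days > 0])
```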
Deterministic Inference Enables Audit Trails
One of the more technically significant developments: EigenLayer has made open-source LLMs deterministic. That means asking the same question with the same seed produces the same response, every time. They've deployed this with at least four models, including GPT OSS from OpenAI, Llama, and Qwen. "We sign the responses. So you can come back to us later and check... If it is not, because we signed the response, you can slash the inference operator," Joseph noted. Source
This is genuinely useful for audit trails. Reproducibility matters for compliance. But here's the thing: determinism doesn't equal accuracy. A model can be perfectly reproducible and still make terrible financial decisions. Banks should treat verifiable inference as necessary infrastructure, not sufficient validation. Independent benchmarking against financial outcomes remains essential before any client-facing deployment.
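For readers who want the mechanism spelled out, here is a minimal sketch of the pattern described above: the same (prompt, seed) pair always yields the same output, and the operator signs what it returned so an auditor can recheck it later. The model below is a stand-in stub and the HMAC-based signing is an assumption made to keep the example self-contained; a production system would use a real deterministic LLM runtime and asymmetric signatures.

```python
import hashlib
import hmac
import json

# Sketch of deterministic, signed inference. A mismatch between the recorded
# output and a re-run (or a bad signature) is the evidence that would justify
# penalizing the inference operator.

OPERATOR_KEY = b"operator-signing-key"  # placeholder, not a real key


def deterministic_infer(prompt: str, seed: int) -> str:
    """Stand-in for a deterministic model: same (prompt, seed) -> same output."""
    digest = hashlib.sha256(f"{seed}:{prompt}".encode()).hexdigest()
    return f"model-output-{digest[:12]}"


def sign_response(prompt: str, seed: int, output: str) -> str:
    payload = json.dumps({"prompt": prompt, "seed": seed, "output": output},
                         sort_keys=True).encode()
    return hmac.new(OPERATOR_KEY, payload, hashlib.sha256).hexdigest()


def verify_response(prompt: str, seed: int, output: str, signature: str) -> bool:
    """Auditor re-runs the deterministic model and checks the recorded signature."""
    recomputed = deterministic_infer(prompt, seed)
    expected_sig = sign_response(prompt, seed, recomputed)
    return recomputed == output and hmac.compare_digest(expected_sig, signature)


if __name__ == "__main__":
    prompt, seed = "Should the agent rebalance into stables?", 42
    output = deterministic_infer(prompt, seed)
    signature = sign_response(prompt, seed, output)
    print("verified:", verify_response(prompt, seed, output, signature))
```

Note that the check only proves consistency with what the operator produced; as argued above, it says nothing about whether the decision was a good one.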
Performance Claims Lack Risk-Adjusted Verification
Giza's numbers sound impressive on the surface. One million transactions. Ten thousand active users. A claimed 2x profitability improvement over static position management. Renç Korzay, Giza's CEO, stated that "100% of them put that user in a more profitable place when they woke up." Source
But what's the benchmark? What time period? What's the Sharpe ratio? Maximum drawdown? Transaction failure rate? Slippage? None of this is disclosed. The methodology remains unclear from the transcript. For institutional allocators, self-reported performance claims without third-party verification and standardized risk-adjusted metrics are unsuitable for client marketing. Period. Any bank considering these products needs independent audits with proper attribution analysis before touching them.
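To make the ask concrete, here is a minimal sketch of two of the missing numbers computed from a daily return series. The returns and the risk-free assumption are invented for illustration; an actual audit would use the agent's full transaction-level history.

```python
import math

# Illustrative risk-adjusted metrics an allocator should demand alongside any
# "2x profitability" claim. The daily return series below is made up.

daily_returns = [0.004, -0.002, 0.006, -0.010, 0.003, 0.001, -0.004, 0.008]
risk_free_daily = 0.0001  # assumed ~3.7% annualized risk-free rate


def sharpe_ratio(returns, rf, periods_per_year=365):
    """Annualized excess return per unit of volatility."""
    excess = [r - rf for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return (mean / math.sqrt(var)) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Worst peak-to-trough loss of a compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst


print(f"Sharpe: {sharpe_ratio(daily_returns, risk_free_daily):.2f}")
print(f"Max drawdown: {max_drawdown(daily_returns):.2%}")
```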
🎟️ Diligence frameworks for on-chain yield—discussed live
Join 500+ allocators, risk officers, and infrastructure teams at the Yield Summit Series 2026.
Cannes • Miami • Amsterdam • Singapore • Abu Dhabi.
Register (code INSTITUTIONAL-YIELD for 10% off) | Sponsorship inquiries
Predictive Models Carry Mis-Selling Liability
Some teams are building AI systems that claim to predict market events. Rec News, running on Giza's verifiable computation infrastructure, reportedly predicted stablecoin depeg events four to five days before the October 10th liquidation. Julien Bouteloup of Stake Capital mentioned, "We had prediction that something was going on... I'm not saying that we can read the future, but you can really analyze signals." Source
Another team Xavier referenced has allegedly predicted seven or eight hacks or depegs accurately. Interesting if true. But "if true" is doing a lot of work in that sentence. These claims lack audited, time-stamped evidence with false positive and negative rates. For banks, marketing predictive capabilities to clients without documented track records creates serious mis-selling liability. The October liquidation saw over $1 billion move from risky DeFi to stable products, while traditional ETF outflows were only $200 million. The divergence suggests correlation models need updating, but that's different from claiming predictive power.
Privacy Infrastructure Remains Production-Immature
The privacy situation is, frankly, a mess. Julien framed it starkly: "Can you imagine? I think one of the biggest human history risks that we are facing today is if OpenAI gets hacked because it's way more than money. It's actually human knowledge, human history." Source
Joseph added context about a recent lawsuit where the New York Times requested 15 million chat records from OpenAI, noting that even "the CISO of OpenAI released their blog post two days ago. They said they are looking into client-side encryption and TEEs." Source EigenLayer plans to run inference products in full TEE using Blackwell GPUs, but that's future capability, not current production. The hardware access constraints are real.
For banks, this creates immediate operational risk. Client data processed through AI agents may be exposed in litigation discovery or breaches. Until privacy guarantees are production-ready, institutions should prohibit transmission of client PII or proprietary data through non-TEE AI infrastructure.
The panel consensus identified underlying DeFi infrastructure, not the AI layer, as the biggest risk. Smart contract vulnerabilities, oracle dependencies, bridge risks for cross-chain execution: these multiply attack surfaces. Renç acknowledged this directly: "I think the biggest risk is the underlying DeFi infrastructure currently, not on the AI side." Source
The technology is moving fast. Rec News has operated for six years. Giza claims to have automated everything they wanted to automate. Xavier expects agent-driven automation of Pendle looping and similar DeFi strategies within months. But institutional demand remains limited due to maturity concerns. As Xavier put it, "As a VC fund, we have capital, and when I'm looking to allocate some of the capital for yield, it's very unlikely for me to probably put this into agent pools right now because they're just not mature enough." Source
That's the honest assessment. The infrastructure exists. The use cases are emerging. But the diligence frameworks haven't caught up. Banks evaluating this space need to establish minimum maturity criteria covering AUM thresholds, track record duration, independent audits, and KYC/AML procedures before any product consideration. The opportunity is real. So are the gaps.
How to Position as an Allocator (Bank, Treasury, Fund)
So you're actually considering this. Fair enough. The infrastructure is live, the use cases are real, but the diligence gap is wide. Here's a framework for moving from "interesting" to "investable" without blowing up your risk budget or your reputation.
Define your principal loss tolerance upfront. Agent pools aren't mature enough for most institutional mandates yet. Xavier said it plainly: even VC funds with risk appetite are hesitant to allocate here today.
Segment product types by risk profile. Staking with slashing-backed accountability (like EigenLayer's model) sits differently than structured DeFi looping strategies. Know which bucket you're in before you wire funds.
Verify custody and developer access controls independently. Can the operator rug you? EigenLayer claims infrastructure prevents unilateral fund access. Get that audited. Don't trust the pitch deck.
Require deterministic, signed inference for any AI-driven execution. Reproducibility enables audit trails. Four models are live with this capability today. If your vendor can't offer it, walk.
Establish reporting cadence and governance escalation paths. Who reviews agent decisions? What triggers a manual override? Build this before deployment, not after the first incident.
Start with a capped pilot. Set explicit stop conditions: drawdown thresholds, transaction failure rates, slippage limits (a minimal guardrail sketch follows this list). Scale only after you've seen a full market cycle, or at least a few months of live data.
Block client PII transmission until TEE infrastructure is production-ready. It's not there yet. Blackwell GPUs are coming, but "coming" isn't "deployed."
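Here is that guardrail sketch for the capped-pilot item. The thresholds, field names, and data structure are assumptions to be calibrated to your own mandate and reporting stack, not a reference implementation.

```python
from dataclasses import dataclass

# Hypothetical stop-condition check for a capped pilot of an AI yield agent.
# All limits below are illustrative placeholders.


@dataclass
class PilotLimits:
    max_drawdown: float = 0.05       # 5% peak-to-trough loss
    max_failure_rate: float = 0.02   # 2% failed transactions
    max_avg_slippage: float = 0.003  # 30 bps average slippage


@dataclass
class PilotStats:
    drawdown: float
    failed_txs: int
    total_txs: int
    avg_slippage: float


def breached_limits(stats: PilotStats, limits: PilotLimits) -> list[str]:
    """Return the stop conditions the pilot has breached (empty list = keep running)."""
    breaches = []
    if stats.drawdown > limits.max_drawdown:
        breaches.append("drawdown")
    if stats.total_txs and stats.failed_txs / stats.total_txs > limits.max_failure_rate:
        breaches.append("transaction failure rate")
    if stats.avg_slippage > limits.max_avg_slippage:
        breaches.append("slippage")
    return breaches


if __name__ == "__main__":
    stats = PilotStats(drawdown=0.07, failed_txs=3, total_txs=120, avg_slippage=0.002)
    breaches = breached_limits(stats, PilotLimits())
    if breaches:
        print("HALT PILOT — breached:", ", ".join(breaches))
```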
🎟️ Yield Summit Series 2026
Where institutional allocators meet infrastructure teams to close the diligence gap.
Cannes (Mar 28-29) • Miami (May 4) • Amsterdam (Jun 1) • Singapore (Oct 5-6) • Abu Dhabi (Dec).
Register as Attendee — Code INSTITUTIONAL-YIELD for 10% off.
Become a Sponsor
Glossary
Slashing
A penalty mechanism in proof-of-stake systems where a validator's staked cryptocurrency is partially or fully confiscated for failing to meet protocol commitments or acting maliciously. Used here to mean the enforcement layer backing AI agent performance guarantees on EigenLayer.
Why it matters: Without clearly defined slashing triggers and coverage caps, institutions cannot model downside exposure or compare this mechanism to traditional insurance products.
Restaking
The practice of committing already-staked Ethereum (or its derivatives) to secure additional services or protocols beyond the base Ethereum network. Validators essentially pledge the same collateral to back multiple commitments simultaneously.
Why it matters: Restaking introduces layered counterparty risk—if an AI agent fails, the recourse depends on stake that may already be encumbered by other obligations.
AVS (Actively Validated Service)
A service or application on EigenLayer that relies on restaked ETH for its security guarantees. AVSs compete for validator backing, and the article notes this competition is accelerating.
Why it matters: As AVS competition intensifies, allocators must assess whether the stake backing any given AI agent service is diluted across too many competing commitments.
Deterministic Inference
A configuration of AI model execution where identical inputs (including a fixed random seed) always produce identical outputs. This contrasts with standard LLM behavior, which typically varies responses even for repeated queries.
Why it matters: Determinism enables reproducible audit trails for compliance, but does not validate whether the model's outputs are financially sound—only that they are consistent.
TEE (Trusted Execution Environment)
A secure, isolated area within a processor that protects code and data from external access, including from the system's own operating system or administrators. Used here to describe planned privacy infrastructure for AI inference.
Why it matters: Until TEE-based inference is production-ready, client data processed through AI agents remains exposed to litigation discovery and breach risks.
Sharpe Ratio
A standard measure of risk-adjusted return, calculated as excess return over a risk-free rate divided by the standard deviation of returns. Higher values indicate better compensation for volatility.
Why it matters: The article flags that Giza's performance claims lack Sharpe ratio disclosure—without it, allocators cannot distinguish skill from leverage or luck.
Depeg
An event where a stablecoin loses its intended price parity (typically 1:1 with USD), trading significantly above or below its target value. Depegs can trigger cascading liquidations across DeFi protocols.
Why it matters: Claims of predicting depegs carry serious mis-selling liability if not backed by audited, time-stamped evidence with documented false positive rates.
Oracle Dependencies
Reliance on external data feeds (oracles) that supply off-chain information—such as asset prices—to on-chain smart contracts. Oracle failure or manipulation can cause incorrect contract execution.
Why it matters: The article identifies oracle dependencies as part of the underlying DeFi infrastructure risk that multiplies attack surfaces for AI agent strategies.
Pendle Looping
A DeFi strategy involving Pendle protocol, where users recursively deposit and borrow against yield-bearing tokens to amplify exposure. The article references this as a candidate for near-term AI agent automation.
Why it matters: Looping strategies compound both yield and risk; automated execution without proper guardrails can accelerate losses during adverse market moves.
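A rough arithmetic sketch of why looping compounds risk, under the simplifying assumption of a fixed loan-to-value ratio with no rate or liquidation effects. The numbers are illustrative and not drawn from any specific Pendle market.

```python
# With loan-to-value ratio ltv, recursive looping approaches 1 / (1 - ltv)
# times exposure on the same capital, so adverse moves scale the same way.

ltv = 0.80
effective_leverage = 1 / (1 - ltv)   # ~5x exposure
underlying_move = -0.03              # a 3% adverse move in the underlying
equity_impact = effective_leverage * underlying_move

print(f"Effective leverage: {effective_leverage:.1f}x")
print(f"Equity impact of a {underlying_move:.0%} move: {equity_impact:.0%}")
```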
Verifiable Computation
Cryptographic techniques (such as zero-knowledge proofs) that allow a third party to confirm that a computation was performed correctly without re-executing it. Used here in the context of proving AI agent behavior.
Why it matters: Verifiable computation provides the technical foundation for holding AI agents accountable, but institutions must verify what exactly is being proven and what remains outside the proof scope.
Bridge Risk
The security exposure created when assets are transferred between different blockchain networks via cross-chain bridges. Bridges have historically been high-value targets for exploits.
Why it matters: Cross-chain execution by AI agents introduces bridge risk as an additional attack vector that compounds underlying DeFi infrastructure vulnerabilities.
Attribution Analysis
A performance evaluation method that decomposes returns into their sources—such as market exposure, sector allocation, or security selection—to identify what actually drove results.
Why it matters: The article calls for independent audits with proper attribution analysis before banks consider AI yield products, to distinguish genuine alpha from market beta or survivorship bias.
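A toy sketch of the simplest attribution step: regressing agent returns on a benchmark to split market beta from residual alpha. The return series are invented; a real audit would use the agent's actual history, an appropriate benchmark, and a richer multi-factor decomposition.

```python
# Toy single-factor attribution: beta = cov(agent, market) / var(market),
# alpha = mean(agent) - beta * mean(market). Data below is made up.

agent = [0.010, -0.004, 0.007, -0.012, 0.009, 0.003]
market = [0.008, -0.005, 0.006, -0.010, 0.007, 0.002]


def mean(xs):
    return sum(xs) / len(xs)


def beta_alpha(agent_r, market_r):
    ma, mm = mean(agent_r), mean(market_r)
    n = len(agent_r)
    cov = sum((a - ma) * (m - mm) for a, m in zip(agent_r, market_r)) / (n - 1)
    var = sum((m - mm) ** 2 for m in market_r) / (n - 1)
    beta = cov / var
    alpha = ma - beta * mm  # per-period residual return
    return beta, alpha


beta, alpha = beta_alpha(agent, market)
print(f"beta: {beta:.2f}, per-period alpha: {alpha:.4%}")
```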
Blackwell GPUs
NVIDIA's next-generation GPU architecture, referenced in the article as the hardware required to run AI inference within TEEs at scale. As of the article's writing, access remains constrained.
Why it matters: Privacy guarantees for AI agent infrastructure depend on hardware that is not yet widely deployed—"coming" is not equivalent to "production-ready" for risk assessment purposes.