2026-06-11

Finance Bets on Transaction Foundation Models: Why Banks Build Their Own Instead of Wiring Up a General LLM

NVIDIA strings Revolut, Mastercard, Adyen, and Stripe into one narrative: the winning model in finance is a specialist trained on a firm's own transaction stream. Proprietary data is the real moat for vertical AI, but parts of this pitch deserve a discount.

foundation-models finance vertical-ai


Summary

NVIDIA’s June 1 official blog post strings four firms (Revolut, Mastercard, Adyen, Stripe) into a single narrative: finance is shifting from “one specialist model per business line” to “one foundation model trained on the firm’s own transaction stream.” It calls this new class of model a transaction foundation model. The core claim is that transformer architectures can be applied to structured, tabular transaction data to learn a unified representation of customer behavior, and that this one representation can drive fraud, credit, and recommendation tasks at once.

The judgment worth taking away is not “transformers won another domain.” It is this: in a data-locked vertical like finance, the moat is not model scale but ownership of a transaction stream nobody else can replicate. That is exactly why these institutions build their own specialist models rather than wiring up a general LLM. Still, this is NVIDIA’s own sales narrative: discount its repeatedly cited figures and its “the architecture is proven” tone before you act on them.

What happened

NVIDIA defines a transaction foundation model as a large-scale AI system trained on billions of financial events (payments, transfers, product interactions, behavioral signals) that turns raw flow into reusable representations. The difference from a traditional fraud model is context: a traditional model evaluates isolated signals, while the foundation model reads the same transaction in the context of timing, device, location, and prior activity. The post’s example is concrete: a payment at midnight means something very different when it is the fourth in 10 minutes, on an unfamiliar device, in a city the customer has never transacted from.
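The contrast between isolated signals and context can be made concrete with a few hand-written features for the post’s midnight example. The `Txn` schema, its field names, and `context_features` are hypothetical, written only for illustration. A transaction foundation model is meant to learn signals like these from the raw sequence instead of having them coded by hand, which is the point of Revolut’s feature-engineering claim.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Txn:
    """One payment event (hypothetical schema, not from any real system)."""
    customer_id: str
    ts: datetime
    amount: float
    device_id: str
    city: str


def context_features(history: list[Txn], current: Txn,
                     window: timedelta = timedelta(minutes=10)) -> dict:
    """Describe `current` relative to the same customer's earlier activity."""
    prior = [t for t in history
             if t.customer_id == current.customer_id and t.ts < current.ts]
    recent = [t for t in prior if current.ts - t.ts <= window]
    return {
        # Rank of this payment inside the window, counting itself.
        "nth_in_window": len(recent) + 1,
        # Never-before-seen device or city for this customer.
        "new_device": current.device_id not in {t.device_id for t in prior},
        "new_city": current.city not in {t.city for t in prior},
        # The isolated signal a per-transaction rule would look at.
        "midnight_hour": current.ts.hour == 0,
    }


# Usage: three payments at 23:52, 23:55, 23:58, then one at 00:01.
base = datetime(2026, 6, 10, 23, 52)
history = [
    Txn("c1", datetime(2026, 6, 3, 12, 0), 40.0, "phone-A", "Vilnius"),
    Txn("c1", base, 20.0, "phone-A", "Vilnius"),
    Txn("c1", base + timedelta(minutes=3), 25.0, "phone-A", "Vilnius"),
    Txn("c1", base + timedelta(minutes=6), 30.0, "phone-A", "Vilnius"),
]
current = Txn("c1", datetime(2026, 6, 11, 0, 1), 500.0, "laptop-X", "Lisbon")
feats = context_features(history, current)
print(feats)
# {'nth_in_window': 4, 'new_device': True, 'new_city': True, 'midnight_hour': True}
```

Each feature here is something an engineer had to think of and maintain; the foundation-model pitch is that the sequence model picks up such patterns without this hand-coding.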

Four deployments carry the substance:

Revolut, working with NVIDIA, built a family of transformer foundation models called PRAGMA, trained on 24 billion events across 26 million user records in more than 100 countries. The stack combines Hopper GPUs, the cuDF library, and Nemotron open models, deployed on Nebius cloud. The boldest claim, from Tadas Kriščiūnas, Revolut’s head of group credit data science, is that feature engineering went from taking weeks or months to taking no time at all.

Mastercard is building a proprietary large tabular foundation model. It is trained on billions of anonymized transactions today, with a target of hundreds of billions, and folds in fraud, authorization, chargeback, merchant location, and loyalty data, using NVIDIA NeMo AutoModel, AWS, and Databricks.

Adyen has deployed at scale, processing $1 trillion in payments, and uses reinforcement learning to lift conversion and cut risk for merchants. Its AI product lead, Dhruv Ghulati, notes that even a 0.1% authorization uplift translates into substantial incremental volume.

Stripe uses the NVIDIA-plus-AWS platform to model full transaction context, and claims it blocked roughly $112 billion in fraud last year while cutting fraud rates by 38% on average.
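Ghulati’s point about Adyen is easy to check with rough arithmetic. This sketch assumes “0.1%” means an absolute 0.1 percentage-point rise in approval rate, and it applies that to the $1 trillion figure as if it were attempted volume; the post specifies neither, so treat the result as an order of magnitude only. The function name is hypothetical.

```python
def incremental_approved_volume(attempted_usd: int, uplift_bps: int) -> int:
    """Extra approved volume from an absolute approval-rate uplift.

    uplift_bps is in basis points of attempted volume: 10 bps = 0.1 percentage
    point. Integer arithmetic avoids float rounding on trillion-dollar figures.
    """
    return attempted_usd * uplift_bps // 10_000


# Illustrative only: treats the $1 trillion figure as attempted volume.
extra = incremental_approved_volume(1_000_000_000_000, 10)
print(f"${extra:,}")  # $1,000,000,000
```

Under those assumptions, a tenth of a point is on the order of a billion dollars of additional approved payments, which is why small authorization gains get attention at this scale.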

NVIDIA also released a Build Your Own Transaction Foundation Model developer example, arguing that any institution can start building transformer embeddings on its own tabular transaction data and plug them into existing pipelines without rebuilding from scratch.
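The developer example itself is not reproduced here. As a rough sketch of what “plug into existing pipelines” can mean, the code below assumes some encoder (not shown) already emits one vector per event, mean-pools those vectors into a single customer vector, and appends the result as extra columns on an existing feature row. The function names, the mean-pooling choice, and the `emb_` column prefix are all hypothetical.

```python
from statistics import fmean


def pool_customer_embedding(event_vectors: list[list[float]]) -> list[float]:
    """Mean-pool per-event vectors into one fixed-length customer vector.

    The event vectors are assumed to come from a sequence encoder that is not
    shown here; mean pooling is one simple choice among several.
    """
    if not event_vectors:
        raise ValueError("need at least one event vector")
    dim = len(event_vectors[0])
    if any(len(v) != dim for v in event_vectors):
        raise ValueError("event vectors must share one dimension")
    return [fmean(v[i] for v in event_vectors) for i in range(dim)]


def augment_feature_row(legacy_features: dict[str, float],
                        embedding: list[float]) -> dict[str, float]:
    """Append embedding dimensions to an existing feature row.

    The downstream model keeps its current inputs and gains new columns,
    so the pipeline is extended rather than rebuilt.
    """
    row = dict(legacy_features)
    row.update({f"emb_{i}": x for i, x in enumerate(embedding)})
    return row


emb = pool_customer_embedding([[1.0, 4.0], [3.0, 0.0]])
row = augment_feature_row({"txn_count_30d": 12.0}, emb)
print(row)  # {'txn_count_30d': 12.0, 'emb_0': 2.0, 'emb_1': 2.0}
```

The design choice here is additive: existing models and their features stay in place, and the learned representation enters as extra inputs that can be evaluated against the current baseline.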

Why it matters

The line that actually changes a judgment is this: in finance, the training data itself is the moat. NVIDIA states it plainly: transaction history is a proprietary asset competitors cannot replicate. This inverts the logic of the past two years of general LLMs. Competition among general LLMs runs on public corpora, model scale, and compute; everyone starts from roughly the same internet text, so the barrier is thin and gets matched fast. Revolut’s 24 billion events and 26 million real users’ behavioral sequences can be neither bought nor scraped, and that is the gap others cannot close.

This answers the core question of why these firms build their own specialist models rather than wire up an off-the-shelf general LLM. There are two reasons. On data, the core risk data is structured tabular flow, not text, and these firms will not hand it to an external model, so “call a third-party API” fails on compliance and data-sovereignty grounds. On architecture, these firms chose to put transformers directly on tabular data and learn representations from raw flow, rather than repurpose a model designed for natural language. Together, these mean the winner in financial vertical AI will not be whoever has the biggest model, but whoever has the most exclusive data and the ability to train a unified representation on its own flow.
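Putting a transformer on tabular data first requires turning each row into tokens. One straightforward pattern, sketched here under our own assumptions, is to emit one field-prefixed token per column and concatenate a customer’s events in time order. The field names, the log-scale amount buckets, and the `[EVT]` separator are illustrative; they are not PRAGMA’s or Mastercard’s actual scheme, which the post does not describe.

```python
import math
from datetime import datetime


def amount_bucket(amount: float) -> int:
    """Quarter-decade log buckets so $30 and $500 land in different tokens."""
    return 0 if amount <= 1 else min(int(math.log10(amount) * 4), 31)


def row_to_tokens(row: dict) -> list[str]:
    """Turn one transaction row into field-prefixed tokens.

    Each field becomes its own token, so a sequence model can attend across
    fields within an event and across events in a customer's history.
    """
    ts: datetime = row["ts"]
    return [
        f"amt:{amount_bucket(row['amount'])}",
        f"hour:{ts.hour}",
        f"dow:{ts.weekday()}",
        f"mcc:{row['mcc']}",
        f"chan:{row['channel']}",
    ]


def customer_sequence(rows: list[dict], sep: str = "[EVT]") -> list[str]:
    """Concatenate a customer's events in time order, each led by a marker."""
    seq: list[str] = []
    for row in sorted(rows, key=lambda r: r["ts"]):
        seq.append(sep)
        seq.extend(row_to_tokens(row))
    return seq


rows = [  # deliberately out of order; customer_sequence sorts by time
    {"ts": datetime(2026, 6, 11, 0, 1), "amount": 500.0, "mcc": "5732", "channel": "web"},
    {"ts": datetime(2026, 6, 10, 23, 58), "amount": 30.0, "mcc": "5411", "channel": "card"},
]
seq = customer_sequence(rows)
print(seq)
```

A real system would map these strings to integer IDs and feed them to a transformer; the sketch stops at the token sequence because that is where tabular data and the transformer architecture meet.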

It also signals a real architectural shift: from “fragmented task-specific models” to “a single unified representation.” NVIDIA’s case for the cost of fragmentation holds up: every new use case adds a model, every new market needs retraining, and models that cannot share context leave value on the table. A unified representation reusable across tasks does cut duplicated feature engineering and model maintenance, and Revolut’s “feature engineering to zero” is the most direct sign of that gain. This is a signal for anyone sitting on large volumes of structured behavioral data, not just banks.
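The shift from fragmented models to one representation can be pictured as a single embedding feeding several small task heads. In this sketch the three-dimensional embedding, the task names, and the head weights are placeholders, not values from any firm in the post; real heads would each be trained on their own labels.

```python
import math

# Hypothetical per-task linear heads: (weights, bias). All read the same
# shared embedding produced once by the foundation model.
HEADS: dict[str, tuple[list[float], float]] = {
    "fraud": ([1.5, -0.5, 0.0], -2.0),
    "credit_default": ([-0.3, 1.0, 0.2], -1.0),
    "offer_uptake": ([0.0, 0.4, 1.2], 0.0),
}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score_all_tasks(embedding: list[float]) -> dict[str, float]:
    """Score every task from the same embedding.

    Adding a task means adding a head, not a new feature pipeline. The flip
    side: any change to the embedding moves every score at once.
    """
    scores = {}
    for task, (weights, bias) in HEADS.items():
        if len(weights) != len(embedding):
            raise ValueError(f"{task}: head expects {len(weights)} dims")
        z = sum(w * x for w, x in zip(weights, embedding)) + bias
        scores[task] = sigmoid(z)
    return scores


scores = score_all_tasks([1.0, 0.5, 0.0])
print(scores)
```

The shared input is what removes duplicated feature engineering, and it is also what couples the tasks together, which matters for the operational risks raised in the builder notes.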

Builder impact

If you build AI in finance or another data-locked vertical, the most concrete takeaway is to first map the scale and exclusivity of your proprietary data. That, not a comparison of model architectures, decides whether building your own foundation model is even possible or worthwhile. Revolut’s scale (24 billion events, 26 million users) is a reference point, but the bar is not the absolute number; it is whether the data is genuinely unobtainable to others and covers long enough behavioral sequences. If the data falls short on either count, training your own transformer foundation model will likely lose to well-tuned task-specific models.
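A first pass at mapping scale and sequence length can be as simple as counting events and customers and checking how deep each customer’s history goes. The sketch assumes an event log of `(customer_id, timestamp)` pairs, and the function name is hypothetical. The post gives no threshold for “long enough,” so none is hard-coded here, and exclusivity cannot be computed from the log at all; it remains a business judgment.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def audit_event_log(events: list[tuple[str, datetime]]) -> dict:
    """Summarise the scale and per-customer history depth of an event log.

    Looks beyond the total count: a large log made of many very short
    histories gives a sequence model little behavior to learn from.
    """
    by_customer: dict[str, list[datetime]] = defaultdict(list)
    for customer_id, ts in events:
        by_customer[customer_id].append(ts)
    lengths = [len(ts_list) for ts_list in by_customer.values()]
    spans_days = [(max(ts_list) - min(ts_list)).days
                  for ts_list in by_customer.values()]
    return {
        "events": len(events),
        "customers": len(by_customer),
        "median_events_per_customer": median(lengths) if lengths else 0,
        "median_history_days": median(spans_days) if spans_days else 0,
    }


events = [
    ("a", datetime(2025, 1, 1)), ("a", datetime(2025, 7, 1)),
    ("a", datetime(2026, 1, 1)), ("b", datetime(2026, 1, 1)),
]
report = audit_event_log(events)
print(report)
# {'events': 4, 'customers': 2, 'median_events_per_customer': 2.0, 'median_history_days': 182.5}
```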

Second, do not treat “wire up a general LLM” as the default and then look for where it falls short. In structured, heavily-regulated core risk work, a general LLM is the wrong tool from the start: it lacks your proprietary data and its architecture is not tuned for tabular sequences. The direction worth evaluating is transformer-on-tabular, and whether one unified representation can replace your current pile of siloed specialist models and cut the duplicated feature engineering.

Third, stay sober: a unified representation is not free. Compressing fraud, credit, and recommendation into one model means that when it fails, it takes down every downstream task rather than one in isolation. Regulatory explainability, version rollback, single points of failure: the post mentions none of these, and they are unavoidable in production. That is exactly the part a vendor narrative skips, and you have to supply it.
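One small, concrete piece of that missing operations work is making the shared dependency explicit. In the sketch below, whose class and function names are hypothetical, each task head records the encoder version it was validated against, so a new encoder cannot reach production while any head is stale, and the pinned versions double as rollback targets. It does nothing for regulatory explainability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskHead:
    """A downstream head that records which encoder version it was validated on."""
    task: str
    encoder_version: str


class EncoderVersionMismatch(RuntimeError):
    """Raised when a deploy would feed a head an embedding it was not validated on."""


def stale_tasks(active_encoder: str, heads: list[TaskHead]) -> list[str]:
    """Return the tasks whose heads were validated against a different encoder."""
    return [h.task for h in heads if h.encoder_version != active_encoder]


def guard_deploy(active_encoder: str, heads: list[TaskHead]) -> None:
    """Block the deploy if any head has not been revalidated on this encoder."""
    stale = stale_tasks(active_encoder, heads)
    if stale:
        raise EncoderVersionMismatch(
            f"encoder {active_encoder} not validated for: {', '.join(stale)}")


heads = [TaskHead("fraud", "v2"), TaskHead("credit", "v1"), TaskHead("recs", "v2")]
print(stale_tasks("v2", heads))  # ['credit']
```

Because every head reads the same embedding, the check treats an encoder upgrade as a change to all tasks at once, which is the single-point-of-failure risk in operational form.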

What to ignore

The first thing to discount is the vendor-reported figures cited as settled fact. Stripe’s $112 billion in blocked fraud and 38% fraud-rate reduction, Adyen’s 0.1% authorization uplift, GFT’s 75% cut in false positives, Revolut beating task-specific models across the board: all of these are self-reported, relayed by NVIDIA, with no third-party audit. They are directionally credible, but treat the magnitudes as marketing numbers, not audited conclusions. A blanket claim like “a single foundation model outperforms task-specific models on every task” almost certainly rests on a chosen set of baselines.

The second is the urgency of “the architecture is proven, the infrastructure is ready, you can do it now.” The post’s closing line, “the data already exists, the architecture is proven, the infrastructure is ready,” is classic sales framing. The cases it relays come from firms like Revolut, Mastercard, and Stripe, with top-tier data science teams and enormous data, yet they are repackaged as “any institution can do this now.” The Build Your Own developer example lowers the cost of starting; it does not lower the real hard parts of data scale, data governance, and model operations.

Third, do not read this as NVIDIA’s neutral industry observation. The whole piece lands on selling GPUs and a software stack: Hopper, cuDF, Nemotron, and NeMo AutoModel recur in every case. Its judgment on the transaction foundation model direction is right, but every success story it cites is also a hardware order, so discount the optimism by that conflict of interest.

FAQ

Can a general LLM do fraud detection and credit scoring directly?

It can technically run these tasks, but it does not fit core risk decisioning. Fraud and credit judgments need a continuous representation of a customer's full transaction sequence; that data is structured tabular flow, not text, and is never handed to an external model. A general LLM has neither the proprietary data nor an architecture tuned for tabular sequences. Every case in NVIDIA's post (Revolut, Mastercard, Stripe) uses a self-trained transformer, not a wired-up GPT-style model.

How much data does it take to train a transaction foundation model?

By the numbers in the post, Revolut's PRAGMA trained on 24 billion events across 26 million users in 100-plus countries; Mastercard uses billions of anonymized transactions today and aims for hundreds of billions; Adyen has processed $1 trillion in payments. The scale itself is the barrier: it comes from years of owned business, cannot be bought or replicated, and that is exactly the moat.

Do transaction foundation models actually beat the task-specific models they replace?

NVIDIA says yes, but weigh the evidence strength. Revolut claims a single foundation model outperforms its task-specific models across credit scoring, fraud, and recommendations; Stripe claims it blocked around $112 billion in fraud last year and cut fraud rates 38% on average. These are vendor-reported figures with no third-party audit. The direction is credible; the exact magnitudes deserve a question mark.

Sources

Why Financial Institutions Are Converging on Transaction Foundation Models to Build Their Own Intelligence / official