Compute and Competition in AI: Different FLOPs for Different Folks

Published February 12, 2026

Public discussion of the compute costs of AI has been dominated by colossal estimates: hundreds of thousands of GPUs, millions of hours of compute, billions of dollars in investment. The work of institutes like Epoch AI is indicative of these trends: their analysis of the largest AI models shows consistent growth of 2-3x per year over the past 8 years, putting costs on track to surpass a billion dollars by 2026.

Although this analysis of the most expensive and “performant” models is certainly important, it can also warp both public perception and governance decisions about AI in general. Alongside the meteoric rise in costs for models advertised as defining a “frontier”, similar leaps have been achieved in doing more with less, and the majority of AI development across scientific and commercial applications still follows a very different logic.

What kinds of AI models are used IRL?

Institutes like Epoch AI define “notable models” as those that “reach state-of-the-art improvement on a recognized benchmark, are highly cited (over 1000 citations), are of historical relevance, and have shown significant use”. While these models can be considered impactful or visible, they are not necessarily representative of most commercial applications of AI, which are usually more context-specific and rely on data as diverse as voice recordings, genomics, images, or tabular data that don’t fit as well with language models.

In fact, sectors from healthcare to manufacturing and finance require models that are specifically tailored to their use cases, not only in terms of domain-specific knowledge, but also in light of issues such as data privacy and regulatory constraints, which can preclude the use of models that are, for instance, hosted in another region. These characteristics make smaller, more efficient, self-contained models more attractive to all but the very largest players in the AI ecosystem. So while the notability criteria used by outlets like Epoch AI may be indicative of the AI use cases of mainstream tech companies that offer services across a breadth of customers and applications, they are far from universal.

How much does it cost to train a current (commercial) AI model?

"Frontier" companies training the largest models have been notably secretive about their costs. Thankfully, investigative work by external actors provides some data on the subject. Notably, according to Epoch AI, the average cost of training a "notable model" has steadily risen between 2022 and 2026, with several entries passing the 100M$-bar in the last year; even without accounting for additional costs incurred by experiments and data elicitation. This makes training these kinds of models prohibitively expensive for not just small and medium enterprises (SMEs), but even for most major commercial actors across industries.


In contrast, the training costs of AI models developed by the overwhelming majority of actors, including models with strong scientific value and commercial applications, are both much more diverse and significantly lower overall. In a recent report, we investigated the training costs of models that are notable on several counts. For example, some are at the frontier of training or size efficiency while matching the performance of the largest commercial models on particular applications, others approach their benchmark performance across the board while being significantly more transparent and adaptable, and others push the boundaries of particular scientific domains or data modalities. We found that training these models ranges from as little as a few thousand dollars to a few tens of millions of dollars at most, in some cases as an expense shared between several collaborating organizations with aligned interests.


The focus on one-size-fits-all models (one size = very large)

In recent public discussions, the rise of AI has been mostly synonymous with the rise to prominence of a particular version of the technology, in which a single model is supposed to support dozens, hundreds, or, according to some, a fully “general” set of any possible applications. This focus has been reflected in a few evolutions in terminology: from "foundation" models, which are supposed to encode general knowledge in pre-training and be adapted to applications later, to "General Purpose AI" models as a regulatory category in the EU AI Act, to what large corporate AI developers currently describe as “frontier” models. To make things even more confusing, most consumers primarily identify these models with the systems that they power (ChatGPT, Gemini, Grok, etc.), which tend to combine or switch between different compute backends and versions of the underlying model.

Given that these models are meant to be as general-purpose as possible, the paradigm for evaluating AI models has also shifted towards equally generic evaluations. Benchmarks such as SWE-Bench and GPQA/MMMLU are intentionally designed to be as broad as possible, and are used to evaluate new models as they are trained and to disclose results in technical reports and media articles. This incentivizes the development of models that are as generic as possible in order to perform well on these benchmarks, which further sustains the general-purpose AI narrative.

As we explored in our recent paper on the broader impacts of the bigger-is-better paradigm in AI, the focus on generalizability and scale is problematic for several reasons. It is environmentally unsustainable, given the amount of energy required to train and deploy models of increasing size and complexity. It narrows the field in terms of the problems that are addressed with AI, since these become limited to those with sufficient data to train larger models. And it exacerbates the concentration of power in the field, disempowering those who lack access to the funds and compute needed to compete with increasingly large models: academic researchers, governments, and nonprofit organizations. The paper, as well as our own related work, also calls for a more transparent approach to calculating and reporting the costs of AI models, and for balancing these costs against the benefits the models are meant to deliver. However, accounting for these costs is more complicated than it may seem at first glance, notably because of the lack of transparency in the field; we explore this in more detail below.

AI is a lot more accessible than you think!

While much of the media attention and hype focuses on the AI equivalent of the most expensive, top-performing Formula 1 cars (i.e. LLMs with a trillion-plus parameters), that doesn’t mean that everyone needs to drive one to pick up their groceries. As with any means of transportation, there are many options for everyday tasks, and members of the AI community should be building and adapting their own models, choosing the combination of data, parameters, and deployment choices that works for their budget and context.

Despite the apparent success and visibility of “frontier” models, they need not be (and arguably should not be) the default choice for most use cases, which have a specific scope, usage, and context. Unfortunately, the additional costs of these systems, be they monetary, environmental, societal, or even in terms of competitive advantage for companies, are seldom made apparent to users, and when they are, they are often incomplete at best. Users may then understandably be tempted to disregard the economic and environmental sustainability advantages of alternative approaches (including open and open source AI 🤗) for the convenience of models that tend to provide more impressive out-of-the-box results.

Look at the options before committing to one

As we show in our recent work (and the discussion above), smaller, less compute-intensive models are both commercially viable and often just as competitive as their larger counterparts in specific use cases, from OCR to genome sequencing and even some of the most popular applications of AI-supported software engineering. This strongly suggests we should flip the script on our approach to AI: rather than starting with what's seen as the "overall best AI model" and looking for applications it might automate, start by asking what it is we want to do, and look for the right (and most efficient) AI tool for it!

The Hugging Face Hub takes this approach by allowing users to filter models not only by the task they perform (including multimodal tasks, time series forecasting, and even robotics!) but also by their parameter count and even their reported evaluation results. This kind of filtering can be a first step in picking a base model for fine-tuning, or in testing different out-of-the-box models to see how the accuracy they report on benchmark datasets translates to real-world performance, as the sketch below illustrates.
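As a rough illustration, the sketch below uses the official `huggingface_hub` Python client to list the most-downloaded models for a given task; the task tag and limit are placeholders, parameter-count and evaluation filters are easiest to apply through the Hub's web interface, and argument details may vary across library versions, so treat this as a starting point rather than a definitive recipe.

```python
# A minimal sketch: listing candidate models for a task with huggingface_hub.
# The pipeline tag and limit below are illustrative placeholders.
from huggingface_hub import HfApi

api = HfApi()

# Retrieve the 20 most-downloaded models tagged for text classification;
# other pipeline tags (e.g. "time-series-forecasting") work the same way.
models = api.list_models(
    pipeline_tag="text-classification",
    sort="downloads",
    direction=-1,  # descending
    limit=20,
)

for model in models:
    print(model.id, model.downloads)
```

Shortlisting a handful of candidates this way, then testing them on a sample of your own data, is usually far cheaper than committing to the largest available model up front.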


In terms of better understanding the environmental and financial cost of deploying these systems, tools like the AI Energy Score project use empirical testing to measure the energy consumption of models across different tasks and modalities, helping users choose the most appropriate model for a given application. The project found a 340,000x difference between the models with the highest and lowest energy use overall, and testing of the most recent cohort of models showed that reasoning models use orders of magnitude more energy, given the large volume of tokens they output. Other initiatives like the Open LLM Leaderboard and MLPerf provide similar insights, also incorporating performance metrics across different tasks and models. We provide an additional overview of these costs for some of the models selected in our study below.

[Figure: deployment costs of selected models]
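For teams that want a first-order estimate of their own workloads, open source instrumentation can help. The sketch below uses the `codecarbon` package (similar instrumentation underlies benchmarking efforts of this kind) to estimate the emissions of a single inference call; the model and prompt are placeholders, and the numbers produced are rough estimates rather than audited measurements.

```python
# A minimal sketch: estimating the carbon footprint of one inference workload
# with the open source codecarbon package. Model and prompt are placeholders.
from codecarbon import EmissionsTracker
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

tracker = EmissionsTracker(project_name="model-comparison")
tracker.start()
try:
    result = generator(
        "The most efficient model for this task is",
        max_new_tokens=50,
    )
finally:
    # stop() returns the estimated emissions in kg of CO2-equivalent.
    emissions_kg = tracker.stop()

print(result[0]["generated_text"])
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```

Comparing such estimates across a few candidate models, on the actual hardware and workload you intend to use, often says more than any single leaderboard number.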

AI's sheer diversity, the plethora of approaches that exist and the ways in which we can use them to solve actual problems, is both an advantage and a challenge. It can make “one-size-fits-all” approaches particularly enticing, since they can, at least in theory, solve many tasks at once, and they have shown incredibly promising results on a variety of benchmarks. But at the end of the day, no single benchmark dataset or energy efficiency metric will ever meaningfully represent the many characteristics of real-world deployment, which comes with constraints in terms of hardware, data, and usability that differ enormously from one organization to another.

Taking the time to shop around for the best overall approach to the task at hand, instead of defaulting to the newest and shiniest externally operated AI product, can go a long way towards both reducing AI’s negative impacts and fostering positive outcomes for the people deploying (and subjected to) AI systems, whether that means picking the most cost-efficient existing model for a specific task, collecting data to adapt an existing system to a deployment context, or developing one's own AI stack from scratch, alone or with other organizations with similar interests and requirements. We hope that providing a more diverse view of what AI can cost will encourage organizations to see where they can (and should!) have an AI strategy that they truly own and depend on.
