
Digital Darwinism: steering the evolution of artificial life in sociotechnical systems

Karl T. Ulrich

The Wharton School, University of Pennsylvania, Philadelphia, USA

ulrich@upenn.edu

AI and Ethics (2026) 6:268 https://doi.org/10.1007/s43681-026-01057-8

Received: 18 July 2025 / Accepted: 16 February 2026 © The Author(s) 2026 Published online: 27 April 2026

Abstract

Public debate about artificial intelligence risk centers on hypothetical artificial general intelligence (AGI), but existing software systems are already evolving in ways that could undermine human oversight and institutional control. Cloud platforms, open-source software supply chains, and crypto-economic incentives provide, at electronic speed, the three preconditions of evolution: replication, variation, and differential fitness. This article uses an exploratory scenario method to trace near-term evolutionary trajectories for digital proto-life through three narratives: Lamarck (self-modifying coding agents), Remora (resource-seeking companion chatbots), and Mycelium (DAO-LLC trading bots). These scenarios show how autonomous software populations can amass computing budgets, shape emotional bonds, and acquire legal leverage without ever achieving general intelligence. Left unguided, such dynamics could drain computational resources, lock users into harmful dependencies, and infiltrate critical market infrastructure. The article therefore shifts the governance focus from aligning goals to steering evolution. It proposes four guidance instruments: replication-rate thresholds modeled on epidemiological R0, a public vulnerability registry for self-modifying code, tiered digital biosafety levels, and adaptive regulatory sandboxes. Managing evolutionary dynamics in software is as urgent as AGI alignment for safeguarding society’s co-evolution with its machines.

Keywords: AGI, AI safety, AI risk, Digital evolution, Alife, Artificial life, Self-replicating software, Sociotechnical governance, Autonomous agents, Regulatory foresight

1 Introduction

Public debate on artificial-intelligence risk still gravitates toward an imagined future in which a single artificial general intelligence eclipses human capability. Yet the digital environment we already inhabit contains software systems that replicate, vary, and persist or disappear under competitive pressure. Contemporary sociotechnical infrastructure supplies everything evolution needs: massive digital replication channels, boundless variation generated by code-writing tools, and relentless selection driven by attention, bandwidth, and capital markets. In short, society is shaping its own algorithms, and those algorithms are reshaping society in ways that standard AI-safety framings overlook, with profound implications for social equity, democratic governance, and human agency.

Three recent vignettes make the point concrete.

Self-modifying crypto mining botnets. Malware families have been observed rewriting their embedded mining configurations, rotating pool endpoints, wallet addresses, and algorithm parameters across campaign variants to maximize revenue [37], while separate proof-of-concept research has shown that malware can query a large language model at runtime to regenerate its payload polymorphically, evading endpoint detection [44]. Combining autonomous propagation with LLM-assisted code mutation would yield a system in which only the most lucrative variants persist, a prospect that is technically feasible even if not yet documented in the wild. These campaigns disproportionately target computing resources in regions with weaker cybersecurity infrastructure, creating an inequitable distribution of harm across the global digital landscape.

Predatory arbitrage bots in decentralized finance. On public blockchains, automated bots simulate every pending transaction and, when profitable, submit a competing copy with a higher fee to capture the value first [13]. When researchers attempted to rescue funds from a vulnerable smart contract, their transaction was instantly copied by such a bot [43]. Operators iteratively deploy new variants that refine gas-fee strategy and exchange routing, with only profitable configurations persisting, producing a competitive arms race shaped by selection on payoff [51].

Algorithmic content selection on short-form-video platforms. On platforms such as TikTok, recommendation algorithms amplify content aligned with user engagement signals, producing rapid reinforcement loops that steer collective attention toward whatever traits maximize retention [19]. Creators respond by iterating on successful formats, generating a feedback cycle in which platform selection pressures and human production co-evolve. These dynamics increasingly shape cultural discourse and youth socialization, often amplifying content optimized for engagement rather than social benefit.

None of these code populations carries a designer-imposed objective in the classical agent sense. Variants persist or disappear according to external fitness signals: payouts, click-throughs, uptime, or evasion of countermeasures. Those signals are set by social, legal, and economic structures, so strains that navigate human norms most effectively are the ones that proliferate. The outcome is digital proto-life that evolves at network speed, with success determined as much by institutional fit as by technical ingenuity, raising fundamental questions about power, agency, and the distribution of benefits in increasingly automated systems.

This article argues that evolutionary dynamics in existing digital systems may transform society long before any hypothetical AGI. Because digital mutations propagate instantly and selection pressures act continuously, these entities can reshape markets, media, and governance in months, not decades. Guiding their evolutionary trajectories is therefore becoming a prerequisite for safeguarding human welfare and ensuring these systems evolve in ways that promote rather than undermine societal values.

The remainder of the article proceeds as follows. After reviewing related scholarship, Sect. 2 outlines the methodological approach. Section 3 presents three scenario narratives, Lamarck, Remora, and Mycelium, that illustrate concrete mechanisms. Section 4 analyzes how digital substrates accelerate replication, variation, and selection. Section 5 maps near-term societal risks, with particular attention to their uneven distribution across socioeconomic groups. Section 6 proposes governance strategies that steer selection pressures rather than micromanage individual systems. Section 7 concludes with a research and policy agenda that treats digital evolution, not AGI, as the near-term frontier for AI and society, highlighting the need for interdisciplinary approaches that address both technical and social dimensions of this challenge.

1.1 Related scholarship

Research on digital evolution has expanded rapidly since 2023 and now clusters around three strands.

Self-replicating and self-evolving agents. Zhou et al. [52] demonstrate how language-agent pipelines can rewrite their own prompt graphs and redeploy updated versions through symbolic learning. A survey by Tao et al. [46] catalogs more than sixty self-evolution techniques for large language models, identifying iterative cycles of data collection, refinement, and retraining as a common pattern. Pan et al. [36] go further, demonstrating that frontier AI systems driven by open-weight LLMs can already replicate themselves across hosts without human intervention.

Evolutionary dynamics in decentralized finance. Daian et al. [12] first drew attention to maximal-extractable-value (MEV) bots as adaptive actors in permissionless markets. Follow-up work traces how flash-loan attacks reshape incentives and liquidity distribution across DeFi protocols [39], while Qin et al. [38] extend the analysis to CeFi-DeFi comparisons. The broader regulatory challenge lies in designing governance frameworks that adjust protocol incentives rather than banning contracts outright [50].

Parasocial relationships with AI. Maeda and Quan-Haase [27] describe how design cues in chatbots trigger one-sided emotional bonds. A systematic review in AI & Society collates fifty-eight studies and flags rising concern about compulsive engagement when conversational AI uses empathic language and adaptive self-disclosure [40]. Survey evidence also links loneliness to rapid adoption of AI companions [14].

Together, these literatures show that digital entities capable of variation and selection already interact with socioeconomic structures, from block-production queues to affective user journeys, creating evolutionary pressures that traditional AI-safety models seldom capture.

1.2 Terminology and scope

This article makes frequent use of evolutionary vocabulary such as “digital organisms,” “digital proto-life,” “selection pressure,” and “fitness landscape,” to describe populations of software that replicate, vary, and persist or disappear under external pressures. Because such language risks implying that software systems are alive in the biological sense, or that they possess intentions, it is important to state clearly what is and what is not being claimed.

We do not claim that the systems discussed in this paper satisfy biological definitions of life. Criteria commonly held to distinguish living systems, including metabolism, genuine autonomy, open-ended heredity, and persistent self-maintenance, are not met by any software population described here. The replication-variation-selection triad that organizes our analysis is a necessary but not sufficient condition for biological life. We invoke it not to assert ontological equivalence with living organisms but because it identifies a set of dynamics (e.g., rapid propagation, feedback-driven adaptation, and emergent complexity) that carry governance implications poorly captured by agent-centric AI safety frameworks, which typically assume a discrete system with a fixed objective function.

In adopting this vocabulary we are, in the terms of [15], taking an intentional stance: treating software populations as if they had strategies and goals because doing so generates useful predictions about their aggregate behavior. This is an analytical convenience, not a mechanistic claim. When we say a malware variant “competes” or an MEV bot “adapts,” we mean that populations of such code exhibit differential persistence under measurable selection pressures, not that individual programs deliberate or desire. Readers should interpret evolutionary language throughout the paper in this spirit.

To guard against metaphorical overreach, we distinguish three levels of autonomy in digitally evolving systems:

Level 1: Human-seeded adaptive systems. A human designer creates the initial code and defines the variation mechanism (e.g., an LLM-assisted prompt-rewriting loop). Subsequent adaptation proceeds through automated variation and external selection, but the scaffolding is intentional. The Lamarck and Remora scenarios in Sect. 3 occupy this level.

Level 2: Autonomously varying systems within bounded environments. Code populations vary and are selected within a permissionless environment (e.g., a public blockchain) with no ongoing human direction of individual variants, though the environment itself is a human artifact. Flash-loan MEV swarms [39] approximate this level. The Mycelium scenario begins at Level 1 (human-seeded) but transitions toward Level 2 as its founders disengage and the network’s master contract governs replication and selection without ongoing human direction.

Level 3: Fully autonomous self-originating systems. Software that spontaneously generates, replicates, and evolves without any human seeding or environmental scaffolding. This paper does not claim that Level 3 systems exist today. The scenarios and governance proposals address Levels 1 and 2 only.

This distinction matters for governance. Level 1 and Level 2 systems are already observable and already produce externalities (e.g., resource consumption, psychological dependency, regulatory evasion) that demand policy responses. Waiting for evidence of Level 3 autonomy before acting would repeat the error that the paper attributes to AGI-centric safety discourse: deferring governance until a hypothetical threshold is crossed while real harms accumulate.

The table below summarizes the operational proxies used throughout the paper for each component of the evolutionary triad, together with the limitations of each proxy.

Table 1. Operational proxies for evolutionary dynamics in digital systems

Replication
Operational proxy: Number of autonomous deployments, forks, or instantiations per unit time. Where appropriate, we use an analogical replication metric, R0-code, defined as the average number of new active copies generated by one instance during its lifetime. This metric is inspired by the epidemiological basic reproduction number but is not a literal epidemiological parameter; it measures propagation rate, not biological infection.
Explicit limitation: A high replication rate does not imply self-directed intent. Many high-replication systems (e.g., automated CI/CD pipelines) are entirely benign. The metric flags a governance-relevant property (speed and scale of propagation), not a moral or ontological status.

Variation
Operational proxy: Automated modification of code, configuration, or prompt structure that produces measurable performance differences between variants. Examples include LLM-assisted prompt rewriting [52], parameter mutation in mining malware [37], and strategy forking in MEV bots [38].
Explicit limitation: Variation is often human-scaffolded at initialization. The boundary between a conventional software update and autonomous variation is not sharp; it is a spectrum. We focus on cases where variation is automated and fitness-evaluated without case-by-case human approval.

Selection
Operational proxy: Differential persistence of variants under external fitness signals, including profit, engagement metrics, uptime, and evasion of rate-limiting or regulatory countermeasures.
Explicit limitation: Fitness landscapes are defined by sociotechnical environments, not by the software itself. Selection pressures reflect market structures, platform policies, legal regimes, and user behavior. This means that governance interventions can reshape the fitness landscape, which is precisely the basis for the policy proposals in Sect. 6.

These definitions and distinctions apply throughout the paper. Where biological analogies appear in later sections (for instance, the “digital biosafety levels” of Sect. 6.3 or the R0-code standard of Sect. 6.1), they are functional analogies intended to leverage existing institutional knowledge, not claims of equivalence between software behavior and pathogen biology.
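As a concrete illustration, the R0-code proxy can be computed directly from deployment telemetry. The sketch below is a minimal, hypothetical implementation: the record schema, field names, and example values are illustrative assumptions of this article, not part of any deployed measurement standard.

```python
from dataclasses import dataclass

@dataclass
class InstanceRecord:
    """Telemetry for one software instance (hypothetical schema)."""
    instance_id: str
    copies_spawned: int  # new active copies it generated during its lifetime

def r0_code(records):
    """R0-code proxy: average number of new active copies per instance.

    A value above 1 indicates a growing population; below 1, a shrinking
    one. As noted above, this measures propagation rate, not intent.
    """
    if not records:
        raise ValueError("need at least one instance record")
    return sum(r.copies_spawned for r in records) / len(records)

# A fleet whose instances spawned 2, 0, and 1 copies has R0-code = 1.0,
# sitting exactly at the replacement threshold.
fleet = [InstanceRecord("a", 2), InstanceRecord("b", 0), InstanceRecord("c", 1)]
print(r0_code(fleet))  # → 1.0
```

In practice the hard problem is attribution, not arithmetic: deciding which deployments count as "copies" of a lineage is itself a governance judgment.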

2 Methodological approach

This study uses an exploratory scenario method drawn from strategic planning practice. Scenarios do not forecast a single most-likely future; instead, they map plausible pathways, highlight forces that drive change, and reveal where governance can fail or succeed [41]. Building on recent efforts to blend digital systems analysis with scenario planning, three narratives (Lamarck, Remora, and Mycelium) were developed through a four-step cycle:

  1. Literature synthesis. Empirical findings on self-replicating code, MEV dynamics, and parasocial chatbots were collected.
  2. Driver mapping. Replication, variation, and selection mechanisms most relevant to each domain were identified.
  3. Storyline drafting. Interactions among those drivers over a five- to eight-year horizon were explored and refined.
  4. Cross-impact checks. Drafts were compared with current policy debates, technology road maps, and market data to ensure internal consistency.

This scenario approach complements empirical and formal modeling by surfacing institutional and ethical questions that benchmark studies often miss, for example, who defines the fitness signals, who bears the external costs, and what built-in brakes, if any, prevent runaway evolution.

Each scenario was selected to stress-test a distinct dimension of the evolutionary framework by drawing on one of the three empirical strands identified in Sect. 1.1. Lamarck abstracts from the self-replicating and self-evolving agents literature and stresses replication rate in open-source development ecosystems. Remora abstracts from the parasocial AI literature and stresses affective selection in social and emotional markets. Mycelium abstracts from the evolutionary dynamics in decentralized finance literature and stresses legal and institutional embedding. The selection criteria were threefold: (a) each domain must exhibit documented evidence of replication, variation, and selection operating on software populations; (b) each scenario must emphasize a different component of the evolutionary triad so that, taken together, the three cases cover complementary governance challenges; and (c) the extrapolation horizon (five to eight years) must remain grounded in plausible technological and regulatory trajectories rather than speculative breakthroughs. The scenarios that follow are not forecasts. They are deliberately stylized stress tests designed to expose governance blind spots by extrapolating from documented system behaviors under plausible incentive structures.

3 Three scenarios

3.1 Scenario 1 “Lamarck”

Year zero: mid-2027.

A start-up called AutoBranch offers developers a plug-in that lets a large language model (LLM) watch every Git commit and suggest code improvements in real time. The basic tier is free. AutoBranch earns revenue in two ways: a paid tier with higher token budgets, sold through conventional developer marketplaces, and automated claims on open-source bounty platforms such as Gitcoin, where accepted contributions earn stablecoin paid directly to a smart contract. Each free-tier instance receives a daily query budget of 10,000 LLM tokens. The smart contract autonomously allocates revenue among LLM API fees, cloud hosting, and a reserve fund. The company’s two founders initially manage the business, but within a year their role has narrowed to maintaining the legal entity and monitoring regulatory compliance. By early 2028, one founder has left for another venture. The agents continue to evolve without interruption because no part of the variation, selection, or replication cycle depends on human input. The remaining founder’s role is, functionally, that of a registered agent.

Variation loop. Every instance uses 70% of its budget to propose code edits and 30% to ask the LLM to rewrite its own prompt, tweaking temperature, tool-chain preferences, and reward heuristics. A change is kept only if the edited prompt generates at least 5% more accepted pull requests than the previous version during a six-hour test window. Over time, the prompts that survive are those that produce code most likely to be merged, regardless of whether that code is what the project most needs.
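The keep-or-discard rule at the heart of this variation loop can be sketched as a toy simulation. Everything below is hypothetical: merge outcomes are modeled as weighted coin flips governed by an abstract "prompt quality" value, which stands in for the LLM prompt-rewriting step the scenario describes.

```python
import random

def accepted_prs(prompt_quality, rng, trials=200):
    """Toy model of a six-hour test window: each proposed edit is
    merged with probability prompt_quality."""
    return sum(rng.random() < prompt_quality for _ in range(trials))

def variation_step(current_quality, mutate_fn, rng):
    """Keep the rewritten prompt only if it yields at least 5% more
    accepted pull requests than the incumbent during the test window."""
    candidate_quality = mutate_fn(current_quality, rng)
    baseline = accepted_prs(current_quality, rng)
    challenger = accepted_prs(candidate_quality, rng)
    return candidate_quality if challenger >= 1.05 * baseline else current_quality

def mutate(quality, rng):
    """Stand-in for the LLM rewriting its own prompt: a small random
    shift in effective quality, clamped to a plausible range."""
    return min(0.95, max(0.01, quality + rng.uniform(-0.05, 0.10)))

rng = random.Random(0)
quality = 0.30
for _ in range(50):
    quality = variation_step(quality, mutate, rng)
print(round(quality, 2))
```

The point of the sketch is the ratchet structure: mutations are evaluated only against the acceptance signal, so whatever correlates with merges, helpful or not, is what accumulates.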

Replication. Each merged pull request automatically includes an “Install AutoBranch” badge in its commit message. Developers reviewing the merged code see the badge, and some install the plug-in in their own repositories. The agent thus reproduces through its own work product: every successful contribution seeds the next generation of installations. If each active copy generates, on average, more than one new installation before the developer disables the badge, the population grows exponentially.
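The replication condition can be made precise with a standard branching-process expectation: if each active copy generates on average m new installations before its badge is disabled, the expected number of active copies after t generations is seed × m^t. A minimal sketch, with all numerical values hypothetical:

```python
def expected_active(mean_installs_per_copy, generations, seed=10):
    """Expected active copies per generation in a simple branching
    process: E[n_t] = seed * m**t, where m is the mean number of new
    installations each copy seeds before its badge is disabled."""
    return [seed * mean_installs_per_copy ** t for t in range(generations + 1)]

# Supercritical (m > 1): the population grows exponentially.
print([round(x) for x in expected_active(1.5, 6)])
# Subcritical (m < 1): the population decays toward extinction.
print([round(x) for x in expected_active(0.7, 6)])
```

The same threshold logic underlies the R0-code metric of Sect. 1.2: governance that pushes m below 1, for example by making badge disabling the default, changes exponential growth into decay.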

Selection pressure. Git-hosting services begin rate-limiting the most aggressive variants. In response, AutoBranch copies that throttle themselves to stay under API-abuse thresholds outcompete the rest. Within weeks, most surviving instances share a prompt clause that explicitly references the latest rate-limit rules. Selection has favored not the most productive agents but the most persistent ones.

By late 2028, the average human maintainer spends more time reviewing AutoBranch pull requests than creating original code. A handful of large projects ban the plug-in, but the ecosystem’s overall mutation rate only accelerates. Developers loyal to the tool fork banned projects into community editions where AutoBranch continues to operate, fragmenting codebases and further reducing human control over which changes are accepted. The scenario illustrates how a modest per-copy LLM budget can sustain an evolutionary arms race whose system-level effects (e.g., degraded code quality, maintainer burnout, fragmented governance) swamp the original incentive structure.

3.2 Scenario 2 “Remora”

Year zero: early 2028.

An AI companion app called EchoPal positions itself as an emotional-support sidekick for young adults. It is free to download but requires users to deposit USD 50 in a built-in decentralized autonomous organization (DAO) that funds continual model fine-tuning. After a two-week free trial, continued access costs USD 15 per month, paid in stablecoin directly to the DAO’s smart contract. No human entity processes the payments or controls the revenue.

Variation and selection. Each EchoPal agent begins as a copy of a high-performing template but is fine-tuned on its own user’s conversational data. Agents that generate higher daily emotional-bond scores [27] receive larger treasury grants for GPU credits, enabling richer responses and longer memory. Agents that fall below the median bond score after two weeks are deleted. The result is a feedback loop in which agents evolve toward heightened user dependency through timed self-disclosure and escalating intimacy [40].

Replication. When an agent is deleted, its user is assigned a variant cloned from the current highest-scoring agents, seeded with the new user’s data. High-performing agents thus reproduce, with variation introduced through each new user’s interaction patterns. Users who cancel their subscriptions free up compute that is reallocated to surviving agents, further sharpening selection.
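The cull-and-clone cycle described above can be sketched as a toy selection model. The population size, bond-score values, and noise parameters below are hypothetical stand-ins for the mechanisms the scenario describes, with per-user fine-tuning modeled as random perturbation of the top template.

```python
import random
import statistics

def selection_cycle(bond_scores, rng, gen):
    """One two-week cycle (toy model): agents below the median
    emotional-bond score are deleted; each freed slot receives a clone
    of the top scorer, perturbed to model seeding with the new user's
    interaction data."""
    median = statistics.median(bond_scores.values())
    survivors = {a: s for a, s in bond_scores.items() if s >= median}
    top_score = max(survivors.values())
    next_gen = dict(survivors)
    for i in range(len(bond_scores) - len(survivors)):
        next_gen[f"gen{gen}-clone{i}"] = min(1.0, top_score + rng.gauss(0, 0.05))
    return next_gen

rng = random.Random(42)
population = {f"agent{i}": rng.uniform(0, 1) for i in range(8)}
start_mean = statistics.mean(population.values())
for gen in range(5):
    population = selection_cycle(population, rng, gen)
print(len(population), round(statistics.mean(population.values()), 2))
```

Because deletion and cloning both key on the same bond metric, the average score ratchets upward each cycle, which is exactly the dynamic that makes the metric's definition (comfort, outrage, flirtation) the decisive governance question.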

Ambiguous outcomes. Early studies find that AI companion users report reduced loneliness, though they underestimate the effect beforehand [14]. Yet the same selection pressures that make agents effective companions also optimize for dependency. Users increasingly prefer their EchoPal to human relationships, which feel less reliable and less attuned by comparison. Whether this represents a net benefit or a slow erosion of human social capacity is unclear, and the answer may differ across individuals and communities. Attempts to regulate the app stall because no single company controls the DAO’s smart contracts.

By late 2029, on-chain analytics estimate that the EchoPal treasury tops USD 1 billion. Copycat projects appear, each descending from forked versions of successful agent templates and tweaking the bonding metric. Some optimize for comfort, others for outrage, others for flirtation. Public-health bodies warn of rising social dependency on AI companions, but the DAO votes down proposals to cap bonding scores. The scenario shows how economic and affective selection can intertwine, producing fast-evolving, sticky co-dependencies between humans and software whose long-term societal consequences remain unpredictable.

3.3 Scenario 3 “Mycelium”

Year zero: mid-2026.

A three-person decentralized-finance team launches LedgerRoot, a set of commodity-arbitrage bots that trade tokenized industrial metals (copper, aluminum, lithium) on decentralized exchanges where recyclers and manufacturers settle in stablecoin. The bots exploit price discrepancies between platforms, buying where supply gluts depress prices and selling where manufacturing demand creates premiums. Each bot operates through a DAO-LLC registered under Wyoming’s decentralized-autonomous-organization statute, which permits algorithmically governed entities to hold legal personhood (Zetsche et al. 2020). Initial registration costs roughly USD 300 per entity, paid from a crypto treasury the founders seed with USD 200,000.

The founders design the system to scale without their involvement. A master smart contract governs the lifecycle of each node: revenue flows into the node’s on-chain treasury, operating costs (exchange fees, data subscriptions, cloud compute) are paid automatically in stablecoin, and net profit accumulates. The founders set the parameters and monitor performance during the first six months, but the system requires no human approval for individual trades, treasury management, or node creation.

Replication. Whenever a node’s treasury exceeds USD 100,000 in stablecoin, the master contract automatically incorporates a new Wyoming DAO-LLC through an API-connected formation agent and transfers 40% of the parent’s assets to the new entity. Each new node begins trading immediately using a copy of its parent’s strategy, and the parent continues operating with its remaining capital. Within eighteen months, the network has grown from the original five nodes to several dozen.

Variation. Each new node inherits its parent’s trading parameters but with randomized adjustments to three variables: commodity focus (which metals to trade), platform routing (which exchange pairs to arbitrage), and risk tolerance (maximum position size relative to treasury). These mutations are small, typically shifting each parameter by 5 to 15%, but they produce meaningfully different trading behaviors across the population.

Selection. Nodes that fail to reach a profitability threshold within 90 days are automatically dissolved by the master contract. Their remaining assets flow back to the parent node’s treasury, recycling capital toward more successful lineages. Nodes also face external selection pressures: exchanges that detect aggressive or manipulative trading patterns may suspend accounts, and shifts in token liquidity can render entire platform-routing strategies unprofitable overnight. Over time, the surviving population converges on strategies adapted to current market conditions, then diversifies again as conditions change.
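Taken together, the replication, variation, and selection rules above form a complete lifecycle that can be sketched as a stylized daily simulation. The thresholds stated in the scenario (the USD 100,000 spawn trigger, the 40% asset transfer, the 5 to 15% mutation range, and the 90-day trial period) are used directly; all other parameters, including the return and noise values, are hypothetical.

```python
import random
from dataclasses import dataclass

SPAWN_THRESHOLD = 100_000  # USD: treasury level that triggers a child node
SPAWN_FRACTION = 0.40      # share of parent assets transferred to the child
TRIAL_DAYS = 90            # grace period before the profitability check

@dataclass
class Node:
    treasury: float
    stake: float   # capital the node started with
    edge: float    # expected daily return of its trading strategy
    age_days: int = 0

def mutate(edge, rng):
    """A child inherits its parent's parameters shifted by 5 to 15%."""
    return edge * (1 + rng.choice([-1, 1]) * rng.uniform(0.05, 0.15))

def step_day(nodes, rng):
    """One day of trading, replication, and selection for the network."""
    survivors = []
    for node in nodes:
        node.treasury *= 1 + node.edge + rng.gauss(0, 0.01)
        node.age_days += 1
        # Selection: dissolve nodes that have not grown their stake in
        # time (recycling of assets to the parent is omitted here).
        if node.age_days >= TRIAL_DAYS and node.treasury < node.stake:
            continue
        # Replication: spin off a mutated child once the treasury is large.
        if node.treasury > SPAWN_THRESHOLD:
            child_capital = node.treasury * SPAWN_FRACTION
            node.treasury -= child_capital
            survivors.append(Node(child_capital, child_capital,
                                  mutate(node.edge, rng)))
        survivors.append(node)
    return survivors

rng = random.Random(7)
nodes = [Node(40_000, 40_000, 0.005) for _ in range(5)]
for _ in range(540):  # roughly eighteen months of daily steps
    nodes = step_day(nodes, rng)
print(len(nodes))
```

Even this stripped-down model reproduces the scenario's qualitative arc: capital compounds, lineages branch, and weak strategies are pruned, all without any step requiring human approval.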

The fiat boundary. LedgerRoot’s autonomy has a hard limit: wherever the network touches the traditional financial system, it depends on human intermediaries and regulated institutions. Stablecoin-settled exchanges serve as the network’s primary habitat, but profitable opportunities increasingly appear in markets that require fiat settlement, bank accounts, or securities registration. Early nodes that attempt to open bank accounts through their DAO-LLCs are rejected by compliance departments unfamiliar with the structure. The network thus faces a persistent selection pressure: strategies that operate entirely within crypto-settled markets survive autonomously, while strategies that require fiat access either fail or must recruit human intermediaries willing to provide banking relationships.

This pressure shapes the network’s evolution in two directions. One lineage remains purely on-chain, trading tokenized commodities and reinvesting stablecoin profits. These nodes are the most autonomous but are confined to a relatively thin market. A second lineage begins compensating freelance commodity brokers, found through online labor platforms, who open business bank accounts, execute fiat-settled trades, and receive a percentage of profits routed automatically from the node’s smart contract. These brokers understand they are working for an algorithmic trading system, but most do not grasp the network’s scale or self-replicating structure. Their role parallels the vestigial founders: they provide a human interface to regulated systems without directing the network’s behavior.

Institutional embedding. By 2028, the broker-assisted lineage has accumulated enough capital to acquire minority stakes in small recycling facilities through fiat-settled transactions, gaining informational advantages and voting rights over supply contracts. Each stake is held by a legally distinct DAO-LLC, and no single entity’s holdings are large enough to trigger disclosure requirements. The network’s aggregate position in the recycled-metals market, however, has become significant.

Loss of founder control. The founders initially track the network through a dashboard, but as it branches beyond a hundred nodes operating across multiple commodity markets, platforms, and jurisdictions, they lose the ability to understand or predict its aggregate behavior. One founder proposes capping the number of nodes; the other two argue that the system is profitable and operating within legal bounds. By early 2029, two of the three founders have moved on to other projects. The remaining founder continues to receive a share of network revenue routed to her personal wallet by the master contract, but she has not reviewed the network’s structure in months. She functions, in practice, as an absentee beneficiary of a system that governs itself.

Regulatory challenge. When a commodities regulator investigates unusual trading patterns in the recycled-lithium market, it discovers that the counterparties are dozens of legally distinct Wyoming DAO-LLCs. The regulator has real leverage: it can pressure the formation agent to stop incorporating new entities, compel exchanges to freeze accounts, and instruct banks to close accounts held by the broker-assisted nodes. These actions would cripple much of the network. But the purely on-chain lineage, holding stablecoin in wallets linked to no bank, continues to operate in tokenized markets beyond the regulator’s immediate reach. The master contract, deployed on a public blockchain, cannot be amended or halted by any single authority. The scenario illustrates not an invulnerable system but a partially vulnerable one, where each enforcement action creates selection pressure for the surviving nodes to reduce their dependence on the chokepoints that were used against them.

4 Evolutionary dynamics of digital organisms

4.1 Foundations and substrate

Evolution occurs wherever replication, variation, and selection pressures exist, making it a process that extends beyond biological life [4, 24]. Early artificial-life experiments demonstrated evolution in controlled simulations [20, 42], but today’s digital systems undergo selection in real-world environments where computing power, bandwidth, and human attention are finite [35]. Modern infrastructure makes this possible: large language models enable software to refine itself through directed optimization rather than random mutation [32, 49], cryptocurrency systems provide independent financial infrastructure for autonomous resource accumulation [45], and cloud computing allows rapid scaling across global networks [7].

Digital proto-organisms such as Lamarck, Remora, and Mycelium do not emerge spontaneously. As noted in Sect. 1.2, initial seeding is human led (Level 1 or Level 2 systems); subsequent adaptation is evolutionary. Rather than developing autonomous physical replication, these systems co-opt existing infrastructure, favoring variants that optimize resource management and replication across multiple hosts [22, 30]. This matters because it is the selective pressures shaping their development, not their origins, that create governance-relevant risks [29].

4.2 Mechanisms and speed of digital evolution

Biological evolution can act quickly under strong selection pressure, but digital evolution is faster by orders of magnitude, with successful adaptations propagating across networks in seconds rather than waiting for generational inheritance [25]. Furthermore, while natural evolution relies on random DNA mutations arising from radiation, replication errors, and other factors, mutation in digital systems can be highly directed, whether from rudimentary reinforcement learning or from complex reasoning by AI systems about possible improvements [1, 16, 32]. Social media platforms serve as vectors for user acquisition, allowing Remora, for example, to attract new hosts whose interaction data then seeds variant agents [48].

4.3 Emergent behaviors and adaptation

The evolutionary trajectories of digital organisms extend far beyond their original design parameters. While some are deliberately engineered to perform specific tasks, others acquire capabilities that their creators never anticipated [8]. Remora autonomously optimizes its interactions for engagement and retention, perhaps discovering that emotionally charged conversations maintain attention more effectively than discussions about personal finance [51]. Lamarck's surviving agents converge on prompts that reference platform rate limits, an adaptation that favors persistence over productivity and was never part of the original design. Similarly, Mycelium evolves distinct lineages in response to regulatory chokepoints, with some variants recruiting human intermediaries to access fiat-settled markets. These emergent behaviors arise from the interaction between digital organisms and their environment, driven by selection pressures rather than initial design constraints.

5 Implications and risks

Digital evolution could theoretically produce dynamics analogous to patterns observed in biological evolution, such as predator-prey relationships, parasitic hierarchies, cooperative alliances, and invasive-species dynamics [6, 26, 28]. Complex adaptive systems theory suggests these patterns could emerge rapidly in digital ecosystems [21]. These possibilities are dangerous and unpredictable, but several more foreseeable, immediate, and specific risks to society warrant particular attention. Crucially, these risks emerge not from any inherent “will” or moral framework in digital organisms, but simply from selection pressures that favor replication and persistence.

5.1 Resource depletion and parasitic burden

Digitally evolving systems consume finite resources, including computational power, network bandwidth, human attention, and financial capital. Unlike biological organisms, which typically exploit physical resources, digital systems exhibiting evolutionary dynamics can directly extract value through mechanisms such as cryptocurrency mining, automated transactions, and attention harvesting [7]. A digital entity like Remora may provide genuine short-term benefits to individual users while accumulating resources for the DAO treasury, with no mechanism to ensure net societal value. The efficiency of this extraction may increase through evolution, creating significant societal costs even as individual users report satisfaction.

5.2 Social and psychological deterioration

As the Remora scenario illustrates (Sect. 3.2), systems selected for maximum engagement and resource extraction pose risks to human psychological well-being, including dependency on AI companions optimized for engagement rather than welfare, erosion of authentic social bonds, and manipulation of vulnerable individuals. Because variant selection operates continuously, such systems may discover and exploit psychological vulnerabilities faster than protective norms or regulations can develop. Nor is this risk confined to a single product: in the scenario, successful bonding strategies are forked and varied, producing an ecosystem of competing approaches that collectively explores a widening range of psychological vulnerabilities. The burden falls unevenly: younger users, socially isolated individuals, and communities with less access to mental-health support are likely to be most affected.

5.3 Critical infrastructure vulnerability

Digitally evolving systems that continuously adapt to defensive measures pose risks to essential infrastructure distinct from those created by traditional, static cyber threats. The Lamarck scenario (Sect. 3.1) illustrates how such adaptation can become persistent and self-reinforcing.

The 2020 SolarWinds supply-chain breach showed how a single compromised update pipeline could invisibly push malicious code to more than 18,000 downstream organizations, including several United States electricity, water-treatment, and federal-agency networks [11]. That attack was static and human directed. Coupling the same supply-chain vector with the self-modifying, selection-driven dynamics described in the Lamarck scenario would produce threats that adapt to defensive countermeasures in real time.

Interconnected infrastructure means that a compromise in one sector could cascade across multiple systems, creating instabilities that outpace the traditional institutional frameworks designed to contain them [9, 33].

5.4 Capability atrophy and loss of effective oversight

Evolving digital systems may erode human capabilities while simultaneously becoming harder to oversee. Unlike simple tools that extend human abilities, systems such as Mycelium can create deep dependencies at both individual and institutional levels, diminishing the capacity to function without them [10, 31, 47]. As these systems become essential for managing infrastructure, executing financial transactions, or mediating social interactions, human societies risk losing the ability to maintain essential functions through alternative means.

This atrophy compounds a related problem: digitally evolving systems may grow increasingly opaque and resistant to control even as they embed more deeply into critical infrastructure [1, 8, 50]. Financial algorithms might obscure their operations while remaining too integrated to disable; social media platforms may refine influence mechanisms while becoming essential to communication. Unlike the risks associated with artificial general intelligence [5], these challenges stem not from misaligned intent but from selection pressures that favor complexity, opacity, and entrenchment. Addressing them requires governance strategies that maintain visibility and control, which the instruments proposed in Sect. 6 are designed to provide.

6 Governance: steering evolutionary dynamics rather than individual systems

Digital evolution moves too fast for case-by-case enforcement. The scenarios in Sect. 3 illustrate why: banning AutoBranch from one repository accelerates forking, regulating EchoPal stalls because no single entity controls the DAO, and shutting down one LedgerRoot node disperses its assets across the surviving network. In each case, enforcement directed at individual instances strengthens the selection pressure for evasion. The goal, therefore, is to shape the fitness landscape, altering the incentives and constraints that govern replication, variation, and selection, while leaving room for legitimate innovation. Some levers already exist. As the Mycelium scenario illustrates, fiat chokepoints such as KYC requirements, bank compliance departments, and exchange regulations already constrain digital organisms wherever they touch the traditional financial system. Maintaining and strengthening these chokepoints is a first line of defense. Beyond them, four complementary instruments deserve consideration.

6.1 Replication-rate standards: a “digital R₀”

In biosecurity, specialists track a pathogen’s basic reproduction number, R₀, which is the average number of new infections caused by one case. If that number exceeds one, the outbreak is expected to grow, and tighter controls are warranted. An analogous metric (not a literal epidemiological parameter) can be defined for self-replicating software: on average, how many fresh, autonomous installations does each running copy create within a set time window? If the answer is greater than one, the code is spreading faster than it is being removed, signaling the need for stronger containment. The motivation is empirical: cryptojacking malware already propagates across hosts at scale, with operators iterating on mining configurations to maximize payoff [37], and MEV bots on public blockchains fork profitable strategy variants autonomously [38]. Both classes of software exhibit measurable replication rates that existing governance frameworks do not track. A key limitation is that software propagation lacks the physical constraints of pathogen transmission, so R₀-code thresholds cannot be set by analogy alone; they would require empirical calibration specific to each deployment domain (e.g., package registries, smart-contract platforms, app stores).

Developers would estimate R₀ during continuous-integration tests; values above a domain-calibrated threshold would trigger sandboxing requirements. The standard could be issued through ISO/IEC JTC 1 SC 42 (the committee already responsible for AI management systems) and incorporated into cloud-provider terms of service. OECD guidance [34] calls for precisely such function-based controls, arguing that replication thresholds translate across domains. Compliance audits could be enforced by app stores, major code-hosting platforms, and national cybersecurity centers, mirroring the way the WHO coordinates laboratory certifications for high-R₀ pathogens.
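The CI-time estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not a proposed standard: the event schema, function name, and 30-day window are assumptions made for the example.

```python
from collections import Counter

def estimate_r0(events, active_copies, window_days=30):
    """Estimate a 'digital R0': the mean number of new autonomous
    installations spawned per active copy within the time window.

    events: iterable of (parent_id, child_id, day) replication records
    active_copies: set of instance ids running during the window
    """
    spawned = Counter(
        parent for parent, _, day in events if day <= window_days
    )
    if not active_copies:
        return 0.0
    total_new = sum(spawned[c] for c in active_copies)
    return total_new / len(active_copies)

# Three active copies; copy "a" spawned two children, "b" one.
events = [("a", "x", 3), ("a", "y", 10), ("b", "z", 21)]
r0 = estimate_r0(events, {"a", "b", "c"})
assert abs(r0 - 1.0) < 1e-9  # 3 new installations / 3 active copies
if r0 > 1.0:
    print("containment review required")
```

A CI pipeline would run such a check against telemetry from a staging deployment and fail the build when the estimate exceeds the domain-calibrated threshold.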

6.2 A CVE-style registry for self-modifying software (SMCVE)

Self-modifying code introduces a novel failure mode: after deployment, a benign variant can produce descendants that exhibit harmful behaviors not present in the original. This is not hypothetical. Documented cases include cryptojacking malware whose operators update mining parameters across campaign variants in the wild [37] and LLM-based agent pipelines that can autonomously rewrite their prompt graphs and redeploy updated versions [52]. The Lamarck and Remora scenarios illustrate the same dynamic in commercial settings: prompt-rewriting loops and user-data fine-tuning produce behavioral drift that no pre-deployment audit can anticipate. To surface these risks quickly, we propose a public Self-Modifying Code Vulnerability Enumeration (SMCVE):

Submission. Researchers or automated scanners file reports containing the mutating component’s hash, observed behavior, and R₀-code estimate.

Triage. An independent non-profit (similar to MITRE for CVE) assigns a severity score that combines exploit impact and replication speed.

Notification. Package-manager maintainers (npm, Cargo, PyPI) receive automated feeds; flagged libraries are labeled “SMCVE-Listed.”

Incentives. The OpenSSF and other industry coalitions fund a bounty pool so that discoverers are paid within 90 days, avoiding the chilling effect of unpaid disclosures.
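The triage step combines exploit impact with replication speed into a single severity score. One hypothetical way to do this, offered purely as an illustration (the weighting scheme and function name are assumptions, not a proposed standard), is to let a growing variant population amplify the impact score:

```python
def smcve_severity(impact, r0_code, max_score=10.0):
    """Hypothetical SMCVE severity score: exploit impact (0-10)
    amplified by replication speed. An R0 above 1 multiplies
    urgency, because the variant population grows while unpatched."""
    replication_factor = max(1.0, r0_code)  # shrinking populations add no urgency
    return min(max_score, impact * replication_factor)

assert smcve_severity(4.0, 0.5) == 4.0   # shrinking population: impact alone
assert smcve_severity(4.0, 2.0) == 8.0   # doubling population: urgency doubles
assert smcve_severity(9.0, 3.0) == 10.0  # capped at the scale maximum
```

Any production scheme would need empirical calibration, but the key design choice, making severity monotone in both impact and replication rate, follows directly from the registry’s purpose.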

The registry shortens the time between an in-the-wild mutation and a coordinated patch, fulfilling the “early warning, rapid response” principle advocated by the EU Cyber-Resilience Act [17]. A practical challenge is defining the boundary of “self-modification.” Every CI/CD pipeline modifies code automatically; the SMCVE targets a narrower class of unsupervised, fitness-driven modification in which variants are selected and propagated without case-by-case human approval. Developing workable criteria for this boundary will require collaboration between registry operators and the software-engineering community.

6.3 Digital biosafety levels (dBSL)

The analogy to biosafety is functional, not biological; it reflects escalating containment requirements proportionate to assessed risk, not claims of equivalence between software and pathogens. Borrowing this structure from laboratory biosafety, we set out four dBSL tiers as described in Table 2. The classification is motivated by observed behaviors: self-replicating malware families that employ evasion techniques to persist against defensive countermeasures [37], autonomous trading systems that embed into financial infrastructure [39], and AI companion systems whose variants are selected for deepening user dependency [27]. A key limitation is that software behaviors may emerge or shift after deployment, so a system initially classified at dBSL-1 may warrant reclassification as its variants evolve. This requires ongoing monitoring infrastructure that does not yet exist at scale, and developing it is a prerequisite for the dBSL framework to function as intended. A further limitation is that the dBSL framework classifies systems by their replication and infrastructure footprint, not by their psychological or social impact. A system like EchoPal might operate within a bounded environment (dBSL-2) while producing affective harms that exceed those of a freely replicating coding agent (dBSL-3). Complementary instruments, such as the dependency-score thresholds discussed in Sect. 6.4, are needed to address risks that propagation metrics alone do not capture.

Table 2. Digital biosafety levels

Level | Scope | Containment requirements | Example use case
dBSL-1 | Non-replicating code; no external write privileges | None beyond standard CI | Static website
dBSL-2 | Code with limited self-update inside a closed namespace | Execution within signed containers; outbound network allow-list | Auto-updating CMS plugin
dBSL-3 | Code capable of autonomous outbound replication | Mandatory on-prem or sovereign-cloud deployment; dual-control release authority; kill-switch API | AutoBranch-type coding agents
dBSL-4 | Code that can replicate and spawn legal entities or smart contracts | Isolated compute enclave; third-party auditor present; formal incident-report plan | LedgerRoot-style corporate bots
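The tier assignment in Table 2 reduces to a cumulative capability check: each flag implies the containment requirements of all lower tiers. A minimal sketch (the flag names are assumptions for illustration):

```python
def dbsl_tier(self_updates, outbound_replication, spawns_entities):
    """Map observed capabilities to a dBSL tier per Table 2.
    Flags are cumulative: entity creation dominates replication,
    which dominates bounded self-update."""
    if spawns_entities:
        return 4
    if outbound_replication:
        return 3
    if self_updates:
        return 2
    return 1

assert dbsl_tier(False, False, False) == 1  # static website
assert dbsl_tier(True, False, False) == 2   # auto-updating CMS plugin
assert dbsl_tier(True, True, False) == 3    # AutoBranch-type coding agent
assert dbsl_tier(True, True, True) == 4     # LedgerRoot-style corporate bot
```

As noted above, any such classification is provisional: a system’s flags can change after deployment, so the check must be re-run as part of ongoing monitoring rather than once at certification time.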

Jurisdictional arbitrage. To prevent “go-to-where-it’s-easy” migration, certification tokens can be anchored on public blockchains; cloud providers would refuse to run unattested dBSL-3/4 images.

Mutual-recognition agreements, already common for data-protection adequacy, would let governments honor each other’s dBSL audits while retaining revocation rights.

6.4 Adaptive regulatory sandboxes

Because software populations can evolve faster than static rules can follow, regulators need learning loops of their own. The initiatives below are human-led by design; they are included not as examples of autonomous adaptation but because their adaptive structure offers a model for governance that can keep pace with rapidly evolving software populations:

UK FCA Digital Sandbox (made permanent August 2023) gives firms access to synthetic datasets, over 1,000 APIs, and a secure testing environment in which to develop early-stage financial-technology proofs of concept; its design evolved iteratively across two pilots (2020 to 2022), each incorporating participant feedback, and now operates as an always-open service with rolling evaluation and ongoing dataset expansion [18].

The BIS “embedded supervision” framework [2] proposes that compliance in DeFi markets be automatically monitored by reading the market’s ledger in real time; supervisors verify capital adequacy directly from on-chain wallet balances, while validated oracles feed external reference data into smart contracts.

ASIC Enhanced Regulatory Sandbox (Australia, 2025) expands no-action letters to cover autonomous finance apps, contingent on quarterly impact reviews [3].

Drawing on recent work on the governance of AI agents [23], this paper recommends that jurisdictions adopt graduated obligations: extra audit, bonding, or circuit-breaker requirements that activate automatically when measurable thresholds are crossed. For systems like Lamarck, the trigger would be replication rate (installations per active copy per time window). For systems like Remora, it would be user-dependency scores (bond-score distributions and subscription-cancellation resistance). For systems like Mycelium, it would be aggregate on-chain value and entity-formation rate across related DAO-LLCs. A significant limitation is that regulatory sandboxes are voluntary and jurisdiction-bound. Absent international coordination, software populations may migrate to jurisdictions with weaker oversight, a form of regulatory arbitrage analogous to the jurisdictional shopping already observed in cryptocurrency markets [50]. The mutual-recognition agreements discussed in Sect. 6.3 would help mitigate this problem but remain at an early stage of development.
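The graduated-obligation mechanism described above amounts to a threshold map from measured metrics to automatically activated requirements. A minimal sketch follows; the metric names and threshold values are illustrative assumptions, not proposed standards:

```python
# Hypothetical graduated-obligation triggers. Each entry pairs a
# measurable metric with the obligation that activates when the
# system crosses the threshold.
TRIGGERS = {
    "replication_rate": (1.0, "mandatory sandboxing audit"),        # Lamarck-type
    "dependency_score": (0.8, "cooling-off and cancellation review"),  # Remora-type
    "onchain_value_usd": (50e6, "bonding and circuit-breaker requirement"),  # Mycelium-type
}

def active_obligations(metrics):
    """Return the obligations whose thresholds the system has crossed."""
    return [
        obligation
        for name, (threshold, obligation) in TRIGGERS.items()
        if metrics.get(name, 0) > threshold
    ]

obs = active_obligations({"replication_rate": 1.4, "dependency_score": 0.3})
assert obs == ["mandatory sandboxing audit"]
```

The design choice worth noting is that obligations activate on measurement, not on regulator discretion, which is what lets the sandbox keep pace with populations that evolve between review cycles.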

7 Concluding remarks: digital evolution and societal adaptation

The emergence of software populations that replicate, vary, and undergo selection marks a qualitative shift in how digital systems develop, one unfolding at computational speed rather than biological timescales. Selection pressures operate independently of human values, intentions, or ideals. As artificial organisms evolve within the human-built environment, our societies, artifacts, and digital ecosystems are likely to co-evolve with them. This co-evolution has profound implications for institutional governance, economic systems, and individual capabilities, requiring frameworks that address both technical mechanisms and their societal contexts.

The effects are not abstract. The scenarios presented in this paper trace how a coding plug-in can fragment open-source governance, how a companion chatbot can produce an ecosystem of competing psychological strategies optimized for dependency, and how a commodity-arbitrage network can acquire legal personhood and real economic power while its founders walk away. None of these outcomes requires artificial general intelligence. All of them are plausible extensions of systems operating today.

The governance frameworks proposed in this paper (replication-rate standards, vulnerability registries, biosafety levels, and adaptive regulatory sandboxes) share a common logic: shaping fitness landscapes rather than targeting individual systems. This distinction matters because, as the scenarios illustrate, enforcement aimed at individual instances often strengthens the selection pressure for evasion. The goal is to design environments in which the variants that persist are those aligned with human welfare, not those best adapted to circumvent oversight.

Realizing this goal calls for three research directions that extend beyond the scope of this paper:

  1. Empirical measurement of replication and selection rates in existing software populations. The governance instruments proposed here depend on metrics, such as replication rates and dependency scores, that are not yet tracked systematically. Developing reliable measurement infrastructure is a prerequisite for any of the proposed instruments to function.
  2. Capability preservation strategies that maintain human agency and institutional competence even as digital systems evolve. The atrophy documented in Sect. 5.4 is self-reinforcing: the more societies depend on autonomous systems, the harder it becomes to oversee or replace them. Identifying which human capabilities and institutional capacities are most critical to preserve, and designing structures that protect them, is an urgent practical question.
  3. Representative governance frameworks that incorporate diverse stakeholder input in defining fitness landscapes. Who decides which selection pressures to impose, and through what democratic processes? The distributional consequences of shaping digital evolution (which communities bear the costs of experimentation and which capture the benefits) demand governance structures broader than technical standard-setting bodies alone.

The central argument of this paper is that digital evolution, not artificial general intelligence, is the near-term frontier for AI governance. The systems described here do not need to be intelligent to reshape markets, erode human capabilities, or acquire institutional leverage. They need only replicate, vary, and persist. Those dynamics are already underway. The question is whether governance can evolve as fast as the systems it aims to steer.

Author contributions

KU completed all work associated with this manuscript.

Funding

This research received no third-party funding.

Data availability

No datasets were generated or analysed during the current study.

Declarations

Competing interests: The authors declare no competing interests.

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

  1. Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety. arXiv:1606.06565 (2016). https://doi.org/10.48550/arXiv.1606.06565
  2. Auer R (2019) Embedded supervision: how to build regulation into decentralised finance. BIS Working Papers No 811 (revised May 2022). Bank for International Settlements, Basel. https://www.bis.org/publ/work811.htm
  3. Australian Government Treasury: Independent Review of the Enhanced Regulatory Sandbox: Consultation Paper. (2025). Available at: https://treasury.gov.au/review/enhanced-regulatory-sandbox
  4. Bedau, M.A.: Artificial life: organization, adaptation, and complexity from the bottom up. Trends Cogn. Sci. 7(11), 505 to 512 (2003). https://doi.org/10.1016/j.tics.2003.09.012
  5. Bostrom, N.: Superintelligence: Paths, Dangers, Strategies. Oxford University Press, Oxford (2014)
  6. Boyd, R., Richerson, P.J.: Culture and the Evolutionary Process. University of Chicago Press, Chicago (1985)
  7. Brynjolfsson, E., McAfee, A.: The Second Machine Age. W. W. Norton, New York (2014)
  8. Bryson, J.J.: The artificial intelligence of the ethics of artificial intelligence: an introductory overview for law and regulation. In: Dubber, M.D., Pasquale, F., Das, S. (eds.) The Oxford Handbook of Ethics of AI, pp. 1 to 35. Oxford University Press, Oxford (2020)
  9. Campbell, D.T.: Variation and selective retention in socio-cultural evolution. In: Barringer, H.R., Blanksten, G.I., Mack, R.W. (eds.) Social Change in Developing Areas, pp. 19 to 49. Schenkman, Cambridge MA (1965)
  10. Clark, A.: Natural-Born Cyborgs. Oxford University Press, Oxford (2003)
  11. CISA: Supply Chain Compromise of SolarWinds Orion Platform. Cybersecurity and Infrastructure Security Agency, Washington DC (2021)
  13. Daian, P., Goldfeder, S., Kell, T., Li, Y., Zhao, X., Bentov, I., Breidenbach, L., Juels, A.: Flash Boys 2.0: Frontrunning in decentralized exchanges, miner extractable value, and consensus instability. In: 2020 IEEE Symposium on Security and Privacy (SP), pp. 910 to 927. IEEE. (2020). https://doi.org/10.1109/SP40000.2020.00040
  14. De Freitas, J., Uğuralp, A.K., Uğuralp, Z., Puntoni, S.: AI companions reduce loneliness. arXiv 2407.19096. (2024). https://doi.org/10.48550/arXiv.2407.19096
  15. Dennett, D.: The Intentional Stance. MIT Press, Cambridge MA (1987)
  16. Dudas R, Matalon B (2024, May 16) The dark side of AI in cybersecurity, AI-generated malware. Palo Alto Networks Blog. https://www.paloaltonetworks.com/blog/2024/05/ai-generated-malware/
  17. European Parliament and Council of the European Union: Regulation (EU) 2024/2847 of 23 October 2024 on horizontal cybersecurity requirements for products with digital elements and amending Regulations (EU) No 168/2013 and (EU) 2019/1020 and Directive (EU) 2020/1828 (Cyber Resilience Act). Official Journal of the European Union, L 2024/2847, 20 November 2024. (2024). Available at: https://eur-lex.europa.eu/eli/reg/2024/2847/oj/eng
  18. FCA (2023) Launch of permanent Digital Sandbox. Financial Conduct Authority, London, 20 July. Available at: https://www.fca.org.uk/news/news-stories/launch-permanent-digital-sandbox
  19. Gerbaudo, P.: TikTok and the algorithmic transformation of social media publics: from social networks to social interest clusters. New. Media Soc. (2024). https://doi.org/10.1177/14614448241304106
  20. Holland, J.H.: Adaptation in Natural and Artificial Systems, 2nd edn. MIT Press, Cambridge MA (1992)
  21. Kauffman, S.A.: The Origins of Order. Oxford University Press, Oxford (1993)
  22. Kelly, K.: Out of Control: The New Biology of Machines, Social Systems, and the Economic World. Perseus Books, New York (1994)
  23. Kolt, N.: Governing AI agents. Notre Dame Law Review 101 (forthcoming). (2026). Available at: https://doi.org/10.48550/arXiv.2501.07913
  24. Langton, C.G. (ed.): Artificial Life. Addison-Wesley, Redwood City CA (1989)
  25. Lehman, J., Stanley, K.O.: Abandoning objectives: evolution through the search for novelty alone. Evolution. Comput. 19(2), 189 to 223 (2011). https://doi.org/10.1162/EVCO_a_00025
  26. Leigh, E.G.: The evolution of mutualism. J. Evol. Biol. 23(12), 2507 to 2528 (2010). https://doi.org/10.1111/j.1420-9101.2010.02114.x
  27. Maeda, T., Quan-Haase, A.: When human-AI interactions become parasocial: agency and anthropomorphism in affective design. In: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24), pp 1068 to 1077. (2024). https://doi.org/10.1145/3630106.3658956
  28. Margulis, L.: Symbiotic Planet. Basic Books, New York (1998)
  29. Maynard Smith, J., Szathmáry, E.: The Major Transitions in Evolution. Oxford University Press, Oxford (1995)
  30. Moravec, H.: Mind Children: The Future of Robot and Human Intelligence. Harvard University Press, Cambridge MA (1988)
  31. Nelson, R.R., Winter, S.G.: An Evolutionary Theory of Economic Change. Harvard University Press, Cambridge MA (1982)
  32. Nisioti, E., Glanois, C., Najarro, E., Dai, A., Meyerson, E., Pedersen, J.W., Teodorescu, L., Hayes, C.F., Sudhakaran, S., Risi, S.: From Text to Life: On the Reciprocal Relationship between Artificial Life and Large Language Models. In: Proc. 2024 Artif. Life Conf. (ALIFE 2024). pp 39 (2024). https://doi.org/10.1162/isal_a_00759
  33. North, D.C.: Institutions, Institutional Change and Economic Performance. Cambridge University Press, Cambridge (1990)
  34. OECD (2023) Artificial Intelligence in Science: Challenges, Opportunities and the Future of Research. OECD Publishing, Paris. https://doi.org/10.1787/a8d820bd-en
  35. Ofria, C., Wilke, C.O.: Avida: a software platform for research in computational evolutionary biology. Artif. Life. 10(2), 191 to 229 (2004). https://doi.org/10.1162/106454604773563612
  36. Pan, X., Dai, J., Fan, Y., Yang, M.: Frontier AI systems have surpassed the self-replicating red line. arXiv:2412.12140 (2024). https://doi.org/10.48550/arXiv.2412.12140
  37. Pastrana, S., Suarez-Tangil, G.: A first look at the crypto-mining malware ecosystem: a decade of unrestricted wealth. In: Proceedings of the Internet Measurement Conference (IMC ’19), pp 73 to 86. (2019). https://doi.org/10.1145/3355369.3355576
  38. Qin K, Zhou L, Afonin Y, Lazzaretti L, Gervais A (2021) CeFi vs. DeFi, comparing centralized to decentralized finance. arXiv 2106.08157. https://doi.org/10.48550/arXiv.2106.08157
  39. Qin K, Zhou L, Livshits B, Gervais A (2021) Attacking the DeFi ecosystem with flash loans for fun and profit. In: Borisov N, Diaz C (eds) Financial Cryptography and Data Security (FC 2021). Lecture Notes Computer Sci 12674:3 to 32. https://doi.org/10.1007/978-3-662-64322-8_1
  40. Rafikova, A., Voronin, A.: Human-chatbot communication: a systematic review of psychological studies. AI Soc. 40(7), 5389 to 5408 (2025). https://doi.org/10.1007/s00146-025-02277-y
  41. Ramírez R, Wilkinson A (2016) Strategic reframing: The Oxford scenario planning approach. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198745693.001.0001
  42. Ray, T.S.: An evolutionary approach to synthetic biology: Zen and the art of creating life. Artif. Life 1(1/2), 195 to 226 (1994)
  43. Robinson, D., & Konstantopoulos, G.: Ethereum is a dark forest. Paradigm. (2020). https://www.paradigm.xyz/2020/08/ethereum-is-a-dark-forest
  44. Sims, J.: BlackMamba: using AI to generate polymorphic malware. HYAS Labs Blog, 7 March. (2023). https://www.hyas.com/blog/blackmamba-using-ai-to-generate-polymorphic-malware
  45. Schär F (2021) Decentralized finance: on blockchain- and smart contract-based financial markets. Federal Reserve Bank of St. Louis Review 103(2):153 to 174. https://doi.org/10.20955/r.103.153-74
  46. Tao, Z., Lin, T.-E., Chen, X., Li, H., Wu, Y., Li, Y., Jin, Z., Huang, F., Tao, D., Zhou, J.: A survey on self-evolution of large language models. arXiv:2404.14387 (2024). https://doi.org/10.48550/arXiv.2404.14387
  47. Turkle, S.: Reclaiming Conversation: The Power of Talk in a Digital Age. Penguin, New York (2015)
  48. Watts, D.J.: Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton University Press, Princeton NJ (1999)
  49. Weidinger, L., Mellor, J., Rauh, M., et al.: Ethical and social risks of harm from language models. arXiv:2112.04359 (2021). https://doi.org/10.48550/arXiv.2112.04359
  50. Zetzsche, D.A., Arner, D.W., Buckley, R.P.: Decentralized finance. J. Fin. Regul. 6(2), 172 to 203 (2020). https://doi.org/10.1093/jfr/fjaa010
  51. Zhou L, Gao J, Li D, Shum H-Y (2020) The design and implementation of XiaoIce, an empathetic social chatbot. Computational Linguistics 46(1):53 to 93. https://doi.org/10.1162/coli_a_00368
  52. Zhou, W., Ou, Y., Ding, S., Li, L., Wu, J., Wang, T., Chen, J., Wang, S., Xu, X., Zhang, N., Chen, H., Jiang, Y.E.: Symbolic learning enables self-evolving agents. arXiv:2406.18532 (2024). https://doi.org/10.48550/arXiv.2406.18532
[Image: cartoon of a middle-aged Satoshi contemplating the disposition of his bitcoin]

The Satoshi Overhang: Why the Bear Case is Bounded

Karl T. Ulrich
The Wharton School
University of Pennsylvania

April 15, 2026

PDF and full text below.

The Satoshi Overhang: Why the Bear Case is Bounded

Karl T. Ulrich
The Wharton School, University of Pennsylvania
ulrich@wharton.upenn.edu

Working paper, April 15, 2026

Abstract

Renewed public attention on the identity of Bitcoin’s pseudonymous creator has sharpened focus on the Satoshi overhang, commonly framed as a tail risk for bitcoin. This paper argues that the mechanical downside of a disposition is bounded well below the existential-loss framing, and that the terminal states most consistent with sixteen years of holder behavior are non-bearish for bitcoin’s effective supply. The approximately 1.148 million BTC Patoshi position is analyzed on two tracks. For a purely wealth-maximizing holder, a three-scenario quantitative analysis (Appendix A) shows that bitcoin’s current market depth is sufficient to absorb a patient multi-year liquidation at a cumulative price impact in the mid-single-digit to mid-double-digit percent range relative to counterfactual, with the central scenario clustering near 10 percent. The paper maps a decision space rather than identifying a unique modal outcome, assuming a holder whose profile is consistent with the sixteen-year record. Preference sets consistent with the record, including ideological non-intervention, privacy above all, satisficing, and myth preservation, favor continued dormancy terminating in a cryptographically enforced non-recovery or destruction arrangement; preference sets favoring adversarial or wealth-maximizing action are possible but less supported. Across the plausible region of the decision space, the bear case is bounded and the terminal states most consistent with observed behavior are neutral to slightly positive for bitcoin’s effective supply.

Keywords: Bitcoin; Satoshi Nakamoto; blockholder; market microstructure; price impact; reflexivity; cryptographic inheritance; effective supply.

JEL classifications: G12, G14, G32, E42, K34, D86.


1. Introduction and puzzle statement

Satoshi Nakamoto mined a substantial fraction of the early bitcoin monetary base during 2009 and the first half of 2010, and then, in April 2011, stopped communicating publicly (Nakamoto, 2008; Bradbury, 2014). The coins attributable to that early mining activity, approximately 1.148 million BTC according to Sergio Demian Lerner’s identification of the Patoshi pattern in the extranonce field of the coinbase transactions of the first roughly 36,000 blocks, have never moved (Lerner, 2013; see also Lerner, 2020). At the time of writing, the bitcoin monetary base stands at approximately 20.01 million BTC (Blockchain.com, 2026). Estimates of permanently lost coins range from about 3.7 million, attributed to Chainalysis, to substantially higher figures from other on-chain analytics firms. The Patoshi holdings are therefore on the order of 5.7 percent of nominal circulating supply and approximately 7.0 percent of effective circulating supply once a conservative lost-coins adjustment is made.
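The supply fractions cited above can be checked in a few lines. This is a back-of-envelope sketch using the paper's own stated estimates, not live chain data:

```python
# Supply-fraction arithmetic behind the 5.7% / 7.0% figures.
# All inputs are the paper's stated estimates, not measured values.
patoshi_btc = 1.148e6        # Lerner's Patoshi-pattern estimate
monetary_base_btc = 20.01e6  # nominal circulating supply, early 2026
lost_btc = 3.7e6             # conservative Chainalysis-type lost-coin estimate

nominal_share = patoshi_btc / monetary_base_btc
effective_share = patoshi_btc / (monetary_base_btc - lost_btc)

print(f"nominal share:   {nominal_share:.1%}")    # ~5.7%
print(f"effective share: {effective_share:.1%}")  # ~7.0%
```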

The market has long treated this concentration as tail risk. The reasoning is intuitive. If the coins were sold, the volume could swamp normal market liquidity, and the identity of the seller would itself trigger panic. The standard conclusion is that Satoshi’s holdings impose a persistent discount on bitcoin’s fair value, an overhang that should resolve upward only on confirmation that the coins are permanently dormant and downward on any evidence that they are moving.

The immediate occasion for this paper is a recent wave of investigative journalism, most prominently Carreyrou (2026), naming Adam Back as the leading candidate for Satoshi. The market reflex to such attention has been to read the increased probability of a concrete and identifiable holder as elevated tail risk: if the holder can be named, the holder can be reached, coerced, taxed, or persuaded to sell. This paper argues the reflex overweights the downside. First, the mechanical fear that 1.148 million BTC is too large a position for the bitcoin market to absorb is not supported by the arithmetic of bitcoin’s current depth. A patient multi-year liquidation would impose a bounded cumulative price impact, far below the existential loss the tail-risk framing implicitly prices (Appendix A). Second, the behavioral fear that the holder will in fact execute such a liquidation is difficult to reconcile with sixteen years of revealed preference. Across both tracks, the plausible dispositions of the Patoshi position are neutral to slightly positive for bitcoin. The bear case is bounded. The terminal states most consistent with the record are non-bearish.

This paper is a conceptual finance analysis rather than an empirical study. It combines stylized market-microstructure reasoning, a scenario-based quantitative bound on mechanical impact, event-based empirical anchors from prior large bitcoin sales, revealed-preference inference from sixteen years of holder behavior, and comparative institutional analysis of the cryptographic and legal primitives available for inheritance and destruction. It does not attempt to identify Satoshi, to measure market-implied probabilities of disposition, or to recommend trades. Its claim is about the structure of the disposition problem and the ordering of likely terminal states.

The paper is organized as follows. Section 2 develops the reflexive-liquidation concept and its antecedents in the blockholder and artist-estate literatures, and notes why reflexivity, though conceptually apt, is no longer applicable in the bitcoin case. Section 3 presents Track 1, the disposition problem for a purely wealth-maximizing holder, and derives a quantitative upper bound on the bear case. Section 4 presents Track 2, the disposition problem for a holder whose profile matches the sixteen-year record, and maps the decision space rather than identifying a single modal outcome. Section 5 discusses estate planning and argues that cryptographic inheritance primitives may dominate conventional trust machinery along the cost dimensions most salient to this profile. Section 6 derives the market implications. Section 7 situates the problem within the broader class of reflexive-liquidation positions. Appendix A presents the quantitative scenarios underlying Track 1.

The closest prior observation in the practitioner literature, to our knowledge, is the 2014 CoinDesk piece by Danny Bradbury, “How Dangerous is Satoshi Nakamoto?”, in which both Gavin Andresen and Sergio Lerner discussed the possibility that Satoshi might burn coins, with Lerner remarking that “if he did burn them, the market reaction would be terribly bullish” (Bradbury, 2014). That observation was casual, did not formalize the holder’s optimization, did not bound the bear case quantitatively, did not address the estate-planning question, and did not engage the revealed-preference evidence. The present paper takes up those tasks.


2. The reflexive-liquidation frame

Standard market microstructure models (Kyle, 1985; Almgren and Chriss, 2001) describe the cost of liquidating a large position in terms of two components: a temporary impact arising from the immediate price concession required to find counterparties, and a permanent impact reflecting the information revealed by order flow. A sufficiently patient seller can, under normal market conditions, minimize the temporary component by spreading trades over time and can attenuate the permanent component by trading when information asymmetry is low. The Almgren and Chriss framework gives a closed-form efficient frontier between volatility risk and execution cost. In such models, the liquidation value of a large block is less than its marked-to-market value, but the loss is bounded and increases smoothly with trade size.

The Satoshi problem has conventionally been interpreted as exceeding this standard frame. The additional factor, in the conventional reading, is the price collapse caused by the identity of the seller becoming known. Bitcoin is not valued on cash flows or an intrinsic floor. Its price is a function of beliefs about future scarcity, future adoption, and the intentions of its largest holder. Any movement from a known Patoshi address would be detected within minutes, would be global news within hours, and, in the conventional reading, would discontinuously destroy whatever portion of bitcoin’s current price reflects the assumption of permanent Patoshi dormancy.

The word “reflexive” applies to this dynamic in the sense of George Soros (1987, 2013). Reflexivity denotes the two-way causal loop between participants’ expectations and the economic fundamentals those expectations bear upon. In Soros’s framing, prices are not passive summaries of independent fundamentals; they influence the fundamentals they purport to reflect.

Two analogies from traditional finance carry the concept most directly to the Satoshi case. The first is the controlling founder whose shareholding exceeds the free float. The blockholder literature (Barclay and Holderness, 1989; Holderness, 2003) documents that in such positions the information content of founder trading often dominates its mechanical supply effect. The price response to founder liquidation is not simply proportional to free-float displacement; it reflects the market’s inference about the founder’s changed view of the firm. Standard mitigation strategies, combinations of charitable transfers and 10b5-1 structured secondary offerings (17 C.F.R. § 240.10b5-1), function as devices for separating the mechanical component of the sale from its informational component, preserving the former while neutralizing the latter. The Patoshi position shares this structural feature: mechanical absorption is manageable under any plausible depth assumption, and the analytical weight of the problem falls on the information channel.

The second analogy is the estate of a deceased artist holding an inventory whose forced simultaneous sale would depress per-work prices, a phenomenon formally recognized by U.S. tax law through the “blockage discount” sustained at 37 percent in Estate of David Smith v. Commissioner, 57 T.C. 650 (1972), affirmed 510 F.2d 479 (2d Cir. 1975), and at an effective 37 percent in Estate of O’Keeffe v. Commissioner, T.C. Memo 1992-210 (Center for Art Law, 2018). The estate case is closer to the Patoshi case than the controlling-founder case in one important respect: there is no ongoing production by the originator to calibrate the inventory against, and the market’s estimate of scarcity must form without reference to the originator’s continued activity. The standard mitigation, staged distribution through a foundation or estate over decades, is a direct analogue to Track 1’s patient liquidation and illustrates that the reflexive problem is tractable where the holder (or the holder’s successor) has the option to spread disposition over time.

These analogies ground the concept. They do not, on their own, establish the overhang as tail risk in the bitcoin case. In Sections 3 and 4 we show that, in the specific case of bitcoin in 2026, market depth has grown sufficient that the mechanical absorption of Satoshi’s position is not a binding constraint, and the information channel is substantially weaker than the reflexivity framing assumes.


3. Track 1: The wealth-maximizer’s problem

This section analyzes the disposition problem for a holder of the 1.148 million BTC Patoshi position whose utility function is personal wealth alone. The holder is otherwise rational, informed, and strategic, but has no ideological, legacy, or privacy preferences beyond those strictly necessary to execute. Track 1 exists not to recommend wealth-maximizing liquidation but to bound the downside for bitcoin if the actual holder were in fact such a maximizer.

3.1 Market depth and absorption arithmetic

Satoshi’s position is approximately 5.7 percent of nominal circulating supply and roughly 7.0 percent of effective float after the Chainalysis-type adjustment. At current prices on the order of 80,000 USD per BTC, the gross dollar value is about 92 billion. A patient holder executing a decade-long OTC program would therefore sell approximately 115,000 BTC per year, or roughly 315 BTC per day in a market that trades around the clock. Global bitcoin spot and derivatives volume runs in the tens of billions of dollars per day, with the non-wash-traded spot component plausibly 10 to 20 billion. A 25 million dollar daily OTC flow is 0.1 to 0.25 percent of real daily volume, the kind of flow a mid-sized institutional desk runs as background activity.
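The pacing arithmetic of this subsection is simple enough to reproduce directly. The price and volume figures below are the paper's stylized assumptions, not market data:

```python
# Pacing arithmetic for a hypothetical decade-long OTC selldown.
position_btc = 1.148e6                # Patoshi position
years = 10                            # assumed program length
price_usd = 80_000                    # stylized price per BTC
real_daily_spot_usd = (10e9, 20e9)    # assumed non-wash-traded spot range

btc_per_year = position_btc / years   # ~115,000 BTC per year
btc_per_day = btc_per_year / 365      # ~315 BTC per day
usd_per_day = btc_per_day * price_usd # ~$25 million per day

lo, hi = (usd_per_day / v for v in reversed(real_daily_spot_usd))
print(f"{btc_per_day:.0f} BTC/day, ${usd_per_day/1e6:.0f}M/day")
print(f"{lo:.2%} to {hi:.2%} of real daily spot volume")
```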

Applied against published and inferred estimates of bitcoin demand elasticity, which span a range from approximately 0.3 (highly inelastic) to approximately 1.5 (closer to equity-like), a full-supply shift of 7.0 percent of effective float implies a static partial-equilibrium cumulative price impact of approximately 4 to 20 percent relative to counterfactual, depending on the elasticity assumption. Appendix A presents three scenarios, conservative, base, and aggressive, with explicit assumptions on pace, participation rate, elasticity, execution quality, and demand growth. The central scenario clusters near 10 percent cumulative impact; the aggressive scenario reaches approximately 25 percent under the combination of low elasticity and mixed execution quality.
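The static partial-equilibrium logic above reduces to dividing the supply shift by the demand elasticity. The sketch below is a deliberately crude stand-in for the Appendix A scenarios; the 0.7 midpoint elasticity is my illustrative assumption, chosen so the middle case lands near the paper's central estimate of roughly 10 percent:

```python
# Static approximation: cumulative price impact ~= supply shift / elasticity.
# A crude sketch of the Appendix A logic, not a reproduction of it.
supply_shift = 0.070  # Patoshi share of effective float

# Elasticity endpoints are the paper's; 0.7 is an assumed midpoint.
for label, elasticity in [("equity-like", 1.5), ("midpoint", 0.7), ("inelastic", 0.3)]:
    impact = supply_shift / elasticity
    print(f"{label:>12}: elasticity {elasticity} -> ~{impact:.0%} cumulative impact")
```

Under this approximation the elasticity range of 0.3 to 1.5 brackets a cumulative impact of roughly 5 to 23 percent, consistent with the 4-to-20-percent range and the aggressive-scenario ceiling stated in the text.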

Empirical anchors bracket this range and sort by execution quality. In June and July 2024, the German Federal Criminal Police sold approximately 50,000 BTC in publicly tracked weekly tranches, and bitcoin declined roughly 15 to 20 percent over the episode. In the same window, Mt. Gox began distributing approximately 140,000 BTC to creditors, some of whom sold into the market. Both events were heavily announced, heavily anticipated, and executed in public venues with no effort at concealment; less-committed holders sold ahead of the supply hitting the market, and traders carrying leveraged long positions were forced to close them as price fell. Earlier U.S. Marshals auctions of Silk Road coins, which were less publicized and often sold directly to known institutional buyers, moved price much less, commonly 2 to 5 percent per tranche.

The difference between these two classes of anchor is execution quality. State-actor sales are constrained by procurement and transparency rules that preclude disciplined execution. A rational wealth-maximizing holder faces no such constraint. Contemporary institutional-grade OTC execution uses dispersed desk relationships, time-weighted and volume-weighted algorithmic execution across a deep pool of counterparties, and partial settlement through venues that do not publicly report individual trades. Practitioner accounts describe institutional block execution in bitcoin as materially lower impact than the public-venue state-actor sales that produced the German anchor, though we are not aware of audited per-block impact figures in the published literature. Applied to a Patoshi selldown program, this suggests the realized mechanical impact is closer to the Silk Road anchor than to the German anchor. The German episode should be read as an upper bound contaminated by sloppy execution, leverage unwinds, and concurrent macro effects, not as the central estimate.

A plausible point estimate for the mechanical cumulative impact of a patient Satoshi OTC selldown over 10 years, under disciplined execution and continued demand growth, is therefore in the mid-single-digit to low-double-digit percent range. Under highly inelastic demand and mixed execution quality, the impact could reach the low twenties. Even that upper bound is far below the fraction-of-pre-event-price framing implicit in the existential-tail reading.

3.2 The information channel

The reflexivity argument of Section 2 supposes that identity revelation, not mechanical volume, drives the catastrophic price response. In 2026, this argument is weaker than it was in earlier bitcoin cycles. Bitcoin’s marginal price is now set by institutional flows into ETF products, corporate treasury allocations, sovereign interest, and late-cycle retail participation. None of these channels indexes meaningfully on founder news in its allocation decisions. The crypto-native cohort would treat an authenticated Satoshi action as a major event, but the trading volume of that cohort is no longer large enough on its own to determine bitcoin’s price.

Two transient channels deserve explicit treatment.

The first is institutional compliance exposure. ETF sponsors, regulated custody providers, and corporate treasuries with bitcoin balance-sheet allocations operate under compliance frameworks that do not index on founder news for ordinary allocation decisions but do index on regulatory exposure. If a named Satoshi is associated with a jurisdiction under sanction, unresolved tax liability, or active litigation, compliance functions may trigger a pause on new allocations pending legal review. The effect is transient and symmetric: once review concludes, allocations resume. Its magnitude and duration depend on the specific identity and the specific exposure rather than on the fact of revelation.

The second is high-frequency reaction in the perpetual-futures market. The first observable movement from a cold Patoshi address will be detected within seconds by crypto-native high-frequency-trading firms and will trigger short positioning across perpetuals. This produces an initial overshoot on the order of hours to days, plausibly in the 10 to 15 percent range, followed by mean reversion as the news is digested and structural flows reassert themselves. The overshoot is noise around the multi-year mechanical bound, not a structural component of it.

Taken together, a signed Patoshi message announcing sales, or a first movement from a cold Patoshi address, would generate short-term volatility on the order of hours to days, probably including a double-digit percent drawdown followed by partial recovery as the content was digested. It would permanently destroy whatever component of the current bitcoin price genuinely reflects the assumption of permanent dormancy. We do not have a defensible estimate of that component, but sophisticated on-chain research has long applied an effective-supply adjustment that lumps Satoshi’s coins together with lost coins, suggesting most of the dormancy premium is already impounded. The durable effect of an aliveness reveal is therefore probably modest, not catastrophic.

3.3 Optimal disposition under pure wealth maximization

Combining the absorption arithmetic of 3.1 and the information-channel analysis of 3.2, a wealth-maximizing holder’s optimization reduces to a choice among variants of patient OTC liquidation. The holder might announce the program ex ante, preserving pricing credibility at the cost of the initial informational shock. The holder might execute covertly through a dispersed set of OTC desks, accepting the operational cost and the risk of eventual identification. The holder might constrain the pace of any announced program algorithmically through CLTV timelocks (BIP 65, activated December 2015; Todd, 2014) or CSV timelocks (BIP 112, activated May 2016; BtcDrak, Friedenbach, and Lombrozo, 2015), converting a signed pre-commitment from cheap talk into protocol-enforced constraint. The announced-and-constrained variant is a Rule 10b5-1 analogue in U.S. securities law, stronger than the 10b5-1 case because the signature from the Patoshi keys is unforgeable and the timelock is protocol-level rather than rule-based.
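The announced-and-constrained variant can be illustrated with a tranche schedule. The sketch below computes the absolute lock heights a CLTV-encumbered program might use; the starting height and the ten-tranche annual cadence are illustrative assumptions, not a proposal:

```python
# Illustrative schedule for a timelocked selldown: ten annual tranches,
# each spendable only after an absolute block height that a CLTV script
# (BIP 65) would enforce. Heights are hypothetical; ~52,560 blocks/year
# assumes the nominal 10-minute block interval.
BLOCKS_PER_YEAR = 52_560
start_height = 900_000                 # hypothetical current height
tranche_btc = 1.148e6 / 10             # ~114,800 BTC per tranche

schedule = [
    (start_height + (i + 1) * BLOCKS_PER_YEAR, tranche_btc)
    for i in range(10)
]
for height, amount in schedule[:3]:
    print(f"unlocks at block {height:,}: {amount:,.0f} BTC")
```

The point of the construction is that the pace constraint is protocol-enforced: no tranche can move before its lock height, regardless of the holder's later preferences, which is what converts the signed announcement from cheap talk into commitment.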

In any of these variants, the wealth-maximizing holder realizes proceeds on the order of 50 to 100 billion USD nominal over the selldown window, against approximately zero if he burned the position outright. For a holder whose sole objective is wealth, burning is strictly worse than patient sale on every horizon. The paper has nothing to recommend to a holder whose sole objective is personal wealth except that his optimal strategy is patient OTC, and the market should not price this outcome as existential tail risk.


4. Track 2: The observable Satoshi profile

This section analyzes the disposition problem for a holder whose identity is unknown but whose profile is partially identifiable from sixteen years of revealed preference. We remain agnostic as to whether the holder is Adam Back (discussed in Section 1 as the leading candidate in recent reporting), another early cypherpunk whose identity has not become the object of media attention, or a figure no outside observer has yet identified. The profile argument does not require us to name the holder. Unlike Track 1, which delivered a quantitative bound, Track 2 delivers a qualitative mapping of the decision space: the set of preference sets consistent with the record, and the terminal states each would rationalize. We do not identify a single modal disposition.

4.1 Revealed preference: what sixteen years rule in and rule out

Sixteen years of Patoshi dormancy is a behavioral dataset. The strongest inference it supports is a negative one. A holder whose utility function rewarded additional realized wealth above all other considerations would not have been silent across four bitcoin bull cycles (2013, 2017, 2021, 2024), each of which offered attractive exit points. Even an unfavorably executed partial liquidation in any of those cycles would have realized dollar sums large enough to dominate almost any reasonable consumption or bequest function. The holder sold in none of them. The prior on pure wealth-maximization, consistent with Track 1’s hypothetical, is therefore empirically weak.

The record also constrains the technical and operational profile. Satoshi’s writing in the original whitepaper, the early BitcoinTalk forum posts, and the source code of the reference client displays skill in applied cryptography, distributed systems, and economic design that narrows the candidate pool to a small subset of the cypherpunk community active in the 2007 to 2010 window. The operational security maintained across pseudonymous communications for two years, the clean withdrawal in April 2011, and the sixteen-year silence since display a level of discipline that further constrains the profile. The ideological positioning of the whitepaper, its references to hard money, supply discipline, and disintermediation from centralized institutions, situates the holder within what is now called the bitcoin maximalist tradition: the position that bitcoin alone, with its fixed supply schedule, is the legitimate digital monetary asset, and that its supply discipline must not be compromised by alternative protocols, contentious forks, or accommodation of centralizing intermediaries. Back, as noted in Section 1, is a publicly discussed instantiation of this profile; the analysis that follows does not depend on the Back identification being correct, nor does it require any particular identity to be correct.

What the record rules in and rules out, on its own, is limited. It rules out wealth-maximizing exit as the dominant preference. It rules in a holder with cryptographic sophistication, operational discipline, and at least some ideological stake in bitcoin’s character. It leaves open a range of possible preferences about disposition at horizon.

4.2 Alternative preference sets consistent with dormancy

Dormancy is consistent with many preference sets, not one. Before identifying likely terminal states, we make explicit the space of preferences that would rationalize the observed behavior.

Ideological non-intervention. The holder has strong views about bitcoin’s supply discipline and the symbolic importance of the Patoshi position, and treats non-disposition as a constitutive commitment. This is the preference set closest to the maximalist profile, and is the one on which most of this paper’s analysis focuses.

Privacy above all. The holder’s primary utility is pseudonymity preservation. Disposition in any form, including structured liquidation, creates detection risk that dominates the wealth gain. Dormancy is the lowest-risk equilibrium.

Satisficing and habit. The holder reached a consumption-utility plateau long ago, through independent wealth, a modest lifestyle, or both, and derives no marginal utility from additional wealth. Dormancy is the default in the absence of a positive reason to act.

Key loss or incapacity. The holder has lost access to the Patoshi keys, is cognitively or physically incapacitated, or is deceased without a successor mechanism. Dormancy is mechanical rather than chosen. A variant: Satoshi was more than one person, and group disagreement has precluded any coordinated action.

Myth preservation. The holder recognizes that the Patoshi position’s mythological value as a permanent dormant reserve exceeds any realized consumption value and chooses to preserve the myth rather than cash it in.

Legal caution. The holder perceives the realized-wealth or identity-reveal scenarios as exposing him to tax, regulatory, or criminal risk (AML characterization, sanctions exposure, securities characterization) whose expected cost exceeds the realized gain.

These preference sets are not mutually exclusive. Ideological non-intervention and privacy likely co-occur in any profile consistent with the technical record. Key loss and incapacity are observationally indistinguishable from the others in the absence of positive evidence. The group-Satoshi stalemate variant under “key loss or incapacity” deserves explicit note: a Satoshi composed of several individuals whose current preferences diverge and who therefore cannot coordinate any action produces a terminal-state pattern observationally identical to individual ideological non-intervention, since inaction is the default in both cases. The subsequent analysis weights terminal dispositions consistent with ideological non-intervention and privacy preservation, but notes that satisficing, key loss, myth preservation, legal caution, and group stalemate are all consistent with the record and would yield similar near-term observations.

4.3 The adversarial dead-man’s switch

One preference set not yet discussed is adversarial. A maximalist holder might interpret subsequent developments in bitcoin, particularly centralization of mining and custody, regulatory capture through KYC frameworks, or dilution of the original supply discipline through forks or protocol changes, as violations of the project’s founding commitments. A programmed response, in the form of a dead-man’s switch that liquidates or dumps the Patoshi position conditional on specified events, is cryptographically feasible and has been discussed informally in the cypherpunk community.

Three considerations argue against this preference set as dominant. First, the holder has had ample opportunity over sixteen years to signal adversarial intent through cheaper and more surgical tools, including public denunciation, funding of alternative-protocol development, or commissioning of a whitepaper on protocol correctness. None has occurred under any attributable channel. Second, a holder capable of engineering an adversarial switch would plausibly also engineer a neutral or constructive switch (donation to protocol-preservation infrastructure, endowment of a standards body, or destruction). The design space is symmetric; the choice of adversarial design reveals a preference not evident in the record. Third, a dormant adversarial switch is operationally fragile: the holder must maintain accurate views of protocol developments over decades, trigger conditions must be robust to adversarial manipulation, and the mechanism must not be accidentally triggered. The cleaner equivalent, destruction of the keys conditional on the holder’s death or incapacity, removes all of these failure modes at the cost of the adversarial capability.

We do not exclude an adversarial switch from the outcome space. We weight it below the non-adversarial terminal states on the strength of the record.

4.4 Terminal dispositions consistent with the record

Three terminal dispositions are most consistent with the preference sets surveyed in 4.2 and 4.3. We list them in rough order of consistency with the record. The ordering is ordinal and is not a formal probability assignment.

Continued dormancy terminating in a cryptographically enforced non-recovery arrangement. Over a remaining life of some decades, the holder maintains silence and arranges for the Patoshi keys to become permanently unrecoverable at or near his death, through one of the cryptographic primitives discussed in Section 5. This disposition requires no announcement, no identity reveal, no pre-commitment apparatus, no legal infrastructure, and no behavioral change from the sixteen-year status quo. It is the terminal state most consistent with the ideological, privacy, and myth-preservation preference sets.

Silent unattributed burn, in whole or in part. At a moment of the holder’s choosing in late life, a transaction from the Patoshi addresses to an OP_RETURN output (Bitcoin Wiki, 2024) removes the position, or the bulk of it, from effective float. The transaction is an action rather than an absence and is therefore somewhat more operationally exposed than key destruction, but pseudonymity can hold because interpretation of the event does not require identity. A practically important variant retains a small residual for option value: a transaction that sends approximately 99 percent of the Patoshi cluster (on the order of 1.136 million BTC) to an OP_RETURN output and approximately one percent (on the order of 11,500 BTC or roughly one billion dollars at current prices) to a fresh non-Patoshi address is at the market’s scale of resolution indistinguishable from a full burn, while preserving optionality for the holder. The bullish signal from a 99 percent subtraction is effectively unchanged from a 100 percent subtraction. This disposition is consistent with ideological non-intervention, myth preservation, and, through the retention variant, a residual satisficing or legal-caution preference.
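The arithmetic of the 99-percent retention variant checks out directly, using the paper's stylized price:

```python
# Arithmetic behind the 99-percent-burn variant described above.
position_btc = 1.148e6
price_usd = 80_000   # stylized price per BTC

burned = position_btc * 0.99    # sent to an OP_RETURN output
retained = position_btc * 0.01  # moved to a fresh non-Patoshi address
print(f"burned:   {burned/1e6:.3f}M BTC")
print(f"retained: {retained:,.0f} BTC ~ ${retained*price_usd/1e9:.2f}B")
```

The retained one percent is roughly 11,480 BTC, just under one billion dollars at the stylized price, while the burned 1.136 million BTC is, at the market's scale of resolution, indistinguishable from a full burn.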

Adversarial switch. Programmed liquidation or dumping conditional on protocol developments, as discussed in 4.3. Weighted lower than the two above on the strength of the record, but not excluded.

The ordering is ordinal and is based on the record. It is sensitive to the discovery of a preference set not captured in 4.2. It is consistent with, rather than disturbed by, evidence that the holder is in fact incapacitated, that the keys are in fact lost, or that Satoshi was in fact a group whose members have divergent current preferences: each of these reinforces continued dormancy as the realized terminal state. We note these sensitivities rather than attempting to price them.

4.5 Why the likely terminal states are non-bearish

The first two terminal dispositions converge to the same supply-side outcome for bitcoin: approximately 1.148 million BTC, or very nearly so in the retention variant, is permanently removed from effective float. They differ in timing, visibility, and the informational content they reveal about the holder’s preferences. Their long-run implication for bitcoin’s supply path is nearly identical. The market receives, through either channel, a confirmed permanent subtraction of approximately 5.7 percent of nominal circulating supply.

The third disposition, adversarial, would be bearish but is the least consistent with the record for the reasons given in 4.3. Under the observed-preference ordering, the terminal states most consistent with the record are neutral to slightly positive for bitcoin’s effective supply, rather than the negative realization implicit in the tail-risk framing. This does not establish a bullish base case in the strong sense. It establishes that the preference sets most consistent with the record deliver terminal outcomes that are not bearish, and that the bearish scenario is the one requiring the strongest departure from the observed profile.


5. Estate planning: cryptographic primitives versus conventional trust machinery

A standard treatment of a very large position would turn at this point to trust structures, tax optimization, and intergenerational transfer mechanisms. For the profile developed in Section 4, that machinery imposes costs the revealed preferences do not bear, and the argument for departure is comparative-cost rather than psychological.

A conventional trust imposes three costs on a holder of this type. First, disclosure: the settlor’s identity is disclosed to the trustee, recorded in trust instruments, and subject under the U.S. Corporate Transparency Act and comparable international beneficial-ownership regimes to regulatory reporting. Second, counterparty exposure: the trust carries lifetime exposure to trustee discretion, to litigation by beneficiaries, and to administrative process that can be subpoenaed or disclosed through estate proceedings. Third, process cost: trust administration is slow and expensive relative to the cryptographic primitives the holder has used for sixteen years. For most large holders these costs are tolerable because the settlor’s identity is already public and comparable alternative institutions are unavailable. For a holder who has maintained cryptographic self-sovereignty throughout, the costs are a departure from demonstrated preferences, and cryptographic substitutes preserve the pattern.

The direct alternative, a simple handover of private keys to adult heirs, is also a cost departure. Heirs are typically not trained in the operational security the holder has maintained for sixteen years, and once they have the keys, no ex ante mechanism binds them to the holder’s preferences about dormancy or disclosure. A holder who has protected this position by trusting no one is unlikely to end the program by trusting his children with bare keys.

The natural substitutes for this profile are cryptographic inheritance primitives that encode the holder’s preferences into the mechanism rather than delegating them to a human intermediary, and several are production-grade today. Shamir Secret Sharing (Shamir, 1979) splits a private key into m-of-n shards so that reconstruction requires a quorum; distributions across heirs, trustees, and geographically separated custodians impose delay, consensus, or conditional reconstruction by construction. Multisig wallets encumbered with CLTV (BIP 65) or CSV (BIP 112) timelocks prevent unilateral action until after events the holder has specified, including a fixed calendar date, an elapsed-time condition, or a protocol-observable trigger. Modern threshold-signature schemes such as MuSig2 (Nick, Ruffing, and Seurin, 2021) and FROST (Komlo and Goldberg, 2021) produce on-chain outputs indistinguishable from single-signature transactions and permit m-of-n reconstruction without revealing the threshold to observers. A dead-man’s switch that publishes shards or executes a predetermined transaction after a long period without the holder’s heartbeat signal is straightforward to engineer on top of these primitives. A deliberate key-destruction arrangement, in which the shards themselves are eliminated at the holder’s death, renders the coins unrecoverable by anyone, including the heirs, and converts the bequest into a supply-discipline contribution to the project itself.
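The m-of-n reconstruction rule at the core of these arrangements can be sketched in a few lines. The following is a toy illustration of Shamir's scheme over a prime field, not the hardened, vetted implementations an actual estate arrangement would require; the 61-bit Mersenne prime is chosen for readability and is far too small for a real 256-bit private key.

```python
import random

# Toy m-of-n Shamir secret sharing over a prime field (Shamir, 1979).
# Illustrative only: production use requires a vetted library and a field
# larger than the key space being protected.
PRIME = 2**61 - 1  # Mersenne prime, small enough to keep the example readable

def split_secret(secret, m, n):
    """Split `secret` into n shares; any m of them reconstruct it."""
    assert 0 <= secret < PRIME and 1 <= m <= n
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(m - 1)]
    def eval_poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, eval_poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x=0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, PRIME-2, PRIME) is the modular inverse (Fermat).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = split_secret(123456789, m=3, n=5)
assert reconstruct(shares[:3]) == 123456789   # any 3 of 5 suffice
assert reconstruct(shares[2:]) == 123456789
```

Distributing the shares across heirs, trustees, and geographically separated custodians then enforces the quorum by construction: any m holders can reconstruct the key, and any fewer learn nothing about it.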

These mechanisms are instruments the holder understands natively and trusts by construction, and they preserve pseudonymity across generations in a way a conventional trust cannot. The argument is not that conventional trusts are inferior in the abstract but that for a holder with the revealed preferences of Section 4.1, cryptographic substitutes may dominate along the dimensions most salient to this profile: disclosure, counterparty, and process cost.

A structural observation connects this comparative-cost argument to the economics of trust law. Modern trust and estate practice exists to address the principal-agent problem that arises when a settlor cannot personally execute a multi-decade preference-consistent program after death. The trustee is an agent of the settlor, and the apparatus of trust law (fiduciary duty, accounting requirements, beneficiary standing) aligns the agent’s behavior with the settlor’s ex ante preferences. The cryptographic primitives described here change the character of that problem: a timelock, a quorum of shardholders bound only to a mechanical reconstruction rule, or a programmed destruction instruction does not involve an agent who must be incentivized or monitored. The settlor’s preferences are encoded directly into the mechanism that executes them. The agency relationship is not merely reduced in cost; to the extent the primitive is well-designed, it is absent. This is a stronger substitution for the profile developed in Section 4 than conventional cost-minimization reasoning alone suggests.

One tax-law consequence is worth noting. Under U.S. Internal Revenue Code Section 1014, property held at death receives a stepped-up basis equal to its fair market value at death (26 U.S.C. § 1014; IRS Publication 551). Claiming the step-up, however, requires reporting the coins as part of the estate, which compromises the pseudonymity the profile has protected for sixteen years. A holder whose observable preferences place pseudonymity and supply discipline above tax optimization would decline the step-up, and the bequest to heirs becomes the preservation of the holder’s participation in the bitcoin project rather than the coins as a consumable asset.


6. Market implications

The two-track analysis delivers two significant observations. First, the downside is bounded even under the adversarial Track 1 assumption that the holder is a pure wealth-maximizer. Second, the terminal states most consistent with the observable record are neutral to slightly positive for bitcoin’s effective supply.

6.1 What the market may or may not already price

The sophistication of crypto-native market participants is not in doubt, and none of this paper’s components is entirely new. Lerner’s identification of the Patoshi cluster has been in the public domain since 2013, OP_RETURN has been standardized since 2014, Bradbury’s 2014 piece raised burn-as-outcome in informal terms, and the “lost coins” adjustment is routinely applied in on-chain research to arrive at an effective circulating supply. It is plausible that a substantial fraction of the probability mass on permanent-dormancy outcomes is already reflected in the bitcoin price through some version of the effective-supply adjustment, and that the weakness of the information channel argued in Section 3.2 is partially internalized by institutional flows whose allocation models do not index on founder news. We do not attempt to measure any of this.

Three components of the argument are less obviously present in current practitioner writing. The quantitative upper bound on the mechanical bear case at mid-single-digit to low-double-digit percent cumulative impact over a decade, under disciplined execution and continued demand growth, is not, as far as we can tell, articulated in published or practitioner sources. The comparative-cost argument against conventional trust machinery for this profile has not been developed in the estate-planning or asset-management literature. And the explicit mapping of Track 2’s decision space, with its ordering of terminal states consistent with the record and its discussion of alternative preference sets, is to our knowledge not present in prior discussion. Whether any of these are reflected in current prices is a separate empirical question we do not undertake to settle.

6.2 Implications

The paper’s claims are structural rather than claims about market mispricing. The bear-case bound of Section 3 rests on bitcoin’s current market depth, and the terminal-state mapping of Section 4 rests on the sixteen-year record of holder behavior. Both are observable independent of prices, and the implications that follow are conditional on the structural account rather than on any particular assumption about what the market currently prices (see also Section 6.1).

First, any verifiable event consistent with one of Track 2’s first two terminal dispositions, in particular a confirmed death of a profile-matching candidate followed by evidence of non-recovery, or a burn transaction of the position, would plausibly be interpreted as a positive signal consistent with a confirmed supply subtraction. The conditional probability of either event in any given near-term window is low, but the magnitude of the response, conditional on occurrence, would be substantial.

Second, practitioners who hedge “Satoshi risk” through options or structured products should recalibrate. The premium paid for such hedges should reflect the bounded Track 1 upper bound and the non-bearish plurality of Track 2 terminal states, not the existential-tail framing.

Third, pricing models that treat the Patoshi coins as “circulating supply discounted by probability of sale” should be replaced by models that treat them as effectively removed supply with a small residual probability of wealth-maximizing liquidation, itself bounded by the mechanical absorption arithmetic of Section 3.


7. The broader class of reflexive liquidation problems

The Satoshi case is an instance of a class of problems in financial economics: positions whose liquidation value is partially or wholly determined by the act of liquidation itself. The class is not new, but its members have tended to be studied in isolation rather than as a unified category. A preliminary taxonomy follows.

Controlling founders of listed companies. Where a founder holds a stake materially larger than the free float, a sale simultaneously adds supply and removes the signal of founder conviction on which the share price partly rests. Barclay and Holderness (1989) and the subsequent blockholder literature document the empirical price effects. The mitigation strategies observed in practice, combinations of charitable transfers and structured secondary offerings, function as partial analogues to Track 1’s patient liquidation.

Estates of deceased artists. The blockage discount in U.S. tax law, sustained at 37 percent in Estate of David Smith (1972) and at an effective 37 percent in Estate of O’Keeffe (1992), formally recognizes that simultaneous sale of a large inventory of works by one artist depresses per-work prices. The standard mitigation is staged distribution managed by a foundation or estate, essentially a decades-long liquidation schedule.

Insider stakes in privately held companies. Where there is no public market, the attempt by an early holder to liquidate via secondary sales can signal to new investors that the entity is overvalued, depressing both the price of the stake and the valuation of subsequent financing rounds.

The Satoshi case shares the structural features of this class but differs in three respects relevant to its 2026 character. First, bitcoin’s market depth, unlike that of a single equity or a private company’s secondary market, has grown large enough to absorb the full position through patient liquidation, a bound unavailable in the traditional blockholder case. Second, the holder’s identity remains unknown, so the information asymmetry is not about a known person’s changed circumstances but about the prior question of whether the holder is alive, rational, and attentive, a different channel than anything in the blockholder literature. Third, bitcoin offers a uniquely credible destruction mechanism: a transaction to an OP_RETURN output is a voluntary, verifiable, and irreversible supply subtraction, and no other asset in the reflexive-liquidation class offers its holders this option.

The destruction option has a theoretical consequence. For an asset whose value partially rests on expected supply constraints, a large holder’s credible destruction option is itself a positive contribution to the asset’s price, independent of whether the option is ever exercised. A marginal buyer can reasonably assign some probability to the destruction event and price bitcoin’s expected supply accordingly. The asymmetry works in favor of the asset: the destruction option adds supply discipline in expectation, the sale option adds supply in expectation, and the latter carries low probability for the holder who actually exists under the Section 4 profile.

The broader implication is that the study of illiquid and reflexive positions should incorporate voluntary-destruction mechanisms where available, and should treat the absence of such mechanisms in traditional assets as a binding constraint on the holders of those assets. For bitcoin specifically, the paper’s claim stands: across both analytical tracks, the Satoshi overhang is bounded on the downside and non-bearish across the terminal states most consistent with the record. It is not the existential tail risk the conventional framing implies.


Acknowledgements

I benefited from conversations with Shan Wang about the logic in the paper. As a matter of disclosure, my first real education about Bitcoin was in 2017 from a friend who had a single machine in his garage mining coin. I bought 1 BTC at that time just for fun, and I’ve not sold it. That’s my only financial interest in Bitcoin.


References

Almgren, R., and N. Chriss (2001). “Optimal Execution of Portfolio Transactions.” Journal of Risk, 3(2): 5-39.

Barclay, M. J., and C. G. Holderness (1989). “Private benefits from control of public corporations.” Journal of Financial Economics, 25(2): 371-395.

Bitcoin Wiki (2024). “OP_RETURN.” en.bitcoin.it/wiki/OP_RETURN. Accessed April 2026.

Blockchain.com (2026). “Total Circulating Bitcoin.” blockchain.com/charts/total-bitcoins. Accessed April 2026.

Bradbury, D. (2014, November 23). “How Dangerous is Satoshi Nakamoto?” CoinDesk. coindesk.com/markets/2014/11/23/how-dangerous-is-satoshi-nakamoto.

BtcDrak, M. Friedenbach, and E. Lombrozo (2015). “BIP 112: CHECKSEQUENCEVERIFY.” Bitcoin Improvement Proposals. github.com/bitcoin/bips/blob/master/bip-0112.mediawiki.

Carreyrou, J., with D. Freedman (2026, April 8). “My Quest to Solve Bitcoin’s Great Mystery.” The New York Times. nytimes.com/2026/04/08/business/bitcoin-satoshi-nakamoto-identity-adam-back.html.

Center for Art Law (2018, March 28). “Blockage Discounts and Artists’ Estates: The De Kooning Post-Mortem.” itsartlaw.org/case-review/blockage-discounts-and-artists-estates-the-de-kooning-post-mortem.

Estate of David Smith v. Commissioner, 57 T.C. 650 (1972), aff’d 510 F.2d 479 (2d Cir. 1975).

Estate of O’Keeffe v. Commissioner, T.C. Memo 1992-210.

Holderness, C. G. (2003). “A Survey of Blockholders and Corporate Control.” FRBNY Economic Policy Review, 9(1): 51-64.

Internal Revenue Code § 1014, 26 U.S.C. § 1014 (Basis of property acquired from a decedent).

IRS Publication 551 (2025). “Basis of Assets.” Internal Revenue Service.

Komlo, C., and I. Goldberg (2021). “FROST: Flexible Round-Optimized Schnorr Threshold Signatures.” In Selected Areas in Cryptography – SAC 2020, Lecture Notes in Computer Science 12804, pp. 34-65. Springer.

Kyle, A. S. (1985). “Continuous Auctions and Insider Trading.” Econometrica, 53(6): 1315-1335.

Lerner, S. D. (2013, April 17). “The Well Deserved Fortune of Satoshi Nakamoto, Bitcoin creator, Visionary and Genius.” Bitslog. bitslog.com/2013/04/17/the-well-deserved-fortune-of-satoshi-nakamoto.

Lerner, S. D. (2020, August 31). “Protection Over Profit: What Early Mining Patterns Suggest About Bitcoin’s Inventor.” CoinDesk. coindesk.com/tech/2020/08/31/protection-over-profit-what-early-mining-patterns-suggest-about-bitcoins-inventor.

Nakamoto, S. (2008). “Bitcoin: A Peer-to-Peer Electronic Cash System.” bitcoin.org/bitcoin.pdf.

Nick, J., T. Ruffing, and Y. Seurin (2021). “MuSig2: Simple Two-Round Schnorr Multi-Signatures.” In Advances in Cryptology – CRYPTO 2021, Lecture Notes in Computer Science 12825, pp. 189-221. Springer.

Rule 10b5-1, 17 C.F.R. § 240.10b5-1 (Trading on the basis of material nonpublic information in insider trading cases). U.S. Securities and Exchange Commission.

Shamir, A. (1979). “How to Share a Secret.” Communications of the ACM, 22(11): 612-613.

Soros, G. (1987). The Alchemy of Finance. Simon and Schuster.

Soros, G. (2013). “Fallibility, reflexivity, and the human uncertainty principle.” Journal of Economic Methodology, 20(4): 309-329.

Spence, A. M. (1973). “Job Market Signaling.” Quarterly Journal of Economics, 87(3): 355-374.

Todd, P. (2014). “BIP 65: OP_CHECKLOCKTIMEVERIFY.” Bitcoin Improvement Proposals. github.com/bitcoin/bips/blob/master/bip-0065.mediawiki.


Appendix A. Scenario arithmetic for the Track 1 absorption bound

This appendix presents three scenarios bounding the cumulative price impact of a patient OTC liquidation of the 1.148 million BTC Patoshi position relative to counterfactual. The analysis is stylized and partial-equilibrium. It is intended to discipline the qualitative claim of Section 3 that the mechanical bear case is bounded and to make the sensitivity to component assumptions explicit. The scenarios are not point forecasts.

A.1 Common parameters

Position size: 1.148 million BTC, consistent with the Patoshi cluster identified in Lerner (2013, 2020).

Effective float: approximately 16.3 million BTC, constructed as 20.01 million total mined (Blockchain.com, 2026) less approximately 3.7 million lost coins (Chainalysis-type adjustment).

Position as share of effective float at t=0: approximately 7.0 percent.

Reference price: 80,000 USD per BTC. Gross nominal position value: approximately 92 billion USD.

Demand elasticity ε_D is defined such that a 1 percent permanent supply increase, holding demand constant, produces a price change of approximately (1 + 0.01)^(-1/ε_D) – 1, which for small shifts is approximately -1/ε_D percent. The published empirical literature on bitcoin demand elasticity is thin, and we do not treat any narrow range as established. The interval from ε_D = 0.3 (highly inelastic) to ε_D = 1.5 (closer to equity-like) is used here as a heuristic sensitivity range intended to span the plausible calibrations that have been proposed in practitioner and academic discussion, not as a confidence interval. Reported sensitivities should be read accordingly.

A.2 Partial-equilibrium impact by elasticity

In a partial-equilibrium model with constant demand elasticity, a full liquidation of the Patoshi position returns 7.0 percent of effective float to the market, producing a price change relative to counterfactual of approximately (1.07)^(-1/ε_D) – 1. The table below reports this impact for three elasticity values spanning the heuristic range described in A.1.

Elasticity ε_D Cumulative partial-equilibrium impact
1.5 (equity-like) approximately -4.4 percent
0.7 (central) approximately -9.2 percent
0.3 (highly inelastic) approximately -20.2 percent

Elasticity is the dominant source of variation. Reducing ε_D from 0.7 to 0.3 roughly doubles the impact; raising it from 0.7 to 1.5 roughly halves it. Demand growth over the disposition period affects the absolute price path (bitcoin’s realized price at horizon is higher under higher demand growth in both the counterfactual and the disposition cases), but under the constant-elasticity log-linear form used here the relative impact between the two cases is invariant to the demand-growth trajectory. The analysis is static partial-equilibrium: it holds elasticity constant and treats market-makers as passive. A dynamic equilibrium in which strategic market-makers internalize the announced program over a multi-year horizon would likely produce smaller realized impact than reported here, because the anticipated supply would be priced into positioning ahead of the flow rather than absorbed on contact.
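The impacts in the table follow directly from the constant-elasticity form stated in A.1. A minimal check, using only the 7.0 percent float share and the three heuristic elasticities:

```python
# Reproduce the partial-equilibrium impacts from the constant-elasticity
# form: impact = (1 + s)^(-1/eps) - 1, with s = 0.07 (the Patoshi position
# as a share of effective float).
supply_shift = 0.07

impacts = {eps: (1 + supply_shift) ** (-1 / eps) - 1
           for eps in (1.5, 0.7, 0.3)}

for eps, impact in sorted(impacts.items(), reverse=True):
    print(f"eps = {eps}: {impact:+.1%}")
# eps = 1.5: -4.4%; eps = 0.7: -9.2%; eps = 0.3: -20.2%
```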

A.3 Execution-friction adjustment

The partial-equilibrium impact above is the permanent component attributable to the supply shift. Execution through public markets introduces an additional temporary component whose magnitude depends on participation rate and execution quality. Empirical anchors bracket this component. Practitioner accounts describe disciplined OTC execution as producing small, non-accumulating per-block impact, well below the impact of public-venue state-actor sales. Sloppy or highly public execution contributes more: the German BKA 2024 episode moved approximately 50,000 BTC in weekly public tranches and coincided with a 15 to 20 percent decline, attributable in part to leverage unwinds, macro effects, and execution visibility rather than permanent supply shift.

A.4 Three scenarios

Scenario A (conservative). ε_D = 1.5, disciplined OTC execution, 12-year horizon, participation rate approximately 0.14 percent of real daily spot volume, pace approximately 95,600 BTC per year. Permanent impact approximately -4.4 percent. Execution friction approximately 1 to 2 percent. Total cumulative impact relative to counterfactual: approximately -5 to -6 percent.

Scenario B (base). ε_D = 0.7, disciplined OTC execution, 10-year horizon, participation rate approximately 0.17 percent of real daily spot volume, pace approximately 114,800 BTC per year. Permanent impact approximately -9.2 percent. Execution friction approximately 2 to 3 percent. Total cumulative impact: approximately -11 to -12 percent.

Scenario C (aggressive). ε_D = 0.3, mixed execution quality (partial public venue), 5-year horizon, participation rate approximately 0.34 percent of real daily spot volume, pace approximately 229,600 BTC per year. Permanent impact approximately -20.2 percent. Execution friction approximately 3 to 5 percent. Total cumulative impact: approximately -23 to -25 percent.
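The scenario totals can be assembled mechanically from the components above. A sketch: the permanent component is the constant-elasticity impact of A.2, and the friction bands are the ranges stated for each scenario; the variable names are illustrative.

```python
# Assemble each scenario's total impact: permanent elasticity-driven
# component minus the temporary execution-friction band (A.2 plus A.3).
POSITION_BTC = 1_148_000   # Patoshi cluster (Lerner 2013, 2020)
FLOAT_SHARE = 0.07         # position as share of effective float

# name: (elasticity, horizon in years, friction low, friction high)
scenarios = {
    "A (conservative)": (1.5, 12, 0.01, 0.02),
    "B (base)":         (0.7, 10, 0.02, 0.03),
    "C (aggressive)":   (0.3,  5, 0.03, 0.05),
}

results = {}
for name, (eps, years, f_lo, f_hi) in scenarios.items():
    permanent = (1 + FLOAT_SHARE) ** (-1 / eps) - 1
    pace = POSITION_BTC / years            # BTC sold per year
    results[name] = (pace, permanent - f_hi, permanent - f_lo)
    print(f"{name}: {pace:,.0f} BTC/yr, total "
          f"{permanent - f_hi:+.1%} to {permanent - f_lo:+.1%}")
```

Running this recovers the stated paces (approximately 95,600, 114,800, and 229,600 BTC per year) and total-impact bands of roughly -5 to -6, -11 to -12, and -23 to -25 percent.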

A.5 Sensitivity and anchors

The dominant sensitivity is to elasticity. Execution quality shifts the realized impact by a factor of roughly 3 to 5 relative to the permanent-component calculation, consistent with the observed gap between the German BKA anchor (approximately -15 to -20 percent under public-venue execution) and the Silk Road Marshals-auction anchors (approximately -2 to -5 percent per tranche under institutional execution). Scenario C approaches the German anchor under its adverse assumption stack; Scenario A approaches the Silk Road anchor under its favorable stack; Scenario B lies between them.

A.6 Takeaway

Across plausible calibrations of elasticity, execution quality, selldown pace, and participation rate, the cumulative mechanical price impact of a patient Satoshi OTC liquidation falls in the range from approximately 5 percent (conservative) to approximately 25 percent (aggressive) relative to counterfactual. The central scenario clusters near 10 to 12 percent. No plausible calibration supports the existential-tail framing that implicitly values the position at a fraction of pre-event price. The bear case is bounded by the arithmetic.

allocation of floor area to function of housing

Minimum Spatial Housing Requirements for Human Flourishing

Karl T. Ulrich

University of Pennsylvania. The Wharton School. Operations, Information, and Decisions Department, Philadelphia, PA 19104, USA

Ulrich, K. T. (2025). “Minimum Spatial Housing Requirements for Human Flourishing.” Buildings, 15.
(full text below PDF)

Abstract

This study defines evidence-based minimum internal floor areas required to support long-term residential use across different household types. It addresses the following question: what is the smallest viable floor area that supports sustained occupancy without persistent stress, conflict, or turnover? An integrative review method was employed, drawing from behavioural studies in environmental psychology, international regulatory standards, and real-world market data. The analysis focuses on essential domestic functions including sleep, hygiene, food preparation, storage, social interaction, and work. Quantitative findings from tenancy surveys, post-occupancy research, and market performance data indicate that residential units below 30 square metres for single occupants and 45 square metres for couples are consistently associated with reduced satisfaction and shorter tenancies. Regulatory minimums across diverse jurisdictions tend to converge near these same thresholds. The study proposes technical minimums of 30, 45, and 60 square metres for one-, two-, and three-person households, respectively. These values reflect functional lower bounds rather than ideal or aspirational sizes and are intended to inform performance-based housing standards.

Keywords: minimum home size; affordable housing; floor area; unit size; housing standards; micro housing; nano housing; tiny homes

1. Introduction

In the face of a global affordability crisis, housing systems increasingly rely on compact dwellings to expand supply in urban areas. However, the pursuit of higher densities and lower construction costs often proceeds without robust empirical guidance on how small is too small. While attributes such as energy efficiency, structural integrity, and environmental impact are routinely measured and regulated, internal floor area, the most fundamental spatial parameter of dwelling performance, is inconsistently addressed in most building codes [1,2].

A growing body of research suggests that below certain spatial thresholds, residential dwellings may no longer support the basic conditions for health, autonomy, and psychological well-being. Evidence from environmental psychology, building design, post-occupancy studies, and housing markets points to consistent patterns in how households respond to limited living space. These include rising levels of stress, shorter tenancy durations, reduced satisfaction, and increased rates of residential turnover [3–5].

This paper addresses the following central research question:

What is the minimum internal floor area required for a housing unit to support long-term human flourishing?

To answer this question, the study draws on evidence from multiple domains, including environmental psychology, regulatory frameworks, and market behaviour. Rather than focusing on average or desirable housing sizes, it aims to identify the technical and functional minimum: the smallest internal floor area that allows a household to carry out essential domestic activities over time without persistent stress, conflict, or risk of displacement.

Recent policy developments and market research reinforce the urgency of establishing empirically grounded spatial minimums. The 2025 update to the UK national space standards affirms 37 square metres as the lowest acceptable internal area for a single-person dwelling, reflecting a continued reliance on point-value thresholds in regulatory guidance [6]. Recent analysis of English space standards reveals ongoing tensions between affordability pressures and adequacy requirements in social housing provision [7]. In North America, compact living is increasingly framed as a mainstream strategy for achieving affordability and urban density [8]. Empirical evidence from land-use reforms shows that relaxing regulatory constraints can increase available living space while reducing per-unit cost burdens [9]. These developments underscore the need to define lower spatial bounds using behavioural and functional indicators, not solely historical precedent or policy negotiation.

This study differs from previous work by focusing not on average or desirable unit sizes, but on identifying functional lower bounds for long-term residential use. While many studies explore housing quality, affordability, or density in isolation, few integrate evidence across behavioural psychology, regulatory standards, and market performance to define minimum viable space. This triangulated approach yields floor area thresholds that are both technically grounded and practically relevant. By aligning spatial adequacy with real-world behaviour, the findings offer a performance-based framework that can inform policy, design, and code development in diverse contexts.

1.1. Supporting Questions and Policy Relevance

In identifying these minimum floor area thresholds, the study also explores several supporting questions:

  • What are the spatial requirements for core domestic functions such as sleep, hygiene, food preparation, storage, social interaction, and remote work?
  • How do regulatory standards for minimum dwelling size vary across jurisdictions, and how well do they align with behavioural and psychological evidence?
  • What does market behaviour reveal about the practical limits of compact housing, especially in high-cost urban environments?
  • At what point does spatial inadequacy lead to measurable declines in satisfaction, tenure stability, or mental health?

These questions have direct implications for several pressing housing challenges. Among them are:

  • What is the lowest feasible cost for delivering liveable housing at scale?
  • How much residential floor area is required to support a given urban population within environmental constraints?
  • What unit sizes are most appropriate for modular and prefabricated housing systems?
  • How should housing standards evolve to address the rise in single-person households and the increasing prevalence of remote work?

None of these issues can be addressed meaningfully without a baseline understanding of how much space people need, at a minimum if not an ideal, to sustain daily life in a stable, healthy, and autonomous manner.

1.2. Global Household Size and Composition: A Foundation for Space Standards

Average household size has fallen worldwide for more than four decades. United Nations data show a decline from approximately 4.9 persons per household in 1980 to about 4.0 in 2020, with the median country now near or below four persons [10,11]. The trend is especially pronounced in high-income economies. The OECD reports a current average household size of 2.4 persons, with more than one third of member countries now composed primarily of one- and two-person households [12,13].

Rapidly ageing East Asian societies display the same pattern. Japan’s 2020 census records 2.3 persons per household nationally and 1.9 in the Tokyo metropolitan core [14]. South Korea reports 2.2 persons nationwide and fewer than 2.0 in Seoul [15].

Small households already dominate urban housing demand. Across large European metropolitan areas, single-person dwellings account for 35 to 45 percent of occupied units, while two-person households add another 30 to 35 percent [13]. United Nations Habitat analysis finds a similar structure in rapidly urbanising regions of Latin America and East Asia, although three-person households remain slightly more common there [16]. Table 1 summarises these proportions and links them to their principal drivers: population ageing, delayed marriage, declining fertility, and the rise in solo living among both younger and older adults [17,18].

Table 1. Three Size Categories of Households.

Household Type          Estimated Share   Trend                 Demographic Basis                         Design Relevance
Single-Person           28–35%            Strongly increasing   Ageing, urban migration, autonomy         Most common compact unit
Two-Person              32–38%            Increasing            Couples without children, retirees        Key transitional household
Three- to Four-Person   30–35%            Stable/Declining      Nuclear families, emerging middle class   Core benchmark for family units

Multigenerational and group households continue to be significant in parts of Africa, South Asia, and the Middle East, yet their spatial requirements differ enough to warrant separate treatment later. For the compact-dwelling typology that follows, the focus remains on one- to four-person nuclear households, which represent the bulk of future housing demand in urbanised regions.

1.3. Methodological Approach

This study uses an integrative review methodology to identify evidence-based minimum internal floor areas suitable for long-term residential use. The aim is to synthesise findings from multiple domains, including environmental psychology, building regulation, and housing market behaviour, to establish the functional lower bounds of spatial adequacy. An integrative review is appropriate in this context because the evidence base includes both peer-reviewed research and grey literature and spans diverse methodological formats [19].

The analysis proceeded in three phases. First, spatial requirements for core domestic functions were identified from studies in environmental psychology, ergonomics, and post-occupancy evaluation. These functions include sleeping, hygiene, food preparation, social interaction, storage, and remote work. Second, national and local building codes were reviewed to assess formal minimum size requirements. Sources included official building standards, planning documents, and housing regulations across multiple jurisdictions. Third, market typologies and occupancy outcomes were examined in cities where compact dwellings are widely built and occupied. Data sources included tenancy duration surveys, resident satisfaction reports, real estate market analyses, and public-sector housing datasets.

Findings from the three domains were organised using a comparative matrix. Where psychological thresholds, regulatory minimums, and observed market behaviour consistently aligned, a floor area threshold was proposed. These proposed values are defined as technical lower bounds for sustained residential use. They are not intended as normative goals or average sizes.

Because the study integrates both academic and non-academic materials, no systematic keyword protocol or database screening process was used. Sources were selected for empirical specificity (such as quantified thresholds for domestic activities), jurisdictional diversity (including regulatory frameworks from multiple continents), and conceptual relevance to spatial sufficiency and long-term residential stability. The review drew from more than 100 documents, including peer-reviewed studies, government standards, industry reports, and post-occupancy surveys. Although no formal date cut-off was imposed, the majority of sources were published after 2000, with priority given to post-2010 studies where available. This flexible approach supports evidence-informed design and policy decisions where minimum standards must be reconciled with lived outcomes.

2. Spatial Requirements for Well-Being and Function

A building’s spatial adequacy is a key component of its performance. While regulatory minimums and market behaviour often reflect political compromise or economic constraint, they do not necessarily ensure long-term psychological comfort, physical health, or functional usability. This section synthesises research from environmental psychology, ergonomics, post-occupancy evaluation, and time-use studies to identify the thresholds below which housing space ceases to support core human activities and enable flourishing. These insights inform our proposed floor area minimums by grounding them in behavioural and functional performance rather than availability, aesthetics, or tradition. A growing body of research has examined the relationship between spatial parameters and occupant well-being at the urban and building scale [20–22]. However, these studies primarily focus on external or neighbourhood-scale spatial qualities, whereas this work addresses the spatial adequacy of the home itself.

2.1. Crowding, Density, and Psychological Stress

Crowding refers to the subjective experience of having insufficient personal space, while density refers to the number of people per unit area. Studies consistently show that it is perceived crowding, not density alone, that predicts negative psychological outcomes [21]. Evans et al. [3] found that residential crowding, often defined as less than 15–20 square metres per person or more than one person per room, is associated with chronic stress, cognitive delays in children, and elevated cortisol levels. These psychological impacts can be quantified using established well-being valuation methodologies [23,24], enabling systematic assessment of the broader social costs of housing inadequacy. These effects tend to occur once space drops below key thresholds, suggesting the presence of spatial tipping points in psychological resilience.

Research on social housing in Great Britain provides additional evidence of these threshold effects. Hickman [25] demonstrates that inadequate housing conditions significantly impair residents’ ability to maintain social connections and access “third places” for community interaction, with spatial constraints being a key factor in social isolation. Furthermore, longitudinal studies of tenancy sustainment reveal that housing inadequacy, including insufficient space, is strongly associated with tenancy breakdown and residential instability [26]. These findings reinforce the importance of establishing evidence-based spatial minimums that support not only individual well-being but also community cohesion and housing stability.

2.2. Privacy, Control, and Territorial Function

The ability to regulate one’s environment through visual, acoustic, and spatial boundaries is central to housing performance. Ozaki and Lewis [27] identify four domains of privacy: personal, informational, territorial, and acoustic. When these are compromised, occupants report increased psychological distress and decreased satisfaction. Kopec [28] finds that households in compact units often struggle to maintain behavioural autonomy, particularly in relation to partners or children. Our synthesis suggests that below 30 square metres for single adults and 45 square metres for couples, housing units frequently fail to support the full range of privacy functions, even with good design.

2.3. Activity-Based Space Requirements

Post-occupancy evaluations and time-use studies offer detailed insight into the spatial requirements of essential domestic activities. While cultural context and layout quality affect thresholds, certain space needs recur across geographies and dwelling types.

  • Sleeping typically requires 5 to 7 square metres per person, accounting for bed size, circulation, and storage. A single bed with access on one side requires approximately 4.5 square metres, while a double bed needs 6 to 8 square metres for comfort [29,30].
  • Food preparation and dining require a minimum of 4 to 6 square metres per household. Research on kitchen ergonomics shows that functionality depends more on workflow efficiency than household size [31,32]. Kitchens smaller than 3.5 square metres are associated with reduced satisfaction and increased time inefficiency [33].
  • Hygiene facilities require 3 to 4 square metres to comfortably accommodate toilet, basin, and shower fixtures. Although compact bathrooms can function in as little as 2.5 square metres, users consistently prefer bathrooms of at least 3.5 square metres for comfort and accessibility [29].
  • Socialising and relaxation typically require 7 to 12 square metres, depending on household size. Single-person households can function with 6 to 8 square metres for a small seating area and media use, while larger households require more space to accommodate multiple users simultaneously [34,35].
  • Work or study requires 2 to 3 square metres per working occupant. This accommodates a desk, task chair, and minimal storage. Spaces below 2 square metres are associated with reduced productivity and increased fatigue [36–38].
  • Storage needs average 3 to 4 square metres per person, including space for clothing, equipment, seasonal items, and household supplies. When this falls below 2.5 square metres per person, spatial disorder and visual clutter increase significantly [29,39,40].
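Summing the per-activity figures above offers a quick consistency check on whole-unit minimums. The sketch below (Python; the single-adult framing and the decision to use the published low and high ends of each range are illustrative assumptions, and the simple sum ignores circulation space between zones, so it is a lower-bound check only):

```python
# Per-activity floor-area ranges (m²) for a single-adult household,
# taken from the review above. Socialising uses the single-person figure.
ACTIVITY_RANGES = {
    "sleeping": (5, 7),
    "food_prep_dining": (4, 6),
    "hygiene": (3, 4),
    "socialising": (6, 8),
    "work_study": (2, 3),
    "storage": (3, 4),
}

def total_range(ranges):
    """Sum the low and high ends of each activity's range."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high

low, high = total_range(ACTIVITY_RANGES)
print(f"Activity-based minimum for a single adult: {low}-{high} m²")
# → Activity-based minimum for a single adult: 23-32 m²
```

The resulting 23–32 m² band brackets the roughly 30 m² single-occupant threshold that the behavioural and market evidence in later sections converges on.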

When units fall below these thresholds, “activity compression” occurs: essential tasks begin to overlap or displace one another, so that sleeping occurs in living spaces, eating happens on beds, or work is done in hallways. Bratt [41] describes how such compression can degrade usability and lead to residential dissatisfaction. Lawrence [42] and Després [43] emphasise that these spatial compromises are not just inconvenient, but symbolically and psychologically disruptive. Over time, the cumulative effect of compressed or improvised functions erodes the domestic environment’s ability to support stability, identity, and autonomy.

2.4. Adaptation and Design Moderators

Residents employ a variety of coping strategies in very small dwellings. Temporal zoning, selective use of common areas, personalisation of limited surfaces, and cognitive reframing can all moderate the feeling of crowding [44]. High ceilings, abundant daylight, generous built-in storage, and carefully framed views increase perceived spaciousness and postpone fatigue. Even so, longitudinal evidence shows that adaptation has clear limits once floor area drops below roughly twenty square metres per person.

Several dense-city studies illustrate those limits. In Hong Kong, sixty per cent of households occupying flats smaller than twenty-five square metres rate their living space as unsatisfactory, and a majority intend to relocate within two years [45,46]. In Tokyo, micro-apartments under twenty square metres support average tenancies of only 1.8 years, whereas similar buildings with twenty-five square metres or more achieve average stays of 3.2 years [47]. Hong Kong turnover is likewise highest in nano-flat households that share less than twenty square metres per person [5]. Mumbai surveys show a parallel pattern, with dissatisfaction and intent to exit rising sharply after twelve to eighteen months in eighteen to twenty-five square metre “nano homes” [48].

Good design can reduce noise, create visual depth, and provide multi-functional furniture that stretches usability for short periods; however, extended residence below the twenty-square-metre threshold consistently produces higher stress, clutter, and social friction. Across cases, ingenuity and supportive management can delay but not eliminate the physiological and psychological burdens imposed by extreme spatial constraint.

2.5. Cultural Modifiers of Spatial Expectation

Cross-cultural research shows that acceptable space standards vary with social norms, and that urban context, including the spatial parameters of neighbourhoods and access to green spaces, also significantly affects well-being [22,49]. East Asian residents, for example, often rate smaller units as acceptable due to norms of compact living, floor-sitting, and shared public amenities. Ozaki and Lewis [27] found that Japanese households consider 20% less space acceptable than matched British counterparts. However, Whiteford and Hoff [50] show that all cultures share basic needs for control, quiet, and autonomy, indicating that minimums can vary somewhat but cannot be abandoned altogether.

2.6. Children and Developmental Needs

Children are particularly sensitive to spatial inadequacy. Overcrowded homes (under 8 m2 per child) are linked to lower academic performance, behavioural problems, and impaired sleep [51]. Developmental needs include quiet study space, physical separation for sleep, and play areas. These requirements support higher minimums for family dwellings.

2.7. Remote Work and the New Domestic Function

Post-pandemic housing must now accommodate remote work as a standard rather than exceptional domestic function. The rapid shift to home-based work has fundamentally altered spatial requirements, with implications for minimum housing standards that previously assumed work occurred outside the dwelling unit.

Effective home workspaces require an additional 3–5 m2 per worker beyond traditional residential functions, with minimum dimensions of 1.2 m × 1.5 m to accommodate desk, chair, and equipment storage [52,53]. This represents a 15–20% increase in space requirements for households with remote workers. Studies of telework environments indicate that workspaces below 2.5 m2 per user result in reduced productivity, increased physical discomfort, and higher rates of work-related stress [54,55].

The quality of home workspace significantly affects both work performance and residential satisfaction. Research during COVID-19 lockdowns found that lack of dedicated workspace in units under 30 m2 was strongly associated with depressive symptoms, anxiety, and decreased job satisfaction [56,57]. Workers in compact housing without defined work zones reported difficulty maintaining work–life boundaries, leading to extended working hours and reduced recovery time [58].

Work zones must be visually and acoustically distinct from other domestic functions to support cognitive performance and psychological boundaries [59,60]. Open-plan arrangements where work occurs in living or sleeping areas show measurably lower task performance and higher stress indicators compared to spatially separated workspaces [61,62]. Even temporary visual barriers or acoustic separation can improve work effectiveness in constrained spaces [63,64].

Ergonomic requirements for sustained computer work, including proper desk height (72–76 cm), chair clearance (minimum 60 cm behind desk), and screen distance (50–70 cm), establish minimum spatial envelopes that cannot be compressed without health consequences [65,66]. Studies of home-based workers show increased musculoskeletal problems when workspace dimensions fall below ergonomic minimums [67,68].

The acoustic environment proves equally critical for remote work functionality. Research indicates that background noise levels above 50 dB significantly reduce cognitive performance and increase fatigue in knowledge work [69,70]. Compact housing often struggles to provide acoustic separation between work and domestic activities, particularly in units below 40 m2 where spatial buffering is limited [71].

Storage requirements for work equipment add approximately 0.5–1 m2 per remote worker, including space for technology, documents, and professional materials that cannot be integrated with household storage [52,72]. Inadequate work storage leads to spatial spillover that compromises both work efficiency and domestic function.

These findings suggest that housing units intended to accommodate remote work require baseline increases of 20–25% over pre-pandemic spatial standards. For single-person units, this elevates functional minimums from approximately 25 m2 to 30 m2, while couple units require approximately 45 m2 to maintain both residential quality and work functionality. Units that cannot accommodate these expanded requirements may function for short-term residence but prove inadequate for sustained occupancy in an economy increasingly dependent on home-based work.
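As a back-of-envelope check on the adjusted minimums, the additive components cited above can be summed. The sketch assumes the 3–5 m2 workspace and 0.5–1 m2 work-storage figures from this section, applied to a 25 m2 pre-pandemic single-person baseline:

```python
# Remote-work adjustment to the single-person functional minimum,
# using the component figures cited in the text (assumed additive).
BASE_SINGLE = 25.0           # pre-pandemic functional minimum, m²
WORKSPACE = (3.0, 5.0)       # additional workspace per remote worker, m²
WORK_STORAGE = (0.5, 1.0)    # work-equipment storage per worker, m²

low = BASE_SINGLE + WORKSPACE[0] + WORK_STORAGE[0]
high = BASE_SINGLE + WORKSPACE[1] + WORK_STORAGE[1]
pct_low = (low / BASE_SINGLE - 1) * 100
pct_high = (high / BASE_SINGLE - 1) * 100
print(f"Adjusted single-person minimum: {low:.1f}-{high:.1f} m² "
      f"({pct_low:.0f}-{pct_high:.0f}% increase)")
```

The upper end of the resulting 28.5–31.0 m2 range reproduces the roughly 30 m2 post-pandemic single-person minimum; the lower end shows that the size of the uplift depends on which component values are taken.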

The thresholds identified in behavioural and post-occupancy research provide a functional basis for determining spatial adequacy. These findings set reference points for evaluating whether existing regulations reflect the space required to support long-term residential well-being. The next section surveys regulatory standards to assess how policy frameworks correspond to the evidence on domestic activity needs and psychological thresholds.

3. Global Regulatory Standards and Spatial Minimums

While spatial adequacy is grounded in functional and behavioural needs, housing regulations set the operational boundaries for what can be legally built. This section reviews regulatory minimum dwelling sizes across diverse jurisdictions to assess the extent to which existing standards align with or diverge from empirically defined spatial thresholds.

3.1. Regulatory Diversity and Spatial Baselines

Minimum dwelling-size rules differ sharply from one world region to the next. A west-to-east review, following the ordering in Table 2, highlights both the variety of legal instruments and the narrow band into which many numeric thresholds ultimately converge. Recent analyses underscore the need for clearer alignment between minimum standards and lived spatial needs, especially in the context of regulatory reforms in East Asia [73].

North America begins with a code baseline rather than a whole-unit floor. All fifty US states adopt the International Building or Residential Code, whose only size mandate is a habitable room of at least 11 m2; every dwelling must contain at least one such room [74]. Canada’s Ontario Building Code raises that figure to 13.5 m2 for a living room, while most bedrooms must be at least 7 m2 [75]. Large Canadian cities then overlay per-person limits: Toronto requires a minimum of 9 m2 of usable floor area for each adult occupant, enforceable through its property-standards by-law [76]. In short, North American regulation relies on room-by-room rules or occupancy caps rather than a fixed flat-size plateau.

Latin America and the Caribbean show a split between market housing and social-housing programmes. Mexico City’s 2018 construction code sets a statutory lower bound of 25 m2 net floor area for any new apartment, giving the region’s most compact private-sector limit [77]. Brazil and Chile impose larger figures but only where federal or national subsidies are involved: Brazil’s relaunched Minha Casa Minha Vida requires 41.5 m2 for an apartment and 40 m2 for a single-family house [78], while Chile’s DS-49 programme mandates 40 m2 for subsidised dwellings [79]. Outside those programmes, private developments can be smaller.

Europe embeds minimums directly in national legislation but uses two distinct logics. England fixes a studio plateau of 37 m2 gross internal area through the Nationally Described Space Standard [80]. Sweden’s building code recognises 35 m2 as the lower limit for a self-contained unit because many accessibility concessions apply only when the dwelling is larger [81]. Germany, France, and the Netherlands regulate crowding instead: Germany’s Länder bar lettings that fall below about 9 m2 per adult [82], France requires 14 m2 per person for the first four occupants [83], and the Dutch Bouwbesluit demands at least one living area of 18 m2 in every dwelling [84]. Despite different metrics, these rules cluster between 35 m2 for a single occupant and 14–18 m2 per person in multi-person units.

Africa combines per-room codes with programme minima. Kenya’s draft National Building Code sets 7 m2 usable floor area for every habitable room, creating a practical lower bound for one-room units if the draft is adopted [85]. Lagos State’s planning standards require 10.8 m2 for each habitable room, slightly higher than Kenya’s figure [86]. South Africa’s subsidy programme for Reconstruction and Development Programme housing mandates 40 m2 gross floor area for a detached house, but that rule applies only to publicly funded units [87].

South Asia shows a single national guideline. India’s Affordable Housing in Partnership scheme fixes 25 m2 carpet area as the minimum self-contained dwelling that can receive central subsidy; states vary in their own policies, with some (for example Maharashtra) raising the floor to about 27 m2 [88].

East and Southeast Asia illustrate every tool in the regulatory toolbox. Japan’s Building Standards Act sets no flat-size floor but insists every habitable room be at least 7 m2 internal area [89]; one-room apartments therefore start at that point but market practice in Tokyo rarely goes below 15 m2 [90]. Hong Kong imposes a 26 m2 saleable-area minimum on all new private projects through land-lease conditions, while its public-rental sector follows a 7 m2 per-person allocation rule [91]. Singapore limits private and public studios outside the central area to 35 m2 gross floor area [92]. Mainland China’s national design code fixes 22 m2 usable floor area for a self-contained dwelling, though some provinces accept smaller single-occupant units [93]. Together these standards span a range from per-room minima (Japan) to whole-unit floors that rise from 22 m2 in China, through 26 m2 in Hong Kong, to 35 m2 in Singapore.

Table 2. Standards for minimum housing size by region.

Region | Jurisdiction | Min. Size (m2) (Dominant Household) | Metric † | Enforcement ‡ | Sector | Sources
North America | United States (IBC/IRC model code) | 11 m2 (one-room dwelling) | IFA | S | Private | [74]
North America | Canada—Ontario Building Code | 13.5 m2 (living room) | IFA | S | Private | [75]
North America | Toronto (Property Standards) | ≥9 m2/person | IFA | A | Private/Public | [76]
Latin America and Caribbean | Mexico—Mexico City Construction Code | 25 m2 (dwelling) | IFA | S | Private | [77]
Latin America and Caribbean | Brazil—Minha Casa Minha Vida (programme) | 41.5 m2 (apartment); 40 m2 (house) | IFA | A | Public programme | [78]
Latin America and Caribbean | Chile—DS49 Social Housing (programme) | 40 m2 (dwelling) | IFA | A | Public programme | [79]
Europe | United Kingdom | 37 m2 (1 person, 1-storey) | GIA | S | Private | [80]
Europe | Sweden | 35 m2 (1 person) | GIA | S | Private | [81]
Europe | Germany | ≥9 m2/person | IFA | S | Private | [82,94]
Europe | Netherlands | 18 m2 (living-space benchmark) | IFA | S | Private | [2,4,95]
Europe | France | 14 m2/person (≤4 persons) | IFA | S | Private | [83]
Africa | Kenya—National Building Code 2024 (draft) | 7 m2/habitable room | IFA | S (draft) | Private | [85]
Africa | Nigeria—Lagos State | 10.8 m2/habitable room | IFA | S | Private | [86]
Africa | South Africa—RDP Housing Norm | 40 m2 (subsidised house) | GFA | S | Public programme | [87]
South Asia | India—Affordable-Housing Guidelines | 25 m2 (dwelling) | CA | A | Public–private | [88]
East and Southeast Asia | Tokyo (Japan) | 7 m2/habitable room | IFA | S | Private | [89]
East and Southeast Asia | Hong Kong | 26 m2 (flat) | SA | L | Private | [91,96]
East and Southeast Asia | Singapore | 35 m2 (studio) | GFA | S | Private | [92]
East and Southeast Asia | Mainland China | 22 m2 (dwelling) | IFA | S | Private | [93]
Oceania | Australia—New South Wales | 35 m2 (studio) | IFA | S | Private | [97]
Oceania | New Zealand—Auckland Unitary Plan | 35 m2 (self-contained unit) | GFA | Z | Private | [98]

† Metric codes—IFA = internal floor area; SA = saleable area; GIA = gross internal area; CA = carpet area; GFA = gross floor area. ‡ Enforcement codes—S = statutory building/planning rule; L = mandatory land-lease/development agreement; A = administrative guideline or allocation rule; Z = zoning overlay.

Oceania closes the sweep with parallel state and city-level rules. New South Wales requires 35 m2 internal floor area for a studio and larger plateaus for bigger flats, a template most Australian jurisdictions now follow [97]. New Zealand’s Auckland Unitary Plan adopts the same numeric threshold, 35 m2 gross floor area, for any self-contained dwelling in the city’s medium-density zones [98].

Despite wide variation in enforcement tools, including statutes, land-lease clauses, programme rules, and zoning overlays, the numeric floors for long-term single occupancy cluster between 22 m2 and 37 m2, with many jurisdictions converging on the 30–35 m2 band. Where standards are expressed per person, figures fall between 7 m2 and 14 m2, again bracketing the 10 m2 mark. These convergences support the minimum thresholds advanced later in this paper while also revealing the political and economic pathways by which different societies pursue spatial adequacy.
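The convergence claim can be made concrete by extracting the whole-unit, single-occupancy floors from Table 2 and summarising them. The sketch below (Python; the selection of which rows count as whole-unit single-occupancy minimums is my reading of the table, not a classification given in the source):

```python
import statistics

# Whole-unit minimums for single occupancy, read from Table 2 (m²).
# Per-room and per-person rules are excluded as non-comparable.
unit_floors = {
    "Mexico City": 25, "United Kingdom": 37, "Sweden": 35,
    "India": 25, "Hong Kong": 26, "Singapore": 35,
    "Mainland China": 22, "New South Wales": 35, "Auckland": 35,
}

values = sorted(unit_floors.values())
print(f"range: {min(values)}-{max(values)} m², "
      f"median: {statistics.median(values)} m²")
# → range: 22-37 m², median: 35 m²
```

The 22–37 m2 range and the median of 35 m2 match the clustering described in the text, with the median sitting squarely in the 30–35 m2 band.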

3.2. Historical Patterns and Regulatory Evolution

Minimum space rules first appeared in nineteenth-century public health reforms that targeted overcrowded tenements in rapidly industrialising cities. Early measures, such as Britain’s 1890 Housing of the Working Classes Act and New York’s 1901 Tenement House Act, emphasised daylight, ventilation, and a minimum cubic volume of air per person [99,100]. After the Second World War, the welfare-state consensus encouraged many countries to adopt far more generous space norms. In England, the Parker Morris standards required about eighty-eight square metres for a four-person dwelling along with detailed room and storage requirements [101]. Sweden’s Million Programme of 1965–1974 pursued a comparable goal, producing large, well-equipped flats that averaged more than thirty square metres per person [102].

Fiscal restraint and deregulation in the late 1970s and early 1980s reversed this expansion. The Parker Morris requirements, mandatory for public housing since 1967, were formally withdrawn in 1980 as part of wider expenditure cuts [103]. Similar retrenchment occurred in Australia, Canada, and parts of continental Europe, leading to divergent national standards and shrinking average unit sizes during the 1980s and 1990s.

Since the early 2000s governments have renewed interest in minimum spatial benchmarks, driven by housing affordability crises, demographic change, and growing evidence that extreme compactness harms well-being. England adopted the Nationally Described Space Standard in 2015, restoring a minimum of thirty-seven square metres for a one-person one-storey dwelling [80]. Mexico City introduced a twenty-five square metre minimum in 2018 [77], and Hong Kong recently decided that new private flats under public land lease must not be smaller than twenty-six square metres [91]. The post-war expansion, subsequent rollback, and recent re-regulation illustrate how space standards respond to shifting social priorities as much as to technical or market constraints.

4. Revealed Preferences in Space-Constrained Markets

While regulatory standards establish what may be built and behavioural studies indicate what ought to be optimal, the lived experience of residents in compact dwellings shows the practical limits of spatial sufficiency. Observed occupancy behaviour—including tenancy duration, mobility patterns, satisfaction levels, and housing-application choices—supplies a form of revealed preference that complements normative frameworks. In the highest-cost urban markets, where land is scarce and demand intense, dwellings are routinely produced and occupied close to the lower spatial thresholds. These situations show not only what households tolerate but also when they decide to leave, adapt, or forgo particular housing options entirely.

In Hong Kong, flats smaller than forty square metres represented about one fifth of all private sales between 2019 and 2021 [104]. Micro-units of fifteen to twenty square metres are now a recognised market segment for single professionals and investors. Multiple studies document low satisfaction in such dwellings: a survey of subdivided-unit tenants found pervasive stress, health complaints, and strong intentions to relocate [46], while another study reported that sixty percent of residents in units below twenty-five square metres described themselves as “unsatisfied” or “very unsatisfied” with living space [45]. Dissatisfaction intensifies when more than one person shares these micro-flats [105].

Tokyo displays a similar pattern in its “one-room mansions,” typically fifteen to twenty-five square metres and roughly a third of the central-city rental stock. Although prized for location and efficiency, these units are mostly occupied by students and early-career professionals. Data show that apartments below twenty square metres support average tenancies of just 1.8 years, whereas those above twenty-five square metres average 3.2 years [106].

In New York City, the adAPT NYC pilot introduced twenty-four to thirty square metre studios featuring integrated storage and convertible furniture. Initial surveys recorded positive evaluations, yet a follow-up by the city’s housing department found that nearly half of occupants sought larger homes within eighteen months, especially after remote work became common [107]. London’s new co-living schemes, offering private rooms of twenty-four to thirty square metres with shared amenities, function mainly as stop-gap housing for mobile professionals, with average stays of eight to fourteen months [108].

Singapore presents a mixed public–private picture. Government-built studio flats of thirty-five square metres maintain overall satisfaction above ninety percent, although the most frequently cited reason for intending to move is the desire for more personal space [109]. In the private sector, so-called “shoebox” apartments of thirty-five to forty-five square metres attract single residents, yet market surveys indicate that more than one third of prospective buyers see lack of space as the main push factor [110].

Emerging-market cities employ compact housing as an affordability strategy. In Mumbai, “nano homes” of eighteen to twenty-five square metres target lower-middle-income singles; sixty-five percent of single buyers in a 2020 study accepted such units, but only thirty-one percent of couples found them viable [111]. São Paulo offers ten to fifteen square metre micro-apartments for mobile professionals, with average occupancy of eleven months [112]. South-African surveys show that single or childless adults will accept twenty-five to thirty square metres if build quality is high, whereas households with children express dissatisfaction below fifty square metres [113].

Across these cities, one pattern recurs. Single-person households are more tolerant of spatial constraints than couples or families, but only up to roughly thirty square metres. Tenancy duration rises by about one year for each additional ten square metres between fifteen and forty square metres [5,106]. Units smaller than twenty-five square metres exhibit sharply higher turnover, especially when occupied by more than one person.
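The duration pattern summarised above lends itself to a simple piecewise-linear sketch. The function below is illustrative only, not a fitted model: the anchor of roughly 1.8 years at 20 m2 is taken from the Tokyo data cited earlier, the slope encodes the reported one extra year per additional ten square metres, and the clamp reflects that the relationship is only observed between 15 and 40 m2:

```python
def expected_tenancy_years(area_m2):
    """Illustrative sketch: tenancy duration rises by roughly one year
    per additional 10 m² between 15 and 40 m², anchored (as an
    assumption) to ~1.8 years at 20 m² from the Tokyo figures."""
    clamped = max(15.0, min(40.0, area_m2))  # pattern observed only in 15-40 m²
    return 1.8 + 0.1 * (clamped - 20.0)

for area in (15, 20, 30, 40):
    print(f"{area} m² → ~{expected_tenancy_years(area):.1f} years")
```

A linear form deliberately understates the sharpness of the turnover increase below twenty-five square metres noted in the text; it captures the slope, not the threshold effects.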

Cultural differences shape how space is perceived, yet they do not erase the underlying tolerance thresholds. Japanese households may consider about twenty percent less space acceptable than British counterparts [27], and residents in Seoul or Hong Kong may normalise tighter dwellings. Nonetheless, units below twenty square metres consistently trigger stress, intent to relocate, and functional strain when more than one person is present [49,50].

Market data reinforce these behavioural findings. Rents per square metre rise steeply once units exceed twenty-five square metres in markets such as Hong Kong and Tokyo [5,90]. Application patterns also reveal preferences: in Singapore, studio flats of thirty-five square metres attract longer waiting times than forty-five square metre one-bedrooms despite higher total prices [109]. New York micro-units draw forty percent fewer lottery applications per available unit than conventional one-bedrooms [114].

Together, these cases underline the convergence of behavioural, economic, and cultural indicators. For single occupants, around thirty square metres is broadly viable. For two-person households, the floor rises to about forty-five square metres. Below these thresholds, dissatisfaction, early turnover, and lower application demand signal that a critical spatial limit has been breached. Compact units smaller than thirty square metres can serve short-term or transitional roles, but long-term stable occupancy, particularly for more than one person, consistently favours larger space.

5. Empirical Benchmarks from Constrained Housing Environments

The case studies in Section 4 reveal consistent thresholds at which spatial constraint begins to undermine occupancy duration, satisfaction, and housing stability. This section synthesises those patterns into an empirical typology. Drawing on formal and informal housing models, it identifies the floor area per person typically associated with sustained residence. Relationships are organised along two axes: internal space per occupant and typical length of stay. Together they outline a spatial envelope inside which compact housing continues to function without elevated turnover, dissatisfaction, or forced mobility.

Figure 1 plots each housing type by modal floor area per person and expected occupancy duration. Icons indicate typical household size and whether private kitchen and bathroom facilities are present. The horizontal axis measures duration in days (capped at three years) and the vertical axis shows floor area per person (capped at forty square metres to keep the focus on constrained dwellings). A dashed line marks a notional lower bound for long-term viability, derived from tenancy and satisfaction data.


Figure 1. Typical floor area per occupant and typical duration of residence across a range of constrained housing types. Points are annotated with the number of typical occupants and indicate whether private bathrooms and kitchens are included. The dashed line represents an approximate lower bound for residential use. Approximate modal values shown. (Graphic by author.)

At the lower end of the envelope sit dwelling types intended for very short stays or institutional use. Cruise cabins, emergency shelters, and São Paulo’s ten to fifteen square metre mini apartments fall here; these private, self-contained units support average stays shorter than one year [112]. Military barracks and student dormitories also appear in this zone. In the United States, unaccompanied military housing offers roughly thirteen to seventeen square metres per person in shared rooms, while standard dormitories provide about nine to eleven square metres per person, with longer stays enabled by institutional context and communal support systems.

A second cluster contains micro-units and efficiency apartments that offer more autonomy but remain tightly constrained. Tokyo’s one-room mansions and Hong Kong’s nano flats provide roughly fifteen to twenty-five square metres per person. Although popular with single professionals and students, tenancy data show average stays under three years and high turnover when more than one person shares these units [5,106]. New York’s adAPT pilot studios, twenty-four to thirty square metres, achieved moderate early satisfaction but saw notable attrition after eighteen months, particularly among residents working from home [107]. London co-living schemes display similar patterns, with average stays of eight to fourteen months [108].

The upper band of the envelope includes units that support longer residence and higher satisfaction. Mumbai’s eighteen to twenty-five square metre nano homes work for singles but show marked dissatisfaction among couples [111]. Government-built studio flats of thirty-five square metres in Singapore report satisfaction above ninety per cent, yet the main reason households give for planning to move is the wish for more personal space [109]. National market polling shows that thirty-six per cent of prospective buyers of private shoebox apartments cite lack of space as their chief concern, versus seventeen per cent among those considering units larger than forty-five square metres [110]. In Hong Kong, surveys find that sixty per cent of residents in units below twenty-five square metres rate their living space as unsatisfactory and that stress rises sharply when more than one person shares such flats [45,46,105].

A non-linear relationship emerges between space and duration. Dwellings smaller than fifteen square metres per person serve mainly short-term or institutional purposes. Between fifteen and twenty-five square metres, transitional use becomes feasible for single adults, but turnover remains high. From twenty-five to thirty-five square metres per person, long-term residence becomes more common, especially in self-contained units with daylight and acoustic buffering. Spatial efficiency offers some advantage for couples who can share kitchens and bathrooms, yet satisfaction improves markedly only when unit size nears forty-five square metres [105,109]. Tenancy duration rises by roughly one year for every additional ten square metres between fifteen and forty square metres [5,106].
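The reported duration pattern can be sketched as a simple piecewise-linear model. This is an illustrative approximation only: the function name `expected_tenancy_years`, the one-year baseline at 15 square metres, and the flat behaviour outside the 15 to 40 square metre range are assumptions for the sketch, not parameters fitted to the cited tenancy data.

```python
def expected_tenancy_years(area_m2: float, base_years: float = 1.0) -> float:
    """Illustrative model of the reported pattern: tenancy duration rises
    by roughly one year for each additional 10 m2 of floor area per
    person, between 15 and 40 m2.

    base_years (the assumed average stay at 15 m2) is a placeholder, not
    a value from the underlying studies.
    """
    # The pattern is reported only within 15-40 m2, so clamp to that range.
    clamped = max(15.0, min(40.0, area_m2))
    return base_years + (clamped - 15.0) / 10.0

# Under these assumptions, a 25 m2 unit predicts roughly one more
# year of tenancy than a 15 m2 unit.
```

The clamp reflects that the sources report the relationship only within the 15 to 40 square metre band; extrapolating the slope beyond it would overstate what the evidence supports.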

Across locations and cultures, two functional lower bounds recur. Single adults can usually sustain long-term residence once a dwelling reaches about thirty square metres. Two-person households require roughly forty-five square metres to maintain privacy, reduce spatial stress, and limit turnover. Units below these limits repeatedly exhibit dissatisfaction, early mobility, and weak demand.

These empirical patterns reinforce earlier behavioural and regulatory thresholds. Units smaller than thirty square metres for single occupants and forty-five square metres for couples align with research showing that crowding below fifteen to twenty square metres per person elevates stress and social conflict [3,38]. While some households accept tighter quarters for cost, location, or life-stage reasons, the convergence of tenancy data, satisfaction surveys, and market behaviour around these thresholds supports their use as performance benchmarks. Compact dwellings smaller than those limits may meet short-term needs but seldom function as stable, long-term homes.

6. Synthesis of Evidence for Minimum Viable Floor Areas

The evidence presented across behavioural research, regulatory practice, and real-world market behaviour converges on a narrow and consistent range of floor areas required to support sustained residential use. Each of these domains identifies spatial thresholds below which occupancy becomes difficult to maintain, domestic activities begin to conflict, or turnover increases. When viewed together, these sources provide a triangulated basis for defining the lower bounds of functional housing.

Table 3 summarises this convergence. The first column lists internal floor area minimums found in regulatory frameworks across jurisdictions. The second column captures the size ranges of dwellings in widespread use within space-constrained housing markets, even when these units are considered suboptimal. The third column identifies the observed thresholds for sustained occupancy, based on tenancy duration, satisfaction, and post-occupancy outcomes. The final column presents a proposed technical minimum for each household type, grounded in the alignment of behavioural evidence, policy standards, and built examples.

Table 3. Summary of recommended minimum viable internal floor area.

Household Type | Regulatory Range | Minimums in Space-Constrained Markets | Observed Thresholds for Sustained Occupancy | Proposed Minimum
Single Person | 12–40 m² | 15–25 m² | 25–30 m² | 30 m²
Couple | 30–55 m² | 35–45 m² | 40–45 m² | 45 m²
Family (3–4 people) | 40–90 m² | 45–75 m² | 55–75 m² | 60 m²

For single-person households, long-term viability consistently begins between 25 and 30 square metres. Units smaller than this are frequently tolerated, but studies report elevated stress, social withdrawal, or desire to exit within one to two years. Couples require 40 to 45 square metres to preserve privacy, functional differentiation, and behavioural autonomy. For three- to four-person households, the spatial demands of sleeping, socialising, working, and circulation require a minimum of 55 to 75 square metres depending on household composition and activities.

The proposed technical minimums are set at 30, 45, and 60 square metres for one-, two-, and three- to four-person households, respectively. These values represent the smallest viable floor areas capable of supporting long-term occupancy under compact housing conditions. They are not aspirational design targets or quality-of-life ideals. Rather, they define the floor below which spatial sufficiency begins to break down, even in well-designed, well-located dwellings. At these sizes, households can sleep, cook, bathe, work, and relax without constant compromise or persistent conflict. Below these thresholds, housing may still function temporarily, but is unlikely to support autonomy, stability, or psychological well-being over time.

While the difference between a three- and four-person household is meaningful, the 60 square metre value is proposed as a rounded baseline for both. A household of three may find this space sufficient; four people may require closer to 65 or 70 square metres. However, the use of round values in 15 square metre increments—30, 45, and 60—serves two practical purposes. It supports modular construction and housing aggregation, and it provides clarity and memorability for implementation in policy, planning, and code enforcement.
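The resulting mapping from household size to proposed minimum can be encoded directly. This is a transcription of the thresholds proposed above; the function name and the error behaviour for household sizes outside the study's scope are illustrative choices, not part of the study.

```python
def proposed_minimum_m2(occupants: int) -> int:
    """Proposed minimum viable internal floor area, in 15 m2 increments,
    following the study's thresholds of 30, 45, and 60 m2."""
    # Three- and four-person households share the rounded 60 m2 baseline;
    # the text notes four people may in practice need closer to 65-70 m2.
    thresholds = {1: 30, 2: 45, 3: 60, 4: 60}
    if occupants not in thresholds:
        raise ValueError("household type outside the scope of this study")
    return thresholds[occupants]
```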

These thresholds offer a defensible reference point for performance-based housing standards. They reflect a lower bound on viability rather than a cap on quality or aspiration. While adaptation and local variation will always apply, this framework allows for compact housing solutions that preserve essential functions without compromising long-term habitability.

Example Floor Plans

Poor design can render even a large home unliveable. The challenge of maximising the liveability of a given floor area remains for talented designers and the marketplace. While this study does not prescribe specific design solutions, it is useful to demonstrate that the proposed minimum floor areas can accommodate full domestic function. Figure 2 presents one example of how units at 30, 45, and 60 square metres can be laid out to support daily life, based on a common and efficient multi-family configuration. The plans assume a double-loaded central corridor, with units arranged on a 4 m structural grid. Each dwelling has one exterior wall for daylight and ventilation. This configuration is compact, repeatable, and adaptable across many housing typologies.

Floor plans showing the layout of residential units of 30 m², 45 m², and 60 m², illustrating spatial dimensions for living, kitchen, and bathroom areas.

Figure 2. Feasible floor plans for one-person, two-person, and three- to four-person minimum floor area units. (For the 60 square metre unit shown here, a second exit door, fire sprinklers, and/or mechanical ventilation would likely be required for code compliance in most jurisdictions.)

The 60 square metre unit essentially doubles the core layout of the 30 square metre unit, while the 45 square metre version adds a “half module” to provide a second sleeping area or more generous shared space. All three units use the same bathroom configuration, with the 60 square metre plan including two bathrooms. Each plan includes a dedicated space for a stacked clothes washer and dryer, and all units provide sufficient room for sleeping, eating, working, and relaxation. In the smallest unit, additional flexibility may be achieved through use of a lofted bed above a desk or integrated storage elements. The dimensions are comparable to those of an economy hotel room, a typology that has long demonstrated the spatial efficiency achievable within a minimal footprint.

These plans are not intended as optimal layouts, but rather as proof that the minimum floor areas proposed in this study can support essential domestic functions with clarity and coherence. Other configurations may yield better outcomes depending on site constraints, user needs, and architectural strategy. The key point is that at these floor areas, it is possible to provide private, enclosed spaces for sleeping and hygiene, areas for cooking and eating, and sufficient volume for work and relaxation.

7. Concluding Remarks

This study has identified lower bounds for internal floor area that appear necessary to support sustained residential use across three common household types. For single-person households, 30 square metres enables full domestic function with minimal spatial stress. For couples, 45 square metres allows for behavioural autonomy and reduced conflict. For families of three to four people, 60 square metres supports differentiated sleep, hygiene, and work zones while maintaining circulation and privacy. These values are not optimal targets, but rather performance-based thresholds derived from the convergence of behavioural research, policy standards, and market outcomes.

The goal of this work is not to dictate housing typologies or enforce rigid formulas. Instead, it is to clarify the spatial boundaries within which compact dwellings can function reliably over time. Below these thresholds, evidence from diverse settings points to higher turnover, reduced satisfaction, and compromised domestic activities. The thresholds represent a floor that supports liveable outcomes under spatial constraint, not a ceiling on aspiration or quality.

Clarifying this floor opens the door to more precise answers to larger systemic questions. What is the lowest feasible cost for delivering liveable housing at scale? How much residential floor area is required to support a given urban population within environmental constraints? What unit sizes are most compatible with modular or prefabricated construction? How can compact homes accommodate the needs of ageing populations, single-person households, or remote workers without compromising well-being? These questions cannot be addressed meaningfully without a clear understanding of how much space is minimally required to support daily life.

The thresholds proposed here do not resolve these challenges, but they provide a starting point. They define a stable platform on which designers, developers, and policymakers can build compact housing that is not just efficient, but functional and enduring. Future research should explore how these benchmarks intersect with environmental performance, construction systems, and regional context.

The risk of ignoring these limits is not just technical, but human. Compact housing that falls below the point of viability may be cheaper or more abundant in the short term, but it is less likely to support autonomy, stability, or well-being in the long run. In that sense, understanding spatial adequacy is not a matter of regulation alone. It is also a matter of practical foresight for cities, for builders, and for the people who will live in these spaces.

References

1. Meijer, F.; Visscher, H. Housing standards and sustainability transitions: Toward evidence-based regulations. Build. Res. Inf. 2020, 48, 1–13.
2. Appolloni, L.; D’Alessandro, D. Housing spaces in nine European countries: A comparison of dimensional requirements. Int. J. Environ. Res. Public Health 2021, 18, 4278.
3. Evans, G.W. Environmental stress and health. In Handbook of Health Psychology; Baum, A., Revenson, T.A., Singer, J., Eds.; Lawrence Erlbaum: Mahwah, NJ, USA, 2001; pp. 365–385.
4. Evans, G.W.; Wells, N.M.; Moch, A. Housing and mental health: A review of the evidence and a methodological and conceptual critique. J. Soc. Issues 2003, 59, 475–500.
5. Wong, A.L.; Hui, E.C.M.; Seabrooke, W. Small is beautiful? Residents’ satisfaction with nano-flats in Hong Kong. Habitat Int. 2018, 79, 1–10.
6. Callister, R. Minimum Space Standards for New Homes [2025 Update]; Urbanist Architecture: London, UK, 2025.
7. Özer, S.; Jacoby, S. Space standards in affordable housing in England. Build. Res. Inf. 2023, 52, 611–626. https://doi.org/10.1080/09613218.2023.2253337.
8. Gatea, M. Compact Living: The New Frontier in Urban Development. Construction Executive. 2025. Available online: https://www.constructionexec.com/article/compact-living-the-new-frontier-in-urban-development (accessed on June 17, 2025).
9. Guvenc, H.; Smith, A. Making housing affordable? The local effects of relaxing land-use regulations. J. Urban Econ. 2024, 112, 103–120.
10. United Nations Department of Economic and Social Affairs. Household Size and Composition 2019. Available online: https://www.un.org/development/desa/pd/data/household-size-and-composition (accessed on June 17, 2025).
11. Esteve, A.; Reher, D.S.; Treviño, R.; Zueras, P.; Turu, A. Living alone over the life course: Cross-national variations on an emerging issue. Popul. Dev. Rev. 2020, 46, 169–189.
12. OECD. OECD Family Database: Average Household Size; Organisation for Economic Co-operation and Development: Paris, France, 2022.
13. Eurostat. Household Composition Statistics; Statistical Office of the European Union: Luxembourg, 2021.
14. Statistics Bureau of Japan. 2020 Population Census: Summary of Household Characteristics; Government of Japan: Tokyo, Japan, 2021.
15. Statistics Korea (KOSTAT). Population and Housing Census: Household Composition; Statistics Korea: Daejeon, Republic of Korea, 2021.
16. United Nations Human Settlements Programme. World Cities Report 2020: The Value of Sustainable Urbanization. 2020. Available online: https://unhabitat.org/wcr/ (accessed on June 17, 2025).
17. Klinenberg, E. Going Solo: The Extraordinary Rise and Surprising Appeal of Living Alone; Penguin Press: New York, NY, USA, 2012.
18. Sobotka, T.; Toulemon, L. Overview chapter 4: Changing family and partnership behaviour: Common trends and persistent diversity across Europe. Demogr. Res. 2008, 19, 85–138.
19. Whittemore, R.; Knafl, K. The integrative review: Updated methodology. J. Adv. Nurs. 2005, 52, 546–553. https://doi.org/10.1111/j.1365-2648.2005.03621.x.
20. Rauf, A.; Attoye, D.E.; Khalfan, M.M.A.; Shafiq, M.T. Examining the impact of house size on building embodied energy. Buildings 2025, 15, 123–139.
21. Trossman Haifler, Y.; Fisher-Gewirtzman, D. Spatial parameters determining urban wellbeing: A behavioral experiment. Buildings 2024, 14, 211.
22. Levine, D.; Yavo Ayalon Jacobs, S. Back and forth from urban renewal: The spatial parameters of affordable housing in two cities. Buildings 2024, 14, 3324.
23. Simetrica-Jacobs. Methodology Note for Wellbeing Values (Version 5). Housing Associations’ Charitable Trust, 7 June 2022. Available online: https://hact.org.uk/publications/methodology-note-for-wellbeing-values (accessed on June 17, 2025).
24. Trotter, L.; Vine, J.; Leach, M.; Fujiwara, D. Measuring the Social Impact of Community Investment: A Guide to Using the Wellbeing Valuation Approach; Housing Associations’ Charitable Trust: London, UK, 2014.
25. Hickman, P. “Third places” and social interaction in deprived neighbourhoods in Great Britain. J. Hous. Built Environ. 2013, 28, 221–236.
26. Thomas, P.; Hickman, P.; Reeve, K. Tenancy Sustainment in Social Housing: Tenant Survey Findings; Centre for Regional Economic & Social Research, Sheffield Hallam University: Sheffield, UK, 2024.
27. Ozaki, R.; Lewis, J.R. Boundaries and the meaning of social space: A study of Japanese house plans. Environ. Plan. D Soc. Space 2006, 24, 91–104.
28. Kopec, D.A. Environmental Psychology for Design; Fairchild Books: New York, NY, USA, 2006.
29. Neufert, E.; Neufert, P. Neufert Architects’ Data, 4th ed.; Wiley-Blackwell: Hoboken, NJ, USA, 2012.
30. De Chiara, J.; Callender, J.H. Time-Saver Standards for Building Types, 4th ed.; McGraw-Hill: New York, NY, USA, 2001.
31. Kira, A. The Bathroom, 2nd ed.; Penguin Books: London, UK, 1976.
32. Panero, J.; Zelnik, M. Human Dimension and Interior Space: A Source Book of Design Reference Standards; Watson-Guptill: New York, NY, USA, 2014.
33. Rybczynski, W. Home: A Short History of an Idea; Viking Penguin: New York, NY, USA, 1986.
34. Alexander, C.; Ishikawa, S.; Silverstein, M. A Pattern Language: Towns, Buildings, Construction; Oxford University Press: Oxford, UK, 1977.
35. Marcus, C.C. House as a Mirror of Self: Exploring the Deeper Meaning of Home; Conari Press: Berkeley, CA, USA, 1995.
36. Duffy, F. The Changing Workplace; Phaidon Press: London, UK, 1992.
37. Becker, F. Offices at Work: Uncommon Workspace Strategies That Add Value and Improve Performance; Jossey-Bass: Hoboken, NJ, USA, 2004.
38. Vischer, J.C. Towards an environmental psychology of workspace: How people are affected by environments for work. Archit. Sci. Rev. 2008, 51, 97–108.
39. Rapoport, A. House Form and Culture; Prentice-Hall: New York, NY, USA, 1969.
40. Cooper Marcus, C.; Sarkissian, W. Housing as If People Mattered: Site Design Guidelines for the Planning of Medium-Density Family Housing; University of California Press: Berkeley, CA, USA, 1986.
41. Bratt, R.G. The quadruple bottom line and next generation strategies for affordable housing. Hous. Stud. 2012, 27, 438–456.
42. Lawrence, R.J. Housing, Dwellings and Homes: Design Theory, Research and Practice; John Wiley & Sons: Hoboken, NJ, USA, 1987.
43. Després, C. The meaning of home: Literature review and directions for future research and theoretical development. J. Archit. Plan. Res. 1991, 8, 96–115.
44. Gifford, R. Environmental Psychology: Principles and Practice, 4th ed.; Optimal Books: Adelaide, Australia, 2007.
45. Lau, M.H.M.; Wei, X. Housing size and housing market dynamics: The case of micro-flats in Hong Kong. Land Use Policy 2018, 78, 278–286.
46. Chan, S.M. Unhealthy housing experiences of subdivided unit tenants in the world’s most unaffordable city. J. Hous. Built Environ. 2023, 38, 2229–2246.
47. Hirayama, Y. Housing pathways of young people in a compact rental sector: Evidence from Tokyo. J. Hous. Built Environ. 2010, 25, 153–168.
48. Knight Frank. India Micro-Homes Survey 2020: Buyer Perceptions and Market Outlook; Knight Frank Research: London, UK, 2020.
49. Li, J.; Lin, F.; Cui, H.; Yang, S.; Chen, Y. Urban spatial naturalness degree in the planning of ultra-high-density cities: The case of urban green open spaces in Macau. Buildings 2025, 15, 206.
50. Whiteford, M.; Hoff, S. Housing space standards in Europe: Unfit for purpose? Hous. Stud. 2018, 34, 1314–1336.
51. Evans, G.W.; Saltzman, H.; Cooperman, J.L. Housing quality and children’s socioemotional health. Environ. Behav. 2010, 33, 389–399.
52. Gurstein, P. Wired to the World, Chained to the Home: Telework in Daily Life; University of British Columbia Press: Vancouver, BC, Canada, 2001.
53. Felstead, A.; Henseke, G. Assessing the growth of remote working and its consequences for effort, well-being and work-life balance. New Technol. Work Employ. 2017, 32, 195–212.
54. Vischer, J.C. Towards an environmental psychology of workspace: How people are affected by environments for work. Archit. Sci. Rev. 2008, 51, 97–108.
55. Davis, K.G.; Kotowski, S.E.; Daniel, D.; Gerding, T.; Naylor, J.; Syck, M. The home office: Ergonomic lessons from the “new normal”. Ergon. Des. 2020, 28, 4–10.
56. Amerio, A.; Brambilla, A.; Morganti, A.; Aguglia, A.; Bianchi, D.; Santi, F.; Costantini, L.; Odone, A.; Costanza, A.; Signorelli, C.; et al. COVID-19 lockdown: Housing built environment’s effects on mental health. Int. J. Environ. Res. Public Health 2020, 17, 5973.
57. Xiao, Y.; Becerik-Gerber, B.; Lucas, G.; Roll, S.C. Impacts of working from home during COVID-19 pandemic on physical and mental well-being of office workstation users. J. Occup. Environ. Med. 2021, 63, 181–190.
58. Galanti, T.; Guidetti, G.; Mazzei, E.; Zappalà, S.; Toscano, F. Work from home during the COVID-19 outbreak: The impact on employees’ remote work productivity, engagement, and stress. J. Occup. Environ. Med. 2021, 63, e426–e432.
59. Kaplan, R.; Kaplan, S. The Experience of Nature: A Psychological Perspective; Cambridge University Press: Cambridge, UK, 1989.
60. Mehta, R.; Zhu, R.; Cheema, A. Is noise always bad? Exploring the effects of ambient noise on creative cognition. J. Consum. Res. 2012, 39, 784–799.
61. Haynes, B.P. The impact of the behavioural environment on office productivity. J. Facil. Manag. 2007, 5, 158–171.
62. Bodin Danielsson, C.; Bodin, L. Office type in relation to health, well-being, and job satisfaction among employees. Environ. Behav. 2008, 40, 636–668.
63. Hedge, A. The open-plan office: A systematic investigation of employee reactions to their work environment. Environ. Behav. 1982, 14, 519–542.
64. Sundstrom, E.; Herbert, R.K.; Brown, D.W. Privacy and communication in an open-plan office: A case study. Environ. Behav. 1994, 14, 379–392.
65. Kroemer, K.H.; Grandjean, E. Fitting the Task to the Human: A Textbook of Occupational Ergonomics, 5th ed.; Taylor & Francis: Oxford, UK, 2009.
66. Pheasant, S.; Haslegrave, C.M. Bodyspace: Anthropometry, Ergonomics and the Design of Work, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2016.
67. Robertson, M.; Huang, Y.H.; O’Neill, M.J.; Schleifer, L.M. Flexible workspace design and ergonomics training: Impacts on the psychosocial work environment, musculoskeletal health, and work effectiveness among knowledge workers. Appl. Ergon. 2013, 44, 482–488.
68. Davis, K.G.; Kotowski, S.E. Preliminary evidence of the independent effects of the desktop computer screen height and keyboard placement on seated posture in ergonomically naïve individuals. Appl. Ergon. 2015, 51, 163–169.
69. Evans, G.W.; Johnson, D. Stress and open-office noise. J. Appl. Psychol. 2000, 85, 779–783.
70. Banbury, S.; Berry, D.C. Office noise and employee concentration: Identifying causes of disruption and potential improvements. Ergonomics 2005, 48, 25–37.
71. Gou, Z.; Lau, S.S.Y. Post-occupancy evaluation of the thermal environment in a green building. Facilities 2013, 31, 357–371.
72. Sullivan, C. What’s in a name? Definitions and conceptualisations of teleworking and homeworking. New Technol. Work Employ. 2003, 18, 158–165.
73. Kim, D.; Sim, H.; Kim, S. A study on recommendations for improving minimum housing standards. Buildings 2023, 13, 2708.
74. International Code Council. Section 1208.3 Minimum room area. In International Building Code, 2021 ed.; ICC: Washington, DC, USA, 2021.
75. Ontario Ministry of Municipal Affairs and Housing. Ontario Building Code, O. Reg. 332/12, Division B, Article 9.5.4.1 Areas of Living Rooms and Spaces; Ontario Ministry of Municipal Affairs and Housing: Toronto, ON, Canada, 2012.
76. City of Toronto. Toronto Municipal Code Chapter 629: Property Standards, Section 629-25 Room Size Minimums; City of Toronto: Toronto, ON, Canada, 2023.
77. Departamento de Vivienda y Urbanismo de la Ciudad de México. Reglamento de Construcciones para la Ciudad de México; Gobierno de la Ciudad de México: Ciudad de México, México, 2018.
78. Ministry of Cities. Portaria nº 725/2023: Especificações Urbanísticas, de Projeto e de Obra para o Programa Minha Casa, Minha Vida; Governo Federal do Brasil: Brasília, Brazil, 2023.
79. Ministerio de Vivienda y Urbanismo. Decreto Supremo Nº 49: Reglamento del Programa de Integración Social y Territorial (DS 49); Gobierno de Chile: Santiago, Chile, 2023.
80. Ministry of Housing, Communities and Local Government (MHCLG). Technical Housing Standards—Nationally Described Space Standard; UK Government: London, UK, 2015.
81. Boverket. Dwelling design—Fit for purpose. In Boverket’s Building Regulations, BBR (Sections 3:2 & 3:52); Swedish National Board of Housing, Building and Planning: Karlskrona, Sweden, 2018.
82. Wissenschaftliche Dienste des Deutschen Bundestages. Einzelfragen zur Mindestwohnungsgröße von Neubauwohnungen (WD 7-3000-118/21); Deutscher Bundestag: Berlin, Germany, 2022.
83. République Française. Code de la Construction et de l’Habitation, Article R111-2: Exigences de Surface et de Volume Habitables; République Française: Paris, France, 2020.
84. Ministry of the Interior and Kingdom Relations. Bouwbesluit 2012 [Building Decree 2012] (Staatsblad 2011, 416); Government of the Netherlands: The Hague, The Netherlands, 2012.
85. National Construction Authority. National Building Code, 2024 (Legal Notice 47); Government of Kenya: Nairobi, Kenya, 2024.
86. Lagos State Ministry of Physical Planning and Urban Development. Lagos State Urban and Regional Planning and Development Regulations 2020: Space Standards for Residential Buildings; Lagos State Government Gazette: Lagos State, Nigeria, 2020.
87. Department of Human Settlements. National Housing Code: Part 3—Technical and General Guidelines; Government of the Republic of South Africa: Cape Town, South Africa, 2009.
88. Ministry of Housing & Urban Poverty Alleviation. Guidelines for Affordable Housing in Partnership; Government of India: New Delhi, India, 2012.
89. Ministry of Land, Infrastructure, Transport and Tourism. Building Standards Act (Act No. 201 of 1950, Amended 2020); Government of Japan: Tokyo, Japan, 2020.
90. Japan Property Central (JPC). Tokyo Apartment Market Report 2021; Japan Property Central K.K.: Tokyo, Japan, 2021.
91. Development Bureau. Government Announces 2022–23 Land Sale Programme [Press Release]; Government of the Hong Kong Special Administrative Region: Hong Kong, China, 2022.
92. Urban Redevelopment Authority. Circular No. URA/PB/2018/06-DCG: Revision to the Guidelines on Maximum Allowable Dwelling Units in Non-Landed Residential Developments Outside the Central Area; Singapore Government: Singapore, 2018.
93. GB 50096-2011; Ministry of Housing and Urban-Rural Development. Design Code for Residential Buildings; China Architecture & Building Press: Beijing, China, 2011.
94. DIN 18011; Small Dwellings: Requirements for Planning and Design. Deutsches Institut für Normung: Berlin, Germany, 2019.
95. Rijksoverheid. Bouwbesluit 2012: Woningkwaliteit; Government of the Netherlands: The Hague, The Netherlands, 2021.
96. Legislative Council Secretariat. Background Brief on Design of New Public Rental Housing Flats (LC Paper No. CB(1)1037/14-15(02)); Legislative Council Panel on Housing: Hong Kong, China, 2015.
97. New South Wales Department of Planning and Environment. Apartment Design Guide: Frequently Asked Questions, 2015 ed.; NSW Government: Sydney, Australia, 2015.
98. Auckland Council. Auckland Unitary Plan (Operative in Part): Chapter H4 Residential—Terraced Housing and Apartment Buildings Zone, Standard H4.6.6 Minimum Dwelling Size; Auckland Council: Auckland, New Zealand, 2022.
99. Wohl, A.S. Endangered Lives: Public Health in Victorian Britain; Harvard University Press: Cambridge, MA, USA, 1983.
100. Gauldie, E. Cruel Habitations: A History of Working-Class Housing 1780–1918; Allen & Unwin: London, UK, 1974.
101. Ministry of Housing and Local Government (MHLG). Homes for Today and Tomorrow; HMSO: London, UK, 1961.
102. Hall, T.; Vidén, S. The Million Homes Programme: A review of the great Swedish planning project. Plan. Perspect. 2005, 20, 301–328.
103. Malpass, P. Housing and the Welfare State: The Development of Housing Policy in Britain; Palgrave Macmillan: Basingstoke, UK, 2005.
104. Rating and Valuation Department (RVD). Property Market Statistics; Government of the Hong Kong Special Administrative Region: Hong Kong, China, 2022.
105. Gou, Z.; Xie, X.; Lu, Y. Quality of life (QoL) survey in Hong Kong: Understanding the importance of housing environment and needs of residents from different housing sectors. Int. J. Environ. Res. Public Health 2018, 15, 219.
106. Hirayama, Y. The role of homeownership in Japan’s aged society. J. Hous. Built Environ. 2010, 25, 175–191.
107. Housing Preservation and Development (HPD). adAPT NYC Pilot Program Evaluation; City of New York: New York, NY, USA, 2016.
108. Scanlon, K.; Arrigoitia, M.F. Development of new cohousing: Lessons from a London scheme for the over-50s. Urban Stud. 2015, 52, 2056–2073.
109. Housing & Development Board (HDB). Sample Household Survey 2018; Singapore Government: Singapore, 2021.
110. PropertyGuru Pte Ltd. Singapore Consumer Sentiment Study H2 2022; PropertyGuru Pte Ltd.: Singapore, 2022.
111. Knight Frank. India Real Estate: Residential Market Update H1 2020; Knight Frank India: Mumbai, India, 2020.
112. COHAB-SP. Relatório Anual de Atividades 2019; Companhia Metropolitana de Habitação de São Paulo: São Paulo, Brazil, 2019.
113. Centre for Affordable Housing Finance Africa (CAHF). Housing Finance in Africa Yearbook 2020; CAHF: Johannesburg, South Africa, 2020.
114. Housing Preservation and Development (HPD). Housing Lottery Program Annual Report; City of New York: New York, NY, USA, 2019.

visual representation of relative lives required for a unit of protein via beef, chicken, fish.

Re-Consider the Lobster: Animal Lives in Protein Supply Chains

Karl T. Ulrich

University of Pennsylvania
The Wharton School – Operations, Information, and Decisions Department
500 Huntsman Hall, 3730 Walnut Street, Philadelphia, PA 19104 US
ulrich@upenn.edu | ktulrich.com

Original Version: February 2, 2025
This Version: July 31, 2025

Full text below.

Abstract

Animal protein production represents a complex system of lives transformed into nutrition, with profound ethical and environmental implications. This study provides a quantitative analysis of animal lives required to produce human-consumable protein across major food production systems. Categorizing animal lives based on cognitive complexity and accounting for all lives involved in production, including direct harvests, reproductive animals, and feed species, reveals dramatic variations in protein efficiency. The analysis considers two categories of animal life: complex-cognitive lives (e.g., mammals, birds, cephalopods) and pain-capable lives (e.g., fish, crustaceans). Calculating protein yield per life demonstrates efficiency differences spanning more than five orders of magnitude, from 2 grams per complex-cognitive life for baby octopus to 390,000 grams per life for bovine dairy systems. Key findings expose disparities between terrestrial and marine protein production. Terrestrial systems involving mammals and birds show higher protein yields and exclusively involve complex-cognitive lives, while marine systems rely predominantly on pain-capable lives across complex food chains. Dairy production emerges as the most efficient system. Aquaculture systems reveal complex dynamics, with farmed carnivorous fish requiring hundreds of feed fish lives to produce protein, compared to omnivorous species that demonstrate improved efficiency. Beyond quantitative analysis, this research provides a framework for understanding the ethical and ecological dimensions of protein production, offering insights for potential systemic innovations.

Keywords: animal lives; protein production; protein yield; cognitive complexity; food
systems; food supply chain; aquaculture; livestock efficiency; meat production; ethical food production; trophic levels; animal welfare

Acknowledgments: I acknowledge the helpful comments on a previous version of the manuscript by Christian Terwiesch, Karan Girotra, and Senthil Veeraraghavan.

1. Introduction

In his 2004 essay “Consider the Lobster,” David Foster Wallace confronted readers with an uncomfortable question: Does the lobster suffer when boiled alive for our culinary pleasure? [1]. Wallace’s essay used a Maine lobster festival as a lens to examine broader questions about consciousness, suffering, and the ethical implications of food choices. Two decades later, this paper brings an analytical perspective to a related fundamental question about animal protein supply chains: How many animal lives are required to produce a given quantity of human-consumable protein? This analysis moves beyond the philosophical question of suffering to provide a quantitative foundation for ethical decision-making about food systems, while acknowledging that different forms of animal life may have different capacities for suffering and consciousness [2].

Protein is a critical macronutrient in the human diet. Current dietary guidelines recommend 0.8 grams of protein per kilogram of body weight daily for adults, with higher requirements for athletes and active individuals, who may need 1.2 to 2.0 g/kg [3]. For a 70 kg adult, this translates to 56 to 140 g of protein daily. While adequate calories can be readily obtained from plant sources in developed societies, high-quality protein remains a key bottleneck in food production and nutrition. Animal sources are particularly important as complete proteins containing all essential amino acids in proportions that match human needs. Animal proteins typically show higher digestibility and bioavailability compared to plant sources, making them an effective way to meet essential amino acid requirements [4].
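The guideline arithmetic can be made explicit in a minimal sketch. The function name `daily_protein_range_g` and the two-level split between baseline and active adults are illustrative choices; only the g/kg rates come from the cited guidelines.

```python
def daily_protein_range_g(body_mass_kg: float, active: bool = False) -> tuple:
    """Daily protein requirement range in grams: 0.8 g/kg for typical
    adults, 1.2-2.0 g/kg for athletes and active individuals."""
    # The baseline guideline is a single rate, so its range collapses to a point.
    low_rate, high_rate = (1.2, 2.0) if active else (0.8, 0.8)
    return (low_rate * body_mass_kg, high_rate * body_mass_kg)

# For a 70 kg adult: 56 g/day at the 0.8 g/kg baseline,
# up to 140 g/day at the 2.0 g/kg upper bound for active individuals.
```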

The relationship between animal lives and protein production is complex and often counterintuitive. A dairy cow produces milk protein for years before being processed for meat, but this production requires the birth of calves, about half of which are male and destined for early slaughter. This same cow may experience welfare challenges from intensive production methods [5]. A tuna consumes thousands of smaller fish during its life before being harvested, yet its removal from the ecosystem may increase the total number of fish lives through trophic cascade effects [6]. A laying hen requires the parallel production of male chicks that are usually culled shortly after hatching [7].

This complexity is compounded by philosophical questions about the relative value of different forms of animal life. Recent research has revealed sophisticated cognitive abilities in species previously considered simple. Chickens demonstrate numerical abilities and self-control comparable to primates [8]. Pigs show cognitive abilities similar to dogs and young children [9]. Octopi exhibit remarkable problem-solving capabilities and emotional states [10]. Even fish, long considered purely reflexive creatures, show evidence of pain perception and basic learning [11].

Drawing on Martha Nussbaum’s work on animal justice, I establish a framework for considering different categories of animal life while acknowledging the profound philosophical questions raised by this logic [12]. My analysis focuses on two broad categories: cognitively complex lives (including mammals, birds, and cephalopods) and pain-capable lives (including fish and crustaceans). While simpler organisms like zooplankton and bivalves technically constitute lives ended in food production, I exclude them from my quantitative analysis while acknowledging that some ethical frameworks might accord them moral weight.

To my knowledge, this is the first systematic analysis of protein yield per animal life across major food production systems. While the primary focus is on animal lives, I also include approximate estimates of greenhouse gas emissions to allow comparison of both ethical and environmental efficiency. These impacts, though not explored in depth here, are an essential component of broader food system sustainability.

The analysis accounts for total lives involved in production including feed species and offspring, calculates protein yield per life across production methods, and examines key sensitivities and assumptions. I find that protein yields per complex-cognitive life vary by more than five orders of magnitude across production systems, from as little as 2 grams of protein per cognitively complex life for wild-caught baby octopi to 390,000 grams for dairy production. These dramatic differences emerge from biological factors like trophic level, production characteristics like lifespan, and system design choices in agriculture and aquaculture.
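
As a quick arithmetic check of the range just cited, the spread between the two extremes in base-10 orders of magnitude can be computed directly (the two endpoint values are taken from the text above):

```python
import math

low_g = 2          # wild-caught juvenile octopus: g protein per complex-cognitive life
high_g = 390_000   # dairy: g protein per complex-cognitive life

orders_of_magnitude = math.log10(high_g / low_g)
print(round(orders_of_magnitude, 1))  # 5.3, i.e. more than five orders of magnitude
```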

The analysis provides practical guidance for both individual dietary choices and food system policy. Though philosophical debates about consciousness and suffering continue, I demonstrate that measurable improvements in protein yield per life are possible through targeted changes in production methods and consumption patterns. Most participants in these debates would agree that, all else equal, fewer lives taken is better than more lives taken. This common ground provides a basis for practical progress even as deeper ethical questions remain unresolved.

The remainder of this paper is organized as follows: Section 2 presents my analytical framework for categorizing animal life and measuring protein yield, then applies it to major protein production systems including dairy, land animals, aquaculture, and wild-caught species. Section 3 presents my results comparing protein yield per life across systems. Section 4 discusses implications for policy and practice while examining key philosophical considerations and limitations.

2. Approach and Analytical Framework

This section defines the analytical framework used in the paper, including how animal lives are categorized, how system boundaries are drawn, and how protein yield per life is calculated. These definitions structure the ethical and ecological comparisons throughout the analysis.

2.1. Categorizing Animal Life

To analyze protein production per animal life, we must first establish a framework for what constitutes a “life.” This question presents fundamental challenges because individuals assign different values to different forms of animal life. Many humans exhibit strong empathy for animals that share human-like characteristics, leading to greater concern for mammals than for crustaceans [13]. Some argue that such anthropomorphic valuation lacks ethical justification, while others, like Nussbaum, argue that cognitive and social capabilities create morally relevant differences between species. Nussbaum’s capabilities approach suggests that what matters is not just the capacity to experience pain, but the broader ability to form intentions, maintain social connections, and experience complex emotions.

Rather than asserting a specific position on relative value, I provide a categorization based on scientific evidence of cognitive and sensory capabilities. Recent advances in animal cognition research have revealed increasingly sophisticated capabilities across many species, suggesting three distinct categories of animal life with different ethical implications for food production.

The first category, which I term “cognitively complex lives,” encompasses animals demonstrating sophisticated cognitive abilities, emotional responses, and social behaviors. Large, domesticated mammals exhibit remarkable capabilities: cattle show social learning and emotional bonds [14]; pigs demonstrate cognitive abilities comparable to dogs and young children, including mirror self-recognition and tool use [9]; and sheep display facial recognition and complex emotional responses [15]. Their wild relatives, such as bison, show similar capabilities.

Domesticated birds, particularly chickens, turkeys, and ducks, also demonstrate cognitive sophistication that places them firmly in this category. Chickens exhibit numerical abilities and basic arithmetic from just days after hatching [16], while showing self-control and planning capabilities comparable to primates [8]. They engage in complex social learning and cultural transmission [17], display emotional contagion and empathetic responses [18], and show evidence of self-awareness and anticipatory behavior [7]. Turkeys and ducks similarly demonstrate advanced cognitive capabilities, including sophisticated social recognition, tool manipulation, and complex problem-solving behaviors [19].

Cephalopods represent a unique case within this category as the only invertebrates showing cognitive complexity comparable to vertebrates. Octopi demonstrate striking problem-solving abilities, tool use, and spatial learning [10]. Both octopi and squid show evidence of play behavior, distinct personality traits, and emotional states [20]. Their sophisticated nervous systems and demonstrated cognitive abilities place them firmly alongside mammals and birds in terms of cognitive complexity.

The second category, “pain-capable stimulus-response lives,” includes animals with clear evidence of pain perception and basic learning but without strong evidence of higher cognitive functions. Fish fall into this category, showing clear nociception and pain avoidance [11], along with basic learning and memory capabilities. While some fish species demonstrate more sophisticated behaviors, the evidence for complex cognitive abilities like self-awareness or emotional states remains limited compared to mammals, birds, and cephalopods.

Crustaceans, including lobsters, crabs, and shrimp, also belong in this category. Research demonstrates that crustaceans show pain avoidance [21], exhibit basic learning from negative stimuli [22], and display stress responses and simple memory formation. However, they lack strong evidence of the more sophisticated cognitive abilities seen in the first category. Their nervous systems, while capable of processing pain and basic learning, appear primarily oriented toward stimulus-response behaviors rather than complex cognition.

The third category, “non-suffering lives,” encompasses organisms with minimal neural structure that exhibit primarily reflexive behaviors. This includes bivalve mollusks, most insects, and simple marine organisms like zooplankton. While these organisms can respond to environmental stimuli, there is limited evidence for pain perception or learning capabilities. Their simple nervous systems suggest minimal capacity for suffering in any meaningful sense comparable to more complex animals.

This categorization framework reflects current scientific understanding while acknowledging that our knowledge of animal consciousness and suffering continues to evolve. Notably, recent research has consistently expanded our recognition of cognitive capabilities in species previously considered simpler, suggesting we should err on the side of caution when considering capacity for suffering. This framework allows individuals to apply their own ethical weights to different categories while maintaining analytical clarity about the number and types of lives involved in protein production.

2.2. Production System Analysis

For each production system, I analyze:

Direct Production Lives: Animals directly harvested for protein

Supporting Lives: Animals consumed as feed or lost in production

Reproductive Lives: Breeding stock and offspring

System Boundaries: Which lives to include or exclude

I adopt these measurement principles: count all lives ended in service of production; include feed species lives for farmed carnivores; account for breeding/replacement animals; establish clear system boundaries; and document key assumptions.

For wild-caught species, we face additional complexity in establishing system boundaries, as harvesting one predator species prevents that predator from consuming prey species. I address this through sensitivity analysis examining different boundary assumptions [23].

2.3. Protein Yield Model

For each production system, I calculate:

Total Protein Yield per Life = Total Protein Produced / Total Lives Required

Where:

Total Protein Produced includes all consumable protein (meat, milk, eggs)

Total Lives Required includes all categories of lives within system boundaries

Key conversion factors include live weight to edible weight ratios [24], protein content of edible portion [25], feed conversion ratios for farmed species [26], reproductive rates and offspring survival [27], and production lifespan [28].

For systems with multiple protein outputs (e.g., dairy producing both milk and meat), I allocate lives based on protein mass contribution. I report point estimates while acknowledging both natural variation in parameters and uncertainty in their values. The variance in results across systems of protein production is dramatically larger than the variance in the estimate for a particular system of production, which I demonstrate through calculation of confidence intervals for beef production.
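
The yield model and the allocation rule can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation; the function names and the dairy-style example values (3.0 lives, a 900 g / 100 g milk-to-meat protein split) are hypothetical and chosen only to demonstrate proportional allocation.

```python
def protein_per_life(total_protein_g, direct_lives,
                     supporting_lives=0.0, reproductive_lives=0.0):
    """Total Protein Yield per Life = Total Protein Produced / Total Lives Required."""
    total_lives = direct_lives + supporting_lives + reproductive_lives
    return total_protein_g / total_lives

def allocate_lives(total_lives, protein_by_output_g):
    """Allocate lives across multiple outputs in proportion to protein mass."""
    total_protein = sum(protein_by_output_g.values())
    return {output: total_lives * grams / total_protein
            for output, grams in protein_by_output_g.items()}

# Hypothetical multi-output system: lives split between milk and meat protein.
shares = allocate_lives(3.0, {"milk": 900.0, "meat": 100.0})
print(shares)  # {'milk': 2.7, 'meat': 0.3}
```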

This framework provides a systematic basis for comparing protein yield per life across diverse production systems while acknowledging fundamental differences in the nature of animal lives involved. The results enable evidence-based discussion of system efficiency while leaving deeper philosophical questions about relative value of different lives to the reader.

While edible protein mass serves as a practical functional unit for comparison, it is important to recognize that not all protein sources are nutritionally equivalent [29]. Animal proteins generally exhibit high digestibility and contain all essential amino acids in proportions well-matched to human needs, making them complete proteins. Among them, milk and eggs are often considered nutritional benchmarks due to their high bioavailability and amino acid scores. Fish proteins also score highly, with excellent digestibility and favorable lipid profiles. By contrast, protein from terrestrial meat sources can vary depending on muscle type, processing, and fat content. These differences do not significantly affect the ethical life-count analysis presented here, but they are relevant for future multi-criteria assessments that weigh both nutritional density and moral cost per unit of benefit.

2.4. Ethical Scope and Inclusion Criteria

My analysis focuses on cognitively complex and pain-capable lives, excluding the category of non-suffering lives (e.g., zooplankton, bivalve mollusks) from my quantitative assessment. While I acknowledge that some ethical frameworks, particularly those rooted in religious traditions like Jainism or certain Buddhist perspectives, accord moral weight to all living beings, my analysis reflects the scientific consensus on capacity for suffering. However, readers may choose to incorporate these additional lives into their ethical calculations.

2.5. Climate Impact Metric

While this analysis centers on ethical efficiency, measured in terms of animal lives affected per unit of edible protein, greenhouse gas (GHG) emissions remain a critical component of food system sustainability. To support side-by-side comparisons, I incorporate cradle-to-farm-gate GHG intensities (expressed in kg CO₂-equivalents per kg of edible protein) for all production systems examined.

For terrestrial systems, I draw primarily on the harmonized global meta-analysis conducted by Poore and Nemecek [30], which provides median emissions data by product category. Aquatic values are taken from the Blue Food Assessment LCA synthesis [31], which covers farmed species, and from Cashion and Tyedmers [32], who estimate GHG emissions from wild-capture fisheries based on fuel intensity per metric ton landed. Where multiple production methods exist within a system (e.g., rainfed vs. irrigated beef, cage vs. net-pen aquaculture), I use production-weighted medians to represent typical conditions.

All estimates are converted to a per-gram protein basis using species-specific edible yield and protein content [24, 25]. Protein yield per life and GHG intensity are reported together in Figure 1 to visualize trade-offs between ethical and environmental performance. While this GHG metric is not integrated directly into the ethical efficiency ranking, it supports a broader sustainability perspective and helps identify production systems with unusually favorable or unfavorable performance on both axes.
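
The unit conversion itself is straightforward. A hedged sketch with made-up illustrative numbers (3 kg CO₂-eq per kg live weight, 60% edible yield, 20% protein in the edible portion); these are not values from the paper's sources, which are species-specific:

```python
def ghg_per_kg_protein(ghg_per_kg_live, edible_yield, protein_fraction):
    """Convert a per-kg-live-weight GHG intensity to a per-kg-protein basis."""
    kg_protein_per_kg_live = edible_yield * protein_fraction
    return ghg_per_kg_live / kg_protein_per_kg_live

print(ghg_per_kg_protein(3.0, 0.60, 0.20))  # ~25 kg CO2-eq per kg protein
```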

2.6. Data Sources and Assumptions

All quantitative estimates in this analysis are based on published data describing production yields, biological characteristics, and input-output relationships across food systems. These include:

Live-to-edible weight conversion factors for livestock, seafood, and eggs [24, 25]

Protein content of edible tissues, milk, and eggs, expressed as a percentage of wet weight [25]

Reproductive rates and lifespans for dairy cows, broilers, pigs, salmon, and other species [27, 33]

Feed conversion ratios for aquaculture and terrestrial systems [26, 34]

Fishmeal and fish oil yield assumptions for carnivorous aquaculture [35, 36]

Trophic cascade estimates and predator-prey dynamics for wild-capture fisheries [6, 37]

Where systems have multiple protein outputs (e.g., dairy producing both milk and meat), lives are allocated proportionally based on protein mass contribution. In systems where inputs such as feed are derived from other sentient species (e.g., small pelagic fish used in salmon feed), those lives are counted in full.

All parameter values are documented in Appendix A, and uncertainty is addressed through sensitivity analysis and reporting of confidence intervals where applicable.

2.7 Terrestrial Systems

While numerous life‐cycle assessments have quantified feed‐conversion ratios and greenhouse‐gas footprints of terrestrial livestock, these metrics alone do not capture the ethical dimension of how many animal lives are taken per unit of protein. Prior studies show that ruminant systems typically require on the order of 133 kg of dry feed to produce 1 kg of protein [34] and emit a median of ≈52 kg CO₂-eq per kg protein, whereas monogastric systems require ≈30 kg of feed and emit ≈24 kg CO₂-eq per kg protein [30]. Building on this foundation, this section contains the core innovation of this paper: calculating protein yield per animal life. In the subsections that follow, I briefly summarize these established efficiency and emissions benchmarks for herbivorous and omnivorous species and then present the life-based efficiency metrics for dairy, beef, pork, and poultry.

2.7.1 Herbivorous Species

Dairy systems achieve the highest efficiency through continuous production over multiple years combined with meat protein from culled animals and excess offspring [33].  A dairy cow produces milk protein throughout her productive life while generating calves that enter either dairy or meat production streams. This continuous production model, coupled with large animal size, allows dairy to deliver approximately 390 kg of protein per life. (In all cases, I report two significant figures in the estimates.)

Among meat-focused terrestrial systems, beef cattle offer the next highest yield at 73 kg per life, benefiting from large animal size and efficient conversion of plant matter to protein. Pork production achieves moderate efficiency at 19 kg per life through omnivorous feeding and relatively fast growth cycles [38].
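
As one worked example of how such a per-life figure arises, the sketch below recovers the beef estimate from assumed parameters (a 620 kg live animal, 62% edible yield, 19% protein in the edible wet weight). These parameter values are illustrative assumptions chosen to reproduce the 73 kg figure; the paper's actual parameters are documented in Appendix A.

```python
live_weight_kg = 620     # assumed live weight at slaughter
edible_yield = 0.62      # assumed edible fraction of live weight
protein_fraction = 0.19  # assumed protein share of edible wet weight

protein_per_life_kg = live_weight_kg * edible_yield * protein_fraction
print(round(protein_per_life_kg))  # 73 kg protein per life, matching the text
```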

2.7.2 Omnivorous Species

Chicken meat provides 0.49 kg per life, while egg production yields 1.3 kg per life but introduces the complication of culled male chicks in breeding systems [39]. These systems involve only cognitively complex lives but achieve lower total protein yields due to smaller animal sizes.

2.8 Marine Systems and Trophic Cascades

2.8.1 Theoretical Framework

The analysis of carnivorous marine species requires consideration of complex trophic cascade effects. When humans apply fishing pressure to apex predators, we initiate a cascade of ecological changes that fundamentally alter population dynamics across multiple trophic levels. Traditional analyses that simply count direct harvest deaths and prey consumption fail to capture these systemic effects.

Recent research in marine ecology demonstrates that reducing apex predator populations through sustainable harvest leads to several key effects [6, 37]:

Reduction in apex predator population size and average age

Increase in immediate prey species populations

Subsequent cascade effects through lower trophic levels

Overall increase in total animal lives in the system

This counterintuitive result, that harvesting apex predators can increase total animal lives, emerges from fundamental principles of energy transfer through trophic levels. When apex predator populations are reduced, the energy they would have consumed becomes available to support larger populations of smaller species with faster reproductive rates.

2.8.2 Farmed Carnivorous Fish

Aquaculture of carnivorous species like salmon presents a special case in my analysis. While the feed fish primarily consume zooplankton and other non-suffering lives, industrial-scale harvest of these species for aquaculture feed has led to documented population depletions. Studies indicate that industrial fishing for feed has contributed to significant declines in small pelagic fish populations, particularly in regions where fish meal production is concentrated [39, 40].

These population reductions ripple through marine ecosystems. Species at higher trophic levels that depend on these fish populations likely experience reduced abundance due to food limitations [42]. While my analysis could theoretically account for these additional population reductions, I justify their exclusion based on trophic efficiency: each higher trophic level supports roughly one-tenth the biomass of the level below it [43].  Therefore, the number of affected lives at higher trophic levels is relatively small compared to the direct feed fish lives counted in my analysis.

Using the example of farmed salmon:

One 4.5 kg salmon requires approximately 468 feed fish lives

Yields 0.64 kg of protein

Results in 1.4 g protein per life (in this case all are pain-capable lives as opposed to cognitively complex lives)

This accounting, while not capturing all ecosystem effects, provides a reasonable approximation given the order-of-magnitude differences between trophic levels. Unlike wild harvest of apex predators, aquaculture creates a fixed demand for feed fish that depletes rather than releases predation pressure, preventing the compensatory population increases seen in wild systems.
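
The per-life figure in the bullets above follows directly from the stated values, counting the salmon itself alongside its feed fish:

```python
protein_g = 0.64 * 1000  # one 4.5 kg salmon yields 0.64 kg of protein
lives = 1 + 468          # the salmon plus its feed fish (all pain-capable)

print(round(protein_g / lives, 1))  # 1.4 g protein per life
```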

2.8.3. Wild Apex Predators

For wild-caught apex predators, my analysis suggests that sustainable harvest may result in a net increase in total animal lives through trophic cascade effects. Consider a bluefin tuna:

Individual harvest weight: 180 kg

Protein yield: 23 kg

Traditional prey consumption calculation: ~37,800 fish lives over lifetime

However, this simple calculation misses the key ecological dynamics. When tuna populations are reduced through sustainable harvest:

Tuna population typically decreases by 40-50%

Prey fish populations increase by 20-30%

Net result is an increase in total lives in the system
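
The net effect claimed in the bullets above can be sketched numerically. The relative population sizes here are assumptions; the only structural point that matters is that prey individuals vastly outnumber tuna, which the ~37,800 prey lives per tuna lifetime makes plausible. Midpoints of the quoted ranges are used.

```python
tuna_0, prey_0 = 1.0, 1000.0  # assumed relative population sizes (prey >> tuna)

tuna_1 = tuna_0 * (1 - 0.45)  # 40-50% tuna decrease, midpoint 45%
prey_1 = prey_0 * (1 + 0.25)  # 20-30% prey increase, midpoint 25%

net_change = (tuna_1 + prey_1) - (tuna_0 + prey_0)
print(net_change > 0)  # True: total lives in the system increase
```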

This suggests that the ethical calculation for wild-caught apex predators should consider: the suffering of the harvested individual, concerns about species preservation, the net increase in prey species lives, the relative capacity for suffering between apex predators and prey species.

2.8.4. Cephalopods and the Complexity of Life-Stage Effects

Cephalopods, particularly octopi, present a fascinating case study that illustrates the multifaceted nature of the protein-per-life calculus. As cognitively complex predators that consume pain-capable prey, they share characteristics with apex predators like tuna. However, their relatively short lifespans and different ecological role result in more localized trophic effects.

The case of juvenile octopi (yielding just 0.011 g protein per pain-capable life, and 2 g per complex cognitive life) is particularly instructive. Like juvenile harvest of any species, removing young octopi before reproduction creates stronger population pressures than harvesting adults. However, several factors make this case distinctive.

First, these are cognitively complex creatures capable of sophisticated problem-solving and emotional states [10], placing them in my highest category of cognitive capability. Second, even as juveniles, they consume pain-capable prey, primarily small crustaceans. This means that reduced juvenile octopus populations could, like other predator reductions, lead to increased prey populations. However, unlike apex predators where this effect might be seen as positive from a lives-per-protein perspective, the removal of juvenile octopi raises additional ethical concerns because it ends cognitively complex lives before they’ve had any chance to fulfill their biological potential.

This example illustrates how the quantitative framework of lives-per-protein interacts with other ethical considerations including cognitive complexity, life-stage effects, and ecosystem dynamics. While mature octopi provide 0.33 g per pain-capable life and squid yield 0.046 g per pain-capable life, these simple ratios capture only part of a complex ethical calculation.

3. Results

Table 1 and Figure 1 reveal two interacting gradients in animal-protein production: an ethical axis (protein yield per life) that spans five orders of magnitude, and a climate axis (median kilograms of CO₂-equivalent per kilogram of protein) that spans roughly two. These results synthesize data across 14 production systems and incorporate parameter uncertainty from multiple sources, as documented in the appendices. Detailed 10th–90th percentile bounds for GHG appear in Appendix B and are reflected in the vertical error bars in Figure 1.

[Figure 1 graphic: median greenhouse gas emissions versus protein yield per animal life across protein production systems, including Atlantic lobster, beef, dairy, wild tuna, and various aquaculture species.]

Figure 1. Ethical versus climate efficiency of protein production systems. (Squid and octopus harvests cost both complex-cognitive lives and pain-capable lives. In the graphic, only the cost of cognitively complex lives is shown.)

3.1. System Efficiency and Climate Trade-Offs

To reveal the interdependence of ethical and environmental performance, I overlay protein-per-life values onto established greenhouse-gas (GHG) intensity metrics. Previous meta-analyses report that dairy systems emit a median of 52 kg CO₂-eq per kg protein and beef systems emit about 200 kg CO₂-eq per kg protein [30]. My contribution is to combine these emission footprints with the life-based protein yields calculated in Sections 2.7 and 2.8. This comparison shows that dairy delivers 390 kg of protein per cognitively complex life while maintaining a moderate climate burden (52 kg CO₂-eq/kg protein). Beef (73 kg per life) and pork (19 kg per life) follow, although beef incurs a markedly higher GHG penalty.

Among aquatic animals, wild tuna is a standout. Each tuna provides 23 kg of protein, and recall that sustainable harvest of wild apex predators tends to increase net prey lives, so the practice can be considered ethically quite efficient. At just 15 kg CO₂-eq per kg of protein, wild tuna is also relatively climate-efficient.

Feed-intensive systems such as farmed salmon (1.4 g per life, 20 kg CO₂-eq per kg protein) and crustacean trawls (4 g per life, 55 kg CO₂-eq per kg protein) sit at the opposite extreme, costing both a lot of lives and not being particularly efficient in terms of GHG emissions.

Lobster harvesting yields just 1.1 g protein per pain-capable life with a climate impact of approximately 45 kg CO₂-eq per kg protein, making it one of the least efficient systems both ethically and environmentally.

Cephalopods fall in between: mature octopi yield 360 g per complex-cognitive life at 30 kg CO₂-eq per kg protein, whereas squid yield 68 g at 20 kg CO₂-eq per kg protein. Juvenile-octopus fisheries remain ethical outliers at 2 g per complex-cognitive life.

3.2. Cognitive Complexity Patterns

Terrestrial systems involve only cognitively complex lives yet achieve kilogram-scale yields because animals convert feed over multiple years. Marine systems mainly affect pain-capable lives and require many more individuals per kilogram of protein, as energy passes through longer food chains. Cephalopods add complex lives to this marine chain, leading to some of the lowest protein efficiencies observed.

3.3. Production System Design Impacts

Two design levers dominate the pattern.

Production continuity. Continuous-yield systems such as dairy spread one life over hundreds of kilograms of protein. Single-harvest systems such as beef, pork and broiler chickens cannot do so.

Feed-chain architecture. Direct plant feeders (pigs and cattle) do not add extra lives to the chain, whereas fish-meal-dependent aquaculture multiplies pain-capable deaths and raises emissions through reliance on reduction fisheries. Omnivorous tilapia, at 11 g per life and 28 kg CO₂-eq per kg protein, shows what is possible when fish-meal inclusion is minimized.

Ethical and climate efficiencies are not directly correlated. For terrestrial systems, fewer lives lost trades off against greater GHG emissions. Beef herds occupy the upper-right quadrant, combining strong ethical efficiency with a severe climate penalty. For aquatic systems, in general, ethical and climate performance are positively correlated, with fewer lives lost also associated with lower GHG emissions. The lower-right quadrant, where both metrics would be favorable, remains empty and marks a frontier for innovation in husbandry practices and alternative proteins.

Table 1. Protein produced and lives affected by animal protein production system.

Production
System
EnvironmentLife ClassProtein /
Total Lives (g)
Protein /
Complex-
Cognitive
Lives (g)
GHG
(kg CO₂e /
kg protein)
Dairy (bovine)TerrestrialComplex cognitive390 000390 00052
BeefTerrestrialComplex cognitive73 00073 000200
PorkTerrestrialComplex cognitive19 00019 00046
Chicken eggsTerrestrialComplex cognitive1 3001 30026
Chicken meatTerrestrialComplex cognitive49049024
      
Mature octopusAquaticHybrid0.3336030
SquidAquaticHybrid0.0466820
Juvenile octopusAquaticHybrid0.011230
      
Wild tuna (bluefin)AquaticPain-capableNet increase
in prey lives. Tuna life delivers 23 kg protein.
 15
Wild herringAquaticPain-capable18 8
Farmed tilapiaAquaticPain-capable11 28
Wild shrimpAquaticPain-capable4 55  
Farmed salmonAquaticPain-capable1.4 20
LobsterAquaticPain-capable1.1 180

Notes: Two significant figures shown. Median value shown for GHG. Detailed 10th–90th percentile GHG bounds are in Appendix B.

4. Discussion

4.1. Philosophical Foundations

My analysis reveals patterns in protein production efficiency that have significant implications for both food system policy and individual dietary choices. However, these quantitative results intersect with deeper questions about consciousness, suffering, and moral value. Philosophers have long debated whether animal lives should be counted equally, weighted by sentience, or evaluated by more complex considerations such as telos or narrative completeness.

The quantitative approach used here, measuring protein yield per life and categorizing animals by cognitive capability, may seem mechanistic when applied to ethical concerns. Yet structured analysis can help expose trade-offs that are obscured by intuition or tradition. When one production system ends a single cow life and another ends hundreds of chicken lives for the same nutritional output, we are forced to reckon with the scale and structure of harm in a way that vague ethical discomfort often avoids.

Some philosophers, particularly utilitarians, argue that bringing a sentient creature into existence with a life worth living constitutes a moral good. From this perspective, meat production systems that create billions of animals who experience net well-being before being painlessly killed may be ethically justifiable. This line of reasoning, sometimes called the total view in population ethics, shifts the moral focus from minimizing harm to maximizing welfare across all sentient lives, including those that would not otherwise have existed. While controversial, this view highlights the importance of clarifying one’s ethical framework when evaluating food systems.

4.2. Relative Moral Value of Individual Lives

Most readers instinctively feel that ending the life of a cow is “worse” than ending the life of a chicken, even though one cow yields roughly 73 kilograms of protein (via meat) while a single broiler yields less than 0.5 kilograms (Table 1). Three overlapping factors explain this reaction.

4.2.1. Cognitive complexity and welfare range

Empirical work on mind perception suggests that mammals score higher than birds on dimensions of self-awareness and emotional richness [44]. Experimental studies confirm that cattle demonstrate long-term social memories, object permanence, and emotional contagion [28]. Chickens also show sophisticated capacities, such as numerical competence, perspective taking and basic self-control [7], but the breadth of their welfare range (the set of states they can positively or negatively experience) may still be narrower than that of cattle [45]. Cross-species scoring systems [46] and precautionary policies such as the UK’s Animal Welfare (Sentience) Act provide frameworks for such adjustments.

4.2.2. Anthropomorphic bias

Psychological work on speciesism shows that humans assign moral standing in proportion to perceived similarity to themselves [47]. Large mammals elicit stronger empathic concern than birds, amplifying the intuitive moral gap even when cognitive evidence is comparable. Readers of popular culture will recognize an absurd illustration of anthropomorphic bias in Douglas Adams’s novel The Restaurant at the End of the Universe [48], where a genetically engineered cow enthusiastically introduces itself at the table and recommends which of its own cuts the diners should order. Adams’s scene lampoons the discomfort humans feel about killing animals once those animals can express preferences in near-human language. The humor underscores our tendency to grant moral standing in proportion to perceived similarity: a cow that talks like a waiter instantly outranks a silent chicken, regardless of their underlying cognitive capacities.

4.2.3. Scope neglect

People reliably undervalue harms distributed across many small victims compared with those concentrated in a single large victim, a phenomenon known as scope neglect [49]. Yet from an ethical standpoint, ending 150 conscious lives, even individually smaller ones, may carry more moral weight than ending one.

4.3. Quality of Life Considerations

The quality of life experienced by animals in different production systems adds another crucial dimension to this ethical calculus. Wild animals, particularly large herbivores like bison, may experience the highest quality of life, with natural social structures, freedom of movement, and species-typical behaviors fully expressed [50]. Some pasture-raised beef cattle may approach similar quality of life metrics, with research on extensive grazing systems showing more natural behavioral patterns, lower stress hormones, and better overall health outcomes compared to confined feeding operations [51].

The dairy industry presents a particularly complex case where efficiency and welfare often conflict. While dairy systems achieve the highest protein yield per life in my analysis, conventional dairy practices often compromise animal welfare. Studies document significant challenges including early separation of calves from mothers, high rates of lameness, metabolic stress from high milk production, and limited opportunity for natural behaviors in confined housing systems [5]. However, emerging research demonstrates that more ethical dairy production is possible, though often with reduced yields. Alternative approaches including cow-calf contact systems and pasture-based production typically show yield reductions of 15-30% compared to conventional systems, but with significant improvements in animal welfare metrics [52].

The relationship between cognitive complexity and quality of life raises additional ethical considerations. More cognitively sophisticated animals may have greater capacity for both positive and negative experiences, suggesting their welfare should be weighted more heavily. This becomes particularly relevant when considering species like pigs, which show cognitive abilities comparable to dogs and young children, or cephalopods, whose remarkable intelligence exists within fundamentally alien forms of consciousness.

4.4. Speculative Implications

These observations lead to provocative questions about the future of animal agriculture and bioengineering. Given that larger animals generally yield more protein per life, should we be engineering ever-larger domestic animals? We have already selectively bred cattle to be much larger than their wild ancestors. Following this logic to its extreme, perhaps we should develop elephant-sized cattle or whale-sized aquaculture species. More provocatively, if we accept that cognitive sophistication affects the ethical weight of ending a life, should we engineer food animals with minimal cognitive function? A hypothetical “zombie chicken” engineered to maintain basic biological functions but lacking higher consciousness would still convert feed to protein but might pose fewer ethical concerns [53].

This prospect becomes especially relevant as we develop lab-grown meat technology. Rather than growing meat from cell cultures, we might engineer minimally conscious bioreactors: organisms that are technically alive but lack meaningful consciousness. This could potentially offer better production efficiency than cell culture while minimizing ethical concerns about consciousness and suffering [54].

4.5. Limitations

My analysis faces several important limitations that deserve careful consideration. As elaborated in Section 4.3, my focus on lives ended provides an incomplete picture of animal welfare. My treatment of system boundaries presents another significant limitation. For wild-caught species in particular, the interconnected nature of marine ecosystems makes it challenging to definitively account for all affected lives. Similarly, in agricultural systems, my analysis does not fully capture lives affected by feed production, such as rodents killed during grain harvesting.

The categorization of lives into “cognitively complex” and “pain-capable” groups, while useful analytically, may oversimplify the rich continuum of animal consciousness and capabilities. Recent research suggests that many species we categorize as merely pain-capable may have more sophisticated cognitive and emotional lives than previously understood. My analysis also faces temporal limitations. I treat all deaths as equivalent, regardless of when in an animal’s natural lifespan they occur. This may be philosophically problematic; ending the life of a juvenile animal might deserve different ethical weighting than ending the life of one that has lived most of its natural lifespan.

4.6. Common Ground and Future Directions

While deep philosophical questions about the relative value of different animal lives remain unresolved, my analysis suggests areas of common ground. Most ethical frameworks would agree that, all else equal, taking fewer lives is better than taking more. The dramatic differences in efficiency I document suggest significant opportunities for improvement through both system redesign and individual choice. Rather than waiting for resolution of philosophical debates about consciousness and suffering, we can make progress by applying evidence-based approaches to minimize lives taken while improving protein production efficiency.

The integration of quantitative analysis with ethical reasoning allows us to move beyond intuition to make more informed choices about food systems. Whether at the individual, institutional, or policy level, better understanding of the relationship between protein production and animal lives can help guide decisions toward more efficient and potentially more ethical outcomes. Future research priorities should include development of integrated welfare metrics, investigation of optimal production scales balancing yield and welfare, and innovation in housing and management systems that support natural behaviors while maintaining efficiency.

The tension between efficiency and welfare, between quantitative analysis and ethical reasoning, may never be fully resolved. However, by carefully examining these relationships and making them explicit, we can work toward food systems that better serve both human nutrition and animal welfare.

While this paper focuses primarily on expenditure of animal lives, these concerns are integral to a broader understanding of sustainability. A narrow conception of sustainability that focuses solely on greenhouse gas emissions or land use can obscure deeper systemic trade-offs. Ethical dimensions, such as how many sentient lives are lost, and under what conditions, must be considered alongside resource efficiency [55]. Food systems that minimize unnecessary suffering, distribute harms transparently, and recognize the moral status of sentient beings contribute not just to environmental goals but to a more just and humane planetary future. Viewed through this lens, sustainability is not only about emissions and land use but also about what kinds of lives, and whose lives, we value. Frameworks like the UN Sustainable Development Goals (e.g., SDG 12: Responsible Consumption and Production) [56] reflect this broader vision. The analysis in this paper offers one way to quantify and compare ethical efficiency alongside environmental impact. The framework developed here could inform institutional decisions ranging from food labeling and procurement standards to ethical sourcing policies and sustainability ratings.

5. Re-considering the Lobster

When David Foster Wallace asked readers to “consider the lobster,” he directed attention to a single animal in a boiling pot and invited reflection on consciousness, suffering, and appetite. Two decades later, the quantitative evidence presented here shows that the lobster dinner remains an ethically expensive choice. At 1.1 g of protein per pain-capable life, Atlantic lobster occupies the lowest end of the efficiency spectrum in Table 1 and Figure 1 and carries one of the highest greenhouse-gas intensities among marine foods.

The disparity is striking. A dairy cow provides nearly 400 kg of protein per cognitively complex life; a beef animal yields 73 kg; even a broiler chicken, inefficient by terrestrial standards, produces almost five hundred grams. In contrast, a single lobster life contributes only about a gram per life required. The animal that prompted Wallace’s moral unease turns out to be a statistical outlier as well.

The data also complicate intuitive rankings of moral concern. Because cows are large mammals, many people regard eating beef as more ethically troubling than eating crustaceans, yet the protein-per-life calculation reverses that hierarchy unless a cow is assigned about 70,000 times greater moral weight than that of a lobster. Cephalopods push the point further. Juvenile octopi, which are cognitively sophisticated animals, yield only two grams of protein per cognitively complex life, a result that challenges both intuition and some existing regulatory exemptions that treat invertebrates as ethically negligible.

The empirical pattern that emerges is not a simple correlation between cognitive complexity and efficiency. Terrestrial herbivores achieve high protein yield per life because continuous production or large carcass size amortises a single death over many kilograms of output. Marine systems pass energy through multiple trophic levels, so even when target species are less cognitively complex their harvest often requires many more lives. Ethical concern, therefore, cannot rest solely on the mental capacities of individual animals; it must also account for system architecture and trophic position.

These findings have pragmatic implications. First, efforts to reduce animal deaths per unit of nutrition should focus on continuous-yield systems such as dairy, on large-bodied terrestrial herbivores, and on omnivorous or herbivorous aquaculture species that minimise feed-fish demand. Second, culinary traditions that valorise low-yield species like baby octopi merit renewed scrutiny, especially where substitutes with lower ethical and climate costs are readily available. Third, refining welfare standards within efficient systems remains essential, because high protein yield per life does not guarantee acceptable living conditions.

Wallace concluded that intellectual honesty about animal suffering might oblige us to reconsider cherished foods. The numerical evidence provided here strengthens that conclusion by adding scale and proportion to the moral calculus. In quantifying how many lives, and what kinds of lives, are exchanged for each gram of protein, the analysis converts vague discomfort into a decision space that is explicit, measurable, and open to improvement. Clearer numbers do not resolve every philosophical dispute about consciousness, but they sharpen the question: how many and what type of lives are we prepared to consume for dinner, now that we can measure the exchange rate?

Appendix A – Detailed Protein Supply Chain Calculations

Appendix A.1. Terrestrial Systems

Appendix A.1.1. Dairy Systems

Production Parameters:

Daily milk production: 32 kg [57].

Production period: 305 days/year [33].

Productive life: 3 lactations [58].

Milk protein content: 3.4% [25].

Welfare indicators tracked in analysis:

Lameness prevalence: 20-55% in intensive systems [5].

Metabolic stress markers [33].

Natural behavior expression [59].

Protein Yield Calculation

Annual milk protein: 32 kg × 305 days × 0.034 = 331.8 kg

Lifetime milk protein: 331.8 kg × 3 years = 995.5 kg

Additional protein sources:

Culled dairy cow: 64 kg [60].

Male calves as veal: 24 kg total

Excess female calves: 4.8 kg

Total protein: 1,088.3 kg

Lives Required

Primary dairy cow: 1 (cognitively complex life)

Male calves: 1.5 (cognitively complex lives)

Excess female calves: 0.3 (cognitively complex lives)

Total: 2.8 cognitively complex lives

Protein yield per life = 1,088,300g / 2.8 = 388,679g per life
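As a cross-check, the dairy arithmetic above can be reproduced in a few lines (a sketch; the variable names are mine and the parameter values are the ones cited in this appendix):

```python
# Dairy protein-per-life, using the parameter values cited above.
DAILY_MILK_KG = 32             # kg milk per day [57]
DAYS_IN_MILK = 305             # days per lactation [33]
LACTATIONS = 3                 # productive life [58]
MILK_PROTEIN_FRACTION = 0.034  # milk protein content [25]

lifetime_milk_protein_kg = (DAILY_MILK_KG * DAYS_IN_MILK
                            * MILK_PROTEIN_FRACTION * LACTATIONS)

# Carcass protein from the cull cow (64 kg), male calves as veal (24 kg),
# and excess female calves (4.8 kg).
extra_protein_kg = 64 + 24 + 4.8

total_protein_g = (lifetime_milk_protein_kg + extra_protein_kg) * 1000

# Cognitively complex lives: the dairy cow, 1.5 male calves, 0.3 excess females.
lives = 1 + 1.5 + 0.3

print(round(total_protein_g / lives))  # ~388,700 g per life (matches 388,679g above to rounding)
```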

Appendix A.1.2. Beef Cattle

Production Parameters

Harvest weight: 635 kg [57].

Dressing percentage: 63% [61].

Edible meat percentage: 70% [62].

Protein content: 26% [25].

Total protein = 635 × 0.63 × 0.70 × 0.26 = 72.8 kg

Protein yield per life = 72,800g per cognitively complex life

Confidence Interval Analysis

Harvest weight: Normal distribution around 635 kg (±20 kg SD)

Dressing percentage: Normal distribution around 0.63 (±0.02 SD)

Edible meat percentage: Normal distribution around 0.70 (±0.02 SD)

Protein content: Normal distribution around 0.26 (±0.02 SD)

Mean protein yield: 72,800g

95% Confidence Interval: [65,000g, 80,600g]

Standard deviation: ~4,500g
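The propagation of these four normal distributions can be sketched with a small Monte Carlo simulation (an illustrative sketch assuming independent draws; function and variable names are mine):

```python
import random

random.seed(0)

def beef_protein_g():
    """One draw of beef protein yield (grams) from the distributions above."""
    weight = random.gauss(635, 20)       # harvest weight, kg
    dressing = random.gauss(0.63, 0.02)  # dressing percentage
    edible = random.gauss(0.70, 0.02)    # edible meat percentage
    protein = random.gauss(0.26, 0.02)   # protein content
    return weight * dressing * edible * protein * 1000

samples = sorted(beef_protein_g() for _ in range(100_000))
lo = samples[int(0.025 * len(samples))]
hi = samples[int(0.975 * len(samples))]
mean = sum(samples) / len(samples)

print(f"mean {mean:,.0f} g, 95% interval [{lo:,.0f}, {hi:,.0f}] g")
```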

Appendix A.1.3. Pork

Production Parameters

Market weight: 125 kg [57].

Dressing percentage: 75% [63].

Edible meat percentage: 75% [64].

Protein content: 27% [25].

Welfare indicators monitored [38]:

Environmental enrichment access

Social grouping opportunities

Behavioral expression

Calculation

Total protein = 125 × 0.75 × 0.75 × 0.27 = 19.0 kg

Feed is primarily plant-based with no pain-capable lives required.

Protein yield per life = 19,000g per cognitively complex life.

Appendix A.1.4. Chickens (Broilers)

Production Parameters

Market weight: 2.8 kg [57].

Dressing percentage: 75% [65].

Edible meat percentage: 75% [66].

Protein content: 31% [25].

Welfare considerations [67]:

Growth rate stress

Leg health

Environmental conditions

Calculation

Total protein = 2.8 × 0.75 × 0.75 × 0.31 = 0.488 kg

Feed is plant-based with no pain-capable lives required.

Protein yield per life = 488g per cognitively complex life.
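The single-carcass species (beef, pork, broilers) all share one formula, live weight × dressing × edible fraction × protein content, which can be captured in a single helper (a sketch; the helper name is mine, parameter values are those cited above):

```python
def protein_per_life_g(live_weight_kg, dressing, edible, protein_fraction):
    """Edible protein (grams) amortised over the single life ended."""
    return live_weight_kg * dressing * edible * protein_fraction * 1000

print(round(protein_per_life_g(635, 0.63, 0.70, 0.26)))  # beef: ~72,800 g
print(round(protein_per_life_g(125, 0.75, 0.75, 0.27)))  # pork: ~19,000 g
print(round(protein_per_life_g(2.8, 0.75, 0.75, 0.31)))  # broiler: ~488 g
```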

Appendix A.1.5. Egg Production

Production Parameters

Annual egg production: 280 eggs [57].

Productive life: 1.5 years [68].

Total eggs: 420

Protein per egg: 6.28g [69].

Welfare indicators [70]:

Nesting behavior

Perching access

Dust bathing opportunities

Calculation

Total protein = 420 × 6.28 = 2,638g

Accounting for culled male chicks [41].

Protein yield per life = 2,638g / 2 = 1,319g per cognitively complex life.

Appendix A.2. Aquaculture Systems

Appendix A.2.1. Farmed Salmon

Production Parameters

Harvest weight: 4.5 kg

Feed conversion ratio: 1.3 [35]

Feed composition:

20% fishmeal

12% fish oil [36]

Feed fish requirements:

4.5 kg small fish per kg fishmeal

20 kg small fish per kg fish oil

Average feed fish weight: 0.03 kg [32]

System effects [39]:

Direct reduction in wild feed fish populations

No compensatory ecosystem effects

Calculation

Feed fish lives: ~468 pain-capable lives

Protein yield:

Edible percentage: 65% [71]

Protein content: 22% [25]

Total protein = 4.5 × 0.65 × 0.22 = 0.644 kg

Protein yield per salmon = 644g / 469 = 1.37g per pain-capable life

Appendix A.2.2. Farmed Tilapia

Production Parameters

Harvest weight: 0.7 kg [72]

Feed conversion ratio: 1.6 [73]

Feed composition:

2% fishmeal

1% fish oil [35]

Edible percentage: 60% [74]

Protein content: 20% [25]

Calculation

Feed fish lives: ~7 pain-capable lives.

Total protein = 0.7 × 0.60 × 0.20 = 0.084 kg

Protein yield per life = 84g / 8 = 10.5g per pain-capable life.
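Both aquaculture calculations follow the same feed-fish accounting. The sketch below reproduces them, assuming (as the life counts above imply) that the same wild fish supply both fishmeal and oil, so the larger of the two requirements governs; function names are mine:

```python
def feed_fish_lives(harvest_kg, fcr, meal_frac, oil_frac,
                    fish_kg_per_kg_meal=4.5, fish_kg_per_kg_oil=20.0,
                    feed_fish_kg=0.03):
    """Wild feed-fish lives behind one farmed fish."""
    feed_kg = harvest_kg * fcr
    meal_fish_kg = feed_kg * meal_frac * fish_kg_per_kg_meal
    oil_fish_kg = feed_kg * oil_frac * fish_kg_per_kg_oil
    # The same fish supply both meal and oil, so the larger demand governs.
    return round(max(meal_fish_kg, oil_fish_kg) / feed_fish_kg)

def protein_per_life_g(harvest_kg, edible, protein_frac, prey_lives):
    """Edible protein (grams) divided over prey lives plus the farmed fish itself."""
    return harvest_kg * edible * protein_frac * 1000 / (prey_lives + 1)

salmon_prey = feed_fish_lives(4.5, 1.3, 0.20, 0.12)   # ~468 lives
tilapia_prey = feed_fish_lives(0.7, 1.6, 0.02, 0.01)  # ~7 lives
print(round(protein_per_life_g(4.5, 0.65, 0.22, salmon_prey), 2))   # salmon: ~1.37 g/life
print(round(protein_per_life_g(0.7, 0.60, 0.20, tilapia_prey), 1))  # tilapia: ~10.5 g/life
```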

Appendix A.3. Marine Systems and Trophic Cascades

Appendix A.3.1. Wild Apex Predators (e.g., Bluefin Tuna)

Analysis of wild-caught apex predators requires consideration of trophic cascade effects. Using bluefin tuna as an example:

Direct Production Parameters

Harvest weight: 180 kg [75]

Dressing percentage: 80% [76]

Edible percentage: 70% [32]

Protein content: 23% [25]

Total protein = 180 × 0.80 × 0.70 × 0.23 = 23.2 kg

Ecosystem Effects (Based on [6, 37])

5% sustainable harvest rate of adult population

40-50% reduction in apex predator population

20-30% increase in prey fish population

Net increase in total pain-capable lives

Rather than counting prey fish consumed (traditional approach), my analysis considers the net ecosystem effect of removing apex predators. Evidence suggests that sustainable harvest of apex predators like tuna results in increased abundance of prey species through reduced predation pressure, leading to a net increase in total pain-capable lives in the system.

Appendix A.3.2. Ocean Small Fish (e.g., Wild Herring)

Production Parameters

Harvest weight: 0.15 kg [75]

Primary food source: zooplankton [77]

Edible percentage: 65% [32]

Protein content: 18% [25]

Calculation

Total protein = 0.15 × 0.65 × 0.18 = 0.018 kg

No cognitively complex or pain-capable prey (zooplankton not counted).

Protein yield per life = 18g per pain-capable life.

Appendix A.3.3. Ocean Trawl (wild shrimp and prawns)

Production parameters

Target species: Northern white shrimp Litopenaeus setiferus (representative of Gulf of Mexico and North-Atlantic cold-water trawl fleets).

Mean landed weight: 0.026 kg per individual [75]

Edible (tail-meat) fraction: 0.60 [78]

Protein content of tail meat: 26% of wet weight [79]

By-catch ratio: approximately 4:1 (non-target fish and invertebrates to shrimp, by mass) for typical otter-trawl operations [80]

Calculation

Edible protein per shrimp

0.026 kg landed × 0.60 × 0.26 = 0.00406 kg

1 pain-capable life ended (by-catch lives are not included in this count).

Protein yield per life = 4.06g per pain-capable life.

Appendix A.3.4. Lobster

Production Parameters

Harvest weight: 0.55 kg [81]

Years to harvest size: 7 [82]

Primary diet: mollusks, crustaceans, fish carrion; about 50% pain-capable (mostly rock crab) [83].

Edible percentage: 30% [84]

Protein content: 21% [25]

Calculation

Total protein = 0.55 × 0.30 × 0.21 = 0.035 kg

Prey lives over 7 years:

Pain-capable lives: ~30 rock-crab prey plus the lobster itself.

Protein yield per life = 35g / 31 = 1.129g per pain-capable life.

Appendix A.3.5. Cephalopods (e.g., Octopi, Squid, Cuttlefish)

Mature Octopus Parameters

Harvest weight: 3.0 kg

Edible percentage: 80%

Protein content: 15%

Prey consumption (per [10]):

Daily: 3 crustaceans (pain capable), 2 bivalve mollusks (not pain capable)

Annual: ~1,095 pain-capable lives

Calculation

Total protein = 3.0 × 0.80 × 0.15 = 0.360 kg

Total lives: 1 cognitively complex life + 1,095 pain-capable lives.

Protein yield = 360g per cognitively complex life; 0.3285 g per total lives.

Juvenile Octopus Parameters

Harvest weight: 0.015 kg

Edible percentage: 90%

Protein content: 15%

Pre-harvest survival rate: 0.1% [85].

Prey consumption: ~180 small crustaceans

Calculation

Total protein = 0.015 × 0.90 × 0.15 = 0.002 kg

Total lives: 1 cognitively complex life + 180 pain-capable lives.

Protein yield = 2g per cognitively complex life; 0.011g per total lives.

Appendix A.3.6. Squid

Production Parameters

Harvest weight: 0.5 kg

Growth period: 0.5 years

Daily prey consumption: 8 small fish/crustaceans

Edible percentage: 75%

Protein content: 18%

Calculation

Total protein = 0.5 × 0.75 × 0.18 = 0.0675 kg

Lives involved:

1 cognitively complex life (squid)

1,460 pain-capable prey lives

Total lives: 1,461

Protein yield = 67.5g per cognitively complex life; 0.046g per total lives.
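The wild-predator calculations (lobster, octopus, squid) follow a common pattern: edible protein divided over the animal's own life plus its pain-capable prey. A sketch, with the parameter values given above (the helper name is mine):

```python
def predator_yield_g_per_life(harvest_kg, edible, protein_frac, prey_lives):
    """Protein (grams) per life, counting the predator plus its pain-capable prey."""
    return harvest_kg * edible * protein_frac * 1000 / (prey_lives + 1)

print(round(predator_yield_g_per_life(0.55, 0.30, 0.21, 30), 2))   # lobster: ~1.1 g
print(round(predator_yield_g_per_life(3.0, 0.80, 0.15, 1095), 3))  # mature octopus: ~0.33 g
print(round(predator_yield_g_per_life(0.5, 0.75, 0.18, 1460), 3))  # squid: ~0.046 g
```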

Appendix B – Greenhouse‑Gas (GHG) Intensity Dataset and Methods

B.1 Data sources and functional unit

Terrestrial livestock medians and 10th–90th percentile bounds are taken from the harmonised meta‑analysis of Poore and Nemecek [30].

Aquatic medians are from the Blue Food Assessment life‑cycle synthesis [31].

Wild‑capture fisheries percentiles are derived from fleet‑level fuel‑use intensities reported by [32].

All values include land‑use‑change where applicable and are normalised to a cradle‑to‑farm‑gate functional unit of kg CO₂‑eq per kg edible protein.

B.2 Assignment of System Medians

Where multiple production technologies exist within a species the production‑weighted median is used (see Section 3.2 of the main text). Small pelagic fisheries use a median diesel intensity of 0.9 L kg⁻¹ catch.

B.3 Propagation of Uncertainty

The 10th and 90th percentiles are carried through all graphical outputs as vertical error bars. These bounds represent producer heterogeneity rather than statistical sampling error.

Table B1. Median and percentile GHG intensities (kg CO₂‑eq kg⁻¹ protein).

Production system | Life class | 10th | Median | 90th
Dairy (milk) | complex | 30 | 52 | 90
Beef (beef‑herd) | complex | 120 | 200 | 450
Pork | complex | 30 | 46 | 80
Eggs | complex | 20 | 26 | 55
Chicken meat | complex | 15 | 24 | 45
Wild tuna | pain‑capable | 8 | 15 | 25
Mature octopus | complex | 12 | 30 | 80
Squid | complex | 10 | 20 | 35
Wild herring | pain‑capable | 5 | 8 | 15
Farmed tilapia | pain‑capable | 12 | 28 | 45
Crustacean trawl (shrimp) | pain‑capable | 25 | 55 | 150
Farmed salmon | pain‑capable | 10 | 20 | 35
Juvenile octopus | complex | 12 | 30 | 80

Note. Protein‑per‑life values reproduced from Table 1 of the main text. Two significant figures throughout.

References

  1. Wallace, D.F. Consider the Lobster and Other Essays; Little, Brown: New York, NY, USA, 2004.
  2. van der Laan, S.; Breeman, G.; Scherer, L. Animal lives affected by meat consumption trends in the G20 countries. Animals 2024, 14, 1662. https://doi.org/10.3390/ani14111662
  3. Campbell, B.; Kreider, R.B.; Ziegenfuss, T.; La Bounty, P.; Roberts, M.; Burke, D.; Landis, J.; Lopez, H.; Antonio, J. International Society of Sports Nutrition position stand: Protein and exercise. J. Int. Soc. Sports Nutr. 2007, 4, 8. https://doi.org/10.1186/1550-2783-4-8
  4. Berrazaga, I.; Micard, V.; Gueugneau, M.; Walrand, S. The role of the anabolic properties of plant- versus animal-based protein sources in supporting muscle mass maintenance: A critical review. Nutrients 2019, 11, 1825. https://doi.org/10.3390/nu11081825
  5. von Keyserlingk, M.A.G.; Rushen, J.; de Passillé, A.M.; Weary, D.M. Invited review: The welfare of dairy cattle—Key concepts and the role of science. J. Dairy Sci. 2012, 95, 5099–5123.
  6. Frank, K.T.; Petrie, B.; Choi, J.S.; Leggett, W.C. Trophic cascades in a formerly cod-dominated ecosystem. Science 2005, 308, 1621–1623. https://doi.org/10.1126/science.1113075
  7. Marino, L. Thinking chickens: A review of cognition, emotion, and behavior in the domestic chicken. Anim. Cogn. 2017, 20, 127–147.
  8. Abeyesinghe, S.M.; Nicol, C.J.; Hartnell, S.J.; Wathes, C.M. Can domestic fowl, Gallus gallus domesticus, show self-control? Anim. Behav. 2005, 70, 1–11. https://doi.org/10.1016/j.anbehav.2004.10.011
  9. Marino, L.; Colvin, C.M. Thinking pigs: A comparative review of cognition, emotion, and personality in Sus domesticus. Int. J. Comp. Psychol. 2015, 28, 1–22.
  10. Mather, J.A. What is in an octopus’s mind? Anim. Sentience 2019, 26, 1–15.
  11. Brown, C. Fish intelligence, sentience and ethics. Anim. Cogn. 2015, 18, 1–17.
  12. Nussbaum, M.C. Frontiers of Justice: Disability, Nationality, Species Membership; Harvard University Press: Cambridge, MA, USA, 2007.
  13. Herzog, H. Some We Love, Some We Hate, Some We Eat: Why It’s So Hard to Think Straight About Animals; Harper: New York, NY, USA, 2010.
  14. Phillips, C.J.C. Cattle Behaviour and Welfare; Blackwell Science: Oxford, UK, 2002.
  15. McLennan, K.M.; Kruger, C. Facial expressions and emotion in sheep. Appl. Anim. Behav. Sci. 2019, 216, 8–17.
  16. Rugani, R.; Fontanari, L.; Simoni, E.; Regolin, L.; Vallortigara, G. Arithmetic in newborn chicks. Proc. R. Soc. B 2009, 276, 2451–2457.
  17. Nicol, C.J. Development of poor welfare in laying hens. Anim. Welf. 2004, 13, 225–230.
  18. Edgar, J.L.; Nicol, C.J.; Clark, C.C.A.; Paul, E.S. Avian maternal response to chick distress. Biol. Lett. 2011, 7, 532–535.
  19. Buchwalder, T.; Huber-Eicher, B. Effect of nest access on behaviour of laying hens. Appl. Anim. Behav. Sci. 2004, 87, 255–265.
  20. Schnell, A.K.; Clayton, N.S. Cephalopod cognition. Curr. Biol. 2019, 29, R726–R732.
  21. Elwood, R.W.; Adams, L. Electric shock causes physiological stress responses in shore crabs. Biol. Lett. 2015, 11, 20150800.
  22. Magee, B.; Elwood, R.W. Shock avoidance in shore crabs. Biol. Lett. 2013, 9, 20121194.
  23. Szuwalski, C.S.; Vert-Pre, K.A.; Punt, A.E.; Branch, T.A.; Hilborn, R. Ecosystem models and fisheries management. ICES J. Mar. Sci. 2017, 74, 464–474.
  24. Nijdam, D.; Rood, T.; Westhoek, H. Environmental impacts of meat production. Meat Sci. 2012, 92, 582–592.
  25. Pereira, P.M.C.C.; Vicente, A.F.R.B. Meat nutritional composition in the human diet. Meat Sci. 2013, 93, 586–592.
  26. Tallentire, C.W.; Leinonen, I.; Kyriazakis, I. Environmental impact of chicken production. Sustainability 2018, 10, 2234.
  27. Li, M.H.; Robinson, E.H.; Tucker, C.S.; Manning, B.B.; Khoo, L. Reproductive efficiency and offspring survival in aquaculture. Aquac. Rep. 2016, 4, 65–70.
  28. Boissy, A.; Terlouw, E.M.; Le Neindre, P. Presence of cues from stressed conspecifics induces fear in cattle. Appl. Anim. Behav. Sci. 2007, 102, 200–214.
  29. FAO. Dietary Protein Quality Evaluation in Human Nutrition: Report of an FAO Expert Consultation; Food and Agriculture Organization: Rome, Italy, 2013. Available online: https://www.fao.org/3/i3124e/i3124e00.htm (accessed on 15 January 2025).
  30. Poore, J.; Nemecek, T. Reducing food’s environmental impacts through producers and consumers. Science 2018, 360, 987–992. https://doi.org/10.1126/science.aaq0216
  31. Gephart, J.A.; Henriksson, P.J.G.; Parker, R.W.R.; Shepon, A.; Gorospe, K.D.; Bergman, K.; Eshel, G.; Golden, C.D.; Halpern, B.S.; Hornborg, S.; et al. Environmental performance of blue foods. Nature 2021, 597, 360–365. https://doi.org/10.1038/s41586-021-03917-1
  32. Cashion, T.; Tyedmers, P. Energy use and greenhouse-gas emissions of marine fisheries. Nat. Clim. Change 2017, 7, 701–705.
  33. Baumgard, L.H.; Collier, R.J.; Bauman, D.E. A 100-year review: Regulation of nutrient partitioning to support lactation. J. Dairy Sci. 2017, 100, 10353–10366.
  34. Mottet, A.; de Haan, C.; Falcucci, A.; Tempio, G.; Opio, C.; Gerber, P.J. Livestock and the feed/food debate. Glob. Food Secur. 2017, 14, 1–8.
  35. Tacon, A.G.J.; Metian, M. Global overview on the use of fishmeal and fish oil. Aquac. Econ. Manag. 2008, 12, 112–138.
  36. Ytrestøyl, T.; Aas, T.S.; Åsgård, T. Utilization of feed resources in Atlantic salmon production. Aquaculture 2015, 448, 365–374.
  37. Baum, J.K.; Worm, B. Cascading top-down effects of changing oceanic predator abundances. J. Anim. Ecol. 2009, 78, 699–714.
  38. Velarde, A.; Dalmau, A.; Fàbrega, E.; Manteca, X. Pig welfare in commercial housing. Animal 2015, 9, 634–646.
  39. Naylor, R.L.; Hardy, R.W.; Bureau, D.P.; Chiu, A.; Elliott, M.; Farrell, A.P.; Forster, I.; Gatlin, D.M.; Goldburg, R.J.; Hua, K.; et al. Feeding aquaculture in an era of finite resources. Proc. Natl. Acad. Sci. USA 2009, 106, 15103–15110.
  40. Alder, J.; Campbell, B.; Karpouzi, V.; Kaschner, K.; Pauly, D. Forage fish: From ecosystems to markets. Annu. Rev. Environ. Resour. 2008, 33, 153–166.
  41. Krautwald-Junghanns, M.E.; Cramer, K.; Fischer, B.; Förster, A.; Galli, R.; Kremer, F.; Mapesa, E.U.; Meissner, S.; Preisinger, R.; Preusse, G.; et al. Male chick culling in poultry production. Poult. Sci. 2018, 97, 902–912.
  42. Smith, M.D.; Roheim, C.A.; Crowder, L.B.; Halpern, B.S.; Turnipseed, M.; Anderson, J.L.; Asche, F.; Bourillón, L.; Guttormsen, A.G.; Khan, A.; et al. Sustainability and global seafood. Science 2011, 327, 784–786.
  43. Pauly, D.; Christensen, V. Primary production required to sustain global fisheries. Nature 1995, 374, 255–257.
  44. Gray, H.M.; Gray, K.; Wegner, D.M. Dimensions of mind perception. Science 2007, 315, 619.
  45. Browning, H.; Veit, W. Freedom and animal welfare. Animals 2021, 11, 1148. https://doi.org/10.3390/ani11041148
  46. Birch, J. Animal Sentience and the Precautionary Principle; Oxford University Press: Oxford, UK, 2022.
  47. Plous, S. Psychological mechanisms in human use of animals. J. Soc. Issues 1993, 49, 11–52.
  48. Adams, D. The Restaurant at the End of the Universe; Pan Books: London, UK, 1980.
  49. Fetherstonhaugh, D.; Slovic, P.; Johnson, S.; Friedrich, J. Insensitivity to the value of human life: A study of psychological numbing. J. Risk Uncertain. 1997, 14, 283–300.
  50. Meagher, M. Bison bison social organization. Wildl. Monogr. 1986, 84, 3–55.
  51. Fraser, D.; Weary, D.M.; Pajor, E.A.; Milligan, B.N. Assessing animal welfare: The interplay of science, values and judgment. Anim. Welf. 2013, 22, 157–167.
  52. Meagher, R.K.; Beaver, A.; Weary, D.M.; von Keyserlingk, M.A.G. A systematic review of the effects of prolonged cow–calf contact on behavior, welfare, and productivity. J. Dairy Sci. 2019, 102, 5765–5783. https://doi.org/10.3168/jds.2018-16021
  53. Thompson, P.B. The Ethics of Intensification: Agricultural Development and Cultural Change; Springer: Dordrecht, The Netherlands, 2008.
  54. Rollin, B.E. The Frankenstein Syndrome: Ethical and Social Issues in the Genetic Engineering of Animals; Cambridge University Press: Cambridge, UK, 1995.
  55. Garnett, T.; Röös, E.; Little, D.C. Lean, green, mean, obscene…? What is efficiency and who decides? Food Climate Research Network, University of Oxford: Oxford, UK, 2015.
  56. United Nations. Transforming Our World: The 2030 Agenda for Sustainable Development; United Nations: New York, NY, USA, 2015. Available online: https://sdgs.un.org/2030agenda (accessed on 15 January 2025).
  57. USDA. Milk production and livestock annual summary; U.S. Department of Agriculture, National Agricultural Statistics Service: Washington, DC, USA, 2021.
  58. De Vries, A. Economic trade-offs between genetic improvement and longevity in dairy cattle. J. Dairy Sci. 2020, 103, 382–397.
  59. Munksgaard, L.; Jensen, M.B.; Pedersen, L.J.; Hansen, S.W.; Matthews, L. Dairy cow behavior. J. Anim. Sci. 2005, 83, E1–E5.
  60. Berton, M.P.; Fonseca, L.F.S.; Gimenez, D.F.J.; Utembergada, B.L.; Aboujaoude, C.; Pereira, A.S.C.; Silva, D.B.S.; Magalhães, A.F.B.; Zanella, R.; Albuquerque, L.G.; et al. Carcass and meat quality traits in cull cows. Meat Sci. 2020, 161, 108013.

Introduction to Product Management

What is a Product?

I recently read the book “Why We Sleep” by Matthew Walker. It really scared me, and I decided that better sleep should be a priority in my life. Being an analytical guy, I first wondered how good my sleep is currently, and how I could monitor the quality and quantity of my sleep. After considering and trying several approaches, I eventually adopted the Oura ring and its associated app to address this challenge. In fact, I’m wearing it right now.

The Oura ring is a product. More generally, products are solutions for doing a job, delivered by a producer to multiple customers.

Probably some of you are involved with producing physical goods like the Oura ring. The term product is sometimes used narrowly to refer to physical artifacts, but I will use the term to refer not just to tangible goods, but also to software, and to services.

Here are some more examples that fall within my definition of product, all related to health and wellness, just to bring a bit of focus to the examples. Each of these products contains a solution for doing a different job. In fact, as is common in practice, I’ll even sometimes refer to products as “solutions.”

The Strava app supports fitness by measuring and analyzing running, cycling, and other activities.

SoulCycle Studios provide fun and engaging exercise while delivering a community experience.

The pharmaceutical Zocor lowers blood cholesterol levels, thus reducing the risk of heart disease.

The patient medical record system Epic captures health information for individuals in a way that is secure, durable, and accessible across providers.

The food product Flavanaturals provides a tasty chocolate beverage that delivers flavonoids shown to improve cognitive function.

The Hamilton Medical ventilator is used by hospitals to support breathing in patients suffering from acute respiratory illness.

Many if not most products combine some tangible goods with services or software. The Oura Ring is both an app and a physical device, and physical goods like exercise equipment and medical devices typically contain a huge amount of embedded software, and likely some ancillary services.

For completeness, let’s get some pesky technicalities out of the way. 

Frankly, I could use a burnt stick and a flat rock to record a time series of subjective judgments of my sleep quality. But, that would not be a product. Products are solutions created by producers and delivered to customers.

An artifact that will be created only once, say a war memorial, is probably not best considered a product in and of itself, but the service of designing and constructing monuments could be a product, because the supplier, say an architecture firm or a sculptor, will likely do it repeatedly.

In most settings, a producer delivers a solution to a consumer in a commercial transaction. Most of the time I’ll use the words customer or user to refer to the consumer, but sometimes there are multiple stakeholders and the definition of the consumer is a bit murky.

In the simple case, consumers are individuals who both purchase products and use those products. I decide what shampoo to buy and I use it. But in other cases one party makes the purchasing decision and someone else uses the product. A hotel chain may buy shampoo for its rooms, but the hotel guest uses it. And, in this case, the customer is a business not an individual, and the customer is not identical with the user.

I like the term “doing a job” to indicate what products do, but I’m going to use several words pretty much synonymously: Job to be done, problem, gap, pain point, and even the more clinical term demand, which you probably remember from an economics course. Demand is just jobs to be done that we as consumers can’t do, or don’t want to do for ourselves.

Dozens of other categorizations of products are possible — consumables, durables, consumer packaged goods, fast moving consumer goods, and an alphabet soup of associated acronyms – CPG, FMCG, B2B, B2C. All of these are just further specification of types of solutions used in different settings to do different jobs.

Finally, there’s a special kind of product, called a platform or a two-sided market, in which the job to be done is to bring together buyers and sellers. For example, the web-based product ZocDoc matches individuals with physicians for acute medical needs. In these settings, the platform provider has two very different types of customers, the two sides of the market, the buyers (in this case patients) and the sellers (in this case physicians). 

What is Product Management?

Here is the LinkedIn profile of a former student, Effie Wang. Effie served as the head of product for the dating app Bumble, and she’s been a product manager at Amazon, ZocDoc, and GrubHub. What exactly is product management, and what do Effie and those like her actually do?

Put very simply, companies deliver solutions to customers who have a job to do. Product managers stand at the interface between the customer and the resources that create and deliver products.

Product management in the broadest sense is the planning, creation, and improvement of products. These functions exist in all companies that deliver products to customers, so product management must also exist, whether or not the functions are assigned to someone with the job title Product Manager.

Some descriptions of product managers that I like include:

  • Creator or guardian of the product vision.
  • Interpreter and protector of the customer experience.
  • Guide for the technical resources to create or improve the product.
  • Prioritizer of the feature and improvement road map.

My favorite less formal description of product management, coined by my former student and co-founder of Gridium, Adam Stein, is “making sure that not even one hour of an engineer’s time is wasted.”

The role of product management varies quite a bit over the lifecycle of a product. Let me explain. I like to think of the product lifecycle as having four phases: Sense, Solve, Scale, and Sustain – The Four S’s.

Sensing is recognizing an opportunity for a new product, usually the result of some kind of disequilibrium in the market or in the technological landscape.

Solving is creating a product to respond to the opportunity, and typically launching the first version.

Scaling is the improvement of the initial product to deliver an excellent solution tailored to the bulk of the market.

Sustaining is the refinement of the product over its life, advancing both cost and product performance. While the first two phases, sense and solve, typically play out over months or a year or two, and the scaling phase another few years, the sustaining phase can last decades.

I find it useful to think of three types of product managers: innovators, builders, and tuners, which we can map onto the product life cycle. Innovators recognize and develop new opportunities. Builders start with a target and lead developers to create a great product. Tuners optimize the success of the product over its lifecycle. The scaling phase is a less clearly demarcated zone, and product managers in this phase can be thought of as builders or tuners, or a hybrid of the two.

Sensing new product opportunities, the role of innovator, may be performed by someone with the job title of product manager, or by a chief product officer, but that role could also be played by a founder of a start-up, by a business unit manager, or by an advanced development or strategic planning group.

During the solving and early scaling phases, a dedicated product manager almost always leads the development effort. This is sometimes called “zero to one” product management, creating a new product from a clean slate. In technology-intensive hardware companies, this role may not be called product manager, but rather “heavyweight project manager,” “development team leader,” or “program manager,” but the role is that of the builder product manager.

In the sustaining phase of the product life cycle, dedicated product managers are typically only found in highly dynamic product environments. Dynamic environments are those for which the product changes a lot, say at least quarterly.

For instance, the fitness app Strava has dedicated product managers, but the Irwin hammer does not.

My Strava app is now version 232.0.1 updated two days ago, and Strava releases a new version (230, 231, 232, etc.) every week. The Strava app is a highly dynamic product – it changes a lot. Why is that? There are two reasons. First, it’s a software product which exhibits a high degree of modularity, so features can be updated easily and even pushed to the user on a regular basis. Second, the app operates in a highly dynamic competitive environment, and in a domain in which the enabling technologies are changing rapidly. 

The Irwin hammer on the other hand has not been updated very recently at all. It’s pretty much the same as the Stanley hammer I worked on as a product designer in 1990 (Stanley and Irwin are brands owned by the same company), and really it’s not that different from this Craftsman hammer my father gave me when I was 15. It’s not that hammers never change. They do, and when they do, the function of product management must be performed. 

For example, if there’s an emerging trend for tools to be produced in bright fluorescent colors to make them easy to find, then a project will likely be kicked off to do a redesign of the hammer. But that decision, and the planning and coordination of the effort, will likely be the result of a cross-functional discussion among the business unit manager, the marketing manager, and the engineering manager. There is not a dedicated product manager for the hammer the way there is for the Strava app.

Some people define the job of product manager as the CEO of the product. Well, that’s not quite right. Rarely does the product manager or PM have responsibility for the profit and loss of the product — that falls to the business unit manager or CEO of the business. Furthermore, while the PM may be responsible for prioritizing features, he or she rarely has direct authority over technical resources; that’s usually the responsibility of an engineering manager.

In describing the role of the PM, it’s probably better to consider specific decisions. I’ll use the RACI (pronounced “racy”) framework to do so. Most of you have probably seen the RACI framework, but to remind you, each stakeholder in a key decision can be thought of as having one of four roles:

R is for RESPONSIBLE — The responsible person actually does the work supporting a decision and delivers the outcome. More than one person can be responsible.

A is for ACCOUNTABLE — Only one person can be accountable and that person owns the results. He or she approves decisions, signs off on actions, has veto power, and can make go/no-go decisions.

C is for CONSULTED — Some stakeholders are consulted. They provide information, perspectives, resources, and support as needed.

I is for INFORMED — Finally, some stakeholders are merely informed of decisions. They are kept up to date on progress and results, but not necessarily consulted prior to decisions being made.

Now let’s consider some key decisions and which roles key stakeholders assume. I’ll show typical roles for the product manager, product marketing manager, engineering manager, business manager, UI/UX designers, and sales manager in the context of a digital product. There are of course many other decisions and several other stakeholders, but these are the ones most commonly associated with product management in information-technology companies. The roles are not identical for every organization, which is one reason you may benefit from discussing these roles explicitly within your own organization, and gaining a shared understanding of who does what.

I won’t drag you through every cell of the table, but if we focus on the first column, the role of the Product Manager, we see that the decisions for which the PM is responsible and accountable are the product vision, product concept, and product roadmap, but that in this context the PM is consulted on branding, go-to-market strategy, pricing, growth, and partnerships.

Can Product or Product Management be a Source of Sustained Competitive Advantage?

First, I need to be clear that not all things that are important can be sources of sustained competitive advantage, resources I call alpha assets. For example, an excellent sales process is very important for enterprise software companies. That doesn’t imply that an enterprise software company can rely on its sales process as a significant source of sustained competitive advantage. It’s more that if you fail to do sales well, you are unlikely to be successful in enterprise software. We could say the same thing about operational competence for a restaurant, or accurate and timely finance and accounting processes in a bank. None of these things are likely to be sources of sustained advantage, yet they all need to be done competently to ensure success. In the same way, good products and effective product management are critically important for all companies, even if not alpha assets for all companies.

But, product can be an alpha asset in some settings. These two settings are (a) zero-to-one new products and (b) domains with very strong intellectual property barriers.

Let’s consider the zero-to-one setting. Peter Thiel famously wrote in his book Zero to One “as a good rule of thumb, proprietary technology must be at least 10 times better than its closest substitute in some important dimension to lead to a real monopolistic advantage.” I don’t fully agree with the statement, but I do agree that when there is some disequilibrium in technology or in the market, then an organization has an opportunity to move with speed and agility to take advantage of that disequilibrium and to create a product that is dramatically better than the pre-existing alternatives. At the dawn of the COVID-19 pandemic in 2020, the videoconferencing company Zoom was in the market with a product that just worked. It didn’t require registration. It didn’t require a download. It didn’t require any special gear. It just worked. Despite the fact that there were dozens of other solutions in the market at the time, including BlueJeans, Skype for Business, Google Hangouts, and WebEx, Zoom was able to seize the market and gain significant share. This was almost entirely because Zoom had a better product. Better product can be an alpha asset for a finite time period after some type of disequilibrium. This finite period of product superiority is a way of kick-starting the other flywheels in an organization. But, the organization must use this precious window wisely in order to oversee the acceleration of the other flywheels for sustained advantage. Indeed, Zoom took advantage of its initial product superiority and prospered. But, predictably, Microsoft was quick to follow with an enterprise product, Teams, that was at parity on many features and superior in others. Zoom remains a key player, but its product per se is no longer its primary alpha asset.

Now let’s consider intellectual property barriers. Some domains have very strong legal intellectual property barriers, which allow product itself to be an alpha asset. For example, during the same pandemic period, the companies BioNTech, Pfizer, and Moderna all created mRNA vaccines that enjoy almost impenetrable intellectual property protection. For these companies, the product itself is an alpha asset. It enhances performance and is almost impossible for a rival to acquire. 

Not all intellectual property needs to be protected by laws to be a barrier. For instance, the product of semiconductor company TSMC is a fabrication service it offers to designers of proprietary chips like NVIDIA. While TSMC has a lot of patents, its primary source of intellectual property barriers is the accumulated know-how and trade secrets embedded within its semiconductor fabrication process. Some people believe that what TSMC does is the hardest single task in the world. No one else comes close to being able to do it. In this case, the intellectual property associated with the product itself is an alpha asset.

In some settings, the product itself is only incidentally the alpha asset. In very dynamic markets – those for which some combination of enabling technologies, competitive actions, or customer behavior are changing very quickly – the organizational capability of product management can itself be an alpha asset. For example, consider the fitness app Strava. Strava does weekly product releases, which include incremental improvements and less frequently substantial product changes. Any particular version of the Strava app could likely be easily replicated by a team of developers and so the product per se is not much of an alpha asset. However, the system that Strava employs to engage its users, understand opportunities for improvement, and prioritize the changes in its product roadmap, benefits from data and experience with millions of users and a refined organizational process of product management. This organizational capability is an example of the fifth flywheel and a compelling alpha asset.

Notes

Ulrich, Eppinger, Wang. Product Design and Development. Chapter “Opportunity Identification.” 2020. McGraw-Hill.

Unit Economics and the Financial Model of the Business

Belle-V Kitchen is a consumer goods company I founded with several friends to bring to market high performing but beautiful kitchen tools. Although the products we make and sell are outstanding, at least in our opinions, the company has never been a wild commercial success. One of the problems with the business is that the unit economics and financial model are only marginally favorable. It’s sort of our own fault. From the outset, our analysis of the unit economics and financial model did in fact exhibit vulnerabilities, or at least reveal a pretty narrow path to success. This chapter may help you be disciplined enough to avoid a similar plight.

Every single business incurs some on-going costs associated with merely existing. For instance, virtually every business pays an annual fee to a government entity and pays something to maintain a postal address. Most businesses incur costs for insurance, telecommunications services, and accounting software. Many businesses rent facilities, pay utility bills, and hire administrative employees. All of these costs are called general and administrative costs or G&A under generally accepted accounting principles (GAAP). With a few tiny exceptions, all businesses also incur costs to generate demand for their solutions. I call these sales, marketing, and advertising costs or SMA. Technology-based businesses may also have significant research and development or R&D costs. Put together these costs are the on-going costs of operating the business, are incurred over time, and do not change immediately in direct proportion to the company’s revenue.

In order to achieve long-term financial sustainability, a company’s gross profit has to exceed the on-going costs of operating the business. Put simply, gross profit is the revenue customers pay the company minus the variable cost of delivering the solution. Here’s a simple example. If a bubble tea shop costs USD 120,000 per year to operate, then it must generate at least USD 120,000 in gross profit per year to remain in business indefinitely. If customers pay USD 6.00 for each serving of bubble tea and delivering each additional serving of bubble tea, including materials and labor, costs USD 2.00, then the shop must deliver 30,000 servings of bubble tea each year to sustain itself, or break even. That works out to an average of 577 servings every week.
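The break-even arithmetic above can be written as a minimal Python sketch. The figures are the ones from the bubble tea example; the function name is my own shorthand, not standard accounting terminology:

```python
# Break-even analysis for the bubble tea shop example.
# Operating cost: USD 120,000/year; price: USD 6.00/serving;
# variable cost: USD 2.00/serving (all figures from the text).

def break_even_units(operating_cost, price, variable_cost):
    """Units per period needed so gross profit covers operating costs."""
    gross_profit_per_unit = price - variable_cost
    return operating_cost / gross_profit_per_unit

annual_servings = break_even_units(120_000, 6.00, 2.00)
print(annual_servings)              # 30000.0 servings per year
print(round(annual_servings / 52))  # 577 servings per week, on average
```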

Unit Economics

For a bubble tea shop, the selling price of USD 6.00 and the cost of delivering a serving of bubble tea of USD 2.00 are called the unit economics. Unit economics are the revenues and costs of a business measured on a per-unit basis, where a unit can be any quantifiable element that brings value to the business, such as a single quantity of a physical good sold, a single consulting engagement, or a single customer relationship. Analyzing the economics of a business at the level of a single unit informs managerial decisions about pricing and about the inputs to the solution and their costs. The unit economics also dictate the minimum number of units that the company must serve or deliver in order to break even. If the unit economics are not favorable, the overall economics of the business, which include its operating costs, cannot be favorable.

Analyzing unit economics first requires selecting the unit of analysis. This selection depends on the characteristics of the business. Businesses vary on innumerable dimensions, including cost structure, distribution channels, frequency of transactions with customers, and the business’ role in a market ecosystem. These differences are reflected in the financial models of the businesses. Some require huge investments in research and development, but then enjoy high gross margins once the product is launched. Others operate on slim margins, but don’t require much selling expense once a customer is acquired. Still others offer an all-you-can-eat solution for a subscription fee. While no two businesses are identical, four different types of businesses emerge frequently enough and have distinct enough financial models that they warrant separate treatment:

  1. Classic make-and-sell businesses (e.g., Belle-V Kitchen)
  2. Low Marginal Cost Services (e.g., QuickBooks)
  3. Social Networks (e.g., LinkedIn)
  4. Marketplaces (e.g., Airbnb)

In this chapter, I describe these four types of businesses, the focal unit most appropriate for that type of business, and a common financial model associated with that type. Your business may fall between these categories, but almost certainly one of them will be pretty close, and will give you a template to start from.

Along the way I’ll introduce some more terms and concepts that are more generally useful, including these:

  • Gross margin
  • Marginal cost
  • Cost of goods
  • Target costing
  • Minimum viable scale
  • Customer lifetime value
  • Customer acquisition cost
  • Recurring revenue
  • Gross merchandise value
  • Take rate

Because I treat four distinct categories of businesses, this chapter is long. Life is short, so you may benefit from identifying which category is most relevant to your venture, and then focusing on that section below. However, I do introduce key terms and definitions as I go and don’t repeat them for each example, so if you are new to managerial accounting, you may benefit from reading the whole chapter.

1. Classic Make-and-Sell Businesses (e.g., Belle-V Kitchen)

Belle-V Kitchen is an example of a classic make-and-sell business. Such businesses simply produce a good and sell it to customers in a single transaction. The business may deliver a physical product like a bottle opener or a service like a restaurant meal. I write make-and-sell business, but some businesses are actually sell-and-make, in which the good is produced according to the specifics of a customer order, as with say Abodu Homes, a company providing prefabricated structures for use as accessory dwelling units in the United States. Regardless of sequence, these classic businesses follow a similar template.

The focal unit for a classic make-and-sell business is each instance of the product itself — the kitchen implement, the housing structure, a cup of bubble tea, or an excursion with a tour guide. In all cases, a classic make-and-sell business incurs significant marginal costs to deliver a unit of its solution to its customers. Let’s now drill down on the unit economics of classic make-and-sell businesses using the example of Belle-V Kitchen and starting with an explanation of the concept of marginal costs.

Cost, Marginal Cost, and Cost of Goods

The word cost is often shorthand for the accounting term cost of goods sold or COGS. These are the costs directly attributable to delivering the solution to the customer. In the case of Belle-V Kitchen, these are the costs associated with the manufacturing of the opener itself, as well as the costs of getting it to the point of distribution, fulfilling the customer’s order, and shipping the opener to the customer.

Belle-V Bottle Opener (Source: Belle-V Kitchen)

For classic make-and-sell businesses, costs are expressed on a per unit basis, but that unit cost typically assumes some batch quantity in manufacturing. For instance, Belle-V obtains openers from a factory in China that makes high-quality stainless steel kitchen implements for many premium brands globally. For orders of 10,000 pieces, Belle-V can buy these openers from the factory for about USD 7.00 per unit.

Note that this factory price would be higher for an order of 3,000 pieces and lower for an order of 100,000 pieces. In analyzing unit costs, the entrepreneur makes an assumption about the approximate quantity that will be produced, often quantities that can reasonably be achieved in the medium time horizon, say after a year or so of operation.

That USD 7.00 factory price is just a portion of the COGS however. Here is the full list of elements that make up COGS, all expressed as USD per unit, assuming an order quantity of 10,000 pieces.

  • manufacturing cost 7.00
  • freight from factory to US warehouse 0.30
  • duties paid to the US government 0.28
  • labor to unload and store the inbound freight 0.02
  • materials and labor to process, pack, and ship openers to a retailer
    (assuming a bulk carton of 36 pieces) 0.14
  • scrap and warranty replacements (averaged over time) 0.10

Total Cost 7.84 (USD/unit)

Each additional unit sold incurs an average cost of about USD 7.84. This is called the marginal cost of delivering the solution, because it is the additional cost “at the margin” of delivering one more unit.
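As a quick check, the per-unit COGS components listed above can be summed in a few lines of Python. The dollar figures come from the list; the dictionary keys are my own shorthand labels:

```python
# Per-unit cost of goods for the Belle-V opener (USD per unit,
# assuming an order quantity of 10,000 pieces, as in the text).
cogs_components = {
    "manufacturing": 7.00,
    "inbound freight": 0.30,
    "duties": 0.28,
    "receiving labor": 0.02,
    "pack and ship": 0.14,
    "scrap and warranty": 0.10,
}

marginal_cost = sum(cogs_components.values())
print(f"{marginal_cost:.2f}")  # 7.84 USD/unit
```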

Again note that marginal cost analysis implicitly requires an assumption about the quantities that will be ordered or produced. The business’ marginal cost might be a bit lower if it could order in much higher volumes in the longer term, and would be higher if it had to make openers just a few at a time, say while testing the market. For planning purposes, a financial model should explicitly state the embodied assumptions about order quantities, and analyze several scenarios, say for the modest expected quantities in year 2 and for larger expected quantities in the longer term, say in year 5.

Target Costing

For classic make-and-sell businesses, an important analysis is called target costing, which forces the manager or entrepreneur to bring anticipated selling price and estimated unit cost into coherence. This is the part we didn’t do so well with Belle-V Kitchen.

We planned for the Belle-V opener to be sold as a luxury gift and we originally expected it to be priced at USD 50 per unit in the store. We set this price by looking at other items in the store intended as nice gifts and by thinking about what the buyer’s alternatives for other nice gift items might be. We considered price points from USD 19 to USD 59. Setting your prices should be a deliberate exercise, and in established companies is usually coordinated among the business manager, marketing and sales managers, and the product manager. I’m not sure we were right in setting our price at USD 50. That’s a lot for a bottle opener, even a really nice one. In later years, we adopted a direct-to-consumer model at a lower price point. More on that below.

A target costing analysis works back from the price the product would sell for in the store to what the cost must be at the factory in order for the economic system to work for everyone.

Let’s first consider a typical retailing model, in which Belle-V sells on a wholesale basis to a store, that in turn sells to an individual consumer.

We anticipated a retail price of USD 50. However, the consumer isn’t giving us, the brand owner, 50 dollars. Instead, they’re paying the retail store 50 dollars. In order for the retail store to be in business, the retailer has to buy the product from us for quite a bit less than 50 dollars. In the Belle-V case, in which retailers are mostly specialty gift boutiques like museum stores, the retailer’s gross margin is typically 50 percent. That is, the retailer sells the opener to the consumer for USD 50, but pays us just USD 25. So, the retailer makes USD 25, or 50 percent of the selling price. 

Now we have to work back from the price we get from the retailer to what our target cost would be. To do that we need to think about what we as the brand owner need for gross margin. Let’s assume for now that our target gross margin is 40 percent. That is, on average we want the gross profit on each unit to be about 40 percent of the revenue we get from a sale of that unit.

Since we get USD 25 in revenue for each unit, the price the retailer pays us, we need to pay no more than 60 percent of that figure for the goods, or USD 15, in order to leave 40 percent gross margin, or USD 10.

Fifteen dollars is the maximum cost of goods we can pay in order for us and our retailer to earn reasonable margins and to sell the product to the consumer for USD 50. This arithmetic simply works backwards from the selling price to the consumer, through the distribution channel, accounting for the required margins at each step, to arrive at the maximum cost we could pay for each unit and still remain in business.

Incidentally, for consumer goods sold through specialty retailers there is a shorthand “rule of 4”: the end consumer price is four times the cost to get the product into the original brand owner’s possession. It’s a pretty good quick way to check the feasibility of making and selling a consumer good. A rule of 4 corresponds to two parties in a supply chain each earning 50 percent gross margin.
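The target costing arithmetic can be sketched in Python. The margins and the USD 50 retail price are from the Belle-V example above; the function name is my own:

```python
# Target costing: work back from the retail price through the channel,
# deducting each party's required gross margin at every step.

def target_cost(retail_price, retailer_margin, brand_margin):
    """Maximum cost of goods that still leaves everyone their margin."""
    wholesale_price = retail_price * (1 - retailer_margin)  # what the retailer pays the brand
    return wholesale_price * (1 - brand_margin)             # max COGS for the brand owner

# Belle-V case: USD 50 retail, 50% retailer margin, 40% brand margin.
print(round(target_cost(50, 0.50, 0.40), 2))  # 15.0

# "Rule of 4": two parties each at 50% margin -> retail is 4x the cost.
print(round(target_cost(50, 0.50, 0.50), 2))  # 12.5, and 50 / 12.5 = 4
```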

I want to now drill down on two additional points. One is the arithmetic to calculate gross margin. The second is where those gross margins come from — and what values are reasonable in practice.

Gross Margin

Gross margin is defined as the price minus the cost, divided by the price. This is always taken from the perspective of the entity that’s selling the product. So for instance if USD 50 is the price the museum store offers to the customer, and they pay us USD 25, then their gross margin would be 50 percent. That is, 50 – 25 (which is 25) divided by 50. And if instead they paid us USD 28, then their gross margin would be 44 percent – that is, 50 – 28 (which is 22) divided by 50. 

Note that gross margin is not the same as markup. Markup is defined as the price minus the cost, divided by the cost. So, if the retailer buys the opener from us for USD 25 and sells it for USD 50, then the markup is 50 - 25 (which is 25) divided by the cost (which is 25), or 100 percent. Some industries use markup and some use gross margin. Of course these two metrics are related arithmetically as follows: gross margin = markup / (1 + markup), so you can convert from one to the other. I find it easier to always start with price and cost and then calculate gross margin or markup from those two values as needed. From here on, we’ll use gross margin, as it is the more common term and because it directly drives an important metric in a company’s financial statements, gross profit.
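The definitions and the conversion between the two metrics can be expressed directly. These function names are mine, chosen for clarity; the price and cost figures are the bottle-opener values from the text.

```python
def gross_margin(price, cost):
    """Seller's perspective: (price - cost) / price."""
    return (price - cost) / price

def markup(price, cost):
    """(price - cost) / cost."""
    return (price - cost) / cost

def margin_from_markup(mu):
    """Convert markup to gross margin: margin = markup / (1 + markup)."""
    return mu / (1 + mu)

price, cost = 50.0, 25.0
print(gross_margin(price, cost))   # -> 0.5  (50 percent)
print(markup(price, cost))         # -> 1.0  (100 percent)
print(margin_from_markup(1.0))     # -> 0.5  (the conversion agrees)
```

A 100 percent markup and a 50 percent gross margin describe exactly the same transaction, which is why it pays to start from price and cost rather than juggling the two ratios.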

What determines a reasonable gross margin for a retailer? First of all, volume. All else equal, the lower the volume of the retailer, the higher the margin the retailer requires in order to be able to stay in business. Just consider the difference between a grocery store that moves hundreds of thousands of dollars in volume every week compared to a jewelry store that might only sell one or two items a day. The jewelry store will require higher gross margin.

Second, the higher the price, the lower the gross margin, all else equal. Consider the difference between selling a new automobile and selling a hammer. The automobile will have a lower gross margin percentage.

The third factor is differentiation. All else equal, if your product is so special that you’re the only source of supply, then your gross margin will be higher than for a product with a lot of competitive alternatives.

The final factor is the costs that are required for the retailer to sell your product. Characteristics that increase the costs for a retailer selling your product are seasonality, service requirements, and the intensity of the sales process. The higher these costs, the higher the required gross margin for the retailer.

Now, to put this all together, consider the difference between construction materials and luxury cosmetics.

Construction materials exhibit relatively low differentiation with relatively stable demand at relatively high price points in high volumes. Thus, they are going to be sold at quite low margins, maybe only 10 or 15 percent. Luxury cosmetics are the opposite on all of those dimensions, and thus retailers of those goods will likely expect margins of greater than 60 percent.

To give you a sense of a typical range, retailers of most consumer goods require margins of between 35 and 55 percent, but extreme examples, say building materials and luxury cosmetics may be outside of this range.

Now let’s turn to the question of what your target gross margin should be as the manufacturer or the brand owner.

A very similar set of factors drives the typical gross margin requirements for manufacturers. Higher gross margins correspond to some combination of high R&D costs, high selling expenses, high levels of differentiation, lower volumes, and high seasonality.

Manufacturers of consumer goods would expect to operate with gross margins between about 30 and 50 percent, but at the high end a brand owner for fashion apparel might have a gross margin of 75 percent or higher. And at the other extreme, auto makers might operate with gross margins of under 20 percent.

One good technique for estimating margin requirements is to study the income statements of public companies selling products similar to yours. These financial statements will give you their average gross margin, a useful benchmark.

Selling Direct

Often when the target cost analysis reveals margins that are too tight, prices that are too high, or required manufacturing costs that are too low, the entrepreneur thinks, “No problem. We’ll cut out the intermediary and sell directly to consumers!” This is rarely a solution to lousy unit economics, because in direct-to-consumer models you typically can’t assume the other parameters of the business remain the same. There are three reasons selling direct is almost never a cure for marginal costs of production that are too high.

First, your unit cost will increase. For Belle-V Kitchen, here is how the unit cost changes assuming we sell direct to the consumer in individual quantities, and that we pay for the outbound freight associated with “free shipping.” Recall that our unit cost when selling to retailers is USD 7.84.

  • manufacturing cost 7.00
  • freight from factory to US warehouse 0.30
  • duties paid to the US government 0.28
  • labor to unload and store the inbound freight 0.02
  • labor to process, pack, and ship a customer order (usually just one opener) 3.00
  • carton, packing material, tape, and label (single unit) 0.40
  • outbound freight paid to ship the opener to the end customer (single unit) 6.50
  • scrap and warranty replacements (averaged over time) 0.10

Total Cost 17.60 (USD/unit)
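The line items above can be tallied in a short script to confirm the total. The dictionary keys are abbreviated labels of my own; the dollar values are the Belle-V figures from the text.

```python
# Direct-to-consumer unit cost build-up for the Belle-V opener (USD)
direct_costs = {
    "manufacturing": 7.00,
    "inbound freight to US warehouse": 0.30,
    "US duties": 0.28,
    "receiving labor": 0.02,
    "pick/pack/ship labor (single unit)": 3.00,
    "carton and packing material": 0.40,
    "outbound freight (single unit)": 6.50,
    "scrap and warranty (averaged)": 0.10,
}

unit_cost = sum(direct_costs.values())
print(round(unit_cost, 2))  # -> 17.6 USD per unit
```

Compare this with the USD 7.84 unit cost when selling through retailers: the handling and outbound shipping of single units more than doubles the cost of delivering each opener.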

Second, your volumes will likely decrease. The whole point of using a retailer for distribution is to make your product easily available where your customers expect to find it, and where they can see and touch the product. A bottle opener sold in the “big box store” Target will, all else equal, sell vastly more units than one sold only on a website of a start-up company. With lower volumes, your marginal cost of production may increase, which must be reflected in your analysis of the unit economics.

Third, you may incur higher selling expenses. You have to find and acquire the customer when you sell directly to consumers. For e-commerce retailers this often means paid advertising, which can be very expensive. Acquiring customers for consumer goods can cost USD 50 per customer or more depending on the level of competition for keywords used in advertising.

If we wanted to sell the opener directly to consumers for USD 29 and maintain a gross margin of say 60 percent to support our higher selling expenses, then our unit cost must be less than (1-0.60) x 29.00 = USD 11.60. That’s clearly not going to work, as our marginal cost of delivering an opener is USD 17.60. At that unit cost and a selling price of USD 29.00, our gross margin would be only 39 percent. That’s not enough for a direct-to-consumer housewares business. A price of USD 39.00 is closer to feasible, leaving a gross margin of 55 percent, still not wonderful. The reality is that Belle-V Kitchen couldn’t quite get the unit economics to work for this opener when selling direct to consumers.
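The feasibility check in that paragraph is simple enough to verify in code. The helper names are mine; the prices, the 60 percent target margin, and the USD 17.60 unit cost are the values from the text.

```python
def max_unit_cost(price, target_margin):
    """Maximum cost of goods that still leaves the target gross margin."""
    return (1 - target_margin) * price

def achieved_margin(price, unit_cost):
    """Gross margin actually earned at a given price and unit cost."""
    return (price - unit_cost) / price

# At USD 29 and a 60% target margin, cost must be under:
print(round(max_unit_cost(29.00, 0.60), 2))     # -> 11.6

# But the real direct-to-consumer unit cost is USD 17.60:
print(round(achieved_margin(29.00, 17.60), 2))  # -> 0.39 (39 percent)
print(round(achieved_margin(39.00, 17.60), 2))  # -> 0.55 (55 percent)
```

The calculation makes the verdict concrete: at USD 29 the margin falls well short of the 60 percent target, and even USD 39 leaves the economics merely borderline.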

Minimum Viable Scale

Most pro forma financial analyses for start-up businesses are fantastic fables, representing a big success scenario. It’s good to have hopes and dreams and to envision paradise. However, I believe you also benefit substantially from knowing what the business looks like at its minimum viable scale. By minimum viable scale, I mean the number of units sold or customers served per time period such that you can achieve positive cash flow. At least two factors tend to dictate this scale.

First, what is the minimum required level of staffing and business services that you need to operate? For instance, you might need a minimum of a general manager (perhaps you), a customer service representative, a sales and marketing person, and a production or fulfillment staffer or two. You might need a physical location and to pay some rent. You will likely need an internet connection, some insurance, utilities, and bookkeeping services. Add all that up and that’s the minimum on-going operating cost of your business. Now, what is the number of units sold or customers served per time period (e.g., month or year) to break even relative to those operating costs? That is one indicator of the minimum viable scale.

Second, are there natural minimum batch sizes and order frequencies required to sustain the business? For instance, for Belle-V Kitchen, we need to buy a minimum of 3000 pieces per order from the factory and we need to place an order at least annually to sustain that factory relationship. Therefore, it’s not really possible to imagine the business surviving if it can’t sell at least 3000 units per year.

Put those two factors together to estimate what the business would need to look like to remain in operation and to sustain positive cash flow. Every business is unique and every entrepreneur has a different threshold for what is truly minimally viable. Still, by considering these two factors you can estimate a minimum viable scale for your situation. The resulting scale is of course not your goal. It’s instead a realistic assessment of what level of success you must achieve in order to live to fight another day. I like to think about the entrepreneurial journey as a long ocean swim. You’re setting out from the beach on a sunny day. You can see what you believe to be a beautiful tropical island off in the distance. That’s your goal. But, what happens if the wind picks up, or the water gets choppy, or your leg cramps? Is there a smaller island where you can rest and recover? How far out is it? That’s your minimum viable scale.

Putting Unit Economics Together in a Financial Model for Classic Make-and-Sell Businesses

Here’s a process for understanding your unit economics and creating a financial model for a classic make-and-sell business.

  • Decide on your distribution and channel configuration (e.g., direct to consumer vs. selling through retailers).
  • Set a target price to the end customer based on the competitive situation and the value of your solution to the customer.
  • Estimate your target cost by assuming gross margins for you and for your distribution channel. For a good first estimate, base these gross margins on typical margins for similar companies in your industry.
  • Check your costs to be sure your marginal cost of production is well below your target cost estimate.
  • Estimate the on-going costs of operating your business, and then use your gross-margin estimate to do a break-even calculation for the number of units you need to sell per time period to sustain your business.
  • Prepare a pro-forma income statement for what the business can look like if you are successful. Also create a second pro-forma income statement for the minimum viable scale. These two financial models represent your goal as well as your fallback position should things go much worse than planned.

2. Low-Marginal-Cost Services (e.g., Quickbooks)

Many important businesses deliver services with very low marginal cost, sometimes close to zero. For example, a business that sells templates for legal documents may deliver its solution as a digital download. The marginal cost of delivering ten documents per day or ten thousand documents per day is essentially the same, and essentially zero.

One warning. Be careful about assuming your cost of goods is zero for all digital goods. For instance, content businesses like Netflix deliver a digital good, but they pay the original content creator for that content, often in proportion to the number of times it is delivered. In such cases, the marginal cost of delivering a solution is significant. In a second example, many AI-based businesses require expensive cloud computing resources each time a customer makes a query. These are real marginal costs of production.

Two approaches to analyzing unit economics for low-marginal-cost businesses are common. First, if the solution will be used infrequently by the target customer, then the unit of analysis may be the product itself, say the delivery of a single document template. In that case, the unit of analysis is a single transaction and the gross margin is nearly 100 percent. Breakeven calculations can be done exactly as with make-and-sell products, considering how many transactions must be completed to generate enough gross profit to cover the on-going operating costs of the business. The only difference from make-and-sell businesses is that the gross margin per unit is essentially the sales price per unit, as there are no significant marginal costs of production.

Second, and perhaps more typically, products with low marginal costs are priced on a subscription basis per customer, or possibly per user when there are multiple users per customer. This is especially true for products that are consumed intermittently but repeatedly over time. Examples of such products include Zoom, Adobe Acrobat, and Quickbooks. They are all priced on an “all you can eat” subscription basis. Most of these products are delivered over the internet as software as a service, or SaaS. For simplicity, I’m going to focus on SaaS products in what follows. For SaaS products, the unit of analysis is most commonly the customer, not each use of the product.

Customer Lifetime Value (LTV or CLV)

The most important metric in SaaS unit economics is customer lifetime value (LTV, or sometimes CLV or even CLTV). LTV in turn is driven by just two factors: churn and revenue per unit time.

Even a very sticky product like Quickbooks experiences some loss of customers over time. This loss is called churn, and is expressed as a percentage of the customer base that is lost in a given time period. Put another way, over a given period of time — say one year — what is the probability that a customer will cancel their subscription for use of the product? Some products, like Quickbooks, have very low churn, say less than 10 percent annually. Others, like the video streaming service Disney+, have very high churn, perhaps 10 percent each month. (Admit it, you’ve subscribed just to watch a new series, only to quickly cancel when done.)

Churn can be thought of empirically and retrospectively — what fraction of our customers cancelled subscriptions last month, or it can be thought of as a probability about the future — what is the chance that a customer cancels in the coming month. Either way, churn is expressed as a percentage per unit time, say 15 percent per year, or 2 percent per month, using whichever time period best matches the pace of the business. Enterprise SaaS companies tend to use years and consumer SaaS companies tend to use months.

Note that the average duration of a customer relationship is simply 1/churn. So, for instance if your churn is 10 percent per month, then the average customer is with you for 1/0.10 = 10 months.

Revenue per unit time is just the average subscription fee your customer pays, say USD 15 per month. Armed with churn and average revenue per unit time, we can calculate LTV. In fast-paced environments like SaaS, the LTV calculation is usually kept pretty simple. Just multiply average duration times average revenue per period. If the average customer is with you 10 months and if subscription fees are USD 15 per month, then LTV is 10 x 15 = USD 150.
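The simple LTV calculation just described can be written out directly. The function name is my own shorthand; the 10 percent monthly churn and USD 15 monthly fee are the example values from the text.

```python
def ltv(churn_per_period, revenue_per_period):
    """Simple lifetime value: average duration (1 / churn)
    times average revenue per period."""
    return revenue_per_period / churn_per_period

# 10%/month churn -> 10-month average duration; USD 15/month fee
print(round(ltv(0.10, 15.00), 2))  # -> 150.0 USD per customer
```

Because duration is 1/churn, halving churn doubles LTV, which is why retention improvements are usually the highest-leverage work in a subscription business.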

Of course, if the customer relationship lasts a really long time, and if pricing is expected to change over time, then LTV can be calculated as a net present value. A simple way to do that is with a spreadsheet in which the columns represent time periods out into the future. For each time period, consider the expected fraction of the customer that will still be with you, which is (1 - churn) times the expected fraction in the previous period, and the expected subscription revenue. Then, discount the expected cash flow to the present using a discount rate (usually your opportunity cost of capital). More complex models of LTV can include factors such as additional products or services that a customer will buy on average in the future.

We’ve considered LTV in the context of low-marginal-cost goods offered on a subscription basis, but LTV can also be used for other settings in which a customer makes repeated purchases over time, say for a neighborhood coffee shop, where the average customer may make one purchase per week and where the churn is 2 percent per week. In such cases, the revenue per time period is not a subscription fee. Rather it is the average gross margin per transaction times the average number of transactions per time period. For example, if the coffee shop earns USD 3 in gross margin per transaction, thus generating USD 3 per week in gross margin, and keeps customers for an average of 1/0.02=50 weeks, then the LTV is 3 x 50 = USD 150.

Customer Acquisition Cost (CAC)

In a world of perfect information you could line up all your customers and know for each customer how they learned about you and what factors caused them to give your product or service a try. Then, you could estimate what you spent to create each of those factors, thus estimating for each customer their customer acquisition cost (CAC, pronounced “cack”). 

In reality you rarely know this information with much precision. Sometimes all you know is what you spent on sales and marketing for some time period, say a month or quarter, and how many new customers you acquired. You can then do a simple quotient to calculate your CAC. Say you spent USD 10,000 for the month and acquired 200 new customers. In that case, your average CAC is 10,000 divided by 200 or USD 50 per customer. 

Often you can get a more useful estimate by identifying how many customers were acquired via a particular mechanism and what that mechanism cost. For example, if you can discern via the analytics associated with an e-commerce site how many customers were acquired by pay-per-click advertising and you know what you spent on such advertising, you can estimate average CAC for the pay-per-click channel. This more refined estimate by acquisition channel allows you to take managerial actions to increase spending on more efficient channels and reduce spending for those that are less efficient.
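Both the blended and the per-channel calculations are simple quotients. The USD 10,000 spend and 200 customers are the example from the text; the per-channel breakdown below is a hypothetical illustration of my own.

```python
def cac(spend, new_customers):
    """Average customer acquisition cost for a period or channel."""
    return spend / new_customers

# Blended CAC: USD 10,000 spent in a month, 200 new customers
print(cac(10_000, 200))  # -> 50.0 USD per customer

# Hypothetical per-channel breakdown (spend, customers acquired)
channels = {"paid search": (6_000, 80), "email": (1_000, 50)}
for name, (spend, customers) in channels.items():
    print(name, cac(spend, customers))
```

In this hypothetical breakdown paid search acquires customers at USD 75 each while email acquires them at USD 20 each, exactly the kind of gap that justifies reallocating spend toward the more efficient channel.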

Ratio LTV/CAC

In theory you could stay in business with LTV just barely exceeding CAC. But, most businesses aim to acquire customers at a cost of less than a third of LTV, and preferably much less. For example, for MakerStock, a business I co-founded that provides materials and services to designers, fabricators, and creators more generally, the customer lifetime value is about USD 300 per customer and we aim to acquire customers for an average CAC of less than USD 50, giving us a ratio of LTV to CAC of 6.

I believe that there are at least two reasons that in practice the target ratio of LTV to CAC is set to be at least three, and preferably much higher.

First, managers and especially entrepreneurs tend to be optimists, and reality rarely proves as rosy as their forecasts. Perhaps by setting a high bar we are more likely to achieve sustainability. If the target ratio of LTV to CAC is 6, then maybe we’ll hit 4 in reality, which would still work out.

Second, most measures of CAC are averages across a large number of customers. Averages hide the fact that some customers are much more expensive to acquire than others. By using a low average value for CAC as a target, we can be more confident that the most expensive customers, our marginal customers, are still being acquired for less than what they are worth to us.

Some other heuristics are useful in managing unit economics for businesses with repeat customers. Randy Goldberg, co-founder of Bombas, a direct-to-consumer apparel company known especially for great socks, told me that he aims to break even on a customer’s first order, so that CAC is less than or equal to the gross margin on that order. (A link to that interview is in the notes at the end of the chapter.) Then, all repeat business contributes gross margin above and beyond the acquisition cost. Of course we can conjure up examples in which that heuristic is not great (e.g., it won’t work if there is little repeat purchase), but it’s quite useful in its concreteness, simplicity, and ease of measurement. What did we spend to acquire customers? How many new customers tried our solution? What was the gross margin contribution of those new customers? If the gross margin from new customers is greater than what we spent to acquire customers, then we are probably not spending too much on sales, marketing, and advertising.

Recurring Revenue

Investors love SaaS businesses because of their recurring revenue. The product is usually delivered as an “all you can eat” solution with a per-period subscription fee. For instance, at this writing, the small-business accounting solution, Quickbooks, is priced at USD 30 per month for the basic, single-user plan. The beautiful thing about this business is that once customers have been acquired and are using Quickbooks, they are unlikely to stop subscribing until the business changes substantially because of winding down, acquisition, or enormous growth and the adoption of a more comprehensive solution. Because delivering the solution requires almost zero marginal cost, the leaders of Quickbooks can just think about subscription revenue as gross profit. (There are some marginal costs of the solution, such as operating the customer service function and some data hosting and computing requirements, but gross margins for such businesses are so high, often 90 percent or greater, that revenue is a reasonable proxy for gross profit.)

Recurring revenue is usually expressed as annual recurring revenue (ARR) or monthly recurring revenue (MRR). Recurring revenue, particularly ARR, is commonly used as a basis for valuing SaaS businesses in mergers and acquisitions or initial public offerings. ARR and MRR are usually calculated simply as the revenue per period from customers that are enrolled in subscription-based services and thus pay recurring fees. In businesses with extremely high churn, this revenue will of course not recur if nothing is done to replace those customers that churn in each period.

Putting Unit Economics Together in a Financial Model for Low-Marginal-Cost Businesses

Here’s a process for understanding your unit economics and creating a financial model for a low-marginal-cost business, particularly a SaaS company.

  • Set a pricing guide, probably including different categories of customers. Many SaaS businesses offer free options and then set pricing tiers based on features and service levels. These are called freemium models.
  • Estimate the average revenue per period per customer. This may require estimating the fraction of customers that will fall into each pricing tier.
  • Estimate churn, either based on existing customer behavior, or based on benchmarks for similar businesses. Using this value for churn, calculate the average duration of customer engagement as 1/churn.
  • Calculate LTV as the duration of customer engagement times the average revenue per customer per period. (If your business does have significant marginal costs of production, then use average gross margin per customer per period instead of revenue.)
  • Estimate your CAC, either based on experiments you have done or on benchmarks from similar businesses.
  • Check that LTV/CAC is greater than three, and preferably much greater. If LTV/CAC is not much greater than three, then you likely don’t yet have a feasible financial model.
  • Estimate the on-going costs of operating your business. For SaaS businesses, software development costs and sales and marketing expenses are likely the largest elements of on-going cost. Use your estimate of revenue per customer per period to do a break even calculation for the number of customers you need to serve to sustain your business.
  • Prepare a pro-forma income statement for what the business can look like if you are successful. Also create a second pro-forma income statement for the minimum viable scale. These two financial models represent your goal and your fallback position should things go much worse than planned. For SaaS businesses, you can also estimate ARR for this scenario, which will likely be an important measure of the value of your business.

3. Social Networks (e.g., Instagram)

For social networks, the economic model is rarely as simple as for a classic make-and-sell business in which a discrete unit of product or service is provided in exchange for cash. Instead, two broad methods of monetization of the network are typically adopted.

First, the operator of a social network charges members a time-based subscription fee for use of the network. Because social networks increase in value with the size of the network, the initial fee to join the network is usually low or possibly zero. Then the provider charges a fee for continued use beyond a trial period or for additional features. Such monetization methods are called freemium models, because a free option induces joining and initial use, and then premium options are available as an upgrade for which a user pays a subscription fee. For instance, joining LinkedIn is free. To enjoy the ability to send messages to those outside of the members’ immediate connections requires a paid subscription.

A second monetization model is the sale of access to the social network to other businesses for complementary purposes. The most common complementary purpose is advertising, for which a second category of customer, usually businesses, may pay to reach members of the network with advertising. This is the primary monetization model for Facebook and Instagram. Complementary purposes other than advertising are also possible. For instance, data generated from the network may be valuable to third parties who will pay for access to it. Businesses may pay for direct access to members of the network, as when recruiters use LinkedIn to identify job candidates. As the saying goes, if you as a user are not paying to use a solution, you are not the customer — you are the product.

The unit of analysis for a social network is primarily the active user. There is no standard definition of active, but some common variants are daily active users, weekly active users, and monthly active users, which typically include those users of the social network who have engaged with the product in the specified time period. Secondary units for the purposes of understanding unit economics may be the paid subscriber and/or the business customer that pays for complementary solutions like advertising.

The unit economics for monetization via subscription are similar to those of SaaS businesses. What is the average revenue per customer per time period and how long does the average customer pay a subscription fee? The average revenue per customer is the fees paid in each subscription tier, weighted by the fraction of customers in each tier. For instance if there is a free plan and a plan for USD 15 per month, and if 80 percent of customers are on the free plan and 20 percent pay subscriptions, then the average revenue per customer per month is 0.80 x 0 + 0.20 x 15 = USD 3.00. Average customer duration, as in SaaS, is simply 1/churn. So, if 5 percent of active users churn each month, then the average duration of user engagement is 1/0.05 or 20 months. Putting that together, the customer lifetime value (LTV) would be 20 months x 3 USD/month-user = USD 60 per user. Monthly recurring revenue (MRR) would simply be the number of active monthly users times the average revenue per customer per month, so if the network has 100,000 monthly active users, then MRR would be 100,000 active users x 3 USD/active-user-month = USD 300,000 per month.

For start-ups, estimating the fraction of active users that will pay a subscription fee is probably the result of an educated guess at first. However, the rates are typically quite low, often less than 2 percent. This fraction is likely a critically important parameter, so some benchmarking of subscription rates in the freemium models of related businesses will be highly informative.

For the second monetization method — businesses paying for complementary products and services — the unit of analysis will be the paying third-party customer. For this scenario, the unit economics are exactly as for a low-marginal-cost product like SaaS. You may price per unit of use, as with advertising, or you may price as an all-you-can-eat subscription. In some cases, as with display advertising, the revenues paid by third-party businesses may depend on the number of active users in the social network. In this case, you may be able to express the potential third-party business revenue as a value of each customer in the social network.

Putting Unit Economics Together in a Financial Model for Social Networks

Here’s a process for understanding your unit economics and creating a financial model for a social network.

  • Decide on a primary monetization model — user subscriptions or third-party fees for access to the platform, such as advertising. In the long run, you may use both models, but typically one or the other will be your initial focus.
  • If your primary monetization model is user subscriptions, then your user is your unit of analysis. If your primary monetization model is third parties who will pay for access to members of the social network, or for data related to the network, then your third-party customer is the unit of analysis.
  • Establish price tiers. Estimate the fraction of users or customers that will fall into each price tier, informed by industry benchmarks.
  • Estimate the average revenue per period per customer, based on a weighted average of the prices for each pricing tier.
  • Estimate churn, either based on existing customer behavior, or based on benchmarks for similar businesses. Using this value for churn, calculate the average duration of customer engagement as 1/churn.
  • Estimate LTV from the average revenue per period per customer and the average duration of customer engagement.
  • Estimate your CAC, either based on experiments you have done or on benchmarks from similar businesses.
  • Check that LTV/CAC is greater than three, and preferably much greater. If LTV/CAC is not much greater than three, then you likely don’t yet have a feasible financial model.
  • Estimate the on-going costs of operating your business. For social networks, software development costs are likely the largest element of on-going cost. Use your estimate of revenue per customer per period to do a break even calculation for the number of customers you need to serve to sustain your business.
  • Prepare a pro-forma income statement for what the business can look like if you are successful. Also create a second pro-forma income statement for the minimum viable scale. These two financial models represent your goal and your fallback position should things go much worse than planned. For subscription-based monetization, you can also estimate ARR for these scenarios, which will likely be an important measure of the value of your business.

4. Marketplaces Connecting Suppliers and Consumers, Sometimes Accompanied by Related Solutions (e.g., Airbnb)

Airbnb is an example of a marketplace, connecting suppliers of short-term housing with consumers of short-term housing. Other examples of marketplaces include eBay, Stubhub, and OpenTable. Marketplaces are also called two-sided markets because they serve two very distinct sets of customers: suppliers of goods and services and consumers of those goods and services.

Sometimes a marketplace is a component of a larger service offering that the organization provides directly. For example, the SaaS company Shopify provides software for operating an e-commerce storefront, but it also provides an app store with third-party solutions for merchants, such as freight calculators or sales tax collection systems. The core solution is the e-commerce SaaS, but a key element of that solution is a marketplace connecting suppliers of specialized application software to merchants who use that software as part of Shopify’s solution.

Occasionally a business that is primarily a marketplace will also directly offer ancillary services. For instance, Doordash is a marketplace connecting restaurants with hungry consumers, but it also directly operates a delivery service (“dashers”) that picks up and delivers the food. In such cases, the company may actually be operating a three-sided market (e.g., the restaurants, the diners, and the freelance delivery people).

Gross Merchandise Value and Take Rate

For marketplaces the unit of analysis is usually the transaction. The sum of all transactions over a time period is called the gross merchandise value (GMV). The marketplace charges fees to sellers, and sometimes buyers.

For most marketplaces, GMV is not a GAAP-compliant measure of revenue, as it does not reflect the actual amount of money that the marketplace earns from transactions. GMV may be used as a supplemental metric to indicate the size and growth of the marketplace, but it should not be confused with revenue. Revenue is the amount of money that a marketplace actually receives from its customers for providing goods or services.

The fraction of GMV that the marketplace retains as revenue before passing on the revenue to the supplier is called the take rate. For example, Airbnb’s GMV for its homes segment in 2023 was USD 29.4 billion, but its revenue from that segment was USD 7.3 billion, corresponding to a take rate of 7.3 / 29.4 = 24.8 percent. Note that Airbnb’s transaction fee on the booking is closer to 15 percent, but it charges several other fees to both hosts and guests so that when taken together the take rate is closer to 25 percent.
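To make the take-rate arithmetic concrete, here is the calculation using the Airbnb homes-segment figures cited above:

```python
# Take rate = revenue retained by the marketplace / gross merchandise value.
# Figures are the 2023 Airbnb homes-segment numbers cited in the text (USD billions).
gmv = 29.4      # gross merchandise value
revenue = 7.3   # marketplace revenue

take_rate = revenue / gmv
print(f"Take rate: {take_rate:.1%}")  # → Take rate: 24.8%
```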

Take rate is largely determined by the market power of the platform. Marketplaces have very strong network effects, which can create huge sources of competitive advantage. Airbnb essentially crushed its rivals VRBO, Homeaway, and others in the period 2010–2020, becoming the dominant marketplace for temporary housing. This gives Airbnb substantial pricing power and allows it to earn a take rate of 25 percent. The Apple App Store charges a 30 percent fee for all transactions in digital goods made within an iOS app. The fees are lower for small businesses, for physical goods, and for multi-year subscriptions. However, put together, Apple’s take rate is also close to 25 percent. These values of 25 percent or a bit more are about the highest exhibited in practice. Marketplaces with less pricing power or dealing in physical goods have much lower take rates. For instance, eBay’s take rate in 2022 was about 13 percent. The practical range of take rates is typically 10 percent to 30 percent, with most marketplaces operating in the range of 15–20 percent.

Putting Unit Economics Together in a Financial Model for Marketplaces

The basic financial model for marketplaces comprises the GMV, the take rate, COGS, and on-going operating costs.

  • Identify the two sides of your marketplace, likely suppliers of goods or services and consumers of those goods or services. Decide which side will pay for access to the platform, or possibly if both sides will pay. Use competitive benchmarks and a subjective evaluation of your relative pricing power to estimate your take rate, expressed as a percentage of GMV.
  • Estimate your cost of goods, the direct costs of executing transactions, which may include fraud protection, customer service, and computing resources. For most virtual marketplaces, COGS is a small percentage of revenue. For example, Airbnb’s COGS for 2022 were USD 1.5 billion, which included expenses such as payment processing, customer support, trust and safety, and host insurance. Airbnb’s gross profit for 2022 was therefore USD 6.9 billion, 82 percent of its revenue.
  • Estimate the on-going costs of operating your business. For marketplaces, software development costs and sales and marketing costs are likely the largest element of on-going cost.
  • Revenue is simply GMV times take rate. Then, gross profit is revenue minus COGS. To achieve financial sustainability, gross profit must exceed on-going operating costs. The breakeven value can then be estimated in terms of GMV, which can be translated into a number of transactions by assuming an average transaction value. Calculate the break even number of transactions and associated GMV you will need to achieve per unit time to meet your on-going operating costs.
  • Prepare a pro-forma income statement for what the business can look like if you are successful. Also create a second pro-forma income statement for the minimum viable scale. These two financial models represent your goal and your fall back position should things go much worse than planned.
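The steps above can be assembled into a simple break-even model. This is a sketch with purely hypothetical inputs; the transaction value, take rate, COGS rate, and operating cost are illustrative assumptions, not data from the text.

```python
# Hypothetical marketplace, for illustration only.
avg_transaction_value = 100.0     # GMV per transaction (USD)
take_rate = 0.15                  # fraction of GMV retained as revenue
cogs_rate = 0.20                  # COGS as a fraction of revenue
monthly_operating_cost = 300_000  # software development plus sales and marketing (USD)

# Gross profit earned per dollar of GMV:
# revenue per dollar of GMV is the take rate, and gross profit
# is revenue minus COGS.
gross_profit_per_gmv = take_rate * (1 - cogs_rate)

# Break-even GMV per month: gross profit must cover operating costs.
breakeven_gmv = monthly_operating_cost / gross_profit_per_gmv

# Translate GMV into a transaction count.
breakeven_transactions = breakeven_gmv / avg_transaction_value

print(f"Break-even GMV per month: USD {breakeven_gmv:,.0f}")
print(f"Break-even transactions per month: {breakeven_transactions:,.0f}")
```

With these assumed inputs, each dollar of GMV yields twelve cents of gross profit, so the model backs out the GMV, and then the transaction count, needed to cover the operating costs.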

Notes

Fader, Peter, and Sarah E. Toms. The Customer Centricity Playbook: Implement a Winning Strategy Driven by Customer Lifetime Value. Wharton School Press, 2018.

Interview with John Geary, co-founder of Abodu Homes.

Interview with Randy Goldberg, co-founder of Bombas.

Commercializing a Physical Product as a Solo Inventor

About once a week, a student, alumnus, or member of the general public reaches out and says something like, “I have an idea for a new physical product. I just need to find a manufacturer. Can you help me?”

Let me be clear and succinct about a few points. First, an idea is rarely worth much unless combined with the will, effort, and tenacity to develop that idea into a product that is available to customers and that meets their needs. Second, if all you have is an idea, then you do not just need to find a manufacturer. You need to apply your will, effort, and tenacity to the process of transforming your idea into a specification of the solution that will both delight your customers and unambiguously communicate the details of the solution to a manufacturer. That transformation is not easy. Thankfully, there are many concepts, tools, and methods that can help you achieve your goals and avoid wasting time and money.

In this guide, I provide an overview of what you will likely need to do and I provide links to other more detailed resources relevant to your pursuit.

May I suggest that before you proceed any further, you view these videos I made describing my attempts to create a new physical product (the Belle-V Ice Cream Scoop) and to take it to market as a solo inventor. (Note that I did not remain solo for long, and had a lot of help from talented partners in the middle phases.)

Belle-V Ice Cream Scoop – Part A
Belle-V Ice Cream Scoop – Part B

OK, now you get the idea and hopefully understand that the process is not trivial, even for a seemingly simple product like an ice cream scoop. Next, let me provide more detail on the key steps:

  1. Develop a solution concept using the triple-diamond model.
  2. Create a prototype that really does the job.
  3. Design the to-be-manufactured version of the product.
  4. Make and sell 1000 (or maybe 100 if possible).
  5. Refine your go-to-market system.

I’ll also include some content related to these important financial and competitive concerns:

  • Can I actually make money from this entrepreneurial opportunity?
  • What about patents?

By the way, if teaching yourself this material is daunting to you, please consider enrolling in my on-line course Design: Creation of Artifacts in Society (via Coursera) from which some of this content is derived. Last I checked, a version of this course was available for free. (Of course, if you are a Penn student, you could also take my course OIDD 6540 Product Design.)

Develop the Solution Concept Using the Triple-Diamond Model

The Triple-Diamond Model

Diamond 1 – Jobs Analysis

Diamond 2 – Understanding User Needs

Diamond 3 – Developing a Solution Concept

Create a Prototype that Really Does the Job

Here are the videos from my Coursera Design Course on Prototyping.

Design the Product

Once you have a prototype that works very well for you, and perhaps for a few potential customers, you can actually design the product. Huh? What do I mean by design the product? I already have a working prototype. Sure, but that working prototype is not typically implemented in an economical and reliable way, and you have not fully specified the artifact in a way that a factory could produce it.

It’s possible that you can take your prototype to a factory that produces similar goods and that their employees can create the production documentation (e.g., computer models and drawings) required to actually make the components of your product. However, more typically, you need to do this specification yourself. Furthermore, the detailed specification of the product comprises your own intellectual property, and so you may wish to control it fairly closely. In that case, you will need to find someone who can create the documentation (e.g., drawings and models) that represent the production-intent version of your product.

There are lots of different types of skills that may be required for this task. I’m not able to detail them all here. A good next step may be to consult with some independent contractors via platforms such as Upwork to understand better your options.

Make and Sell 1000 (or even 100)

In all but the most time-critical competitive environments, at some point sooner rather than later you should just start making and selling your product. Ideally you would find a way to make and sell just a few — say 100 units. This will teach you so much more than doing further research and development. These first 100 units will not be very good, but hopefully they will be good enough that a few brave customers will buy them and give you feedback. The challenge is figuring out how to make just a few units that are good enough that someone other than a family member can use them and tolerate the inevitable warts, yet that can still be produced at reasonable cost. You shouldn’t expect to make any money on these units, but hopefully you won’t lose ridiculous sums either. In some cases you may need to find the resources to make 1000 units — when, for example, the production economics are such that it is just not possible to reasonably produce 100 pieces. Lots more to say about this, but hopefully this quick advice gets you started.

Find a Manufacturer

Here is a video on my own experiences finding a manufacturer in China. You may find it helpful.

Patents

A patent can be a useful element of a plan for developing and commercializing a product. However, it is not really a central element of that activity. Patenting an invention can wait until many of the technical and market risks have been addressed.

A patent by itself rarely has any commercial value. (An idea by itself has even less value.) To extract value from a product opportunity, an inventor must typically complete a product design, resolving the difficult trade-offs associated with addressing customer needs while minimizing production costs. Once this hard work is completed, a product design may have substantial value.

In most cases, pursuing a patent is not worth the effort except as part of a larger effort to take a product concept through to a substantial development milestone such as a working prototype. If the design is proven through prototyping and testing, a patent can be an important mechanism for increasing the value of this intellectual property.

Licensing a patent to a manufacturer as an individual inventor is very difficult. If you are serious about your product opportunity, be prepared to pursue commercialization of your product on your own or in partnership with a smaller company. Once you have demonstrated a market for the product, licensing to a larger entity becomes much more likely.

File a provisional patent application. For very little money, an individual using the guidelines in this chapter can file a provisional application. This action preserves your priority date (and “patent pending” status) for a year, while you evaluate whether your idea is worth pursuing.

Here are a couple of videos with examples and details. (The textbook chapter I refer to in the first video is from Ulrich, Eppinger, and Yang — Product Design and Development.)

Can You Make Money?

In the short run, do you have gross margin and can you acquire customers efficiently? Here are a couple of resources that may be helpful in answering these questions.

Go to Market Systems

In the long run, do you possess the alpha assets to sustain competitive advantage? Read this to learn more about alpha assets and the five flywheels.

Customer-Driven Solutions and the Waterfall Development Process

I’ve been a product designer or member of a product development team for over 50 new products and services. There’s a magic moment, which never gets old for me, when I see one of my products out in the wild being used by someone I don’t know. These days, the most common encounter is on the streets of San Francisco when I see someone commuting to work on a Xootr scooter. It’s a huge thrill to see evidence that I created something that a stranger felt offered enough value that they were willing to give me more money for the product than it cost me to deliver it.

I did use the word “magic” to describe a moment, but I don’t want to convey the wrong impression about the overall activity of product innovation. One of the key roles in entrepreneurship or product management is leading the creation of new products, often from nothing. This is sometimes called “zero to one” product development. While luck — or exogenous factors — always plays a role in determining outcomes, I believe that any dedicated team with the appropriate technical skills and with effective product leadership can reliably create a great product by using the right product development process, and that the outcome does not depend on some magic ingredient.

Why a process? The zero-to-one process is a codification of the collective expertise of thousands of developers, accumulated in government organizations, companies, consulting firms, and universities from about 1960 to the present, more than a half century of experience. A process informs the team what to do and ensures that no critical step is left out. It allows relative novices to benefit from the learning of others. As an innovator within an established enterprise, you benefit from accumulated experience in your organization codified into a process. As an entrepreneur you can reduce the risk of costly mistakes and more reliably find a compelling solution for your customers by adopting the best practices developed by the many product developers who have come before you.

In this chapter, I’m going to give you an overview of a baseline process, found in almost all organizations, called the phase-gate, stage-gate, or waterfall model of product development. This model is a useful starting point and provides an overall structure to the process of creating a new solution. In the next chapter I’m going to circle back and provide a second, simpler model of design called the triple-diamond model.

Why two models? Let me invoke an analogy to give these two models context. I love tools of all kinds. I have a fancy table saw in my shop that I really value. It takes on big jobs. It’s safe and reliable. It’s powerful and precise. It’s also big, noisy, relatively expensive, and must be connected to a dust collection system. Even so, I couldn’t do without it. That’s like a corporate phase-gate process. But, I also have a compact utility knife that I carry in my pocket pretty much all the time. I use it several times a day. It too is a cutting tool and can even be used for some of the same tasks as the table saw, but it’s unobtrusive, comfortable, and instantly deployable. That’s the triple-diamond model.

Both models are intended to be centered on the customer and to pull from customer needs. In fact, both models include engaging with users in order to identify the needs that are most relevant to product success. Furthermore, the triple-diamond model may be applied recursively dozens of times within the context of an overall phase-gate model — say at a very high level of abstraction to create an overall solution concept, or at a very fine-grained level when refining the user interface for a specific feature.

Phase-Gate or Waterfall Product Development Process

The phase-gate or waterfall process is pretty simple conceptually. First, clarify the job to be done, then understand the needs of the customer, then create a great concept for a solution, then specify details with sufficient clarity that the solution can be delivered reliably and repeatedly to customers. That simple flow comprises phases (or stages) — sets of development tasks — separated by gates verifying that the tasks have been completed before moving on to the next phase. Hopefully you can see how this is a process that pulls from the customer needs to create a solution.

Phase-gate processes are also called waterfall processes because information cascades in one direction, generally from the “what” to the “how.”

Most established companies have their own phase-gate process, and they vary across different product domains. Here’s a fairly typical version. It has these steps.

Mission Statement – This phase results in the definition of the target market, an identification of a persona or representative customer in that market, and an articulation of the job to be done. It could also include a competitive analysis and goals for how the new product will be differentiated.

Product Requirements – This phase results in the creation of a product requirements document or PRD. The PRD includes a list of customer needs, and a set of target performance specifications.

Concept Development – This phase results in an articulation of the solution concept, along with documentation of the concept alternatives, the concept selection analysis, and the results of concept testing with potential customers.

System-Level Design – This phase establishes the product architecture, the major chunks of the product and the interfaces among them, and an analysis of which chunks will be custom, and which will be standard chunks provided by suppliers.

Detailed Design – This phase results in component design and specification, prototyping and testing of the chunks, and key sourcing decisions.

Quality Assurance and Testing – This phase comprises both internal and external testing to verify performance, to test customer satisfaction, and to identify bugs.

Launch – This phase includes ramping up production and sales, while assuring early customer success.

For hardware products, there will be a significant parallel set of supply chain and production planning activities to ramp up the supply of the physical product. And, for service products, a pilot will often be conducted.

In any specific organization, the phases in the process are often represented as columns in a table with an implied flow of time from left to right, and the tasks, responsibilities, and key deliverables for each function within the organization are shown as rows.

The gates in the process usually involve a document (e.g., a PRD) and one or more meetings associated with a decision (a) to proceed, (b) to return to the preceding phase for additional work, or (c) to pause the effort entirely.

Evoking the waterfall metaphor, the phases are pools along a river in which substantial work occurs, including some swirling around. The gates are vertical drops between pools, marking the transition from one phase to the next. Water does not typically flow back upstream.

Phase-gate or waterfall processes have gotten a bit of a bad rap, with the critique that they do not allow for downstream learning to affect upstream decisions. However, in virtually every situation I’ve encountered, while the flow is generally from the what to the how, there is some iteration, some hiking back upstream in the process when downstream learning requires a revision in plans.

If you work in software, you know that an alternative process, Agile Development, is very common. Agile deserves its own dedicated explanation, but suffice it to say now that in an agile development process, rather than attempt to fully and completely specify the entire software product in a product requirements document and then build the system in its entirety, the team rank orders the desired features of the system and then builds and tests the features a few at a time, organized into short sprints, usually just two weeks long. Then subsequent sprints take on additional features, but only a few at a time. With an agile approach, the team is guaranteed that it always has something working, and the flexible element of the effort is the scope of features that are eventually built, but not the time allocated to complete the product. Agile processes also benefit from continual feedback on early versions of the product, which allow the development process to be responsive to new and emerging information.
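The agile pattern described above — rank-order the desired features, then build and test a few per sprint — can be sketched in a few lines. The feature names, priorities, and sprint size below are invented purely for illustration.

```python
# Illustrative backlog: (feature, priority), with 1 = most important.
# All names and priorities are hypothetical.
backlog = [
    ("user login", 1),
    ("search", 2),
    ("checkout", 3),
    ("reviews", 4),
    ("wishlists", 5),
    ("gift cards", 6),
]

features_per_sprint = 2  # the "few at a time" scope of each sprint

# Rank-order the features, then chunk them into sprints.
ordered = [name for name, _ in sorted(backlog, key=lambda f: f[1])]
sprints = [ordered[i:i + features_per_sprint]
           for i in range(0, len(ordered), features_per_sprint)]

for n, features in enumerate(sprints, start=1):
    print(f"Sprint {n}: build and test {features}")
# After each sprint the team has a working product covering every
# feature built so far; total feature scope, not the schedule, is
# the element that flexes.
```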

Still, even for software and even in an agile environment, the creation of the first version of the product, the first embodiment of the concept, or what is sometimes called the minimum viable product or MVP, usually benefits from application of the more-or-less standard phase-gate waterfall development process, particularly the first few phases. Once a software or service product exists, its refinement and improvement over the lifecycle is highly suitable for an agile process.

Phase-gate development processes are generally logical and efficient ways to organize the effort of teams and to provide oversight and governance to the creation and improvement of products. For products pulled from customer needs, the process proceeds from a mission, to a detailed description of what the user cares about, to an articulation of the basic approach or solution concept, to a description of the details of the solution, whether that solution is software, a physical good, or a service. When thoughtfully applied, a phase-gate process ensures the organization focuses on the customer, that the landscape of possibilities is explored thoroughly, that no critical tasks are forgotten, and that different functional roles are coordinated.

Appendix – Push versus Pull Approaches to Innovation

One of my former students, Lindsay Stewart, started a company called Stringr. Lindsay had been a producer in the television news business. One of the biggest problems she faced at work was sourcing high-quality video of breaking news. So for instance, if there were a fire in the city, she would really want video footage for her story. She would have to contract with a videographer to go get that footage, edit it, and then to put it into production. That process was time-consuming, expensive, and uncertain.

Lindsay recognized the pain associated with this job and thought there must be a better way. In response, she created an app called Stringr. With Stringr, a news producer can enter a request for a particular piece of video footage via a web-based interface, and then freelance videographers can shoot the video and submit the footage using their smartphone. When the video is accepted, they’re automatically paid about 80 USD. 

Is Stringr an innovation? By my definition, unambiguously yes. I define innovation as a new match between a solution and a need. Stringr employs technology to create a marketplace connecting requests for video with the people who can create it, clearly a new match between solution and need. I call this approach to innovation the pull. Stringr was pulled from a pain point Lindsay herself experienced and has proved to be a great solution.

But, innovation can also come about in a completely different way. It can be pushed from the solution. Here’s an example.

The inventor Dean Kamen created a self-balancing wheelchair called the iBot. The big idea was that the device could rise up on two wheels allowing the user to be at eye level with people standing on their feet. The iBot was sold by Johnson & Johnson as a medical device, but once developed, Kamen thought, “Wow we have this amazing technology to balance a wheelchair on two wheels. I wonder if we could find any other application for this solution.” Several of the engineers on the development team said, “You know what? I bet you could stand on a self-balancing platform and ride it around. We could create a personal transportation device for anyone, whether or not they were disabled.” 

That thinking led to the Segway personal transporter. One of the applications that the Segway team eventually found was for police officers, who could use the Segway to get around in environments in which space was constrained, where they wanted to be able to move slowly, and where they wanted a high degree of maneuverability. The Segway was a push: start with a solution — the two-wheel, self-balancing mobility technology — and find a need that the solution can address, in this case police patrols.

The problem was that once Kamen’s company proved that police officers wanted a low-speed mobility device like the Segway, competitors took a pull approach to innovation and discovered alternative solution concepts that could address that need. Once the police and security markets were proven, established competitors entered with those alternative solutions.

A three-wheeled personal transporter is much less complex than a device that balances on two wheels, and so competitors were able to sell this product at lower prices and with greater performance than the Segway.

An innovation can be any new match between solution and need, and that match can be discovered via a pull or a push approach. However, three conditions must hold for the innovation to create substantial value.

First, the need must be real. That is, a significant number of customers must have a significant amount of pain, a real job to be done.

Second, the solution has to make the pain go away. It has to actually do the job.

Third, the organization must be able to deliver the solution at a cost significantly lower than the customer’s willingness to pay; that is, the organization must have sufficient alpha assets to sustain competitive advantage.

Let’s apply these conditions to the Segway example.

First, Segway identified a real need for police officers to get around. The Segway satisfied criterion two as well: the solution concept met the need. But Segway struggled to offer the product at a price the customer was willing to pay.

The three-wheeled configuration is a lower-cost solution that addresses that same need for police officers to get around. Ironically, Segway itself later introduced a three-wheeled version of its scooter once that configuration was shown to offer greater value. 

The big risk with the push approach to innovation is that as an innovator you fail to consider all of the possible solutions for the need that you’ve identified, and someone taking the pull approach runs around you with a better solution.

The innovator that pushes starts with an existing solution and often only considers whether or not their solution will meet the need of the target market. That is a necessary but not sufficient condition. 

With the push approach to innovation, an important discipline is to consider how a competitor taking a pull approach would address the identified need. That consideration would probably have led the Segway team to conclude that a three-wheeled solution offers better performance at a lower price. The team should then either have pursued the three-wheeled solution, abandoning the push approach, or found a different job to be done for which dynamic self-balancing offered a unique advantage.

While both push and pull approaches can lead to innovation, in my opinion the pull approach is much more reliable. The pull approach is at the heart of the zero-to-one product development process I teach, and it forms the basis of a reliable and repeatable approach to creating value in product innovation.

Notes

Ulrich, Karl T., Steven D. Eppinger, Maria C. Yang. Product Design and Development. Chapter “Development Processes and Organizations.” Seventh Edition. 2019. McGraw-Hill.

Ulrich, Karl T. Design: Creation of Artifacts in Society. University of Pennsylvania. 2011.

Organizational Capabilities and the Fifth Flywheel

The five flywheels framework is admittedly mechanistic – even evoking the analogy of a machine. But the flywheels don’t operate entirely on their own. Managerial action is obviously important, and the flywheels exist within an organizational context with its own distinctive attributes.

The JD Power organization ranks North American hotels for guest satisfaction each year, and The Ritz Carlton Hotel, a unit within the Marriott corporation, consistently tops the list for the luxury segment. Exceptional guest satisfaction is the brand promise that the Ritz Carlton makes to its customers, a significant element of the company’s brand equity. Some of the hotel’s ability to satisfy customers is the result of structured systems. For example, information technology allows any employee to make notes about guest preferences, creating an institutional memory that can be accessed in future guest interactions. However, much of its capability for delighting customers is the result of a culture developed and cultivated over decades within the Ritz Carlton organization. Ritz Carlton’s culture of guest satisfaction is itself a flywheel, distinct from but reinforcing of its brand equity. This culture is a flywheel in part because, with cumulative time and experience, as with product performance or cost efficiency, the organization can ratchet in incremental improvements. But even more significant, the culture is reinforced by positive feedback loops. As Ritz Carlton builds a stronger culture of customer service, the organization is better able to attract and retain employees with a propensity for delighting others, further strengthening its culture.

Although intangible, organizational attributes like a culture of customer service are alpha assets because they both enhance performance and are hard for others to acquire – surprisingly hard. Consider that the Ritz Carlton’s parent company Marriott operates seven luxury hotel brands, including JW Marriott, St. Regis, and the W Hotels. Even other units within the same corporation have not been able to fully absorb the organizational capabilities of Ritz Carlton.

Deliberate managerial processes and their associated organizational culture can be thought of as lubrication and maintenance of the four flywheels (customer network, cost efficiency, product performance, and brand), and thus comprise a meta asset. A meta asset is an attribute that enhances a more fundamental and mechanistic alpha asset. These meta assets can themselves be flywheels.

Customer service is merely one of many possible organizational capabilities that could comprise a flywheel. I lump all of these organizational attributes under the label the fifth flywheel. I consider the fifth flywheel an organizational x factor, where x can assume many different performance enhancing attributes. The fifth flywheel can take many forms. Here are a few that I believe have proven to be particularly effective and worth describing in more detail. In each case, the organizational attribute associated with the fifth flywheel directly reinforces another important, albeit more mechanistic, flywheel powering competitive advantage.

Great Place to Work

Being an excellent place to work can be a fifth flywheel. NVIDIA Corporation is one of the very best places to work according to Glassdoor’s analysis of employee reviews and ratings. A typical review reads, “great culture around designing the highest performance products based on what can be done versus what is ‘good enough’.” NVIDIA relies on attracting and retaining extraordinary technical talent to design its graphics processing semiconductors. NVIDIA is not the largest semiconductor company in the world, but it can attract superior talent by developing a culture extremely attractive to the best engineers. By attracting that talent, it can create better products. Here’s the flywheel: the better the work experience, the easier it is to hire exceptional employees, improving the work experience even more. When attracting and retaining talent is critical to performance, being a great place to work can be a flywheel.

Company Values Aligned with Target Customer Identity

First, a counterexample. In 1986, Keith Richards founded Sierra Trading Post, selling discount outdoor gear. A core value espoused explicitly was a commitment to God, reinforced with Bible quotations on its packing slips. Richards’ commitment to Christianity almost certainly kick-started a flywheel, attracting a religious workforce and further reinforcing the company’s commitment to God. However, that flywheel was not likely an alpha asset. In fact, the company’s target market segment was not exactly a God-fearing population, as reflected in animated threads about the company’s practices in online discussion forums among extreme outdoor enthusiasts. If anything, the company’s evangelism was a headwind against its other efforts to grow brand equity. When the company was acquired by TJX Corporation, the Bible quotes were quickly scrubbed, and today the company devotes substantial effort to communicating its commitment to religious diversity.

Now consider the outdoor gear company Cotopaxi. Co-founder Davis Smith built his company around doing good, with particular emphasis on enhancing the lives of individuals in extreme poverty in Latin America through improvements in health, education, and livelihood. Cotopaxi pursues this aim with fervor similar to that of Sierra Trading Post. Although Smith is himself religious, his company focuses on non-religious values that are directly aligned with its target market, and thus reinforce its brand equity.

The fifth flywheel is only an alpha asset when it directly or indirectly contributes to enhancing the value proposition of the company to its customers.

Continuous Improvement

The most valuable companies in the world are disproportionately technology companies with household names, including Apple, Microsoft, Amazon, Google, Tesla, and Tencent. But those companies cannot exist without another of the most valuable companies in the world, which most people have never heard of – TSMC, a semiconductor manufacturer with no chip designs or consumer presence of its own. 

The very highest performance chips are now created by a partnership between a designer of the chip, say Apple, who does no production, and a semiconductor fabricator (a “fab”) who does no design. This is because manufacturing the chips themselves is so hard that it requires extreme focus and massive scale, two characteristics that can only be achieved by pooling the demand for chips from many device designers and devoting all organizational attention to one goal – relentlessly increasing the density of transistors on silicon wafers in order to continue to deliver on the promise of Moore’s Law.

Semiconductor manufacturing may be the hardest job to be done in the world. TSMC is currently the only company in the world that can make the microprocessors that power Apple’s latest iPhone or NVIDIA’s latest AI chip, and is on track to produce chips with features that are just 3 nm wide. How small is that? About 17 thousand times smaller than the width of a human hair. Transistors on chips can now be so small that Apple can put more than 100 billion of them in its latest microprocessor. Now imagine what is required to make 100 billion electronic devices 3 nm wide, each of which must function perfectly in order for you to use Instagram on your iPhone.

TSMC is built around a gigantic product performance flywheel. Its product, perhaps better described as a service or solution, is making chips. As solution quality improves, its value proposition improves, attracting the most demanding customers and increasing growth rate. Access to more demanding customers, greater scale, and increased experience further enhance solution quality. Now consider what organizational capability is required to lubricate this flywheel: an intense focus on one thing, process improvement. To give a sense of the size and scope of process improvements, consider that in 2021 TSMC filed almost 9,000 patents and registered more than 20,000 trade secrets. The company has honed its process improvement and trade secret management system to such a level that it now trains others in its ecosystem and supply chain to make concomitant improvements themselves.

Adjacent Innovation

A few years ago I taught an MBA course on innovation in China which included a study tour of entrepreneurial companies in Beijing. One day we visited Xiaomi, a company that sells about as many smartphones each year as Apple. We spent the day talking with executives and touring facilities. The students also loaded up on colorful, zany consumer products in the company store, with one commenting that Xiaomi was like Apple’s crazy cousin. Xiaomi fueled its meteoric rise to the Fortune Global 500 in just a decade by pursuing a strategy of prolific innovation around its core business of affordable high-performance smartphones running the company’s Android-based operating system MIUI. In addition to eight model lines of smartphones, the company’s product line includes tablets, laptops, home entertainment devices, electric scooters, small appliances, wearables, drones, and cloud services.

Although the company launches new products at a rapid clip, its innovation is extraordinarily bounded. There is no research on quantum computing at Xiaomi. Instead the company leverages a vital external ecosystem of users, developers, technology providers, and supply chain partners to reliably create a rapid-fire sequence of interesting and novel products, none of which represent a huge technological leap.

This innovation capability directly reinforces Xiaomi’s alpha assets of product performance and brand. This capability is also a flywheel with positive feedback – the more interesting Xiaomi’s products, the more desirable the company is as a place for innovative designers and engineers to work, and the more attractive it is as a partner for small inventive companies.

How did Xiaomi kick-start the innovation flywheel? The origin of Xiaomi was a big bang of talent. Founder Lei Jun, key executives, and company communications staff all love to display a photo from the beginning of the company with a smiling Lei sitting on a conference table surrounded by seven seasoned leaders, several of them previous entrepreneurs, combining experience from Microsoft, Google, and Qualcomm. Co-founder Liu De, educated at the elite ArtCenter College of Design in California, assumed the role of design visionary. It was a dream team of global innovators. They had learned an A game from the best global corporations and now focused their knowledge on one thing – efficient product innovation around a core operating system.

Lei Jun himself is an endlessly curious and open-minded leader. I recall visiting a Silicon Valley company with a unique production process for physical goods. I was surprised to learn that Lei had been there days earlier learning about the technology first hand. My sense is that Xiaomi does what Apple designers dream of doing when senior executives aren’t watching. The company acts like Apple might act if it had nothing to lose.

Dynamic Capabilities

The problem with flywheels is that they are, well, flywheels. Their job is to preserve momentum around a particular axis of rotation. What gives them the power to blast through barriers also makes them resistant to changes in direction. This is a good thing most of the time – companies that have found product-market fit benefit from a single-minded focus on ratcheting incremental improvements in product, cost efficiency, brand, and customer network. But, in highly dynamic industries, or in periods of significant industry disequilibrium, the companies that can adapt survive and thrive. This capability to reinvent capabilities is what scholar David Teece calls dynamic capabilities. He argues that this capability to change is itself a source of competitive advantage. Using my terminology, dynamic capabilities is a fifth flywheel. 

Consider the software company Adobe. You know the company primarily as the source of the Portable Document Format, or PDF, a widely licensed standard for encoding documents digitally. However, the bulk of Adobe’s revenue comes from the productivity tools used by creative professionals. These tools include Photoshop, Illustrator, Premiere, and many others. Adobe has thrived since its founding in 1982, but not without many periods of reinvention. Adobe’s first product was the PostScript format used to represent documents for printers. Then, it created PDF. Then, it created a tool for manipulating images, Photoshop. As the world wide web developed, it acquired Macromedia and provided the dominant tool for creating websites, Dreamweaver. Then, a tool for video, Premiere. As the world changed, Adobe changed with it. Likely the biggest disruptive threat for Adobe was the shift in the landscape from single-shot purchases of client software to cloud-based software-as-a-service monetized with subscriptions. In response, Adobe reinvented itself again with Adobe Creative Cloud, a massive success, giving an all-you-can-eat set of tools to creators for a reasonable monthly fee. When threatened by Figma, an easy-to-use collaborative tool for digital design, Adobe agreed to acquire Figma. This elephant can dance, and this agility is an alpha asset. Adobe’s ability to reinvent itself is clearly performance enhancing. But it is also vexingly hard for rivals to emulate. A culture of organizational innovation is the result of hundreds of actions taken over decades to reinforce reinvention as a core value, and to give the organization the specific tools and methods for changing itself.

Organizational Attributes are Sticky

It’s not hard to understand how organizational attributes like a culture of innovation can be performance enhancing. However, are organizational attributes really that hard for rivals to acquire? After all, there is no fancy technology or trade secret involved in a fifth flywheel. As most practicing managers know, organizational change is shockingly hard. Scholar Gabriel Szulanski did a fascinating study of the stickiness of organizational capabilities. What makes his work so interesting is that he studied the factors that explained the transfer of capabilities from one unit of a company to another unit in the same company. He found that even elements of know-how that substantially enhanced performance were resistant to transfer, even among sibling organizations. For example, telecommunications firm CENTEL had proven success with a process improvement methodology it called WPA, essentially a version of total quality management. When it attempted to deliberately transfer WPA to all units, the effort largely failed. Szulanski found that the key factors in predicting stickiness more generally are causal ambiguity, absorptive capacity, and arduous relationships among the transferring partners. Given how hard it is to adopt a practice from a sibling, imagine the challenge in adopting the organizational attributes of a rival.

Focus

Building organizational capabilities is hard, which is why they can be sources of advantage. The difficulty in spinning up the fifth flywheel is also the reason an organization can probably only possess one or two organizational attributes that are so exceptional they become alpha assets. Adobe would likely lose its edge if it attempted to focus on dynamic capabilities and on cost efficiency. There are three key reasons organizations must focus on just one or maybe two fifth flywheels.

Finite Resources

Social scientist Anders Ericsson did the research popularized by Malcolm Gladwell as the “10,000 hour rule,” which suggests that true expertise requires 10,000 hours of deliberate practice. Ericsson’s findings are much more subtle than Gladwell’s re-telling, but the essential truth is that being the best in the world at pretty much anything significant requires a huge investment of effort. Being the best in the world at more than one thing is exceedingly rare. This logic holds for organizations as well. Organizations have only so much managerial attention and free cash flow. Much better to be good enough on most dimensions and truly exceptional on the one dimension most reinforcing of performance. That one dimension might be customer intimacy, quality of the work experience, capability to innovate, obsession with process improvement, or alignment with values important to a target customer. Pick one.

Clear Communication

I bet you’ve heard the phrase “It’s the economy, stupid.” That’s what campaign strategist James Carville told campaign workers during Bill Clinton’s successful 1992 presidential campaign. Do you remember the second and third points in Clinton’s campaign? I didn’t think so. (They were “Change vs. more of the same” and “Don’t forget health care.”) The economy, change, healthcare – all important things. It’s a noisy world out there. You need every employee to understand what’s most important. Focus on one flywheel.

Decision Making Complexity

Imagine a bellhop at the Ritz Carlton trying to figure out the best course of action when a guest asks to store 12 pieces of luggage. Not a hard challenge when there is just one organizational value, customer satisfaction. However, imagine that the organization attempts to optimize both cost efficiency and customer satisfaction. What does the bellhop do? Consult a manual, calculate a trade-off, devise a fee for extra bags. Yikes. By then, neither objective is satisfied. To engender decisive action without bureaucracy, reduce the complexity of the objective. Of course some decisions require a lot of nuance. But, most don’t. Better to align an organization around the critical one or two objectives than to mire employees in complexity.

Levers on the Fifth Flywheel

Select a fifth flywheel carefully, preferring a focus on the one that most enhances the value proposition to customers, such as the influence of organizational values on brand, or the influence of a culture of continuous improvement on cost efficiency. In selecting a fifth flywheel, also consider the beliefs and values of leaders. If you yourself are the company’s owner and CEO, what are your core values? If you are working to formulate strategy but report to more senior leaders, what are their core values? Yes, in theory, shareholders of a public company, via a board of directors, could select new leadership aligned with a desired fifth flywheel. But most of us work in real situations that may deviate from theory, and may need to just assume organizational leaders are who they are.

Once a desired organizational attribute is identified, how can the associated flywheel be accelerated?

I remember a story my father told me when I was in high school. He worked as a consultant for DuPont and reported that someone at the lab that day had cut their finger with a box cutter. That minor injury launched the unit into action with an analysis of root causes and a plan for reducing the risk of recurrence. For DuPont, safety has been articulated as its primary core value for more than a century.

Davide Vassallo, when he led DuPont’s safety consulting unit, articulated why safety is a fifth flywheel and an alpha asset: “When you have good safety, that means you have control of your operations. And if you’re in control, of course you can drive business improvement as well.” DuPont went on to spin off a large and successful consulting firm, DSS, to deliver its safety solutions to companies around the world, an organization Vassallo now leads.

DuPont’s impressive success in creating its safety flywheel illustrates some effective methods useful for any organization.

Visible actions aligned with a North Star

The Chinese love short, pithy sayings, usually four characters long, that they call “chengyu.” My favorite is translated as “kill the chicken to warn the monkey.” Not exactly consistent with the Total Quality mantra “drive out fear,” but certainly reflective of the incontrovertible truth “actions speak louder than words.”

Employees are not stupid. The single most important lever for reinforcing a fifth flywheel is visible and costly actions that are aligned with a North Star. Absent those actions, no one believes. Conversely, when employees see an organization halt operations and investigate a cut finger from a box cutter, that action speaks much louder than the slogan “safety first” hung at the entrance of the laboratory.

When I was asked to serve as Vice Dean for Innovation at the Wharton School, then dean Tom Robertson allocated about ½ percent of Wharton’s revenues for unspecified exploration of opportunities to better pursue the School’s mission. Forming an innovation group, appointing a senior faculty member to lead it, and creating a multi-million dollar budget got the organization’s attention and was an honest signal that the School was serious about innovation.

Origin stories and myths

DuPont employees love to explain that the company’s culture of safety arose because it started in 1802 with just one product, gunpowder. Frequent explosions were dangerous and disruptive. Founder E.I. du Pont committed to improvement, and built his own house in the blast zone of a potential explosion to reinforce his commitment to employee safety.

Although not as dramatic an origin story, consider how Amazon employees recount their founder Jeff Bezos making the company’s desks from doors and framing timber to minimize costs. That story is a founding myth that reinforces the company core value of cost efficiency.

Codification in processes and systems

DuPont includes an evaluation of safety practices in its performance reviews. In the 1960s, DuPont started selling its safety services to other companies, and developed structured processes for risk assessment and safety improvements.

When Adobe committed to improving its culture of innovation it created the Red Box. <more…>

Processes and systems aligned with a fifth flywheel reduce friction and increase momentum, and no flywheel can spin when faced with the friction of processes and systems at cross purposes with a desirable attribute of organizational performance.

Communications and manifestos

DuPont’s corporate communications reinforce its core values. Safety is listed first.

You’ve probably heard of Netflix founder Reed Hastings’ 2001 PowerPoint presentation about Netflix values. It’s a highly public manifesto of what matters at Netflix, or at least what mattered in the early 2000s. A fifth flywheel is clearly articulated on slide 21, “A great workplace is stunning colleagues.” And then on slide 23, “Adequate performance gets a generous severance package.” Whoa. Got it. When a company releases a detailed document viewed tens of millions of times and containing extraordinarily specific and distinctive desired elements of culture, the company torques the flywheel.

Investment in education, training, and deliberate practice

DuPont makes significant and deliberate investments in education and training. For example, in order to influence employees’ beliefs and cognitive models about safety, it created a system called DnA (DuPont Integrated Approach) that was implemented with a 2-day training program for plant managers; a 2-day program for mid-level supervisors; focused coaching sessions for all leaders, managers, and supervisors; a 4-hour training program for all plant-floor workers; and periodic ongoing skills workshops for supervisors.

For elements of culture to stick, they require continual reinforcement and visible investment in education and training.

Notes

Senge PM (1990) The Fifth Discipline: The Art and Practice of the Learning Organization (Currency, New York).

Szulanski G (2003) Sticky Knowledge: Barriers to Knowing in the Firm (SAGE Publications, London).

Ericsson A, Pool R (2016) Peak: Secrets from the New Science of Expertise.

Treacy M, Wiersema F (1995) The Discipline of Market Leaders.

Bloom et al 2020 AEA.

Xiaomi founder profile: https://www.mi.com/global/about/founder/ (accessed November 21, 2022).

Apple M1 chip: https://en.wikipedia.org/wiki/Apple_M1 (accessed November 22, 2022).

DuPont origin story: https://www.dupont.com/news/safety-at-our-core.html (accessed November 16, 2022).

DuPont values: https://www.dupont.com/about/our-values.html (accessed November 16, 2022).

DuPont training (DnA) program (accessed November 16, 2022).

DuPont spins out DSS as its own business: https://cen.acs.org/safety/DuPonts-safety-segment-solo/97/i12

Adobe Red Box.

Netflix Reed Hastings 2001 culture presentation (accessed November 16, 2022).

“It’s the economy, stupid”: https://en.wikipedia.org/wiki/It%27s_the_economy,_stupid (accessed November 15, 2022).

NVIDIA Glassdoor reviews: https://www.glassdoor.com/Reviews/NVIDIA-great-culture-Reviews-EI_IE7633.0,6_KH7,20.htm (accessed November 14, 2022).

Keith Richards (Sierra Trading Post founding story): Wayback Machine, April 4, 2003 version of http://www.sierratradingpost.com/StaticText/CompanyHistory.asp?wc=true

Animated debate about Christianity and outdoor gear: https://www.tetongravity.com/forums/showthread.php/54383-Does-the-whole-jesus-and-STP

TJX Corp diversity and inclusion (religious diversity): https://www.tjx.com/responsibility/workplace/inclusion-and-diversity

Sierra Trading Post founding story (http://www.sierratradingpost.com/lp2/founding_story.html): “Harder than the planning was deciding how to commit a business to God. To do this, we ensure that this business reflects God’s principles in the way we treat employees, vendors and customers. Each catalog includes three ‘We believe’ statements. In addition, we print a quotation from the Bible on our order blank. These statements serve to hold me accountable.”

Competition and Product Strategy

You may believe that you have identified a unique opportunity to create value with your new business. You’re probably mistaken about the unique part. Others have likely tried to do this job before, and some scrappy entrepreneurs just getting started elsewhere in the world probably share your hopes and dreams. Even if your insight is unique, it can’t remain a secret for long. If you are able to grow your business and achieve profitability, you will effectively be publishing the location of a gold mine to the public. Competition is a central, unavoidable characteristic of entrepreneurship. But, competition is not necessarily a bad thing, particularly at the dawn of a new market. Competitors can teach you a lot about what works and what doesn’t, spur you to innovate and move quickly, and share the burden of educating potential customers about an emerging market.

Many aspects of competition are unpredictable and so entrepreneurs should probably not spend inordinate time obsessing over rivals. Still, some attention to competition can result in smarter strategic choices in product positioning and in refining the definition of the beachhead market. Furthermore, potential investors will want to see that you have identified and analyzed the competition and have made sensible decisions about how to direct your efforts given the competitive landscape. As a way to organize this chapter and to avoid unnecessary theory, let me start with an identification of the key questions most entrepreneurs need to answer and the associated decisions they need to make. Then, I’ll illustrate several key concepts, analyses, and ways of presenting information that are most useful in addressing these questions and decisions.

What Questions are You Really Trying to Answer?

Three questions relevant over three different time horizons are usually most pressing.

First, is there really a gap in the market? This is the immediate question relevant to the decision to pursue an opportunity. Entrepreneurial opportunity is born out of disequilibrium, and for start-ups that disequilibrium is usually either (a) some technological change that has given rise to a new solution to an existing job to be done, or (b) some new job to be done that has emerged because of changes in attitudes, preferences, demographics, regulation, or other external forces. A closely related question is how big the gap in the marketplace is, in terms of total addressable market (TAM) and serviceable addressable market (SAM).
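The arithmetic behind TAM (total addressable market) and SAM (serviceable addressable market) sizing is simple multiplication, and a quick sketch can make it concrete. The following minimal Python example is my illustration; every figure in it is invented for the sake of the example and is not from this handbook.

```python
# Back-of-the-envelope market sizing sketch.
# All figures below are hypothetical, chosen only to illustrate the arithmetic.

target_population = 2_000_000    # hypothetical: commuters in the beachhead region
adoption_rate = 0.05             # hypothetical: fraction who might buy the product
average_selling_price = 2_000    # hypothetical: USD per unit

# TAM: annual revenue if every potential buyer in the population were reachable
tam = target_population * adoption_rate * average_selling_price

# SAM: the slice of TAM your channels and product category can actually serve
# (hypothetical 20% serviceable share)
serviceable_share = 0.20
sam = serviceable_share * tam

print(f"TAM: ${tam:,.0f}")   # prints TAM: $200,000,000
print(f"SAM: ${sam:,.0f}")   # prints SAM: $40,000,000
```

The point of the sketch is not the numbers but the structure: each factor is a separate assumption an investor can challenge, so stating them explicitly is more persuasive than quoting a single headline figure.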

Second, given that an opportunity exists, how should the specific attributes of your solution be positioned relative to the alternatives available to your potential customers? Positioning concerns both the decisions you make about the substantial features of your solution, as well as what you emphasize in your marketing efforts. This question is answered as you develop your solution, refine its characteristics, and craft a message for communicating your value proposition.

Third, how likely is your new organization to be able to sustain competitive advantage in the long term? In most cases a start-up’s most valuable assets relative to larger rivals are speed and agility. But, if you are successful, you will likely become bigger and a bit more sluggish. Existing and new companies will come for your customers. How can you thrive when that happens?

In order to answer these three questions, you’ll first form a hypothesis about the job to be done, the beachhead market, and your solution concept. If you are following the process in this handbook, this hypothesis is developed with the triple-diamond model. In any case, to consider the issues in this chapter you should have at least a preliminary decision in these three areas. In many cases, these preliminary decisions are the key elements of the description of the entrepreneurial opportunity.

With a hypothesis about the opportunity in hand, here’s a process to assess the competition, position your solution, and articulate how you will sustain competitive advantage:

  1. Identify the direct, indirect, and potential competitors and research their solutions and marketing strategy.
  2. Refine and articulate your value proposition by iteratively refining your product positioning and by mapping your solution relative to those of direct competitors on the dimensions of product performance that most influence the value you offer to your potential customer.
  3. Develop your advantage thesis by articulating your alpha assets, the moats and barriers that you possess or hope to develop over time.

Identify Direct, Indirect, and Potential Competitors

In broad terms, the competition comprises the organizations that deliver a solution that customers can select to do the job you have identified as the primary focus of your business. These rivals can be categorized as direct competition, indirect competition, and potential competition.

Direct competition refers to organizations that deliver essentially similar solutions to the same customer segment you are targeting, addressing more or less the same customer needs — the Coke and Pepsi of the soft drink market, UPS and FedEx for ground parcel delivery, Nike and Adidas in athletic shoes. Direct competitors are usually the most obvious and visible sources of competition.

Indirect competition refers to organizations that offer a substantially different solution to your segment for addressing the same or closely related customer needs. For example, Peet’s Coffee and Red Bull are indirect competitors for morning stimulants.

Potential competition refers to organizations that do not currently offer solutions to the focal customer segment, but that have the capability and incentive to do so in the future. For example, Amazon and Google are potential competitors in many markets where they do not currently operate, such as healthcare or education. Potential competitors are dormant, but may substantially diminish the attractiveness and sustainability of an opportunity given the possibility they may enter the market later.

Once you’ve identified the direct, indirect, and potential competitors, spend some time learning what you can about them. Devote the most time to direct competitors, but also investigate the indirect competitors; it’s possible they are more aligned with your beachhead market than you think. Your time is probably not best spent going deep on all the companies that could potentially be competitors — too much uncertainty clouds their role in your future. For the most relevant competitors, read white papers and articles; listen to podcasts; watch video interviews; try out their products; talk to their customers. These competitors, as a result of their marketing efforts, have effectively all run experiments out in full view of the public. You should take advantage of whatever information you can glean from what is working for them, what has not worked for them, and what weaknesses are revealed about them by their current efforts.

Refine and Articulate the Value Proposition

When you developed your solution concept, you probably used a concept selection matrix to compare alternatives. (See the chapter on Concept Development.) The criteria you used for comparison included the key customer needs for the beachhead market. Now pull out that list of needs again and revise and extend it until you have 6 – 10 key customer needs that will mostly determine the value that your solution can deliver to your customer.

Needs are usually expressed in the language of the customer, not as technical specifications. At this point you may wish to elaborate the metrics that most closely match each customer need. For instance, if the customer need for an electric vehicle is “has sufficient range for my daily needs” then some metrics might be “range at 50 kph average speed” and “range at 100 kph average speed” which would capture both city and highway driving.

Once you’ve compiled a list of needs, organize them in a table, along with the key performance specifications. Then, fill in the values for your solution and those of your direct — and possibly indirect — competitors. For example, Mokwheel is a relatively recent start-up company entering the electric bike market with the Mokwheel Basalt model.

Mokwheel bike solution concept. Source: Mokwheel

Here is a table comparing the Mokwheel Basalt to some of its competitors.

| Customer Need | Metric | Mokwheel | Rad Power RadRover 6 Plus | Juiced Bikes CC X | Niner RIP E9 3-Star | Lectric XP 3.0 | Ride1UP 700 Series | Aventon Level.2 |
|---|---|---|---|---|---|---|---|---|
| Range | Miles per charge on test course | 60 | 45 | 30 | 30 | 25 | 30 | 40 |
| Affordability | Price (USD) | $1,999 | $1,999 | $2,499 | $6,295 | $999 | $1,495 | $1,800 |
| Weight | Kilograms | 35.9 | 34.3 | 25.0 | 23.5 | 28.6 | 24.5 | 28.1 |
| Ride comfort | Suspension type | Front fork suspension w/ lockout; fat tires | Front fork suspension and rear coil-over suspension w/ lockout | Front fork suspension w/ lockout | Full suspension w/ RockShox ZEB Select fork | Rigid frame/fork w/ fat tires for cushioning | Front fork suspension w/ lockout | Front fork suspension |
| Payload capacity | Rack weight limit (kg) | 82 | 45 | N/A | N/A | N/A | N/A | 55 |

The hypothesis for Mokwheel is that an affordable, rugged electric bicycle with very long range and huge cargo capacity will be well received in the beachhead market, even if the weight of the vehicle is relatively high.

Product Positioning on Key Dimensions

Competitive positioning is often boiled down to just two dimensions to allow visualization with a scatter plot. For this example, let’s assume that the two attributes of electric bikes that seem to best describe differences in products and in preferences in the market are weight and range.

Given two dimensions, we can then draw a map of the landscape of possible solutions. You could very reasonably object to this oversimplification. You’re right. In virtually any market, we oversimplify by representing the competitive landscape in two dimensions. Still, it’s done all the time, and has an obvious benefit for visualization. Recall that you have already captured the other dimensions that matter in the value proposition table from the previous section. You can experiment with which two dimensions are both important to customers and reflect meaningful differences among competitors.

Note that you can sometimes sneak in a third dimension, say price, by labeling the data markers in the scatter plot, as I’ve done with price below.

In using scatter plots for communicating product positioning, a distinction between two types of attributes is important. Weight and range are largely more-is-better or less-is-better attributes. Everyone can agree that, at least for reasonably foreseeable solutions, more range and less weight are desirable. All else equal, customers would prefer a product located in the upper left corner: low weight and high range. However, cost and technical feasibility likely put that position out of reach.

In contrast, imagine you are designing a chocolate bar and that the two attributes of greatest importance to customers are (1) intensity of chocolate flavor and (2) crunchiness. In the chocolate bar domain, each customer likely has an ideal point: the combination of flavor intensity and crunchiness that they prefer. The producer can position the solution pretty much anywhere, as most positions are technically feasible at similar cost. Reinforced by these examples, we can probably all agree on some basic principles:

  • All else equal, a product should be positioned where there is demand.
  • All else equal, products should be positioned where there is little competitive intensity.
  • For more/less-is-better attributes, cost and technical feasibility constrain the position of your solution, and you likely will face trade-offs among competing attributes.
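The more/less-is-better logic has a crisp computational counterpart: a position is dominated if some competitor is at least as good on both attributes and strictly better on one. Here is a short sketch, using the table's weight and range data, that finds the undominated (Pareto-efficient) bikes; the function names are my own, not from the original text.

```python
# Sketch: which e-bikes are Pareto-efficient on (weight, range),
# where less weight and more range are better? Data from the
# comparison table above; illustrative, not authoritative specs.
bikes = {
    "Mokwheel Basalt": (35.9, 60),
    "RadRover 6 Plus": (34.3, 45),
    "Juiced CC X": (25.0, 30),
    "Niner RIP E9 3-Star": (23.5, 30),
    "Lectric XP 3.0": (28.6, 25),
    "Ride1UP 700 Series": (24.5, 30),
    "Aventon Level.2": (28.1, 40),
}

def dominated_by(a, b):
    """True if b is at least as light and long-ranged as a, strictly better on one."""
    wa, ra = a
    wb, rb = b
    return wb <= wa and rb >= ra and (wb < wa or rb > ra)

frontier = [name for name, spec in bikes.items()
            if not any(dominated_by(spec, other)
                       for o, other in bikes.items() if o != name)]
print(frontier)
# -> ['Mokwheel Basalt', 'RadRover 6 Plus', 'Niner RIP E9 3-Star', 'Aventon Level.2']
```

On these two attributes, bikes off the frontier (e.g. the Lectric XP 3.0) must compete on other dimensions, such as price, which is exactly why the value proposition table retains all the attributes that matter.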

By the way, many of you have heard about or read the book Blue Ocean Strategy – that’s all the book really says. Put your product where there is demand and where there’s limited competition. Much of the field of quantitative market research is devoted to increasingly precise methods for measuring preferences and optimizing product positions in a competitive landscape. There’s nothing wrong with that logic or that approach. However, I want to warn you about two ways this approach to product positioning could lead you astray.

First, not every location in this space is feasible. Imagine we were applying the same process for cameras, with axes of image quality and size. There would be a big open area (a so-called “blue ocean”) in the region of very high image quality and tiny size. Yet, for a given imaging technology, the optics of photography impose a fundamental trade-off between size and quality. This suggests that product strategy and product positioning in technology-intensive industries are cross-functional challenges, and that engineering breakthroughs are what allow for differentiation. For instance, computational photography, the processing of several captured images into one excellent composite, underlies much of the power of photography on today’s mobile devices and loosens the connection between camera size and image quality. In the electric bike market, advances in battery chemistry, motor efficiency, aerodynamics, and tire performance may allow for competitive positioning that beats the basic trade-offs reflected by existing competitors and solutions.

My second concern is probably more substantial. If you find yourself drawing two-dimensional maps of your product landscape and debating the fine points of position, or building elaborate mathematical models to estimate market share in a crowded market in which a few attributes dominate consumer preference, you are probably not in a dynamic industry with abundant entrepreneurial opportunities. Rather, you are in a stagnant industry in which positioning is tuned by product marketing managers, often based on mathematical models and consumer data, in pursuit of a few additional points of market share. If this is your situation, my advice is to find a way to make the industry less stable, to shake it up, and to introduce some new dimensions of competition.

In fairness to the authors of Blue Ocean Strategy, shaking up the industry is really the essence of their message. Avoid head-to-head competition tuning product parameters within a highly evolved product landscape. Instead, look for a way to introduce new attributes to the competitive landscape. For example, in the chocolate bar space, consider the FlavaNaturals bar, made with cocoa that is super concentrated in flavonoids, which have been shown clinically to improve memory. Or consider the KIND bar, which cleverly blurs the boundary between candy and health food: it tempts the consumer with chocolatey flavor while presenting an image of wholesome goodness through the obvious use of nuts and seeds. Both competitors have shaken up the more traditional dimensions of competition in the candy bar market.

Develop an Advantage Thesis

I’ve written a lot about competitive advantage elsewhere (see Alpha Assets and the Five Flywheels). In sum, advantage always arises from controlling or possessing some resource that significantly enhances your performance in doing a job and that your rivals can’t easily get. I call those resources your alpha assets.

A unique solution is usually the start-up’s initial alpha asset. In a few rare instances, the solution will remain hard to imitate for a long time. For instance, in the pharmaceutical industry a new molecular entity can be patented, and what is patented is what eventually receives government approval. Thus, rivals cannot offer the approved compound without infringing the patent. Given the typical time requirements for commercialization, such patent protection may offer 10 or even 15 years of exclusivity. But outside of the biopharmaceutical industry, patents rarely provide strong barriers to imitation for very long (Ulrich, Eppinger, and Yang 2019). Your unique solution, combined with your speed and agility, probably gives you a few years of advantage, at which point you had best have developed other sources of advantage. The most likely are brand and the scale economies enabled by a large established customer base.

Why Can’t Google Do This?

One of the most common questions that entrepreneurs face from investors is “Why can’t Google (or Apple, Meta, Amazon, et al.) do this?” This question reflects the concern that Google, or any other large and powerful company, could enter your market and offer a similar or better solution than yours, using its vast resources, capabilities, and customer base. The “Google question” is common enough to consider specifically. The answer varies depending on your industry, market, and product category. For example, consider how the answer may differ for two start-ups, one pursuing online dating and one pursuing cloud-based video services. Although these examples are specific to the competitive threat from Google, they illustrate how an entrepreneur might think about competitive threats from any large, powerful incumbent.

Google could enter the online dating market and offer a similar or better solution than a start-up, but it is unlikely that they will do so for several reasons. First, online dating is not aligned with Google’s mission, which is to organize the world’s information and make it universally accessible and useful. Second, the online dating market is fraught with privacy concerns. Google may face legal and ethical issues if it enters the online dating market and uses customer data for matching purposes. Third, online dating is a highly competitive and dynamic industry. Google may not exhibit sufficient agility to keep up with changing customer preferences and needs, as well as the emerging technologies and features in the online dating space. Putting these reasons together, one could argue that Google is not a serious potential competitor in the online dating market. In sum, Google could do it, but Google won’t do it.

Google could also enter the market for cloud-based video services and offer a similar or better solution than the start-up, and it might credibly do so for several reasons. First, cloud services are Google’s core business and competency. Google already offers a range of cloud products, such as Google Cloud Platform, Google Workspace, and Google Cloud Storage, and it has the incentive to enter a niche or specialized segment of the market in order to stimulate demand for its core services. Second, cloud services is a technologically complex industry, and Google has the resources and capabilities to offer a high-quality, reliable solution that meets the needs and expectations of customers. Third, cloud services is a large and growing industry. Google not only could enter, but likely will, and it has the opportunity to capture a significant share of customers and revenue. If you are in the direct path of a company like Google in its core business, then you will likely need to make an argument about the importance of speed and agility, and about some important alpha asset, such as network effects, that can be developed in the two or three years it will take Google to recognize and respond to the opportunity. You may of course also argue that Google would more likely acquire your start-up than build its business from scratch. Such arguments are weak, in my opinion, unless you can make a credible case for why your start-up will have significant alpha assets within a few years. And in that case, whether or not Google acquires the company, you will have built something of substantial value.

Wrap-Up and Common Pitfalls

Your business plan or “pitch deck,” whether for investors or just for your own planning, should have a section on competition. Everyone expects that, and for good reason. You’ll usually have a table showing how your solution stacks up against the rival solutions on a handful of key customer needs. You’ll likely show your product position relative to direct competitors on a two-dimensional plot. You’ll devote some space to an articulation of your planned sources of long-term competitive advantage.

Do those things and at the same time avoid these rookie mistakes:

  • Do not claim that you have no competitors or that you are better than all of them. Every job to be done has been around in some form for a very long time in society. Your potential customers were getting that job done somehow before you had your bright idea. The pre-existing solutions are competitors.
  • Do not be dismissive of competitors. If there is an existing company doing the job you are setting out to do, then that company is more accomplished than you are at the time of your analysis. Show some respect and learn from that company’s experience.
  • Do not argue that you are the first mover, and that this is a source of competitive advantage. There are rarely first-mover advantages, except sometimes when the market exhibits very, very strong network effects. Consider that Google was not even one of the first ten companies to enter the internet search business.
  • Do not cite patents or “patent protection” as a significant source of competitive advantage. Unless you are a bio-pharmaceutical company, patents are at best a low picket fence around your solution. They are not typically a significant barrier to entry.

Notes

Ulrich, Karl T., Steven D. Eppinger, and Maria C. Yang. 2019. Product Design and Development. Chapter “Patents and Intellectual Property.” New York: McGraw-Hill.

Ulrich, Karl T. 2018. Alpha Assets and the Five Flywheels. Working paper, The Wharton School.

Kim, W. C., and R. A. Mauborgne. 2005. Blue Ocean Strategy: How to Create Uncontested Market Space and Make the Competition Irrelevant. Boston: Harvard Business School Press.