INFERA Protocol · Live on Base v1.0 · April 2026

Inference decentralized.
Idle GPUs in
Trustworthy AI tokens out.

INFERA is the verifiable, permissionless inference layer for AI. Run any open-source model on a global network of GPUs — with cryptographic proof the right model ran — and pay per token in stablecoins. Up to 70% cheaper than hyperscalers, with the same OpenAI-compatible API your code already speaks.

GPUs Online
0
+ 47 this week
Tokens / second
0
across 12 models
Avg cost vs AWS
−68%
per million tokens
P50 TTFT (70B)
0
comparable to Bedrock
/01 The Thesis

Three trillion tokens a day, three companies.

The world is about to spend more on AI inference than on entertainment, software licensing, and search advertising combined. Today, almost all of it flows through a handful of cloud providers. Their pricing reflects that.

Meanwhile, there are millions of capable GPUs sitting idle — in gaming rigs, ex-mining farms, university labs, and the back rooms of crypto natives who definitely did not buy these RTX 4090s for just gaming.

INFERA is the protocol that connects the two sides. Open-source models run on permissionless hardware, with hardware-attested or cryptographically-proven correctness, settled per token in USDC on Base. Developers pay a fraction of hyperscaler rates. GPU operators monetize idle silicon. The protocol takes a small cut and burns it.

We are not the first to attempt this. We have built the one that's actually fast enough, cheap enough, and honest enough to use in production.

/02 Live Network

The protocol, in real time.

Every job below is a real inference call settled in the last 60 seconds. Reload to see the next batch — and yes, it really is this active.

Network status · operational
All Models
LLMs
Image
Embed
Provider distribution · 28 countries
N. America312
Europe248
Asia156
Other26
Tokens served · 24h
0
+ 18.2% vs prev day
USDC volume · 24h
0
protocol fee: $73
Recent jobs streaming
    /03 The Protocol

    Six steps from prompt to payment.

    /01 · Request
    Your app calls a familiar endpoint.
    Point your existing OpenAI client at api.infera.network/v1. Same JSON shape, same streaming, same SDKs. Authentication is an API key bound to your on-chain account.
    /02 · Match
    A Router picks the best GPU in milliseconds.
    Our staked Routers maintain a real-time index of every Provider's model loadout, latency, and reputation. They sign a job ticket and dispatch within 10–40ms.
    /03 · Execute
    The model runs inside a TEE.
    The Provider executes inference inside a hardware-enforced enclave (NVIDIA Confidential Compute, AMD SEV-SNP, Intel TDX). The chip itself signs proof that the right code ran on the right weights.
    /04 · Stream
    Tokens flow back as they're generated.
    Standard SSE streaming. P50 time-to-first-token of ~350ms for a 70B-class model in the hot pool. Comparable to Bedrock, at a fraction of the price.
    /05 · Settle
    Payment lands on-chain in 60 seconds.
    Up to 10,000 jobs per Merkle batch. Provider gets paid in USDC. Router gets 1.5%. Protocol takes 1.5% — a third of which is used to buy and burn $INFR every week.
    /06 · Verify
    Validators sample. Cheaters get slashed.
    1–5% of jobs are randomly re-executed by independent Validators. A confirmed bad output costs the Provider their stake. Reputation compounds; bad actors don't get a second night of sleep.
    /04 Verification

    Three tiers of trust.

    Cost, latency, assurance — pick two corners of the triangle. Per call, per use case. The default is good enough for almost everyone; the premium tier is good enough for the regulator.

    Tier 01 · Default

    TEE attestation

    Hardware-Signed Proofs
    The chip itself signs an attestation that the right model code ran on the right weights, inside a hardware-enforced enclave. Negligible runtime overhead. Trust assumption: the chip vendor's signing keys aren't compromised.
    Overhead
    < 5%
    Premium
    Baseline
    Best for
    Most apps
    Latency
    Native
    Tier 03 · Premium

    zkML proofs

    Cryptographic Certainty
    A SNARK proves that running model M on input x produced output y. Verified on-chain in milliseconds. Trust assumption: math. Currently practical for <13B models in real-time, <70B in batch — both improving 10× per year.
    Overhead
    Significant
    Premium
    +5−50×
    Best for
    Regulated
    Trust
    Cryptographic
    /05 For Developers

    Five minutes to first call.

    The SDK is intentionally a drop-in for OpenAI's. Change the base URL, change the API key, ship. Yes, it really is just that.

    python
    typescript
    curl
    
        
    $0.18
    per million tokens
    Llama 3.1 70B · vs $2.65 on Bedrock
    99.6%
    monthly uptime
    SLA-backed in Tier 01 · since launch
    12
    supported models
    Llama, Mistral, Qwen, DeepSeek, Stable Diffusion, Whisper, BGE
    /06 For Providers

    Your idle GPU, at work.

    Estimate what your hardware can earn. Numbers are based on current network averages — actual earnings depend on bid price, model loadout, and uptime.

    Hardware
    RTX 4090
    consumer
    RTX 6000
    prosumer
    H100
    datacenter
    B200
    flagship
    Hours per day online 20h
    Average utilization 75%
    Verification tier Tier 01 · TEE
    Estimated monthly earnings
    $117
    USDC · before electricity
    Daily average$3.90
    Per hour active$0.20
    $INFR stake required~$240
    Break-even on stake62 days
    /07 Tokenomics

    $INFR. Boring, on purpose.

    One billion tokens, fixed. Real utility — staking, slashing collateral, governance, and a fee-funded buyback-and-burn. Developers don't need to hold it to use the protocol. Adoption is the only thing that matters.

    1B
    Total Supply
    Community + ecosystemAirdrops, grants, liquidity, ecosystem incentives
    35%
    Provider rewardsEmissions for GPU operators and validators
    25%
    Core team4-year vest, 1-year cliff
    18%
    Foundation treasuryOperations, audits, future protocol work
    12%
    Strategic investors3-year vest, 6-month cliff
    10%
    /08 Roadmap

    What we shipped, what's next.

    Q2 2026 · Phase 0 ● Shipped

    Devnet on Base Sepolia

    • Smart contracts deployed and audited
    • Provider client and Router reference impl
    • Closed alpha · 50 devs, 20 providers
    • OpenAI API working end-to-end with TEE
    Q4 2026 · Phase 1 ○ Next

    Mainnet & $INFR launch

    • Audited contracts on Base mainnet
    • Public Provider onboarding (permissionless)
    • $INFR TGE concurrent with mainnet
    • Tier 01 (TEE) and Tier 02 (Optimistic) live
    2027 · Phase 2

    zkML and scale

    • Tier 03 zkML for <13B real-time, <70B batch
    • Multi-chain: Arbitrum, Optimism, Polygon
    • Native fiat on-ramps for non-crypto devs
    • Enterprise SLAs and dedicated capacity
    2028+ · Phase 3

    Full verifiability

    • zkML real-time for 70B+ models
    • Cross-chain settlement via canonical bridges
    • Decentralized Routers via consensus
    • Encrypted-weight private model hosting

    Build on the
    honest network.

    First $50 of inference is on us. Provider onboarding takes ten minutes. The whitepaper is thirty pages and worth it.

    Get an API key Read the whitepaper
    /09 Whitepaper

    The whitepaper. Thirty pages, no theatrics.

    Architecture, verification, smart contracts, tokenomics, governance, risks. Written by the people building it — no buzzword padding, just the actual mechanics of how the network runs.

    The full INFERA Protocol whitepaper covers the verification triangle (TEE, optimistic, zkML), the on-chain contract surface, scheduling and latency targets, the $INFR token mechanics, governance design, and an unflinching risks section.

    Read it in your browser, or download the original document.

    v1.0 April 2026 30 pages 14 sections · 2 appendices