Anvil o1.
Predicting federal contract winners.


Our goal is to know what federal contractors will do before they act—and turn that intelligence into advantage for American industry.

56.2% top-10 accuracy
16.1M contracts analyzed
~12,760x vs random baseline

$700 Billion

The Problem

The Stakes: Federal proposals can cost $10K-$1M+ to prepare, making bid/no-bid decisions critical for ROI.

The U.S. federal government is the largest buyer on Earth. Each year, it awards over $700 billion in contracts — more than the GDP of Sweden. For the thousands of companies competing for this business, the central question is deceptively simple: "If I bid on this contract, will I win?"

The Cost: Capture management, compliance documentation, and pricing strategies require significant upfront investment.

The stakes are high. Preparing a federal proposal costs real money — sometimes tens of thousands of dollars for a routine bid, sometimes millions for a major defense program. Companies pour resources into capture management, compliance documentation, and pricing strategies, often with no idea whether they're the frontrunner or a long shot.

The Solution: Machine learning trained on historical outcomes replaces intuition with data-driven predictions.

Historically, answering the "will I win?" question required intuition, relationships, and educated guesses. Incumbent contractors had an advantage because they knew their own win rates. Everyone else was flying blind.

Anvil o1 changes this. It replaces guesswork with prediction — using machine learning trained on millions of historical contract awards to forecast who will win future solicitations.

Two Databases

The Core Insight

Labeled Data: Linking solicitations to awards creates supervised learning data: the input is "what the government asked for" and the label is "who won."

Every federal contract has two distinct moments in time, recorded in two separate government databases:

  1. The solicitation — when the government posts a request for bids on SAM.gov, describing what it wants to buy
  2. The award — when the government picks a winner, recorded weeks or months later in FPDS (Federal Procurement Data System)

Here's the problem: these two systems don't talk to each other. There's no official field that links an award back to its original solicitation. The government publishes "we want to buy 500 connectors" in one place and "we bought 500 connectors from Acme Corp for $50,000" in another place, but never explicitly says "this award fulfilled that solicitation."

This disconnect creates an opportunity. If you can connect them — match each award to its original solicitation — you create something powerful: a labeled dataset where the input is "what the government asked for" and the label is "who won."

That's supervised learning. And that's exactly what Anvil o1 does.

16M+ Contracts

Why Start with the DLA?

We didn't pick the Defense Logistics Agency at random. DLA is the ideal starting point for building a contract prediction model:

Volume: More training data leads to better pattern recognition. 16M+ contracts provide statistical significance across thousands of product categories.

DLA is the Pentagon's supply chain manager. It purchases everything the military needs: fuel, food, clothing, medical supplies, and — most importantly — spare parts. Millions of spare parts. DLA's 16+ million contracts over the past decade give us a massive training corpus.

Standardization: Consistent data formats and procurement processes reduce noise in the training data and improve model accuracy.

DLA contracts follow predictable patterns. Most are for commodity items with National Stock Numbers (NSNs). The model can learn "when DLA needs Type X connectors, Vendor Y usually wins."

Repeatability: Repeated purchases of identical items create multiple training examples for the same scenario, reinforcing learned patterns.

The same items get purchased repeatedly. DLA might buy the same O-ring NSN dozens of times per year. This repetition is gold for machine learning.

Clear Outcomes: Binary win/lose outcomes are easier to model than complex multi-winner scenarios or subjective evaluations.

DLA contracts are typically fixed-price awards with a clear winner. The strategic logic: prove the approach works on DLA's high-volume procurements, then expand to other agencies.

The Data Pipeline

Step 1
Collect the awards

We ingest the complete DLA contract archive from FPDS — the government's official record of contract spending. This includes:

  • 16.1 million contract actions spanning multiple fiscal years
  • Vendor names and DUNS/UEI identifiers
  • Dollar amounts (base value, options exercised, modifications)
  • Product Service Codes (PSC) — a taxonomy of what's being purchased
  • NAICS codes — industry classifications
  • Contracting office identifiers
  • Award dates and performance periods
  • Set-aside designations (small business, veteran-owned, 8(a), etc.)
  vendor_name        amount     naics    award_date   psc
  Acme Defense LLC   $48,500    334419   2024-03-15   5935
  TechParts Inc      $127,000   332999   2024-03-12   3040
  MilSpec Supply     $8,200     334419   2024-03-10   5935
  ... 16.1M more rows

This data is public. Anyone can download it from USASpending.gov or query the FPDS API. The raw information isn't the competitive advantage — the linkage is.
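
As an illustration of this ingestion step, here is a minimal sketch that loads a downloaded FPDS extract with pandas. The filename and column names are assumptions for the example, not the archive's actual schema:

import pandas as pd

# Load a raw FPDS extract (file and column names are illustrative).
awards = pd.read_csv(
    "fpds_dla_awards.csv",
    usecols=["vendor_name", "uei", "amount", "psc", "naics",
             "contracting_office", "award_date", "set_aside"],
    parse_dates=["award_date"],
    dtype={"psc": str, "naics": str},
)

# Keep definitive awards with a positive obligated amount.
awards = awards[awards["amount"] > 0]
print(f"{len(awards):,} contract actions loaded")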

Step 2
Collect the solicitations

Separately, we pull solicitations from SAM.gov (formerly FedBizOpps). Each solicitation contains:

  • The description of what's being purchased
  • Quantity and unit of issue
  • Delivery location and timeline
  • Set-aside restrictions
  • Approved source lists (when applicable)
  • Attachments (specifications, drawings, SOWs)
  • Response deadline
SAM.GOV SOLICITATION (Combined)
  NSN: 5935-01-039-8902
  QUANTITY: 50 EA
  EST. VALUE: $50,000
  DESCRIPTION: CONNECTOR, RECEPTACLE — Electrical connector for avionics systems. MIL-SPEC qualified.
  RESPONSE DUE: 2024-01-15
  OFFICE: DLA AVIATION

Solicitation text varies in quality: some postings are detailed, others sparse. The model has to work with what it gets.

Step 3
Link them together

This is the hard part — and where Anvil's proprietary value lies.

Awards and solicitations don't share a common identifier. You can't just join on a key. Instead, you have to infer the connection through:

  • NSN matching: If the solicitation mentions NSN 5935010398902 and an award two months later references the same NSN, they're probably linked
  • Timing: Awards typically follow solicitations by 30-90 days
  • Dollar amount correlation: A solicitation for "estimated value $50K" that matches an award for $48,500
  • Contracting office: The same office that posted the solicitation issues the award
  • Textual similarity: The product descriptions should align
[Diagram: SOLICITATION (NSN 5935..., est. $50K) → 98.7% match confidence → AWARD (Acme Defense, $48,500)]

This is a probabilistic matching problem. Not every link is certain. We apply confidence thresholds and validate a sample manually to ensure quality.
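
To make the approach concrete, here is a minimal sketch of a pairwise scoring function over the signals above. The weights and the 0.8 threshold are illustrative assumptions, not Anvil's production matcher:

def match_score(sol: dict, award: dict) -> float:
    """Heuristic linkage score between a solicitation and an award."""
    score = 0.0
    # NSN match is the strongest single signal.
    if sol.get("nsn") and sol["nsn"] == award.get("nsn"):
        score += 0.5
    # Awards typically follow solicitations by 30-90 days.
    lag_days = (award["award_date"] - sol["response_due"]).days
    if 0 <= lag_days <= 90:
        score += 0.2
    # The same office posted the solicitation and issued the award.
    if sol["office"] == award["office"]:
        score += 0.15
    # Award amount within 20% of the solicitation's estimated value.
    if sol.get("est_value"):
        ratio = award["amount"] / sol["est_value"]
        if 0.8 <= ratio <= 1.2:
            score += 0.15
    return score

# candidate_pairs: pre-blocked (solicitation, award) pairs, e.g. same PSC and office.
# Keep only high-confidence links; the threshold is tuned by manual validation.
links = [(s, a) for s, a in candidate_pairs if match_score(s, a) >= 0.8]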

The result: ~98,000 high-confidence linked pairs where we know both what the government asked for and who won.

Why only 98K from 16M contracts? Not all awards have matching solicitations (some are modifications, options, or sole-source awards). Some solicitations are too vague to match confidently. We prioritize precision over recall — better to have 98K clean examples than 500K noisy ones.

What the Training Data Looks Like

Data Format

Each training example is a JSON record:

{ "text": "25--BOX,AMMUNITION STOW Proposed procurement for NSN 2541015263462 BOX,AMMUNITION STOW: Line 0001 Qty 154 UI EA Deliver To: By: 0069 DAYS ADO...", "psc": "25", "naics": "336390", "set_aside": "", "vendor": "OSHKOSH DEFENSE", "amount": 250000.0 }

  Field       Description
  text        The raw solicitation text — product description, quantities, delivery terms
  psc         Product Service Code — "25" means Vehicular Equipment Components
  naics       Industry classification — "336390" is Other Motor Vehicle Parts Manufacturing
  set_aside   Socioeconomic restriction, if any (small business, SDVOSB, 8(a), HUBZone)
  vendor      The company that won — this is our prediction target
  amount      Dollar value of the award

The vendor field is what we're trying to predict. Given a new solicitation, which vendor will win?

The Math
Understanding the Numbers

To understand what 56.2% top-10 accuracy means in practice, we need to establish a baseline.

The vendor universe

DLA has awarded contracts to approximately 227,000 unique vendors over the training period. These range from Lockheed Martin to one-person machine shops. Most vendors win only a handful of contracts; a small number win thousands.

Random baseline

If you picked 10 vendors at random from this pool, what's the probability that the actual winner is among them?

Random probability
P(winner in top 10 | random) = 10 / 227,000 ≈ 0.0044%

That's less than half a percent. Essentially zero.

What Anvil achieves

Anvil's top-10 accuracy is 56.2%. The winner is in the model's top 10 predictions more than half the time.

Lift calculation
Lift = 56.2% / 0.0044% ≈ 12,760x better than random
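
The arithmetic is easy to sanity-check; a back-of-envelope sketch in Python:

top10_accuracy = 0.562
vendor_pool = 227_000

random_baseline = 10 / vendor_pool           # ≈ 0.000044, i.e. 0.0044%
lift = top10_accuracy / random_baseline      # ≈ 12,760x

print(f"baseline = {random_baseline:.4%}, lift = {lift:,.0f}x")
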
  Top-1 accuracy: 18.3%
  Top-5 accuracy: 41.7%
  Top-10 accuracy: 56.2%
  Top-50 accuracy: 84.1%
Measuring progress

The right question isn't "is 56% good enough?" — it's "can we improve it?" We track performance against the random baseline and prior model versions to ensure each iteration moves the needle.

Where improvement is possible

We've identified two primary areas for future gains:

1. Better tail handling: Many contracts are won by vendors with limited history. Enhanced feature engineering for sparse data and transfer learning from related vendors can help.

2. Richer input signals: Adding past performance ratings, subcontracting patterns, and SAM.gov capability statements could give the model more to work with.

The current model is a baseline. Each of these improvements is a concrete step on the roadmap.

Feature Engineering

1. Product Category

The PSC (Product Service Code) is one of the strongest predictors. Federal procurement is highly segmented — vendors specialize. A company that makes O-rings doesn't bid on turbine blades.

The model learns: "When PSC = 59 (Electrical Components) and NAICS = 334417 (Electronic Connector Mfg), these 50 vendors account for 80% of wins."
2. Set-Aside Constraints

Set-aside status dramatically narrows the candidate pool. If a solicitation is marked "Total Small Business Set-Aside (FAR 19.5)," large contractors like Lockheed or Boeing are ineligible.

The model uses set-aside as a hard filter — certain vendors become impossible predictions.

3. Vendor History

For each vendor in the training data, we compute:

  • Total wins in this PSC category
  • Win rate (wins / total opportunities)
  • Recency — days since last win
  • Average deal size
  • Geographic concentration

Historical winners tend to be future winners. Incumbency is real.
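
A minimal sketch of these rollups, assuming a DataFrame of linked pairs with illustrative column names (vendor, psc, award_date, amount); the cutoff date keeps future awards out of the features to avoid leakage:

import pandas as pd

as_of = pd.Timestamp("2024-01-01")
history = linked[linked["award_date"] < as_of]  # no peeking past the cutoff

vendor_features = (
    history.groupby(["vendor", "psc"])
    .agg(
        wins=("award_date", "size"),
        last_win=("award_date", "max"),
        avg_deal_size=("amount", "mean"),
    )
    .reset_index()
)
vendor_features["days_since_last_win"] = (as_of - vendor_features["last_win"]).dt.days
# Win rate additionally requires counting the opportunities each vendor was
# eligible for, which comes from the candidate-generation step described later.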

4. Text Features

The solicitation text contains signal; a small extraction sketch follows this list:

  • NSN patterns: Approved sources for certain NSNs
  • Delivery location: Office-specific vendor preferences
  • Timeline: Rush vs standard delivery
  • Quantity: Manufacturers vs distributors
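
A minimal sketch of this extraction, with illustrative regex patterns (the real pipeline is more robust):

import re

def text_features(text: str) -> dict:
    """Pull structured signals out of raw solicitation text."""
    # NSNs are 13 digits, written with or without dashes: 5935-01-039-8902.
    nsn = re.search(r"\b(\d{4})-?(\d{2})-?(\d{3})-?(\d{4})\b", text)
    qty = re.search(r"Qty\s+(\d+)", text, re.IGNORECASE)
    # Delivery window in days ADO (after date of order); short windows suggest rush orders.
    ado = re.search(r"(\d{2,4})\s+DAYS\s+ADO", text, re.IGNORECASE)
    return {
        "nsn": "".join(nsn.groups()) if nsn else None,
        "quantity": int(qty.group(1)) if qty else None,
        "is_rush": bool(ado and int(ado.group(1)) <= 30),
    }

text_features("... NSN 2541015263462 ... Qty 154 UI EA ... 0069 DAYS ADO ...")
# -> {"nsn": "2541015263462", "quantity": 154, "is_rush": False}
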
5. Agency Behavior

Contracting offices develop relationships with vendors. An office that has awarded 50 contracts to Vendor X in the past two years is more likely to award the 51st.

The model learns office-vendor affinities.

Model Architecture

1. Gradient Boosting

Anvil o1 uses a gradient boosting ensemble — specifically, LightGBM optimized for ranking (LambdaRank objectives).

Why not deep learning? For tabular data with mixed categorical and numerical features, gradient boosting still tends to outperform neural networks. It offers the following (a minimal training sketch follows this list):

  • Interpretability: Inspect feature importances
  • Speed: Fast training and inference
  • Robustness: Less prone to overfitting
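
Here is a minimal training sketch using the LightGBM scikit-learn API; the feature matrix, labels, group sizes, and hyperparameters are illustrative assumptions about the pipeline above:

import lightgbm as lgb

# X_train: one row per (solicitation, candidate vendor) pair
# y_train: 1 if that vendor won the solicitation, else 0
# group_sizes: number of candidate rows per solicitation, in training order
ranker = lgb.LGBMRanker(
    objective="lambdarank",
    n_estimators=500,
    learning_rate=0.05,
    num_leaves=63,
)
ranker.fit(X_train, y_train, group=group_sizes)

# At inference, score one solicitation's candidates and keep the top 10.
scores = ranker.predict(X_candidates)
top10_idx = scores.argsort()[::-1][:10]
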
2. Learning to Rank

We frame this as learning to rank, not classification. For each solicitation, generate a ranked list of vendors. Evaluate success by whether the winner appears in the top K.

The model learns a scoring function: given (solicitation features, vendor features), output a relevance score.

3. Candidate Generation

We don't score all 227K vendors. First, filter to a candidate pool:

  • PSC category (vendors who've won here)
  • Set-aside eligibility
  • Active status (won in past 3 years)

This reduces candidates to 500-5,000 vendors, which the model ranks.
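
A minimal sketch of the filter, assuming a (vendor, PSC) history table with illustrative column names and set-aside codes:

import pandas as pd

def candidate_pool(sol: dict, vendor_psc: pd.DataFrame) -> pd.DataFrame:
    """Shrink ~227K vendors to a rankable candidate set for one solicitation."""
    # Vendors who have won in this PSC category.
    pool = vendor_psc[vendor_psc["psc"] == sol["psc"]]
    # Hard filter: set-asides make ineligible vendors impossible predictions.
    if sol["set_aside"] == "SMALL_BUSINESS":
        pool = pool[pool["is_small_business"]]
    # Active vendors only: at least one win in the past 3 years.
    cutoff = pd.Timestamp(sol["posted_date"]) - pd.DateOffset(years=3)
    return pool[pool["last_win"] >= cutoff]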

What We Discovered

1. Incumbency

The single strongest predictor is "has this vendor won this exact NSN before?" If yes, they're the favorite. Federal procurement is sticky — agencies prefer known quantities.

2. Set-Asides

About 23% of DLA contracts have small business set-asides. Performance by set-aside status:

  • Full and open competition: Top-10 accuracy 51%
  • Small business set-aside: Top-10 accuracy 63%

Set-asides make prediction easier by constraining the candidate pool.

3. PSC Variance

Certain categories are more predictable than others:

  PSC   Category                 Top-10
  59    Electrical Components    68%
  53    Hardware & Abrasives     61%
  16    Aircraft Components      54%
  84    Clothing & Textiles      49%
4. Price Ceiling

The model can't observe bid prices. DLA typically uses LPTA (lowest price technically acceptable) evaluation, so price often decides. This creates a theoretical ceiling on accuracy around 70-80%.

5. New Entrants

When a vendor wins their first contract in a PSC category, the model rarely predicts them. About 8% of awards go to first-time winners, which puts a hard ceiling on how accurate any history-based model can be.

Model Performance

Five ways to understand how Anvil o1 predicts federal contract winners.

Calibration Curve

Does the model know what it knows?

A well-calibrated model's confidence scores reflect true probabilities. When Anvil o1 predicts a vendor has a 30% chance of winning, they should win approximately 30% of the time.

The diagonal dashed line represents perfect calibration. Points close to this line indicate the model's confidence scores are reliable and actionable.

Key insight: Anvil o1 is slightly overconfident at high probabilities but well-calibrated overall, with a Brier score of 0.089.
[Chart: calibration curve, predicted probability vs. actual win rate]
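
Calibration is straightforward to measure on held-out data; a minimal sketch with scikit-learn, where y_true and y_prob are assumed arrays of held-out outcomes and model confidences:

from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

# y_true: 1 if the scored vendor actually won; y_prob: model confidence.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
for p, t in zip(prob_pred, prob_true):
    print(f"predicted {p:.0%} -> actual {t:.0%}")
print("Brier score:", brier_score_loss(y_true, y_prob))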

Cumulative Gains

How much signal is in the top predictions?

This chart answers: "If I only review the model's top X% of predictions, what percentage of actual winners will I capture?"

The steeper the curve rises above the diagonal baseline, the better the model concentrates winners at the top of its rankings.

Key insight: Reviewing just the top 20% of Anvil's predictions captures 58% of all contract winners.
[Chart: cumulative gains, % of predictions reviewed vs. % of winners captured]

Live Examples

See predictions vs. actual outcomes

These are real solicitations from our test set. For each, we show Anvil's top-5 predicted vendors with confidence scores, and mark the actual winner.

The green checkmark indicates who actually won the contract. Notice how often the winner appears in the top 5 predictions.

Key insight: In 56.2% of cases, the actual winner appears in Anvil's top-10 predictions.
Solicitation SPRWA1-24-Q-0127: Electrical Connector, Receptacle
  1. DCX-CHOL ENTERPRISES (29.9%)
  2. EMPIRE AVIONICS CORP. (6.8%)
  3. PAR DEFENSE INDUSTRIES (6.3%)
  4. R & M GOVERNMENT SERVICES (5.2%)
  5. JANELS INDUSTRIES INC (4.5%)
Outcome: Winner ranked #1 with 29.9% confidence

Lift Chart

How much better than random?

Lift measures how many times more likely you are to find a winner using model predictions versus random selection, at each decile of the ranked list.

A lift of 5x in the first decile means the top 10% of predictions contain 5 times more winners than you'd expect by chance.

Key insight: The first decile shows 12.8x lift—vendors in the top 10% of predictions win 12.8x more often than random.
[Chart: lift vs. random by decile, sorted by predicted probability]
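
Decile lift is simple to compute from a table of held-out predictions; a minimal sketch assuming a DataFrame with illustrative columns prob and won:

import pandas as pd

preds = preds.sort_values("prob", ascending=False).reset_index(drop=True)
preds["decile"] = pd.qcut(preds.index, 10, labels=range(1, 11))

base_rate = preds["won"].mean()
lift = preds.groupby("decile", observed=True)["won"].mean() / base_rate
print(lift)  # decile 1 should sit far above 1.0 (12.8x in our evaluation)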

Confidence Buckets

Higher confidence = higher win rate

We group predictions by confidence level and measure the actual win rate in each bucket. This shows that confidence scores are meaningful—high confidence predictions really do win more often.

This is actionable intelligence: focus resources on opportunities where Anvil shows high confidence.

Key insight: Predictions with >40% confidence have a 61% actual win rate, vs. 8% for predictions under 10%.
  Predicted Confidence   Actual Win Rate
  <10%                   8%
  10-20%                 19%
  20-30%                 34%
  30-40%                 47%
  >40%                   61%

Use Cases

Who Benefits
For Contractors: Bid/No-Bid Intelligence
A contractor sees a solicitation and asks: "What are my odds?" If Anvil ranks them #2 out of 500 candidates, that's a strong signal to bid. If they're ranked #200, maybe save the proposal dollars for a better opportunity.
For Contractors: Competitive Positioning
Beyond "should I bid?", the model reveals "who am I competing against?" Seeing that Vendor X is the frontrunner tells you something. Maybe sharpen your price, or consider teaming instead of competing.
For Investors: Pipeline Modeling
Defense contractors are publicly traded. Their stock prices depend on expected future revenues. An investor analyzing Raytheon might ask: "How likely are they to win the contracts currently in their pipeline?"
For M&A: Due Diligence
Acquiring a federal contractor? You want to know: "How defensible is their revenue? Are they winning because they're good, or because of relationships that might not transfer?"
Implications
Information asymmetry gets flattened

Incumbent contractors have always had an advantage — they know their win rates, their competitors, their agency relationships. New entrants are at an information disadvantage. Anvil changes this. A startup entering federal contracting can now see the competitive landscape with the same fidelity as a 20-year incumbent. This is democratizing.

Proposal economics shift

If contractors can better predict their odds, they'll bid more selectively. This means fewer low-quality bids (good for agencies), more competitive bids on winnable opportunities (bad for incumbents), and better resource allocation industry-wide.

Transparency creates pressure

If vendors can see that a particular contracting office awards 80% of its business to one vendor, that's interesting. Maybe justified, maybe worth scrutinizing. The model makes patterns visible that were previously buried in millions of transaction records.

The Road Ahead

Expanding Beyond DLA

The proof-of-concept worked on DLA. The next step is generalizing to other agencies:

  • GSA (General Services Administration): Government's general-purpose buyer. More diverse categories, different dynamics.
  • VA (Veterans Affairs): Large healthcare procurement. Medical supplies, equipment, services.
  • DoD components beyond DLA: Army, Navy, Air Force direct contracts. Larger dollar values, more complex evaluations.

Each agency has its own procurement culture. Models will likely need per-agency training, at least initially.

More Signal

Current features are primarily structured data. We're leaving signal on the table:

  • Full solicitation text: Using transformer models (BERT, etc.) to extract deeper textual understanding
  • Attachments: Many solicitations include PDFs with detailed specs. We don't currently parse these.
  • Vendor financials: SAM.gov registration data, revenue estimates, employee counts
  • Protest history: Has a vendor protested awards in this category? Are they litigious?
Real-Time & Price
Real-time prediction

Currently, the model is trained offline on historical data. The vision is real-time scoring: the moment a solicitation posts on SAM.gov, Anvil ranks vendors and pushes alerts. This requires live SAM.gov monitoring, low-latency inference, and push notification infrastructure. It's engineering, not science — the hard ML work is done.

Price modeling (the holy grail)

The biggest limitation is not knowing bid prices. If we could model price distributions — "Vendor X typically bids 15% above cost in this category" — we could predict winners even more accurately.

Price data isn't public, but some vendors might share their bid histories in exchange for insights. This creates a data network effect: the more vendors participate, the better the model gets for everyone.

Limitations

Honest Caveats
DLA-specific (for now)

The model is trained on DLA data. It reflects DLA's procurement patterns, DLA's vendor base, DLA's contracting offices. Predictions for non-DLA opportunities should be treated skeptically until we train agency-specific models.

Commodity-focused

DLA buys commodities — parts, supplies, consumables. The model won't work for major weapons systems (different evaluation, different dynamics), professional services (subjective evaluation criteria), or R&D contracts (unpredictable by nature).

We're good at predicting who wins the O-ring contract. We're not trying to predict who wins the next fighter jet program.

Backward-looking

ML models learn from history. If procurement patterns shift — new vendors enter, incumbents exit, policy changes — the model's accuracy degrades until retrained. We retrain quarterly to stay current.

Not a guarantee

56% top-10 accuracy means 44% of the time, the winner isn't in the top 10. The model provides probabilistic guidance, not certainty. Treat predictions as one input among many in bid/no-bid decisions, not as gospel.

Doesn't know pricing

We predict who's likely to win based on structural factors. We can't predict who will submit the lowest price. For LPTA competitions, price often decides — and that's outside our visibility.

Summary

The Approach

Anvil o1 solves a data integration problem that unlocks predictive capability. By mining 16 million DLA contract awards and linking roughly 98,000 of them to their original solicitations, we create supervised training data that teaches a model to answer: "Given what the government is asking for, who will win?"

The approach is straightforward:

  • Collect awards from FPDS
  • Collect solicitations from SAM.gov
  • Link them using probabilistic matching
  • Extract features from the linked pairs
  • Train a gradient boosting model to rank vendors
  • Evaluate on held-out data

Current performance: 56% top-10 accuracy. This is a starting point, not a ceiling. We're tracking concrete improvements — better tail handling, richer input signals, and expanded agency coverage.

We started with DLA because it offered volume, standardization, and repeatability. As we validate each improvement, we expand to additional agencies and use cases.

The path forward is clear:

  • Incorporate past performance ratings
  • Add subcontracting network features
  • Expand to GSA, Army, and other high-volume buyers
  • Reduce prediction latency for real-time use

Federal procurement data is richer than commonly assumed. We're building the tools to extract signal from it — and we're just getting started.