Compress • Extract • Manage context — all free

Less token waste. Better LLM results.
Free, instant, no API calls.

mdmin gives you two tools for cheaper, sharper LLM context: compress strips verbose phrases and formatting waste (13–35% savings), and extract returns only the chunks of a large document relevant to your query (70–95% reduction). Both free, both instant.

13–35% compression savings
70–95% extract reduction
Lossless: no facts dropped

Live Demo

Paste your markdown and watch it compress in real time. Runs entirely in your browser.

Deep Compress: LLM rewriting on top of rule compression. Best on verbose docs. Available on Pro.
v0.3.0 — new

Context Extractor

Have a 10,000-token document but only need 800 tokens of it? Give mdmin a query and it returns only the relevant chunks in milliseconds, with no LLM, no vector database, and no setup. Ranking is TF-IDF based. Free for all users.

Paste any document

CLAUDE.md, API reference, architecture doc, meeting notes — any large markdown document.

Ask a question

"How does auth work?" — the extractor scores every chunk against your query using TF-IDF cosine similarity.

Get only what matters

Top-scoring chunks returned in document order within your token budget. 70–95% reduction on targeted queries.
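The three steps above can be sketched in a few dozen lines. This is a minimal illustration, not mdmin's actual implementation: the function names (`extractChunks`, `roughTokens`), the blank-line chunking, the 4-characters-per-token estimate, the smoothed IDF formula, and the greedy budget fill are all assumptions made for the example.

```javascript
// Hypothetical sketch of TF-IDF chunk extraction; not mdmin's real code.

// Lowercase alphanumeric word tokens (a crude stand-in for a real tokenizer).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || []
}

// Rough token estimate: about 4 characters per token (an assumption).
function roughTokens(text) {
  return Math.ceil(text.length / 4)
}

// Sparse TF-IDF vector: term count multiplied by the term's IDF weight.
function tfidfVector(tokens, idf) {
  const vec = new Map()
  for (const t of tokens) vec.set(t, (vec.get(t) || 0) + 1)
  for (const [t, tf] of vec) vec.set(t, tf * (idf.get(t) || 0))
  return vec
}

// Cosine similarity between two sparse vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0
  for (const [t, w] of a) { na += w * w; if (b.has(t)) dot += w * b.get(t) }
  for (const w of b.values()) nb += w * w
  return na && nb ? dot / Math.sqrt(na * nb) : 0
}

function extractChunks(doc, query, tokenBudget) {
  // Chunk on blank lines, so each paragraph is one chunk.
  const chunks = doc.split(/\n\s*\n/).map((s) => s.trim()).filter(Boolean)
  const tokenized = chunks.map(tokenize)

  // Smoothed inverse document frequency, computed over the chunks.
  const df = new Map()
  for (const toks of tokenized) {
    for (const t of new Set(toks)) df.set(t, (df.get(t) || 0) + 1)
  }
  const idf = new Map()
  for (const [t, n] of df) idf.set(t, Math.log((1 + chunks.length) / (1 + n)) + 1)

  // Score every chunk against the query and rank by similarity.
  const q = tfidfVector(tokenize(query), idf)
  const ranked = tokenized
    .map((toks, index) => ({ index, score: cosine(tfidfVector(toks, idf), q) }))
    .filter((c) => c.score > 0)
    .sort((x, y) => y.score - x.score)

  // Greedily keep the best chunks that fit, then restore document order.
  const kept = []
  let used = 0
  for (const c of ranked) {
    const cost = roughTokens(chunks[c.index])
    if (used + cost > tokenBudget) continue
    kept.push(c.index)
    used += cost
  }
  return kept.sort((x, y) => x - y).map((i) => chunks[i])
}
```

Calling `extractChunks(doc, 'How does auth work?', 800)` would return only the paragraphs that share weighted terms with the query, in their original order.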

CLAUDE.md / .cursorrules: Inject only the rules relevant to the current task. 90% reduction typical.
API documentation: Query a 500-page API reference for one endpoint. No scrolling, no noise.
Lightweight RAG: No vector DB, no infrastructure, no embeddings API. Paste + query.

Free for all users · 100KB doc limit · Sign in to unlock API access

What it compresses

Six rule categories, all running deterministically with zero API cost.

Verbose Pattern Removal

150+ patterns: "In order to" → "To", "Due to the fact that" → "Because". Systematic telegraphic rewriting.
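A pattern table like this can be modeled as a list of regex and replacement pairs. The sketch below is illustrative only: the first two rules come from the text above, the third is an assumed extra, and `VERBOSE_RULES` and `stripVerbose` are hypothetical names rather than mdmin exports.

```javascript
// Illustrative rule table; not mdmin's real pattern list.
const VERBOSE_RULES = [
  [/\bin order to\b/gi, 'to'],
  [/\bdue to the fact that\b/gi, 'because'],
  [/\bit is important to note that\s+/gi, ''], // assumed extra rule
]

function stripVerbose(text) {
  let out = text
  for (const [pattern, replacement] of VERBOSE_RULES) {
    out = out.replace(pattern, (match) => {
      // If the matched phrase started with a capital, capitalize the replacement.
      if (replacement && match[0] === match[0].toUpperCase()) {
        return replacement[0].toUpperCase() + replacement.slice(1)
      }
      return replacement
    })
  }
  return out
}

console.log(stripVerbose('In order to deploy, run the script.'))
// prints "To deploy, run the script."
```

Because every rule is a plain regex substitution, the output is deterministic and needs no API call.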

Table Compression

Markdown tables → compact CSV or key:value format. 40–60% token reduction on tabular data.
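As a rough picture of the table-to-CSV direction, the sketch below drops the separator row and the pipe padding. `tableToCsv` is a hypothetical helper written for this example; it assumes a simple pipe table and does not handle escaped pipes inside cells.

```javascript
// Hypothetical markdown-table-to-CSV converter; a sketch, not mdmin's code.
function tableToCsv(markdownTable) {
  return markdownTable
    .trim()
    .split('\n')
    // Drop the header separator row, e.g. |------|:-----:|
    .filter((line) => !(/^[\s|:-]+$/.test(line) && line.includes('-')))
    .map((line) =>
      line
        .trim()
        .replace(/^\||\|$/g, '') // strip the outer pipes
        .split('|')
        .map((cell) => cell.trim())
        // Quote cells that contain a comma or a double quote.
        .map((cell) => (/[",]/.test(cell) ? `"${cell.replace(/"/g, '""')}"` : cell))
        .join(',')
    )
    .join('\n')
}

const table = [
  '| Name | Default |',
  '|------|---------|',
  '| port | 8080 |',
  '| host | localhost |',
].join('\n')

console.log(tableToCsv(table))
// prints:
// Name,Default
// port,8080
// host,localhost
```

The savings come from removing the separator row and the per-cell padding, which carry no information for an LLM.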

Whitespace Cleanup

Blank lines, trailing spaces, decorative horizontal rules, HTML comments — all stripped.
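A whitespace pass of this kind is mostly a handful of regex substitutions. The sketch below is an assumption about how such a pass could look, not mdmin's implementation; in particular it collapses runs of blank lines to a single blank line so paragraph breaks survive, and it does not special-case YAML front matter or setext headings.

```javascript
// Hypothetical whitespace cleanup pass; a sketch under the assumptions above.
function cleanWhitespace(markdown) {
  return markdown
    .replace(/<!--[\s\S]*?-->/g, '')                         // HTML comments
    .replace(/^[ \t]*([-*_])(?:[ \t]*\1){2,}[ \t]*$/gm, '')  // rules: ---, ***, ___
    .replace(/[ \t]+$/gm, '')                                // trailing spaces
    .replace(/\n{3,}/g, '\n\n')                              // runs of blank lines
    .trim()
}

console.log(JSON.stringify(cleanWhitespace('Title  \n\n\n\n---\n<!-- note -->\nBody text \n')))
// prints "Title\n\nBody text"
```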

Code Block Safe

Code blocks and inline code are protected before compression and restored exactly after.
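One common way to get this guarantee is to swap code for placeholders, run the rules, and swap the code back. The sketch below shows that technique with a hypothetical `withProtectedCode` helper; the NUL-delimited placeholder format is an assumption chosen because it is unlikely to appear in real markdown.

```javascript
// Hypothetical protect-and-restore wrapper; not mdmin's real code.
const FENCE = '`'.repeat(3)
// Matches fenced blocks first, then single-line inline code spans.
const CODE_PATTERN = new RegExp(FENCE + '[\\s\\S]*?' + FENCE + '|`[^`\\n]+`', 'g')

function withProtectedCode(markdown, transform) {
  const saved = []
  // Replace each code span with an indexed placeholder.
  const masked = markdown.replace(CODE_PATTERN, (code) => {
    saved.push(code)
    return `\u0000${saved.length - 1}\u0000`
  })
  // Run the compression rules on prose only.
  const transformed = transform(masked)
  // Put the original code back, byte for byte.
  return transformed.replace(/\u0000(\d+)\u0000/g, (_, i) => saved[Number(i)])
}
```

Any rule function can be passed as `transform`; text inside backticks never reaches it, so it cannot be rewritten by accident.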

Dictionary Deduplication

Repeated phrases replaced with §1, §2 tokens. Prepends a compact dictionary.
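To make the idea concrete, here is a small sketch that finds repeated word sequences and replaces them with `§n` tokens. The `dedupe` function, the fixed phrase length, and the `§1=phrase` dictionary layout are assumptions for illustration; a production version would also check that each substitution saves more tokens than its dictionary entry costs.

```javascript
// Hypothetical dictionary deduplication; a sketch, not mdmin's algorithm.
function dedupe(text, { words = 3, minCount = 2 } = {}) {
  // Count every window of `words` consecutive words.
  const tokens = text.split(/\s+/).filter(Boolean)
  const counts = new Map()
  for (let i = 0; i + words <= tokens.length; i++) {
    const phrase = tokens.slice(i, i + words).join(' ')
    counts.set(phrase, (counts.get(phrase) || 0) + 1)
  }
  const candidates = [...counts]
    .filter(([, n]) => n >= minCount)
    .sort((a, b) => b[1] - a[1])

  let body = text
  const dictionary = []
  for (const [phrase] of candidates) {
    // Earlier replacements may have consumed overlapping occurrences.
    if (body.split(phrase).length - 1 < minCount) continue
    dictionary.push(phrase)
    body = body.split(phrase).join(`§${dictionary.length}`)
  }

  // Prepend the dictionary only when something was replaced.
  const header = dictionary.map((p, i) => `§${i + 1}=${p}`).join('\n')
  return header ? `${header}\n\n${body}` : body
}

console.log(dedupe('Call the payments service API first. Then retry the payments service API once.'))
// prints:
// §1=the payments service
//
// Call §1 API first. Then retry §1 API once.
```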

Link & Image Cleanup

Hover titles removed, redundant reference definitions stripped, verbose alt text shortened.
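Hover-title removal, for instance, can be done with a single substitution. The sketch below covers only that one rule, and `stripLinkTitles` is a hypothetical name; it assumes inline links and images whose URLs contain no spaces or parentheses.

```javascript
// Hypothetical hover-title stripper for inline links and images.
function stripLinkTitles(markdown) {
  // [text](url "title") or [text](url 'title') becomes [text](url)
  return markdown.replace(/(\]\(\s*[^\s)]+)\s+(?:"[^"]*"|'[^']*')\s*\)/g, '$1)')
}

console.log(stripLinkTitles('See the [docs](https://example.com "Go to docs").'))
// prints "See the [docs](https://example.com)."
```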

What compresses well?

Results depend entirely on how much redundancy is in your document. Here's what to expect for common content types.

Rule = free tier. Deep = Pro (deep compression on top of rules).

LLM-generated text (high gains)
Rule: 8–9% · Deep*: up to 58%

Filler-heavy prose compresses by 50% or more. Fact-dense LLM output compresses less.

Before

"It is important to note that all configuration options are described in this section. Needless to say, each setting should be reviewed carefully before deployment. As mentioned earlier, these options control the behaviour of the service."

After

"Config options below control service behaviour. Review before deployment."

README & long intros (medium gains)
Rule: 7–12% · Deep*: 12–24%

Prose intros compress well. Code blocks and config tables are preserved verbatim.

Before

"This repository provides a comprehensive and battle-tested implementation of background job processing. As you may already know, reliable job processing is a critical concern in modern web applications."

After

"Battle-tested background job processing library. Reliable job processing critical for web apps."

API & Technical Docs (medium gains)
Rule: 7–9% · Deep*: 8–18%

Prose explanations compress. Config-heavy or YAML-heavy docs stay near rule-layer gains.

Before

"The service is responsible for routing all incoming requests. In order to ensure high availability, we have implemented a load balancing strategy. Due to the fact that traffic is unpredictable..."

After

"Service routes all requests. Load balancing for high availability. Traffic unpredictable..."

Meeting notes & reports (low gains)
Rule: 7–9% · Deep*: 7–9%

Dense with hard facts (dates, $, %, names). The LLM pass correctly falls back to the rule-layer output to preserve all data.

Before

"Date: March 2, 2026. Attendees: Sarah Chen, James Okafor. Budget approved: $142,800. Decision: launch API v2 by April 30. Action: James to submit reserved instance request by March 9."

After

≈ rule layer only — meeting docs are fact-dense (dates, amounts, names, decisions)

Concise notes & scripts (low gains)
Rule: 2–8% · Deep*: 2–8%

Already telegraphic. Nothing to strip.

Before

"Video 1 — highest shareability 7 signs you're in perimenopause. Brain fog, rage, itchy skin, heart palpitations."

After

≈ unchanged — content is already compact

Code-heavy files (low gains)
Rule: 5–12% · Deep*: 5–12%

Code blocks preserved verbatim. Gains come only from surrounding prose.

Before

function authenticate(token) { return jwt.verify(token, secret) } // Code blocks are preserved verbatim.

After

Code blocks untouched. Only prose comments & docs around them are compressed.

Install & integrate

CLI, npm package, or MCP server for AI assistants.

npm package
npm install -g mdmin

# Compress a file
mdmin compress README.md

# Save to file
mdmin compress README.md -o README.min.md

# Batch compress directory
mdmin compress ./docs/ --level aggressive

# Compare levels
mdmin stats README.md

programmatic
const { compress, estimateTokens } = require('mdmin')

const { output, stats } = compress(markdownText, {
  level: 'medium', // light | medium | aggressive
})

console.log(stats)
// { inputTokens: 2273, outputTokens: 1765, saved: 508, pct: 22.3 }

context budget
const { ContextBudget } = require('mdmin')

const budget = new ContextBudget({
  limit: 128_000,   // your model's context window
  reserve: 8_000,   // headroom for LLM output
  keepLastN: 10,    // recent turns always verbatim
})

budget.setSystem('You are a helpful assistant.')
budget.pin('user_id=u_123')          // never dropped
budget.addContext(ragDocument)       // compressed on ingestion

// in your message loop:
await budget.addMessage({ role: 'user', content: userInput })
const { messages } = budget.get()   // ready for any LLM API

// passes directly to OpenAI / Anthropic / any provider:
// await openai.chat.completions.create({ model, messages })

MCP server
// Claude Desktop / Cursor config
{
  "mcpServers": {
    "mdmin": {
      "command": "npx",
      "args": ["mdmin-mcp"]
    }
  }
}

// Claude Code CLI
claude mcp add mdmin npx mdmin-mcp
v0.2.0 — context window manager

Context Window Management

ContextBudget automatically manages a sliding message window for chat apps, agents, and RAG pipelines. It compresses older turns before dropping them, pins critical facts that must never disappear, and protects your most recent messages — so you never write brittle overflow logic again.

Pins — never dropped

Critical facts (user ID, task state, session data) pinned with .pin() survive every trim, no matter how full the window gets.

Recent turns — always verbatim

The last N messages are never compressed or dropped. Your most recent context is always intact, exactly as sent.

Old turns — compressed first

Messages outside the protected window are rule-compressed (free, instant), then dropped oldest-first only if still over budget.
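The trimming order described above (pins and recent turns protected, older turns compressed, then dropped oldest-first) can be sketched as a single function. This is an illustration of the policy, not the ContextBudget source: `trimToBudget`, its argument shapes, and the choice to compress all older turns at once are assumptions, and `compress` and `countTokens` are supplied by the caller.

```javascript
// Hypothetical trimming policy; names and behavior are assumptions, not mdmin internals.
function trimToBudget(state, options, compress, countTokens) {
  const { system, pins, messages } = state
  const { limit, reserve, keepLastN } = options
  const budget = limit - reserve

  // The system prompt and pins are always kept, so they are a fixed cost.
  const fixed = [system, ...pins].reduce((n, text) => n + countTokens(text), 0)

  // Split history into trimmable older turns and the protected recent window.
  const split = Math.max(0, messages.length - keepLastN)
  let older = messages.slice(0, split)
  const recent = messages.slice(split)

  const used = () =>
    fixed + [...older, ...recent].reduce((n, m) => n + countTokens(m.content), 0)

  // Step 1: compress older turns only when the window is over budget.
  let didCompress = false
  if (used() > budget) {
    older = older.map((m) => ({ ...m, content: compress(m.content) }))
    didCompress = true
  }

  // Step 2: drop oldest-first only if still over budget.
  let dropped = 0
  while (older.length > 0 && used() > budget) {
    older.shift()
    dropped += 1
  }

  return {
    messages: [...older, ...recent],
    stats: {
      used: used(),
      remaining: budget - used(),
      dropped,
      compressed: didCompress ? older.length : 0,
    },
  }
}
```

Note that the protected window is never touched: if the pins and the last N messages alone exceed the budget, this sketch returns them anyway and reports a negative `remaining`.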

Support bots: Large KB docs + long conversation history without overflow.
Coding agents: 100+ tool call cycles. Trim old steps, keep task goal pinned.
RAG pipelines: Context docs compressed 13–20% on ingestion, never dropped.
ContextBudget — drop-in for any LLM provider
const { ContextBudget } = require('mdmin')
const OpenAI = require('openai')  // any provider SDK works here

const openai = new OpenAI()       // reads OPENAI_API_KEY from the environment
const budget = new ContextBudget({ limit: 128_000, reserve: 8_000, keepLastN: 10 })
budget.setSystem('You are a helpful assistant.')
budget.pin('user_id=u_123')       // survives every trim
budget.addContext(ragDocument)    // compressed on ingestion, never dropped

// await needs an async function in CommonJS
async function reply(userInput) {
  await budget.addMessage({ role: 'user', content: userInput })
  const { messages, stats } = budget.get()
  // stats: { used: 14820, remaining: 105180, dropped: 0, compressed: 3, ... }

  // → pass to any provider, no manual slicing needed
  return openai.chat.completions.create({ model: 'gpt-4o', messages })
}
Chrome Extension — new

Compress on any webpage

Select text, click the mdmin icon — compressed text is ready to copy. Works offline with no account. Connect your account to track token savings across every page you visit.

1. Select text

Highlight any text on any webpage: docs, GitHub, Notion, ChatGPT, anywhere.

2. Click the icon

Pick your compression level (light / medium / aggressive) and hit Compress.

3. Copy and paste

Compressed text is ready in the popup. One click to copy, no tab switching.

No account needed (Tier 0)

Compression runs entirely in the extension — zero network calls. Works on every tab immediately.

Free account (Tier 1)

Paste your mdmin_sk_ key in extension settings. Every compression is logged — track tokens saved and estimated API cost on your dashboard.

Pro account (Tier 2)

Deep Compress button active in the popup — LLM rewrite on top of rules, 50%+ savings on verbose text.

Pricing

Compression and extract are free, forever. Pro unlocks scale, deep compression, and dashboard.

Free
$0 forever

Rule-based compression everywhere. Web, CLI, MCP, and public API. Zero marginal cost.

  • Rule-based compression (13–35% savings)
  • 150+ verbose patterns, table compression, dedup
  • Extract — query any doc, 100KB / 2K token budget
  • CLI, npm package & Python (pip install mdmin)
  • MCP server for AI assistants
  • Public API with mdmin_sk_xxx key
  • VS Code extension
  • Browser extension (Chrome)
  • ContextBudget sliding window manager
  • Compression history (last 10)
  • Prompt Analyzer
Rule-based — always free
curl https://mdmin.dev/api/v1/compress \
  -H "Authorization: Bearer $MDMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"markdown":"...","level":"aggressive"}'
Pro (best value)
$8 / month

Deep compression, larger extract budgets, activity dashboard across all your tools. 500 Deep Compress runs/month.

  • Everything in Free
  • Deep Compress — LLM rewrite (500 runs/mo)
  • Extract — 2MB docs, 16K token budget, 60 req/min
  • Activity dashboard (VS Code, CLI, MCP, API, ext)
  • Compression history — unlimited
  • Higher rate limits across all APIs
  • Priority support

The 500 runs reset on your billing date. No overages.