Less token waste. Better LLM results.
Free, instant, no API calls.
mdmin gives you two tools for cheaper, sharper LLM context: compress strips verbose phrases and formatting waste (13–35% savings), and extract returns only the chunks of a large document relevant to your query (70–95% reduction). Both free, both instant.
Live Demo
Paste your markdown and watch it compress in real time. Runs entirely in your browser.
Context Extractor
Have a 10,000-token document but only need 800 tokens of it? Give mdmin a query and it returns only the relevant chunks in milliseconds: no LLM required, no vector database, no setup. TF-IDF based. Free for all users.
Paste any document
CLAUDE.md, API reference, architecture doc, meeting notes — any large markdown document.
Ask a question
"How does auth work?" — the extractor scores every chunk against your query using TF-IDF cosine similarity.
Get only what matters
Top-scoring chunks returned in document order within your token budget. 70–95% reduction on targeted queries.
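The three steps above can be sketched in a few dozen lines. This is an illustrative sketch, not mdmin's source: the function names (`extractChunks`, `roughTokens`), the blank-line chunking, the smoothed IDF formula, and the 4-characters-per-token estimate are all assumptions made for the example.

```javascript
// Illustrative sketch only. All names and heuristics here are assumptions.

function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Rough token estimate (assumption: about 4 characters per token).
function roughTokens(text) {
  return Math.ceil(text.length / 4);
}

// Term-frequency vector weighted by inverse document frequency.
function tfidfVector(tokens, idf) {
  const tf = new Map();
  for (const t of tokens) tf.set(t, (tf.get(t) || 0) + 1);
  const vec = new Map();
  for (const [t, n] of tf) vec.set(t, n * (idf.get(t) || 0));
  return vec;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const [t, w] of a) {
    na += w * w;
    if (b.has(t)) dot += w * b.get(t);
  }
  for (const w of b.values()) nb += w * w;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

function extractChunks(doc, query, budget) {
  // Chunk on blank lines (assumption: one paragraph per chunk).
  const chunks = doc.split(/\n\s*\n/).map((s) => s.trim()).filter(Boolean);
  const tokenized = chunks.map(tokenize);

  // Document frequency per term, then smoothed IDF.
  const df = new Map();
  for (const toks of tokenized) {
    for (const t of new Set(toks)) df.set(t, (df.get(t) || 0) + 1);
  }
  const idf = new Map();
  for (const [t, n] of df) idf.set(t, Math.log((1 + chunks.length) / (1 + n)) + 1);

  // Score every chunk against the query and rank by similarity.
  const q = tfidfVector(tokenize(query), idf);
  const ranked = tokenized
    .map((toks, i) => ({ i, score: cosine(tfidfVector(toks, idf), q) }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score);

  // Greedily take top-scoring chunks that still fit the token budget.
  const picked = [];
  let used = 0;
  for (const { i } of ranked) {
    const cost = roughTokens(chunks[i]);
    if (used + cost > budget) continue;
    picked.push(i);
    used += cost;
  }
  // Return the selection in original document order.
  return picked.sort((a, b) => a - b).map((i) => chunks[i]);
}
```

Calling `extractChunks(doc, 'How does auth work?', 800)` returns only chunks that share terms with the query, so a query with no matching terms yields an empty result.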
Free for all users · 100KB doc limit · Sign in to unlock API access
What it compresses
Six rule categories, all running deterministically with zero API cost.
Verbose Pattern Removal
150+ patterns: "In order to" → "To", "Due to the fact that" → "Because". Systematic telegraphic rewriting.
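A minimal sketch of how pattern-based rewriting like this can work, using only the two patterns quoted above. The `PATTERNS` table and `stripVerbose` function are hypothetical names for illustration; mdmin's real pattern list and matching rules are not shown here.

```javascript
// Illustrative sketch only. PATTERNS and stripVerbose are hypothetical names.
const PATTERNS = [
  [/\bin order to\b/gi, 'to'],
  [/\bdue to the fact that\b/gi, 'because'],
];

function stripVerbose(text) {
  let out = text;
  for (const [re, replacement] of PATTERNS) {
    out = out.replace(re, (match) =>
      // Preserve sentence-initial capitalization of the matched phrase.
      match[0] === match[0].toUpperCase()
        ? replacement[0].toUpperCase() + replacement.slice(1)
        : replacement
    );
  }
  return out;
}
```

Because each rule is a fixed regular expression, the same input always produces the same output, which is what makes this layer deterministic.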
Table Compression
Markdown tables → compact CSV or key:value format. 40–60% token reduction on tabular data.
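To make the table-to-CSV idea concrete, here is a rough sketch. `tableToCsv` is a hypothetical helper, not part of mdmin's API, and it assumes a simple pipe table with no escaped pipes inside cells.

```javascript
// Illustrative sketch only. tableToCsv is a hypothetical helper.
function tableToCsv(table) {
  const rows = table
    .trim()
    .split('\n')
    .map((line) =>
      line
        .trim()
        .replace(/^\|/, '') // drop leading pipe
        .replace(/\|$/, '') // drop trailing pipe
        .split('|')
        .map((cell) => cell.trim())
    );

  // The header delimiter row (e.g. |---|:--:|) carries no data, so drop it.
  const isDelimiter = (cells) => cells.every((c) => /^:?-+:?$/.test(c));

  // Quote cells that contain commas or double quotes.
  const quote = (c) => (/[",]/.test(c) ? '"' + c.replace(/"/g, '""') + '"' : c);

  return rows
    .filter((r) => !isDelimiter(r))
    .map((r) => r.map(quote).join(','))
    .join('\n');
}
```

The savings come from dropping the padding spaces, the pipe characters, and the delimiter row, none of which carry information the model needs.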
Whitespace Cleanup
Blank lines, trailing spaces, decorative horizontal rules, HTML comments — all stripped.
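A sketch of what this cleanup pass might look like. `cleanWhitespace` is a hypothetical name, and the sketch makes two assumptions: it collapses runs of blank lines to a single one so paragraph breaks survive, and it expects code blocks to have been masked beforehand. It also does not distinguish a `---` rule from a setext heading underline or front matter.

```javascript
// Illustrative sketch only. cleanWhitespace is a hypothetical helper.
function cleanWhitespace(md) {
  return md
    .replace(/<!--[\s\S]*?-->/g, '') // HTML comments
    .split('\n')
    .map((line) => line.replace(/[ \t]+$/, '')) // trailing spaces and tabs
    .filter((line) => !/^\s*([-*_])(\s*\1){2,}\s*$/.test(line)) // ---, ***, ___
    .join('\n')
    .replace(/\n{3,}/g, '\n\n') // collapse runs of blank lines
    .trim();
}
```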
Code Block Safe
Code blocks and inline code are protected before compression and restored exactly after.
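One common way to get this guarantee is to swap code spans for placeholders, run the compression rules, then swap the originals back. The sketch below shows that technique under stated assumptions: `protectCode`, `restoreCode`, and `compressSafely` are hypothetical names, and the NUL-delimited placeholder format is a choice made for the example, not mdmin's documented behavior.

```javascript
// Illustrative sketch only. All function names here are hypothetical.
const FENCE = '`'.repeat(3); // three backticks, built this way to keep the sample readable

function protectCode(text) {
  const saved = [];
  const stash = (match) => {
    saved.push(match);
    return '\u0000' + (saved.length - 1) + '\u0000';
  };
  const fenced = new RegExp(FENCE + '[\\s\\S]*?' + FENCE, 'g');
  const masked = text
    .replace(fenced, stash) // fenced blocks first
    .replace(/`[^`\n]+`/g, stash); // then inline code
  return { masked, saved };
}

function restoreCode(masked, saved) {
  return masked.replace(/\u0000(\d+)\u0000/g, (_, i) => saved[Number(i)]);
}

// Run any text transform while leaving code byte-for-byte intact.
function compressSafely(text, transform) {
  const { masked, saved } = protectCode(text);
  return restoreCode(transform(masked), saved);
}
```

The transform only ever sees placeholders, so a rule that rewrites "in order to" cannot alter the same words inside a code sample.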
Dictionary Deduplication
Repeated phrases replaced with §1, §2 tokens. Prepends a compact dictionary.
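A rough sketch of phrase-level deduplication follows. Everything specific here is an assumption for illustration: the `dedupe` and `expand` names, the `§n=phrase` dictionary layout, the `---` separator, and the thresholds (phrases of 3 to 6 words that appear at least 3 times). The sketch also assumes the input does not already contain `§`.

```javascript
// Illustrative sketch only. dedupe/expand and the dictionary format are assumptions.
function dedupe(text, { minWords = 3, maxWords = 6, minCount = 3 } = {}) {
  const dict = [];
  let body = text;
  // Try longer phrases first so they win over their own sub-phrases.
  for (let n = maxWords; n >= minWords; n--) {
    const words = body.split(/\s+/).filter(Boolean);
    const counts = new Map();
    for (let i = 0; i + n <= words.length; i++) {
      const gram = words.slice(i, i + n).join(' ');
      if (gram.includes('§')) continue; // never nest dictionary tokens
      counts.set(gram, (counts.get(gram) || 0) + 1);
    }
    for (const [gram, count] of counts) {
      if (count < minCount) continue;
      // Re-check against the current body: earlier replacements may have consumed it.
      if (body.split(gram).length - 1 < minCount) continue;
      dict.push(gram);
      body = body.split(gram).join('§' + dict.length);
    }
  }
  if (dict.length === 0) return text;
  const header = dict.map((phrase, i) => '§' + (i + 1) + '=' + phrase).join('\n');
  return header + '\n---\n' + body;
}

// Inverse operation, to show the encoding is lossless.
function expand(compressed) {
  const sep = compressed.indexOf('\n---\n');
  if (sep === -1) return compressed;
  const entries = compressed.slice(0, sep).split('\n');
  let body = compressed.slice(sep + 5);
  // Highest index first so §1 never matches inside §10.
  for (let i = entries.length; i >= 1; i--) {
    const phrase = entries[i - 1].slice(('§' + i + '=').length);
    body = body.split('§' + i).join(phrase);
  }
  return body;
}
```

Deduplication only pays off when a phrase repeats often enough to outweigh the dictionary entry it adds, which is why the sketch requires a minimum count.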
Link & Image Cleanup
Hover titles removed, redundant reference definitions stripped, verbose alt text shortened.
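As a small illustration of the first of these rules, the sketch below strips hover titles from inline links and images. `stripLinkTitles` is a hypothetical helper, and the regular expression assumes URLs without spaces or nested parentheses.

```javascript
// Illustrative sketch only. stripLinkTitles is a hypothetical helper.
function stripLinkTitles(md) {
  // Matches [text](url "title") and ![alt](url 'title'), keeping everything but the title.
  return md.replace(/(!?\[[^\]]*\]\([^\s)]+)\s+(?:"[^"]*"|'[^']*')\)/g, '$1)');
}
```

Hover titles are invisible to a model reading raw markdown as anything other than extra tokens, so removing them costs nothing in meaning.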
What compresses well?
Results depend entirely on how much redundancy is in your document. Here's what to expect for common content types.
Rule = free tier. Deep = Pro (deep compression on top of rules).
Filler-heavy prose compresses 50%+. Fact-dense LLM output may compress less.
"It is important to note that all configuration options are described in this section. Needless to say, each setting should be reviewed carefully before deployment. As mentioned earlier, these options control the behaviour of the service."
"Config options below control service behaviour. Review before deployment."
Prose intros compress well. Code blocks and config tables are preserved verbatim.
"This repository provides a comprehensive and battle-tested implementation of background job processing. As you may already know, reliable job processing is a critical concern in modern web applications."
"Battle-tested background job processing library. Reliable job processing critical for web apps."
Prose explanations compress. Config-heavy or YAML-heavy docs stay near rule-layer gains.
"The service is responsible for routing all incoming requests. In order to ensure high availability, we have implemented a load balancing strategy. Due to the fact that traffic is unpredictable..."
"Service routes all requests. Load balancing for high availability. Traffic unpredictable..."
Dense with hard facts (dates, $, %, names). LLM correctly falls back to preserve all data.
"Date: March 2, 2026. Attendees: Sarah Chen, James Okafor. Budget approved: $142,800. Decision: launch API v2 by April 30. Action: James to submit reserved instance request by March 9."
≈ rule layer only — meeting docs are fact-dense (dates, amounts, names, decisions)
Already telegraphic. Nothing to strip.
"Video 1 — highest shareability 7 signs you're in perimenopause. Brain fog, rage, itchy skin, heart palpitations."
≈ unchanged — content is already compact
Code blocks preserved verbatim. Gains come only from surrounding prose.
function authenticate(token) { return jwt.verify(token, secret) } // Code blocks are preserved verbatim.
Code blocks untouched. Only prose comments & docs around them are compressed.
Install & integrate
CLI, npm package, or MCP server for AI assistants.
npm install -g mdmin
# Compress a file
mdmin compress README.md
# Save to file
mdmin compress README.md -o README.min.md
# Batch compress directory
mdmin compress ./docs/ --level aggressive
# Compare levels
mdmin stats README.md
const { compress, estimateTokens } = require('mdmin')
const { output, stats } = compress(markdownText, {
  level: 'medium', // light | medium | aggressive
})
console.log(stats)
// { inputTokens: 2273, outputTokens: 1765, saved: 508, pct: 22.3 }
const { ContextBudget } = require('mdmin')
const budget = new ContextBudget({
  limit: 128_000, // your model's context window
  reserve: 8_000, // headroom for LLM output
  keepLastN: 10, // recent turns always verbatim
})
budget.setSystem('You are a helpful assistant.')
budget.pin('user_id=u_123') // never dropped
budget.addContext(ragDocument) // compressed on ingestion
// in your message loop:
await budget.addMessage({ role: 'user', content: userInput })
const { messages } = budget.get() // ready for any LLM API
// passes directly to OpenAI / Anthropic / any provider:
// await openai.chat.completions.create({ model, messages })
// Claude Desktop / Cursor config
{
  "mcpServers": {
    "mdmin": {
      "command": "npx",
      "args": ["mdmin-mcp"]
    }
  }
}
// Claude Code CLI
claude mcp add mdmin npx mdmin-mcp
Context Window Management
ContextBudget automatically manages a sliding message window for chat apps, agents, and RAG pipelines. It compresses older turns before dropping them, pins critical facts that must never disappear, and protects your most recent messages — so you never write brittle overflow logic again.
Pins — never dropped
Critical facts (user ID, task state, session data) pinned with .pin() survive every trim, no matter how full the window gets.
Recent turns — always verbatim
The last N messages are never compressed or dropped. Your most recent context is always intact, exactly as sent.
Old turns — compressed first
Messages outside the protected window are rule-compressed (free, instant), then dropped oldest-first only if still over budget.
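The trimming order described above (pins and recent turns protected, older turns compressed, then dropped oldest-first) can be sketched as a standalone function. This is not ContextBudget's implementation: `fitWindow`, the `estimate` heuristic of about 4 characters per token, and the choice to compress only once the window is over budget are assumptions for illustration.

```javascript
// Illustrative sketch only. fitWindow and estimate are hypothetical.
const estimate = (text) => Math.ceil(text.length / 4); // assumption: ~4 chars per token

function fitWindow({ system, pins, messages }, { limit, reserve, keepLastN }, compress) {
  const budget = limit - reserve;

  // System prompt and pins are always counted and never dropped.
  const fixed = estimate(system) + pins.reduce((n, p) => n + estimate(p), 0);

  // The last N messages stay verbatim; everything before them is eligible for trimming.
  const split = Math.max(0, messages.length - keepLastN);
  const recent = messages.slice(split);
  let older = messages.slice(0, split);

  const total = (old) =>
    fixed + [...old, ...recent].reduce((n, m) => n + estimate(m.content), 0);

  let compressed = 0;
  let dropped = 0;

  // Step 1: compress older turns if the window is over budget.
  if (older.length > 0 && total(older) > budget) {
    older = older.map((m) => ({ ...m, content: compress(m.content) }));
    compressed = older.length;
  }

  // Step 2: still over budget, so drop older turns, oldest first.
  while (older.length > 0 && total(older) > budget) {
    older.shift();
    dropped++;
  }

  return {
    messages: [...older, ...recent],
    stats: { used: total(older), compressed, dropped },
  };
}
```

Compressing before dropping means an old turn loses wording before it loses its place in the conversation, so the model keeps more of the history for the same token cost.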
const { ContextBudget } = require('mdmin')
const budget = new ContextBudget({ limit: 128_000, reserve: 8_000, keepLastN: 10 })
budget.setSystem('You are a helpful assistant.')
budget.pin('user_id=u_123') // survives every trim
budget.addContext(ragDocument) // compressed on ingestion, never dropped
await budget.addMessage({ role: 'user', content: userInput })
const { messages, stats } = budget.get()
// → pass to any provider, no manual slicing needed
await openai.chat.completions.create({ model: 'gpt-4o', messages })
// stats: { used: 14820, remaining: 105180, dropped: 0, compressed: 3, ... }
Compress on any webpage
Select text, click the mdmin icon — compressed text is ready to copy. Works offline with no account. Connect your account to track token savings across every page you visit.
Select text
Highlight any text on any webpage — docs, GitHub, Notion, ChatGPT, anywhere.
Click the icon
Pick your compression level (light / medium / aggressive) and hit Compress.
Copy and paste
Compressed text is ready in the popup. One click to copy, no tab switching.
Compression runs entirely in the extension — zero network calls. Works on every tab immediately.
Paste your mdmin_sk_ key in extension settings. Every compression is logged — track tokens saved and estimated API cost on your dashboard.
Deep Compress button active in the popup — LLM rewrite on top of rules, 50%+ savings on verbose text.
Pricing
Compression and extract are free, forever. Pro unlocks scale, deep compression, and the activity dashboard.
Rule-based compression everywhere. Web, CLI, MCP, and public API. Zero marginal cost.
- Rule-based compression (13–35% savings)
- 150+ verbose patterns, table compression, dedup
- Extract — query any doc, 100KB / 2K token budget
- CLI, npm package & Python (pip install mdmin)
- MCP server for AI assistants
- Public API with mdmin_sk_xxx key
- VS Code extension
- Browser extension (Chrome)
- ContextBudget sliding window manager
- Compression history (last 10)
- Prompt Analyzer
curl https://mdmin.dev/api/v1/compress \
-H "Authorization: Bearer $MDMIN_API_KEY" \
-d '{"markdown":"...","level":"aggressive"}'
Deep compression, larger extract budgets, activity dashboard across all your tools. 500 Deep Compress runs/month.
- Everything in Free
- Deep Compress — LLM rewrite (500 runs/mo)
- Extract — 2MB docs, 16K token budget, 60 req/min
- Activity dashboard (VS Code, CLI, MCP, API, ext)
- Compression history — unlimited
- Higher rate limits across all APIs
- Priority support
The 500-run allowance resets on your billing date. No overages.