No Model Training on Your Data at ScaleDown

Your data trains your competitors’ models. At ScaleDown, it doesn’t.

Jun 03, 2026

Your data is never used to train ScaleDown’s models. Not during inference. Not after inference. Not in aggregate, anonymized, de-identified, or any other form. The models you call today were not trained on any customer’s production data, and the models you call tomorrow won’t be either.

Why This Matters

Most AI providers have a training clause somewhere in their terms. The language varies: “we may use inputs to improve our services,” “aggregated and de-identified data may be used for model development,” “you can opt out of training by contacting support.” The default is usually that your data trains their models, and you have to actively opt out.

At ScaleDown, there is nothing to opt out of. The training pipeline and the inference pipeline are architecturally separate systems, and customer request data does not flow into the training pipeline. There is no flag to toggle, no setting to configure, and no enterprise tier required to unlock this. It is the default and only behavior.

How Our Models Are Trained

ScaleDown’s task-specific small language models (SLMs) are trained on public datasets and synthetically generated data. The training corpus includes MS-MARCO, Natural Questions, S2ORC, EDGAR filings, CUAD, and synthetic QA pairs generated over public-domain documents. This is documented in detail in our fine-tuning methodology post.

No customer data enters this pipeline at any stage. Not as training examples. Not as evaluation data. Not as seed data for synthetic generation. The boundary is absolute.

What This Means in Practice

When you send a financial document through our compression endpoint, the model processes it and returns the compressed output. The document is not added to a training queue. It is not written to a dataset. It is not sampled for future fine-tuning runs. Combined with our zero data retention policy, the document exists only in memory for the duration of the request and is discarded when the response is returned.

This matters most for customers in regulated industries. If you are processing legal contracts, medical records, financial filings, or internal communications through ScaleDown, your data cannot appear in a future model checkpoint. It cannot leak into another customer’s inference results through memorization. It cannot surface in a training data audit because it was never in the training data.

No Exceptions

The guarantee applies uniformly across all endpoints (compression, summarization, extraction, classification), all deployment configurations, and all customer tiers. There is no “research usage” carve-out, no “aggregate insights” exception, no clause that activates under specific conditions.

If ScaleDown ever changes its training data policy, which would require a fundamental architectural change, customers will be notified in advance and their explicit consent will be required. This is not a footnote in a terms-of-service update. It is a commitment.

Key Takeaways

Your data trains your competitors’ models. At ScaleDown, it doesn’t. The inference pipeline has no connection to the training pipeline. Customer data is never used for model improvement in any form.

We offer 50M free tokens for every agent. Try it at scaledown.ai.
