Newsfork
Engineering

Seed Contracts: Compliant Crawl Policies by Country

Web scraping without a policy is a liability. Government news domains publish robots.txt rules and crawl-delay directives, and many also carry attribution requirements and country-specific licensing terms.

Newsfork's seed contract system defines crawl policies per country and category before any request is made:

Scope — Which domains, URL patterns, and content types are in scope.

Cadence — How often each domain is crawled, respecting crawl-delay.

Extraction — What fields to extract (title, body, metadata, attachments).

Compliance — Attribution rules, retention periods, and GDPR-aware handling.
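To make the four sections concrete, here is a minimal sketch of how a contract could be modeled in Python. The `SeedContract` class, its field names, the example domain, and the `newsfork-bot` user agent are all hypothetical and are not Newsfork's actual schema. The only real API used is the standard library's `urllib.robotparser`, which reads `Crawl-delay` and `Disallow` rules. The sketch assumes the effective delay is the larger of the contract's own floor and the site's crawl-delay.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


@dataclass(frozen=True)
class SeedContract:
    # Hypothetical fields mirroring Scope, Cadence, Extraction, Compliance.
    country: str
    category: str
    domains: tuple[str, ...]          # Scope: allowed hostnames
    url_patterns: tuple[str, ...]     # Scope: glob patterns for the URL path
    min_interval_s: float             # Cadence: contract's floor between requests
    fields: tuple[str, ...] = ("title", "body", "metadata", "attachments")
    attribution: str = ""             # Compliance: required credit line
    retention_days: int = 30          # Compliance: assumed default, not a real value

    def in_scope(self, url: str) -> bool:
        """True when the host is listed and the path matches a pattern."""
        parts = urlparse(url)
        return parts.hostname in self.domains and any(
            fnmatchcase(parts.path, pattern) for pattern in self.url_patterns
        )

    def delay_for(self, robots: RobotFileParser, user_agent: str) -> float:
        """Use the stricter of the contract interval and robots.txt Crawl-delay."""
        crawl_delay = robots.crawl_delay(user_agent)  # None if not declared
        return max(self.min_interval_s, float(crawl_delay or 0))


# Parse an in-memory robots.txt so the example needs no network access.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Crawl-delay: 10", "Disallow: /private/"])

contract = SeedContract(
    country="DE",
    category="government",
    domains=("www.example.gov",),
    url_patterns=("/press/*",),
    min_interval_s=5.0,
)

url = "https://www.example.gov/press/2024/item.html"
allowed = contract.in_scope(url) and robots.can_fetch("newsfork-bot", url)
delay = contract.delay_for(robots, "newsfork-bot")
```

Checking scope and robots.txt together before a request is issued matches the idea that policy is settled before any request is made. A real system would also need to handle content types and per-category overrides, which this sketch leaves out.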

Seed contracts are version-controlled in a Git-backed ledger. Every crawl decision is auditable — what was crawled, when, and under which contract version.
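As an illustration of what an auditable crawl decision might record, the sketch below derives a version identifier from the contract's content and attaches it to each fetch. This is an assumption for illustration only: the post says the ledger is Git-backed, and here a SHA-256 content hash stands in for a Git object ID. The function names and record fields are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def contract_version(contract: dict) -> str:
    """Content hash of a contract; a stand-in for a Git commit or blob ID."""
    # Sorted keys and fixed separators make the hash independent of key order.
    canonical = json.dumps(contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def audit_record(url: str, contract: dict, fetched_at: datetime) -> dict:
    """Capture what was crawled, when, and under which contract version."""
    return {
        "url": url,
        "fetched_at": fetched_at.astimezone(timezone.utc).isoformat(),
        "contract_version": contract_version(contract),
    }


contract_v1 = {"country": "DE", "category": "government", "min_interval_s": 5}
record = audit_record(
    "https://www.example.gov/press/item.html",
    contract_v1,
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
```

Because the identifier is derived from the contract's content, any change to a policy field yields a different version, so each record points to the exact rules in force at fetch time.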

Enterprise customers can define custom seed contracts for proprietary domain lists or restricted collections. Contact enterprise@newsfork.com for details.

The `/v1/seeds` endpoint returns the active contracts for your tier. Pro and Enterprise plans include full contract metadata; the Free tier includes country-level summaries only.
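A request to this endpoint could be prepared along the following lines. Only the `/v1/seeds` path comes from this post. The host `api.example.com` is a placeholder, and the bearer-token header and the `country` query parameter are assumptions, so check the API reference for the real values. The sketch builds the request without sending it.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.example.com"  # placeholder host, not the real API host


def build_seeds_request(api_key: str, country: Optional[str] = None) -> Request:
    """Prepare a GET request for /v1/seeds (auth scheme and filter are assumed)."""
    query = "?" + urlencode({"country": country}) if country else ""
    return Request(
        f"{BASE_URL}/v1/seeds{query}",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Accept": "application/json",
        },
        method="GET",
    )


request = build_seeds_request("test-key", country="DE")
```

Passing `request` to `urllib.request.urlopen` would send it. The shape of the response depends on your tier as described above, so client code should not assume full contract metadata is always present.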