Seed Contracts: Compliant Crawl Policies by Country
Web scraping without a policy is a liability. Government news domains publish robots.txt rules and crawl-delay directives, and their content often comes with attribution requirements and country-specific licensing terms.
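Python's standard-library `urllib.robotparser` can illustrate the first two of those constraints. The sketch below parses an example robots.txt body offline, then asks whether a URL may be fetched and which crawl delay applies. It is a generic illustration, not Newsfork's crawler; the robots.txt body, the `example.gov` URLs, and the `ExampleNewsBot` user agent are all made up.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt body; a real crawler would first fetch
# https://<domain>/robots.txt before any other request to that domain.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /internal/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
agent = "ExampleNewsBot"  # hypothetical user-agent string

print(rules.can_fetch(agent, "https://example.gov/press/2024/release.html"))  # True
print(rules.can_fetch(agent, "https://example.gov/internal/draft.html"))      # False
print(rules.crawl_delay(agent))                                              # 10
```

Because no group names `ExampleNewsBot` specifically, the parser falls back to the `User-agent: *` group for both the allow/disallow decision and the crawl delay.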
Newsfork's seed contract system defines crawl policies per country and category before any request is made:
Scope: Which domains, URL patterns, and content types are in scope.
Cadence: How often each domain is crawled, never faster than its robots.txt crawl-delay allows.
Extraction: Which fields to extract (title, body, metadata, attachments).
Compliance: Attribution rules, retention periods, and GDPR-aware handling of personal data.
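One way to picture a contract covering those four areas is as a single immutable record that is checked before every request. The sketch below is hypothetical: the `SeedContract` class, its field names, and the sample values are illustrative assumptions, not Newsfork's schema. It shows a scope check (host, path glob, and content type must all match) and a cadence rule that takes the stricter of the contract's own request gap and the robots.txt crawl delay.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase
from urllib.parse import urlparse


@dataclass(frozen=True)
class SeedContract:
    """Hypothetical seed-contract shape; every field name here is illustrative."""

    country: str                   # e.g. an ISO 3166-1 alpha-2 code such as "DE"
    category: str                  # e.g. "government-press"
    version: str                   # ledger revision the contract was loaded from
    # Scope
    domains: frozenset[str]
    url_patterns: tuple[str, ...]  # shell-style globs matched against the URL path
    content_types: frozenset[str]
    # Cadence
    recrawl_every_s: int           # how often a domain is revisited
    min_request_gap_s: int         # the contract's own floor between requests
    # Compliance
    attribution: str
    retention_days: int
    gdpr_personal_data: bool
    # Extraction
    fields: tuple[str, ...] = ("title", "body", "metadata", "attachments")

    def in_scope(self, url: str, content_type: str) -> bool:
        """True only if host, path pattern, and content type all match."""
        parts = urlparse(url)
        return (
            parts.hostname in self.domains
            and any(fnmatchcase(parts.path, p) for p in self.url_patterns)
            and content_type in self.content_types
        )

    def request_gap(self, robots_crawl_delay: int | None) -> int:
        """Seconds between requests: the stricter of contract and robots.txt."""
        if robots_crawl_delay is None:
            return self.min_request_gap_s
        return max(self.min_request_gap_s, robots_crawl_delay)


# Sample values below are invented for illustration.
contract = SeedContract(
    country="DE",
    category="government-press",
    version="a1b2c3d",
    domains=frozenset({"example.gov"}),
    url_patterns=("/press/*",),
    content_types=frozenset({"text/html"}),
    recrawl_every_s=3600,
    min_request_gap_s=5,
    attribution="Source: example.gov",
    retention_days=90,
    gdpr_personal_data=False,
)
```

Making the record frozen means a running crawl cannot silently alter its own policy; any change has to arrive as a new contract version.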
Seed contracts are version-controlled in a Git-backed ledger. Every crawl decision is auditable — what was crawled, when, and under which contract version.
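For illustration, here is one way a crawl decision could be tied to a contract version in a Git-backed setup. This sketch assumes the version is identified by the Git blob ID of the contract file; a commit hash would serve equally well. The `audit_line` helper and its record fields are hypothetical. The blob-ID computation itself follows Git's SHA-1 object format: the SHA-1 of `blob <size>\0` followed by the file's bytes.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone


def git_blob_id(content: bytes) -> str:
    """The object ID Git assigns to a file's contents in a SHA-1 repository."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def audit_line(url: str, decision: str, reason: str,
               contract_file: bytes, at: datetime | None = None) -> str:
    """Serialize one crawl decision as a JSON line (field names are hypothetical)."""
    at = at or datetime.now(timezone.utc)
    return json.dumps(
        {
            "url": url,
            "decision": decision,  # e.g. "fetched" or "skipped"
            "reason": reason,      # e.g. "out of scope" or "robots.txt disallow"
            "at": at.isoformat(),
            "contract_version": git_blob_id(contract_file),
        },
        sort_keys=True,
    )


line = audit_line(
    "https://example.gov/internal/draft.html",
    "skipped",
    "robots.txt disallow",
    contract_file=b"country: DE\ncategory: government-press\n",
    at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Appending such lines to a log answers the three audit questions directly: the `url` field records what, `at` records when, and `contract_version` records which contract.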
Enterprise customers can define custom seed contracts for proprietary domain lists or restricted collections. Contact enterprise@newsfork.com for details.
The `/v1/seeds` endpoint returns active contracts for your tier. Pro and Enterprise plans include full contract metadata; Free tier includes country-level summaries.
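A minimal client sketch for `/v1/seeds` might look like the following. This section does not specify the base URL, the authentication scheme, or the response schema. The `https://api.example.com` host, the bearer-token `Authorization` header, and the helper names `build_seeds_request` and `fetch_seeds` are all placeholders.

```python
import json
import urllib.request


def build_seeds_request(base_url: str, api_key: str) -> urllib.request.Request:
    """GET request for active seed contracts (auth scheme is an assumption)."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/seeds",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed bearer-token auth
            "Accept": "application/json",
        },
    )


def fetch_seeds(base_url: str, api_key: str, timeout: float = 10.0):
    """Fetch and decode the JSON body; its shape depends on your plan tier."""
    with urllib.request.urlopen(build_seeds_request(base_url, api_key),
                                timeout=timeout) as resp:
        return json.load(resp)


# Placeholder host and key; substitute the real values for your account.
req = build_seeds_request("https://api.example.com/", "YOUR_API_KEY")
```

Free-tier responses carry only country-level summaries. For that reason, code that reads full contract metadata should treat those fields as optional rather than assume they are present.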