Freshness and the daily crawl
Last reviewed 2026-06-11.
The daily crawl
A crawl runs every 24 hours. Each pass enumerates every active job_board, pulls the listing index, fetches detail HTML for new IDs, and enriches via the AI extraction pipeline (skills, salary, role category, experience level).
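The pass described above can be sketched roughly as follows. This is a minimal illustration, not the production crawler: the type names, the `CrawlDeps` interface, and the `crawlPass` function are all hypothetical, and the network and AI-extraction steps are injected as plain synchronous functions to keep the sketch short.

```typescript
// Hypothetical shapes; the real schema is not shown in this document.
interface JobBoard { id: string; active: boolean; }
interface Listing { id: string; html: string; }
interface Enriched {
  id: string;
  skills: string[];
  salary: string | null;
  roleCategory: string;
  experienceLevel: string;
}

// Injected stand-ins for the network and AI-extraction layers (assumed names).
interface CrawlDeps {
  listIndex(board: JobBoard): string[];               // listing IDs currently on the board
  fetchDetail(board: JobBoard, id: string): Listing;  // detail HTML for one listing
  enrich(listing: Listing): Enriched;                 // skills, salary, role category, level
}

// One crawl pass: active boards only, detail fetch only for IDs not seen before.
function crawlPass(boards: JobBoard[], seen: Set<string>, deps: CrawlDeps): Enriched[] {
  const out: Enriched[] = [];
  for (const board of boards) {
    if (!board.active) continue;
    for (const id of deps.listIndex(board)) {
      const key = `${board.id}:${id}`;
      if (seen.has(key)) continue; // already crawled on an earlier pass
      seen.add(key);
      out.push(deps.enrich(deps.fetchDetail(board, id)));
    }
  }
  return out;
}
```

Keying `seen` by board and listing ID is an assumption made here so that two boards can reuse the same listing ID without colliding.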
Rollup refresh schedule
- Rollups (role_hiring_weekly, company_hiring_weekly, skill_weekly) recompute the same night the crawl completes.
- Each row carries a refreshed_at timestamp that drives the as-of date on every entity page (Phase 0).
Why we show the as-of date
Strategy Rule §3: last-updated + as-of-date on every page. The <LastUpdated> component reads the rollup's refreshed_at and renders it inside a machine-readable <time dateTime> tag, and the page's JSON-LD dateModified field carries the same value. Together these give Google a freshness signal it can actually trust.
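To make the matching requirement concrete, here is a small sketch that derives both the `<time>` markup and the JSON-LD `dateModified` from a single timestamp. It is not the actual `<LastUpdated>` component: the function names are hypothetical, the output is a plain HTML string rather than JSX (so the attribute is spelled `datetime`), and the `WebPage` type and UTC date label are assumptions.

```typescript
// Normalise the rollup's refreshed_at to one canonical ISO 8601 string.
function canonicalTimestamp(refreshedAt: string): string {
  const d = new Date(refreshedAt);
  if (Number.isNaN(d.getTime())) {
    throw new Error(`invalid refreshed_at: ${refreshedAt}`);
  }
  return d.toISOString();
}

// Hypothetical string renderer standing in for the <LastUpdated> component.
function renderLastUpdated(refreshedAt: string): string {
  const iso = canonicalTimestamp(refreshedAt);
  const label = iso.slice(0, 10); // YYYY-MM-DD, in UTC
  return `<time datetime="${iso}">Updated ${label}</time>`;
}

// JSON-LD built from the same timestamp, so the two values cannot drift apart.
function renderJsonLd(url: string, refreshedAt: string): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "WebPage", // assumed type; the real pages may use another
    url,
    dateModified: canonicalTimestamp(refreshedAt),
  });
}
```

Routing both outputs through one normalising function is the design point: if the visible date and the structured data are formatted independently, they can disagree.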
What happens on a failed crawl
If the daily crawl fails for a specific job_board, its rows keep their last-good refreshed_at and the entity page surfaces an older timestamp. The daily-ops dashboard tracks board-level staleness so failures don't go unnoticed.
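A board-level staleness check along these lines might look like the sketch below. The dashboard's real logic is not described here, so everything in it is an assumption: the `BoardFreshness` shape, the `findStaleBoards` name, and especially the 36-hour default threshold, which was chosen only to leave some slack beyond the 24-hour crawl interval.

```typescript
// Hypothetical input: the newest refreshed_at seen for each job board.
interface BoardFreshness { boardId: string; refreshedAt: string; }
interface StaleBoard { boardId: string; ageHours: number; }

// Return boards whose last-good refresh is older than maxAgeHours, oldest first.
function findStaleBoards(
  rows: BoardFreshness[],
  now: Date,
  maxAgeHours = 36, // assumed threshold, not taken from the source
): StaleBoard[] {
  const msPerHour = 60 * 60 * 1000;
  return rows
    .map((r) => ({
      boardId: r.boardId,
      ageHours: (now.getTime() - new Date(r.refreshedAt).getTime()) / msPerHour,
    }))
    .filter((b) => b.ageHours > maxAgeHours)
    .sort((a, b) => b.ageHours - a.ageHours);
}
```

Passing `now` in explicitly, rather than calling `new Date()` inside the function, keeps the check deterministic and easy to test.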
See it on a live page
- /stacks/typescript: the “Updated” line under the demand-share card reads from the rollup's refreshed_at.
- /sitemap.xml: every URL carries a <lastmod> sourced from the same timestamps.
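As an illustration of the sitemap side, the sketch below builds `<url>` entries whose `<lastmod>` comes from a per-page refreshed_at value. The `SitemapEntry` shape, the `buildSitemap` function, and the example.com base URL used in the usage lines are placeholders, not the site's actual generator.

```typescript
// Hypothetical input: one entry per page, with the rollup timestamp behind it.
interface SitemapEntry { path: string; refreshedAt: string; }

// Escape the five XML special characters so URLs are safe inside <loc>.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a sitemap where each <lastmod> is the page's refreshed_at in ISO 8601.
function buildSitemap(baseUrl: string, entries: SitemapEntry[]): string {
  const base = baseUrl.replace(/\/$/, ""); // avoid a double slash when joining
  const urls = entries.map((e) => {
    const loc = escapeXml(base + e.path);
    const lastmod = new Date(e.refreshedAt).toISOString();
    return `  <url><loc>${loc}</loc><lastmod>${lastmod}</lastmod></url>`;
  });
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`,
    ...urls,
    `</urlset>`,
  ].join("\n");
}

// Usage with a placeholder domain:
const exampleSitemap = buildSitemap("https://example.com/", [
  { path: "/stacks/typescript", refreshedAt: "2026-06-10T02:15:00Z" },
]);
```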