Freshness and the daily crawl

Last reviewed 2026-06-11.

The daily crawl

A crawl runs every 24 hours. Each pass enumerates every active job_board, pulls its listing index, fetches detail HTML for IDs it has not seen before, and enriches those new listings through the AI extraction pipeline (skills, salary, role category, experience level).
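One crawl pass can be sketched as follows. This is a minimal illustration, not the production crawler: the type names (JobBoard, CrawlDeps, Enrichment) and the injected functions (fetchListingIndex, fetchDetailHtml, enrich, knownIds) are hypothetical stand-ins for the listing fetcher, the detail fetcher, the AI extraction pipeline, and the store of already-seen IDs.

```typescript
// Hypothetical shapes: the document names job_board and the four
// enrichment fields, but not these exact types or function names.
interface JobBoard {
  id: string;
  active: boolean;
}

interface Enrichment {
  skills: string[];
  salary: string | null;
  roleCategory: string;
  experienceLevel: string;
}

interface CrawlDeps {
  fetchListingIndex(board: JobBoard): Promise<string[]>; // listing IDs on the board
  fetchDetailHtml(board: JobBoard, listingId: string): Promise<string>;
  enrich(html: string): Promise<Enrichment>; // stands in for the AI extraction pipeline
  knownIds(board: JobBoard): Set<string>; // IDs already stored for this board
}

interface CrawledListing {
  boardId: string;
  listingId: string;
  enrichment: Enrichment;
}

// One pass: every active board, index first, then detail HTML and
// enrichment only for IDs that are not already stored.
async function runCrawlPass(
  boards: JobBoard[],
  deps: CrawlDeps,
): Promise<CrawledListing[]> {
  const results: CrawledListing[] = [];
  for (const board of boards.filter((b) => b.active)) {
    const seen = deps.knownIds(board);
    const index = await deps.fetchListingIndex(board);
    const newIds = index.filter((id) => !seen.has(id));
    for (const listingId of newIds) {
      const html = await deps.fetchDetailHtml(board, listingId);
      const enrichment = await deps.enrich(html);
      results.push({ boardId: board.id, listingId, enrichment });
    }
  }
  return results;
}
```

Injecting the fetchers and the enrichment step keeps the pass easy to exercise with in-memory stubs; inactive boards are skipped before any network work happens.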

Rollup refresh schedule

  • Rollups (role_hiring_weekly, company_hiring_weekly, skill_weekly) recompute the same night the crawl completes.
  • Each row carries a refreshed_at timestamp that drives the as-of date on every entity page (Phase 0).
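A recompute of role_hiring_weekly that stamps refreshed_at might look like the sketch below. The row shape, the choice to bucket by posting date, and Monday-start UTC weeks are all assumptions made for illustration; the document only says the rollups recompute nightly and that each row carries refreshed_at.

```typescript
// Hypothetical input shape: one enriched listing per row.
interface Listing {
  roleCategory: string;
  postedAt: Date;
}

// Hypothetical row shape for role_hiring_weekly.
interface RoleHiringWeeklyRow {
  role: string;
  weekStart: string; // Monday of the UTC week, YYYY-MM-DD (assumed bucketing)
  listings: number;
  refreshedAt: string; // ISO 8601; drives the page's as-of date
}

// Monday (UTC) of the week containing d, as YYYY-MM-DD.
function weekStartUtc(d: Date): string {
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay: 0 = Sunday
  const monday = new Date(
    Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate() - daysSinceMonday),
  );
  return monday.toISOString().slice(0, 10);
}

// Recompute the rollup from scratch and stamp every row with the same
// refreshed_at, taken from when the nightly recompute ran.
function recomputeRoleHiringWeekly(
  listings: Listing[],
  refreshedAt: Date,
): RoleHiringWeeklyRow[] {
  const stamp = refreshedAt.toISOString();
  const byKey = new Map<string, RoleHiringWeeklyRow>();
  for (const l of listings) {
    const weekStart = weekStartUtc(l.postedAt);
    // NUL separator: assumed never to appear in a role name.
    const key = `${l.roleCategory}\u0000${weekStart}`;
    const row = byKey.get(key);
    if (row) {
      row.listings += 1;
    } else {
      byKey.set(key, { role: l.roleCategory, weekStart, listings: 1, refreshedAt: stamp });
    }
  }
  return Array.from(byKey.values());
}
```

Stamping every row from a single timestamp means all rows from one night's recompute report the same as-of date.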

Why we show the as-of date

Strategy Rule §3 requires a last-updated line and an as-of date on every page. The <LastUpdated> component reads the rollup's refreshed_at and renders it inside a <time dateTime> tag (machine-readable), and the page's JSON-LD dateModified field carries the same value. Because the visible date, the markup, and the structured data all come from one timestamp, Google gets a consistent freshness signal instead of several that might disagree.
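The idea of deriving both outputs from one timestamp can be sketched without React. renderLastUpdated and renderJsonLd are hypothetical names, the string output only approximates what the real <LastUpdated> component emits, and the page URL in the usage is a placeholder (example.com).

```typescript
// Hypothetical string renderer; the real <LastUpdated> is a component.
// HTML attribute names are case-insensitive, so `datetime` here is the
// same attribute React writes for the dateTime prop.
function renderLastUpdated(refreshedAt: string): string {
  const d = new Date(refreshedAt);
  if (Number.isNaN(d.getTime())) {
    throw new Error(`invalid refreshed_at: ${refreshedAt}`);
  }
  const iso = d.toISOString(); // machine-readable value
  const label = iso.slice(0, 10); // human-readable YYYY-MM-DD
  return `Updated <time datetime="${iso}">${label}</time>`;
}

// JSON-LD for the same page: dateModified comes from the same refreshed_at,
// normalized the same way, so the two values cannot drift apart.
function renderJsonLd(refreshedAt: string, pageUrl: string): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "WebPage",
    url: pageUrl,
    dateModified: new Date(refreshedAt).toISOString(),
  });
}
```

Normalizing through toISOString in both places is the design choice that matters: the visible tag and the structured data cannot disagree on format or value.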

What happens on a failed crawl

If the daily crawl fails for a specific job_board, its rows keep their last-good refreshed_at, so the entity page shows that older timestamp instead of implying the data is current. The daily-ops dashboard tracks board-level staleness so failures don't go unnoticed.
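The failure rule and a dashboard staleness check might be sketched like this. BoardFreshness, applyCrawlResult, staleBoards, and the 30-hour threshold (one 24-hour cycle plus a 6-hour grace window) are hypothetical; the document does not say what threshold the daily-ops dashboard uses.

```typescript
// Hypothetical per-board freshness record.
interface BoardFreshness {
  boardId: string;
  refreshedAt: string; // ISO 8601, last successful crawl + rollup
}

// Advance refreshed_at only on success; a failed crawl returns the
// previous record unchanged, so pages keep showing the last-good date.
function applyCrawlResult(
  prev: BoardFreshness,
  succeeded: boolean,
  finishedAt: Date,
): BoardFreshness {
  return succeeded ? { ...prev, refreshedAt: finishedAt.toISOString() } : prev;
}

// Hypothetical dashboard threshold: 24-hour cycle plus 6 hours of grace.
const STALE_AFTER_MS = 30 * 60 * 60 * 1000;

// IDs of boards whose last-good timestamp is older than the threshold.
function staleBoards(
  boards: BoardFreshness[],
  now: Date,
  staleAfterMs: number = STALE_AFTER_MS,
): string[] {
  return boards
    .filter((b) => now.getTime() - Date.parse(b.refreshedAt) > staleAfterMs)
    .map((b) => b.boardId);
}
```

A board that misses one night is flagged on the next check, while a board that merely finished late within the grace window is not.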

See it on a live page

  • /stacks/typescript: the “Updated” line under the demand-share card reads from the rollup's refreshed_at.
  • /sitemap.xml: every URL carries a <lastmod> sourced from the same refreshed_at timestamps.
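Generating those <lastmod> values from refreshed_at could look like the sketch below. buildSitemap and SitemapEntry are hypothetical names, and example.com stands in for the real domain; the XML shape follows the sitemaps.org 0.9 schema.

```typescript
// Hypothetical entry: one URL plus the refreshed_at of the rollup behind it.
interface SitemapEntry {
  loc: string;
  refreshedAt: string; // ISO 8601
}

// Escape the five XML special characters for use in element text.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a sitemaps.org 0.9 document; <lastmod> uses the same normalized
// timestamp as the page's <time dateTime> and JSON-LD dateModified.
function buildSitemap(entries: SitemapEntry[]): string {
  const urls = entries.map(
    (e) =>
      `  <url>\n    <loc>${escapeXml(e.loc)}</loc>\n` +
      `    <lastmod>${new Date(e.refreshedAt).toISOString()}</lastmod>\n  </url>`,
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}
```

Escaping matters because sitemap URLs containing `&` must be entity-encoded to remain valid XML.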