Two modes in one skill. Analyze existing XML sitemaps for validation issues, broken URLs, and quality signals. Or generate new sitemaps with industry templates, quality gates, and penalty risk detection.
The sitemap skill operates in two modes. Analysis mode validates an existing sitemap against the protocol specification and SEO best practices. Generation mode creates a new sitemap from scratch with industry-specific templates and quality gates to prevent thin content penalties.
In analysis mode, it checks every URL for HTTP status, validates XML format, flags deprecated tags, and compares crawled pages against the sitemap to find missing pages. In generation mode, it asks for your business type, plans the structure interactively, and applies safeguards at 30+ and 50+ location pages.
What it validates
The analysis mode checks your sitemap against the protocol specification and Google's documented best practices. Every URL is verified for accessibility, and the sitemap is compared against your actual crawlable pages to find gaps.
50K
URL LIMIT VALIDATION
Verifies the sitemap stays under the 50,000 URL protocol limit per file. Recommends sitemap index files for larger sites, split by content type: pages, posts, images, and videos.
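The split-and-index recommendation can be sketched in a few lines. This is an illustrative Python sketch, not the skill's implementation; the `plan_sitemap_files` name and the `sitemap-N.xml` filenames are assumptions:

```python
from itertools import islice

SITEMAP_URL_LIMIT = 50_000  # per-file limit from the sitemap protocol

def chunk(urls, size=SITEMAP_URL_LIMIT):
    """Yield lists of at most `size` URLs."""
    it = iter(urls)
    while batch := list(islice(it, size)):
        yield batch

def plan_sitemap_files(urls, base="https://example.com"):
    """Assign URLs to sitemap files of at most 50,000 entries each,
    and build the index entries that reference those files."""
    files = {}
    for i, batch in enumerate(chunk(urls), start=1):
        files[f"sitemap-{i}.xml"] = batch
    index = [f"{base}/{name}" for name in files]
    return index, files
```

For 120,000 URLs this plans three files (50,000 + 50,000 + 20,000) plus a sitemap index referencing each one.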
200
HTTP STATUS CHECK
Verifies every URL in the sitemap returns HTTP 200. Flags non-200 URLs (broken links), redirected URLs that should be updated to final destinations, and noindexed URLs that should not be in the sitemap.
XML
FORMAT VALIDATION
Validates XML structure, encoding, namespace declarations, and proper element nesting. Reports specific parsing errors with line numbers. Checks that all URLs use HTTPS and that no non-canonical entries are included.
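A minimal version of this validation can be written with the standard library. This is a sketch under assumptions (the `validate_sitemap_xml` helper is illustrative), checking only the root element, namespace, and HTTPS rule:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def validate_sitemap_xml(xml_text):
    """Parse a sitemap and return a list of issue strings (empty if clean)."""
    issues = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:  # ParseError carries line/column position
        return [f"XML parse error: {e}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        issues.append("root element is not <urlset> in the sitemap namespace")
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc", default="")
        if not loc.startswith("https://"):
            issues.append(f"non-HTTPS URL: {loc}")
    return issues
```

A well-formed HTTPS-only sitemap returns an empty list; malformed XML returns the parser's error with its position.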
6
COMMON ISSUES
Detects 6 common sitemap issues by severity: exceeding 50K URLs (Critical), non-200 URLs (High), noindexed URLs included (High), redirected URLs (Medium), identical lastmod dates (Low), and deprecated priority/changefreq tags (Info).
5
QUALITY GATES
In generation mode, applies quality gates for programmatic pages. Safe at scale: integration pages, template pages, glossary (200+ word definitions), product pages, user profiles. Warning at 30+ location pages, hard stop at 50+.
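The location-page thresholds amount to a simple gate. A minimal sketch (function and result names are illustrative, not the skill's API):

```python
LOCATION_WARNING_AT = 30   # warn: ask for unique per-location content
LOCATION_HARD_STOP_AT = 50  # stop: high thin-content penalty risk

def location_page_gate(count):
    """Return the quality-gate result for a planned number of location pages."""
    if count >= LOCATION_HARD_STOP_AT:
        return "stop"
    if count >= LOCATION_WARNING_AT:
        return "warn"
    return "ok"
```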
GAP
COVERAGE ANALYSIS
Compares crawled pages against sitemap entries to find pages missing from the sitemap. Also checks that the sitemap is referenced in robots.txt for search engine discovery.
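Both checks reduce to simple set and string logic. An illustrative sketch (helper names are assumptions):

```python
def coverage_gap(crawled, in_sitemap):
    """Pages found by crawling but missing from the sitemap."""
    return sorted(set(crawled) - set(in_sitemap))

def sitemap_declared(robots_txt):
    """True if robots.txt references a sitemap for search engine discovery."""
    return any(line.strip().lower().startswith("sitemap:")
               for line in robots_txt.splitlines())
```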
// OUTPUT
WHAT THE REPORT INCLUDES
The output depends on the mode. Analysis produces a validation report with prioritized issues. Generation produces a ready-to-deploy sitemap.xml (or split files with index) and a STRUCTURE.md documentation file.
Analysis mode output
VALIDATION-REPORT.md - complete analysis results
Issues list - organized by severity (Critical, High, Medium, Low, Info)
Coverage gap - pages found by crawling but missing from sitemap
Recommendations - specific fixes for each issue
Generation mode output
sitemap.xml - valid XML sitemap (or split files with sitemap index)
STRUCTURE.md - site architecture documentation
URL count - total URLs and organization summary
Correct sitemap format
Google uses only the loc and lastmod tags. The deprecated priority and changefreq tags are ignored:
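A minimal sitemap using only these tags looks like this (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
</urlset>
```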
Analyze an existing sitemap
Fetches the sitemap from common locations (/sitemap.xml, /sitemap_index.xml, robots.txt reference), validates it, and checks every URL. Produces a validation report with prioritized issues.
Generate a new sitemap
/seo sitemap generate
Interactive mode. Detects or asks for your business type, loads an industry template, plans the structure with you, applies quality gates, and generates valid XML ready for deployment.
Penalty risk detection
The generation mode actively prevents thin content patterns that risk Google penalties:
Location pages with only city name swapped
"Best [tool] for [industry]" pages without industry-specific value
"[Competitor] alternative" pages without real comparison data
AI-generated pages without human review and unique value
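One way to illustrate the first pattern: if two location pages become near-identical once city names are stripped, only the city was swapped. This is a toy heuristic for illustration, not the skill's actual detection logic; the city list and 0.95 threshold are assumptions:

```python
import re
from difflib import SequenceMatcher

def thin_location_risk(page_a, page_b, cities=("Austin", "Denver")):
    """Flag two location pages that differ only in the city name."""
    strip = lambda text: re.sub("|".join(map(re.escape, cities)), "", text)
    ratio = SequenceMatcher(None, strip(page_a), strip(page_b)).ratio()
    return ratio > 0.95
```

Two pages that are copies with the city swapped score near 1.0 and get flagged; pages with genuinely distinct content do not.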
// FAQ
QUESTIONS ABOUT SITEMAP ANALYSIS
What does the sitemap analysis check?
The validator checks valid XML format, URL count under the 50,000 protocol limit, all URLs returning HTTP 200, accurate lastmod dates (not all identical), no deprecated tags (priority and changefreq are ignored by Google), sitemap referenced in robots.txt, no non-canonical URLs, no noindexed URLs, no redirected URLs, and HTTPS-only URLs.
Can it generate a sitemap from scratch?
Yes. In generation mode, Claude SEO detects or asks for your business type, loads an industry template, plans the site structure interactively, applies quality gates (warning at 30+ location pages, hard stop at 50+), generates valid XML, splits at 50,000 URLs with a sitemap index, and produces a STRUCTURE.md documentation file.
Which programmatic pages are safe at scale?
Safe programmatic pages at scale include integration pages with real setup docs, template/tool pages with downloadable content, glossary pages with 200+ word definitions, product pages with unique specs and reviews, and user profile pages with user-generated content. Penalty risks include location pages with only the city name swapped, best-tool-for-industry pages without industry-specific value, and AI-generated pages without human review.
Does Google use the priority and changefreq tags?
No. Google has confirmed it ignores both priority and changefreq tags. Claude SEO flags their presence as informational and recommends removing them. The only tags you need are loc (URL) and lastmod (last modification date). Using accurate lastmod dates is the most effective signal for crawl scheduling.
What happens when a sitemap exceeds 50,000 URLs?
The sitemap protocol limits each file to 50,000 URLs. Claude SEO flags this as a critical issue in analysis mode. In generation mode, it automatically splits URLs across multiple sitemap files and creates a sitemap index file that references each sub-sitemap. It also recommends splitting by content type (pages, posts, images, videos).
// RELATED SKILLS
EXPLORE MORE
TECHNICAL SEO
Deep technical analysis across 9 categories including crawlability, indexability, security, and Core Web Vitals.