// SKILL

XML SITEMAP
ANALYSIS AND GENERATION

Two modes in one skill. Analyze existing XML sitemaps for validation issues, broken URLs, and quality signals. Or generate new sitemaps with industry templates, quality gates, and penalty risk detection.

/seo sitemap https://your-site.com

REQUIRES CLAUDE SEO INSTALLED IN CLAUDE CODE

// HOW IT WORKS

2 MODES. FULL COVERAGE.

The sitemap skill operates in two modes. Analysis mode validates an existing sitemap against the protocol specification and SEO best practices. Generation mode creates a new sitemap from scratch with industry-specific templates and quality gates to prevent thin content penalties.

In analysis mode, it checks the HTTP status of every URL, validates the XML format, flags deprecated tags, and compares crawled pages against the sitemap to find missing pages. In generation mode, it detects or asks for your business type, plans the structure interactively, and applies location-page safeguards: a warning at 30+ pages and a hard stop at 50+.

[Diagram: /seo sitemap <url | generate>. Mode 1, Analyze: XML validation (format + limits), URL status (HTTP 200 check), quality signals (lastmod, noindex), coverage gap (crawled vs. listed); output is VALIDATION-REPORT.md (issues + recommendations). Mode 2, Generate: detect type (industry template), quality gates (30/50 page limits), generate XML (split at 50K), STRUCTURE.md (documentation); output is sitemap.xml + STRUCTURE.md, ready to deploy.]

What it validates

The analysis mode checks your sitemap against the protocol specification and Google's documented best practices. Every URL is verified for accessibility, and the sitemap is compared against your actual crawlable pages to find gaps.

50K
URL LIMIT VALIDATION
Verifies the sitemap stays under the 50,000 URL protocol limit per file. Recommends sitemap index files for larger sites, split by content type: pages, posts, images, and videos.
200
HTTP STATUS CHECK
Verifies every URL in the sitemap returns HTTP 200. Flags non-200 URLs (broken links), redirected URLs that should be updated to final destinations, and noindexed URLs that should not be in the sitemap.
XML
FORMAT VALIDATION
Validates XML structure, encoding, namespace declarations, and element nesting. Reports specific parsing errors with line numbers. Checks that every URL uses HTTPS and that no non-canonical URLs are listed.
6
COMMON ISSUES
Detects 6 common sitemap issues by severity: exceeding 50K URLs (Critical), non-200 URLs (High), noindexed URLs included (High), redirected URLs (Medium), identical lastmod dates (Low), and deprecated priority/changefreq tags (Info).
5
QUALITY GATES
In generation mode, applies quality gates for programmatic pages. Safe at scale: integration pages, template pages, glossary (200+ word definitions), product pages, user profiles. Warning at 30+ location pages, hard stop at 50+.
GAP
COVERAGE ANALYSIS
Compares crawled pages against sitemap entries to find pages missing from the sitemap. Also checks that the sitemap is referenced in robots.txt for search engine discovery.
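The coverage comparison is essentially a set difference. A minimal sketch, not Claude SEO's actual implementation: the `normalize` and `coverage_gap` helpers are hypothetical, and the normalization rules (lowercased host, no fragment, no trailing slash) are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Assumed normalization: lowercase scheme/host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def coverage_gap(crawled, listed):
    """Return crawled URLs that do not appear in the sitemap, sorted for stable output."""
    listed_set = {normalize(u) for u in listed}
    return sorted({normalize(u) for u in crawled} - listed_set)


# Example data (made up): one crawled page is missing from the sitemap.
crawled = [
    "https://example.com/",
    "https://example.com/pricing/",
    "https://example.com/blog/post-1",
]
listed = ["https://example.com/", "https://example.com/blog/post-1"]
gap = coverage_gap(crawled, listed)
```

Normalizing both sides first keeps cosmetic differences, such as a trailing slash, from showing up as false gaps.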
// OUTPUT

WHAT THE REPORT INCLUDES

The output depends on the mode. Analysis produces a validation report with prioritized issues. Generation produces a ready-to-deploy sitemap.xml (or split files with index) and a STRUCTURE.md documentation file.

Analysis mode output

  • VALIDATION-REPORT.md - complete analysis results
  • Issues list - organized by severity (Critical, High, Medium, Low, Info)
  • Coverage gap - pages found by crawling but missing from sitemap
  • Recommendations - specific fixes for each issue
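To illustrate how an issues list ordered by severity might be assembled, here is a rough sketch. The `UrlEntry` record and `find_issues` function are hypothetical names, and the input is assumed to be pre-collected check results; only the severity levels and the six issue types come from this page.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITY_ORDER = ["Critical", "High", "Medium", "Low", "Info"]


@dataclass
class UrlEntry:
    """Hypothetical record of what was observed for one sitemap URL."""
    loc: str
    status: int                      # HTTP status of the listed URL
    redirected: bool = False
    noindex: bool = False
    has_priority_or_changefreq: bool = False
    lastmod: Optional[str] = None


def find_issues(entries):
    """Return (severity, message) pairs, most severe first."""
    issues = []
    if len(entries) > 50_000:
        issues.append(("Critical", "sitemap exceeds 50,000 URLs"))
    for e in entries:
        if e.redirected:
            issues.append(("Medium", f"redirected URL: {e.loc}"))
        elif e.status != 200:
            issues.append(("High", f"non-200 URL ({e.status}): {e.loc}"))
        if e.noindex:
            issues.append(("High", f"noindexed URL in sitemap: {e.loc}"))
    lastmods = [e.lastmod for e in entries]
    if len(entries) > 1 and all(lastmods) and len(set(lastmods)) == 1:
        issues.append(("Low", "all lastmod dates are identical"))
    if any(e.has_priority_or_changefreq for e in entries):
        issues.append(("Info", "priority/changefreq tags present"))
    # sorted() is stable, so issues keep discovery order within a severity.
    return sorted(issues, key=lambda issue: SEVERITY_ORDER.index(issue[0]))


entries = [
    UrlEntry("https://example.com/", 200, lastmod="2026-03-25"),
    UrlEntry("https://example.com/old", 301, redirected=True, lastmod="2026-03-25"),
    UrlEntry("https://example.com/gone", 404, lastmod="2026-03-25",
             has_priority_or_changefreq=True),
]
issues = find_issues(entries)
```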

Generation mode output

  • sitemap.xml - valid XML sitemap (or split files with sitemap index)
  • STRUCTURE.md - site architecture documentation
  • URL count - total URLs and organization summary

Correct sitemap format

Google uses only the loc and lastmod tags. It ignores priority and changefreq, so a minimal sitemap needs neither:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2026-03-25</lastmod>
  </url>
</urlset>
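A file in this format can be produced with Python's standard library. This is a sketch under assumptions, not the skill's own generator: `build_sitemap` is a hypothetical helper, and the page list is made up.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap bytes from (loc, lastmod) pairs; lastmod is a date string like '2026-03-25'."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # special characters are escaped on output
        ET.SubElement(url, "lastmod").text = lastmod
    # Returns UTF-8 bytes with an XML declaration, suitable for writing in binary mode.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_sitemap([
    ("https://example.com/page", "2026-03-25"),
    ("https://example.com/search?q=a&page=2", "2026-03-20"),
])
```

Building the tree with an XML library rather than string concatenation means characters such as `&` in query strings are escaped automatically.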
// USAGE

HOW TO AUDIT OR GENERATE

Analyze an existing sitemap

/seo sitemap https://your-site.com

Fetches the sitemap from common locations (/sitemap.xml, /sitemap_index.xml, robots.txt reference), validates it, and checks every URL. Produces a validation report with prioritized issues.
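The discovery step can be pictured as building a list of candidate URLs to try. The sketch below is an assumption about how that might look, not the skill's actual logic: `candidate_sitemaps` is a hypothetical helper, it takes robots.txt content as a string instead of fetching it, and the order of candidates simply mirrors the order listed above.

```python
from urllib.parse import urljoin

COMMON_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]


def candidate_sitemaps(site_url, robots_txt=""):
    """Return candidate sitemap URLs: common paths first, then robots.txt 'Sitemap:' entries."""
    candidates = [urljoin(site_url, path) for path in COMMON_PATHS]
    for line in robots_txt.splitlines():
        # Split at the first colon only, so the URL's own colons stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            candidates.append(value.strip())
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(candidates))


robots = (
    "User-agent: *\n"
    "Disallow: /admin\n"
    "Sitemap: https://your-site.com/sitemap-main.xml\n"
)
found = candidate_sitemaps("https://your-site.com", robots)
```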

Generate a new sitemap

/seo sitemap generate

Interactive mode. Detects or asks for your business type, loads an industry template, plans the structure with you, applies quality gates, and generates valid XML ready for deployment.
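The location-page gate reduces to two thresholds. A minimal sketch, assuming "30+" and "50+" mean "30 or more" and "50 or more"; the function name and return values are hypothetical.

```python
WARN_AT = 30   # warning threshold for location pages
STOP_AT = 50   # hard-stop threshold for location pages


def location_page_gate(count):
    """Classify a planned number of location pages as 'ok', 'warn', or 'stop'."""
    if count >= STOP_AT:
        return "stop"
    if count >= WARN_AT:
        return "warn"
    return "ok"


results = {n: location_page_gate(n) for n in (12, 30, 49, 50)}
```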

Penalty risk detection

The generation mode actively prevents thin content patterns that risk Google penalties:

  • Location pages with only city name swapped
  • "Best [tool] for [industry]" pages without industry-specific value
  • "[Competitor] alternative" pages without real comparison data
  • AI-generated pages without human review and unique value
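As an illustration of the first pattern, one simple way to spot location pages that differ only by city name is to replace the city with a placeholder and compare what remains. This is a rough heuristic sketch, not the skill's detection method; `template_of` and `only_city_swapped` are hypothetical helpers and the page text is invented.

```python
import re


def template_of(text, city):
    """Replace the city name with a placeholder, collapse whitespace, and lowercase."""
    swapped = re.sub(re.escape(city), "{city}", text, flags=re.IGNORECASE)
    return " ".join(swapped.split()).lower()


def only_city_swapped(pages):
    """pages maps city name -> body text. True when every page reduces to the same template."""
    templates = {template_of(body, city) for city, body in pages.items()}
    return len(pages) > 1 and len(templates) == 1


thin = {
    "Austin": "Best plumbers in Austin. Call our Austin team today.",
    "Dallas": "Best plumbers in Dallas. Call our Dallas team today.",
}
unique = {
    "Austin": "Best plumbers in Austin. Call our Austin team today.",
    "Dallas": "Best plumbers in Dallas. We also cover the Oak Lawn area.",
}
thin_flag = only_city_swapped(thin)
unique_flag = only_city_swapped(unique)
```

An exact-match comparison like this only catches pure find-and-replace pages; lightly reworded copies would need a similarity measure instead.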
// FAQ

QUESTIONS ABOUT SITEMAP ANALYSIS

The validator checks that the XML is valid, the URL count is under the 50,000 protocol limit, every URL returns HTTP 200, lastmod dates are accurate (not all identical), no deprecated tags are present (Google ignores priority and changefreq), the sitemap is referenced in robots.txt, no non-canonical, noindexed, or redirected URLs are listed, and every URL uses HTTPS.
Claude SEO can also generate a sitemap from scratch. In generation mode, it detects or asks for your business type, loads an industry template, plans the site structure interactively, applies quality gates (warning at 30+ location pages, hard stop at 50+), generates valid XML, splits at 50,000 URLs with a sitemap index, and produces a STRUCTURE.md documentation file.
Safe programmatic pages at scale include integration pages with real setup docs, template/tool pages with downloadable content, glossary pages with 200+ word definitions, product pages with unique specs and reviews, and user profile pages with user-generated content. Penalty risks include location pages with only city name swapped, best-tool-for-industry pages without specific value, and AI-generated pages without human review.
You do not need priority or changefreq. Google has confirmed it ignores both tags. Claude SEO flags their presence as informational and recommends removing them. The only tags you need are loc (URL) and lastmod (last modification date). Of the sitemap fields, an accurate lastmod date is the one Google uses as a crawl scheduling signal.
The sitemap protocol limits each file to 50,000 URLs. Claude SEO flags this as a critical issue in analysis mode. In generation mode, it automatically splits URLs across multiple sitemap files and creates a sitemap index file that references each sub-sitemap. It also recommends splitting by content type (pages, posts, images, videos).
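The splitting step can be sketched as chunking the URL list and writing an index that points at each chunk. This is an illustration under assumptions, not the skill's generator: `chunk_urls` and `build_index` are hypothetical helpers, and the `sitemap-N.xml` file naming is invented for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # protocol limit per sitemap file


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(base_url, file_count):
    """Build sitemap index bytes referencing sitemap-1.xml .. sitemap-N.xml (assumed naming)."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, file_count + 1):
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url}/sitemap-{n}.xml"
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


urls = [f"https://example.com/p/{i}" for i in range(120_001)]
chunks = chunk_urls(urls)
index_bytes = build_index("https://example.com", len(chunks))
```

Each chunk would then be written as its own sitemap file, with the index submitted in place of a single sitemap.xml.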
AUDIT YOUR SITEMAP
IN SECONDS.

git clone --depth 1 https://github.com/AgriciDaniel/claude-seo.git && bash claude-seo/install.sh