- Text similarity sees how words look. SERP overlap sees how Google treats them. Only one of those signals matches what gets rewarded.
- Clustering thresholds: 4 to 6 shared top-10 URLs put two keywords in the same cluster. 7 or more merge them into one page. 2 to 3 mean interlink only. 0 to 1 separate them.
- Hub-and-spoke architecture with mandatory bidirectional links, 2 to 3 lateral spoke-to-spoke links per post, no orphans, no cannibalization.
- Intent classification per cluster drives template choice: ultimate-guide, how-to, listicle, comparison, review, best-of, landing-page.
- Outputs: cluster-plan.json, a human-readable plan, an interactive cluster-map.html sized by volume, plus content briefs or direct execution via claude-blog.
Most topic clustering tools group keywords that sound alike. That misses what actually matters: which keywords Google ranks the same pages for. SERP overlap clustering measures that directly. The seo-cluster skill in Claude SEO is built on it.
The text-similarity trap
Walk through a concrete example. "best running shoes" vs "top running sneakers". Text similarity sees them as roughly 60 percent similar. Different enough to need separate pages? Or similar enough to merge? You cannot tell from text alone.
Now check the SERPs. Google returns the same 8 of 10 top results for both queries. SERP overlap = 80 percent. Google treats them as the same intent. One page can win both queries. Following text similarity, you would have written two pages, split your authority across them, and watched both struggle while a competitor wrote one stronger page and ranked for the pair.
The inverse trap is just as bad. "marketing automation" and "marketing automation software" look 95 percent similar by text. The SERPs share maybe 2 URLs out of 10. The first is mostly informational (what is, definitions, guides). The second is mostly commercial (comparison pages, vendor reviews). Treating them as one cluster gives you a page that satisfies neither query well.
This is why every text-similarity tool produces clusters that look tidy on paper and rank inconsistently in practice. The grouping signal is the wrong one. Strings and embeddings tell you about language. SERPs tell you about Google. Only the second category predicts ranking outcomes.
What SERP overlap measures
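SERP overlap is a Jaccard-style set similarity over the top-10 organic URLs for each keyword, computed here as the number of shared URLs out of 10. Two keywords with 7 out of 10 shared URLs have 70 percent overlap. (A strict Jaccard index would divide by the union of the two sets instead, giving 7 of 13, about 54 percent. The thresholds below use the simpler shared-count form.) The clustering threshold inside the seo-cluster skill is calibrated on real ranking behavior, not on string distance.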
| Shared top-10 URLs | Relationship | Action |
|---|---|---|
| 7 to 10 | Same post | Merge into one target page |
| 4 to 6 | Same cluster | Group under same spoke cluster |
| 2 to 3 | Interlink | Adjacent clusters, add cross-links |
| 0 to 1 | Separate | Different clusters or exclude |
Ads, featured snippets, and People Also Ask are ignored. Only the organic top 10 counts. That is the layer Google is making a decision about when it composes a SERP, and it is the layer your page has to break into.
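The table above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual code: the function names, the URL normalization rule, and the example URLs are all assumptions, and the input is taken to be organic results only, already stripped of ads, snippets, and People Also Ask.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal (assumed rule)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


def shared_top10(serp_a: list[str], serp_b: list[str]) -> int:
    """Count URLs that appear in the organic top 10 of both keywords."""
    top_a = {normalize(u) for u in serp_a[:10]}
    top_b = {normalize(u) for u in serp_b[:10]}
    return len(top_a & top_b)


def relationship(shared: int) -> str:
    """Map a shared-URL count to the relationship named in the table."""
    if shared >= 7:
        return "same post"
    if shared >= 4:
        return "same cluster"
    if shared >= 2:
        return "interlink"
    return "separate"


# Placeholder SERPs: 8 of 10 URLs in common, as in the running shoes example.
shoes = [f"https://example.com/page-{i}" for i in range(10)]
sneakers = shoes[:8] + ["https://other.example/a", "https://other.example/b"]

n = shared_top10(shoes, sneakers)
print(n, relationship(n))  # 8 same post
```

Working from the raw shared count keeps the thresholds readable: the number in the table is the number the function returns.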
From cluster to architecture
Clusters on their own are just lists of keywords. They become useful when they turn into pages. The seo-cluster skill outputs a full architecture:
- Hub pages: one pillar per cluster, broad pillar content, 2500 to 4000 words.
- Spoke pages: 2 to 4 per cluster, deep dives into specific subtopics, 1200 to 1800 words.
- Internal link matrix: hub-to-spoke and spoke-to-hub mandatory, 2 to 3 spoke-to-spoke within cluster, 0 to 1 cross-cluster, anchor text suggested per link.
- Intent classification per cluster (informational, commercial, transactional, navigational) drives template choice.
- Interactive cluster-map.html sized by total search volume, expandable per cluster.
The architecture is the output. The keywords are just raw material.
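To make the link matrix concrete, here is a minimal sketch of how the in-cluster links for one hub and its spokes could be generated. The function name, the rule for picking lateral siblings (next spokes in list order), and the URLs are assumptions for illustration. Cross-cluster links and anchor text are left out.

```python
def build_link_matrix(hub: str, spokes: list[str], lateral: int = 2) -> list[tuple[str, str]]:
    """Return (source, target) pairs for one cluster.

    Hub-to-spoke and spoke-to-hub links are always emitted. Each spoke also
    links to up to `lateral` siblings, chosen here by list position
    (the selection rule is an assumption).
    """
    links: list[tuple[str, str]] = []
    for spoke in spokes:
        links.append((hub, spoke))
        links.append((spoke, hub))
    n = len(spokes)
    for i, spoke in enumerate(spokes):
        # A spoke cannot link laterally to more siblings than exist.
        for step in range(1, min(lateral, n - 1) + 1):
            links.append((spoke, spokes[(i + step) % n]))
    return links


# Hypothetical URLs for a four-spoke cluster.
links = build_link_matrix(
    "/running-shoes/",
    ["/trail/", "/road/", "/sizing/", "/care/"],
)
print(len(links))  # 16
```

With four spokes and two lateral links each, every spoke ends up with three incoming links: one from the hub and two from siblings.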
Intent classification
SERP composition determines intent. Read the page types Google is serving and you can tell what query type Google thinks it is.
- Informational: SERPs filled with articles, guides, how-tos. Template: ultimate-guide, how-to, explainer, listicle.
- Commercial: SERPs filled with comparison pages, reviews, listicles like "best X for Y". Template: comparison, review, best-of.
- Transactional: SERPs filled with product pages, pricing, buy CTAs. Template: landing-page.
- Navigational: SERPs filled with a single brand's properties. Excluded from clusters.
Each cluster gets one primary intent and 0 to 2 secondary intents. URL pattern and template follow the intent assignment. The skill flags borderline cases like "best CRM software" (commercial dominant, informational secondary) for manual review rather than guessing.
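A rough sketch of that assignment, assuming each top-10 result has already been labeled with a page type. The label set, the 20 percent share needed to count as a secondary intent, and the two-result margin that triggers manual review are all hypothetical values chosen for illustration, not the skill's real settings.

```python
from collections import Counter

# Hypothetical page-type labels mapped to the four intents named above.
PAGE_TYPE_INTENT = {
    "article": "informational",
    "guide": "informational",
    "how-to": "informational",
    "comparison": "commercial",
    "review": "commercial",
    "best-of": "commercial",
    "product": "transactional",
    "pricing": "transactional",
    "brand": "navigational",
}


def classify_intent(page_types: list[str], secondary_share: float = 0.2):
    """Return (primary, secondaries, needs_review) for one cluster's SERP page types."""
    counts = Counter(PAGE_TYPE_INTENT[t] for t in page_types if t in PAGE_TYPE_INTENT)
    if not counts:
        return None, [], True
    total = sum(counts.values())
    ranked = counts.most_common()
    primary, primary_n = ranked[0]
    # Keep at most two secondary intents that hold a meaningful share of the SERP.
    secondaries = [i for i, n in ranked[1:] if n / total >= secondary_share][:2]
    # Flag for manual review when the runner-up is close to the leader (assumed margin).
    needs_review = len(ranked) > 1 and primary_n - ranked[1][1] <= 2
    return primary, secondaries, needs_review


# Made-up SERP for a borderline query: 6 commercial results, 4 informational.
serp = ["comparison"] * 3 + ["best-of"] * 2 + ["review"] + ["guide"] * 3 + ["article"]
print(classify_intent(serp))  # ('commercial', ['informational'], True)
```

A cluster whose primary intent comes back as navigational would be dropped at this point, in line with the exclusion rule above.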
Hub-and-spoke link math
Why this works. A hub page passes PageRank to N spokes. Each spoke passes PageRank back to the hub plus laterally to 2 or 3 relevant siblings. Internal linking is the cheapest way to consolidate ranking signal, and most sites get it wrong by linking everything to the homepage.
Cluster architecture fixes that. Every spoke must have at least 3 incoming internal links. No orphan pages: every post is reachable from the pillar in 2 clicks. Anchor text uses the target keyword or a close variant, never "click here". Link placement sits inside body content, not just navigation and sidebar.
A cannibalization check runs before any page gets written. If two posts share the same primary keyword, the SERP-overlap pass would have caught it at the 7+ threshold and merged them. The architecture is built to not compete with itself.
The link math is not magic. It is a transparent application of how PageRank propagates inside a site. The reason it works is that almost nobody else does it. Most sites have flat link graphs centered on the homepage. A cluster you actually link properly stands out against that baseline, which is why hub-and-spoke remains one of the highest-leverage moves in technical SEO after a decade of practitioners knowing about it.
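The two structural rules (at least 3 incoming links per spoke, every post within 2 clicks of the pillar) are easy to check mechanically. The sketch below is one way to do it with a breadth-first search. The function name, the report keys, and the example URLs are assumptions, not part of the skill.

```python
from collections import defaultdict, deque


def audit_cluster(hub: str, pages: list[str], links: list[tuple[str, str]],
                  min_incoming: int = 3, max_clicks: int = 2) -> dict[str, list[str]]:
    """Return pages that break the incoming-link or click-depth rule."""
    outgoing: dict[str, set[str]] = defaultdict(set)
    incoming: dict[str, set[str]] = defaultdict(set)
    for source, target in links:
        if source != target:  # ignore self-links
            outgoing[source].add(target)
            incoming[target].add(source)

    # Breadth-first search from the hub gives each page's click depth.
    depth = {hub: 0}
    queue = deque([hub])
    while queue:
        page = queue.popleft()
        for nxt in outgoing[page]:
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                queue.append(nxt)

    spokes = [p for p in pages if p != hub]
    return {
        "too_few_incoming": [p for p in spokes if len(incoming[p]) < min_incoming],
        "too_deep_or_orphan": [p for p in spokes if depth.get(p, max_clicks + 1) > max_clicks],
    }


# Hypothetical cluster: three fully linked spokes plus one post nobody links to.
hub = "/running-shoes/"
linked = ["/trail/", "/road/", "/sizing/"]
links = [(hub, s) for s in linked] + [(s, hub) for s in linked]
links += [(a, b) for a in linked for b in linked if a != b]

report = audit_cluster(hub, [hub, *linked, "/care/"], links)
print(report)  # {'too_few_incoming': ['/care/'], 'too_deep_or_orphan': ['/care/']}
```

One consequence worth checking in a plan: in a cluster with only two spokes, a spoke can collect at most two in-cluster links (the hub plus one sibling), so the third incoming link has to come from elsewhere on the site.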
Execution: brief vs build
After /seo cluster <seed-keyword> generates an architecture, two paths exist.
Editorial brief (default). Outputs a JSON plan plus a markdown brief listing the hub and spokes, suggested URLs, target keyword and secondaries per page, template type, word count, and the internal links each page needs to receive and emit. Hand it to a writer or to claude-blog.
Direct execution. If claude-blog is installed, /seo cluster execute hands the plan off and claude-blog writes the pillar plus spokes with the agreed link matrix automatically. Pillar first, then spokes by volume highest first. After each post, previous posts get scanned for backward link placeholders and the new URL gets injected. The result is a fully linked cluster, not a stack of orphaned posts.
/seo cluster running shoes
/seo cluster execute
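The write order and the backward link injection described above can be sketched as follows. Everything here is hypothetical: the plan fields (`slug`, `volume`), the `[[link:slug]]` placeholder syntax, and the volume figures are invented for illustration and do not reflect how claude-blog actually stores or rewrites posts.

```python
def write_order(hub: dict, spokes: list[dict]) -> list[dict]:
    """Pillar first, then spokes by search volume, highest first."""
    return [hub] + sorted(spokes, key=lambda s: s["volume"], reverse=True)


def backfill(published: dict[str, str], new_slug: str, new_url: str) -> dict[str, str]:
    """Replace the placeholder for `new_slug` in every already-written post."""
    placeholder = f"[[link:{new_slug}]]"  # assumed placeholder syntax
    return {slug: body.replace(placeholder, new_url) for slug, body in published.items()}


# Made-up plan entries.
hub = {"slug": "running-shoes", "volume": 50000}
spokes = [
    {"slug": "sizing", "volume": 2000},
    {"slug": "trail", "volume": 9000},
    {"slug": "road", "volume": 6000},
]
print([p["slug"] for p in write_order(hub, spokes)])
# ['running-shoes', 'trail', 'road', 'sizing']

# The pillar was written first with placeholders for spokes that did not exist yet.
published = {"running-shoes": "See [[link:trail]] and [[link:road]]."}
published = backfill(published, "trail", "/trail/")
print(published["running-shoes"])  # See /trail/ and [[link:road]].
```

Running the backfill after each new post means earlier posts pick up their links as soon as the target exists, rather than in a separate pass at the end.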
Community contribution
The cluster skill was contributed by Lutfiya Miller as the Pro Hub Challenge winner. The original engine lives at github.com/Drfiya/semantic-cluster-engine. It was integrated into Claude SEO v1.9.0 with permission and continues to ship in current releases. Credit where credit is due. The Pro Hub Challenge pattern (community contributors building skills that ship in core) has produced 4 of the strongest skills in the suite.
Pair with the rest of Claude SEO
The cluster skill is one node in a graph of skills that compose into a real workflow.
- /seo plan and FLOW find for keyword discovery, then cluster turns discoveries into architecture.
- /seo programmatic for cluster plus variant pages at scale (city, industry, use-case grids).
- /seo dataforseo for live SERP data underlying the cluster instead of WebSearch (faster, more accurate, costs cents).
- /seo content for E-E-A-T quality on each spoke after it is written.
Start now
Step by step.
- Install Claude SEO.
- Pick a seed keyword. Start broad: "running shoes", "remote work", "ecommerce platforms".
- Run /seo cluster <seed-keyword>.
- Open the resulting cluster-map.html in a browser. Explore the clusters, check the link relationships, sanity-check the pillar choice.
- Hand the brief to a writer, or run /seo cluster execute if you have claude-blog installed.
claude /install github:AgriciDaniel/claude-seo
/seo cluster running shoes
Conclusion
SERP overlap is the only clustering signal that matches what Google actually rewards. Text similarity sounds smart and ranks dumb. If two keywords share 8 of 10 top results, they want the same page. If two keywords share 1 of 10, they want different pages. The strings on the screen tell you nothing useful about that question.
Build the architecture on the signal that matches the algorithm. The skill is free, MIT, and ships with every install. One seed keyword, one command, one interactive cluster map. The rest is execution.