Topics and Keyword Clustering

What are topics?

Topics are groups of keywords about the same thing. Sometimes the keywords in a topic are search queries that mean exactly the same thing, expressed slightly differently. More often, a topic mixes those near-identical queries with a few broader and narrower ones.

What is keyword clustering?

Keyword clustering is the process of grouping keywords into topics.

Topics, not keywords

Large keyword lists are comprehensive, but they can be difficult to interpret and act upon: sometimes you can’t see the wood for the trees. Topics cluster keywords into meaningfully named groups.

What are topics useful for?

Matching topics to your site means understanding how your users think and search for your content. You can use this to organize a site in a more user-centric way. You can work out the right demand-driven taxonomy for a site. You can find the best facets or filters that align with your users’ mental models. You can identify missing pages that will drive revenue.

Topics represent demand in the business’s language. Reporting that uses them translates complex demand data into understandable terms for everyone, from the C-suite downward.

Topics structure demand data, allowing you to understand current consumer needs and anticipate future ones. For instance, you might use topic data to decide which brands or product lines to stock.

How does Site Topic cluster keywords into topics?

Site Topic takes raw keyword data and identifies meaningful “labels”. A label is one or more words that together mean something different than the sum of their parts. For instance, dress shirt means something very different than dress or shirt separately (and different from shirt dress!).

Each label belongs to a label group. For instance, dress, shirt, shirt dress, and dress shirt all belong to the category label group. We write labels and label groups like code. Instead of “red is a colour”, we write colour: red.
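As a rough illustration of this notation, one keyword's labels could be held as a mapping from label group to label. The Python below is only a sketch: the function name format_labels and the example data are assumptions made for illustration, not part of Site Topic.

```python
# Hypothetical representation: one keyword's labels as {label_group: label}.
def format_labels(labels: dict[str, str]) -> str:
    """Render labels in the 'group: label' notation used in this article."""
    return ", ".join(f"{group}: {label}" for group, label in labels.items())


# Example data invented for illustration.
keyword_labels = {"colour": "red", "category": "dress shirt"}
print(format_labels(keyword_labels))  # colour: red, category: dress shirt
```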

Sometimes, people don’t explicitly state what they’re looking for when they search. Instead, much of the meaning is implicit, but obvious to other humans. For instance, searching for iphone 15 implies looking for an Apple product. So, we’d add brand: apple to the labels for that keyword.
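One simple way to picture implicit labels is a lookup from a label that is already known to the labels it implies. This is a minimal sketch under stated assumptions: the IMPLIED_LABELS table, the product label group, and the add_implied_labels function are all hypothetical, and a real system would likely rely on far richer knowledge than a hand-written table.

```python
# Hypothetical table: a known (group, label) pair and the labels it implies.
IMPLIED_LABELS = {
    ("product", "iphone 15"): {"brand": "apple"},
}


def add_implied_labels(labels: dict[str, str]) -> dict[str, str]:
    """Return a copy of labels with any implied labels filled in."""
    enriched = dict(labels)
    for group, label in labels.items():
        implied = IMPLIED_LABELS.get((group, label), {})
        for implied_group, implied_label in implied.items():
            # Never overwrite something the searcher stated explicitly.
            enriched.setdefault(implied_group, implied_label)
    return enriched


print(add_implied_labels({"product": "iphone 15"}))
# {'product': 'iphone 15', 'brand': 'apple'}
```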

Keywords are often misspelled or expressed with different synonyms. Site Topic standardizes around one label to cover all expressions.
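To make the idea of standardizing concrete, here is a small sketch that maps spelling variants and synonyms onto one label. The VARIANTS table and the standardize function are assumptions invented for this example; they show the general idea only, not how Site Topic does it.

```python
# Hypothetical variant table: each misspelling or synonym maps to one standard label.
VARIANTS = {
    "tshirt": "t-shirt",
    "t shirt": "t-shirt",
    "tee": "t-shirt",
}


def standardize(label: str) -> str:
    """Lowercase, collapse whitespace, then map known variants to one label."""
    cleaned = " ".join(label.lower().split())
    # Unknown labels pass through unchanged (apart from the cleanup above).
    return VARIANTS.get(cleaned, cleaned)


for raw in ["Tee", "T  Shirt", "tshirt", "dress"]:
    print(raw, "->", standardize(raw))
```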

Why use labels and label groups?

Some keywords are ambiguous. Writing a keyword as a set of labels and label groups helps clarify the interpretation. For instance (keeping the fashion theme going), red valentino dress might be interpreted either as

  • colour: red, brand: valentino, category: dress, type: clothing
  • or as brand: red valentino, category: dress, type: clothing
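The two readings of red valentino dress can be written down as data, which makes it easy to see where they agree and where they differ. This is an illustrative sketch only; the shared_labels helper is a hypothetical name, and choosing between the readings is a separate problem not shown here.

```python
# The two candidate interpretations from the text, as {label_group: label}.
interpretations = [
    {"colour": "red", "brand": "valentino", "category": "dress", "type": "clothing"},
    {"brand": "red valentino", "category": "dress", "type": "clothing"},
]


def shared_labels(candidates: list[dict[str, str]]) -> dict[str, str]:
    """Return the labels that every candidate interpretation agrees on."""
    common = set(candidates[0].items())
    for candidate in candidates[1:]:
        common &= set(candidate.items())
    return dict(sorted(common))


print(shared_labels(interpretations))
# {'category': 'dress', 'type': 'clothing'}
```

Both readings agree on category and type; only colour and brand are in dispute.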

Interpreting or translating a keyword into labels and groups is an example of query understanding, a core part of search engine functionality.

Labels and label groups let us group keywords meaningfully because they have universally understood names.
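Once keywords carry labels, grouping them into named topics can be as simple as collecting keywords that share a label. The sketch below assumes the keywords have already been labelled; the example data and the group_by function are invented for illustration and are not Site Topic's method.

```python
from collections import defaultdict

# Example data invented for illustration: keyword -> {label_group: label}.
labelled_keywords = {
    "red dress": {"colour": "red", "category": "dress"},
    "red dresses": {"colour": "red", "category": "dress"},
    "blue dress shirt": {"colour": "blue", "category": "dress shirt"},
}


def group_by(labelled: dict[str, dict[str, str]], group: str) -> dict[str, list[str]]:
    """Collect keywords under a topic named after their label in one group."""
    topics = defaultdict(list)
    for keyword, labels in labelled.items():
        if group in labels:
            topics[f"{group}: {labels[group]}"].append(keyword)
    return dict(topics)


print(group_by(labelled_keywords, "category"))
# {'category: dress': ['red dress', 'red dresses'], 'category: dress shirt': ['blue dress shirt']}
```

Grouping the same data by colour instead would give a different, equally readable set of topics.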

What’s the difference between labels and entities?

You might think, “hang on, these things look like named entities!” and you’d be right. Labels are closely related to entities, but subtly different. If you look up a definition of named entity recognition, it says:

Named-entity recognition (NER) seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

But Site Topic doesn’t have a list of pre-defined categories — entity names or label groups. Instead, it works these out as it turns keywords into labels. There are common ones that appear repeatedly, but Site Topic discovers these on the fly, then builds them into standardized structures.

NER is top-down and Site Topic is bottom-up. Both have their place. Site Topic’s approach is powerful because it allows us to structure messy demand data dynamically.