This article was co-authored by Andrew Ansley.
Things, not strings. If you haven’t heard this before, it comes from a famous Google blog post that announced the Knowledge Graph.
The announcement’s 11th anniversary is only a month away, yet many still struggle to understand what “things, not strings” really means for SEO.
The quote is an attempt to convey that Google understands things and is no longer a simple keyword detection algorithm.
In May 2012, one could argue that entity SEO was born. Google’s machine learning, aided by semi-structured and structured knowledge bases, could understand the meaning behind a keyword.
The ambiguous nature of language finally had a long-term solution.
So if entities have been important for Google for over a decade, why are SEOs still confused about entities?
Good question. I see four reasons:
This article is a solution to all four problems that have prevented SEOs from fully mastering an entity-based approach to SEO.
By reading this, you’ll learn:
Entity SEO reflects where search engines are headed in choosing what content to rank and determining its meaning.
Combine this with knowledge-based trust, and I believe entity SEO will define how SEO is done within the next two years.
Examples of entities
So how do you recognize an entity?
The SERP has several examples of entities that you’ve likely seen.
The most common types of entities are related to locations, people, or businesses.
Perhaps the best example of entities in the SERP is intent clusters. The more a topic is understood, the more these search features emerge.
Interestingly enough, a single SEO campaign can alter the face of the SERP when you know how to execute entity-focused SEO campaigns.
Wikipedia entries are another example of entities, and they illustrate the kind of information associated with an entity.
The entry for “fish,” for instance, covers all sorts of attributes of that entity, ranging from its anatomy to its importance to humans.
While Wikipedia contains many data points on a topic, it is by no means exhaustive.
What is an entity?
An entity is a uniquely identifiable object or thing characterized by its name(s), type(s), attributes, and relationships to other entities. An entity is only considered to exist when it exists in an entity catalog.
Entity catalogs assign a unique ID to each entity. My agency has programmatic solutions that use the unique ID associated with each entity (services, products, and brands are all included).
Absence from an existing catalog does not mean a word or phrase is not an entity, but presence in a catalog is usually the simplest way to confirm that something is one.
It is important to note that Wikipedia is not the deciding factor on whether something is an entity; it is simply the best-known catalog of entities.
Any catalog can be used when talking about entities. Typically, an entity is a person, place, or thing, but ideas and concepts can also be included.
Some examples of entity catalogs include Wikipedia, Wikidata, Freebase, DBpedia, and Google’s Knowledge Graph.
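To make the catalog idea concrete, here is a minimal sketch of resolving a name to its unique ID in one such catalog, Wikidata, through its public wbsearchentities API (a real endpoint; the error handling is pared down for illustration).

```python
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def lookup_entity_id(name: str, language: str = "en") -> str | None:
    """Resolve a surface form to a Wikidata item ID (a stable 'Q' number)."""
    params = {
        "action": "wbsearchentities",  # Wikidata's entity search action
        "search": name,
        "language": language,
        "format": "json",
    }
    response = requests.get(WIKIDATA_API, params=params, timeout=10)
    response.raise_for_status()
    results = response.json().get("search", [])
    # The first result is the catalog's best match for this name.
    return results[0]["id"] if results else None

print(lookup_entity_id("fish"))  # prints the Q-number for the "fish" item
```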
Entities help to bridge the gap between the worlds of unstructured and structured data.
They can be used to semantically enrich unstructured text, while textual sources may be utilized to populate structured knowledge bases.
Recognizing mentions of entities in text and associating these mentions with the corresponding entries in a knowledge base is known as the task of entity linking.
Entities allow for a better understanding of the meaning of text, both for humans and for machines.
While humans can relatively easily resolve the ambiguity of entities based on the context in which they are mentioned, this presents many difficulties and challenges for machines.
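As a toy illustration of entity linking, the sketch below matches surface forms against a small invented catalog and resolves ambiguity with a commonness (popularity) score alone; a real linker would also weigh the surrounding context.

```python
# Toy catalog: surface form -> candidate (entity ID, commonness score).
# Commonness = how often the surface form links to that entity in a corpus.
# The IDs mimic Wikidata Q-numbers, but the catalog itself is made up.
CATALOG = {
    "apple": [("Q312", 0.7),     # the technology company
              ("Q89", 0.3)],     # the fruit
    "python": [("Q28865", 0.6),  # the programming language
               ("Q2102", 0.4)],  # the snake
}

def link_entities(text: str) -> list[tuple[str, str]]:
    """Return (mention, entity ID) pairs for known surface forms in the text."""
    links = []
    for token in text.lower().split():
        candidates = CATALOG.get(token.strip(".,!?"))
        if candidates:
            # Naive disambiguation: always pick the most common sense.
            best_id, _ = max(candidates, key=lambda c: c[1])
            links.append((token, best_id))
    return links

print(link_entities("Apple announced a new Python library."))
# [('apple', 'Q312'), ('python', 'Q28865')]
```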
The knowledge base entry of an entity summarizes what we know about that entity.
As the world constantly changes, new facts keep surfacing. Keeping up with these changes requires a continuous effort from editors and content managers, which is a demanding task at scale.
By analyzing the contents of documents in which entities are mentioned, the process of finding new facts or facts that need updating may be supported or even fully automated.
Scientists refer to this as the problem of knowledge base population, which is why entity linking is important.
Entities facilitate a semantic understanding of the user’s information need, as expressed by the keyword query, and the document’s content. Entities thus may be used to improve query and/or document representations.
In the Extended Named Entity research paper, the author identifies around 160 entity types, presented as a detailed hierarchical list.
Certain categories of entities are more easily defined, but it’s important to remember that concepts and ideas are entities. Those two categories are very difficult for Google to scale on its own.
You can’t teach Google with just a single page when working with vague concepts. Entity understanding requires many articles and many references sustained over time.
Google’s history with entities
On July 16, 2010, Google purchased Freebase. This purchase was the first major step that led to the current entity search system.
After investing in Freebase, Google realized that Wikidata had a better solution. Google then worked to merge Freebase into Wikidata, a task that was far more difficult than expected.
Five Google scientists wrote a paper titled “From Freebase to Wikidata: The Great Migration.” Key takeaways include:
“Freebase is built on the notions of objects, facts, types, and properties. Each Freebase object has a stable identifier called a “mid” (for Machine ID).”
“Wikidata’s data model relies on the notions of item and statement. An item represents an entity, has a stable identifier called “qid”, and may have labels, descriptions, and aliases in multiple languages; further statements and links to pages about the entity in other Wikimedia projects – most prominently Wikipedia. Contrary to Freebase, Wikidata statements do not aim to encode true facts, but claims from different sources, which can also contradict each other…”
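Translating that description into code, here is a rough sketch of the Wikidata item model in Python types; the field names follow the quote above, but the classes are only an illustration, not an actual Wikidata client.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    """A claim about an item, attributed to a source; claims may contradict."""
    property_id: str        # e.g., "P31" ("instance of")
    value: str
    source: str = "unknown"

@dataclass
class WikidataItem:
    """An entity: a stable 'qid' plus multilingual labels and statements."""
    qid: str                                                     # e.g., "Q42"
    labels: dict[str, str] = field(default_factory=dict)         # language -> label
    descriptions: dict[str, str] = field(default_factory=dict)   # language -> text
    aliases: dict[str, list[str]] = field(default_factory=dict)  # language -> names
    statements: list[Statement] = field(default_factory=list)
    sitelinks: dict[str, str] = field(default_factory=dict)      # Wikimedia pages

item = WikidataItem(
    qid="Q42",                            # Douglas Adams on Wikidata
    labels={"en": "Douglas Adams"},
    statements=[Statement("P31", "Q5")],  # instance of: human
)
```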
Entities are defined in these knowledge bases, but Google still had to build its entity knowledge for unstructured data (i.e., blogs).
Google partnered with Bing and Yahoo and created Schema.org to accomplish this task.
Google publishes schema guidelines so website managers have tools that help Google understand their content. Remember, Google wants to focus on things, not strings.
In Google’s words:
“You can help us by providing explicit clues about the meaning of a page to Google by including structured data on the page. Structured data is a standardized format for providing information about a page and classifying the page content; for example, on a recipe page, what are the ingredients, the cooking time and temperature, the calories, and so on.”
Google continues by saying:
“You must include all the required properties for an object to be eligible for appearance in Google Search with enhanced display. In general, defining more recommended features can make it more likely that your information can appear in Search results with enhanced display. However, it is more important to supply fewer but complete and accurate recommended properties rather than trying to provide every possible recommended property with less complete, badly-formed, or inaccurate data.”
More could be said about schema, but suffice it to say schema is an incredible tool for SEOs looking to make page content clear to search engines.
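As a sketch of what that looks like in practice, the snippet below assembles the recipe structured data Google describes as JSON-LD; the property names are real schema.org Recipe properties, while the recipe itself is a placeholder.

```python
import json

# schema.org Recipe markup for the kind of page Google describes above.
recipe_jsonld = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Simple Tomato Soup",
    "recipeIngredient": ["4 tomatoes", "1 onion", "2 cups vegetable stock"],
    "cookTime": "PT30M",  # ISO 8601 duration: 30 minutes
    "nutrition": {
        "@type": "NutritionInformation",
        "calories": "150 calories",
    },
}

# Embed the output in a <script type="application/ld+json"> tag on the page.
print(json.dumps(recipe_jsonld, indent=2))
```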
The last piece of the puzzle comes from Google’s blog announcement titled “Improving Search for The Next 20 Years.”
Document relevance and quality are the main ideas behind this announcement. The first method Google used for determining the content of a page was entirely focused on keywords.
Google then added topic layers to search. This layer was made possible by knowledge graphs and by systematically scraping and structuring data across the web.
That brings us to the current search system. Google went from 570 million entities and 18 billion facts to 8 billion entities and 800 billion facts in less than 10 years. As this number grows, entity search improves.
How is the entity model an improvement from previous search models?
Traditional keyword-based information retrieval (IR) models have an inherent limitation of not being able to retrieve (relevant) documents that have no explicit term matches with the query.
If you use ctrl + f to find text on a page, you use something similar to the traditional keyword-based information retrieval model.
An insane amount of data is published on the web every day.
It simply isn’t feasible for Google to understand the meaning of every word, every paragraph, every article, and every website.
Instead, entities provide a structure from which Google can minimize the computational load while improving understanding.
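A toy demonstration of that limitation, using made-up documents: exact term matching retrieves only the document that shares the query's literal words and misses a relevant one phrased with synonyms, which is the gap concept- and entity-based retrieval is meant to close.

```python
DOCS = {
    "doc1": "Our guide to buying a car covers financing and insurance.",
    "doc2": "Choosing the right automobile: vehicle financing explained.",
}

def keyword_match(query: str, doc: str) -> bool:
    """Traditional term matching, like ctrl + f: no shared terms, no match."""
    tokens = doc.lower().split()
    return any(term in tokens for term in query.lower().split())

query = "car loans"
for doc_id, text in DOCS.items():
    print(doc_id, keyword_match(query, text))
# doc1 True  (shares the term "car")
# doc2 False (relevant, but says "automobile" and "vehicle" instead)
```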
“Concept-based retrieval methods attempt to tackle this challenge by relying on auxiliary structures to obtain semantic representations of queries and documents in a higher-level concept space. Such structures include controlled vocabularies (dictionaries and thesauri), ontologies, and entities from a knowledge repository.”
– Entity-Oriented Search, Chapter 8.3
Krisztian Balog, who wrote the definitive book on entities, identifies three possible solutions to the limitations of the traditional information retrieval model.
The goal of these three approaches is to gain a richer representation of the user’s information need by identifying entities strongly related to the query.
Balog then identifies six algorithms associated with projection-based methods of entity mapping (projection methods map queries and documents into a latent entity space and compare the resulting vectors geometrically, e.g., by cosine similarity).
Here is what Balog writes:
“A total of four attention features are designed, which are extracted for each query entity. Entity ambiguity features are meant to characterize the risk associated with an entity annotation. These are: (1) the entropy of the probability of the surface form being linked to different entities (e.g., in Wikipedia), (2) whether the annotated entity is the most popular sense of the surface form (i.e., has the highest commonness score), and (3) the difference in commonness scores between the most likely and second most likely candidates for the given surface form. The fourth feature is closeness, which is defined as the cosine similarity between the query entity and the query in an embedding space. Specifically, a joint entity-term embedding is trained using the skip-gram model on a corpus, where entity mentions are replaced with the corresponding entity identifiers. The query’s embedding is taken to be the centroid of the query terms’ embeddings.”
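To ground that passage, here is a sketch that computes the four attention features for a single ambiguous query entity; the commonness scores and embeddings are invented stand-ins for values that would normally be estimated from a corpus.

```python
import math

def entropy(probs: list[float]) -> float:
    """Feature 1: entropy of the surface form's link distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

# Invented commonness scores: how often "jaguar" links to each candidate.
commonness = {"Jaguar_Cars": 0.6, "Jaguar_(animal)": 0.3, "Jacksonville_Jaguars": 0.1}
annotated = "Jaguar_Cars"  # the entity chosen for this query annotation

ranked = sorted(commonness.values(), reverse=True)
ambiguity = entropy(list(commonness.values()))     # feature 1
is_top_sense = commonness[annotated] == ranked[0]  # feature 2
top_two_gap = ranked[0] - ranked[1]                # feature 3

# Feature 4 (closeness): cosine similarity between the entity embedding and
# the query embedding, taken as the centroid of the query terms' embeddings.
entity_vec = [0.8, 0.1, 0.3]                    # placeholder entity embedding
term_vecs = [[0.6, 0.2, 0.2], [0.9, 0.1, 0.4]]  # placeholder term embeddings
centroid = [sum(dim) / len(term_vecs) for dim in zip(*term_vecs)]
closeness = cosine(entity_vec, centroid)

print(ambiguity, is_top_sense, top_two_gap, round(closeness, 3))
```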
For now, it is important to have surface-level familiarity with these six entity-centric algorithms.
The main takeaway is that two approaches exist: projecting documents to a latent entity layer and explicit entity annotations of documents.
Three types of data structures
Complex relationships between entities can be represented in vector space. While knowledge graph connections are the clearest example, this same pattern can be replicated on a page-by-page schema level.
To understand entities, it is important to know the three types of data structures that algorithms use.