Wednesday, November 4, 2020

HigherVisibility Wins Two SEO Agency of the Year Awards in 2020

SEO

The accolades from Search Engine Land and the US Search Awards cap off a banner year.

HigherVisibility was presented with two SEO Agency of the Year awards in the month of October, one from Search Engine Land and one from the US Search Awards.

The awards are judged by industry professionals and peers and highlight the most successful companies and campaigns in the SEO industry.

“Winning the most respected awards in our industry during such a uniquely challenging time is a testament to the hard work that our team puts in for our clients,” Managing Partner Scott Langdon said. “We couldn’t be more proud of our staff and we are thankful for the clients that have put their faith in us.”

The US Search Awards featured a vast panel of judges to peer review the finalists for their awards. HigherVisibility beat out a shortlist of six other finalists to win the award. Kevin Gibbons, the CEO at Re:signal and the judge presenting the award, cited HigherVisibility’s “great results, both for clients and in their own growth” as the reasons for our victory.

Search Engine Land’s panel of judges included many of their top in-house editors, as well as a group of marketing professionals specializing in a variety of fields. The Search Engine Land Awards – coined as Search Marketing’s Highest Honors – celebrate those within the search marketing community who have achieved excellence in their organic and paid search campaigns as well as overall research.

This is not the first time that HigherVisibility has been named SEO Agency of the Year by Search Engine Land; we also won the award in 2017, making us two-time winners of the award.


HigherVisibility Wins Two SEO Agency of the Year Awards in 2020 was originally posted by Video And Blog Marketing

Wednesday, October 14, 2020

Adjusting Featured Snippet Answers by Context

How Are Featured Snippet Answers Decided Upon?

I recently wrote about Featured Snippet Answer Scores Ranking Signals. In that post, I described how Google was likely using query dependent and query independent ranking signals to create answer scores for queries that were looking like they wanted answers.

One of the inventors of that patent from that post was Steven Baker. I looked at other patents that he had written, and noticed that one of those was about context as part of query independent ranking signals for answers.

Remembering that patent about question-answering and context, I felt it was worth reviewing that patent and writing about it.

This patent is about processing question queries that want textual answers and how those answers may be decided upon.

It is a complicated patent, and at one point the description behind it seems to get a bit murky. I point out where that happens in the patent, and I think the other details provide a lot of insight into how Google is scoring featured snippet answers. There is an additional related patent that I will be following up with after this post, and I will link to it from here as well.

This patent starts by telling us that a search system can identify resources in response to queries submitted by users and provide information about the resources in a manner that is useful to the users.

How Context Scoring Adjustments for Featured Snippet Answers Work

Users of search systems are often searching for an answer to a specific question, rather than a listing of resources, like in this drawing from the patent, showing featured snippet answers:

[Image: featured snippet answers]

For example, users may want to know what the weather is in a particular location, a current quote for a stock, the capital of a state, etc.

When queries that are in the form of a question are received, some search engines may perform specialized search operations in response to the question format of the query.

For example, some search engines may provide information responsive to such queries in the form of an “answer,” such as information provided in the form of a “one box” to a question, which is often a featured snippet answer.

Some question queries are better served by explanatory answers, which are also referred to as “long answers” or “answer passages.”

For example, for the question query [why is the sky blue], an answer explaining light as waves is helpful.

[Image: featured snippet answers - why is the sky blue]

Such answer passages can be selected from resources that include text, such as paragraphs, that are relevant to the question and the answer.

Sections of the text are scored, and the section with the best score is selected as an answer.

In general, the patent tells us about one aspect of what it covers in the following process:

  • Receiving a query that is a question query seeking an answer response
  • Receiving candidate answer passages, each passage made of text selected from a text section subordinate to a heading on a resource, with a corresponding answer score
  • Determining a hierarchy of headings on a page, with two or more heading levels hierarchically arranged in parent-child relationships, where each heading level has one or more headings, a subheading of a respective heading is a child heading in a parent-child relationship and the respective heading is a parent heading in that relationship, and the heading hierarchy includes a root level corresponding to a root heading
  • For each candidate answer passage, determining a heading vector describing a path in the hierarchy of headings from the root heading to the respective heading to which the candidate answer passage is subordinate, determining a context score based, at least in part, on the heading vector, and adjusting the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score
  • Selecting an answer passage from the candidate answer passages based on the adjusted answer scores

Advantages of the process in the patent

  1. Long query answers can be selected, based partially on context signals indicating answers relevant to a question
  2. The context signals may be, in part, query-independent (i.e., scored independently of their relatedness to terms of the query)
  3. This part of the scoring process considers the context of the document (“resource”) in which the answer text is located, accounting for relevancy signals that may not otherwise be accounted for during query-dependent scoring
  4. Following this approach, long answers that are more likely to satisfy a searcher’s informational need are more likely to appear as answers

This patent can be found at:

Context scoring adjustments for answer passages
Inventors: Nitin Gupta, Srinivasan Venkatachary, Lingkun Chu, and Steven D. Baker
US Patent: 9,959,315
Granted: May 1, 2018
Appl. No.: 14/169,960
Filed: January 31, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for context scoring adjustments for candidate answer passages.

In one aspect, a method includes scoring candidate answer passages. For each candidate answer passage, the system determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading to which the candidate answer passage is subordinate; determines a context score based, at least in part, on the heading vector; and adjusts answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The system then selects an answer passage from the candidate answer passages based on the adjusted answer scores.

Using Context Scores to Adjust Answer Scores for Featured Snippets

A drawing from the patent shows different hierarchical headings that may be used to determine the context of answer passages that may be used to adjust answer scores for featured snippets:

[Image: hierarchical headings for featured snippets]

I discuss these headings and their hierarchy below. Note that the headings include the Page title as a heading (About the Moon), and the headings within heading elements on the page as well. And those headings give those answers context.

This context scoring process starts with receiving candidate answer passages and a score for each of the passages.

Those candidate answer passages and their respective scores are provided to a search engine that receives a query determined to be a question.

Each of those candidate answer passages is text selected from a text section under a particular heading from a specific resource (page) that has a certain answer score.

For each resource where a candidate answer passage has been selected, a context scoring process determines a heading hierarchy in the resource.

A heading is text or other data corresponding to a particular passage in the resource.

As an example, a heading can be text summarizing a section of text that immediately follows the heading (the heading describes what the text that follows it, or is contained within it, is about).

Headings may be indicated, for example, by specific formatting data, such as heading elements using HTML.

A heading could also be anchor text for an internal link (within the same page) that links to an anchor and corresponding text at some other position on the page.

A heading hierarchy could have two or more heading levels that are hierarchically arranged in parent-child relationships.

The first level, or the root heading, could be the title of the resource.

Each of the heading levels may have one or more headings, and a subheading of a respective heading is a child heading and the respective heading is a parent heading in the parent-child relationship.

For each candidate passage, a context scoring process may determine a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

To determine the context score, the context scoring process determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The context score could be based, at least in part, on the heading vector.

The context scoring process can then adjust the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The context scoring process can then select an answer passage from the candidate answer passages based on adjusted answer scores.

This flowchart from the patent shows the context scoring adjustment process:

[Image: context scoring adjustment flowchart]

Identifying Question Queries And Answer Passages

I’ve written about understanding the context of answer passages. The patent tells us more about question queries and answer passages worth going over in more detail.

Some queries are in the form of a question or an implicit question.

For example, the query [distance of the earth from the moon] is in the form of an implicit question “What is the distance of the earth from the moon?”

[Image: an implicit question - the distance from the earth to the moon]

Likewise, a question may be specific, as in the query [How far away is the moon].

The search engine includes a query question processor that uses processes that determine whether a query is a question query (implicit or specific) and, if it is, whether there are answers that are responsive to the question.

The query question processor can use several different algorithms to determine whether a query is a question and whether there are particular answers responsive to the question.

For example, to determine question queries and answers, it may use:

  • Language models
  • Machine learned processes
  • Knowledge graphs
  • Grammars
  • Combinations of those

The query question processor may choose candidate answer passages in addition to or instead of answer facts. For example, for the query [how far away is the moon], an answer fact is 238,900 miles. And the search engine may just show that factual information since that is the average distance of the Earth from the moon.

But the query question processor may also choose to identify passages that are very relevant to the question query.

These passages are called candidate answer passages.

The answer passages are scored, and one passage is selected based on these scores and provided in response to the query.

An answer passage may be scored, and that score may be adjusted based on a context, which is the point behind this patent.

Often Google will identify several candidate answer passages that could be used as featured snippet answers.

Google may look at the information on the pages where those answers come from to better understand the context of the answers such as the title of the page, and the headings about the content that the answer was found within.

Contextual Scoring Adjustments for Featured Snippet Answers

The query question processor sends to a context scoring processor some candidate answer passages, information about the resources from which each answer passage was taken, and a score for each of the candidate answer passages.

The scores of the candidate answer passages could be based on the following considerations:

  • Matching a query term to the text of the candidate answer passage
  • Matching answer terms to the text of the candidate answer passages
  • The quality of the underlying resource from which the candidate answer passage was selected

I recently wrote about featured snippet answer scores, and how a combination of query dependent and query independent scoring signals might be used to generate answer scores for answer passages.

The patent tells us that the query question processor may also take into account other factors when scoring candidate answer passages.

Candidate answer passages can be selected from the text of a particular section of the resource. And the query question processor could choose more than one candidate answer passage from a text section.

We are given the following examples of different answer passages from the same page:

(These example answer passages are referred to in a few places in the remainder of the post.)

  • (1) It takes about 27 days (27 days, 7 hours, 43 minutes, and 11.6 seconds) for the Moon to orbit the Earth at its orbital distance
  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Each of those answers could be good ones for Google to use. We are told that:

More than three candidate answers can be selected from the resource, and more than one resource can be processed for candidate answers.

How would Google choose between those three possible answers?

Google might decide based on the number of sentences and a selection of up to a maximum number of characters.

The patent tells us this about choosing between those answers:

Each candidate answer has a corresponding score. For this example, assume that candidate answer passage (2) has the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1). Thus, without the context scoring processor, candidate answer passage (2) would have been provided in the answer box of FIG. 2. However, the context scoring processor takes into account the context of the answer passages and adjusts the scores provided by the query question processor.

So, we see that what might be chosen based on featured snippet answer scores could be adjusted based on the context of that answer from the page that it appears on.

Contextually Scoring Featured Snippet Answers

This process begins with a query determined to be a question query seeking an answer response.

This process next receives candidate answer passages, each candidate answer passage chosen from the text of a resource.

Each of the candidate answer passages is text chosen from a text section that is subordinate to a respective heading (under a heading) in the resource and has a corresponding answer score.

For example, the query question processor provides the candidate answer passages, and their corresponding scores, to the context scoring processor.

A Heading Hierarchy to Determine Context

This process then determines a heading hierarchy from the resource.

The heading hierarchy would have two or more heading levels hierarchically arranged in parent-child relationships (such as a page title and an HTML heading element).

Each heading level has one or more headings.

A subheading of a respective heading is a child heading in the parent-child relationship (an h2 heading might be a subheading of a title, for example), and the respective heading is the parent heading in that relationship.

The heading hierarchy includes a root level corresponding to a root heading.

The context scoring processor can process heading tags in a DOM tree to determine a heading hierarchy.

[Image: hierarchical headings for featured snippets]

For example, concerning the drawing about the distance to the moon just above, the heading hierarchy for the resource may be:

The root heading (title) is:

Root: About The Moon (310)

A main heading (h1) on the page:

H1: The Moon’s Orbit (330)

A secondary heading (h2) on the page:

H2: How long does it take for the Moon to orbit Earth? (334)

Another secondary heading (h2) on the page:

H2: The distance from the Earth to the Moon (338)

Another main heading (h1) on the page:

H1: The Moon (360)

A secondary heading (h2) on the page:

H2: Age of the Moon (364)

Another secondary heading (h2) on the page:

H2: Life on the Moon (368)

Here is how the patent describes this heading hierarchy:

In this heading hierarchy, The title is the root heading at the root level; headings 330 and 360 are child headings of the heading, and are at a first level below the root level; headings 334 and 338 are child headings of the heading 330, and are at a second level that is one level below the first level, and two levels below the root level; and headings 364 and 368 are child headings of the heading 360, and are at a second level that is one level below the first level, and two levels below the root level.
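The patent does not publish code for this step, so the following is only a minimal sketch of how a heading hierarchy like the one above could be read out of a page. It assumes the title acts as the root (level 0) and that h1 and h2 elements nest beneath it; the names HeadingCollector and parent_of_each and the sample page are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs: <title> is level 0, <h1> is 1, <h2> is 2, and so on."""

    LEVELS = {"title": 0, "h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently being read, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._level = self.LEVELS[tag]
            self._buf = []

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in self.LEVELS and self._level is not None:
            self.headings.append((self._level, "".join(self._buf).strip()))
            self._level = None


def parent_of_each(headings):
    """Map each heading's index to the index of its parent heading (None for the root)."""
    parents = {}
    stack = []  # indices of the headings that are still "open" ancestors
    for i, (level, _) in enumerate(headings):
        # Close every heading at the same or a deeper level.
        while stack and headings[stack[-1]][0] >= level:
            stack.pop()
        parents[i] = stack[-1] if stack else None
        stack.append(i)
    return parents


# Hypothetical page that mirrors the headings in the patent drawing.
PAGE = """<html><head><title>About The Moon</title></head><body>
<h1>The Moon's Orbit</h1>
<h2>How long does it take for the Moon to orbit Earth?</h2>
<h2>The distance from the Earth to the Moon</h2>
<h1>The Moon</h1>
<h2>Age of the Moon</h2>
<h2>Life on the Moon</h2>
</body></html>"""

collector = HeadingCollector()
collector.feed(PAGE)
headings = collector.headings
parents = parent_of_each(headings)

for i, (level, text) in enumerate(headings):
    parent = headings[parents[i]][1] if parents[i] is not None else None
    print(f"{'  ' * level}{text}  (parent: {parent})")
```

The stack keeps only the headings that can still be ancestors, which is enough to recover the parent-child relationships the patent describes without building a full DOM tree.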

The process from the patent determines a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

This score may be based on a heading vector.

The patent says that the process, for each of the candidate answer passages, determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The heading vector would include the text of the headings for the candidate answer passage.

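For the example candidate answer passages (1)-(3) above about how long it takes the moon to orbit the Earth, the respectively corresponding heading vectors V1, V2 and V3 are:

  • V1=<[Root: About The Moon], [H1: The Moon’s Orbit], [H2: How long does it take for the Moon to orbit Earth?]>
  • V2=<[Root: About The Moon], [H1: The Moon’s Orbit], [H2: The distance from the Earth to the Moon]>
  • V3=<[Root: About The Moon], [H1: The Moon’s Orbit], [H2: The distance from the Earth to the Moon]>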

We are also told that because candidate answer passages (2) and (3) are selected from the same text section 340, their respective heading vectors V2 and V3 are the same (they are both in the content under the same (H2) heading.)
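A heading vector, as described here, is the path of heading text from the root down to the heading a passage sits under. The sketch below shows one way to compute such paths from a flat list of (level, text) headings. The function name heading_vectors and the level numbering (0 for the title, 1 for h1, 2 for h2) are assumptions for illustration, not something the patent specifies.

```python
def heading_vectors(headings):
    """For each heading, return the list of heading texts from the root down to it."""
    vectors = []
    path = []  # (level, text) of the headings currently open above this point
    for level, text in headings:
        # Drop headings at the same or a deeper level; they are not ancestors.
        while path and path[-1][0] >= level:
            path.pop()
        path.append((level, text))
        vectors.append([t for _, t in path])
    return vectors


# Headings from the moon example: 0 = title (root), 1 = h1, 2 = h2.
HEADINGS = [
    (0, "About The Moon"),
    (1, "The Moon's Orbit"),
    (2, "How long does it take for the Moon to orbit Earth?"),
    (2, "The distance from the Earth to the Moon"),
    (1, "The Moon"),
    (2, "Age of the Moon"),
    (2, "Life on the Moon"),
]

vectors = heading_vectors(HEADINGS)
v1 = vectors[2]       # passage (1) sits under the "How long..." heading
v2 = v3 = vectors[3]  # passages (2) and (3) share the "distance" heading

print(v1)
print(v2)
```

Because passages (2) and (3) come from the same text section, they get the same vector, so any difference in their adjusted scores has to come from the other features the patent describes.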

The process of adjusting a score, for each answer passage, uses a context score based, at least in part, on the heading vector (410).

That context score can be a single score used to scale the candidate answer passage score or can be a series of discrete scores/boosts that can be used to adjust the score of the candidate answer passage.

Where things Get Murky in This Patent

There do seem to be several related patents involving featured snippet answers, and this one which targets learning more about answers from their context based on where they fit in a heading hierarchy makes sense.

But, I’m confused by how the patent tells us that one answer based on the context would be adjusted over another one.

The first issue I have is that the answers they are comparing in the same contextual area have some overlap. Here those two are:

  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Note that the second answer and the third answer both include the same line: “Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles.” I find myself a little surprised that the second answer includes a couple of sentences that aren’t in the third answer, and skips a couple of lines from the third answer, and then includes the last sentence, which answers the question.

Since they both appear in the same heading and subheading section of the page they are from, it is difficult to imagine that there is a different adjustment based on context. But, the patent tells us differently:

The candidate answer passage with the highest adjusted answer score (based on context from the headings) is selected as the answer passage.

Recall that in the example above, the candidate answer passage (2) had the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1).

However, after adjustments, candidate answer passage (3) has the highest score, followed by candidate answer passage (2), and then candidate answer passage (1).

Accordingly, candidate answer passage (3) is selected and provided as the answer passage of FIG. 2.

Boosting Scores Based on Passage Coverage Ratio

A query question processor may limit the candidate answers to a maximum length.

The context scoring processor determines a coverage ratio which is a measure indicative of the coverage of the candidate answer passage from the text which it was selected from.

The patent describes alternative question answers:

Alternatively, the text block may include text sections subordinate to respective headings that include a first heading for which the text section from which the candidate answer passage was selected is subordinate, and sibling headings that have an immediate parent heading in common with the first heading. For example, for the candidate answer passage, the text block may include all the text in the portion 380 of the hierarchy; or may include only the text of the sections, of some other portion of text within the portion of the hierarchy. A similar block may be used for the portion of the hierarchy for candidate answer passages selected from that portion.

A small coverage ratio may indicate a candidate answer passage is incomplete. A high coverage ratio may indicate the candidate answer passage captures more of the content of the text passage from which it was selected. A candidate answer passage may receive a context adjustment, depending on this coverage ratio.

A passage coverage ratio is a ratio of the total number of characters in the candidate answer passage to the total number of characters in the passage from which the candidate answer passage was selected.

The passage coverage ratio could also be a ratio of the total number of sentences (or words) in the candidate answer passage to the total number of sentences (or words) in the passage from which the candidate answer passage was selected.

We are told that other ratios can also be used.

Of the three example candidate answer passages (1)-(3) about the distance to the moon above, passage (1) has the highest ratio, passage (2) has the second-highest, and passage (3) has the lowest.

This process determines whether the coverage ratio is less than a threshold value. That threshold value can be, for example, 0.3, 0.35 or 0.4, or some other fraction. In our “distance to the moon” example, each passage coverage ratio meets or exceeds the threshold value.

If the coverage ratio is less than a threshold value, then the process would select a first answer boost factor. The first answer boost factor might be proportional to the coverage ratio according to a first relation, or may be a fixed value, or may be a non-boosting value (e.g., 1.0).

But if the coverage ratio is not less than the threshold value, the process may select a second answer boost factor. The second answer boost factor may be proportional to the coverage ratio according to a second relation, or may be a fixed value, or may be a value greater than the non-boosting value (e.g., 1.1).
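The threshold logic above can be sketched in a few lines. This is only an illustration that uses the character-based ratio and the fixed-value options mentioned in the patent (a threshold of 0.3, a non-boosting 1.0, and a boost of 1.1); the function names and the sample strings are hypothetical.

```python
def coverage_ratio(candidate, source):
    """Characters in the candidate passage divided by characters in the source passage."""
    return len(candidate) / len(source) if source else 0.0


def coverage_boost(ratio, threshold=0.3, low_boost=1.0, high_boost=1.1):
    """Pick the first boost factor below the threshold, otherwise the second one."""
    return low_boost if ratio < threshold else high_boost


# Made-up text: the candidate is the second sentence of a two-sentence section.
source = "The moon travels in a slightly elliptical orbit. Thus, its distance from Earth varies."
candidate = "Thus, its distance from Earth varies."

ratio = coverage_ratio(candidate, source)
print(round(ratio, 2), coverage_boost(ratio))
```

A sentence-based or word-based ratio would only change what coverage_ratio counts; the threshold comparison stays the same.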

Scoring Based on Other Features

The context scoring process can also check for the presence of features in addition to those described above.

Three example features for contextually scoring an answer passage can be based on the additional features of the distinctive text, a preceding question, and a list format.

Distinctive text

Distinctive text is the text that may stand out because it is formatted differently than other text, like using bolding.

A Preceding Question

A preceding question is a question in the text that precedes the candidate answer passage.

The search engine may process various amounts of text to detect the question.

In some implementations, only the passage from which the candidate answer passage is extracted is checked.

In others, a text window that can include header text and other text from other sections may be checked.

A boost score that is inversely proportional to the text distance from a question to the candidate answer passage is calculated, and the check is terminated at the occurrence of a first question.

That text distance may be measured in characters, words, or sentences, or by some other metric.

If the question is anchor text for a section of text and there is intervening text, such as in the case of a navigation list, then the question is determined to only precede the text passage to which it links, not precede intervening text.

In the drawing above about the moon, there are two questions in the resource: “How long does it take for the Moon to orbit Earth?” and “Why is the distance changing?”

The first question–“How long does it take for the Moon to orbit Earth?”– precedes the first candidate answer passage by a text distance of zero sentences, and it precedes the second candidate answer passage by a text distance of five sentences.

And the second question–“Why is the distance changing?”– precedes the third candidate answer by zero sentences.

If a preceding question is detected, then the process selects a question boost factor.

This boost factor may be proportional to the text distance, whether the text is in a text passage subordinate to a header or whether the question is a header, and, if the question is in a header, whether the candidate answer passage is subordinate to the header.

Considering these factors, the third candidate answer passage receives the highest boost factor, the first candidate answer receives the second-highest boost factor, and the second candidate answer receives the smallest boost factor.
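To make the preceding-question check concrete, here is a minimal sketch that measures text distance in sentences and stops at the first question it finds. The patent only says the boost is inversely proportional to the distance, so the 1 / (1 + distance) form, the max_boost value, the function names, and the shortened sample sentences are all assumptions for illustration. It also ignores the header-related factors mentioned above.

```python
def question_distance(sentences, passage_start):
    """Number of sentences between the nearest preceding question and the passage.

    Walks backwards from the passage and stops at the first question found.
    Returns None when no question precedes the passage.
    """
    for distance, idx in enumerate(range(passage_start - 1, -1, -1)):
        if sentences[idx].rstrip().endswith("?"):
            return distance
    return None


def question_boost(distance, max_boost=1.2):
    """Boost that shrinks as the question gets farther away (assumed 1 / (1 + d) form)."""
    if distance is None:
        return 1.0  # non-boosting value when no question was detected
    return 1.0 + (max_boost - 1.0) / (1 + distance)


# Shortened, made-up version of the moon page text, one sentence per entry.
sentences = [
    "How long does it take for the Moon to orbit Earth?",
    "It takes about 27 days for the Moon to orbit the Earth.",
    "Why is the distance changing?",
    "The moon's distance from Earth varies because of its elliptical orbit.",
    "Thus, the distance varies from 225,700 miles to 252,000 miles.",
]

for start in (1, 3, 4):
    d = question_distance(sentences, start)
    print(start, d, round(question_boost(d), 3))
```

Adding one to the distance avoids dividing by zero when the question sits directly in front of the passage, which is the case that should get the largest boost.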

If a preceding question is not detected, or after the question boost factor is selected, the process then checks for the presence of a list.

The Presence of a List

A list is an indication of several steps usually instructive or informative. The detection of a list may be subject to the query question being a step modal query.

A step modal query is a query where a list-based answer is likely to be a good answer. Examples of step modal queries are queries like:

  • [How to . . . ]
  • [How do I . . . ]
  • [How to install a door knob]
  • [How do I change a tire]

The context scoring process may detect lists formed with:

  • HTML tags
  • Micro formats
  • Semantic meaning
  • Consecutive headings at the same level with the same or similar phrases (e.g., Step 1, Step 2; or First; Second; Third; etc.)
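As a rough illustration of two of these checks, the sketch below flags step modal queries and looks for consecutive sibling headings that read like steps. The regular expressions, the function names, and the minimum run length of two are assumptions made for this example; the patent does not describe how the detection is implemented.

```python
import re

# Hypothetical patterns: queries that start with "how to" or "how do I",
# and headings that start with "Step N" or an ordinal word.
STEP_MODAL = re.compile(r"^\s*how\s+(to|do\s+i)\b", re.IGNORECASE)
STEP_HEADING = re.compile(
    r"^\s*(step\s+\d+|first|second|third|fourth|fifth)\b", re.IGNORECASE
)


def is_step_modal_query(query):
    """True when the query looks like it wants a list of steps."""
    return bool(STEP_MODAL.match(query))


def looks_like_step_list(sibling_headings, minimum=2):
    """True when at least `minimum` consecutive same-level headings look like steps."""
    run = 0
    for text in sibling_headings:
        run = run + 1 if STEP_HEADING.match(text) else 0
        if run >= minimum:
            return True
    return False


print(is_step_modal_query("How do I change a tire"))
print(looks_like_step_list(["Step 1: Loosen the lug nuts", "Step 2: Jack up the car"]))
```

Lists marked up with HTML list tags or microformats would need their own checks; this only covers the "Step 1, Step 2" heading pattern.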

The context scoring process may also score a list for quality.

It would treat a list in the center of a page, which does not include multiple links to other pages (which are indicative of reference lists) and whose HREF link text does not occupy a large portion of the text of the list, as being of higher quality than a list at the side of a page that does include multiple links to other pages and/or has HREF link text that occupies a large portion of the text of the list.

If a list is detected, then the process selects a list boost factor.

That list boost factor may be fixed or may be proportional to the quality score of the list.

If a list is not detected, or after the list boost factor is selected, the process ends.

In some implementations, the list boost factor may also be dependent on other feature scores.

If other features, such as coverage ratio, distinctive text, etc., have relatively high scores, then the list boost factor may be increased.

The patent tells us that this is because “the combination of these scores in the presence of a list is a strong signal of a high-quality answer passage.”

Adjustment of Featured Snippet Answers Scores

Answer scores for candidate answer passages are adjusted by scoring components based on heading vectors, passage coverage ratio, and other features described above.

The scoring process can select the largest boost value from those determined above or can select a combination of the boost values.

Once the answer scores are adjusted, the candidate answer passage with the highest adjusted answer score is selected as the featured snippet answer and is displayed to a searcher.
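The final selection step can be sketched as follows. The patent says the process can use the largest boost or a combination of boosts; treating the boosts as multipliers and combining them with a product is an assumption here, and the scores and boost values are invented purely to reproduce the ordering in the moon example.

```python
from math import prod


def combined_boost(boosts, mode="max"):
    """Use either the largest boost or the product of all boosts (assumed combination)."""
    if not boosts:
        return 1.0
    return max(boosts) if mode == "max" else prod(boosts)


def select_answer(candidates, mode="max"):
    """candidates: list of (passage_id, answer_score, [boost factors]).

    Returns the id of the passage with the highest adjusted score,
    plus all of the adjusted scores.
    """
    adjusted = {
        pid: score * combined_boost(boosts, mode) for pid, score, boosts in candidates
    }
    best = max(adjusted, key=adjusted.get)
    return best, adjusted


# Invented numbers: passage (2) leads before adjustment, passage (3) after it.
candidates = [
    (1, 0.60, [1.1, 1.2]),
    (2, 0.80, [1.1, 1.05]),
    (3, 0.78, [1.1, 1.3]),
]

best, adjusted = select_answer(candidates)
ranking = sorted(adjusted, key=adjusted.get, reverse=True)
print(best, ranking)
```

With these numbers the order changes from (2), (3), (1) before adjustment to (3), (2), (1) after it, which matches the outcome the patent describes for its example.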

More to Come

I will be reviewing the first patent in this series of patents about candidate answer scores because it does have some additional elements to it that haven’t been covered in this post, and the post about query dependent/independent ranking signals for answer scores. If you have been paying attention to how Google has been answering queries that appear to be seeking answers, you have likely seen those improving in many cases. Some answers have been really bad though. It will be nice to have as complete of an idea as we can of how Google decides what might be a good answer to a query, based on information available to them on the Web.


Copyright © 2020 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.
Adjusting Featured Snippet Answers by Context was originally posted by Video And Blog Marketing

Tuesday, October 13, 2020

Adjusting Featured Snippet Answers by Context

How Are Featured Snippet Answers Decided Upon?

I recently wrote about Featured Snippet Answer Scores Ranking Signals. In that post, I described how Google was likely using query dependent and query independent ranking signals to create answer scores for queries that were looking like they wanted answers.

One of the inventors of that patent from that post was Steven Baker. I looked at other patents that he had written, and noticed that one of those was about context as part of query independent ranking signals for answers.

Remembering that patent about question-answering and context, I felt it was worth reviewing that patent and writing about it.

This patent is about processing question queries that want textual answers and how those answers may be decided upon.

it is a complicated patent, and at one point the description behind it seems to get a bit murky, but I wrote about when that happened in the patent, and I think the other details provide a lot of insight into how Google is scoring featured snippet answers. There is an additional related patent that I will be following up with after this post, and I will link to it from here as well.

This patent starts by telling us that a search system can identify resources in response to queries submitted by users and provide information about the resources in a manner that is useful to the users.

How Context Scoring Adjustments for Featured Snippet Answers Work

Users of search systems are often searching for an answer to a specific question, rather than a listing of resources, like in this drawing from the patent, showing featured snippet answers:

featured snippet answers

For example, users may want to know what the weather is in a particular location, a current quote for a stock, the capital of a state, etc.

When queries that are in the form of a question are received, some search engines may perform specialized search operations in response to the question format of the query.

For example, some search engines may provide information responsive to such queries in the form of an “answer,” such as information provided in the form of a “one box” to a question, which is often a featured snippet answer.

Some question queries are better served by explanatory answers, which are also referred to as “long answers” or “answer passages.”

For example, for the question query [why is the sky blue], an answer explaining light as waves is helpful.

featured snippet answers - why is the sky blue

Such answer passages can be selected from resources that include text, such as paragraphs, that are relevant to the question and the answer.

Sections of the text are scored, and the section with the best score is selected as an answer.

In general, the patent tells us about one aspect of what it covers in the following process:

  • Receiving a query that is a question query seeking an answer response
  • Receiving candidate answer passages, each passage made of text selected from a text section subordinate to a heading on a resource, with a corresponding answer score
  • Determining a hierarchy of headings on a page, with two or more heading levels hierarchically arranged in parent-child relationships, where each heading level has one or more headings, a subheading of a respective heading is a child heading in a parent-child relationship and the respective heading is a parent heading in that relationship, and the heading hierarchy includes a root level corresponding to a root heading (for each candidate answer passage)
  • Determining a heading vector describing a path in the hierarchy of headings from the root heading to the respective heading to which the candidate answer passage is subordinate, determining a context score based, at least in part, on the heading vector, adjusting the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score
  • Selecting an answer passage from the candidate answer passages based on the adjusted answer scores

Advantages of the process in the patent

  1. Long query answers can be selected, based partially on context signals indicating answers relevant to a question
  2. The context signals may be, in part, query-independent (i.e., scored independently of their relatedness to terms of the query)
  3. This part of the scoring process considers the context of the document (“resource”) in which the answer text is located, accounting for relevancy signals that may not otherwise be accounted for during query-dependent scoring
  4. Following this approach, long answers that are more likely to satisfy a searcher’s informational need are more likely to appear as answers

This patent can be found at:

Context scoring adjustments for answer passages
Inventors: Nitin Gupta, Srinivasan Venkatachary, Lingkun Chu, and Steven D. Baker
US Patent: 9,959,315
Granted: May 1, 2018
Appl. No.: 14/169,960
Filed: January 31, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for context scoring adjustments for candidate answer passages.

In one aspect, a method includes scoring candidate answer passages. For each candidate answer passage, the system determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading to which the candidate answer passage is subordinate; determines a context score based, at least in part, on the heading vector; and adjusts answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The system then selects an answer passage from the candidate answer passages based on the adjusted answer scores.

Using Context Scores to Adjust Answer Scores for Featured Snippets

A drawing from the patent shows different hierarchical headings that may be used to determine the context of answer passages that may be used to adjust answer scores for featured snippets:

Hierarchical headings for featured snippets

I discuss these headings and their hierarchy below. Note that the headings include the Page title as a heading (About the Moon), and the headings within heading elements on the page as well. And those headings give those answers context.

This context scoring process starts with receiving candidate answer passages and a score for each of the passages.

Those candidate answer passages and their respective scores are provided to a search engine that receives a query determined to be a question.

Each of those candidate answer passages is text selected from a text section under a particular heading from a specific resource (page) that has a certain answer score.

For each resource where a candidate answer passage has been selected, a context scoring process determines a heading hierarchy in the resource.

A heading is text or other data corresponding to a particular passage in the resource.

As an example, a heading can be text summarizing the section of text that immediately follows it (the heading describes what the text that follows it, or is contained within it, is about).

Headings may be indicated, for example, by specific formatting data, such as heading elements using HTML.

A heading could also be anchor text for an internal link (within the same page) that links to an anchor and corresponding text at some other position on the page.

A heading hierarchy could have two or more heading levels that are hierarchically arranged in parent-child relationships.

The first level, or the root heading, could be the title of the resource.

Each of the heading levels may have one or more headings, and a subheading of a respective heading is a child heading and the respective heading is a parent heading in the parent-child relationship.

For each candidate passage, a context scoring process may determine a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

To determine the context score, the context scoring process determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The context score could be based, at least in part, on the heading vector.

The context scoring process can then adjust the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The context scoring process can then select an answer passage from the candidate answer passages based on adjusted answer scores.

This flowchart from the patent shows the context scoring adjustment process:

context scoring adjustment flowchart

Identifying Question Queries And Answer Passages

I’ve written about understanding the context of answer passages. The patent tells us more about question queries and answer passages worth going over in more detail.

Some queries are in the form of a question or an implicit question.

For example, the query [distance of the earth from the moon] is in the form of an implicit question “What is the distance of the earth from the moon?”

An implicit question - the distance from the earth to the moon

Likewise, a question may be specific, as in the query [How far away is the moon].

The search engine includes a query question processor that uses processes that determine if a query is a query question (implicit or specific) and if it is, whether there are answers that are responsive to the question.

The query question processor can use several different algorithms to determine whether a query is a question and whether there are particular answers responsive to the question.

For example, to determine question queries and answers, it may use:

  • Language models
  • Machine learned processes
  • Knowledge graphs
  • Grammars
  • Combinations of those

The query question processor may choose candidate answer passages in addition to or instead of answer facts. For example, for the query [how far away is the moon], an answer fact is 238,900 miles. And the search engine may just show that factual information since that is the average distance of the Earth from the moon.

But the query question processor may also choose to identify passages that are determined to be very relevant to the question query.

These passages are called candidate answer passages.

The answer passages are scored, and one passage is selected based on these scores and provided in response to the query.

An answer passage may be scored, and that score may be adjusted based on a context, which is the point behind this patent.

Often Google will identify several candidate answer passages that could be used as featured snippet answers.

Google may look at the information on the pages where those answers come from to better understand the context of the answers such as the title of the page, and the headings about the content that the answer was found within.

Contextual Scoring Adjustments for Featured Snippet Answers

The query question processor sends a context scoring processor some candidate answer passages, information about the resources from which each answer passage was selected, and a score for each of the candidate answer passages.

The scores of the candidate answer passages could be based on the following considerations:

  • Matching a query term to the text of the candidate answer passage
  • Matching answer terms to the text of the candidate answer passages
  • The quality of the underlying resource from which the candidate answer passage was selected

I recently wrote about featured snippet answer scores, and how a combination of query dependent and query independent scoring signals might be used to generate answer scores for answer passages.

The patent tells us that the query question processor may also take into account other factors when scoring candidate answer passages.

Candidate answer passages can be selected from the text of a particular section of the resource. And the query question processor could choose more than one candidate answer passage from a text section.

We are given the following examples of different answer passages from the same page:

(These example answer passages are referred to in a few places in the remainder of the post.)

  • (1) It takes about 27 days (27 days, 7 hours, 43 minutes, and 11.6 seconds) for the Moon to orbit the Earth at its orbital distance
  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Each of those answers could be good ones for Google to use. We are told that:

More than three candidate answers can be selected from the resource, and more than one resource can be processed for candidate answers.

How would Google choose between those three possible answers?

Google might decide based on the number of sentences and a selection of up to a maximum number of characters.

The patent tells us this about choosing between those answers:

Each candidate answer has a corresponding score. For this example, assume that candidate answer passage (2) has the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1). Thus, without the context scoring processor, candidate answer passage (2) would have been provided in the answer box of FIG. 2. However, the context scoring processor takes into account the context of the answer passages and adjusts the scores provided by the query question processor.

So, we see that what might be chosen based on featured snippet answer scores could be adjusted based on the context of that answer from the page that it appears on.

Contextually Scoring Featured Snippet Answers

This process begins with a query determined to be a question query seeking an answer response.

This process next receives candidate answer passages, each candidate answer passage chosen from the text of a resource.

Each of the candidate answer passages is text chosen from a text section that is subordinate to a respective heading (under a heading) in the resource and has a corresponding answer score.

For example, the query question processor provides the candidate answer passages, and their corresponding scores, to the context scoring processor.

A Heading Hierarchy to Determine Context

This process then determines a heading hierarchy from the resource.

The heading hierarchy would have two or more heading levels hierarchically arranged in parent-child relationships (Such as a page title, and an HTML heading element.)

Each heading level has one or more headings.

A subheading of a respective heading is a child heading (an (h2) heading might be a subheading of a (title)) in the parent-child relationship and the respective heading is a parent heading in the relationship.

The heading hierarchy includes a root level corresponding to a root heading.

The context scoring processor can process heading tags in a DOM tree to determine a heading hierarchy.
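As a rough illustration of reading headings out of a page, here is a minimal sketch using Python's standard `html.parser`. The patent does not say how the DOM is processed, so the `HeadingCollector` class, the level numbering (0 for the title, 1 to 6 for h1 to h6), and the sample page are all assumptions made for this example.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for <title> and <h1>..<h6> in document order."""

    # Assumption for this sketch: the page title is treated as level 0 (the root).
    HEADING_TAGS = {"title": 0, "h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.headings = []   # list of (level, text)
        self._level = None   # level of the heading tag currently open, if any
        self._buffer = []    # text chunks seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = self.HEADING_TAGS[tag]
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and self.HEADING_TAGS.get(tag) == self._level:
            self.headings.append((self._level, "".join(self._buffer).strip()))
            self._level = None


def extract_headings(html):
    parser = HeadingCollector()
    parser.feed(html)
    return parser.headings


# Hypothetical page modeled on the patent's moon example.
page = """
<html><head><title>About The Moon</title></head><body>
<h1>The Moon's Orbit</h1>
<h2>How long does it take for the Moon to orbit Earth?</h2>
<h2>The distance from the Earth to the Moon</h2>
<h1>The Moon</h1>
<h2>Age of the Moon</h2>
<h2>Life on the Moon</h2>
</body></html>
"""
headings = extract_headings(page)
```

The result is a flat, ordered list of headings with their levels, which is enough to rebuild the parent-child relationships the patent describes.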

hierarchical headings for featured snippets

For example, concerning the drawing about the distance to the moon just above, the heading hierarchy for the resource may be:

The ROOT Heading (title) is: About The Moon (310)

The main heading (h1) on the page:

H1: The Moon’s Orbit (330)

A secondary heading (h2) on the page:

H2: How long does it take for the Moon to orbit Earth? (334)

Another secondary heading (h2) on the page is:

H2: The distance from the Earth to the Moon (338)

Another main heading (h1) on the page:

H1: The Moon (360)

Another secondary heading (h2) on the page:

H2: Age of the Moon (364)

Another secondary heading (h2) on the page:

H2: Life on the Moon (368)

Here is how the patent describes this heading hierarchy:

In this heading hierarchy, The title is the root heading at the root level; headings 330 and 360 are child headings of the heading, and are at a first level below the root level; headings 334 and 338 are child headings of the heading 330, and are at a second level that is one level below the first level, and two levels below the root level; and headings 364 and 368 are child headings of the heading 360, and are at a second level that is one level below the first level, and two levels below the root level.

The process from the patent determines a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

This score may be based on a heading vector.

The patent says that the process, for each of the candidate answer passages, determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The heading vector would include the text of the headings for the candidate answer passage.

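For the example candidate answer passages (1)-(3) above about how long it takes the moon to orbit the Earth, the respectively corresponding heading vectors V1, V2 and V3 are:

  • V1=<[Root: About The Moon], [H1: The Moon’s Orbit], [H2: How long does it take for the Moon to orbit Earth?]>
  • V2=<[Root: About The Moon], [H1: The Moon’s Orbit], [H2: The distance from the Earth to the Moon]>
  • V3=<[Root: About The Moon], [H1: The Moon’s Orbit], [H2: The distance from the Earth to the Moon]>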

We are also told that because candidate answer passages (2) and (3) are selected from the same text section 340, their respective heading vectors V2 and V3 are the same (they are both in the content under the same (H2) heading.)
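To make the idea of a heading vector concrete, here is a small sketch that walks an ordered list of headings and records the path from the root heading to each heading. The `heading_vectors` function and the (level, text) input format are assumptions for illustration; the patent only says that the vector describes the path from the root to the heading above the passage.

```python
def heading_vectors(headings):
    """Map each heading to its path from the root heading.

    `headings` is a list of (level, text) pairs in document order, where
    level 0 is the root (the page title). Returns a dict: text -> tuple path.
    """
    stack = []   # currently open ancestors, as (level, text)
    paths = {}
    for level, text in headings:
        # Headings at the same or a deeper level cannot be ancestors; close them.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, text))
        paths[text] = tuple(t for _, t in stack)
    return paths


# Heading hierarchy from the moon example in this post.
headings = [
    (0, "About The Moon"),
    (1, "The Moon's Orbit"),
    (2, "How long does it take for the Moon to orbit Earth?"),
    (2, "The distance from the Earth to the Moon"),
    (1, "The Moon"),
    (2, "Age of the Moon"),
    (2, "Life on the Moon"),
]
paths = heading_vectors(headings)
v1 = paths["How long does it take for the Moon to orbit Earth?"]
v2 = paths["The distance from the Earth to the Moon"]
v3 = v2  # passages (2) and (3) come from the same text section
```

Because passages (2) and (3) sit under the same H2, they share one heading vector, which matches what the patent says about V2 and V3.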

The process of adjusting a score, for each answer passage, uses a context score based, at least in part, on the heading vector (410).

That context score can be a single score used to scale the candidate answer passage score or can be a series of discrete scores/boosts that can be used to adjust the score of the candidate answer passage.

Where Things Get Murky in This Patent

There do seem to be several related patents involving featured snippet answers, and this one which targets learning more about answers from their context based on where they fit in a heading hierarchy makes sense.

But, I’m confused by how the patent tells us that one answer based on the context would be adjusted over another one.

The first issue I have is that the answers they are comparing in the same contextual area have some overlap. Here those two are:

  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Note that the second answer and the third answer both include the same line: “Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles.” I find myself a little surprised that the second answer includes a couple of sentences that aren’t in the third answer, and skips a couple of lines from the third answer, and then includes the last sentence, which answers the question.

Since they both appear in the same heading and subheading section of the page they are from, it is difficult to imagine that there is a different adjustment based on context. But, the patent tells us differently:

The candidate answer passage with the highest adjusted answer score (based on context from the headings) is selected as the answer passage.

Recall that in the example above, the candidate answer passage (2) had the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1).

However, after adjustments, candidate answer passage (3) has the highest score, followed by candidate answer passage (2), and then candidate answer passage (1).

Accordingly, candidate answer passage (3) is selected and provided as the answer passage of FIG. 2.
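The reordering described here can be sketched in a few lines. The patent gives no numbers, so every score below is made up, and treating the context score as a simple multiplier is an assumption chosen only to show how an adjustment can change which passage wins.

```python
def select_answer(answer_scores, context_scores):
    """Scale each answer score by its context score and pick the best passage.

    Assumption: the context score acts as a multiplier, and a passage with
    no context score is left unchanged (factor 1.0).
    """
    adjusted = {
        passage_id: score * context_scores.get(passage_id, 1.0)
        for passage_id, score in answer_scores.items()
    }
    best = max(adjusted, key=adjusted.get)
    return best, adjusted


# Made-up scores that reproduce the ordering in the example: (2) > (3) > (1).
answer_scores = {1: 0.60, 2: 0.80, 3: 0.75}
# Made-up context scores in which only passage (3) is boosted.
context_scores = {1: 1.00, 2: 1.00, 3: 1.10}

best, adjusted = select_answer(answer_scores, context_scores)
ranking = sorted(adjusted, key=adjusted.get, reverse=True)
```

With these illustrative values, passage (2) leads before the adjustment and passage (3) leads after it, with passage (1) last in both cases.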

Boosting Scores Based on Passage Coverage Ratio

A query question processor may limit the candidate answers to a maximum length.

The context scoring processor determines a coverage ratio which is a measure indicative of the coverage of the candidate answer passage from the text which it was selected from.

The patent describes alternative question answers:

Alternatively, the text block may include text sections subordinate to respective headings that include a first heading for which the text section from which the candidate answer passage was selected is subordinate, and sibling headings that have an immediate parent heading in common with the first heading. For example, for the candidate answer passage, the text block may include all the text in the portion 380 of the hierarchy; or may include only the text of the sections, of some other portion of text within the portion of the hierarchy. A similar block may be used for the portion of the hierarchy for candidate answer passages selected from that portion.

A small coverage ratio may indicate a candidate answer passage is incomplete. A high coverage ratio may indicate the candidate answer passage captures more of the content of the text passage from which it was selected. A candidate answer passage may receive a context adjustment, depending on this coverage ratio.

A passage coverage ratio is a ratio of the total number of characters in the candidate answer passage to the total number of characters in the passage from which the candidate answer passage was selected.

The passage coverage ratio could also be a ratio of the total number of sentences (or words) in the candidate answer passage to the total number of sentences (or words) in the passage from which the candidate answer passage was selected.

We are told that other ratios can also be used.
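A minimal sketch of the coverage ratio, covering the character, word, and sentence variants mentioned above. The `coverage_ratio` function, its `unit` parameter, and the naive sentence splitter are assumptions for illustration, and the trailing filler sentence in the sample text is invented.

```python
import re


def coverage_ratio(candidate, source, unit="chars"):
    """Ratio of the candidate passage's size to the size of its source text."""

    def size(text):
        if unit == "chars":
            return len(text)
        if unit == "words":
            return len(text.split())
        if unit == "sentences":
            # Naive splitter: treats runs of ., ! or ? as sentence ends.
            return len([s for s in re.split(r"[.!?]+", text) if s.strip()])
        raise ValueError(f"unknown unit: {unit}")

    total = size(source)
    return size(candidate) / total if total else 0.0


# Source text: two sentences from the moon example plus one invented filler sentence.
source = (
    "The moon's distance from Earth varies because the moon travels in a "
    "slightly elliptical orbit. Thus, the moon's distance from the Earth "
    "varies from 225,700 miles to 252,000 miles. Other text follows here."
)
# Candidate passage: everything before the filler sentence.
candidate = source[: source.index(" Other text")]

by_sentences = coverage_ratio(candidate, source, unit="sentences")
by_chars = coverage_ratio(candidate, source)
```

Here the candidate covers two of the three sentences in its source text, so the sentence-based ratio is two thirds.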

Of the three example candidate answer passages (1)-(3) about the distance to the moon above, passage (1) has the highest ratio, passage (2) has the second-highest, and passage (3) has the lowest.

This process determines whether the coverage ratio is less than a threshold value. That threshold value can be, for example, 0.3, 0.35 or 0.4, or some other fraction. In our “distance to the moon” example, each passage coverage ratio meets or exceeds the threshold value.

If the coverage ratio is less than the threshold value, then the process selects a first answer boost factor. The first answer boost factor may be proportional to the coverage ratio according to a first relation, may be a fixed value, or may be a non-boosting value (e.g., 1.0).

But if the coverage ratio is not less than the threshold value, the process may select a second answer boost factor. The second answer boost factor may be proportional to the coverage ratio according to a second relation, may be a fixed value, or may be a value greater than the non-boosting value (e.g., 1.1).
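The threshold check can be sketched as a small function. This version uses the fixed-value option with the example numbers quoted above (a 0.3 threshold, 1.0 as the non-boosting value, and 1.1 as the boost); the function name and default arguments are assumptions.

```python
def coverage_boost(ratio, threshold=0.3, first_boost=1.0, second_boost=1.1):
    """Pick the first boost factor below the threshold, the second otherwise.

    Assumption: both boost factors are fixed values. The patent also allows
    factors that are proportional to the coverage ratio.
    """
    return first_boost if ratio < threshold else second_boost


low = coverage_boost(0.2)    # below the threshold: non-boosting value
edge = coverage_boost(0.3)   # "not less than" the threshold: boosted
high = coverage_boost(0.9)
```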

Scoring Based on Other Features

The context scoring process can also check for the presence of features in addition to those described above.

Three example features for contextually scoring an answer passage can be based on the additional features of the distinctive text, a preceding question, and a list format.

Distinctive text

Distinctive text is the text that may stand out because it is formatted differently than other text, like using bolding.

A Preceding Question

A preceding question is a question in the text that precedes the candidate answer passage.

The search engine may process various amounts of text to detect the question.

The check may be limited to the passage from which the candidate answer passage is extracted.

Alternatively, a text window that can include header text and other text from other sections may be checked.

A boost score that is inversely proportional to the text distance from a question to the candidate answer passage is calculated, and the check is terminated at the occurrence of a first question.

That text distance may be measured in characters, words, or sentences, or by some other metric.

If the question is anchor text for a section of text and there is intervening text, such as in the case of a navigation list, then the question is determined to only precede the text passage to which it links, not precede intervening text.

In the drawing above about the moon, there are two questions in the resource: “How long does it take for the Moon to orbit Earth?” and “Why is the distance changing?”

The first question–“How long does it take for the Moon to orbit Earth?”– precedes the first candidate answer passage by a text distance of zero sentences, and it precedes the second candidate answer passage by a text distance of five sentences.

And the second question–“Why is the distance changing?”– precedes the third candidate answer by zero sentences.

If a preceding question is detected, then the process selects a question boost factor.

This boost factor may depend on the text distance, whether the text is in a text passage subordinate to a header or whether the question is a header, and, if the question is in a header, whether the candidate answer passage is subordinate to the header.

Considering these factors, the third candidate answer passage receives the highest boost factor, the first candidate answer receives the second-highest boost factor, and the second candidate answer receives the smallest boost factor.
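A sketch of a question boost that shrinks as the preceding question gets farther from the passage. The formula is an assumption: dividing by `1 + distance` keeps a distance of zero well defined, and `max_bonus` is an invented parameter. The header-related factors mentioned above are left out, so this does not reproduce the full ordering of the three example passages.

```python
def question_boost(distance, max_bonus=0.2):
    """Boost factor that decreases with the text distance to the preceding question.

    `distance` is the text distance (for example, in sentences) from the first
    preceding question to the candidate passage, or None if no question was
    found. Returns the non-boosting value 1.0 when there is no question.
    """
    if distance is None:
        return 1.0
    # Assumed relation: bonus shrinks as 1 / (1 + distance).
    return 1.0 + max_bonus / (1 + distance)


boost_near = question_boost(0)     # question immediately precedes the passage
boost_far = question_boost(5)      # question is five sentences earlier
boost_none = question_boost(None)  # no preceding question detected
```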

If a preceding question is not detected, or after the question boost factor is selected, the process then checks for the presence of a list.

The Presence of a List

A list is an indication of several steps usually instructive or informative. The detection of a list may be subject to the query question being a step modal query.

A step modal query is a query where a list-based answer is likely to be a good answer. Examples of step modal queries are queries like:

  • [How to . . . ]
  • [How do I . . . ]
  • [How to install a door knob]
  • [How do I change a tire]
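A simple pattern match is one way to picture how queries like these could be recognized. This is only a sketch: the regular expression covers just the two phrasings listed above, and the patent does not describe how step modal queries are actually detected.

```python
import re

# Assumption: a step modal query starts with "how to" or "how do I".
STEP_MODAL_PATTERN = re.compile(r"^\s*how\s+(to|do\s+i)\b", re.IGNORECASE)


def is_step_modal(query):
    """Return True if the query looks like a request for step-by-step instructions."""
    return bool(STEP_MODAL_PATTERN.match(query))


examples = {
    "How to install a door knob": is_step_modal("How to install a door knob"),
    "how do I change a tire": is_step_modal("how do I change a tire"),
    "why is the sky blue": is_step_modal("why is the sky blue"),
}
```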

The context scoring process may detect lists formed with:

  • HTML tags
  • Microformats
  • Semantic meaning
  • Consecutive headings at the same level with the same or similar phrases (e.g., Step 1, Step 2; or First; Second; Third; etc.)
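The last of these signals, consecutive same-level headings with step-like phrases, can be sketched as follows. The label pattern, the `min_items` cutoff, and the sample headings are assumptions; detection through HTML tags, microformats, or semantic meaning is not covered.

```python
import re

# Assumption: headings that start with "Step N" or an ordinal word count as step labels.
STEP_LABEL = re.compile(r"^\s*(step\s+\d+|first|second|third|fourth|fifth)\b", re.IGNORECASE)


def looks_like_step_list(headings, min_items=2):
    """True if at least `min_items` consecutive same-level headings carry step labels.

    `headings` is a list of (level, text) pairs in document order.
    """
    run = 0
    prev_level = None
    for level, text in headings:
        if STEP_LABEL.match(text):
            # Extend the run only when the level matches the previous heading.
            run = run + 1 if (run == 0 or level == prev_level) else 1
        else:
            run = 0
        prev_level = level
        if run >= min_items:
            return True
    return False


# Hypothetical headings for a how-to page.
steps = [(2, "Step 1: Remove the old knob"), (2, "Step 2: Fit the new latch")]
mixed_levels = [(2, "Step 1: Remove the old knob"), (3, "Step 2: Fit the new latch")]
plain = [(2, "Tools"), (2, "Safety")]
```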

The context scoring process may also score a list for quality.

It would compare things such as:

  • A list in the center of a page that does not include multiple links to other pages (which are indicative of reference lists), and whose HREF link text does not occupy a large portion of the text of the list, will be of higher quality
  • A list at the side of a page that does include multiple links to other pages, and/or has HREF link text that does occupy a large portion of the text of the list, will be of lower quality

If a list is detected, then the process selects a list boost factor.

That list boost factor may be fixed or may be proportional to the quality score of the list.

If a list is not detected, or after the list boost factor is selected, the process ends.

In some implementations, the list boost factor may also be dependent on other feature scores.

If other features, such as coverage ratio, distinctive text, etc., have relatively high scores, then the list boost factor may be increased.

The patent tells us that this is because “the combination of these scores in the presence of a list is a strong signal of a high-quality answer passage.”
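Here is one way the list boost could be sketched: proportional to a list quality score, and raised further when the other feature boosts are all strong. The 0-to-1 quality scale and the `max_bonus`, `strong`, and `extra` parameters are invented for this example; the patent gives no values.

```python
def list_boost(list_quality, other_boosts=(), max_bonus=0.2, strong=1.05, extra=0.05):
    """List boost proportional to list quality, increased when other features score well.

    `list_quality` is an assumed 0..1 quality score. `other_boosts` holds boost
    factors already selected for features such as coverage ratio or
    distinctive text.
    """
    boost = 1.0 + max_bonus * list_quality
    # Assumption: "relatively high scores" means every other boost is at least `strong`.
    if other_boosts and all(b >= strong for b in other_boosts):
        boost += extra
    return boost


alone = list_boost(0.5)
with_strong_features = list_boost(0.5, other_boosts=[1.1, 1.2])
with_weak_feature = list_boost(0.5, other_boosts=[1.1, 1.0])
```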

Adjustment of Featured Snippet Answer Scores

Answer scores for candidate answer passages are adjusted by scoring components based on heading vectors, passage coverage ratio, and other features described above.

The scoring process can select the largest boost value from those determined above or can select a combination of the boost values.
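The two options, taking the largest boost or combining the boosts, can be sketched like this. Multiplying the boosts together is only one possible "combination" and is an assumption here, as are the function name and the sample values.

```python
from math import prod


def adjusted_score(answer_score, boosts, mode="max"):
    """Adjust an answer score using the largest boost or a product of all boosts."""
    if not boosts:
        return answer_score
    if mode == "max":
        factor = max(boosts)
    elif mode == "product":
        # Assumption: boosts are combined by multiplication.
        factor = prod(boosts)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return answer_score * factor


# Made-up boosts for coverage ratio, preceding question, and list features.
boosts = [1.1, 1.05, 1.0]
by_max = adjusted_score(0.8, boosts, mode="max")
by_product = adjusted_score(0.8, boosts, mode="product")
```

Using the largest boost keeps one strong signal from being compounded, while a product rewards passages that score well on several features at once.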

Once the answer scores are adjusted, the candidate answer passage with the highest adjusted answer score is selected as the featured snippet answer and is displayed to a searcher.

More to Come

I will be reviewing the first patent in this series of patents about candidate answer scores because it has some additional elements that haven’t been covered in this post or in the post about query dependent/independent ranking signals for answer scores. If you have been paying attention to how Google has been answering queries that appear to be seeking answers, you have likely seen those answers improve in many cases. Some answers have been really bad, though. It will be nice to have as complete an idea as we can of how Google decides what might be a good answer to a query, based on information available to it on the Web.


Adjusting Featured Snippet Answers by Context was originally posted by Video And Blog Marketing

Bursty Fresh and Local Featured Snippet Answers at Google

Featured Snippet Answers Based on Context

Last month I wrote about answer passages when Google decides what answers to show in response to queries that are asking questions, in the post, Featured Snippet Answer Scores Ranking Signals. In that post, I wrote about an updated patent which made it clear that passages that might be shown in response to a query are given answer scores that are based on both query dependent and query independent signals.

A query dependent signal is one that includes relevance of a term in the query to some aspect of candidate featured snippet answers. A query independent signal doesn’t rely upon the terms in a query, and their relevance to terms in an answer passage, but could look at other aspects of answers, such as whether an answer is written in complete sentences or other query independent aspects of those answers.

At the end of September, Danny Sullivan, Public Liaison for Search at Google, posted on the Google Keyword Blog about some recent queries that were performed on Google that contained questions about smoke related to wildfires in California. One frequent query in the area was, “why is the sky orange?” The blog post told us about how Google might use contextual information about location and freshness of content in featured snippet answers.

You may notice that the location of searchers is not expressly identified in the query, much like a search for different business types, such as restaurants or places to shop. The article about these queries is in the post at:

Why is the sky orange? How Google gave people the right info

Danny tells us about how Google might respond to these queries:

Well, language understanding is at the core of Search, but it’s not just about the words. Critical context, like time and place, also helps us understand what you’re really looking for. This is particularly true for featured snippets, a feature in Search that highlights pages that our systems determine are likely a great match for your search. We’ve made improvements to better understand when fresh or local information — or both — is key to delivering relevant results to your search.

So this is pointing out that Google has worked on improving answers for questions that are asking about fresh or local information (or both). The snippet from the post refers to critical context, and how well Google understands the context of a question is essential to how helpful it can be in answering that question.

Google tells us that “Our freshness indicators identified a rush of new content was being produced on this topic that was both locally relevant and different from the more evergreen content that existed.”

Since Google is actively engaged in indexing content on the web, it can notice bursty behavior about different topics, and where it is from. That reminds me of a post I wrote back in 2008 called How Search Query Burstiness Could Increase Page Rankings. So Google can tell what people are searching for and where they are searching from by keeping an eye on its log files, and Google can tell what people are creating content about when it indexes new and updated webpages.

I liked this statement from the Google post, too:

Put simply, instead of surfacing general information on what causes a sunset, when people searched for “why is the sky orange” during this time period, our systems automatically pulled in current, location-based information to help people find the timely results they were searching for.

Danny also points out a query that sometimes surfaces from searchers in places such as New York City, or Boston: “Why is it Hazy?” to show that Google can use local context in those areas to provide relevant results for people searching from there.

We are told that this Google blog post provided information about a couple of queries specific to certain locations, but Google receives billions of queries a day, and they provide fresh and relevant results to all of those queries when they receive them.

Understanding the context of questions that people ask on different topics and from different places can help people receive answers to what they want to learn more about. The Google Blog post from Danny is worth reading and thinking about if you haven’t seen it.


Copyright © 2020 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.
Plugin by Taragana
Bursty Fresh and Local Featured Snippet Answers at Google was originally posted by Video And Blog Marketing

Sunday, September 27, 2020

Featured Snippet Answer Scores Ranking Signals

Calculating Featured Snippet Answer Scores

An update this week to a patent tells us how Google may score featured snippet answers.

When a search engine ranks search results in response to a query, it may use a combination of query dependent and query independent ranking signals to determine those rankings.

A query dependent signal may depend on a term in a query, and how relevant a search result may be for that query term. A query independent signal would depend on something other than the terms in a query, such as the quality and quantity of links pointing to a result.

Answers to questions in queries may be ranked based on a combination of query dependent and query independent signals, which could determine a featured snippet answer score. An updated patent about textual answer passages tells us how those signals may be combined to generate featured snippet answer scores, which are used to choose among candidate answers to questions that appear in queries.

A year and a half ago, I wrote about answers to featured snippets in the post Does Google Use Schema to Write Answer Passages for Featured Snippets?. The patent that post was about was Candidate answer passages, which was originally filed on August 12, 2015, and was granted as a continuation patent on January 15, 2019.

That patent was a continuation of an original patent about answer passages. It updated the original by telling us that Google would look for textual answers to questions that had structured data near them that included related facts. This could have been something like a data table or possibly even schema markup. This meant that Google could provide a text-based answer to a question and include many related facts for that answer.

Another continuation of the first version of the patent was just granted this week. It provides more information and a different approach to ranking answers for featured snippets, and it is worth comparing the claims in these two versions of the patent to see how they differ.

The new version of the featured snippet answer scores patent is at:

Scoring candidate answer passages
Inventors: Steven D. Baker, Srinivasan Venkatachary, Robert Andrew Brennan, Per Bjornsson, Yi Liu, Hadar Shemtov, Massimiliano Ciaramita, and Ioannis Tsochantaridis
Assignee: Google LLC
US Patent: 10,783,156
Granted: September 22, 2020
Filed: February 22, 2018

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for scoring candidate answer passages. In one aspect, a method includes receiving a query determined to be a question query that seeks an answer response and data identifying resources determined to be responsive to the query; for a subset of the resources: receiving candidate answer passages; determining, for each candidate answer passage, a query term match score that is a measure of similarity of the query terms to the candidate answer passage; determining, for each candidate answer passage, an answer term match score that is a measure of similarity of answer terms to the candidate answer passage; determining, for each candidate answer passage, a query dependent score based on the query term match score and the answer term match score; and generating an answer score that is a based on the query dependent score.

[Figure: featured snippet answer scores]

Candidate Answer Passages Claims Updated

There are changes to the patent that require more analysis of potential answers, based on both query dependent and query independent scores. The patent description provides details about both kinds of scores. The first claim from the first patent covers query dependent scores for answers, but not query independent scores as the newest version does. The earlier version provides more details about both query dependent and query independent scores in the rest of its claims, but the newer version seems to make both scores more important.

The first claim from the 2015 version of the Scoring Answer Passages patent tells us:

1. A method performed by data processing apparatus, the method comprising: receiving a query determined to be a question query that seeks an answer response and data identifying resources determined to be responsive to the query and ordered according to a ranking, the query having query terms; for each resource in a top-ranked subset of the resources: receiving candidate answer passages, each candidate answer passage selected from passage units from content of the resource and being eligible to be provided as an answer passage with search results that identify the resources determined to be responsive to the query and being separate and distinct from the search results; determining, for each candidate answer passage, a query term match score that is a measure of similarity of the query terms to the candidate answer passage; determining, for each candidate answer passage, an answer term match score that is a measure of similarity of answer terms to the candidate answer passage; determining, for each candidate answer passage, a query dependent score based on the query term match score and the answer term match score; and generating an answer score that is a measure of answer quality for the answer response for the candidate answer passage based on the query dependent score.

The remainder of the claims tell us about both query dependent and query independent scores for answers, but the claims from the newer version of the patent appear to place equal importance on the query dependent and the query independent scores. That convinced me that I should revisit this patent in a post and describe how Google may calculate answer scores based on both.

The first claim in the new patent tells us:

1. A method performed by data processing apparatus, the method comprising: receiving a query determined to be a question query that seeks an answer response and data identifying resources determined to be responsive to the query and ordered according to a ranking, the query having query terms; for each resource in a top-ranked subset of the resources: receiving candidate answer passages, each candidate answer passage selected from passage units from content of the resource and being eligible to be provided as an answer passage with search results that identify the resources determined to be responsive to the query and being separate and distinct from the search results; determining, for each candidate answer passage, a query dependent score that is proportional to a number of instances of matches of query terms to terms of the candidate answer passage; determining, for each candidate answer passage, a query independent score for the candidate answer passage, wherein the query independent score is independent of the query and query dependent score and based on features of the candidate answer passage; and generating an answer score that is a measure of answer quality for the answer response for the candidate answer passage based on the query dependent score and the query independent score.

As it says in this new claim, the answer score has gone from being “a measure of answer quality for the answer response for the candidate answer passage based on the query dependent score” (from the first patent) to “a measure of answer quality for the answer response for the candidate answer passage based on the query dependent score and the query independent score” (from this newer version of the patent).

This drawing appears in both versions of the patent, and it shows the query dependent and query independent scores both playing an important role in calculating featured snippet answer scores:

[Figure: query dependent & query independent answers combine]

Query Dependent and Query Independent Scores for Featured Snippet Answer Scores

Both versions of the patent tell us about how a query dependent score and a query independent score for an answer might be calculated. The first version of the patent only told us in its claims that an answer score used the query dependent score, and this newer version tells us that both the query dependent and the query independent scores are combined to calculate an answer score (to decide which answer is the best choice for a query).

Before the patent discusses how query dependent and query independent signals might be used to create an answer score, it tells us this about the answer score:

The answer passage scorer receives candidate answer passages from the answer passage generator and scores each passage by combining scoring signals that predict how likely the passage is to answer the question.

In some implementations, the answer passage scorer includes a query dependent scorer and a query independent scorer that respectively generate a query dependent score and a query independent score. In some implementations, the query dependent scorer generates the query dependent score based on an answer term match score and a query term match score.
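As a rough way to picture the flow described above, here is a minimal Python sketch. The patent does not say how the scores are combined, so the plain products used here, the field names, and all of the numbers are my own assumptions for illustration only.

```python
# Hypothetical sketch: the patent says the scores are combined,
# but not how. Products are an assumption made for illustration.

def query_dependent_score(candidate):
    # Assumed combination of the two match scores.
    return candidate["query_term_match"] * candidate["answer_term_match"]

def answer_score(candidate):
    # Assumed combination of query dependent and query independent scores.
    return query_dependent_score(candidate) * candidate["query_independent"]

# Made-up component scores for two candidate answer passages.
candidates = [
    {"text": "How far away is the moon?",
     "query_term_match": 0.9, "answer_term_match": 0.5, "query_independent": 0.25},
    {"text": "The moon is approximately 238,900 miles from the Earth.",
     "query_term_match": 0.8, "answer_term_match": 0.75, "query_independent": 1.0},
]

# The passage with the highest answer score would be the one shown.
best = max(candidates, key=answer_score)
print(best["text"])
```

With these made-up numbers, the declarative passage wins even though the question-style passage matches the query terms slightly better, because its query independent score is higher.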

Query Dependent Scoring for Featured Snippet Answer Scores

Query Dependent Scoring of answer passages is based on answer term features.

An answer term match score is a measure of similarity of answer terms to terms in a candidate answer passage.

The answer-seeking queries do not describe what a searcher is looking for since the answer is unknown to the searcher at the time of a search.

The query dependent scorer begins by finding a set of likely answer terms and compares the set of likely answer terms to a candidate answer passage to generate an answer term match score. The set of likely answer terms is likely taken from the top N ranked results returned for a query.

The process creates a list of terms from terms that are included in the top-ranked subset of results for a query. The patent tells us that each result is parsed and each term is included in a term vector. Stop words may be omitted from the term vector.

For each term in the list of terms, a term weight may be generated. The term weight for each term may be based on the number of results in the top-ranked subset of results in which the term occurs, multiplied by an inverse document frequency (IDF) value for the term. The IDF value may be derived from a large corpus of documents and provided to the query dependent scorer. Or the IDF may be derived from the top N documents in the returned results. The patent tells us that other appropriate term weighting techniques can also be used.
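The term weighting described above might be sketched in Python like this. The stop word list, the sample results, and the IDF values are all made up for illustration; the patent leaves room for other weighting techniques, so treat this as one possible reading and not as Google's implementation.

```python
from collections import Counter

# Illustrative stop word list (an assumption, not from the patent).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def tokenize(text):
    # Lowercase, split on whitespace, and drop stop words.
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def term_weights(top_results, idf):
    # Weight = (number of top results containing the term) * IDF of the term.
    results_containing = Counter()
    for text in top_results:
        results_containing.update(set(tokenize(text)))
    return {term: n * idf.get(term, 0.0) for term, n in results_containing.items()}

# Hypothetical top-ranked results and hypothetical IDF values.
top_results = [
    "the apogee is the farthest point of an orbit",
    "apogee and perigee describe an orbit",
    "the moon orbit has an apogee",
]
idf = {"apogee": 2.0, "orbit": 1.5, "perigee": 2.5}

weights = term_weights(top_results, idf)
print(weights["apogee"], weights["orbit"], weights["perigee"])
```

Here “apogee” and “orbit” each occur in all three results, so their weights are three times their IDF values, while “perigee” occurs in only one result.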

The scoring process, for each term of the candidate answer passage, determines the number of times the term occurs in the candidate answer passage. So, if the term “apogee” occurs two times in a candidate answer passage, the term value for “apogee” for that candidate answer passage is 2. However, if the same term occurs three times in a different candidate answer passage, then the term value for “apogee” for the different candidate answer passage is 3.

The scoring process, for each term of the candidate answer passage, multiplies its term weight by the number of times the term occurs in the answer passage. So, assume the term weight for “apogee” is 0.04. For the first candidate answer passage, the value based on “apogee” is 0.08 (0.04 × 2); for the second candidate answer passage, the value based on “apogee” is 0.12 (0.04 × 3).
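The “apogee” arithmetic can be reproduced with a few lines of Python. The two passages are invented so that the term occurs two and three times, and summing the per-term values into a single number is my assumption about how they would be rolled up.

```python
from collections import Counter

def answer_term_value(passage, weights):
    # For each weighted term: term weight * number of occurrences in the passage.
    # Summing across terms is an assumption made for this sketch.
    counts = Counter(passage.lower().split())
    return sum(weights[term] * n for term, n in counts.items() if term in weights)

weights = {"apogee": 0.04}  # term weight from the example in the text

# Invented passages in which "apogee" occurs two and three times.
first_passage = "at apogee the moon looks smaller because apogee is its farthest point"
second_passage = "apogee follows perigee and apogee precedes perigee so apogee repeats"

print(round(answer_term_value(first_passage, weights), 2))   # 0.08
print(round(answer_term_value(second_passage, weights), 2))  # 0.12
```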

Other answer term features can also be used to determine an answer term score. For example, the query dependent scorer may determine an entity type for an answer response to the question query. The entity type may be determined by identifying terms that identify entities, such as persons, places, or things, and selecting the terms with the highest term scores. The entity type may also be identified from the query (e.g., for the query [who is the fastest man], the entity type for an answer is “man”). For each candidate answer passage, the query dependent scorer then identifies entities described in the candidate answer passage. If the entities do not include a match to the identified entity type, the answer term match score for the candidate answer passage is reduced.

Assume the following candidate answer passage is provided for scoring in response to the query [who is the fastest man]: Olympic sprinters have often set world records for sprinting events during the Olympics. The most popular sprinting event is the 100-meter dash.

The query dependent scorer will identify several entities (Olympics, sprinters, etc.), but none of them are of the type “man.” The term “sprinter” is gender-neutral. Accordingly, the answer term score will be reduced. The score may be a binary score, e.g., 1 for the presence of a term of the entity type, and 0 for the absence of a term of the correct type; alternatively, it may be a measure of the likelihood that the correct term is in the candidate answer passage. An appropriate scoring technique can be used to generate the score.
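A binary version of that entity type check could look something like the sketch below. The tiny entity lookup table stands in for whatever entity recognition Google actually uses, which the patent does not detail; the table, its type labels, and the second test passage are hypothetical.

```python
# Hypothetical lookup table standing in for a real entity recognizer.
ENTITY_TYPES = {
    "olympics": "event",
    "sprinters": "group",
    "usain bolt": "man",
}

def entity_type_score(passage, wanted_type, entity_types=ENTITY_TYPES):
    # Binary score: 1 if any entity of the wanted type is mentioned, else 0.
    text = passage.lower()
    found_types = {etype for name, etype in entity_types.items() if name in text}
    return 1 if wanted_type in found_types else 0

passage = ("Olympic sprinters have often set world records for sprinting events "
           "during the Olympics. The most popular sprinting event is the 100-meter dash.")

print(entity_type_score(passage, "man"))  # 0
print(entity_type_score("Usain Bolt holds the 100-meter world record.", "man"))  # 1
```

The passage from the example mentions entities, but none of the wanted type, so its score under this sketch is 0.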

Query Independent Scoring for Featured Snippet Answer Scores

Scoring answer passages according to query independent features.

Candidate answer passages may be generated from the top N ranked resources identified for a search in response to a query. N may be the same number as the number of search results returned on the first page of search results.

The scoring process can use a passage unit position score. This passage unit position could be the location of a result that a candidate answer passage comes from. A higher location results in a higher score.

The scoring process may use a language model score. The language model score is based on how well a candidate answer passage conforms to a language model.

One type of language model is based on sentence and grammar structures. This could mean that candidate answer passages with partial sentences may have lower scores than candidate answer passages with complete sentences. The patent also tells us that if structured content is included in the candidate answer passage, the structured content is not subject to language model scoring. For instance, a row from a table may have a very low language model score but may be very informative.

Another language model that may be used considers whether text from a candidate answer passage appears similar to answer text in general.

A query independent scorer accesses a language model of historical answer passages, where the historical answer passages are answer passages that have been served for all queries. Answer passages that have been served generally have a similar n-gram structure, since answer passages tend to include explanatory and declarative statements. A query independent scorer could use a tri-gram model to compare tri-grams of the candidate answer passage to the tri-grams of the historical answer passages. A higher-quality candidate answer passage will typically have more tri-gram matches to the historical answer passages than a lower-quality candidate answer passage.
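To make the tri-gram comparison concrete, here is a small Python sketch that measures what fraction of a candidate passage's word tri-grams also appear in a set of historical answer passages. Using a simple fraction as the score, and the single historical passage shown, are my assumptions; a real language model would be far larger and more nuanced.

```python
def trigrams(text):
    # Set of consecutive three-word sequences in the text.
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

def trigram_match_fraction(passage, historical_passages):
    # Fraction of the passage's tri-grams seen in historical answer passages.
    historical = set()
    for old_passage in historical_passages:
        historical |= trigrams(old_passage)
    candidate = trigrams(passage)
    if not candidate:
        return 0.0
    return len(candidate & historical) / len(candidate)

# Hypothetical stand-in for the passages that have been served before.
historical_passages = ["the moon is approximately 238,900 miles from the earth"]

print(trigram_match_fraction("the moon is far from the earth", historical_passages))  # 0.4
```

In this toy case, two of the candidate's five tri-grams (“the moon is” and “from the earth”) match the historical passage.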

Another step involves a section boundary score. A candidate answer passage could be penalized if it includes text that passes formatting boundaries, such as paragraphs and section breaks, for example.

The scoring process determines an interrogative score. The query independent scorer searches the candidate answer passage for interrogative terms. A potential answer passage that includes a question or question term, e.g., “How far away is the moon from the Earth?” is generally not as helpful to a searcher looking for an answer as a candidate answer passage that only includes declarative statements, e.g., “The moon is approximately 238,900 miles from the Earth.”

The scoring process also determines discourse boundary term position scores. A discourse boundary term is one that introduces a statement or idea that is contrary to, or a modification of, a statement or idea that has just been made. For example, “conversely,” “however,” “on the other hand,” and so on.
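A very simple interrogative check might be sketched as follows. The list of question terms and the 0.0 and 1.0 score values are assumptions on my part; the patent only says that passages containing questions or question terms are generally less helpful.

```python
# Assumed list of interrogative terms; the patent does not enumerate them.
INTERROGATIVE_TERMS = {"who", "what", "when", "where", "why", "how"}

def interrogative_score(passage):
    # Lower score (0.0) if the passage has a question mark or a question term.
    words = [w.strip('.,!?"') for w in passage.lower().split()]
    has_question = "?" in passage or any(w in INTERROGATIVE_TERMS for w in words)
    return 0.0 if has_question else 1.0

print(interrogative_score("How far away is the moon from the Earth?"))  # 0.0
print(interrogative_score("The moon is approximately 238,900 miles from the Earth."))  # 1.0
```

A check this blunt would also penalize declarative sentences that happen to use words like “when” or “where,” so a real scorer would presumably be more careful.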

A candidate answer passage beginning with such a term receives a relatively low discourse boundary term position score, which lowers the answer score.

A candidate answer passage that includes but does not begin with such a term receives a higher discourse boundary term position score than it would if it began with the term.

A candidate answer passage that does not include such a term receives a high discourse boundary term position score.
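The three cases above (begins with a discourse boundary term, includes one elsewhere, or includes none) can be sketched in Python like this. The term list comes from the examples in the text, while the numeric scores 0.2, 0.6, and 1.0 are placeholders I picked only to preserve the ordering the patent describes.

```python
DISCOURSE_BOUNDARY_TERMS = ("conversely", "however", "on the other hand")

def discourse_boundary_score(passage):
    # Score values are hypothetical; only their ordering follows the text.
    text = passage.lower().lstrip()
    if text.startswith(DISCOURSE_BOUNDARY_TERMS):
        return 0.2   # begins with a boundary term: relatively low score
    if any(term in text for term in DISCOURSE_BOUNDARY_TERMS):
        return 0.6   # includes a boundary term later: higher than the first case
    return 1.0       # no boundary term at all: high score

print(discourse_boundary_score("However, the moon's distance varies."))           # 0.2
print(discourse_boundary_score("The distance varies; however, it averages out."))  # 0.6
print(discourse_boundary_score("The moon orbits the Earth."))                      # 1.0
```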

The scoring process determines result scores for the results from which the candidate answer passage was created. These could include a ranking score, a reputation score, and a site quality score. The higher these scores are, the higher the answer score will be.

A ranking score is based on the ranking score of the result from which the candidate answer passage was created. It can be the search score of the result for the query and will be applied to all candidate answer passages from that result.

A reputation score of the result indicates the trustworthiness and/or likelihood that the subject matter of the resource serves the query well.

A site quality score indicates a measure of the quality of a web site that hosts the result from which the candidate answer passage was created.

Component query independent scores described above may be combined in several ways to determine the query independent score. They could be summed; multiplied together; or combined in other ways.
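The two combination methods the patent names, summing and multiplying, are easy to show side by side. The component names and values below are invented for illustration, and the patent allows for other ways of combining them as well.

```python
import math

def combine_query_independent(component_scores, method="sum"):
    # Combine component scores by summing or by multiplying them together.
    values = list(component_scores.values())
    if method == "sum":
        return sum(values)
    if method == "product":
        return math.prod(values)
    raise ValueError(f"unknown method: {method}")

# Hypothetical component scores for one candidate answer passage.
components = {
    "language_model": 0.5,
    "interrogative": 1.0,
    "discourse_boundary": 0.5,
}

print(combine_query_independent(components, "sum"))      # 2.0
print(combine_query_independent(components, "product"))  # 0.25
```

One practical difference is that with multiplication a single very low component drags the whole score down, while with a sum it can be offset by the other components.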


Featured Snippet Answer Scores Ranking Signals was originally posted by Video And Blog Marketing