Funnelback search rankings

Nick Mullen
Monday 14 March 2022

One question we are often asked is why a particular URL is not listed at the top of our search results. The University uses Funnelback to provide search on our website. Funnelback scans our documents and gives each one a score (ranking). The score reflects the overall quality of the content, the validity of the source and how relevant the document is to the given search term. The results are displayed to the user in descending order, with the highest scoring documents at the top of the list.

The scoring algorithm considers a range of factors when it determines a document's overall score. The following sections describe the factors that influence a page's overall ranking.
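
To make the idea of an overall score concrete, here is a minimal sketch of combining several factor scores into one number and sorting the results in descending order. It is not Funnelback's actual algorithm: the factor names, the weights and the example paths are all assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical weights, chosen only for illustration (not Funnelback's values).
WEIGHTS: Dict[str, float] = {"content": 0.5, "links": 0.3, "url": 0.2}


def overall_score(factors: Dict[str, float]) -> float:
    """Combine per-factor scores (each assumed to be between 0 and 1)."""
    return sum(weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items())


def rank(documents: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (url, score) pairs with the highest score first."""
    scored = [(url, overall_score(factors)) for url, factors in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up factor scores for two made-up pages.
docs = {
    "/sport/football/teams/": {"content": 0.8, "links": 0.2, "url": 0.4},
    "/sport/": {"content": 0.6, "links": 0.9, "url": 0.9},
}
ranked = rank(docs)
```

In this sketch the page with the stronger content score still ranks second, because the other page does better on links and URL. That is the practical point of the factors described below: no single factor decides the ranking.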

The content

  • The number of occurrences of the query term within the contents of the page. The page content can come from the main body of text, the title and additional metadata.
  • The length of the document: shorter documents rank the highest. Documents that are longer than average are negatively scored.
  • The overall usage of the query words: specific search words that have a limited use will score higher than more commonly occurring words.
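
The three content factors above can be sketched in a few lines. This is a simplified, TF-IDF style illustration and not Funnelback's implementation: the function names, the formula and the sample documents are assumptions.

```python
import math
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def content_score(query: str, doc: str, corpus: List[str]) -> float:
    """Score one document for a query; `corpus` is assumed to be non-empty."""
    doc_words = tokenize(doc)
    if not doc_words:
        return 0.0
    average_length = sum(len(tokenize(d)) for d in corpus) / len(corpus)
    score = 0.0
    for term in tokenize(query):
        occurrences = doc_words.count(term)  # factor 1: occurrences of the term
        containing = sum(1 for d in corpus if term in tokenize(d))
        # factor 3: words used in fewer documents get a larger multiplier
        rarity = math.log((1 + len(corpus)) / (1 + containing)) + 1
        score += occurrences * rarity
    # factor 2: documents longer than average are scaled down
    return score / (len(doc_words) / average_length)


corpus = [
    "university open day",
    "university library opening hours and library rules for visitors",
    "university sport",
]
```

With this sample corpus, "library" scores higher than "university" for the second document because it is the rarer word, and the short third document scores higher than the long second one for "university".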

On-site and off-site link count

  • The number of times the page is linked to. The more links that point to a page, the higher the page will score.

URL

  • The length of the URL will affect the score. The shorter the URL the higher the score.
  • Excessive punctuation. The use of punctuation such as slashes, ampersands and dashes within the URL will have a negative effect on the ranking.
  • Human readable URLs will help the search algorithm understand the content.
  • Default pages such as index.html score higher than other page names. Pages that are closer to the root of the website hierarchy are given greater importance. For example, ‘sport/football/’ is better than ‘sport/football/teams/’. Subdomains are also given less weighting than the main search domain.
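
As a rough illustration of the URL factors above, the following sketch extracts the features a ranking algorithm could look at. How Funnelback actually measures and weights them is not covered here; the function name, the feature names and the example.com URLs are assumptions.

```python
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Collect simple URL features: length, punctuation, depth, default page."""
    parts = urlsplit(url)
    path = parts.path
    segments = [segment for segment in path.split("/") if segment]
    ends_with_index = bool(segments) and segments[-1].startswith("index.")
    # Path plus query string, so ampersands in parameters are counted too.
    path_and_query = path + ("?" + parts.query if parts.query else "")
    return {
        "length": len(url),
        # Slashes, ampersands and dashes, as listed above.
        "punctuation": sum(path_and_query.count(char) for char in "/&-"),
        # Number of folders below the site root; index.html is not a level.
        "depth": len(segments) - (1 if ends_with_index else 0),
        "is_default_page": path == "" or path.endswith("/") or ends_with_index,
    }


shallow = url_features("https://www.example.com/sport/football/")
deep = url_features("https://www.example.com/sport/football/teams/")
```

Here `shallow` has a depth of 2 and `deep` has a depth of 3, so under the rules above the first URL would be preferred, all else being equal.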

Date proximity

  • Recently published content is preferred over old content. This is used to offset the fact that old documents are likely to be linked to more often than newer documents.

Lexical span

  • The distance between search words. This applies when multiple words are used within the search term. The distance (number of characters) between the words within the content will affect the ranking. A higher score is given when the words appear closer to each other within the content. For example, when searching with the term “University ranked”, a page with the content “The University was ranked first” will be given a higher score than a page with the content “University of St Andrews has been ranked number one”. This is because there are fewer characters between the words “University” and “ranked” in the first instance.
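
The two example sentences can be checked with a short sketch that counts the characters between the two search words. This only illustrates the measurement: the function name is an assumption, it matches plain substrings rather than whole words, and it only looks for the second word after the first.

```python
from typing import Optional


def lexical_span(content: str, first: str, second: str) -> Optional[int]:
    """Smallest number of characters between `first` and a later `second`.

    Returns None when the two words do not appear in that order.
    """
    text = content.lower()
    first, second = first.lower(), second.lower()
    best = None
    start = text.find(first)
    while start != -1:
        end_of_first = start + len(first)
        position = text.find(second, end_of_first)
        if position != -1:
            gap = position - end_of_first
            if best is None or gap < best:
                best = gap
        start = text.find(first, start + 1)
    return best


span_a = lexical_span("The University was ranked first", "University", "ranked")
span_b = lexical_span(
    "University of St Andrews has been ranked number one", "University", "ranked"
)
```

The first sentence has 5 characters between the two words (" was ") and the second has 24, so the first would receive the higher lexical span score.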

Implicit phrase

  • The distance between search words.  Similar to the lexical span. A higher score is given if multiple search words are used and they appear next to each other within the content. For example when searching with the term “University ranked” a page with the content “The University ranked best in Scotland” will be given an additional score.
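
A minimal sketch of the implicit phrase check: it tests whether the search words appear next to each other, in order, within the content. The function name is an assumption, and the size of the additional score that Funnelback applies is not shown.

```python
import re


def contains_implicit_phrase(content: str, query: str) -> bool:
    """True when all query words appear consecutively, in order."""
    words = re.findall(r"\w+", content.lower())
    terms = re.findall(r"\w+", query.lower())
    count = len(terms)
    if count < 2:
        return False  # a phrase needs at least two words
    return any(
        words[i:i + count] == terms for i in range(len(words) - count + 1)
    )


adjacent = contains_implicit_phrase(
    "The University ranked best in Scotland", "University ranked"
)
separated = contains_implicit_phrase(
    "The University was ranked first", "University ranked"
)
```

Only the first sentence qualifies: `adjacent` is `True` and `separated` is `False`, because "was" sits between the two search words in the second sentence.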

Non binary

  • Textual content, such as HTML pages, will score higher than binary documents such as PDF and DOCX files.

Doesn’t contain advertisements

  • Ad-free content will be given a higher score than content with adverts.

Other factors

  • Query independent evidence is a set of rules that adjust the weighting of results regardless of the search term. For example, results from a particular website could be boosted, or a particular file type could be down-weighted.
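
The sketch below shows the general idea of such rules as multipliers applied to a score that has already been calculated. It does not show how these rules are configured in Funnelback: the rule tables, the multiplier values and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical rules: boost one site, down-weight one file type.
HOST_BOOSTS = {"news.example.com": 1.2}
EXTENSION_WEIGHTS = {".pdf": 0.8}


def apply_rules(url: str, score: float) -> float:
    """Adjust a score using rules that do not depend on the search term."""
    parts = urlsplit(url)
    adjusted = score * HOST_BOOSTS.get(parts.hostname or "", 1.0)
    path = parts.path.lower()
    for extension, weight in EXTENSION_WEIGHTS.items():
        if path.endswith(extension):
            adjusted *= weight
    return adjusted


boosted = apply_rules("https://news.example.com/story", 1.0)
down_weighted = apply_rules("https://www.example.com/guide.pdf", 1.0)
```

Because the rules look only at the URL, the same adjustment applies whatever the user searched for.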

Summary

Our user research shows that search is an important tool and an area we need to improve. These factors show that improving search requires a holistic approach: not just fine-tuning Funnelback, but also improving our content, its structure and how we apply metadata.

 
