Meta Discussion

Unexpected search result [#2171]

willemien: Unexpected search result (2010-01-30 15:50) [#7139]

[willemien]: I was trying out some ways to search for the EGF and noticed that the results are rather unexpected. When I type just egf (lowercase) in the full text search box, the EGF (European Go Federation) page appears only as result number 14.

Typing it as EGF (in capitals) gives the same result.

In title search it works fine, but why does it appear so low on the full text search page?

ArnoHollosi: Search results (2010-02-24 08:57) [#7305]

This result, like the one for "shimari", may be unexpected. I have some ideas on how to improve the situation, but I'm not sure they will work out.

The full text search engine used on SL is Xapian. It is quite a feature-rich engine that leaves nothing to be desired. Its ranking algorithm is as good as it gets for text search.

How is the ranking calculated? Currently it looks only at the content of the page, its title and keywords, and the titles of alias pages. Every word has an associated weight: words in page titles weigh more than words in headings, which in turn weigh more than words in ordinary text. How often a word appears affects the score too. (This is a simplistic description; in the background there is some serious math going on.)

Anyway, it boils down to this: how should Xapian, or any full text search engine for that matter, decide that European Go Federation is more important than Top Amateur Players By Country? If you look at the first three search results for "EGF", you will see that the term "EGF" comes up several times in all of them, whereas it is not mentioned that often on the European Go Federation page itself.
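A toy illustration of the field weighting described above, and of why the EGF page can lose out. This is a minimal sketch, not Xapian's actual formula: Xapian's default weighting scheme (BM25) also accounts for document length and term rarity. The field weights (title 5, heading 3, body 1), the page texts, and the helper names are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical field weights: title > heading > body.
FIELD_WEIGHTS = {"title": 5.0, "heading": 3.0, "body": 1.0}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page: dict[str, str], query: str) -> float:
    """Sum of (field weight x occurrences) over every query term and field."""
    terms = tokenize(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(page.get(field, "")))
        total += weight * sum(counts[t] for t in terms)
    return total


# Invented page contents, for illustration only.
pages = {
    "European Go Federation": {
        "title": "European Go Federation",
        "body": "The EGF organises the European Go Congress.",
    },
    "Top Amateur Players By Country": {
        "title": "Top Amateur Players By Country",
        "heading": "EGF ratings",
        "body": "EGF rating EGF rating EGF rating EGF rating",
    },
}

ranked = sorted(pages, key=lambda name: score(pages[name], "egf"), reverse=True)
print(ranked)  # ['Top Amateur Players By Country', 'European Go Federation']
```

The EGF page scores 1 (one body mention, none in its title), while the other page scores 7 from repeated mentions. Querying "EGF" in capitals gives the same order, because the tokenizer lowercases everything.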

One solution: increase the weight of words in page titles. But this only goes so far, because at some point, the full text search degrades into a title search.

Solution two: combine the results of title and full text search, either as separate lists or in a fancier way, where the title search results seed the Xapian search.
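A minimal sketch of the simplest variant, assuming both searches already return ranked lists of page names: title hits are listed first and the full-text hits follow, without duplicates. The function name and the "Example Page" entry are hypothetical.

```python
def combine_results(title_hits: list[str], fulltext_hits: list[str]) -> list[str]:
    """Return title-search hits first (in their original order), followed by
    full-text hits that were not already listed."""
    seen: set[str] = set()
    merged: list[str] = []
    for page in title_hits + fulltext_hits:
        if page not in seen:
            seen.add(page)
            merged.append(page)
    return merged


# Hypothetical result lists for the query "egf".
title_hits = ["European Go Federation"]
fulltext_hits = ["Top Amateur Players By Country", "Example Page", "European Go Federation"]
print(combine_results(title_hits, fulltext_hits))
# ['European Go Federation', 'Top Amateur Players By Country', 'Example Page']
```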

Solution three: introduce an overall score for pages (like Google's PageRank). I tried this one, but no matter what you do, well-linked pages like atari show up in the top spots for almost every search term you enter.
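For reference, a plain power-iteration PageRank over a tiny, invented link graph. The page names and links are hypothetical; the damping factor 0.85 is the classic textbook value and 100 iterations is an arbitrary choice. It shows the effect described above: the page many others link to gets the highest global score, and that score does not depend on the query, so mixing it into the text ranking pushes such pages up for every search.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Power-iteration PageRank. `links` maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical link graph: several pages point at "Atari".
links = {
    "Ladder": ["Atari"],
    "Net": ["Atari"],
    "Capture": ["Atari", "Ladder"],
    "Atari": ["Capture"],
    "Top Amateur Players By Country": ["European Go Federation"],
}
rank = pagerank(links)
print(max(rank, key=rank.get))  # Atari
```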

Solution four: add the anchor text of links that point to the page. Google does that too. Maybe that helps some. It is not easy to implement, though.
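A sketch of the data side of this idea, assuming the hard part (extracting each link's source page, displayed text, and target page from the wiki markup) is already done; the triples and the function name are hypothetical. The collected anchor texts would then be indexed as an extra field of the target page, so a page that others link to as "EGF" becomes findable by that word even if it rarely uses it itself.

```python
from collections import defaultdict


def collect_anchor_text(links: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Group link texts by target page.

    Each link is a (source_page, anchor_text, target_page) triple.
    """
    anchors: dict[str, list[str]] = defaultdict(list)
    for _source, anchor_text, target in links:
        anchors[target].append(anchor_text)
    return dict(anchors)


# Hypothetical links extracted from three pages.
links = [
    ("Top Amateur Players By Country", "EGF", "European Go Federation"),
    ("Example Tournament Page", "EGF", "European Go Federation"),
    ("Ladder", "atari", "Atari"),
]
anchors = collect_anchor_text(links)
print(anchors["European Go Federation"])  # ['EGF', 'EGF']
```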

None of these solutions is a clear winner. Maybe some combination would work. Who knows? I'm not sure how to proceed at this point.

HermanHiddema: Re: Search results (2010-02-24 09:50) [#7306]

I think solution four would help, but is probably too much work.

Solution two sounds like the best option, and should not be too much work, I guess?

Unkx80: Re: Search results (2010-02-24 11:18) [#7308]

I also feel that solution one will not work.

Solution two may work. In particular, feeding results from title search into full text search is known more generally as "query expansion". The difficult question here is which terms (and how many) to include in the set of expanded terms.
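A minimal sketch of that kind of expansion, with made-up parameters: the original query terms keep weight 1.0, and words from the titles of the top `k` title-search hits are added with a lower weight. The function names, `k`, the 0.5 weight, and the `skip` set are all assumptions for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query: str, title_hits: list[str], k: int = 3,
                 expansion_weight: float = 0.5,
                 skip: frozenset[str] = frozenset()) -> dict[str, float]:
    """Return term -> weight for the expanded query."""
    weights = {term: 1.0 for term in tokenize(query)}
    for title in title_hits[:k]:
        for term in tokenize(title):
            if term not in skip:
                # setdefault keeps the original weight of terms already in the query.
                weights.setdefault(term, expansion_weight)
    return weights


print(expand_query("egf", ["European Go Federation"]))
# {'egf': 1.0, 'european': 0.5, 'go': 0.5, 'federation': 0.5}
print(expand_query("egf", ["European Go Federation"], skip=frozenset({"go"})))
# {'egf': 1.0, 'european': 0.5, 'federation': 0.5}
```

Without the `skip` set, "go" is added, and on a go wiki that word appears almost everywhere; choosing what to leave out is exactly the "which terms, and how many" problem.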

For solution three, the observation regarding PageRank is not unexpected. Have you tried algorithms such as HITS and SALSA, which differentiate between hubs and authorities?
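For comparison, a bare-bones HITS iteration on an invented graph (SALSA is not shown; page names and links are hypothetical). Each page gets two scores: an authority score (it is pointed to by good hubs) and a hub score (it points to good authorities), so index-like pages score as hubs, separately from the content pages they point to.

```python
import math


def hits(links: dict[str, list[str]], iterations: int = 50
         ) -> tuple[dict[str, float], dict[str, float]]:
    """Return (hub, authority) scores via the HITS power iteration."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for t in targets:
                auth[t] += hub[source]
        # Hub: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalise both vectors to unit length so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


links = {
    "Beginner Index": ["Atari", "Ladder", "Net"],
    "Tactics": ["Atari", "Ladder"],
    "Glossary": ["Atari"],
    "Top Amateur Players By Country": ["European Go Federation"],
}
hub, auth = hits(links)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # Beginner Index Atari
```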

Solution four may work. I do agree that a lot of effort is needed though.

However, there is a very crude solution that should be easy to implement. Terms like "EGF" and "shimari" are actually direct hits on page titles (in this case, aliases). Why not return these first before the others?
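A minimal sketch of that crude fix, assuming a lookup table from lowercased page titles and alias names to the page they lead to (the table, its contents, and the function name are hypothetical). If the whole query matches an entry, that page is moved to the top; otherwise the full-text order is left alone.

```python
def promote_direct_hit(query: str, results: list[str],
                       titles: dict[str, str]) -> list[str]:
    """Move the page whose title or alias equals the query to the front.

    `titles` maps a lowercased title or alias to the canonical page name.
    """
    target = titles.get(query.strip().lower())
    if target is None:
        return results
    return [target] + [page for page in results if page != target]


titles = {
    "european go federation": "European Go Federation",
    "egf": "European Go Federation",  # alias
}
results = ["Top Amateur Players By Country", "Example Page", "European Go Federation"]
print(promote_direct_hit("EGF", results, titles))
# ['European Go Federation', 'Top Amateur Players By Country', 'Example Page']
```

Aliases are simply extra keys in the table, so they are treated exactly like titles. The direct hit is prepended even if the full-text search missed it entirely, which seems the desired behaviour for an exact title match.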

willemien: sorry late reply (2010-04-05 14:19) [#7609]

Maybe just give direct hits a high value (if that is possible), or otherwise give aliases a high value (treat them as direct hits or so).

When you search for the title, the page does end up at the top of the list. The same should apply to an alias (in my humble opinion).

Sensei's Library