Adventures in Office SharePoint 2007 Search Thesaurus
On a recent project, one of my suggestions to help my client’s new employees find content on the corporate portal more easily was to use SharePoint’s handy search thesaurus feature.
The search thesaurus lets you configure SharePoint search so that when a user searches for one word, such as “HR”, the same query also searches for related words such as “Human Resources” and “Talent”. In SharePoint terms this is an “expansion set”. You can also replace the search word entirely: if an end user searches for “Windows NT”, the query could instead search for “Windows” and “Windows Server” without searching for “Windows NT” itself. This is known as a “replacement set”.
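To make the two set types concrete, here is a minimal Python sketch that generates a thesaurus file containing the “HR” expansion set and the “Windows NT” replacement set described above. The element names (an XML root with ID="Microsoft Search Thesaurus", a thesaurus element with xmlns="x-schema:tsSchema.xml", and expansion, replacement, pat and sub children) are an assumption based on the commented-out sample in the default tsneu.xml; compare them against your own file before using the output. The build_thesaurus function is a hypothetical helper, not part of SharePoint.

```python
import xml.etree.ElementTree as ET


def build_thesaurus(expansions, replacements):
    """Return thesaurus XML text (hypothetical helper).

    expansions:   list of word lists; searching any word also searches the others.
    replacements: list of (patterns, substitutes) pairs; a search for any
                  pattern is replaced by a search for the substitutes.
    """
    # Assumption: element names mirror the commented-out sample in the
    # default tsneu.xml; verify against your own file before deploying.
    root = ET.Element("XML", {"ID": "Microsoft Search Thesaurus"})
    # Set as a plain attribute so it is written out verbatim.
    thesaurus = ET.SubElement(root, "thesaurus", {"xmlns": "x-schema:tsSchema.xml"})
    ET.SubElement(thesaurus, "diacritics_sensitive").text = "0"
    for words in expansions:
        expansion = ET.SubElement(thesaurus, "expansion")
        for word in words:
            ET.SubElement(expansion, "sub").text = word
    for patterns, substitutes in replacements:
        replacement = ET.SubElement(thesaurus, "replacement")
        for pattern in patterns:
            ET.SubElement(replacement, "pat").text = pattern
        for substitute in substitutes:
            ET.SubElement(replacement, "sub").text = substitute
    ET.indent(root, space="    ")  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


xml_text = build_thesaurus(
    expansions=[["HR", "Human Resources", "Talent"]],
    replacements=[(["Windows NT"], ["Windows", "Windows Server"])],
)
print(xml_text)
```

Generating the file from lists like these keeps a large set of terms in one place, which also makes it easier to check for repeated words before the file goes anywhere near the server.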
This is all well and good; however, I quickly found that implementing this cool feature was easier said than done.
After updating what I thought was the appropriate .XML file, it was unclear what I needed to do to “activate” the thesaurus (IISRESET? Run a full crawl?). It was also unclear which files needed to be updated, and I was unsure whether the .XML file I had updated was the correct one.
After searching around the web and consulting the standard-issue SharePoint books, I became even more confused; some things seemed to work for some people but not for others. Now, after a bit of a struggle, I think I have this thing “mostly” figured out, and I wanted to share my findings. The following are the steps I took to get the search thesaurus feature working for me and my client.
How to configure the search thesaurus in Office SharePoint Server 2007
Step 1: Find the correct “tsxxx.xml” file
There are a number of “tsxxx.xml” files (where “xxx” is a three-letter language abbreviation), such as “tsenu.xml” for American English and “tseng.xml” for British English. There is also the “tsneu.xml” file, the language-neutral file that works no matter which language SharePoint is configured to serve. It was “tsneu.xml” that I ended up using for my project.
The “tsxxx.xml” files actually exist in a couple of locations. The one that ultimately affected my search results was the following (assuming SharePoint is installed in the standard file location):
C:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications\{GUID for SSP}\Config\
In my client’s case, however, we had moved the index location to a SAN to allow for search index growth, so the “tsxxx.xml” files were somewhere to the effect of:
E:\Portal Search Index\12.0\Data\Office Server\Applications\{GUID for SSP}\Config\
From what I read, you can also update the “tsxxx.xml” files located in this folder:
C:\Program Files\Microsoft Office Servers\12.0\Data\Config
In theory, these files will at some point get copied down into the deeper (C:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications\{GUID for SSP}\Config\) folder. However, despite extensive testing, this never worked for me.
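Because the SSP folder is named after a GUID, and a farm can have more than one SSP, it can help to list every copy of the thesaurus file under the data folder. The following Python sketch assumes the folder layout described above (an Office Server\Applications\{GUID for SSP}\Config folder under the data root, with the master copies in Data\Config). The find_ssp_thesaurus_files function and its parameters are hypothetical. Passing master_dir also reports whether each SSP copy still matches the master copy, which is one way to see whether the copy-down ever happened.

```python
import filecmp
from pathlib import Path


def find_ssp_thesaurus_files(data_root, lang="neu", master_dir=None):
    r"""Return (path, same_as_master) pairs for each SSP copy of ts<lang>.xml.

    Hypothetical helper. Assumes the layout described in this post:
        <data_root>\Office Server\Applications\<SSP GUID>\Config\ts<lang>.xml
    same_as_master is None unless master_dir (the Data\Config folder) holds
    ts<lang>.xml, in which case it is a byte-for-byte comparison result.
    """
    name = f"ts{lang}.xml"
    apps = Path(data_root) / "Office Server" / "Applications"
    master = Path(master_dir) / name if master_dir is not None else None
    results = []
    # "*" stands in for the {GUID for SSP} folder, so every SSP is listed.
    for path in sorted(apps.glob(f"*/Config/{name}")):
        same = None
        if master is not None and master.is_file():
            # shallow=False compares file contents rather than size/mtime only.
            same = filecmp.cmp(master, path, shallow=False)
        results.append((path, same))
    return results
```

For a standard install you might call find_ssp_thesaurus_files(r"C:\Program Files\Microsoft Office Servers\12.0\Data", master_dir=r"C:\Program Files\Microsoft Office Servers\12.0\Data\Config"). If the index has been moved to another drive, such as the E: path above, pass that root as data_root instead.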
Step 2: Verify you are working with the correct “tsxxx.xml” file
The next step is to verify you are working with the correct “tsxxx.xml” file. I took the “tsneu.xml” file and removed the XML comments around the main nodes of the default example so that the provided “run”/“jog” expansion set became active. I then uploaded two simple text files, “jog.txt” and “run.txt”, containing just the word “jog” and “run” respectively. With that expansion set active, a search for either word should return both files. I just wanted to get the simplest example working first.
Next, I had to make my XML changes take effect. I had read that you need to run an IISRESET as well as a full crawl of the content source, but in the end I found that the only thing required to make my “tsxxx.xml” file take effect was to restart the Office SharePoint Server Search service, also known as “osearch”. Because I kept trying different files over and over again in my testing, I wrote a little batch file (commands shown below) to stop and start the service.
Search Service Restart Batch File Commands:
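REM Restart the Office SharePoint Server Search service (osearch) so thesaurus changes take effect
net stop osearch
net start osearch
REM Keep the window open so the output can be read
PAUSE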
Step 3: Update the correct “tsxxx.xml” file
This is where I really got stuck. Having finally figured out which file to update and how to make it take effect, I found that the final expansion sets I got back from my client were not taking effect in my search results, and it was a real mystery. I had given my client a worksheet to fill out, and they did a great job, so I had a large number of expansion sets, each with many words. It turns out this is what got me into trouble: you can’t have duplicate entries in your expansion or replacement sets, or SharePoint gets horribly confused. To illustrate, consider the following XML snippet:
<expansion>
<sub>word1</sub>
<sub>word2</sub>
<sub>word3</sub>
</expansion>
<expansion>
<sub>word5</sub>
<sub>word2</sub>
<sub>word6</sub>
</expansion>
The example above would not work, and interestingly enough, SharePoint produced no errors or other indication of a problem (that I could find). It fails because “word2” is used in both sets. This seems to be an undocumented bug of some sort. So, the big lesson here was to remove all duplicates across expansion sets. Once I did this, my results worked perfectly!
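Since SharePoint gives no indication of this problem, a small script that scans the thesaurus file for repeated terms before you restart the search service can save a lot of guesswork. The Python sketch below walks every expansion and replacement element and reports any term that appears more than once, whether across sets (as with “word2” above) or inside a single set. The find_duplicate_terms and local_name functions are hypothetical helpers. Treating matching as case-insensitive and counting both sub and pat entries are assumptions, a conservative reading of the rule above rather than documented SharePoint behavior.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    # <thesaurus xmlns="x-schema:tsSchema.xml"> puts its children in a namespace,
    # so ElementTree reports tags like "{x-schema:tsSchema.xml}sub"; drop the prefix.
    # Non-string tags (comments/PIs, if a custom parser kept them) are ignored.
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def find_duplicate_terms(root):
    """Map each repeated term to the 1-based numbers of the sets it appears in."""
    occurrences = defaultdict(list)
    set_number = 0
    for element in root.iter():  # the default parser drops XML comments
        if local_name(element.tag) not in ("expansion", "replacement"):
            continue
        set_number += 1
        for child in element:
            if local_name(child.tag) in ("sub", "pat") and child.text:
                # Assumption: terms compare case-insensitively, ignoring outer spaces.
                occurrences[child.text.strip().lower()].append(set_number)
    return {term: sets for term, sets in occurrences.items() if len(sets) > 1}


sample = """<XML ID="Microsoft Search Thesaurus">
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <expansion><sub>word1</sub><sub>word2</sub><sub>word3</sub></expansion>
    <expansion><sub>word5</sub><sub>word2</sub><sub>word6</sub></expansion>
  </thesaurus>
</XML>"""
print(find_duplicate_terms(ET.fromstring(sample)))  # {'word2': [1, 2]}
```

For the real file, pass ET.parse(path).getroot(), where path points at the tsneu.xml file in your SSP’s Config folder; an empty dictionary means no repeated terms were found. A term repeated inside one set shows up with the same set number twice, for example [1, 1].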
Labels: Search


9 Comments:
Jonathon, thanks for your very clear article; it helped me a lot.
Just four extra points from my recent experience debugging MOSS 2007 thesauruses, which may help folks further:
1. The thesaurus parser will fail if there is a duplicate entry under an expansion element, for example:
<expansion>
<sub>insolvency</sub>
<sub>bankruptcy</sub>
<sub>insolvency</sub>
</expansion>
2. Indeed, all you need to do is restart the osearch service. How I wish I'd found your post earlier and saved myself all sorts of convoluted re-crawling combinations...
3. Although the Microsoft support article at http://support.microsoft.com/kb/837847 seems to imply that you can add weight attributes to <sub> elements:
<sub weight="0.7">Stuff</sub>
I encountered failures when trying to do this.
4. Furthermore, the referenced tsschema.xml file that ships with MOSS does not appear to describe the <case> element (for switching case sensitivity on/off), although it DOES have the diacritics_sensitive element. Maybe some of these are legacy elements, or features that got pulled before MOSS RTM'ed...
Cheers
Ben
If you are using a search results web part with the StemmingEnabled property set to True, the entries in the thesaurus <expansion> tag get excluded from search results.
You must set StemmingEnabled = "False" in order for the Thesaurus entries to work.
Jonathon,
Your article is a great help; I got the thesaurus running in just a few minutes. However, I notice that when a thesaurus-based result comes up, hit-highlighting does not recognize the thesaurus words, only the actual query text. Looking at the XSLT, this makes sense, but I'd like the thesaurus results to be highlighted. Any experience in doing this?
Hi,
Thanks for the post.
But I was wondering how to get the GUID of the SSP in SharePoint Server programmatically.
I am struggling to get it.
Thanks
How does 'net stop/start osearch' differ from 'stsadm -o osearch -action stop -f blah blah blah'?
More to the point, there are, by my count, three ways to start/stop 'osearch' and 'spsearch':
1. admin gui
2. stsadm
3. basic DOS net start/stop
Can anyone quantify the implications/differences of each, if any?
Thanks