Adventures in Office SharePoint 2007 Search Thesaurus
On a recent project, one of my suggestions to my client for helping new employees find content on the corporate portal more easily was to use SharePoint’s handy search thesaurus feature.
The search thesaurus lets you configure SharePoint search so that when someone searches for one term, such as “HR”, the same query also searches for “Human Resources” and “Talent”. In SharePoint terms this is an “expansion set”. You can also replace the search term: if an end user searches for “Windows NT”, the search can instead query “Windows” and “Windows Server” without searching for “Windows NT” itself. This is known as a “replacement set”.
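To make the two kinds of set concrete, here is a minimal Python sketch that renders the examples above as XML. The function names are my own and purely illustrative. The `<expansion>` and `<sub>` elements match the snippets later in this post; the `<replacement>` and `<pat>` element names are my understanding of the thesaurus file format, so check them against the commented examples in your own “tsxxx.xml” file before relying on them.

```python
import xml.etree.ElementTree as ET


def expansion_set(terms):
    """Build an <expansion> element: a search for any term also searches the others."""
    node = ET.Element("expansion")
    for term in terms:
        ET.SubElement(node, "sub").text = term
    return node


def replacement_set(patterns, substitutes):
    """Build a <replacement> element: each <pat> term is swapped for the <sub> terms.

    The <pat> element name is an assumption about the file format, not taken from the post.
    """
    node = ET.Element("replacement")
    for pattern in patterns:
        ET.SubElement(node, "pat").text = pattern
    for substitute in substitutes:
        ET.SubElement(node, "sub").text = substitute
    return node


def render(node):
    """Serialize an element to a plain XML string."""
    return ET.tostring(node, encoding="unicode")


if __name__ == "__main__":
    print(render(expansion_set(["HR", "Human Resources", "Talent"])))
    print(render(replacement_set(["Windows NT"], ["Windows", "Windows Server"])))
```

The output is only the individual set elements; in the real file they sit inside the existing root nodes, which this sketch deliberately leaves alone.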
This is all well and good; however, I quickly found that implementing this cool feature was easier said than done.
After updating what I thought was the appropriate .XML file, it was unclear what I needed to do to “activate” the thesaurus (IISRESET? Run a full crawl?). It was also unclear which files needed to be updated, and I was unsure whether the .XML file I had updated was the correct one.
After searching around the web and hitting the standard-issue SharePoint books, I became even more confused. It seemed that some things worked for some people but not for others. Now, after a bit of a struggle, I think I have this thing “mostly” figured out, and I want to share my findings. The following are the steps I took to get the search thesaurus working for me and my client.
How to configure the search thesaurus in Office SharePoint Server 2007
Step 1: Find the correct “tsxxx.xml” file
There are a number of “tsxxx.xml” files, where “xxx” is a three-letter abbreviation of a language, such as “tsenu.xml” for American English and “tseng.xml” for British English. There is also the “tsneu.xml” file, the neutral-language file that works no matter what language SharePoint is configured to serve. The “tsneu.xml” file is the one I ended up using for my project.
The “tsxxx.xml” files actually live in a couple of locations. The one that ended up affecting my search results is the following (if you installed SharePoint in the standard file location):
C:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications\{GUID for SSP}\Config\
In my client’s case, though, we had moved the index location to a SAN to allow for search index growth, so the “tsxxx.xml” files were somewhere to the effect of:
E:\Portal Search Index\12.0\Data\Office Server\Applications\{GUID for SSP}\Config\
Now, from what I read, if you update the “tsxxx.xml” files located in this folder:
C:\Program Files\Microsoft Office Servers\12.0\Data\Config
they should, in theory, be copied down at some point into the deeper folder (C:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications\{GUID for SSP}\Config\). However, after extensive testing, this never worked for me.
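Since the files live in more than one place, it can help to list every copy before editing anything. The following is a rough Python sketch, not something SharePoint provides: `find_thesaurus_files` is a hypothetical helper, and the root path in the example is simply the default install location mentioned above (adjust it if your index was moved, as ours was).

```python
from pathlib import Path


def find_thesaurus_files(data_root):
    """Return every ts???.xml file under data_root, sorted by path.

    The pattern 'ts???.xml' matches 'ts' plus a three-letter language code,
    so longer names in the same folders are skipped.
    """
    root = Path(data_root)
    return sorted(p for p in root.rglob("ts???.xml") if p.is_file())


if __name__ == "__main__":
    # Assumed default location; replace with your own data or index root.
    default_root = r"C:\Program Files\Microsoft Office Servers\12.0\Data"
    for path in find_thesaurus_files(default_root):
        print(path)
```

Seeing the full list side by side makes it easier to tell the top-level Config copies from the ones under the {GUID for SSP} folder, which are the ones that mattered for me.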
Step 2: Verify you are working with the correct “tsxxx.xml” file
The next step is to verify that you are working with the correct “tsxxx.xml” file. I took the “tsneu.xml” file and removed the XML comments around the main nodes in the default example file to make the provided “run” and “jog” expansion set active. From there, I uploaded two simple text files, “jog.txt” and “run.txt”, containing just the word “jog” and the word “run” respectively. I wanted to make the simplest example work first: with the expansion set active, a search for “run” should return both files.
Once I did this, I had to do something to make my XML changes take hold. I had read that you need to do an IISRESET and also a full crawl on the content source, but in the end the only thing I had to do was restart the Office SharePoint Server Search service, also known as “osearch”. Because I kept trying different files over and over in my testing, I wrote a little batch file (commands shown below) to stop and start the service.
Search Service Restart Batch File Commands:
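REM Stop and restart the Office SharePoint Server Search service so it picks up the thesaurus changes
net stop osearch
net start osearch
REM Keep the window open so the service messages can be read
PAUSE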
Step 3: Update the correct “tsxxx.xml” file
So this is where I really got stuck. After finally figuring out which file I needed to update and how to make it take hold, the final series of expansion sets I got back from my client was not taking hold in my search results, and it was a real mystery. I had given my client a worksheet to fill out, and they really did a great job, so I had a large number of expansion sets, each with many words. It turns out this is what got me in trouble in the end: you can’t have the same entry in more than one of your expansion or replacement sets. SharePoint gets horribly confused. To illustrate, consider the following XML snippet:
<expansion>
<sub>word1</sub>
<sub>word2</sub>
<sub>word3</sub>
</expansion>
<expansion>
<sub>word5</sub>
<sub>word2</sub>
<sub>word6</sub>
</expansion>
The above example would not work and, interestingly enough, SharePoint would produce no errors or other indication (that I could find). The reason is that “word2” is used in both sets. This seems to be an undocumented bug of some sort. So the big lesson here was to make sure no word appears in more than one expansion set. Once I removed the duplicates, my results worked perfectly!
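With a large number of sets, hunting for duplicates by eye gets tedious. Here is a small Python sketch of how the check could be automated. It is my own helper, not a SharePoint tool, and it makes two assumptions: terms are compared case-insensitively, and `<pat>` entries in replacement sets are checked along with `<sub>` entries. It strips any XML namespace from tag names so it can be pointed at the real file's contents as well as at bare snippets.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a tag name."""
    return tag.rsplit("}", 1)[-1]


def find_shared_terms(xml_text):
    """Map each term used in more than one set to the 1-based numbers of those sets."""
    root = ET.fromstring(xml_text)
    seen = defaultdict(list)
    set_number = 0
    for element in root.iter():
        if local_name(element.tag) not in ("expansion", "replacement"):
            continue
        set_number += 1
        # Collect this set's terms once each; comparing in lower case is an assumption.
        terms = set()
        for child in element:
            if local_name(child.tag) in ("sub", "pat") and child.text:
                terms.add(child.text.strip().lower())
        for term in terms:
            seen[term].append(set_number)
    return {term: sets for term, sets in seen.items() if len(sets) > 1}


# The two sets from the example above, wrapped in a root element so they parse.
SAMPLE = """
<thesaurus>
  <expansion><sub>word1</sub><sub>word2</sub><sub>word3</sub></expansion>
  <expansion><sub>word5</sub><sub>word2</sub><sub>word6</sub></expansion>
</thesaurus>
"""

if __name__ == "__main__":
    for term, sets in sorted(find_shared_terms(SAMPLE).items()):
        print(f"'{term}' appears in sets {sets}")
```

Sets that are still commented out in the file are ignored, because the parser drops XML comments, so only the sets that are actually active get checked.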