Friday, October 26, 2007

Adventures in Office SharePoint 2007 Search Thesaurus

On a recent project, one of my suggestions to my client for helping new employees find content on the corporate portal more easily was to employ the handy search thesaurus feature of SharePoint.

The search thesaurus lets you configure SharePoint search so that when one term is searched, such as “HR”, the same query also searches for “Human Resources” and “Talent”. In SharePoint terms this is an “expansion set”. You also have the option of replacing the search term: if an end user searches for “Windows NT”, the search can instead query “Windows” and “Windows Server” without searching for “Windows NT” itself. This is known as a “replacement set”.
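To make the two kinds of sets concrete, here is a minimal Python sketch that builds one of each from the examples above. The <expansion> and <sub> element names match the snippet shown later in this post; the <replacement> and <pat> names are my assumption, based on the commented-out example in the default thesaurus file. The helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def expansion_set(terms):
    """Build an <expansion> node: searching any term also searches the others."""
    node = ET.Element("expansion")
    for term in terms:
        ET.SubElement(node, "sub").text = term
    return node


def replacement_set(patterns, substitutes):
    """Build a <replacement> node: each <pat> is swapped for the <sub> terms.

    The <pat> element name is an assumption taken from the default example file.
    """
    node = ET.Element("replacement")
    for pattern in patterns:
        ET.SubElement(node, "pat").text = pattern
    for substitute in substitutes:
        ET.SubElement(node, "sub").text = substitute
    return node


hr_xml = ET.tostring(
    expansion_set(["HR", "Human Resources", "Talent"]), encoding="unicode"
)
nt_xml = ET.tostring(
    replacement_set(["Windows NT"], ["Windows", "Windows Server"]), encoding="unicode"
)
print(hr_xml)
print(nt_xml)
```

The printed nodes would still need to be pasted inside the existing thesaurus node of the file you are editing; this sketch does not write the file for you.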

This is all well and good; however, I quickly found that implementing this cool feature was easier said than done.

After updating what I thought was the appropriate .XML file, it was unclear what needed to be done to “activate” the thesaurus so the changes would take hold (IISRESET? Run a full crawl?). It was also rather unclear which files needed to be updated, and I was unsure if the .XML file I had updated was the correct one.

After searching around the web and hitting the standard-issue SharePoint books I became even more confused. It seemed some things worked for some individuals but not others. So now, after a bit of a struggle, I think I have this thing “mostly” figured out and I wanted to share my findings. The following are the steps I took to get the search thesaurus feature working for me and my client.

How to configure the search thesaurus in Office SharePoint Server 2007

Step 1: Find the correct “tsxxx.xml” file

There are a number of “tsxxx.xml” files (where “xxx” is a three-letter abbreviation of a language), such as “tsenu.xml” for American English and “tseng.xml” for British English. There is also the “tsneu.xml” file, which is the neutral language file that will work no matter what language SharePoint is configured to serve. It was the “tsneu.xml” file that I ended up using for my project.

The “tsxxx.xml” files actually live in a couple of locations. The one that I found actually impacted my search results is the following (if you installed SharePoint in the standard file location):

C:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications\{GUID for SSP}\Config\

In my client’s case, however, we had moved the index location to a SAN to allow for search index growth, so the location of the “tsxxx.xml” files was something to the effect of:

E:\Portal Search Index\12.0\Data\Office Server\Applications\{GUID for SSP}\Config\

Now, from what I read, you can theoretically update the “tsxxx.xml” files located in this folder instead:

C:\Program Files\Microsoft Office Servers\12.0\Data\Config

In theory, these files will at some point get copied down into the deeper (C:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications\{GUID for SSP}\Config\) folder. However, after extensive testing, this never worked for me.
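Since there are several copies of these files, it can help to list every one under a given folder before deciding which to edit. The following is a rough Python sketch of that idea; the function name is made up, and the root path in the example is just the standard install location mentioned above, so adjust it if your index lives elsewhere.

```python
from pathlib import Path


def find_thesaurus_files(root):
    """Return every ts???.xml file under root, sorted by full path."""
    root = Path(root)
    if not root.is_dir():
        # Nothing to search, for example when the path is wrong for this server.
        return []
    return sorted(root.rglob("ts???.xml"))


if __name__ == "__main__":
    # Assumed default location; a relocated index will have a different root.
    data_root = r"C:\Program Files\Microsoft Office Servers\12.0\Data"
    for path in find_thesaurus_files(data_root):
        print(path)
```

Running this before and after an edit makes it easy to see whether a change made in one folder ever shows up in the per-SSP Config folder.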

Step 2: Verify you are working with the correct “tsxxx.xml” file

The next step is to verify you are working with the correct “tsxxx.xml” file. I took the “tsneu.xml” file and removed the XML comments around the main nodes in the default example file to make the provided “run” and “jog” expansion set active. From there, I uploaded two simple text files, one named “jog.txt” and another named “run.txt”, which contained just the word “jog” and “run” respectively. I just wanted to make the simplest example work first.

Once I did this, I had to do something to make my XML changes take hold. I had read that you need to do an IISRESET and also a full crawl on the content source, but in the end I found that the only thing I had to do to make my “tsxxx.xml” file take hold was to restart the Office SharePoint Server Search service, also known as “osearch”. I wrote a little batch file (commands shown below) to stop and start the service because I kept trying different files over and over again in my testing.

Search Service Restart Batch File Commands:

net stop osearch
net start osearch
PAUSE

Step 3: Update the correct “tsxxx.xml” file

So this is where I really got stuck. After finally figuring out which file I needed to update and how to make it take hold, the final series of expansion sets I got back from my client was not showing up in my search results, and it was a real mystery. I had provided my client with a worksheet to fill out and they really did a great job, so I had a large number of expansion sets, each with many words. This is actually what got me in trouble in the end: it turns out you can’t have duplicate entries across your expansion or replacement sets. SharePoint gets horribly confused. To illustrate, consider the following XML snippet:

<expansion>
<sub>word1</sub>
<sub>word2</sub>
<sub>word3</sub>
</expansion>

<expansion>
<sub>word5</sub>
<sub>word2</sub>
<sub>word6</sub>
</expansion>

The above example would not work and, interestingly enough, SharePoint would produce no errors or any other indication of a problem (that I could find). The reason it does not work is that “word2” is used in both sets. This seems to be an undocumented bug of some sort. So, the big lesson here was to remove all duplicates across the expansion sets. Once I did this, my results worked perfectly!
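With a large worksheet of sets, finding these duplicates by eye is tedious, so a small script can help. Here is a minimal Python sketch, under a few assumptions of mine: terms are compared case-insensitively, <pat> entries in replacement sets are checked along with <sub> entries, and the sample wraps the two sets above in a made-up <thesaurus> root so that it parses on its own.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree adds to a tag name."""
    return tag.rsplit("}", 1)[-1]


def find_duplicate_terms(xml_text):
    """Map each term used in more than one set to the 1-based numbers of those sets."""
    root = ET.fromstring(xml_text)
    seen = defaultdict(set)
    set_number = 0
    for node in root.iter():
        if local_name(node.tag) not in ("expansion", "replacement"):
            continue
        set_number += 1
        for child in node:
            if local_name(child.tag) in ("sub", "pat") and child.text:
                # Assumption: compare terms ignoring case and outer whitespace.
                seen[child.text.strip().lower()].add(set_number)
    return {term: sorted(sets) for term, sets in seen.items() if len(sets) > 1}


# Hypothetical wrapper element around the two sets from the snippet above.
SAMPLE = """
<thesaurus>
  <expansion><sub>word1</sub><sub>word2</sub><sub>word3</sub></expansion>
  <expansion><sub>word5</sub><sub>word2</sub><sub>word6</sub></expansion>
</thesaurus>
"""

duplicates = find_duplicate_terms(SAMPLE)
print(duplicates)  # {'word2': [1, 2]}
```

To check a real file, read its text and pass it to find_duplicate_terms; an empty result means no term is shared between sets.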


Wednesday, January 17, 2007

Searching Custom Column Values in MOSS 2007

An issue that came up on a recent project was how to search custom meta-data on document libraries in MOSS 2007. For instance, let’s say that a set of document libraries, in one site or across a number of sites, each have a custom column named “Department” of data type “Choice”, where the possible values are the departments in an organization. Now let’s say that end users would like to use the Department field as search criteria in advanced search.

I knew this was possible with SPS 2003; however, I sadly never had a chance or a reason to try it out. Now I did with SharePoint 2007, yippee! To figure this out, I did a cursory search online and hit TechNet’s MOSS section, but alas, I could not find anything showing exactly what I was looking for. So through some brainstorming with a colleague and some trial and error I was able to get it working.

First off, let's create a custom column of type "Choice" in a sample document library. I've named mine "Department" and given it three possible value options ("Sales", "IT", "Marketing").

Now that we have our column created, we need to run a full crawl in the Shared Service Provider (SSP) to make SharePoint aware of this new piece of meta-data. After the crawl is run, our new property will be captured as what SharePoint parlance calls a "Crawled Property". That is not enough to search on it, however. We also need to create a separate "Managed Property" and map the crawled property to it (as a side note, you can map more than one crawled property to a managed property). Once we do this, SharePoint will be fully aware of the property's existence and we can use it as search criteria. The steps for all of this are below:
  1. Open up the admin page to the SSP that is hosting the web application your document library is sitting in
  2. Click on “Search Settings”, then “Content sources and crawl schedules”

  3. Find the content source that contains your site and perform a full crawl on it

    Now SharePoint is aware of your new piece of meta-data and we can create a "Managed Property" to map to it.

  4. Go back to the "Search Settings" page for your SSP and click "Metadata property mappings"

  5. On the "Metadata Property Mappings" page click the "New Managed Property" link

Now on this page, we need to fill out the form for our new managed property. We need to give it a unique name (be careful here because many are already used, such as "Department"), select the correct data type, and then choose which crawled properties map to this new managed property. In this case, it will be only one crawled property and it will be the custom column we had created earlier named "Department".

  1. Give a name for the managed property (I chose "OrgDepartment" to make it unique).
  2. Select the correct data type, in this case "Text"
  3. Now in the "Mappings to crawled properties" section of the form, click "Add Mapping" to bring up the crawled property selection dialog

  4. This is a cool little widget in that you can actually search for the property you are looking for, in this case "Department". So enter "Department" in the "Crawled property name" text box and click "Find"

  5. In the results I found one named "Department(Text)" and chose that. Now this is something I am still not completely clear on because also in the results list was a crawled property named "ows_Department(Text)" and I am not sure which one is best to use, but I can say that choosing the one without the "ows_" prefix worked just fine. Maybe when better documentation comes out I can get to the bottom of this!
  6. Once the form is filled out, click OK, then go back to the "Manage Content Sources" page and perform another full crawl on the necessary content source

At this point our crawled property is correctly managed, which is great, but in order to actually use it as search criteria we need to customize the advanced search page of our search center.

  1. Go to a search center site and click "Advanced Search"
  2. Click on "Site Actions" then "Edit Page"
  3. In the "Advanced Search Box" click "Edit" on the top right then click "Modify Shared Web Part"
  4. On the pane to the right expand the "Properties" section

The "Properties" text box under the "Properties" section holds the XML that defines which properties are searchable for each "Result type" option (such as "Word Documents" or "Excel Documents"). We need to update this XML snippet with the correct nodes to define our new "Department" meta-data. Copy and paste the XML into your favorite text editor and make the following updates:

  1. In the <PropertyDefs> node add the following line:

    <PropertyDef Name="OrgDepartment" DataType="text" DisplayName="Department" />

    This adds a definition for our managed property to the search page; however, we still need to add one more node to make it searchable on the page. The "Name" attribute must match the name of the managed property we created earlier, and whatever is in the "DisplayName" attribute will be shown in the drop-down list on the advanced search page.

  2. In one of the <ResultType> nodes (it is your choice; each ResultType node covers a single document type, a set of document types, or all of them) add the following node:

    <PropertyRef Name="OrgDepartment" />

    At this point you just need to paste the updated XML back into the "Properties" text box, apply the change, exit edit mode and refresh the page and -kapow!- your new piece of meta-data is available in advanced search as search criteria!
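Because a mismatch between the two nodes is easy to miss in a long block of XML, a quick sanity check can be useful before pasting it back. Here is a minimal Python sketch. The PropertyDef, PropertyRef and ResultType element names come from the steps above; the sample document, its <root> and <ResultTypes> wrappers and the ResultType attributes are trimmed-down assumptions of mine, not a copy of the real web part XML.

```python
import xml.etree.ElementTree as ET


def check_property(properties_xml, name):
    """Return (is_defined, result_types) for a managed property name.

    is_defined is True when a PropertyDef with that Name exists;
    result_types lists the DisplayName of each ResultType that references it.
    """
    root = ET.fromstring(properties_xml)
    is_defined = any(d.get("Name") == name for d in root.iter("PropertyDef"))
    result_types = [
        rt.get("DisplayName")
        for rt in root.iter("ResultType")
        if any(ref.get("Name") == name for ref in rt.iter("PropertyRef"))
    ]
    return is_defined, result_types


# Hypothetical, heavily trimmed stand-in for the "Properties" XML.
SAMPLE = """
<root>
  <PropertyDefs>
    <PropertyDef Name="OrgDepartment" DataType="text" DisplayName="Department" />
  </PropertyDefs>
  <ResultTypes>
    <ResultType DisplayName="All Results" Name="default">
      <PropertyRef Name="OrgDepartment" />
    </ResultType>
    <ResultType DisplayName="Word Documents" Name="worddocuments" />
  </ResultTypes>
</root>
"""

status = check_property(SAMPLE, "OrgDepartment")
print(status)  # (True, ['All Results'])
```

If the first value is False or the list is empty, the property will not show up as expected in the advanced search drop-down.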
