tony testa posted on March 23, 2009 14:57

I recently had to dive into MOSS search crawl rules and figured I’d post some good references for myself that will be my first stop if I have to do something with MOSS crawl rules again.

First things first: if you want to do custom crawl rules, specifically with FBA, you’ll need the tool “addrule.exe” from MS.  This tool allows you to build a custom crawl rule in XML format and inject it into MOSS search.  The tool needs to be run on the server that does the crawling.

http://www.microsoft.com/downloads/details.aspx?FamilyId=D5090BC4-5B4F-411B-8CDE-E37D33F7EFDF&displaylang=en

After downloading the tool you’re going to want to know the XML schema for the crawl rules, so check out the following MSDN article.

http://msdn.microsoft.com/en-us/library/bb852172.aspx#MOSS2007CrawlContentFormsBased_Overview

 

With those two links you should be well on your way to working with custom search crawl rules, especially when dealing with FBA sites and search.


tony testa posted on March 23, 2009 14:51

MSDN has a great listing of the protocol handlers that are supported out of the box, and it explains that weird “SPS” syntax you might see in your search crawl settings.  Here is the MSDN article “Plan to crawl content”.

Protocol handler    Used to crawl
----------------    -------------
Bdc                 Business Data Catalog
Bdc2                Business Data Catalog URLs (internal protocol)
File                File shares
http                Web sites
https               Web sites over Secure Sockets Layer (SSL)
Notes               Lotus Notes databases
Rb                  Exchange public folders
Rbs                 Exchange public folders over SSL
Sps                 People profiles from Windows SharePoint Services 2.0 server farms
Sps3                People profile crawls of Windows SharePoint Services 3.0 server farms only
Sps3s               People profile crawls from Windows SharePoint Services 3.0 server farms only over SSL
Spsimport           People profile import
Spss                People profile import from Windows SharePoint Services 2.0 server farms over SSL
Sts                 Windows SharePoint Services 3.0 root URLs (internal protocol)
Sts2                Windows SharePoint Services 2.0 sites
Sts2s               Windows SharePoint Services 2.0 sites over SSL
Sts3                Windows SharePoint Services 3.0 sites
Sts3s               Windows SharePoint Services 3.0 sites over SSL
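
If you want to decode one of those odd-looking start addresses quickly, the listing above can be turned into a small lookup. This is only a rough sketch for my own reference: the class and method names are made up, the host names in Main are hypothetical, and all it does is read the scheme in front of "://" and look it up in the table.

```csharp
using System;
using System.Collections.Generic;

public static class ProtocolHandlerLookup
{
    // Scheme -> what it is used to crawl, transcribed from the listing above.
    static readonly Dictionary<string, string> Handlers =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
    {
        { "bdc", "Business Data Catalog" },
        { "bdc2", "Business Data Catalog URLs (internal protocol)" },
        { "file", "File shares" },
        { "http", "Web sites" },
        { "https", "Web sites over Secure Sockets Layer (SSL)" },
        { "notes", "Lotus Notes databases" },
        { "rb", "Exchange public folders" },
        { "rbs", "Exchange public folders over SSL" },
        { "sps", "People profiles from Windows SharePoint Services 2.0 server farms" },
        { "sps3", "People profile crawls of Windows SharePoint Services 3.0 server farms only" },
        { "sps3s", "People profile crawls from Windows SharePoint Services 3.0 server farms only over SSL" },
        { "spsimport", "People profile import" },
        { "spss", "People profile import from Windows SharePoint Services 2.0 server farms over SSL" },
        { "sts", "Windows SharePoint Services 3.0 root URLs (internal protocol)" },
        { "sts2", "Windows SharePoint Services 2.0 sites" },
        { "sts2s", "Windows SharePoint Services 2.0 sites over SSL" },
        { "sts3", "Windows SharePoint Services 3.0 sites" },
        { "sts3s", "Windows SharePoint Services 3.0 sites over SSL" }
    };

    // Returns the description for a start address, or null when the
    // address has no scheme or the scheme is not in the listing.
    public static string Describe(string startAddress)
    {
        int sep = startAddress.IndexOf("://", StringComparison.Ordinal);
        if (sep <= 0) return null;

        string description;
        return Handlers.TryGetValue(startAddress.Substring(0, sep), out description)
            ? description
            : null;
    }

    public static void Main()
    {
        // "mysite" and "portal" are hypothetical host names.
        Console.WriteLine(Describe("sps3://mysite"));
        Console.WriteLine(Describe("https://portal/sites/hr"));
    }
}
```

The lookup ignores case because the listing itself mixes "Bdc" and "http" style spellings.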



Again, this is another link to other information, mainly for my own reference.  MSDN has a good search reference that tells you which file extensions MOSS search crawls out of the box and which have IFilter support out of the box.  In addition, it has a great listing of the protocol handlers that are supported out of the box and explains that weird “SPS” syntax you might see in your search crawl settings.  Here is the MSDN article “Plan to crawl content”.

File name extension    Default IFilter support    Default file type inclusions
-------------------    -----------------------    ----------------------------
ascx                   Yes                        Yes
asm                    Yes                        No
asp                    Yes                        Yes
aspx                   Yes                        Yes
bat                    Yes                        No
c                      Yes                        No
cmd                    Yes                        No
cpp                    Yes                        No
css                    Yes                        No
cxx                    Yes                        No
def                    Yes                        No
dic                    Yes                        No
doc                    Yes                        Yes
docm                   Yes                        Yes
docx                   Yes                        Yes
dot                    Yes                        Yes
eml                    Yes                        Yes
exch                   No                         Yes
h                      Yes                        No
hhc                    Yes                        No
hht                    Yes                        No
hpp                    Yes                        No
hta                    Yes                        No
htm                    Yes                        Yes
html                   Yes                        Yes
htw                    Yes                        No
htx                    Yes                        No
jhtml                  No                         Yes
jsp                    No                         Yes
lnk                    Yes                        No
mht                    Yes                        Yes
mhtml                  Yes                        Yes
mpx                    Yes                        No
msg                    Yes                        Yes
mspx                   No                         Yes
nsf                    No                         Yes
odc                    Yes                        Yes
one                    No                         No
php                    No                         Yes
pot                    Yes                        No
pps                    Yes                        No
ppt                    Yes                        Yes
pptm                   Yes                        Yes
pptx                   Yes                        Yes
pub                    Yes                        Yes
stm                    Yes                        No
tif                    Yes                        Yes
tiff                   No                         Yes
trf                    Yes                        No
txt                    Yes                        Yes
url                    No                         Yes
vdx                    No                         Yes
vsd                    No                         Yes
vss                    No                         Yes
vst                    No                         Yes
vsx                    No                         Yes
vtx                    No                         Yes
xlb                    Yes                        No
xlc                    Yes                        No
xls                    Yes                        Yes
xlsm                   Yes                        Yes
xlsx                   Yes                        Yes
xlt                    Yes                        No
xml                    Yes                        Yes



I ran across a really good blog posting that gives you crawl rules you can use to keep useless pages, such as view/edit pages, out of your search results.  Maybe I missed this somewhere, but frankly I don’t recall stumbling across these, and I think it’s something that should be published more.  I am posting the crawl rules here so that I can reference them later in case the blog goes down, but the source of these rules is here.

 

*://*webfldr.aspx*
*://*my-sub.aspx*
*://*mod-view.aspx*
*://*allitems.aspx*
*://*all forms.aspx*
*://*DispForm.aspx

 

To add these, go into the SSP, then into search settings, and then into crawl rules.

After adding the rules, run a full crawl so that the rules are applied and your results are clean.
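
Before kicking off a full crawl, it can help to check which URLs the patterns above would actually catch. Here is a rough sketch of that check. It assumes that `*` matches any run of characters and that case is ignored, which only approximates the crawler's own matching; the class name and the sample URLs are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CrawlRuleCheck
{
    // Exclusion patterns copied verbatim from the list above.
    static readonly string[] Patterns =
    {
        "*://*webfldr.aspx*",
        "*://*my-sub.aspx*",
        "*://*mod-view.aspx*",
        "*://*allitems.aspx*",
        "*://*all forms.aspx*",
        "*://*DispForm.aspx"
    };

    // Assumption: '*' is the only wildcard, it matches any run of
    // characters, and matching is case-insensitive. This is not the
    // crawler's real implementation.
    public static bool IsExcluded(string url)
    {
        return Patterns.Any(p => Regex.IsMatch(
            url,
            "^" + Regex.Escape(p).Replace("\\*", ".*") + "$",
            RegexOptions.IgnoreCase));
    }

    public static void Main()
    {
        // "portal" is a hypothetical host name.
        Console.WriteLine(IsExcluded("http://portal/Lists/Tasks/AllItems.aspx"));      // True
        Console.WriteLine(IsExcluded("http://portal/Lists/Tasks/DispForm.aspx?ID=3")); // False
        Console.WriteLine(IsExcluded("http://portal/Pages/Default.aspx"));             // False
    }
}
```

Under that assumption, the last pattern has no trailing `*`, so a DispForm.aspx URL that carries a query string is not caught. Whether the real crawler behaves the same way is worth verifying against your own crawl log.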



Yet another interesting MSDN forum post I answered.  A user asked how they could programmatically determine the installation directory of SharePoint.  After multiple searches on the web, I stumbled on the answer: read the registry!

Here is some sample code (I haven’t actually tested this code, but the logic is sound):

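using System;
using Microsoft.Win32;
...
// OpenSubKey opens the key read-only and returns null if it does not exist.
// (CreateSubKey would create a missing key and needs write access.)
using (RegistryKey masterKey = Registry.LocalMachine.OpenSubKey("SOFTWARE\\Microsoft\\Office Server\\12.0"))
{
   if (masterKey == null)
   {
      Console.WriteLine("Null Masterkey!");
   }
   else
   {
      Console.WriteLine("MyKey = {0}", masterKey.GetValue("InstallPath"));
   }
} // the using block closes the key, and only when it was actually opened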

 

The only issue I see here is in a multi-server farm (a 5 server farm, for example) where SharePoint was installed in a different directory on each web front end.  In that case the location you get back depends on which server processes the request: the same request handled by a different WFE, where the install directory is different, returns a different path.

Honestly, I’ve never run into a case where the install directory was different on each WFE, since most WFEs mirror each other.  I just thought I would throw it out there, along with the fact that it would be bad practice to have the WFEs use different install directories.

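Another user responded to the post and said you could do it using the SPUtility class’s GetGenericSetupPath method, which seems a bit more logical since it uses the SharePoint API.  The method takes a subdirectory relative to the “12” hive and returns its full local path, so pass the folder you are after (the features folder here) rather than "12".

using Microsoft.SharePoint.Utilities;
...
string featurePath = SPUtility.GetGenericSetupPath("TEMPLATE\\FEATURES");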

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2017 Tony Testa's World