Configuring Enterprise Search in SharePoint 2010

Most of the research for this post comes from here and was validated by Jon Waite. Some may also come from here.

Here are some interesting facts about search that I keep digging up.

Search topology operations in SharePoint Server 2010 (white paper)

Search PowerShell commands

3 Tier Farm Deployment: step-by-step blog article by Bill B

SharePoint can crawl only by using a Windows NTLM authentication account. Your content source must authorize the Windows account sent with the crawl request before the crawler can access the document content. The gatherer is still not claims-aware, so it will not access a content source that uses claims authentication only.

Security trimming within the index is possible when the crawler can obtain Access Control Lists (ACLs) for each item and store them in the index. ACLs are not always available at index time: crawling web sites, for example, does not allow ACLs to be indexed, so security trimming has to happen at query time. If you need to perform custom security trimming, SharePoint Server search supports these scenarios through the ISecurityTrimmer2 interface in the Microsoft.Office.Server.Search.Query namespace. See the walkthrough here.
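Real custom trimmers are .NET classes implementing ISecurityTrimmer2; the Python below is only a conceptual model of the decision a query-time trimmer makes, not that interface. Everything in it (`Result`, `trim_results`, the `allowed_groups` field, the `partners_only` rule and the URLs) is hypothetical: results that carry an index-time ACL are trimmed against it, and results without one (such as crawled web sites) fall back to a custom check at query time.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass
class Result:
    url: str
    # Hypothetical field: ACL captured at index time, or None when the crawler
    # could not obtain one (for example, a crawled public web site).
    allowed_groups: Optional[FrozenSet[str]] = None


def trim_results(
    results: List[Result],
    user_groups: FrozenSet[str],
    query_time_check: Callable[[str, FrozenSet[str]], bool],
) -> List[Result]:
    """Return only the results the user may see (conceptual model, not SharePoint's API)."""
    visible = []
    for r in results:
        if r.allowed_groups is not None:
            # ACL stored in the index: compare it with the user's groups.
            if r.allowed_groups & user_groups:
                visible.append(r)
        elif query_time_check(r.url, user_groups):
            # No ACL in the index: defer to a custom query-time check.
            visible.append(r)
    return visible


def partners_only(url: str, groups: FrozenSet[str]) -> bool:
    # Hypothetical rule: only "Partners" may see content without an index-time ACL.
    return "Partners" in groups


results = [
    Result("http://intranet/hr/pay.docx", frozenset({"HR"})),
    Result("http://intranet/public/news.aspx", frozenset({"Everyone"})),
    Result("http://partner.example.com/report.html"),  # no ACL available
]
visible = trim_results(results, frozenset({"Everyone"}), partners_only)
print([r.url for r in visible])  # ['http://intranet/public/news.aspx']
```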

The search administration component cannot be moved to another server: it resides on the server where the Search Service Application was created. There can only be one search administration component per farm.

Test 1: I first installed it on server 8 and moved it to server 7. Search then showed as disabled in the services on server 8, and server 8 was removed from the topology (although it still appeared in the pick list). Search still worked with the admin and query components on server 7 and the crawl component on server 9. I then moved both the crawl and admin components back to server 8, leaving query on server 7; server 9 then became disabled and was removed from the topology.

Does the Search Query and Site Settings Service need to be started on each WFE? Yes, it does. I tested turning it on and off on all 3 servers in a lab in various on/off combinations.

Interesting article here about Search Security Trimming

Crawl logs are stored in the MSSCrawlUrlReport table in the crawl store database.

Service Account
The server farm account is used to create and access your configuration database. It also acts as the application pool identity account for the SharePoint Central Administration application pool, and it is the account under which the Windows SharePoint Services Timer service runs. The SharePoint Products Configuration Wizard adds this account to the SQL Server Login accounts, the SQL Server dbcreator server role, and the SQL Server securityadmin server role. The user account that you specify as the service account must be a domain user account, but it does not need to be a member of any specific security group on your Web servers or your database servers. We recommend that you follow the principle of least privilege, and specify a user account that is not a member of the Administrators group on your Web servers or your database servers.


Important: (why is this important?)

When creating a new Web application or extending the existing Web application into a new zone initially, ensure that the public URL is the URL that end users will use to browse to the Web application. If you are using reverse proxy servers or load balancers, you may also have to add internal URLs for alternate access mapping (AAM). We recommend that you configure AAM before creating a site collection. For more information, see Logical architecture components (SharePoint Server 2010).

Index Partitions subdivide the full-text index. A new Query Component can either be the first component in a new partition (Query Component 0) or an additional component in an existing partition. What does this really mean?

Description of Enterprise Search from SharePoint Brew

Note: In what follows, "indexer" and "crawl component" mean the same thing, as do "Query server" and "Query Component". Both run within the MSSearch.exe process, which appears in Windows as the “SharePoint Server Search 14” service.

In SharePoint 2010, the indexer no longer stores a copy of the index on the index server. As items are indexed, they are streamed (propagated) to a Query server. Because the indexer no longer holds a physical copy of the index, it is no longer a single point of failure. The method for copying indexes between machines has not changed: the index partitions are still shared, and files are still copied via these shares.

Several things have changed in the query infrastructure. It is now componentized, so you provision only what you need. In SharePoint 2007, the Query Processor ran on any WFE. In SharePoint 2010, any server can run the Query Processor; it is no longer tied to a server running the Query role. By default, when you provision a search service application using Central Administration or the Farm Configuration Wizard, a crawl (index) component and a crawl database are provisioned for you.

It’s possible to provision multiple indexers for a single Search service application. Just because the index doesn’t physically reside on the indexer doesn’t mean you should have only one. Fault tolerance is achieved by provisioning a secondary crawl component on a second server and mapping both crawl components to the same crawl database; fault tolerance for the crawl database itself can be achieved with SQL mirroring. Performance also improves with two indexers, because the crawl load is distributed across both index servers. Items cannot be crawled twice (producing duplicates in the index), because both index servers pick up items for processing in batches.
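To see why sharing one crawl database avoids duplicates, here is a small Python model (not SharePoint code) in which the crawl database is a queue of URL batches and each crawl component is a thread that takes whole batches from it. The function name, batch size and URLs are hypothetical; the point is that each batch is handed to exactly one component, so every URL is crawled once while the work is spread across components.

```python
import queue
import threading
from collections import Counter
from typing import List


def crawl_with_components(urls: List[str], n_components: int, batch_size: int) -> Counter:
    """Model several crawl components draining one shared crawl queue in batches."""
    work: "queue.Queue[List[str]]" = queue.Queue()
    for i in range(0, len(urls), batch_size):
        work.put(urls[i:i + batch_size])

    crawled: Counter = Counter()  # how many times each URL was crawled
    lock = threading.Lock()

    def component() -> None:
        while True:
            try:
                # get_nowait() hands a batch to exactly one component.
                batch = work.get_nowait()
            except queue.Empty:
                return
            with lock:
                crawled.update(batch)

    threads = [threading.Thread(target=component) for _ in range(n_components)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return crawled


urls = [f"http://intranet/page{i}" for i in range(100)]
counts = crawl_with_components(urls, n_components=2, batch_size=10)
print(len(counts), max(counts.values()))  # 100 1
```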

Query consists of components that can be scaled out across multiple servers to improve performance. The WFE serving the search request uses the associated search service application proxy to connect to a server running the Query Processor, which gathers the results, merges and security-trims them, and returns them to the WFE for display to the user. Query Servers/Components hold a full or partial copy of the search index. The indexer crawls content and builds a temporary index, then propagates portions of that temporary index to the Query Servers to be indexed. Each Query Server holds a copy of the entire index or part of it, referred to as an Index Partition.

A single search service application can have multiples of each of the following: Property Store DBs, Query Components and Query Processors. Each Query Component is mapped to only one Property Store DB. Query Components can be provisioned to partition an index, to mirror an index for fault tolerance, or both.

The size of an index can easily become a bottleneck for query performance. An index partition can contain the entire index or a portion of it. Creating an additional query component in a new partition creates a new index partition that owns a portion of the index. Partitioning the index reduces query times, and it is as simple as provisioning new Query Components from the Search Application Topology section in Central Admin.
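To make "a new index partition owns a portion of the index" concrete, the sketch below spreads document IDs across partitions with a stable hash. Treat the hashing scheme as an assumption for illustration only; `partition_of` and `build_partitions` are hypothetical names, and in SharePoint you create partitions through the topology pages rather than code like this.

```python
import hashlib
from typing import Dict, List


def partition_of(doc_id: str, n_partitions: int) -> int:
    """Map a document ID to a partition with a stable hash (illustrative assumption)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_partitions


def build_partitions(doc_ids: List[str], n_partitions: int) -> Dict[int, List[str]]:
    """Give each partition its own, non-overlapping slice of the documents."""
    partitions: Dict[int, List[str]] = {p: [] for p in range(n_partitions)}
    for doc_id in doc_ids:
        partitions[partition_of(doc_id, n_partitions)].append(doc_id)
    return partitions


docs = [f"doc{i}" for i in range(1000)]
parts = build_partitions(docs, n_partitions=2)
print({p: len(ids) for p, ids in parts.items()})  # roughly half in each
```

Because each partition holds only part of the documents, each query component searches a smaller index; the cost is that a query must now visit every partition.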

The Query Processor is responsible for processing a query and runs in the w3wp.exe process. It retrieves results from the Property Store DB and the Index/Query Components. Once results are retrieved, they are packaged, security-trimmed and delivered back to the requester, which is the WFE that initiated the request. If more than one (mirrored) Query Component exists within the same Index Partition, the Query Processor load-balances requests across them round-robin. The exception to this rule is a Query Component marked as failover only.
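The load-balancing rule above can be modelled as a round-robin over the healthy mirrors of one partition, skipping failover-only mirrors. This is a hypothetical Python model (`QueryComponent`, `PartitionRouter`); the fallback to a failover-only mirror when every regular mirror is down is an assumption about what "failover only" means, not behaviour quoted from the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryComponent:
    name: str
    failover_only: bool = False
    healthy: bool = True


class PartitionRouter:
    """Round-robin over the mirrored query components of one index partition (model only)."""

    def __init__(self, mirrors: List[QueryComponent]) -> None:
        self.mirrors = mirrors
        self._next = 0

    def pick(self) -> QueryComponent:
        regular = [m for m in self.mirrors if m.healthy and not m.failover_only]
        # Assumption: failover-only mirrors are used only when no regular mirror is healthy.
        pool = regular or [m for m in self.mirrors if m.healthy]
        if not pool:
            raise RuntimeError("no healthy query component in this partition")
        choice = pool[self._next % len(pool)]
        self._next += 1
        return choice


a, b = QueryComponent("QC-A"), QueryComponent("QC-B")
c = QueryComponent("QC-C", failover_only=True)
router = PartitionRouter([a, b, c])
print([router.pick().name for _ in range(4)])  # ['QC-A', 'QC-B', 'QC-A', 'QC-B']
a.healthy = b.healthy = False
print(router.pick().name)  # QC-C
```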

Question: What if I partitioned off my index and I have multiple Query Components provisioned each serving a partition of the index? How does Query Processor know which partition to connect to in order to accurately retrieve results?

Answer: It doesn’t! The Query Processor will connect to every single non-mirrored Query component that contains a partition of the Index to retrieve results.
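A sketch of that fan-out in Python: the query processor asks one component in each partition for its ranked hits and merges the lists. The data shape (`Hit` tuples of score and document ID) and the function name are hypothetical, and real relevance merging is more involved than sorting on a single score.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

# (score, doc_id); a higher score means more relevant
Hit = Tuple[float, str]


def query_all_partitions(partitions: Dict[int, List[Hit]], top_k: int) -> List[Hit]:
    """Query every partition and merge the ranked lists (model only)."""
    # Each partition returns its own hits sorted best-first.
    per_partition = [sorted(hits, reverse=True) for hits in partitions.values()]
    # heapq.merge with reverse=True expects descending inputs and keeps them descending.
    merged = heapq.merge(*per_partition, reverse=True)
    return list(islice(merged, top_k))


partitions = {
    0: [(0.9, "a"), (0.2, "b")],
    1: [(0.5, "d"), (0.7, "c")],
}
print(query_all_partitions(partitions, top_k=3))
# [(0.9, 'a'), (0.7, 'c'), (0.5, 'd')]
```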

Question: What if I created multiple Property Store Databases for performance reasons? How does Query Processor know which Property store to connect to in order to accurately retrieve results?

Answer: It doesn’t! The Query Processor will connect to every single Property Store DB to retrieve results.
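The same idea applies to property stores. In this hypothetical Python model, each property store is a dictionary from document ID to managed properties, and because there is no routing table, every store is asked about every document in the result set.

```python
from typing import Dict, List

# Hypothetical shape: doc_id -> {managed property name: value}
PropertyStore = Dict[str, Dict[str, str]]


def fetch_properties(doc_ids: List[str], stores: List[PropertyStore]) -> Dict[str, Dict[str, str]]:
    """Ask every property store for the given docs and combine the answers (model only)."""
    found: Dict[str, Dict[str, str]] = {}
    for store in stores:  # no routing table: every store is queried
        for doc_id in doc_ids:
            if doc_id in store:
                found.setdefault(doc_id, {}).update(store[doc_id])
    return found


stores = [
    {"a": {"Title": "Budget"}},
    {"c": {"Title": "Roadmap", "Author": "Ann"}},
]
print(fetch_properties(["a", "c", "x"], stores))
# {'a': {'Title': 'Budget'}, 'c': {'Title': 'Roadmap', 'Author': 'Ann'}}
```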

Centralizing Search – Query Processor functions in Parent/Child Farm

In a Publishing/Consumer farm scenario, the Query Processor always runs in the farm where the Search Service Application resides. So if the Search Service Application resides in the publishing farm, the Query Processor runs only in the publishing farm. The consumer farm uses the associated Search Service Application proxy to connect over WCF to a Query Processor in the publishing farm.

Note about Dedicated WFE for crawling

Using a dedicated web front end for crawling does not depend on modifying the hosts file on the indexer to force the crawl to go through a specific web front end.

When you provision a crawl database, you can specify whether or not it is “dedicated”. A “dedicated” crawl database does NOT participate in the automatic host distribution performed by the Master Crawler, and contains only the hosts specified in a Host Distribution Rule. As long as there is more than one crawl database, any crawl database can be specified in a Host Distribution Rule.
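The behaviour described above can be sketched as follows. This Python model is hypothetical: hosts named in a Host Distribution Rule go to the rule's crawl database, and every other host goes to the least-loaded non-dedicated database. The least-loaded policy is an assumption made for illustration, not the Master Crawler's documented algorithm, and the database and host names are made up.

```python
from typing import Dict, List


def assign_hosts(
    hosts: List[str],
    crawl_dbs: Dict[str, bool],  # crawl database name -> is it dedicated?
    rules: Dict[str, str],       # Host Distribution Rules: host -> crawl database name
) -> Dict[str, List[str]]:
    """Model of host distribution (assumed least-loaded policy for unruled hosts)."""
    if rules and len(crawl_dbs) < 2:
        raise ValueError("Host Distribution Rules need more than one crawl database")
    assignment: Dict[str, List[str]] = {db: [] for db in crawl_dbs}
    for host, db in rules.items():
        if db not in assignment:
            raise ValueError(f"unknown crawl database {db!r}")
        assignment[db].append(host)
    # Only non-dedicated databases take part in automatic distribution.
    shared = [db for db, dedicated in crawl_dbs.items() if not dedicated]
    for host in hosts:
        if host in rules:
            continue
        if not shared:
            raise ValueError(f"no non-dedicated crawl database for {host!r}")
        target = min(shared, key=lambda db: len(assignment[db]))
        assignment[target].append(host)
    return assignment


dbs = {"CrawlDB1": False, "CrawlDB2": False, "CrawlDB-Files": True}
rules = {"fileshare01": "CrawlDB-Files"}
hosts = ["intranet", "mysites", "fileshare01", "wiki"]
print(assign_hosts(hosts, dbs, rules))
# {'CrawlDB1': ['intranet', 'wiki'], 'CrawlDB2': ['mysites'], 'CrawlDB-Files': ['fileshare01']}
```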

How to Make changes to your Search Topology

Before arbitrarily provisioning new query components and property store DBs, observe the current environment and query health so that you have some evidence before making this important decision.

To add a new Query server/component, you add a mirror; the index partition is then copied from the initial Query server to the new server. Query-to-query propagation is new in 2010. Once the new Query server is online, it begins receiving propagated changes from the indexer.
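A toy Python model of adding a mirror: the new component is first seeded with a copy of the partition from an existing query component (query-to-query propagation), and from then on every mirror receives the changes propagated by the indexer. The class and method names are hypothetical.

```python
from typing import List, Set


class QueryComponentModel:
    """Holds a copy of one index partition (illustrative model only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.docs: Set[str] = set()


class IndexPartitionModel:
    def __init__(self, first: QueryComponentModel) -> None:
        self.mirrors: List[QueryComponentModel] = [first]

    def add_mirror(self, new: QueryComponentModel) -> None:
        # Query-to-query propagation: seed the new mirror from an existing one.
        new.docs = set(self.mirrors[0].docs)
        self.mirrors.append(new)

    def propagate(self, changes: Set[str]) -> None:
        # Indexer-to-query propagation reaches every mirror in the partition.
        for m in self.mirrors:
            m.docs |= changes


qc1 = QueryComponentModel("QC-1")
partition = IndexPartitionModel(qc1)
partition.propagate({"doc1", "doc2"})
qc2 = QueryComponentModel("QC-2")
partition.add_mirror(qc2)      # seeded from QC-1
partition.propagate({"doc3"})  # new changes reach both mirrors
print(sorted(qc2.docs))  # ['doc1', 'doc2', 'doc3']
```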

To make topology changes, go to Central Admin > Search Service Application > Manage (from the ribbon), then:
1. Click the Modify button.
2. Select New Property Database or Query Component and enter the options you need.
3. Click Apply Topology Changes.

To provision the Query Processor role on a WFE (or any other server), go to Central Admin > System Settings > Services on Server and start the Search Query and Site Settings Service.