Trim SharePoint Search Results for Better Security

Microsoft SharePoint search indexes content using an account that usually has full read access across the repository. So when a user queries for content, it's important that the results be restricted to the documents that user has permission to see. SharePoint uses the access control list (ACL) associated with each document to trim out query results that users have no permission to view, but this default trimming (out-of-box trimming) may not always meet your data security needs. In that case, you may want to trim the results further, depending on your organization's authentication structure.

This is where the SharePoint custom security trimming infrastructure is useful. SharePoint lets you implement business logic in a separate module and then integrate it into the workflow of the query processor that serves the queries. In the security trimming path, custom trimming follows out-of-box security trimming, so it can only remove results, never add them. The number of query results after custom trimming is always equal to or less than the number returned before the custom security trimmer (CST) assembly was registered.

Before delving into the CST architecture, we'll provide a quick overview of SharePoint search and the new claims authentication infrastructure.

SharePoint Search Overview

At a high level, the search system can be divided into two discrete parts: the gatherer pipeline and the query processor pipeline.

Gatherer Pipeline This part is responsible for crawling and indexing content from various repositories, such as SharePoint sites, HTTP sites, file shares, Lotus Notes, Exchange Server and so on. This component lives inside MSSearch.exe. When a request is issued to crawl a repository, the gatherer invokes a filter daemon, MssDmn.exe, to load the required protocol handlers and filters necessary to connect, fetch and parse the content. Figure 1 represents a simplified view of the gatherer pipeline.

Figure 1 A Simplified View of the SharePoint Gatherer Pipeline

SharePoint can only crawl using a Windows NTLM authentication account. Your content source must authorize the Windows account sent as part of the crawl request in order to access the document content. Though claims authentication is supported in SharePoint 2010, the gatherer is still not a claims-aware application and will not access a content source that has claims authentication only.

Query Processor Pipeline In SharePoint 2010, two of the most important changes in the query processor pipeline are in its topological scalability and its authentication model. In Microsoft Office SharePoint Server (MOSS) 2007, the query processor (the search query and site settings service, referred to as the search query service from here on) runs in the same process as the Web front end (WFE). In SharePoint 2010 it can run anywhere in the farm, and it runs as a Web service.

The WFE talks to the search query service through Windows Communication Foundation (WCF) calls. The search query service is now completely built on top of the SharePoint claims authentication infrastructure. This decouples SharePoint search from its tight integration with Windows authentication and forms authentication. As a result, SharePoint now supports various authentication models. The search query service trims the search results according to the rights of the user who issues the query. Custom security trimmers are called by the search query service after out-of-box trimming has completed. See Figure 2 for the various components involved when a query is performed.

Figure 2 Workflow of a Query Originating from the Search Center in a SharePoint Site

Custom security trimming is part of the query pipeline, so we’ll limit this discussion to components of the query pipeline.

Claims Authentication in SharePoint 2010

A basic understanding of claims authentication support in SharePoint 2010 is required to implement custom trimming logic inside a CST assembly. In a claims-authenticated environment, the user's identity is carried inside an envelope called a security token. The token contains a collection of identity assertions, or claims, about the user. Examples of claims are username, e-mail address, phone number, role and so on. Each claim has attributes such as a type and a value. For example, a claim's type might be UserLogonName, and its value the name of the user who is currently logged in.

Security tokens are issued by an entity called a security token service (STS). This is a Web service that responds to user authentication requests. Once the user is authenticated, the STS sends back a security token containing the user's claims. An STS can either live inside the same SharePoint farm as an Identity Provider-STS (IP-STS), or act as a Relying Party-STS (RP-STS) that relies on another STS living outside the farm. Consider carefully whether to use an IP-STS or an RP-STS when designing a SharePoint deployment.

In a simple installation, SharePoint uses the default claims provider that ships with the product. Even if you set up the farm entirely with Windows authentication, the search service application proxy contacts the STS when a query is issued to obtain all of the user's claims in a security token. This token is then passed to the search query service through a WCF call.

Workflow of Custom Security Trimming

The workflow logic of a CST can be represented in a simple flowchart as shown in Figure 3.

Figure 3 The Workflow Logic of a CST

As stated earlier, the search query service first performs out-of-box security trimming and then looks for the presence of any CSTs associated with the search results. The association of a particular content source with a CST is done by defining a crawl rule for that specific content source. If the search query service finds any CST associated with any of the URLs in the search results, it calls into that trimmer. Trimmers are loaded into the same IIS worker process, w3wp.exe, in which the search query service is running.

Once the trimmer is loaded, the search query service calls the CheckAccess method implemented inside the trimmer. It passes in the out-of-box trimmed results that are associated with the crawl rule you defined earlier. CheckAccess decides whether each URL should be included in the final result set sent back to the user by returning a bit array: setting a bit to true includes the corresponding URL, and setting it to false blocks it. If you need to stop processing URLs, whether for performance or some unexpected reason, throw a PluggableAccessCheckException. If you throw after processing a partial list of URLs, the processed results are sent back to the user, and the search query service removes all the unprocessed URLs from the final result set.
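The include/block contract can be illustrated without SharePoint assemblies. The following is a minimal sketch, not the SharePoint API: TrimmingSketch, ApplyTrim and the "/private/" rule are hypothetical. ApplyTrim is a greatly simplified stand-in for what the search query service does with the returned bits. The key point is that bit i always describes URL i.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TrimmingSketch
{
    // Trimmer side: returns one bit per URL, at the same index as the URL.
    // true = include the URL, false = block it.
    // Hypothetical rule: URLs containing "/private/" are visible only to privileged users.
    public static BitArray CheckAccess(IList<string> documentCrawlUrls, bool userIsPrivileged)
    {
        var access = new BitArray(documentCrawlUrls.Count); // every bit starts as false (blocked)
        for (int i = 0; i < documentCrawlUrls.Count; i++)
        {
            bool isPrivate = documentCrawlUrls[i].IndexOf("/private/", StringComparison.OrdinalIgnoreCase) >= 0;
            access[i] = userIsPrivileged || !isPrivate;
        }
        return access;
    }

    // Query-service side, greatly simplified: keep only the URLs whose bit is set.
    public static List<string> ApplyTrim(IList<string> documentCrawlUrls, BitArray access)
    {
        if (access.Length != documentCrawlUrls.Count)
            throw new ArgumentException("The BitArray length must equal the number of URLs.");

        var kept = new List<string>();
        for (int i = 0; i < documentCrawlUrls.Count; i++)
        {
            if (access[i])
                kept.Add(documentCrawlUrls[i]);
        }
        return kept;
    }
}
```

Because a new BitArray starts with every bit false, any URL the trimmer forgets to evaluate is blocked rather than leaked. That is a safer default for security code.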

Steps Involved in Deploying a Custom Security Trimmer

In a nutshell, there are five steps involved in the successful deployment of a CST:

  1. Implement the ISecurityTrimmer2 interface:
    1. Implement Initialize and CheckAccess methods using managed code
    2. Create an assembly signing file and include it as part of the project
    3. Build the assembly
  2. Deploy the trimmer into the Global Assembly Cache (GAC) of all the machines where a search query service is running.
  3. Create a crawl rule for the content sources that you want to custom trim. You can do this from the Search Administration site.
  4. Register the trimmer with the crawl rule using the Windows PowerShell cmdlet New-SPEnterpriseSearchSecurityTrimmer.
  5. Perform a full crawl of the content sources associated with the crawl rules that you created in step 3. A full crawl is required to properly update all of the related database tables. An incremental crawl will not update the appropriate tables.

Implementing the Custom Security Trimmer Interface

MOSS 2007 and Microsoft Search Server (MSS) 2008 supported custom security trimming of search results through the interface ISecurityTrimmer. This interface has two methods, Initialize and CheckAccess. Because of the architectural changes in SharePoint and the search system in the 2010 versions, both of these methods won’t work as they did in MOSS 2007. They need to be re-implemented using the ISecurityTrimmer2 interface. As a result, if you try to register a MOSS 2007 trimmer in SharePoint 2010, it will fail, saying ISecurityTrimmer2 is not implemented. Other changes from MOSS 2007 include:

Changes in the Initialize Method In MOSS 2007, one of the parameters passed was the SearchContext object. SearchContext was the entry point into the search system, and it provided the search context for the site or search service provider (SSP). This class has been deprecated in SharePoint 2010. Instead, use the SearchServiceApplication class:

void Initialize(NameValueCollection staticProperties, SearchServiceApplication searchApplication);

Changes in the CheckAccess Method In both MOSS 2007 and SharePoint 2010, the search query service calls into the CST assemblies. In MOSS 2007, the CheckAccess method took only two parameters, but in SharePoint 2010, the search query service passes the user identity into CheckAccess using a third parameter of type IIdentity:

public BitArray CheckAccess(IList<String> documentCrawlUrls, IDictionary<String, Object> sessionProperties, IIdentity passedUserIdentity)

ISecurityTrimmer2::Initialize Method This method is called the first time a trimmer is loaded into the search query service IIS worker process. The assembly will live for the duration of the worker process. Here’s the signature of this method and a description of how it works:

void Initialize(NameValueCollection staticProperties, SearchServiceApplication searchApplication);

staticProperties–The trimmer registration Windows PowerShell cmdlet, New-SPEnterpriseSearchSecurityTrimmer, takes a parameter called “properties” (in MOSS 2007 this was called “configprops”). Through it you can pass name/value pairs separated by ~, which you can use to initialize your trimmer class properties.

For example, when you pass “superadmin~foouser~poweruser~baruser” to the New-SPEnterpriseSearchSecurityTrimmer cmdlet, the NameValueCollection parameter will contain two items. Their keys are “superadmin” and “poweruser” and their values are “foouser” and “baruser,” respectively.
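To make the pairing rule concrete, here is a minimal sketch of how a “~”-separated string maps onto a NameValueCollection. It assumes tokens simply alternate key, value, key, value. ParseTrimmerProperties is a hypothetical helper for illustration only: in SharePoint, the search query service does this work and hands Initialize the finished collection.

```csharp
using System;
using System.Collections.Specialized;

public static class TrimmerPropertiesSketch
{
    // Hypothetical helper: splits "k1~v1~k2~v2" into { k1 = v1, k2 = v2 }.
    public static NameValueCollection ParseTrimmerProperties(string properties)
    {
        var result = new NameValueCollection();
        if (string.IsNullOrEmpty(properties))
            return result;

        string[] tokens = properties.Split('~');
        if (tokens.Length % 2 != 0)
            throw new FormatException("Expected an even number of '~'-separated tokens (key~value pairs).");

        for (int i = 0; i < tokens.Length; i += 2)
            result.Add(tokens[i], tokens[i + 1]); // even index = key, following odd index = value
        return result;
    }
}
```

Inside Initialize, you would read the values the same way, for example staticProperties["superadmin"] returns "foouser".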

searchApplication–If your trimmer needs more information about the search service instance and the SharePoint farm, use the searchApplication object to obtain it. To learn more about the SearchServiceApplication class, refer to msdn.microsoft.com/library/ee573121(v=office.14).

ISecurityTrimmer2::CheckAccess Method This method implements all the trimming logic. Pay special attention to two aspects of this method: the identity of the user who issued the query, and the performance latency caused by a large returned query set.

Following are the signature of this method and a description of how it works:

public BitArray CheckAccess(IList<String> documentCrawlUrls, IDictionary<String, Object> sessionProperties, IIdentity passedUserIdentity)

documentCrawlUrls–The collection of URLs to be security trimmed by this trimmer.

sessionProperties–A single query instance is treated as one session. If your query fetches many results, the CheckAccess method is called multiple times. You can use this parameter to share values or to keep track of the URLs processed between these calls.

passedUserIdentity–This is the identity of the user who issued the query. It’s the identity by which the code will allow or deny access to content.

BitArray–You need to return a bit array whose length equals the number of items in documentCrawlUrls. Setting a bit to true or false determines whether the URL at the same position is included in or blocked from the final result set sent back to the user.

UserIdentity The SharePoint 2010 search query engine is built upon the claims authentication model. The search query service passes the query issuer's claims through the IIdentity parameter. To get the name of the user who issued the query, you must iterate through the collection of claims and compare each claim.ClaimType with SPClaimTypes.UserLogonName.

The following snippet of code extracts the user logon name from the claims token:

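// Requires: using Microsoft.IdentityModel.Claims;              // IClaimsIdentity, Claim
//           using Microsoft.SharePoint.Administration.Claims;  // SPClaimTypes
string strUser = null;

// "as" returns null for a non-claims identity instead of throwing InvalidCastException
IClaimsIdentity claimsIdentity = passedUserIdentity as IClaimsIdentity;

if (null != claimsIdentity)
{
  foreach (Claim claim in claimsIdentity.Claims)
  {
    if (claim == null)
      continue;
    // Claim types are URI strings, so compare them ordinally
    if (String.Equals(claim.ClaimType, SPClaimTypes.UserLogonName, StringComparison.Ordinal))
    {
      strUser = claim.Value;
      break;
    }
  }
}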
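The snippet above relies on the Windows Identity Foundation types that SharePoint 2010 uses. The following self-contained sketch writes the same loop against System.Security.Claims from later versions of .NET, so it can run without SharePoint. Note that the property there is Claim.Type rather than ClaimType. UserLogonNameClaimType is a hypothetical placeholder; in a real trimmer, compare against SPClaimTypes.UserLogonName instead.

```csharp
using System;
using System.Security.Claims;
using System.Security.Principal;

public static class ClaimsSketch
{
    // Placeholder claim type for this sketch; a real trimmer uses SPClaimTypes.UserLogonName.
    public const string UserLogonNameClaimType = "urn:example:claims:userlogonname";

    // Returns the logon-name claim's value, or null if the identity carries no such claim.
    public static string GetUserLogonName(IIdentity passedUserIdentity)
    {
        // "as" yields null for identities that aren't claims-based
        var claimsIdentity = passedUserIdentity as ClaimsIdentity;
        if (claimsIdentity == null)
            return null;

        foreach (Claim claim in claimsIdentity.Claims)
        {
            if (claim != null && string.Equals(claim.Type, UserLogonNameClaimType, StringComparison.Ordinal))
                return claim.Value;
        }
        return null;
    }
}
```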

You may need information about the type of authentication used at the site collection level to correctly call internal APIs. To identify whether the user logged in using Windows authentication, look for the presence of a ClaimTypes.PrimarySid claim. The following code looks for the PrimarySid claim and then extracts the user name from it:

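// Inside the foreach loop over claimsIdentity.Claims
// Requires: using System.Security.Principal;  // SecurityIdentifier, NTAccount
if (String.Equals(claim.ClaimType, ClaimTypes.PrimarySid, StringComparison.Ordinal))
{
  // Extract SID in the format "S-1-5-21-xxxxx-xxxxx-xxx"
  strUser = claim.Value;
  // Convert SID into NT format "FooDomain\BarUser"
  // (Translate throws IdentityNotMappedException if the SID can't be resolved)
  SecurityIdentifier sid = new SecurityIdentifier(strUser);
  strUser = sid.Translate(typeof(NTAccount)).Value;
}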

For forms or other similar non-Windows authentication providers, look at the Claim.OriginalIssuer value inside the claim. For example, if the server is configured for forms authentication using the ASP.NET SQL Membership Provider, the Claim.OriginalIssuer will have the value "Forms:AspNetSqlMembershipProvider":

if (String.Equals(claim.ClaimType, SPClaimTypes.UserLogonName, StringComparison.Ordinal))
{
  strUser = claim.Value;
  strProvider = claim.OriginalIssuer; // For AspNet SQL Provider value will be
                                      // "Forms:AspNetSqlMembershipProvider"
}
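When a trimmer supports several providers, it can help to split the issuer into its parts. The following hypothetical helper turns an OriginalIssuer value such as "Forms:AspNetSqlMembershipProvider" into an authentication mode and a provider name. It assumes only the "Mode:ProviderName" shape shown in the example above, so check the actual issuer values in your farm before relying on it.

```csharp
using System;

public static class IssuerSketch
{
    // Hypothetical helper: "Forms:AspNetSqlMembershipProvider" -> ("Forms", "AspNetSqlMembershipProvider").
    // Returns false when the issuer has no ':' separator or an empty part on either side.
    public static bool TrySplitIssuer(string originalIssuer, out string mode, out string providerName)
    {
        mode = null;
        providerName = null;
        if (string.IsNullOrEmpty(originalIssuer))
            return false;

        int colon = originalIssuer.IndexOf(':');
        if (colon <= 0 || colon == originalIssuer.Length - 1)
            return false;

        mode = originalIssuer.Substring(0, colon);
        providerName = originalIssuer.Substring(colon + 1);
        return true;
    }
}
```

With the mode and provider name separated, the trimmer could, for example, consult the named membership provider only when the mode is "Forms".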

If the query is issued by an anonymous user, the IIdentity.IsAuthenticated property will be false. In this case, claimsIdentity.Name will have the value "NT AUTHORITY\ANONYMOUS LOGON".
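One conservative way to handle anonymous queries is to block every URL the trimmer is responsible for. This is a hypothetical policy, sketched below, not SharePoint's built-in behavior. The example stands in for the anonymous identity with System.Security.Principal.GenericIdentity, whose IsAuthenticated property is false when its name is empty.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Security.Principal;

public static class AnonymousGuardSketch
{
    // Hypothetical policy: an anonymous (or missing) identity sees none of this trimmer's URLs.
    // Returns true with an all-false BitArray when the caller is anonymous;
    // returns false when the caller is authenticated and normal trimming should continue.
    public static bool TryBlockAnonymous(IList<string> documentCrawlUrls, IIdentity passedUserIdentity, out BitArray blocked)
    {
        if (passedUserIdentity != null && passedUserIdentity.IsAuthenticated)
        {
            blocked = null;
            return false;
        }
        blocked = new BitArray(documentCrawlUrls.Count, false);
        return true;
    }
}
```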

As a final note on the user context, avoid using WindowsIdentity.GetCurrent().Name to retrieve the querying user's identity. It always returns the application pool identity under which the search query service is running. System.Threading.Thread.CurrentPrincipal.Identity, by contrast, gives you the same identity as the one passed to the CheckAccess method.

Performance Considerations Make the CheckAccess method as efficient as possible. If the query returns many results, the trimmer may be called multiple times. One common way to handle this is to keep track of the URLs processed inside the trimmer through the sessionProperties parameter. Once the method has processed a certain number of URLs, it can throw a PluggableAccessCheckException. When this exception is thrown, the URLs processed up to that point are returned to the user.
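The bookkeeping described above can be sketched as follows. The session key, the maxUrlsPerQuery budget and StopTrimmingException are all hypothetical. StopTrimmingException stands in for SharePoint's PluggableAccessCheckException so that this sketch compiles on its own. The pattern is to keep a running count across the CheckAccess calls made for one query, and to throw before processing a new batch once the budget is spent.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Stand-in for SharePoint's PluggableAccessCheckException so this sketch compiles on its own.
public class StopTrimmingException : Exception
{
    public StopTrimmingException(string message) : base(message) { }
}

public static class SessionBudgetSketch
{
    // Hypothetical key under which the running count is kept.
    public const string ProcessedCountKey = "CustomTrimmer.ProcessedCount";

    public static BitArray CheckAccess(IList<string> documentCrawlUrls,
                                       IDictionary<string, object> sessionProperties,
                                       Func<string, bool> isAllowed,
                                       int maxUrlsPerQuery)
    {
        // sessionProperties persists across the CheckAccess calls made for one query
        object stored;
        int processed = sessionProperties.TryGetValue(ProcessedCountKey, out stored) ? (int)stored : 0;

        // Budget already spent in earlier calls: stop, keeping the URLs processed so far
        if (processed >= maxUrlsPerQuery)
            throw new StopTrimmingException($"Trimming stopped after {processed} URLs.");

        var access = new BitArray(documentCrawlUrls.Count);
        for (int i = 0; i < documentCrawlUrls.Count; i++)
            access[i] = isAllowed(documentCrawlUrls[i]);

        sessionProperties[ProcessedCountKey] = processed + documentCrawlUrls.Count;
        return access;
    }
}
```

Checking the budget at the start of a call means that every batch already returned is complete. The exception then only cuts off batches that haven't been evaluated yet.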

Custom Security Trimmer and System Logs

Code inside a trimmer can't write to the system logs maintained at <drive>\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\LOGS. The trimmer must maintain its own logging mechanism for both debugging and auditing. The only exception is when the method throws a PluggableAccessCheckException: the message string specified when throwing is written to the system log. The search query service also logs useful information to this file, including the number of documents that were security trimmed. For example, the following log entry indicates that a query passed two documents to the CST but sent zero documents back to the user. In other words, the CST trimmed both documents:

04/23/2010 18:13:48.67    w3wp.exe (0x116C)    0x02B4    SharePoint Server Search    Query Processor    dm2e    Medium    Trim results: First result position = '0', actual result count = '0', total docs found = '0', total docs scanned = '2'.    742d0c36-ea37-4eee-bf8c-f2c662bc6a45
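To audit how many documents a trimmer removed, you can extract the counts from entries like the one above. The sketch below assumes the "Trim results:" message wording shown in this single sample entry. Following the interpretation given above, it computes the trimmed count as total docs scanned minus actual result count. Other entries or builds may word the message differently, and TrimLogSketch is a hypothetical name.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class TrimLogSketch
{
    // Captures the two counts from a "Trim results:" message as worded in the sample entry.
    private static readonly Regex TrimPattern = new Regex(
        @"actual result count = '(?<actual>\d+)'.*?total docs scanned = '(?<scanned>\d+)'",
        RegexOptions.Compiled);

    // Returns scanned minus actual (documents removed by trimming), or -1 if the line doesn't match.
    public static int TrimmedCount(string logLine)
    {
        Match m = TrimPattern.Match(logLine);
        if (!m.Success)
            return -1;

        int actual = int.Parse(m.Groups["actual"].Value, CultureInfo.InvariantCulture);
        int scanned = int.Parse(m.Groups["scanned"].Value, CultureInfo.InvariantCulture);
        return scanned - actual;
    }
}
```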

Custom Security Trimmers and Alerts The SharePoint search service has a feature called alerts (available only in Windows authentication mode) that can push the changes in the query results to the user through e-mails. However, when an alert query is issued by the timer service, the search query service will strip out all the URLs associated with CSTs.

Assembly Signing Requirement When it finds a matching CST, the search query service calls into CST management code to load the specific assembly from the GAC. For this to work, the assembly must be signed with a strong name. Refer to “Managing Assembly and Manifest Signing” (msdn.microsoft.com/library/ms247066) for ways to sign an assembly. Once the assembly is built, use the Strong Name tool (sn.exe -T <assembly>) to get the 64-bit hash of the public key known as the public key token. You'll need this token when you register the trimmer.
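The public key token reported by sn.exe is derived from the assembly's public key. It consists of the last 8 bytes of the SHA-1 hash of the public key, in reverse byte order. The sketch below recomputes the token from a public key and formats it as lowercase hex, which is the form used in the PublicKeyToken part of the registration command. PublicKeyTokenSketch is a hypothetical name. To get the real values for a trimmer DLL without loading it, AssemblyName.GetAssemblyName(path) returns an AssemblyName whose GetPublicKey and GetPublicKeyToken methods you can call.

```csharp
using System.Security.Cryptography;
using System.Text;

public static class PublicKeyTokenSketch
{
    // Public key token = SHA-1 hash of the public key, last 8 bytes, in reverse order.
    public static byte[] ComputeToken(byte[] publicKey)
    {
        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
            hash = sha1.ComputeHash(publicKey);

        var token = new byte[8];
        for (int i = 0; i < 8; i++)
            token[i] = hash[hash.Length - 1 - i]; // walk backwards from the last hash byte
        return token;
    }

    // Formats the token as 16 lowercase hex digits, e.g. "4ba2b4aceeb50e6d".
    public static string ToHex(byte[] token)
    {
        var sb = new StringBuilder(token.Length * 2);
        foreach (byte b in token)
            sb.Append(b.ToString("x2"));
        return sb.ToString();
    }
}
```

As a sanity check, the recomputed value for any strong-named assembly should match AssemblyName.GetPublicKeyToken().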

Deployment of Custom Security Trimmer The CST assembly must reside in the GAC of each machine on which the search query and site settings service is running. Use Central Administration | System Settings | Services on Server to check the status of the search query and site settings service on each machine in the farm. If the service is started on a machine, you must install the CST in that machine's GAC. Don't confuse the machines running the search query and site settings service with the machines that host query components. The query component lives within MSSearch.exe and pulls the results from the index. The search query and site settings service lives in its own IIS worker process, w3wp.exe.

SharePoint Cmdlets to Register, View and Delete CSTs

MOSS 2007 used the stsadm.exe command-line tool to register custom trimmers, but this tool is obsolete and not supported in SharePoint 2010. Instead, use Windows PowerShell cmdlets to register, view and delete CSTs. The trimmer assembly must already be in the GAC before you register it. Here's how to use them:

Registration–Use the New-SPEnterpriseSearchSecurityTrimmer cmdlet to register your trimmer, using the assembly's manifest data such as Version, Culture and PublicKeyToken. This example registers the trimmer with the search service application named “Search Service Application”:

New-SPEnterpriseSearchSecurityTrimmer -SearchApplication "Search Service Application" -TypeName "SearchCustomSecurityTrimmer.CustomSecurityTrimmerTest, SearchCustomSecurityTrimmer, Version=14.0.0.0, Culture=neutral, PublicKeyToken=4ba2b4aceeb50e6d" -RulePath file://elenjickal2/* -id 102 -Properties superadmin~foouser~poweruser~baruser

The cmdlet takes the crawl rule (RulePath), an integer value as the identity (id) of the trimmer, configuration properties (properties) and TypeName, which consists of the manifest data as well as the name of the class that implements the interface. Cmdlet parameters are:

  • SearchApplication–Name of the search service application associated with the content source
  • TypeName–This consists of the manifest data such as Version, Culture and PublicKeyToken (it also points to the class that implements the interface; this will uniquely identify the assembly from the GAC)
  • RulePath–The crawl rule associated with the trimmer
  • Id–An integer that uniquely identifies the trimmer instance
  • Properties–Set of name/value pairs separated by ~
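The TypeName value has the same shape as a .NET assembly-qualified type name, "Namespace.Class, AssemblyName, Version=x.x.x.x, Culture=neutral, PublicKeyToken=xxxxxxxxxxxxxxxx". Rather than typing it by hand, you can read Type.AssemblyQualifiedName from the compiled trimmer type. The sketch below is a hypothetical helper that does this and refuses assemblies without a public key token, since those can't go into the GAC. The test demonstrates it on a framework type because the trimmer class exists only in your own project.

```csharp
using System;

public static class TypeNameSketch
{
    // Builds the -TypeName argument for New-SPEnterpriseSearchSecurityTrimmer from a compiled type,
    // e.g. GetTrimmerTypeName(typeof(SearchCustomSecurityTrimmer.CustomSecurityTrimmerTest)).
    public static string GetTrimmerTypeName(Type trimmerType)
    {
        // Unsigned assemblies report an empty (or null) token and can't be installed in the GAC.
        if (trimmerType.Assembly.GetName().GetPublicKeyToken()?.Length != 8)
            throw new InvalidOperationException("The assembly is not strong-named.");

        // "Namespace.Class, AssemblyName, Version=..., Culture=..., PublicKeyToken=..."
        return trimmerType.AssemblyQualifiedName;
    }
}
```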

View–Use the Get-SPEnterpriseSearchSecurityTrimmer cmdlet and pass the search application name. You can further filter it by passing trimmer identity or other properties that you used while registering (for example: Get-SPEnterpriseSearchSecurityTrimmer -SearchApplication "Search Service Application").

Delete–Use the Remove-SPEnterpriseSearchSecurityTrimmer cmdlet and pass the search application name as well as the identity of the trimmer (for example: Remove-SPEnterpriseSearchSecurityTrimmer -SearchApplication "Search Service Application" -id 102).

Note: After registering the CST, a full crawl of the content source is required.

Troubleshooting Steps

Here are some tips to investigate any unexpected search results:

  • Make sure the crawl rule matches the content source location.
  • Check the crawl logs to make sure the account used to crawl the content source has access to it. If it doesn't, the crawl will have failed.
  • Make sure the query user has permission to view the content.
  • After trimmer registration, make sure you performed a full crawl.
  • Make sure the trimmer assembly is in the GAC of all machines on which the search query service is running.
  • Check the system logs for the number of documents trimmed by the security trimmer.
  • Use the Process Explorer utility from technet.microsoft.com/sysinternals/bb896653 to make sure the trimmer assembly is loaded into the IIS worker process w3wp.exe.
  • Attach the debugger to the worker process in which the assembly is loaded and step through the trimmer logic.

Query Processing Logic Flexibility

Wrapping up, CSTs provide the flexibility to extend the query processing logic to meet customized enterprise security needs. Keep in mind that implementation bugs inside the trimmer may cause unexpected search results. Before deploying the trimmer in a production environment, test it thoroughly against different types of content sources and authentication providers.


Ashley Elenjickal and Pooja Harjani were part of a SharePoint Search feature team responsible for Custom Security Trimmer at Microsoft. They can be reached at AshleyEl@microsoft.com and PVaswani@microsoft.com, respectively.

Thanks to the following technical expert for reviewing this article: Michal Piaseczny
