Long-running incremental crawls on RecordPoint Content Source

Customers often report long-running incremental crawls on the RecordPoint content source. This document explains the reason behind them and a possible way to prevent them.

As the screenshot shows, very few items are crawled, but there are a lot of security updates, and these take up most of the crawl time.

What is a Security-Only Crawl?

Security-only crawls take place when users are added to or removed from SharePoint groups, or are explicitly added to or removed from a list. When an incremental crawl starts, these security changes ("updated ACLs") must be pushed down to all affected items within the index. This is required because search results are security trimmed.

For example, User A can search all items in a document library named HR Docs. You no longer want User A to be able to search for and discover documents that were previously crawled from that library. You remove User A from the only SharePoint group that has permission to view these documents and initiate an incremental crawl. The crawl discovers the security change and processes the updated ACL. The gatherer commits the security changes to all items within the HR Docs library. When the security-only crawl is complete, User A no longer gets results when querying for items within the HR Docs library.
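
The HR Docs example can be sketched with a toy index in Python. This is only an illustration under stated assumptions: the dictionary layout, the file names, and the functions build_index, security_only_update, and search are hypothetical and do not reflect how SharePoint or RecordPoint store or process ACLs internally.

```python
def build_index():
    """Toy index: path -> content plus the ACL (set of users allowed to see it)."""
    return {
        "HR Docs/policy.docx": {"content": "leave policy", "acl": {"User A", "User B"}},
        "HR Docs/salaries.xlsx": {"content": "salary bands", "acl": {"User A", "User B"}},
        "Public/news.docx": {"content": "leave calendar", "acl": {"User A", "User B"}},
    }


def security_only_update(index, library_prefix, new_acl):
    """Rewrite the ACL of every item in one library; content is never re-fetched."""
    updated = 0
    for path, entry in index.items():
        if path.startswith(library_prefix):
            entry["acl"] = set(new_acl)
            updated += 1
    return updated


def search(index, user, term):
    """Security-trimmed query: match on content, then drop items the user cannot see."""
    return sorted(
        path
        for path, entry in index.items()
        if term in entry["content"] and user in entry["acl"]
    )


index = build_index()
before = search(index, "User A", "leave")
# User A was removed from the only group with access to HR Docs,
# so the new ACL for that library no longer contains User A.
updated = security_only_update(index, "HR Docs/", {"User B"})
after = search(index, "User A", "leave")
print(before, updated, after)
```

In this sketch every item in the library gets its ACL rewritten even though no content changes, which is why a crawl can report few crawled items but many security updates.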

Note: The items within the document library are not fetched, but the security information of each item is.

Why do we make security changes in RecordPoint?

Whenever permissions change on an item in the active application and that item is managed by RecordPoint, the same changes need to be applied on the RecordPoint storage layer.

RecordPoint runs a timer job, "RecordPoint Daily Task timer job", once a day to push those changes to the RecordPoint storage layer. Because RecordPoint manages permissions at item level, changes made to SharePoint groups are expensive to propagate.

Question:

Why do security-only crawls take so long?

Answer:

The extra crawl time can be attributed to the expansion of the SharePoint group and to the fact that the group is defined at the site collection level, so it affects items beyond a single list. If a SharePoint group at the site collection level has several thousand users, this expansion can be very expensive. A large number of items within that site collection also adds to the delay, because the new ACL changes are pushed down to every item affected by the security change.
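
To get a feel for the scale, here is a rough back-of-the-envelope sketch in Python. The site collection, list names, item counts, and the function acl_updates_for_group_change are all made up for illustration; the sketch only counts affected items and does not model the cost of expanding the group's members.

```python
# Hypothetical site collection: each list records its item count and the
# SharePoint groups that have permissions on it. All numbers are invented.
SITE_COLLECTION = [
    {"name": "HR Docs", "items": 20_000, "groups": {"HR Readers", "Site Members"}},
    {"name": "Projects", "items": 150_000, "groups": {"Site Members"}},
    {"name": "Archive", "items": 300_000, "groups": {"Site Members", "Site Owners"}},
]


def acl_updates_for_group_change(site_collection, group):
    """Count the items whose ACL must be rewritten when `group` membership changes."""
    return sum(lst["items"] for lst in site_collection if group in lst["groups"])


# A group used on a single list touches only that list's items.
narrow = acl_updates_for_group_change(SITE_COLLECTION, "HR Readers")
# A group granted across the site collection touches every list that uses it.
broad = acl_updates_for_group_change(SITE_COLLECTION, "Site Members")
print(narrow, broad)
```

With these invented numbers, one membership change in the widely used group means rewriting security on 470,000 items, compared with 20,000 for the group that is scoped to one list.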

Question:

How can I work around this and prevent security-only crawls from affecting incremental crawl times?

Answer:

Instead of adding users explicitly to SharePoint groups, add AD groups. Adding users to or removing users from Active Directory security groups does not cause ACL changes within SharePoint, so no security-only crawls occur.
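
The difference between the two approaches can be sketched in Python as follows. This is a simplified model, not the actual implementation: it assumes the indexed ACL holds the direct members of the SharePoint group, and that AD group membership is resolved separately when the user queries. The group name CONTOSO\HR-Staff and the functions index_acl, needs_security_crawl, and can_see are hypothetical.

```python
def index_acl(sp_group_members):
    """Principals stored on each item in this toy model: the direct members
    of the SharePoint group. An AD group stays a single opaque entry."""
    return frozenset(sp_group_members)


def needs_security_crawl(old_members, new_members):
    """A security-only crawl is needed only if the stored ACL itself changes."""
    return index_acl(old_members) != index_acl(new_members)


def can_see(user, acl, ad_groups):
    """Assumed query-time check: the user is listed directly, or belongs to
    an AD group that is listed in the ACL."""
    return user in acl or any(user in ad_groups.get(p, set()) for p in acl)


# Case 1: users are direct members of the SharePoint group.
direct_change = needs_security_crawl({"User A", "User B"}, {"User B"})

# Case 2: the SharePoint group contains one AD group (hypothetical name),
# and User A is removed from that AD group in Active Directory.
AD_GROUP = "CONTOSO\\HR-Staff"
sp_members = {AD_GROUP}
ad_before = {AD_GROUP: {"User A", "User B"}}
ad_after = {AD_GROUP: {"User B"}}
ad_change = needs_security_crawl(sp_members, sp_members)

saw_before = can_see("User A", index_acl(sp_members), ad_before)
sees_after = can_see("User A", index_acl(sp_members), ad_after)
print(direct_change, ad_change, saw_before, sees_after)
```

In this model the first case changes the stored ACL and so requires a security-only crawl, while the second leaves the stored ACL untouched and still removes User A's access.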