Archive for February 19, 2021

Enable continuous crawls is a crawl schedule option that is an alternative to incremental crawls. This option is new in SharePoint Server and applies only to content sources of type SharePoint Sites.

Continuous crawls crawl SharePoint Server sites frequently to help keep search results fresh. Like incremental crawls, a continuous crawl crawls content that was added, changed, or deleted since the last crawl. Unlike an incremental crawl, which starts at a particular time and repeats regularly at specified times after that, a continuous crawl automatically starts at predefined time intervals. The default interval for continuous crawls is every 15 minutes. Continuous crawls help ensure freshness of search results because the search index is kept up to date as the SharePoint Server content is crawled so frequently. Thus, continuous crawls are especially useful for crawling SharePoint Server content that is quickly changing.

A single continuous crawl includes all content sources in a Search service application for which continuous crawls are enabled. Similarly, the continuous crawl interval applies to all content sources in the Search service application for which continuous crawls are enabled.

You cannot run multiple full crawls or multiple incremental crawls for the same content source at the same time. However, multiple continuous crawls can run at the same time. Therefore, even if one continuous crawl is processing a large content update, another continuous crawl can start at the predefined time interval and crawl other updates. Continuous crawls of a particular content repository can also occur while a full or incremental crawl is in progress for the same repository.

A continuous crawl doesn’t process or retry items that repeatedly return errors. Such errors are retried during a “clean-up” incremental crawl, which automatically runs every four hours for content sources that have continuous crawl enabled. Items that continue to return errors during the incremental crawl will be retried during future incremental crawls, but will not be picked up by the continuous crawls until the errors are resolved.

You can set incremental crawl times on the Search_Service_Application_Name: Add/Edit Content Source page, but you can change the frequency interval for continuous crawls only by using Microsoft PowerShell.

To enable continuous crawls for an existing content source

  1. Verify that the user account that is performing this procedure is an administrator for the Search service application.
  2. In Central Administration, in the Application Management section, click Manage service applications.
  3. Click the Search service application.
  4. On the Search_Service_Application_Name: Search Administration page, in the Quick Launch, under Crawling, click Content Sources.
  5. On the Search_Service_Application_Name: Manage Content Sources page, click the SharePoint content source for which you want to enable continuous crawl.
  6. In the Crawl Schedules section, select Enable Continuous Crawls.
  7. Click OK.
  8. Verification: On the Search_Service_Application_Name: Manage Content Sources page, verify that the Status column has the status Crawling Continuous.
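
The same setting can likely be applied from the SharePoint Management Shell. The following is a minimal sketch, assuming the farm has a single Search service application and a content source named "Local SharePoint sites" (a placeholder name chosen for illustration). It sets the EnableContinuousCrawls property that this article's disable script also uses.

```powershell
# Sketch only: run in the SharePoint Management Shell as a
# Search service application administrator.
$ssa = Get-SPEnterpriseSearchServiceApplication

# "Local SharePoint sites" is a placeholder; substitute your own content source name.
$cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint sites"

# Continuous crawls apply only to content sources of type SharePoint Sites.
if ($cs.Type -eq "SharePoint") {
    $cs.EnableContinuousCrawls = $true
    $cs.Update()
}

# Show the resulting state of the content source.
$cs | Select-Object Name, Type, EnableContinuousCrawls, CrawlStatus
```

If the farm hosts more than one Search service application, pass its name to Get-SPEnterpriseSearchServiceApplication with -Identity so that $ssa holds a single object.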

To enable continuous crawls for a new content source

  1. Verify that the user account that is performing this procedure is an administrator for the Search service application.
  2. In Central Administration, in the Application Management section, click Manage service applications.
  3. Click the Search service application.
  4. On the Search_Service_Application_Name: Search Administration page, in the Quick Launch, under Crawling, click Content Sources.
  5. On the Search_Service_Application_Name: Manage Content Sources page, click New Content Source.
  6. Create a content source of the type SharePoint Sites.
  • In the Name section, type a name in the Name field.
  • In the Content Source Type section, select SharePoint Sites.
  • In the Start Addresses section, type the start address or addresses.
  • In the Crawl Settings section, select the crawling behavior for all start addresses.
  • In the Crawl Schedules section, select Enable Continuous Crawls.
  7. Click OK.
  8. Verification: On the Search_Service_Application_Name: Manage Content Sources page, verify that the newly added content source appears and that the Status column has the status Crawling Continuous.
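
A scripted version of this procedure might look like the sketch below. The content source name "Intranet sites" and the start address http://intranet.contoso.com are placeholders, and the sketch assumes a single Search service application in the farm. It creates the content source first and then turns on continuous crawls through the EnableContinuousCrawls property.

```powershell
# Sketch only: run in the SharePoint Management Shell.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Name and start address are illustrative placeholders.
$cs = New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Type SharePoint `
    -Name "Intranet sites" `
    -StartAddresses "http://intranet.contoso.com"

# Switch the new content source to continuous crawls and save the change.
$cs.EnableContinuousCrawls = $true
$cs.Update()

# Confirm the result.
$cs | Select-Object Name, Type, EnableContinuousCrawls, CrawlStatus
```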

To disable continuous crawls for a content source

  1. Verify that the user account that is performing this procedure is an administrator for the Search service application.
  2. In Central Administration, in the Application Management section, click Manage service applications.
  3. Click the Search service application.
  4. On the Search_Service_Application_Name: Search Administration page, in the Quick Launch, under Crawling, click Content Sources.
  5. On the Search_Service_Application_Name: Manage Content Sources page, click the SharePoint content source for which you want to disable continuous crawls.
  6. In the Crawl Schedules section, select Enable Incremental Crawls. This disables continuous crawls.
  7. To confirm that you want to disable continuous crawls, click OK.
  8. Optional: click Edit schedule to change the schedule for incremental crawls, and then click OK.
  9. On the Search_Service_Application_Name: Edit Content Source page, click OK.
  10. Verification: On the Search_Service_Application_Name: Manage Content Sources page, verify that the Status column has changed to Idle. This might take some time, because all URLs that remain in the crawl queue are still crawled after you disable continuous crawls.

To disable continuous crawls for all content sources

  1. Verify that the user account that performs this procedure is an administrator for the Search service application.
  2. Start a SharePoint Management Shell on a server in the farm.
  3. At the Microsoft PowerShell command prompt, type the following commands:

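# Get the Search service application for the farm
$SSA = Get-SPEnterpriseSearchServiceApplication
# Select only content sources of type SharePoint Sites
$SPContentSources = $SSA | Get-SPEnterpriseSearchCrawlContentSource | Where-Object {$_.Type -eq "SharePoint"}
foreach ($cs in $SPContentSources)
{
    # Turn off continuous crawls and save the change
    $cs.EnableContinuousCrawls = $false
    $cs.Update()
}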

  4. Verification: On the Search_Service_Application_Name: Manage Content Sources page, verify that the Status column has changed to Idle for all content sources. This might take some time, because all URLs that remain in the crawl queue are still crawled after you disable continuous crawls.

To change the continuous crawl interval

  1. Verify that the user account that is performing this procedure is a member of the Farm Administrators group.
  2. Start a SharePoint Management Shell.
  3. At the Microsoft PowerShell command prompt, type the following commands:

$ssa = Get-SPEnterpriseSearchServiceApplication
$ssa.SetProperty("ContinuousCrawlInterval",n)

Where:

  • n is the regular interval in minutes at which you want continuous crawls to start. The default interval is every 15 minutes. The shortest interval that you can set is 1 minute.

NOTE: If you reduce the interval, you increase the load on SharePoint Server and the crawler. Make sure that you plan and scale out for this increased consumption of resources accordingly.
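
As a concrete illustration, the sketch below reads the current interval, sets it to 5 minutes, and reads it back. The value 5 is only an example. The sketch also assumes that GetProperty is available on the Search service application object as the counterpart to the SetProperty call shown above, and that the farm has a single Search service application.

```powershell
# Sketch only: run in the SharePoint Management Shell as a farm administrator.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Read the current interval (in minutes) before changing it.
# Assumption: GetProperty mirrors the SetProperty call used in this procedure.
$ssa.GetProperty("ContinuousCrawlInterval")

# 5 is an illustrative value; shorter intervals increase load on the farm.
$newIntervalMinutes = 5
$ssa.SetProperty("ContinuousCrawlInterval", $newIntervalMinutes)

# Read the value back to confirm the change.
$ssa.GetProperty("ContinuousCrawlInterval")
```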

In SharePoint 2010, two types of crawl were available, both configurable in the Search Service Application:

  • Full: crawls all content.
  • Incremental: as the name says, crawls content that has been modified since the last crawl.

The disadvantage of these crawls is that, once one is running, you cannot launch a second crawl in parallel on the same content source. Content that changes in the meantime must wait until the current crawl finishes and another crawl runs before it is integrated into the index and can be found through search.

An example:

  • An incremental crawl named ALFA starts and will take 50 minutes.
  • After 10 minutes of crawling, a new document is added, so a second incremental crawl named BETA is needed to get the document into the index.
  • BETA cannot start until ALFA finishes, so this item will have to wait at least 40 minutes to be integrated into the index.

So we can’t keep the index up to date with the latest changes, because each crawl introduces latency.

In most cases this behavior may be perfectly suitable for your clients, but for those who want to search their content almost immediately after it is added to SharePoint, there is now a new solution in SharePoint: “Continuous Crawl”.

The Continuous Crawl

To summarize: the “Continuous Crawl” is a type of crawl that aims to keep the index as current as possible.

Its operation is simple: once activated, it launches a crawl at regular intervals. The major difference from an incremental crawl is that crawls can run in parallel: a new crawl does not wait for the previous crawl to complete before it launches.

Important Points:

  • “Continuous Crawl” is only available for content sources of type “SharePoint Sites”.
  • By default, a new crawl is run every 15 minutes, but the SharePoint administrator can change this interval in PowerShell by setting the ContinuousCrawlInterval property of the Search Service Application.
  • Once started, a “Continuous Crawl” can’t be paused or stopped; you can only disable it.

If we take our example above with “Continuous Crawl”:

  • Our ALFA crawl starts and will take at least 50 minutes.
  • After 10 minutes of crawling, an item that was already crawled is modified and requires a new crawl.
  • Crawl “BETA” is launched at the next interval, while ALFA is still running.
  • The crawl “BETA” therefore starts in (15 - 10) = 5 minutes.
  • So this item needs to wait only 5 minutes (instead of at least 40 minutes) before it is picked up for the index.
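
The arithmetic behind the two ALFA/BETA examples can be sketched as follows. The sketch assumes that continuous crawls start at fixed 15-minute boundaries counted from the start of ALFA (minute 0, 15, 30, and so on), which is a simplification made for illustration. The variable names are hypothetical.

```powershell
# Values from the ALFA/BETA example (all in minutes).
$alfaDuration  = 50   # how long crawl ALFA runs
$changeAt      = 10   # when the item is added or modified
$crawlInterval = 15   # default continuous crawl interval

# Incremental: BETA cannot start until ALFA has finished.
$incrementalWait = $alfaDuration - $changeAt

# Continuous: BETA starts at the first interval boundary at or after the change.
$nextStart      = [math]::Ceiling($changeAt / $crawlInterval) * $crawlInterval
$continuousWait = $nextStart - $changeAt

"Incremental wait: at least $incrementalWait minutes"   # 40
"Continuous wait:  about $continuousWait minutes"       # 5
```

Both figures are the wait until the next crawl starts; the time that crawl needs to process the item comes on top.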

1 – How to enable it?

In Central Administration, click your “Search Service Application”, and then, in the menu, click “Content Sources”.

Click “New Content Source” in the menu.

Choose “SharePoint Sites”.

Select “Enable Continuous Crawls”.

  • Once the content source has been created, its status is shown as “Crawling Continuous”.

2 – How to disable it?

  • On the content source page, select the “Enable Incremental Crawls” option. This disables the continuous crawl.
  • Save the changes.

3 – How to see if it works?

  • Click your Search service application, then click “Crawl Log” in the “Diagnostics” section.
  • Select your content source and click “View crawl history”.
  • Or, via PowerShell, run the following commands (replace "Search Service" with the name of your Search service application):
    $SearchSA = "Search Service"
    Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $SearchSA | select *
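
Because select * returns every property, a narrower view can be easier to read. The sketch below is one possible variant, assuming a single Search service application in the farm. The chosen columns are only a suggestion.

```powershell
# Sketch only: run in the SharePoint Management Shell.
$ssa = Get-SPEnterpriseSearchServiceApplication

# List each content source with its crawl state and most recent crawl times.
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa |
    Select-Object Name, CrawlStatus, EnableContinuousCrawls, CrawlStarted, CrawlCompleted |
    Format-Table -AutoSize
```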

Impact on our Servers

The impact of a “Continuous Crawl” is the same as an “Incremental Crawl”.

When crawls run in parallel, the “Continuous Crawl” stays within the limits defined in the “Crawler Impact Rule”, which controls the maximum number of simultaneous requests that can be made to the server (default 8).

Note: this setting does not restrict the Content Processing component, only the rate at which links are added to the Crawl Queue.


Content Processing uses 3 threads per core by default (called Processing Flows). To restrict Content Processing impact, use PowerShell to set the NumberOfCssFeedersPerCPUForRegularCrawl property on the Search Service Application object.

Ref: http://blogs.technet.com/b/searchguys/archive/2013/02/19/content-processing-performance-scaling.aspx 

https://social.technet.microsoft.com/wiki/contents/articles/15571.sharepoint-2013-continuous-crawl-and-the-difference-between-incremental-and-continuous-crawl.aspx