Tuesday, November 14, 2023

Sitecore Search: Addressing exceptions when website crawling fails for certain pages

This post explains how to address common crawling errors in Sitecore Search for pages that fail to crawl. When a site crawler encounters errors while crawling pages, the dashboard shows that errors occurred but does not provide a detailed log. The steps below show where to find the full details of these errors.

To check the status of the scheduled scans for the site crawler:
  • Log in to the CEC portal and click Sources.
  • A summary of the last crawl is displayed here. It shows:
    • Last Run Status: Finished if the crawl completed, or Failed if it stopped due to a failure
    • Last Run Time
    • Items Indexed: the number of items indexed
    • A summary of errors, if any occurred while crawling the site
In the example below, after the crawl finished, the summary shows 3 configuration errors but gives no further details for troubleshooting.




More information about the crawl results is available under the Analytics tab. The steps are:

  • Navigate to Analytics -> Sources -> Overview, then select the source at the bottom.

  • This page lists the last few crawl runs with details such as run duration, status, Items Indexed and Job Run ID.


  • Click the last run to see whether any documents were dropped or failed.



  • In this example, crawling of the page https://devsite.com/about-us/news failed because the page was unavailable or threw errors while loading, which can then be investigated further. Using this method, crawling errors can be identified for faster troubleshooting.

Wednesday, November 8, 2023

Executing Tasks with Sitecore PowerShell Extensions: A Practical Guide

Sitecore PowerShell Extensions (SPE) is a popular module for the Sitecore CMS that provides a powerful scripting environment and a variety of useful commands for administrators and developers. It is commonly used to automate tasks within Sitecore, such as content management, reporting, and maintenance. This post is a practical guide to using SPE for such tasks.


Below are some of the examples of how Sitecore PowerShell Extensions can be leveraged for common tasks on Sitecore:

Item Manipulation:

  • Creating Items: Sitecore items can be created programmatically using SPE. For example, the script below creates a new item under the Articles folder using the Article template.

    • New-Item -Path "master:\content\MySite\Articles" -Name "NewArticle" -ItemType "MySite/Article"

  • Copying / Moving Items: using the SPE scripts below, items can be copied or moved from one location to another. The Archive folder in the move example is a placeholder destination.

    • Copy-Item -Path "master:\content\MySite\Articles\Article1" -Destination "master:\content\MySite\Articles\Article2"

    • Get-Item -Path "master:\content\MySite\Articles\Article1" | Move-Item -Destination "master:\content\MySite\Archive"

  • Delete Items: using the SPE command below, items can be deleted. The -Permanently parameter deletes the item outright instead of moving it to the Recycle Bin.

    • Remove-Item -Path "master:\content\MySite\Articles\Article1" -Permanently

  • Publish Items: using the below SPE command, items can be published from one database to another, such as from the master database to the web database.

    • Get-Item -Path master:\content\home | Publish-Item -Recurse -PublishMode Incremental

  • To publish to multiple target databases:
    • $targets = [string[]]@('web','internet')
    • Publish-Item -Path master:\content\home -Target $targets

  • Bulk Operations: using the command below, bulk updates can be made on items, such as changing the template of multiple items or updating fields.

    • Get-ChildItem -Path "master:\content\MySite\Articles" | Set-ItemTemplate -Template "MySite/UpdatedArticleTemplate"
  • Create Users and Assign Roles: Sitecore user creation and role assignment can be automated as well using the commands below.

    • New-User -Identity "dev.user" -Enabled -Password "password" -Email "dev.user@example.com"
    • "Content Author", "Content Reviewer" | ForEach-Object { Add-RoleMember -Identity $_ -Members "dev.user" }

Friday, November 3, 2023

Configuring Sitecore Search Document Extraction: A Step-by-Step Guide

Document extraction in Sitecore Search refers to the process of finding and extracting specific data or DOM elements while crawling the pages of a website. It converts page content into a structured format that can be processed and indexed for search.

This post is a step-by-step guide to setting up document extractors in Sitecore Search. Below are some common key attributes that search needs in order to work properly:
  • Title or name of the page
  • Content/description of the pages
  • Meta tags
  • Key elements from page components that should be included in the search
Here is a quick walkthrough for configuring a JavaScript document extractor to extract attribute values for an advanced web crawler or an API crawler:
  • On the CEC portal, click Sources, then click the custom source. In the Source Settings configuration, click Document Extractors and click Edit on the right.
  • Add the extractor: enter a name, e.g. Demo JavaScript Extractor, and select JS as the extractor type.
  • In the Taggers section, click Add Tagger. The function must use Cheerio syntax and must return an array of objects.


The tagger editor comes pre-populated with a sample JS script. The sample already extracts the following fields from the pages of the website:
  • title
  • description
  • searchtitle meta tag
  • Open Graph type
  • Open Graph URL
  • Open Graph description tag
  • Similarly, the language can be extracted from the body or the URL

// Sample extractor function. Change the function to suit your individual needs
function extract(request, response) {
    // response.body is the parsed page; $ is used with Cheerio (jQuery-like) selectors
    const $ = response.body;

    return [{
        'description': $('meta[name="description"]').attr('content') || $('meta[property="og:description"]').attr('content') || $('p').text(),
        'name': $('meta[name="searchtitle"]').attr('content') || $('title').text(),
        'type': $('meta[property="og:type"]').attr('content') || 'website_content',
        'url': $('meta[property="og:url"]').attr('content')
    }];
}

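Conditional logic can be implemented to set key values based on the page URL, for example the type:

    // url holds the URL of the page being crawled (assumed here to come from request.url)
    if (url.includes('/blogs')) {
      type = 'Blogs';
    } else if (url.includes('about-us')) {
      type = 'About Us';
    } // else ... further URL patterns as needed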
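
The URL-based logic above can also be kept in a small helper so the rules are easy to extend. The following is a minimal sketch, not part of the Sitecore sample: the function name typeFromUrl and the rules array are hypothetical, and the fallback value 'website_content' is borrowed from the sample extractor's default type.

```javascript
// Hypothetical helper: maps a page URL to a content type label.
// The patterns and labels mirror the conditional snippet above;
// 'website_content' is the default type used in the sample extractor.
function typeFromUrl(url) {
  const rules = [
    { pattern: '/blogs', type: 'Blogs' },
    { pattern: 'about-us', type: 'About Us' },
  ];
  // The first rule whose pattern appears in the URL wins
  const match = rules.find((rule) => url.includes(rule.pattern));
  return match ? match.type : 'website_content';
}

console.log(typeFromUrl('https://site.com/blogs/my-post')); // Blogs
console.log(typeFromUrl('https://site.com/contact'));       // website_content
```

Inside an extractor, the returned value could be assigned to the 'type' key of the returned object, assuming the page URL is available to the function.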

One key step is to configure the attributes under Textual Relevance in the domain configuration and in the Global Widget Settings so that these attributes work properly. The following article describes in detail how to configure attributes for textual relevance:

https://sitecorebasics5.blogspot.com/2023/10/sitecore-search-how-to-configure.html

Other extraction methods are also available, such as XPath and CSS document extractors. Choose the one that fits your requirements.

Reference:
https://doc.sitecore.com/search/en/users/search-user-guide/configuring-document-extractors.html

Thursday, October 26, 2023

Sitecore Search: How to Configure Textual Relevance for Better Search Results

Textual relevance in Sitecore Search is determined by how closely a potential result matches the visitor’s search query. To configure textual relevance, you specify which content areas Sitecore Search should look in for matching terms and the relative importance of each area. This post shows how to configure textual relevance in Sitecore Search to give customers a better search experience.

  • For example, if the attributes name and description are configured for textual relevance, Sitecore Search looks for matching terms in these attributes.
  • Multiple attributes can be configured with the textual relevance feature.
  • Each attribute can be given a weight/priority in the Global Widget Settings. This weight is used with other factors to determine the order of documents in search results.
Below are the steps to configure textual relevance in Sitecore Search.

To add an attribute and enable it for textual relevance:
  • In the CEC portal, click Administration > Domain Settings. Under Attributes, click Add Attribute at the top right corner.
  • Click Settings > Entity and choose the relevant entity. In the Display Name field, enter a display name for the attribute, e.g. Name.
  • In the Attribute Name field, enter the attribute's key, e.g. Name. This value is used later in the source configuration.

  • On the Use For Features tab, select the Textual Relevance option. Click Save, and then click Publish.

The next step is to configure textual relevance at the domain level:

  • In the CEC portal, click Administration > Domain Settings > Feature Configuration.
  • Click Textual Relevance > Add Attribute.
  • On the field where textual relevance needs to be added, click Add Analyzer and then click Add.
  • By default, the Multi-Locale Standard Analyzer is already set on the attribute, but a different analyzer can be selected from the list if required. Click Save and Publish.


The next step is to enable the new attribute for textual relevance in the Global Widget:
  • In the CEC, click Global Resources > Global Widget > Global Widget Settings > Textual Relevance. Click Advanced Mode.
  • Here, numeric weights can be assigned to different attribute/analyzer combinations.
  • To include an attribute, click Include.
  • In the WEIGHT column, assign a weight to the attribute, e.g. enter 2 for Name and 1 for Description.
  • Click Save and Publish.
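
To build some intuition for what the weights do, here is a toy illustration. It is not Sitecore Search's ranking algorithm, which combines weights with other factors; the function toyScore, the sample documents, and the simple "term appears in attribute" matching are all assumptions made for the example. It only shows how a higher weight on name can push a name match above a description match.

```javascript
// Toy illustration only: this is NOT how Sitecore Search ranks results.
// It just shows how per-attribute weights can change result order.
const weights = { name: 2, description: 1 }; // weights from the example above

// Adds the weight of every attribute whose text contains the query term
function toyScore(doc, query) {
  const term = query.toLowerCase();
  let score = 0;
  for (const [attribute, weight] of Object.entries(weights)) {
    const text = (doc[attribute] || '').toLowerCase();
    if (text.includes(term)) score += weight;
  }
  return score;
}

// Hypothetical documents
const docs = [
  { name: 'Contact us', description: 'Latest news and press releases' },
  { name: 'News', description: 'Company updates' },
];

// Sort a copy of the list by descending toy score
const ranked = [...docs].sort((a, b) => toyScore(b, 'news') - toyScore(a, 'news'));
console.log(ranked.map((d) => d.name)); // [ 'News', 'Contact us' ]
```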

After setting up the attributes for textual relevance, recrawl and reindex the source, then check whether the search results have updated and better matches are shown for the search query.

Thursday, October 12, 2023

Troubleshooting Sitecore Search Crawling Failures: A Step-by-Step Guide

Sitecore Search offers the following pull sources:

  • Web crawler - a tool that crawls your content by starting from a point and following hyperlinks. 

  • Advanced web crawler - a powerful and highly customizable crawler that crawls your content and adds it to an index. 

  • API crawler - a crawler specifically designed to crawl API endpoints that return JSON. 

Sitecore Search crawls the website to extract the latest content using the configured trigger, which is usually sitemap.xml. Crawling can start to fail for several reasons, and when it does, the index may not receive the latest content from the website. This post walks through the reasons crawling may fail and how to resolve them.






Below are some potential options that you can try to remediate the issue faster:

  • Crawling may fail when the system attempts to parse your source and finds that it is not in the expected format.
    • For example, if the source is sitemap.xml and it does not render as valid XML, the crawl will fail.
    • To prevent this, ensure that your sitemap (https://site.com/sitemap.xml) is always formatted correctly.
  • Rerun the crawl and the index, and check whether it progresses to completion. Navigate to the Sources link in the CEC, find the source, and click the "Recrawl and reindex" link.
  • There could be an issue with the Sitecore Search platform itself, in which case reach out to Sitecore Support via a ticket.
    • We recently faced an issue where Sitecore Search crawling started to fail intermittently with the error "Job failed due to heartbeat error". Sitecore Support confirmed there was an ongoing issue behind the heartbeat error and promptly released a new version with the fix.
  • A recent change by an admin or developer may have been made just before the crawling started failing. If the scripts in the document extractors start throwing errors, the crawling job will be affected.
    • One option is to undo the recent change and see whether the issue is fixed and the crawl succeeds again.
    • Further troubleshooting of the document extractor script changes may be required.
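
For the sitemap format check mentioned above, a quick sanity test can be scripted before triggering a recrawl. The sketch below is a light check, not a full XML validator and not something Sitecore Search provides: the function name checkSitemap is hypothetical, and it only verifies that the root element is urlset or sitemapindex and that each loc entry holds a parsable URL. It assumes the sitemap text has already been downloaded.

```javascript
// Light sanity check for sitemap XML text. This is NOT a full XML validator:
// it only checks the root element and that <loc> entries hold parsable URLs.
function checkSitemap(xmlText) {
  const problems = [];
  // Strip the XML declaration and comments, then surrounding whitespace
  const body = xmlText
    .replace(/<\?xml[^>]*\?>/, '')
    .replace(/<!--[\s\S]*?-->/g, '')
    .trim();

  const root = body.match(/^<(urlset|sitemapindex)[\s>]/);
  if (!root) {
    problems.push('root element is not <urlset> or <sitemapindex>');
  } else if (!body.endsWith(`</${root[1]}>`)) {
    problems.push(`missing closing </${root[1]}> tag`);
  }

  // Collect every <loc> value and try to parse it as a URL
  const locs = [...body.matchAll(/<loc>\s*([^<]*?)\s*<\/loc>/g)].map((m) => m[1]);
  if (locs.length === 0) problems.push('no <loc> entries found');
  for (const loc of locs) {
    try {
      new URL(loc);
    } catch {
      problems.push(`invalid URL in <loc>: ${loc}`);
    }
  }
  return { ok: problems.length === 0, urlCount: locs.length, problems };
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://site.com/about-us/news</loc></url>
</urlset>`;
console.log(checkSitemap(sample).ok); // true

// An HTML error page served at the sitemap URL is flagged as a problem
console.log(checkSitemap('<html><body>Error</body></html>').problems);
```

If the check reports problems, fixing the sitemap output first is usually cheaper than waiting for another failed crawl run.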




Thursday, August 31, 2023

Sitecore PowerShell Extensions: Creating Sitecore users in bulk using SPE

This post shows how to create multiple users and assign them the required roles automatically using Sitecore PowerShell Extensions. When multiple users need to be created in Sitecore and assigned individual roles, doing it manually one by one can take a while. Instead, user creation and role assignment can be scripted in Sitecore PowerShell Extensions.



Here is an example that creates multiple users in Sitecore. Reference to the SPE documentation: https://doc.sitecorepowershell.com/appendix/security/new-user

New-User -Identity usrA -Enabled -Password b -Email usrA@gmail.com -FullName "User A"
New-User -Identity usrB -Enabled -Password b -Email usrB@gmail.com -FullName "User B"
New-User -Identity usrC -Enabled -Password b -Email usrC@gmail.com -FullName "User C"

Once these users are created, the next step is to add them to their respective roles. Below are the commands to add the individual users to the roles.

Add-RoleMember -Identity "Developer" -Members "usrA", "usrC"
Add-RoleMember -Identity "Publisher" -Members "usrB"

Reference to the SPE documentation: https://doc.sitecorepowershell.com/appendix/security/add-rolemember

Hopefully this approach helps you create users in bulk and saves a lot of manual effort.





Monday, August 7, 2023

Allowing PDF file redirects on the Sitecore website

In a standard Sitecore setup, redirects for PDF URLs are not allowed or handled by Sitecore. This post shows how to allow redirects of PDF files.

To let content authors set up these PDF redirects, the pdf extension must be added to the allowed extensions of the FilterUrlFilesAndExtensions processor.

Below is the configuration required for an SXA website:

<sitecore>
  <pipelines>
    <preprocessRequest>
      <processor type="Sitecore.XA.Foundation.SitecoreExtensions.Pipelines.PreprocessRequest.FilterUrlFilesAndExtensions, Sitecore.XA.Foundation.SitecoreExtensions">
        <param desc="Allowed extensions (comma separated)">aspx, ashx, asmx, pdf</param>
      </processor>
    </preprocessRequest>
  </pipelines>
</sitecore>






After making the above change, the URL https://sc102.dev.local/dummypage/dummy.pdf gets redirected successfully to https://sc102.dev.local/home

Below is the configuration required for a standard Sitecore website without the SXA module:

<sitecore>
<pipelines>
<preprocessRequest>
<processor type="Sitecore.Pipelines.PreprocessRequest.FilterUrlExtensions, Sitecore.Kernel">
<param desc="Allowed extensions (comma separated)">aspx, ashx, asmx, pdf</param>
</processor>
</preprocessRequest>
</pipelines>