Removing HTML Tags from a String Field

This article describes how to remove HTML tags from a custom string field in ServiceNow. This is often necessary when you need to display text without any formatting or when HTML formatting causes issues with integrations or reporting.

Use Case

Consider a custom field, such as "Status Description" on a Release record, that lets users enter text containing HTML tags. This typically happens when they copy and paste from a rich-text editor, an email, or another source. The tags are harmless at input time, but they become a problem when the text must appear as plain text, for example in a notification, a report, or a system integration, so they should be removed before further processing.

The Solution

We will use a background script that iterates through all records in a given table that contain data in a specific string field. This script will then remove the HTML tags from the field before updating the record.

Enhanced Script

(function() {
  // Define the table and field names
  var tableName = 'rm_release';
  var fieldName = 'u_status_description';

  try {
    // Instantiate and query for records with data in the specified field
    var releaseGR = new GlideRecord(tableName);
    releaseGR.addEncodedQuery(fieldName + 'ISNOTEMPTY');
    releaseGR.query();

    // Log start of the process
    gs.info('Starting HTML tag removal for table: ' + tableName + ', field: ' + fieldName + '. Total records: ' + releaseGR.getRowCount());

    var updatedRecordCount = 0;
    // Loop through each record and remove HTML tags
    while (releaseGR.next()) {
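      // getValue() returns the field as a string; releaseGR[fieldName] would return a GlideElement object
      var originalValue = releaseGR.getValue(fieldName) || '';
      var cleanValue = strip_tags(originalValue);

      // Update only when stripping actually changed the value
      if (originalValue !== cleanValue) {
        releaseGR.setValue(fieldName, cleanValue);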
        releaseGR.autoSysFields(false); // Do not update sys_updated_by, sys_updated_on, sys_mod_count
        releaseGR.setWorkflow(false); // Do not run any other business rules
        releaseGR.update();
        updatedRecordCount++;
      }
    }

    gs.info('HTML tag removal process complete. ' + updatedRecordCount + ' records updated.');


  } catch (ex) {
    gs.error('An error occurred during the HTML tag removal process for table: ' + tableName + ', field: ' + fieldName + '. Error: ' + ex);
  }

    /**
    * Removes HTML tags from a string.
    * @param {string} input - The string to remove tags from.
    * @param {string} [allowed] - Optional string of allowed HTML tags to be retained.
    * @returns {string} String with HTML tags removed.
    */
    function strip_tags(input, allowed) {
        //  discuss at: http://phpjs.org/functions/strip_tags/
        // original by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
        // improved by: Luke Godfrey
        // improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
        //    input by: Pul
        //    input by: Alex
        //    input by: Marc Palau
        //    input by: Brett Zamir (http://brett-zamir.me)
        //    input by: Bobby Drake
        //    input by: Evertjan Garretsen
        // bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
        // bugfixed by: Onno Marsman
        // bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
        // bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
        // bugfixed by: Eric Nagel
        // bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
        // bugfixed by: Tomasz Wesolowski
        //  revised by: Rafał Kukawski (http://blog.kukawski.pl/)

        allowed = (((allowed || '') + '')
            .toLowerCase()
            .match(/<[a-z][a-z0-9]*>/g) || [])
            .join(''); // making sure the allowed arg is a string containing only tags in lowercase (<a><b><c>)
        var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi,
            commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;
        return input.replace(commentsAndPhpTags, '')
            .replace(tags, function($0, $1) {
                return allowed.indexOf('<' + $1.toLowerCase() + '>') > -1 ? $0 : '';
            });
    }
})();

Explanation

  1. Table and Field Definition:
    • The script starts by defining variables for the tableName and fieldName. This makes the script more flexible and easier to use in different scenarios without needing to rewrite the whole script.
  2. GlideRecord Initialization:
    • A GlideRecord is created for the specified table and queried for records that have data in the field named by fieldName (u_status_description in this example). The ISNOTEMPTY condition ensures that only records with a value in that field are processed.
  3. Logging:
    • Logs the start of the process using gs.info, including the total number of records to be processed. This helps with monitoring and troubleshooting.
  4. HTML Stripping and Update:
    • The script loops through each record found.
    • No separate blank-value check is made inside the loop, because the ISNOTEMPTY query has already excluded records with an empty field.
    • It calls the strip_tags() function to remove HTML tags from the string field.
    • It compares the stripped value with the original and changes the record only if the two differ.
    • It uses autoSysFields(false) so that the update does not change system fields such as sys_updated_by, sys_updated_on, and sys_mod_count.
    • It uses setWorkflow(false) so that the update does not trigger business rules.
    • It calls the record's update() method and increments the count of updated records.
  5. Completion Logging:
    • After processing, it logs the number of records that were updated and displays a "complete" message. This provides a clear overview of the operation.
  6. Error Handling:
    • The main logic is wrapped in a try...catch block to log errors. This prevents the script from failing silently and aids in troubleshooting.
  7. strip_tags() Function:
    • The strip_tags() function removes HTML tags, HTML comments, and PHP tags from the input string. It is a JavaScript implementation of the PHP function with the same name.
    • It includes an optional allowed parameter that specifies which tags should be kept. If the parameter is not provided, the function removes all tags.
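
To see the effect of the optional allowed parameter, the function can be exercised on its own, outside any GlideRecord loop. The following is a minimal sketch that uses a compacted copy of strip_tags() (same regular expressions as in the script above) and a made-up sample string. It also illustrates a limitation: HTML entities such as &amp; are left untouched, because the function only removes tags and comments.

```javascript
// Compacted copy of strip_tags() from the script above, so this snippet runs on its own.
function strip_tags(input, allowed) {
  allowed = (((allowed || '') + '').toLowerCase().match(/<[a-z][a-z0-9]*>/g) || []).join('');
  var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi;
  var commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;
  return input.replace(commentsAndPhpTags, '').replace(tags, function ($0, $1) {
    return allowed.indexOf('<' + $1.toLowerCase() + '>') > -1 ? $0 : '';
  });
}

// Hypothetical sample value, as it might be pasted from a rich-text editor.
var sample = '<p>Release <b>2.1</b> is <a href="#">on track</a> &amp; tested.</p><!-- draft -->';

var plain = strip_tags(sample);           // 'Release 2.1 is on track &amp; tested.'
var keepBold = strip_tags(sample, '<b>'); // 'Release <b>2.1</b> is on track &amp; tested.'
```

If the target system needs real characters instead of entities, decoding them would be a separate step that this script does not perform.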

How to Use

  1. Navigate to Background Scripts: Open System Definition > Scripts - Background in ServiceNow.
  2. Copy and Paste: Copy the enhanced script into the background script editor.
  3. Modify Variables: Modify the tableName and fieldName variables to match your target table and field.
  4. Execute Script: Click on "Run script."
  5. Monitor Progress: Check the system logs for the start and completion messages and verify how many records were updated.
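
Before running the update, it can help to preview what would change. The following is a sketch of a read-only dry run, not part of the original script: previewHtmlCleanup and maxRows are names invented for this example, gr is assumed to be an already-queried GlideRecord, and stripFn is assumed to be the strip_tags() function from the script above.

```javascript
// Read-only preview: collects before/after values and never calls update().
// previewHtmlCleanup and maxRows are hypothetical names used only in this sketch.
function previewHtmlCleanup(gr, fieldName, stripFn, maxRows) {
  var preview = [];
  while (preview.length < maxRows && gr.next()) {
    var before = gr.getValue(fieldName) || '';
    var after = stripFn(before);
    if (before !== after) {
      preview.push({ sys_id: gr.getUniqueValue(), before: before, after: after });
    }
  }
  return preview;
}

// Possible use in Scripts - Background (commented out so the snippet stays self-contained):
// var gr = new GlideRecord('rm_release');
// gr.addEncodedQuery('u_status_descriptionISNOTEMPTY');
// gr.query();
// gs.info(JSON.stringify(previewHtmlCleanup(gr, 'u_status_description', strip_tags, 10)));
```

Because the function only reads values, it can be run repeatedly while adjusting the query or the allowed tags.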

Best Practices

  • Use Background Script Cautiously: Always test background scripts in a development environment before executing them in production.
  • Error Handling: Include error handling to catch and log exceptions.
  • Logging: Use gs.info and gs.error for logging important information to the system log for auditing and troubleshooting.
  • Specific Queries: Use specific queries in your GlideRecord to prevent processing unnecessary records.
  • autoSysFields(false) and setWorkflow(false): These prevent unnecessary updates to system fields and the triggering of other business rules.
  • Code Comments: Use comments to document important logic and make your script more understandable.
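
As one illustration of a more specific query, the candidate set can be narrowed to records whose field contains a "<" character, since a value without one cannot contain a tag or an HTML comment. This is a sketch under assumptions: addHtmlCandidateFilter is a name invented here, and the result should be checked on a sub-production instance before relying on it.

```javascript
// Hypothetical helper: restricts a not-yet-queried GlideRecord to rows
// whose field contains a '<' character.
function addHtmlCandidateFilter(gr, fieldName) {
  gr.addQuery(fieldName, 'CONTAINS', '<');
  return gr;
}

// Possible use in the script above, in place of the ISNOTEMPTY encoded query
// (commented out so the snippet stays self-contained):
// var releaseGR = new GlideRecord(tableName);
// addHtmlCandidateFilter(releaseGR, fieldName);
// releaseGR.query();
```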

Conclusion

This article provides a practical approach for removing HTML tags from text fields in ServiceNow. Using this script and following the best practices above helps keep your data clean and consistent for reporting, notifications, and system integrations. To use the script with another table or field, only the tableName and fieldName variables need to change.
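
If several tables or fields need the same cleanup, the loop can be factored into a function. The following is one possible sketch, not the original author's code: stripHtmlFromField is a name invented for this example, gr is assumed to be an already-queried GlideRecord, and stripFn is assumed to be the strip_tags() function from the script above.

```javascript
// Hypothetical reusable helper: cleans one field on every row of an
// already-queried GlideRecord and returns the number of rows updated.
function stripHtmlFromField(gr, fieldName, stripFn) {
  var updated = 0;
  while (gr.next()) {
    var before = gr.getValue(fieldName) || '';
    var after = stripFn(before);
    if (before !== after) {
      gr.setValue(fieldName, after);
      gr.autoSysFields(false); // leave sys_updated_by, sys_updated_on, sys_mod_count alone
      gr.setWorkflow(false);   // do not trigger business rules
      gr.update();
      updated++;
    }
  }
  return updated;
}

// Possible use in Scripts - Background (commented out so the snippet stays self-contained):
// var targets = [{ table: 'rm_release', field: 'u_status_description' }];
// targets.forEach(function (t) {
//   var gr = new GlideRecord(t.table);
//   gr.addEncodedQuery(t.field + 'ISNOTEMPTY');
//   gr.query();
//   gs.info(t.table + '.' + t.field + ': ' + stripHtmlFromField(gr, t.field, strip_tags) + ' records updated');
// });
```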
