Breaking down how SLAs work

SLAs, or Service Level Agreements, are a critical measurement of response and resolution duration. SLAs are used by organizations to measure performance and understand how efficiently teams are working. While troubleshooting a few SLAs the other day, I came across a few helpful knowledge articles posted on ServiceNow's KB. If you're interested in learning more after reading my post, I'd recommend checking them out:

How SLAs actually work

SLAs are designed to evaluate the business time it takes to get something done. An SLA could measure the Response time (the time it takes for a case to be acknowledged) or the Resolution time (the time it takes to close out a case). Whichever one you pick is referred to as the Target for the SLA.

We know when to start the timer by establishing a Start condition, and we know when to end the timer by establishing a Stop condition. If we want, we can also set up a Pause condition, as well as a Reset condition and/or Cancel condition. We'll dive into this and an example in the next section.

When the timer has counted up to the Duration that we've established for the SLA, it is considered Breached. This is an indication that our team did not work fast enough to get-that-something-done. Insights like this are important for leadership as they indicate what service gaps need to be filled. Ultimately, the goal is to have as few breaches as possible.
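To make the breach idea concrete, here's a minimal sketch of the comparison, assuming we already know how much business time has elapsed. The function names (has_breached, percent_elapsed) are my own for illustration and are not ServiceNow APIs.

```python
from datetime import timedelta


def has_breached(business_elapsed: timedelta, duration: timedelta) -> bool:
    """True once the elapsed business time reaches the SLA Duration."""
    return business_elapsed >= duration


def percent_elapsed(business_elapsed: timedelta, duration: timedelta) -> float:
    """How much of the Duration has been used up, as a percentage."""
    return round(100 * business_elapsed / duration, 1)


duration = timedelta(hours=1)  # the 1-hour target used later in this post

print(has_breached(timedelta(minutes=45), duration))     # False
print(percent_elapsed(timedelta(minutes=45), duration))  # 75.0
print(has_breached(timedelta(minutes=60), duration))     # True
```

The timer in this sketch counts up, so a breach is simply the point where elapsed time catches up with the Duration.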

Diving in: Setting up SLAs for the first time

To get started with setting up your first SLA, head over to Service Level Management > SLA > SLA Definitions. For our example, we'll be configuring a Response SLA and lightly detailing a Resolution SLA. For our Response SLA, we'll assess how long it takes for a New P1 Incident to enter an In Progress state. (Please note that I'm in the Global scope for no particular reason; I just happened to be in it when writing this post. I would encourage you to use whatever scope is relevant to you.)

So to start, we'll punch in a few of the defining details:

  • Setting a clear, descriptive Name
  • Choosing a Target type of Response
  • Selecting our Table, Incident [incident]
  • Setting our Duration to 1 hour; we want P1 incidents acknowledged quickly
  • Setting our Schedule as 8-5 weekdays excluding holidays
  • Leaving other fields in their default values
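If it helps to see those details side by side, here's the same definition written out as a small data structure. This is only a sketch: the class and attribute names are mine, and the Name value is a hypothetical example, since the real values live on the SLA Definition form.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SlaDefinition:
    """Illustrative stand-in for the fields we fill in on the form."""
    name: str
    target: str          # "Response" or "Resolution"
    table: str
    duration: timedelta
    schedule: str


response_sla = SlaDefinition(
    name="P1 incident response (1 hour)",  # hypothetical name
    target="Response",
    table="incident",
    duration=timedelta(hours=1),
    schedule="8-5 weekdays excluding holidays",
)

print(response_sla)
```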

I want to take a moment to call out the Schedule that we set here -- 8-5 weekdays excluding holidays. It's important to note that our SLAs measure time based on our schedule; thus, in this case, the maximum time an SLA will measure in one day is 9 hours. If we set our schedule to 24 x 7, the timer would also run outside of our fulfillers' working hours. Since we're excluding holidays, our timer won't run on days like Christmas or New Year's Day. If we need to customize what days we consider holidays -- for example, say your organization has every other Wednesday off -- we can customize the Schedule we're following. If we open our Schedule reference record, in the Child Schedules related list, we can modify what we consider a holiday. By default, we're using U.S. Holidays, but we can add or remove dates as needed.
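Here's a rough sketch of how time gets counted against an 8-5 weekday schedule. It is not how ServiceNow implements schedules; it's a simplified model that assumes naive datetimes (no time zones) and a hypothetical one-entry holiday list, just to show why nights, weekends, and holidays don't add to the timer.

```python
from datetime import date, datetime, time, timedelta

WORK_START = time(8, 0)
WORK_END = time(17, 0)
HOLIDAYS = {date(2024, 12, 25)}  # hypothetical holiday list for the example


def business_elapsed(start: datetime, end: datetime) -> timedelta:
    """Sum the time between start and end that falls inside working hours."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        # Only weekdays (Mon=0 .. Fri=4) that are not holidays count.
        if day.weekday() < 5 and day not in HOLIDAYS:
            window_start = max(start, datetime.combine(day, WORK_START))
            window_end = min(end, datetime.combine(day, WORK_END))
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total


# Friday 4:30 PM to Monday 8:15 AM: only 30 + 15 minutes are counted.
print(business_elapsed(datetime(2024, 12, 20, 16, 30),
                       datetime(2024, 12, 23, 8, 15)))  # 0:45:00
```

So a P1 opened late on a Friday still has 15 minutes left on a 1-hour timer at 8:15 on Monday morning, even though almost three calendar days have passed.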

Back to the topic at hand -- now that we've set up the defining details, we can jump into our triggers.

Start condition (and Cancel condition)

image

We want our SLA timer to start when a P1 (Priority 1 - Critical) case is created. Since new cases are in the New state, we'll include that as a condition as well. With these conditions, the system knows to start the timer when a new P1 incident is created.

We're also setting our Cancel condition on this tab -- if the case is Canceled, we don't need to keep monitoring the response time.
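In the platform these conditions are built with the condition builder, not code. If you think better in code, though, the Start and Cancel conditions above boil down to two simple checks. This sketch assumes an incident represented as a plain dictionary with display labels for the state; the function names are mine.

```python
def start_condition(incident: dict) -> bool:
    """Start the timer for a brand-new Priority 1 incident."""
    return incident["priority"] == 1 and incident["state"] == "New"


def cancel_condition(incident: dict) -> bool:
    """Stop tracking altogether if the incident is Canceled."""
    return incident["state"] == "Canceled"


new_p1 = {"priority": 1, "state": "New"}
new_p3 = {"priority": 3, "state": "New"}

print(start_condition(new_p1))  # True
print(start_condition(new_p3))  # False
print(cancel_condition({"priority": 1, "state": "Canceled"}))  # True
```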

Stop condition

image

Simply put, we want to stop our timer when our case moves into an In Progress state. Since our Target measurement is Response and not Resolution, we're content with this condition. Were it a Resolution SLA, we'd probably want to set the condition to look for a Resolved or Closed state.

Pause condition

image

If for some reason our case gets put On Hold before it moves into an In Progress state, we'll pause our timer. This could potentially happen if details are missing, and we don't want to start our next SLA (Resolution) timer yet.

In the screenshot, you'll notice another field: When to resume. This is how we tell the system when a paused timer should start running again. I generally recommend using the Pause conditions are not met option, rather than the alternative, Resume conditions are met. (When you have this second option selected, you can establish a list of conditions to act as the trigger.)
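To picture what pausing does to the timer, here's a simplified sketch of the Pause conditions are not met behavior: the timer pauses while the case is On Hold and resumes as soon as it isn't. Times are plain minute counts, the schedule is ignored, and PausableTimer and update are names I made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PausableTimer:
    started_at: int                  # minute the Start condition matched
    paused_at: Optional[int] = None  # minute the current pause began, if any
    paused_total: int = 0            # minutes spent paused so far

    def pause(self, now: int) -> None:
        if self.paused_at is None:
            self.paused_at = now

    def resume(self, now: int) -> None:
        if self.paused_at is not None:
            self.paused_total += now - self.paused_at
            self.paused_at = None

    def elapsed(self, now: int) -> int:
        # While paused, the clock is frozen at the moment the pause began.
        effective_now = self.paused_at if self.paused_at is not None else now
        return effective_now - self.started_at - self.paused_total


def update(timer: PausableTimer, state: str, now: int) -> None:
    """Pause while On Hold; resume once the pause condition no longer matches."""
    if state == "On Hold":
        timer.pause(now)
    else:
        timer.resume(now)


timer = PausableTimer(started_at=0)
update(timer, "On Hold", now=10)  # put on hold after 10 minutes
update(timer, "New", now=40)      # taken off hold 30 minutes later
print(timer.elapsed(now=50))      # 20
```

Fifty minutes have passed on the wall clock, but only 20 count toward the SLA because 30 of them were spent On Hold.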

I also want to call out the order of execution for the condition evaluations. SLAs evaluate triggers in this order:

  1. Stop
  2. Pause
  3. Start

Therefore, if we hit the Stop condition, we won't have to worry about our timer pausing if the case later moves from In Progress to On Hold. In that scenario, this particular SLA won't enter the Paused state, since it's already completed (i.e. Stopped). We would most likely want to track that scenario in a separate SLA that targets Resolution.
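The evaluation order above can be sketched as a tiny state machine for our Response SLA. This is my own simplification, not the platform's engine: Cancel and Reset are left out, the conditions are hard-coded to the ones from this post, and "Not attached" is a hypothetical label for "no Task SLA exists yet".

```python
def evaluate(stage: str, state: str, priority: int) -> str:
    """Return the next stage of one Task SLA after the incident is updated."""
    if stage == "Completed":
        # Already stopped: a later move to On Hold cannot pause it.
        return stage
    if stage in ("In progress", "Paused"):
        if state == "In Progress":   # 1. Stop
            return "Completed"
        if state == "On Hold":       # 2. Pause
            return "Paused"
        return "In progress"         # pause condition no longer met: resume
    if priority == 1 and state == "New":  # 3. Start
        return "In progress"
    return stage


stage = "Not attached"  # hypothetical label: no Task SLA yet
for incident_state in ["New", "On Hold", "In Progress", "On Hold"]:
    stage = evaluate(stage, incident_state, priority=1)
    print(incident_state, "->", stage)
```

The last step is the one to watch: the incident goes On Hold after the SLA has completed, and the stage stays Completed.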

So how can we "reset" our SLA then, if it's already completed? Enter: the reset condition.

Reset condition

image

Here, we're telling our SLA timer to restart if the State changes to New. So if we've already gone through our entire lifecycle, for example, of a New case that's moved to In Progress, and then for some reason the case changes back to New -- we can restart the timer. And because of the Reset action we have here, we'll get a second Task SLA record to track our timer, and the first Task SLA record will be marked as Completed. (The alternative Reset action is to Cancel the existing Task SLA.)
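Here's a small sketch of what the reset does to the list of Task SLA records, under my own simplified model: each record is a dictionary with a stage label, any record that is still running gets the stage implied by the Reset action, and a fresh record is added. The function name and the stage labels are illustrative, so check them against your own instance.

```python
def apply_reset(task_slas: list, new_state: str, reset_action: str = "Complete") -> list:
    """Return the Task SLA records after the incident changes to new_state."""
    if new_state != "New":  # reset condition from this post: State changes to New
        return task_slas
    closing_stage = "Completed" if reset_action == "Complete" else "Cancelled"
    updated = []
    for record in task_slas:
        if record["stage"] in ("In progress", "Paused"):
            # A record that is still running is closed out by the Reset action.
            record = {**record, "stage": closing_stage}
        updated.append(record)
    # A new record starts tracking the timer from scratch.
    updated.append({"sla": "P1 response", "stage": "In progress"})
    return updated


records = [{"sla": "P1 response", "stage": "Completed"}]
for record in apply_reset(records, "New"):
    print(record["stage"])
```

In the walkthrough above the first record had already completed, so the reset leaves it alone and simply adds the second one.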

Seeing the SLA in action

How can we actually monitor the SLA in action? At the bottom of a case record should be a related list called Task SLAs. From here, we can take a look at the timer and the SLA definition. (And of course, we can always report on the task_sla table.)
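If you'd rather pull this data from outside the platform, the REST Table API can read the task_sla table too. The sketch below only builds the request URL and doesn't call anything. The instance name and sys_id are placeholders, and the field names I request (sla, stage, has_breached, business_percentage) are ones I'd expect on task_sla, so treat them as assumptions and confirm them in your instance's dictionary.

```python
from urllib.parse import urlencode


def task_sla_query_url(instance: str, task_sys_id: str) -> str:
    """Build a Table API URL listing the Task SLAs attached to one task."""
    params = {
        "sysparm_query": f"task={task_sys_id}",
        # Assumed field names; verify against your instance.
        "sysparm_fields": "sla,stage,has_breached,business_percentage",
        "sysparm_display_value": "true",
        "sysparm_limit": "20",
    }
    return (
        f"https://{instance}.service-now.com/api/now/table/task_sla"
        f"?{urlencode(params)}"
    )


# "dev00000" and "abc123" are placeholders, not real values.
print(task_sla_query_url("dev00000", "abc123"))
```

You'd still need to send the request with valid credentials; a PDI is a good place to try it.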

image

This list contains all of the SLA records associated with the current case. If I were to move the current record into the In Progress state, you'll see that the SLA we created earlier moves into a Completed status:

image

And if I move the case back into the New state, you'll notice we have a new SLA record with a stage of In progress. Our initial SLA record is marked as Completed. (Because we have our Reset action as Complete and not Cancel, our previous record stays in a stage of Completed.)

image

I'll also call out the top item in this list, the Priority 1 resolution (1 hour) SLA. This is an SLA that comes OOTB with ServiceNow. With that said, before you start configuring SLAs, take a look at what's already out there. Custom SLAs really come in handy if you want to track specific conditions (e.g. how long a specific Assignment Group takes to resolve their portion of a case).

Troubleshooting

If your SLAs aren't working properly, I would suggest looking at your Business rules to confirm that none of the OOTB rules on the task_sla table have been customized. If that doesn't solve your problem, chances are you have another business rule that's causing problems. You can always Enable logging on your SLA record to see what data is captured with each database update. For more information on debugging, check out my post on troubleshooting with ServiceNow's built-in debug tools.


That's a wrap! I hope this helps you understand SLAs in a bit more detail. If you've still got questions, try playing around in a PDI, or check out the links I shared above.