KQL Kusto Query Language - urcuqui/Data-Science GitHub Wiki

Operators

Search

  • search -> will search all columns in a table for the value
  • search kind=case_sensitive
  • search in (table1, table2) "value"

within a table, you can search a specific column for an exact value

Perf
| search CounterName=="Available MBytes"

Can also search for the value anywhere in the text in the specific column

Perf
| search CounterName:"MBytes"

This searched for a column whose value exactly matched the word Memory

Perf
| search "Memory"

Can also search across all columns using wildcards

Perf
| search "*Bytes*"

Begins with Bytes then any text after it

Perf
| search * startswith "Bytes"

Ends with Bytes

Perf
| search * endswith "Bytes"

Begins with Free, ends with bytes, anything in between

Perf
| search "Free*bytes"

Searches can be combined logically

Perf
| search "Free*bytes" and ("C:" or "D:")

Where

This limits the result set.

Perf
| where TimeGenerated >=ago(1h)

Ago says "start right now, then go back in time N quantity

// Can build up the where by adding to it logically

Perf
| where TimeGenerated >= ago(1h)
	and CounterName == "Bytes Received/sec"

// OR logic is allowed to!

Perf
| where TimeGenerated >= ago(1h)
	and (CounterName == "Bytes Received/sec"
	or
	CounterName == "% Processor Time"
	)

// Stackable where operators

Perf
| where TimeGenerated >= ago(1h)
| where (CounterName ==  "Bytes Received/sec"
	or
	CounterName ==  "% Processor Time"
	)
| where CounterValue > 0

// We can simulate search using where

Perf
| where * has "Bytes"

// we can search for positional matches

Perf
| where * hassuffix "Bytes" // At the end
Perf 
| where * contains "Bytes" // contains and has behave the same

// where supports regex as well

Perf
| where InstanceName matches regex "[A-Z]:"

Take

It is used to grab a random number of rows from the input data

Perf
| take 10

// Take can be combined with other language operates

Perl
| where TimeGenerated >= ago(1h)
        and CounterName == "Bytes Received/sec"
        and CounterValue > 0 
| take 5

Count

Returns the number of rows in the input dataset

Perf
| count

// we can also use with other filters

Perf
| where TimeGenerated >= ago(1h)
  and CounterName == "Bytes Received/sec"
  and CounterValue > 0
| count

Summarize

This operator allows us to count the number of rows column using the count()

Perf
| summarize count() by CounterName

// We can rename the output column for count

Perf
| summarize PerfCount=count()
            by ObjectName, CounterName

// With Summarize, we can use other aggregation functions

Perf
| where CounterName == "% Free Space"
| summarize NumberOfEntries=count(),
  AverageFreeSpace = avg(CounterValue)
  by CounterName

// Bin allows us to summarize into logical groups, like days

Perf
| summarize NumberOfEntries=count()
            by bin(TimeGenerated, 1d)

Extend

It creates a calculated column and adds to the result set

Perf
| where CounterName == "Free Megabytes"
| extend FreeGB = CounterValue /1000

Can extend multiple columns at the same time

Perf
| where CounterName == "Free Megabytes"
| extend FreeGB = CounterValue  / 1000,
         FreeKB = CounterValue * 1000

Can also use with strcat to create new string columns

Perf
| where TimeGenerated >= ago(10m)
| extend ObjectCounter = strcat(ObjectName, " - ", CounterName)

Project

Project allows us to select a subset of columns

Perl
| project ObjectName,
  CounterName,
  InstanceName

There is a variant called project-away. It will project all except the columns listed

Perf
| where TimeGenerated > ago(1h)
| project-away TenantId,
               SourceSystem,
               CounterPath,
               MG

Distinct

Returns a list of de-duplicated values for columns from the input dataset

Perf
| distinct ObjectName, CounterName

Top

Top returns the first N rows of the dataset when the dataset is sorted by the "by" clause

Perf
| top 20 by TimeGenerated desc

Scalar Operators

  • print -> it is a debugging tool
  • now -> returns the current date/time in UTC
  • ago
  • sort
  • extract
  • parse
  • datetime
  • Timespan Arithmetic
  • startof -> the start of a date
  • endof -> the end of a date
  • between -> it is used to get a range of values
  • todynamic
  • format_datetime / format_timespan
  • datetime_part
  • case
  • iif
  • isempty / isnull
  • split
  • String Operators
  • strcat

sort

Sort will sort the output of a query

Perf
| where TimeGenerated > ago(15m)
| where CounterGame == "Avg. Disk sec/Read"
        and InstanceName == "C:"
| project Computer,
          TimeGenerated,
          ObjectName,
          CounterName, 
          InstanceName, 
          CounterValue
| sort by Computer, TimeGenerated

Sorting with ascending

Perf
| where TimeGenerated > ago(15m)
| where CounterGame == "Avg. Disk sec/Read"
        and InstanceName == "C:"
| project Computer,
          TimeGenerated,
          ObjectName,
          CounterName, 
          InstanceName, 
          CounterValue
| sort by Computer asc, TimeGenerated asc

I can use order by and get the same result as sort

Perf
| where TimeGenerated > ago(15m)
| where CounterGame == "Avg. Disk sec/Read"
        and InstanceName == "C:"
| project Computer,
          TimeGenerated,
          ObjectName,
          CounterName, 
          InstanceName, 
          CounterValue
| order by Computer asc, TimeGenerated asc

extract

This operator gets the characters in a string based on the application of a regular expression placed inside the parenthesis

Perf
| where ObjectName == "LogicalDisk"
  and InstanceName matches regex "[A-Z]":
Perf
| where ObjectName == "LogicalDisk"
  and InstanceName matches regex "[A-Z]:"
| project Computer,
          CounterName,
          extract("[A-Z]:", 0, InstanceName)

Parse

This operator extracts apart from a string and the result will be a new column as a result

Event
| where RenderedDescription startswith "Event code:"
| parse RenderedDescription with "Event code:" myEventCode
                                 "Event message:" myEventMessage
                                 "Event time:" myEventTime
                                 "Event time (UTC):" myEventTimeUTC
                                 "Event ID:" myEventID
                                 "Event sequence:" myEventSequence
                                 "Event occurrence:" *
| project myEventCode, myEventTime, myEventTimeUTC, myEventID, myEventSequence

datetime and timespan

// Determine how long a counter was generated

Perf 
| where CounterName =="Avg. Disk sec/Read"
| where CounterValue > 0 
| take 100
| extend HowLongAgo( now() - TimeGenerated)
| project Computer,
          CounterName,
          CounterValue,
          TimeGenerated,
          HowLongAgo,

// Time since a specific date (i.e. start of the year)

Perf 
| where CounterName =="Avg. Disk sec/Read"
| where CounterValue > 0 
| take 100
| extend HowLongAgo( now() - TimeGenerated),
         TimeSinceStartOverOfYear = (TimeGenerated - datetime(2018-01-01 )
| project Computer,
          CounterName,
          CounterValue,
          TimeGenerated,
          HowLongAgo,
Perf 
| where CounterName =="Avg. Disk sec/Read"
| where CounterValue > 0 
| take 100
| extend HowLongAgo( now() - TimeGenerated),
         TimeSinceStartOverOfYear = (TimeGenerated - datetime(2018-01-01 )
| extend TimeSinceStartOverOfYearInHours = (TimeSinceStartOverOfYear / 1h)
| project Computer,
          CounterName,
          CounterValue,
          TimeGenerated,
          HowLongAgo,
          TimeSinceStartOverOfYear,
          TimeSinceStartOverOfYearInHours

// Get the amount of time used

Usage
| extend Duration = (EndTime - StartTime)
| project Computer,
          StartTime, 
          EndtTime,
          Duration,

between

It is used to get a range of values.

Perf
| where CounterName == "% Free Space"
| where CounterValue between (70.0 .. 100.0)

// it can also be used with dates

Perf
| where CounterName == "% Free Space"
| where TimeGenerated between (datetime(2018-04-01) .. datetime(2018-04-03))

todynamic

It takes json stored in a string and lets you address its individual values

| SecurityAlert
| ExtPros=todynamic(ExtendedProperties)
| project AlertName
          TimeGenerated,
          ExtProps["Alert Start Time (UTC)"],
          ExtProps["Source"], // these are the elements from the Json string (ExtendedProperties) values in alertname    

// We can also use column renaming to give them decent column names

| SecurityAlert
| ExtPros=todynamic(ExtendedProperties)
| project AlertName
          TimeGenerated,
          AlertStartTime = ExtProps["Alert Start Time (UTC)"],
          Source = ExtProps["Source"], // these are the elements from the Json string (ExtendedProperties) values in alertname    
          ActionTaken = ExtProps.ActionTaken

// multilevel notation is supported, such as ExtProps.Level1.Level2

format_datetime

It allows us to return specific date formats

Perf 
| take 100
| project CounterName
          CounterValue,
          TimeGenerated,
          format_datetime(TimeGenerated, "y-M-d"),
          format_datetime(TimeGenerated, "yyyy-MM-dd"),
          format_datetime(TimeGenerated, "MM-dd-yyyy"),

datetime_part

This command extracts part of a date-time.

Perf 
| take 100
| project CounterName
          CounterValue,
          TimeGenerated,
          year = datetime_part("year", TimeGenerated),
          quarter = datetime_part("quarter", TimeGenerated),
          month = datetime_part("month", TimeGenerated),

iif

iif is a mini if/then/else

Perf 
| where CounterName == "%Free Space"
| extend FreeState = iif(CounterValue < 50,
                     "You might want to look at this",
                      "You're OK!")
| project Computer,
          ComputerName,
          CounterValue
          FreeState

case

It is used to create labels based on values

Perf 
| where CounterName == "% Free Space"
| extend FreeLevel = case( CounterValue < 10, "Critical",
                           CounterValue < 30, "Danger",
                           CounterValue < 50, "Look at it",
                           "You´re OK!")
| project Computer,
          CounterName,
          CounterValue,
          FreeLevel  

isempty and isnull

In KQL, strings can be empty, and numeric fields can be null. To handle these situations, we can use isempty and isnull

split

Use split to break a string into an array based on a delimiter

Perf 
| take 100
| extend CounterPathArray = split(CounterPath, "\\")
| extend myComputer = CounterPathArray[2],
         myObjectInstance = CounterPathArray[3],
         myCounterName = CounterPathArray[4]
| project Computer,
          ObjectName
          CounterName,
          myComputer,
          myObjectInstance,
          myCounterName

Advanced Aggregations

  • arg_max / arg_min
  • makeset / makelist
  • mvexpand
  • percentiles
  • dcount -> distinct values
  • dcountif
  • countif
  • pivot
  • top-nested -> it does nested measurements
  • max / min
  • sum / sumif
  • any - > it is a random row generator.

makeset and makelist

makeset - creates an array of json objects by flattening a hierarchy

makelist - It is like makeset, but duplicates are not removed

mvexpand

it takes a dynamic value (like a set or list) and converts it back into individual rows

Datasets

  • let -> it has many uses. First, it can be used to represent a constant value
  • join
  • union
  • datatable -> this generates a datable
  • prev / next
  • toscalar
  • row_cumsum
  • materialize

let

let can hold a function

let dateDiffInDays = (date1: datetime, date2: datetime = datetime(2018-01-01), { (date1 - date2 / 1d)});
print dateDiffInDays(now(), todatetime("2018-05-01"))

join

Perf
| where TimeGenerated >= ago(30d)
| take 1000
| join (alert) on $left.Computer == right.Computer

// joins supports many types: fullouter, inner, innerunique, leftanti, leftantisemi, leftouter, leftsemi, rightanti, rightantisemi, rightouter, rightsemi the default in innerunique join

Perf
| where CounterName == "% Processor Time"
| project Computer,
          CounterName,
          CounterValue,
          PerfTime = TimeGenerated
| join kind = fullouter
             ( Alert
               | project Computer
                         AltertName,
                         AltertDescription,
                         AlertTime= TimeGenerated
               | where AlertName == "High CPU Alert"
              )
         on Computer

prev and next commands

serialize is necessary to get prev and next to work

materialize

this takes a snapshot of a query in memory as a variable

Time Series

  • range
  • make-series -> takes a series of values and converts them to an array of values within a column
  • series_stats
  • series_outliers
  • series_fir
  • series_iir
  • series_fit_line
  • series_fit_2lines

range

it produces a table in steps using the boundaries indicated, incrementing by the value in the step parameter

range myNumbers from 1 to 8 step 1 
range LastWeek from ago(7d) to now() step 1d
range LastWeek from ago(7d) to now() step 1d

make-series

Perf
| where TimeGenerated > ago(3d)
| where CounterName == "Available MBytes"
| make-series avg(CounterValue)  default =0
              on TimeGenerated in range(ago(3d), now(), 1h) by Computer

series_stats

takes a dynamic series of values and produces a list of all the statistical

series_outliers

calculate the outliers into an array

Machine Learning

  • basket
  • autocluster
  • diffpatterns
  • reduce

basket

basket analysis is a for recommendation engines. threshold and other hyperparameters can be used

Perf
| where TimeGenerated >= ago(1d)
| project Computer,
          OBjectName,
          CounterName
| evaluate basket()

autocluster

it has a hyperparameter siezeWeight that has a range of 0-1 with 0.5 being default

let sizeWeight = 0.3;
Event
| where TimeGenerated >= ago(10d)
| project Source,
          EventLog,
          Computer,
          EventLevelName,
          RenderedDescription
| evaluate autocluster(sizeWeight)

diffpatterns

It takes a dataset and splits it into two halves based on two values in a species column. It then returns the most common set of attributes, showing how many were associated with the first value (A) and how many for the second value (B)

Event
| where TimeGenerated >= ago(5d)
| project Source, Computer,EventLevelName
| evaluate diffpatterns(EventLevelName, "Error", "Warning", "~", 0.07)

reduce

Reduce is used to determine patterns in string data. It has a threshold that should be in the range of 0 to 1. Characters is basically a list of characters that should be ignored as "word breakers".

Perf
| where TimeGenerated >= ago(12h)
| project Computer
| reduce by Computer
Perf
| where TimeGenerated >= ago(12h)
| project Computer
| reduce by Computer with threshold = 0.6

Exporting Data

  • we can export CSV file using an option on Microsoft Azure Log Analytics interface
  • Another way to use the data is through PowerBI, we can do this thing using the option of export and the advanced editor from Bussiness Intelligence platform