KQL Kusto Query Language - urcuqui/Data-Science GitHub Wiki
Operators
Search
- search -> will search all columns in a table for the value
- search kind=case_sensitive
- search in (table1, table2) "value"
within a table, you can search a specific column for an exact value
Perf
| search CounterName=="Available MBytes"
Can also search for the value anywhere in the text in the specific column
Perf
| search CounterName:"MBytes"
This searched for a column whose value exactly matched the word Memory
Perf
| search "Memory"
Can also search across all columns using wildcards
Perf
| search "*Bytes*"
Begins with Bytes then any text after it
Perf
| search * startswith "Bytes"
Ends with Bytes
Perf
| search * endswith "Bytes"
Begins with Free, ends with bytes, anything in between
Perf
| search "Free*bytes"
Searches can be combined logically
Perf
| search "Free*bytes" and ("C:" or "D:")
Where
This limits the result set.
Perf
| where TimeGenerated >=ago(1h)
Ago says "start right now, then go back in time N quantity
// Can build up the where by adding to it logically
Perf
| where TimeGenerated >= ago(1h)
and CounterName == "Bytes Received/sec"
// OR logic is allowed to!
Perf
| where TimeGenerated >= ago(1h)
and (CounterName == "Bytes Received/sec"
or
CounterName == "% Processor Time"
)
// Stackable where operators
Perf
| where TimeGenerated >= ago(1h)
| where (CounterName == "Bytes Received/sec"
or
CounterName == "% Processor Time"
)
| where CounterValue > 0
// We can simulate search using where
Perf
| where * has "Bytes"
// we can search for positional matches
Perf
| where * hassuffix "Bytes" // At the end
Perf
| where * contains "Bytes" // contains and has behave the same
// where supports regex as well
Perf
| where InstanceName matches regex "[A-Z]:"
Take
It is used to grab a random number of rows from the input data
Perf
| take 10
// Take can be combined with other language operates
Perl
| where TimeGenerated >= ago(1h)
and CounterName == "Bytes Received/sec"
and CounterValue > 0
| take 5
Count
Returns the number of rows in the input dataset
Perf
| count
// we can also use with other filters
Perf
| where TimeGenerated >= ago(1h)
and CounterName == "Bytes Received/sec"
and CounterValue > 0
| count
Summarize
This operator allows us to count the number of rows column using the count()
Perf
| summarize count() by CounterName
// We can rename the output column for count
Perf
| summarize PerfCount=count()
by ObjectName, CounterName
// With Summarize, we can use other aggregation functions
Perf
| where CounterName == "% Free Space"
| summarize NumberOfEntries=count(),
AverageFreeSpace = avg(CounterValue)
by CounterName
// Bin allows us to summarize into logical groups, like days
Perf
| summarize NumberOfEntries=count()
by bin(TimeGenerated, 1d)
Extend
It creates a calculated column and adds to the result set
Perf
| where CounterName == "Free Megabytes"
| extend FreeGB = CounterValue /1000
Can extend multiple columns at the same time
Perf
| where CounterName == "Free Megabytes"
| extend FreeGB = CounterValue / 1000,
FreeKB = CounterValue * 1000
Can also use with strcat to create new string columns
Perf
| where TimeGenerated >= ago(10m)
| extend ObjectCounter = strcat(ObjectName, " - ", CounterName)
Project
Project allows us to select a subset of columns
Perl
| project ObjectName,
CounterName,
InstanceName
There is a variant called project-away. It will project all except the columns listed
Perf
| where TimeGenerated > ago(1h)
| project-away TenantId,
SourceSystem,
CounterPath,
MG
Distinct
Returns a list of de-duplicated values for columns from the input dataset
Perf
| distinct ObjectName, CounterName
Top
Top returns the first N rows of the dataset when the dataset is sorted by the "by" clause
Perf
| top 20 by TimeGenerated desc
Scalar Operators
- print -> it is a debugging tool
- now -> returns the current date/time in UTC
- ago
- sort
- extract
- parse
- datetime
- Timespan Arithmetic
- startof -> the start of a date
- endof -> the end of a date
- between -> it is used to get a range of values
- todynamic
- format_datetime / format_timespan
- datetime_part
- case
- iif
- isempty / isnull
- split
- String Operators
- strcat
sort
Sort will sort the output of a query
Perf
| where TimeGenerated > ago(15m)
| where CounterGame == "Avg. Disk sec/Read"
and InstanceName == "C:"
| project Computer,
TimeGenerated,
ObjectName,
CounterName,
InstanceName,
CounterValue
| sort by Computer, TimeGenerated
Sorting with ascending
Perf
| where TimeGenerated > ago(15m)
| where CounterGame == "Avg. Disk sec/Read"
and InstanceName == "C:"
| project Computer,
TimeGenerated,
ObjectName,
CounterName,
InstanceName,
CounterValue
| sort by Computer asc, TimeGenerated asc
I can use order by
and get the same result as sort
Perf
| where TimeGenerated > ago(15m)
| where CounterGame == "Avg. Disk sec/Read"
and InstanceName == "C:"
| project Computer,
TimeGenerated,
ObjectName,
CounterName,
InstanceName,
CounterValue
| order by Computer asc, TimeGenerated asc
extract
This operator gets the characters in a string based on the application of a regular expression placed inside the parenthesis
Perf
| where ObjectName == "LogicalDisk"
and InstanceName matches regex "[A-Z]":
Perf
| where ObjectName == "LogicalDisk"
and InstanceName matches regex "[A-Z]:"
| project Computer,
CounterName,
extract("[A-Z]:", 0, InstanceName)
Parse
This operator extracts apart from a string and the result will be a new column as a result
Event
| where RenderedDescription startswith "Event code:"
| parse RenderedDescription with "Event code:" myEventCode
"Event message:" myEventMessage
"Event time:" myEventTime
"Event time (UTC):" myEventTimeUTC
"Event ID:" myEventID
"Event sequence:" myEventSequence
"Event occurrence:" *
| project myEventCode, myEventTime, myEventTimeUTC, myEventID, myEventSequence
datetime and timespan
// Determine how long a counter was generated
Perf
| where CounterName =="Avg. Disk sec/Read"
| where CounterValue > 0
| take 100
| extend HowLongAgo( now() - TimeGenerated)
| project Computer,
CounterName,
CounterValue,
TimeGenerated,
HowLongAgo,
// Time since a specific date (i.e. start of the year)
Perf
| where CounterName =="Avg. Disk sec/Read"
| where CounterValue > 0
| take 100
| extend HowLongAgo( now() - TimeGenerated),
TimeSinceStartOverOfYear = (TimeGenerated - datetime(2018-01-01 )
| project Computer,
CounterName,
CounterValue,
TimeGenerated,
HowLongAgo,
Perf
| where CounterName =="Avg. Disk sec/Read"
| where CounterValue > 0
| take 100
| extend HowLongAgo( now() - TimeGenerated),
TimeSinceStartOverOfYear = (TimeGenerated - datetime(2018-01-01 )
| extend TimeSinceStartOverOfYearInHours = (TimeSinceStartOverOfYear / 1h)
| project Computer,
CounterName,
CounterValue,
TimeGenerated,
HowLongAgo,
TimeSinceStartOverOfYear,
TimeSinceStartOverOfYearInHours
// Get the amount of time used
Usage
| extend Duration = (EndTime - StartTime)
| project Computer,
StartTime,
EndtTime,
Duration,
between
It is used to get a range of values.
Perf
| where CounterName == "% Free Space"
| where CounterValue between (70.0 .. 100.0)
// it can also be used with dates
Perf
| where CounterName == "% Free Space"
| where TimeGenerated between (datetime(2018-04-01) .. datetime(2018-04-03))
todynamic
It takes json stored in a string and lets you address its individual values
| SecurityAlert
| ExtPros=todynamic(ExtendedProperties)
| project AlertName
TimeGenerated,
ExtProps["Alert Start Time (UTC)"],
ExtProps["Source"], // these are the elements from the Json string (ExtendedProperties) values in alertname
// We can also use column renaming to give them decent column names
| SecurityAlert
| ExtPros=todynamic(ExtendedProperties)
| project AlertName
TimeGenerated,
AlertStartTime = ExtProps["Alert Start Time (UTC)"],
Source = ExtProps["Source"], // these are the elements from the Json string (ExtendedProperties) values in alertname
ActionTaken = ExtProps.ActionTaken
// multilevel notation is supported, such as ExtProps.Level1.Level2
format_datetime
It allows us to return specific date formats
Perf
| take 100
| project CounterName
CounterValue,
TimeGenerated,
format_datetime(TimeGenerated, "y-M-d"),
format_datetime(TimeGenerated, "yyyy-MM-dd"),
format_datetime(TimeGenerated, "MM-dd-yyyy"),
datetime_part
This command extracts part of a date-time.
Perf
| take 100
| project CounterName
CounterValue,
TimeGenerated,
year = datetime_part("year", TimeGenerated),
quarter = datetime_part("quarter", TimeGenerated),
month = datetime_part("month", TimeGenerated),
iif
iif is a mini if/then/else
Perf
| where CounterName == "%Free Space"
| extend FreeState = iif(CounterValue < 50,
"You might want to look at this",
"You're OK!")
| project Computer,
ComputerName,
CounterValue
FreeState
case
It is used to create labels based on values
Perf
| where CounterName == "% Free Space"
| extend FreeLevel = case( CounterValue < 10, "Critical",
CounterValue < 30, "Danger",
CounterValue < 50, "Look at it",
"You´re OK!")
| project Computer,
CounterName,
CounterValue,
FreeLevel
isempty and isnull
In KQL, strings can be empty, and numeric fields can be null. To handle these situations, we can use isempty and isnull
split
Use split to break a string into an array based on a delimiter
Perf
| take 100
| extend CounterPathArray = split(CounterPath, "\\")
| extend myComputer = CounterPathArray[2],
myObjectInstance = CounterPathArray[3],
myCounterName = CounterPathArray[4]
| project Computer,
ObjectName
CounterName,
myComputer,
myObjectInstance,
myCounterName
Advanced Aggregations
- arg_max / arg_min
- makeset / makelist
- mvexpand
- percentiles
- dcount -> distinct values
- dcountif
- countif
- pivot
- top-nested -> it does nested measurements
- max / min
- sum / sumif
- any - > it is a random row generator.
makeset and makelist
makeset - creates an array of json objects by flattening a hierarchy
makelist - It is like makeset, but duplicates are not removed
mvexpand
it takes a dynamic value (like a set or list) and converts it back into individual rows
Datasets
- let -> it has many uses. First, it can be used to represent a constant value
- join
- union
- datatable -> this generates a datable
- prev / next
- toscalar
- row_cumsum
- materialize
let
let can hold a function
let dateDiffInDays = (date1: datetime, date2: datetime = datetime(2018-01-01), { (date1 - date2 / 1d)});
print dateDiffInDays(now(), todatetime("2018-05-01"))
join
Perf
| where TimeGenerated >= ago(30d)
| take 1000
| join (alert) on $left.Computer == right.Computer
// joins supports many types: fullouter, inner, innerunique, leftanti, leftantisemi, leftouter, leftsemi, rightanti, rightantisemi, rightouter, rightsemi the default in innerunique join
Perf
| where CounterName == "% Processor Time"
| project Computer,
CounterName,
CounterValue,
PerfTime = TimeGenerated
| join kind = fullouter
( Alert
| project Computer
AltertName,
AltertDescription,
AlertTime= TimeGenerated
| where AlertName == "High CPU Alert"
)
on Computer
prev and next commands
serialize is necessary to get prev and next to work
materialize
this takes a snapshot of a query in memory as a variable
Time Series
- range
- make-series -> takes a series of values and converts them to an array of values within a column
- series_stats
- series_outliers
- series_fir
- series_iir
- series_fit_line
- series_fit_2lines
range
it produces a table in steps using the boundaries indicated, incrementing by the value in the step parameter
range myNumbers from 1 to 8 step 1
range LastWeek from ago(7d) to now() step 1d
range LastWeek from ago(7d) to now() step 1d
make-series
Perf
| where TimeGenerated > ago(3d)
| where CounterName == "Available MBytes"
| make-series avg(CounterValue) default =0
on TimeGenerated in range(ago(3d), now(), 1h) by Computer
series_stats
takes a dynamic series of values and produces a list of all the statistical
series_outliers
calculate the outliers into an array
Machine Learning
- basket
- autocluster
- diffpatterns
- reduce
basket
basket analysis is a for recommendation engines. threshold and other hyperparameters can be used
Perf
| where TimeGenerated >= ago(1d)
| project Computer,
OBjectName,
CounterName
| evaluate basket()
autocluster
it has a hyperparameter siezeWeight that has a range of 0-1 with 0.5 being default
let sizeWeight = 0.3;
Event
| where TimeGenerated >= ago(10d)
| project Source,
EventLog,
Computer,
EventLevelName,
RenderedDescription
| evaluate autocluster(sizeWeight)
diffpatterns
It takes a dataset and splits it into two halves based on two values in a species column. It then returns the most common set of attributes, showing how many were associated with the first value (A) and how many for the second value (B)
Event
| where TimeGenerated >= ago(5d)
| project Source, Computer,EventLevelName
| evaluate diffpatterns(EventLevelName, "Error", "Warning", "~", 0.07)
reduce
Reduce is used to determine patterns in string data. It has a threshold that should be in the range of 0 to 1. Characters is basically a list of characters that should be ignored as "word breakers".
Perf
| where TimeGenerated >= ago(12h)
| project Computer
| reduce by Computer
Perf
| where TimeGenerated >= ago(12h)
| project Computer
| reduce by Computer with threshold = 0.6
Exporting Data
- we can export CSV file using an option on Microsoft Azure Log Analytics interface
- Another way to use the data is through PowerBI, we can do this thing using the option of export and the advanced editor from Bussiness Intelligence platform