PowerQuery Functions - UN-OCHA/hpc-api GitHub Wiki
This page documents a number of PowerQuery functions that perform common data processing tasks for HPC data, and can be used in PowerBI or Excel.
Merge two columns in new one based on condition
the below code combines the two columns indicatorCustomRef and indicatorCustomRef.1 taking the values of the indicatorCustomRef column if not null otherwise take the values from indicatorCustomRef.1.
Merge_indicatorCustomRef= Table.CombineColumns(Merge_adminLevel, {"indicatorCustomRef", "indicatorCustomRef.1"}, each if _{0} is null then _{1} else _{0}, "indicatorCustomRef"),
Replace multiple values in same column in one step
the below code replaces the long org type names with shorten names.
ReplaceORgTypes = Table.ReplaceValue(AddOrgTypeDetails, each[organizationSubtype],
each if List.Contains({"Local NGOs/CSOs", "National NGOs/CSOs"},[organizationSubtype]) then "NNGO"
else if List.Contains({"International NGOs", "Internationally Affiliated Organizations"},[organizationSubtype]) then "INGO"
else if List.Contains({"National Governments","Local Governments"},[organizationSubtype]) then "GOV"
else if List.Contains({"UN Agencies"},[organizationSubtype]) then "UN"
else if List.Contains({"Country-based UN Pooled Funds","Global UN Pooled Funds", "Regional UN Pooled Funds"},[organizationSubtype]) then "PooledFund"
else if List.Contains({"International Private Organizations", "Local/National Private Organizations"},[organizationSubtype]) then "Private"
else if List.Contains({"International Red Cross/Red Crescent Movement", "Red Cross/Red Crescent National Societies"},[organizationSubtype]) then "Red C/M"
else if List.Contains({"Other Multilateral Organizations", "Uncategorized NGOs", "Uncategorized Private Organizations"},[organizationSubtype]) then "OTHER"
else [organizationSubtype], Replacer.ReplaceText,{"organizationSubtype"})
Create calculated column based on values stored in pivoted columns that change dynamically
the below code create a new column called result which retrieve the value calculated based on the values stored in the column [calculationMethod] i.e. Sum, Maximum value, ...etc. for the columns which there names are stored in the column openMPs and table table Open_MPs
Calculated_indicators_measure = Table.AddColumn(Pivot_MPs, "Result", each
try Function.Invoke(
Record.Field(
[Sum=List.Sum, Maximum value=List.Max, Minimum value=List.Min, Average=List.Average, Latest value=List.Last, Manual value=List.Last],
[calculationMethod]??"Manual value"),
{List.RemoveNulls(Record.FieldValues(Record.SelectFields(_,Open_MPs[openMPs])))})
otherwise null)
GetLocationsById
This function takes a Service URL (e.g. https://api.hpc.tools
) and list of location IDs, and produces a detailed table with all locations in HPC up to admin level3 (including deactivated locations), including the admin0, admin1, admin2 and admin3 IDs, names and Pcodes for all of a location's parents, and a name
, id
and pcode
.
This is useful for example to lookup locations that are used in disaggregated caseloads or indicators throughout HPC.
(Service as text, locationIDs as list) => let
Data = List.Transform(locationIDs, (locationId) => Json.Document(Web.Contents(Service & "/v2/public/location/" & Number.ToText(locationId) & "?maxLevel=3&status=all"))[data]),
Countries = Table.FromList(Data, Splitter.SplitByNothing(), {"Admin0"}),
ProcessAdminLevel = (
baseTable as table,
adminLevel as number,
calculateName as function
) => let
namespace = "Admin" & Number.ToText(adminLevel),
// Add columns that will be used across all descendant admin levels
addId = Table.AddColumn(baseTable, namespace & "Id", each Record.Field(_, namespace)[id]),
addName = Table.AddColumn(addId, namespace & "Name", each Record.Field(_, namespace)[name]),
addPcode = Table.Buffer(Table.AddColumn(addName, namespace & "Pcode", each Record.Field(_, namespace)[pcode])),
// Add columns that are specifically for locations that are the current admin level
addFinalId = Table.AddColumn(addPcode, "id", each Record.Field(_, namespace)[id]),
addFinalName = Table.AddColumn(addFinalId, "name", calculateName),
addFinalPCode = Table.AddColumn(addFinalName, "pcode", each Record.Field(_, namespace)[pcode]),
addFinalAdminLevel = Table.AddColumn(addFinalPCode, "AdminLevel", each adminLevel),
// Used in next base
childrenColumn = "Admin" & Number.ToText(adminLevel + 1),
addChildren = Table.Buffer(Table.AddColumn(addPcode, childrenColumn, each Record.Field(_, namespace)[children])),
expandChildren = Table.SelectRows(Table.ExpandListColumn(addChildren, childrenColumn), each Record.Field(_, childrenColumn) <> null)
in
[
nextBase = expandChildren,
final = addFinalAdminLevel
],
Admin0 = ProcessAdminLevel(Countries, 0, each [Admin0][name]),
Admin1 = ProcessAdminLevel(Admin0[nextBase], 1, each [Admin0][name] & " -> " & [Admin1][name]),
Admin2 = ProcessAdminLevel(Admin1[nextBase], 2, each [Admin0][name] & " -> " & [Admin1][name] & " -> " & [Admin2][name]),
Admin3 = ProcessAdminLevel(Admin2[nextBase], 3, each [Admin0][name] & " -> " & [Admin1][name] & " -> " & [Admin2][name] & " -> " & [Admin3][name]),
Result = Table.Combine({
Admin0[final],
Admin1[final],
Admin2[final],
Admin3[final]
})
in
Result
GetLocationsByISO
This function accepts a list of ISO country codes, and will return all the locations that existin in HPC up to admin level 3 for the countries listed. This function makes use of the function GetLocationsById
(Service as text, locationISOs as list) => let
AllLocations = Json.Document(Web.Contents(Service & "/v2/public/location"))[data],
FilteredLocations = List.Select(AllLocations, each List.Contains(locationISOs, [iso3])),
FilteredLocationIDs = List.Transform(FilteredLocations, each [id])
in
GetLocationsById(Service, FilteredLocationIDs)
ProcessDisaggregatedAttachment
This function can take a caseload or indicator attachment JSON Record, and parse the disaggregated data, producing a table with a separate row for each (location, category, metric) combination.
It requires a table of locations (produced by either GetLocationsById
or GetLocationsByISO
) to use to lookup the location name
and pcode
based on ID.
(attachment as record, locations as table) => let
Values = attachment[attachmentVersion][value][metrics][values],
Metrics = Values[totals],
Disaggregated = if Record.HasFields(Values, "disaggregated") then Values[disaggregated] else null
in
if Disaggregated = null then null else (
let
Categories = Disaggregated[categories],
Locations =
if List.Count(Disaggregated[dataMatrix]) = List.Count(Disaggregated[locations]) + 1 then
Disaggregated[locations]
else
error Error.Record("Mismatch of rows and location length"),
RowIndexes = {0..(List.Count(Disaggregated[dataMatrix]) - 1)},
InitialTable = Table.FromList(RowIndexes, Splitter.SplitByNothing(), { "RowIndex" }),
AddRow = Table.AddColumn(InitialTable, "row", each Disaggregated[dataMatrix]{[RowIndex]}),
AddColIndex = Table.AddColumn(AddRow, "ColIndex", each {0..(List.Count([row]) - 1)}),
ExpandColIndex = Table.ExpandListColumn(AddColIndex, "ColIndex"),
AddCellValue = Table.AddColumn(ExpandColIndex, "Value", each [row]{[ColIndex]}),
FilterNullValues = Table.SelectRows(AddCellValue, each ([Value] <> null and [Value] <> 0 and [Value] <> "")),
AddMetricIndex = Table.AddColumn(FilterNullValues, "MetricIndex", each Number.Mod([ColIndex], List.Count(Metrics))),
AddCategoryIndex = Table.AddColumn(AddMetricIndex, "CategoryIndex", each Number.RoundDown([ColIndex] / List.Count(Metrics))),
AddLocationRef = Table.AddColumn(AddCategoryIndex, "LocationRef", each Locations{[RowIndex] - 1}),
AddCategory = Table.AddColumn(AddLocationRef, "Category", each if [CategoryIndex] < List.Count(Categories) then Categories{[CategoryIndex]}[label] else "Total"),
AddMetric = Table.AddColumn(AddCategory, "Metric", each Metrics{[MetricIndex]}),
AddLocationId = Table.AddColumn(AddMetric, "LocationId", each [LocationRef][id]),
AddLocation = Table.AddColumn(AddLocationId, "Location", each Table.SelectRows(locations, (loc) => loc[id] = [LocationId]){0}),
AddLocationName = Table.AddColumn(AddLocation, "LocationName", each try [Location][name] otherwise "Broken Location: " & [LocationRef][name] ),
AddLocationPcode = Table.AddColumn(AddLocationName, "LocationPcode", each try [Location][pcode] otherwise "Broken Location: " & [LocationRef][name]),
AddLocationAdminLevel = Table.AddColumn(AddLocationPcode, "LocationAdminLevel", each try [Location][AdminLevel] otherwise "Broken Location: " & [LocationRef][name]),
AddMetricType = Table.AddColumn(AddLocationAdminLevel, "MetricType", each [Metric][type]),
AddMetricName = Table.AddColumn(AddMetricType, "MetricName", each [Metric][name][en]),
Result = Table.SelectColumns(AddMetricName, {"Location", "LocationName", "LocationPcode", "LocationAdminLevel", "Category", "MetricType", "MetricName", "Value"})
in
Result
)
GetPaginatedData
The HPC API makes use of pagination for many of its endpoints (such as searching for flows or projects). What this means is that to get a complete set of certain parts of our data, you need to make multiple requests to our API to get the individual pages of data.
This is quite a common practice in API design to prevent requests becoming too large for unbounded data-sets, or prevent requests taking too long to serve.
You can create queries in PowerQuery that dynamically make the appropriate number of requests to get a complete data-set by using recursion.
We use this utility function to help us create such queries:
(Get as function, args) => let
data = Get(args),
nextResults = if data[nextArgs] <> null then @GetPaginatedData(Get, data[nextArgs]) else {}
in
List.Combine({data[items], nextResults})
The first argument (Get
) needs to be a function that:
- accepts an argument that provides enough data for the function to craft an API request for a single page (e.g. the URL, or page ID)
- returns a
Record
with the properties:items
: aList
of the items for the requested pagenextArgs
: either:null
(when the current page is the last page, and no more requests are needed)- a value that will be passed into the
Get
function to get the next page
The second argument (args
) needs to be a value that represents the first page
you would like to fetch, and is passed to Get
for the first request.
Note: when fetching data like this, you ALMOST ALWAYS want to use List.Buffer
to prevent making an unnecessary number of queries to the server.
(see examples below)
GetAllFlows
This function makes use of GetPaginatedData
to get all of the flows for a
particular query.
Note the use of limit=1000
to increase the number of results per page
(1000
is the maximum for this endpoint).
This will reduce the number requests that need to be made overall.
(query as text) => let
GetFlowPage = (url as text) => let
data = Binary.Buffer(Web.Contents(url)),
json = Json.Document(data)
in
[
items = json[data][flows],
nextArgs = if Record.HasFields(json[meta], "nextLink") then json[meta][nextLink] else null
]
in
List.Buffer(GetPaginatedData(
GetFlowPage,
"https://api.hpc.tools/v1/public/fts/flow?limit=1000&" & query
))
Notes:
- The
query
argument takes the parameters that you would put after the?
in the URL. You can add multiple arguments using&
as normal. - You may be surprised that we pass in a URL rather than a page ID for the arguments of this function. The reason we do this is because the API handily provides the URL that we need to use to get the next page of data, so we can use that instead, and simply provide the first URL of the sequence.
So for example, to get all flows for 2019
, you would use it like so:
GetAllFlows("year=2019")