Using the PowerShell module with large data sets - TheJumpCloud/support GitHub Wiki

What is a Large Data Set?

A large data set within JumpCloud is any endpoint that contains more than 100 objects. When an endpoint contains more than 100 objects, multiple API calls must be made to paginate and return all data when using a GET method with the JumpCloud API.

Understanding Pagination

The JumpCloud PowerShell module is a wrapper for the JumpCloud API.

When a GET method is used to query a JumpCloud API endpoint that holds more than 100 objects, pagination must be implemented to ensure that all data is returned.

Pagination is implemented using the skip and limit query string parameters when making a GET call to an API endpoint to return only 100 objects at a time.

CURL example:

curl \
  -X 'GET' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H "x-api-key: REDACTED" \
  "https://console.jumpcloud.com/api/v2/groups?limit=100&skip=0"

Returns the first 100 groups.

curl \
  -X 'GET' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H "x-api-key: REDACTED" \
  "https://console.jumpcloud.com/api/v2/groups?limit=100&skip=100"

Returns groups 101-200.

The JumpCloud PowerShell module contains logic that automatically implements pagination when working with endpoints that contain large (> 100) data sets.
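As a rough illustration of what skip and limit pagination logic can look like, here is a minimal sketch. It is not the module's actual code: Get-AllPages, $FetchPage, and the simulated 250-object data source are hypothetical names introduced for this example, and no real API request is made.

```powershell
# Hypothetical helper: pages through any data source that accepts skip/limit.
# $FetchPage stands in for a real API request (an assumption, not module code).
function Get-AllPages {
    param(
        [Parameter(Mandatory)] [scriptblock] $FetchPage,
        [int] $Limit = 100
    )
    $skip = 0
    $results = [System.Collections.Generic.List[object]]::new()
    do {
        # Request one page of at most $Limit objects, starting at offset $skip.
        $page = @(& $FetchPage $skip $Limit)
        foreach ($item in $page) { $results.Add($item) }
        $skip += $Limit
    } while ($page.Count -eq $Limit)   # a short or empty page means no data is left
    return $results.ToArray()
}

# Simulated endpoint holding 250 objects, standing in for /api/v2/groups.
$allGroups = 1..250
$counter = @{ Calls = 0 }
$fetch = {
    param($skip, $limit)
    $counter.Calls++                   # count each simulated API call
    $allGroups | Select-Object -Skip $skip -First $limit
}

$groups = Get-AllPages -FetchPage $fetch
"{0} objects returned in {1} calls" -f $groups.Count, $counter.Calls
```

The loop stops as soon as a page comes back with fewer objects than the limit, so an endpoint whose object count is an exact multiple of 100 costs one extra, empty request in this sketch.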

Add the -Verbose parameter when calling a function in the module to see the API calls that are sent when the command is run.

PowerShell example:

Get-JCGroup -Verbose
VERBOSE: GET https://console.jumpcloud.com/api/v2/groups?sort=type,name&limit=100&skip=0 with 0-byte payload
VERBOSE: received 9228-byte response of content type application/json
VERBOSE: Content encoding: utf-8
VERBOSE: GET https://console.jumpcloud.com/api/v2/groups?sort=type,name&limit=100&skip=100 with 0-byte payload
VERBOSE: received 9116-byte response of content type application/json
VERBOSE: Content encoding: utf-8
VERBOSE: GET https://console.jumpcloud.com/api/v2/groups?sort=type,name&limit=100&skip=200 with 0-byte payload
VERBOSE: received 5023-byte response of content type application/json
VERBOSE: Content encoding: utf-8

This example queries an org that has 250 groups. Three API requests are sent to paginate and return all group data in chunks of 100 objects using the skip and limit parameters.

Each API call takes roughly 0.3 seconds, so the time it takes a command to return its results is directly proportional to the number of API calls needed to return the full data set.
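The relationship between object count, API calls, and run time can be estimated with a little arithmetic. The sketch below assumes a page size of 100 and the rough 0.3 seconds per call quoted above. Get-EstimatedQueryTime is a hypothetical helper, not part of the module, and real timings will vary with network conditions and response size.

```powershell
# Hypothetical helper: rough lower-bound estimate for a paginated GET.
function Get-EstimatedQueryTime {
    param(
        [Parameter(Mandatory)] [int] $ObjectCount,
        [int] $PageSize = 100,
        [double] $SecondsPerCall = 0.3   # assumption taken from the figure above
    )
    # At least one call is made even when the endpoint is empty.
    $calls = [Math]::Max(1, [int][Math]::Ceiling([double]$ObjectCount / $PageSize))
    [pscustomobject]@{
        ApiCalls         = $calls
        EstimatedSeconds = [Math]::Round($calls * $SecondsPerCall, 1)
    }
}

# 250 groups: 3 calls, roughly 0.9 seconds.
Get-EstimatedQueryTime -ObjectCount 250
```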

Searching Efficiently

The functions Get-JCSystem and Get-JCUser leverage the JumpCloud search API endpoints. These endpoints lead to drastic performance improvements when working with large data sets.

See the below example.

Get-JCSystem | Measure-Object
Count             : 1420

This org has 1420 systems.

Measure-Command { Get-JCSystem }
TotalSeconds      : 4.0398495

When Get-JCSystem is called without a filter parameter, all systems are returned. In this org, returning all 1420 systems takes 4 seconds.

The Wrong Way to Search

Measure-Command { Get-JCSystem | Where-Object hostname -like "win7x64tpm" }
TotalSeconds      : 6.5811725

In this example Where-Object is used to filter the systems returned by Get-JCSystem. All 1420 objects are first retrieved from the API and passed down the pipeline, and Where-Object then keeps only the objects whose hostname matches win7x64tpm. This operation takes 6.5 seconds.

The Right Way to Search

Measure-Command { Get-JCSystem -hostname "win7x64tpm" }
TotalSeconds      : 0.3000114

In this example the -hostname parameter of Get-JCSystem is populated to search only for systems with a hostname of win7x64tpm. This operation leverages the JumpCloud search API, which returns only the systems that match the filter sent to the API ({"limit":1000,"filter":[{"hostname":"win7x64tpm"}],"skip":0}).

In this case, because only one of the 1420 systems has a hostname of win7x64tpm, a single API call is enough to return the results.

This operation takes 0.3 seconds, which is drastically faster than the 6.5 seconds it takes to generate the same results using Where-Object and the pipeline.
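To make the shape of that search filter concrete, the sketch below builds an equivalent JSON body in plain PowerShell. New-SearchBody is a hypothetical helper written for this example. It only constructs the string and is not how the module necessarily builds its requests.

```powershell
# Hypothetical helper: builds a search body shaped like the one quoted above.
function New-SearchBody {
    param(
        [Parameter(Mandatory)] [string] $Property,
        [Parameter(Mandatory)] [string] $Value,
        [int] $Limit = 1000,
        [int] $Skip = 0
    )
    # An ordered dictionary keeps the keys in the same order as the quoted body.
    $body = [ordered]@{
        limit  = $Limit
        filter = @(@{ $Property = $Value })   # array holding one filter object
        skip   = $Skip
    }
    $body | ConvertTo-Json -Depth 5 -Compress
}

New-SearchBody -Property hostname -Value win7x64tpm
```

Because the filter is evaluated by the API, only matching objects travel over the network, which is where the time saving over Where-Object comes from.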

Using the -ReturnProperties Parameter

The functions Get-JCSystem and Get-JCUser contain a parameter named -returnProperties. This parameter allows admins to choose which attributes are returned when requesting user or system information from the API.

Using -returnProperties reduces the time it takes to return data from the API because only the requested fields are returned, which decreases the size of the data set.

The JumpCloud id value will always be returned regardless of what properties are requested when using -returnProperties.

See the below example.

Get-JCUser | measure
Count             : 3125

This org has 3125 users.

Measure-Command { Get-JCUser }
TotalSeconds      : 6.7468863

When calling Get-JCUser to return all user information for 3125 users this command takes 6.7 seconds.

Each JumpCloud user has 28 properties, so this request returns 28 * 3125 = 87500 total pieces of user information.

Measure-Command { Get-JCUser -returnProperties username }
TotalSeconds      : 3.6268449

When calling Get-JCUser -returnProperties username only two fields, id and username, are returned for all 3125 users.

This command takes 3.6 seconds to return the requested data because only 2 * 3125 = 6250 total pieces of user information need to be returned.

Using -returnProperties can also improve the readability of the output in the PowerShell terminal, and it is very useful when exporting specific information to a CSV file with the Export-Csv command. By default, PowerShell displays objects with fewer than 5 properties in a table view.

Get-JCUser -returnProperties email, username, firstname

email                                username             firstname _id
-----                                --------             --------- ---
andrew.smith@sajumpcloud.com         andrew.smith         Andrew    5c7585ca2dff6d18cff186e5
jack.smith@sajumpcloud.com           jsmith               Jack      5c7585cbbfe8c0429a81555d
michael.scott@sajumpcloud.com        michael.scott        Michael   5c75cb9a2f2a730f317728ac
dwight.schrute@sajumpcloud.com       dwight.schrute       Dwight    5c75cb9c4b697f234853ba7d
jim.halpert@sajumpcloud.com          jim.halpert          Jim       5c75cb9fbfe8c0429a816d4d
pam.beesly@sajumpcloud.com           pam.beesly           Pam       5c75cba1e74ef15c67f68512
ryan.howard@sajumpcloud.com          ryan.howard          Ryan      5c75cba411d46a1ba0ece7d1
andrew.bernard@sajumpcloud.com       andrew.bernard       Andrew    5c75cba6ee8df27f82a800b1
robert.california@sajumpcloud.com    robert.california    Robert    5c75cba9024ebc546faff260
jan.levinson@sajumpcloud.com         jan.levinson         Jan       5c75cbab56b1317250b60a25
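The sketch below shows how objects of this shape convert to CSV. It uses two stand-in objects copied from the listing above rather than a live Get-JCUser call, so it runs without the module. In real use the objects would come from Get-JCUser -returnProperties and be piped to Export-Csv with a file path of your choosing.

```powershell
# Stand-in objects shaped like the output above (values copied from that listing).
$users = @(
    [pscustomobject]@{
        email     = 'andrew.smith@sajumpcloud.com'
        username  = 'andrew.smith'
        firstname = 'Andrew'
        _id       = '5c7585ca2dff6d18cff186e5'
    }
    [pscustomobject]@{
        email     = 'jack.smith@sajumpcloud.com'
        username  = 'jsmith'
        firstname = 'Jack'
        _id       = '5c7585cbbfe8c0429a81555d'
    }
)

# ConvertTo-Csv emits the same text that Export-Csv would write to a file.
$csv = $users | ConvertTo-Csv -NoTypeInformation
$csv
```

Only the requested columns plus _id appear in the result, so the file stays small and the _id column remains available for later updates by id.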

The Fastest Way To Make Bulk Updates To Users

When updating objects using the API a "PUT" request is sent to the JumpCloud id of the target object.

By default the command Set-JCUser uses the Username parameter set.

This allows admins to interact with users via the API without having to know the JumpCloud id value of the user.

This parameter set converts a JumpCloud username to a JumpCloud id value, which requires an additional API call.

See the below example.

Set-JCUser -Username clark.kent -middlename "super" -Verbose
VERBOSE: POST https://console.jumpcloud.com/api/search/systemusers with 59-byte payload
VERBOSE: PUT https://console.jumpcloud.com/api/Systemusers/5c7d92fb92040061adb77951 with 22-byte payload

A call is made to the https://console.jumpcloud.com/api/search/systemusers endpoint to gather the id value for the user with username clark.kent.

This id value is then used in the PUT API request to update the middlename of clark.kent to super.

This command takes .7 seconds to complete.

Measure-Command -expression {Set-JCUser -Username clark.kent -middlename "super"}
TotalSeconds      : 0.7209635

Users can also be updated by specifying a user's id value.

When modifying users using the id value a single API call is run.

Measure-Command -expression {Set-JCUser -id "5c7d92fb92040061adb77951" -middlename "super"}
TotalSeconds      : 0.3216719

This command takes .3 seconds to complete.

Updating users using the id value is the fastest way to update users.

Users can be updated in bulk efficiently using Get-JCUser and Set-JCUser using the -byID switch parameter.

Using the -ById Parameter and the Pipeline

The -ByID parameter set is designed to be used when piping information from Get-JCUser into Set-JCUser to increase performance.

This ensures that the ById parameter set is used, which reduces the number of API calls made by the Set-JCUser command.

Example:

Get-JCUser -department macDev | measure
Count             : 10

Ten users have a department value set to macDev in this organization.

Measure-Command -expression {Get-JCUser -department "macDev" | Set-JCUser -employeeType "Developer"}
TotalSeconds      : 8.6117943

Using the pipeline to update 10 users without specifying the -ByID parameter takes 8.6 seconds.

Measure-Command -expression {Get-JCUser -department "macDev"  | Set-JCUser -employeeType "Developer" -byID}
TotalSeconds      : 4.4400569

Using the pipeline to update 10 users with the -ByID parameter specified takes 4.4 seconds.
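The roughly twofold difference follows from the number of API calls Set-JCUser makes per user: a search plus a PUT when updating by username, and a single PUT when updating by id. The sketch below captures that arithmetic. Get-EstimatedUpdateCalls is a hypothetical helper, and it ignores the calls that Get-JCUser itself makes to retrieve the users.

```powershell
# Hypothetical helper: counts the API calls Set-JCUser would make for a bulk update,
# assuming 2 calls per user by username (search + PUT) and 1 call per user by id (PUT).
function Get-EstimatedUpdateCalls {
    param(
        [Parameter(Mandatory)] [int] $UserCount,
        [switch] $ById
    )
    $callsPerUser = if ($ById) { 1 } else { 2 }
    $UserCount * $callsPerUser
}

Get-EstimatedUpdateCalls -UserCount 10          # 20 calls by username
Get-EstimatedUpdateCalls -UserCount 10 -ById    # 10 calls by id
```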

Using the -ById Parameter and Using a CSV file

The -ByID parameter of Set-JCUser can also be used to update users when updating user information from a CSV file.

This CSV file must have a column, named either id or _id, that holds the user id value of each JumpCloud user that you wish to update.

See the below example for how to implement this.

Example:

Get-JCUser -department "macDev" -returnProperties username, costCenter, employeeType | Export-Csv "macDevUserUpdate.csv"

For this use case we will update the "costCenter" and "employeeType" values for all users with a current department value of "macDev". Running this command creates a CSV file named "macDevUserUpdate.csv".

CSV file pre updates:

"username","costCenter","employeeType","_id"
"dancer.reindeer","Corp","temp","595a8e1f89a46dfd172a191a"
"prancer.reindeer","Corp","temp","595a8e617e900dbc1d745d27"
"dasher.reindeer","Corp","temp","596cd4d4a12fd32f6f3f54d8"
"vixen.reindeer","Corp","temp","596cd588edc7920a53ddbf8a"

CSV file post updates:

"username","costCenter","employeeType","_id"
"dancer.reindeer","Denver","FT","595a8e1f89a46dfd172a191a"
"prancer.reindeer","Boulder","FT","595a8e617e900dbc1d745d27"
"dasher.reindeer","Longmont","PT","596cd4d4a12fd32f6f3f54d8"
"vixen.reindeer","Frisco","PT","596cd588edc7920a53ddbf8a"

Note that the costCenter and employeeType values for each user in the CSV file have been updated.

After updating the information in the CSV file Import-CSV is used with Set-JCUser.

Import-Csv ./macDevUserUpdate.csv | Set-JCUser -ByID

The CSV has columns for both the username field and the _id field. By specifying -ByID, the _id value is used to update users.
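Before running a bulk update, it can help to confirm that the CSV really contains an id or _id column. The sketch below does that check against inline sample rows taken from the file shown above. Test-HasIdColumn is a hypothetical helper and not part of the module.

```powershell
# Hypothetical helper: returns $true when the rows expose an id or _id column.
function Test-HasIdColumn {
    param([Parameter(Mandatory)] [object[]] $Rows)
    $columns = $Rows[0].PSObject.Properties.Name
    ($columns -contains '_id') -or ($columns -contains 'id')
}

# Inline sample data; in real use this would be: Import-Csv ./macDevUserUpdate.csv
$rows = @(
    '"username","costCenter","employeeType","_id"'
    '"dancer.reindeer","Denver","FT","595a8e1f89a46dfd172a191a"'
    '"prancer.reindeer","Boulder","FT","595a8e617e900dbc1d745d27"'
) | ConvertFrom-Csv

if (Test-HasIdColumn -Rows $rows) {
    "CSV has an id column and can be piped to Set-JCUser -ByID"
} else {
    Write-Warning "CSV has no id or _id column"
}
```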
