Using the PowerShell module with large data sets
- What is a Large Data Set?
- Understanding Pagination
- Searching Efficiently
- The Fastest Way To Make Bulk Updates To Users
A large data set within JumpCloud is any endpoint that contains over 100 objects. When an endpoint contains more than 100 objects, multiple API calls must be made to paginate and return all data when using a GET method with the JumpCloud API.
The JumpCloud PowerShell module is a wrapper for the JumpCloud API.
When a GET method is used to query an endpoint that holds more than 100 objects, pagination must be implemented to ensure that all data is returned.
Pagination is implemented using the skip and limit query string parameters on a GET call to an API endpoint, so that at most 100 objects are returned at a time.
CURL example:
curl \
-X 'GET' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H "x-api-key: REDACTED" \
"https://console.jumpcloud.com/api/v2/groups?limit=100&skip=0"
Returns the first 100 groups.
curl \
-X 'GET' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H "x-api-key: REDACTED" \
"https://console.jumpcloud.com/api/v2/groups?limit=100&skip=100"
Returns groups 101-200.
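The skip and limit loop described above can be sketched in PowerShell as follows. This is a minimal illustration, not the module's actual implementation: Get-Page is a hypothetical stand-in for a single GET request, and it reads from a simulated list of 250 groups so the sketch runs without an API key.

```powershell
# Simulated org with 250 groups (assumption: stands in for the real endpoint data)
$allGroups = 1..250 | ForEach-Object { "group$_" }

# Hypothetical helper: mimics one GET call with limit and skip query parameters.
# Against the real API this would be a web request carrying the x-api-key header.
function Get-Page {
    param([int]$Limit, [int]$Skip)
    @($allGroups | Select-Object -Skip $Skip -First $Limit)
}

$limit   = 100
$skip    = 0
$calls   = 0
$results = @()

do {
    $page = @(Get-Page -Limit $limit -Skip $skip)   # one "API call"
    $calls++
    $results += $page
    $skip += $limit                                  # move to the next page
} while ($page.Count -eq $limit)                     # a short page means the end was reached

"{0} objects returned in {1} calls" -f $results.Count, $calls
```

With this stop condition, a data set whose size is an exact multiple of the limit costs one extra call that returns an empty page.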
The JumpCloud PowerShell module contains logic that automatically implements pagination when working with endpoints that contain large (> 100) data sets.
By using the -Verbose parameter when calling a PowerShell function, you can see the API calls that are sent to the API when the command is run.
PowerShell example:
Get-JCGroup -Verbose
VERBOSE: GET https://console.jumpcloud.com/api/v2/groups?sort=type,name&limit=100&skip=0 with 0-byte payload
VERBOSE: received 9228-byte response of content type application/json
VERBOSE: Content encoding: utf-8
VERBOSE: GET https://console.jumpcloud.com/api/v2/groups?sort=type,name&limit=100&skip=100 with 0-byte payload
VERBOSE: received 9116-byte response of content type application/json
VERBOSE: Content encoding: utf-8
VERBOSE: GET https://console.jumpcloud.com/api/v2/groups?sort=type,name&limit=100&skip=200 with 0-byte payload
VERBOSE: received 5023-byte response of content type application/json
VERBOSE: Content encoding: utf-8
Example of querying an org that has 250 groups. Three API requests are sent to paginate and return all group data in chunks of 100 objects using the skip and limit parameters.
Each API call takes roughly 0.3 seconds, so the time a command takes to return its results is directly proportional to the number of API calls needed to return the full data set.
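As a rough worked example of that relationship, the sketch below estimates the call count and elapsed time for the 250-group org. The page size of 100 and the 0.3 seconds per call come from the text above; the variable names are illustrative, and real timings vary with payload size and network conditions.

```powershell
$pageSize       = 100    # objects returned per API call
$secondsPerCall = 0.3    # rough per-call time quoted above
$objectCount    = 250    # groups in the example org

# Number of pages needed to cover every object
$calls = [math]::Ceiling($objectCount / $pageSize)

# Estimated total time, rounded to one decimal place
$estimatedSeconds = [math]::Round($calls * $secondsPerCall, 1)

"{0} calls, roughly {1} seconds" -f $calls, $estimatedSeconds
```

For 250 objects this works out to 3 calls and an estimate of about 0.9 seconds.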
The functions Get-JCSystem and Get-JCUser leverage the JumpCloud search API endpoints. These endpoints lead to drastic performance improvements when working with large data sets.
See the example below.
Get-JCSystem | Measure-Object
Count : 1420
This org has 1420 systems.
Measure-Command { Get-JCSystem }
TotalSeconds : 4.0398495
When Get-JCSystem is called without a filter parameter, all systems are returned. In this org, returning all 1420 systems takes about 4 seconds.
Measure-Command { Get-JCSystem | Where-Object hostname -like "win7x64tpm" }
TotalSeconds : 6.5811725
In this example Where-Object is used to filter the systems returned by Get-JCSystem. All 1420 objects are first returned from the API and then passed through the pipeline, where Where-Object keeps only the objects whose hostname matches win7x64tpm. This operation takes about 6.5 seconds.
Measure-Command { Get-JCSystem -hostname "win7x64tpm" }
TotalSeconds : 0.3000114
In this example the -hostname parameter of Get-JCSystem is populated to search only for systems with a hostname of win7x64tpm. This operation leverages the JumpCloud search API, which requests only the systems that match the query sent to the API ({"limit":1000,"filter":[{"hostname":"win7x64tpm"}],"skip":0}).
In this case, because only one of the 1420 systems has a hostname of win7x64tpm, only one API call is needed to return the results.
This operation takes 0.3 seconds, which is drastically faster than the 6.5 seconds it takes to generate the same results using Where-Object and the pipeline.
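To make the shape of that search query concrete, here is a small sketch that builds the same JSON body in PowerShell. It only constructs the payload shown above. How the module assembles and sends the request internally is not covered by this page, so treat this as an illustration rather than the module's code.

```powershell
# Build the search body quoted above; [ordered] keeps the keys in the same order
$searchBody = [ordered]@{
    limit  = 1000
    filter = @(@{ hostname = 'win7x64tpm' })   # server-side filter on hostname
    skip   = 0
} | ConvertTo-Json -Compress -Depth 5

$searchBody
```

Because the filter travels with the request, the API returns only matching systems, so PowerShell never has to receive and discard the other 1419 objects.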
The functions Get-JCSystem and Get-JCUser contain a parameter named -returnProperties. This parameter allows admins to specify which attributes to return when requesting user or system information from the API.
Using -returnProperties reduces the time it takes to return data from the API because only the requested fields are returned, which decreases the size of the data set.
The JumpCloud id value is always returned, regardless of which properties are requested with -returnProperties.
See the example below.
Get-JCUser | measure
Count : 3125
This org has 3125 users.
Measure-Command { Get-JCUser }
TotalSeconds : 6.7468863
When Get-JCUser is called to return all user information for 3125 users, the command takes 6.7 seconds.
Each JumpCloud user has 28 properties, so this request returns 28 * 3125 = 87500 total pieces of user information.
Measure-Command { Get-JCUser -returnProperties username }
TotalSeconds : 3.6268449
When calling Get-JCUser -returnProperties username, only two fields, id and username, are returned for all 3125 users.
This command takes 3.6 seconds to return the requested data because only 2 * 3125 = 6250 total pieces of user information need to be returned.
Using -returnProperties can also improve the readability of the output in the PowerShell terminal, and is very useful when exporting specific information to a CSV file with the Export-CSV command. By default, PowerShell displays objects with fewer than 5 properties in a table view.
Get-JCUser -returnProperties email, username, firstname
email                             username          firstname _id
-----                             --------          --------- ---
andrew.smith@sajumpcloud.com      andrew.smith      Andrew    5c7585ca2dff6d18cff186e5
jack.smith@sajumpcloud.com        jsmith            Jack      5c7585cbbfe8c0429a81555d
michael.scott@sajumpcloud.com     michael.scott     Michael   5c75cb9a2f2a730f317728ac
dwight.schrute@sajumpcloud.com    dwight.schrute    Dwight    5c75cb9c4b697f234853ba7d
jim.halpert@sajumpcloud.com       jim.halpert       Jim       5c75cb9fbfe8c0429a816d4d
pam.beesly@sajumpcloud.com        pam.beesly        Pam       5c75cba1e74ef15c67f68512
ryan.howard@sajumpcloud.com       ryan.howard       Ryan      5c75cba411d46a1ba0ece7d1
andrew.bernard@sajumpcloud.com    andrew.bernard    Andrew    5c75cba6ee8df27f82a800b1
robert.california@sajumpcloud.com robert.california Robert    5c75cba9024ebc546faff260
jan.levinson@sajumpcloud.com      jan.levinson      Jan       5c75cbab56b1317250b60a25
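The CSV export mentioned above can be sketched as follows. To keep the example runnable without a JumpCloud connection, the $users array is a simulated stand-in for the objects returned by Get-JCUser -returnProperties email, username, firstname, and the temp-file path is an arbitrary choice.

```powershell
# Simulated stand-in for: Get-JCUser -returnProperties email, username, firstname
$users = @(
    [pscustomobject]@{ email = 'andrew.smith@sajumpcloud.com'; username = 'andrew.smith'; firstname = 'Andrew'; _id = '5c7585ca2dff6d18cff186e5' }
    [pscustomobject]@{ email = 'jack.smith@sajumpcloud.com';   username = 'jsmith';       firstname = 'Jack';   _id = '5c7585cbbfe8c0429a81555d' }
)

# Write only the selected properties to a CSV file in the temp directory
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'jc-users.csv'
$users | Export-Csv -Path $csvPath -NoTypeInformation

# Show the resulting file; the first line is the column header
Get-Content -Path $csvPath
```

The -NoTypeInformation switch keeps Windows PowerShell 5.1 from writing a #TYPE line at the top of the file. In newer PowerShell versions that is already the default.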
When updating objects using the API, a "PUT" request is sent to the JumpCloud id of the target object.
By default the command Set-JCUser uses the Username parameter set.
This allows admins to interact with users via the API without having to know the JumpCloud id value of the user.
This parameter set converts a JumpCloud username to a JumpCloud id value, which requires an additional API call.
See the example below.
Set-JCUser -Username clark.kent -middlename "super" -Verbose
VERBOSE: POST https://console.jumpcloud.com/api/search/systemusers with 59-byte payload
VERBOSE: PUT https://console.jumpcloud.com/api/Systemusers/5c7d92fb92040061adb77951 with 22-byte payload
A call is made to the https://console.jumpcloud.com/api/search/systemusers endpoint to gather the id value for the user with username clark.kent.
This id value is then used in the PUT API request to update the middlename of clark.kent to super.
This command takes 0.7 seconds to complete.
Measure-Command -expression {Set-JCUser -Username clark.kent -middlename "super"}
TotalSeconds : 0.7209635
Users can also be updated by specifying a user's id value.
When a user is modified using the id value, a single API call is run.
Measure-Command -expression {Set-JCUser -id "5c7d92fb92040061adb77951" -middlename "super"}
TotalSeconds : 0.3216719
This command takes 0.3 seconds to complete.
Updating users using the id value is the fastest way to update users.
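One way to take advantage of this when only usernames are known is to build a username-to-id lookup once and reuse it. The sketch below is an assumption-laden illustration: $users simulates the output of a single Get-JCUser -returnProperties username call, the second entry is made-up sample data, and the Set-JCUser line is left as a comment because it needs a connected module.

```powershell
# Simulated stand-in for one call to: Get-JCUser -returnProperties username
$users = @(
    [pscustomobject]@{ username = 'clark.kent'; _id = '5c7d92fb92040061adb77951' }
    [pscustomobject]@{ username = 'lois.lane';  _id = '000000000000000000000000' }  # made-up sample row
)

# Build the lookup table once: username -> id
$idByUsername = @{}
foreach ($user in $users) {
    $idByUsername[$user.username] = $user._id
}

# Resolve a username locally instead of with a search call per update
$targetId = $idByUsername['clark.kent']
$targetId

# With the module connected, the update is then a single PUT request:
# Set-JCUser -id $targetId -middlename 'super'
```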
Users can be updated in bulk efficiently by piping Get-JCUser into Set-JCUser with the -ByID switch parameter.
The -ByID switch is designed for piping information from Get-JCUser into Set-JCUser. It ensures that the ByID parameter set is used, which reduces the number of API calls made by the Set-JCUser command.
Example:
Get-JCUser -department macDev | measure
Count : 10
Ten users have a department value set to macDev in this organization.
Measure-Command -expression {Get-JCUser -department "macDev" | Set-JCUser -employeeType "Developer"}
TotalSeconds : 8.6117943
Using the pipeline to update 10 users without specifying the -ByID parameter takes 8.6 seconds.
Measure-Command -expression {Get-JCUser -department "macDev" | Set-JCUser -employeeType "Developer" -byID}
TotalSeconds : 4.4400569
Using the pipeline to update 10 users while specifying the -ByID parameter takes 4.4 seconds.
The -ByID parameter of Set-JCUser can also be used when updating user information from a CSV file.
The CSV file must have a column named either id or _id that holds the id value of each JumpCloud user you wish to update.
See the example below for how to implement this.
Example:
Get-JCUser -department "macDev" -returnProperties username, costCenter, employeeType | Export-CSV "macDevUserUpdate.csv"
For this use case we will be updating the "costCenter" and "employeeType" for all users with a current department value of "macDev". After running this command, a CSV file named "macDevUserUpdate.csv" is created.
CSV file pre updates:
"username","costCenter","employeeType","_id"
"dancer.reindeer","Corp","temp","595a8e1f89a46dfd172a191a"
"prancer.reindeer","Corp","temp","595a8e617e900dbc1d745d27"
"dasher.reindeer","Corp","temp","596cd4d4a12fd32f6f3f54d8"
"vixen.reindeer","Corp","temp","596cd588edc7920a53ddbf8a"
CSV file post updates:
"username","costCenter","employeeType","_id"
"dancer.reindeer","Denver","FT","595a8e1f89a46dfd172a191a"
"prancer.reindeer","Boulder","FT","595a8e617e900dbc1d745d27"
"dasher.reindeer","Longmont","PT","596cd4d4a12fd32f6f3f54d8"
"vixen.reindeer","Frisco","PT","596cd588edc7920a53ddbf8a"
Note that each user in the CSV file has been updated.
After the information in the CSV file has been updated, Import-CSV is used with Set-JCUser.
Import-Csv ./macDevUserUpdate.csv | Set-JCUser -ByID
The CSV has columns for both the username field and the _id field. By specifying -ByID, the _id is used to update users.
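Before piping a CSV into Set-JCUser -ByID, it can help to confirm that the file actually has an id or _id column. The check below is a hedged sketch: it parses an inline copy of two rows from the example CSV rather than reading macDevUserUpdate.csv, and the variable names are illustrative.

```powershell
# Inline copy of part of the example CSV (stands in for Import-Csv ./macDevUserUpdate.csv)
$csvText = @'
"username","costCenter","employeeType","_id"
"dancer.reindeer","Denver","FT","595a8e1f89a46dfd172a191a"
"prancer.reindeer","Boulder","FT","595a8e617e900dbc1d745d27"
'@

$rows = @($csvText | ConvertFrom-Csv)

# Column names are the property names of the parsed rows
$columns = @($rows[0].PSObject.Properties.Name)

# -ByID needs a column named id or _id
$hasIdColumn = ($columns -contains 'id') -or ($columns -contains '_id')
if (-not $hasIdColumn) {
    throw "The CSV needs an 'id' or '_id' column to be used with Set-JCUser -ByID."
}

"{0} rows ready for Set-JCUser -ByID" -f $rows.Count
```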