High cpu or memory usage issues - chef/chef-server GitHub Wiki

Logs investigation

  1. Identify which service is using the most CPU or memory. Go through the nginx access logs to identify the requests that take the longest. A sample entry with its timing fields:
    status=200; req_time=346924; rdbms_time=267; rdbms_count=6; authz_time=91; authz_count=3; depsolver_time=146; depsolver_count=1
  2. Count the responses by status code in the access logs using the command below.
    awk '{print $9}' /var/log/opscode/nginx/access.log | sort | uniq -c | sort -rn
    Sample output:
     424886 200
     106221 404
          2 499
    
  3. Count the requests for each second over the life of the log:
    awk '{print $4}' access.log | uniq -c
    Sample output:
    280 rps
    
  4. Find the originating IP addresses in the access logs to identify which client runs generate the load.
  5. As an example, assume the depsolver accounts for most of the response time.
  6. To find which function takes the most time, profile it with fprof from the erchef console. First capture a real call with redbug:
    redbug:start("chef_wm_depsolver:make_json_list", [{print_file, "/tmp/redbug.out"}, {file_size, 150}, {msgs,1}]).
  7. This captures one execution of make_json_list and prints the function arguments to a file. Next, edit the file (redbug.out) to make it a valid Erlang term: remove the function call and leave only the argument of interest, the single long list of cookbook versions. After that:
    {ok, [Content|_]} = file:consult("/tmp/redbug.out").
    % Run fprof to profile the function in question. We'll use the argument data we just captured in redbug as the input.
    fprof:apply(chef_wm_depsolver, make_json_list, [Content, "https://[2600:1f1c:f24:ad01:b300:cfe6:5f15:b905]", 1], 
     [{file, "/tmp/fprof.trace"}]).
    
  8. This handy little escript converts the trace to callgrind format: https://github.com/isacssouza/erlgrind
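
The log checks in steps 1 and 4 can be approximated with two small shell helpers. This is only a sketch and not part of the original procedure: `slowest_requests` and `top_clients` are hypothetical names, the first assumes each log line carries a `req_time=<number>` field as in the sample entry above, and the second assumes the client address is the first whitespace-separated field of the nginx access log.

```shell
#!/bin/sh
# Hypothetical helpers; both read log lines on stdin.

# Print the N slowest requests (default 5), slowest first.
# Each output line is "<req_time><TAB><original log line>".
slowest_requests() {
    n="${1:-5}"
    awk '
        # Only lines that contain a req_time field are considered.
        match($0, /req_time=[0-9]+/) {
            # "req_time=" is 9 characters long, so skip past it to reach the number.
            print substr($0, RSTART + 9, RLENGTH - 9) "\t" $0
        }
    ' | sort -t "$(printf '\t')" -k1,1nr | head -n "$n"
}

# Print the N busiest client addresses (default 10) with their request counts.
# Assumes the client address is field 1 of each log line.
top_clients() {
    n="${1:-10}"
    awk '{ print $1 }' | sort | uniq -c | sort -rn | head -n "$n"
}
```

For example, `slowest_requests 10 < /var/log/opscode/nginx/access.log` would list the ten entries with the largest `req_time`, and `top_clients 5 < /var/log/opscode/nginx/access.log` the five addresses that sent the most requests.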

Load testing setup

Chef Server

  1. Set up the Chef Server and 4 load servers in the AWS console using the AMIs below.
    Chef Server
    chef-server-load-test-03122021 (ami-0e2ad9ec5256c7b4c)
    Load generator backup
    load-gen-backup-03122021 (ami-0e2ad9ec5256c7b4c)
    
  2. Upgrade the Chef Server to the specific version under test by following https://docs.chef.io/server/upgrades/
  3. Create the users and the organization on the Chef Server using the commands below.
    chef-server-ctl org-create test1 test1 > test1_validator.pem
    chef-server-ctl user-create testuser1 test test [email protected] password > /home/ubuntu/testuser1.pem
    chef-server-ctl org-user-add -a test1 testuser1
  4. Copy the beam files to the Chef Server, into /opt/opscode/embedded/service/opscode-erchef/lib/patches
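
If a test needs more than one user, the user-creation commands from step 3 can be generated in a loop. The sketch below only prints the commands so they can be reviewed before running them. `print_user_setup` is a hypothetical helper, the `testuserN` naming simply extends the `testuser1` example, and the `example.com` e-mail address is a placeholder rather than a value from this page.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the chef-server-ctl commands that
# create COUNT users and add each one to ORG as an admin.
# Usage: print_user_setup ORG COUNT
print_user_setup() {
    org="$1"
    count="$2"
    i=1
    while [ "$i" -le "$count" ]; do
        user="testuser$i"
        # The e-mail address is a placeholder; replace it with a real one.
        printf 'chef-server-ctl user-create %s test test %s@example.com password > /home/ubuntu/%s.pem\n' \
            "$user" "$user" "$user"
        printf 'chef-server-ctl org-user-add -a %s %s\n' "$org" "$user"
        i=$((i + 1))
    done
}
```

Running `print_user_setup test1 4` prints eight commands, two per user. The organization itself still has to be created first with `chef-server-ctl org-create`.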

Chef Load

  1. Use the mp/working branch of chef-load (https://github.com/chef/chef-load/tree/mp/working).
  2. Copy all the user/client keys from the Chef Server to the chef-load servers so they can generate load. Copy the .pem files to your local machine first, then to each load server:
    scp -i ~/.ssh/aws-shared-chef-infra-server.pem [email protected]:/home/ubuntu/*.pem .

    scp -i ~/.ssh/aws-shared-chef-infra-server.pem *.pem [email protected]:/home/ubuntu/
    scp -i ~/.ssh/aws-shared-chef-infra-server.pem *.pem [email protected]:/home/ubuntu/
    scp -i ~/.ssh/aws-shared-chef-infra-server.pem *.pem [email protected]:/home/ubuntu/
    scp -i ~/.ssh/aws-shared-chef-infra-server.pem *.pem [email protected]:/home/ubuntu/
  3. Update chef-load.toml with the Chef Server URL and the other details.
    log_file = "chef-load.log"
    chef_server_url = "https://[2600:1f1c:f24:ad01:b300:cfe6:5f15:b905]/organizations/test1/"
    client_key = "./testuser1.pem"
    client_name = "testuser1"
    ohai_json_file = "node.json"
    chef_environment = "_default"

    num_nodes = 1750
    interval = 15
    num_actions = 0 # For data collector, which is disabled.
    node_name_prefix = "load4"
    node_replacement_rate = 0
    run_lists = []

    download_cookbooks = "always"
    download_cookbooks_scale_factor = 0.01
    sleep_duration = 0
    node_save_frequency = 0.8

    api_get_requests = [ ]
    chef_version = "13.2.20"
    chef_server_creates_client_key = false
    enable_reporting = false
    random_data = true
    liveness_agent = false

  4. Start the load using the command below. For more information, read the chef-load README (https://github.com/chef/chef-load#readme).
    Example: ./chef-load -c chef-load.toml -i 1 -a 0 -n 10 -p load1a -R .01 start
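
To get a rough feel for how much load a configuration produces, the node count and interval can be turned into an expected number of simulated client runs per minute. The sketch below assumes that `interval` is expressed in minutes and that chef-load spreads the runs evenly across it, which is how the chef-load README describes these settings; it also assumes all four load servers use the same values. `runs_per_minute` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: estimate simulated client runs per minute.
# Usage: runs_per_minute NUM_NODES INTERVAL_MINUTES [GENERATORS]
# Formula: NUM_NODES * GENERATORS / INTERVAL_MINUTES
runs_per_minute() {
    nodes="$1"
    interval="$2"
    generators="${3:-1}"   # number of load servers, defaults to 1
    awk -v n="$nodes" -v i="$interval" -v g="$generators" \
        'BEGIN { printf "%.1f\n", (n * g) / i }'
}

# Values from the chef-load.toml above, across 4 load servers.
runs_per_minute 1750 15 4
```

With `num_nodes = 1750` and `interval = 15`, one load server works out to roughly 117 runs per minute, and four such servers to roughly 467.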