GHAS CodeQL for SAST - EL-Kae/SAST_DAST_Tools GitHub Wiki

What are CodeQL, SAST and GitHub Actions?

Our main SAST workflow is located here

CodeQL is a security tool that scans source code for anti-patterns that can lead to security vulnerabilities. It is provided through our GitHub Advanced Security (GHAS) license. Here is GitHub's official documentation on code scanning with CodeQL.

Static application security testing, or SAST, is the process of scanning source code for security vulnerabilities. CodeQL is a SAST tool. In this form of testing, the application is not running in a live environment.

This is implemented as a GitHub Actions workflow. GitHub Actions is GitHub's CI (continuous integration) tool, and each workflow is defined in a .yml file stored in the .github/workflows directory of the project's repository. Once there, the workflow by default only has access to, and only works with, that particular repository's source code. This is why the .yml file needs to be placed in every repo that needs SAST coverage.

A .yml file defines a workflow. A workflow has jobs, and a job has steps. By default, the jobs in a workflow run concurrently without knowledge of each other. Each job runs in its own fresh environment created by GitHub Actions (a virtual machine on GitHub-hosted runners, referred to as a container on this page): 1 job means 1 container. This behavior can be changed, for example by making one job wait for another with the needs key. A step either calls a reusable action, such as one from the GitHub Marketplace, with the uses key, or runs shell commands with the run key; then the step finishes. Once all the steps of a job finish, the container of that job shuts down.
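
As a rough illustration of the workflow, job and step hierarchy described above, here is a minimal sketch. The file name, job ids and shell commands are hypothetical and are not taken from sast.yml; only actions/checkout@v3 also appears in our workflow.

```yaml
# Hypothetical file: .github/workflows/example.yml (illustration only)
name: Example workflow

on: workflow_dispatch          # run manually from the Actions tab

jobs:
  first-job:                   # job 1: gets its own fresh environment
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3    # a step that calls a reusable action
      - name: List files
        run: ls -la                  # a step that runs a shell command

  second-job:                  # job 2: runs concurrently with first-job by default
    runs-on: ubuntu-latest
    steps:
      - name: Say hello
        run: echo "This job cannot see files created in first-job"
```

Because second-job has no needs key, it does not wait for first-job, and nothing written to disk in one job is visible in the other.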

Our implementation for SAST

Below is a breakdown of our GitHub Actions workflow for CodeQL and the additional tools that complement it. The workflow lives in a file named sast.yml. We use CodeQL and these other tools to run SAST on all our source code. The other tools are needed for programming languages that our CodeQL jobs do not cover, such as PHP or Swift.

Every GitHub Actions workflow starts with the on block, which this page calls the scheduler: the block of code that determines when the workflow runs. Then come the actual jobs of the workflow. Linguist is configured to be the first job to run, and all other jobs are set to wait for the Linguist job to finish. This job detects which programming languages are present in the codebase. The rest are conditional jobs that run depending on which languages are detected, and they use different SAST tools: CodeQL for C/C++, C#, Go, Java, JavaScript/TypeScript, Python and Ruby; Semgrep for PHP and Bash; MobSF for mobile platforms (Java, Kotlin, Objective-C, Swift); and KICS for Ansible, Docker and Terraform.

Scheduler

As mentioned above, every workflow starts with this scheduler block. The code below outlines the 4 conditions under which the workflow runs. In more recent repos main has become the default branch name; master is included so this workflow also works on repos still using the legacy name. The development branches develop and dev are included as well. In addition to these triggers, the workflow is set to run every 4 months, because it is best security practice to continuously scan source code for vulnerabilities.

  1. This workflow will run on every git push to main, master, develop or dev branch.
  2. Or on pull requests to main, master, develop or dev branch.
  3. workflow_dispatch allows us to run this action from the Actions tab in the Github UI.
  4. schedule allows us to run this action as a cron job. It's set to run every 4 months or 3 times a year.
on:
  push:
    branches: [ main, master, develop, dev ]
  pull_request:
    branches: [ main, master, develop, dev ]
  
  workflow_dispatch:
  
  schedule:
    - cron: '0 0 1 */4 *'
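
The five cron fields are minute, hour, day of month, month and day of week, and GitHub evaluates them in UTC. The sketch below annotates the schedule used above and shows what should be an equivalent explicit form, on the assumption that */4 in the month field steps through months 1, 5 and 9 as in standard POSIX-style cron.

```yaml
on:
  schedule:
    # Fields: minute hour day-of-month month day-of-week (evaluated in UTC)
    # '0 0 1 */4 *' = 00:00 on the 1st of every 4th month, counting from January
    - cron: '0 0 1 */4 *'
    # Assumed-equivalent explicit form (January, May, September):
    # - cron: '0 0 1 1,5,9 *'
```

Keep in mind that scheduled runs use the workflow file on the repository's default branch, so the schedule only takes effect once sast.yml has been merged there.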

Linguist Job

This job uses GitHub's Linguist tool from the command line to detect the programming languages used in the repo. Linguist outputs the detected languages as a JSON string like this: {"HTML":"45.50%","JavaScript":"44.50%","Dockerfile":"10.00%"}. In this example, HTML, JavaScript and a Dockerfile are present in the source code. This string is passed on to the subsequent jobs. Each of those jobs checks the string for its own programming language and runs only if that language is present.

  1. The name key defines the display name of the job. This key is present in all jobs.
  2. runs-on indicates the operating system of the container. As mentioned, GitHub Actions creates a separate container for each job, so the OS needs to be defined here. This key also appears in all the following jobs.
  3. steps indicates the beginning of the list of steps for the job.
  4. The first step, Checkout repository, checks out the source code into the container, making it accessible to the tool. This step is in the following jobs as well.
  5. The second step, Run linguist, installs and runs Linguist in the container.
  6. ::set-output name=languages::$(github-linguist --json) runs Linguist and stores the result in a step output called languages. Note that GitHub has deprecated the set-output command in favor of the GITHUB_OUTPUT environment file.
  7. The id key identifies this step, so that the languages output can be accessed via steps.linguist.outputs.languages.
  8. The last step prints steps.linguist.outputs.languages for troubleshooting purposes.
  9. steps.linguist.outputs.languages is a step-level output and therefore can't be accessed by other jobs. Remember, each job runs in a separate container. The outputs key, placed after steps, republishes the value as a job-level output that other jobs can read via needs.linguist-job.outputs.languages.
  linguist-job:
    name: Linguist
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      
      - name: Run linguist
        id: linguist
        run: | 
          sudo gem install github-linguist
          echo "::set-output name=languages::$(github-linguist --json)"
      
      - name: Print linguist result
        run: echo "${{ steps.linguist.outputs.languages }}"
    
    outputs:
      languages: ${{ steps.linguist.outputs.languages }} 
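
GitHub has deprecated the ::set-output workflow command in favor of appending name=value lines to the file named by the GITHUB_OUTPUT environment variable. Below is a minimal sketch of the Run linguist step rewritten that way. It assumes that github-linguist --json prints its result on a single line, which the simple name=value form requires.

```yaml
      - name: Run linguist
        id: linguist
        run: |
          sudo gem install github-linguist
          # Append a name=value line to this step's output file
          echo "languages=$(github-linguist --json)" >> "$GITHUB_OUTPUT"
```

The value is still read as steps.linguist.outputs.languages, so the print step and the job-level outputs block would not need to change.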

CodeQL Job for C/C++, C#, Go, Java, Javascript/Typescript, Python and Ruby

CodeQL is used to analyze C/C++, C#, Go, Java, JavaScript, Python and Ruby code. CodeQL for JavaScript can also analyze TypeScript, HTML, XML and more. Here is more information on CodeQL supported languages and frameworks. The job below appears more than once in sast.yml: there is one copy for each programming language supported by CodeQL, for example one job just for Python, one for JavaScript, and so on. Below we look only at codeql-java-job, as all the CodeQL jobs are similar to one another.

Each CodeQL job waits for linguist-job to finish and reads its output through needs.linguist-job.outputs.languages. The job below checks for Java in that string; if it is not found, the job is skipped. In our implementation CodeQL also scans for code quality issues by using the security-and-quality query suite. Here is more information on that.

  1. The needs key indicates which other job must finish before this job can run, in this case linguist-job.
  2. if: contains(needs.linguist-job.outputs.languages, '"Java"') runs this job only when "Java" appears in the string. The double quotes inside the search term matter: contains is a substring match, so a bare Java would also match "JavaScript".
  3. Just like in the Linguist job, the OS of the container for this job is defined by runs-on.
  4. permissions grants this job the permissions it needs on the GitHub repo, including security-events: write for uploading findings.
  5. The first step, Checkout repository, makes the source code accessible to the job's container, just like in the Linguist job.
  6. The Initialize CodeQL step starts up the CodeQL tool.
  7. Using with, java is selected under the languages key, and the security-and-quality query suite is selected under the queries key.
  8. If we need to exclude a certain file or directory from static analysis, we can define that in an external config file. The config-file key (commented out below) points to that file.
  9. The Autobuild step only exists in the CodeQL jobs for C/C++, Java and C#. The commented-out run block after it is a placeholder for manual build commands.
  10. The Perform CodeQL Analysis step runs the tool and uploads the findings to the repo's Security tab.
  codeql-java-job:
    name: CodeQL (Java)
    needs: linguist-job
    if: contains(needs.linguist-job.outputs.languages, '"Java"')
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write

    steps: 
    - name: Checkout repository
      uses: actions/checkout@v3

    - name: Initialize CodeQL
      uses: github/codeql-action/init@v2
      with:
        languages: 'java'
        queries: +security-and-quality
        # config-file: ./.github/codeql/codeql-config.yml

    - name: Autobuild
      uses: github/codeql-action/autobuild@v2

    #- run: |
    #   make bootstrap
    #   make release
    
    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v2
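
To make the config-file option more concrete, here is a minimal sketch of what the external file referenced by the commented-out config-file line might contain. The name and the ignored paths are hypothetical example values, not taken from our repos.

```yaml
# Hypothetical contents of .github/codeql/codeql-config.yml
name: "SAST CodeQL config"

# Paths left out of the analysis (example values, adjust per repo)
paths-ignore:
  - "**/test/**"
  - "vendor"
```

GitHub's documentation notes that these path filters mainly apply to interpreted languages. For compiled languages such as Java, what gets analyzed depends largely on what the build step compiles, so the filter may have limited effect there.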

Semgrep Job for PHP and Bash

The static analysis tool Semgrep is used to scan PHP and Bash code, since CodeQL doesn't support these languages. This job follows a similar process to the CodeQL jobs: it waits for Linguist, reads the output, checks out the source code, runs the scan and uploads the results to the GitHub repo's Security tab. A marketplace action is available, but its documentation recommends using the Semgrep container image instead.

Like the CodeQL job, the job below appears more than once in sast.yml, in this case twice: once for PHP and once for Bash. Below we look only at semgrep-php-job, as the two implementations are otherwise identical. Semgrep checks code against sets of rules, two of which are r/php and r/bash. r/php selects all rules pertaining to PHP, and r/bash does the same for Bash. These are the rule sets used in this script.

  1. Like the CodeQL jobs, needs is used to wait for Linguist.
  2. If PHP (or Bash, for the Bash job) is in the needs.linguist-job.outputs.languages string, the job runs.
  3. container runs the job's steps inside a Docker image; here we are using Semgrep's returntocorp/semgrep image.
  4. The code is checked out by the Checkout code step.
  5. The next step, Run Semgrep, runs the scan and writes the results to semgrep.sarif in SARIF format.
  6. The set of rules is specified via the environment variable SEMGREP_RULES, which is passed to the tool.
  7. The semgrep.sarif file is uploaded to the repo's Security tab in the Upload sarif report step. Please note that files need to be in SARIF format to be uploaded to the Security tab.
  semgrep-php-job:
    name: Semgrep (PHP)
    needs: linguist-job
    if: contains(needs.linguist-job.outputs.languages, '"PHP"')
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      
      - name: Run Semgrep
        run: semgrep scan --sarif --output=semgrep.sarif
        env:
          SEMGREP_RULES: r/php

      - name: Upload sarif report
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: semgrep.sarif
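
For comparison, the Bash counterpart could look roughly like the sketch below. The job id and display name are hypothetical, and the '"Shell"' check is an assumption based on Linguist normally reporting Bash scripts under the language name Shell; the condition actually used in sast.yml may differ.

```yaml
  # Sketch of the Bash variant; job id and the '"Shell"' check are assumptions
  semgrep-bash-job:
    name: Semgrep (Bash)
    needs: linguist-job
    # Linguist normally lists Bash scripts under the language name "Shell"
    if: contains(needs.linguist-job.outputs.languages, '"Shell"')
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Run Semgrep
        run: semgrep scan --sarif --output=semgrep.sarif
        env:
          SEMGREP_RULES: r/bash    # Bash rules instead of r/php

      - name: Upload sarif report
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: semgrep.sarif
```

Only the language check and the SEMGREP_RULES value differ from the PHP job.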

MobSF Job for Mobile Platforms (Java, Kotlin, Objective-C, Swift)

The static analysis tool MobSF (mobsfscan), available from the marketplace, is used to analyze source code relating to mobile platforms. This includes Java, Kotlin, Objective-C and Swift. Java is also covered by CodeQL, but the other three languages are not covered by our CodeQL jobs. This job follows the same steps as both tools above; below is what's different.

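  1. The mobsfscan step runs the tool with args that write the results to the SARIF file results.sarif. The trailing || true is intended to keep the step from failing when findings are reported.
  2. Upload mobsfscan report uploads results.sarif to the Security tab of the GitHub repo.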
  mobsf-job:
    name: Mobsf Scan (Android/iOS)
    needs: linguist-job
    if: |
      contains(needs.linguist-job.outputs.languages, '"Java"') ||
      contains(needs.linguist-job.outputs.languages, '"Kotlin"') ||
      contains(needs.linguist-job.outputs.languages, '"Objective-C"') ||
      contains(needs.linguist-job.outputs.languages, '"Swift"')
    runs-on: ubuntu-latest
    
    steps:
    - name: Checkout the code
      uses: actions/checkout@v3
      
    - name: mobsfscan
      uses: MobSF/mobsfscan@main
      with:
        args: '. --sarif --output results.sarif || true'
    
    - name: Upload mobsfscan report
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: results.sarif
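
The upload-sarif action needs write access to security events. The CodeQL job grants this explicitly, while the jobs that only upload a SARIF file rely on the repository's default token permissions. If those defaults are read-only, a permissions block like the one below could be added to each uploading job. This is a sketch mirroring the CodeQL job, not something present in sast.yml.

```yaml
  mobsf-job:
    name: Mobsf Scan (Android/iOS)
    needs: linguist-job
    runs-on: ubuntu-latest
    permissions:
      contents: read           # lets actions/checkout read the repo
      security-events: write   # lets upload-sarif publish to the Security tab
    # The if condition and the steps stay the same as in the job above
```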

KICS Job for Ansible, Docker, and Terraform

Just like any programming language, IaC languages can have vulnerabilities and can be scanned. The tool KICS by Checkmarx is used to analyze our infrastructure as code (IaC), via its GitHub marketplace action. KICS covers Ansible, Docker and Terraform. The job checks the Linguist output for Dockerfile, HCL and VCL. Dockerfile is the language (according to Linguist) used for Docker. The HashiCorp Configuration Language, or HCL, is what's used for Terraform. VCL is the Varnish Configuration Language, which belongs to the Varnish HTTP cache rather than to Ansible; Ansible playbooks are written in YAML, so the VCL check as written does not detect Ansible content and may need revisiting.

  1. The Run KICS step runs KICS with arguments.
  2. path tells KICS where to start scanning. The tool scans files recursively from there.
  3. ignore_on_exit controls KICS's exit status. By default, KICS exits with a non-zero code when it finds a vulnerability, which fails the step; all tells it to ignore both findings and errors so the workflow continues to the upload step.
  4. output_formats tells KICS to output the results in SARIF, so they can be uploaded.
  5. Upload sarif report uploads results.sarif to the Security tab of the GitHub repo.
  kics-job:
    name: KICS (Ansible/Docker/Terraform)
    needs: linguist-job
    if: |
      contains(needs.linguist-job.outputs.languages, '"Dockerfile"') ||
      contains(needs.linguist-job.outputs.languages, '"HCL"') ||
      contains(needs.linguist-job.outputs.languages, '"VCL"')
    runs-on: ubuntu-latest
    
    steps:
    - name: Checkout the code
      uses: actions/checkout@v3
    
    - name: Run KICS
      uses: checkmarx/[email protected]
      with:
        path: './'
        ignore_on_exit: all
        output_formats: 'sarif'
    
    - name: Upload sarif report
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: results.sarif
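
If the upload step cannot find results.sarif, one option is to set the report location explicitly with the action's output_path input and point the upload at it. The following is a sketch only: the directory name kics-results is hypothetical, and the action reference on the uses line is an assumption, so keep whatever name and version sast.yml actually pins.

```yaml
    # Assumed action reference; use the version pinned in sast.yml
    - name: Run KICS
      uses: checkmarx/kics-github-action@v1.7.0
      with:
        path: './'
        ignore_on_exit: all
        output_formats: 'sarif'
        output_path: kics-results        # hypothetical directory for the reports

    - name: Upload sarif report
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: kics-results/results.sarif
```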