GHAS CodeQL for SAST - EL-Kae/SAST_DAST_Tools GitHub Wiki
What are CodeQL, SAST and GitHub Actions?
Our main SAST workflow is located here
CodeQL is a security tool that scans source code for anti-patterns that can lead to security vulnerabilities. This tool is provided by our GitHub Advanced Security (GHAS) license. Here is GitHub's official documentation on code scanning with CodeQL.
Static application security testing, or SAST, is the process of scanning source code for security vulnerabilities. CodeQL is a SAST tool. In this form of testing, the application isn't running in a live environment.
This is implemented as a GitHub Actions workflow (GitHub Actions is GitHub's continuous integration, or CI, tool) in a `.yml` file. These `.yml` files are stored in the `.github/workflows` directory of the project's repository. Once there, the workflow only has access to, and only works with, that repository's source code. This is why the `.yml` file needs to be placed in every repo that requires SAST coverage.
A `.yml` file defines a workflow. A workflow has jobs, and a job has steps. By default, two jobs in a workflow run concurrently without knowledge of each other. Each job runs in its own isolated environment created by GitHub Actions, referred to on this page as a container: 1 job means 1 container. This behavior can be configured. A step either calls an action (for example, one from the GitHub Marketplace) using the `uses` key or runs shell commands using the `run` key; then the step finishes. Once all the steps of a job finish, that job's container shuts down.
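The workflow, job and step structure described above can be sketched as a minimal workflow file. This is an illustration only: the workflow name, job ids and `echo` commands are hypothetical and are not taken from our `sast.yml`.

```yaml
# Hypothetical example workflow; names and commands are illustrative only.
name: Example
on: push
jobs:
  first-job:
    runs-on: ubuntu-latest
    steps:
      # A step that calls an existing action with `uses`.
      - name: Checkout repository
        uses: actions/checkout@v3
      # A step that runs a shell command with `run`.
      - name: Say hello
        run: echo "hello from the first job"
  second-job:
    # No `needs` key here, so this job runs in parallel with first-job.
    runs-on: ubuntu-latest
    steps:
      - name: Say hello
        run: echo "hello from the second job"
```

Adding `needs: first-job` to `second-job` would make it wait for the first job instead of running alongside it.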
Our implementation for SAST
Below is a breakdown of our GitHub Actions workflow for CodeQL and its plugins. It lives in a file named `sast.yml`. We use CodeQL and other plugins to run SAST on all our source code. The plugins are needed for programming languages not supported by CodeQL, such as PHP or Swift.
Every GitHub Actions workflow starts with the scheduler, the block that determines when the workflow runs. Then come the actual jobs of the workflow. Linguist is configured to be the first job to run, and all other jobs wait for it to finish. This job detects which programming languages are present in the codebase. The rest are conditional jobs that run based on which languages are detected, and they use different SAST tools: CodeQL for C/C++, C#, Go, Java, JavaScript/TypeScript, Python and Ruby; Semgrep for PHP and Bash; MobSF for mobile platforms (Java, Kotlin, Objective-C, Swift); and KICS for Ansible, Docker, and Terraform.
Scheduler
As mentioned above, every workflow starts with this scheduler block. The code below outlines the 4 conditions for when the workflow should run. In more recent repos `main` has become the default branch name. However, `master` is included so this workflow also works on repos still using the legacy name. The development branches `develop` and `dev` are also included. In addition to these triggers, the workflow is set to run every 4 months. It is best security practice to continuously scan source code for vulnerabilities.
- This workflow will run `on` every git `push` to the `main`, `master`, `develop` or `dev` branch.
- Or `on` pull requests to the `main`, `master`, `develop` or `dev` branch.
- `workflow_dispatch` allows us to run this workflow manually from the Actions tab in the GitHub UI.
- `schedule` allows us to run this workflow as a cron job. It's set to run every 4 months, or 3 times a year: the expression `0 0 1 */4 *` fires at 00:00 UTC on the 1st of January, May and September.
on:
  push:
    branches: [ main, master, develop, dev ]
  pull_request:
    branches: [ main, master, develop, dev ]
  workflow_dispatch:
  schedule:
    - cron: '0 0 1 */4 *'
Linguist Job
This job uses GitHub's Linguist tool from the command line to detect the programming languages used in the repo. Linguist outputs the detected languages as a JSON string like this: `{"HTML":"45.50%","JavaScript":"44.50%","Dockerfile":"10.00%"}`. In this example, HTML, JavaScript and a Dockerfile are present in the source code. This string is passed on to the subsequent jobs. Each of those jobs checks the string to see whether its programming language is present; if so, the job runs.
- The `name` key defines the name of the job. This key will be present in all jobs.
- `runs-on` indicates the operating system of the container. As mentioned, GitHub Actions creates a separate container for each job, so the OS needs to be defined here. This key will also appear in all the following jobs.
- `steps` indicates the beginning of the list of steps for the job.
  - The first step, `Checkout repository`, checks out the source code into the container, making it accessible to the tool. This step will be in the following jobs as well.
  - The second step installs and runs Linguist in the container. `::set-output name=languages::$(github-linguist --json)` runs Linguist and puts the result in a variable called `languages`.
  - The `id` key is needed to identify this step so the `languages` variable can be accessed via `steps.linguist.outputs.languages`.
  - The last step prints `steps.linguist.outputs.languages` for troubleshooting purposes.
- `steps.linguist.outputs.languages` is a step-level variable and therefore can't be accessed by other jobs. Remember, each job runs in a separate container. The `outputs` key, after `steps`, creates a new variable for other jobs to use. This `outputs` is a job-level output, whereas `steps.linguist.outputs.languages` is a step-level output.
linguist-job:
  name: Linguist
  runs-on: ubuntu-latest
  steps:
    - name: Checkout repository
      uses: actions/checkout@v3
    - name: Run linguist
      id: linguist
      run: |
        sudo gem install github-linguist
        echo "::set-output name=languages::$(github-linguist --json)"
    - name: Print linguist result
      run: echo "${{ steps.linguist.outputs.languages }}"
  outputs:
    languages: ${{ steps.linguist.outputs.languages }}
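GitHub has deprecated the `::set-output` workflow command in favor of appending `name=value` lines to the file named by the `GITHUB_OUTPUT` environment variable. Below is a sketch of how the `Run linguist` step might look with that mechanism. It assumes `github-linguist --json` prints its result on a single line, since multi-line values need a different delimiter syntax. The rest of the job, including the job-level `outputs` key, would stay the same.

```yaml
- name: Run linguist
  id: linguist
  run: |
    sudo gem install github-linguist
    # Append "languages=<json>" to the step's output file instead of
    # echoing the deprecated ::set-output command.
    # Assumption: the JSON produced by Linguist fits on one line.
    echo "languages=$(github-linguist --json)" >> "$GITHUB_OUTPUT"
```

The value is still read as `steps.linguist.outputs.languages`, so later steps and jobs that consume it should not need to change.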
CodeQL Job for C/C++, C#, Go, Java, Javascript/Typescript, Python and Ruby
CodeQL is used to analyze C/C++, C#, Go, Java, JavaScript, Python and Ruby code. CodeQL for JavaScript can analyze HTML, TypeScript, XML and more. Here is more information on CodeQL supported languages and frameworks. The job below appears more than once in `sast.yml`: there is one for each programming language supported by CodeQL, for example one job just for Python, one for JavaScript, etc. Below we will look at just the `codeql-java-job`, as all the CodeQL jobs are similar to one another.
The CodeQL jobs wait for the `linguist-job` to finish and take in its output variable, `needs.linguist-job.outputs.languages`. This job checks for `Java` in the string. If it is not found, the job is skipped. In our implementation, CodeQL also scans for code quality issues by using the `security-and-quality` query suite. Here is more information on that.
- The `needs` key indicates which other job needs to finish first for this job to run, in this case the `linguist-job`.
- If `contains(needs.linguist-job.outputs.languages, '"Java"')` finds `"Java"` in the string, this job runs. The double quotes are part of the search string, so `"JavaScript"` does not count as a match.
- Just like Linguist, the OS of the container for this job needs to be defined by `runs-on`.
- `permissions` gives this job the permissions it needs on the GitHub repo.
- The first step, `Checkout repository`, just like in Linguist, makes the source code accessible to the job's container.
- The `Initialize CodeQL` step starts up the CodeQL tool.
  - Using `with`, `java` is picked under the `languages` key, and the `security-and-quality` query suite is selected.
  - If we need to exclude a certain file or directory from static analysis, we can define that in an external config file. The `config-file` key is used to take in that external file.
- The `Autobuild` step only exists in the CodeQL jobs for C/C++, Java and C#.
- The `Perform CodeQL Analysis` step runs the tool and uploads the findings to the repo's Security tab.
codeql-java-job:
  name: CodeQL (Java)
  needs: linguist-job
  if: contains(needs.linguist-job.outputs.languages, '"Java"')
  runs-on: ubuntu-latest
  permissions:
    actions: read
    contents: read
    security-events: write
  steps:
    - name: Checkout repository
      uses: actions/checkout@v3
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v2
      with:
        languages: 'java'
        queries: +security-and-quality
        # config-file: ./.github/codeql/codeql-config.yml
    - name: Autobuild
      uses: github/codeql-action/autobuild@v2
    #- run: |
    #    make bootstrap
    #    make release
    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v2
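As an illustration of the external config file mentioned above, here is a minimal sketch of what `./.github/codeql/codeql-config.yml` (the path in the commented-out `config-file` line) could contain. The `name` value and the excluded paths are hypothetical placeholders; `paths-ignore` is the CodeQL configuration key for excluding files and directories.

```yaml
# Hypothetical contents of .github/codeql/codeql-config.yml
name: "SAST CodeQL config"   # illustrative name, not from our repo

# Files and directories to leave out of the analysis.
# These example paths are placeholders; replace them with real ones.
paths-ignore:
  - vendor
  - "**/test/**"
```

Uncommenting the `config-file` line in the `Initialize CodeQL` step would make CodeQL pick this file up. For compiled languages such as Java, what gets analyzed also depends on what the build step compiles, so path filters may have less effect there than for interpreted languages.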
Semgrep Job for PHP and Bash
The static analysis tool Semgrep is used to scan PHP and Bash code, since CodeQL doesn't support these languages. This job follows a similar process to the CodeQL jobs: it waits for Linguist, takes in the output variable, checks out the source code, scans it and uploads the results to the GitHub repo's Security tab. A Marketplace plugin is available, but its documentation recommends using the Semgrep container instead.
Like CodeQL, the job below appears more than once in `sast.yml`, in this case twice: once for PHP and once for Bash. Below we will look at just the `semgrep-php-job`, as both implementations are identical. Semgrep has sets of rules the tool checks for, two being `r/php` and `r/bash`. `r/php` selects all rules pertaining to PHP, and `r/bash` does the same for Bash. These are used in this workflow.
- Like the CodeQL jobs, `needs` is used to wait for Linguist.
- `if`: when `PHP` (or `Bash`, in the Bash job) is in the `needs.linguist-job.outputs.languages` string, the job runs.
- `container` runs an image in the job container; here we are using Semgrep's `returntocorp/semgrep` image.
- The code is checked out by the `Checkout code` step.
- In the next step, Semgrep runs and outputs the results in SARIF format.
  - The set of rules is specified via the environment variable `SEMGREP_RULES`, which is passed to the tool.
- The `semgrep.sarif` file is uploaded to the repo's Security tab in the `Upload sarif report` step. Please note that files need to be in SARIF format to be uploaded to the Security tab.
semgrep-php-job:
  name: Semgrep (PHP)
  needs: linguist-job
  if: contains(needs.linguist-job.outputs.languages, '"PHP"')
  runs-on: ubuntu-latest
  container:
    image: returntocorp/semgrep
  steps:
    - name: Checkout code
      uses: actions/checkout@v3
    - name: Run Semgrep
      run: semgrep scan --sarif --output=semgrep.sarif
      env:
        SEMGREP_RULES: r/php
    - name: Upload sarif report
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: semgrep.sarif
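For comparison, the Bash counterpart might look like the sketch below. It is not copied from `sast.yml`: the job id, display name and `if` condition are assumptions. In particular, Linguist normally reports shell scripts under the language name `Shell`, so the string to search for should be checked against the real Linguist output for the repo.

```yaml
# Sketch of a Bash variant of the Semgrep job; ids and names are assumed.
semgrep-bash-job:
  name: Semgrep (Bash)
  needs: linguist-job
  # Assumption: Linguist labels shell scripts "Shell". Adjust this to the
  # name that actually appears in the Linguist JSON for your repo.
  if: contains(needs.linguist-job.outputs.languages, '"Shell"')
  runs-on: ubuntu-latest
  container:
    image: returntocorp/semgrep
  steps:
    - name: Checkout code
      uses: actions/checkout@v3
    - name: Run Semgrep
      run: semgrep scan --sarif --output=semgrep.sarif
      env:
        # Same scan command as the PHP job, with the Bash rule set.
        SEMGREP_RULES: r/bash
    - name: Upload sarif report
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: semgrep.sarif
```

Only the job id, the display name, the language check and the `SEMGREP_RULES` value differ from the PHP job.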
MobSF Job for Mobile Platforms (Java, Kotlin, Objective-C, Swift)
The static analysis tool MobSF, from the Marketplace, is used to analyze all source code relating to mobile platforms. This includes Java, Kotlin, Objective-C, and Swift. Unlike Java, the other languages here are not covered by our CodeQL jobs. As you've already seen with the tools above, this job follows the same steps. Below is what's different.
- The `mobsfscan` step runs the tool with `args` to output the SARIF file `results.sarif`.
- `Upload mobsfscan report` uploads `results.sarif` back to GitHub, to the Security tab of the repo.
mobsf-job:
  name: Mobsf Scan (Android/iOS)
  needs: linguist-job
  if: |
    contains(needs.linguist-job.outputs.languages, '"Java"') ||
    contains(needs.linguist-job.outputs.languages, '"Kotlin"') ||
    contains(needs.linguist-job.outputs.languages, '"Objective-C"') ||
    contains(needs.linguist-job.outputs.languages, '"Swift"')
  runs-on: ubuntu-latest
  steps:
    - name: Checkout the code
      uses: actions/checkout@v3
    - name: mobsfscan
      uses: MobSF/mobsfscan@main
      with:
        args: '. --sarif --output results.sarif || true'
    - name: Upload mobsfscan report
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: results.sarif
KICS Job for Ansible, Docker, and Terraform
Just like any programming language, IaC languages can have vulnerabilities and can be scanned. The tool KICS by Checkmarx is used to analyze our infrastructure as code (IaC). This is done with the GitHub Marketplace plugin. KICS covers Ansible, Docker and Terraform. The job checks for `Dockerfile`, `HCL` and `VCL` in the Linguist output. `Dockerfile` is the language (according to Linguist) used for Docker. The HashiCorp Configuration Language, or HCL, is what's used for Terraform. VCL is the Varnish Configuration Language; Ansible playbooks are normally written in YAML rather than VCL, so the `VCL` check is unlikely to detect Ansible content on its own and may need revisiting.
- The `Run KICS` step runs KICS with arguments.
  - `path` tells KICS where to start scanning. The tool then scans files recursively.
  - `ignore_on_exit` controls which non-zero exit codes are ignored. By default, KICS exits with a non-zero code when it finds results, which would fail the step; `all` keeps the step from failing on findings or errors so the upload can still run.
  - `output_formats` tells KICS to output the results in SARIF, so they can be uploaded.
- `Upload sarif report` uploads `results.sarif` back to GitHub, to the Security tab of the repo.
kics-job:
  name: KICS (Ansible/Docker/Terraform)
  needs: linguist-job
  if: |
    contains(needs.linguist-job.outputs.languages, '"Dockerfile"') ||
    contains(needs.linguist-job.outputs.languages, '"HCL"') ||
    contains(needs.linguist-job.outputs.languages, '"VCL"')
  runs-on: ubuntu-latest
  steps:
    - name: Checkout the code
      uses: actions/checkout@v3
    - name: Run KICS
      uses: checkmarx/[email protected]
      with:
        path: './'
        ignore_on_exit: all
        output_formats: 'sarif'
    - name: Upload sarif report
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: results.sarif