CS695/SWE699 (Fall '23)
PA3: Using a Real-World DNN Verification Tool
Goal
In this assignment, you will use NeuralSAT, a high-performance DNN verification tool, to verify and disprove properties. The goal of this PA is to help you become familiar with a state-of-the-art DNN verification tool and see how it is applied to large, real-world neural networks.
Note that unlike other PAs, you will not do any coding.
Part 1: Scalability Issue of Symbolic Execution
Here you will show that symbolic execution does not scale to large networks.
- You will create a function `create_random` that takes as input a list of positive integers representing the layer sizes `[#inputs, #neurons in hidden layer 1, #neurons in hidden layer 2, ..., #outputs]`, and creates a neural network of that size with random weights and biases (e.g., between, say, -5 and 5).
- You will run symbolic execution on this network as you did in PA1, get its results, and time the solving process.
- You will do this for various network sizes until you feel that your computer cannot handle it anymore.
- Note: you will reuse these networks to measure and compare execution times using abstraction in Part 2.
- Finally, you will plot the results: the x-axis will show the size of the network (for simplicity, the product of all numbers in the input list) and the y-axis will show the time.
Feel free to be creative in this process (e.g., play around with different sizes/layers), and discuss your findings in the README file. The code below gives an example.
```python
import time
import z3

# Create a random network from a given number of inputs, hidden neurons, and outputs.
# [2,3,4,3,3]: 2 inputs, 3 hidden layers (with 3, 4, 3 hidden neurons in hidden
# layers 1, 2, 3, respectively), and 3 outputs.
l = [2, 3, 4, 3, 3]
dnn = create_random(l)
symbolic_state = my_symbolic_execution(dnn)

# time the solving process
st = time.time()
z3.solve(symbolic_state)
print('time to solve: ', time.time() - st)
```
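How you represent the network is up to you; below is a minimal sketch of `create_random`, assuming the PA1-style tuple encoding `(weights, bias, use_relu)` per neuron that is also used in Part 2.

```python
import random

def create_random(sizes):
    """Build a random DNN for layer sizes [#inputs, #hidden, ..., #outputs]."""
    dnn = []
    for li in range(1, len(sizes)):
        is_output = (li == len(sizes) - 1)
        layer = []
        for _ in range(sizes[li]):
            # one weight per neuron in the previous layer, drawn from [-5, 5]
            weights = [random.uniform(-5, 5) for _ in range(sizes[li - 1])]
            bias = random.uniform(-5, 5)
            # relu on hidden neurons only, not on outputs
            layer.append((weights, bias, not is_output))
        dnn.append(layer)
    return dnn
```

You can then call `create_random` on increasingly large size lists, record a `(product of sizes, solving time)` pair for each run, and plot those pairs with whatever tool you prefer.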
Part 2: Abstract Interval Domain
Here you will implement the interval abstraction for DNNs. The technique is based on Chapter 8 of the Introduction to Neural Network Verification book (https://arxiv.org/pdf/2109.10317.pdf).
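In a nutshell, the interval domain propagates input ranges through the network layer by layer. For a single neuron with weights $w_1, \ldots, w_k$, bias $b$, and incoming intervals $x_j \in [l_j, u_j]$, the tightest pre-activation interval is

$$b + \sum_j w_j x_j \;\in\; \Big[\, b + \sum_{w_j \ge 0} w_j\, l_j + \sum_{w_j < 0} w_j\, u_j,\;\; b + \sum_{w_j \ge 0} w_j\, u_j + \sum_{w_j < 0} w_j\, l_j \,\Big],$$

and since `relu` is monotone, it maps an interval $[l, u]$ to $[\max(0, l), \max(0, u)]$.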
You can reuse the following DNN with 2 inputs, 2 hidden layers, and 2 outputs.

*(Figure: the example 2x2x2x2 DNN.)*
- **DNN Encoding**: We can encode this DNN and additional details using Python. Note this is exactly as in PA1.

  ```python
  # (weights, bias, use relu activation or not)
  n00 = ([1.0, -1.0], 0.0, True)
  n01 = ([1.0, 1.0], 0.0, True)
  hidden_layer0 = [n00, n01]

  n10 = ([0.5, -0.2], 0.0, True)
  n11 = ([-0.5, 0.1], 0.0, True)
  hidden_layer1 = [n10, n11]

  # don't use relu for outputs
  o0 = ([1.0, -1.0], 0.0, False)
  o1 = ([-1.0, 1.0], 0.0, False)
  output_layer = [o0, o1]

  dnn = [hidden_layer0, hidden_layer1, output_layer]
  ```

  In this DNN, the outputs of the neurons in the hidden layers (prefixed with `n`) are passed through the `relu` activation function, but the outputs of the DNN (prefixed with `o`) are not. These settings are controlled by the `True`/`False` parameters shown above. Also, this example does not use bias, i.e., all bias values are 0.0 as shown. Note that all of these settings are parameterized; I deliberately use this example to show how these parameters are used (e.g., `relu` applies only to hidden neurons, not to outputs).
- **Interval (Abstract) Value**: Now you will implement a function called `my_interval_execution` that takes as inputs the DNN and a precondition over the inputs, represented as a list of intervals over the inputs. The function then evaluates the DNN over these input intervals and obtains a list of output intervals representing the values of the outputs. More specifically, `my_interval_execution` returns a dictionary whose keys are input, output, and hidden nodes and whose values are intervals (e.g., `'i0': [-2, 1]`, `'o0': [0, 7]`, `'n0_0': [-3, 4]`, etc.). With `my_interval_execution`, you can do various tasks, including:

  - Simulating a concrete execution

    ```python
    # finding outputs when inputs are fixed [i0 == 1, i1 == -1]
    states = my_interval_execution(dnn, pre={'i0': [1, 1], 'i1': [-1, -1]})
    print(states)  # {... 'o0': [1.0, 1.0], 'o1': [-1.0, -1.0]}
    ```

  - Executing using a precondition

    ```python
    # executing using precondition 0.1 <= i0 <= 0.3, -0.7 <= i1 <= 0.0
    states = my_interval_execution(dnn, pre={'i0': [0.1, 0.3], 'i1': [-0.7, 0.0]})
    print(states)  # {..., 'o0': [0.0, 0.5], 'o1': [-0.5, 0.0]}
    ```

  - Checking assertions, i.e., verifying/proving properties about the DNN

    ```python
    # MANUALLY check whether 0.1 <= i0 <= 0.3, -0.7 <= i1 <= 0.0 implies 0 <= o0 <= 1:
    # check that the output interval of o0 is COMPLETELY CONTAINED IN [0, 1].
    # If it is, then the property is proved. Otherwise, it is not proved.
    # If the output is {..., 'o0': [0.0, 0.5], 'o1': [-0.5, 0.0]} (as in this example),
    # then YES, the property 0 <= o0 <= 1 is proved, because [0.0, 0.5] is contained in [0, 1].
    # If the output is {..., 'o0': [0.0, 1.2], 'o1': [-0.5, 0.0]}, then NO, you cannot
    # say anything about whether the property 0 <= o0 <= 1 is valid or not.
    ```
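  To make this concrete, here is a minimal sketch of `my_interval_execution` over the tuple encoding shown earlier. This is one possible design, not the required one; the key names follow the `i0`/`n0_0`/`o0` convention used above.

  ```python
  # pre maps input names to intervals, e.g., {'i0': [0.1, 0.3], 'i1': [-0.7, 0.0]}
  def my_interval_execution(dnn, pre):
      cur = [pre[f'i{i}'] for i in range(len(pre))]  # intervals of the current layer
      states = dict(pre)
      for li, layer in enumerate(dnn):
          is_output = (li == len(dnn) - 1)
          nxt = []
          for ni, (weights, bias, use_relu) in enumerate(layer):
              lo = hi = bias
              for w, (l, u) in zip(weights, cur):
                  # a negative weight swaps which input bound contributes to lo/hi
                  lo += w * l if w >= 0 else w * u
                  hi += w * u if w >= 0 else w * l
              if use_relu:
                  lo, hi = max(0.0, lo), max(0.0, hi)
              name = f'o{ni}' if is_output else f'n{li}_{ni}'
              states[name] = [lo, hi]
              nxt.append([lo, hi])
          cur = nxt
      return states

  # On the example DNN above, my_interval_execution(dnn, {'i0': [0.1, 0.3], 'i1': [-0.7, 0.0]})
  # yields {..., 'o0': [0.0, 0.5], 'o1': [-0.5, 0.0]}, matching the expected output.
  ```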
- **Scalability of Interval**: Now reuse the DNNs that you created in Part 1 and run `my_interval_execution` on them. Plot the results and compare them with the results from Part 1. You should see that interval abstraction scales much better than symbolic execution.
TIPS
- It is strongly recommended that you do interval abstraction on this DNN example by hand first, before attempting any coding. This example is small enough that you can work it out step by step (a worked example is given after this list). For example:
  - First, obtain the intervals of each neuron and output by hand (e.g., on paper) for the given DNN.
  - Then, put what you have in code, still just for the given DNN. Use a simple list or tuple for intervals if you'd like (e.g., `i0 = [0, 3], i1 = [None, -1]` represent `0 <= i0 <= 3` and `i1 < -1`), then compute the intervals for the hidden neurons and outputs, etc.
  - Finally, convert what you have into a general program that works for any DNN input.
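For example, with the 2x2x2x2 DNN and the precondition `0.1 <= i0 <= 0.3, -0.7 <= i1 <= 0.0` from above, the first hidden layer works out as follows:

- `n00` has weights `[1.0, -1.0]` and bias 0.0. The smallest pre-`relu` value is `1.0*0.1 + (-1.0)*0.0 = 0.1` and the largest is `1.0*0.3 + (-1.0)*(-0.7) = 1.0` (the negative weight flips which bound of `i1` is used), so `n00` gets the interval `[0.1, 1.0]`, which `relu` leaves unchanged.
- `n01` has weights `[1.0, 1.0]`, giving `[0.1 + (-0.7), 0.3 + 0.0] = [-0.6, 0.3]`, which `relu` clamps to `[0.0, 0.3]`.

Continuing the same way through the second hidden layer and the output layer yields `'o0': [0.0, 0.5]` and `'o1': [-0.5, 0.0]`, matching the expected output shown earlier.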
Conventions and Requirements
Your program must have the following (all of these are given in the concrete example above)
Part 1
- a main function `create_random(sizes)` that takes as input the list `sizes` and returns a DNN of that size, as described above.
- a plot (in pdf, png, or jpeg) showing the scalability of `my_symbolic_execution` on various DNNs.
- In the README, write your observations and conclusions about the scalability of symbolic execution. You must refer to the plot and describe it to support your conclusion.
Part 2
- a main function `my_interval_execution(dnn, pre)` that
  - takes as input the DNN, whose specification is given as in the example above (or using `tensorflow` or `pytorch`), and the precondition `pre`
  - returns the states of the DNN, represented as a dictionary mapping neuron names (as strings) to intervals (as lists or tuples of size 2)
  - this goes without saying, but do NOT hard-code the DNN in your program and do NOT write this function to work only with the given example (i.e., do not hard-code the example above, its weight values, etc.). Your function should work with any DNN input (though it could be slow for big DNNs).
- in your resulting states, you must name
  - the *n*th input as `in` (e.g., the first input is `i0`)
  - the *n*th output as `on` (e.g., the second output is `o1`)
- a testing function `test1()` where you copy and paste pretty much the complete examples given above, to demonstrate that your interval execution works for the given example. In summary, your `test1()` will include
  - the specification of the 2x2x2x2 DNN in the Figure
  - running the interval abstraction on the DNN
  - outputting the resulting states as a dictionary
  - simulating a concrete execution
  - simulating an execution with a precondition
  - manually checking assertions in comments, as shown above
- another testing function `test2()` where you create another DNN (a sketch is given after this list). The DNN will have
  - the same 2 inputs and 2 outputs, but 3 hidden layers where each layer has 4 neurons, i.e., a 2x4x4x4x2 DNN
  - non-zero bias values

  Then, on this DNN, do every single task you did in `test1()`.
- a plot (in pdf, png, or jpg) showing the scalability of `my_interval_execution` on the various DNNs generated in Part 1.
- In the README, write your observations and conclusions about the scalability of interval execution. You must refer to the plot and describe it to support your conclusion.
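A minimal sketch of `test2()`, assuming you reuse `create_random` from Part 1 to generate the 2x4x4x4x2 network (hand-crafting the weights and biases works just as well):

```python
import random

def test2():
    random.seed(0)  # fix the seed so the random DNN (and your README outputs) is reproducible
    dnn = create_random([2, 4, 4, 4, 2])  # random biases in [-5, 5] are (almost surely) non-zero
    # ... then repeat every task from test1() on this DNN, e.g.:
    states = my_interval_execution(dnn, pre={'i0': [0.1, 0.3], 'i1': [-0.7, 0.0]})
    print(states)
```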
What to Turn In
You must turn in a zip file containing the following files. Use the following names:
- the zip file must be named `pa3-yourgroupname.zip`
- One single main source file `se.py`. Indicate in the README file how I can run it.
  - your file should include at least a `create_random` function, a `my_interval_execution` function, and two test functions `test1()` and `test2()`, as indicated above
  - you do not need to include code to generate plots; you can use any plotting tool you like (e.g., Excel, Google Sheets, etc.) to generate the plots
- Include the plots (e.g., as images) in your zip file and in your README file.
- a `readme.md` file (IMPORTANT: note that this is a Markdown file, so you can include graphics; see the details below, where I've made some additional requirements). For each part, include:
  - 2 screenshots showing (1) how you start your program (e.g., the commands used to run your program) and (2) the end of the run
- Nothing else: do not include any binary files or anything else.
The readme.md file should be a plain MARKDOWN text file (not a Word file, not an RTF file, not an HTML file) consisting of the following:
- Complete run: show how you run your program AND its complete outputs, e.g.,

  ```
  $ python3 se.py   # this should execute both test1() and test2()
  ....              # and show me all outputs (similar to the outputs for the complete example above)
  ```
- Answers to the following questions about your work:
  - Briefly describe what you did for Part 1 and Part 2.
  - Describe the plots and the conclusions you draw about the scalability of both Parts 1 and 2 (see above).
  - What do you think was the most difficult part of this assignment (e.g., programming, plotting, understanding, etc.)?
  - What advice would you give other students (e.g., students in this class next year) about this PA?
Grading Rubric
Grading (out of 30 points):
- 15 points — for a complete interval abstraction (Part 2) as described
  - 10 points if your code works for the given example (i.e., `test1()` produces similar results)
  - 3 points if your code works for the additional example you created
  - 2 points if your code works on my own example (not provided); this means you should try to generalize your work and test it on various scenarios
- 10 points — for a complete scalability analysis and plots (Parts 1 and 2) as described
  - 5 points for the `create_random` code
  - 5 points for the scalability plots and explanations
- 5 points — for the answers in the README
  - 5 — clear explanations on running the program and thorough answers to the given questions
  - 2 — vague or hard to understand; omits important details
  - 0 — little to no effort, or submitted an RTF/DOC/PDF file instead of plain MD