cprover
/builddir/build/BUILD/cbmc-cbmc-5.9/doc/architectural/howto.md
Go to the documentation of this file.
1 \ingroup module_hidden
2 \page tutorial Tutorials
3 
4 \section cbmc_tutorial CBMC Developer Tutorial
5 
6 \tableofcontents
7 
8 \author Kareem Khazem
9 
10 This is an introduction to hacking on the `cprover` codebase. It is not
11 intended as a user guide to `CBMC` or related tools. It is structured
12 as a series of programming exercises that aim to acclimatise the reader
13 to the basic data structures and workflow needed for contributing to
14 `CBMC`.
15 
16 
17 ## Initial setup
18 
19 Clone the [CBMC repository][cbmc-repo] and build it:
20 
21  git clone https://github.com/diffblue/cbmc.git
22  cd cbmc/src
23  make minisat2-download
24  make
25 
26 Ensure that [graphviz][graphviz] is installed on your system (in
27 particular, you should be able to run a program called `dot`). Install
28 [Doxygen][doxygen] and generate doxygen documentation:
29 
30  # In the src directory
31  doxygen doxyfile
32  # View the documentation in a web browser
33  firefox doxy/html/index.html
34 
35 If you've never used doxygen documentation before, get familiar with the
36 layout. Open the generated HTML page in a web browser; search for the
37 class `goto_programt` in the search bar, and jump to the documentation
38 for that class; and read through the copious documentation.
39 
40 The build writes executable programs into several of the source
41 directories. In this tutorial, we'll be using binaries inside the
42 `cbmc`, `goto-instrument`, and `goto-cc` directories. Add these
43 directories to your `$PATH`:
44 
45  # Assuming you cloned CBMC into ~/code
46  export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH
47  # Add to your shell's startup configuration file so that you don't have to run that command every time.
48  echo 'export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH' >> .bashrc
49 
50 Optional: install an image viewer that can read images on stdin.
51 I use [feh][feh].
52 
53 [cbmc-repo]: https://github.com/diffblue/cbmc/
54 [doxygen]: http://www.stack.nl/~dimitri/doxygen/
55 [graphviz]: http://www.graphviz.org/
56 [feh]: https://feh.finalrewind.org/
57 
58 
59 
60 ## Whirlwind tour of the tools
61 
62 CBMC's code is located under the `cbmc` directory. Even if you plan to
63 contribute only to CBMC, it is important to be familiar with several
64 other of cprover's auxiliary tools.
65 
66 
67 ### Compiling with `goto-cc`
68 
69 There should be an executable file called `goto-cc` in the `goto-cc`
70 directory; make a symbolic link to it called `goto-gcc`:
71 
72  cd cbmc/src/goto-cc
73  ln -s "$(pwd)/goto-cc" goto-gcc
74 
75 Find or write a moderately-interesting C program; we'll call it `main.c`.
76 Run the following commands:
77 
78  goto-gcc -o main.goto main.c
79  cc -o main.exe main.c
80 
81 Invoke `./main.goto` and `./main.exe` and observe that they run identically.
82 The version that was compiled with `goto-gcc` is larger, though:
83 
84  du -hs *.{goto,exe}
85 
86 Programs compiled with `goto-gcc` are mostly identical to their `clang`-
87 or `gcc`-compiled counterparts, but contain additional object code in
88 cprover's intermediate representation. The intermediate representation
89 is (informally) called a *goto-program*.
90 
91 
92 ### Viewing goto-programs
93 
94 `goto-instrument` is a Swiss army knife for viewing goto-programs and
95 performing single program analyses on them. Run the following command:
96 
97  goto-instrument --show-goto-functions main.goto
98 
99 Many of the instructions in the goto-program intermediate representation
100 are similar to their C counterparts. `if` and `goto` statements replace
101 structured programming constructs.
102 
103 Find or write a small C program (2 or 3 functions, each containing a few
104 varied statements). Compile it using `goto-gcc` as above into an object
105 file called `main`. If you installed `feh`, try the following command
106 to dump a control-flow graph:
107 
108  goto-instrument --dot main | tail -n +2 | dot -Tpng | feh -
109 
110 If you didn't install `feh`, you can write the diagram to the file and
111 then view it:
112 
113  goto-instrument --dot main | tail -n +2 | dot -Tpng > main.png
114  Now open main.png with an image viewer
115 
116 (the invocation of `tail` is used to filter out the first line of
117 `goto-instrument` output. If `goto-instrument` writes more or less
118 debug output by the time you read this, read the output of
119 `goto-instrument --dot main` and change the invocation of `tail`
120 accordingly.)
121 
122 There are a few other views of goto-programs. Run `goto-instrument -h`
123 and try the various switches under the "Diagnosis" section.
124 
125 
126 
127 ## Learning about goto-programs
128 
129 In this section, you will learn about the basic goto-program data
130 structures. Reading from and manipulating these data structures form
131 the core of writing an analysis for CBMC.
132 
133 
134 ### First steps with `goto-instrument`
135 
136 <div class=memdoc>
137 **Task:** Write a simple C program with a few functions, each containing
138 a few statements. Compile the program with `goto-gcc` into a binary
139 called `main`.
140 </div>
141 
142 
143 The entry point of `goto-instrument` is in `goto_instrument_main.cpp`.
144 Follow the control flow into `goto_instrument_parse_optionst::doit()`, located in `goto_instrument_parse_options.cpp`.
145 At some point in that function, there will be a long sequence of `if` statements.
146 
147 <div class=memdoc>
148 **Task:** Add a `--greet` switch to `goto-instrument`, taking an optional
149 argument, with the following behaviour:
150 
151  $ goto-instrument --greet main
152  hello, world!
153  $ goto-instrument --greet Leperina main
154  hello, Leperina!
155 
156 You will also need to add the `greet` option to the
157 `goto_instrument_parse_options.h` file in order for this to work.
158 Notice that in the `.h` file, options that take an argument are followed
159 by a colon (like `(property):`), while simple switches have no colon.
160 Make sure that you `return 0;` after printing the message.
161 </div>
162 
163 The idea behind `goto-instrument` is that it parses a goto-program and
164 then performs one single analysis on that goto-program, and then
165 returns. Each of the switches in `doit` function of
166 `goto_instrument_parse_options` does something different with the
167 goto-program that was supplied on the command line.
168 
169 
170 ### Goto-program basics
171 
172 At this point in `goto-instrument_parse_options` (where the `if`
173 statements are), the goto-program will have been loaded into the object
174 `goto_functions`, of type `goto_functionst`. This has a field called
175 `function_map`, a map from function names to functions.
176 
177 
178 <div class="memdoc">
179 **Task:** Add a `--print-function-names` switch to `goto-instrument`
180 that prints out the name of every function in the goto-binary. Are
181 there any functions that you didn't expect to see?
182 </div>
183 
184 The following is quite difficult to follow from doxygen, but: the value
185 type of `function_map` is `goto_function_templatet<goto_programt>`.
186 
187 <div class=memdoc>
188 **Task:** Read the documentation for `goto_function_templatet<bodyT>`
189 and `goto_programt`.
190 </div>
191 
192 Each \ref goto_programt object contains a list of
193 \ref goto_programt::instructiont called
194 `instructions`. Each instruction has a field called `code`, which has
195 type \ref codet.
196 
197 <div class=memdoc>
198 **Task:** Add a `--pretty-program` switch to `goto-instrument`. This
199 switch should use the `codet::pretty()` function to pretty-print every
200 \ref codet in the entire program. The strings that `pretty()` generates
201 for a codet look like this:
202 
203  index
204  * type: unsignedbv
205  * width: 8
206  * #c_type: char
207  0: symbol
208  * type: array
209  * size: nil
210  * type:
211  * #source_location:
212  * file: src/main.c
213  * line: 18
214  * function:
215  * working_directory: /some/dir
216  0: unsignedbv
217  * width: 8
218  * #c_type: char
219  ...
220 </div>
221 
222 The sub-nodes of a particular node in the pretty representation are
223 numbered, starting from 0. They can be accessed through the `op0()`,
224 `op1()` and `op2()` methods in the `exprt` class.
225 
226 Every node in the pretty representation has an identifier, accessed
227 through the `id()` function. The file `util/irep_ids.def` lists the
228 possible values of these identifiers; have a quick scan through that
229 file. In the pretty representation above, the following facts are true
230 of that particular node:
231 
232  - `node.id() == ID_index`
233  - `node.type().id() == ID_unsignedbv`
234  - `node.op0().id() == ID_symbol`
235  - `node.op0().type().id() == ID_array`
236 
237 The fact that the `op0()` child has a `symbol` ID menas that you could
238 cast it to a `symbol_exprt` (which is a subtype of `exprt`) using the
239 function `to_symbol_expr`.
240 
241 <div class=memdoc>
242 **Task:** Add flags to `goto-instrument` to print out the following information:
243 * the name of every function that is *called* in the program;
244 * the value of every constant in the program;
245 * the value of every symbol in the program.
246 </div>