cprover
util/README.md
Go to the documentation of this file.
1 \ingroup module_hidden
2 \defgroup util util
3 
4 # Folder util
5 
6 \author Martin Brain, Owen Jones
7 
8 \section data_structures Data Structures
9 
10 This section discusses some of the key data-structures used in the
11 CPROVER codebase.
12 
13 \subsection irept Irept Data Structure
14 
15 There are a large number of kinds of tree structured or tree-like data in
16 CPROVER. `irept` provides a single, unified representation for all of
17 these, allowing structure sharing and reference counting of data. As
18 such `irept` is the basic unit of data in CPROVER. Each `irept`
19 contains[^1] a basic unit of data (of type `dt`) which contains four
20 things:
21 
22 * `data`: A string[^2], which is returned when the `id()` function is
23  used.
24 
25 * `named_sub`: A map from `irep_namet` (a string) to `irept`. This
26  is used for named children, i.e. subexpressions, parameters, etc.
27 
28 * `comments`: Another map from `irep_namet` to `irept` which is used
29  for annotations and other ‘non-semantic’ information
30 
31 * `sub`: A vector of `irept` which is used to store ordered but
32  unnamed children.
33 
34 The `irept::pretty` function outputs the contents of an `irept` directly
35 and can be used to understand and debug problems with `irept`s.
36 
37 On their own `irept`s do not “mean” anything; they are effectively
38 generic tree nodes. Their interpretation depends on the contents of
39 result of the `id` function (the `data`) field. `util/irep_ids.txt`
40 contains the complete list of `id` values. During the build process it
41 is used to generate `util/irep_ids.h` which gives constants for each id
42 (named `ID_`). These can then be used to identify what kind of data
43 `irept` stores and thus what can be done with it.
44 
45 To simplify this process, there are a variety of classes that inherit
46 from `irept`, roughly corresponding to the ids listed (i.e. `ID_or`
47 (the string `"or”`) corresponds to the class `or_exprt`). These give
48 semantically relevant accessor functions for the data; effectively
49 different APIs for the same underlying data structure. None of these
50 classes add fields (only methods) and so static casting can be used. The
51 inheritance graph of the subclasses of `irept` is a useful starting
52 point for working out how to manipulate data.
53 
54 There are three main groups of classes (or APIs); those derived from
55 `typet`, `codet` and `exprt` respectively. Although all of these inherit
56 from `irept`, these are the most abstract level that code should handle
57 data. If code is manipulating plain `irept`s then something is wrong
58 with the architecture of the code.
59 
60 Many of the key descendent of `exprt` are declared in `std_expr.h`. All
61 expressions have a named subfield / annotation which gives the type of
62 the expression (slightly simplified from C/C++ as `unsignedbv_typet`,
63 `signedbv_typet`, `floatbv_typet`, etc.). All type conversions are
64 explicit with an expression with `id() == ID_typecast` and an ‘interface
65 class’ named `typecast_exprt`. One key descendent of `exprt` is
66 `symbol_exprt` which creates `irept` instances with the id of “symbol”.
67 These are used to represent variables; the name of which can be found
68 using the `get_identifier` accessor function.
69 
70 `codet` inherits from `exprt` and is defined in `std_code.h`. It
71 represents executable code; statements in C rather than expressions. In
72 the front-end there are versions of these that hold whole code blocks,
73 but in goto-programs these have been flattened so that each `irept`
74 represents one sequence point (almost one line of code / one
75 semi-colon). The most common descendents of `codet` are `code_assignt`
76 so a common pattern is to cast the `codet` to an assignment and then
77 recurse on the expression on either side.
78 
79 [^1]: Or references, if reference counted data sharing is enabled. It is
80  enabled by default; see the `SHARING` macro.
81 
82 [^2]: Unless `USE_STD_STRING` is set, this is actually
83 a `dstring` and thus an integer which is a reference into a string table
84 
85 \subsection irep_idt Irep_idt and Dstringt
86 
87 Inside `irept`s, strings are stored as `irep_idt`s, or `irep_namet`s for
88 keys to `named_sub` or `comments`. If `USE_STD_STRING` is set then both
89 `irep_idt` and `irep_namet` are `typedef`ed to `std::string`, but by default
90 it is not set and they are `typedef`ed to `dstringt`. `dstringt` has one
91 field, an unsigned integer which is an index into a static table of strings.
92 This makes it expensive to create a new string (because you have to look
93 through the whole table to see if it is already there, and add it if it
94 isn't) but very cheap to compare strings (just compare the two integers). It
95 also means that when you have lots of copies of the same string you only have
96 to store the whole string once, which saves space.
97 
98 \dot
99 digraph G {
100  node [shape=box];
101  rankdir="LR";
102  1 [shape=none, label=""];
103  2 [label="command line parsing"];
104  3 [shape=none, label=""];
105  1 -> 2 [label="C files or goto-binaries"];
106  2 -> 3 [label="Command line options, file names"];
107 }
108 \enddot