A New Approach to Variable Visualization: Roles as Visualization Objects

R. Wilhelm

Informatik, Universit¨at des Saarlandes, 66123 Saarbr¨ucken, Germany wilhelm@cs.uni-sb.de

T. M¨uldner

Jodrey School of Computer Science, Acadia University, Wolfville B0P 1X0 N.S. Canada

tomasz.muldner@acadiau.ca 1 Introduction

Our report proposes a new approach to the field of explainingalgorithms. Traditionally, this field borrows from the Software Visualization and Program Animation, and aims at explaining the inner working of the algorithm by using computer graphics and animation. The most common approach is to execute the algorithm step-by-step (forwards or backwards). This execution may be interleaved with showing various important events, for example a visualization of swapping two elements during sorting.

Since this kind of information is by its nature dynamic, the standard way of implementing the visualization is to actually run the program that represents the algorithm, using concrete input data.

Therefore, the understanding of the algorithm requires careful and repetitive selections of representa-tive sets of input data.

We propose a different approach. Instead ofrunninga program onsingleinput data, we suggest toabstractly excuteit onallpossible input data. This is possible by statically analyzing the program to retrieve information that is sufficiently rich to create invariants. It is our belief thatinvariantsplay the most crucial role in understanding an algorithm, and therefore the algorithm explanation should concentrate on visualization of the invariants.

To help the reader understand the above point, consider the insertion into a binary search tree.

Standard algorithm visualization does not focus on the relevant part of the data structure; specifically it shows the entire tree, even though the insertion takes place only in one of the subtrees. Also, it does not show any invariants and thus forces the viewer to deduce them. In our approach, we show an essential invariant, stating that for each inner node, the elements in its left subtree have data components smaller than that of the node and elements in its right subtree have data components larger than that of the node, and we focus on the part of the tree that is being modified.

Let us recall from Wilhelm et al. (2002) the phases in the construction and use of an algorithm explanation system:

1. Thedesignerselects a data structure and an algorithm to be explained. He then selects a set of properties characterizing the data structure, i.e., those properties that would be used to formulate invariants for the algorithm.

2. He runs ashape analysis, as implemented in the TVLA (Three-Valued-Logic Analyzer) system on the program using properties mentioned above. The shape analysis will annotate each pro-gram point with a set ofshape graphsdescribing all heap contents that may exist when program execution reaches this program point. Together, all these shape graphs form an invariant for this program point.

3. He defines a visual representation for the shape graphs computed by the shape analysis.

4. Now, comes the user’s time! What does he get to see and how does he get to see it? The decisions made in point 1 above determinewhatand the decisions made in point 3 above deter-minehow. The user may want to select one particular shape graph at the entry to one particular

program statement and then execute the statement and observe the effect on the chosen shape graph. He may also want to perform avisualized abstract executionof the algorithm. This is a depth-first traversal of the program’s control flow graph together with the sets of shape graphs at the program points. Each step in the visual execution corresponds to the execution of one statement as described above.

Shape analysis is described in Sagiv et al. (1999, 2002); Lev-Ami et al. (2000). A system, TVLA, implementing it is available, (see http://www.math.tau.ac.il/∼rumster/TVLA/ ). A full version of this paper has appeared in Wilhelm et al. (2002). Below, we explain our methodology using a simple example.

2 Algorithm Explanation: An Annotated Example

We explain our approach with the example of insertion sort using a singly linked list. (This paper deals only with algorithms on totally ordered domains.) For our considerations, we use a singly linked list of double values defined as is Figure 1(a). The code of insertion sort is provided in Figure 1(b) (in this code, we show two labels that will be used in discussions below).

The basic idea of this algorithm, see Aho et al. (1983), is to split the sequence of data to be sorted into two consecutive sub-sequences; asorted prefix, in Fig. 1 pointed to by x, and anunsorted suffix, in Fig. 1 pointed to by r, which follows the sorted prefix. Initially, the prefix is empty, and the suffix consists of all elements. The algorithm loops until the suffix is empty (and so the sorted prefix contains all the elements). In each step of the loop, the algorithm inserts into the prefix the element, pointed to by r that immediately follows the prefix, thereby keeping the prefix sorted. Therefore, the basic invariant of this algorithm is: "the prefix is sorted".

The two figures provided on the next page show how the above invariant can be visualized while making a single step in the algorithm (because of space limitation, in this paper we do not explain how the invariant can be derived using the shape analysis; for details, see Wilhelm et al. (2002).

In our figures, horizontal arrows are implicitly labelledn; wherenis the successor in the singly-linked list. First, we explain Fig. 2. Here, the inner loop has been executed several times and the current execution point is P1. This figure shows theabstract heap; a summarization of the concrete heap accomplished using shape analysis. The abstract heap describes an infinite set of singly linked lists with a set of pointers pointing into this list and some conditions on the minimal lengths of sublists.

In the abstract heap shown in the two figures above, there are three kinds of nodes. Unique nodes represent single concrete elements of the heap, and are shown as rectangles.Summary nodesrepresent sets of nodes that are not distinguishable by the selected observation properties, and are shown as cubes. Finally, non-empty lists of nodes are shown as cubes with an additional rectangle at the back.

Dashed self-loops show that we deal with summarized sublists, i.e. sets of heap cells connected by n-components. Finally, the placement on the x-axis indicates the relative value of the represented concrete elements (using the “<” order; here calleddle; i.e. nodes are placed left-to-right according to the dle-order). In summary, the graph in Fig. 2 represents all lists for which the prefix of the list, pointed to by x, up to the element pointed to by pr, is already sorted (the sortedprefix). The next element in the list is pointed to by pointer variable r; it is the one to be inserted in the sorted prefix;

the suffix of the list following this element is represented by t3.

The graph shown in Figure 2 seems to show sortedness and comparability properties of the rep-resented list elements and sublists by placing nodes along the axis. Unfortunately not all elements shown in this figure are comparable with respect to dle; for example, the element pointed to by r is incomparable with any element in the suffix of the compared prefix starting at the element pointed to by l, since it has not been compared with them yet. Therefore, we need a graphical representation of partial orderon nodes, and introduceascending chains(for a definition of a chain, see Wilhelm et al.

(2002). Chains can be visualized in a variety of ways; here we use patterns; the combination of two or more patterns indicates that a node belongs to more than one chain.

The left-to-right placement showing comparability is then only relevant for nodes that are members of the same chain. In Figure 2, we have two chains:

/* list.h */

typedef struct elem { struct elem *n;

double data;

} *List;

/* insertion.c */

#include ‘‘list.h’’

List insert_sort(List x) {

/* sort list x, return sorted list */

List r, pr, l, pl;

r = x;

pr = NULL;

while (r != NULL) {

/*while the suffix is not empty */

l = x;

pl = NULL;

while (l != r) {

/*P1: */ if (l->data > r->data) { /* P2: after positive outcome of comparison, move r before l */

pr->n = r->n;

r->n = l;

if(pl == NULL) x = r;

else

pl->n = r;

r = pr;

break;

}

pl = l;

l = l->n;

}

pr = r;

r = r->n;

}

return x;

}

(a) (b)

Figure 1: (a) Declaration of a linked-list data type in C. (b) A C function that performs insertion sort on a list pointed to by parameterx.

Figure 2: A shape graph at point P1.

Figure 3: A shape graph at point P2 (i.e. after the comparison with positive outcome.

x, t1, pl, l, t2, pr x, t1, pl, r

We will now describe what happens in the transition corresponding to the comparison “l->data>

r->data”, assuming it is true. Fig. 3 shows the next state of the algorithm, when the current program point is P2 (see Fig. 1). At this point, we have the information, that “l->data>r->data”. This transition increases the sortedness of the list as now the element to be inserted has found its place between the pl- and the l-elements. Therefore, the element pointed to by r appears now between these elements.

There is only one chain, representing the sorted prefix, and the transition, from the state shown in Fig.

2 to the state shown in Fig. 3 explains the preservation of the invariant.

Shape analysis applied to this sorting program starts with a description of all possible inputs and has to compute invariants at all program points, in particular show that the postcondition holds, namely that upon termination the elements are sorted. Invariants do not refer to actual data, but only mention relative sizes. For example, an invariant for a sorted sublist could be that the data-component of each but the last element of the list is less than or equal to the data-component of its successor. Shape analysis abstracts from the values of data-components. Data structure elements are thus incomparable to start with, but “gain comparability” by abstractly executing conditions. Mathematically, we would say that only a partial order on the elements of the data structure is known to the shape analysis.

Abstractly executing comparisons may insert elements into chains, i.e. totally ordered subsets of the partial order.

Recall that in our approach to algorithm explanation, we preprocess an algorithm by running a shape analysis on this algorithm and thenvisualize an abstract execution. Instead of executing the algorithm with concrete input data and on concrete states, it is executed with arepresentation of all input data and on abstract states. Therefore, there is no need to determine representative input data, and repeat experiments using various inputs.

3 Conclusion

We have presented a new approach to algorithm explanation based on compile-time techniques used to automatically generate and visualize invariants. The example described in this paper is simple but representative for many similar programs operating on linked lists. Our approach involves several steps and only some of them are fully researched; others require additional work. The invariants are pre-computed by shape analysis, implemented in the TVLA system. This system provides a graphical output format using Graphviz, and supports a traversal strategy designed to visually execute the result of the analysis. There are two main area of the future research. Current analysis produces too many graphs for the users to traverse. Some of the graphs are redundant for the abstract execution and can be eliminated. In addition, we need to find appropriate visualizations for various kinds of nodes in shape graphs. Nevertheless, we believe that our research is very promising because it is the only known

approach which offers the following facilities:

• Automatic generation of invariants; based on observation properties

• Abstract execution that runs onallinput data

• Focusing on the relevant part of the data structure References

A.V. Aho, J.E. Hopcroft, and J. D. Ullman. Data Structures and Algorithms. Addison-Wesley, 1983.

T. Lev-Ami, T. Reps, M. Sagiv, and R. Wilhelm. Putting static analysis to work for verification: A case study. InIntl. Symp. on Software Testing and Analysis, pages 26–38, 2000.

M. Sagiv, T. Reps, and R. Wilhelm. Parametric shape-analysis via 3-valued logic. InProc. of 26th ACM SIGACT-SIGPLAN Symposium on Priciples of Programming Languages, San Antonio, Texas, 1999.

M. Sagiv, T. Reps, and R. Wilhelm. Parametric shape analysis via 3-valued logic. ACM Transactions on Programming Languages and Systems, 24(3):217 – 298, 2002.

R. Wilhelm, T. M¨uldner, and R. Seidel. Algorithm explanation: Visualizing abstract states and invari-ants. In S. Diehl, editor,Software Visualization, LNCS, pages 381 – 394. Springer Verlag, 2002.

In document View of Proceedings of the Second Program Visualization Workshop, 2002 (Sider 27-32)