ARIZONA STATE UNIVERSITY CSE 310 — Data Structures and Algorithms — Fall 2021 Project designed and written by Dr. Violet R. Syrotiuk HW 07 Graph Programming There is no milestone for this project; the complete project is due Monday, 12/06/2021 This project involves algorithms on undirected graphs, represented using adjacency lists, to solve the in-band network telemetry (INT) problem. Note: This project must be completed individually, i.e., you must write all the code yourself. Your implementation must use C/C++ and ultimately your code for solving the INT problem will be evaluated for correctness using system which runs on a Linux system. All dynamic memory allocation for the adjacency lists, and any other data structures used in this project, must be done yourself, i.e., using either malloc() and free(), or new() and delete(). You may not use any external libraries to implement any part of this project, aside from the standard libraries for I/O and string functions (stdio.h, stdlib.h, string.h, and their equivalents in C++). If you are in doubt about what libraries you may use, ask for permission. By convention, your program should exit with a return value of zero to signal that all is well; various non-zero values signal abnormal situations. You must use a version control system as you develop your solution to this project and commit to it frequently. Your code repository must be private to prevent anyone from plagiarizing your work. In this project, your submission will require a snap shot of the commits you have made to demonstrate your work. The rest of this project description is organized as follows. §1 the problem of in-band network telemetry is described, along with an algorithm to find a single-path to solve it. The input format is described in §2, while the output format is given in §3. In §4 you will find some requirements on the design of your solution to this project. Finally, §5 provides submission instructions for this project. 1 The Problem of In-band Network Telemetry In computer networks, in-band network telemetry (INT) seeks to achieve high-precision monitoring of the network links (e.g., latency, packet drop rate, time-to-live, jitter, and bandwidth along a link) and also the routers (e.g., queue-depth of interfaces in a router). In this project, we will model a computer network as a connected undirected graph G = (V, E) with routers modelled as the vertices V ; there is an edge (u, v) ∈ E, where u, v ∈ V and u 6= v, if there is a communication link between routers u and v. In this project, we are interested solve the problem of in-band network telemetry by finding a single- path that traverses each edge of the graph once, though it is allowed to visit each vertex more than once. Unfortunately, a graph G has an Euler path if and only if it has at most two vertices of odd-degree. An Euler circuit is a path that starts and ends at the same vertex. As we’ll see, in order to handle an arbitrary connected graph we may need to traverse each edge more than once. 1.1 A Single-Path Solution to the Problem of In-band Network Telemetry At a high level, the steps to find a single-path to solve the in-band telemetry problem are given in Algorithm 1. Let’s step through the algorithm for the graph corresponding to a campus network in Figure 1. In order to guarantee the output is deterministic, lexicographic ordering is used in several steps. Be sure to pay attention to the ordering requirements to ensure your output matches the sample output. 1 Algorithm 1 Single-Path-INT(G,n) 1: // Input: A connected undirected graph G with n vertices; all edge weights are equal to one. 2: // Output: A single-path for collecting in-band network telemetry. 3: Find all odd-degree vertices, O 4: Use Floyd-Warshall to find the length of the shortest-path in G between each pair of vertices in O 5: Find a perfect matching M in O using a greedy algorithm 6: for each edge (u, v) ∈M do 7: Insert a virtual edge (u, v) into G with weight equal to the shortest-path length 8: end for 9: Find an Euler circuit in G including any virtual edges inserted 1.1.1 Find Odd Degree Vertices For the graph G in Figure 1, there are six vertices of odd degree, i.e., O = {3, 4, 5, 7, 14, 15}; these vertices are coloured red in the figure. Figure 1: A graph G corresponding to a cam- pus network. It has six vertices of odd degree, i.e., O = {3, 4, 5, 7, 14, 15} . Table 1: Output of Floyd-Warshall All Pairs Shortest Paths on G. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 0 1 1 2 2 2 3 4 3 3 3 3 4 4 4 2 1 0 1 1 1 2 3 4 2 2 2 3 4 4 4 3 1 1 0 2 2 1 2 3 2 2 3 2 3 3 3 4 2 1 2 0 1 2 3 4 1 2 2 3 4 4 4 5 2 1 2 1 0 1 2 3 2 1 1 2 3 3 3 6 2 2 1 2 1 0 1 2 1 1 2 1 2 2 2 7 3 3 2 3 2 1 0 1 2 2 3 2 1 1 1 8 4 4 3 4 3 2 1 0 3 3 4 3 2 2 1 9 3 2 2 1 2 1 2 3 0 2 3 2 3 3 3 10 3 2 2 2 1 1 2 3 2 0 2 2 3 3 3 11 3 2 3 2 1 2 3 4 3 2 0 1 4 4 4 12 3 3 2 3 2 1 2 3 2 2 1 0 3 3 3 13 4 4 3 4 3 2 1 2 3 3 4 3 0 1 2 14 4 4 3 4 3 2 1 2 3 3 4 3 1 0 1 15 4 4 3 4 3 2 1 1 3 3 4 3 2 1 0 1.1.2 Use the Floyd-Warshall Algorithm The Floyd-Warshall algorithm finds the length of the shortest-path between each pair of vertices in a graph; these are tabulated for the graph G in Figure 1 in Table 1. However, we’re only interested in the subset of odd-degree vertices of V , i.e., those in O; the shortest path between the vertices in O are indicated in boxes in Table 1 and extracted and placed in a separate table, Table 2. Figure 2: The fully connected graph on the six vertices in O. Table 2: Shortest path length in G between vertices in O. 3 4 5 7 14 15 3 0 2 2 2 3 3 4 2 0 1 3 4 4 5 2 1 0 2 1 1 7 2 3 2 0 1 1 14 3 4 3 1 0 1 15 3 4 3 1 1 0 1.1.3 Use a Greedy Algorithm to Find a Minimum Weight Perfect Matching View the vertices in O as the vertices of a fully connected undirected weighted graph G′ as depicted in Figure 2. The weights of the edges (u, v), where u, v ∈ O, correspond to the length of the shortest path between u and v in G; see Table 2. 2 For an undirected graph, a matching is a set of edges such that no two edges in the set are incident on the same vertex. A perfect matching is a matching in which every vertex is matched. We will use a greedy approach to find a minimum weight perfect matching of the fully connected graph on O. If p = |O|, is the number of vertices in the set O, then there are a total p×(p 1)2 edges in G′. First, order the edges (u, v), by their weight which is equal to the length of the shortest path between them in G. If there are ties, i.e., edges of the same weight, then sort the edges lexicographically. Given two edges (u, v) and (w, x), if u < w then (u, v) < (w, x); if u = w then (u, v) < (w, x) if v < x. Initialize the matching M = . Now that the edges are sorted, step through the edges one-by-one in order. If the edge e = (u, v) is not incident with any edge in M, then add e to the matching, i.e., M = M∪e. Otherwise discard the edge e. Following this greedy algorithm on the graph G′ sorts the edges of each length as follows. Edges of length 1: (4,5), (5, 14), (5,15), (7,14), (7,15), (14,15) Edges of length 2: (3,4), (3,5), (3,7), (5,7) Edges of length 3: (3,14), (3,15), (4,7) Edges of length 4: (4,14), (4,15) The edge (4, 5) is selected first and added into M, M = {(4, 5)}. The edges (4, 15) and (5, 15) are not added to M because each is incident with vertex 5. The edge (7, 14) is selected next and added into M, M = {(4, 5), (7, 14)}. The remaining edges of length 1, (7, 15) and (14, 15) are not added to M because each is incident with a vertex in edge (7, 14). All edges of length 2 are incident with a vertex already in M, as is edge (3, 14) of length 3. Edge (3, 15) of length 3 is added into M, M = {(4, 5), (7, 14), (3, 15)}. None of the remaining edges are added to M because each is incident with a vertex in an edge in M. At the conclusion of the greedy algorithm, the matching produced is M = {(4, 5), (7, 14), (3, 15)}. 1.1.4 Insert Virtual Edges into G For each edge in the greedy perfect matching, i.e., for each e = (u, v) ∈ M, insert a virtual edge into G with weight w equal to the length of the shortest path between u and v in G. Virtual edges must always be inserted into the adjacency lists for (u, v) after the original edge in G. Note that this may introduce parallel edges into the graph. Once the |M| edges are inserted into G, all vertices in G will have even degree, guaranteeing that an Euler tour exists; see Figure 3. 1.1.5 Find an Euler Circuit in G Figure 3: The graph G with virtual edges from M inserted. To find an Euler circuit in the graph G which may have virtual edges: 1. Start with an empty stack and an empty circuit. Start at vertex 1, i.e., let u = 1 be the current vertex. 2. If the current vertex has no more edges incident with it, pop the last edge (u, v) on the stack, append (v, u) to the circuit, and set u to the current vertex. Otherwise, i.e., the current u vertex has edges incident with it. Select the next edge incident with u, say (u, v), and push it onto the stack. Remove 3 (u, v) from G (or mark it as used), and let u = v. To guarantee the output format, it is crucial that edges be selected in lexicographic order, always selecting original edges before virtual edges. When a virtual edge is used, you must recover the shortest path in G corresponding to the virtual edge. 3. Repeat step 2 until the current vertex has no more edges incident with it and the stack is empty. Initially, the stack is empty, the circuit is empty, and u = 1. Vertex 1 has two edges incident with it. Select edge (1, 2) and push it onto the stack, and remove (1, 2) from G. Now, u = 2. At vertex 2, select edge (2, 3), push it onto the stack, and remove (2, 3) from G; update u = 3. At vertex 3, select edge (3, 1), push it onto the stack, and remove (3, 1) from G; update u = 1. Now vertex 1 has no more edges incident with it. Pop edge (3, 1) from the stack, append its reverse (1, 3) to the circuit, and set u = 3. The state of the circuit and stack are: Circuit: (1,3) Stack: (1,2), (2,3) At vertex 3, the next edge selected is (3,6) which is pushed onto the stack and removed from G. This process continues until we return to vertex 3, which has no more edges incident with it. At this point, the state of the circuit and stack are: Circuit: (1,3) Stack: (1,2), (2,3), (3,6), (6,5), (5,2), (2,4), (4,5), (5,4), (4,9), (9,6), (6,7), (7,8), (8,15), (15,3) Edges in red correspond to virtual edges. We now pop (15,3) from the stack. Because (15,3) is a virtual edge rather than appending (3,15) to the circuit, we must append the path of length 3 from 3 to 15 onto the circuit. (You should think about how to recover the path when you implement the Floyd-Warshall algorithm.) At this point, the state of the circuit and stack are: Circuit: (1,3), (3,6), (6,7), (7,15) Stack: (1,2), (2,3), (3,6), (6,5), (5,2), (2,4), (4,5), (5,4), (4,9), (9,6), (6,7), (7,8), (8,15) We resume the algorithm at vertex 15 and push a sequence of edges onto the stack returning back to vertex 15. The state of the circuit and stack are: Circuit: (1,3), (3,6), (6,7), (7,15) Stack: (1,2), (2,3), (3,6), (6,5), (5,2), (2,4), (4,5), (5,4), (4,9), (9,6), (6,7), (7,8), (8,15), (15,7), (7,13), (13,14), (14,7), (7,14), (14,15) We have returned to vertex 15 and it has no more edges incident with it. Hence we pop (14,15) from the stack and append (15,14) to the circuit. Vertex 14 also has no more edges incident with it so we pop the virtual edge (7,14) from the stack. Because the length of the virtual edge (7,14) is 1, (14,7) is simply appended to the circuit. Indeed, we keep popping the stack and appending to the circuit until we return to vertex 6, which still has edges incident with it. The state of the circuit and stack are now: Circuit: (1,3), (3,6), (6,7), (7,15), (15,14), (14,7), (7,14), (14,13), (13,7), (7,15), (15,8), (8,7), (7,6) Stack: (1,2), (2,3), (3,6), (6,5), (5,2), (2,4), (4,5), (5,4), (4,9), (9,6) At vertex 6, the next edge selected is (6,10). We push another sequence of edges onto the stack returning back to vertex 6. The state of the circuit and stack are now: Circuit: (1,3), (3,6), (6,7), (7,15), (15,14), (14,7), (7,14), (14,13), (13,7), (7,15), (15,8), (8,7), (7,6) Stack: (1,2), (2,3), (3,6), (6,5), (5,2), (2,4), (4,5), (5,4), (4,9), (9,6), (6,10), (10,5), (5,11), (11,12), (12,6) 4 At this point, there are no more edges incident with 6, so we begin popping edges from the stack and appending them to the circuit. Once again, when we encounter the virtual edge (5,4), it has length 1 so it is simply appended to the circuit. This time, we continue popping until the entire stack is empty, at which point the Euler circuit is found. Circuit: (1,3), (3,6), (6,7), (7,15), (15,14), (14,7), (7,14), (14,13), (13,7), (7,15), (15,8), (8,7), (7,6), (6,12), (12,11) (11,5), (5,10), (10,6), (6,9), (9,4), (4,5), (5,4), (4,2), (2,5), (5,6), (6,3), (3,2), (2,1) Stack: (empty) Note that because our original graph in this example contained vertices of odd degree, the Euler circuit traverses some edges more than once. (Technically, it is not an Euler circuit, it is a solution to what is known as the postman problem.) 2 Project Input You are to read in a connected undirected graph G = (V,E) from a text file whose name is specified as a command line parameter. Your executable, named int, is therefore invoked as: ./int where graph.txt is the name of the text file containing the input for the graph G = (V,E). The format of the input file containing the graph is as follows. The first line in the file contains two positive integers: n m where n = |V | is the number of vertices in G, and m = |E| is the number of edges in G. Each of the next m lines has two space separated fields of the form: u v where u and v are two integers in the interval [1,n] denoting the vertices that are the endpoints of the edge. Because the graph is undirected, each line in the file represents an edge connecting vertex u and vertex v; for this problem we assume that the weight of each edge is equal to one. You must represent G using adjacency lists. Therefore, when you read the value of n, you must initialize an array of pointers of size n for the adjacency list for each vertex. (You are welcome to define a structure for each vertex, e.g., you might want to keep track of the vertex degree in addition to its adjacency list. If you do so, you will instead initialize an array of structures of size n.) As you read each edge, you must insert it into the appropriate adjacency list. If the graph is undirected, then the edge is inserted into both vertex u’s and vertex v’s adjacency list. You must keep the adjacency list in lexicographic order to ensure that the solution you obtain is unique. 3 Project Output Given graph G = (V,E) as input, you are required to produce output at each of the four major stages of Algorithm 1: 1. Output a list of all odd-degree vertices, O. 2. Output of results of the Floyd-Warshall algorithm for each pair of vertices in O. 3. Output the greedy perfect matching M. 4. Output the Euler circuit in G with any virtual edges. Write all output to stdout so that it may be redirected to a file, if desired. In more detail, list of odd-degree vertices in sorted order, separated by spaces: 5 The odd-degree vertices in G: O = { list in sorted order } If O is the empty set then the list consists of the empty set, i.e., {}. For the graph in Figure 1, the output is: The odd-degree vertices in G: O = { 3 4 5 7 14 15 } The output of the Floyd-Warshall algorithm should look similar to Table 2. Control the width of each integer to a field width of 3. For the graph in Figure 1 the output is: Results of Floyd-Warshall on O: | 3 4 5 7 14 15 --- -+- --- --- --- --- --- --- 3 | 0 2 2 2 3 3 4 | 2 0 1 3 4 4 5 | 2 1 0 2 1 1 7 | 2 3 2 0 1 1 14 | 3 4 3 1 0 1 15 | 3 4 3 1 1 0 The output of the greedy perfect matching is a list of the edges in the matching, M. The greedy perfect matching in O: M = { list of edges in the order inserted } If the matching M is empty then the list consists of the empty set, i.e., {}. For the graph in Figure 1, the output is: The greedy perfect matching in O: M = { (4,5) (7,14) (3,15) } Finally, for the Euler circuit, after printing the header, list each edge on a separate line, tab indented. For the graph in Figure 1, the output for the Euler circuit, with most edges suppressed due to length, is: The Euler circuit in G with virtual edges is: (1,3) (3,6) ... (2,1) See the text file campus-out.txt for the complete output for the graph in Figure 1. 4 Project Design Your solution to HW07 must read in a graph in the format described in §2, solve the in-band network telemetry problem using Algorithm 1, and produce output in the format described in §3. You have flexibility in the data structures you use to solve this project except that the graph must be represented using adjacency lists. Your project must use a modular design, i.e., you must provide at least the following source modules: 1. a main program, which coordinates all other modules; 2. a module that provides any utility services; 3. a module that implements the graph data structure, an algorithm to find the odd-degree vertices, an implementation of the Floyd-Warshall algorithm, the algorithm to find a greedy perfect matching, and the algorithm to find the Euler circuit, described in this project; 6 4. a module that implements a stack (you must implement the stack by yourself); 5. a makefile that compiles all modules and links them into the executable named int, for in-band network telemetry. For each module other than the main program, you should have a header file which specifies the data structures and the prototypes of the functions in the module, and a source file which implements all of the functions specified in the header file. 5 Submission Instructions and Grading Policies Submissions are always due before 11:59pm on the deadline date. Do not expect the clock on your machine to be synchronized with the one on Canvas! This project is due on Monday, 12/06/2021. It is your responsibility to submit your project well before the time deadline!!! Late projects are not accepted. An unlimited number of submissions are allowed. The last submission will be graded. 5.1 Requirements for Deadline Before 11:59pm on Monday, 12/06/2021, submit a zip1 file containing the following to Canvas: Project State (10%): In a folder (directory) named State provide a brief report named state.pdf that addresses the following IN THIS ORDER: 1. Describe any problems encountered in your implementation for this project. (1%) 2. Describe any known bugs and/or incomplete query implementation for this project. (1%) 3. While this project is to be complete individually, describe any significant interactions with anyone (peers or otherwise) that may have occurred. (1%) 4. Cite any external books, and/or websites used or referenced. (1%) 5. A screen shot from your version control system showing commits over the development period of this project. (1%) 6. Describe your design decisions, i.e., the choice of data structures and algorithms you have used in solving this project. For example: Did you build a structure for the adjacency lists How did you compute the path between virtual edges (5%) Implementation (40%): In a folder (directory) named Code provide: 1. Your well documented C/C++ source code files using a modular design as described in §4. 2. A file named makefile (with no file extension!) that compiles and links the individual executa- bles into a single executable named int. Be sure to test that your makefile successfully produces the executable named int!) Correctness (50%): In a folder (directory) Your program will be compiled and graded on general.asu.edu If your program does not compile and execute, you will receive zero for correct-ness. Assuming your submission compiles, the correctness of your program will be determined by running it with input that adheres to the specified format, half of which will be provided to you on Canvas prior to the deadline for testing purposes. We use the Linux command diff to compare your output files to the expected output file. 1Use only the zip archiving utility 7 25 points: You will earn 5 points for each of the 5 test cases posted in advance of the deadline that your program passes. 25 points: You will earn 5 points for each of the 5 test cases not posted in advance of the deadline that your program passes. 8