LMP1: I/O and Filesystems ========================= Welcome to LMP1, the first long MP. LMP1 is the first stage of a project aimed at creating a simple yet functional networked filesystem. In this MP, you will learn about and use POSIX file system calls, while subsequent LMPs will introduce memory management, messaging, and networking functionality. If you implement all parts of this MP correctly, you will be able to reuse your code for future MPs. This first LMP concentrates on the file I/O portion of the project. Specifically, you will implement a custom filesystem and test its performance using a filesystem benchmark. A benchmark is an application used to test the performance of some aspect of the system. We will be using Bonnie, a real filesystem benchmark, to test various performance aspects of the filesystem we implement. LMP1 consists of four steps: 1. Read the code; run the Bonnie benchmark and the LMP1 test suite. 2. Implement Test Suite 1 functionality, encompassing basic file I/O operations. 3. Implement Test Suite 2-4 functionality (directory operations, file creation/deletion, and recursive checksumming). 4. Modify Bonnie to use your client-server file I/O methods. Code structure The code for this project is structured according to the client-server model. The client code (filesystem benchmark) will interact with the server (filesystem) only through interface functions defined in fileio.h: int file_read(char *path, int offset, void *buffer, size_t bufbytes); int file_info(char *path, void *buffer, size_t bufbytes); int file_write(char *path, int offset, void *buffer, size_t bufbytes); int file_create(char *path,char *pattern, int repeatcount); int file_remove(char *path); int dir_create(char *path); int dir_list(char *path,void *buffer, size_t bufbytes); int file_checksum(char *path); int dir_checksum(char *path); These functions represent a simple interface to our filesystem. In Steps 2 and 3 of this MP, you will write the code for functions implementing this interface, replacing the stub code in fileio.c. In Step 4, you will modify a Bonnie method to use this interface, rather than calling the normal POSIX I/O functions directly. The purpose of Step 4 is to help test our implementation. Step 1: Understanding the code 1. Compile the project, execute Bonnie and the test framework. Note: you may need to add execute permissions to the .sh files using the command chmod +x *.sh. Try the following: make ./lmp1 (this runs the Bonnie benchmark it may take a little while) ./lmp1 -test suite1 (run Test Suite 1 this has to work for stage1) make test (run all tests this has to work for stage2) 2. Read through the provided .c and .h files and understand how this project is organized: bonnie.c a version of the filesystem benchmark fileio.c file I/O functions to be implemented fileio.h declaration of file I/O functions restart.c restart library (available for use in fileio.c) restart.h declaration of restart library functions util.c useful utility functions util.h declaration of useful utility functions and macros In particular, pay close attention to the comments in fileio.h and bonnieb.c. You should understand what each of the following functions in bonnie.c does before undertaking the remainder of the MP: fill_file_char() file_read_rewrite() file_read_rewrite_block() fill_file_block() fill_read_getc() file_read_chunk() newfile() Step 2: Basic I/O operations - Implement file_read, file_write and file_info operations in fileio.c. If done correctly, your code should pass all suite1 tests: ./lmp1 -test suite1 Running tests 1.read ::pass 2.info ::pass 3.write ::pass Test Results:3 tests,3 passed,0 failed. IMPORTANT: fileio.c is the only file you should modify for this step. Step 3: More filesystem operations - Implement file and directory operations for suite2 (dir_create and dir_list), suite3 (file_create and file_remove), and suite4 (file_checksum and dir_checksum). You can test each operation and test suite individually: ./lmp1 -test dirlist ./lmp1 -test suite2 All tests should now pass: make test Running tests 1.read ::pass 2.info ::pass 3.write ::pass 4.dirlist ::pass 5.dircreate ::pass 6.remove ::pass 7.create ::pass 8.filechecksum ::pass 9.dirchecksum ::pass Test Results:9 tests,9 passed,0 failed. Step 4: Performance testing In this step, we will change parts of Bonnie to use our filesystem interface. Make the function file_read_rewrite_block() in bonnie.c to call your fileio.c functions instead of POSIX I/O operations. When answering the questions below, use this modified version of bonnie.c. Before making this change, its a good idea to write pseudocode comments for what each part of file_read_rewrite_block() does, so that you understand the code and can perform the change correctly. There may not be an exact one-to-one correspondence between our filesystem interface and the POSIX commands. Note: In future LMPs, we will be using the fileio.h interface in a similar manner, but we will call the functions remotely, via a message queue. Questions Q1. Briefly explain what the following code from bonnie.c does: if ((words = read(fd, (char *) buf, Chunk)) == -1) Q2. Is the above an example of a block read or a character read? What is the value of the variable words if the read succeeds? Fails? Q3. Explain the meaning of the flag value (O_CREAT | O_WRONLY | O_APPEND) for the POSIX function open(). Q4. Run Bonnie. What is being measured by each test function? Q5. Look at the summary results from the Bonnie run in Q4. Does Bonnie measure latency, throughput or something else? Justify your answer. Q6. Compare character reads with block reads using Bonnie. Which is faster? Why do you think this is the case? Q7. Copy and paste the performance measures output when running Bonnie benchmarks in a local directory and again in an NFS-mounted directory. Is one kind of disk access noticeably slower over the network, or are all tests significantly slower? Your home directory may be an NFS mount, whereas /tmp and /scratch are local disks. To test your code in /tmp, do the following: mkdir /tmp/your_username cp lmp1 /tmp/your_username cd /tmp/your_username ./lmp1 (record the output) cd rm -fr /tmp/your_username Q8. How does Bonnie handle incomplete reads, e.g., due to interruptions from signals? Justify why Bonnies approach is good or bad for a filesystem benchmark program. Q9. By now you should be very familiar with the self-evaluation test harness we provide for the MPs. Examine the function test_file_read() in lmp1_tests.c, which tests your file_read() function from Step 2. What does this test check for, specifically? You may want to copy and paste the code for this function in your answer, and annotate each quit_if or group of related quit_ifs with a comment. LMP2: Memory Management ======================= This machine problem will focus on memory. You will implement your own version of malloc() and free(), using a variety of allocation strategies. You will be implementing a memory manager for a block of memory. You will implement routines for allocating and deallocating memory, and keeping track of what memory is in use. You will implement four strategies for selecting in which block to place a new requested memory black: 1) First-fit: select the first suitable block with smallest address. 2) Best-fit: select the smallest suitable block. 3) Worst-fit: select the largest suitable block. 4) Next-fit: select the first suitable block after the last block allocated (with wraparound from end to beginning). Here, suitable means free, and large enough to fit the new data. Here are the functions you will need to implement: initmem(): Initialize memory structures. mymalloc(): Like malloc(), this allocates a new block of memory. myfree(): Like free(), this deallocates a block of memory. mem_holes(): How many free blocks are in memory? mem_allocated(): How much memory is currently allocated? mem_free(): How much memory is NOT allocated? mem_largest_free(): How large is the largest free block? mem_small_free(): How many small unallocated blocks are currently in memory? mem_is_alloc(): Is a particular byte allocated or not? We have given you a structure to use to implement these functions. It is a doubly-linked list of blocks in memory (both allocated and free blocks). Every malloc and free can create new blocks, or combine existing blocks. You may modify this structure, or even use a different one entirely. However, do not change function prototypes or files other than mymem.c. IMPORTANT NOTE: Regardless of how you implement memory management, make sure that there are no adjacent free blocks. Any such blocks should be merged into one large block. We have also given you a few functions to help you monitor what happens when you call your functions. Most important is the try_mymem() function. If you run your code with mem -try , it will call this function, which you can use to demonstrate the effects of your memory operations. These functions have no effect on test code, so use them to your advantage. Running your code: After running make, run 1) mem to see the available tests and strategies. 2) mem -test to test your code with our tests. 3) mem -try to run your code with your own tests (the try_mymem function). You can also use make test and make stage1-test for testing. make stage1-test only runs the tests relevant to stage 1. As in previous MPs, running mem -test -f0 will allow tests to run even after previous tests have failed. Similarly, using all for a test or strategy name runs all of the tests or strategies. Note that if all is selected as the strategy, the 4 tests are shown as one. One of the tests, stress, runs an assortment of randomized tests on each strategy. The results of the tests are placed in tests.out . You may want to view this file to see the relative performance of each strategy. Stage 1 - Implement all the above functions, for the first-fit strategy. Use mem -test all first to test your implementation. Stage 2 - A) Implement the other three strategies: worst-fit, best-fit, and next-fit. The strategy is passed to initmem(), and stored in the global variable myStrategy. Some of your functions will need to check this variable to implement the correct strategy. You can test your code with mem -test all worst, etc., or test all 4 together with mem -test all all. The latter command does not test the strategies separately; your code passes the test only if all four strategies pass. Questions ========= 1) Why is it so important that adjacent free blocks not be left as such? What would happen if they were permitted? 2) Which function(s) need to be concerned about adjacent free blocks? 3) Name one advantage of each strategy. 4) Run the stress test on all strategies, and look at the results (tests.out). What is the significance of Average largest free block? Which strategy generally has the best performance in this metric? Why do you think this is? 5) In the stress test results (see Question 4), what is the significance of Average number of small blocks? Which strategy generally has the best performance in this metric? Why do you think this is? 6) Eventually, the many mallocs and frees produces many small blocks scattered across the memory pool. There may be enough space to allocate a new block, but not in one place. It is possible to compact the memory, so all the free blocks are moved to one large free block. How would you implement this in the system you have built? 7) If you did implement memory compaction, what changes would you need to make in how such a system is invoked (i.e. from a users perspective)? 8) How would you use the system you have built to implement realloc? (Brief explanation; no code) 9) Which function(s) need to know which strategy is being used? Briefly explain why this/these and not others. 10) Give one advantage of implementing memory management using a linked list over a bit array, where every bit tells whether its corresponding byte is