JAVA-COMP2207

1 Distributed file system coursework Leonardo Aniello and Kirk Martinez Course: COMP2207 Document version: 1.0 Mar 26, 2021 1 Introduction In this coursework you will build a distributed storage system. This will involve knowledge of Java, networking and distributed systems. The system has one Controller and N Data Stores (Dstores). It supports multiple concurrent clients sending store, load, list, remove requests. Each file is replicated R times over different Dstores. Files are stored by the Dstores, the Controller orchestrates client requests and maintains an index with the allocation of files to Dstores, as well as the size of each stored file. The client actually gets the files directly from Dstores – which makes it very scalable. For simplicity all these processes will be on the same machine but the principles are similar to a system distributed over several servers. Files in the distributed storage are not organised in folders and sub-folders. Filenames do not contain spaces. The Controller is started first, with R as an argument. It waits for Dstores to join the datastore (see Rebalance operation). The Controller does not serve any client request until at least R Dstores have joined the system. As Dstores may fail and be restarted, and new Dstores can join the datastore at runtime, rebalance operations are required to make sure each file is replicated R times and files are distributed evenly over the Dstores. 2 2 Networking The modules will communicate with each other via TCP connections. Because they could be on the same machine for testing, the datastores will listen on different ports. 3 Code development Only use Java openjdk-14-jdk-headless, on Linux/Unix. Do not use Windows. The code must be testable and not depend on any IDE directory structure/config files. Command line parameters to start up the system: Controller: java Controller cport R timeout rebalance_period A Dstore: java Dstore port cport timeout file_folder A client: java Client cport timeout Where the Controller is given a port to listen on (cport), a replication factor (R), a timeout in milliseconds (timeout) and how long to wait (in milliseconds) to start the next rebalance operation (rebalance_period). The Dstore is started with the port to listen on (port) and the controller’s port to talk to (cport), timeout in milliseconds (timeout) and where to store the data locally (file_folder). Each Dstore should use a different path and port so they don’t clash. The client is started with the controller port to communicate with it (cport) and a timeout in milliseconds (timeout). 3 Store operation Client -> Controller: STORE filename Controller o updates index, “store in progress” o Selects R Dstores, their endpoints are port1, port2, …, portR o Controller -> Client: STORE_TO port1 port2 … portR For each Dstore i o Client->Dstore i: STORE filename filesize o Dstore i -> Client: ACK o Client->Dstore i: file_content o Once Dstore i finishes storing the file, Dstore i -> Controller: STORE_ACK filename Once Controller received all acks o updates index, “store complete” o Controller -> Client: STORE_COMPLETE Failure Handling Malformed message received by Controller o Log the error and continue Malformed message received by Client o Log the error and continue Malformed message received by Dstore o Log the error and continue If filename already exists in the index o Controller->Client: ERROR ALREADY_EXISTS filename Client cannot connect or send data to all R Dstores, or the Controller does not receive R acks within a timeout o Log the error and continue; future rebalances will sort things out 4 Load operation Client -> Controller: LOAD filename Controller selects one the R Dstores that stores that file, let port be its endpoint Controller->Client: LOAD_FROM port filesize Client -> Dstore: LOAD_DATA filename Dstore -> Client: file_content Failure Handling Malformed message received by Controller o Log the error and continue Malformed message received by Client o Log the error and continue Malformed message received by Dstore o Log the error and continue If file does not exist in the index o Controller -> Client: ERROR DOES_NOT_EXIST If Client cannot connect to or receive data from Dstore o Client -> Controller: RELOAD filename o Controller selects a different Dstore with endpoint port’ o Controller->Client: LOAD_FROM port’ filesize o If Client cannot connect to or receive data from any of the R Dstores Controller->Client: ERROR LOAD If Dstore does not have the requested file o Dstore -> Client: ERROR DOES_NOT_EXIST 5 Remove operation Client -> Controller: REMOVE filename Controller updates index, “remove in progress” For each Dstore i storing filename o Controller->Dstore i: REMOVE filename o Once Dstore i finishes removing the file, Dstore i -> Controller: REMOVE_ACK filename Once Controller received all acks o updates index, “remove complete” o Controller -> Client: REMOVE_COMPLETE Failure Handling Malformed message received by Controller o Log the error and continue Malformed message received by Client o Log the error and continue Malformed message received by Dstore o Log the error and continue If filename does not exist in the index o Controller->Client: ERROR DOES_NOT_EXIST Controller cannot connect to some Dstore o Controller logs error and the protocol continues If Dstore does not have the requested file o Dstore -> Controller: ERROR DOES_NOT_EXIST filename List operation Client->Controller: LIST Controller->Client: file_list Failure Handling Malformed message received by Controller o Log the error and continue Malformed message received by Client o Log the error and continue 6 Storage Rebalance operation This operation is started periodically by the Controller (e.g. based on the rebalance_period argument) and when a new Dstore joins the data store. In the latter case, this is the message (where port is the endpoint of the new Dstore) Dstore -> Controller: JOIN port For each Dstore i o Controller -> Dstore i: LIST o Dstore i -> Controller: file_list Controller revises file allocation to ensure (i) each file is replicated over R Dstores, and (ii) files are evenly stored among Dstores1 Controller produces for each Dstore i a pair (files_to_send, files_to_remove), where o files_to_send is a list of entries, each entry in the form (filename, dstores), where dstores is the list of endpoints of Dstores to send filename to o files_to_remove is the list of filenames to remove For each Dstore i o Controller->Dstore i: REBALANCE files_to_send files_to_remove Example Assume that file f1 needs to be sent to Dstores p1 and p2 file f2 needs to be sent to Dstore p3 file f2 needs to be removed file f3 needs to be removed REBALANCE 2 f1 2 p1 p2 f2 1 p3 2 f2 f3 o Dstore i will send required files to other Dstores, e.g., to send a file to Dstore j Dstore i -> Dstore j: STORE filename filesize Dstore j -> Dstore i: ACK Dstore i -> Dstore j: file_content o Dstore i will remove specified files o When rebalance is completed Dstore i -> Controller: REBALANCE COMPLETE Failure Handling Malformed message received by Controller o Log the error and continue Malformed message received by Dstore o Log the error and continue Controller does not receive REBALANCE COMPLETE from a Dstore within a timeout o Storage rebalance executed again from the beginning 7 4 Logging Each process (i.e., Controller, Dstore, Client) appends the network messages and log events mentioned above to a textual log file. Each time a process is started, a new log file is created. Java classes will be provided to manage logging. 5 Submission Requirements Your submission should include the following files: – Controller.java Controller.class – Dstore.java Dstore.class As well as all the additional .java files you developed (and corresponding .class files) These files should be contained in a single zip file called username.zip There should be no package structure to your java code When extracted from the zip file, the files should be located in the current directory. These will be executed at the Linux command line by us for automatic testing 6 Marking Scheme You are asked to implement a Controller and Dstores. You will be given the client, as an obfuscated jar. The client allows the execution of operations via a terminal. You will also be given some Java classes for the logging (to file). >= 40% o The datastore works in compliance with the protocol and correctly serves sequential requests from a single client >= 50% o In addition to the criteria above o Each file is replicated R times o Files are evenly spread over the Dstores (only when stored, not when Dstores fail or new Dstores join the datastore) >= 60% o In addition to the criteria above o The datastore correctly serves concurrent requests from more clients >= 70% o In addition to the criteria above o The datastore correctly tolerates the failure of one Dstore >= 80% o In addition to the criteria above o The datastore correctly tolerates the failure of up to N-R Dstores >= 90% o In addition to the criteria above o Files are evenly spread over the Dstores despite Dstores failing and new Dstores joining the datastore 8 7 Code development suggestions There are various things to develop step-by-step. This includes making TCP connections and passing data to/from, implementing timeouts for when the communication is broken, and so on. This is a good place to start. Draw an outline of your system to keep track of the functionality/code structure Use techniques you tested from the Java sockets worksheet. for the Controller you can start by making it accept connections Avoid multithreading until you are ready for it Write a simple log(message) function which prints the timestamp and message into a logfile. Parse incoming commands from the client one at a time, with error codes for failures. Send commands to a simplified Dstore (e.g. LIST) Work with just the Dstore to be able to save and read files Progressively add the features such as delete and allocating files to Dstores Test progressively so you know each area works and can return errors. Finally write the rebalance operations