Topics ● Getting Started and Basics ● Threadpool Design ● Logistics ○ Grading ○ Test Driver ○ Scoreboard ● Debugging ● Advice Getting Started and Basics 1. One member will fork the base repository: https://git.cs.vt.edu/cs3214-staff/threadlab 2. Invite partner to collaborate – Go to Settings > Members to add them – Check partner role permissions too 3. Both members will clone the forked repository on their machines: 4. IMPORTANT: Set forked repository to private – Go to Settings > General > Visibility, project features, permissions – Potential Honor Code Violation if not set to private First Step! $ git clone .git *Your forked repository will have a navigation menu on the left side. Click under Settings to add members and set repo to private threadlab Some Basics ● Thread ○ A single sequential flow within a program ○ A process can have multiple threads ● Threadpool ○ Container of idle threads that wait to do work for an external program ○ Useful for adding concurrency to programs without having to add locking semantics ○ Acts as an API for external programs or clients to use Basic Illustration Source: https://www.classes.cs.uchicago.edu/archive/2016/spring/12300-1/pa3.html Where you come in… ● You will create your own Threadpool API that external programs will call ● What do we write ○ threadpool.c- ○ Forward provided functions and structs from -threadpool.h- and implement them ○ Any other helper functions you create should be declared static Functions you will implement struct thread_pool * thread_pool_new(int nthreads); void thread_pool_shutdown_and_destroy(struct thread_pool *); struct future * thread_pool_submit(struct thread_pool *pool, fork_join_task_t task, void * data) void * future_get(struct future *); void future_free(struct future *); ● Read over threadpool.h for full documentation: you must implement these functions! ● Not included are static function(s) you’ll add to threadpool.c ● What is a future thread_pool_new ● Create thread pool ● Initialize worker threads ● Call pthread_create: starts a new thread in the calling process. The new thread starts execution by invoking start_routine(); arg is passed as the argument of start_routine() Futures ● When referred in this project, a future is an instance of a task that you must execute ● From the client’s perspective, a future is a promise that is returned once they submit a task for work struct future: struct future { fork_join_task_t task; // typedef of a function pointer type that you will execute void* args; // the data from thread_pool_submit void* result; // will store task result once it completes execution … // may also need synchronization primitives (mutexes, semaphores, etc) }; Futures (cont.) ● You will invoke “task” as a method, it represents the method passed through by thread_pool_submit, the return value gets stored into the result Future Illustration Task ———– ———– Task ———– ———– Task ———– ———- KEY: 1. Client submits task to threadpool’s global queue 2. Client immediately receives a future 3. A worker thread snags the submitted task to their local deque for work 4. Client demands to get the completed task affiliated with future 5. The completed task is returned, client frees the future 1. 2. 3. 4. 5. struct thread_pool ● Should contain any state you need for a threadpool ● Ideas: ○ Locks (pthread_mutex_t) ○ Queues/Deques (use provided list struct from previous project) ○ Semaphores / Conditional Variables (your choice) ○ Etc. ● Need a way to protect global queue and a shutdown flag (like ex3) Threadpool Design Threadpool Design ● Methodologies ○ Split up tasks among n workers ○ Internal and External Submissions ○ Work Stealing / Work Sharing ○ Work Helping ● No global variables! (exception of thread-local variables) Internal vs External Task Submissions ● External Submission ○ The client submits a new task for the threadpool ○ Task gets added to the global queue ● Internal Submission ○ When the thread submits a subtask ○ This “subtask” will get added to worker’s local deque – Worker executes it later – Or a co-worker steals the task to execute itself ● For thread_pool_submit you’ll need to distinguish these cases ○ But how Mergesort sort(A[0..64]) sort(A[0..32]) sort(A[0..16]) sort(A[16..32]) sort(A[32..48]) sort(A[48..64]) sort(A[32..64]) Example mergesort_parallel(int *array, int N) { int * tmp = malloc(sizeof(int) * (N)); struct msort_task root = { .left = 0, .right = N-1, .array = array, .tmp = tmp }; struct thread_pool * threadpool = thread_pool_new(nthreads); //EXTERNAL submission from client struct future * top = thread_pool_submit(threadpool, //internal function (fork_join_task_t) mergesort_internal_parallel, &root); //demands answer once it’s ready future_get(top); future_free(top); thread_pool_shutdown_and_destroy(threadpool); free (tmp); } * Check out mergesort.c to see full functions Example (part 2) static void mergesort_internal_parallel(struct thread_pool * threadpool, struct msort_task * s) { //If array small, no more submitting just internal sort (BASE CASE) if (right – left <= min_task_size) { mergesort_internal(array, tmp+left, left, right); } ... not all code shown .... //INTERNAL Submission from the worker thread struct future * lhalf = thread_pool_submit(threadpool, (fork_join_task_t) mergesort_internal_parallel, &mleft); //Worker thread works on other half mergesort_internal_parallel(threadpool, &mright); future_get(lhalf); future_free(lhalf); merge(array, tmp, left, left, m, right); } } Work Stealing ● Global list of tasks ● Local lists of tasks for each worker ● Worker main loop: ○ Do I have tasks Pop from front (LIFO) ○ Are there global tasks Pop from back (FIFO) ○ Does anyone else have tasks Pop from back (FIFO) ● Idle threads spread out the work evenly ● Each queue/deque has its own synchronization Work Sharing ● Single, central queue from which all threads remove tasks ● Drawback: queue can become a point of contention especially with handling small tasks Design Advice ● You are asked to implement the work stealing thread pool design ○ Better load balancing ○ Lower synchronization requirements ● However, you can implement work sharing for only 80% credit (not recommended) Work Helping ● In future_get, you can only return the result once the future is done executing ● The task might not be completed when future_get is called (or even running) ● Consider cases for getting the result from future_get: ○ If future already executed -> Hurray! ○ But what happens if the future isn’t ready ■ What should the thread do while waiting Task ———– ———– Task ———– ———– Work Helping (cont.) ● Threads should make best use of their time ○ Minimize threads sleeping ○ Maximize threads executing tasks ● If no threads are executing a task you depend on, do it yourself ● If a thread is executing the task you depend on May be beneficial to execute other tasks instead of waiting Thread Local Variables ● Want to be able to access your workers deque (and probably locks) during thread_pool_submit() ● How can we distinguish external/internal submissions Thread Local Variables ● Naive approach would be to loop through workers and check pthread_self(),… ● Instead, use some variable which would be different for each thread ○ AKA thread-local variables/storage Logistics Grading ● Submit code that compiles ● Test using the driver before submitting ● When grading, tests will be run 3-5 times, if you crash a single time it’s considered failing ● Benchmarked times will be the average of the 3-5 runs, assuming you pass all of them Grading (cont.) ● Breakdown ○ Git Usage ○ Functionality Tests ○ Performance ● GTAs will determine the exact breakdown of points ● Performance will breakdown into rough categories based on real time scores ● You must pass the basic tests before getting anything for performance, there’s a flag in the driver that indicates if you meet minimum requirements Performance ● Relative to peers and sample implementations ● Points only for the tests on the scoreboard ○ N Queens, Mergesort, Quicksort (8, 16, and 32 threads), possibly Fibonacci ● A rough cutoff for real time benchmarks will be posted later on by Dr. Back (last year’s scoreboard) Performance ONLY for reference – numbers will likely change Test Driver ● Can take a long time to run all tests ● Reports if you passed each test, and times for the benchmarked ones $ ~cs3214/bin/fjdriver.py [options] Performance Visual Studio Code Terminal Issues ● Use a seperate terminal (like git bash) to run the tests ● VS Code spins up some processes on rlogin to manage file synchronization or something, they interfere with the somewhat strict thread limits we enforce on the tests to guarantee your thread pool isn’t creating additional workers to juice performance numbers Test Driver ● Make sure to run tests multiple times, race conditions can cause you to crash only 20% of the time ● Will run multiple times to ensure consistency when grading (and get a good average for times) ● All of the tests are C programs, compiled against your threadpool Test Driver ● Runs the tests 5 times and averages the results ● Helpful to simulate grading environment $ ~cs3214/bin/fjdriver.py -g -B 5 Scoreboard ● https://courses.cs.vt.edu/~cs3214/fall2021/#!/p2scoreboard ● You can post your results to the scoreboard by using the fjpostresults.py script Debugging Tools Debugging ● Debugging multi-threaded programs can be difficult ○ Don’t just use printf() ● This project will challenge you in your debugging skills (GDB, Helgrind..) ● Helgrind** ○ Valgrind tool ○ Enable using –tool=helgrind in Valgrind command line ○ Your best friend for tracing deadlocks and synchronization errors ○ https://www.valgrind.org/docs/manual/hg-manual.html/ Helpful GDB Commands General Advice ● Start Early (…now) ● How many lines of code ○ ~250-300 lines (may vary, not a good benchmark for difficulty) ● Most time is spent debugging ○ GDB, Helgrind, and Valgrind are your friends ○ Debugging multi-threaded programs is difficult and time consuming ● Try different strategies ○ Most of the learning is trying out different approaches – telling you exactly what would give the best results would reduce the educational experience Any Questions Good Luck!