Implement a dynamic thread pool that provides good performance (superior to thread-per-connection) for short queries and many threads, and that does not block on long queries.
High-level overview
===================
MariaDB 5.5 will implement a dynamic threadpool. There was already a threadpool starting from 5.1; the new one is a complete reimplementation of it. Its most distinguishing feature is that the new pool adapts itself to the load: the size of the pool grows when there is load and shrinks when there is no work.
For short queries and multiple concurrent clients, the number of threads that are CPU-active will usually be smaller than with the one-thread-per-connection method, but we still aim to load the CPUs 100%. The benefits of a smaller number of concurrently running threads are reduced lock contention inside the database server and reduced context switching.

High-level implementation
=========================
Ideally, all operating systems we support would provide thread pooling facilities, including asynchronous IO integration. We could then rely on the OS kernel to size the pool: it knows best how many threads should run at any given point, when a thread is running and when it is sleeping, and whether it makes sense to create or activate another thread from the pool. Such a facility makes the implementation a lot smaller and more robust. In practice, however, only Windows and recent versions of OSX provide an integrated threadpool. The first version in MariaDB will have two different threadpool implementations: one based on the Windows threadpool API, and one generic for Unix systems that provide scalable IO multiplexing primitives (that is, Linux with epoll, BSD/OSX with kevent, and Solaris with event ports). It would be great to have an implementation based on OSX's native libdispatch, but we'll have to delay this to some later point, until we get a solid Mac system to test it.
So why is there a separate implementation for Windows?
-Most importantly, it aligns with the goal to provide the best implementation for each platform.
-It would be rather awkward to port Windows asynchronous IO to the poll-like, edge-triggered Unix poll() variants. It would also be very awkward, if not impossible, to port shared memory connections to those APIs (named pipes could be a bit simpler though). Shared memory is easy with the Windows threadpool, since it provides a way to wait asynchronously for anything: not just for IO completions but also, for example, for event handles to be signaled.

What needs to be implemented?
=============================
We implement a thread scheduler. There is already an existing framework for it, with a couple of callback functions to implement: scheduler initialization and destruction, the add_connection() callback that is called when a new connection is about to log in, and the wait_begin() and wait_end() callbacks introduced in MySQL 5.5 that are called whenever a thread has to wait. There are also a couple of other optional callbacks that we won't implement at all.

Functionality common to the threadpool implementations:
Will be implemented in sql/threadpool_common.cc. We need just 3 functions here
- threadpool_add_connection(THD*): connection login (authentication handshake, initialization of connection structures)
- threadpool_remove_connection(THD*): called on any error or whenever the client logs out (sends QUIT). Closes the connection.
- threadpool_process_request(THD*): processes a single query
Each of these functions additionally takes care of setting/resetting thread local storage variables for the processing thread (thd->mysys_var, PSI thread, DBUG structures)

Functionality specific to each threadpool implementation
-"Posting work" to the threadpool. Currently used to offload connection login to the worker threads (see below)
-Starting an asynchronous read or an asynchronous wait for socket read readiness. This is used whenever we wait for the client to send queries.

The life of a connection, high-level perspective
================================================
-Client logs in.
The "acceptor" thread accepts the connection and calls the add_connection scheduler callback.

add_connection() posts the connection to the thread pool. A worker thread calls threadpool_add_connection(). The purpose of offloading authentication and thread initialization to a worker is to avoid blocking the single acceptor thread: otherwise an unresponsive client would block it. After login is finished, the threadpool starts an asynchronous read on the connection's socket.

-Client issues a query, or client dies, or connectivity is lost. Async IO comes back (or read readiness is signaled) and one of the worker threads handles the query. Once the query is finished and the result is sent back to the client, the threadpool again issues an asynchronous read, unless a client shutdown or a communication error is detected, in which case threadpool_remove_connection() is called to close the connection and free the resources associated with the client.

How KILL CONNECTION is handled
==============================
Killing connections reliably while the client is idle was already tricky without a threadpool (http://bugs.mysql.com/bug.php?id=37780), and it is even trickier with one. The old method used prior to MySQL 5.5 was either to close the socket on Windows (which makes recv() come back with an error) or to issue pthread_kill() on Unix to interrupt the recv(). Neither works well in a threadpool environment. pthread_kill() on Unix will not work because there is no thread that is stuck in recv(). There is possibly a thread that does one of the poll() variations and polls for multiple clients at the same time; if that polling thread is interrupted, it is not immediately clear which client should be killed. On the other hand, closing the socket does not work either. Neither epoll_wait() nor kevent() nor port_get() will get a close() notification, so if only close(socket) is done, the client will remain listed in the processlist and will still consume resources.
Besides, closing the connection creates race conditions (mentioned by Davi here http://lists.mysql.com/commits/63632). Note however that since 5.5, MySQL is using the "closesocket" method on all platforms, despite the race condition.

A better solution for Unix is not to close() but to shutdown(2): it does exactly what one would need, the socket read (or write) is interrupted without closing the socket. shutdown(SHUT_RD) happens to break the recv() in progress without closing the socket. Better yet, it also works with the different poll() variations: the socket comes back with the EOF flag, and a subsequent recv() will also return 0, indicating EOF.

Based on the above, we will implement a new function vio_shutdown(), using shutdown(2) on Unixes (see the notes about Windows below).

We will use vio_shutdown() for KILL to interrupt IO in progress. It will have the effect of naturally emulating a client error: the connection will behave as if async IO came back and the client died. The big plus of this solution is that no additional inter-thread synchronization is required. A nice side-effect is fixing the existing race condition in KILL on Windows, even if the threadpool is not used.

vio_shutdown on Windows.
Unfortunately, shutdown() does not interrupt IO on Windows, but there are other methods to achieve it: CancelIoEx() on Vista+, or a more involved method on XP, which queues an APC to the reading thread and issues CancelIo() inside the APC.

How client inactivity timeout (wait_timeout) is handled
=======================================================
Neither the Windows threadpool nor any of the Unix poll variations automatically supports a client inactivity timeout, so we need to implement this ourselves. On Windows, we'll use a lightweight timer queue per client (part of the Windows threadpool API). If the timer is signaled and there was no activity on the connection for too long, the connection is killed.
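Looking back at the KILL handling, the shutdown(2) behaviour it relies on can be checked with a few lines of Python. This is only a sketch: it uses a local socket pair in place of a real client connection and the portable select() call in place of epoll_wait/kevent/port_get, and the function name read_side_shutdown_demo is made up for the illustration.

```python
import select
import socket


def read_side_shutdown_demo(timeout=1.0):
    """Return (became_readable, data) after shutdown(SHUT_RD) on an idle socket."""
    # A connected pair stands in for the server and client ends of a connection.
    server_end, client_end = socket.socketpair()
    try:
        # The client is idle, so the server end is not readable yet.
        idle_ready, _, _ = select.select([server_end], [], [], 0)
        assert not idle_ready

        # What a KILL would do: shut down the read side, keep the descriptor open.
        server_end.shutdown(socket.SHUT_RD)

        # The poller now reports the socket, and the read returns EOF (b"").
        ready, _, _ = select.select([server_end], [], [], timeout)
        data = server_end.recv(1) if ready else None
        return bool(ready), data
    finally:
        server_end.close()
        client_end.close()
```

The point of the sketch is that the waiting side sees an ordinary end-of-file, so the killed connection can go through the same cleanup path as a client that went away on its own.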
On Unix we will recalculate the next wait deadline for the current connection whenever a query finishes and store it in the connection structure. We also maintain a global variable, next_min_deadline, containing the minimum of all next deadline values. Once the current time exceeds this minimum, a periodically running timer will scan through all THDs, looking for connections that were not active for too long and killing them. next_min_deadline is also recomputed inside the same timer routine.

Threadpool implementation on Unix
=================================
In the first version we will be using a design similar to the one used by Oracle's MySQL Enterprise threadpool. That closed-source implementation was described in great detail in Mikael Ronström's blog series (http://mikaelronstrom.blogspot.com/2011_10_01_archive.html).
The thread pool is partitioned into groups. Each group is in itself a complete small pool with a listener thread (the one waiting for network events), a work queue and worker threads. A group has the responsibility of keeping one thread running if there is work to be done. More than one thread in a group can be running, depending on circumstances (more about this later).
Clients are assigned to the groups in a round-robin fashion. This keeps (statistically) about the same number of clients per group.
Listener and worker roles are dynamically assigned. A listener can become a worker: after waiting for network events, it can pick an event and handle it, thus becoming a worker. Vice versa, once a worker has finished processing a query, it can become the listener.

The benefit of dynamic listener assignment, compared to the traditional Leader/Follower model with statically assigned roles, is that it spares a thread wakeup and a context switch, and allows the thread to use its full time quantum.
When a worker finishes processing a query, it first checks whether the queue contains new requests. If the queue is not empty, it picks a request from the queue and handles it.
If the queue is empty, it checks whether the group has a listener, and if not, it registers itself as the listener and waits for network events. Otherwise (the queue is empty, and there is already a listener), the worker tries to pick a single request from the network and handle it, issuing a non-blocking epoll_wait/kevent/port_get with a timeout value of 0. This is yet another optimization designed to keep a running thread running (it prevents switching the thread state from running to sleeping, and a subsequent wakeup).
However, a worker will not pick a request from the queue or from the network if the group is already "overcommitted", i.e. there are too many (4+) threads active. In this case, the worker will either become the listener, if there is no listener currently, or add itself to the sleep queue and sleep.
Finally, if there is definitely no work to be done, the thread registers itself in the LIFO list of waiting threads and sleeps on the per-worker-thread condition variable, waiting for a wakeup. If no wakeup comes within the timeout (default is 1 minute), meaning there is no work to be done, the worker thread exits.

Sleeping threads are woken or new workers are created in one of the following cases
- The listener can wake up a sleeping thread when it populates the work queue and decides not to pick up an event itself
- A worker thread is executing a request and needs to wait. The scheduler's wait_begin() callback decrements the number of currently active threads in the group, and if it goes down to zero, it either wakes a sleeping thread or creates a new one.
- Threads can be woken (or created) inside a periodic timer routine that checks whether the currently active thread has spent too much time executing work while the group was either left without a listener or there are requests in the queue that have been waiting there for too long.
New thread creation is throttled, i.e. a new thread is not always created at exactly the moment we decide there is a shortage.
Only if the number of threads in the group is currently small are new ones created without any delay. Once the group size grows, we will create threads with a short delay, to prevent having too many threads in the pool. The more threads are currently in the group, the longer the throttling period will be.

Threadpool implementation on Windows
====================================
The Windows implementation is simpler than the Unix one, since it is the OS that takes over the tasks of creating and waking threads, queueing work and managing the threadpool size. There are a couple of tweaks, however, that are worth noting.
Every MySQL thread that uses mysys functions needs some thread local storage structures set up upon thread creation. my_thread_init() is called whenever a new thread is started, and my_thread_exit() does the cleanup prior to thread exit. The Windows threadpool creates threads transparently to the user, thus we will use a fiber local storage variable (basically TLS with a destructor) to ensure that every used thread calls my_thread_init() once. The FLS destructor will call my_thread_exit(), ensuring proper cleanup once the Windows threadpool decides to shut down a thread.
It is worth noting that there will be no threadpool on XP/2003, since the new threadpool API does not exist on XP. However, we still need to ensure that the server starts on XP even without the threadpool, and this means we cannot use the threadpool API directly, otherwise the executable would have unresolved symbols and refuse to start. Therefore, all required functions are loaded at runtime using GetProcAddress(GetModuleHandle("kernel32"), function_name). A similar technique is already used in several other places (e.g. to use Vista condition variables and reader-writer locks)

Using threadpool
================
Setting the scheduler.
thread_handling=pool-of-threads

We provide several variables that tweak threadpool behavior (generally, however, there should be no need to tweak them, perhaps apart from thread_pool_size on Unix)
- thread_pool_size (Unix only): number of thread groups. Minimum value is 1, maximum value is 128, and the default is the number of CPUs on the machine as the OS reports it (this can be the number of cores, and hyperthreads count as well). thread_pool_size is roughly equivalent to the maximum number of threads that should be actively running at the same time. However, there are different factors that can make the actual number of threads running at the same time a little higher or lower than this value.
- thread_pool_stall_limit (Unix only): the interval, in milliseconds, of the periodic timer that checks for thread stalls (and possibly wakes or creates new threads if a stall is detected). Default value is 500.
- thread_pool_idle_timeout (Unix only): maximum time, in seconds, after which a sleeping worker thread will retire, i.e. exit.
- thread_pool_min_threads (Windows only): minimum number of threads in the pool. Default is 0. The actual work is done by SetThreadpoolThreadMinimum() http://msdn.microsoft.com/en-us/library/windows/desktop/ms686268(v=vs.85).aspx
- thread_pool_max_threads (Windows and Unix): maximum number of threads in the threadpool. New threads are not created once this limit is reached. On Windows, the actual work is done by SetThreadpoolThreadMaximum() http://msdn.microsoft.com/en-us/library/windows/desktop/ms686266(v=vs.85).aspx

Status variables
================
There are 2 status variables introduced by the thread pool
Threadpool_threads: total number of threads in the threadpool
Threadpool_idle_threads: number of threads in sleep queues. On Windows it is always 0.
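To make the role of thread_pool_stall_limit more concrete, the check made by the periodic timer on Unix (see the Unix implementation section) might look roughly like the sketch below. It is not the server code: GroupState and needs_extra_thread are hypothetical names, and reusing the stall limit as the threshold for "queued for too long" is an assumption made only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GroupState:
    """Hypothetical snapshot of one thread group, as seen by the timer."""
    has_listener: bool
    # Time (ms) at which the active thread started its current request,
    # or None if no thread in the group is executing anything.
    active_since_ms: Optional[int]
    # Enqueue times (ms) of the requests still waiting in the work queue.
    queued_since_ms: List[int] = field(default_factory=list)


def needs_extra_thread(group: GroupState, now_ms: int,
                       stall_limit_ms: int = 500) -> bool:
    """Decide whether the timer should wake or create a thread for the group."""
    if group.active_since_ms is None:
        return False  # nothing is executing, so nothing can be stalled
    if now_ms - group.active_since_ms < stall_limit_ms:
        return False  # the active thread has not run long enough to count as stalled
    if not group.has_listener:
        return True   # stalled and nobody is waiting for network events
    # Stalled with a listener: act only if some queued request waited too long.
    return any(now_ms - t >= stall_limit_ms for t in group.queued_since_ms)
```

For example, with the default of 500, a group whose only active thread has been busy for 600 ms and which has no listener would get another thread, while the same group after 100 ms would be left alone.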
Running benchmarks
==================
When running sysbench, and maybe other benchmarks that create many threads on the same machine as the server, it is advisable to run the benchmark driver and the server on different CPUs to get realistic results. Running lots of driver threads and only several server threads on the same CPUs has the effect that the OS scheduler will schedule benchmark driver threads to run with much higher probability than the server threads, that is, the driver will preempt the server. Use "taskset -pc" on Linux and "start /affinity" on Windows to separate benchmark driver and server CPUs

Comparison with MySQL enterprise threadpool
===========================================
Here are the downsides of Oracle's implementation.
-MySQL/Oracle is using efficient epoll() on Linux and inefficient poll() everywhere else.
-Client login. Login just blocks the acceptor thread, and network IO is done synchronously, which means a slow client will prevent logins of other clients.
-wait_timeout does not work with the MySQL Enterprise threadpool.
-KILL seems to have a high overhead and to require careful locking, due to an inefficient implementation.

Here are features Oracle has and we don't
- Restricting the number of concurrent transactions (http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-limiting-number-of_21.html). The implementation lowers the priority of a connection once a transaction starts (after a "BEGIN" statement). I'm not sure what the exact problem solved by this method was, and no scientific explanation is given. Perhaps it was the kernel_mutex problems that were solved in subsequent versions of MySQL. So far I have to assume this was merely a trick to get a better sysbench graph.
- There is an information_schema table for the threadpool with extensive information about the internals.

Here are features that we have and Oracle does not
- Restricting the maximum number of pool threads

Threadpool parameter differences
================================
Not available in MariaDB, but in MySQL
thread_pool_prio_kickup_timer: related to restriction of concurrent transaction count
thread_pool_high_priority_connection: related to restriction of concurrent transaction count
thread_pool_max_unused_threads: specific to the MySQL Enterprise Threadpool thread retirement logic. In MariaDB, use thread_pool_idle_timeout
thread_pool_algorithm: magic pixie dust that likely can be used to improve the published sysbench results. No explanation is provided on what it actually does.

Not available in MySQL Enterprise Threadpool, but in MariaDB
thread_pool_idle_timeout: specific to the MariaDB thread retirement logic
thread_pool_max_threads: maximum number of threadpool worker threads

Works differently in MySQL and MariaDB
thread_pool_stall_limit: in the MySQL Enterprise threadpool, the unit of measurement is 10 ms, such that a value of 6 means 60 ms. In MariaDB, the unit of measurement is 1 ms (for no other reason than using common units of measurement)
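
Because the stall limit uses different units in the two servers, a value cannot simply be copied from a MySQL Enterprise configuration into a MariaDB one. The arithmetic is shown below as a minimal sketch; the function and constant names are invented for the example and do not exist in either server.

```python
MYSQL_STALL_LIMIT_UNIT_MS = 10   # MySQL Enterprise threadpool: one unit is 10 ms
MARIADB_STALL_LIMIT_UNIT_MS = 1  # MariaDB: one unit is 1 ms


def mysql_to_mariadb_stall_limit(mysql_value: int) -> int:
    """Convert a MySQL Enterprise stall limit setting to the MariaDB setting."""
    stall_ms = mysql_value * MYSQL_STALL_LIMIT_UNIT_MS   # actual interval in ms
    return stall_ms // MARIADB_STALL_LIMIT_UNIT_MS       # same interval in MariaDB units
```

With these units, the MySQL value of 6 mentioned above (60 ms) corresponds to a MariaDB setting of 60.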
High-Level Specification modified.
--- /tmp/wklog.246.old.14189 2012-01-14 03:13:17.000000000 +0000
+++ /tmp/wklog.246.new.14189 2012-01-14 03:13:17.000000000 +0000
@@ -235,7 +235,7 @@
 - thread_pool_size (Unix only) – number of thread groups.
 Minimum value is 1, maximum value is 128, and the default is number of CPUs on
 the machine as OS reports it (can be number of cores and hyperthreads also
-count also count). thread_pool_size is roughly equivalent to the maximum number
+count). thread_pool_size is roughly equivalent to the maximum number
 of threads that should be actively running at
 the same time. However there are different factors that can make actual number
 of threads running at the same time a little higher or lower than the value.
High-Level Specification modified.
--- /tmp/wklog.246.old.14137 2012-01-14 03:11:14.000000000 +0000
+++ /tmp/wklog.246.new.14137 2012-01-14 03:11:14.000000000 +0000
@@ -232,10 +232,11 @@
 We provide several variables that tweak threadpool behavior (However generally,
 there should be no need to tweak them, perhaps apart from thread_pool_size on
 Unix)
-- thread_pool_size (Unix only) – number of thread groups. Value 0 (default)
-has a special meaning – the is equivalent to setting variable to the number of
-CPUs. Maximum value for thread_pool_size is 128. thread_pool_size is roughly
-equivalent to the maximum number of threads that should be actively running at
+- thread_pool_size (Unix only) – number of thread groups.
+Minimum value is 1, maximum value is 128, and the default is number of CPUs on
+the machine as OS reports it (can be number of cores and hyperthreads also
+count also count). thread_pool_size is roughly equivalent to the maximum number
+of threads that should be actively running at
 the same time. However there are different factors that can make actual number
 of threads running at the same time a little higher or lower than the value.
 - thread_pool_stall_limit (Unix only) – The interval for the periodic timer
Observers changed: Sergei
High-Level Specification modified.
--- /tmp/wklog.246.old.7503 2011-12-09 22:33:30.000000000 +0000
+++ /tmp/wklog.246.new.7503 2011-12-09 22:33:30.000000000 +0000
@@ -102,14 +102,17 @@
 conditions (mentioned by Davi here http://lists.mysql.com/commits/63632) . Note
 however that since 5.5, MySQL is using the “closesocket” method on all
 platforms, despite race condition.
+
 A better solution for the Unix is not to close() but shutdown(2), it does
 exactly what one would need, the socket read (or write) is interrupted without
 closing the socket. shutdown(SHUT_RD) happens to break the recv() in progress
 without closing the socket. Better yet, it also works with different poll()
 variations – socket comes back with EOF flag, and subsequent recv() will also
 return 0 indicating EOF.
+
 Based on the said above, we will implement a new function vio_shutdown(), using
 shutdown(2) on Unixes (see below notes about Windows)
+
 We will use vio_shutdown() for KILL to interrupt IO in progress. It will have
 the effect of naturally emulating client error, the connection will behave as
 if async IO came back and the client died. The big plus in this solution is
Version updated.
--- /tmp/wklog.246.old.7485 2011-12-09 22:32:15.000000000 +0000
+++ /tmp/wklog.246.new.7485 2011-12-09 22:32:15.000000000 +0000
@@ -1,2 +1,2 @@
-Maria-2.0
+Server-5.5
Supervisor updated: -> Monty
High-Level Specification modified. --- /tmp/wklog.246.old.7476 2011-12-09 22:31:39.000000000 +0000 +++ /tmp/wklog.246.new.7476 2011-12-09 22:31:39.000000000 +0000 @@ -1,2 +1,323 @@ +High-level overview +=================== +MariaDB 5.5 will implement a dynamic threadpool. There was already a threadpool +starting from 5.1, the new one is a complete reimplementation of it, the most +distinguishing feature is the this new pool will adapt itself to the load . The +size of the pool will grow when there is a load and shrink if there is no +work. +For the short-queries and multiple concurrent clients, number of threads that +are CPU-active will usually be smaller than in one-thread-per-connection +method, but we aim still to load CPUs 100%. The benefits of smaller number of +concurrently running threads are reduced lock contention inside database server +and reduced context switching. + +High-level implementation +========================= +Ideally, all operating systems, we support would provide thread pooling +facilities, including asynchronous IO integration. We could rely on the OS +kernel to size the pool – it knows best how many threads should run at any +given point, when thread is running and when it is sleeping and it whether it +makes sense to create or activate another thread from pool. Such facility makes +implementation a lot smaller and more robust. In practice however, it is only +Windows and recent version of OSX that provide an integrated threadpool . The +first version in MariaDB will have two different threadpool implementations – +one based on Windows threadpooling API , and one generic for Unix systems that +provide scalable IO multiplexing primitives (that is, Linux with epoll, +BSD/OSX with kevent, and Solaris with event ports). 
It would be great to have +an implementation based on OSX’s native libdispatch, but we’ll have to delay +this to some later point , until we get a solid Mac system to test it +So why there is an own implementation for Windows: +-Most importantly, it aligns with the goal to provide the best implementation +for each platform. +-It would be rather awkward to port Windows asynchronous IO to poll-like egde- +triggered Unix polls(), also it would be very awkward to impossible to port +shared memory connections based to those APIs (named pipes could be a bit +simpler though). Shared memory is easy with Windows threadpool, since it +provides a way to asynchronously wait for anything, not just for IO +completions, but for example for event handles to be signaled. + +What needs to be implemented? +============================= +We implement a thread scheduler. There is already an existing framework for +it, there are couple of callback functions to implement - scheduler +initialization and destruction, add_connection() callback when new connection +is about to login, wait_begin() and wait_end() callbacks introduced in MySQL +5.5 that are called whenever thread has to wait and couple of other optional +callbacks that we won’t implement at all. + +Functionality common for threadpools implementation: +Will be implemented in sql/threadpool_common.cc. We need just 3 functions here +- threadpool_add_connection(THD*) –connection login (authentication +handshake, initialization of connection structures) +- threadpool_remove_connection(THD*) – called on any error or whenever client +sends logs out (sends QUIT). Closes connection. +- threadpool_process_request(THD*) – process single query +Each of this functions additionally takes care of setting/resetting thread +local storage variables for the processing thread (thd->mysys_var, PSI +thread, DBUG structures) + + +Functionality specific to each threadpool implementation +-“posting work” to the threadpool. 
Currently used to offload connection login +to the worker threads (s. below) +-Start asynchronous read or asynchronous wait for socket read readiness. This +is used whenever we wait for client to send queries. + +The life of a connection, high-level perspective +================================================ +-Client logs in. “acceptor” thread accepts the connection and calls +add_connection scheduler callback. + +add_connection() posts connection to the thread pool. A worker thread calls +threadpool_add_connection() . The purpose of offloading authentication and +thread initialization to a worker is to avoid blocking the single acceptor +thread – this if client is not responsive, it will block the acceptor thread. +After login is finished, threadpool starts an asynchronous read on the +connections’ socket . + +-Client issues a query, or client dies, or connectivity is lost. Async IO comes +back (or read readiness is signaled) and one of the worker threads handles +the query . Once query is finished and result is sent back to the client, then +again threadpool issues an asynchronous read. Unless client shutdown or an +communication error is detected, in which case threadpool_remove_connection() +is called to close connection and free resources associated with the client. + + +How KILL CONNECTION is handled +============================== +Killing connections reliably while client is idle, was already tricky without +threadpool (http://bugs.mysql.com/bug.php?id=37780), and it is even trickier +with threadpool. The old method used prior to MySQL 5.5 was either to close +the socket on Windows (makes recv() come back with an error) or to issue +pthread_kill() on Unix to interrupt the recv()). Neither works well in a +threadpool environment. pthread_kill() on Unix will not work because there is +no thread that is stuck in recv(). 
There is possibly a thread that does one of +the poll() variations and polls for multiple clients at the same time, polling +thread is interrupted, it is not immediately clear which client should be +killed. On the other hand, closing socket does not work either. Neither +epoll_wait() nor kevent() nor port_get() will get a close() notification, so +if only close(socket) is done, client will remain listed in processlist and +will still consume the resources. Besides, closing the connection makes race +conditions (mentioned by Davi here http://lists.mysql.com/commits/63632) . Note +however that since 5.5, MySQL is using the “closesocket” method on all +platforms, despite race condition. +A better solution for the Unix is not to close() but shutdown(2), it does +exactly what one would need, the socket read (or write) is interrupted without +closing the socket. shutdown(SHUT_RD) happens to break the recv() in progress +without closing the socket. Better yet, it also works with different poll() +variations – socket comes back with EOF flag, and subsequent recv() will also +return 0 indicating EOF. +Based on the said above, we will implement a new function vio_shutdown(), using +shutdown(2) on Unixes (see below notes about Windows) +We will use vio_shutdown() for KILL to interrupt IO in progress. It will have +the effect of naturally emulating client error, the connection will behave as +if async IO came back and the client died. The big plus in this solution is +that no additional inter-thread synchronization is required. A nice side-effect +is fixing the existing race condition in KILL on Windows, even if threadpool is +not used. + +vio_shutdown on Windows. +Unfortunately, shutdown() does not interrupt IO on Windows, but there are +other methods to achieve it – CancelIoEx() on Vista+ , or a more involved +method on XP possible by queueing APC to the reading thread and issue CancelIo +() inside APC. 
+ + +How client inactivity timeout (wait_timeout) is handled +======================================================= +Neither Windows threadpool not any of Unix poll variation automatically support +client inactivity timeout, so we need to implement this ourselves. On Windows, +we’ll use a lightweight timer queue per client (part of Windows threadpool +API). If timer is signaled and there was no activity on the connection for too +long, connection is killed. +On Unix we will recalculate the next wait deadline for current connection +whenever query finishes and store it in the connection structure. We also +maintain a global variable containing for minimum of all next deadline values. +Once current time exceeds this minimum, periodically running timer will scan +through all THDs looking for connections that were not active for too long and +killing them. next_min_deadline is also recomputed inside the same timer +routine. + +Threadpool implementation on Unix. +================================= +In the first version we will be using a similar design that is used by Oracle +Enterprise MySQL threadpool. closed-source implementation was described in +Mikael Ronström’s blog series +http://mikaelronstrom.blogspot.com/2011_10_01_archive.html in great details. +Thread pool is partitioned into groups. Each group in itself is a complete +small pool with a listener thread (the one waiting for network events) , work +queue and worker threads. A group has the responsibility of keeping one thread +running, if there is a work to be done. More than one thread in a group can be +running, depending on circumstances (more about this later). +Clients are assigned to the groups in a round-robin fashion. This will keep +(statistically) about the same ratio of clients per group. +Listener and worker roles are dynamically assigned. Listener can become worker, +after waiting for network events; it can pick an event and handle thus it +becoming a worker. 
Vice versa, once worker is finished processing a query, it +can become listener. + +The benefit of dynamic listener assignment, compared to traditional +Leader/Follower model with statically assigned is that it spares thread wakeup, +context switch and allows thread to use its full time quantum. +When worker finishes processing a query, it first checks whether the queue +contains new requests. If queue is not empty, it picks a request from queue and +handles is. If queue is empty, it checks whether group has a listener, and if +not, it registers itself as listener and waits for network events. Otherwise +(queue is empty, and there is already a listener), worker tries to pick single +request and handle it from a network, issuing a non-blocking +epoll_wait/kevent/port_get with timeout value of 0. This is yet another +optimization designed to keep running thread running (prevent switching thread +state from running to sleeping, and a subsequent wakeup) . +However, worker will not pick a request from queue or from network, if group is +already “overcommitted” – i.e there are too many (4+) threads active. In this +case, worker will either become a listener if there is no listener currently, +or it will add itself to the sleep queue and sleep. +Finally, if there is definitely no work to be done, the thread registers itself +in the LIFO list of waiting threads, and sleeps on the per-worker-thread +condition variable, waiting for a wakeup. If no wakeup comes within timeout +(default is 1 minute), meaning there is no work to be done, then worker thread +exists. + +Sleeping threads are woken or new workers are created in one of the following +cases +- Listener can wake up a sleeping thread, when it populates work queue +and decides not to pick event itself +- A worker thread is executing a request, and needs to wait. 
The scheduler’s wait_begin() callback decrements the number of currently active
threads in the group, and if it goes down to zero, it either wakes a sleeping
thread or creates a new one.
- Threads can be woken (or created) inside a periodic timer routine that checks
whether the currently active thread has spent too much time executing work
while the group was either left without a listener or has requests that have
been staying in the queue for too long.
New thread creation is throttled, i.e. a new thread is not always created at
exactly the moment we decide there is a shortage. Only if the number of
threads in the group is currently small are new ones created without any
delay. Once the group size grows, we will create threads with a short delay, to
prevent having too many threads in the pool. The more threads are currently
in the group, the longer the throttling period will be.

Threadpool implementation on Windows
====================================
The Windows implementation is simpler than the Unix one, since it is the OS that
takes over the tasks of creating and waking threads, queueing work and managing
the threadpool size. There are a couple of tweaks, however, that are worth
noting.
Every MySQL thread that uses mysys functions needs some thread-local storage
structures set up upon thread creation. my_thread_init() is called whenever a
new thread is started, and my_thread_exit() does the cleanup prior to thread
exit. The Windows threadpool creates threads transparently to the user, thus we
will use a fiber-local storage (FLS) variable (basically TLS with a destructor)
to ensure that every used thread calls my_thread_init() once. The FLS destructor
will call my_thread_exit(), ensuring proper cleanup once the Windows threadpool
decides to shut down a thread.
It is worth noting that there will be no threadpool on XP/2003, since the new
threadpool API does not exist on XP.
However, we still need to ensure that the server starts on XP even without the
threadpool, and this means we cannot use the threadpool API directly; otherwise
the executable would have unresolved symbols and refuse to start. Therefore, all
required functions are loaded at runtime using
GetProcAddress(GetModuleHandle("kernel32"), function_name). A similar technique
is already used in several other places (e.g. to use Vista condition variables
and reader-writer locks).

Using threadpool
================
Setting the scheduler:
thread_handling=pool-of-threads


We provide several variables that tweak threadpool behavior (generally, however,
there should be no need to tweak them, perhaps apart from thread_pool_size on
Unix):
- thread_pool_size (Unix only) – number of thread groups. The value 0 (default)
has a special meaning: it is equivalent to setting the variable to the number of
CPUs. The maximum value for thread_pool_size is 128. thread_pool_size is roughly
equivalent to the maximum number of threads that should be actively running at
the same time. However, various factors can make the actual number of threads
running at the same time a little higher or lower than this value.
- thread_pool_stall_limit (Unix only) – the interval, in milliseconds, of the
periodic timer that checks for thread stalls (and possibly wakes or creates new
threads if a stall is detected). The default value is 500.
- thread_pool_idle_timeout (Unix only) – maximum time, in seconds, after which
a sleeping worker thread will retire, i.e. exit.
- thread_pool_min_threads (Windows only) – minimum number of threads in the
pool. The default is 0. The actual work is done by SetThreadpoolThreadMinimum()
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686268(v=vs.85).aspx
- thread_pool_max_threads (Windows and Unix) – maximum number of threads in the
threadpool. New threads are not created if this limit is reached.
On Windows, the actual work is done by SetThreadpoolThreadMaximum()
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686266(v=vs.85).aspx

Status variables
================
There are 2 status variables introduced by the thread pool:
Threadpool_threads – total number of threads in the threadpool
Threadpool_idle_threads – number of threads in sleep queues. On Windows it is
always 0.

Running benchmarks
==================
When running sysbench, and maybe other benchmarks that create many threads on
the same machine as the server, it is advisable to run the benchmark driver and
the server on different CPUs to get realistic results.
Running lots of driver threads and only several server threads on the same CPUs
will have the effect that the OS scheduler will schedule benchmark driver
threads to run with much higher probability than the server threads, that is,
the driver will preempt the server.
Use “taskset -pc” on Linux and “start /affinity” on Windows to separate the
benchmark driver and server CPUs.

Comparison with MySQL Enterprise threadpool
===========================================
Here are the downsides of Oracle’s implementation:
- MySQL/Oracle is using efficient epoll() on Linux and inefficient poll()
everywhere else.
- Client login. Login just blocks the acceptor thread, and network IO is done
synchronously, which means a slow client will prevent logins of other clients.
- wait_timeout does not work with the MySQL Enterprise threadpool.
- KILL seems to have a high overhead and to require careful locking due to an
inefficient implementation.


Here are features Oracle has and we don’t:
- Restricting the number of concurrent transactions
(http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-limiting-number-of_21.html).
The implementation lowers the priority of a connection once a transaction starts
(after a “BEGIN” statement). I’m not sure what exact problem was solved by this
method, and there is no scientific explanation.
Perhaps it was the kernel_mutex problems that were solved in subsequent versions
of MySQL. So far I have to assume this was merely a trick to get a better
sysbench graph.

- There is an information_schema table for the threadpool with extensive
information about the internals.

Here are features that we have and Oracle does not:
- Restricting the maximum number of pool threads

Threadpool parameter differences
================================
Not available in MariaDB, but in MySQL

thread_pool_prio_kickup_timer – related to restriction of concurrent
transaction count
thread_pool_high_priority_connection – related to restriction of concurrent
transaction count
thread_pool_max_unused_threads – specific to the MySQL Enterprise threadpool
thread retirement logic. In MariaDB, use thread_pool_idle_timeout
thread_pool_algorithm – magic pixie dust that can likely be used to improve
the published sysbench results. No explanation is provided on what it actually
does.

Not available in MySQL Enterprise threadpool, but in MariaDB

thread_pool_idle_timeout – specific to MariaDB thread retirement logic
thread_pool_max_threads – maximum number of threadpool worker threads

Works differently in MySQL and MariaDB

thread_pool_stall_limit – in the MySQL Enterprise threadpool, the unit of
measurement is 10 ms, such that a value of 6 means 60 ms. In MariaDB, the unit
of measurement is 1 ms (for no other reason than using common units of
measurement).
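
The unit difference for the stall limit is easy to get wrong when porting a
configuration between the two servers. Below is a minimal Python sketch of the
conversion. The function and constant names are hypothetical and exist in
neither server. The only facts taken from the text above are the 10 ms unit on
the MySQL Enterprise side and the 1 ms unit on the MariaDB side. Rounding up
when converting to the coarser MySQL unit is an assumption of this sketch, not
documented behavior.

```python
# Hypothetical helper names; neither server ships these functions.
MYSQL_STALL_UNIT_MS = 10   # MySQL Enterprise threadpool: one unit is 10 ms
MARIADB_STALL_UNIT_MS = 1  # MariaDB: one unit is 1 ms


def mysql_to_mariadb_stall_limit(mysql_value: int) -> int:
    """Convert a MySQL Enterprise stall limit setting to the MariaDB setting."""
    if mysql_value < 0:
        raise ValueError("stall limit must be non-negative")
    milliseconds = mysql_value * MYSQL_STALL_UNIT_MS
    return milliseconds // MARIADB_STALL_UNIT_MS


def mariadb_to_mysql_stall_limit(mariadb_value: int) -> int:
    """Convert a MariaDB stall limit setting to MySQL Enterprise units.

    Assumption of this sketch: round up, so the converted limit is never
    shorter than the original one.
    """
    if mariadb_value < 0:
        raise ValueError("stall limit must be non-negative")
    milliseconds = mariadb_value * MARIADB_STALL_UNIT_MS
    return (milliseconds + MYSQL_STALL_UNIT_MS - 1) // MYSQL_STALL_UNIT_MS


if __name__ == "__main__":
    # A MySQL value of 6 means 60 ms, so the MariaDB setting is 60.
    print(mysql_to_mariadb_stall_limit(6))
    # The MariaDB default of 500 ms corresponds to 50 MySQL units.
    print(mariadb_to_mysql_stall_limit(500))
```

With these helpers, the MySQL value of 6 mentioned above maps to a MariaDB
setting of 60, and the MariaDB default of 500 maps to 50 MySQL units.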