Python's threading library uses threads (rather than just processes) to implement parallelism. This may be surprising if you know about Python's Global Interpreter Lock, or GIL, but it works well for certain workloads without violating the GIL. And it takes very little extra effort: simply define functions that make I/O requests and the system will handle the rest.
Global Interpreter Lock
The Global Interpreter Lock reduces the usefulness of threads in Python (more precisely, CPython) by allowing only one native thread to execute Python bytecode at a time. This made Python easier to implement on top of the (usually thread-unsafe) C libraries and can increase the execution speed of single-threaded programs. However, it remains controversial because it prevents true lightweight parallelism. You can achieve parallelism, but it requires multi-processing, which is implemented by the eponymous library multiprocessing. Instead of spinning up threads, this library uses processes, which bypasses the GIL.
It may appear that the GIL would kill Python multithreading, but that is not quite the case. In general, there are two main use cases for multithreading:
- To take advantage of multiple cores on a single machine
- To take advantage of I/O latency to process other threads
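In general, we cannot benefit from the first use case with threading, but we can benefit from the second, because a thread that is blocked waiting on I/O releases the GIL and lets other threads run.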
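As a rough illustration of the second use case, here is a minimal sketch using threading directly. The function names are hypothetical, and time.sleep is only a stand-in for a blocking I/O call such as a URL request; like real blocking I/O, it releases the GIL while it waits.

```python
import threading
import time


def wait_for_io(seconds, results, index):
    """Stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(seconds)
    results[index] = seconds


def run_in_threads(n_tasks, seconds):
    """Start one thread per task; return (results, total elapsed seconds)."""
    results = [None] * n_tasks
    threads = [
        threading.Thread(target=wait_for_io, args=(seconds, results, i))
        for i in range(n_tasks)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every task has finished
    return results, time.perf_counter() - start
```

Calling run_in_threads(5, 0.2) should take much closer to 0.2 seconds than to the 1 second the same waits would take one after another, because the waits overlap.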
The threading library is fairly low-level, but it turns out that multiprocessing wraps it in multiprocessing.pool.ThreadPool, which conveniently takes on the same interface as multiprocessing.pool.Pool.
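To make the shared interface concrete, here is a small sketch. The helper map_with_pool and the worker slow_square are hypothetical names used for illustration; only ThreadPool and its map method come from the standard library.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_square(x):
    """Hypothetical worker: pretend to wait on I/O, then return a value."""
    time.sleep(0.05)
    return x * x


def map_with_pool(pool_factory, func, items, workers=4):
    """Run func over items using whichever pool class is passed in."""
    with pool_factory(workers) as pool:
        return pool.map(func, items)


squares = map_with_pool(ThreadPool, slow_square, range(5))
```

Passing multiprocessing.pool.Pool as pool_factory should run the same call in worker processes instead. In that case the worker function has to be picklable, and on platforms that spawn new interpreters the call belongs under an `if __name__ == "__main__":` guard.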
One benefit of using threading is that it avoids pickling. Multiprocessing relies on pickling objects in memory to send them to other processes. For example, if the timed decorator did not apply functools.wraps to the wrapper function it returned, then CPython would not be able to pickle our decorated functions, including selenium_func, and hence these could not be multi-processed. In contrast, the threading library, even through multiprocessing.pool.ThreadPool, works just fine. Multiprocessing also requires more RAM and has more startup overhead.
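The pickling issue can be reproduced in a few lines. This is only a sketch: timed_plain, timed_wrapped, fetch_plain and fetch_wrapped are hypothetical stand-ins, not the original timed decorator or selenium_func.

```python
import functools
import pickle
import time


def timed_plain(func):
    """Timing decorator that does NOT use functools.wraps."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        return result, time.perf_counter() - start
    return wrapper


def timed_wrapped(func):
    """Same decorator, but the wrapper keeps the original function's name."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        return result, time.perf_counter() - start
    return wrapper


@timed_plain
def fetch_plain(url):
    return len(url)  # placeholder for real work


@timed_wrapped
def fetch_wrapped(url):
    return len(url)  # placeholder for real work


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True
```

Functions are pickled by name. Without functools.wraps the wrapper is a local object named wrapper, which pickle cannot look up, so is_picklable(fetch_plain) is False. With functools.wraps the wrapper carries the name fetch_wrapped, which resolves at module level, so is_picklable(fetch_wrapped) should be True when this is run as an ordinary script or module.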
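We analyze the highly I/O-dependent task of making 100 URL requests for random Wikipedia pages. We compare two ways of fetching a page: the requests library, and selenium driving a PhantomJS headless browser.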
We run each of these requests in three ways and measure the time required for each fetch:
- In serial
- In parallel in a threading pool with 10 threads
- In parallel in a multiprocessing pool with 10 processes
Each request is timed and we compare the results.
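The measurement setup can be sketched roughly as follows. This is not the original benchmark code: fake_fetch is a hypothetical stand-in that sleeps instead of requesting a Wikipedia page, and the timed decorator here is an assumed reconstruction that returns each result together with its elapsed time.

```python
import functools
import time
from multiprocessing.pool import ThreadPool


def timed(func):
    """Return (result, seconds elapsed) for each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        return result, time.perf_counter() - start
    return wrapper


@timed
def fake_fetch(page_id):
    """Stand-in for one URL request; sleeps instead of using the network."""
    time.sleep(0.05)
    return page_id


def run_serial(page_ids):
    """Fetch pages one at a time; return (per-call timings, total seconds)."""
    start = time.perf_counter()
    timings = [fake_fetch(page_id) for page_id in page_ids]
    return timings, time.perf_counter() - start


def run_threaded(page_ids, workers=10):
    """Fetch pages in a thread pool; return (per-call timings, total seconds)."""
    start = time.perf_counter()
    with ThreadPool(workers) as pool:
        timings = pool.map(fake_fetch, page_ids)
    return timings, time.perf_counter() - start
```

Each entry in timings is the time for a single fetch, while the second return value is the wall-clock time for the whole batch. Keeping the two separate is what makes it possible to compare the speed of individual requests with overall throughput.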
Firstly, the per-thread running time for requests is obviously lower than for selenium, as the latter requires spinning up a new process to run a PhantomJS headless browser. It’s also interesting to notice that the individual threads (particularly selenium threads) run faster in serial than in parallel, which is the typical bandwidth vs. latency tradeoff.
In particular, selenium threads are more than twice as slow in parallel, probably because of resource contention with 10 selenium processes spinning up at once.
Even so, the full batch finishes roughly 4 times faster for selenium requests and roughly 8 times faster for requests requests when multithreaded than when run in serial.
There was no significant performance difference between using threading and multiprocessing. The performance of multithreading and multiprocessing is extremely similar, and the exact details are likely to depend on your specific application. Threading through multiprocessing.pool.ThreadPool is really easy, or at least as easy as using the multiprocessing.pool.Pool interface: simply define your I/O workloads as a function and use the ThreadPool to run them in parallel.
Improvements welcome! Please submit a pull request to our GitHub.