parallel-computing.md
March 21, 2016
Parallel Computing / Concurrency
Common questions on parallel computing / concurrency:
- multi-process vs multi-thread
- Multi-threading Pros
  * easy and fast to share data between threads (it's just shared memory in a single, unified address space)
  * easy to communicate with the parent process
  * beneficial for large datasets: no penalty for distributing or duplicating data across process boundaries
  * supported by many 3rd-party libraries (e.g. OpenMP)
- Multi-threading Cons
  * no isolation: one thread crashing brings down the whole process
  * multi-threaded apps are hard to debug (race conditions, locks)
  * too many threads lead to lots of context switching
  * no scalable path to multiple machines / public cloud
- Multi-process Pros
  * a process crash won't harm other processes (you can recover from process crashes; think microservices)
  * much easier to debug a single, isolated process
  * less locking (unless you share memory / resources between multiple processes)
  * scalable across machines
- Multi-process Cons
  * communication between processes is more complicated and less efficient
  * smaller set of supporting libraries (which are generally not portable across platforms)
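
The shared-memory versus isolation trade-off above can be sketched in Python with the standard `threading` and `multiprocessing` modules. This is only an illustration, not a benchmark: the names `counter`, `run_threads` and `run_processes` are made up for this example, and preferring the `fork` start method where it exists is an assumption chosen to keep the sketch short.

```python
import multiprocessing as mp
import threading

# One dict in the parent's address space. Threads all see this same object;
# each child process only ever sees its own private copy of it.
counter = {"value": 0}
lock = threading.Lock()


def add_in_thread(n):
    for _ in range(n):
        with lock:  # shared state means we need a lock to avoid a race
            counter["value"] += 1


def add_in_process(n, queue):
    for _ in range(n):
        counter["value"] += 1  # private copy, so no lock is needed
    queue.put(counter["value"])  # results must be sent back explicitly


def run_threads(workers=4, n=1000):
    counter["value"] = 0
    threads = [threading.Thread(target=add_in_thread, args=(n,))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


def run_processes(workers=4, n=1000):
    counter["value"] = 0
    # Assumption: use "fork" where available, else the platform default.
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    ctx = mp.get_context(method)
    queue = ctx.Queue()
    procs = [ctx.Process(target=add_in_process, args=(n, queue))
             for _ in range(workers)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    # The parent's counter is untouched by the children.
    return counter["value"], sorted(results)


if __name__ == "__main__":
    print("threads:", run_threads())
    print("processes:", run_processes())
```

The threaded version needs a lock but gets the combined total for free. The process version needs no lock, but the parent's `counter` never changes, and every result has to travel back through a queue.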
TL;DR
- Generally, if you need to maintain synchronized state, prefer threads.
- Do you have a large dataset? Then prefer threads.
- Do you want easier development / debugging and finer-grained control? Then prefer processes.
- Do you want to be able to scale out to multiple machines or the cloud? Then prefer processes.
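
As a rough illustration of the "large dataset" point, the sketch below has several threads read one in-memory list in place, so each worker receives only a pair of indices rather than a copy of the data. `DATA`, `chunk_bounds`, `partial_sum` and `threaded_sum` are hypothetical names for this example. In CPython the global interpreter lock limits the speedup for CPU-bound work like this, so treat it as a demonstration of data sharing, not of performance.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "large" dataset; every thread reads this same object.
DATA = list(range(1_000_000))


def chunk_bounds(total, parts):
    """Split range(total) into contiguous (start, stop) pairs."""
    step = -(-total // parts)  # ceiling division
    return [(start, min(start + step, total))
            for start in range(0, total, step)]


def partial_sum(bounds):
    start, stop = bounds
    # Only two integers were handed to this worker; DATA is read in place.
    total = 0
    for i in range(start, stop):
        total += DATA[i]
    return total


def threaded_sum(parts=4):
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(partial_sum, chunk_bounds(len(DATA), parts)))


if __name__ == "__main__":
    print(threaded_sum() == sum(DATA))
```

Doing the same with separate processes would mean either copying the relevant slice of `DATA` to each worker or setting up explicit shared memory, which is the cost the bullet above refers to.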