Java: Concurrency In Practice

  1. Introduction
    • history
      • computers used to execute a single program beginning to end before operating systems
      • Os evolved to allow more than one program to run at once (individual processes)
        • motivated by resource utilization, fairness, convenience
      • threads allow multiple streams of program control to coexist within a process sharing process-wide resources
    • benefits
      • exploiting multiple processors
      • simpler modeling / simpler tasks
      • simplified handling of asynchronous events
      • more responsive user interfaces
    • risks
      • safety hazards in the absence of proper synchronization
      • liveness hazards (getting stuck under specific situations)
      • performance hazards
    • threads are used in timers, servlets, remote invocations, GUI applications, etc
  2. Thread Safety
    • What is thread-safety?
      • At the core, thread safety is about managing access to shared, mutable state
      • thread-safe means it behaves correctly when accessed from multiple threads regardless of scheduling or interleaving
      • thread-safe classes encapsulate any needed synchronization
    • Atomicity
      • incorrect results due to unlucky timing are race conditions
      • the most common type is check-then-act
      • occurs when a sequence of operations need to be atomic or indivisible
      • either all or none is executed
      • where practical use existing thread-safe objects
    • Locking
      • individual thread-safe objects not enough if compound actions that rely on both are required
      • need to preserve invariants
      • need a single atomic operation
      • can use intrinsic locking
      • synchronized methods lock the object, but need to be judicial e.g. don’t lock the entire servlet
      • intrinsic locks are re-entrant, holder of lock can reenter the same lock
    • Guarding state with locks
      • enable serialized access
      • synchronized must be used everywhere that variable is accessed
      • need to use the same lock
      • all variables involved in an invariant must be guarded by the same lock
    • Liveness and performance
      • if locking is too coarse-grained, can result in poor concurrency
      • avoid holding locks during lengthy computations
  3. Sharing Objects
    • Visibility
      • no guarantee reading threads will see value written by another thread on a timely basis or at all
      • maybe due to processor-specific caches
      • always use proper synchronization whenever data is shared across threads
      • data may be stale
      • some 64-bit operations are non-atomic
      • intrinsic locking can be used to guarantee one thread sees the effects of another in a predictable manner
      • locking is not just about mutual exclusion, it is also about memory visibility
      • volatile variables also return the most recent write
        • use when writes do not depend on the current value
        • the variable does not participate in other invariants
        • locking is not required for any other reason while the variable is being accessed
      • locking guarantees visibility and atomicity, volatile just visibility
    • Publication and escape
      • making the object available to code outside of its current scope
      • don’t allow the mutable state to escape
      • be careful about returning inner class since they contain a hidden reference to the enclosing instance
      • do not allow the “this” reference to escape during construction
      • a common mistake is starting a thread from a constructor
    • Thread confinement
      • one way to avoid is to not share
      • don’t need to worry if only one thread has access at a time
      • programmers responsibility to ensure thread-confined objects do not escape from their intended thread
      • stack confinement when only reached through local variables
      • ThreadLocal is per-thread value, can replace global variables, but easy to abuse
    • Immutability
      • immutable objects are always thread-safe
      • good practice to make all fields private and final
    • Safe Publication
      • can’t rely on the integrity of a partially constructed object
      • must always synchronize reads of shared mutable objects
      • immutable objects can be safely shared without synchronization
      • idioms
        • initializing an object reference from a static initializer
        • storing a reference to it into a volatile field or atomic reference
        • storing a reference to it into a final field
        • storing a reference to it into a field that is properly guarded by a lock
      • if an object isn’t modified after publication, is effectively immutable
      • when you acquire a reference to an object, you should know what you’re allowed to do with it
  4. Composing Objects
    • Designing a thread-safe class
      • identify the variables that form the object’s state
      • identify the invariants that constrain the state variables
      • establish a policy for managing concurrent access to the object’s state
      • understand state space
      • you cannot ensure thread safety without understanding invariants and postconditions
    • Instance confinement
      • If not thread-safe, there are techniques to use safely
      • Can only access from a single thread
      • monitor pattern is where object encapsulates all mutable state and guards it with its own object’s intrinsic lock
      • benefits of private lock are clients can’t access it
    • Delegating thread safety
      • relying on a thread-safe class
      • delegation fails when they have invariants that rely on other state variables
    • Adding functionality to existing thread-safe classes
      • Use existing data structures and composition
    • Documenting synchronization policies
      • document synchronization policies for maintainers
      • the best time to document is at design time
      • at the very least document the thread-safe guarantees made by the class
  5. Building Blocks
    • Synchronized collections
      • encapsulate state and synchronize every method
      • need additional client-side locking for compound operations
      • iterators can fail during iteration due to concurrent modification exception
    • Concurrent collections
      • supports common compound actions
      • the concurrent hash map offers finer-grained locking
        • allows concurrent access
        • a limited number of writers can modify at same time
        • provide iterators that do not throw concurrent
        • return copy of iterators which are weakly consistent
      • concurrent collections should be expected to change their contents continuously
    • Blocking queues and the producer-consumer pattern
      • supports producer-consumer pattern
      • bounded queues are a powerful resource management tool
      • build resource management into your design early when it’s easier to account for
      • serial thread confinement, ensure only one thread receives the object being handed off
      • deques allow work-stealing
    • Blocking and interruptible methods
      • must have a plan for responding to interruption
      • propagate the InterruptedException or restore the interrupt
      • never catch it and do nothing which deprives the caller an opportunity to handle it
    • Synchronizers
      • synchronizers coordinate the control flow of threads based on its state
      • latches delay the progress of threads until it reaches its terminal state e.g. service does not start until other services it depends on has started
      • FutureTask also acts as a latch, the state is waiting, running, or complete
      • Semaphores are used to control the number that can be accessed i.e. virtual permits, resource pools, bounded hash set
      • barriers block a group of threads until an event has occurred
        • unlike latches, barriers are waiting for other threads
    • Building an efficient, scalable result cache
      • synchronizing entire Memoizer means everything is sequential
      • Using a hashMap is better, but can result in computation to trigger again if the first request is still in progress
      • storing a future better, but there is still a small window where two threads can compute
      • using putIfAbsent ensures only once
  6. Task Execution
    • Executing tasks in threads
      • identify sensible, ideally independent task boundaries
      • should represent a small fraction of your application’s processing capacity
      • in the single-threaded server, execution is sequential which can block connections and make the server look unavailable
      • explicitly creating a thread per request works if small load, but some disadvantages
        • thread lifecycle overhead
        • resource consumption (memory, CPU resource contention)
        • stability (can result in out of memory error)
    • The Executor framework
      • framework for asynchronous task execution
      • based on producer-consumer pattern
      • can configure various execution policies
      • thread pools are bound to worker queues
      • reusing threads are more efficient
      • can also set lifecycle policies
      • timer
        • should favor ScheduledThreadPoolExecutor
        • the timer creates only a single thread and can affect other timer tasks if it takes too long
        • exceptions make the timer look like it was canceled
    • Finding exploitable parallelism
      • useful for I/O bound tasks
      • use Callable and Future for result-bearing tasks
      • Future represents the lifecycle of a task
      • a little trickier to parallelize heterogenous tasks
        • disparate sizes, processing time and coordination overhead
      • really pays off for large independent, homogeneous tasks
      • completion service is useful if you want to access the results as they become available
      • place time limits on your tasks
    • executor framework lets you decouple task submission from execution
  7. Cancellation and Shutdown
    • Task cancellation
      • external code can move a task to completion prior to normal completion
      • some reasons: user requested, time-limited, application events, errors, shutdown
      • some tasks may never get canceled
      • some blocking methods try to check if a thread has been interrupted
      • interrupt only delivers the message and is usually the most sensible way to implement cancellation
      • threads should have an interruption policy
      • a thread should only be interrupted by its owner
      • can propagate exception or restore interruption status so code higher up can deal with it
      • only code that implements a thread’s interruption policy may swallow an interruption request
      • Futures abstract task life cycle
      • can customize cancellation with futures using newTaskFor hook
    • Stopping a thread-based service
      • threads need to be terminated
      • thread pool owns its worker threads
      • provide lifecycle methods whenever a thread-owning service has a lifetime longer than that of the method that created it
      • a poison pill is one way for producer-consumer pattern
    • Handling abnormal thread termination
      • thread API provide UncaughtExceptionHandler for detecting when a thread dies due to uncaught exception
      • in long-running applications, always use uncaught exception handlers for all threads that at least log the exception
    • JVM shutdown
      • when shutting down, JVM first starts all registered shutdown hooks
      • should be coded extremely defensively
      • one way to avoid problems is to execute shutdown actions sequentially
      • daemon threads used for some helper function and don’t want its presence to prevent shutting down
      • use sparingly best saved for housekeeping tasks
      • try to avoid finalizers, no guarantees on when or even if they run
  8. Applying Thread Pools
    • Implicit coupling between tasks and execution policies
      • some tasks require specific execution policies
      • dependent tasks
      • tasks that rely on thread confinement
      • response-time-sensitive tasks
      • tasks that use thread-local
      • tasks that depend on other tasks can deadlock / thread starvation deadlock
      • responsiveness problems if tasks block for extended periods of time / can use resourced timeouts
    • sizing thread pools
      • depends on type of task
      • need to consider nature of tasks
      • how much memory, IO, db connections, etc
      • for compute-intensive tasks, number of cpu + 1
      • for db, might be limited by connection pool size
    • Configuring ThreadPoolExecutor
      • queues can help prevent resource exhaustion
      • need to throttle to avoid running out of memory
      • use bounded queues
      • saturation policy for when a queue fills up
    • Extending ThreadPoolExecutor
      • can extend methods for e.g. metric tracking, logging, monitoring, etc
    • Parallelizing recursive algorithms
      • if each iteration is independent of others
  9. GUI Applications
    • Why are GUIs single-threaded?
      • modern GUI frameworks create a dedicated event dispatch thread for GUI events
      • persistent problems with race conditions and deadlocks
      • due to interaction between input event processing and oo modeling of GUI components
      • user actions bubble up from OS and app initiated actions bubble down
      • because only single thread, process events sequentially
    • Short-running GUI tasks
      • for short tasks, can stay in the event thread
      • confine presentation objects to event thread
    • Long-running GUI tasks
      • e.g. spell check, background compilation, fetching remote resource
      • get long-running task out of event thread
      • completion thread must submit another task to run in event thread to update UI
    • Shared data models
      • don’t access the data model or presentation components from main thread
      • consider thread safe models
      • another option is versioned
      • consider split-model (presentation vs data) when data must be shared and want to avoid blocking / adding complexity
    • Other forms of single-threaded subsystems
      • sometimes thread confinement is forced on the developer
  10. Avoiding Liveness Hazards
    • Deadlocks
      • results from cyclic dependencies
      • program will be fixed of lock-ordering deadlocks if all threads acquire the locks in a fixed global order
      • sometimes can be less obvious if objects can be passed in different orders
      • can induce ordering with hashcode or a tie-breaking lock
      • invoking an alien method with a lock held is asking for liveness troubles
    • Avoiding and diagnosing deadlocks
      • lock ordering must be part of design
      • specify timeouts
      • use thread dumps to look at locking
    • Other liveness hazards
      • starvation can occur when thread is denied resources
      • try to avoid using priorities, couples to platform
      • can introduce some randomness in retry mechanism
  11. Performance and Scalability
    • Thinking about performance
      • usually focus is on increasing throughput not necessarily latency
      • do more work with more resources
      • avoid premature optimization
      • make right then fast (if not fast enough)
      • measure and don’t guess
    • Amdahl’s law
      • all concurrent applications have some serialization
      • proportion limits speedup
    • Costs introduced by threads
      • context switching
      • memory synchronization
      • blocking and suspended threads
    • Reducing lock contention
      • how long and how often you hold lock
      • reduce duration
      • reduce frequency
      • replace exlcusive locks with coordination mechanisms
      • split into multiple locks e.g. ConcurrentHashMap uses 16 locks that covers different ranges
      • avoid hot fields e.g. a counter variable that everything must update
      • monitor CPU utilization
    • Reducing context switch overhead
      • e.g. move logging IO to another thread
      • move work to thread where cost isn’t perceived by user
  12. Testing Concurrent Programs
    • Intro
      • degree of non determinism and more failure modes
      • most tests are around safety and liveliness
      • performance tests consider throughput, responsiveness, scalability
    • Testing for correctness
      • starting with similar checks as unit tests helps problems not related to concurrency
      • may need to wait some time period for tests involving blocking
      • use interrupt
      • need to be careful that failure auditing code isn’t artificially limiting concurrency
      • can compute checksums of elements that are enqueued and dequed
      • checksums should not be guessable by the compiler
      • can use countdown latch to ensure threads start at the same time
      • can check undesirable memory retention using heap-inspection tools
      • can use callbacks to assert invariants
      • can use thread yield to try to cause more interleavings
    • Testing for performance
      • meaure time taken to run and how it scales
      • compare multiple algorithms
    • Avoiding performance testing pitfalls
      • GC can be unpredictable can try to ensure GC runs during tests
      • compilation can be unpredictable, let it be a small part of a test run or do a warm-up
      • if you don’t use some results e.g. checksum, the compiler may optimize it away can print an empty value
    • Complementary testing approaches
      • code reviews
      • static analysis tools e.g. FindBugs
      • aspect oriented testing, profiles, and monitoring tools
  13. Explicit Locks
    • Lock and ReentrantLock
      • offers unconditional, polled, timed, and interruptible lock acquisition
      • must be released in a finally block
      • more flexible but more dangerous than block structured blocking
    • Performance considerations
      • performance was much better for reentrant locks perior to Java 6, but now similar
    • Fairness
      • purely fair locks tend to come with significant performance costs
      • don’t pay for fairness if you don’t need it
      • they tend to work best when locks held for a relatively long time
    • Choosing between synchronized and ReentrantLock
      • only use explicit lock when you need its advanced features e.g. timed, polled, interruptible lock acquisition, fair queueing, or non-block-structured locking
    • Read-write locks
      • mutual exclusion is usually more strict than we need
      • determine if you need the improvement via profiling
  14. Building Customer Synchronizers
    • Managing state dependence
      • in multithreaded program, state can change based on actions of other threads
      • often need to wait until some precondition becomes true
      • operations that block until the operation can succeed are less error-prone than ones that imply fail
      • state dependent operations can throw exception, but then the caller needs to handle retry
      • another option is sleeping and polling, but choosing sleep length will be trade off between responsiveness and CPU usage
      • condition queues wait until a certain condition becomes true and uses notify/wait semantics
    • Using condition queues
      • key is identifying the condition predicate
      • can have multiple condition predicate per condition queue
      • best practice is to use notifyAll
      • it’s possible for thread to wake up too soon always wait in a while loop based on the condition check
      • make sure someone will perform a notification whenever the condition becomes true to not miss
      • state-dependent class should either full expose (and document) its waiting/notification protocols or prevent subclasses from participating in them
      • generally best to encapsulate the condition queue / e.g. having private lock object, but trade off is no more client side locking
    • Explicit condition objects
      • can use explicit Condition object
      • allows multiple conditions per lock
      • can make more readable
    • Anatomy of a synchronizer
      • a lot of concurrent data structures built using the AbstractQueuedSynchronizer
      • e.g. ReentrantLock, Semaphore, CountDownLatch, ReentrantReadWriteLock, SynchronousQueue, FutureTask
    • AbstractQueuedSynchronizer
      • manages single state of integer information that can be manipulated through getState, setState, compareAndSetState
      • the state can be used to represent a count, status, etc
      • acquisition can be exclusive or nonexclusive
      • simple latch (binary)
        • open/close state is represented by 0 or 1
    • AQS in java.util.concurrent synchronizer classes
      • reentrant lock, also tracks an owner and how many times lock was acquired
      • semaphore tracks available
      • future task holds task status
      • reentrantreadwritelock uses 16 bits for write lock count and other 16 bits for read lock count
    • often best strategy is using existing library, but if you need more control consider AQS
  15. Atomic Variables and Nonblocking Synchronization
    • Disadvantages of locking
      • thread suspension and resuming has a lot of overhead
      • volatile variables do not involve context switches of thread scheduling
      • when locks are held briefly, being put to sleep is a harsh penalty
    • Hardware support for concurrency
      • exclusive locking is pessimistic
      • optimistic approach is when you proceed hopefuly you can complete without interference
      • relies on collision detection
      • modern processors has some form of read-modify-write instruction i.e. compare and swap
      • caller can decide what to do if they fail
      • compare and swap significantly outperform lock-based counters
    • Atomic variable classes
      • atomic variables are finer-grained and lighter-weight than locks
      • limit scope of contention to a single variable
      • atomic classes provide a generalization of volatile variables to support atomic conditional read-modify-write operations
      • atomics tend to scale better than locks because atomics deal more effectively with typical contention levels
    • Nonblocking algorithms
      • an algorithm is non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread
      • an algorithm is lock-free if, at each step, some thread can make progress
      • algorithms that use CAS exclusively for coordination can be both nonblocking and lock-free
      • the key to create nonblocking algorithms is to figure out how to limit the scope of atomic changes to a single variable
        • e.g. concurrent stack, try to install new node, if top has changed, CAS fails otherwise succeeds
        • retry if update fails
        • in linked list, need to update tail and next of last node atomically
      • to avoid ABA problems, use versions
  16. The Java Memory Model
    • What is a memory model, and why would I want one?
      • addresses question “Under what conditions does a thread that reads a variable see the value”
      • compilers can generate instructions in different orders, store variables in registers instead of memory, etc
      • excessive intra-thread coordination would be very costly
      • in shared-memory multiprocessor architecture, each processor has its own cache that is periodically reconciled with main memory
      • most of the time processors relax their memory-coherence guarantees to improve performance
      • synchronization inhibits the compiler, runtime, and hardware from reordering memory operations in a way that would violate the visibility guarantees provided by the JMM
      • JMM defines a partial ordering
        • program order: each action in a thread happens before every action that comes later in that thread
        • monitor lock: an unlock on a monitor lock happens before every subsequent lock on the same monitor lock
        • volatile variable: a write to a volatile field happens before every subsequent read
        • thread start: a call to thread.start happens before every action in the started thread
        • thead termination: any action in a thread happens before any other thread detercts thread has terminated
        • interruption rule: a thread calling interrupt on another thread happens before the interrupted thread detects the interrupt
        • finalizer: the end of a constructor for an object happens before the start of the finalizer for that object
        • transitivity: if A happens before B, and B happens-before C, then A happens before C
    • Publication
      • publishing an object without proper synchronization can allow another thread to see a partially constructed object
      • using a shared variable guarded by a lock ensure that reads and writes of that variable are ordered by happens before
      • statically initialized objects require no explicit synchronization because JVM acquires lock during class initialization time, but only as-constructed state (if mutable, syncrhaonization is still required for subsequent modifications)
      • don’t use double check locking, use lazy initialization holder instead
    • Initialization Safety
      • initialization safety guarantees that for properly constructed objects, all threads will see the correct values of final fields that were set by the constructor regardless of how the object is published
      • for objects with final fields, initialization safety prohibits reordering any part of construction with the initial load of a reference to that object
      • initialization safety makes visibility guarantees only for values that are reachable through final fields as of the time the constructor finishes

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s