Deadlock Detector and Solver

Abstract

Deadlock is one of the most serious and complex problems concerning the reliability of concurrent Java Programs. This paper presents Deadlock Detector and Solver which detects and resolves circular deadlocks of a java program. An agent written in C++ runs parallel to Java Program and monitors the Java Virtual Machine for deadlocks. If the deadlock is detected, the solver agent is used to resolve the deadlock

.

(A) Introduction

The onset of multicore processors forces the programmers to use multiple threads in order to take advantage of hardware parallelism. Java is one of the first languages to make multithreading available to developers. Along with advantages of concurrent systems and multithreading, there are some challenges involved. Java has inter-process communication model which means it has set of methods for exchange of data among multiple threads and processes. It is based on shared data structures and meshes well with hardware architectures in which multiple cores share memory. However Java is susceptible to deadlocks. Deadlock is a condition under which the entire program is halted as each thread in a set attempts to acquire a lock already held by another thread in a set. Java is susceptible to deadlocks because (a) threads exchanges information by sharing variables that they lock with mutex locks and (b) the locking mechanism interacts with other language features, such as aliasing. Consider a simple banking transaction example.

Figure 1: Deadlock scenario in banking transaction

.In this case if there are two threads attempting to run transfer(a,b) and transfer(b,a) at the same time then deadlock is going to occur because both threads try to acquire the resources in reverse order. Once a program is deadlocked the source of deadlock is difficult to determine making it difficult to resolve the deadlock.

We propose a method of detecting and resolving circular deadlocks during run time of java program. The goal is to detect and resolve deadlocks through a supervisory control approach. We propose deadlock detector and solver agent to detect deadlock dynamically in a java application and resolve them. The agent resolves deadlock by pre-empting one of the threads involved in the deadlock of a resource it holds. The approach assumes that the thread victimized is not in the middle of modifying a complex, shared data structure. The agent is written in C++ which runs parallel with the java program under test and monitors JVM for creation of deadlocks. Once the deadlock is detected, the agent runs the solver to resolve the deadlock. We report an empirical studies aimed at determining the overhead of supervision on the behaviour of a running multi-threaded program.

The Java program is executed in Java Virtual Machine. In order to detect deadlock at runtime JVM events need to be tracked. Java Virtual Machine Tool Interface (JVMTI) is used by deadlock detector to monitor JVM. Java Virtual Machine Tool Interface is a native interface of JVM. It allows a program to inspect the state and control the execution of applications running in JVM. Two JVMTI events used for deadlock detector and solver are monitor_contended_enter & monitor_contended_entered. The JVMTI monitors the JVM and issues call back to agent.  The agent reacts to it and checks for deadlocks in java program. In case deadlock is detected, the agent uses Java Native Interface to release lock in chain of deadlock thus resolving the deadlock. The agent builds the graph with threads as nodes and edges between two threads represent that one thread is waiting to acquire a resource held by another thread. If cycle exists in graph the agent declares deadlock and breaks the cycle to resolve it. So we assume here that pre-emption of thread is allowed that means a thread can be interrupted without requiring its co-operation.

The main deliverables are to detect and resolve deadlocks at run time through supervisory control approach and evaluation of effectiveness of agent which is done through testing various deadlock programs and non-deadlock programs. We used dining philosopher’s non deadlocking version and monitored the agent performance. Philosophers were increased from 5-50 and each philosopher ate 50 times .Running the program under supervision gave an overhead of 5.5% on average. The supervisor itself showed linear behaviour with respect to philosophers. We used programs varying number of deadlocks and varying number of threads involved in a deadlock. The programs are executed 10 times to test if agent is able to detect and resolve deadlock accurately. The agent resolved deadlocks in linear time with respect to number of threads involved.

Next sections of the paper are as follows, Section 2 of the paper talks about Java multithreading, locks on objects, synchronization and how circular deadlock occurs in a multithreaded program. Section 3 gives detail about the approach used to detect and resolve deadlock i.e. the agent. Section 4 and Section5 shows an empirical study calculating overhead of supervision on various versions of non-deadlocked deadlocked programs.

(B) Java Multi-Threading

Java is a multithreading programming language. A multithreaded program contains two or more part that can run concurrently and each part can handle different tasks at the same time making use of optimal resources. Multithreading extends the idea of multitasking into applications where one can subdivide specific operations within a single application into individual threads. A thread is an independent path of execution within a program. Each of the threads can run in parallel. The JVM allows an application to have multiple threads of execution running concurrently. There are two ways of to create a new thread to execution (a) by extending Thread class (b) by implementing the Runnable interface. Both of them have a method run(). This method is the starting point of thread execution. The JVM will call run() method when a thread starts executing. A thread’s lifetime can be in one of the following possible state – new, runnable, waiting, blocked, sleeping or terminated. Every thread has a priority. Threads with higher priority are executed in preference to thread with lower priority.

If the code is executing in multi-threaded environment, we need synchronization of objects which are shared among multiple threads to avoid data race conditions. Data race conditions occur when two threads operate on same object without synchronization and there operation interleaves on each other. Data race can lead to unpredictable results and subtle program bugs. Synchronization in java provides locking, which ensures mutual exclusive access of shared resources and prevent data race. It also prevent reordering of code state by compiler which can cause subtle concurrent issue without synchronization. This synchronization process involves locking and unlocking. The process works as follows. Every object has a lock. At any moment, that lock is controlled by at most, one single thread. The lock control access to object’s synchronized code. A thread that wants to execute an object’s synchronized code must first attempt to acquire that object’s lock. If the lock is available that is, if it is not controlled by any other thread then the attempting thread will acquire the lock. If the lock is under another thread’s control, then the attempting thread goes into seeking lock state and becomes ready only when the lock becomes available. When a thread that owns a lock passes out of synchronized code, the thread automatically gives up the lock. We can mark code as synchronized in two ways: (a) synchronize an entire method by putting the synchronized modifier in the method’s declaration. To execute the method, a thread must acquire the lock of object that owns the method. (b)  synchronize a set of statements inside the method. Unlike synchronized method, synchronized statements must specify the object that provides the lock.

Obtaining and using locks is tricky and can lead to lots of problems. One of the difficult (and common) problem is known as deadlock. A deadlock arises when locking threads result in a situation where they cannot proceed and thus wait indefinitely for others to terminate. In general terms, if one thread blocks for a resource because it is waiting for a condition, and some other thread in the program makes it impossible for that condition to arise. One of the classic example of deadlock is “dining philosopher” problem. Five philosophers sit around a table with 5 plates (one for each philosopher), 5 chopsticks and a bowl of rice. Each philosopher alternatively thinks and eats. To eat, she needs two chopsticks next to her plate. When finished eating she puts the chopsticks back on the table and continues thinking. Philosophers here are processes and chopsticks are resources. The system will be in a deadlock if all 5 philosophers take up their left chopstick simultaneously. The system will halt unless one of them put it back.

Threads often need to coordinate their actions. The wait()/notify() mechanism is useful when threads must communicate in order to provide a functionality. When wait() is invoked, the thread releases the lock and suspends execution. A thread remains in wait() state until some other thread calls the notify() or notifyAll() method on wait object. notify() and notifyAll() methods are used to wake up a waiting thread in Thread class. The notify() method wakes up one thread waiting for the lock (the first thread that called wait on that lock). The notifyAll() method wakes up all the threads waiting for the lock; the JVM selects one of the threads from list of threads waiting for the lock and wakes that thread up.

(C) Deadlock Detector and Solver

The deadlock detector and solver agent is a program which detects and resolves deadlocks in a java program at runtime. The agent is a Java Virtual Machine Tool Interface (JVMTI) agent written in C++ which runs in parallel with java program. The agents monitors Java Virtual Machine, issues call backs to the agent and takes appropriate action whenever a thread has acquired a resource or whenever a thread is waiting to acquire a resource. Gathering such information the agent builds a graph with threads as nodes and edges between threads if one thread is waiting for resource acquired by another thread. If cycle exists in graph, then agent declares that there is a deadlock in the program. Then the agent uses Java Native Interface (JNI) to release lock in the chain of deadlocks thus resolving the deadlock.

Java Virtual Machine Tool Interface is a programming interface used by development and monitoring tools. It provides both a way to inspect the state and to control the execution of applications running in Java Virtual Machine. JVMTI is a two way interface that allows communication between JVM and client which is Deadlock Detector agent in this case. The agent is notified of errors and other interesting occurrences through events. To handle events we designated callback functions with SetEventCallback. For each event corresponding callback function will be called. The agent access JVMTI features by calling JVMTI functions. Access to JVMTI functions is by use of an interface pointer which is called environment pointer. The environment pointer is of type jvmtiEnv*. Figure 2 shows how Deadlock Detector and Solver works. JVMTI monitors JVM and notify to agent of any event. When each event related to deadlock detection occurs, information concerning deadlock is passed to agent. Once deadlock is detected, the solver agent uses Java Native Interface to release lock in the chain of deadlocks thus resolving the deadlock. Java Native Interface (JNI) is a programming framework that enables java code running in JVM to call and be called by native applications (programs specific to hardware and operating system platform) or libraries written in other languages like C, C++. Native code which in our case written in C++ uses JNI method call mechanism to invoke wait(), notify and notifyAll() methods to provide thread coordination.

The solver agent uses two main events monitor_contended_enter & monitor_contended_entered. These are the call backs which JVMTI registers to. monitor_contended_enter is called when a thread is attempting to enter the monitor already acquired by another thread. All the threads involved in cyclic deadlock will have to send a monitor contended enter signal to JVMTI as this is necessary condition for deadlocks. This feature enables us to add edges in a resource mapping graph. monitor_contended_entered is called when a thread enters the monitor after waiting to be released by another thread. The signal enables us to remove edges from a resource mapping graph, thus resolving the deadlock. The solution assumes that the thread from which edge is removed and deadlock is resolved is not in middle of modifying a complex, shared data structure. The deadlock is resolved by pre-empting one of the threads involved in deadlock of a resource it holds.

Figure 2: Deadlock Detector and Solver Architecture

(D) Behaviour of non-deadlocking programs under agent’s supervision

To determine the cpu time attributable to supervisor and cpu time attributable to running program we used non-deadlocking version of dining philosophers program. Each philosopher locks the left chopstick, locks the right chopstick, eat for a while, unlock the left chopstick, unlock the right chopstick and thinks for a while. Time taken to eat is randomly generated between 0-20ms and time taken to think is also randomly generated between 0-40ms. The program terminates when each philosopher has eaten at least 50 times. Two versions of Dining Philosophers program are used, one with re-entrant locks and other with intrinsic locks. The programs executed with different number of philosophers varying from 5-50 and the cpu time taken to execute the program is measured. The same programs are then run with the agent to check the difference and effect on performance. As monitor_contended_enter and monitor_contended_entered are two main callback events in the agent, total time taken by these two callbacks is used to compare the effect on performance when program ran under supervision. The programs were executed 10 times to get the accurate time measurements and average time is noted along with minimum and maximum time.

  • Dining Philosopher with re-entrant locks

Table 1: Times (in millisecond) for dining philosopher without agent supervision

Table 2: Times (in millisecond) for dining philosopher with agent supervision.

Table 3: Percentage overhead when running program under supervision.

Table1, Table 2 and Table 3 shows that there was an average increase of 5.6% if the non-deadlocking program ran with the agent. Also the agent’s runtime was linear with number of philosophers

  • Dining Philosopher with intrinsic locks

Table 4: Times in millisecond for dining philosopher without agent supervision

Table 5: Times in millisecond for dining philosopher with agent supervision.

Table 6: Percentage overhead when running program under supervision.

Table4, Table 5 and Table 6 shows that there was an average increase of 5.58% if the non-deadlocking program ran with the agent. Also the agent’s runtime was sub linear with number of philosophers

(E) Empirical Results on deadlocking programs

In order to detect and resolve under supervisory control approach, we ran programs with various number of deadlock and various number of threads involved in the deadlock. The agent’s effectiveness is also measured to check the overhead imposed to resolve deadlock. As monitor_contended_enter and monitor_contended_entered are two main callback events in the agent, total time taken by these two callbacks is used to compare the effect on performance. The programs are executed 10 times and average of results in noted to test whether agent is able to detect and resolve deadlock accurately.

To effectively test various deadlocked programs, Deadlock code generator is created. It is a java application which generates various types of cyclic deadlocks with varying parameters. The application accepts user inputs which is number of deadlocks to be generated, number of threads involved in each deadlock, number of extra resources and extra threads. The extra threads and resources are not involved in deadlock, they are just used to increase complexity of program. The application generate programs in which threads will wait to acquire resources held by other threads. The program creates as many number of objects (locks) as number of threads. Thread j acquires object j and tries to acquire j+1; if j equals to zero it acquires object zero and nth object. When thread j has acquired lock on object j, countdown latch is used and make thread j wait till the countdown becomes zero. In the beginning of program countdown latch is created with count as the number of threads involved in deadlock. Each time a thread acquires one of the resources countdown function on latch is called, then await function is called which makes thread wait till the countdown goes zero. All threads have acquired first level lock by now and as countdown reaches zero all threads will try to acquire second level lock which is already being acquired by some other thread. This is how the application makes it certain that the program will end up in deadlock.

  • Results with one deadlock

In this experiment, program with single deadlock among threads is used to measure how much time it takes to detect and resolve deadlock using the agent. The number of threads varied from 2-50.

Table 7: Times (in milliseconds) for single deadlocking with various number of threads.

It is concluded from Table 7, that the agent is able to detect and resolve deadlock having 10 threads in 15.6ms and with 50 threads in 167ms. The program itself took to complete in 19ms for deadlocks between 10 threads and 99ms for deadlock between 50 threads.  Time taken by agent is linear with number of threads.

  • Results with two deadlocks

In this experiment program with 2 deadlocks in used to measure time taken to detect and resolve the deadlocks. The number of threads are not equal in this case. For first deadlock number of threads are 2, for second deadlock number of threads varies from 2-50. Thus for a program with total of 10 threads means 10 threads for first deadlock and two threads for second.

Table 8: Times (in milliseconds) for two deadlocks in program with various number of threads.

It is concluded from Table 8, that the agent is able to detect and resolve deadlock having 10 threads in 21.6ms and with 50 threads in 172ms. The program itself took to complete in 23.2ms for deadlocks between 10 threads and 104.3ms for deadlock between 50 threads. Time taken by agent is linear with number of threads.

  • Results with three deadlocks

In this experiment program with three deadlocks is used to measure how much time it takes to detect and resolve deadlocks using the agent. The number of threads involved in each deadlock is equal that is in a program of 10 threads, each deadlock has 10 threads totalling to 30 threads. Number of threads in each deadlock varies here from 2-50

Table 9: Times (in milliseconds) for 3 deadlocks in program with various number of threads.

It is concluded from Table 9, that the agent is able to detect and resolve deadlock having 10 threads in each deadlock 78.3ms and with 50 threads in 958.3ms. The program itself took to complete in 52.5ms for deadlocks between 10 threads in each deadlock and 455.5ms for deadlock between 50 threads. Time taken by agent is linear with number of threads

  • Results with five deadlocks

In this experiment program with five deadlocks is used to measure how much time it takes to detect and resolve deadlocks using the agent. The number of threads involved in each deadlock is equal that is in a program of 10 threads, each deadlock has 10 threads totalling to 50 threads. Number of threads in each deadlock varies here from 2-50

Table 10: Times (in milliseconds) for 5 deadlocks in program with various number of threads.

It is concluded from Table 10, that the agent is able to detect and resolve deadlock having 10 threads in each deadlock 172.4ms and with 50 threads in 2314.5ms. The program itself took to complete in 111.5ms for deadlocks between 10 threads in each deadlock and 995.0ms for deadlock between 50 threads. Time taken by agent is linear with number of threads

(F)  Conclusions

The empirical study shows that the deadlock detector and solver agent was successful in detecting and removing deadlocks in java programs at run time. The experimentation done using two levels of deadlocking program proves the effectiveness of the agent. The agent was also tested for hypothetical cases, though such cases will be rare occurrence in real world, but they were used to prove effectiveness of tool. In real world deadlocks do not occur as frequently as in these experiments hence agent performance was also tested for non-deadlocking programs. The average performance in this case was in overhead of 5.6%. The deadlock detection and solver agent does not affect performance much as the detection part runs in parallel and solver is called only if deadlock is detected. This seems to be very effective as it does not affect the flow of non-deadlocking java programs. To conclude, the goal of detecting and resolving deadlocks was successful with the deadlock detector and solver agent without imposing overhead.

(G) References

[1]http://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html

[2]https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/jniTOC.html

[3] OCA/OCP Java SE 7 Programmer I & II Study Guide

[4]https://docs.oracle.com/cd/B19306_01/server.102/b14220/consist.htm

[5]https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantLock.html

[6]https://docs.oracle.com/javase/tutorial/essential/concurrency/locksync.html

[7]https://docs.oracle.com/javase/tutorial/essential/concurrency/deadlock.html

[8] Munnavar Khan (2013) Master’s project report Deadlock Code Generator

[9] Deepak Kini (2013) Master’s project report on Deadlock Code Generator and Runtime Deadlock Detection and Solver Agent.