Collecting Thread Dumps Using Jstack - tenji/ks GitHub Wiki
JVM线程转储(JVM Thread Dump)是在指定时间获取的线程快照,它可以提供所有创建的Java线程的完整列表。
一个线程可以和其它线程同时被处理。 为了确保多线程尝试使用共享资源时的兼容性,使用线程同步(Thread Synchronization)可以保证共享资源同时只能被一个线程访问。
Java上的线程同步可以使用monitor(监视器、检视锁)完成。 每个Java对象都有一个监视器。监视器只能由一个线程拥有。如果一个线程想获取其它线程已经获取的监视器,它需要在等待队列(wait queue)中等待,直到其他线程释放其监视器。
获取线程monitor
的方法:
-
执行该对象的同步方法
public synchronize a () {}
-
执行该对象的同步块
synchronize(obj) {}
-
执行某个类的静态同步方法
public static synchronize b() {}
为了分析线程转储,你需要知道线程都有哪些状态。线程的状态在java.lang.Thread.State
中声明。
- NEW: The thread is created but has not been processed yet.
- RUNNABLE: The thread is occupying the CPU and processing a task. (It may be in WAITING status due to the OS's resource distribution.)
- BLOCKED: The thread is waiting for a different thread to release its lock in order to get the monitor lock.
- WAITING: The thread is waiting by using a wait, join or park method.
- TIMED_WAITING: The thread is waiting by using a sleep, wait, join or park method. (The difference from WAITING is that the maximum waiting time is specified by the method parameter, and WAITING can be relieved by time as well as external changes.)
- TERMINATED: The thread has completed execution.
Java线程可以分为两种:
- daemon threads(守护线程)
- and non-daemon threads(用户线程)
当没有其他用户线程时,守护线程将停止工作。 即使你在启动JVM时不创建任何线程,Java应用程序也将默认创建多个线程。大多数是守护线程,主要用于处理垃圾收集或JMX等任务。
运行static void main(String [] args)
方法的线程被创建为用户线程,当该线程停止工作时,所有其他守护线程也将停止。(运行此主方法的线程被称为VM thread in HotSpot VM)
使用jstack
获取线程快照的命令:
$ jstack -f [pid]
"pool-1-thread-13" prio=6 tid=0x000000000729a000 nid=0x2fb4 runnable [0x0000000007f0f000] java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
- locked <0x0000000780b7e688> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.readLine(BufferedReader.java:299)
- locked <0x0000000780b7e688> (a java.io.InputStreamReader)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
- Thread name: When using Java.lang.Thread class to generate a thread, the thread will be named Thread-(Number), whereas when using java.util.concurrent.ThreadFactory class, it will be named pool-(number)-thread-(number).
- Priority: Represents the priority of the threads.
- Thread ID: Represents the unique ID for the threads. (Some useful information, including the CPU usage or memory usage of the thread, can be obtained by using thread ID.)
- Thread status: Represents the status of the threads.
- Thread callstack: Represents the call stack information of the threads.
1、查询CPU使用率最高的线程:(或者top
然后Shift + H
)
[user@linux ~]$ ps -eLo pid,lwp,stime,time,pcpu | grep [pid]
PID LWP STIME TIME %CPU
10029 - Dec07 00:02:02 99.5
- 10039 Dec07 00:00:00 0.1
- 10040 Dec07 00:00:00 95.5
获取使用CPU最多的轻量级进程(Light Weight Process, LWP),并将其唯一编号(10039)转换为十六进制数(0x2737)。
2、获取线程快照后,检查线程正在执行的操作。
获取PID
为10029的进程的线程快照,然后找到nid
为0x2737的线程.
"NioProcessor-2" prio=10 tid=0x0a8d2800 nid=0x2737 runnable [0x49aa5000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x74c52678> (a sun.nio.ch.Util$1)
- locked <0x74c52668> (a java.util.Collections$UnmodifiableSet)
- locked <0x74c501b0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at external.org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65)
at external.org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:708)
at external.org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
每小时获取几次线程快照,并检查该线程的状态变化来确定引起改线程CPU占用率过高的原因。
多次获取线程转储后,查找具有BLOCKED状态的线程列表。
" DB-Processor-13" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at beans.ConnectionPool.getConnection(ConnectionPool.java:102)
- waiting to lock <0xe0375410> (a beans.ConnectionPool)
at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111)
at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43)
"DB-Processor-14" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f020]
java.lang.Thread.State: BLOCKED (on object monitor)
at beans.ConnectionPool.getConnection(ConnectionPool.java:102)
- waiting to lock <0xe0375410> (a beans.ConnectionPool)
at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111)
at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43)
" DB-Processor-3" daemon prio=5 tid=0x00928248 nid=0x8b waiting for monitor entry [0x000000000825d080]
java.lang.Thread.State: RUNNABLE
at oracle.jdbc.driver.OracleConnection.isClosed(OracleConnection.java:570)
- waiting to lock <0xe03ba2e0> (a oracle.jdbc.driver.OracleConnection)
at beans.ConnectionPool.getConnection(ConnectionPool.java:112)
- locked <0xe0386580> (a java.util.Vector)
- locked <0xe0375410> (a beans.ConnectionPool)
at beans.cus.Cue_1700c.GetNationList(Cue_1700c.java:66)
at org.apache.jsp.cue_1700c_jsp._jspService(cue_1700c_jsp.java:120)
如果线程被 BLOCKED,则尝试找到这个线程等待的锁相关的其它线程。
通过线程快照,你可以确认线程状态之所以保持 BLOCKED,是因为无法获取到**<0xe0375410>**这个锁。可以通过分析正在持有这个锁的线程的堆栈信息(Stack Trace)解决该问题。
在使用DBMS的应用程序中出现上述的情况时通常有两个原因。第一个是配置不正确。尽管线程仍然可以工作,但由于DBCP的配置不正确等原因,所以不能表现出最佳性能。如果您多次抓取线程快照并进行比较,你可以看到曾经被 BLOCKED 的一些线程处于不同的状态。
第二个原因是连接异常。 当与DBMS的连接一直异常时,线程会一直等待直到超时。在这种情况下,即使多次抓取线程快照并比较,你可以发现与DBMS相关的线程会一直处于 BLOCKED 状态。然后通过调整超时时间等数值,可以在一定程度上提高系统处理性能。
当使用java.lang.Thread
对象创建线程时,该线程将被命名为 Thread-(Number)。当使用java.util.concurrent.DefaultThreadFactory
对象创建线程时,该线程将被命名为 pool-(Number)-thread-(Number)。当分析应用程序的数万到数千个线程时,如果所有线程仍然有其默认名称,分析它们就变得非常困难,因为很难区分要分析的线程。
因此,建议你养成创建新线程时为线程命名的习惯。
当您使用java.lang.Thread
创建线程时,您可以通过使用 creator 参数为线程指定一个自定义名称。
public Thread(Runnable target, String name);
public Thread(ThreadGroup group, String name);
public Thread(ThreadGroup group, Runnable target, String name);
public Thread(ThreadGroup group, Runnable target, String name, long stackSize);
当你使用java.util.concurrent.ThreadFactory
创建线程时,你可以通过生成你自己的 ThreadFactory 来命名它。如果你不需要特殊功能,那么您可以使用 MyThreadFactory,如下所述:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;
public class MyThreadFactory implements ThreadFactory {
private static final ConcurrentHashMap<String, AtomicInteger> POOL_NUMBER =
new ConcurrentHashMap<String, AtomicInteger>();
private final ThreadGroup group;
private final AtomicInteger threadNumber = new AtomicInteger(1);
private final String namePrefix;
public MyThreadFactory(String threadPoolName) {
if (threadPoolName == null) {
throw new NullPointerException("threadPoolName");
}
POOL_NUMBER.putIfAbsent(threadPoolName, new AtomicInteger());
SecurityManager securityManager = System.getSecurityManager();
group = (securityManager != null) ? securityManager.getThreadGroup() :
Thread.currentThread().getThreadGroup();
AtomicInteger poolCount = POOL_NUMBER.get(threadPoolName);
if (poolCount == null) {
namePrefix = threadPoolName + " pool-00-thread-";
} else {
namePrefix = threadPoolName + " pool-" + poolCount.getAndIncrement() + "-thread-";
}
}
public Thread newThread(Runnable runnable) {
Thread thread = new Thread(group, runnable, namePrefix + threadNumber.getAndIncrement(), 0);
if (thread.isDaemon()) {
thread.setDaemon(false);
}
if (thread.getPriority() != Thread.NORM_PRIORITY) {
thread.setPriority(Thread.NORM_PRIORITY);
}
return thread;
}
}
你可以使用 MBean 获取 ThreadInfo 对象。你还可以通过使用 ThreadInfo 获取通过线程转储难以获取的更多信息。
ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
long[] threadIds = mxBean.getAllThreadIds();
ThreadInfo[] threadInfos =
mxBean.getThreadInfo(threadIds);
for (ThreadInfo threadInfo : threadInfos) {
System.out.println(
threadInfo.getThreadName());
System.out.println(
threadInfo.getBlockedCount());
System.out.println(
threadInfo.getBlockedTime());
System.out.println(
threadInfo.getWaitedCount());
System.out.println(
threadInfo.getWaitedTime());
}
您可以通过 ThreadInfo 中的方法获取线程WAITed
或BLOCKED
的时间,通过该方法还可以获取异常长时间不活动的线程列表。
- Linux下,执行
jstack
命令的用户需和进程启动用户一致;
Collecting Thread Dumps Using Jstack utility
How to Analyze Java Thread Dumps
JVM: How to analyze Thread Dump
How to analyze Thread Dump – part 1
How to analyze Thread Dump – Part2: JVM Overview