Collecting Thread Dumps Using Jstack - tenji/ks GitHub Wiki

使用Jstack分析线程快照

JVM线程转储(JVM Thread Dump)是在指定时间获取的线程快照,它可以提供所有创建的Java线程的完整列表。

Java线程的背景信息

线程同步(Thread Synchronization)

一个线程可以和其它线程同时被处理。 为了确保多线程尝试使用共享资源时的兼容性,使用线程同步(Thread Synchronization)可以保证共享资源同时只能被一个线程访问。

Java上的线程同步可以使用monitor(监视器、检视锁)完成。 每个Java对象都有一个监视器。监视器只能由一个线程拥有。如果一个线程想获取其它线程已经获取的监视器,它需要在等待队列(wait queue)中等待,直到其他线程释放其监视器。

获取线程monitor的方法:

  1. 执行该对象的同步方法

     public synchronize a () {}
    
  2. 执行该对象的同步块

     synchronize(obj) {}
    
  3. 执行某个类的静态同步方法

     public static synchronize b() {}
    

线程状态(Thread Status)

为了分析线程转储,你需要知道线程都有哪些状态。线程的状态在java.lang.Thread.State中声明。

  • NEW: The thread is created but has not been processed yet.
  • RUNNABLE: The thread is occupying the CPU and processing a task. (It may be in WAITING status due to the OS's resource distribution.)
  • BLOCKED: The thread is waiting for a different thread to release its lock in order to get the monitor lock.
  • WAITING: The thread is waiting by using a wait, join or park method.
  • TIMED_WAITING: The thread is waiting by using a sleep, wait, join or park method. (The difference from WAITING is that the maximum waiting time is specified by the method parameter, and WAITING can be relieved by time as well as external changes.)
  • TERMINATED: The thread has completed execution.

线程类型(Thread Type)

Java线程可以分为两种:

  1. daemon threads(守护线程)
  2. and non-daemon threads(用户线程)

当没有其他用户线程时,守护线程将停止工作。 即使你在启动JVM时不创建任何线程,Java应用程序也将默认创建多个线程。大多数是守护线程,主要用于处理垃圾收集或JMX等任务。

运行static void main(String [] args)方法的线程被创建为用户线程,当该线程停止工作时,所有其他守护线程也将停止。(运行此主方法的线程被称为VM thread in HotSpot VM

获取线程快照

使用jstack

使用jstack获取线程快照的命令:

$ jstack -f [pid]

线程快照中的线程信息

"pool-1-thread-13" prio=6 tid=0x000000000729a000 nid=0x2fb4 runnable [0x0000000007f0f000] java.lang.Thread.State: RUNNABLE
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(SocketInputStream.java:129)
          at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
          at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
          at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
          - locked <0x0000000780b7e688> (a java.io.InputStreamReader)
          at java.io.InputStreamReader.read(InputStreamReader.java:167)
          at java.io.BufferedReader.fill(BufferedReader.java:136)
          at java.io.BufferedReader.readLine(BufferedReader.java:299)
          - locked <0x0000000780b7e688> (a java.io.InputStreamReader)
          at java.io.BufferedReader.readLine(BufferedReader.java:362)
  • Thread name: When using Java.lang.Thread class to generate a thread, the thread will be named Thread-(Number), whereas when using java.util.concurrent.ThreadFactory class, it will be named pool-(number)-thread-(number).
  • Priority: Represents the priority of the threads.
  • Thread ID: Represents the unique ID for the threads. (Some useful information, including the CPU usage or memory usage of the thread, can be obtained by using thread ID.)
  • Thread status: Represents the status of the threads.
  • Thread callstack: Represents the call stack information of the threads.

问题排查场景

场景一、CPU使用率过高

1、查询CPU使用率最高的线程:(或者top然后Shift + H

[user@linux ~]$ ps -eLo pid,lwp,stime,time,pcpu | grep [pid]

PID         LWP          STIME              TIME        %CPU
10029            -        Dec07          00:02:02         99.5
     -       10039        Dec07          00:00:00         0.1
     -       10040        Dec07          00:00:00         95.5

获取使用CPU最多的轻量级进程(Light Weight Process, LWP),并将其唯一编号(10039)转换为十六进制数(0x2737)。

2、获取线程快照后,检查线程正在执行的操作。

获取PID为10029的进程的线程快照,然后找到nid为0x2737的线程.

"NioProcessor-2" prio=10 tid=0x0a8d2800 nid=0x2737 runnable [0x49aa5000]
java.lang.Thread.State: RUNNABLE
            at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
            at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
            at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
            at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
            - locked <0x74c52678> (a sun.nio.ch.Util$1)
            - locked <0x74c52668> (a java.util.Collections$UnmodifiableSet)
            - locked <0x74c501b0> (a sun.nio.ch.EPollSelectorImpl)
            at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
            at external.org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65)
            at external.org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:708)
            at external.org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
            at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
            at java.lang.Thread.run(Thread.java:662)

每小时获取几次线程快照,并检查该线程的状态变化来确定引起改线程CPU占用率过高的原因。

场景二、处理性能异常低下

多次获取线程转储后,查找具有BLOCKED状态的线程列表。

" DB-Processor-13" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f000]
java.lang.Thread.State: BLOCKED (on object monitor)
            at beans.ConnectionPool.getConnection(ConnectionPool.java:102)
            - waiting to lock <0xe0375410> (a beans.ConnectionPool)
            at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111)
            at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43)
"DB-Processor-14" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f020]
java.lang.Thread.State: BLOCKED (on object monitor)
            at beans.ConnectionPool.getConnection(ConnectionPool.java:102)
            - waiting to lock <0xe0375410> (a beans.ConnectionPool)
            at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111)
            at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43)
" DB-Processor-3" daemon prio=5 tid=0x00928248 nid=0x8b waiting for monitor entry [0x000000000825d080]
java.lang.Thread.State: RUNNABLE
            at oracle.jdbc.driver.OracleConnection.isClosed(OracleConnection.java:570)
            - waiting to lock <0xe03ba2e0> (a oracle.jdbc.driver.OracleConnection)
            at beans.ConnectionPool.getConnection(ConnectionPool.java:112)
            - locked <0xe0386580> (a java.util.Vector)
            - locked <0xe0375410> (a beans.ConnectionPool)
            at beans.cus.Cue_1700c.GetNationList(Cue_1700c.java:66)
            at org.apache.jsp.cue_1700c_jsp._jspService(cue_1700c_jsp.java:120)

如果线程被 BLOCKED,则尝试找到这个线程等待的锁相关的其它线程。

通过线程快照,你可以确认线程状态之所以保持 BLOCKED,是因为无法获取到**<0xe0375410>**这个锁。可以通过分析正在持有这个锁的线程的堆栈信息(Stack Trace)解决该问题。

在使用DBMS的应用程序中出现上述的情况时通常有两个原因。第一个是配置不正确。尽管线程仍然可以工作,但由于DBCP的配置不正确等原因,所以不能表现出最佳性能。如果您多次抓取线程快照并进行比较,你可以看到曾经被 BLOCKED 的一些线程处于不同的状态。

第二个原因是连接异常。 当与DBMS的连接一直异常时,线程会一直等待直到超时。在这种情况下,即使多次抓取线程快照并比较,你可以发现与DBMS相关的线程会一直处于 BLOCKED 状态。然后通过调整超时时间等数值,可以在一定程度上提高系统处理性能。

让分析线程堆变得更容易的编码方法

给线程命名(Naming Threads)

当使用java.lang.Thread对象创建线程时,该线程将被命名为 Thread-(Number)。当使用java.util.concurrent.DefaultThreadFactory对象创建线程时,该线程将被命名为 pool-(Number)-thread-(Number)。当分析应用程序的数万到数千个线程时,如果所有线程仍然有其默认名称,分析它们就变得非常困难,因为很难区分要分析的线程。

因此,建议你养成创建新线程时为线程命名的习惯。

当您使用java.lang.Thread创建线程时,您可以通过使用 creator 参数为线程指定一个自定义名称。

public Thread(Runnable target, String name);
public Thread(ThreadGroup group, String name);
public Thread(ThreadGroup group, Runnable target, String name);
public Thread(ThreadGroup group, Runnable target, String name, long stackSize);

当你使用java.util.concurrent.ThreadFactory创建线程时,你可以通过生成你自己的 ThreadFactory 来命名它。如果你不需要特殊功能,那么您可以使用 MyThreadFactory,如下所述:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class MyThreadFactory implements ThreadFactory {
  private static final ConcurrentHashMap<String, AtomicInteger> POOL_NUMBER =
                                                       new ConcurrentHashMap<String, AtomicInteger>();
  private final ThreadGroup group;
  private final AtomicInteger threadNumber = new AtomicInteger(1);
  private final String namePrefix;

  public MyThreadFactory(String threadPoolName) {

      if (threadPoolName == null) {
          throw new NullPointerException("threadPoolName");
      }
            POOL_NUMBER.putIfAbsent(threadPoolName, new AtomicInteger());

      SecurityManager securityManager = System.getSecurityManager();
      group = (securityManager != null) ? securityManager.getThreadGroup() :
                                                    Thread.currentThread().getThreadGroup();

      AtomicInteger poolCount = POOL_NUMBER.get(threadPoolName);

      if (poolCount == null) {
            namePrefix = threadPoolName + " pool-00-thread-";
      } else {
            namePrefix = threadPoolName + " pool-" + poolCount.getAndIncrement() + "-thread-";
      }
  }

  public Thread newThread(Runnable runnable) {
      Thread thread = new Thread(group, runnable, namePrefix + threadNumber.getAndIncrement(), 0);

      if (thread.isDaemon()) {
            thread.setDaemon(false);
      }

      if (thread.getPriority() != Thread.NORM_PRIORITY) {
            thread.setPriority(Thread.NORM_PRIORITY);
      }

      return thread;
  }
}

使用 MBean 获取更详细的信息(Obtaining More Detailed Information by Using MBean)

你可以使用 MBean 获取 ThreadInfo 对象。你还可以通过使用 ThreadInfo 获取通过线程转储难以获取的更多信息。

ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
long[] threadIds = mxBean.getAllThreadIds();
ThreadInfo[] threadInfos =
                mxBean.getThreadInfo(threadIds);

for (ThreadInfo threadInfo : threadInfos) {
  System.out.println(
      threadInfo.getThreadName());
  System.out.println(
      threadInfo.getBlockedCount());
  System.out.println(
      threadInfo.getBlockedTime());
  System.out.println(
      threadInfo.getWaitedCount());
  System.out.println(
      threadInfo.getWaitedTime());
}

您可以通过 ThreadInfo 中的方法获取线程WAITedBLOCKED的时间,通过该方法还可以获取异常长时间不活动的线程列表。

注意事项

  • Linux下,执行jstack命令的用户需和进程启动用户一致;

参考链接

Collecting Thread Dumps Using Jstack utility

How to Analyze Java Thread Dumps

JVM: How to analyze Thread Dump

How to analyze Thread Dump – part 1

How to analyze Thread Dump – Part2: JVM Overview

How to analyze Thread Dump – Part 3: HotSpot VM

How to analyze Thread Dump – Part 4: IBM VM

⚠️ **GitHub.com Fallback** ⚠️