[APIM] Locating the cause of java.lang.RuntimeException errors thrown by Azure APIM - LuBu0505/My-Code GitHub Wiki

Problem Description

The Azure APIM service logs contained java.lang.RuntimeException errors. The error details collected by Application Insights showed that the actual request error was: "The remote name could not be resolved 'xxxx.xxx.xx'".

Solution

When no custom DNS server is configured, the APIM service uses the Azure platform DNS server (168.63.129.16) for name resolution by default.

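The virtual machines that host the Azure APIM service run Windows. When several DNS servers are available, Windows chooses among them as described in the Microsoft documentation: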

The DNS Client service queries the DNS servers in the following order:

  1. The DNS Client service sends the name query to the first DNS server on the preferred adapter's list of DNS servers and waits one second for a response.

  2. If the DNS Client service does not receive a response from the first DNS server within one second, it sends the name query to the first DNS server on all adapters that are still under consideration and waits two seconds for a response.

  3. If the DNS Client service does not receive a response from any DNS server within two seconds, the DNS Client service sends the query to all DNS servers on all adapters that are still under consideration and waits another two seconds for a response.

  4. If the DNS Client service still does not receive a response from any DNS server, it sends the name query to all DNS servers on all adapters that are still under consideration and waits four seconds for a response.

  5. If the DNS Client service does not receive a response from any DNS server, the DNS client sends the query to all DNS servers on all adapters that are still under consideration and waits eight seconds for a response.
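The retry schedule above can be sketched as a small simulation. This is a simplified model, not the Windows implementation: it assumes a responsive server answers instantly and an unresponsive one never answers, and it ignores the rule that drops an adapter from consideration once it returns a negative answer. The names `simulate_dns_client` and `servers_for_round` and the sample IP addresses are hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Per-round wait times (seconds) from the Microsoft documentation: 1, 2, 2, 4, 8.
ROUND_TIMEOUTS = [1, 2, 2, 4, 8]


def servers_for_round(round_index: int, adapters: list[list[str]]) -> list[str]:
    """Return the DNS servers queried in a given round (0-based).

    Simplified model: every adapter stays 'under consideration' in every round.
    """
    if round_index == 0:
        # Round 1: only the first server on the preferred (first) adapter.
        return adapters[0][:1]
    if round_index == 1:
        # Round 2: the first server on every adapter.
        return [servers[0] for servers in adapters if servers]
    # Rounds 3-5: every server on every adapter.
    return [server for servers in adapters for server in servers]


def simulate_dns_client(
    adapters: list[list[str]], responsive: set[str]
) -> tuple[Optional[str], int]:
    """Return (answering server, seconds elapsed before the answer).

    Returns (None, total wait) when no server answers in any round.
    """
    elapsed = 0
    for i, timeout in enumerate(ROUND_TIMEOUTS):
        for server in servers_for_round(i, adapters):
            if server in responsive:
                return server, elapsed
        # Nobody answered in this round: the client waits out the full timeout.
        elapsed += timeout
    return None, elapsed


if __name__ == "__main__":
    adapters = [["10.0.0.4", "10.0.0.5"]]  # hypothetical custom DNS servers, one adapter
    print(simulate_dns_client(adapters, responsive={"10.0.0.5"}))  # ('10.0.0.5', 3)
    print(simulate_dns_client(adapters, responsive=set()))  # (None, 17)
```

With one adapter whose first server is down, the second server is only asked in round 3, so its answer arrives after 3 seconds; when no server answers, all five rounds elapse (17 seconds), which is the worst case behind a "not resolved" error.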

Reference document: https://learn.microsoft.com/zh-cn/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/dd197552(v=ws.10)?redirectedfrom=MSDN

Because the error message "The remote name could not be resolved" already states clearly that the domain name could not be resolved, troubleshoot as follows:

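  • If a custom DNS server is configured, check that DNS server's logs for failed resolutions.

  • If no custom DNS server is configured, check the Azure DNS server logs. An RCODE of NXDOMAIN (3) in the Azure DNS resolution logs means the queried domain name does not exist on the Azure DNS server, so no A record was found for the target domain.

  • In addition, when multiple DNS servers are configured and the first one does not respond, the DNS Client service sends the query to the other DNS servers and lengthens the wait in each round (1, 2, 2, 4, then 8 seconds, about 17 seconds in total). If none of them responds, or they all return errors, APIM logs "not resolved".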
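To check which RCODE a DNS server returns for the failing name, a query can be sent to it directly. The sketch below builds a minimal RFC 1035 A-record query with only the Python standard library and reads the RCODE (the low 4 bits of the header flags, where 3 is NXDOMAIN). It assumes it runs on a VM in the same virtual network as APIM, since 168.63.129.16 is only reachable from inside Azure; it bypasses the Windows retry logic described above, and the helper names `build_a_query`, `parse_rcode` and `query_rcode` are hypothetical.

```python
from __future__ import annotations

import secrets
import socket
import struct

# Subset of the RCODE values defined in RFC 1035, section 4.1.1.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_a_query(hostname: str, query_id: int) -> bytes:
    """Build a minimal DNS query packet asking for the A record of `hostname`."""
    # Header: ID, flags (only RD = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def parse_rcode(response: bytes) -> tuple[int, str]:
    """Return the RCODE (low 4 bits of the header flags) of a DNS response."""
    _query_id, flags = struct.unpack("!HH", response[:4])
    rcode = flags & 0x000F
    return rcode, RCODE_NAMES.get(rcode, "OTHER")


def query_rcode(hostname: str, server: str = "168.63.129.16", timeout: float = 2.0) -> tuple[int, str]:
    """Send one UDP query to `server` and return the RCODE of its reply.

    Raises socket.timeout if the server does not answer within `timeout` seconds.
    """
    query_id = secrets.randbits(16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_a_query(hostname, query_id), (server, 53))
        response, _addr = sock.recvfrom(512)
    if struct.unpack("!H", response[:2])[0] != query_id:
        raise ValueError("response ID does not match the query ID")
    return parse_rcode(response)
```

For example, `query_rcode("xxxx.xxx.xx")` returning `(3, 'NXDOMAIN')` would match the NXDOMAIN case above, while a `socket.timeout` would instead point to a DNS server that is not responding.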

References

DNS name caching for backend API services in APIM: https://www.cnblogs.com/lulight/p/13590755.html

DNS Processes and Interactions: https://learn.microsoft.com/zh-cn/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/dd197552(v=ws.10)?redirectedfrom=MSDN

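When facing a problem in a complex environment, the way to get to the bottom of it is: let the muddy water settle in stillness until it slowly clears, then stir the stillness into motion until it slowly comes to life. In the cloud, it is exactly so!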