A note on a production incident: the Spark driver failed to connect to the ResourceManager and kept retrying. The errors were as follows:
21/04/16 17:00:05 INFO RetryInvocationHandler: java.io.EOFException: End of File Exception between local host is: "prod-hadoop01/172.19.51.11"; destination host is: "prod-hadoop03":8032; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException, while invoking $Proxy16.getApplicationReport over Failover proxy for [rm1, rm2]. Trying to failover immediately.
21/04/16 17:00:05 INFO RequestHedgingRMFailoverProxyProvider: Connection lost with rm2, trying to fail over.
21/04/16 17:00:05 INFO RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
21/04/16 17:00:05 INFO RetryInvocationHandler: java.net.ConnectException: Call From prod-hadoop01/172.19.51.11 to prod-hadoop03:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over null. Retrying after sleeping for 15000ms.
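The RequestHedgingRMFailoverProxyProvider seen in the log is the client-side proxy that probes all configured ResourceManagers and routes calls to whichever reports itself active. In an HA setup it is typically selected in yarn-site.xml roughly like this (a sketch with illustrative values; rm1/rm2 are the HA ids from the log, the rest is not taken from the incident cluster):

```xml
<!-- yarn-site.xml: client-side RM failover settings (illustrative sketch) -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <!-- proxy provider that hedges requests across rm1 and rm2 -->
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider</value>
</property>
```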
Since this was production and the customer was actively using the feature, we had to triage quickly. The key clue was "prod-hadoop03":8032: the log showed calls from prod-hadoop01 to prod-hadoop03 on port 8032, the ResourceManager's client RPC port, so we quickly concluded the problem was most likely caused by a ResourceManager HA failover. We went into Ambari, stopped and restarted the ResourceManager so that the active ResourceManager switched over to prod-hadoop01, which temporarily resolved the production issue.
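Before restarting anything, the RM HA state can be confirmed from the command line. A minimal sketch using the standard `yarn rmadmin` CLI (rm1/rm2 are the HA ids from the log; run on a cluster node with the Hadoop client configured):

```shell
# Ask each ResourceManager for its HA state (expect one "active", one "standby")
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

# Verify the RM client RPC port is actually listening on the suspect host;
# "Connection refused" in the log suggests nothing was bound to 8032 there
nc -vz prod-hadoop03 8032
```

If both RMs report standby (or the active one's 8032 is not listening), that matches the hedging proxy's endless failover loop seen in the log, and restarting the RM process is a reasonable stopgap.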