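Current deployment architecture and release method: releases are done as rolling updates, and k8s already removes a stopping service instance from the registration list automatically.
k8s has a per-pod setting, terminationGracePeriodSeconds (default 30 seconds). During a RollingUpdate, once a new pod is up and has passed its health check, k8s is told to terminate an old pod. That pod is marked Terminating and removed from the k8s service/endpoint list, so it stops receiving new traffic. At about the same time, k8s sends SIGTERM (kill -15) to the containers in the pod to notify the application process; if the process is still running when the grace period ends, it is killed with SIGKILL.
This saves the application a lot of trouble. Otherwise it would have to register a shutdown hook (shutdownHook) to actively remove itself from the registration list when going offline, and it is not even that simple: when the container restarts the service, if the IP is unchanged before and after the release, you also have to expose an endpoint and configure a script that adds the service back to the registration list, so a whole scheme is needed to make it reliable.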
DiscoveryManager.getInstance().shutdownComponent();
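For comparison, the application-side approach described above could look roughly like the following. This is only a minimal sketch: InMemoryRegistry, DeregisterOnShutdown, and the instance id are hypothetical stand-ins, not part of Eureka or any real registry client. With Eureka, the hook body would be the DiscoveryManager call shown above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a service registry (not a real Eureka API).
class InMemoryRegistry {
    private final Set<String> instances = ConcurrentHashMap.newKeySet();

    void register(String instanceId) { instances.add(instanceId); }

    void deregister(String instanceId) { instances.remove(instanceId); }

    boolean contains(String instanceId) { return instances.contains(instanceId); }
}

class DeregisterOnShutdown {
    // Builds the hook thread; the caller decides when to hand it to the JVM.
    static Thread newHook(InMemoryRegistry registry, String instanceId) {
        return new Thread(() -> registry.deregister(instanceId), "deregister-hook");
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        String instanceId = "demo-service@10.0.0.1:8080"; // assumed id format
        registry.register(instanceId);
        // The JVM runs registered hooks on normal exit and on SIGTERM / SIGINT,
        // but not on SIGKILL, which is why the grace period matters.
        Runtime.getRuntime().addShutdownHook(newHook(registry, instanceId));
    }
}
```

The hook only covers the "remove on stop" half; re-registration after a restart with an unchanged IP still needs its own mechanism.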
Test code:
@GetMapping("/test")
public String test1(@RequestParam int s) throws InterruptedException {
System.out.println(Thread.currentThread().getName() + " start");
TimeUnit.SECONDS.sleep(s);
System.out.println(Thread.currentThread().getName() + " finished");
return "m1";
}
Jetty shutdown log: the server finishes processing the request, but the client never receives the response!
qtp1648582256-24 start
2022-04-18 17:21:54.040 INFO 28284 --- [ Thread-13] ConfigServletWebServerApplicationContext : Closing org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@647e447: startup date [Mon Apr 18 17:21:38 CST 2022]; root of context hierarchy
2022-04-18 17:21:54.041 INFO 28284 --- [ Thread-13] o.s.j.e.a.AnnotationMBeanExporter : Unregistering JMX-exposed beans on shutdown
2022-04-18 17:21:54.050 INFO 28284 --- [ Thread-13] o.e.jetty.server.AbstractConnector : Stopped ServerConnector@2caf6912{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
2022-04-18 17:21:54.050 INFO 28284 --- [ Thread-13] org.eclipse.jetty.server.session : Stopped scavenging
2022-04-18 17:21:54.053 INFO 28284 --- [ Thread-13] o.e.j.s.h.ContextHandler.application : Destroying Spring FrameworkServlet 'dispatcherServlet'
2022-04-18 17:21:54.054 INFO 28284 --- [ Thread-13] o.e.jetty.server.handler.ContextHandler : Stopped o.s.b.w.e.j.JettyEmbeddedWebAppContext@5910de75{/,[file:///private/var/folders/fd/ts66c75s6pl47q8cy2q5szy80000gn/T/jetty-docbase.7513543250728756019.8080/],UNAVAILABLE}
qtp1648582256-24 finished
Process finished with exit code 130 (interrupted by signal 2: SIGINT)
Jetty client response:
org.apache.http.NoHttpResponseException: localhost:8080 failed to respond
Tomcat shutdown log: the server receives the signal and the in-flight request is cut off.
http-nio-8080-exec-2 start
2022-04-18 17:37:19.863 INFO 28365 --- [ Thread-6] ConfigServletWebServerApplicationContext : Closing org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@6580cfdd: startup date [Mon Apr 18 17:36:33 CST 2022]; root of context hierarchy
2022-04-18 17:37:19.865 INFO 28365 --- [ Thread-6] o.s.j.e.a.AnnotationMBeanExporter : Unregistering JMX-exposed beans on shutdown
Process finished with exit code 130 (interrupted by signal 2: SIGINT)
Tomcat client response:
org.apache.http.NoHttpResponseException: localhost:8080 failed to respond
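The shutdown log shows that the class doing the closing is AnnotationConfigServletWebServerApplicationContext (named AnnotationConfigEmbeddedWebApplicationContext in Spring Boot 1.x).
Its parent class AbstractApplicationContext provides the extension points for graceful shutdown. If some requests are still unanswered when Spring Boot shuts down,
the outcome differs from container to container, because the actual shutdown handling is left to each container's own implementation.
The key source code: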
/**
* Register a shutdown hook with the JVM runtime, closing this context
* on JVM shutdown unless it has already been closed at that time.
* <p>Delegates to {@code doClose()} for the actual closing procedure.
* @see Runtime#addShutdownHook
* @see #close()
* @see #doClose()
*/
@Override
public void registerShutdownHook() {
if (this.shutdownHook == null) {
// No shutdown hook registered yet.
this.shutdownHook = new Thread() {
@Override
public void run() {
synchronized (startupShutdownMonitor) {
doClose();
}
}
};
Runtime.getRuntime().addShutdownHook(this.shutdownHook);
}
}
/**
* Actually performs context closing: publishes a ContextClosedEvent and
* destroys the singletons in the bean factory of this application context.
* <p>Called by both {@code close()} and a JVM shutdown hook, if any.
* @see org.springframework.context.event.ContextClosedEvent
* @see #destroyBeans()
* @see #close()
* @see #registerShutdownHook()
*/
protected void doClose() {
if (this.active.get() && this.closed.compareAndSet(false, true)) {
if (logger.isInfoEnabled()) {
logger.info("Closing " + this);
}
LiveBeansView.unregisterApplicationContext(this);
try {
// Publish shutdown event.
publishEvent(new ContextClosedEvent(this));
}
catch (Throwable ex) {
logger.warn("Exception thrown from ApplicationListener handling ContextClosedEvent", ex);
}
// Stop all Lifecycle beans, to avoid delays during individual destruction.
try {
getLifecycleProcessor().onClose();
}
catch (Throwable ex) {
logger.warn("Exception thrown from LifecycleProcessor on context close", ex);
}
// Destroy all cached singletons in the context's BeanFactory.
destroyBeans();
// Close the state of this context itself.
closeBeanFactory();
// Let subclasses do some final clean-up if they wish...
onClose();
this.active.set(false);
}
}
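Since doClose() can be reached both from close() and from the JVM shutdown hook, the compareAndSet guard is what keeps the cleanup from running twice. The following is a minimal plain-Java sketch of that guard only; CloseOnceContext and cleanupRuns are hypothetical names for illustration, not Spring classes.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical miniature of the "close at most once" guard used by doClose().
class CloseOnceContext {
    private final AtomicBoolean active = new AtomicBoolean(true);
    private final AtomicBoolean closed = new AtomicBoolean(false);
    // Counts how often the cleanup body actually ran (for demonstration only).
    final AtomicInteger cleanupRuns = new AtomicInteger();

    void close() {
        // Only the first caller flips closed from false to true and enters the body.
        if (active.get() && closed.compareAndSet(false, true)) {
            cleanupRuns.incrementAndGet(); // stands in for publishEvent, destroyBeans, ...
            active.set(false);
        }
    }

    boolean isActive() {
        return active.get();
    }
}
```

Calling close() from application code and then again from a shutdown hook therefore runs the cleanup body once.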
## Enable graceful shutdown; if unset, the default is IMMEDIATE (stop at once)
server.shutdown=graceful
## Grace period for graceful shutdown
spring.lifecycle.timeout-per-shutdown-phase=30s
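On lower versions, the fix amounts to implementing the higher-version feature by hand. To polish it, make both the on/off switch and the shutdown timeout configurable.
(1) Jetty implementation:
First, look at two key methods of Jetty's core class org.eclipse.jetty.server.Server: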
/**
* Set a graceful stop time.
* The {@link StatisticsHandler} must be configured so that open connections can
* be tracked for a graceful shutdown.
* @see org.eclipse.jetty.util.component.ContainerLifeCycle#setStopTimeout(long)
*/
@Override
public void setStopTimeout(long stopTimeout)
{
super.setStopTimeout(stopTimeout);
}
/** Set stop server at shutdown behaviour.
* @param stop If true, this server instance will be explicitly stopped when the
* JVM is shutdown. Otherwise the JVM is stopped with the server running.
* @see Runtime#addShutdownHook(Thread)
* @see ShutdownThread
*/
public void setStopAtShutdown(boolean stop)
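Solution: Jetty uses StatisticsHandler to track the number of in-flight requests. When the application shuts down, waiting for that count to drop from N to 0 is, in effect, the graceful shutdown.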
@Bean
public EmbeddedServletContainerFactory jettyEmbeddedServletContainerFactory() {
JettyEmbeddedServletContainerFactory factory = new JettyEmbeddedServletContainerFactory();
factory.addServerCustomizers(server -> {
server.setStopAtShutdown(false);
StatisticsHandler statisticsHandler = new StatisticsHandler();
statisticsHandler.setHandler(server.getHandler());
server.setHandler(statisticsHandler);
server.setStopTimeout(30000); // hard-coded to 30s for simplicity
});
return factory;
}
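The "N down to 0" idea can be sketched in plain Java without Jetty. This is an illustration of the principle only, not how StatisticsHandler is implemented; InFlightTracker and its method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical request counter: refuse new work after stop, wait for the rest to drain.
class InFlightTracker {
    private int active;               // guarded by this
    private boolean accepting = true; // guarded by this

    // Call at the start of a request; returns false once shutdown has begun.
    synchronized boolean tryBegin() {
        if (!accepting) {
            return false;
        }
        active++;
        return true;
    }

    // Call when a request has finished (normally in a finally block).
    synchronized void end() {
        active--;
        if (active == 0) {
            notifyAll();
        }
    }

    // Stops accepting requests, then waits until the count reaches 0 or the timeout expires.
    synchronized boolean stopAndAwaitIdle(long timeoutMillis) throws InterruptedException {
        accepting = false;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (active > 0) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // timed out with requests still in flight
            }
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
        }
        return true;
    }
}
```

A return value of false corresponds to the stop timeout expiring, at which point the server is stopped regardless of what is still running.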
(2) Tomcat implementation:
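// GracefulShutdown must itself be a bean: an instance created with "new" inside the
// customizer is not known to Spring and would never receive ContextClosedEvent.
@Bean
public GracefulShutdown gracefulShutdown() {
    return new GracefulShutdown();
}
@Bean
public EmbeddedServletContainerCustomizer tomcatCustomizer(GracefulShutdown gracefulShutdown) {
    return container -> {
        if (container instanceof TomcatEmbeddedServletContainerFactory) {
            ((TomcatEmbeddedServletContainerFactory) container).addConnectorCustomizers(gracefulShutdown);
        }
    };
}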
private static class GracefulShutdown implements TomcatConnectorCustomizer,
ApplicationListener<ContextClosedEvent> {
private static final Logger log = LoggerFactory.getLogger(GracefulShutdown.class);
private volatile Connector connector;
@Override
public void customize(Connector connector) {
this.connector = connector;
}
@Override
public void onApplicationEvent(ContextClosedEvent event) {
this.connector.pause();
Executor executor = this.connector.getProtocolHandler().getExecutor();
if (executor instanceof ThreadPoolExecutor) {
try {
ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executor;
threadPoolExecutor.shutdown();
if (!threadPoolExecutor.awaitTermination(30, TimeUnit.SECONDS)) {
log.warn("Force shutdown ...");
}
} catch (InterruptedException ex) {
Thread.currentThread().interrupt();
}
}
}
}
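Estimate, for each service, how long it needs to finish in-flight requests when stopping.
Ensure that requests received within the configured shutdown window are processed and answered to the caller.
Gracefully shut down custom thread resources by hand.
Example of releasing business thread-pool resources: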
@Bean
public ExecutorService bizExecutorService() {
ExecutorService executorService = Executors.newFixedThreadPool(10);
// for shutdownAndAwaitTermination, see Guava's graceful executor shutdown (MoreExecutors.shutdownAndAwaitTermination)
Runtime.getRuntime().addShutdownHook(new Thread(() -> shutdownAndAwaitTermination(executorService, 10L, TimeUnit.SECONDS)));
return executorService;
}
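The shutdownAndAwaitTermination helper is not defined in the snippet above. If Guava is not on the classpath, a hand-written version could look like the sketch below, modeled on the two-phase pattern from the ExecutorService Javadoc. The class name ExecutorShutdown and the choice to split the timeout in half between the two phases are assumptions of this sketch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

class ExecutorShutdown {
    /**
     * Two-phase shutdown: first let running and queued tasks finish,
     * then interrupt whatever is still running.
     * Returns true if the pool has terminated when this method returns.
     */
    static boolean shutdownAndAwaitTermination(ExecutorService pool, long timeout, TimeUnit unit) {
        long halfNanos = unit.toNanos(timeout) / 2;
        pool.shutdown(); // no new tasks are accepted; existing tasks keep running
        try {
            if (!pool.awaitTermination(halfNanos, TimeUnit.NANOSECONDS)) {
                pool.shutdownNow(); // interrupt tasks that did not finish in time
                pool.awaitTermination(halfNanos, TimeUnit.NANOSECONDS);
            }
        } catch (InterruptedException ie) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return pool.isTerminated();
    }
}
```

Tasks only stop early in the second phase if they respond to interruption, so long-running business code should check the interrupt flag or use interruptible blocking calls.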
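Monitor business requests that fail to complete within the configured window during production releases, and optimize their performance.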
Graceful shutdown:
Graceful shutdown is supported with all four embedded web servers (Jetty, Reactor Netty, Tomcat, and Undertow) and with both reactive and Servlet-based web applications.
When enabled using server.shutdown=graceful, upon shutdown, the web server will no longer permit new requests and will wait for a grace period for active requests to complete.
The grace period can be configured using spring.lifecycle.timeout-per-shutdown-phase. Please see the reference documentation for further details.
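Spring Boot 2.3.0.RELEASE introduced Graceful Shutdown. How the application treats new requests while it is waiting to go offline depends on the server type in use. According to the official documentation, Tomcat, Jetty, and Reactor Netty stop accepting new requests at the network layer, whereas Undertow keeps accepting new requests but immediately answers them with HTTP 503 (Service Unavailable).
Taking Tomcat as an example, let's look at the doShutdown logic to see how Spring Boot 2.3 implements graceful shutdown: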
org.springframework.boot.web.embedded.tomcat.GracefulShutdown#doShutdown
private void doShutdown(GracefulShutdownCallback callback) {
List<Connector> connectors = getConnectors();
connectors.forEach(this::close);
try {
for (Container host : this.tomcat.getEngine().findChildren()) {
for (Container context : host.findChildren()) {
while (isActive(context)) {
if (this.aborted) {
logger.info("Graceful shutdown aborted with one or more requests still active");
callback.shutdownComplete(GracefulShutdownResult.REQUESTS_ACTIVE);
return;
}
Thread.sleep(50);
}
}
}
}
catch (InterruptedException ex) {
Thread.currentThread().interrupt();
}
logger.info("Graceful shutdown complete");
callback.shutdownComplete(GracefulShutdownResult.IDLE);
}
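It first closes all connectors so that no new requests are accepted at the network layer, then waits for all in-flight requests to finish.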
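The polling loop and its abort flag can be reproduced in isolation. The sketch below is a simplified imitation for illustration, not Spring Boot code: PollingGracefulWait is a hypothetical class, the AtomicInteger stands in for Tomcat's per-context active-request check, and it assumes that something else (presumably the lifecycle timeout) calls abort() when the grace period runs out.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical imitation of the doShutdown wait loop.
class PollingGracefulWait {
    enum Result { IDLE, REQUESTS_ACTIVE }

    private final AtomicInteger activeRequests; // stand-in for isActive(context)
    private volatile boolean aborted;

    PollingGracefulWait(AtomicInteger activeRequests) {
        this.activeRequests = activeRequests;
    }

    // Assumed to be called from another thread once the grace period has expired.
    void abort() {
        this.aborted = true;
    }

    // Polls every 50 ms until no request is active or the wait is aborted.
    Result awaitIdle() throws InterruptedException {
        while (activeRequests.get() > 0) {
            if (aborted) {
                return Result.REQUESTS_ACTIVE;
            }
            Thread.sleep(50);
        }
        return Result.IDLE;
    }
}
```

Because the loop itself has no deadline, the abort call from outside is the only thing that bounds the wait.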
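In a microservice setting, the problem still seems to exist…
To summarize the normal batched release flow of a Spring Cloud application:
As we have seen, when a Spring Cloud server goes offline through Spring Boot's graceful shutdown, it rejects new client requests and goes offline either after the requests already being processed have finished or after the configured maximum wait has elapsed.
However, from the moment the restarting server starts rejecting new requests (that is, once its connectors have been closed) until the client learns that this instance is offline, every request the client sends to that instance is rejected, which causes service call failures.
Adding client-side retries can ease call errors during a release to some extent, but it cannot fundamentally make going offline lossless. If the error window is too long, or too many nodes in the same batch go offline at once, retries alone cannot guarantee that a call reaches a node that is still online. This does not solve the root problem!
Retries also carry a business risk when the operations being retried are not idempotent.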
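To make the retry trade-off concrete, here is a minimal client-side sketch that retries on a different instance only when the caller declares the operation idempotent. FailoverCaller, its parameters, and the idea of passing the instance list in directly are assumptions for illustration; a real Spring Cloud client would get the list from its load balancer.

```java
import java.util.List;
import java.util.function.Function;

class FailoverCaller {
    /**
     * Tries instances in order. Non-idempotent calls get exactly one attempt;
     * idempotent calls move on to the next instance after a failure.
     */
    static <T> T call(List<String> instances, boolean idempotent, int maxAttempts,
                      Function<String, T> request) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("no instances available");
        }
        int attempts = idempotent ? Math.max(1, Math.min(maxAttempts, instances.size())) : 1;
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return request.apply(instances.get(i));
            } catch (RuntimeException ex) {
                last = ex; // remember the failure and try the next instance, if any
            }
        }
        throw last; // every attempt failed
    }
}
```

If every instance in the list belongs to the batch being taken offline, all attempts fail, which is the limitation described above.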