事故描述,运行一段时间后,系统运行一段时间后变慢,gc和cpu都正常。直接贴样例代码
public static void main(String[] args) throws IOException { final ExecutorService executorService = Executors.newFixedThreadPool(100); for (int i = 0; i < 1000000; i++) { executorService.submit(()->{ final UUID uuid = UUID.randomUUID(); final Mono<Integer> log = Mono.just(1).log(uuid.toString()); log.subscribe(new Consumer<Integer>() { @SneakyThrows @Override public void accept(Integer integer) { System.out.println(integer); } }); }); } System.in.read(); }
我们采用log 和uuid 的方式实现在reactor框架下对请求的日志跟踪。也正是这个功能引起该次事故。
上述代码运行后,可以看到我们的线程state 几乎都是BLOCKED ,但并没有发现死锁
代码分析
public final Mono<T> log(@Nullable String category) { return log(category, Level.INFO); } public final Mono<T> log(@Nullable String category, Level level, SignalType... options) { return log(category, level, false, options); }
public final Mono<T> log(@Nullable String category, Level level, boolean showOperatorLine, SignalType... options) { SignalLogger<T> log = new SignalLogger<>(this, category, level, showOperatorLine, options); if (this instanceof Fuseable) { return onAssembly(new MonoLogFuseable<>(this, log)); } return onAssembly(new MonoLog<>(this, log)); }
SignalLogger(CorePublisher<IN> source, @Nullable String category, Level level, boolean correlateStack, SignalType... options) { this(source, category, level, correlateStack, Loggers::getLogger, options); }
SignalLogger(CorePublisher<IN> source, @Nullable String category, Level level, boolean correlateStack, Function<String, Logger> loggerSupplier, @Nullable SignalType... options) { this.source = Objects.requireNonNull(source, "source"); this.id = IDS.getAndIncrement(); this.fuseable = source instanceof Fuseable; if (correlateStack) { operatorLine = Traces.extractOperatorAssemblyInformation(Traces.callSiteSupplierFactory.get().get()); } else { operatorLine = null; } boolean generated = category == null || category.isEmpty() || category.endsWith("."); category = generated && category == null ? "reactor." : category; if (generated) { if (source instanceof Mono) { category += "Mono." + source.getClass() .getSimpleName() .replace("Mono", ""); } else if (source instanceof ParallelFlux) { category += "Parallel." + source.getClass() .getSimpleName() .replace("Parallel", ""); } else { category += "Flux." + source.getClass() .getSimpleName() .replace("Flux", ""); } category += "." + id; } 此处获取log this.log = loggerSupplier.apply(category); this.level = level; if (options == null || options.length == 0) { this.options = ALL; } else { int opts = 0; for (SignalType option : options) { if (option == SignalType.CANCEL) { opts |= CANCEL; } else if (option == SignalType.CURRENT_CONTEXT) { opts |= CONTEXT_PARENT; } else if (option == SignalType.ON_SUBSCRIBE) { opts |= ON_SUBSCRIBE; } else if (option == SignalType.REQUEST) { opts |= REQUEST; } else if (option == SignalType.ON_NEXT) { opts |= ON_NEXT; } else if (option == SignalType.ON_ERROR) { opts |= ON_ERROR; } else if (option == SignalType.ON_COMPLETE) { opts |= ON_COMPLETE; } else if (option == SignalType.SUBSCRIBE) { opts |= SUBSCRIBE; } else if (option == SignalType.AFTER_TERMINATE) { opts |= AFTER_TERMINATE; } } this.options = opts; } }
获取logger
总结: 由于我们的traceId 采用uuid 生成,每一次都不一样。每次调用log方法时,LoggerContext都先从Map<String, Logger> loggerCache中查询,没有的话进入rootLogger的同步代码块,在同步代码块中先对rootlog 的childrenList遍历寻找name相同的childLogger .? 由于调用的累计?childrenList越来越大,每次遍历时长越来越长,rootlog的锁持有时间越来越长,最后导致线程Blocked。并且并没有发现对childrenList的回收操作。导致内存越来越大。