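Because of a business requirement, and because the solutions I found online were incomplete, I spent two days working out a single line of code. So I'm writing this post as a bridge between 2023 and 2024. (The code was typed by hand; if there are mistakes, fix them yourself. Just kidding, mostly.)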
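Search online for how to merge documents and write out the result in Java Spring, and the solutions basically come down to either this: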
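import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Iterator;
import java.util.List;

import org.apache.commons.io.IOUtils;
import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.contenttype.ContentType;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.PartName;
import org.docx4j.openpackaging.parts.WordprocessingML.AlternativeFormatInputPart;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.relationships.Relationship;
import org.docx4j.wml.CTAltChunk;

// Merges .docx streams: the first one is the master, every later one is embedded as a w:altChunk
public class DocxService {
    private static final String CONTENT_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    public InputStream mergeDocx(final List<InputStream> streams) throws Docx4JException, IOException {
        WordprocessingMLPackage target = null;
        final File generated = File.createTempFile("generated", ".docx");
        int chunkId = 0;
        Iterator<InputStream> it = streams.iterator();
        while (it.hasNext()) {
            InputStream is = it.next();
            if (is != null) {
                if (target == null) {
                    // Copy first (master) document
                    try (OutputStream os = new FileOutputStream(generated)) {
                        os.write(IOUtils.toByteArray(is));
                    }
                    target = WordprocessingMLPackage.load(generated);
                } else {
                    // Attach the others (Alternative input parts)
                    insertDocx(target.getMainDocumentPart(), IOUtils.toByteArray(is), chunkId++);
                }
            }
        }
        if (target != null) {
            target.save(generated);
            return new FileInputStream(generated);
        } else {
            return null;
        }
    }

    // Stores the whole .docx as a separate part and references it from a w:altChunk element;
    // Word merges the chunk's content in when it opens the file
    private static void insertDocx(MainDocumentPart main, byte[] bytes, int chunkId) {
        try {
            AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/part" + chunkId + ".docx"));
            afiPart.setContentType(new ContentType(CONTENT_TYPE));
            afiPart.setBinaryData(bytes);
            Relationship altChunkRel = main.addTargetPart(afiPart);
            CTAltChunk chunk = Context.getWmlObjectFactory().createCTAltChunk();
            chunk.setId(altChunkRel.getId());
            main.addObject(chunk);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}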
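or writing the corresponding code yourself. That second route requires knowing how to use docx4j and how the XML files inside an unzipped .docx reference each other, which I won't go into here.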
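For orientation, the references mentioned above live in relationship parts. In the usual layout, `word/_rels/document.xml.rels` maps each `Id` (`rId1`, `rId5`, ...) used inside `word/document.xml` to a `Target` part such as `media/image1.png`. The following is a minimal JDK-only sketch that lists those mappings from a .docx. It assumes that standard part layout. The class and method names are hypothetical and are not docx4j API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not docx4j): lists the relationships of the main document part
final class RelsInspector {
    // Namespace of *.rels parts in the Open Packaging Conventions
    static final String PKG_RELS_NS =
            "http://schemas.openxmlformats.org/package/2006/relationships";
    // Assumption: the main part is word/document.xml, as in files saved by Word
    static final String DOC_RELS = "word/_rels/document.xml.rels";

    // Returns Id -> Target (e.g. "rId5" -> "media/image1.png"); empty if the entry is absent
    static Map<String, String> documentRelationships(byte[] docx) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (DOC_RELS.equals(e.getName())) {
                    return parseRels(zip.readAllBytes()); // reads only the current entry
                }
            }
            return new LinkedHashMap<>();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Parses the XML of one .rels part into an ordered Id -> Target map
    static Map<String, String> parseRels(byte[] xml) {
        Map<String, String> idToTarget = new LinkedHashMap<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            NodeList rels = doc.getElementsByTagNameNS(PKG_RELS_NS, "Relationship");
            for (int i = 0; i < rels.getLength(); i++) {
                Element rel = (Element) rels.item(i);
                idToTarget.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
            }
        } catch (Exception ex) {
            throw new IllegalArgumentException("cannot parse relationships XML", ex);
        }
        return idToTarget;
    }
}
```

Run it on two different .docx files and both maps will typically start at `rId1` while pointing at different targets. That overlap is why a merge cannot simply copy ids from one document into another.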
Because of my specific requirements, I'll skip the basics and go straight to the merging part:
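import java.util.List;

import org.docx4j.XmlUtils;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.Body;

// wordMLP: the target (master) package; wordMLToP: the package whose body is appended to it
public void mergeFile(WordprocessingMLPackage wordMLP, WordprocessingMLPackage wordMLToP) {
    try {
        // Get the w:body node(s) of the source document via XPath
        List<Object> bodies = wordMLToP.getMainDocumentPart().getJAXBNodesViaXPath("//w:body", false);
        // Append the content of each body in turn; styles here resolve against the master document
        for (Object bodyObject : bodies) {
            Body body = (Body) XmlUtils.unwrap(bodyObject); // unwrap in case a JAXBElement is returned
            for (Object content : body.getContent()) {
                wordMLP.getMainDocumentPart().addObject(content);
            }
        }
    } catch (Exception e) {
        // TechnicalException: a project-level exception, not part of docx4j
        throw new TechnicalException(e.getMessage());
    }
}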
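At the XML level, what mergeFile does can be pictured like this. The block-level children of the source `w:body` are copied into the target `w:body` with all their attributes. Any `r:embed` or `r:id` values therefore arrive unchanged and still carry the source document's relationship ids. The following is a JDK-only DOM sketch using `org.w3c.dom` rather than docx4j, with hypothetical class and method names. It also keeps the target's final `w:sectPr` (the page setup of the last section) at the end and skips the source's own `w:sectPr`. That choice is mine, not part of the code above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical XML-level analogue of mergeFile (plain DOM, not docx4j)
final class BodyMerge {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }

    static Element body(Document doc) {
        return (Element) doc.getElementsByTagNameNS(W_NS, "body").item(0);
    }

    // True if n is the WordprocessingML element w:<localName>
    static boolean isW(Node n, String localName) {
        return n.getNodeType() == Node.ELEMENT_NODE
                && W_NS.equals(n.getNamespaceURI()) && localName.equals(n.getLocalName());
    }

    // Copies the source body's block-level children into the target body, before its trailing w:sectPr
    static void appendBody(Document target, Document source) {
        Element targetBody = body(target);
        Node last = targetBody.getLastChild();
        while (last != null && last.getNodeType() != Node.ELEMENT_NODE) {
            last = last.getPreviousSibling(); // skip trailing whitespace text
        }
        Node anchor = (last != null && isW(last, "sectPr")) ? last : null; // null means append at end
        for (Node n = body(source).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE || isW(n, "sectPr")) {
                continue;
            }
            // importNode deep-copies the subtree; r:embed / r:id values are copied verbatim
            targetBody.insertBefore(target.importNode(n, true), anchor);
        }
    }
}
```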
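But mergeFile only deals with the body. It ignores the duplicate rIds in the relationships and the fact that referenced resources (images, for example) are not brought over. The appended document also does not start on a new page. A page break can be added like this: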
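import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.Br;
import org.docx4j.wml.ObjectFactory;
import org.docx4j.wml.P;
import org.docx4j.wml.R;
import org.docx4j.wml.STBrType;

private static final ObjectFactory objectFactory = new ObjectFactory();

// Appends a paragraph whose only run holds a page break (<w:br w:type="page"/>)
void addPageBreak(MainDocumentPart dp) {
    P paragraph = objectFactory.createP();
    R run = objectFactory.createR();
    paragraph.getContent().add(run);
    Br br = objectFactory.createBr();
    br.setType(STBrType.PAGE);
    run.getContent().add(br);
    dp.addObject(paragraph); // was documentPart.setObject(...): wrong variable and method
}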
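The markup this method produces is a paragraph containing a single page-type break. The following JDK-only DOM sketch builds the same structure, with hypothetical names rather than docx4j API. Placing this paragraph in the target body before the next document's content makes that document start on a new page. In the docx4j version, that presumably means calling `addPageBreak` on the target's main part before each `mergeFile` call.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical DOM helper mirroring what addPageBreak builds with docx4j's ObjectFactory
final class PageBreakXml {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document newDocument() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Builds <w:p><w:r><w:br w:type="page"/></w:r></w:p>; the caller inserts it into w:body
    static Element pageBreakParagraph(Document doc) {
        Element p = doc.createElementNS(W_NS, "w:p");
        Element r = doc.createElementNS(W_NS, "w:r");
        Element br = doc.createElementNS(W_NS, "w:br");
        br.setAttributeNS(W_NS, "w:type", "page"); // page break instead of the default line break
        r.appendChild(br);
        p.appendChild(r);
        return p;
    }
}
```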
The document also needs SVG decorations in front of some headings. The solution currently available online is the following:
See the Stack Overflow question "java - Merge word(docx) documents with DOCX4J: how to copy images?"
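import java.util.List;

import org.docx4j.dml.CTBlip;
import org.docx4j.openpackaging.parts.Part;
import org.docx4j.openpackaging.parts.PartName;
import org.docx4j.openpackaging.parts.WordprocessingML.*; // ImagePngPart, ImageJpegPart, ...
import org.docx4j.openpackaging.parts.relationships.RelationshipsPart;
import org.docx4j.openpackaging.parts.relationships.RelationshipsPart.AddPartBehaviour;
import org.docx4j.relationships.Relationship;

// s: the source WordprocessingMLPackage being merged in; f: the target package
List<Object> blips = s.getMainDocumentPart().getJAXBNodesViaXPath("//a:blip", false);
for (Object el : blips) {
    try {
        CTBlip blip = (CTBlip) el;
        RelationshipsPart parts = s.getMainDocumentPart().getRelationshipsPart();
        Relationship rel = parts.getRelationshipByID(blip.getEmbed());
        Part part = parts.getPart(rel);
        // Debug output only: prints the byte[] reference, not the image data
        if (part instanceof ImagePngPart)
            System.out.println(((ImagePngPart) part).getBytes());
        if (part instanceof ImageJpegPart)
            System.out.println(((ImageJpegPart) part).getBytes());
        if (part instanceof ImageBmpPart)
            System.out.println(((ImageBmpPart) part).getBytes());
        if (part instanceof ImageGifPart)
            System.out.println(((ImageGifPart) part).getBytes());
        if (part instanceof ImageEpsPart)
            System.out.println(((ImageEpsPart) part).getBytes());
        if (part instanceof ImageTiffPart)
            System.out.println(((ImageTiffPart) part).getBytes());
        // Add the image part to the target (renamed if the name is taken) and repoint a:blip at the new rId
        Relationship newrel = f.getMainDocumentPart().addTargetPart(part, AddPartBehaviour.RENAME_IF_NAME_EXISTS);
        blip.setEmbed(newrel.getId());
        f.getMainDocumentPart().addTargetPart(s.getParts().getParts().get(new PartName("/word/" + rel.getTarget())));
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}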
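The limitation of the `//a:blip` search above shows up on a single picture element. As far as I know, the markup Office writes for an inserted SVG keeps `r:embed` on `a:blip` for a raster fallback. A separate `asvg:svgBlip`, nested in `a:blip/a:extLst/a:ext`, carries its own `r:embed`. The following JDK-only sketch compares an `a:blip`-only scan with a scan of every `r:embed` attribute. The class and method names are hypothetical, and it is a diagnostic, not docx4j code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical diagnostic: which r:embed references does each kind of search see?
final class EmbedScan {
    static final String R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static final String A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }

    // What a //a:blip search covers: only the r:embed on a:blip itself
    static List<String> blipEmbeds(Document doc) {
        List<String> ids = new ArrayList<>();
        NodeList blips = doc.getElementsByTagNameNS(A_NS, "blip");
        for (int i = 0; i < blips.getLength(); i++) {
            ids.add(((Element) blips.item(i)).getAttributeNS(R_NS, "embed"));
        }
        return ids;
    }

    // Every element carrying r:embed, as "qualifiedName=rId", in document order
    static List<String> allEmbeds(Document doc) {
        List<String> found = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttributeNS(R_NS, "embed")) {
                found.add(e.getTagName() + "=" + e.getAttributeNS(R_NS, "embed"));
            }
        }
        return found;
    }
}
```

On such a picture, `blipEmbeds` returns only the `a:blip` id, while `allEmbeds` also reports the `asvg:svgBlip` id that the answer's loop never rewrites.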
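The if statements in the middle of this code can be deleted. The real problem is that the code only searches the document for a:blip, so it does nothing at all for SVG references. After running it, all media resources are indeed copied and the a:blip references are rewritten. However, each asvg:svgBlip in document.xml (it sits inside a:blip/a:extLst) still carries its original r:embed value. After the merge, those ids are resolved against the relationships of the document being merged into, so they point at that document's media instead. In document.xml.rels under the _rels directory, the same Id ends up being pointed to twice. To fix this, the a:extLst child of the a:blip node has to be reset, which means adding one more line to the original answer's code: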
...
blip.setEmbed(newrel.getId());
blip.setExtLst(null); // drop a:extLst, which holds asvg:svgBlip and its stale r:embed
...
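At the XML level, the extra line above amounts to removing the `a:extLst` child of each `a:blip`. The `asvg:svgBlip` and its stale `r:embed` go with it, while the freshly rewritten `r:embed` on `a:blip` stays. The following JDK-only sketch uses hypothetical names and is not docx4j code. One consequence, assuming the usual Office markup for SVG pictures, is that the picture then renders from the raster fallback referenced by `a:blip`, not from the vector SVG.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical XML-level counterpart of clearing a:blip's extension list
final class StripBlipExt {
    static final String A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }

    // Removes every a:extLst directly under an a:blip; returns how many were removed
    static int stripBlipExtensions(Document doc) {
        int removed = 0;
        NodeList blips = doc.getElementsByTagNameNS(A_NS, "blip");
        for (int i = 0; i < blips.getLength(); i++) {
            Element blip = (Element) blips.item(i);
            List<Node> extLists = new ArrayList<>();
            for (Node c = blip.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (A_NS.equals(c.getNamespaceURI()) && "extLst".equals(c.getLocalName())) {
                    extLists.add(c); // collect first so removal does not break the sibling walk
                }
            }
            for (Node c : extLists) {
                blip.removeChild(c);
                removed++;
            }
        }
        return removed;
    }
}
```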
With that, you get a properly merged WordprocessingMLPackage.