This task converts a Microsoft Word document from the docx format into the PSML format according to rules expressed in an XML file called word-import-config.xml.
A list of the latest changes is available on GitHub.
To use this Ant extension standalone, outside PageSeeder, download the pso-docx-core-x.jar files from Bintray.
The following two examples show task definitions: the first sets only the required attribute, the second sets every attribute.
<import-docx src="[source]" />
<import-docx src="[source]" dest="[destination]" working="[working-directory]" mediafolder="[media folder name]" componentfolder="[component folder name]" config="[config.xml]" />
|Attribute||Description||Required|
|src||Path of the source file to process. It must point to a docx file.||Yes|
|dest||Path of the destination folder or file. If no value is specified, this defaults to the folder of the source file. If it doesn’t end with ‘.psml’, it is treated as a folder and the destination filename is the source filename in lowercase with spaces changed to underscores.||No|
|working||The folder holding temporary files. Defaults to ||No|
|componentfolder||The name of the subfolder where documents referenced by the main document are placed (||No|
|mediafolder||The name of the subfolder where images are placed. Defaults to ||No|
|config||Path to ||No|
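As an illustration of the dest rules in the table above, here is a minimal sketch of a build file that calls the task twice. The project name, target name, source filename and output paths are hypothetical; the taskdef is copied from the standalone example under "Invoking using Ant" and is only needed outside PageSeeder.

```xml
<project name="docx-import-example" default="import"
         xmlns:psd="antlib:org.pageseeder.docx.ant">

  <!-- Only required when running standalone (jar name taken from this page) -->
  <taskdef uri="antlib:org.pageseeder.docx.ant"
           resource="org/pageseeder/docx/ant/antlib.xml"
           classpath="pso-docx-ant-0.5.9.jar"/>

  <target name="import">
    <!-- dest does not end with '.psml', so it is treated as a folder;
         the output filename is derived from the source filename
         (lowercase, spaces changed to underscores) -->
    <psd:import-docx src="My Report.docx" dest="output"/>

    <!-- dest ends with '.psml', so it is treated as the destination file -->
    <psd:import-docx src="My Report.docx" dest="output/report.psml"/>
  </target>
</project>
```

Under the rule described for dest, the first call should produce a file in the output folder named along the lines of my_report.psml (the .psml extension on the derived name is an assumption), while the second writes to the path given.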
Invoking through the user interface
As part of the PageSeeder Upload interface, the import-docx task is associated with files that have an extension of “.docx”. After being uploaded into the PageSeeder loading zone, the following options become available:
- No processing – uploads the file as a docx document.
- Import document as PageSeeder PSML document – runs the import-docx Ant task with the default configuration.
- Validate DOCX document – runs a Schematron validation on the docx file to ensure the contents can be converted.
In the Developer perspective, the following additional options are available:
- Remove original document – deletes the docx file after the transformation has run.
- Create subfolder for document – because a large docx file can translate into hundreds of PSML files, this option creates a folder to store the documents.
- Use external configuration – to override the default import settings, create an XML file according to Import Microsoft Word DOCX config usage and name it as follows:
Invoking using Ant
Typically, this task is run through PageSeeder using the Task ps-upload-get (deprecated) to download the docx and the Task ps-upload-put (deprecated) to upload the generated PSML and image files. Running this task without being connected to PageSeeder requires a <taskdef/> element.
<project ... xmlns:psd="antlib:org.pageseeder.docx.ant">
  <!-- only required for standalone -->
  <taskdef uri="antlib:org.pageseeder.docx.ant"
           resource="org/pageseeder/docx/ant/antlib.xml"
           classpath="pso-docx-ant-0.5.9.jar"/>
  <target ... >
    <!-- Invoke Task -->
    <psd:import-docx src="test.docx" dest="result"/>
  </target>
</project>
Using a namespace is not required, but it is good practice because it documents where the task comes from. The recommended namespace prefix is ‘psd’.
How does it work?
The Ant task does the following:
- Loads the config file.
- Unzips the docx file.
- Transforms the docx XML files into PSML text and organizes the image references.
- Uploads the content back to a PageSeeder group.
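The unzip step in the list above is possible because a docx file is a ZIP package. As a rough way to look at the XML files and images that the transformation works from, Ant's core unzip task can expand a docx by hand. This sketch is independent of import-docx and does not show how the task works internally; the project name, target name and the "unpacked" folder are hypothetical.

```xml
<project name="inspect-docx" default="unpack">
  <target name="unpack">
    <!-- A docx file is a ZIP package, so Ant's core unzip task can expand it -->
    <unzip src="test.docx" dest="unpacked"/>
    <!-- The main document body is typically at unpacked/word/document.xml,
         and embedded images are typically under unpacked/word/media/ -->
  </target>
</project>
```

Inspecting the expanded files can help when deciding which Word styles and images a configuration file needs to account for.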