This task will convert a Microsoft Word document from the docx format into the PSML format according to rules expressed in an XML file called Microsoft Word docx Import Config.
A list of the latest changes is available on GitHub.
To use this Ant extension standalone, outside PageSeeder, download the pso-docx-core-x.jar files from Bintray.
The following are two examples of task definitions.
<import-docx src="[source]" />
<import-docx src="[source]" dest="[destination]" working="[working-directory]" mediafolder="[media folder name]" config="[config.xml]" />
|Attribute|Description|Required|
|src|Path of the source file to process. It should point to a docx file.|Yes|
|dest|Path of the destination folder. If no value is specified, this defaults to the location of the source file.|No|
|working|The directory holding temporary files. A default location is used if no value is specified.|No|
|mediafolder|The name of the subfolder where images will be placed. A default name is used if no value is specified.|No|
|config|Path to the configuration file.|No|
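As an illustration of how the attributes combine, here is a minimal sketch of a task definition with every attribute set. All paths and folder names are hypothetical and chosen only for the example; only the attribute names come from the definitions above.

```xml
<!-- Hypothetical paths and names, for illustration only -->
<import-docx src="input/manual.docx"
             dest="output/manual"
             working="build/docx-tmp"
             mediafolder="images"
             config="config/word-import-config.xml"/>
```

Only src is required, so the remaining attributes can be dropped one by one to fall back on the defaults.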
Invoking via the user interface
In the PageSeeder Upload interface, the import-docx task is associated with files that have the ".docx" extension. After such a file is uploaded into the PageSeeder loading zone, the following options become available:
- No processing – uploads the file as a docx document.
- Import document as PageSeeder PSML document – runs the import-docx Ant task with the default configuration.
- Validate DOCX document – runs a Schematron validation on the docx file to ensure the contents can be converted.
In the Developer perspective, the following additional options are available:
- Remove original document – deletes the docx file after the transformation has run.
- Create subfolder for document – because a large docx file can translate into hundreds of PSML files, this option creates a folder to store the documents.
- Use external configuration – overrides the default import settings with an XML file written according to Microsoft Word docx Import Config and named as follows:
Invoking via Ant
Typically, this task is run through PageSeeder, using the ps-upload-get task to download the docx file and the ps-upload-put task to upload the generated PSML and image files. Running this task without a connection to PageSeeder requires a <taskdef/>.
<project ... xmlns:psd="antlib:org.pageseeder.docx.ant">
  <!-- only required for standalone -->
  <taskdef uri="antlib:org.pageseeder.docx.ant"
           resource="org/pageseeder/docx/ant/antlib.xml"
           classpath="pso-docx-ant-0.5.9.jar"/>
  <target ... >
    <!-- Invoke Task -->
    <psd:import-docx src="test.docx" dest="result"/>
  </target>
</project>
Using a namespace is not required, but it is good practice because it documents where the task comes from. The recommended namespace prefix is 'psd'.
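For comparison, a build file without the namespace might look like the sketch below. It assumes the antlib.xml resource registers the task under the name import-docx, as the unprefixed examples earlier suggest. The project and target names are hypothetical.

```xml
<project name="docx-import-no-namespace" default="convert">
  <!-- No uri attribute: the antlib definitions load into Ant's default namespace -->
  <taskdef resource="org/pageseeder/docx/ant/antlib.xml"
           classpath="pso-docx-ant-0.5.9.jar"/>
  <target name="convert">
    <!-- The task is then invoked without a prefix -->
    <import-docx src="test.docx" dest="result"/>
  </target>
</project>
```

The trade-off is that an unprefixed task name can collide with other task libraries, which is one reason the prefixed form is easier to maintain.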
How does it work?
The Ant task above will do the following:
- Load the config file.
- Unzip the docx file.
- Transform the docx XML files into PSML text and organize the image references.
- Upload the content back to a PageSeeder group.
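To see what the unzip step works with, a docx file can be unpacked by hand using Ant's built-in unzip task. The sketch below is not the import-docx implementation, only a way to inspect the input. The project name, target name, and the unpacked folder are hypothetical; the word/document.xml and word/media locations follow the standard Office Open XML package layout.

```xml
<project name="inspect-docx" default="inspect">
  <target name="inspect">
    <!-- A docx file is a zip archive; unpack it into a scratch folder -->
    <unzip src="test.docx" dest="unpacked"/>
    <!-- In the standard layout, the main document body is word/document.xml -->
    <echo message="Main document part: unpacked/word/document.xml"/>
    <!-- Embedded images, when present, are stored under word/media -->
    <echo message="Embedded images, if any: unpacked/word/media"/>
  </target>
</project>
```

Looking at the unpacked XML can help when writing or debugging an import configuration, since it shows the styles and structures that Word actually stored.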