Default Word Import Processing

To control the docx import process, use the Default config.

The default import handling for docx files is as follows:

References document

The references document type has been designed to represent a book structure, with front matter, a table of contents and then a collection of documents.

  • If the Title field in the Word document properties has content, that content becomes the Document title of the PSML References document.
  • The Word document filename is the filename for the PSML References document.
  • If there is no Title in the Word document properties, the docx filename becomes both the Document title and the filename of the PSML References document.
  • Content in the Word document that comes before the first heading 1 imports into the first section (front matter) of the References document.
  • The content of each heading 1 and heading 2 in the Word document imports as the content of a cross-reference in the second section of the References document. These cross-references point to a collection of PSML component documents. This second section is equivalent to a Word Table of Contents.
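
The title and filename rules above can be sketched in Python. This is only an illustration of the documented behaviour, not the importer's code: the function name `resolve_title_and_filename` is hypothetical, and dropping the `.docx` extension from the filename is an assumption.

```python
from pathlib import PurePath
from typing import Optional, Tuple


def resolve_title_and_filename(
    docx_filename: str, title_property: Optional[str]
) -> Tuple[str, str]:
    """Return (document_title, psml_filename) for the References document.

    Hypothetical helper: mirrors the rules listed above.
    """
    # Assumption: the PSML filename is the docx filename without its extension.
    stem = PurePath(docx_filename).stem

    # Use the Title property when it has content, otherwise fall back
    # to the docx filename for the Document title as well.
    cleaned = (title_property or "").strip()
    title = cleaned if cleaned else stem
    return title, stem


if __name__ == "__main__":
    print(resolve_title_and_filename("report.docx", "Annual Report"))
    print(resolve_title_and_filename("report.docx", None))
```

With a Title property set, the title and filename differ; without one, both values come from the docx filename.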

Component documents

  • Each component PSML document has a PSML Document title and contains the content of its heading 1 (or heading 2) and any content following it in the Word document until the next heading 1 or 2. The text of the PSML Document title is generated from the first paragraph of each referenced document, which is the content of the heading 1 or heading 2.
    • For example, given a Word document containing a heading 1 followed directly by a heading 2 then a paragraph and then a heading 3 and a list, this splits into a first PSML component document containing only the text of the heading 1. A second PSML component document contains the content of the heading 2, the following paragraph, the heading 3 and the list.
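
The split into component documents can be sketched as follows. This is a minimal model, assuming the Word document is represented as a list of `(style, text)` pairs; the names `split_components` and `SPLIT_STYLES` and the dictionary layout are hypothetical and are not part of the import configuration.

```python
from typing import Dict, List, Tuple

Block = Tuple[str, str]  # (Word style name, text), an assumed representation

# Styles that start a new component document under the default handling.
SPLIT_STYLES = {"heading 1", "heading 2"}


def split_components(blocks: List[Block]) -> Tuple[List[Block], List[Dict]]:
    """Return (front_matter, components) for a list of Word blocks."""
    front_matter: List[Block] = []
    components: List[Dict] = []
    for style, text in blocks:
        if style in SPLIT_STYLES:
            # A heading 1 or 2 opens a new component; its text is the title.
            components.append({"title": text, "blocks": [(style, text)]})
        elif components:
            # Everything else belongs to the most recent component.
            components[-1]["blocks"].append((style, text))
        else:
            # Content before the first heading goes to the front matter.
            front_matter.append((style, text))
    return front_matter, components


if __name__ == "__main__":
    word_blocks = [
        ("heading 1", "Chapter"),
        ("heading 2", "Scope"),
        ("paragraph", "Intro text"),
        ("heading 3", "Details"),
        ("list", "item a; item b"),
    ]
    front, parts = split_components(word_blocks)
    for part in parts:
        print(part["title"], len(part["blocks"]))
```

Run on the example above, the first component holds only the heading 1, and the second holds the heading 2, the paragraph, the heading 3 and the list.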

Fragments within component documents

  • Upon upload, the content of a heading 1 (or heading 2) in the Word document is in the first fragment of the component document.
  • Any content immediately following the heading imports into the second fragment, whether it is a heading 3 or lower or any other kind of content.
  • A heading 3 starts a new fragment, and any content that follows it stays in that fragment until another heading 3 or a heading 4 is encountered.
  • A heading 4 and any content that follows it are in a new fragment. Heading 5 and lower headings, and any content that follows them, are in the same fragment as the parent heading 4.
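
The fragment rules above can be sketched as follows. This is a rough reading of the documented behaviour, assuming a component document is a list of `(style, text)` pairs whose first entry is its heading 1 or heading 2; the name `split_fragments` is hypothetical.

```python
from typing import List, Tuple

Block = Tuple[str, str]  # (Word style name, text), an assumed representation

# Styles that open a new fragment inside a component document.
FRAGMENT_STARTERS = {"heading 3", "heading 4"}


def split_fragments(blocks: List[Block]) -> List[List[Block]]:
    """Group a component document's blocks into fragments."""
    fragments: List[List[Block]] = []
    for index, block in enumerate(blocks):
        style = block[0]
        # Index 0 is the heading 1 or 2 (first fragment); index 1 always
        # opens the second fragment; after that only a heading 3 or a
        # heading 4 opens a new fragment.
        if index <= 1 or style in FRAGMENT_STARTERS:
            fragments.append([block])
        else:
            fragments[-1].append(block)
    return fragments


if __name__ == "__main__":
    component = [
        ("heading 2", "Scope"),
        ("paragraph", "Intro text"),
        ("heading 3", "Details"),
        ("list", "item a; item b"),
        ("heading 4", "Edge cases"),
        ("heading 5", "Minor note"),
        ("paragraph", "More text"),
    ]
    for number, fragment in enumerate(split_fragments(component), start=1):
        print(number, [style for style, _ in fragment])
```

In this sketch the heading 5 and the paragraph after it stay in the heading 4 fragment, which matches the last rule in the list.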

Hyperlinks and cross-references

 
