
Microsoft Word docx – export config 

for creating a docx file from a PSML document

This is a companion technology to the Microsoft Word docx Import Config.

Overview

The Microsoft Word Export Config (word-export-config.xml), together with the Word Export Template (word-export-template.dotx), allows PSML data to be converted into the docx file format, which can be used with Microsoft Word, Google Docs and other word processing and publishing tools.

Note

Many of Word's features are accessible through the Export Config, but changing the Export Config or the Template requires administrator access to the server.

Usage

To override the default config, follow the steps below:

  1. Log in to PageSeeder and select a project or group.
  2. Select the Developer Perspective from the top left of the page.
  3. To change the config for yourself ONLY, go to a document, select Export, Create DocX and click Edit config. This is available to group managers only.
  4. To change the config for everyone in the project, select Document config under the Dev menu and follow the steps below. This is available to project managers and administrators only.
  5. Click on the Publish configurations link at the top right of the page.
  6. To change the default config for all documents, click create next to word-export-config.xml under Media types.
  7. To change the config for a specific PSML document type, click Create document type, then click create next to word-export-config.xml for that type.

Note

When editing this config file in PageSeeder, press Ctrl+Space to display autocomplete options.

Word export template file

A Word template is used to format documents exported from PageSeeder. The template contains formats that can be customized to meet specific visual requirements. Known as “styles”, the formats can be applied to:

  • Characters – equivalent to an inline label
  • Paragraphs – equivalent to a block label
  • Tables – can be categorized in PageSeeder using the @role attribute
  • Lists – typically a list style in Word will correspond to headings or block labels in PageSeeder.

The location of the template is:

publisher::WEB-INF/template/[project-name]/document/psml/export/word-export-template.dotx

Word export config file

This XML file must be placed in:

publisher::WEB-INF/template/[project-name]/document/psml/export/word-export-config.xml

The sections below describe how each part of the export config controls the transformation from PSML into docx format.

<core>

Defines the document properties (metadata) of the exported docx file.

<core>
    <creator select="PageSeeder" />
    <description select="PageSeeder" />
    <subject select="PageSeeder Document" />
    <title select="Default Title" />
    <category select="No Category" />
    <version select="1.0" />
    <revision select="1" />
</core>

The transform only uses these values when this option is selected in the PageSeeder export interface:

[Screenshot: export-word.png]

The other two options are to use:

  • the core values of the template used (default option),
  • the manual editing interface.

Each element corresponds to a Word document property, mapped as follows:

  • <creator> is mapped to the Author property in Word,
  • <description> is mapped to the Comment property in Word,
  • <subject> is mapped to the Subject property in Word,
  • <title> is mapped to the Title property in Word,
  • <category> is mapped to the Category property in Word,
  • <version> is mapped to the Version property in Word,
  • <revision> is mapped to the Revision property in Word.
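As a minimal sketch, the <core> block below sets organisation-specific document properties instead of the PageSeeder defaults. All values are hypothetical, and it assumes each @select holds literal text, as the default values above suggest. The values only take effect when the config-based option is chosen at export time.

```xml
<!-- Sketch only: every value below is a hypothetical example. -->
<core>
    <creator select="Acme Technical Writing" />      <!-- Author -->
    <description select="Exported from PageSeeder" /> <!-- Comment -->
    <subject select="Product installation" />
    <title select="Acme Widget Installation Guide" />
    <category select="User Guides" />
    <version select="2.1" />
    <revision select="3" />
</core>
```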

<toc> (Table of Contents)

<toc generate="true" style="TOC2">
    <headings generate="true" select="1-9" />
    <!-- 1|1-2|5-9|etc from 1 up to 9-->
    <outline generate="true" select="1-9" />
    <!-- 1|1-2|5-9|etc from 1 up to 9-->
    <paragraph generate="false">
        <style value="[word style]" indent="[indent level]" />
        <!-- 
             any paragraph style defined in the document with the 
             corresponding TOC indent level  
         -->
    </paragraph>
</toc>

The <toc> element is not itself transformed; it acts as a placeholder marking where the Table of Contents field is inserted in the docx file.

Word will generate a Table of Contents (ToC) from any of the following types of objects:

  • <headings> – uses Word's native heading styles, numbered 1 to 9. Set @generate to "true" to include them; any other value will not generate a ToC. The @select attribute takes heading levels from 1 to 9, written as single levels or ranges (such as 1-9) and combined with a pipe character, as in regular expression syntax.
<headings generate="true" select="1-9" />

  • <outline> – uses Word paragraph outline levels. Set @generate to "true" to include them; any other value will not generate a ToC. The @select attribute takes levels from 1 to 9, written as single levels or ranges and combined with a pipe character, as in regular expression syntax.
<outline generate="true" select="1|3-6|8" />
  • <paragraph> – lets specific paragraph styles appear in the ToC. Set @generate to "true" to include them; any other value will not generate a ToC. The <paragraph> element contains <style> elements, each with a unique @value attribute naming a paragraph style to include. The @indent attribute sets the level of that style in the ToC hierarchy.
<paragraph generate="false">
    <style value="Heading 1" indent="1" />    
    <style value="Heading 2" indent="2" />
        <!-- 
             any paragraph style defined in the document with the 
             corresponding ToC indent level  
         -->
</paragraph>
  • Default behavior – all heading levels and outline levels are added to the ToC, and the @generate attribute on the <paragraph> element is "false".
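For example, a ToC limited to the first three heading levels plus one custom paragraph style could be configured as in the sketch below. The <toc> attributes are copied from the default example; "Appendix Title" is a hypothetical style name that would need to exist in the Word Export Template.

```xml
<!-- Sketch: Heading 1 to Heading 3 only, no outline levels,
     plus one hypothetical custom paragraph style. -->
<toc generate="true" style="TOC2">
    <headings generate="true" select="1-3" />
    <outline generate="false" select="1-9" />
    <paragraph generate="true">
        <!-- Listed at ToC level 1, like Heading 1 in the example above -->
        <style value="Appendix Title" indent="1" />
    </paragraph>
</toc>
```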

<default>

This element is where the default conversion options are set.

<default>
    <defaultparagraphstyle wordstyle="Body Text" />
    <defaultcharacterstyle wordstyle="Default Paragraph Font" />
    <comments generate="false" />
    <mathml generate="false" />
    <master select="urititle"/>
</default>

Following are the valid options:

  • <defaultparagraphstyle> – where a <block> is not specifically mapped to a style in the docx file, it is transformed to this default style. This is usually "Body Text", but it can be the name of any style that exists in the Word Export Template. If the template has no matching style, "Normal" is used instead.
<defaultparagraphstyle wordstyle="Body Text"/>
  • <defaultcharacterstyle> – defines the style applied to all inline elements that do not have a specific transformation. The default value is "Default Paragraph Font". It can be any Word character style name that exists in the Word Export Template; if the style does not exist in the template, Word resets the value to its default.
<defaultcharacterstyle wordstyle="Default Paragraph Font"/>
  • <comments> – when set to "true", adds a comment to each section with a link that lets users comment on that PageSeeder section through their default mail client. By default, this value is "false".
<comments generate="false"/>
  • <mathml> – when set to "true", converts any MathML object referenced by the exported PSML document into an Office Open XML math object in Word. By default, this value is "false".
<mathml generate="false"/>
  • <master> – used when exporting a master references Word document. It selects whether the URI IDs ("uriids") or the URI title ("urititle") are used as file names. By default, this value is "uriids".
<master select="uriids"/>
  • <xref> – if @type is "cross-reference", cross-references are generated in the docx file.
<xref type="cross-reference"/>
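Putting these options together, a <default> block that also turns on MathML conversion and cross-references might look like the sketch below. It assumes <xref> sits inside <default> alongside the other options, as the list above suggests.

```xml
<!-- Sketch: default options with MathML conversion and cross-references
     switched on. The placement of <xref> inside <default> is assumed. -->
<default>
    <defaultparagraphstyle wordstyle="Body Text" />
    <defaultcharacterstyle wordstyle="Default Paragraph Font" />
    <comments generate="false" />
    <mathml generate="true" />
    <!-- Name referenced files by URI ID rather than by URI title -->
    <master select="uriids" />
    <xref type="cross-reference" />
</default>
```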

<elements>

Groups the options available to PSML elements when transforming from PSML to docx. These include:

  • <block>
  • <inline>
  • <table>
  • <heading>
  • <para>
  • <title>
  • <nlist>
  • <list>

These options can also be restricted to documents with a certain label by setting the @label attribute on a separate <elements> element:

<elements label="warranty">

When @label is set, the options defined inside that <elements> element apply only to documents with that label. In the example above, they would apply only to elements in documents labelled 'warranty'.
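A minimal sketch of a label-specific group is shown below, assuming the labelled <elements> element sits next to the general one in the same config file. The label "warranty" comes from the example above; the block label "disclaimer" and the style "Warranty Text" are hypothetical, and the style would need to exist in the Word Export Template.

```xml
<!-- General options (no @label) -->
<elements>
    <title wordstyle="heading 1" />
</elements>

<!-- Options only for documents labelled "warranty".
     "disclaimer" and "Warranty Text" are hypothetical names. -->
<elements label="warranty">
    <block default="para">
        <label value="disclaimer" wordstyle="Warranty Text" />
    </block>
</elements>
```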

<block>
<block default="generate-ps-style">
    <label value="Abstract" wordstyle="Instructions"/>
    <label value="Prompt"   wordstyle="Prompt"/>
    <label value="generate-ps-style" wordstyle="Notes">
      <keep-paragraph-with-next />
    </label>
    <ignore label="Notes"/>
</block>

The <block> element controls how all PSML <block> elements are transformed. The @default attribute defines what Paragraph Style is applied to any <block> element that has no specific transformation. It accepts two values:

  • "generate-ps-style" – generates a Word Paragraph Style named psblock[name-of-label] for each block label found.
  • "para" – formats the block content using the default paragraph settings defined in the configuration file.

Under <block>, two elements are valid:

  • <ignore> – the content of blocks with the label given in @label is not transformed on export.
  • <label> – the content of blocks whose label matches @value is transformed into the Word style named in @wordstyle. A nested <keep-paragraph-with-next/> element keeps the paragraph on the same page as the content that follows it.

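As an illustration, the sketch below maps a hypothetical "warning" block label to Word's built-in "Intense Quote" style and keeps it with the following content, ignores a hypothetical "internal" label, and lets every other block label generate its own psblock[name-of-label] style.

```xml
<!-- Sketch: "warning" and "internal" are hypothetical block labels.
     "Intense Quote" must exist in word-export-template.dotx. -->
<block default="generate-ps-style">
    <label value="warning" wordstyle="Intense Quote">
        <keep-paragraph-with-next />
    </label>
    <ignore label="internal" />
</block>
```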
<inline>
<inline default="generate-ps-style">
    <label value="Optional" wordstyle="OptionalNormal"/>
    <ignore   label="Notes"/>
    <tab    label="TabLabel"/>
    <fieldcode  label="n" value="LISTNUM  LegalDefault \\l 1 \\s 2 "/>
</inline>

The <inline> element controls how all PSML <inline> elements are transformed. The @default attribute defines what Character Style is applied to any inline element that has no specific transformation. It accepts two options: "generate-ps-style" and any other value. "generate-ps-style" generates a Word Character Style named psinline[name-of-label] for each inline label found. Any other value outputs only the text.

Under <inline>, four elements are accepted: <ignore>, <tab>, <fieldcode> and <label>.

  • <ignore> – the content of inline elements with the label given in @label is not transformed.
  • <label> – the content of inline elements whose label matches @value is transformed into the Word style named in @wordstyle. @wordstyle can also be "generate-ps-style", which forces the process to generate a unique style for that inline label.
  • <tab> – the content of inline elements with the label given in @label is output as a Word tab.
  • <fieldcode> – the content of inline elements with the label given in @label is output as the Word field code set in @value. Currently only the LISTNUM field code can be used. This feature is still in beta and should not be used on production servers.
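The sketch below shows a typical <inline> setup: a hypothetical "term" label uses Word's built-in "Strong" character style, a hypothetical "product" label gets its own generated style, "draft" inline content is ignored, and a "tab" label becomes a Word tab. All label names are assumptions; the LISTNUM <fieldcode> option is left out because it is still in beta.

```xml
<!-- Sketch: all label names are hypothetical examples. -->
<inline default="generate-ps-style">
    <!-- Built-in Word character style; must exist in the template -->
    <label value="term" wordstyle="Strong" />
    <!-- Forces a unique generated style for this label -->
    <label value="product" wordstyle="generate-ps-style" />
    <ignore label="draft" />
    <tab label="tab" />
</inline>
```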

<tables>
<tables>
   <table default="Table Normal">
     <width type="pct" value="5000"/>
   </table>
   <table role="pstablerole" tablestyle="Table Grid" />
   <table role="pstablerole 2" tablestyle="Table Grid 2" >
     <width type="dxa" value="5000"/>
   </table>
</tables>

The <tables> element defines the Word table styles used for PageSeeder tables.

By default, tables use the "Table Normal" style, set with the @default attribute.

Each table role can also be transformed using a <table> element with a specific @role attribute; tables with that role are given the table style named in @tablestyle. The table width can be set with a <width> element inside each <table> element. Its @type is either dxa (twentieths of a point) or pct (fiftieths of a percent), so in Word 100% is 5000 pct and 1 inch is 1440 dxa.
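The width units are easy to mix up, so the sketch below spells out the arithmetic in comments. The role name "wide-data" is hypothetical; the unit conversions follow the dxa and pct definitions above.

```xml
<tables>
   <table default="Table Normal">
     <!-- pct: fiftieths of a percent, so 2500 / 50 = 50% -->
     <width type="pct" value="2500"/>
   </table>
   <!-- "wide-data" is a hypothetical PSML table role -->
   <table role="wide-data" tablestyle="Table Grid">
     <!-- dxa: twentieths of a point, so 9360 / 20 = 468 pt = 6.5 in -->
     <width type="dxa" value="9360"/>
   </table>
</tables>
```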

<heading>
<heading>
    <level value="1" wordstyle="heading 1">
        <numbered select="true">
            <fieldcode regexp="%arabic%" type="SEQ" />
        </numbered>
    </level>
    <level value="2" wordstyle="heading 2">
        <prefix select="false">
            <fieldcode regexp="\d+\.%arabic%" type="SEQ" />
        </prefix>
    </level>
    <level value="3" wordstyle="heading 3">
    </level>
    <level value="4" wordstyle="heading 4">
        <numbered select="true">
        </numbered>
    </level>
    <level value="5" wordstyle="heading 5">
        <numbered select="true">
            <fieldcode regexp=
               "^heading-1^.^heading-2^.^heading-3^.^heading-4^.%arabic%"
                 type="SEQ" />
        </numbered>
        <keep-paragraph-with-next />
    </level>
    <level value="6" wordstyle="heading 6">
        <numbered select="true">
            <fieldcode regexp=
      "^heading-1^.^heading-2^.^heading-3^.^heading-4^.^heading-5^.%arabic%"
                 type="SEQ" />
        </numbered>
        <keep-paragraph-with-next />
    </level>
 </heading>

The <heading> element defines the Word style used for each PSML heading level. By default, each level is mapped to the corresponding Word heading (level 1 to heading 1, level 2 to heading 2, and so on). Each PSML heading level can also be transformed into a unique style in the docx file.

A <keep-paragraph-with-next/> element inside a <level> keeps headings of that level on the same page as the content that follows them.
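A smaller <heading> mapping is sketched below: levels 1 and 2 use hypothetical custom styles, both are numbered, and level 2 headings are kept with the next paragraph. The level 2 pattern "^heading-1^.%arabic%" follows the pattern of the example above, on the assumption that ^heading-1^ inserts the current level 1 number and %arabic% the running number at this level.

```xml
<!-- Sketch: "Chapter Heading" and "Section Heading" are hypothetical
     styles that must exist in word-export-template.dotx. -->
<heading>
    <level value="1" wordstyle="Chapter Heading">
        <numbered select="true">
            <fieldcode regexp="%arabic%" type="SEQ" />
        </numbered>
    </level>
    <level value="2" wordstyle="Section Heading">
        <numbered select="true">
            <!-- Assumed to produce 1.1, 1.2, and so on under each level 1 heading -->
            <fieldcode regexp="^heading-1^.%arabic%" type="SEQ" />
        </numbered>
        <keep-paragraph-with-next />
    </level>
</heading>
```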

<para>
<para>
    <indent level="0" wordstyle="Body Text">
        <prefix select="true">            
            <fieldcode regexp="Note %arabic%" type="SEQ" /> 
        </prefix>
    </indent>
    <indent level="1" wordstyle="List Continue" >
        <numbered select="true">
            <fieldcode regexp=
  "^heading-1^.^heading-2^.^heading-3^.^heading-4^.^heading-5^.^heading-6^.%arabic%"
                 type="SEQ" />
        </numbered>
    </indent>
    <indent level="2" wordstyle="List Continue" >
        
    </indent>
    <indent level="3" wordstyle="List Continue" >
        
    </indent>
    <indent level="4" wordstyle="List Continue" >
       <keep-paragraph-with-next />
    </indent>
    <indent level="5" wordstyle="List Continue" >
        <numbered select="true">
           <fieldcode regexp=
  "^heading-1^.^heading-2^.^heading-3^.^heading-4^.^heading-5^.^heading-6^.^para-1^.%arabic%"
                type="SEQ" />
        </numbered>
    </indent>
    <indent level="6" wordstyle="List Continue" >
        <numbered select="true">
           <fieldcode regexp=
  "^heading-1^.^heading-2^.^heading-3^.^heading-4^.^heading-5^.^heading-6^.^para-1^.^para-5^.%arabic%"
                 type="SEQ" />
        </numbered>
        <keep-paragraph-with-next />
    </indent>
</para>

The <para> element defines the Word style for each indent level of a PSML paragraph. Using the @wordstyle attribute, each indent level can be transformed into its own style in the docx file.

If a paragraph is numbered or has a prefix, the number or prefix can be output as a Word field code or as text. This is defined by a <fieldcode> or <text> element inside the <numbered> or <prefix> element of the corresponding <indent>.

A <keep-paragraph-with-next/> element inside an <indent> keeps paragraphs at that level on the same page as the content that follows them.
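For instance, the sketch below maps indent levels 0 to 2 to separate styles and, for unindented paragraphs that have a prefix, outputs the prefix as a SEQ field using the same pattern as the example above. "Indent 1" and "Indent 2" are hypothetical style names.

```xml
<!-- Sketch: "Indent 1" and "Indent 2" are hypothetical paragraph styles. -->
<para>
    <indent level="0" wordstyle="Body Text">
        <prefix select="true">
            <!-- Prefix output as a SEQ field, presumably "Note 1", "Note 2", ... -->
            <fieldcode regexp="Note %arabic%" type="SEQ" />
        </prefix>
    </indent>
    <indent level="1" wordstyle="Indent 1" />
    <indent level="2" wordstyle="Indent 2">
        <keep-paragraph-with-next />
    </indent>
</para>
```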

<title>
<title wordstyle="heading 1"/>

The <title> element defines the Paragraph Style applied to each section title.

By default, titles use "heading 1". This can be changed to any paragraph style that exists in the Word Export Template.

<nlist>

Contains the list definitions for numbered lists in Word.

<nlist>
    <default>
        <level value="1" wordstyle="List Number" />
        <level value="2" wordstyle="List Number 2" />
        <level value="3" wordstyle="List Number 3" />
        <level value="4" wordstyle="List Number 4" />
        <level value="5" wordstyle="List Number 5" />
        <level value="6" wordstyle="List Number 6" />
    </default>
    <role value="numberedlist1">
        <level value="1" wordstyle="List Number A" />
        <level value="2" wordstyle="List Number A 2" />
        <level value="3" wordstyle="List Number A 3" />
        <level value="4" wordstyle="List Number A 4" />
        <level value="5" wordstyle="List Number A 5" />
        <level value="6" wordstyle="List Number A 6" />
    </role>
 </nlist>

The <nlist> element maps each level of a numbered list to a Word paragraph style. Lists with a @role attribute can be mapped to their own set of styles using a <role> element whose @value matches the role.
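As a sketch, the mapping below gives numbered lists with a hypothetical "legal" role their own styles, while other numbered lists use the defaults. Only levels 1 and 2 are shown, and the "List Number Legal" style names are hypothetical; they must exist in the Word Export Template.

```xml
<!-- Sketch: only levels 1 and 2 are shown. The "legal" role and the
     "List Number Legal" styles are hypothetical. -->
<nlist>
    <default>
        <level value="1" wordstyle="List Number" />
        <level value="2" wordstyle="List Number 2" />
    </default>
    <role value="legal">
        <level value="1" wordstyle="List Number Legal" />
        <level value="2" wordstyle="List Number Legal 2" />
    </role>
</nlist>
```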

<list>

Contains the list definitions for unnumbered lists in Word.

<list>
    <default>
        <level value="1" wordstyle="List Bullet" />
        <level value="2" wordstyle="List Bullet 2" />
        <level value="3" wordstyle="List Bullet 3" />
        <level value="4" wordstyle="List Bullet 4" />
        <level value="5" wordstyle="List Bullet 5" />
        <level value="6" wordstyle="List Bullet 6" />
    </default>
    <role value="unnumberedlist1">
        <level value="1" wordstyle="List Bullet A" />
        <level value="2" wordstyle="List Bullet A 2" />
        <level value="3" wordstyle="List Bullet A 3" />
        <level value="4" wordstyle="List Bullet A 4" />
        <level value="5" wordstyle="List Bullet A 5" />
        <level value="6" wordstyle="List Bullet A 6" />
    </role>
 </list>

The <list> element maps each level of an unnumbered list to a Word paragraph style. Lists with a @role attribute can be mapped to their own set of styles using a <role> element whose @value matches the role.
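Similarly, a bulleted checklist could be styled through a role, as in the sketch below. Only levels 1 and 2 are shown; the "checklist" role and the "List Checkbox" styles are hypothetical and must exist in the Word Export Template.

```xml
<!-- Sketch: only levels 1 and 2 are shown. The "checklist" role and the
     "List Checkbox" styles are hypothetical. -->
<list>
    <default>
        <level value="1" wordstyle="List Bullet" />
        <level value="2" wordstyle="List Bullet 2" />
    </default>
    <role value="checklist">
        <level value="1" wordstyle="List Checkbox" />
        <level value="2" wordstyle="List Checkbox 2" />
    </role>
</list>
```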
