Microsoft Word docx – export config 

Use this config to create a docx file from a PSML document. It is the companion to the Import Microsoft Word docx config.

An in-progress replacement for this page is Word docx export schema reference.

Overview

The Microsoft Word Export Config (word-export-config.xml), together with the Word Export Template (word-export-template.docx), allows PSML data to be converted into the docx file format, which can be used with Microsoft Word, Google Docs and other word processing and publishing tools.

Note

Many of Word's features are accessible through the Export Config, but changing the Export Config or the Template requires administrator access to the server.

Usage

To override the default config follow the steps below:

  1. Log in to PageSeeder and select a project or group.
  2. Select the Developer Perspective from the top left of the page.
  3. To change the config for yourself ONLY go to a document, select Export, Create DocX and click Edit config. This is available to group managers only.
  4. To change the config for everyone in the project select Document config under the Dev menu and follow the steps below. This is available to project managers and administrators only.
  5. Click on the Publish configurations link at the top right of the page.
  6. To change the default config for all documents click create below word-export-config.xml under Media types.
  7. To change the config for a specific PSML document type click Create document type, then click create below word-export-config.xml for that type.

Note

When editing this config file in PageSeeder, pressing Ctrl+Space displays autocomplete options to make editing easier.

Word export template file

A Word template is used to format documents exported from PageSeeder. The template contains formats that can be customized to meet specific visual requirements. Known as “styles”, the formats can be applied to:

  • Characters – equivalent to an inline label
  • Paragraphs – equivalent to a block label
  • Tables – can be categorized in PageSeeder by the use of the @role attribute
  • Lists – typically a list style in Word will correspond to headings or block labels in PageSeeder

The location of the default template for a project is:

publisher::WEB-INF/template/[project-name]/document/psml/
   export/word-export-template.docx

To customize the Word template, first download the default template from your project on PageSeeder: go to Dev > Document config, click Publish configurations and then click download under Word export template.

Modify the styles in Word and save the changes. Then upload the template via the Publish configurations page by clicking create under Word export template.

Note

If creating a template from scratch (i.e. not modifying the default template) you must create a new numbered list style with each level linked to the corresponding List Number style (i.e. List Number, List Number 2, List Number 3, etc.). You must also create a new bullet list style with each level linked to the corresponding List Bullet style (i.e. List Bullet, List Bullet 2, List Bullet 3, etc.).
To do this, use the Define New List Style option and then select Format > Numbering. Use the More button on the Modify Multilevel list dialog to see the Link level to style option.

Word export config file

The location of the default config for a project is:

publisher::WEB-INF/template/[project-name]/
    document/psml/export/word-export-config.xml

The information that follows provides a description of the different ways that the export config file determines the transformation from PSML into docx format.

Note

When specifying Word styles that contain a comma (separating names), only the name before the first comma should be used in the config file. For example, the style "My heading,Special" can be referenced by:

<level value="1" wordstyle="My heading">

<core>

This element defines the Word core properties for the exported docx file.

  <core>
    <creator select="[ps-current-user]" />
    <description select="[ps-document-description]" />
    <title select="[ps-document-title]" />
    <!-- <modified select="[ps-document-modified]" /> -->
    <created select="[ps-current-date]" />
    <keywords select="[ps-document-labels]" />
    <subject select="" />
    <category select="" />
    <version select="1.0" />
    <revision select="1" />
  </core>

The following tokens can be used for the @select of any of the properties above and will be substituted with the corresponding value from PageSeeder.

Token                        Value
[ps-current-user]            The full name of the current user (requires current-user parameter on export task)
[ps-document-description]    The PSML document description
[ps-document-title]          The PSML document title
[ps-document-created]        The PSML document created date
[ps-document-modified]       The PSML document modified date
[ps-current-date]            The current date
[ps-document-labels]         The PSML document labels
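As a minimal sketch, the document date token from the table can be used in place of the current date. The elements and their order simply mirror the example above; the literal values "Reference" and "Manuals" are illustrative assumptions, not defaults.

```xml
<core>
  <creator select="[ps-current-user]" />
  <description select="[ps-document-description]" />
  <title select="[ps-document-title]" />
  <!-- Use the PSML document's created date instead of the export date -->
  <created select="[ps-document-created]" />
  <keywords select="[ps-document-labels]" />
  <!-- Literal text (assumed values, for illustration only) -->
  <subject select="Reference" />
  <category select="Manuals" />
  <version select="1.0" />
  <revision select="1" />
</core>
```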

The export only uses these values when 'Config' is selected through the PageSeeder interface or the manual-core parameter is set to "Config":

export-word.png

The other two options are to use:

  • the core values of the template used (default option),
  • the manual editing interface.

Each of the elements below corresponds to a Word core property, mapped as follows:

<creator select="Pageseeder" />
  • <creator> is mapped to the Author property in Word,
<description select="Pageseeder" />
  • <description> is mapped to the Comment property in Word,
<subject select="Pageseeder Document" />
  • <subject> is mapped to the Subject property in Word,
<title select="Default Title" />
  • <title> is mapped to the Title property in Word,
<category select="No Category" />
  • <category> is mapped to the Category property in Word,
<version select="1.0" />
  • <version> is mapped to the Version property in Word,
<revision select="1" />
  • <revision> is mapped to the Revision property in Word.

<toc> (Table of Contents)

<toc generate="true" style="TOC2">
    <headings generate="true" select="1-9" />
    <!-- 1|1-2|5-9|etc from 1 up to 9-->
    <outline generate="true" select="1-9" />
    <!-- 1|1-2|5-9|etc from 1 up to 9-->
    <paragraph generate="false">
        <style value="[word style]" indent="[indent level]" />
        <!-- 
             any paragraph style defined in the document with the 
             corresponding TOC indent level  
         -->
    </paragraph>
</toc>

Element <toc> is not transformed. It acts as a placeholder for where to insert the Table of Contents field in the docx file.

Word will generate a Table of Contents (ToC) from any of the following types of objects:

  • <headings> – these are the native Word heading styles numbered 1 to 9 and can be used by setting the @generate attribute to "true". Any other value will not generate a ToC. The value of the @select attribute can be an integer from 1 to 9 or a range such as 1-3, including groupings separated by a pipe character (regular expression syntax).
<headings generate="true" select="1-9" />

  • <outline> – outline levels can be used by setting the @generate attribute of <outline> to "true". Any other value will not generate a ToC. The value of the @select attribute can be an integer from 1 to 9 or a range such as 3-6, including groupings separated by a pipe character (regular expression syntax).
<outline generate="true" select="1|3-6|8" />
  • <paragraph> – paragraph styles can be used inside the ToC by setting the @generate attribute of <paragraph> to "true". Any other value will not generate a ToC. The <paragraph> element contains <style> elements, each with a unique @value attribute that declares which paragraph styles will generate the ToC. An @indent attribute sets the level for the paragraph style in the ToC hierarchy.
<paragraph generate="false">
    <style value="Heading 1" indent="1" />    
    <style value="Heading 2" indent="2" />
        <!-- 
             any paragraph style defined in the document with the 
             corresponding ToC indent level  
         -->
</paragraph>
  • Default behavior – by default all heading levels and outline levels are added to the ToC and the value of the @generate attribute on the <paragraph> element is set to "false".
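The options above can be combined. The following is a sketch only: it limits the ToC to heading levels 1 to 3, turns off outline levels and adds one custom paragraph style. The style name "Appendix Heading" is a hypothetical style assumed to exist in the Word export template.

```xml
<toc generate="true">
    <!-- Only Word heading levels 1, 2 and 3 appear in the ToC -->
    <headings generate="true" select="1-3" />
    <!-- Outline levels are not used -->
    <outline generate="false" select="1-9" />
    <!-- Paragraphs with this (assumed) style appear at ToC level 1 -->
    <paragraph generate="true">
        <style value="Appendix Heading" indent="1" />
    </paragraph>
</toc>
```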

<default>

This element is where the default conversion options are set.

<default>
    <defaultparagraphstyle wordstyle="Body Text" />
    <defaultcharacterstyle wordstyle="Default Paragraph Font" />
    <comments generate="false" />
    <mathml generate="false" />
    <citations documenttype="bibliography" pageslabel="pages" />
    <endnotes documenttype="endnotes" />
    <footnotes documenttype="footnotes" />
    <xrefs hyperlinkstyle="PS Hyperlink" referencestyle="PS Reference"/>
</default>

Note

All these elements are optional but must be in the above order when present.

Following are the valid options:

  • <defaultparagraphstyle> – where a <block> is not specifically mapped to a style name in the docx file, it will be transformed to a default style, usually "Body Text" but it can be the name of any style that exists in the Word Export Template. Where the docx file does not have a style that matches, the default style will be set to "Normal". This will have no effect if block/@default described below is set.
<defaultparagraphstyle wordstyle="Body Text"/>
  • <defaultcharacterstyle> – defines which style is applied to all inline elements that do not have a specific mapping to a style. The default value is "Default Paragraph Font". <defaultcharacterstyle> can be any Word Character Style Name that exists in the Word Export Template. If a style is used that does not exist in this template, Word, by default, will reset the value. This will have no effect if inline/@default described below is set.
<defaultcharacterstyle wordstyle="Default Paragraph Font"/>
  • <comments> – adds a comment to each section, with a link that lets the reader add a comment to the PageSeeder section using their default mail client. By default, this value is set to "false".
<comments generate="false"/>
  • <mathml> – converts any MathML object that is referenced by the exported PSML document. These will be converted to Office Open XML math objects in Word. By default, this value is set to "false".
<mathml generate="false"/>
  • <citations> – XRefs to properties fragments in a document with type @documenttype will become DocX citations and a DocX bibliography will be inserted after this document's <section id="title"> content. If there is an inline label @pageslabel directly after the XRef, its content will become the citation page or page range (e.g. <xref documenttype="bibliography" ... >...</xref><inline label="pages">10 - 20</inline>). Requires additional post-processing and pso-docx version 0.7.4 or later.
<citations documenttype="bibliography" pageslabel="pages"/>
  • <endnotes> – XRefs to fragments under <section id="content"> in a document with type @documenttype will become DocX endnotes. This document must be embedded in the publication but will be ignored. Requires pso-docx version 0.7.4 or later.
<endnotes documenttype="endnotes"/>
  • <footnotes> – XRefs to fragments under <section id="content"> in a document with type @documenttype will become DocX footnotes. This document must be embedded in the publication but will be ignored. Requires pso-docx version 0.7.4 or later.
<footnotes documenttype="footnotes"/>
  • <xrefs> – if type="cross-reference" then XRefs will become DocX REF instead of HYPERLINK. This allows the 'Update field' option in Word to be used so that the link text can be updated when pointing to numbered list items.

    As of version 0.7.0 XRefs with @display="template" and @title containing {parentnumber} or {prefix} will become DocX REF even without setting @type above. Also <xrefs> can have the following attributes:
    • @hyperlinkstyle: The word style for XRefs pointing outside the document (default PS Hyperlink).
    • @referencestyle: The word style for XRefs pointing within the document (default PS Reference).
 <xrefs type="cross-reference"
       hyperlinkstyle="My Hyperlink"
       referencestyle="My Reference" />

Note

In the word-export-template.docx any hyperlink or reference styles must be character-only styles.
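Because every child of <default> is optional but the order is fixed, a shorter configuration keeps the remaining elements in the same relative order. This is a sketch under that reading; the values shown are taken from the examples above.

```xml
<default>
    <defaultparagraphstyle wordstyle="Body Text" />
    <!-- <defaultcharacterstyle> and <comments> omitted -->
    <mathml generate="true" />
    <!-- <citations> and <endnotes> omitted -->
    <footnotes documenttype="footnotes" />
    <!-- XRefs become DocX REF fields instead of HYPERLINK -->
    <xrefs type="cross-reference" />
</default>
```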

<elements>

Groups the options available to PSML elements when transforming from PSML to docx. These include:

  • <block>
  • <inline>
  • <table>
  • <heading>
  • <para>
  • <title>
  • <nlist>
  • <list>
  • <preformat>
  • <xref>

They can be transformed specifically for a document with a certain label. This can be defined by setting the @label attribute inside a separate <elements> element.

<elements label="warranty">

If this is set, the options defined under this element will only apply to documents with this label. In the example above, the options defined under <elements> would only apply to elements in a document with the label 'warranty'.
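A sketch of how a general <elements> group and a label-specific group might sit side by side. The block label "Terms" and the style "Warranty Terms" are hypothetical names, and whether an empty <block> is accepted is an assumption.

```xml
<!-- Applies to documents in general -->
<elements>
    <block default="para" />
</elements>

<!-- Applies only to documents with the label "warranty" -->
<elements label="warranty">
    <block default="generate-ps-style">
        <label value="Terms" wordstyle="Warranty Terms" />
    </block>
</elements>
```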

<block>

<block default="generate-ps-style">
    <label value="Abstract" wordstyle="Instructions"/>
    <label value="Prompt"   wordstyle="Prompt"/>
    <label value="generate-ps-style" wordstyle="Notes">
      <keep-paragraph-with-next />
    </label>
    <ignore label="Notes"/>
</block>

The <block> element controls the handling of all PSML <block> elements. The @default attribute is used to define what Paragraph Style will be applied to any <block> elements that have no transformation. It accepts two values:

  • "generate-ps-style" – is used to generate a word style for each of the labels found. It will generate a Word Paragraph Style named ps_blk_[name-of-label].
  • "para" – is used to generate the default paragraphs set inside the configuration file.

Under <block>, two elements are valid:

  • <ignore> – the content of blocks with the label named in @label will not be transformed on export.
  • <label> – the content of blocks with the label named in @value will be transformed into the Word Style declared as the value of the attribute @wordstyle.

    The attribute @wordstyle under <label> can also contain "generate-ps-style", forcing the process to generate a unique style for that block label. The <keep-paragraph-with-next/> element can be used to keep the paragraph with the content that follows it.
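For illustration, the three behaviors can be combined as below. The labels "Warning", "Caution" and "Draft" and the style "Caution Box" are hypothetical; the generated style name follows the ps_blk_[name-of-label] pattern described above.

```xml
<block default="para">
    <!-- Generates a paragraph style following the pattern ps_blk_[name-of-label] -->
    <label value="Warning" wordstyle="generate-ps-style" />
    <!-- Maps to an existing (assumed) template style and keeps it with the next content -->
    <label value="Caution" wordstyle="Caution Box">
        <keep-paragraph-with-next />
    </label>
    <!-- Blocks labeled "Draft" are not transformed on export -->
    <ignore label="Draft" />
</block>
```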

<inline>

<inline default="generate-ps-style">
    <label value="Optional" wordstyle="OptionalNormal"/>
    <ignore   label="Notes"/>
    <tab    label="TabLabel"/>
    <fieldcode  label="n" value="LISTNUM  LegalDefault \\l 1 \\s 2 "/>
</inline>

The <inline> element controls the handling of all PSML <inline> elements. The @default attribute is used to define what Character Style will be applied to any <inline> elements that have no transformation. It accepts two options: "generate-ps-style" and any other value. "generate-ps-style" is used to generate a Word style for each of the labels found. It will generate a Word Character Style named ps_inl_[name-of-label]. All other values will generate only text.

Under <inline>, four elements are accepted: <ignore>, <tab>, <fieldcode> and <label>. <ignore> is used to define labels that are ignored on export.

If a label name is defined under the attribute @label under <ignore>, then the contents inside this inline will not be transformed on export.

If a label name is defined under the attribute @value under <label>, then the contents inside this inline will be transformed into the Word Style selected under the attribute @wordstyle. The attribute @wordstyle under <label> can also contain "generate-ps-style" forcing the process to generate a unique style for that inline label.

If a label name is defined under the attribute @label under <tab>, then the contents inside this inline will be transformed to a word tab on output.

If a label name is defined under the attribute @label under <fieldcode>, then the contents inside this inline will be transformed to a word fieldcode on output. The fieldcode generated will be the one set in the @value attribute.
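To make the mapping concrete, here is a hypothetical PSML paragraph that uses the labels from the example config above. The sentence itself is invented; the comments restate the behavior described in the preceding paragraphs.

```xml
<para>
    <!-- label "n": replaced by the field code set in fieldcode/@value -->
    <inline label="n">1</inline>
    <!-- label "TabLabel": output as a Word tab -->
    <inline label="TabLabel"/>
    <!-- label "Optional": output with the character style "OptionalNormal" -->
    <inline label="Optional">where applicable</inline>
    <!-- label "Notes": not transformed on export -->
    <inline label="Notes">check with legal</inline>
</para>
```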

<tables>

<tables>
   <table default="PS Table" headstyle="PS Table Header" bodystyle="PS Table Body">
     <width type="pct" value="5000"/>
   </table>
   <table role="pstablerole" tablestyle="Table Grid"
       headstyle="My Table Header" bodystyle="My Table Body"/>
   <table role="pstablerole2" tablestyle="Table Grid 2" >
     <width type="dxa" value="5000"/>
   </table>
</tables>

The <tables> element is used to define table styles for each PageSeeder table.

In the above example, "PS Table" is set as the default table style, "PS Table Header" as the default paragraph style for header cells and "PS Table Body" as the default paragraph style for other cells.

Each individual table role can also be transformed using a <table> element with a specific @role attribute. These tables will be set with the corresponding styles defined in the @tablestyle, @headstyle and @bodystyle attributes. The width of the table can also be set with the <width> element under each <table> element. Its @type attribute sets the unit of the width: dxa (twentieths of a point) or pct (fiftieths of a percent). In Word, a pct value of 5000 is 100% and 1 dxa is one twentieth of a point, so 1440 dxa is one inch.
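A short worked example of the two width units, assuming the usual Word conventions that pct values are fiftieths of a percent (5000 = 100%) and dxa values are twentieths of a point (72 points x 20 = 1440 per inch). The role names "half-width" and "six-inch" are hypothetical.

```xml
<tables>
    <!-- 50% of the available width: 50 x 50 = 2500 -->
    <table role="half-width" tablestyle="Table Grid">
        <width type="pct" value="2500"/>
    </table>
    <!-- Fixed width of 6 inches: 6 x 1440 = 8640 -->
    <table role="six-inch" tablestyle="Table Grid">
        <width type="dxa" value="8640"/>
    </table>
</tables>
```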

<heading>

<heading>
    <level value="1" numbered="true" wordstyle="heading 1">
        <numbered select="true">
            <fieldcode regexp="%arabic%" type="SEQ" />
        </numbered>
    </level>
    <level value="2" wordstyle="heading 2">
        <prefix select="true" separator="space">
            <fieldcode regexp="\d+\.%arabic%" type="SEQ" />
        </prefix>
    </level>
    <level value="3" wordstyle="heading 3">
    </level>
    <level value="4" wordstyle="heading 4">
        <numbered select="true">
        </numbered>
    </level>
    <level value="5" wordstyle="heading 5">
        <numbered select="true">
            <fieldcode regexp=
               "^heading-1^.^heading-2^.^heading-3^.^heading-4^.%arabic%"
                 type="SEQ" />
        </numbered>
        <keep-paragraph-with-next />
    </level>
    <level value="6" wordstyle="heading 6">
        <numbered select="true">
            <fieldcode regexp=
      "^heading-1^.^heading-2^.^heading-3^.^heading-4^.^heading-5^.%arabic%"
                 type="SEQ" />
        </numbered>
        <keep-paragraph-with-next />
    </level>
 </heading>

The <heading> element defines each heading style in Word mapped from each level of heading in PageSeeder. By default each level is mapped to the corresponding heading in Word (so level 1 to heading 1, level 2 to heading 2, etc.). Each PSML heading level can also be transformed into a unique style in the docx file.

As of pso-docx version 0.7.2 the @numbered and @prefixed attributes on <level> can be used to apply different Word styles to a PSML <heading> with @prefix or @numbered="true".

If the heading is numbered, or has a prefix, it can be transformed into a fieldcode in the output. This is defined by the <fieldcode> element under the corresponding level/numbered or level/prefix. NOTE: The level/@prefixed attribute is ignored in this case.

As of pso-docx version 0.6.2 the <prefix> element can have a @separator="[tab|space|none]" to define the character between the prefix and the heading text (tab is the default) but if it has @select="false" the prefix and separator will not be output.

The <keep-paragraph-with-next/> element can be used to keep the heading with the content that follows it. NOTE: The level/@prefixed attribute is ignored in this case.

Note

Word styles "heading [x]" must be in lower case but all other Word styles must match the case in the docx template (e.g. "Heading Numbered 1").
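A sketch of the @numbered and @prefixed attributes on <level>, modeled on the way the <para> example repeats an <indent> level. It assumes that repeating a <level> value is accepted in the same way, and "Heading Prefixed 1" is a hypothetical style name.

```xml
<heading>
    <!-- Plain level 1 headings: built-in style, lower case -->
    <level value="1" wordstyle="heading 1" />
    <!-- Level 1 headings with @numbered="true" in PSML -->
    <level value="1" numbered="true" wordstyle="Heading Numbered 1" />
    <!-- Level 1 headings with a @prefix in PSML -->
    <level value="1" prefixed="true" wordstyle="Heading Prefixed 1" />
</heading>
```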

<para>

<para>
    <indent level="0" wordstyle="Body Text">
        <prefix select="true" separator="space">            
            <fieldcode regexp="Note %arabic%" type="SEQ" /> 
        </prefix>
    </indent>
    <indent level="1" wordstyle="List Continue" >
        <numbered select="true">
            <fieldcode regexp=
  "^heading-1^.^heading-2^.^heading-3^.^heading-4^.^heading-5^.^heading-6^.%arabic%"
                 type="SEQ" />
        </numbered>
    </indent>
    <indent level="2" wordstyle="List Continue" >
        
    </indent>
    <indent level="3" wordstyle="List Continue" >
        
    </indent>
    <indent level="4" wordstyle="List Continue" >
       <keep-paragraph-with-next />
    </indent>
    <indent level="5" wordstyle="List Continue" >
        <numbered select="true">
           <fieldcode regexp=
  "^heading-1^.^heading-2^.^heading-3^.^heading-4^.^heading-5^.^heading-6^.^para-1^.%arabic%"
                type="SEQ" />
        </numbered>
    </indent>
    <indent level="6" wordstyle="List Continue" >
        <numbered select="true">
           <fieldcode regexp=
  "^heading-1^.^heading-2^.^heading-3^.^heading-4^.^heading-5^.^heading-6^.^para-1^.^para-5^.%arabic%"
                 type="SEQ" />
        </numbered>
        <keep-paragraph-with-next />
    </indent>

    <!-- Prefixed paragraphs -->
    <indent level="1" prefixed="true" wordstyle="List Manual"/>
    <indent level="2" prefixed="true" wordstyle="List Manual 2"/>
    <indent level="3" prefixed="true" wordstyle="List Manual 3"/>

    <!-- Numbered paragraphs -->
    <indent level="1" numbered="true" wordstyle="List Number"/>
    <indent level="2" numbered="true" wordstyle="List Number 2"/>
    <indent level="3" numbered="true" wordstyle="List Number 3"/>
</para>

Element <para> is used to define the style for each indent level of a paragraph in PageSeeder. Using the @wordstyle attribute, each unique level of indent can be transformed into a unique style in the docx file.

As of pso-docx version 0.6.1 the @numbered and @prefixed attributes on <indent> can be used to apply different Word styles to a PSML <para> with @prefix or @numbered="true".

If the paragraph is numbered, or has a prefix, it can be transformed into a fieldcode in the output. This is defined by the <fieldcode> element under the corresponding indent/numbered or indent/prefix. NOTE: The indent/@prefixed attribute is ignored in this case.

As of pso-docx version 0.6.2 the <prefix> element can have a @separator="[tab|space|none]" to define the character between the prefix and the paragraph text (tab is the default) but if it has @select="false" the prefix and separator will not be output.

The <keep-paragraph-with-next/> element can be used to keep the paragraph with the content that follows it. NOTE: The indent/@prefixed attribute is ignored in this case.
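The @separator and @select options on <prefix> can be sketched as follows. This assumes a <prefix> element without a <fieldcode> child is accepted, as the empty <numbered> element in the heading example suggests.

```xml
<para>
    <!-- Prefix is output, followed by a single space instead of the default tab -->
    <indent level="0" wordstyle="Body Text">
        <prefix select="true" separator="space" />
    </indent>
    <!-- Neither the prefix nor the separator is output -->
    <indent level="1" wordstyle="List Continue">
        <prefix select="false" />
    </indent>
</para>
```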

<title>

<title wordstyle="heading 1"/>

The <title> element is used to define the Paragraph Style of each section title.

By default <title> is set to "heading 1". It can be set to any style that exists in the Word export template.

<nlist>

Contains the list definitions for numbered lists inside of Word.

<nlist liststyle="Numbered List">
  <role value="highlight" liststyle="Highlighted Numbered List" />
</nlist>

As of pso-docx version 0.6.1 the @liststyle on <nlist> and <role> can be used to apply a specific word list style. The <default> and <level> elements are no longer supported under these elements.

If the PSML list has a @role attribute, it can be associated with a word list style using the <role> element.

<list>

Contains the list definitions for unnumbered lists inside of Word.

<list liststyle="Bulleted List">
  <role value="highlight" liststyle="Highlighted Bulleted List" />
</list>

As of pso-docx version 0.6.1 the @liststyle on <list> and <role> can be used to apply a specific word list style. The <default> and <level> elements are no longer supported under these elements.

If the PSML list has a @role attribute, it can be associated with a word list style using the <role> element.
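For illustration, the PSML lists below would pick up the role-based styles from the two examples above. The item text is invented.

```xml
<!-- Uses the list style "Highlighted Numbered List" via <nlist>/<role value="highlight"> -->
<nlist role="highlight">
    <item>Unpack the device</item>
    <item>Connect the power cable</item>
</nlist>

<!-- No @role, so the default "Bulleted List" style from <list liststyle="..."> applies -->
<list>
    <item>Power cable</item>
    <item>Quick start guide</item>
</list>
```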

Note

The @type on <list> and <nlist> in PSML is ignored as it could clash with the default and role based word styles. To alert editors to this in PageSeeder the following custom CSS could be added:

.psml-content ul[data-type]:before,
.psml-content ol[data-type]:before {
    color: white;
    content: "LIST TYPE IS NOT SUPPORTED BY WORD EXPORT";
    background: red;
    border-radius: 2px;
    padding: 1px 4px;
}

.psml-content ul[data-type] li,
.psml-content ol[data-type] li {
    color: red;
}

<listpara>

Contains style for a <para> inside a <list> or <nlist> that is not the first <para>. The @value corresponds to the nesting level of the <list> or <nlist>. Requires pso-docx version 0.7.5 or later.

    <listpara>
      <level value="1" wordstyle="List Continue" />
      <level value="2" wordstyle="List Continue 2" />
      <level value="3" wordstyle="List Continue 3" />
      <level value="4" wordstyle="List Continue 4" />
      <level value="5" wordstyle="List Continue 5" />
      <level value="6" wordstyle="List Continue 6" />
    </listpara>
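As a sketch of which paragraphs this affects, consider the hypothetical PSML list below. Going by the description above, only the second <para> in each item is styled by <listpara>.

```xml
<list>
    <item>
        <!-- First paragraph: formatted by the list style -->
        <para>Check the power supply.</para>
        <!-- Not the first paragraph, nesting level 1: "List Continue" -->
        <para>The indicator light should be green.</para>
        <list>
            <item>
                <para>Replace the fuse if required.</para>
                <!-- Not the first paragraph, nesting level 2: "List Continue 2" -->
                <para>Use a fuse of the same rating.</para>
            </item>
        </list>
    </item>
</list>
```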

<preformat>

Contains style for <preformat> elements.

<preformat wordstyle="HTML Preformatted"/>

<xref>

Contains styles for specific types of cross-references (the child elements are optional but must appear in the order below). Requires pso-docx version 0.7.4 or later.

<xref>
  <citation referencestyle="My Citation Reference" />
  <endnote textstyle="My Endnote Text" referencestyle="My Endnote Reference" />
  <footnote textstyle="My Footnote Text" referencestyle="My Footnote Reference" />
  <xrefconfig name="field"
      hyperlinkstyle="My Field Link" referencestyle="My Field Reference" />
  <xrefconfig name="term"
      hyperlinkstyle="My Term Link" />
</xref>

For <citation> the text style is always Bibliography and the default @referencestyle is the default character style (see <default> above).

For <endnote> the default @textstyle is Endnote Text and the default @referencestyle is Endnote Reference.

For <footnote> the default @textstyle is Footnote Text and the default @referencestyle is Footnote Reference.

There may be multiple <xrefconfig> elements and the @name refers to the @config on the XRef. The default @hyperlinkstyle and @referencestyle are defined in the <default> element above. Requires pso-docx version 0.7.6 or later.
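For example, a PSML XRef like the hypothetical one below carries @config="term", so it would be matched by the entry with name="term" in the example above. The @frag, @href and link text values are invented for illustration.

```xml
<!-- @config="term" selects the matching entry with name="term",
     so the link would use the "My Term Link" hyperlink style -->
<xref config="term" frag="default" href="glossary.psml">Glossary</xref>
```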

Note

In the word-export-template.docx any hyperlink or reference styles must be character-only styles.
