Microsoft Word docx Export Config 

For generating the docx file format from PSML.

This functionality is a companion to the Microsoft Word docx Import Config.

Overview

The Microsoft Word Export Config (word-export-config.xml), together with the Word Export Template (word-export-template.dotx), allows PSML data to be converted into the docx file format, which can be used with Microsoft Word, Google Docs and other word processing and publishing tools.

Note

Many of Word's features are accessible through the Export Config, but changing the Export Config or the Template requires administrator access to the server.

Usage

To override the default config in PageSeeder, follow the steps below:

  1. Log in to PageSeeder and select a project or group.
  2. Select the Developer Perspective from the top left of the page.
  3. To change the config for yourself ONLY, go to a document, select Export, then Create DocX, and click Edit config. This is available to group managers only.
  4. To change the config for everyone in the project, select Document config under the Dev menu and follow the steps below. This is available to project managers and administrators only.
  5. Click the Publish configurations link at the top right of the page.
  6. To change the default config for all documents, click create next to word-export-config.xml under Media types.
  7. To change the config for a specific PSML document type, click Create document type, then click create next to word-export-config.xml for that type.

 

Note

When editing this config file in PageSeeder, pressing Ctrl+Space displays autocomplete options to make editing easier.

Word Export Template File

A Word template is used to format documents exported from PageSeeder. It contains different classes of styles that all contribute to a customized document layout. The style classes are:

  • Characters
  • Paragraphs
  • Tables
  • Lists

The location of the template is:

publisher::WEB-INF/template/[project-name]/document/psml/export/word-export-template.dotx

Word Export Config File

This XML file must be placed in:

publisher::WEB-INF/template/[project-name]/document/psml/export/word-export-config.xml

This file sets the options for the following parts of the PSML to Word transformation:

  • core
  • toc
  • default
  • elements

<core> element

Defines the property values for the exported Word document.

<core>
    <creator select="PageSeeder" />
    <description select="PageSeeder" />
    <subject select="PageSeeder Document" />
    <title select="Default Title" />
    <category select="No Category" />
    <version select="1.0" />
    <revision select="1" />
</core>

The transform tool uses the values defined here only when this option is selected in the PageSeeder GUI.

[Image: export-word.png]

The other two options are to use the core values of the template (the default option) and Manual input, which uses the text boxes provided to fill in each of the Word core attributes.

Each child element of <core> corresponds to a Word core attribute. Currently the following Word core attributes are handled:

<creator select="PageSeeder" />

Mapped to the Author Word core attribute.

<description select="PageSeeder" />

Mapped to the Comments Word core attribute.

<subject select="PageSeeder Document" />

Mapped to the Subject Word core attribute.

<title select="Default Title" />

Mapped to the Title Word core attribute.

<category select="No Category" />

Mapped to the Category Word core attribute.

<version select="1.0" />

Mapped to the Version Word core attribute (Beta).

<revision select="1" />

Mapped to the Revision Word core attribute (Beta).

<toc> element (Table of Contents)

<toc generate="true" style="TOC2">
    <headings generate="true" select="1-9" />
    <!-- 1|1-2|5-9|etc from 1 up to 9-->
    <outline generate="true" select="1-9" />
    <!-- 1|1-2|5-9|etc from 1 up to 9-->
    <paragraph generate="false">
        <style value="[word style]" indent="[indent level]" />
        <!-- 
             any paragraph style defined in the document with the 
             corresponding TOC indent level  
         -->
    </paragraph>
</toc>

The <toc> element is not transformed itself; it acts as a marker for where to add the Table of Contents (ToC) inside Word. The ToC can be generated from the following three types of Word objects:

Headings

Headings can be used inside the Table of Contents by setting the @generate attribute of <headings> to "true". Any other value will not generate a ToC.

<headings generate="true" select="1-9" />

The <headings> @select attribute accepts any integer from 1 to 9 and ranges such as 1-3, with alternatives separated by a pipe character (for example 1|3-6|8). In other words, it is an expression over the integer values 1 through 9.

Outline

Outline levels can be used inside the ToC by setting the @generate attribute of <outline> to "true". Any other value will not generate a ToC.

<outline generate="true" select="1|3-6|8" />

The <outline> @select attribute accepts any integer from 1 to 9 and ranges such as 1-3, with alternatives separated by a pipe character. In other words, it is an expression over the integer values 1 through 9.
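As an illustration of the @select syntax, the sketch below limits the ToC to heading levels 1 to 3 and switches the other sources off. It uses only elements and attributes that appear in this page; whether a given combination suits a particular template is an assumption to verify by exporting a test document.

```xml
<!-- Sketch: ToC built from heading levels 1 to 3 only -->
<toc generate="true" style="TOC2">
    <!-- a single range; "1|3" would select levels 1 and 3 only -->
    <headings generate="true" select="1-3" />
    <!-- any value other than "true" disables this source -->
    <outline generate="false" select="1-9" />
    <!-- paragraph styles are not used as a ToC source here -->
    <paragraph generate="false">
        <style value="Heading 1" indent="1" />
    </paragraph>
</toc>
```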

Paragraph

Paragraph styles can be used inside the ToC by setting the @generate attribute of <paragraph> to "true". Any other value will not generate a ToC.

<paragraph generate="false">
    <style value="Heading 1" indent="1" />    
    <style value="Heading 2" indent="2" />
        <!-- 
             any paragraph style defined in the document with the 
             corresponding TOC indent level  
         -->
</paragraph>

The <paragraph> element contains <style> elements, each with a unique @value attribute naming a Word paragraph style to include in the ToC. The @indent attribute sets the ToC level at which paragraphs of that style appear.
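A minimal sketch of a ToC driven by paragraph styles follows. The style names "Appendix Title" and "Appendix Section" are hypothetical; they stand for any paragraph styles defined in the Word Export Template.

```xml
<paragraph generate="true">
    <!-- hypothetical style names; each @value must be unique -->
    <style value="Appendix Title"   indent="1" />
    <style value="Appendix Section" indent="2" />
</paragraph>
```

Under these assumptions, paragraphs styled Appendix Title would be listed at the first ToC level and those styled Appendix Section at the second.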

Default Behavior

By default, all heading levels and outline levels are added to the ToC, and paragraph generation is set to "false".

<default> element

<default>
    <defaultparagraphstyle wordstyle="Body Text" />
    <defaultcharacterstyle wordstyle="Default Paragraph Font" />
    <comments generate="false" />
    <mathml generate="false" />
    <master select="urititle"/>
</default>

The <default> element is where the default conversion options are set.

It defines the following options:

<defaultparagraphstyle> element

<defaultparagraphstyle wordstyle="Body Text"/>

Defines which style is mapped to all <block> elements that are not specifically mapped otherwise.

The default value is "Body Text", but it can be the name of any style that exists in the Word Export Template. If the style name does not exist in the template, the "Normal" style is used instead.

<defaultcharacterstyle> element

<defaultcharacterstyle wordstyle="Default Paragraph Font"/>

The <defaultcharacterstyle> element defines which style is applied to all inline elements that do not have a specific transformation.

The default value is "Default Paragraph Font".

The value can be any Word Character Style name that exists in the Word Export Template. If the style does not exist in the template, Word will, by default, reset the value.

<comments> element

<comments generate="false"/>

The <comments> element is used to add comments for each section, with a link for adding a comment to the PageSeeder section through the user's default mail client.

By default, this value is set to "false".

<mathml> element

<mathml generate="false"/>

The <mathml> element is used to convert any MathML object that is referenced by the exported PSML document. These are converted back to Office Math (OMML) objects in Word.

By default, this value is set to "false".

<master> element

<master select="uriids"/>

The <master> element applies to the export of a master references Word document. It selects whether the file name uses the uriids or the URI title.

By default, this value is set to "uriids".

 

Elements

The <elements> element groups all the options for transforming specific PSML elements to docx.

These include:

  • <block>
  • <inline>
  • <tables>
  • <heading>
  • <para>
  • <title>
  • <nlist>
  • <list>

These can be transformed differently for documents with a certain label. To do so, set the @label attribute on a separate <elements> element.

<elements label="warranty">

If @label is set, the options defined under this element apply only to documents with that label. In the example above, the options defined under <elements> would apply only to elements in a document with the label 'warranty'.
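A fuller sketch of a label-specific <elements> section is given below. It assumes that an unlabeled <elements> section with the general options exists alongside it, and it reuses the <block> mapping syntax documented in the next section; the choice of options is purely illustrative.

```xml
<!-- Sketch: options applied only to documents labeled 'warranty' -->
<elements label="warranty">
    <block default="para">
        <!-- map the block label "Prompt" to the Word style "Prompt" -->
        <label value="Prompt" wordstyle="Prompt"/>
    </block>
</elements>
```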

<block>

<block default="generate-ps-style">
    <ignore label="Notes"/>
    <label value="Abstract" wordstyle="Instructions"/>
    <label value="Prompt"   wordstyle="Prompt"/>
    <label value="generate-ps-style" wordstyle="Notes"/>
</block>

The <block> element controls the handling of all PSML <block> elements. The @default attribute defines which Paragraph Style is applied to any <block> elements that have no specific transformation. It accepts two values: "generate-ps-style" and "para". "generate-ps-style" generates a Word style for each label found, creating a Word Paragraph Style named psblock-[name-of-label]. "para" generates the default paragraphs set inside the configuration file.

Under <block>, two elements are accepted: <ignore> and <label>. <ignore> defines labels to be ignored: if a label name is given in the @label attribute of <ignore>, the contents of blocks with that label are not transformed. If a label name is given in the @value attribute of <label>, the contents of blocks with that label are transformed using the Word style named in the @wordstyle attribute. The @value attribute of <label> can also contain "generate-ps-style", forcing the process to generate a unique style for that block label.

<inline>

<inline default="generate-ps-style">
    <ignore   label="Notes"/>
    <tab    label="TabLabel"/>
    <fieldcode  label="n" value="LISTNUM  LegalDefault \\l 1 \\s 2 "/>
    <label    value="Optional" wordstyle="OptionalNormal"/>
</inline>

The <inline> element controls the handling of all PSML <inline> elements. The @default attribute defines which Character Style is applied to any <inline> elements that have no specific transformation. It accepts two options: "generate-ps-style" and any other value. "generate-ps-style" generates a Word style for each label found, creating a Word Character Style named psinline-[name-of-label]. All other values generate only text.

Under <inline>, four elements are accepted: <ignore>, <tab>, <fieldcode> and <label>. <ignore> defines labels to be ignored.

If a label name is given in the @label attribute of <ignore>, the contents of inlines with that label are not transformed.

If a label name is given in the @value attribute of <label>, the contents of inlines with that label are transformed using the Word style named in the @wordstyle attribute.

If a label name is given in the @label attribute of <tab>, inlines with that label are transformed to a Word tab on output.

If a label name is given in the @label attribute of <fieldcode>, inlines with that label are transformed to a Word field code on output. Currently only the LISTNUM Word field code can be used. The field code generated is the one set in the @value attribute.

This feature is still in Beta and should not be used on production servers.

<tables> element

<tables>
   <table default="Table Normal">
     <width type="pct" value="5000"/>
   </table>
   <table role="pstablerole" tablestyle="Table Grid" />
   <table role="pstablerole 2" tablestyle="Table Grid 2" >
     <width type="dxa" value="5000"/>
   </table>
</tables>

The <tables> element defines the table style for each PageSeeder table.

By default, "Table Normal" is used.

Each table role can also be mapped individually using a <table> element with a specific @role attribute. Such tables use the table style named in the @tablestyle attribute. The width of the table can be set with the <width> element under each <table> element. Its @type attribute takes one of two units: dxa (twentieths of a point) or pct (fiftieths of a percent). In Word, 100% is 5000 pct, 1 point is 20 dxa, and one pixel at 96 dpi is 15 dxa.
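To make the units concrete, the sketch below sets the default table to half the available width using pct and another table to a fixed 5 inches using dxa. The role name "wide" is hypothetical, and the arithmetic assumes the standard Word units (fiftieths of a percent for pct, 1440 dxa per inch).

```xml
<tables>
   <table default="Table Normal">
     <!-- pct: 50% x 50 = 2500 -->
     <width type="pct" value="2500"/>
   </table>
   <!-- "wide" is a hypothetical table role -->
   <table role="wide" tablestyle="Table Grid">
     <!-- dxa: 5 in x 1440 = 7200 -->
     <width type="dxa" value="7200"/>
   </table>
</tables>
```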

<heading> element

<heading>
    <level value="1" wordstyle="heading 1">
        <numbered select="true">
            <fieldcode regexp="%arabic%" type="SEQ" />
        </numbered>
    </level>
    <level value="2" wordstyle="heading 2">
        <prefix select="false">
            <fieldcode regexp="\d+\.%arabic%" type="SEQ" />
        </prefix>
    </level>
    <level value="3" wordstyle="heading 3">
    </level>
    <level value="4" wordstyle="heading 4">
        <numbered select="true">
        </numbered>
    </level>
    <level value="5" wordstyle="heading 5">
        <numbered select="true">
            <fieldcode regexp="^heading-1^.^heading-2^.^heading-3^.^heading-4^.%arabic%" type="SEQ" />
        </numbered>
    </level>
    <level value="6" wordstyle="heading 6">
        <numbered select="true">
            <fieldcode regexp="^heading-1^.^heading-2^.^heading-3^.^heading-4^.^heading-5^.%arabic%" type="SEQ" />
        </numbered>
    </level>
</heading>

The <heading> element defines the heading style for each level of heading in PageSeeder.

By default, each level is mapped to the corresponding heading in Word (level 1 to heading 1, level 2 to heading 2, and so on).

Each PSML heading level can also be transformed into a unique Paragraph Style.
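For instance, a first-level heading could be mapped to a custom style while the second level keeps its default. In this sketch "Chapter Title" is a hypothetical paragraph style that would need to exist in the Word Export Template, and it is assumed that levels not listed fall back to the default mapping.

```xml
<heading>
    <!-- "Chapter Title" is a hypothetical template style -->
    <level value="1" wordstyle="Chapter Title" />
    <level value="2" wordstyle="heading 2" />
</heading>
```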

<para> element

<para>
    <indent level="0" wordstyle="Body Text">
        <prefix select="true">            
            <fieldcode regexp="Note %arabic%" type="SEQ" /> 
        </prefix>
    </indent>
    <indent level="1" wordstyle="List Continue" >
        <numbered select="true">
            <fieldcode regexp="^heading-1^.^heading-2^.^heading-3^.^heading-4^.^heading-5^.^heading-6^.%arabic%" type="SEQ" />
        </numbered>
    </indent>
    <indent level="2" wordstyle="List Continue" >
        
    </indent>
    <indent level="3" wordstyle="List Continue" >
        
    </indent>
    <indent level="4" wordstyle="List Continue" >
       
    </indent>
    <indent level="5" wordstyle="List Continue" >
        <numbered select="true">
           <fieldcode regexp="^heading-1^.^heading-2^.^heading-3^.^heading-4^.^heading-5^.^heading-6^.^para-1^.%arabic%" type="SEQ" />
        </numbered>
    </indent>
    <indent level="6" wordstyle="List Continue" >
        <numbered select="true">
           <fieldcode regexp="^heading-1^.^heading-2^.^heading-3^.^heading-4^.^heading-5^.^heading-6^.^para-1^.^para-5^.%arabic%" type="SEQ" />
        </numbered>
    </indent>
</para>

The <para> element defines the paragraph style for each indent level of a paragraph in PageSeeder.

Each PSML para with a unique indent level can be transformed into a unique Paragraph Style. This is defined by the @wordstyle attribute.

If the paragraph is numbered or has a prefix, the numbering or prefix can be transformed into a field code or text in the output. This is defined by the <fieldcode> or <text> element under the corresponding indent/numbered or indent/prefix.

<title> element

<title name="heading 1"/>

The <title> element defines the Paragraph Style of each section title.

By default, the title is set to "heading 1". It can be set to any style that exists in the Word Export Template.

<nlist> element

Contains the list definitions for numbered lists inside of Word.

<nlist>
    <default>
        <level value="1" wordstyle="List Number" />
        <level value="2" wordstyle="List Number 2" />
        <level value="3" wordstyle="List Number 3" />
        <level value="4" wordstyle="List Number 4" />
        <level value="5" wordstyle="List Number 5" />
        <level value="6" wordstyle="List Number 6" />
    </default>
    <role value="numberedlist1">
        <level value="1" wordstyle="List Number A" />
        <level value="2" wordstyle="List Number A 2" />
        <level value="3" wordstyle="List Number A 3" />
        <level value="4" wordstyle="List Number A 4" />
        <level value="5" wordstyle="List Number A 5" />
        <level value="6" wordstyle="List Number A 6" />
    </role>
</nlist>

The <nlist> element contains the list levels, each of which can be transformed into a different Word paragraph style. If a list has a @role attribute, that role can also be associated with Word paragraph styles using the <role> element.

<list> element

Contains the list definitions for unnumbered lists inside of Word.

<list>
    <default>
        <level value="1" wordstyle="List Bullet" />
        <level value="2" wordstyle="List Bullet 2" />
        <level value="3" wordstyle="List Bullet 3" />
        <level value="4" wordstyle="List Bullet 4" />
        <level value="5" wordstyle="List Bullet 5" />
        <level value="6" wordstyle="List Bullet 6" />
    </default>
    <role value="unnumberedlist1">
        <level value="1" wordstyle="List Bullet A" />
        <level value="2" wordstyle="List Bullet A 2" />
        <level value="3" wordstyle="List Bullet A 3" />
        <level value="4" wordstyle="List Bullet A 4" />
        <level value="5" wordstyle="List Bullet A 5" />
        <level value="6" wordstyle="List Bullet A 6" />
    </role>
</list>

The <list> element contains the list levels, each of which can be transformed into a different Word paragraph style. If a list has a @role attribute, that role can also be associated with Word paragraph styles using the <role> element.
