Handling and managing documents

Word regex

Often the best way to import Word documents into PageSeeder is to clean them up first. In this circumstance, the best way to clean up Word files is in Word itself.

Unaware that the Word Find and Replace supports a form of Regular Expressions, many users struggle with the clean-up task. The point of this article is to provide some practical examples of the Word regex syntax known as “wildcards”.

Wildcards are not quite the same syntax that most developers are familiar with, but their inbuilt knowledge of the docx format makes the differences a little more bearable. 

Examples of this are the following:

Character               Wildcard expression
Opening field bracket   ^19 (for example, "^19 REF" will find every cross-reference)
Tab character           ^t
Page or section break   ^m
Column break            ^n

Find and Replace

The following is an example of how wildcard Find works in Word and how it can be used to capture dates that have been expressed inconsistently in a document.


([ ][0-9]{1,2})[ ](<[AFJMNSOD]*>)[ ]([0-9]{4})
  • "(" and ")" group parts of the pattern. The groups are numbered from left to right in the Find What field and can be output in any order in the Replace With field by using "\1", "\2" or "\3". Changing the order of the output is how different Find patterns can be used to harmonize inconsistent data.
  • "[" and "]" delimit a set of characters, any one of which can match, and a hyphen inside the brackets specifies a range of characters.
  • Within the pattern, "<" and ">" mark the start and end of a word, the asterisk "*" matches any string of characters and the question mark "?" matches any single character.
  • "{" and "}" specify how many times whatever precedes them must occur, where a comma "," separates the minimum and maximum values.

Therefore, in the pattern above, the expression would be processed as follows:

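([ ][0-9]{1,2})[ ] finds a space, then one or two digits between zero and nine (the day), followed by another space. The leading space is captured as part of the first group.

(<[AFJMNSOD]*>)[ ] says that after matching the first part, the system will look for the start of a word, then for one of the capital letters that begin a month name, then for any remaining characters up to the end of the word, followed by a space.

[ ]([0-9]{4}) looks for that space followed by exactly four digits (the year).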
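
The same search can be run from a macro. The following is a minimal sketch, not a definitive implementation: the macro name and the target format (month, day, comma, year) are assumptions chosen for illustration, and only the Find pattern comes from the example above.

```vba
Sub HarmonizeDates()
    ' Hypothetical example: rewrite " 12 March 2015" style dates
    ' as " March 12, 2015" in the main text of the active document.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        ' Group 1 = space + day, group 2 = month word, group 3 = year
        .Text = "([ ][0-9]{1,2})[ ](<[AFJMNSOD]*>)[ ]([0-9]{4})"
        ' Group 1 already starts with a space, so "\2\1" gives "March 12"
        .Replacement.Text = " \2\1, \3"
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        ' Wildcards must be on for groups, ranges and counts to work
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Because "*" can match more than a single word, it is worth stepping through the matches with Find Next on a copy of the document before replacing them all.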

Removing excess returns

This macro removes excess paragraph marks from a document. 

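Sub ReplacePara()
   ' Start from the top of the document
   Selection.HomeKey Unit:=wdStory
   With Selection.Find
      .Text = "^p^p"
      .Replacement.Text = ""
      .Forward = True
      .Wrap = wdFindContinue
      .Format = False
      .MatchCase = False
      .MatchWholeWord = False
      .MatchWildcards = False
      .MatchSoundsLike = False
      .MatchAllWordForms = False
   End With
   ' Find the first pair of consecutive paragraph marks
   Selection.Find.Execute
   While Selection.Find.Found
      ' Reposition the insertion point and delete one paragraph mark
      Selection.MoveRight Unit:=wdCharacter, Count:=1
      Selection.MoveLeft Unit:=wdCharacter, Count:=2
      Selection.Delete Unit:=wdCharacter, Count:=1
      ' Look for the next pair
      Selection.Find.Execute
   Wend
End Sub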

The first part of the macro uses Word's built-in Find and Replace capabilities to find all instances of two paragraph marks in sequence. The macro doesn't replace the sequential paragraph marks; it simply finds them. The second part uses the Selection.Find.Found property to delete the second of the two sequential paragraph marks.

This approach is used because it leaves the formatting intact on the remaining paragraph mark. If consecutive paragraph marks are replaced with a single paragraph mark, important formatting may be lost.

For a similar reason, never use the '^13' character in the Replace With field. Because Word stores the formatting on the paragraph mark, collapsing paragraphs does not have the same consequences as collapsing spaces, where all values are the same.

Find and remove excess spaces

  1. In the Find What box, enter a single space followed by the characters '{2,}'.
  2. In the Replace With box, type a single space character.
  3. Make sure the Use Wildcards check box is selected.
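
The steps above can also be sketched as a macro. This is an illustrative sketch only: the macro name is hypothetical, and it assumes that only the main body of the active document needs cleaning (headers, footers and footnotes are not covered by ActiveDocument.Content).

```vba
Sub CollapseSpaces()
    ' Replace every run of two or more spaces with a single space
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        ' A single space followed by {2,} means "two or more spaces"
        .Text = " {2,}"
        .Replacement.Text = " "
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        ' Equivalent to ticking the Use Wildcards check box
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```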

Find numbering



Find numbering prefixed by upper/lower case letter



Find word then numbering



“Smart” quotation marks

Left and right curly quotation marks (also known as “smart quotes”) can be typed on a Windows keyboard by holding Alt and entering 0147 for “ or 0148 for ” on the numeric keypad. Alternatively, use the AutoCorrect function in Word to replace straight quotation marks.

Paragraph numbers into text

There are times when it is best to freeze the numbers on headings and paragraphs rather than allowing them to update. 


This one line macro 
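
A minimal sketch of such a macro, assuming the intent is to convert all automatic numbering in the active document to fixed text, could use Word's ConvertNumbersToText method. The macro name is hypothetical, and the choice of wdNumberAllNumbers is an assumption rather than something this article specifies.

```vba
Sub FreezeNumbering()
    ' Convert automatic paragraph numbers and LISTNUM fields
    ' in the active document into plain typed text
    ActiveDocument.ConvertNumbersToText wdNumberAllNumbers
End Sub
```

Once converted, the numbers no longer update, so it is safer to run this on a copy of the document.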
