Word regex

Many users are unaware that Word supports regular expressions in the Find and Replace interface. Called 'wildcards', this feature is a powerful tool for improving the consistency of Word documents prior to upload to PageSeeder.

The purpose of this document is to provide guidance and examples for better practices with Word.

Find and Replace

The following is an example of how wildcard Find works in Word and how it can be used to capture dates that have been expressed inconsistently in a document.

word_regex-date.jpg

([ ][0-9]{1,2})[ ](<[AFJMNSOD]*>)[ ]([0-9]{4})
  • "(" and ")" group parts of the pattern. The groups are counted from left to right in the Find What field and can be output in any order in the Replace With field by using "\1", "\2" or "\3". Changing the order of the output is how different Find patterns can be used to harmonize inconsistent data.
  • "[" and "]" delimit a set of characters, any one of which may match, and a hyphen specifies a range of characters, for example "[0-9]".
  • Within the pattern, "<" and ">" specify the start and end of a word, the asterisk "*" matches any string of characters and the question mark "?" matches any single character.
  • "{" and "}" specify how many times whatever precedes them must occur, where the comma "," separates the minimum and maximum values.

Therefore, the pattern above is processed as follows:

([ ][0-9]{1,2})[ ] matches a space, then one or two digits (0 to 9), followed by another space.

(<[AFJMNSOD]*>)[ ] then looks for the start of a word, one of the capital letters A, F, J, M, N, S, O or D (the initials of the month names), any further characters up to the end of that word, and a space.

[ ]([0-9]{4}) looks for a space after the month name, followed by exactly four digits for the year.
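
The same search can be run from a macro. The following is a minimal sketch, not a definitive implementation: the macro name ReorderDates and the replacement string are illustrative assumptions, while the Find pattern is copied from the example above. It rewrites dates such as "12 March 2020" as "March 12, 2020" by outputting the second group before the first.

```vba
Sub ReorderDates()
    ' Hypothetical macro name. The Find pattern is the one shown above;
    ' the replacement string is an assumption chosen for illustration.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "([ ][0-9]{1,2})[ ](<[AFJMNSOD]*>)[ ]([0-9]{4})"
        ' \1 already starts with a space, so " \2\1, \3" gives " March 12, 2020".
        .Replacement.Text = " \2\1, \3"
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = True   ' same as ticking Use Wildcards in the dialog
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Depending on the regional settings of the machine, the separator inside the braces may need to be the system list separator (for example "{1;2}") rather than a comma.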

Removing excess returns

This macro removes excess paragraph marks from a document. 

Sub ReplacePara()
   ' Start from the top of the document.
   Selection.HomeKey Unit:=wdStory
   Selection.Find.ClearFormatting
   With Selection.Find
      .Text = "^p^p"
      .Replacement.Text = ""
      .Forward = True
      .Wrap = wdFindContinue
      .Format = False
      .MatchCase = False
      .MatchWholeWord = False
      .MatchWildcards = False
      .MatchSoundsLike = False
      .MatchAllWordForms = False
   End With
   ' Find the first pair of consecutive paragraph marks.
   Selection.Find.Execute
   While Selection.Find.Found
      ' Move to the end of the match and delete the second paragraph mark.
      Selection.MoveRight Unit:=wdCharacter, Count:=1
      Selection.TypeBackspace
      ' Step back so the next search also catches further marks in the same run.
      Selection.MoveLeft Unit:=wdCharacter, Count:=2
      Selection.Find.Execute
   Wend
End Sub

The first part of the macro uses Word's built-in find and replace capabilities to find all instances of two paragraph marks in sequence. The macro doesn't replace the sequential paragraph marks; it simply finds them. The second part uses the Selection.Find.Found property to delete the second of the two sequential paragraph marks.

The reason for this approach is that it leaves the formatting intact on the remaining paragraph mark. If consecutive paragraph marks are replaced with a single paragraph mark, important formatting may be lost.

For a similar reason, never use the '^13' character in the Replace With field. Word stores paragraph formatting on the paragraph mark, so collapsing paragraphs does not have the same consequences as collapsing spaces, where every character is identical.

Find and remove excess spaces

  1. In the Find What box, enter a single space followed by the characters '{2,}'.
  2. In the Replace With box, type a single space.
  3. Make sure the Use Wildcards check box is selected.
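
The steps above can also be sketched as a macro. This is a hedged example rather than a prescribed method: the macro name CollapseSpaces is hypothetical, and it assumes the whole document body should be processed.

```vba
Sub CollapseSpaces()
    ' Hypothetical macro name. Replaces every run of two or more spaces
    ' in the document body with a single space.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = " {2,}"          ' a space followed by {2,}: two or more spaces
        .Replacement.Text = " "  ' a single space
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = True   ' equivalent to the Use Wildcards check box
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Unlike paragraph marks, spaces can be collapsed safely in one pass because the replacement character is identical to the characters it replaces.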

Find numbering

Description

word_regex_numbering_3.PNG

Find numbering prefixed by upper/lower case letter

Description

word_regex_numbering_4.PNG

Find word then numbering

Description

word_regex_numbering_5.PNG

“Smart” quotation marks

Left and right curly quotation marks (also known as “smart quotes”) can be typed on a Windows keyboard by holding Alt and entering 0147 on the numeric keypad for “ and 0148 for ”. Alternatively, use the AutoCorrect function in Word to replace straight quotation marks.
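
To convert straight double quotation marks that are already in a document, one possible approach is sketched below. It is an assumption-laden example: the macro name StraightToCurlyQuotes is hypothetical, and the sketch relies on Word substituting curly quotes during a replace while the "straight quotes with smart quotes" option is switched on, which should be checked on a copy of the document first.

```vba
Sub StraightToCurlyQuotes()
    ' Hypothetical macro name. Assumes Word applies smart quotes to
    ' replaced text while the AutoFormat As You Type option is enabled.
    Dim previousSetting As Boolean
    previousSetting = Options.AutoFormatAsYouTypeReplaceQuotes
    Options.AutoFormatAsYouTypeReplaceQuotes = True

    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = """"              ' one straight double quotation mark
        .Replacement.Text = """"  ' replaced with itself; Word curls it
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With

    ' Restore the user's original setting.
    Options.AutoFormatAsYouTypeReplaceQuotes = previousSetting
End Sub
```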

Paragraph numbers into text

There are times when it is best to freeze the numbers on headings and paragraphs rather than allowing them to update. 

Selection.Range.ListFormat.ConvertNumbersToText

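This one-line macro converts the automatic numbering in the current selection into ordinary text, so the numbers no longer update.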
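
As a minimal sketch, the statement can be wrapped in a macro so that it can be run from the Macros dialog. The macro names below are hypothetical. The second variant assumes the numbering should be frozen across the whole document rather than only in the selection.

```vba
Sub FreezeSelectedNumbering()
    ' Hypothetical macro name. Converts list numbers in the current
    ' selection to plain text.
    Selection.Range.ListFormat.ConvertNumbersToText
End Sub

Sub FreezeAllNumbering()
    ' Hypothetical macro name. Converts list numbers throughout the
    ' active document to plain text.
    ActiveDocument.ConvertNumbersToText
End Sub
```

Because the conversion changes the numbering into typed text, it is safest to run it on a copy of the document.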
