Shred My Code - Parsing text in a Rich Text Field
Category IBM/Lotus
I thought about naming this feature "Pimp My Code", but I know what people have done to code I posted in the past. Therefore, "shred" sounded more appropriate. :)
But seriously, I wanted to share this code and open it up for comment. It's a part of the Domino Object Model I haven't played with before, and I'm sure there's a ton of room for improvement.
Scenario: I have a Rich Text field in a Notes client document. The document is edited in Notes, but viewed in the browser. The field stores an "audit history" of document changes from the business perspective. It's in Rich Text because they wanted certain parts of it colored and bolded, and well... it made sense at the time. Here's the format of the information, one line per audit event:
02-07-08 09:00 AM - Modification - This document was changed to conform to the updated regulations.
12-12-07 01:00 PM - Add - Originally added for reference
Forget the fact that the dates are in mm-dd-yy format, as this isn't used internationally.
When I examine each line of the Rich Text field, there are three text runs. The red text is one run, the description is a second run, and what appears to be a blank (but acts like a linefeed) is the final run in the line.
My task was to create an agent that would go through this field and eliminate any lines where the date was older than 12 months from today. Had I known this earlier, I would have had them use HTML in the field, made it a multi-value text field, and it would have been a breeze. Given that it was already Rich Text, I wasn't sure how I was going to be able to do this. But I started to dig into the NotesRichText objects, I realized that I might be able to string together some NotesRichTextNavigator and NotesRichTextRange methods and properties and pull this off.
My logic below grabs a navigator of the whole Rich Text item, grabs the text run element, and examines it. If it's a date that's before the 12 month cutoff, then you just keep parsing until you find another date or you get to the end. Once you find a date that is older than 12 months, then you go to the purge loop that will eliminate everything in the field from that point on (since the date entries are in descending date order).
Normally I don't use loops and gotos like this. But as I was hacking through this during various parts of three days, this is the code that evolved as working. The only glitch I see so far is it tends to leave blank lines at the end if it's deleting lines.
Feel free to discuss, comment, slice, dice, R&D, whatever. I haven't ever seen much in the way of code that parses Rich Text using these DOM objects, so this was all new to me...
Sub Initialize
'Create objects to be used in this routine
Dim session As New NotesSession
Dim ws As New NotesUIWorkspace
Dim dbThis As NotesDatabase
Dim uidocThis As NotesUIDocument
Dim docThis As NotesDocument
Dim rtiHistory As NotesRichTextItem
Dim rtnHistory As NotesRichTextNavigator
Dim rtrHistory As NotesRichTextRange
Dim ndtCutoffDate As NotesDateTime
Dim varCutoffDate As Variant
Dim strItemDate As String
Dim lngTotalElements As Long
On Error Goto logError
'Get the current document that's open
Set dbThis = session.CurrentDatabase
Set uidocThis = ws.CurrentDocument
Set docThis = uidocThis.Document
Set ndtCutoffDate = New NotesDateTime(Today)
Call ndtCutoffDate.AdjustMonth(-12)
varCutoffDate = Cdat(ndtCutoffDate.DateOnly)
lngTotalElements = 1
'Get the field that has the history we need to purge
Set rtiHistory = docThis.GetFirstItem("subHistoryArchiving")
Set rtnHistory = rtiHistory.CreateNavigator
LoopForGoodItems:
'Start iterating through the history. For each history line, there are three text runs.
'The first is the date/time/action, the second is the detail, and the third is the linefeed.
Call rtnHistory.FindNthElement(RTELEM_TYPE_TEXTRUN, lngTotalElements)
If rtnHistory.FindNthElement(RTELEM_TYPE_TEXTRUN, lngTotalElements) Then
Set rtrHistory = rtiHistory.CreateRange
Call rtrHistory.SetBegin(rtnHistory)
Call rtrHistory.SetEnd(rtnHistory)
'Don't try to parse out a date if the returned textrun is nothing more than a linefeed or blank.
If Len(rtrHistory.TextRun) > 0 Then
'Parse out the date, see if it *is* a date, and then compare it to 12 month cutoff. If you hit the cutoff date,
'then we'll break out to the purge loop to clear out everything that follows. Otherwise, keep on looping.
strItemDate = Left$(rtrHistory.TextRun, (Instr(1,rtrHistory.TextRun, " ") - 1))
If Isdate(strItemDate) Then
If Cdat(strItemDate) >= varCutoffDate Then
lngTotalElements = lngTotalElements + 1
Goto LoopForGoodItems
Else
Goto LoopForPurge
End If
Else
lngTotalElements = lngTotalElements + 1
Goto LoopForGoodItems
End If
Else
lngTotalElements = lngTotalElements + 1
Goto LoopForGoodItems
End If
Else
Goto LoopForPurge
End If
LoopForPurge:
'This loop just keeps on purging text runs until you get to the end of the rich text area.
If rtnHistory.FindNthElement(RTELEM_TYPE_TEXTRUN, lngTotalElements) Then
Call rtrHistory.Remove()
Set rtrHistory = rtiHistory.CreateRange
Call rtrHistory.SetBegin(rtnHistory)
Call rtrHistory.SetEnd(rtnHistory)
Goto LoopForPurge
End If
'This will be fired once you have nothing left in the field to purge.
Call docThis.Save(True, False)
Exit Sub
logError:
Call LogError
Exit Sub
End Sub
I thought about naming this feature "Pimp My Code", but I know what people have done to code I posted in the past. Therefore, "shred" sounded more appropriate. :)
But seriously, I wanted to share this code and open it up for comment. It's a part of the Domino Object Model I haven't played with before, and I'm sure there's a ton of room for improvement.
Scenario: I have a Rich Text field in a Notes client document. The document is edited in Notes, but viewed in the browser. The field stores an "audit history" of document changes from the business perspective. It's in Rich Text because they wanted certain parts of it colored and bolded, and well... it made sense at the time. Here's the format of the information, one line per audit event:
02-07-08 09:00 AM - Modification - This document was changed to conform to the updated regulations.
12-12-07 01:00 PM - Add - Originally added for reference
Forget the fact that the dates are in mm-dd-yy format, as this isn't used internationally.
When I examine each line of the Rich Text field, there are three text runs. The red text is one run, the description is a second run, and what appears to be a blank (but acts like a linefeed) is the final run in the line.
My task was to create an agent that would go through this field and eliminate any lines where the date was older than 12 months from today. Had I known this earlier, I would have had them use HTML in the field, made it a multi-value text field, and it would have been a breeze. Given that it was already Rich Text, I wasn't sure how I was going to be able to do this. But I started to dig into the NotesRichText objects, I realized that I might be able to string together some NotesRichTextNavigator and NotesRichTextRange methods and properties and pull this off.
My logic below grabs a navigator of the whole Rich Text item, grabs the text run element, and examines it. If it's a date that's before the 12 month cutoff, then you just keep parsing until you find another date or you get to the end. Once you find a date that is older than 12 months, then you go to the purge loop that will eliminate everything in the field from that point on (since the date entries are in descending date order).
Normally I don't use loops and gotos like this. But as I was hacking through this during various parts of three days, this is the code that evolved as working. The only glitch I see so far is it tends to leave blank lines at the end if it's deleting lines.
Feel free to discuss, comment, slice, dice, R&D, whatever. I haven't ever seen much in the way of code that parses Rich Text using these DOM objects, so this was all new to me...
Sub Initialize
'Create objects to be used in this routine
Dim session As New NotesSession
Dim ws As New NotesUIWorkspace
Dim dbThis As NotesDatabase
Dim uidocThis As NotesUIDocument
Dim docThis As NotesDocument
Dim rtiHistory As NotesRichTextItem
Dim rtnHistory As NotesRichTextNavigator
Dim rtrHistory As NotesRichTextRange
Dim ndtCutoffDate As NotesDateTime
Dim varCutoffDate As Variant
Dim strItemDate As String
Dim lngTotalElements As Long
On Error Goto logError
'Get the current document that's open
Set dbThis = session.CurrentDatabase
Set uidocThis = ws.CurrentDocument
Set docThis = uidocThis.Document
Set ndtCutoffDate = New NotesDateTime(Today)
Call ndtCutoffDate.AdjustMonth(-12)
varCutoffDate = Cdat(ndtCutoffDate.DateOnly)
lngTotalElements = 1
'Get the field that has the history we need to purge
Set rtiHistory = docThis.GetFirstItem("subHistoryArchiving")
Set rtnHistory = rtiHistory.CreateNavigator
LoopForGoodItems:
'Start iterating through the history. For each history line, there are three text runs.
'The first is the date/time/action, the second is the detail, and the third is the linefeed.
Call rtnHistory.FindNthElement(RTELEM_TYPE_TEXTRUN, lngTotalElements)
If rtnHistory.FindNthElement(RTELEM_TYPE_TEXTRUN, lngTotalElements) Then
Set rtrHistory = rtiHistory.CreateRange
Call rtrHistory.SetBegin(rtnHistory)
Call rtrHistory.SetEnd(rtnHistory)
'Don't try to parse out a date if the returned textrun is nothing more than a linefeed or blank.
If Len(rtrHistory.TextRun) > 0 Then
'Parse out the date, see if it *is* a date, and then compare it to 12 month cutoff. If you hit the cutoff date,
'then we'll break out to the purge loop to clear out everything that follows. Otherwise, keep on looping.
strItemDate = Left$(rtrHistory.TextRun, (Instr(1,rtrHistory.TextRun, " ") - 1))
If Isdate(strItemDate) Then
If Cdat(strItemDate) >= varCutoffDate Then
lngTotalElements = lngTotalElements + 1
Goto LoopForGoodItems
Else
Goto LoopForPurge
End If
Else
lngTotalElements = lngTotalElements + 1
Goto LoopForGoodItems
End If
Else
lngTotalElements = lngTotalElements + 1
Goto LoopForGoodItems
End If
Else
Goto LoopForPurge
End If
LoopForPurge:
'This loop just keeps on purging text runs until you get to the end of the rich text area.
If rtnHistory.FindNthElement(RTELEM_TYPE_TEXTRUN, lngTotalElements) Then
Call rtrHistory.Remove()
Set rtrHistory = rtiHistory.CreateRange
Call rtrHistory.SetBegin(rtnHistory)
Call rtrHistory.SetEnd(rtnHistory)
Goto LoopForPurge
End If
'This will be fired once you have nothing left in the field to purge.
Call docThis.Save(True, False)
Exit Sub
logError:
Call LogError
Exit Sub
End Sub





Comments
Posted by Bart At 07:02:24 On 09/10/2008 | - Website - |
The line of data is created when the document is saved. There's a QuerySave routine that builds the new line of data (complete with color and formatting), uses that to create a new RT field, appends the existing RT contents onto it, and that becomes the new version of the field.
Posted by Duffbert At 07:31:10 On 09/10/2008 | - Website - |
- date
- action taken (Add, Modification, etc)
- comments
Would probably even make things easier in terms of parsing, reporting, etc.
PS: GoTo's are evilllllllllllll
Posted by Pedro Quaresma At 07:46:29 On 09/10/2008 | - Website - |
And I know that gotos are not great coding practice. But I also didn't want to immediately recode it once I got it working. :)
Posted by Duffbert At 08:06:33 On 09/10/2008 | - Website - |
Why don't you take this opportunity to get rid of the old logging ánd to redesign the form/application so it's more maintainable in the future?
Why accept something you don't like and work around it?
Posted by Bart At 03:37:23 On 10/10/2008 | - Website - |