About Duffbert...

Duffbert's Random Musings is a weblog semi/sorta related to IBM/Lotus Notes & Domino software, but I don't let that be a limiting criteria. I'm Thomas Duff, and you can find out more about me here...

Email me!

Search This Site!

Google
Web
duffbert.com

Recent book reviews...

Idea Jam!

MiscLinks

Visitor Count...



View My Stats

« Book Review - Fade Away by Harlan Coben | Main| Book Review - Personal Development for Smart People: The Conscious Pursuit of Personal Growth by Steve Pavlina »

Shred My Code - Parsing text in a Rich Text Field

Category IBM/Lotus

I thought about naming this feature "Pimp My Code", but I know what people have done to code I posted in the past.  Therefore, "shred" sounded more appropriate.  :)

But seriously, I wanted to share this code and open it up for comment.  It's a part of the Domino Object Model I haven't played with before, and I'm sure there's a ton of room for improvement.

Scenario:  I have a Rich Text field in a Notes client document.  The document is edited in Notes, but viewed in the browser.  The field stores an "audit history" of document changes from the business perspective.  It's in Rich Text because they wanted certain parts of it colored and bolded, and well...  it made sense at the time.  Here's the format of the information, one line per audit event:

02-07-08 09:00 AM - Modification - This document was changed to conform to the updated regulations.
12-12-07 01:00 PM - Add - Originally added for reference

Forget the fact that the dates are in mm-dd-yy format, as this isn't used internationally.  

When I examine each line of the Rich Text field, there are three text runs.  The red text is one run, the description is a second run, and what appears to be a blank (but acts like a linefeed) is the final run in the line.

My task was to create an agent that would go through this field and eliminate any lines where the date was older than 12 months from today.  Had I known this earlier, I would have had them use HTML in the field, made it a multi-value text field, and it would have been a breeze.  Given that it was already Rich Text, I wasn't sure how I was going to be able to do this.  But I started to dig into the NotesRichText objects, I realized that I might be able to string together some NotesRichTextNavigator and NotesRichTextRange methods and properties and pull this off.

My logic below grabs a navigator of the whole Rich Text item, grabs the text run element, and examines it.  If it's a date that's before the 12 month cutoff, then you just keep parsing until you find another date or you get to the end.  Once you find a date that is older than 12 months, then you go to the purge loop that will eliminate everything in the field from that point on (since the date entries are in descending date order).

Normally I don't use loops and gotos like this.  But as I was hacking through this during various parts of three days, this is the code that evolved as working.  The only glitch I see so far is it tends to leave blank lines at the end if it's deleting lines.

Feel free to discuss, comment, slice, dice, R&D, whatever.  I haven't ever seen much in the way of code that parses Rich Text using these DOM objects, so this was all new to me...

Sub Initialize

 'Create objects to be used in this routine
 Dim session As New NotesSession
 Dim ws As New NotesUIWorkspace
 Dim dbThis As NotesDatabase
 Dim uidocThis As NotesUIDocument
 Dim docThis As NotesDocument
 Dim rtiHistory As NotesRichTextItem
 Dim rtnHistory As NotesRichTextNavigator
 Dim rtrHistory As NotesRichTextRange
 Dim ndtCutoffDate As NotesDateTime
 Dim varCutoffDate As Variant
 Dim strItemDate As String
 Dim lngTotalElements As Long
 
 On Error Goto logError
 
 
'Get the current document that's open
 Set dbThis = session.CurrentDatabase
 Set uidocThis = ws.CurrentDocument
 Set docThis = uidocThis.Document
 Set ndtCutoffDate = New NotesDateTime(Today)
 Call ndtCutoffDate.AdjustMonth(-12)
 varCutoffDate = Cdat(ndtCutoffDate.DateOnly)
 lngTotalElements = 1
 
 
'Get the field that has the history we need to purge
 Set rtiHistory = docThis.GetFirstItem("subHistoryArchiving")
 Set rtnHistory = rtiHistory.CreateNavigator
 
LoopForGoodItems:
 
 
'Start iterating through the history.  For each history line, there are three text runs. 
 'The first is the date/time/action, the second is the detail, and the third is the linefeed.

 
 Call rtnHistory.FindNthElement(RTELEM_TYPE_TEXTRUN, lngTotalElements)
 If rtnHistory.FindNthElement(RTELEM_TYPE_TEXTRUN, lngTotalElements) Then
  Set rtrHistory = rtiHistory.CreateRange
  Call rtrHistory.SetBegin(rtnHistory)
  Call rtrHistory.SetEnd(rtnHistory)
  
'Don't try to parse out a date if the returned textrun is nothing more than a linefeed or blank.
  If Len(rtrHistory.TextRun) > 0 Then
   
'Parse out the date, see if it *is* a date, and then compare it to 12 month cutoff.  If you hit the cutoff date,
   'then we'll break out to the purge loop to clear out everything that follows.  Otherwise, keep on looping.

   strItemDate = Left$(rtrHistory.TextRun, (Instr(1,rtrHistory.TextRun, " ") - 1))
   If Isdate(strItemDate) Then
    If Cdat(strItemDate) >= varCutoffDate Then
     lngTotalElements = lngTotalElements + 1
     Goto LoopForGoodItems
    Else
     Goto LoopForPurge
    End If
   Else
    lngTotalElements = lngTotalElements + 1
    Goto LoopForGoodItems
   End If
  Else
   lngTotalElements = lngTotalElements + 1
   Goto LoopForGoodItems
  End If
 Else
  Goto LoopForPurge
 End If
 
LoopForPurge:
 
 
'This loop just keeps on purging text runs until you get to the end of the rich text area.
 
 If rtnHistory.FindNthElement(RTELEM_TYPE_TEXTRUN, lngTotalElements) Then
  Call rtrHistory.Remove() 
  Set rtrHistory = rtiHistory.CreateRange
  Call rtrHistory.SetBegin(rtnHistory)
  Call rtrHistory.SetEnd(rtnHistory)
  Goto LoopForPurge
 End If
 
 
'This will be fired once you have nothing left in the field to purge.
 Call docThis.Save(True, False)
 
 Exit Sub
 
logError:   
 Call LogError
 Exit Sub
 
End Sub

Comments

Gravatar Image1 - I'm not sure I have a good picture of how this audit history is formatted exactly but I guess I would start with RichTextText.Text to grab the plain text and Split(text, Chr$(13)+Chr$(10)) to create an array with text lines. As I understand it, these will all start with a date/time value so parsing them shouldn't be to difficult.




Gravatar Image2 - That would work in terms of getting the data programmatically. But it doesn't address removing runs of text while still keeping the formatting.

The line of data is created when the document is saved. There's a QuerySave routine that builds the new line of data (complete with color and formatting), uses that to create a new RT field, appends the existing RT contents onto it, and that becomes the new version of the field.

Gravatar Image3 - Tom, I'd probably still prefer to have 3 separate text, multivalue fields:

- date
- action taken (Add, Modification, etc)
- comments

Would probably even make things easier in terms of parsing, reporting, etc.

PS: GoTo's are evilllllllllllll Emoticon

Gravatar Image4 - @3: In hindsight, the design would have been closer to what you propose. Then I could just display the data with the appropriate formatting and CSS. But given the existing format, I resorted to using this.

And I know that gotos are not great coding practice. But I also didn't want to immediately recode it once I got it working. :)

Gravatar Image5 - @2 You are right. Didn't give it enough thinking.

Why don't you take this opportunity to get rid of the old logging ánd to redesign the form/application so it's more maintainable in the future?

Why accept something you don't like and work around it?

Post A Comment

:-D:-o:-p:-x:-(:-):-\:angry::cool::cry::emb::grin::huh::laugh::lips::rolleyes:;-)