Lab Notebook: Data Prep 2, Wordsmith, and Other Tools

 

Now, in my previous post, I explained that I was tagging some data in my exam corpus for future use as well as simply making reading and author lists. I was planning on using a feature in Wordsmith Tools that allows you to use tags as selectors, in addition to the Markup to INclude or EXclude settings. But I’ve been having a heck of a time making it work the way I want. Initially, I was using a text file encoded in big-endian format for some reason, but after I sent some emails to Wordsmith Tools’ creator, Mike Scott, he quickly and politely explained that WS doesn’t like that format and instead prefers little-endian. Thankfully, WS also has a Unicode converter within its utilities that quickly fixed that problem, but WS 5’s version is buggy; WS 6’s beta version worked great.
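For anyone curious what that conversion involves: assuming the file really is UTF-16 (what Word calls Unicode), the big-endian and little-endian versions differ only in the order of the two bytes in each 16-bit unit. The macro below is a minimal sketch of that byte swap. It is not what WS's converter does internally, and both file paths are made up for illustration.

```vba
Sub SwapUtf16ByteOrder()
    ' Sketch only: assumes the source file is UTF-16 text.
    ' Both paths are hypothetical; change them to suit.
    Dim srcPath As String, dstPath As String
    Dim bytes() As Byte
    Dim i As Long, tmp As Byte
    Dim f As Integer

    srcPath = "C:\Users\Me\corpus_big_endian.txt"
    dstPath = "C:\Users\Me\corpus_little_endian.txt"

    ' Read the whole file into a byte array
    f = FreeFile
    Open srcPath For Binary Access Read As #f
    If LOF(f) = 0 Then
        Close #f
        MsgBox "Nothing to convert: " & srcPath & " is empty."
        Exit Sub
    End If
    ReDim bytes(0 To LOF(f) - 1)
    Get #f, , bytes
    Close #f

    ' Swap the two bytes of every 16-bit unit; this also turns the
    ' byte order mark FE FF into FF FE. A stray odd byte at the end is left alone.
    For i = 0 To UBound(bytes) - 1 Step 2
        tmp = bytes(i)
        bytes(i) = bytes(i + 1)
        bytes(i + 1) = tmp
    Next i

    ' Binary mode does not truncate an existing file, so delete it first
    If Dir(dstPath) <> "" Then Kill dstPath
    f = FreeFile
    Open dstPath For Binary Access Write As #f
    Put #f, , bytes
    Close #f
End Sub
```

Running the macro a second time on its own output swaps the bytes back, which is an easy way to check that nothing was lost.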

I tried the following based on Wordsmith’s help (using Tags) on the web.

So within the Tags section of the settings, I told WS to automatically load a tag file in the Markup to Include section containing just this line: <book>,</book>.

Don’t forget to click “load” (it’ll say “clear” if you have already loaded it). It’ll show you what tags it found:

 

In Only Part of File, under Sections to Keep, I have: <book> to <book> (Sections to Cut Out is empty).

When I create a new concordance on the tag <book>*, it finds all 8 entries in my test file; however, it brings back the full concordance context line.

After trying a bunch of different search words as well as checking the settings, I emailed Mike Scott just to make sure I understand what WS can actually do with this, as well as to have him check my settings file. So while I’m waiting to hear back from him, I decided to go ahead and write a VBA macro (MS Word). I basically recorded my search for the tags and then edited the recorded macro to save the results as a text file:

Sub WriteTagsToTextFile()
'
' WriteTagsToTextFile
' Collects every <book>...</book> entry in the active document and saves the list to a text file
'
'declare my variables
Dim sBookList As String
Dim oRange As Range
Dim iCounter As Long
Dim strPath As String

'initialize variables
iCounter = 0
strPath = "C:\Users\Me\BookList2.txt"

With Selection
    .HomeKey Unit:=wdStory 'like pressing Ctrl+Home to move to the beginning of the document
    .Find.ClearFormatting 'clear any formatting criteria left over from an earlier search
End With

With Selection.Find
    'find the tagged entries I'm interested in; in the future I may try to create an input box to enter this manually
    'in a wildcard search, < and > are special characters, so each one is escaped with a backslash
    .Text = "(\<book\>*\</book\>)"
    .Forward = True
    .Wrap = wdFindStop
    .MatchWildcards = True
    Do While .Execute
        Set oRange = Selection.Range
        'copy each found tag set into a variable, inserting a carriage return / line feed after each set
        sBookList = sBookList & oRange.Text & vbCrLf
        iCounter = iCounter + 1
    Loop
End With

'cleanup
Set oRange = Nothing
Selection.Find.ClearFormatting
Selection.HomeKey Unit:=wdStory

'Now, create the text file and save the list--if the file exists, it will be overwritten

'first, open the text file (to create or overwrite it)
'The #1 is how we refer to the file later to write and close it; I would like to use a save as box here instead, but this is easier for now...
Open strPath For Output As #1
'write the tag list to the file (note that Write # wraps the string in quotation marks; Print # would write it without them)
Write #1, sBookList
'remember to close the file
Close #1 'again, we use #1 to refer to the opened file
'Remind yourself of what you just did--it isn't necessary, but it's also helpful to know that the script really finished
MsgBox iCounter & " entries found and saved to " & strPath

End Sub

(btw, I’m trying out My Syntax to display my code snippets.)

After my macro runs, I see:

 

and my text file looks like this:

 

I can now clean this up by finding and replacing the tags with nothing, or by opening the file in Excel. The point is, I have my lists. I can also now count how often particular titles are referenced.
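As a rough illustration of that last step, here is a minimal sketch of a counting macro. It assumes the text file produced by the macro above, with one <book>...</book> entry per line, and it uses a Scripting.Dictionary (available on Windows) to keep the tallies. The macro name is made up, and stripping every quotation mark is a shortcut to get rid of the ones Write # adds, so a title that contains its own quotation marks would lose them too.

```vba
Sub CountTitles()
    ' Sketch only: assumes one <book>...</book> entry per line in the file
    Dim strPath As String
    Dim f As Integer
    Dim sLine As String
    Dim sTitle As String
    Dim oCounts As Object
    Dim vKey As Variant

    strPath = "C:\Users\Me\BookList2.txt"
    Set oCounts = CreateObject("Scripting.Dictionary")
    oCounts.CompareMode = vbTextCompare 'treat "Title" and "title" as the same entry

    f = FreeFile
    Open strPath For Input As #f
    Do While Not EOF(f)
        Line Input #f, sLine
        'strip the tags and the quotation marks that Write # puts around the list
        sTitle = Replace(sLine, "<book>", "")
        sTitle = Replace(sTitle, "</book>", "")
        sTitle = Replace(sTitle, """", "")
        sTitle = Trim$(sTitle)
        If Len(sTitle) > 0 Then
            'reading a missing key creates it as Empty, and Empty + 1 is 1
            oCounts(sTitle) = oCounts(sTitle) + 1
        End If
    Loop
    Close #f

    'list each title with its count in the Immediate window (Ctrl+G in the VBA editor)
    For Each vKey In oCounts.Keys
        Debug.Print oCounts(vKey) & vbTab & vKey
    Next vKey
End Sub
```

The tab-separated output can be pasted straight into Excel for sorting.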

As I said in my code comments, I would rather NOT have hard-coded the file name and path; that is, I would like the macro to prompt the user for that information. But maybe later. Or maybe not at all, if I am able to do this in Wordsmith. But now that I have my macro, I have other options… My main point in doing this in VBA (besides getting at the data I want) is to highlight the usefulness of having multiple tools to get at the same or similar data. As I’ve mentioned elsewhere, I’m more interested in getting at the data than in elegant solutions (though they may be cool ones).
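For what it's worth, prompting for the path doesn't take much. The following is a minimal sketch that uses VBA's built-in InputBox rather than a full Save As dialog; the function and macro names are made up for illustration, and the default path is just the one hard-coded in the macro above.

```vba
Function AskForOutputPath(ByVal sDefault As String) As String
    ' InputBox returns an empty string if the user clicks Cancel
    AskForOutputPath = Trim$(InputBox( _
        Prompt:="Save the tag list to which file?", _
        Title:="Output file", _
        Default:=sDefault))
End Function

Sub DemoAskForOutputPath()
    Dim strPath As String

    strPath = AskForOutputPath("C:\Users\Me\BookList2.txt")
    If Len(strPath) = 0 Then
        MsgBox "No file chosen; nothing saved."
        Exit Sub
    End If

    'in the real macro, this is where strPath would be handed to the Open statement
    MsgBox "The list would be saved to " & strPath
End Sub
```

The same idea would work for the tag itself: ask for the tag name with a second InputBox and build the wildcard search string from the answer.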

Next, I’ll share some results and future directions.
