Warning: preg_replace(): The /e modifier is no longer supported, use preg_replace_callback instead in /home/johna00/public_html/wp/wp-content/plugins/my-syntax/my-syntax.php on line 115

Warning: preg_replace(): The /e modifier is no longer supported, use preg_replace_callback instead in /home/johna00/public_html/wp/wp-content/plugins/my-syntax/my-syntax.php on line 116

Lab Note book: Data Prep 2 and Wordsmith and other tools


Warning: preg_replace(): The /e modifier is no longer supported, use preg_replace_callback instead in /home/johna00/public_html/wp/wp-content/plugins/my-syntax/my-syntax.php on line 115

Warning: preg_replace(): The /e modifier is no longer supported, use preg_replace_callback instead in /home/johna00/public_html/wp/wp-content/plugins/my-syntax/my-syntax.php on line 116

 

Now, in my previous post, I explained that I was tagging some data in my exam corpus for future use as well as simply making reading and author lists.  I was planning on using a feature in Wordsmith Tools that allows you to use tags as selectors in addition to Markup to INclude or EXclude. But I’ve been having a heck of a time of making it work the way I want. Initially, I was using a text file encoded in big indian format for some reason, but after some emails to Wordsmith Tools creator, Mike Scott, he quickly and politely explained that WS doesn’t like that format and instead prefers little indian. WS thankfully also has a unicode converter within its utilities that quickly fixed that problem–but WS 5’s version is buggy; WS 6’s beta version worked great.

I tried the following based on Wordsmith’s help (using Tags) on the web.

So withing the settings section, Tags, I told WS to automatically load a tag file in Markup to Include section with just this line: <book>,</book>:

 

 

 

 

 

 

 

 

 

Don’t forget to click “load” (it’ll say “clear” if you have already loaded it). It’ll show you what tags it found:

 

In the Only Part of File, Sections to Keep, I have : <book> to <book>    (the Sections to Cut out is empty):

 

 

 

 

 

 

 

When I create a new concordance on the tag <book>*, it finds all 8 entries in my test file, however, it brings back the full concordance context line:

 

 

 

After trying a bunch of different search words as well as checking the settings, I emailed Mike Scott just to make sure I’m understanding what WS can actually do with this as well as to check my settings file. So while I’m waiting to hear back from him, I decided to go ahead an write a VBA macro (MS Word). I basically recorded my find of the tags and then edited it to save the file as a text file:

Selec All Code:
1
 

(btw, I’m trying out My Syntax to display my code snippits.) 

After my macro runs, I see:

 

and my text file looks like this:

 

I can now clean up this by finding/replacing the tags with nothing, or open up in Excel. The point is, I have my lists. I also can now count the frequency that particular titles are referenced.

As I said in my comments, I would like to have to have NOT hard-coded the file name and path; that is, I would like to have the macro prompt the user for that information. But maybe later. Or maybe not at all if I am able to do this in Wordsmith. But now that I have my macro, I have other options…  My main point in doing this in vba (besides getting at the data I want) is to highlight the use of having multiple tools to get at the same or similar data. As I’ve mentioned elsewhere, I’m more interested in getting at the data rather than elegant solutions (though they may be cool ones).

Next, I’ll share some results and future directions.

Comments (1)

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: