The “spirit of entrepreneurship and egalitarianism” and collaboration

I love it when I serendipitously discover a terrific article or posting. Today, while searching for something entirely different, I came across Amanda Gailey and Dot Porter’s posting on Alt-Academy’s site, “Credential Creep in the Digital Humanities.” This posting’s title is a quote from their article. It gets at what I feel myself, hanging out with digital humanists. Although it’s been about a year since they wrote the article, it seems that many of their concerns regarding the hiring practices for the digital humanities are being borne out.

I remember something similar happening in the programming world back in the 1990s. Originally, companies were hiring people who taught themselves to program. As the field grew, so did a perceived need for certifications for advancement, and then later, even for qualifying as a hire. (On a separate note, I’ve always wondered if this was more due to the influence/assertions of the peripheral markets, such as certification companies and programming manual publishers.) However, it seems that later on, although those accomplishments certainly did not hurt a person’s career, many managers (at least in my organization at the time) realized that “real world” experience was preferable to degrees or certifications as the deciding factor. I remember one manager explaining that, as far as specialized skills were concerned, the company was often swapping out technologies as they advanced. This particular manager was more concerned that the programmers, networking people, and tech support persons he hired possessed the ability to learn the new technologies as they changed–it was much cheaper to give that kind of training than to train new employees with specific skills from scratch. And it made the employees feel much more integral to the company’s success. It makes me wonder if the digital humanities will eventually follow this pattern. But if Gailey and Porter are correctly assessing how it’s following the academic model/pattern in general, well, it may be a longer time-frame.

Gailey and Porter end their post with three helpful recommendations to counter the credential explosion within hiring practices. I would add to their suggestions that smaller schools trying to add digital humanities components to their programs work with an already established tech-savvy group, such as a computer science department (if their institution has one). There may be problems with this type of setup, particularly in terms of budgeting issues between departments (sharing resources); I don’t know. But I do know, having worked in the corporate world, that at least for the computer science program, it would be a boon for their students’ marketability to be able to gain experience by working on real world projects under the guidance of their professors. Not only would they gain technical experience, they would also gain invaluable project management experience–either as the manager or the managee. That last point also applies to people within the humanities. Though they have much experience working with graduates, committees, and their individual research projects, learning to work with technologists on a project will teach them another subset of these types of skills: managing a programming project. It would also make it easier for the humanities people to gain experience with the different technologies. This has to help complete projects sooner than requiring an individual professor or graduate student to learn five different technologies for one project. I’m sure this type of setup is already in place at different institutions, at least in individual cases/projects. But establishing a formal partnership across programs would make more such projects likely. And who knows? Eventually, the institution may create a dedicated digital humanities center based on the interdisciplinary relationships in the “spirit of entrepreneurship and egalitarianism” we all understand are necessary for this field.

Collaborative Learning and Teaching

Over the last few years or so, whether I was teaching a composition class or some other course like tech writing, I usually incorporated a group project that involved a group writing assignment (for example an essay or brochure). Students usually resisted this–heck, I would have not liked it back in my undergraduate days. And even now, I find it hard to let myself rely on others to do their work or at least up to some sort of standard that comes to me while looking at their work… But having worked in the world outside of academia, the corporate world, I found that many minds help to not only solve problems but also help to develop creative approaches to begin with–approaches I may have been resistant to, had I even been lucky enough to think of them myself. Of course, this doesn’t always work. Personality, personal agendas can get in the way. However, good or bad, all of these group projects, at the very least, always had the one benefit of making us actually start a project. And there is much to be said for starting. In writing terms, that means dirtying that awful blank page staring back at us. I’ve found that in college, especially the 101 and 102 composition courses, such group writing had the effect of showing less than good writers examples of better writing and how it worked down to a sentence level–at least if they happened to be paired with a better writer. And if they weren’t, the group dynamic still helped because they were able to challenge one another, speaking out when the sentences or ideas didn’t make sense–they may have not known how to fix them, but it gave them enough of a feeling of solidarity as well as confidence of having honestly tried, to come ask for my advice–which is another difficult thing to get students to do at the beginning of the semester.

I had planned on doing the same thing this semester for my Early American Literature class. However, the weekend before classes started, I happened to read Cathy Davidson’s Now You See It: How the Brain Science of Attention Will Transform the Way We Live, Work, and Learn.  I thought her Revolution and the Word: The Rise of the Novel in America was an excellent piece of scholarship. Now You See It was more of an evangelical piece, particularly promoting education reform, for which it does a terrific job, and I recommend it, especially for teachers. Basically, it inspired me to try more collaborative work for this class, even for setting up the class itself, where the students decide what to do for class projects as well as the midterm exam and Final.

I thought about implementing the crowdsourced and contract grading system Davidson discusses on the HASTAC blogs. At first I was really resistant to this idea because I couldn’t see how quality control fit in. But finally, I realized that the crowd sourced part was the quality control part of it, using the contracts as the measuring stick (however, I’ve still to come up with a good response to a fellow teacher’s response to the contract part : “Well, isn’t that really how the normal grading (a,b,c, f) along with a description of that grading system works?”). The trick is, I think, to make sure all work is open to all students. And so at first, I had planned on using Moodle forums to make it easier for that visibility to occur (hoping that would also motivate students to turn in better work than they might otherwise do).

In this spirit of collaborative learning/teaching, I got my students’ opinion about this setup. Almost without fail, students said that they wanted a way to make this anonymous. And they cited good reasons: negative feedback from fellow group members, as well as other people in the class, might create a hostile environment. I explained to them that that was part of the course goals: learning how to give and receive feedback in ways that help everyone, skills also needed in the business world (dept meetings, team projects, etc.). But the push-back was unanimous. And so, I switched to individual uploads of these assignments to Moodle so that I could gather and collate the evaluations, then distribute a compilation in order to keep them private. Because of this, I decided to drop the crowd sourced/contract grading. I still am interested in the idea, and will try it out next semester, hopefully; it’s just that I needed more planning than the weekend before classes!

Originally, the assignment schedule looked like this: I had a set of readings assigned for each class. Instead of giving reading quizzes, the students would be responsible for uploading to Moodle one question (and its answer) about the author (sometimes two authors) we were discussing that day. For each class day, two groups would present a particular author/text we were covering (based on our reading schedule). Although I would have preferred to have only one presentation per class, where I could spend more time setting up contexts, there are 38 students in the class, necessitating that two groups present per class (so as to have a grade besides the homework grade before the midterm). The rest of the class was also responsible for filling out an evaluation form for each presentation, consisting of ranking from 1-4 the areas of Subject Knowledge, Organization, and Delivery. Additionally, they were required to give comments on specific things that they found helpful to understanding the text/author as well as suggestions on how the group might have improved the presentation. And to help the students with paying attention, they were also to include one question with its answer based on the material from the presentation. They were to upload these evaluations to Moodle by the start of the next class. The members of the group presentations obviously did not need to fill out the form. However, they were given a week to upload to Moodle a 3-5 page evaluation essay of their experience of the project, discussing what their goal was, how each of the other people’s contributions helped or hurt the presentation, as well as their own contributions, and what they learned or might have done differently.

All of this was happening every class. Although this seems like a lot of things for each student to be responsible for each class, I thought they were fairly small in scope and manageable. I was looking for ways to engage them in the texts as well as discussions of the texts beyond the typical reading quiz. However, I forgot that I had to provide comments for everything. The grading was based on completion rather than quality, but even so, commenting on their comments, etc took much time that would have been better spent elsewhere.  I quickly decided that we were going to do things differently once we were through the midterm, though I am willing to keep on due to the improvement.

My Moodle setup has been somewhat of a nightmare—in terms of my keeping track of everything. So no matter what, this part is going to change.

Here’s just a snapshot of the schedule but it should illustrate the management issue I created:

The Midterm assignment was a 3-5 page compare and contrast essay of two authors/texts we had discussed so far (anything from writing/rhetorical styles, to evolution of belief systems, or anything else). As I mentioned, the class was involved with coming up with their assignments. However, for the midterm, they showed no imagination–everyone wanted a traditional test of sorts. This was disappointing (I really don’t believe literature ought to be about memorization of facts). So I decided to give them an example of what to expect on such a test. They changed their minds, but still refused to suggest anything in place of it. So that’s how we ended up with an essay. Looking back, it wasn’t a bad idea. But still, I’m surprised at the resistance to moving from one form of education to a new one.

How has all this worked so far?

Most of the reading questions/answers came right out of the Heath anthology introductions at first, rather than from the text.  I really wanted to grade them on quality to motivate them, but as the semester progressed, and as I commented heavily on them, the questions greatly improved, and so I feel satisfied with how most students were progressing with their critical readings:


Here’s an example of the evaluation forms the students used for the group presentations:

I would then add all of their responses to a spreadsheet:

Notice the grade at the bottom. This was my version of crowdsource grading. It included not only my own evaluation but the rest of the class’s as well. I noticed that students consistently either wanted to not ever say anything negative (“I wouldn’t change a thing” along with “Great job!” variety), or they tended to be very critical of the performance part of the presentation. Much more so than myself. Again, with lots of commenting on their evaluations from me, these evaluations started to focus more on the content and what helped the student understand the author or text better–and what didn’t.
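For anyone who would rather script this than maintain a spreadsheet, here is a minimal sketch of how such a combined grade could be computed. The post does not say how the rankings were actually weighted, so the equal-weight average below is purely an assumption, and the function name `presentation_score` and the sample numbers are made up for illustration.

```python
from statistics import mean

# The three areas ranked 1-4 on the evaluation form.
AREAS = ("Subject Knowledge", "Organization", "Delivery")


def presentation_score(peer_forms, instructor_form):
    """Average the 1-4 rankings over every form, the instructor's included.

    Equal weighting is an assumption; the actual formula is not given.
    """
    forms = list(peer_forms) + [instructor_form]
    per_area = {area: mean(form[area] for form in forms) for area in AREAS}
    overall = mean(list(per_area.values()))
    return per_area, overall


# Made-up rankings from two classmates and the instructor.
peers = [
    {"Subject Knowledge": 4, "Organization": 3, "Delivery": 2},
    {"Subject Knowledge": 2, "Organization": 3, "Delivery": 4},
]
instructor = {"Subject Knowledge": 3, "Organization": 3, "Delivery": 3}

per_area, overall = presentation_score(peers, instructor)
print(per_area, overall)
```

If the instructor's ranking should count for more than a single classmate's, the same function could take a weight, but that is a design choice the post leaves open.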

I would then make a copy of the spreadsheet, removing the students’ names, and print it to a pdf file, which I then sent to the group members:

As I expected, the presentations improved with each group. It was rewarding that their peers noticed this as well:

Again, I was surprised, though, that given they could do anything–and I mean anything (e.g., put Thomas Paine on trial over his Age of Reason)–everyone did the exact same thing: a PowerPoint presentation, beginning with a bio, and then using quotes, along with some questions for the class. And that’s a perfectly reasonable way to do it. However, I was hoping that being not only allowed but cheered on to do anything out of the box would have inspired more people to try different things. My guess is that everyone settled on the idea of what constituted a “presentation” based on what the first group did. But that aside, one thing I noticed about the presentations was that most groups, in the beginning, focused on biography (usually lifted directly from the Heath introductions) and on plot summary, rather than digging into the text itself. But I honestly expected this. What works well about these presentation formats, though, is that they allow me to lecture during their presentation based on what points they bring up, providing context and explanations of issues that the groups might not have brought up, or elaborating on the importance of points they did bring to the class’s attention. I noticed that this sort of movement, these shiftings of attention, helps keep the class engaged (as opposed to my lecturing the whole period while they try to stay awake–it’s an 8am class). And for the group members, it gives them the opportunity to learn a lot more about (at least one of) the works.

When it came to the groups’ evaluation essays (of their presentation experience) as well as the Midterm compare/contrast essays, they were a disaster. It was as if they had forgotten everything from their 101 and 102 classes about what makes an essay an essay, or even how a paragraph works. They also tended to be very general–so much so that often it was unclear that the evaluation’s topic had anything to do with a presentation of an early American writer. Having said this, though, there were a couple of outstanding essays. But for the rest, instead of marking them with poor grades, I decided to let them revise them. Although I provided a lot of feedback, both on essay mechanics as well as content, I also advised them to work with our Writing Center before resubmitting them. It’s not that I’m an overly-kind teacher (I normally wouldn’t want to regrade two sets of 38 essays), but I don’t believe poor writing ought to prevent passing a class on literature–it’s not the goal of this particular course (which is to see how the different belief systems have evolved and still are present within our culture), though it’s one of the few tools by which I can evaluate whether or not they’ve reached that goal. It was also a chance to remind students that the structure and mechanics they learned in 101 and 102 aren’t just there to torment them, but to help make their ideas clear. I’ve only started to grade their revisions, but so far, they look much, much stronger. In retrospect, I think what happened was that they believed since this was not a writing course, they didn’t need to worry about their writing. Along with their revisions, I received many notes thanking me for the opportunity to make those revisions. Gratitude goes a long way.

The original plan for the semester was to do two group presentations (before and after the midterm). And I’m still all for them doing that; however, I wanted to be able to slow down after such a flurry of activity during the first part of the semester. I opened up the question of what our next project should be to the class, asking them to submit ideas. Only one person responded, suggesting some sort of skit (which could be fun and informative). I could just say “Okay, since only one person responded, we’ll just do the same thing as the first presentation.” Instead, however, I’m trying to get them to explore different methods, to be inventive, imaginative. And though I’m greatly nervous as to what people will come up with, I’ve decided to put the creative ball back in their court, and ask them to each submit a proposal for a group project. The reason why I am nervous is that I’m guessing many will want to do the same thing as they did for the first one. But, honestly, this is still okay. What it also does, though, is open the doors for some people to try, if they want, something different.

Given their own resistance to things new, I also decided to help them be creative for the Final. Instead of a written exam or essay, they are going to make a 10-15 minute video, interviewing people. That is, they are treating this as an experiment based on a question they have developed from our readings. I gave them a base question that I myself am interested in:  what makes American literature American? But they could also develop their own. For instance, one group is thinking about exploring the idea that, given we believe our country was founded on religious freedom (they could pull ideas from Winthrop, the Declaration, etc…),  are we really free to believe as much as we think we are? I think given the nature of the current Republican debates, it’s a great question. Some have already prepared a proposal and met with me to make sure they were clear on what the project was about as well as help them narrow down their question and possible methods for exploring it.  They are actually becoming excited (whereas at first, they let out a collective groan…).

I’ll post later on how things work out…

For now, a final thought on collaborative learning/teaching: it takes planning and time. However, even though I decided on this setup at the last second, there is something to also be said for the forced organic nature of it. I’m tickled with how this class is shaping up and how the students’ critical eyes are developing. But this kind of approach is about more than just teachers being willing to try something new; it’s also about overcoming students’ own resistance to new ways of teaching/learning–which is something I haven’t heard discussed before. It seems that, generally, it’s assumed students are demanding new ways of instruction–and that may be the case; but for me, it seems that that’s only the case when it doesn’t involve any changes on the students’ part. If it does, there is a lot of resistance. In this sense, the collaboration seems to need to occur even in the decision to collaborate.

UPDATE or the Rest of the Story: a followup to an earlier post on Collaborative Learning and Teaching…

Organic Markup?

I was just reading Claire Ross’s latest post, which is about integrating visitor interpretations as part of the museum experience and how that experience might be gauged for future improvements. In the course of her discussion, she uses the phrase “exhibition labels,” which made me immediately think of markup. Though I natter on occasionally about using markup for research, my feet have only just gotten wet; I don’t pretend to know a whole lot about it. However, thinking of markup while reading about her idea for an organic museum experience by way of visitor interpretations caused me to think about issues with semantic markup. That is, is there such a thing as dynamic markup? I suppose that doing XSL transformations on XML is dynamic in a sense, but from what I understand, it’s still non-dynamic in the sense of using preprogrammed selections that can be run dynamically. What I really am asking, I think, is if there is such a thing as organic markup–markup that can be fed back into the original markup to grow it, rather than making use of pre-interpreted markup–a crowd-sourced, on the fly, sort of markup? The reason I’m wondering is that I think it could help future interactions with previously marked up texts–as a way to evolve with future interpretations of not only the text, but of the tag set used to mark up that text. I’m guessing that that is what natural language processing folk are dealing with–trying to interpret the text instead of the tags. Of course, I know even less about that group. But I would imagine that such organic markup could aid natural language processing. It just seems to me that something like this might treat the interpretive act of marking up text more as a conversation than as a monologue by one person/project team. I can’t help but think that people have already been working with such an idea. Is Wikipedia really this kind of markup?

In the interest of full disclosure, before reading Ross’ post, I had also just watched this video  by Barry Ridge, who is a Ph.D. student at the ViCoS lab, showing how their robot, George, used interactive learning to create knowledge updates (basically, how they started off with a simple knowledge schema and slowly grew it by way of his asking questions of the humans). Very cool stuff. And another thing I think is cool about it is that I think the core of what Ridge  is doing is also what Ross is getting at  (but in terms of a different discipline). It also shows how my procrastination tends to guide my reading into very cool things… Oh, the positive reinforcement!

Digital Humanists Skillsets

Recently on Claire Ross’ blog, she asks the question, “Do you need to be procedural literate to be a great digital humanist?” in response to a previous discussion of a paper by Michael Mateas (“Procedural Literacy – Educating the New Media Practitioner”). Her summation is that Mateas

“…suggests that procedural literacy is necessary for DH and new media researchers, because without understanding the back end of the programme, researchers will never be able to think critically about digital projects.”

I think her question is a great one and the easy answer is that being literate would definitely help, but is it necessary? It seems like the answer should be obvious but like all things worth pondering, it really depends.

For one thing, the scope and time-frame of any project will dictate much of who can do what by when and for whom. Before academia, I used to program for a large corporation. Many of our projects–all, if they were not an internal tool for the IT group or an infrastructure project for the company–were managed by people with the business expertise, usually having no formal IT skills (except what they gained through working on such projects). The company’s policy was that business needs ought to guide development and not the other way around. Having project managers didn’t necessarily mean top down workflow. These managers had to listen to input from the particular experts as well as be able to ask good questions. It was basically a collaborative learning as well as teaching environment. And it makes sense for large-scale projects.

But likewise, smaller projects–helping improve a particular department’s tools/workflow or creating something new based on new business demands–usually consisted of a developer or two acting as a project manager to work with a representative from the department, again someone who had the particular business expertise. It was a collaborative effort. In either of these scenarios, it took someone with vision as well as someone with the particular know-how. In my own experience, any sort of successful project often boils down to someone having great trouble-shooting skills regardless of whether it’s an IT related project or a strictly business practice related one.

Having said this, though, I believe these same sorts of trouble-shooting skills are at the heart of writing essays as well as research projects in general. You break down the paper into sections that you know you need to explore, then work on learning what it is you need to in order to do the exploring. This may involve asking other experts, such as advisers, for leads to articles or books. Granted, projects involving developing research/archival sites or tools can feel a lot more like building a house (which can require a lot of different domains of expertise)–which brings me round again to my opening comments about the scope and time-frame of a project. I’ve been wrestling this last year with my own project, knowing I don’t have forever to learn all the necessary programming languages and tools I believe I need to pull it off. But with slow, very minor steps, such as getting my feet wet last year with TEI via Brown University’s text encoding workshops followed by an XSLT class at the Digital Humanities Summer Institute last month, though I don’t possess any experience with these tools, I’m seeing how I can actually get at some of my project’s questions while also seeing a way to maybe narrow the scope. At least today I feel this way. I admit, though, that after hearing at the DHSI of all the different projects people are working on, I was overwhelmed by how large they were, as well as by the large infrastructures they required (whether it was time, training, developers, etc., through such organizations as the Nines); resources I don’t have. But the good news is that experiences with my smaller projects may lead to work with these larger collaborative efforts.

Back in my IT days, we used to refer to ourselves with that old saw about being a Jack of all trades, master of none. These days, I feel more like the squire…. The cool part about the promise of the Digital Humanities is the amount of cross-collaborative possibilities it holds. As organizations like Project Bamboo mature, they hopefully will become the model of an open market place of skills within which different universities and organizations can trade such skills frequently and within a flattened hierarchy. When that comes, I think the idea of cross-collaboration project managers will become more important than any one individual needing to know not only how to program but how to program in multiple languages. But this then brings up a different issue: what happens when the majority of people within the field want to be only project managers? Will that create an imbalance that will eventually force people to acquire the procedural literacy that Claire and the rest of us are asking about? Hmmm. Might be best to work on those skills, if slowly, just in case.


More on Using Microsoft Word’s Find/Replace tool

Just because I’m finding  using Microsoft Word’s find/replace tool so much fun, I thought I would share another experience with it.

I had originally converted all my italicized words and phrases to tagged items in my earlier data prep entries. After having run my macro to output author and book lists, I found lots of mistakes in my (manual) tagging. But they were easy enough to correct. As I worked along, I also corrected spellings (words connected to other words, filling out fullnames, etc). But I didn’t want to do all this again for my “clean” (readable) copy of the text file.

Now, I must admit, that for readable copies of texts, I need formatting such as italics. You may argue with me on future compatibility grounds, but the fact is, my reading experience, even with data, must take precedence over any such issues. So to “reprettify” my text, I used Microsoft Word’s find/replace feature to 1) find certain sets of tags and then 2) get rid of the surrounding tags, leaving the embedded content, and 3) format that content.

So in the Find/Replace window, I searched for:


and replaced with:


and added the italics format.

(*be sure to click More and check the Use wildcards option)

As in other programs that use regular expressions, you can group parts of the search pattern with parentheses so you can refer to them later (for example, as the first, second, or third grouped item). In this case, I wanted to get rid of the tags (<work> and </work>) but retain all the content between, which is the second group.

The backslashes “\” tell Word that I’m looking for the literal character that follows (in other words, escaping special characters like “<”). The reason I added a wildcard “*” after “<work” was that I had originally created my data file using <book>, <poem>, etc. However, while starting to mark up a different set of files, I decided to use a broader tag name, <work>, with an attribute of “type” (e.g., <work type=”book”>). So when I want to find complete tags, I have to account for the extra information between the name and the end bracket of the opening tag.
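For readers who work outside Word, the same idea can be sketched with Python’s `re` module. This is only a rough analogue, not what Word does internally: the function name `strip_work_tags` and the sample sentence are made up for illustration, and since plain text has no italics, the sketch wraps the kept content in underscores as a stand-in for the formatting step.

```python
import re

# Rough Python analogue of the three-group search described above:
#   group 1: the opening <work ...> tag, including any attributes
#   group 2: the content between the tags (the part to keep)
#   group 3: the closing </work> tag
WORK_TAG = re.compile(r"(<work[^>]*>)(.*?)(</work>)", re.DOTALL)


def strip_work_tags(text: str, marker: str = "_") -> str:
    """Drop <work> tags, keep their content, and wrap it in `marker`.

    The marker is a plain-text stand-in for Word's italics formatting.
    """
    return WORK_TAG.sub(lambda m: f"{marker}{m.group(2)}{marker}", text)


sample = 'I reread <work type="book">Leaves of Grass</work> last week.'
print(strip_work_tags(sample))
```

Here `[^>]*` plays the role of the wildcard after “<work”: it skips whatever sits between the tag name and the closing bracket, so tags with and without a type attribute are both matched.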


Here is an example of what I’m starting with:


And here’s what it looks like after finding/replacing the text:



Now, I could have done this for all tags rather than just this one particular set. But I didn’t want to highlight names within my reading (though I may change my mind about that). To do so,  I would have to alter the search terms to something like this:


Notice how I added (\<[!/]*\>)(*)(\</*\>). This tells Word to make sure that the character following the “<” is not a forward slash “/”. I needed to do this because I saw that although Word found the first complete phrase just fine, without that restriction, the start of the next phrase would begin with the closing tag of the previous phrase, causing my opening/ending tags to get out of alignment. I didn’t run into this problem in my first example because the first group within my search terms precluded it from bringing back a closing tag; that is, it could only be an opening tag.

So just to walk through this (to help make sure I’m following this myself), Group 1,  (\<[!/]*\>), reads:

“Look for the “<” character followed by something that is NOT a “/” character (again, to exclude the closing tag from the beginning), followed by all characters up to and including the opening tag’s closing bracket, “>”.”

The second group, (*), reads as

“(continue to) get me all characters”, followed by

the third group, (\</*\>), the final piece of the pattern:

“Look for the characters,  “</” (the opening bracket of the closing tag), followed by any text until and including the final bracket of the closing tag.”
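The three groups walked through above can be approximated in Python as follows. This is a sketch under the assumption that Word’s “*” behaves like a lazy “match as little as possible” wildcard; the helper name `tag_pairs` and the sample line are hypothetical.

```python
import re

# Group 1: "<", one character that is not "/", then everything up to ">"
# Group 2: the content, matched lazily so it stops at the first closing tag
# Group 3: "</", then everything up to and including the final ">"
ANY_TAG_PAIR = re.compile(r"(<[^/][^>]*>)(.*?)(</[^>]*>)", re.DOTALL)


def tag_pairs(line: str):
    """Return (opening tag, content, closing tag) for each match in `line`."""
    return [match.groups() for match in ANY_TAG_PAIR.finditer(line)]


line = '<person>Walt Whitman</person> wrote <work type="book">Leaves of Grass</work>.'
for opening, content, closing in tag_pairs(line):
    print(opening, "|", content, "|", closing)
```

The `[^/]` at the start of group 1 does the same job as `[!/]` in the Word pattern: it keeps a match from beginning on a closing tag.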

As I was typing this out, I noticed that there is a potential problem with using multiply embedded tags which my first find/replace would not encounter (again, due to specifying the particular tag). That is, I’m guessing that if I had the line,

“<person>John Anderson’s</person> essay, <work type=”essay”>Surviving <person>Walt Whitman’s</person> <work type=”book”>Leaves of Grass</work></work>”

the find/replace pattern would probably retrieve:

<work type=”essay”>Surviving <person>Walt Whitman’s</person>

which is not aligned. I can see how to fix this easily in a macro, just saving off the tagname part in a variable to use later. If Word’s search/replace tool worked like other programs using regular expressions, I imagine that you could do the same thing by subdividing Group 1’s pattern,  (\<[!/]*\>), into more groups:  (\<[!/])(*)(\>)  so that I could then use “\2” in  tag name part of the original Group 3’s pattern (where the “*” was originally):  (\</\2\>), so that it would look like this:


Just out of curiosity, I went ahead and tested this, and the embedded tags were indeed a problem. However, my solution only partially worked; that is, it found the first person tag just fine, but it only works with tag names that don’t contain anything else (like attributes) within the brackets besides the tag name itself. There might be a way around this using exclusion (!) and the range “[]” (the square brackets), but for now, I still think this is pretty cool. It is just good to know about problems/limitations before you start tagging your file so that you can come up with a scheme that will work using the tools you want (or have).
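
To illustrate the back-reference idea outside of Word, here is a hedged sketch in Python. It is not the Word pattern itself: the NAIVE and PAIRED patterns are my own assumptions, and the sample line is the one above with straight quotes. PAIRED captures only the tag name (stopping at whitespace or ">") so that attributes do not end up in the back-reference.

```python
import re

line = (
    "<person>John Anderson's</person> essay, "
    "<work type='essay'>Surviving <person>Walt Whitman's</person> "
    "<work type='book'>Leaves of Grass</work></work>"
)

# Any opening tag followed by the first closing tag of any name
NAIVE = re.compile(r"(<[^/].*?>)(.*?)(</.*?>)")

# Group 1 is the tag name only; group 2 soaks up any attributes;
# \1 then forces the closing tag to carry the same name
PAIRED = re.compile(r"<([^/\s>]+)([^>]*)>(.*?)</\1>")

naive = [m.group(0) for m in NAIVE.finditer(line)]
paired = [(m.group(1), m.group(0)) for m in PAIRED.finditer(line)]

print(naive[1])      # the misaligned <work ...> ... </person> match
for name, match in paired:
    print(name, "->", match)
```

The back-reference keeps a work tag from being closed by a person tag, but it still stops at the first </work>, so a work nested inside another work remains misaligned. Arbitrarily deep same-name nesting is beyond what a single pattern like this can handle.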

Although I’ve mentioned before that I’m not so worried about “elegant” solutions, the danger of finding new tricks for myself is that it can sometimes keep me from moving forward on projects because of the fun in tweaking them…

Lab Notebook: Data Prep 2 and Wordsmith and other tools


Now, in my previous post, I explained that I was tagging some data in my exam corpus for future use as well as simply making reading and author lists. I was planning on using a feature in Wordsmith Tools that allows you to use tags as selectors in addition to Markup to INclude or EXclude. But I’ve been having a heck of a time making it work the way I want. Initially, I was using a text file encoded in big-endian Unicode format for some reason, but after some emails to Wordsmith Tools’ creator, Mike Scott, he quickly and politely explained that WS doesn’t like that format and instead prefers little-endian. WS thankfully also has a Unicode converter within its utilities that quickly fixed that problem, but WS 5’s version is buggy; WS 6’s beta version worked great.
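
As an aside, the byte-order difference can be sketched in a few lines of Python. This is only an illustration of what a big-endian to little-endian conversion involves, not a description of how Wordsmith's converter works; the function name and sample text are hypothetical, and I am assuming the input is UTF-16.

```python
import codecs

def utf16_be_to_le(data: bytes) -> bytes:
    """Re-encode UTF-16 big-endian bytes as little-endian with a byte order mark."""
    # Drop the big-endian byte order mark if the file starts with one
    if data.startswith(codecs.BOM_UTF16_BE):
        data = data[len(codecs.BOM_UTF16_BE):]
    text = data.decode("utf-16-be")
    # Write the little-endian mark first so readers can detect the byte order
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Hypothetical sample standing in for the contents of a big-endian file
sample_be = codecs.BOM_UTF16_BE + "<book>Walden</book>".encode("utf-16-be")
converted = utf16_be_to_le(sample_be)
print(converted.decode("utf-16"))
```

For a real file, the bytes would come from open(path, "rb").read() and the result would be written back with a file opened in "wb" mode.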

I tried the following based on Wordsmith’s help (using Tags) on the web.

So within the settings section, Tags, I told WS to automatically load a tag file in the Markup to Include section with just this line: <book>,</book>:










Don’t forget to click “load” (it’ll say “clear” if you have already loaded it). It’ll show you what tags it found:


In the Only Part of File, Sections to Keep, I have : <book> to <book>    (the Sections to Cut out is empty):








When I create a new concordance on the tag <book>*, it finds all 8 entries in my test file; however, it brings back the full concordance context line:




After trying a bunch of different search words as well as checking the settings, I emailed Mike Scott just to make sure I’m understanding what WS can actually do with this as well as to check my settings file. So while I’m waiting to hear back from him, I decided to go ahead and write a VBA macro (MS Word). I basically recorded my find of the tags and then edited it to save the results as a text file:

Sub WriteTagsToTextFile()
' WriteTagsToTextFile
' Finds every <book>...</book> entry in the document and saves the list to a text file
'declare my variables
Dim sBookList As String
Dim oRange As Range
Dim iCounter As Integer
Dim strPath As String
'initialize variables
iCounter = 0
strPath = "C:\Users\Me\BookList2.txt"
With Selection
    .HomeKey Unit:=wdStory 'the homekey is like pressing ctrl home to move to beginning of document
    .Find.ClearFormatting 'clear any formatting criteria left over from a previous search
End With
With Selection.Find
    .Text = "(\<book\>*\</book\>)" 'find the tagged entries I'm interested in; in the future I may try to create an input box to manually enter this
    .Forward = True
    .Wrap = wdFindStop
    .MatchWildcards = True
    Do While .Execute
        Set oRange = Selection.Range
        With oRange
            'copying all the found tag sets into a variable, inserting a carriage return / line feed after each set
            sBookList = sBookList & .Text & vbCrLf
            iCounter = iCounter + 1
        End With
    Loop 'keep searching until Execute finds no more matches
End With
Set oRange = Nothing
Selection.HomeKey Unit:=wdStory
'Now, create the text file and save the list--if the file exists, it will be over-written

' first, open the text file (to create or overwrite it)
'The #1 is how we refer to the file later to write and close it; I would like to use a save as box here instead, but this is easier for now...
Open strPath For Output As #1
' Write the tag list to the file
' (note: Write # wraps the string in double quotes; Print # would write it as-is)
Write #1, sBookList
' remember to close the file
Close #1 'again, we use #1 to refer to the opened file
'Remind yourself of what you just did--it isn't necessary, but it's also helpful to know that the script really finished
MsgBox iCounter & " entries found and saved to " & strPath
End Sub

(btw, I’m trying out My Syntax to display my code snippets.)

After my macro runs, I see:


and my text file looks like this:


I can now clean this up by finding/replacing the tags with nothing, or by opening the file in Excel. The point is, I have my lists. I can also now count how often particular titles are referenced.
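
The clean-up and counting step could be sketched like this in Python. This is just one possible approach under my own assumptions: the text has one <book>…</book> entry per line, as the macro writes it, and the sample titles and the count_titles name are made up for illustration.

```python
import re
from collections import Counter

# Capture whatever sits between an opening and closing book tag
BOOK_TAG = re.compile(r"<book>(.*?)</book>", re.DOTALL)

def count_titles(text):
    """Strip the tags and count how often each title appears."""
    # Collapse stray whitespace so "Walden " and "Walden" count as the same title
    titles = [" ".join(title.split()) for title in BOOK_TAG.findall(text)]
    return Counter(titles)

# Hypothetical contents of the saved list
sample = (
    "<book>Walden</book>\r\n"
    "<book>Leaves of Grass</book>\r\n"
    "<book>Walden</book>\r\n"
)

counts = count_titles(sample)
for title, n in counts.most_common():
    print(n, title)
```

Counter.most_common() lists the most frequently referenced titles first, which is the view I would want when deciding what to read first.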

As I said in my comments, I would rather NOT have hard-coded the file name and path; that is, I would like to have the macro prompt the user for that information. But maybe later. Or maybe not at all if I am able to do this in Wordsmith. But now that I have my macro, I have other options…  My main point in doing this in VBA (besides getting at the data I want) is to highlight the usefulness of having multiple tools to get at the same or similar data. As I’ve mentioned elsewhere, I’m more interested in getting at the data than in elegant solutions (though they may be cool ones).

Next, I’ll share some results and future directions.

Using Wordsmith Tools at ULL

Here’s a set of instructions for accessing and using Wordsmith Tools on the ULL campus. Note that your ULL user id must be added to the share’s security (for licensing purposes), which you can request through me, Dr. Clai Rice, or Dr. John Laudun. If you are logged onto the campus domain, you won’t need to enter your credentials, but if you are using the wireless network, you will (see the instructions):

Accessing and Starting Wordsmith for ULL Users


Lab Notebook: Data Prep

So recently, in preparing for one of my comprehensive exams in early American literature, I scanned the last eleven years’ worth of exams into PDF format to make it easier to take notes, copy and paste book titles, authors, etc. Unfortunately, the state of the copies in our department folders meant that I had to first make clean photocopies in order to be able to scan them using our Digital Humanities department’s Fujitsu ScanSnap (the side effect of which is a serious case of scanner envy). It wasn’t until I was looking through the newly created PDF that I found a few missing pages, page sequence issues, as well as page direction problems. But using Acrobat Pro’s tools, cleanup of this sort was easy. I also used Acrobat’s OCR tool on the file so that I could copy and paste the text into Word and Excel. I used Word as my first step so that I could clean up the text (lots of minor OCR misreads) before copying and pasting into Excel (I’m using different worksheets based on the different sections of the exams; though the format has evolved a little over the last decade, they basically fall into IDs, shorter essays, and longer essay sections). And since I was already cleaning up the text within Word, I decided to also keep the Word document just to have a cleaner version of the exams. In cleaning up my Word file, I tried to maintain all the original significant formatting, such as italics for book titles; it just makes the exams easier to read.

I also stripped out nonessential information, such as sectional instructions (though that could make for an interesting rhetorical analysis in itself), and just labeled each section “Part I,” “Part II,” and “Part III.”


My Excel file is just beginning with rudimentary information for now:

Eventually, I’ll add columns such as themes, persons of interest, periods, related critics, related novels, etc.


After getting this far, it occurred to me that this task might be easier, and more valuable in the long run for gathering different sets of stats, if I were able to insert markup tags. For instance, I know Word will let me search and replace based on text formatting, and so I might be able to search and replace all the italicized words with their original words but also with opening and closing tags (something like <book>…</book>). I couldn’t figure out how to do the replace part until I googled around for Word and using regular expressions. Sure enough, it handles them (read a good introduction on this by Microsoft: “Putting regular expressions to work in Word”).

However, in the end, I didn’t need to use them. I spent a great deal of time yesterday trying to get past the problem of being able to replace only each italicized word rather than the entire phrase. I eventually got it working using a VBA macro. But it turns out I didn’t need to use the regular expressions or my VBA code. Today, while retyping all of this (I lost my file while running test code; the lesson being, save all open files before testing out your VBA code!), I found exactly what I needed here. I just swapped out their replacement text with what I was looking for and it worked like a charm. The “^&” below is the code for what I originally was finding (think of it like a variable that contains the original text). By using it in the replace box, I’m able to insert what I needed to as well as the original search terms (in this case, the formatted phrase).
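
For readers more used to regular expressions, the role of “^&” can be sketched in Python, where \g<0> stands for the whole matched text. This is only an analogy: Python cannot search by italic formatting the way Word's Find box can, so the sketch matches a literal title instead, and the wrap_matches name and sample sentence are hypothetical.

```python
import re

def wrap_matches(text, pattern, tag):
    """Wrap every match of pattern in <tag>...</tag>, keeping the matched text."""
    # \g<0> is the entire match, playing the part of ^& in Word's Replace box
    return re.sub(pattern, rf"<{tag}>\g<0></{tag}>", text)

# Hypothetical exam sentence
sample = "Discuss Walden and its reception."
tagged = wrap_matches(sample, r"Walden", "book")
print(tagged)
```

The replacement inserts new text around the match without my having to retype the match itself, which is exactly what made the Word version so handy.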

Very cool. And powerful.

One thing to note, though, is that before I did this, I first had to search for each italicized paragraph mark and replace it with a non-formatted paragraph mark; otherwise, Word would create an empty set of tags wherever the paragraph mark was also formatted as italics.


Since I didn’t remove formatting within the replace box, however, my <book>…</book> tags were inserted as italicized text. So I did a simple search and replace of the tags with a non-formatted version, and presto:

Just remember to do this with the closing tag as well. Though there are other cooler ways of doing this, simple and fast go a long way in my book.


Of course, I also had to manually verify that all of these tags actually were for book titles. There were a few cases of quotes or exam instructions that I hadn’t taken out, or cases where the question text was being emphasized. In those cases, I used other tags (such as <emphasis>why</emphasis>  or <foreign>fin de siècle</foreign>). I currently have no need of these tags, but since it was easy (and I was verifying the text anyway), I decided to go ahead and use them.


Next, I want to use this file in Wordsmith Tools to see if anything interesting or useful pops up and see if I can simply create a list of the books based on the <book> tags.