Category: Technology

Podcast Project in the Class Part 1: Podcasts as an English 102 Research Paper

I am still reworking and adding materials to this series of posts

Part 1: Introduction

This is an expanded discussion of a presentation I made for my English department faculty meeting at the end of 2019.

Currently, the roadmap takes this path:

  1. Introduction
  2. Working with Groups
  3. Contextualization
  4. Elements of Podcasts and Stories
  5. Interviewing
  6. The Practice Podcast
  7. The Suicide Podcast
  8. Workflow and Mechanics of Evaluations
  9. Final Thoughts
  10. Resources

It has been a while since I’ve written about my classroom experiments. A few years ago, my English 102 composition students created researched video essays. We had some really interesting results. Maybe I will eventually write about them. But today, I am going to walk through my English 102 class from last winter (2020), where instead of the traditional research essays, we created podcasts.

As my note above explains, this post is an attempt to fill in the gaps of a mere slide pack. I say “this post” but really mean “these posts”: after reading through the draft, I realized that I’ve gone into much more detail and explanation than I would have for a slide presentation. So, as with an earlier set of posts, I’ve broken this discussion into multiple sub-posts for readability’s sake as well as to satisfy my own tendency to beat dead horses.

The Backstory

The idea for using podcasts as research papers was born out of work for an NEH grant my dean at the time had written for creating contextualized courses for both English and Math with other disciplines–any approach was okay. I had recently started working with Bonne Smith, one of our Journalism instructors, on developing a contextualized English 102 research writing course for a newspaper course. At first, we wanted to take a community learning approach (teaching both the newspaper and the composition classes together). However, as the newspaper course requires students to interview and be accepted in order to become a member of the editing staff, it became apparent that those logistics would not work for every student. So instead, we decided to treat the composition class as a feeder course for the newspaper class, using journalism as the framework to generate student interest in the newspaper. After much brainstorming about various projects that could teach journalism skills while still allowing me to cover the principles of composition, research, documentation, and organization of information, we decided that the vehicle for this endeavor would be podcast creation. But as a feeder class, we needed the newspaper to be involved.

So Bonne and I decided that the topics for the podcast would be chosen by the editors of the college’s newspaper each quarter (pending their approval, of course). These topics would revolve around the campus and larger community issues (we’re a small community college and saw this as an opportunity to encourage our students to learn more about their college and town). The topics would be based on sets of stories and articles that the newspaper had published during previous quarters. This not only provided the field in which my students played, but also acted as a source from which students could begin their initial research, to get a lay of the land, so to speak. Also, it showed how students just like them could produce/publish materials that would be seen by a much larger audience than their classmates.

It just so happened that during the previous quarter, the newspaper had published a series of articles regarding suicide. These articles provided an overview of other research and statistics that we used to prompt discussion for our students. We then asked the students to look for topics in the articles about which they wanted to know more; that is, to find topics they were interested in and in which they could dig deeper. At the end of the quarter, the students would be able to submit their podcast to the newspaper, and, if selected, it would be published on the paper’s website. This way, we could better connect the class to the college’s newspaper.

A continual source of surprise for me is the realization that the front part of the course proved to be so much more difficult and demanding than any technological aspects, whatever they may be (audio, video, multimedia).

Because of that lesson, I’m going to spend much more time over the course of the next few posts discussing Groups, Interviews, and Medium Structure. While I do not plan on getting into the trenches of specific technology such as Audacity or even the Rodecaster Pro (compliments of our Library’s still developing Audio/Visual lab), I will note what we used, or at least some of my suggestions to students on what they could use. I am currently developing instructional materials for that new A/V lab (which is in desperate need of naming!). The quick explanation for why I am not going to discuss that is because I did NOT teach the specifics of any recording devices or audio editing package. I merely suggested apps and general approaches, offering my assistance as needed. Only once or twice over the years of doing projects like this has any group actually asked me for specific help. And in those cases, I googled alongside them and learned as they learned. Of course, having lots of experience with all sorts of programs did help. But still, the lesson ought to be clear: trust your students’ abilities.

So why a podcast?

Like an essay, a podcast typically introduces the topic that is the subject of the episode, then supports it (using primary as well as secondary sources), then arrives at some sort of conclusion. In the movies or in books, this is commonly referred to as a beginning, middle, and end, though a podcast’s beginning and ending may be much shorter. And like an essay, some research will be required prior to interviewing anyone. That is, in order to know what kinds of questions should be asked of the interviewees (sources), the interviewers (author) will need to know more about the topic of their podcast (research paper).  But all this will stem from developing a research or focus question (The Pitch).

This is not to say that all podcasts are research projects. They obviously can be creative works such as stories, poems, plays, music, etc., all of which require no research. I can easily imagine a professor teaching the narrative essay in an English 101 class. This could make for an interesting podcast as well as a learning moment to reinforce the elements of a story, such as translating written transitions to audible ones. What all these modalities of essays, or any creative work, require is the ability to tell a story. Depending on the goal of the class or project, there will be different ways to tell that story. The same is true of podcasts.

However, what I typically focus on in my English 101 composition course is the argumentative essay and its elements and structure; it allows me to touch on all the other types (narrative, compare/contrast, etc.) because a well-written argument, in my experience, often includes many elements/strategies from those other modes. I personally am of the ilk who believe that all communication is a form of argument. In English 102, I continue to focus on structure and elements but switch to a medium with which students have much less (even zero) experience. I do this in order to help slow students down during the creation (writing) process to be more deliberative and reflective so that they can make more informed choices about that structure and why they use whichever elements they choose. Part of that process is choosing what not to include, which is just as important as what to include. Organizing that content is dependent upon what story is being told as well as its goal and audience—and the medium. In other words, the medium of the story isn’t a special case; as with podcasts, those dependencies are true of other mediums like videos, multimedia projects, blogs and all the rest.

The goals behind this contextualized English 102 composition class were for students to learn:

  • more about the structure of writing by using a less familiar medium (podcasts) and so, more about the possibilities and power of narratives
  • research methods, documentation and integration of sources
  • collaborative learning (group work)
  • more about journalism: interviewing and reporting, integrity and ethics
  • empathy; how to create relationships within the community and the value of primary resources; that is, not merely using people as resources for their own gain
  • more about the college’s newspaper course to increase their student enrollment


Strategies for using Regular Expressions for converting text documents to xml

Thanks to @davidamichelson (by way of Nuzzle) for retweeting a post from the University of Pittsburgh’s Digital Humanities / Digital Studies program about their excellent tutorial on different strategies for using RegEx for “autotagging” text documents with xml. Although they are specifically using <oXygen/> as their editor, their suggestions still apply to many others.

While many of the people using such pattern replacements would probably create scripts for their reuse and the processing of multiple documents, I am waiting still for someone to develop a good, full-blown gui application version that not only includes a RegEx/pattern builder but also includes as part of its tools a text analysis engine to help discover patterns that might be of interest to users for tagging purposes.
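To give a sense of what such “autotagging” looks like outside an XML editor, here is a minimal Python sketch. It is not taken from the Pittsburgh tutorial; the <date> tag name, the four-digit-year pattern, the function name tag_years, and the sample sentence are all assumptions of mine, chosen only to illustrate the idea of wrapping a matched pattern in tags.

```python
import re

# Hypothetical pattern: a standalone four-digit year between 1500 and 2099.
YEAR = re.compile(r'\b(1[5-9]\d\d|20\d\d)\b')


def tag_years(text: str) -> str:
    """Wrap every matched year in an (assumed) <date> element."""
    # \1 refers back to the first parenthesized group, i.e. the year itself.
    return YEAR.sub(r'<date>\1</date>', text)


print(tag_years("Emerson published Nature in 1836 and Essays in 1841."))
```

A real project would need many such patterns, and would need to check the matches by hand, which is exactly where a pattern-discovery tool would help.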

Cucidati, Sicilian Fig Cookies

Recently, a friend whose fig tree was particularly productive this year, asked me what all my mother would do with such a bounty (they are both wonderful cooks). Rather than giving her just the recipe, I decided to film my mom preparing and talking about her own grandmother’s fig cookie recipe. Not only do I get to share my mom with the world, I also get to preserve one of my favorite memories:

Recording her also provided me the opportunity afterwards to play around with video editing software, something about which I keep wanting to learn more but have had few projects to do so. It also served as a reminder of what my folklorist friends like to say about fieldwork: be sure to check the batteries before leaving the night before and make sure to bring spares. Well, I forgot and less than halfway through filming, ended up needing to switch to recording from my smartphone.

I was a little anxious about splicing the clips together because I also did not pay attention to the video format in which I was recording. One camera used the .mov format while the other used .mp4. But I assumed that a video editor should be able to handle multiple formats.

The least expensive software usually is an online version. However, I’ve played a little with Google’s online editor via YouTube, and found it slow (at least my connection made it slow) and its interface limited. This time, I wanted to try a desktop package instead.

A while back, I familiarized myself with Microsoft Movie Maker in case students from my Early American Literature course needed technical help for a video essay assignment that was to serve as their Final for the semester. I remember thinking it was adequate (and free), so I went to find an updated version. Unfortunately, only the same 2012 version was available for Windows 7. Regardless, I installed it.  Like YouTube, it has a fairly simple interface, including a Timeline in which you can rearrange or cut unwanted footage. However, my footage contained a number of uncontrolled moments (background noises and other conversations) that needed finer control in order to lower the volume or edit those out.

Frustrated, I decided to try a different product. I’m already a big fan of TechSmith’s screen capture program, SnagIt. I’ve used it for a couple of years, and love it for jobs requiring more than Windows’ basic screen capturing ability. So I decided to try their Camtasia Studio program. (The availability of a fully functional 30 day trial was also a big carrot).

I must admit up front that I am almost a complete novice with video editing. And like any novice, I jumped into the program, manuals be damned. I discovered that the interface isn’t as intuitive as it seems–again, this is a novice’s perspective. However, after much “suggesting” by TechSmith to watch their tutorial videos before getting started, I finally gave in and watched some of the intro tutorial videos. I’m happy I did; they quickly oriented me. Even so, their videos didn’t always give the depth I needed. For example, adding a Title or Credits is not as straightforward (in version 8) as one might expect. It’s part of their Callouts object but even after watching their tutorials on the subject, I still needed help from the community with adding them before the actual start of the video.

Even so, after spending only a few hours of adding, splitting, deleting, and stitching clips, as well as integrating transitions and finally adding credits, I felt quite relaxed with the interface. I do realize there is still much I need to learn and experiment with, but my 30 day trial is almost over, and unfortunately, the cost, even for educators, is currently prohibitive. Having said that, though, I do realize other similar packages also cost quite a bit. For future class projects, I still would like to find a free solution, even an online one, that is just as powerful and easy to use. Although I have never tried it, I hear great things about Apple’s iMovie, which is included free with iOS.

Sample Project, Notes from the Collaborative Classroom pt 4

Sample Project: Recipes

While the early posts on this topic were discussing the mechanics and workflow of group work, one of the main reasons I wanted to write about this experience, was also to highlight some of the outstanding projects produced by my students.

And so finally…!

The Recipe Project is one of my favorites because it requires really understanding the medium and genre of the documents involved. That understanding comes in the form of two different requirements of this particular project: 1) breaking down all the elements and concepts of recipe documents, and 2) choosing a different genre and blending them together. Also, groups needed to produce two versions of this document where one focused on an older audience and the other on a younger one. As with all of the projects in the course, my goal is to get my students to see and work beyond conventional concepts of what constitutes a particular type of document (in this case, a recipe).

Here is a copy of the Recipe Project Prompt.

Before they began the actual project planning files, they were to first decide as a group who their audience was and which recipe they were going to use as well as which type of document they were going to use as the source for the blended document. They prepared a relaxed “presentation” for the class so that we could ask questions and make sure they were all on the same page. In all, there were two areas with which they had difficulty. The first was with the idea of blending documents itself. For example, the Penguins group decided they wanted to use a recipe for Lengua (beef tongue) since it was the favorite of a group member. However, during that initial class discussion, it was clear that they hadn’t decided on a source type of document; rather, they focused on their audience, but in a general way: for the adults, they would write it more “seriously” and for the younger people, it would be more “fun”. But they were confused when asked in what type of format (document) the recipes would show up. To be fair, this was typical during that first discussion. To help, I would ask them about their audience and why they might want to show these people this recipe. That is, I was trying to get them to think about context.

One very clever group, the Hedgehogs, decided that they wanted to use a cake ball recipe in the form of the wedding invitation genre.  While maintaining the parameters of the writing prompt, they went one step further for the younger audience version by also converting the invitation/recipe into a puzzle.

An interesting thing that began to occur later in the semester, was that the groups began to consider their planning forms as part of their finished documents, and began tailoring them to echo the content and style of their finished projects, creating a sort of brand. Here are their Recipe Project Planning Forms they used.

And here are the electronic versions of both the adult and children invitations that they uploaded to Moodle:

Wedding Cake Balls – Adult Invitation

Wedding Cake Balls – Children Invitation

If you read through them, you will notice the group’s extreme attention to the details of their different genres. For example, the numbers were spelled out in the adult version because that is part of the invitation’s formality. Although it was a simple thing, they switched to numeric representation for the children’s version because the numbers would be easier to read, yet the tone, while fun, still helped maintain a sense of formality (as did the font usage–which also changed from adult to children audiences).

While the PDFs make the text easier to read, they do not do the documents justice; please browse through the gallery and enjoy the group’s creative use of form and details:


The Recipe Project Report shows how their fellow classmates also responded to such details.

Hey, Microsoft! Over here! Pick me!

I’ve been eagerly awaiting the new Microsoft Surface Pro (after the disappointing introduction of the lesser Surface RT) and so have been doing my due diligence by looking at its reviews since its recent release. It seems that many reviews have praised the Surface Pro as an innovative design but complain about storage, battery life and weight. Okay, I get the battery life complaint–it’s too short. But not because my friends’ iPads or Android tablets get upwards of 10 hours, but because as a laptop, I want longer life. As a laptop–not an iPad or Android Tablet. Having said that, though, my current Toshiba Satellite tops out at about 4 hours, so even there, the Pro is an improvement. Weight is the other factor that these reviews seem to like to compare to the tablet devices–that since it’s heavier than the iPad, its use as a tablet is questionable. Since when did 2 pounds become “too heavy”? I’m sorry, but I’ve played around with an iPad as well as Android Tablets. Just because they are lighter doesn’t make them more useful. I’ve seen some nice apps on these devices, don’t get me wrong (reading an ebook on the Kindle Fire HD is beautiful), but as far as productivity apps like word processing or spreadsheets go, no thanks; I need a device that works with my workflow habits rather than forcing me to conform to it. It’s the software and the hardware. And so far, although I know a number of people who have really made a go at using tablets as primary devices, none of them have succeeded. That is, they still have their Mac or Windows desktops and laptops. And admit it: how many of us, during that time just before the iPad first came out, dreamed longingly of a world where we could do everything on a tablet? Though the iPad iterations as well as the host of Android tablets continue to be beautiful, they have yet to come even close to fulfilling this dream.

Although this reviewer does talk up some of the good points about the Pro, it’s a great example of how many of the reviewers are not quite getting the desire for such a device by people like myself:

“The Surface Pro “suffers from trying to be too many things and not being good at any of them,” commented Carl Howe, a research vice president at the Yankee Group.”

Oh really?

I teach in a university and tote my laptop around with me everywhere I go. I receive student assignments as well as send feedback via Moodle. However, I would also like to handwrite on the documents rather than highlighting my comments or using Microsoft’s comment tools (handwritten notes tend to be much shorter, thereby helping me spend much less time on a single student’s assignment). Tablet mode to the rescue! Typing up assignments and papers or doing research, all using a real keyboard? Laptop mode to the rescue!

The fact that I can use the Surface Pro as a tablet as a more friendly way to consume media such as video or ebooks, or use it as a fully functional and powered laptop to do actual work, gives me great joy.

It seems to me that it’s not that Microsoft doesn’t know who the Surface Pro’s audience is, but that the reviewers don’t. If the reviewers want to really give an accurate comparison, they should be looking at the PC Tablets that have been around since around 2000. I was ecstatic when these first came out. However, they never became cheaper nor did their specs come close to a “real” laptop’s specs (meaning underpowered and little storage). The Surface Pro on the other hand, seems like it could change that. Although its battery life is dismal, it’s more than on par compared to the PC Tablets.

One review, from the Verge, did point out a problem with the kickstand being neither adjustable nor good on one’s actual lap. How many people actually use their laptops directly on their laps, though? Though there have been one or two occasions in recent memory where I had to do this, I almost always use either a laptop cooling pad, or a clip-board that fits easily in my backpack with my laptop. But an adjustable kickstand would be smart–after all, even when sitting at a desk, I want to adjust my device rather than my chair or desk. One of the more honest reviews, when it came to what its writers made of this different device, put it this way: “CONCLUSION: We’re mixed”

I’m not saying that this is the perfect device (yet), but it’s a whole lot closer to the machine I want for my work and personal life than any other device currently out there. So Microsoft, here I am:  your audience! I’ve been waiting for this device a long time. Although I may wait a little longer just because I typically don’t buy first generation devices, I hope you will wait for me!

Yet one more use for Word Clouds: job descriptions

I recently was using Wordle’s word cloud generator in my Early American Literature class as a demonstration of different ways of looking at text. They make it very easy to create and share. You can either click the “Create” link or tab, paste in your text, and hit “Go”. And then presto! They give you a number of layout and color schemes to work with. I suggest clicking around, trying them all out. I prefer the “mostly horizontal” layout and less cartoony colors. They also give you a number of ways to share it–creating a public link (and deleting it) is just as easy as creating the word cloud itself. Here’s an example Wordle of Ralph Waldo Emerson’s “The Poet” I used for the class (forgive the color settings–I went with the default).

I must have still had Wordles on the brain, because as I was looking at a new digital humanities job posting that was forwarded my way, I started to wonder what word clouds of two different job descriptions might show:

The first posting:


The second posting:

Although they are not the same sorts of positions overall, they have fairly similar general duties. Now, I would not want to make too much of this analytically; however, it’s still interesting to see the different emphases that the word clouds reveal.
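A word cloud is, underneath, a picture of word counts, so the same comparison can be roughed out in a few lines of Python. This is only a sketch: the two posting texts below are invented placeholders rather than the actual job ads, and the tiny stopword list and the function name word_counts are my own assumptions.

```python
import re
from collections import Counter

# A deliberately small, assumed stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "will"}


def word_counts(text: str) -> Counter:
    """Count the words in a text, ignoring case and common stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


# Placeholder texts standing in for the two job descriptions.
first = "The developer will build digital tools and support digital research projects."
second = "The librarian will support research and teach digital methods to students."

first_counts = word_counts(first)
second_counts = word_counts(second)

# The most frequent words are the ones a word cloud would draw largest.
print(first_counts.most_common(3))

# Words that appear in both postings.
shared = sorted(set(first_counts) & set(second_counts))
print(shared)
```

Comparing the counts side by side gives the same kind of rough impression as the clouds, with the same caution about not reading too much into it.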


Organic Markup?

I was just reading Claire Ross’s latest post which is about integrating visitor interpretations as part of the museum experience and how that experience might be gauged for future improvements. In the course of her discussion, she uses the phrase “exhibition labels” which made me immediately think of markup. Though I natter on occasionally about using markup for research, my feet have only just gotten wet; I don’t pretend to know a whole lot about it. However, thinking of markup while reading about her idea for an organic museum experience by way of visitor interpretations caused me to think about issues with semantic markup. That is, is there such a thing as dynamic markup? I suppose that doing xsl transformations on xml is dynamic in a sense, but from what I understand, it’s still non-dynamic in the sense of using preprogrammed selections that can be run dynamically. What I really am asking, I think, is if there is such a thing as organic markup–markup that can be fed back into the original markup to grow it, rather than making use of pre-interpreted markup–a crowd-sourced, on the fly, sort of markup? The reason I’m wondering is that I think it could help future interactions with previously marked up texts–as a way to evolve with future interpretations of not only the text, but of the tag set used to mark up that text. I’m guessing that that is what natural language processing folk are dealing with–trying to interpret the text instead of the tags. Of course, I know even less about that group. But I would imagine that such organic markup could aid natural language processing. It just seems to me that something like this might treat the interpretive act of marking up text more as a conversation than as a monologue by one person/project team. I can’t help but think that people have already been working with such an idea. Is Wikipedia really this kind of markup?

In the interest of full disclosure, before reading Ross’ post, I had also just watched this video by Barry Ridge, who is a Ph.D. student at the ViCoS lab, showing how their robot, George, used interactive learning to create knowledge updates (basically, how they started off with a simple knowledge schema and slowly grew it by way of his asking questions of the humans). Very cool stuff. And another thing I think is cool about it is that I think the core of what Ridge is doing is also what Ross is getting at (but in terms of a different discipline). It also shows how my procrastination tends to guide my reading into very cool things… Oh, the positive reinforcement!

Digital Humanists Skillsets

Recently on Claire Ross’ blog, she asks the question, “Do you need to be procedural literate to be a great digital humanist?” in response to a previous discussion of a paper by Michael Mateas (“Procedural Literacy – Educating the New Media Practitioner”). Her summation is that Mateas

“…suggests that procedural literacy is necessary for DH and new media researchers, because without understanding the back end of the programme, researchers will never be able to think critically about digital projects.”

I think her question is a great one and the easy answer is that being literate would definitely help, but is it necessary? It seems like the answer should be obvious but like all things worth pondering, it really depends.

For one thing, the scope and time-frame of any project will dictate much of who can do what by when and for whom. Before academia, I used to program for a large corporation. Many of our projects–all, if they were not an internal tool for the IT group or an infrastructure project for the company–were managed by people with the business expertise, usually having no formal IT skills (except what they gained through working on such projects). The company’s policy was that business needs ought to guide development and not the other way around. Having project managers didn’t necessarily mean top down workflow. These managers had to listen to input from the particular experts as well as be able to ask good questions. It was basically a collaborative learning as well as teaching environment. And it makes sense for large-scale projects.

But likewise, smaller projects (helping improve a particular department’s tools/workflow or creating something new based on new business demands) usually consisted of a developer or two acting as a project manager to work with a representative from the department–again, someone who had the particular business expertise. It was a collaborative effort. In either of these scenarios, it took someone with vision as well as someone with the particular know-how. In my own experiences, any sort of successful project often boils down to someone having great trouble-shooting skills regardless of whether it’s an IT related project or a strictly business practice related one.

Having said this though, I believe these same sorts of trouble-shooting skills are at the heart of writing essays as well as research projects in general. You break down the paper into sections that you know you need to explore, then work on learning what it is you need to in order to do the exploring. This may involve asking other experts, such as advisers, for leads to articles or books. Granted, projects involving developing research/archival sites or tools can feel a lot more like building a house (which can require a lot of different domains of expertise)–which brings me round again to my opening comments about the scope and time-frame of a project. I’ve been wrestling with my own project this last year, knowing I don’t have forever to learn all the necessary programming languages and tools I believe I need to pull it off. But with slow, very minor steps, such as getting my feet wet last year with TEI via Brown University’s text encoding workshops followed by an XSLT class at the Digital Humanities Summer Institute last month, though I don’t possess any experience with these tools, I’m seeing how I can actually get at some of my project’s questions while also seeing a way to maybe narrow the scope. At least today I feel this way. I admit though, that after hearing at the DHSI of all the different projects people are working on, I was overwhelmed by how large they were, as well as by the large infrastructures (whether it was time, training, developers, etc. through such organizations as the Nines) they required; resources I don’t have. But the good news is that experiences with my smaller projects may lead to work with these larger collaborative efforts.

Back in my IT days, we used to refer to ourselves with that old saw about being a Jack of all trades, master of none. These days, I feel more like the squire…. The cool part about the promise of the Digital Humanities is the amount of cross-collaborative possibilities it holds. As organizations like Project Bamboo mature, they hopefully will become the model of an open market place of skills within which different universities and organizations can trade such skills frequently and within a flattened hierarchy. When that comes, I think the idea of cross-collaboration project managers will become more important than any one individual needing to know not only how to program but multiple languages. But this then brings up a different issue: what happens when the majority of people within the field want to be only project managers? Will that create an imbalance that will eventually force people to acquire the procedural literacy that Claire and the rest of us are asking about? Hmmm. Might be best to work on those skills, if slowly, just in case.


More on Using Microsoft Word’s Find/Replace tool

Just because I’m finding using Microsoft Word’s find/replace tool so much fun, I thought I would share another experience with it.

I had originally converted all my italicized words and phrases to tagged items in my earlier data prep entries. After having run my macro to output author and book lists, I found lots of mistakes in my (manual) tagging. But they were easy enough to correct. As I worked along, I also corrected spellings (words connected to other words, filling out full names, etc.). But I didn’t want to do all this again for my “clean” (readable) copy of the text file.

Now, I must admit, that for readable copies of texts, I need formatting such as italics. You may argue with me on future compatibility grounds, but the fact is, my reading experience, even with data, must take precedence over any such issues. So to “reprettify” my text, I used Microsoft Word’s find/replace feature to 1) find certain sets of tags and then 2) get rid of the surrounding tags, leaving the embedded content, and 3) format that content.

So in the Find/Replace window, I searched for:


and replaced with:


and added the italics format.















(*be sure to click More and check the Use wildcards option)

As in other programs that use regular expressions, you can group parts of a pattern with parentheses so that you can refer to them later by position (the first, second, or third grouped item). In this case, I wanted to get rid of the tags (<work> and </work>) but retain all the content between them, which is the second group.

The backslashes “\” tell Word that I’m looking for the literal character that follows (in other words, escaping special characters like “<”). The reason I added a wildcard “*” after “<work” was that I had originally created my data file using <book>, <poem>, etc. However, while starting to mark up a different set of files, I decided to use a broader tag name, <work>, with an attribute of “type” (i.e. <work type=”book”>). So when I want to find complete tags, I have to account for the extra information between the name and the end bracket of the opening tag.
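For readers more used to conventional regular expressions, here is a minimal sketch of the same idea in Python. It is only an analogue, not the Word search itself: Python's `.*?` stands in for Word's "*", `[^>]*` covers any attributes after the tag name, and "<" needs no escaping. The function name and the sample sentence are made up for illustration, and the italics formatting step has no plain-text equivalent, so it is left out.

```python
import re

# Rough analogue of the Word wildcard search: three groups, of which only
# the second (the content between the tags) is kept.
WORK_TAG = re.compile(r"(<work[^>]*>)(.*?)(</work>)", re.DOTALL)


def strip_work_tags(text: str) -> str:
    """Drop <work ...> and </work>, keeping only the enclosed content."""
    return WORK_TAG.sub(r"\2", text)


# Hypothetical sample line, not taken from the actual data file.
sample = 'I reread <work type="book">Leaves of Grass</work> last week.'
print(strip_work_tags(sample))
```

The `\2` in the replacement plays the same role as the second group in the Word search: whatever was matched between the opening and closing tag is written back, and the tags themselves are discarded.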


Here is an example of what I’m starting with:


And here’s what it looks like after finding/replacing the text:



Now, I could have done this for all tags rather than just this one particular set. But I didn’t want to highlight names within my reading (though I may change my mind about that). To do so,  I would have to alter the search terms to something like this:


Notice the “[!/]” I added to the first group of the pattern, (\<[!/]*\>)(*)(\</*\>). This tells Word to make sure that the character following the “<” is not a forward slash “/”. I needed to do this because I saw that although Word found the first complete phrase just fine, without that restriction the next match would begin with the closing tag of the previous phrase, causing my opening/ending tags to get out of alignment. I didn’t run into this problem in my first example because the first group within my search terms precluded it from bringing back a closing tag; that is, it could only be an opening tag.

So just to walk through this (to help make sure I’m following this myself), Group 1, (\<[!/]*\>), reads:

“Look for the “<” character, followed by something that is NOT a “/” character (again, to exclude a closing tag from the beginning), followed by all characters up to and including the opening tag’s closing bracket, “>”.”

The second group, (*), reads as

“(continue to) get me all characters”, followed by

the third group, (\</*\>), the final piece of the pattern:

“Look for the characters “</” (the opening bracket of the closing tag), followed by any text up to and including the final bracket of the closing tag.”
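The three groups walked through above can be sketched in Python as a rough equivalent. This is an illustration under assumptions, not Word's own engine: `[^/]` stands in for Word's "[!/]", `.*?` for "*", and the helper name and sample sentence are hypothetical.

```python
import re

ANY_TAG_PAIR = re.compile(
    r"(<[^/][^>]*>)"  # group 1: "<", one character that is not "/", then everything through ">"
    r"(.*?)"          # group 2: the content, matched as briefly as possible
    r"(</[^>]*>)",    # group 3: "</" through the closing ">"
    re.DOTALL,
)


def tag_groups(text: str) -> list:
    """Return the (opening tag, content, closing tag) triple for each match."""
    return [m.groups() for m in ANY_TAG_PAIR.finditer(text)]


# Hypothetical sample line with two different tag names.
sample = '<person>Walt Whitman</person> wrote <work type="book">Leaves of Grass</work>.'
for opening, content, closing in tag_groups(sample):
    print(opening, "|", content, "|", closing)
```

Because group 1 refuses a "/" right after the "<", a match can never start on a closing tag, which is the alignment problem the restriction was added to avoid.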

As I was typing this out, I noticed that there is a potential problem with using multiply embedded tags which my first find/replace would not encounter (again, due to specifying the particular tag). That is, I’m guessing that if I had the line,

“<person>John Anderson’s</person> essay, <work type=”essay”>Surviving <person>Walt Whitman’s</person> <work type=”book”>Leaves of Grass</work></work>”

the find/replace pattern would probably retrieve:

<work type=”essay”>Surviving <person>Walt Whitman’s</person>

which is not aligned. I can see how to fix this easily in a macro, just by saving off the tag name part in a variable to use later. If Word’s search/replace tool worked like other programs using regular expressions, I imagine that I could do the same thing by subdividing Group 1’s pattern, (\<[!/]*\>), into more groups: (\<[!/])(*)(\>), so that I could then use “\2” in the tag name part of the original Group 3’s pattern (where the “*” was originally): (\</\2\>), so that it would look like this:


Just out of curiosity, I went ahead and tested this, and the embedded tags were indeed a problem. However, my solution only partially worked; that is, it found the first person tag just fine, but it only works with tag names that don’t contain anything else (like attributes) within the brackets besides the tag name itself. There might be a way around this using exclusion (!) and the range “[]” (the square brackets), but for now, I still think this is pretty cool. It just is good to know about problems/limitations before you start tagging your file so that you can come up with a scheme that will work using the tools you want (or have).
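To make the misalignment concrete, here is a small Python sketch run against the example line above (with straight quotes in the attributes, which is an assumption about the real file). It compares a generic "any opening tag, any closing tag" pattern with one that captures only the tag name and reuses it through a backreference. The pattern names are hypothetical, and this is an analogue of the idea rather than a Word wildcard search.

```python
import re

line = (
    "<person>John Anderson's</person> essay, "
    '<work type="essay">Surviving <person>Walt Whitman\'s</person> '
    '<work type="book">Leaves of Grass</work></work>'
)

# Generic pattern: any opening tag, content, any closing tag.
GENERIC = re.compile(r"(<[^/][^>]*>)(.*?)(</[^>]*>)")

# Backreference pattern: capture only the tag name (group 1), allow
# attributes after it, and require the closing tag to repeat that name.
SAME_NAME = re.compile(r"<(\w+)\b[^>]*>(.*?)</\1>")

generic_hits = [m.group(0) for m in GENERIC.finditer(line)]
same_name_hits = [(m.group(1), m.group(2)) for m in SAME_NAME.finditer(line)]

print(generic_hits[1])    # opening <work> paired with a closing </person>
print(same_name_hits[1])  # opening <work> paired with a closing </work>
```

Capturing the name separately from the attributes sidesteps the attribute limitation, since only the name has to repeat in the closing tag. It does not fully solve nesting, though: when a tag is nested inside another tag of the same name, as with the two work tags here, the outer opening tag is paired with the inner closing tag, and one </work> is left over. Handling that case properly takes something that tracks open tags as it goes, such as a macro with a stack or a real XML parser.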

Although I’ve mentioned before that I’m not so worried about “elegant” solutions, the danger of finding new tricks for myself is that it can sometimes keep me from moving forward on projects because of the fun in tweaking them…

Lab Notebook: Data Prep 2 and Wordsmith and other tools


Now, in my previous post, I explained that I was tagging some data in my exam corpus for future use as well as simply making reading and author lists. I was planning on using a feature in Wordsmith Tools that allows you to use tags as selectors in addition to Markup to INclude or EXclude. But I’ve been having a heck of a time making it work the way I want. Initially, I was using a text file encoded in big-endian format for some reason, but after some emails to the creator of Wordsmith Tools, Mike Scott, he quickly and politely explained that WS doesn’t like that format and instead prefers little-endian. WS thankfully also has a Unicode converter within its utilities that quickly fixed that problem, but WS 5’s version is buggy; WS 6’s beta version worked great.
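If the converter is not at hand, the same byte-order change can be sketched in a few lines of Python. This assumes the file really is UTF-16 big-endian, and it writes a byte-order mark at the front of the output, which is an assumption about what the receiving tool expects rather than something confirmed above. The function name is hypothetical.

```python
import codecs


def utf16_be_to_le(data: bytes) -> bytes:
    """Re-encode UTF-16 big-endian bytes as little-endian with a BOM."""
    # Drop a big-endian byte-order mark if the file starts with one.
    if data.startswith(codecs.BOM_UTF16_BE):
        data = data[len(codecs.BOM_UTF16_BE):]
    text = data.decode("utf-16-be")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


# Small in-memory check with a made-up string instead of a real corpus file.
big_endian = codecs.BOM_UTF16_BE + "Leaves of Grass".encode("utf-16-be")
little_endian = utf16_be_to_le(big_endian)
print(little_endian[:2] == codecs.BOM_UTF16_LE)
```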

I tried the following based on Wordsmith’s help (using Tags) on the web.

So within the Tags section of the settings, I told WS to automatically load a tag file in the Markup to Include section with just this line: <book>,</book>:










Don’t forget to click “load” (it’ll say “clear” if you have already loaded it). It’ll show you what tags it found:


In the Only Part of File, Sections to Keep, I have : <book> to <book>    (the Sections to Cut out is empty):








When I create a new concordance on the tag <book>*, it finds all 8 entries in my test file. However, it brings back the full concordance context line:




After trying a bunch of different search words as well as checking the settings, I emailed Mike Scott just to make sure I’m understanding what WS can actually do with this as well as to check my settings file. So while I’m waiting to hear back from him, I decided to go ahead and write a VBA macro (MS Word). I basically recorded my find of the tags and then edited it to save the results as a text file:


(btw, I’m trying out My Syntax to display my code snippets.)

After my macro runs, I see:


and my text file looks like this:


I can now clean this up by finding/replacing the tags with nothing, or open it up in Excel. The point is, I have my lists. I also can now count how often particular titles are referenced.

As I said in my comments, I would have liked NOT to hard-code the file name and path; that is, I would like the macro to prompt the user for that information. But maybe later. Or maybe not at all if I am able to do this in Wordsmith. But now that I have my macro, I have other options… My main point in doing this in VBA (besides getting at the data I want) is to highlight the usefulness of having multiple tools to get at the same or similar data. As I’ve mentioned elsewhere, I’m more interested in getting at the data than in elegant solutions (though they may be cool ones).
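In the spirit of having more than one tool for the same data, here is a minimal Python sketch of the same task: pull out everything tagged as a book, write the list to a text file, and count how often each title appears. It is not the VBA macro, and the function names and sample text are hypothetical. The file paths are passed in as parameters rather than hard-coded, and the sketch assumes the files are UTF-8. Unlike the macro output described above, it drops the tags as it goes.

```python
import re
from collections import Counter
from pathlib import Path

BOOK = re.compile(r"<book>(.*?)</book>", re.DOTALL)


def book_titles(text: str) -> list:
    """Return every <book>...</book> title in order, with whitespace tidied."""
    return [" ".join(title.split()) for title in BOOK.findall(text)]


def write_book_list(source: Path, target: Path) -> Counter:
    """Write one title per line to target and return a count per title."""
    titles = book_titles(source.read_text(encoding="utf-8"))
    target.write_text("\n".join(titles) + "\n", encoding="utf-8")
    return Counter(titles)


# Made-up sample text standing in for the exam corpus.
sample = "<book>Walden</book>, then <book>Moby-Dick</book>, then <book>Walden</book> again."
counts = Counter(book_titles(sample))
for title, n in counts.most_common():
    print(n, title)
```

Calling `write_book_list(Path(...), Path(...))` with whatever source and output paths are wanted covers the file-writing step, and the resulting one-title-per-line file opens directly in Excel.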

Next, I’ll share some results and future directions.