Strategies for Using Regular Expressions to Convert Text Documents to XML

Thanks to @davidamichelson (by way of Nuzzle) for retweeting a post from the University of Pittsburgh’s Digital Humanities / Digital Studies program about their excellent tutorial on strategies for using regular expressions (RegEx) to “autotag” text documents with XML. Although the tutorial uses <oXygen/> as its editor, the suggestions apply to many other editors as well.

Many of the people who use such pattern replacements probably write scripts so they can reuse them and process multiple documents. Still, I am waiting for someone to develop a good, full-blown GUI application that includes not only a RegEx/pattern builder but also a text analysis engine to help users discover patterns worth tagging.
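
For a sense of what such a reusable script might look like, here is a minimal sketch in Python. It is my own illustration, not taken from the Pittsburgh tutorial: the element names (doc, head, p), the "Chapter N" heading rule, and the autotag function are all assumptions chosen for the example.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical rule: a block such as "Chapter 3" becomes a <head> element.
HEADING = re.compile(r"^Chapter \d+$")


def autotag(text: str) -> str:
    """Wrap blank-line-separated blocks in <head> or <p> tags inside a <doc> root."""
    # Escape &, < and > first so the raw text cannot break the markup.
    blocks = re.split(r"\n\s*\n", escape(text.strip()))
    tagged = []
    for block in blocks:
        # Collapse internal line breaks and runs of whitespace.
        block = " ".join(block.split())
        if not block:
            continue
        tag = "head" if HEADING.match(block) else "p"
        tagged.append(f"<{tag}>{block}</{tag}>")
    return "<doc>\n" + "\n".join(tagged) + "\n</doc>"


sample = "Chapter 1\n\nIt was a dark & stormy night.\nThe rain fell.\n\nChapter 2\n\nMorning came."
print(autotag(sample))
```

Escaping the reserved characters before adding any tags is the one step I would not skip, since a stray ampersand in the source text is enough to make the result ill-formed. To process many documents, the same function could be called in a loop over a folder of text files.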
