Case Study: Automated Copy Editing
mikepasini.com

Copy editing doesn't get the respect it deserves, probably because you have to have worked in the publishing business to appreciate it. Enforcing a publication style isn't quite like accessorizing a wardrobe. It's more like having the discipline to call each thing by one name, making text much clearer for the reader. Not all of that task can be automated, but the part that can be, should be. That way, your copy editors can focus on the part that can't be automated.

Original Text

As any publisher knows, contributors provide text in a wide variety of formats. And each contributor -- whether it's a public relations firm or a feature writer -- has their own style of usage, punctuation and, well, ways of saying things.

But readers appreciate text that's been standardized into a publication's style. That's a lot of work for a copy editor. And it often isn't done. When text was committed to print, several sets of eyes would pass over it first. But with text appearing as pixels, the task of copy editing and proofreading is often skipped.

Whether contributors transmit their work as text, a PDF or in that proprietary document format we can't seem to escape, it can be massaged by smart, custom software. Putting your style into code and your code into your workflow are two tricks that don't require magic.

Below, we've concocted a sample text for an imaginary photo publication as it might appear from a new contributor. If you read through it, you'll see it isn't very well formatted. In fact, an author's style often has more to do with how well they type than anything else. The text appears as it does just because it's more convenient for them to do it that way.

original

Every line of this sample requires changes. Making them by hand is slower and less reliable than letting the program make them.


Editing Software

To get from that text to, say, Associated Press style with a few publication preferences tossed in, we write a small program in Perl. The program sets some variables like the current year, initializes some counters and then gets to work.

One of the first things it does is standardize the paragraphs to eliminate any gratuitous white space. In this case, it also enforces a block style (one blank line between unindented paragraphs) no matter how the author submitted the text.
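The paragraph cleanup described above can be sketched in a few lines. This is a Python stand-in, not the Perl program itself, and it rests on one assumption: contributors separate paragraphs with at least one blank line. The function name normalize_paragraphs is hypothetical.

```python
import re


def normalize_paragraphs(text: str) -> str:
    """Return text in block style: unindented paragraphs, one blank line apart."""
    # Unify line endings so the split pattern only has to handle "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    paragraphs = []
    # Assumption: a paragraph break is one or more blank (or whitespace-only) lines.
    for raw in re.split(r"\n\s*\n", text):
        # Collapse indents, tabs, doubled spaces and hard line breaks to single spaces.
        cleaned = " ".join(raw.split())
        if cleaned:
            paragraphs.append(cleaned)

    return "\n\n".join(paragraphs) + "\n" if paragraphs else ""
```

Running this step first means the later search-and-replace pairs never have to account for stray tabs or line breaks in the middle of a phrase.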

There are about 1,400 global search-and-replace pairs in this program to handle dates, punctuation and things like phone numbers before changing more arbitrary phrases into the publication style.
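To give a feel for what such pairs look like, here is a minimal Python sketch with a handful of rules. The patterns are illustrative guesses, not the publication's actual list, and the replacement counter is an assumption about what the program's counters might track. The names AP_MONTHS, RULES and apply_rules are hypothetical.

```python
import re

# AP style abbreviates these months when they appear with a specific date.
AP_MONTHS = {
    "January": "Jan.", "February": "Feb.", "August": "Aug.",
    "September": "Sept.", "October": "Oct.", "November": "Nov.",
    "December": "Dec.",
}

# Dates first: abbreviate only when a one- or two-digit day follows,
# so "January 2025" (month and year alone) stays spelled out.
RULES = [(rf"\b{long}(?= \d{{1,2}}\b)", short) for long, short in AP_MONTHS.items()]

RULES += [
    # Punctuation: one space, not two or more, after sentence-ending marks.
    (r"([.!?]) {2,}", r"\1 "),
    # Phone numbers: (415) 555-0100 or 415.555.0100 becomes 415-555-0100.
    (r"\(?\b(\d{3})\)?[ .-](\d{3})[ .-](\d{4})\b", r"\1-\2-\3"),
]


def apply_rules(text, rules=RULES):
    """Apply every (pattern, replacement) pair in order; return text and match count."""
    total = 0
    for pattern, replacement in rules:
        text, count = re.subn(pattern, replacement, text)
        total += count
    return text, total
```

The pairs run in list order, which is why the general cleanup rules sit ahead of any phrase-level preferences that would be appended later.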

For the most part, that means enforcing AP style. But the arbitrary changes are equally significant because they ensure a phrase appears in only one way in the publication, simplifying things for the reader.

This code happens to be available as a menu item in the editor's software, but it can also be a standalone application.
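One way to serve both uses is to write the style pass as a filter over file-like objects. The following is a Python sketch under that assumption, not a description of how the Perl program is wired in. The rule list is a two-line stand-in, and the name style_filter and the file names in the comment are hypothetical.

```python
import re
import sys

# Stand-in rule list; a real style module would carry the full set of pairs.
RULES = [
    (r"\bweb site\b", "website"),  # AP spelling
    (r"[ \t]{2,}", " "),           # collapse runs of spaces and tabs
]


def style_filter(source, sink, rules=RULES):
    """Read all text from source, apply each rule in order, write the result to sink."""
    text = source.read()
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    sink.write(text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Standalone use, for example: python style_filter.py draft.txt edited.txt
    with open(sys.argv[1], encoding="utf-8") as src, \
            open(sys.argv[2], "w", encoding="utf-8") as dst:
        style_filter(src, dst)
```

Because style_filter only needs objects with read and write methods, an editor's menu command could hand it the current document or selection, while the command-line form works on whole files.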

code

Each line is a search-and-replace command. You can easily add to them as the occasion demands.


Results

In the blink of an eye, the original text was converted into the text shown below using a menu item in the copy editor's software. Mouse over the image to see the changes.

With the text cleaned up, the copy editor can start focusing on the content without worrying as much about the form.

And because the code can be edited with the same text editor the copy editor is using, the software can instantly adapt to new requirements.

It's easy to add to the list of edits the program can make: simply copy the substitution format used in the text-only software module (above).
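In a Python sketch of the same idea, adding an edit means adding one more pair in the existing format. The "f/stop" to "f-stop" preference below is a made-up house rule for the imaginary photo publication, and RULES and apply_rules are hypothetical names.

```python
import re

# Existing pairs, each in the same (pattern, replacement) format.
RULES = [
    (r"\bweb site\b", "website"),  # AP spelling
    (r"\be-mail\b", "email"),      # AP spelling
]


def apply_rules(text, rules):
    """Apply each (pattern, replacement) pair to the text in list order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


sample = "Set the f/stop, then e-mail the file."
before = apply_rules(sample, RULES)

# A new requirement arrives: copy the format of the lines above and append it.
RULES.append((r"\bf/stop\b", "f-stop"))  # hypothetical house preference
after = apply_rules(sample, RULES)
```

Nothing else in the program changes, which is what lets the copy editor keep the style list current without waiting on a developer.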

correction

Mouse over to see the difference. And it happens just that fast, too.

