Prosecraft: Linguistics for Literature

Make your writing more vivid by mindfully comparing your prose with authors you admire.

Benji Smith
The Shaxpir Blog

--

Come explore our library at http://prosecraft.io

How vivid is your writing?

You’ve probably been told over and over again to get rid of adverbs, or to avoid passive voice, or to use more sensory language in your prose.

But how well are you actually doing? How does your writing actually compare with the authors you admire?

Today, I’m absolutely giddy to announce Shaxpir’s latest product for professional writers: Prosecraft, the world’s first (and only!) linguistic database of literary prose.

Prosecraft finally gives you a reliable measurement of the different aspects of your prose, like vividness or passive voice, and lets you compare those measurements with authors you admire. It helps you answer some of the nagging questions you might have about the quality and style of your own writing, so that you can create more compelling stories and be more expressive with your prose.

To do that, we assembled an enormous library of fiction — more than 270 million words of prose, written by over a thousand different authors — and we analyzed all that text using a sophisticated suite of linguistic algorithms.

Word Count

As you start writing your next novel, one of the first questions you might ask yourself is: how long should it be?

You’ve probably heard that a minimal novel has about 50,000 words, or that traditional publishers consider a “full length” novel to be about 100,000 words. But what about your favorite novels?

When I started writing my memoir, I actually gathered together a few of the memoirs I admired most (for example: Eat, Pray, Love and A Heartbreaking Work of Staggering Genius) and counted by hand the number of words on the first few pages. Then I counted the total number of pages, and multiplied the two numbers to get an estimate for the total length of the book.
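The hand-counting method described above boils down to one multiplication. Here is a minimal sketch of it; the function name and the sample numbers are made up for illustration and are not part of Prosecraft.

```python
def estimate_word_count(sample_page_counts, total_pages):
    """Estimate a book's length from a few hand-counted pages.

    sample_page_counts: word counts for the pages you counted by hand.
    total_pages: number of pages in the whole book.
    """
    if not sample_page_counts:
        raise ValueError("need at least one sampled page")
    # Average words per page across the sampled pages.
    words_per_page = sum(sample_page_counts) / len(sample_page_counts)
    return round(words_per_page * total_pages)


# Hypothetical counts for three pages of a 300-page memoir.
estimate = estimate_word_count([248, 255, 247], total_pages=300)
print(estimate)
```

The estimate is only as good as the sampled pages: chapter openings, dialogue-heavy pages, and blank pages all pull the average around.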

It was a tedious process, and it made me wonder why I couldn’t find the information already on the internet anywhere.

With Prosecraft, you don’t have to guess anymore.

Prosecraft shows you exactly how many words are in your favorite books, and how those books compare with all the other items in the library. For example, the first Harry Potter book has 77,494 words, which puts it in the 31st percentile, compared to all the books in our library. But by the time the final Harry Potter book came out, that number had grown to 197,985 (about 792 pages), putting it in the 95th percentile.
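To make the idea of a percentile concrete, here is a rough sketch of one common way to compute it: the share of books in the library with a strictly lower word count. The ten-book library is invented for the example (only the two Harry Potter counts come from the text above), and Prosecraft's exact percentile formula is not published here, so treat the definition as an assumption.

```python
from bisect import bisect_left


def percentile_rank(value, population):
    """Percentage of items in `population` that are strictly below `value`."""
    ordered = sorted(population)
    below = bisect_left(ordered, value)  # how many items are smaller
    return 100.0 * below / len(ordered)


# Hypothetical ten-book library; only 77,494 and 197,985 come from the article.
library_word_counts = [
    52_000, 61_500, 77_494, 88_000, 95_300,
    104_000, 120_750, 143_200, 168_400, 197_985,
]

first_book = percentile_rank(77_494, library_word_counts)
final_book = percentile_rank(197_985, library_word_counts)
print(first_book, final_book)

# The "about 792 pages" figure is consistent with roughly 250 words per page.
pages = round(197_985 / 250)
print(pages)
```

With a library this small the percentiles differ from the real ones, of course. The point is only that a percentile is a position in a sorted list, not a score.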

If you’ve read those books, you can combine your experience as a reader with these newfound facts about word-count to make an informed choice about how long to make your own novel.

Prosecraft is a new kind of measurement tool, empowering authors to understand their linguistic choices, and make more mindful decisions about the craftsmanship of their prose.

Vividness

The most vivid writing always evokes a sensory experience, summoning a world of images, sounds, smells, flavors and textures, and bringing the reader viscerally into the hypnotic trance of the story.

So we devised a new kind of linguistic metric, which we call “vividness,” to measure the relative intensity of the sensory language in any piece of writing.

We trained our linguistic algorithms by analyzing all 270 million words of prose in our library, identifying the most vivid nouns, verbs, and adjectives in each book, and scoring each word on a scale of 1 to 10 according to the intensity of its vividness.

For example, the word “dewdrop” has a vividness score of 9.5, and the word “eyelash” has a score of 6.3.

Then, we tallied up the average vividness for every book in the library, sorted the results into a rank-order list, and published the rankings on each book’s Prosecraft page.
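As a loose illustration of the process described above, the sketch below looks up each word in a small scored lexicon and reports two numbers: the share of words that count as vivid, and the average score of those words. Only the scores for "dewdrop" and "eyelash" come from this article. The other lexicon entries, the threshold of 6.0, and the function names are assumptions made for the example, and the real scoring algorithm is certainly more involved.

```python
import re

# Word -> vividness score on a 1-to-10 scale.
# "dewdrop" and "eyelash" are from the article; the rest are invented.
VIVIDNESS = {"dewdrop": 9.5, "eyelash": 6.3, "thunder": 8.0, "velvet": 7.5}


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def vividness_profile(text, lexicon=VIVIDNESS, threshold=6.0):
    """Return the share of vivid words and their mean score.

    A word counts as vivid when its lexicon score is at least `threshold`
    (an assumed cutoff, not a documented one).
    """
    words = tokenize(text)
    if not words:
        return {"vivid_share": 0.0, "mean_score": 0.0}
    scores = [lexicon[w] for w in words if lexicon.get(w, 0.0) >= threshold]
    return {
        "vivid_share": len(scores) / len(words),
        "mean_score": sum(scores) / len(scores) if scores else 0.0,
    }


sample = "A dewdrop clung to her eyelash as thunder rolled."
profile = vividness_profile(sample)
print(profile)
```

Here three of the nine words are in the lexicon, so the vivid share is one third. Ranking a library then comes down to computing this profile per book and sorting.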

For example, here’s a paragraph from Max Gladstone’s latest novel, Ruin of Angels, with all the most vivid words highlighted:

The exact number itself (45.11% vividness, in this case) isn’t important, except insofar as it lets us compare different books against one another.

When we compare all the books in our library, we see that Ruin of Angels is in the 95th percentile for vividness, making it one of the most vivid books in our entire catalogue. Only 5% of other novels have a greater proportion of highly vivid words.

By contrast, here’s a paragraph from Agatha Christie’s 1972 detective novel, Elephants Can Remember, with the exact same kind of sensory highlighting:

What a difference!

There are far fewer sensory words in this sample, and the ones that remain are a lot less vivid. Analyzing the entire book yields an average vividness of only about 15%, putting it at the very bottom of the vividness rankings — at the zero-percentile mark — compared with all the other books in our library.

Prosecraft makes it possible for any author to research the prose metrics of the authors they admire, and use those metrics to inform the craftsmanship of the prose in their own stories.

Passive Voice

To measure passive voice, we use the same basic process. But instead of counting the number of vivid sensory words, we measure the total number of helping verbs (be, am, is, are, was, were, etc.).
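Counting helping verbs as a share of all words can be sketched in a few lines. The verb set below contains only the six forms listed above; the list in the text continues with "etc.", and the sketch deliberately does not guess at the rest. The function name and the sample sentence are made up for illustration.

```python
import re

# Only the six forms named in the article; its list goes on ("etc.").
HELPING_VERBS = {"be", "am", "is", "are", "was", "were"}


def helping_verb_share(text):
    """Fraction of words in `text` that are helping verbs."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in HELPING_VERBS)
    return hits / len(words)


sample = "The letter was written in haste, and the doors were locked."
share = helping_verb_share(sample)
print(f"{share:.1%}")
```

A count like this is a proxy: it also picks up sentences such as "She was happy," which use a helping verb without being passive. That seems fine for comparing books against each other, less so for judging a single sentence.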

Here’s an example paragraph from Nick Hornby’s novel Juliet, Naked. With passive verbs constituting about 10.3% of the total word count, this book is at the 99th percentile of passive voice:

It’s important to emphasize that this process isn’t prescriptive. Prosecraft doesn’t tell you how to write your novel. It just gives you a yardstick to measure the linguistic characteristics of your writing, so that you can make objective comparisons against the authors you admire.

As it turns out, Nick Hornby is one of my own favorite authors. Since his stories focus so much on the experience of social detachment, it’s interesting to see how his writing style incorporates so many passive-voice constructions. The stylistic tone of the writing reinforces the themes.

There’s no single correct way to write. But with Prosecraft, when you choose a particular linguistic style, you can do so mindfully, with full awareness of your language choices and how they affect your writing.

Adverbs

You can also use Prosecraft to see the proportion of adverbs in every book. We’ve even separated the ly-adverbs you know so well (slowly, quickly, angrily, etc.) from the non-ly-adverbs you might not have even known were adverbs (almost, how, never, now, often, so, etc.).
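Here is a small sketch of the ly / non-ly split, using only the example adverbs named above as a lookup list. A real system would most likely use a part-of-speech tagger, since words like "so" and "how" are not always adverbs and plenty of words ending in "ly" (such as "family") are not adverbs at all. The function name and sample sentence are invented for the example.

```python
import re

# Example adverbs taken from the article; a real list would be far longer.
ADVERBS = {
    "slowly", "quickly", "angrily",               # ly-adverbs
    "almost", "how", "never", "now", "often", "so",  # non-ly-adverbs
}


def adverb_profile(text):
    """Split known adverbs into ly and non-ly groups and compute their shares."""
    words = re.findall(r"[a-z]+", text.lower())
    found = [w for w in words if w in ADVERBS]
    ly = [w for w in found if w.endswith("ly")]
    non_ly = [w for w in found if not w.endswith("ly")]
    total = len(words)
    return {
        "ly": ly,
        "non_ly": non_ly,
        "ly_share": len(ly) / total if total else 0.0,
        "non_ly_share": len(non_ly) / total if total else 0.0,
    }


sample = "She almost never walked slowly, so the family quickly gave up."
result = adverb_profile(sample)
print(result)
```

Because the sketch checks the lookup list first, "family" is correctly left out even though it ends in "ly".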

For example, it might not shock you to discover that Jane Austen is among the most absolutely, positively, delightfully indulgent adverb-enthusiasts of the entire literary world. Her beloved novel Pride and Prejudice, with adverbs comprising 4.3% of the total word-count, is in the 98th percentile of adverb-density of all the books in our library.

Here’s a typical example paragraph, to give you an idea of what the extremes of adverb intensity actually look like on the page:

Emotional Story Arc

Finally, we use linguistic sentiment-analysis to score each word according to its latent positive and negative emotions. When we add those numbers up, across the entire book, you can actually see the shape of the story emerge.

When the characters experience conflict, pain, and sadness, the chart goes down. When the characters are happy and content, or when their conflicts resolve, the chart climbs back up again.
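To show how a chart like that can emerge from individual words, here is a deliberately simple sketch: split the text into roughly equal sections, count positive and negative words in each, and track the difference. The two word lists, the function name, and the sample story are all invented for the example. This is plain word counting, not the actual technique used by Prosecraft or the researchers who developed the approach.

```python
import re

# Tiny hypothetical sentiment lexicons, for illustration only.
POSITIVE = {"happy", "joy", "home", "safe"}
NEGATIVE = {"pain", "dark", "fear", "lost"}


def sentiment_arc(text, sections=3):
    """Count positive and negative words in up to `sections` equal chunks."""
    if sections < 1:
        raise ValueError("sections must be at least 1")
    words = re.findall(r"[a-z]+", text.lower())
    # Ceiling division so the last chunk is never longer than the others.
    size = max(1, -(-len(words) // sections))
    arc = []
    for start in range(0, len(words), size):
        chunk = words[start:start + size]
        pos = sum(1 for w in chunk if w in POSITIVE)
        neg = sum(1 for w in chunk if w in NEGATIVE)
        arc.append({"positive": pos, "negative": neg, "net": pos - neg})
    return arc


story = (
    "They were happy and safe at home. "
    "Then came the dark, the fear, and the pain. "
    "At last they were safe and happy again."
)
arc = sentiment_arc(story, sections=3)
print([section["net"] for section in arc])
```

The net values dip in the middle section and recover at the end, which is the same down-then-up shape described above, just at toy scale.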

For example, here’s a sentiment analysis chart for The Hobbit, with one of the final sections of the book selected (containing the emotional low-point of the story). The blue bar represents the number of positive emotional words in the chapter, and the red bar shows the number of negative words. Below the chart, we can see a word-cloud with the actual positive and negative words from the corresponding selection. You can click around on any of the bars to see the emotionally charged words within the corresponding section of the story.

This incredible tool shows you how the individual word-choices that you make as an author contribute to the shape of the story arc you’re building. These techniques were originally developed by computational linguists at the UVM Computational Story Lab, and now for the first time, they’re available to authors in the general public.

Analyze your own writing.

The Prosecraft database is a goldmine of useful information for any working author. But the real magic comes from being able to apply these metrics to your own writing.

The latest version of our professional writing platform, Shaxpir 4: Pro, lets you apply each of these kinds of linguistic analysis to your own writing, in real time as you write.

You can use our linguistic highlighting engine to show you exactly which words make your writing the most vivid, and you can use our new linguistics panel to show you a high-level summary of your language metrics, telling you how your writing compares with the rankings of the books in the Prosecraft library.

These features are only available in Shaxpir 4: Pro, so if you haven’t tried Shaxpir yet, now is the perfect time to get started with a free 30-day trial.

This is just the beginning.

We have big plans for Prosecraft: over the next year, we’ll introduce lots of new types of linguistic analysis, including topics like sentence complexity, alliteration, rhythm and rhyme, vocabulary composition, and so much more.

But we also need a much, much bigger library…

With more than a quarter of a billion words of prose in our linguistic database, we’re already off to a pretty good start. But a library like this is only useful if it includes all the books you’ve read, by all the authors you admire, and with so many different kinds of writers in the world, telling so many unique stories, there are still way too many missing books.

We’re calling on the literary world to help us expand our catalogue. If you’re an author or publisher, and you’d like to see your work included in our database, get in touch with us at submissions@prosecraft.io

Together, we can build a truly incredible platform, where everyone will benefit from a more comprehensive linguistic understanding of literary prose.

--

Founder of @ShaxpirHQ, Author of Abandoned Ship http://amzn.to/1z609Qw , Loving husband of @emilylaumusic