Taking Down Prosecraft.io

Benji Smith
Published in The Shaxpir Blog
Aug 7, 2023


Today, I’m taking down the prosecraft.io website, which had previously been dedicated to the linguistic analysis of literature, including more than 25,000 books by thousands of different authors.

I originally started working on this project more than ten years ago, when I began writing a memoir about a difficult time in my life. It was my first book, and I didn’t know how many words I should write. I had heard that “real books” should be about 100,000 words. I searched the internet for more specific guidance but I didn’t find much…

So I pulled a few paperbacks off of my own shelves — books by authors I admired — and counted by hand how many words were on the first few pages. Then I counted the total number of pages, and multiplied the two numbers to get an estimate.
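That back-of-the-envelope estimate can be sketched in a few lines of Python. This is only an illustration: the function name and the sample numbers are hypothetical and do not come from the original spreadsheet.

```python
def estimate_word_count(sample_page_counts, total_pages):
    """Estimate a book's length from hand counts of a few sample pages."""
    if not sample_page_counts:
        raise ValueError("need at least one sampled page")
    # Average the hand-counted pages, then scale up to the whole book.
    words_per_page = sum(sample_page_counts) / len(sample_page_counts)
    return round(words_per_page * total_pages)

# Hypothetical numbers: three hand-counted pages from a 320-page paperback.
estimate = estimate_word_count([312, 298, 305], total_pages=320)
print(estimate)
```

Averaging several pages rather than relying on one smooths out pages that happen to be heavy on dialogue or on dense description.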

I kept a little spreadsheet, and it was precious to me… Precious guidance from authors whose books I adored, when I was struggling to tell my own story.

After I published that book, I was so moved by the experience that I started my own company to make tools for authors. I created Shaxpir to be a desktop word processor for storytellers, and I expanded my little spreadsheet into a database.

I heard a story on NPR about how Kurt Vonnegut invented an idea about the “shapes of stories” by counting happy and sad words. The University of Vermont “Computational Story Lab” published research papers about how this technique could show the major plot points and the “emotional story arc” of the Harry Potter novels (as well as many, many other books).

So I tried it myself and found that I could plot a graph of the emotional ups and downs of any story. I added those new “sentiment analysis” tools to the prosecraft website too.
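The general idea of counting happy and sad words can be sketched as follows. This is a minimal illustration, not the code behind prosecraft or the Computational Story Lab's method: the two word lists are tiny, made-up stand-ins for a real scored lexicon, and the function name and segment count are assumptions.

```python
import re

# Tiny, made-up word lists for illustration only; a real analysis would
# rely on a much larger scored lexicon.
HAPPY_WORDS = {"happy", "joy", "love", "laugh", "smile", "hope"}
SAD_WORDS = {"sad", "cry", "fear", "death", "lost", "pain"}

def emotional_arc(text, num_segments=10):
    """Return one score per segment: (happy - sad) / words in the segment."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return []
    # Never ask for more segments than there are words.
    num_segments = min(num_segments, len(words))
    arc = []
    for i in range(num_segments):
        # Integer arithmetic keeps the segment boundaries exact.
        start = i * len(words) // num_segments
        end = (i + 1) * len(words) // num_segments
        segment = words[start:end]
        happy = sum(w in HAPPY_WORDS for w in segment)
        sad = sum(w in SAD_WORDS for w in segment)
        arc.append((happy - sad) / len(segment))
    return arc

sample = "We laugh and smile with joy. Then came pain and fear, and we cry."
arc = emotional_arc(sample, num_segments=2)
print(arc)  # first half scores positive, second half negative
```

Plotting the returned list against segment position gives a rough picture of the ups and downs of a story. Longer texts and more segments give a smoother curve.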

When I ran out of books on my own shelves, I looked to the internet for more text that I could analyze, and I used web crawlers to find more books. I wanted to be mindful of the diversity of different stories, so I tried to find books by authors of every race and gender, from every different cultural and political background, writing in every different genre and exploring all different kinds of themes. Fiction and nonfiction and philosophy and science and religion and culture and politics.

Somewhere out there on the internet, I thought to myself, there was a new author writing a horror or romance or fantasy novel, struggling for guidance about how long to write their stories, how to write more vivid prose, and how much “passive voice” was too much or too little.

I wanted to give those budding storytellers a suite of “lexicographic” tools that they could use, to compare their own writing with the writing of authors they admire. I’ve been working in the field of computational linguistics and machine learning for 20+ years, and I was always frustrated that the fancy tools were only accessible to big businesses and government spy agencies. I wanted to bring that magic to everyone.

It would be fun! It would be interesting! It would be useful!

I researched copyright laws, mindful of not wanting to hurt or offend the community of authors that I cared so much about. Since I was only publishing summary statistics, and small snippets from the text of those books, I believed I was honoring the spirit of the Fair Use doctrine, which doesn’t require the consent of the original author.

And since I never shared the text that I acquired by crawling the internet, I believed that I was in compliance with the relevant laws, including the DMCA. I had never heard of “shadow libraries,” and I never attempted to acquire such a thing.

The users of the Shaxpir desktop application, and their writing stored in our cloud databases, were strictly off-limits. I never applied the prosecraft analytics to anyone’s works-in-progress. I only ever incorporated books that were published publicly, and whose text could easily be found by crawling the internet. If someone could write a Wikipedia article about the plot and characters of a book, I reasoned that I could publish linguistic summary stats in the same spirit.

For example, here’s what the summary statistics look like for Alice’s Adventures in Wonderland, by Lewis Carroll:

The site also included small snippets to illustrate what “vivid” and “passive” pages look like in context:

The sentiment analysis of each book showed the low-point and the high-point of the story, with word-clouds showing how the specific language on the page contributes to the emotional story arc of the characters. Here’s what that analysis looks like for Alice’s Adventures in Wonderland:

The site also included a chart showing the numerical distributions across the entire library so that authors could understand these linguistic measurements in context. This is what the distribution of “word-count” looks like across all books:

It’s useful to know that a typical book contains about 86,000 words, and that the top 10% of all books have between 130,000 and 250,000 words.

That’s info you can truly use in your daily writing practice!
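Summary figures like these can be computed from a list of per-book word counts with Python's standard `statistics` module. The sketch below is illustrative only: the eleven word counts are invented to echo the figures quoted above and are not data from the actual library.

```python
import statistics

# Invented word counts for eleven hypothetical books.
word_counts = [
    45_000, 52_000, 61_000, 70_000, 78_000, 86_000,
    91_000, 99_000, 110_000, 130_000, 250_000,
]

# The median is the "typical" book: half are shorter, half are longer.
typical = statistics.median(word_counts)

# quantiles(n=10) returns nine cut points; the last one is the
# 90th percentile, the threshold for the longest 10% of books.
top_decile_threshold = statistics.quantiles(word_counts, n=10)[-1]

print(typical)
print(top_decile_threshold)
```

An author can then compare the length of a draft against the median and the top-decile threshold to see where it falls in the distribution.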

I launched the prosecraft website in the summer of 2017, and I started showing it off to authors at writers’ conferences. The response was universally positive, and I incorporated the prosecraft analytic tools into the Shaxpir desktop application so that authors could privately run these analytics on their own works-in-progress (without ever sharing those analyses publicly, or even privately with us in our cloud).

I’ve spent thousands of hours working on this project, cleaning up and annotating text, organizing and tweaking things. A small handful of authors have even reached out to me, asking to have their books added to the website. I was grateful for their enthusiasm.

But in the meantime, “AI” became a thing.

And the arrival of AI on the scene has been tainted by early use-cases that allow anyone to create zero-effort impersonations of artists, cutting those creators out of their own creative process.

That’s not something I ever wanted to participate in.

Today the community of authors has spoken out, and I’m listening. I care about you, and I hear your objections.

Your feelings are legitimate, and I hope you’ll accept my sincerest apologies. I care about stories. I care about publishing. I care about authors. I never meant to hurt anyone. I only hoped to make something that would be fun and useful and beautiful, for people like me out there struggling to tell their own stories.

For what it’s worth, the prosecraft website has never generated any income. The Shaxpir desktop app is a labor of love, and during most of its lifetime, I’ve worked other jobs to pay the bills while trying to get the company off the ground and solve the technical challenges of scaling a startup with limited resources. We’ve never taken any VC money, and the whole company is a two-person operation just working our hardest to serve our small community of authors.

In the future, I would love to rebuild this library with the consent of authors and publishers. I truly believe these tools are useful for creative people. But now is not the right time. I understand. And I’m sorry.
