What It Takes to Write a Best-Seller: Does Science Have the Answer?

By on January 17, 2014

Statistical stylometry sits at the crossroads of science and literature in a new study co-authored by Yejin Choi, an assistant professor at Stony Brook University in New York. The study, presented at the 2013 Conference on Empirical Methods in Natural Language Processing, built a model that predicts the success of novels with up to 84% accuracy and of films with up to 89% accuracy. It also identifies writing styles that may help a fictional work succeed.

What is statistical stylometry? It is a systematic, quantitative method for measuring lexical and syntactic variation between authors or genres, and it can help researchers identify which words and phrases correlate with success in the literary market. Because success is tricky to define, Choi’s study measures it by download counts from Project Gutenberg, an online library of about 42,000 works. Beyond this extensive collection, the researchers also considered Pulitzer Prize and Nobel Prize winners, as well as other well-known novels not available for download from Project Gutenberg. They even extended the experiment to movie scripts, measuring film success by IMDb ratings.
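
To make “lexical variation between authors” concrete, here is a minimal Python sketch. It is an illustration, not the study’s method: each text is represented by the relative frequencies of a short, hypothetical list of function words, and two such profiles are compared with cosine similarity. The word list and the function names `profile` and `cosine_similarity` are assumptions made for this example; real stylometric work relies on much larger feature sets.

```python
import math
import re
from collections import Counter

# Hypothetical list of common function words (an assumption for illustration);
# stylometric studies typically use far longer, carefully chosen lists.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "i", "me", "when", "or", "in"]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word among all word tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two profiles; 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a text with none of the listed words has no usable profile
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    a = profile("I remember the hill, and when the rain came I stayed in.")
    b = profile("The troops murdered the slaves on the beach.")
    print(cosine_similarity(a, b))
```

A text compared with itself scores 1.0 (up to floating-point rounding) as long as it contains at least one listed word; lower scores indicate more dissimilar function-word habits.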

The researchers examined the first 1,000 sentences of each of the 800 books in their sample, analyzing lexicon and syntax, or, more simply put, word use and sentence structure. They found that less successful books tended to use negative words such as “slaves” or “murdered.” The results also suggest that action verbs such as “urge” and “glare,” along with a focus on physical locations such as a “beach” or a “hill,” are correlated with less successful books. More successful works, on the other hand, tended to use words describing thought, such as “remember,” first-person pronouns such as “I” or “me,” and connective words such as “and,” “when,” and “or.”
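
The kind of word-use measurement described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the researchers’ pipeline: the word categories are built only from the example words quoted in this article, sentences are split with a naive regular expression, and the names `first_sentences`, `category_rates`, and `CATEGORIES` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical categories built only from the example words quoted above;
# the study's actual feature set is not reproduced here.
CATEGORIES = {
    "negative": {"slaves", "murdered"},
    "action_verbs": {"urge", "glare"},
    "locations": {"beach", "hill"},
    "thought": {"remember"},
    "first_person": {"i", "me"},
    "connectives": {"and", "when", "or"},
}


def first_sentences(text: str, limit: int = 1000) -> list[str]:
    """Naively split on '.', '!' or '?' followed by whitespace; keep the first `limit`."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return sentences[:limit]


def category_rates(text: str, limit: int = 1000) -> dict[str, float]:
    """Occurrences of each category per 1,000 word tokens in the first `limit` sentences."""
    sample = " ".join(first_sentences(text, limit))
    tokens = re.findall(r"[a-z']+", sample.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {
        name: 1000 * sum(counts[w] for w in words) / total
        for name, words in CATEGORIES.items()
    }
```

Calling `category_rates(book_text)` returns one rate per category for a single book; a predictive model would need such features for many books, paired with a success measure like download counts.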

Choi claims that this study is the first to take a quantitative approach to the connection between writing style and a book’s success; that is, the research produced a model based on numerical data. Earlier studies were qualitative and examined much smaller samples of books, so they offered a less reliable way to judge whether an author’s style would make a book successful. Still, like many scientific findings, this study demonstrates only correlation. In other words, thoughtful words do not guarantee success; successful books simply happened to contain more of them.

About Rhiana Simon