Spot underpriced wines with tell-tale words in the description

This article was taken from the September 2013 issue of Wired magazine.

Gavin Potter's favourite wine is Château Grange Cochard from Burgundy, which, according to cellartracker.com, has "nice raspberry and plum notes with undertones of gravel and smokey minerals".

Sounds tasty. But how do these well-placed adjectives correlate with the price that a bottle fetches? Potter decided to find out, using machine-learning algorithms to predict retail price from the words in the description. He looked at how many times a word appeared alongside a particular wine and, knowing the prices, worked out which words were associated with high-value wines and which indicated cheaper ones.

After analysing reviews (in English) of 200 Bordeaux wines, Potter discovered a connection between description and price. "Expensive Bordeaux wines have words like 'graphite' or 'pencil', whereas cheaper Bordeaux have words like 'smooth'," says Potter, who is the CEO of RecSys, a data-analysis company that creates recommendation systems for dating sites and television programmes. "My friends now take this list of words around with them to spot underpriced wines." Read on to find out how to sift the fat from the jammy.

Potter's plonk algorithm

Gavin Potter used Google to find out how often each wine was associated with a particular word. For instance, searching for the terms "'Château L'Evangile' pencil" yields about 1,730,000 results, whereas typing in "'Château La Grande Maye' pencil" returns only eight. After adjusting his calculations for the total number of mentions of each wine, Potter applied algorithms to hundreds of wines, assigning a numerical value to each of 26 words that are commonly used in descriptions of wines. The words are ranked by price per percentage of times each word appears in the description of a wine.
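The adjustment for total mentions can be sketched in a few lines of Python. This is an illustration only, not Potter's actual code: the "pencil" hit counts come from the searches above, but the total-mention figures and the function name `word_share` are made up for the example.

```python
def word_share(word_hits: int, total_hits: int) -> float:
    """Fraction of pages mentioning a wine that also contain the word."""
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    return word_hits / total_hits


# Hits for "<wine> pencil", as quoted in the article.
pencil_hits = {
    "Château L'Evangile": 1_730_000,
    "Château La Grande Maye": 8,
}

# Total mentions of each wine: HYPOTHETICAL figures, not from the article.
total_hits = {
    "Château L'Evangile": 5_000_000,
    "Château La Grande Maye": 400,
}

# Normalising by total mentions stops a much-discussed wine from
# dominating simply because more pages talk about it.
shares = {
    wine: word_share(pencil_hits[wine], total_hits[wine])
    for wine in pencil_hits
}
```

With these invented totals, the raw counts differ by a factor of more than 200,000, while the shares differ by a factor of about 17. That is why the comparison is made on shares, not on raw result counts.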

Predictions

Potter then predicted prices for wines not in the original data set by calculating new values based on those parameters. Say "graphite" scores £120, a value that is applied to the percentage of sites that include the word. If 500 sites mention Château L'Evangile and 100 of them say "graphite", that is 20 percent, so multiply £120 by 20 percent (answer: £24). Adding the values for all 26 words yields the predicted price.
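The same sum can be written as a short Python sketch. It assumes each word's value acts as a weight on the share of sites that use the word, which is how the worked example (£120 × 20 percent = £24) treats it. The function name `predict_price` and the dictionary layout are illustrative, and only the "graphite" figures come from the article.

```python
def predict_price(weights, word_hits, total_hits):
    """Sum each word's weight (in pounds) times its share of mentions."""
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    return sum(
        weights.get(word, 0.0) * hits / total_hits
        for word, hits in word_hits.items()
    )


# "graphite" at £120 is the article's example; a full model would
# hold values for all 26 words.
weights = {"graphite": 120.0}

# 100 of the 500 sites mentioning the wine use the word "graphite".
word_hits = {"graphite": 100}

contribution = predict_price(weights, word_hits, total_hits=500)
print(f"£{contribution:.2f}")
```

Words without a value in `weights` contribute nothing, so the function can be fed raw counts for any vocabulary.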

This article was originally published by WIRED UK