The importance of outliers in data

In this first post in a series from our ‘Data Ninja’, we discuss outliers within datasets.

An outlier is a piece of data which falls far outside the typically expected variation. It’s easy to simply view outliers as a nuisance, since they can cause problems when you attempt to create models or visualise the data.
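
To make "far outside the typically expected variation" a little more concrete, here is a minimal sketch of one common rule of thumb, the interquartile range (IQR) rule. It is only one of several possible definitions, and the function name `flag_outliers`, the multiplier `k=1.5` and the delivery times are all illustrative assumptions rather than anything taken from a real dataset.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles.

    Illustrative helper: k=1.5 is a common convention, not a fixed rule.
    """
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical delivery times in days (made-up numbers for illustration).
delivery_days = [2, 3, 3, 4, 2, 3, 5, 3, 4, 21]
print(flag_outliers(delivery_days))  # -> [21]
```

A rule like this only tells you which values to look at. It does not tell you whether the value is a mistake or a genuine signal.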

However, outliers can reveal all kinds of useful intelligence if you understand how to study them. Here are some steps for turning your outliers from annoyances into information gold mines:

1) Check for errors. Remember, it’s always possible that there’s a mistake in the record, so look for any evidence that the information was logged or processed incorrectly.

2) Think about what the outlier means in context. You aren’t just looking at numbers on a spreadsheet; it’s information about real patterns in things like shopper behaviour or product performance. Think about what a high or low number really means.

3) Gather additional information. See if the outlier is associated with any other unusual patterns in the data. Sometimes you may have to look outside the dataset: for example, events such as extreme weather might have an effect on delivery timing.
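
As a rough sketch of how steps 1 and 3 might look in practice, the snippet below first screens each record for impossible values, then checks whether an unusual value coincides with a known outside event. Everything here is hypothetical: the records, the event log, the `triage` function and the thresholds (`plausible`, `usual_max`) are invented for illustration and would need to reflect your own business.

```python
from datetime import date

# Hypothetical records: (order date, delivery time in days).
deliveries = [
    (date(2024, 1, 8), 3),
    (date(2024, 1, 9), -2),   # a negative delivery time cannot be real
    (date(2024, 1, 15), 21),
    (date(2024, 1, 16), 4),
]

# Hypothetical log of outside events, keyed by date.
events = {date(2024, 1, 15): "heavy snowfall"}


def triage(record, events, plausible=(0, 60), usual_max=7):
    """Label one record; the thresholds are assumptions, not standards."""
    day, days_taken = record
    low, high = plausible
    # Step 1: values outside the plausible range are probably logging errors.
    if not low <= days_taken <= high:
        return "likely error"
    # Step 3: for unusually high values, look for a matching outside event.
    if days_taken > usual_max:
        reason = events.get(day)
        if reason:
            return f"outlier, possibly linked to {reason}"
        return "outlier, unexplained"
    return "typical"


for record in deliveries:
    print(record[0].isoformat(), record[1], triage(record, events))
```

Step 2 is deliberately left out: deciding what a high or low number really means is a judgement call that code cannot make for you.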

If you can find out the reason for your outlier—or at least make an informed guess—you have gathered important information about your business that you might have missed by simply focusing on the average.

Posted by Data Ninja