There are two fundamentally different approaches for selecting the data used to predict customer buying behavior.

The first approach involves collecting as much information as possible about your customers (demographic attributes, behaviors, preferences, etc.) so you can see what correlations there might be between this information and purchasing. This is supported by a basic tenet of Big Data theory: because we can now collect ever more data on our customers from all kinds of sources, and we have new, better technologies for managing and analyzing all that data, we should.

If this sounds similar to claims made by prior generations of data management and analytics platforms, that's because it is. Presumably, though, the sheer scale of data is the differentiator now.

Let's call this the "horizontal" approach to predictive analytics. If all the source data for this analysis were contained in a single table with one row per customer, each customer attribute would add a column, expanding the table horizontally. In this case, mining for associations between all these attributes and purchasing behavior can be done with a statistically representative sample of customer records.

Within a small subset of the total population, for example, you might determine that a college education, a recent car purchase, a visit of three minutes or more to your website, and frequent tweets about sporting events are together associated with a higher probability of buying golf clubs. You can then extrapolate from this to target anyone with similar attributes in your golf club promotion. Accuracy, in terms of pinning down exactly what the most likely customer looks like, improves with the number of attributes analyzed.
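To make the golf club example concrete, here is a minimal Python sketch of checking one candidate profile against a sample drawn from a wide, one-row-per-customer table. The `Customer` fields, `PROFILE`, and every function name are hypothetical. Lift (the purchase rate among profile matches divided by the sample's overall purchase rate) stands in for whatever association measure a real mining tool would use.

```python
import random
from dataclasses import dataclass


# Hypothetical wide ("horizontal") table: one row per customer, one field per attribute.
@dataclass
class Customer:
    customer_id: int
    college_educated: bool
    recent_car_purchase: bool
    long_site_visit: bool       # visited the website for 3+ minutes
    tweets_about_sports: bool
    bought_golf_clubs: bool     # observed outcome for customers in the sample


# Candidate profile from the golf club example (attribute names are hypothetical).
PROFILE = ("college_educated", "recent_car_purchase",
           "long_site_visit", "tweets_about_sports")


def matches_profile(c: Customer) -> bool:
    """True if the customer has every attribute in the profile."""
    return all(getattr(c, attr) for attr in PROFILE)


def purchase_rate(customers: list[Customer]) -> float:
    """Share of customers who bought golf clubs (0.0 for an empty list)."""
    if not customers:
        return 0.0
    return sum(c.bought_golf_clubs for c in customers) / len(customers)


def profile_lift(sample: list[Customer]) -> float:
    """Purchase rate of profile matches divided by the sample's base rate."""
    base = purchase_rate(sample)
    matching = [c for c in sample if matches_profile(c)]
    if base == 0.0 or not matching:
        return 0.0
    return purchase_rate(matching) / base


def draw_sample(population: list[Customer], k: int, seed: int = 0) -> list[Customer]:
    """Simple random sample of k customers from the full table."""
    return random.Random(seed).sample(population, k)


def target_list(population: list[Customer]) -> list[int]:
    """Extrapolate: IDs of everyone in the full population who fits the profile."""
    return [c.customer_id for c in population if matches_profile(c)]
```

For example, in a four-customer sample where the two profile matches both bought clubs and the two non-matches did not, the base rate is 0.5, the matching rate is 1.0, and `profile_lift` returns 2.0. A lift well above 1 in a representative sample is what would justify running `target_list` over the full population.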

The other approach to selecting data for predicting buying behavior is the "vertical" approach. Here, you analyze the data that clearly and directly defines the behavior you are trying to predict. In other words, you use historical buying behavior (transactions) to predict future buying behavior.

A transaction can be described with a small number of data points (e.g., date, item, amount), far fewer than the attributes that can be collected about a customer. To offset this limited amount of data, the idea is to include every transaction from every customer in the analysis, or as many records as possible.
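Even three fields per transaction support useful per-customer summaries once every transaction is included. The Python sketch below groups transactions by customer in date order and derives a few simple indicators. The `Transaction` record, the function names, and the choice to treat same-day purchases as a single order are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


# Hypothetical transaction record: just date, item and amount, plus the customer key.
@dataclass(frozen=True)
class Transaction:
    customer_id: int
    purchased_on: date
    item: str
    amount: float


def histories(transactions: list[Transaction]) -> dict[int, list[Transaction]]:
    """Group every transaction by customer, each list sorted by purchase date."""
    by_customer: dict[int, list[Transaction]] = defaultdict(list)
    for t in transactions:
        by_customer[t.customer_id].append(t)
    for txns in by_customer.values():
        txns.sort(key=lambda t: t.purchased_on)
    return dict(by_customer)


def indicators(txns: list[Transaction]) -> dict:
    """Derived indicators for one customer's transactions."""
    # Assumption: all purchases on the same day count as one order.
    order_dates = sorted({t.purchased_on for t in txns})
    gaps = [(later - earlier).days for earlier, later in zip(order_dates, order_dates[1:])]
    return {
        "orders": len(order_dates),
        "items": len(txns),
        "total_spend": round(sum(t.amount for t in txns), 2),
        # Inter-order time is undefined for a customer with a single order.
        "mean_days_between_orders": mean(gaps) if gaps else None,
    }
```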

The goal is to find patterns within the purchasing data itself: to use purchase frequencies, sequences, spending, number of items, inter-order times and other derived indicators to detect buying patterns, and then to apply the appropriate pattern to each individual customer. The state a customer is currently in, and the probability of moving to another state (the purchase of any given product), form the basis of buying predictions. In this case, the more customers and transaction records analyzed, the more accurate the predictions.
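The article doesn't specify how customer states are defined. One simple reading, assumed here, is a first-order Markov chain in which a customer's state is the last item they bought: transition probabilities are estimated from every customer's purchase sequence, then used to rank likely next purchases. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences: list[list[str]]) -> dict[str, dict[str, float]]:
    """Estimate P(next item | current item) across all customers.

    Each sequence is one customer's items in purchase order.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        # Count each consecutive (current, next) pair of purchases.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        probs[current] = {item: n / total for item, n in nexts.items()}
    return probs


def predict_next(probs: dict[str, dict[str, float]], history: list[str],
                 top_n: int = 3) -> list[tuple[str, float]]:
    """Rank the most likely next purchases from the customer's current state."""
    if not history:
        return []  # no purchases yet, so there is no state to start from
    options = probs.get(history[-1], {})
    return sorted(options.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Note that `predict_next` returns nothing for a customer with no purchase history, which mirrors the vertical approach's reliance on at least one prior purchase.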

Each approach has its pros and cons. One advantage of the horizontal approach is that it can be applied to prospects as well as existing customers, as long as the necessary input data is available on each person.

The problem is that it requires all the necessary data to be available on each person. If the model is too complex, involving too many attributes, the cost of collecting, integrating, normalizing and governing all the needed data for each customer or prospect skyrockets, and so does the risk of being unable to calculate accurate predictions. Models relying on fewer data points are possible, of course, but then it becomes critically important that the right data points, the best predictors, are used.
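The data-availability problem can be quantified. The hypothetical helpers below count how many records contain every input a horizontal model requires and, under the simplifying assumption that each attribute is present independently with the same probability p, estimate that share as p raised to the number of attributes.

```python
def is_scorable(record: dict, required: list[str]) -> bool:
    """A horizontal model can only score a person whose every required input is present."""
    return all(record.get(field) is not None for field in required)


def scorable_share(records: list[dict], required: list[str]) -> float:
    """Observed share of records complete enough to score (0.0 if there are none)."""
    if not records:
        return 0.0
    return sum(is_scorable(r, required) for r in records) / len(records)


def expected_coverage(p_available: float, n_attributes: int) -> float:
    """Expected share of complete records, ASSUMING each attribute is
    independently present with probability p_available."""
    return p_available ** n_attributes
```

Under that assumption, with 95 percent availability per attribute, a 5-attribute model could score about 77 percent of people, but a 20-attribute model only about 36 percent.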

Conversely, the advantage of the vertical approach is the simplicity of data collection and maintenance. At most companies, transaction data is among the most readily available, complete, accurate and understandable data on each customer. The data sourcing and preparation involved could take a tenth of the time and cost of the horizontal approach.

And while some customers may consider certain data sensitive or resent real-time monitoring of their behavior, using transaction data for marketing is seldom controversial; it has been going on since the dawn of marketing.

On the other hand, the vertical approach only works for existing customers with a transaction history. Without at least one purchase, there's no way to start associating a person with a purchasing pattern, so this approach is not appropriate for new customer acquisition. It also does not factor in other behaviors that might be a reasonable indicator of purchase intent, such as shopping cart activity.

Data preparation for the vertical approach takes considerably less time and expense, while the horizontal approach may apply in more situations. The vertical approach has proven highly accurate, usually outperforming horizontal approaches, though not always. The horizontal approach, for its part, allows many additional elements and behaviors to be factored in.

If you pick the right ones, those additional factors can help. But it's tempting to overcomplicate the model, either because it's hard to know which attributes are the best predictors, or because many weak (or even negative) predictors are already built into the available marketing platform or an outdated model.

In the end, both approaches have value and might work best in combination, but be sensible. New research by Green and Armstrong, published in the Journal of Business Research, shows that simpler models tend to be more accurate (81 percent of the time) and less error-prone (errors reduced by 27 percent) than more complex ones.

It makes dollars and sense to use the simplest approach that meets your goals.