Fake data can help backtesters, up to a point


Quantitative investors often complain that they only have one side of the story to test their ideas against.

One way around the problem is to make up the story. Quants have been doing this for a long time, using bootstrap or Monte Carlo simulations to create alternative time series data for the backtests they run.
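As a rough illustration of the bootstrap approach, the sketch below resamples contiguous blocks of past returns to build alternative histories. It is a minimal example under stated assumptions, not any particular firm's method: the function names `block_bootstrap` and `cumulative_return`, the block size and the `history` figures are all hypothetical and chosen only for illustration.

```python
import random


def block_bootstrap(returns, block_size, n_paths, seed=0):
    """Build alternative histories by resampling contiguous blocks of returns.

    Keeping blocks intact preserves some short-range dependence between
    consecutive returns, which a plain shuffle would destroy.
    """
    n = len(returns)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(returns)")
    rng = random.Random(seed)
    last_start = n - block_size
    paths = []
    for _ in range(n_paths):
        path = []
        while len(path) < n:
            start = rng.randint(0, last_start)  # both ends inclusive
            path.extend(returns[start:start + block_size])
        paths.append(path[:n])  # trim to the length of the real history
    return paths


def cumulative_return(returns):
    """Compound a sequence of simple returns into a total return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


# Hypothetical daily returns, invented for this example.
history = [0.01, -0.02, 0.005, 0.013, -0.007, 0.002, -0.011, 0.009, 0.004, -0.003]

paths = block_bootstrap(history, block_size=3, n_paths=1000, seed=42)
outcomes = sorted(cumulative_return(p) for p in paths)

print("realised history:", round(cumulative_return(history), 4))
print("5th percentile of resampled histories:", round(outcomes[len(outcomes) // 20], 4))
```

Every resampled path is stitched together from returns that actually occurred, so a sketch like this can reorder the past but cannot produce anything the past did not contain.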

A newer idea, however, is to employ machine learning techniques to invent entirely artificial data. Quants are experimenting with these models and say they can produce data that is in some cases indistinguishable from reality.

The potential for new “fake” data gives rise to optimism. With it, quants can test strategies against scenarios that could have happened as well as those that did. There is a caveat, however. Fake data can fix some of the shortcomings of conventional backtesting, but it can’t fix all of them.

The models that quants use to generate the new data effectively learn the process by which the past data was generated.

The approach has worked impressively outside of investing, where generative adversarial networks (Gans) have been used to create anything from deep fake videos to so-called Ganimals: synthesized animals, like an elephant crossed with a cat.

Amazon used synthetic data to train its voice assistant Alexa to understand instructions in Hindi. Rather than training the speech recognition software on millions of actual commands, the tech company generated fake samples from a subset of recordings.

But applications in financial markets differ in one major way from applications in other fields. Markets are rapidly changing systems, sometimes subject to sudden and unexpected regime changes.

Backtesting with multiple versions of history may be better than backtesting with just one. But generative models always recreate a version of events from the past. And even a richer view of history could be a poor guide to the future.

One quant draws a parallel with predicting climate change, a process in which what happened before – by definition – will be largely redundant. And in the case of the stock markets, “even the most intimate knowledge of history won’t tell you where Apple’s stock price will be,” he says.

In another way, too, historical data could turn out to be a bad teacher.

Market mechanisms are extremely complex, involving the actions and motivations of thousands of investors, firms and intermediaries, as well as the intricate dynamics of market microstructure. This is before taking into account the influence of the global macroeconomic environment and current events.

There is never any guarantee that the data will lead a model to a full understanding of these mechanisms. That is, the training set may only provide an uneven representation of the truth. “You could end up generating bogus data that is just too simplistic for what’s at stake,” says another quant. “It could be counterproductive.”

These are limitations that data generation models have not faced outside of the financial sector. These are also limitations that apply to any form of backtesting, whether conventional or using fake data. But as investors embrace the new techniques, they’ll need to keep in mind the problems that fake data can’t solve. An image of an elephant or a cat looks like the image of an elephant or a cat forever. The image of a market is constantly changing.

