Eight machine-learning practitioners joined moderator Yazann Romahi of JP Morgan Asset Management at machineByte’s Barcelona ThinkTank in February 2019.
Participants discussed how they are dealing with missing data points in their time-series data sets, debated the merits of specific ML applications and gave each other tips on how to educate clients about the pitfalls and opportunities within the ML space.
Data Is Everywhere, But Is It Any Good?
The morning session began with a discussion of the lack of data history, a theme that has pervaded many conversations about the limitations of ML techniques. One popular area of ML—natural-language processing—involves analyzing news to determine sentiment and study how new information affects stock prices.
This type of research is typically thought to benefit from a longer data history, with one participant noting that, since news outlets have uploaded their archival content online, researchers can now study the market impact of news over a much longer period than is possible with other types of data.
However, another attendee pointed out that such a simplistic model wouldn’t take into account the speed at which news now travels. In 1980, for example, it could take a few days for information to be fully priced into the market. Now, information travels so quickly that it’s hard to draw a comparison between the two eras, he explained. The data set therefore suffers from a non-stationarity problem.
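The non-stationarity point can be made concrete with a toy event study (an illustration for this write-up, not a model presented at the event): if good news took roughly three days to be absorbed in 1980 but is priced in almost immediately today, a response curve estimated on the pooled history describes neither regime.

```python
import numpy as np

days = np.arange(5)  # trading days after a news event

# Stylized cumulative price responses to good news (fraction priced in):
early = np.clip(days / 3, 0, 1)           # 1980s regime: absorbed by day 3
recent = np.ones_like(days, dtype=float)  # current regime: absorbed at once

# Pooling both eras yields a "response curve" that matches neither regime.
pooled = (early + recent) / 2
print(pooled)
```

Any model fit on the pooled series would systematically misstate the reaction speed in both periods.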
Another attendee talked about the challenge that financial practitioners face in not being able to generate new out-of-sample data. Whereas doctors, for example, could gather more data if needed to train a neural network, for financial professionals, “the only way to get more out-of-sample data is to run it live and lose money.”
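One partial workaround, not one the attendee named but standard practice, is walk-forward validation: only ever test on data that postdates the training window, mimicking a live run one period at a time. A minimal sketch:

```python
def walk_forward_splits(n, initial, step):
    """Yield (train, test) index lists over n time-ordered observations.

    The training window expands through time and each test window holds
    the next `step` observations -- the closest offline analogue to
    running a strategy live.
    """
    start = initial
    while start + step <= n:
        yield list(range(start)), list(range(start, start + step))
        start += step

for train, test in walk_forward_splits(10, initial=6, step=2):
    print(len(train), test)
```

Each test window is touched exactly once, so no future information leaks backward into the fit.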
Data cleaning and validation present a huge problem for the ML industry. One participant shared his strategy for filling in missing data points: instead of plugging the gaps with the median, his firm uses clustering techniques to impute the missing values. The algorithm draws on high-frequency data, among other sources, to build longer time series, and has proven slightly more successful than traditional statistical approaches, he said.
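The participant’s exact clustering scheme wasn’t described; as a rough sketch of the idea, the toy imputer below fills a row’s gaps from its most similar complete rows, alongside the median baseline it was being compared against (the function names and the choice of k are this write-up’s, not the firm’s):

```python
import numpy as np

def median_impute(X):
    # Baseline: fill each missing value with its column median.
    X = X.copy()
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = np.take(medians, cols)
    return X

def knn_impute(X, k=2):
    # Nearest-neighbour fill: for each incomplete row, find the k most
    # similar complete rows (distance on the observed columns only)
    # and average their values in the missing columns.
    X = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]
    for i in range(len(X)):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        obs = ~miss
        d = np.linalg.norm(complete[:, obs] - X[i, obs], axis=1)
        nearest = complete[np.argsort(d)[:k]]
        X[i, miss] = nearest[:, miss].mean(axis=0)
    return X
```

On data that clusters tightly, the nearest-neighbour fill tracks the local level of similar observations, whereas the median fill pulls every gap toward the global centre.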
How Can ML Grow?
The group had several ideas about which areas of machine learning could present the biggest opportunity within the financial sector, including determining the performance of environmental, social and governance (ESG) stocks, classifying market regimes and predicting commodity price movements from shipping data and global product flows.
Two participants debated whether ML algos perform better for single-stock selection or for forecasting at the macro level. The macro-technique supporter noted that when predicting performance across a cross-section of the market, practitioners cast a wider net over more stocks, and claimed that by placing many small bets they stand a greater chance of making money.
His colleague disagreed: “The turnover is so high and costly that it’s very risky to use black-box techniques.”
Instead, he championed a single-stock approach for ML, which allows algos to process new data sources—like credit card or geolocation data—to estimate the performance of a single company. That approach solves some of the issues surrounding the paucity of macro time-series data, he added.
Bundling Up For The AI Winter
Attendees were also conflicted about the impact that pre-coded models and general machine-learning hype have had on the industry. “One of the risks we have with AI is that when everything becomes canned, it becomes very easy for anyone with basic statistical skills to actually apply an involved neural network algorithm without understanding what assumptions they’re applying,” one speaker said.
“Give them a hammer and everything looks like nails,” another replied, to laughs around the room.
This risk could result in another AI winter, especially as teams and individuals try to rebrand themselves as data scientists, even if they don’t have the right credentials. “If you do that, you basically double your market value,” another joked.
Who Is Responsible For Data Overload?
The last major theme the group touched on was, “Who is responsible for the hype?” Initially, the participants pointed to clients who ask how managers are using AI. “It almost forces us to have an answer,” one said.
But eventually, the conversation shifted to how managers can be better stewards of their clients’ trust with respect to AI. One speaker recommended that sell-siders separate the new data aspect from the algorithms, highlight the differences between old and new techniques, and always stress simple methods and approaches whenever possible.
One speaker also warned his clients about alternative data vendors who try to create “ridiculous” in-sample results to drive up demand for their data sets. “We have to help managers see through that fog of what is hype and what is real. Otherwise we’ll all be victims of the hype.”
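Those “ridiculous” in-sample results are cheap to manufacture: regress almost anything on enough candidate features and ordinary least squares will fit pure noise convincingly, then collapse out of sample. A self-contained illustration (synthetic data, not from the discussion):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 40  # few observations, many candidate "signals"

# Both the features and the target are pure noise.
X, y = rng.normal(size=(n, p)), rng.normal(size=n)
X_new, y_new = rng.normal(size=(n, p)), rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit noise to noise

def r2(X, y, beta):
    # Uncentered R-squared; adequate here since the noise is zero-mean.
    resid = y - X @ beta
    return 1 - resid @ resid / (y @ y)

print(r2(X, y, beta))          # looks impressive in sample
print(r2(X_new, y_new, beta))  # collapses on fresh data
```

With 40 free parameters and 60 observations, a high in-sample fit is guaranteed by the arithmetic alone, which is exactly why out-of-sample evidence should be demanded of any vendor pitch.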