
Time to rethink machine learning: The big data gobble is OFF the menu

What size do big things start? Small

By Matt Asay, 5 Jul 2017

Machine learning (ML) may well be The Next Big Thing™, but it has yet to register in mainstream enterprise adoption. While breathless prognosticators proclaim 50 per cent of organisations lining up to magically transform themselves in 2017 with ML, more canny observers put the number closer to 15 per cent. And that's being generous.

ML (and its kissing cousin AI) should eventually reshape enterprise computing, but for now a host of factors stand in its way. The most prominent among them? Skills.

Teaching machines to fish

For years the market hyped big data, touting its ability to transform... everything. Yet years into Hadoop, Kafka, Spark, and other oddly named big-data projects, we're nowhere near the 50 per cent adoption that survey after survey told us we were on the verge of surpassing.

Why? It turns out big data is hard.

ML, which in many ways is an extension of the big data revolution, is even harder. As Gartner analyst Nick Heudecker has noted, while hope springs eternal for big data: "Only 15 per cent of orgs get to production." Oh, and machine learning? "Will likely be much lower with ML."

Even so, companies keep expecting rosy days ahead. In response to a Belatrix Software survey, 81 per cent of respondents proclaimed: "Machine learning will have some impact or a significant impact on their organization in the next five years." Given this impact on things like operational efficiency, surely those enterprises have spun up ML projects? Nope. Just 18 per cent of those companies surveyed had bothered to get started, while 40 per cent are kicking the proverbial tires, leaving just 42 per cent to honestly admit: we're doing squat with ML.

Breaking the ML myth

Part of this inaction comes down to the massive gap between ML (and AI) myth and reality. As David Beyer of Amplify Partners puts it: "Too many businesses now are pitching AI almost as though it's batteries included." This is dangerous because it leads companies either to over-invest (and then face a tremendous trough of disillusionment) or to steer clear when the slightest bit of real research reveals that ML is very hard and not something the average Python engineer is going to spin up in her spare time.

One of the gating factors to ML success is data. To properly train models, an enterprise needs "an unearthly amount of data" as Neil Lawrence, a member of Amazon's AI team and professor of machine learning at the University of Sheffield, puts it. More than any mind-blowingly great algorithm, he says, "progress is driven far more by the availability of data than an improvement in algorithms".

Unfortunately, few enterprises have such copious quantities of data. Even those that do, laments Yandex Data Factory chief operating officer Alexander Khaytin, are stymied by that data living in different places. "With data often siloed in separate storage and processing systems, the aggregation of data can be time-consuming and difficult."

Get all that data marching in uniform lock-step, and more problems await. Like, for example, the need to aggressively experiment in production. According to Khaytin: "When it comes to prescriptive analytics, the measure of business impact can only truly be assessed by actually applying a machine learning model in the real business process.

"For most companies, often at the start of their digital transformation, the prospect of launching large-scale machine learning projects which haven't already demonstrated their value in previous trials can be daunting."

Finally, even companies that get past each of these hurdles will often fail on the hardest struggle of all – people. Just like big data before it, ML requires a culture of experimentation. Most companies are happy to talk about being data-driven, but precious few actually are. For years, there has been a gaping void between executives' lip service to big data and their habit of disregarding data that doesn't reinforce gut instinct (62 per cent admit to this data blindness, and the other 38 per cent are probably lying).

Nor is the ML people problem simply cultural, bad as that might be.

Please, sir, I want some more machine learning experts

Perhaps ML wouldn't be such a beast if more people knew how to model it but, beasts being, well, beastly, there is a dearth of experts. When I asked Gartner analyst Merv Adrian the biggest reason for ML's paltry success rates, his response was unequivocal. "For me it's mostly about skills. Missing skills."

Just how hard is it to replicate one of these ML gurus? Dubbing them quasi-"data scientists", Ben Lorica and Mike Loukides paint a somewhat bleak picture of finding these data-driven product people.

"They frequently have doctorates in the sciences, with a lot of practical experience working with data at scale. They are almost always strong programmers, not just specialists in R or some other statistical package. They understand data ingestion, data cleaning, prototyping, bringing prototypes to production, product design, setting up and managing data infrastructure, and much more. In practice, they turn out to be the archetypal Silicon Valley 'unicorns': rare and very hard to hire."

Ovum analyst Tony Baer holds out hope that: "Just as colleges and universities bulked up on Data Science programs, history will repeat here," with schools driving more machine learning training. Maybe, but it's probably overly optimistic to believe that academic training can deliver the kind of expertise needed. As Lorica and Loukides stress, machine learning is a practical discipline, not really something easily picked up in a classroom setting. Perhaps for this reason, training efforts often fail. Just ask Motorola Solutions principal data engineer Steve Varner, who "tried to train 50 software engineers on Spark MLlib" but was forced to conclude: "It didn't go so well."

And yet... there's hope.

I don't think that word means what you think it means

For one thing, most of what gets sexed up as "machine learning" really isn't. As Basecamp data scientist Noah Lorang has reasoned: "The dirty little secret of the ongoing 'data science' boom is that most of what people talk about as being data science isn't what businesses actually need... There is a very small subset of business problems that are best solved by machine learning; most of them just need good data and an understanding of what it means that is best gained using simple methods."

Beyer agrees, acknowledging his own "dirty secret". "So many [so-called ML] problems could be solved by just applying simple regression analysis."
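For a sense of how little machinery "simple regression analysis" involves, here is a minimal sketch of a one-variable least-squares fit in plain Python. It is an illustration only, not anything Beyer or Lorang prescribe: the function names `fit_line` and `predict` and the ad-spend and sign-up figures are made up for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean; zero means there is no line to fit.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    # How x and y vary together around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x):
    """Read a value off the fitted line."""
    return slope * x + intercept


# Hypothetical data: monthly ad spend (thousands) against new sign-ups.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
signups = [12.0, 19.0, 31.0, 38.0, 50.0]

slope, intercept = fit_line(ad_spend, signups)
forecast = predict(slope, intercept, 6.0)
print(f"sign-ups ~= {slope:.2f} * spend + {intercept:.2f}")
print(f"forecast at a spend of 6.0: {forecast:.1f}")
```

No training cluster, no data lake: a couple of dozen lines and a spreadsheet's worth of numbers. Whether a straight line is a good enough model for a given business question is, of course, a judgment the code cannot make for you.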

Following on from this theme, it's also true that even those applications that correctly count as "machine learning" are composed of many components that engineers without ML expertise can tackle. According to Lorica and Loukides: "In any application, the part that's strictly 'machine learning' is relatively small: someone needs to maintain the server infrastructure, watch over data collection pipelines, ensure there are sufficient computational resources, and more."

Think of the ML engineer as holding the ML glue to the broader application. They may be involved in the original architecture and development of an application but, above all, they're responsible for retraining an ML model when it grows stale. They aren't the data scientist looking for significance in data. They're the ones, as Lorica and Loukides note, whose "goal is to build the machine that can analyze the data and produce results: to create a neural network that works, that can be tuned to produce reliable results on the input."

ML engineers, in short, don't need to be omnipresent within an organisation for it to successfully attack ML problems. We also need to reset expectations on what ML means within the enterprise: much of what an enterprise wants to do can probably be solved with a series of "if/then" statements, rather than some data-gobbling ML algorithm.

This becomes particularly true, and initial ML success becomes more likely, if organisations start small with their ML projects and scale up as their internal talent improves. ®


The Register - Independent news and views for the tech community. Part of Situation Publishing