Imperfect Intelligence, Part II – A biased system
The potential and promise of Machine Learning and Artificial Intelligence are coming to fruition in the financial services industry, with 80% of enterprises already investing in AI technology as of 2017. The AI market is expected to grow to US$57.6 billion by 2021, on the pledge of more accurate and objective judgments to support forecasting and decision-making.
As we have seen, data is not objective, but is a product of human design. My previous article discussed how unconscious biases can creep into the data well before it is fed into a Machine Learning programme. The next logical step is to investigate some of the hidden biases that are amplified by the AI algorithms themselves.
1. Incorrect extrapolation
Say one system is fed income data and determines, based on that specific sample, that males generally make more than females. If another programme uses this determination to make an eligibility assessment for a small business loan, the algorithm could incorrectly extrapolate that being male is a primary characteristic of succeeding in small business and could disadvantage female loan applicants.
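To make the scenario concrete, here is a minimal Python sketch. Everything in it is hypothetical: the income sample, the 50/50 scoring rule, the threshold of 62 and the function names are invented for illustration and do not describe any real lending system.

```python
from statistics import mean

# Hypothetical sample of (gender, annual income in $000s), invented for illustration.
income_sample = [("M", 72), ("M", 65), ("M", 80), ("F", 58), ("F", 61), ("F", 55)]

def learn_group_averages(sample):
    """First system: summarise income by gender from one specific sample."""
    by_gender = {}
    for gender, income in sample:
        by_gender.setdefault(gender, []).append(income)
    return {gender: mean(values) for gender, values in by_gender.items()}

def loan_score(applicant, group_avg):
    """Second system: blends the applicant's own revenue with the group average.

    The 50/50 weighting is an assumption, chosen only to show the effect.
    """
    return 0.5 * applicant["revenue"] + 0.5 * group_avg[applicant["gender"]]

def approve(applicant, group_avg, threshold=62):
    """Approve the loan if the blended score reaches the (assumed) threshold."""
    return loan_score(applicant, group_avg) >= threshold

group_avg = learn_group_averages(income_sample)

# Two applicants who are identical in every respect except gender.
applicant_m = {"gender": "M", "revenue": 60}
applicant_f = {"gender": "F", "revenue": 60}

print("Male applicant approved:  ", approve(applicant_m, group_avg))
print("Female applicant approved:", approve(applicant_f, group_avg))
```

The two applicants have the same revenue, yet only the male applicant clears the threshold, because the second system has inherited the first system's group-level conclusion and applied it to individuals.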
Extrapolation from data happens all the time. According to one ridiculous article, robots will replace 950,000 of the 1 million ground and maintenance workers in the US, despite the fact that there is little automation in this space today and certainly no “robots” operationally ready to take over these heavily manual physical tasks. In this case the conclusion was drawn by incorrectly extrapolating data from a University of Oxford employment report, and it also failed to account for the rate at which new technologies create jobs that do not yet exist. If human analysts can make extrapolation mistakes even when provided with context, it’s inevitable that these issues will exist within AI programmes.
2. Butterfly effect
A hallmark of “chaotic systems”, this is where one small tweak in the input data can cause a significant change in the output. The best-known example is weather forecasting, which involves so many intertwined factors that it is nearly impossible to make accurate predictions beyond a few days into the future.
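The sensitivity described here can be seen in a few lines of Python. The sketch below uses the logistic map, a standard textbook toy for chaotic behaviour. It is a stand-in chosen for brevity, not a weather or economic model, and the starting values and step count are arbitrary assumptions.

```python
def logistic_map(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x), a classic toy example of a chaotic system."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = r * x * (1 - x)
        trajectory.append(x)
    return trajectory

# Two starting points that differ by one part in ten billion.
baseline = logistic_map(0.2)
tweaked = logistic_map(0.2 + 1e-10)

# Track how far apart the two runs drift at each step.
gaps = [abs(a - b) for a, b in zip(baseline, tweaked)]

for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

For the first few steps the two runs are practically indistinguishable. Within a few dozen iterations the gap becomes as large as the values themselves, which is why a forecast from such a system loses its value beyond a short horizon.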
Imagine a system used to create economic forecasts. Even with a mass of comprehensive data, it will always be difficult for the machine to accurately predict what will happen in the future, because unrelated and often subtle events can have a large and unexpected impact on the economy. It would be very easy to take action based on the prediction of a seemingly omniscient machine, but we should be hesitant to do so. Banks need to be able to operate with a degree of uncertainty, for one little incident could unleash a huge ripple effect. Like Brexit.
Whilst some may suggest that a larger initial training data set will mitigate this, Nate Silver argues that if there is “an exponential increase in the amount of available information, there is likewise an exponential increase in the number of hypotheses to investigate.” Because a minuscule data element has the potential to alter the output of an entire Big Data system, it becomes harder for machine learning systems to pinpoint the answers they seek, and for humans to correctly interpret the output.
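One way to see Silver's point is to count how many candidate relationships appear as variables are added. The Python sketch below is an illustration under stated assumptions: the columns are pure random noise, each pair of columns is treated as one "hypothesis", and the sample size of 30 and the 0.3 correlation threshold are arbitrary choices.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def count_spurious(columns, threshold=0.3):
    """Return (pairs tested, pairs whose |correlation| exceeds the threshold)."""
    pairs = hits = 0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            pairs += 1
            if abs(pearson(columns[i], columns[j])) > threshold:
                hits += 1
    return pairs, hits

# 80 columns of pure noise with 30 observations each: no real relationships exist.
rng = random.Random(0)
noise_columns = [[rng.gauss(0, 1) for _ in range(30)] for _ in range(80)]

# Grow the data set by adding columns and recount each time.
results = {k: count_spurious(noise_columns[:k]) for k in (10, 20, 40, 80)}
for k, (pairs, hits) in results.items():
    print(f"{k:2d} variables: {pairs:4d} pairs to test, {hits} look correlated")
```

Doubling the number of variables roughly quadruples the number of pairs to examine, and every one of the apparent correlations found here is a false lead, since the data contains nothing but noise.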
3. Correlation vs causation
Correlation is merely a relationship between two variables, and this relationship can arise in one of three ways: pure coincidence, the influence of a third external factor on both, or the effect of one variable on the other. The big problem occurs when a machine incorrectly interprets correlation as genuine causation, creating biased feedback loops.
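The second of those cases, a shared third factor, is easy to reproduce. In the Python sketch below the variable names and noise levels are assumptions made for illustration: a hidden factor drives both x and y, and neither has any effect on the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)
n = 2000

# Hidden third factor that influences both observed variables.
hidden = [rng.gauss(0, 1) for _ in range(n)]

# x and y each depend on the hidden factor plus their own independent noise.
x = [h + rng.gauss(0, 0.5) for h in hidden]
y = [h + rng.gauss(0, 0.5) for h in hidden]

raw_corr = pearson(x, y)

# Remove the hidden factor's contribution and measure what is left.
x_residual = [xi - h for xi, h in zip(x, hidden)]
y_residual = [yi - h for yi, h in zip(y, hidden)]
adjusted_corr = pearson(x_residual, y_residual)

print(f"correlation of x and y:            {raw_corr:.2f}")
print(f"after removing the hidden factor:  {adjusted_corr:.2f}")
```

A system that sees only x and y would find a strong relationship and might treat one as a driver of the other. Once the hidden factor is accounted for, the relationship all but disappears.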
Take a bank that uses its historical data to build an AI programme that identifies which customers are likely to commit credit fraud. Acting on the results, the bank channels more of its funds into investigating these customers and, in doing so, finds more fraud. If this data is fed back into the programme, it will reinforce the finding that these customers are the ones most likely to commit fraud, even though it is quite possible that the higher rates of detected fraud are caused by the increased scrutiny. The machine learns from this feedback in a vicious loop, to the detriment of its ability to detect fraud accurately in the future.
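The loop can be sketched in Python with a deliberately simple model. All figures are hypothetical: two customer segments with an identical true fraud rate, a small historical gap in detections that arose by chance, and a rule that sends 80% of the investigation budget to whichever segment looks riskier. Detections are computed as expected values rather than simulated at random, to keep the result reproducible.

```python
def simulate_feedback(rounds=10, budget=100, true_rate=0.05, focus=0.8):
    """Allocate investigations based on past detections and feed the results back.

    Both segments share the same true fraud rate, so any gap in detected
    fraud comes from where the bank chooses to look, not from the customers.
    """
    # Historical detections: segment A is slightly ahead purely by chance.
    found = {"A": 12.0, "B": 10.0}
    share_of_a = []
    for _ in range(rounds):
        # The "model": flag the segment with the most fraud detected so far.
        flagged = max(found, key=found.get)
        for segment in found:
            budget_share = focus if segment == flagged else 1 - focus
            # Expected detections scale with how many cases are investigated.
            found[segment] += budget * budget_share * true_rate
        share_of_a.append(found["A"] / (found["A"] + found["B"]))
    return found, share_of_a

found, share_of_a = simulate_feedback()
print(f"Share of detected fraud in segment A after round 1:  {share_of_a[0]:.0%}")
print(f"Share of detected fraud in segment A after round 10: {share_of_a[-1]:.0%}")
```

Segment A's share of detected fraud rises with every round, which the programme reads as confirmation of its original finding, even though the two segments are equally likely to commit fraud by construction.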
Successful Uses of Machine Learning
With so much of the focus on the amazing capabilities of AI, it is critical to develop better habits around data and a better understanding of how deep learning works, so that the algorithms are prepared and trained properly. Triangulating machine learning outputs with customer insight, common sense and historical data can help mitigate the fallibility of AI. Data should be used to inform decisions, but we need to ensure that checks and balances are in place to monitor the success of automation programmes.
In the Information Revolution era, the digitisation of every aspect of industry is only increasing. Whether you are working in agriculture, medicine, banking, transportation, construction or social media, AI and ML are being implemented everywhere. Embracing these digital opportunities is a linchpin for accelerating progress, but it is important to understand and minimise the data biases that can sour deep learning programmes.