Following my previous post on AI vs AVMs, I thought it would be a good idea to put the AI to the “Tower Bridge Test”. There, the left side of the road is significantly more expensive than the right side. So, the question is: would the AI be able to learn the difference and predict prices with reasonable accuracy?
To answer this question, I selected two postcodes about 50 metres apart, one on each side, and asked the AI to estimate the value of two hypothetical (but identical) properties, one at each postcode.
The result was very interesting. The AI valued the left-side property at £1,050,000 and the right-side property at £690,000, putting the right side roughly 34% below the left (the AI can value properties that are not on the market; it just needs to know their basic information).
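The percentage depends on which side you take as the baseline, which is worth being explicit about. A quick check with the two figures from above:

```python
# The two valuations from the Tower Bridge Test.
left, right = 1_050_000, 690_000

# How much cheaper is the right side, relative to the left?
discount = 1 - right / left

# And how much more expensive is the left side, relative to the right?
premium = left / right - 1

print(f"right side is {discount:.0%} lower; left side is {premium:.0%} higher")
```

The right side is about 34% cheaper than the left, while the left is about 52% more expensive than the right; both describe the same £360,000 gap.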
Would AI be able to replace human valuers in the future?!
According to the RICS 2017 report “The Future of Valuations”, AI will either assist human valuers or completely replace them at some point.
Today, machine property valuation is done by Automated Valuation Models (AVMs). AVMs have existed for decades, and at best they do well as “click bait”.
Having built a database of 1.5 million properties in Britain, covering internal information such as area, rooms and energy efficiency as well as external information such as average income, safety, transport and amenities in the neighbourhood, I trained a deep learning model that learns the market value of properties.
In order to judge how well the AI would perform in the real world, I tested it against properties that were not used in training and plotted the results using residual analysis. In layman's terms, residual analysis takes a property with a known market value, predicts the value and compares the prediction with the actual value. A residual of 0 means a perfect prediction. In this case, most predictions were concentrated around 0, with a few outliers here and there. This is beyond amazing!
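The idea can be sketched in a few lines of plain Python. The valuations below are invented for illustration, not the actual hold-out data:

```python
# Hold-out set: actual market values vs model predictions (illustrative numbers).
actual = [250_000, 480_000, 310_000, 620_000, 150_000]
predicted = [245_000, 495_000, 300_000, 640_000, 149_000]

# Residual = actual - predicted; 0 means a perfect prediction.
residuals = [a - p for a, p in zip(actual, predicted)]

# Summarise how tightly the residuals cluster around 0.
mean_residual = sum(residuals) / len(residuals)
mean_abs_pct_error = sum(abs(r) / a for r, a in zip(residuals, actual)) / len(actual)

print(residuals)
print(f"mean residual: {mean_residual:,.0f}")
print(f"mean absolute percentage error: {mean_abs_pct_error:.1%}")
```

In practice one plots the residuals (or residuals vs predicted values) rather than just summarising them, since the plot reveals outliers and systematic over- or under-valuation at a glance.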
Thanks to the H2O.ai team for including residual analysis in h2o-3.
Every now and then, I get asked this question by friends and colleagues who want to jump aboard the AI ship. Wages are skyrocketing, so why not?
Python is one of many high-level programming languages. It has recently become very popular thanks to its simple syntax, and perhaps because it is supported by all the popular machine learning libraries, including TensorFlow, MXNet and H2O.
However, building an AI system that works in production environments requires far more than “just” a few lines of code. Why? Because AI is a multidisciplinary subject.
Reliable, scalable, responsive, secure and unbiased AI requires practical knowledge of Computer Architecture, Computer Networks, Databases, Data Mining, Project Management and more in addition to some background in mathematics (and lots of coffee).
To cut a long story short, learning how to code in Python (or whatever) is far from enough. You will need to broaden your knowledge, or join a diverse team that will cover your back.
Anyone who has got their hands dirty managing software development and computer infrastructure projects probably appreciates the fundamental differences between the two. More often than not, they are lumped together because they both involve computers (albeit differently).
Developing computer software is typically an incremental process where a prototype is developed first. The prototype can be discarded after proving the concept (throw-away approach) or converted into the final product (evolutionary approach). Various factors influence the methodology selection, but I reckon the clarity of the requirements and the quality of the prototype itself are the most important.
IT infrastructure is a different beast. Prototyping is not generally an option, and incremental build-up can be very risky and pricey. The requirements must be clear and precise because once an investment has been made (i.e. hardware has been procured), it is very difficult to go back. The challenge here is to ensure that new infrastructure components will integrate properly and not create more problems than they solve.
Machine Learning (and AI) Projects
This is where software development and IT infrastructure overlap. Read on.
Nearly every large-scale Machine Learning project involves external data sources that the organisation has no control over. The quality of such data cannot be assured, and neither can its completeness and correctness.
One might argue that these data issues can be detected early on and dealt with. That is true, and there are techniques and tools out there that can help detect them. However, other types of problems are harder to detect until the later stages, such as the degree of bias in the data.
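A basic completeness check is the kind of thing that can be done early. As a minimal sketch, with hypothetical field names and records standing in for a real external feed:

```python
# Hypothetical property records arriving from an external data source.
records = [
    {"postcode": "SE1 2UP", "area_sqm": 85, "rooms": 3, "epc": "C"},
    {"postcode": "SE1 2AA", "area_sqm": None, "rooms": 2, "epc": "D"},
    {"postcode": None, "area_sqm": 60, "rooms": 2, "epc": None},
]

required = ["postcode", "area_sqm", "rooms", "epc"]

# Completeness: fraction of records with every required field present.
complete = [r for r in records if all(r.get(f) is not None for f in required)]
completeness = len(complete) / len(records)

# Per-field missing counts help decide whether to impute, backfill or reject.
missing = {f: sum(1 for r in records if r.get(f) is None) for f in required}

print(f"completeness: {completeness:.0%}")
print(missing)
```

Checks like this catch missing and malformed fields; they say nothing about bias, which is exactly why bias tends to surface much later in the project.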
It is not easy to estimate the computing resources early in the project, and testing with a smaller data sample can be misleading. For instance, a model that converges in 10 hours on 10% of the data will not necessarily converge in 100 hours on the full dataset (on the same platform). In fact, the growth in computing requirements and time can be exponential.
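One rough way to see the danger of linear extrapolation is to fit a power law t = c·nᵏ through two measured runs and project it out to the full dataset. The timings below are invented for illustration:

```python
import math

# Two hypothetical measured runs: (fraction of data used, training hours).
n1, t1 = 0.10, 10.0
n2, t2 = 0.20, 28.0

# Fit t = c * n**k through both points.
k = math.log(t2 / t1) / math.log(n2 / n1)
c = t1 / n1**k

# Extrapolate to the full dataset (n = 1.0).
t_full = c * 1.0**k
print(f"exponent k = {k:.2f}, projected full-data time = {t_full:.0f} h")
```

With these numbers the growth exponent comes out around 1.5, so the projected full-data run is roughly 300 hours rather than the 100 hours a linear extrapolation would suggest, and even this assumes the scaling behaviour seen on small samples holds at full size.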
How Good is “Good Enough”?
Achieving perfection in ML/AI projects is still a dream, and will remain so for a while, so that is something we need to live with. Therefore, what one hopes to achieve is a system that is “good enough” and “fit for purpose”. The 80/20 rule really applies here, but it is the remaining 20% (or rather 19%) that takes up most of the resources.
Arguably, the most expensive resource is the data scientist's time, with average pay of £80k–£100k in the UK. In fixed-price contracts, the ML/AI vendor wants to finish the project as early as possible. In pay-per-day arrangements, conversely, it is the client who wants the project delivered ASAP. What this means is that there is a tendency for ML/AI projects to end prematurely, as soon as they are deemed “good enough”.
Can’t Reach the Destination? Change the Journey
Of all the possibilities, however, the most troublesome is when the objective changes somewhere in the middle of the project. As weird as it may sound, this is not that uncommon, and it can happen for various reasons. For instance, if the parties realise that the project will require double the resources originally allocated to it, they may decide to lower the objective to avoid wasting what has been spent so far. Something is better than nothing, after all!
So Why Do ML/AI Projects Fail?
From my personal experience, I think that failing to manage expectations is one of the most prominent reasons. There is so much hype out there about ML/AI at the moment, blurring the boundary between what can be achieved and what cannot. On the flip side, it is the hype that is fuelling the demand, with no sign of stopping for the time being!
The other reason is the failure to properly integrate ML/AI into the workflow (after successful delivery). Unfortunately, many managers perceive ML/AI projects as “IT things” for the IT guys to do. There is no such thing as an “IT project”. Rather, there are “business change” projects that involve IT.
Conclusion: No One Size Fits All
The field of Software Engineering is relatively new and far from mature. Methodologies and frameworks continue to emerge and evolve. The same holds for Machine Learning and Artificial Intelligence projects, which fall under the umbrella of Software Engineering (albeit imperfectly).
Managing expectations and sound business change management are key to success. As for the development of ML/AI itself, the reliance on experienced project managers will continue until the whole process matures. Whether you are an organisation interested in ML/AI or a developer, my recommendation is to start with a small, easy-to-manage, low-risk project and move on as your experience and confidence build up.