What Is AI? Applications of Artificial Intelligence to Dermatology

X. Du-Harpur; F.M. Watt; N.M. Luscombe; M.D. Lynch


The British Journal of Dermatology. 2020;183(3):423-430. 


What are Artificial Intelligence and Machine Learning?

AI is difficult to define precisely. In his seminal paper 'Computing machinery and intelligence', Alan Turing proposed the well-known Turing test, whereby a machine is deemed intelligent if an impartial observer cannot distinguish it from a human in conversation.[1] In modern parlance, artificial general intelligence refers to the ability of a machine to communicate, reason and operate independently in both familiar and novel scenarios in a manner similar to a human. This remains far beyond the scope of current methods and is not what is usually meant by the term 'AI'. Today, 'AI' is most often used interchangeably with 'machine learning' or 'deep learning', the latter being a specific form of machine learning that is discussed in more detail below (see Table 1 for a glossary of terms). In the supervised setting considered here, machine learning refers to algorithms and statistical models that learn from labelled training data, from which they are able to recognize and infer patterns (Figure 1).

Figure 1.

Schematic depicting how a machine learning algorithm trains on a large dataset to learn to match data to labels (supervised learning); the performance of the trained model can then be assessed.

Generally, during the training of a machine learning model, a subset of the data is 'held back' and subsequently used to test the trained model. Performance is assessed on this test dataset by how often the model correctly matches an image to its label, for example melanoma or benign naevus. In any classification system there is a trade-off between sensitivity and specificity. For example, an AI system may output a probability score for melanoma between 0 and 1, and the operator must then set a threshold that serves as the decision boundary. At a low threshold, a higher proportion of melanomas will be captured (high sensitivity), but more benign naevi risk being classified as malignant (low specificity). Increasing the threshold decreases the sensitivity but increases the specificity (i.e. fewer benign naevi are classified as melanoma). The behaviour of a machine learning classifier in response to changing the threshold can be visualized as a receiver operating characteristic (ROC) curve. The greater the area under the curve, the more accurate the classifier (Figure 2).
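The threshold trade-off described above can be illustrated with a minimal sketch. The scores and labels below are invented for illustration and do not come from any study. The function name `sensitivity_specificity` is hypothetical, as is the convention that a score at or above the threshold is called melanoma.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) at a given decision threshold.

    Assumed convention: label 1 = melanoma, label 0 = benign naevus,
    and a score >= threshold is classified as melanoma.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)  # melanomas caught
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)   # melanomas missed
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)   # naevi correctly cleared
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)  # naevi called melanoma
    return tp / (tp + fn), tn / (tn + fp)


# Invented test-set scores: four melanomas followed by four benign naevi.
scores = [0.95, 0.80, 0.60, 0.35, 0.70, 0.40, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for threshold in (0.30, 0.50, 0.75):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
# threshold=0.30  sensitivity=1.00  specificity=0.50
# threshold=0.50  sensitivity=0.75  specificity=0.75
# threshold=0.75  sensitivity=0.50  specificity=1.00
```

On this toy data, raising the threshold from 0.30 to 0.75 halves the sensitivity while the specificity rises from 0.50 to 1.00, which is the trade-off the operator has to weigh when choosing a decision boundary.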

Figure 2.

Schematic of a receiver operating characteristic (ROC) curve, which visualizes the trade-off between a trained model's sensitivity and specificity as the decision threshold is varied. Typically, machine learning studies use ROC curves and the area under the curve (AUC or AUROC) to quantify accuracy. The dashed line represents the desired perfect performance, when sensitivity and specificity are both 100%; in this scenario, the AUC would be 1·0. In reality, there is a trade-off between sensitivity and specificity, which gives rise to a curve.
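As a rough illustration of how an AUC might be computed, the sketch below sweeps the threshold over every observed score, records each (1 − specificity, sensitivity) point, and integrates with the trapezoidal rule. The data are invented, and the helper names `roc_points` and `auc` are hypothetical. Published studies would normally rely on an established statistics library rather than hand-written code like this.

```python
def roc_points(scores, labels):
    """Return ROC points as (false positive rate, true positive rate) pairs.

    Assumed convention: label 1 = melanoma, label 0 = benign naevus,
    and a score >= threshold is classified as melanoma.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing called melanoma
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented scores: the classes overlap, so the curve falls short of the ideal.
scores = [0.95, 0.80, 0.60, 0.35, 0.70, 0.40, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(auc(roc_points(scores, labels)))  # 0.8125

# A classifier that separates the two classes perfectly reaches an AUC of 1.0.
perfect_scores = [0.9, 0.8, 0.2, 0.1]
perfect_labels = [1, 1, 0, 0]
print(auc(roc_points(perfect_scores, perfect_labels)))  # 1.0
```

The value 0.8125 also has a direct interpretation: of the 16 possible melanoma–naevus pairs in the toy data, the melanoma receives the higher score in 13.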