This study aimed to investigate how well cultivars could be predicted based on their amino acid profile, regardless of origin or vintage. The amino acid results obtained from the survey of 738 grape must samples (2016 and 2017 harvests) were presented in Part 1, while Part 2 focuses on additional information that we could extract from the data.
The amino acid profile of a particular grape must results from the interaction of the grape variety with the climate, soil and various viticultural practices. This knowledge has prompted the investigation of various grape compositional elements in relation to the variety, geographical origin and vintage of the resulting wine. However, the accurate prediction of the variety and origin of a grape must based on a component of the grape juice matrix implies that the component is characteristic of that particular variety or origin. This information may help winemakers and viticulturists to make more informed decisions about the practices and processes that could be employed to ensure the desired quality and style of the final wine.
Overview of amino acid profiles
A heat map of the relative average amino acid concentrations is shown in Figure 1. It shows by how many standard deviations the average concentration of each amino acid in a cultivar deviates from the overall average across all cultivars. Representing the data in this way gives a comprehensive overview of how the cultivars compare to one another based on their amino acid content, and the associated dendrogram indicates how the cultivars relate to one another based on their amino acid profiles. Horizontally, the heat map shows the amino acid profile per cultivar and vertically, the relative average concentrations across cultivars for a specific amino acid. Cultivars with very high or very low concentrations in comparison to the mean can therefore be easily identified. For example, the white cultivars Grenache blanc, Sémillon, Sauvignon blanc and Chenin blanc appear to group together based on their lower concentrations of amino acids compared to the other cultivars included in this study. Furthermore, it is clear that Merlot is the cultivar with the highest concentration of proline, and that Pinotage and, to a lesser degree, Cinsaut generally have higher concentrations of most of the amino acids than the other cultivars surveyed. The close genetic relationship between Pinotage and Cinsaut, together with the similarity in their amino acid profiles, highlights the influence of genetic make-up in determining the grape must composition.
FIGURE 1. Heat map of the average amino acid concentrations and dendrogram illustrating how these cultivars relate to one another based on these average concentrations.
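The scaling behind a heat map like Figure 1 can be sketched in a few lines. The example below is a minimal illustration, not the procedure used in the study: the cultivar names and concentrations are hypothetical, the function name `relative_averages` is our own, and we assume that the sample standard deviation is taken across the cultivar averages. The clustering step that produces the dendrogram is not shown.

```python
from statistics import mean, stdev

# Hypothetical average concentrations (mg/L) per cultivar.
# These are illustrative numbers, not values from the survey.
averages = {
    "Cultivar A": {"proline": 900.0, "arginine": 300.0},
    "Cultivar B": {"proline": 500.0, "arginine": 500.0},
    "Cultivar C": {"proline": 100.0, "arginine": 700.0},
}


def relative_averages(table):
    """Express each cultivar average as a number of standard deviations
    from the mean of that amino acid across all cultivars."""
    amino_acids = list(next(iter(table.values())).keys())
    scaled = {cultivar: {} for cultivar in table}
    for aa in amino_acids:
        column = [table[cultivar][aa] for cultivar in table]
        centre = mean(column)
        spread = stdev(column)  # assumption: sample standard deviation
        for cultivar in table:
            scaled[cultivar][aa] = (table[cultivar][aa] - centre) / spread
    return scaled


z = relative_averages(averages)
for cultivar, row in z.items():
    print(cultivar, row)
```

A value of +1 for an amino acid means that the cultivar's average sits one standard deviation above the mean across cultivars, which corresponds to a "warm" cell in the heat map; a negative value corresponds to a "cool" cell.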
Proline and arginine
Previous studies have profiled cultivars as proline or arginine accumulators according to their proline to arginine ratio, with proline accumulators indicated by a ratio of >1 and arginine accumulators by a ratio of <1. It was proposed that this ratio could be used as an indicator of the ratio of non-assimilable to assimilable nitrogen. (Recall that arginine is a primary amino acid and is therefore included in the FAN/YAN value, while proline is a secondary amino acid and does not contribute to this value.) This rule seems to hold for most cultivars. For example, Grenache blanc, Pinotage and Cinsaut are all high-YAN yielding cultivars with a proline to arginine ratio of <1, and the reverse applies to cultivars such as Merlot, Cabernet Sauvignon and Cabernet franc. On the other hand, this rule does not appear to apply to Chardonnay, a cultivar that is typically found to have very high average YAN concentrations.
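As a small illustration of the rule, the ratio and the resulting label can be computed as follows. This is a sketch only: the function name `accumulator_type` and the concentrations are hypothetical, and we assume that both amino acids are expressed in the same units.

```python
def accumulator_type(proline, arginine):
    """Return the proline to arginine ratio and the label it implies.

    A ratio of >1 indicates a proline accumulator and a ratio of <1
    an arginine accumulator. Both inputs must be in the same units.
    """
    if arginine <= 0:
        raise ValueError("arginine concentration must be positive")
    ratio = proline / arginine
    if ratio > 1:
        return ratio, "proline accumulator"
    if ratio < 1:
        return ratio, "arginine accumulator"
    return ratio, "neither"


# Hypothetical concentrations in mg/L, for illustration only.
print(accumulator_type(1200.0, 400.0))  # ratio of 3.0
print(accumulator_type(300.0, 600.0))   # ratio of 0.5
```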
When the cultivars are arranged by percentage of arginine from least to most, and by percentage of proline from most to least, the two orderings match or come close to matching in most cases (Figure 2). It therefore appears that proline and arginine concentrations are, to a degree, inversely related to one another.
Some early studies on amino acid profiles have suggested that the proline to arginine ratio can be used as an index to discriminate between cultivars. The ratio of proline to arginine as a cultivar indicator was, however, not appropriate when large variations were found in the juices surveyed.
FIGURE 2. Percentage of amino acids per cultivar contributed by arginine, arranged in ascending order and percentage of amino acids contributed by proline, arranged in descending order.
Predictive ability of the grape must amino acid profile
The data were used to evaluate how accurately the amino acid composition could discriminate between cultivars and predict a certain cultivar. The statistical aspects of this work are detailed in a recently published MSc thesis. For the purpose of this article, however, we present the process and the outcome in a simplified version.
Step 1: Red or white?
As a first step, for the discrimination between white and red grape juices, alanine, leucine, GABA and proline were the amino acids that achieved the best prediction. This means that just by taking into account the concentrations of these amino acids, we could correctly predict in 82.5% of the cases whether a juice came from a red or a white grape. The misclassification table shows, however, that only 66% of the red grape juice samples were correctly predicted, whereas 87% of the white samples were correctly predicted.
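The difference between the overall figure and the per-class figures comes from how a misclassification table is read. The sketch below uses hypothetical counts, not the counts from this study, and the function names are our own. It shows that the overall accuracy is pulled towards the accuracy of the class with more samples.

```python
# Hypothetical misclassification table (not the study's counts).
# Outer keys are the true classes, inner keys are the predicted classes.
confusion = {
    "red": {"red": 20, "white": 10},
    "white": {"red": 7, "white": 63},
}


def per_class_accuracy(table):
    """Fraction of each true class that was predicted correctly."""
    return {true: row[true] / sum(row.values()) for true, row in table.items()}


def overall_accuracy(table):
    """Fraction of all samples that were predicted correctly."""
    correct = sum(row[label] for label, row in table.items())
    total = sum(sum(row.values()) for row in table.values())
    return correct / total


print(per_class_accuracy(confusion))
print(overall_accuracy(confusion))
```

In this made-up table, two thirds of the red samples and 90% of the white samples are correctly predicted, yet the overall accuracy is 83% because white samples outnumber red ones.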
Step 2: Red and white
Class membership of specific cultivars was predicted in two separate models, one for white and one for red. Chardonnay, Chenin blanc, Sauvignon blanc and Viognier were considered in the white classification, based on the number of samples in the set. Alanine and proline were again included in the best amino acid subset, in addition to arginine, methionine, threonine and glutamic acid. Using these amino acids, we were able to correctly identify 75.6% of the white grape juice samples according to cultivar. Specifically, 100% of Chardonnay, 73.6% of Chenin blanc, 65.2% of Sauvignon blanc and 83.3% of Viognier samples were correctly predicted. Sauvignon blanc had the lowest prediction accuracy and was mainly misclassified as Viognier (15%) and Chenin blanc (14%). These results agree with the heat map, where Sauvignon blanc, Chenin blanc and Viognier were found to be more similar to one another than to Chardonnay.
Red cultivars considered for the classification included Cabernet franc, Cabernet Sauvignon, Cinsaut, Merlot, Pinotage and Shiraz. Overall, we could correctly predict only 60.1% of the red grape juice samples according to cultivar. Pinotage was the cultivar most frequently identified correctly (75%), with only one sample misclassified as Shiraz and two as Cinsaut. The misclassification of Pinotage as Cinsaut may stem from their close genetic (parent-offspring) relationship. Similarly, Cabernet franc was most frequently misclassified as Merlot, possibly also due to the close genetic (parent-offspring) relationship between these cultivars. Only 54.9% of the Shiraz samples were correctly predicted. In addition, samples of the other cultivars were most often misclassified as Shiraz. Shiraz therefore appears to have an amino acid profile that is quite similar to those of the other cultivars included in the model, and is thus not easily distinguishable.
Take home message
Our results showed that there is merit in using amino acid profiles to distinguish between cultivars. However, prediction accuracy seemed to depend, to a certain degree, on how closely related the cultivars were to one another. The classification of white cultivars was found to be more accurate than that of red cultivars. This is hypothesised to be because the red cultivars included in this study are more closely related genetically than the white cultivars.
To our knowledge, this was the first study to use the amino acid profile of such a large number of grape juice samples to classify and discriminate between various cultivars. This possibility has only been alluded to by previous authors, especially for the prediction of white cultivars.
The aim was to investigate how characteristic the amino acid profile of a particular cultivar is. We also examined how accurately cultivars could be predicted based on their average amino acid concentrations using appropriate statistical tools. This type of data mining is often used to maximise the information output when large amounts of data have been generated, in order to increase the return on the time and resources invested.
– For more information, contact Astrid Buica at firstname.lastname@example.org.