A Big Data approach can be applied to winemaking, with our work on yeast assimilable nitrogen (YAN) as an illustration.

Unlike other fermented beverages, wine’s appeal does not stem from consistent flavour and aroma, but from the unique sensory experience it can offer from one vintage to the next. The consistency expected from a particular producer therefore refers to a certain level of quality rather than to a consistent flavour. However, what constitutes a ‘quality wine’ has become an increasingly controversial subject of debate. The emphasis placed on understanding the conundrum of wine quality is primarily due to the increasing power of consumers, fuelled by globalisation and the free flow of information. As a result, the world wine market has been forced to adopt a more market-driven approach. A few decades ago, quality was the prerogative of the producer, and consumers who did not appreciate a certain style of wine were often regarded as uncultured by their more affluent counterparts. In modern times, however, the definition of quality has moved into the hands of the consumer and has, at the same time, become a more subjective concept.

Moreover, this paradigm shift of power from the producer to the consumer has put the ‘Old-World’ wine producer, in particular, in a vulnerable position. Aside from the dwindling wine consumption reported for the Old World, in contrast to what is being observed in the ‘New World’, the advantage that these emerging wine producers have is said to be their willingness to implement innovative technologies, which, in some cases, may involve stepping away from tradition. It must, however, be made clear that the technology referred to here is over and above the tools required for successful large-scale production.


What is ‘Big Data’?

A common misconception is that the size of the dataset is the only requirement for using this term. The first, and widely accepted, definition of Big Data was proposed by Doug Laney, an analyst at the META Group. It came to be known as the 3Vs of Big Data: volume, velocity and variety. IBM later added two further terms to characterise Big Data: value and veracity.


  • Volume: the magnitude of the dataset. What counts as large depends heavily on the field in which the data is generated.
  • Velocity: the rate at which the data is generated. Generally, Big Data is data that is generated continuously, enabled by technology such as sensors.
  • Variety: the heterogeneity of the types of data collected. It is this characteristic that allows Big Data to shed light on complex systems, such as those leading to the complex and unique flavour and quality of a wine.
  • Veracity: the reliability or trueness of the data, which is a major challenge of Big Data.
  • Value: this is two-fold. Big Data is generally described as having ‘low value density’, i.e. the data in its original form has a low value in relation to its volume; once processed, however, it can add a great deal of value to a process or activity.


Thus, it becomes apparent that Big Data has many dimensions beyond the sheer magnitude of the dataset. Furthermore, how the 5Vs are interpreted is specific to the application field. For wine research to join the ‘Big Data revolution’ and arrive at a holistic understanding of the winemaking process, high-throughput and accessible techniques for chemical analysis need to be developed.


How can we apply a Big Data approach to winemaking?

As the grape juice matrix provides the nutrients required by the yeast during fermentation, the grape juice has been identified as the primary determinant of the quality of the final wine. Over the years, yeast assimilable nitrogen (YAN) has been identified as one of the key role-players. Developing and adopting technologies that will enable the easy monitoring and control of this important component of the grape juice matrix can facilitate more informed decision-making, and thus, can increase the chances of producing premium wines.

The results previously reported in the August 2018 issue of WineLand established the variability and range of total YAN concentrations for different cultivars grown in various districts across the Western Cape. They also showed which cultivars are most likely to require nutrient additions to ensure a successful fermentation, as well as those that could run the risk of excess nitrogen at the end of fermentation. The impact of the geographical origin of the grapes was explored as well, so that districts that may be frequently associated with nitrogen deficiency could be identified. This information is invaluable to the local industry, given the current logistical difficulty of obtaining timely information on the nitrogen content of the grape juice matrix before the start of fermentation. The composition of the nitrogen in the grape juice matrix was also investigated, given the effect that YAN has on the aromatic profile of a wine. Individual amino acid concentrations were reported too, owing to the roles they play in the complex metabolic activities of the yeast.
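To make this kind of survey summary concrete, the sketch below groups juice samples by cultivar or district and reports the YAN range and the share of samples falling below a deficiency threshold. It is an illustration only: the sample records are invented, and the 140 mg N/L cut-off is a frequently quoted guideline that is assumed here, not a value taken from the survey.

```python
from collections import defaultdict
from statistics import mean

# Assumed guideline, NOT a value from the survey: juices below this total YAN
# (mg N/L) are treated as candidates for nutrient additions.
LOW_YAN_THRESHOLD = 140.0

# Hypothetical records: (cultivar, district, total YAN in mg N/L).
samples = [
    ("Chenin blanc", "Stellenbosch", 118.0),
    ("Chenin blanc", "Paarl", 152.0),
    ("Chenin blanc", "Swartland", 131.0),
    ("Sauvignon blanc", "Stellenbosch", 205.0),
    ("Sauvignon blanc", "Elgin", 248.0),
    ("Pinotage", "Paarl", 176.0),
    ("Pinotage", "Swartland", 139.0),
]

def summarise(records, key_index, threshold=LOW_YAN_THRESHOLD):
    """Group YAN values by cultivar (key_index=0) or district (key_index=1)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    summary = {}
    for name, values in groups.items():
        low = sum(1 for value in values if value < threshold)
        summary[name] = {
            "n": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "share_low": low / len(values),
        }
    return summary

by_cultivar = summarise(samples, key_index=0)
by_district = summarise(samples, key_index=1)

# List cultivars from most to least frequently below the threshold.
ranked = sorted(by_cultivar.items(), key=lambda item: -item[1]["share_low"])
for name, stats in ranked:
    print(f"{name}: {stats['min']:.0f}-{stats['max']:.0f} mg N/L, "
          f"{stats['share_low']:.0%} of samples below threshold")
```

With a real survey, the same grouping could be repeated per vintage to see whether a cultivar or district is consistently deficient or only occasionally so.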

In light of these findings, it is clear that a large amount of data can help identify underlying patterns and, subsequently, the major factors at play. The value obtained from this data was thus primarily due to the volume of data collected. ‘Volume’ is particularly important in the context of YAN because of the number of factors affecting its concentration and composition and, consequently, the variability associated with this important component of the grape juice matrix. A ‘Big Data’ approach to wine research, and specifically to YAN, would be ideal to enable a holistic and integrated understanding of such a complex system. However, reviewing the characteristics of Big Data makes it clear that the current velocity of YAN data generation (through traditional methods) may not be adequate for a realistic ‘Big Data’ approach. Therefore, to facilitate further value creation, it is necessary to use methods that enable high-velocity data generation.

Given the simple, rapid and cost-effective nature of spectroscopy and the recent developments in infrared (IR) instrumentation and chemometrics, the ability of IR technology to accurately measure grape juice YAN was investigated. The sampling and validation strategies employed made it clear that the proposed models would be capable of providing accurate results in a practical scenario where samples from different cultivars, vintages and origins need to be analysed. This research therefore provides not only a technique for effective Big Data collection, but also a more rapid and cost-effective method for winemakers to obtain timely information. From a ‘Big Data’ point of view, IR spectroscopy is capable of providing value by collecting a high volume of varied data at a high velocity. The future success of this technology in the context of Big Data will be spurred on by the development of accurate calibrations on portable hand-held devices, providing the means for on-line and real-time data collection.
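The idea of calibrating a spectroscopic measurement against a reference method, and then checking it on samples that were not used to build the model, can be sketched in a few lines. The example below is a deliberately simplified stand-in and not the models developed in this research: it fits a straight line to a single, hypothetical absorbance value, whereas real IR calibrations are typically multivariate and use the whole spectrum. All numbers are invented for illustration.

```python
from math import sqrt

# Hypothetical paired measurements: absorbance at one assumed band and the
# reference total YAN (mg N/L) obtained with a traditional method.
calibration = [(0.21, 95.0), (0.30, 140.0), (0.38, 182.0), (0.47, 225.0), (0.55, 268.0)]
# Separate samples kept aside to test the model (for example, another vintage).
validation = [(0.26, 121.0), (0.42, 199.0), (0.51, 250.0)]

def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmsep(pairs, slope, intercept):
    """Root mean square error of prediction for the given samples."""
    squared = [(slope * x + intercept - y) ** 2 for x, y in pairs]
    return sqrt(sum(squared) / len(squared))

slope, intercept = fit_line(calibration)
error = rmsep(validation, slope, intercept)
print(f"YAN ~ {slope:.1f} * absorbance + {intercept:.1f}; "
      f"RMSEP = {error:.1f} mg N/L")
```

The point of keeping the validation samples apart is the same as in the text: a model is only useful in the cellar if it still predicts well for cultivars, vintages and origins it has not seen before.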



The term ‘Big Data’ has become part of modern-day vocabulary, most commonly used in the field of business to facilitate the understanding of consumers. Nevertheless, the so-called ‘Big Data revolution’ is just as indispensable to scientific research, providing the possibility of more data-driven and informed decision-making and hypothesis generation. Recently, a vast amount of work has been done at the Department of Viticulture and Oenology (DVO) to address some of these issues (Petrovic, 2018). The main aim of the work was to gain insight into the YAN status of South African grape juices currently used for commercial winemaking. To provide a more comprehensive understanding of the YAN status of the grape juice matrix, we looked into maximising the information output of the surveyed data using various descriptive and exploratory statistical techniques. Furthermore, the data was used to build robust quantitative models using IR spectroscopy for the more rapid and cost-effective measurement of total YAN, free amino nitrogen (FAN) and ammonia, as well as qualitative models to discriminate between cultivars based on amino acid composition. These aspects will be presented in a series of WineLand articles.


– For more information, contact Astrid Buica at

