Origin of Vegetation Indices

The origin of vegetation indices can be traced back to the Cold War and the need for the United States to predict wheat production in the Soviet Union. The United States sought to monitor Soviet agricultural productivity, particularly wheat production, as a means of assessing the country's economic and political stability nature.com.

Vegetation indices, such as the Normalized Difference Vegetation Index (NDVI), are calculated from satellite-based remote sensing data to provide a quantitative measure of vegetation health and productivity nature.com. These indices can be used to estimate crop yields, assess the impact of weather on agricultural production, and monitor land use changes over time. Satellites played a crucial role in obtaining the remote sensing data needed to calculate vegetation indices. During the Cold War, the United States launched several Earth observation satellites, such as those of the Landsat program, which provided multispectral imagery that could be used to derive vegetation indices sciencedirect.com.

  1. The launch of the first weather satellite, TIROS-1, in 1960 marked the beginning of satellite-based remote sensing en.wikipedia.org. Although TIROS-1 focused on weather monitoring, it demonstrated the potential of satellite technology for Earth observation, which would later be applied to monitoring vegetation and agricultural productivity.

  2. The development of multispectral remote sensing techniques in the 1960s and 1970s laid the foundation for deriving vegetation indices from satellite data en.wikipedia.org. These techniques allowed researchers to measure the reflectance of different wavelengths of light from the Earth's surface, which could be used to assess vegetation health and productivity.

  3. The early development of vegetation indices, such as the Normalized Difference Vegetation Index (NDVI), can be traced back to the work of researchers like Rouse et al. in the early 1970s en.wikipedia.org. They showed that the difference between red and near-infrared reflectance could be used to monitor the "green wave effect" of natural vegetation, which would later be adapted to estimate crop yields and monitor agricultural productivity.

  4. The launch of the first Landsat satellite in 1972 provided multispectral imagery that could be used to derive vegetation indices and monitor agricultural productivity en.wikipedia.org. It was a critical milestone in the development of remote sensing technology and its application to vegetation monitoring.

 

By monitoring vegetation indices from satellite data, the United States could estimate the wheat production in the Soviet Union, providing valuable information on the potential food supply and the overall economic stability of the country. This information could be used to inform political and strategic decisions during the Cold War.

In summary, the relationship between vegetation indices, satellites, and the Cold War is rooted in the United States' need to predict the wheat production of the Soviet Union. Vegetation indices derived from satellite-based remote sensing data allowed the United States to assess the agricultural productivity of the Soviet Union, which was crucial for understanding their economic and political stability during the Cold War.

First reliable Vegetation Indices

The first vegetation indices (VIs) were developed in the late 1960s and early 1970s to enhance the contribution of vegetation properties in spectral imaging and allow reliable spatial and temporal comparisons of terrestrial photosynthetic activity and canopy structural variations en.wikipedia.org. Early VIs aimed to maximize sensitivity to vegetation characteristics while minimizing confounding factors such as soil background reflectance and directional or atmospheric effects sciencedirect.com.

One of the earliest and most commonly used vegetation indices is the Normalized Difference Vegetation Index (NDVI), which utilizes information contained in the red and near-infrared (NIR) canopy reflectances or radiances. NDVI is calculated as follows:

$NDVI = (ρ_{NIR} - ρ_{RED}) / (ρ_{NIR} + ρ_{RED})$

where $ρ_{RED}$ and $ρ_{NIR}$ represent spectral reflectances in the red and NIR regions. NDVI enhances the contrast between soil and vegetation while minimizing the effects of illumination conditions sciencedirect.com.
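As a minimal sketch of the formula above, NDVI can be computed in R as follows. The reflectance values and surface names are hypothetical and chosen only for illustration. The last lines check that scaling both bands by the same factor, a crude stand-in for a change in illumination, leaves the index unchanged.

```r
# NDVI from near-infrared and red reflectance (values between 0 and 1)
ndvi <- function(nir, red) {
  (nir - red) / (nir + red)
}

# Hypothetical reflectances, for illustration only
nir <- c(dense_canopy = 0.50, sparse_canopy = 0.30, bare_soil = 0.25)
red <- c(dense_canopy = 0.05, sparse_canopy = 0.10, bare_soil = 0.20)

# Roughly 0.818, 0.5 and 0.111 for the three surfaces
round(ndvi(nir, red), 3)

# Halving both bands (darker scene) gives the same NDVI
all.equal(ndvi(0.5 * nir, 0.5 * red), ndvi(nir, red))
```

The ratio form is what makes the index insensitive to a common multiplicative factor. An additive offset, for example from atmospheric scattering, would not cancel in the same way.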

However, NDVI is sensitive to optical properties of the soil background. For a given amount of vegetation, darker soil substrates result in higher VI values. To address this issue, the Soil Adjusted Vegetation Index (SAVI) was introduced:

$SAVI = (1 + C) * (ρ_{NIR} - ρ_{RED}) / (ρ_{NIR} + ρ_{RED} + C)$

The constant $C$ is introduced to minimize soil-brightness influences and can vary from zero to infinity as a function of the canopy density. If $C = 0$, SAVI is equivalent to NDVI sciencedirect.com.

As remote sensing technology evolved, more advanced VIs were developed, such as the Enhanced Vegetation Index (EVI). EVI optimizes the vegetation signal with improved sensitivity in high biomass regions and is calculated as follows:

$EVI = G * (ρ_{NIR} - ρ_{RED}) / (ρ_{NIR} + C1 * ρ_{RED} - C2 * ρ_{Blue} + L)$

where $ρ_{Blue}$ is the surface reflectance in the blue band, $C1$ and $C2$ are the coefficients of the aerosol resistance term, $L$ is the canopy background adjustment factor, and $G$ is the gain factor sciencedirect.com.
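The SAVI and EVI formulas can be sketched in R in the same way. The default values below are assumptions and are not taken from this text: C = 0.5 is a commonly quoted soil-adjustment value, and G = 2.5, C1 = 6, C2 = 7.5, L = 1 are the coefficients commonly used for MODIS EVI. The reflectances are hypothetical.

```r
ndvi <- function(nir, red) (nir - red) / (nir + red)

# SAVI with soil-adjustment constant C (default 0.5 is an assumption)
savi <- function(nir, red, C = 0.5) {
  (1 + C) * (nir - red) / (nir + red + C)
}

# EVI; defaults are the commonly used MODIS coefficients (an assumption here)
evi <- function(nir, red, blue, G = 2.5, C1 = 6, C2 = 7.5, L = 1) {
  G * (nir - red) / (nir + C1 * red - C2 * blue + L)
}

# Hypothetical reflectances for a vegetated pixel
nir <- 0.50
red <- 0.05
blue <- 0.03

# With C = 0, SAVI reduces to NDVI
all.equal(savi(nir, red, C = 0), ndvi(nir, red))

savi(nir, red)       # about 0.643
evi(nir, red, blue)  # about 0.714
```

Note that the constants C1 and C2 in EVI are unrelated to the constant C in SAVI; only the notation is similar.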

In summary, the first-generation vegetation indices, such as NDVI and SAVI, were built to enhance the contribution of vegetation properties in spectral imaging and allow reliable spatial and temporal comparisons of terrestrial photosynthetic activity and canopy structural variations. These indices were designed to maximize sensitivity to vegetation characteristics while minimizing confounding factors such as soil background reflectance and directional or atmospheric effects. A complete list of existing remote sensing indices has been compiled at www.indexdatabase.de.

Why most vegetation indices are empirical

Most vegetation indices are empirical because they are derived from observed relationships between spectral reflectance data and vegetation properties, such as canopy structure, leaf pigment content, and photosynthetic potential pubmed.ncbi.nlm.nih.gov. These relationships are sometimes established by statistically fitting observed values of vegetation properties to corresponding vegetation indices sciencedirect.com, but more commonly they were interpreted by humans, especially for NDVI, SAVI and EVI.

Empirical vegetation indices are designed to "maximize" sensitivity to vegetation characteristics while "minimizing" confounding factors such as soil background reflectance and directional or atmospheric effects sciencedirect.com. However, there is no universal vegetation index equation applicable to all vegetation types because the empirical coefficients depend primarily on vegetation types. To operationally use vegetation indices, an equation must be established for each vegetation type, requiring substantial measurements and corresponding remote-sensing data sciencedirect.com. This is mainly because the equations of the early vegetation indices were set empirically, without a theoretical foundation, and were tuned manually.
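To illustrate why one equation per vegetation type is needed, here is a small sketch on synthetic data. Everything in it is invented for the example: the two vegetation types, the linear link between NDVI and leaf area index (LAI), the slopes and the noise level. It compares a single pooled regression with a regression that allows a separate intercept and slope per type.

```r
set.seed(42)  # arbitrary seed so the synthetic data are reproducible

# Synthetic data: two hypothetical vegetation types whose LAI relates
# to NDVI with different slopes (all numbers are invented)
n <- 50
ndvi_vals <- runif(2 * n, min = 0.2, max = 0.9)
type <- factor(rep(c("crop", "forest"), each = n))
true_slope <- ifelse(type == "crop", 4, 7)
lai <- true_slope * ndvi_vals + rnorm(2 * n, sd = 0.2)

# One pooled equation versus one equation per vegetation type
pooled <- lm(lai ~ ndvi_vals)
per_type <- lm(lai ~ ndvi_vals * type)

# Fitted coefficients differ between types
coef(per_type)

# The per-type model explains more of the variance
c(pooled = summary(pooled)$r.squared,
  per_type = summary(per_type)$r.squared)
```

In practice the per-type coefficients have to come from field measurements paired with remote-sensing data, which is the costly step described above.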

The advantage of using empirical vegetation indices is their simplicity and ease of computation. Since they are based on observed relationships, they can provide "robust" proxies for vegetation properties in various applications, such as monitoring biomass, water use, plant stress, and crop production sciencedirect.com. However, it is essential to understand how external factors, such as soil background, moisture condition, solar zenith angle, view angle, and atmosphere, can alter the computed index values and influence their interpretation sciencedirect.com.

How does the empirical basis of vegetation indices lead to a lack of precision and sub-optimal solutions?

The lack of precision in empirical vegetation indices can be attributed to several factors:

  1. Variability in vegetation types: Empirical vegetation indices are based on observed relationships between spectral reflectance data and vegetation properties. However, there is no universal equation applicable to all vegetation types, as the coefficients depend primarily on the specific vegetation type hindawi.com. This means that an equation must be established for each vegetation type, which requires substantial measurements and corresponding remote-sensing data hindawi.com.

  2. Influence of external factors: Empirical vegetation indices are sensitive to external factors such as soil background, moisture condition, solar zenith angle, view angle, and atmosphere sciencedirect.com. These factors can alter the computed index values and influence their interpretation, leading to sub-optimal solutions. For example, the Normalized Difference Vegetation Index (NDVI) is sensitive to optical properties of the soil background, which can lead to higher index values for darker soil substrates despite a given amount of vegetation sciencedirect.com.

  3. Spatial and temporal variations: Empirical vegetation indices may not accurately capture spatial and temporal variations in vegetation properties. For example, the 'simplified triangle' method for estimating evaporative fraction (EF) and surface soil moisture (SSM) from remotely sensed data of land surface temperature (Ts) and a vegetation index (VI) derived from ESA's Sentinel-3 satellite showed a Root Mean Square Error (RMSE) of 0.063 and 0.048 vol vol−1 and correlation coefficients (R) of 0.777 and 0.439 for EF and SSM, respectively sciencedirect.com. While these results demonstrate potential, they also highlight the limitations in accurately mapping key biophysical parameters.
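The soil-background effect mentioned in point 2 can be sketched with a simple linear mixing assumption, where a pixel's reflectance is the area-weighted average of vegetation and soil reflectance. This is a simplification, and all reflectance values below are hypothetical.

```r
ndvi <- function(nir, red) (nir - red) / (nir + red)

# Hypothetical reflectances (red, NIR) for pure surfaces
vegetation <- c(red = 0.05, nir = 0.50)
dark_soil <- c(red = 0.05, nir = 0.08)
bright_soil <- c(red = 0.25, nir = 0.30)

# Linear mixing: a pixel covered by a fraction `cover` of vegetation
mixed_ndvi <- function(soil, cover) {
  pixel <- cover * vegetation + (1 - cover) * soil
  unname(ndvi(pixel["nir"], pixel["red"]))
}

# Same vegetation cover, different soil background
cover <- 0.5
c(dark = mixed_ndvi(dark_soil, cover),
  bright = mixed_ndvi(bright_soil, cover))
# about 0.706 over dark soil and 0.455 over bright soil
```

With identical vegetation cover, the pixel over dark soil gets the higher NDVI, which is the behaviour SAVI tries to correct.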

The human procedure involved in developing empirical vegetation indices can lead to sub-optimal solutions because it relies on observed relationships between spectral reflectance data and vegetation properties, which are subject to variability in vegetation types and external factors. Additionally, the lack of a universally applicable equation for all vegetation types necessitates the establishment of separate equations for each vegetation type, further contributing to the imprecision of empirical vegetation indices.

Normalized index creation (e.g. NDVI): can the model construction be automated?

Let's assume a curve with multiple response variables, for example plant reflectance measured at different wavelengths. Using partial least squares (PLS), PLS regression (PLSR) or another variable selection technique, we could select the wavelengths that are most sensitive to a specific change (e.g. drier or wetter conditions). Going further, we could apply PLS or PLSR to the first derivative of the curve to see where these differences are largest (what explains the highest variability) and normalize the data. Once the variables are selected, we could create an index such as NDVI or, in this case, VAR. For this explanatory case, let's assume that the relationship and all variables are known (VAR = X1/X2).
 
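# Two input variables (e.g. reflectance at two selected wavelengths)
X1 <- c(100, 200, 300, 100, 200, 300, 100, 200, 300)
X2 <- c(50, 100, 150, 25, 50, 75, 33, 66, 99)
# Index defined as the ratio of the two variables
VAR <- X1 / X2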
 
How is the relationship between the variables found? A linear model can be fitted (VAR ~ I(X1/X2)) with a perfect fit, but only once that relationship is known, and in many cases this relationship is not evident.
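A short sketch makes the point concrete. It refits the toy data with a model that already encodes the ratio and with a plain additive model that does not. The model names are hypothetical.

```r
X1 <- c(100, 200, 300, 100, 200, 300, 100, 200, 300)
X2 <- c(50, 100, 150, 25, 50, 75, 33, 66, 99)
VAR <- X1 / X2

# Model that already encodes the true relationship
fit_ratio <- lm(VAR ~ I(X1 / X2))

# Additive model that ignores it
fit_additive <- lm(VAR ~ X1 + X2)

# Largest absolute residual of each model
max(abs(residuals(fit_ratio)))     # essentially zero
max(abs(residuals(fit_additive)))  # clearly non-zero
```

The additive model cannot reproduce VAR because the index is constant within each group of three observations while X1 and X2 both change, so the right functional form has to be known or searched for.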
 
The normalized index is always defined as a mapping function. If two or more different output states are expected for the same input value, one of them is incorrectly annotated. This is the assumption we make in machine learning. In this case, we use a loss function to account for the most likely case. For example, the L2 distance is used in linear regression. If I remember correctly, eigendecomposition is used for PLS. On the other hand, it is also possible that if the same input value describes two or more output states, the input variable lacks the features needed to discriminate between them, and this cannot be resolved.
 
The creation of a normalized index can be modeled in different ways. The case of NDVI is interesting: although it is the most widely used index, it is also one of the least effective for binary soil/vegetation segmentation, probably because it is empirically defined, as most vegetation indices are. Only a few recent studies attempt to optimize these indices. Symbolic regression is one way, and can be implemented through a genetic algorithm:
 
the function is randomly mutated and the best model is retained at each iteration. After a series of modifications, the global best model is taken. We therefore assume that the output model is the best. This approach can also be used for spectral band selection.
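The mutate-and-keep-the-best loop described above can be sketched as follows on the toy data. This is a deliberately reduced version, not a full genetic algorithm: there is a single candidate instead of a population, no crossover, and the search space is restricted to expressions of the form (t op t) op (t op t) with t in {X1, X2}. All function names are hypothetical.

```r
set.seed(1)  # arbitrary seed, for reproducibility

X1 <- c(100, 200, 300, 100, 200, 300, 100, 200, 300)
X2 <- c(50, 100, 150, 25, 50, 75, 33, 66, 99)
VAR <- X1 / X2  # target index, assumed known for this toy case

operators <- c("+", "-", "*", "/")
terminals <- c("X1", "X2")

# A candidate encodes the expression (t1 op1 t2) op3 (t3 op2 t4)
random_candidate <- function() {
  list(ops = sample(operators, 3, replace = TRUE),
       terms = sample(terminals, 4, replace = TRUE))
}

to_string <- function(cand) {
  sprintf("(%s %s %s) %s (%s %s %s)",
          cand$terms[1], cand$ops[1], cand$terms[2],
          cand$ops[3],
          cand$terms[3], cand$ops[2], cand$terms[4])
}

# Fitness: root mean square error against the target (lower is better);
# expressions producing NaN or Inf get an infinite error
rmse <- function(cand) {
  pred <- eval(parse(text = to_string(cand)))
  err <- sqrt(mean((pred - VAR)^2))
  if (is.finite(err)) err else Inf
}

# Mutation: replace one operator or one terminal at random
mutate <- function(cand) {
  if (runif(1) < 0.5) {
    cand$ops[sample(3, 1)] <- sample(operators, 1)
  } else {
    cand$terms[sample(4, 1)] <- sample(terminals, 1)
  }
  cand
}

best <- random_candidate()
best_err <- rmse(best)
initial_err <- best_err

for (i in seq_len(500)) {
  child <- mutate(best)
  child_err <- rmse(child)
  if (child_err <= best_err) {  # keep the best model at each iteration
    best <- child
    best_err <- child_err
  }
}

cat("Best expression:", to_string(best), "\n")
cat("RMSE:", best_err, "\n")
```

Several expressions in this search space are algebraically equal to X1/X2, for example (X1 * X1) / (X1 * X2), so the search can reach a zero error, but a single-candidate search may also stall in a local optimum. A population with crossover, as in a real genetic algorithm, reduces that risk.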
 
The other possibility is a function approximator: we can look for a family of functions that can describe all the others. Bernstein polynomials are a common approach. Universal function approximators based on the Taylor expansion are another. These approaches can be implemented in deep learning. Even if we cannot describe the learned function directly, we can expect the learned model to be close to the best one, since we directly reconstruct the function we are looking for by optimizing a specific loss function. I strongly recommend reading this study:
 
 

Machine learning algorithms applied to optimise or learn vegetation indices

Machine learning algorithms have shown great potential in optimizing and learning vegetation indices. In this small literature review, we will discuss various machine learning algorithms and their applications in enhancing vegetation indices, including deep learning.

A study by Balducci et al. link.springer.com discusses the applicability of machine learning-based solutions in various real-world application domains, including agriculture. They mention machine learning applications on agricultural datasets for smart farm enhancement, which could potentially be used to optimize vegetation indices link.springer.com.

Another study ncbi.nlm.nih.gov discusses the use of machine learning algorithms for predicting vegetation indices, such as NDVI, from multispectral satellite images. The authors compare the performance of various machine learning algorithms, including support vector machines (SVM), random forests (RF), and artificial neural networks (ANN), for predicting NDVI values. The results show that machine learning algorithms can achieve high accuracy in predicting vegetation indices, with RF and ANN outperforming SVM in terms of prediction accuracy.

DeepIndices

The paper "DeepIndices: Remote Sensing Indices Based on Approximation of Functions through Deep-Learning, Application to Uncalibrated Vegetation Images" proposes a deep learning-based approach to develop and optimize vegetation indices from uncalibrated remote sensing images mdpi.com. This approach matters for several reasons:

  1. Overcoming limitations of empirical indices: Traditional empirical vegetation indices have limitations due to their reliance on observed relationships between spectral reflectance data and vegetation properties, which are subject to variability in vegetation types and external factors mdpi.com. DeepIndices leverages deep learning techniques to automatically learn and optimize vegetation indices by capturing complex patterns in the data, which can be difficult to achieve using traditional empirical methods.

  2. Applicability to uncalibrated images: One of the significant advantages of DeepIndices is its ability to work with uncalibrated vegetation images mdpi.com. This feature makes the approach more versatile and practical for various real-world applications, as it can be applied to a wide range of remote sensing data without requiring extensive calibration efforts.

  3. Improved accuracy and robustness: DeepIndices can lead to better vegetation indices in various fields by providing more accurate and robust indices compared to traditional empirical indices mdpi.com. By automatically learning and capturing complex patterns in the data, DeepIndices can produce indices that are more representative of vegetation properties, leading to better decision-making in various applications such as agriculture, environmental monitoring, and forest management.

In summary, the DeepIndices approach matters because it addresses the limitations of traditional empirical vegetation indices by leveraging deep learning techniques to automatically learn and optimize vegetation indices from uncalibrated remote sensing images. This approach can lead to better indices in various fields by providing more accurate and robust indices that can be applied to a wide range of remote sensing data.