
Lipidomics data analysis: Imputation

Research Article

Imputation helps to deal with missing values in lipidomics datasets.

About the author


Olga (Olya) Vvedenskaya
Sci. Communications Officer

Dr. Dr. Olya Vvedenskaya studied medicine, and further obtained her PhD in the field of molecular oncology. She loves to deliver scientific messages in a clear and accessible manner.

Resources


Imputation of missing values…

Frölich et al. | proteomics (2024)


Genetic architecture of human plasma…

Gerl et al. | PLOS Biology (2019)


Lipidomics Resource Center

 

About Lipotype


Lipotype is the leading lipidomics service provider. Order your service. Send your samples. Get your data.

Lipotype Lipidomics

Coverage of 100+ lipid classes and 4200+ individual lipids

Rich variety of sample types from subcellular to organs

High-throughput analysis for data in as little as 2 weeks

GMP certified, robust, and highly reproducible


Summary

• Mass spec data points can be missing due to various reasons
• Missing data can lead to biased interpretations and hinder statistical analyses
• Imputation methods fill in the gaps in lipidomics datasets

Mass spectrometry (MS) is a powerful analytical technique widely used in the life sciences. Analyzing lipids with MS gives scientists insight into a wide variety of biological and medical questions. However, the reliability of conclusions drawn from mass spectrometry data can be compromised by missing values, a common challenge in real-world experiments. Data points can be missing for several reasons: the compound may not be present in the sample, sample preparation may interfere (for example, added chemicals that degrade the analyte), or the signal may fall below the detection limit of the instrument. As a result, signals from some compounds end up below the limit of quantitation (LOQ).

Factors causing missing values in MS lipidomics: compound not in the sample, detection limit, and sample preparation.

Missing data in mass spectrometry experiments can lead to biased interpretations, hinder statistical analyses, and limit the scope of biological insights. As an example, consider an experiment in which five lipids were measured. CE 18:1;0 is the only lipid measured in all samples; every other lipid has at least one missing value.

Typical lipidomic dataset with missing values highlighted in red.
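A quick way to get an overview of such a table is to count the missing values per lipid. The sketch below uses invented numbers: only the names CE 18:1;0 and CE 20:1;0, the five-lipid layout, and the 6 out of 9 missing values for CE 20:1;0 come from the text, while the other lipid names and all intensities are assumptions made for illustration.

```python
# Hypothetical intensities (arbitrary units) for 5 lipids in 9 samples.
# None marks a missing value. All numbers and three of the lipid names
# are invented for illustration; they are not the data from the article.
data = {
    "CE 18:1;0": [12.1, 10.4, 11.8, 13.0, 9.7, 12.5, 11.1, 10.9, 12.2],
    "CE 20:1;0": [None, 0.8, None, None, 1.1, None, None, 0.9, None],
    "CE 16:0;0": [5.2, 4.8, 5.5, None, 4.1, 5.9, 5.0, 4.6, 5.3],
    "CE 18:2;0": [20.3, 18.9, None, 22.1, 17.5, 21.0, None, 19.4, 20.8],
    "CE 20:4;0": [None, 2.4, 2.9, None, 2.1, None, 2.6, None, 3.0],
}


def missing_summary(table):
    """Return {lipid: (number missing, fraction missing)} for each lipid."""
    summary = {}
    for lipid, values in table.items():
        n_missing = sum(v is None for v in values)
        summary[lipid] = (n_missing, n_missing / len(values))
    return summary


summary = missing_summary(data)
for lipid, (n_missing, fraction) in summary.items():
    print(f"{lipid}: {n_missing} of {len(data[lipid])} missing ({fraction:.0%})")

# Lipids measured in every sample
complete = [lipid for lipid, (n_missing, _) in summary.items() if n_missing == 0]
print("Complete lipids:", complete)
```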


Many researchers start their analysis with a principal component analysis (PCA). However, standard PCA does not accept missing data points. A researcher must therefore either use another method or work around the problem. One workaround is to use only the completely measured lipids; because this discards a lot of information, the alternative is to impute the missing data points. The word ‘imputation’ comes from the Latin imputo, which means to reckon, to attribute, to clear up.
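The information lost by keeping only completely measured lipids can be made concrete with a small sketch. The table is hypothetical (invented intensities, and lipid names other than CE 18:1;0 and CE 20:1;0 are assumptions), and `complete_case_filter` is an illustrative helper, not part of any library.

```python
# Hypothetical intensities (arbitrary units); None marks a missing value.
data = {
    "CE 18:1;0": [12.1, 10.4, 11.8, 13.0, 9.7, 12.5, 11.1, 10.9, 12.2],
    "CE 20:1;0": [None, 0.8, None, None, 1.1, None, None, 0.9, None],
    "CE 16:0;0": [5.2, 4.8, 5.5, None, 4.1, 5.9, 5.0, 4.6, 5.3],
    "CE 18:2;0": [20.3, 18.9, None, 22.1, 17.5, 21.0, None, 19.4, 20.8],
    "CE 20:4;0": [None, 2.4, 2.9, None, 2.1, None, 2.6, None, 3.0],
}


def complete_case_filter(table):
    """Keep only lipids without missing values.

    Returns the filtered table and the number of *measured* values
    that are thrown away together with the incomplete lipids.
    """
    kept = {
        lipid: values
        for lipid, values in table.items()
        if all(v is not None for v in values)
    }
    measured_total = sum(v is not None for values in table.values() for v in values)
    measured_kept = sum(len(values) for values in kept.values())
    return kept, measured_total - measured_kept


kept, discarded = complete_case_filter(data)
print("Lipids usable for PCA:", list(kept))   # ['CE 18:1;0']
print("Measured values discarded:", discarded)  # 23
```

In this invented table, a complete-case PCA would run on a single lipid and ignore 23 of the 32 values that were actually measured, which is the kind of loss that motivates imputation.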

Data imputation methods aim to address these challenges by filling in the gaps. A complete dataset allows more statistical methods to be applied, simplifies the programming effort, and often makes it possible to extract more insights from the existing data. Many methods are available today for imputing missing lipid data points, and scientists are often faced with the question of which one to choose. The answer is not easy, because there is no single best method. There is, however, probably a method that works best for the data at hand.


An important question to ask is why the data points are missing. Are they missing completely at random? Or is there some factor (tracked or untracked) that influences the probability of a data point being missing? Although these questions are important for the choice of the imputation method, it is not possible to answer them with confidence for every missing data point.

In a lipidomics context, for example, many data points are missing because they are below the limit of detection. Imputing all missing values with the mean or median of the corresponding lipid would therefore result in poor imputation quality, since the true values are likely lower than the measured ones. Consistently imputing these data points with zero cannot be correct either, because values below a detection limit are not necessarily zero. It may therefore seem reasonable to impute all missing data points with half the detection limit. However, this is also problematic, because the missing data points of one lipid clearly do not all share the same value. It would be unrealistic to impute all missing values of CE 20:1;0 in the example with half the limit of detection: 6 out of 9 samples would then have the same imputed value for this lipid.
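The constant-value strategies discussed above can be compared in a few lines. This is a minimal sketch under stated assumptions: the three observed intensities and the detection limit `LOD = 0.6` are invented, and `impute_constant` is a hypothetical helper written for this illustration.

```python
from statistics import mean, median

# Hypothetical CE 20:1;0 intensities: 6 of 9 samples are missing (None).
ce_20_1 = [None, 0.8, None, None, 1.1, None, None, 0.9, None]
LOD = 0.6  # assumed limit of detection, invented for illustration


def impute_constant(values, strategy, lod=None):
    """Replace every missing value of one lipid with a single constant."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "zero":
        fill = 0.0
    elif strategy == "half_lod":
        if lod is None:
            raise ValueError("half_lod needs a detection limit")
        fill = lod / 2
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


for strategy in ("mean", "median", "zero", "half_lod"):
    filled = impute_constant(ce_20_1, strategy, lod=LOD)
    print(f"{strategy:>8}: {[round(v, 2) for v in filled]}")

# Mean imputation places values above the detection limit, although they
# are presumably missing because they were below it.
mean_filled = impute_constant(ce_20_1, "mean")
# Half-LOD imputation gives all six missing samples the identical value.
half_filled = impute_constant(ce_20_1, "half_lod", lod=LOD)
print(half_filled.count(LOD / 2), "samples share the value", LOD / 2)
```

Whichever constant is chosen, the six imputed samples become indistinguishable for this lipid, which artificially reduces its variance.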


A more advanced method is needed. Lipotype scientists investigated many different imputation methods, applied them to lipidomic datasets, and performed simulation studies. They concluded that the k-nearest neighbor truncation approach described by Shah et al. works best. In brief, k-nearest neighbor truncation, unlike other imputation methods, takes the limit of detection into account. Missing values are imputed in four steps: all values are first transformed lipid-wise to a common scale; the nearest neighbor (that is, the most similar lipid) is then found for each lipid containing missing values; the missing values are imputed with those of the nearest neighbor; and finally the data are back-transformed. Below is the data table of our example after imputing the missing values.

Typical lipidomic dataset with imputed values highlighted in red.
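The four steps described above (transform, find the nearest lipid, impute, back-transform) can be sketched as follows. This is a deliberately simplified illustration and not the method of Shah et al.: it scales each lipid with the plain mean and standard deviation of its observed values, uses a single neighbor, and does not take the limit of detection into account. The data, the function names, and the choice of distance are all assumptions made for this sketch.

```python
import math
from statistics import mean, stdev

# Hypothetical intensities (arbitrary units); None marks a missing value.
table = {
    "CE 18:1;0": [12.1, 10.4, 11.8, 13.0, 9.7, 12.5, 11.1, 10.9, 12.2],
    "CE 20:1;0": [None, 0.8, None, None, 1.1, None, None, 0.9, None],
    "CE 16:0;0": [5.2, 4.8, 5.5, None, 4.1, 5.9, 5.0, 4.6, 5.3],
    "CE 18:2;0": [20.3, 18.9, None, 22.1, 17.5, 21.0, None, 19.4, 20.8],
    "CE 20:4;0": [None, 2.4, 2.9, None, 2.1, None, 2.6, None, 3.0],
}


def standardize(values):
    """Step 1: scale one lipid to mean 0, SD 1 using observed values only.

    Assumes at least two observed values that are not all identical.
    """
    observed = [v for v in values if v is not None]
    mu, sd = mean(observed), stdev(observed)
    return [None if v is None else (v - mu) / sd for v in values], mu, sd


def distance(z_a, z_b):
    """Root-mean-square difference over samples observed in both lipids."""
    pairs = [(a, b) for a, b in zip(z_a, z_b) if a is not None and b is not None]
    if not pairs:
        return math.inf  # no shared samples: treat as infinitely far apart
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


def impute_nearest_neighbor(table):
    scaled = {lipid: standardize(values) for lipid, values in table.items()}
    result = {}
    for lipid, (z, mu, sd) in scaled.items():
        # Step 2: rank the other lipids from most to least similar.
        neighbors = sorted(
            (other for other in scaled if other != lipid),
            key=lambda other: distance(z, scaled[other][0]),
        )
        filled = []
        for i, original in enumerate(table[lipid]):
            if original is not None:
                filled.append(original)  # observed values stay untouched
                continue
            # Step 3: borrow the scaled value of the closest lipid that was
            # measured in this sample (0.0, the lipid's own mean, if none).
            donor_z = next(
                (scaled[n][0][i] for n in neighbors if scaled[n][0][i] is not None),
                0.0,
            )
            # Step 4: back-transform to the scale of the imputed lipid.
            filled.append(donor_z * sd + mu)
        result[lipid] = filled
    return result


imputed = impute_nearest_neighbor(table)
print([round(v, 2) for v in imputed["CE 20:1;0"]])
```

Unlike the constant-value strategies, the imputed values differ from sample to sample, because they follow the pattern of a similar lipid. Note that this simplified version can produce values above the detection limit or below zero; handling that is where the truncation-aware approach goes further than the sketch.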


The advantage of a complete dataset is not just the ability to use a broader range of methods and to simplify the programming process. The dataset can also become more informative: for example, statistical tests can be applied in the first place, or applied with greater confidence. Although the benefits of imputation should be treated with caution, imputation simplifies all further statistical analyses and expands the possibilities when working with lipidomics datasets.

Lipotype Lipidomics technology provides a powerful solution for customers seeking insights into cellular lipid profiles. Lipotype supports lipid research in various fields, such as dermatology and cardiovascular diseases.

