Pooling samples for proteomics – biomarker profiling case studies

Published on July 9, 2014

To pool or not to pool biological samples? This question might pop up in the mind of anyone designing biomarker discovery approaches!

Biomarker discovery from clinical cohort studies has stirred much debate, from the first experiments linking SNPs to disease phenotypes to today’s proteomic and miRNA technologies. The answer to this question depends strictly on clinical data, patient group characteristics and… financial means.

“Should we analyse each and every sample of our patient cohorts individually?”

“Shall I pool samples first before profiling biomarkers?”

Whether or not to pool samples before biomarker profiling is a recurrent topic that our biomarker discovery experts tackle when supporting clinicians and researchers who are designing biomarker discovery programs.

Proteomic profiling helps biomarker identification

Whatever the context (pathologies with no appropriate biomarker, or with a dated but still well-established one…), combining patients’ clinical data with new biomarker profiling technologies is crucial to meet personalized or translational medicine standards (e.g. blood glucose levels vs. glycated hemoglobin HbA1c concentration in Type II diabetes, or IL-15 and MCP-1 in myocardial infarction). This is even more striking given the emerging need for therapeutic companion diagnostics in the pharma and biotech industries.

Therefore, scientists and clinicians feel the urge to discover accurate biomarkers that make it possible to diagnose a disease, foresee its evolution, or predict its response to different treatments across patient groups. Typical goals are to:

Better diagnose a disease, or forecast its evolution / response to treatment
Predict how a disease will evolve or behave in a given patient, allowing advancement in personalized medicine and patient stratification
Monitor disease progression and response to therapy
Develop companion diagnostics to test different therapeutic approaches
Discover new biomarkers. Sometimes what’s already in the literature has a limited value

Biomarker discovery starts from cohorts of patient samples

In an ideal world, quantitative assessment of individual patient samples is preferable, whatever the assay (ELISA kits, e.g. Human CT-1 or HA, antibody arrays, secretome or kinome testing…). The decision to pool biological samples before running the assay is very often perceived as a “risk factor” that adds biases and artifacts to the statistical analysis (something biomarker discoverers dislike!).

Nevertheless, in real life, any study is biased. After all, if you are going to study a disease, you will never be able to test every patient in the world with that disease, only those who have been diagnosed and included in clinical sample biobanks.

When designing sampling strategies for a biomarker profiling project, various limitations can rapidly become practical hurdles. A few examples:

Low number of patients in the cohort
Low sample volumes
Lack of grants, experimental skills or time …

Whatever the issue, the study design (from a limited study to a larger clinical study) remains feasible by pooling samples!

Pooling samples for biomarker discovery is a practical approach – Pros & Cons

Even if most reviews strongly discourage the use of pools, the reality is that more and more publications use this approach. There is nothing to object to, provided that the initial study using pools is then validated by analysing individual samples (and a sufficient number of them!), which gives statistical weight to the trends observed at the “profiling phase” of the project.

Let’s take a typical study with dozens of patients’ samples, to identify new biomarkers in serum / plasma. To put it simply, profiling on secretome:

One may decide to use antibody arrays profiling up to 1,000 secretome biomarkers in each individual sample. This is the approach to take if we want to study individual differences between patients and no clear groups can be defined.
However, if our samples fall into, say, two clear, homogeneous groups (healthy vs. diseased, responsive vs. non-responsive, etc.), pooling biological samples can be an attractive option!

Figure: Label-based semi-quantitative biomarker signature (L-Series label-based semi-quantitative glass slide and membrane arrays). Source: Raybiotech – tebu-bio laboratories.

In this case, here are some things you should be aware of:

Advantages of pooling samples

Get the main biomarkers differentiating each group of patients
Save money at early profiling / discovery phases
Ignore the individual differences and observe a trend
Save sample (which can be crucial when having limited volumes)

Disadvantages of pooling samples

Loss of information on biological variation. This is especially critical when the individual differences between patients are very high. Because the pool averages out the variation between patients, any p-value calculated from pooled data is artificially low
Validation of the results obtained at the profiling step using pools is mandatory, in order to obtain a significant p-value and confirm that the discovered biomarkers are meaningful
Loss of information on small changes between samples (or groups of samples): profiling on pools gives you information “only” on the major changes
Need for homogeneous groups of patients when pooling, which requires in-depth knowledge of the patients’ clinical history (not always available)
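
The loss of biological variation can be illustrated with a small simulation. This is only a sketch under stated assumptions: all numbers (group mean, biological and technical spread) are invented for illustration, and the pool is assumed to behave like the arithmetic mean of its members (equal volumes, linear assay response). The function names are hypothetical and not taken from any assay software.

```python
import random
import statistics

def simulate_group(n_patients, true_mean, bio_sd, rng):
    """Hypothetical true biomarker level of each patient in one group."""
    return [rng.gauss(true_mean, bio_sd) for _ in range(n_patients)]

def measure(level, tech_sd, rng):
    """One assay reading: the true level plus technical noise."""
    return level + rng.gauss(0.0, tech_sd)

def pooled_replicates(levels, n_replicates, tech_sd, rng):
    """Repeated readings of one pool.

    Assumption: an equal-volume pool has the mean level of its members.
    """
    pool_level = statistics.fmean(levels)
    return [measure(pool_level, tech_sd, rng) for _ in range(n_replicates)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Invented values: 10 patients, large biological spread, small technical noise.
levels = simulate_group(n_patients=10, true_mean=100.0, bio_sd=20.0, rng=rng)

individual = [measure(x, tech_sd=2.0, rng=rng) for x in levels]
pooled = pooled_replicates(levels, n_replicates=10, tech_sd=2.0, rng=rng)

sd_individual = statistics.stdev(individual)  # biological + technical spread
sd_pooled = statistics.stdev(pooled)          # technical spread only

print(f"SD across individual samples: {sd_individual:.1f}")
print(f"SD across pool replicates:    {sd_pooled:.1f}")
```

The spread seen across replicates of a pool reflects only technical noise, so a statistical test that treats those replicates as if they were patients would understate the variability and report an overly optimistic p-value. This is why validation on individual samples remains necessary.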

The key criterion when deciding whether or not to pool samples is to study the patients’ data in detail and define homogeneous groups.

There is also some debate on the ideal size of the pool (i.e. how many samples it should include). Setting aside statistical calculations, the number of samples per pool in publications ranges from n=3 to n=50. There is no single answer to this. Again, it depends on what you expect from your study and, most importantly, how you will continue it.
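
For readers who do want a rough statistical feel for pool size, here is a minimal sketch. It assumes that patients are independent, that the pool level is the mean of its members, and that the biological standard deviation (20 units here) is an invented value. Under those assumptions, the spread of the pool level shrinks with the square root of the pool size.

```python
import math

def pool_level_sd(bio_sd, pool_size):
    """Standard deviation of a pool's level, assuming the pool is the
    mean of `pool_size` independent patients with spread `bio_sd`."""
    return bio_sd / math.sqrt(pool_size)

# Invented biological SD of 20 units, for pool sizes seen in publications.
for n in (3, 10, 50):
    print(n, round(pool_level_sd(20.0, n), 2))
# prints:
# 3 11.55
# 10 6.32
# 50 2.83
```

Larger pools give a more stable estimate of the group average, but each added sample also hides more of the individual variation, so the choice still depends on how the study will be validated afterwards.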

Pooling samples is effective for identifying biomarkers – 2 case studies

Let’s see some cases. They are based on real stories, but for confidentiality reasons, some data are modified.

Case #1 – responsive vs. non-responsive patients

This scientist wanted to study plasma samples from oncology patients whose disease had progressed to a given class of tumour after the initial diagnosis (i.e. tumour type A progressing to tumour type B). He also wanted to study the relationship between prognosis and the development of cardiovascular disease (CVD).

The scientist split the samples into two groups, oncology patients with CVD risk and without CVD risk, based on clinical history. One patient belonged to the first group, except that he had progressed to tumour type C.

We recommended setting this patient’s sample aside because, even though he had cardiovascular risk, he did not fit the baseline definition of progression to tumour type B. Two arrays were run (i.e. two pools, each consisting of 10 patients, as no more were available at that stage), and 5 relevant biomarkers were found. These 5 biomarkers were then studied in a cohort of 100 patients, using ELISA on individual samples, with the corresponding biological and technical replicates.

Why weren’t these 5 biomarkers studied from the beginning?

Well, because some of them had never been described before in the literature. They were completely new, and the only way to find them was to do an initial profiling experiment.

With the follow-up ELISA study on individual samples, the accuracy of these markers could thus be confirmed individually and quantitatively.
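
The workload in Case #1 can be tallied with a few lines of arithmetic. This is a sketch only: the function names are hypothetical, one array per pool (or per patient when not pooling) is taken from the case description, and the number of replicates is left as a parameter because the case study does not state it.

```python
def discovery_arrays(n_patients, n_pools=None):
    """Arrays needed at the profiling step: one per pool,
    or one per patient when samples are not pooled."""
    return n_pools if n_pools is not None else n_patients

def validation_measurements(n_biomarkers, n_patients, replicates=1):
    """Individual ELISA measurements at the validation step.
    `replicates` defaults to 1 because the source gives no figure."""
    return n_biomarkers * n_patients * replicates

pooled = discovery_arrays(20, n_pools=2)   # two pools of 10 patients -> 2 arrays
individual = discovery_arrays(20)          # same 20 patients, no pooling -> 20 arrays
elisa = validation_measurements(5, 100)    # 5 biomarkers x 100 patients -> 500

print(pooled, individual, elisa)
```

Pooling cuts the profiling step from 20 arrays to 2, while the statistical weight comes from the 500 or more individual ELISA measurements that follow.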

Case #2 – influence of hormones on 24 sub-groups of patients

In this case, scientists wanted to study a disease related to reproduction in females, where hormones seemed to have a role.

The original 200-patient cohort was divided into 24 different groups, because the study had to take into account not only the responsive vs. non-responsive groups, but also the high-responsive and low-responsive subgroups, as well as the hormone status on each day of the cycle.

After some biomarker candidates were found (a total of 20), tebu-bio experts suggested different solutions depending on the number of samples to be used for validating the results. These solutions were mostly based on the design and production of custom arrays as well as target-specific ELISAs.

To summarize, is “pooling” biological samples for biomarker signature by proteomics attractive?

Pooling samples before performing biomarker profiling is not a perfect option. However, it can be of help when financial or sample limitations jeopardize biomarker discovery approaches.

In these specific cases, we recommend first running a “pre-screening” step on a few samples representative of homogeneous patient groups, quantitatively assessing two or three biomarker candidates before pooling. This can easily and rapidly be performed on individual samples with the ready-to-use ELISA kits that are now commercially available. This “pre-screening” step also preserves as much of the precious samples as possible for further analysis. Once these “pre-selected” candidates are validated, a deeper analysis combining proteomic profiling results (using antibody arrays, Quansys Q-Plex™ multiplex ELISA or classical ELISA) with clinical data might help you discover significant biomarkers.

So… to sum up, it all depends on what you have in terms of samples, funding, time, skills and appropriate clinical data! We’ve looked at several of the options open to you here. If you’re thinking of going a step further, you might like to contact biomarker experts to discuss your proteomics-based biomarker discovery approach, and to design the experiments around ready-to-use assay kits or laboratory services.
