Project Beam: Pioneering Biomarker Detection through Spectroscopy

Richa Pandya
10 min readJan 6, 2024

--

In July 2022, I was invited for an immersive experience at X, the Moonshot Factory (formerly known as Google [x]) along with 17 other students. We spent time at their Mountain View campus — learning about their projects and approach to tackling some of the hardest problems in the world.

X’s philosophy revolves around betting on ideas that sound insane, but have the potential to redefine humanity. Some of their past projects include building self-driving cars @Waymo, creating carbon-neutral fuel from seawater @Project Foghorn (I wrote an article about this a while ago), and visualizing the electrical grid @Tapestry.

Although X’s moonshots take years to develop, we were challenged to build our own moonshots within two weeks. My team (Aaryan, Graeme, Robert, Valmik and I), built a [very scrappy] project in the diagnostics space that we call Project Beam.

Sick-care > Health care

If you break down the progression of a disease, it always starts at a very small scale — your body tries to fight it, and symptoms typically only start to show when it’s grown to a level where your body cannot handle the stressors.

Our entire healthcare system is built around treating disease after symptoms start to show, rather than preventing it in the first place or detecting the disease before it grows to a large scale. Especially with critical diseases, by the time symptoms start to show, it’s often far too late:

  • 2M people die each year from undetected sepsis or bacterial infections.
  • misdiagnosis of strokes lead to 40–80K preventable deaths each year in the US
  • 70% of lung cancers are diagnosed at stage III-IV

This reactive approach to health ends the lives of millions of people every year: most of which could’ve been prevented with early detection.

Moreover, this leads to billions of dollars lost to both individuals and institutions — some estimates say nearly 75% of healthcare spending in the USA goes into treatment for avoidable diseases like diabetes and heart disease.

The diagnostics and personalized medicine space is growing annually (valued at $28.03B USD in 2021 with 4.9% CAGR from 2022–30), but there is still no widespread, economical approach for detecting any disease early.

Our thought process was: we know that all diseases leave traces in the body years before any symptoms start to show. Is there an easy, non-invasive way to detect the biomarkers for any disease years in advance?

X is known for large, audacious moonshots, but this idea seemed to be more of a mars-shot, so we narrowed our scope to lung cancer (at least for the beginning).

Why Lung Cancer?

Cancer accounts for every one in six deaths worldwide, and lung cancer makes up 25% of these deaths.

Although there are existing treatments like chemotherapy, radiation and immunotherapy, early detection is the biggest needle mover. But despite this, only 18% of lung cancers are diagnosed while the tumour is still localized.

Why? With lung cancer, most symptoms only show in the later stages (it takes around eight years for squamous cell carcinoma to reach 30 mm when it is most commonly diagnosed), and even then, the symptoms are very generic (weight loss, chest pain, unrelieved cough, etc), which makes it more difficult to diagnose.

There are two primary modes of detecting lung cancer:

  • Low-dose spiral CT scans: With a 93% sensitivity, LDCTs are accurate, but are only recommended to individuals that fit within the screening criteria (55–80 years with 20+ years of smoking history). 1/3 individuals diagnosed with lung cancer fall outside of this screening criteria.
  • Liquid biopsies: Liquid biopsies are easy, fast, and have accurate results, but still require doctors prescriptions + are typically used as an alternate to tissue biopsies when the tumour is difficult to reach or dangerous to biopsy traditionally.

Using Multidimensional Spectroscopy to detect biomarkers

Lung cancer (both small cell and non-small cell lung cancer) have three common protein biomarkers: CEA, RBP and a1 antitrypsin (expression of these proteins had a sensitivity of 89.3% and specificity of 84.7%).

Unidimensional vs Multidimensional Spectrography

Most efforts to study biomarkers like glucose in the blood are optical, but this doesn’t carry through to biomarkers found in less concentrations because optical methods alone suffer from low penetration and confounding molecules. This will give weaker readings, which will ultimately give you high error and lower confidence.

Typical spectrography methods yields an amplitude of varying wavelengths, and depending on the specific type of spectrography, you get information about the optical or acoustic properties of a substance.

In this situation, because the substance we’re looking at is blood, and has thousands of different chemicals, one peak on the graph could be correlated with numerous biomolecules, which is why unidimensional spectrography is not effective for biomarker detection.

Our approach was to take three different measurements (photoacoustic → optical characteristics, acoustic → acoustic characteristics, dielectric → electrical impedance characteristics), and plot them on a combined graph.

This graph would then be put through a demixing algorithm to correlate the peaks with different biomarker concentrations. If the biomarker threshold is above normal, the individual or their healthcare provider could be alerted and prompted to do a more thorough assessment to diagnose lung cancer.

To enhance the accuracy and reliability of our approach, we implemented a robust baseline process for handling the incoming signal data from the spectrometers. Initially, we accept the incoming signal data, which serves as the raw input for our analysis. Recognizing the inherent noise in raw spectrometer data, our next step involves de-noising this data. We achieve this by referencing historic averages, which helps in filtering out anomalies and maintaining the integrity of the signal.

Once the data is de-noised, we proceed to calculate the Discrete Fourier Transform (DFT) of the raw data. The DFT is crucial as it transforms the data into a domain where it’s easier to analyze the frequency components. Post-DFT, we apply a low-pass filter to the transformed data to further reduce noise. This step is vital in ensuring that only relevant frequency components are retained, thereby enhancing the clarity of the signal.

The next phase involves isolating the peaks from the DFT. We meticulously store the wavelengths and amplitudes of these peaks, as they are indicative of various biomarkers. The key to our method lies in the analysis of changes in these wavelength amplitudes over time. By comparing these changes with historic trends, which are based on large pre-collected datasets, we can make informed predictions about the welfare of the individual. This comparative analysis allows us to identify deviations from the norm that might indicate the presence of lung cancer or other diseases.

Even though this specific example was focused on lung cancer biomarkers, this general process (with the right amount of data) could theoretically be extrapolated to any disease.

Our Rough Demonstration

One thing I really appreciated at X was their commitment to building out scrappy experiments to test hypotheses super early on — even before all the details have been fleshed out. We took a similar approach and the video below walks through our work.

Obviously, this is a super crude model that was built in one day, so it’s not fully representative of what this project would look at the finished level.

We also built a photoacoustic spectroscopy system (one of the three types of spectrography) that shoots lasers at 1550 nanometers. This excited the chlorophyll and hemoglobin molecules to produce pressure waves that were detected by a piezocrystal.

Although our piezocrystal receiver was not sensitive enough to detect the minute concentration changes for most biomarkers, this is a very rough proof of concept for what this idea could look like. Our eventual idea was for this to be a small device kept at home or at the doctors office and would screen your hand for disease.

Understanding the Monkeys and the Pedestals

The monkey and the pedestal is an analogy used very frequently at X — if you’re training a monkey to recite Shakespeare on a pedestal, the easy thing is to start building the pedestal, but if you can’t teach the monkey how to recite Shakespeare, the project will never turn into reality.

The idea is to always start with the hardest part of the problem, because if that part doesn’t work, the rest of it won’t work either, and it’s not worth investing more time, money, and resources- especially when there’s other high impact projects on the line.

If Project Beam was taken forward at X, one very early step would be to assess the monkeys and pedestals, and that’s what we did.

What needs to be true for this to work?

Multidimensional spectography

The three types of spectrography we’re looking at (photoacoustic, acoustic, and diaelectric) have been done with varying levels of success on blood, skin, and other tissue, but never in conjunction within a portable device.

One thing to further understand would be the lowest concentration of compound we can detect in blood plasma — the answer to this question sets the scene as to which diseases could be detected.

Even with this, the signals must be strong enough to correlate with biomarker concentrations. Photoacoustic spectrography is used currently for non-invasive glucose detection, but this is only possible because glucose is found in high concentrations (definitely way higher than cancer biomarkers, for example).

Maximizing the signal-to-noise ratio (SNR) is a major challenge behind this project. One part of this is through hardware more sensitive sensors, maximizing contact between device and hand, etc. The other part of this is through software — algorithms such as Savitzky-Golay algorithm or wavelet transform have 7xed the SNR.

Demixing + Detecting the biomarkers

The intention behind using multidimensional spectrography > unidimensional spectrography is to increase the accuracy when drawing correlation between the spectogram and biomarkers. Drawing this correlation (using demixing algorithms) is depedent on the quality + quantity of data available.

Currently, the data for this project does not exist and would need to be collected. There are two specific classes of data that we’d need:

  • Spectrogram data, where the person’s hand gets scanned and all three spectrography methods are completed + collected
  • Ground truth data (on blood) to understand the what the actual biomarker concentrations are

Our idea was to build an MVP device and place it in locations where individuals get their blood tested: The individual would scan their hand for spectrogram data, and the ground truth data would come from their blood test. Both tests would be done at the same time to limit discrepancies.

Once the data is collected, we’d need to validate that you can accurately draw correlations to biomarker concentrations: in this case, we’re working with sensitive healthcare data, which makes it accuracy a major factor to consider.

Biomarker concentration α Disease Detection

While our initial findings have shown a promising correlation between certain biomarkers — specifically CEA, RBP, and a1 antitrypsin — and lung cancer, it’s crucial to acknowledge that this is merely the first step in a longer, more complex journey. To diagnose lung cancer with a high degree of accuracy, more research is needed to determine the specific concentrations of these biomarkers that are indicative of the disease.

The same meticulous approach must be applied if we aim to expand our focus to other diseases. Each disease will have its unique set of biomarkers, and understanding the precise relationship between these biomarkers and the disease state is essential.

My Final Thoughts

After spending time at X immersed in this incredible project, it culminated in a moment I’ll always remember: personally presenting our team’s findings and ideas to the executive team at X, including Astroteller and Benoit Schillings. This experience was not just a testament to our hard work, but also an invaluable opportunity to receive feedback that was as enlightening as it was inspiring.

My time spent here was more than just an exploration into the world of diagnostics; it was an endless cycle of challenges and learning. Navigating the complexities of multidimensional spectroscopy, grappling with the nuances of biomarker analysis, and translating these into a coherent presentation taught us lessons far beyond the realms of science and technology. It was about perseverance, teamwork, and the audacity to think big.

A huge shoutout goes to my team — Aaryan, Graeme, Robert, and Valmik. Your brilliance, creativity, and dedication were the lifeblood of Project Beam. And to the mentors and everyone at X, thank you for always making us feel welcome and steering us through this incredible experience — your wisdom was our north star.

The ethos of problem-solving at X, with its genuine excitement to change the world and infectious atmosphere of laughter and positivity, has truly left a lasting impression on me.

And just a heads-up, X — this isn’t the last you’ve seen of me. I’ll be back, armed with the same fresh ideas and relentless spirit. Keep the light on for me; the future’s looking bright and I can’t wait to be a part of it again!

I’ve also written about my personal experiences and the specific lessons learned in another article, which you can read about here.

--

--