Machine learning models for predicting blood pressure phenotypes by combining multiple polygenic risk scores

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Documents

  • Fulltext

    Final published version, 3.19 MB, PDF document

  • Yana Hrytsenko
  • Benjamin Shea
  • Michael Elgart
  • Nuzulul Kurniansyah
  • Genevieve Lyons
  • Alanna C. Morrison
  • April P. Carson
  • Bernhard Haring
  • Braxton D. Mitchell
  • Bruce M. Psaty
  • Byron C. Jaeger
  • C. Charles Gu
  • Charles Kooperberg
  • Daniel Levy
  • Donald Lloyd-Jones
  • Eunhee Choi
  • Jennifer A. Brody
  • Jennifer A. Smith
  • Jerome I. Rotter
  • Matthew Moll
  • Myriam Fornage
  • Noah Simon
  • Peter Castaldi
  • Ramon Casanova
  • Ren Hua Chung
  • Robert Kaplan
  • Sharon L.R. Kardia
  • Stephen S. Rich
  • Susan Redline
  • Tanika Kelly
  • Timothy O’Connor
  • Wei Zhao
  • Wonji Kim
  • Xiuqing Guo
  • Yii Der Ida Chen
  • Tamar Sofer

We construct non-linear machine learning (ML) prediction models for systolic and diastolic blood pressure (SBP, DBP) using demographic and clinical variables and polygenic risk scores (PRSs). We develop a two-model ensemble consisting of a baseline model, in which prediction is based on demographic and clinical variables only, and a genetic model, in which we also include PRSs. We evaluate the use of a linear versus a non-linear model at both the baseline and the genetic model levels and assess the improvement in performance when incorporating multiple PRSs. We report the ensemble model’s performance as percentage variance explained (PVE) on a held-out test dataset. A non-linear baseline model improved the PVE from 28.1% to 30.1% (SBP) and from 14.3% to 17.4% (DBP) compared with a linear baseline model. Including seven PRSs in the genetic model, computed from the largest available GWAS of SBP/DBP, improved the genetic model PVE from 4.8% to 5.1% (SBP) and from 4.7% to 5% (DBP) compared with using a single PRS. Adding 14 further PRSs, computed from two independent GWASs, increased the genetic model PVE to 6.3% (SBP) and 5.7% (DBP). PVE differed across self-reported race/ethnicity groups, with primarily all non-White groups benefitting from the inclusion of additional PRSs. In summary, non-linear ML models improve BP prediction in models incorporating diverse populations.
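As a rough illustration of the PVE metric reported above, the sketch below computes PVE as 100 × (1 − MSE / Var(observed)). This is one common definition and is an assumption here; the abstract does not give the exact formula. The function name `pve`, the toy blood pressure values, and the idea of adding a genetic model's prediction of the baseline residual to the baseline prediction are all hypothetical and chosen only for illustration; they are not the authors' data or implementation.

```python
from statistics import fmean, pvariance


def pve(observed, predicted):
    """Percentage variance explained, assuming PVE = 100 * (1 - MSE / Var(observed))."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # Mean squared error of the predictions on the evaluation (e.g. held-out) set.
    mse = fmean([(o - p) ** 2 for o, p in zip(observed, predicted)])
    # Population variance of the observed outcome on the same set.
    return 100.0 * (1.0 - mse / pvariance(observed))


# Made-up toy values (not study data): observed SBP and baseline-model predictions.
observed = [110.0, 120.0, 130.0, 140.0]
baseline_pred = [115.0, 120.0, 130.0, 135.0]

# Hypothetical genetic-model output: a predicted correction to the baseline
# residual for each person. How the study combines the two models is an
# assumption here, not something stated in the abstract.
genetic_correction = [-4.0, 0.0, 0.0, 4.0]
ensemble_pred = [b + g for b, g in zip(baseline_pred, genetic_correction)]

print(f"baseline PVE: {pve(observed, baseline_pred):.1f}%")
print(f"ensemble PVE: {pve(observed, ensemble_pred):.1f}%")
```

Under this definition, a model that predicts the sample mean for everyone scores 0%, and a model that performs worse than the mean scores below zero.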

Original language: English
Article number: 12436
Journal: Scientific Reports
Volume: 14
Number of pages: 17
ISSN: 2045-2322
DOI
Status: Published - 2024

Bibliographical note

Funding Information:
This study was performed as a collaboration of the NHLBI Trans-Omics in Precision Medicine (TOPMed) Consortium. We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed and CCDG. TOPMed and CCDG acknowledgements, as well as descriptions, acknowledgements, and ethics statements of contributing studies, are provided in Supplementary Note 5. TOPMed consortium researchers and their affiliations are listed in Supplementary Note 6. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S. Department of Health and Human Services. We thank Mass General Brigham Biobank for providing samples, genomic data, and health information data. We also thank the HPC support team of Enterprise Research Infrastructure & Services at Mass General Brigham for their support and for the provision of computational resources. This work was approved by the Mass General Brigham Institutional Review Board and by the Beth Israel Deaconess Medical Center Committee on Clinical Investigations. This study was supported by National Heart, Lung, and Blood Institute (NHLBI) Grant R01HL161012 to TS. MM was supported by NHLBI Grant K08HL159318.

Competing interests:
B Psaty serves on the Steering Committee of the Yale Open Data Access Project funded by Johnson & Johnson. G Lyons is currently a full-time employee of Alexion, AstraZeneca Rare Disease, and holds stock in the company; however, her contributions to the present manuscript were performed as part of her previous affiliation at the Harvard T.H. Chan School of Public Health, and this work is not related to her current occupation and affiliation. M Moll has received grant funding from Bayer and consulting fees from TriNetX, 2ndMD, TheaHealth, Sitka, Verona Pharma, and Axon Advisors. All other authors report no competing interests.


Publisher Copyright:
© The Author(s) 2024.

ID: 394538488