Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 92,455 bioRxiv papers from 394,899 authors.

Most downloaded bioRxiv papers, all time

in category epidemiology

1,556 results found.

1: Phenotypic Age: a novel signature of mortality and morbidity risk
Posted to bioRxiv 05 Jul 2018
7,441 downloads epidemiology

Zuyun Liu, Pei-Lun Kuo, Steve Horvath, Eileen Crimmins, Luigi Ferrucci, Morgan Levine

Background: A person's rate of aging has important implications for his/her risk of death and disease; thus, quantifying aging using observable characteristics has important applications for clinical, basic, and observational research. We aimed to validate a novel aging measure, 'Phenotypic Age', constructed based on routine clinical chemistry measures, by assessing its applicability for differentiating risk for morbidity and mortality in both healthy and unhealthy populations of various ages. Methods: A nationally representative US sample, NHANES III, was used to derive 'Phenotypic Age' based on a linear combination of chronological age and nine multi-system clinical chemistry measures, selected via a Cox proportional hazards elastic net. Mortality predictions were validated using an independent sample (NHANES IV), consisting of 11,432 participants, for whom we observed a total of 871 deaths, ascertained over 12.6 years of follow-up. Proportional hazards models and ROC curves were used to evaluate predictions. Results: Phenotypic Age was significantly associated with all-cause mortality and cause-specific mortality. These results were robust to age and sex stratification, and remained even when excluding short-term mortality. Similarly, Phenotypic Age was associated with mortality among seemingly 'healthy' participants, defined as those who were disease-free and had normal BMI at baseline, as well as the oldest-old (aged 85+), a group with high disease burden. Conclusions: Phenotypic Age is a reliable predictor of all-cause and cause-specific mortality in multiple subgroups of the population. Risk stratification by this composite measure is far superior to that of the individual measures that go into it, as well as traditional measures of health. It is able to differentiate risk among individuals who appear healthy and who may otherwise have been missed by traditional health assessments. Further, it can differentiate risk among persons with shared disease burden. Overall, this easily measured metric may be useful in the clinical setting and facilitate secondary and tertiary prevention strategies.

2: Clustering of adult-onset diabetes into novel subgroups guides therapy and improves prediction of outcome
Posted to bioRxiv 08 Sep 2017
6,006 downloads epidemiology

Emma Ahlqvist, P Storm, A Käräjämäki, M Martinell, M Dorkhan, A Carlsson, P Vikman, RB Prasad, D Mansour Aly, P Almgren, Y Wessman, N Shaat, P Spegel, H Mulder, E Lindholm, O Melander, O Hansson, U Malmqvist, Å Lernmark, K Lahti, T Forsén, T Tuomi, AH Rosengren, L Groop

Background: Diabetes is presently classified into two main forms, type 1 (T1D) and type 2 diabetes (T2D), but T2D in particular is highly heterogeneous. A refined classification could provide a powerful tool to individualize treatment regimes and identify individuals with increased risk of complications already at diagnosis. Methods: We applied data-driven cluster analysis (k-means and hierarchical clustering) in newly diagnosed diabetic patients (N=8,980) from the Swedish ANDIS (All New Diabetics in Scania) cohort, using five variables (GAD-antibodies, BMI, HbA1c, HOMA2-B and HOMA2-IR), and related the clusters to prospective data on development of complications and prescription of medication from patient records. Replication was performed in three independent cohorts: the Scania Diabetes Registry (SDR, N=1,466), ANDIU (All New Diabetics in Uppsala, N=844) and DIREVA (Diabetes Registry Vaasa, N=3,485). Cox regression and logistic regression were used to compare time to medication, time to reaching the treatment goal and risk of diabetic complications and genetic associations. Findings: We identified 5 replicable clusters of diabetes patients, with significantly different patient characteristics and risk of diabetic complications. In particular, individuals in the most insulin-resistant cluster 3 had significantly higher risk of diabetic kidney disease, but had been prescribed similar diabetes treatment compared to the less susceptible individuals in clusters 4 and 5. The insulin-deficient cluster 2 had the highest risk of retinopathy. In support of the clustering, genetic associations to the clusters differed from those seen in traditional T2D. Interpretation: We could stratify patients into five subgroups predicting disease progression and development of diabetic complications more precisely than the current classification. This new substratification may help to tailor and target early treatment to patients who would benefit most, thereby representing a first step towards precision medicine in diabetes.

3: Increased risk of many early-life diseases after surgical removal of adenoids and tonsils in childhood
Posted to bioRxiv 05 Jul 2017
5,422 downloads epidemiology

Sean G. Byars, Stephen C. Stearns, Jacobus J. Boomsma

BACKGROUND: Surgical removal of the adenoids and tonsils is a common pediatric procedure, with conventional wisdom suggesting their absence has little impact on health or disease. However, little is known about long-term health consequences beyond the perioperative risks. Such ignorance is significant, for these lymphatic organs play important roles in both the development and the function of the immune system. METHODS: We tested the long-term consequences of surgery in the population of Denmark by examining risk for 28 diseases with ~1 million individuals followed from birth up to 30 years of age depending on whether any of three common surgeries (adenoidectomy, tonsillectomy, adenotonsillectomy) occurred in the first 9 years of life. To weigh costs and benefits, we also compared the absolute risks for these diseases to the risks for the conditions that these surgeries aimed to treat. We obtained robust results by using stratified Cox regressions with statistically well-powered samples of cases (with surgery) and controls (without surgery) whose general health was no different prior to surgery. We adjusted our estimates of risk for diseases occurring before surgery, stratified for sex (and other effects) and for 18 covariates, including parental disease history and birth metrics. RESULTS: We found significantly elevated relative risks for many diseases, with effects on respiratory, allergic and infectious disorders after removal of adenoids and tonsils being most pronounced. For some of these diseases, absolute risk increases were considerable. In comparison, many risks for conditions that surgeries aimed to treat were either not significantly different or significantly higher following surgery up to 30 years of age. This suggests that any immediate benefits of these surgeries may not continue longer-term, while resulting in slightly compromised early adult health due to significantly increased risk of many non-target diseases. CONCLUSIONS: Our results indicate that surgical removal of tonsils and adenoids early in life is associated with longer-term health risks. They underline the importance of these organs and tissues for normal immune functioning and early immune development, and suggest that these longer-term disease risks may outweigh the short-term benefits of these surgeries.

4: Projected spread of Zika virus in the Americas
Posted to bioRxiv 28 Jul 2016
4,000 downloads epidemiology

Qian Zhang, Kaiyuan Sun, Matteo Chinazzi, Ana Pastore-Piontti, Natalie E. Dean, Diana Patricia Rojas, Stefano Merler, Dina Mistry, Piero Poletti, Luca Rossi, Margaret Bray, M. Elizabeth Halloran, Ira M. Longini, Alessandro Vespignani

We use a data-driven global stochastic epidemic model to project past and future spread of the Zika virus (ZIKV) in the Americas. The model has high spatial and temporal resolution, and integrates real-world demographic, human mobility, socioeconomic, temperature, and vector density data. We estimate that the first introduction of ZIKV to Brazil likely occurred between August 2013 and April 2014 (90% credible interval). We provide simulated epidemic profiles of incident ZIKV infections for several countries in the Americas through February 2017. The ZIKV epidemic is characterized by slow growth and high spatial and seasonal heterogeneity, attributable to the dynamics of the mosquito vector and to the characteristics and mobility of the human populations. We project the expected timing and number of pregnancies infected with ZIKV during the first trimester, and provide estimates of microcephaly cases assuming different levels of risk as reported in empirical retrospective studies. Our approach represents an early modeling effort aimed at projecting the potential magnitude and timing of the ZIKV epidemic that might be refined as new and more accurate data from the region become available.

5: MR-Base: a platform for systematic causal inference across the phenome using billions of genetic associations
Posted to bioRxiv 16 Dec 2016
3,780 downloads epidemiology

Gibran Hemani, Jie Zheng, Kaitlin H. Wade, Charles Laurin, Benjamin Elsworth, Stephen Burgess, Jack Bowden, Ryan Langdon, Vanessa Tan, James Yarmolinsky, Hashem A. Shihab, Nicholas Timpson, David M. Evans, Caroline Relton, Richard M Martin, George Davey Smith, Tom R Gaunt, Philip Haycock

Published genetic associations can be used to infer causal relationships between phenotypes, bypassing the need for individual-level genotype or phenotype data. We have curated complete summary data from 1094 genome-wide association studies (GWAS) on diseases and other complex traits into a centralised database, and developed an analytical platform that uses these data to perform Mendelian randomization (MR) tests and sensitivity analyses (MR-Base, http://www.mrbase.org). Combined with curated data of published GWAS hits for phenomic measures, the MR-Base platform enables millions of potential causal relationships to be evaluated. We use the platform to predict the impact of lipid lowering on human health. While our analysis provides evidence that reducing LDL-cholesterol, lipoprotein(a) or triglyceride levels reduces coronary disease risk, it also suggests causal effects on a number of other non-vascular outcomes, indicating potential for adverse effects or drug repositioning of lipid-lowering therapies.

6: Using the Weibull accelerated failure time regression model to predict time to health events
Posted to bioRxiv 04 Jul 2018
3,763 downloads epidemiology

Enwu Liu, Karen Lim

Predicting mean time to failure (MTTF), mean time between failures (MTBF) and median survival time is quite common in engineering reliability research. In the medical literature, most prediction models are used to predict probabilities during a certain period of time. In this paper we introduced detailed calculations to predict different survival times using the Weibull accelerated failure time regression model and assessed the accuracy of the point predictions. The method for constructing a confidence interval for the predicted survival time was also discussed.
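
As a rough illustration of the kind of point prediction this abstract describes, the sketch below computes the median and mean survival time from a Weibull accelerated failure time model. It assumes the common log-linear parameterisation log(T) = linear predictor + scale × W, with W following the standard minimum extreme value distribution; the coefficient values are hypothetical and are not taken from the paper.

```python
import math

def weibull_aft_predictions(linear_predictor, scale):
    """Median and mean survival time under log(T) = linear_predictor + scale * W,
    where W follows the standard minimum extreme value distribution.
    Under this assumption T is Weibull with shape 1/scale and scale exp(linear_predictor)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    base = math.exp(linear_predictor)        # Weibull scale parameter
    median = base * math.log(2.0) ** scale   # time at which survival equals 0.5
    mean = base * math.gamma(1.0 + scale)    # expected survival time
    return median, mean

# Hypothetical fitted values, for illustration only (not from the paper).
intercept, beta_age, sigma = 5.0, -0.02, 0.8
median_t, mean_t = weibull_aft_predictions(intercept + beta_age * 60, sigma)
print(f"median = {median_t:.1f}, mean = {mean_t:.1f}")
```

With scale equal to 1 the model reduces to the exponential case, where the median is ln 2 times the mean, which gives a quick sanity check on the formulas.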

7: Magnitude of road traffic accident related injuries and fatalities in Ethiopia
Posted to bioRxiv 01 Aug 2018
3,275 downloads epidemiology

Teferi Abagaz, Samson Gebremedhin

Background: In many developing countries there is a paucity of evidence regarding the epidemiology of road traffic accidents (RTAs). The study determines the rates of injuries and fatalities associated with RTAs in Ethiopia based on the data of a recent national survey. Methods: The study is based on the secondary data of the Ethiopian Demographic and Health Survey conducted in 2016. The survey collected information about the occurrence of injuries and accidents, including RTAs, in the past 12 months among 75,271 members of 16,650 households. Households were selected from nine regions and two city administrations of Ethiopia using a stratified cluster sampling procedure. Results: Of the 75,271 household members enumerated, 123 encountered RTAs in the reference period and the rate of RTA-related injury was 163 (95% confidence interval (CI): 136-195) per 100,000 population. Of the 123 casualties, 28 were fatal, making the fatality rate 37 (95% CI: 25-54) per 100,000 population. The RTA-related injuries and fatalities per 100,000 motor vehicles were estimated as 21,681 (95% CI: 18,090-25,938) and 4,922 (95% CI: 3,325-7,183), respectively. Next to accidental falls, RTAs were the second most common form of accidents and injuries, accounting for 22.8% of all such incidents. RTAs contributed to 43.8% of all fatalities secondary to accidents and injuries. Among RTA casualties, 21.9% were drivers, 35.0% were passenger vehicle occupants and 36.0% were vulnerable road users, including motorcyclists (21.0%), pedestrians (12.1%) and cyclists (2.9%). Approximately half (47.1%) of the casualties were between 15-29 years of age and 15.3% were either minors younger than 15 years or seniors older than 64 years of age. Nearly two-thirds (65.0%) of the victims were males. Conclusion: RTA-related casualties are extremely high in Ethiopia. Male young adults and vulnerable road users are at increased risk of RTAs. There is an urgent need for bringing road safety to the country's public health agenda.
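
The per-100,000 point estimates quoted in this abstract can be reproduced from the counts it reports. The sketch below is a crude calculation only: the function name is illustrative, and it does not attempt the confidence intervals, which presumably depend on the survey's sampling design.

```python
def rate_per_100k(events, population):
    """Crude rate of events per 100,000 population."""
    return events / population * 100_000

population = 75_271   # household members enumerated
rta_cases = 123       # members who encountered an RTA in the reference period
rta_deaths = 28       # fatal cases among them

injury_rate = rate_per_100k(rta_cases, population)
fatality_rate = rate_per_100k(rta_deaths, population)
print(round(injury_rate), round(fatality_rate))  # 163 37
```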

8: Assessment of Menstrual Hygiene Management and Its Determinants among Adolescent Girls: A Cross-Sectional Study in School adolescent girls in Addis Ababa, Ethiopia.
Posted to bioRxiv 22 Oct 2018
3,212 downloads epidemiology

Ephrem Biruk, Worku Tefera, Nardos Tadesse, Ashagre Sisay

Managing menstruation essentially means dealing with menstrual flow while continuing regular activities such as going to school or working. However, menstruation can place significant obstacles in girls’ access to health, education and future prospects if they are not equipped for effective menstrual hygiene management. The objective of this study was to assess menstrual hygiene management and its determinants among school girls in Addis Ababa, Ethiopia. A cross-sectional study with a quantitative method was carried out among 770 systematically selected adolescent school girls of Addis Ababa from April 1 to May 5, 2017. A self-administered, pre-tested, closed-ended Amharic questionnaire was used for data collection in the school setting. The coding was done using the original English version and entered into EPI-7 software. The quantitative file was exported to Statistical Package for the Social Sciences (SPSS) version 25.0 software for analysis. Total mean score was used to categorize individuals as good or poor, while AOR with 95% CI and p < 0.05 was used to determine factors of menstrual hygiene management practice. This study had a 98% response rate. 530 (70.1%) and 388 (51.3%) respondents had good knowledge and practice of menstrual hygiene, respectively. The findings also showed a significant positive association between good knowledge of menstruation and girls whose mothers had secondary education (AOR = 10.012, 95% CI = 3.628-27.629). Wealth index quintile five (AOR = 9.038, 95% CI = 3.728-21.909) revealed a significant positive association with good practice of menstrual hygiene. The majority of participants had good knowledge and practice of menstrual hygiene, and the majority of them were from private schools. Although knowledge was better than practice, girls should be educated about the process, the use of proper pads or absorbents and their proper disposal. Keywords: practices of menstrual hygiene, Menstrual knowledge, adolescent girl, Sanitary napkins, Menarche, school health.

9: PHESANT: a tool for performing automated phenome scans in UK Biobank
Posted to bioRxiv 26 Feb 2017
3,096 downloads epidemiology

Louise A C Millard, Neil M Davies, Tom R Gaunt, George Davey Smith, Kate Tilling

Motivation: Epidemiological cohorts typically contain a diverse set of phenotypes such that automation of phenome scans is non-trivial, because they require highly heterogeneous models. For this reason, phenome scans have to date tended to use a smaller homogeneous set of phenotypes that can be analysed in a consistent fashion. We present PHESANT (PHEnome Scan ANalysis Tool), a software package for performing comprehensive phenome scans in UK Biobank. General features: PHESANT tests the association of a specified trait with all continuous, integer and categorical variables in UK Biobank, or a specified subset. PHESANT uses a novel rule-based algorithm to determine how to appropriately test each trait, then performs the analyses and produces plots and summary tables. Implementation: The PHESANT phenome scan is implemented in R. PHESANT includes a novel Javascript D3.js visualization, and accompanying Java code that converts the phenome scan results to the required JavaScript Object Notation (JSON) format. Availability: PHESANT is available on GitHub at [https://github.com/MRCIEU/PHESANT]. Git tag v0.2 corresponds to the version presented here.

10: Stacked Generalization: An Introduction to Super Learning
Posted to bioRxiv 18 Aug 2017
3,064 downloads epidemiology

Ashley I. Naimi, Laura B. Balzer

Stacked generalization is an ensemble method that allows researchers to combine several different prediction algorithms into one. Since its introduction in the early 1990s, the method has evolved several times into what is now known as "Super Learner". Super Learner uses V-fold cross-validation to build the optimal weighted combination of predictions from a library of candidate algorithms. Optimality is defined by a user-specified objective function, such as minimizing mean squared error or maximizing the area under the receiver operating characteristic curve. Although relatively simple in nature, use of the Super Learner by epidemiologists has been hampered by limitations in understanding conceptual and technical details. We work step-by-step through two examples to illustrate concepts and address common concerns.
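
To make the idea in this abstract concrete, here is a minimal sketch of stacking under simplifying assumptions: two toy candidate learners, V-fold cross-validation with V = 5, mean squared error as the objective, and a grid search over convex weights standing in for the meta-learning step. The data are simulated and all function names are illustrative; this is not the paper's worked example or any package's API.

```python
import random

def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Candidate 2: simple least-squares line."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)

def cv_predictions(xs, ys, learners, v=5):
    """Out-of-fold predictions: each point is predicted by models that never saw it."""
    n = len(xs)
    preds = [[0.0] * n for _ in learners]
    for fold in range(v):
        train = [i for i in range(n) if i % v != fold]
        held_out = [i for i in range(n) if i % v == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for k, fit in enumerate(learners):
            model = fit(train_x, train_y)
            for i in held_out:
                preds[k][i] = model(xs[i])
    return preds

def best_weight(preds, ys, steps=100):
    """Grid-search the weight w on candidate 1 (1 - w on candidate 2) minimising CV MSE."""
    def mse(w):
        return sum((w * a + (1 - w) * b - y) ** 2
                   for a, b, y in zip(preds[0], preds[1], ys)) / len(ys)
    return min((s / steps for s in range(steps + 1)), key=mse)

# Simulated data: a noisy straight line, so the linear candidate should dominate.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [2 * x + random.gauss(0, 0.5) for x in xs]

learners = [fit_mean, fit_line]
w_mean = best_weight(cv_predictions(xs, ys, learners), ys)
weights = {"mean": w_mean, "line": 1 - w_mean}
final_models = [fit(xs, ys) for fit in learners]  # refit each candidate on all data

def super_learner(x):
    """Weighted combination of the refitted candidates."""
    return w_mean * final_models[0](x) + (1 - w_mean) * final_models[1](x)

print(weights)
```

The key design point is that the weights are chosen on out-of-fold predictions rather than in-sample fits, so a candidate that merely overfits the training data does not earn a large weight.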

11: The global distribution of Bacillus anthracis and associated anthrax risk to humans, livestock, and wildlife
Posted to bioRxiv 19 Aug 2018
3,059 downloads epidemiology

Colin J. Carlson, Ian T. Kracalik, Noam Ross, Kathleen Alexander, Martin E. Hugh-Jones, Mark Fegan, Brett Elkin, Tasha Epp, Todd K. Shury, Mehriban Bagirova, Wayne M. Getz, Jason K. Blackburn

Bacillus anthracis is a spore-forming, Gram-positive bacterium responsible for anthrax, an acute and commonly lethal infection that most significantly affects grazing livestock, wild ungulates and other herbivorous mammals, but also poses a serious threat to human health. The geographic extent of B. anthracis endemism is still poorly understood, despite multi-decade research on anthrax epizootic and epidemic dynamics around the world. Several biogeographic studies have focused on modeling environmental suitability for anthrax at local or national scales, but many countries have limited or inadequate surveillance systems, even within known endemic regions. Here we compile an extensive global occurrence dataset for B. anthracis, drawing on confirmed human, livestock, and wildlife anthrax outbreaks. With these records, we use boosted regression trees to produce the first map of the global distribution of B. anthracis as a proxy for anthrax risk. Variable contributions to the model support pre-existing hypotheses that environmental suitability for B. anthracis depends most strongly on soil characteristics such as pH that affect spore persistence, and the extent of seasonal fluctuations in vegetation, which plays a key role in transmission for herbivores. We apply the global model to estimate that 1.83 billion people (95% credible interval: 0.59 - 4.16 billion) live within regions of anthrax risk, but most of that population faces little occupational exposure to anthrax. More informatively, a global total of 63.8 million rural poor livestock keepers (95% CI: 17.5 - 168.6 million) and 1.1 billion livestock (95% CI: 0.4 - 2.3 billion) live within vulnerable regions. Human risk is concentrated in rural areas, and human and livestock vulnerability are both concentrated in rainfed systems throughout arid and temperate land across Eurasia, Africa, and North America. 
We conclude by mapping where anthrax risk overlaps with vulnerable wild ungulate populations, and therefore could disrupt sensitive conservation efforts for species like bison, pronghorn, and saiga that coincide with anthrax-prone, mixed-agricultural landscapes.

12: Genomic and epidemiological monitoring of yellow fever virus transmission potential
Posted to bioRxiv 16 Apr 2018
3,015 downloads epidemiology

N. R. Faria, Kraemer M. U. G., Hill S. C., Goes de Jesus J., de Aguiar R. S., Iani F. C. M., Xavier J., Quick J., du Plessis L., Dellicour S., Thézé J., Carvalho R. D. O., Baele G., Wu C.-H., Silveira P. P., Arruda M. B., Pereira M. A., Pereira G. C., Lourenço J., Obolski U., Abade L., Vasylyeva T. I., Giovanetti M., Yi D., Weiss D.J., Wint G. R. W., Shearer F. M., Funk S., Nikolai B., Adelino T. E. R., Oliveira M. A. A., Silva M. V. F., Sacchetto L., Figueiredo P. O., Rezende I. M., Mello E. M., Said R. F. C., Santos D. A., Ferraz M. L., Brito M. G., Santana L. F., Menezes M. T., Brindeiro R. M., Tanuri A., dos Santos F. C. P., Cunha M. S., Nogueira J. S., Rocco I. M., da Costa A. C., Komninakis S. C. V., Azevedo V., Chieppe A. O., Araujo E. S. M., Mendonça M. C. L., dos Santos C. C., dos Santos C. D., Mares-Guia A. M., Nogueira R. M. R., Sequeira P. C., Abreu R. G., Garcia M. H. O., Alves R. V., Abreu A. L., Okumoto O., Kroon E. G., de Albuquerque C. F. C., Lewandowski K., Pullan S. T., Carroll M., Sabino E. C., Souza R. P., Suchard M. A., Lemey P., Trindade G. S., Drumond B. P., Filippis A. M. B., Loman N. J., Cauchemez S., Alcantara L. C. J., Pybus O. G.

The yellow fever virus (YFV) epidemic that began in Dec 2016 in Brazil is the largest in decades. The recent discovery of YFV in Brazilian Aedes sp. vectors highlights the urgent need to monitor the risk of re-establishment of domestic YFV transmission in the Americas. We use a suite of epidemiological, spatial and genomic approaches to characterize YFV transmission. We show that the age- and sex-distribution of human cases in Brazil is characteristic of sylvatic transmission. Analysis of YFV cases combined with genomes generated locally using a new protocol reveals an early phase of sylvatic YFV transmission restricted to Minas Gerais, followed in late 2016 by a rise in viral spillover to humans, and the southwards spatial expansion of the epidemic towards previously YFV-free areas. Our results establish a framework for monitoring YFV transmission in real-time, contributing to the global strategy of eliminating future yellow fever epidemics.

13: Automating Mendelian randomization through machine learning to construct a putative causal map of the human phenome
Posted to bioRxiv 10 Aug 2017
3,008 downloads epidemiology

Gibran Hemani, Jack Bowden, Philip Haycock, Jie Zheng, Oliver Davis, Peter Flach, Tom Gaunt, George Davey Smith

A major application for genome-wide association studies (GWAS) has been the emerging field of causal inference using Mendelian randomization (MR), where the causal effect between a pair of traits can be estimated using only summary level data. MR depends on SNPs exhibiting vertical pleiotropy, where the SNP influences an outcome phenotype only through an exposure phenotype. Issues arise when this assumption is violated due to SNPs exhibiting horizontal pleiotropy. We demonstrate that across a range of pleiotropy models, instrument selection will be increasingly liable to selecting invalid instruments as GWAS sample sizes continue to grow. Methods have been developed in an attempt to protect MR from different patterns of horizontal pleiotropy, and here we have designed a mixture-of-experts machine learning framework (MR-MoE 1.0) that predicts the most appropriate model to use for any specific causal analysis, improving on both power and false discovery rates. Using the approach, we systematically estimated the causal effects amongst 2407 phenotypes. Almost 90% of causal estimates indicated some level of horizontal pleiotropy. The causal estimates are organised into a publicly available graph database (http://eve.mrbase.org), and we use it here to highlight the numerous challenges that remain in automated causal inference.

14: Transfer entropy as a tool for inferring causality from observational studies in epidemiology
Posted to bioRxiv 14 Jun 2017
2,948 downloads epidemiology

N. Ahmad Aziz

Recently Wiener's causality theorem, which states that one variable could be regarded as the cause of another if the ability to predict the future of the second variable is enhanced by implementing information about the preceding values of the first variable, was linked to information theory through the development of a novel metric called “transfer entropy”. Intuitively, transfer entropy can be conceptualized as a model-free measure of directed information flow from one variable to another. In contrast, directionality of information flow is not reflected in traditional measures of association which are completely symmetric by design. Although information theoretic approaches have been applied before in epidemiology, their value for inferring causality from observational studies is still unknown. Therefore, in the present study we use a set of simulation experiments, reflecting the most classical and widely used epidemiological observational study design, to validate the application of transfer entropy in epidemiological research. Moreover, we illustrate the practical applicability of this information theoretic approach to real-world epidemiological data by demonstrating that transfer entropy is able to extract the correct direction of information flow from longitudinal data concerning two well-known associations, i.e. that between smoking and lung cancer and that between obesity and diabetes risk. In conclusion, our results provide proof-of-concept that the recently developed transfer entropy method could be a welcome addition to the epidemiological armamentarium, especially to dissect those situations in which there is a well-described association between two variables but no clear-cut inclination as to the directionality of the association.
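
The directional asymmetry described in this abstract can be illustrated with a plug-in estimate of transfer entropy for discrete series. The sketch below assumes a history length of one time step and uses simulated binary data in which one series simply copies the other with a one-step delay; it is an illustration of the general metric, not the simulation design used in the paper.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of transfer entropy source -> target, history length 1."""
    triples = Counter()  # counts of (target_next, target_now, source_now)
    for t in range(len(target) - 1):
        triples[(target[t + 1], target[t], source[t])] += 1
    n = sum(triples.values())
    pairs_ts = Counter()  # (target_now, source_now)
    pairs_tt = Counter()  # (target_next, target_now)
    singles = Counter()   # target_now
    for (y1, y0, x0), c in triples.items():
        pairs_ts[(y0, x0)] += c
        pairs_tt[(y1, y0)] += c
        singles[y0] += c
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_full = c / pairs_ts[(y0, x0)]            # p(y1 | y0, x0)
        p_self = pairs_tt[(y1, y0)] / singles[y0]  # p(y1 | y0)
        te += (c / n) * math.log2(p_full / p_self)
    return te

# Simulated data: y copies x with a one-step delay, so information flows x -> y only.
random.seed(1)
x = [random.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print(f"TE(x -> y) = {te_xy:.3f} bits, TE(y -> x) = {te_yx:.3f} bits")
```

A symmetric measure such as correlation between the lagged series would not distinguish the two directions, whereas here the estimate is close to one bit in the true direction and close to zero in the reverse direction.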

15: Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease
Posted to bioRxiv 30 Jan 2018
2,753 downloads epidemiology

Andrew J. Steele, S. Aylin Cakiroglu, Anoop D. Shah, Spiros C. Denaxas, Harry Hemingway, Nicholas M. Luscombe

Prognostic modelling is important in clinical practice and epidemiology for patient management and research. Electronic health records (EHR) provide large quantities of data for such models, but conventional epidemiological approaches require significant researcher time to implement. Expert selection of variables, fine-tuning of variable transformations and interactions, and imputing missing values in datasets are time-consuming and could bias subsequent analysis, particularly given that missingness in EHR is both high and may carry meaning. Using a cohort of over 80,000 patients from the CALIBER programme, we performed a systematic comparison of several machine-learning approaches in EHR. We used Cox models and random survival forests with and without imputation on 27 expert-selected variables to predict all-cause mortality. We also used Cox models, random forests and elastic net regression on an extended dataset with 586 variables to build prognostic models and identify novel prognostic factors without prior expert input. We observed that data-driven models used on an extended dataset can outperform conventional models for prognosis, without data preprocessing or imputing missing values, and with no need to scale or transform continuous data. An elastic net Cox regression based on 586 unimputed variables with continuous values discretised achieved a C-index of 0.801 (bootstrapped 95% CI 0.799 to 0.802), compared to 0.793 (0.791 to 0.794) for a traditional Cox model comprising 27 expert-selected variables with imputation for missing values. We also found that data-driven models allow identification of novel prognostic variables; that the absence of values for particular variables carries meaning, and can have significant implications for prognosis; and that variables often have a nonlinear association with mortality, which discretised Cox models and random forests can elucidate. This demonstrates that machine-learning approaches applied to raw EHR data can be used to build reliable models for use in research and clinical practice, and identify novel predictive variables and their effects to inform future research.

16: An epigenetic biomarker of aging for lifespan and healthspan
Posted to bioRxiv 05 Mar 2018
2,748 downloads epidemiology

Morgan E Levine, Ake T. Lu, Austin Quach, Brian H. Chen, Themistocles Assimes, Stefania Bandinelli, Lifang Hou, Andrea A Baccarelli, James D Stewart, Yun Li, Eric A Whitsel, James G Wilson, Alex P Reiner, Abraham Aviv, Kurt Lohman, Yongmei Liu, Luigi Ferrucci, Steve Horvath

Identifying reliable biomarkers of aging is a major goal in geroscience. While the first generation of epigenetic biomarkers of aging were developed using chronological age as a surrogate for biological age, we hypothesized that incorporation of composite clinical measures of phenotypic age that capture differences in lifespan and healthspan may identify novel CpGs and facilitate the development of a more powerful epigenetic biomarker of aging. Using an innovative two-step process, we develop a new epigenetic biomarker of aging, DNAm PhenoAge, that strongly outperforms previous measures with regard to predictions for a variety of aging outcomes, including all-cause mortality, cancers, healthspan, physical functioning, and Alzheimer's disease. While this biomarker was developed using data from whole blood, it correlates strongly with age in every tissue and cell tested. Based on an in-depth transcriptional analysis in sorted cells, we find that increased epigenetic age, relative to chronological age, is associated with increased activation of pro-inflammatory and interferon pathways, and decreased activation of transcriptional/translational machinery, DNA damage response, and mitochondrial signatures. Overall, this single epigenetic biomarker of aging is able to capture risks for an array of diverse outcomes across multiple tissues and cells, and provide insight into important pathways in aging.

17: The national alert-response strategy against cholera in Haiti: a four-year assessment of its implementation

Posted to bioRxiv 05 Feb 2018

The national alert-response strategy against cholera in Haiti: a four-year assessment of its implementation
2,626 downloads epidemiology

Stanislas Rebaudet, Gregory Bulit, Jean Gaudart, Edwige Michel, Pierre Gazin, Claudia Evers, Samuel Beaulieu, Aaron Aruna Abedi, Lindsay Osei, Robert Barrais, Katilla Pierre, Sandra Moore, Jacques Boncy, Paul Adrien, Edouard Beigbeder, Florence Duperval Guillaume, Renaud Piarroux

Background: A massive cholera epidemic struck Haiti in October 2010. As part of the national cholera elimination plan, the Haitian government, UNICEF and other international partners launched a nationwide alert-response strategy in July 2013. This strategy established a coordinated methodology to rapidly target cholera-affected communities with WaSH (water, sanitation and hygiene) response interventions conducted by field mobile teams. An innovative red-orange-green alert system was established, based on routine surveillance data, to monitor the epidemic weekly. Methodology/Principal findings: We used cholera consolidated surveillance databases, alert records and details of 31,306 response interventions notified by WaSH mobile teams to describe and assess the implementation of this approach between July 2013 and June 2017. Response to red and orange alerts was heterogeneous across the country, but significantly improved throughout the study period so that 75% of red and orange alerts were responded to within the same epidemiological week during the first semester of 2017. The numbers of persons educated about cholera, houses decontaminated by chlorine spraying, households that received water chlorination tablets and water sources that were chlorinated during the same week as cholera alerts significantly increased. Alerts appeared to be an interesting and simple indicator to monitor the dynamics of the epidemic and assess the implementation of response activities. Conclusions/Significance: The implementation of a nationwide alert-response strategy against cholera in Haiti was feasible, albeit with certain obstacles. Its cost was less than USD 8 million per year. Continuing this strategy seems essential to eventually defeat cholera in Haiti while ambitious long-term water and sanitation projects are conducted in vulnerable areas. It constitutes a core element of the current national plan for cholera elimination of the Haitian Government.

18: Epidemiology of Cancers in Zambia: A Significant Variation in Cancer Incidence and Prevalence across the Nation

Posted to bioRxiv 28 Aug 2018

Epidemiology of Cancers in Zambia: A Significant Variation in Cancer Incidence and Prevalence across the Nation
2,606 downloads epidemiology

Maybin Kalubula, Heqing Shen, Mpundu Makasa, Longjian Liu

Background: Cancers are one of the leading causes of death worldwide. More than two thirds of deaths due to cancers occur in low- and middle-income countries, to which Zambia belongs. This study therefore sought to assess the epidemiology of cancers in Zambia. Methods: We conducted a retrospective observational study nested on Zambia National Cancer Registry (ZNCR) histopathological and clinical data from 2007 to 2014. Zambia Central Statistics Office (CSO) demographic data were used to calculate prevalence and incidence rates of cancers. Age-adjusted rates and case fatality rates were estimated using standard methods. We used a Poisson approximation for calculating 95% confidence intervals (CI). Results: The seven districts with the highest cancer prevalence in Zambia were Luangwa, Kabwe, Lusaka, Monze, Mongu, Katete and Chipata. Cervical cancer, prostate cancer, breast cancer and Kaposi’s sarcoma were the four most prevalent cancers as well as major causes of cancer-related deaths in Zambia. Standardised incidence rates and 95% CI for the top four cancers were: cervix uteri (186.3; CI = 181.77 – 190.83), prostate (60.03; CI = 57.03 – 63.03), breast (38.08; CI = 36.0 – 40.16) and Kaposi’s sarcoma (26.18; CI = 25.14 – 27.22). Case fatality rates (CFR) were: leukaemia (38.1%); pancreatic cancer (36.3%); lung cancer (33.3%); and brain, nervous system (30.2%). Cancers were associated with HIV with a p-value of 0.000 and a Pearson correlation coefficient of 0.818. Conclusions: The widespread distribution of cancers with high prevalence in the southern zone has been perpetrated by lifestyle and sexual culture as well as geography. Intensifying cancer screening and early detection countrywide as well as changing the lifestyle and sexual culture would greatly help in the reduction of cancer cases in Zambia.
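
The abstract above mentions a Poisson approximation for 95% confidence intervals but does not spell it out. One common form, sketched here under that assumption, treats the case count as Poisson-distributed (so its standard error is the square root of the count) and scales to a rate per 100,000. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
import math


def poisson_rate_ci(cases, population, per=100_000, z=1.96):
    """Return (rate, lower, upper) per `per` people.

    Assumes the case count is Poisson-distributed, so its standard
    error is sqrt(cases); the interval is the usual normal
    approximation, rate +/- z * SE. Illustrative only: the paper does
    not state which Poisson-based formula it used.
    """
    if cases < 0 or population <= 0:
        raise ValueError("cases must be >= 0 and population must be > 0")
    scale = per / population
    rate = cases * scale
    half_width = z * math.sqrt(cases) * scale
    # A rate cannot be negative, so clip the lower bound at zero.
    return rate, max(0.0, rate - half_width), rate + half_width


# Hypothetical example: 100 cases in a population of 1,000,000.
rate, lower, upper = poisson_rate_ci(cases=100, population=1_000_000)
print(f"{rate:.2f} per 100,000 (95% CI {lower:.2f} to {upper:.2f})")
# prints: 10.00 per 100,000 (95% CI 8.04 to 11.96)
```

For small counts this normal approximation becomes unreliable, and an exact Poisson interval would usually be preferred.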

19: Plasma proteome profiling to detect and avoid sample-related biases in biomarker studies

Posted to bioRxiv 30 Nov 2018

Plasma proteome profiling to detect and avoid sample-related biases in biomarker studies
2,601 downloads epidemiology

Philipp E Geyer, Eugenia Voytik, Peter V. Treit, Sophia Doll, Alisa Kleinhempel, Lili Niu, Johannes B. Müller, Jakob Bader, Daniel Teupser, Lesca M. Holdt, Matthias Mann

Plasma and serum are rich sources of information regarding an individual's health state, and protein tests inform medical decision making. Despite major investments, few new biomarkers have reached the clinic. Mass spectrometry (MS)-based proteomics now allows highly specific and quantitative read-out of the plasma proteome. Here we employ Plasma Proteome Profiling to define contamination marker panels to assess plasma samples and the likelihood that suggested biomarkers are instead artifacts related to sample handling and processing. We acquire deep reference proteomes of erythrocytes, platelets, plasma and whole blood of 20 individuals (>6000 proteins), and compare serum and plasma proteomes. Based on spike-in experiments we determine contamination-associated proteins, many of which have been reported as biomarker candidates, as revealed by a comprehensive literature survey. We provide sample preparation guidelines and an online resource (www.plasmaproteomeprofiling.org) to assess overall sample-related bias in clinical studies and to prevent costly misassignment of biomarker candidates.

20: Collider Scope: When selection bias can substantially influence observed associations

Posted to bioRxiv 07 Oct 2016

Collider Scope: When selection bias can substantially influence observed associations
2,460 downloads epidemiology

Marcus R. Munafò, Kate Tilling, Amy E Taylor, David M. Evans, George Davey Smith

Large-scale cross-sectional and cohort studies have transformed our understanding of the genetic and environmental determinants of health outcomes. However, the representativeness of these samples may be limited - either through selection into studies, or by attrition from studies over time. Here we explore the potential impact of this selection bias on results obtained from these studies, from the perspective that this amounts to conditioning on a collider (i.e., a form of collider bias). While it is acknowledged that selection bias will have a strong effect on representativeness and prevalence estimates, it is often assumed that it should not have a strong impact on estimates of associations. We argue that because selection can induce collider bias (which occurs when two variables independently influence a third variable, and that third variable is conditioned upon), selection can lead to substantially biased estimates of associations. In particular, selection related to phenotypes can bias associations with genetic variants associated with those phenotypes. In simulations, we show that even modest influences on selection into, or attrition from, a study can generate biased and potentially misleading estimates of both phenotypic and genotypic associations. Our results highlight the value of knowing which population your study sample is representative of. If the factors influencing selection and attrition are known, they can be adjusted for. For example, having DNA available on most participants in a birth cohort study offers the possibility of investigating the extent to which polygenic scores predict subsequent participation, which in turn would enable sensitivity analyses of the extent to which bias might distort estimates.
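
The collider mechanism described in the abstract above can be illustrated with a small simulation. This is a minimal sketch and not the authors' simulation: two traits are generated independently, and selection into the "study" depends on both of them plus noise. All names, the sample size, and the selection rule are assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_collider_bias(n=20_000, seed=1):
    """Return (r_all, r_selected, n_selected) for two independent traits.

    Hypothetical selection rule: a person takes part when
    trait_a + trait_b + noise > 0, so participation is a collider
    influenced by both traits.
    """
    rng = random.Random(seed)
    population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    selected = [(a, b) for a, b in population if a + b + rng.gauss(0, 1) > 0]
    r_all = pearson(*zip(*population))
    r_selected = pearson(*zip(*selected))
    return r_all, r_selected, len(selected)


r_all, r_selected, n_selected = simulate_collider_bias()
print(f"whole population: r = {r_all:+.3f}")
print(f"selected sample (n = {n_selected}): r = {r_selected:+.3f}")
```

In the whole population the correlation should be close to zero, while in the selected sample a negative association appears even though the traits were generated independently. That induced association is the bias that conditioning on a collider produces.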
