Combined Analysis of Gut Microbiota, Diet and PNPLA3 Polymorphism in Biopsy-proven Non-alcoholic Fatty Liver Disease

Sonja Lang; Anna Martin; Xinlian Zhang; Fedja Farowski; Hilmar Wisplinghoff; Maria J.G.T. Vehreschild; Marcin Krawczyk; Angela Nowag; Anne Kretzschmar; Claus Scholz; Philipp Kasper; Christoph Roderburg; Raphael Mohr; Frank Lammert; Frank Tacke; Bernd Schnabl; Tobias Goeser; Hans-Michael Steffen; Münevver Demir


Liver International. 2021;41(7):1576-1591. 


Materials and Methods

Patient Cohort

A total of 180 NAFLD patients were prospectively enrolled in this cross-sectional observational study between March 2015 and December 2018 in the outpatient liver department of the Clinic for Gastroenterology and Hepatology, University Hospital of Cologne, Germany (Figure 1A). The protocol was approved by the Ethics Commission (reference # 15–056) of Cologne University's Faculty of Medicine, and written informed consent was obtained from each patient. The study was performed in accordance with the Declaration of Helsinki.

Figure 1.

Study overview. A, Numbers of total non-alcoholic fatty liver disease (NAFLD) patients enrolled in our cross-sectional observational study. B, Dimensionality of dietary data and 16S rRNA gene sequencing data was reduced using principal component (PC) analyses, and features were included in multiple proportional ordinal regression analyses using liver histology parameters as outcome. Additionally, clinical features and PNPLA3 were included in the models.

Patients were referred to our tertiary referral centre with elevated liver function tests and/or liver abnormalities on ultrasound for further diagnostic tests or with already diagnosed NAFLD in order to assess disease activity and severity. If NAFLD diagnosis was made or confirmed, patients were consecutively enrolled in this observational study.

Within the study, a detailed medical history including drug treatment, physical exam, laboratory analysis, anthropometric and blood pressure measurements, ultrasound and/or magnetic resonance imaging (MRI), transient elastography and liver biopsy, if clinically indicated, were performed as per standard of care. NAFLD was diagnosed if the following conditions were met: hepatic steatosis on liver imaging (ultrasound and/or MRI) and/or the presence of ≥5% fat in histological analysis of liver biopsy; self-reported daily alcohol consumption of less than 10 g in women and less than 20 g in men; absence of steatogenic drugs such as glucocorticoids, methotrexate, amiodarone and tamoxifen; absence of other diseases causing secondary steatosis such as human immunodeficiency virus infection, celiac disease or inflammatory bowel disease; absence of other chronic liver diseases, for example, viral hepatitis, autoimmune hepatitis, toxic liver injury, alcoholic steatohepatitis, cholestatic liver disease, Wilson's disease and hereditary hemochromatosis. Exclusion criteria for all study subjects were oral or intravenous antibiotic treatment within the 6 months prior to the study, known malignancy, pregnancy and age <18 years. Further exclusion criteria for NAFLD patients were ongoing successful lifestyle modifications, defined as more than 5% loss of body weight within the 3 months prior to enrolment, or current or prior participation in an interventional NASH study.[10,11]

Any recommendations or treatment suggestions for study participants did not differ from usual patient care. Thus, NAFLD patients were treated according to the recommendations of the current European guideline.[12]

Abdominal ultrasound was performed for all patients. All blood samples for laboratory analyses were collected under fasting conditions. Anthropometric measurements were carried out by physicians or trained research assistant nurses.

Type 2 diabetes was defined as glycated haemoglobin (HbA1c) ≥6.5% and/or fasting glucose ≥126 mg/dL and/or use of antidiabetic medications. Metabolic syndrome was defined following the International Diabetes Federation (IDF) criteria.[13] Arterial hypertension was defined as office blood pressure ≥140/90 mmHg on ≥2 measurements on ≥2 occasions or antihypertensive drug treatment. Dyslipidemia was defined as increased plasma cholesterol (>200 mg/dL) and/or triglycerides ≥150 mg/dL and/or low high-density lipoprotein levels (<50 mg/dL for women and <40 mg/dL for men).
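The laboratory thresholds above translate directly into simple rules. The following is a minimal sketch, not the study's own code: the `Labs` record and its field names are hypothetical, and only the type 2 diabetes and dyslipidemia definitions are encoded (hypertension and metabolic syndrome need repeated measurements and the IDF criteria, which are not reproduced here).

```python
from dataclasses import dataclass


@dataclass
class Labs:
    """Hypothetical per-patient record; field names are illustrative only."""
    hba1c_pct: float         # glycated haemoglobin, %
    fasting_glucose: float   # mg/dL
    cholesterol: float       # plasma cholesterol, mg/dL
    triglycerides: float     # mg/dL
    hdl: float               # high-density lipoprotein, mg/dL
    female: bool
    on_antidiabetics: bool = False


def has_type2_diabetes(p: Labs) -> bool:
    # HbA1c >= 6.5% and/or fasting glucose >= 126 mg/dL and/or antidiabetic drugs
    return p.hba1c_pct >= 6.5 or p.fasting_glucose >= 126 or p.on_antidiabetics


def has_dyslipidemia(p: Labs) -> bool:
    # Sex-specific HDL cut-off: <50 mg/dL for women, <40 mg/dL for men
    low_hdl = p.hdl < (50 if p.female else 40)
    return p.cholesterol > 200 or p.triglycerides >= 150 or low_hdl
```

Note the mixed inequalities in the source: cholesterol uses a strict threshold (>200 mg/dL), whereas triglycerides use an inclusive one (≥150 mg/dL).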

Liver Biopsies

Liver biopsy was performed in patients with NAFLD with a history of persistently elevated serum alanine aminotransferase (ALT) and/or aspartate aminotransferase (AST) for at least 6 months, to rule out liver diseases other than NAFLD and if there was clinical suspicion of advanced liver disease. If liver biopsy was performed, samples were evaluated by an experienced liver pathologist who was blinded to all clinical and laboratory patient data. The NASH clinical research network histological scoring system[14] was used to evaluate disease activity and severity. Accordingly, the NAFLD activity score (NAS) was obtained for each biopsy. This score is defined as the unweighted sum of the scores for steatosis (0–3), lobular inflammation (0–3) and ballooning (0–2), thus ranging from 0 to 8.[14,15] Fibrosis was staged from 0 to 4: 0, none; 1, perisinusoidal or periportal; 2, perisinusoidal and portal/periportal; 3, bridging fibrosis; and 4, cirrhosis. Stages 1a, 1b and 1c were summarized as Stage 1.
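As an illustration of the scoring described above, the NAS and the collapsed fibrosis stage can be computed as follows. This is a sketch only: the function names are hypothetical, and the input format for fibrosis stages (strings such as "1a") is an assumption.

```python
def nafld_activity_score(steatosis: int, lobular_inflammation: int, ballooning: int) -> int:
    """Unweighted sum of the three component scores, ranging from 0 to 8."""
    limits = {
        "steatosis": (steatosis, 3),
        "lobular_inflammation": (lobular_inflammation, 3),
        "ballooning": (ballooning, 2),
    }
    for name, (value, upper) in limits.items():
        if not 0 <= value <= upper:
            raise ValueError(f"{name} must be between 0 and {upper}, got {value}")
    return steatosis + lobular_inflammation + ballooning


def collapse_fibrosis_stage(stage: str) -> int:
    """Map '0', '1a', '1b', '1c', '1', '2', '3', '4' to an integer stage 0-4."""
    stage = stage.strip().lower()
    if stage in {"1a", "1b", "1c"}:
        return 1  # sub-stages of Stage 1 are summarized as Stage 1
    value = int(stage)
    if not 0 <= value <= 4:
        raise ValueError(f"fibrosis stage must be between 0 and 4, got {stage}")
    return value
```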

Genotyping of the PNPLA3 Variant

Genotyping of the common PNPLA3 variant rs738409 (p.I148M) was performed centrally in the genetic laboratory of the Department of Medicine II (Saarland University Medical Center) by technicians blinded to the phenotypes of patients.[16] PNPLA3 genotypes were included in the regression models, using the wild-type genotype (CC) as reference versus heterozygous (CG) and homozygous (GG) carriers, and in addition, we combined the genotypes CG and GG using CC as reference.
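The two genotype codings described above (three-level with CC as reference, and CG/GG combined against CC) can be sketched as indicator variables. The function and key names below are hypothetical, and the sketch assumes genotypes are recorded as the strings "CC", "CG" or "GG".

```python
def code_pnpla3(genotype: str) -> dict:
    """Return indicator variables for rs738409 with wild-type CC as reference."""
    g = genotype.strip().upper()
    if g not in {"CC", "CG", "GG"}:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return {
        "CG": int(g == "CG"),                  # heterozygous vs CC
        "GG": int(g == "GG"),                  # homozygous vs CC
        "CG_or_GG": int(g in {"CG", "GG"}),    # combined carriers vs CC
    }
```

A CC patient has all three indicators set to 0, which is what makes CC the reference level in either coding. The first two indicators belong to one model and the combined indicator to a separate model; entering all three together would be redundant.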

Gut Bacterial Sequencing

The DNA was isolated using the RNeasy Power Microbiome Kit (Qiagen, Hilden, Germany). Seven of the nine variable bacterial 16S rRNA gene regions (pool 1: V2, V4 and V8; pool 2: V3, V6/7 and V9) were amplified with the Ion 16S Metagenomics Kit (Thermo Fisher Scientific, Waltham, USA) utilizing two primer pools (an integrated research solution for bacterial identification using 16S rRNA sequencing on the Ion PGM™ System with Ion Reporter™ Software). Amplicons were pooled and cleaned using the NucleoMag NGS Clean-up (Macherey-Nagel, Düren, Germany). The Qubit system was used to determine amplicon concentration; the library was prepared with the Ion Plus Fragment Library Kit (Thermo Fisher Scientific, Waltham, USA). For the template preparation, the amplicon concentration was diluted to 30 ng/mL. The Ion Chef Kit and the Ion Chef system (both Thermo Fisher Scientific, Waltham, USA) were used to enrich and prepare the template-positive Ion Sphere Particles (ISPs). The amplicon library was sequenced using the Ion Torrent S5 system (pH-dependent, Thermo Fisher Scientific, Waltham, USA). The amplicon sequences were clustered into operational taxonomic units (OTUs) before taxonomical alignment with the MicroSEQ 16S-rDNA Reference Library v2013.1 (Thermo Fisher Scientific, Waltham, USA) and Greengenes v13.5 databases; 97% similarity was used for genus-level assignment and 99% similarity for species-level assignment. Data files were assigned by the Ion Reporter metagenomics 16S w1.1 workflow (Thermo Fisher Scientific, Waltham, USA). The raw data were processed using the programming language R version 3.5.1.[10,17] We obtained an average of 231 206 clean reads per sample (range 30 397–659 558).

Accession Numbers Sequence Data

Sequence data were registered at NCBI under BioProject PRJNA540738. BioSample IDs included in this study can be found in Table S1.

Dietary Records

Food intake was recorded using an open, 14-day self-administered food record. Patients were instructed to report each daily portion of consumed food and all beverages in as much detail as possible directly after ingestion, to weigh foods or to estimate portions in grams, and not to change their usual dietary and physical activity habits during the recording period. EBISpro 2016 professional scientific software was used to analyse energy intake, basal metabolic rate and all macro- and micronutrients. The intake of each macro- and micronutrient was divided by the total energy intake to obtain the relative intake of the respective food component.
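The energy normalization in the last step amounts to a per-nutrient division. A minimal sketch follows; the function name, the dictionary layout and the nutrient keys are hypothetical, and the units of the result depend on the units of the inputs, which the text does not specify.

```python
def relative_intake(intakes: dict, total_energy: float) -> dict:
    """Divide each nutrient intake by total energy intake."""
    if total_energy <= 0:
        raise ValueError("total energy intake must be positive")
    return {name: amount / total_energy for name, amount in intakes.items()}
```

For example, a hypothetical record of 20 g fibre at 2000 kcal total energy gives a relative fibre intake of 0.01 g per kcal.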

Energy misreporting is very frequently observed in self-reported dietary assessment and is considered unavoidable.[18,19] However, excluding misreporters leads to a loss of statistical power and may bias estimates of associations.[19] As an alternative approach, we calculated the ratio between energy intake (EI) and the basal metabolic rate (BMR) (EI:BMR ratio). Overall, 40% of the cohort were definite energy misreporters (EI:BMR below 1). To account for energy misreporting in our study, we included the EI:BMR ratio in our multiple analyses, in line with previous publications.[17,19,20]
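The EI:BMR ratio and the "definite misreporter" flag (ratio below 1) can be sketched as follows. The function names are hypothetical, and both quantities are assumed to be expressed in the same energy unit per day.

```python
from typing import Iterable, Tuple


def ei_bmr_ratio(energy_intake: float, bmr: float) -> float:
    """Ratio of reported energy intake to basal metabolic rate."""
    if bmr <= 0:
        raise ValueError("basal metabolic rate must be positive")
    return energy_intake / bmr


def is_definite_misreporter(energy_intake: float, bmr: float) -> bool:
    # A reported intake below the basal metabolic rate (ratio < 1) is flagged
    return ei_bmr_ratio(energy_intake, bmr) < 1.0


def share_of_misreporters(records: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (energy_intake, bmr) pairs flagged as definite misreporters."""
    flags = [is_definite_misreporter(ei, bmr) for ei, bmr in records]
    if not flags:
        raise ValueError("no records supplied")
    return sum(flags) / len(flags)
```

Rather than dropping flagged patients, the ratio itself is carried into the regression models as a covariate, which preserves the sample size.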

Statistical Analysis

Results are expressed as median and range in parentheses for each continuous outcome and as number and percentage for categorical variables. A two-sided P value less than .05 was considered statistically significant. Simple and multiple proportional ordinal regression models as implemented in the "ordinal" package in R,[21] using liver histology features that are measured on an ordinal scale as outcome, were used to associate the PNPLA3 polymorphism and various clinical and dietary features with liver disease severity (Figure 1B). We used ordinal regression analyses because dichotomizing variables originally measured on an ordinal scale leads to a loss of information, and ordinal regression methods have been proposed in order to reduce the required sample size and increase statistical power.[22,23] Given an alpha level of 0.05, the sample size of n = 57 provides 80% power to detect odds ratios of 3.8 (NAS), 3.9 (fibrosis), 4.1 (steatosis), 4.1 (ballooning) and 4.2 (inflammation). After performing a simple regression with calculation and comparison of the Akaike information criterion (AIC), we used a forward stepwise approach. The AIC is a measure of regression model performance and can be used to compare different models; a lower AIC indicates better performance. Variables were added to the model until performance could not be improved further, that is, until adding one more variable caused the AIC to increase. Outliers were not excluded from the analysis, and the variables included in the regression models were not transformed prior to entering the model.
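The forward stepwise procedure can be sketched generically. This is an illustration under stated assumptions, not the authors' R code: `aic_of` is a hypothetical callable that fits a model on a given set of variables and returns its AIC, the search is assumed to start from a model without predictors, and `toy_aic` with its variable names and numbers is invented purely to make the example runnable. The AIC itself is computed as 2k − 2 ln L, with k the number of estimated parameters and L the maximized likelihood.

```python
from typing import Callable, List, Sequence, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def forward_stepwise(
    candidates: Sequence[str],
    aic_of: Callable[[Tuple[str, ...]], float],
) -> Tuple[List[str], float]:
    """Greedily add the variable that lowers the AIC most; stop when none does."""
    selected: List[str] = []
    remaining = list(candidates)
    best_aic = aic_of(tuple(selected))  # assumed starting point: no predictors
    while remaining:
        trial_aic, trial_var = min(
            (aic_of(tuple(selected + [v])), v) for v in remaining
        )
        if trial_aic >= best_aic:
            break  # adding any further variable no longer improves the AIC
        selected.append(trial_var)
        remaining.remove(trial_var)
        best_aic = trial_aic
    return selected, best_aic


def toy_aic(variables: Tuple[str, ...]) -> float:
    """Invented stand-in for a model fit: each variable buys a fixed gain."""
    gains = {"PNPLA3": 10.0, "BMI": 4.0, "age": 0.5}
    return 100.0 - sum(gains[v] for v in variables) + 2 * len(variables)


chosen, final_aic = forward_stepwise(["age", "BMI", "PNPLA3"], toy_aic)
```

In the toy run, "age" is never added because its gain (0.5) is smaller than the penalty of 2 for one extra parameter, which is the stopping behaviour described in the text.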

Principal component (PC) analyses were used for dimensionality reduction of the dietary and 16S gene sequencing data. The first three dietary PCs, which covered three main food groups (Figure 2), were included in the regression models. For the 16S gene sequencing data, the major contributing bacterial taxa at genus level were represented in all the first six PCs (Figure S1). In order to improve interpretability of the regression models, we therefore included these specific bacterial taxa individually in the regression models (Figure 1B). Statistical analysis was performed using R statistical software, R version 3.5.1.[24]
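To make the dimensionality-reduction step concrete, the sketch below extracts the first principal component of a small data matrix by power iteration on the sample covariance matrix. It is an illustration only: the source does not state which R routine was used or whether variables were scaled, so mean-centering without scaling is an assumption, and the function names are hypothetical. Power iteration can fail if the start vector happens to be orthogonal to the leading component, and it returns the loading vector only up to sign.

```python
import math
from typing import List, Sequence


def first_principal_component(
    rows: Sequence[Sequence[float]], n_iter: int = 500
) -> List[float]:
    """Unit-length loading vector of the first PC (centered, unscaled data)."""
    n, p = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two observations")
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p)
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Deliberately asymmetric start vector, normalized to unit length
    start_norm = math.sqrt(sum((j + 1) ** 2 for j in range(p)))
    vec = [(j + 1) / start_norm for j in range(p)]
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # zero variance, or start vector orthogonal to the data
        vec = [x / norm for x in nxt]
    return vec


def pc_scores(rows: Sequence[Sequence[float]], loadings: Sequence[float]) -> List[float]:
    """Project each centered observation onto the loading vector."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [sum((r[j] - means[j]) * loadings[j] for j in range(p)) for r in rows]
```

The per-patient scores returned by `pc_scores` are the kind of quantity that would enter a regression model as a covariate in place of the many original variables.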

Figure 2.

Loadings of the first three dietary principal components (PCs). The first dietary PC was predominantly represented by several amino acids, sulphur, niacin, phosphorus, uric acid and purine. PC2 was mainly represented by fat components, sugar and carbohydrates; PC3 was represented by fibre and several vitamins. These three PCs were included in the multiple regression models.