Single-cell Transcriptomic Architecture Unraveling the Complexity of Tumor Heterogeneity in Distal Cholangiocarcinoma

Background & Aims Distal cholangiocarcinoma (dCCA) are a group of epithelial cell malignancies that occurs at the distal common bile duct, and account for approximately 40% of all cholangiocarcinoma cases. dCCA remains a highly lethal disease as it typically features remarkable cellular heterogeneity. A comprehensive exploration of cellular diversity and the tumor microenvironment is essential to depict the mechanisms driving dCCA progression. Methods Single-cell RNA sequencing was used here to dissect the heterogeneity landscape and tumor microenvironment composition of human dCCAs. Seven human dCCAs and adjacent normal bile duct samples were included in the current study for single-cell RNA sequencing and subsequent validation approaches. Additionally, the results of the analyses were compared with bulk transcriptomic datasets from extrahepatic cholangiocarcinoma and single-cell RNA data from intrahepatic cholangiocarcinoma. Results We sequenced a total of 49,717 single cells derived from human dCCAs and adjacent tissues, identifying 11 distinct cell types. Malignant cells displayed remarkable inter- and intra-tumor heterogeneity with 5 distinct subsets were defined in tumor samples. The malignant cells displayed variable degree of aneuploidy, which can be classified into low- and high-copy number variation groups based on either amplification or deletion of chr17q12 - chr17q21.2. Additionally, we identified 4 distinct T lymphocytes subsets, of which cytotoxic CD8+ T cells predominated as effectors in tumor tissues, whereas tumor infiltrating FOXP3+ CD4+ regulatory T cells exhibited highly immunosuppressive characteristics. Conclusion Our single-cell transcriptomic dataset depicts the inter- and intra-tumor heterogeneity of human dCCAs at the expression level.

METHODS: Single-cell RNA sequencing was used here to dissect the heterogeneity landscape and tumor microenvironment composition of human dCCAs. Seven human dCCAs and adjacent normal bile duct samples were included in the current study for single-cell RNA sequencing and subsequent validation approaches. Additionally, the results of the analyses were compared with bulk transcriptomic datasets from extrahepatic cholangiocarcinoma and single-cell RNA data from intrahepatic cholangiocarcinoma.

RESULTS:
We sequenced a total of 49,717 single cells derived from human dCCAs and adjacent tissues, identifying 11 distinct cell types. Malignant cells displayed remarkable inter-and intra-tumor heterogeneity with 5 distinct subsets were defined in tumor samples. The malignant cells displayed variable degree of aneuploidy, which can be classified into low-and high-copy number variation groups based on either amplification or deletion of chr17q12 -chr17q21.2. Additionally, we identified 4 distinct T lymphocytes subsets, of which cytotoxic CD8þ T cells predominated as effectors in tumor tissues, whereas tumor infiltrating FOXP3þ CD4þ regulatory T cells exhibited highly immunosuppressive characteristics. C holangiocarcinoma (CCA) arises from the epithelial lining of the biliary tree. Based on the anatomical locations, CCAs are commonly classified into intrahepatic cholangiocarcinoma (iCCA), perihilar cholangiocarcinoma, and distal cholangiocarcinoma (dCCA); the latter one refers to a subtype emerging in the area between the origin of the cystic duct and ampulla of Vater. 1,2 dCCA remains a highly lethal disease due to its increasing diagnostic incidence and high mortality rates. When diagnosed, the majority of cases have already reached the advanced stage because of being asymptomatic or having nonspecific symptoms at an early stage. 3 Surgical resection and subsequent adjuvant therapy can improve the overall survival rate for dCCA, but optimal adjuvant treatment strategy has not yet been established. The recurrence rate after surgical resection remains high, with a median overall survival for patients with dCCA after surgery ranging from 35 to 48 months. [4][5][6] The prognosis is extremely poor for patients with unresectable tumor due to the lack of available treatment options.
Compared with the other subtypes of CCA, the molecular landscape of dCCA remains poorly understood as little progress has occurred for it. Several studies using largescale bulk genomic and transcriptomic data revealed some critical gene mutations and aberrant signaling pathways in dCCA pathogenesis. KRAS, TP53, ARID1A, and SMAD4 were the most prevalent mutations. 7 CCA, including dCCA, is featured by profound genetic heterogeneity and rich tumor microenvironment (TME) comprising various cell types such as tumor cells, infiltrating immune cells, endothelial cells, and extracellular components. As bulk profiling limits the ability to capture tumor heterogeneity, deciphering the molecular profiles at the subclone or single-cell level is important for understanding the biology, the oncogenic landscape, and the complex interaction with TME in dCCA, which could lead to optimum therapies with improvement in patient survival.
Single-cell RNA sequencing (scRNA-seq) analysis represents as a powerful tool for illustrating cellular diversity and intercellular communication at single-cell resolution. 8 It has been applied to multiple tumor studies to help dissect cellular components, identify subsets of given cell type, and explore cross-talks between tumor cells and stroma cells in the microenvironment. [9][10][11] In such a way, it has strongly improved our knowledge of tumor pathogenesis and facilitated the screening of potential tumor biomarkers and promising therapy targets.
In the current study, we applied a droplet-based scRNA sequencing platform (10x Genomics) to profile single-cell transcriptomics of 4 human dCCA samples and 3 adjacent biliary tract tissue samples from 4 patients with dCCA. A single-cell transcriptome atlas was constructed, and a high level of inter and intra-tumor heterogeneity in dCCA samples was identified. Moreover, we explored the genomic alterations of malignant cells, identifying the underlying malignancy-driven mechanisms and confirming the scRNAseq data. By comparing with bulk transcriptomic datasets from extrahepatic cholangiocarcinoma (eCCA), including perihilar cholangiocarcinoma and dCCA samples, we demonstrated big differences in the findings between the 2 technologies and highlighted the revolutionary improvements made in biology research by single-cell approaches. Moreover, we compared our single-cell RNA data of dCCA with that of iCCA, confirming the concept that iCCA and dCCA were 2 different molecular entities.

Single-cell Transcriptomic Analysis Revealed the Spectrum of Cell Populations in Human dCCAs
To explore the diverse cellular components and tumor heterogeneity in human dCCA tissues, we applied 10x Genomics scRNA-seq and generated single-cell transcriptomic profiles of 4 treatment-naïve dCCA samples and 3 matched adjacent normal biliary duct tissues from 4 patients with dCCA ( Figure 1, A [49,717 cells]). All patients underwent pancreaticoduodenectomy. For ethics reasons, we could not get any healthy or normal biliary duct tissue from other conditions as control; instead, we used matched adjacent normal biliary duct tissues in the current study. The normal biliary duct tissues were picked as far as the upper surgical margin. The pathological diagnosis was confirmed by clinical pathologists as well as the confirmation of the normal biliary duct. The detailed clinical characteristics of those patients are listed in Table 1. After stringent filtering, 30,860 cells were retained for further analysis using methods implemented in the Seurat software suite. 12 Batch effects as well as variations in gender, age, and tumor stage among different patients were eliminated using Harmony 13 tool to confirm that cells from multiple samples were mixed uniformly. After gene expression normalization, principal component analysis (PCA) and uniform manifold approximation and projection for dimension reduction (UMAP) were applied respectively for dimensionality reduction and clustering. All high-quality single cells were clustered and annotated into 11 distinct cell types with known canonical marker genes, including T cells (10, (Figure 1, B and C). Thus, we identified a broad range of cell types in human dCCA samples. The top differentially expressed genes (DEGs) for each cell type are shown in Figure 1, D and listed in Supplementary  Table 1, confirming the precise annotation. The proportion of each cell type fluctuated greatly among samples (Figure 1, E and F), and we observed perturbations of all cell types between normal and malignant samples (Figure 2, A). We quantified shifts in abundance of cell types in response to dCCA malignancy in our study applying miloR tool, 14,15 identifying 2446 neighborhoods spanning the KNN graph (k ¼ 30), of which 77 showed evidence of differential abundance (false discovery rate [FDR] ¼ 25%) (Figure 2, B) between normal (N) and malignant (M) conditions. Moreover, we compared differential abundance results with all discrete cell clusters identified previously, recovering differentially abundant neighborhoods in all clusters except the mast cell subset (Figure 2, C).

Subsets of Malignant Cells Demonstrated Interand Intra-tumor Heterogeneity in Human dCCAs
We extracted and investigated all epithelial cells further and identified 6 main subclusters (Figure 3, A). As almost all cells in cluster N came from adjacent non-carcinoma tissues, we defined cluster N as normal epithelial cell subset and used it as reference for copy number variation (CNV) analysis. All the other 5 subsets (M1-M5) were defined as malignant, showing variable degree of CNV scores (Figure 3, B and C). Subcluster M1 was the most predominant malignant subset, whereas M5 was the minority malignant subcluster (Figure 3, D). The 5 malignant subtypes exhibited a great degree of inter and intra-tumor heterogeneity. Each tumor sample contained at least 3 different malignant cell subsets, whereas cells in each malignant subset originated from at least 2 tumor samples (Figure 3, A). The top DEGs of each malignant subtypes are shown in Figure 3, E and F. Subcluster M1 expressed a high level of epithelialmesenchymal transition marker mediator subunit 1 (MED1) and growth factor receptor bound protein 7 GRB7. 16,17 Subcluster M2 malignant cells were characterized by high expression levels of ANKRD3BC and MUC6, both of which were frequently mutated in cancer, and MUC6 was linked to strong immune response. The top DEGs of M3 included FYN and PTPRC. M4 expressed a high level of S100A2 and cell differentiation-associated gene KRT81. S100 protein family members were commonly dysregulated in various tumors, and a high level of S100 proteins was linked with advanced tumor stage as well as worse prognosis. 18,19 The last malignant subtype, M5, exhibited high expression levels of FAT3 and PLCG2. Both genes have been reported to show recurrent mutations in cancer and are associated with a poor prognosis. 20,21 Next, we compared the transcriptomic data between all malignant cells and normal epithelial cells, identifying 212 up-regulated genes in malignant and 70 relatively high-expressed genes in normal tissue (Figure 3, G). Kyoto Encyclopedia of Genes and Genomes analysis demonstrated that those highly expressed genes in malignancy were enriched in, for instance, cell junction, ErbB and Notch signaling pathways (Figure 3, H). To check whether the epithelia in malignant samples exhibit any abnormalities in comparison to normal, we investigated the transcriptomic landscapes of the 2 types of epithelia. In total, we identified 44 aberrantly upregulated genes in epithelia of malignant samples, whereas 59 genes were relatively over-expressed in the epithelia of normal samples; the majority of both were ribosome genes. After investigation of the enriched signaling pathways, we found that the up-regulated genes in the epithelia of malignant samples were enriched in such ways as antigen processing and presentation, protein process, and interleukin-17 signaling pathways. 22 This indicated that the epithelia in malignant samples was activated in the TME and involved in the immune system activation processes.
The top DEGs of each subcluster were confirmed with immunohistochemistry (IHC) and quantitative polymerase chain reaction (qPCR) approaches, and the morphological overview of each sample was shown as well ( Figure 4). The validated gene expression level was consistent with the single-cell transcriptome data, especially at the protein level (for instance, GRB7 was demonstrated to be highly expressed in sample '210115' [ Figure 4, A and B] at both the protein and mRNA levels, detected by IHC and qPCR, respectively), which was the main sample origin of M1 ( Figure 3 Moreover, we employed the single-cell regulatory network inference and clustering (SCENIC) method to explore all malignant epithelial cell subsets and identify the underlying transcription factors (TFs) in the different signatures ( Figure 5, A). We found common underlying TFs for all malignant subsets, such as STAT3 and Rel ( Figure 5, B); we also identified SMARCC2, EGR1, IKZF1, STAT1, and POU2F3 as the representative underlying TFs in M1 to M5, respectively ( Figure 5, A and B). The expression of those TFs with their targets showed a consistent distribution manner ( Figure 5, B). All of those TFs have been shown to play vital roles in the tumorigenesis and tumor progression processes. [23][24][25][26][27] Interestingly, the top 2 DEG of M1-GEB7, was a strong target of SMARCC2; one of the targets of EGR1 was MUC6, which was the most highly expressed gene in M2; FYN, which was the top 1 DEG of M3, was among the targets of all representative TFs of M3 -IKZF1, RUNX3, CREM, and STAT4. Similarly, FAT3 and PLCG2, both of which were the marker genes of M5, were targets of POU2F3 and SPIB, respectively.
Pseudo-time trajectory plot of the global transcriptomes for all epithelial cells showed that normal epithelial cells and each malignant cell subgroup formed a continuum, but malignant subgroups separated with each other, harboring distinct expression features, confirmed the heterogeneity of malignant subclones in human dCCAs ( Figure 5, C). We also applied R package Slingshot 28 to uncover the development trajectory of all epithelial cells and mapped the identified paths to UMAP projection for visualization. It demonstrated that the normal epithelial cell group formed as a root, giving 2 main trajectory branches separating M1 with M2, M3, and M5 ( Figure 5, D). Gene set variation analysis was applied, indicating that all M subsets shared common activated signatures, such as PI3K/AKT/mTOR, mTORC1, p53, hypoxia, and the activation of cell cycle (G2M checkpoint, E2F targets) signaling pathways. M1 and M2 sub-groups shared the most common pathways ( Figure 5, E).

Malignant Cells Demonstrated Opposite Alteration Status of Chromosome 17
Interestingly, when we investigated the CNV data in detail, it was illustrated that all 5 malignant subsets could be classified roughly into 2 groups, the low-CNV group (M1) and the high-CNV group (M2-M5) ( Figure 3, B), which was also separated into 2 branches on the pseudo-time trajectory plots ( Figure 5, C and D). The low-and high-CNV M groups were characterized by either amplification or deletion of chr17q12 -chr17q21.2, respectively ( Figure 3, C). This region involved multiple remarkable tumor-related genes (for instance RPL19, ERBB2, MIEN1, GRB7, KRT17, et al). All genes were highly expressed in M1, whereas they showed a relatively low expression level in M2 to M5 compared with the N group ( Figure 5, F). The expression of selected genes (RPL19, MIEN1, GRB7, and KRT17) along the pseudo-time was defined as well, which showed a distinguishable expression distribution among subgroups ( Figure 5, G). The Kaplan-Meier survival curve of most gene expression within this region using the median group cutoff showed the close relationship with patient overall survival probability in patients with CCAs from The Cancer Genome Atlas data ( Figure 5, H). IHC staining and qPCR confirmed the expression alterations of selected genes (KRT17 and GRB7) within this region for different subsets ( Figure 4). Both KRT17 and GRB7 were highly expressed in sample '210115M', which contained mainly M1 subgroup with genomic amplification of chr17q12.

Cytotoxic CD8þ T Cells and Immunosuppressive Tumor-infiltrating Tregs Were Enriched in Human dCCA Tumors
Tumor-infiltrating immune cells are highly heterogeneous and have been shown to play important roles in immunotherapy. In the current study, in the non-carcinoma biliary tract tissues, only naïve CD4þ and naïve CD8þ T cells were detected ( Figure 6, A), whereas in cancer tissues, 4 distinct T cell subclusters were identified, as the emergence of cytotoxic CD8þ T cells and FOXP3þ Treg cells ( Figure 6, B). The percentage of T cells in each cluster was shown in Figure 6, C, indicating naïve T cells were predominant in both tumor and non-tumor tissues. The cytotoxic CD8þ T cells were characterized by a high expression level of cytotoxic markers such as GZMB and perforin (PRF1), as well as a certain number of exhaustion markers, such as lymphocyte-activation gene 3 protein (LAG3), T cell immunoreceptor with Ig and ITIM domains (TIGIT), and T cell immunoglobulin mucin receptor 3 (HAVCR2), suggesting those cytotoxic CD8þ T cells became exhausted. The FOXP3þ Treg cells showed prominent expression levels of immunosuppression markers such as TIGIT, cytotoxic T lymphocyte antigen 4 (CTLA4), and TNFR-related protein (TNFRSF18). The effector T cells expressed a moderate level of programmed cell death-1 (PD-1). The NK cell clusters from either non-cancerous tissues or cancer tissues exhibited no significant individual features, and did not show any signs of activation, meaning the cytotoxic CD8þ T cells were the main effectors in dCCAs ( Figure 6, D).
Next, we investigated the intercellular communication landscapes between T cells and epithelial cells in either normal tissue or tumor tissue using iTALK, 29 which indicated that ERBB2 receptor was enhanced in tumor tissues during intercellular communications between T cell and epithelial cells, suggesting that blocking the ERBB2 signaling may affect the proliferation effects in malignant cells ( Figure 6, E).

Single-cell Data Compared With Bulk Expression Profiles
A recent published study analyzed the whole-genome expression profiles for 189 eCCA cases from an international multicenter cohort and classified all cases into 4 transcriptome-based molecular classes: Metabolic, Proliferation, Mesenchymal, and Immune, with each class showing distinct expression characteristics. The Metabolic class was dominated by gene expression data suggestive of deregulated metabolism of bile acids, fatty acids, and xenobiotics, showing a hepatocyte-like phenotype; the Proliferation class overexpressed MYC targets and featured activation of cell cycle signaling (E2F, mitotic spindle, and G2M checkpoint) and DNA repair pathways; the Mesenchymal class was enriched by genomic signals of epithelial-mesenchymal transition; and the Immune class was defined by upregulation of adaptive immune response genes. Moreover, a 174- gene classifier (genes are listed in Supplementary Table 2) was designed, composed of a maximum of 50 genes defining each class for externally validating the molecular classification of eCCA. 30 Although this classification system was acquired from bulk tumors, the expression programs of individual cellular components should enable us to extract additional insights. We would define whether the molecular subtypes from bulk data could reflect differences in malignant programs and TME composition in single-cell data. First, we tried to overlap our single epithelial cell subsets with the molecular subtypes using the 174-gene classifier. Strikingly, the findings showed all malignant epithelial subsets spread mainly in the Proliferation group, without any big variations among subsets (Figure 7, A). It was consistent with gene set variation analysis findings, which showed all malignant subsets shared features of activation of G2M checkpoint and E2F targets pathways. However, when we expanded our analysis to include all TME components and classified again, the heat map showed that immune cells (T and B cells) fell in the subtype of Immune; those mesenchymal cells, like endothelial cell, fibroblasts, nerve cells, and tissue stem cells, were more prone to be classified as Mesenchymal; almost all cells except the nerve and neutrophil cells showed features of Proliferation, and the epithelial cells slightly exhibited features of Metabolic compared with other groups (Figure 7, B). Those findings raised the possibility that the Mesenchymal subtype in bulk data reflects high stromal representation and the Immune subtype is a reflection of immune cells infiltrated, rather than a distinct malignant cell program. A new set of genes for classifying subtypes should be defined in the case of the isolated epithelial malignant cell group in single-cell data.

Single-cell Data Comparison Between dCCA and iCCA
CCA subtypes differ not only in their location but also in their etiopathogenesis and molecular aberrations, along with the emerging evidence pointing towards a different proposed cell of origin, 31 as iCCA is mostly derived from trans-differentiation of adult hepatocytes or their progenitor cells, whereas eCCA is derived from ductal cells, suggesting that eCCA and iCCA are distinct molecular entities. To infer the molecular differences at the single-cell level, we compared our single-cell data of dCCA with a previous published work conducted on iCCA with scRNA-seq, with data downloaded from GSE138709. 11 We extracted all malignant cells from the 2 datasets (11,186 cells) and eliminated the batch effects with the Harmony tool. In total, 12 subclusters were identified. Cells from iCCA or dCCA separated with no remarkable overlaps, which confirmed the different expression alteration landscapes between iCCA and dCCA (Figure 7, C). Interestingly, iCCA showed more significant interpatient heterogeneity than dCCA, with malignant cells from 4 patients with iCCA clustered independently, but it needed further study on larger cohort of patients for a validated conclusion. The DEGs between the 2 groups were shown in Figure 7, D, among which serine protease inhibitor Kazal type 1 (SPINK1) and phosphoprotein 1 (SPP1) were overexpressed in iCCAs, whereas trefoil factors (TFFs) (TFF1/TFF3) were overexpressed in dCCAs. SPINK1 has been detected in multiple types of cancers including bladder, renal, prostate, and liver cancers. 32 High levels of SPINK1 presented as prognostic and diagnostic biomarkers in hepatocellular carcinoma, promoting cell proliferation and metastasis. 33 Increased expression of SPP1 promoted invasion and metastasis in various malignant tumors. 34 TFFs function normally as secretory peptides to protect the gastrointestinal tract against mucosal damage. 35 In pathology, TFFs played pivotal roles in oncogenic transformation, growth, and metastasis of tumors. 36 In total, 45 genes were detected to be significantly upregulated in iCCA, whereas 17 were overexpressed in dCCA. Gene set enrichment analysis demonstrated that the 2 entities enriched in different activated signaling pathways. By comparison, iCCA was activated in DNA repair, G2M checkpoint, E2F, complement, and fatty acid metabolism pathways, whereas dCCA was enriched in NOTCH and UV response signaling pathways (Figure 7, E).

Discussion
dCCA is an aggressive malignancy with poor prognosis and outcomes. As subtype of CCA, dCCA differs remarkably with iCCA but often resembles adenocarcinoma of the pancreatic head and represents a distinct molecular entity. 37 In the past decade, significant efforts have been conducted to elucidate the molecular pathogenesis of CCA, but there is still limited understanding of the molecular landscape of dCCA, and no targeted therapy with clinical efficacy has been approved. Understanding the tumorigenesis and underlying molecular basis of dCCA is an unmet need. A comprehensive exploration of the transcriptomic profiles at the single-cell level can improve knowledge of dCCA pathogenesis and help in development of optimal therapy strategies. In the current study, we applied scRNA- seq to comprehensively delineate the transcriptomic landscape of human dCCA and elucidated the heterogeneity as well as the complex microenvironment in dCCA.
With the scRNA-seq approach, we annotated diverse distinct cell types in dCCAs, with the majority being T cells, which accounts for more than 30% of all cells; following is epithelial cells, including both normal and malignant epithelial cells. The percentage of each cell type in individual sample varies greatly, showing tremendous intertumor heterogeneity. ScRNA-seq analysis has been used to explore constituent malignant cell types in multiple cancer types, including gastric cancer, 38 renal cell cancer, 39 and others. In our study, scRNA-seq helped to define 5 different malignant subtypes, with each characterized by specifically DEGs, underlying TFs, cell trajectory, and copy number alterations. The tumor specimen of each patient contained at least 3 different malignant cell subsets, exhibiting a highly intratumor heterogeneity of tumor clones. Copy number differences are common aberrations. We identified that all 5 malignant subsets could be classified roughly into the low-CNV group and the high-CNV group, characterized by either amplification or deletion of chr17q12 -chr17q21.2, respectively. In a genomic profiling study with biliary tract cancers which included 101 dCCA samples, researchers identified that a 350-kb region at 17q12 was amplified in 32.6% samples, containing known oncogene TBC1D3 and cytokine genes CCL3L3 and CCL4L2. Event of amplification of 17q12 showed disease-free survival hazard ratio of 1.36 and overall survival hazard ratio of 1.46. 7 In addition, 17q copy number gain has been illustrated to be a unique event prevalent in tumor cells of stomach origin and is only present in tumor cells from the short-term survivors. 40 A star gene among the upregulated genes within this region was GRB2, with a number of compounds being screened as active. 41 The clinical significance and the underlying genomics-driven powers for cellular diversity of the copy number alteration status of this region requires further validation strategy in the future with a large cohort of cases.
Cancer is a complex disease involving interactions between the tumor and the immune system. 42 Studies showed Th1 cell and cytotoxic immune infiltration in human tumors was associated with a favorable clinical outcome, whereas a low density of T cells was associated with a poor prognosis. 43 In the case of dCCA, the tumor immune microenvironment is less complicated than other tumor types, such as head and neck cancer, 44 gastric cancer, 38 and others, as we only defined cytotoxic CD8þ T cells as effector T cells as well as FOXP3þ Treg cells as immune tolerance cells. On the other hand, it has been known for a long time that more-differentiated effector T cells were less effective for in vivo antitumor properties compared with naïve and early effector T cells, and lessdifferentiated T cells are more therapeutically effective upon adoptive transfer. 45 In our study, we found that the cytotoxic CD8þ T cells expressed a high level of exhaustion markers, like LAG3, TIGIT, and HAVCR2, suggesting those cytotoxic CD8þ T cells became exhausted. But the high level of naïve T cells in the microenvironment of dCCA may show promise as a potential immune therapy target.
By comparing our single-cell data with bulk transcriptome data of eCCA, we found that the malignant cells mainly fell into the Proliferation group, featured by the activation of cell cycle (E2F, mitotic spindle, and G2M checkpoint signaling), mTOR, and ERBB2. Those data suggested that anti-proliferative agents such as casein kinase II inhibitors 46 and CDK4/6 inhibitors may imply potential therapeutic property for dCCA. Subtypes of CCA arise through different extrinsic and intrinsic carcinogenic processes. 47,48 Molecular landscapes differ significantly depending on the anatomic locations of CCA subtype (for example, FGFR2 fusions are almost exclusively detected in iCCAs, whereas PRKCA-PRKCB fusions are found in eCCAs). 49 Our observations confirmed iCCA and dCCA as 2 different molecular entities, by showing malignant cells from the 2 entities formed different clusters and expressed different DEGs, which enriched in distinguishable activated signaling pathways. Those findings exhibit importance not only for pathogenesis mechanism understanding but also for clinical decision-making purposes.
Taken together, our single-cell dataset provides a comprehensive transcriptomic landscape of human dCCA, revealing a high level of inter-and intra-tumor heterogeneity and unraveling key biological traits with potential clinical implications for dCCA. Our study supports the concept that the molecular scenario of dCCA is intrinsically different than iCCA, pointing out unique precision therapeutic approaches that can be implemented in clinical situations.

Human dCCA Samples
Human dCCA samples and paired adjacent normal biliary duct tissues were collected from the Department of Hepatobiliary Surgery of Shandong Provincial Hospital (Jinan, China). The enrolled patients with dCCA were newly diagnosed and treatment-naïve before undergoing surgical resection. None of the patients had autoimmune disorders or history of prior cancers. In total, 4 dCCA samples and 3 matched adjacent biliary duct tissues from 4 patients with dCCA were used in the study for single-cell transcriptomics analysis. The study was approved by the local ethics committee, and written informed consent was obtained from all patients. All authors had access to the study data and had reviewed and approved the final manuscript.

Fresh Tissue Preparation and Single-cell Isolation
All the freshly resected surgical specimens were immediately washed with phosphate-buffered saline (PBS) and divided into 2 equal parts. One part was used for single-cell isolation and subsequent scRNA-seq library preparation, whereas the other part was stored at À80 C for pathology examination and other validation experiments. Tissue digestion was incubated in a 15-mL tube containing 10 mL  pre-warmed RPMI 1640 (ThermoFisher Scientific), 2 mg/ mL dispase (Roche), 1 mg/mL type IV collagenase (Sigma), and 10 U/mL DNase I (Roche) for 60 minutes at 37 C, then deactivated with 10% fetal bovine serum. Cell suspensions were filtered using a 70-mm filter and then centrifuged at 500 rpm for 6 minutes at 4 C to pellet dead cells and red blood cells. The cells were washed twice and suspended in PBS with 0.5% bovine serum albumin (Sigma). Then, fluorescence-activated cell sorting system was used to load cell for detection of cell viability and cell concentration. Samples could be processed further with cell viability higher than 70%. We diluted cell concentration to 300 to 600 cell/ mL for library preparation.

Library Preparation and Sequencing
Viable cells were loaded into a well of a microfluidic chip to generate cDNA library using 10x Genomics Chromium Single Cell Gene Expression Solution platform (10x Genomics, Pleasanton, CA). Single-cell transcriptomic amplification and library preparation were performed by CapitalBio Technology Corporation (Beijing, China) using single-cell 3' v3 (10x Genomics) according to the manufacturer's instructions. Libraries were sequenced on Illumina Nova-Seq6000 system (Illumina, Inc, San Diego, CA).

Single-cell Data Processing and Cell Subsets Annotation
Raw data was processed with CellRanger (10x Genomics) and Seurat R package (version 4.0.5). 50 Cell-barcode and unique molecular identifier (UMI) were extracted first, then RNA sequences were aligned to the reference genome (GRCh38), and reads mapped confidently to genome were used for subsequent analysis. Low-quality cells were removed according to the following criteria: cells had expressed gene counts fewer than 200 or more than 5000, or over 15% of UMIs derived from the mitochondrial genome. Additionally, all genes that were not detected in at least 3 single cells were discarded. All remaining cells were considered as high-quality cells, of which the gene expression matrices were normalized to the total cellular UMI counts. High variable genes were calculated using Seurat FindVariableGenes function with default parameters, and the top 2000 most variable genes were selected as representative for all genes for following PCA analysis and dimension reduction. Batch effects as well as variations in gender, age, and tumor stage among different samples were eliminated using the Harmony tool. Based on the elbow plot in which principal components were plotted as a function of the variability they accounted for and the heat maps of leading genes in each principal component, the top 15 significant harmonys were selected manually to perform dimension reduction; clusters were identified using FindClusters function (dims.use ¼ 1:15, resolution ¼ 0.5). The UMAP analyses were used for cluster visualization. 51 Cell subsets (Seurat clusters) were annotated to known biological cell types using canonical marker genes.

Differential Abundance Analysis With miloR
We applied miloR package 14 to dissect differential abundance for our scRNAseq data. Briefly, milo uses a KNN graph computed based on similarities in gene expression space as a representation of the phenotypic manifold on which cells lie (k ¼ 30, d ¼ 15). A representative subset of neighborhoods was defined that span the whole KNN graph. For each neighborhood, we counted the number of cells from each sample and tested differential abundance in neighborhoods, while setting spatial FDR as 25%.

Distinguishing Malignant and Nonmalignant Epithelial Cells
All epithelial cells were extracted for further analysis. Subclusters of epithelial cells were identified using FindClusters function after PCA analysis and dimension reduction as mentioned above. Batch effects among different samples were eliminated with the Harmony tool. Malignant epithelial cells were determined based on inferred CNVs, setting the subcluster of normal epithelial cells originated mainly from noncancerous tissue as reference. Initial CNVs for each region were estimated by inferCNV R package. 52 The CNV score of each cell was calculated as quadratic sum of CNV region.

SCENIC Analysis
After subsets of epithelial cells were defined, we employed the SCENIC package 53 (version 1.2.4) to analyze the enriched transcriptome factors for each subtype. SCENIC reconstructed regulons and assessed the activity of these discovered regulons in individual cells. Specific regulons (ie, transcription factors and their target genes) for each epithelial subset were identified.

Pseudo-time Analysis
Single epithelial cell trajectory analysis was performed using Monocle R package 54 and Slingshot R package. For Monocle, first, a Cell DataSet matrix was created for single epithelial cells using the default parameters. Next, we used the marker genes of each cluster to define the progression of cell transition. Then, we entailed dimensionality reduction and trajectory construction with the ordering genes. The expression of selected genes along the pseudo-time was defined as well. For Slingshot, first, a minimum spanning tree was constructed for defining a global lineage structure. Principal curve was fitted onto the reduced dimension dataset to compute pseudo-time scores for each lineage predicting cell-level transcriptional states. UMAP projection was mapped with the identified paths to for visualization.

IHC Analysis
Frozen tissue sections (4-6 mm) were fixed in 2propanone for 10 to 20 minutes, then washed with PBS for 3 minutes 3 times. Endogenous peroxidase activity was quenched for 30 minutes in 10% hydrogen peroxide. To examine the expression pattern of candidate antibodies in dCCAs and adjacent tissues, sections were immunostained with primary antibodies overnight at 4 C. The following antibodies were used in the current project: rabbit-anti-KRT17, rabbit-anti-PLCG2 (ABclonal, Wuhan, China), rabbit-anti-KRT81 (Servicebio, Wuhan, China), mouse-anti-MUC6, rabbit-anti-GRB7 (Abcam, Cambridge, UK) and rabbit-anti-MED1 (ThermoFisher, MA). The secondary antibody used for immunostaining was biotin-conjugated anti-rabbit or anti-mouse immunoglobulin (ZSGB-Bio, Beijing, China). The signal was detected using an ABC kit (ZSGB-Bio, Beijing, China), following the protocol of the manufacturer. Hematoxlin was used for counterstaining.

Quantitative Reverse-transcription PCR
Total RNA was extracted using Trizol (ThermoFisher, Waltham, MA). EasyScript First-Strand cDNA Synthesis SuperMix was used for reverse transcription. The PCR mixture was prepared using SYBR Green qPCR SuperMix (Vazyme Biotech Co, Ltd, Nanjing, China). PCR was performed using an ABI PRISM 7500 Sequence Detection System (Foster City, CA). The primer sequences used for gene detection are listed in Table 2. All primers were designed using Primer Premier 5.0 (PREMIER Biosoft International, Palo Alto, CA). GAPDH was used as an internal expression control.

Data Transparency
All the sequencing data related to the clinical samples described in this study have been deposited in the National Center for Biotechnology Information Sequence Read Archive with the following SRA accession: SUB11007007. All other datasets used and/or analyzed during the current study are available within the manuscript and its supplementary information files.