Identifying Potential Biomarkers in Colorectal Cancer and Developing Non-invasive Diagnostic Models Using Bioinformatics Approaches

According to a report on colorectal cancer (CRC), the Global Burden of Disease study estimated that 1.8 million of the world’s population were affected by CRC in 2017, causing 896 000 global deaths (1). CRC is known as the third and second frequent cause of global cancer mortality in males and females, respectively (2,3). Therefore, CRC is one of the most frequent causes of malignancies worldwide (4,5). One of the problems associated with CRC is that most of the patients with CRC are diagnosed at the late stages of the disease when the tumor cells have metastasized to other organs. Therefore, there are no early clinical signs to be used for diagnosis before the disease enters an irretrievable stage. Notably, it was reported that CRC could be treated at early stages, and the 5-year relative survival rate was 90% (4,6,7). To date, the screening methods for CRC include colonoscopy (3), fecal immunochemical test (8), multi-target stool DNA testing (9), and methylated septin9 (mSEPT9) assay (10). The gold standard approach for the diagnosis of CRC is based on colonoscopy, which is an invasive method and has some complications (11). Although the other mentioned screening methods provide non-invasive approaches to detect CRC, the sensitivity and specificity of the techniques are not satisfactory for scientists (3,9,12). Therefore, alternative non-invasive procedures are highly required using applicable biomarker signatures and novel bioinformatics technologies for early diagnosis and enhancing CRC survival rate. Although many studies have been performed to reveal the molecular mechanisms underlying CRC, the exact biological processes (BPs), molecular functions (MFs), cellular components (CCs), enriched pathways, and the most important genes and regulators involved in CRC are not entirely revealed yet (13,14). The miRNAs are noncoding ribonucleic acids and consisted of almost 16-22 nucleotides. They bind to their Avicenna Journal of Medical Biochemistry

complementary sequences located at the target mRNAs leading to either degradation of the target mRNA or inhibition of protein production (15)(16)(17). It has been demonstrated that dysregulation of miRNAs is frequent in cancer, and the pathological origin of cancer is directly associated with changes in expression of these molecules. Although miRNAs are tissue-specific, previous studies reported that they are transferred into the circulation in several ways, such as passive leakage of apoptosis, necrosis (the environment inflammation), exosomes/microvesicles, lipoproteins, and RNA-protein complexes. Numerous studies have confirmed that the alteration in expression of circulating miRNAs is related to the origin, progression, therapeutic response, and patient survival of the disease (3,(18)(19)(20)(21)(22)(23). Accordingly, circulating miRNAs have become encouraging candidates for non-invasive molecular biomarkers of malignancy (18).
In recent years, scientists have used machine learning tools (MLTs) to resemble predictive models and to classify the groups under study (24). It has been confirmed that the results of classification performed by MLTs are more reliable in comparison with clustering algorithms such as principal component analysis (25). Support vector machine (SVM), as an MLT method, is a type of supervised learning algorithm with the salient advantages of average model statistics, including sensitivity, specificity, accuracy, and area under the curve (AUC), besides reducing the time cost in the training step (3,26,27). Moreover, SVM can successfully mine data obtained from '-omics-' technologies, including microarray gene expression, proteomics, and metabolomics approaches (28).
Systems biology is a field of science that clarifies the meaning of an extensive amount of data derived from '-omics-' technologies (29). These technologies refer to high-throughput techniques that can potentially detect a large number of molecules (e.g., gene, transcript, protein, and metabolite) in biological samples, including serum, plasma, urine, tissue, and cerebrospinal fluid (30). In the present study, we performed a systems biology approach to identify differentially expressed microRNAs (miRNAs) (DEMs) between CRC samples and healthy controls (HCs) by mining microarray dataset GSE59856 (31).
Diagnostic models were constructed by SVM classifier using 1) DEMs and 2) the top-ranked miRNAs based on their weight coefficient in the initial diagnostic model. In addition, a protein-protein interaction (PPI) network was constructed based on the DEMs-target gene using the Mentha (32), InnateDB (33), ChEMBL (34), and Reactome (35) databases. The PPI network was analyzed using the Cytoscape software. Moreover, the most important genes, enriched gene ontology (GO) terms, and pathways involved in the pathogenesis of CRC were identified. Additionally, a gene regulatory network (GRN) base on the hub genes found in the PPI network was assembled to identify the regulatory transcription factors (TFs) associated with the disease.

Microarray Data Set Acquisition
In the current study, we aimed to identify significant DEMs that can be provided as a capable diagnostic panel of biomarkers for patients with CRC. Micro-RNA expression profile of GSE59856 (31) was downloaded from the publicly available National Center for Biotechnology Information Gene Expression Omnibus database (NCBI GEO, http://www.ncbi.nlm.nih.gov/geo) (36). GSE59856 consisted of 571 serum samples of different patients, including 50 CRC and 150 HCs based on the GPL18941 platform (3D-Gene Human miRNA V20_1.0.0).

Statistical Analysis
Raw microarray data were acquired as a TXT file from the GEO database. Min-max normalization was applied using R programming environment (version 3.6.0) before statistical analysis. Orthogonal partial least squares (OPLS) was applied as an advanced supervised multivariate modeling. The microarray technology is frequently used for biomarker discovery (37). The outcome of this technique contains a large number of variables and small number of samples, which is well-known as high-throughput data. Multivariate statistical analyses (e.g., OPLS) are suitable for analyzing these types of data (38,39). This was done by using the "ropls" package from the R programming environment (version 3.6.0). The DEMs with a P value < 0.001 were considered to be statistically significant.

Support Vector Machine Modeling
The SVM algorithm was utilized by performing the e1071 R package (version 1.6-8; https://cran.r-project.org/web/ packages/e1071) for the classification of CRC and HCs. SVM is a supervised MLT for two-group classification with salient advantages, including high average classification sensitivity, specificity, and accuracy, besides lowering time cost in the training step (26,27). Two models were built using different sets of variables, which were as follows: 1) DEMs with a P < 0.001, and 2) the top three ranked miRNAs based on their weight coefficient in the primary model. The radial kernel was implemented in SVM modeling, and the models were validated using leave-oneout (LOO) cross-validation, repeating 100 times for each model. The three major statistics for a model, including sensitivity (Sn), specificity (Sp), and accuracy (ACC) were determined. The receiver operating characteristic (ROC) curves were generated using the Epi R package (version 2.19; https://cran.rproject.org/web/packages/ Epi) to evaluate the diagnostic ability of the SVM models. The major statistics for the models were calculated after performing each time LOO. Next, the average of the measurements was also determined. Furthermore, the AUC for each model was calculated after integrating the Potential Biomarkers in Colorectal Cancer results over the 100 iterations of LOO cross-validations.

PPI Network and Gene Set Enrichment Analyses
We used the MiRWalk 2.0 database (http://zmf.umm. uni-heidelberg.de/apps/zmf/mirwalk2/index.html) to identify validated DEMs-target genes (40). Subsequently, a set of proteins associated with the target genes was obtained and integrated from the Mentha (32), InnateDB (33), ChEMBL (34), and Reactome (35) databases. The non-human proteins were excluded from the dataset. Cytoscape tool (version 3.7.1; https://cytoscape.org/) was utilized to construct the PPI network. The PPI network is used to study the comprehensive activities and biological procedures in living systems (41). It is noteworthy that the interaction between two proteins could be either physical or functional (29,42). The 'molecular complex detection' (MCODE) plugin was used to determine highly dense connected zones in the PPI network. MCODE identifies condensed regions (known as clusters [modules]) in a PPI network, especially based on the neighborhood density of the nodes (43). In addition, MCODE can find the vertex of a cluster known as a seed node, based on the actual knowledge about the biological functions of the nodes (44). The advanced options for the MCODE were as follows: minimum score for clusters, 2; number of nodes involved in each cluster, ≥2; maximum depth from seed, 100; and node score cutoff, 0.2. The ClueGO plugin (v2.3.3; Laboratory of Integrative Cancer Immunology, Paris, France) and the Reactome database (https://reactome.org/) (45) were used to analyze the enriched BP category and pathways of the most critical clusters associated with CRC, respectively. ClueGO integrates the Kyoto Encyclopedia of Genes and Genomes, and BioCarta pathways to detect functionally correlated GO terms (46). In addition, the Protein ANalysis THrough Evolutionary Relationships (PANTHER) webserver (http://www.pantherdb.org/) (47) was utilized to identify the enriched MFs and CCs of the DEMs-target genes associated with CRC. The GO terms and pathways enriched with false discovery rate (FDR) < 0.05 were considered statistically significant. The Network Analyzer tool was used to calculate the topological features of all the nodes in the network. Degree, betweenness, closeness, and eccentricity of the nodes were calculated to identify the most critical genes in the network. Accordingly, the nodes that had a betweenness and degree greater than 2-fold the mean, besides having a centrality and eccentricity higher than the average of the nodes in the network were considered as hub nodes.

Gene Regulatory Network Analyses
The iRegulon plugin was utilized to identify and visualize GRNs, known as motifs, including the determined hub genes and upstream regulatory TFs. It is notable that iRegulon performs an approach known as a genome-wide ranking-and-recovery technique to identify the motifs, by which all of the genes are scored in the genome through motif detection. Next, a normalized enrichment score (NES) was calculated for all of the TFs for each set of regulons. The NES determines the significance of the detected motif and is correlated with the AUC value of the motif (48)(49)(50).

Identification of DEMs in CRC
A predictive model was built using OPLS for the normalized dataset consisted of 200 observations (HC, 150; CRC, 50) and 2478 features. The statistics of the model were as follows: R2X = 0.101, R2Y = 0.93, and Q2 = 0.873, proposing that the model was statistically robust and able to distinguish CRC from HCs. A total of 569 miRNAs were determined as statistically discriminating variables for CRC compared with the healthy individuals from the OPLS diagnostic model, including 316 down-and 253 up-regulated miRNAs. All DEMs were considered to be a potential panel of biomarkers for the diagnosis of CRC. They all had a P value < 0.001 (Table S1, Supplementary file 1). In addition, the expression heat map of the top 34 down-regulated (fold change < 0.25) and the top 6 up-regulated (fold change > 4) DEMs were presented in Figure 1.

SVM Classifiers and Statistical Validation
Two predictive SVM models for CRC classification were constructed using 1) the primary dataset consisting of 569 identified DEMs, and 2) the top three ranked miRNAs, including MIMAT0027353, MIMAT0031005, and MIMAT0019715 based on their weight coefficient with the radial SVM algorithm. The weight coefficients for the top 10 features are demonstrated in Figure 2. It should be noted that the SVM algorithm is vigorous in managing noisy data, besides being resistant against outliers, which is appropriate for classifying high-throughput datasets such as microarray gene expression (51). In the current study, the average statistics of the models over the 100 iterations of LOO cross-validations were as follows: Sn = 1, Sp = 0.99, ACC = 0.99 using the primary dataset, and Sn = 0.96, Sp = 0.99, ACC = 0.98 on the final dataset. Moreover, the models were classified as the CRC versus healthy individuals with an AUC of 1 and 0.99 using the 569 DEMs and top three ranked miRNAs, respectively ( Figure 3). The details of calculations related to each of the SVM models are presented in Table S2. Furthermore, Table 1 illustrates the parameters of the SVM models.

PPI Network and GSE Analyses
Bioinformatics analyses were performed using the miRWalk 2.0 database to predict DEMs-target interactions. Accordingly, a total of 10,745 unique genes were identified as validated target genes of the DEMs. In addition, the PPI network, which was based on the DEMstargets, and associated with CRC, was constructed. The PPI network contained 396 998 interactions between 19 191 nodes. The MCODE plugin identified 91 clusters in the graph, 30 of which had seed nodes involved in the initial dataset (DEMs-targets). Accordingly, they were assumed as considerable modules in the PPI network. The characteristics of these 30 modules are demonstrated in Table 2. It should be noted that the nodes involved in a unique cluster usually participate in a common procedure, neither a BP nor a pathway (29,52). The BP and pathway enrichment analyses were performed using the ClueGO plugin and Reactome database (https://reactome.org/) (53), respectively, for the 30 critical clusters. According to the results, a total of 887 enriched pathways and 1005 BPs associated with CRC were identified. (Tables S3 and S4, respectively). In addition, a total of 9 MFs and 14 CCs were significantly affected in CRC. The list of enriched MFs and CCs are presented in Table 3 and Table 4, respectively. The network analyzer tool was utilized to identify proteins of potential importance to the CRC in the network. The average value of the degree, betweenness centrality, closeness centrality, and eccentricity for all the nodes in the network were calculated to be 41.37, 0.000111, 0.3214, and 5.72, respectively. A total of 110 proteins had a betweenness and degree higher than 2-fold the average, besides having a closeness and eccentricity greater than the mean of the nodes in the graph; accordingly, they were considered as hub nodes in the PPI network associated

Potential Biomarkers in Colorectal Cancer
with CRC (Table S5).

Master Regulators for the Hub Genes
The Cytoscape software with the iRegulon plugin was used to predict the TFs related to the hub genes. Only TFs with an NES > 3.0 were considered to be statistically significant (54). Accordingly, there were 22 convincing TFs associated with CRC. The most enriched TF motif was CCAAT/ enhancer-binding protein delta (CEBPD) with an NES 3.96. Furthermore, a total of 45 hub genes were regulated by signal transducer and activator of transcription 1 (STAT1). The statistics of the significant master upstream regulators are presented in Table 5. In addition, the TFassociated GRNs were constructed and merged. This   network contained 113 nodes and 398 edges ( Figure 4).

Discussion
In the present study, the serum miRNA expression profiles from patients with CRC and healthy individuals were obtained from the GEO database and analyzed. According to the results, a total of 569 miRNAs were found to be statistically significant in the two groups using a multivariate OPLS analysis (P < 0.001). Furthermore, two SVM classifiers were designed using DEMs and the top three ranked miRNAs based on their weight coefficient in the primary model; the models distinguished CRC from those with healthy individuals over the 100 times repeat of LOO cross-validations with an AUC of 1 and 0.99, respectively. The established SVM classifiers demonstrated great potential for the future invention of a serum-based diagnostic test for CRC. Although the final SVM classifier we developed in this study was not as robust as the primary model, the biomarker signature for the second model contained only three miRNAs; this makes the diagnostic approach to be less time-consuming and costly. After identifying the DEMs-targets, a PPI network associated with CRC was built and analyzed using the MCODE, ClueGO, and network analyzer tool in the Cytoscape software, besides the Reactome database. Accordingly, there were 1005 BPs and 887 pathways significantly enriched in our study by Bonferroni correction for multiple comparisons (FDR < 0.05). The GO analysis also revealed that 9 MFs and 14 CCs were significantly affected in CRC by univariate Wilcoxon Rank-Sum Test with Bonferroni correction utilizing the PANTHER database (FDR < 0.05). In addition, we identified a total of 110 critical proteins, which play essential roles in the PPI network associated with CRC. Furthermore, GRN analysis was performed to verify the TFs involved in the regulation of the hub proteins in the PPI network, using the iRegulon plugin in the Cytoscape. Reactome pathway analysis demonstrated that the genes involved in the most significant clusters in the PPI network were primarily associated with the "Regulation of activated PAK-2p34 by proteasome mediated degradation", "FBXL7 down-regulates AURKA during mitotic entry and in early mitosis", and "SCF-beta-TrCP mediated degradation of Emi1" (Figure 5).
It is well-known that coordinated regulation of proliferation, cell survival, and apoptosis play an essential role in the progress and maintenance of multicellular organisms. The p21-activated protein kinases (PAKs) are implicated in cell survival and cell death (55,56). The p21-activated protein kinase 2 (PAK-2) can act as an antiand pro-apoptotic factor depending on its mechanism of activation. Accordingly, full-length activation of PAK-2 via the monomeric GTPases Cdc42 or Rac promotes cell survival. However, caspase activation of PAK-2 to the PAK-2p34 fragment contributes to the apoptotic response and therefore, under-expression of caspase-activated PAK-2p34 may result in neoplastic growth (57). Lee et al (58) demonstrated that recombinant expression of PAK-2p34 induced morphological changes characteristic of apoptotic cell death in different types of cell lines and promoted cell  Targets   CEBPD  3.96  7  FLOT1, GMCL1, APOB, UBB, ARL6IP1, MAP1LC3B, CTSB   GATA3  3.82  20  PDE4DIP, MED23, FLOT1, GNAQ, ITSN1, SP3, PSMB2, SNX6, CTSB, USP2, PDK1, WT1, SNRPE, ERRFI1, FZR1, ECSIT,  PLEC, ARL6IP1, ZNF526, JMJD6   GATA5  3.68  8  NOS3, APOB, GNA14, WT1, HLA-C, CHRM3, ERRFI1, CHRM1   CEBPB  3.6  9  ANXA1, CTSB, TJP2, GMCL1, SNRPE, PDE4DIP, PLEC, ERRFI1, FLOT1   PLAG1  3.57  death in HeLa and CHO cells. In addition, Jakobi et al (59) showed that the concentration of caspase-activated PAK-2p34 is regulated by ubiquitination and degradation by the proteasome. Aurora kinase A (AURKA) is a mitotic serine/threonine kinase that plays a role in the regulation of cell cycle progression (60). AURKA is associated with the centrosome and the spindle microtubules during mitosis; therefore, it plays an important role in different mitotic events such as the establishment of mitotic spindle, centrosome duplication, centrosome separation, maturation, chromosomal alignment, spindle assembly checkpoint, and cytokinesis (60,61). According to previous studies, overexpression of AURKA contributes to mitotic disturbance and neoplastic progression (62)(63)(64). Controversial roles have been reported for AURKA; activation and/or overexpression of AURKA were demonstrated in early-stage/low-grade, as well as noninvasive ovarian tumors, suggesting that its aberrant expression could be considered as an early event in ovarian oncogenesis (65). In addition, low-grade patients illustrated higher AURKA expression than highgrade patients in CRC (66). Moreover, overexpression of AURKA was associated with favorable prognosis in patients with metastatic CRC (67). Therefore, the exact roles of AURKA in tumorigenesis of CRC deserve further study (68).
Early mitotic inhibitor 1 (Emi1) is an inhibitor of ubiquitin ligase-anaphase promoting complex/cyclosome (APC/C) during mitotic and meiotic cell cycle (61,69), resulting in the accumulation of cyclins A and B, as a consequence, preventing re-replication during the S phase (70). It has been reported that depletion of Emi1 promotes DNA re-replication (71). Chao et al (72) determined the effects of inhibiting the expression of thymosin β 4 (Tβ 4 ) on normal intestinal epithelial cells. It is noteworthy that Tβ 4 is one of the proteins considered as a target for treatment of CRC. Chao et al (72) demonstrated that reduced expression of Tβ 4 in IEC-6 normal rat small intestinal cells diminished the growth, which is associated with DNA re-replication caused by decreased expression of Emi1, its upstream activator, and transcription factor E2F1.
The PPI network analysis revealed that a total of 110 hub genes were considerably associated with the pathogenesis of CRC. Accordingly, PDK1, SNRPE, and GNB5 demonstrated the highest degree of scores in the graph.
Pyruvate dehydrogenase kinase 1 (PDK1) is one of the most important enzymes in the glycolysis pathway contributes to a switch in metabolism from mitochondriabased glucose oxidation to cytoplasm-based glycolysis. Several studies have demonstrated that PDK1 was upregulated in various solid tumors and hematological malignancies, including ovarian cancer, head and neck cancer, glioma, melanoma, and acute myeloid leukemia (73)(74)(75)(76)(77). Quin et al (78) demonstrated that overexpression of PDK1 was significantly associated with poor prognosis in CRC patients, and PDK1 knockdown resulted in reduced liver metastasis in both nude mice and immune competent mice. Quin et al (78) concluded that PDK1 inhibition diminishes the survival of CRC cells in blood circulation through overexpression of anoikis.
Small nuclear ribonucleoprotein E is encoded by the SNRPE gene, and is involved in the core component of U small nuclear ribonucleoproteins, which are implicated in the pre-mRNA processing spliceosome. In addition, the encoded protein contributes to the 3' end processing of histone transcripts. This protein has been considered as one of the targets in many human disorders, such as autoimmune disease systemic lupus erythematosus, and previous studies have linked between mutations in this gene and hypotrichosis (79)(80)(81)(82)(83). Anchi et al (84) demonstrated that SNRPE is upregulated in high-grade prostate cancer (PC) cells compared with normal prostatic epithelial cells. This was done by using qRT-PCR. In addition, knockdown of SNRPE expression by short interfering RNA (siRNA) led to the suppression of proliferation in PC cells. Anchi et al (84) suggested that SNRPE may be considered as new target for therapeutic aims of PC. However, the exact mechanism of SNRPE in the development of CRC remains unknown and needs further study in the future.
Guanine nucleotide-binding protein subunit beta-5 (Gβ5) is encoded by the GNB5 gene (85). Gβ5 interacts with G-protein-coupled receptors leading to downregulation of central nervous system G-protein Potential Biomarkers in Colorectal Cancer signaling pathway, and therefore, participates in several BPs including neurotransmission (86). It has been reported that GNB5 knockout in mice resulted in impaired neurologic, cardiac, and retinal function (87). Park et al (88) analyzed the gene expression data of CRC patients treated with cetuximab (CTX) monotherapy. The authors demonstrated a significant correlation between GNB5 overexpression and poor prognosis in patients with CRC. In addition, GNB5 knockdown increased CTX sensitivity; therefore, the authors suggested that GNB5 may be considered as a potential target to combat CTX resistance. Moreover, Park et al (88) reported that upregulation of GNB5 leads to CTX resistance by modulating the Akt signaling pathway.
According to the results of the present study, several TFs were identified to act as a regulator of the hub genes in the PPI network, including STAT1 and CEBPD.
STAT1 was demonstrated to regulate the expression of 45 of the hub genes utilizing the iRegulon plugin in Cytoscape software. Tanaka et al (89) studied the alteration in STAT1 in 736 CRC tissue specimens, including 614 early stage and 122 advanced-stage patients. This was done using the immunohistochemistry technique and semi-quantitatively scoring. It was demonstrated that the overexpression of STAT1 in the cytoplasm of early-stage cases was significantly correlated with lower survival time in the microsatellite instability (MSI) subtype of CRCs. The authors suggested that STAT1 could be considered as a potential prognostic and diagnostic biomarker for early-stage MSI CRC. STAT1 is a master regulator for IFN-related intracellular signaling pathway and could be activated by several ligands in the cytoplasm, including interferon-alpha (IFN-α), IFN-γ, epidermal growth factor, platelet-derived growth factor, and interleukin 6. It has been reported that STAT1 plays multi-functional roles in the molecular biology of cancer. STAT1 can act as an anti-oncogenic molecule by overexpression of caspases (89)(90)(91), and down-regulation of the BCL2 family (89,92). In addition, it can act as a pro-oncogenic molecule in invasive breast carcinoma (89,93). Therefore, the exact mechanisms of STAT1 in CRC have not been fully elucidated (89) and needs consideration in future studies. However, according to the results of the study by Tanaka et al (89), a positive correlation was reported between the expression of STAT1, CD274, and PDCD1 in MSI CRCs. The authors speculated that the up-regulation of STAT1 might induce CD274 overexpression, which results in an immunosuppressive microenvironment. Thus, they suggested that STAT1 may act as a pro-oncogenic factor in MSI CRC patients.
CEBPD was the most enriched TF with an NES 3.96. This protein significantly regulated 9 of the hub proteins in the PPI network, including ANXA1, cathepsin B (CTSB), tight junction protein ZO-2 (TJP2), germ cell-less protein-like 1 (GMCL1), small nuclear ribonucleoprotein E (SNRPE), phosphodiesterase 4D-interacting protein (PDE4DIP), plectin (PLEC), ERBB receptor feedback inhibitor 1 (ERRFI1), and flotillin-1 (FLOT1). It has been reported that CEBPD is a critical tumor suppressor factor, and the down-regulation of the protein was demonstrated in different types of malignancies, including breast, liver, and cervical cancer (94,95). In addition, CEBPD elevates DNA stability in the hole genome (94,96). However, further research is required to examine the CEBPD expression in CRC.
The present study had some limitations. The miRNAs profiled in our study were generated from GPL18941 platform (3D-Gene Human miRNA V20_1.0.0), which probably only represents part of the miRNA populations. Therefore, the DEMs identified in this study may not represent all the DEMs in CRC. In addition, our results were only based on bioinformatics techniques. Thus, considerable in vitro and in vivo approaches are required to confirm these findings. Future studies should use more microarray datasets (e.g., gene microarray data sets) and compare the results with our findings. Besides, wet-lab approaches using large targeted cohorts are required to confirm these findings.

Conclusion
In conclusion, since miRNA has remarkable stability in the serum, it is considered as a possible biomarker for developing a non-invasive diagnostic approach for CRC detection. Accordingly, accurate prognostic models were developed using distinguishing miRNAs derived from the multivariate statistical analysis and their weight coefficient in the primary SVM model. In addition, PPI analysis revealed 30 clusters with considerable seed nodes, which were previously identified as DEMs-targets and were associated with CRC. The GSE analysis revealed 1,005 BPs, besides 887 pathways significantly affected in CRC. Investigating these BPs and pathways may elucidate the mechanism of CRC progression. PPI analysis identified 110 nodes considered to be the most important genes associated with CRC according to their remarkable degree, betweenness, closeness centrality, and eccentricity. The GRN analysis significantly demonstrated 22 TFs for the 110 hub genes. Accordingly, these selected hub genes and their master regulators may be considered as potential therapeutic targets for treating CRC. However, further molecular experiments are required to confirm the exact function of these identified genes in CRC. The DEMstarget genes significantly affected 9 MFs and 14 CCs in CRC patients. In addition to the results of the present study, the SVM classifier can successfully distinguish miRNA profiles in patients with CRC and healthy individuals. OPLS analysis is also a robust statistical technique, which can significantly identify DEMs derived from microarray data. The results of the present study provided insights into the molecular mechanism of CRC, which may be useful in further investigations and be used to establish novel and non-invasive diagnostic tests. However, validation is required in the future.