Role of Pre-molten Globule Structure in Protein Amyloid Fibril Formation

Molten and Pre-molten Globule Structure
Proteins routinely acquire their native structure through a folding process. A protein in its native state usually occupies the lowest free-energy minimum, which also corresponds to the most stable conformation of the polypeptide (1). Most small proteins are thought to fold and unfold through a two-state mechanism, in which only the native and unfolded states are significantly populated (2, 3). Other proteins, in contrast, unfold via a three-state mechanism, in which the protein passes between the native and unfolded structures through at least one intermediate state (4). The molten globule (MG) is the most important and best-known intermediate (5). The physical characteristics of the MG states of several proteins have been clearly described. According to some reports, the MG, which is the more compact intermediate, is nearly as compact as the native protein (6, 7). It is non-functional and lacks rigid tertiary structure, yet it is approximately as compact as the native protein and retains a high content of native-like secondary structure (8, 9). MG states can be generated by treating a protein with acid solutions or mild denaturants, by removing protein-bound prosthetic groups or metal ions, or by truncating the protein chain (9). Some proteins are also thought to unfold through a four-state process in which the native protein unfolds via two intermediates, the MG and pre-MG states (10). The term “pre-molten globule” was first proposed by Jeng et al., who found that acid-denatured cytochrome c cannot be described by a three-state scheme comprising the native, MG, and unfolded states (11). Although the pre-MG is less compact than the MG, it is still relatively compact and retains substantial secondary structure (12).
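The two-, three-, and four-state descriptions above can be made concrete with a small equilibrium calculation. The sketch below computes state populations for a sequential scheme (N ⇌ I₁ ⇌ … ⇌ U). It assumes that each step follows the linear free-energy (m-value) model, ΔG(d) = ΔG_water − m·d. The function name `state_populations` and all parameter values are hypothetical and chosen only for illustration. They are not values reported for any protein discussed here.

```python
import math

GAS_CONSTANT = 8.314e-3  # kJ mol^-1 K^-1


def state_populations(denaturant, steps, temperature=298.15):
    """Equilibrium fractions for a sequential N <-> I1 <-> ... <-> U scheme.

    Hypothetical helper for illustration only.
    denaturant: denaturant concentration (M)
    steps: (dg_water, m_value) per transition, in kJ/mol and kJ/mol/M;
           a positive dg favors the left-hand (more folded) state.
    Returns fractions ordered from the native to the most unfolded state.
    """
    rt = GAS_CONSTANT * temperature
    weights = [1.0]  # reference statistical weight of the native state
    cumulative_dg = 0.0
    for dg_water, m_value in steps:
        # Linear extrapolation model: each step's stability falls linearly with denaturant.
        cumulative_dg += dg_water - m_value * denaturant
        weights.append(math.exp(-cumulative_dg / rt))
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters, for illustration only.
two_state = [(20.0, 5.0)]                           # N <-> U
four_state = [(20.0, 8.0), (8.0, 4.0), (4.0, 2.0)]  # N <-> MG <-> pre-MG <-> U

for d in (0.0, 2.0, 4.0, 6.0):
    print(d, [round(f, 3) for f in state_populations(d, four_state)])
```

With a single step, the function reduces to the familiar two-state model. Each additional step adds an intermediate, such as the MG or pre-MG, whose population depends on the denaturant concentration.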
Several reports indicate that folding intermediates with surface-exposed hydrophobic clusters may play important roles in aberrant protein aggregation and amyloid formation (13). The pre-MG conformation normally appears under stronger denaturing conditions. It is more flexible than the MG and has fewer surface-exposed hydrophobic residues (14). Additionally, the 1-anilino-8-naphthalene sulfonate (ANS) emission intensity of the pre-MG state is lower than that of the MG state, and its secondary structure has a peak at …
Avicenna Journal of Medical Biochemistry


Protein Aggregation
Proteins are of crucial importance in every living organism, and any disturbance or abnormality in their function can result in various disorders. In particular, the failure of proteins to adopt their correct structures causes several disorders. Such misfolding can lead to adverse consequences such as amyloidogenesis (15,16). Several proteins are known to be prone to misfolding. In addition, several factors are known to contribute to protein misfolding, including protein-protein interactions, point mutations, toxin exposure, abnormal post-translational modifications, defective trafficking, and oxidation. These factors can act separately or simultaneously (17,18).
Protein conformational disorders may affect a single organ or several tissues. Neurodegenerative disorders and amyloidoses form the largest groups (19). Misfolding diseases arise from the conversion of natively flexible proteins into fibrillar forms (16,20). These fibrils share common properties, such as a cross-β-sheet core in which the β-strands run perpendicular to the long axis of the fibril (21). Morphologically, a fibril 4–13 nm in diameter is composed of 2–6 unbranched protofilaments, each 2–5 nm in diameter, twisted together (22). Despite the structural and morphological similarities of amyloid fibrils, the structures of their precursor polypeptides differ: they may be rich in β-sheet, α-helix, or β-helix, or be entirely disordered (16). It was previously believed that only proteins containing amyloid core motifs are capable of fibril formation. However, recent reports have shown that almost any protein can fibrillate under appropriate conditions (23,24). Given the structural diversity of the precursor polypeptides and the similarity of the resulting fibrils, proteins must undergo substantial conformational changes upon fibrillation (25). For instance, converting a globular protein into a fibril requires not only rearrangement of its tertiary structure but also partial unfolding (26). In other words, the ability of these proteins to fibrillate and form intermolecular interactions (e.g., hydrogen bonds or hydrophobic contacts) originates from their partially unfolded structure (27). The structures of such intermediates vary from protein to protein, which may give rise to fibrils with distinct structures (28,29). Moreover, each fibrillogenic protein may retain a different amount of ordered structure.
In general, insoluble aggregates of well-ordered precursors arise from conformational changes toward less-ordered intermediates, whereas natively unfolded precursors must instead become more ordered before aggregating (30).
In recent years, an increasing number of soluble proteins have been found to lack a well-defined 3-D structure under experimental conditions. These proteins are termed intrinsically unstructured, and their flexible structure resembles that of unfolded globular proteins (31). Natively unfolded proteins fall into two major groups: extended-disorder and collapsed-disorder proteins. Proteins in the first group resemble random coils or pre-MGs, whereas collapsed-disorder proteins do not resemble these states (32-34). The following section describes several amyloidogenic proteins whose fibril formation has been linked to their folding pathway. In each case, the protein is first converted into a 'misfolded' conformer that then undergoes amyloidogenesis.

Protein Fibrillation From Pre-molten Globule Structure
Alpha-Synuclein Fibrillation
Synucleinopathies are a class of neurodegenerative diseases that result from the fibrillation and subsequent deposition of α-synuclein in the central nervous system (35). The amino acid sequence of α-synuclein has physical properties typical of natively unfolded proteins. It is intrinsically disordered under physiological conditions, that is, at neutral pH and low to moderate ionic strength (36). Which forces and factors convert a natively unfolded protein into an ordered one has been a matter of debate. Natively unfolded proteins have low overall hydrophobicity and high net charge. Thus, folding can be induced by conditions that counteract these properties. Examples include a pH reduction, which decreases the net charge, and a temperature increase, which enhances the overall hydrophobicity (36). Reducing the pH from 7.5 to 3 results in β-sheet formation, with a new band at 1626 cm⁻¹ in the Fourier-transform infrared (FTIR) spectrum. Acidic pH also increases the 1-anilino-8-naphthalene sulfonate (ANS) fluorescence intensity and causes a large blue shift of the ANS fluorescence maximum (from ~515 to ~475 nm). Together, these changes demonstrate the conversion of the natively unfolded protein into an ordered, compact structure with solvent-exposed hydrophobic clusters (36). Thus, α-synuclein is unfolded at neutral pH. Nevertheless, this unfolded state is not a true random coil: it retains some residual structure, including helical regions, which leads to partial compaction (37). Either a pH decrease or a temperature increase converts this partially unfolded α-synuclein into a partially folded, pre-MG-like state (36).
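The link between low hydrophobicity, high net charge, and natively unfolded behavior is often quantified with the charge–hydropathy plot of Uversky and co-workers. In this plot, a sequence is predicted to be natively unfolded when its mean normalized hydropathy ⟨H⟩ lies below the boundary ⟨H⟩_b = (⟨R⟩ + 1.151)/2.785, where ⟨R⟩ is the mean net charge per residue. The sketch below rests on three assumptions: that boundary, the Kyte–Doolittle scale rescaled to 0–1, and a plain per-residue average instead of the sliding-window averaging used in the original method, so results may differ slightly. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Mean Kyte-Doolittle hydropathy rescaled to 0-1 (hypothetical helper)."""
    values = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in sequence.upper()]
    return sum(values) / len(values)


def mean_net_charge(sequence):
    """|(K + R) - (D + E)| per residue; His and termini are ignored (simplification)."""
    seq = sequence.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return abs(positive - negative) / len(seq)


def predicted_natively_unfolded(sequence):
    """True if the sequence falls on the 'unfolded' side of the assumed boundary."""
    boundary = (mean_net_charge(sequence) + 1.151) / 2.785
    return mean_hydropathy(sequence) < boundary


# Hypothetical toy sequence, for illustration only.
toy = "MEEKGSEEQAGDE"
print(round(mean_hydropathy(toy), 3), round(mean_net_charge(toy), 3),
      predicted_natively_unfolded(toy))
```

Non-standard residue letters raise a `KeyError`, which keeps the sketch from silently scoring unknown symbols.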
A pH decrease reduces the negative net charge of the protein, which allows hydrophobic interactions to drive folding. High temperature can also induce ordered structure by increasing the hydrophobic effect (36,38). Therefore, at low pH or high temperature, the protein adopts a pre-MG-like partially folded (PF) structure (36). In addition to monomeric PF structures, α-synuclein can form oligomers and aggregates at suitable temperatures (39). Oligomer formation, which is also temperature-dependent, is accompanied by a small, reversible increase in ordered secondary structure. Notably, these oligomers have been shown to closely resemble the pre-MG-like PF monomeric conformer formed at low pH or high temperature (39).
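The pH dependence of net charge can be estimated with the Henderson–Hasselbalch equation. The sketch below uses one set of approximate side-chain and terminal pKa values. This is an assumption: published scales differ, and pKa values inside a real protein also depend on the local environment. The function name and the toy peptide are hypothetical. The sketch shows why lowering the pH from 7.5 to 3 shrinks the negative net charge of an acidic sequence. It does not reproduce published values for α-synuclein.

```python
# Approximate pKa values (assumed; published scales differ by a few tenths of a unit).
POSITIVE_PKA = {"N_term": 8.0, "K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"C_term": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph):
    """Estimated net charge of a peptide at a given pH (hypothetical helper)."""
    seq = sequence.upper()
    # Termini: one protonatable amine and one carboxylate per chain.
    charge = 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA["N_term"]))
    charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA["C_term"] - ph))
    # Basic side chains carry +1 when protonated.
    for residue in ("K", "R", "H"):
        charge += seq.count(residue) / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
    # Acidic side chains carry -1 when deprotonated.
    for residue in ("D", "E", "C", "Y"):
        charge -= seq.count(residue) / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


# Hypothetical acidic toy peptide, for illustration only.
toy = "MDEEGKSEDEAY"
for ph in (7.5, 3.0):
    print(ph, round(net_charge(toy, ph), 2))
```

Acidic side chains become protonated as the pH approaches their pKa, so the negative charge shrinks toward zero. This is the same charge reduction the text describes as weakening the electrostatic barrier to folding.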

Aβ Fibrillation
Senile plaques, one of the major pathological hallmarks of the Alzheimer's disease (AD) brain, are extracellular deposits composed of amyloid β (Aβ) peptides. Aβ peptides, 40 to 42 residues long, are generated from the amyloid precursor protein. Aβ peptides are neurotoxic and contribute to this amyloidogenic disease. Aβ1–42, which has a more hydrophobic C-terminus, is unfolded at the early stages of fibrillation, as is Aβ1–40. However, both adopt a pre-MG-like structure after partial refolding during the fibrillation process (25,40).

Tau Fibrillation
Tau is a microtubule-associated protein that stabilizes microtubules. It exists as six distinct isoforms generated by alternative splicing of the transcript of a single gene. Tau is a phosphoprotein that is moderately phosphorylated under physiological conditions, but its hyperphosphorylation results in dissociation from microtubules and aggregation, leading to neurodegeneration in AD (41). During this aggregation process, hyperphosphorylated tau is converted into both twisted paired helical filaments and non-twisted straight filaments. According to reports, tau is not only hyperphosphorylated; its disordered structure, which tends to self-assemble, also transforms into a pre-MG-like structure, a behavior reported for the six full-length tau isoforms (42,43).

Amylin
Two major pathological hallmarks of type II diabetes are amyloidogenesis and insulin resistance. The amyloid fibrils are composed of amylin, a natively unfolded 37-residue peptide. Notably, a pre-MG-like intermediate with a PF structure is observed during the initial stages of amylin aggregation (44,45).

ABri Peptide
Familial British dementia is an age-dependent disease accompanied by spasticity and cerebellar ataxia. It is characterized by the accumulation of both paired helical filaments (PHFs) and ABri peptides. The 34-residue ABri peptides are deposited in cerebral blood vessels and the brain parenchyma (46). The disease is caused by a point mutation in the stop codon of BRI (the precursor of ABri), which produces a longer peptide than the normal one. At acidic pH (4.9), this peptide contains random coil and β-sheet structures. Interestingly, at neutral pH it adopts a pre-MG-like structure and converts into amyloid fibrils (47,48).

Prion Proteins
Prion diseases are a class of neurological disorders characterized by spongiform degeneration of the brain and excessive aggregation of a β-sheet-rich form of the prion protein (PrPSc) in the central nervous system. The cellular prion protein (PrPC) has an unstructured N-terminal region and an α-helical C-terminal region and is localized in cell membranes. Aggregation of PrPSc, however, underlies prion disease. This aggregated isoform results from conversion of the C-terminal α-helical structure into β-sheet after partial unfolding into a pre-MG structure. Furthermore, the last 50 residues of the disordered N-terminal region of PrPC have been shown to play a pivotal role in β-sheet formation (49).

Polyglutamine Repeat Diseases
Several familial diseases, including Huntington's disease, spinal and bulbar muscular atrophy (SBMA), dentatorubral-pallidoluysian atrophy (DRPLA), and spinocerebellar ataxia (SCA) types 1, 2, 3, 6, and 7, are known as trinucleotide repeat disorders. They result from expansion of CAG codon repeats in the corresponding genes (50). For example, in Huntington's disease the CAG codon is excessively repeated in the huntingtin gene. This produces polyglutamine (polyQ) tracts of more than 38 glutamines, which eventually give rise to fibrillar deposits and neuronal death (51). Generally speaking, polyGln peptides with 5, 15, 28, and 44 residues adopt a random coil structure, whereas those with more than 37 residues possess an ordered structure (52).
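Each CAG codon encodes one glutamine, so polyQ tract length can be estimated by counting uninterrupted in-frame glutamine codons. The sketch below is illustrative only and is not a clinical genotyping method. It assumes the 38-glutamine figure quoted above as a default threshold, counts CAA (the other glutamine codon) as part of the tract, and takes the reading frame as a parameter. The function names are hypothetical.

```python
def longest_codon_run(dna, codons=("CAG",), frame=0):
    """Longest uninterrupted run of the given codons in one reading frame (hypothetical helper)."""
    dna = dna.upper()
    wanted = set(codons)
    best = current = 0
    # Step through complete codons only, starting at the chosen frame offset.
    for i in range(frame, len(dna) - 2, 3):
        if dna[i:i + 3] in wanted:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def exceeds_polyq_threshold(dna, threshold=38, frame=0):
    """True if the longest glutamine-codon run (CAG or CAA) exceeds `threshold`."""
    return longest_codon_run(dna, codons=("CAG", "CAA"), frame=frame) > threshold


# Hypothetical toy sequences, for illustration only.
expanded = "ATG" + "CAG" * 40 + "TAA"
normal = "ATG" + "CAG" * 20 + "CAA" + "CAG" * 10 + "TAA"
print(longest_codon_run(expanded), exceeds_polyq_threshold(expanded))
print(longest_codon_run(normal), exceeds_polyq_threshold(normal))
```

Counting only CAG measures the trinucleotide repeat itself. Including CAA measures the encoded polyQ tract, which is the quantity tied to aggregation in the text.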

Calcitonin and Medullary Thyroid Cancer
Calcitonin is a 32-residue hormone secreted by the C cells of the thyroid gland. Its aggregation has been associated with both sporadic and inherited medullary thyroid cancer (53). During fibrillation, the α-helical and random coil structures of calcitonin convert into β-sheet, and a pre-MG-like structure is observed at the early stages of the process (54,55).

Prothymosin α
This acidic protein is composed of almost 50% aspartic and glutamic acid residues and is transformed from a random coil into a pre-MG-like structure at low pH. This conversion has been attributed to its lack of aromatic and cysteine residues and of hydrophobic aliphatic amino acids. However, this conversion of a natively unfolded protein into a PF polypeptide has not been implicated in any amyloid disease (56). Nevertheless, prothymosin α forms elongated, ribbon-like fibrils at low pH (<3) (57,58).

Apolipoprotein C-II
ApoC-II is a plasma protein that functions as a cofactor of lipoprotein lipase. It adopts an α-helical structure in the presence of sodium dodecyl sulfate (SDS), a lipid-mimetic detergent, but has no ordered structure in the absence of lipids. In the absence of SDS, ApoC-II adopts a pre-MG-like structure when phospholipids are added at sub-micellar concentrations. In contrast, phospholipids at the critical micelle concentration induce α-helical structure and prevent fibril formation (59,60).

Core Histones
Core histones have an MG-like structure under strongly acidic conditions (pH 2). They can adopt four different structures under various physical conditions, all of which tend to aggregate. When the protein solution is saturated, however, core histones form non-fibrillar aggregates (61).

Apo Carbonic Anhydrase
Bovine carbonic anhydrase II (CA, EC 4.2.1.1) contains 259 amino acids and has a molecular weight of approximately 30 000 Da. It is a monomeric, single-chain protein with no post-translational modifications and contains a zinc ion in its active site. Carbonic anhydrase catalyzes the hydration of CO2 to bicarbonate and is the second most abundant protein in red blood cells after hemoglobin. The human and bovine forms of this enzyme are highly similar. In its structure, the zinc atom is bound by three histidines (Figure 1).
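For reference, the reversible reaction catalyzed by carbonic anhydrase can be written as a standard balanced equation. It is shown here only to make the "hydration of CO2 to bicarbonate" step explicit:

$$
\mathrm{CO_2 + H_2O \rightleftharpoons HCO_3^- + H^+}
$$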
Human and bovine CA II (HCA II and BCA II) are prone to aggregation in the MG state. This aggregation is highly specific and involves β-strands 4–7, which contain the hydrophobic core. At strongly acidic pH (~2.4), apo carbonic anhydrase adopts a pre-MG structure capable of fibrillation. When the pH is raised to ~3.6, however, the MG form predominates and amorphous aggregates form. Notably, numerous proteins probably undergo amyloid fibrillation through formation of a pre-MG intermediate (62). Furthermore, apo carbonic anhydrase forms amyloid assemblies by a nucleation-independent mechanism. Thioflavin T (ThT) emission increases after 12 hours of incubation and then rises more slowly until day 4. After day 4, the increase becomes more rapid again. The emission plateaus by day 7 and then decreases over the following days (63).
Fink showed that intermediates whose structure is closer to that of the native protein tend to form amorphous aggregates. According to Fink's theory, carbonic anhydrase in the MG state is closer to the native state of the protein and therefore moves toward amorphous aggregates (Figure 2). Many reports identify the pre-MG as the structure that leads to amyloid aggregation. Thus, the intermediate that leads to aggregation differs among proteins, and even within one protein under different conditions (64).

Carboxymethylated α-Lactalbumin
α-Lactalbumin (α-LA), a component of lactose synthase, adopts an MG-like structure at acidic pH. This Ca2+-binding protein also adopts the same structure at high temperatures. Its stability depends on the number of disulfide bridges in the polypeptide (65). 1SS-α-LA, a variant with a single disulfide bridge, has been reported to be a prevailing form of this acidic protein. When the disulfide bridges are reduced, some secondary structure remains but no tertiary structure can be detected. This residual secondary structure then takes part in fibril formation after conversion into a pre-MG-like state (66).

Apo Cytochrome C552
Apo cytochrome c552 is an electron-transfer protein that undergoes fibrillation after a particular mutation. The mutation replaces two cysteine residues with alanine, which abolishes the covalent attachment of the heme prosthetic group (67). The holoprotein is helical, whereas the resulting cysteine-free variant, which lacks the prosthetic group, adopts a pre-MG-like structure that leads to amyloid formation (68).

SH3 Domain at Acidic pH
SH3 is a 60- to 85-residue domain that plays a pivotal role in regulating protein-protein interactions at neutral pH. It has a β-barrel structure composed of 5 or 6 β-strands. At acidic pH, however, this fold is lost and the domain converts into a monomeric "A state" typical of acid-unfolded proteins. This A state tends to adopt a pre-MG-like structure, which then assembles into amyloid fibrils (69).

N-Terminal Domain of Escherichia coli HypF
At low pH, the N-terminal domain of E. coli HypF populates a distinct conformational state. Various biochemical and biophysical techniques show that this state is largely unfolded but contains hydrophobic clusters and secondary structure. It is also more compact than a random coil and less organized than an MG state. Increasing the ionic strength induces this pre-MG state to form amyloid-like protofibrils. These findings show that a pre-MG state can be one of the precursor states of amyloid formation. Although other conformational states might also exist, the pre-MG state is the more likely initiator of amyloidogenesis (70).

Amyloidogenesis of Wild Type Hen Egg-White Lysozyme
Under oxidative stress, the level of the enzyme glyoxalase decreases, which raises the serum concentration of glyoxal. Incubating hen egg-white lysozyme with glyoxal produces a three-step transition, with pre-MG and MG states forming on days 7 and 15 of incubation, respectively. Both states show increased ANS fluorescence intensity compared with the native state. Longer incubation of the MG state produces aggregates with several characteristic features: increased ThT fluorescence intensity, a red shift in Congo red absorbance, loss of the near-UV CD signals at 284, 290, and 294 nm, and a negative far-UV CD ellipticity peak at 217 nm (71).

Malaria Surface Protein 2
Merozoite surface protein 2 (MSP2) is one of the most abundant proteins of the merozoite stage of Plasmodium falciparum. MSP2, a glycosylphosphatidylinositol (GPI)-anchored protein, is composed of conserved N- and C-terminal domains and a variable central region. CD spectroscopy has shown that this protein is intrinsically unstructured and has a high propensity to fibrillate, with the N-terminal domain playing a crucial role. In addition, pulsed-field gradient NMR diffusion measurements show that MSP2 has a large effective hydrodynamic radius, consistent with an intrinsic pre-MG state under physiological conditions. Sedimentation velocity studies further confirmed this result (72).
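Pulsed-field gradient NMR yields a translational diffusion coefficient D. This coefficient is commonly converted to an effective hydrodynamic radius with the Stokes–Einstein relation, R_h = k_B·T / (6π·η·D). The sketch below assumes a spherical particle, a water-like viscosity at 25 °C, and a hypothetical function name. In practice, studies often calibrate against an internal reference molecule rather than relying on an assumed viscosity. A radius is judged "large" relative to what a compact globular protein of the same length would give.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K


def hydrodynamic_radius(diffusion_coefficient, temperature=298.15, viscosity=8.9e-4):
    """Stokes-Einstein radius in metres (hypothetical helper).

    diffusion_coefficient: translational diffusion coefficient, m^2/s
    temperature: absolute temperature, K
    viscosity: solvent viscosity, Pa*s (default approximates water at 25 degrees C)
    """
    return BOLTZMANN * temperature / (6 * math.pi * viscosity * diffusion_coefficient)


# Hypothetical diffusion coefficient, for illustration only.
radius_nm = hydrodynamic_radius(1.0e-10) * 1e9
print(f"R_h = {radius_nm:.2f} nm")
```

The radius scales inversely with D, so a slower-diffusing molecule of a given length appears larger. This is the sense in which an expanded pre-MG-like protein shows a "large" effective radius.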

Conalbumin
The structure of conalbumin can be altered by lowering the dielectric constant of the solvent, for example by treatment with fluoroalcohols, which results in protein aggregation. Aggregation is maximal at 15% (v/v) trifluoroethanol (TFE) and 3% (v/v) hexafluoroisopropanol (HFIP). The aggregates induced by TFE and HFIP have amyloid-like properties, as confirmed by ANS and ThT binding and by transmission electron microscopy. At higher concentrations, TFE and HFIP induce more helical structure. Far-UV CD, intrinsic fluorescence, and ANS and ThT fluorescence show that a partially structured intermediate state forms before aggregation. It has also been shown that CA passes through different conformational states during acid unfolding, such as the pre-MG at pH 4.0 and the MG at a pH of 3%, and that these states aggregate strongly (73).

Rifampicin Induces Ovalbumin Aggregation
Ovalbumin is a protein secreted in the hen oviduct. Rifampicin at 6 µM has been reported to induce aggregation, whereas the pre-MG and MG states form at 3 µM and 5 µM rifampicin, respectively. When incubated with 6 µM rifampicin, ovalbumin forms a β-sheet-rich structure, with a negative ellipticity peak at 217 nm in the CD spectrum. Aggregation is further confirmed by a 50 nm red shift in the Congo red binding assay, an increase in absorbance at 450 nm, and a 10-fold increase in ThT fluorescence intensity compared with the native state (74).

Human Profilin-1
Numerous missense mutations in the profilin-1 gene have been associated with the early stages of familial amyotrophic lateral sclerosis (fALS). These mutations lead to intracellular inclusions of the mutant protein. They also destabilize the native, fully folded state. However, it remains uncertain how these mutations cause misfolding and self-assembly. During refolding, wild-type profilin-1 has been reported to transiently populate a PF state with solvent-exposed hydrophobic clusters but no secondary structure. This state is populated at pH 7 and accumulates significantly at lower pH values. Interestingly, the fALS-associated mutations do not change the refolding mechanism of profilin-1, but they stabilize the PF state. This mutation-induced stabilization of the PF state may cause profilin-1 aggregation, which may account for the pathogenicity of the mutations. These observations identify the PF state as a pre-MG conformational state with a free energy similar to that of the unfolded state. Although it lacks secondary structure, it contains collapsed, solvent-exposed hydrophobic clusters (75).

Conclusion
This review focused on the relationship between amyloid fibril formation and protein folding intermediates. Despite extensive research, it remains uncertain how soluble proteins are converted into amyloid fibrils. Answering this question requires characterizing the early stages of amyloidogenesis. Intermediates of the protein folding pathway likely play a role in almost every case of amyloidogenesis and have an essential role in the production of amyloid aggregates. Two important, well-known folding intermediates are the MG and the pre-MG. The MG is one of the main intermediates in protein folding (76). It is a PF conformation characterized by substantial secondary structure arranged in an overall native-like fold, a compact shape, and large surface-exposed hydrophobic clusters (77). MG states can be generated by exposing a protein to acid solutions or mild denaturants, by removing prosthetic groups or metal ions, or by cleaving the protein chain (9). Surface-exposed hydrophobic clusters have been shown to play important roles in protein aggregation and amyloid formation (13). The pre-MG conformation, which usually forms under stronger denaturing conditions, has a lower proportion of surface-exposed hydrophobic residues and is more flexible than the MG (14). A pre-MG structure appears to initiate amyloidogenesis in numerous proteins, even in those with no intrinsic ordered structure. The pre-MG retains only limited structure, with its hydrophobic regions clustered together. At the same time, its flexibility promotes protein-protein interactions and, eventually, aggregation.
Thus, assuming that all proteins pass through a pre-MG state during fibrillation, the present study proposes the design of specific sensors that detect this structure for the early diagnosis of amyloidogenic disorders (Figure 3). This conclusion may also have important clinical implications for identifying appropriate therapeutic targets.