Identification of NFYA family genes
A total of 94 NFYA family members were identified in the four cotton species: 30 in G. hirsutum (GhNFYA1-30), 32 in G. barbadense (GbNFYA1-32), 16 in G. arboreum (GaNFYA1-16) and 16 in G. raimondii (GrNFYA1-16). NFYA genes of the other species were also classified according to their chromosomal location information (Additional file 1: Table S1). The number of NFYA genes in tetraploid cotton (G. hirsutum and G. barbadense) was almost twice that in diploid cotton (G. arboreum and G. raimondii). This is consistent with the formation of allotetraploid cotton by hybridization of the A and D genome progenitors followed by chromosome doubling [15]. Analysis of the physicochemical properties of the G. hirsutum NFYA family showed that the isoelectric point ranged from 6.84 (GhNFYA11) to 10.22 (GhNFYA3), the number of amino acids ranged from 175 (GhNFYA3) to 999 (GhNFYA11), and the molecular weight ranged from 19.40 kDa (GhNFYA3) to 109.38 kDa (GhNFYA11) (Additional file 1: Table S2).
Phylogenetic analysis of NFYA family genes
To study the evolutionary relationships of NFYA in plants, the protein sequences of 181 family members identified in 11 species were aligned and a phylogenetic tree was constructed. The 181 members comprise 30 genes in G. hirsutum, 32 in G. barbadense, 16 in G. arboreum, 16 in G. raimondii, 10 in Arabidopsis thaliana, 11 in Oryza sativa, 7 in Theobroma cacao, 18 in Zea mays, 13 in Populus trichocarpa, 21 in Glycine max and 7 in Vitis vinifera (Additional file 1: Fig. S1). The identified NFYA family members were renamed according to their chromosomal location. The phylogenetic tree was constructed by the maximum likelihood method using MEGAX, and the unrooted tree was then refined for display using the EvolView website (Fig. 1). Based on the overall topology of the tree, and mainly by reference to the phylogenetic trees of Arabidopsis thaliana and the four Gossypium species, the NFYA family members were divided into three classes: A, B and C. NFYA genes were classified with reference to maizeGDB (https://www.maizegdb.org/), TAIR (https://www.arabidopsis.org/index.jsp), JGI Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html) and other online resources. Class A includes the Arabidopsis thaliana genes AtNFYA1, AtNFYA4, AtNFYA7 and AtNFYA9 together with similar NFYA genes from cotton, Populus trichocarpa, Glycine max, Zea mays and other species. Class B includes the Arabidopsis thaliana genes AtNFYA3, AtNFYA5, AtNFYA6 and AtNFYA8 together with evolutionarily related NFYA genes from Oryza sativa, Theobroma cacao and the four Gossypium species. Class C mainly includes AtNFYA2 and AtNFYA10, NFYA genes from Glycine max, Zea mays and Oryza sativa, and a few NFYA genes from the four Gossypium species.
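The per-species gene counts reported above can be checked with a quick tally. The following is only a minimal sketch: the dictionary and variable names are illustrative, and the counts are copied from the text rather than recomputed from sequence data.

```python
# NFYA family members per species, as reported in the text.
nfya_counts = {
    "G. hirsutum": 30,
    "G. barbadense": 32,
    "G. arboreum": 16,
    "G. raimondii": 16,
    "Arabidopsis thaliana": 10,
    "Oryza sativa": 11,
    "Theobroma cacao": 7,
    "Zea mays": 18,
    "Populus trichocarpa": 13,
    "Glycine max": 21,
    "Vitis vinifera": 7,
}

cotton_species = ["G. hirsutum", "G. barbadense", "G. arboreum", "G. raimondii"]

# Total across all 11 species and across the four cotton species only.
total_members = sum(nfya_counts.values())
cotton_members = sum(nfya_counts[s] for s in cotton_species)

print(f"species: {len(nfya_counts)}, total: {total_members}, cotton: {cotton_members}")
```

The totals agree with the 181 members across 11 species and the 94 cotton members stated in the text.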
As shown in Fig. 1A, the NFYA family members form two branches in class A. Within each branch, the NFYA genes of the four Gossypium species are most closely related to one another. The cotton genes are also relatively close to those of Theobroma cacao, Populus trichocarpa, Glycine max and Vitis vinifera, while the NFYA genes of Arabidopsis thaliana are most closely related to those of Glycine max. Similar relationships were found in class B, where the NFYA genes of Zea mays and Oryza sativa were more closely related to each other. Taken together with class C, it can be concluded that the NFYA genes of Theobroma cacao, Populus trichocarpa and Glycine max are the most closely related to those of cotton, that the Arabidopsis thaliana NFYA genes are closest to those of Glycine max, and that the maize and rice NFYA genes are relatively closely related. The number of NFYA family genes in G. hirsutum is more than twice that in Arabidopsis thaliana, Oryza sativa and Theobroma cacao, indicating that Gossypium has undergone significant gene amplification during its evolution [24]. The four Gossypium species contain 30 (G. hirsutum), 32 (G. barbadense), 16 (G. arboreum) and 16 (G. raimondii) NFYA family members, so two genes appear to have been lost during the evolution from diploid Gossypium to G. hirsutum. The phylogenetic tree constructed from the NFYA family members of the four cotton species alone shows three main branches: a, b and c (Fig. 1B).
Chromosome location analysis of NFYA family
To further analyze the characteristics of the NFYA family genes, their chromosomal locations in the four Gossypium species were mapped (Fig. 2). The genes were renamed according to their location on the chromosomes. Both G. hirsutum and G. barbadense have two genes that are not located on chromosomes, and the remaining NFYA genes are unevenly distributed across the chromosomes. In G. hirsutum and G. barbadense, the genes show the same distribution on Chr03, Chr06, Chr07, Chr10, Chr11 and Chr13: one gene each on Chr03 and Chr07, two genes each on Chr06 and Chr11, and four genes each on Chr10 and Chr13. G. hirsutum carries one more NFYA gene than G. barbadense on the chromosomes corresponding to GbAt-01 and GbDt-05, whereas upland cotton has one fewer gene than sea island cotton on the chromosomes corresponding to GbAt-02, GbDt-04, GbDt-08 and GbDt-09. These results suggest that some NFYA genes of upland cotton may have been lost during evolution. In G. arboreum, the chromosomal distribution of the family is largely the same as in the A subgenome of G. hirsutum, except for chromosome 2, which carries more genes than its counterpart in the G. hirsutum A subgenome. A large number of gene gains and losses occurred in G. raimondii, in which only chromosomes 3 and 13 carry the same number of genes as the corresponding D subgenome chromosomes of G. hirsutum and G. barbadense.
Collinearity analysis of NFYA family genes
Collinearity analysis of NFYA genes in the four cotton species was performed to understand the evolutionary relationships of the NFYA gene family in cotton. The evolution of a gene family generally involves three processes, namely fragment duplication, tandem duplication and whole genome duplication [31]. A joint analysis of the NFYA genes of G. arboreum, G. raimondii, G. hirsutum and G. barbadense was carried out to examine gene duplication and collinearity among them. The NFYA genes of G. arboreum (Ga) and G. raimondii (Gr) were found to be duplicated in G. hirsutum (Gh) and G. barbadense (Gb), indicating that the tetraploid genomes of upland cotton and sea island cotton were derived from the diploid genomes in the course of evolution. Tandem repeats and fragment repeats were identified according to the chromosomal distance, similarity and coverage of NFYA gene family members in the diploid and tetraploid Gossypium species, in order to determine the evolutionary relationships among the NFYA family genes.
Genes linked by collinearity lines represent the same (homologous) gene. In Fig. 3, many chromosomes in the GhAt/GhDt and GbAt/GbDt subgenomes and in the GaA and GrD genomes are connected by lines of the same color; that is, the NFYA genes of the GhAt/GhDt and GbAt/GbDt subgenomes have homologs in the GaA and GrD genomes. This shows that these genomes and subgenomes are closely related in evolution and that most NFYA genes were retained during polyploidization. Genes located in the same chromosome region (e value < 1e-5) were classified as tandem repeats, while the remaining duplicated genes from the same genome were considered fragment repeats. Across all collinearity results for the 4 Gossypium species, gene pairs from different genomes or subgenomes were classified as genome-wide duplications. Orthologous/paralogous gene pairs were identified for the GhAt/GhDt and GbAt/GbDt subgenomes of the two tetraploid Gossypium species, and homology analysis showed that several gene loci were highly conserved between the At and Dt subgenomes of both species. As mentioned above, G. hirsutum and G. barbadense are derived from the ancestors of diploid G. arboreum and G. raimondii [32]. No tandem duplications were found in the NFYA gene family, whereas 98 fragment duplications and 266 genome-wide duplications were detected. Based on these results, it is speculated that closely related gene pairs are usually generated through genome-wide duplication and fragment duplication, with fragment duplication being the most important factor in the evolutionary process (Fig. 3).
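The three-way classification described above (tandem, fragment and genome-wide duplication) can be illustrated with a small Python sketch. This is not the pipeline used in the study: the `Gene` record, the gene names and coordinates, and in particular the 100 kb distance cutoff for "the same chromosome region" are assumptions introduced here for illustration, because the text does not state a distance threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str    # hypothetical identifier
    genome: str  # genome or subgenome label, e.g. "GhAt", "GhDt", "GaA", "GrD"
    chrom: str   # chromosome label
    start: int   # start coordinate in bp (illustrative values only)


# ASSUMPTION: the text gives no distance cutoff; 100 kb is a placeholder.
TANDEM_MAX_GAP = 100_000


def classify_pair(a: Gene, b: Gene, max_gap: int = TANDEM_MAX_GAP) -> str:
    """Classify a homologous gene pair by where its two members sit."""
    if a.genome != b.genome:
        # Members from different genomes/subgenomes: genome-wide duplication.
        return "genome-wide"
    if a.chrom == b.chrom and abs(a.start - b.start) <= max_gap:
        # Same genome, same chromosome region: tandem repeat.
        return "tandem"
    # Same genome, but not in the same chromosome region: fragment repeat.
    return "fragment"


# Hypothetical genes, used only to exercise the three branches.
g1 = Gene("geneA", "GhAt", "Chr10", 1_000_000)
g2 = Gene("geneB", "GhAt", "Chr10", 1_050_000)
g3 = Gene("geneC", "GhAt", "Chr13", 5_000_000)
g4 = Gene("geneD", "GhDt", "Chr10", 1_200_000)

for pair in [(g1, g2), (g1, g3), (g1, g4)]:
    print(pair[0].name, pair[1].name, classify_pair(*pair))
```

In practice the homologous pairs themselves would come from a collinearity tool, and the sketch only shows how a pair is labeled once it has been detected.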
Ka/Ks analysis of selective pressure of NFYA family genes
Selection pressure analysis was used to examine the evolution of NFYA gene pairs in the four cotton species and to judge whether selective pressure has acted on the NFYA family genes. During evolution, a duplicated gene pair may diverge from its original function, which eventually leads to non-functionalization (loss of the original function), sub-functionalization (division of the original function) or neo-functionalization (acquisition of a new function) [33]. To determine the nature and degree of selection pressure on duplicated gene pairs and to explore whether Darwinian positive selection is related to the divergence of duplicated NFYA genes, non-synonymous (Ka) and synonymous (Ks) substitution values were calculated for 314 duplicated gene pairs from 10 combinations of the 4 cotton species. These combinations are G. hirsutum vs. G. hirsutum (Gh–Gh), G. hirsutum vs. G. barbadense (Gh–Gb), G. hirsutum vs. G. arboreum (Gh–Ga), G. hirsutum vs. G. raimondii (Gh–Gr), G. barbadense vs. G. barbadense (Gb–Gb), G. barbadense vs. G. arboreum (Gb–Ga), G. barbadense vs. G. raimondii (Gb–Gr), G. arboreum vs. G. arboreum (Ga–Ga), G. arboreum vs. G. raimondii (Ga–Gr) and G. raimondii vs. G. raimondii (Gr–Gr). The selection pressure on duplicated gene pairs can be inferred from the Ka/Ks ratio. It is generally accepted that Ka/Ks = 1 indicates neutral selection (pseudogene), Ka/Ks < 1 indicates negative (purifying) selection, and Ka/Ks > 1 indicates positive selection [34].
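The 10 species combinations and the Ka/Ks interpretation rule can be written out as a short Python sketch. The function name `selection_type` and the Ka and Ks values in the example are hypothetical and are not taken from the study's data. Treating a pair with Ks = 0 as undefined is an added convention, since the ratio cannot be computed in that case.

```python
from itertools import combinations_with_replacement

# Species abbreviations used in the text.
species = ["Gh", "Gb", "Ga", "Gr"]

# Pairs of species, including within-species comparisons:
# C(4, 2) + 4 = 10 combinations.
combos = ["-".join(pair) for pair in combinations_with_replacement(species, 2)]


def selection_type(ka: float, ks: float) -> str:
    """Interpret a Ka/Ks ratio using the thresholds given in the text."""
    if ks == 0:
        return "undefined"  # added convention: ratio cannot be computed
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"


print(len(combos), combos)

# Made-up Ka and Ks values, for illustration only.
for ka, ks in [(0.02, 0.10), (0.30, 0.10), (0.10, 0.10)]:
    print(ka, ks, selection_type(ka, ks))
```

The combinations are generated in the same order as they are listed in the text, from Gh-Gh through Gr-Gr.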
There are 314 duplicated gene pairs among the NFYA family genes of the four cotton species. Of these, 13 gene pairs show positive selection and 301 gene pairs show purifying selection, indicating that the NFYA family genes are relatively conserved in evolution. Positively selected gene pairs were found in Ga–Gr, Ga–Gb and Ga–Gh (2, 3 and 2 pairs, respectively), suggesting that some NFYA genes acquired beneficial mutations during the hybridization of diploid cotton into allotetraploid cotton. Likewise, there were 1, 2 and 3 positively selected gene pairs in Gb–Gb, Gb–Gr and Gh–Gr, respectively. In total, 13 gene pairs had beneficial mutations during evolution. All duplicated gene pairs in Ga–Ga (12 pairs) and Gr–Gr (5 pairs) had Ka/Ks values between 0 and 0.49, indicating that they were entirely (100%) under purifying selection. Similarly, the numbers of Gb–Gh and Gh–Gh duplicated gene pairs with Ka/Ks values between 0.5 and 0.99 were 8 and 1, respectively, and the numbers with Ka/Ks values between 0 and 0.49 were 27 and 34, respectively.
In conclusion, 314 duplicated gene pairs from the four Gossypium species (Gh, Gb, Ga and Gr) were examined in the selection pressure analysis. Among them, 301 pairs (95.86%) had Ka/Ks values less than 1, including 261 pairs with Ka/Ks values less than 0.5 and 40 pairs with Ka/Ks values between 0.5 and 0.99, indicating purifying selection. Only 13 pairs (4.14%) had a Ka/Ks value greater than 1; these gene pairs may have evolved rapidly after duplication and experienced positive selection pressure. Since most of the Ka/Ks values were less than 1.0, it is speculated that the Gossypium NFYA family genes have undergone strong purifying selection pressure and limited functional differentiation after fragment duplication and genome-wide duplication (Additional file 1: Fig. S2).
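The summary figures above can be reproduced with simple arithmetic. The sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Counts of duplicated gene pairs, as reported in the text.
total_pairs = 314
purifying_below_half = 261  # Ka/Ks < 0.5
purifying_half_to_one = 40  # Ka/Ks between 0.5 and 0.99

# Positively selected pairs (Ka/Ks > 1) per species combination.
positive_by_combo = {
    "Ga-Gr": 2, "Ga-Gb": 3, "Ga-Gh": 2,
    "Gb-Gb": 1, "Gb-Gr": 2, "Gh-Gr": 3,
}

purifying_pairs = purifying_below_half + purifying_half_to_one
positive_pairs = sum(positive_by_combo.values())

# Percentages of all pairs, rounded to two decimals as in the text.
pct_purifying = round(100 * purifying_pairs / total_pairs, 2)
pct_positive = round(100 * positive_pairs / total_pairs, 2)

print(purifying_pairs, pct_purifying)
print(positive_pairs, pct_positive)
```

The per-combination counts of positively selected pairs add up to the 13 pairs reported, and the two groups together account for all 314 pairs.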
Analysis of motif and gene structure of NFYA family
A joint analysis of the phylogenetic tree, gene structure and motifs of the NFYA gene family was carried out to further characterize the NFYA family members and their relationships. The phylogenetic tree of the four cotton species was constructed using MEGAX software. Combined with the motif files obtained from the MEME website, TBtools software was used to display the structure and classification of the NFYA family in the four cotton species (Fig. 4).
Ten motifs were found in the NFYA genes of the four cotton species. According to the phylogenetic tree and motif composition, the NFYA gene families of the four cotton species were divided into three classes: I, II and III. The motifs within each class tend to be consistent and show clear structural features. Class I contains all ten motifs, arranged in the order motif8, motif9, motif5, motif6, motif4, motif2, motif10, motif1, motif3 and motif7. Most class II genes lack motif8, and a few lack motif9; compared with the class I NFYA genes, these genes may therefore have lost certain functions. Class III contains fewer motifs, lacking motif8, motif9 and motif6. In general, NFYA family genes contain motif5, motif4, motif2, motif10, motif1 and motif3, indicating that these motifs largely determine the similarity in function and structure among family members. In terms of gene structure, all genes contain both exons and introns, and the gene structures within class I, class II and class III each show consistent characteristics. In class I, the introns and exons are compact and uniform, and the exons are shorter. In class II, the exons are scattered; those of GhNFYA11, GaNFYA13 and GhNFYA25 are more dispersed and include a longer intron and exon. In class III, the exons are relatively short, with only some genes containing longer exons, but the overall structure is consistent. In conclusion, NFYA family members have distinct characteristics and clear structural differences between classes, while remaining relatively conserved within each class during evolution.
Analysis of differentially expressed genes of NFYA family in G. hirsutum
Members of the NFYA family play important roles in various physiological and biochemical processes in plants [35]. In addition, NFYA is involved in the response to various environmental stimuli [36]. To determine the function of GhNFYA genes in different environments, upland cotton was subjected to low temperature, high temperature, high salt and PEG stress, and the expression levels of GhNFYA genes during growth and development and in response to phytohormones were analyzed (Fig. 5). Cis-acting elements are located in the promoter region of a gene and can serve as a reference for tissue specificity and stress response in different environments. The cis-acting elements of the NFYA gene family mainly include cis-acting regulatory elements involved in the methyl jasmonate (MeJA) response, cis-acting regulatory elements necessary for anaerobic induction, the MYB binding site involved in drought induction, cis-regulatory elements involved in meristem expression, cis-acting elements involved in low temperature response, cis-acting elements involved in defense and stress response, and phytohormone-related regulatory elements (salicylic acid, auxin, gibberellin, abscisic acid, etc.) (Additional file 2: Table S3). The number of cis-acting elements varied among genes. For example, GhNFYA16 contained a cis-acting element for abscisic acid, a MYB binding site involved in drought induction, an anaerobic induction element, a cis-regulatory element involved in meristem expression, and a cis-acting element involved in stress response. In general, the NFYA family of G. hirsutum mainly contains cis-acting elements related to plant hormones, growth and development, and stress. It can be inferred that this gene family is related to stress responses to a certain extent.
Gene expression patterns provide an important reference for gene function analysis and are related to the biological functions controlled by cis-acting elements. To explore the expression patterns of GhNFYA genes in G. hirsutum under different stress environments, gene expression levels in cotton under four abiotic stress conditions (salt, cold, heat and PEG) at 1 h, 3 h, 6 h and 12 h were analyzed [37]. The results showed that the NFYA family genes responded to cold, heat, salt and PEG to different degrees. Expression levels were higher at 12 h of salt treatment, and expression under cold and heat stress also changed over the time course. GhNFYA16 was differentially expressed under both salt and PEG stress, and the expression pattern of each gene differed slightly among the stress treatments. These results further support the participation of GhNFYAs in the stress response of plants. Overall, the NFYA gene family appears to have been shaped and expanded by multiple evolutionary events. Moreover, point mutations in the exon regions and regulatory regions of new family members might affect their function and expression [38, 39].
Tissue specificity of NFYA family genes in G. hirsutum and analysis of differentially expressed genes under salt stress
Tissue-specific expression analysis of the 30 genes in the GhNFYA family showed that their expression differed among tissues (Fig. 6A). The expression levels of GhNFYA5, GhNFYA20, GhNFYA23 and GhNFYA28 were highest in roots. Eleven genes had the highest expression levels in the stem: GhNFYA2, GhNFYA6, GhNFYA7, GhNFYA12, GhNFYA13, GhNFYA14, GhNFYA15, GhNFYA17, GhNFYA19, GhNFYA22 and GhNFYA27. The relative expression of 12 genes (GhNFYA1, GhNFYA8, GhNFYA9, GhNFYA10, GhNFYA11, GhNFYA16, GhNFYA18, GhNFYA21, GhNFYA24, GhNFYA26, GhNFYA29 and GhNFYA30) was highest in leaves, and the expression of GhNFYA4 in roots and stems was higher than that in leaves. The expression levels of GhNFYA3 and GhNFYA25 did not differ among tissues.
The most visible phenotypic changes under abiotic stress occur in leaves. Therefore, the 12 genes most highly expressed in leaves were selected, and their expression values were analyzed at four time points (1, 3, 6 and 12 h) of NaCl stress (Fig. 6B, Additional file 3: Table S4), which provided support for the subsequent virus-induced gene silencing experiments. The expression values of GhNFYA1, GhNFYA18, GhNFYA29 and GhNFYA30 were highest after 12 h of NaCl treatment. The expression of GhNFYA8 decreased significantly after NaCl treatment for 3, 6 and 12 h compared with treatment for 1 h. The expression of GhNFYA16 increased significantly after 6 and 12 h of NaCl treatment. The expression of GhNFYA21 decreased significantly at 3 h compared with 1 h of NaCl treatment and increased again at 6 h and 12 h. The expression of GhNFYA26 decreased significantly after NaCl treatment for 3 h and 6 h and increased significantly after 12 h. Thus, some GhNFYA genes showed significant differential expression after NaCl treatment.
G. hirsutum plants were treated with 100 mM NaCl at the three-leaf, one-heart stage. The cotyledons began to wilt and lose their luster after 6 h of treatment, and the wilting was more severe after 12 h. After 24 h of treatment, some cotyledons fell off, true leaves wilted, leaf edges curled, and new leaves wilted and died (Fig. 6C).
Co-expression network analysis of GhNFYA genes under salt stress
To further understand the role of GhNFYA genes in salt stress, a correlation network of the family members based on Pearson correlation coefficients (PCCs) was analyzed [40]. The expression network under salt stress contained both positive and negative correlations: a total of 142 gene pairs were positively correlated and 137 gene pairs were negatively correlated (Fig. 7). Except for GhNFYA1, GhNFYA11, GhNFYA24 and GhNFYA29, the genes showed complex and highly similar functional relationships. A total of 273 gene pairs that interact with each other during salt stress are involved in stress resilience. In conclusion, the expression network analysis showed that GhNFYA genes were closely related to each other under salt stress.
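How a correlation network of this kind is assembled can be sketched in Python as follows. The gene names and expression values are invented for illustration and are not the study's data. Labeling each pair only by the sign of its Pearson correlation coefficient is also an assumption, since the text does not state which cutoff, if any, was applied.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # fails if either profile is constant


# Hypothetical expression values at 1, 3, 6 and 12 h of salt stress.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

# One edge per gene pair, labeled by the sign of the correlation.
edges = {}
for g1, g2 in combinations(expression, 2):
    r = pearson(expression[g1], expression[g2])
    label = "positive" if r > 0 else "negative" if r < 0 else "none"
    edges[(g1, g2)] = (r, label)

for pair, (r, label) in edges.items():
    print(pair, round(r, 3), label)
```

With only four time points per gene, individual coefficients are noisy, so a real analysis would normally apply a significance or magnitude cutoff before an edge is drawn.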
Subcellular localization of GhNFYA
The Programs website predicts that GhNFYA proteins in G. hirsutum are most likely to be located in the nucleus and cytoplasm. According to the prediction results of the WoLF–PSORT website, GhNFYA proteins are mainly located in the nucleus, cytoplasm, vacuole and chloroplast. For example, GhNFYA16 is predicted by the WoLF–PSORT website to localize in the nucleus and chloroplasts, while the Programs website predicts that it is localized in the nucleus. In general, subcellular localization verification focuses on the nucleus. Most GhNFYA proteins are located in the nucleus, which may be related to their role as transcription factors that combine with NF–YB and NF–YC in the nucleus to form a trimer that regulates downstream target genes.
Based on the differential expression of GhNFYA genes under various stresses and the real-time expression of genes at 4 time points under NaCl treatment, GhNFYA16 was selected for subcellular localization verification. An expression vector for the GhNFYA16-GFP fusion protein was constructed, and the recombinant plasmid containing the expression vector was injected into the epidermis of tobacco leaves. Three days later, the leaves were observed under a confocal microscope. The green fluorescence signal of the fusion protein showed that GhNFYA16 was located in the nucleus (Fig. 8).
Analysis of GhNFYA protein interaction network
GhNFYA16 is an ortholog of Arabidopsis thaliana NF-YA1. In the protein interaction network, Arabidopsis NF-YA1 interacts with NF-YB1, NF-YB2, NF-YB3, NF-YB6, NF-YC1, NF-YC2, NF-YC3, NF-YC4, NF-YC9 and NF-YC12. It is speculated that GhNFYA16 interacts closely with the corresponding proteins in cotton. Many functions were enriched in this protein network, including GO terms such as positive regulation of photomorphogenesis, abscisic acid-activated signaling pathway, positive regulation of nitrogen compound metabolic process and regulation of developmental process. The functional diversification of NF-Y during plant evolution enables plants to respond actively to different abiotic stresses (Fig. 9).
Silencing GhNFYA16 reduced tolerance to salt stress in cotton
Based on the analysis of differentially expressed genes in the GhNFYA family, GhD01G1179.1 (GhNFYA16), a gene highly expressed after 6–12 h of NaCl stress, was selected. To further study the function of GhNFYA16, a VIGS experiment was performed on GhNFYA16 using Gossypium cv H177 as the material. Two weeks later, the plants carrying pYL156:PDS showed an albino phenotype, indicating that VIGS silencing was successful. The silencing efficiency of GhNFYA16 was examined by quantitative real-time PCR (qRT-PCR). The results showed that the expression of GhNFYA16 in pYL156:GhNFYA16 plants was significantly lower than that in the pYL156 control plants. After salt treatment, the conductivity rate and chlorophyll content of the plants were measured. After silencing of GhNFYA16, the conductivity rate increased and the chlorophyll content decreased compared with the control group. Therefore, it can be inferred that GhNFYA16 is involved in the adaptation of cotton to salt stress (Fig. 10).