    Unbalanced positive and negative test sets pose a challenge in the study of biological information. In most cases, the number of positive samples is far smaller than the number of negative samples.

    In a few cases, the number of positive samples may be much larger than the number of negative samples. When the negative samples greatly outnumber the positive samples, ACC ≈ SP. In this case, the accuracy reflects only the classification of the negative samples and cannot accurately express the predictive performance of the classifier on the entire test data set.

    To solve this problem, researchers typically use the geometric mean (Gm) of sensitivity and specificity, together with the Matthews correlation coefficient (MCC). Current studies on miRNA commonly use one or more of these evaluation indices. To select a better feature set for classification, we needed to determine the effect of different feature subsets on the performance of the classifier.
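    As a minimal sketch of these indices, the Python function below computes sensitivity (SE), specificity (SP), ACC, Gm and MCC from confusion-matrix counts. The forms Gm = sqrt(SE × SP) and the standard MCC definition are assumed here, since the equations are not preserved in the text, and the counts in the example are hypothetical.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return SE, SP, ACC, Gm and MCC from confusion-matrix counts."""
    se = tp / (tp + fn)                      # sensitivity (recall of positives)
    sp = tn / (tn + fp)                      # specificity (recall of negatives)
    acc = (tp + tn) / (tp + fp + tn + fn)    # overall accuracy
    gm = math.sqrt(se * sp)                  # assumed form: geometric mean of SE and SP
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "Gm": gm, "MCC": mcc}


# Hypothetical unbalanced test set: 10 positive and 990 negative samples.
metrics = classification_metrics(tp=2, fp=10, tn=980, fn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    With these hypothetical counts, ACC is 0.982 and close to SP, although SE is only 0.2; Gm and MCC are far lower and expose the weak performance on the positive samples.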

    To do this, we used the BP neural network method with the same training set (positive and negative samples) to test different feature sets; the results are shown in Table 4. Table 4 reports the accuracy achieved with the entire feature set. This result indicates that our feature set is more effective for processing more complex structures or greater sequence diversity.

    Considering that the feature sets used here are not very large and each feature subset is highly independent, reducing the dimension of the feature vector is no longer needed.

    The selection of the number of folds, k, is important because k determines not only the number of samples but also the computational complexity. Usually, a value of k between 5 and 10 is selected based on experience. Statistical performance shows little improvement for larger values of k, and computational complexity must also be considered; thus a value between 5 and 10 is best [32]. We divided the samples into two cases for training and testing.
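    The sketch below illustrates how such a parameter drives both the size of each test subset and the number of training runs. It assumes that the value discussed above is the number of folds k in k-fold cross-validation; the function name and the sample count are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # k nearly equal folds
    for i in range(k):
        test_idx = folds[i]
        # Training set: every fold except the one held out for testing.
        train_idx = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train_idx, test_idx


# Hypothetical data set of 100 samples split into k = 5 folds.
splits = list(k_fold_indices(n_samples=100, k=5))
for run, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"run {run}: {len(train_idx)} training, {len(test_idx)} test samples")
```

    A larger k leaves fewer samples in each test fold and requires more training runs, which is the trade-off between statistical performance and computational cost described above.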

    In the first case, the numbers of positive and negative samples differed greatly; in the second case, the numbers of positive and negative samples were equal. Training and testing were repeated five times. The testing performance is shown in Figures 3 and 4. Comparison of the data in Figures 3 and 4 shows no significant difference between the actual output and the expected output of each test.

    The evaluation indices described above are shown in Table 5. The data in Table 5 show that the number of samples affects the accuracy and recall rate of the positive samples. In particular, the precision and recall rate of the negative samples decreased as the number of negative samples in the training set decreased. This result indicates that the more samples are used in training, the better the classifier performs.

    At the same time, the precision and recall rate of the positive samples were affected. When the number of negative samples in the training set increased, the number of correct predictions increased by four and the number of erroneous predictions decreased by eight. This result shows that the precision and recall rate of the positive samples decreased as the number of negative samples increased.

    The performance of our method was compared with other methods; Table 6 and Figure 5 show the total prediction accuracy of our method and the overall performance of the models as measured by MCC. To demonstrate the validity and the universal applicability of the BP method, we analyzed six other species: Anolis carolinensis, Arabidopsis thaliana, Drosophila melanogaster, Drosophila pseudoobscura, Epstein-Barr virus, and Xenopus tropicalis.

    Identification of miRNAs is the first step toward understanding their biological characteristics. Many approaches have been proposed to predict pre-miRNAs in recent years. However, feature extraction in these methods can result in information redundancy.

    To overcome this drawback, a BP neural network algorithm together with optimal 98D features was employed for this analysis. The results demonstrate the total prediction accuracy of our method (Table 6). After the identification step, functional analysis is also important for miRNA research. For human miRNAs and diseases, two main approaches are used to predict the relationship.

    The first is statistical comparison analysis of miRNA or isomiR expression [41]. The second is network analysis and prediction of miRNA-disease relationships [42-45]. Functional analysis of the newly detected miRNAs will be our future work. The work was supported by the Natural Science Foundation of China.

    Triple Structure Sequence: In addition to the high specificity of the primary sequence features, the secondary structure sequence of pre-miRNA is also a contributing factor.

    Structural Diversity Characteristics: The potential for nucleotide pairing in the sequence is a significant characteristic that can also be used to describe the pre-miRNA sequence.

    Fixing the Number of Nodes in the Hidden Layer: In general, selecting the number of nodes in the hidden layer when changing the BP neural network structure is difficult.

    Corresponding training results with different numbers of nodes in the hidden layers.

    They can be successfully utilized to detect and track the inheritance mechanisms of polymorphic traits that contribute to genetic diversity (Khatkar et al.).

    Molecular markers, which enable the detection of genetic variants at the DNA sequence level, are free of the limitations typical of morphological, chromosomal and protein markers. They also have unique properties that make them more useful than other markers.

    Moreover, molecular markers are not influenced by the environment and usually have no pleiotropic effect on quantitative trait loci (QTL) (Teneva). To date, numerous techniques for studying DNA variation at the molecular level are known.

    Restriction fragment length polymorphism RFLP is associated with the occurrence of differences in nucleotide sequences in the gene. It is a result of point mutations, which may manifest themselves phenotypically.

    These differences are revealed by restriction enzymes, each of which recognizes a specific nucleotide sequence. The mutations create new sites that are recognized by restriction enzymes, which cut the DNA into fragments of various lengths. The technique consists of amplifying specific parts of the genome and digesting the amplicon with one or more restriction enzymes. The resulting DNA fragments are separated on an agarose gel, where they migrate at different rates depending on their size.
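    The digestion step can be illustrated with a short Python sketch. It assumes the enzyme EcoRI (recognition sequence GAATTC, cut after the first G) purely as an example; the amplicon sequences are hypothetical and differ by a single point mutation that creates a second recognition site.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`; return fragment lengths in order."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)          # position of the cut within the site
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"   # EcoRI recognition sequence (example enzyme)
ECORI_OFFSET = 1        # EcoRI cuts between G and AATTC

# Hypothetical 39 bp amplicon with one EcoRI site and one near-site (GAGTTC).
allele_a = "ATGCCGTA" + "GAATTC" + "GGCTTAACGGTACC" + "GAGTTC" + "TTAGC"
# Point mutation G -> A at index 30 turns GAGTTC into a second GAATTC site.
allele_b = allele_a[:30] + "A" + allele_a[31:]

fragments_a = digest(allele_a, ECORI_SITE, ECORI_OFFSET)
fragments_b = digest(allele_b, ECORI_SITE, ECORI_OFFSET)
print(sorted(fragments_a))   # [9, 30]
print(sorted(fragments_b))   # [9, 10, 20]
```

    The two alleles give different sets of fragment lengths, which is what appears as different band patterns on the gel.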

    Smaller fragments tend to move faster than larger ones (Beuzen et al.).

    The system enables detection of single nucleotide polymorphisms within the examined DNA sequences. It is based on the amplification of specific parts of the genome by PCR and sequencing of the resulting product.

    A comparison between electrophoresis images of the amplification products is conducted, which makes it possible to determine whether a mutation has occurred in a given region.

    Moreover, these markers are present in both coding and non-coding parts of the genome (Stoneking). SNP polymorphism is usually associated with the presence of only two alleles in the gene pool of the population (Beuzen et al.).

    A great advantage of this polymorphism is its universality across the genomes of different species and the highly efficient identification of polymorphism within the tested sequence; its disadvantage is the high cost of the analysis.

    Genetic diversity may also be determined using microsatellite sequences. These repeats consist of sequences of several nucleotides, also referred to as motifs. They occur mainly in non-coding regions of genes, but they can also be identified in flanking sequences or, more rarely, in coding sequences. Moreover, they are characterized by uniform dispersion at 6 to 10 bp (Li et al.). The function of microsatellites is not yet fully understood (Li et al.).
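    As a rough illustration of what a repeated motif looks like in sequence data, the sketch below scans a DNA string for tandem repeats with a regular expression. The motif length range (1 to 6 nucleotides), the minimum of 4 repeats and the example sequence are assumptions made for this example, not values taken from the text.

```python
import re


def find_microsatellites(sequence, min_unit=1, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for each tandem repeat found."""
    # Shortest motif first (lazy quantifier), then the same motif repeated.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Hypothetical sequence containing a (CA)5 dinucleotide repeat.
example = "ACGT" + "CA" * 5 + "TTG"
hits = find_microsatellites(example)
print(hits)   # [(4, 'CA', 5)]
```

    Counting the repeats of the same motif at one locus in different individuals is what makes these sequences usable as polymorphic markers.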

    Features of microsatellites such as a high level of polymorphism, high frequency of occurrence, ease of identification and uniform distribution across the genome have contributed to their common usage. They are used to estimate the genetic variability of animals, in parentage (origin) control, to characterize the structure and degree of inbreeding of a population, and to identify quantitative trait loci (QTL).

    Knowledge of genome organization and polymorphism is increasingly used in evaluating the breeding value of animals, owing to wide and easy access to many molecular techniques. Moreover, many mutations directly affecting the phenotype have been recognized. In addition, thousands of anonymous genetic markers, because of their potential linkage with novel mutations of large effect, may be used to estimate breeding values and for marker-assisted selection (MAS).

    The major milk proteins include the casein fractions, which are polymorphic in most species. Polymorphism of milk proteins has been widely explored in cattle. The casein genes are closely linked and form a cluster (Bai et al.). It consists of one major and one minor component. Both of these proteins are composed of a single polypeptide chain with the same amino acid sequence. For European cattle breeds, allele B is the most common; it exceeds the frequency of 0.

    Allele B encodes glutamic acid at the variable position of the polypeptide chain, whereas allele C encodes glycine. Variant A occurs sporadically. Alleles C, D and E arose by mutation of allele B. Table 1 shows the frequencies of αs1-casein alleles in various breeds of cattle. Allele A, which arose by mutation of allele D, is still present in most European breeds. Alleles B and C are specific to zebu and yaks, respectively (Ibeagha-Awemu et al.). At position 67 of the amino acid chain, variant A1 contains histidine and variant A2 contains proline.

    Variant A2, however, is the original form and is found in old breeds of cattle (Zebu, Guernsey), whereas variant A1 evolved much later and is characteristic of contemporary breeds (Hanusova et al.).

    Variant B is less common, and A3 and C are rare (Farrell et al.). These alleles are most common in European breeds of cattle.

    Allele E has been identified only in the Italian Piemontese breed. The 6 minor components are detected by PAGE in urea with 2-mercaptoethanol. It is highly homologous to the fibrinogen gamma chain. Moreover, it serves a similar function, acting as a stabilizing factor during clot formation (Azevedo et al.).

    The differences between them are caused by two point mutations involving substitutions of threonine with isoleucine and of aspartic acid with alanine in the polypeptide chain (Azevedo et al.). It consists of 7 exons and its length is approximately 6 bp. Its polymorphism was first discovered by Aschaffenburg and Drewry (as cited in El-Hanafy et al.).

    For most cow breeds, variants A and B are the most common and occur with high frequency (Heidari et al.). Mutations in the nucleotide sequence that result in amino acid substitutions are distributed over 3 exons. The differences between variants A and B are due to different amino acids at particular positions.

    The chicken, side-necked turtle and gecko converted benzoic acid mainly into ornithuric acid, but all three species also excreted smaller amounts of hippuric acid.

    Biochem J.

    Canadian Journal of Botany

    Vitalone A, Guizzetti M, et al. Extracts of various species of Epilobium inhibit proliferation of human prostate cells. J Pharm Pharmacol. May;55(5).


    1. The urinary excretion of orally administered [14C]benzoic acid in man and 20 other species of animal was examined.

    2. At a dose of 50 mg/kg, benzoic acid was ...


    The method also used miRNA sequences and structural features from multiple species. It showed that miRNA genes could be detected.

