diff --git a/README.md b/README.md
index ba62061..6472371 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,19 @@ FScanpy is a comprehensive Python package designed for the prediction of [Progra
 For detailed documentation and usage examples, please refer to our [tutorial](tutorial/tutorial.md).
+## Core Features
+- **Sequence Feature Extraction**: Extracts features from nucleic acid sequences, including base composition, k-mer features, and positional features.
+- **Frameshift Hotspot Region Prediction**: Predicts potential PRF sites in nucleotide sequences using machine learning models.
+- **Feature Extraction**: Extracts relevant features from sequences to assist in prediction.
+- **Cross-Species Support**: Built-in databases for viruses, marine phages, Euplotes, etc., enable PRF prediction across various species.
+
+## Main Advantages
+- **High Accuracy**: Integrates multiple machine learning models to provide accurate PRF site predictions.
+- **Efficiency**: Uses a sliding window approach and feature extraction techniques to scan sequences rapidly.
+- **Versatility**: Supports PRF prediction across various species and can be combined with the [FScanR](https://github.com/seanchen607/FScanR.git) framework for enhanced accuracy.
+- **User-Friendly**: Comes with detailed documentation and usage examples, making it easy for researchers to use.
+- **Flexible**: Offers different resolutions to suit different use cases.
+
 ## Installation Requirements
 - Python ≥ 3.7
 - Dependencies are automatically handled during installation
diff --git a/tutorial/tutorial.md b/tutorial/tutorial.md
index 5fe4e63..4575607 100644
--- a/tutorial/tutorial.md
+++ b/tutorial/tutorial.md
@@ -1,5 +1,5 @@
 ## Abstract
-FScanpy is a Python package designed to predict Programmed Ribosomal Frameshifting (PRF) sites in DNA sequences. It integrates advanced machine learning models, including Gradient Boosting and BiLSTM-CNN, to provide accurate predictions.
This tool is essential for understanding gene expression regulation in various organisms, including eukaryotes and viruses, and offers a robust solution for PRF prediction challenges.
+FScanpy is a Python package designed to predict Programmed Ribosomal Frameshifting (PRF) sites in DNA sequences. It integrates machine learning models, sequence feature analysis, and visualization capabilities to help researchers rapidly locate potential PRF sites.
 ## Introduction
 ![FScanpy structure](/image/structure.jpeg)
@@ -9,7 +9,7 @@ FScanpy is a Python package dedicated to predicting Programmed Ribosomal Framesh
 ![Machine learning models](/image/ML.png)
 For the prediction of the entire sequence, FScanpy adopts a sliding window approach to scan the entire sequence and predict the PRF sites. For regional prediction, it is based on the 33-bp and 399-bp sequences in the 0 reading frame around the suspected frameshift site. Initially, the Gradient Boosting model will predict the potential PRF sites within the scanning window. If the predicted probability exceeds the threshold, the BiLSTM-CNN model will predict the PRF sites in the 399-bp sequence. Then, VotingClassifier will combine the two models to make the final prediction.
-For PRF detection from BLASTX output, FScanpy identifies potential PRF sites from BLASTX alignment results, acquires the two hits of the same query sequence, and then utilizes frameDist_cutoff, mismatch_cutoff, and evalue_cutoff to filter the hits. Finally, it employs [FScanR](https://github.com/seanchen607/FScanR.git) to identify the PRF sites.
+For PRF detection from BLASTX output, [FScanR](https://github.com/seanchen607/FScanR.git) identifies potential PRF sites from BLASTX alignment results, takes the two hits for the same query sequence, and then uses frameDist_cutoff, mismatch_cutoff, and evalue_cutoff to filter the hits. Finally, FScanpy predicts the probability of each PRF site.
 ### Background
 [Ribosomal frameshifting](https://en.wikipedia.org/wiki/Ribosomal_frameshift), also known as translational frameshifting or translational recoding, is a biological phenomenon that occurs during translation that results in the production of multiple, unique proteins from a single mRNA. The process can be programmed by the nucleotide sequence of the mRNA and is sometimes affected by the secondary, 3-dimensional mRNA structure. It has been described mainly in viruses (especially retroviruses), retrotransposons and bacterial insertion elements, and also in some cellular genes.
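
The sliding-window cascade described in the tutorial Introduction (score a 33-bp window first, score the 399-bp window only when that probability exceeds the threshold, then combine the two scores) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not FScanpy's actual API: `centered_window`, `scan_sequence`, and the two toy scorers are hypothetical names, the step of 3 (to stay in reading frame 0), the `N` padding at sequence ends, and the default threshold of 0.5 are assumptions, and a weighted average stands in for the VotingClassifier step.

```python
from typing import Callable, List, Tuple

SHORT_LEN = 33   # window lengths named in the tutorial text
LONG_LEN = 399


def centered_window(seq: str, center: int, length: int) -> str:
    """Return `length` bases centered on `center`, padded with 'N' at the ends.

    The 'N' padding is an assumption made for this sketch.
    """
    half = length // 2
    start = center - half
    end = start + length
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(seq))
    return left_pad + seq[max(0, start):min(len(seq), end)] + right_pad


def scan_sequence(
    seq: str,
    short_model: Callable[[str], float],
    long_model: Callable[[str], float],
    threshold: float = 0.5,
    step: int = 3,
    weights: Tuple[float, float] = (0.5, 0.5),
) -> List[Tuple[int, float, float, float]]:
    """Two-stage scan: cheap short-window score first, long-window score on demand."""
    w_short, w_long = weights
    hits = []
    for pos in range(0, len(seq), step):
        p_short = short_model(centered_window(seq, pos, SHORT_LEN))
        if p_short <= threshold:
            continue  # the second stage runs only above the threshold
        p_long = long_model(centered_window(seq, pos, LONG_LEN))
        # Weighted average as a stand-in for the voting step (assumption).
        combined = (w_short * p_short + w_long * p_long) / (w_short + w_long)
        hits.append((pos, p_short, p_long, combined))
    return hits


def toy_short_model(window: str) -> float:
    """Toy stand-in for the first-stage model: flags one slippery-site motif."""
    return 1.0 if "TTTAAAC" in window else 0.0


def toy_long_model(window: str) -> float:
    """Toy stand-in for the second-stage model: GC fraction of unpadded bases."""
    bases = [b for b in window if b != "N"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


if __name__ == "__main__":
    demo = "ACG" * 10 + "TTTAAAC" + "GCG" * 10
    for pos, p_s, p_l, p in scan_sequence(demo, toy_short_model, toy_long_model):
        print(pos, round(p_s, 2), round(p_l, 2), round(p, 2))
```

Passing the two scorers in as callables keeps the scan loop independent of any particular model, so trained models could replace the toy functions without changing the control flow.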