AAIndexLoc: Predicting Protein Subcellular Localization Using Amino Acid Index

AAIndexLoc is a machine-learning-based algorithm that uses amino acid indices to predict the subcellular localization of a protein from its sequence. The physico-chemical properties of amino acids are believed to play an important role in determining where a protein localizes. AAIndexLoc uses both local and global information from the protein sequence. The sequence is divided into three parts: N-terminal, middle and C-terminal. Local information is encoded from the N-terminal and middle parts, whereas global information is encoded from the full-length protein. In both cases, the sequence is encoded using amino acid composition, weighted amino acid composition, five-level grouping composition and five-level dipeptide composition. The system has been trained on the animal, fungal and plant datasets from MultiLoc.
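
The encoding described above can be illustrated with a minimal Python sketch. It is not the AAIndexLoc implementation: the segment lengths (N_TERM_LEN, C_TERM_LEN), the function names, the choice of the Kyte-Doolittle hydropathy scale as the amino acid index, and the definition of "weighted" composition as frequency multiplied by the index value are all assumptions made for illustration. The five-level grouping and dipeptide encodings are omitted because the text does not define the levels.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical segment lengths; the page does not state the real ones.
N_TERM_LEN = 10
C_TERM_LEN = 10

# Kyte-Doolittle hydropathy scale, used here only as an example index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def split_sequence(seq, n_len=N_TERM_LEN, c_len=C_TERM_LEN):
    """Split a sequence into (N-terminal, middle, C-terminal) parts."""
    if len(seq) <= n_len + c_len:
        raise ValueError("sequence too short to split into three parts")
    return seq[:n_len], seq[n_len:len(seq) - c_len], seq[len(seq) - c_len:]


def amino_acid_composition(seq):
    """Return the frequency of each of the 20 standard amino acids."""
    counts = Counter(seq)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def weighted_composition(seq, index):
    """Assumed definition: frequency of each residue times its index value."""
    comp = amino_acid_composition(seq)
    return [freq * index[aa] for aa, freq in zip(AMINO_ACIDS, comp)]


def encode_protein(seq, n_len=N_TERM_LEN, c_len=C_TERM_LEN):
    """Concatenate local (N-terminal, middle) and global (full-length) features."""
    n_term, middle, _c_term = split_sequence(seq, n_len, c_len)
    features = []
    for region in (n_term, middle, seq):
        features.extend(amino_acid_composition(region))
        features.extend(weighted_composition(region, KYTE_DOOLITTLE))
    return features
```

With these assumptions, encode_protein returns 120 values per protein: 20 composition and 20 weighted-composition features for each of the N-terminal part, the middle part and the full-length sequence. A classifier would then be trained on such vectors.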

The user can predict the localization of unknown protein sequence(s) by pasting them in FASTA format into the text area below. Click here for an example! Predicting a large number of sequences takes longer to process, so the user can optionally provide an email address in the text box. If an email address is provided, the results will be sent to it when the prediction is finished. Otherwise, the results will be displayed on the web page when the prediction is finished.
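
Before submitting, it may help to check that the input is well-formed FASTA. The following is a minimal sketch of such a check, assuming only the basic FASTA convention of a ">" header line followed by one or more sequence lines. The function name parse_fasta is hypothetical and is not part of AAIndexLoc.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Each returned tuple holds the header without the leading ">" and the sequence joined into a single upper-case string, so multi-line sequences are handled.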

Manuscript in preparation

Kuo-Bin Li

Prediction in:
Email address:

Enter your protein sequence here (FASTA format):