Sequence Query

Introduction

The sequence query, in which one or more peptide molecular masses are combined with sequence, composition and fragment ion data, is potentially the most powerful search of all. The usual source of sequence information is a partial interpretation of an MS/MS spectrum. While it is very difficult to determine a complete and unambiguous peptide sequence from an MS/MS spectrum, it is almost always possible to find a series of peaks providing 3 or 4 residues of sequence data.

This general approach was pioneered by Mann and co-workers at EMBL, who used the term "sequence tag" for the combination of a few residues of sequence data with molecular weight information [Mann, 1994]. They defined a sequence tag derived from an MS/MS spectrum as the mass of the precursor peptide, the mass of the first peak of the identified sequence ladder, some partial sequence, and the mass of the final peak of the ladder.

The sequence query mode of Mascot differs from a sequence tag search in that fragment ion mass data and sequence data can be supplied in any combination. Sequence data are very specific, making it possible to use very wide tolerances on the mass data. For example, with a mass tolerance of ±25%, the following two queries:

    1853 seq(n-tcp) seq(dst)
    1933 seq(n-tcp) seq(dst)

will give the same results, even though the molecular weights of the peptides are very different, because the sequence data alone provide sufficient discrimination.

Syntax

Each line entered into the query window must consist of one experimental peptide mass value, optionally followed by qualifiers for that peptide:

    M seq(…) comp(…) ions(…)

M is an experimental peptide mass value, seq(…) is AA sequence information, comp(…) is AA composition information, and ions(…) contains MS/MS fragment mass and (optionally) intensity values.
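
As an illustration of the line syntax, the following minimal Python sketch splits one query line into its mass value and its qualifiers. The helper name `parse_query_line` and the regular expression are hypothetical and are not part of Mascot; the sketch assumes that qualifier bodies never contain a closing parenthesis.

```python
import re

# Hypothetical helper, not part of Mascot: matches name(body) groups.
# Assumes the body itself contains no ')' character.
QUALIFIER = re.compile(r"(\w+)\(([^)]*)\)")

def parse_query_line(line):
    """Return (mass, [(qualifier_name, body), ...]) for one query line."""
    # The mass value is the first whitespace-delimited token on the line.
    mass_text, _, rest = line.strip().partition(" ")
    mass = float(mass_text)
    qualifiers = [(name.lower(), body) for name, body in QUALIFIER.findall(rest)]
    return mass, qualifiers

mass, qualifiers = parse_query_line("1853 seq(n-tcp) seq(dst)")
print(mass)        # 1853.0
print(qualifiers)  # [('seq', 'n-tcp'), ('seq', 'dst')]
```

A line that carries only a mass value yields an empty qualifier list.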
A line may contain zero, one, or many qualifiers, but only one qualifier can be of type comp(…).

N.B. Only include qualifiers for information which is known with a high degree of confidence. If a seq(…) or comp(…) qualifier condition is not met, then the associated mass value will be excluded from consideration, so cannot contribute towards the final score.

Sequence information

The sequence information should be given in standard one letter code. It should be preceded by a prefix, as outlined in the table below, to indicate what type of sequence it is. If no prefix is specified, the default is b-.

| Prefix | Meaning | Example |
|---|---|---|
| b- | N->C sequence | seq(b-DEFG) |
| y- | C->N sequence | seq(y-GFED) |
| *- | Orientation unknown | seq(*-DEFG) |
| n- | N terminal sequence | seq(n-ACDE) |
| c- | C terminal sequence | seq(c-FGHI) |
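
The prefix meanings in the table can be sketched in Python as plain string matching. This is a simplified reading of the table, not Mascot's implementation: the function name `seq_matches` is hypothetical, and only literal one letter codes are handled.

```python
def seq_matches(qualifier, peptide):
    """Check a seq() body such as 'y-GFED' against a peptide (simplified)."""
    prefix, sep, tag = qualifier.partition("-")
    if not sep:
        # No prefix given: the default is b-.
        prefix, tag = "b", qualifier
    prefix, tag, peptide = prefix.lower(), tag.upper(), peptide.upper()
    if prefix == "b":   # written N->C
        return tag in peptide
    if prefix == "y":   # written C->N, so reverse before matching
        return tag[::-1] in peptide
    if prefix == "*":   # orientation unknown: try both directions
        return tag in peptide or tag[::-1] in peptide
    if prefix == "n":   # N terminal sequence
        return peptide.startswith(tag)
    if prefix == "c":   # C terminal sequence
        return peptide.endswith(tag)
    raise ValueError(f"unknown prefix: {prefix!r}")

examples = ["b-DEFG", "y-GFED", "*-DEFG", "n-ACDE", "c-FGHI"]
print(all(seq_matches(q, "ACDEFGHI") for q in examples))  # True
```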

All of these examples match a peptide with the sequence ACDEFGHI. Note that *-DEFG will search for both DEFG and GFED. Note also that y-GFED is written C-term to N-term, whereas c-FGHI is written N-term to C-term.

Both lower and upper case characters may be used for amino acids. An unknown amino acid may be indicated by an 'X'. More than one amino acid may be specified for a position by putting them between square brackets. A line may contain several sequence information qualifiers. For example, the following query will match a peptide with the sequence ACDEFGHI:

    1234 seq(n-AC[DHK]) seq(c-HI) seq(*-GF)

Composition information

Composition should consist of a number, followed by the corresponding amino acid between square brackets. An asterisk means "one or more". For example, comp(2[H]0[M]3[DE]*[K]) indicates a peptide which contains 2 histidines, no methionines, 3 acidic residues (glutamic or aspartic acid) and at least 1 lysine. Note that 'X' is not meaningful and so is not allowed in a composition query.

Ions information

Mass and (optionally) intensity values from one or more ion series in the MS/MS spectrum of a peptide can be specified in an ions qualifier. Each ions qualifier can include a prefix to indicate what type of ion series the m/z values belong to.

| Prefix | Meaning | Example |
|---|---|---|
| b- | b series ions | ions(b-m1:i1,m2:i2, …,mn:in) |
| y- | y series ions | ions(y-m1,m2, …,mn) |
| (none) | unassigned | ions(m1:i1,m2:i2, …,mn:in) |
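
The three forms in the table can be read with a short Python sketch like the one below. The function name `parse_ions` is hypothetical, `None` is used here to stand for an unassigned series or a missing intensity, and the intensity values in the second call are made up for illustration.

```python
def parse_ions(body):
    """Parse the text inside ions(...) into (series, [(mass, intensity), ...])."""
    series = None  # None stands for "unassigned"
    if body[:2].lower() in ("b-", "y-"):
        series, body = body[0].lower(), body[2:]
    peaks = []
    for item in body.split(","):
        # Each item is either "mass" or "mass:intensity".
        mass_text, _, intensity_text = item.strip().partition(":")
        intensity = float(intensity_text) if intensity_text else None
        peaks.append((float(mass_text), intensity))
    return series, peaks

print(parse_ions("b-610,707,804,1086"))
# ('b', [(610.0, None), (707.0, None), (804.0, None), (1086.0, None)])
print(parse_ions("2106:15,2632:40"))  # hypothetical intensities
# (None, [(2106.0, 15.0), (2632.0, 40.0)])
```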

The inclusion of intensity values, separated from mass values by colons, is optional. If intensity values are not included, then the colons must also be omitted, as in the y series example. Mascot uses the intensity information to iteratively select sub-sets of the most intense peaks in order to optimise scoring discrimination. Mass values do not need to be in order, or represent contiguous sequence ion ladders. A line may contain several ions information qualifiers, for example:

    1454.4 ions(b-610,707,804,1086) ions(y-2909) ions(2106,2632,2545)

Other Qualifiers

peptol(tolerance,unit) may be used to specify a mass tolerance for an individual query, overriding the search form default. For example, peptol(10,%) or peptol(2,Da).

If you re-search a Sequence Query from the results page, you may notice two additional qualifiers which are used internally by Mascot:

from(mass,charge) is used to track the original mass and charge state of the peptide, after it has been converted to a neutral, Mr value. For example, if the peptide charge state was specified to be 1+, the query 1234.5 would become

    1233.492 from(1234.5,1+)

title(encoded title text) can be used to associate a text string with an individual query. If the text contains non-alphanumeric characters, these must be URL-encoded by conversion to %nn, where nn is the hexadecimal ASCII code for the character. For example, Sample(1) becomes Sample%281%29.

Example

Choose Sequence query mode, and accept all the default settings. Paste the following search into the query window and submit the search.

All of the top scoring results correspond to the same peptide, found in a number of homologous heat shock proteins in the OWL database:

1. HSP53001 Mass: 27048 Score: 57.00
HSP53001 NID: g506432 - human.

| Observed | Mr(expt) | Mr(calc) | Delta | Start | End | Miss | Ions | Peptide |
|---|---|---|---|---|---|---|---|---|
| 1933.00 | 1933.00 | 1852.91 | 80.09 | 140 | 156 | 0 | ---- | TCPVQLWVDSTPPPGTR |

The remarkable feature of these results is the "delta" between the experimental mass and the calculated mass: 80 Da. The specificity of the sequence qualifiers took priority over the specificity of the molecular weight. This is the intended behaviour and, in this case, the delta of 80 Da may indicate a phosphorylation site.
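
The arithmetic behind this result can be checked with a small Python sketch. It rests on two assumptions that this page does not confirm: the ±25% tolerance quoted in the introduction is applied, and the percentage is taken relative to the experimental mass. The phosphorylation shift of about 79.966 Da (HPO3, monoisotopic) is a standard value rather than a figure from this page, and the function name is hypothetical.

```python
def within_percent_tolerance(experimental, calculated, percent):
    """True if calculated lies within +/- percent of the experimental mass.

    Assumption: the percentage is relative to the experimental mass.
    """
    return abs(experimental - calculated) <= experimental * percent / 100.0

MR_CALC = 1852.91        # calculated Mr from the result above
PHOSPHO_SHIFT = 79.966   # HPO3, monoisotopic; standard value, not from this page

for mr_expt in (1853.0, 1933.0):
    delta = round(mr_expt - MR_CALC, 2)
    ok = within_percent_tolerance(mr_expt, MR_CALC, 25)
    print(mr_expt, delta, ok)

# How far the 80.09 Da delta lies from the nominal phosphorylation shift.
print(round((1933.0 - MR_CALC) - PHOSPHO_SHIFT, 3))
```

Under these assumptions both experimental masses fall inside the tolerance window, which is consistent with the two queries in the introduction giving the same results.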

Copyright © 2000 Matrix Science Ltd. All Rights Reserved.