The Bio Professional Package offers unique features designed specifically for expert IP and R&D users. Its key features are:
- Degenerate Sequence Searching
- Chemical Modification Searching
- 50,000 search results: with the Bio Professional Package, users can capture a substantial number of hits, facilitating thorough prior art and comparative assessments.
The Bio Professional Package offers several benefits. First, it reduces the risk of Freedom to Operate (FTO) issues by identifying relevant patents that might otherwise be missed. Second, researchers can identify all pertinent sequences that incorporate degenerate or ambiguous codes, ensuring comprehensive coverage of their area of interest; this is particularly useful when identifying degenerate primers or other important sequences. Third, the package provides reliable alignments that account for the possibilities arising from degenerate or ambiguous codes, allowing accurate identification of conserved regions, functional motifs, and other important elements within those sequences. In summary, Bio Professional improves search efficiency and helps users discover sequences that might otherwise go unnoticed.
Degenerate Search
A degenerate sequence is one in which some positions allow more than one possible residue. For example, an X at a given position may stand for several alternative amino acids (such as V, I, L, or M) or for any amino acid at all.
The new degenerate search feature enables users to conduct searches using a specific sequence while retrieving results that contain ambiguous codes.
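To illustrate what a degenerate code means in practice, the following Python sketch checks whether a plain nucleotide query is one of the sequences that a degenerate sequence can stand for. It relies on the standard IUPAC nucleotide ambiguity codes. The function name `matches_degenerate` and the sample sequences are hypothetical, and this is not the algorithm used by the degenerate search feature, which is not described in this article.

```python
# Standard IUPAC nucleotide ambiguity codes: each code maps to the bases it allows.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_degenerate(query: str, degenerate: str) -> bool:
    """Return True if every query base is allowed by the code at the same position.

    Hypothetical helper for illustration only: no gaps, equal lengths required.
    """
    if len(query) != len(degenerate):
        return False
    return all(
        q in IUPAC_DNA.get(d, "")  # unknown codes allow nothing
        for q, d in zip(query.upper(), degenerate.upper())
    )


# Made-up sequences: N allows any base, R allows only A or G.
print(matches_degenerate("ATGCA", "ATGNR"))  # True
print(matches_degenerate("ATGCT", "ATGNR"))  # False
```

A search that understands these codes can treat the first pair as a full match, whereas a comparison that only tests character equality would count the last two positions as mismatches.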
While BLAST does allow for searching sequences with degenerate codes or wildcards, there are certain drawbacks associated with using the conventional search method. These drawbacks include:
- Difficulty in finding target sequences: The BLAST algorithm assigns a negative score to degenerate codes during the search process. As a result, sequences containing multiple instances of ambiguous codes, such as "X," may not be retrieved, leading to challenges in locating the desired target sequence.
- Inability to obtain accurate sequence similarity: BLAST considers degenerate codes as mismatches, which can result in an underestimation of sequence similarity. Even if there are 100% matches between sequences, the reported similarity may be significantly lower than expected due to this treatment of degenerate codes.
- Vulnerability to filtering based on sequence identity parameters: Once hits have been scored and sorted, a target sequence with an underestimated identity becomes susceptible to being filtered out by sequence identity parameters or other criteria. This can exclude relevant sequences that meet the search criteria but fall below the specific filtering thresholds.
As a result, users can miss key sequence information when performing FTO or novelty searches, leading to lost opportunities and unnecessary risk. These limitations highlight the need for alternative approaches or specialized tools that can address the challenges associated with degenerate codes and provide more accurate and comprehensive search results.
Solution
The innovative 'Degenerate' search incorporates advanced data processing techniques and exclusive search algorithms that empower users to effectively search for and identify sequences with variability.
Additionally, the results view has undergone optimization to facilitate the identification of degenerate sequences. A user-friendly table view aligns the general degenerate characters (e.g., X) with their corresponding actual residues (e.g., A, P, etc.) at the respective positions, ensuring easy interpretation and analysis.
The database supporting this search type comprises 45 million degenerate sequences, curated from patent sequence listing files. Together these represent over a trillion possible sequences, enabling comprehensive Freedom to Operate (FTO) or novelty searches.
Limitation with V1
When n or X represents any replacement and such positions make up more than 50% of a hit, the degenerate search cannot retrieve that hit. To illustrate this, consider the example of EP2603596B1 (SEQ ID. 127):
Search Query: IPAAAAGAYLARAEQQQQH
Expected Result: XPXXXXGXYXXRXXXXXXX
The sequence is 19 amino acids long. Four of the X positions carry replacement information (for example, X = V, I, L, or M). The other 11 X positions allow substitution by any amino acid, that is, any of [A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y].
Consequently, out of the 19 amino acids, 11 are represented by degenerate codes indicating any replacement, accounting for 58% of the total sequence length. Since this proportion exceeds the 50% threshold, the degenerate search version 1.0 will not retrieve this result.
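The arithmetic behind this limitation can be reproduced with a short Python sketch. The sequence and the count of four restricted X positions come from the example above; the function name and the `V1_THRESHOLD` constant are hypothetical names introduced only for illustration, and the check is a plain reading of the 50% rule rather than the product's actual implementation.

```python
def any_replacement_fraction(hit: str, restricted_x: int, wildcard: str = "X") -> float:
    """Fraction of positions that are wildcards allowing any residue.

    restricted_x is the number of wildcard positions that carry
    replacement information (for example X = V, I, L, M).
    """
    unrestricted = hit.count(wildcard) - restricted_x
    return unrestricted / len(hit)


V1_THRESHOLD = 0.5  # the 50% limit stated for version 1.0

hit = "XPXXXXGXYXXRXXXXXXX"  # expected result from the example above
fraction = any_replacement_fraction(hit, restricted_x=4)

print(len(hit), hit.count("X"))   # 19 15
print(f"{fraction:.0%}")          # 58%
print(fraction > V1_THRESHOLD)    # True: version 1.0 would not retrieve this hit
```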
Degenerate Sequence Search Example
With traditional BLAST, a Query Sequence and a Hit Sequence can match perfectly and yet be reported with an alignment identity of only 88%. The discrepancy arises because ambiguous bases (X) in the Hit Sequence are treated as mismatches: 12 of the 98 bases in the Hit Sequence are ambiguous, so only 86 of 98 positions (about 88%) count as identical. This is just one example of a result that could be overlooked, depending on the search parameters employed. Hits containing a higher percentage of ambiguous bases are even more likely to go unretrieved, so search parameters need to be adjusted to account for such cases. This is not the case with Degenerate Search.
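The effect of counting ambiguous positions as mismatches can be shown with a small Python sketch. The 98-position sequences below are synthetic and only reproduce the proportions from the example (12 ambiguous positions out of 98); `percent_identity` is a hypothetical helper that compares two equal-length, ungapped sequences, not BLAST's scoring.

```python
from typing import Optional


def percent_identity(query: str, hit: str, wildcard: Optional[str] = None) -> float:
    """Percent of positions where query and hit agree.

    If wildcard is given, that character in the hit matches any query residue.
    """
    if len(query) != len(hit):
        raise ValueError("sequences must have equal length (no gaps in this sketch)")
    matches = sum(
        1
        for q, h in zip(query, hit)
        if q == h or (wildcard is not None and h == wildcard)
    )
    return 100.0 * matches / len(query)


# Synthetic data: a 98-base query, and a hit identical to it except that
# 12 positions (0, 8, ..., 88) are replaced by the ambiguous code X.
query = ("ACGT" * 25)[:98]
hit_chars = list(query)
for i in range(0, 96, 8):
    hit_chars[i] = "X"
hit = "".join(hit_chars)

print(round(percent_identity(query, hit)))     # 88: X counted as a mismatch
print(percent_identity(query, hit, "X"))       # 100.0: X treated as ambiguous
```

With an identity cutoff of, say, 90%, the first calculation would filter this hit out even though every unambiguous position agrees with the query.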
Motif vs. Normal vs. Degenerate Sequence Search
MOTIF serves as a tool for searching sequences that possess specific domains or motifs. Users have the flexibility to include degenerate characters, such as "X" at the desired positions within the query sequence. However, it is important to note that MOTIF will only retrieve hits with 100% identity to all other regions. If there is even a single mismatch anywhere, the hit will not be detected by MOTIF.
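The all-or-nothing behavior described above can be approximated with a regular expression, as in the Python sketch below. This is only an analogy under stated assumptions: "X" in the motif is taken to mean any single residue, every other position must match exactly, and the helper names and the short motif are hypothetical. It does not represent how the MOTIF tool is implemented.

```python
import re


def motif_to_regex(motif: str, wildcard: str = "X") -> "re.Pattern[str]":
    """Translate a motif into a regex: wildcard becomes '.', other residues are literal."""
    return re.compile("".join("." if c == wildcard else re.escape(c) for c in motif))


def motif_hits(motif: str, subject: str) -> bool:
    """Return True if the motif occurs anywhere in the subject sequence."""
    return motif_to_regex(motif).search(subject) is not None


# Hypothetical motif searched against the query sequence used earlier in this article.
print(motif_hits("GXYXXR", "IPAAAAGAYLARAEQQQQH"))  # True: GAYLAR fits G.Y..R
# A single mismatch at a fixed position (Y changed to F) is enough to lose the hit.
print(motif_hits("GXYXXR", "IPAAAAGAFLARAEQQQQH"))  # False
```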
In contrast, the regular sequence search, which uses the BLAST algorithm, focuses on overall similarity. Users may not find the specific sequences they are looking for because the overall similarity score can be very low. For instance, degenerate characters are treated as mismatches during alignment, which can exclude relevant hit sequences from the search results.
The degenerate search feature, by comparison, lets users search with regular sequences that contain no ambiguous residues and retrieve results that contain degenerate and wildcard characters. Even though these sequences may be a perfect 100% fit, other search algorithms would miss them. This allows researchers to conduct a comprehensive Freedom to Operate (FTO) search and capture as many relevant hits as possible.
Chemical Modification Searching
Chemical modifications to biomolecules, such as nucleic acids (DNA, RNA) or proteins, involve altering their structure by introducing chemical groups or functional changes at specific sites within the sequence. These modifications can significantly impact the properties or functions of these biomolecules. Some examples include the following:
- Methylation: methylation refers to the addition of a methyl group to the DNA sequence of a gene, which can result in the gene being turned off, preventing it from producing a protein.
- Glycosylation: glycosylation involves attaching carbohydrate molecules (e.g., sugars) to amino acid residues in proteins. This modification can affect various aspects of protein behavior, including folding, stability, solubility, and recognition by other molecules.
In biochemistry, chemical modifications involve altering the structure and function of biomolecules by adding or removing "modifying elements." These modifications, typically achieved through a series of chemical reactions, can be applied to various macromolecules, including proteins, nucleic acids, carbohydrates, and lipids. A wide range of modifications is possible, and this approach is utilized in the development of various drug types, such as antibodies, proteins, peptides, and oligonucleotides.
Chemical Modification Search
The Chemical Modification Search is an innovative tool that eliminates the need for manual searching and cleaning of patents containing chemical modification data. By accurately capturing modification information, it helps reduce the risk of infringement. This tool can be accessed from the sidebar in the Bio Platform by clicking on "Modification". Please note that this feature is exclusively available to Bio Professional users.
To conduct a search, start by entering a sequence length between 3 and 1000, then select the location of the modification and the modification type.
You can also enter query sequences, up to a total of 200 sequences.
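For users who prepare inputs in a script before pasting them into the search form, the limits above can be checked in advance. The Python sketch below is a hypothetical pre-check, not part of the Bio Platform: the function and constant names are invented, and it assumes that the 3 to 1000 range is inclusive and that 200 is the maximum number of query sequences.

```python
from typing import List

MIN_LENGTH, MAX_LENGTH = 3, 1000   # sequence length range stated above (assumed inclusive)
MAX_QUERIES = 200                  # maximum number of query sequences stated above


def validate_modification_search(sequence_length: int, queries: List[str]) -> List[str]:
    """Return a list of problems with the planned input; an empty list means it looks fine."""
    errors = []
    if not MIN_LENGTH <= sequence_length <= MAX_LENGTH:
        errors.append(
            f"sequence length {sequence_length} is outside {MIN_LENGTH}-{MAX_LENGTH}"
        )
    if len(queries) > MAX_QUERIES:
        errors.append(f"{len(queries)} query sequences exceed the limit of {MAX_QUERIES}")
    return errors


print(validate_modification_search(21, ["ACGUACGUACGUACGUACGUA"]))  # []
print(len(validate_modification_search(2, ["ACG"] * 201)))          # 2
```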
Once you are satisfied with your input, click "search" to proceed to the results page. On the results screen, the hit sites and modification overview appear in the middle of the page. Only the top three are displayed for each site, and any redundant ones are shown in grey.
There is also a new "Hit modification" field and filter, as well as a "Modification browser." Clicking "Modification browser" opens the summary report.
Switching to the "Alignment view" displays the hit modifications and their sources.
After clicking on a sequence of interest, you will be redirected to the Sequence Detail page. Here, you will find details about Hit modifications, as well as information about the sources from which the hit modification data was retrieved.