Searching in Bio – Patsnap Help Center

Types of searches:

There are several different search types on the Bio platform.

Sequences
Short Sequences
Motif
Fragment
Degenerate
Modification
Antibody
Combined
Drug/Gene Index

Sequence Search

When doing a sequence search you have two options:

Type out the sequence you are interested in.
- When typing out a sequence you can input the query as a DNA or RNA chain, protein sequence, FASTA format, or by typing out a string of amino acids.
- The system supports multi-format input, so you can combine any of the input formats in a single query, up to a total of 200 sequences.
Upload sequences by identifiers

Sequence Search also accepts Sequence Code, GenBank ID, and CAS RN identifiers, allowing quick input of saved IDs for sequence retrieval.
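As a rough illustration of the multi-sequence input described above, the following Python sketch splits FASTA-formatted text into individual records and checks the batch against the 200-sequence limit. It is a minimal sketch for preparing input offline, not the platform's own parser; the function names and the `query_N` placeholder naming are assumptions made for this example.

```python
MAX_QUERIES = 200  # limit stated in the text above


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                # Bare sequence with no header line: use a placeholder name
                # (hypothetical convention, chosen for this sketch).
                header = f"query_{len(records) + 1}"
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_batch(text):
    """Return the parsed records, or raise if the batch exceeds the limit."""
    records = parse_fasta(text)
    if len(records) > MAX_QUERIES:
        raise ValueError(
            f"{len(records)} sequences supplied; the limit is {MAX_QUERIES}"
        )
    return records
```

Running `check_batch` on your input before pasting it into the query box is one way to catch an oversized batch early.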

After inputting the search query, specify the input type and search database using the filters on the right side of the screen.

Once you have selected the filters the search button will be activated.

Once the search results are collected, you can further refine them using the filters on the left-hand side of the screen.

After filtering your search results, you can:

1. View the patents in analytics.
  - To view patents on analytics, go to the "Patents" tab and then click on the "view in Analytics" button:
  - Once you open analytics with all the desired patents you can save them to a workspace and interact with them using the workspace functions.
2. Export the results as a FASTA file.
3. View Patents, Literature, Other sources, and Analysis:

To make it easier to find the patents and literature with the highest-identity sequence hits, the Patent and Literature tabs include sorting options that let you sort documents according to sequence attributes. The available fields include:

Sequence Fields -

Blast score
Alignment identity
Query/subject coverage and identity
Positive Score
Sequence length
The number of sequence hits etc.

Literature Fields -

Published Date

Patent Fields -

Publication Date
Application Date
Priority Date

Short Sequence Searching

The short sequence search is a tool for searching amino acid and nucleotide sequences fewer than 30 residues in length. Its advantages over the normal sequence search are:

Faster
The advanced preferences are automatically optimised for a short sequence search
- (the E-value threshold is higher, meaning you get more results)
Supports multi-format sequence search

Motif Searching

You might want to find sequences that contain critical patterns. These regions within sequences may be of special significance, perhaps conferring some sort of binding activity. It could be that you are interested in all sequences that contain this exact region and do not care much about the rest of the sequence. It could also be that you want to find sequences that are the same as yours, but in which certain residues can be one of a number of residues, perhaps with the same chemical properties (like being negatively charged). In both cases the Motif Search is a good fit, since a single input retrieves all sequences that contain the pattern.

Essentially, Motif Search is useful for controlling the particular positions where you want variation to be allowed: you can insert degenerate codes into the sequence that represent different amino acids or different selections of them. It is also more specific than a normal sequence search, which looks at overall similarity and does not necessarily match the key parts of the sequence.

When performing a motif search enter your sequence code into the query box. The query box supports the input of degenerate characters.

On the right side of the screen, apply the appropriate filters and begin your search.

The results page will display sequence codes that pertain to the search query. Select the "Patent" tab to look at related patent results:

You can opt to view the patents on the analytics platform by clicking the "View in Analytics" button. Viewing patents on the analytics platform allows you to easily save the patents into a workspace and navigate through the patents using the tools available in the workspace.

Motif Search use cases:

Matching to sequence variants (Single nucleotide polymorphisms, alleles, CRISPR variants)
Searching for the degenerate sequence: a type of Sequence Markush (a number of sequences in patents have residues such as XXXX or NNN)
Searching for protein domains or sequence patterns (drug discovery)
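To make the idea of degenerate characters in a motif concrete, here is a minimal Python sketch that turns a protein motif into a regular expression and reports where it occurs in a sequence. It assumes the standard IUPAC amino acid ambiguity codes (B = D or N, Z = E or Q, J = I or L, X = any residue); the query syntax actually accepted by the Motif Search box may differ, and the function names are made up for this example.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Standard IUPAC ambiguity codes for amino acids.
AMBIGUITY = {"B": "DN", "Z": "EQ", "J": "IL", "X": AMINO_ACIDS}


def motif_to_regex(motif):
    """Translate a motif with ambiguity codes into a regex pattern string."""
    parts = []
    for ch in motif.upper():
        if ch in AMBIGUITY:
            parts.append("[" + AMBIGUITY[ch] + "]")
        elif ch in AMINO_ACIDS:
            parts.append(ch)
        else:
            raise ValueError(f"unsupported motif character: {ch!r}")
    return "".join(parts)


def find_motif(motif, sequence):
    """Return (1-based position, matched text) for every occurrence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# "RGB" matches RGD and RGN, but not RGE.
print(find_motif("RGB", "AARGDKKRGNRGE"))
```

The same approach covers the "same chemical properties" case from the text: a position that may hold either negatively charged residue can be written as a character class such as `[DE]`.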

Fragment Searching

The fragment searching function is used to search for sections of DNA, RNA or protein sequences. The tool allows for fragments to be searched by size and position.

Searching by Size

When searching by fragment use the Bio filters on the right-hand side of the screen to identify the minimum length of the search query.

Searching by Position

When searching by position and uploading multiple queries to the search box, make sure to identify the start and end position of each fragment using the “multi” Bio filter.
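The two fragment options can be pictured with a short Python sketch: one helper cuts a fragment out of a sequence by start and end position, the other keeps only fragments that meet a minimum length. It assumes positions are 1-based and inclusive, which is the usual convention for sequence coordinates but is not confirmed by this article; the helper names are hypothetical.

```python
def extract_fragment(sequence, start, end):
    """Return the fragment between 1-based, inclusive positions start..end."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError(
            f"positions {start}-{end} fall outside a sequence of length {len(sequence)}"
        )
    return sequence[start - 1:end]


def filter_by_min_length(fragments, min_length):
    """Keep only the fragments that are at least min_length residues long."""
    return [f for f in fragments if len(f) >= min_length]


# Positions 3..6 of a 10-residue peptide.
print(extract_fragment("MKTAYIAKQR", 3, 6))
```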

After you have inputted your search query and applied the appropriate bio filters on the right of the screen the search button will activate.

On the search results page, you have the option to filter your results using the filters on the left. Once you are happy with your filters, you can click on the "Patents" tab to view patent results:

From this page, you can click on "view in analytics", which will take all the results into Analytics where they can be stored and further manipulated or interacted with.

Degenerate Searching

A degenerate sequence is a sequence in which some positions allow for multiple possible residues. For example, take the sequence ALRKD FTGEX:

The new degenerate search feature enables users to conduct searches using a specific sequence while retrieving results that contain ambiguous codes.

While BLAST does allow for searching sequences with degenerate codes or wildcards, there are certain drawbacks associated with using the conventional search method. If you would like to read more about these drawbacks, please check out this article.

Please note that this feature is exclusively available to Bio Professional users.

To access the Degenerate search, navigate to the home page of Bio where you will see it on the left-hand side:

You are able to find examples of sequences by clicking on the blue "Example" button:

This search is comparable to a regular sequence search in Bio. However, it examines databases containing sequences with degenerate codes, and the algorithm matches using an index that reflects the meaning of those degenerate codes. Once the search loads, you will see examples of sequences that likely would not have been identified through a standard BLAST search. For example, the following has multiple degenerate positions:

You can hover over the highlighted text shown in the image above to see what the 'X' in a specific position represents. There is also a view where you can check the degenerate codes at different positions and see what they represent.

The extensive database supporting this search type comprises a staggering 45 million degenerate sequences, carefully curated from patent sequence listing files. With this vast collection, you gain access to over a trillion possible sequences, enabling you to conduct comprehensive Freedom to Operate (FTO) or novelty searches with unmatched depth and breadth.
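The gap between the number of stored degenerate sequences and the number of concrete sequences they stand for comes from simple multiplication: each ambiguous position multiplies the count by the number of residues it allows. The Python sketch below illustrates this with the standard IUPAC nucleotide ambiguity codes. It is an illustration of the principle only, and does not describe how the platform's index is built.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC_NT = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_expansions(degenerate):
    """Number of concrete sequences a degenerate sequence represents."""
    return prod(len(IUPAC_NT[ch]) for ch in degenerate.upper())


def expand(degenerate):
    """List every concrete sequence (only sensible for small counts)."""
    options = (IUPAC_NT[ch] for ch in degenerate.upper())
    return ["".join(combo) for combo in product(*options)]


def matches(degenerate, concrete):
    """True if the concrete sequence is one of the represented sequences."""
    if len(degenerate) != len(concrete):
        return False
    return all(
        base in IUPAC_NT[code]
        for code, base in zip(degenerate.upper(), concrete.upper())
    )


print(count_expansions("ACNRT"))   # N allows 4 bases, R allows 2
print(matches("ACNRT", "ACGAT"))
```

Because the count grows multiplicatively, checking membership with `matches` is far cheaper than enumerating with `expand` once a sequence has more than a handful of ambiguous positions.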

Modification Searching

The Chemical Modification Search is an innovative tool that eliminates the need for manual searching and cleaning of patents containing chemical modification data. By accurately capturing modification information, it helps reduce the risk of infringement. This tool can be accessed from the sidebar in the Bio Platform by clicking on "Modification". Please note that this feature is exclusively available to Bio Professional users.

To conduct a search, you can start by entering a sequence length between 3 and 1000, followed by selecting the location of the modification and modification type:

You also have the option to enter a query sequence, up to a total of 200 sequences:

Once you are satisfied with your input, you can click on "search" to proceed to the results page. On the results screen, you will find the hit sites and modification overview in the middle of the page. Here, only the top three are displayed for each site. Any redundant ones are shown in grey.

There is also a new "Hit modification" field and filter, as well as a "Modification browser." Clicking on the "Modification browser" allows you to view the summary report:

If you switch to the "Alignment view" it will display the hit modifications and sources:

After clicking on a sequence of interest, you will be redirected to the Sequence Detail page. Here, you will find details about Hit modifications, as well as information about the sources from which the hit modification data was retrieved.

Antibody searching

Antibodies are constructed by pairing heavy and light polypeptide chains. When conducting the antibody search, you must know whether your input is classified as a heavy chain or a light chain. Within the polypeptide sequences, there are complementarity-determining regions (CDRs). The CDR sequence is a region of the polypeptide chain that is more easily recognizable within the genetic code. To perform an antibody search, you must also know whether your query is a CDR and whether it lies on the heavy chain or the light chain of the antibody.

1. To begin an antibody search, select Antibody from the available search options.

2. Enter your sequence in the appropriate search field.

- For CDR sequences, you must specify the degree of mismatch allowed in the results.
- When inputting multiple chain sequences, separate the sequences with a space in the appropriate search box.
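One way to picture a mismatch allowance for CDR queries is to count the positions at which a candidate differs from the query and keep the candidates within the limit. The Python sketch below does this for equal-length sequences. It assumes that a mismatch means a substitution at a single position; how the platform treats insertions, deletions, or length differences is not stated here, and the example CDR strings are invented.

```python
def count_mismatches(query, candidate):
    """Count positions where two equal-length sequences differ."""
    if len(query) != len(candidate):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(query.upper(), candidate.upper()))


def within_mismatch_limit(query, candidates, max_mismatches):
    """Keep candidates of the same length with at most max_mismatches differences."""
    return [
        c for c in candidates
        if len(c) == len(query) and count_mismatches(query, c) <= max_mismatches
    ]


# Hypothetical CDR query and candidates, for illustration only.
query = "GFTFSSYA"
candidates = ["GFTFSSYA", "GFTFSNYA", "GYTFTNYA", "GFTF"]
print(within_mismatch_limit(query, candidates, max_mismatches=1))
```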

A field on the Search page lets you rename your search directly from the Search page itself. This saves time and effort by eliminating the need to navigate to the Search history page.

3. After inputting your search query, identify which part of the literature you want to search for the inputted query.

4. Once your search query has been entered and preferences selected the search button will be activated.

5. The results page will then display your search settings, which you can hide by toggling the arrow:

Below are the counts of similar sequences, patents, and other literature that reference the searched sequences.

Below are the Common Documents charts which show the number of patents containing the sequences you have searched.

(All the charts on the sequences are labelled and contain a description underneath the title to define the purpose of the chart.)

6. Clicking on a highlighted number in your selected chart will open the related sequences in another tab, allowing you to view all the relevant sequences, patents, literature, and other documents.

7. When you click on a patent to view you will automatically be redirected to the analytics platform. Here you can use all the functionalities of the analytics platform to view and save the patents you are interested in.

Combined Search

The combined search allows the user to search a maximum of six queries at a time with a combination of inputs. The results will display patents that contain any combination of the searched query.

1. Navigate to the combined search option and begin inputting your queries.

Tip: You can customize the names and preferences of each query you input. You can also use different input types for each query (E.g. Amino acid sequence for Query 1 and Nucleotide input for Query 2).

2. Click search to navigate to the results page. On the top half of the result page, you will have data analytics of the result and the bottom half will generate a heat map of the patents containing relevant results. It should be noted that the Analysis charts are dynamic to help you quickly get to the key data of interest.

Tip: Hovering over the graphs will provide a breakdown of sequence and patent data for each query.

The heat map displays the number of matching sequences in each patent and the alignment identities for each sequence found. Using the legend on the right you can determine which patents have sequences with the highest alignment identity.

Tip: For patents with multiple sequences matched within a patent, hover your cursor over the number tile to get a breakdown of sequence data for all matched sequences.
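The structure behind such a heat map can be sketched in a few lines of Python: for every patent and query, count the matching sequences and record the best alignment identity. This is only an illustration of the data a tile summarises. The patent labels, query names, and identity values are invented, and the platform's own aggregation may differ.

```python
from collections import defaultdict


def build_heat_map(hits, queries):
    """Summarise hits as {patent: {query: (match_count, best_identity)}}.

    hits is an iterable of (patent, query, alignment_identity) tuples.
    Queries with no match in a patent get (0, None).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for patent, query, identity in hits:
        grouped[patent][query].append(identity)

    heat_map = {}
    for patent, per_query in grouped.items():
        heat_map[patent] = {
            q: (len(per_query[q]), max(per_query[q])) if q in per_query else (0, None)
            for q in queries
        }
    return heat_map


# Hypothetical hits, for illustration only.
hits = [
    ("PATENT-A", "Query 1", 99.0),
    ("PATENT-A", "Query 1", 87.5),
    ("PATENT-A", "Query 2", 100.0),
    ("PATENT-B", "Query 2", 92.0),
]
for patent, row in build_heat_map(hits, ["Query 1", "Query 2"]).items():
    print(patent, row)
```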

3. To further review the patents click “View in Analytics”.

However, instead of or before going into Analytics, you can further refine your results using the filters on the left-hand side of the page, ultimately improving workflow efficiency. These filters are similar to those found in Analytics.

In addition to the filters, there is also an option to narrow down your results to show patents that have the sequence in the claims section. You can do this by clicking on the 'In claims' checkbox and this filter will then retain patents only if the given sequence is mentioned in the resulting patent claims.

Finally, you also have the option to apply Patent family groupings to the resulting data set, allowing you to display the most relevant family representative. This can be done by clicking on the small gear icon, which will open up the following pop-up screen:

4. Once the results are taken to the Analytics platform they will be organized into families. At this point, all the functionalities of Analytics can be used to refine, save, and analyze patents.

5. After refining your results, they can be saved in a Bio enabled workspace.

Tip: The workspaces are now enabled to show sequence data from the bio platform when you click the Seq Code. Clicking view results from this pop-out box will redirect you to the Bio platform. To further analyze the patent click on the patent name or title.

Literature Combined Search

You can also use Combined Search when searching for sequences in non-patent literature involving multiple queries. In particular, you can use Combined Search in literature sources to help identify prior art. In addition, you can also use filters on the left-hand side of the page such as keyword, literature source, journal, author, etc., to further narrow down results and improve workflow efficiency. To search for literature, make sure to select the check box next to 'literature' on the Combined Search home page, and then select the 'Literature' tab at the top of the page.

The display of the results is similar to the display when viewing patents, with the Analysis charts on the top of the page and the heat map below that.

Drug/gene Index

Within Patsnap Bio you can identify sequences associated with a drug or gene using the Drug/Gene Index search.

To perform a search identifying sequences associated with a drug or target of interest, select "Drug/Gene Index" from the available search options on the sidebar.

Search using a drug name, or the synonyms, or browse our drop-down directory available for a target search. You can search for up to three drugs or targets simultaneously.
The results page will provide a list of sequences associated with your drug(s) or target(s), an overview of the sequence details, and the number of patents, papers (non-patent sources) and other sources associated with each sequence. You can see the different data types by using the tabs at the top of the page.

Use the filters within the results screen to refine the sequences further. (TIPS: Using the filters on the left-hand side of your results screen, you can choose to refine your sequences, for example, by sequence length, whether it is an antibody or if it has been chemically modified through site-specific modifications, etc. If you are interested in performing an FTO search, you can utilize the "claimed in patents" filter to filter on a sequence level.)
Next, select the sequence codes of interest to gain an overview of the sequence itself, including the full sequence, primary, secondary, other associated names of said sequence as well as patent & literature documents associated with the sequence. (TIP: you can also click on the hyperlinked drug or targets within the results page to view its details. This will provide an overview of the drug or target of interest including the sequence information, synonyms, associated drugs, targets and diseases, regulatory approval information for drugs etc.)

Tip: At this point, you can filter your results using PatSnap provided patent filters or use the “In claims” button to see patent results where the sequence searched is only shown in the claims section of the patents found.

Once you are happy with the results found, clicking on the ‘View in Analytics’ button will take the results into the analytics platform.

At this point, all the functionalities of the analytics platform can be used on the patent results. For example, you can now apply keyword refinements to the patent results.

Once you have finalized all refinements on the patents found, the results can be saved in a bio-enabled workspace, which allows you to keep track of the target sequences listed within each patent.

Advanced preferences

Below is a description of the most commonly used advanced preferences available on the PatSnap Bio platform.

Sequence Identity- This allows you to set what percentage match is allowed in your results. This includes the characters used in the sequence and alignment. The higher the percentage the closer the results should match your sequence.

Query Coverage- This specifies the percentage of your query sequence you wish to match. For example, if you search for a sequence that contains 100 amino acids and would like to see sequences that match at least 70 amino acids of your query sequence, set the slider range to 70%-100%. This does not take the positioning of the amino acids into account.

Match with Gaps- Allows sequences to match, and to appear in your results, when the result sequence displays a gap rather than an amino acid.

E-Value - indicates how likely it is that a sequence is similar to yours simply by chance. For instance, if your sequence is very short, there is a higher likelihood that it appears in several locations simply by chance. The greater the e-value, the more likely it is that this is just down to luck.
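The link between sequence length, match score, and E-value can be sketched with the standard BLAST statistics relation E = m × n × 2^(−S'), where m is the query length, n is the total length of the database, and S' is the bit score of the match. The Python example below uses that relation with invented numbers: the database size and both bit scores are hypothetical, real BLAST implementations use adjusted "effective" lengths, and this article does not state exactly how the platform computes its E-values.

```python
def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least bit_score.

    Uses the textbook relation E = m * n * 2 ** (-S'), ignoring the
    effective-length corrections applied by real BLAST implementations.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


DATABASE_LENGTH = 1_000_000_000  # hypothetical database size in residues

# A short query can only reach a modest bit score, even for a perfect match,
# so its E-value stays comparatively high. Both scores are made-up examples.
short_hit = expect_value(bit_score=40, query_length=20, database_length=DATABASE_LENGTH)
long_hit = expect_value(bit_score=360, query_length=200, database_length=DATABASE_LENGTH)

print(f"short query E-value: {short_hit:.3g}")
print(f"long query E-value:  {long_hit:.3g}")
```

This is why the short sequence search relaxes the E-value threshold: with a strict threshold, even exact matches to a short query could be filtered out as indistinguishable from chance.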

Algorithm- There are 3 algorithms to choose from depending on the search you wish to perform.

MegaBlast

Great for comparing very similar sequences

Blastn

Standard nucleotide-nucleotide comparisons

Blastn-short

Optimized for sequences with fewer than 50 nucleotides

Search results Cap

Within Advanced Preferences, you will have the option to increase your search results by changing the 'Max Target Sequence'. The default selection will be 5000, but this can be changed to 1000 or 10,000. For results greater than 10,000, there will be a range selection option to View Sources.

Advanced Preferences Search Parameters

We covered 2 of the 6 search parameters you can change to get broader or more specific results. The other 4 are the following:

Subject Length - This is the length of the subject the system will look at to match your query against. You can use this parameter to limit your search results based on how long you want your subject to be.

Alignment Identity (%) - The Alignment Identity is a number that describes how similar the query sequence is to the target sequence (how many characters in each sequence are identical). The higher the percent identity is, the more significant the match.

Query Identity (%) - This is the percent of matching amino acids or nucleotides.

Subject Coverage (%) - This is the percentage of the subject sequence that matches the query sequence. If you would like the entire subject to be present in the query sequence, select 100%.
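The four percentages are easier to tell apart when computed side by side from one alignment. The Python sketch below does that for a pair of gapped alignment strings. The formulas are assumptions chosen to match the descriptions above (identity over alignment columns, coverage over full sequence length); the article does not give the platform's exact definitions, particularly for Query Identity, so treat the output as illustrative.

```python
def alignment_metrics(aligned_query, aligned_subject, query_length, subject_length):
    """Compute identity and coverage percentages from a gapped alignment.

    aligned_query and aligned_subject are equal-length strings in which
    "-" marks a gap. query_length and subject_length are the lengths of
    the full, unaligned sequences.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have the same length")

    columns = len(aligned_query)
    identical = sum(
        q == s and q != "-" for q, s in zip(aligned_query, aligned_subject)
    )
    query_aligned = sum(ch != "-" for ch in aligned_query)
    subject_aligned = sum(ch != "-" for ch in aligned_subject)

    return {
        # identical columns / all alignment columns
        "alignment_identity": 100.0 * identical / columns,
        # identical columns / full query length (assumed definition)
        "query_identity": 100.0 * identical / query_length,
        # aligned query residues / full query length
        "query_coverage": 100.0 * query_aligned / query_length,
        # aligned subject residues / full subject length
        "subject_coverage": 100.0 * subject_aligned / subject_length,
    }


# A 10-residue query aligned against part of a 16-residue subject.
metrics = alignment_metrics("ACGT-ACG", "ACGTTACC", query_length=10, subject_length=16)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

In this example the alignment itself is fairly similar, yet only part of the query and half of the subject take part in it, which is exactly the situation the coverage sliders are meant to control.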

The following are some setting recommendations you might find helpful for different search types:

When the target sequence to be retrieved is similar in length to the query sequence, for example, when using Wild-type sequences to find mutant sequences, you might not want to get very short or very long sequence results. In this case, you can set Query Identity and Subject Coverage to 90-100:
When you want to get short target sequences by using a long sequence query, for example, using a gene sequence with thousands of base pairs to get an siRNA or short fragments. In this case, you can set the Alignment Identity to 95-100 and Query Coverage to 0-10. Along with this, you can also set a lower word size and use the Blastn algorithm:
When using short sequences to retrieve long sequences, for example, if you know the sequence of a linker peptide (GGGGSGGGGSGGGGSGGGGS), and want to find the long sequence containing the short peptide. In this case, you can use an Alignment Identity of 95-100 and Subject Coverage of 0-10. Along with this, you can use the Blastn-short algorithm:
To conduct a broad search, it is recommended to use the default settings so that you don't miss out on certain results and also to match with gaps:
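The recommendations above can be written down as a small table of presets, which is handy if you post-filter exported hits or want to record which settings a search used. The Python sketch below encodes the ranges exactly as given in the text. The preset names, dictionary keys, and the `passes` helper are hypothetical and do not correspond to any platform API.

```python
# Ranges are (minimum, maximum) percentages taken from the recommendations above.
PRESETS = {
    "similar_length": {
        "query_identity": (90, 100),
        "subject_coverage": (90, 100),
    },
    "long_query_short_targets": {
        "alignment_identity": (95, 100),
        "query_coverage": (0, 10),
        "algorithm": "blastn",
    },
    "short_query_long_targets": {
        "alignment_identity": (95, 100),
        "subject_coverage": (0, 10),
        "algorithm": "blastn-short",
    },
    "broad": {
        "match_with_gaps": True,  # otherwise keep the default settings
    },
}


def passes(hit, preset_name):
    """True if every percentage range in the preset contains the hit's value."""
    for field, setting in PRESETS[preset_name].items():
        if isinstance(setting, tuple):
            low, high = setting
            if not low <= hit[field] <= high:
                return False
    return True


# Hypothetical hit: a short linker found inside a much longer subject.
hit = {"alignment_identity": 100.0, "query_coverage": 100.0, "subject_coverage": 4.2}
print(passes(hit, "short_query_long_targets"))
print(passes(hit, "long_query_short_targets"))
```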

Previous searches

Users may have many searches in their search history relating to different projects and may wish to delete some of these searches. It is inconvenient to go through and delete each search one by one. Therefore, tick boxes have been added to easily select multiple searches at once from 'My Searches' to batch delete historical records. This removes the manual effort of deleting each search individually and saves users time when organizing their work.

You can access the 'My Searches' page from the search/home page of Bio by clicking on the 'My Searches' tab on the left of the screen.