A New Way of Searching and Identifying Protein or Nucleotide Sequences – Patsnap Help Center

You may be wondering what advantages Patsnap has over free tools like BLAST. There are two major ones. The first is the data. Patsnap has the world's most comprehensive sequence database, which includes sequences from 73 patent jurisdictions, manually curated sequences from our partners at CAS, and data extracted from text and patent images using AI (Artificial Intelligence) and OCR (Optical Character Recognition) technologies. We believe that data is crucial for search functionality. The second is that Patsnap's Bio platform connects and integrates with our patent data, allowing for easy sharing across the organization, which is discussed in more detail later.

Now that we understand why the Bio platform is superior to free tools, let's take a closer look at some of its major features and how to conduct a general search. While sequence searching may seem complex at first, it is actually easier to master than keyword searching, because it can be broken down into logical principles that apply even to complex searches. That is why sequence searching is a secret weapon for scientists and attorneys, enabling them to obtain accurate and robust results quickly and efficiently.

So, let's take a look at the following nucleotide sequence as our example:

To begin your search, go to the sequences tab and enter either a nucleotide or protein sequence into the large search box. You can enter up to 200 sequences at a time. There are several important aspects to note in this process. Firstly, Bio allows for searching across protein or nucleotide databases, or both simultaneously, which is extremely useful.
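If your query sequences are stored locally, the 200-sequence limit can be handled by splitting them into batches before pasting each batch into the search box. The following is a minimal sketch, assuming the sequences are held in FASTA-formatted text; the function names are hypothetical, and this article does not state which input formats the search box accepts.

```python
MAX_PER_SEARCH = 200  # per-search limit stated in this article


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batch_records(records, size=MAX_PER_SEARCH):
    """Split a list of records into consecutive batches of at most `size`."""
    return [records[i:i + size] for i in range(0, len(records), size)]
```

Each batch returned by batch_records can then be submitted as a separate search.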

Secondly, the advanced preferences in Bio offer various alignment parameters, three of which are particularly noteworthy and worth experimenting with.

The first parameter is subject length, which controls the length of the sequences retrieved, from short to long, and can be set from zero upward with no upper limit. The second parameter is alignment identity, a percentage measure of how similar the query and a retrieved sequence are. The optimal threshold for alignment identity depends on the technology area being studied. For instance, in some areas a minimum of 80% alignment identity may be required to avoid noise in the dataset, while for enzymes from different species a lower alignment identity of around 30% may be acceptable.
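To make the identity percentage concrete, here is a minimal sketch of one common way to compute it: identical positions divided by alignment length, as in BLAST-style reports. This is an illustration only; the exact formula used by Bio is not stated in this article, and the function name is hypothetical.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percentage of alignment columns where both sequences carry the same residue.

    Both inputs are rows of the same pairwise alignment, so they must have
    equal length; gaps are written as "-".
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"  # a gap never counts as a match
    )
    return 100.0 * matches / len(aligned_query)


# One mismatch in eight aligned columns gives 87.5% identity,
# which would pass an 80% threshold.
identity = percent_identity("ACGTACGT", "ACGTTCGT")
```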

The third parameter, query coverage, is a measure of the degree of overlap between the query sequence and the retrieved sequences. For example, setting query coverage to 60% means that any retrieved sequence must contain at least 60% of the query sequence.
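The 60% example can be worked through with a small calculation. The sketch below assumes query coverage is the aligned span of the query divided by the full query length, which is the usual definition in alignment tools; the article does not give Bio's exact formula, and the function names are hypothetical.

```python
def query_coverage(query_start, query_end, query_length):
    """Percentage of the query covered by the alignment.

    query_start and query_end are 1-based, inclusive positions on the query.
    """
    if not 1 <= query_start <= query_end <= query_length:
        raise ValueError("alignment coordinates must lie within the query")
    aligned_span = query_end - query_start + 1
    return 100.0 * aligned_span / query_length


def passes_coverage(query_start, query_end, query_length, threshold=60.0):
    """True if the hit meets the minimum query coverage threshold."""
    return query_coverage(query_start, query_end, query_length) >= threshold


# A 100-residue query aligned over positions 11 to 70 covers 60 residues,
# so it just meets a 60% threshold.
coverage = query_coverage(11, 70, 100)
```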

Once you are happy with your settings, run your search, which will take you to the results page. This page contains a lot of information, but the key areas to note are the refinements on the left-hand side, the sequence alignment information in the middle of the page, and the variation report and variation filter buttons:

The variation report is another feature worth exploring. It goes beyond the percentage similarity and shows the specific locations where variations occur, allowing you to identify the positions with the most sequence variation relative to the wild type. For instance, by clicking through these results manually, you may discover specific properties or effects associated with a particular variation. This knowledge can then guide future mutagenesis projects, research and development efforts, or applications in specific contexts. More information on the variation report can be found here.
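The idea of locating variations relative to the wild type can be illustrated with a short sketch. It assumes you already have a pairwise alignment of the wild type and a variant, with gaps written as "-". It is not how the variation report is implemented, and the function name and output format are hypothetical.

```python
def list_variations(aligned_wild_type, aligned_variant):
    """Return (position, kind, detail) tuples for every differing column.

    Positions are numbered along the wild type (1-based). An insertion is
    reported at the position of the last wild-type residue before it.
    """
    if len(aligned_wild_type) != len(aligned_variant):
        raise ValueError("aligned sequences must have the same length")
    variations = []
    wt_pos = 0
    for wt, var in zip(aligned_wild_type, aligned_variant):
        if wt != "-":
            wt_pos += 1  # advance only when the wild type has a residue here
        if wt == var:
            continue
        if wt == "-":
            variations.append((wt_pos, "insertion", var))
        elif var == "-":
            variations.append((wt_pos, "deletion", wt))
        else:
            variations.append((wt_pos, "substitution", wt + ">" + var))
    return variations


# Two substitutions relative to the wild type, at positions 3 and 6.
changes = list_variations("MKTAYIA", "MKSAYLA")
```

Counting how often each position appears across many hits would show where variation is concentrated.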

Another key feature you will see on the results page is “Set Alert”, through which you have the ability to set up sequence-level alerts.

We are currently the only company in the market offering this feature. With sequence-level alerts, you can set specific query parameters, and then receive regular email notifications, either weekly or monthly, that inform you not only about any newly discovered sequences but also any relevant scientific literature or other references that have emerged. This feature allows you to stay updated on the latest developments in your field of interest and ensures that you are well-informed about any relevant findings or research that may impact your work.

Coming back to the refinements section of the page, this is where you can narrow down your search even further, going, for example, from 5,000 results down to 21 by using the ‘Chemically modified’ and ‘Claimed in Patents’ refinements:

Once you have conducted your search and are happy with your sequence results, you can click on the "Patents" tab:

If you go to the "Analysis" tab, there is a Venn diagram that summarises how many sequences come from which source.

Furthermore, for sequences published in patents, you can filter to display sequences that only appear in the patent claims using our “in claims” filter.

Once you are happy with your results, you can save them to a workspace to view later, either by using the “save” button in the top right-hand corner or by selecting specific sequences and then clicking “save to workspace” in the bottom right-hand corner.

More information about our sequence management platform can be found here.

Alternatively, instead of saving your results to a workspace, you can continue your analysis by bringing them into Analytics. You can do this by clicking the "view in analytics" button in the top right-hand corner, or by selecting specific sequences and then clicking "view in analytics" in the bottom right-hand corner.

This will transfer the data from Bio into Analytics and display the patents that contain your sequences.

From here you can refine your search further using our advanced refinement tool in the top left-hand corner if you wish. This will allow you to filter patents further via keywords, date, assignee, legal status, and much more.

Once you are happy with your patent results you can click into a patent and enter patent view. From here you can view patent information such as the full text, application date, publication date, estimated expiry date, and more. Additionally, you can access a tool called the “sequence assistant”.

This tool scans the patent document, highlights the matching hit sequences, and shows the alignment. This makes reading and understanding the patent as easy as possible.

More information about the sequence assistant can be found here.

Finally, you can save any relevant patents to a workspace by clicking “save to workspace” in the top right-hand corner.
