Latest improvements for QIAGEN CLC Genomics Workbench early access

QIAGEN CLC Genomics Workbench early access 21.0

Release date: 2020-10-22

These are the draft release notes for CLC Genomics Workbench 21.0, due for release on January 12, 2021.

The draft manual is available in PDF format and HTML format.

Installers for this product are available as “early access” via links at the bottom of this page. These products are not supported, and we recommend they are not used in production during the early access period.

To download a commercial license for this product, you must have a license covered by Maintenance, Upgrades and Support (MUS) until on or after January 12, 2021. A 2-week evaluation license is available via the License Manager within the software.

New features and improvements

Full workflow support for Sanger sequence analysis

New features have been introduced, and improvements made, to support automated analyses of Sanger trace data using workflows.

Trim Sequences
  • Trim Sequences can be used in workflows.
  • Trim Sequences can be run on the CLC Genomics Server.
  • A new sequence element containing the trimmed sequences is output. Previously, the input was modified and saved.
  • A report can be generated containing a summary of the number of reads trimmed and the reasons for the trimming. This report is supported by the Combine Reports tool.
  • The UniVec database used in this tool has been updated to version 10.0 of UniVec_Core.
Other improvements supporting trace data analysis in workflows
  • Trace data can be imported using on-the-fly import in workflows.
  • Improved output naming by the Assemble Sequences to Reference and Assemble Sequences tools: The sample name is included in the file name and the sequence names in the output.
  • Metadata-based naming is supported in workflows run in batch mode or with Iterate control flow elements through the use of new placeholders: {metadata} and {metadata:<columnname>}.
  • The Secondary Peak Calling tool no longer modifies the input data element, but instead produces new elements as output. Note: Workflows containing this tool that were created in older versions of the software must be manually updated: the old workflow element must be replaced by a new one. The recommended upgrade path for installed workflows containing the Secondary Peak Calling tool is to save a copy of the workflow in the Navigation Area using a 20.x version of CLC Genomics Workbench, and then open and manually update that workflow in CLC Genomics Workbench 21.0. The new workflow can then be installed, if desired.
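
To illustrate how a naming pattern using the {metadata} and {metadata:<columnname>} placeholders described above might expand, here is a minimal Python sketch. It is not the Workbench's implementation. The function name expand_name, the example column names, and the treatment of a bare {metadata} as the batch identifier are assumptions made for illustration only.

```python
import re

def expand_name(pattern, batch_id, metadata_row):
    """Expand {metadata} and {metadata:<columnname>} in a naming pattern.

    Assumptions (not taken from the release notes): a bare {metadata}
    stands for the batch identifier, and {metadata:<columnname>} for the
    value found in that column of the metadata row.
    """
    def substitute(match):
        column = match.group(1)
        if column is None:
            return batch_id
        # In this sketch, unknown columns expand to an empty string.
        return str(metadata_row.get(column, ""))

    return re.sub(r"\{metadata(?::([^}]+))?\}", substitute, pattern)

# Hypothetical metadata row for one batch unit.
row = {"Patient": "P01", "Tissue": "liver"}
print(expand_name("{metadata:Patient}_{metadata:Tissue}_trimmed", "sample_A", row))
print(expand_name("{metadata}_assembly", "sample_A", row))
```
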
New tools
  • Create Sample Report creates a summary report of selected information from multiple reports relating to a single sample. Specific types of information can be specified for inclusion in the Quality Control section.
  • Extract IsomiR Counts extracts information from the read mappings of each miRNA or other custom-added database type, e.g. piRNA, and collects the information across all mappings in a table that can be exported.
  • Annotate with Repeat and Homopolymer Information adds annotations to variants by appending two new columns with information about repeat and homopolymer status.
  • Merge Variant Tracks merges multiple variant tracks into a single track. Options are available for appending annotations from overlapping variants.

Extract IsomiR Counts, Annotate with Repeat and Homopolymer Information and Merge Variant Tracks were previously available via the Biomedical Genomics Analysis plugin.

Workflow related
  • When a workflow with Export elements is run in batch mode, the exported files from each batch run can be saved to separate folders.
  • BED and VCF format files can be imported on-the-fly in workflows.
  • On-the-fly import can be used without metadata when running workflows in batch mode, and when running workflows containing a single Iterate element.
  • Name placeholders for output elements and export elements have been updated, and the naming of outputs of workflows run in batch mode can be more finely controlled.
  • Improvements for Workflow Input elements:
    • Workflow Input elements can be configured to limit the data input method to either selection of data elements from the Navigation Area or selection of files to be imported using on-the-fly import. The default is to allow the input method to be chosen when launching the workflow.
    • Workflow Input elements can be configured to limit the on-the-fly import types available when launching the workflow. Parameters of selected importers can also be locked or unlocked, as desired, defining whether the setting is configurable when launching the workflow.
  • Additional configuration options for Iterate and Collect and Distribute workflow elements are available.
  • When a workflow with Iterate elements is run with the “Batch” checkbox checked, the “Batch identifier” column in the Workflow Result Metadata table will contain the combined batch identifier, reflecting all levels of batching and iterations.
  • The following tools are available to be included in workflows:
Performance improvements
Working with tables
  • Column order can be adjusted when viewing tables, and the revised column order will be respected when exporting the open table to, for example, CSV or Excel format files.
  • Tables in reports can be opened in a new tab: right click -> Open Table.
  • Tables can be exported using a right-click option: “File” -> “Export Table”. The export takes into account filtering, ordering and deselection of columns.
Export
  • Exported files can be saved into subfolders of the selected output area by using a forward slash character / at the start of the custom file name definition.
  • Graphics export of Tracks, Track lists, Sequences, Alignments and Read mappings is supported as a standard export, which can be embedded into workflows and executed on a CLC Genomics Server. This feature is intended for high-throughput applications. For other applications, we recommend the existing graphic export tool.
  • The naming pattern for files exported using the fastq exporter has been updated to match the naming format the Illumina importer expects. Exported file names now end with “_R1.fastq” and “_R2.fastq”. Previously, the extension was “.R1.fastq” when exporting a single file and, if pairs were exported to two files, the second file had the extension “.R2.fastq”. (The first “.” in the original naming has been replaced by an “_”.)
  • Export VCF has been updated:
    • It supports the export of CNV and fusion data.
    • If multiple elements have been selected for export, there is an option for exporting them to a single file.
    • It uses the value “.” to represent missing variant annotations.
    • Special characters in variant annotations are exported using percent encoding, as specified in VCF 4.3.
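
As a rough illustration of the last two Export VCF points, the Python sketch below writes missing annotation values as “.” and percent-encodes the special characters listed in the VCF 4.3 specification. The function name encode_annotation is hypothetical, and treating None or an empty string as "missing" is an assumption. The exporter's actual logic is not described in these notes.

```python
# Special characters and their capitalized percent encodings, as listed in VCF 4.3.
VCF_ENCODING = {
    "%": "%25",
    ":": "%3A",
    ";": "%3B",
    "=": "%3D",
    ",": "%2C",
    "\r": "%0D",
    "\n": "%0A",
    "\t": "%09",
}

def encode_annotation(value):
    """Return a VCF-safe annotation value.

    Assumption: None and the empty string count as missing and are
    written as ".". Each character is encoded exactly once, so a literal
    "%" becomes "%25" without being re-encoded.
    """
    if value is None or value == "":
        return "."
    return "".join(VCF_ENCODING.get(ch, ch) for ch in str(value))

print(encode_annotation(None))
print(encode_annotation("missense;score=0.9"))
print(encode_annotation("100%"))
```
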
Illumina importer
  • The “Paired reads” option is enabled by default.
  • Improved validation when the “Paired reads” option is enabled. The names of the pairs of files are validated as follows:
    • If the file names follow the Illumina naming format, the two files are required to have the same sample name and lane.
    • If the file names do not follow the Illumina naming format, but _R1/_R2 is detected in the names, the first file must contain _R1 and the second file must contain _R2.
    • If the “Join reads from different lanes” option is enabled, the detected lane, in the format _L001, must be the same for both files.
    • If a pair of files does not meet the requirements above, a message is printed in the log and the pair of files is skipped.
  • Improved naming of the imported elements:
    • If the imported files follow the Illumina naming format, the imported elements no longer contain the _R1_001 suffix.
    • Otherwise, if _R1 / _R2 is detected in the names of the files, it is removed from the name of the imported elements.
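
The validation and naming rules above can be sketched in Python as follows. This is an illustrative reading of the rules, not the importer's code. The regular expression for the Illumina naming format (SampleName_S1_L001_R1_001.fastq.gz), the function names, the stripping of file extensions from element names, and the choice to accept pairs where no _R1/_R2 is detected are all assumptions.

```python
import re

# Assumed shape of the Illumina naming format: <sample>_S<n>_L<lane>_R<1|2>_001.fastq[.gz]
ILLUMINA = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq(?:\.gz)?$"
)
LANE = re.compile(r"_L(\d{3})")

def validate_pair(first, second, join_lanes=False):
    """Return (ok, reason) for a candidate pair of file names."""
    m1, m2 = ILLUMINA.match(first), ILLUMINA.match(second)
    if m1 and m2:
        # Illumina-style names: same sample name and lane required.
        if m1.group("sample") != m2.group("sample") or m1.group("lane") != m2.group("lane"):
            return False, "sample name or lane differs"
    elif any(tag in name for tag in ("_R1", "_R2") for name in (first, second)):
        # Non-Illumina names with _R1/_R2 detected: order matters.
        if "_R1" not in first or "_R2" not in second:
            return False, "expected _R1 in first file and _R2 in second"
    if join_lanes:
        # With lane joining enabled, detected lanes must agree.
        l1, l2 = LANE.search(first), LANE.search(second)
        if l1 and l2 and l1.group(1) != l2.group(1):
            return False, "lanes differ"
    return True, "ok"

def element_name(first):
    """Derive the imported element name from the first file of a pair."""
    if ILLUMINA.match(first):
        # Drop the _R1_001 suffix (and, by assumption, the extension).
        return re.sub(r"_R[12]_001\.fastq(?:\.gz)?$", "", first)
    stem = re.sub(r"\.(?:fastq|fq)(?:\.gz)?$", "", first)
    return re.sub(r"_R[12]", "", stem, count=1)

print(validate_pair("A_S1_L001_R1_001.fastq.gz", "A_S1_L001_R2_001.fastq.gz"))
print(validate_pair("reads_R2.fq", "reads_R1.fq"))
print(element_name("A_S1_L001_R1_001.fastq.gz"))
```

A pair that fails validation would, per the notes above, be skipped with a message in the log. In this sketch the reason string stands in for that message.
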
Create Protein Report

Updates have been made relating to the BLAST functionality in Create Protein Report:

  • The default expect value (e-value) for BLAST searches at the NCBI is 0.05, aligning with the defaults used at the NCBI.
  • The top 10 BLAST alignments are included in the report, where previously it was the top 100. The full BLAST report continues to be available by clicking on the results in the report, and the full BLAST hit table continues to be included in the report.
  • Results of searches against local sequences or databases can no longer be included in the report. (The standard BLAST tool remains available for running local searches.)
Local Realignment
  • A restriction has been removed from Local Realignment that prevented paired reads from being realigned when that realignment would change which read was left-most on the reference. The overall effect of this change is to increase the likelihood of detecting insertions in rare cases.
  • Improvements have been made when realigning large insertions at the beginning of reads.
  • The “Allow guidance insertion mismatches” and “Maximum guidance-variant length” options are enabled only when a guidance-variant track is provided.
  • Fixed an issue that caused reads with unaligned ends stretching over a chromosome boundary to be removed from the mapping.
QC for Targeted Sequencing
  • A new option in QC for Targeted Sequencing allows a custom list of coverage levels to be specified.
  • The report includes the complete set of chromosomes in the “Targeted region overview” section when using references with up to 200 chromosomes. Previously the limit was 100 chromosomes. This change means the hg38_no_alt_analysis_set reference data set, available from the Reference Data Manager, is now supported.
  • The report has been extended with values reporting the number and percentage of base positions in target regions with coverage above or equal to the minimum threshold.
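
For readers who want to see what the new coverage values express, the Python sketch below computes, for a custom list of coverage levels, the number and percentage of target positions with coverage at or above each level. The function name coverage_summary and the toy per-base coverage values are illustrative assumptions. The tool's own calculation may differ in detail.

```python
def coverage_summary(per_base_coverage, levels):
    """For each coverage level, return (count, percentage) of target
    positions whose coverage is greater than or equal to that level."""
    total = len(per_base_coverage)
    summary = {}
    for level in levels:
        count = sum(1 for c in per_base_coverage if c >= level)
        percent = 100.0 * count / total if total else 0.0
        summary[level] = (count, percent)
    return summary

# Toy per-base coverage over eight target positions (made-up values).
coverage = [0, 5, 12, 30, 30, 45, 100, 8]
for level, (count, percent) in coverage_summary(coverage, [1, 10, 30]).items():
    print(f">= {level}x: {count} positions ({percent:.1f}%)")
```
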
Working with a CLC Server
  • The CLC Server connection dialog will present information like the version and port of the selected CLC Server prior to login, when that information is available.
  • The CLC Server connection dialog will close automatically after the “Log In” button is clicked. The login process runs in the background, indicated by a flashing server icon in the lower left corner of the Workbench.
  • When the Workbench loses connection to a CLC Server, it attempts to reestablish the connection. Open views of data stored on the CLC Server are not closed.
  • When selecting files stored on a CLC Server, only files with the relevant extensions are listed, together with their date of last modification and the size of the files.
Other improvements
  • Improved the alignment quality for read mappings by removing aligned ends with an alignment score of zero. As a result, some alignments will be shorter and may be filtered away because they no longer pass the minimum length fraction criterion. Tools benefiting from this change include Map Reads to Reference, RNA-Seq Analysis, Map Reads to Contigs and Map Bisulfite Reads to Reference.
  • Option names and other information in the wizards for the Trim Reads tool and the corresponding workflow element have been updated for clarity and consistency.
  • De Novo Assembly reports can be used as input to the Combine Reports tool.
  • A new option, “Filter on average expression for FDR correction” is available in Differential Expression for RNA-Seq and Differential Expression in Two Groups. When checked, automatic, independent filtering prior to FDR correction is carried out, with the aim of increasing power.
  • A Chromosome Table View is available for tracks and track lists, providing a chromosome-level summary of the data contained in the track or track list.
  • Stand-alone Read Mapping, Contig and BLAST Graphics views support wrapped sequence layouts. The relevant option is available in the side panel. This may be of particular interest when working with Sanger trace data.
  • Reference data downloaded via Download Genomes includes the version number as part of the name.
  • The behavior of track views when making selections in the neighborhood of insertions has been improved.
  • Import Metadata uses the name of the imported spreadsheet when naming the resulting metadata table.
  • The element History view has been updated, and its performance has been improved when handling many history entries.
  • When hovering the mouse cursor over a Sequence List in the Navigation Area, the tooltip includes information about the sequencing platform, if this information is available.
  • Improved the rendering of annotations in the Export Graphics tool when used with the “Export whole area” option.
  • When configuring the Demultiplex Reads tool, tags can be moved up and down.
  • The list of file types automatically associated with the Workbench has been updated to only include CLC files (.clc). On Mac OS only, the Workbenches would previously be associated with the set of file types that can be imported using the ‘Standard Import’ tool. The Workbench can still be associated to any given file type using the standard tools of the operating system in question.
  • Annotate with Overlap Information and Filter Based on Overlap count insertions and zero-length annotations as overlapping a region when they overlap either border. For example, an insertion located exactly on the border of a gene is considered to overlap the gene.
  • Data from the BGISEQ platform is supported for download using the Search for Reads in SRA tool.
  • The SRA toolkit has been updated to version 2.10.7.
  • Plots and tables generated by QC for Sequencing Reads have better usability, especially when working with long reads. Tables with more than 500 data points now show the first 100 entries and then bin remaining data points, based on range. In graphs, end positions with a coverage below 0.005% across the reads are not included.
  • QC for Sequencing Reads reports the QC metrics separately for different read types found in the input data: unpaired reads, R1 reads and R2 reads.
  • In Quantify miRNA, the minimum value for the setting “Minimum sequence length”, used for seed counting, has been changed to 8. (The seed is a 7 nucleotide sequence from positions 2-8 on the mature miRNA.)
  • The Quantify miRNA output tables, “Grouped on mature” and “Grouped on seed”, contain links to miRBase.
  • A new section has been added to the Call Methylation Levels report containing details of read conversion and direction.
  • Remove Duplicate Mapped Reads outputs reads in a deterministic order.
  • The “Reads trimmed (%)” column in the “Trim summary” section of the Combine Reports output has been removed as it was a duplicate of the “Reads after trim (%)” column.
  • Custom attributes can be configured in a data location such that attribute values are not copied when copying data elements.
  • Data locations can be added when using the Workbench in Viewing Mode.
  • Various minor improvements
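
The Quantify miRNA item above defines the seed as the 7 nucleotide sequence at positions 2-8 of the mature miRNA, which is why a sequence needs at least 8 nucleotides for seed counting. A minimal Python sketch of that definition follows. The function name and the example sequence are chosen for illustration, and the tool's internals are not described in these notes.

```python
MIN_SEQUENCE_LENGTH = 8  # new minimum stated in the notes above

def seed(mature_sequence):
    """Return the 7-nucleotide seed: 1-based positions 2-8 of the sequence."""
    if len(mature_sequence) < MIN_SEQUENCE_LENGTH:
        raise ValueError("sequence too short to contain a full seed")
    # 1-based positions 2..8 correspond to the 0-based slice [1:8].
    return mature_sequence[1:8]

# Example mature sequence, used only to show the slicing.
print(seed("UGAGGUAGUAGGUUGUAUAGUU"))
```
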

 

Bug fixes

  • Fixed an issue affecting Filter on Custom Criteria when included in a workflow with the filtering step option unlocked. If criteria were updated, added, or removed in the launch wizard, the updated criteria were not used in the first run of the workflow. Instead, the old criteria were used in that run. In subsequent runs, the updated values were used.
  • Fixed an issue in Filter on Custom Criteria where, after loading annotations in the wizard, comparison operators for existing filter criteria would be set to defaults, while the original values of those operators, configured before the annotations were loaded, were actually used in the analysis.
  • Fixed an issue affecting read mappings where a short deletion was preferred to a mismatch for equal scoring alignments. Tools benefiting from this change include Map Reads to Reference, RNA-Seq Analysis, Map Reads to Contigs and Map Bisulfite Reads to Reference.
  • Fixed an issue in Trim Reads where length filters were applied before automatic read-through adapter trimming was done, if it was enabled. This could result in reads shorter than minimum length settings being included in the output.
  • Fixed an issue affecting Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection, where forward coverage or reverse coverage could be reported as being higher than it was when looking for very low frequency variants with very low minimum count values.
  • Fixed an issue affecting annotations of restriction sites for enzymes cutting within the recognition site, where the arms indicating the cut site spanned large sequence regions, instead of indicating the cut site.
  • Fixed an issue where the Search for Sequences at NCBI tool would occasionally return fewer rows than configured in the ‘Number of hits’ preference setting.
  • Fixed an issue where IonTorrent SAM files with special characters in the sample name could not be imported to separate folders.
  • Fixed an issue where Map Reads to Reference could occasionally ignore reads when encountering a read with an unaligned end that wraps twice around a chromosome.
  • Fixed an issue in Quantify miRNA where the isomiRs associated with a reference miRNA were not all consistently named using the miRBase isomiR nomenclature (http://www.mirbase.org/help/nomenclature.shtml).
  • Fixed an issue where the unmapped reads output by Quantify miRNA could not be connected to nucleotide sequence list input channels, like the Sequence Reads input channel of the RNA-Seq Analysis element.
  • Fixed an issue in Create Heat Map for RNA-Seq affecting the “Fixed number of features” option, where one member of the set of most variable genes or transcripts was missing from those used in the analysis, with a slightly less variable feature included instead.
  • Fixed an issue in Create Heat Map for RNA-Seq, where the “Filter by statistics” option could not be used with miRNA expression data.
  • Fixed an issue in Create Heat Map for RNA-Seq, where the history of heat maps did not include the name or the version of the tool used.
  • Fixed an issue where RNA-Seq Analysis failed if a read mapped across 2 exons of a gene, where those 2 exons spanned the origin of a chromosome.
  • Fixed an issue where RNA-Seq Analysis failed if a gene or mRNA spanned the origin of a chromosome and that chromosome was marked as linear. We now ignore these mRNAs.
  • Fixed an extremely rare issue where RNA-Seq Analysis could fail when the positions of genes (or transcripts) were defined with respect to a sequence that was not part of the genome. An example of this kind of annotation is the remote entry identifier allowed by the GenBank flat file format, see http://www.insdc.org/files/feature_table.html#3.4. These genes and transcripts are now filtered away prior to the tool being run.
  • Fixed an issue that caused Combine Reports to occasionally fail when combining reports with summary information shown as plots.
  • Fixed an issue with Combine Reports where, when combining RNA-Seq reports, warning messages for the “Distribution of biotypes” section could be present when they should not have been.
  • Fixed an issue in the Navigation Area that caused move operations to be converted to copy operations if the Navigation Area was refreshed before the move was completed.
  • Fixed an issue where “Reset tree topology” in the phylogenetic tree editor could fail in some cases when input sequences were extracted from (at least) two different trees.
  • Fixed an issue where a wrongly formatted VCF file could make the VCF importer terminate instead of writing the error to the log.
  • Fixed an issue where Transcription Factor ChIP-Seq would exit with an error when given a read mapping with a circular reference sequence with coverage across all bases.
  • Fixed an issue affecting the Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection tools, where complex indels were reported in regions where the reference had a sequence of Ns. This error was introduced in CLC Genomics Workbench 20.0.2.
  • Fixed an issue that could cause De Novo Assembly to occasionally fail when assembling paired data with both the “Auto detect paired distances” and “Map reads back to contigs (slow)” options enabled.
  • Fixed an issue with links to HGNC in gene tracks imported from GFF3 files using “Import Tracks from File” and in some Refseq gene tracks provided via the Reference Data Manager.
  • Fixed an issue where external files could not be opened on Windows Server 2019.
  • Fixed an issue where, on Mac OS, clicking CLC URLs (“clc://…”) would open an older version of the Workbench even after installing a new version.
  • Fixed an issue where a path specified in the path.properties configuration file was not properly interpreted if it was specified in ‘Windows syntax’, e.g. “x:\myDrive\temp”.
  • Fixed an issue in Excel importer, where the presence of certain formulas would previously prevent successful import.
  • Fixed an issue where, if BLAST at NCBI failed with an error, no error would be shown and instead no hits were returned.
  • Fixed an issue where Track Lists sometimes could not display reference data.
  • Fixed an issue where the Ctrl+F keyboard shortcut did not activate the ‘Find’ side panel when viewing a track list.
  • Fixed an issue where some workflows using a Collect and Distribute element with multiple output channels did not pass the correct inputs to a tool after the Collect and Distribute element.
  • Fixed an issue where compressed data elements sometimes appeared to have “Compression enabled: No” in the Element Info tab.
  • Fixed an issue that caused the updating of plugins containing workflows (e.g. Biomedical Genomics Analysis, CLC Microbial Genomics Module) to be slow.
  • Various minor bug fixes
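
The Trim Reads fix listed above comes down to the order of two steps. The Python sketch below shows why applying the length filter before adapter trimming can let too-short reads through. Adapter trimming is reduced here to cutting at the first exact adapter match, which is a deliberate simplification and not how read-through adapter trimming works in the tool. All names and sequences are illustrative.

```python
def trim_adapter(read, adapter):
    """Cut the read at the first exact occurrence of the adapter (simplified)."""
    idx = read.find(adapter)
    return read if idx == -1 else read[:idx]

def filter_then_trim(reads, adapter, min_length):
    """Order described in the bug: length filter first, trimming second."""
    kept = (r for r in reads if len(r) >= min_length)
    return [trim_adapter(r, adapter) for r in kept]

def trim_then_filter(reads, adapter, min_length):
    """Order after the fix: trimming first, length filter second."""
    trimmed = (trim_adapter(r, adapter) for r in reads)
    return [r for r in trimmed if len(r) >= min_length]

adapter = "AGATCGGAAGAGC"
reads = ["ACGTACGT" + adapter, "ACG" + adapter]

# The second read is only 3 nt long once the adapter is removed.
print(filter_then_trim(reads, adapter, min_length=5))
print(trim_then_filter(reads, adapter, min_length=5))
```
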

Changes

Other changes
  • The Java version bundled with CLC Genomics Workbench 21.0 is Java 11.0.8, using the JRE from AdoptOpenJDK.
  • The read mapping tool used by various tools in the CLC Genomics Workbench (e.g. Map Reads to Reference, RNA-Seq Analysis, Map Reads to Contigs and Map Bisulfite Reads to Reference) has been updated for this release and corresponds to the version in CLC Assembly Cell 5.2.1. Other binaries are unchanged and continue to correspond to the versions in CLC Assembly Cell 5.1.1.
  • The default base name for the element being exported is designated using the placeholder {name}, instead of {input}. The numeric equivalent, {1}, is unchanged. The default export naming pattern has correspondingly been changed to {name}.{extension}.
  • The default expect value (e-value) for BLAST at NCBI is 0.05 and the maximum number of hits is 5000, aligning with the defaults used at the NCBI.
  • Changes have been made to the handling of sequence identifiers when using Create BLAST Database. This change allows continued flexibility in the naming of sequences used for making these databases, avoiding direct exposure to limitations present in the underlying BLAST+ program, makeblastdb, such as not allowing long or duplicate sequence names. Further details are provided in our FAQ area.
  • The option “Reports originate from a single sample” has been removed from the Combine Reports tool. For generation of a single sample combined report, please use the new Create Sample Report tool.
  • The “Chromosome M name” option in Trio Analysis has been renamed to “Chromosome MT name”, with default value “MT” instead of “M”.
  • The creation of Workflow Result Metadata tables is optional when running workflows on the CLC Genomics Server.

Functionality retirement

Tools

  • Reverse Sequence

Plugin notes

Plugin retirements

  • The Ingenuity Variant Analysis plugin has been retired. The Ingenuity Variant Analysis service has been replaced by QCI Interpret Translational. Please email ts-bioinformatics@qiagen.com if you need further information about this.
  • The Advanced Structural Variant Detection (beta) plugin has been retired. An improved tool, Structural Variant Caller, is available in the Biomedical Genomics Analysis plugin.
  • The functionality of the External Applications Client Plugin is now built into CLC Workbenches, so this plugin has been retired.

Early Access installers

These products are not supported, and we recommend they are not used in production during the early access period.

Due to issues identified in version 21.0, the version now available for early access use is 21.0.1.