This can be used for a variety of applications. The most common ones are:
removing sequences from the host
removing ribosomal sequences
removing contaminants
This function uses minimap2 to align and identify hits and does not require a prebuilt index.
remove_reference(reads, out, reference, alignments = NA, threads = 3)
reads: A character vector containing the read files in fastq format. Can be generated using find_read_files.
out: A folder in which to save the filtered fastq files.
reference: Path to a fasta file (can be gzipped) that contains the sequences to filter. Can be a genome or transcripts.
alignments: Whether to keep the alignments. If not NA, this should be a string giving the path to the output bam file.
threads: How many threads to use for mapping.
A numeric vector with two entries: the number of sequences remaining after filtering (non-mapped) and the number of removed sequences (mapped).
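A minimal usage sketch based on the signature and return value described above. The file and folder names are hypothetical placeholders, not files shipped with the package. The call is guarded so that it only runs when `remove_reference` is available in the session and the input files exist.

```r
# Hypothetical paths (assumptions for illustration); replace with real files.
reads <- c("data/sample1.fastq", "data/sample2.fastq")
reference <- "refs/host_genome.fa.gz"  # fasta file, may be gzipped
out <- "filtered"                      # folder for the filtered fastq files

# Run only if the function is loaded and all input files are present.
have_function <- exists("remove_reference", mode = "function")
have_inputs <- all(file.exists(c(reads, reference)))

if (have_function && have_inputs) {
    counts <- remove_reference(
        reads, out, reference,
        alignments = NA,  # default: do not keep the alignments
        threads = 3       # default number of mapping threads
    )
    # Two entries: sequences kept (non-mapped) and sequences removed (mapped).
    print(counts)
} else {
    message("remove_reference or the example input files are missing; skipping.")
}
```

To keep the alignments, `alignments` can be set to an output path such as `"filtered/host.bam"` (also a hypothetical name) instead of `NA`.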