This function searches for NCBI records corresponding to the species listed in the argument `ncbi_tax`, i.e. the output of the function `get_ncbi_taxonomy()`.
It can also search for accession numbers and return the same type of object.
For a thorough explanation of the function usage and capabilities, see the
'Introduction to the barcodeMineR package' vignette:
vignette("Introduction to the barcodeMineR package", package = "barcodeMineR")
Usage
download_ncbi(
ncbi_tax = NULL,
ncbi_ids = NULL,
rate_xml = 200,
rate_fasta = 100,
default.filter = TRUE,
filter = NULL,
api_rate = NULL,
ask = TRUE,
prefix = NULL
)
Arguments
- ncbi_tax
`data.frame` A data frame, as returned from the `get_ncbi_taxonomy()` function.
- ncbi_ids
`character` A character vector with NCBI accession numbers.
- rate_xml
`integer` The number of xml objects to be downloaded at a time. It can be lowered for unstable internet connections. Defaults to `200`.
- rate_fasta
`integer` The number of fasta sequences to be downloaded at a time. Many fasta sequences can correspond to mitogenomes and chromosomes, which may lead to errors if downloaded in large numbers. Defaults to `100`.
- default.filter
`logical` Whether to filter the records, excluding whole genome shotgun sequences and transcribed shotgun assemblies. Defaults to `TRUE`.
- filter
`character` An additional query filter, given as one or more strings, that is added to every searched taxid. This allows the user to restrict every search with a custom query. Multiple filters should be provided as a character vector in which each element is a single query filter (see description for details). Defaults to `NULL`.
- api_rate
`integer` The API rate at which the separate requests are issued. Must be a number between 3 and 10, which translates into one request every `1 / api_rate` seconds.
- ask
`logical` Whether the function should ask the user to filter the final output by taxonomic rank. Defaults to `TRUE`.
- prefix
`character` A character string used to create numbered custom ids for each record, in ascending order. The prefix will compose the recordID field in the final object. Defaults to `NULL`, which uses the internal recordID generator: the accession number for NCBI records and the processID for BOLD records, with duplicates avoided by appending `_1`, `_2`, etc.
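The numeric arguments above can be illustrated with a few lines of base R. This is only a sketch of the arithmetic they imply, not the package's internal code: the accession numbers are made up, the batching via `split()` is an assumption about how "a number of objects at a time" could be implemented, and `make.unique()` is just one way to produce `_1`, `_2` suffixes (the internal recordID generator may behave differently).

```r
# Hypothetical accession numbers, used only to illustrate the batching
ids <- sprintf("ACC%04d.1", 1:450)
rate_xml <- 200

# Split the ids into consecutive groups of at most rate_xml elements
batches <- split(ids, ceiling(seq_along(ids) / rate_xml))
lengths(batches)  # sizes of the groups: 200, 200, 50

# An api_rate between 3 and 10 corresponds to a pause of 1 / api_rate seconds
api_rate <- 3:10
delay <- 1 / api_rate
round(delay, 3)   # from 0.333 s (api_rate = 3) down to 0.1 s (api_rate = 10)

# One possible way to de-duplicate ids with "_1", "_2" suffixes (assumption)
record_ids <- make.unique(
  c("HG423800.1", "HG423800.1", "LN850239.1", "HG423800.1"),
  sep = "_"
)
record_ids
```

Lowering `rate_xml` or `rate_fasta` gives more, smaller groups, which is why smaller values may help on unstable connections.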
Examples
tax <- get_ncbi_taxonomy("Polymastia invaginata")
download_ncbi(tax, ask = FALSE)
#> # A tibble: 16 × 30
#> recordID markerCode DNA_seq phylum class order family genus species source
#> <chr> <chr> <DNA> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 HG423800.1 28S rRNA TAGCCC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 2 HG423799.1 28S rRNA TAGCCC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 3 HG423770.1 28S rRNA ACACGG… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 4 HG423769.1 28S rRNA ACACGG… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 5 HG423740.1 28S rRNA TTAAGC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 6 HG423739.1 28S rRNA TTAAGC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 7 HG423712.1 COI GACTCT… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 8 HG423711.1 COI GACTCT… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 9 LN850239.1 COI GTATGT… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 10 LN850219.1 COI GTATGT… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 11 LN850218.1 COI GTATGT… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 12 LN606560.1 28S rRNA TAGCCC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 13 LN606530.1 28S rRNA ACACGG… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 14 LN606500.1 28S rRNA TTAAGC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 15 LN606462.1 COI GACTCT… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> 16 AY561922.1 28S large … CGGCCC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI
#> # ℹ 20 more variables: lat <dbl>, lon <dbl>, lengthGene <int>, sampleID <chr>,
#> # QueryName <chr>, identified_by <lgl>, taxNotes <lgl>, db_xref <chr>,
#> # sourceID <chr>, NCBI_ID <chr>, institutionStoring <lgl>,
#> # collected_by <lgl>, collection_date <chr>, altitude <lgl>, depth <lgl>,
#> # country <lgl>, directionPrimers <lgl>, lengthSource <int>,
#> # PCR_primers <lgl>, note <lgl>
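The accession-number search mentioned in the description can be sketched in the same way. The call below only uses arguments listed under Usage, with accession numbers copied from the output above; it is guarded so that it runs only in an interactive session with the package installed, since it needs a working internet connection. The result is not shown here because it depends on the current state of the NCBI database.

```r
# Accession numbers taken from the example output above
ids <- c("HG423800.1", "LN850239.1", "AY561922.1")

# Run the download only interactively and when barcodeMineR is available;
# the call itself requires internet access.
if (interactive() && requireNamespace("barcodeMineR", quietly = TRUE)) {
  records <- barcodeMineR::download_ncbi(ncbi_ids = ids, ask = FALSE)
  records
}
```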