Download records from the NCBI

This function searches for NCBI records corresponding to the species found in the argument `ncbi_tax`, i.e. the output from the function `get_ncbi_taxonomy`.

It can also search for accession numbers and return the same type of object.

For a thorough explanation of the function usage and capabilities, see the 'Introduction to the barcodeMineR package' vignette: vignette("Introduction to the barcodeMineR package", package = "barcodeMineR")

Usage

download_ncbi(
  ncbi_tax = NULL,
  ncbi_ids = NULL,
  rate_xml = 200,
  rate_fasta = 100,
  default.filter = TRUE,
  filter = NULL,
  api_rate = NULL,
  ask = TRUE,
  prefix = NULL
)

Arguments

ncbi_tax: `data.frame` A data frame, as returned from the `get_ncbi_taxonomy()` function.
ncbi_ids: `character` A character vector with NCBI accession numbers.
rate_xml: `integer` The number of xml objects to be downloaded at a time. It can be lowered for unstable internet connections. Defaults to `200`.
rate_fasta: `integer` The number of fasta sequences to be downloaded at a time. Many fasta records may correspond to mitogenomes or chromosomes, which can lead to errors if they are downloaded in large numbers. Defaults to `100`.
default.filter: `logical` Whether to exclude whole genome shotgun sequences and transcriptome shotgun assemblies from the records. Defaults to `TRUE`.
filter: `character` An additional query filter, given as one or more strings, that is added to the search for every taxid. This lets the user restrict every search with a custom query. Multiple filters should be provided as a character vector with one query filter per element (see description for details). Defaults to `NULL`.
api_rate: `integer` The API rate at which separate requests are sent. Must be a number between 3 and 10, which translates into a delay of `1 / api_rate` seconds between requests. Defaults to `NULL`.
ask: `logical` Whether the function should ask the user to filter the final output for taxonomic ranks. Defaults to `TRUE`.
prefix: `character` A character string used to create numbered custom ids for each record, in ascending order. The prefix will compose the recordID field in the final object. Defaults to `NULL`, in which case the internal recordID generator is used: it takes the accession number for NCBI records and the processID for BOLD records, and avoids duplicates by appending `_1`, `_2`, etc.
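
The `api_rate` argument described above comes down to simple arithmetic, which the following sketch spells out. `request_delay()` is a hypothetical helper written only for illustration; it is not part of barcodeMineR and may not match how the package validates or applies the rate internally.

```r
# Hypothetical helper (not a barcodeMineR function): converts an api_rate
# value into the pause, in seconds, between two consecutive requests.
request_delay <- function(api_rate) {
  # The documentation states that api_rate must lie between 3 and 10.
  stopifnot(
    is.numeric(api_rate),
    length(api_rate) == 1,
    api_rate >= 3,
    api_rate <= 10
  )
  1 / api_rate
}

# Pause for the lowest and highest allowed rates.
delays <- vapply(c(3, 10), request_delay, numeric(1))
round(delays, 3)
#> [1] 0.333 0.100
```

A higher `api_rate` therefore means shorter pauses and faster downloads, provided the NCBI servers accept requests at that rate.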

Value

`data.frame` A refdb data frame, including the DNA sequence as a field.

Examples

tax <- get_ncbi_taxonomy("Polymastia invaginata")

download_ncbi(tax, ask = FALSE)
#> # A tibble: 16 × 30
#>    recordID   markerCode  DNA_seq phylum class order family genus species source
#>    <chr>      <chr>       <DNA>   <chr>  <chr> <chr> <chr>  <chr> <chr>   <chr> 
#>  1 HG423800.1 28S rRNA    TAGCCC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#>  2 HG423799.1 28S rRNA    TAGCCC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#>  3 HG423770.1 28S rRNA    ACACGG… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#>  4 HG423769.1 28S rRNA    ACACGG… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#>  5 HG423740.1 28S rRNA    TTAAGC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#>  6 HG423739.1 28S rRNA    TTAAGC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#>  7 HG423712.1 COI         GACTCT… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#>  8 HG423711.1 COI         GACTCT… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#>  9 LN850239.1 COI         GTATGT… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#> 10 LN850219.1 COI         GTATGT… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#> 11 LN850218.1 COI         GTATGT… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#> 12 LN606560.1 28S rRNA    TAGCCC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#> 13 LN606530.1 28S rRNA    ACACGG… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#> 14 LN606500.1 28S rRNA    TTAAGC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#> 15 LN606462.1 COI         GACTCT… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#> 16 AY561922.1 28S large … CGGCCC… Porif… Demo… Poly… Polym… Poly… Polyma… NCBI  
#> # ℹ 20 more variables: lat <dbl>, lon <dbl>, lengthGene <int>, sampleID <chr>,
#> #   QueryName <chr>, identified_by <lgl>, taxNotes <lgl>, db_xref <chr>,
#> #   sourceID <chr>, NCBI_ID <chr>, institutionStoring <lgl>,
#> #   collected_by <lgl>, collection_date <chr>, altitude <lgl>, depth <lgl>,
#> #   country <lgl>, directionPrimers <lgl>, lengthSource <int>,
#> #   PCR_primers <lgl>, note <lgl>
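
As noted in the description, the function can also search for accession numbers through `ncbi_ids`. The sketch below reuses two accession numbers from the output above. It assumes that barcodeMineR is installed and that the NCBI servers are reachable, so the call is guarded and no output is shown; passing `ask = FALSE` alongside `ncbi_ids` mirrors the example above and is an assumption rather than a documented requirement.

```r
# Accession numbers taken from the example output above.
ids <- c("HG423800.1", "LN850239.1")

# Only attempt the download when the package is available; the call
# needs an internet connection to reach the NCBI.
if (requireNamespace("barcodeMineR", quietly = TRUE)) {
  records <- barcodeMineR::download_ncbi(ncbi_ids = ids, ask = FALSE)
}
```

According to the description, the result should be the same type of refdb data frame as the one returned for a taxonomy search.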
