2025-sourmash-ncbi-viral-databases

Build infrastructure for creating sourmash databases from all viral genomes in NCBI.

The strategy is:

- Use the NCBI datasets API to retrieve accessions and taxids for all the viral genomes in NCBI, then create a lineage CSV for them.
- Use the directsketch plugin to build skip-mer sketches for all of them, with the parameter string -p skipm2n3,k=24,scaled=50.
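
As a rough illustration of how the two steps connect, the sketch below converts a tab-separated table of accession, taxid, and name into an `accession,name` CSV of the kind a sketching step could consume. The input layout, the function name `make_sketch_csv`, and the sample row are made up for illustration and are not taken from this repository's scripts. Prefixing each name with its accession is a choice that lets a lineage CSV keyed by accession be matched against the sketches later.

```shell
# Hypothetical helper: read "accession<TAB>taxid<TAB>name" rows from a file
# (or stdin when no file is given) and print an "accession,name" CSV.
make_sketch_csv() {
    awk -F'\t' '
        BEGIN { print "accession,name" }
        NF >= 3 && $1 != "" {
            name = $1 " " $3            # name starts with the accession
            gsub(/"/, "\"\"", name)     # escape embedded double quotes for CSV
            printf "%s,\"%s\"\n", $1, name
        }
    ' "${1:--}"
}

# Placeholder row (not a real genome), just to show the shape of the output.
printf 'GCF_000000000.1\t12345\tExample virus\n' | make_sketch_csv

# Assumed follow-up, not run here; confirm the flags with
# `sourmash scripts gbsketch --help` before relying on them:
#   sourmash scripts gbsketch viral.csv -o viral.zip \
#       -p skipm2n3,k=24,scaled=50 --failed viral.failed.csv
```

Rows with fewer than three fields or an empty accession are skipped silently, so it is worth comparing the row counts of the input and output before sketching.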

How to install taxdump for taxonkit

You'll need to install the NCBI taxonomy taxdump for taxonkit, which is used by pytaxonkit.

In some stable directory (one you won't move or delete later, since the symlinks point back into it), run:

wget -c ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -zxvf taxdump.tar.gz
mkdir -p $HOME/.taxonkit
ln -s "$PWD/names.dmp" "$PWD/nodes.dmp" "$PWD/delnodes.dmp" "$PWD/merged.dmp" "$HOME/.taxonkit/"
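
After linking, it can help to confirm that the four files actually resolve from the taxonkit data directory, since a symlink with a relative target can end up dangling. The check below is a minimal sketch; the function name `check_taxdump` is made up for illustration, and it assumes taxonkit is looking in `$HOME/.taxonkit` as set up above.

```shell
# Hypothetical helper: return 0 only if all four taxdump files exist and are
# readable in the given directory (default: $HOME/.taxonkit). Dangling
# symlinks count as missing, because `-r` follows the link.
check_taxdump() {
    local dir="${1:-$HOME/.taxonkit}"
    local rc=0
    local f
    for f in names.dmp nodes.dmp delnodes.dmp merged.dmp; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f" >&2
            rc=1
        fi
    done
    return "$rc"
}

check_taxdump "$HOME/.taxonkit" && echo "taxdump looks OK"
```

If any file is reported missing, `ls -l $HOME/.taxonkit` shows where each link points.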
