CDesk also provides data download tools. This chapter shows how to use CDesk to download data and offers some tips for using the CDesk pipeline.
The CDesk tools download module enables high-speed, parallel downloading of SRR/ERR data using Aspera. The following example shows how to use it.
CDesk tools download \
-i /.../input.txt -o /.../output_directory \
--key /.../keyfile
| Parameters(*necessary) | Description | Default value |
|---|---|---|
| -i,--input* | The input sample list file | |
| -o,--output* | The output directory | |
| -k,--key | The ASCP Key Path | |
| -t,--thread | The number of threads to use | 5 |
If the pipeline runs successfully, the output directory contains one folder per sample holding the fastq.gz files.
Checking required tools...
1 SRR/ERR samples in total
📥 Downloading SRR35012809
⏳ Waiting for all downloads to complete...
SRR35012809_1.fastq.gz 100% 1618MB 93.8Mb/s 04:26
SRR35012809_2.fastq.gz 100% 1592MB 49.4Mb/s 10:45
Completed: 3287242K bytes transferred in 645 seconds (41688K bits/sec), in 2 files, 1 directory.
✅ All downloads completed!
An input SRR/ERR file list:
SRR5489315
SRR35012809
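Before starting a long download, it can help to confirm that every line of the input list looks like a run accession. The following is a minimal sketch and not part of CDesk: the helper name `check_accessions` and the pattern (an `SRR` or `ERR` prefix followed by digits) are assumptions made for illustration.

```python
import re

# Assumed pattern: a run accession is SRR or ERR followed by digits.
ACCESSION = re.compile(r"^(SRR|ERR)\d+$")


def check_accessions(lines):
    """Split the lines of an input list into valid and invalid entries (hypothetical helper)."""
    valid, invalid = [], []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines
        if ACCESSION.match(entry):
            valid.append(entry)
        else:
            invalid.append(entry)
    return valid, invalid


valid, invalid = check_accessions(["SRR5489315", "SRR35012809", "", "sample1"])
print("valid:", valid)
print("invalid:", invalid)
```

To check a real file, pass the result of `open(path).read().splitlines()` to the helper instead of the inline list.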
The CDesk tools prepare module generates the input CSV file that describes the sequence data for the subsequent preprocessing analysis. The following example shows how to use it.
CDesk tools prepare \
-i /.../input_directory -o /.../output_directory \
--pair _R1.fq.gz,_R2.fq.gz --single _single.fq.gz
| Parameters(*necessary) | Description | Default value |
|---|---|---|
| -i,--input* | The input directory | |
| -o,--output* | The output directory | |
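| --pair | The filename suffixes of paired-end reads, comma-separated (e.g. _R1.fq.gz,_R2.fq.gz) | |
| --single | The filename suffix of single-end reads (e.g. _single.fq.gz) | |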
If the pipeline runs successfully, the output directory contains a CSV file recording the sequence data information, which can be used as the input CSV for the subsequent preprocessing analysis.
Files in the input directory:
sample1_R1.fq.gz
sample1_R2.fq.gz
sample2_R1.fq.gz
sample2_R2.fq.gz
Sample_single.fq.gz

Command:
CDesk tools prepare \
-i /.../input_directory -o /.../output_directory \
--pair _R1.fq.gz,_R2.fq.gz --single _single.fq.gz

Output csv:
sample,fq1,fq2,ports
Sample,/mnt/linzejie/CDesk_test/data/0.Tools/prepare/Sample_single.fq.gz,/mnt/linzejie/CDesk_test/data/0.Tools/prepare/Sample_single.fq.gz,1
sample2,/mnt/linzejie/CDesk_test/data/0.Tools/prepare/sample2_R1.fq.gz,/mnt/linzejie/CDesk_test/data/0.Tools/prepare/sample2_R2.fq.gz,2
sample1,/mnt/linzejie/CDesk_test/data/0.Tools/prepare/sample1_R1.fq.gz,/mnt/linzejie/CDesk_test/data/0.Tools/prepare/sample1_R2.fq.gz,2
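To make the relationship between the suffix options and the resulting CSV easier to follow, here is a small sketch of suffix-based grouping. It is an illustration only, not CDesk's implementation: the function `build_rows`, its parameters, and the `/data` root are hypothetical, and the row layout simply mirrors the example output above (a single-end file appears in both fq columns with a count of 1).

```python
import csv
import io
import os


def build_rows(filenames, pair=("_R1.fq.gz", "_R2.fq.gz"), single="_single.fq.gz", root=""):
    """Group file names into per-sample rows by suffix (illustrative, not CDesk's code)."""
    names = set(filenames)
    rows = []
    for name in sorted(names):
        if name.endswith(pair[0]):
            sample = name[: -len(pair[0])]
            mate = sample + pair[1]
            if mate in names:  # only emit a paired row when both mates exist
                rows.append([sample, os.path.join(root, name), os.path.join(root, mate), 2])
        elif name.endswith(single):
            sample = name[: -len(single)]
            path = os.path.join(root, name)
            # Mirror the example output: the single-end file fills both fq columns.
            rows.append([sample, path, path, 1])
    return rows


files = [
    "sample1_R1.fq.gz", "sample1_R2.fq.gz",
    "sample2_R1.fq.gz", "sample2_R2.fq.gz",
    "Sample_single.fq.gz",
]
rows = build_rows(files, root="/data")  # "/data" is a placeholder directory

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["sample", "fq1", "fq2", "ports"])  # header copied from the example output
writer.writerows(rows)
print(out.getvalue())
```

The sketch sorts file names for reproducibility, so its row order may differ from the order CDesk writes.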
Here are some problems you might encounter while using the CDesk pipeline.
Before you run CDesk bulk preprocess, you may need to check that the gfold software is available.
which gfold
If gfold reports the following error:
gfold: error while loading shared libraries: libgsl.so.0: cannot open shared object file: No such file or directory
The error occurs because the GSL installation in the environment does not provide a file named libgsl.so.0, which gfold tries to load, even though libgsl.so contains the same library. Creating a hard link named libgsl.so.0 that points to libgsl.so resolves the problem.
cd /.../envs/CDesk/lib
ln libgsl.so libgsl.so.0
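The effect of the link can be checked by testing whether both file names exist in the library directory. The snippet below is a minimal sketch: `gsl_link_status` is a hypothetical helper, and a temporary directory with an empty placeholder file stands in for the real `.../envs/CDesk/lib` directory so the example runs anywhere that supports hard links.

```python
import os
import tempfile


def gsl_link_status(lib_dir):
    """Report which of the two GSL file names exist in lib_dir (hypothetical helper)."""
    return {
        name: os.path.exists(os.path.join(lib_dir, name))
        for name in ("libgsl.so", "libgsl.so.0")
    }


# A throwaway directory stands in for the environment's lib directory.
with tempfile.TemporaryDirectory() as lib_dir:
    source = os.path.join(lib_dir, "libgsl.so")
    open(source, "w").close()  # empty placeholder for the real library
    before = gsl_link_status(lib_dir)
    # Same effect as running `ln libgsl.so libgsl.so.0` inside lib_dir.
    os.link(source, os.path.join(lib_dir, "libgsl.so.0"))
    after = gsl_link_status(lib_dir)

print("before:", before)
print("after: ", after)
```

Calling `gsl_link_status` on the real library directory before and after running `ln` shows whether the fix took effect.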
When you use CDesk scRNA cluster in scanpy mode, you might encounter the following error:
ModuleNotFoundError: No module named 'importlib.metadata'
The importlib.metadata module has been part of the Python standard library only since Python 3.8, but our environment uses Python 3.7. To fix the error, modify the import as follows.
# .../envs/CDesk_py3.7/lib/python3.7/site-packages/umap/__init__.py
# Original line:
from importlib.metadata import version, PackageNotFoundError
# Replace it with:
from importlib_metadata import version, PackageNotFoundError
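An alternative to replacing the line outright is a fallback import, which keeps the file working on both older and newer Python versions. This is a common idiom rather than something CDesk prescribes, and it assumes the `importlib_metadata` backport package is installed in the Python 3.7 environment. The helper `installed_version` is hypothetical and only demonstrates that the imported names behave the same either way.

```python
try:
    # Standard library location, available from Python 3.8 onward.
    from importlib.metadata import version, PackageNotFoundError
except ImportError:
    # Python 3.7: fall back to the backport (assumed to be installed).
    from importlib_metadata import version, PackageNotFoundError


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("a-package-that-does-not-exist"))
```

`ModuleNotFoundError` is a subclass of `ImportError`, so the `except` clause catches the error shown above.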
If you use CDesk HiC reconstruct flamingo, you need to install FLAMINGOrLite manually first, because it cannot be installed with conda.
devtools::install_github('JiaxinYangJX/FLAMINGOrLite',ref='HEAD')