CDesk: 0. Tools and tips

CDesk also provides tools for downloading data. This chapter shows how to use them and gives some tips for working with the CDesk pipeline.

Data download tools

The CDesk tools download module enables high-speed, parallel downloading of SRR/ERR data using Aspera. The following example shows how to use it to download SRR/ERR data.

CDesk tools download \
-i /.../input.txt -o /.../output_directory \
--key /.../keyfile

Parameters (* = required)	Description	Default value
-i,--input*	The input sample list file
-o,--output*	The output directory
-k,--key	The ASCP Key Path
-t,--thread	The number of threads to use	5

If the pipeline runs successfully, the output directory will contain one folder per sample holding the downloaded fastq.gz files.

Log of a successful CDesk tools download run


Checking required tools...
1 SRR/ERR samples in total
📥 Downloading SRR35012809
⏳ Waiting for all downloads to complete...
SRR35012809_1.fastq.gz  100% 1618MB 93.8Mb/s    04:26    
SRR35012809_2.fastq.gz  100% 1592MB 49.4Mb/s    10:45    
Completed: 3287242K bytes transferred in 645 seconds
 (41688K bits/sec), in 2 files, 1 directory.
✅ All downloads completed!

What should the input file look like?


SRR5489315
SRR35012809

An input SRR/ERR file list
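
Before starting a long download, it can help to check the sample list first. The sketch below is not part of CDesk: the function name `read_accessions` is hypothetical, and it assumes the file holds one accession per line, each consisting of the prefix SRR or ERR followed by digits.

```python
import re
from pathlib import Path

# Assumption: one run accession per line, "SRR" or "ERR" followed by digits.
ACCESSION_RE = re.compile(r"^(SRR|ERR)\d+$")


def read_accessions(path):
    """Return (valid, invalid) lists of entries from a sample list file."""
    valid, invalid = [], []
    for line in Path(path).read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if ACCESSION_RE.match(entry):
            valid.append(entry)
        else:
            invalid.append(entry)
    return valid, invalid
```

Any entries returned in the second list are worth correcting before the file is passed to `-i`.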

Prepare tools

The CDesk tools prepare module generates the input CSV file that describes the sequence data for the subsequent preprocessing analysis. The following example shows how to use it.

CDesk tools prepare \
-i /.../input_directory -o /.../output_directory \
--pair _R1.fq.gz,_R2.fq.gz --single _single.fq.gz

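Parameters (* = required)	Description	Default value
-i,--input*	The input directory
-o,--output*	The output directory
--pair	Comma-separated filename suffixes of the paired-end read files (e.g. _R1.fq.gz,_R2.fq.gz)
--single	Filename suffix of the single-end read files (e.g. _single.fq.gz)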

If the pipeline runs successfully, the output directory will contain a CSV file recording the sequence data information, which can be used as the input CSV for the following preprocess analysis.

An example of using CDesk tools prepare


Files in the input directory:
sample1_R1.fq.gz
sample1_R2.fq.gz
sample2_R1.fq.gz
sample2_R2.fq.gz
Sample_single.fq.gz

command:
CDesk tools prepare \
-i /.../input_directory -o /.../output_directory \
--pair _R1.fq.gz,_R2.fq.gz --single _single.fq.gz
output csv:
sample,fq1,fq2,ports
Sample,/mnt/linzejie/CDesk_test/data/0.Tools/prepare/Sample_single.fq.gz,/mnt/linzejie/CDesk_test/data/0.Tools/prepare/Sample_single.fq.gz,1
sample2,/mnt/linzejie/CDesk_test/data/0.Tools/prepare/sample2_R1.fq.gz,/mnt/linzejie/CDesk_test/data/0.Tools/prepare/sample2_R2.fq.gz,2
sample1,/mnt/linzejie/CDesk_test/data/0.Tools/prepare/sample1_R1.fq.gz,/mnt/linzejie/CDesk_test/data/0.Tools/prepare/sample1_R2.fq.gz,2
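
To make the role of the `--pair` and `--single` suffixes concrete, here is a minimal sketch of how filenames could be grouped into samples by stripping those suffixes. It is an illustration only, not CDesk's actual implementation; the function name `group_by_suffix` and the returned dictionary layout are assumptions made for this example.

```python
def group_by_suffix(filenames, pair_suffixes, single_suffix):
    """Map sample name -> {"fq1": ..., "fq2": ...} using filename suffixes."""
    r1_suffix, r2_suffix = pair_suffixes
    samples = {}
    for name in sorted(filenames):
        if name.endswith(r1_suffix):
            # First mate of a paired-end sample
            samples.setdefault(name[: -len(r1_suffix)], {})["fq1"] = name
        elif name.endswith(r2_suffix):
            # Second mate of a paired-end sample
            samples.setdefault(name[: -len(r2_suffix)], {})["fq2"] = name
        elif name.endswith(single_suffix):
            # Single-end sample: only one read file
            samples.setdefault(name[: -len(single_suffix)], {})["fq1"] = name
        # Files matching none of the suffixes are ignored
    return samples


files = [
    "sample1_R1.fq.gz", "sample1_R2.fq.gz",
    "sample2_R1.fq.gz", "sample2_R2.fq.gz",
    "Sample_single.fq.gz",
]
groups = group_by_suffix(files, ("_R1.fq.gz", "_R2.fq.gz"), "_single.fq.gz")
```

With the files from the example above, `groups` has the keys `Sample`, `sample1` and `sample2`, which correspond to the sample column of the output CSV.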

Tips

Here are some problems you might encounter while using the CDesk pipeline.

GFold bug

Before you run CDesk bulk preprocess, you may want to check that the gfold software is available and runs.

which gfold

If running gfold reports the following error:

gfold: error while loading shared libraries: libgsl.so.0: cannot open shared object file: No such file or directory

The error occurs because this GSL installation does not provide a file named libgsl.so.0, although gfold expects that name; the library itself is present as libgsl.so. Creating a hard link named libgsl.so.0 that points to libgsl.so therefore solves the problem.

cd /.../envs/CDesk/lib
ln libgsl.so libgsl.so.0
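
To check whether the link is needed in a given environment, a small helper such as the one below could be used. It is only a sketch: the function name `needs_gsl_link` is hypothetical, and the library directory passed to it must be your own environment's lib path.

```python
from pathlib import Path


def needs_gsl_link(lib_dir, soname="libgsl.so.0", target="libgsl.so"):
    """Return True if `target` exists in lib_dir but `soname` does not."""
    lib_dir = Path(lib_dir)
    return (lib_dir / target).exists() and not (lib_dir / soname).exists()
```

If it returns True for the CDesk environment's lib directory, the `ln` command above applies.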

scanpy bug

When you use CDesk scRNA cluster in scanpy mode, you might encounter the following error:

ModuleNotFoundError: No module named 'importlib.metadata'

The importlib.metadata module has been part of the Python standard library only since Python 3.8, whereas our environment uses Python 3.7. You can fix this error by editing the import in umap as follows.

# .../envs/CDesk_py3.7/lib/python3.7/site-packages/umap/__init__.py
# Original line:
# from importlib.metadata import version, PackageNotFoundError
# Replace it with:
from importlib_metadata import version, PackageNotFoundError
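
A fallback import is another way to write this change, and it keeps the file working on both Python 3.7 and newer interpreters. This is a common Python idiom rather than something CDesk prescribes, and it assumes the importlib_metadata backport package is installed in the Python 3.7 environment.

```python
try:
    # Python 3.8+: the module is part of the standard library
    from importlib.metadata import version, PackageNotFoundError
except ImportError:
    # Python 3.7: fall back to the backport package (assumed to be installed)
    from importlib_metadata import version, PackageNotFoundError
```

ModuleNotFoundError is a subclass of ImportError, so the `except` clause catches the error shown above.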

FLAMINGOrLite installation

If you use CDesk HiC reconstruct flamingo, you need to install FLAMINGOrLite manually first, because it cannot be installed with conda.

devtools::install_github('JiaxinYangJX/FLAMINGOrLite',ref='HEAD')