WGBS pipeline
Preparing the VM
sudo bash prepare_vm.sh
Running a sample
The reference genome is the GRCh38 no-alt assembly (from ENCODE), to which I added the Lambda phage DNA so that methylation spike-ins are captured. The combined reference is included in the repository, along with the indices built with the "bismark_genome_preparation" utility, so there is no need to re-run that step.
bash wgbs-pipeline.sh /folder/with/fastqfiles/sample_id.R1.fastq.gz /folder/with/fastqfiles/sample_id.R2.fastq.gz
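To process several samples from one folder, the single-sample command above can be wrapped in a loop. The following is only a sketch: it assumes every read pair follows the sample_id.R1.fastq.gz / sample_id.R2.fastq.gz naming shown above and that it is run from the directory containing wgbs-pipeline.sh. The function names mate_of and run_all_pairs and the DRY_RUN variable are hypothetical helpers, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: run wgbs-pipeline.sh once per read pair found in a folder.
# Assumes mates are named <sample_id>.R1.fastq.gz and <sample_id>.R2.fastq.gz.

# Print the R2 path that corresponds to an R1 path (hypothetical helper).
mate_of() {
    printf '%s\n' "${1%.R1.fastq.gz}.R2.fastq.gz"
}

# Run the pipeline for every complete pair in a folder (hypothetical helper).
# With DRY_RUN=1 the commands are printed instead of executed.
run_all_pairs() {
    local fastq_dir="$1" r1 r2
    for r1 in "$fastq_dir"/*.R1.fastq.gz; do
        [ -e "$r1" ] || continue   # the glob matched nothing
        r2="$(mate_of "$r1")"
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: missing mate $r2" >&2
            continue
        fi
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "bash wgbs-pipeline.sh $r1 $r2"
        else
            bash wgbs-pipeline.sh "$r1" "$r2"
        fi
    done
}

# Example: DRY_RUN=1 run_all_pairs /folder/with/fastqfiles
```

Running with DRY_RUN=1 first makes it easy to confirm that each R1 file was matched with the intended R2 file before any alignment starts.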
Creating references
The wgbs-pipeline.sh pipeline should take care of downloading and indexing the
references if the appropriate files cannot be found in the ./ref folder. This
can also be done beforehand with the generate_bs_indices.sh script, which takes
as its argument the folder in which the reference files will be stored. To use
the default location (so that nothing in the pipeline needs to change), run
the following:
bash ./apps/generate_bs_indices.sh ./ref
The script requires samtools, bismark and bowtie2 to be installed in specific
directories, so the prepare_vm.sh script needs to be run first.
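A quick check before generating indices can catch a missing tool early. This sketch assumes the tools are visible on PATH, which may not hold if prepare_vm.sh installs them into fixed directories that are not on PATH; in that case the check would need to test those directories instead. The function name missing_tools is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: list which of the given tools cannot be found on PATH.
# Assumption: the tools are expected on PATH; the pipeline itself may
# look in fixed install directories instead.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: missing_tools samtools bismark bowtie2
```

An empty result means every listed tool was found; any names printed are the ones to install or locate.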