PBGL CNV-seq Analysis
Background
Copy number variation (CNV) analysis using CNVseq, R, Jupyter Notebooks, Miniconda3, Mamba, and Git.
All the commands run in a Linux terminal are preceded by the $ prompt sign. To run a command, copy/past the command without the $ sign. Those commands run in a Jupyter Notebook are preceded by the In [ ]:
Note
This is not an official IAEA publication but is made available as working material. The material has not undergone an official review by the IAEA. The views expressed do not necessarily reflect those of the International Atomic Energy Agency or its Member States and remain the responsibility of the contributors. The use of particular designations of countries or territories does not imply any judgement by the publisher, the IAEA, as to the legal status of such countries or territories, of their authorities and institutions or of the delimitation of their boundaries. The mention of names of specific companies or products (whether or not indicated as registered) does not imply any intention to infringe proprietary rights, nor should it be construed as an endorsement or recommendation on the part of the IAEA.
Installations - Virtual Environments and Software Packages
Before installing any necessary software, it is recommended to check if the computer is running 32-bit or 64-bit for downloading Miniconda3. Run the following to verify the system:
$ uname -m
Miniconda3 (conda) and Mamba
Download the Miniconda3, or simply “conda”, installer:
Run the downloaded installer (for a 64-bit system):
$ bash Miniconda3-latest-Linux-x86_64.sh
Open a new terminal window for conda to take effect. The word (base) should appear in front of the computer name in the terminal window, like so:
Verify the installation and update conda in new terminal window with:
$ conda env list
$ conda update --all
$ conda upgrade --all
Install mamba library/package manager that will be used for installing software dependencies of the tool:
$ conda install mamba --yes
Git Installation and Repo Cloning
Git is required for cloning locally (downloading a copy to your local computer) the PBGL CNVseq Github repository. Git and Github are used for version control of software. It keeps track of development, releases, and issues of a software project.
Install git for cloning the pbgl-cnvseq software repository from Github, where the latetest version of the tool resides:
$ mamba install git --yes
After the instalation, clone PBGL’s CNVseq repository, pbgl-cnvseq, to the local computer in any desired directory.
$ git clone https://github.com/pbgl/pbgl-cnvseq.git
The cloning process will depict the following:
Cloning into 'pbgl-cnvseq'...
remote: Enumerating objects: 628, done.
remote: Counting objects: 100% (336/336), done.
remote: Compressing objects: 100% (239/239), done.
remote: Total 628 (delta 109), reused 292 (delta 78), pack-reused 292
Receiving objects: 100% (628/628), 11.79 MiB | 7.03 MiB/s, done.
Resolving deltas: 100% (190/190), done.
The pbgl-cnvseq repository should have been cloned successfully. Verify that the download is complete by listing the folders/files in the directory.
$ ls -l
The folder called pbgl-cnvseq should be listed in the directory.
Required Libraries with Mamba
CNVseq has multiple dependencies, listed below:
Samtools
R
configr
ggplot2
BiocManager
Bioconductor-GenomicAlignments
Bioconductor-GenomeInfoDb
Jupyter Notebook
IRkernel
There are two ways to create the cnvseq virtual environment and install the necessary libraries to run CNVseq: automatically or manually.
Automatically (faster)
One YAML file, environment.yml, is provided to automatically create a virtual environment and install the dependent libraries through mamba. The file creates the cnvseq virtual environment, along R, Jupyter Notebook, and the R-kernel in Jupyter. It also installs the dependent R libraries. Run environment.yml:
$ mamba env create --file envs/environment.yml
Once done, a list of the virtual environments available can be seen by running:
$ conda env list
Activate (enter) the recently-created virtual environment cnvseq:
$ conda activate cnvseq
Once done, the virtual environment should be activated and all the necessary packages should be installed. This can be verified with:
$ conda list
Manually (slower)
To manually create and activate an environment, run:
$ conda create --name cnvseq
$ conda activate cnvseq
Start running the installations of the necessary libraries:
$ mamba install --channel conda-forge notebook r-irkernel r-biocmanager --yes
$ mamba install --channel bioconda samtools bioconductor-genomicalignments bioconductor-genomeinfodb --yes
$ mamba install --channel r r-ggplot2 --yes
$ mamba install --channel pcgr r-configr --yes
Once done, all the necessary packages should be installed. This can be verified with:
$ conda list
Running Jupyter
To activate Jupyter, run the following in the terminal:
$ jupyter notebook
This command will start a Jupyter session inside the directory the command is run. The user can navigate between directories, visualize files, and edit files in a web browser by clicking on directories or files, respectively.
Look for the directory pbgl-cnvseq and click on it. Click on tool directory, which contains three directories and two Jupyter Notebooks. Here is a breakdown of each:
config:
directory containing configuration files specifying file paths, parameter definitions, and comparison lists
helper-functions:
directory containing R scripts with functions to calculate and plot CNVs
output:
directory that will contain both tab-files and images output after running a CNVseq analysis
two Jupyter Notebooks:
RCNV-seq-PBGL-sorghum-analysis-example.ipynb
example analysis of a comparison between a control and mutant of sorghum
RCNV-seq-template.ipynb
template for the user
Note
Jupyter lets the user duplicate, rename, move, download, view, or edit files in a web browser. This can be done by clicking the box next to a file and choosing accordingly.
Editing the Configuration File
In order to run the CNVseq Jupyter Notebook, the user needs to feed it with a configuration file (config-CNVseq.yml) that specifies the paths to the bam files, comparisons to be done, chromosomes to analyze, and parameter definitions for calculating and plotting CNVs.
The configuration file config-CNVseq.yml can be found in the pbgl-cnvseq/tool/config directory. The configuration file contains the following:
# parameters for CNV/window size calculations defaults from HLiang:
# https://github.com/hliang/cnv-seq
parameters:
annotate: TRUE
bed_file_present: FALSE
bigger: 1.5
log2: 0.6
pvalue: 0.001
window_size: 10000
# list of chromosomes to be analyzed
chromosomes:
- Chromosome1
- Chromosome2
- Chromosome3
- AnotherChromosome
# folder to store outputs; should be inside "output" directory, example:
# output_path: output/run-name or output/organism
output_path: output/organism-being-analyzed
# path to bed file for varying window sizes (optional)
bed_path: /path/to/bed/file.bed
# paths to bam files
paths:
control-1: &control-1 /path/to/bam/file/control-1.bam
mutant-1: &mutant-1 /path/to/bam/file/mutant-1.bam
mutant-2: &mutant-2 /path/to/bam/file/mutant-2.bam
# comparisons to be analyzed
comparisons:
control-1-vs-mutant-1:
control: *control-1
mutant: *mutant-1
control-1-vs-mutant-2:
control: *control-1
mutant: *mutant-2
Note
The user needs to edit config-CNVseq.yml to point towards bam/bed files; specify comparisons and chromosomes to analyze; and define the parameters to calculate/plot CNVs.
One example configuration files is provided (config-CNVseq-PBGL-sorghum-analysis-example.yml). The configuration file config-CNVseq.yml contains multiple fields to be defined by the user.
parameters:
parameters used to create window sizes, thresholds, plots, etc
the parameter defaults are provided accroding to HLiang’s original values
chromosomes:
list of chromosome names to analyze
chromosome names can be found in a bam file’s header using the following samtools command:
$ samtools view -h my_bam_file.bam | less -S
output_path:
directory that will contain both tab-files and images output after running a CNVseq analysis
the defined path will be inside the pbgl-cnvseq/tool/output directory
images:
directory that will store the CNV image outputs per comparison of:
all chromosomes in one plot
each chromosome individually in one plot
any zoomed-in region plot of one chromosome
tab-files:
directory that will contain two types of output tab-delimited files:
hits used to calculate CNVs, containing chromosome, start, end, width, control coverage, mutant coverage
CNVs per chromosome per comparison, containing CNV number, chromosome, start, end, size, log2, and p-value
bed_path:
path pointing to bed file containing targets if using varying window sizes
it is recommended to run the analysis without a .bed file; plots will only reflect the targets in a .bed file
if a .bed file is provided, the bed_file_present parameter under the parameters section has to be changed to TRUE
paths:
sample names and their respective paths to .bam files
samples can be named as desired but the sample name must be repeated after the colon and prefixed with a & sign
the & prefix sign is used to reference the sample’s path in different places of the same configuration file
example use:
paths:
mysample: &mysample /home/username/bam_files/mysample.bam
XYZ-123: &XYZ-123 /home/username/bam_files/XYZ-123.bam
potato95: &potato95 /home/username/bam_files/potato95.bam
comparisons:
comparison names with respective control and mutant samples per comparison
each comparison can be named as desired
the sample names to be used as control and mutant need to be prefixed by a * sign
the * prefixed sign is used to extract the sample’s path defined in the paths section
example:
comparisons:
comparison-1:
control: *mysample
mutant: *potato95
a-different-comparison-278asd:
control: *mysample
mutant: *XYZ-123
Running a RCNV_seq-template Jupyter Notebook
Note
It is recommended to duplicate the RCNV-seq-template notebook and then renaming the copy before doing any edits to the notebook.
In the pbgl-cvnseq/tool directory, click on RCNV-seq-template and a new tab in your web-browser will open the notebook.
The notebook contains cells that are populated by text or code. Instructions are provided in the notebook to guide the user. To run a cell, click on the corresponding cell and click on the Run button on the top of the notebook. Another way to run a cell can be done by clicking on the corresponding cell and pressing Ctrl + Enter or Shift + Enter.
The notebook consists of 6 sections:
Installing Required Libraries (optional)
Loading Required Libraries (mandatory)
User Input (mandatory)
CNV Calculations
Plotting
Plotting a Zoomed-In Region of One Chromosome
Installing Required Libraries (optional)
libraries being installed:
In [ ]: # install necessary libraries using R functions
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c("GenomicAlignments", "GenomeInfoDb"))
install.packages("configr")
install.packages("ggplot2")
to be run if the mamba installations were not successful or the loading of the required libraries fails under the Loading Required Libraries section
Loading Required Libraries (mandatory)
libraries being loaded:
In [ ]: # load necessary libraries
library(GenomicAlignments)
library(ggplot2)
library(configr)
# specify source R script with helper functions
source("helper-functions/RCNV_seq-helper.R")
source("helper-functions/cnvHLiang.R")
this cell will load necessary libraries and scripts containing the necessary functions to be used
if running this cell fails, some libraries may be missing from the installation
to fix this issue, run the installations under the Installing Required Libraries
User Input (mandatory)
the user needs to write the appropriate name of the configuration file being used in the following cell:
In [ ]: configPath <- "config/config-CNVseq.yml"
the bottom cell will extract the fields defined in the configuration file:
In [ ]: config <- read.config(configPath)
CNV Calculations
function to calculate all the hits and CNVs:
In [ ]: cnvCalculate(config)
output tabulated files are stored inside pbgl-cnvseq/tool/output/name_defined_in_output_path_of_config/tab-files directory
Plotting
function to plot two types of images:
CNVs of all chromosomes in the same plot
CNVs of one chromosome per plot
In [ ]: cnvPlot(config, imgType="", yMin= , yMax= )
function parameters are:
config - configuration file defined under User Input section
imgType - image extention to use; available options are: png, jpeg, svg, and pdf; default to png
yMin - y-axis bottom limit; default to -5
yMax - y-axis upper limit; default to 5
output images are stored inside the pbgl-cnvseq/tool/output/name_defined_in_output_path_of_config/images directory
it is recommended to inspect the CNV-plots with default y-limits and then modify
cnvPlotting can be used in two ways: with default values or user-defined values
with default values, the parameters can be omitted and set to imgType=”png”, yMin=-5 and yMax=5
the following two commands have the same parameter values and will output the same plots:
In [ ]: cnvPlot(config)
cnvPlot(config, imgType="png", yMin=-5, yMax=5)
Plotting a Zoomed-In Region of One Chromosome
function to plot a specific zoomed-in region of one chromosome
In [ ]: cnvPlotZoom(config, tabFile, chromosome="", start= , end= , yMin= , yMax= , imgType="")
function parameters are:
config - configuration file defined under User Input section
tabFile - tabulated file containing all hits; requires user-input to define path to all-hits tabulated file
chromosome - chromosome name to focus on; default to NA
start - start of window in bp; both scientific notation is accepted: 100000 or 10e4; defaults to NA
end - end of window in bp; the same applies as in the start parameter; defaults to NA
yMin - y-axis bottom limit; default to -5
yMax - y-axis upper limit; default to 5
imgType - image extention to use; available options are: png, jpeg, svg, and pdf; defaults to png
requires path definition of tabFile parameter in the cell:
In [ ]: tabFile <- "output/run-name/tab-files/tab-file.tab"
Example Tutorial - CNV Analysis of Sorghum
In this section of the manual, an example analysis of sorghum will be shown in a step-by-step process. The data has been subset to three chromosomes Chr04, Chr05, and Chr09. The CNVseq analysis will depict a deletion on chromosome Chr09.
The tutorial is divided between the following sections:
Data Download
General Installations
Github Repository Cloning
Virtual Environment Creation
Configuration File Editing
Jupyter Notebook Analysis
Data Download
Running this tutorial requires two binary alignment map (BAM) files of sorghum crop: one control and one mutant. Each .bam file is subset for chromosomes Chr04, Chr05, and Chr09; this is reflected on their names. They have been both sorted and indexed using samtools.
The following links can be used to download the necessary .bam and .bam.bai (index) files:
con-2_S1 (sorghum control)
D2-1_S7 (sorghum mutant)
There are two additional ways to download the .bam files and their respectives indices, besides clicking the links above:
Linux terminal
Web-Browser
Linux Terminal
Open a new terminal and navigate to a directory of choice. We recommend creating a directory to store the data and running wget in the respective location:
$ mkdir bam_files
$ cd bam_files
$ wget https://bss1innov1nafa1poc1.blob.core.windows.net/sample-container/2021_Training/con-2_S1-Chromes-04-05-09.bam
$ wget https://bss1innov1nafa1poc1.blob.core.windows.net/sample-container/2021_Training/con-2_S1-Chromes-04-05-09.bam.bai
$ wget https://bss1innov1nafa1poc1.blob.core.windows.net/sample-container/2021_Training/D2-1_S7-Chromes-04-05-09.bam
$ wget https://bss1innov1nafa1poc1.blob.core.windows.net/sample-container/2021_Training/D2-1_S7-Chromes-04-05-09.bam.bai
Web-Browser
Open a web-browser of preference. Copy/paste the links provided above in the address bar. This should automatically begin the download. Move the downloaded files to a location of personal preference.
General Installations
Open a web browser and copy/paste the following link to download Miniconda3:
After download, open a new terminal window, navigate to the directory with the downloaded Miniconda3 installer, and run the installation.
Note
The donwloaded Miniconda3 installer file might not match the one run in this example. Please, type the corresponding name of the .sh file downloaded.
$ bash Miniconda3-latest-Linux-x86_64.sh
Once the Miniconda3 installation is done, close the terminal and open a new one. The word (base) should be present to the left of the computer name in the prompt. Update conda and install mamba.
$ conda update --all --yes
$ conda upgrade --all --yes
$ conda install mamba --yes
Github Repository Cloning
Git or Github is used for storage and version control of software projects. Git is used to manage in a local Linux terminal. First, git will need to be installed. In a new terminal window, run the installation command:
$ mamba install git --yes
After installing git, create a location to store Github repoitories, navigate into it, and clone (download a copy locally) of PBGL’s pbgl-cnvseq repository.
$ mkdir Github
$ cd Github
$ git clone https://github.com/pbgl/pbgl-cnvseq
A successful cloning will output the following:
Cloning into 'pbgl-cnvseq'...
remote: Enumerating objects: 628, done.
remote: Counting objects: 100% (336/336), done.
remote: Compressing objects: 100% (239/239), done.
remote: Total 628 (delta 109), reused 292 (delta 78), pack-reused 292
Receiving objects: 100% (628/628), 11.79 MiB | 7.03 MiB/s, done.
Resolving deltas: 100% (190/190), done.
Navigate into the cloned repository,
$ cd pbgl-cnvseq
Virtual Environment Creation
Once inside the pbgl-cnvseq directory, create the cnvseq virtual environment, install the necessary libraries, and activate the newly created cnvseq virtual environment.
$ mamba env create --file envs/environment.yml
$ conda activate cnvseq
The name (cnvseq) environment should be reflected to the left of the computer name in the terminal command prompt.
Configuration File Editing
Open a Jupyter session by running,
$ jupyter notebook
Navigate to pbgl-cnvseq/tool/config and click on config-CNVseq-PBGL-sorghum-analysis-example.yml. This will open the configuration file in a new web-browser tab. Copy/paste the following in the configuration file.
Note
The paths to the bam files will not match. They need to be edited accordingly to point towards the location of the bam files stored locally in the user’s computer.
# parameters for CNV/window size calculations defaults from HLiang:
# https://github.com/hliang/cnv-seq
parameters:
annotate: TRUE
bed_file_present: FALSE
bigger: 1.5
log2: 0.6
pvalue: 0.001
window_size: 10000
# list of chromosomes to be analyzed
chromosomes:
- Chr04
- Chr05
- Chr09
# folder to store outputs; should be inside "output" directory, example:
# output_path: output/run-name or output/organism
output_path: output/PBGL-sorghum-analysis-example
# path to bed file for varying window sizes (optional)
bed_path:
# paths to bam files
paths:
con-2_S1: &con-2_S1 /home/anibal/bam_files/sorghum/con-2_S1-Chromes-04-05-09.bam
D2-1_S7: &D2-1_S7 /home/anibal/bam_files/sorghum/D2-1_S7-Chromes-04-05-09.bam
# comparisons to be analyzed
comparisons:
con-2_s1-vs-D2-1_S7:
control: *con-2_S1
mutant: *D2-1_S7
Save the file and close the tab.
Jupyter Notebook Analysis
In the open Jupyter session, navigate to the pbgl-cnvseq/tool directory and click on the RCNV-seq-template.ipynb Jupyter Notebook. Begin by running the Loading Required Libraries section. This can be done by clicking on the cell to be run followed by clicking the Run button on the top of the Jupyter Notebook.
In [ ]: # load necessary libraries
library(GenomicAlignments)
library(ggplot2)
library(configr)
# specify source R script with helper functions
source("helper-functions/RCNV_seq-helper.R")
source("helper-functions/cnvHLiang.R")
If loading the libraries is unsuccessful (specified in the cell output), an attempt to install the missing libraries can be done by running the Installing Required Libraries section,
In [ ]: # install necessary libraries using R functions
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c("GenomicAlignments", "GenomeInfoDb"))
install.packages("configr")
install.packages("ggplot2")
Run the User Input section after correctly editing the name/path of the configuration file:
In [ ]: configPath <- "config/config-CNVseq-PBGL-sorghum-analysis-example.yml"
Load the contents of the configuration file by running the cell,
In [ ]: config <- read.config(configPath)
Copy number variants (CNVs) are calculated under the CNV Calculations section by running the cell,
In [ ]: cnvCalculate(config)
The output tabulated files of all-hits and CNVs can be inspected in the pbgl-cnvseq/tool/output/PBGL-sorghum-analysis-example/tab-files directory. These can be used to import data to other softwares of choice for downstream data analysis. A successful run has the following output right under the input cell,
Comparison: con-2_s1-vs-D2-1_S7
Comparing samples:
/home/anibal/bam_files/sorghum/con-2_S1-Chromes-04-05-09.bam
vs.
/home/anibal/bam_files/sorghum/D2-1_S7-Chromes-04-05-09.bam
[1] "chromosome: Chr04"
[1] "chromosome: Chr05"
[1] "chromosome: Chr09"
[1] "cnv_id: 1 of 25"
[1] "cnv_id: 2 of 25"
[1] "cnv_id: 3 of 25"
[1] "cnv_id: 4 of 25"
[1] "cnv_id: 5 of 25"
[1] "cnv_id: 6 of 25"
[1] "cnv_id: 7 of 25"
[1] "cnv_id: 8 of 25"
[1] "cnv_id: 9 of 25"
[1] "cnv_id: 10 of 25"
[1] "cnv_id: 11 of 25"
[1] "cnv_id: 12 of 25"
[1] "cnv_id: 13 of 25"
[1] "cnv_id: 14 of 25"
[1] "cnv_id: 15 of 25"
[1] "cnv_id: 16 of 25"
[1] "cnv_id: 17 of 25"
[1] "cnv_id: 18 of 25"
[1] "cnv_id: 19 of 25"
[1] "cnv_id: 20 of 25"
[1] "cnv_id: 21 of 25"
[1] "cnv_id: 22 of 25"
[1] "cnv_id: 23 of 25"
[1] "cnv_id: 24 of 25"
[1] "cnv_id: 25 of 25"
CNV percentage in genome: 1.6%
CNV nucleotide content: 3140000
CNV count: 25
Mean size: 125600
Median size: 85000
Max Size: 750000
Min Size: 20000
cnv chromosome start end size log2 p.value
CNVR_1 Chr05 66247501 66267500 20000 1.013742 1.222196e-59
CNVR_2 Chr09 23497501 23527500 30000 -2.288917 3.556107e-238
CNVR_3 Chr09 23547501 23917500 370000 -2.684822 0
CNVR_4 Chr09 23927501 24132500 205000 -2.639143 0
CNVR_5 Chr09 24142501 24302500 160000 -2.359219 0
CNVR_6 Chr09 24337501 24417500 80000 -1.907744 0
CNVR_7 Chr09 24437501 24467500 30000 -0.912225 1.86383e-68
CNVR_8 Chr09 24512501 24652500 140000 -2.799081 0
CNVR_9 Chr09 24657501 24692500 35000 -2.466217 9.479478e-297
CNVR_10 Chr09 24747501 24887500 140000 -2.46515 0
CNVR_11 Chr09 24917501 24942500 25000 -1.980229 9.759448e-172
CNVR_12 Chr09 24947501 25007500 60000 -2.378318 0
CNVR_13 Chr09 25017501 25102500 85000 -2.464295 0
CNVR_14 Chr09 25107501 25132500 25000 -2.254819 4.908193e-196
CNVR_15 Chr09 25137501 25202500 65000 -2.90878 0
CNVR_16 Chr09 25212501 25422500 210000 -3.111295 0
CNVR_17 Chr09 25427501 25512500 85000 -2.883919 0
CNVR_18 Chr09 25522501 25637500 115000 -2.617425 0
CNVR_19 Chr09 25647501 25722500 75000 -2.367854 0
CNVR_20 Chr09 25727501 26477500 750000 -2.388962 0
CNVR_21 Chr09 26482501 26507500 25000 -1.727876 9.894624e-147
CNVR_22 Chr09 26512501 26552500 40000 -1.624917 3.444569e-216
CNVR_23 Chr09 26562501 26787500 225000 -2.290655 0
CNVR_24 Chr09 26817501 26937500 120000 -1.704837 0
CNVR_25 Chr09 26947501 26972500 25000 -1.303558 1.960969e-100
The output tabulated files under pbgl-cnvseq/tool/output/PBGL-sorghum-analysis-example/tab-files are the following:
con-2_s1-vs-D2-1_S7-all-chroms-CNVs.tab
con-2_s1-vs-D2-1_S7-window-10000-all-hits.tab
A depiction of the files are shown below,
Tabulated file containing all hits of all chromosomes
Tabulated file containing all CNVs of all chromsomes
After the CNV calculations are done, plot the CNVs under the Plotting section. Here, the parameters will be set to the following,
In [ ]: cnvPlot(confg, imgType="png", yMin=-5, yMax=5)
The output images under pbgl-cnvseq/tool/output/PBGL-sorghum-analysis-example/images are the following:
con-2_s1-vs-D2-1_S7-chromosome-all-window-10000.png
con-2_s1-vs-D2-1_S7-chromosome-Chr04-window-10000.png
con-2_s1-vs-D2-1_S7-chromosome-Chr05-window-10000.png
con-2_s1-vs-D2-1_S7-chromosome-Chr09-window-10000.png
The output plots are the following,
CNV plot of all chromosomes
CNV plot of chromosome Chr04
CNV plot of chromosome Chr05
CNV plot of chromosome Chr09
As seen above, each plot contains:
a title of the chromosome being analyzed
a sub-title of the comparison being done
Log2 Ratio in the y-axis
chromosome name on the x-axis in plot of all chromosomes
base-pair (bp) position on the x-axis in plot of one chromosome
a label under the x-axis specifying the location of the stored image
The CNV plot of chromosome Chr09 shows an artifact. Moreover, this artifact seems to lie between base-pairs 2e07 and 3e07. To zoom in, the Plotting a Zoomed-In Region of One Chromosome section is provided.
Under the Plotting a Zoomed-In Region of One Chromosome section, first specify the location of the all-hits tabulated file of chromosome Chr09,
In [ ]: tabFile <- "output/PBGL-sorghum-analysis-example/tab-files/con-2_s1-vs-D2-1_S7-window-10000-all-hits.tab"
The run the zooming-in function with the following parameters,
In [ ]: cnvPlotZoom(config, tabFile, chromosome="Chr09", start=2.3e7 , end=2.8e7 , yMin=-5 , yMax=2.5, imgType="png")
The output plot, which is also stored in the pbgl-cnvseq/tool/output/PBGL-sorghum-analysis-example/images, shows a zoomed-in region of chromosome Chr09 of sorghum. The output path is also specified under the x-axis label.
CNV plot of a zoomed-in region of chromosome Chr09
Knowing where artifacts like this one is located plays a major role in identifying insertions or deletions in an organism. The user can, in turn, use different tools, like the Integrative Genomics Viewer (IGV), for further analysis.
This culminates the tutorial.
References
BMC Bioinformatics Publication:
GitHub repositories: