Abstract
Short project description. This report gives an overview of all the sequences analyzed. For further information on each animal's sequences, please check the individual reports for each folder.name.
Four plots are attached to compare the sequences before and after filtering by the proposed quality requirements: the plots on the left show the data before filtering, and the plots on the right show the data after filtering. The code was written to keep the best sequences; when a sequence was repeated, it compares the quality scores of the copies and selects the one with the best quality. The y-axis shows a quality score similar to the Phred quality score, which is logarithmically related to the base-calling error probability. Thus, a score of 10 represents a base-calling error probability of 1 in 10 (90% accuracy), a score of 20 represents 1 in 100 (99% accuracy), and so on.
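The relationship between the quality score and the error probability can be written as Q = -10 * log10(P). The following is a minimal sketch of that conversion in base R; the function names `phred_to_error` and `error_to_phred` are illustrative and do not come from the report's script.

```r
# Convert between Phred-style quality scores and base-calling error
# probabilities: Q = -10 * log10(P)  <=>  P = 10^(-Q / 10).
# Function names are hypothetical, chosen for this example only.
phred_to_error <- function(q) 10^(-q / 10)
error_to_phred <- function(p) -10 * log10(p)

scores <- c(10, 20, 30, 50)
data.frame(
  quality  = scores,
  error    = phred_to_error(scores),
  accuracy = 100 * (1 - phred_to_error(scores))
)
```

On this scale, a score of 50 corresponds to an error probability of 1 in 100,000.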
The filtering was based on:
The full filtering script can be found in the Rmd file.
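As an illustration of how the best copy of a repeated sequence might be selected, here is a minimal base R sketch. It assumes one row per read with a sequence identifier and a mean quality score; the data frame, its column names (`sequence_id`, `read_file`, `mean_quality`), and the values are hypothetical and are not taken from the actual script.

```r
# Hypothetical input: one row per read. Column names and values are
# illustrative only; the real script may store this differently.
reads <- data.frame(
  sequence_id  = c("A01", "A01", "B02", "C03", "C03"),
  read_file    = c("r1.ab1", "r2.ab1", "r3.ab1", "r4.ab1", "r5.ab1"),
  mean_quality = c(48.2, 52.7, 50.1, 45.0, 44.3),
  stringsAsFactors = FALSE
)

# Sort so that the highest-quality read of each sequence comes first,
# then keep only the first occurrence of each sequence_id.
ordered_reads <- reads[order(reads$sequence_id, -reads$mean_quality), ]
best_reads <- ordered_reads[!duplicated(ordered_reads$sequence_id), ]
best_reads
```

Sorting before calling `duplicated()` keeps the selection rule in one place: changing the ranking criterion only requires changing the `order()` call.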
The table below lists the number of repeated sequences, total sequences, and filtered sequences, together with the percentage of selected sequences out of the total number of unique sequences (total minus repeated). It also contains the mean quality score per plate and its standard deviation, based on the Phred quality score explained above.
folder.name | n_repeated | n_total | n_filtered | n_unique | used_percentage | mean.quality | standard.deviation |
---|---|---|---|---|---|---|---|
group_1 | 0 | 113 | 76 | 113 | 67.26 | 49.95 | 4.426 |
group_2 | 0 | 244 | 165 | 244 | 67.62 | 51.11 | 3.784 |
group_3 | 0 | 203 | 120 | 203 | 59.11 | 52.31 | 4.384 |
Total | 0 | 560 | 361 | 560 | 64.66 | 51.26 | 4.207 |
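The derived columns for the per-group rows can be recomputed from the counts. The sketch below assumes that n_unique is n_total minus n_repeated and that used_percentage is n_filtered divided by n_unique, as the description of the table suggests; the object name `summary_tbl` is illustrative.

```r
# Counts copied from the per-group rows of the table above.
summary_tbl <- data.frame(
  folder.name = c("group_1", "group_2", "group_3"),
  n_repeated  = c(0, 0, 0),
  n_total     = c(113, 244, 203),
  n_filtered  = c(76, 165, 120)
)

# Unique sequences are the total minus the repeated ones.
summary_tbl$n_unique <- summary_tbl$n_total - summary_tbl$n_repeated

# Share of unique sequences that passed the filter, in percent
# (assumed formula: n_filtered / n_unique).
summary_tbl$used_percentage <- round(
  100 * summary_tbl$n_filtered / summary_tbl$n_unique, 2
)
summary_tbl
```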
If the algorithm detects a secondary peak in the CDR3 region (in our case, between positions 100 and 150), it automatically plots the chromatogram of that region. The histogram below shows the number of secondary peaks detected inside the CDR3 region. A secondary peak had to be at least half the size of the primary peak (ratio = 0.5) to be considered a true secondary peak. The CDR3 chromatograms can be found in the folder called “chromatograms”.
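The ratio rule can be illustrated with a small base R sketch. It assumes a matrix of peak amplitudes with one row per base position and one column per channel; the matrix here is made up, and `count_secondary_peaks` is a hypothetical helper, not a function from the report's script. sangerseqR, listed in the session info, offers a `ratio` argument in `makeBaseCalls()`, but whether the report applies it in exactly this way is an assumption.

```r
# Hypothetical peak-amplitude matrix: rows are base positions, columns are
# the four channels. All values are invented for illustration.
amp <- matrix(10, nrow = 200, ncol = 4,
              dimnames = list(NULL, c("A", "C", "G", "T")))
amp[, "A"] <- 1000                    # clean primary peak everywhere
amp[120, ] <- c(1000, 600, 10, 5)     # secondary peak inside the region
amp[135, ] <- c(20, 1000, 10, 800)    # secondary peak inside the region
amp[30, ]  <- c(1000, 900, 10, 5)     # secondary peak outside the region

# Count positions in `region` where the second-highest peak reaches at
# least `ratio` times the highest peak.
count_secondary_peaks <- function(amp, region = 100:150, ratio = 0.5) {
  sub <- amp[region, , drop = FALSE]
  # After apply(), each column holds one position's amplitudes sorted in
  # decreasing order: row 1 is the primary peak, row 2 the runner-up.
  sorted <- apply(sub, 1, sort, decreasing = TRUE)
  has_secondary <- sorted[2, ] >= ratio * sorted[1, ]
  list(n = sum(has_secondary), positions = region[has_secondary])
}

res <- count_secondary_peaks(amp)
res$positions
```

Position 30 is ignored even though its runner-up peak is strong, because it lies outside the 100 to 150 window.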
All the information about each sequence (primary base call, quality scores, folder.name, well, plate, secondary peaks, etc.) can be found in the CSV file created inside the “processing” folder.
A FASTA file containing all the filtered sequences was created in the folder called “quality reports”.
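For reference, a FASTA file is a plain-text file in which each sequence is preceded by a header line starting with ">". The sketch below writes one with base R only, so that it runs without extra packages; the sequence names, the bases, and the output file name are hypothetical. Biostrings, listed in the session info, provides `writeXStringSet()` for the same purpose.

```r
# Hypothetical filtered sequences; names and bases are illustrative.
filtered <- c(
  group_1_A01 = "CAGGTGCAGCTGGTGGAG",
  group_2_B07 = "GAGGTGCAGCTGTTGGAG"
)

# Interleave ">name" header lines with the sequence lines.
fasta_lines <- as.vector(rbind(paste0(">", names(filtered)), unname(filtered)))

# Write to a temporary location (the file name is an assumption).
out_file <- file.path(tempdir(), "filtered_sequences.fasta")
writeLines(fasta_lines, out_file)
readLines(out_file)
```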
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] RepertoiR_0.1.0 tibble_3.1.4 kableExtra_1.3.4
## [4] sangerseqR_1.26.0 stringr_1.4.0 DECIPHER_2.18.1
## [7] RSQLite_2.2.8 Biostrings_2.58.0 XVector_0.30.0
## [10] IRanges_2.24.1 S4Vectors_0.28.1 BiocGenerics_0.36.1
## [13] devtools_2.4.2 usethis_2.0.1 gridExtra_2.3
## [16] ggplot2_3.3.5 dplyr_1.0.5 knitr_1.34
## [19] ape_5.5
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-153 fs_1.5.0 bit64_4.0.5 webshot_0.5.2
## [5] httr_1.4.2 rprojroot_2.0.2 tools_4.0.5 bslib_0.3.0
## [9] utf8_1.2.2 R6_2.5.1 DBI_1.1.1 colorspace_2.0-2
## [13] withr_2.4.2 tidyselect_1.1.1 prettyunits_1.1.1 processx_3.5.2
## [17] bit_4.0.4 compiler_4.0.5 cli_3.0.1 rvest_1.0.1
## [21] xml2_1.3.2 desc_1.3.0 labeling_0.4.2 sass_0.4.0
## [25] scales_1.1.1 callr_3.7.0 systemfonts_1.0.2 digest_0.6.27
## [29] rmarkdown_2.10 svglite_2.0.0 pkgconfig_2.0.3 htmltools_0.5.2
## [33] sessioninfo_1.1.1 highr_0.9 fastmap_1.1.0 rlang_0.4.11
## [37] rstudioapi_0.13 shiny_1.6.0 farver_2.1.0 jquerylib_0.1.4
## [41] generics_0.1.0 jsonlite_1.7.2 magrittr_2.0.1 Rcpp_1.0.7
## [45] munsell_0.5.0 fansi_0.5.0 lifecycle_1.0.1 stringi_1.7.4
## [49] yaml_2.2.1 zlibbioc_1.36.0 pkgbuild_1.2.0 grid_4.0.5
## [53] blob_1.2.2 promises_1.2.0.1 crayon_1.4.1 lattice_0.20-44
## [57] ps_1.6.0 pillar_1.6.3 pkgload_1.2.1 glue_1.4.2
## [61] evaluate_0.14 remotes_2.4.0 vctrs_0.3.8 httpuv_1.6.2
## [65] testthat_3.0.4 gtable_0.3.0 purrr_0.3.4 assertthat_0.2.1
## [69] cachem_1.0.6 xfun_0.25 mime_0.11 xtable_1.8-4
## [73] later_1.3.0 viridisLite_0.4.0 memoise_2.0.0 ellipsis_0.3.2