This report gives an overview of all the sequences analyzed. For further information on each animal's sequences, please check the individual reports, one per folder.name.

Overall quality of sequences post-filtering

The four attached plots compare the sequences before and after filtering by the proposed quality requirements: the graphs on the left show the data before filtering, and the graphs on the right show the data after filtering. The code was written to keep the best sequences; if a sequence was repeated, it compares the quality scores of the repeats and selects the one with the better quality. The y-axis shows a quality score similar to the Phred Quality Score, which is logarithmically related to the base-calling error probability. Thus, a score of 10 represents a base-calling error probability of 1 in 10 (90% accuracy), a score of 20 represents 1 in 100 (99% accuracy), and so on.
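The relationship between score and error probability quoted above is the standard Phred definition, Q = -10 * log10(P). The short Python sketch below is only an illustration of that arithmetic and is not taken from the report's Rmd script; the function names are hypothetical.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Base-calling error probability P for a Phred score Q (Q = -10 * log10(P))."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score Q for a base-calling error probability P."""
    return -10 * math.log10(p)


# Print the error probability and accuracy for a few example scores.
for q in (10, 20, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):g}%")
```

Read this way, the mean quality scores of about 50 reported further down correspond to an error probability of roughly 1 in 100,000 per base.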
The filtering was based on:

The full filtering script can be found in the Rmd file.
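As a rough illustration of the rule for repeated sequences described above (keep the repeat with the better quality), here is a minimal Python sketch. It assumes that repeats share a name and that "better quality" means a higher mean quality score; both are assumptions, as are the record layout and the function name. The authoritative logic is the R code in the Rmd file.

```python
from statistics import mean


def best_per_name(reads):
    """Keep, for each sequence name, the read with the highest mean quality.

    `reads` is a list of dicts with hypothetical keys "name", "sequence"
    and "quality" (a list of per-base quality scores).
    """
    best = {}
    for read in reads:
        kept = best.get(read["name"])
        # Replace the stored read only if the new one scores strictly higher.
        if kept is None or mean(read["quality"]) > mean(kept["quality"]):
            best[read["name"]] = read
    return list(best.values())


# Made-up example reads: "A01" was sequenced twice, "B02" once.
reads = [
    {"name": "A01", "sequence": "ACGT", "quality": [40, 42, 38, 41]},
    {"name": "A01", "sequence": "ACGT", "quality": [55, 53, 56, 54]},
    {"name": "B02", "sequence": "TTGA", "quality": [50, 49, 51, 52]},
]
kept = best_per_name(reads)
```

With these made-up reads, `kept` holds two entries: the second "A01" read and the single "B02" read.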

Overall quality score

The table below lists the number of repeated sequences, total sequences, and filtered sequences, together with the percentage of selected sequences out of the total number of unique sequences (total minus repeated). It also contains the mean quality score per plate and its standard deviation, based on the Phred Quality Score explained above.

folder.name  n_repeated  n_total  n_filtered  n_unique  used_percentage  mean.quality  standard.deviation
group_1               0      113          76       113            67.26         49.95               4.426
group_2               0      244         165       244            67.62         51.11               3.784
group_3               0      203         120       203            59.11         52.31               4.384
Total                 0      560         361       560            64.66         51.26               4.207
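The derived columns can be recomputed from the counts in the table. The Python sketch below assumes that n_unique = n_total - n_repeated and that used_percentage = 100 * n_filtered / n_unique, which reproduces the three group rows; the variable and function names are hypothetical and the code is not the report's own.

```python
# Counts copied from the table above.
counts = {
    "group_1": {"n_repeated": 0, "n_total": 113, "n_filtered": 76},
    "group_2": {"n_repeated": 0, "n_total": 244, "n_filtered": 165},
    "group_3": {"n_repeated": 0, "n_total": 203, "n_filtered": 120},
}


def used_fraction(row):
    """Fraction of unique sequences (total minus repeated) that passed the filter."""
    n_unique = row["n_total"] - row["n_repeated"]
    return row["n_filtered"] / n_unique


# Per-group percentage, rounded to two decimals as in the table.
per_group = {name: round(100 * used_fraction(row), 2) for name, row in counts.items()}

# Two possible summaries for a "Total" row.
mean_of_groups = round(100 * sum(used_fraction(r) for r in counts.values()) / len(counts), 2)
pooled = round(
    100
    * sum(r["n_filtered"] for r in counts.values())
    / sum(r["n_total"] - r["n_repeated"] for r in counts.values()),
    2,
)
```

Under these assumptions the Total value of 64.66 corresponds to the unweighted mean of the three group percentages; pooling the counts instead (361 of 560) gives 64.46.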

Secondary peaks inside the CDR3 region

If the algorithm detects a secondary peak in the CDR3 region (in our case, between positions 100 and 150), it automatically plots the chromatogram of that region. The histogram below shows the number of secondary peaks detected inside the CDR3 region. A secondary peak had to be at least half the size of the primary peak (ratio = 0.5) to be considered a true secondary peak. You can check the CDR3 chromatograms in the folder called “chromatograms”.
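The counting rule described above can be sketched as follows in Python. This is an illustration under stated assumptions, not the report's implementation: positions are taken as 1-based with both window ends included, peak "size" is treated as a single height value per position, and the function name and input layout are hypothetical.

```python
def count_secondary_peaks(primary, secondary, start=100, end=150, ratio=0.5):
    """Count positions in [start, end] (1-based, inclusive) where the
    secondary peak is at least `ratio` times the primary peak."""
    count = 0
    last = min(end, len(primary))
    for pos in range(start, last + 1):
        p = primary[pos - 1]
        s = secondary[pos - 1]
        if p > 0 and s >= ratio * p:
            count += 1
    return count


# Made-up peak heights for a 200-base read.
primary = [1000.0] * 200
secondary = [100.0] * 200
secondary[119] = 600.0  # position 120: inside the window, ratio 0.6
secondary[139] = 500.0  # position 140: inside the window, ratio exactly 0.5
secondary[49] = 900.0   # position 50: outside the window, so ignored

n_peaks = count_secondary_peaks(primary, secondary)
```

Here `n_peaks` is 2: the high secondary signal at position 50 is ignored because it lies outside the 100 to 150 window.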

Create a csv file from the filtered sequences

All the information about the sequences (primary basecalls, quality scores, folder.name, well, plate, secondary peaks, etc.) can be found in the CSV file created inside the “processing” folder.

Fasta files

A FASTA file containing all the filtered sequences was created in the folder called “quality reports”.
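For readers unfamiliar with the format, a FASTA file stores each sequence as a header line starting with ">" followed by the sequence itself. The Python sketch below shows one minimal way to write such records; the header name, the 60-character line width, and the function name are assumptions for illustration and may differ from what the report's script produces.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (name, sequence) pairs to `handle` in FASTA format,
    wrapping each sequence at `width` characters per line."""
    for name, sequence in records:
        handle.write(f">{name}\n")
        for i in range(0, len(sequence), width):
            handle.write(sequence[i:i + width] + "\n")


# Hypothetical record name and an 80-base made-up sequence.
buffer = io.StringIO()
write_fasta([("group_1_A01", "ACGT" * 20)], buffer)
fasta_text = buffer.getvalue()
```

Passing an open file object instead of the `io.StringIO` buffer writes the same text to disk.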

## R version 4.0.5 (2021-03-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] RepertoiR_0.1.0     tibble_3.1.4        kableExtra_1.3.4   
##  [4] sangerseqR_1.26.0   stringr_1.4.0       DECIPHER_2.18.1    
##  [7] RSQLite_2.2.8       Biostrings_2.58.0   XVector_0.30.0     
## [10] IRanges_2.24.1      S4Vectors_0.28.1    BiocGenerics_0.36.1
## [13] devtools_2.4.2      usethis_2.0.1       gridExtra_2.3      
## [16] ggplot2_3.3.5       dplyr_1.0.5         knitr_1.34         
## [19] ape_5.5            
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-153      fs_1.5.0          bit64_4.0.5       webshot_0.5.2    
##  [5] httr_1.4.2        rprojroot_2.0.2   tools_4.0.5       bslib_0.3.0      
##  [9] utf8_1.2.2        R6_2.5.1          DBI_1.1.1         colorspace_2.0-2 
## [13] withr_2.4.2       tidyselect_1.1.1  prettyunits_1.1.1 processx_3.5.2   
## [17] bit_4.0.4         compiler_4.0.5    cli_3.0.1         rvest_1.0.1      
## [21] xml2_1.3.2        desc_1.3.0        labeling_0.4.2    sass_0.4.0       
## [25] scales_1.1.1      callr_3.7.0       systemfonts_1.0.2 digest_0.6.27    
## [29] rmarkdown_2.10    svglite_2.0.0     pkgconfig_2.0.3   htmltools_0.5.2  
## [33] sessioninfo_1.1.1 highr_0.9         fastmap_1.1.0     rlang_0.4.11     
## [37] rstudioapi_0.13   shiny_1.6.0       farver_2.1.0      jquerylib_0.1.4  
## [41] generics_0.1.0    jsonlite_1.7.2    magrittr_2.0.1    Rcpp_1.0.7       
## [45] munsell_0.5.0     fansi_0.5.0       lifecycle_1.0.1   stringi_1.7.4    
## [49] yaml_2.2.1        zlibbioc_1.36.0   pkgbuild_1.2.0    grid_4.0.5       
## [53] blob_1.2.2        promises_1.2.0.1  crayon_1.4.1      lattice_0.20-44  
## [57] ps_1.6.0          pillar_1.6.3      pkgload_1.2.1     glue_1.4.2       
## [61] evaluate_0.14     remotes_2.4.0     vctrs_0.3.8       httpuv_1.6.2     
## [65] testthat_3.0.4    gtable_0.3.0      purrr_0.3.4       assertthat_0.2.1 
## [69] cachem_1.0.6      xfun_0.25         mime_0.11         xtable_1.8-4     
## [73] later_1.3.0       viridisLite_0.4.0 memoise_2.0.0     ellipsis_0.3.2