Question: Why don't some of the reads in the FASTQ output from the longranger basic pipeline have barcodes?
longranger basic takes FASTQ files and performs basic barcode processing, including error correction, barcode whitelisting, and attaching barcodes to reads. The processed reads can be output in FASTQ or BAM format. If the first 16 base pairs of Read 1 do not form a valid barcode (one that matches the barcode whitelist, after error correction), longranger basic will not attach a barcode sequence to that read in the FASTQ output. For more information on what constitutes a valid barcode, please see the article here.
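The rule above can be sketched in a few lines of Python. This is a simplified illustration, not the longranger implementation: it assumes the whitelist is held in memory as a set of 16-mers, and it replaces longranger's actual error correction with a plain stand-in rule (accept an exact match, or a raw barcode that is exactly one substitution away from a single whitelisted barcode). The function names `split_read1` and `correct_barcode` are hypothetical.

```python
from typing import Optional, Set, Tuple

BARCODE_LEN = 16  # the barcode occupies the first 16 bp of Read 1


def split_read1(seq: str, qual: str) -> Tuple[str, str, str, str]:
    """Split Read 1 into (barcode, barcode quality, rest of sequence, rest of quality)."""
    return seq[:BARCODE_LEN], qual[:BARCODE_LEN], seq[BARCODE_LEN:], qual[BARCODE_LEN:]


def correct_barcode(raw: str, whitelist: Set[str]) -> Optional[str]:
    """Simplified stand-in for barcode correction (NOT the longranger algorithm).

    Returns the whitelisted barcode to attach, or None when the read
    would be left without a barcode.
    """
    if raw in whitelist:
        return raw  # exact whitelist hit
    hits = set()
    for i, base in enumerate(raw):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = raw[:i] + alt + raw[i + 1:]
            if candidate in whitelist:
                hits.add(candidate)
    # Only correct when exactly one whitelisted barcode is one substitution away.
    return hits.pop() if len(hits) == 1 else None


whitelist = {"AAACACCAGCGATATA"}
print(correct_barcode("AAACACCAGCGATATT", whitelist))  # AAACACCAGCGATATA (one mismatch)
print(correct_barcode("TTTTTTTTTTTTTTTT", whitelist))  # None: no barcode attached
```

Refusing to correct when two whitelisted barcodes are equally close mirrors the idea that an ambiguous read is better left unbarcoded than assigned to the wrong barcode.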
A read with the barcode attached will look like this:
@ST-E00273:259:H7WY3ALXX:1:1101:15554:38315 BX:Z:AAACACCAGCGATATA-1
TTGTTGTTCTTAACATTTCTGTTGATTCAGGTAAGTTCCATTGGCATTTCAGTACAACTAGTGACTAATGCCTCAAAGAATGAAATGAAATTCTACAGCTCTGCTATGACTGGAATGAGATAACAGTT
+
JJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJF-FFJJJJJJFJJJJFJJJJJJFFFJJJJJFJJJJJJJJJFJJFFJJJJJ--7<AFJ<FJFAFFJJJJ-AJJ<JFJJJA-7FF7AJAA<F7--<-7-
The barcode is added to the header (ID) line of the FASTQ entry and is flagged with the BX tag. The same BX tag is used in the BAM output of the longranger basic pipeline. In the BAM/SAM output, the FASTQ entry above looks like this:
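Because an unbarcoded read simply has no BX tag on its header line, counting barcoded and unbarcoded reads only requires scanning headers. A minimal sketch, assuming an uncompressed FASTQ with four lines per record and whitespace-separated header fields (for a gzipped file, open it with `gzip.open(path, "rt")` instead); the helper names are hypothetical.

```python
import io
from typing import Iterator, Optional, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, qual


def barcode_from_header(header: str) -> Optional[str]:
    """Return the BX:Z value from a header line, or None if no barcode was attached."""
    for field in header.split()[1:]:  # skip the read name
        if field.startswith("BX:Z:"):
            return field[len("BX:Z:"):]
    return None


def count_barcoded(handle: TextIO) -> Tuple[int, int]:
    """Return (reads with a BX tag, reads without one)."""
    with_bc = without_bc = 0
    for header, _seq, _qual in read_fastq(handle):
        if barcode_from_header(header) is None:
            without_bc += 1
        else:
            with_bc += 1
    return with_bc, without_bc


example = (
    "@read1 BX:Z:AAACACCAGCGATATA-1\nACGT\n+\nJJJJ\n"
    "@read2\nACGT\n+\nJJJJ\n"
)
print(count_barcoded(io.StringIO(example)))  # (1, 1)
```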
ST-E00273:259:H7WY3ALXX:1:1101:15554:38315 68 * 0 0 128S * 0 -1 TTGTTGTTCTTAACATTTCTGTTGATTCAGGTAAGTTCCATTGGCATTTCAGTACAACTAGTGACTAATGCCTCAAAGAATGAAATGAAATTCTACAGCTCTGCTATGACTGGAATGAGATAACAGTT JJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJF-FFJJJJJJFJJJJFJJJJJJFFFJJJJJFJJJJJJJJJFJJFFJJJJJ--7<AFJ<FJFAFFJJJJ-AJJ<JFJJJA-7FF7AJAA<F7--<-7- RG:Z:NA12878_basic_bam:LibraryNotSpecified:1:unknown_fc:0 BC:Z:GCTACCTG QT:Z:AAFAFJJJ TR:Z:GGGTGAT TQ:Z:JJJFJJJ BX:Z:AAACACCAGCGATATA-1 RX:Z:AAACACCAGCGATATA QX:Z:AAFFFJJJJJJJJJJF
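The optional tags in a record like the one above can be pulled out with plain string handling. A minimal sketch, assuming a tab-delimited SAM text line (for example from `samtools view`; the record above is shown with spaces). A read without a valid barcode is treated as one with no BX tag. In this record the RX value equals the BX value minus its trailing -1 suffix; reading RX as the barcode as sequenced and BX as the corrected, whitelisted barcode is our assumption, to be checked against the Barcoded BAM documentation. The helper names are hypothetical.

```python
from typing import Dict, Optional


def parse_sam_tags(line: str) -> Dict[str, str]:
    """Map optional-field tags (e.g. BX, RX) to their values for one SAM record.

    The first 11 tab-separated columns are the mandatory SAM fields; every
    column after them is a TAG:TYPE:VALUE optional field.
    """
    fields = line.rstrip("\r\n").split("\t")
    tags = {}
    for field in fields[11:]:
        # maxsplit=2 keeps colons inside the value intact (e.g. the RG tag)
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def barcode_sequence(tags: Dict[str, str]) -> Optional[str]:
    """Return the BX barcode without its trailing '-<n>' suffix, or None if absent."""
    bx = tags.get("BX")
    if bx is None:
        return None  # no barcode was attached to this read
    return bx.rsplit("-", 1)[0]


record = "\t".join([
    "ST-E00273:259:H7WY3ALXX:1:1101:15554:38315", "68", "*", "0", "0", "128S",
    "*", "0", "-1", "ACGT", "JJJJ",  # SEQ and QUAL shortened for the example
    "RG:Z:NA12878_basic_bam:LibraryNotSpecified:1:unknown_fc:0",
    "BX:Z:AAACACCAGCGATATA-1", "RX:Z:AAACACCAGCGATATA",
])
tags = parse_sam_tags(record)
print(barcode_sequence(tags))                # AAACACCAGCGATATA
print(barcode_sequence(tags) == tags["RX"])  # True for this record
```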
For more information on the BAM format, please see the Barcoded BAM documentation.