Question: When and how should I use the --use-bases-mask option when demultiplexing with the Cell Ranger mkfastq pipeline?
Answer: Libraries are often sequenced with longer reads than the sequencing requirements of a 10x application call for. For instance, for dual index 3' gene expression libraries with v3 chemistry, we recommend R1=28bp, R2=90bp, and I1 and I2=10bp each, but a user may choose to sequence any of these reads longer than recommended. If a user has sequenced more than the recommended 28bp for Read 1 and 90bp for Read 2, they can mask the extra bases so that they are not written to the FASTQ files. In this example, the option --use-bases-mask=Y28n*,I10,I10,Y90n* generates reads of the recommended lengths.
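As an illustration of how such a mask can be assembled from the sequenced and desired read lengths, here is a minimal Python sketch. The helper name build_use_bases_mask and the 150-cycle read lengths in the example run are hypothetical assumptions for illustration only; this helper is not part of Cell Ranger or bcl2fastq.

```python
def build_use_bases_mask(reads):
    """Build a --use-bases-mask value (hypothetical helper, not a 10x/Illumina API).

    reads: list of (kind, sequenced_cycles, keep_cycles) tuples in run order,
    where kind is "Y" (read) or "I" (index).
    """
    fields = []
    for kind, sequenced, keep in reads:
        if kind not in ("Y", "I"):
            raise ValueError(f"unsupported read kind: {kind!r}")
        if not 0 < keep <= sequenced:
            raise ValueError(f"cannot keep {keep} of {sequenced} cycles")
        field = f"{kind}{keep}"
        if keep < sequenced:
            # Mask ('n') every remaining cycle ('*') after the first `keep` cycles.
            field += "n*"
        fields.append(field)
    return ",".join(fields)


# Hypothetical run: 150-cycle Read 1 and Read 2, 10-cycle index reads.
run = [("Y", 150, 28), ("I", 10, 10), ("I", 10, 10), ("Y", 150, 90)]
mask = build_use_bases_mask(run)
print(f"--use-bases-mask={mask}")
```

An index read sequenced to exactly the needed length gets no "n*" suffix, which is why the index fields in this example are plain I10.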
In the mask Y28n*,I10,I10,Y90n*, there are four comma-separated fields, one for each read in the run: two reads and two indexes, in the order they were sequenced. The number in each field is the desired length in base pairs. For this example,
Read 1 will have 28bp, the first index (I1) will be 10bp, the second index (I2, if applicable) will be 10bp, and Read 2 will have 90bp in the final FASTQ output.
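'Y' marks bases to be used as read sequence, 'I' marks bases to be used as index sequence, and 'N' (written in lowercase as 'n' above) marks bases to be ignored. Because this is a dual indexed library, both index reads are kept with 'I' rather than masked with 'N'.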
'n' masks (ignores) a base, and '*' repeats the preceding letter for all remaining cycles of that read. So 'Y28n*' keeps the first 28 bases of Read 1 and ignores every base after that until the end of the read.
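To check what a mask does to each read, the Python sketch below expands every field into one letter per cycle and counts the kept and masked bases. The helper names and the 150/10/10/150 cycle counts are hypothetical assumptions, and the rule that a bare letter with no number covers a single cycle is an assumption based on common bcl2fastq2 usage; this only approximates the parsing done by bcl2fastq2 itself.

```python
import re

# One token: a letter (Y, I or N, any case) optionally followed by a count or '*'.
TOKEN = re.compile(r"([YIN])(\d+|\*)?", re.IGNORECASE)
FIELD = re.compile(r"(?:[YIN](?:\d+|\*)?)+", re.IGNORECASE)


def expand_field(field, cycles):
    """Expand one mask field (e.g. 'Y28n*') into one letter per cycle (hypothetical helper)."""
    if not FIELD.fullmatch(field):
        raise ValueError(f"malformed mask field: {field!r}")
    expanded = ""
    for letter, count in TOKEN.findall(field):
        letter = letter.upper()
        if count == "*":
            # '*' repeats the letter for all remaining cycles of this read.
            n = cycles - len(expanded)
        elif count:
            n = int(count)
        else:
            # Assumption: a bare letter covers a single cycle.
            n = 1
        expanded += letter * max(n, 0)
    if len(expanded) != cycles:
        raise ValueError(f"{field!r} covers {len(expanded)} cycles, read has {cycles}")
    return expanded


def summarize(mask, run_cycles):
    """Count read (Y), index (I) and ignored (N) cycles for each read in the run."""
    fields = mask.split(",")
    if len(fields) != len(run_cycles):
        raise ValueError("the mask needs one field per read in the run")
    summary = []
    for field, cycles in zip(fields, run_cycles):
        expanded = expand_field(field, cycles)
        summary.append({k: expanded.count(k) for k in "YIN"})
    return summary


# Hypothetical run: R1=150, I1=10, I2=10, R2=150 cycles.
for name, counts in zip(["R1", "I1", "I2", "R2"],
                        summarize("Y28n*,I10,I10,Y90n*", [150, 10, 10, 150])):
    print(name, counts)
```

Raising an error when a field does not cover exactly the number of cycles in its read mirrors the requirement that the mask describe every sequenced cycle.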
Reference: bcl2fastq2 user guide