Question: Why does the cellranger aggr pipeline complain that the library_id field is missing from the header row, even though I can see it when I view the file in my text editor?
Answer: There are at least two possible causes of the error:
(1) One possible explanation is the Unicode byte order mark (BOM), a hidden character <U+FEFF> at the very start of the file. Certain programs, such as Excel and the Linux cat command, do not display this character, so the CSV file appears to conform to the formatting requirements. However, cellranger aggr will crash while parsing such a CSV file, most likely because the BOM is read as part of the first header field, which then no longer matches library_id.
You can check the header line for a BOM on Linux with the following command. If a BOM is present, less shows it as <U+FEFF> at the start of the first line. Press 'q' to quit:
less Aggregation.csv
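For a non-interactive check, the first three bytes of the file can be compared against the UTF-8 encoding of the BOM (EF BB BF). The following is a minimal sketch, not a 10x Genomics tool: has_bom is a hypothetical helper name, and it assumes head -c and od are available, as they are on typical Linux systems.

```shell
# has_bom FILE: exit status 0 if FILE starts with the UTF-8 BOM bytes EF BB BF.
# has_bom is a hypothetical helper name; it is not part of cellranger.
has_bom() {
    # Dump the first three bytes as hex and strip the whitespace that od adds.
    first_bytes=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \t\n')
    [ "$first_bytes" = "efbbbf" ]
}

# Example usage:
# if has_bom Aggregation.csv; then echo "BOM found"; else echo "no BOM"; fi
```

The function only reports whether a BOM is present; it does not modify the file.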
It is also possible to fix the CSV file by running the following series of Linux commands:
# Erase <U+FEFF> from file, save result to tmp
awk '{ gsub(/\xef\xbb\xbf/,""); print }' Aggregation.csv > tmp
# Rename tmp to Aggregation.csv
mv tmp Aggregation.csv
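(2) Another possible explanation is CTRL-M characters (carriage returns, often displayed as ^M), which are typically left behind when the file is saved with Windows-style line endings. This article describes a few ways to remove CTRL-M characters from a file in UNIX: Clean CTRL-M.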
Related Article: No .cloupe file found error when you execute aggr (count, multi)
Disclaimer: This article and code-snippet are provided for instructional purposes only. 10x Genomics does not support or guarantee the code.