Question: It is surprising that your Supernova datasets report a repetitive genome fraction of only 35.8% for maize and 6.7% for chili. The maize and chili genomes both consist of ~80% repetitive sequence. Why the discrepancy - how does Supernova define the repetitive fraction of the genome?
Answer: Here is our definition of the repetitive fraction:
Genome repetitivity index: percent of read kmers, counted with multiplicity, whose depth exceeds twice the expected depth. Intended as an index of repetitivity, rather than a measure of ‘which fraction of the genome is repetitive’. This statistic could be confounded by microbiome sequences and contamination.
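As a rough illustration of the definition above, the index can be sketched in a few lines of Python. This is a minimal sketch, not Supernova's implementation: the function name `repetitivity_index` and its parameters are hypothetical, the expected depth is simply passed in by the caller (how Supernova estimates it is not described here), and k-mers are counted on the forward strand only.

```python
from collections import Counter


def repetitivity_index(reads, k, expected_depth):
    """Percent of read k-mers, counted with multiplicity, whose depth
    exceeds twice the expected depth.

    Illustrative assumptions: `expected_depth` is supplied by the caller,
    and k-mers are not canonicalized against their reverse complement.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1

    total = sum(counts.values())  # every k-mer occurrence in the reads
    if total == 0:
        return 0.0

    # "Counted with multiplicity": a k-mer seen c times contributes c.
    high_depth = sum(c for c in counts.values() if c > 2 * expected_depth)
    return 100.0 * high_depth / total


if __name__ == "__main__":
    toy_reads = ["AAAAAA", "ACGT"]
    print(repetitivity_index(toy_reads, k=2, expected_depth=1))
```

In the toy input, the k-mer AA occurs five times out of eight k-mer occurrences in total, so it dominates the index even though it is only one of four distinct k-mers. That is the effect of counting with multiplicity.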
So it depends on how you define a repeat. The definition above is designed to correlate with assembly difficulty as closely as we can manage.
Another common definition is the fraction of the genome masked by RepeatMasker, but how meaningful that number is depends on the completeness of the repeat database, and it is not particularly relevant to assembly (although obviously correlated with it).
Old and new repeats (on an evolutionary timescale) behave very differently in assembly. For transposable elements that inserted recently, one expects to find highly similar copies. For elements that inserted a long time ago, most copies will have been degraded by random mutations, so their similarity to one another is much lower. Such old repeats are biologically interesting, but if sufficiently degraded they are irrelevant for assembly.
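The point about old versus recent repeats can be made concrete with a simple back-of-the-envelope model. The sketch below assumes that substitutions hit each base independently with the same probability, which is a simplification, and the function name `kmer_survival` and the choice of k = 48 in the usage lines are illustrative assumptions only. Under that model, the chance that a k-mer is still identical between two copies of a repeat is (1 - d)^k, where d is the per-base divergence between the copies.

```python
def kmer_survival(divergence, k):
    """Probability that a k-mer is identical in two repeat copies,
    assuming independent per-base substitutions at rate `divergence`.
    This is a toy model, not part of Supernova."""
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be between 0 and 1")
    return (1.0 - divergence) ** k


if __name__ == "__main__":
    # k = 48 is chosen purely for illustration.
    for d in (0.01, 0.05, 0.20):
        print(f"divergence {d:.2f}: shared k-mer fraction {kmer_survival(d, 48):.6f}")
```

With these illustrative numbers, copies that differ at 1% of bases still share roughly 60% of their 48-mers, while copies that differ at 20% of bases share almost none. This is why degraded repeats contribute little to a k-mer depth statistic.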
The goal of Supernova is not to annotate the genome biologically, or to summarize the biology, but rather to characterize the genome's properties vis-a-vis assembly. And the fact that the chili contigs are five times longer than the maize contigs is no accident - from an assembly perspective, chili is a lot less repetitive than maize.
For more information on Supernova performance for non-model organisms, please see "Supernova performance on twenty human and nonhuman datasets".