More generally, I’m wondering whether there’s documentation describing what the various files are and what each one is intended for.
But specifically, if I want to get all high-quality tumor variant calls, which file would be the best one to use? I’ve been looking at the data for one patient, PT_YYGH8EMR. I expected the somatic mutations in the file
2114e40c-4db8-430e-a6c8-203f068037b2.consensus_somatic.PASS.vep.vcf.gz to be a subset of the variants in
SL264139.hard-filtered.vcf.gz, but that does not seem to be the case. Out of 24 consensus somatic variants on chr21, only 2 are in the hard-filtered file.
So, what is the hard-filtered file useful for? Can I use it to get the tumor calls, or must I start from the BAM or FASTQ files? I’d really rather not spend a ton of money running these pipelines when it seems like this must have already been done.
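
For reference, here is a minimal sketch of the kind of overlap check described above. It is not necessarily how the numbers in this post were produced. It assumes both files are standard tab-delimited VCFs, that a variant is identified by exact (chromosome, position, REF, ALT), and that one file may write "chr21" where the other writes "21". The function names are made up for illustration. Exact matching can also undercount the overlap if the two callers represent the same indel differently, so a low count is not proof on its own.

```python
import gzip


def normalize_chrom(chrom):
    # Assumption: the two files may differ only by a "chr" prefix.
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def variant_keys(lines, chrom=None):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines.

    Multi-allelic records are split so each ALT allele gets its own key.
    If chrom is given, only records on that chromosome are kept.
    """
    target = normalize_chrom(chrom) if chrom else None
    keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        record_chrom = normalize_chrom(fields[0])
        if target is not None and record_chrom != target:
            continue
        pos, ref, alts = int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            keys.add((record_chrom, pos, ref, alt))
    return keys


def read_vcf_keys(path, chrom=None):
    # Opens plain or gzip/bgzip-compressed VCFs as text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return variant_keys(handle, chrom)


def overlap(consensus_keys, other_keys):
    """Return (number of consensus variants also in other, total consensus)."""
    return len(consensus_keys & other_keys), len(consensus_keys)
```

Usage would look something like this, with the two file names from above:

    somatic = read_vcf_keys("2114e40c-4db8-430e-a6c8-203f068037b2.consensus_somatic.PASS.vep.vcf.gz", "chr21")
    hard = read_vcf_keys("SL264139.hard-filtered.vcf.gz", "chr21")
    print(overlap(somatic, hard))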