Mixcr

Import bulk TCR data previously parsed with mixcr into tcrdist using tcrdist.mixcr.mixcr_to_tcrdist2(). We strongly recommend cleaning up the import with tcrdist.mixcr.remove_entries_with_invalid_vgene(), which keeps only entries whose V-gene name is valid for your chain of interest.

Example

import os
from tcrdist.repertoire import TCRrep
from tcrdist import mixcr

# Clones file exported by mixcr
clones_fn = os.path.join('tcrdist',
                         'test_files_compact',
                         'SRR5130260.1.test.fastq.output.clns.txt')

# Convert the mixcr clones export into a tcrdist2-style DataFrame
df = mixcr.mixcr_to_tcrdist2(chain = "delta",
                             organism = "human",
                             clones_fn = clones_fn)

# Drop entries whose V-gene name is not valid for the delta chain
df = mixcr.remove_entries_with_invalid_vgene(df,
                                             chain = "delta",
                                             organism = "human")

# Label every clone with the sample it came from
df['subject'] = 'SRR5130260.1'

# Build the repertoire object with the gamma/delta reference database
tr = TCRrep(cell_df = df,
            organism = "human",
            chains = ['delta'],
            db_file='gammadelta_db.tsv')
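
To make the clean-up step more concrete, the following is a minimal, library-free sketch of what filtering on valid V-gene names amounts to. It is not the implementation of remove_entries_with_invalid_vgene(). The allow-list, the helper keep_valid_vgene(), the column name "v_d_gene", and the example rows are all assumptions made for illustration.

```python
# Hypothetical allow-list for illustration only; tcrdist2 consults its own
# reference data for the chosen chain and organism.
VALID_HUMAN_DELTA_V = {"TRDV1*01", "TRDV2*01", "TRDV3*01"}


def keep_valid_vgene(rows, valid_names, v_col="v_d_gene"):
    """Return only the rows whose V-gene name appears in valid_names.

    The column name "v_d_gene" is an assumption about the table layout.
    """
    return [row for row in rows if row.get(v_col) in valid_names]


# Made-up rows: bulk gamma/delta data can contain clones from other chains
# or clones without a V-gene assignment.
rows = [
    {"clone_id": 0, "count": 12, "v_d_gene": "TRDV1*01"},
    {"clone_id": 1, "count": 7, "v_d_gene": "TRGV9*01"},  # gamma gene, dropped
    {"clone_id": 2, "count": 3, "v_d_gene": None},        # no assignment, dropped
    {"clone_id": 3, "count": 5, "v_d_gene": "TRDV2*01"},
]

cleaned = keep_valid_vgene(rows, VALID_HUMAN_DELTA_V)
print([row["clone_id"] for row in cleaned])  # prints [0, 3]
```

In this sketch a gamma-chain V-gene and a missing assignment are both dropped, which is the kind of entry the recommended clean-up is meant to remove before a delta-chain analysis.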

Tip

For additional information on using mixcr, see the section “Mixcr: parsing a fastq” below.

Mixcr: parsing a fastq

This example uses a compact test data file from gamma/delta T cells.

  • SRR5130260.1.test.fastq contains a small sample of data from SRR5130255.1.fastq

The script below is an example of running mixcr inside a Docker container created from the milaboratory/mixcr:3-imgt image.

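# Name of the fastq file inside the mounted directory
NAME=SRR5130260.1.test.fastq

# --rm removes the container on exit, -m 4g caps its memory at 4 GB,
# and -v mounts the test-file directory at /work inside the container.
# The quoted command aligns reads, assembles clones, then exports
# clones and alignments as text.
docker run --rm \
    -m 4g \
    -v ~/TCRDIST/tcrdist2/tcrdist/test_files_compact/:/work \
    milaboratory/mixcr:3-imgt \
    -c "mixcr align ${NAME} ${NAME}.vdjca --species hsa; \
    mixcr assemble ${NAME}.vdjca ${NAME}.output.clns; \
    mixcr exportClones ${NAME}.output.clns ${NAME}.output.clns.txt; \
    mixcr exportAlignments ${NAME}.vdjca > ${NAME}.result.txt"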

Running this script produces the following files:

  • tcrdist/test_files_compact/SRR5130260.1.test.fastq.output.clns
  • tcrdist/test_files_compact/SRR5130260.1.test.fastq.output.clns.txt
  • tcrdist/test_files_compact/SRR5130260.1.test.fastq.result.txt
  • tcrdist/test_files_compact/SRR5130260.1.test.fastq.vdjca
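
Before importing into tcrdist2, it can be useful to confirm that the mixcr run wrote all four files. The sketch below derives the expected paths from the fastq name and reports any that are missing. The helpers expected_mixcr_outputs() and missing_outputs() are hypothetical and not part of tcrdist2 or mixcr; the suffixes are taken from the file list above.

```python
import os

# Suffixes appended to the fastq name by the script on this page
OUTPUT_SUFFIXES = [".output.clns", ".output.clns.txt", ".result.txt", ".vdjca"]


def expected_mixcr_outputs(fastq_name, directory):
    """Return the paths the script is expected to write for fastq_name."""
    return [os.path.join(directory, fastq_name + suffix)
            for suffix in OUTPUT_SUFFIXES]


def missing_outputs(fastq_name, directory):
    """Return the expected output paths that do not exist as files."""
    return [path for path in expected_mixcr_outputs(fastq_name, directory)
            if not os.path.isfile(path)]


# Lists whichever outputs are absent; an empty list means all four exist.
missing = missing_outputs("SRR5130260.1.test.fastq",
                          os.path.join("tcrdist", "test_files_compact"))
print(missing)
```

An empty result suggests the run completed; a non-empty one points at the mixcr step whose output is absent.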

The example at the top of this page used the clones file SRR5130260.1.test.fastq.output.clns.txt. Alternatively, SRR5130260.1.test.fastq.result.txt can be used as input for tcrdist2 by passing it to the seqs_fn argument of tcrdist.mixcr.mixcr_to_tcrdist2(). When SRR5130260.1.test.fastq.output.clns.txt is passed to the clones_fn argument instead, the returned DataFrame contains “clone_id” and “count” columns.
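
The choice between the two arguments can be sketched as a small helper that maps a mixcr export file to the matching keyword argument. The helper mixcr_input_kwarg() is hypothetical, and it assumes the file-naming convention of the script on this page (.clns.txt for exported clones, .result.txt for exported alignments); files named differently would need the argument chosen by hand.

```python
def mixcr_input_kwarg(path):
    """Pick clones_fn or seqs_fn for a mixcr export file.

    Assumes the naming used on this page: '*.clns.txt' holds exported
    clones and '*.result.txt' holds exported alignments.
    """
    if path.endswith(".clns.txt"):
        return {"clones_fn": path}
    if path.endswith(".result.txt"):
        return {"seqs_fn": path}
    raise ValueError("cannot tell which mixcr export this is: " + path)


kwargs = mixcr_input_kwarg("SRR5130260.1.test.fastq.result.txt")
print(kwargs)  # prints {'seqs_fn': 'SRR5130260.1.test.fastq.result.txt'}

# With tcrdist2 installed, the result would be passed on like this:
# df = mixcr.mixcr_to_tcrdist2(chain="delta", organism="human", **kwargs)
```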