tcrdist2¶
This page introduces two quick ways to get started with tcrdist2. You can calculate
tcr-distances using the wrapper function tcrdist.repertoire.TCRrep.tcrdist2()
with only a few lines of code. This example uses epitope-specific, paired-chain T cell receptor data from Dash et al. 2017. Download dash.csv.
tcrdist2 Distance¶
import pandas as pd
from tcrdist.repertoire import TCRrep
df = pd.read_csv('dash.csv')
tr = TCRrep(cell_df=df, chains=['alpha','beta'], organism='mouse')
tr.tcrdist2()
You can then access pairwise distance attributes:
tr.pw_tcrdist
tr.pw_alpha
tr.pw_beta
tcrdist2 Basic Options¶
tcrdist.repertoire.TCRrep.tcrdist2()
provides multiple options.
import multiprocessing
import numpy as np
import pandas as pd
from tcrdist.repertoire import TCRrep
cpu = multiprocessing.cpu_count() # determine the number of available cpu
df = pd.read_csv('dash.csv')
df = df[df.epitope.isin(['NP'])]
tr = TCRrep(cell_df=df, chains=['alpha','beta'], organism='mouse')
tr.tcrdist2(processes = cpu,
metric = 'nw',
reduce = True,
dump = False,
save = True,
dest = "default_archive",
dest_tar_name = "default_archive.tar.gz")
- processes - number of parallel processes to use
- metric - (default : nw) specifies the Needleman Wunsch metric
- reduce - (default : True) If True, reduces distances from float to int16 data types to reduce file size
- dump - (default : False) If True, dumps pairwise distance matrics for CDR1, CDR2, and CDR2.5 after used
- save - (default : False) If True, saves results to dest folder, and .tar.gz
- dest - Name of folder for saving results
- dest_tar_name - Name of archive for saving results as a .tar.gz
- verbose - If True, provide information to sys.stdout on each step.
Deduplicating your TCRrep.cell_df to make TCRrep.clone_df.
Computing pairwise matrices for multiple Complementarity Determining Regions (CDRs):.
Computing pairwise matrices for cdrs within the alpha-chain using the nw metric.
Computing pairwise matrices for cdrs within the beta-chain using the nw metric.
Calculating composite tcrdistance measures:
Single chain pairwise tcrdistances are in attribute : TCRrep.pw_alpha
Single chain pairwise tcrdistances are in attribute : TCRrep.pw_beta
Combined pairwise tcrdistances are in attribute : TCRrep.pw_tcrdist
CDR specific tcrdistances are in attributes, e.g., : TCRrep.cdr3_b_aa_pw
Reducing File Size: `reduce` argumment set to True:
Reducing : paired_tcrdist to int16.
Reducing : cdr3_a_aa_pw to int16.
Reducing : cdr2_a_aa_pw to int16.
Reducing : cdr1_a_aa_pw to int16.
Reducing : pmhc_a_aa_pw to int16.
Reducing : cdr3_b_aa_pw to int16.
Reducing : cdr2_b_aa_pw to int16.
Reducing : cdr1_b_aa_pw to int16.
Reducing : pmhc_b_aa_pw to int16.
Reducing : pw_alpha to int16.
Reducing : pw_beta to int16.
Reducing : pw_tcrdist to int16.
Archiving your TCRrep using Zipdist2 (save = True)
Saving paired_tcrdist to .csv : default_archive/paired_tcrdist.csv
Saving paired_tcrdist to .npy : default_archive/paired_tcrdist.npy
Saving cdr3_a_aa_pw to .csv : default_archive/cdr3_a_aa_pw.csv
Saving cdr3_a_aa_pw to .npy : default_archive/cdr3_a_aa_pw.npy
Saving cdr2_a_aa_pw to .csv : default_archive/cdr2_a_aa_pw.csv
Saving cdr2_a_aa_pw to .npy : default_archive/cdr2_a_aa_pw.npy
Saving cdr1_a_aa_pw to .csv : default_archive/cdr1_a_aa_pw.csv
Saving cdr1_a_aa_pw to .npy : default_archive/cdr1_a_aa_pw.npy
Saving pmhc_a_aa_pw to .csv : default_archive/pmhc_a_aa_pw.csv
Saving pmhc_a_aa_pw to .npy : default_archive/pmhc_a_aa_pw.npy
Saving cdr3_b_aa_pw to .csv : default_archive/cdr3_b_aa_pw.csv
Saving cdr3_b_aa_pw to .npy : default_archive/cdr3_b_aa_pw.npy
Saving cdr2_b_aa_pw to .csv : default_archive/cdr2_b_aa_pw.csv
Saving cdr2_b_aa_pw to .npy : default_archive/cdr2_b_aa_pw.npy
Saving cdr1_b_aa_pw to .csv : default_archive/cdr1_b_aa_pw.csv
Saving cdr1_b_aa_pw to .npy : default_archive/cdr1_b_aa_pw.npy
Saving pmhc_b_aa_pw to .csv : default_archive/pmhc_b_aa_pw.csv
Saving pmhc_b_aa_pw to .npy : default_archive/pmhc_b_aa_pw.npy
Saving pw_alpha to .csv : default_archive/pw_alpha.csv
Saving pw_alpha to .npy : default_archive/pw_alpha.npy
Saving pw_beta to .csv : default_archive/pw_beta.csv
Saving pw_beta to .npy : default_archive/pw_beta.npy
Saving pw_tcrdist to .csv : default_archive/pw_tcrdist.csv
Saving pw_tcrdist to .npy : default_archive/pw_tcrdist.npy
Saving cell_df to .csv : default_archive/cell_df.csv
Saving cell_df to .feather : default_archive/cell_df.feather
Saving clone_df to .csv : default_archive/clone_df.csv
Saving clone_df to .feather : default_archive/clone_df.feather
Saving JSON with complex attribute definitions : default_archive/complex_attributes.json
Saving JSON with simple attribute definitions : default_archive/simple_attributes.json
Combining saved files in : [default_archive.tar.gz].
Archiving your TCRrep using Zipdist2 in [default_archive.tar.gz]
Archiving your TCRrep using Zipdist2 in [default_archive.tar.gz]
TCRrep.tcrdist2() COMPLETED SUCCESSFULLY, see the docs for Analysis steps!
TCRdist Legacy Distance¶
Calculating a legacy TCRdist using methods from Dash et al. 2017, can be achieved
with a few prepatory steps and tcrdist.repertoire.TCRrep._tcrdist_legacy_method_alpha_beta()
.
import pandas as pd
from tcrdist.repertoire import TCRrep
df = pd.read_csv('dash.csv')
tr = TCRrep(cell_df=df, chains=['alpha','beta'] organism='mouse')
tr.infer_cdrs_from_v_gene(chain='alpha', imgt_aligned=True)
tr.infer_cdrs_from_v_gene(chain='beta', imgt_aligned=True)
tr.index_cols = ['clone_id', 'subject', 'epitope',
'v_a_gene', 'j_a_gene', 'v_b_gene', 'j_b_gene',
'cdr3_a_aa', 'cdr3_b_aa',
'cdr1_a_aa', 'cdr2_a_aa', 'pmhc_a_aa',
'cdr1_b_aa', 'cdr2_b_aa', 'pmhc_b_aa',
'cdr3_b_nucseq', 'cdr3_a_nucseq',
'va_gene', 'vb_gene',
'ja_gene', 'jb_gene']
tr.deduplicate()
tr._tcrdist_legacy_method_alpha_beta()
tr.pw_tcrdist
Other methods are available for single chain and paired chain analysis:
tcrdist.repertoire.TCRrep._tcrdist_legacy_method_alpha_beta()
tcrdist.repertoire.TCRrep._tcrdist_legacy_method_alpha()
tcrdist.repertoire.TCRrep._tcrdist_legacy_method_beta()
tcrdist.repertoire.TCRrep._tcrdist_legacy_method_gamma_delta()
tcrdist.repertoire.TCRrep._tcrdist_legacy_method_gamma()
tcrdist.repertoire.TCRrep._tcrdist_legacy_method_delta()