isv package
Subpackages
Submodules
isv.alternative module
- isv.alternative.alternative_data(cnvs: list | ndarray | DataFrame, labels: list | ndarray, extra_columns: DataFrame | dict, cnv_type: str) → DMatrix
Prepare a dataset
First, annotate and scale the CNVs (given in BED format) with ISV; then add the columns specified in a dictionary or DataFrame.
- Parameters:
cnvs – CNVs in BED format
labels – labels of the CNVs (one per CNV)
extra_columns – columns to be added to the dataset
cnv_type – either DUP or DEL
- Returns:
xgboost DMatrix
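The second step of alternative_data (adding user-supplied columns) can be pictured with the plain-Python sketch below. It is not the library's code: the helper add_extra_columns and the feature names are hypothetical, and it assumes extra columns arrive as a mapping from column name to one value per CNV, in the same order as cnvs.

```python
from typing import Dict, List, Mapping, Sequence


def add_extra_columns(
    features: List[Dict[str, float]],
    extra_columns: Mapping[str, Sequence[float]],
) -> List[Dict[str, float]]:
    """Append extra columns to per-CNV feature rows (hypothetical helper)."""
    n = len(features)
    for name, values in extra_columns.items():
        # Every extra column must provide exactly one value per CNV.
        if len(values) != n:
            raise ValueError(f"column {name!r} has {len(values)} values, expected {n}")
        # Refuse to silently overwrite an existing feature.
        if any(name in row for row in features):
            raise ValueError(f"column {name!r} already exists")
    # Build new rows so the input rows are left untouched.
    return [
        {**row, **{name: values[i] for name, values in extra_columns.items()}}
        for i, row in enumerate(features)
    ]


rows = [{"gencode_genes": 3.0}, {"gencode_genes": 0.0}]
merged = add_extra_columns(rows, {"cohort_freq": [0.01, 0.2]})
```

Rejecting length mismatches up front matters because a misaligned extra column would otherwise attach values to the wrong CNVs without any error.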
- isv.alternative.alternative_model(train_dmat: DMatrix, val_dmat: DMatrix, params=None, num_boost_round: int = 100, early_stopping_rounds: int = 15, verbose_eval: int = 0)
Train an XGBoost model on data prepared with alternative_data
- Parameters:
train_dmat – result of the “alternative_data” function for the training CNVs
val_dmat – result of the “alternative_data” function for the validation CNVs
params – model parameters. If not set, ISV parameters are used
num_boost_round – max number of boosting rounds
early_stopping_rounds – stop training if the validation score has not improved for this many consecutive rounds
verbose_eval – verbosity
- Returns:
xgboost model
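The interplay of num_boost_round and early_stopping_rounds can be illustrated with the sketch below. It assumes alternative_model relies on XGBoost's built-in early stopping against val_dmat, which this reference does not state; the function early_stopping_round is hypothetical and only replays the stopping rule on a list of per-round validation losses (lower is better).

```python
from typing import Sequence, Tuple


def early_stopping_round(
    val_losses: Sequence[float],
    num_boost_round: int = 100,
    early_stopping_rounds: int = 15,
) -> Tuple[int, int]:
    """Return (best_round, rounds_trained) for per-round validation losses."""
    best_loss = float("inf")
    best_round = -1
    rounds_trained = 0
    # Never train more than num_boost_round rounds.
    for i, loss in enumerate(val_losses[:num_boost_round]):
        rounds_trained = i + 1
        if loss < best_loss:
            best_loss, best_round = loss, i
        elif i - best_round >= early_stopping_rounds:
            # No improvement for early_stopping_rounds consecutive rounds: stop.
            break
    return best_round, rounds_trained


losses = [0.5, 0.4, 0.35] + [0.36] * 20
best, trained = early_stopping_round(losses, num_boost_round=100, early_stopping_rounds=15)
```

Here the best round is index 2, and training stops 15 rounds later, after 18 rounds in total.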
isv.annotate module
- isv.annotate.annotate(cnvs)
- Parameters:
cnvs – a list, NumPy array or pandas DataFrame with 4 columns: chromosome (e.g., chr3), CNV start (GRCh38), CNV end (GRCh38) and cnv_type (DUP or DEL)
- Returns:
pandas DataFrame of annotated CNVs
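The 4-column input layout accepted by annotate (and by ISV and isv.isv) can be checked before annotation. The helper normalize_cnvs below is hypothetical and not part of the package; it only enforces what this reference documents: a chromosome such as chr3, integer GRCh38 start and end positions, and a cnv_type of DUP or DEL.

```python
from typing import Iterable, List, Sequence, Tuple

# (chromosome, start, end, cnv_type)
CnvRow = Tuple[str, int, int, str]


def normalize_cnvs(cnvs: Iterable[Sequence]) -> List[CnvRow]:
    """Check the documented 4-column layout and coerce positions to int (hypothetical helper)."""
    rows: List[CnvRow] = []
    for i, row in enumerate(cnvs):
        if len(row) != 4:
            raise ValueError(f"row {i}: expected 4 columns, got {len(row)}")
        chrom, start, end, cnv_type = row
        chrom = str(chrom)
        if not chrom.startswith("chr"):
            raise ValueError(f"row {i}: chromosome should look like 'chr3', got {chrom!r}")
        start, end = int(start), int(end)
        if start > end:
            raise ValueError(f"row {i}: start {start} is after end {end}")
        if cnv_type not in ("DUP", "DEL"):
            raise ValueError(f"row {i}: cnv_type must be DUP or DEL, got {cnv_type!r}")
        rows.append((chrom, start, end, str(cnv_type)))
    return rows
```

Lists and NumPy arrays can be passed directly; for a pandas DataFrame, iterate rows with df.itertuples(index=False), because iterating a DataFrame itself yields column names.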
- isv.annotate.annotate_cnv(chrom, start, end, gencode_genes, regulatory, hi_genes, hits_regions)
Annotate a candidate CNV
- Parameters:
chrom – chromosome identifier, e.g. “chr1”
start – start position on the GRCh38 assembly
end – end position on the GRCh38 assembly
gencode_genes – preprocessed gencode genes dict
regulatory – preprocessed regulatory genes dict
hi_genes – preprocessed hi_genes dict
hits_regions – preprocessed hi_regions and ts regions dict
- Returns:
annotation
- isv.annotate.get_el(arr, start: int, end: int)
Get overlapping elements
- Parameters:
arr – array to query
start – start position
end – end position
- Returns:
rows of arr that overlap the interval defined by start and end
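An overlap query like get_el reduces to a simple interval test. The sketch below is not the package's implementation: the function overlapping_rows is hypothetical, and it assumes column 0 holds each element's start and column 1 its end, with both ends inclusive.

```python
from typing import List, Sequence


def overlapping_rows(arr: Sequence[Sequence[int]], start: int, end: int) -> List[Sequence[int]]:
    """Return rows whose [row_start, row_end] interval intersects [start, end].

    Assumes column 0 is the element start and column 1 the element end,
    both inclusive (a hypothetical layout).
    """
    # Two closed intervals overlap iff each one starts no later than the other ends.
    return [row for row in arr if row[0] <= end and row[1] >= start]


genes = [(100, 200), (150, 400), (500, 600)]
hits = overlapping_rows(genes, 180, 450)
```

With a NumPy array of the same layout, the equivalent query is the boolean mask arr[(arr[:, 0] <= end) & (arr[:, 1] >= start)].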
- isv.annotate.open_data(f)
Open preprocessed data
- Parameters:
f – filepath
- Returns:
python dictionary of numpy arrays
isv.config module
- class isv.config.Settings
Bases: object
isv.isv module
- class isv.isv.ISV(cnvs)
Bases: object
Annotate CNVs and predict their pathogenicity
- Parameters:
cnvs – a list, NumPy array or pandas DataFrame with 4 columns: chromosome (e.g., chr3), CNV start (GRCh38), CNV end (GRCh38) and cnv_type (DUP or DEL)
- Returns:
ISV output as a pandas dataframe
- predict(proba: bool = True, threshold: float = 0.95)
Generate ISV predictions
- Parameters:
proba – whether probabilities should be calculated
threshold – probability threshold for classifying CNVs into three classes: Pathogenic (probability >= threshold), Uncertain significance (1 - threshold < probability < threshold) or Benign (probability <= 1 - threshold)
- Returns:
dataframe with last column representing the ISV predictions
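The three-class rule described for threshold can be written out directly. This is a sketch of the documented rule, not the package's code; the function name classify is hypothetical, and the requirement that threshold lie in (0.5, 1] is an added assumption so that the Pathogenic and Benign ranges cannot overlap.

```python
def classify(probability: float, threshold: float = 0.95) -> str:
    """Map a pathogenicity probability to one of the three classes (hypothetical helper)."""
    # Assumption: with threshold <= 0.5 the Pathogenic and Benign ranges would overlap.
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold should be in (0.5, 1]")
    if probability >= threshold:
        return "Pathogenic"
    if probability <= 1 - threshold:
        return "Benign"
    # Strictly between 1 - threshold and threshold.
    return "Uncertain significance"


labels = [classify(p) for p in (0.99, 0.5, 0.01)]
```

With the default threshold of 0.95, a CNV is called Pathogenic at 0.95 or above and Benign at about 0.05 or below; everything in between is Uncertain significance.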
- shap(df: DataFrame | None = None)
Calculate SHAP values
- Returns:
dataframe of SHAP values
- waterfall(cnv_index: int, filepath: str = 'temp-plot.html', return_fig: bool = False, pathogenic_color: str = 'rgb(255, 0, 50)', benign_color: str = 'rgb(58, 130, 255)', text_position: str = 'outside', width: int = 800, height: int = 800)
Waterfall plot for CNV at specified index
- Parameters:
cnv_index – index of the CNV to plot
filepath – path where the resulting HTML file will be saved
return_fig – whether the raw figure object should be returned
pathogenic_color – color of bars that push the prediction toward pathogenic
benign_color – color of bars that push the prediction toward benign
text_position – text position
width – figure width
height – figure height
- Returns:
html plot
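A SHAP waterfall plot starts at the model's base value and stacks one bar per feature, so the last bar ends at the CNV's prediction. The sketch below computes only the bar geometry and colors, without plotting; the function waterfall_bars is hypothetical, and it assumes (as the color parameters suggest) that a positive SHAP value pushes toward pathogenic and a negative one toward benign.

```python
from typing import Dict, List, Tuple


def waterfall_bars(
    base_value: float,
    contributions: Dict[str, float],
    pathogenic_color: str = "rgb(255, 0, 50)",
    benign_color: str = "rgb(58, 130, 255)",
) -> List[Tuple[str, float, float, str]]:
    """Return (feature, bar_start, bar_end, color) for each SHAP contribution."""
    bars = []
    running = base_value
    # Draw the largest absolute contributions first, as waterfall plots usually do.
    for name, value in sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True):
        # Assumption: positive SHAP values push toward pathogenic.
        color = pathogenic_color if value > 0 else benign_color
        bars.append((name, running, running + value, color))
        running += value
    return bars


bars = waterfall_bars(-1.0, {"a": 2.0, "b": -0.5})
```

Each bar starts where the previous one ended, which is what makes the plot add up from the base value to the final model output.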
isv.predict module
- isv.predict.predict(annotated_cnvs: DataFrame, proba: bool = True, threshold: float = 0.95)
Predict pathogenicity for a batch of CNVs that may have different CNV types
- Parameters:
annotated_cnvs – Annotated CNVs
proba – whether probabilities should be calculated
threshold – probability threshold for classifying CNVs into three classes: Pathogenic (probability >= threshold), Uncertain significance (1 - threshold < probability < threshold) or Benign (probability <= 1 - threshold)
- Returns:
predictions
- isv.predict.predict_with_same_cnv_type(annotated_cnvs: DataFrame, cnv_type: str)
Return model predictions for CNVs that share a single CNV type
- Parameters:
annotated_cnvs – annotated CNVs (raw counts of genomic elements)
cnv_type – CNV type (DUP or DEL)
- Returns:
yhat: predicted values
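Since predict accepts mixed CNV types while predict_with_same_cnv_type works on a single type, one plausible way to combine them is to split the rows by cnv_type, run a per-type model, and write the predictions back in the original row order. The sketch below illustrates that pattern only; predict_mixed and the models mapping are hypothetical stand-ins, not the package's internals.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def predict_mixed(
    rows: Sequence[Row],
    models: Dict[str, Callable[[List[Row]], List[float]]],
) -> List[float]:
    """Predict rows of mixed CNV types with one model per type, keeping input order."""
    out: List[float] = [0.0] * len(rows)
    # Remember the original positions of the rows of each CNV type.
    by_type: Dict[str, List[int]] = {}
    for i, row in enumerate(rows):
        by_type.setdefault(str(row["cnv_type"]), []).append(i)
    for cnv_type, idx in by_type.items():
        if cnv_type not in models:
            raise KeyError(f"no model for cnv_type {cnv_type!r}")
        preds = models[cnv_type]([rows[i] for i in idx])
        # Scatter the per-type predictions back to their original positions.
        for i, p in zip(idx, preds):
            out[i] = p
    return out


toy_models = {"DEL": lambda rs: [0.9] * len(rs), "DUP": lambda rs: [0.1] * len(rs)}
preds = predict_mixed([{"cnv_type": "DUP"}, {"cnv_type": "DEL"}, {"cnv_type": "DUP"}], toy_models)
```

Keeping the original indices avoids reordering the output relative to the input, so predictions can be attached to the input rows directly.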
isv.shap_vals module
- isv.shap_vals.shap_values(annotated_cnvs: DataFrame)
Calculate SHAP values
- Parameters:
annotated_cnvs – annotated cnvs
- Returns:
explainer object
- isv.shap_vals.shap_values_with_same_cnv_type(annotated_cnvs: DataFrame, cnv_type: str, raw: bool = False)
Calculate SHAP values for CNVs with the same CNV type
- Parameters:
annotated_cnvs – annotated CNVs (raw counts of genomic elements)
cnv_type – CNV type (DUP or DEL)
raw – whether the raw SHAP explainer object should be returned
- Returns:
explainer object
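SHAP values are Shapley values of a model's output. For tree ensembles such as XGBoost they are usually computed with the fast TreeSHAP algorithm; whether isv.shap_vals does so is not stated in this reference. The brute-force sketch below instead enumerates every feature subset, which is only feasible for a handful of features, but it makes the defining property visible: the values add up to v(all features) - v(no features). The toy value function is hypothetical and is not an ISV model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def exact_shapley(features: List[str], value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values by enumerating all feature subsets (exponential cost)."""
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(s: FrozenSet[str]) -> float:
    # Two main effects plus an interaction shared equally by "a" and "b".
    return 1.0 * ("a" in s) + 2.0 * ("b" in s) + 3.0 * ("a" in s and "b" in s)


phi = exact_shapley(["a", "b"], toy_value)
```

The interaction term of 3.0 is split evenly, giving "a" 1.0 + 1.5 and "b" 2.0 + 1.5.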
Module contents
- isv.isv(cnvs, proba: bool = True, shap: bool = False, threshold: float = 0.95)
Predict the pathogenicity of CNVs and optionally calculate their SHAP values with this simple wrapper function
- Parameters:
cnvs – a list, NumPy array or pandas DataFrame with 4 columns: chromosome (e.g., chr3), CNV start (GRCh38), CNV end (GRCh38) and cnv_type (DUP or DEL)
proba – whether probabilities should be calculated
shap – whether SHAP values should be calculated
threshold – probability threshold for classifying CNVs into three classes: Pathogenic (probability >= threshold), Uncertain significance (1 - threshold < probability < threshold) or Benign (probability <= 1 - threshold)
- Returns:
pandas dataframe of results