isv package

Subpackages

isv.scripts package

Submodules

isv.alternative module

isv.alternative.alternative_data(cnvs: list | ndarray | DataFrame, labels: list | ndarray, extra_columns: DataFrame | dict, cnv_type: str) → DMatrix

Prepare a dataset.

First, annotate and scale the CNVs (given in BED format) with ISV, then add the columns specified in a dictionary or DataFrame.

Parameters:

cnvs – CNVs specified in BED format
labels – labels of the CNVs
extra_columns – columns to be added to the dataset
cnv_type – either DUP or DEL

Returns:

xgboost DMatrix
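The three inputs must describe the same CNVs in the same order. The sketch below is a hypothetical pre-flight check (`check_alternative_inputs` is not part of isv) that assumes each CNV is given as chromosome, start, end in BED order and that `extra_columns` is a dict mapping column names to one value per CNV; it only verifies that the pieces line up before they are handed to `alternative_data`.

```python
def check_alternative_inputs(cnvs, labels, extra_columns, cnv_type):
    """Hypothetical sanity check for alternative_data inputs; not part of isv."""
    if cnv_type not in ("DUP", "DEL"):
        raise ValueError(f"cnv_type must be 'DUP' or 'DEL', got {cnv_type!r}")
    n = len(cnvs)
    # One label per CNV.
    if len(labels) != n:
        raise ValueError(f"{len(labels)} labels for {n} CNVs")
    # Assumption: every extra column carries one value per CNV.
    for name, values in extra_columns.items():
        if len(values) != n:
            raise ValueError(f"column {name!r} has {len(values)} values for {n} CNVs")
    # Assumption: BED-style rows of (chrom, start, end) with start < end.
    for chrom, start, end in cnvs:
        if start >= end:
            raise ValueError(f"invalid interval {chrom}:{start}-{end}")
    return n
```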

isv.alternative.alternative_model(train_dmat: DMatrix, val_dmat: DMatrix, params=None, num_boost_round: int = 100, early_stopping_rounds: int = 15, verbose_eval: int = 0)

Train an alternative model

Parameters:

train_dmat – result of the alternative_data function for the training CNVs
val_dmat – result of the alternative_data function for the validation CNVs
params – model parameters; if not set, the ISV parameters are used
num_boost_round – maximum number of boosting rounds
early_stopping_rounds – number of early stopping rounds
verbose_eval – verbosity

Returns:

xgboost model
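In xgboost, `early_stopping_rounds` stops training once the validation metric has not improved for that many consecutive rounds, with `num_boost_round` as the upper bound. The pure-Python sketch below only illustrates that rule on a precomputed list of validation losses (assuming a lower-is-better metric such as log loss); it is not how isv or xgboost implement it internally.

```python
def early_stop_round(val_losses, num_boost_round=100, early_stopping_rounds=15):
    """Return (best_round, rounds_run) for a lower-is-better validation metric."""
    best_loss = float("inf")
    best_round = -1
    rounds_run = 0
    for rnd, loss in enumerate(val_losses[:num_boost_round]):
        rounds_run = rnd + 1
        if loss < best_loss:
            # New best validation score: remember it and keep training.
            best_loss, best_round = loss, rnd
        elif rnd - best_round >= early_stopping_rounds:
            # No improvement for early_stopping_rounds consecutive rounds.
            break
    return best_round, rounds_run
```

For example, with losses `[0.5, 0.4, 0.3, 0.35, 0.36, 0.37]` and `early_stopping_rounds=2`, the best round is index 2 and training halts after five rounds.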

isv.annotate module

isv.annotate.annotate(cnvs)

Parameters:

cnvs – a list, np.array or pandas DataFrame with 4 columns representing the chromosome (e.g. chr3), CNV start (GRCh38), CNV end (GRCh38) and cnv_type (DUP or DEL)

Returns:

pandas DataFrame of annotated CNVs
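The input is one row of four values per CNV. The sketch below builds such input as a plain list of rows (the coordinates are placeholders) and runs a hypothetical check (`validate_cnv_rows` is not part of isv) that mirrors the documented requirements: a chr-prefixed chromosome, integer GRCh38 start and end with start before end, and a type of DUP or DEL. A list like this, or an equivalent DataFrame, is the kind of object `annotate` expects.

```python
VALID_TYPES = {"DUP", "DEL"}


def validate_cnv_rows(rows):
    """Hypothetical check of the 4-column CNV input; not part of isv."""
    for i, row in enumerate(rows):
        if len(row) != 4:
            raise ValueError(f"row {i}: expected 4 columns, got {len(row)}")
        chrom, start, end, cnv_type = row
        if not str(chrom).startswith("chr"):
            raise ValueError(f"row {i}: chromosome should look like 'chr3', got {chrom!r}")
        if not (isinstance(start, int) and isinstance(end, int) and 0 <= start < end):
            raise ValueError(f"row {i}: invalid coordinates {start}-{end}")
        if cnv_type not in VALID_TYPES:
            raise ValueError(f"row {i}: cnv_type must be DUP or DEL, got {cnv_type!r}")
    return rows


# Placeholder GRCh38 coordinates, for illustration only.
cnvs = validate_cnv_rows([
    ["chr3", 1_000_000, 1_250_000, "DUP"],
    ["chr7", 5_500_000, 5_600_000, "DEL"],
])
```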

isv.annotate.annotate_cnv(chrom, start, end, gencode_genes, regulatory, hi_genes, hits_regions)

Annotate a candidate CNV

Parameters:

chrom – chromosome identifier, e.g. “chr1”
start – start position on the GRCh38 assembly
end – end position on the GRCh38 assembly
gencode_genes – preprocessed GENCODE genes dict
regulatory – preprocessed regulatory genes dict
hi_genes – preprocessed hi_genes dict
hits_regions – preprocessed HI and TS regions dict

Returns:

annotation

isv.annotate.get_el(arr, start: int, end: int)

Get overlapping elements

Parameters:

arr – array to query
start – start position
end – end position

Returns:

rows of arr that overlap the region between the start and end positions
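One way to express such an overlap query is shown below. It assumes that each row of `arr` begins with a start and an end coordinate and that intervals are closed; the actual isv array layout and boundary convention may differ. Two closed intervals overlap exactly when each one starts no later than the other ends.

```python
def overlapping_rows(arr, start, end):
    """Return rows (row_start, row_end, ...) that overlap the closed interval [start, end]."""
    return [row for row in arr if row[0] <= end and row[1] >= start]


# Hypothetical gene rows: (start, end, name).
genes = [
    (100, 200, "geneA"),
    (150, 400, "geneB"),
    (500, 600, "geneC"),
]
print(overlapping_rows(genes, 180, 450))
# [(100, 200, 'geneA'), (150, 400, 'geneB')]
```

For a numeric 2-D numpy array, the same condition can be written as a boolean mask, `arr[(arr[:, 0] <= end) & (arr[:, 1] >= start)]`, which avoids the Python-level loop.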

isv.annotate.open_data(f)

Open preprocessed data

Parameters:

f – file path

Returns:

Python dictionary of numpy arrays

isv.config module

class isv.config.Settings

Bases: object

isv.isv module

class isv.isv.ISV(cnvs)

Bases: object

Annotate CNVs and predict their pathogenicity

Parameters:

cnvs – a list, np.array or pandas DataFrame with 4 columns representing the chromosome (e.g. chr3), CNV start (GRCh38), CNV end (GRCh38) and cnv_type (DUP or DEL)

Returns:

ISV output as a pandas DataFrame

predict(proba: bool = True, threshold: float = 0.95)

Generate ISV predictions

Parameters:

proba – whether probabilities should be calculated
threshold – probability threshold for classifying CNVs into three classes: Pathogenic (>= threshold), Uncertain significance ((1-threshold, threshold)) or Benign (<= 1 - threshold)

Returns:

DataFrame whose last column contains the ISV predictions
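The three-class rule described for `threshold` can be restated as a small function. This sketch follows the documented boundaries only; the label strings are assumptions and may not match the exact text ISV writes to its output column.

```python
def classify(probability, threshold=0.95):
    """Map a pathogenicity probability to one of the three documented classes."""
    if probability >= threshold:
        return "Pathogenic"
    if probability <= 1 - threshold:
        return "Benign"
    # Strictly between 1 - threshold and threshold.
    return "Uncertain significance"
```

With the default threshold of 0.95, a probability of 0.97 is classed as Pathogenic, 0.5 as Uncertain significance and 0.01 as Benign; lowering the threshold narrows the uncertain band.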

shap(df: DataFrame | None = None)

Calculate SHAP values

Returns:

DataFrame of SHAP values

waterfall(cnv_index: int, filepath: str = 'temp-plot.html', return_fig: bool = False, pathogenic_color: str = 'rgb(255, 0, 50)', benign_color: str = 'rgb(58, 130, 255)', text_position: str = 'outside', width: int = 800, height: int = 800)

Waterfall plot for CNV at specified index

Parameters:

cnv_index – index of the CNV to plot
filepath – path where the resulting HTML file is saved
return_fig – whether the raw figure should be returned
pathogenic_color – color of bars that push the prediction toward pathogenic
benign_color – color of bars that push the prediction toward benign
text_position – position of the text labels
width – figure width
height – figure height

Returns:

HTML plot
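A SHAP waterfall starts at the model's base value and adds one bar per feature, so the last bar ends at the base value plus the sum of all SHAP values. The sketch below computes bar positions and colors from a dict of SHAP values; it assumes that positive values push toward pathogenic, reuses the default colors from the signature, uses hypothetical feature names, and leaves out the actual plotting that `waterfall` performs.

```python
PATHOGENIC_COLOR = "rgb(255, 0, 50)"
BENIGN_COLOR = "rgb(58, 130, 255)"


def waterfall_bars(base_value, shap_values):
    """Return (feature, bar_start, bar_end, color) tuples, largest |effect| first."""
    bars = []
    position = base_value
    ordered = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    for feature, value in ordered:
        # Assumption: a positive SHAP value pushes toward pathogenic.
        color = PATHOGENIC_COLOR if value > 0 else BENIGN_COLOR
        bars.append((feature, position, position + value, color))
        position += value
    return bars


# Hypothetical features and SHAP values for a single CNV.
bars = waterfall_bars(-1.0, {"gencode_genes": 2.5, "regulatory": -0.5, "hi_genes": 1.0})
for bar in bars:
    print(bar)
```

Here the bars run from -1.0 to 1.5, 1.5 to 2.5 and 2.5 back down to 2.0, which equals the base value plus the three contributions.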

isv.predict module

isv.predict.predict(annotated_cnvs: DataFrame, proba: bool = True, threshold: float = 0.95)

Predict a batch of CNVs of different CNV types

Parameters:

annotated_cnvs – Annotated CNVs
proba – whether probabilities should be calculated
threshold – probability threshold for classifying CNVs into three classes: Pathogenic (>= threshold), Uncertain significance ((1-threshold, threshold)) or Benign (<= 1 - threshold)

Returns:

predictions
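Because predict_with_same_cnv_type handles one CNV type at a time, a bulk predictor over mixed types plausibly groups the rows by type, predicts each group and restores the original order. The sketch below shows that split-apply-recombine pattern with a stand-in per-type predictor (`predict_one_type` and `toy_model` are hypothetical); it is not taken from the isv source.

```python
from collections import defaultdict


def predict_mixed(cnv_types, features, predict_one_type):
    """Predict rows of mixed CNV types, preserving the input order."""
    # Remember which input positions belong to each CNV type.
    groups = defaultdict(list)
    for i, cnv_type in enumerate(cnv_types):
        groups[cnv_type].append(i)

    predictions = [None] * len(cnv_types)
    for cnv_type, indices in groups.items():
        batch = [features[i] for i in indices]
        # Write each group's predictions back to their original positions.
        for i, yhat in zip(indices, predict_one_type(batch, cnv_type)):
            predictions[i] = yhat
    return predictions


def toy_model(batch, cnv_type):
    """Stand-in predictor that just tags each input with its CNV type."""
    return [f"{cnv_type}-{x}" for x in batch]


print(predict_mixed(["DEL", "DUP", "DEL"], [1, 2, 3], toy_model))
# ['DEL-1', 'DUP-2', 'DEL-3']
```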

isv.predict.predict_with_same_cnv_type(annotated_cnvs: DataFrame, cnv_type: str)

Return model predictions for a DataFrame of CNVs that share a single CNV type

Parameters:

annotated_cnvs – Raw counts of genomic elements
cnv_type – CNV type (DUP or DEL)

Returns:

yhat: predicted values

isv.shap_vals module

isv.shap_vals.shap_values(annotated_cnvs: DataFrame)

Calculate SHAP values

Parameters:

annotated_cnvs – annotated CNVs

Returns:

explainer object

isv.shap_vals.shap_values_with_same_cnv_type(annotated_cnvs: DataFrame, cnv_type: str, raw: bool = False)

Calculate SHAP values for CNVs with the same cnv type

Parameters:

annotated_cnvs – Raw counts of genomic elements
cnv_type – CNV type (DUP or DEL)
raw – whether the raw SHAP explainer object should be returned

Returns:

explainer object

Module contents

isv.isv(cnvs, proba: bool = True, shap: bool = False, threshold: float = 0.95)

Predict the pathogenicity of CNVs and optionally calculate their SHAP values with this simple wrapper function

Parameters:

cnvs – a list, np.array or pandas DataFrame with 4 columns representing the chromosome (e.g. chr3), CNV start (GRCh38), CNV end (GRCh38) and cnv_type (DUP or DEL)
proba – whether probabilities should be calculated
shap – whether SHAP values should be calculated
threshold – probability threshold for classifying CNVs into three classes: Pathogenic (>= threshold), Uncertain significance ((1-threshold, threshold)) or Benign (<= 1 - threshold)

Returns:

pandas dataframe of results