isv package

Subpackages

Submodules

isv.alternative module

isv.alternative.alternative_data(cnvs: list | ndarray | DataFrame, labels: list | ndarray, extra_columns: DataFrame | dict, cnv_type: str) DMatrix

Prepare dataset

First, the CNVs (given in BED format) are annotated and scaled by ISV. Then the columns specified in a dictionary or dataframe are added.

Parameters:
  • cnvs – CNVs specified in BED format

  • extra_columns – columns to be added to the dataset

  • cnv_type – either DUP or DEL

Returns:

xgboost DMatrix (per the signature above)

isv.alternative.alternative_model(train_dmat: DMatrix, val_dmat: DMatrix, params=None, num_boost_round: int = 100, early_stopping_rounds: int = 15, verbose_eval: int = 0)

Train a model on datasets prepared by alternative_data

Parameters:
  • train_dmat – result of “alternative_data” function for train cnvs

  • val_dmat – result of “alternative_data” function for validation cnvs

  • params – model parameters. If not set, ISV parameters are used

  • num_boost_round – max number of boosting rounds

  • early_stopping_rounds – number of boosting rounds without improvement on the validation set after which training stops

  • verbose_eval – verbosity

Returns:

xgboost model
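The two functions above are meant to be chained: alternative_data prepares one DMatrix per split, and alternative_model trains on the pair. A minimal sketch of that workflow follows. The helper name train_alternative_model and its argument names are hypothetical; only the calls to alternative_data and alternative_model follow the signatures documented above, and the import is deferred so the sketch can be defined without isv installed.

```python
def train_alternative_model(train_cnvs, train_labels, train_extra,
                            val_cnvs, val_labels, val_extra, cnv_type="DEL"):
    """Hypothetical helper: prepare two DMatrix objects and train on them."""
    # Imported lazily so that defining this sketch does not require isv.
    from isv.alternative import alternative_data, alternative_model

    # Argument order follows the documented signature:
    # alternative_data(cnvs, labels, extra_columns, cnv_type)
    train_dmat = alternative_data(train_cnvs, train_labels, train_extra, cnv_type)
    val_dmat = alternative_data(val_cnvs, val_labels, val_extra, cnv_type)

    # params=None falls back to the ISV parameters, per the documentation.
    return alternative_model(
        train_dmat,
        val_dmat,
        params=None,
        num_boost_round=100,
        early_stopping_rounds=15,
        verbose_eval=0,
    )
```

Both splits are prepared with the same cnv_type here, which is an assumption of the sketch rather than a documented requirement.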

isv.annotate module

isv.annotate.annotate(cnvs)
Parameters:

cnvs – a list, np.array or pandas dataframe with 4 columns representing chromosome (e.g., chr3), CNV start (GRCh38), CNV end (GRCh38) and cnv_type (DUP or DEL)

Returns:

pd DataFrame of annotated CNVs
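The four-column input can be sketched in plain Python as a list of rows. The helper check_cnv_rows below is hypothetical and not part of isv; it only illustrates the documented shape. The coordinates are made up for illustration, and the "chr" prefix check is an assumption based on the example identifier given above.

```python
# Illustrative only: check_cnv_rows is a hypothetical helper, not part of isv.
VALID_CNV_TYPES = {"DUP", "DEL"}


def check_cnv_rows(cnvs):
    """Return the rows unchanged if each one has the 4 documented columns."""
    for row in cnvs:
        if len(row) != 4:
            raise ValueError(f"expected 4 columns, got {len(row)}: {row!r}")
        chrom, start, end, cnv_type = row
        # Assumption: chromosome identifiers look like "chr3".
        if not str(chrom).startswith("chr"):
            raise ValueError(f"unexpected chromosome identifier: {chrom!r}")
        if int(start) > int(end):
            raise ValueError(f"start is after end: {row!r}")
        if cnv_type not in VALID_CNV_TYPES:
            raise ValueError(f"cnv_type must be DUP or DEL, got {cnv_type!r}")
    return cnvs


# Made-up coordinates, one deletion and one duplication.
example_cnvs = [
    ["chr3", 1_000_000, 1_250_000, "DEL"],
    ["chr1", 5_000_000, 5_400_000, "DUP"],
]
checked = check_cnv_rows(example_cnvs)
```

A list in this shape is what the cnvs parameter describes; the same rows could equally be held in a NumPy array or a pandas dataframe.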

isv.annotate.annotate_cnv(chrom, start, end, gencode_genes, regulatory, hi_genes, hits_regions)

Annotate a candidate CNV

Parameters:
  • chrom – chromosome identifier, e.g. “chr1”

  • start – start position on the GRCh38 assembly

  • end – end position on the GRCh38 assembly

  • gencode_genes – preprocessed gencode genes dict

  • regulatory – preprocessed regulatory genes dict

  • hi_genes – preprocessed hi_genes dict

  • hits_regions – preprocessed dict of hi_regions and ts regions

Returns:

annotation

isv.annotate.get_el(arr, start: int, end: int)

Get overlapped elements

Parameters:
  • arr – array to query

  • start – start position

  • end – end position

Returns:

rows of array overlapped by start, end positions
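An overlap query of this kind can be sketched as follows. This is not the isv implementation: the (start, end, name) row layout, the closed-interval overlap rule and the function name overlapped_rows are all assumptions made for illustration.

```python
# Sketch of an interval-overlap query. The (start, end, name) row layout and
# the closed-interval overlap rule are assumptions, not taken from isv.
def overlapped_rows(rows, start, end):
    """Return the rows whose [row_start, row_end] interval touches [start, end]."""
    return [row for row in rows if row[0] <= end and row[1] >= start]


# Made-up intervals for illustration.
elements = [(100, 200, "A"), (150, 400, "B"), (500, 600, "C")]
hits = overlapped_rows(elements, 180, 450)
# hits contains the rows for "A" and "B"; "C" starts after the query ends.
```

Two intervals overlap exactly when each one starts before the other ends, which is why a single pair of comparisons is enough.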

isv.annotate.open_data(f)

Open preprocessed data

Parameters:

f – filepath

Returns:

python dictionary of numpy arrays

isv.config module

class isv.config.Settings

Bases: object

isv.isv module

class isv.isv.ISV(cnvs)

Bases: object

Annotate and Predict pathogenicity of CNVs

Parameters:

cnvs – a list, np.array or pandas dataframe with 4 columns representing chromosome (e.g., chr3), CNV start (GRCh38), CNV end (GRCh38) and cnv_type (DUP or DEL)

Returns:

ISV output as a pandas dataframe

predict(proba: bool = True, threshold: float = 0.95)

Generate ISV predictions

Parameters:
  • proba – whether probabilities should be calculated

  • threshold – probability threshold for classifying CNVs into three classes: Pathogenic (probability >= threshold), Uncertain significance (1 - threshold < probability < threshold) or Benign (probability <= 1 - threshold)

Returns:

dataframe with last column representing the ISV predictions
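The three-class rule described for threshold can be written out directly. The function below is a hypothetical illustration of that rule, not the isv code, and the string labels are taken from the class names in the parameter description; the actual encoding of the prediction column may differ.

```python
def classify_probability(probability, threshold=0.95):
    """Map a pathogenicity probability to one of the three documented classes."""
    if probability >= threshold:
        return "Pathogenic"
    if probability <= 1 - threshold:
        return "Benign"
    # Strictly between 1 - threshold and threshold.
    return "Uncertain significance"


labels = [classify_probability(p) for p in (0.99, 0.50, 0.01)]
```

With the default threshold of 0.95, only probabilities of at least 0.95 or at most 0.05 receive a definite label, so raising the threshold widens the uncertain band.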

shap(df: DataFrame | None = None)

Calculate SHAP values

Returns:

dataframe of shap values

waterfall(cnv_index: int, filepath: str = 'temp-plot.html', return_fig: bool = False, pathogenic_color: str = 'rgb(255, 0, 50)', benign_color: str = 'rgb(58, 130, 255)', text_position: str = 'outside', width: int = 800, height: int = 800)

Waterfall plot for CNV at specified index

Parameters:
  • cnv_index – index of the CNV to plot

  • filepath – Path where resulting html file will be saved

  • return_fig – whether raw figure should be returned

  • pathogenic_color – color of bars pushing predictions to pathogenic values

  • benign_color – color of bars pushing predictions to benign values

  • text_position – text position

  • width – figure width

  • height – figure height

Returns:

html plot
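The methods of the ISV class can be combined as in the sketch below. The wrapper function explain_cnvs is hypothetical; the constructor and the predict, shap and waterfall calls use only the signatures documented above. The import is deferred so the sketch can be defined without isv installed.

```python
def explain_cnvs(cnvs, cnv_index=0, plot_path="temp-plot.html"):
    """Hypothetical helper: predict, compute SHAP values and plot one CNV."""
    # Imported lazily so that defining this sketch does not require isv.
    from isv.isv import ISV

    model = ISV(cnvs)

    # Dataframe whose last column holds the ISV predictions.
    predictions = model.predict(proba=True, threshold=0.95)

    # Dataframe of SHAP values.
    shap_df = model.shap()

    # Writes an html waterfall plot for the CNV at cnv_index.
    model.waterfall(cnv_index, filepath=plot_path)

    return predictions, shap_df
```

Passing return_fig=True to waterfall is documented as returning the raw figure, which may be preferable when the plot is embedded rather than saved.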

isv.predict module

isv.predict.predict(annotated_cnvs: DataFrame, proba: bool = True, threshold: float = 0.95)

Predict a batch of CNVs that may have different CNV types

Parameters:
  • annotated_cnvs – Annotated CNVs

  • proba – whether probabilities should be calculated

  • threshold – probability threshold for classifying CNVs into three classes: Pathogenic (probability >= threshold), Uncertain significance (1 - threshold < probability < threshold) or Benign (probability <= 1 - threshold)

Returns:

predictions

isv.predict.predict_with_same_cnv_type(annotated_cnvs: DataFrame, cnv_type: str)

Return model predictions for a selected dataframe

Parameters:
  • annotated_cnvs – Raw counts of genomic elements

  • cnv_type – type of cnv

Returns:

yhat: predicted values

isv.shap_vals module

isv.shap_vals.shap_values(annotated_cnvs: DataFrame)

Calculate SHAP values

Parameters:

annotated_cnvs – annotated cnvs

Returns:

explainer object

isv.shap_vals.shap_values_with_same_cnv_type(annotated_cnvs: DataFrame, cnv_type: str, raw: bool = False)

Calculate SHAP values for CNVs with the same cnv type

Parameters:
  • annotated_cnvs – Raw counts of genomic elements

  • cnv_type – type of cnv

  • raw – whether raw shap explainer object should be returned

Returns:

explainer object

Module contents

isv.isv(cnvs, proba: bool = True, shap: bool = False, threshold: float = 0.95)

Predict pathogenicity of CNVs, and optionally calculate their SHAP values, with this simple wrapper function

Parameters:
  • cnvs – a list, np.array or pandas dataframe with 4 columns representing chromosome (e.g., chr3), CNV start (GRCh38), CNV end (GRCh38) and cnv_type (DUP or DEL)

  • proba – whether probabilities should be calculated

  • shap – whether SHAP values should be calculated

  • threshold – probability threshold for classifying CNVs into three classes: Pathogenic (probability >= threshold), Uncertain significance (1 - threshold < probability < threshold) or Benign (probability <= 1 - threshold)

Returns:

pandas dataframe of results
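A call to the wrapper might look like the sketch below. The helper score_cnvs is hypothetical, and the sketch assumes that the name isv on the package resolves to the wrapper function documented here rather than to the isv.isv submodule; the keyword arguments follow the documented signature.

```python
def score_cnvs(cnvs, threshold=0.95, with_shap=False):
    """Hypothetical helper around the documented isv.isv wrapper."""
    # Imported lazily so that defining this sketch does not require isv.
    # Assumption: the package-level name "isv" is the wrapper function.
    from isv import isv as isv_wrapper

    return isv_wrapper(cnvs, proba=True, shap=with_shap, threshold=threshold)
```

The wrapper covers the common case in one call; the ISV class is the better fit when predictions, SHAP values and plots are all needed for the same CNVs.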