API Reference

Submodules

seaborn_extensions.annotated_clustermap module

A replacement for seaborn.clustermap with additional features.

seaborn_extensions.annotated_clustermap.activate()
seaborn_extensions.annotated_clustermap.clustermap(*args, **kwargs)

Plot a matrix dataset as a hierarchically-clustered heatmap.

This function requires scipy to be available.

Parameters

data: 2D array-like

Rectangular data for clustering. Cannot contain NAs.

config: str, optional

EXTENSION! One of two pre-defined configurations: “abs”, “zscore”. These two configurations provide custom default keyword arguments compared with the native seaborn function and several adjustments to figure and axis sizes, labels and other objects. Options:

  • “abs”: good for non-negative data.

  • “zscore”: good for real-valued data whose variables have very different means.

Other keyword arguments affected (only if not provided):
  • {x,y}ticklabels: will turn off if more than 120 items in each axis.

  • dendrogram_ratio: will adjust, given relative shape of data.

method: str, optional

Linkage method to use for calculating clusters. See scipy.cluster.hierarchy.linkage() documentation for more information.

metric: str, optional

Distance metric to use for the data. See scipy.spatial.distance.pdist() documentation for more options. To use different metrics (or methods) for rows and columns, you may construct each linkage matrix yourself and provide them as {row,col}_linkage.

z_score: int or None, optional

Either 0 (rows) or 1 (columns). Whether or not to calculate z-scores for the rows or the columns. Z scores are: z = (x - mean)/std, so values in each row (column) will get the mean of the row (column) subtracted, then divided by the standard deviation of the row (column). This ensures that each row (column) has mean of 0 and variance of 1.
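The row-wise case of the formula above can be sketched in plain Python. This is only an illustration of z = (x - mean)/std, not the library's implementation; the helper name zscore_rows is hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Z-score each row: subtract the row mean, divide by the row std."""
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = stdev(row)  # assumption: sample standard deviation (n - 1)
        scaled.append([(x - mu) / sd for x in row])
    return scaled


data = [[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]]
z = zscore_rows(data)
```

After the transformation every row has mean 0 and unit (sample) variance, regardless of its original scale.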

standard_scale: int or None, optional

Either 0 (rows) or 1 (columns). Whether or not to standardize that dimension, meaning for each row or column, subtract the minimum and divide each by its maximum.
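A minimal sketch of this scaling for rows, in plain Python. It reads "its maximum" as the maximum after the minimum has been subtracted, which maps each row onto the interval [0, 1]; that reading and the helper name standard_scale_rows are assumptions.

```python
def standard_scale_rows(matrix):
    """Shift each row so its minimum is 0, then divide by the shifted maximum."""
    scaled = []
    for row in matrix:
        lowest = min(row)
        shifted = [x - lowest for x in row]
        highest = max(shifted)  # the row's range; zero for a constant row
        scaled.append([x / highest for x in shifted])
    return scaled


scaled = standard_scale_rows([[2.0, 4.0, 6.0]])
```

A constant row has a range of zero and would raise ZeroDivisionError in this sketch, so such rows need separate handling.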

figsize: tuple of (width, height), optional

Overall size of the figure.

cbar_kws: dict, optional

Keyword arguments to pass to cbar_kws in heatmap(), e.g. to add a label to the colorbar.

{row,col}_cluster: bool, optional

If True, cluster the {rows, columns}.

{row,col}_linkage: numpy.ndarray, optional

Precomputed linkage matrix for the rows or columns. See scipy.cluster.hierarchy.linkage() for specific formats.

{row,col}_colors: list-like or pandas DataFrame/Series, optional

EXTENSION! List of colors used to label either the rows or the columns. Useful to evaluate whether samples within a group are clustered together. Can use nested lists or a DataFrame for multiple color levels of labeling. If given as a DataFrame or Series, labels for the colors are extracted from the DataFrame's column names or from the name of the Series. DataFrame/Series colors are also matched to the data by their index, ensuring colors are drawn in the correct order.

TODO: complete defining new behaviours

{row,col}_colors_cmaps: Sequence[str]

EXTENSION! Colormaps to be used for the variables provided in {row,col}_colors.

mask: bool array or DataFrame, optional

If passed, data will not be shown in cells where mask is True. Cells with missing values are automatically masked. Only used for visualizing, not for calculating.

{dendrogram,colors}_ratio: float, or pair of floats, optional

Proportion of the figure size devoted to the two marginal elements. If a pair is given, they correspond to (row, col) ratios.

cbar_pos: tuple of (left, bottom, width, height), optional

Position of the colorbar axes in the figure. Setting to None will disable the colorbar.

tree_kws: dict, optional

Parameters for the matplotlib.collections.LineCollection that is used to plot the lines of the dendrogram tree.

pvalues: pandas DataFrame, optional

EXTENSION! A dataframe matching the input shape, where the values are p-values. Values 0.05 > p > 0.01 will be labeled with ‘*’. Values p < 0.01 will be labeled with ‘**’. Values p >= 0.05 will not be labeled. This will be overlaid as text on top of the heatmap. If providing pvalues, annot cannot be used.
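The labelling rule can be sketched as a small function. This is an illustration only, not the library's code: the name pvalue_label is hypothetical, and treating p == 0.01 as a single star is an assumption, since the description leaves that boundary open.

```python
def pvalue_label(p):
    """Return the text overlaid on a heatmap cell for p-value p."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"  # also covers p == 0.01 (assumption, boundary unspecified)
    return ""  # p >= 0.05 is left unlabeled


pvals = [[0.001, 0.03], [0.05, 0.2]]
labels = [[pvalue_label(p) for p in row] for row in pvals]
```

The resulting grid of strings has the same shape as the input, which is why it plays the role annot would otherwise play.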

square: bool, optional

EXTENSION! Try to make the shape of the figure as square as possible. If used, figsize will be ignored.

Returns

ClusterGrid

A ClusterGrid instance.

See Also

heatmap : Plot rectangular data as a color-encoded matrix.

Notes

The returned object has a savefig method that should be used if you want to save the figure object without clipping the dendrograms.

To access the reordered row indices, use: clustergrid.dendrogram_row.reordered_ind

Column indices, use: clustergrid.dendrogram_col.reordered_ind

Examples

seaborn_extensions.annotated_clustermap.colorbar_decorator(f: Callable) -> Callable

Decorate seaborn.clustermap so that numeric values passed to the row_colors and col_colors arguments are translated into row and column annotations, with additional colorbars for the respective values.

seaborn_extensions.annotated_clustermap.get_attribute_colors(y: DataFrame, attributes: Sequence[str], palettes: Mapping[str, Tuple[float]], cmaps: Mapping[str, str], as_dataframe: bool = False) -> ndarray | DataFrame
seaborn_extensions.annotated_clustermap.plot_attribute_heatmap(y: DataFrame, attributes: Sequence[str], palettes: Mapping[str, Tuple[float]], cmaps: Mapping[str, str], **kwargs) -> Figure

seaborn_extensions.rankplot module

seaborn_extensions.rankplot.rankplot(series: Series, annotate_text: bool | Sequence[str] = True, n_top: int = 10, diff_threshold: float | None = None, fig_kws: Dict | None = None, scatter_kws: Dict | None = None, ax_kws: Dict | None = None, ax: Axis | None = None) -> Figure

seaborn_extensions.swarmboxenplot module

A type of plot that combines swarms and box(en)/bar plots in an overlaid fashion.

seaborn_extensions.swarmboxenplot.swarmboxenplot(data: DataFrame, x: str, y: str | set | MutableSequence | Series | Index, hue: str | None = None, swarm: bool = True, boxen: bool = True, bar: bool = False, orient: str = 'vertical', plot: bool = True, ax: Axis | Sequence[Axis] | None = None, test: bool | str = 'mann-whitney', to_test: str = 'all', multiple_testing: bool | str = 'fdr_bh', test_upper_threshold: float = 0.05, test_lower_threshold: float = 0.01, plot_non_significant: bool = False, plot_kws: Dict[str, Any] | None = None, test_kws: Dict[str, Any] | None = None, fig_kws: Dict[str, Any] | None = None, tqdm: bool | Dict[str, Any] = True) -> Figure | DataFrame | Tuple[Figure, DataFrame] | None

A categorical plot that overlays individual observations as a swarm plot and summary statistics about them in a boxen plot.

In addition, this plot will test differences between observation groups and add lines representing a significant difference between them.

Parameters

data: pd.DataFrame

A dataframe with data where the rows are the observations and columns are the variables to group them by.

x: str

The categorical variable.

y: str | list[str]

The continuous variable to plot. If more than one is given, the ax argument is ignored and a figure with one subplot per y variable is returned.

hue: str, optional

An optional categorical variable to further group observations by.

swarm: bool

Whether to plot individual observations as a swarmplot.

boxen: bool

Whether to plot summary statistics as a boxenplot.

bar: bool

Whether to plot summary statistics as a barplot.

orient: str
Whether the plot should be oriented horizontally or vertically with relation to the numeric values y.
  • ‘vertical’: y-axis is y variable (numeric).

  • ‘horizontal’: x-axis is y variable (numeric).

Default is ‘vertical’.

ax: matplotlib.axes.Axes, optional

An optional axes to draw in.

test: bool | str

Whether to test differences between observation groups. If False, no dataframe of test results is returned. If a string is passed, the corresponding test is performed. Available tests:

  • ‘t-test’

  • ‘mann-whitney’

  • ‘kruskal’

Default is a pairwise ‘mann-whitney’ test with p-value adjustment.

to_test: str

Whether to test all possible combinations or just within hue groups for each x. Only relevant when hue is not None.

  • ‘all’: a model “y ~ x * hue”, i.e. test between x groups, and within hue for each x.

  • ‘hue’: a model “y ~ x | hue”, i.e. test within hue for each x.

multiple_testing: str

Method for multiple testing correction.
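The default value in the signature, 'fdr_bh', is the name commonly used (for example in statsmodels) for the Benjamini-Hochberg false discovery rate procedure. The sketch below shows what that adjustment computes; it is a standalone illustration with a hypothetical function name, not the routine the package calls.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    # Walk from the largest p-value down so that each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

For these four inputs the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02, so all remain below a 0.05 threshold.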

test_upper_threshold: float

Upper threshold to consider p-values significant. Will be marked with “*”.

test_lower_threshold: float

Secondary threshold to consider p-values highly significant. Will be marked with “**”.

plot_non_significant: bool

Whether to add a “n.s.” sign to p-values above test_upper_threshold.

plot_kws: dict

Additional values to pass to seaborn.boxenplot or seaborn.swarmplot

test_kws: dict

Additional values to pass to pingouin.pairwise_tests. The default is: dict(parametric=False) to run a non-parametric test.

tqdm: bool, dict

Whether to show a tqdm progress bar while running. If a dict is passed, it is presumably forwarded to tqdm as keyword arguments.

Returns

tuple[Figure, pandas.DataFrame]:

if ax is None and test is True.

pandas.DataFrame:

if ax is not None.

Figure:

if test is False.

None:

if test is False and ax is not None.
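Read together, the four cases depend on two things: whether an ax was supplied and whether testing is enabled. The sketch below encodes one consistent reading of them as a lookup function. The function name is hypothetical, it ignores the undocumented plot argument, and it is an interpretation of the text above rather than the library's logic.

```python
def describe_return(ax_given: bool, test_enabled: bool) -> str:
    """Name the documented return type for a combination of ax and test."""
    if test_enabled:
        # With testing on, a dataframe of test results is always returned;
        # the figure is included only when the function created it.
        return "DataFrame" if ax_given else "(Figure, DataFrame)"
    # With testing off there are no results to return.
    return "None" if ax_given else "Figure"


cases = {
    (ax_given, test_enabled): describe_return(ax_given, test_enabled)
    for ax_given in (False, True)
    for test_enabled in (False, True)
}
```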

Raises

ValueError:

If the x or hue column in data is not of Category, string or object type, or if y is not numeric.

seaborn_extensions.types module

seaborn_extensions.utils module

Utility functions used throughout the package.

seaborn_extensions.utils.close_plots(func: Callable) -> None

Decorator to close all plots on function exit.

seaborn_extensions.utils.filter_kwargs_by_callable(kwargs: Dict[str, Any], callabl: Callable, exclude: List[str] | None = None, allow_kwargs: bool = False) -> Dict[str, Any]

Filter a dictionary keeping only the keys which are part of a function signature.
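The idea can be sketched with the standard library's inspect module. This is an illustration under assumptions, not the package's implementation: the helper filter_kwargs and the example function draw are hypothetical, and the allow_kwargs option is not modeled because its behavior is not described here.

```python
import inspect


def filter_kwargs(kwargs, func, exclude=None):
    """Keep only the entries of kwargs that func accepts by name."""
    accepted = set(inspect.signature(func).parameters)
    excluded = set(exclude or [])
    return {
        key: value
        for key, value in kwargs.items()
        if key in accepted and key not in excluded
    }


def draw(x, y, color="black"):
    """Hypothetical plotting function used only for the demonstration."""
    return (x, y, color)


options = {"x": 1, "color": "red", "size": 3}
filtered = filter_kwargs(options, draw)
without_color = filter_kwargs(options, draw, exclude=["color"])
```

Here "size" is dropped because draw has no such parameter, which is useful when one dictionary of options is shared between several plotting calls.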

seaborn_extensions.utils.get_categorical_cmap(x: Series) -> ListedColormap

Choose a colormap for a categorical series encoded as ints.

seaborn_extensions.utils.get_grid_dims(dims: int | Collection, _nstart: int | None = None) -> Tuple[int, int]

Given a number of subplots dims, choose the dimensions of the plotting grid so that it is as square as possible and, when it cannot be square, has more columns than rows.
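One simple way to satisfy this description is to take the column count as the ceiling of the square root and derive the rows from it. The sketch below does that; the function name grid_dims is hypothetical, the (rows, columns) order of the result is an assumption, and the library may choose dimensions differently.

```python
import math


def grid_dims(n):
    """Return (rows, cols) for n >= 1 subplots, near-square, cols >= rows."""
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    return rows, cols


layouts = {n: grid_dims(n) for n in (1, 2, 5, 9, 12)}
```

For 5 subplots this gives a 2 by 3 grid, and for 12 subplots a 3 by 4 grid.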

seaborn_extensions.utils.get_n_colors(n: int, max_value: float = 1.0) -> ndarray

With modifications from https://stackoverflow.com/a/13781114/1469535

seaborn_extensions.utils.is_datetime(x: Series) -> bool
seaborn_extensions.utils.is_documented_by(original)
seaborn_extensions.utils.is_numeric(x: Series | Any) -> bool
seaborn_extensions.utils.log_pvalues(x, f: float = 0.1)

Calculate -log10(p-value) of array.

Replaces infinite values with:

max(x) + max(x) * f

that is, fraction f more than the maximum non-infinite -log10(p-value).

Parameters

x: pandas.Series

Series with numeric values.

f: float

Fraction to augment the maximum value by if x contains infinite values.

Defaults to 0.1.

Returns

pandas.Series

Transformed values.
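The transformation and the replacement rule can be sketched on a plain list. This illustrates the formula max(x) + max(x) * f from the description and is not the library's code: the function name is hypothetical, and mapping a p-value of 0 to infinity before the replacement is an assumption about how infinite values arise.

```python
import math


def neg_log10_pvalues(pvalues, f=0.1):
    """Compute -log10(p), replacing infinite results with max + max * f."""
    # A p-value of 0 has an infinite -log10 (assumption about the input).
    transformed = [-math.log10(p) if p > 0 else math.inf for p in pvalues]
    finite = [v for v in transformed if math.isfinite(v)]
    if not finite or len(finite) == len(transformed):
        return transformed  # nothing to replace, or nothing to scale from
    cap = max(finite) + max(finite) * f
    return [v if math.isfinite(v) else cap for v in transformed]


result = neg_log10_pvalues([0.1, 0.01, 0.0])
```

With the default f of 0.1, the zero p-value is placed 10% above the largest finite value, so it stays visible on a plot without distorting the axis.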

seaborn_extensions.utils.minmax_scale(x: ndarray) -> ndarray
seaborn_extensions.utils.minmax_scale(x: DataFrame) -> DataFrame
seaborn_extensions.utils.to_color_dataframe(x: Series | DataFrame, cmaps: str | Sequence[str] | None = None, offset: int = 0) -> DataFrame

Map a numeric pandas DataFrame to RGB values.

seaborn_extensions.utils.to_color_series(x: Series, cmap: str | None = None) -> Series

Map a numeric pandas Series to a Series of RGB values. NaN values are white.

seaborn_extensions.utils.to_numeric(x: Series) -> Series

Encode a string or categorical series to integer type.

seaborn_extensions.volcano module

seaborn_extensions.volcano.volcano_plot(stats: DataFrame, annotate_text: bool | Sequence[str] = True, diff_threshold: float | None = 0.05, n_top: int | None = None, invert_direction: bool = True, fig_kws: Dict | None = None, axes: Sequence[Axis] | None = None) -> Figure
Assumes stats dataframe from seaborn_extensions.swarmboxenplot:
  • “hedges/coefs” column with effect sizes

  • “p-unc/pvalues” column with significance

  • “p-cor” column with significance corrected for multiple testing (will be added if missing)

  • “Variable” column with variable names (will use dataframe index if missing)

If multiple tests are performed, each will be plotted in a subplot:
  • “A”, “B” columns with group identifiers, such that hedges is positive approximately when mean(A) > mean(B).

Module contents

Top-level package for seaborn_extensions.