API Reference¶
Submodules¶
seaborn_extensions.annotated_clustermap module¶
A replacement for seaborn.clustermap with additional features.
- seaborn_extensions.annotated_clustermap.clustermap(*args, **kwargs)[source]¶
Plot a matrix dataset as a hierarchically-clustered heatmap.
This function requires scipy to be available.
Parameters¶
- data: 2D array-like
Rectangular data for clustering. Cannot contain NAs.
- config: str, optional
EXTENSION! One of two pre-defined configurations: “abs”, “zscore”. These two configurations provide custom default keyword arguments compared with the native seaborn function and several adjustments to figure and axis sizes, labels and other objects. Options:
“abs”: good for non-negative data.
“zscore”: good for real data with variables with very different means.
- Other keyword arguments affected (only if not provided):
{x,y}ticklabels: will turn off if more than 120 items in each axis.
dendrogram_ratio: will adjust, given relative shape of data.
- method: str, optional
Linkage method to use for calculating clusters. See scipy.cluster.hierarchy.linkage() documentation for more information.
- metric: str, optional
Distance metric to use for the data. See scipy.spatial.distance.pdist() documentation for more options. To use different metrics (or methods) for rows and columns, you may construct each linkage matrix yourself and provide them as {row,col}_linkage.
- z_score: int or None, optional
Either 0 (rows) or 1 (columns). Whether or not to calculate z-scores for the rows or the columns. Z scores are: z = (x - mean)/std, so values in each row (column) will get the mean of the row (column) subtracted, then divided by the standard deviation of the row (column). This ensures that each row (column) has mean of 0 and variance of 1.
- standard_scale: int or None, optional
Either 0 (rows) or 1 (columns). Whether or not to standardize that dimension, meaning for each row or column, subtract the minimum and divide each by its maximum.
- figsize: tuple of (width, height), optional
Overall size of the figure.
- cbar_kws: dict, optional
Keyword arguments to pass to cbar_kws in heatmap(), e.g. to add a label to the colorbar.
- {row,col}_cluster: bool, optional
If True, cluster the {rows, columns}.
- {row,col}_linkage: numpy.ndarray, optional
Precomputed linkage matrix for the rows or columns. See scipy.cluster.hierarchy.linkage() for specific formats.
- {row,col}_colors: list-like or pandas DataFrame/Series, optional
EXTENSION! List of colors to label either the rows or the columns. Useful to evaluate whether samples within a group are clustered together. Nested lists or a DataFrame can be used for multiple color levels of labeling. If given as a DataFrame or Series, labels for the colors are extracted from the DataFrame's column names or from the name of the Series. DataFrame/Series colors are also matched to the data by their index, ensuring colors are drawn in the correct order.
TODO: complete defining new behaviours
- {row,col}_colors_cmaps: Sequence[str]
EXTENSION! Colormaps to be used for the variables provided in {row,col}_colors.
- mask: bool array or DataFrame, optional
If passed, data will not be shown in cells where mask is True. Cells with missing values are automatically masked. Only used for visualizing, not for calculating.
- {dendrogram,colors}_ratio: float, or pair of floats, optional
Proportion of the figure size devoted to the two marginal elements. If a pair is given, they correspond to (row, col) ratios.
- cbar_pos: tuple of (left, bottom, width, height), optional
Position of the colorbar axes in the figure. Setting to None will disable the colorbar.
- tree_kws: dict, optional
Parameters for the matplotlib.collections.LineCollection that is used to plot the lines of the dendrogram tree.
- pvalues: pandas DataFrame, optional
EXTENSION! A dataframe matching the input shape, where the values are p-values. Values 0.05 > p > 0.01 will be labeled with ‘*’. Values p < 0.01 will be labeled with ‘**’. Values p >= 0.05 will not be labeled. This will be overlaid as text on top of the heatmap. If providing pvalues, annot cannot be used.
- square: bool, optional
EXTENSION! Try to make the shape of the figure as square as possible. If used, figsize will be ignored.
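The star labels described for the pvalues parameter can be sketched in plain Python. This is only an illustration of the stated thresholds, not the package's implementation: the helper name pvalue_to_label is hypothetical, and the handling of p exactly equal to 0.01 (which the description leaves open) is an assumption.

```python
def pvalue_to_label(p: float) -> str:
    """Hypothetical helper: map a p-value to the star label described above."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        # Assumption: p exactly equal to 0.01 lands here; the text does not say.
        return "*"
    return ""


# Build the text overlay for a small matrix of p-values, row by row.
pvalues = [[0.001, 0.03], [0.2, 0.049]]
labels = [[pvalue_to_label(p) for p in row] for row in pvalues]
print(labels)
```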
Returns¶
ClusterGrid
A ClusterGrid instance.
See Also¶
heatmap : Plot rectangular data as a color-encoded matrix.
Notes¶
The returned object has a savefig method that should be used if you want to save the figure object without clipping the dendrograms.
To access the reordered row indices, use:
clustergrid.dendrogram_row.reordered_ind
For the column indices, use:
clustergrid.dendrogram_col.reordered_ind
Examples¶
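The two scaling options, z_score and standard_scale, can be illustrated on a single row without any plotting. The sketch below is not the package's code: the function names are hypothetical, the use of the sample standard deviation is an assumption, and standard_scale is read here as "subtract the minimum, then divide by the maximum of the shifted values".

```python
import statistics


def z_score_row(row):
    """z = (x - mean) / std for one row (sample standard deviation assumed)."""
    mean = statistics.mean(row)
    std = statistics.stdev(row)
    return [(x - mean) / std for x in row]


def standard_scale_row(row):
    """Subtract the row minimum, then divide by the maximum of the shifted row."""
    lowest = min(row)
    shifted = [x - lowest for x in row]
    highest = max(shifted)
    return [x / highest for x in shifted]


row = [2.0, 4.0, 6.0]
print(z_score_row(row))         # values centered on 0
print(standard_scale_row(row))  # values spanning 0 to 1
```

Applying either function to every row corresponds to passing 0 for that option; applying it to every column corresponds to passing 1.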
- seaborn_extensions.annotated_clustermap.colorbar_decorator(f: Callable) Callable [source]¶
Decorate seaborn.clustermap so that numeric values passed to the row_colors and col_colors arguments are translated into row and column annotations, with additional colorbars for the respective values.
seaborn_extensions.rankplot module¶
seaborn_extensions.swarmboxenplot module¶
A type of plot that combines swarms and box(en)/bar plots in an overlaid fashion.
- seaborn_extensions.swarmboxenplot.swarmboxenplot(data: DataFrame, x: str, y: str | set | MutableSequence | Series | Index, hue: str | None = None, swarm: bool = True, boxen: bool = True, bar: bool = False, orient: str = 'vertical', plot: bool = True, ax: Axis | Sequence[Axis] | None = None, test: bool | str = 'mann-whitney', to_test: str = 'all', multiple_testing: bool | str = 'fdr_bh', test_upper_threshold: float = 0.05, test_lower_threshold: float = 0.01, plot_non_significant: bool = False, plot_kws: Dict[str, Any] | None = None, test_kws: Dict[str, Any] | None = None, fig_kws: Dict[str, Any] | None = None, tqdm: bool | Dict[str, Any] = True) Figure | DataFrame | Tuple[Figure, DataFrame] | None [source]¶
A categorical plot that overlays individual observations as a swarm plot and summary statistics about them in a boxen plot.
In addition, this plot will test differences between observation groups and add lines representing a significant difference between them.
Parameters¶
- data: pd.DataFrame
A dataframe with data where the rows are the observations and columns are the variables to group them by.
- x: str
The categorical variable.
- y: str | list[str]
The continuous variable to plot. If more than one is given, the ax argument is ignored and a figure with one subplot per y variable is returned.
- hue: str, optional
An optional categorical variable to further group observations by.
- swarm: bool
Whether to plot individual observations as a swarmplot.
- boxen: bool
Whether to plot summary statistics as a boxenplot.
- bar: bool
Whether to plot summary statistics as a barplot.
- orient: str
Whether the plot should be oriented horizontally or vertically with relation to the numeric values y.
‘vertical’: y-axis is y variable (numeric).
‘horizontal’: x-axis is y variable (numeric).
Default is ‘vertical’.
- ax: matplotlib.axes.Axes, optional
An optional axes to draw in.
- test: bool | str
Whether to test differences between observation groups. If False, no dataframe is returned. If a string is passed, the corresponding test is performed. Available tests:
‘t-test’
‘mann-whitney’
‘kruskal’
Default is a pairwise ‘mann-whitney’ test with p-value adjustment.
- to_test: str
Whether to test all possible combinations or just within hue groups for each x. Only relevant when hue is not None.
‘all’: a model “y ~ x * hue”, i.e. test between x groups, and within hue for each x.
‘hue’: a model “y ~ x | hue”, i.e. test within hue for each x.
- multiple_testing: str
Method for multiple testing correction.
- test_upper_threshold: float
Upper threshold to consider p-values significant. Will be marked with “*”.
- test_lower_threshold: float
Secondary threshold to consider p-values highly significant. Will be marked with “**”.
- plot_non_significant: bool
Whether to add a “n.s.” sign to p-values above test_upper_threshold.
- plot_kws: dict
Additional values to pass to seaborn.boxenplot or seaborn.swarmplot
- test_kws: dict
Additional values to pass to pingouin.pairwise_tests. The default is: dict(parametric=False) to run a non-parametric test.
- tqdm: bool, dict
Returns¶
- tuple[Figure, pandas.DataFrame]:
if ax is None and test is True.
- pandas.DataFrame:
if ax is not None.
- Figure:
if test is False.
- None:
if test is False and ax is not None.
Raises¶
- ValueError:
If either the x or hue column in data is not of Category, string or object type, or if y is not numeric.
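The default multiple_testing value, 'fdr_bh', is the conventional name for the Benjamini-Hochberg false discovery rate procedure. The following is a minimal plain-Python sketch of that adjustment, offered only to show what the correction does to a set of p-values; the function name is hypothetical and this is not the code path the package itself uses.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]
print(benjamini_hochberg(raw))
```

Adjusted values like these are the ones that would then be compared against test_upper_threshold and test_lower_threshold.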
seaborn_extensions.types module¶
seaborn_extensions.utils module¶
Utility functions used throughout the package.
- seaborn_extensions.utils.close_plots(func: Callable) None [source]¶
Decorator to close all plots on function exit.
- seaborn_extensions.utils.filter_kwargs_by_callable(kwargs: Dict[str, Any], callabl: Callable, exclude: List[str] | None = None, allow_kwargs: bool = False) Dict[str, Any] [source]¶
Filter a dictionary keeping only the keys which are part of a function signature.
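The idea of filtering keyword arguments by a function signature can be sketched with the standard inspect module. This is a simplified illustration, not the package's implementation: filter_kwargs_sketch and draw are hypothetical names, and the allow_kwargs option from the signature above is not modelled because its semantics are not described here.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional


def filter_kwargs_sketch(
    kwargs: Dict[str, Any], func: Callable, exclude: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Keep only the entries of kwargs whose keys are parameters of func."""
    accepted = set(inspect.signature(func).parameters)
    excluded = set(exclude or [])
    return {k: v for k, v in kwargs.items() if k in accepted and k not in excluded}


def draw(color="black", linewidth=1.0):
    """Stand-in for a plotting function with a fixed signature."""
    return color, linewidth


options = {"color": "red", "linewidth": 2.0, "unknown": True}
print(filter_kwargs_sketch(options, draw))
```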
- seaborn_extensions.utils.get_categorical_cmap(x: Series) ListedColormap [source]¶
Choose a colormap for a categorical series encoded as ints.
- seaborn_extensions.utils.get_grid_dims(dims: int | Collection, _nstart: int | None = None) Tuple[int, int] [source]¶
Given a number of subplots (dims), choose the x/y dimensions of the plotting grid so that it is as square as possible and, when it cannot be square, has more columns than rows.
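One way to realise such a grid choice is to take the ceiling of the square root as the number of columns and derive the rows from it. The sketch below is an assumption about how this could be done, not the package's code: grid_dims_sketch is a hypothetical name, it accepts only an integer, and returning (rows, cols) in that order is a choice made for this example.

```python
import math


def grid_dims_sketch(n: int) -> tuple:
    """Pick (rows, cols) for n subplots: near-square, never more rows than columns."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    return rows, cols


for n in (1, 4, 5, 7, 12):
    print(n, grid_dims_sketch(n))
```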
- seaborn_extensions.utils.get_n_colors(n: int, max_value: float = 1.0) ndarray [source]¶
With modifications from https://stackoverflow.com/a/13781114/1469535
- seaborn_extensions.utils.log_pvalues(x, f: float = 0.1)[source]¶
Calculate -log10(p-value) of array.
Replaces infinite values with max(x) + max(x) * f, that is, fraction f more than the maximum non-infinite -log10(p-value).
Parameters¶
- x: pandas.Series
Series with numeric values.
- f: float
Fraction to augment the maximum value by if x contains infinite values. Defaults to 0.1.
Returns¶
pandas.Series
Transformed values.
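The replacement rule for infinite values can be checked by hand with a small plain-Python sketch. It works on a list rather than a pandas.Series, the name log_pvalues_sketch is hypothetical, and it assumes that at least one p-value is greater than zero so that a finite maximum exists.

```python
import math


def log_pvalues_sketch(pvalues, f=0.1):
    """-log10 of each p-value, with infinite results capped as described above."""
    # A p-value of 0 would give an infinite -log10 value.
    logged = [-math.log10(p) if p > 0 else math.inf for p in pvalues]
    finite_max = max(v for v in logged if math.isfinite(v))
    cap = finite_max + finite_max * f
    return [cap if math.isinf(v) else v for v in logged]


print(log_pvalues_sketch([0.1, 0.001, 0.0]))
```

With the default f of 0.1, a p-value of 0 ends up 10% above the largest finite transformed value, which keeps it visible on a plot without distorting the axis.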
- seaborn_extensions.utils.minmax_scale(x: ndarray) ndarray [source]¶
- seaborn_extensions.utils.minmax_scale(x: DataFrame) DataFrame
- seaborn_extensions.utils.to_color_dataframe(x: Series | DataFrame, cmaps: str | Sequence[str] | None = None, offset: int = 0) DataFrame [source]¶
Map a numeric pandas DataFrame to RGB values.
seaborn_extensions.volcano module¶
- seaborn_extensions.volcano.volcano_plot(stats: DataFrame, annotate_text: bool | Sequence[str] = True, diff_threshold: float | None = 0.05, n_top: int | None = None, invert_direction: bool = True, fig_kws: Dict | None = None, axes: Sequence[Axis] | None = None) Figure [source]¶
Assumes a stats dataframe from seaborn_extensions.swarmboxenplot with:
“hedges/coefs” column with effect sizes
“p-unc/pvalues” column with significance
“p-cor” column with significance corrected for multiple testing (will be added if missing)
“Variable” column with variable names (will use dataframe index if missing)
If multiple tests are performed, each will be plotted in a subplot:
“A”, “B” group identifiers such that hedges is positive when mean(A) > mean(B).
Module contents¶
Top-level package for seaborn_extensions.