API Documentation

class PyWGCNA.geneExp.GeneExp(species=None, level='gene', anndata=None, geneExp=None, geneExpPath=None, sep=',', geneInfo=None, sampleInfo=None)[source]

A class used to create a gene expression anndata object together with trait data, including both gene and sample information.

Parameters:
  • species (str) – species of the data you use, e.g. mouse, human

  • level (str) – type of data you use, either gene or transcript (default: gene)

  • anndata (anndata) – if the expression data is in anndata format, pass it through this parameter. X should be the expression matrix, var the gene information, and obs the sample information.

  • geneExp (pandas dataframe) – expression matrix in which genes are in the columns and samples are in the rows

  • geneExpPath (str) – path of expression matrix

  • sep (str) – separation symbol to use for reading data in geneExpPath properly

  • geneInfo (pandas dataframe) – dataframe that contains gene information; it should have the same index as the gene expression column names (gene/transcript ID)

  • sampleInfo (pandas dataframe) – dataframe that contains sample information; it should have the same index as the gene expression index (sample ID)

static updateGeneInfo(geneExpr, geneInfo=None, path=None, sep=',')[source]

add/update gene information in the expression anndata

Parameters:
  • geneExpr (anndata) – gene expression data along with sample and genes/transcript information

  • geneInfo (pandas dataframe) – gene information table you want to add to your data

  • path (str) – path of geneInfo

  • sep (str) – separation symbol to use for reading data in path properly (default: ‘,’)

Returns:

updated gene expression data along with sample and genes/transcript information

Return type:

anndata

static updateSampleInfo(geneExpr, sampleInfo=None, path=None, sep=',')[source]

add/update sample metadata in the expression anndata

Parameters:
  • geneExpr (anndata) – gene expression data along with sample and genes/transcript information

  • sampleInfo (pandas dataframe) – Sample information table you want to add to your data

  • path (str) – path of sampleInfo

  • sep (str) – separation symbol to use for reading data in path properly (default: ‘,’)

Returns:

updated gene expression data along with sample and genes/transcript information

Return type:

anndata

class PyWGCNA.wgcna.WGCNA(name='WGCNA', TPMcutoff=1, powers=None, RsquaredCut=0.9, MeanCut=100, networkType='signed hybrid', TOMType='signed', minModuleSize=50, naColor='grey', cut=inf, MEDissThres=0.2, species=None, level='gene', anndata=None, geneExp=None, geneExpPath=None, sep=',', geneInfo=None, sampleInfo=None, save=False, outputPath=None, figureType='pdf')[source]

A class used to do weighted gene co-expression network analysis.

Parameters:
  • name (str) – name of the WGCNA object, used when visualizing data (default: ‘WGCNA’)

  • save (bool) – indicate if you want to save the results of important steps in a figures directory (default: False)

  • species (str) – species of the data you use, e.g. mouse, human

  • level (str) – type of data you use, either gene or transcript (default: gene)

  • outputPath (str) – path where you want to save all your figures and objects (default: ‘’, where you run your script)

  • anndata (anndata) – if the expression data is in anndata format you should pass it through this parameter. X should be expression matrix. var is a gene information and obs is a sample information.

  • geneExp (pandas dataframe) – expression matrix in which genes are in the columns and samples are in the rows

  • geneExpPath (str) – path of expression matrix

  • sep (str) – separation symbol to use for reading data in geneExpPath properly

  • geneInfo (pandas dataframe) – dataframe that contains gene information; it should have the same index as the gene expression column names (gene/transcript ID)

  • sampleInfo (pandas dataframe) – dataframe that contains sample information; it should have the same index as the gene expression index (sample ID)

  • TPMcutoff (int) – cut-off for removing genes whose expression is below this number across samples

  • cut (float) – threshold for removing outlier samples by hierarchical clustering (default: ‘inf’). By default, no samples are removed.

  • powers (list of int) – different powers to test to have scale free network (default: [1:10, 11:21:2])

  • RsquaredCut (float) – R-squared cut-off used to choose the power for a scale-free network; between 0 and 1 (default: 0.9)

  • MeanCut (int) – mean connectivity to choose power for having scale free network (default: 100)

  • networkType (str) – Type of network we can create including “unsigned”, “signed” and “signed hybrid” (default: “signed hybrid”)

  • TOMType (str) – Type of topological overlap matrix(TOM) including “unsigned”, “signed” (default: “signed”)

  • minModuleSize (int) – minimum module size; set relatively high because large modules are preferred (default: 50)

  • naColor (str) – color used for genes that are not assigned to any cluster (default: “grey”)

  • MEDissThres (float) – module eigengene dissimilarity threshold (default: 0.2)

  • figureType (str) – extension of figure (default: “pdf”)

  • MEs (ndarray) – eigengenes

  • geneExpr (geneExp class) – gene expression object that contains raw gene expression along with gene and sample information.

  • datExpr (anndata) – expression data after preprocessing

  • dynamicMods (list) – name of modules by clustering similar genes together

  • TOM (ndarray) – topological overlap measure using average linkage hierarchical clustering which inputs a measure of interconnectedness

  • adjacency (ndarray) – adjacency matrix calculated based on the type of network

  • geneTree (ndarray) – average hierarchical clustering of dissTOM matrix

  • power (int) – power to have scale free network (default: 6)

  • sft (pandas dataframe) – soft threshold table which has information for each power

  • datME (pandas dataframe)

  • signedKME (pandas dataframe) – (signed) eigengene-based connectivity (module membership)

  • moduleTraitCor (pandas dataframe) – correlation between each module and metadata

  • moduleTraitPvalue (pandas dataframe) – p-value of correlation between each module and metadata
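The interplay of powers, RsquaredCut and MeanCut can be sketched as follows. This is only an illustration: the helper names are hypothetical, `range(1, 11)` plus `range(11, 21, 2)` is one reading of the documented default `[1:10, 11:21:2]`, and the rule "smallest power that meets both cut-offs" is an assumption rather than something stated above.

```python
# Hypothetical helpers; the selection rule is an assumption, not PyWGCNA code.

def default_powers():
    """One reading of the documented default [1:10, 11:21:2]."""
    return list(range(1, 11)) + list(range(11, 21, 2))


def pick_power(sft_rows, rsquared_cut=0.9, mean_cut=100):
    """Return the smallest power meeting both cut-offs, or None.

    sft_rows: iterable of (power, r_squared, mean_connectivity) tuples,
    standing in for the rows of the `sft` table.
    """
    candidates = [
        power
        for power, r_squared, mean_k in sft_rows
        if r_squared >= rsquared_cut and mean_k <= mean_cut
    ]
    return min(candidates) if candidates else None


# Made-up soft-threshold table for illustration only.
example_sft = [
    (4, 0.71, 310.0),
    (6, 0.88, 140.0),
    (8, 0.92, 72.0),
    (10, 0.95, 41.0),
]
print(default_powers())
print(pick_power(example_sft))
```

If no tested power satisfies both cut-offs, the sketch returns None and leaves the fallback to the caller.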

CalculateSignedKME(exprWeights=None, MEWeights=None)[source]

Calculation of (signed) eigengene-based connectivity, also known as module membership.

Parameters:
  • exprWeights (pandas dataframe) – optional weight matrix of observation weights for datExpr, of the same dimensions as datExpr

  • MEWeights (pandas dataframe) – optional weight matrix of observation weights for datME, of the same dimensions as datME

Returns:

A data frame in which rows correspond to input genes and columns to module eigengenes, giving the signed eigengene-based connectivity of each gene with respect to each eigengene.

Return type:

pandas dataframe
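As a rough illustration, signed eigengene-based connectivity is the Pearson correlation between each gene's expression profile and each module eigengene. The sketch below uses plain dictionaries instead of dataframes and ignores the optional weights; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def signed_kme(expr, eigengenes):
    """Rows are genes, columns are module eigengenes (as nested dicts).

    expr: dict mapping gene ID -> expression values across samples
    eigengenes: dict mapping eigengene name -> values across the same samples
    """
    return {
        gene: {name: pearson(values, me) for name, me in eigengenes.items()}
        for gene, values in expr.items()
    }


# Toy data: geneA follows the eigengene, geneB runs against it.
expr = {"geneA": [1.0, 2.0, 3.0, 4.0], "geneB": [4.0, 3.0, 2.0, 1.0]}
eigengenes = {"MEblue": [0.1, 0.2, 0.3, 0.4]}
print(signed_kme(expr, eigengenes))
```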

CoexpressionModulePlot(modules, numGenes=10, numConnections=100, minTOM=0, filters=None, file_name=None)[source]

plot the co-expression network for the given module(s)

Parameters:
  • modules (list of str) – names of the modules you want to plot

  • numGenes (int) – number of genes you want to show for each module

  • numConnections (int) – number of connections you want to show for each module

  • minTOM (float) – minimum TOM to keep connections

  • filters (dict) – dictionary whose keys are column names of datExpr.var to filter the genes on and whose values determine which rows to keep

  • file_name (str) – name of the html output file (default: the module names, or network if there are more than 3 input modules)

Returns:

saves an html file named after the modules in the figures directory

PPI_network(species, moduleName=None, geneList=None, output_format='image')[source]

retrieve an image of a STRING network of a neighborhood surrounding one or more proteins or ask STRING to show only the network of interactions between your input proteins.

Parameters:
  • species (int) – NCBI taxon identifiers (e.g. Human is 9606, see: https://string-db.org/cgi/input.pl?input_page_active_form=organisms).

  • moduleName (str) – name of module you want to find PPI

  • geneList (list) – list of genes you want to find PPI

  • output_format (str) – format of output which can be “image”, “highres_image”, “svg” (default: “image”)

Returns:

dataframe containing genes along with their interactions and scores

Return type:

pandas dataframe

static TOMsimilarity(adjMat, TOMType='signed', TOMDenom='min')[source]

Calculation of the topological overlap matrix, and the corresponding dissimilarity, from a given adjacency matrix

Parameters:
  • adjMat (pandas dataframe) – adjacency matrix, that is a square, symmetric matrix with entries between 0 and 1 (negative values are allowed if TOMType==”signed”).

  • TOMType (str) – one of “unsigned”, “signed”

  • TOMDenom (str) – a character string specifying the TOM variant to be used. Recognized values are “min” giving the standard TOM described in Zhang and Horvath (2005), and “mean” in which the min function in the denominator is replaced by mean. The “mean” may produce better results but at this time should be considered experimental.

Returns:

A matrix holding the topological overlap.

Return type:

pandas dataframe
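A minimal sketch of the unsigned case, assuming the standard formula TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij) with k_i the connectivity of node i, and with min replaced by the mean for TOMDenom “mean”. It uses nested lists rather than a dataframe, does not cover the signed variant, and the function name is hypothetical.

```python
def tom_similarity(adj, tom_denom="min"):
    """Unsigned topological overlap for a square, symmetric adjacency matrix."""
    n = len(adj)
    # Zero the diagonal so self-connections do not count as shared neighbours.
    a = [[0.0 if i == j else float(adj[i][j]) for j in range(n)] for i in range(n)]
    k = [sum(row) for row in a]  # connectivity of each node
    tom = [[1.0] * n for _ in range(n)]  # diagonal stays at 1
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(a[i][u] * a[u][j] for u in range(n))
            if tom_denom == "min":
                base = min(k[i], k[j])
            elif tom_denom == "mean":
                base = (k[i] + k[j]) / 2
            else:
                raise ValueError("tom_denom must be 'min' or 'mean'")
            value = (shared + a[i][j]) / (base + 1 - a[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


adjacency = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
tom = tom_similarity(adjacency)
# The corresponding dissimilarity is 1 - TOM, element by element.
diss_tom = [[1 - value for value in row] for row in tom]
print(tom)
```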

static adjacency(datExpr, selectCols=None, adjacencyType='unsigned', power=6, corOptions=pandas.DataFrame(), weights=None, weightArgNames=None)[source]

Calculates (correlation or distance) network adjacency from given expression data or from a similarity

Parameters:
  • datExpr (pandas dataframe) – data frame containing expression data. Columns correspond to genes and rows to samples.

  • selectCols (list) – for correlation networks only; can be used to select genes whose adjacencies will be calculated. Should be either a numeric list giving the indices of the genes to be used, or a boolean list indicating which genes are to be used.

  • adjacencyType (str) – adjacency network type. Allowed values are (unique abbreviations of) “unsigned”, “signed”, “signed hybrid”. (default = unsigned)

  • power (int) – soft thresholding power.

  • corOptions (pandas dataframe) – specifying additional arguments to be passed to the function given by corFnc.

  • weights (pandas dataframe) – optional observation weights for datExpr to be used in correlation calculation. A matrix of the same dimensions as datExpr, containing non-negative weights. Only used with Pearson correlation.

  • weightArgNames (list) – character list of length 2 giving the names of the arguments to corFnc that represent weights for variable x and y. Only used if weights are non-NULL.

Returns:

Adjacency matrix

Return type:

pandas dataframe
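The three adjacency types differ only in how a correlation value is turned into an adjacency. The sketch below applies the usual WGCNA transforms to a single correlation; the function name is hypothetical and the code is not taken from PyWGCNA.

```python
def adjacency_from_correlation(cor, adjacency_type="unsigned", power=6):
    """Turn one correlation value into a soft-thresholded adjacency."""
    if adjacency_type == "unsigned":
        # Strong negative and strong positive correlations both connect.
        return abs(cor) ** power
    if adjacency_type == "signed":
        # Rescale [-1, 1] to [0, 1], so negative correlations end up near 0.
        return ((1 + cor) / 2) ** power
    if adjacency_type == "signed hybrid":
        # Negative correlations are cut to exactly 0.
        return cor ** power if cor > 0 else 0.0
    raise ValueError("unknown adjacency type: " + adjacency_type)


for kind in ("unsigned", "signed", "signed hybrid"):
    print(kind, adjacency_from_correlation(-0.5, kind, power=2))
```

Applied to every pair of genes, this yields the square, symmetric matrix described under Returns.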

analyseWGCNA(order=None, geneList=None, show=True, alternative='two-sided')[source]

Analyse results: 1. calculate module-trait relationships 2. plot module eigengene heatmaps 3. find GO terms for each module

Parameters:
  • order (list) – indicate in which order metadata will show up in plots (should be the same as the metadata names in anndata)

  • geneList (pandas dataframe) – gene information you want to add (keep in mind you cannot have multiple rows for the same gene)

  • show (bool) – indicate if you want to see the plots when you run your code

  • alternative (str) – Defines the alternative hypothesis for calculating correlation for module-trait relationship. Default is ‘two-sided’. The following options are available: ‘two-sided’: the correlation is nonzero, ‘less’: the correlation is negative (less than zero), ‘greater’: the correlation is positive (greater than zero)

barplotModuleEigenGene(moduleName, metadata, combine=True, colorBar=None, show=True)[source]

bar plot of the module eigengene for the given module

Parameters:
  • moduleName (str) – module name

  • metadata (list) – list of metadata you want to be plotted

  • combine (bool) – indicate if you want to combine all metadata to show them together

  • show (bool) – indicate if you want to see the plots when you run your code

  • colorBar – metadata you want to use to color the bar plot

static calBlockSize(matrixSize, rectangularBlocks=True, maxMemoryAllocation=None, overheadFactor=3)[source]

find suitable block size for calculating soft power threshold

static checkAdjMat(adjMat, min=0, max=1)[source]

check adjacency matrix format is correct

Parameters:
  • adjMat (pandas dataframe) – data we want to be checked

  • min (int) – minimum value to be allowed for data (default = 0)

  • max (int) – maximum value to be allowed for data (default = 1)

Raises:

exit – if format is not correct
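The kind of checks involved can be sketched as below. Which conditions the real method tests is an assumption here (square shape, value range, symmetry), the function name is hypothetical, and the sketch returns a message instead of exiting.

```python
def find_adj_mat_problem(adj, min_value=0, max_value=1, tol=1e-12):
    """Return a description of the first problem found, or None if all is fine."""
    n = len(adj)
    if any(len(row) != n for row in adj):
        return "matrix is not square"
    for i in range(n):
        for j in range(n):
            value = adj[i][j]
            if value < min_value or value > max_value:
                return "entry ({}, {}) is outside [{}, {}]".format(i, j, min_value, max_value)
            if abs(value - adj[j][i]) > tol:
                return "matrix is not symmetric at ({}, {})".format(i, j)
    return None


print(find_adj_mat_problem([[1.0, 0.5], [0.5, 1.0]]))
print(find_adj_mat_problem([[1.0, 1.5], [1.5, 1.0]]))
```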

static checkAndScaleWeights(weights, expr, scaleByMax=True)[source]

check and scale weights of gene expression

Parameters:
  • weights (pandas dataframe) – weights of gene expression

  • expr (pandas dataframe) – gene expression matrix

  • scaleByMax (bool) – if you want to scale your weights by dividing by the maximum

Returns:

processed weights of gene expression

Return type:

pandas dataframe

static checkSets(data, checkStructure=False, useSets=None)[source]

Checks whether given sets have the correct format and retrieves dimensions.

Parameters:
  • data (dict) – A dict of lists; in each list there must be a component named data whose content is a matrix or dataframe or array of dimension 2.

  • checkStructure (bool) – If FALSE, incorrect structure of data will trigger an error. If TRUE, an appropriate flag (see output) will be set to indicate whether data has correct structure. (default = False)

  • useSets (list) – Optional specification of entries of the list data that are to be checked. Defaults to all components. This may be useful when data only contains information for some of the sets.

Returns:

a dictionary containing:

  • “nSets” – Number of sets (length of the vector data).

  • “nGenes” – Number of columns in the data components in the lists. This number must be the same for all sets.

  • “nSamples” – A vector of length nSets giving the number of rows in the data components.

  • “structureOK” – Only set if the argument checkStructure equals TRUE. The value is TRUE if the parameter data passes a few tests of its structure, and FALSE otherwise. The tests are not exhaustive and are meant to catch obvious user errors rather than be bulletproof.

Return type:

dict

static checkSimilarity(adjMat, min=-1, max=1)[source]

check similarity matrix format is correct

Parameters:
  • adjMat (pandas dataframe) – data we want to be checked

  • min (int) – minimum value to be allowed for data (default = -1)

  • max (int) – maximum value to be allowed for data (default = 1)

Raises:

exit – if format is not correct

static consensusMEDissimilarityMajor(MEs, useAbs=False, useSets=None, method='consensus')[source]

Calculates consensus dissimilarity (1-cor) of given module eigengenes realized in several sets.

static consensusOrderMEs(MEs, useAbs=False, useSets=None, greyLast=True, greyName='MEgrey', method='consensus')[source]

Reorder given (eigen-)vectors such that similar ones (as measured by correlation) are next to each other.

Parameters:
  • MEs (dict) – Module eigengenes of several sets in a multi-set format

  • useAbs (bool) – Controls whether vector similarity should be given by absolute value of correlation or plain correlation. (default = False)

  • useSets – Allows the user to specify for which sets the eigengene ordering is to be performed.

  • greyLast (bool) – Normally the color grey is reserved for unassigned genes; hence the grey module is not a proper module and it is conventional to put it last. If this is not desired, set the parameter to FALSE. (default = True)

  • greyName (str) – Name of the grey module eigengene. (default = “MEgrey”)

  • method (str) – A character string giving the method to be used calculating the consensus dissimilarity. Allowed values are (abbreviations of) “consensus” and “majority”. The consensus dissimilarity is calculated as the maximum of given set dissimilarities for “consensus” and as the average for “majority”.

Returns:

A dictionary of the same type as MEs containing the re-ordered eigengenes

Return type:

dict
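The two values of method can be illustrated with a small sketch that combines per-set dissimilarity matrices element by element: the maximum for “consensus” and the average for “majority”. The function name is hypothetical and nested lists stand in for the real data structures.

```python
def consensus_dissimilarity(set_dissimilarities, method="consensus"):
    """Combine one (1 - cor) matrix per set into a single matrix."""
    n_sets = len(set_dissimilarities)
    n = len(set_dissimilarities[0])
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            values = [diss[i][j] for diss in set_dissimilarities]
            if method == "consensus":
                result[i][j] = max(values)
            elif method == "majority":
                result[i][j] = sum(values) / n_sets
            else:
                raise ValueError("method must be 'consensus' or 'majority'")
    return result


# Two sets, two eigengenes each; off-diagonal entries are 1 - correlation.
set_1 = [[0.0, 0.2], [0.2, 0.0]]
set_2 = [[0.0, 0.6], [0.6, 0.0]]
print(consensus_dissimilarity([set_1, set_2], "consensus"))
print(consensus_dissimilarity([set_1, set_2], "majority"))
```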

static cutree(sampleTree, cutHeight=50000.0)[source]

Given a linkage matrix Z, return the cut tree. Used to remove samples/genes/modules based on hierarchical clustering.

Parameters:
  • sampleTree (scipy.cluster.linkage array) – The linkage matrix.

  • cutHeight (array_like) – An optional height at which to cut the tree (default = 50000)

Returns:

An array indicating group membership at each agglomeration step. I.e., for a full cut tree, in the first column each data point is in its own cluster. At the next step, two nodes are merged. Finally, all singleton and non-singleton clusters are in one group. If n_clusters or height are given, the columns correspond to the columns of n_clusters or height.

Return type:

array

static cutreeHybrid(dendro, distM, cutHeight=None, minClusterSize=20, deepSplit=1, maxCoreScatter=None, minGap=None, maxAbsCoreScatter=None, minAbsGap=None, minSplitHeight=None, minAbsSplitHeight=None, externalBranchSplitFnc=None, nExternalSplits=0, minExternalSplit=None, externalSplitOptions=pandas.DataFrame(), externalSplitFncNeedsDistance=None, assumeSimpleExternalSpecification=True, pamStage=True, pamRespectsDendro=True, useMedoids=False, maxPamDist=None, respectSmallClusters=True)[source]

Detect clusters in a dendrogram produced by the function hclust.

Parameters:
  • dendro (ndarray) – a hierarchical clustering dendrogram such as one returned by hclust.

  • distM (pandas dataframe) – Distance matrix that was used as input to hclust.

  • cutHeight (int) – Maximum joining heights that will be considered. It defaults to 99% of the range between the 5th percentile and the maximum of the joining heights on the dendrogram.

  • minClusterSize (int) – Minimum cluster size. (default = 20)

  • deepSplit (int or bool) – Either logical or integer in the range 0 to 4. Provides a rough control over sensitivity to cluster splitting. The higher the value, the more and smaller clusters will be produced. (default = 1)

  • maxCoreScatter (int) – Maximum scatter of the core for a branch to be a cluster, given as the fraction of cutHeight relative to the 5th percentile of joining heights.

  • minGap (int) – Minimum cluster gap given as the fraction of the difference between cutHeight and the 5th percentile of joining heights.

  • maxAbsCoreScatter (int) – Maximum scatter of the core for a branch to be a cluster given as absolute heights. If given, overrides maxCoreScatter.

  • minAbsGap (int) – Minimum cluster gap given as absolute height difference. If given, overrides minGap.

  • minSplitHeight (int) – Minimum split height given as the fraction of the difference between cutHeight and the 5th percentile of joining heights. Branches merging below this height will automatically be merged. Defaults to zero but is used only if minAbsSplitH

  • minAbsSplitHeight (int) – Minimum split height given as an absolute height. Branches merging below this height will automatically be merged. If not given (default), will be determined from minSplitHeight above.

  • externalBranchSplitFnc – Optional function to evaluate split (dissimilarity) between two branches. Either a single function or a list in which each component is a function.

  • minExternalSplit (list) – Thresholds to decide whether two branches should be merged. It should be a numeric list of the same length as the number of functions in externalBranchSplitFnc above.

  • externalSplitOptions (pandas dataframe) – Further arguments to function externalBranchSplitFnc. If only one external function is specified in externalBranchSplitFnc above, externalSplitOptions can be a named list of arguments or a list with one component.

  • externalSplitFncNeedsDistance (pandas dataframe) – Optional specification of whether the external branch split functions need the distance matrix as one of their arguments. Either NULL or a logical list with one element per branch

  • assumeSimpleExternalSpecification (bool) – when minExternalSplit above is a scalar (has length 1), should the function assume a simple specification of externalBranchSplitFnc and externalSplitOptions. (default = True)

  • pamStage (bool) – If TRUE, the second (PAM-like) stage will be performed. (default = True)

  • pamRespectsDendro (bool) – If TRUE, the PAM stage will respect the dendrogram in the sense an object can be PAM-assigned only to clusters that lie below it on the branch that the object is merged into. (default = True)

  • useMedoids (bool) – if TRUE, the second stage will use object to medoid distance; if FALSE, it will use average object to cluster distance. (default = False)

  • maxPamDist (float) – Maximum object distance to closest cluster that will result in the object assigned to that cluster. Defaults to cutHeight.

  • respectSmallClusters (bool) – If TRUE, branches that failed to be clusters in stage 1 only because of insufficient size will be assigned together in stage 2. If FALSE, all objects will be assigned individually. (default = True)

Returns:

list detailing the detected branch structure.

Return type:

list
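The default cutHeight described above can be sketched as follows. The function name is hypothetical, and reading the 5th percentile off the sorted joining heights by rank is an assumption about how the percentile is taken.

```python
def default_cut_height(merge_heights, ref_quantile=0.05, fraction=0.99):
    """99% of the range between the 5th percentile and the maximum height."""
    heights = sorted(merge_heights)
    # Assumption: the reference height is picked by rank in the sorted heights.
    ref_index = max(round(len(heights) * ref_quantile), 1) - 1
    ref_height = heights[ref_index]
    return ref_height + fraction * (heights[-1] - ref_height)


# Made-up joining heights 1.0, 2.0, ..., 20.0.
heights = [float(h) for h in range(1, 21)]
print(default_cut_height(heights))
```

Merges above the returned height are not considered when clusters are detected.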

findModules(kwargs_function={'cutreeHybrid': {'deepSplit': 2, 'pamRespectsDendro': False}})[source]

Cluster genes through the original WGCNA pipeline: 1. pick soft threshold 2. calculate adjacency matrix 3. calculate TOM similarity matrix 4. cluster genes based on dissTOM 5. merge similar clusters dynamically

Parameters:

kwargs_function (dict) – dictionary whose keys are function names and whose values are dictionaries of the parameters you want to change within that function
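The nested shape of kwargs_function can be illustrated with a stand-in function. Everything except the dictionary itself is hypothetical: `cutree_hybrid_stub` only echoes its settings and `run_with_overrides` is one plausible way such overrides could be forwarded, not the library's implementation.

```python
def cutree_hybrid_stub(deepSplit=1, pamRespectsDendro=True, minClusterSize=20):
    """Stand-in that only reports the settings it received."""
    return {
        "deepSplit": deepSplit,
        "pamRespectsDendro": pamRespectsDendro,
        "minClusterSize": minClusterSize,
    }


def run_with_overrides(functions, kwargs_function):
    """Call each function with the overrides registered under its name."""
    return {
        name: func(**kwargs_function.get(name, {}))
        for name, func in functions.items()
    }


# Outer key: function name. Inner dict: parameters to change for that function.
kwargs_function = {"cutreeHybrid": {"deepSplit": 2, "pamRespectsDendro": False}}
settings = run_with_overrides({"cutreeHybrid": cutree_hybrid_stub}, kwargs_function)
print(settings["cutreeHybrid"])
```

Parameters that are not listed in the inner dictionary keep the defaults of the function they belong to.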

static fixDataStructure(data)[source]

Encapsulates single-set data in a wrapper that makes the data suitable for functions working on multiset data collections.

Parameters:

data (pandas dataframe or dict) – A dataframe, matrix or array with two dimensions to be encapsulated.

Returns:

input data in a format suitable for functions operating on multiset data collections.

Return type:

dict

functional_enrichment_analysis(type, moduleName, sets=None, p_value=1, file_name=None, **kwargs)[source]

Doing functional enrichment analysis including GO, KEGG and REACTOME

Parameters:
  • type (str) – type of database to use; one of “GO”, “KEGG”, “REACTOME”

  • moduleName (str) – module name

  • sets (str, list, tuple) – Enrichr library name(s), or custom defined gene_sets (dict, or gmt file). You can add any Enrichr library from here: https://maayanlab.cloud/Enrichr/#stats. Only needed if the type is GO or KEGG.

  • p_value (float) – Defines the pValue threshold. (default: 1)

  • file_name (str) – name of the file you want to use to save plot (default is moduleName)

  • kwargs (key, value pairings) – Other keyword arguments are passed through to the underlying gseapy.enrichr() function

getDatTraits(metaData)[source]

get the data trait table based on sample information

Returns:

a dataframe containing the information in a format suitable for plotting the module-trait relationship heatmap

Return type:

pandas dataframe

getGeneModule(moduleName)[source]

get list of genes corresponding to modules

Parameters:

moduleName (list) – name of modules

Returns:

A dictionary containing the list of genes for the requested module(s)

Return type:

dict

getModuleName()[source]

get names of modules

Returns:

name of modules

Return type:

ndarray

getModulesGene(geneIds)[source]

get list of modules corresponding to gene(s)

Parameters:

geneIds (list or str) – gene id

Returns:

A list containing the name of the module(s) for the requested gene(s)

Return type:

list or str

static goodGenesFun(datExpr, weights=None, useSamples=None, useGenes=None, minFraction=0.5, minNSamples=4, minNGenes=4, tol=None, minRelativeWeight=0.1)[source]

Check data for missing entries and return a list of genes that have non-zero variance

Parameters:
  • datExpr (pandas dataframe) – expression data. A data frame in which columns are genes and rows are samples.

  • weights (pandas dataframe) – optional observation weights in the same format (and dimensions) as datExpr.

  • useSamples (list of bool) – optional specifications of which samples to use for the check (Defaults to using all samples)

  • useGenes (list of bool) – optional specifications of genes for which to perform the check (Defaults to using all genes)

  • minFraction (float) – minimum fraction of non-missing samples for a gene to be considered good. (default = 1/2)

  • minNSamples (int) – minimum number of non-missing samples for a gene to be considered good. (default = 4)

  • minNGenes (int) – minimum number of good genes for the data set to be considered fit for analysis. If the actual number of good genes falls below this threshold, an error will be issued. (default = 4)

  • tol (float) – An optional ‘small’ number to compare the variance against

  • minRelativeWeight (float) – observations whose relative weight is below this threshold will be considered missing. Here relative weight is weight divided by the maximum weight in the column (gene). (default = 0.1)

Returns:

A logical list with one entry per gene that is TRUE if the gene is considered good and FALSE otherwise. Note that all genes excluded by useGenes are automatically assigned FALSE.

Return type:

list of bool
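How minFraction, minNSamples and tol interact can be sketched as below. This is a simplified illustration with a hypothetical function name: missing entries are marked with None, weights and the useSamples/useGenes masks are left out, and comparing the sample variance directly against tol is an assumption.

```python
def good_genes(dat_expr, min_fraction=0.5, min_n_samples=4, tol=1e-10):
    """One flag per gene (column); rows of dat_expr are samples."""
    n_samples = len(dat_expr)
    n_genes = len(dat_expr[0])
    flags = []
    for g in range(n_genes):
        present = [row[g] for row in dat_expr if row[g] is not None]
        n_present = len(present)
        enough = (
            n_present >= min_n_samples
            and n_present >= min_fraction * n_samples
        )
        variance = 0.0
        if enough and n_present > 1:
            mean = sum(present) / n_present
            variance = sum((v - mean) ** 2 for v in present) / (n_present - 1)
        # Assumption: a gene is kept only if its variance exceeds tol.
        flags.append(enough and variance > tol)
    return flags


# Columns: a varying gene, a constant gene, a mostly missing gene.
dat_expr = [
    [1.0, 5.0, None],
    [2.0, 5.0, None],
    [3.0, 5.0, None],
    [4.0, 5.0, 1.0],
    [5.0, 5.0, 2.0],
]
print(good_genes(dat_expr))
```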

static goodSamplesFun(datExpr, weights=None, useSamples=None, useGenes=None, minFraction=0.5, minNSamples=4, minNGenes=4, minRelativeWeight=0.1)[source]

Check data for missing entries and return a list of samples that have non-zero variance

Parameters:
  • datExpr (pandas dataframe) – expression data. A data frame in which columns are genes and rows are samples.

  • weights (pandas dataframe) – optional observation weights in the same format (and dimensions) as datExpr.

  • useSamples (list of bool) – optional specifications of which samples to use for the check (Defaults to using all samples)

  • useGenes (list of bool) – optional specifications of genes for which to perform the check (Defaults to using all genes)

  • minFraction (float) – minimum fraction of non-missing samples for a gene to be considered good. (default = 1/2)

  • minNSamples (int) – minimum number of non-missing samples for a gene to be considered good. (default = 4)

  • minNGenes (int) – minimum number of good genes for the data set to be considered fit for analysis. If the actual number of good genes falls below this threshold, an error will be issued. (default = 4)

  • minRelativeWeight (float) – observations whose relative weight is below this threshold will be considered missing. Here relative weight is weight divided by the maximum weight in the column (gene). (default = 0.1)

Returns:

A logical list with one entry per sample that is TRUE if the sample is considered good and FALSE otherwise. Note that all samples excluded by useSamples are automatically assigned FALSE.

Return type:

list of bool

static goodSamplesGenes(datExpr, weights=None, minFraction=0.5, minNSamples=4, minNGenes=4, tol=None, minRelativeWeight=0.1)[source]

Checks data for missing entries, entries with weights below a threshold, and zero-variance genes. If necessary, the filtering is iterated.

Parameters:
  • datExpr (pandas dataframe) – expression data. A data frame in which columns are samples and rows are genes.

  • weights (pandas dataframe) – optional observation weights in the same format (and dimensions) as datExpr.

  • minFraction (float) – minimum fraction of non-missing samples for a gene to be considered good. (default = 1/2)

  • minNSamples (int) – minimum number of non-missing samples for a gene to be considered good. (default = 4)

  • minNGenes (int) – minimum number of good genes for the data set to be considered fit for analysis. If the actual number of good genes falls below this threshold, an error will be issued. (default = 4)

  • tol (float) – An optional ‘small’ number to compare the variance against

  • minRelativeWeight (float) – observations whose relative weight is below this threshold will be considered missing. Here relative weight is weight divided by the maximum weight in the column (gene). (default = 0.1)

Returns:

A triple (goodGenes, goodSamples, allOK):

  • goodGenes – A logical vector with one entry per gene that is TRUE if the gene is considered good and FALSE otherwise.

  • goodSamples – A logical vector with one entry per sample that is TRUE if the sample is considered good and FALSE otherwise.

  • allOK – TRUE if everything is okay

Return type:

list, list, bool

static hclust(d, method='complete')[source]

Hierarchical cluster analysis on a set of dissimilarities and methods for analyzing it.

Parameters:
  • d (ndarray) – a dissimilarity structure as produced by ‘pdist’.

  • method (str) – The linkage algorithm to use. (default = complete)

Returns:

The hierarchical clustering encoded as a linkage matrix.

Return type:

ndarray

static intramodularConnectivity(mat, colors, scaleByMax=False, index=None)[source]

Calculates intramodular connectivity, i.e., connectivity of nodes to other nodes within the same module.

Parameters:
  • mat (ndarray) – adjacency which should be a square, symmetric matrix with entries between 0 and 1.

  • colors (list) – module labels. A list with one entry per column of mat giving a module label for each gene (node) of the network.

  • scaleByMax (bool) – should intramodular connectivity be scaled by the maximum IM connectivity in each module?

  • index (ndarray) – gene id or name of mat index

Returns:

If input getWholeNetworkConnectivity is TRUE, a data frame with 4 columns giving the total connectivity, intramodular connectivity, extra-modular connectivity, and the difference of the intra- and extra-modular connectivities for all genes; otherwise a vector of intramodular connectivities

Return type:

pandas dataframe
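The four connectivity measures named under Returns can be sketched as follows. The function name is hypothetical, the keys kTotal, kWithin, kOut and kDiff are borrowed from the R WGCNA package as an assumption, and the scaleByMax option is not handled.

```python
def intramodular_connectivity(adj, colors):
    """Per-node connectivity split into within-module and outside parts."""
    n = len(adj)
    rows = []
    for i in range(n):
        # Self-connections (the diagonal) are ignored.
        k_total = sum(adj[i][j] for j in range(n) if j != i)
        k_within = sum(
            adj[i][j] for j in range(n) if j != i and colors[j] == colors[i]
        )
        k_out = k_total - k_within
        rows.append(
            {
                "kTotal": k_total,
                "kWithin": k_within,
                "kOut": k_out,
                "kDiff": k_within - k_out,
            }
        )
    return rows


adjacency = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.6],
    [0.2, 0.1, 0.6, 1.0],
]
colors = ["blue", "blue", "red", "red"]
for row in intramodular_connectivity(adjacency, colors):
    print(row)
```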

static labels2colors(labels, zeroIsGrey=True, colorSeq=None, naColor='grey')[source]

Converts a vector or array of numerical labels into a corresponding vector or array of colors corresponding to the labels.

Parameters:
  • labels (list or matrix) – list or matrix of non-negative integer or other (such as character) labels.

  • zeroIsGrey (bool) – If TRUE, labels 0 will be assigned color grey. Otherwise, labels below 1 will trigger an error. (default = True)

  • colorSeq (list or matrix) – Color sequence corresponding to labels. If not given, a standard sequence will be used.

  • naColor (str) – Color that will encode missing values.

Returns:

An array of character strings of the same length or dimensions as labels.

Return type:

ndarray
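A minimal sketch of the label-to-color mapping for integer labels. The function name is hypothetical, the short color sequence is a placeholder for the standard sequence, None stands in for a missing label, and wrapping around when labels exceed the sequence length is an assumption.

```python
def labels_to_colors(labels, zero_is_grey=True, color_seq=None, na_color="grey"):
    """Map non-negative integer labels to color names."""
    if color_seq is None:
        # Placeholder sequence; the library's standard sequence is longer.
        color_seq = ["turquoise", "blue", "brown", "yellow", "green"]
    colors = []
    for label in labels:
        if label is None:
            colors.append(na_color)  # missing label
        elif label == 0 and zero_is_grey:
            colors.append("grey")  # label 0 marks unassigned genes
        elif label < 1:
            raise ValueError("labels below 1 are not allowed here")
        else:
            # Assumption: reuse colors cyclically if labels outnumber them.
            colors.append(color_seq[(label - 1) % len(color_seq)])
    return colors


print(labels_to_colors([0, 1, 2, None, 1]))
```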

static mergeCloseModules(exprData, colors, MEs=None, useSets=None, impute=True, checkDataFormat=True, unassdColor='grey', useAbs=False, equalizeQuantiles=False, quantileSummary='mean', consensusQuantile=0, cutHeight=0.2, iterate=True, relabel=False, colorSeq=None, getNewMEs=True, getNewUnassdME=True, trapErrors=False)[source]

Merges modules in gene expression networks that are too close as measured by the correlation of their eigengenes.

Parameters:
  • exprData (pandas dataframe) – Expression data, either a single data frame with rows corresponding to samples and columns to genes, or in a multi-set format.

  • colors (list) – A list (numeric, character or a factor) giving module colors for genes. The method only makes sense when genes have the same color label in all sets, hence a single list.

  • MEs (dict) – If module eigengenes have been calculated before, the user can save some computational time by inputting them. MEs should have the same format as exprData. If they are not given, they will be calculated.

  • useSets (list) – A list of scalars allowing the user to specify which sets will be used to calculate the consensus dissimilarity of module eigengenes. Defaults to all given sets.

  • impute (bool) – Should missing values be imputed in eigengene calculation? If imputation is disabled, the presence of NA entries will cause the eigengene calculation to fail and eigengenes will be replaced by their hubgene approximation. (default = True)

  • checkDataFormat (bool) – If TRUE, the function will check exprData and MEs for correct multi-set structure. If single set data is given, it will be converted into a format usable for the function. If FALSE, incorrect structure of input data will trigger an error. (default = True)

  • unassdColor (str) – Specifies the string that labels unassigned genes. Module of this color will not enter the module eigengene clustering and will not be merged with other modules. (default = “grey”)

  • useAbs (bool) – Specifies whether absolute value of correlation or plain correlation (of module eigengenes) should be used in calculating module dissimilarity. (default = False)

  • equalizeQuantiles (bool) – Should quantiles of the eigengene dissimilarity matrix be equalized (“quantile normalized”)? The default is FALSE for reproducibility of old code; when there are many eigengenes (e.g., at least 50), better results may be achieved if quantile equalization is used. (default = False)

  • quantileSummary (str) – One of “mean” or “median”. Controls how a reference dissimilarity is computed from the input ones (using mean or median, respectively). (default = “mean”)

  • consensusQuantile (int) – A number giving the desired quantile to use in the consensus similarity calculation. (default = 0)

  • cutHeight (float) – Maximum dissimilarity (i.e., 1-correlation) that qualifies modules for merging. (default = 0.2)

  • iterate (bool) – Controls whether the merging procedure should be repeated until there is no change. If FALSE, only one iteration will be executed. (default = True)

  • relabel (bool) – Controls whether, after merging, color labels should be ordered by module size. (default = False)

  • colorSeq (list) – Color labels to be used for relabeling. Defaults to the standard color order used in this package if colors are not numeric, and to integers starting from 1 if colors is numeric.

  • getNewMEs (bool) – Controls whether module eigengenes of merged modules should be calculated and returned. (default = True)

  • getNewUnassdME (bool) – When doing module eigengene manipulations, the function does not normally calculate the eigengene of the ‘module’ of unassigned (‘grey’) genes. Setting this option to TRUE will force the calculation of the unassigned eigengene in the returned newMEs, but not in the returned oldMEs. (default = True)

  • trapErrors (bool) – Controls whether computational errors in calculating module eigengenes, their dissimilarity, and merging trees should be trapped. If TRUE, errors will be trapped and the function will return the input colors. If FALSE, errors will cause the function to stop. (default = False)

Returns:

A dictionary containing:

  • “colors”: Color labels for the genes corresponding to merged modules. The function attempts to mimic the mode of the input colors: if the input colors is numeric, character or factor, so is the output. Note, however, that if the function performs relabeling, a standard sequence of labels will be used: integers starting at 1 if the input colors is numeric, and a sequence of color labels otherwise.

  • “dendro”: Hierarchical clustering dendrogram (average linkage) of the eigengenes of the most recently computed tree. If iterate was set TRUE, this will be the dendrogram of the merged modules, otherwise it will be the dendrogram of the original modules.

  • “oldDendro”: Hierarchical clustering dendrogram (average linkage) of the eigengenes of the original modules.

  • “cutHeight”: The input cutHeight.

  • “oldMEs”: Module eigengenes of the original modules in the sets given by useSets.

  • “newMEs”: Module eigengenes of the merged modules in the sets given by useSets.

  • “allOK”: A boolean set to TRUE.

Raises:

trapErrors==TRUE – A dictionary containing: “colors”: A copy of the input colors. “allOK”: A boolean set to FALSE.

Return type:

dict
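
The merging criterion (eigengene dissimilarity, i.e. 1-correlation, below cutHeight) can be sketched in pure Python as follows. This is a simplified illustration, not the package's code: the documented function cuts an average-linkage dendrogram, whereas this sketch only flags pairs of eigengenes that fall under the threshold. The helper names and the toy eigengene values are made up for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def close_module_pairs(eigengenes, cut_height=0.2, use_abs=False):
    """Return pairs of modules whose dissimilarity (1 - correlation) is below cut_height."""
    pairs = []
    for first, second in combinations(sorted(eigengenes), 2):
        r = pearson(eigengenes[first], eigengenes[second])
        if use_abs:
            r = abs(r)
        if 1 - r < cut_height:
            pairs.append((first, second))
    return pairs


# Toy eigengenes (one value per sample); the numbers are invented for illustration.
toy_eigengenes = {
    "MEblue": [0.10, 0.40, 0.35, 0.80, 0.90],
    "MEbrown": [0.12, 0.38, 0.40, 0.75, 0.95],
    "MEred": [0.90, 0.20, 0.70, 0.10, 0.30],
}
merge_candidates = close_module_pairs(toy_eigengenes, cut_height=0.2)
print(merge_candidates)
```

Here only the blue and brown eigengenes are nearly collinear, so they are the only pair flagged as merge candidates.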

static moduleEigengenes(expr, colors, impute=True, nPC=1, align='along average', excludeGrey=False, grey='grey', subHubs=True, softPower=6, scaleVar=True, trapErrors=False)[source]

Calculates module eigengenes (1st principal component) of modules in a given single dataset.

Parameters:
  • expr (pandas dataframe) – Expression data for a single set in the form of a data frame where rows are samples and columns are genes (probes).

  • colors (list) – A list of the same length as the number of probes in expr, giving module color for all probes (genes). Color “grey” is reserved for unassigned genes.

  • impute (bool) – If TRUE, expression data will be checked for the presence of NA entries and if the latter are present, numerical data will be imputed. (default = True)

  • nPC (int) – Number of principal components and variance explained entries to be calculated. Note that only the first principal component is returned; the rest are used only for the calculation of proportion of variance explained. If given nPC is greater than 10, a warning is issued. (default = 1)

  • align (str) – Controls whether eigengenes, whose orientation is undetermined, should be aligned with average expression (align = “along average”) or left as they are (align = “”). Any other value will trigger an error. (default = “along average”)

  • excludeGrey (bool) – Should the improper module consisting of ‘grey’ genes be excluded from the eigengenes? (default = False)

  • grey (str) – Value of colors designating the improper module. Note that if colors is a factor of numbers, the default value will be incorrect. (default = grey)

  • subHubs (bool) – Controls whether hub genes should be substituted for missing eigengenes. If TRUE, each missing eigengene (i.e., eigengene whose calculation failed and the error was trapped) will be replaced by a weighted average of the most connected hub genes in the corresponding module. If this calculation fails, or if subHubs==FALSE, the value of trapErrors will determine whether the offending module will be removed or whether the function will issue an error and stop. (default = True)

  • softPower (int) – The power used in soft-thresholding the adjacency matrix. Only used when the hubgene approximation is necessary because the principal component calculation failed. It must be non-negative. The default value should only be changed if there is a clear indication that it leads to incorrect results. (default = 6)

  • trapErrors (bool) – Controls handling of errors that may arise when there are too many NA entries in expression data. If TRUE, errors from calling these functions will be trapped without abnormal exit. If FALSE, errors will cause the function to stop. Note, however, that subHubs takes precedence in the sense that if subHubs==TRUE and trapErrors==FALSE, an error will be issued only if both the principal component and the hubgene calculations have failed. (default = False)

  • scaleVar (bool) – Can be used to turn off scaling of the expression data before calculating the singular value decomposition. The scaling should only be turned off if the data has been scaled previously, in which case the function can run a bit faster. Note, however, that the function first imputes, then scales the expression data in each module. If the expression data contain missing values, scaling outside of the function and letting the function impute missing data may lead to slightly different results than if the data is scaled within the function. (default = True)

Returns:

A dictionary containing:

  • “eigengenes”: Module eigengenes in a dataframe, with each column corresponding to one eigengene. The columns are named by the corresponding color with an “ME” prepended, e.g., MEturquoise etc. If returnValidOnly==FALSE, module eigengenes whose calculation failed have all components set to NA.

  • “averageExpr”: If align == “along average”, a dataframe containing average normalized expression in each module. The columns are named by the corresponding color with an “AE” prepended, e.g., AEturquoise etc.

  • “varExplained”: A dataframe in which each column corresponds to a module, with the component varExplained[PC, module] giving the variance of module module explained by the principal component no. PC. The calculation is exact irrespective of the number of computed principal components. At most 10 variance explained values are recorded in this dataframe.

  • “nPC”: A copy of the input nPC.

  • “validMEs”: A boolean vector. Each component (corresponding to the columns in data) is TRUE if the corresponding eigengene is valid, and FALSE if it is invalid. Valid eigengenes include both principal components and their hubgene approximations. When returnValidOnly==FALSE, by definition all returned eigengenes are valid and the entries of validMEs are all TRUE.

  • “validColors”: A copy of the input colors with entries corresponding to invalid modules set to grey if given, otherwise 0 if colors is numeric and “grey” otherwise.

  • “allOK”: Boolean flag signalling whether all eigengenes have been calculated correctly, either as principal components or as the hubgene average approximation.

  • “allPC”: Boolean flag signalling whether all returned eigengenes are principal components.

  • “isPC”: Boolean vector. Each component (corresponding to the columns in eigengenes) is TRUE if the corresponding eigengene is the first principal component and FALSE if it is the hubgene approximation or is invalid.

  • “isHub”: Boolean vector. Each component (corresponding to the columns in eigengenes) is TRUE if the corresponding eigengene is the hubgene approximation and FALSE if it is the first principal component or is invalid.

  • “validAEs”: Boolean vector. Each component (corresponding to the columns in eigengenes) is TRUE if the corresponding module average expression is valid.

  • “allAEOK”: Boolean flag signalling whether all returned module average expressions contain valid data. Note that returnValidOnly==TRUE does not imply allAEOK==TRUE: some invalid average expressions may be returned if their corresponding eigengenes have been calculated correctly.

Return type:

dict
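
To make the notion of a module eigengene concrete, the sketch below computes the first principal component of a small module and aligns its sign with the average expression, as described for align = “along average”. It is a pure-Python illustration under stated assumptions, not the package's implementation: it uses power iteration instead of a full singular value decomposition, skips imputation and the hubgene fallback, and the helper names and toy values are invented.

```python
from math import sqrt


def standardize(values):
    """Scale one gene to zero mean and unit (sample) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def eigengene_sketch(gene_columns, n_iter=100):
    """First principal component across samples for the genes of one module.

    gene_columns holds one list per gene, each with one value per sample.
    """
    scaled = [standardize(column) for column in gene_columns]
    n_samples = len(scaled[0])
    # Average scaled expression per sample; also used as the starting vector.
    average = [sum(col[i] for col in scaled) / len(scaled) for i in range(n_samples)]
    u = average[:]
    for _ in range(n_iter):
        # Power iteration on X X^T: project onto genes, then back onto samples.
        weights = [sum(col[i] * u[i] for i in range(n_samples)) for col in scaled]
        u = [sum(w * col[i] for w, col in zip(weights, scaled)) for i in range(n_samples)]
        norm = sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    # The sign of a principal component is arbitrary; align it with the average.
    if sum(a * b for a, b in zip(u, average)) < 0:
        u = [-x for x in u]
    return u


# Toy module with three co-expressed genes measured in five samples.
toy_module = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 11.0],
    [1.0, 1.5, 3.2, 3.9, 5.1],
]
toy_eigengene = eigengene_sketch(toy_module)
print([round(x, 3) for x in toy_eigengene])
```

Because all three toy genes rise across the samples, the resulting eigengene rises as well, which is what the alignment with average expression is meant to guarantee.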

module_trait_relationships_heatmap(metaData, alternative='two-sided', figsize=None, show=True, file_name='module-traitRelationships')[source]

plot module-trait relationship heatmap

Parameters:
  • metaData (list) – traits whose relationship with modules you would like to see (must be column names of datExpr.obs)

  • alternative (str) – Defines the alternative hypothesis for calculating correlation for module-trait relationship. Default is ‘two-sided’. The following options are available: ‘two-sided’: the correlation is nonzero, ‘less’: the correlation is negative (less than zero), ‘greater’: the correlation is positive (greater than zero)

  • figsize (tuple of float) – indicate the size of plot

  • show (bool) – indicate if you want to show the plot or not (default: True)

  • file_name (str) – name and path used to save the plot (default: module-traitRelationships)

static multiSetMEs(exprData, colors, universalColors=None, useSets=None, useGenes=None, impute=True, nPC=1, align='along average', excludeGrey=False, subHubs=True, trapErrors=False, softPower=6, grey=None)[source]

Calculates module eigengenes for several sets.

Parameters:
  • exprData (pandas dataframe) – Expression data in a multi-set format

  • colors (list) – A list of the same length as the number of probes in expr, giving module color for all probes (genes). Color “grey” is reserved for unassigned genes.

  • universalColors (list) – Alternative specification of module assignment

  • useSets (list) – If calculations are requested in (a) selected set(s) only, the set(s) can be specified here. Defaults to all sets.

  • useGenes (list) – Can be used to restrict calculation to a subset of genes

  • impute (bool) – If TRUE, expression data will be checked for the presence of NA entries and if the latter are present, numerical data will be imputed. (default = True)

  • nPC (int) – Number of principal components and variance explained entries to be calculated. Note that only the first principal component is returned; the rest are used only for the calculation of proportion of variance explained. If given nPC is greater than 10, a warning is issued. (default = 1)

  • align (str) – Controls whether eigengenes, whose orientation is undetermined, should be aligned with average expression (align = “along average”) or left as they are (align = “”). Any other value will trigger an error. (default = “along average”)

  • excludeGrey (bool) – Should the improper module consisting of ‘grey’ genes be excluded from the eigengenes? (default = False)

  • subHubs (bool) – Controls whether hub genes should be substituted for missing eigengenes. If TRUE, each missing eigengene (i.e., eigengene whose calculation failed and the error was trapped) will be replaced by a weighted average of the most connected hub genes in the corresponding module. If this calculation fails, or if subHubs==FALSE, the value of trapErrors will determine whether the offending module will be removed or whether the function will issue an error and stop. (default = True)

  • trapErrors (bool) – Controls handling of errors that may arise when there are too many NA entries in expression data. If TRUE, errors from calling these functions will be trapped without abnormal exit. If FALSE, errors will cause the function to stop. Note, however, that subHubs takes precedence in the sense that if subHubs==TRUE and trapErrors==FALSE, an error will be issued only if both the principal component and the hubgene calculations have failed. (default = False)

  • softPower (int) – The power used in soft-thresholding the adjacency matrix. Only used when the hubgene approximation is necessary because the principal component calculation failed. It must be non-negative. The default value should only be changed if there is a clear indication that it leads to incorrect results. (default = 6)

  • grey (str) – Value of colors or universalColors (whichever applies) designating the improper module

Returns:

A dictionary similar in spirit to the input exprData

Return type:

dict

static orderMEs(MEs, greyLast=True, greyName='MEgrey', orderBy=0, order=None, useSets=None)[source]

Reorder given (eigen-)vectors such that similar ones (as measured by correlation) are next to each other.

Parameters:
  • MEs (dict) – Module eigengenes in a multi-set format.

  • greyLast (bool) – Normally the color grey is reserved for unassigned genes; hence the grey module is not a proper module and it is conventional to put it last. If this is not desired, set the parameter to FALSE. (default = True)

  • greyName (str) – Name of the grey module eigengene. (default = “MEgrey”)

  • orderBy (int) – Specifies the set by which the eigengenes are to be ordered (in all other sets as well). Defaults to the first set in useSets (or the first set, if useSets is not given). (default = 0)

  • order (list) – Allows the user to specify a custom ordering.

  • useSets (list) – Allows the user to specify for which sets the eigengene ordering is to be performed.

Returns:

A dictionary of the same type as MEs containing the re-ordered eigengenes.

Return type:

dict

static pickSoftThreshold(data, dataIsExpr=True, weights=None, RsquaredCut=0.9, MeanCut=100, powerVector=None, nBreaks=10, blockSize=None, corOptions=None, networkType='unsigned', moreNetworkConcepts=False, gcInterval=None)[source]

Analysis of scale free topology for multiple soft thresholding powers.

Parameters:
  • data (pandas dataframe) – expression data in a matrix or data frame. Rows correspond to samples and columns to genes.

  • dataIsExpr (bool) – should the data be interpreted as expression (or other numeric) data, or as a similarity matrix of network nodes?

  • weights (pandas dataframe) – optional observation weights for data to be used in correlation calculation. A matrix of the same dimensions as data, containing non-negative weights. Only used with Pearson correlation.

  • RsquaredCut (float) – desired minimum scale free topology fitting index (R^2). (default = 0.9)

  • MeanCut (int) – desired maximum mean connectivity. (default = 100)

  • powerVector (list of int) – A list of soft thresholding powers for which the scale free topology fit indices are to be calculated.

  • nBreaks (int) – number of bins in connectivity histograms (default = 10)

  • blockSize (int) – block size into which the calculation of connectivity should be broken up. If not given, a suitable value will be calculated using function blockSize and printed if verbose>0. If you run into memory problems, decrease this value.

  • corOptions (list) – a list giving further options to the correlation function specified in corFnc.

  • networkType (str) – network type. Allowed values are (unique abbreviations of) “unsigned”, “signed”, “signed hybrid”. (default = unsigned)

  • moreNetworkConcepts (bool) – should additional network concepts be calculated? If TRUE, the function will calculate how the network density, the network heterogeneity, and the network centralization depend on the power. For the definition of these additional network concepts, see Horvath and Dong (2008). PloS Comp Biol.

  • gcInterval (int) – a number specifying the interval (in terms of individual genes) at which garbage collection will be performed. The actual interval will never be less than blockSize.

Returns:

A tuple of powerEstimate and datout. powerEstimate is an estimate of an appropriate soft-thresholding power: the lowest power for which the scale free topology fit (R^2) exceeds RsquaredCut and the connectivity is less than MeanCut. If R^2 is below RsquaredCut for all powers, the maximum will be returned. datout is a data frame containing the fit indices for scale free topology. Its columns contain the soft-thresholding power, adjusted R^2 for the linear fit, the linear coefficient, adjusted R^2 for a more complicated fit model, mean connectivity, median connectivity and maximum connectivity. If input moreNetworkConcepts is TRUE, there are 3 additional columns containing network density, centralization, and heterogeneity.

Return type:

int and pandas dataframe
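
The selection rule for powerEstimate can be sketched as a scan over a table of fit indices. This is only an illustration of the rule as worded above: the function name, the dictionary keys standing in for table columns, and the toy numbers are all hypothetical, and the fallback for the case where no power passes is left out (the sketch returns None instead).

```python
def pick_power_sketch(fit_table, rsquared_cut=0.9, mean_cut=100):
    """Return the lowest power whose fit exceeds rsquared_cut with mean connectivity below mean_cut."""
    for row in sorted(fit_table, key=lambda r: r["Power"]):
        if row["SFT.R.sq"] > rsquared_cut and row["mean(k)"] < mean_cut:
            return row["Power"]
    # No power satisfies both conditions; the documented fallback is not modelled here.
    return None


# Invented fit indices, one row per candidate power (keys are hypothetical column names).
toy_fit_table = [
    {"Power": 2, "SFT.R.sq": 0.41, "mean(k)": 812.0},
    {"Power": 4, "SFT.R.sq": 0.78, "mean(k)": 240.0},
    {"Power": 6, "SFT.R.sq": 0.91, "mean(k)": 95.0},
    {"Power": 8, "SFT.R.sq": 0.94, "mean(k)": 41.0},
]
chosen_power = pick_power_sketch(toy_fit_table)
print(chosen_power)
```

Lower powers keep more of the original correlation structure, which is why the rule prefers the smallest power that already gives an acceptable fit.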

plotModuleEigenGene(moduleName, metadata, show=True)[source]

plot the module eigengene figure for a given module

Parameters:
  • moduleName (str) – module name

  • metadata (list) – list of metadata you want to be plotted

  • show (bool) – indicate if you want to see the plots when you run your code

preprocess(show=True)[source]

Preprocess the PyWGCNA object, including removing obvious outliers among genes and samples

Parameters:

show (bool) – indicate if you want to show your plot or not (if set to False, the plot will be neither shown nor saved)

static replaceMissing(x, replaceWith)[source]

Replacing missing (NA) value with appropriate value (for integer number replace with 0 and for string replace with “”)

Parameters:
  • x (object) – value you want to replace (single item)

  • replaceWith (object) – define character you want to replace na value by looking at type of data

Returns:

object without any missing (NA) value

static request_PPI(genes, species)[source]

Getting all the STRING interaction partners of the protein set

Returns:

dataframe containing genes that interact with each other

Return type:

pandas dataframe

static request_PPI_image(params, genes, file_name, request_url='https://version-11-5.string-db.org/api/image/network')[source]

plot PPI interactions along with a link that directs you to the STRING webpage

Parameters:
  • params (dict) – parameters for requesting

  • genes (list) – list of genes you want to find interaction for

  • file_name (str) – name of the output file

  • request_url (str) – suitable url for using STRING API

static request_PPI_subset(params, request_url='https://version-11-5.string-db.org/api/tsv-no-header/interaction_partners')[source]

query STRING to find genes that interact with the given gene list

Parameters:
  • request_url (str) – suitable url for using STRING API

  • params (dict) – parameters for requesting

Returns:

dataframe containing genes that interact with each other

Return type:

pandas dataframe

runWGCNA()[source]

Preprocess and find modules

saveWGCNA()[source]

Saves the current WGCNA in pickle format with the .p extension

static scaleFreeFitIndex(k, nBreaks=10)[source]

calculates several indices (fitting statistics) for evaluating scale free topology fit.

Parameters:
  • k (list) – numeric list whose components contain non-negative values

  • nBreaks (int) – number of bins in connectivity histograms (default = 10)
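
One common way to evaluate a scale free topology fit is to bin the connectivities, then regress log10 of the bin frequency on log10 of the mean connectivity per bin. The sketch below follows that idea in pure Python. It is an assumption about the general technique rather than a copy of this function: the helper name, the returned keys, the handling of empty bins (they are skipped), and the toy connectivities are all illustrative.

```python
from math import log10


def scale_free_fit_sketch(k, n_breaks=10):
    """Fit log10(p(k)) against log10(k) over equal-width connectivity bins.

    Assumes all connectivities are positive and not all identical.
    """
    lowest, highest = min(k), max(k)
    width = (highest - lowest) / n_breaks
    bins = [[] for _ in range(n_breaks)]
    for value in k:
        # The largest value is placed in the last bin.
        index = min(int((value - lowest) / width), n_breaks - 1)
        bins[index].append(value)
    xs, ys = [], []
    for members in bins:
        if members:  # empty bins are skipped in this sketch
            xs.append(log10(sum(members) / len(members)))
            ys.append(log10(len(members) / len(k)))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Ordinary least squares slope and coefficient of determination.
    return {"slope": sxy / sxx, "Rsquared": sxy ** 2 / (sxx * syy)}


# Toy connectivities where the number of nodes halves as connectivity doubles.
toy_k = [1] * 64 + [2] * 32 + [4] * 16 + [8] * 8 + [16] * 4 + [32] * 2 + [64]
toy_fit = scale_free_fit_sketch(toy_k, n_breaks=10)
print(toy_fit)
```

A negative slope together with an R^2 close to 1 is the pattern usually read as approximately scale free.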

setMetadataColor(col, cmap)[source]

set color palette for each group of metadata

Parameters:
  • col (str) – name of metadata

  • cmap (list) – color palette

static softConnectivity(datExpr, corOptions=pd.DataFrame(), weights=None, type='unsigned', power=6, blockSize=1500, minNSamples=None)[source]

Given expression data or a similarity, the function constructs the adjacency matrix and for each node calculates its connectivity, that is the sum of the adjacency to the other nodes.

Parameters:
  • datExpr (pandas dataframe) – a data frame containing the expression data, with rows corresponding to samples and columns to genes.

  • corOptions (pandas dataframe) – further options to be passed to the correlation function.

  • weights (pandas dataframe) – optional observation weights for datExpr to be used in correlation calculation. A matrix of the same dimensions as datExpr, containing non-negative weights. Only used with Pearson correlation.

  • type (str) – network type. Allowed values are (unique abbreviations of) “unsigned”, “signed”, “signed hybrid”.

  • power (int) – soft thresholding power.

  • blockSize (int) – block size in which adjacency is to be calculated. Too low (say below 100) may make the calculation inefficient, while too high may cause the process to run out of physical memory and slow down the computer. Should be chosen such that an array of doubles of size (number of genes) * (block size) fits into available physical memory.

  • minNSamples (int) – minimum number of samples available for the calculation of adjacency for the adjacency to be considered valid. If not given, defaults to the greater of ..minNSamples (currently 4) and number of samples divided by 3. If the number of samples falls below this threshold, the connectivity of the corresponding gene will be returned as NA.

Returns:

A list with one entry per gene giving the connectivity of each gene in the weighted network.

Return type:

ndarray
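
For the “unsigned” network type, the adjacency between two genes is conventionally the absolute correlation raised to the soft thresholding power, and a gene's connectivity is the sum of its adjacencies to all other genes. The pure-Python sketch below illustrates that calculation on a tiny example. It ignores weights, blocks and minNSamples, and the helper names and toy values are assumptions made for the illustration.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_connectivity_sketch(gene_columns, power=6):
    """Sum of unsigned adjacencies |cor|^power from each gene to all other genes."""
    n_genes = len(gene_columns)
    connectivity = []
    for i in range(n_genes):
        total = 0.0
        for j in range(n_genes):
            if i != j:  # a gene's adjacency to itself is not counted
                total += abs(pearson(gene_columns[i], gene_columns[j])) ** power
        connectivity.append(total)
    return connectivity


# Three toy genes over four samples: the first two are perfectly correlated.
toy_genes = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, 1.0, -1.0],
]
toy_connectivity = soft_connectivity_sketch(toy_genes, power=6)
print([round(value, 3) for value in toy_connectivity])
```

Raising the correlation to a power suppresses weak links: the third gene has a squared correlation of 0.2 with each of the others, which contributes only 0.008 per link at power 6.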

top_n_hub_genes(moduleName, n=10)[source]

find top n hub genes based on connectivity in given module

Parameters:
  • moduleName (str) – name of the module whose top n hub genes you want

  • n (int) – number of top hub genes (default = 10)

Returns:

dataframe containing the top n hub genes along with connectivity score and additional gene information you added to your expression matrix

Return type:

pandas dataframe

updateGeneInfo(geneInfo=None, path=None, sep=',')[source]

add/update genes info in datExpr and geneExpr anndata

Parameters:
  • geneInfo (pandas dataframe) – gene information table you want to add to your data

  • path (str) – path of geneInfo

  • sep (str) – separation symbol to use for reading data in path properly (default: “,”)

updateSampleInfo(sampleInfo=None, path=None, sep=',')[source]

add/update metadata in datExpr and geneExpr anndata

Parameters:
  • sampleInfo (pandas dataframe) – Sample information table you want to add to your data

  • path (str) – path of metaData

  • sep (str) – separation symbol to use for reading data in path properly (default: “,”)

class PyWGCNA.comparison.Comparison(geneModules=None)[source]

A class used to compare PyWGCNA to another PyWGCNA or any gene marker table

Parameters:
  • geneModules (dict) – gene modules of networks

  • jaccard_similarity (pandas dataframe) – jaccard similarity of common genes between each pair of modules

  • P_value (pandas dataframe) – P value of common genes between each pair of modules

  • fraction (pandas dataframe) – fraction of common genes between each pair of modules

calculateFraction()[source]

Calculate common fraction along multiple networks

Returns:

dataframe containing fraction between all modules in all networks

Return type:

pandas dataframe

calculateJaccardSimilarity()[source]

Calculate jaccard similarity matrix along multiple networks

Returns:

dataframe containing jaccard similarity between all modules in all PyWGCNA objects

Return type:

pandas dataframe

calculatePvalue(alternative='greater')[source]

Calculate the p-value of module overlap across multiple networks using Fisher's exact test

Parameters:

alternative (str) – {‘two-sided’, ‘less’, ‘greater’}, alternative hypothesis, use ‘greater’ to detect overlapping modules, ‘less’ to detect mutually exclusive modules, ‘two-sided’ to detect both (default: greater)

Returns:

dataframe containing pvalue between all modules in all networks

Return type:

pandas dataframe
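
For the ‘greater’ alternative, a one-sided Fisher exact test on the overlap of two modules reduces to the upper tail of a hypergeometric distribution. The pure-Python sketch below computes that tail directly. It is a hedged illustration, not the package's code: the function name is invented, and the universe argument (the total number of genes considered) is an assumption, since the text does not say how the gene universe is defined.

```python
from math import comb


def fisher_greater_sketch(overlap, size1, size2, universe):
    """P(X >= overlap) when size2 genes are drawn from universe, size1 of which belong to module 1."""
    total = comb(universe, size2)
    largest_possible = min(size1, size2)
    favourable = sum(
        comb(size1, x) * comb(universe - size1, size2 - x)
        for x in range(overlap, largest_possible + 1)
    )
    return favourable / total


# Two hypothetical modules of 5 genes each, sharing 3 genes, in a universe of 20 genes.
toy_p_value = fisher_greater_sketch(overlap=3, size1=5, size2=5, universe=20)
print(round(toy_p_value, 4))
```

A small value indicates that an overlap this large would be unlikely by chance, which is why ‘greater’ is the alternative used to detect overlapping modules.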

compareNetworks()[source]

compare Networks

static jaccard(list1, list2)[source]

Calculate the jaccard similarity of two lists

Parameters:
  • list1 (list) – first list containing the data

  • list2 (list) – second list containing the data

Returns:

jaccard similarity

Return type:

double
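
The Jaccard similarity of two gene lists is the size of their intersection divided by the size of their union. A minimal sketch, assuming duplicates are ignored and that two empty lists score 0.0 (both choices are assumptions, and the function name is hypothetical):

```python
def jaccard_sketch(list1, list2):
    """Jaccard similarity of two lists, treating them as sets."""
    set1, set2 = set(list1), set(list2)
    union = set1 | set2
    if not union:
        # Convention chosen for this sketch: two empty lists have similarity 0.
        return 0.0
    return len(set1 & set2) / len(union)


toy_similarity = jaccard_sketch(["geneA", "geneB", "geneC"], ["geneB", "geneC", "geneD"])
print(toy_similarity)
```

Two of the four distinct genes are shared in this toy example, so the similarity is one half.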

plotBubbleComparison(bubble_size='jaccard_similarity', cutoff=0.01, color=None, order1=None, order2=None, figsize=None, save=True, plot_show=True, plot_format='png', file_name='bubble_comparison')[source]

plot comparison matrix as a bubble plot

Parameters:
  • bubble_size (str) – which information you want to use for size of bubble (options: jaccard_similarity or fraction) default: jaccard_similarity

  • cutoff (double) – threshold you used for defining significant comparison

  • color (dict) – if you want to color tick labels for each networks separately

  • order1 (list of str) – order of modules in PyWGCNA1 you want to show in plot (each element should match the name of a module in your first PyWGCNA)

  • order2 (list of str) – order of modules in PyWGCNA2 you want to show in plot (each element should match the name of a module in your second PyWGCNA)

  • figsize (tuple of int) – indicate the size of plot (default is based on the number of modules)

  • save (bool) – indicate if you want to save the plot or not (default: True)

  • plot_show (bool) – indicate if you want to show the plot or not (default: True)

  • plot_format (str) – indicate the format of plot (default: png)

  • file_name (str) – name and path used to save the plot (default: bubble_comparison)

plotHeatmapComparison(color='jaccard_similarity', row_cluster=True, col_cluster=True, save=True, plot_show=True, plot_format='pdf', file_name='heatmap_comparison')[source]

plot heatmap comparison

Parameters:
  • color (str) – how to color heatmap (options: jaccard_similarity or fraction) default: jaccard_similarity

  • row_cluster (bool) – If True, cluster the rows. (default True)

  • col_cluster (bool) – If True, cluster the columns. (default True)

  • save (bool) – indicate if you want to save the plot or not (default: True)

  • plot_show (bool) – indicate if you want to show the plot or not (default: True)

  • plot_format (str) – indicate the format of plot (default: pdf)

  • file_name (str) – name and path of the plot use for save (default: heatmap_comparison)

plotJaccardSimilarity(color=None, cutoff=0.1, figsize=None, save=True, plot_show=True, plot_format='png', file_name='jaccard_similarity')[source]

Plot jaccard similarity matrix as a network

Parameters:
  • color (dict) – if you want to color nodes for each networks separately

  • cutoff (double) – threshold you used for filtering jaccard similarity

  • figsize (tuple of int) – indicate the size of plot (default is based on the number of nodes that pass cutoff)

  • save (bool) – indicate if you want to save the plot or not (default: True)

  • plot_show (bool) – indicate if you want to show the plot or not (default: True)

  • plot_format (str) – indicate the format of plot (default: png)

  • file_name (str) – name and path of the plot use for save (default: jaccard_similarity)

saveComparison(name='comparison')[source]

save comparison object as comparison.p near to the script

Parameters:

name (str) – name of the pickle file (default: comparison.p)

PyWGCNA.utils.compareNetworks(PyWGCNAs)[source]

Compare several PyWGCNA objects

Parameters:

PyWGCNAs (list of PyWGCNA class) – list of PyWGCNA objects

Returns:

compare object

Return type:

Comparison class

PyWGCNA.utils.compareSingleCell(PyWGCNAs, sc)[source]

Compare WGCNA and gene marker from single cell experiment

Parameters:
  • PyWGCNAs (PyWGCNA class) – WGCNA object

  • sc (pandas dataframe) – gene marker table which has ….

Returns:

compare object

Return type:

Comparison class

PyWGCNA.utils.getGeneList(dataset='mmusculus_gene_ensembl', attributes=['ensembl_gene_id', 'external_gene_name', 'gene_biotype'], maps=['gene_id', 'gene_name', 'go_id'], server_domain='http://ensembl.org/biomart')[source]

get a table that maps Ensembl gene IDs to gene names from BioMart

Returns:

table extracted from biomart related to the datasets including information from attributes

Return type:

pandas dataframe

PyWGCNA.utils.getGeneListGOid(dataset='mmusculus_gene_ensembl', attributes=['ensembl_gene_id', 'external_gene_name', 'go_id'], Goid='GO:0003700', server_domain='http://ensembl.org/biomart')[source]

get a table of gene IDs and gene names for a specific GO term from BioMart

Returns:

table extracted from biomart related to the datasets including information from attributes with filtering

Return type:

pandas dataframe

PyWGCNA.utils.readComparison(file)[source]

Read a comparison from a saved pickle file.

Parameters:

file (str) – Name / path of comparison object

Returns:

comparison object

Return type:

comparison class

PyWGCNA.utils.readWGCNA(file)[source]

Read a WGCNA from a saved pickle file.

Parameters:

file (str) – Name / path of WGCNA object

Returns:

PyWGCNA object

Return type:

PyWGCNA class
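
Since objects are saved in pickle format, saving and reading amount to a pickle round trip. The sketch below shows that round trip with Python's standard pickle module, using a plain dictionary as a stand-in for a saved object. The file name, the stand-in contents, and the temporary directory are assumptions for the example; it does not call the package itself.

```python
import os
import pickle
import tempfile

# Stand-in for an analysis object; a real PyWGCNA object would be pickled the same way.
stand_in_object = {
    "name": "demo",
    "modules": {"blue": ["geneA", "geneB"], "brown": ["geneC"]},
}

# Write the object to a .p file in a temporary directory.
output_path = os.path.join(tempfile.mkdtemp(), "demo.p")
with open(output_path, "wb") as handle:
    pickle.dump(stand_in_object, handle)

# Read it back, which is the step a reader function would perform.
with open(output_path, "rb") as handle:
    restored_object = pickle.load(handle)

print(restored_object == stand_in_object)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.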