Exponential random graph models

Exponential random graph models (ERGMs) are a family of statistical models for analyzing data about social and other networks.

Background

Many metrics exist to describe the structural features of an observed network such as the density, centrality, or assortativity.[1][2] However, these metrics describe the observed network which is only one instance of a large number of possible alternative networks. This set of alternative networks may have similar or dissimilar structural features. To support statistical inference on the processes influencing the formation of network structure, a statistical model should consider the set of all possible alternative networks weighted on their similarity to an observed network. However because network data is inherently relational, it violates the assumptions of independence and identical distribution of standard statistical models like linear regression.[3] Alternative statistical models should reflect the uncertainty associated with a given observation, permit inference about the relative frequency about network substructures of theoretical interest, disambiguating the influence of confounding processes, efficiently representing complex structures, and linking local-level processes to global-level properties.[4] Degree Preserving Randomization, for example, is a specific way in which an observed network could be considered in terms of multiple alternative networks.

Definition

The Exponential family is a broad family of models for covering many types of data, not just networks. An ERGM is a model from this family which describes networks.

Formally a random graph Y consists of a set of n nodes and m dyads (edges) \{ Y_{ij}: i=1,\dots,n; j=1,\dots,n\} where Y_{ij}=1 if the nodes (i,j) are connected and Y_{ij}=0 otherwise.

The basic assumption of these models is that the structure in an observed graph y can be explained by any statistics s(y) depending on the observed network and nodal attributes. This way, it is possible to describe any kind of dependence between the dyadic variables:


P(Y = y | \theta) = \frac{\exp(\theta^{T} s(y))}{c(\theta)}

where \theta is a vector of model parameters associated with s(y) and c(\theta) is a normalising constant.

These models represent a probability distribution on each possible network on n nodes. However, the size of the set of possible networks for an undirected network (simple graph) of size n is 2^{n(n-1)/2}. Because the number of possible networks in the set vastly outnumbers the number of parameters which can constrain the model, the ideal probability distribution is the one which maximizes the Gibbs entropy.[5]

References

  1. Wasserman, Stanley; Faust, Katherine (1994). Social Network Analysis: Methods and Applications. ISBN 978-0-521-38707-1.
  2. Newman, M.E.J. "The Structure and Function of Complex Networks". SIAM Review 45 (2): 167–256. doi:10.1137/S003614450342480.
  3. Contractor, Noshir; Wasserman, Stanley; Faust, Katherine. "Testing Multitheoretical, Multilevel Hypotheses About Organizational Networks: An Analytic Framework and Empirical Example". Academy of Management Review 31 (3): 681–703. doi:10.5465/AMR.2006.21318925.
  4. Robins, G.; Pattison, P.; Kalish, Y.; Lusher, D. (2007). "An introduction to exponential random graph models for social networks". Social Networks 29: 173–191. doi:10.1016/j.socnet.2006.08.002.
  5. Newman, M.E.J. "Other Network Models". Networks. pp. 565–585. ISBN 978-0-19-920665-0.

Further reading

This article is issued from Wikipedia - version of the Sunday, August 09, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.