***********************************************************
***********************************************************
** Networks 2: Analyzing Sociocentric Networks in Stata **
***********************************************************
***********************************************************
* This file was developed by George Usmanov.
* Note: This file should not be distributed without the written permission of George Usmanov or Raeda Anderson Ph.D.
* If you would like to use any of this code contact George Usmanov via email (gusmanov1@gsu.edu) or Raeda Anderson (randerson39@gsu.edu).
* The specialized 'nw' commands used in this code were developed by Thomas Grund, Ph.D.
* CITATION:
* Grund, Thomas U. (2015). "nwcommands. Network Analysis in Stata". Retrieved from http://nwcommands.org
* Sections of the code below are adapted from materials produced by Thomas Grund.
* To run code, highlight the lines you want to run
* and press 'control' + 'd'.
*********************************************
** download the network package into Stata **
*********************************************
* step 1 *
findit nwcommands
* step 2 *
* click on nwcommands-ado from http://www.nwcommands.org to install *
* NOTE: It needs to be the -ado file (at the top) *
* step 3 *
* click on 'click here to install' *
* step 4 *
nwinstall, all
**********************************
** download the data into Stata **
**********************************
webnwuse gang
* We will be using the 'gang' dataset from James Densley and Thomas Grund (2012).
* The dataset is of a London-based co-offending youth gang.
* The data is pre-defined as a network in the package
nwsave gang
* this saves the network data to a file named 'gang' on disk so that it can be reloaded in a later session
***************************
** summarize the network **
***************************
nwsummarize gang, detail
* nwsummarize NetworkName, detail
* nwsummarize gang, detail
*---------------------------------------------
* Network name: gang <- name of the network
* Network id: 1 <- network set number defined (1=1st, 2=2nd)
* Directed: false <- network edges are undirected
* Nodes: 54 <- number of nodes in the network (network size)
* Edges: 133 <- number of edges present (aka ties)
* Minimum value: 0 <- lowest edge value (no edge present)
* Maximum value: 1 <- highest edge value (edge present)
* Density: .092941998602376 <- ratio of edges present to possible edges in the network
* Reciprocity: .092941998602376 <- tendency for ties to be mutual (in this undirected network the reported value equals the density)
* Transitivity: .3635371179039302 <- share of connected triples that close into triangles (local clustering across the whole network)
* Betweenness centralization: .106682962955384 <- how unevenly betweenness centrality is distributed across nodes (0 = even, 1 = one node dominates)
* Degree centralization: .1973875181422351 <- how unevenly degree centrality is distributed across nodes (0 = even, 1 = one node dominates)
****************************
** network-level measures **
****************************
***Density***
* DENSITY: ratio of the number of edges that exist in a network to the number of possible edges
* that could exist in the network
* a high density indicates greater interconnectedness within the network
* FORMULA (undirected edges): d = [L]/[g(g-1)/2]
* d= density
* L= number of edges
* g = number of nodes
* density score range: 0 (low) - 1 (high)
nwsummarize gang
* nwsummarize NetworkName
* NOTE: there is no separate command for density alone; nwsummarize also reports network size, edge
* values, and edge direction, which are needed to interpret the density score
* nwsummarize gang
*--------------------------------
* Network name: gang
* Network id: 2
* Directed: false
* Nodes: 54
* Edges: 133
* Minimum value: 0
* Maximum value: 1
* Density: .092941998602376
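* Hand check of the density score (a sketch using the node and edge counts reported above):
* with g = 54 nodes there are g(g-1)/2 possible undirected edges, and L = 133 of them are present.
display 54 * (54 - 1) / 2
* possible edges: 1431
display 133 / (54 * (54 - 1) / 2)
* density: about .0929, matching the Density line in the nwsummarize output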
************************************
** node-level centrality measures **
************************************
***Degree centrality***
* DEGREE: count of the number of edges each node is connected to
* degree centrality indicates the importance of a node in the network
* FORMULA (undirected edges): d(ni) = sum_(j=1 to g) xij
* d(ni) = degree centrality of node i
* xij = 1 if an edge connects nodes i and j in the adjacency matrix, 0 otherwise
* g = number of nodes in the network
nwdegree gang
* nwdegree NetworkName
* the code automatically generates a new variable for the raw degree score of each node called '_degree'
nwdegree gang, standardize
*nwdegree NetworkName, standardize
* code for standardized degree centrality scores
* FORMULA: d'(ni) = [d(ni)]/[g-1]
* d'(ni) = standardized degree centrality
* ni = specific node
* d(ni) = raw degree centrality score
* g = number of nodes in the network
* summarizing output of degree centrality **
* e.g., if an edge indicates friendship, then degree is the number of friends a node has
* e.g., here an edge indicates a gang tie, so degree centrality is the number of gang members a
* node is connected to within this gang
* a high degree score suggests that a node is a core gang member
* 6 nodes have 0 edges (6 in frequency column and 0 in _degree column)
* 5 nodes have 1 edge (5 in frequency column and 1 in _degree column)
* 1 node has 15 edges (1 in frequency column and 15 in _degree column)
* etc.
*Degree distribution
*_degree | Freq. Percent Cum.
*--------+-----------------------------------
* 0 | 6 11.11 11.11
* 1 | 5 9.26 20.37
* 2 | 7 12.96 33.33
* 3 | 4 7.41 40.74
* 4 | 6 11.11 51.85
* 5 | 5 9.26 61.11
* 6 | 5 9.26 70.37
* 7 | 4 7.41 77.78
* 8 | 3 5.56 83.33
* 9 | 3 5.56 88.89
* 11 | 1 1.85 90.74
* 12 | 1 1.85 92.59
* 13 | 3 5.56 98.15
* 15 | 1 1.85 100.00
*--------+-----------------------------------
* Total | 54 100.00
* Degree centralization: .1973875181422351
* Degree centralization compares every node's degree with the largest degree in the network,
* relative to the maximum possible difference (reached in a star network)
* FORMULA: C_D = [sum_(i=1 to g) (C_D(n*) - C_D(ni))]/[(g-1)(g-2)]
* C_D = degree centralization
* C_D(n*) = largest degree centrality observed in the network
* C_D(ni) = degree centrality of node i
* g = number of nodes in the network
* degree centralization score range: 0 (low) - 1 (high)
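* Hand check of degree centralization (a sketch, assuming the (g-1)(g-2) denominator and the
* degree distribution table above): the largest degree is 15, g = 54, and the degrees sum to 266
* (each of the 133 edges is counted once for each of its two endpoints).
display (54 * 15 - 266) / ((54 - 1) * (54 - 2))
* about .1974, matching the reported degree centralization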
list _nodelab _degree
* list NodeVariableName DegreeVariableName
* this gives us the raw degree scores for each node in the network
* NOTE: only the first 5 nodes of this output are shown
* node 'net1' has a degree score of 9
* node 'net2' has a degree score of 13
* node 'net3' also has a degree score of 13
* etc.
* --------------------
* | _nodelab _degree |
* |--------------------|
*1. | net1 9 |
*2. | net2 13 |
*3. | net3 13 |
*4. | net4 9 |
*5. | net5 8 |
* |--------------------|
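* Worked example of standardized degree, d'(ni) = d(ni)/(g-1) (a sketch using the raw scores
* listed above and g = 54 nodes): divide each raw degree by g - 1 = 53.
display 9 / (54 - 1)
* net1: about .17
display 13 / (54 - 1)
* net2: about .245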
sum _degree
*sum VariableName
* summary statistics of raw degree scores in the network
* Variable | Obs Mean Std. Dev. Min Max
*----------+---------------------------------------------------------
* _degree | 54 4.925926 3.85517 0 15
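* Cross-check of the mean (a sketch): in an undirected network each edge adds 1 to the degree
* of two nodes, so the mean degree equals 2L/g, with L = 133 edges and g = 54 nodes.
display 2 * 133 / 54
* about 4.9259, matching the Mean column above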
***Betweenness Centrality***
* BETWEENNESS: how often a node lies on the shortest path between two other nodes
* betweenness centrality indicates the importance of a node for connecting other nodes to each other
* FORMULA(standardized): C_B'(ni)= [sum_(j<k)(gjk(ni)/gjk)]/[(g-1)(g-2)/2]
* C_B'(ni)= standardized betweenness centrality score
* gjk(ni)= number of shortest paths between nodes j and k that go through node ni
* gjk= total number of shortest paths between nodes j and k
* g= number of nodes in the network
*standardized betweenness score range: 0 (low) - 1 (high)
nwbetween gang, standardize
* nwbetween NetworkName, standardize
* to calculate the raw betweenness centrality score, omit ', standardize'
* the code automatically generates a new variable for the betweenness of each node called '_between'
* and automatically provides descriptive statistics of betweenness centrality for the network.
* summarizing output of betweenness centrality *
* Variable | Obs Mean Std. Dev. Min Max
*----------+---------------------------------------------------------
* _between | 54 .0582433 .0747916 0 .267658
list _nodelab _between
* list NodeVariableName BetweennessVariableName
* e.g., a node with a high standardized betweenness value is important for the spread of information
* e.g., here an edge indicates a gang tie, so betweenness reflects how many shortest paths between pairs of
* other gang members pass through a given member
* a low standardized betweenness value suggests that a node is a peripheral member of the gang
* this gives us the individual standardized betweenness scores for each node in the network
* NOTE: only the first 5 nodes of this output are shown
* node 'net1' has a betweenness score of .0750532
* node 'net2' has a betweenness score of .2017956
* node 'net3' has a betweenness score of .1631648
* etc.
* ---------------------
* | _nodelab _between |
* |---------------------|
*1. | net1 .0750532 |
*2. | net2 .2017956 |
*3. | net3 .1631648 |
*4. | net4 .100701 |
*5. | net5 .0717563 |
* |---------------------|
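* Hand check of the standardization (a sketch, assuming nwbetween divides by the denominator
* (g-1)(g-2)/2 given in the formula): with g = 54 this is the number of pairs of other nodes.
display (54 - 1) * (54 - 2) / 2
* 1378 pairs
display .2017956 * 1378
* roughly 278: under that assumption, the raw betweenness implied for net2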
***************************
** visualize the network **
***************************
nwplot gang
* nwplot NetworkName
nwplot gang, color(Birthplace) label(_nodelab) layout(mds)
* nwplot NetworkName, color(NodeAttributeVariable) label(NodeLabelVariable) layout(LayoutType)
* plot the network with nodes colored by birthplace and labeled with their node labels
nwplot gang, color(Birthplace) size(Arrests) layout(mds)
* nwplot NetworkName, color(NodeAttributeVariable) size(NodeAttributeVariable) layout(LayoutType)
* plot the network with nodes colored by birthplace and sized by the Arrests variable
nwplotmatrix gang, legend(on)
* nwplotmatrix NetworkName, legend(on)
* plot the adjacency matrix of the network
*************************************