
Network Analysis: Network Analysis 2 Workshop

ATTENTION

ATTENTION: With the departure of Dr. Raeda Anderson and George Usmanov from the GSU Library in Summer 2020, the workshop materials and information on this guide will no longer be actively supported. These materials will be kept online for the time being. For questions about these materials, or to inquire about currently supported workshops and subject-areas, please visit the Research Data Services homepage: lib.gsu.edu/data

PowerPoint

This PowerPoint walks through important facets of the network being analyzed in the second part of the network workshop series.

Handout

This Word document contains the code and analysis output produced in this workshop.

Code

Stata do-file containing all the code from the workshop.

Note: You should download this file and save it to your computer. You can only open it after you have opened Stata.

***********************************************************
 
***********************************************************
 
** Networks 2: Analyzing Sociocentric Networks in Stata **
 
***********************************************************
 
***********************************************************
 
* This file was developed by George Usmanov.
 
* Note: This file should not be distributed without the written permission of George Usmanov or Raeda Anderson Ph.D.
 
* If you would like to use any of this code contact George Usmanov via email (gusmanov1@gsu.edu) or Raeda Anderson (randerson39@gsu.edu).
 
 
 
* The specialized 'nw' commands used in this code were developed by Thomas Grund, Ph.D.
 
* CITATION:
 
* Grund, Thomas U. (2015) "nwcommands. Network Analysis in Stata". Retrieved from http://nwcommands.org
 
* Sections of the code below are adapted from materials produced by Thomas Grund.
 
* To run code, highlight the lines you want to run, then press Ctrl+D.
 
 
 
*********************************************
 
** download the network package into Stata **
 
*********************************************
 
* step 1 *
 
findit nwcommands
 
* step 2 *
 
* click on nwcommands-ado from http://www.nwcommands.org to install *
 
* NOTE: It needs to be the -ado file (at the top) *
 
* step 3 *
 
* click on 'click here to install' *
 
* step 4 *
 
nwinstall, all
 
 
 
**********************************
 
** download the data into Stata **
 
**********************************
 
webnwuse gang
 
* We will be using the 'gang' dataset from James Densley and Thomas Grund (2012).
 
* The dataset is of a London-based co-offending youth gang.
 
* The data is pre-defined as a network in the package
 
 
 
nwsave gang
 
* this saves the network to a file named 'gang'
 
 
 
***************************
 
** summarize the network **
 
***************************
 
nwsummarize gang, detail
 
* nwsummarize NetworkName, detail
 
 
 
* nwsummarize gang, detail
 
*---------------------------------------------
 
* Network name:  gang                           <- name of the network
 
* Network id:  1                                <- network set number defined (1=1st, 2=2nd)
 
* Directed: false                               <- network edges are undirected
 
* Nodes: 54                                     <- number of nodes in the network (network size)
 
* Edges: 133                                    <- number of edges present (aka ties)
 
* Minimum value:  0                             <- lowest edge value (no edge present)
 
* Maximum value:  1                             <- highest edge value (edge present)
 
* Density: .092941998602376                     <- ratio of edges present to possible edges in the network
 
* Reciprocity: .092941998602376                 <- likelihood for node to be mutually linked
 
* Transitivity: .3635371179039302               <- measure for local connectedness of the whole network
 
* Betweenness centralization: .106682962955384  <- how unevenly betweenness centrality is distributed across nodes (0 = even, 1 = one dominant node)

* Degree centralization: .1973875181422351      <- how unevenly degree is distributed across nodes (0 = even, 1 = one dominant node)
 
 
 
****************************
 
** network-level measures **
 
****************************
 
***Density***
 
* DENSITY: ratio of the number of edges that exist in a network to the number of possible edges that could exist in the network
 
                *a network with a high density indicates a greater interconnectedness within the network
 
* FORMULA (undirected edges): d = [L]/[g(g-1)/2]
 
                * d= density
 
                * L= number of edges
 
                * g = number of nodes
 
* density score range: 0 (low) - 1 (high)
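
* Worked check (a sketch added for illustration, not part of the original workshop code):
* plug the values reported by nwsummarize (L = 133 edges, g = 54 nodes) into the formula above
display 133/(54*(54-1)/2)
* the result should agree with the Density line in the nwsummarize output (about .0929)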
 
 
 
nwsummarize gang
 
* nwsummarize NetworkName
 
* NOTE: there is no direct command for density alone; interpreting the density score also requires knowing the network size, edge values, and direction of edges, which nwsummarize reports
 
                * nwsummarize gang
 
                *--------------------------------
 
                *   Network name:  gang
 
                *   Network id:  2
 
                *   Directed: false
 
                *   Nodes: 54
 
                *   Edges: 133
 
                *   Minimum value:  0
 
                *   Maximum value:  1
 
                *   Density:  .092941998602376
 
  
 
************************************
 
** node-level centrality measures **
 
************************************
 
***Degree centrality***
 
 
 
* DEGREE: the number of edges attached to each node
 
                * degree centrality indicates the importance of a node in the network
 
* FORMULA (undirected edges): d(ni) = sum_(j=1..g) xij

                                * d(ni) = degree centrality of node i

                                * g = number of nodes in the network

                                * xij = 1 if an edge connects nodes i and j, 0 otherwise
 
 
 
nwdegree gang
 
* nwdegree NetworkName
 
                * the code automatically generates a new variable for the raw degree score of each node called '_degree'
 
 
 
nwdegree gang, standardize
 
*nwdegree NetworkName, standardize
 
                * code for standardized degree centrality scores
 
                                * d'(ni)= [d(ni)]/[g-1]
 
                                                * d'(ni)= standardized degree centrality
 
                                                * ni= specific node
 
                                                * d(ni)= degree centrality score
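
* Worked example (a sketch added for illustration, not part of the original workshop code):
* the most connected node in this network has a raw degree of 15, and g = 54
display 15/(54-1)
* about .283, i.e., this node is tied to roughly 28% of the other 53 members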
 
 
 
                * summarizing output of degree centrality **
 
                                * e.g., if an edge indicates friendships, then degree is the number of friends a node has
 
                                * e.g., here an edge indicates a gang tie, so degree centrality is the number of gang members a node is connected to within this gang

                                                * a high degree score suggests the node is a core gang member
 
 
 
                * 6 nodes have 0 edges (6 in frequency column and 0 in _degree column)
 
                * 5 nodes have 1 edge (5 in frequency column and 1 in _degree column)
 
                * 1 node has 15 edges (1 in frequency column and 15 in _degree column)
 
                * etc.
 
 
 
    *Degree distribution
 
    *_degree |      Freq.     Percent        Cum.
 
    *--------+-----------------------------------
 
    *      0 |          6       11.11       11.11
 
    *      1 |          5        9.26       20.37
 
    *      2 |          7       12.96       33.33
 
    *      3 |          4        7.41       40.74
 
    *      4 |          6       11.11       51.85
 
    *      5 |          5        9.26       61.11
 
    *      6 |          5        9.26       70.37
 
    *      7 |          4        7.41       77.78
 
    *      8 |          3        5.56       83.33
 
    *      9 |          3        5.56       88.89
 
    *     11 |          1        1.85       90.74
 
    *     12 |          1        1.85       92.59
 
    *     13 |          3        5.56       98.15
 
    *     15 |          1        1.85      100.00
 
    *--------+-----------------------------------
 
    *  Total |         54      100.00
 
 
 
    * Degree centralization: .1973875181422351
 
 
 
* Degree centralization: .1973875181422351
 
                * Overview: how much the degree of the most central node exceeds the degrees of all other nodes, relative to the largest possible difference

                                * C_D = [sum_(i=1..g) (C_D(n*) - C_D(ni))] / [(g-1)(g-2)]

                                * C_D = degree centralization

                                * C_D(n*) = largest degree centrality observed in the network

                                * C_D(ni) = degree centrality of node i

                                * g = number of nodes in the network
 
                * degree centralization score range: 0 (low) - 1 (high)
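
* Worked check (a sketch added for illustration, not part of the original workshop code):
* with raw degrees, the sum over all nodes of (largest degree - degree of node i)
* equals g*(largest degree) - (sum of all degrees) = 54*15 - 2*133,
* because each undirected edge adds 2 to the degree total
display (54*15 - 2*133)/((54-1)*(54-2))
* the result should agree with the reported degree centralization (about .1974)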
 
               
 
list _nodelab _degree
 
* list NodeVariableName DegreeVariableName
 
                * this gives us the raw degree scores for each node in the network
 
                * NOTE: only the first 5 nodes of this output are shown
 
                                * node 'net1' has a degree score of 9
 
                                * node 'net2' has a degree score of 13
 
                                * node 'net3' also has a degree score of 13
 
                                * etc.
 
               
 
  *    --------------------
 
  *   | _nodelab   _degree |
 
  *   |--------------------|
 
  *1. |     net1         9 |
 
  *2. |     net2        13 |
 
  *3. |     net3        13 |
 
  *4. |     net4         9 |
 
  *5. |     net5         8 |
 
  *   |--------------------|
 
                              
 
sum _degree
 
*sum VariableName
 
* summary statistics of raw degree scores in the network   
 
    * Variable |        Obs        Mean    Std. Dev.       Min        Max
 
    *----------+---------------------------------------------------------
 
    *  _degree |         54    4.925926     3.85517          0         15
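
* Checks (a sketch added for illustration, not part of the original workshop code):
* in an undirected network the mean degree equals 2L/g
display 2*133/54
* about 4.9259, matching the Mean column above

* list the five highest-degree nodes without permanently re-sorting the data
* (assumes _nodelab and _degree are in memory, as in the list command above)
preserve
gsort -_degree
list _nodelab _degree in 1/5
restore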
 
             
 
***Betweenness Centrality***
 
* BETWEENNESS: how often a node lies on the shortest path between two other nodes

                * betweenness centrality indicates a node's importance as a bridge that connects other nodes
 
* FORMULA(standardized): C_B'(ni)= [sum_(j<k)(gjk(ni)/gjk)]/[(g-1)(g-2)/2]
 
                * C_B'(ni)= standardized betweenness centrality score
 
                * gjk(ni)= number of shortest paths between nodes j and k that go through node ni
 
                * gjk= total number of shortest paths between nodes j and k
 
                * g= number of nodes in the network
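
* Worked example (a sketch added for illustration, not part of the original workshop code):
* the standardizing denominator (g-1)(g-2)/2 is the number of node pairs that exclude node ni
* with g = 54 nodes:
display (54-1)*(54-2)/2
* this should print 1378, the number of pairs a node could possibly lie between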
 
               
 
*standardized betweenness score range: 0 (low) - 1 (high)
 
nwbetween gang, standardize
 
* nwbetween NetworkName, standardize
 
* if you want to calculate the raw betweenness centrality score, omit ', standardize'

* the code automatically generates a new variable for the betweenness of each node called '_between'

* and automatically provides the descriptive statistics of betweenness centrality in the network.

* summarizing output of betweenness centrality *
 
 
 
    * Variable |        Obs        Mean    Std. Dev.       Min        Max
 
    *----------+---------------------------------------------------------
 
    * _between |         54    .0582433    .0747916          0    .267658
 
 
 
list _nodelab _between
 
* list NodeVariableName BetweennessVariableName

* e.g., a node with a high standardized betweenness value is important for the spread of information

* e.g., here an edge indicates a gang tie, so betweenness reflects how many shortest paths between pairs of other members pass through a gang member

                * a low standardized betweenness value suggests the node is a peripheral member of the gang
 
                                               
 
                * this gives us the individual standardized betweenness scores for each node in the network
 
                * NOTE: only the first 5 nodes of this output are shown
 
                                * node 'net1' has a betweenness score of .0750532
 
                                * node 'net2' has a betweenness score of .2017956
 
                                * node 'net3' has a betweenness score of .1631648
 
                                * etc.                                             
 
  *    ---------------------
 
  *   | _nodelab   _between |
 
  *   |---------------------|
 
  *1. |     net1   .0750532 |
 
  *2. |     net2   .2017956 |
 
  *3. |     net3   .1631648 |
 
  *4. |     net4    .100701 |
 
  *5. |     net5   .0717563 |
 
  *   |---------------------|
 
 
 
***************************
 
** visualize the network **
 
***************************
 
nwplot gang
 
* nwplot NetworkName
 
 
 
nwplot gang, color(Birthplace) label(_nodelab) layout(mds)
 
* nwplot NetworkName, color(NodeAttributeVariable) label(NodeLabelVariable) layout(LayoutType)

* plot the network with nodes color-coded by birthplace and labeled by node name
 
 
 
nwplot gang, color(Birthplace) size(Arrests) layout(mds)
 
* nwplot NetworkName, color(NodeAttributeVariable) size(NodeAttributeVariable) layout(LayoutType)
 
* plot the network with nodes color-coded by birthplace and sized by the Arrests variable
 
 
 
nwplotmatrix gang, legend(on)
 
* nwplotmatrix NetworkName, legend(on)
 
* plot the adjacency matrix of the network
 
*************************************