This tutorial describes how to use network analysis tools to visually explore the links between companies working on the same contract.
  

1	
  
The example dataset we will use comes from the World Bank.

Each row represents a contract. Inspecting the column names tells us what data we have available about each contract.

Looking at the data, we can see how we could order the companies by the value of the total contract amount, or order the contracts by time. We might also look to see which contracts were awarded within a particular project, or to a particular company where the same company was awarded more than one contract.
  

2	
  
We might also wish to look for patterns in the data that show us how the things described in one row might connect to things described in other rows.

For example, can we organise the data somehow to see which companies are associated with which projects? Could a network-style visualisation help us do this?
  
	
  

3	
  
But if we were to draw a network, what sort of thing should we connect to what? And how would we know what to connect to each other?

One way is to look at the data… at which point we might notice that some of the entries within a column take on the same value. This means that we can “connect” the data that appears in different rows using these common elements…
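To make the idea of "common elements" concrete, here is a minimal Python sketch that finds the values in a column that occur in more than one row. The rows, field names and the `rows_sharing_value` helper are invented for illustration; they are not taken from the World Bank data.

```python
from collections import defaultdict

# Hypothetical rows; the field names and values are made up for illustration.
rows = [
    {"Project ID": "P001", "Supplier": "Acme Ltd"},
    {"Project ID": "P001", "Supplier": "Bolt GmbH"},
    {"Project ID": "P002", "Supplier": "Acme Ltd"},
    {"Project ID": "P003", "Supplier": "Crest SA"},
]

def rows_sharing_value(rows, column):
    """Map each value occurring more than once in `column` to its row indexes."""
    seen = defaultdict(list)
    for index, row in enumerate(rows):
        seen[row[column]].append(index)
    # Keep only the values that link two or more rows together.
    return {value: idxs for value, idxs in seen.items() if len(idxs) > 1}

print(rows_sharing_value(rows, "Project ID"))
print(rows_sharing_value(rows, "Supplier"))
```

Any value returned here is a candidate "join point" between rows, which is exactly what an edge in a network will later express.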
  

4	
  
So which columns have usefully repeating elements? The projects column certainly has repeating elements, so we should be able to draw diagrams that show all the companies that connect to each project. And if a company is associated with more than one project, it should in a certain sense be seen to join those projects together…
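As a rough sketch of this idea, assuming we already have (project, company) pairs, we can list the companies attached to each project and pick out the companies that join projects together. The pairs below are hypothetical.

```python
from collections import defaultdict

# Hypothetical (project, company) pairs, one per contract row.
pairs = [
    ("P001", "Acme Ltd"),
    ("P001", "Bolt GmbH"),
    ("P002", "Acme Ltd"),
    ("P002", "Crest SA"),
]

companies_by_project = defaultdict(set)
projects_by_company = defaultdict(set)
for project, company in pairs:
    companies_by_project[project].add(company)
    projects_by_company[company].add(project)

# A company that appears under more than one project joins those projects.
bridging = sorted(c for c, ps in projects_by_company.items() if len(ps) > 1)
print(bridging)
```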
  
	
  

5	
  
A few of the contract numbers repeat, so it might be interesting to explore the extent to which companies connect to contracts. If two different companies are associated with the same contracts, that might be interesting.
  
	
  

6	
  
Let’s get some data so we can start to explore the network…
  

7	
  
We just need to do a little bit of tidying of the data before we make use of it.

The major problem is that the Total Contract Amount column does not contain numbers, as such… In particular, we need to get rid of the dollar sign. Let’s create a new column into which we can put the cleaned values.
  

8	
  
This little bit of code says: take the value of each cell in the original column and replace the $ symbol with nothing (that is, an empty string). In other words, delete the dollar sign… Put this value in the corresponding cell of the new column, and make the cell a number type.
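The expression itself is not reproduced in these notes. As a rough equivalent outside OpenRefine, the same cleaning step might look like this in Python. The `clean_amount` name is made up, and the removal of thousands separators is an extra assumption about how the amounts are formatted.

```python
def clean_amount(raw):
    """Delete the dollar sign and return the amount as a number.

    Stripping commas is an assumption about the input format,
    not something the original step is stated to do.
    """
    text = raw.replace("$", "")   # replace the $ symbol with an empty string
    text = text.replace(",", "")  # assumed: amounts may contain thousands separators
    return float(text.strip())    # make the value a number type

print(clean_amount("$1,250,000.50"))
```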
  

9	
  
Now we can export the data using the Custom Tabular Exporter, which allows us to select just those columns we want to export. (This can be very handy when a table has a large number of columns that we are not interested in!)

I have rearranged the cells in the Custom Tabular Exporter simply by clicking on them and dragging them around. We just want three columns for now: Project ID, Supplier, and our new Amount column.

Now that you know how to export the data just a few columns at a time, you should be able, once you are comfortable with the process of visualising the data, to take other slices through the data (such as companies related to contracts) and visualise them yourself.

You might also like to try using a similar method on a data set of your own…
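If you would rather take a slice of a CSV file in code, a minimal sketch using Python's standard csv module is shown below. The sample data and the `select_columns` helper are hypothetical; only the three column names come from the step above.

```python
import csv
import io

def select_columns(csv_text, columns):
    """Return CSV text containing only `columns`, in the order given."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # fields not listed in `columns` are dropped
    return out.getvalue()

# Hypothetical sample data with one column we are not interested in.
sample = (
    "Project ID,Contract No,Supplier,Amount\n"
    "P001,C-1,Acme Ltd,1000\n"
    "P002,C-2,Bolt GmbH,2500\n"
)
print(select_columns(sample, ["Project ID", "Supplier", "Amount"]))
```

Changing the list of column names is all it takes to cut a different slice, for example contract number against supplier.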
  

10	
  
There’s a final bit of tidying to do before we can use this data in Gephi, the application we’ll be using to visualise the network.

In particular, Gephi expects the data to be presented to it with particular column names.

Open the exported CSV data in a text editor and rename the columns: Source,Target,Weight (no spaces?)

Note – you could have also renamed the columns in OpenRefine before exporting them…
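The renaming can also be scripted. The sketch below swaps the first line of the CSV text for the new names; `rename_header` is a made-up helper, and it assumes a simple header with no quoted commas.

```python
def rename_header(csv_text, new_names):
    """Replace the header row of `csv_text` with `new_names`.

    Assumes a plain header line with no quoted fields containing commas.
    """
    header, newline, body = csv_text.partition("\n")
    if len(header.split(",")) != len(new_names):
        raise ValueError("number of new names does not match the header")
    return ",".join(new_names) + newline + body

exported = "Project ID,Supplier,Amount\nP001,Acme Ltd,1000\n"
print(rename_header(exported, ["Source", "Target", "Weight"]))
```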
  

11	
  
We might also wish to look for patterns in the data that show us how the things described in one row might connect to things described in other rows.

For example, can we organise the data somehow to see which companies are associated with which projects? Could a network-style visualisation help us do this?
  
	
  

12	
  
Network diagrams allow us to show relationships between different things. Networks are referred to in mathematical terms as graph structures, or graphs. You may be more familiar with thinking of things like line charts and bar charts as graphs, but when it comes to networks, we use the term graph to describe the mathematical structure that defines the network.

The circles – or nodes – represent “things” in the network, in this case, particular companies or projects.

The lines – or edges – represent relationships between the things in the network. In this example, the edges represent contracts that associate a particular company with one or more projects (or conversely, associate a project with one or more companies).

Where nodes are placed in the diagram can be used to convey information about the structure of the network. Many different algorithms exist to lay out (that is, place, or position) the nodes at specific points in the diagram. Typically, we try to place nodes that are heavily interconnected by edges close to each other. Nodes that are grouped closely together on the page might then be assumed to be associated in some way because of the large number of links that connect them to each other.
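To see how nodes and edges fall out of a Source,Target,Weight table, here is a small illustrative sketch. The edge rows and the `build_graph` helper are hypothetical, and this is not how Gephi stores a graph internally.

```python
from collections import defaultdict

# Hypothetical edge list in Source,Target,Weight order (values invented).
edge_rows = [
    ("P001", "Acme Ltd", 1000.0),
    ("P001", "Bolt GmbH", 2500.0),
    ("P002", "Acme Ltd", 400.0),
]

def build_graph(edge_rows):
    """Return the set of nodes and an undirected neighbour map."""
    nodes = set()
    neighbours = defaultdict(set)
    for source, target, _weight in edge_rows:
        nodes.update((source, target))   # every endpoint becomes a node
        neighbours[source].add(target)   # each row becomes an edge
        neighbours[target].add(source)
    return nodes, dict(neighbours)

nodes, neighbours = build_graph(edge_rows)
print(sorted(nodes))
print(sorted(neighbours["Acme Ltd"]))
```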
  
	
  

13	
  
Launch Gephi and from the File menu select New Project. Click on the Data Laboratory tab, and then Import Spreadsheet.

Load in the file (with amended column names) as an Edges Table. The default settings should be fine…
  

14	
  
Click on the Overview tab – you should see the network that connects Companies to Project IDs displayed there…

But what does it mean? And can we tidy it up a little?!
  

15	
  
I used the Yifan Hu layout to generate this view over the network.

Yifan Hu is a good all-round layout engine that works particularly well when the data is hierarchically structured.

Another good general-purpose layout algorithm is ForceAtlas2.
  

16	
  
Whilst we might get a feeling for the structure and shape of the dataset as a whole from the overall visualisation, we often want to inspect one or more of the nodes in detail.

The quickest way of doing this is to look at the labels…

You may also have noticed that some edges are thicker than others. In this case, the line thicknesses are proportional to the contract value, which we set in the Weight column.

If a company is associated with more than a single contract on a particular project, the edge weight will be proportional to the overall (total) sum of values of all the contracts relating that company to that project.
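The summing of repeated company/project rows into a single edge weight can be sketched as follows. The rows and the `merge_parallel_edges` name are invented for illustration.

```python
from collections import defaultdict

def merge_parallel_edges(rows):
    """Sum the weights of rows that share the same (source, target) pair."""
    totals = defaultdict(float)
    for source, target, weight in rows:
        totals[(source, target)] += weight
    return dict(totals)

# Hypothetical rows: Acme Ltd holds two contracts on project P001.
rows = [
    ("P001", "Acme Ltd", 1000.0),
    ("P001", "Acme Ltd", 250.0),
    ("P002", "Acme Ltd", 400.0),
]
print(merge_parallel_edges(rows))
```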
  
	
  

17	
  
As well as using space (or position) and colour to represent structural elements of the network, we can also use edge weight (that is, the thickness, or width, of the lines connecting nodes to each other) to represent some feature of the network.

In this case, we might use edge weight to represent the value of the contract that connects a company with a project, or the number of contracts that a company has on a particular project.

When placing nodes, we might also use edge weight to contribute to the determination of how closely two connected nodes should be placed to each other. If you think of the edge thickness in terms of the size, thickness or strength of a mechanical spring, you might perhaps start to imagine how nodes connected by thick springs will be pulled closer to each other than nodes connected by much weaker springs.
  	
  
	
  
	
  

18	
  
As well as edge thickness, we might also make use of node size to highlight some feature of the network.

In this example, we use node size to represent the degree of each node, that is, the number of edges connected to it. Sometimes, we might want to highlight nodes that have small numbers of connections, for example to identify projects with very few companies contracted to them. In this case, we might make nodes with only a single incoming edge very large, and nodes with a large number of edges much smaller.

The node size thus represents how well connected a node is. In this case, the size of each project node indicates how many companies are associated with it, and the size of each company node depicts how many project contracts the company is engaged with.

Note that we can combine edge weight and node size, for example, by setting node size proportional to the summed weights of edges that are connected to the node.

Hopefully, you are already starting to see how a network diagram can provide a range of powerful visual representations for helping us explore the structure of a network and identify key elements of it.
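Both quantities mentioned here, the degree of a node and the summed weight of its edges, are easy to compute from an edge list. The following is a sketch on hypothetical data, with an invented helper name.

```python
from collections import Counter, defaultdict

def degree_and_weighted_degree(edges):
    """Return (degree, weighted degree) per node, ignoring edge direction."""
    degree = Counter()
    weighted = defaultdict(float)
    for source, target, weight in edges:
        for node in (source, target):
            degree[node] += 1         # one more edge touches this node
            weighted[node] += weight  # running total of edge weights
    return dict(degree), dict(weighted)

# Hypothetical merged edges (one edge per project/company pair).
edges = [
    ("P001", "Acme Ltd", 1000.0),
    ("P001", "Bolt GmbH", 2500.0),
    ("P002", "Acme Ltd", 400.0),
]
degree, weighted = degree_and_weighted_degree(edges)
print(degree)
print(weighted)
```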
  

19	
  
We can size the nodes according to statistical values calculated over the network.

In this case, we might want to highlight nodes according to the total value of contracts flowing into them (for companies) or out of them (for projects). The weighted average statistic calculates the corresponding value for each node in the network.

The spline operator in the Ranking tab – where we set the node size – allows us to tweak the relationship between the value used to size the node and the node size. The default is a simple linear proportional map. However, we may find that the values we want to map are “clumped” together (for example, one very large value and a range of smaller values clumped together at the other end of the overall range). In such a case, we might want to tweak the mapping to provide a little more salience when it comes to distinguishing between the values that are otherwise clumped together.

As well as making node size proportional to some quantity, we can also set the label size to be proportional to the node size.
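To illustrate why a non-linear mapping helps with clumped values, the sketch below compares a linear map with a simple power curve. The power curve is a stand-in chosen for illustration, not Gephi's spline; the `scale_sizes` function, its parameters and the sample values are all assumptions.

```python
def scale_sizes(values, min_size=10.0, max_size=50.0, exponent=1.0):
    """Map values onto [min_size, max_size].

    exponent=1.0 gives a linear map; an exponent below 1 stretches
    apart the small values (a stand-in for adjusting a spline).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [min_size for _ in values]
    span = max_size - min_size
    return [min_size + span * ((v - lo) / (hi - lo)) ** exponent for v in values]

# One very large value and several small ones clumped together.
values = [1.0, 2.0, 4.0, 100.0]
linear = scale_sizes(values)
curved = scale_sizes(values, exponent=0.25)
print([round(s, 1) for s in linear])
print([round(s, 1) for s in curved])
```

With the linear map the three small values get almost the same size; with the curved map the differences between them become visible while the largest node keeps the maximum size.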
  

20	
  
There are several other tools available to us that allow us to explore other properties of the network. For example, there is a wide selection of filters that allow us to select particular filtered views of the network.

In this case, we use the degree range filter to show only nodes that have a degree of two or more. This filters out nodes that have degree 1 – for example, companies that are only associated with a single project. The result is a view over the network that shows which companies are associated with two or more projects, and which projects they are. The node sizes are indicative of the total overall value of contracts associated with each particular node.

So for example, we see that Siemens AG is associated with contracts from projects P072018 and P090104. The large node size suggests that the sum total of contracts Siemens AG has received via these projects is quite significant. In addition, the thickness of the line from P072018 to Siemens AG suggests that the total value of contracts (or maybe just a single contract) Siemens AG has received from that project is quite large.
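The effect of a degree range filter can be approximated in a few lines. This is a simplified sketch on hypothetical data, not a description of how Gephi implements its filter; `degree_range_filter` is an invented name.

```python
from collections import Counter

def degree_range_filter(edges, min_degree=2):
    """Keep nodes with at least `min_degree` edges, and the edges between them."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    kept_edges = [(s, t) for s, t in edges if s in keep and t in keep]
    return keep, kept_edges

# Hypothetical project/company edges.
edges = [
    ("P001", "Acme Ltd"),
    ("P001", "Bolt GmbH"),
    ("P002", "Acme Ltd"),
    ("P003", "Crest SA"),
]
keep, kept_edges = degree_range_filter(edges)
print(sorted(keep))
print(kept_edges)
```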
  

21	
  
So far, our network diagram has shown us how companies relate to projects, and conversely, how projects relate to companies.

But sometimes we may want to know rather more directly the extent to which two things are connected by virtue of having a common partner – for example, which companies worked on the same projects together, or which projects are linked by virtue of having used the same companies.

When the data is represented as a graph, we can manipulate the graph in order to generate derived graphs that can capture these sorts of relationship directly.
  

22	
  
When we have a dataset represented in the form of a network, we can start to analyse it by looking at additional network properties.

For example, for the projects and companies graph, we might process the graph so as to remove the project nodes and replace the edges with edges that connect companies that worked on one or more of the same projects. We might even use edge weight to depict how many projects there were in common between two companies.
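This kind of derived graph is often called a projection. A minimal sketch, assuming (project, company) pairs as input, is shown below; the data and the `project_onto_companies` helper are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def project_onto_companies(edges):
    """Link companies that share a project; weight = number of shared projects."""
    companies_by_project = defaultdict(set)
    for project, company in edges:
        companies_by_project[project].add(company)
    shared = defaultdict(int)
    for companies in companies_by_project.values():
        # Every pair of companies on the same project gets one more shared project.
        for a, b in combinations(sorted(companies), 2):
            shared[(a, b)] += 1
    return dict(shared)

# Hypothetical project/company edges.
edges = [
    ("P001", "Acme Ltd"), ("P001", "Bolt GmbH"),
    ("P002", "Acme Ltd"), ("P002", "Bolt GmbH"), ("P002", "Crest SA"),
]
print(project_onto_companies(edges))
```

Swapping the two items in each pair gives the complementary projection, in which projects are linked by the companies they share.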
  

23	
  
From the workspace menu, duplicate the original network (remember to turn off all the filters! We want the whole network.)

You will automatically be moved to a new workspace containing a copy of the original network. (Navigate between workspaces from the workspace selector at the bottom right-hand corner of the whole application window.)

In the Multimode Networks Projection panel, click on Graph Coloring to try to split the network into complementary types of node (companies and projects). Hopefully, the tool will return with the report that Bipartite: true. That is, two complementary sets of nodes have been found (nodes in the first group are only ever connected to nodes in the second group). Click on Load attributes and select the Node Color Multimode option.
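The bipartite check can be understood as an attempt to colour the graph with two colours so that no edge joins two nodes of the same colour. The sketch below uses a standard breadth-first two-colouring; it is an illustration of the idea on hypothetical data, not the plugin's own code.

```python
from collections import defaultdict, deque

def two_colour(edges):
    """Return {node: 0 or 1} if the graph is bipartite, otherwise None."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    colour = {}
    for start in neighbours:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in colour:
                    colour[other] = 1 - colour[node]  # give the opposite colour
                    queue.append(other)
                elif colour[other] == colour[node]:
                    return None  # same colour at both ends: not bipartite
    return colour

# Hypothetical project/company edges: projects only ever link to companies.
edges = [("P001", "Acme Ltd"), ("P001", "Bolt GmbH"), ("P002", "Acme Ltd")]
colouring = two_colour(edges)
print(colouring is not None)
```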
  
	
  

24	
  
To check what the multimode tool has called nodes of each type, click on the edit button in the palette toolbar, and click on a project node. An edit panel will appear – make a note of what colour the project type node has been labelled.

We can now use the multimode network projection tool to process the network by joining together company nodes that are connected by a common project, and deleting the project nodes.

That is, we want to connect blue company nodes to blue company nodes if they are connected by edges that pass through a common red project node. Once we have made the mapping, we can delete the inner red project nodes.

Running the projection results in several distinct clusters of companies that are connected to each other by virtue of being associated with the same project, as well as some companies that bridge different clusters by virtue of being associated with companies from different projects.
  

25	
  
Conversely, we might remove the company nodes, and identify a new set of edges that connect projects that shared one or more common contracted companies. Again, edge thickness might be used to show how tightly connected two projects were by virtue of increasing numbers of common contracted companies.
  

26	
  
By projecting the original network onto the network that shows links between projects that arise from common companies, we get a much clearer picture about how many projects there are, as well as possible linkages between them.
  

27	
  
Here are some of the things you have hopefully learned… feel free to add anything else you might have learned to the list…
  

28	
  
For more information, and a wide range of further tutorials on all matters data related, visit the School Of Data at SchoolOfData.org, or on Twitter via @SchoolOfData.
  

29	
  

Weitere ähnliche Inhalte

Ähnlich wie Scoda project companygraph

Tableau interview questions and answers
Tableau interview questions and answersTableau interview questions and answers
Tableau interview questions and answerskavinilavuG
 
Top tableau questions and answers in 2019
Top tableau questions and answers in 2019Top tableau questions and answers in 2019
Top tableau questions and answers in 2019minatibiswal1
 
Scoda company networks2
Scoda company networks2Scoda company networks2
Scoda company networks2Tony Hirst
 
Database mapping of XBRL instance documents from the WIP taxonomy
Database mapping of XBRL instance documents from the WIP taxonomyDatabase mapping of XBRL instance documents from the WIP taxonomy
Database mapping of XBRL instance documents from the WIP taxonomyAlexander Falk
 
IBM Cognos tutorial - ABC LEARN
IBM Cognos tutorial - ABC LEARNIBM Cognos tutorial - ABC LEARN
IBM Cognos tutorial - ABC LEARNabclearnn
 
Tableau free tutorial
Tableau free tutorialTableau free tutorial
Tableau free tutorialtekslate1
 
Bt0082 visual basic2
Bt0082 visual basic2Bt0082 visual basic2
Bt0082 visual basic2Techglyphs
 
Document Based Data Modeling Technique
Document Based Data Modeling TechniqueDocument Based Data Modeling Technique
Document Based Data Modeling TechniqueCarmen Sanborn
 
PATTERNS07 - Data Representation in C#
PATTERNS07 - Data Representation in C#PATTERNS07 - Data Representation in C#
PATTERNS07 - Data Representation in C#Michael Heron
 
Big Data Madison: Architecting for Big Data (with notes)
Big Data Madison: Architecting for Big Data (with notes)Big Data Madison: Architecting for Big Data (with notes)
Big Data Madison: Architecting for Big Data (with notes)MIO | the data experts
 
Tableau interview questions www.bigclasses.com
Tableau interview questions www.bigclasses.comTableau interview questions www.bigclasses.com
Tableau interview questions www.bigclasses.combigclasses.com
 
Every student will have the opportunity to show the ability to con.docx
Every student will have the opportunity to show the ability to con.docxEvery student will have the opportunity to show the ability to con.docx
Every student will have the opportunity to show the ability to con.docxturveycharlyn
 
SharePoint - Crunch the Numbers Together
SharePoint - Crunch the Numbers TogetherSharePoint - Crunch the Numbers Together
SharePoint - Crunch the Numbers TogetherDavid J Rosenthal
 
Design PatternsChristian Behrenshttpswww.behance.netgall.docx
Design PatternsChristian Behrenshttpswww.behance.netgall.docxDesign PatternsChristian Behrenshttpswww.behance.netgall.docx
Design PatternsChristian Behrenshttpswww.behance.netgall.docxcarolinef5
 
CPSC 50900 Database Systems ProjectAll your efforts this semeste
CPSC 50900 Database Systems ProjectAll your efforts this semesteCPSC 50900 Database Systems ProjectAll your efforts this semeste
CPSC 50900 Database Systems ProjectAll your efforts this semesteCruzIbarra161
 
Excel vs Tableau the comparison you should know
Excel vs Tableau  the comparison you should knowExcel vs Tableau  the comparison you should know
Excel vs Tableau the comparison you should knowStat Analytica
 
01VD062009003760042.pdf
01VD062009003760042.pdf01VD062009003760042.pdf
01VD062009003760042.pdfSunilMatsagar1
 

Ähnlich wie Scoda project companygraph (20)

Tableau interview questions and answers
Tableau interview questions and answersTableau interview questions and answers
Tableau interview questions and answers
 
Top tableau questions and answers in 2019
Top tableau questions and answers in 2019Top tableau questions and answers in 2019
Top tableau questions and answers in 2019
 
Scoda company networks2
Scoda company networks2Scoda company networks2
Scoda company networks2
 
My tableau
My tableauMy tableau
My tableau
 
Database mapping of XBRL instance documents from the WIP taxonomy
Database mapping of XBRL instance documents from the WIP taxonomyDatabase mapping of XBRL instance documents from the WIP taxonomy
Database mapping of XBRL instance documents from the WIP taxonomy
 
IBM Cognos tutorial - ABC LEARN
IBM Cognos tutorial - ABC LEARNIBM Cognos tutorial - ABC LEARN
IBM Cognos tutorial - ABC LEARN
 
Tableau free tutorial
Tableau free tutorialTableau free tutorial
Tableau free tutorial
 
Bt0082 visual basic2
Bt0082 visual basic2Bt0082 visual basic2
Bt0082 visual basic2
 
Document Based Data Modeling Technique
Document Based Data Modeling TechniqueDocument Based Data Modeling Technique
Document Based Data Modeling Technique
 
PATTERNS07 - Data Representation in C#
PATTERNS07 - Data Representation in C#PATTERNS07 - Data Representation in C#
PATTERNS07 - Data Representation in C#
 
Big Data Madison: Architecting for Big Data (with notes)
Big Data Madison: Architecting for Big Data (with notes)Big Data Madison: Architecting for Big Data (with notes)
Big Data Madison: Architecting for Big Data (with notes)
 
Tableau interview questions www.bigclasses.com
Tableau interview questions www.bigclasses.comTableau interview questions www.bigclasses.com
Tableau interview questions www.bigclasses.com
 
Dwh faqs
Dwh faqsDwh faqs
Dwh faqs
 
Every student will have the opportunity to show the ability to con.docx
Every student will have the opportunity to show the ability to con.docxEvery student will have the opportunity to show the ability to con.docx
Every student will have the opportunity to show the ability to con.docx
 
SharePoint - Crunch the Numbers Together
SharePoint - Crunch the Numbers TogetherSharePoint - Crunch the Numbers Together
SharePoint - Crunch the Numbers Together
 
Design PatternsChristian Behrenshttpswww.behance.netgall.docx
Design PatternsChristian Behrenshttpswww.behance.netgall.docxDesign PatternsChristian Behrenshttpswww.behance.netgall.docx
Design PatternsChristian Behrenshttpswww.behance.netgall.docx
 
CPSC 50900 Database Systems ProjectAll your efforts this semeste
CPSC 50900 Database Systems ProjectAll your efforts this semesteCPSC 50900 Database Systems ProjectAll your efforts this semeste
CPSC 50900 Database Systems ProjectAll your efforts this semeste
 
Excel vs Tableau the comparison you should know
Excel vs Tableau  the comparison you should knowExcel vs Tableau  the comparison you should know
Excel vs Tableau the comparison you should know
 
01VD062009003760042.pdf
01VD062009003760042.pdf01VD062009003760042.pdf
01VD062009003760042.pdf
 
Data Mining _ Weka
Data Mining _ WekaData Mining _ Weka
Data Mining _ Weka
 

Mehr von Tony Hirst

15 in 20 research fiesta
15 in 20 research fiesta15 in 20 research fiesta
15 in 20 research fiestaTony Hirst
 
Jupyternotebooks ou.pptx
Jupyternotebooks ou.pptxJupyternotebooks ou.pptx
Jupyternotebooks ou.pptxTony Hirst
 
Virtual computing.pptx
Virtual computing.pptxVirtual computing.pptx
Virtual computing.pptxTony Hirst
 
ouseful-parlihacks
ouseful-parlihacksouseful-parlihacks
ouseful-parlihacksTony Hirst
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriateTony Hirst
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriateTony Hirst
 
Robotlab jupyter
Robotlab   jupyterRobotlab   jupyter
Robotlab jupyterTony Hirst
 
Fco open data in half day th-v2
Fco open data in half day  th-v2Fco open data in half day  th-v2
Fco open data in half day th-v2Tony Hirst
 
Notes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 WorkshopNotes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 WorkshopTony Hirst
 
Community Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireCommunity Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireTony Hirst
 
Residential school 2015_robotics_interest
Residential school 2015_robotics_interestResidential school 2015_robotics_interest
Residential school 2015_robotics_interestTony Hirst
 
Data Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXData Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXTony Hirst
 
A Quick Tour of OpenRefine
A Quick Tour of OpenRefineA Quick Tour of OpenRefine
A Quick Tour of OpenRefineTony Hirst
 
Conversations with data
Conversations with dataConversations with data
Conversations with dataTony Hirst
 
Data reuse OU workshop bingo
Data reuse OU workshop bingoData reuse OU workshop bingo
Data reuse OU workshop bingoTony Hirst
 
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Tony Hirst
 
Lincoln jun14datajournalism
Lincoln jun14datajournalismLincoln jun14datajournalism
Lincoln jun14datajournalismTony Hirst
 

Scoda project companygraph

  • 1. This tutorial describes how to use network analysis tools to visually explore the links between companies working on the same contract.
  • 2. The example dataset we will use comes from the World Bank. Each row represents a contract. Inspecting the column names tells us what data we have available about each contract. Looking at the data, we can see how we could order the companies based on the value of the total contract amount; or we might order the contracts by time; or we might look to see which contracts were awarded in a particular project, or to a particular company in the event of the same company being awarded more than one contract.
  • 3. We might also wish to look for patterns in the data that show us how the things described in one row might connect to things described in other rows. For example, can we organise the data somehow to see which companies are associated with which projects? Could a network style visualisation help us do this?
  • 4. But if we were to draw a network, what sort of thing should we connect to what? And how would we know what to connect to each other? One way is to look at the data… at which point we might notice that some of the entries within a column take on the same value. This means that we can “connect” the data that appears in different rows using these common elements…
  • 5. So which columns have usefully repeating elements? The projects column certainly has repeating elements, so we should be able to draw diagrams that show all the companies that connect to each project. And if a company is associated with more than one project, it should in a certain sense be seen to join those projects together…
  • 6. A few of the contract numbers repeat, so it might be interesting to explore the extent to which companies connect to contracts. If two different companies are associated with the same contracts, that might be interesting.
  • 7. Let’s get some data so we can start to explore the network…
  • 8. We just need to do a little bit of tidying of the data before we make use of it. The major problem is that the Total Contract Amount column does not contain numbers, as such… In particular, we need to get rid of the dollar sign. Let’s create a new column into which we can put the cleaned values.
  • 9. This little bit of code says: take the value of each cell in the original column and replace the $ symbol with nothing (that is, an empty string). In other words, delete the dollar sign… Put this value in the corresponding cell of the new column, and make the cell a number type.
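The same tidy-up can be sketched outside OpenRefine. This is only an illustration in Python, not the expression shown on the slide; the function name clean_amount is hypothetical, and stripping thousands separators is an added assumption about how the amounts might be formatted.

```python
def clean_amount(raw: str) -> float:
    """Delete the dollar sign from a cell value and return it as a number."""
    # Replace "$" with an empty string, as described in the slide.
    cleaned = raw.replace("$", "")
    # Assumption: amounts may contain thousands separators, which float() rejects.
    cleaned = cleaned.replace(",", "").strip()
    return float(cleaned)


# Hypothetical cell value, for illustration only.
print(clean_amount("$1,250,000.00"))
```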
  • 10. Now we can export the data using the Custom Tabular Exporter, which allows us to select just those columns we want to export. (This can be very handy when a table has a large number of columns that we are not interested in!) I have rearranged the cells in the Custom Tabular Exporter simply by clicking on them and dragging them around. We just want three columns for now: Project ID, Supplier, and our new Amount column. Now that you know how to export the data just a few columns at a time, once you are comfortable with the process of visualising the data, you should be able to take other slices through the data (such as companies related to contracts) and visualise them yourself. You might also like to try using a similar method on a data set of your own…
  • 11. There’s a final bit of tidying to do before we can use this data in Gephi, the application we’ll be using to visualise the network. In particular, Gephi expects the data to be presented to it with particular column names. Open the exported CSV data in a text editor and rename the columns: Source,Target,Weight (no spaces?) Note – you could have also renamed the columns in OpenRefine before exporting them…
  • 12. We might also wish to look for patterns in the data that show us how the things described in one row might connect to things described in other rows. For example, can we organise the data somehow to see which companies are associated with which projects? Could a network style visualisation help us do this?
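If you would rather not edit the header by hand, the renaming step can be scripted. The sketch below assumes the export has exactly the three columns named in the previous slide, in the order Project ID, Supplier, Amount; the function name to_gephi_edges and the sample rows are made up for illustration.

```python
import csv
import io


def to_gephi_edges(exported_csv: str) -> str:
    """Swap the exported header row for the Source,Target,Weight header."""
    reader = csv.reader(io.StringIO(exported_csv))
    next(reader)  # discard the original header row
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(reader)  # copy the data rows through unchanged
    return out.getvalue()


# Hypothetical export: project IDs, suppliers and amounts are invented.
sample = (
    "Project ID,Supplier,Amount\n"
    "P001,Acme Ltd,1000.0\n"
    "P002,Acme Ltd,250.5\n"
)
print(to_gephi_edges(sample))
```

Using the csv module rather than plain string replacement means supplier names that contain commas stay correctly quoted.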
  • 13. Network diagrams allow us to show relationships between different things. Networks are referred to in mathematical terms as graph structures, or graphs. You may be more familiar with thinking of things like line charts and bar charts as graphs, but when it comes to networks, we use the term graph to describe the mathematical structure that defines the network. The circles – or nodes – represent “things” in the network, in this case, particular companies or projects. The lines – or edges – represent relationships between the things in the network. In this example, the edges represent contracts that associate a particular company with one or more projects (or conversely, associate a project with one or more companies). Where nodes are placed in the diagram can be used to convey information about the structure of the network. Many different algorithms exist to lay out (that is, place, or position) the nodes at specific points in the diagram. Typically, we try to place nodes that are heavily interconnected by edges close to each other. Nodes that are grouped closely together on the page might then be assumed to be associated in some way because of the increasing number of links that connect them to each other.
  • 14. Launch Gephi and from the File menu select New Project. Click on the Data Laboratory tab, and then Import Spreadsheet. Load in the file (with amended column names) as an Edges Table. The default settings should be fine…
  • 15. Click on the Overview tab – you should see the network that connects Companies to Project IDs displayed there… But what does it mean? And can we tidy it up a little?!
  • 16. I used the Yifan Hu layout to generate this view over the network. Yifan Hu is a good all-round layout engine that works particularly well when the data is hierarchically structured. Another good general purpose layout algorithm is ForceAtlas2.
  • 17. Whilst we might get a feeling for the structure and shape of the dataset as a whole from the overall visualisation, we often want to inspect one or more of the nodes in detail. The quickest way of doing this is to look at the labels… You may also have noticed that some lines are thicker than others. In this case, the line thicknesses are proportional to the contract value, which we set in the weight column. If a company is associated with more than a single contract on a particular project, the edge weight will be proportional to the overall (total) sum of values of all the contracts relating that company to that project.
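The way repeated rows combine into a single, heavier edge can be sketched as follows. This illustrates the summing described above rather than Gephi's own code; the function name merge_parallel_edges and the sample rows are hypothetical.

```python
from collections import defaultdict


def merge_parallel_edges(rows):
    """Sum the weights of rows that share the same (source, target) pair."""
    totals = defaultdict(float)
    for source, target, weight in rows:
        totals[(source, target)] += weight
    return dict(totals)


# Hypothetical rows: two contracts link the same company to project P001.
rows = [
    ("P001", "Acme Ltd", 1000.0),
    ("P001", "Acme Ltd", 500.0),
    ("P002", "Acme Ltd", 250.0),
]
edges = merge_parallel_edges(rows)
print(edges)
```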
  • 18. As well as using space (or position) and colour to represent structural elements of the network, we can also use edge weight (that is, the thickness, or width) of the lines connecting nodes to each other to represent some feature of the network. In this case, we might use edge weight to represent the value of the contract that connects a company with a project, or the number of contracts that a company has on a particular project. When placing nodes, we might also use edge weight to contribute to the determination of how closely two connected nodes should be placed to each other. If you think of the edge thickness in terms of the size, thickness or strength of a mechanical spring, you might perhaps start to imagine how nodes connected by thick springs will be pulled closer to each other than nodes connected by much weaker springs.
  • 19. As well as edge thickness, we might also make use of node size to highlight some feature of the network. In this example, we use node size to represent the degree of each node, that is, the number of edges connected to it. Sometimes, we might want to highlight nodes that have small numbers of connections, for example to identify projects with very few companies contracted to them. In this case, we might make nodes with only a single incoming edge very large, and nodes with a large number of edges much smaller. The node size thus represents how well connected a node is. In this case, the size of the project nodes indicates how many companies are associated with it, and the size of the company nodes depicts how many project contracts the company is engaged with. Note that we can combine edge weight and node size, for example, by setting node size proportional to the summed weights of edges that are connected to the node. Hopefully, you are already starting to see how a network diagram can provide a range of powerful visual representations for helping us explore the structure of the network and identify key elements of it.
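Both quantities mentioned here, the degree of a node and the summed weight of its edges, are simple to compute from an edge list. The sketch below assumes one merged edge per (project, company) pair; the function name degree_and_strength and the data are hypothetical.

```python
from collections import defaultdict


def degree_and_strength(edges):
    """Return (degree, summed edge weight) dictionaries keyed by node.

    `edges` maps (source, target) pairs to a weight.
    """
    degree = defaultdict(int)
    strength = defaultdict(float)
    for (source, target), weight in edges.items():
        # Each edge counts once for each of its two endpoints.
        for node in (source, target):
            degree[node] += 1
            strength[node] += weight
    return dict(degree), dict(strength)


# Hypothetical merged edges, for illustration only.
example_edges = {
    ("P001", "Acme Ltd"): 1500.0,
    ("P002", "Acme Ltd"): 250.0,
    ("P001", "Bolt plc"): 300.0,
}
degree, strength = degree_and_strength(example_edges)
print(degree)
print(strength)
```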
  • 20. We can size the nodes according to statistical values calculated over the network. In this case, we might want to highlight nodes according to the total value of contracts flowing into them (for companies) or out of them (for projects). The weighted average statistic calculates the corresponding value for each node in the network. The spline operator in the Ranking tab – where we set the node size – allows us to tweak the relationship between the value used to size the node and the node size. The default is a simple linear proportional map. However, we may find that the range of values we want to map are “clumped” together (for example, one very large value and a range of smaller values clumped together at the other end of the overall range). In such a case, we might want to tweak the mapping to provide a little more salience when it comes to distinguishing between the values that are otherwise clumped together.  As well as making node size proportional to some quantity, we can also set the label size to be proportional to the node size.
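To see why a non-linear mapping helps with clumped values, here is a rough sketch. It does not reproduce Gephi's spline editor; it swaps in a simple power curve, and the function name node_size, the size range and the exponent are all assumptions chosen for illustration.

```python
def node_size(value, lo, hi, min_size=10.0, max_size=50.0, exponent=1.0):
    """Map a value in [lo, hi] to a node size in [min_size, max_size].

    exponent=1.0 gives the plain linear map; an exponent below 1 stretches
    the low end of the range, so small clumped values become easier to
    tell apart.
    """
    if hi == lo:
        return min_size
    t = (value - lo) / (hi - lo)  # normalise to the range 0..1
    return min_size + (max_size - min_size) * t ** exponent


# The same value sized with a linear map and with a square-root curve.
print(node_size(25, 0, 100))                # linear
print(node_size(25, 0, 100, exponent=0.5))  # square-root curve
```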
  • 21. There are several other tools available to us that allow us to explore other properties of the network. For example, there is a wide selection of filters that allow us to select particular filtered views of the network. In this case, we use the degree range filter to show only nodes that have degree of two or more. This filters out nodes that have degree 1 – for example, companies that are only associated with a single project. The result is a view over the network that shows which companies are associated with two or more projects, and which projects they are. The node sizes are indicative of the total overall value of contracts associated with each particular node. So for example, we see that Siemens AG is associated with contracts from projects P072018 and P090104. The large node size suggests that the sum total of contracts Siemens AG has received via these projects is quite significant. In addition, the line from P072018 to Siemens AG suggests that the total value of contracts (or maybe just a single contract) Siemens AG has received from that project is quite large.
  • 22. So far, our network diagram has shown us how companies relate to projects, and conversely, how projects relate to companies. But sometimes we may want to know rather more directly the extent to which two things are connected by virtue of having a common partner – for example, which companies worked on the same projects together, or which projects are linked by virtue of having used the same companies. When the data is represented as a graph, we can manipulate the graph in order to generate derived graphs that can capture these sorts of relationship directly.
  • 23. When we have a dataset represented in the form of a network, we can start to analyse it by looking at additional network properties. For example, for the projects and companies graph, we might process the graph so as to remove project nodes and replace the edges with edges that connect companies that worked together on one or more projects. We might even use edge weight to depict how many projects there were in common between two companies.
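The effect of a degree range filter can be approximated in a few lines. This is a sketch of the idea, not Gephi's implementation: degrees are counted on the full edge list, and an edge is kept only if both of its endpoints pass the threshold. The function name filter_by_min_degree is hypothetical, and apart from Siemens AG and the two project IDs mentioned above, the names in the sample are placeholders.

```python
def filter_by_min_degree(edges, min_degree=2):
    """Keep only edges whose two endpoints both have degree >= min_degree."""
    degree = {}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(s, t) for s, t in edges if s in keep and t in keep]


# "Company B" and "Company C" are placeholder names, not rows from the dataset.
sample_edges = [
    ("P072018", "Siemens AG"),
    ("P090104", "Siemens AG"),
    ("P072018", "Company B"),
    ("P090104", "Company C"),
]
print(filter_by_min_degree(sample_edges))
```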
  • 24. From the workspace menu, duplicate the original network (remember to turn off all the filters! We want the whole network.) You will automatically be moved to a new workspace containing a copy of the original network. (Navigate between workspaces from the workspace selector at the bottom right hand corner of the whole application window.) In the Multimode Networks Projection panel, click on Graph Coloring to try to split the network into complementary types of node (companies and projects). Hopefully, the tool will return with the report that Bipartite: true. That is, two complementary sets of nodes have been found (nodes in the first group are only ever connected to nodes in the second group.) Click on Load attributes and select the Node Color Multimode option.
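The bipartite check amounts to trying to colour the nodes with two colours so that no edge joins two nodes of the same colour. The sketch below uses a standard breadth-first two-colouring; it is not the plugin's code, and the function name two_colour and the sample edges are hypothetical.

```python
from collections import defaultdict, deque


def two_colour(edges):
    """Return a {node: 0 or 1} colouring if the graph is bipartite, else None."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    colour = {}
    for start in neighbours:
        if start in colour:
            continue  # already reached from an earlier starting node
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in colour:
                    colour[other] = 1 - colour[node]  # give it the opposite colour
                    queue.append(other)
                elif colour[other] == colour[node]:
                    return None  # an edge joins two same-coloured nodes
    return colour


# Hypothetical project-company edges.
colouring = two_colour([("P1", "A"), ("P1", "B"), ("P2", "A")])
print(colouring)
```

A project-to-company edge list should always pass this check, as long as no name appears both as a project ID and as a company.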
  • 25. To check what the multimode tool has called nodes of each type, click on the edit button in the palette toolbar, and click on a project node. An edit panel will appear – make a note of what colour the project type node has been labelled. We can now use the multimode network projection tool to process the network by joining together company nodes that are connected by a common project, and deleting the project nodes. That is, we want to connect blue company nodes to blue company nodes if they are connected by edges that pass through a common red project node. Once we have made the mapping, we can delete the inner red project nodes. Running the projection results in several distinct clusters of companies that are connected to each other by virtue of being associated with the same project, as well as some companies that bridge different clusters by virtue of being associated with companies from different projects.
  • 26. Conversely, we might remove the company nodes, and identify a new set of edges that connect projects that shared one or more common contracted companies. Again, edge thickness might be used to show how tightly connected two projects were by virtue of increasing numbers of common contracted companies.
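The projection itself can be sketched without Gephi. This illustrates the idea only, not the plugin's algorithm: companies are grouped by project, and every pair of companies within a group is linked, with the edge weight counting the projects they share. The function name project_onto_targets and the sample edges are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_targets(edges):
    """Link target nodes that share a source node.

    `edges` is a list of (source, target) pairs, for example
    (project, company). The result maps each pair of targets to the
    number of sources they have in common.
    """
    targets_by_source = defaultdict(set)
    for source, target in edges:
        targets_by_source[source].add(target)

    shared = defaultdict(int)
    for targets in targets_by_source.values():
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(targets), 2):
            shared[(a, b)] += 1
    return dict(shared)


# Hypothetical project-company edges.
bipartite_edges = [("P1", "A"), ("P1", "B"), ("P2", "A"), ("P2", "B"), ("P2", "C")]
company_links = project_onto_targets(bipartite_edges)
print(company_links)
```

Passing the pairs the other way round, as (company, project), gives links between projects that share companies instead.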
  • 27. By projecting the original network onto the network that shows links between projects that arise from common companies, we get a much clearer picture about how many projects there are, as well as possible linkages between them.
  • 28. Here are some of the things you have hopefully learned… feel free to add anything else you might have learned to the list…
  • 29. For more information, and a wide range of further tutorials on all matters data related, visit the School Of Data at SchoolOfData.org, or on Twitter via @SchoolOfData.