PS1 Final

Networks: Fall 2018 Homework 1
David Easley & Austin Benson
Due 11:15am, Friday, September 7, 2018
As described on the course home page, homework solutions must be typed and submitted
by uploading a PDF file to the CMS site at https://cmsx.cs.cornell.edu/. It is fine to
write the homework in another format such as Word, as long as it’s saved out as PDF. (From
Word, for example, you can save files into PDF format.)
The CMS site will stop accepting homework uploads after the posted due date. We cannot
accept late homework except for University-approved excuses (which include illness, a family
emergency, or travel as part of a University sports team or other University activity).
Reading: The questions below are primarily based on the material in Chapters 2, 3, and 5
of the book. All of the course material related to this problem set will be covered by the end
of Friday’s lecture on August 31.
(1) A group of social psychologists is studying the rivalries among students at a middle
school, where each student is in grade 6, 7, or 8. They’ve found that there are rivalries among
students in grades that are directly adjacent to each other (i.e., between 6th graders and 7th
graders and between 7th graders and 8th graders). Other than these rivalries, everyone in
the school gets along with each other. Specifically, let’s abstract this finding by saying that
for any two students A and B at the school:
(i) A and B are friendly to each other if they are in the same grade;
(ii) A and B are not friendly to each other if their grades differ by 1 (that is, if one is in
grade n and the other is in grade n + 1, for some value of n); and finally
(iii) A and B are friendly to each other if they are in grades that differ by at least two (so
that they are not in the same grade or in adjacent grades).
The social psychologists decide to analyze this situation using the notion of structural
balance, so they build a signed complete graph on the set of all students, labeling an edge
between two students with a “+” if the two students are friendly to each other, and labeling
the edge with a “-” if the two students are not friendly to each other.
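As a minimal sketch of this construction (the function names below are illustrative assumptions, not part of the assignment), the sign rule from (i), (ii), and (iii) and the test for a single triangle could be written in Python as follows. It relies on the book's definition that a triangle is balanced when it has either three "+" edges or exactly one "+" edge, which is the same as the product of its three signs being positive.

```python
from itertools import combinations

def sign(grade_a, grade_b):
    """Return +1 (friendly) or -1 (not friendly) following rules (i)-(iii)."""
    return -1 if abs(grade_a - grade_b) == 1 else +1

def triangle_is_balanced(g1, g2, g3):
    """Balanced means three '+' edges or exactly one '+' edge,
    i.e. the product of the three edge signs is positive."""
    return sign(g1, g2) * sign(g1, g3) * sign(g2, g3) > 0

def unbalanced_triangles(grades):
    """Return every triple of students (listed by grade) that is unbalanced."""
    return [t for t in combinations(grades, 3) if not triangle_is_balanced(*t)]
```

Calling unbalanced_triangles on a list with one grade per student returns the offending triples, if there are any; an empty list means every triangle passed the test.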
(a) Given assumptions (i), (ii), and (iii) above, does the social network of the middle
school satisfy the Structural Balance Property? Provide an explanation for why it either
does or doesn’t satisfy this property.
(b) The social psychologists perform a similar study on a high school, where each student
is in grade 9, 10, 11, or 12. They find that the same pattern of rivalries and friendliness holds:
students are friendly to each other unless they are in adjacent grades. In other words, the
model assumptions (i), (ii), and (iii) still hold. They now build a signed complete graph on
the set of all students at the high school. Does the social network of the high school satisfy
the Structural Balance Property? Provide an explanation for why it either does or doesn’t
satisfy this property.
(2) A group of TV critics are discussing the history of a TV series that ran for 5 seasons.
The show had six main characters, who appeared during different overlapping times of the
show. Naming the six main characters KG, FS, SB, OL, MS, ST, the following table shows
the seasons when each of them was part of the show.
Seasons
Character 1 2 3 4 5
KG X X X X X
SB X X X
FS X
OL X X X
MS X X X
ST X
The critics want to understand how this set of characters evolved over the five seasons
using the language of social networks. They build a social network on the six characters:
each pair of characters has a strong tie if they overlapped on the TV show for at least two
seasons; a weak tie if they overlapped on the TV show for just one season; and no link at all
if they didn’t appear on any season together.
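The tie rule can be sketched as a small Python function. This is only an illustration: the function name and the sample season sets are hypothetical and are not taken from the table above.

```python
def tie(seasons_a, seasons_b):
    """Classify the tie between two characters from the seasons each appeared in."""
    shared = len(set(seasons_a) & set(seasons_b))
    if shared >= 2:
        return "strong"
    if shared == 1:
        return "weak"
    return None  # no link: the characters never appeared in the same season

# Hypothetical season sets, for illustration only (not the table above).
p = {1, 2, 3}
q = {2, 3, 4}
r = {4, 5}
print(tie(p, q))  # strong (seasons 2 and 3 are shared)
print(tie(q, r))  # weak (only season 4 is shared)
print(tie(p, r))  # None (no shared season)
```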
(a) Using this way of constructing the social network, draw the network on the six
characters, labeling each edge as a strong or weak tie.
(b) Which characters in the resulting network satisfy the Strong Triadic Closure Property,
and which do not? (Recall that unless a node violates the property, following the definition
from the book, it is said to satisfy it.) Provide an explanation for your answer.
(c) The critics also want to identify which subsets of the six main characters formed
tightly-connected “clusters.” Formally, let’s say that a cluster is a set S of at least two
characters that satisfies the following two properties: (i) every pair of characters in S is
connected by a strong or a weak tie; and (ii) there is at most one character not in S that has
a strong tie to a character in S (although this character could have more than one strong tie
to the cluster).
With this definition, list all the clusters amongst the six characters. Do any characters
appear in all of the clusters? If so, which ones? Give an explanation for your answer.
(3) In the previous question, we took a timeline with a set of time intervals (the seasons
when each character was part of a TV series) and built a graph from this data: there was a
node for each character’s time interval, and an edge between characters if their time intervals
overlapped. Now let’s consider going in the opposite direction: starting with a graph, can we
determine if it corresponds to the overlaps of time intervals?
Here’s how a question like this might come up. Suppose you’re working with a set of
historians, and they’re studying a set of documents from the sixteenth century that describe
a set of meetings of a council populated by members of the nobility that met over a few years.
Different members of the nobility belonged to the council for different intervals of time. The
documents don’t have dates on them (or at least, the historians haven’t been able to figure
out the scheme for recording dates), but the documents do contain information about who
attended meetings together, as follows:
A and B attended the meeting together
B and C were at the meeting of the noble council
C and D debated at the council meeting
Here’s what the historians would like to do. For each person X in the documents, they
plan to hypothesize a contiguous time interval during which X was a member of the council.
(That is, their hypothesis is that each person X had a start date when they joined the council,
and then they served continuously until their end date; no one ever left the council and then
joined it again later.) They’d like to create these intervals in such a way that for every pair of
council members who are mentioned in the documents as attending a meeting together, their
time intervals overlap, and for every pair of council members who are never mentioned as attending a
meeting together, their time intervals should not overlap.
So starting from the documents, they first create a graph: there is a node for each person
X, and for two people X and Y, there is an edge precisely when X and Y are mentioned as
having attended a meeting together. They then ask you whether you can create a set of time
intervals, one for each node in their graph, so that the time intervals for two nodes overlap
when they are connected by an edge, and the time intervals for two nodes do not overlap
when they are not connected by an edge. (Note that the historians are not concerned with
the distinction between strong ties and weak ties in their work, only with the presence or
absence of an edge.)
Figure 1: (a) a graph on nodes A, B, C, D; (b) time intervals for A, B, C, D drawn along a time axis.
Here’s an example of how this works. The historians start with a set of documents
and turn them into a graph as in Figure 1(a), showing that A and B attended meetings
together, B and C attended meetings together, and C and D attended meetings together.
But there is no evidence in the documents that A attended meetings with C or D, nor is
there evidence that B attended meetings with D, so there are no edges between these pairs
of nodes. Finally, they would like a set of time intervals as in Figure 1(b) so that A overlaps
B, B overlaps C, and C overlaps D, but A does not overlap C or D, and B does not overlap
D. That corresponds to a valid hypothesis for people’s membership on the council, given the
documents they have.
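One way to check a proposed hypothesis mechanically is sketched below in Python. The function names are illustrative, intervals are assumed to be closed ranges (start, end), and the numeric endpoints are hypothetical values chosen only to mimic Figure 1(b), since the figure gives no dates.

```python
from itertools import combinations

def overlap_edges(intervals):
    """Return the set of node pairs whose closed intervals (start, end) overlap."""
    edges = set()
    for (u, (su, eu)), (v, (sv, ev)) in combinations(sorted(intervals.items()), 2):
        if su <= ev and sv <= eu:
            edges.add(frozenset((u, v)))
    return edges

def realizes(intervals, graph_edges):
    """True when the intervals overlap on exactly the pairs listed in graph_edges."""
    return overlap_edges(intervals) == {frozenset(e) for e in graph_edges}

# Graph of Figure 1(a): A-B, B-C, C-D.
figure1_edges = [("A", "B"), ("B", "C"), ("C", "D")]
# Hypothetical endpoints, assumed for illustration.
figure1_intervals = {"A": (0, 2), "B": (1, 4), "C": (3, 6), "D": (5, 7)}
print(realizes(figure1_intervals, figure1_edges))  # True
```

The same check returns False as soon as one required overlap is missing or one forbidden overlap appears, so it can be used to test candidate intervals for any graph.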
Now let’s try this on some other datasets. Suppose that the historians have found new
texts from three other councils. Using the same procedure as above, they have constructed
the three graphs in Figure 2.
Figure 2: Graphs (a), (b), and (c) constructed by historians based on texts of council meetings from three
different councils.
For each of the graphs in Figure 2(a), Figure 2(b), and Figure 2(c), can you construct
a set of time intervals, one for each node, so that the time intervals overlap for each pair of
nodes connected by an edge, and the time intervals do not overlap for each pair of nodes not
connected by an edge?
For each graph, give one of two possible answers:
(i) A set of time intervals with this property; or
(ii) A brief explanation for why it is not possible to construct a set of time intervals with
this property—that is, why there cannot be a set of time intervals with the desired
pattern of overlaps corresponding to the edges in the graph. You do not need to provide
a complete proof for why it is not possible; a brief explanation will suffice.
(4) Recall from the first lecture the example of an ego network, which is a graph whose
node set is an ego (representing one person), along with all of the friends of the ego. There is
an edge between the ego and each of their friends, as well as edges between friends of the ego
that are also friends themselves.
Suppose that you are a social media company studying the ego network of person A and
their f friends, where there are m additional friendships amongst the friends of A. Thus, the
ego network has f + m edges in total.
The following fact from class will be useful in this problem. If a node u is in exactly d
edges, then the number of pairs of friends of u in the graph is “d choose 2”, which is denoted
by $\binom{d}{2}$ and equal to d(d − 1)/2.
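The counting fact can be sanity-checked with a short Python sketch. The neighbor names are hypothetical; math.comb requires Python 3.8 or later.

```python
from itertools import combinations
from math import comb

def pairs_of_neighbors(neighbors):
    """Enumerate every unordered pair drawn from a node's neighbors."""
    return list(combinations(neighbors, 2))

friends = ["w", "x", "y", "z"]  # hypothetical neighbors of u, so d = 4
d = len(friends)
pairs = pairs_of_neighbors(friends)
print(len(pairs), comb(d, 2), d * (d - 1) // 2)  # 6 6 6
```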
(a) What are the minimum and maximum number of possible friendships in the ego
network? Write your answers as expressions in terms of f and provide a brief explanation of
your answer.
(b) Recall from class and the textbook that the clustering coefficient of a node X is
the fraction of the pairs of friends of X that are friends themselves. What is the clustering
coefficient of the ego A in the ego network? Write your answer as an expression in terms of f
and m and provide a brief explanation.
(c) What are the minimum and maximum number of local bridges amongst the friends of
A (i.e., local bridges that do not contain A)? Provide a brief explanation.
(d) Suppose that s ≤ f of A’s friendships are strong ties and that A satisfies the Strong
Triadic Closure property. What is the minimum value of m, as an expression in terms of s? Provide a
brief explanation.
(e) Suppose just for this part of the question that all of A’s friends are also friends, which
makes the ego network a complete graph. Further suppose that A has a positive relationship
with all their friends. We can interpret this as a signed complete graph with positive signs
from A to each of the friends of A and an unknown sign for the edges between friends of A.
Our goal is to make the network structurally balanced by labeling the edges amongst
the friends of A as either positive or negative. One way to do this is to label all edges to
be positive, in which case there are no negative links and hence no unbalanced triangles. Is
there a way to label the signs with at least one negative edge so that the network satisfies
the Structural Balance Property?
If you answer “yes”, describe how you could divide the nodes into two sets X and Y,
as in lecture and the book, so everyone in X has a positive relationship with each other,
everyone in Y has a positive relationship with each other, and everyone in X has a negative
relationship with everyone in Y (negative sign). If you answer “no”, explain why.
(f) Consider the social network obtained by removing the ego and their connections
from the network. In other words, we consider only the friendships amongst the f friends of
A. What are the minimum and maximum number of connected components in this social
network? Provide a brief explanation of your answer.
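For experimenting with small examples, the clustering coefficient recalled in part (b) can be computed directly from its definition. The Python sketch below is illustrative only: the adjacency-dictionary representation, the function name, and the sample ego network are assumptions, and returning 0.0 for a node with fewer than two friends is a convention chosen here.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Fraction of pairs of `node`'s friends that are themselves friends.
    `adj` maps each node to the set of its neighbors (undirected graph)."""
    pairs = list(combinations(adj[node], 2))
    if not pairs:
        return 0.0  # assumed convention for nodes with fewer than two friends
    linked = sum(1 for u, v in pairs if v in adj[u])
    return linked / len(pairs)

# Hypothetical ego network: ego "A" with friends b, c, d and one friendship b-c.
adj = {
    "A": {"b", "c", "d"},
    "b": {"A", "c"},
    "c": {"A", "b"},
    "d": {"A"},
}
print(clustering_coefficient(adj, "A"))  # 1 of the 3 pairs of friends is linked
```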
Figure 3: An east-west highway with a town in the middle and farms on both sides of the town.
(5) Consider a rural county in which a set of farms is arranged on a 50-mile stretch of
east-west highway. We do not know the exact distances among all the farms, but we do know
that there is a town in the middle of this stretch of highway, as shown in Figure 3, with some
farms to the west of the town and other farms to the east of the town.
Suppose that there is a social network among the people who live on these farms with the
following structure.
• If two people live on farms on the same side of town (i.e., if they can travel between
their farms on the highway without passing through the town), then they are linked by
a strong tie.
• If two people live on farms on opposite sides of the town (i.e., if the trip between their
farms on the highway passes through the town), then they are linked by a strong tie
provided they are at most 5 miles apart and linked by a weak tie if they are more than
5 and at most 10 miles apart. If they are on opposite sides of the town and also more
than 10 miles apart, then they have no edge between them at all in the social network.
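The two rules above can be sketched as a Python function. This is an illustration under stated assumptions: each farm is described by a signed position in miles along the highway relative to the town (negative for west, positive for east), no farm sits exactly at the town, and the function name and sample positions are hypothetical.

```python
def farm_tie(pos_a, pos_b):
    """Classify the tie between two farms given signed positions in miles
    from the town (negative = west, positive = east; 0 is assumed unused)."""
    same_side = (pos_a < 0) == (pos_b < 0)
    if same_side:
        return "strong"
    distance = abs(pos_a - pos_b)  # the trip passes through the town
    if distance <= 5:
        return "strong"
    if distance <= 10:
        return "weak"
    return None  # opposite sides and more than 10 miles apart: no edge

# Hypothetical positions, for illustration only.
print(farm_tie(-3, -20))  # strong (same side)
print(farm_tie(-2, 2))    # strong (opposite sides, 4 miles apart)
print(farm_tie(-4, 4))    # weak (opposite sides, 8 miles apart)
print(farm_tie(-6, 6))    # None (opposite sides, 12 miles apart)
```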
In this social network, do all nodes satisfy the Strong Triadic Closure property? Your
answer can be “yes,” “no,” or “without knowing the exact distances among the farms, there
is not enough given information to tell.” Explain your answer, by either describing why all
nodes must satisfy the property, or by describing why there must be a node that violates the
property, or by arguing why there is not enough information to tell.
Figure 4: A portion of the communication network on nodes P, Q, R, S, T, U, V, W, X, and Y.
(6) Suppose you’re helping a large social media site refine its algorithms for identifying
spam accounts. In particular, they’re studying the network of communications, where there
is an edge between nodes A and B if one of A or B sent a message to the other.
They hope to identify spam accounts based on the following distinction. Real accounts
correspond to people who are communicating with their friends, whereas spam accounts tend
to communicate with people more randomly (since they’re fake, they don’t have a genuine set
of friends to communicate with, and they tend not to be so controlled in whom they initiate
communications with).
To refine the algorithms for identifying spam accounts, the site integrity group at the
social media platform holds meetings where they look at small portions of the network and
discuss which account they think the algorithm should rank as most likely to be a spammer.
They’re currently discussing a small portion of the network corresponding to the graph
shown in Figure 4. If one of the nodes in the picture corresponded to a spam account, while
all the rest corresponded to real users, which node do you think is most likely to be the
spam account? Give an explanation for your answer, including some discussion of how your
explanation relates to principles from class.
