We want to discuss the logic. Through which networks form. And by that, I mean we want to think of these nodes as being agents in a way. These nodes make decisions about who and which other nodes to connect to. And, as they make those decisions, that’s gonna result in a structure to the network.
我们要探讨的是网络形成的逻辑。也就是说,我们要把这些节点视作某种代理。这些节点会决定与其他哪些节点建立连接。随着它们做出这些决定,网络的结构也就形成了。
So we’re also gonna see how that process works, and how we go really from the micro-level actions of the nodes, which are making these connections, to macro-level properties of the network. So the network structure’s gonna be something that emerges through these micro-level interactions. We’re gonna focus on three very simple ways from in networks might form.
所以,我们还要看看这个过程是如何运作的,以及我们如何从节点做出这些连接的微观层面行为,过渡到网络的宏观层面特性。所以,网络结构是通过这些微观层面的互动而形成的。
The first one’s gonna be random connections, where each node randomly decides to connect to other nodes. The second one is gonna be a small world model. And the is, is gonna work as follows. Each person is gonna have some friends that are, sort of, a part of a clique. They’re sort of nearby. And then some random friends that they randomly connect to. So we’re gonna start out by having people connected to just people near them. And then assume that they sort of rewire, in a way, and randomly connect to some people who are further away in social space? Cause this is what a lot of social networks look like.
And the last thing we’re gonna do is we’re gonna look at something called a preferential attachment network. And this has been used to describe the internet. And the World Wide Web. And the idea here is the following. It’s that you’re more likely to connect to more connected nodes. So, that’s true certainly in the World Wide Web. [inaudible] like it links into Google. Or links into major universities because they’re more connected than we are to link to less connected things. And we’ll see how that micro-level rule results in properties in the network globally, structure in terms of measures that might not have been anticipated.
Okay, so let’s get started.
Random networks, here’s what you do. You assume there are N nodes, and p is the probability that you can. Back to another node. So what you do is you draw a bunch of nodes. Like this. And then with just some probability. You draw connections between the nodes randomly, and you get some sort of random graph. Now this has been studied. They’re sometimes called Aridish Rainy networks. These have been studied in great detail, and here’s a really interesting result. It turns out you get a contextual tipping point. For a large [inaudible]. That means you have a lot, a large number of nodes. The network almost always becomes connected if P, the probability of being connected, is bigger than 1/N-1. So, what you can get is here’s our graph. Here’s the probability of being connected, which is one of the measures we care about. And here’s 1/N-1. And what you get is you get a graph that looks like this. You get a tip. So you’re less than one of N minus one for large N, odds are you’re not connected. And once you get bigger than one of N minus one, odds are you are fully connected. So it’s cool. A tipping point within networks. Now, these random graphs, you can prove a lot of nice results about random graphs, but the thing is, that’s not what real social networks look like. Remember, here’s that middle school graph from James Moody. That doesn’t look random. Here’s the Adult Network graph from the Friendly Ham study, which doesn’t look random either, and here’s an email network. That doesn’t look random either. So, what we’d like to do is have a network that looks more like what real networks look like.
And this leads to the small-world graph idea, this is due to Duncan Watts, who is a network theorist, and Steve Strogatz. So here’s the idea behind the Small World network. You’ve got a pre-, percentage of friends who are local. Or near you, like sort of click friends. And then you have other friends who are more random. So your local or click friends would be the people you work with, the people you live near, that sort of thing. And your random friends are the people you went to summer camp with, or your old college roommate, things like that. So if you look at real-world social networks, they’ve got these sorts of clicks of people that interact, and then each person has sort of a random friend somewhere else, and those random friends also belong to the clicks a little bit.
So let’s look at that network and see how it works. So we’re gonna open a net logo model here. And what we can do is we can set this thing up, and we’re just gonna put people in a circle. Now, the thing to notice is that in the circle, each person is connected to people on either side, and there are 40 nodes, and what I can do is I can start rewiring people. One at a time. So that they’ve got more and more. Randomly rewired nodes, and then what I can do is I can ask, what’s happening to the graph? Now, notice here on the graph, you see the clustering coefficient, and over here, you see average path length. And this graph you see up here on the upper left, the green line, so it’s the clustering coefficient. And the red line shows average path length. So as I rewire, what you can see is that the average path length is falling, but the clustering coefficient is. He’s also falling. So let’s do this one more time. I’m going to sit here and I’m going to rewire. Notice the clustering coefficient starts at half, and the average path length is 5.4. Now, as I start rewiring, the clustering coefficient falls, and that’s because before, everybody was connected just to their neighbors around them so there were lots of clusters. So the clustering coefficient falls. And the average path length. It’s falling fast, it’s not under 2.8, 2.7, and so on and so again we see this every time we run this, we see a decrease in the [inaudible] coefficient and a decrease in the average path length as we construct this small world network. So what we get is is people have more random friends, we get less clustering, and we also get an average, a shorter average distance between people. So that’s the Small Rope model. We’ll come back to that a little bit in the next lecture. Cuz the Small Rope model, small rope network actually tells us a lot about social network systems, and it can help us explain the six degrees of separation property that I talked about in the first lecture.
Now, let’s go to our third way that nodes can connect unannounced, and this is called preferential attachment. In preferential attachment, nodes arrive on the scene, and then they get to decide, who do I connect to? So think of yourself moving into a new city, or think of you creating a webpage. So your webpage is your node. Now, you’ve got to decide what other webpages I connect to? We’re gonna assume that the probability of connecting to an existing node is proportional to the number of connections it already has. So, here’s an example. Here’s a network, and some new node comes in. And it’s trying to think about what are the, what’s the likelihood I should connect each one of these nodes? Well, this node only has one network connection, one edge. This one has four. And so, what’s gonna happen is, when this node comes in, it’s gonna be four times as likely to connect to here as it is to over here. So the probability of connecting to someone is proportional to the number of nodes it already has. Let’s look at a Net Logo model of preferential attachment. Now, I’m gonna set this up, and what’s gonna happen is, I’m gonna let this go once. I’m just gonna add one node at a time for a second. And what you’re seeing here on these different graphs. The top graph shows the degree distribution. So there’s one node; these are the nodes that have one edge. Connected to it, and this is in the nodes that have one of them has a degree of six, and this lot below is going to plot that redistribution on a log-log plot. So let’s let this thing run. And this is always sort of a fun video because you can see these nodes being added and drawing new connections. And I want you to notice two things. Let me stop it for a second. I want you to notice two things. First is some nodes are way more connected to others, and that’s captured in this degree distribution. There are a lot of nodes that only have one, and then there’s a handful of nodes, including one that has 23. Now I can resize the nodes by the number of connections, and that gives you another way to see that these degree distributions are always skewed. So there are some nodes with very few. Very low degree, and there are some with a lot. So if I let this run even further, what you see is that degree di-, distribution stays skewed, like there’s some nodes that have lots of awful connections and then others that have very, very few. So this preferential attachment rule results in this very skewed degree distribution, which is interesting. Now we saw that happen once, but just to prove it happens every time, let’s run it again, and I’ll put the re-size node button on, and again, you see the same sort of degree distribution showing up where. Sub-nodes are highly connected, others are less connected, but we’re getting a very different network. So one of the interesting things is the distribution properties [inaudible] extend the same, even though the exact network you’re gonna get is going to be a little bit different depending on how the process plays out. Well, guess what? That means it’s dependent. So the exact network you’re gonna get is gonna be path dependent. The equilibrium distribution of degrees is not gonna be path dependent. You’re almost always gonna get the same degree distribution. But the network you get is gonna depend a lot on the particular history. So we’ve got sort of path-dependent outcomes, but we don’t have a path-dependent equilibrium in terms of the degree distribution. That we know. Just from this process. At least we figured it out. It’s, you know, a physicist network there is to figure it out, that you’re always going to get this degree distribution. And this is the way to think of it. What you’ll get inaudible] long tail of degree distribution. So what that means is, there’s a whole bunch of nodes that have a degree of one. And then there’s a handful of nodes that have a large degree. And so this is the probability that you have one degree, and here’s the probability of a large degree. And this is what that preferential [inaudible] gives you. This sort of long tail. This is something that’s called a long-tailed distribution. And distribution’s always gonna occur when you have a preferential attachment rule. Now, which nodes end up being large degree and which ones don’t, that ends up being somewhat path dependent. But. The distribution itself is not what you get these long tails. And what that means is there’s a big skew, some sites have lots of, look at the internet, the World Wide Web. Some sites have lots of links, and other sites have very, very few links. Alright. So, that’s some sense of the logic of networks. If you can stuff random connections, then you get this interesting tipping point phenomenon. If you have a small world connect network, then as people have more random friends, you get a lot less clustering, but you get shorter social distances between people. And finally, from the preferential attachment rule, what we end up getting is we get this long-tailed degree distribution where most nodes have very, very low degrees, but a handful of nodes have a really, really high degree.
我们将重点关注网络形成过程中的三种非常简单的方式。第一种是随机连接,即每个节点随机决定与其他节点建立连接。第二种是小世界模型。其运作方式如下:每个人都会有一些属于某个小圈子的朋友,他们彼此距离较近,还有一些随机结识的朋友。所以,我们先从人们只与身边的人建立联系开始。然后假设它们在某种程度上重新连接,随机地与在社交空间中距离更远的一些人建立联系。因为很多社交网络就是这样的。接下来我们要研究的是所谓的优先连接网络。这种网络被用来描述互联网和万维网。其理念是这样的:你更有可能与那些连接更多的节点建立连接。这在万维网上确实如此。比如,它会链接到谷歌,或者链接到一些知名大学,因为它们比我们连接得更多,所以更有可能与它们建立连接,而不是与那些连接较少的节点建立连接。我们将会看到这种微观层面的规则如何在全球范围内产生网络的特性,产生一些可能未曾预料到的结构特征。
好的,那我们开始吧。
随机网络,做法如下。假设存在 N 个节点,p 是任意两个节点之间建立连接的概率。所以,你先画出一堆节点,就像这样。然后以一定的概率在节点之间随机建立连接,这样就得到了一个随机图。这种网络已经被研究过了。它们有时被称为干旱雨季网络。这些网络已被深入研究,这里有一个非常有趣的结果。事实证明,会出现一个情境临界点。对于一个大型网络来说,这意味着节点数量众多。如果 P(连接的概率)大于 1/N – 1,网络几乎总是会连通。所以你可以得到这样的图。这是连接的概率,这是我们关心的一个指标。这是 1/N – 1。你得到的图会是这样的。会出现一个临界点。所以对于大型网络来说,如果 P 小于 1/N – 1,那么几乎可以肯定网络是不连通的。而一旦 P 大于 1/N – 1,几乎可以肯定网络是完全连通的。这很酷。网络中的临界点。不过这些随机图,你可以证明很多关于随机图的漂亮结果,但问题是,真实的社交网络并非如此。还记得詹姆斯·穆迪的中学图吗?那看起来并不随机。还有来自友好汉姆研究的成人网络图,那看起来也不太随机,还有这个电子邮件网络,看起来也不随机。所以,我们想做的是构建一个更像真实网络的网络。
这就引出了小世界图的概念,这是由网络理论家邓肯·沃茨和史蒂夫·斯特罗加茨提出的。这就是小世界网络背后的理念。你有一部分朋友是本地的,或者就在你附近,比如点击好友。然后你还有其他比较随机的朋友。所以,你的本地或点击好友可能是你一起工作的人、住在你附近的邻居之类的人。而你的随机朋友可能是你夏令营时的伙伴,或者大学时的老室友等等。所以,如果你观察现实世界中的社交网络,你会发现其中有一些群体的人相互交流,然后每个人在其他地方也有一些随机的朋友,而这些随机朋友也多少属于某个群体。那么,让我们来看看这个网络是如何运作的。现在我们要打开一个 NetLogo 模型。我们可以设置一下,把人放在一个圈里。现在要注意的是,在这个圈子里,每个人与两边的人相连,总共有 40 个节点,我可以开始一个接一个地重新连接这些节点,让它们随机连接的节点越来越多。
然后我可以问,这个图会发生什么变化。现在请注意这个图上的聚类系数,以及随机重新连接节点,然后我能做的就是问,这个图发生了什么变化。现在请注意这个图,您可以看到聚类系数,还有这里,您可以看到平均路径长度。您看到的这个图,上面左上角的绿色线,就是聚类系数。红色线表示平均路径长度。所以当我重新连接时,您可以看到平均路径长度在下降,但聚类系数也在下降。让我们再做一次。我会坐在这里重新连接。请注意聚类系数一开始是 0.5,平均路径长度是 5.4。现在当我开始重新连接时,聚类系数下降了,这是因为之前每个人都只与周围的邻居相连,所以有很多簇。所以聚类系数下降了。而平均路径长度也在快速下降,不到 2.8,2.7 等等。所以每次运行这个程序时,我们都会看到聚类系数和平均路径长度在构建这个小世界网络时都下降了。所以,我们得到的结果是,人们拥有的随机朋友更多,聚类程度更低,而且人与人之间的平均距离也更短。这就是“小世界模型”。
我们会在下一讲再稍微讲讲这个模型。因为“小世界模型”或者说“小世界网络”实际上能让我们对社交网络系统有更多了解,它也能帮助我们解释我在第一讲中提到的“六度分隔”现象。
现在,我们来看节点之间无序连接的第三种方式,这被称为“优先连接”。在优先连接中,节点出现后,它们会决定要连接谁。想象一下你搬到一个新城市,或者想象你创建了一个网页。那么,你的网页就是你的节点。现在,你得决定要连接哪些其他网页?我们假设连接到一个已存在节点的概率与该节点已有的连接数成正比。这里有个例子。这是一个网络,然后有一个新节点加入进来。它正在思考,连接到这些节点中的每一个的可能性有多大?那么这个节点只有一个网络连接,一条边。这个节点有四条边。所以,当这个节点加入时,它连接到这里的可能性是连接到那里的四倍。因此,与某个节点建立连接的概率与它已有的节点数量成正比。让我们来看一个 Net Logo 模型中的优先连接。现在,我来设置一下,接下来会发生的是,我将让它运行一次。我每次只添加一个节点,运行一会儿。你们现在看到的这些不同图表。上面的图表显示的是度分布。所以有一个节点,这些是只有一条边连接到它的节点,而这里这些节点中有一个节点的度为六,下面这些图会以对数对数图的形式绘制这种重新分配。让我们让这个程序运行起来。这总是很有趣的视频,因为你可以看到这些节点被添加并建立新的连接。我想让你们注意两件事。
让我暂停一下。我想让你们注意两件事。首先,有一些节点与其他节点的连接要多得多,这在度分布中有所体现。有很多节点只有一个连接,然后有少数几个节点,其中一个有 23 个连接。现在我可以根据连接数量调整节点大小,这又给你提供了一种方式来观察这种度分布总是偏斜的。所以有些节点的度非常低,而有些节点的度则很高。如果我让这个程序继续运行,你会看到度分布仍然偏斜,就像有些节点有很多连接,而有些节点则几乎没有连接。所以这种优先连接规则导致了这种非常偏斜的度分布,这很有趣。
我们已经看到这种情况发生过一次,但为了证明每次都会发生,让我们再运行一次,这次我会打开调整节点大小的按钮,你又会看到同样的度分布出现,其中一些子节点高度连接,而其他节点连接较少,但我们得到的是一个完全不同的网络。有趣的是,分布特性即使每次网络的具体情况会有所不同,但分布特性是一致的。那么,猜猜怎么着?这意味着它是路径依赖的。所以你最终得到的具体网络会受到路径的影响。但度数的均衡分布实际上不会受到路径的影响。你几乎总是会得到完全相同的度数分布。不过你得到的网络会很大程度上取决于具体的历史情况。所以我们得到了某种程度上取决于路径的结果,但在度分布方面,我们并没有得到路径依赖的均衡。所以我们得到了某种程度上取决于路径的结果,但在度分布方面,我们并没有得到路径依赖的均衡。