
Insight on Technology Definitions

Spider Webs
A spider is a software program that travels the Web, locating and indexing websites for search engines. All the major search engines, such as Google and Yahoo!, use spiders to build and update their indexes. These programs constantly browse the Web, traveling from one hyperlink to another. For example, when a spider visits a website's home page, there may be 30 links on the page. The spider will follow each of the links, adding all the pages it finds to the search engine's index. Of course, the new pages that the spider finds may also have links, which the spider continues to follow. Some of these links may point to pages within the same website (internal links), while others may lead to different sites (external links). The external links cause the spider to jump to new sites, indexing even more pages. Because of the interwoven nature of website links, spiders often return to websites that have already been indexed. This allows search engines to keep track of how many external pages link to each page. Usually, the more incoming links a page has, the higher it will be ranked in search engine results. Spiders not only find new pages and keep track of links; they also track changes to each page, helping search engine indexes stay up to date. Spiders are also called robots and crawlers, terms that may be preferable for those who are not fond of arachnids. The word "spider" can also be used as a verb, as in, "That search engine finally spidered my website last week."
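The crawling loop described above can be sketched in a few lines. The toy crawler below (plain Python, standard library only; the seed URL and the 50-page limit are placeholder choices, not anything a real search engine uses) follows every hyperlink it finds, adds each new page to a simple "index", and counts incoming links per page, which is the ranking signal the passage mentions.

```python
# A minimal illustration (not the code any real search engine uses) of how a
# spider follows hyperlinks and counts incoming links per page.
from collections import deque, Counter
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=50):
    queue = deque([seed])          # pages waiting to be visited
    indexed = set()                # pages already added to the "index"
    incoming = Counter()           # how many links point at each page

    while queue and len(indexed) < max_pages:
        url = queue.popleft()
        if url in indexed:
            continue
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue               # unreachable pages are simply skipped
        indexed.add(url)

        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)   # resolves internal/relative links
            incoming[target] += 1         # more incoming links -> higher rank
            queue.append(target)          # the spider keeps following links

    return indexed, incoming

# indexed, incoming = crawl("https://example.com")  # placeholder seed URL
```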

Bow Ties

Scale-Free Networks
A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. That is, the fraction P(k) of nodes in the network having k connections to other nodes goes for large values of k as P(k) \sim k^{-\gamma}, where \gamma is a parameter whose value is typically in the range 2 < \gamma < 3, although occasionally it may lie outside these bounds.[1][2] Many networks are conjectured to be scale-free, including World Wide Web links, biological networks, and social networks, although the scientific community is still discussing these claims as more sophisticated data analysis techniques become available.[3] Preferential attachment and the fitness model have been proposed as mechanisms to explain conjectured power-law degree distributions in real networks.

In studies of the networks of citations between scientific papers, Derek de Solla Price showed in 1965 that the number of links to papers, i.e., the number of citations they receive, had a heavy-tailed distribution following a Pareto distribution or power law, and thus that the citation network is scale-free. He did not, however, use the term "scale-free network", which was not coined until some decades later. In a later paper in 1976, Price also proposed a mechanism to explain the occurrence of power laws in citation networks, which he called "cumulative advantage" but which is today more commonly known under the name preferential attachment.

Recent interest in scale-free networks started in 1999 with work by Albert-László Barabási and colleagues at the University of Notre Dame, who mapped the topology of a portion of the World Wide Web,[4] finding that some nodes, which they called "hubs", had many more connections than others and that the network as a whole had a power-law distribution of the number of links connecting to a node. After finding that a few other networks, including some social and biological networks, also had heavy-tailed degree distributions, Barabási and collaborators coined the term "scale-free network" to describe the class of networks that exhibit a power-law degree distribution. Amaral et al. showed that most real-world networks can be classified into two large categories according to the decay of the degree distribution P(k) for large k. Barabási and Albert proposed a generative mechanism to explain the appearance of power-law distributions, which they called "preferential attachment" and which is essentially the same as that proposed by Price. Analytic solutions for this mechanism (also similar to the solution of Price) were presented in 2000 by Dorogovtsev, Mendes and Samukhin[5] and independently by Krapivsky, Redner, and Leyvraz, and later rigorously proved by the mathematician Béla Bollobás.[6] Notably, however, this mechanism only produces a specific subset of networks in the scale-free class, and many alternative mechanisms have been discovered since.[7]
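The preferential-attachment mechanism is easy to simulate. The sketch below (plain Python, standard library only; the node count n, the links-per-node parameter m and the random seed are arbitrary choices, not values from the text) grows a network in which each new node links to existing nodes with probability proportional to their current degree, then prints a rough view of the resulting degree distribution, which comes out heavy-tailed: many poorly connected nodes and a few hubs.

```python
# A minimal simulation of preferential attachment ("cumulative advantage"):
# every new node attaches to m existing nodes chosen with probability
# proportional to their current degree, so well-connected nodes keep
# gaining links and become hubs. Parameters are illustrative only.
import random
from collections import Counter

def preferential_attachment(n=10_000, m=2, seed=42):
    random.seed(seed)
    degree = Counter()
    targets = []                      # one entry per link endpoint
    # start from a small connected core (a ring of m + 1 nodes)
    core = list(range(m + 1))
    for a, b in zip(core, core[1:] + core[:1]):
        degree[a] += 1
        degree[b] += 1
        targets += [a, b]

    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            # drawing uniformly from "targets" selects a node with
            # probability proportional to its degree
            chosen.add(random.choice(targets))
        for old in chosen:
            degree[new] += 1
            degree[old] += 1
            targets += [new, old]
    return degree

degrees = preferential_attachment()
histogram = Counter(degrees.values())
# Heavy tail: most nodes have degree close to m, a handful are large hubs.
print("largest hub degree:", max(degrees.values()))
for k in sorted(histogram)[:6]:
    print(f"P(k={k}) ~ {histogram[k] / len(degrees):.4f}")
```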

The Deep Web
The Deep Web (also called the Deepnet, the Invisible Web, the Undernet or the hidden Web) is World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines. It should not be confused with the dark Internet, the computers that can no longer be reached via the Internet, or with a Darknet distributed file-sharing network, which could be classified as a smaller part of the Deep Web. The deep Web is the part of the Internet that is inaccessible to conventional search engines and, consequently, to most users. According to researcher Marcus P. Zillman of DeepWebResearch.info, as of January 2006 the deep Web contained somewhere in the vicinity of 900 billion pages of information. In contrast, Google, the largest search engine, had indexed just 25 billion pages. Deep Web content includes information in private databases that are accessible over the Internet but not intended to be crawled by search engines. For example, some universities, government agencies and other organizations maintain databases of information that were not created for general public access. Other sites may restrict database access to members or subscribers. The term "deep Web" was coined by BrightPlanet, an Internet search technology company that specializes in searching deep Web content. In its 2001 white paper, "The Deep Web: Surfacing Hidden Value," BrightPlanet noted that the deep Web was growing much more quickly than the surface Web and that the quality of its content was significantly higher than that of the vast majority of surface Web content. Although some of the content is not open to the general public, BrightPlanet estimates that 95% of the deep Web can be accessed through specialized search.
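As a hypothetical illustration of why such database content stays invisible to spiders, the small sketch below serves records only in response to a submitted query form: the home page contains no hyperlinks to the records, so a crawler that only follows links never discovers them. Flask, the route names and the tiny in-memory "database" are illustrative choices, not anything described in the text.

```python
# Hypothetical illustration: a database-backed site whose records are only
# reachable through a POST query form, never through hyperlinks, which is
# the kind of content a link-following spider cannot index.
from flask import Flask, request

app = Flask(__name__)

# Stand-in for a private database behind the site (dummy data).
RECORDS = {
    "A100": "First record",
    "B200": "Second record",
}

@app.route("/")
def home():
    # The only page a spider can reach; it links to nothing.
    return (
        '<form action="/search" method="post">'
        '<input name="code"><button>Search</button></form>'
    )

@app.route("/search", methods=["POST"])
def search():
    # Produced only when a user submits the form, so it sits in the deep
    # Web: no hyperlink anywhere points to this content.
    code = request.form.get("code", "").upper()
    return RECORDS.get(code, "No matching record")

if __name__ == "__main__":
    app.run()
```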

Name: Sami Elkurdi Student #: 20122608

Course: E-Commerce Code: MIS537

Assignment
Questions for Discussion:

What is the small world theory of the Web?
It is about how people around the globe are connected via the Web. In 1968, sociologist Stanley Milgram proposed the small-world theory for social networks, observing that every human was separated from any other human by only about six degrees of separation. On the Web, the small world theory was supported by early research on a small sampling of Web sites. In 2008, Microsoft researchers announced that the theory was nearly right: by studying billions of electronic messages, they worked out that any two strangers are, on average, separated by precisely 6.6 degrees of separation. In other words, putting fractions to one side, you are linked by a string of seven or fewer acquaintances to Madonna, the Dalai Lama and the Queen. In the small world theory of the Web, every Web page is thought to be separated from any other Web page by an average of about 19 clicks.

What is the significance of the Bow-Tie for the Web?
The Web is shaped uncannily like a big bow tie, according to new research by scientists at IBM, Compaq and AltaVista. It is hoped that the research, which defines how traffic travels around the Web, will lead to improved methods of searching the Net, as well as better e-commerce strategies. The researchers discovered that the Web was not like a spider web at all, but rather like a bow tie. The bow-tie Web had a strongly connected component (SCC) composed of about 56 million Web pages. On the left side of the bow tie was a set of about 44 million IN pages from which you could reach the center, but which could not be reached from the center. On the right side was a set of 44 million OUT pages that you could get to from the center, but could not return to the center from. Finally, there were 16 million pages totally disconnected from everything.

Why does Barabási call the Web a scale-free network with very connected super nodes?
Barabási's team found that, far from being a random, exponentially exploding network of 8 billion Web pages, activity on the Web was actually highly concentrated in very connected "super nodes" that provided the connectivity to less well-connected nodes. Barabási dubbed this type of network a scale-free network and found parallels in the growth of cancers, disease transmission, and computer viruses. As it turns out, scale-free networks are highly vulnerable to destruction: destroy their super nodes and the transmission of messages breaks down rapidly.
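A back-of-the-envelope way to see this fragility (a sketch under assumed, arbitrary parameters, not Barabási's actual analysis): grow a small preferential-attachment network, then compare the size of its largest connected piece after deleting a handful of random nodes versus deleting the same number of the best-connected super nodes.

```python
# Rough sketch of the "destroy the super nodes" experiment on a small
# preferential-attachment network (standard library only; all sizes are
# arbitrary). Removing the best-connected hubs shrinks the largest
# connected component noticeably more than removing random nodes.
import random
from collections import defaultdict, deque

def build_graph(n=2_000, m=2, seed=1):
    """Grow a preferential-attachment graph as an adjacency dict."""
    random.seed(seed)
    adj = defaultdict(set)
    adj[0].add(1)
    adj[1].add(0)
    targets = [0, 1]                  # degree-weighted pool of endpoints
    for new in range(2, n):
        for old in set(random.choices(targets, k=m)):
            adj[new].add(old)
            adj[old].add(new)
            targets += [new, old]
    return adj

def largest_component(adj, removed):
    """Size of the biggest connected component once `removed` nodes are gone."""
    seen, best = set(removed), 0
    for start in list(adj):
        if start in seen:
            continue
        queue, size = deque([start]), 0
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best

adj = build_graph()
hubs = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:40]
random_nodes = random.sample(list(adj), 40)
print("intact network:              ", largest_component(adj, []))
print("minus 40 random nodes:       ", largest_component(adj, random_nodes))
print("minus 40 super nodes (hubs): ", largest_component(adj, hubs))
```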
