INTRODUCTION
1.1 Overview
BitTorrent[5] is a peer-to-peer file sharing protocol used to distribute large amounts
of data, and it is one of the most common protocols for transferring large files. It makes
such transfers easier by taking a different approach from conventional file transfer methods:
a user who obtains a file also shares it with others. A user can obtain multiple files
simultaneously without any considerable loss of transfer rate. Because every user shares the
file he is obtaining, other users who want the same file find it easier to get, and they in
turn are drawn into the sharing process. Thus, the larger the number of users requesting a
file, the more sources there are and the more easily the file can be transferred between them.
The BitTorrent Protocol[5] is built on a technology that makes it
possible to distribute large amounts of data without a high-capacity
server or expensive bandwidth. This is the most striking feature of this
file transfer protocol. A transfer never depends on a single source
holding the original copy of the file; instead, the load is distributed
across a number of sources. Not only the sources but also the clients who
want to obtain the file take part in the transfer. This spreads the load
across the users and partially frees the main source, which reduces the
network traffic imposed on it. Because of this, BitTorrent has become
one of the most popular file transfer mechanisms in today’s world. Though
the mechanism is not as simple as an ordinary file transfer protocol,
it has gained popularity because of the sharing policy that it imposes
on its users. Its scale is reflected in surveys made by various
organizations, which attribute 35% of overall internet traffic to
BitTorrent. This shows that the volume of files transferred and shared
through BitTorrent is very large.
1.2 History
BitTorrent was created by a programmer named Bram Cohen. After inventing this
new technology he said, "I decided I finally wanted to work on a project that people would
actually use, would actually work and would actually be fun". Before BitTorrent there were
other techniques for file sharing, but they did not utilize bandwidth effectively, and
bandwidth became the bottleneck. Other peer-to-peer file sharing systems such as Napster and
Kazaa did involve users in the sharing process, but they required only a subset of users to
share files. Most users could simply download without having to upload, which put a heavy
network load on the original sources and on a small number of users while the bandwidth of
the remaining users went unused. The main intention behind Cohen’s invention was to make
maximum use of the bandwidth of every user involved in sharing a file: every person who
wants to download a file also has to contribute to uploading it. This concept gave birth to
a new peer-to-peer file sharing protocol called BitTorrent.
Cohen invented this protocol in April 2001. The first usable version of BitTorrent appeared in
October 2002, but the system needed a lot of fine-tuning. BitTorrent really started to take off
in early 2003, when it was used to distribute a new version of Linux and fans of Japanese
anime started relying on it to share cartoons. The most important property of the protocol is
that it makes it possible for people with limited bandwidth to supply very popular files. This
means that if you are a small software developer you can put up a package, and if it turns out
that millions of people want it, they can get it from each other in an automated way. Thus the
bandwidth which used to be a bottleneck in previous systems no longer poses a problem.
CHAPTER-2
BitTorrent AND OTHER APPROACHES
discussed above of popular downloads is somewhat mitigated, because there's a greater
chance that a popular file will be offered by a number of peers. The breadth of files available
tends to be fairly good, though download speeds for obscure files tend to be low. Another
common problem sometimes associated with these systems is the significant protocol
overhead for passing search queries amongst the peers, and the number of peers that one can
reach is often limited as a result. Partially downloaded files are usually not available to other
peers, although some newer clients may offer this functionality. Availability is generally
dependent on the goodwill of the users, to the extent that some of these networks have tried to
enforce rules or restrictions regarding send/receive ratios.
Use of the Usenet binary newsgroups is yet another method of file distribution, one
that is substantially different from the other methods. Files transferred over Usenet are often
subject to minuscule windows of opportunity. Typical retention times of binary news servers
are often as low as 24 hours, and having a posted file available for a week is considered a
long time. However, the Usenet model is relatively efficient, in that the messages are passed
around a large web of peers from one news server to another, and finally fanned out to the
end user from there. Often the end user connects to a server provided by his or her ISP,
resulting in further bandwidth savings. Usenet is also one of the more anonymous forms of
file sharing, and it too is often used for illicit files of almost any nature. Due to the nature
of NNTP, a file's popularity has little to do with its availability and hence downloads from
Usenet tend to be quite fast regardless of content. The downside of this method is that it
comes with a set of rules and procedures and requires a certain amount of effort and
understanding from the user. Patience is often required to get a complete file because big
files are split
into a huge number of smaller posts. Finally, access to Usenet often must be purchased due to
the extremely high volume of messages in the binary groups.
BitTorrent is closest to Usenet. It is best suited to newer files in which a number of
people are interested. Obscure or older files tend not to be available. Perhaps as the
software matures a more suitable means of keeping torrents seeded will emerge, but currently
the client is quite resource-intensive, making it cumbersome to share a number of files.
BitTorrent also deals well with files that are in high demand, especially compared to the other
methods.
The most common type of file transfer is through an HTTP server[1]. In this method, an
HTTP server listens for the clients’ requests and serves them. The client depends entirely
on the lone server that is providing the file, so the download is bounded by the limitations
of that server. This kind of transfer is also subject to a single point of failure: if the
server crashes, the whole download process will cease. A single server can handle many
clients and serve the requested file to all of them simultaneously. The file is served as
one single piece, which means that if the download stops abruptly in the middle, the whole
file has to be downloaded again.
The BitTorrent Protocol[5] overcomes these shortcomings and is therefore more
robust, which is why many people choose it over this traditional method of
file transfer.
identifies available mirror sites that host the requested file. As soon as it is
triggered, DAP's client side optimization begins to determine - in real time - which mirror
sites offer the fastest response for the specific user's location. The file is downloaded in
several segments simultaneously through multiple connections from the most responsive
server(s) and reassembled at the user's PC. This results in better utilization of the user's
available bandwidth. This ensures that each available mirror server is utilized to serve the
users that most benefit. This in turn effects an efficient balancing of the load among available
servers across the entire World Wide Web, and reduces download times for users while
allowing them to receive maximum benefit from their available bandwidth. DAP's[2] Resume
functionality and the ability to continue downloading even when one of the participating
connections has dropped also provides users with a more reliable download experience.
Figure 2.4: BitTorrent File Transfer
Each client independently obtains a file, called a torrent, that contains the location of
the tracker along with a hash of each piece. Clients keep each other updated on the status of
their download. Clients download blocks from other (randomly chosen) clients who claim
they have the corresponding data. Accordingly, clients also send data that they have
previously downloaded to other clients. Once a client receives all the blocks for a given
piece, he can verify the hash of that piece against the provided hash in the torrent. Thus once
a client has downloaded and verified all pieces, he can be confident that he has the complete
data.
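The verification step described above can be sketched as follows. This is a minimal illustration, assuming SHA-1 piece hashes; the function names, the 64-byte piece length and the sample payload are hypothetical and chosen only to keep the example small.

```python
import hashlib


def split_pieces(data: bytes, piece_length: int) -> list[bytes]:
    """Cut the payload into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """SHA-1 digest of every piece, as a publisher would list them in the torrent."""
    return [hashlib.sha1(piece).digest() for piece in split_pieces(data, piece_length)]


def verify_piece(index: int, piece: bytes, expected: list[bytes]) -> bool:
    """True when a downloaded piece matches the hash listed for that index."""
    return hashlib.sha1(piece).digest() == expected[index]


# Hypothetical payload of 160 bytes, cut into 64-byte pieces (64 + 64 + 32).
payload = b"example payload " * 10
hashes = piece_hashes(payload, 64)
pieces = split_pieces(payload, 64)

# A client accepts the download only when every piece verifies.
complete = all(verify_piece(i, p, hashes) for i, p in enumerate(pieces))
```

A corrupted or forged piece produces a different digest, so it is rejected and can be requested again from another client.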
Both BitTorrent and DAP[2] download files from multiple sources, and both approaches
divide files into pieces. But BitTorrent has features that DAP lacks, which have made it the
more popular of the two. In BitTorrent the users participate actively in sharing files along
with the servers; this is what makes the protocol unique. It also requires a dedicated
server, called a tracker, to handle the peers connected in the network. File transfer in DAP
takes place over traditional HTTP or FTP, which means that the transfer rate will always be
limited by the servers’ bandwidth. If these servers are flooded with requests they may break
down, and the transfer will terminate. This is not the case in BitTorrent, since the process
does not depend on servers alone: the load is distributed across the network between peers
and servers. This makes BitTorrent far better than competitors such as DAP.
CHAPTER-3
WORKING OF BitTorrent
Figure 3.1: A Typical BitTorrent System
The protocol[3] shares data through what are known as torrents. For a torrent to be
alive or active it must have several key components to function. These components include a
tracker server, a .torrent file, a web server where the .torrent file is stored and a complete
copy of the file being exchanged. Each of these components is described in the following
paragraphs.
The file being exchanged is the essence of the torrent and a complete copy is referred
to as a seed. A seed is a peer in the BitTorrent network willing to share a file with other peers
in the network. Why seed owners choose to share their files is debatable, as the BitTorrent
Protocol[5] does not reward seed behavior. In fact, some researchers believe the protocol
lacks any incentive mechanism for encouraging seeds to remain in torrents. Some argue that
the lack of incentive in the protocol is a fundamental design flaw that leads to the punishment
of seeds.
Peers lacking the file and seeking it from seeds are called leechers. While seeds only
upload to leechers, leechers may both download from seeds and upload to other leechers.
BitTorrent’s protocol is designed so leeching peers seek each other out for data transfer in a
process known as “optimistic unchoking”. Together seeds and leechers engaged in file
transfer are referred to as a swarm. A swarm is coordinated by a tracker server serving the
particular torrent and interested peers find the tracker[6] via metadata known as a .torrent file.
Since BitTorrent has no built in search functionality, .torrent files are usually located via
HTTP through search engines or trackers.
The first step in the BitTorrent exchange occurs when a peer downloads a .torrent file
from a server. The role of .torrent files is to provide the metadata that allows the protocol to
function; .torrent files can be viewed as surrogates for the files being shared. These .torrent
files contain key pieces of data to function correctly including file length, assigned name,
hashing information about the file and the URL of the tracker[3] coordinating the torrent
activity. Torrent files can be created using a program such as MakeTorrent, another open
source tool available under the free software model.
When a .torrent file is opened by the peer’s client software, the peer connects to
the tracker server responsible for coordinating activity for that specific torrent. The tracker
and client communicate by a protocol layered on top of HTTP[3], and the tracker’s key role is
to coordinate peers seeking the same file. As Cohen envisioned, “The tracker’s responsibilities
are strictly limited to helping peers find each other”. In reality the tracker’s role is a bit more
complex, as many trackers collect data about peers engaged in a swarm. Additionally, some of
the newer tracker software being released has integrated the functions of the tracker and
.torrent server.
Leechers[7] and seeds are coordinated by the tracker server and the peers periodically
update the tracker on their status allowing the tracker to have a global view of the system.
The data monitored by the tracker can include peer IP addresses, amount of data
uploaded/downloaded for specific peers, data transfer rates among peers, the percentage of
the total file downloaded, length of time connected to the tracker, and the ratio of sharing
among peers. Usually a tracker coordinates multiple torrents and the most popular trackers
are busy coordinating thousands of swarms simultaneously.
It should be noted that .torrent files are not the actual file being shared; rather, .torrent
files are the metadata which allows trackers and peers to coordinate their
activities. As previously mentioned, the complete file is stored on peer seed nodes
and not on the tracker server. Since .torrent files are small and require little space to store,
one server can easily host thousands of .torrent files without prohibitive server or bandwidth
requirements. Hosting a tracker does consume some bandwidth, however,
especially if the tracker becomes popular and begins to see heavy usage. Regardless, the
tracker’s bandwidth requirements are much less than those of hosting the complete files in a
traditional client-server model such as one would encounter with an FTP[3] site.
While trackers and .torrent files serve as mechanisms to assist the BitTorrent Protocol[5], the
process of actually transferring data is handled by the peers engaged in the swarm. Cohen’s
vision of “tit for tat” is the sole incentive measure he saw necessary for the protocol’s
success. Peers seek tit for tat behavior from others and discourage free riding by a
“choke/unchoke” policy. This choke policy uses a process known as “optimistic unchoking”
to constantly seek other swarm peers who may have more beneficial connections to offer an
interested peer.
There has been some research of the tit for tat algorithm by modeling rational users
whose behavior is then studied. This work defined rational users as those peer nodes
manipulating their client software beyond default settings. The fact that many newer
BitTorrent clients allow for custom tweaking of specific upload or download speed indicates
that perhaps the original tit for tat coding was too good, and thus detrimental to other peer
node functions such as normal HTTP[3] traffic. Some BitTorrent FAQs recommend limiting
uploads to approximately 80% of known capacity and personal tests indicate this strategy
does benefit download speeds.
The final important aspect of the BitTorrent Protocol[5]’s architecture is its use of a
“rarest piece first” algorithm when a peer begins a file download. The goal of the rarest
first algorithm is the uniform distribution of data across peers; it is distinct from the
“endgame mode”, which applies only when a download nears completion. A rarest first policy
requires a seed to upload new file chunks (those not yet uploaded to a swarm) to the newest
peer connecting to a torrent. This policy encourages distribution of the file further across
peer nodes. The rarest first algorithm is an interesting aspect of BitTorrent that, when
combined with optimistic unchoking, may explain why the protocol has achieved such success.
CHAPTER-4
TERMINOLOGY
These are the common terms that one would come across while making a typical
BitTorrent file transfer.
Torrent: this refers to the small metadata file you receive from the web server
(the one that ends in .torrent.) Metadata here means that the file contains
information about the data you want to download, not the data itself.
Peer: A peer is another computer on the internet that you connect to and
transfer data. Generally a peer does not have the complete file.
Leeches: They are similar to peers in that they won’t have the complete file.
But the main difference between the two is that a leech will not upload once
the file is downloaded.
Seed: A computer that has a complete copy of a certain torrent. Once a client
downloads a file completely, he can continue to upload the file, which is called
seeding. This is a good practice in the BitTorrent world since it allows other
users to obtain the file easily.
Reseed: When there are zero seeds for a given torrent, then eventually all the
peers will get stuck with an incomplete file, since no one in the swarm has the
missing pieces. When this happens, a seed must connect to the swarm so that
those missing pieces can be transferred. This is called reseeding.
Swarm: The group of machines that are collectively connected for a particular
file.
Tracker: A server on the Internet that acts to coordinate the action of
BitTorrent clients. The clients are in constant touch with this server to know
about the peers in the swarm.
Share ratio: This is the ratio of the amount of a file uploaded to the amount
downloaded. A ratio of 1 means that one has uploaded the same amount of a file
that has been downloaded.
Distributed copies: Sometimes the peers in a swarm will collectively have a
complete file. Such copies are called distributed copies.
Interested: This is the state of a downloader which suggests that the other end
has some pieces that the downloader wants. Then the downloader is said to be
interested in the other end.
Snubbed: If the client has not received anything after a certain period, it marks
a connection as snubbed, meaning that the peer on the other end has chosen not to
send for a while.
Optimistic unchoking: Periodically, the client shakes up the list of uploaders
and tries sending on different connections that were previously choked, and
choking the connections it was just using. This is called optimistic unchoking.
CHAPTER-5
ARCHITECTURE OF BitTorrent
The BitTorrent Protocol[5] can be split into the following five main components:
Metainfo File - a file which contains all details necessary for the protocol to operate.
Tracker - A server which helps manage the BitTorrent Protocol[5].
Peers - Users exchanging data via the BitTorrent Protocol[5].
Data - The files being transferred across the protocol.
Client - The program which sits on a peer’s computer and implements the protocol.
Peers use TCP (Transmission Control Protocol) to communicate and send data. This
protocol is preferable over other protocols such as UDP (User Datagram Protocol) because
TCP[3] guarantees reliable and in-order delivery of data from sender to receiver. UDP cannot
give such guarantees, and data can arrive out of order or be lost altogether. The tracker
allows peers to find out which other peers are taking part in a torrent, and allows them to
begin communication. Peers communicate with the tracker in plain text via HTTP (Hypertext
Transfer Protocol). The following diagram illustrates how peers interact with each other, and
also communicate with a central tracker.
When someone wants to publish data using the BitTorrent Protocol[5], they must create a
metainfo file. This file is specific to the data they are publishing, and contains all the
information about a torrent, such as the data to be included and the URL of the tracker to
connect to. A tracker is a server which 'manages' a torrent, and is discussed in the next
section. The file is given a '.torrent' extension, and the data is extracted from the file by a
BitTorrent client. This is a program which runs on the user's computer and implements the
BitTorrent Protocol[5]. Every metainfo file must contain the following information (or
'keys'):
• info: A dictionary which describes the file(s) of the torrent: either the single file,
or the directory structure for multiple files. The SHA-1 hash of every data piece is
stored here.
• announce: The announce URL of the tracker as a string
Instead of transmitting the keys in plain text format, the keys contained in the
metainfo file are encoded before they are sent. Encoding is done using a BitTorrent-specific
method known as 'bencoding'.
5.1.1 Bencoding:
Bencoding Structure:
Minus integers are allowed, but prefixing the number with a zero is not permitted.
However '0' is allowed.
Examples of bencoding:
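As an illustration, the following is a minimal sketch of an encoder, assuming the four commonly documented bencoded types: byte strings written as length:contents, integers written as i...e, lists written as l...e, and dictionaries written as d...e with keys in sorted order. The function name bencode and the sample metainfo values are hypothetical.

```python
def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists and dicts using bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        # str() never emits leading zeros, so '0' and negatives come out valid.
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and are written in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode(42))                               # b'i42e'
print(bencode(-3))                               # b'i-3e'
print(bencode("spam"))                           # b'4:spam'
print(bencode(["spam", 7]))                      # b'l4:spami7ee'
print(bencode({"cow": "moo", "spam": "eggs"}))   # b'd3:cow3:moo4:spam4:eggse'

# A hypothetical metainfo skeleton holding the two required keys.
metainfo = bencode({"announce": "http://tracker.example:6969/announce",
                    "info": {"name": "demo.bin", "length": 160}})
```

Because dictionary keys are always written in sorted order, the same dictionary always encodes to the same bytes, which is what makes hashing the encoded info dictionary meaningful.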
Because all information which is needed for the torrent is included in a single file, this
file can easily be distributed via other protocols, and as the file is replicated, the number of
peers can increase very quickly. The most popular method of distribution is using a public
indexing site which hosts the metainfo files. A seed will upload the file, and then others can
download a copy of the file over the HTTP[3] protocol and participate in the torrent.
5.2 Tracker
start communication, i.e. to find peers with the data they require. Peers know nothing of each
other until a response is received from the tracker. Whenever a peer contacts the tracker, it
reports its progress (the amounts uploaded, downloaded and left). When another peer queries
the tracker, the tracker can then provide a random list of peers who are participating in
the torrent; which pieces each of them holds is learned afterwards from the peers themselves.
A tracker is an HTTP/HTTPS service and typically works on port 6969. The address of
the tracker managing a torrent is specified in the metainfo file, and a single tracker can
manage multiple torrents. Multiple trackers can also be specified as backups, which are
handled by the BitTorrent client running on the user’s computer. BitTorrent clients
communicate with the tracker using HTTP GET requests, which is a standard CGI method. This
consists of appending a "?" to the URL and separating parameters with a "&".
• info_hash: 20-byte SHA1 hash of the info key from the metainfo file.
• peer_id: 20-byte string used as a unique ID for the client.
• port: The port number the client is listening on.
• uploaded: The total amount uploaded since the client sent the 'started' event to the
tracker in base ten ASCII.
• downloaded: The total amount downloaded since the client sent the 'started' event to
the tracker in base ten ASCII.
• left: The number of bytes the client still has to download, in base ten ASCII.
• compact: Indicates that the client accepts compacted responses. The peer list can then
be replaced by 6 bytes per peer. The first 4 bytes are the host, and the last 2 bytes are
port.
• event: If specified, must be one of the following: started, stopped, completed.
• ip: (optional) The IP address of the client machine, in dotted format.
• numwant: (optional) The number of peers the client wishes to receive from the
tracker.
• key: (optional) allows a client to identify itself if their IP address changes.
• trackerid: (optional) If previous announce contained a tracker id, it should be set
here.
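A request carrying these parameters might be assembled as in the sketch below. It is only an illustration: the tracker URL, the peer id and the helper name build_announce_url are hypothetical, and percent-encoding the binary info_hash and peer_id values is assumed to be what the tracker expects.

```python
from urllib.parse import quote, urlencode


def build_announce_url(announce: str, info_hash: bytes, peer_id: bytes,
                       port: int, uploaded: int, downloaded: int, left: int,
                       event: str = "", compact: bool = True) -> str:
    """Append the announce parameters to the tracker URL as a GET query string."""
    params = {
        "info_hash": info_hash,   # raw 20 bytes, percent-encoded by urlencode
        "peer_id": peer_id,       # raw 20 bytes
        "port": port,
        "uploaded": uploaded,     # integers are sent as base ten ASCII
        "downloaded": downloaded,
        "left": left,
        "compact": 1 if compact else 0,
    }
    if event:
        params["event"] = event   # started, stopped or completed
    return announce + "?" + urlencode(params, quote_via=quote)


# Hypothetical values for illustration only.
url = build_announce_url(
    "http://tracker.example:6969/announce",
    info_hash=bytes(range(20)),
    peer_id=b"-XX0001-123456789012",
    port=6881, uploaded=0, downloaded=0, left=1000, event="started",
)
```

The optional parameters (ip, numwant, key, trackerid) could be added to the same dictionary in the same way.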
The tracker then responds with a "text/plain" document with the following keys:
• failure message: If present, then no other keys are included. The value is a human
readable error message as to why the request failed.
• warning message: Similar to failure message, but response still gets processed.
• interval: The number of seconds a client should wait between sending regular
requests to the tracker.
• min interval: Minimum announce interval.
• tracker id: A string that the client should send back with its next announces.
• complete: Number of peers with the complete file.
• incomplete: number of non-seeding peers (leechers)
• peers: A list of dictionaries including: peer id, IP and ports of all the peers.
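When the client asks for a compact response, the peer list arrives as 6 bytes per peer instead of a list of dictionaries. The sketch below decodes such a list, assuming the 4 host bytes are an IPv4 address and the 2 port bytes are in network (big-endian) byte order; the function name and the sample bytes are hypothetical.

```python
import socket
import struct


def parse_compact_peers(blob: bytes) -> list[tuple[str, int]]:
    """Decode a compact peer list: 4 bytes of IPv4 host, then 2 bytes of port."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list must be a multiple of 6 bytes")
    peers = []
    for offset in range(0, len(blob), 6):
        host = socket.inet_ntoa(blob[offset:offset + 4])
        (port,) = struct.unpack("!H", blob[offset + 4:offset + 6])
        peers.append((host, port))
    return peers


# Two hypothetical peers packed back to back.
sample = (bytes([192, 168, 0, 1]) + (6881).to_bytes(2, "big")
          + bytes([10, 0, 0, 2]) + (51413).to_bytes(2, "big"))
peers = parse_compact_peers(sample)
```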
5.2.1 Scraping
Scraping[7] is the process of querying the state of a given torrent (or all torrents) that
the tracker is managing. The result is known as a "scrape page". To get the scrape, you must
start with the announce URL, find the last '/' and if the text immediately following the '/' is
'announce', then this can be substituted for 'scrape' to find the scrape page.
Examples:
http://example.com/announce -> http://example.com/scrape
http://example.com/a/announce -> http://example.com/a/scrape
http://example.com/announce.php -> http://example.com/scrape.php
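The substitution rule can be sketched as a small helper. The function name scrape_url is hypothetical, and returning None for an announce URL that does not follow the convention is an assumption about how a client might signal that no scrape page can be derived.

```python
from typing import Optional


def scrape_url(announce_url: str) -> Optional[str]:
    """Derive the scrape URL from an announce URL, or None if the rule does not apply."""
    head, slash, tail = announce_url.rpartition("/")
    # The text right after the last '/' must begin with 'announce'.
    if not slash or not tail.startswith("announce"):
        return None
    return head + "/" + "scrape" + tail[len("announce"):]


print(scrape_url("http://example.com/announce"))      # http://example.com/scrape
print(scrape_url("http://example.com/a/announce"))    # http://example.com/a/scrape
print(scrape_url("http://example.com/announce.php"))  # http://example.com/scrape.php
print(scrape_url("http://example.com/a"))             # None
```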
The tracker then responds with a "text/plain" document with the following bencoded
keys:
• files: A dictionary containing one key pair for each torrent. Each key is made up of a
20-byte binary hash value. The value of that key is then a nested dictionary with the
following keys:
• complete: number of peers with the entire file (seeds)
• downloaded: total number of times the entire file has been downloaded.
• incomplete: the number of active downloaders (leechers)
• name: (optional) the torrent name
5.3 Peers
Peers are other users participating in a torrent who have either part of the file or the
complete file (in which case the peer is known as a seed). Pieces are requested from peers,
but are not guaranteed to be sent, depending on the status of the peer. BitTorrent uses TCP
(Transmission Control Protocol) ports 6881-6889 to send messages and data between peers and,
unlike some other protocols, does not use UDP (User Datagram Protocol).
Peers[7] continuously queue up the pieces which they require for download. The tracker
supplies the list of peers in the swarm, and the peers tell each other which pieces they
hold. Which piece is requested next depends upon the BitTorrent client. There are three
stages of piece selection, which change depending on which stage of completion a peer
is at.
When downloading first begins, as the peer has nothing to upload, a piece is selected
at random to get the download started. Random pieces are then chosen until the first piece is
completed and checked. Once this happens, the 'rarest first' strategy begins.
When a peer selects which piece to download next, the rarest piece will be chosen
from the current swarm, i.e. the piece held by the lowest number of peers. This means that the
most common pieces are left until later, and focus goes to replication of rarer pieces.
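The selection rule can be sketched as below. This is a simplified illustration rather than any particular client's implementation: the function name, the peer labels and the random tie-break between equally rare pieces are assumptions.

```python
import random
from collections import Counter
from typing import Optional


def rarest_first(needed: set[int], peer_pieces: dict[str, set[int]],
                 rng=random) -> Optional[int]:
    """Pick the needed piece that the fewest peers in the swarm currently hold."""
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces & needed)   # count only pieces we still need
    if not counts:
        return None                      # nobody has anything we need
    lowest = min(counts.values())
    candidates = sorted(i for i, c in counts.items() if c == lowest)
    return rng.choice(candidates)        # break ties at random


# Hypothetical swarm: piece 0 is common, pieces 2 and 3 are each held by one peer.
swarm = {"peer_a": {0, 1, 2}, "peer_b": {0, 1}, "peer_c": {0, 3}}
choice = rarest_first({0, 1, 2, 3}, swarm)
```

Breaking ties at random keeps different downloaders from all converging on the same rare piece at once.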
At the beginning of a torrent, there will be only one seed with the complete file. There
would be a possible bottleneck if multiple downloaders were trying to access the same piece.
Rarest first avoids this because different peers have different pieces. As more peers connect,
rarest first will take some load off the original seed, as peers begin to download from one
another.
Eventually the original seed will disappear from a torrent. This could be because of
cost reasons, or most commonly because of bandwidth issues. Losing a seed runs the risk of
pieces being lost if no current downloaders have them. Rarest first works to prevent the loss
of pieces by replicating the pieces most at risk as quickly as possible. If the original seed
leaves before the remaining peers hold every piece between them, then no one will reach
completion unless a seed re-connects.
5.3.4 Endgame Mode
When a download nears completion, waiting for a piece from a peer[2] with slow
transfer rates may delay completion. To prevent this, the remaining sub-pieces are
requested from all peers in the current swarm.
The role of the tracker ends once peers have 'found each other'. From then on,
communication is done directly between peers, and the tracker is not involved. The set of
peers a BitTorrent[5] client is in communication with is known as a swarm.
To maintain the integrity of the data which has been downloaded, a peer does not
report that they have a piece until they have performed a hash check with the one contained
in the metainfo file.
Peers will continue to download data from all available peers that they can, i.e. peers
that possess the required pieces. Peers can block others from downloading data if necessary.
This is known as choking.
5.3.6 Choking
When a peer receives a request for a piece from another peer, it can opt to refuse to
transmit that piece. If this happens, the requesting peer is said to be choked[6]. This can
be done for different reasons, but the most common is that, by default, a client will only
maintain a fixed number of simultaneous uploads (max_uploads). All further requests to the
client will be marked as choked. Usually the default for max_uploads is 4.
Fig 5.3: Choking by a peer
The peer will then remain choked until an unchoke message is sent. Another case in
which a peer is choked arises when it is downloading from a seed, since the seed requires no
pieces in return. To ensure fairness between peers, there is a system in place which rotates
which peers are downloading. This is known as optimistic unchoking[6].
To ensure that the connections with the best data transfer rates are not the only ones
favoured, each peer has a reserved 'optimistic unchoke' which is left unchoked regardless of
the current transfer rate. The peer assigned to this slot is rotated every 30 seconds, which
is enough time for the upload and download rates to reach maximum capacity.
The peers[2] then cooperate using the tit for tat strategy, where the downloader
responds in one period with the same action the uploader used in the last period.
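The interplay of max_uploads, tit for tat and the optimistic unchoke can be sketched as follows. This is a simplified model, not the algorithm of any particular client: it assumes the optimistic slot counts as one of the max_uploads slots and that the remaining slots go to the peers currently uploading to us fastest. All names and rates are hypothetical.

```python
import random
from typing import Optional

MAX_UPLOADS = 4            # default number of simultaneous uploads, as in the text
OPTIMISTIC_INTERVAL = 30   # seconds between rotations of the optimistic slot


def choose_unchoked(rates: dict[str, float], optimistic: Optional[str],
                    max_uploads: int = MAX_UPLOADS) -> set[str]:
    """Return the peers left unchoked: the best uploaders plus the optimistic slot."""
    regular_slots = max_uploads - (1 if optimistic is not None else 0)
    # Tit for tat: reward the peers that currently give us the best rates.
    ranked = sorted((p for p in rates if p != optimistic),
                    key=lambda p: rates[p], reverse=True)
    unchoked = set(ranked[:regular_slots])
    if optimistic is not None:
        unchoked.add(optimistic)   # unchoked regardless of its current rate
    return unchoked


def rotate_optimistic(peers: list[str], current: Optional[str], rng=random) -> Optional[str]:
    """Pick a different peer for the optimistic slot (called every OPTIMISTIC_INTERVAL)."""
    candidates = [p for p in peers if p != current]
    return rng.choice(candidates) if candidates else current


# Hypothetical download rates (KB/s) observed from five connected peers.
rates = {"a": 50.0, "b": 10.0, "c": 30.0, "d": 5.0, "e": 0.0}
unchoked = choose_unchoked(rates, optimistic="e")
```

In this example peer "e" has sent nothing so far, yet it is still unchoked, which gives it a chance to start trading pieces.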
Peers who are exchanging data are in constant communication. Connections are
symmetrical, and therefore messages can be exchanged in both directions. These messages
are made up of a handshake[1], followed by a never-ending stream of length-prefixed
messages.
5.3.9 Handshaking
1. The handshake starts with character 19 (base 10) followed by the string 'BitTorrent
protocol'[5]; eight reserved bytes follow the string.
2. A 20 byte SHA1 hash of the bencoded info value from the metainfo is then sent. If
this does not match between peers the connection is closed.
3. A 20 byte peer id is sent; this is the same id that is reported in tracker
requests. If the peer id does not match the one expected, the connection is closed.
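The handshake layout can be sketched with Python's struct module. The sketch assumes the 19-character string is 'BitTorrent protocol' and that eight reserved (zero) bytes sit between that string and the info hash, giving 68 bytes in total; the function names and sample values are hypothetical.

```python
import struct

PSTR = b"BitTorrent protocol"      # 19 characters
HANDSHAKE_FORMAT = "!B19s8s20s20s"  # length byte, string, reserved, info hash, peer id


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Assemble the 68-byte handshake sent when a peer connection opens."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must each be 20 bytes")
    return struct.pack(HANDSHAKE_FORMAT, len(PSTR), PSTR, bytes(8), info_hash, peer_id)


def parse_handshake(data: bytes) -> tuple[bytes, bytes]:
    """Return (info_hash, peer_id), or raise ValueError for a malformed handshake."""
    if len(data) != struct.calcsize(HANDSHAKE_FORMAT):
        raise ValueError("handshake has the wrong length")
    pstrlen, pstr, _reserved, info_hash, peer_id = struct.unpack(HANDSHAKE_FORMAT, data)
    if pstrlen != len(PSTR) or pstr != PSTR:
        raise ValueError("unexpected protocol string")
    return info_hash, peer_id


# Hypothetical identifiers for illustration.
handshake = build_handshake(b"\x01" * 20, b"-XX0001-123456789012")
```

A receiving peer would compare the returned info hash with the torrent it is serving and close the connection on a mismatch.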
This constant stream of messages allows all peers in the swarm to send data, and
control interactions with other peers.
Prefix  Message         Structure                                  Additional Information
0       Choke           <len=0001><id=0>                           Fixed length, no payload. This enables a peer to block another peer's request for data.
1       Unchoke         <len=0001><id=1>                           Fixed length, no payload. [...] upload will begin.
2       Interested      <len=0001><id=2>                           Fixed length, no payload. A user is interested if a peer has the data they require.
3       Not interested  <len=0001><id=3>                           Fixed length, no payload. The peer does not have any data required.
4       Have            <len=0005><id=4><piece index>              Fixed length. Payload is the zero-based index of the piece. Announces a piece that the peer currently has.
5       Bitfield        <len=0001+X><id=5><bitfield>               Sent immediately after handshaking. Optional, and only sent if client has pieces. Variable length, X is the length of bitfield. Payload represents pieces that have been successfully downloaded.
6       Request         <len=0013><id=6><index><begin><length>     Fixed length, used to request a block of a piece. The payload contains integer values specifying the index, begin location and length.
7       Piece           <len=0009+X><id=7><index><begin><block>    Sent in response to request messages. Variable length, X is the length of the block. The payload contains integer values specifying the index and begin location, followed by the block data.
8       Cancel          <len=0013><id=8><index><begin><length>     Fixed length, used to cancel block requests. Payload is the same as 'request'. Typically used during 'end game' mode.
A peer is 'interested' in a connection if the peer at the other end has pieces it requires.
If the peer holding this data is not choking it, then data will be transferred. After
handshaking[2], connections start out by default as choked and not interested.
5.4 Data
BitTorrent[1] is very versatile, and can be used to transfer a single file, or multiple
files of any type, contained within any number of directories. File sizes can vary hugely, from
kilobytes to hundreds of gigabytes.
Data is split into smaller pieces which are sent between peers using the BitTorrent
Protocol[5]. These pieces are of a fixed size, which enables peers to keep track of who
has which pieces of data. This also breaks the file into verifiable pieces; each piece can then
be assigned a hash code, which can be checked by the downloader for data integrity. These
hashes are stored as part of the 'metainfo file' which is discussed in the next section.
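The integrity check can be illustrated with a short sketch. It assumes SHA1 as the per-piece hash, which is what BitTorrent metainfo files use; the function names are hypothetical and chosen only for this example.

```python
import hashlib


def split_pieces(data: bytes, piece_length: int):
    """Cut data into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + piece_length]
            for i in range(0, len(data), piece_length)]


def piece_hashes(data: bytes, piece_length: int):
    """Compute the 20-byte SHA1 digest of every piece, in order."""
    return [hashlib.sha1(piece).digest()
            for piece in split_pieces(data, piece_length)]


def verify_piece(piece: bytes, expected_hash: bytes) -> bool:
    """Return True if a downloaded piece matches its published hash."""
    return hashlib.sha1(piece).digest() == expected_hash
```

A downloader would run `verify_piece` on each completed piece and discard any piece whose digest does not match.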
The size of the pieces remains constant throughout all files in the torrent except for
the final piece which is irregular. The piece size a torrent[5] is allocated depends on the
amount of data. Piece sizes which are too large will cause inefficiency when downloading
(larger risk of data corruption in larger pieces due to fewer integrity checks), whereas if the
piece sizes are too small, more hash checks will need to be run.
As the number of pieces increases, more hash codes need to be stored in the metainfo
file. Therefore, as a rule of thumb, the piece size should be selected so that the metainfo file
is no larger than 50-75 KB. The main reason for this is to limit the amount of hosting storage
and bandwidth needed by indexing servers. The most common piece sizes are 256 KB, 512 KB
and 1 MB. The number of pieces is therefore: total length / piece size, rounded up. Pieces
may overlap file boundaries.
For example, a 1.4 MB (1400 KB) file could be split into 5 pieces of 256 KB and a
final piece of 120 KB.
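The arithmetic behind this example can be checked with a small sketch. The function name `piece_layout` is hypothetical; sizes are given in KB to match the example above.

```python
def piece_layout(total_length: int, piece_length: int):
    """Return the list of piece sizes for a given total length.

    Every piece has the fixed size except possibly the last one,
    which holds whatever remains.
    """
    full_pieces, remainder = divmod(total_length, piece_length)
    sizes = [piece_length] * full_pieces
    if remainder:
        sizes.append(remainder)
    return sizes
```

For instance, `piece_layout(1400, 256)` yields five pieces of 256 and one piece of 120.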
5.5 BitTorrent Clients
A metainfo file must be opened by the client to start partaking in a torrent. Once the
file is read, the necessary data is extracted, and a socket must be opened to contact the
tracker. BitTorrent clients use TCP ports 6881-6999. To find an available port, the client will
start at the lowest port, and work upwards until it finds one it can use. This means the client
will only use one port, and opening another BitTorrent[6] client will use another port. A
client can handle multiple torrents running concurrently.
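The port search described above might look like the following sketch. The helper names and the `is_free` parameter are assumptions introduced for illustration; a real client would keep the bound socket open instead of probing and closing it.

```python
import socket


def can_bind(port: int) -> bool:
    """Probe whether a TCP port can currently be bound locally."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind(("", port))
        except OSError:
            return False
        return True


def first_available_port(low=6881, high=6999, is_free=can_bind):
    """Start at the lowest port and work upwards until one is usable.

    Returns the port number, or None if every port in the range is taken.
    """
    for port in range(low, high + 1):
        if is_free(port):
            return port
    return None
```

Passing a custom `is_free` predicate makes the search easy to exercise without touching the network.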
Clients come in many flavors, and can range from basic applications with few features
to very advanced, customizable ones. For example, some advanced features are metainfo file
wizards and inbuilt trackers. These additional features mean that different clients behave
very differently, and may use multiple ports, depending on the number of processes they are
running. As all applications implement the same protocol, there are no incompatibility issues.
However, because of various tweaks and improvements between clients, a peer may
experience better performance from peers running the same client[6].
The tracker protocol is implemented on top of HTTP/HTTPS. This means that the
machine running the tracker runs an HTTP[3] or HTTPS server, and has the behaviour
described below:
1. The client sends a GET request to the tracker URL, with certain CGI variables and
values added to the URL. This is done in the standard way, i.e., if the base URL is
“http://some.url.com/announce”, the full URL would be of this form:
“http://some.url.com/announce?var1=value1&var2=value2&var3=value3”.
2. The tracker responds with a “text/plain” document, containing a bencoded dictionary.
This dictionary has all the information required for the client.
3. The client then sends re-requests, either at regular intervals or when an event occurs,
and the tracker responds.
The CGI variables and values added to the base URL by the client sending a GET request
are:
info_hash: The 20 byte SHA1 hash calculated from whatever value the info key maps
to in the metainfo file.
peer_id: A 20 character long id of the downloading client, randomly generated at the
start of every download. There is no formal definition of how to generate this id, but
some client applications have adopted semiformal conventions for generating it.
ip: This is an optional variable, giving the IP address of the client. This can usually be
extracted from the TCP connection, but this field is useful if the client and tracker are
on the same machine, or behind the same NAT gateway. In both cases, the tracker
then might publish an unroutable IP address to the client.
port: The port number that the client is listening on. This is usually in the range 6881-
6889.
uploaded: The amount of data uploaded so far by the client. There is no official
definition of the unit, but generally bytes are used.
left: How much the user has left for the download to be complete, in bytes.
event: An optional variable, corresponding to one of four possibilities:
• started: Sent when the client starts the download
• stopped: Sent when the client stops downloading
• completed: Sent when the download is complete. If the download is complete
at start up, this message should not be sent.
• empty: Has the same effect as if the event key is nonexistent. In either case,
the message in question is one of the messages sent at regular intervals.
There are some optional variables that can be sent along with the GET request that are
not specified in the official description of the protocol, but are implemented by some tracker
servers:
numwant: The number of peers the client wants in the response.
key: An identification key that is not published to other peers. peer_id is public, and
is thus useless as authorization. key is used if the peer changes IP address, to prove its
identity to the tracker.
trackerid: If a tracker previously gave its trackerid, this should be given here.
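A request URL carrying these variables could be assembled as in the sketch below. The function name and its parameters are hypothetical; only variables listed above are used, and the binary values are percent-encoded with the standard library.

```python
from urllib.parse import quote, urlencode


def build_announce_url(base_url, info_hash, peer_id, port,
                       uploaded, left, event=None):
    """Append the announce variables to the tracker's base URL.

    info_hash and peer_id are raw 20-byte values; urlencode
    percent-encodes every byte that is not URL-safe.
    """
    params = {
        "info_hash": info_hash,
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,
        "left": left,
    }
    if event:  # started, stopped or completed; omitted otherwise
        params["event"] = event
    return base_url + "?" + urlencode(params, quote_via=quote)
```

Leaving `event` unset corresponds to the regular re-requests, where the event key is absent.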
min interval: If present, the client must not send re-requests more often than this.
warning message: Has the same information as failure reason, but the other keys in
the dictionary are still present.
tracker id: A string identifying the tracker. A client should resend it in the
trackerid variable to the tracker.
complete: This is the number of peers that have the complete file available for upload.
incomplete: The number of peers that do not have the complete file yet.
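Since the tracker response is a bencoded dictionary, a client needs a decoder before it can read these keys. The following is a minimal sketch of such a decoder, not a hardened implementation; the function names are hypothetical, and keys and strings are returned as raw bytes.

```python
def bdecode(data: bytes):
    """Decode a complete bencoded value."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int):
    """Decode the value starting at offset i; return (value, next_offset)."""
    lead = data[i:i + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":  # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":  # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            result[key], i = _decode(data, i)
        return result, i + 1
    if lead.isdigit():  # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        start = colon + 1
        length = int(data[i:colon])
        return data[start:start + length], start + length
    raise ValueError("invalid bencoded data")
```

For a response such as `d8:completei5e10:incompletei3e12:min intervali60ee`, the decoder returns a dictionary whose `b"complete"` entry is 5.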
Handshake message
The handshake message consists of five parts:
• A single byte, containing the decimal value 19. This is the length of the character
string following this byte.
• A character string “BitTorrent protocol”[5], which describes the protocol. Newer
protocols should follow this convention to facilitate easy identification of protocols.
• Eight reserved bytes for further extension of the protocol. All bytes are zero in current
implementations.
• A 20 byte SHA1 hash of the value mapping to the info key in the torrent file. This is
the same hash sent to the tracker in the info_hash variable.
• The 20 byte character string representing the peer id. This is the same value sent to
the tracker.
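The five parts add up to a 68-byte message, which can be sketched as follows. The function names are hypothetical; on the wire the 19-byte protocol string is written as `BitTorrent protocol`.

```python
PSTR = b"BitTorrent protocol"  # 19 bytes, matching the length byte
HANDSHAKE_LEN = 1 + len(PSTR) + 8 + 20 + 20  # 68 bytes in total


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Concatenate the five handshake parts in order."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    reserved = bytes(8)  # all zero, as described above
    return bytes([len(PSTR)]) + PSTR + reserved + info_hash + peer_id


def parse_handshake(data: bytes):
    """Split a handshake into (reserved, info_hash, peer_id)."""
    if len(data) != HANDSHAKE_LEN:
        raise ValueError("handshake must be 68 bytes")
    if data[0] != len(PSTR) or data[1:20] != PSTR:
        raise ValueError("unknown protocol string")
    return data[20:28], data[28:48], data[48:68]
```

A receiving peer would compare the parsed `info_hash` against the torrents it serves before continuing.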
If a peer is the first recipient to a handshake, and the info_hash doesn’t match any torrent
it is serving, it should break the connection. If the initiator of the connection receives a
handshake where the peer id doesn’t match with the id received from the tracker, the
connection should be dropped. Each peer needs to keep the state of each connection. The
state consists of two values, interested and choking. A peer can be either interested or not in
another peer, and either choke or not choke the other peer. Choking means that no requests
will be answered, and interested means that the peer is interested in downloading pieces of
the file from the other peer.
This means that each peer needs four Boolean values for each connection to keep track of
the state.
• am_interested
• am_choking
• peer_interested
• peer_choking
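These four values can be grouped per connection, for example as in the sketch below. The class name and the two helper methods are assumptions for illustration; the defaults reflect that connections start out choking and not interested on both sides.

```python
from dataclasses import dataclass


@dataclass
class ConnectionState:
    """Per-connection state kept by each peer."""
    am_interested: bool = False    # we want pieces from the other peer
    am_choking: bool = True        # we refuse the other peer's requests
    peer_interested: bool = False  # the other peer wants pieces from us
    peer_choking: bool = True      # the other peer refuses our requests

    def can_download(self) -> bool:
        """We may request blocks only if interested and not choked."""
        return self.am_interested and not self.peer_choking

    def can_upload(self) -> bool:
        """We serve blocks only to an interested peer we do not choke."""
        return self.peer_interested and not self.am_choking
```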
All connections start out as not interested and choking[7] for both peers. Clients should
keep the am_interested value updated continuously, and report changes to the other peer. The
messages sent after the handshake are structured as: [message length as an integer] [single
byte describing message type] [payload]. Keep-alive messages are sent at regular intervals,
and they are simply a message with length 0, and no type or payload.
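The framing can be sketched as below, assuming the length prefix is a 4-byte big-endian integer, as is usual for BitTorrent peer messages. The function names are hypothetical, and a keep-alive is represented here by a message id of `None`.

```python
import struct


def encode_message(msg_id=None, payload=b""):
    """Frame one message; msg_id=None produces a keep-alive."""
    if msg_id is None:
        return struct.pack(">I", 0)
    # The length counts the type byte plus the payload.
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload


def decode_messages(buffer: bytes):
    """Split a byte stream into complete messages.

    Returns (messages, leftover), where messages is a list of
    (msg_id, payload) tuples and leftover holds any incomplete tail.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from(">I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # wait for the rest of this message
        body = buffer[offset + 4:offset + 4 + length]
        if length == 0:
            messages.append((None, b""))  # keep-alive
        else:
            messages.append((body[0], body[1:]))
        offset += 4 + length
    return messages, buffer[offset:]
```

Returning the leftover bytes lets the caller prepend them to the next chunk read from the socket.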
Type 0, 1, 2, 3 are choke, unchoke, interested and not interested respectively. All of
them have length 1 and no payload. These messages simply describe changes in state.
Type 4 is a have. This message has length = 5, and a payload that is a single integer,
giving the integer index of which piece of the file the peer has successfully downloaded and
verified.
Type 5 is bitfield. This message is only sent directly after handshake. It contains a
bitfield representation of which pieces the peer has. The payload is of variable length, and
consists of a bitmap, where byte 0 corresponds to piece 0-7, byte 1 to piece 8-15 etc. A bit set
to 1 represents having the piece. Peers that have no pieces can neglect to send this message.
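A bitfield can be converted to and from a set of piece indices as sketched here. The function names are hypothetical, and the sketch assumes the usual bit order in which the highest bit of each byte stands for the lowest piece index in that byte.

```python
def pieces_to_bitfield(have, num_pieces: int) -> bytes:
    """Build a bitfield with one bit per piece, padded to whole bytes."""
    field = bytearray((num_pieces + 7) // 8)
    for index in have:
        # Byte index // 8 covers pieces 8k .. 8k+7, high bit first.
        field[index // 8] |= 0x80 >> (index % 8)
    return bytes(field)


def bitfield_to_pieces(bitfield: bytes, num_pieces: int):
    """Return the set of piece indices whose bit is set to 1."""
    have = set()
    for index in range(num_pieces):
        if bitfield[index // 8] & (0x80 >> (index % 8)):
            have.add(index)
    return have
```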
Type 6 is a request. The payload consists of three integers, piece index, begin and
length. The piece index decides within which piece the client wants to download, begin gives
the byte offset within the piece, and length gives the number of bytes the client wants to
download. Length is usually a power of two.
Type 7 is a block. This message follows a request. The payload contains the piece index,
the begin offset and the data itself that was requested. Type 8 is cancel. This message has the
same payload as request messages, and it is used to cancel requests made. Peers should
continuously update their interested status to neighbours, so that clients know which peers
will begin downloading when unchoked.
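The request and block messages can be sketched as follows, using the <index><begin><length> and <index><begin><block> layouts from the message table and assuming 4-byte big-endian integers throughout. The function names are hypothetical.

```python
import struct


def encode_request(index: int, begin: int, length: int) -> bytes:
    """Type 6: length prefix 13, then index, begin and length."""
    return struct.pack(">IBIII", 13, 6, index, begin, length)


def encode_piece(index: int, begin: int, block: bytes) -> bytes:
    """Type 7: length prefix 9 + X, then index, begin and the block."""
    return struct.pack(">IBII", 9 + len(block), 7, index, begin) + block


def decode_piece_payload(payload: bytes):
    """Split a type 7 payload into (index, begin, block)."""
    index, begin = struct.unpack_from(">II", payload)
    return index, begin, payload[8:]
```

A cancel message could reuse `encode_request` with the type byte changed to 8, since its payload is the same.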
CHAPTER-6
VULNERABILITIES OF BitTorrent
6.1 Attacks on BitTorrent
As we have seen so far, BitTorrent[5] is one of the most favoured file transfer protocols
in today’s world. But it has been exposed to various attacks in the recent past due to
vulnerabilities that are being exploited by the hacker community. Here are some of the
attacks that are commonly seen.
Pollution attacks[7] have become increasingly popular and have been used by
anti-piracy groups. In 2005 HBO used pollution attacks to prevent people from downloading
their show Rome.
4. The peers then attempt to connect to the victim to try and download a chunk of
the file.
6.2 Solutions
Many of the attacks that BitTorrent[5] suffers have been dealt with and some
measures have been taken to avoid such attacks. Here are a few solutions to the attacks that
were discussed above.
list in the torrent. Another measure could be to restrict the size of the tracker list to reduce the
effectiveness of such an attack.
CONCLUSION
BitTorrent pioneered mesh-based file distribution that effectively utilizes all the
uplinks of participating nodes. Most follow-on research used similar distributed and
randomized algorithms for peer and piece selection, but with different emphasis or twists.
This work takes a different approach to the mesh-based file distribution problem by
considering it as a scheduling problem, and strives to derive an optimal schedule that could
minimize the total elapsed time. By comparing the total elapsed time of BitTorrent and CSFD
in a wide variety of scenarios, we are able to determine how close BitTorrent is to the
theoretical optimum. In addition, the study of applicability of BitTorrent to real-time media
streaming applications, shows that with minor modifications, BitTorrent can serve as an
effective media streaming tool as well.
BitTorrent’s application in this information sharing age is almost priceless. However,
it is not yet perfected, as it is still prone to malicious attacks and acts of misuse. Moreover,
the lifespan of each torrent is still not satisfactory, which means that a file can only remain
in distribution for a limited period of time. Thus, further analysis and a more thorough study
of the protocol will enable one to discover more ways to improve it.
REFERENCES
[2] Networking: A Beginner’s Guide, by Hallberg
[3] Understanding Different Protocols, by Luckett
[4] Cache logic, BitTorrent bandwidth usage: http://www.cachelogic.com/research/2005_slide06.php
[5] Information on BitTorrent Protocol: en.wikipedia.org/wiki/BitTorrent_(protocol)
[6] BitTorrent FAQ: http://btfaq.com
[7] BitTorrent Specifications: http://wiki.theory.org/BitTorrentSpecification