11 years ago · 810ddd40f3
--- a/papers/ipfs/ipfs.tex
+++ b/papers/ipfs/ipfs.tex
@@ -12,9 +12,7 @@
 
 \begin{document}
 
-% \conferenceinfo{WOODSTOCK}{'97 El Paso, Texas USA}
-
-\title{Galactic File System}
+\title{IPFS - Towards The Permanent Web (DRAFT 2)}
 \subtitle{}
 
 \numberofauthors{1}
@@ -39,16 +37,16 @@
 \maketitle
 \begin{abstract}
 The Galactic File System is a peer-to-peer distributed file system capable of
-sharing the same files with millions of nodes. GFS combines a distributed
+sharing the same files with millions of nodes. IPFS combines a distributed
 hashtable, cryptographic techniques, merkle trees, content-addressable
 storage, bittorrent, and tag-based filesystems to build a single massive
-file system shared between peers. GFS has no single point of failure, and
+file system shared between peers. IPFS has no single point of failure, and
 nodes do not need to trust each other.
 \end{abstract}
 
 \section{Introduction}
 
-[Motivate GFS. Introduce problems. Describe BitTorrent existing problems (
+[Motivate IPFS. Introduce problems. Describe BitTorrent existing problems (
 multiple files. one swarm. sloppy dht implementation.) Describe version
 control efforts. Propose potential combinations of good ideas.]
 
@@ -63,15 +61,15 @@ Ori,
 Coral]
 
 This paper introduces
-GFS, a novel peer-to-peer version-controlled filesystem;
-and BitSwap, the novel peer-to-peer block exchange protocol serving GFS.
+IPFS, a novel peer-to-peer version-controlled filesystem;
+and BitSwap, the novel peer-to-peer block exchange protocol serving IPFS.
 
 The rest of the paper is organized as follows.
 Section 2 describes the design of the filesystem.
 Section 3 evaluates various facets of the system under benchmark and common
 workloads.
-Section 4 presents and evaluates a world-wide deployment of GFS.
-Section 5 describes existing and potential applications of GFS.
+Section 4 presents and evaluates a world-wide deployment of IPFS.
+Section 5 describes existing and potential applications of IPFS.
 Section 6 discusses related and future work.
 
 Notation Notes:
@@ -83,9 +81,9 @@ Notation Notes:
 
 \section{Design}
 
-\subsection{GFS Nodes}
+\subsection{IPFS Nodes}
 
-GFS is a distributed file system where all nodes are the same. They are
+IPFS is a distributed file system where all nodes are the same. They are
 identified by a \texttt{NodeId}, the cryptographic hash of a public-key
 (note that \textit{checksum} will henceforth refer specifically to cryptographic
 hashes of an object). Nodes also store their public and private keys. Clients
@@ -107,8 +105,8 @@ accrued benefits. It is recommended that nodes remain the same.
 
 
 Together, the
-nodes store the GFS files in local storage, and send files to each other.
-GFS implements its features by combining several subsystems with many
+nodes store the IPFS files in local storage, and send files to each other.
+IPFS implements its features by combining several subsystems with many
 desirable properties:
 
 \begin{enumerate}
@@ -127,12 +125,12 @@ desirable properties:
 
 These subsystems are not independent. They are well integrated and leverage
 their blended properties. However, it is useful to describe them separately,
-building the system from the bottom up. Note that all GFS nodes are identical,
+building the system from the bottom up. Note that all IPFS nodes are identical,
 and run the same program.
 
 \subsection{Distributed Sloppy Hash Table}
 
-First, GFS nodes implement a DSHT based on Kademlia and Coral to coordinate
+First, IPFS nodes implement a DSHT based on Kademlia and Coral to coordinate
 and identify which nodes can serve a particular block of data.
 
 \subsubsection{Kademlia DHT}
@@ -158,7 +156,7 @@ Kademlia is a DHT that provides:
 
 While some peer-to-peer filesystems store data blocks directly in DHTs,
 this ``wastes storage and bandwidth, as data must be stored at nodes where it
-is not needed''. Instead, GFS stores a list of peers that can provide the data block.
+is not needed''. Instead, IPFS stores a list of peers that can provide the data block.
 
 \subsubsection{Coral DSHT}
 
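The Kademlia passage above says that IPFS stores, for each block, a list of peers able to provide it rather than the block itself. Below is a minimal single-process Python sketch of that idea, built on several assumptions. Identifiers are SHA-256 digests read as 256-bit integers, because the draft only says "cryptographic hash". Closeness uses Kademlia's XOR metric, and k is a replication factor. ToyDSHT, provide and find_providers are hypothetical names and are not the four RPC calls the IPFS DSHT defines. Coral's sloppy behavior, where records spill to slightly farther nodes when the closest ones are overloaded, is left out.

```python
import hashlib


def to_id(data: bytes) -> int:
    """Map bytes to a 256-bit identifier (SHA-256 is an assumption)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


class ToyDSHT:
    """Hypothetical in-memory model: nodes keep provider records, never blocks."""

    def __init__(self, node_names, k=3):
        self.k = k
        # node id -> {block key -> set of provider names}
        self.records = {to_id(name.encode()): {} for name in node_names}

    def closest_nodes(self, key: int) -> list:
        # Kademlia distance is the XOR of two identifiers, compared as integers.
        return sorted(self.records, key=lambda node_id: node_id ^ key)[: self.k]

    def provide(self, block: bytes, provider: str) -> int:
        """Announce that `provider` can serve `block`; returns the block's key."""
        key = to_id(block)
        for node_id in self.closest_nodes(key):
            self.records[node_id].setdefault(key, set()).add(provider)
        return key

    def find_providers(self, key: int) -> set:
        """Ask the k closest nodes who can serve `key`."""
        found = set()
        for node_id in self.closest_nodes(key):
            found |= self.records[node_id].get(key, set())
        return found


if __name__ == "__main__":
    dsht = ToyDSHT([f"node{i}" for i in range(10)], k=3)
    key = dsht.provide(b"hello world", provider="alice")
    print(dsht.find_providers(key))  # {'alice'}
```

Only provider records are replicated, so each of the k closest nodes holds a few bytes per block instead of the block's data. This is the storage and bandwidth saving the quoted passage refers to.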
@@ -189,15 +187,15 @@ Coral extends Kademlia in three particularly important ways:
 \end{enumerate}
 
 
-\subsubsection{GFS DSHT}
+\subsubsection{IPFS DSHT}
 
-The GFS DSHT supports four RPC calls:
+The IPFS DSHT supports four RPC calls:
 
 
 
 \subsection{Block Exchange - BitSwap Protocol}
 
-The exchange of data in GFS happens by exchanging blocks with peers using a
+The exchange of data in IPFS happens by exchanging blocks with peers using a
 BitTorrent inspired protocol: BitSwap. Like BitTorrent, BitSwap peers are
 looking to acquire a set of blocks, and have blocks to offer in exchange.
 Unlike BitTorrent, BitSwap is not limited to the blocks in one torrent.
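To make the contrast with BitTorrent concrete, the following minimal Python sketch has each peer track the checksums it holds and the checksums it wants. Any overlap can be traded, no matter which file the blocks belong to. All of the following are assumptions made for illustration and are not the protocol's defined messages: the Peer class and its fields, SHA-256 as the checksum, and the per-peer byte counters that stand in for BitSwap ledgers. The receiver rejects any block whose checksum does not match the key it requested.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(block: bytes) -> str:
    # SHA-256 hex digest; the concrete hash function is an assumption.
    return hashlib.sha256(block).hexdigest()


@dataclass
class Peer:
    name: str
    store: dict = field(default_factory=dict)       # checksum -> block bytes
    wants: set = field(default_factory=set)         # checksums still needed
    bytes_sent: dict = field(default_factory=dict)  # peer name -> bytes sent to it
    bytes_recv: dict = field(default_factory=dict)  # peer name -> bytes received from it

    def add(self, block: bytes) -> str:
        key = checksum(block)
        self.store[key] = block
        return key

    def receive(self, sender: str, key: str, block: bytes) -> bool:
        # Reject data whose checksum does not match the requested key.
        if checksum(block) != key:
            return False
        self.store[key] = block
        self.wants.discard(key)
        self.bytes_recv[sender] = self.bytes_recv.get(sender, 0) + len(block)
        return True

    def send_wanted(self, other: "Peer") -> int:
        """Send every block `other` wants that we hold, from any file."""
        sent = 0
        for key in sorted(other.wants.intersection(self.store)):
            block = self.store[key]
            if other.receive(self.name, key, block):
                self.bytes_sent[other.name] = self.bytes_sent.get(other.name, 0) + len(block)
                sent += 1
        return sent


if __name__ == "__main__":
    alice, bob = Peer("alice"), Peer("bob")
    k1 = alice.add(b"block from file A")
    k2 = alice.add(b"block from file B")
    bob.wants = {k1, k2}
    print(alice.send_wanted(bob))  # 2
```

Wants are expressed as bare checksums rather than (torrent, piece) pairs. Because of this, blocks from unrelated files can be exchanged in a single pass.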
@@ -453,7 +451,7 @@ the sender. Both receiver and sender should update their ledgers accordingly,
 though the sender is either malfunctioning or attacking the receiver. Note that
 BitSwap expects to operate on a reliable transmission channel, so data errors
 -- which could lead to incorrect penalization of an honest sender -- are
-expected to be caught before the data is given to BitSwap. GFS uses the uTP
+expected to be caught before the data is given to BitSwap. IPFS uses the uTP
 protocol.
 
 \paragraph{Peer.close(Bool)}
@@ -491,12 +489,12 @@ the future, if it is useful to do so.
 
 \subsection{Object Model}
 
-The DHT and BitSwap allow GFS to form a massive peer-to-peer system for storing
+The DHT and BitSwap allow IPFS to form a massive peer-to-peer system for storing
 and distributing blocks quickly and robustly to users.
-GFS builds a filesystem out of this efficient block distribution system,
+IPFS builds a filesystem out of this efficient block distribution system,
 constructing files and directories out of blocks.
 
-Files in GFS are represented as a collection of inter-related objects, like in
+Files in IPFS are represented as a collection of inter-related objects, like in
 the version control system Git. Each object is addressed by the cryptographic
 hash of its contents (\texttt{Checksum}). The file objects are:
 
@@ -524,10 +522,10 @@ Notes:
 
 The \texttt{Block} object contains an addressable unit of data, and
 represents a file.
-GFS Blocks are like Git blobs or filesystem data blocks. They store the
+IPFS Blocks are like Git blobs or filesystem data blocks. They store the
 users' data. (The name \textit{block} is preferred over \textit{blob}, as the
-Git-inspired view of a \textit{blob} as a \textit{file} breaks down in GFS.
-GFS files can be represented by both \texttt{lists} and \texttt{blocks}.)
+Git-inspired view of a \textit{blob} as a \textit{file} breaks down in IPFS.
+IPFS files can be represented by both \texttt{lists} and \texttt{blocks}.)
 Format:
 \begin{verbatim}
 block <size>
@@ -539,9 +537,9 @@ block <size>
 \subsubsection{List Object}
 
 The \texttt{List} object represents a large or de-duplicated file made up of
-several GFS \texttt{Blocks} concatenated together. \texttt{Lists} contain
+several IPFS \texttt{Blocks} concatenated together. \texttt{Lists} contain
 an ordered sequence of \texttt{block} or \texttt{list} objects.
-In a sense, the GFS \texttt{List} functions like a filesystem file with
+In a sense, the IPFS \texttt{List} functions like a filesystem file with
 indirect blocks. Since \texttt{lists} can contain other \texttt{lists}, topologies including linked lists and balanced trees are possible. Directed graphs where the same node appears in multiple places allow in-file deduplication. Cycles are not possible (enforced by hash addressing).
 Format:
 \begin{verbatim}
@@ -554,7 +552,7 @@ list <num objects> <size varint>
 
 \subsubsection{Tree Object}
 
-The \texttt{tree} object in GFS is similar to Git trees: it represents a
+The \texttt{tree} object in IPFS is similar to Git trees: it represents a
 directory, a list of checksums and names. The checksums reference \texttt{blob}
 or other \texttt{tree} objects. Note that traditional path naming
 is implemented entirely by the \texttt{tree} objects. \texttt{Blocks} and
@@ -569,7 +567,7 @@ tree <num objects> <size varint>
 
 \subsubsection{Commit Object}
 
-The \texttt{commit} object in GFS is similar to Git's. It represents a
+The \texttt{commit} object in IPFS is similar to Git's. It represents a
 snapshot in the version history of a \texttt{tree}. Note that user
 addresses are NodeIds (the hash of the public key).
 
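The object model can be sketched in a few lines of Python. The sketch assumes SHA-256 checksums and a text serialization. The block header copies the draft's block <size> format. The tree and commit encodings are hypothetical stand-ins for formats elided from this diff, and so are the helpers put, block, tree, commit, node_id and history. The sketch illustrates two claims from the text. First, each object is addressed by the hash of its contents. Second, a commit links to its parent by checksum, and its author is a NodeId (the hash of a public key). Following parent links therefore recovers earlier versions.

```python
import hashlib
from typing import Dict, List, Optional

STORE: Dict[str, bytes] = {}  # checksum -> serialized object


def put(data: bytes) -> str:
    """Store serialized bytes under their checksum (SHA-256 assumed)."""
    key = hashlib.sha256(data).hexdigest()
    STORE[key] = data
    return key


def block(data: bytes) -> str:
    # Header follows the draft's "block <size>" format.
    return put(b"block %d\n" % len(data) + data)


def tree(entries: Dict[str, str]) -> str:
    # Hypothetical encoding: one "<checksum> <name>" line per entry, sorted by name.
    lines = "".join(f"{h} {name}\n" for name, h in sorted(entries.items()))
    return put(f"tree {len(entries)}\n{lines}".encode())


def node_id(public_key: bytes) -> str:
    # A NodeId is the hash of a public key.
    return hashlib.sha256(public_key).hexdigest()


def commit(tree_hash: str, parent: Optional[str], author: str, message: str) -> str:
    # Hypothetical encoding; a root commit has no parent line.
    parent_line = f"parent {parent}\n" if parent else ""
    header = f"commit\ntree {tree_hash}\n{parent_line}author {author}\n"
    return put(f"{header}\n{message}\n".encode())


def history(commit_hash: str) -> List[str]:
    """Follow parent links from a commit back to the root commit."""
    chain: List[str] = []
    current: Optional[str] = commit_hash
    while current is not None:
        chain.append(current)
        header = STORE[current].decode().split("\n\n", 1)[0]
        fields = dict(line.split(" ", 1) for line in header.splitlines()[1:])
        current = fields.get("parent")
    return chain


if __name__ == "__main__":
    author = node_id(b"example public key bytes")
    v1 = commit(tree({"hello.txt": block(b"hello\n")}), None, author, "first")
    v2 = commit(tree({"hello.txt": block(b"hello, world\n")}), v1, author, "second")
    print(history(v2) == [v2, v1])  # True
```

Changing one byte of hello.txt yields a new block, which in turn yields a new tree and a new commit. The old commit, and everything it references, keeps its original checksum.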
@@ -592,17 +590,17 @@ it references are accessible, all preceding versions are retrievable and the
 full history of the filesystem changes can be accessed. This is a consequence
 of the \texttt{Git} object model and the graph it forms.
 
-The full power of the \texttt{Git} version control tools is available to GFS
+The full power of the \texttt{Git} version control tools is available to IPFS
 users. The object model is compatible (though not the same). The standard
-\texttt{Git} tools can be used on the \texttt{GFS} object graph after a
+\texttt{Git} tools can be used on the \texttt{IPFS} object graph after a
 conversion. Additionally, a fork of the tools is under development that will
 allow users to use them directly without conversion.
 
 \subsubsection{Object-level Cryptography}
 
-GFS is equipped to handle object-level cryptographic operations. Any additional
+IPFS is equipped to handle object-level cryptographic operations. Any additional
 bytes are appended to the bottom of the object. This changes the object's hash
-(defining a different object, as it should). GFS exposes an API that
+(defining a different object, as it should). IPFS exposes an API that
 automatically verifies signatures or decrypts data.
 
 \begin{itemize}
@@ -612,7 +610,7 @@ automatically verifies signatures or decrypts data.
 
 \subsubsection{Merkle Trees}
 
-The object model in GFS forms a \textit{Merkle Tree}, which provides GFS with
+The object model in IPFS forms a \textit{Merkle Tree}, which provides IPFS with
 useful properties:
 
 \begin{enumerate}
@@ -634,7 +632,7 @@ useful properties:
 
 \subsubsection{Filesystem Paths}
 
-GFS exposes a slash-delimited path-based API. Paths work the same as in any
+IPFS exposes a slash-delimited path-based API. Paths work the same as in any
 traditional UNIX filesystem. Path subcomponents have different meanings per
 object:
 
@@ -772,11 +770,11 @@ This is mitigated by:
 \begin{itemize}
   \item \textbf{tree caching}: since all objects are hash-addressed, they
         can be cached indefinitely. Additionally, \texttt{trees} tend to be
-        small in size so GFS prioritizes caching them over \texttt{blocks}.
+        small in size so IPFS prioritizes caching them over \texttt{blocks}.
   \item \textbf{flattened trees}: for any given \texttt{tree}, a special
         \texttt{flattened tree} can be constructed to list all objects
         reachable from the \texttt{tree}. Figure \ref{flattened-ttt111} shows
-        an example of a flattened tree. While GFS does not construct flattened
+        an example of a flattened tree. While IPFS does not construct flattened
         trees by default, it provides a function for users. For example,
 \end{itemize}
 
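A flattened tree can be sketched as a depth-first walk that records every path reachable from a tree, together with the checksum each path resolves to. A later lookup then costs one dictionary access instead of one object fetch per path component. This Python sketch is hypothetical. Objects are in-memory values: a tree is a dict from name to child checksum, and a block is bytes. The checksums are placeholder strings, and flatten and resolve are illustrative names rather than the function IPFS provides. The walk always terminates, because hash addressing rules out cycles.

```python
from typing import Dict, Union

# Hypothetical in-memory object store: checksum -> object.
# A tree is a dict mapping name -> child checksum; a block is raw bytes.
Obj = Union[Dict[str, str], bytes]


def resolve(store: Dict[str, Obj], root: str, path: str) -> str:
    """Slow path: fetch one tree object per path component."""
    current = root
    for part in filter(None, path.split("/")):
        node = store[current]
        if not isinstance(node, dict):
            raise NotADirectoryError(part)
        current = node[part]
    return current


def flatten(store: Dict[str, Obj], root: str) -> Dict[str, str]:
    """Map every path reachable from `root` to the checksum it names."""
    flat = {"": root}
    stack = [("", root)]
    while stack:
        prefix, checksum = stack.pop()
        node = store[checksum]
        if isinstance(node, dict):
            for name, child in node.items():
                path = f"{prefix}/{name}"
                flat[path] = child
                stack.append((path, child))
    return flat


if __name__ == "__main__":
    store: Dict[str, Obj] = {
        "h-root": {"README": "h-readme", "src": "h-src"},
        "h-readme": b"hello",
        "h-src": {"main.c": "h-main"},
        "h-main": b"int main(void) { return 0; }",
    }
    flat = flatten(store, "h-root")
    print(flat["/src/main.c"])  # h-main
    print(resolve(store, "h-root", "/src/main.c") == flat["/src/main.c"])  # True
```

The flattened listing is derived entirely from hash-addressed objects, so it never goes stale for a given root checksum. It can therefore be cached as indefinitely as the trees themselves.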
@@ -796,13 +794,13 @@ This is mitigated by:
 
 \subsubsection{Publishing Objects}
 
-GFS is globally distributed. It is designed to allow the files of millions
+IPFS is globally distributed. It is designed to allow the files of millions
 of users to coexist together. The \textbf{DHT} with content-hash addressing
 allows publishing objects in a fair, secure, and entirely distributed way.
 Anyone can publish an object by simply adding its key to the DHT, adding
 themselves as a peer, and giving other users the object's hash.
 
-Additionally, the GFS root directory supports special functionality to
+Additionally, the IPFS root directory supports special functionality to
 allow namespacing and naming objects in a fair, secure, and distributed
 manner.
 \begin{itemize}
@@ -816,9 +814,9 @@ manner.
         a user can publish a \texttt{tree} or \texttt{commit} under their
         name, and others can verify it by checking the signature matches.
 
-  \item[(c)] If \texttt{/<domain>} is a valid domain name, GFS
-        looks up key \texttt{gfs} in its \texttt{DNS TXT} record. GFS
-        interprets the value as either an object hash or another GFS path:
+  \item[(c)] If \texttt{/<domain>} is a valid domain name, IPFS
+        looks up key \texttt{gfs} in its \texttt{DNS TXT} record. IPFS
+        interprets the value as either an object hash or another IPFS path:
         \begin{verbatim}
   # this DNS TXT record
   fs.benet.ai. TXT "gfs=/aabbccddeeffgg ..."
@@ -832,15 +830,15 @@ manner.
 
 \subsection{Local Objects}
 
-GFS clients require some \textit{local storage}, an external system
-on which to store and retrieve local raw data for the objects GFS manages.
+IPFS clients require some \textit{local storage}, an external system
+on which to store and retrieve local raw data for the objects IPFS manages.
 The type of storage depends on the node's use case.
 In most cases, this is simply a portion of disk space (either managed by
-the native filesystem, or directly by the GFS client). In others, non-
+the native filesystem, or directly by the IPFS client). In others, non-
 persistent caches for example, this storage is just a portion of RAM.
 
-Ultimately, all blocks available in GFS are in some node's
-\textit{local storage}. And when nodes open files with GFS, the objects are
+Ultimately, all blocks available in IPFS are in some node's
+\textit{local storage}. And when nodes open files with IPFS, the objects are
 downloaded and stored locally, at least temporarily. This provides
 fast lookup for some configurable amount of time thereafter.
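Below is a minimal Python sketch of such local storage. It assumes a RAM-backed dictionary, SHA-256 checksums, and a single retention period, ttl_seconds, for downloaded objects. The draft says only that the lookup window is configurable. The LocalStore class, its parameters, and the rule that objects the node adds itself never expire are therefore hypothetical. An injectable clock keeps the example deterministic.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class LocalStore:
    """Hypothetical RAM-backed local storage keyed by checksum (SHA-256 assumed)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # checksum -> (data, expiry time, or None for objects this node keeps)
        self.objects: Dict[str, Tuple[bytes, Optional[float]]] = {}

    def add(self, data: bytes) -> str:
        """Store an object this node provides; it never expires."""
        key = hashlib.sha256(data).hexdigest()
        self.objects[key] = (data, None)
        return key

    def cache(self, key: str, data: bytes) -> None:
        """Keep a downloaded object for ttl seconds, after checking its checksum."""
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("data does not match its checksum")
        current = self.objects.get(key)
        if current is not None and current[1] is None:
            return  # already kept permanently
        self.objects[key] = (data, self.clock() + self.ttl)

    def get(self, key: str) -> Optional[bytes]:
        """Return the object if held and not expired, else None."""
        entry = self.objects.get(key)
        if entry is None:
            return None
        data, expiry = entry
        if expiry is not None and self.clock() >= expiry:
            del self.objects[key]  # expired: the caller must fetch it again
            return None
        return data


if __name__ == "__main__":
    now = [0.0]
    store = LocalStore(ttl_seconds=60, clock=lambda: now[0])
    key = hashlib.sha256(b"remote block").hexdigest()
    store.cache(key, b"remote block")
    now[0] = 59.0
    print(store.get(key))  # b'remote block'
    now[0] = 60.0
    print(store.get(key))  # None
```

Objects are immutable under their checksum, so expiry never risks serving stale data. It only bounds how much RAM or disk the cache consumes before the node falls back to the network.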