\documentclass{sig-alternate}
\usepackage{tikz}
\usetikzlibrary{arrows}
\usetikzlibrary{trees}
\usetikzlibrary{positioning}
\usepackage{array}
\usepackage{amstext}
\usepackage{mathtools}
\DeclarePairedDelimiter{\ceil}{\lceil}{\rceil}
\begin{document}
\title{IPFS - Content Addressed, Versioned, P2P File System (DRAFT 3)}
\subtitle{}
\numberofauthors{1}
\author{
\alignauthor
Juan Benet\\
\email{juan@benet.ai}
}
\maketitle
\begin{abstract}
The InterPlanetary File System is a peer-to-peer distributed file system
capable of sharing the same files with millions of nodes. IPFS combines a
distributed hash table, cryptographic techniques, Merkle trees,
content-addressable storage, BitTorrent, and tag-based filesystems to build a single
massive file system shared between peers. IPFS has no single point of failure,
and nodes do not need to trust each other.
\end{abstract}
\section{Introduction}
[Motivate IPFS. Introduce problems. Describe BitTorrent existing problems (
multiple files. one swarm. sloppy dht implementation.) Describe version
control efforts. Propose potential combinations of good ideas.]

[Cite:
CFS,
Kademlia,
BitTorrent,
Chord,
DHash,
SFS,
Ori,
Coral]

This paper introduces
IPFS, a novel peer-to-peer version-controlled filesystem;
and BitSwap, the novel peer-to-peer block exchange protocol serving IPFS.

%The rest of the paper is organized as follows.
%Section 2 describes the design of the filesystem.
%Section 3 evaluates various facets of the system under benchmark and common
%workloads.
%Section 4 presents and evaluates a world-wide deployment of IPFS.
%Section 5 describes existing and potential applications of IPFS.
%Section 6 discusses related and future work.

Notation Notes:
(a) data structures are specified in Go syntax,
(b) rpc protocols are specified in capnp interfaces,
(c) wire protocols are specified in capnp schemas.
\section{Background}
This section reviews important properties of successful peer-to-peer systems, which IPFS combines.
\subsection{Distributed Hash Tables}
Distributed Hash Tables (DHTs) are widely used to coordinate and maintain metadata about peer-to-peer systems. For example, the BitTorrent MainlineDHT tracks the sets of peers that are part of a torrent swarm.
\subsubsection{Kademlia DHT}
Kademlia \cite{Kademlia} is a popular DHT that provides:
\begin{enumerate}
\item Efficient lookup through massive networks:
queries on average contact $ \ceil{\log_2 (n)} $ nodes
(e.g. $24$ hops for a network of $10{,}000{,}000$ nodes).
\item Low coordination overhead: it optimizes the number of
control messages it sends to other nodes.
\item Resistance to various attacks, by preferring nodes that have been
part of the DHT longer.
\item Wide usage in peer-to-peer applications, including \\
Gnutella and BitTorrent, forming networks of over 100 million nodes.
\end{enumerate}
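
The lookup bound and the XOR metric that Kademlia relies on can be checked with a few lines of Go. This is only an illustrative sketch: the 64-bit ids and the helper names \texttt{xorDistance} and \texttt{lookupBound} are assumptions made here for brevity, and real deployments use much wider ids.

```go
package main

import (
	"fmt"
	"math/bits"
)

// xorDistance is the Kademlia distance between two ids.
// 64-bit ids are an assumption of this sketch; real DHTs use wider ids.
func xorDistance(a, b uint64) uint64 { return a ^ b }

// lookupBound returns ceil(log2(n)) for n >= 1, using integer arithmetic only.
func lookupBound(n uint64) int { return bits.Len64(n - 1) }

func main() {
	fmt.Println(xorDistance(0b1010, 0b0110)) // 12 (binary 1100)
	fmt.Println(lookupBound(10_000_000))     // 24
}
```

Computing the bound with \texttt{bits.Len64} avoids floating-point rounding near exact powers of two.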
\subsubsection{Coral DSHT}
While some peer-to-peer filesystems store data blocks directly in DHTs,
this ``wastes storage and bandwidth, as data must be stored at nodes where it
is not needed'' \cite{Coral}. Coral extends Kademlia in three particularly important ways:
\begin{enumerate}
\item Kademlia stores values in nodes whose ids are ``nearest'' (using
XOR-distance) to the key. This does not take into account application
data locality, ignores ``far'' nodes that may already have the data, and
forces ``nearest'' nodes to store it, whether they need it or not.
This wastes significant storage and bandwidth. Instead, Coral stores
addresses of peers who can provide the data blocks.
\item Coral relaxes the DHT API from \texttt{get\_value(key)} to
\texttt{get\_any\_values(key)} (the ``sloppy'' in DSHT).
This still works since Coral users only need a single (working) peer,
not the complete list. In return, Coral can distribute only subsets of
the values to the ``nearest'' nodes, avoiding hot-spots (overloading
\textit{all the nearest nodes} when a key becomes popular).
\item Additionally, Coral organizes a hierarchy of separate DSHTs called
\textit{clusters} depending on region and size. This enables nodes to
query peers in their region first, ``finding nearby data without
querying distant nodes'' and greatly reducing the latency of
lookups.
\end{enumerate}
\subsubsection{S/Kademlia DHT}
S/Kademlia extends Kademlia to protect against malicious attacks:
\begin{enumerate}
\item S/Kademlia provides schemes to secure \texttt{NodeId} generation
and prevent Sybil attacks. It requires nodes to create a PKI key pair, derive their identity from it, and sign their messages to each other. One scheme includes a proof-of-work crypto puzzle to make generating Sybils expensive.
\item S/Kademlia nodes look up values over disjoint paths, in order to
ensure honest nodes can connect to each other in the presence of a large fraction of adversaries in the network. S/Kademlia achieves a success rate of 0.85 even with an adversarial fraction as large as half of the nodes.
\end{enumerate}
\subsection{Block Exchanges - BitTorrent}
BitTorrent \cite{BitTorrent} is a widely successful peer-to-peer filesharing system, which succeeds in coordinating networks of untrusting peers (swarms) to cooperate in distributing pieces of files to each other. Key BitTorrent features that inform IPFS design:
\begin{enumerate}
\item BitTorrent's data exchange protocol uses a quasi tit-for-tat strategy
that rewards nodes who contribute to each other, and punishes nodes who only leech others' resources.
\item BitTorrent peers track the availability of file pieces, prioritizing
sending the rarest pieces first. This takes load off seeds, making non-seed peers capable of trading with each other.
\item BitTorrent's standard tit-for-tat is vulnerable to some exploitative
bandwidth sharing strategies. PropShare \cite{propshare} is a different peer bandwidth allocation strategy that better resists exploitative strategies, and improves the performance of swarms.
\end{enumerate}
\subsection{Version Control Systems - Git}
Version Control Systems provide facilities to model files changing over time and distribute different versions efficiently. The popular version control system Git provides a powerful Merkle DAG\footnote{Merkle Directed Acyclic Graph: a similar but more general construction than a Merkle Tree. Deduplicated, does not need to be balanced, and non-leaf nodes contain data.} object model that captures changes to a filesystem tree in a distributed-friendly way.
\begin{enumerate}
\item Immutable objects represent Files (\texttt{blob}), Directories (\texttt{tree}), and Changes (\texttt{commit}).
\item Objects are content-addressed, by the cryptographic hash of their contents.
\item Links to other objects are embedded, forming a Merkle DAG. This
provides many useful integrity and workflow properties.
\item Most versioning metadata (branches, tags, etc.) are simply pointer references, and thus inexpensive to create and update.
\item Version changes only update references or add objects.
\item Distributing version changes to other users is simply transferring objects and updating remote references.
\end{enumerate}
\subsection{Self-Certified Filesystems - SFS}
SFS \cite{SFS} proposed compelling implementations of both (a) distributed trust chains, and (b) egalitarian shared global namespaces. SFS introduced a technique for building \textit{Self-Certified Filesystems}: addressing remote filesystems using the following scheme
\begin{verbatim}
/sfs/<Location>:<HostID>
\end{verbatim}
where \texttt{Location} is the filesystem's server network address, and:
\begin{verbatim}
HostID = hash(public_key || Location)
\end{verbatim}
Thus the \textit{name} of an SFS file system certifies its server. The user can verify the public key offered by the server, negotiate a shared secret, and secure all traffic. Additionally, all SFS instances share a global namespace where name allocation is cryptographic, not gated by any centralized body.
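
The self-certifying name check can be sketched in Go as follows. The choice of SHA-256, the hex encoding, and the names \texttt{hostID}, \texttt{sfsPath} and \texttt{verify} are assumptions of this sketch, not details taken from SFS itself; the point is only that a client can recompute \texttt{HostID} from the key a server offers and compare it with the name it was given.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hostID computes hash(public_key || Location).
// SHA-256 and hex output are assumptions of this sketch.
func hostID(publicKey []byte, location string) string {
	h := sha256.New()
	h.Write(publicKey)
	h.Write([]byte(location))
	return hex.EncodeToString(h.Sum(nil))
}

// sfsPath builds the self-certifying pathname /sfs/<Location>:<HostID>.
func sfsPath(publicKey []byte, location string) string {
	return "/sfs/" + location + ":" + hostID(publicKey, location)
}

// verify reports whether the key offered by a server matches the HostID in the name.
func verify(offeredKey []byte, location, wantHostID string) bool {
	return hostID(offeredKey, location) == wantHostID
}

func main() {
	key := []byte("example-public-key") // placeholder bytes, not a real key
	id := hostID(key, "sfs.example.org")
	fmt.Println(sfsPath(key, "sfs.example.org"))
	fmt.Println(verify(key, "sfs.example.org", id))                 // true
	fmt.Println(verify([]byte("other-key"), "sfs.example.org", id)) // false
}
```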
\section{Design}
IPFS is a distributed file system which synthesizes successful ideas from previous peer-to-peer systems, including DHTs, BitTorrent, Git, and SFS. The contribution of IPFS is simplifying, evolving, and connecting proven techniques into a single cohesive system, greater than the sum of its parts. IPFS presents a new platform for writing and deploying applications, a new system for distributing and versioning large data, and could evolve the web itself.

IPFS is peer-to-peer; no nodes are privileged. IPFS nodes store IPFS objects in local storage. Nodes connect to each other and transfer objects. These objects represent files and other data structures. The IPFS Protocol is divided into a stack of sub-protocols responsible for different functionality:
\begin{enumerate}
\item \textbf{Identities} - manages node identity generation and verification. Described in Section 3.1.
\item \textbf{Network} - manages connections to other peers, using various underlying network protocols. Configurable. Described in Section 3.2.
\item \textbf{Routing} - maintains information to locate specific peers and objects. Responds to both local and remote queries. Defaults to a DHT, but is swappable. Described in Section 3.3.
\item \textbf{Exchange} - a novel block exchange protocol (BitSwap) that governs efficient block distribution. Modelled as a market, it weakly incentivizes replication. Trade strategies are swappable. Described in Section 3.4.
\item \textbf{Objects} - a Merkle DAG of content-addressed immutable objects with links. Used to represent arbitrary data structures, e.g. file hierarchies and communication systems. Described in Section 3.5.
\item \textbf{Files} - a versioned file system hierarchy, inspired by Git. Described in Section 3.6.
\item \textbf{Naming} - a self-certifying mutable name system. Described in Section 3.7.
\end{enumerate}
These subsystems are not independent; they are integrated and leverage
blended properties. However, it is useful to describe them separately,
building the protocol stack from the bottom up.
\subsection{Identities}
Nodes are identified by a \texttt{NodeId}, the cryptographic hash\footnote{Throughout this document, \textit{hash} and \textit{checksum} refer specifically to cryptographic hash checksums of data.} of a public key, created with S/Kademlia's static crypto puzzle \cite{skademlia}. Nodes store their public and private keys (encrypted with a passphrase). Users are free to instantiate a ``new'' node identity on every launch, though that loses accrued network benefits. Nodes are incentivized to remain the same.
\begin{verbatim}
type NodeId Multihash
type Multihash []byte
// self-describing cryptographic hash digest
type PublicKey []byte
type PrivateKey []byte
// self-describing keys
type Node struct {
  NodeId  NodeId
  PubKey  PublicKey
  PrivKey PrivateKey
}
\end{verbatim}
S/Kademlia based IPFS identity generation:
\begin{verbatim}
difficulty = <integer parameter>
n = Node{}
do {
  n.PubKey, n.PrivKey = PKI.genKeyPair()
  n.NodeId = hash(n.PubKey)
  p = count_preceding_zero_bits(hash(n.NodeId))
} while (p < difficulty)
\end{verbatim}
Upon first connecting, peers exchange public keys, and check: \texttt{hash(other.PublicKey) equals other.NodeId}. If not, the connection is terminated.
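
A runnable sketch of identity generation and the connection-time check is given below. It takes \texttt{NodeId = hash(PubKey)}, so that the check \texttt{hash(other.PublicKey) equals other.NodeId} holds, and applies the puzzle to \texttt{hash(NodeId)}, following S/Kademlia's static puzzle. Ed25519 keys, plain SHA-256 in place of a multihash, and the function names \texttt{generate}, \texttt{verifyPeer} and \texttt{leadingZeroBits} are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// Node mirrors the struct above; concrete key types are an assumption.
type Node struct {
	NodeId  []byte
	PubKey  ed25519.PublicKey
	PrivKey ed25519.PrivateKey
}

// leadingZeroBits counts the zero bits before the first one bit in b.
func leadingZeroBits(b []byte) int {
	n := 0
	for _, x := range b {
		if x != 0 {
			return n + bits.LeadingZeros8(x)
		}
		n += 8
	}
	return n
}

// generate draws key pairs until hash(NodeId) has at least
// `difficulty` leading zero bits.
func generate(difficulty int) (Node, error) {
	for {
		pub, priv, err := ed25519.GenerateKey(rand.Reader)
		if err != nil {
			return Node{}, err
		}
		id := sha256.Sum256(pub)
		puzzle := sha256.Sum256(id[:])
		if leadingZeroBits(puzzle[:]) >= difficulty {
			return Node{NodeId: id[:], PubKey: pub, PrivKey: priv}, nil
		}
	}
}

// verifyPeer is the connection-time check: hash(PublicKey) must equal NodeId.
func verifyPeer(pub ed25519.PublicKey, nodeID []byte) bool {
	id := sha256.Sum256(pub)
	return bytes.Equal(id[:], nodeID)
}

func main() {
	n, err := generate(8) // low difficulty so the example finishes quickly
	if err != nil {
		panic(err)
	}
	fmt.Println(verifyPeer(n.PubKey, n.NodeId)) // true
}
```

Each extra bit of difficulty doubles the expected number of key pairs an attacker must generate per identity, which is what makes mass identity creation expensive.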
\paragraph{Note on Cryptographic Functions} Rather than locking the system to a particular set of function choices, IPFS favors self-describing values. Hash digest values are ``multihashes'', a format including a short header identifying the hash function used, and the digest length in bytes. Example:
\begin{verbatim}
<function code><digest length><digest bytes>
\end{verbatim}
This allows the system to (a) choose the best function for the use case (e.g. stronger security vs faster performance), and (b) evolve as function choices change. Self-describing values allow using different parameter choices compatibly.
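
The layout above can be illustrated with a small encoder and decoder. This sketch assumes one-byte headers, which is enough for common digests, and uses \texttt{0x12} as the function code for SHA2-256 as in the published multihash table; the helper names \texttt{encode} and \texttt{decode} are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// codeSHA2_256 is the function code for sha2-256 in the multihash table.
const codeSHA2_256 = 0x12

// encode prepends <function code><digest length> to the digest bytes.
// One-byte headers are a simplification made in this sketch.
func encode(code byte, digest []byte) ([]byte, error) {
	if len(digest) > 127 {
		return nil, errors.New("digest too long for a one-byte length")
	}
	out := []byte{code, byte(len(digest))}
	return append(out, digest...), nil
}

// decode splits a multihash into code and digest, checking the length header.
func decode(mh []byte) (byte, []byte, error) {
	if len(mh) < 2 || int(mh[1]) != len(mh)-2 {
		return 0, nil, errors.New("malformed multihash")
	}
	return mh[0], mh[2:], nil
}

func main() {
	sum := sha256.Sum256([]byte("hello"))
	mh, err := encode(codeSHA2_256, sum[:])
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", mh[:2]) // 1220: sha2-256, 32-byte digest
}
```

Because the header travels with the digest, a reader can pick the right verification function without any out-of-band agreement.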
\subsection{Network}
IPFS nodes communicate regularly with hundreds of other nodes in the network, across the wide internet. IPFS can use any reliable transport protocol, and is best suited for WebRTC DataChannels \cite{WebRTC} (for browser connectivity) or uTP \cite{uTP} (LEDBAT \cite{LEDBAT}). IPFS also uses the ICE NAT traversal techniques \cite{ICE} to increase connectivity between peers.
\begin{itemize}
\item \textbf{Reliability:} IPFS can provide reliability if underlying networks do not provide it, using uTP or SCTP.
\item \textbf{Integrity:} optionally checks integrity of messages using a hash checksum.
\item \textbf{Authenticity:} optionally checks authenticity of messages using HMAC with sender's public key.
\end{itemize}
\subsection{Routing}
IPFS nodes require a routing system that can find (a) other peers' network addresses and (b) peers who can serve particular objects. IPFS achieves this using a DSHT based on S/Kademlia and Coral, using the properties discussed in Section 2.1. The size of objects and use patterns of IPFS are similar to Coral \cite{Coral} and Mainline \cite{Mainline}, so references are stored in the DHT instead of entire blocks. References are the \texttt{NodeIds} of peers who can serve the block.

The interface of this DSHT is:
\begin{verbatim}
routing.findPeer(node NodeId)
  // gets a particular peer's network address
routing.findValuePeers(key Multihash, min int)
  // gets a number of peers serving a value
routing.setValue(key []byte, value []byte)
  // stores a small metadata value in the DHT
routing.provideValue(key Multihash)
  // announces that this node can serve a value
\end{verbatim}
Note: different use cases will call for substantially different routing systems (e.g. a DHT in a wide network, a static hash table in a local network). Thus the IPFS routing system can be swapped for one that fits the users' needs. As long as the interface above is met, the rest of the system will continue to function.
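
To make the swappability concrete, the interface can be written as a Go interface with a trivial in-memory implementation standing in for the static hash table case. The return types, the string-based \texttt{NodeId} and \texttt{Multihash} stand-ins, and the \texttt{staticTable} type are assumptions of this sketch, since the interface above does not specify them.

```go
package main

import (
	"errors"
	"fmt"
)

// String stand-ins for the Multihash-based types used in the text.
type NodeId string
type Multihash string

// Routing mirrors the four calls listed above; return types are assumed.
type Routing interface {
	FindPeer(node NodeId) (string, error)
	FindValuePeers(key Multihash, minPeers int) ([]NodeId, error)
	SetValue(key, value []byte)
	ProvideValue(key Multihash)
}

// staticTable is a single-process table, e.g. for a small local network.
type staticTable struct {
	self      NodeId
	addrs     map[NodeId]string
	providers map[Multihash][]NodeId
	values    map[string][]byte
}

var _ Routing = (*staticTable)(nil) // compile-time interface check

func newStaticTable(self NodeId, addr string) *staticTable {
	return &staticTable{
		self:      self,
		addrs:     map[NodeId]string{self: addr},
		providers: map[Multihash][]NodeId{},
		values:    map[string][]byte{},
	}
}

func (s *staticTable) FindPeer(node NodeId) (string, error) {
	addr, ok := s.addrs[node]
	if !ok {
		return "", errors.New("peer not found")
	}
	return addr, nil
}

func (s *staticTable) FindValuePeers(key Multihash, minPeers int) ([]NodeId, error) {
	peers := s.providers[key]
	if len(peers) < minPeers {
		return peers, errors.New("fewer providers than requested")
	}
	return peers, nil
}

func (s *staticTable) SetValue(key, value []byte) {
	s.values[string(key)] = append([]byte(nil), value...) // store a copy
}

func (s *staticTable) ProvideValue(key Multihash) {
	for _, p := range s.providers[key] {
		if p == s.self {
			return // already announced
		}
	}
	s.providers[key] = append(s.providers[key], s.self)
}

func main() {
	rt := newStaticTable("QmSelf", "10.0.0.1:4001")
	rt.ProvideValue("QmBlock")
	peers, err := rt.FindValuePeers("QmBlock", 1)
	fmt.Println(peers, err) // [QmSelf] <nil>
}
```

Any type satisfying \texttt{Routing}, whether DHT-backed or static, could be handed to the layers above without changing them.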
\subsection{Block Exchange - BitSwap Protocol}
The exchange of data in IPFS happens by exchanging blocks with peers using a
BitTorrent inspired protocol: BitSwap. Like BitTorrent, BitSwap peers are
looking to acquire a set of blocks, and have blocks to offer in exchange.
Unlike BitTorrent, BitSwap is not limited to the blocks in one torrent.
BitSwap operates as a persistent marketplace where nodes can acquire the
blocks they need, regardless of what files the blocks are part of. The
blocks could come from completely unrelated files in the filesystem;
nodes nonetheless come together to barter in the marketplace.

While the notion of a barter system implies a virtual currency could be
created, this would require a global ledger to track ownership
and transfer of the currency. This can be implemented as a BitSwap Strategy, and will be explored in a future paper.

In the base case, BitSwap nodes have to provide direct value to each other
in the form of blocks. This works fine when the distribution of blocks across
nodes is complementary, so that each node has what the other wants. This will
seldom be the case. Instead, it is more likely that nodes must \textit{work}
for their blocks. In the case that a node has nothing that its peers want (or
nothing at all), it seeks the pieces its peers want, with lower
priority than what the node wants itself. This incentivizes nodes to cache and
disseminate rare pieces, even if they are not interested in them directly.
\subsubsection{BitSwap Credit}
The protocol must also incentivize nodes to seed when they do not need
anything in particular, as they might have the blocks others want. Thus,
BitSwap nodes send blocks to their peers optimistically, expecting the debt to
be repaid. However, the protocol must protect against leeches (free-loading nodes that never share). A simple credit-like system solves the problem:
\begin{enumerate}
\item Peers track their balance (in bytes verified) with other nodes.
\item Peers send blocks to debtor peers probabilistically, according to
a function that falls as debt increases.
\end{enumerate}
Note that if a peer decides not to send, the peer subsequently ignores the
other node for an \texttt{ignore\_cooldown} timeout. This prevents senders
from trying to game the probability by just causing more dice-rolls.
(The BitSwap default is 10 seconds.)
\subsubsection{BitSwap Strategy}
The differing strategies that BitSwap peers might employ have wildly different effects on the performance of the exchange as a whole. In BitTorrent, while a standard strategy is specified (tit-for-tat), a variety of others have been implemented, ranging from BitTyrant \cite{BitTyrant} (sharing the least possible), to BitThief \cite{BitThief} (exploiting a vulnerability and never sharing), to PropShare \cite{PropShare} (sharing proportionally). A range of strategies (good and malicious) could similarly be implemented by BitSwap peers. The choice of function, then, should aim to:
\begin{enumerate}
\item maximize the trade performance for the node, and the whole exchange
\item prevent freeloaders from exploiting and degrading the exchange
\item be effective with and resistant to other, unknown
strategies
\item be lenient to trusted peers
\end{enumerate}
The exploration of the space of such strategies is future work.
One choice of function that works in practice is a sigmoid, scaled by a
\textit{debt ratio}:

Let the \textit{debt ratio} $ r $ between a node and its peer be:
\[ r = \dfrac{\texttt{bytes\_sent}}{\texttt{bytes\_recv} + 1} \]
Given $r$, let the probability of sending to a debtor be:
\[ P\Big( \; \text{send} \; | \; r \;\Big) = 1 - \dfrac{1}{1 + \exp(6-3r)} \]
\begin{figure}
\centering
\begin{tikzpicture}[domain=0:4]
  \draw[->] (-0,0) -- (4.2,0) node[right] {$r$};
  \draw[->] (0,-0) -- (0,1.20) node[above] {$P(\;\text{send}\;|\;r\;)$};
  %ticks
  \foreach \x in {0,...,4}
    \draw (\x,1pt) -- (\x,-3pt)
    node[anchor=north] {\x};
  \foreach \y in {1,...,1}
    \draw (1pt,\y) -- (-3pt,\y)
    node[anchor=east] {\y};
  \draw[color=red] plot[] function{1 - 1/(1+exp(6-3*x))};
\end{tikzpicture}
\caption{Probability of sending as $r$ increases}
\label{fig:psending-graph}
\end{figure}
As Figure \ref{fig:psending-graph} shows, this function drops off quickly as the nodes'
\textit{debt ratio} surpasses twice the established credit.

The \textit{debt ratio} is a measure of trust:
lenient to debts between nodes that have previously exchanged lots of data
successfully, and merciless to unknown, untrusted nodes. This
(a) provides resistance to attackers who would create lots of new nodes
(Sybil attacks),
(b) protects previously successful trade relationships, even if one of the
nodes is temporarily unable to provide value, and
(c) eventually chokes relationships that have deteriorated until they
improve.
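
The debt ratio and the sigmoid can be evaluated directly. The sketch below is a straightforward transcription of the two formulas; the function names and the explicit \texttt{roll} argument (a uniform sample in $[0,1)$ supplied by the caller, so that the decision is easy to test) are assumptions of this example.

```go
package main

import (
	"fmt"
	"math"
)

// debtRatio computes r = bytes_sent / (bytes_recv + 1).
func debtRatio(bytesSent, bytesRecv uint64) float64 {
	return float64(bytesSent) / float64(bytesRecv+1)
}

// sendProbability computes P(send | r) = 1 - 1/(1 + exp(6 - 3r)).
func sendProbability(r float64) float64 {
	return 1 - 1/(1+math.Exp(6-3*r))
}

// shouldSend makes one probabilistic decision; roll must be uniform in [0,1).
func shouldSend(r, roll float64) bool {
	return roll < sendProbability(r)
}

func main() {
	for _, r := range []float64{0, 1, 2, 3, 4} {
		fmt.Printf("r=%.0f  P(send|r)=%.3f\n", r, sendProbability(r))
	}
	// A peer that was sent 100 bytes and returned 49 has r = 100/50 = 2.
	fmt.Println(debtRatio(100, 49), sendProbability(2)) // 2 0.5
}
```

At $r = 2$ the exponent is zero, so the probability is exactly one half; this is the midpoint of the drop-off visible in the figure.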
% \begin{center}
% \begin{tabular}{ >{$}c<{$} >{$}c<{$}}
% P(\;send\;|\quad r) \;\;\;\;\;& \\
% \hline
% \hline
% P(\;send\;|\;0.0) =& 1.00 \\
% P(\;send\;|\;0.5) =& 1.00 \\
% P(\;send\;|\;1.0) =& 0.98 \\
% P(\;send\;|\;1.5) =& 0.92 \\
% P(\;send\;|\;2.0) =& 0.73 \\
% P(\;send\;|\;2.5) =& 0.38 \\
% P(\;send\;|\;3.0) =& 0.12 \\
% P(\;send\;|\;3.5) =& 0.03 \\
% P(\;send\;|\;4.0) =& 0.01 \\
% P(\;send\;|\;4.5) =& 0.00 \\
% \end{tabular}
% \end{center}
% TODO look into computing share of the bandwidth, as described in propshare.
\subsubsection{BitSwap Ledger}
BitSwap nodes keep ledgers accounting the transfers with other nodes. This allows nodes to keep track of history, and to avoid tampering. When activating a connection, BitSwap nodes exchange their ledger information. If it does not match exactly, the ledger is reinitialized from scratch, losing the accrued credit or debt. It is possible for malicious nodes to purposefully ``lose'' the Ledger, hoping to erase debts. It is unlikely that nodes will have accrued enough debt to warrant also losing the accrued trust; however, the partner node is free to count it as \textit{misconduct} (discussed later).
\begin{verbatim}
type Ledger struct {
  owner      NodeId
  partner    NodeId
  bytes_sent int
  bytes_recv int
  timestamp  Timestamp
}
\end{verbatim}
Nodes are free to keep the ledger history, though it is not necessary for
correct operation. Only the current ledger entries are useful. Nodes are
also free to garbage collect ledgers as necessary, starting with the less
useful ledgers: the old (peers may not exist anymore) and small.
\subsubsection{BitSwap Specification}
BitSwap nodes follow a simple protocol.
\begin{verbatim}
// Additional state kept
type BitSwap struct {
  ledgers map[NodeId]Ledger
  // Ledgers known to this node, inc inactive
  active map[NodeId]Peer
  // currently open connections to other nodes
  need_list []Multihash
  // checksums of blocks this node needs
  have_list []Multihash
  // checksums of blocks this node has
}
type Peer struct {
  nodeid NodeId
  ledger Ledger
  // Ledger between the node and this peer
  last_seen Timestamp
  // timestamp of last received message
  want_list []Multihash
  // checksums of all blocks wanted by peer
  // includes blocks wanted by peer's peers
}
// Protocol interface:
interface Peer {
  open (nodeid :NodeId, ledger :Ledger);
  send_want_list (want_list :WantList);
  send_block (block :Block) -> (complete :Bool);
  close (final :Bool);
}
\end{verbatim}
Sketch of the lifetime of a peer connection:
\begin{enumerate}
\item Open: peers send \texttt{ledgers} until they agree.
\item Sending: peers exchange \texttt{want\_lists} and \texttt{blocks}.
\item Close: peers deactivate a connection.
\item Ignored: (special) a peer is ignored (for the duration of a timeout)
if a node's strategy avoids sending.
\end{enumerate}
\paragraph{Peer.open(NodeId, Ledger)}
When connecting, a node initializes a connection with a
\texttt{Ledger}, either stored from a past connection or a new one
zeroed out. Then, it sends an \texttt{Open} message with the \texttt{Ledger} to the peer.

Upon receiving an \texttt{Open} message, a peer chooses whether to activate
the connection. If, according to the receiver's \texttt{Ledger}, the sender
is not a trusted agent (transmission below zero, or large outstanding debt) the
receiver may opt to ignore the request. This should be done probabilistically
with an \texttt{ignore\_cooldown} timeout, so as to allow errors to be corrected
and attackers to be thwarted.

If activating the connection, the receiver initializes a \texttt{Peer} object with the
local version of the \texttt{Ledger} and sets the \texttt{last\_seen}
timestamp. Then, it compares the received
\texttt{Ledger} with its own. If they match exactly, the connection has
opened. If they do not match, the peer creates a new zeroed out
\texttt{Ledger}, and sends it.
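
The ledger comparison at the end of \texttt{open} can be sketched as below. The text does not spell out what ``match exactly'' means for two ledgers kept by opposite parties; this example assumes it means the counters mirror each other (one side's bytes sent equal the other side's bytes received), and it omits timestamps. The names \texttt{mirrors} and \texttt{onOpen} are hypothetical.

```go
package main

import "fmt"

type NodeId string

// Ledger follows the struct in the text, without the timestamp.
type Ledger struct {
	Owner     NodeId
	Partner   NodeId
	BytesSent uint64
	BytesRecv uint64
}

// mirrors reports whether two ledgers describe the same history
// from opposite sides (an assumed reading of "match exactly").
func mirrors(local, remote Ledger) bool {
	return local.Owner == remote.Partner &&
		local.Partner == remote.Owner &&
		local.BytesSent == remote.BytesRecv &&
		local.BytesRecv == remote.BytesSent
}

// onOpen returns the ledger to use for the connection and whether
// it had to be reset to zero because the two sides disagreed.
func onOpen(local, remote Ledger) (Ledger, bool) {
	if mirrors(local, remote) {
		return local, false
	}
	return Ledger{Owner: local.Owner, Partner: local.Partner}, true
}

func main() {
	mine := Ledger{Owner: "A", Partner: "B", BytesSent: 100, BytesRecv: 40}
	theirs := Ledger{Owner: "B", Partner: "A", BytesSent: 40, BytesRecv: 100}
	_, reset := onOpen(mine, theirs)
	fmt.Println(reset) // false: the histories agree

	theirs.BytesRecv = 0 // the partner "lost" its record of what it received
	fresh, reset := onOpen(mine, theirs)
	fmt.Println(reset, fresh.BytesSent) // true 0
}
```

Resetting wipes accrued credit as well as debt, which is why deliberately losing a ledger is rarely worthwhile for a peer with an established history.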
\paragraph{Peer.send\_want\_list(WantList)}
While the connection is open, nodes advertise their
\texttt{want\_list} to all connected peers. This is done (a) upon opening the
connection, (b) after a randomized periodic timeout, (c) after a change in
the \texttt{want\_list} and (d) after receiving a new block.

Upon receiving a \texttt{want\_list}, a node stores it. Then, it checks whether
it has any of the wanted blocks. If so, it sends them according to the
\textit{BitSwap Strategy} above.
\paragraph{Peer.send\_block(Block)}
Sending a block is straightforward. The node simply transmits the block of
data. Upon receiving all the data, the receiver computes the Multihash
checksum to verify it matches the expected one, and returns confirmation.

Upon finalizing the correct transmission of a block, the receiver moves the
block from \texttt{need\_list} to \texttt{have\_list}, and both the receiver
and sender update their ledgers to reflect the additional bytes transmitted.

If a transmission verification fails, the receiver instead \textit{penalizes}
the sender. Both receiver and sender should update their ledgers accordingly,
though the sender is either malfunctioning or attacking the receiver. Note that
BitSwap expects to operate on a reliable transmission channel, so data errors
(which could lead to incorrect penalization of an honest sender) are
expected to be caught before the data is given to BitSwap. IPFS uses the uTP
protocol.
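
The receiver's side of \texttt{send\_block} can be sketched as follows. Plain hex-encoded SHA-256 stands in for the Multihash checksum, and since the text leaves the form of the penalty open, this example only counts failed verifications in a hypothetical \texttt{penalties} field; the \texttt{Node} type and the \texttt{receiveBlock} method are likewise assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type Ledger struct{ BytesSent, BytesRecv uint64 }

// Node holds the receiver-side state touched by send_block.
type Node struct {
	ledger    Ledger
	needList  map[string]bool
	haveList  map[string][]byte
	penalties int // hypothetical penalty counter; the text leaves this open
}

func newNode() *Node {
	return &Node{needList: map[string]bool{}, haveList: map[string][]byte{}}
}

// checksum is hex SHA-256, standing in for a Multihash.
func checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// receiveBlock verifies data against the expected checksum. On success it
// moves the block from the need list to the have list and credits the
// ledger; on failure it records a penalty and keeps nothing.
func (n *Node) receiveBlock(expected string, data []byte) bool {
	if checksum(data) != expected {
		n.penalties++
		return false
	}
	delete(n.needList, expected)
	n.haveList[expected] = data
	n.ledger.BytesRecv += uint64(len(data))
	return true
}

func main() {
	nd := newNode()
	key := checksum([]byte("block data"))
	nd.needList[key] = true
	fmt.Println(nd.receiveBlock(key, []byte("block data"))) // true
	fmt.Println(nd.receiveBlock(key, []byte("tampered")))   // false
	fmt.Println(nd.ledger.BytesRecv, nd.penalties)          // 10 1
}
```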
\paragraph{Peer.close(Bool)}
The \texttt{final} parameter to \texttt{close} signals whether the intention
to tear down the connection is the sender's or not. If false, the receiver
may opt to re-open the connection immediately. This avoids premature
closes.

A peer connection should be closed under two conditions:
\begin{itemize}
\item a \texttt{silence\_wait} timeout has expired without receiving any
messages from the peer (default BitSwap uses 30 seconds).
The node issues \texttt{Peer.close(false)}.
\item the node is exiting and BitSwap is being shut down.
In this case, the node issues \texttt{Peer.close(true)}.
\end{itemize}
After a \texttt{close} message, both receiver and sender tear down the
connection, clearing any state stored. The \texttt{Ledger} may be stored for
the future, if it is useful to do so.
\paragraph{Notes}
\begin{itemize}
\item Non-\texttt{open} messages on an inactive connection should be ignored.
In case of a \texttt{send\_block} message, the receiver may check
the block to see if it is needed and correct, and if so, use it.
Regardless, all such out-of-order messages trigger a
\texttt{close(false)} message from the receiver, to force
re-initialization of the connection.
\end{itemize}
% TODO: Rate Limiting / Node Silencing
  416. \subsection{Object Merkle DAG}
  417. The DHT and BitSwap allow IPFS to form a massive peer-to-peer system for storing and distributing blocks quickly and robustly to users. On top of these, IPFS builds a Merkle DAG, a directed acyclic graph where links between objects are cryptographic hashes of the targets embedded in the sources. This is a generalization of which the Git data structure is a special case, and upon which it can be built. Merkle DAGs provide IPFS many useful properties, including:
  418. \begin{enumerate}
  419. \item \textbf{Content Addressing:} all content is uniquely identified by its
  420. \texttt{multihash} checksum, \textbf{including links}.
  421. \item \textbf{Tamper resistance:} all content is verified with its checksum.
  422. If data is tampered with or corrupted, IPFS detects it.
\item \textbf{Deduplication:} all objects that hold the exact same content
are equal, and only stored once. This is particularly useful with
index objects, such as Git \texttt{trees} and \texttt{commits}, or common portions of data.
  426. \end{enumerate}
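These three properties can be illustrated with a toy in-memory store. The sketch below is not the IPFS implementation: the \texttt{Store} type and its methods are hypothetical, and plain SHA-256 in hex stands in for \texttt{multihash}.

\begin{verbatim}
package main

import (
  "crypto/sha256"
  "encoding/hex"
  "errors"
  "fmt"
)

// Store maps a content hash to the bytes that produced it.
type Store map[string][]byte

// key uses hex SHA-256 as a stand-in for multihash.
func key(data []byte) string {
  sum := sha256.Sum256(data)
  return hex.EncodeToString(sum[:])
}

// Put stores data under its hash. Identical content maps
// to the same key, so it is stored only once.
func (s Store) Put(data []byte) string {
  k := key(data)
  s[k] = data
  return k
}

// Get returns the data for k after re-checking its hash.
func (s Store) Get(k string) ([]byte, error) {
  data, ok := s[k]
  if !ok {
    return nil, errors.New("not found")
  }
  if key(data) != k {
    return nil, errors.New("checksum mismatch")
  }
  return data, nil
}

func main() {
  s := Store{}
  k1 := s.Put([]byte("hello"))
  k2 := s.Put([]byte("hello"))
  fmt.Println(k1 == k2, len(s)) // same key, one entry
  s[k1] = []byte("tampered")    // corrupt the stored bytes
  _, err := s.Get(k1)
  fmt.Println(err)
}
\end{verbatim}

Adding the same bytes twice yields one key and one stored entry, and a read of corrupted bytes fails the hash check instead of returning bad data.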
  427. The IPFS Object format is:
  428. \begin{verbatim}
type IPFSLink struct {
  Name string    // name or alias of this link
  Hash Multihash // cryptographic hash of target
  Size int       // total size of target
}

type IPFSObject struct {
  links []IPFSLink // array of links
  data  []byte     // opaque content data
}
  443. \end{verbatim}
The IPFS Merkle DAG is an extremely flexible way to store data. The only requirements are that object references be (a) content addressed, and (b) encoded in the format above. IPFS grants applications complete control over the data field; applications can use any custom data format they choose, which IPFS may not understand. The separate in-object link table allows IPFS to:
  445. \begin{itemize}
  446. \item List all object references in an object. For example:
  447. \begin{verbatim}
  448. > ipfs ls /XLZ1625Jjn7SubMDgEyeaynFuR84ginqvzb
  449. XLYkgq61DYaQ8NhkcqyU7rLcnSa7dSHQ16x 189458 less
  450. XLHBNmRQ5sJJrdMPuu48pzeyTtRo39tNDR5 19441 script
  451. XLF4hwVHsVuZ78FZK6fozf8Jj9WEURMbCX4 5286 template
  452. <object multihash> <object size> <link name>
  453. \end{verbatim}
\item Resolve string path lookups, such as \texttt{foo/bar/baz}. Given an object, IPFS resolves the first path component to a hash in the object's link table, fetches that second object, and repeats with the next component. Thus, string paths can walk the Merkle DAG no matter what the data formats are.
  455. \item Resolve all objects referenced recursively:
  456. \begin{verbatim}
  457. > ipfs refs --recursive \
  458. /XLZ1625Jjn7SubMDgEyeaynFuR84ginqvzb
  459. XLLxhdgJcXzLbtsLRL1twCHA2NrURp4H38s
  460. XLYkgq61DYaQ8NhkcqyU7rLcnSa7dSHQ16x
  461. XLHBNmRQ5sJJrdMPuu48pzeyTtRo39tNDR5
  462. XLWVQDqxo9Km9zLyquoC9gAP8CL1gWnHZ7z
  463. ...
  464. \end{verbatim}
  465. \end{itemize}
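The path lookup described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the \texttt{Store}, \texttt{Add}, and \texttt{Resolve} names are hypothetical, objects live in an in-memory map instead of being fetched from peers, and hex SHA-256 over an ad hoc serialization stands in for the real encoding and \texttt{multihash}.

\begin{verbatim}
package main

import (
  "crypto/sha256"
  "encoding/hex"
  "fmt"
  "strings"
)

// Link is a named reference to another object.
type Link struct {
  Name string
  Hash string // hex SHA-256, standing in for a multihash
}

// Object holds a link table and opaque data.
type Object struct {
  Links []Link
  Data  []byte
}

// Store maps an object hash to the object.
type Store map[string]Object

// Add stores obj under a hash of its links and data.
func (s Store) Add(obj Object) string {
  h := sha256.New()
  for _, l := range obj.Links {
    fmt.Fprintf(h, "%q %s\n", l.Name, l.Hash)
  }
  h.Write(obj.Data)
  k := hex.EncodeToString(h.Sum(nil))
  s[k] = obj
  return k
}

// Resolve walks path from root, one link name at a time.
func (s Store) Resolve(root, path string) (string, error) {
  cur := root
  for _, name := range strings.Split(path, "/") {
    if name == "" {
      continue // skip empty components
    }
    obj, ok := s[cur]
    if !ok {
      return "", fmt.Errorf("missing object %s", cur)
    }
    next := ""
    for _, l := range obj.Links {
      if l.Name == name {
        next = l.Hash
        break
      }
    }
    if next == "" {
      return "", fmt.Errorf("no link %q", name)
    }
    cur = next
  }
  return cur, nil
}

func main() {
  s := Store{}
  baz := s.Add(Object{Data: []byte("baz")})
  bar := s.Add(Object{Links: []Link{{"baz", baz}}})
  foo := s.Add(Object{Links: []Link{{"bar", bar}}})
  got, err := s.Resolve(foo, "bar/baz")
  fmt.Println(got == baz, err)
}
\end{verbatim}

\texttt{Resolve} never inspects the \texttt{Data} field, which is why path traversal works regardless of the data format an application chooses.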
  466. A raw data field and a common link structure are the necessary components for constructing arbitrary data structures on top of IPFS. While it is easy to see how the Git object model fits on top of this DAG, consider these other potential data structures:
  467. (a) key-value stores
  468. (b) traditional relational databases
  469. (c) Linked Data triple stores
  470. (d) linked document publishing systems
  471. (e) linked communications platforms
  472. (f) cryptocurrency blockchains.
  473. These can all be modeled on top of the IPFS Merkle DAG, which allows any of these systems to use IPFS as a transport protocol for more complex applications.
  474. \subsubsection{Paths}
  475. IPFS objects can be traversed with a string path API. Paths work as they do in traditional UNIX filesystems and the Web. The Merkle DAG links make traversing it easy. Note that full paths in IPFS are of the form:
  476. \begin{verbatim}
  477. # format
  478. /ipfs/<hash-of-object>/<name-path-to-object>
  479. # example
  480. /ipfs/XLYkgq61DYaQ8NhkcqyU7rLcnSa7dSHQ16x/foo.txt
  481. \end{verbatim}
The \texttt{/ipfs} prefix allows mounting into existing systems at a standard mount point without conflict (mount point names are of course configurable). The second path component (first within IPFS) is the hash of an object. This is always the case, as there is no global root. A root object would have the impossible task of handling consistency of millions of objects in a distributed (and possibly disconnected) environment. Instead, we simulate the root with content addressing. All objects are always accessible via their hash. Note that this means that, given three objects in the path \texttt{<foo>/bar/baz}, the last object is accessible through each of the following:
  483. \begin{verbatim}
  484. /ipfs/<hash-of-foo>/bar/baz
  485. /ipfs/<hash-of-bar>/baz
  486. /ipfs/<hash-of-baz>
  487. \end{verbatim}
  488. \subsubsection{Local Objects}
  489. IPFS clients require some \textit{local storage}, an external system
  490. on which to store and retrieve local raw data for the objects IPFS manages.
  491. The type of storage depends on the node's use case.
  492. In most cases, this is simply a portion of disk space (either managed by
  493. the native filesystem, by a key-value store such as leveldb \cite{leveldb}, or
  494. directly by the IPFS client). In others, non-persistent caches for example,
  495. this storage is just a portion of RAM.
  496. Ultimately, all blocks available in IPFS are in some node's
  497. \textit{local storage}. When users request objects, they are
  498. found, downloaded, and stored locally, at least temporarily. This provides
  499. fast lookup for some configurable amount of time thereafter.
  500. \subsubsection{Object Pinning}
  501. Nodes who wish to ensure the survival of particular objects can do so by
  502. \texttt{pinning} the objects. This ensures the objects are kept in the node's
\textit{local storage}. Pinning can be done recursively, to pin down all
linked descendant objects as well. All objects pointed to are then stored
  505. locally. This is particularly useful for nodes wishing to keep all their own
  506. files, or backup references to others. This also makes IPFS a Web where links
  507. are \textit{permanent}, and Objects can ensure the survival of others they
  508. point to.
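Recursive pinning amounts to a graph traversal from the pinned object. The sketch below assumes a simplified, hypothetical \texttt{Object} type that only records the hashes it links to, and uses short strings in place of real hashes.

\begin{verbatim}
package main

import "fmt"

// Object lists the hashes it links to (simplified).
type Object struct {
  Links []string
}

// PinRecursive marks root and everything reachable from it
// as pinned. store maps hash to object.
func PinRecursive(store map[string]Object,
  pinned map[string]bool, root string) {
  if pinned[root] {
    return // shared subgraphs are walked only once
  }
  pinned[root] = true
  for _, child := range store[root].Links {
    PinRecursive(store, pinned, child)
  }
}

func main() {
  store := map[string]Object{
    "root": {Links: []string{"a", "b"}},
    "a":    {Links: []string{"c"}},
    "b":    {Links: []string{"c"}},
    "c":    {},
    "d":    {}, // not reachable from root
  }
  pinned := map[string]bool{}
  PinRecursive(store, pinned, "root")
  fmt.Println(len(pinned), pinned["d"])
}
\end{verbatim}

Objects outside the pin set (here \texttt{d}) remain ordinary cache entries that local storage may discard.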
  509. \subsubsection{Publishing Objects}
IPFS is globally distributed. It is designed to allow the files of millions of users to coexist. The \textbf{DHT} with content-hash addressing allows publishing objects in a fair, secure, and entirely distributed way. Anyone can publish an object by simply adding its key to the DHT, adding themselves as a peer, and giving other users the object's path. Note that Objects are essentially immutable, just like in Git. New versions hash differently, and thus are new objects. Tracking versions is the job of additional versioning objects.
  511. \subsubsection{Object-level Cryptography}
IPFS is equipped to handle object-level cryptographic operations. An encrypted or signed object is wrapped in a special frame that allows decryption or verification of the raw bytes.
  513. \begin{verbatim}
type EncryptedObject struct {
  Object []byte // raw object data encrypted
  Tag    []byte // optional tag for encryption groups
}

type SignedObject struct {
  Object    []byte    // raw object data signed
  Signature []byte    // hmac signature
  PublicKey Multihash // multihash identifying key
}
  528. \end{verbatim}
  529. Cryptographic operations change the object's hash (defining a different object, as it should). IPFS automatically verifies signatures, and can decrypt data with user-specified keychains. Links of encrypted objects are protected as well, making traversal impossible without a decryption key. It is possible to have a parent object encrypted under one key, and a child under another or not at all. This allows securing links to shared objects.
  530. \subsection{Files}
  531. IPFS also defines a set of objects for modeling a versioned filesystem on top of the Merkle DAG. This object model is similar to Git's:
  532. \begin{enumerate}
  533. \item \texttt{block}: a variable-size block of data.
  534. \item \texttt{list}: a collection of blocks or other lists.
  535. \item \texttt{tree}: a collection of blocks, lists, or other trees.
  536. \item \texttt{commit}: a snapshot in the version history of a tree.
  537. \end{enumerate}
I hoped to use the Git object formats exactly, but had to depart to introduce certain features useful in a distributed filesystem, namely (a) fast size lookups (aggregate byte sizes have been added to objects), (b) large file deduplication (adding a \texttt{list} object), and (c) embedding of \texttt{commits} into \texttt{trees}. However, IPFS File objects are close enough to Git that conversion between the two is possible. Also, a set of Git objects can be introduced to convert without losing any information (UNIX file permissions, etc.).
Notation: File object formats below use JSON. Note that this structure is actually binary encoded using protobufs, though IPFS includes import/export to JSON.
  540. \subsubsection{File Object: \texttt{blob}}
  541. The \texttt{blob} object contains an addressable unit of data, and
  542. represents a file. IPFS Blocks are like Git blobs or filesystem data blocks. They store the users' data. Note that IPFS files can be represented by both \texttt{lists} and \texttt{blobs}. Blobs have no links.
  543. \begin{verbatim}
  544. {
  545. "data": "some data here",
  546. // blobs have no links
  547. }
  548. \end{verbatim}
  549. \subsubsection{File Object: \texttt{list}}
  550. The \texttt{list} object represents a large or de-duplicated file made up of
  551. several IPFS \texttt{blobs} concatenated together. \texttt{lists} contain
  552. an ordered sequence of \texttt{blob} or \texttt{list} objects.
  553. In a sense, the IPFS \texttt{list} functions like a filesystem file with
indirect blocks. Since \texttt{lists} can contain other \texttt{lists}, topologies including linked lists and balanced trees are possible. Directed graphs where the same node appears in multiple places allow in-file deduplication. Of course, cycles are not possible, as enforced by hash addressing.
  555. \begin{verbatim}
  556. {
  557. "data": ["blob", "list", "blob"],
  558. // lists have an array of object types as data
  559. "links": [
  560. { "hash": "XLYkgq61DYaQ8NhkcqyU7rLcnSa7dSHQ16x",
  561. "size": 189458 },
  562. { "hash": "XLHBNmRQ5sJJrdMPuu48pzeyTtRo39tNDR5",
  563. "size": 19441 },
  564. { "hash": "XLWVQDqxo9Km9zLyquoC9gAP8CL1gWnHZ7z",
  565. "size": 5286 }
  566. // lists have no names in links
  567. ]
  568. }
  569. \end{verbatim}
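Reading a file stored as a \texttt{list} means concatenating its blobs in order, descending into nested lists. The following sketch assumes a hypothetical in-memory \texttt{Node} layout and short strings in place of hashes; it is not the IPFS wire format.

\begin{verbatim}
package main

import "fmt"

// Node is a simplified file object: a blob carries Data,
// a list carries child hashes in Links.
type Node struct {
  Type  string // "blob" or "list"
  Data  []byte
  Links []string
}

// Cat concatenates, in order, the blobs reachable from h.
func Cat(store map[string]Node, h string) ([]byte, error) {
  n, ok := store[h]
  if !ok {
    return nil, fmt.Errorf("missing object %s", h)
  }
  if n.Type == "blob" {
    return n.Data, nil
  }
  var out []byte
  for _, child := range n.Links {
    part, err := Cat(store, child)
    if err != nil {
      return nil, err
    }
    out = append(out, part...)
  }
  return out, nil
}

func main() {
  store := map[string]Node{
    "b1": {Type: "blob", Data: []byte("ab")},
    "b2": {Type: "blob", Data: []byte("cd")},
    "l1": {Type: "list", Links: []string{"b1", "b2"}},
    // b1 appears twice: in-file deduplication
    "l2": {Type: "list", Links: []string{"l1", "b1"}},
  }
  data, err := Cat(store, "l2")
  fmt.Println(string(data), err)
}
\end{verbatim}

The blob \texttt{b1} is stored once but contributes to the file twice, which is the in-file deduplication described above.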
  570. \begin{figure}
  571. \centering
  572. \begin{tikzpicture}[->,>=stealth',auto,thick,
  573. minimum height=2em,minimum width=5em]
  574. \tikzstyle{ghost}=[rectangle,rounded corners=.8ex];
  575. \tikzstyle{block}=[rectangle,draw,fill=blue!20,rounded corners=.8ex];
  576. \tikzstyle{list}=[rectangle,draw,fill=cyan!20,rounded corners=.8ex];
  577. \tikzstyle{tree}=[rectangle,draw,fill=green!20,rounded corners=.8ex];
  578. \tikzstyle{commit}=[rectangle,draw,fill=magenta!20,rounded corners=.8ex];
  579. \tikzstyle{every path}=[draw]
  580. \node[commit] (ccc111) {ccc111};
  581. \node[tree] (ttt111) [below=3em of ccc111] {ttt111};
  582. \node[tree] (ttt222) [below left=3em and 3em of ttt111] {ttt222};
  583. \node[tree] (ttt333) [below=3em of ttt111] {ttt333};
  584. \node[ghost] (ghost1) [below right=3em and 3em of ttt111] {};
  585. \node[list] (lll111) [below=3em of ttt333] {lll111};
  586. \node[block] (bbb111) [below=3em of ttt222] {bbb111};
  587. \node[block] (bbb222) [below right=3em and 3em of ttt333] {bbb222};
  588. \node[block] (bbb333) [below left=3em and 3em of lll111] {bbb333};
  589. \node[block] (bbb444) [below=3em of lll111] {bbb444};
  590. \node[block] (bbb555) [below right=3em and 3em of lll111] {bbb555};
  591. \path[every node/.style={font=\sffamily\small}]
  592. (ccc111) edge[out=-90,in=90] (ttt111)
  593. (ttt111) edge[out=-90,in=90] (ttt222)
  594. edge[out=-90,in=90] (ttt333)
  595. to [out=-90,in=90] (ghost1)
  596. to [out=-90,in=90] (bbb222)
  597. (ttt222) edge[out=-90,in=90] (bbb111)
  598. (ttt333) edge[out=-90,in=90] (lll111)
  599. edge[out=-90,in=90] (bbb222)
  600. (lll111) edge[out=-90,in=90] (bbb333)
  601. edge[out=-90,in=90] (bbb444)
  602. edge[out=-90,in=90] (bbb555)
  603. ;
  604. \end{tikzpicture}
  605. \caption{Sample Object Graph} \label{fig:sample-object-graph}
  606. \begin{verbatim}
  607. > ipfs file-cat <ccc111-hash> --json
  608. {
  609. "data": {
  610. "type": "tree",
  611. "date": "2014-09-20 12:44:06Z",
  612. "message": "This is a commit message."
  613. },
  614. "links": [
  615. { "hash": "<ccc000-hash>",
  616. "name": "parent", "size": 25309 },
  617. { "hash": "<ttt111-hash>",
  618. "name": "object", "size": 5198 },
  619. { "hash": "<aaa111-hash>",
  620. "name": "author", "size": 109 }
  621. ]
  622. }
  623. > ipfs file-cat <ttt111-hash> --json
  624. {
  625. "data": ["tree", "tree", "blob"],
  626. "links": [
  627. { "hash": "<ttt222-hash>",
  628. "name": "ttt222-name", "size": 1234 },
  629. { "hash": "<ttt333-hash>",
  630. "name": "ttt333-name", "size": 3456 },
  631. { "hash": "<bbb222-hash>",
  632. "name": "bbb222-name", "size": 22 }
  633. ]
  634. }
  635. > ipfs file-cat <bbb222-hash> --json
  636. {
  637. "data": "blob222 data",
  638. "links": []
  639. }
  640. \end{verbatim}
  641. \caption{Sample Objects} \label{fig:sample-objects}
  642. \end{figure}
  643. \subsubsection{File Object: \texttt{tree}}
  644. The \texttt{tree} object in IPFS is similar to Git trees: it represents a
directory, a map of names to hashes. The hashes reference \texttt{blobs}, \texttt{lists}, other \texttt{trees}, or \texttt{commits}. Note that traditional path naming is already implemented by the Merkle DAG. However, collapsing \texttt{commits} and \texttt{lists}, so that only \texttt{trees} appear as directories, is left to a special mounting application that exposes the objects through a different filesystem interface.
  646. \begin{verbatim}
  647. {
  648. "data": ["blob", "list", "blob"],
  649. // trees have an array of object types as data
  650. "links": [
  651. { "hash": "XLYkgq61DYaQ8NhkcqyU7rLcnSa7dSHQ16x",
  652. "name": "less", "size": 189458 },
  653. { "hash": "XLHBNmRQ5sJJrdMPuu48pzeyTtRo39tNDR5",
  654. "name": "script", "size": 19441 },
  655. { "hash": "XLWVQDqxo9Km9zLyquoC9gAP8CL1gWnHZ7z",
  656. "name": "template", "size": 5286 }
  657. // trees do have names
  658. ]
  659. }
  660. \end{verbatim}
  661. \subsubsection{File Object: \texttt{commit}}
  662. The \texttt{commit} object in IPFS represents a snapshot in the version history of any object. It is similar to Git's, but can reference any type of object. It also links to author objects.
  663. \begin{verbatim}
  664. {
  665. "data": {
  666. "type": "tree",
  667. "date": "2014-09-20 12:44:06Z",
  668. "message": "This is a commit message."
  669. },
  670. "links": [
  671. { "hash": "XLa1qMBKiSEEDhojb9FFZ4tEvLf7FEQdhdU",
  672. "name": "parent", "size": 25309 },
  673. { "hash": "XLGw74KAy9junbh28x7ccWov9inu1Vo7pnX",
  674. "name": "object", "size": 5198 },
  675. { "hash": "XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm",
  676. "name": "author", "size": 109 }
  677. ]
  678. }
  679. \end{verbatim}
  680. \subsubsection{Version control}
  681. The \texttt{commit} object represents a particular snapshot in the version
  682. history of an object. Comparing the object (and children) of two
  683. different commits reveals the differences between two versions of the
  684. filesystem. As long as a single \texttt{commit} and all the children objects
  685. it references are accessible, all preceding versions are retrievable and the
  686. full history of the filesystem changes can be accessed. This is a consequence
  687. of the Merkle DAG object model.
The full power of the \texttt{Git} version control tools is available to IPFS users. The object model is compatible, though not the same. It is possible to (a) build a version of the \texttt{Git} tools modified to use the \texttt{IPFS} object graph, or (b) build a mounted FUSE filesystem that mounts an IPFS \texttt{tree} as a Git repo, translating Git filesystem reads and writes to the IPFS formats.
  689. \subsubsection{Filesystem Paths}
  690. As we saw in the Merkle DAG section, IPFS objects can be traversed with a string path API. The IPFS File Objects are designed to make mounting IPFS onto a UNIX filesystem simpler. They restrict \texttt{trees} to have no data, in order to represent them as directories, and \texttt{commits} can either be represented as directories too, or hidden from the filesystem entirely.
  691. \paragraph{Path Lookup Performance}
  692. Path-based access traverses the object graph. Retrieving
  693. each object requires potentially looking up its key in the DHT,
  694. connecting to peers, and retrieving its blocks. This is considerable
  695. overhead, particularly when looking up paths with many components.
  696. This is mitigated by:
  697. \begin{itemize}
  698. \item \textbf{tree caching}: since all objects are hash-addressed, they
  699. can be cached indefinitely. Additionally, \texttt{trees} tend to be
  700. small in size so IPFS prioritizes caching them over \texttt{blobs}.
  701. \item \textbf{flattened trees}: for any given \texttt{tree}, a special
  702. \texttt{flattened tree} can be constructed to list all objects
reachable from the \texttt{tree}. Names in the \texttt{flattened tree}
are slash-separated paths relative to the original tree.
  705. \end{itemize}
For example, the \texttt{flattened tree} for \texttt{ttt111} above:
  707. \begin{verbatim}
{
"data":
  ["tree", "blob", "tree", "list", "blob", "blob"],
"links": [
  { "hash": "<ttt222-hash>", "size": 1234,
    "name": "ttt222-name" },
  { "hash": "<bbb111-hash>", "size": 123,
    "name": "ttt222-name/bbb111-name" },
  { "hash": "<ttt333-hash>", "size": 3456,
    "name": "ttt333-name" },
  { "hash": "<lll111-hash>", "size": 587,
    "name": "ttt333-name/lll111-name" },
  { "hash": "<bbb222-hash>", "size": 22,
    "name": "ttt333-name/bbb222-name" },
  { "hash": "<bbb222-hash>", "size": 22,
    "name": "bbb222-name" }
] }
  725. \end{verbatim}
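A flattened tree can be computed with a depth-first walk that prefixes each name with the path of its parent. The sketch below is an assumption-laden illustration: \texttt{Trees} and \texttt{Flatten} are hypothetical names, hashes are short placeholder strings, and any hash without an entry in the map is treated as a leaf (a \texttt{blob} or \texttt{list}), since \texttt{lists} carry no names in their links.

\begin{verbatim}
package main

import "fmt"

// Link is a named reference to an object hash.
type Link struct {
  Name, Hash string
}

// Trees maps a tree hash to its links. Hashes absent from
// the map are treated as leaves (blobs or lists).
type Trees map[string][]Link

// Flatten lists every object reachable from root, naming
// each by its slash-separated path relative to root.
func Flatten(t Trees, root, prefix string) []Link {
  var out []Link
  for _, l := range t[root] {
    p := prefix + l.Name
    out = append(out, Link{p, l.Hash})
    out = append(out, Flatten(t, l.Hash, p+"/")...)
  }
  return out
}

func main() {
  t := Trees{
    "root": {{"docs", "d1"}, {"readme", "b1"}},
    "d1":   {{"a.txt", "b2"}, {"b.txt", "b1"}},
  }
  for _, l := range Flatten(t, "root", "") {
    fmt.Println(l.Name, l.Hash)
  }
}
\end{verbatim}

With the flattened link table in hand, a path such as \texttt{docs/b.txt} resolves with a single lookup instead of one object retrieval per component.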
  726. \subsection{IPNS: Naming and Mutable State}
So far, the IPFS stack describes a peer-to-peer block exchange constructing a content-addressed DAG of objects. It serves to publish and retrieve immutable objects. It can even track the version history of these objects. However, there is a critical component missing: Mutable Naming. Without it, all communication of new content must happen out-of-band, by sending links to each other. What is required is some way to retrieve mutable state at \textit{the same path}.
  728. It is worth stating why -- if mutable data is necessary in the end -- we worked hard to build up an \textit{immutable} Merkle DAG. Consider the properties of IPFS that fall out of the Merkle DAG: Objects can be (a) retrieved via their hash, (b) integrity checked, (c) linked to others, and (d) cached indefinitely. In a sense:
  729. \begin{center}
  730. Objects are \textbf{permanent}.
  731. \end{center}
  732. These are the critical properties of a high-performance distributed system, where data is expensive to move across network links. Object content addressing constructs a web with (a) significant bandwidth optimizations, (b) untrusted content serving, (c) permanent links, and (d) the ability to make full permanent backups of any object and its references.
The Merkle DAG and Naming, immutable content-addressed objects and mutable pointers, instantiate a dichotomy present in many successful distributed systems. These include the Git Version Control System, with its immutable objects and mutable references, and Plan9 \cite{Plan9}, the distributed successor to UNIX, with its mutable Fossil \cite{Fossil} and immutable Venti \cite{Venti} filesystems. LBFS \cite{LBFS} also uses mutable indices and immutable chunks.
  734. \subsubsection{Self-Certified Names}
  735. Using the self-certification naming scheme from SFS \cite{SFS} gives us a way to construct (a) self-certified (verifiable) names, (b) in another cryptographically assigned global namespace, that are (c) mutable. The IPFS scheme is as follows.
  736. \begin{enumerate}
  737. \item Recall that in IPFS:
  738. \begin{verbatim}
  739. NodeId = hash(node.PubKey)
  740. \end{verbatim}
  741. \item We assign every user a mutable namespace at:
  742. \begin{verbatim}
  743. /ipns/<NodeId>
  744. \end{verbatim}
  745. \item A user can publish (described below) an Object to this path \textbf{Signed} by her private key, say at:
  746. \begin{verbatim}
  747. /ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/
  748. \end{verbatim}
\item When other users retrieve the object, they can check that the signature matches the public key and NodeId. This verifies that the Object was indeed published by the user, achieving mutable state retrieval.
  750. \end{enumerate}
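The four steps above can be sketched in a few lines. The sketch makes several assumptions that the text does not fix: Ed25519 is used as the signature scheme, hex SHA-256 stands in for the \texttt{multihash} of the public key, and \texttt{Record}, \texttt{NewIdentity}, \texttt{Publish}, and \texttt{Verify} are hypothetical names.

\begin{verbatim}
package main

import (
  "crypto/ed25519"
  "crypto/sha256"
  "encoding/hex"
  "fmt"
)

// Record is a hypothetical signed name record.
type Record struct {
  Value     []byte // e.g. hash of the published object
  PubKey    ed25519.PublicKey
  Signature []byte
}

// NodeID hashes the public key (SHA-256 as a stand-in).
func NodeID(pub ed25519.PublicKey) string {
  sum := sha256.Sum256(pub)
  return hex.EncodeToString(sum[:])
}

// NewIdentity creates a key pair and returns its NodeId.
func NewIdentity() (string, ed25519.PrivateKey) {
  pub, priv, err := ed25519.GenerateKey(nil)
  if err != nil {
    panic(err)
  }
  return NodeID(pub), priv
}

// Publish signs value with the owner's private key.
func Publish(priv ed25519.PrivateKey, value []byte) Record {
  pub := priv.Public().(ed25519.PublicKey)
  return Record{value, pub, ed25519.Sign(priv, value)}
}

// Verify checks that rec belongs to /ipns/<nodeID>: the
// key must hash to nodeID and the signature must be valid.
func Verify(nodeID string, rec Record) bool {
  return len(rec.PubKey) == ed25519.PublicKeySize &&
    NodeID(rec.PubKey) == nodeID &&
    ed25519.Verify(rec.PubKey, rec.Value, rec.Signature)
}

func main() {
  id, priv := NewIdentity()
  rec := Publish(priv, []byte("object-hash"))
  fmt.Println(Verify(id, rec)) // genuine record
  rec.Value = []byte("forged")
  fmt.Println(Verify(id, rec)) // altered record
}
\end{verbatim}

Because the name is derived from the key, a reader needs no certificate authority: any record whose key hashes to the requested NodeId and whose signature verifies was produced by the owner of that namespace.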
  751. Note the following details:
  752. \begin{itemize}
\item The separate \texttt{ipns} (InterPlanetary Name Space) prefix gives human readers of paths a recognizable distinction between \textit{mutable} and \textit{immutable} paths.
\item Because this is \textit{not} a content-addressed object, publishing it relies on the only mutable state distribution system in IPFS, the Routing system. The process is (1) publish the object as a regular immutable IPFS object, (2) publish its hash on the Routing system as a metadata value:
  755. \begin{verbatim}
  756. routing.setValue(NodeId, <ns-object-hash>)
  757. \end{verbatim}
  758. \item any links in the Object published act as sub-names in the namespace:
  759. \end{itemize}
  760. \begin{verbatim}
  761. /ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/
  762. /ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/docs
  763. /ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/docs/ipfs
  764. \end{verbatim}
  765. \begin{itemize}
  766. \item it is advised to publish a \texttt{commit} object, or some other object with a version history so that clients may be able to find old names. This is left as a user option, as it is not always desired.
  767. \end{itemize}
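Note that when users publish this Object, it cannot be published in the same way as a regular content-addressed object; as described above, its hash must be announced through the Routing system.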
  769. \subsubsection{Human Friendly Names}
  770. While IPNS is indeed a way of assigning and reassigning names, it is not very user friendly, as it exposes long hash values as names, which are notoriously hard to remember. These work for URLs, but not for many kinds of offline transmission. Thus, IPFS increases the user-friendliness of IPNS with the following techniques.
  771. \paragraph{Peer links}
  772. As encouraged by SFS, users can link other users' Objects directly into their own Objects (namespace, home, etc). This has the benefit of also creating a web of trust (and supports the old Certificate Authority model):
  773. \begin{verbatim}
# Alice links to Bob
ipfs link /<alice-pk-hash>/friends/bob /<bob-pk-hash>
# Eve links to Alice
ipfs link /<eve-pk-hash>/friends/alice /<alice-pk-hash>
# Eve also has access to Bob
/<eve-pk-hash>/friends/alice/friends/bob
  780. # access Verisign certified domains
  781. /<verisign-pk-hash>/foo.com
  782. \end{verbatim}
  783. \paragraph{DNS TXT IPNS Records}
  784. If \texttt{/ipns/<domain>} is a valid domain name, IPFS
  785. looks up key \texttt{ipns} in its \texttt{DNS TXT} records. IPFS
  786. interprets the value as either an object hash or another IPNS path:
  787. \begin{verbatim}
  788. # this DNS TXT record
ipfs.benet.ai. TXT "ipns=XLF2ipQ4jD3U ..."
  790. # behaves as symlink
ln -s /ipns/XLF2ipQ4jD3U /ipns/ipfs.benet.ai
  792. \end{verbatim}
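The lookup can be sketched as a scan over the TXT values of the domain. The record syntax \texttt{key=value} is assumed from the example above, the key name is passed in as a parameter, and \texttt{ResolveTXT} is a hypothetical helper; in a real client the records could be obtained with Go's \texttt{net.LookupTXT}.

\begin{verbatim}
package main

import (
  "fmt"
  "strings"
)

// ResolveTXT returns the IPNS path named by the first
// record of the form "<key>=<target>". A bare hash is
// mapped to /ipns/<hash>; a path is returned unchanged.
func ResolveTXT(records []string,
  key string) (string, bool) {
  for _, r := range records {
    k, v, ok := strings.Cut(r, "=")
    if !ok || k != key || v == "" {
      continue
    }
    if strings.HasPrefix(v, "/ipns/") {
      return v, true
    }
    return "/ipns/" + v, true
  }
  return "", false
}

func main() {
  // Placeholder records; unrelated TXT entries are skipped.
  recs := []string{"v=spf1 -all", "ipns=XLF2ipQ4jD3U"}
  fmt.Println(ResolveTXT(recs, "ipns"))
}
\end{verbatim}

Returning a path instead of a hash lets one domain point at another name, which is what makes the record behave like a symlink.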
  793. \paragraph{Proquint Pronounceable Identifiers}
  794. There have always been schemes to encode binary into pronounceable words. IPNS supports Proquint \cite{Proquint}. Thus:
  795. \begin{verbatim}
  796. # this proquint phrase
  797. /ipns/dahih-dolij-sozuk-vosah-luvar-fuluh
  798. # will resolve to corresponding
  799. /ipns/KhAwNprxYVxKqpDZ
  800. \end{verbatim}
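For reference, the Proquint encoding maps each 16-bit word to a five-letter syllable by alternating 4-bit consonants and 2-bit vowels. The sketch below follows the alphabet of the Proquint proposal; the function names are hypothetical, and it does not attempt to reproduce the phrase and identifier shown above.

\begin{verbatim}
package main

import (
  "fmt"
  "strings"
)

const (
  consonants = "bdfghjklmnprstvz" // 4 bits each
  vowels     = "aiou"             // 2 bits each
)

// quint encodes one 16-bit word as con-vow-con-vow-con.
func quint(w uint16) string {
  return string([]byte{
    consonants[w>>12&0xf],
    vowels[w>>10&0x3],
    consonants[w>>6&0xf],
    vowels[w>>4&0x3],
    consonants[w&0xf],
  })
}

// Encode turns an even-length byte string into proquints
// joined by dashes.
func Encode(b []byte) (string, error) {
  if len(b)%2 != 0 {
    return "", fmt.Errorf("odd length %d", len(b))
  }
  parts := make([]string, 0, len(b)/2)
  for i := 0; i < len(b); i += 2 {
    w := uint16(b[i])<<8 | uint16(b[i+1])
    parts = append(parts, quint(w))
  }
  return strings.Join(parts, "-"), nil
}

func main() {
  s, _ := Encode([]byte{127, 0, 0, 1})
  fmt.Println(s) // lusab-babad
}
\end{verbatim}

The four bytes \texttt{127.0.0.1} encode to \texttt{lusab-babad}, the standard example from the Proquint proposal; a hash-sized identifier simply yields a longer sequence of syllables.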
  801. \paragraph{Name Shortening Services}
  802. Services are bound to spring up that will provide name shortening as a service, offering up their namespaces to users. This is similar to what we see today with DNS and Web URLs:
  803. \begin{verbatim}
  804. # User can get a link from
  805. /ipns/shorten.er/foobar
  806. # To her own namespace
  807. /ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm
  808. \end{verbatim}
  809. \section{The Future}
IPFS is (a) a content-addressed DAG of objects, (b) with links traversed like filesystems or the web, (c) with versioning and cryptographic operations built in, (d) whose data blocks are retrieved by trade in a peer-to-peer block exchange, (e) whose peer connections are found through a DHT, and (f) that can run on any reliable datagram transport.
  811. \section{Acknowledgments}
IPFS is the synthesis of many great ideas and systems. It would be impossible to dare such ambitious goals without standing on the shoulders of such giants. Personal thanks to David Dalrymple, Joe Zimmerman, and Ali Yahya for long discussions on many of these ideas, in particular: exposing the general Merkle DAG (David, Joe), rolling hash blocking (David), and S/Kademlia sybil protection (David, Ali). And special thanks to David Mazi\`eres, for his ever brilliant ideas.
  813. %\bibliographystyle{abbrv}
  814. %\bibliography{gfs}
  815. %\balancecolumns
  816. %\subsection{References}
  817. \end{document}