Some Principles of the Internet and A Simple Explanation of the Functioning of the Internet Via an Introduction to TCP/IP

Jay Hauben
jrh29@columbia.edu

A) Some Principles of the Internet

Twenty-five years ago there was no Internet. Today more than 120,000 packet switching networks with very many different characteristics, interconnecting more than 50,000,000 computers as nodes, comprise this communications system used by over 150,000,000 people worldwide. Yet the Internet is still young. It is likely to keep expanding for many years to come. There are many aspects of Internet technology, such as the assumed unreliability at the internetwork level, that are unique and that distinguish it from other telecommunications network technologies like the telephone system. Also, the scaling of the Internet to meet the expected increase in demand for its use is in no way assured. Therefore, it is a system worth studying.

Packet switching networks appeared in the 1970s as a consequence of the development in the 1960s of the time-sharing mode of computer operation. Greater efficiency in the use of stand-alone computers was achieved when computer processing time was parceled out in round-robin fashion. Where processing time had formerly been parceled out in batches to one user's job at a time (called batch processing), new operating systems were designed that could offer each user a set of small time slots, one at a time in turn, creating the successful illusion that each of the simultaneous users was the sole user. The operation of such systems suggested that two time-sharing computers could be connected, each appearing to the other as just another user. A cross-country hookup of such systems was attempted in 1965 using slow telephone lines. The result was a success for long-distance time-sharing computer networking, but the call setups and teardowns created time delays that were unacceptable for actual use of such a network. The problem was that computer data is often bursty or is a message of minimal size, as when a single keystroke is sent to solicit a response. Therefore computer data communication over normal telephone lines required frequent call setups or wasteful quiet times.

A solution suggested by queueing theory and other lines of reasoning was packet switching as opposed to circuit switching. Data to be communicated from a number of sessions could be broken into small packets which would be transmitted interspersed, each routed to its destination separately without a path being set up for each packet. Queueing theory, especially the work of Leonard Kleinrock, predicted that interspersed demands utilizing common resources would be efficient.
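To make the packet idea concrete, here is a minimal sketch in Python. It is illustrative only and not anything from the early networks themselves; the field names, packet size and session data are invented. It shows each session's data being cut into small, self-describing pieces that can be interleaved with another session's pieces on a shared line.

    from itertools import zip_longest

    def packetize(session_id, data, max_payload=4):
        """Break one session's data into small, self-describing packets."""
        packets = []
        for seq, start in enumerate(range(0, len(data), max_payload)):
            packets.append({
                "session": session_id,   # which conversation this piece belongs to
                "seq": seq,              # position within that conversation
                "payload": data[start:start + max_payload],
            })
        return packets

    # Two sessions share one line; their packets are interspersed, so neither
    # session holds the line idle while the other has nothing to send.
    line = [p for pair in zip_longest(packetize("A", "SINGLE KEYSTROKE"),
                                      packetize("B", "BULK FILE DATA"))
            for p in pair if p is not None]

    for pkt in line:
        print(pkt)

Because each packet carries its own addressing and sequence information, no line has to be reserved for any one conversation, which is the efficiency that queueing theory predicted.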
Packet switching experimentation was initiated in Europe and the US in the early 1970s. Best known of the early packet switched computer networks were the ARPANET in the US, Cyclades in France, and the National Physical Laboratory network in the UK. The ARPANET designers and researchers succeeded in achieving resource sharing among time-shared computers manufactured by different vendors and using different operating systems, character sets, etc. The computers were located at universities and military-related research laboratories. The ARPANET was funded and encouraged by the Advanced Research Projects Agency (ARPA), a civilian agency of the US Department of Defense. ARPA also funded and encouraged packet switching experimentation using ground-based radio receivers and transmitters and using satellites.

Encouraged by the success of the ARPANET, commercial networks like Tymnet and Telenet were established. In Europe a number of packet switched network experiments were undertaken. Just as isolated time-shared computers had suggested networking, so too the existence of isolated packet switching networks suggested some sort of interconnection. Robert Kahn in the US and Louis Pouzin in France were among the first to consider what needed to be done to create such a meta-network of networks. Pouzin developed the concept of a Catenet, and Kahn at ARPA developed the Internetting Project. The goal of the Catenet concept and the Internetting project was to develop an effective technology to interconnect the packet switched data networks that were beginning to emerge from the experimental stage. Both rejected the alternative of integrating all networks into a single unified system spanning many physical media. The latter might have produced better integration and performance, but it would have limited the autonomy and continued experimental development of the new network technologies. Also, the developing networks were under different political or economic administrations, and it is not likely they could have been enticed to give up their autonomy and voluntarily join together as parts of a single network.

Kahn had been involved in trying to solve a problem of great complexity: could a ground-based packet radio network be developed that would allow even mobile transmitters and receivers? The complexity was that radio communication is prone to fading, interference, obstruction of line-of-sight by local terrain, or blackouts, as when a vehicle travels through a tunnel. The radio link is in itself unreliable for data communication. Crucial, therefore, to the success of such a packet radio network would be an end-to-end mechanism that could call for retransmissions and employ other techniques so that a reliable communication service could be provided despite the unreliability of the underlying link level.

Pouzin had worked on the time-sharing experiments at MIT in the 1960s. He was impressed by the successful way individual users were 'networked' on a single time-sharing computer and then by how these computers themselves were networked. He looked for the essence of packet switching networks to find the clue to how they could be interconnected. He saw many features which were not mandatory to packet switching, such as virtual circuits, end-to-end acknowledgments, large buffer allocations, etc. He felt that any end-to-end function which users might desire could be implemented at the user interface. The Catenet need only provide a basic service: packet transport.

How then to achieve an effective interconnection of packet switched networks? If the interconnection was to include packet radio networks, the resulting internet would have at least some unreliable links. Should packet radio networks and others that could not offer reliable network service be excluded? Kahn's answer was that the new interconnection should be open to all packet switching and even other data networks. That was the first principle of the Internet that was to emerge: open architecture networking --- the interconnection of as many current and future networks as possible by requiring the least possible from each. Each network would be based on the network technology dictated by its own purpose and achieved via its own architectural design.
Networks would not be federated into circuits that formed a reliable end-to-end path, passing individual bits on a synchronous basis. Instead, a new "Internetworking Architecture" would view networks as peers in helping offer an end-to-end service independent of path and of the unreliability or failure of any links.

"Four ground rules were critical to Kahn's early thinking:

* Each distinct network would have to stand on its own and no internal changes could be required to any such network to connect it to the Internet.

* Communications would be on a best effort basis. If a packet didn't make it to the final destination, it would shortly be retransmitted from the source.

* Black boxes would be used to connect the networks; these would later be called gateways and routers. There would be no information retained by the gateways about the individual flows of packets passing through them, thereby keeping them simple and avoiding complicated adaptation and recovery from various failure modes.

* There would be no global control at the operations level."

(from "A Brief History of the Internet" at http://www.isoc.org/internet/history/brief.html)

Pouzin and his colleagues developed similar ground rules and applied them in the development of the Cyclades network and its interconnection with the National Physical Laboratory (NPL) in London in August 1974, with the European Space Agency (ESA) in Rome in October 1975, and with the European Informatics Network (EIN) in June 1976. They were the first to implement a packet service which did not assume any interdependence between packets. Each packet was treated as a separate entity, moving from source to destination according to the conditions prevailing at each moment of its travel. Dynamic updating of routing at the gateways, and retransmissions because of congestion or link or node failures, sometimes caused packets to arrive at their destinations out of order, duplicated, or missing from a sequence. The gateways were programmed to make an effort to keep the packets moving toward their destinations, but no guarantee of success was built into them. Such a best effort transmission service is called a datagram service.

In the past, out of sequence packets, packet duplication and packet loss were considered at least a burden if not serious problems, so communication switches were designed to prevent them. The French team succeeded in producing transport layer mechanisms to rectify these events. In that way they brought substantial simplicity, cost reduction and generality to the service that their gateways provided. This was a second Internet principle: do as much as possible above the internetwork level. This came to be called the end-to-end principle. It provided for successful communication under almost any condition except the total failure of the whole system. Another way to state this principle is that the only information about a communication session (state information) would be kept at the end points. Intermediate failures could not destroy such information, and communication disrupted by such failures could be continued when the packets began to arrive again at the destination.

In October 1972, Kahn had organized a large public demonstration of the ARPANET at the International Computer Communications Conference (ICCC72) in Washington, DC. This was the first international public demonstration of packet switching network technology. Researchers were there from Europe, Asia, and North America.
At that meeting, an International Network Working Group (INWG) was established to share experiences and to be a forum to help work out standards and protocols. In 1973-74 it was adopted by the international professional organization for information processing, the International Federation for Information Processing (IFIP), as its Working Group 6.1 (IFIP WG 6.1). Donald Davies from the UK, Pouzin and Kahn knew of each other's work and the work of others who were considering these problems by attending and presenting papers at meetings of IFIP WG 6.1 and by sharing their work with each other on a regular basis. This is an early example of a long tradition in the networking world of openness and collaboration. It was to become a third principle of the Internet: open and public documentation and standards and protocol development.

In 1973, Kahn brought Vinton Cerf into the work on internetting. The ARPA project gave rise to a proposed general solution to the internetting problem, with specifications for what was needed in common on the end computers and the gateways so that the interconnection would be successful. The set of such specifications is called a communication protocol. The protocol at the time was called the Transmission Control Protocol (TCP). Cerf and Kahn first shared their thinking in a formal way at a meeting of INWG members in Brighton, England in September 1973, at a conference sponsored by NATO. What emerged was a reliable, sequenced data stream delivery service provided at the end points despite the unreliability of the underlying internetwork level. But the first implementation resulted only in a virtual circuit internetwork service. For some network services such virtual circuits were too restrictive. At the time, Danny Cohen, who was working on packet voice delivery, argued that TCP functionality should be split between what was required end-to-end, like reliability and flow control, and what was required hop-by-hop to get from one network to another via gateways. Cohen felt packet voice needed timeliness more than it needed reliable delivery. This led to the reorganization of the original TCP into two protocols: the simple Internet Protocol (IP), which provided only for addressing, fragmentation and forwarding of individual packets, and a separate TCP concerned with recovery from lost packets. This brought the internetting work into line with the success of the Cyclades datagram service.

A major boost to the use of what became known as TCP/IP was its adoption by the US Department of Defense (DOD). The DOD funded work that incorporated TCP/IP into modifications of the Unix operating system being made at the University of California at Berkeley. When this version was distributed, much of the computer science community in the US and around the world began to have TCP/IP capability built into their operating systems. This was a great boost for broad adoption of the Internet. It is also another example of the principle of free and open documentation, in this case source code. In 1983 the DOD required all users of the ARPANET to adopt TCP/IP, further ensuring that it would be broadly implemented.

A key element of the design of IP is the capability at each gateway to break packets too large for the next network into fragments, each a datagram in its own right, that will fit in that network's frames. These fragments then travel along as ordinary datagrams until they are reassembled at the destination host.
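A simplified sketch of this fragmentation step follows, in Python. It is illustrative only: real IP carries an identification field, a more-fragments flag and offsets counted in 8-octet units, and the field names, addresses and sizes below are invented for clarity.

    def fragment(datagram, next_network_mtu):
        """Split a datagram's payload into pieces small enough for the next network.
        Each fragment is a datagram in its own right; reassembly happens only at
        the destination host, never at an intermediate gateway."""
        payload = datagram["payload"]
        if len(payload) <= next_network_mtu:
            return [datagram]                  # it fits: forward it unchanged
        fragments = []
        for offset in range(0, len(payload), next_network_mtu):
            piece = payload[offset:offset + next_network_mtu]
            fragments.append({
                "src": datagram["src"],
                "dst": datagram["dst"],
                "id": datagram["id"],          # lets the destination group the pieces
                "offset": offset,              # where this piece belongs in the whole
                "more": offset + len(piece) < len(payload),
                "payload": piece,
            })
        return fragments

    big = {"src": "129.77.19.140", "dst": "192.0.2.7", "id": 1, "payload": "X" * 1400}
    for frag in fragment(big, 576):            # the next network accepts only small frames
        print(frag["offset"], frag["more"], len(frag["payload"]))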
By allowing for fragmentation, IP makes it possible for networks that handle large packets and networks that handle small packets to coexist on the same Internet. This is an example of applying the open architecture principle. Allowing fragmentation relieves the necessity of specifying a minimum or a maximum packet size (although in practice such limits do exist). Leaving the reassembly until the destination minimizes the requirements on the gateways/routers. Schemes that would eliminate fragmentation from future versions of IP should be carefully scrutinized, because they may render obsolete under-resourced networks that could not adapt to the mandated packet sizes. That would violate the open architecture principle.

From one point of view, that of the most value for the whole of society, the highest-order feature a communications system can provide is universal connectivity. This has been, up until the present, the guiding vision and goal of the Internet pioneers. Leonard Kleinrock has argued that "as the system resources grow in size to satisfy an ever increasing population of users" gains in efficiency occur (Queueing Systems, Volume II, p. 275). This is an example of the law of large numbers, which suggests that the more resources and users there are, the more sharing there is. This results in a greater level of efficient utilization of resources without increased delays in delivery. So far the scaling of the Internet has conformed to the law of large numbers, and the Internet provides such a convenient and efficient communications system that its users use it more than the other communications systems available to them. The desire for connectivity also grows with the Internet's growth, as does its value, since each new connection adds connectivity for those who have already been connected as well. This is an example of a regenerative system.

In its first 25 years (1973-1998) the Internet has grown to provide communication to 2.5% of the world's people. This is a spectacular technical and social accomplishment. But much of the connectivity is concentrated in a few parts of the world (North America, Europe and parts of Asia). The web of the Internet's connectivity is also still sparse, even in North America. Often there are too few alternative paths, so that even where total bandwidth is sufficient the communication service available has uncomfortably long delays. In my opinion the top priority for the Internet technical community is to find ways of continuing the growth and scaling of the connectivity provided by the Internet. But the Internet is a very complex technology. To achieve the necessary further scaling, the Internet will require a large pool of well-supported, talented and highly educated scientists and engineers who have studied the principles and unique features of the Internet. They will need to work collaboratively, online and off, to hold each other to the principles as they seek solutions to current and future problems. Then the Internet has a chance of reaching the goal of universal connectivity.

B) A Simple Explanation of the Functioning of the Internet Via an Introduction to TCP/IP

I. Introduction

The Internet as we know it in 1998, although vast, is still a new and developing communications technology. It is based on a number of ingenious engineering accomplishments, first of which is the Transmission Control Protocol and Internet Protocol suite, known as TCP/IP. The elements that comprise the Internet are computers and networks of computers.
Being physical entities, these require, in order to perform reliably, careful design based on solid engineering principles. The Internet itself is more than the sum of its elements. It too requires careful and evolving design, based on principles similar to those for computers and networks and on some unique to the Internet.

II. The Internet

The Internet is the successful interconnecting of many different networks to give the illusion of being one big computer network. What the networks have in common is that they all use packet switching technology, or at least can carry packets of data from one computer to another. On the other hand, each of the connected networks may have its own addressing mechanism, packet size, speed, etc. Any computer on the connected networks, no matter what its operating system or other characteristics, can communicate via the Internet if it has software implemented on it that conforms to the set of protocols which resulted from open research funded by the Advanced Research Projects Agency (ARPA) of the United States Department of Defense in the late 1970s. That set of protocols is built around the Internet Protocol (IP) and the Transmission Control Protocol (TCP). Informally, the set of protocols is called TCP/IP (pronounced by saying the names of the letters, T-C-P-I-P).

The Internet Protocol is the common agreement to have software on every computer on the Internet add a bit of additional information to each of the packets that it sends out. Without such software a computer cannot be connected to the Internet, even if Internet traffic passes over the network that the computer is attached to. A packet that has the additional information required by IP is called an IP datagram. To each IP datagram the computer adds its own network addressing information. The whole package is called a network frame. It is network frames containing IP datagrams, rather than ordinary packets, that a computer must send onto its local packet switching network in order to communicate with a computer on another network via the Internet. If the communication is between computers on the same network, the network information is enough to deliver the frame to its intended destination computer. If the communication is intended for a computer on a different network, the network information directs the frame to the closest computer that serves to connect the local network with a different network. Such a special-purpose computer is called a router (sometimes a gateway). It is such routers that make internetworking possible.

The Internet is not a single giant network of computers. It is over one hundred thousand networks interconnected by routers. A router is a high speed, electronic, digital computer very much like all the other computers in use today. What makes a router special is that it has all the hardware and connections necessary to be able to connect to and communicate on two or more different networks. It also has the software to create and interpret network frames for each network it is attached to. In addition it must have the capabilities required by IP. It must have software that can remove network information from the network frames that come to it and read the IP information in the datagrams. Based on the IP information it can add new network information to create an appropriate network frame and send it out on that other network.
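The encapsulation and re-wrapping just described can be sketched as follows. This Python sketch uses invented field names rather than the real IP or link-layer header formats; it only shows a packet gaining IP information to become a datagram, the datagram gaining local addressing to become a network frame, and a router re-wrapping the unchanged datagram for the next network.

    def make_ip_datagram(payload, src_ip, dst_ip):
        """Add the IP information that lets a packet travel between networks."""
        return {"src_ip": src_ip, "dst_ip": dst_ip, "payload": payload}

    def make_network_frame(datagram, frame_src, frame_dst):
        """Wrap the datagram in the local network's own addressing."""
        return {"frame_src": frame_src, "frame_dst": frame_dst, "data": datagram}

    datagram = make_ip_datagram("hello", src_ip="129.77.19.140", dst_ip="192.0.2.7")

    # On the local network the frame is addressed either to the destination itself
    # (same network) or to the nearest router (different network).
    frame = make_network_frame(datagram, frame_src="host-a", frame_dst="router-1")

    # A router strips the frame, reads only the IP information, and re-wraps the
    # unchanged datagram in a new frame for the next network it is attached to.
    forwarded = make_network_frame(frame["data"], frame_src="router-1", frame_dst="router-2")
    print(forwarded)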
But how does a router know where to send an IP datagram? The entire process of Internet communication requires that each computer participating in the Internet have a unique digital address. The unique addresses of the source and destination are part of the IP information added to packets to make IP datagrams. The unique number assigned to a computer is its Internet Protocol or IP address. The IP address is a binary string of 32 digits. Therefore the Internet can provide communication among 2 to the 32nd power, or about 4 billion 300 million, computers (two unique addresses for every three people in the world). Internet addresses are written, for example, like 129.77.19.140. Each such address has two parts, a network ID and a host ID. In this example 129.77 (the network ID) identifies that this computer is part of a particular university network and 19.140 (the host ID) identifies which particular computer it is.

A router's IP software examines the IP information to determine the destination network from the network ID of the destination address. Then the software consults a routing table to pick the next router to send the IP datagram to, so that it takes the "shortest" path. A path is short only if it is active and not congested. Ingenious software programs called routing daemons send and receive short messages among adjacent routers characterizing the condition of each path. These messages are analyzed and the routing table is continually updated. In this way IP datagrams pass from router to router, over different networks, until they reach a router connected to their destination network. That router puts into the network frame the network information that delivers the datagram to its destination computer. The IP datagram is unchanged by this whole process. Each router has placed the IP datagram, along with next-router information, into the next network frame. When the IP datagram finally reaches its destination, it carries no record of how it got there, and different packets from the same source may have taken different paths to the same destination.

IP as described above requires nothing of the interconnected networks except that they are packet switching networks with IP-compliant routers. If a transmitting network uses a very small frame size, the IP software can even fragment an IP datagram into a few smaller ones to fit the network's frame size. It is this minimum requirement by the Internet Protocol that makes it possible for a great variety of networks to participate in the Internet. But this minimum requirement also results in little or no error detection. IP arranges for a best-effort process but has no guarantee of reliability. The remainder of the TCP/IP set of protocols adds a sufficient level of reliability to make the Internet useful.

There are problems that IP does not solve. For example, interspersed network frames from many computers can sometimes arrive faster than a router can route them. A small backlog of data can be stored on most routers, but if too many frames keep arriving some must be discarded. This possibility was anticipated. On most computers on the Internet, other than routers, software behaving according to the Transmission Control Protocol (TCP) is installed. When IP datagrams arrive at the destination computer, the TCP-compliant software scans the IP information put into the IP datagrams at the source. From this information the software can put the packets, if they are all there, back together again. If there are duplications, the software will discard all but the first copy of each such packet to have arrived.
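A simplified sketch of this destination-side work follows. Real TCP numbers the individual bytes of a stream and uses a sliding window; the sketch below uses whole-packet sequence numbers, which is enough to show duplicates being discarded, data being put back in order, and a gap being detected.

    def reassemble(arrivals, expected_count):
        """Discard duplicates, deliver the in-order prefix, and report what is missing."""
        received = {}
        for pkt in arrivals:
            if pkt["seq"] in received:
                continue                       # duplicate: keep only the first copy
            received[pkt["seq"]] = pkt["payload"]
        deliverable = []
        for seq in range(expected_count):      # deliver only up to the first gap;
            if seq not in received:            # later data waits for a retransmission
                break
            deliverable.append(received[seq])
        missing = [seq for seq in range(expected_count) if seq not in received]
        return "".join(deliverable), missing

    arrivals = [                               # out of order, one duplicate, one lost
        {"seq": 1, "payload": "CK"},
        {"seq": 0, "payload": "PA"},
        {"seq": 1, "payload": "CK"},
        {"seq": 3, "payload": "TS"},
    ]
    data, missing = reassemble(arrivals, expected_count=4)
    print(data, "still missing:", missing)     # -> PACK still missing: [2]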
But what if some IP datagrams have been lost? As a destination computer receives data, the TCP software sends a short message back over the Internet to the original source computer specifying what data has arrived. Such a message is called an "acknowledgment". Every time the TCP and IP software send out data, the TCP software starts a timer (sets a number and decreases it periodically using the computer's internal clock) and waits for an acknowledgment. If an acknowledgment arrives first, the timer is canceled. If the timer expires before an acknowledgment is received back, the TCP software retransmits the data. In this way missing data can usually be replaced at the destination computer in a reasonable time.

To achieve efficient data transfer the timeout interval cannot be preset. It needs to be longer for more distant destinations and for times of greater network congestion, and shorter for closer destinations and times of normal network traffic. TCP automatically adjusts the timeout interval, and the size of its sliding window, based on the rate of acknowledgments it receives back. This ability to dynamically adjust the timeout interval contributes greatly to the success of the Internet.

Having been designed together and engineered to perform two separate but related and needed tasks, TCP and IP complement each other. IP makes possible the travel of packets over different networks, but neither it nor the routers are concerned with data loss or data reassembly. The Internet is possible because so little is required of the intervening networks. TCP makes the Internet reliable by detecting and correcting duplications, out of order arrival and data loss, using an acknowledgment and timeout mechanism with dynamically adjusted timeout intervals.
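A simplified sketch of such an adaptive timer follows. Production TCP implementations use a more refined calculation (tracking the variance of the round-trip time as well as its average); the smoothing constant and the safety multiplier below are illustrative assumptions, not the standard values.

    class RetransmitTimer:
        def __init__(self, initial_rtt=1.0, alpha=0.875, multiplier=2.0):
            self.srtt = initial_rtt        # smoothed round-trip-time estimate (seconds)
            self.alpha = alpha             # weight given to past measurements
            self.multiplier = multiplier   # safety factor above the estimate

        def record_ack(self, measured_rtt):
            """Fold each new round-trip measurement into the running estimate,
            so the timeout tracks current network conditions."""
            self.srtt = self.alpha * self.srtt + (1 - self.alpha) * measured_rtt

        def timeout(self):
            return self.multiplier * self.srtt

    timer = RetransmitTimer()
    for rtt in (0.8, 0.9, 2.5, 2.4):       # congestion sets in; round trips lengthen
        timer.record_ack(rtt)
        print(f"estimate {timer.srtt:.2f}s, retransmit after {timer.timeout():.2f}s")

As the acknowledgments slow down, the estimate and the timeout grow, so the sender waits longer before retransmitting instead of flooding an already congested path.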
III. Conclusion

The Internet is a wonderful engineering achievement. Since January 1, 1983, the cutoff date for the old ARPANET protocols, TCP/IP technology has successfully dealt with tremendous increases in usage and in the speed of the connecting computers. This is a testament to the success of the TCP/IP protocol design and implementation process. Douglas Comer highlighted the features of this process as follows:

* TCP/IP protocol software and the Internet were designed by talented dedicated people.

* The Internet was a dream that inspired and challenged the research team.

* Researchers were allowed to experiment, even when there was no short-term economic payoff. Indeed, Internet research often used new, innovative technologies that were expensive compared to existing technologies.

* Instead of dreaming about a system that solved all problems, researchers built the Internet to operate efficiently.

* Researchers insisted that each part of the Internet work well in practice before they adopted it as a standard.

* Internet technology solves an important, practical problem; the problem occurs whenever an organization has multiple networks.

(from The Internet Book)

The high speed, electronic, digital, stored program controlled computer and the TCP/IP Internet are major historic breakthroughs in engineering technology. Every such breakthrough in the past, like the printing press, the steam engine, the telephone and the airplane, has had profound effects on human society. The computer and the Internet have already begun to have such effects, and this promises to be just the beginning. In the long run, despite the growing pains and dislocations, every great technological breakthrough serves to make possible a more fulfilling and comfortable life for more people. The computer and the Internet have the potential to speed up this process, although it may take a hard fight for most people to experience any of the improvement. We live, however, in a time of great invention and great potential.

The TCP/IP Internet is a major historical achievement. It provides human society with a new global communications technology with great promise and potential. This Internet has sustained unprecedented growth both in the number of its users and in the volume of messages it handles daily. In the 15 years since the cutover from the NCP ARPANET to the TCP/IP Internet, the Internet has proven itself founded on solid principles. But there can be setbacks and false steps. As proposals for further development of the Internet are made, it would be proper to expect that they reaffirm and build on the proven principles. But there is, for example, research currently being undertaken to "make IP more reliable." Since the principle of minimal requirements on component networks is IP's strength, such research, if implemented, would be a fundamental change for the Internet. By forgoing reliability, IP has made possible the interconnection of the most diverse of networks. To require greater reliability at the IP level could impose undue conformity on the component networks. That would be a backwards step. As today's Internet is developed and improved, the principles of TCP and IP will in all likelihood play crucial roles in that development.

---------------------------------------------------------------

Bibliography

Carpenter, B. RFC 1958: Architectural Principles of the Internet. June 1996.

Cerf, Vinton G. and Robert Kahn. "A Protocol for Packet Network Intercommunication". IEEE Transactions on Communications, Vol. COM-22, No. 5. May 1974.

Cerf, Vinton G. IEN 48: The Catenet Model for Internetworking. July 1978. http://lwp.ualg.pt/htbin/ien/ien48.html

Clark, David D. "The Design Philosophy of the DARPA Internet Protocols". Proceedings of SIGCOMM '88, ACM CCR Vol. 18, No. 4. August 1988.

Comer, Douglas E. Internetworking with TCP/IP, Vol. I: Principles, Protocols, and Architecture, 2nd Edition. Englewood Cliffs, NJ. Prentice Hall. 1991.

Comer, Douglas E. The Internet Book: Everything You Need to Know about Computer Networking and How the Internet Works. Englewood Cliffs, NJ. Prentice Hall. 1995.

Davies, D.W., D.L.A. Barber, W.L. Price and C.M. Solomonides. Computer Networks and Their Protocols. Chichester. John Wiley & Sons. 1979.

Hauben, Michael and Ronda Hauben. Netizens: On the History and Impact of Usenet and the Internet. Los Alamitos, CA. IEEE Computer Society Press. 1997.

Kleinrock, Leonard. Queueing Systems, Volume II: Computer Applications. New York. John Wiley and Sons. 1976.

Leiner, Barry M., et al. "A Brief History of the Internet" at http://www.isoc.org/internet/history/brief.html

Lynch, Daniel C. and Marshall T. Rose, Editors. Internet System Handbook. Reading, MA. Addison-Wesley. 1993.

Pouzin, Louis. "A Proposal for Interconnecting Packet Switching Networks". Proceedings of EUROCOMP. Brunel University. May 1974. Pages 1023-36.

Pouzin, L., Ed. The Cyclades Computer Network. Amsterdam. North Holland. 1982.

Stevens, W. Richard. TCP/IP Illustrated, Vol. 1: The Protocols. Reading, MA. Addison-Wesley. 1994.

-----------------------------------------------------------------