Friday 13 November 2015

The birth of Web

Oh what a tangled web we weave.....

November 12 marks 25 years since the beginning of the World Wide Web. Shivanand Kanavi gives us the story of how it all began.

“Great Cloud, please help me. I am away from my beloved and miss her very much. Please go to the city called Alaka where my beloved lives in our moonlit house.”

From Meghadoot (messenger cloud) of Kalidasa, Sanskrit poet and playwright, fourth century AD

Twenty-five years ago, on Nov 12, 1990, Tim Berners-Lee, a physics graduate working as a software engineer at the European Organisation for Nuclear Research (CERN) at Geneva, submitted a note to his bosses on Hyper Text and thus started a chain of events that led to the information revolution of the World Wide Web (see for a copy of Tim Berners-Lee's note).

Tim Berners Lee

Today India already has over 150 million Internet users, and the number is growing by leaps as smartphones sell by the millions every month. The Internet has become a massive labyrinthine library, where one can search for and obtain information in seconds. It has also evolved into an instant, inexpensive communication medium where one can send email text and even images, sounds and videos to a receiver, girdling the globe.

There are billions of documents on the Internet, on millions of computers known as Internet servers, all interconnected by a tangled web of cables, optic fibres and wireless links. We can be part of the Net through our own PC, laptop or smartphone, using a wired or a wireless connection to an Internet Service Provider.

Like Jack’s beanstalk, the Net is growing at a tremendous speed.

However, one thing we learn from ‘Jack and the Beanstalk’ is that every giant magical tree has humble origins. The beans, in the case of the Internet, were sown as far back as the sixties. To understand the significance of Berners-Lee's contribution, one should briefly look at the history of the Internet.

It all started with the Advanced Research Projects Agency (ARPA) of the US Department of Defence. ARPA had been funding advanced computer science research since the early ’60s. J.C.R. Licklider, who was then working at ARPA, took the initiative in encouraging several academic groups in the US to work on interactive computing and time-sharing.

Bob Taylor (Photos: Palashranjan Bhaumick)

One glitch, however, was that these different groups could not easily share their programs, data or even ideas with each other. The situation was so bad that Bob Taylor had three different terminals in his office in the Pentagon, connected to three different computers that were being used for time-sharing experiments at MIT, UCLA and Stanford Research Institute. Thus started an experiment in enabling computers to exchange files among themselves. Taylor played a crucial role at ARPA's Information Processing Techniques Office in creating this network, which was later named Arpanet. “We wanted to create a network to support the formation of a community of shared interests among computer scientists and that was the origin of the Arpanet,” says Taylor.

It is a fact, however, that the first computer network to be proposed theoretically was for military purposes. It was to decentralize nuclear missile command and control. The idea was not to have centralized, computer-based command facilities, which could be destroyed in a missile attack. In order to survive a missile attack and retain what was known, during the US-Soviet Cold War, as ‘Second Strike Capability’, Paul Baran of Rand Corporation had proposed the idea of a distributed network. In those mad days of Mutually Assured Destruction (MAD), it seemed logical.

Baran elaborated his ideas to the military in an eleven-volume report ‘Distributed Communications System’ during 1962-64. This report was available to civilian research groups as well. However, no civilian network was built based on it. Baran even worked out the details of a packet-switched network, though he used a clumsy name, ‘Distributed Adaptive Message Block Switching’. Donald Davies, in the UK, independently arrived at the same idea a little later and called it packet switching.

Networking pioneers like Paul Baran, Bob Taylor, Larry Roberts, Frank Heart, Vinton Cerf, Steve Crocker, Bob Metcalfe, Len Kleinrock, Bob Kahn and others have recalled, in several interviews, the struggle they had to go through to convince AT&T, the US telephone monopoly of those days.

AT&T did not believe packet switching would work and feared that, if it ever did, it would become a competing network and kill their business! This battle between data communication and incumbent telephone companies is still not over. As voice communication adopts packet technology, as in Voice over Internet, the old phone companies all over the world are conceding to packet switching only grudgingly, kicking and screaming.

Using ARPA funds, the first computer network based on packet switching was built in the US between 1966 and 1972. A whole community of users came into being at over a dozen sites, and started exchanging files. Soon they also developed a system to exchange notes and they called it ‘e-mail’ (an abbreviation for electronic mail). Abhay Bhushan, who worked on the Arpanet project from 1967 to 1974, was then at MIT and wrote the note on FTP or File Transfer Protocol, on which early email relied. In those days, several theoretical and practical problems were sorted out through RFCs, which stood for Request For Comments: messages sent to all Arpanet users. Any researcher at a dozen ARPA sites could pose a problem or post a solution through such RFCs. Thus, an informal, non-hierarchical culture developed among these original Netizens. “Those were heady days when so many things were done for the first time without much ado,” recalls Abhay Bhushan.


Abhay Bhushan

An email program that immediately became popular due to its simplicity was SNDMSG, written by Ray Tomlinson, a young engineer at Bolt Beranek and Newman (BBN), a Boston-based company, which was the prime contractor for building the Arpanet. His email programs have obviously been superseded by others in the decades since. But one thing that has survived is the @ sign to denote the computer address of a sender. Tomlinson was looking for a symbol to separate the receiver’s user name and the address of his host computer. When he looked at his Teletype, he saw a few punctuation marks available and chose @ since it had the connotation of ‘at’ among accountants, and did not occur in software programs in some other connotation.

A ‘communication protocol’ is a favourite term of networking engineers just as ‘algorithm’ is a favourite of computer scientists. Leaving the technical details aside, a protocol is actually a step-by-step approach to enable two computers to “talk to each other”, i.e. exchange data. We use protocols all the time in human communication, so we don’t notice them. But if two strangers met, how would they start to converse? They would start by introducing themselves, finding a common language and agreeing on a level of communication (formal, informal, professional, personal, polite, polemical and so on) before exchanging information.
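The strangers' conversation above can be sketched as a toy protocol in Python. This is only an illustration of the idea of agreed steps, not any real network protocol; the `Peer` class, the `handshake` and `converse` functions and the message wording are all invented for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peer:
    """A hypothetical participant that speaks some languages, best first."""
    name: str
    languages: List[str]


def handshake(a: Peer, b: Peer) -> Optional[str]:
    """Return the first language of `a` that `b` also speaks, else None."""
    for language in a.languages:
        if language in b.languages:
            return language
    return None


def converse(a: Peer, b: Peer, message: str) -> List[str]:
    """Follow the steps in order: introduce, agree on a language, exchange."""
    transcript = [
        f"{a.name}: HELLO, I am {a.name}",
        f"{b.name}: HELLO, I am {b.name}",
    ]
    language = handshake(a, b)
    if language is None:
        # Without a common language no data can be exchanged.
        transcript.append("ERROR: no common language")
        return transcript
    transcript.append(f"AGREED: {language}")
    transcript.append(f"{a.name} [{language}]: {message}")
    return transcript


if __name__ == "__main__":
    alice = Peer("Alice", ["Hindi", "English"])
    bob = Peer("Bob", ["French", "English"])
    for line in converse(alice, bob, "here is my data"):
        print(line)
```

Real protocols such as TCP follow the same pattern of fixed, agreed steps, though with far more detail about errors, ordering and retries.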

As Arpanet rose in popularity in the 70s, a clamour started from every university and research institution to be connected to Arpanet. Everybody wanted to be part of this new community of shared interests. However, not everyone in a Local Area Network could be given a separate Arpanet connection, so one needed to connect entire LANs to Arpanet. Here again there was a diversity of networks and protocols. So how would you build a network of networks (also called the Internet)? Robert Kahn and Vinton Cerf largely solved this problem by developing TCP (Transmission Control Protocol), and hence they are justly called the inventors of the Internet.

Meanwhile, in 1971, an undergraduate student at IIT Bombay, Yogen Dalal, was frustrated by the interminable wait to get his programs executed by the old Russian computer. Thanks to encouragement from a faculty member, J R Isaac, who was then head of the computer centre, Dalal started a BTech project on building a remote terminal for the mainframe. “Like all undergraduate projects, this also did not work,” laughs Dalal, recalling those days. But when he went to Stanford for his MS and PhD and saw cutting-edge work being done in networking by Cerf & Co., he naturally got drawn into it.

Vinton Cerf with the author

As a result, Vinton Cerf, Yogen Dalal and another graduate student, Carl Sunshine, wrote the first paper setting forth the standards for an improved version of TCP/IP, in 1974, which became the standard for the Internet. “Yogen did some fundamental work on TCP/IP. I remember, during 1974, when we were trying to sort out various problems of the protocol, we would come to some conclusions at the end of the day and Yogen would go home and come back in the morning with counter examples. He was always blowing up our ideas to make this work,” recalls Cerf.

“They were the most exciting years of my life,” says Yogen Dalal, who, after a successful career at Xerox PARC and Apple, is a respected venture capitalist in Silicon Valley. Recently he was listed among the top fifty venture capitalists in the world.

Yogen Dalal

Two things changed the Internet: one was the development of the World Wide Web, and the other was a small program called the Browser that allowed you to navigate this web and read the web pages.

The web is made up of host computers connected to the Internet containing a program called a Web Server. The Web Server is a piece of computer software that can respond to a browser’s request for a page and deliver the page to the Web browser through the Internet. You can think of a Web server as an apartment complex with each apartment housing someone’s Web page. In order to store your page in the complex, you need to pay rent on the space. Pages that live in this complex can be displayed to and viewed by anyone all over the world. The host computer is your landlord and your rent is called your hosting charge. Every day, there are millions of Web servers delivering pages to the browsers of tens of millions of people through the network we call the Internet.
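The apartment-complex picture can be sketched in a few lines of Python. This is a toy model only, assuming a server that keeps its pages in memory; the `PAGES` table, the paths and the `handle_request` function are invented for illustration, and the status numbers 200 and 404 are the usual HTTP codes for "found" and "not found".

```python
from typing import Dict, Tuple

# Hypothetical pages "renting space" on one toy server. The paths and
# contents are invented for illustration.
PAGES: Dict[str, str] = {
    "/": "<h1>Welcome to my home page</h1>",
    "/recipes.html": "<p>My favourite recipes</p>",
}


def handle_request(path: str) -> Tuple[int, str]:
    """Return a (status code, page body) pair for the requested path."""
    if path in PAGES:
        return 200, PAGES[path]
    # The "apartment" asked for does not exist in this complex.
    return 404, "<p>No such page on this server</p>"


if __name__ == "__main__":
    print(handle_request("/"))
    print(handle_request("/missing.html"))
```

A real Web server does the same lookup, but reads pages from disk or a database and answers many browsers at once over the network.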

The host computers connected to the Net, called Internet servers, are given a certain address. The partitions within the server hosting separate documents belonging to different owners are called Websites. Each website in turn is also given an address, called a Uniform Resource Locator (URL). These addresses are assigned by an independent agency. It acts in a manner similar to that of the registrar of newspapers and periodicals or the registrar of trademarks, who allow you to use a unique name for your publication or product if others are not using it.
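To see what such an address contains, here is a small sketch using Python's standard `urllib.parse` module. The `describe_url` helper and the `www.example.org` address are assumptions made for illustration.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def describe_url(url: str) -> Dict[str, Optional[str]]:
    """Split a URL into the parts a browser needs to locate a page."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,     # how to talk to the server, e.g. http
        "host": parts.hostname,     # which server holds the website
        "path": parts.path or "/",  # which page on it ("/" is the home page)
    }


if __name__ == "__main__":
    print(describe_url("http://www.example.org/physics/index.html"))
```

The host part is the name registered with the naming agency; the path is chosen freely by the owner of the website.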

When you type in the address or URL of a website in the space for the address in your browser, the program sends packets requesting to see the website. The welcome page of the website is called the home page. The home page carries an index of other pages, which are part of the same website and reside on the same server. When you click with your mouse on one of them, the browser recognises your desire to see the new document and sends a request to the new address, based on the hyperlink. Thus, the browser helps you navigate the Web or surf the information waves of the Web, which is also called Cyberspace to differentiate it from real navigation in real space.
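A hyperlink on a page is often written relative to the page it sits on, so the browser first has to work out the full new address. A minimal sketch of that step, using the standard `urllib.parse.urljoin` function, follows; the `next_address` helper and the example addresses are invented for illustration.

```python
from urllib.parse import urljoin


def next_address(current_page: str, hyperlink: str) -> str:
    """Work out the full address the browser should request after a click."""
    return urljoin(current_page, hyperlink)


if __name__ == "__main__":
    home = "http://www.example.org/index.html"
    # A link to another page on the same website.
    print(next_address(home, "physics/cern.html"))
    # A link that already carries a full address on another server.
    print(next_address(home, "http://other.example.net/"))
```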

The web pages carry composing or formatting instructions in a computer language known as Hyper Text Markup Language (HTML). The browser reads these instructions or tags when it displays the web page on your screen. It is important to note that the page, on the Internet, does not actually look the way it does on your screen. It is a text file with embedded HTML tags giving instructions like ‘this line should be bold’, ‘that line should be in italics’, ‘this heading should be in this colour and font,’ ‘here you should place a particular picture’ and so on. When you ask for that page, the browser brings it from the Internet web servers and displays it according to the coded instructions. A web browser is a computer program in your computer that has a communication function and a display function. When you ask it to go to an Internet address and get a particular page, it will send a message through the Internet to that server and get the file and then, interpreting the coded HTML instructions in that page, compose the page and display it to you.
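To make the display function concrete, here is a toy "browser" built on Python's standard `html.parser` module. It is a sketch under simplifying assumptions: it understands only the `<b>` and `<img>` tags, shows bold text in capitals and pictures as a placeholder, and the `ToyRenderer` class and `render` function are invented for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ToyRenderer(HTMLParser):
    """Reads HTML tags and composes plain text; a stand-in for a real browser."""

    def __init__(self) -> None:
        super().__init__()
        self.output: List[str] = []
        self.bold_depth = 0  # how many <b> tags are currently open

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "b":
            self.bold_depth += 1
        elif tag == "img":
            # 'Here you should place a particular picture.'
            source = dict(attrs).get("src") or "?"
            self.output.append(f"[picture: {source}]")

    def handle_endtag(self, tag: str) -> None:
        if tag == "b" and self.bold_depth > 0:
            self.bold_depth -= 1

    def handle_data(self, data: str) -> None:
        # 'This line should be bold' is shown here as capital letters.
        self.output.append(data.upper() if self.bold_depth else data)


def render(html_text: str) -> str:
    """Interpret the tags in html_text and return what would be displayed."""
    renderer = ToyRenderer()
    renderer.feed(html_text)
    renderer.close()
    return "".join(renderer.output)


if __name__ == "__main__":
    print(render('Hello <b>world</b>! <img src="cat.png">'))
```

The page itself stays a plain text file; only the program reading it decides how the tags turn into something on the screen.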

An important feature of the web pages is that they carry hyperlinks. Such text (with embedded hyperlinks) is called Hyper Text, which is basically text within text. For example, in the above paragraphs, there are words like ‘HTML’, ‘World Wide Web’ and ‘Browser’. Now if these words are hyperlinked and you want to know more about them, then I need not give the information right here, but provide a link to a separate document to explain each of these words. So, only if you want to know more about them, would you go that deep.

In case you do want to know more about the Web and you click on it, then the new document that appears might explain what the Web is and how it was invented by Tim Berners-Lee, a software engineer trained in physics, when he was at CERN, the European Organisation for Nuclear Research at Geneva. Now if you wanted to know more about Tim Berners-Lee or CERN, then you could click on those words with your mouse and the browser would follow the hyperlinks to other documents containing details about Berners-Lee or CERN, and so on.

Thus, starting with one page, you might ‘crawl’ to different documents in different servers over the Net depending on where the hyperlinks are pointing. This crawling and connectedness of documents through hyperlinks seems like a spider crawling over its web and there lies the origin of the term ‘World Wide Web.’
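The crawl from document to document can be sketched with a tiny made-up web held in memory. The page names in `WEB` and the `crawl` function are assumptions for illustration; a real crawler would fetch each page over the Internet and read the hyperlinks out of its HTML.

```python
from collections import deque
from typing import Dict, List

# A hypothetical miniature web: each page maps to the pages it links to.
WEB: Dict[str, List[str]] = {
    "web.html": ["berners-lee.html", "cern.html"],
    "berners-lee.html": ["cern.html", "w3c.html"],
    "cern.html": ["web.html"],
    "w3c.html": [],
}


def crawl(start: str, links: Dict[str, List[str]]) -> List[str]:
    """Follow hyperlinks outward from start, visiting each page only once."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    print(crawl("web.html", WEB))
```

Keeping track of pages already seen matters because hyperlinks can loop back, as `cern.html` does here.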

For a literary person, the hyperlinked text looks similar to what writers call non-linear text. A linear text has a plot and a beginning, a middle and an end. It has a certain chronology and structure. But a nonlinear text need not have a beginning, middle and an end in the normal sense. It need not be chronological. It can have flashbacks and flash-forwards and so on.

If you are familiar with Indian epics then you will understand hyperlinked text right away. After all, Mahabharat, Ramayana, Kathasaritsagar, Panchatantra, Vikram and Betal’s stories have nonlinearities built into them. Every story has a sub-story. Sometimes there are storytellers as characters within stories, who then tell other stories, and so on. At times you can lose the thread because, unlike Hyper Text and hyperlinks, where the reader can exercise his choice to follow a hyperlink or not, the sub-stories in our epics drag you there anyway!

Earlier, you could get only text documents on the Net. With HTML pages, one could now get text with pictures or animations or even some music clips or video clips and so on. The documents on the Net became so much livelier, while the hyperlinks embedded within the page took you to different servers—host computers on the Internet acting as repositories of documents.

It is as if you open one book in a library and it offers you the chance to browse through the whole library of books, CDs and videos! By the way, the reference to the Web as a magical library is not fortuitous. This idea of a hyperlinked electronic library was essentially visualised in the 1940s by Vannevar Bush at MIT, who called it Memex.

Incidentally, Tim Berners-Lee was actually trying to solve the problem of documentation and knowledge management in CERN. He was grappling with the problem of how to create a database of knowledge so that the experience of the past could be distilled in a complex organisation. It would also allow different groups in a large organisation to share their knowledge resources. That is why his proposal to his boss to create a hyperlinked web of knowledge within CERN, written in 1989-90, was called ‘Information Management: A Proposal’. Luckily, his boss is supposed to have written the now-famous words “Vague, but exciting” on his proposal. Berners-Lee saw that the concept could be generalised to the Internet. The Internet community quickly grasped it, and we saw the birth of the Internet as we know it today. A new era had begun.

Berners-Lee himself developed a program that looked like a word processor and had hyperlinks as underlined words. He called it a browser. The browser had two functions: a communication function, which used Hyper Text Transfer Protocol (HTTP) to communicate with servers, and a presentation function. As more and more servers capable of using HTTP were set up, the Web grew.
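The communication function boils down to exchanging short text messages. The sketch below builds a request and reads a reply in the style of HTTP/1.1, a later version of the protocol than the one the first browser spoke; the `build_get_request` and `parse_response` helpers and the canned reply are invented for illustration, and nothing is actually sent over a network.

```python
from typing import Dict, Tuple


def build_get_request(host: str, path: str = "/") -> str:
    """Compose the text a browser sends to ask a server for one page."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


def parse_response(raw: str) -> Tuple[int, Dict[str, str], str]:
    """Split a server's reply into status code, headers and page body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


if __name__ == "__main__":
    print(build_get_request("www.example.org", "/index.html"))
    # A canned reply, written by hand for this sketch.
    reply = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hi</p>"
    print(parse_response(reply))
```

The body that comes back is the HTML text, which the presentation function then turns into the page on the screen.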

Soon more browsers started appearing. The one written by Marc Andreessen, a student at the University of Illinois, became very popular for its high quality and free downloading. It was called Mosaic. Soon, Andreessen left the university, teamed up with Jim Clark, founder of Silicon Graphics, and floated a new company called Netscape Communications. Its Netscape Navigator created a storm and the company started the Internet mania on the stock market when it went public, attracting billions of dollars in valuation even though it was not making any profit!

Meanwhile, Tim Berners-Lee did not make a cent from his path-breaking work since he refused to patent it. He continues to look at the development of the next generation of the Internet as a non-profit service to society and heads a research group, W3C, at MIT, which has become a standards-setting consortium for the Web.
