Saturday, August 29, 2009

Rajeev Motwani: Interview 2002

Rajeev Motwani unplugged

(Excerpts appeared in India Abroad, Aug 7, 2009)

Shivanand Kanavi, interviewed Rajeev Motwani in July 2002, while researching for his book, Sand to Silicon: The amazing story of digital technology. Here are the edited excerpts:

SK: Tell me about your childhood and growing up and the influences that shaped you.

RM: One of the shaping influences was that my father was in the Army and that meant not being in one place for too long, not more than 2 years. My parents were great believers in education so where ever we were they sent me to the best possible school available. All of them were missionary schools and had many inspiring teachers. I also always wanted to be a mathematician or a scientist at any rate. Then I decided that I did not want to be an Einstein but wanted to be a Gauss. That was because I was an avid reader and I used to read a lot of books. My parents had given me a lot of ten great scientists, 5 great mathematicians kind of popular science books and biographies which were very inspiring. I was not reading about other kind of heroes. That is what I wanted to become. Fortunately I was good at Maths. I also did not graduate from school. I used to study in St Columbus, Delhi, where they had just switched from 11 to 10+2 and IITs gave us permission to join them after 11th without completing 10+2. It was just for that year. I did have problems when getting permanent residence here because I had a PhD but did not have a school certificate.
IIT Kanpur at that time had for the first time an undergraduate program in computer science B.Tech. I really wanted to be a mathematician and I did not have any idea what a computer was. My parents were hesitant because they did not know how a mathematician would make money and support a family. Basically I was forced to do Computer Science. I then realized that Computer Science was very closely related to mathematics. Some of the faculty in IIT Kanpur were also a shaping influence for me. One of the people who really influenced me was Kesav Nori. At that time there was Prof Rajaraman, R.Shankar, Sahasrabuddhe, Somnath Biswas, Kesav Nori, Harish Karnik to name a few. I could not have constructed a better environment for doing computer science in India. It was an amazing confluence of people.

SK: But they already had a masters programme for a long time.

RM: Nori had just come back from Europe. He stayed for a year or so and taught the first course in programming. He was a wonderful teacher and used to tell great stories. We started out programming on comp cards, which you probably remember but most other people don’t. That time we used to work on DEC machines and Vac machines with a terminal. We then had to use a login and a password. Nori could have made up random passwords, or give names of flowers but instead he gave names of famous computer scientists as passwords. Somebody had Don Knuth as password (who is down the hall).
I went and did research to see who these guys were. Bob Floyd was my password. He was also at Stanford and passed away recently. He was one of the mentors of this field called analysis of algorithms. He also did the early work on randomized algorithms. I ended up eventually did some work on randomized algorithms. The very first chapter of my book on randomized algorithms was on Floyd’s algorithm. It is hard to believe that because he was my password this happened. But there must have been some connection! That was the wonderful thing about Nori who was a very inspiring person. He did more than just teaching. He created such a wonderful ecosystem and developed a personal connection with his students.

There are a lot of very good schools not only in the US but elsewhere. Going by what I have seen in this country and going by what I have learnt out there, definitely IIT Kanpur was one of the top five schools in computer science education.
I finished all that. Everybody else was coming to the US for PhD or Masters or whatever. Actually I did not want to come here for reasons I don’t quite understand now. I remember getting a job at DCM Data Products because getting visas at that time (1983) was a big problem. I was then interviewed at Wipro by the top three guys. It was a small outfit then.

SK: Was it Ashok Narasimhan?

RM: It could have been. I met Ashok Narasimhan last year when he was doing a company called July Systems which I was considering investing in, but he did not remember me. The interviewer said we would love to give you a job looking at your track record but isn’t every one with your kind of back ground going to the US on a scholarship? So have you applied to US? I said yes I have an offer from Berkley. He asked do you have a scholarship? I said yes but I am not sure if I will get a visa. He said I did my MBA from Stanford and believe me you would want to go there.
If you come back we will give you a job.

Berkley was very different from Stanford. It was a very politically oriented university. You could call it the JNU of the US because it was highly politically charged. Ronald Reagan was the president then. So then for 3 years I had a blast. Did not do any work and fully enjoyed the environment. My advisor was Richard Carp, who won the Turing award – which is like the Nobel Prize in computer science in 1985-86, when I had finished those 3 years. It was then that I thought that I was not doing anything and letting this man down. So from then on I worked really hard and was quite productive for the next two years.

SK: What did you work on?

RM: My PhD thesis was on randomized or probabilistic analysis of problems in optimization. Problems in network flows, graph matchings and so on. These are general formulations of a large class of problems.

SK: Traffic problems?

RM: Yes traffic problems or network routing. Routers on network, are basically implementing matching algorithms in some form, that is at the micro level. At the macro level flow of packets on networks. My advisor, Dick Carp was a pioneer in that field. These problems were hard and so I was trying to find a heuristic faster and better solution to get the right optimal value. So that was what my thesis on.

SK: People in the telecom sector had used such things.

RM: The technique itself had been used to some degree. In fact pioneered by Carp. Called NP- completeness. It says that some problems are essentially impossible to solve. They cannot be solved exactly. So the question is if you can solve the problem approximately with some assumptions such that the instance of the problem or the input is randomly distributed with known distribution. That’s what I was trying to do.

SK: Why random? Connection with Gauss again?

RM: Well the reason being that once you assume that there is distribution, then you can give some structure to the problem and you can use some probabilistic techniques to say that on a certain fraction of the inputs I’m still going to get screwed up but if I bound the fraction on which I wont perform well and with a typical example I will get a good solution. This is a problem with the theory of algorithms as it exists. In those days everything was taken from the worst case point of view which means that if I want to route a flow on a network I can always construct an example where you are unable to route the flow and perform badly. But the real flows patterns that emerge are not the worst case patterns. They have some niceness to it and things do get routed. Randomness is a way of capturing that by saying that in distribution there is a probability that you will get bad flows but many times you will also get good flows and that is good enough.

So I was doing all this and was about to graduate and was wondering what to do next. To go back to India or stay in the US because again other people made the decisions for me, Don Knuth, one of the founding fathers of comp science, also one of Nori’s passwords, came over to meet my advisor and told him that they wanted to hire someone young for algorithms at Stanford. So Carp suggested my name. I was then invited by Knuth at Stanford for lunch during a dinner hosted for him at Berkley. I was wondering why this great man wants to have lunch with me. So I went to Stanford and met him at a restaurant near the church at the quad. He then told me to be with Stanford for a year and see if they liked me and vice versa after which if things worked out well they would hire me.
For the first year I was a visiting faculty. I did not want that job as I was getting better offers and permanent jobs at other places but since it was an offer by Knuth it was hard to turn down. It’s the same as Einstein inviting you to Princeton for a job. So then I came and taught at Stanford and started some courses and had a very good time.
I was then given a permanent offer at Stanford as they liked what they saw. It was 3 – 4 months after I finished my 1 year at Stanford that I was to get married. My wife was shifting from LA to Berkley. Then we saw that Stanford was a good place and decided to stay in Stanford.

SK: What have been your major research interests here at Stanford?

RM: Teaching has been a major preoccupation first of all. I enjoy teaching. The reason I was in Stanford was that there were many faculty retiring like Don Knuth and so they needed someone to fill in and walk in their footsteps. So since so many people were retiring or leaving there was a vacuum created here. Also there were a lot of courses to be taught. I ended up teaching all these courses. I even made my own courses like topography and algorithms and complexity theory. I did not get enough sleep as I did not know a lot of these areas but I did learn a lot through teaching these things. I am a perfectionist and still get nervous to talk before a class till today. I get nervous, what if someone asks me a question and I find myself unable to answer it. So for this reason I always over prepare.
So this nervousness has taught me more than what I learnt as a student. I now have worked in many different areas and it broadened me up. I have the tendency of getting bored very easily and so if I stay in one area too long I quickly move over to another area. My threshold of working in one particular area is about 5 years.
Some of the non obvious areas are robotics. I was inspired by Jean Claude Latombe from France who was in this Dept. He told me that there were a lot of algorithms in robotics which are needed to plan the actions of the robot. Robots in a generic sense could include cars and or any mechanism which has to plan a motion to move about. It’s like the human body wanting to get up from this chair and walk to the door. It may seem like a triggered action but there are a lot of complexities and degree of freedom involved. In the human body itself every joint in the body gives a degree of freedom. Each can be controlled independently by setting the angle of each joint to accomplish the task. The reason why humans have this degree of freedom is for them to operate in the real world and do a lot of things. The control of these degrees of motions becomes very high. Although we live in a three dimensional world, the robotic movements and freedom work in a higher dimensional surrounding. So if you send a space craft to Mars, then you have to be sure by planning intelligence that it goes the right way without hitting anything along the path. This requires very high dimensional planning. It is like having a starting point A and end point B in space, and moving from A to B without being hit by any obstacles. The same task would be easier with 2 points on the table. So the space that we are talking about is not the physical space but the space of complex possible motions.

SK: There are constraint surfaces?

RM: They become very complex constraint surfaces in high dimensional geometry. I learnt this space for a few months and realized that this problem could be solved through randomization. It is very hard to plan motion in high dimensional complex places while it is very easy to pick a random point in space and figure out if it is going to hit any obstacle in space or if it is a free part of space.
If you pick many random points it is very easy to sample but very hard to find a free point in that space. If you find many free points, then you hook them together and make a path. The path may not be the smoothest but you can smooth it later. So that was the fundamental idea we used. But to realize this and analyse it, apply it and turn it into real systems is a lot of work and I worked for 5 years on this space, putting high dimension geometry and randomization together. Jean Claude was a systems guy while I was the theoretical guy. So the students implemented some of these things and these were used at places like GM in their robot assembly lines.
None of these things were used in isolation. The actual robots do many things and one of the techniques they use is this. They have many softwares running them. That was the first time I did something mathematical but practical. I suddenly realised that I could operate in the real world after seeing the robots move and perform using my algorithms. I could now move over from the paper pencil world. I credit Stanford for creating an environment where people in different areas can work together – the whole is greater than the sum of its parts. So I spend one third of my time in doing my work in a theoretical and mathematical way and then I collaborate with people. I may collaborate with someone in networking or databases or compilers and use my skills to solve their problems. But I need them to use my mathematical problems to solve the real world problems. It has been a good synergy so far.
I got the Godel prize for my theoretical work. It was a very theoretical work. In science it is said that one guy stands on the shoulders of another and another on his and so on. The guy on top gets the prize. In my case I was on the tip of the pyramid and so got the prize. Everyone forgets the pyramid. In my case we had worked and come up with excellent results before my paper which we used and leveraged to prove what we proved. We proved that you take a mathematical theorem then there is a way of writing the proof of that theorem and there is a way to easily verify for correctness. Verify correctness means for eg - Take the theorem and the proof and probe the proof in 7 places, read what is written there and then give an answer if the proof is correct or incorrect. The answer would be correct to as high a probability as you want. Random sampling of 7 places. Irrespective of the length of the proof. They would work for any given proof provided it is in a specific format and a particular mathematical language.

SK: What makes you say that the whole is correct?

RM: That is probably too hard to explain. The basic idea is that of an error correcting code. The code takes a sentence or a sequence of bits, 0’s and 1’s and replaces it by a slightly longer string of 0’s and 1’s , but there is some redundancy built into it.

SK: It is like writing a checksum.

RM: It is exactly like writing a checksum. I am giving a very simple example.
There are ways of error correcting where you rewrite small bits of the whole thing. It may not even look like the original thing but from that you can extract the original message. So take a proof of a mathematical theorem and write it in an error correcting code. If there are a large no of errors then by random sampling of a few points in that I will detect that there a large no of errors. So I will dismiss the group as nonsense if there are a large no of errors. But if there are small no of errors then I can correct it because I know there is a correct version of this. So after sampling it if I find the number of errors to be small then I know that there is a good proof somewhere.

SK: So it is an existence theorem?

RM: It is random verifier. We over simplified it in this discussion, however the implications are important. Sometimes in mathematics you do things just for the sheer elegance of it and after we wrote this paper I learnt that Intel had a problem with pentium2. When you multiply 2 specific numbers on that p2 chip which was being used in all the desktops and laptops, the answer was wrong. An overflow perhaps. I got a call from Intel to ask me if there was any way they could use the verification technology. But it was not possible as I was a purely mathematical abstract trained whereas they needed a real system with 10 million gates on it where you had to do some checking. Very hard to translate it to this. But this did show the possibility of verifying errors in a complex system by doing a small amount of work. But it required the system to be written in a certain code which was the catch.
There is a wonderful mathematical theorem, I am quite proud of it, it is a combined effort of a lot of people in our field, and is one of the deeper pieces of mathematics which has been done in modern day computer science.
It turned out that implications of this are much more profound than the statement itself. It goes back to the motion of optimization problems we were discussing earlier where we were talking about the flow algorithms where we couldn’t get an exact answer because you can show that in a “polynomial” amount of time which is a measure of efficiency in computer science, one could not get an the exact answer. In my thesis I had looked at one particular way of getting around the problem which was to assume that the problem input itself was randomly distributed. And I was not faced with the bad cases alone but was faced with a distribution of cases; some good some bad. But on the average I would do ok. There is also some other way of tackling hard optimization problems which is to say that I still guarantee you the right answer but the right answer won’t be the optimal answer. So if you are trying to route the maximum no of circuits through your network, I will not route 100% routing but will route 95% and to get 100% is an impossible computation. So these are called “approximation algorithms”.
So for some large class of problems we faced, using this theorem we had the possibility of reaching an approximate answer but for some problems reaching the approximate answer was also not possible. That was one of the bigger breakthroughs in comp science. Carp and Cook from Berkley in 1971 came up with the theory of NP-completeness, which tries to describe why some problems can only be solved by reaching the approximate answer and not the optimum answer. Twenty years later as a consequence of this we said that for large subset of their problems not only is it not possible to get the optimal answer but also hard to get the approximate answer which means that there is no use trying to even solve these problems.
In science you have the family tree where people are followed by their advisors, juniors or siblings. Where each person follows up on another’s work thereby expanding it. Like the musical schools in India, where the Gharanas pass on the knowledge from one generation to the next. I have been a beneficiary in that kind of a system. To me it all came together very nicely. No one knows what happened underneath to build it up. A lot of people contributed to it. I was at the tip, a small part of it, towards the end point.
That has been my mathematical pinnacle. After that it has been downhill all the way (laughs). So I did this and robotics and compiler optimization. I did PLIW compiler optimization and then Pfizer wanted to fund research on computational drug design. And while finishing the work on random motion planning in robots we realised that molecules and robots actually behave in a very similar way.
What is a molecule? One model of molecule which we learn in chemistry in high school is atoms are like balls bound together by bonds thereby making a molecule. So for eg. a protein molecule which has about 30, 50 thousand atoms has to fold and form a shape for another protein molecule to come and bind on that fold. The place where it binds has the activity.

SK: They are called pharmacophores.

RM: Yes. This is basically drug design. It is all about figures folding and matching like a lock and key mechanism. We said we know how things fold, we know how degrees of freedom are created in high dimensional space. Let’s throw it at this problem. So Pfizer funded this research which went on for 2 – 3 years and we came up with software based on our theory and we added some new theory. On a particular approach, the project being called RAPID (Randomized Pharmacophore Identification for Drug Design). It went very well. I learnt a lot. It was an intriguing experience. I had to go back and learn my high school chemistry and biology and the other fun stuff. And the software that we made is being used by Pfizer labs for their drug design. Of course the problem is complex and needs more than one software. So due to confidentiality issues they didn’t tell us how our software was used or what drugs were made from it but you feel you have made a contribution to society instead of just doing formal mathematics which is hard to justify. It felt like we contributed to social welfare through mathematics.

SK: These techniques are mostly used by medicinal chemists. They have an intuition as to which molecules to use and how the binding would take place.

RM: Now they have a tool wherein they can enhance their intuition, do away with some things and focus on what they really want, which is what we allow them to do.

SK: It can be applied to catalysis and designing chemzymes as well right?

RM: Yes. We never got into that because by then the world wide web was coming up and I just got sucked into that. There was this guy Jeff Ullman, another one of the grand old men of computer science, who retired this year. He was in the office next to me and was in database. I was talking to him and a new student – Sergey Brin, and I remember at that time we were using Mosaic, and we were looking at the web and I was sitting there and thinking that we could randomize the web in some way because that was going to grow and become big and randomness was going to be important; though I did not know how and why. So I thought about doing random walks on the web and there was this problem of crawling on the web. At that time a search engine called Inktomi had just come out of Berkley. Excite and Yahoo had come out from Stanford so we had seen the first signs of all of this.
I remember going to Inktomi and searching for the word Inktomi and it could not find itself. I don’t know if that is still true but at that time if you went to Inktomi and typed in the word it said no results found. My Godelian past induced me to do these self referential queries but what amazed me was that this is a simple thing that people screw up on. So in the context of all this I was listening to some people from IBM talk on Data mining and Ullman had just introduced me to some problems in databases. I broke them down with a student and was getting pretty excited about the concept of databases. Ullman took me for this talk on data mining which sounded very interesting to me. So Sergey and Ullman and we decided to do some data mining on the web because it sounded like a nice mix. We then formed this research group called Midas which stood for Mining Data At Stanford. We did a lot of good work on data mining. Then there was this guy called Larry Page who wasn’t really a part of the Midas group but was a friend of Sergey and would show up for these meetings. He was working on this very cool idea of doing random walks on the web.
When I understood what the World Wide Web would look like, I knew I had to somehow force randomness into it. When Larry showed us what he was doing, it was like a complete epiphany, we thought it was absolutely the right thing to do. So Sergey got involved and it became a sub group inside Midas. I was really a good sounding board for Sergey and Larry and I could relate to what they were doing through randomness. They then created a search engine called Backrub. It was running as a search engine from Stanford just like Yahoo ran till the traffic got big and the IT guys sent it off the campus. So these 2 guys would come to the office and say “hey we need some more disc space”. They were completely non respectful of me, which was a wonderful thing. They treated me like an equal. These 21 year old guys were demanding things from me. They needed more disc space because it’s getting bigger. So we need more disc and more money. There are still pictures around the building of how they used to use Legos, to create a box inside which the discs were being put. These discs were those cheap ones bought from the back of a truck and were generating a lot of heat. So they put it in Legos to allow for air circulation.
For me it was a fun research project. We had a lot of ideas which we shared. At some point this thing started getting very serious and we wanted a better name for this than Backrub. So somebody came up with the name Google. Google means 10 raised to the power of 100. It is actually spelt as GOOGOL but somebody miss spelt it and that’s how the search engine got its name. Of course the official story is we deliberately spelt it that way but my guess is we miss spelt it.
So Google started and pretty soon everybody in the world was using Google. The results were much better than all the other search engines going around. It was by word of mouth like I tell my brother to use it, he would tell his wife, the wife would tell her kids and so on. At some point these guys said we want to start a company. Everybody said it was not worth it. There were 37 search engines already there. How would you raise money? How would you form the company? But they decided to do it and they did it. There were some big names which supported the company. Andy Bechtolsheim, an ex Stanford guy who along with Vinod Khosla had founded the Sun Microsystems, put in a little bit of money. They managed to raise a million dollars. They started the company and it was right here in the university avenue. It used to be on my drive home so I used to go and hang out with these guys. It used to be wonderful.

Then they took over the world!

Right now the other search engines don’t even compare and I remember people who I don’t want to name saying why do you need another search engine? Today it is the only search engine people use. Now it is a company of 500 people or more doing hundreds of millions of searches every day, generating revenue. One of the few companies which in today’s economic conditions is not only surviving but also growing. Feels like I was part of a little bit of history and contributed to that history.

SK: Can you explain in simple words, the concept of search engines? How has it evolved?

RM: There was this thing called Information Retrieval which was used for document management. Consider your desktop machine. So let’s say you are writing chapters of a book and chapters of others books. And there are thousands of documents. Each document is just a document of English words. This could be more related to encyclopedias and newspapers articles and Reuters news feeds and so on. So you have all these documents with you and somebody comes to you and asks you a question. Can you answer that question or at least point to the right document that answers that question?
All of this was unattainable. So it reduced down to can you at least show me the document that contains the same words that are there in the question? If I said, “what kind of yeast should I use to make bread?” Ideally you would like it to come back with the answer. But you can’t do that. You could though at least find the document that uses the word bread and yeast and hope that has something to do with the question. That was the field of information retrieval. When the web evolved it became pretty clear that there was going to be a tremendous information explosion.

SK: Initially you had to give some key words for every web page and it would search only those keywords?

Even now it is still the same. It goes through the entire document and indexes the entire document. The computer helps to solve the scale of the problem which is huge. There are billions of web pages, each having thousands of words on the average. So there are trillions of words. So what you do is create an index that is trillion big and when you come along and say yeast and bread you find all the documents on the web that contain yeast and bread. It turns out millions of pages containing yeast and bread which is of no use to you. So people come up with this rule where if the document contains the word yeast 7 times and the word bread 3 times, then that is a score of ten, so give me the highest score document. I am just simplifying the rules of information retrieval but they have even more complex ones. For eg the word bread is very common and can be found in many places but the word yeast is uncommon so you start doing weighting and the other heuristics. None of this required rocket science, none of this is very deep.

It was library science. Not computer or mathematical science. So the old search engines like Alta Vista, Inktomi and so on, to a large extent this is what they are doing today too. The problem they were solving, the key secret sauce was the scale. They were able to harness a lot of memory and solve these problems on a large scale. So instead of building indexes of thousands of words they could build indexes of trillions of words and search through it very quickly. They just put a lot of horse power behind it but not any science. There were a few small ones which were exceptions but this is generally what search engines are doing today.
What Google changed was the following
It actually started looking at the structure of the web. The basic structure we noticed was that it’s a collection of documents. These documents talk to each another. The reason is the hyperlink. Without the hyperlink, the web would have been useless. When you click on it, it takes you to another document. It is like this document saying “hey, look at that other document”. Now if I create a web page and make the effort to point to your webpage, there is some meaning and connection between my webpage and your webpage. The content here and there should be related in some form. This was the basic insight in Google. Instead of looking inside the webpage or the words inside it, look at how the web pages talk to each other, how they talk about each other how they point to each other. So that was the basic insight and everything else was built around that insight.
One of the key things they did was coming up with this ranking function. So if you went and queried on yeast and bread what I would like to do is go to the most authoritative page on the web which talks about yeast and bread. So if there is a yeast or bread makers association of America, then presumably theirs is the most authoritative page on how to make bread out of yeast.
You do use yeast in a bread right? Don’t want to get that wrong. (laughs)
So the question was how to find the most authoritative page on a certain topic? Here is a simple basic idea that underlies that notion. The notion is – Look at the structure of the web. Secondly how to we convert this structure to a ranking scale? This goes back to the random walking I was talking about earlier. Suppose you are at my web page. There are 5 links out of my web page. You are a surfer. What did you do as a surfer. When we first surfed the web, we went to a page, clicked on it and found it magical. That clicking led to another page, and so I look at that and click and go to another page. And so now you are surfing. Suppose you are surfing at random. Let us say my page has 7 links. You randomly clicked on one of those 7 links. You reached the next page which had 3 links and clicked on one of those at random and it took you somewhere and so forth. After a while you will be on a random page on the web. Suppose you surf long enough, you do it a million or billion times, you will be distributed somewhere on the web. You could be on any page. The question is what is the probability that you are sitting on a particular webpage?

SK: It’s a graph theoretic problem.

RM: Yes it is. This is called doing your random walk on the graph which excited me about all these things. Turns out the probability distribution is not unique. There is a different probability being on different pages. Quite obviously if every page in the world points out to my web page then the chances of ending up on my page are very high because wherever you are there is some probability that you will come here. Nobody points to me or if one guy points to me then it is very unlikely. On the other hand if the important pages in the world point to me, then you are likely to end up at my page. But what are important pages? Those are the pages to which other important pages point. So this logic of circularity or flow is what led to the notion of page rank. Ranking of pages is Google’s secret sauce. So they came up with those and discovered that this is the right thing to do.

In hindsight I came up with the estimation of the random web surfer that there was a purely mathematical thing of eigenvectors and matrices etc. Now on a query on yeast and bread, what we do is look at all pages that contain yeast and bread , find the page with the highest rank or score which has the word yeast and bread and we say that must be your answer and it is usually right. Google got so cocky on this, that it has ‘I’m feeling lucky’ button. So if you give a query and hit that button, it takes you to a page. That page has the right answer.

No comments: