More nuggets from I’m Feeling Lucky to help us understand Sergey Brin better. On the relationship between Larry Page and Sergey Brin, Edwards quotes David Krane, then a senior member of the communications team: “Those guys had a communication channel that was very direct, very open. When there was tension, it was when they were fighting over data. They would be downright rude to each other, confidently dismissing ideas as stupid or naïve or calling each other bastards. But no one would pout.”

THEY SHOULD BE PAYING US

It didn’t take long for Douglas to realize that Larry and Sergey ideally wanted Google to churn out perfect products while incurring no expense. “That was impossible, of course, but it didn’t stop us from trying,” he writes. Douglas said he learned that for the founders, the starting point for any negotiation would be the position, “they should pay us”. To the founders, winning a deep discount wasn’t a victory, wrote Douglas. “It was an admission of a failure to get something for free.”

Another core value at Google was to ‘never pay retail’. In the early months of 2000, this was thoroughly tested when an expanding Google wanted more space and Silicon Valley had nothing to offer at less than $8 per square foot. Despite his protests, Google’s official office hunter George Salah was forced by Larry and Sergey to put in a lowball offer of $6.45 per square foot. The landlord screamed at George’s broker and raised the price to $8.25 per square foot, notes Edwards. As luck would have it for the Google founders, two weeks later the dotcom fever broke and the real estate market collapsed. Salah “sublet space in that same building for $3.5 per square foot, and a year later he leased a completely furnished building nearby for 45 cents a square foot”, writes Edwards. So the Google core value, after all, was based on sound logic.
GOOGLE’S GREATEST CORPORATE EXPENSE

At an all-hands meeting one Friday, Sergey popped the question, “Do you know what’s our greatest corporate expense?” Douglas remembers that everyone wanted to answer that one. Answers came thick and fast: ‘health insurance’, ‘salaries’, ‘servers’, ‘electricity’, and even ‘Charlie’s grocery bills’. Sergey wasn’t impressed. “No,” he said solemnly, shaking his head, “opportunity cost.” He explained that products the company wasn’t launching and deals they weren’t doing threatened its economic stability more than any single line item in the budget.

(To be continued. Vignettes taken from the book I’m Feeling Lucky by Douglas Edwards. The first, second and third parts of this series can be accessed by clicking on the links.)
In the first part of this post, we saw how Google engineer Paul Buchheit’s 20% side project, which led to the creation of Gmail, had its origins in his itch to fix the buggy Web email services available in the market. He wanted to add a then unheard-of 1 gigabyte of storage space so that users would never have to spend hours sorting and deleting their mails. Ryan Tate’s book The 20% Doctrine says this was roughly five hundred times the storage space offered by competitors Hotmail.com and Yahoo! Mail.

The question then was how to finance this free extra storage space. Tate says Buchheit’s manager Marissa Mayer wanted him to charge users for the extra storage. Instead, Buchheit started looking at contextual advertisements of the kind used in AdWords. AdWords shows Google searchers advertisements based on their search terms, on the right-hand side as well as at the top of the search results page. For instance, if someone searched for ‘hotel’, advertisements for hotels would turn up. Buchheit wondered if the same logic could be extended to email. What if ads were shown on the side of emails based on the contents of the mail, he thought.

It was brilliant on the face of it, but it sounded equally creepy, says Tate. Mayer expressed her misgivings bluntly. “People are going to think there are people here reading their emails and picking out the ads, and it’s going to be terrible,” she recalled thinking in a Stanford University podcast done later. The podcast also recounted how Buchheit actually broke his promise to Mayer not to work on combining advertisements with email. “I remember leaving, and when I walked out the door I stopped for a minute, and I remember I leaned back and I said, ‘So Paul, we agreed we are not exploring the whole ad thing now, right?’ And he was like, ‘Yup, right’.” Tate says Buchheit broke his word almost immediately.
“Over the next few hours, he hacked together a prototype of the ‘ad thing’, a system that would read your email and automatically find a related ad to display next to it.”

HE USED A PORN FILTER TO CREATE ADSENSE

Tate also gives the details of how Buchheit went about creating the AdSense building blocks. Just as he adapted the Usenet search experience to create Gmail, he started working with another tool, a porn filter no less, to create AdSense. This was basically a code editor he had created to screen for adult content. Probably it was used to switch on and switch off Safe Search filters in Google Search Settings. “Normally, the filter examined a batch of known porn pages and listed words that occurred disproportionately within those pages. Other pages containing those words were then assumed to be porn. Buchheit instead turned the filter on Gmail messages, using the resulting keywords to select advertisements from Google’s AdWords database.”

Tate advises youngsters who are pursuing side projects to copy Buchheit’s method of adapting old work. “As tempting as it is to start from a clean slate, always look for opportunities to use something old to create something fresh,” Tate advises.

HOW BUCHHEIT WON OVER THE DECISION MAKERS

Although Buchheit directly rebelled against his boss in putting together the delivery mechanism of what turned out to be AdSense, Tate says it helped that Google had a culture where results prevail over preconceptions. When Marissa Mayer opened her Gmail account the next day, only to see ads running on the side of mails, her immediate instinct was to summon Buchheit for an explanation. But she delayed action, thinking he deserved the mercy of sleeping for a few more hours after having worked the whole night. Tate writes, “While she waited, Mayer checked her Gmail. There was an email from a friend who invited her to go hiking, and next to it, an ad for hiking boots.
Another email was about Al Gore coming to speak at Stanford University, and next to it was an ad for books about Al Gore. Just a couple of hours after the system had been invented, Mayer grudgingly admitted to herself, AdSense was already useful, entertaining, and relevant.”

Tate writes that like Mayer, Larry Page and Sergey Brin loved AdSense. “In short order, the Google high command decided AdSense would be a top priority. It was a no-brainer: Google’s main revenue source, AdWords, placed contextual ads alongside search results. But search results were just 5% of Web views; AdSense promised to open up the other 95 percent to ads, since it could go inside any Web page,” Tate writes.

According to Tate, it took just six months for AdSense to launch. In June 2003, it was made available to the public as a widget that any publisher could attach to any Web page. It generates more than $10 billion per year for Google. Gmail itself, for which AdSense was first developed by Buchheit, launched to the public on April 1, 2004, in what was initially thought to be an April Fool’s Day practical joke. Today it’s probably the world’s largest free Webmail service, as well as the pivot around which the Google Apps for Business suite functions.

So what are the lessons which we can take from Buchheit’s innovations in the development of Gmail and AdSense for people who run 20% projects:
e.o.m.

Google’s ‘20 percent policy’ has been much celebrated. Although people are now skeptical about the company’s commitment to the policy, it remains in force, though it was never written down. This unwritten policy allows employees to spend up to one day a week or four days a month or 75 days a year on side projects which they want to pursue, using the company’s own resources. Many such projects later went on to become part of Google’s core offerings, including Gmail, AdSense, and Google News.

Google engineer Paul Buchheit was the person responsible for the creation of both AdSense and Gmail. Both began as side projects by Buchheit. Today, AdSense is Google’s second biggest revenue earner after AdWords. Gmail is probably the biggest Web-based email service in the world. It was revolutionary when it was introduced. It still leads from the front, and is the pivot around which the company offers its Google Apps for Business suite on the cloud. So successful and threatening did Google Apps for Business become for Microsoft’s bread-and-butter Office suite that Microsoft was forced to offer a cloud version of the same in Office 365. This meant the Outlook email client had to be made available as a Web offering. So Microsoft ended up renaming Hotmail as Outlook. Look how a 20% project started by an engineer at Google ended up affecting even the company’s competitors!

Much of what I am going to write here is taken from Ryan Tate’s book, The 20% Doctrine, where the first chapter chronicles Buchheit’s Operation Gmail and AdSense.

BEGIN BY SOLVING A PERSONAL ITCH

Tate says 20% side projects usually begin as an attempt to satisfy some personal itch. In the early years of the aughts, Buchheit’s itch was clearly email. Most of the popular Web-based offerings majorly sucked for him. For one, their storage capacity was minimal and users had to constantly work at trimming and deleting mails. Also, search capabilities were sadly lacking in email.
Most providers didn’t have the knowhow to search mails for keywords appearing in the body of the text.

Buchheit was conveniently placed. He had just finished fixing Google Groups, which was an archive of online conversations earlier known as Usenet. Buchheit’s fix involved making the archive searchable. He realized that email messages were very similar to messages on a board like Google Groups. The ‘To:’, ‘From:’, ‘Date:’, and ‘Subject:’ fields were shared, and the formatting rules were common as well. So Buchheit had an itch to solve, and he knew what to do. And it took him just a few hours to release the first version of Gmail. He shared it with a few colleagues, with code supporting only his own account. It would be good if the code supported our accounts too, they replied. And so, Gmail 2.0 was soon released, which supported search for users’ own email accounts. He followed the ‘release early, release often’ principle, which is today the defining theme of the agile software development school.

An early innovation was ‘Conversation View’, which displayed all replies to an email message as a unified thread. This prevented colleagues from talking past one another as was the practice before. “They would have to read all prior replies to an email before they could send one of their own,” says Tate.

Very early, Gmail distinguished itself by its search capabilities. As mentioned before, this was an area where other email providers sucked. But Gmail quickly nailed comprehensive email searches.

Another innovation by Buchheit was the extensive use of JavaScript. It made Gmail feel like a desktop email client such as Outlook, in contrast to other Web email services of the time like Hotmail.com. “For example, writing a message on Hotmail.com could easily require four page loads: one for ‘new message’, one to open your address book, one to search it, and one to pick a recipient. On Gmail, you clicked just once, and JavaScript generated the blank message form right away.
If you started to type a friend’s name, Gmail would offer to autocomplete his email address. This felt like magic,” Tate writes in his book.

HOW TO SUCCEED WITH A 20% PROJECT, THE BUCHHEIT WAY

Tate writes that to succeed with a 20% project, “the trick is to find a way to make a small initial prototype and then take small steps forward”. Tech start-ups refer to this as the Minimum Viable Product, notes Tate. “The sooner you release, the sooner you get information from your users about where the product should go,” writes Tate. For instance, the Gmail churn was so intense that the front-end was rewritten about six times and the back-end about three times.

The next concern for a 20% project developer is to know when to stop. As in, when do you consider your project sufficiently developed that you are ready to ship? Buchheit took to heart the advice from then Google CEO Eric Schmidt that he should launch only after getting 100 happy users for Gmail inside Google. Buchheit later said that he and his team would approach people directly for their feedback, and if a user set the bar too high, they would abandon that user, saying they were unlikely to ever satisfy him. But in short order, they won over 100 happy users by making small tweaks to the code based on user feedback.

Humility is an important quality for such a project developer to have. Tate quotes Chris Wetherell, the lead developer of another 20% project called Google Reader (now shut down), as saying about the Gmail project, “Can you imagine working on it for two years? No daylight. Very little feedback. Many iterations, many. Some so bad that people thought, ‘This will never launch. This is the worst thing ever.’ I remember being in a meeting, and a founding member of Google said, ‘This is brand destroying. This will destroy our brand. This will crush our company’.” But Buchheit never gave up even after such withering criticism from within the company.
In the next part, we will take up how he created AdSense as a way of monetizing Gmail, since it came with a till-then unheard-of one gig of free storage for users. There are many lessons for innovators and entrepreneurs in the strategies Paul Buchheit used to get the company’s top leadership to invest its best resources in both Gmail and AdSense.

e.o.m.

Google’s translation service is in the news in India now for the wrong reasons. Apparently, the Union Public Service Commission (UPSC), which conducts the civil services examinations, uses the free Google Translate service to translate most of the questions in the Civil Service Aptitude Test or CSAT for the preliminary exam. Many exam takers blame the poor Hindi-to-English translation for making CSAT insurmountable for them.
Obviously, UPSC needs to fix the translation part. It could consider using the services of professional translators, instead of an algorithm-based service like that of Google. But having said that, one has to note that on the whole, Google has improved considerably on the translation front from where it began. Randall Stross in his book Planet Google has provided a fascinating account of how Google nailed the machine translation problem, which has long been a bugbear in computing.

Stross begins by saying that machine translation has a long tradition of overpromising and underdelivering. Given Cold War priorities, Russian-to-English translation of documents was the initial area of focus for researchers. But word-for-word matching had its limitations, including the famous ‘water goat’ problem, a reference to how computers frequently translated the term hydraulic ram. Researchers thought all they had to do was add syntactical rules to word-for-word matching and perfect the process until translation was fixed. It certainly improved the quality of translations, and soon commercial providers of such translation services, including Systran, began entering the field.

But Stross notes that this rules-based methodology was only one approach to machine translation. An alternative approach, known as Statistical Machine Translation, was advanced by researchers at IBM in the 1970s. It was not based on linguistic rules manually drawn up by humans, but on a translation model that the software develops on its own as it is fed millions of paired documents: an original and a translation done by a human translator.

GOOGLE MADE USE OF IBM RESEARCH

Historically, IBM is known as a company with such a vast bureaucracy that many divisions do not know the findings and research advances of other divisions in the same organization. It often falls on others to make the most of the research advances made at IBM.
For instance, Oracle was formed after Larry Ellison was alerted to the potential of an obscure research paper published at IBM about relational databases.

Google made its tentative foray into translation in 2003 by hiring a small group of researchers and setting them free to have a go at fixing the problem. As is to be expected, they soon saw the potential of Statistical Machine Translation. In this model, says Stross, “the software looks for patterns, comparing the words and phrases, beginning with the first sentence in the first page of Language A, and its corresponding sentence in Language B. Nothing much can be deduced by comparing a single pair of documents. But compare millions of paired documents, and highly predictable patterns can be discerned…”

So the task before the Google translators was one of scale. To fix the translation problem, they needed millions of paired documents. Stross says the Google engineers solved it by getting a corpus of 200 billion words from the United Nations, where every speech made in the General Assembly, as well as every document made, is translated into five other languages.

“The results were revelatory,” says Stross. “Without being able to read Chinese characters or Arabic script, without knowing anything at all about Chinese or Arabic morphology, semantics, or syntax, Google’s English-language programmers came up with a self-teaching algorithm that could produce accurate, and sometimes astoundingly fluid, translations.”

Google soon went to town with its achievement. At a briefing in May 2005, it showed two translations of a headline in an Arabic newspaper side by side: its own and Systran’s. The Systran translation read as ‘Apline white new presence tape registered for coffee confirms Laden’. It was sheer nonsense. The Google translation rendered it as ‘The White House confirmed the existence of a new Bin Laden Tape’. Pretty impressive! Google didn’t stop there.
It entered its translation service in the annual competition for machine-translation software run by the National Institute of Standards and Technology in the United States. Google came first in both Arabic-to-English and Chinese-to-English, leaving Systran far behind. Google repeated its feat in 2006, coming first in Arabic and second in Chinese.

Stross says a stupefied Dimitris Sabatakakis, the CEO of Systran, could not grasp how Google’s statistical approach could outsmart his company, which had been in the machine translation business since 1968, and which had initially even powered the Google translation efforts. At Systran, “if we don’t have some Chinese guys, our system may contain some enormous mistakes”, he was quoted as saying. Stross says he could not understand how Google, without those Chinese speakers double-checking the translation, had beaten Systran so soundly. Incidentally, Google hasn’t taken part in the competition since 2008, perhaps because it found that there was nothing left to prove.

FROM MONOLINGUAL TO BILINGUAL

Stross’ description of how Google built up a monolingual language model is also a fascinating read. While the bilingual translation model converts text from one language to another, the monolingual language model uses software to fluently rephrase whatever the translation model produced. In other words, this model polished the language after it had already been translated from another.

How did Google manage this? Randall Stross has an answer. “The algorithm taught itself to recognize what was the natural phrasing in English by looking for patterns in large quantities of professionally written and edited documents. Google happened to have ready access to one such collection on its servers: the stories indexed by Google News.” Stross says that “even though Google News users were directed to the Web sites of news organizations, Google stored copies of the stories to feed its news algorithm.
Serendipitously, this repository of professionally polished text (50 billion words that Google had collected by April 2007) was a handy training corpus perfectly suited to teach the machine translation algorithm how to render English smoothly.”

So Google Translate may not be perfect. But it is constantly getting better, using software that teaches itself to read patterns by looking at a large volume of data. “Google did not claim to have the most sophisticated translation algorithms, but it did have something that other machine-translation teams lacked: the largest body of training data.” As Franz Och, the engineer who led (and still leads) Google Translate, said, “There’s a famous saying in the natural language processing field, ‘More data is better data’.”

Indeed. Data has helped Google to prevail as the leader in yet another segment of search.

e.o.m.
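As an aside, the idea Stross describes, software that discovers word correspondences on its own from paired texts, can be illustrated with a toy sketch. What follows is not Google's system. It is a minimal version of the classic word-alignment approach from the IBM research tradition (usually called IBM Model 1), trained with expectation-maximization. The function names `train_word_table` and `best_translation` and the three German-English sentence pairs are made up for illustration.

```python
from collections import defaultdict


def train_word_table(sentence_pairs, iterations=20):
    """Estimate P(target word | source word) from paired sentences.

    Toy IBM Model 1 style training: no dictionary and no grammar rules,
    only repeated counting over sentence pairs.
    """
    pairs = [(src.split(), tgt.split()) for src, tgt in sentence_pairs]

    # Start with an equal score for every word pair seen in the same sentence pair.
    table = {}
    for src, tgt in pairs:
        for t_word in tgt:
            for s_word in src:
                table[(t_word, s_word)] = 1.0

    for _ in range(iterations):
        counts = defaultdict(float)
        totals = defaultdict(float)
        # E-step: share each target word among the source words in its
        # sentence, in proportion to the current scores.
        for src, tgt in pairs:
            for t_word in tgt:
                norm = sum(table[(t_word, s_word)] for s_word in src)
                for s_word in src:
                    share = table[(t_word, s_word)] / norm
                    counts[(t_word, s_word)] += share
                    totals[s_word] += share
        # M-step: turn the shared counts back into probabilities.
        table = {
            (t_word, s_word): count / totals[s_word]
            for (t_word, s_word), count in counts.items()
        }
    return table


def best_translation(table, source_word):
    """Return the most probable target word, or None for an unseen source word."""
    candidates = {t: p for (t, s), p in table.items() if s == source_word}
    return max(candidates, key=candidates.get) if candidates else None


if __name__ == "__main__":
    # Hypothetical toy corpus; a real system would use millions of pairs.
    toy_pairs = [
        ("das haus", "the house"),
        ("das buch", "the book"),
        ("ein buch", "a book"),
    ]
    word_table = train_word_table(toy_pairs)
    for word in ["das", "haus", "buch", "ein"]:
        print(word, "->", best_translation(word_table, word))
```

With only three sentence pairs the program has very little to go on, which is the point Och makes: the method itself is simple, and its quality depends on how much paired text it is fed.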
Archives
December 2014
Author: I'm Georgy S. Thomas, the chief SEO architect of SEOsamraat. The Searchable site will track interesting developments in the world of Search Engine Optimization, both in India as well as abroad.