Tuesday, August 10, 2004

Internet and need for new search engines

A brief article, but what I found interesting were the estimates of the size of the universe of documents. It reinforces the suggestion that content is easy to add; it's the tools for manipulating and searching that content that will be difficult to build.

"About 100 million different books have been published in history, Kahle said, citing estimates from professor Raj Reddy at Carnegie Mellon University. About 28 million sit in the Library of Congress. On average, a book can be condensed to a megabyte in Microsoft Word. Thus, the books in the Library of Congress could fit into a 28-terabyte storage system.

"For the cost of a house, you could have the Library of Congress," Reddy said, adding that mass book-scanning projects are currently under way in India and China.

"Universal access to all human knowledge is within our grasp. It could be one of the greatest achievements of all time."
-- Brewster Kahle, founder, Internet Archive

Only about 2 million to 3 million audio recordings--mostly music--have ever been published for public consumption. The Internet Archive has begun to store digitized recordings of concerts as well and has about 15,000 shows in its database to date. There are between 100,000 to 200,000 theatrical movies--half of them from India--in existence and about 20 terabytes of TV broadcasts a month. The Web grows by about 20 terabytes of compressed data a month as well. (One terabyte equals 1 trillion bytes.) Since 1984, about 50,000 software titles, including CD-ROMs, have emerged"

-from Resource Shelf
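
The storage arithmetic in the excerpt is easy to check. Here is a minimal sketch in Python, assuming the article's average of one megabyte per book and its decimal definition of a terabyte (1 trillion bytes). The function name books_to_terabytes is my own, made up for illustration.

```python
# Decimal units, matching the article's "one terabyte equals 1 trillion bytes".
BYTES_PER_MB = 10**6
BYTES_PER_TB = 10**12


def books_to_terabytes(book_count, mb_per_book=1):
    """Estimate storage in terabytes for book_count books.

    mb_per_book defaults to 1, the article's average for a book
    condensed into a Microsoft Word file (an estimate, not a measurement).
    """
    return book_count * mb_per_book * BYTES_PER_MB / BYTES_PER_TB


# About 28 million books sit in the Library of Congress.
library_of_congress_tb = books_to_terabytes(28_000_000)

# About 100 million different books have been published in history.
all_published_tb = books_to_terabytes(100_000_000)

print(f"Library of Congress: {library_of_congress_tb:.0f} TB")
print(f"All published books: {all_published_tb:.0f} TB")
```

Under the same assumption, every book ever published would come to roughly 100 terabytes, which is only about five months of Web growth at the 20 terabytes a month quoted above.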
