History of Search Engines - From Archie to Google

The first search engine, Archie, launched in 1990 and let people find files on FTP servers by name. It didn’t search web pages. It didn’t search content. It searched filenames, full stop. Google, the engine most of us now treat as a verb, wouldn’t show up for another eight years.

That gap matters. A lot happened in those eight years, and most of it rarely gets more than a footnote in accounts of search history. If you’ve ever wondered how we got from “type a filename and hope” to “ask a question and get an answer,” this is that story.

1990: Archie and the problem nobody had a name for

By 1990, the internet had thousands of files scattered across public FTP servers, but no central way to find them. You had to already know which server held the file you wanted.

Alan Emtage, a postgraduate student at McGill University in Montreal, built a fix: he wrote the original implementation of Archie in 1990. The name comes from “archive” with the v dropped. Archie crawled FTP servers on a schedule, gathered their file listings, and let users search those listings with a basic keyword match. It debuted on 10 September 1990 and is recognized as the first well-documented search engine, built to search the names of files on FTP servers.

It worked, and it caught on fast. By some accounts, Archie at its peak accounted for half of all internet traffic in Montreal. But it had a hard limit: it only indexed filenames, not the contents inside those files. If you didn’t already know roughly what the file was called, Archie couldn’t help you.
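
To make the filename-only limit concrete, here is a minimal Python sketch of an index in that spirit. It is not Archie’s actual code: the host names, the `build_listing_index` and `search_filenames` helpers, and the plain substring match are all assumptions made for illustration.

```python
def build_listing_index(listings):
    """Flatten per-server file listings into (filename, host, path) rows.

    `listings` maps a host name to the list of file paths it exposes.
    Only the final path component (the filename) is kept for matching.
    """
    index = []
    for host, paths in listings.items():
        for path in paths:
            filename = path.rsplit("/", 1)[-1]
            index.append((filename.lower(), host, path))
    return index


def search_filenames(index, keyword):
    """Return (host, path) pairs whose filename contains the keyword."""
    needle = keyword.lower()
    return [(host, path) for filename, host, path in index if needle in filename]


# Hypothetical listings, standing in for what a scheduled crawl would gather.
listings = {
    "ftp.alpha.example": ["/pub/tools/gzip-1.0.tar", "/pub/docs/README"],
    "ftp.beta.example": ["/mirror/GZIP.EXE"],
}
index = build_listing_index(listings)
print(search_filenames(index, "gzip"))         # matches by filename
print(search_filenames(index, "compression"))  # nothing: contents are never read
```

Searching for “compression” finds nothing even though both gzip files are compression tools, which is exactly the gap the later engines set out to close.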

1991 to 1993: Gopher, Veronica, and Jughead

The next jump came from a different protocol entirely. Gopher, built at the University of Minnesota in 1991, organized internet content into menus instead of raw file listings, closer to a table of contents than a card catalogue. Two search tools grew up around it, and their names are a deliberate nod to their predecessor: Veronica and Jughead are characters from the Archie comic books. Veronica offered keyword search across most Gopher menu titles network-wide, while Jughead searched menu information from individual Gopher servers.

Around the same time, the World Wide Web itself was still being indexed by hand. Before September 1993, a human-curated list of web servers, maintained by Tim Berners-Lee, was the closest thing the web had to a directory. That couldn’t scale, and everyone building on the early web knew it.

1993 to 1995: The first real web crawlers

1993 is when things shifted from “indexing files” to “indexing the web.” JumpStation, built by Jonathan Fletcher in late 1993, was the first tool to combine crawling, indexing, and keyword search into one system, the basic three-part shape every search engine has used since.

A wave of commercial engines followed within two years. WebCrawler introduced full-text search, letting people search for any word that appeared anywhere on a page, a feature so fundamental now that it’s strange to remember a time when it didn’t exist. Excite began in February 1993 as Architext, a project by six Stanford undergraduates who used statistical analysis of word relationships to make searches more useful, and it relaunched as Excite in 1995. Infoseek and AltaVista arrived in the same window, with AltaVista in particular standing out for how much of the web it could crawl and how fast it could return results, by mid-1990s standards.
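
Full-text search of the kind described above is usually built on an inverted index, a map from each word to the pages that contain it. The sketch below is a generic illustration under that assumption, not a reconstruction of WebCrawler or any other engine named here; the URLs, page text, and function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages standing in for crawled content.
pages = {
    "a.example/1": "Gopher menus and FTP archives",
    "b.example/2": "A list of public FTP servers",
}
index = build_inverted_index(pages)
print(search(index, "ftp archives"))
```

Because every word on the page is indexed, a query can match text that never appears in a title or filename.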

Yahoo took a different approach. Instead of crawling automatically, it paid humans to organize websites into categories, a directory model more like a library than a search box, and later layered search on top of that directory. That hybrid, part directory and part search, defined how a lot of people navigated the web through the mid-1990s, even as automated crawlers were quietly getting better in the background.

1998: PageRank and the Google breakthrough

By the late 1990s, most search engines ranked pages by how often a keyword showed up on them. That made the results easy to manipulate. Stuff a page with a keyword fifty times, and you’d often outrank a page that actually answered the question.
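
A toy version of frequency-based ranking shows why stuffing worked. This is a deliberately simplified sketch, assuming a ranker that scores pages only by raw keyword count; real engines of the period were more involved, and the page text and function names here are hypothetical.

```python
import re


def keyword_count(text, keyword):
    """Count how many times the keyword appears as a whole word."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words.count(keyword.lower())


def rank_by_keyword_count(pages, keyword):
    """Order matching URLs by keyword count, highest first (ties by URL)."""
    scored = [(keyword_count(text, keyword), url) for url, text in pages.items()]
    matches = [(count, url) for count, url in scored if count > 0]
    return [url for count, url in sorted(matches, key=lambda pair: (-pair[0], pair[1]))]


# Hypothetical pages: one useful answer, one stuffed with the keyword.
pages = {
    "helpful.example": "How to set up a Gopher server: install the gopher daemon first.",
    "stuffed.example": "gopher " * 50,
}
print(rank_by_keyword_count(pages, "gopher"))
```

The stuffed page lists first because nothing in the score asks whether the page is any good.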

Larry Page and Sergey Brin, two Stanford graduate students, attacked the problem from a different angle. Their research project, originally called BackRub, treated a link from one page to another as a vote of confidence. A page linked to by many other respected pages ranked higher, regardless of how many times it repeated its target keyword. They named the ranking algorithm PageRank, and it became the foundation Google launched on in 1998.
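
The “links as votes” idea can be sketched as a short iterative calculation. What follows is a minimal textbook-style illustration, not Google’s production system: the damping factor of 0.85, the fixed iteration count, the handling of pages with no outgoing links, and the tiny example graph are all assumptions chosen for the sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate a score for each page from the link graph alone.

    `links` maps each page to the list of pages it links to. A page
    passes its score to its targets in equal shares, so a link from a
    high-scoring page is worth more than one from a low-scoring page.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Score held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: three pages link to "c", and "c" links to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this graph “c” scores highest because it collects the most links, and “a” outranks “b” and “d” because its single inbound link comes from the highest-scoring page. No keyword count enters the calculation at all.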

It’s worth being precise about what PageRank actually was: a way to estimate a page’s authority based on its link structure, not a complete ranking system on its own. Google has layered hundreds of additional signals on top of it since, and the company has confirmed it now uses far more than links alone to decide what ranks. But the core insight, that the web’s own link structure carries useful signal about quality, is the idea that let Google’s results pull ahead of AltaVista’s and Excite’s within a few years.

A misunderstanding worth clearing up

People often assume “PageRank” and “Google’s ranking algorithm” are the same thing. They aren’t, and haven’t been for a long time. PageRank is one input among many in how Google decides what to show. Page experience, content relevance, and the cluster of signals Google groups under E-E-A-T (experience, expertise, authoritativeness, trustworthiness) all play a role too. Google has never published a full list of every ranking factor, and it treats the specifics as closely guarded. Anyone who tells you they know the exact formula is guessing.

1998 to 2023: Consolidation and refinement

After Google’s launch, the field narrowed fast. Smaller engines got bought, shut down, or shrank into niche players. AltaVista was eventually folded into Yahoo, which itself struggled to keep pace with Google’s crawling speed and relevance. The Open Directory Project (DMOZ), the last major human-curated directory at scale, shut down in 2017. Meanwhile, Google kept iterating: regular core updates, the Helpful Content Update, Core Web Vitals, and a long list of named and unnamed algorithm changes, each one adjusting how pages get ranked, not just whether they get found.

2023 to 2026: Search starts answering, not just listing

The most recent shift is the one still unfolding. Generative AI moved from a novelty into the search results page itself. Google rolled out AI Overviews, summaries generated directly above the traditional blue links, answering a question before the reader ever clicks anything. Featured snippets used to be the prize everyone chased. In 2026, they’re sharing space with AI Overviews, and that’s changed what “ranking well” actually looks like. ChatGPT Search and Perplexity built entirely new search experiences around conversational answers rather than ranked lists, and “agentic search,” where an AI assistant doesn’t just answer a question but takes an action on your behalf, is the direction most of the major players are now building toward.

Search market share figures move quickly in this environment, so treat any specific percentage as a snapshot rather than a permanent fact. What hasn’t changed is the underlying shape of the problem: someone has a question, and a system has to find, evaluate, and surface the best answer to it. Archie just answered “which server has a file with this name.” Google’s answering something a lot closer to “what do I actually need to know.”

FAQs About the History of Search Engines

1. What was the first search engine ever made?

Archie, built in 1990 by Alan Emtage at McGill University, is generally credited as the first search engine. It indexed filenames on FTP servers rather than web page content.

2. Was Yahoo a search engine or a directory?

Both, at different points. Yahoo started as a human-curated directory of websites organized by category, then layered automated search on top as the web grew too large to catalogue by hand.

3. When did Google overtake other search engines?

Google launched in 1998 with PageRank as its core ranking innovation and pulled ahead of AltaVista, Excite, and Infoseek within a few years, largely on the strength of more relevant results.

4. Is PageRank still used today?

Yes, in updated form. PageRank remains one signal among many that Google’s ranking systems use, alongside page experience, content relevance, and E-E-A-T signals.