Search engine databases are selected and built by automated computer programs called spiders (also known as crawlers or robots).
A search engine maintains the following processes in near real time:
->Web crawling
->Indexing
->Searching
1. Crawling - Crawling is the process by which spider programs visit web pages, read their content, and follow the links they contain in order to discover further pages. Everything the spider fetches is handed on to the indexing step.
2. Indexing - Indexing is the process of taking all of the data gathered during crawling and placing it in the search engine's databases. Imagine making a list of every book in a library along with its author and number of pages: going through each book is the crawl, and writing the list is the index. All of this data is stored in huge data centres holding thousands of petabytes of drives. In short, during indexing each crawled page is analyzed and stored in the system databases (a toy sketch of such an index follows this list).
3. Searching - In this step a user query fetches a list of relevant pages. When you type something into the search bar, the search engine attempts to display the most relevant documents it can find that match your query. Searching is the main area in which search engines differentiate themselves: for example, some work with plain keywords, some let you ask a question, and some include advanced features such as keyword proximity.
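As a rough, hypothetical sketch (not any particular engine's implementation), the index built in step 2 can be pictured as an inverted index, a mapping from each word to the documents that contain it, and step 3 as intersecting the document sets for the query's keywords. The documents and IDs below are made up for illustration.

```python
# A toy inverted index: map each word to the set of document IDs that
# contain it (step 2), then answer a query by intersecting the sets for
# its keywords (step 3). Real engines also store positions, link data
# and ranking signals across huge distributed data centres.
from collections import defaultdict

documents = {
    1: "web crawling discovers pages by following links",
    2: "indexing stores crawled pages in the search engine database",
    3: "searching matches a user query against the stored index",
}

# Step 2: indexing - record which documents each word appears in.
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        inverted_index[word].add(doc_id)

# Step 3: searching - return the documents that contain every query keyword.
def search(query):
    result = None
    for word in query.lower().split():
        postings = inverted_index.get(word, set())
        result = postings if result is None else result & postings
    return result or set()

print(search("crawling pages"))   # {1}
print(search("search database"))  # {2}
```

How the matching documents are then ranked by relevance is exactly where real engines differentiate themselves, as noted in step 3.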
In practice, search engines find pages for potential inclusion by following the links on pages that are already in their database during crawling. A search engine spider can only find a web page if some other page links to it. Brand-new pages that are not yet linked from anywhere can still be included by submitting them to the search engine directly. Indexing then identifies the text and the links to other pages and stores them in the search engine database, so that the database can be searched by keyword during searching. A minimal link-following crawl is sketched below.
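The following is a minimal sketch of that link-following idea using only the Python standard library. Real crawlers add politeness delays, duplicate detection, robots.txt checks, and distributed queues; the seed URL here is just a placeholder.

```python
# A minimal link-following crawl: fetch a page, extract its links,
# and queue the newly discovered pages for later visits.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=10):
    queue, seen = [seed_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue                      # unreachable pages are skipped
        parser = LinkExtractor()
        parser.feed(html)
        # Newly discovered links go back on the queue for later visits.
        queue.extend(urljoin(url, link) for link in parser.links)
    return seen

print(crawl("https://example.com"))       # placeholder seed URL
```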
Some types of pages and links are excluded by most search engines as a matter of policy, and others are excluded because search engine spiders cannot find them. Such excluded pages are referred to as the Invisible Web, and they do not appear in normal search engine results. The Invisible Web is estimated to be two or three times bigger than the visible web.
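One common mechanism behind policy-based exclusion (an assumption on my part, not something the text above spells out) is the robots.txt convention: a well-behaved spider checks whether a site allows a URL to be fetched before requesting it. The site URL and user-agent string below are illustrative placeholders.

```python
# Sketch of a robots.txt check: a polite crawler asks whether it may
# fetch a URL before crawling it. "MySpider" and the URLs are made up.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser("https://example.com/robots.txt")
robots.read()

if robots.can_fetch("MySpider", "https://example.com/private/page.html"):
    print("Allowed to crawl this page")
else:
    print("Excluded by the site's robots.txt policy")
```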