Group: comp.lang.python (thread #86068)

How to design a search engine in Python?

Started by: subhabangalore@gmail.com
First post: 2015-02-21 12:51 -0800
Last post: 2015-02-22 10:14 -0800
Articles: 7 (4 participants)


Contents

  How to design a search engine in Python? subhabangalore@gmail.com - 2015-02-21 12:51 -0800
    Re: How to design a search engine in Python? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-22 15:42 +1100
      Re: How to design a search engine in Python? subhabangalore@gmail.com - 2015-02-21 21:02 -0800
        Re: How to design a search engine in Python? Denis McMahon <denismfmcmahon@gmail.com> - 2015-02-22 05:37 +0000
          Re: How to design a search engine in Python? subhabangalore@gmail.com - 2015-02-21 22:07 -0800
            Re: How to design a search engine in Python? Laura Creighton <lac@openend.se> - 2015-02-22 10:12 +0100
              Re: How to design a search engine in Python? subhabangalore@gmail.com - 2015-02-22 10:14 -0800

#86068 — How to design a search engine in Python?

From: subhabangalore@gmail.com
Date: 2015-02-21 12:51 -0800
Subject: How to design a search engine in Python?
Message-ID: <9701e262-7c29-49b6-bb66-8138f484bbea@googlegroups.com>

Dear Group, 

I am trying to build a search engine in Python. 

To do this, I have read tutorials and working methodologies from the web and from books such as the Stanford IR book [http://www-nlp.stanford.edu/IR-book/]. I know how to design a crawler, I know PostgreSQL, and I am fluent with PageRank, TF-IDF, Zipf's law, etc.
I have also come across Whoosh [https://pypi.python.org/pypi/Whoosh/].

But I am looking for a complete tutorial on how to implement a whole search engine. I would be grateful if anybody could direct me to one.

I have heard that there are good source code examples and prototypes, but I have not been able to find them.

Apologies if this question is off-topic for this group. I posted here because this is a group of Python bigwigs.

Regards,
Subhabrata. 


#86078

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2015-02-22 15:42 +1100
Message-ID: <54e95e34$0$13006$c3e8da3$5496439d@news.astraweb.com>
In reply to: #86068

subhabangalore@gmail.com wrote:

> Dear Group,
> 
> I am trying to build a search engine in Python.

How to design a search engine in Python?

First, design a search engine.

Then, write Python code to implement that search engine.


> To do this, I have read tutorials and working methodologies from web and
> books like Stanford IR book [ http://www-nlp.stanford.edu/IR-book/]. I
> know how to design a crawler, I know PostgresSql, I am fluent with
> PageRank, TF-IDF, Zipf's law, etc. I came to know of
> Whoosh[https://pypi.python.org/pypi/Whoosh/]

How does your search engine work? What does it do?

You MUST be able to describe the workings of your search engine in English,
or the natural language of your choice. Write out the steps that it must
take, the tasks that it must perform. This is your algorithm. Without an
algorithm, how do you expect to write code? What will the code do?

Once you have designed your search engine algorithm, then *and only then*
should you start to write code to implement that algorithm.
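
As one illustration of that order of work, here is a minimal sketch in which the steps are written out in plain English first (as comments) and only then implemented. The three steps, the function names (`tokenize`, `build_index`, `search`) and the sample documents are all assumptions made for illustration; nobody in this thread proposed this particular design.

```python
import re
from collections import defaultdict

# Algorithm, in plain English (an assumed, deliberately tiny design):
# 1. Split every document into lowercase words.
# 2. Record, for each word, which documents contain it.
# 3. To answer a query, return the documents containing every query word.

def tokenize(text):
    """Step 1: lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Step 2: map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Step 3: intersect the document sets of all query words."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

docs = {  # hypothetical sample data
    "a": "Python is a programming language",
    "b": "A search engine written in Python",
    "c": "Cooking with a slow cooker",
}
index = build_index(docs)
print(sorted(search(index, "python search")))
```

Each function corresponds to exactly one sentence of the written-out algorithm, which is the point: if a step cannot be stated in a sentence, it is probably not ready to be coded.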




-- 
Steven


#86079

From: subhabangalore@gmail.com
Date: 2015-02-21 21:02 -0800
Message-ID: <be1e90e1-ea64-4a93-ac7a-1bf6a823dd83@googlegroups.com>
In reply to: #86078

On Sunday, February 22, 2015 at 10:12:39 AM UTC+5:30, Steven D'Aprano wrote:
> wrote:
> 
> > Dear Group,
> > 
> > I am trying to build a search engine in Python.
> 
> How to design a search engine in Python?
> 
> First, design a search engine.
> 
> Then, write Python code to implement that search engine.
> 
> 
> > To do this, I have read tutorials and working methodologies from web and
> > books like Stanford IR book [ http://www-nlp.stanford.edu/IR-book/]. I
> > know how to design a crawler, I know PostgresSql, I am fluent with
> > PageRank, TF-IDF, Zipf's law, etc. I came to know of
> > Whoosh[https://pypi.python.org/pypi/Whoosh/]
> 
> How does your search engine work? What does it do?
> 
> You MUST be able to describe the workings of your search engine in English,
> or the natural language of your choice. Write out the steps that it must
> take, the tasks that it must perform. This is your algorithm. Without an
> algorithm, how do you expect to write code? What will the code do?
> 
> Once you have designed your search engine algorithm, then *and only then*
> should you start to write code to implement that algorithm.
> 
> 
> 
> 
> -- 
> Steven

Dear Sir,

Thank you for your suggestion. But I was looking for a short tutorial on the algorithm of the whole engine. I would then build the individual modules and integrate them. I found some material on Google and YouTube, but I wanted to consult you because I do not know whether it is reliable. I am trying your approach; let me see how far I get. There are many search algorithms in the popular data structures books, so that is not the issue. What I am thinking about is how a search engine as a whole is put together.

Regards,
Subhabrata.


#86080

From: Denis McMahon <denismfmcmahon@gmail.com>
Date: 2015-02-22 05:37 +0000
Message-ID: <mcbpvf$rp4$3@dont-email.me>
In reply to: #86079

On Sat, 21 Feb 2015 21:02:34 -0800, subhabangalore wrote:

> Thank you for your suggestion. But I was looking for a small tutorial of
> algorithm of the whole engine. I would try to check it build individual
> modules and integrate them. I was getting some in google and youtube,
> but I tried to consult you as I do not know whether they would be fine.
> I am trying your way, let me see how much I go. There are so many search
> algorithms in our popular data structure books, that is not an issue but
> how a search engine is getting done, I am thinking bit on that.

Presumably a search engine is simply a database of keyword -> result, 
possibly with some scoring factor.
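
A minimal sketch of that "keyword -> result, with a score" idea, using an in-memory dictionary in place of a real database. The score here is just the number of times the keyword occurs on the page, which is an assumed placeholder; the names `build_scored_index` and `lookup` and the sample pages are hypothetical.

```python
from collections import Counter, defaultdict

def build_scored_index(pages):
    """Map each keyword to {page name: score}; here the score is simply
    how many times the keyword occurs on the page (an assumed choice)."""
    index = defaultdict(dict)
    for name, text in pages.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][name] = count
    return index

def lookup(index, keyword):
    """Return (page, score) pairs for one keyword, best score first."""
    hits = index.get(keyword.lower(), {})
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))

pages = {  # hypothetical sample pages
    "p1": "python python tutorial",
    "p2": "python search engine",
    "p3": "gardening tips",
}
index = build_scored_index(pages)
print(lookup(index, "python"))
```

Ties are broken by page name only so that the ordering is deterministic; a real engine would swap the raw count for something like TF-IDF.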

Calculating the scoring factor is going to be fun.

Then of course result pages might have scoring factors too. What about a 
search with multiple keywords? Some result pages might match more than 
one keyword, so you might add their scores for each keyword together to 
get the ranking of that page for that enquiry.

But then pages with lots and lots of different keywords might be low 
scoring, because searchers are looking for content, not pages of keywords.
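
The two ideas above (summing per-keyword scores, and marking down pages that are mostly lists of keywords) might be combined as in the sketch below. Dividing each keyword's count by the number of distinct words on the page is only one assumed way to apply such a penalty; the function name `rank` and the sample pages are hypothetical.

```python
from collections import Counter

def rank(pages, query):
    """Score each page by summing per-keyword scores.

    The per-keyword score is the keyword's count divided by the number of
    distinct words on the page, so a page stuffed with many different
    keywords earns less per match (one assumed way to apply the penalty).
    """
    keywords = query.lower().split()
    scores = {}
    for name, text in pages.items():
        counts = Counter(text.lower().split())
        if not counts:
            continue  # skip empty pages to avoid dividing by zero
        total = sum(counts[k] / len(counts) for k in keywords)
        if total > 0:
            scores[name] = total
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

pages = {  # hypothetical sample pages
    "focused": "python search engine python search",
    "stuffed": "python search engine cars loans pills hotels games news",
}
for name, score in rank(pages, "python search"):
    print(name, round(score, 3))
```

Both pages match both keywords, but the page with nine distinct words scores far lower than the page with three, so the focused page is ranked first.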

Finally, what special, unique feature is your search engine going to have 
that makes it better than all the existing ones?

-- 
Denis McMahon, denismfmcmahon@gmail.com


#86081

From: subhabangalore@gmail.com
Date: 2015-02-21 22:07 -0800
Message-ID: <0914bd64-6a5b-4014-af6e-f41dcb0c4fba@googlegroups.com>
In reply to: #86080

On Sunday, February 22, 2015 at 11:08:47 AM UTC+5:30, Denis McMahon wrote:
> On Sat, 21 Feb 2015 21:02:34 -0800, subhabangalore wrote:
> 
> > Thank you for your suggestion. But I was looking for a small tutorial of
> > algorithm of the whole engine. I would try to check it build individual
> > modules and integrate them. I was getting some in google and youtube,
> > but I tried to consult you as I do not know whether they would be fine.
> > I am trying your way, let me see how much I go. There are so many search
> > algorithms in our popular data structure books, that is not an issue but
> > how a search engine is getting done, I am thinking bit on that.
> 
> Presumably a search engine is simply a database of keyword -> result, 
> possibly with some scoring factor.
> 
> Calculating scoring factor is going to be fun.
> 
> Then of course result pages might have scoring factors too. What about a 
> search with multiple keywords. Some result pages might match more than 
> one keyword, so you might add their score for each keyword together to 
> get the ranking in that enquiry for that page.
> 
> But then pages with lots and lots of different keywords might be low 
> scoring, because searchers are looking for content, not pages of keywords.
> 
> Finally, What special, unique feature is your search engine going to have 
> that makes it better than all the existing ones?
> 
> -- 
> Denis McMahon,
Dear Sir,

Thank you for your kind suggestion. Let me go through the points one by one.
My special feature is, broadly, semantic search, but I am trying to build
an ordinary search engine first and then move on to the semantic part. I feel that would give me a solid background for working on the problem.

Regards,
Subhabrata. 


#86090

From: Laura Creighton <lac@openend.se>
Date: 2015-02-22 10:12 +0100
Message-ID: <mailman.18991.1424596357.18130.python-list@python.org>
In reply to: #86081

In a message of Sat, 21 Feb 2015 22:07:30 -0800, subhabangalore@gmail.com writes:
>Dear Sir,
>
>Thank you for your kind suggestion. Let me traverse one by one. 
>My special feature is generally Semantic Search, but I am trying to build
>a search engine first and then go for semantic I feel that would give me a solid background to work around the problem. 
>
>Regards,
>Subhabrata. 

You may find the API docs surrounding rdelbru.github.io/SIREn/
of interest then.

Laura Creighton


#86141

From: subhabangalore@gmail.com
Date: 2015-02-22 10:14 -0800
Message-ID: <b63cc7b5-b72c-4b95-bc7f-320a8bfca2a0@googlegroups.com>
In reply to: #86090

On Sunday, February 22, 2015 at 2:42:48 PM UTC+5:30, Laura Creighton wrote:
> In a message of Sat, 21 Feb 2015 22:07:30 -0800,  write
> >Dear Sir,
> >
> >Thank you for your kind suggestion. Let me traverse one by one. 
> >My special feature is generally Semantic Search, but I am trying to build
> >a search engine first and then go for semantic I feel that would give me a solid background to work around the problem. 
> >
> >Regards,
> >Subhabrata. 
> 
> You may find the API docs surrounding rdelbru.github.io/SIREn/
> of interest then.
> 
> Laura Creighton

Dear Madam,

Thank you for your kind help. I will certainly check it out.

Regards,
Subhabrata. 
