Recently I have been playing a lot with the Sphinx full-text search engine, in particular with regard to indexing the Geograph archive. (A bit of background: Geograph has a fairly good homegrown site text search, but it's not full text, so many queries will not return that many results. It is also based on MySQL ‘LIKE’, so it is pretty slow, which makes a full-text search the next level.) And I have to say I am liking it a LOT; in fact I would say I am a fanboy.
To that end I have created a whole bunch of demos based around the flexible indexing it provides. Location-based searching is even possible!
The most basic is a simple text-based search. One point of note: there is no pagination; simply add more keywords (including negative ones) or grid references to refine the selection.
Next is an ‘auto-complete’ style image finder. It is designed to find ‘that image’ quickly, in a similar way to the above, but it shows the results in an autocomplete box immediately!
A refinement of the first is search with location, which allows you to limit the search to near a particular Grid Reference. This is particularly cool in that there is a Sphinx-powered auto-complete for place names for finding GRs (a real auto-complete, not like the search in the previous demo, which only pretends to be one).
This is all building towards the Illustrator demo, which attempts to find relevant images from a block of text. The idea is that a (geolocated) news article, walking route, place description and such could automatically have relevant(ish) images shown. (an example demo here)
(A few more ‘toys’ can be found in GeographTools!) Try them out and let me know how you get on…
I have learnt a lot about search indexing from this, including how to perform location searches in the index (I know the latest versions of Sphinx include a lat/long based geosearch, but I think this r-tree method in text has better scalability), and how to create an autocomplete function with Sphinx. If anybody is interested in these, they will eventually make it into the Geograph codebase, or let me know and I might write a separate post.
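For anyone curious what a location search inside a text index might look like, here is a minimal Python sketch of one possible approach. It is not necessarily the r-tree method used in the demos, just an illustration of the general idea: each document gets extra keywords naming the grid cells that contain it at several scales, and a query ORs together the cells that overlap the search area. The cell sizes, the token format and all function names are assumptions made up for this example, and the token format would have to suit the index's character settings.

```python
# Cell sizes in metres, coarse to fine (hypothetical choice for illustration).
CELL_SIZES = (100_000, 10_000, 1_000)


def cell_token(easting, northing, size):
    """Return a keyword naming the square cell of `size` metres containing the point."""
    return f"c{size}_{easting // size}_{northing // size}"


def index_tokens(easting, northing):
    """Keywords to store alongside a document's text at index time."""
    return [cell_token(easting, northing, s) for s in CELL_SIZES]


def query_tokens(easting, northing, radius):
    """Keywords for every cell overlapping the bounding box of the search circle.

    Picks the finest cell size that is at least as large as the radius,
    so the box never spans more than 3 x 3 cells (when such a size exists).
    """
    candidates = [s for s in CELL_SIZES if s >= radius]
    size = min(candidates) if candidates else max(CELL_SIZES)
    x_lo, x_hi = (easting - radius) // size, (easting + radius) // size
    y_lo, y_hi = (northing - radius) // size, (northing + radius) // size
    return [
        f"c{size}_{x}_{y}"
        for x in range(x_lo, x_hi + 1)
        for y in range(y_lo, y_hi + 1)
    ]


def location_query(keywords, easting, northing, radius):
    """Combine free-text keywords with an OR-group of cell keywords."""
    cells = " | ".join(query_tokens(easting, northing, radius))
    return f"{keywords} ({cells})"


if __name__ == "__main__":
    print(index_tokens(431234, 512345))
    print(location_query("bridge", 431234, 512345, 500))
```

The cell match is only a coarse filter: it returns everything in the overlapping squares, so results would still need an exact distance check (and sorting) afterwards. The appeal is that the location part of the query becomes ordinary keyword matching, which is what a full-text engine is good at.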
Interestingly (huh?), it was creating an ‘autocomplete’ textbox for finding trigpoints (which included the forerunner to the Sphinx location search, but implemented in MySQL) that inspired me to go to the trouble of figuring out how to install Sphinx on Linux, which I have been interested in for a long time! That textbox is now Sphinx-powered for text searches too.
As a side note, I have now reached the ‘linux sysadmin’ level where I can compile it on Geograph's servers, yay! But I do worry for the sanity of others due to this (a little knowledge is a dangerous thing!)
Tags: auto complete, autocomplete, full text search, Geograph, image search, indexing, Search, Sphinx, text search, web service
Crud, was this really 5 months ago? I still have not got round to deploying on the Geograph servers. Well, Sphinx runs on a single machine there, but that's not a full system (redundant, automatic, auto-updating as new content arrives, etc.)