Monday, February 20, 2012

Large FullText tables - slow queries

Hi,

I currently have a large table (35 million rows, over 80GB). I have one varchar(max) column on the table that is used in the fulltext index.

To query the complete index is fast, for example:

SELECT 'ipod', COUNT(*)

FROM CONTAINSTABLE(MyDB.dbo.Contents, [Body], 'ipod') CT

This took 70 seconds (which I can live with). However, I seldom run queries like this, most are more like:

SELECT 'ipod', COUNT(*)

FROM CONTAINSTABLE(MyDB.dbo.Contents, [Body], 'ipod') CT

JOIN Pages ITP ON ITP.PageID = CT.[Key]

JOIN Feeds ITF ON ITP.IPID = ITF.IPID

JOIN Buyers ITB ON ITB.IBID = ITF.IBID

WHERE ITB.ID IN (1342,246)

These queries are much slower (this example took 17 minutes). I understand that FT searches the index and returns all rows that match the query to SQL. SQL then performs the joins and counts only the correct results. (Correct me if I'm wrong here).

One solution I've seen to this to put data or "tags" into the FT column - so my Body column would become something like:

'{ID:1342}' + [Body]

That sounds like a very good idea. I could then change the 2nd query above to be:

SELECT 'ipod', COUNT(*)

FROM CONTAINSTABLE(MyDB.dbo.Contents, [Body], '("ID:1342" OR "ID:246") AND "ipod"') CT

That all works well until I want to select 1000 different ID's because the FT query will become very long and complex. Also I'm only including one column (ID) in this example - but I have about 7 or 8 columns that I would need to include in these "tags". Quering multiple columns become very complex quickly and no doubt I will reach a query limit at somepoint.

If anyone has any other suggestions to the above I'd love to hear them. Another thought I'm having is to partition the table. I can find very little online about how FT behaves on partitioned tables - I fear it behaves exactly the same, what I'd like to think is that I could partition the table on an ID say 100 per partition or something, and then fulltext would only search the relevant partitions. If it behaves like this it may work. If no-one knows then I'll give it ago, but this will take me a while due to the table size - so I'm hoping one of you clever lot know!

Many thanks for any advice.

Simon

I should have said - I'm running 2005 Enterprise SP1 (can upgrade to SP2 though).|||

Hi I am not sure if this can help:

35 Mio. I am looking forward to encounter more soon.

try:

Make sure to use the noiseword concept of FT, get rid of the noice first.

Check how expensiv the joins are. Try with and without, maybe do not join other tables at this stage. If you show 10 results per page, you have joined 9990 for nothing.

Add "Categories". Searching in a certain category will bring less hits. (Add Category to keyword)

Instead of querying n-fields, query only one field. -> Category + Title + Text + other keywords -> Keywords. Query Keywords only

Try to Query also against other fields (not fulltext indexed)

Limit the number of hits in datatable (top n_dt ) and in the fulltext (top n_ft*) this should make it very, very, very fast!

.

*top n_ft - You find it here in this forum how to limit the text-hits. Perhaps I find it again and come back!

Of course database needs field AutoId as Clusterd index.

8GB Memory / 64-bit version of SQL

Disks RPM / Controler Cash ? Raid 10? / SAN / RAM Disk (SSD)

How many keywords per record do you have, how long are they in average?

How big is the Textfile?

What is the CPU?

Thomas

No comments:

Post a Comment