Friday, February 24, 2012

large queries: optimal settings (sqlserver, jtds)

Hi
I'm using java servlets with jdbc to query MS sqlserver.
Some of the queries can be large i.e. querying a table of 1 billion
rows and resultsets of 1 million rows. The queries
are all read only looping through the resultset up to a maximum
of 5 million rows. Sometimes the query is performed using
joined tables (select * from a,b where a.ib=b.id). A full
table scan on our hardware takes 20-30 minutes.
I'm using the sqlserver supplied driver and more recently trying the
jtds one.
So what are the best settings I should use for my connection
and statement objects.
Specifics:
i) When doing a statment.execute() does this only
return once the query has finished or are results passed
to the client on the fly.
ii) selectMethod - this option only seems to apply to the
sqlserver supplied driver. Default is "direct" but the docs
say this caches the enire result set to memory, so for large
queries use "cursor". However using direct I don't seem
to witness more memory being used up than when using cursor.
Speed performance seems about the same if the setFetchSize is not
set too small.
iii) Is there a way of roughly knowing waht setFetchSize to use.
iv) I want to use the statement.setQueryTimeout to halt long running
queries and put them into a queue. The queryTimeout only
seems to have an effect if the statement.execute has not
yet returned in the time set but this goes back to i). If the execute
returns has the query completely finished and are all the results
stored somewhere or is the query still running. Logic tells me the
former but I've run a query that takes several minutes but results
start appearing after 30 secs or so.
Thanks
Mike
Mikee wrote:

> Hi
> I'm using java servlets with jdbc to query MS sqlserver.
> Some of the queries can be large i.e. querying a table of 1 billion
> rows and resultsets of 1 million rows. The queries
> are all read only looping through the resultset up to a maximum
> of 5 million rows. Sometimes the query is performed using
> joined tables (select * from a,b where a.ib=b.id). A full
> table scan on our hardware takes 20-30 minutes.
Hi. I'd have to say that you should probably spend whatever resources
you have to redesign this. It's not right for a servlet to select millions
of rows of data out of the DBMS. Operate on raw data where it is, in the
DBMS. Build your saw mill where the trees are. Only bring out data that a
user is going to need to look at. I helped change a payroll application
from external processing that pulled all the raw data out and processing
it, to moving the same algorithms to DBMS stored procedures. This changed
the system from requiring a 16-CPU HP box and taking 8 hours to run,
to taking under 50 minutes.
Joe Weinstein at BEA

> I'm using the sqlserver supplied driver and more recently trying the
> jtds one.
> So what are the best settings I should use for my connection
> and statement objects.
> Specifics:
> i) When doing a statment.execute() does this only
> return once the query has finished or are results passed
> to the client on the fly.
> ii) selectMethod - this option only seems to apply to the
> sqlserver supplied driver. Default is "direct" but the docs
> say this caches the enire result set to memory, so for large
> queries use "cursor". However using direct I don't seem
> to witness more memory being used up than when using cursor.
> Speed performance seems about the same if the setFetchSize is not
> set too small.
> iii) Is there a way of roughly knowing waht setFetchSize to use.
> iv) I want to use the statement.setQueryTimeout to halt long running
> queries and put them into a queue. The queryTimeout only
> seems to have an effect if the statement.execute has not
> yet returned in the time set but this goes back to i). If the execute
> returns has the query completely finished and are all the results
> stored somewhere or is the query still running. Logic tells me the
> former but I've run a query that takes several minutes but results
> start appearing after 30 secs or so.
> Thanks
> Mike
>

No comments:

Post a Comment