Wednesday, March 7, 2012

Large Tables, Inserts, and Selecting Data

Hello,
I'd like to point out that, obviously, I'm not a Database Admin - but I have
read a fair amount on the topic and examined several case studies similar to
this issue.
Is there a recommended size limit for a table in SQL Server 2000? We have a
system that gathers test data in near real time and inserts it into SQL
Server, and our admin indicated that after 'a while' inserts were slowing
down as the table got bigger. If the table were properly normalized and
indexed, would the size of the table even matter?
The solution currently implemented is that a new table is created every
hour. Each of these tables has the same schema, but represents data from a
specific hour of the day. In other words, you end up with 24 tables per day.
Obviously there are other tables now involved that have to track all of
this, and the SQL necessary to select the data appears needlessly complex.
So my question (along with the one above) is quite simple. Is this a common
practice? Are there additional case studies, web sites, or whatnot that I can
review before participating in the next design meeting?
Thanks for reading!
Chris|||>> Is there a recommended size limit for a table in SQL Server
No. The physical resources at your disposal determine the table size.

>> If the table were properly normalized and indexed, would the size of the table even matter?
In general, no.
That sounds like a poor approach. Other than increasing your maintenance
overhead, proneness to error, and overall complexity, it seems to offer
nothing beneficial.
Some folks do it, assuming it is somehow simpler to use several smaller
tables instead of a single one. Such systems are often poorly designed
without any consideration to good logical principles and often fall apart
due to increased complexity and lack of scalability options.
The general recommended practice is to use a single table with an attribute
to distinguish the individual sets of test data gathered over each hour,
perhaps using a temporal datatype.
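Just to make that concrete - and this is only a sketch, with made-up table
and column names rather than anything from your system - the single-table
design might look like this in SQL Server 2000:

-- Hypothetical schema; only the temporal column is essential to the idea.
CREATE TABLE TestData (
    SampleID    INT IDENTITY (1, 1) NOT NULL,
    SampleTime  DATETIME            NOT NULL,  -- when the reading was taken
    SensorID    INT                 NOT NULL,  -- placeholder descriptive column
    SampleValue FLOAT               NOT NULL,  -- placeholder measured value
    CONSTRAINT PK_TestData PRIMARY KEY NONCLUSTERED (SampleID)
)

-- Clustering on the temporal attribute keeps inserts appending at the end of
-- the table and lets an hourly report read one contiguous range.
CREATE CLUSTERED INDEX IX_TestData_SampleTime ON TestData (SampleTime)

-- One hour of data is then a simple range query instead of a per-hour table:
SELECT SampleTime, SensorID, SampleValue
FROM TestData
WHERE SampleTime >= '20060307 13:00' AND SampleTime < '20060307 14:00'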
Anith|||From a client (consuming) standpoint, your comments make a lot of sense to me,
as I feel the same; however, my lack of qualifications versus the DB admin is
proving difficult.
Very good suggestion on attributing the data - I'll be sure to bring this up at
our next meeting.
Are there any other suggestions out there? I just came from another meeting
with the DB folks, and they are adamant that when the table becomes 'big',
inserts slow down... that does not make sense to me. The topic then turned to
indexing and how indexing was going to kill the insert process as well.
I think I'm missing something... thank you for the comments, I'll be looking
forward to more.
Chris
"Anith Sen" wrote:

> No. The physical resources at your disposal determine the table size
>
> In general, no.
>
> That sounds like a poor approach. Other than increasing your maintenance
> overhead, proneness to error, and overall complexity, it seems to offer
> nothing beneficial.
>
> Some folks do it, assuming it is somehow simpler to use several smaller
> tables instead of a single one. Such systems are often poorly designed
> without any consideration to good logical principles and often fall apart
> due to increased complexity and lack of scalability options.
> The general recommended practice is to use a single table with an attribute
> to distinguish the individual sets of test data gathered over each hour,
> perhaps using a temporal datatype.
> --
> Anith
>
>|||I'd have thought this discussion would have generated more interest.
Is there a group better suited to this question?
Thanks,
Chris
"Chris Keller" wrote:
> From a client (consuming) standpoint, your comments make a lot of sense to me,
> as I feel the same; however, my lack of qualifications versus the DB admin is
> proving difficult.
> Very good suggestion on attributing the data - I'll be sure to bring this up at
> our next meeting.
> Are there any other suggestions out there? I just came from another meeting
> with the DB folks, and they are adamant that when the table becomes 'big',
> inserts slow down... that does not make sense to me. The topic then turned to
> indexing and how indexing was going to kill the insert process as well.
> I think I'm missing something... thank you for the comments, I'll be looking
> forward to more.
> Chris
>
> "Anith Sen" wrote:
>|||> >> The solution currently implemented is that a new table is created every
> That sounds like a poor approach. Other than increasing your maintenance
> overhead, proneness to error, and overall complexity, it seems to offer
> nothing beneficial.
>
> Some folks do it, assuming it is somehow simpler to use several smaller
> tables instead of a single one. Such systems are often poorly designed
> without any consideration to good logical principles and often fall apart
> due to increased complexity and lack of scalability options.
There are, of course, exceptions to this; you can get a performance
boost from using a properly-indexed partitioned design in an OLAP
environment if your data influx is very large. We have a single
partitioned view containing 90 days' worth of data (split into daily
tables), containing nearly 60 million rows of data. By using a
partitioned view, index seeks are very efficient.
This approach is very efficient on smaller servers; if I had quad
processors and high-speed drives and lots of RAM, I probably could have
gotten away with a single table.
I would NOT recommend this approach unless you have lots of data to
deal with, but I don't think that it's necessarily a bad design as
suggested in the previous quote.
Stu|||>> I would NOT recommend this approach unless you have lots of data to deal
Regarding exceptions, I agree. Logical design is generally done with data
integrity as the primary goal. As you said, exceptions do exist and are
mostly made for specific purposes on a case-by-case basis.
For instance, partitioned tables/views are often implemented for performance
reasons; constraints must be put in place to prevent data overlaps, and
while multiple member tables are maintained by the system, they logically
appear as a single table to the external user.
Anith|||You mention that indexed search is very fast for your solution - can you
define fast?
What about insert performance on indexed tables? Do you see an improvement
by having your data spread across multiple tables? That is a big question
that I have; specifically, the argument is that with a large indexed table,
inserts get slower in relation to its size. I don't see why that would
be the case - but perhaps I'm missing something.
Chris
"Stu" wrote:

> There are, of course, exceptions to this; you can get a performance
> boost from using a properly-indexed partitioned design in an OLAP
> environment if your data influx is very large. We have a single
> partitioned view containing 90 days' worth of data (split into daily
> tables), containing nearly 60 million rows of data. By using a
> partitioned view, index seeks are very efficient.
> This approach is very efficient on smaller servers; if I had quad
> processors and high-speed drives and lots of RAM, I probably could have
> gotten away with a single table.
> I would NOT recommend this approach unless you have lots of data to
> deal with, but I don't think that it's necessarily a bad design as
> suggested in the previous quote.
> Stu
>|||Our end users analyze firewall events, which can range from 1 event per
incident to hundreds of thousands of events per incident. On average,
they're pulling back 5000 records in milliseconds; note that's 5000
records out of 60,000,000.
INSERTs will slow down in relation to size if your database is having
to grow to accommodate them, especially if your indexes are physically
located on the same drive as your data. However, partitioning your
data is typically more useful during data SELECTion rather than
INSERTing, because you're providing the optimizer hints by which to
narrow down the search.
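One thing you can do on the growth side - purely as a sketch, with invented
database and file names - is to pre-size the files and use a fixed growth
increment, so the engine isn't stopping to autogrow in the middle of your
inserts:

-- 'TestDataDB' and its logical file names are assumptions for illustration.
ALTER DATABASE TestDataDB
MODIFY FILE (NAME = TestDataDB_Data, SIZE = 20GB, FILEGROWTH = 1GB)

ALTER DATABASE TestDataDB
MODIFY FILE (NAME = TestDataDB_Log, SIZE = 4GB, FILEGROWTH = 512MB)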
We also use the partitioning to archive data; on a daily basis, I spin
off the data older than 90 days to an archive db (which is partitioned
on a monthly basis). When a month of data is "closed", we back that
data up to a portable hard drive, which then goes to storage. That
way, we can retrieve data if we need it, but it's not sucking up disk
space on my servers.
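For what it's worth, the nightly archive step in a scheme like that looks
roughly like the following (every object name below is invented for
illustration, not our actual code):

-- Copy the day that just fell outside the 90-day window into the monthly
-- archive table, then drop the daily member table from the active database.
INSERT INTO ArchiveDB.dbo.FirewallEvents_200601
SELECT *
FROM EventsDB.dbo.FirewallEvents_20060101

DROP TABLE EventsDB.dbo.FirewallEvents_20060101

-- The partitioned view is then redefined (ALTER VIEW ... UNION ALL over the
-- remaining 90 daily tables) so it no longer references the dropped table.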
Good references on partitioned views:
http://www.sqlteam.com/Item.ASP?ItemID=684
http://www.windowsitpro.com/SQLServ.../8234/8234.html
Stu|||Thank you for the information Stu - I appreciate it.
I should point out that the solution (as currently implemented) is not
partitioned at all. Tables are created in the same database on the same
server - hourly. And I will say that we are not seeing anything similar to
the performance you are realizing, as it can take up to a minute to retrieve
10k rows...
Looking at the SQL, it appears that most of the work is being done simply
trying to figure out what tables are involved - it is crazy and convoluted,
and therefore I'm looking at what other folks have done to see if I can't
persuade folks to change.
Chris
"Stu" wrote:

> Our end users analyze firewall events, which can range from 1 event per
> incident to hundreds of thousands of events per incident. On average,
> they're pulling back 5000 records in milliseconds; note that's 5000
> records out of 60,000,000.
> INSERTs will slow down in relation to size if your database is having
> to grow to accommodate them, especially if your indexes are physically
> located on the same drive as your data. However, partitioning your
> data is typically more useful during data SELECTion rather than
> INSERTing, because you're providing the optimizer hints by which to
> narrow down the search.
> We also use the partitioning to archive data; on a daily basis, I spin
> off the data older than 90 days to an archive db (which is partitioned
> on a monthly basis). When a month of data is "closed", we back that
> data up to a portable hard drive, which then goes to storage. That
> way, we can retrieve data if we need it, but it's not sucking up disk
> space on my servers.
> Good references on partitioned views:
> http://www.sqlteam.com/Item.ASP?ItemID=684
> http://www.windowsitpro.com/SQLServ.../8234/8234.html
> Stu
>|||The nice thing about partitioned views is that the optimizer knows
where the data comes from, and where it belongs, since it uses mutually
exclusive check constraints to place the data. All of our SQL
statements are run against the view; in other words, we'll issue a
command like:
SELECT Columns
FROM View
WHERE CheckColumn = 'some value'
INSERTs, UPDATEs, and DELETEs are also performed on the view itself,
eliminating the need to figure out where the source table is located.
We create tables daily all on the same server; hourly seems a bit much
unless you're dealing with a heck of a lot of data.
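To spell the pattern out, here is a stripped-down sketch with just two daily
member tables (all names are invented; the real thing obviously has more
columns and members):

-- Each member table constrains the partitioning column to one day; the
-- partitioning column is part of the primary key so the view stays updatable.
CREATE TABLE Events_20060301 (
    EventID   INT          NOT NULL,
    EventDate DATETIME     NOT NULL
        CONSTRAINT CK_Events_20060301
        CHECK (EventDate >= '20060301' AND EventDate < '20060302'),
    EventData VARCHAR(255) NOT NULL,
    CONSTRAINT PK_Events_20060301 PRIMARY KEY (EventDate, EventID)
)

CREATE TABLE Events_20060302 (
    EventID   INT          NOT NULL,
    EventDate DATETIME     NOT NULL
        CONSTRAINT CK_Events_20060302
        CHECK (EventDate >= '20060302' AND EventDate < '20060303'),
    EventData VARCHAR(255) NOT NULL,
    CONSTRAINT PK_Events_20060302 PRIMARY KEY (EventDate, EventID)
)
GO

-- The view unions the members; because the CHECK constraints are mutually
-- exclusive, the optimizer only touches the member(s) a query can match.
CREATE VIEW Events
AS
SELECT EventID, EventDate, EventData FROM Events_20060301
UNION ALL
SELECT EventID, EventDate, EventData FROM Events_20060302
GO

-- All access goes through the view:
SELECT EventID, EventData
FROM Events
WHERE EventDate >= '20060302' AND EventDate < '20060303'

INSERT INTO Events (EventID, EventDate, EventData)
VALUES (1, '20060302 08:15', 'accept tcp 10.0.0.5:443')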
