Wednesday, March 7, 2012

Large table -- split into smaller ones?

Hi All,
I have to import about 26 million rows, each row about 1Kb in size into
our sql server. This data will be READONLY and might be updated every
quarter.
I am yet to import it and do some performance testing.
what should be the best approach to handling this much data...
should I
* have one large table?
OR
*Horizontally partition data into multiple tables?
thanks for your time
GKIt's generally best to implement a single table unless you have a
compelling reason to do otherwise. A single large table with
appropriate indexes often performs quite well without the additional
administrative complexity of partitioning.
The main advantages of horizontal partitioning are related to admin
tasks like data loads, bulk deletes and index creates. For example,
you'll need about 30GB of free space to build a clustered index on your
table but only a fraction of that amount if partitioned. On the other
hand, if you can load data sorted in clustered index sequence, you can
load with the index in place and forego the index create entirely.
--
Hope this helps.
Dan Guzman
SQL Server MVP
--
SQL FAQ links (courtesy Neil Pike):
http://www.ntfaq.com/Articles/Index.cfm?DepartmentID=800
http://www.sqlserverfaq.com
http://www.mssqlserver.com/faq
--
"GK" <ksgkrishremovethis@.yahoo.co.uk> wrote in message
news:edyKf7gqDHA.2496@.TK2MSFTNGP09.phx.gbl...
> Hi All,
> I have to import about 26 million rows, each row about 1Kb in size
into
> our sql server. This data will be READONLY and might be updated every
> quarter.
> I am yet to import it and do some performance testing.
> what should be the best approach to handling this much data...
> should I
> * have one large table?
> OR
> *Horizontally partition data into multiple tables?
> thanks for your time
> GK
>
>|||Thanks Dan,
I understand your point about administrative overheads, but I guess
they are much reduced if I use Partitioned views in SQL server 2000. Would
you advise using that?
As far as loading the data in clustered index sequene is concerned, are you
advising me to, say load data sorted on the clustered index column(s)?
Excuse me if this question is stupid but how do I load with index in place?
Also I didn't understand why it takes so much space to create index for a
large table and less space if the table were to be split up. Does index
creation need thatmuch space just while building the index and the finally
created index take lesser space?
Thank you for your efforts.
GK
"Dan Guzman" <danguzman@.nospam-earthlink.net> wrote in message
news:OmoikLmqDHA.2216@.TK2MSFTNGP12.phx.gbl...
> It's generally best to implement a single table unless you have a
> compelling reason to do otherwise. A single large table with
> appropriate indexes often performs quite well without the additional
> administrative complexity of partitioning.
> The main advantages of horizontal partitioning are related to admin
> tasks like data loads, bulk deletes and index creates. For example,
> you'll need about 30GB of free space to build a clustered index on your
> table but only a fraction of that amount if partitioned. On the other
> hand, if you can load data sorted in clustered index sequence, you can
> load with the index in place and forego the index create entirely.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> --
> SQL FAQ links (courtesy Neil Pike):
> http://www.ntfaq.com/Articles/Index.cfm?DepartmentID=800
> http://www.sqlserverfaq.com
> http://www.mssqlserver.com/faq
> --
> "GK" <ksgkrishremovethis@.yahoo.co.uk> wrote in message
> news:edyKf7gqDHA.2496@.TK2MSFTNGP09.phx.gbl...
> > Hi All,
> > I have to import about 26 million rows, each row about 1Kb in size
> into
> > our sql server. This data will be READONLY and might be updated every
> > quarter.
> > I am yet to import it and do some performance testing.
> > what should be the best approach to handling this much data...
> > should I
> > * have one large table?
> > OR
> > *Horizontally partition data into multiple tables?
> >
> > thanks for your time
> > GK
> >
> >
> >
>|||Just found this good msdn article on partioning data.
http://msdn.microsoft.com/library/default.asp?URL=/library/techart/PartitionsInDW.htm
and it says...(although they are talking about a data warehouse...mine will
not be a data warehouse exactly but it wont be an OLTP either. It'll be a
readonly table with quarterly updates)
<quote>
Query Speed
Query speed should not be considered a reason to partition the data
warehouse relational database. Query performance is similar for partitioned
and non-partitioned fact tables. When the partitioned database is properly
designed, the relational engine will include in a query plan only the
partition(s) necessary to resolve that query. For example, if the database
is partitioned by month and a query is conditioned on January 2000, the
query plan will include only the partition for January 2000. The resulting
query will perform well against the partitioned table, about the same as
against a properly indexed combined table with a clustered index on the
partitioning key.
</quote>
"GK" <ksgkrishremovethis@.yahoo.co.uk> wrote in message
news:%23%23bRH5oqDHA.2772@.TK2MSFTNGP10.phx.gbl...
> Thanks Dan,
> I understand your point about administrative overheads, but I guess
> they are much reduced if I use Partitioned views in SQL server 2000. Would
> you advise using that?
> As far as loading the data in clustered index sequene is concerned, are
you
> advising me to, say load data sorted on the clustered index column(s)?
> Excuse me if this question is stupid but how do I load with index in
place?
> Also I didn't understand why it takes so much space to create index for a
> large table and less space if the table were to be split up. Does index
> creation need thatmuch space just while building the index and the finally
> created index take lesser space?
> Thank you for your efforts.
> GK
> "Dan Guzman" <danguzman@.nospam-earthlink.net> wrote in message
> news:OmoikLmqDHA.2216@.TK2MSFTNGP12.phx.gbl...
> > It's generally best to implement a single table unless you have a
> > compelling reason to do otherwise. A single large table with
> > appropriate indexes often performs quite well without the additional
> > administrative complexity of partitioning.
> >
> > The main advantages of horizontal partitioning are related to admin
> > tasks like data loads, bulk deletes and index creates. For example,
> > you'll need about 30GB of free space to build a clustered index on your
> > table but only a fraction of that amount if partitioned. On the other
> > hand, if you can load data sorted in clustered index sequence, you can
> > load with the index in place and forego the index create entirely.
> >
> > --
> > Hope this helps.
> >
> > Dan Guzman
> > SQL Server MVP
> >
> > --
> > SQL FAQ links (courtesy Neil Pike):
> >
> > http://www.ntfaq.com/Articles/Index.cfm?DepartmentID=800
> > http://www.sqlserverfaq.com
> > http://www.mssqlserver.com/faq
> > --
> >
> > "GK" <ksgkrishremovethis@.yahoo.co.uk> wrote in message
> > news:edyKf7gqDHA.2496@.TK2MSFTNGP09.phx.gbl...
> > > Hi All,
> > > I have to import about 26 million rows, each row about 1Kb in size
> > into
> > > our sql server. This data will be READONLY and might be updated every
> > > quarter.
> > > I am yet to import it and do some performance testing.
> > > what should be the best approach to handling this much data...
> > > should I
> > > * have one large table?
> > > OR
> > > *Horizontally partition data into multiple tables?
> > >
> > > thanks for your time
> > > GK
> > >
> > >
> > >
> >
> >
>|||> I understand your point about administrative overheads, but I guess
> they are much reduced if I use Partitioned views in SQL server 2000.
Would
> you advise using that?
It depends on the problem you are trying to solve. The article you
referenced in your other post does a pretty good job of pointing out the
pros and cons. If your primary concern is query performance, then PVs
aren't the answer. Personally, wouldn't change my design to accommodate
partitioning unless the additional complexity is more than offset by the
benefits of load speed and maintenance. IMHO, your quarterly data load
probably doesn't justify the use of PVs.
> As far as loading the data in clustered index sequene is concerned,
are you
> advising me to, say load data sorted on the clustered index column(s)?
> Excuse me if this question is stupid but how do I load with index in
place?
Just create the table and its clustered index. If you then load (bulk
insert) data in sequence by the clustered index, load performance will
be good and you won't need to create the clustered index afterward.
Non-clustered indexes can be created after the load.
> Also I didn't understand why it takes so much space to create index
for a
> large table and less space if the table were to be split up. Does
index
> creation need thatmuch space just while building the index and the
finally
> created index take lesser space?
The space requirement to build the clustered index is a consideration
with large tables. You need free space of about 120% of the original
table size. This is because the entire table is rebuilt during the
create so space for old and new data is needed plus some sort work
space. After the create, space for the old data is released and the end
result is that slightly more space than the original heap is allocated
to accommodate the non-leaf nodes of the clustered index.
With multiple smaller tables, you can build each clustered index
separately. Consequently, you'll only need free space to accommodate
120% of the largest table in the lot.
--
Hope this helps.
Dan Guzman
SQL Server MVP
"GK" <ksgkrishremovethis@.yahoo.co.uk> wrote in message
news:%23%23bRH5oqDHA.2772@.TK2MSFTNGP10.phx.gbl...
> Thanks Dan,
> I understand your point about administrative overheads, but I
guess
> they are much reduced if I use Partitioned views in SQL server 2000.
Would
> you advise using that?
> As far as loading the data in clustered index sequene is concerned,
are you
> advising me to, say load data sorted on the clustered index column(s)?
> Excuse me if this question is stupid but how do I load with index in
place?
> Also I didn't understand why it takes so much space to create index
for a
> large table and less space if the table were to be split up. Does
index
> creation need thatmuch space just while building the index and the
finally
> created index take lesser space?
> Thank you for your efforts.
> GK
> "Dan Guzman" <danguzman@.nospam-earthlink.net> wrote in message
> news:OmoikLmqDHA.2216@.TK2MSFTNGP12.phx.gbl...
> > It's generally best to implement a single table unless you have a
> > compelling reason to do otherwise. A single large table with
> > appropriate indexes often performs quite well without the additional
> > administrative complexity of partitioning.
> >
> > The main advantages of horizontal partitioning are related to admin
> > tasks like data loads, bulk deletes and index creates. For example,
> > you'll need about 30GB of free space to build a clustered index on
your
> > table but only a fraction of that amount if partitioned. On the
other
> > hand, if you can load data sorted in clustered index sequence, you
can
> > load with the index in place and forego the index create entirely.
> >
> > --
> > Hope this helps.
> >
> > Dan Guzman
> > SQL Server MVP
> >
> > --
> > SQL FAQ links (courtesy Neil Pike):
> >
> > http://www.ntfaq.com/Articles/Index.cfm?DepartmentID=800
> > http://www.sqlserverfaq.com
> > http://www.mssqlserver.com/faq
> > --
> >
> > "GK" <ksgkrishremovethis@.yahoo.co.uk> wrote in message
> > news:edyKf7gqDHA.2496@.TK2MSFTNGP09.phx.gbl...
> > > Hi All,
> > > I have to import about 26 million rows, each row about 1Kb in
size
> > into
> > > our sql server. This data will be READONLY and might be updated
every
> > > quarter.
> > > I am yet to import it and do some performance testing.
> > > what should be the best approach to handling this much data...
> > > should I
> > > * have one large table?
> > > OR
> > > *Horizontally partition data into multiple tables?
> > >
> > > thanks for your time
> > > GK
> > >
> > >
> > >
> >
> >
>|||That was very helpful.
Thanks Dan.
"Dan Guzman" <danguzman@.nospam-earthlink.net> wrote in message
news:OdauywrqDHA.2568@.TK2MSFTNGP09.phx.gbl...
> > I understand your point about administrative overheads, but I guess
> > they are much reduced if I use Partitioned views in SQL server 2000.
> Would
> > you advise using that?
> It depends on the problem you are trying to solve. The article you
> referenced in your other post does a pretty good job of pointing out the
> pros and cons. If your primary concern is query performance, then PVs
> aren't the answer. Personally, wouldn't change my design to accommodate
> partitioning unless the additional complexity is more than offset by the
> benefits of load speed and maintenance. IMHO, your quarterly data load
> probably doesn't justify the use of PVs.
> > As far as loading the data in clustered index sequene is concerned,
> are you
> > advising me to, say load data sorted on the clustered index column(s)?
> > Excuse me if this question is stupid but how do I load with index in
> place?
> Just create the table and its clustered index. If you then load (bulk
> insert) data in sequence by the clustered index, load performance will
> be good and you won't need to create the clustered index afterward.
> Non-clustered indexes can be created after the load.
> > Also I didn't understand why it takes so much space to create index
> for a
> > large table and less space if the table were to be split up. Does
> index
> > creation need thatmuch space just while building the index and the
> finally
> > created index take lesser space?
> The space requirement to build the clustered index is a consideration
> with large tables. You need free space of about 120% of the original
> table size. This is because the entire table is rebuilt during the
> create so space for old and new data is needed plus some sort work
> space. After the create, space for the old data is released and the end
> result is that slightly more space than the original heap is allocated
> to accommodate the non-leaf nodes of the clustered index.
> With multiple smaller tables, you can build each clustered index
> separately. Consequently, you'll only need free space to accommodate
> 120% of the largest table in the lot.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
>
> "GK" <ksgkrishremovethis@.yahoo.co.uk> wrote in message
> news:%23%23bRH5oqDHA.2772@.TK2MSFTNGP10.phx.gbl...
> > Thanks Dan,
> > I understand your point about administrative overheads, but I
> guess
> > they are much reduced if I use Partitioned views in SQL server 2000.
> Would
> > you advise using that?
> >
> > As far as loading the data in clustered index sequene is concerned,
> are you
> > advising me to, say load data sorted on the clustered index column(s)?
> > Excuse me if this question is stupid but how do I load with index in
> place?
> >
> > Also I didn't understand why it takes so much space to create index
> for a
> > large table and less space if the table were to be split up. Does
> index
> > creation need thatmuch space just while building the index and the
> finally
> > created index take lesser space?
> >
> > Thank you for your efforts.
> > GK
> > "Dan Guzman" <danguzman@.nospam-earthlink.net> wrote in message
> > news:OmoikLmqDHA.2216@.TK2MSFTNGP12.phx.gbl...
> > > It's generally best to implement a single table unless you have a
> > > compelling reason to do otherwise. A single large table with
> > > appropriate indexes often performs quite well without the additional
> > > administrative complexity of partitioning.
> > >
> > > The main advantages of horizontal partitioning are related to admin
> > > tasks like data loads, bulk deletes and index creates. For example,
> > > you'll need about 30GB of free space to build a clustered index on
> your
> > > table but only a fraction of that amount if partitioned. On the
> other
> > > hand, if you can load data sorted in clustered index sequence, you
> can
> > > load with the index in place and forego the index create entirely.
> > >
> > > --
> > > Hope this helps.
> > >
> > > Dan Guzman
> > > SQL Server MVP
> > >
> > > --
> > > SQL FAQ links (courtesy Neil Pike):
> > >
> > > http://www.ntfaq.com/Articles/Index.cfm?DepartmentID=800
> > > http://www.sqlserverfaq.com
> > > http://www.mssqlserver.com/faq
> > > --
> > >
> > > "GK" <ksgkrishremovethis@.yahoo.co.uk> wrote in message
> > > news:edyKf7gqDHA.2496@.TK2MSFTNGP09.phx.gbl...
> > > > Hi All,
> > > > I have to import about 26 million rows, each row about 1Kb in
> size
> > > into
> > > > our sql server. This data will be READONLY and might be updated
> every
> > > > quarter.
> > > > I am yet to import it and do some performance testing.
> > > > what should be the best approach to handling this much data...
> > > > should I
> > > > * have one large table?
> > > > OR
> > > > *Horizontally partition data into multiple tables?
> > > >
> > > > thanks for your time
> > > > GK
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
>

No comments:

Post a Comment