Friday, February 24, 2012

Large OLTP DB design ponderings

I am in the middle of designing a large OLTP DB app that will also be expected to provide complex reporting capabilities with reasonable response times. There will be tens of thousands of entries a day.

I am torn between two approaches: minimally indexing the DB to strike a balance between write speed and read speed, or leaving the tables as heaps and creating a linked DB that mirrors the original but is fully indexed, with partitions, indexed views, etc. I will probably also have a third DB that serves only as a mirrored backup of the primary, with a witness.

Does this seem a reasonable approach? Or am I going over the "overkill" edge?

Fast-response OLTP apps and complex reports don't usually mix particularly well. As a test, why don't you try loading your database with a predicted couple of years' worth of test data and seeing how your design performs?
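
If it helps, here is the sort of thing I mean - a quick, set-based way to generate a couple of years' worth of dummy rows. The table and column names (dbo.OrderEntry etc.) are purely made up; substitute your own schema:

    -- Hypothetical sketch: generate roughly two years of test rows
    -- (tens of thousands per day). dbo.OrderEntry and its columns are
    -- placeholders for your real schema.
    WITH Numbers AS (
        SELECT TOP (18000000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM sys.all_objects a
        CROSS JOIN sys.all_objects b
        CROSS JOIN sys.all_objects c
    )
    INSERT INTO dbo.OrderEntry (CustomerID, EntryDate, Amount)
    SELECT n % 50000,                                 -- spread rows across dummy customers
           DATEADD(second, -CAST(n AS int) * 3, GETDATE()), -- ~3s apart = ~28,800 rows/day
           (n % 10000) / 100.0                        -- dummy amount
    FROM Numbers;

In practice you would want to break that insert into batches to keep the transaction log manageable, but it is enough to put realistic volume behind your tests.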

If you have the ability to add a server dedicated to reporting at this early stage then it would be a lot easier to factor the second server into the design now than it would be to incorporate it later on.

Chris

|||

Chris,

Thanks. That is what I am beginning to do. We are about halfway through the project and I was brought on board to "tune" the system.

I guess I was asking if there is anything inherently wrong with leaving the "entry" DB as heaps and creating a "reporting" DB indexed out the wazoo. I am assuming both would run faster, but there would be a lag while the shipped transaction logs were applied and the indexes updated. I can live with some latency in the data if it will make both processes (read & write) "pop".

|||

There's nothing wrong with doing that at all - you're simply optimising each copy of the database for its intended purpose.

I would keep your options open in terms of indexing on the entry table - don't forget that a Primary Key is actually an index. I don't know if your OLTP application needs to read real-time data from the entry table; if it does, then you might find that a strategically placed index (or two) will actually help without significantly degrading inserts.
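
For example (table, column, and index names here are purely illustrative):

    -- Illustrative only: a narrow nonclustered index to support real-time
    -- lookups on the entry table without adding much insert overhead.
    CREATE NONCLUSTERED INDEX IX_OrderEntry_CustomerID_EntryDate
        ON dbo.OrderEntry (CustomerID, EntryDate)
        INCLUDE (Amount);   -- covers the query without widening the key

Narrow keys like this stay cheap on insert: the identity-based clustered index means new rows always land at the end of the table, and the nonclustered index only has to slot in a small entry.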

How are you planning to copy data from the primary OLTP database to the secondary database? If you are using Database Mirroring then you will not be able to read data from the mirrored copy during normal operation, because that database sits in a recovering state. However, you could create a database snapshot on the mirrored copy, which would let you read the data, frozen at a point in time. For this reason you might want to consider using transactional replication instead to maintain the report server's copy of the data.
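
Creating a snapshot on the mirror looks something like this (database and file names are placeholders):

    -- Hypothetical sketch: a point-in-time snapshot of the mirrored database.
    -- Reports can query the snapshot even though the mirror itself remains
    -- in a recovering state.
    CREATE DATABASE EntryDB_Snapshot
    ON ( NAME = EntryDB_Data,   -- logical name of the source data file
         FILENAME = 'D:\Snapshots\EntryDB_Snapshot.ss' )
    AS SNAPSHOT OF EntryDB;

You would have to drop and recreate the snapshot periodically to refresh the data, which is another reason replication tends to be the more natural fit for reporting.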

Chris

|||

Actually I was thinking of a system that combined a primary-witness-mirror/quorum setup for availability with a second instance on each server holding a read-only copy, updated via transaction log shipping at short intervals.
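
In other words, the read-only copy would be kept current with something like this (database and path names are placeholders):

    -- Sketch: restoring shipped logs WITH STANDBY keeps the secondary copy
    -- readable between restores. Names and paths are made up for illustration.
    RESTORE LOG EntryDB_ReadOnly
    FROM DISK = N'E:\LogShip\EntryDB_20120224_1030.trn'
    WITH STANDBY = N'E:\LogShip\EntryDB_undo.dat';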

Currently every table has a clustered index/primary key, usually on an identity field. It will probably be best to leave that in place, give it an ample fill factor, and rebuild the indexes as part of routine (nightly) maintenance.

|||

Personally, unless you are going to be performing a lot of updates to the data, I'd leave the fill factor set at 0 (or 100) on tables where the clustered index is on an identity column - any other setting will result in wasted disk space. If you are expecting updates then a value of 80-90 should be OK. Again, it would be good to test this to find the optimal setting.
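
The rebuild itself is a one-liner per index, so it is easy to test different values (index and table names are placeholders):

    -- Illustrative: rebuild the clustered index with a specific fill factor.
    -- Try a few values under realistic load to find the sweet spot.
    ALTER INDEX PK_OrderEntry ON dbo.OrderEntry
        REBUILD WITH (FILLFACTOR = 90);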

If you are going to copy the data over by either shipping the transaction logs or by mirroring, then your reporting indexes would have to exist in the primary database for them to be available on the secondary server. With replication you can choose to execute a 'post-snapshot script' that could be used to apply the indexes to the subscription (secondary) database once the initial snapshot has been applied.
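
The post-snapshot script is just an ordinary .sql file of DDL attached to the publication - something like this (publication, path, and index names are all made up for illustration):

    -- Illustrative contents of D:\Repl\AddReportIndexes.sql: reporting-only
    -- indexes that exist on the subscriber but never burden the publisher.
    CREATE NONCLUSTERED INDEX IX_OrderEntry_EntryDate
        ON dbo.OrderEntry (EntryDate)
        INCLUDE (CustomerID, Amount);

    -- Point the publication at the script (run at the publisher):
    EXEC sp_changepublication
        @publication = N'EntryPub',
        @property    = N'post_snap_script',
        @value       = N'D:\Repl\AddReportIndexes.sql';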

Chris

|||

I just converted a client from a single-database system to a two-database system, and they were able to reap other benefits as well.

They had their reports pulling from the OLTP system, but that interfered with the business users (slowing the system down) and made the reports slow as well.

I installed a new server, put SQL Server 2005 on it, and I import the OLTP data daily. I have set up a data warehouse for their reporting needs, and now the reports are very fast and don't interfere with the OLTP system.

BobP

|||

Thanks Bob.

That is probably the type of system I am working toward. I think I need data more current than daily, but that detail can be worked out. Do you import a snapshot and then run a script to index the data, or is it indexed in the OLTP DB?

|||

There are some indexes in the OLTP, but what I do is use SSIS to transform the data to make it more report-friendly.

Then I indexed the heck out of it :-D
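
To give you a rough idea of the kind of flattening the package does, here it is expressed as plain T-SQL (the schema is completely made up for illustration):

    -- Rough sketch of an SSIS-style transform as T-SQL: collapse the raw
    -- entries to a report-friendly day/customer grain. All names are made up.
    INSERT INTO ReportDB.dbo.FactDailySales (EntryDate, CustomerID, OrderCount, TotalAmount)
    SELECT CONVERT(char(8), e.EntryDate, 112),   -- yyyymmdd day grain
           e.CustomerID,
           COUNT(*),
           SUM(e.Amount)
    FROM OLTP.dbo.OrderEntry e
    GROUP BY CONVERT(char(8), e.EntryDate, 112), e.CustomerID;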

If you need more than daily, my suggestion, and I have done this in the past, is to set up transactional replication from the OLTP server to a staging server. Then import as many times as you want from the staging server. (The staging server can even be your reporting server, if you have enough disk space.)
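
The replication setup is basically this, run at the publisher (server, database, and publication names are placeholders, and it assumes the distributor is already configured):

    -- Minimal sketch of a transactional publication with a push subscription
    -- to the staging server. Assumes sp_replicationdboption has already
    -- enabled the database for publishing and a distributor exists.
    USE EntryDB;
    EXEC sp_addpublication
        @publication = N'EntryPub',
        @repl_freq   = N'continuous',
        @status      = N'active';

    EXEC sp_addarticle
        @publication   = N'EntryPub',
        @article       = N'OrderEntry',
        @source_object = N'OrderEntry';

    EXEC sp_addsubscription
        @publication       = N'EntryPub',
        @subscriber        = N'STAGINGSRV',
        @destination_db    = N'EntryStaging',
        @subscription_type = N'push';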

BobP

|||

Thanks again... I think that is my solution, as eventually we are building a BI system for them.

I'll try transactional replication to a staging server and set the frequency to allow enough time for indexes to be applied. I can run daily reports from the staging DB (heck, you can get a terabyte for $500 these days) and ship the data off to the warehouse to do all the BI stuff, which I'm still learning :-)

Thanks again Chris and Bob (I love this board!)
