Hello all,
I have recently been tasked with rewriting a database that holds large volumes of data, whilst ensuring that queries run in optimal time. Having never really delved into this sort of thing before, I hoped you guys might be able to offer some advice and guidance.
The design I have inherited is based around 2 main tables:
[captured_traps]
[id] [int] IDENTITY (1, 1) NOT NULL
[snmp_version] [int] NULL
[community_name] [varchar] (255)
[packet_type] [varchar] (50)
[oid] [varchar] (500)
[source_ip] [varchar] (15)
[generic] [int] NULL
[specific] [int] NULL
[time_stamp] [varchar] (15)
[trap_entered] [datetime] NULL
[status] [int] NULL
[captured_varbinds]
[id] [int] IDENTITY (1, 1) NOT NULL
[captured_trap_id] [int] NOT NULL
[varbind_oid] [varchar] (500)
[varbind_text] [varchar] (500)
The relationship between the two tables is on "captured_traps (id)" to "captured_varbinds (captured_trap_id)". Currently the "captured_traps" table contains around 350 million rows, and the "captured_varbinds" table contains around 900 million rows.
Now as you can probably gather this model runs like a...well it sort of hobbles more than runs hence the need to redesign.
My current thoughts on this are:
- Normalising all varchars - there are a lot of duplicate values in most of the varchar fields.
- Full-text indexing
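For what it's worth, here is a rough sketch of the kind of normalisation I mean for a field like varbind_oid (table and column names here are made up for illustration, not part of the current schema):

```sql
-- Hypothetical lookup table: store each distinct OID string once and
-- reference it by a 4-byte surrogate key from the big table, instead of
-- repeating up to 500 bytes of varchar on every one of the 900M rows.
CREATE TABLE dbo.varbind_oids (
    id  int IDENTITY (1, 1) NOT NULL
        CONSTRAINT PK_varbind_oids PRIMARY KEY,
    oid varchar(500) NOT NULL
        CONSTRAINT UQ_varbind_oids_oid UNIQUE
)
GO
-- The varbinds table would then carry, in place of [varbind_oid]:
-- [varbind_oid_id] [int] NOT NULL REFERENCES dbo.varbind_oids (id)
```

The unique constraint doubles as the index used to look up (or insert) the OID's id at capture time.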
However, beyond that I am not sure which route to go down. After googling for most of today I have come across a number of "solutions", however I do not want to go steaming down the track of one of these only to discover that it is fatally flawed somewhere.|||What fields do you currently search and join on? Are there indexes there? If so, have they been maintained? Are they in separate filegroups and on separate disks from the data? Any thought given to partitioning?|||As far as searches are concerned, users may search any field or combination of fields. A typical search example might be:
snmp_version = 2
oid = 1.2.5.67345.12.34
trap_entered > a UTC date time
trap_entered < a UTC date time
This would be expected to return all data from captured_traps and all associated captured_varbinds with the relationship on "captured_traps (id)" to "captured_varbinds (captured_trap_id)".
or a user might search on the varbind table instead, and expect to see all traps with a specific varbind_oid value, for example
varbind_oid = 2.3.4.123.6.4.5.8766
trap_entered > a UTC date time
trap_entered < a UTC date time
This would be expected to return all traps within the given date range that have a varbind of the given type.
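In SQL terms that second search would boil down to something like this (the literal values here are just placeholders for user input):

```sql
-- Illustrative query only: join the varbinds back to their traps and
-- filter on both the varbind OID and the trap date range.
SELECT t.*, v.varbind_oid, v.varbind_text
FROM captured_traps t
INNER JOIN captured_varbinds v
    ON v.captured_trap_id = t.id
WHERE v.varbind_oid = '2.3.4.123.6.4.5.8766'
  AND t.trap_entered > '2012-03-01T00:00:00'
  AND t.trap_entered < '2012-03-08T00:00:00'
```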
The previous owner of this DB only put indexes on the two id columns.
As far as filegroups and partitioning are concerned, I have encountered these whilst searching for solutions, and if viable they may be implemented.
Basically I have a clean slate here, so I am open to any and all suggestions.|||What version of SQL Server are you running?
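From what I have read so far, a SQL Server 2005 partitioning setup would look roughly like this (boundary dates and names are invented for illustration; whether this is the right partitioning column is exactly the sort of thing I am unsure about):

```sql
-- Partition traps by month on trap_entered (SQL Server 2005+).
CREATE PARTITION FUNCTION pf_trap_month (datetime)
AS RANGE RIGHT FOR VALUES ('2012-01-01', '2012-02-01', '2012-03-01')
GO
-- Map every partition to PRIMARY for now; in production each could go
-- to its own filegroup on separate disks.
CREATE PARTITION SCHEME ps_trap_month
AS PARTITION pf_trap_month ALL TO ([PRIMARY])
GO
-- A table created ON ps_trap_month (trap_entered) then has its rows
-- routed to a partition by month, so the date-range searches described
-- above only have to touch the relevant partitions.
```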
We will also need to see the indexing methodology...do you know how to script those?
Also, how is all the data accessed? Stored Procedures only?
What do you get when you do sp_depends on those 2 tables?|||As mentioned earlier, I have a clean slate to work from, so the version can be anything from 2000 SP4 to 2005, whichever is better suited to the task.
Data will only ever be accessed via stored procedures; however, because searches can be on any field or combination of fields, the main search stored procedure will build and execute the SQL dynamically (unless anybody has any better ideas).
As far as indexes are concerned, these are the full scripts for the current model:
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[FK_snmp_captured_varbinds_snmp_captured_traps]') and OBJECTPROPERTY(id, N'IsForeignKey') = 1)
ALTER TABLE [dbo].[snmp_captured_varbinds] DROP CONSTRAINT FK_snmp_captured_varbinds_snmp_captured_traps
GO
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[snmp_captured_traps]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[snmp_captured_traps]
GO
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[snmp_captured_varbinds]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[snmp_captured_varbinds]
GO
CREATE TABLE [dbo].[snmp_captured_traps] (
[id] [int] IDENTITY (1, 1) NOT NULL ,
[snmp_version] [int] NULL ,
[community_name] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[packet_type] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[oid] [varchar] (500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[source_ip] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[generic] [int] NULL ,
[specific] [int] NULL ,
[time_stamp] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[packet_data] [image] NULL ,
[trap_entered] [datetime] NULL ,
[status] [int] NULL ,
[support_reference] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[listener_event_id] [int] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
CREATE TABLE [dbo].[snmp_captured_varbinds] (
[id] [int] IDENTITY (1, 1) NOT NULL ,
[trap_id] [int] NULL ,
[varbind_oid] [varchar] (500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[varbind_text] [text] COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[snmp_captured_traps] WITH NOCHECK ADD
CONSTRAINT [PK_snmp_captured_traps] PRIMARY KEY CLUSTERED
(
[id]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[snmp_captured_varbinds] WITH NOCHECK ADD
CONSTRAINT [PK_snmp_captured_varbinds] PRIMARY KEY CLUSTERED
(
[id]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[snmp_captured_traps] WITH NOCHECK ADD
CONSTRAINT [DF_snmp_captured_traps_status] DEFAULT (1) FOR [status]
GO
CREATE INDEX [IDX_snmp_captured_varbinds_trap_id] ON [dbo].[snmp_captured_varbinds]([trap_id]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[snmp_captured_varbinds] ADD
CONSTRAINT [FK_snmp_captured_varbinds_snmp_captured_traps] FOREIGN KEY
(
[trap_id]
) REFERENCES [dbo].[snmp_captured_traps] (
[id]
) ON DELETE CASCADE
GO
Edit: in the new model the packet_data column of the snmp_captured_traps table is not required.|||If you are using sprocs, why would you need dynamic SQL?
In any case, you only have one index...
How do the rows get added to these tables, and what is the rate of inserts?
You will need more indexes, and will probably need to utilise index intersection instead of covering indexes.
Without the additional indexes you are probably scanning the tables...|||Well, the dynamic SQL is an assumption I am carrying across from the current design. Currently there is no stored proc for doing searches; the application that sits on top of this database builds an SQL string based on user input and then executes it. Something along the lines of:
SELECT x,y,z FROM snmp_captured_traps WHERE a=b AND c=d AND e>f AND g<h
I don't like the fact that the SQL string is controlled by the app; it also means that remote users have to have SELECT privileges on the necessary tables. Hence the idea of moving that code logic into a stored procedure. I am open to any ideas/suggestions on how to do this using an alternative system.
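Something along these lines is what I had in mind for the search proc: build only the WHERE clauses the caller actually supplied, but pass the values through sp_executesql as parameters rather than concatenating them into the string (the proc name and parameter set here are just a sketch of the idea):

```sql
-- Hypothetical search proc: dynamic clause list, parameterised values.
-- Unused parameters are simply never referenced by the built SQL.
CREATE PROCEDURE search_captured_traps
    @snmp_version int = NULL,
    @oid varchar(500) = NULL,
    @from datetime = NULL,
    @to datetime = NULL
AS
DECLARE @sql nvarchar(4000)
SET @sql = N'SELECT * FROM snmp_captured_traps WHERE 1 = 1'
IF @snmp_version IS NOT NULL
    SET @sql = @sql + N' AND snmp_version = @snmp_version'
IF @oid IS NOT NULL
    SET @sql = @sql + N' AND oid = @oid'
IF @from IS NOT NULL
    SET @sql = @sql + N' AND trap_entered > @from'
IF @to IS NOT NULL
    SET @sql = @sql + N' AND trap_entered < @to'
EXEC sp_executesql @sql,
    N'@snmp_version int, @oid varchar(500), @from datetime, @to datetime',
    @snmp_version, @oid, @from, @to
GO
```

That way the app only needs EXECUTE permission on the proc, not SELECT on the tables, and the user input never gets spliced into the SQL text.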
As far as getting data into the tables, at the moment there is simply a series of simple insert stored procedures, e.g.
CREATE PROCEDURE add_captured_trap
(
@snmp_version int,
@community_name varchar(255),
@packet_type varchar(50),
@oid varchar(500),
@source_ip varchar(15),
@generic int,
@specific int,
@trap_entered datetime,
@packet_data image,
@listener_event_id int
)
AS BEGIN TRANSACTION
INSERT INTO snmp_captured_traps (
snmp_version,
community_name,
packet_type,
oid,
source_ip,
generic,
specific,
trap_entered,
packet_data,
listener_event_id
)
VALUES (
@snmp_version,
@community_name,
@packet_type,
@oid,
@source_ip,
@generic,
@specific,
@trap_entered,
@packet_data,
@listener_event_id
)
SELECT SCOPE_IDENTITY();
COMMIT TRANSACTION
GO
CREATE PROCEDURE add_captured_varbind
(
@trap_id int,
@varbind_oid varchar(500),
@varbind_text text
)
AS BEGIN TRANSACTION
INSERT INTO snmp_captured_varbinds (
trap_id,
varbind_oid,
varbind_text
);
COMMIT TRANSACTION
GO
As you can see, nothing too clever, and it worked OK until the load became too much.
As far as volumes of inserts, on average the system receives 50 traps a second, and each trap has an average of 2 varbinds, so that's around 150 inserts per second.
One other thing: everything I have said so far seems to indicate that I want to redesign from scratch. That's not entirely true; if there is something that I can apply to the current model in order to get things running quicker, then that is my preferred short-term option.
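As a concrete example of the short-term kind of change I mean, I was thinking of adding indexes on the columns that actually get searched; whether these are the right choices (given the insert rate above) is exactly what I am asking:

```sql
-- Candidate indexes for the existing tables. Column choices are my
-- guesses based on the typical searches described earlier in the
-- thread; every extra index also has to be maintained on each of the
-- ~150 inserts per second.
CREATE INDEX IDX_snmp_captured_traps_trap_entered
    ON dbo.snmp_captured_traps (trap_entered)
GO
CREATE INDEX IDX_snmp_captured_traps_oid
    ON dbo.snmp_captured_traps (oid)
GO
CREATE INDEX IDX_snmp_captured_varbinds_varbind_oid
    ON dbo.snmp_captured_varbinds (varbind_oid)
GO
```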
Friday, March 9, 2012