Wednesday, March 28, 2012

latest unique records, how to get?

Let's say I have a data entry from a pool of employees:
table is as follow:
EmpNo Branch Date Amount
1 A101 11/30/2007 $0.90
1 A101 11/30/2007 $1.20
2 A101 11/30/2007 $0.90
3 A101 11/30/2007 $0.80

How can I select the whole table and only take in 1 unique latest entry if there are multiple entries for the same day, same branch under same employee number?

Thanks! :DLet's say I have a data entry from a pool of employees:
table is as follow:
EmpNo Branch Date Amount
1 A101 11/30/2007 $0.90
1 A101 11/30/2007 $1.20
2 A101 11/30/2007 $0.90
3 A101 11/30/2007 $0.80

How can I select the whole table and only take in 1 unique latest entry if there are multiple entries for the same day, same branch under same employee number?

Thanks! :D

What do you mean by"select the whole table"? If you are hoping to get the "1 unique latest entry" the two statements contradict each other in my mind!

For example in your quoted sample data, what would be the result you would expect to get back? Just one row containing the $0.90 entry?

Edited to say "what consitutes 'latest' (and 'unique', come to think of it) as far as you are concerned?" If there are 14 entries for employee ID 1 on 11/30/2007 and some of them are for the same amount... what are you hoping to select? If you aren't using any kind of identity field then you must have some other way to make a record unique, right? If this is date & time, then you just select "top 1" or "max" based on the date/time field...? I must be missing something!|||Unless your dates contain a time component, or you have an incrementing surrogate key, there is no way to determine which of the records is "latest" when there are ties.|||If you happen to have a transaction log for the time in question, you could try and sift through that. Though, this would not be an easy task without a third party tool.

I believe there are two: Lumigent Log Explorer, http://www.lumigent.com and Log PI, http://www.logpi.com.

You could also try and use the DBCC LOG command, but this output is rather difficult to decipher.|||wow, that's fast reply. Sorry guys I guess I had missed out some important stuff....posted that from my office before I rushed back home.

my expected end results will be:

EmpNo Branch Date Amount
1 A101 11/30/2007 $1.20
2 A101 11/30/2007 $0.90
3 A101 11/30/2007 $0.80

the first item shall not be selected as there is a "latest" entry with the same employee number, same branch and same date. I only want one unique and latest entry from each employee in a given day. Why would that happen? somebody hand itchy and keyed in twice, or he/she keyed in the first time wrongly, and trying to key in again to replace the first one.

I have no control of the data given....this is a chuck of "dirty data" given to me. And yes, there is no time factor in this case.... :(

I think it's possible...was trying to look for a combi of "TOP" function and others. It's not going to be straight forward....

really appreciate the help from you guys! :D|||Er... but if you have no time component and no unique ID...

Hang on... where has this "dirty data" come from and what format is it in? You seem to be treating the order of the data as significant... is it some kind of flat file in which case the later rows are more recent? If so, there's something to work on, if not, there isn't! Sorry, but if you look at a given employee and there is more than one transaction on a day then what on earth separates the values? Surely you can't hope to do it on value alone?|||oh yah, the "dirty data" are flat files and generated from a DOS based programme. The sequence of the flat file is in proper order. Yes, for sure we could introduce some index to it while doing bulk loading to the database, but what shall I use? even time index alone could be tough to achieve what I wanted....|||Well, unless I'm missing something here, if you simply whack it into a table with an indentity field in it, that will automatically give you a unique index, then you just need to do something like this:

Assuming fields are something like ID (new identity field), EmployeeID, Branch, TDate, Amount

select distinct t1.EmployeeID, t1.BRANCH, t1.TDate, (select top 1 amount from flattable t2 where t2.EmployeeID=t1.EmployeeIDand t2.branch = t1.branch and t2.TDate=t1.TDate order by [id] desc) as amount
FROM flattable t1
ORDER BY t1.EmployeeID ASC

This will pick out the distinct employee, branch and date records and then let you pick off the latest transaction (it'll have a higher generated ID) if there is more than one transaction for that employee, branch and date...

Worked in my test anyway... :-)

No comments:

Post a Comment