Re: Break Up Large Table Query Into Results of N Rows

From "Bob Barrows" <reb01501@NOyahooSPAM.com>
Newsgroups comp.databases.ms-sqlserver
Subject Re: Break Up Large Table Query Into Results of N Rows
Date 2012-02-01 15:36 -0500
Organization A noiseless patient Spider
Message-ID <jgc7nu$mbt$1@dont-email.me>
References <eb6a2653-8609-495a-b05f-16a104b22f05@y10g2000vbn.googlegroups.com> <5609f740-b8b5-4876-8a4a-5633aa91a3a8@eb6g2000vbb.googlegroups.com> <Xns9FEBEA7DF4E19Yazorman@127.0.0.1> <e6bad866-cf61-47b7-b55e-a1478272431b@t30g2000vbx.googlegroups.com>


pbd22 wrote:
> On Jan 31, 5:03 pm, Erland Sommarskog <esq...@sommarskog.se> wrote:
>> pbd22 (dush...@gmail.com) writes:
>>> On Jan 31, 9:57 am, pbd22 <dush...@gmail.com> wrote:
>>>> I am working in SQL Server 2005 and want to break up a table of 1M
>>>> rows into distinct results of 80,000.
>>
> Thanks both a bundle for your responses. Looks like there is life on
> planet Google Groups after all, much appreciated!
>
> The specifics are that we are doing email deployments but google is
> moving all of the email sent to gmail users to their spam boxes. As a
> result, we have to "chunk" the gmail users out of the total amount and
> send in manageable batches. We have figured that 80,000 per batch out
> of the total gmail users in the table is possible.
>
> And, Erland, to answer your question, I would prefer to turn this into
> a stored procedure we can use for the purpose of "chunking" email
> addresses. Accordingly, the number per batch should be a variable
> parameter.
>
> Bob, the table we are querying against is pretty simple. Essentially,
> it has one column - "email_address" which is a varchar. Its data is
> about 1 million email addresses (but that number changes often). The
> result set table(s) should only have two columns, the count (INT) and
> the email_address (varchar). Please see below.
>
> The query I am trying to write is supposed to dump each result set
> (batch) to a text file in some folder on the hard drive. Each result
> set should have a count and the email addresses as columns. Something
> like this:
>
> RESULT 1:
>
> [COUNT] [EMAIL_ADDRESS]
>  1              name@gmail.com
>  2              name@gmail.com
<snip>
> And so on up to the total amount of the gmail addresses out of the
> original table.
>
> So, the statement should read something like this (pseudo code):
>
> select all distinct users

You keep making a point of saying "distinct". Does that imply that there are
duplicate email addresses in that 1-million row table?


> from the master table
> where email_address like '%gmail.com'
> return in batches of N (such as 80,000)
> and write each batch to a text file on the
> hard drive.

I assume you can handle this part, correct? It's only batching the data that
you need help with?

>
> I hope I have explained myself well. Let me know if anything is
> unclear.
>
> Thanks a bundle for your help.

SQL 2008 has some paging functionality built in, but you're using SQL 2005.
I can think of a couple of approaches. Here's one:

1. Create a temp table (#batches) with an identity column (identcol) and an
email column (email). Insert the distinct email addresses into it:
insert #batches (email)
select distinct email_address from master_table where email_address like
'%gmail.com'
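
For reference, the temp table in step 1 might be declared something like this
before the insert runs. It's only a sketch: the varchar(255) length is an
assumption (match it to the real email_address column), and the primary key on
identcol is an optional extra to keep the range lookups cheap.

```sql
-- Sketch only: the column length is assumed, not taken from the real table.
create table #batches (
    identcol int identity(1,1) primary key, -- assigns 1, 2, 3, ... as rows are inserted
    email varchar(255) not null             -- assumed length; match email_address
)
```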

2. Use a WHILE loop to retrieve the batches, using a variable to keep
track of the upper end of the current range of identity values.
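declare @batchsize int --convert this to a parameter for your sproc
set @batchsize=80000

declare @lastrec int, @endrec int
set @lastrec=(select max(identcol) from #batches)
set @endrec=@batchsize
--keep going while the start of the next batch is below the highest identcol
WHILE @endrec-@batchsize<@lastrec
BEGIN
    --current batch: identcol from @endrec-@batchsize+1 through @endrec
    select email from #batches
    where identcol>@endrec-@batchsize and identcol<=@endrec
    --process the batch
    --remove the rows that have just been processed
    delete #batches where identcol<=@endrec
    set @endrec=@endrec + @batchsize
END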
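
Since the batch size is supposed to be a parameter, the temp table and loop
could be wrapped in a procedure roughly like the one below. Treat it as a
sketch under assumptions, not a finished implementation: the procedure name
dbo.usp_GmailBatches and the varchar(255) length are made up for illustration,
while master_table and email_address are the names used in this thread. It
returns one result set per batch and leaves writing the files to the caller.

```sql
-- Hypothetical procedure name; written for SQL Server 2005 syntax.
CREATE PROCEDURE dbo.usp_GmailBatches
    @batchsize int = 80000
AS
BEGIN
    SET NOCOUNT ON;

    -- A batch size below 1 would never advance the loop, so reject it.
    IF @batchsize IS NULL OR @batchsize < 1
    BEGIN
        RAISERROR('@batchsize must be at least 1', 16, 1);
        RETURN;
    END

    CREATE TABLE #batches (
        identcol int IDENTITY(1,1) PRIMARY KEY,
        email    varchar(255) NOT NULL  -- assumed length; match email_address
    );

    -- DISTINCT removes duplicate addresses before they are numbered.
    INSERT #batches (email)
    SELECT DISTINCT email_address
    FROM master_table
    WHERE email_address LIKE '%gmail.com';

    DECLARE @lastrec int, @startrec int;
    SET @lastrec = (SELECT MAX(identcol) FROM #batches); -- NULL when no rows match
    SET @startrec = 1;

    -- When @lastrec is NULL the comparison is not true, so the loop is skipped.
    WHILE @startrec <= @lastrec
    BEGIN
        -- One result set per batch: running count plus address.
        SELECT identcol AS [count], email
        FROM #batches
        WHERE identcol BETWEEN @startrec AND @startrec + @batchsize - 1
        ORDER BY identcol;

        SET @startrec = @startrec + @batchsize;
    END
END
```

Calling it with EXEC dbo.usp_GmailBatches @batchsize = 80000 would hand back
the batches one after another; the last one is simply shorter if the total
isn't an exact multiple of the batch size.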


