Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #49945 > unrolled thread
| Started by | Helmut Jarausch <jarausch@igpm.rwth-aachen.de> |
|---|---|
| First post | 2013-07-05 08:22 +0000 |
| Last post | 2013-07-05 18:42 +0000 |
| Articles | 7 on this page of 27 — 6 participants |
Back to article view | Back to comp.lang.python
How to make this faster Helmut Jarausch <jarausch@igpm.rwth-aachen.de> - 2013-07-05 08:22 +0000
Re: How to make this faster Fábio Santos <fabiosantosart@gmail.com> - 2013-07-05 10:38 +0100
Re: How to make this faster Helmut Jarausch <jarausch@igpm.rwth-aachen.de> - 2013-07-05 10:07 +0000
Re: How to make this faster Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-07-05 11:13 +0100
Re: How to make this faster Helmut Jarausch <jarausch@igpm.rwth-aachen.de> - 2013-07-05 10:53 +0000
Re: How to make this faster Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-05 12:02 +0000
Re: How to make this faster Fábio Santos <fabiosantosart@gmail.com> - 2013-07-05 13:44 +0100
Re: How to make this faster Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-07-05 14:41 +0100
Re: How to make this faster Helmut Jarausch <jarausch@igpm.rwth-aachen.de> - 2013-07-05 14:28 +0000
Re: How to make this faster Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-07-05 15:45 +0100
Re: How to make this faster Helmut Jarausch <jarausch@igpm.rwth-aachen.de> - 2013-07-05 14:48 +0000
Re: How to make this faster Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-07-05 16:26 +0100
Re: How to make this faster Helmut Jarausch <jarausch@igpm.rwth-aachen.de> - 2013-07-05 14:54 +0000
Re: How to make this faster Fábio Santos <fabiosantosart@gmail.com> - 2013-07-05 16:18 +0100
Re: How to make this faster Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-05 16:24 +0000
Re: How to make this faster Helmut Jarausch <jarausch@igpm.rwth-aachen.de> - 2013-07-05 15:17 +0000
Re: How to make this faster Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-07-05 16:38 +0100
Re: How to make this faster MRAB <python@mrabarnett.plus.com> - 2013-07-05 17:25 +0100
Re: How to make this faster Joshua Landau <joshua.landau.ws@gmail.com> - 2013-07-06 02:45 +0100
Re: How to make this faster Helmut Jarausch <jarausch@igpm.rwth-aachen.de> - 2013-07-05 15:47 +0000
Re: How to make this faster Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-05 16:52 +0000
Re: How to make this faster Helmut Jarausch <jarausch@igpm.rwth-aachen.de> - 2013-07-05 16:07 +0000
Re: How to make this faster Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-05 16:50 +0000
Re: How to make this faster Helmut Jarausch <jarausch@igpm.rwth-aachen.de> - 2013-07-05 18:39 +0000
Re: How to make this faster Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-06 03:05 +0000
Re: How to make this faster Helmut Jarausch <jarausch@igpm.rwth-aachen.de> - 2013-07-06 07:25 +0000
Re: How to make this faster Helmut Jarausch <jarausch@igpm.rwth-aachen.de> - 2013-07-05 18:42 +0000
Page 2 of 2 — ← Prev page 1 [2]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-07-05 16:52 +0000 |
| Message-ID | <51d6f9b0$0$29999$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #49997 |
On Fri, 05 Jul 2013 15:47:45 +0000, Helmut Jarausch wrote: > > for r, row_lst in enumerate(Grid): > > for c, val in enumerate(row_lst): > > I assume the creation of the temporary lists "row_list" is a bit > expensive. No temporary list is being created. The pre-existing list is just being grabbed, which is fast. -- Steven
[toc] | [prev] | [next] | [standalone]
| From | Helmut Jarausch <jarausch@igpm.rwth-aachen.de> |
|---|---|
| Date | 2013-07-05 16:07 +0000 |
| Message-ID | <b3o997Fu5lhU2@mid.dfncis.de> |
| In reply to | #49968 |
On Fri, 05 Jul 2013 16:38:43 +0100, Oscar Benjamin wrote:
> On 5 July 2013 16:17, Helmut Jarausch <jarausch@igpm.rwth-aachen.de> wrote:
>>
>> I've tried the following version
>>
>> def find_good_cell() :
>> Best= None
>> minPoss= 10
>> for r,c in Grid :
>> if Grid[(r,c)] > 0 : continue
>
> Sorry, I think what I meant was that you should have a structure
> called e.g. Remaining which is the set of all (r, c) pairs that you
> want to loop over here. Then there's no need to check on each
> iteration whether or not Grid[(r, c)] > 0. When I said "sparse" I
> meant that you don't need to set keys in Grid unless you actually have
> a value there so the test "Grid[(r, c)] > 0" would look like "(r, c)
> in Grid". Remaining is the set of all (r, c) pairs not in Grid that
> you update incrementally with .add() and .remove().
>
> Then this
>
> for r,c in Grid :
> if Grid[(r,c)] > 0 : continue
>
> becomes
>
> for r, c in Remaining:
>
>> Sq_No= (r//3)*3+c//3
>> Possibilities= 9-len(Row_Digits[r] | Col_Digits[c] | Sqr_Digits[Sq_No])
>> if ( Possibilities < minPoss ) :
>> minPoss= Possibilities
>> Best= (r,c)
>>
>> if minPoss == 0 : Best=(-1,-1)
>> return Best
>>
>> All_digits= set((1,2,3,4,5,6,7,8,9))
>
> All_digits= set(range(1, 10))
>
> or
>
> All_digits = {1,2,3,4,5,6,7,8,9}
>
>>
>> def Solve(R_Cells) :
>> if R_Cells == 0 :
>> print("\n\n++++++++++ S o l u t i o n ++++++++++\n")
>> Print_Grid()
>> return True
>>
>> r,c= find_good_cell()
>> if r < 0 : return False
>> Sq_No= (r//3)*3+c//3
>>
>> for d in All_digits - (Row_Digits[r] | Col_Digits[c] | Sqr_Digits[Sq_No]) :
>> # put d into Grid
>> Grid[(r,c)]= d
>> Row_Digits[r].add(d)
>> Col_Digits[c].add(d)
>> Sqr_Digits[Sq_No].add(d)
>>
>> Success= Solve(R_Cells-1)
>>
>> # remove d again
>> Grid[(r,c)]= 0
>> Row_Digits[r].remove(d)
>> Col_Digits[c].remove(d)
>> Sqr_Digits[Sq_No].remove(d)
>>
>> if Success :
>> Zuege.append((d,r,c))
>> return True
>>
>> return False
>>
>> which turns out to be as fast as the previous "dictionary only version".
>> Probably, set.remove is a bit slow
>
> No it's not and you're not using it in your innermost loops anyway.
> Probably the loop I referred to isn't your bottleneck.
>
I've tried your suggestion, but unless I've misunderstood your suggestion it's a bit slower
than the solution above.
The solution above take 0.79 seconds (mean of 100 calls) while the following version
take 1.05 seconds (mean of 100 calls):
Grid = {(i,j):0 for i in range(9) for j in range(9)}
Remaining= {(i,j) for i in range(9) for j in range(9)}
Row_Digits = [set() for r in range(9)]
Col_Digits = [set() for c in range(9)]
Sqr_Digits = [set() for s in range(9)]
.... remove some pairs from Remaining for the initial set of the given Sudoku
def find_good_cell() :
Best= None
minPoss= 10
for r,c in Remaining :
Sq_No= (r//3)*3+c//3
Possibilities= 9-len(Row_Digits[r] | Col_Digits[c] | Sqr_Digits[Sq_No])
if ( Possibilities < minPoss ) :
minPoss= Possibilities
Best= (r,c)
if minPoss == 0 : Best=(-1,-1)
return Best
All_digits= set(range(1,10))
def Solve(R_Cells) :
if R_Cells == 0 :
print("\n\n++++++++++ S o l u t i o n ++++++++++\n")
Print_Grid()
return True
r,c= find_good_cell()
if r < 0 : return False
Sq_No= (r//3)*3+c//3
for d in All_digits - (Row_Digits[r] | Col_Digits[c] | Sqr_Digits[Sq_No]) :
# put d into Grid
Grid[(r,c)]= d
Remaining.remove((r,c))
Row_Digits[r].add(d)
Col_Digits[c].add(d)
Sqr_Digits[Sq_No].add(d)
Success= Solve(R_Cells-1)
# remove d again
Grid[(r,c)]= 0
Remaining.add((r,c))
Row_Digits[r].remove(d)
Col_Digits[c].remove(d)
Sqr_Digits[Sq_No].remove(d)
if Success :
Zuege.append((d,r,c))
return True
return False
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-07-05 16:50 +0000 |
| Message-ID | <51d6f960$0$29999$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #49998 |
On Fri, 05 Jul 2013 16:07:03 +0000, Helmut Jarausch wrote:
> The solution above take 0.79 seconds (mean of 100 calls) while the
> following version take 1.05 seconds (mean of 100 calls):
1) How are you timing the calls?
2) Don't use the mean, that's the wrong statistic when you are measuring
something where the error is always one sided. You want the minimum, not
the mean.
When you measure the time taken for a piece of code, the number you get
is made up of two components:
1) the actual time the code would have taken, if there were no random
fluctuations due to other processes, etc.; and
2) random errors due to switching to other processes, etc.
Both of these are unknown; you only know the total. But obviously the
random errors are always positive. They can never be negative, and you
can never measure a time which is less than the fastest your code could
run.
(If your anti-virus software starts scanning in the middle of the trial,
it can only make your code take more time to run, never less.)
So the only way to minimize the error is to pick the minimum time, not
the average. The average just gives you:
- some unknown "true" time, plus some unknown error, somewhere
between the smallest error and the biggest error;
whereas the minimum gives you:
- some unknown "true" time, plus the smallest error yet seen.
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Helmut Jarausch <jarausch@igpm.rwth-aachen.de> |
|---|---|
| Date | 2013-07-05 18:39 +0000 |
| Message-ID | <b3oi6jFu5lhU3@mid.dfncis.de> |
| In reply to | #50001 |
On Fri, 05 Jul 2013 16:50:41 +0000, Steven D'Aprano wrote: > On Fri, 05 Jul 2013 16:07:03 +0000, Helmut Jarausch wrote: > >> The solution above take 0.79 seconds (mean of 100 calls) while the >> following version take 1.05 seconds (mean of 100 calls): > > 1) How are you timing the calls? I've put the work horse "Solve" into a loop executing 100 times. That's on an otherwise idle Linux machine. > > 2) Don't use the mean, that's the wrong statistic when you are measuring > something where the error is always one sided. You want the minimum, not > the mean. Here you assume time measuring itself is without error - I doubt that. If the timing version, which executes function "Solve" one hundred times, runs about 80-100 seconds without a significant variation, then taking the mean is mathematically correct. I can't take the minimum since I don't measure the time a single call takes. > > When you measure the time taken for a piece of code, the number you get > is made up of two components: > > > 1) the actual time the code would have taken, if there were no random > fluctuations due to other processes, etc.; and > > 2) random errors due to switching to other processes, etc. > > Both of these are unknown; you only know the total. But obviously the > random errors are always positive. They can never be negative, and you > can never measure a time which is less than the fastest your code could > run. > > (If your anti-virus software starts scanning in the middle of the trial, > it can only make your code take more time to run, never less.) > > So the only way to minimize the error is to pick the minimum time, not > the average. The average just gives you: > > - some unknown "true" time, plus some unknown error, somewhere > between the smallest error and the biggest error; > > whereas the minimum gives you: > > - some unknown "true" time, plus the smallest error yet seen.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-07-06 03:05 +0000 |
| Message-ID | <51d7897a$0$29999$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #50007 |
On Fri, 05 Jul 2013 18:39:15 +0000, Helmut Jarausch wrote:
> On Fri, 05 Jul 2013 16:50:41 +0000, Steven D'Aprano wrote:
>
>> On Fri, 05 Jul 2013 16:07:03 +0000, Helmut Jarausch wrote:
>>
>>> The solution above take 0.79 seconds (mean of 100 calls) while the
>>> following version take 1.05 seconds (mean of 100 calls):
>>
>> 1) How are you timing the calls?
>
> I've put the work horse "Solve" into a loop executing 100 times. That's
> on an otherwise idle Linux machine.
That doesn't explain how you time it, only that you have a loop executing
100 times. Are you using time.time, or time.clock? (I trust you're not
measuring times by hand with a stop watch.)
I expect you're probably doing something like this:
start = time.time()
for i in range(100):
solve(data)
t = time.time() - start
print "time taken", t/100
In this case, you're getting times well over a second, so the overhead is
unlikely to be significant. E.g. the time it takes to create range(100)
will be insignificant compared to the time it takes to run solve.
Nevertheless, as a matter of good practice, I recommend that you use the
timeit module, as it's carefully written to minimize the overhead when
timing small code snippets where the overhead is *not* insignificant.
http://docs.python.org/2/library/timeit.html
http://docs.python.org/3/library/timeit.html
For longer running code, like this, you might also like this:
http://code.activestate.com/recipes/577896/
>> 2) Don't use the mean, that's the wrong statistic when you are
>> measuring something where the error is always one sided. You want the
>> minimum, not the mean.
>
> Here you assume time measuring itself is without error - I doubt that.
No, I don't. I may have glossed over all the sources of measurement
error, but they are all *positive* which is the important thing. The time
measured will never be *less* than the actual time taken. So regardless
of the source of measurement error, the way to minimize it is to only
look at the minimum timing, not the mean.
> If the timing version, which executes function "Solve" one hundred
> times, runs about 80-100 seconds without a significant variation, then
> taking the mean is mathematically correct.
If the best you can say is it takes "80-100 seconds", that's pretty
significant variation, of the order of 20%.
In this case, with times of the order of a second per loop, it may be
reasonable to say "in this specific case, the error is too small to care
about", or "I just don't care about the error, since it will be about the
same for different variations of my solve function". But in that case,
why bother looping 100 times?
> I can't take the minimum
> since I don't measure the time a single call takes.
Then perhaps you should.
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Helmut Jarausch <jarausch@igpm.rwth-aachen.de> |
|---|---|
| Date | 2013-07-06 07:25 +0000 |
| Message-ID | <b3pv30Faoc2U1@mid.dfncis.de> |
| In reply to | #50007 |
On Sat, 06 Jul 2013 03:05:30 +0000, Steven D'Aprano wrote: > That doesn't explain how you time it, only that you have a loop executing > 100 times. Are you using time.time, or time.clock? (I trust you're not > measuring times by hand with a stop watch.) > > I expect you're probably doing something like this: > > start = time.time() I have been using time.process_time >> If the timing version, which executes function "Solve" one hundred >> times, runs about 80-100 seconds without a significant variation, then >> taking the mean is mathematically correct. > For longer running code, like this, you might also like this: > http://code.activestate.com/recipes/577896/ Thanks for the pointer. > If the best you can say is it takes "80-100 seconds", that's pretty > significant variation, of the order of 20%. That's not a variation of a SINGLE variant. One variant takes 80 seconds and the other variant to be compared with takes 100 seconds. > > In this case, with times of the order of a second per loop, it may be > reasonable to say "in this specific case, the error is too small to care > about", or "I just don't care about the error, since it will be about the > same for different variations of my solve function". But in that case, > why bother looping 100 times? > > >> I can't take the minimum >> since I don't measure the time a single call takes. > > Then perhaps you should. Many thanks, Helmut
[toc] | [prev] | [next] | [standalone]
| From | Helmut Jarausch <jarausch@igpm.rwth-aachen.de> |
|---|---|
| Date | 2013-07-05 18:42 +0000 |
| Message-ID | <b3oidfFu5lhU4@mid.dfncis.de> |
| In reply to | #49968 |
On Fri, 05 Jul 2013 17:25:54 +0100, MRAB wrote:
> For comparison, here's my solution:
Your solution is very fast, indeed.
It takes 0.04 seconds (mean of 1000 runs) restoring "grid"
in between.
But that's a different algorithm which is IMHO more difficult to understand.
Many thanks,
Helmut
>
> from collections import Counter
>
> problem = '''
> _________
> _____3_85
> __1_2____
> ___5_7___
> __4___1__
> _9_______
> 5______73
> __2_1____
> ____4___9
> '''
>
> # Build the grid.
> digits = "123456789"
>
> grid = []
>
> for row in problem.splitlines():
> if not row:
> continue
>
> new_row = []
>
> for cell in row:
> if cell.isdigit():
> new_row.append({cell})
> else:
> new_row.append(set(digits))
>
> grid.append(new_row)
>
> # Solve the grid.
> changed = True
> while changed:
> changed = False
>
> # Look for cells that contain only one digit.
> for r in range(9):
> for c in range(9):
> if len(grid[r][c]) == 1:
> digit = list(grid[r][c])[0]
>
> # Remove from other cells in same row.
> for c2 in range(9):
> if c2 != c and digit in grid[r][c2]:
> grid[r][c2].remove(digit)
> changed = True
>
> # Remove from other cells in same column.
> for r2 in range(9):
> if r2 != r and digit in grid[r2][c]:
> grid[r2][c].remove(digit)
> changed = True
>
> # Remove from other cells in the same block of 9.
> start_row = r - r % 3
> start_column = c - c % 3
> for r2 in range(start_row, start_row + 3):
> for c2 in range(start_column, start_column + 3):
> if (r2, c2) != (r, c) and digit in grid[r2][c2]:
> grid[r2][c2].remove(digit)
> changed = True
>
> # Look for digits that occur in only one cell in a row.
> for r in range(9):
> counts = Counter()
> for c in range(9):
> counts += Counter(grid[r][c])
>
> unique = {digit for digit, times in counts.items() if times == 1}
>
> for c in range(9):
> if len(grid[r][c]) > 1 and len(grid[r][c] & unique) == 1:
> grid[r][c] &= unique
> changed = True
>
> # Look for digits that occur in only one cell in a column.
> for c in range(9):
> counts = Counter()
> for r in range(9):
> counts += Counter(grid[r][c])
>
> unique = {digit for digit, times in counts.items() if times == 1}
>
> for r in range(9):
> if len(grid[r][c]) > 1 and len(grid[r][c] & unique) == 1:
> grid[r][c] &= unique
> changed = True
>
> # Look for digits that occur in only one cell in a block of 9.
> for start_row in range(0, 9, 3):
> for start_column in range(0, 9, 3):
> counts = Counter()
> for r in range(start_row, start_row + 3):
> for c in range(start_column, start_column + 3):
> counts += Counter(grid[r][c])
>
> unique = {digit for digit, times in counts.items() if times == 1}
>
> for r in range(start_row, start_row + 3):
> for c in range(start_column, start_column + 3):
> if len(grid[r][c]) > 1 and len(grid[r][c] & unique) == 1:
> grid[r][c] &= unique
> changed = True
>
> # Display the solution.
> for row in grid:
> for cell in row:
> print("".join(sorted(cell)), end=" ")
>
> print()
[toc] | [prev] | [standalone]
Page 2 of 2 — ← Prev page 1 [2]
Back to top | Article view | comp.lang.python
csiph-web