| From | aminer <aminer@toto.net> |
|---|---|
| Newsgroups | comp.programming.threads, comp.programming |
| Subject | Re: More information about the benchmarks... |
| Date | 2014-03-18 17:08 -0700 |
| Organization | albasani.net |
| Message-ID | <lgackn$ne5$1@news.albasani.net> (permalink) |
| References | <lgac4c$ma1$1@news.albasani.net> |
Cross-posted to 2 groups.
On 3/18/2014 4:59 PM, aminer wrote:
>
> Hello,
>
> Question:
>
> Amine, even if you don't have a computer with 32 cores, you have done
> some calculations and projected how far all your RWLock variants
> will scale on 32 cores. How were you able to do this?
>
> Answer:
>
> First, as I have said, I made the read and write sections a little
> bit bigger, the execution time inside the read and write sections was
> equal, I took an average number of cache-line transfers, and the
> code inside the reader section was scalable. If you look at the
> source code of my RWLockX, you will notice that I am using the
> following code inside the WLock() method:
>
> ---
> repeat
>   // signal the event object on the reader side, then reset it
>   event2.setEvent;
>   asm pause end;
>   event2.resetEvent;
> until nbr^.nbr=0;
>
> for i:=0 to GetSystemThreadCount-1 do
> begin
>   // spin until this per-thread counter drops to zero
>   while (FCount1^[i].fcount1<>0) do
>   begin
>   end;
> end;
> ---
>
> First, if you are using 32 threads on 32 cores (on an x86), the
> following code will transfer a number of cache lines between cores,
> because it is signaling the portable event object on the reader
> side. This will generate, let's say, 32 cache-line transfers in the
> worst case, and on average we will take for example 16 cache-line
> transfers. Here is the code:
>
> repeat
>   event2.setEvent;
>   asm pause end;
>   event2.resetEvent;
> until nbr^.nbr=0;
>
> The same holds for the following code, which will also generate
> cache-line transfers between cores:
>
> for i:=0 to GetSystemThreadCount-1 do
> begin
>   while (FCount1^[i].fcount1<>0) do
>   begin
>   end;
> end;
>
> So, to be able to project the scalability on 32 cores even if we
> don't have a 32-core computer, we must take for example a scenario
> with 0.1% of writers. Let's say the writers will enter the WLock()
> method 2000 times, so the readers will enter RLock() 2000000
> times. Now, to be able to project with Amdahl's law, we have to
> calculate how much time the serial part will take. First we must
> benchmark the time that the serial part of the writers takes
> without readers, and we must benchmark how much time the
> cache-line transfers take; this part is easy to do with a small
> program. After that we will add the two previous serial parts, and
> this will give us the overall serial part of the Amdahl equation,
> which we will call serial part S. After that we will calculate how
> much time it will take for the readers to enter 20000000 times the
> reader sections, let's call it

I mean 2000000 times.

> parallel part P, and after that we will plug all the results into the
> Amdahl equation, and this will give us the projected scalability of
> my RWLocks on an x86 computer with 32 cores.
>
> Hope you have understood now.
>
> Thank you,
> Amine Moulay Ramdane.

Response:
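The projection described in the post can be sketched in a few lines of Python. This is only a minimal sketch of the arithmetic, not the author's benchmark program: the function names `amdahl_speedup` and `projected_scalability` are hypothetical, and the timings in the usage section are placeholders rather than measured results. It assumes the serial part S is the sum of the writer-only time and the cache-line transfer time, and that the parallel part P is the time the readers spend in the reader sections, as the post describes.

```python
def amdahl_speedup(serial_time, parallel_time, cores):
    """Projected speedup on `cores` cores according to Amdahl's law.

    serial_time   -- time spent in the part that cannot run in parallel (S)
    parallel_time -- time spent in the part that scales with cores (P)
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    total = serial_time + parallel_time
    # The serial part is unchanged; only the parallel part is divided.
    return total / (serial_time + parallel_time / cores)


def projected_scalability(writer_time, transfer_time, reader_time, cores=32):
    """Combine the two benchmarked serial parts into S, then apply Amdahl."""
    serial = writer_time + transfer_time  # S: writers alone + cache-line transfers
    return amdahl_speedup(serial, reader_time, cores)


if __name__ == "__main__":
    # Placeholder timings in seconds (assumptions, not benchmark data):
    # 2000 WLock() calls, their cache-line transfers, 2000000 RLock() calls.
    writer_time = 0.05
    transfer_time = 0.05
    reader_time = 9.9
    speedup = projected_scalability(writer_time, transfer_time, reader_time)
    print(f"projected speedup on 32 cores: {speedup:.2f}x")
```

Replacing the three placeholder timings with values measured by small benchmark programs gives the projected speedup for the chosen writer ratio.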
More information about the benchmarks... aminer <aminer@toto.net> - 2014-03-18 16:59 -0700
Re: More information about the benchmarks... aminer <aminer@toto.net> - 2014-03-18 17:08 -0700
Re: More information about the benchmarks... aminer <aminer@toto.net> - 2014-03-18 17:45 -0700