From: Dave Angel
Date: Fri, 27 Feb 2015 02:30:46 -0500
To: python-list@python.org
Subject: Re: Newbie question about text encoding
Newsgroups: comp.lang.python

On 02/27/2015 12:58 AM, Steven D'Aprano wrote:
> Dave Angel wrote:
>
>> (Although I believe Seymour Cray was quoted as saying that virtual
>> memory is a crock, because "you can't fake what you ain't got.")
>
> If I recall correctly, disk access is about 10000 times slower than RAM, so
> virtual memory is *at least* that much slower than real memory.
>

It's so much more complicated than that, that I hardly know where to
start. I'll describe a generic processor/OS/memory/disk architecture;
there will be huge differences between processor models even from a
single manufacturer.

First, as soon as you add swapping logic to your
processor/memory-system, you theoretically slow it down. And in the
days of that quote, Cray's memory was maybe 50 times as fast as the
memory used by us mortals. So adding swapping logic would have slowed
it down quite substantially, even when it was not swapping. But that
logic is inside the CPU chip these days, and presumably thoroughly
optimized.

Next, statistically, a program uses a small subset of its total program
& data space in its working set, and the working set should reside in
real memory. But when the program greatly increases that working set,
and it approaches the amount of physical memory, then swapping becomes
more frenzied, and we say the program is thrashing. Simple example, try
sorting an array that's about the size of available physical memory.

Next, even physical memory is divided into a few levels of caching,
some on-chip and some off. And the caching is done in what I call
strips, where accessing just one byte causes the whole strip to be
loaded from non-cached memory.
I forget the current size for that, but it's maybe 64 to 256 bytes or
so.

If there are multiple processors (not multicore, but actual separate
processors), then each one has such internal caches, and any writes on
one processor may have to trigger flushes in all the other processors
that happen to have the same strip loaded.

The processor not only prefetches the next few instructions, but
decodes and tentatively executes them, subject to being discarded if a
conditional branch doesn't go the way the processor predicted. So some
instructions execute in zero time, some of the time.

Every address of instruction fetch, or of data fetch or store, goes
through a couple of layers of translation. Segment register plus
offset gives the linear address. Look that up in tables to get the
physical address, and if the table happens not to be in on-chip cache,
swap it in. If the physical address isn't valid, a processor exception
causes the OS to potentially swap something out, and something else in.

Once we're paging from the swapfile, the size of the read is perhaps
4k. And that read happens regardless of whether we're only going to use
one byte or all of it.

The ratio between an access which was in the L1 cache and one which
required a page to be swapped in from disk? Much bigger than your
10,000 figure. But hopefully it doesn't happen a big percentage of the
time.

Many, many other variables, like the fact that RAM chips are not
directly addressable by bytes, but instead count on rows and columns.
So if you access many bytes in the same row, it can be much quicker
than random access. So simple access time specifications don't mean as
much as they would seem; the controller has to balance the RAM spec
with the various cache requirements.

-- 
DaveA
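
To make the address-translation paragraph above a little more concrete,
here is a minimal Python sketch of the table lookup and the page-fault
path. It is a toy model, not how any real processor or OS does it: the
4 KiB page size just echoes the "perhaps 4k" figure, and the names
PAGE_SIZE, PageFault, page_table and translate are all made up for
illustration. Segmentation is skipped, so the input is assumed to be a
linear address already.

```python
PAGE_SIZE = 4096  # assumed page size, echoing the "perhaps 4k" above


class PageFault(Exception):
    """Raised when a page is not resident; a real OS would swap it in."""


def translate(linear_address, page_table):
    """Map a linear address to a physical one using a toy page table.

    page_table is a dict {virtual page number: physical frame number};
    a missing entry stands in for "not present in physical memory".
    """
    page, offset = divmod(linear_address, PAGE_SIZE)
    if page not in page_table:
        raise PageFault(page)
    return page_table[page] * PAGE_SIZE + offset


page_table = {0: 7, 1: 3}            # hypothetical mappings
print(translate(4100, page_table))   # page 1, offset 4 -> frame 3 -> 12292
```

In this model a real OS would catch the PageFault, read the whole page
in from the swapfile, update the table, and retry the access.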
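
The point about the L1-versus-disk ratio, and about it hopefully not
happening a big percentage of the time, can be played with numerically.
The two latencies below are ballpark assumptions of mine (about a
nanosecond for an L1 hit, about ten milliseconds to page in from a
spinning disk), not measurements and not figures from the discussion
above; the names are hypothetical too. Substitute numbers for your own
hardware.

```python
L1_HIT_NS = 1.0            # assumed: roughly one nanosecond for an L1 hit
PAGE_IN_NS = 10_000_000.0  # assumed: roughly 10 ms to page in from disk


def effective_access_ns(fault_rate, hit_ns=L1_HIT_NS, fault_ns=PAGE_IN_NS):
    """Average cost per access when a fraction fault_rate must page in."""
    return (1.0 - fault_rate) * hit_ns + fault_rate * fault_ns


print(PAGE_IN_NS / L1_HIT_NS)   # ratio between the two extremes
for rate in (0.0, 1e-6, 1e-3):
    print(rate, effective_access_ns(rate))
```

With these assumed figures, even one fault per million accesses makes
the average access roughly eleven times slower than a pure L1 hit,
which is why thrashing hurts so much.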