Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.pascal.misc > #2763

More about the energy efficiency of Transactional memory and more..

From World90 <toto@toto.toto>
Newsgroups comp.lang.pascal.misc
Subject More about the energy efficiency of Transactional memory and more..
Date 2021-02-13 10:03 -0800
Organization A noiseless patient Spider
Message-ID <s0948l$8ph$5@dont-email.me> (permalink)

Show all headers | View raw


Hello,


More about the energy efficiency of Transactional memory and more..

I have just read the following PhD paper, it is also about energy 
efficiency of Transactional memory, here it is:

Techniques for Enhancing the Efficiency of Transactional Memory Systems

http://kth.diva-portal.org/smash/get/diva2:1258335/FULLTEXT02.pdf

And i think it is the best known energy efficient algorithm for
Transactional memory, but i think it is not good, since
look at how for 64 cores the Beta parameter can be 16 cores,
so i think i am smart and i have just invented a much more energy 
efficient and powerful scalable fast Mutex and i have also just invented 
scalable RWLocks that are starvation-free and fair, read about them in 
my below writing and thoughts:

More about deadlocks and lock-based systems and more..

I have just read the following from an software engineer from Quebec Canada:

A deadlock-detecting mutex

https://faouellet.github.io/ddmutex/

And i have just understood rapidly his algorithm, but i think
his algorithm is not efficient at all, since we can find
if a graph has a strongly connected component in around a time 
complexity O(V+E), so then the algorithm above of the engineer from 
Quebec Canada takes around a time complexity of O(n*(V+E)), so it is not 
good.

So a much better way is to use my following way of detecting deadlocks:

DelphiConcurrent and FreepascalConcurrent are here

Read more here in my website:

https://sites.google.com/site/scalable68/delphiconcurrent-and-freepascalconcurrent

And i will soon enhance much more DelphiConcurrent and 
FreepascalConcurrent to support both Communication deadlocks
and Resource deadlocks.

About Transactional memory and locks..


I have just read the following paper about Transactional memory and locks:

http://sunnydhillon.net/assets/docs/concurrency-tm.pdf


I don't agree with the above paper, since read my following thoughts
to understand:

I have just invented a new powerful scalable fast mutex, and it has the 
following characteristics:

1- Starvation-free
2- Tunable fairness
3- It keeps efficiently and very low its cache coherence traffic
4- Very good fast path performance
5- And it has a good preemption tolerance.
6- It is faster than scalable MCS lock
7- It solves the problem of lock convoying

So my new invention also solves the following problem:

The convoy phenomenon

https://blog.acolyer.org/2019/07/01/the-convoy-phenomenon/


And here is my other new invention of a Scalable RWLock that works 
across processes and threads that is starvation-free and fair and i will 
soon enhance it much more and it will become really powerful:

https://sites.google.com/site/scalable68/scalable-rwlock-that-works-accross-processes-and-threads

And about Lock-free versus Lock, read my following post:

https://groups.google.com/forum/#!topic/comp.programming.threads/F_cF4ft1Qic

And about deadlocks, here is also how i have solved it, and i will soon 
enhance much more DelphiConcurrent and FreepacalConcurrent:

DelphiConcurrent and FreepascalConcurrent are here

Read more here in my website:

https://sites.google.com/site/scalable68/delphiconcurrent-and-freepascalconcurrent


So i think with my above scalable fast mutex and my scalable RWLocks
that are starvation-free and fair and by reading the following about 
composability of lock-based systems, you will notice that lock-based 
systems are still useful.


"About composability of lock-based systems..


Design your systems to be composable. Among the more galling claims of
the detractors of lock-based systems is the notion that they are somehow
uncomposable: “Locks and condition variables do not support modular
programming,” reads one typically brazen claim, “building large programs
by gluing together smaller programs[:] locks make this impossible.”9 The
claim, of course, is incorrect. For evidence one need only point at the
composition of lock-based systems such as databases and operating
systems into larger systems that remain entirely unaware of lower-level
locking.

There are two ways to make lock-based systems completely composable, and
each has its own place. First (and most obviously), one can make locking
entirely internal to the subsystem. For example, in concurrent operating
systems, control never returns to user level with in-kernel locks held;
the locks used to implement the system itself are entirely behind the
system call interface that constitutes the interface to the system. More
generally, this model can work whenever a crisp interface exists between
software components: as long as control flow is never returned to the
caller with locks held, the subsystem will remain composable.

Second (and perhaps counterintuitively), one can achieve concurrency and
composability by having no locks whatsoever. In this case, there must be
no global subsystem state—subsystem state must be captured in
per-instance state, and it must be up to consumers of the subsystem to
assure that they do not access their instance in parallel. By leaving
locking up to the client of the subsystem, the subsystem itself can be
used concurrently by different subsystems and in different contexts. A
concrete example of this is the AVL tree implementation used extensively
in the Solaris kernel. As with any balanced binary tree, the
implementation is sufficiently complex to merit componentization, but by
not having any global state, the implementation may be used concurrently
by disjoint subsystems—the only constraint is that manipulation of a
single AVL tree instance must be serialized."

Read more here:

https://queue.acm.org/detail.cfm?id=1454462


About mathematics and about abstraction..


I think my specialization is also that i have invented many software 
algorithms and software scalable algorithms and i am still inventing 
other software scalable algorithms and algorithms, those scalable 
algorithms and algorithms that i have invented are like inventing 
mathematical theorems that you prove and present in a higher level 
abstraction,  but not only that but those algorithms and scalable 
algorithms of mine are presented in a form of higher level software 
abstraction that abstract the complexity of my scalable algorithms and 
algorithms, it is the most important part that interests me, for example 
notice how i am constructing higher level abstraction in my following 
tutorial as methodology that, first, permits to model the 
synchronization objects of parallel programs with logic primitives with 
If-Then-OR-AND so that to make it easy to translate to Petri nets so 
that to detect deadlocks in parallel programs, please take a look at it 
in my following web link because this tutorial of mine is the way of 
learning by higher level abstraction:


How to analyse parallel applications with Petri Nets


https://sites.google.com/site/scalable68/how-to-analyse-parallel-applications-with-petri-nets

So notice that my methodology is a generalization that solves 
communication deadlocks and resource deadlocks in parallel programs.

1- Communication deadlocks that result from incorrect use of
    event objects or condition variables (i.e. wait-notify
    synchronization).


2- Resource deadlocks, a common kind of deadlock in which a set of
    threads blocks forever because each thread in the set is waiting to
    acquire a lock held by another thread in the set.


This is what interests me in mathematics, i want to work efficiently in 
mathematics in a much higher level of abstraction, i give you
an example of what i am doing in mathematics so that you understand,
look at how i am implementing mathematics as a software parallel 
conjugate gradient system solvers that scale very well, and i am 
presenting them in a higher level of abstraction, this is how i am 
abstracting the mathematics of them,  read the following about it to 
notice it:

About SOR and Conjugate gradient mathematical methods..

I have just looked at SOR(Successive Overrelaxation Method),
and i think it is much less powerful than Conjugate gradient method,
read the following to notice it:

COMPARATIVE PERFORMANCE OF THE CONJUGATE GRADIENT AND SOR METHODS
FOR COMPUTATIONAL THERMAL HYDRAULICS

https://inis.iaea.org/collection/NCLCollectionStore/_Public/19/055/19055644.pdf?r=1&r=1


This is why i have implemented in both C++ and Delphi my Parallel 
Conjugate Gradient Linear System Solver Library that scales very well, 
read my following thoughts about it to understand more:


About the convergence properties of the conjugate gradient method

The conjugate gradient method can theoretically be viewed as a direct 
method, as it produces the exact solution after a finite number of 
iterations, which is not larger than the size of the matrix, in the 
absence of round-off error. However, the conjugate gradient method is 
unstable with respect to even small perturbations, e.g., most directions 
are not in practice conjugate, and the exact solution is never obtained. 
Fortunately, the conjugate gradient method can be used as an iterative 
method as it provides monotonically improving approximations to the 
exact solution, which may reach the required tolerance after a 
relatively small (compared to the problem size) number of iterations. 
The improvement is typically linear and its speed is determined by the 
condition number κ(A) of the system matrix A: the larger is κ(A), the 
slower the improvement.

Read more here:

http://pages.stat.wisc.edu/~wahba/stat860public/pdf1/cj.pdf


So i think my Conjugate Gradient Linear System Solver Library
that scales very well is still very useful, read about it
in my writing below:

Read the following interesting news:

The finite element method finds its place in games

Read more here:

https://translate.google.com/translate?hl=en&sl=auto&tl=en&u=https%3A%2F%2Fhpc.developpez.com%2Factu%2F288260%2FLa-methode-des-elements-finis-trouve-sa-place-dans-les-jeux-AMD-propose-la-bibliotheque-FEMFX-pour-une-simulation-en-temps-reel-des-deformations%2F

But you have to be aware that finite element method uses Conjugate 
Gradient Method for Solution of Finite Element Problems, read here to 
notice it:

Conjugate Gradient Method for Solution of Large Finite Element Problems 
on CPU and GPU

https://pdfs.semanticscholar.org/1f4c/f080ee622aa02623b35eda947fbc169b199d.pdf


This is why i have also designed and implemented my Parallel Conjugate 
Gradient Linear System Solver library that scales very well,
here it is:

My Parallel C++ Conjugate Gradient Linear System Solver Library
that scales very well version 1.76 is here..

Author: Amine Moulay Ramdane

Description:

This library contains a Parallel implementation of Conjugate Gradient 
Dense Linear System Solver library that is NUMA-aware and cache-aware 
that scales very well, and it contains also a Parallel implementation of 
Conjugate Gradient Sparse Linear System Solver library that is 
cache-aware that scales very well.

Sparse linear system solvers are ubiquitous in high performance 
computing (HPC) and often are the most computational intensive parts in 
scientific computing codes.  A few of the many applications relying on 
sparse linear solvers include fusion energy simulation, space weather 
simulation, climate modeling, and environmental modeling, and finite 
element method, and large-scale reservoir simulations to enhance oil 
recovery by the oil and gas industry.

Conjugate Gradient is known to converge to the exact solution in n steps 
for a matrix of size n, and was historically first seen as a direct 
method because of this. However, after a while people figured out that 
it works really well if you just stop the iteration much earlier - often 
you will get a very good approximation after much fewer than n steps. In 
fact, we can analyze how fast Conjugate gradient converges. The end 
result is that Conjugate gradient is used as an iterative method for 
large linear systems today.

Please download the zip file and read the readme file inside the zip to 
know how to use it.

You can download it from:

https://sites.google.com/site/scalable68/scalable-parallel-c-conjugate-gradient-linear-system-solver-library

Language: GNU C++ and Visual C++ and C++Builder

Operating Systems: Windows, Linux, Unix and Mac OS X on (x86)

-- 

As you have noticed i have just written above about my Parallel C++ 
Conjugate Gradient Linear System Solver Library that scales very well, 
but here is my Parallel Delphi and Freepascal Conjugate Gradient Linear 
System Solvers Libraries that scale very well:

Parallel implementation of Conjugate Gradient Dense Linear System solver 
library that is NUMA-aware and cache-aware that scales very well

https://sites.google.com/site/scalable68/scalable-parallel-implementation-of-conjugate-gradient-dense-linear-system-solver-library-that-is-numa-aware-and-cache-aware

PARALLEL IMPLEMENTATION OF CONJUGATE GRADIENT SPARSE LINEAR SYSTEM 
SOLVER LIBRARY THAT SCALES VERY WELL

https://sites.google.com/site/scalable68/scalable-parallel-implementation-of-conjugate-gradient-sparse-linear-system-solver


Thank you,
Amine Moulay Ramdane.

Back to comp.lang.pascal.misc | Previous | Next | Find similar


Thread

More about the energy efficiency of Transactional memory and more.. World90 <toto@toto.toto> - 2021-02-13 10:03 -0800

csiph-web