Programming language similarity

Started by	Derek Jones <derek@NOSPAM-knosof.co.uk>
First post	2022-04-25 00:00 +0100
Last post	2022-04-25 12:06 -0400
Articles	9 — 5 participants

  Programming language similarity Derek Jones <derek@NOSPAM-knosof.co.uk> - 2022-04-25 00:00 +0100
    Re: Programming language similarity Derek Jones <derek@NOSPAM-knosof.co.uk> - 2022-04-25 08:59 +0100
    Re: Programming language similarity Fernando <pronesto@gmail.com> - 2022-04-25 04:24 -0700
      Re: Programming language similarity Derek Jones <derek@NOSPAM-knosof.co.uk> - 2022-04-25 19:35 +0100
    Re: Programming language similarity Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-04-25 06:00 -0700
      Re: Programming language similarity Derek Jones <derek@NOSPAM-knosof.co.uk> - 2022-04-25 20:51 +0100
        Re: Programming language similarity gah4 <gah4@u.washington.edu> - 2022-04-25 14:58 -0700
          Re: Programming language similarity Derek Jones <derek@NOSPAM-knosof.co.uk> - 2022-04-26 00:50 +0100
    Re: Programming language similarity Meshach Mitchell <meshach.mitchell@gmail.com> - 2022-04-25 12:06 -0400

#2978 — Programming language similarity

From	Derek Jones <derek@NOSPAM-knosof.co.uk>
Date	2022-04-25 00:00 +0100
Subject	Programming language similarity
Message-ID	<22-04-012@comp.compilers>

All,

There has been remarkably little work that tries to measure
programming language similarity.

Yes, there are many multi-language runtime benchmark comparisons, and
people extract data from Wikipedia to make dubious claims.

Does anybody know of other kinds of attempts at measuring language
similarity?

Here is one approach
https://shape-of-code.com/2022/04/24/programming-language-similarity-based-on-their-traits/
[That seems awfully simplistic.  Fortran and PL/I both have FORMAT statements that look
superficially similar but the semantics are very different. -John]

#2979

From	Derek Jones <derek@NOSPAM-knosof.co.uk>
Date	2022-04-25 08:59 +0100
Message-ID	<22-04-013@comp.compilers>
In reply to	#2978

John,

> https://shape-of-code.com/2022/04/24/programming-language-similarity-based-on-their-traits/
> [That seems awfully simplistic.  Fortran and PL/I both have FORMAT statements that look
> superficially similar but the semantics are very different. -John]

Many keywords have different meanings, e.g., the do keyword in Fortran/C.

Even binary operators differ, e.g., binary plus used for string concatenation.

The blog post uses a token based approach, which does not require
lots of time to gather the data.

A semantics based approach requires lots of head scratching. I made a
start by collecting information on function definitions (mostly forms
of argument passing). The semantic traits I looked at tended to have a
small number of characteristics, so some form of aggregating is needed
to create significant differences.
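
A minimal sketch of what a token-based comparison could look like, assuming each language is reduced to its set of keywords and that Jaccard distance is the measure. This is not necessarily the method used in the blog post; the keyword sets are hypothetical and heavily abbreviated, and `jaccard_distance` and `distance_table` are illustrative names.

```python
# Hypothetical, heavily abbreviated keyword sets (assumption: real languages
# have many more keywords than are listed here).
KEYWORDS = {
    "c": {"if", "else", "while", "for", "do", "return", "struct", "switch"},
    "java": {"if", "else", "while", "for", "do", "return", "class", "switch"},
    "fortran": {"if", "else", "do", "return", "subroutine", "function"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|: 0 for identical sets, 1 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_table(keywords):
    """Map each unordered pair of language names to its keyword distance."""
    names = sorted(keywords)
    return {
        (x, y): jaccard_distance(keywords[x], keywords[y])
        for x in names
        for y in names
        if x < y
    }
```

A table like this can be fed to a hierarchical clustering routine to draw a tree. Note that it treats do in C and do in Fortran as the same token, which is exactly the limitation discussed above.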

#2980

From	Fernando <pronesto@gmail.com>
Date	2022-04-25 04:24 -0700
Message-ID	<22-04-014@comp.compilers>
In reply to	#2978

Hi Derek,

Your repository is very nice! Can I use the "language info" part in the class
on programming language paradigms? It will be nice to give students some idea
about the number of keywords in different programming languages, for
instance.

By the way, perhaps you should also consider comparing the languages with
regard to the static and the dynamic aspects of their type systems, e.g.:
typing discipline (static, dynamic, gradual?), type verification (inference,
annotations, mixed?), type enforcement (weak, strong), static type equivalence
(nominal, structural, mixed?), etc. That might lead to very different trees.
For instance, in your keyword tree, Java and JavaScript are close, but they
are very different semantically.

> Does anybody know of other kinds of attempts at measuring language
> similarity?

About that: I don't know of other studies. There is the article on Wikipedia
(Programming Languages Comparison), but it does not cite a paper with a
comparative study.

Regards,

Fernando

#2984

From	Derek Jones <derek@NOSPAM-knosof.co.uk>
Date	2022-04-25 19:35 +0100
Message-ID	<22-04-018@comp.compilers>
In reply to	#2980

Fernando,

> Your repository is very nice! Can I use the "language info" part in the class
> on programming language paradigms? It will be nice to give students some idea

Please do.  The code is under a GPL license.

> about the number of keywords in different programming languages, for
> instance.

I was surprised by the diversity of words used.

> By the way, perhaps you should consider also comparing the languages with
> regards to the static and the dynamic aspects of their type systems, e.g.:
> typing discipline (static, dynamic, gradual?), type verification (inference,
> annotations, mixed?), type enforcement (weak, strong), static type equivalence
> (nominal, structural, mixed?), etc. That might lead to very different trees.

I looked into building a tree based on allowed implicit type conversions,
with the hope of coming up with a measure of strong/weak typing.

A list of implicit conversions performed by a language seems like a
good start. But this approach makes Fortran 77 look like it's strongly
typed; there are fewer implicit conversions than other languages
because it supports fewer types, e.g., no enums or pointers. C's
relatively large number of integer types, and the corresponding
implicit conversions, make it look weakly typed compared to languages
with fewer integer types (and hence fewer implicit conversions).
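
A small sketch of the bias described above, using made-up toy data (the type names and conversion lists are hypothetical, not taken from any language specification). It contrasts the raw count of implicit conversions with the count divided by the number of possible ordered type pairs; that normalisation is just one assumption about how the type-count effect might be factored out.

```python
# Hypothetical toy data: each entry lists a language's types and the ordered
# (from, to) pairs for which an implicit conversion is permitted.
LANGS = {
    "few_types": {
        "types": ["integer", "real"],
        "implicit": {("integer", "real"), ("real", "integer")},
    },
    "many_ints": {
        "types": ["char", "short", "int", "long", "double"],
        "implicit": {
            ("char", "short"), ("char", "int"), ("short", "int"),
            ("int", "long"), ("int", "double"), ("long", "double"),
        },
    },
}


def raw_count(lang):
    """Number of implicit conversions: grows with the number of types."""
    return len(lang["implicit"])


def density(lang):
    """Fraction of ordered type pairs that have an implicit conversion."""
    n = len(lang["types"])
    possible = n * (n - 1)
    return len(lang["implicit"]) / possible if possible else 0.0
```

On this toy data the raw count makes few_types look the more strongly typed of the two, while the density ranks them the other way round.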

The characteristics you list might be combined in some
meaningful way, such that a type 'distance' tree could be constructed.
Lots of careful reading of language specifications would be needed to
figure out the details.

> About that: I don't know of other studies. There is the article on Wikipedia
> (Programming Languages Comparison), but it does not cite a paper with a
> comparative study.

Some of the Yes/No classifications on this page are somewhat surprising
(at least to me)
https://en.wikipedia.org/wiki/Comparison_of_programming_languages

#2982

From	Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Date	2022-04-25 06:00 -0700
Message-ID	<22-04-016@comp.compilers>
In reply to	#2978

On Monday, April 25, 2022 at 4:49:03 AM UTC+2, Derek Jones wrote:
> All,
>
> There has been remarkably little work that tries to measure
> programming language similarity.
>
> Yes, there are many multi-language runtime benchmark comparisons, and
> people extract data from Wikipedia to make dubious claims.
>
> Does anybody know of other kinds of attempts at measuring language
> similarity? ...

Just some "food for thought" on a conceptually similar topic:

Denis Roegel: A brief survey of 20th century logical notations (https://hal.inria.fr/hal-02340520/document)

-atom

#2985

From	Derek Jones <derek@NOSPAM-knosof.co.uk>
Date	2022-04-25 20:51 +0100
Message-ID	<22-04-019@comp.compilers>
In reply to	#2982

Jan,

> Denis Roegel: A brief survey of 20th century logical notations (https://hal.inria.fr/hal-02340520/document)

This is an interesting collection of decisions made by authors
over 120 years.

What makes somebody choose a particular set of symbols?
My guess is that their past experience is a major factor,
i.e., the use of symbols they had previously been exposed to.

Of course it could be something as mundane as the characters
available on their typewriter, or to the printer of the journal
the work was published in.

Then again, academics do love to do their own thing.  Perhaps
the decisions are based on the need to be different.

#2986

From	gah4 <gah4@u.washington.edu>
Date	2022-04-25 14:58 -0700
Message-ID	<22-04-020@comp.compilers>
In reply to	#2985

On Monday, April 25, 2022 at 1:54:58 PM UTC-7, Derek Jones wrote:

(snip)

> What makes somebody choose a particular set of symbols?
> My guess is that their past experience is a major factor,
> i.e., the use of symbols they had previously been exposed to.

Early Fortran was limited by the number of characters available
on the IBM 026 keypunch.  They redefined some of the punch
codes with different symbols for scientific use, as that was
easier than designing a whole new machine.

Much of that was then fixed with EBCDIC in S/360, where
an 8-bit code allowed, and pretty much required, that the
symbols sharing a punch code be separated. In any case, the
characters (with new punches) were kept.  (And new compilers
have an option to accept the old punch codes.)

I do remember punching ALGOL programs on the 026, where
you had to use the multipunch key, along with big charts on
the wall, to get the needed characters.

In any case, character set limitations stay with us long after
the reason for the limitation has gone.

#2988

From	Derek Jones <derek@NOSPAM-knosof.co.uk>
Date	2022-04-26 00:50 +0100
Message-ID	<22-04-022@comp.compilers>
In reply to	#2986

gah4,

> In any case, character set limitations stay with us long after
> the reason for the limitation has gone.

More than you probably wanted to know about character set
history still being with us
https://archive.org/details/mackenzie-coded-char-sets

#2983

From	Meshach Mitchell <meshach.mitchell@gmail.com>
Date	2022-04-25 12:06 -0400
Message-ID	<22-04-017@comp.compilers>
In reply to	#2978

I could see how that could be interesting as an academic pursuit, but I
think the dearth of exploration here is most likely because pretty much
anyone in a position to do that already knows that every Turing-complete
language is equivalent. The comparison, therefore, would be a comparison of
placement of syntactic sugar. I have trouble visualizing a real-world use
for such a comparison, by which I mean, what is the problem that I would be
able to solve by knowing which languages are similar? In the current
environment, anywhere you would work already has a whole tech stack
mapped out.

I have actually thought about this, and vaguely remember looking up
articles on the subject. The article you linked is interesting, but I agree
with your analysis; semantic similarity has some value but IMO what really
matters is "supported patterns", i.e., what a language provides "for free".
Now, TINSTAAFL, so there is no real "free", but there is some optimization
done by a language [compiler, interpreter] to support statements
represented in the grammar. An example that comes to mind is in JavaScript
(I know, I *know*, but I have a family, and we need to eat). Early
implementations of async in JS used the *Promise* object to implement
asynchronous execution, but newer versions of the language use *async* and
*await* keywords. The former piggy-backs on the existing OO architecture,
while the latter, implemented as keywords, is available to lower level
abstraction and optimization.
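
To make the contrast concrete, here is a rough sketch in Python rather than JavaScript, on the assumption that asyncio's task callbacks are a fair stand-in for the object-based style and its async/await keywords for the keyword style. `fetch_value`, `callback_style` and `await_style` are made-up names for illustration.

```python
import asyncio


async def fetch_value():
    """Stand-in for some asynchronous operation (hypothetical)."""
    await asyncio.sleep(0)
    return 42


async def callback_style():
    """Object-based style: attach a callback to a task object."""
    results = []
    task = asyncio.create_task(fetch_value())
    # The continuation is an ordinary function registered on the object.
    task.add_done_callback(lambda t: results.append(t.result() + 1))
    await task
    await asyncio.sleep(0)  # give any pending callback a chance to run
    return results[0]


async def await_style():
    """Keyword-based style: the continuation is the rest of the function."""
    return (await fetch_value()) + 1
```

Both functions compute the same value. The difference is that in the second the language grammar, rather than a library object, expresses where execution suspends and resumes.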

We've been doing this long enough that a number of "higher level" patterns
have emerged. The aforementioned asynchronous (threaded, maybe?) execution
is one. *Events* also come to mind, which are generally implemented as good
old-fashioned polling under the hood or function registration and
hash-lookup. What is actually happening in the machine translates to vastly
different computation cost, and seems to me to be non-trivial. I think a
meaningful categorization could be done based on this idea of language
"provisions" over language semantics, and some deeper analysis of how
exactly a language [compiler, interpreter] implements what necessarily
boils down to syntactic sugar.
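
As a rough illustration of the two event mechanisms mentioned above, here is a sketch in Python of function registration with hash lookup next to plain polling. `EventBus` and `poll_until_ready` are hypothetical names, not taken from any particular language runtime.

```python
from collections import defaultdict


class EventBus:
    """Function registration: handlers are stored in a hash table by event name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, name, handler):
        """Register a handler for the named event."""
        self._handlers[name].append(handler)

    def emit(self, name, payload):
        """Look up the handlers by name, call each one, return how many ran."""
        handlers = self._handlers.get(name, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


def poll_until_ready(is_ready, max_checks):
    """Polling: call is_ready() repeatedly; return the number of checks made
    when it first returns True, or None if max_checks is exhausted."""
    for checks in range(1, max_checks + 1):
        if is_ready():
            return checks
    return None
```

With registration the cost is one hash lookup per emitted event, while with polling the cost depends on how many checks are made before the condition becomes true.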

To answer your actual question, No, I don't know of other attempts, but I
can understand the scarcity. Hope my thoughts have some value.

-- Meshach Mitchell

On Sun, Apr 24, 2022 at 10:49 PM Derek Jones <derek@nospam-knosof.co.uk>
wrote:

> All,
>
> There has been remarkably little work that tries to measure
> programming language similarity.
>
> Yes, there are many multi-language runtime benchmark comparisons, and
> people extract data from Wikipedia to make dubious claims.
>
> Does anybody know of other kinds of attempts at measuring language
> similarity?
