From: Joshua Cranmer <Pidgeot18@verizon.invalid>
Newsgroups: alt.comp.lang.learn.c-c++,comp.lang.java.programmer
Subject: Re: How to get from A to B (actually, from type "A" to type "B")
Date: Sun, 06 Jan 2013 19:12:59 -0600
Organization: A noiseless patient Spider
Lines: 93
Message-ID: <kcd7ev$tnk$1@dont-email.me>
References: <b1a4c163-c843-4d36-b9bd-f504dd5bc4e6@d10g2000yqe.googlegroups.com>

On 1/6/2013 1:22 PM, Ramon F. Herrera wrote:
> I had been using for a long time this (from Boost::Filesystem):
>
>      string somestring = "abc/de";
>      path p = path(somestring);
>
> Only to realize, accidentally, that the conversion is done
> automatically. The IDE should help you in those cases:
>
>      path p = somestring;

This is mainly an issue, in C++, of detecting whether or not an explicit 
conversion is necessary. Note, however, that many style guides tend to 
prefer explicit conversions over implicit ones.

> This one made me kick myself. I used this many, many times:
>
>      const char* sometext = somestring.string().c_str();
>
> Well, it turns out that this one is just as good:
>
>      const char* sometext = somestring.c_str();
>
> My question is about R&D done in this particular field. I tried Google
> but the word "type" is too ambiguous.

There was actually a project by Google using Clang that automatically 
eliminated instances where std::string and const char* interconversion 
was being unnecessarily performed. Note that this is a reason why 
implicit conversion is frowned upon by style guides. :-)

> This problem is very similar to the resolution of Rubik's Cube. Your
> expression is in some "scrambled" state and you need the computer to
> tell you -not only any path! mind you- but the shortest path (known as
> God's algorithm) to the desired type.

No, the algorithm you're looking for is breadth-first search (BFS) on a 
directed graph, as taught in any introductory algorithms class.
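
As a rough illustration of the traversal part only, here is a minimal 
sketch in Java that treats types as nodes and conversions as directed 
edges. The class ConversionGraph, its methods, and the way edges are 
labeled are all hypothetical, made up for this example; the three edges 
in main() are the Boost.Filesystem conversions quoted above, entered by 
hand.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: types are nodes, conversions are directed edges.
class ConversionGraph {
    // One directed edge: applying `via` yields a value of type `target`.
    static final class Edge {
        final String target;
        final String via;
        Edge(String target, String via) { this.target = target; this.via = via; }
    }

    private final Map<String, List<Edge>> edges = new HashMap<>();

    void addConversion(String from, String to, String via) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, via));
    }

    // Breadth-first search. Returns the labels of a shortest chain of
    // conversions from `source` to `goal`; the list is empty if the goal
    // is unreachable or if source equals goal.
    List<String> shortestPath(String source, String goal) {
        Map<String, String> parent = new HashMap<>(); // type -> predecessor type
        Map<String, Edge> cameBy = new HashMap<>();   // type -> edge used to reach it
        Queue<String> queue = new ArrayDeque<>();
        parent.put(source, null);                     // marks source as visited
        queue.add(source);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(goal)) break;
            for (Edge e : edges.getOrDefault(current, Collections.emptyList())) {
                if (!parent.containsKey(e.target)) {
                    parent.put(e.target, current);
                    cameBy.put(e.target, e);
                    queue.add(e.target);
                }
            }
        }
        List<String> steps = new ArrayList<>();
        if (!parent.containsKey(goal)) return steps;
        // Walk back from the goal to the source, then reverse.
        for (String t = goal; parent.get(t) != null; t = parent.get(t)) {
            steps.add(cameBy.get(t).via);
        }
        Collections.reverse(steps);
        return steps;
    }

    public static void main(String[] args) {
        ConversionGraph g = new ConversionGraph();
        g.addConversion("path", "string", "string()");
        g.addConversion("string", "const char*", "c_str()");
        g.addConversion("path", "const char*", "c_str()");
        System.out.println(g.shortestPath("path", "const char*")); // prints [c_str()]
    }
}
```

With the direct path -> const char* edge present, BFS finds the 
one-step route rather than the two-step one through string.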

The problem is not doing graph traversal, but actually computing the 
graph in the first place: you are basically asking people to solve a 
very hard AI problem of inferring intent, and this can be difficult even 
for humans with very good documentation. Let's use your filesystem 
example to motivate why it's hard.

Suppose you have a file class like so:

abstract class File {
   public abstract String getAbsolutePath(); // Removes . and ..
   public abstract String getCanonicalPath(); // Resolve symlinks
   public abstract String getFilename();
   public abstract String getExtension();
   public abstract int getSize();
   public abstract int getPermissions(); // Unix-style octal permissions
   public abstract int getInodeNumber(); // Unique ID within a Unix filesystem
   public abstract int open(); // Returns file descriptor number
}

What should you return if you want to query File -> String? A human 
responder would probably say one of the first two, but there are times 
to prefer one over the other (it depends on what you are doing!). 
Automated analysis would have to either require the human to annotate 
all the conversion methods or use heuristics to guess. The irony is that 
getExtension() is probably the simplest method of the lot, so heuristics 
based on implementation complexity will probably fail here.
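
To make the ambiguity concrete, here is a small Java sketch of what a 
naive, purely signature-based tool might do: use reflection to list 
every public zero-argument method with the requested return type. The 
class ConversionCandidates and its candidates() helper are hypothetical, 
and the nested File is only a stand-in mirroring the class above.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ConversionCandidates {
    // Hypothetical stand-in mirroring the File class from the post.
    static abstract class File {
        public abstract String getAbsolutePath();
        public abstract String getCanonicalPath();
        public abstract String getFilename();
        public abstract String getExtension();
        public abstract int getSize();
        public abstract int getPermissions();
        public abstract int getInodeNumber();
        public abstract int open();
    }

    // Names of all public zero-argument methods declared on `from` whose
    // return type is exactly `to`. Sorted, because reflection does not
    // guarantee any particular order.
    static List<String> candidates(Class<?> from, Class<?> to) {
        List<String> names = new ArrayList<>();
        for (Method m : from.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers())
                    && m.getParameterCount() == 0
                    && m.getReturnType() == to) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Four equally "valid" answers for each query.
        System.out.println(candidates(File.class, String.class));
        System.out.println(candidates(File.class, int.class));
    }
}
```

The signatures alone give four candidates for File -> String and four 
for File -> int, with nothing to rank them; that ranking is exactly the 
part that needs annotations or heuristics.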

Now suppose you queried File -> int. This, to most humans, is probably a 
nonsensical request, but on Unix systems, you might want to get file 
descriptor numbers. This would necessitate opening the file, which is a 
stateful request. Supporting this kind of query would almost certainly 
render a tool useless due to false positives.

If we look at our classic friend, in C and C++, const char *, note that 
there tend to be about 4 distinct semantic types that this type refers 
to. They are the following:
1. Raw binary data
2. Pure ASCII data, so it should only contain \x01-\x7f.
3. Native platform charset (what you can, e.g., pass into filesystem APIs)
4. Proper UTF-8

Sometimes, functions don't care which of these semantic types is in use, 
assuming they're all null-terminated anyways (C's strchr is an example). 
Sometimes, though, it matters hugely which definition is in use 
(converting to/from UTF-16). The answer I as a human would give for 
conversion functions depends heavily on context.
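
The same effect is easy to show in Java, where the charset must be named 
when turning bytes into a String. In this minimal sketch (the class 
SemanticBytes and its helper are hypothetical), the same two bytes are 
one character when read as UTF-8 and two characters when read as 
ISO-8859-1, so a "bytes to string" conversion has no single right answer 
without knowing which semantic type the bytes carry.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SemanticBytes {
    // Number of UTF-16 code units obtained by decoding `data` as `cs`.
    static int decodedLength(byte[] data, Charset cs) {
        return new String(data, cs).length();
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] data = { (byte) 0xC3, (byte) 0xA9 };

        // Interpreted as UTF-8: a single character.
        System.out.println(decodedLength(data, StandardCharsets.UTF_8));      // prints 1

        // Interpreted as ISO-8859-1: two unrelated characters.
        System.out.println(decodedLength(data, StandardCharsets.ISO_8859_1)); // prints 2

        // Interpreted as the native platform charset: depends on the machine.
        System.out.println(decodedLength(data, Charset.defaultCharset()));
    }
}
```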

That said, I don't know if people have written papers on this topic 
before; you might find software engineering conference archives useful 
in this regard.
-- 
Beware of bugs in the above code; I have only proved it correct, not 
tried it. -- Donald E. Knuth