From: Joshua Cranmer
Newsgroups: alt.comp.lang.learn.c-c++,comp.lang.java.programmer
Subject: Re: How to get from A to B (actually, from type "A" to type "B")
Date: Sun, 06 Jan 2013 19:12:59 -0600

On 1/6/2013 1:22 PM, Ramon F. Herrera wrote:
> I had been using for a long time this (from Boost::Filesystem):
>
>   string somestring = "abc/de";
>   path p = path(somestring);
>
> Only to realize, accidentally, that the conversion is done
> automatically. The IDE should help you in those cases:
>
>   path p = somestring;

This is mainly an issue, in C++, of detecting whether or not an explicit
conversion is necessary. Note, however, that many style guides tend to
prefer explicit conversions over implicit ones.

> This one made me kick myself. I used this many, many times:
>
>   const char* sometext = somestring.string().c_str();
>
> Well, it turns out that this one is just as good:
>
>   const char* sometext = somestring.c_str();
>
> My question is about R&D done in this particular field. I tried Google
> but the word "type" is too ambiguous.

There was actually a project by Google using Clang that automatically
eliminated instances where std::string and const char* interconversion
was being unnecessarily performed. Note that this is a reason why
implicit conversion is frowned upon by style guides. :-)

> This problem is very similar to the resolution of Rubik's Cube. Your
> expression is in some "scrambled" state and you need the computer to
> tell you -not only any path! mind you- but the shortest path (known as
> God's algorithm) to the desired type.

No, the algorithm you're looking for is "BFS", specifically in a directed
graph, as taught in any introductory algorithms class and often many
more too. The problem is not doing graph traversal, but actually
computing the graph in the first place: you are basically asking people
to solve a very hard AI problem of inferring intent, and this can be
difficult even for humans with very good documentation.

Let's use your filesystem example to motivate why it's hard. Suppose you
have a file class like so:

class File {
   public String getAbsolutePath(); // Removes . and ..
   public String getCanonicalPath(); // Resolve symlinks
   public String getFilename();
   public String getExtension();
   public int getSize();
   public int getPermissions(); // Unix-style octal permissions
   public int getInodeNumber(); // Unix filesystem UID
   public int open(); // Returns file descriptor number
}

What should you return if you want to query File -> String? A human
responder would probably say one of the first two, but there are times
to prefer one over the other (it depends on what you are doing!).
Automated analysis would have to either require the human to annotate
all the conversion methods or use heuristics to guess. The irony is that
getExtension() is probably the simplest method of the lot, so heuristics
based on implementation complexity will probably fail here.

Now suppose you queried File -> int.
This, to most humans, is probably a nonsensical request, but on Unix
systems, you might want to get file descriptor numbers. This would
necessitate opening the file, which is a stateful request. Supporting
this kind of query would almost certainly render a tool useless due to
false positives.

If we look at our classic friend, in C and C++, const char *, note that
there tend to be about 4 distinct semantic types that this type refers
to. They are the following:
1. Raw binary data
2. Pure ASCII data, so it should only contain \x01-\x7f.
3. Native platform charset (what you can, e.g., pass into filesystem APIs)
4. Proper UTF-8

Sometimes, functions don't care which of these semantic types is in use,
assuming they're all null-terminated anyway (C's strchr is an example).
Sometimes, though, it matters hugely which definition is in use
(converting to/from UTF-16). The answer I as a human would give for
conversion functions depends heavily on context.

That said, I don't know if people have written papers on this topic
before; you might find software engineering conference archives useful
in this regard.

-- 
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
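
To illustrate the point that BFS is the easy half of the problem, here is
a minimal sketch in Java, assuming somebody has already written down the
conversion graph by hand. The class ConversionSearch, the type names, and
the edge labels (toPath, toFile, plus the methods borrowed from the File
example above) are all hypothetical, invented for illustration; none of
this comes from a real tool.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch: BFS for a shortest chain of conversions between types. */
final class ConversionSearch {
    // source type -> list of {method name, target type} pairs, in insertion order
    private final Map<String, List<String[]>> edges = new HashMap<>();

    /** Registers a directed edge: calling `method` on `source` yields `target`. */
    void add(String source, String method, String target) {
        edges.computeIfAbsent(source, k -> new ArrayList<>())
             .add(new String[] {method, target});
    }

    /** Returns the method names of a shortest chain, or null if none exists. */
    List<String> shortestChain(String from, String to) {
        Map<String, String> parent = new HashMap<>(); // type -> type it was reached from
        Map<String, String> via = new HashMap<>();    // type -> method used to reach it
        Queue<String> queue = new ArrayDeque<>();
        parent.put(from, null); // the start type has no parent
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk the parent links back to the start, then reverse.
                List<String> chain = new ArrayList<>();
                for (String t = to; parent.get(t) != null; t = parent.get(t)) {
                    chain.add(via.get(t));
                }
                Collections.reverse(chain);
                return chain;
            }
            List<String[]> out = edges.getOrDefault(current, Collections.<String[]>emptyList());
            for (String[] edge : out) {
                String method = edge[0];
                String target = edge[1];
                if (!parent.containsKey(target)) { // first visit is a shortest one
                    parent.put(target, current);
                    via.put(target, method);
                    queue.add(target);
                }
            }
        }
        return null; // target type is unreachable
    }

    /** Builds a small hand-annotated graph; every edge here is an assumption. */
    static ConversionSearch example() {
        ConversionSearch g = new ConversionSearch();
        g.add("String", "toPath", "Path");
        g.add("Path", "toFile", "File");
        g.add("File", "getAbsolutePath", "String");
        g.add("File", "getCanonicalPath", "String");
        g.add("File", "getExtension", "String");
        g.add("File", "open", "int");
        return g;
    }

    public static void main(String[] args) {
        ConversionSearch g = example();
        System.out.println(g.shortestChain("String", "int"));
        System.out.println(g.shortestChain("File", "String"));
    }
}
```

Note that the File -> String query comes back with getAbsolutePath only
because that edge happened to be registered first. BFS has no way of
telling whether getCanonicalPath or getExtension was what you meant, and
it will happily route through open() to reach an int. Deciding which
edges belong in the graph at all is the part that needs a human.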