Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #136 > unrolled thread
| Started by | Gabriel Quadros <gabrielquadros@hotmail.com> |
|---|---|
| First post | 2011-06-06 21:00 -0700 |
| Last post | 2011-06-12 12:11 +0100 |
| Articles | 5 — 5 participants |
Back to article view | Back to comp.compilers
Dealing with load/store instructions on static tainted flow analysis Gabriel Quadros <gabrielquadros@hotmail.com> - 2011-06-06 21:00 -0700
Re: Dealing with load/store instructions on static tainted flow analysis glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-06-07 09:04 +0000
Re: Dealing with load/store instructions on static tainted flow analysis kym@kymhorsell.com - 2011-06-08 07:53 +0000
Re: Dealing with load/store instructions on static tainted flow analysis George Neuner <gneuner2@comcast.net> - 2011-06-09 18:51 -0400
Re: Dealing with load/store instructions on static tainted flow analysis Martin Ward <martin@gkc.org.uk> - 2011-06-12 12:11 +0100
| From | Gabriel Quadros <gabrielquadros@hotmail.com> |
|---|---|
| Date | 2011-06-06 21:00 -0700 |
| Subject | Dealing with load/store instructions on static tainted flow analysis |
| Message-ID | <11-06-010@comp.compilers> |
Dear guys,
I am trying to implement a pass to detect information leak in
programs. The problem is a variation of static tainted-flow analysis:
I have some source functions, sink functions and sanitizers. I want to
know if it is possible for data to flow from source to sink without
going across a sanitizer.
I am using LLVM, and I am analyzing the LLVM bitcodes. My pass is
working well, but I am having some issues with memory. Once
information flows to the heap, it is hard to know how it propagates to
the rest of the program. Example:
a = SOURCE
b = malloc(100)
...
b[i] = a
...
SINK = c[j]
...
So, the problem is that it is hard to know that c != b and i != j.
Once information flows into memory, the safest thing to do is to flag
the whole memory as a SOURCE. Of course, that is very conservative. I
was wondering if you guys could recommend me some strategies and
techniques to be more precise. In particular, if you could point me
some paper that does it, that would be great.
My best regards,
Gabriel.
[toc] | [next] | [standalone]
| From | glen herrmannsfeldt <gah@ugcs.caltech.edu> |
|---|---|
| Date | 2011-06-07 09:04 +0000 |
| Message-ID | <11-06-013@comp.compilers> |
| In reply to | #136 |
Gabriel Quadros <gabrielquadros@hotmail.com> wrote: > I am trying to implement a pass to detect information leak in > programs. The problem is a variation of static tainted-flow analysis: > I have some source functions, sink functions and sanitizers. I want to > know if it is possible for data to flow from source to sink without > going across a sanitizer. (snip) > In particular, if you could point me some paper that does it, > that would be great. It isn't exactly the same, but I would start looking at the Java class verifier. Well, for one Java requires bounds checking, so you can be sure that only references to the same array would leak. Java also requires the verifier to detect references that load/store the wrong data type, such as treating a double as two ints. (I believe it detects both stack and heap accesses.) -- glen
[toc] | [prev] | [next] | [standalone]
| From | kym@kymhorsell.com |
|---|---|
| Date | 2011-06-08 07:53 +0000 |
| Message-ID | <11-06-014@comp.compilers> |
| In reply to | #136 |
Gabriel Quadros <gabrielquadros@hotmail.com> wrote: ... > So, the problem is that it is hard to know that c != b and i != j. > Once information flows into memory, the safest thing to do is to flag > the whole memory as a SOURCE. Of course, that is very conservative. I > was wondering if you guys could recommend me some strategies and > techniques to be more precise. In particular, if you could point me > some paper that does it, that would be great. ... Something that may be of use. You can create a hash value for address expressions and memory accesses using a simple hash scheme. Each constant in an expression is represented by itself. Each variable is assigned a random number. All operations + * / and performed modulo p (a nice big prime; there are better choices if you're working with mixed integer and real expressions e.g. if you want i**2 and exp(i*pi) to hash to -1 (i.e. p-1)). Assignments copy the hash from one variable to another. A pointer access can be modeled by generating a PRN using the hash of the address as the seed. If 2 hash values are different there is a strong probability the 2 expressions are not identical. Of course, put this under an optimisation and give warnings in the manual. :)
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2011-06-09 18:51 -0400 |
| Message-ID | <11-06-016@comp.compilers> |
| In reply to | #136 |
On Mon, 6 Jun 2011 21:00:41 -0700 (PDT), Gabriel Quadros <gabrielquadros@hotmail.com> wrote: > I am trying to implement a pass to detect information leak in >programs. The problem is a variation of static tainted-flow analysis: >I have some source functions, sink functions and sanitizers. I want to >know if it is possible for data to flow from source to sink without >going across a sanitizer. > : >a = SOURCE >b = malloc(100) >... >b[i] = a >... >SINK = c[j] >... > >So, the problem is that it is hard to know that c != b and i != j. >Once information flows into memory, the safest thing to do is to flag >the whole memory as a SOURCE. Of course, that is very conservative. I >was wondering if you guys could recommend me some strategies and >techniques to be more precise. In particular, if you could point me >some paper that does it, that would be great. Hi Gabriel, A general solution, I believe, is intractable. Using SSA or some kind of value numbering, it theoretically would be possible - though not easy - to keep track of all generated non-local addresses and compare the addresses accessed via disjoint program paths to see if there is overlap. Unfortunately, there are (at least) 4 problems with that. The first is that the addresses the compiler works with are symbolic: a name plus offset ... and they may be more involved to store and compare than a simple numerical address. The second, related, problem is that global and heap addresses often will be of a form like "$DATA+0xFA3892", having no reference to any recognizable program data ... you'll need to track or to figure out the boundaries of your globals and heap allocations in terms of the compiler's symbolic addresses in order to give meaningful error messages. The third problem is you'll need to process the entire program before looking for leaks. The storage needed to retain all the address lists may be extremely large and if the language permits separate compilation, you'll need to ensure that all of the program has been processed. And the fourth problem is identifying the disjoint program paths. This can be done using any of the number of flow and dependency graphs that the compiler generates ... but the compiler is more interested in following individual serial execution paths and generally isn't much interested in the extent of multiple path disjointness. WRT literature, I'm not familiar with any dealing precisely with the kind of *internal* "hidden channel" security issues you are addressing ... there is quite a bit about leakage outside of a program. The best I can suggest is that you investigate array optimizations, in particular looking at the extensive work that has been done toward identifying array index aliasing. Your particular problem, though more expansive, is directly analogous to the array aliasing problem (treating all of process memory as an array). George
[toc] | [prev] | [next] | [standalone]
| From | Martin Ward <martin@gkc.org.uk> |
|---|---|
| Date | 2011-06-12 12:11 +0100 |
| Message-ID | <11-06-017@comp.compilers> |
| In reply to | #136 |
On Tuesday 07 Jun 2011 at 05:00, Gabriel Quadros <gabrielquadros@hotmail.com> wrote: > I am using LLVM, and I am analyzing the LLVM bitcodes. My pass is > working well, but I am having some issues with memory. Once > information flows to the heap, it is hard to know how it propagates to > the rest of the program. Example: > > a = SOURCE > b = malloc(100) > ... > b[i] = a > ... > SINK = c[j] > ... > > So, the problem is that it is hard to know that c != b and i != j. > Once information flows into memory, the safest thing to do is to flag > the whole memory as a SOURCE. Of course, that is very conservative. I > was wondering if you guys could recommend me some strategies and > techniques to be more precise. In particular, if you could point me > some paper that does it, that would be great. I suggest that you look at the research on "points-to analysis" with regions: this attempts to divide the memory of the machine into disjoint regions where each pointer can be guaranteed to address one or more of these regions. As long as the SOURCE pointers are in different regions to the SINK pointers, you should be OK. A possible starting point: "Putting Pointer Analysis to Work" (1998) by Rakesh Ghiya , Laurie J. Hendren http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.9725 -- Martin STRL Reader in Software Engineering and Royal Society Industry Fellow martin@gkc.org.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4 G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/ Mirrors: http://www.gkc.org.uk and http://www.gkc.org.uk/gkc
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web