Without getting into a Wikipedia-like entry here, let's take a look at what goes on with whitelisting.
You have a machine with an application, say Notepad. You create an entry called acc_note so that when Notepad is called, it is validated against a list and then allowed to run. How is this application being validated?
Unless there are strong checksums against that application, nothing stops me, as an attacker, from binding rogue calls to it. When the application runs, my code runs with it, and all the more easily because the application was deemed trusted. You also need to understand that to make whitelisting truly effective, you will likely need to whitelist everything the application loads (DLLs, *.so files and so on). After any update, you will need to go back through the whole process. See the dilemma here?
This is not to say that whitelisting is a failure. It is to challenge the notion that simply by whitelisting, all is well. In an enterprise environment, maintaining a list of what is legitimate and what is not can be cumbersome, because most operating systems issue updates that invalidate any checksum-based system. Administrators tasked with maintaining these systems will likely learn to skip recalculating checksums. Much of that skipping comes directly from management, in their effort to get things done "right now."
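To make the checksum point concrete, here is a minimal sketch of what "validated against a list" could look like. It assumes SHA-256 as the checksum and a plain dictionary as the list; the helper names (sha256_of, is_allowed) and the stand-in file are hypothetical and not taken from any particular whitelisting product.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_allowed(path: Path, whitelist: dict[str, str]) -> bool:
    """Allow a binary only if its current hash matches the recorded one."""
    expected = whitelist.get(path.name)
    return expected is not None and sha256_of(path) == expected


# Demonstration with a stand-in file instead of a real executable.
with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp) / "notepad.exe"
    app.write_bytes(b"original program bytes")

    # Hypothetical whitelist entry, recorded while the file is known good.
    whitelist = {app.name: sha256_of(app)}
    allowed_before = is_allowed(app, whitelist)

    # A tampered binary and a legitimate update both change the hash.
    app.write_bytes(b"original program bytes + something new")
    allowed_after = is_allowed(app, whitelist)

print(allowed_before, allowed_after)
```

Note that the check cannot tell an attacker's modification from a vendor patch: both fail until someone recalculates the entry, which is exactly the maintenance burden described above. The same check would also have to cover every library the application loads.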
You can read more from two heavyweights (Ranum and Schneier) on this subject here:
http://searchsecurity.techtarget.com/magazineContent/Schneier-Ranum-Face-Off-on-whitelisting-and-blacklisting
A better approach to whitelisting boils down to whitelisting CONNECTIVITY. This is the MOST CRUCIAL, misunderstood and overlooked element here. E.g., you have a machine, say a DB. Its role is to take data stored INSIDE the environment and populate it elsewhere. It makes much more sense to whitelist all the machines INSIDE the local network and block the others. The same holds true across the board. Even in an outbreak, the machine would be programmed to talk to no one BUT trusted sources. This can be enforced on the local machine as well as at egress points to ensure there are no data leaks.
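As a rough illustration of whitelisting connectivity, the sketch below decides whether a peer address is one the DB machine may talk to. The network ranges and the name is_trusted_peer are made up for the example. In practice this rule would live in a host firewall or at the egress point rather than in application code, but the default-deny logic is the same.

```python
import ipaddress

# Hypothetical: the only networks and hosts this DB machine may talk to.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/24"),       # assumed local network
    ipaddress.ip_network("192.168.50.10/32"),  # assumed single reporting host
]


def is_trusted_peer(address: str) -> bool:
    """Default deny: only addresses inside a trusted network pass."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Anything that is not a valid IP address is refused outright.
        return False
    return any(ip in network for network in TRUSTED_NETWORKS)


for peer in ("10.0.0.42", "192.168.50.10", "8.8.8.8"):
    verdict = "allow" if is_trusted_peer(peer) else "block"
    print(peer, verdict)
```

The list of trusted peers is short and changes rarely, which is what makes it far easier to maintain than a list of every legitimate binary.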
This is where people fail miserably: in their approach. Too many people have been following the words of others for so long when the initial design was wrong to begin with. E.g., "Input Validation versus Output Validation": can you seriously control what people try to input? If you think you can, you're mistaken. You may be able to control what your machine processes, but that won't stop anyone from attempting to input it, will it? You will beat yourself to a bloody pulp trying to concoct massive amounts of countermeasures. However, you CAN control what your machine puts OUT every single time. YOU and only YOU know what your machine is supposed to distribute. This is ALWAYS under your control, and the applicable rules ARE under your control. It's all in the approach and understanding.
E.g., suppose a DB needs to return a total of 10 variables with a sum of, say, 10k to render a query complete (to show someone their account summary). You can easily create a rule that says: "Look machine, at no point in time should you ever go over this maximum: 10 fields for a sum of 10k." This is a much stronger rule, since your machine would not OUTPUT an error message or web page with more than that. Data leakage is limited to 10 variables at 10k, versus trying to create voodoo rules that won't work because you won't be able to keep up with millions of attackers consistently trying.
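Here is one way that output rule might be sketched. It reads "10k" as 10,000 bytes of total field data, which is an assumption on my part, and the names guard_output and OutputLimitExceeded are hypothetical. The point is only that the check sits on the way OUT, after the query has run and before anything is sent.

```python
MAX_FIELDS = 10           # the account summary never needs more fields
MAX_TOTAL_BYTES = 10_000  # assumed meaning of "10k": total size of the values


class OutputLimitExceeded(Exception):
    """Raised when a response is larger than the machine should ever emit."""


def guard_output(row: dict[str, str]) -> dict[str, str]:
    """Return the row unchanged if it fits the output rule, else refuse."""
    if len(row) > MAX_FIELDS:
        raise OutputLimitExceeded(f"{len(row)} fields exceeds {MAX_FIELDS}")
    total = sum(len(str(value).encode("utf-8")) for value in row.values())
    if total > MAX_TOTAL_BYTES:
        raise OutputLimitExceeded(f"{total} bytes exceeds {MAX_TOTAL_BYTES}")
    return row


# A normal account summary passes straight through.
summary = {f"field_{i}": "value" for i in range(10)}
print(len(guard_output(summary)))

# A response that dumps far more than a summary is stopped before it leaves.
dump = {f"column_{i}": "x" * 500 for i in range(200)}
try:
    guard_output(dump)
    leaked = True
except OutputLimitExceeded:
    leaked = False
print(leaked)
```

Whatever got past the input side, the most this machine can hand back in one response is what the rule permits.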