Thursday, January 05, 2006

Every Phone in America

A recent post by FuturePundit gave me a distinct feeling of deja vu. The following passage from Future Imperfect, a book manuscript I have had up on the web for several years, has just become topical:

---

The first step is to ask why, if phone taps are as useful as law enforcement spokesmen claim, there are so few of them and they produce so few convictions. …

The answer is not the reluctance of courts to authorize wiretaps. The National Security Agency, after all, gets its wiretaps authorized by a special court, widely reported to have never turned down a request. The answer is that wiretaps are very expensive. …

That problem has been solved. Software to convert speech into text is now widely available on the market. Using such software, you can have a computer listen, convert the speech to text, search the text for key words and phrases, and notify a human being if it gets a hit. Current commercial software is not very reliable unless it has first been trained by the user to his voice. But an error level that would be intolerable for using a computer to take dictation is more than adequate to pick up key words in a conversation. And the software is getting better.
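The last two steps of that pipeline (search the text, notify a human) can be sketched in a few lines. This is only an illustration under stated assumptions: the transcript is taken as already produced by some speech-to-text program, and the watch list, the threshold, and the function names are all hypothetical, not anything described in the manuscript.

```python
import re
from collections import Counter

# Hypothetical watch list and threshold, chosen only for illustration.
WATCH_PHRASES = ["bomb", "detonator"]
MIN_HITS = 3


def count_hits(transcript, phrases=WATCH_PHRASES):
    """Count whole-word, case-insensitive occurrences of each phrase."""
    hits = Counter()
    for phrase in phrases:
        # \b keeps "bomb" from matching inside "bombastic".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits[phrase] = len(re.findall(pattern, transcript, flags=re.IGNORECASE))
    return hits


def should_notify(transcript, min_hits=MIN_HITS):
    """Flag the transcript for human review once total hits reach the threshold."""
    return sum(count_hits(transcript).values()) >= min_hits
```

Requiring several hits rather than one is the design choice that makes a sloppy transcript tolerable: a single misheard word neither triggers nor suppresses a flag on its own.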

Computers work cheap. If we assume that the average American spends half an hour a day on the phone–a number created out of thin air by averaging in two hours for teenagers and ten minutes for everyone else–that gives, on average, about six million phone conversations at any one time. Taking advantage of the wonders of mass production, it should be possible to produce enough dedicated computers to handle all of that for less than a billion dollars.
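The arithmetic behind those figures is easy to check. The sketch below assumes a U.S. population of roughly 290 million, a number the passage does not state; the half hour a day, the two-hour and ten-minute averages, and the billion-dollar budget come from the text.

```python
# Assumed figure: the passage does not give a population.
US_POPULATION = 290_000_000

# Figures taken from the passage.
MINUTES_ON_PHONE_PER_DAY = 30
TEEN_MINUTES = 120
OTHER_MINUTES = 10
BUDGET_DOLLARS = 1_000_000_000

MINUTES_PER_DAY = 24 * 60

# Fraction of the day the average person spends on the phone.
fraction_of_day = MINUTES_ON_PHONE_PER_DAY / MINUTES_PER_DAY

# Expected number of people on the phone at any given moment.
people_on_phone = US_POPULATION * fraction_of_day

# Hardware budget available per simultaneous phone user.
budget_per_line = BUDGET_DOLLARS / people_on_phone

# Share of teenagers implied by averaging 120 and 10 minutes to get 30.
implied_teen_share = (MINUTES_ON_PHONE_PER_DAY - OTHER_MINUTES) / (
    TEEN_MINUTES - OTHER_MINUTES
)

print(f"people on the phone at once: {people_on_phone:,.0f}")
print(f"budget per line: ${budget_per_line:,.2f}")
print(f"implied teenage share of population: {implied_teen_share:.1%}")
```

Under that population assumption the count comes out at about six million, and a billion dollars works out to somewhat under $170 of hardware per simultaneous user.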

Every phone in America.

A Legal Digression: My Brief for the Bad Guys

Law enforcement agencies still have to get court orders for all of those wiretaps–and however friendly the courts may be, persuading judges that every phone in the country needs to be tapped, including theirs, might be a problem.

Or perhaps not. A computer wiretap is not really an invasion of privacy–nobody is listening. Why should it require a search warrant? If I were an attorney for the FBI, facing a friendly judiciary, I would argue that a computerized tap is at most equivalent to a pen register, which keeps track of who calls whom and does not currently require a warrant. The tap only rises to the level of a search when a human being listens to the recorded conversation. Before doing so, the human being will, of course, go to a judge, offer the judge the computer's report on key words and phrases detected, and use that evidence to obtain a warrant. Thus law enforcement will be free to tap all our phones without recourse to the court system–until, of course, it finds evidence that we are doing something wrong. If we are doing nothing wrong, only a computer will hear our words–so why worry? What do we have to hide?

17 comments:

Anonymous said...

For all you know, your internet traffic is being monitored right now for use in academic research or by some semiconductor company trying to optimize its broadband chipsets. I'd hope that if such activities are going on, they are randomizing the IP addresses so they can't correlate traffic patterns to a specific user. ;-)

How effective would such computerized wiretapping be with VoIP? As you've argued, the 2nd amendment should protect our right to unregulated encryption. Malicious hackers are already using VPNs in the current generation of IRC bots, which makes tracing the malicious hacker that much more difficult. Looks like there's already an encrypted VoIP prototype out there. So, if encryption is readily available, and if voice continues to move towards data topologies, how feasible is it to wiretap every phone?

Anonymous said...

"Live Ammo" is an IT security consultancy. Their blog recently ran a post on how easy it would be to tap the internet, including VoIP (itself used increasingly by telcos).

See http://liveammo.blogspot.com/2005/12/us-probes-eavesdropping-leak.html

Anonymous said...

The last time I looked (about a year and a half ago), I was pretty unimpressed with the VOIP crypto proposals. But the underlying problem shouldn't be that hard, at least if you're just trying to add end-to-end encryption and not trying to maintain a backdoor for the Feds.

It's worth noting that people have done internet voice encryption programs before--look up PGP Phone and Nautilus for examples. And the big lesson I heard from people who worked on those was that the encryption added almost no computational overhead--AES encryption is way cheaper than digitizing, encoding, and compressing voice. The key management is potentially a big problem, though if you're getting a phone number from some trusted entity, you ought to be able to get a certificate from them binding some public key to that phone number.

One nitpick, though. It's relatively easy to do end-to-end encryption, but it's very hard to protect anonymity of those communications. For various technical reasons, the kind-of standard trick of anonymous remailers (mix networks) will both make the voice channel unusable and fail to provide strong anonymity.

Anonymous said...

It's very easy to have almost unbreakable encryption on a VOIP network. Unfortunately, the government demands the keys to such encryption schemes.

As for what the problem is with having the computer read your email and phone conversations and look for key words: this is the same silly argument that Posner used in an editorial in the WaPo or NYT a few weeks ago. Even if it is "only a computer" reading it, it is a serious invasion of privacy. Once you set up the search protocol and have the capability to search all conversations, all you have to do is type in a few different phrases and you can listen to any conversation you want to. Bored at work today and want to listen to some electric train hobbyists? A couple of keystrokes and you're pulling all the conversations with "HO" and "OO" in them. That's not to your taste? Let's try "bondage", "punishment", "leather", "spanking", and "watersports". How about we want to see what the Democrats are up to today? Search for "DNC", "Dean", and "Biden".

See how easy it is to abuse the system.

Anonymous said...

Even knowing what keywords are being spoken or which people are being flagged for which keywords could be really valuable. I recommend that any DoJ would-be millionaires focus on potential bioterrorism suspects working at small biotech companies. Keywords (to protect national security, of course) are the names of the big pharmaceutical companies, as well as the string "FDA approval".

Anonymous said...

"One nitpick, though. It's relatively easy to do end-to-end encryption, but it's very hard to protect anonymity of those communications. For various technical reasons, the kind-of standard trick of anonymous remailers (mix networks) will both make the voice channel unusable and fail to provide strong anonymity."

I believe you're assuming that, as bad guys trying to communicate using non-standard VOIP with strong encryption, we have to "hide" our single IP addresses. These days, with hundreds of thousands of public wifi hotspots (including private networks that are not secured) and a laptop, it is straightforward to use a nearly random IP address each time I need to communicate. Enough of these hotspots allow me to access them from a position that has adequate physical security -- my car outside the building, a meeting room with the door shut at the public library, etc. I can think of a number of reasonably secure schemes to let us exchange IP addresses in order to establish a connection at a scheduled time.

It seems worth noting that by the time the NSA or anyone else has enough information to be able to crack these communications consistently, they certainly have enough information to obtain a warrant.

Anonymous said...

Good point. If you use a different endpoint each time, then you have a lot less pressing need for anonymization by the network. That won't stop a really determined investigation (they can try to correlate people seen close to an open wireless network with the time some call happened, using surveillance cameras or questioning witnesses), but it will make it a lot harder to do fishing type stuff.

Anonymous said...

I wonder what folks like Posner et al would say about having cameras trained on the interior of their homes and monitored by machines which would sound an alarm only if a lot of sudden movement or loud noise were to occur.

Lippard said...

Protecting anonymity can be done not only by using locations like WiFi hotspots (many of those, by the way, do require authentication and track who is using them), but with technology like EFF's "TOR" onion routing: http://tor.eff.org/

John T. Kennedy said...

It's reasonable to assume the feds have speech recognition technology that's better than what is currently commercially viable in the mass market. Friedman's estimate that it would cost only $1 billion to tap every phone might be off by orders of magnitude, but it won't stay that way.

Anonymous said...

I've been a court reporter for 30 years. Every year the authorities try to replace us with software that will translate the spoken word into text. Mr. Friedman claims he can get software to do this translation. Let's see the software. Let's see it work. If Bush is believing this crap, it's no wonder he is known as the worst president in history. You can make a computer work with very rudimentary commands, but translating the spoken word is years away. Ask Bill Gates.

As someone who has been desperately trying to get Dragon NaturallySpeaking to work for months, I agree!

David Friedman said...

" You can make a computer work with very rudimentary commands, but translating the spoken word is years away. Ask Bill Gates.

As someone who has been desperately trying to get Dragon NaturallySpeaking to work for months, I agree!"

I think you are both mistaken.

In order for speech to text software to be adequate to take dictation--the job of the programs from Dragon and IBM--it has to have an error rate close to zero--say no more than a few percent. If all you want to do is identify conversations where the word "bomb" appears at least three times, an error rate of ten or fifteen percent is not a serious problem.
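A rough calculation shows why. The sketch below assumes, purely for illustration, that each spoken instance of the keyword is recognized independently with probability one minus the error rate, and it ignores false positives; neither assumption comes from the discussion above, and the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n, error_rate):
    """Probability that at least k of n spoken keywords are recognized.

    Assumes independent recognition of each instance (a simplification).
    """
    p = 1.0 - error_rate
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# A keyword spoken 3, 5, and 8 times, at a 15 percent error rate,
# with a flag raised once it has been recognized at least 3 times.
for times_spoken in (3, 5, 8):
    chance = prob_at_least(3, times_spoken, 0.15)
    print(f"spoken {times_spoken} times: flagged with probability {chance:.3f}")
```

Under these assumptions, a word spoken exactly three times clears the threshold about 61 percent of the time, while a word spoken five times does so about 97 percent of the time. An error rate that would ruin a dictated letter barely dents detection once the word recurs.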

Anonymous said...

But is it going to prevent crime or merely force criminals to spend a few cents on encryption?
While less and less expensive, it is still not necessarily the most efficient way to gather information.
Also, in a world where both wiretapping and encryption are extremely cheap, the best approach would be to resort to the law of the jungle. After all, the only reason we have laws against violent crime and an expensive infrastructure to enforce them is that personal defense is excessively expensive.
A much better approach, in my opinion, would be to let everybody wiretap as they please and also let everybody encrypt as they please. This would result in a much more secure communication infrastructure. Since security (even national security) is the very goal, why not let it take care of itself?

FuturePundit said...

The knee jerk focus on voice conversation tapping (aside from being boring) misses what is going on here. The bigger value the NSA is probably trying to get is to map the nodes in terrorist networks. They do not need to hear voices to do that.

Some of the press reports I read spoke of data mining. Data mining of what? Call records. If they can figure out that node A is bad guys and A calls B, then they have to go data mining for calls to and from B. Ditto for calls to and from all those other nodes. Then they can look for patterns of which nodes are key nodes and which are leaves.
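The expansion from a known node through call records is a breadth-first search over a graph. What follows is a minimal sketch, not a description of any actual NSA system: the record format (caller, callee), the hop limit, the function names, and the rule that a node with a single contact counts as a leaf are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def build_call_graph(call_records):
    """Build an undirected graph from (caller, callee) pairs."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def expand_from_seed(graph, seed, max_hops):
    """Breadth-first search: map each node within max_hops of seed to its hop count."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def classify(graph, nodes):
    """Label a node 'leaf' if it has exactly one contact, otherwise 'key' (assumed rule)."""
    return {n: ("leaf" if len(graph[n]) == 1 else "key") for n in nodes}


records = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")]
graph = build_call_graph(records)
reached = expand_from_seed(graph, "A", max_hops=2)
labels = classify(graph, reached)
```

Nothing here touches the content of a call; the whole analysis runs on who called whom.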

They can do similar data mining on email traffic, chat traffic, Usenet group posting traffic, airline reservation databases, hotel reservation databases, credit card billing databases, and so on. Write really clever software that runs on server farms to combine info from multiple data archives and voila, terrorist networks should pop out of the system in handy reports.

The obsession of privacy rights advocates with phone taps misses what they are probably really doing: mapping terrorist networks using database mining algorithms. That's very interesting and I'd like to know why I should view it as a horrible invasion of my privacy.

A friend of mine used to work on database designs for call records. They have huge problems and push the edge on tape backup. Ten years ago the big telcos had as part of their business plans that if this or that hardware failed they'd just lose the ability to charge for N million call records. They found it impossible to do effective backups. I wonder how much things have changed and I wonder what sort of IT systems the NSA has to sift through the call records and other huge databases they've got to be mining.

David Friedman said...

"The bigger value the NSA is probably trying to get is to map the nodes in terrorist networks. They do not need to hear voices to do that."

Nor do they need to (arguably) violate FISA to do that.

Pen registers don't require a warrant--that's been established law for a long time. So if all NSA was doing was mapping who talked to whom, and not listening in on the conversations, the whole controversy wouldn't have arisen.

I think any explanation of what NSA is doing has to explain why they didn't follow existing law. The most plausible explanation I can come up with is that they were tapping on a much larger scale than FISA contemplated--say a hundred thousand lines at a time. But it also has to explain why the legality of what they were doing was controversial.

Anonymous said...

FuturePundit:

This raises privacy issues because the same investigative techniques that help me discover a pattern of consistent meetings between you and your Chinese intelligence service handler, or you and your Al Qaida contact, also help me discover the consistent meetings you're having with your mistress or your gay lover. Similarly, the same construction of a network of people that's useful for finding terrorist networks is useful for finding networks of other kinds, too--say networks of gay men who interact a lot, or members of some fringe religious sect.

It's all a question of how the results of the widespread surveillance will be used. But if you think about it, I'll bet you could identify almost everyone who has a nonnegligible chance to win the presidency in 2008 or even 2012, today. What do you suppose the value is of being able to find dirt on almost anyone who is running for president and has a reasonable chance of winning? I'm guessing the result of that is that the people in control of the surveillance mechanisms get an effective veto on presidential candidates they don't like. Probably the same thing applies to Supreme Court justices.

That seems like a pretty high price to pay for added security against another AQ attack. Maybe it's worth it, maybe it's not, but we're talking about a serious risk that the FBI, NSA, and/or DHS could end up with a scary amount of political power. Further, we have an example of that having happened in the past, with Hoover's FBI, so this isn't just wild-eyed conspiracy theorizing. (Google for "cointelpro".)

Anonymous said...

How long would it take some lawyer to argue that the mere fact a conversation is encrypted would be enough to get a warrant to see what was said?

Even creating a code-jargon wouldn't work indefinitely. Any jargon that could be understood across a dispersed terrorist network would simply create more code words to use for a search.

BTW, I'm assuming that no single word would trigger an investigation, but sets of words, or more specifically, sounds. Besides, it's not the words, it's the particular sound that's important. That way, the software would not be language specific, and any particular sound in any language could be keyed in...

So the software is not searching for words, but sounds. Is that protected?
Just some thoughts...