The Window That Looks Back

By Matt Swartz

In my lifetime, only two tech-related companies have become so ubiquitous that their proper names have turned into commonly recognized verbs. And they couldn’t be more different. Xerox, the older of the two, is now inextricably linked with a fundamentally simple process.

Xerox (verb) means what Xerox (noun) made its billions doing: to produce, with a scanner and a printer, an exact copy of a sheet of paper. The process is simple; on older machines it leaves no record of what has been done. It’s a camera and a printer wired together, and while there are additional features available on newer models, the fundamentals remain unchanged: any piece of paper one puts under that lid and exposes to that scrolling bright light will emerge beneath with its contents reproduced. The task is carried out with total indifference to what is on the page.

The schematic of a stealth bomber reproduces just as nicely (and just as discreetly, except on newer models with internal hard drives) as Nana’s raisin bread recipe. The Xerox machine is utterly indifferent to the contents Xeroxed.

With google (verb), the act of using the website of Google (noun) to search the Internet, matters could hardly be more different; the way the former works is utterly dependent on the caprices of decision-makers employed by the latter, in a way that’s poorly understood. To make matters worse, according to a June 17, 2013 article by Barton Gellman and Laura Poitras in the Washington Post, the whole process leaves a permanent record, both in Google’s servers and in a backdoor pipeline that runs into a National Security Agency intelligence-gathering program called PRISM.

Tech enthusiasts who skew utopian praise the new era of transparency that digitization of information online offers, and they’re not wrong to do so. This afternoon, without leaving my desk, I can access millions of government documents for free, the abstracts of innumerable academic journal articles, as well as most public-domain books. In a sense, it’s like having the best library in history at my disposal.

Except it’s not an American library; American librarians are bound by a code of conduct that precludes their revealing the contents of patron checkouts and inquiries, except when subpoenaed. And in fact, there’s more anonymity even than that; one doesn’t have to give his or her name to browse a library’s stacks or databases. As long as you’re quiet and conscientious, you can do whatever you want in an American library from its opening to its close, and never have to explain your purposes to any government employee.

I’ve had the distinct pleasure of photocopying an entire rare book and uploading it to my Internet cloud service. No questions were asked, no explanations were given. I walked away with a rare treatise about the Kennedy assassination in my pocket.

The Internet offers no similar experience. Google keeps a record of every search made, catalogued by IP address. That means it knows where each search originates and at what time, and it’s a small step from that to knowing by whom.

One of my friends was visited by Secret Service agents and asked to explain the specifics of some of the searches he was making. The burden of proof was on him to prove that they were innocuous, not, as one would expect from our English Common Law roots, on the government to prove that they were not. His answers were judged satisfactory, and he was ultimately left alone, but being visited by armed federal agents is bad for one’s blood pressure, and of course records of that visit are never going away.

Not only are Web searches catalogued forever by Google, as noted by Tom Foremski in a 2010 article in his Silicon Valley Watcher, but they are subject to scrutiny by forces that, while not exactly hidden, are less than transparent. They’re also not necessarily an accurate representation of all available information about a given subject.

When I Xerox a page, I get two pages exactly alike, assuming I’ve got the paper lined up correctly. Every square micrometer of page, every scintilla of printed data, is reproduced from one page to the other. When I google a subject, on the other hand, I have no certainty that I’m getting results that correspond perfectly with what Google has in its databases, nor that they’re ranked in a mathematically transparent way.

We don’t know what Google’s search algorithms are, because that’s a trade secret, but that’s only the tip of the iceberg. We also don’t know how they differ depending on what’s searched for. The ability to rank what one sees when googling, for example, political candidates, is a power more total than that of any single mass media outlet, and yet far less frequently discussed.

It’s common knowledge in the search-engine optimization community that only a tiny percentage of searchers click through to the second and third pages of results. After reading the listed summaries for the first 10 articles, and perhaps clicking one or two links, they likely move on to the next subject or person feeling informed, as if they’ve done their due diligence. If they’re under 40, they’ll congratulate themselves on Going Directly To The Source, rather than relying on what’s broadcast to them. If they’re under 30, they’ll deride people who get their news from newspapers and TV more often than from Google as credulous “sheeple” (a portmanteau of sheep and people). Google as a source, as a potential font of bias, goes widely unconsidered.

In that vein, imagine a close election; moderate voters turn to Web search for answers. And then one candidate’s name auto-completes with “racist” or “irritable bowel syndrome” appended. Even if we came to know of that one instance, we don’t know what we don’t know. Google’s algorithms are a trade secret.

What can we take away from this? What courses of action are necessary, or even possible? Your poor correspondent could hardly be less qualified to answer (perhaps he’d be less qualified if he didn’t view it as a problem, but that’s about it). But imagine if Web search were open source, with the algorithms used to filter and rank search results subject to perusal in real time.

We could also lobby our federal government to stop monitoring those parts of the Internet, like search, that could reasonably be described as private, under the older constitutional understanding that warrantless search is forbidden, and that individuals have an inherent right to privacy (barring extenuating circumstances, which, by definition, cannot exist categorically). Federal antitrust mechanisms might help us by breaking Google up, as they did the Bell System in the 1980s and Standard Oil in the early 1900s, but these are distant goals.

Search isn’t going to be like photocopying in our lifetime, and perhaps the best we can do is be aware of the differences and the power patterns they create. In that sense, you, by reading to the end of this piece, by perusing more than the first few pages of my authorial results, might have struck a small blow for government and corporate transparency and personal anonymity, merely by familiarizing yourself with the unprecedented changes that have taken place and musing about whether or not they strike you as problematic.
