How The Declassification Engine Caught America's Most Redacted
Eisenhower Edition
İsmet İnönü — Former National Chief of Turkey. After a military coup d’etat, at risk of being “lynched.”
Louis Joxe — Top official at French foreign ministry, to be protected for admitting his superiors were lying to stoke anti-Americanism.
Patrice Lumumba — Independence leader and first Prime Minister of Congo, a target for the CIA and MI6. Shot dead and dumped in a well.
Charles Malik — Implacable foe of pan-Arabism and champion of U.S. intervention in the Middle East.
Mohammad Mossadegh — Iranian Prime Minister overthrown by the CIA and Britain’s MI6 after he reduced the Shah to “impotence” and allowed “communist mobs” to roam free. Died under house arrest.
Azzam Pasha — First Leader of the Arab League, who once threatened Israel with extermination. Though not “intrinsically evil,” he “could not be trusted.”
Faisal of Saudi Arabia — Crown Prince who eventually seized power from his brother to become King. Assassinated by his nephew.
Willy Brandt — “The bastard of Berlin,” and future Chancellor, deemed wobbly over defending the city in the face of Soviet threats.
Harmodio Arias Madrid — Ex-President of Panama who was “exploiting social discontent to arouse students in lower classes to oust National Guard Commandant.”
William Pawley — Former ambassador to Peru, who served as an advisor in the overthrow of the democratically elected government of Guatemala. Died of a self-inflicted gunshot wound.
Methodology
We began with a set of over 117k documents from Gale Cengage’s U.S. Declassified Documents Online system (DDO). The collection includes most documents declassified at presidential libraries over the last forty years, including thousands of pages of top-level documents from the CIA, State Department, and the Pentagon. They cover US foreign policy since World War I, but most are from the Cold War era. Gale used double-key entry to transcribe most of these documents, though more recently released materials were scanned using Optical Character Recognition (OCR).
This collection includes both “sanitized” and “unsanitized” versions of the same documents. One reason is that different departments and agencies redact different things depending on what they deem to be most sensitive. Bringing them together can reveal the people, places, and things that are most likely to be redacted, thus helping to correct the intrinsic bias in the public record: we only know what the government will let us know.
Sasha Rush, who was then a Ph.D. student in computer science at MIT, wrote a program that combined visual and textual analysis, and then ran it on the database. It enabled us to identify over five thousand examples of (un)redacted text. With this kind of data, never before available, we could identify which names are disproportionately likely to be blacked out relative to how often they appear in the rest of the corpus.
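The textual half of that comparison can be sketched in a few lines. This is only an illustration, not the program described above: it assumes both versions of a document are already available as plain text, it ignores the visual analysis entirely, and the function name `recover_redacted`, the `[REDACTED]` marker, and the sample sentences are all hypothetical.

```python
import difflib
from typing import List


def recover_redacted(sanitized: str, unsanitized: str) -> List[str]:
    """Return runs of words found in the unsanitized text but not the sanitized one."""
    a = sanitized.split()
    b = unsanitized.split()
    # Align the two word sequences; autojunk is disabled so that frequent
    # words are never discarded as "junk" on long documents.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    recovered = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" mark words that exist only in the unsanitized version.
        if tag in ("insert", "replace"):
            recovered.append(" ".join(b[j1:j2]))
    return recovered


# Hypothetical pair of versions of the same sentence.
sanitized = "The government is trying to have [REDACTED] lynched."
unsanitized = "The government is trying to have Mr. Inonu lynched."
print(recover_redacted(sanitized, unsanitized))
```

Real document pairs are far noisier than this (OCR errors, different page breaks, retyped copies), so a word-level diff like this would need considerable cleanup before its output could be trusted.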
To come up with the list, we first had to decide what period to focus on. The DDO has a lot more documents for the 1950s than for the 1980s, so a “most redacted” list for the whole Cold War would be misleading. We therefore decided to start with the period when Eisenhower was President.
We did not want just the absolute number, i.e. the names most often redacted overall, since that largely reflects which names appear most often in the collection, e.g. the Secretary of State or the Director of the CIA. Instead, we wanted to get at relative sensitivity: the names of people disproportionately likely to be blacked out.
Finally, we had to find some way to get the names themselves amidst the 60k+ words in redacted text. We therefore ran a Named Entity Recognizer (NER) over the collection to extract these person names. We then made several types of calculations about the odds of a name showing up in the documents, most important among them:
- Total Prob: The number of times a specific name appeared divided by the total number of words in all the Eisenhower-era documents. This gives us the odds of the name appearing anywhere in these documents.
- Redacted Prob: The number of times the name appears in redacted text divided by the total number of words in all the redactions. This gives us the probability that a name will appear in redacted text from the Eisenhower era.
- Log Odds: The base-10 logarithm of the ratio of Total Prob to Redacted Prob, i.e. log10(Total Prob / Redacted Prob). This tells us which names are most likely to appear in redactions compared to their likelihood of appearing in the whole collection (the lower the number, the higher the relative likelihood of the name appearing in a redaction).
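A minimal sketch of these three calculations follows, assuming Log Odds is the base-10 logarithm of Total Prob divided by Redacted Prob, which is the reading that matches the published scores. The two corpus sizes are placeholders, not figures reported here: `REDACTED_WORDS` rounds the "60k+" mentioned above, and `TOTAL_WORDS` is a hypothetical value chosen only so that the example reproduces the scores in the tables. The function names are likewise illustrative.

```python
import math

# Placeholder corpus sizes (assumptions, not reported figures).
TOTAL_WORDS = 21_600_000   # hypothetical word count of all Eisenhower-era documents
REDACTED_WORDS = 60_000    # rounded from the "60k+" words of redacted text


def total_prob(count_all: int) -> float:
    """Probability of the name appearing anywhere in the documents."""
    return count_all / TOTAL_WORDS


def redacted_prob(count_redacted: int) -> float:
    """Probability of the name appearing inside redacted text."""
    return count_redacted / REDACTED_WORDS


def log_odds(count_redacted: int, count_all: int) -> float:
    """log10(Total Prob / Redacted Prob); lower means more disproportionately redacted."""
    return math.log10(total_prob(count_all) / redacted_prob(count_redacted))


# (number in redacted text, number in all text), taken from Table 1.
counts = {"Pasha": (4, 52), "Azzam": (4, 31), "Brandt": (13, 142)}

# Sort from most to least disproportionately redacted.
ranked = sorted(counts, key=lambda name: log_odds(*counts[name]))
for name in ranked:
    print(name, round(log_odds(*counts[name]), 2))
```

Because both corpus sizes are constants, they shift every score by the same amount and never change the ranking; only the ratio of the two counts for each name matters for the ordering.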
The Data
The data in Table 1 show the top terms that the NER tagger identified as person names. They are sorted by the Log Odds of appearing in redacted text.
Table 1: NER Most Redacted Person Names
| Name | Number in Redacted Text | Number in All Text | Log Odds of Appearing in Redacted Text |
| --- | --- | --- | --- |
| Azzam | 4 | 31 | -1.67 |
| Brandt | 13 | 142 | -1.52 |
| Pasha | 4 | 52 | -1.44 |
| Mossadegh | 6 | 115 | -1.27 |
| Malik | 7 | 212 | -1.07 |
| Pawley | 5 | 168 | -1.03 |
| Coyne | 7 | 270 | -0.97 |
| Qasim | 8 | 320 | -0.95 |
| Bissell | 6 | 256 | -0.93 |
| Nkrumah | 5 | 221 | -0.91 |
However, there were also names relatively likely to appear in redacted text that were not tagged as names by NER. Here is a list of words most likely to be redacted that the NER could not categorize:
Table 2: Sample of cases where most redacted plain words were actually highly-ranked names
| Word | Number in Redacted Text | Number in All Text | Log Odds of Appearing in Redacted Text |
| --- | --- | --- | --- |
| inonu | 5 | 33 | -1.74 |
| joxe | 4 | 48 | -1.48 |
| mossadegh | 9 | 115 | -1.45 |
| arias | 4 | 65 | -1.34 |
| oliver | 5 | 96 | -1.27 |
| barbara | 4 | 118 | -1.09 |
| lumumba | 18 | 634 | -1.01 |
After combining these two sets, we scoured the actual redacted text containing these names to correct any errors. In some cases the count was too low because of misspellings. In others it had to be adjusted down because some redactions were double-counted. And there is always the possibility that the total count is off, for instance if multiple people with the same name appear elsewhere in the collection. As we continue processing these collections, we will identify and correct any such errors.
But for now this is the best intelligence we have about the people you don’t see enough of in the official history because they have been blacked out of the official record.
Table 3: Adjusted Final Top 10
| Name | Number in Redacted Text | Number in All Text | Log Odds of Appearing in Redacted Text |
| --- | --- | --- | --- |
| Ismet Inonu | 5 | 32 | -1.74 |
| Azzam Pasha | 4 | 31 | -1.67 |
| Willy Brandt | 15 | 142 | -1.58 |
| Louis Joxe | 4 | 48 | -1.48 |
| Mohammad Mossadegh | 7 | 115 | -1.34 |
| Harmodio Arias Madrid | 4 | 65 | -1.34 |
| Charles Malik | 9 | 212 | -1.18 |
| Patrice Lumumba | 26 | 634 | -1.17 |
| William Pawley | 6 | 168 | -1.11 |
| Prince Faisal | 5 | 172 | -1.02 |
Examples of redactions
"...more frequent. The opposition party maintains that the government is trying to have Mr. Inonu lynched. The Turkish Defense Minister recently remarked that the military leaders may have to intervene if the tension continues. If Inonu were killed, a revolt could take place in..."
"...He did not feel that Azzam Pasha was intrinsically evil, but rather that he could not be trusted to carry any messages. Mr. Lloyd said, however, that he agreed there was considerable room for maneuver with respect to the over-all problem..."
"...Mayor Brandt was a most interesting character and also a possible candidate for leader of the German Socialist Democrat Party in the future. He was often called "the Bastard of Berlin" because he had no known father. In any event, he was a self-made man and one to be reckoned with. Brandt was strongly on our side and it was our hope that he and Adenauer would be able to get together..."
"...JOXE REPLIED HE WAS IN FULL AGREEMENT WITH ME AND SAID HE HAD ALREADY TOLD PRESS RELATIONS OFFICIALS AT QUAI TO EMPHASIZE TRIPARTITE AND WESTERN UNITY IN REGARD TO EVENTS IN HUNGARY. HE SAID HE WOULD SPEAK TO THEM AGAIN WITH SPECIFIC EMPHASIS ON UNITED STATES ROLE. JOXE THEN SAID, APPARENTLY THINKING OF PINEAU'S STATEMENT, THAT FRENCH GOVERNMENT HAD ALWAYS REALIZED SERIOUSNESS WITH WHICH UNITED STATES VIEWED HUNGARIAN AFFAIR BUT HAD..."
"...Mr. Allen Dulles then turned to the situation in Iran, which was very disturbed owing to the highly-publicized trial of Mossadegh. Mr. Herbert Hoover, Jr., had returned from his first visit to Teheran with a pessimistic judgment as to the prospects for an oil settlement. Mr. Hoover had reported the Iranians very ignorant as to the facts of life with regard to their oil resources. Secretary Dulles said that Mr. Hoover had been rather more optimistic in reporting to him, and had expressed the view that something could be worked out over a period of time..."