International Socialist Review

Notes On the Creation of this Digital Archive



 

About this digital archive of International Socialist Review (ISR) 1900 - 1918 presented on Marxists Internet Archive, assembled by Marty Goodman of the Riazanov Library digital archive project.

Why this archive is (by far most important!) more complete, and also in many cases presents better quality images, than that which you might view on or download from Google Books, Archive.org, or Hathi Trust.

The issues involved:

The scans of ISR here on MIA constitute objectively and significantly the highest quality digital archive you can find of this publication anywhere. It is superior in many respects (listed below) to what you'd get if you download this material from Google Books, Archive.org, or Hathi Trust (Hathi Trust used Google scans).

1. This archive is very near complete (at most only a few pages... a few back covers... are missing). It is for all intents and purposes a complete archive. I personally checked page content of every one of the roughly 201 issues... of the roughly 14,000 pages... to be sure of this, providing scans of pages missing from the Greenwood Reprint of ISR (or from originals in my personal collection) where this was necessary. In contrast, the material available on Google Books, Archive.org, and Hathi Trust is missing at least one entire issue, and many of the issues in some volumes have significant numbers of pages missing.

2. Many... 45 or more... of the issues in this archive... the important issues from 1910 and later, bearing a great many photos and some political art... I have personally scanned, and present here in visibly higher image quality than what is found in the Google scans.

3. In those cases where I mostly used images from the Google scans, I checked each page of the issue, and ended up personally scanning and replacing (or restoring) what Google either missed or messed up in as many as 30% or more of the pages of the issue, so badly messed up were the Google scans. For these replacements I mostly used the Greenwood Reprint as the source, tho in some cases I used original issues in my collection.

4. I did not only fix absolutely critical errors (issues left out, pages missing from issues) in Google's scans in this presentation. I also re-scanned (again, from the Greenwood Reprint in most cases) art that had been poorly rendered in the Google scans... where low resolution and/or an incorrect choice of mode to render the art was employed by Google. So most of the line drawn and some of the other art that Google presented at 300 dpi gray scale, I present in these scans at 1200 dpi single bit BW, with exposure meticulously chosen, personally and individually, for each page.

5. In a few cases there is art or a photograph that goes across two pages of an issue of ISR. Google presents this by presenting the two pages separately, and cutting out a vertical strip in the middle of the art, entirely. I present such art as it was meant to be seen... as it was originally printed... as one continuous image, with nothing cut out from it. In one case, where the art was reproduced from a printing of the same image in The Masses, I provide (in addition to scans from ISR of the reproduction of the art) a far higher quality image of the art scanned from the far better printed, larger version of the art that originally appeared in the issue of The Masses.

6. In one volume it presented, Google actually included pages from one issue in the sequence of pages for another issue. It did this in all issues in the volume. A mess! I sorted all this out using original issues in others' collections and the Greenwood Reprint of the volume, and fixed all this in the processed Google scans I offer here. Where I had both, the Greenwood Reprint always agreed with the original issues. The Google scans (in that one volume) did not. They had stuffed extra pages at the end of the volume, after the last numbered page, and before the back cover pages... and left out pages following the front cover.

7. For a detailed (4 page) description of the nature of the "hybrid" (partly by me, partly by Google) scans that compose many of the issues presented in this archive, click here. [in Word format]

It is for these reasons that I state (call it a boast, if you like, but it really IS objectively true in many respects) this is a significantly superior digital archive to any you can find of this publication elsewhere.

Legal Note:

International Socialist Review was published prior to 1924, and as such is in the public domain. Under US case law, supported by many decisions, any "slavish reproduction" of a physical publication that is in the public domain is itself in the public domain. It does not matter how much skill, time, or money went into making that "slavish reproduction". Digital scans (and photographs of all kinds, including microfilm) are specifically defined by the courts as such "slavish reproductions". Thus, a Google-made scan of ISR ... whether on Google Books, Archive.org, or Hathi Trust... is itself in the public domain, and may be freely republished. The same is true of the scans I produced, for the same reason. If you find someone claiming ownership of the scans they made of a publication that is in the public domain, they are almost certainly either ignorant of law, or outright deliberately lying. The latter is a common situation, with individuals or institutions attempting to bluff and bully, with lies, in an effort to assert control and ownership of scans they actually do not at all own. Google, very much to its credit, is very clear about this aspect of their scans of public domain material.

This legal status has been very solidly established in very many decisions in the US courts, up thru appellate courts, but the issue has not to date been brought to the US Supreme Court. Note that the legal status of such scans may be different in other countries.

Other aspects of differences between these scans and those made by Google:

Instead of providing monster files for each full volume, I provide the material in separate files for each issue. This makes it far easier to find the issue you want. Indeed, in some of the scans Google presented, there was no numbering of the thumbnails, and without Acrobat Pro you can't fix this. The lack of numbering makes it even harder to find the issue you wanted, or to return to a given page you had located.

We have the issues arranged here with breaks in the listing to allow easily seeing the boundaries both of given YEARS of publication, and of given VOLUMES of publication. Note that with ISR, for its entire run, Volumes of these monthly issues consisted of 12 issues, but started with the July issue of one year, and ended with the June issue of the next year. For example, Volume 10 includes the July 1909 issue thru the June 1910 issue.
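The volume numbering described above can be expressed as a small calculation. The following Python sketch is anchored only on the example that Volume 10 spans July 1909 thru June 1910, and assumes every volume follows the same July to June pattern with July as issue 1. The function name is purely illustrative.

```python
def isr_volume_and_issue(year: int, month: int) -> tuple[int, int]:
    """Return (volume, issue_number) for a monthly ISR issue.

    Assumes every volume runs July through June, anchored on the example
    that Volume 10 spans July 1909 through June 1910.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    # Issues from July onward belong to the volume that starts that year.
    volume = year - 1899 if month >= 7 else year - 1900
    # July is issue 1 of its volume, June is issue 12.
    issue_number = (month - 7) % 12 + 1
    return volume, issue_number


print(isr_volume_and_issue(1909, 7))   # (10, 1)
print(isr_volume_and_issue(1910, 6))   # (10, 12)
print(isr_volume_and_issue(1914, 5))   # (14, 11)
```

The last result agrees with the designation v14n11 used in these notes for the May 1914 issue.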

Additionally, we've broken out the tables of contents that Kerr provided for Volumes 1 thru 10, and included them here as separate (searchable, with OCR) files, for your easy reference.

After 1910, Kerr ceased to publish bound volumes of 12 issues of the publication, and ceased creating a table of contents for the entire volume.

I spent easily 10 hours extracting, one by one, the tables of contents from individual issues, and combining them into pdf files to cover each volume for Volumes 11 thru 18. These files are (mostly, given limitations of OCR) searchable. Thus you have at least a crude and partial finding tool for searching quickly for authors and key words in titles of articles. It is my hope that eventually someone uses these... and the issue scans... as raw material to produce cleaned up, plain text tables of contents for Vols 11 thru 18 of this publication. THAT, however, is many many hours of work. More time than I had left after the exhausting task of assembling this archive to this point.

Note regarding Kerr's tables of contents for Vols 1 thru 10:

The publisher provided a volume-long table of contents when the issues were bound into a volume and sold as such. This finding aid at first glance LOOKS like it is an index. It consists of one entry per article, presented for the entire volume organized by author, in alphabetical order of the authors' names.

WHY didn't they list the articles in order of appearance... in order of page number in the volume? I've no idea. I've never before in my life seen a table of contents listed alphabetically by author, instead of by increasing page number of appearance of the articles. And indexes are ALWAYS listed in alphabetical order. So it's easy to mistake these listings for indexes. But they are tables of contents... a list ONLY of the articles, with no reference to content of interest such as place names, person names (other than the authors), etc.

Note regarding the tables of contents I put together from scans of tables of contents of individual issues:

It took Kerr and Co. quite a while to figure out some obvious things about how to compose and print a table of contents page. For Volume 11 (starts in July of 1910) up to the final issue of Vol 14 (June 1914) they neglected to put on the table of contents page the date or volume and issue number of the publication. So when presenting the table of contents, there's no clue on that page, if that is all you are looking at, which issue the table of contents is for! To do a quick and dirty fix for this, I alternate scans of table of contents pages for those issues with a partial scan of the first page of the first article, which always DOES have the date of the issue and the volume number and issue number of the issue. I included in the scan the title and author of the first article, so you can confirm it goes with a given table of contents page.

However, there's ANOTHER bizarre problem with the tables of contents Kerr provided in individual issues for Volume 11 up until the final issue in Volume 14: They left out listing of page numbers on the table of contents page, providing only a listing of the titles of the articles, and of the authors.

In July of 1914, the last issue in Volume 14, when they FINALLY figured out it would be a good idea to put the date and volume and issue number on the table of contents page, it also finally dawned on the prodigious-size brains at Kerr publishing that it MIGHT be a nice thing to put page numbers in with their tables of contents. Happily, tho it took them four years to figure out that tables of contents in a magazine should both have the date/volume and issue information AND note what page a given article appears on, when they finally did reach the epiphany of this enlightened understanding, they continued to do things right after that to the end of publication of this periodical.

More details about what distinguishes this digital archive of ISR from what you might try to read and/or download elsewhere on the Web (from Google Books, Archive.org, or Hathi Trust):

The archive presented here is a hybrid. Most of the early years... the pre Kerr as editor years... volumes 1 through 7... I merely downloaded from Google Books, broke into individual issue (and table of contents) files, and presented here. I did this because for those years, Google did a fine job of scanning the all-text issues, capturing every page, and producing nice images using 600 dpi single bit black and white scan mode, THE most appropriate mode for this sort of material.

No problems there.

However, when ISR began to present photos (its very lavish use of photos, which appear on approaching half of the pages of later issues, became part of its signature nature), things with Google's scans changed.

Google, it appears, developed software that attempted to vary the scan mode as it scanned a single page, attempting to render text one way (properly as single bit BW at 600 dpi) and photos and art another way (300 dpi gray scale). When this system worked, it produced quite nice renditions of the page. Frankly, I wish I had software that did this!

Unfortunately, this system failed as much as half the time, rendering text in washed out low resolution gray because the "art / photo content" mode triggered falsely. Or rendering text part in clear black and white and part in faded gray, for no apparent reason. Or rendering line drawn political cartoons and other line drawn art, which are most appropriately rendered in high resolution single bit BW, in low resolution gray scale.

Worse, it appears Google used automatic book scanning equipment for its project of scanning ISR. On numerous occasions, the machine malfunctioned. Sometimes it would mis-register the pages, capturing only part of the page content. Other times entire pages would be missed. Sometimes two pages in sequence. Sometimes every odd or every even page is missing for runs of 20 or 30 pages in a given issue.

There appears to have been zero... not little, but truly ZERO... quality control on Google's part when creating its archive of ISR, given how serious, glaring, and many are the errors I found in what Google produced.

There was also at least one issue of ISR that Google never managed to find in the collections of original issues at Princeton, Harvard, Cornell, Ohio State, Northwestern University, Wisconsin Historical Society, or the New York Public Library (all variously used as source for some of the material Google scanned of ISR). This was the May 1914 issue, v14n11. I could not lay hands on an original of this issue, either. But there was a pretty decent image of it available in the Greenwood Reprints set of ISR, and the University of California at Davis very kindly and generously permitted me to borrow that volume and scan it, so that I could present to all a complete run of this publication.

Why didn't Google do this? It would have been trivial for them to do so. Well, what do you expect from an outfit that posts numerous issues with 15 to 20 pages missing from the issues, 15 to 20 pages mis-registered by their automatic scanner so badly that part of the text on each page is missing, or issues with pages that show serious problems due to incompetent selection of scanning mode (due to an imperfect automatic scan mode selection algorithm, I suspect)? As I said, there was zero evidence of ANY amount of quality control. Zero evidence anyone involved with that project cared in the slightest about what they were presenting to the public.

It surprised me more... and saddened me... that Hathi Trust... a respected name in academia as an outfit amassing a huge collection of digital images of published material... also apparently didn't bother to so much as LOOK at the scans Google handed over to them. You'd think Hathi Trust might have someone spot check for quality when Google hands over a job for them to present. You'd think THEY... a huge and highly financially endowed organization... could and would say to Google: "Excuse me... it seems there are numerous serious errors here... missing pages and missing issues... malfunction of your page-turning automatic scanner... could you please go back and fix these?" But it would appear that never happened. The evidence suggests... tho I must repeat I don't know for sure what the facts are... that Hathi Trust cared (at least for this publication) as little about presenting massively flawed images due to obvious errors as Google did.

It is a shame that it took ME... one who hasn't a penny to Hathi Trust's $100 to $1000, and not a penny to Google's $10,000 or more... to do a conscientious job of assembling and presenting this archive.

Acknowledgements:

I worked (on and off) on this digital archive for 7 years before having ready an initial release version that presents... scanned in part by Google, in part by me personally... every issue and very near every page of this publication.

For the most part, Google and I scanned original issues. But in some cases, I used the Greenwood Reprint. I used it to present the May 1914 issue, which Google did not present, and which I could not lay hands on as an original.

I sometimes used the Greenwood Reprint to scan line drawn art (which looks pretty near as good in the reprint as it does in the original, given the nature of Greenwood Press' reduction technology), where Google's scans of such were particularly poor. Where I lacked original issues, and needed to replace missing or mis-registered pages in the Google issues, I'd go to the Greenwood Reprint. Their text pages look as good or better in a scan as a scan from the original. The reproductions of photos in the Greenwood Reprint are a bit inferior in quality, when scanned, to a scan of the photo in the original publication, due to issues relating to analog reproduction of photos and to the half tone means of representing the photo in the original publication. But given the intrinsic technological issues involved, Greenwood Press did an exceptional job of reproducing those photos. An immensely exceptional job.

So first off, I want to thank the now long gone Greenwood Press, for providing for me a high quality "court of last resort" for obtaining missing pages and issues.

I want to thank John Durham of Bolerium Books, Lorne Bair of Lorne Bair Books, and Eugene Povirk of Southpaw Books, who looked out for issues of ISR and, when they got hold of them, sold me many of the issues currently in my collection, which I personally scanned for this project.

I want to thank Holt Labor Library for loaning to me, and permitting me to disassemble as needed, its holdings of International Socialist Review, to augment the number of issues I could present scanned from original issues.

I want to give very special thanks to Tim Davenport, a history scholar currently working on the definitive six volume collection of quotes of Eugene Debs. Tim graciously, and out of his immense commitment to making this history freely available to all, digitally, loaned me all of his holdings of ISR, including one bound volume of 1915 (12 issues), and permitted me to unbind the volume and, as needed, all issues, so that the highest quality scans could be made of them on a flat bed scanner.

I want to thank Julie Herrada, director of Labadie Library, who kindly scanned a color cover for me from Labadie Library's originals.

I want to thank Paul Thomas, a librarian at the Hoover Library at Stanford, who kindly looked into whether his library held specific covers I wanted color scans of. Paul had helped me out extensively in this fashion with a previous archive I made of Labor Defender, but unfortunately this time was unable to find holdings of the covers I needed.

Last but certainly not least, I want to thank David Walters, director of Holt Labor Library and a founding administrator of Marxists Internet Archive. David was the one who got me started on this now decade-long project of digitally archiving the periodicals and pamphlets of the socialist and communist left in the USA in the 20th century. David and I have worked closely together planning, scanning, and presenting on MIA hundreds of thousands of pages I've scanned as part of the Riazanov Library digital archive project. In the case of this archive, David approved the loan and disassembly as needed of the ISR holdings of Holt Labor Library for the project. There's quite a story of how this project began and how it grew... but that will have to be told elsewhere.

This was an unusual project for me. MOST of the time when I do a big digital archive project, of ten or more thousand pages, I personally (or I with one assistant) do the scanning. In this case, all of the scans of the all text early issues (Vols 1 - 8), and perhaps 45% of the scans of the more historically, politically, and visually interesting later issues (1909 and up), were done by Google. They are presented here after between 0% and 30% of the pages in each of those issues, deemed by me to be unacceptable in quality or entirely missing, were scanned by me and restored to / substituted into Google's scan for that issue.

Technical specs on the scans of ISR I made: (this note written May 2019)

Years ago I scanned some of the original ISR issues ... including all issues in Volume 17, among others. At that time, early on in my learning the art of scanning, I used lower resolution and slightly different general techniques than those I deem appropriate today. Partly out of ignorance, and partly because I didn't have as good scanning equipment, and partly because of, back then, a sense of some degree of priority on not making the scans bigger than a certain file size... a size that now is quite reasonable given increases in data storage capacity and in data transfer rates.

In these earlier scans, I used:

400 dpi single bit BW for text.
600 dpi single bit BW for line drawn art.
600 dpi 8 bit gray scale for photos and for some art.
600 dpi 24 bit color for color covers.

I also in those earlier scans made very heavy use of multiple scans for pages bearing different sorts of material (such as text and photos, or art and photos).

The issues I scanned more recently, in 2019, I approached somewhat differently:

600 dpi single bit BW for text.
1200 dpi (at times 2400 dpi) single bit BW for line drawn and some other art.
400 dpi 8 bit gray scale.
600 dpi 8 bit gray scale for full page scans of photos.
1200 dpi (at times 2400 dpi) 8 bit gray scale for small crops of some art.
400 dpi color for some back covers with slight use of a single color.
600 dpi color for color covers.

While I continue to use crops done in different modes from the mode used to scan the whole page, I changed my scanning strategy of late somewhat, so as to have this occur a lot less often... so that significantly fewer pages required multiple scans than was the case before.
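To give a rough sense of how these scan modes trade off against file size, the sketch below computes the uncompressed raster size of a scan from its resolution and bit depth. The page dimensions are an assumption chosen only for illustration (they are not measured from ISR), the function name is hypothetical, and real image or pdf files will be smaller after compression.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Raw (uncompressed) size in bytes of a scan of the given area."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Hypothetical page size in inches, chosen only for illustration.
PAGE_W, PAGE_H = 7.0, 10.0

modes = [
    ("600 dpi single bit BW", 600, 1),
    ("1200 dpi single bit BW", 1200, 1),
    ("600 dpi 8 bit gray scale", 600, 8),
    ("600 dpi 24 bit color", 600, 24),
]

for label, dpi, bits in modes:
    size_mb = uncompressed_bytes(PAGE_W, PAGE_H, dpi, bits) / 1_000_000
    print(f"{label}: {size_mb:.2f} MB uncompressed")
```

Under these assumptions, a 1200 dpi single bit scan takes half the raw space of a 600 dpi 8 bit gray scale scan of the same area, which is part of why very high resolution is affordable for line drawn art.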

Martin H. Goodman MD
Director, Riazanov Library digital archive projects
May 2019 San Pablo, CA USA



Last updated on 15 May 2019