Beyond Turnitin and anti-plagiarism software

At my university, the European University Institute in Florence, Italy, the Dean of Studies and the Academic Service recently decided to introduce the systematic use of anti-plagiarism software. The aim is for individual Ph.D. researchers to check the chapters and drafts of their dissertations during the four-year research and writing process and verify the originality of the contents. We want to avoid having researchers shamed and expelled from the community of scholars like this student in Norway!

So, at the end of the process, when the thesis is submitted, each supervisor should perform this new anti-plagiarism check directly on the manuscript of his/her supervisee. This task, and the instruments available to perform it, are evidence today of the worldwide shift towards the digital. It is taken for granted that everything we write is somewhere in virtual space and can be retrieved and analyzed to avoid using someone else’s ideas without acknowledgment. This is an extraordinary shift in the humanities towards “other” humanities. In a way, it introduces a bit of digital humanities for everybody!

At the EUI, this task, which used to be performed by the staff of the Dean of Studies and the Academic Service, now has to be performed directly by the thesis supervisor before the department officially allows a candidate to submit a thesis for discussion with the jury. Turnitin has been chosen as the software, and new administrative rules have been introduced on how to use it. Now scholars on both sides of the Ph.D. writing process, the one who writes the thesis and the one who supervises it, are involved with digital tools. This is something that never happened before.

Introductory courses on plagiarism, originality checks, good academic practice and, finally, Turnitin itself have been organized for the first time this academic year, 2013-2014, for all new doctoral researchers. As History Information Specialist, I was asked to contribute both to the general discussion about plagiarism and to teaching the correct way to use quotations in one’s own research and writing. As far as the history department is concerned, I am helping to prepare all its members (researchers, fellows and professors) to understand how they should proceed with the software. I will teach some Atelier Multimédia courses about it. To do so, I would like to have the input of THATCamp Leadership. The first introductory course, on the 8th of October, will be about Good Academic Practice and the Avoidance of Plagiarism. But it is not this specific contribution in the EUI context that I would like to question here. I would like to bring to the attention of THATCamp Leadership participants the many queries and reflections on the use of such software that challenged, at least for me, a “simple” task: showing how to use Turnitin. This task became more complicated than I thought. I started to think beyond plagiarism and to look at what an “originality check” means in a new digital scholarly process in the Humanities and History. What could we all do with Turnitin? And taking for granted that all EUI scholars will have to use it, what should I tell those who have never used any software before?

So my proposal to THATCamp Leadership would be to look at this software (and similar software) from a different viewpoint. Is it possible to allow our community of humanists and social scientists to integrate “text-mining”, one of the most important methods that has enriched document retrieval and document analysis in the field of Digital Humanities, when teaching how to use plagiarism software? Here are some possible issues to discuss during THATCamp:

Turnitin is software against plagiarism. Are there any other packages you would recommend, and why? Anything in the OA/OS world?
Do you use this software only for originality checking and fighting plagiarism?
Which other tasks could it perform? Does it allow us to learn more, and more easily, about deep web contents? And if so, how and why?
How could we trace the originality of translated texts (from English to other languages and vice versa), using corpora in different languages?
Could we think of using Turnitin to understand who is quoting what and in which contexts, and the many other ways we interact with big online commercial textual databases like EEBO, ECCO, MOMW I & II, etc., or with open access web databases like Rousseau online?
To what extent would these textual databases, accessed through Turnitin, allow contextualized keyword searching, similarity searching, frequency searching, etc., so that we can understand whether a quotation we plan to use has already been used, entirely or partially, in other writings, and how, where and by whom?
Could we perform with Turnitin a much more complex citation search than the one we have been able to perform for years with the Web of Knowledge (ISI) when, looking at the footnotes in a scholarly paper, we deduce that if somebody uses the same quotations, he/she may be researching in the same field and have similar ideas?
Which text-mining activities does this software allow if we accept that Turnitin is a good Digital Humanities tool, able to perform some of the most important tasks on “big amounts of data”: distant/close reading, searching for contexts, origins of quotations, positions of words in millions of documents?
And, as a consequence, could we discuss whether this is not only about plagiarism, but whether this kind of software may become a vector for introducing wider communities, not only the digital humanities community, to new ways of performing their research activities? Does it take care, in daily research activity and even without our knowing it, of some characteristics of both the linguistic turn and the digital turn, if we may use big concepts?
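
To make the idea of “similarity searching” for a quotation a little more concrete, here is a minimal sketch in Python. It is not how Turnitin works internally (that is not documented here); it only illustrates one common text-mining technique, word n-gram overlap, under the assumption that the texts are available as plain strings. The function names (shingles, containment, rank_sources) and the tiny sample corpus are invented for the illustration.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams ("shingles") in text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(quotation, document, n=3):
    """Fraction of the quotation's n-grams that also occur in the document."""
    quoted = shingles(quotation, n)
    if not quoted:
        # Quotation shorter than n words: nothing to compare.
        return 0.0
    return len(quoted & shingles(document, n)) / len(quoted)


def rank_sources(quotation, corpus, n=3):
    """Rank documents (a dict of name -> text) by how much of the quotation they contain."""
    scores = {name: containment(quotation, text, n) for name, text in corpus.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical two-document corpus, for illustration only.
corpus = {
    "a": "Man is born free, and everywhere he is in chains.",
    "b": "The weather in Florence is mild in October.",
}
ranking = rank_sources("man is born free and everywhere he is in chains", corpus)
```

In this sketch, ranking lists document "a" first because it contains every three-word sequence of the quotation, while "b" contains none. A partial or reworded quotation would score somewhere in between, which is roughly what “used entirely or partially” means in the questions above. Translated texts would defeat this approach entirely, since the word sequences no longer match.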

Turnitin seems to be an instrument that allows new digital experiments, though unfortunately with some technical limitations. Our session could try to look critically at the systematic introduction of these tools in universities worldwide: now that you know how to use it and what’s in it, which tasks do you think you could perform with such a tool? In what ways could this instrument become useful to you? And, maybe the most important question: in a global world where digital documents and primary sources aren’t all written in English, how could these experiments with digital texts take account of different cultural and linguistic frameworks?

Show me your data: Scholarly notes, public expectations

The “Show Me Your Data” session proposal from THATCamp CHNM 2012 summarized the open research notes movement this way:

There has been some move in research to not just publish papers with the final results but to also release the raw data sets and even software for other researchers to verify the results and further discovery. There are even some futuristic claims that the data sets will be viewed as the ultimate results of research and the actual paper will be a secondary product.

That session focused on institutional repositories as ways to present data, but we’d like to focus more on the challenges posed by releasing research data to the public.

What happens when data collected for a monograph is removed from context? Are there different scholarly and interpretive requirements for data presented at a single-record level? When the same data is of interest to scholars and the general public, but the goals of each constituency are radically different, what happens?

We’d like to kick off the conversation by discussing a collaboration in progress. In research for Take Care of the Living, Jeffrey McClurken compiled a database of census and Civil War service records for Pittsylvania County, Virginia. As this database is of tremendous interest to local historians and genealogists, and since his own family is connected to that county, Ben Brumfield offered to put that database online. However, the process has been challenging, as the interests and expectations of the public may be quite different from those of peer researchers, and a database compiled in support of a particular scholarly project turns out to be very different from a general-purpose database compiled for public use.

Let’s Make a Humanities Pre-Print Server

There are many complicated debates about open access, peer review, and the economics of publishing, and many ideas have been proposed. For the sake of brevity, I’m going to summarize two of them. The conservative position is that pre-publication peer review is essential to good scholarly work. It’s fair to say that this is the default position of most scholars and scholarly institutions. The radical position is that scholars should “liberate” their scholarship and publish only in open-access venues. As you would expect, these two ideas are frequently antagonistic. Most of the concrete proposals are essentially competitive, as in attempts to replace existing journals with open-access journals or to move peer review to post-publication.

But there is no reason that the scholarly value of pre-publication peer review and the scholarly value of open access need to conflict. What the academy needs is a solution that is realistic, one that recognizes that the entrenched system of corporate publication and tenure review is unlikely to go away, or at least unlikely to change quickly. And it needs a solution that is optimistic, one that takes advantage of the internet’s low marginal costs and rapid distribution, which make open access publication possible.

Our colleagues in physics, mathematics, computer science, and the like already have such a solution in the arXiv e-print server. arXiv hosts pre-prints (or “e-prints”) of articles that will be published in peer-reviewed journals. Scholars upload these documents, which are then freely available to the world much sooner than they will be available in gated journals. (There are many descriptors for levels of open access: let’s call this “good enough” open access.) For those who need them, the peer-reviewed versions of the articles will still be available in the traditional venues.

I propose a session that will bring together people who are interested in bringing about a pre-print server for the humanities. Make no mistake: the problem is not technological, it is institutional. What is needed to change academic publishing is the will to put such a solution in place—in a word, leadership. These are the kinds of people at THATCamp Leadership who could help such a session:

scholars who could explain what they would hope to gain from a pre-print server,
leaders of professional organizations (AHA, OAH, MLA, ACLS, etc.) who could make the idea palatable to scholars in their disciplines,
grant writers and university administrators—especially in libraries—who would be willing to underwrite such an experimental project, and
coder-scholars who would be able to build a prototype, or at least to discuss what would go into a prototype.

The goal of the session will be to produce a brief document that will describe the essentials of a humanities pre-print server. And hopefully the session will forge connections between the people who can make this idea happen.

Geospatial Showcase

I’d like to propose a show-and-tell session for people who are making maps or want to get started with them. Anyone who wants to participate can spend a few minutes showing a map they made—and preferably, actually rendering the map in front of everyone else. Hopefully we’ll have a diversity of mapping methods which will give us a quick overview of the possibilities. Then for the remainder of the session, we’ll talk about what was interesting in the maps we saw, and how to make them. Perhaps we’ll break up into smaller groups so that mapping masters can give impromptu tutorials to beginners. By bringing together all the mapmakers into one place, I hope people will also be able to find someone who has already solved some of the problems they’re facing.

Policies and Safe Spaces for Diversity

Last August, Kate Losse wrote a brief post about how “breaking things” is a white male privilege. I’ve kept coming back to this post the last month or so. For years, I’ve been telling people to not be afraid to break stuff. That’s how I learned how to do much of the work I’ve been doing in the digital humanities for over a decade now. I still break stuff, and still learn from it. I’ve always thought it’s a good way to learn, but I had never considered it a privilege of being a white man to be able to break things until I read Losse’s piece. It never occurred to me that such an approach could be a privilege for a particular race and gender. It never occurred to me (and I’m embarrassed to admit it) that someone could not take this approach because of their race or gender or class or any number of other reasons. (Of course, like most white heterosexual men, I’m quite unaware of all the privileges I have. I willingly acknowledge it, and am in no way proud of it.) Knowing what I do know about our society and culture, it’s blatantly obvious to me that this would be true. But being able to break stuff, being able to try things out without permission and without fear of criticism or backlash, was one of the reasons we started THATCamp five or six years ago. I still think it’s a great approach, but it’s one THATCamp needs to work harder to open up to more people. THATCamp needs to grapple with who gets to do it, and more importantly who feels like they have permission to do it.

Similarly, earlier last month, someone on Twitter noted they were hesitant to attend a THATCamp that lacked a public anti-harassment policy. This also had never occurred to me (and once again I’m embarrassed to admit it), and made me sad and angry and disappointed that anyone would feel they wouldn’t be welcome at a THATCamp or would be harassed. (To be very clear, I’m not at all sad and angry and disappointed with the person who first posted this.) Amanda and I chimed in with interest to begin composing an anti-harassment policy, and Amanda forked the Code4Lib anti-harassment policy as a starting point. (That policy itself contains links to other policies, all of which I think should be required reading for anyone organizing an event, and required reading for anyone who thinks a policy is unnecessary.) But it seems like this is only the very tip of the very large iceberg that is diversity and THATCamp, which we should address more deliberately and sustainably. No one should feel like they can’t attend a THATCamp out of fear of harassment or unwelcomeness.

Both of these stories to me highlight a need for THATCamp to develop policies and spaces that foster comfort and confidence and diversity within and beyond THATCamp, and I can’t think of a more important and relevant topic for THATCamp Leadership to take on. I’d like to help organize a session or set of sessions that address ways THATCamp can contribute positively to already ongoing conversations on diversity, tolerance, and DH, and even begin developing documentation the THATCamp community can use for their own individual camps. I won’t claim to be the best person to lead these sessions—I have tons to learn, and I want to learn—but I want to help organize them, or at the very least strongly support having them at THATCamp Leadership.

These sessions should go beyond developing formal policies for things like anti-harassment, and more deliberately consider how the tone and language and character of camps can be safe and inviting and fun for a variety of people. In my experience, digital humanities as a whole, and THATCamp more specifically, is one of the more tolerant and accepting communities that exist, but there is plenty of room for improvement, and I’d hope these sessions would focus on those ways to improve, and take seriously any points from any person to consider for improvement. It doesn’t seem enough to me to develop a one-page document that more or less says “You can’t harass people.” The language and tone and character of everything THATCamps produce are more important, and should obviously contribute to making THATCamps safer and more encouraging.

In the end, I want to help make it so every person who attends a THATCamp leaves more confident in themselves, and has more friends who value their ideas and perspectives, than they did before they attended THATCamp. The scope of these sessions is vast, but that shouldn’t deter us from having them and trying to nurture some positive outcomes.

I hope we can discuss some specific topics to cover in the comments, and possible outcomes of such sessions. Please feel free to share any ideas, links to stuff we should read/watch/hear for the conversation, in the comments below. If you’d like to talk to me directly, feel free to email me or ping me on Twitter.

Talk Session: Spreading Innovation

It seems like I know many early adopters in the digital humanities, especially at small liberal arts colleges. I’m interested in how we can cross the chasm. How do we move digital humanities into the mainstream? Having recently started a new position as Director of Instructional and Emerging Technology, I am conscious of the need to encourage innovation but also to move innovations into the mainstream (and to figure out which ones deserve to be moved). We also have a task force on Academic Innovation and New Educational Approaches working right now.

One article about spreading innovation in science education is: Adrianna Kezar. “The Path to Pedagogical Reform in the Sciences: Engaging Mutual Adaptation and Social Movement Models of Change.” Liberal Education 98, no. 1 (Winter 2012): 40–45. It is available online here: www.aacu.org/liberaleducation/le-wi12/kezar.cfm

Discussion might center on successful strategies for evaluating innovation and spreading it across campus.

Call for Proposals for THATCamp Leadership

DC is a bit of a shutdown town right now — it’s a little hard to think of anything else. And the shutdown means that THATCamp Leadership next week may well be deprived of the insight and company of some of our friends and colleagues from the NEH, NARA, the Library of Congress, and the Smithsonian. Nevertheless, because THATCamp Leadership is an unconference whose agenda hasn’t been set yet, at least we now have the option of putting discussions of the shutdown on said agenda if we so choose.

Even if you don’t want to talk about the shutdown, though, you can still say what you would like to talk about — or make, or learn, or teach, or play — and your saying will help determine what we do next Thursday. There are two session proposals already up at the THATCamp Leadership site, one (mine) proposing a discussion of Purdue’s “Signals” software and one from Ryan Cordell on “THATCamp Hierarchy,” and you can read more about proposing sessions on the Propose page at leadership2013.thatcamp.org/propose/.

Remember that to propose a session, you will need to log in to the site at leadership2013.thatcamp.org/wp-login.php (click “Lost your password?” on that page if need be) and go to Posts –> Add New.

There you will be able to write a brief description of a session you’d like to facilitate. You can also add categories from the right-hand side, including the category “Session Proposals.” If all that sounds like too much, don’t worry: you can always suggest something on Thursday morning.

You can read through (and “favorite”) proposals in the coming days, and we’ll also post them around the Mason Inn on the morning of the unconference. If you’d like to get the proposals by email, go to the “Entries by email” link on the front page at leadership2013.thatcamp.org.

Looking forward to foregathering next week. See you online before then, and see you in person soon.

Talk Session: On the Signals software

I was very interested to read a story in last week’s Chronicle of Higher Education about “Signals,” a piece of software developed at Purdue that gives students feedback about how they’re doing in a course: chronicle.com/blogs/wiredcampus/purdue-u-software-prompt-students-to-study-and-graduate/46853 The data on the software’s effect on student retention was truly astonishing. Signals seems to be tightly integrated with Blackboard, but it sounds reminiscent of the promise of MOOCs to improve “learning analytics”: online.stanford.edu/news/2013/04/11/learning-analytics-stanford-takes-huge-leap-forward-moocs, even though the analytics it presents are available to students. We talked about the Signals story a bit on our podcast, Digital Campus, but I’d love to look into it even more, and to get different perspectives. It did make me wonder whether MOOCs even have the right approach to learning analytics, since mostly what I’ve heard about them suggests that such analytics are only provided to faculty and administrators rather than to the students themselves. It also made me wonder whether software like Purdue’s could be adopted and customized by universities themselves using in-house developers.

THATCamp Hierarchy 2014?

I’ve been saying lately—and I know I’m not alone here—that I’m suffering a bit of “THATCamp fatigue.” The unconference format, which was truly exciting, even liberating, the first time I experienced a THATCamp, has become a bit stale for me. One reason for that staleness, I think, is that the loose structure can inhibit as much as it can facilitate conversation and work. It sometimes seems the same introductory conversations recur over and over: (how) can one get professional credit for digital work? How do you use Omeka/Voyant/another DH tool in the classroom? How can we improve access to DH for those outside R1 universities? &c. &c.

I don’t want to denigrate such conversations: they’re necessary for the field, and especially for those just entering the field. But I worry there’s higher-level work that could be happening at THATCamps if there were space to make clearer distinctions among participants, not along typical lines, perhaps, but distinctions nonetheless. If we could say something like, “this session on geospatial literary analysis will presume a good working knowledge of ArcGIS,” then we could move beyond talking about mapping and toward doing some cool mapping.

I realize there’s nothing hard-wired into the current structure of THATCamp to prevent organizing such sessions, and that high-level discussion and building have certainly happened at THATCamps. But I do think there’s some soft-wiring at play in most sessions: people read the injunctions toward welcoming and away from hierarchy as requiring all sessions to speak to all possible campers. And so sessions drift inexorably toward 101.

I don’t know how to address this problem. We might develop a rankings system for proposed sessions that would signal the level of expertise expected, but that does seem to jostle the THATCamp ethos. We might develop guidelines making it clear(er) that advanced sessions are welcome. I don’t have a good answer, but I would like to do some meta-thinking about how THATCamp might address this issue moving forward. THATCamp provides a valuable on-ramp for DH newcomers, but it will need to be more than just an on-ramp in order to keep experienced practitioners invested in the movement.

Preparing for THATCamp Leadership

Hi all, I’m so glad you’ve agreed to come to THATCamp Leadership. For some of you, this will be only one of many THATCamps you’ve attended (or organized); for others of you, it will be the first THATCamp and probably the first unconference you’ve ever attended. I’m writing this in order to give you more information about the purposes and processes of this special event.

First, a bit about purpose. THATCamp Leadership will

introduce academic leaders such as yourself to THATCamp and to the unconference model;
carve out some retreat-like time for discussion of and work on issues related to the humanities and technology; and
begin the process of creating a THATCamp Coordinating Council to administer the THATCamp project.

“That’s all very well,” you may be saying to yourself, “but how on earth does one approach this ‘unconference’ animal? What is expected at THATCamp Leadership, what will happen there, and what am I to do?”

Here, then, is a bit about process. To prepare for THATCamp Leadership, you should

pack your business casual clothes,
bring a laptop (or a tablet, though I advise a laptop),
have an idea for a session (optional, but strongly recommended), and
show up.

That’s all, really. The essence of an unconference is that it is participant-driven, and therefore those who attend will collaboratively decide what the agenda will be in the first session of the first day. What will happen on Thursday, October 10th, 2013 at THATCamp Leadership is therefore mostly to be determined — by us. Most people propose a session (or several) a week or two before the event begins, but you can also chime in with an idea as we work together to set the schedule at the beginning of the day on 10/10/13.

You can learn more about the principles and processes of THATCamp at THATCamp 101, and you can read further about proposing sessions and see examples of session proposals at leadership2013.thatcamp.org/propose/. I will also email you around October 1st to encourage you to propose something and to explain the process further.

Tomorrow I will add all participants to the THATCamp Leadership site as users; you will receive your username and password (if you don’t already have one) by email. This will allow you to log in to the site at leadership2013.thatcamp.org/wp-login.php and edit your profile (which will be listed on the Participants page) and/or write an early proposal. Don’t hesitate to write me at info@thatcamp.org if you have questions. I’m looking forward to unconferencing with you all.