
What is the acceptable similarity in a mathematics PhD dissertation when checking with Turnitin?

I have checked the originality of my PhD thesis in mathematics using Turnitin. The similarity was 31%. Is this percentage acceptable to most committees?

  • mathematics
  • plagiarism-checker

– Sursula

  • 1 I would imagine that this would vary from university to university. –  user21984 Commented Oct 10, 2014 at 10:45
  • 7 Provided that we are speaking about the ratings provided by automated tools (somewhat implied by the "similarity"): the answer should probably be that it does not matter. Any single copied paragraph that is beyond coincidence is a reason for rejecting the thesis. At the same time, for a thesis, someone should definitely check all potential cases of plagiarism that an automated tool provides. Otherwise the department would use the automated checking tool in a plain wrong way. If the rating is "90% plagiarism", but all cases found by the tool are false-positives, then this should be fine. –  DCTLib Commented Oct 10, 2014 at 11:08
  • 1 Different (but similar) question with good answers and comments is asked in this link: What is the range of percentage similarity of plagiarism for a review article? –  enthu Commented Oct 10, 2014 at 11:14
  • 13 Surely you know whether or not you've plagiarised. If you have, you'll be removing that plagiarism before you submit, regardless of the Turnitin score. So why are you checking your own thesis with Turnitin? –  410 gone Commented Oct 10, 2014 at 14:19
  • 1 @FranckDernoncourt: I do not think anybody will need a link to Turnitin. See also this Meta discussion . –  Wrzlprmft ♦ Commented Oct 10, 2014 at 16:53

5 Answers

Is this percentage acceptable to most committees?

This is the wrong question to be asking, since academic decisions are not made based on a numerical measure of similarity from a computer program. The purpose of this software is to flag suspicious cases for humans to examine more carefully. It will identify passages that appear similar to other writings, but it can't decide whether that constitutes plagiarism.

For example, part of your thesis might be based on previous papers you have written. In some circumstances, it may be reasonable to copy text from these papers. (You need to check that your advisor approves and that it doesn't conflict with any university regulations or the publishing agreement with the publisher.) Of course you would need to cite the papers and clearly indicate the overlap. It's not plagiarism if you do that, but Turnitin doesn't understand what you've written well enough to distinguish it from plagiarism. So it's possible that Turnitin would flag lots of suspicious sections, but that your committee would look at them and see that everything is cited appropriately.

If you haven't committed any plagiarism, then you don't need to worry about this at all. If you genuinely write everything yourself (or carefully quote and cite anything you didn't write), then there's no way you could accidentally write something that looks like proof of plagiarism. There's just too much possible variation, and the probability of matching someone else's words by chance is negligible. The worst case scenario is that Turnitin flags something due to algorithmic limitations or a poor underlying model, but human review shows that it is not actually worrisome. (Nobody trusts Turnitin more than they trust their own judgment.)

I'll assume you haven't knowingly committed plagiarism, but is it possible that you honestly wouldn't know? Unfortunately, the answer is yes if you have certain bad writing habits. For example, it's dangerous to write while having another reference open in front of you to compare with. Even if you don't copy anything verbatim, it's easy to write something that's just an adaptation of the original source (maybe rewording sentences or rearranging things slightly, but clearly based on the original).

If that's what worries you, then you should take a look at the most suspicious passages found by Turnitin. If they look like an adaptation of another source, then it's worth rewriting them. If they don't, then maybe Turnitin is worrying you unnecessarily.

But in any case a plagiarism finding won't just come down to a percentage of similarity. Any percentage greater than 0 is too much for actual plagiarism, and no percentage is too high if it reflects limitations of the software rather than actual plagiarism.

– Anonymous Mathematician

  • 5 Something that may be different for math papers is that many definitions are standard enough that the wording is almost exactly the same in all papers. I would not try to get "creative" with the definition of a complete metric space for example. –  Sasho Nikolov Commented Oct 24, 2014 at 22:31

TurnItIn uses a complicated algorithm to determine whether a piece of text within a larger body of work matches something in its database. TurnItIn is limited to open-access sources and therefore has huge gaps in its ability to detect things. Further, while TurnItIn can in some cases exclude things like references and quotes from the similarity index, it sometimes fails. Overall, when my department's academic misconduct committee looks at TurnItIn reports, we essentially ignore the overall similarity index. We do not completely ignore it, in that it guides how we are going to further examine the document.

We employ 4 different strategies based on whether the similarity index is 0, between 1 and 20 percent, between 20 and 40 percent, or over 40 percent. A piece of work with a similarity index of 0 is pretty rare and generally means that students have manipulated the document in a way that TurnItIn cannot process (e.g., if a paper is converted to an image file and then converted to a PDF, there is no text for TurnItIn to analyse). A similarity index less than 20 percent can arise from work that contains no plagiarism, with the similarity being quotes, references, and small meaningless sentences. The key here is "meaningless". For example, there are only so many ways of saying "we did a t-test between the two groups", and it is reasonable to assume that someone else has used exactly the same wording. A piece of work with a similarity index less than 20 percent can also, however, include a huge amount of plagiarised material. A similarity index between 20 and 40 percent generally means there is a problem, unless a large portion of text that should have been skipped was not (e.g., block quotes, reference lists, or appendices of common tables). A similarity index in excess of 40 percent is almost always problematic.
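The four bands above amount to a simple triage lookup. The sketch below is a hypothetical illustration only (the function name and messages are invented, not part of Turnitin or any department's actual policy), and, as stressed throughout, a real committee reads the full report rather than the number:

```python
def triage(similarity_pct):
    """Map an overall similarity index to the review strategy described
    above.  Hypothetical helper for illustration -- committees examine
    the matched passages themselves, never just this number."""
    if similarity_pct == 0:
        return "suspicious: document may have been manipulated to evade text extraction"
    if similarity_pct < 20:
        return "often benign (quotes, references, stock phrases), but review matches anyway"
    if similarity_pct <= 40:
        return "generally a problem unless block quotes or reference lists were not excluded"
    return "almost always problematic"

print(triage(31))  # the asker's 31% falls in the 20-40 band
```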

You really should not depend on the overall similarity index. First and foremost, you should depend on your own following of good academic practices. If you have followed good academic practices, there really is no need for TurnItIn. If you want to use the TurnItIn report, you should look at what is being matched and ask yourself why it is matching. If it found something you "accidentally" cut and pasted, or "inadvertently" did not reword appropriately, fix it and use that as a wake-up call to improve your academic practice. If everything it is finding is properly attributed quotes or common tables (or questionnaires, etc.) and references, then there is no problem.

– StrongBad

I have some familiarity with Turnitin, though that was way back in undergrad. The thing about similarity engines is that they aren't perfect.

It's important to consider exactly how Turnitin describes itself on its FAQ .

What does TurnItIn actually do?

Turnitin determines if text in a paper matches text in any of the Turnitin databases. By itself, Turnitin does not detect or determine plagiarism — it just detects matching text to help instructors determine if plagiarism has occurred. Indeed, the text in the student’s paper that is found to match a source may be properly cited and attributed.

When we were testing Turnitin in high school (probably a decade ago) with a short writing prompt (a page or two) with a single source, the entire class ended up getting a 15 to 20% similarity score, because not only did our sources match, but our quotes matched. No surprise there, really.

Now, consider how large Turnitin's database has grown. If this FAQ is to be trusted, you're comparing your paper to more than 80 thousand journals.

Turnitin’s proprietary software then compares the paper’s text to a vast database of 12+ billion pages of digital content (including archived internet content that is no longer available on the live web) as well as over 110 million papers in the student paper archive, and 80,000+ professional, academic and commercial journals and publications. We’re adding new content through new partnerships all the time. For example, our partner CrossRef boasts 500-plus members that include publishers such as Elsevier and the IEEE, and has already added hundreds of millions of pages of new content to our database.

If I recall correctly, you can see exactly where your paper has similarity with others, so you can pull that up.

Sources of Similarity

My bet is that your paper cites papers almost identically to how another paper cites theirs. The great benefit of common citation styles like APA and MLA is that they're consistent.

If you cite, for example, the general APA format from Purdue, and someone else cites it, they're going to match at almost 100%.

Angeli, E., Wagner, J., Lawrick, E., Moore, K., Anderson, M., Soderlund, L., & Brizee, A. (2010, May 5). General format. Retrieved from http://owl.english.purdue.edu/owl/resource/560/01/

The chances of you citing a paper that has never been cited before, when compared to the world of science, are, let's face it, probably 0%. Someone out there has cited your sources at some point. With sources being at times up to 10% of the paper's length, that's an easy portion we can knock out.

The other portion likely has to do with the vernacular that is used to describe a situation. Let's go with the following statement, written entirely off the top of my head.

Java is an object-oriented programming language.

Pretty simple statement, and true enough that it has been mentioned 260,000 times already, in that exact wording.

Similarity for that statement would be 100% if it were checked verbatim. And when you loosen the check (i.e., remove the quotes from the search), you get several million hits.

Does that mean I plagiarized? Nope. Would TurnItIn flag it? Definitely. Consider how often everyday people greet each other with "How was your weekend?" Are we plagiarizing each other's greetings? Nope. We pick up similarities in how we use language to understand each other, and that shows in papers, where we describe confidence intervals, methodologies, and processes the same way.

Perhaps even more concerning for the similarity score is that it will likely rate the two following statements as similar:

Statement 1

The double helix of DNA was first discovered by the combined efforts of Watson and Crick. Watson and Crick would later get a Nobel Prize for their efforts.

Statement 2

The double helix of DNA was not first discovered by the combined efforts of Watson and Crick, but by Franklin. Watson and Crick would later get a Nobel Prize for her efforts.

Two very similar sentences. 80-90% similarity word-wise. Meaning-wise? Completely different. That's why the human element is required. We can tell that those two statements tell an entirely different story when read. These small similar sets of wording add up quite quickly, and a 30% similarity in your case, given the level of research probably done in whatever your field is, and the number of sources you have probably cited (100+?), is unlikely to be anything to fret about in this day and age.
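A toy token count makes the point concrete. This is not Turnitin's actual matching algorithm, just a crude word-overlap check on the two statements above, showing how near-identical wording scores high even when the meaning is inverted:

```python
# Crude word-level overlap between the two Watson/Crick statements above.
# NOT Turnitin's algorithm -- just counting shared words to show why
# near-identical wording looks "similar" even when the meaning flips.

def tokens(text):
    return [w.strip(".,").lower() for w in text.split()]

s1 = ("The double helix of DNA was first discovered by the combined "
      "efforts of Watson and Crick. Watson and Crick would later get "
      "a Nobel Prize for their efforts.")
s2 = ("The double helix of DNA was not first discovered by the combined "
      "efforts of Watson and Crick, but by Franklin. Watson and Crick "
      "would later get a Nobel Prize for her efforts.")

t1, vocab2 = tokens(s1), set(tokens(s2))
shared = sum(1 for w in t1 if w in vocab2)
print(f"{shared}/{len(t1)} words of statement 1 reappear in statement 2")
# -> 27/28: everything but "their" survives the meaning flip
```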

– Community

  • 1 Nice analysis, but I disagree with your conclusions. Do you have any evidence that "30% similarity is unlikely to be anything to fret about in this day and age." I haven't calculated the numbers for my department, although it might be worth doing, but I would estimate that over 3/4 of the cases of academic misconduct I have seen have an overall similarity index of less than 30%. Harder for me to estimate is the percentage of work that has a similarity index in excess of 30 percent that did not involve academic misconduct. –  StrongBad Commented Oct 10, 2014 at 14:23
  • @StrongBad I mean in his case, not in general, sorry D: I'm sure if we really wanted to, we could definitely break TurnItIn by forceful plagiarism at a <10% rating, and I know students will likely do that. I'll edit it to reflect that. –  Compass Commented Oct 10, 2014 at 14:25

From my experience with iThenticate (the version of Turnitin for journals and conference proceedings), I'd say that 30% similarity most likely indicates significant plagiarism or self-plagiarism (recycling of text). I would certainly investigate further to understand exactly where the similar text was coming from.

If the similar text is taken from sources written by other authors, then I would investigate further by reading the text carefully and comparing it with the sources. There are certainly false alarms raised by this type of software. For example, common phrases like "Without loss of generality, we can assume that..." and "Partial differential equation boundary value problem" will be flagged. Standard definitions are also commonly flagged. However, if I see long narrative paragraphs with significant copying, that's clearly plagiarism.

It's traditional at many universities to staple together a bunch of papers and call it a dissertation. Conversely, it's also very common to slightly rewrite chapters of a dissertation and turn them into papers. Either way, this is "text recycling."

Now that text recycling can be easily detected, commercial publishers are cracking down on it for a variety of reasons. First, the publisher might get sued for copyright violation if the holder of the copyright on the previously published text objects. A different objection is that the material shouldn't be published because it isn't original. As a result, text recycling between two published papers (in conference proceedings or journal articles) is rapidly becoming a thing of the past. This has upset many academics who have made a habit of reusing text from one paper to the next. Some feel that if the reused text is from a methods section or literature review, then the copying is harmless. Publishers typically take a harder line.

The situation with dissertations is somewhat different. In one direction journals have always been willing to accept papers that are substantially based on dissertation chapters with minimal rewriting. Since the student usually retains copyright on the thesis itself, there's no particular problem with copyright violation. Since dissertations traditionally weren't widely distributed, publishers didn't care that the material had been "previously published." I don't really expect this to change much in the near future.

In the other direction, there are two issues: First, will the publisher of journal articles object to reuse of the text in the dissertation as a copyright violation? You'd need to check with the publisher. Second, will the university be willing to accept a dissertation (and perhaps publish it through Proquest or its own online dissertation web site) that contains material that has been separately published? That really depends on the policy of your university and the particular opinions of your advisor and committee.

– Brian Borchers

I have used websites in the past to help with similar content; they will give you a report of what was found online and help you remove or reword the similar content so you don't have to worry about your document being marked as plagiarism.

– eykanal

  • He doesn't have to worry about the document being marked as plagiarism, no matter what a program says. –  Austin Henley Commented Oct 24, 2014 at 17:57



iThenticate – Similarity check for researchers


Why and what?

Maastricht University endorses the principles of scientific integrity and therefore provides services to check for the similarity between documents. Separate services are provided for research and educational purposes.

Every UM-affiliated researcher can use this service. iThenticate – provided by TurnItIn – compares your submitted work to millions of articles and other published works, and billions of webpages.

Check a manuscript / PhD thesis

This tool is for research purposes only!

Not for educational purposes

The Similarity Check Service is not intended for educational purposes (e.g., checking master’s theses for plagiarism). Please use Turnitin Originality instead (available through the digital learning environment Canvas). Turnitin Originality is tailored to the specific requirements for educational purposes.

The maximum number of submissions for these services is adapted to their respective purposes.

Support & Contact

In case you are in doubt about which similarity check service to use for a particular purpose, please contact us so we can find a suitable solution for you while guaranteeing the sustainable availability of the services for all UM scholars.

Plagiarism and how to prevent it

Plagiarism is using someone else’s work or findings without stating the source and thereby implying that the work is your own. When using previously established ideas that add pertinent information in a research paper, every researcher should be cautious not to fall into the trap of sloppy referencing or even plagiarism.

Plagiarism is not just confined to papers, articles or books, it can take many forms (for more information, see this infographic by iThenticate ).

The Similarity Check Service can help you to prevent only one type of plagiarism: verbatim plagiarism, and only if the source is part of the corpus.

The software does not automatically detect plagiarism; it provides insight into the amount of similarity in the text between the uploaded document and other sources in the corpus of the software. This does not mean this part of the text is viewed as plagiarism in your specific field. For instance, the methods section in some subfields follows very common wording, which could lead to a match. If there are instances where the submission’s content is similar to the content in the database, it will be flagged for review and should be evaluated by you.

How to use the service

1. Getting started

Go to iThenticate and enter your UM username and password in the appropriate fields. Select ‘login’.


2. First-time user

As a first-time user, you will then have to check your personal information and declare that you agree to the Terms and conditions.


3. My Folders and My Documents

iThenticate will provide you with a folder group My Folders and a folder within that group titled My Documents.


From the My Documents folder, you will be able to submit a document by selecting the Submit a document link.


4. Upload a file

On the Upload a file page, enter the authorship details and the document title. Select Choose File and locate the file on your device.


Select the Add another file link to add another file. You can add up to ten files before submitting. Select Upload to upload the document(s).

5. Similarity Report

To view the Similarity Report for the paper, select the similarity score in the Report column. It usually takes a couple of minutes for a report to generate.


Finding your way around

The main navigation bar at the top of the screen has three tabs. Upon logging in, you will automatically land on the folders page.


This is the main area of iThenticate. From the folders page, you will be able to upload, manage and view documents.

The settings page contains configuration options for the iThenticate interface.

Account Info

The account information page contains the user profile and account usage.

Options for exclusion

There can be various reasons why you may want to exclude certain sources that your document is compared to or certain parts of your document in the similarity check. You can specify options for exclusion in the Folder settings.


If you choose to exclude ‘small matches’, you will be asked to specify the minimum number of words that you want to be shown as a match.

If you choose to exclude ‘small sources’, you will be asked to specify a minimum number of words or a minimum match percentage.

Once you click Update Settings, the settings will be applied to the particular folder.

Manuals & training videos

iThenticate provides a range of up-to-date manuals and instructions on its own website. Please consult them there.

You can also use these training videos to learn how to use the service.

Please be aware that information in these manuals and videos about logging in and account settings is not applicable to UM users of this service.

How to read the similarity report

The similarity report provides the percentage of similarity between the submitted document and content in the iThenticate database. This is the type of report that you will use most often for a similarity check.

It is perfectly natural for a submitted document to match against sources in the database, for example if you have used quotes. 

The similarity score simply makes you aware of potential problem areas in the submitted document. These areas should then be reviewed to make sure there is no sloppy referencing or plagiarism.

iThenticate should be used as part of a larger process, in order to determine if a match between your submitted document and content in the database is or is not acceptable.

This video shows how to read the various reports

This video shows how the Document viewer works  

Academic Integrity and Plagiarism

Everyone involved in teaching and research at Maastricht University shares in the responsibility for maintaining academic integrity (see Scientific Integrity ). All academic staff at UM are expected to adhere to the general principles of professional academic practice at all times.

Adhering to those principles also includes preventing sloppy referencing or plagiarism in your publications.

Additional information on how to avoid plagiarism can also be found in the Copyright portal of the library.

Sources used

iThenticate compares the submitted work to 60 million scholarly articles, books, and conference proceedings from 115,000 scientific, technical, and medical journals; 114 million published works from journals, periodicals, magazines, encyclopedias, and abstracts; and 68 billion current and archived web pages.

Checking PhD theses

The similarity check service (iThenticate) can be used by doctoral candidates or their supervisors to assess the work. Find out about the level of similarity with other publications and incorrect referencing before you send (parts of) the thesis to the Assessment Committee, a publisher or send in the thesis for deposit in the UM repository.

We kindly request you submit the whole thesis as one document (i.e. not per chapter) and only once to prevent unnecessary draws on the maximum number of submissions, as our contract provides a limited number of checks.


Contact & Support

For questions or information, use the web form to contact a library specialist.

Ask Your Librarian - Contact a library specialist

Learning Task-Specific Similarity

The right measure of similarity between examples is important in many areas of computer science. In particular it is a critical component in example-based learning methods. Similarity is commonly defined in terms of a conventional distance function, but such a definition does not necessarily capture the inherent meaning of similarity, which tends to depend on the underlying task. We develop an algorithmic approach to learning similarity from examples of what objects are deemed similar according to the task-specific notion of similarity at hand, as well as optional negative examples. Our learning algorithm constructs, in a greedy fashion, an encoding of the data. This encoding can be seen as an embedding into a space, where a weighted Hamming distance is correlated with the unknown similarity. This allows us to predict when two previously unseen examples are similar and, importantly, to efficiently search a very large database for examples similar to a query.

This approach is tested on a set of standard machine learning benchmark problems. The model of similarity learned with our algorithm provides an improvement over standard example-based classification and regression. We also apply this framework to problems in computer vision: articulated pose estimation of humans from single images, articulated tracking in video, and matching image regions subject to generic visual similarity.

Thesis chapters

  • Front matter (of little scientific interest)
  • Chapter 1: Introduction

This chapter defines some technical concepts, most importantly the notion of similarity we want to model, and provides a brief overview of the contributions of the thesis.

  • Chapter 2: Background

Among the topics covered in this chapter: example-based classification and regression, previous work on learning distances and (dis)similarities (such as MDS), and algorithms for fast search and retrieval, with emphasis on locality sensitive hashing (LSH).

  • Chapter 3: Learning embeddings that reflect similarity
  • Similarity sensitive coding (SSC). This algorithm discretizes each dimension of the data into zero or more bits. Each dimension is considered independently of the rest.
  • Boosted SSC. A modification of SSC in which the code is constructed by greedily collecting discretization bits, thus removing the independence assumption.

The underlying idea of these algorithms is the same: build an embedding that, based on training examples of similar pairs, maps two similar objects close to each other (with high probability). At the same time, there is an objective to control for "spread": the probability of two arbitrary objects (in particular of dissimilar pairs of objects, if examples of such pairs are available) being close in the embedding space should be low.
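The kind of embedding described above can be sketched in a few lines. This is a minimal illustration, not the thesis's actual SSC or Boosted SSC algorithm: here the projections, thresholds, and weights are drawn at random, whereas the learned algorithms fit them from example pairs so that the weighted Hamming distance tracks the task-specific similarity.

```python
import random

# Minimal sketch (NOT the thesis's SSC algorithm): embed real vectors
# into bits by thresholding random projections, then compare codes with
# a weighted Hamming distance.  SSC/Boosted SSC learn the thresholds
# and weights from labeled similar/dissimilar pairs.

random.seed(0)
DIM, N_BITS = 5, 16

proj = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_BITS)]
thresh = [random.gauss(0, 1) for _ in range(N_BITS)]
weights = [1.0] * N_BITS          # a learned algorithm would fit these

def embed(x):
    """Map a real vector to an N_BITS-long binary code."""
    return [int(sum(p * v for p, v in zip(row, x)) > t)
            for row, t in zip(proj, thresh)]

def weighted_hamming(a, b):
    return sum(w for w, ai, bi in zip(weights, a, b) if ai != bi)

x = [random.gauss(0, 1) for _ in range(DIM)]
y = [v + 0.01 * random.gauss(0, 1) for v in x]   # near-duplicate of x
z = [random.gauss(0, 1) for _ in range(DIM)]     # unrelated vector

print(weighted_hamming(embed(x), embed(y)))  # usually small: few bits flip
print(weighted_hamming(embed(x), embed(z)))  # typically larger
```

In the learned version, bits that separate similar from dissimilar training pairs well would receive higher weights, which is what makes the Hamming distance task-specific and enables fast search via hashing on the codes.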

This chapter also describes results of an evaluation of the proposed algorithms on seven benchmark data sets from the UCI and Delve repositories.

  • Chapter 4: Articulated pose estimation

An application of the ideas developed in previous chapters to the problem of pose estimation: inferring the articulated body pose (e.g. the 3D positions of key joints, or values of joint angles) from a single, monocular image containing a person.

  • Chapter 5: Articulated tracking

In a tracking scenario, a sequence of views, rather than a single view of a person, is available. The motion provides additional cues, which are typically used in a probabilistic framework. In this chapter we show how similarity-based algorithms have been used to improve accuracy and speed of two articulated tracking systems: a general motion tracker and a motion-driven animation system focusing on swing dancing.

  • Chapter 6: Learning image patch similarity

An important notion of similarity that is naturally conveyed by examples is the visual similarity of image regions. In this chapter we focus on a particular definition of such similarity, namely invariance under rotation and slight shift. We show how the machinery developed in Chapter 3 allows us to improve matching performance for two popular representations of image patches.

  • Chapter 7: Conclusions
  • Bibliography

Scribbr Plagiarism Checker

Plagiarism checker software for students who value accuracy.

Extensive research shows that Scribbr's plagiarism checker, in partnership with Turnitin, detects plagiarism more accurately than other tools, making it the no. 1 choice for students.


How Scribbr detects plagiarism better


Powered by leading plagiarism checking software

Scribbr is an authorized partner of Turnitin, a leader in plagiarism prevention. Its software detects everything from exact word matches to synonym swapping .


Access to exclusive content databases

Your submissions are compared to the world’s largest content database , covering 99 billion webpages, 8 million publications, and over 20 languages.


Comparison against unpublished works

You can upload your previous assignments, referenced works, or a classmate’s paper or essay to catch (self-)plagiarism that is otherwise difficult to detect.

Turnitin Similarity Report

The Scribbr Plagiarism Checker is perfect for you if:

  • Are a student writing an essay or paper
  • Value the confidentiality of your submissions
  • Prefer an accurate plagiarism report
  • Want to compare your work against publications

This tool is not for you if you:

  • Prefer a free plagiarism checker despite a less accurate result
  • Are a copywriter, SEO, or business owner

Get started

Trusted by students and academics worldwide


University applicants

Ace your admissions essay to your dream college.

Compare your admissions essay to billions of web pages, including other essays.

  • Avoid having your essay flagged or rejected for accidental plagiarism.
  • Make a great first impression on the admissions officer.


Submit your assignments with confidence.

Detect plagiarism using software similar to what most universities use.

  • Spot missing citations and improperly quoted or paraphrased content.
  • Avoid grade penalties or academic probation resulting from accidental plagiarism.


Take your journal submission to the next level.

Compare your submission to millions of scholarly publications.

  • Protect your reputation as a scholar.
  • Get published by the journal of your choice.

Money-back guarantee

Scribbr’s services are rated 4.9 out of 5 based on 13,360 reviews. We aim to make you just as happy. If not, we’re happy to refund you!

Privacy guarantee

Your submissions will never be added to our content database, and you’ll never get a 100% match at your academic institution.

Price per document


Prices are per check, not a subscription

  • Turnitin-powered plagiarism checker
  • Access to 99.3B web pages & 8M publications
  • Comparison to private papers to avoid self-plagiarism
  • Downloadable plagiarism report
  • Live chat with plagiarism experts
  • Private and confidential

Volume pricing available for institutions. Get in touch.

Request volume pricing

Institutions interested in buying more than 50 plagiarism checks can request a discounted price. Please fill in the form below.

Depending on the size of your request, you will be contacted by a representative of either Scribbr or Turnitin.

Avoiding accidental plagiarism

You don't need a plagiarism checker, right?

You would never copy-and-paste someone else’s work, you’re great at paraphrasing, and you always keep a tidy list of your sources handy.

But what about accidental plagiarism ? It’s more common than you think! Maybe you paraphrased a little too closely, or forgot that last citation or set of quotation marks.

Even if you did it by accident, plagiarism is still a serious offense. You may fail your course, or be placed on academic probation. The risks just aren’t worth it.

Scribbr & academic integrity

Scribbr is committed to protecting academic integrity. Our plagiarism checker software, Citation Generator , proofreading services , and free Knowledge Base content are designed to help educate and guide students in avoiding unintentional plagiarism.

We make every effort to prevent our software from being used for fraudulent or manipulative purposes.

Ask our team

Want to contact us directly? No problem. We are always here for you.


Frequently asked questions

No, the Self-Plagiarism Checker does not store your document in any public database.

In addition, you can delete all your personal information and documents from the Scribbr server as soon as you’ve received your plagiarism report.

Scribbr’s Plagiarism Checker is powered by elements of Turnitin’s Similarity Checker, namely the plagiarism detection software and the Internet Archive and Premium Scholarly Publications content databases.

The add-on AI detector is powered by Scribbr’s proprietary software.

Extensive testing proves that Scribbr’s plagiarism checker is one of the most accurate plagiarism checkers on the market in 2022.

The software detects everything from exact word matches to synonym swapping. It also has access to a full range of source types, including open- and restricted-access journal articles, theses and dissertations, websites, PDFs, and news articles.

At the moment we do not offer a monthly subscription for the Scribbr Plagiarism Checker. This means you won’t be charged on a recurring basis – you only pay for what you use. We believe this provides you with the flexibility to use our service as frequently or infrequently as you need, without being tied to a contract or recurring fee structure.

You can find an overview of the prices per document here:

Small document (up to 7,500 words) $19.95
Normal document (7,500-50,000 words) $29.95
Large document (50,000+ words) $39.95

Please note that we can’t give refunds if you bought the plagiarism check thinking it was a subscription service, as communication around this policy is clear throughout the order process.

Your document will be compared to the world’s largest and fastest-growing content database, containing over:

  • 99.3 billion current and historical webpages.
  • 8 million publications from more than 1,700 publishers such as Springer, IEEE, Elsevier, Wiley-Blackwell, and Taylor & Francis.

Note: Scribbr does not have access to Turnitin’s global database with student papers. Only your university can add and compare submissions to this database.

Scribbr’s plagiarism checker offers complete support for 20 languages, including English, Spanish, German, Arabic, and Dutch.

The add-on AI Detector and AI Proofreader are only available in English.


If your university uses Turnitin, the result will be very similar to what you see at Scribbr.

The only possible difference is that your university may compare your submission to a private database containing previously submitted student papers. Scribbr does not have access to these private databases (and neither do other plagiarism checkers).

To address this, Scribbr offers the Self-Plagiarism Checker. Just upload any document you used as a source and start the check. You can repeat this as often as you like with all your sources. With your Plagiarism Check order, you get a free pass to use the Self-Plagiarism Checker: simply upload your sources to your similarity report and let us do the rest!

Your writing stays private. Your submissions to Scribbr are not published in any public database, so no other plagiarism checker (including those used by universities) will see them.

Open Access Theses and Dissertations


Advanced research and scholarship. Theses and dissertations, free to find, free to use.




About OATD.org

OATD.org aims to be the best possible resource for finding open access graduate theses and dissertations published around the world. Metadata (information about the theses) comes from over 1100 colleges, universities, and research institutions . OATD currently indexes 7,042,247 theses and dissertations.

About OATD (our FAQ).

Visual OATD.org

We’re happy to present several data visualizations to give an overall sense of the OATD.org collection by country of publication, language, and field of study.

You may also want to consult these sites to search for other theses:

  • Google Scholar
  • NDLTD , the Networked Digital Library of Theses and Dissertations. NDLTD provides information and a search engine for electronic theses and dissertations (ETDs), whether they are open access or not.
  • ProQuest Theses and Dissertations (PQDT), a database of dissertations and theses, whether they were published electronically or in print, and mostly available for purchase. Access to PQDT may be limited; consult your local library for access information.


Research Repository

UK Doctoral Thesis Metadata from EThOS

The datasets in this collection comprise snapshots in time of metadata descriptions of hundreds of thousands of PhD theses awarded by UK Higher Education institutions aggregated by the British Library's EThOS service. The data is estimated to cover around 98% of all PhDs ever awarded by UK Higher Education institutions, dating back to 1787.

Previous versions of the datasets are restricted to ensure the most accurate version of metadata is available for download. Please contact [email protected] if you require access to an older version.

Collection Details

Snapshot versions of the dataset were published between 2015 and 2023; all are publicly visible.

Turnitin Access To Plagiarism Check For GMS Students

Turnitin is an online plagiarism checking tool that compares your work with existing online publications.

To gain access:

  • Request access to the Plagiarism-Check Blackboard site
  • Access to this site will be continuous throughout your time in GMS

How it works:

  • Upload your papers to Blackboard Learn to check their similarity index
  • Submit multiple versions of papers, take-home assignments, theses, or dissertations and rewrite text as needed

Extended Directions:

  • You will be asked to give your name, email, BUID, the submission type (i.e., Dissertation, Thesis, or Paper), and your GMS program affiliation.
  • After verifying you are not a robot, you should be sent to a new page with a notification saying “We have received your request to be added to the plagiarism check Blackboard Learn site.” It will take some time for your request to be approved (at least 10 minutes, possibly 24 hours). The next time you log in to Blackboard, under ‘My Courses’ in the right-hand column you should see the entry “GMS Plagiarism Check”.
  • Then click on “>> View/Complete”. (The first time you use Turnitin, it will ask you to agree to the user agreement.) If you have never submitted a document before, there should be a submit button. If you have submitted before, you will need to click the resubmit button to submit your new or revised document. You may get a warning that resubmitting will replace your earlier submission. It also reminds you that you can only upload 3 documents in 24 hours.
  • You can browse to find the document on your computer, then click ‘upload’.
  • You will get a message saying it may take up to 2 minutes to load.
  • Once the document is loaded, be sure to click on ‘confirm’.
  • You should get a congratulatory message. (You will also get an email confirming the successful submission.)
  • Click on the link to return to the submission list.
  • You will likely see that your document’s similarity index is ‘processing’. The algorithm may take a few hours to run, so check back to see when it has finished.
  • Press “view” on the far right to see the submitted manuscript in Turnitin’s Feedback Studio, which has some interesting automation features.
  • Alternatively, you can download the results with the arrow icon on the far right of the display.
  • NOTE: Students cannot submit more than 3 documents in a 24-hour period. Please plan accordingly, or use an alternative resource such as Turnitin Draft Coach via Google Docs.
  • It is recommended that you remove the bibliography/references prior to submission to Turnitin.
  • Because Turnitin cannot scan images and figures, these can also be removed prior to the check. You must manually check images and figures for plagiarism and potential copyright violations.
  • Published manuscripts should be removed from your thesis or dissertation prior to submission to Turnitin, assuming that they have already been analyzed by Turnitin. If they have not been, they should be included. Unpublished manuscripts should be included in your submission to Turnitin.

How to Interpret Your Score Report

  • The report shows the overall similarity index and a breakdown of similarity by source: Internet Sources, Publications, and Student Papers.
  • Generally, the similarity index should be less than 20%.
  • A common definition may be acceptable.
  • Did it identify methods or protocols from your lab? These will need to be rewritten.
  • Did it identify matches to publications or text from the internet? Again, these sentences, paragraphs, or sections will need to be rewritten.
  • If you have any questions interpreting the Turnitin report, please reach out to your faculty mentors.
  • Once you have made edits, it is important to resubmit the document for a final check.
  • The final Turnitin report should be submitted to your mentor and first reader for approval.
  • Having problems with Turnitin? Reach out to Dr. Theresa Davies for assistance ([email protected])
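Conceptually, the similarity index above is just the share of a document's words that match text found elsewhere. A minimal sketch of that arithmetic (illustrative only; Turnitin's actual matching algorithm is proprietary and works on text fragments, not a simple word tally):

```python
# Illustrative only: Turnitin's real algorithm is proprietary. This sketch
# just shows what a "similarity index" percentage expresses.

def similarity_index(total_words: int, matched_words: int) -> int:
    """Return the percentage of words flagged as matching other sources."""
    if total_words <= 0:
        raise ValueError("document must contain at least one word")
    return round(100 * matched_words / total_words)

# A 10,000-word thesis with 1,900 flagged words scores 19%,
# just under the 20% guideline mentioned above.
print(similarity_index(10_000, 1_900))  # 19
```

Note that the raw percentage says nothing about whether the matches are properly quoted and cited; that judgment still has to be made match by match.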

EBSCO Open Dissertations

EBSCO Open Dissertations makes electronic theses and dissertations (ETDs) more accessible to researchers worldwide. The free portal is designed to benefit universities and their students and make ETDs more discoverable. 

Increasing Discovery & Usage of ETD Research

EBSCO Open Dissertations is a collaboration between EBSCO and BiblioLabs to increase traffic and discoverability of ETD research. You can join the movement and add your theses and dissertations to the database, making them freely available to researchers everywhere while increasing traffic to your institutional repository. 

EBSCO Open Dissertations extends the work started in 2014, when EBSCO and the H.W. Wilson Foundation created American Doctoral Dissertations which contained indexing from the H.W. Wilson print publication, Doctoral Dissertations Accepted by American Universities, 1933-1955. In 2015, the H.W. Wilson Foundation agreed to support the expansion of the scope of the American Doctoral Dissertations database to include records for dissertations and theses from 1955 to the present.

How Does EBSCO Open Dissertations Work?

Your ETD metadata is harvested via OAI and integrated into EBSCO’s platform, where pointers send traffic to your IR.

EBSCO integrates this data into their current subscriber environments and makes the data available on the open web via opendissertations.org .
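The harvesting step described above uses OAI-PMH, a verb-based HTTP protocol (e.g. `?verb=ListRecords&metadataPrefix=oai_dc`) whose responses are XML. A minimal sketch of parsing such a response; the repository identifier and record below are hypothetical:

```python
# Sketch of parsing an OAI-PMH ListRecords response. The sample record is
# invented; real harvesters fetch this XML over HTTP from a repository's
# OAI endpoint and page through it with resumption tokens.
import xml.etree.ElementTree as ET

SAMPLE_RESPONSE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.edu:etd-1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Similarity Search in High Dimensions</dc:title>
          <dc:type>Ph.D. thesis</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def extract_titles(xml_text: str) -> list:
    """Pull each record's OAI identifier and Dublin Core title."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        ident = record.findtext(".//oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        results.append((ident, title))
    return results

print(extract_titles(SAMPLE_RESPONSE))
# [('oai:example.edu:etd-1234', 'Similarity Search in High Dimensions')]
```

In a real pipeline, the extracted metadata would also carry the record's source URL, which is what lets the aggregator point traffic back to the institutional repository.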


Citation analysis of Ph.D. theses with data from Scopus and Google Books

  • Open access
  • Published: 24 October 2021
  • Volume 126, pages 9431–9456 (2021)


  • Paul Donner, ORCID: orcid.org/0000-0001-5737-8483


This study investigates the potential of citation analysis of Ph.D. theses to obtain valid and useful early career performance indicators at the level of university departments. For German theses from 1996 to 2018 the suitability of citation data from Scopus and Google Books is studied and found to be sufficient to obtain quantitative estimates of early career researchers’ performance at departmental level in terms of scientific recognition and use of their dissertations as reflected in citations. Scopus and Google Books citations complement each other and have little overlap. Individual theses’ citation counts are much higher for those awarded a dissertation award than others. Departmental level estimates of citation impact agree reasonably well with panel committee peer review ratings of early career researcher support.


Introduction

In this article we present a study on the feasibility of Ph.D. thesis citation analysis and its potential for studies of early career researchers (ECR) and for the rigorous evaluation of university departments. The context is the German national research system, with its characteristics of a very high ratio of graduating Ph.D.’s to available open job positions in academia, a distinct national-language publication tradition in the social sciences and humanities, and a slowly unfolding change from a traditional apprenticeship-type Ph.D. system to a grad school type system. The first nationwide census reported 152,300 registered active doctoral students in Germany (Vollmar 2019 ). In the same year, 28,404 doctoral students passed their exams (Statistisches Bundesamt 2018 ). Both universities and science and higher education policy attach high value to doctoral training and consider it a core task of the university system. For this reason, doctoral student performance also plays an important role in institutional assessment systems.

While there is currently no national scale research assessment implemented in Germany, all German federal states have introduced formula-based partial funding allocation systems for universities. In most of these, the number of Ph.D. candidates is a well-established indicator. Most universities also partially distribute funds internally by similar systems. Such implementations can be seen as incomplete as they do not take into account the actual research output of Ph.D. candidates. In this contribution we investigate if citation analysis of doctoral theses is feasible on a large scale and can conceptually and practically serve as a complement to current operationalizations of ECR performance. For this purpose we study the utility of two citation data sources, Scopus and Google Books. We analyze the obtained citation data at the level of university departments within disciplines.

Doctoral studies

The doctoral studies phase can theoretically be conceived as a status transition period. It comprises a status passage process from apprentice to formally acknowledged researcher and colleague in the social context of a scientific community (Laudel and Gläser 2008 ). Footnote 1 The published doctoral thesis and its public defense are manifest proof of the fulfilment of the degree criterion of independent scientific contribution, marking said transition. The scientific community, rather than the specific organization, collectively sets the goals and standards of work in the profession, and experienced members of a community of peers judge and grade the doctoral work upon completion. Footnote 2 Yet the specific organization also plays a very important role. The Ph.D. project and dissertation are closely associated with the hosting university as it is this organization that provides the environmental means to conduct the Ph.D. research, as a bare minimum the supervision by professors and experienced researchers, but often also formal employment with salary, workspace and facilities. And it is also the department ( Fakultät ) which formally confers the degree after passing the thesis review and defense.

As a rule, it is a formal requirement of doctoral studies that the Ph.D. candidates make substantial independent scientific contributions and publish the results. The Ph.D. thesis is a published scientific work and can be read and cited by other researchers. The extent to which other researchers make use of these results is reflected in citations to the work and is in principle amenable to bibliometric citation analysis (Kousha and Thelwall 2019 ). Citation impact of theses can be seen as a proxy of the recognition of the utility and relevance of the doctoral research results by other researchers. Theses are often not published in an established venue and are hence absent from the usual channels of communication of the research front, more so in journal-oriented fields, whereas in book-oriented fields, publication of theses through scholarly publishers is common. In what follows, we address this challenge by investigating the presence of dissertation citations in data sources hitherto not sufficiently considered for this purpose.

Research contribution of early career researchers and performance evaluation in Germany

Almost all universities in Germany are predominantly tax-funded and the consumption of these public resources necessitates a certain degree of transparency to establish and maintain the perceived legitimacy of the higher education and research system. Consequently, universities and their subdivisions are increasingly subjected to evaluations. The pressure to participate in evaluation exercises, or in some cases the bureaucratic directive to do so by the responsible political actors, in turn, derives from demands of the public, which holds political actors accountable for the responsible spending of resources appropriated from net tax payers. Because the training of Ph.D. candidates is undisputedly a core task of universities, it is commonly implemented as an important component or dimension in university research evaluation.

While there is no official established national-scale research evaluation exercise in Germany (Hinze et al. 2019 ), the assessment of ECR performance plays a very important role in evaluation and funding of universities and in the systems of performance-based funding within universities. In the following paragraphs we will show this with several examples while critically discussing some inadequacies of the extant operationalizations of the ECR performance dimensions, thereby substantiating the case for more research into the affordances of Ph.D. thesis citation analysis.

The Council of Science and Humanities ( Wissenschaftsrat ) has conducted four pilot studies for national-scale evaluations of disciplines in universities and research institutes ( Forschungsrating ). While the exercises were utilized to test different modalities Footnote 3 , they all followed a basic template of informed peer review by appointed expert committees along a number of prescribed performance dimensions. The evaluation results did not have any serious funding allocation or restructuring consequences for the units. In all exercises, the dimension of support for early career researchers played a prominent role next to such dimensions as research quality, impact/effectivity, efficiency, and transfer of knowledge into society. Footnote 4 In all four exercises, the dimension was operationalized with a combination of quantitative and qualitative criteria.

As the designation ‘support for early career researchers’ suggests, the focus was primarily on the support structures and provisions that the assessed units offered, but the outcomes or successes of these support environments also played a role. Yet, some of the applied indicators are more in line with a construct such as the performance, or success, of the ECRs themselves, namely, first appointments of graduates to professorships, scholarships or fellowship of ECRs (if granted externally of the assessed unit), and awards. Footnote 5 As for the difference between the concept of the efforts expended for ECRs and the concept of the performance of ECRs, it appears to be implied that the efforts cause the performance, but this is far from self-evident. There may well be extensive support programs without realized benefits or ECRs achieving great success despite a lack of support structures. For this implied causal connection to be accepted, its mechanism should first be worked out and articulated and then be empirically validated, which was not the case in the Forschungsrating evaluation exercises. Footnote 6

No bibliometric data on Ph.D. theses was employed in the Forschungsrating exercises (Wissenschaftsrat 2007 , 2008 , 2011 , 2012 ). However, it stands to reason that citation analysis of theses might provide a valuable complementary tool if a more sound operationalization of the dimension of the performance of ECRs is to be established in similar future assessments. As for the publications of ECRs besides doctoral theses, these have been included in the other dimensions in which publications were used as criteria without special consideration. Footnote 7

There is a further area of university evaluation in which a performance indicator of ECRs, namely the absolute number of Ph.D. graduates over a specific time period, is an important component. At the time of writing, systems of partial funding allocation from ministries to states’ universities across all German federal states are well established. In these systems, universities within a state compete with one another for a modest part of the total budget based on fixed formulas relating performance to money. The performance based funding systems, different for each state, all include ‘research’ among their dimensions, and within it, the number of graduated Ph.D.’s is the second most important indicator after the acquired third party funding of universities (Wespel and Jaeger 2015 ). In direct consequence, similar systems have also found widespread application to distribute funds across departments within universities (Jaeger 2006 ; Niggemann 2020 ). These systems differ across universities. If only the number of completed Ph.D.’s is used as an indicator, then the quality of the research of the graduates does not matter in such systems. It is conceivable that graduating as many Ph.D.’s as possible becomes prioritized at the expense of the quality of ECR research and training.

A working group tasked by the Federal Ministry of Education and Research to work out an indicator model for monitoring the situation of early career researchers in Germany proposed to consider the citation impact of publications as an indicator of outcomes (Projektgruppe Indikatorenmodell 2014 ). Under the heading of “quality of Ph.D.—disciplinary acceptance and possibility of transfer” the authors acknowledge that, in principle, citation analysis of Ph.D. theses is possible, but citation counts do not directly measure scientific quality, but rather the level of response to, and reuse of, publications (impact). Moreover, it is stated that the literature of the social sciences and humanities is not covered well in citation indexes and theses are generally not indexed as primary documents (p. 136). Nevertheless, this approach is not to be rejected out of hand. Rather, it is recommended that the prospects of thesis citation analysis be empirically studied to judge its suitability (p. 137).

Another motivation for the present study was the finding of the National Report on Junior Scholars that even though “[j]unior scholars make a telling contribution to developing scientific and social insights and to innovation” (p. 3) the “contribution made by junior scholars to research and knowledge sharing is difficult to quantify in view of the available data” (Consortium for the National Report on Junior Scholars 2017 , p. 19).

To sum up, the foregoing discussion establishes (1) that there is a theoretically underdeveloped evaluation practice in the area of ECR support and performance, and (2) that a need for better early career researcher performance indicators on the institutional level has been suggested to science policy actors. This gives occasion to explore which, if any, contribution bibliometrics can make to a valid and practically useful assessment.

Prior research

Citation analysis of dissertation theses

There are few publications on citation analysis of Ph.D. theses as the cited documents, as opposed to studies of the documents cited in theses, of which there are plenty. Yoels ( 1974 ) studied citations to dissertations in American journals in optics, political science (one journal each), and sociology (two journals) from the 1955 to 1969 volumes. In each case, several hundred citations in total to all Ph.D. theses combined were found, with a notable concentration on origins of Ph.D.’s in departments of high prestige – a possible first hint of differential research performance reflected in Ph.D. thesis citations. Non-US dissertations were cited only in optics. Author self-citations were very common, especially in optics and political science. While citations peaked in the periods of 1–2 or 3–5 years after the Ph.D. was awarded, they continued to be cited to some degree as much as 10 years later. According to Larivière et al. ( 2008 ), dissertations only account for a very small fraction of cited references in the Web of Science database. The impact of individual theses was not investigated. This study used a search approach in the cited references, based on keywords for theses and filtering, which may not be able to discover all dissertation citations. Kousha and Thelwall ( 2019 ) investigated Google Scholar citation counts and Mendeley reader counts for a large set of American dissertations from 2013 to 2017 sourced from ProQuest. This study did not take into account Google Books. Of these dissertations, 20% had one or more citations (2013: over 30%, 2017: over 5%) while 16% had at least one Mendeley reader. Average citation counts were comparatively high in the arts, social sciences, and humanities, and low in science, technology, and biomedical subjects. The authors evaluated the citation data quality and found that 97% of the citations of a sample of 646 were correct. 
As for the publication type of the citing documents, the majority were journal articles (56%), remarkably many were other dissertations (29%), and only 6% of citations originated from books. This suggests that Google Books might be a relevant citation data source instead of, or in addition to, Google Scholar.

More research has been conducted into the citation impact of thesis-related journal publications. Hay ( 1985 ) found that for the special case of a small sample from UK human geography research, papers based on Ph.D. thesis work accrued more citations than papers by established researchers. In a recent study of refereed journal publications based on US psychology Ph.D. theses, Evans et al. ( 2018 ) found that they were cited on average 16 times after 10 years. The citation impact of journal articles to which Ph.D. candidates contributed (but not of dissertations) has only been studied on a large scale for the Canadian province of Québec (Larivière 2012 ). The impact of journal papers with Ph.D. candidates’ contribution was contrasted to all other papers with Québec authors in the Web of Science database. As the impact of these papers, quantified as average of relative citations, was close to that of the comparison groups in three of four broad subject areas, it can be tentatively assumed that the impact of doctoral candidates’ papers was on par with that of their more experienced colleagues. The area with a notable difference between groups was arts and humanities, in which the coverage of publication output in the database was less comprehensive because a lot of research is published in monographs, and in which presumably many papers were written in French, another reason for lower coverage.

While these papers are not concerned with citations to dissertations, they do suggest that the research of Ph.D.’s is as impactful as that of other colleagues. To the best of our knowledge, no large-scale study has been conducted on the citation impact of German theses on the level of individual works or on the level of university departments. We so far have scant information on the citation impact of dissertation theses; the current study therefore aims to fill this gap with a large-scale investigation of citations received by German Ph.D. theses in Scopus and Google Books.

Causes for department-level performance differences

As we wish to investigate performance differences between departments of universities by discipline as reflected by thesis citations, we next consider the literature on plausible reasons for such performance differences which can result in differences in thesis citation impact. We do not consider individual level reasons for performance differences such as ability, intrinsic motivation, perseverance, and commitment.

One possible reason for cross-department performance differences is mutual selectivity of Ph.D. project applicants and Ph.D. project supervisors. In a situation in which there is some choice between the departments at which prospective Ph.D. candidates might register and some choice between the applicants a prospective supervisor might accept, out of self-interest both sides will seek to optimize their outcomes given their particular constraints. That is, applicants will opt for the most promising department for their future career while supervisors or selection committees, and thus departments, will attempt to select the most promising candidates, perhaps those whom they judge most likely to contribute positively to their research agenda. Both sides can take into account a variety of criteria, such as departmental reputation or candidates’ prior performance. This is part of the normal, constant social process of mutual evaluation in science. However, in this case, the mutual evaluation does not take place between peers, that is, individuals of equal scientific social status. Rather, the situation is characterized by status inequality (superior-inferior, i.e. professor-applicant). Consequently, an applicant may well apply to her or his preferred department and supervisor, but the supervisor or the selection committee makes the acceptance decision. In practice, however, there are many constraints on such situations. For example, as described above, the current evaluation regime rewards the sheer quantity of completed Ph.D.’s.

Once the choices are made, Ph.D. candidates at different departments can face quite different environments, more or less conducive to research performance (which, to the extent candidates were aware of them and able to judge them, they would have taken into consideration, as mentioned). For instance, some departments might have access to important equipment and resources, others not. Local practices may also differ in the time available for the Ph.D. project for employed candidates, as opposed to expected participation in the group’s research, teaching, and other duties (Hesli and Lee 2011 ).

Ph.D. candidates may benefit from the support, experience and stimulation of the presence of highly accomplished supervisors. Experienced and engaged supervisors teach explicit and tacit knowledge and can serve as role models. Long and McGinnis ( 1985 ) found that the performance of mentors was associated with Ph.D.’s publication and citation counts. In particular, citations were predicted by collaborating with the mentor and by the mentor’s own prior citation counts. Mentors’ eminence only had a weak positive effect on the publication output of Ph.D.’s who actively collaborated with them. Similarly, Hilmer and Hilmer ( 2007 ) report that advisors’ publication productivity is associated with candidates’ publication counts. However, there are multiple professors or other supervisors at any department, which causes variation within departments if the department and not the supervisor is used as a predictive variable. Between departments it is then the concentration of highly accomplished supervisors that may cause differences. Beyond immediate supervisors, a more or less supportive research environment can offer opportunities for learning, cooperation or access to personal networks. For example, Kim and Karau ( 2009 ) found that support from faculty, through the development of research skills, led to higher publication productivity of management Ph.D. candidates. Local work culture and local expectations of performance may elicit behavioral adjustment (Allison and Long 1990 ).

In summary, prior research shows that there are several reasons to expect department-level differences in Ph.D. research quality (and its reproduction and reinforcement) which might be reflected in thesis citation impact. It should be noted, however, that the present study cannot shed light on which particular factors are associated with Ph.D. performance in terms of citation impact. It is limited to testing whether there are any department-level differences on this measure.

Citation counts and scientific impact of dissertation theses

We have argued above that citation analysis of theses could be a complementary tool for quantitative assessment of university departments in terms of the research performance of early career researchers. Hence it needs to be established that citation counts of dissertations are in fact associated with a conception of the impact of research.

As outlined by Hemlin ( 1996 ), “[t]he idea [of citation analysis] is that the more cited an author or a paper is by others, the more attention it has received. This attention is interpreted as an indicator of the importance, the visibility, or the impact of the researcher or the paper in the scientific community. Whether citation measures also express research quality is a highly debated issue.” Hemlin reviewed a number of studies of the relationship between citations and research quality but was not able to draw a definite conclusion: “it is possible that citation analysis is an indicator of scientific recognition, usefulness and, to some unknown extent, quality.” Researchers cite for a variety of reasons, not only or primarily to indicate the quality of the cited work (Aksnes et al. 2019 ; Bornmann and Daniel 2008 ). Nevertheless, work that is cited usually has some importance for the citing work. Even citations classified in citation behavior studies as ‘perfunctory’ or ‘persuasive’ are not made randomly. On the contrary, for a citation to persuade anyone, the content of the cited work needs to be convincing rather than ephemeral, irrelevant, or immaterial. Citation counts are thus a direct measure of the utility, influence, and importance of publications for further research (Martin and Irvine 1983 , sec. 6). Therefore, as a measure of scientific impact, citation counts have face validity. They are a measure of the concept itself, though a noisy one. Not so for research quality.

Highly relevant for the topic of the present study are the early citation impact validation studies by Nederhof and van Raan ( 1987 ), Nederhof and van Raan ( 1989 ). These studied the differences in citation impact of publications produced during doctoral studies of physics and chemistry Ph.D. holders, comparing those awarded the distinction ‘cum laude’ for their dissertation based on the quality of the research with other graduates without this distinction (cum laude: 12% of n = 237 in chemistry, 13% of n = 138 in physics). In physics, “[c]ompared to non-cumlaudes, cumlaudes received more than twice as many citations overall for their publications, which were all given by scientists outside their alma mater” (Nederhof and van Raan 1987 , p. 346). In fact, differences in citation impact of papers between the groups are already apparent before graduation, that is, before the conferral of the cum laude distinction on the basis of the dissertation. And interestingly, “[a]fter graduation, citation rates of cumlaudes even decline to the level of non-cumlaudes” (p. 347) leading the authors to suggest that “the quality of the research project, and not the quality of the particular graduate is the most important determinant of both productivity and impact figures. A possible scenario would be that some PhD graduates are choosen carefully by their mentors to do research in one of the usually rare very promising, interesting and hot research topics currently available. Most others are engaged in relatively less interesting and promising graduate research projects” (p. 348). The results in chemistry are very similar: “Large difference in impact and productivity favor cumlaudes three to 2 years before graduation, differences which decrease in the following years, although remaining significant. [...] Various sceptics have claimed that bibliometric measures based on citations are generally invalid. The present data do not offer any support for this stance. 
Highly significant differences in impact and productivity were obtained between two groups distinguished on a measure of scientific quality based on peer review (the cum laude award)” (Nederhof and van Raan 1989 , p. 434).

In Germany, a system of four passing marks and one failing mark is commonly used. The better the referees judge the thesis, the higher the mark. Studies investigating the association between the level of the mark and the citation impact of theses or thesis-associated publications are as yet lacking. The closest are studies on medical doctoral theses from Charité. Oestmann et al. ( 2015 ) provide a correlational study of medical doctoral degree marks (averages of thesis and oral exam marks) and the publications associated with the theses from one institution, Charité University Medicine Berlin. Their data for 1992–2014 show a longitudinal decrease in the incidence of the third best mark and an increase of the second best mark. For samples from three years (1998, 2004, 2008) for which publication data were collected, an association between the level of the mark and publication productivity was detected. Both the chance to publish any peer-reviewed articles and the number of articles increase with the level of the mark. The study was extended in Chuadja ( 2021 ) with publication data for 2015 graduates. It was found that the time to graduation covaries with the level of the mark. For 2015 graduates, the average 5-year Journal Impact Factors for thesis-associated publications increase with the level of the graduation mark, in the sense that theses awarded better marks produced publications in journals with higher Impact Factors. Although these findings say little about the true association between thesis research quality and citation impact, they suggest enough to motivate more research into this relationship.

Research questions

The following research questions will be addressed:

How often are individual Ph.D. theses cited in the journal and book literature?

Does Google Books contain sufficient additional citation data to warrant its inclusion as an additional data source alongside established data sources?

Can differences between universities within a discipline explain some of the variability in citation counts?

Are there noteworthy differences in Ph.D. thesis citation impact on the institutional level within disciplines?

Are the citation counts of Ph.D. theses associated with their scientific quality?

To test whether or not dissertation citation impact is a suitable indicator of departmental Ph.D. performance, citation data for theses needs to be collected, aggregated and studied for associations with other relevant indicators, such as doctorate conferrals, drop-out rates, graduate employability, thesis awards, or subjective program appraisals of graduates. As a first step towards a better understanding of Ph.D. performance, we conducted a study on citation sources for dissertations. The present study is restricted to monograph-form dissertations. These also include monographs that are based on material published as articles. However, to be able to assess the complete scientific impact of a Ph.D. project it would be necessary to also include the impact of papers produced in the context of the Ph.D. project, for both cumulative publication-based theses and for theses only published in monograph form. Because of this, the results reported below should be interpreted with due caution, as we do not claim completeness of data.

Dissertations’ bibliographical data

There is presently no central integrated source for data on dissertations from Germany. The best available source is the catalog of the German National Library (Deutsche Nationalbibliothek, DNB). The DNB has a mandate to collect all publications originating from Germany, including Ph.D. theses. This source of dissertation data has been found useful for science studies research previously (Heinisch and Buenstorf 2018 ; Heinisch et al. 2020 ). We downloaded records for all Ph.D. dissertations from the German National Library online catalog in April 2019, using a search restriction of “diss*” in the university publications field, as recommended by the catalog usage instructions, and the publication year range 1996–2018. Records were downloaded by subject field in the CSV format. Footnote 8 In this first step 534,925 records were obtained. In a second step, the author name and work title fields were cleaned, the university information was extracted and normalized, and non-German university records were excluded. We also excluded records assigned to medicine as a first subject class, which had been downloaded because they were also assigned to other classes. As the dataset often contained more than one version of a particular thesis because different formats and editions were cataloged, these were carefully de-duplicated. In this process, as far as possible the records containing the most complete data and describing the temporally earliest version were retained as the primary records. Variant records were also kept in order to later be able to collect citations to all variants. This reduced the dataset to 361,971 records. Of these, about 16% did not contain information on the degree-granting university. As the National Library’s subject classification system was changed during the covered period (in 2004), the class designations were unified based on the Library’s mapping and aggregated into a simplified 40-class system.
Footnote 9 If more than one subject class was assigned, only the first was retained.
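The de-duplication step described above can be sketched as follows. This is an illustrative reconstruction, not the exact procedure used: the field names (`author`, `title`, `year`) and the completeness heuristic (counting non-empty fields) are assumptions.

```python
# Sketch of variant de-duplication: records sharing a normalized
# author + title key are grouped; within each group the record with the
# earliest publication year (ties broken by completeness, i.e. number of
# non-empty fields) becomes the primary record, and the rest are kept as
# variants so that citations to all versions can be collected later.
from collections import defaultdict

def normalize(s):
    # lowercase and collapse whitespace for a robust grouping key
    return " ".join(s.lower().split())

def deduplicate(records):
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec["author"]), normalize(rec["title"]))
        groups[key].append(rec)
    primaries, variants = [], []
    for recs in groups.values():
        # earliest year first; among equal years, prefer more complete records
        recs.sort(key=lambda r: (r["year"], -sum(1 for v in r.values() if v)))
        primaries.append(recs[0])
        variants.extend(recs[1:])
    return primaries, variants
```

Keeping the variants, rather than discarding them, matters for the later citation matching: a reference may cite any cataloged edition of a thesis.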

Citation data

Citation data from periodicals was obtained from a snapshot of Scopus data from May 2018. Scopus was chosen over Web of Science as a source of citation data because full cited titles for items not contained as primary documents in Web of Science have only recently been indexed. Before this change, titles were abbreviated so inconsistently and to such short strings as to be unusable, while this data is always available in unchanged form in Scopus if it is available in the original reference. Cited references data was restricted to non-source citations, that is, references not matched with Scopus-indexed records. Dissertation bibliographical information (author name, publication year and title) for primary and secondary records was compared to reference data. If the author family name and given name first initial matched exactly and the cited publication year was within ± 1 year of the dissertation publication year, then the title information was further compared as follows. The dissertation’s title was compared to both Scopus cited reference title and cited source title, as we found these two data fields were both used for the cited thesis title. Before comparison, both titles were truncated to the character length of the shorter title. If the edit distance similarity between titles was greater than 75 out of 100, the citation was accepted as valid and stored. We furthermore considered the case that theses might occasionally be indexed as Scopus source publications. We used the same matching approach as outlined above to obtain matching Scopus records restricted to book publication types. This resulted in 659 matched theses. In addition, matching by ISBN to Scopus source publications resulted in 229 matched theses of which 50 were not matched in the preceding step. The citations to these matched source publications were added to the reference matched citations after removing duplicates. 
Citations were summed across all variant records while filtering out duplicate citations.
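The title comparison described above can be sketched in code. The 75-out-of-100 threshold and the truncation to the shorter title's length are taken from the text; the exact normalization of the edit distance to a 0–100 scale is not specified in the study, so the formula below (percentage of non-edited characters) is an assumption.

```python
# Sketch of the cited-reference title matching: truncate both titles to the
# length of the shorter one, compute a normalized edit-distance similarity
# on a 0-100 scale, and accept the pair if it scores above 75.
def levenshtein(a, b):
    # standard dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def title_similarity(t1, t2):
    n = min(len(t1), len(t2))
    if n == 0:
        return 0.0
    t1, t2 = t1[:n].lower(), t2[:n].lower()
    return 100.0 * (1 - levenshtein(t1, t2) / n)

def titles_match(t1, t2, threshold=75):
    return title_similarity(t1, t2) > threshold
```

The truncation step makes the comparison tolerant of references that cite only the main title of a thesis while the catalog record carries the full title including a subtitle.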

In addition, we investigated the utility of Google Books as a citation data source. This is motivated by the fact that many Ph.D. theses are published in the German language and in disciplines favoring publication in books over journal articles. Citation search in Google Books has been made accessible to researchers by the Webometric Analyst tool, which allows for largely automated querying with given input data (Kousha and Thelwall 2015 ). We used Webometric Analyst version 2.0 in April and May 2019 to obtain citation data for all Ph.D. thesis records. We only used primary records, not variants, as the collection process takes quite a long time. Searches were done with the software’s standard settings, using the author family name, publication year and six title words; subsequent result filtering matched individual title words rather than requiring an exact full-title match. We additionally removed citations from a number of national and disciplinary bibliographies and annual reports because these are not research publications but lists of all publications in a discipline or produced by a certain research unit. We also removed Google Books citations from sources that were indexed in Scopus (10,958 citations), as these would otherwise be duplicates.

Google Scholar was not used as a citation data source because it includes a lot of grey literature and offers no possibility to restrict citation counts to citations from the journal and book literature. It also has alarming rates of incorrect citations (García-Pérez 2010 , p. 2075), although Kousha and Thelwall (2015, p. 479) found the citation error rate for Ph.D. theses in Google Scholar to be quite low.

Dissertation award data

For later validation purposes we collected data on German dissertation awards from the web. We considered awards for specific disciplines granted in a competitive manner based on research quality by scholarly societies, foundations and companies. A web search was conducted for awards, either specifically for dissertations or for contributions by early career researchers which mention dissertations besides other works. Only awards granted by committees of discipline experts and awarded by Germany-based societies etc. were considered; we did not include internationally oriented awards. In general, these awards are given on the basis of scientific quality and only works published in the preceding one or two years are accepted for submission. We were able to identify 946 Ph.D. theses that received one or more dissertation awards, from a total of 122 different awards. More details can be found in the “Appendix”, Table 5 .

A typical example is the Wilhelm Pfeffer Prize of the German Botanical Society, which is described as follows: “The Wilhelm Pfeffer Prize is awarded by the DBG’s Wilhelm Pfeffer Foundation for an outstanding PhD thesis (dissertation) in the field of plant sciences and scientific botany.” Footnote 10 The winning thesis is selected by the board of the Foundation. If no work achieves the scientific excellence expected by the board, no prize is awarded. Footnote 11

Citation data for Ph.D. theses

In total we were able to obtain 329,236 Scopus citations and 476,495 Google Books citations for the 361,971 Ph.D. thesis records. There was an overlap of about 11,000 citations from journals indexed in both sources, which was removed from the Google Books data. The large majority of Scopus citations (95%) was found with the primary thesis records only. Secondary (variant) thesis records and thesis records matched as Scopus source document records contributed 5% of the Scopus citations. Scopus and Google Books citation counts are modestly correlated with Pearson’s r = 0.20 ( \(p < 0.01\) , 95% CI 0.197–0.203). Table 1 gives an overview of the numbers of citations for theses published in different years. Footnote 12 One can observe a minor overall growth in annual dissertations and a modest peak in citations in the early years of the observation period. Overall, and consistently for all thesis publication years, theses are cited more often in Google Books than in Scopus, by a ratio of about 3 to 2. Hence, in general, German Ph.D. theses were cited more often in the book literature covered by Google Books than in the periodical literature covered by Scopus. The average number of citations per thesis seems to stabilize at around 3.5, as the values for 1996–2003 are all of that magnitude. As citations were collected in 2019, these works needed about 15 years to reach a level at which their citation counts did not increase further. Whereas Kousha and Thelwall ( 2019 ) found that 20% of the American dissertations from the period 2013–2017 were cited at least once in Google Scholar, the corresponding figure from our data set is 30% for the combined figure of both citation data sources.
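The reported confidence interval for the correlation can be reproduced with the standard Fisher z-transformation. The assumption here is that the correlation is computed over all 361,971 thesis records; the text does not state the exact n used for the interval.

```python
# 95% confidence interval for Pearson's r via the Fisher z-transformation:
# z = atanh(r) is approximately normal with standard error 1/sqrt(n - 3).
import math

def pearson_ci(r, n, z=1.96):
    zr = math.atanh(r)            # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(zr - z * se), math.tanh(zr + z * se)

lo, hi = pearson_ci(0.20, 361_971)
# with n this large the interval is very tight, about (0.197, 0.203),
# matching the interval reported above
```

The tightness of the interval is purely a consequence of the large sample size; it says nothing about the strength of the (modest) correlation itself.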

We studied author self-citations of dissertations, both by comparing the thesis author to all publication authors of a citing paper (all-authors self-citations) and only to the first author. We used exact match or Jaro-Winkler similarity greater than 90 out of 100 for author name comparison (Donner 2016 ). Considering only the first authors of publications is more lenient in that it does not punish an author for a self-citation that could have been suggested by co-authors and is at least endorsed by them (Glänzel and Thijs 2004 ). For the Google Books citation corpus we find only 8366 all-authors self-citations (1.7%) and 5711 first-author self-citations (1.2%). Footnote 13 In the Scopus citations there are 52,032 all-author self-citations (15.6%) and 31,260 first-author self-citations (9.4%). Overall this amounts to an all-author self-citation rate of 7.5% and a first-author self-citation rate of 4.6%, considerably lower rates than those reported by Yoels ( 1974 ). We do not exclude any self-citations in the following analyses.
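The name comparison rule used above (exact match, or Jaro-Winkler similarity greater than 90 out of 100) can be sketched as follows. The Jaro-Winkler implementation is a standard textbook version, not the specific implementation used in the study, and the lowercasing before comparison is an assumption.

```python
# Jaro similarity: fraction of matching characters (within a sliding
# window) adjusted for transpositions; Jaro-Winkler adds a bonus for a
# common prefix of up to 4 characters (scaling factor p = 0.1).
def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(len1, len2) // 2 - 1
    match1, match2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # count transpositions among matched characters
    k = t = 0
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def same_author(n1, n2, threshold=90):
    # decision rule from the text: exact match, or similarity > 90 of 100
    n1, n2 = n1.lower(), n2.lower()
    return n1 == n2 or 100 * jaro_winkler(n1, n2) > threshold
```

Jaro-Winkler is well suited to author names because it rewards agreement at the beginning of the string, where family names are least likely to be garbled by transliteration or truncation.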

Figure  1 displays the citation count distribution for theses published before 2004, whose citation counts are unlikely to increase any further. In this subset, 58% of dissertations have one or more citations. Ph.D. theses exhibit the typical highly skewed and long-tailed citation count distribution.

figure 1

Citation count distribution for German Ph.D. theses from 1997–2003 (values greater than 10 citations not shown), n = 118,447

The distributions of theses and citations by data source over subject classes are displayed in Table 2 . There are striking differences in the origin of the major part of citations across disciplines. Social sciences and humanities such as education, economics, German language and literature, history, and especially law are cited much more often in Google Books. The opposite holds in natural science subjects like biology or physics, and in computer science and mathematics, where most citations come from Scopus. This table also indicates that the overall most highly cited dissertations can be found in the humanities (history, archeology and prehistory, religion) and that natural science dissertations are poorly cited (biology, physics and astronomy, chemistry). Much of this difference is probably because in the latter subjects, Ph.D. project results are almost always communicated in journal articles first and citing a dissertation is rarely necessary.

Validation of citation count data with award data

In order to judge whether thesis citation counts can be considered to carry a valid signal of scientific quality (research question 5) we studied the citation impact of theses that received dissertation awards compared to those which did not. High citation counts, however, cannot simply be equated with high scientific quality. As a rule, the awards for which we collected data are conferred by committees of subject experts explicitly on the basis of scientific quality and importance. But if the content of a thesis was published in journal articles before the thesis itself appeared, it is possible that award juries might have been influenced by the citation counts of these articles.

Comparing the 946 dissertations that were identified as having received a scientific award directly with the non-awarded dissertations we find that the former received on average 3.9 citations while the latter received on average 2.2 citations. To factor out possible differences across subjects and time we match each award-winning thesis with all non-awarded theses of the same subject and year, and calculate the average citation count of this comparable publication set. Award theses obtain 3.9 citations and matched non-award theses 1.7 citations. This shows that Ph.D. theses that receive awards on average are cited more often than comparable theses and indicates that citation counts partially indicate scientific quality as sought out by award committees. Nevertheless, not every award-winning thesis need be highly cited nor every highly cited thesis be of high research quality and awarded a prize. The differences reported here hold on average for large numbers of observations, but do not imply a strong association between scientific quality and citation count on the level of individual dissertations. This is important to note lest the false conclusion be drawn that merely because a thesis is not cited or rarely cited, it is of low quality. Such a view must be emphatically rejected as it is not supported by the data.
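The matched comparison described above can be sketched as follows. The data structures and field names are illustrative; the study's actual matching compared each award-winning thesis with all non-awarded theses of the same subject and publication year.

```python
# Each awarded thesis is paired with the mean citation count of all
# non-awarded theses of the same (subject, year) cell, factoring out
# differences across subjects and time.
from collections import defaultdict
from statistics import mean

def matched_comparison(theses):
    # theses: list of dicts with keys subject, year, citations, awarded
    baseline = defaultdict(list)
    for t in theses:
        if not t["awarded"]:
            baseline[(t["subject"], t["year"])].append(t["citations"])
    pairs = []
    for t in theses:
        if t["awarded"] and baseline[(t["subject"], t["year"])]:
            pairs.append((t["citations"],
                          mean(baseline[(t["subject"], t["year"])])))
    award_avg = mean(p[0] for p in pairs)
    control_avg = mean(p[1] for p in pairs)
    return award_avg, control_avg
```

Matching within subject-year cells is what allows the comparison to be read as "awarded theses versus comparable theses" rather than being confounded by field-specific citation cultures or citation-window length.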

A possible objection to the use of award data for validating the relationship of citation counts and the importance and quality of research is that it might be the signal of the award itself which makes the publications more visible to potential citers. In other words, a thesis is highlighted and brought to the attention of researchers by getting an award, and high citation counts are a result not of the intrinsic merit of the reported research but merely of the raised visibility. The study by Diekmann et al. ( 2012 ) has scrutinized this hypothesis. They studied citation counts (Social Sciences Citation Index) of 102 papers awarded the Thyssen award for German social science journal papers and a random control sample of other publications from the same journals. The award winners are determined by a jury of experts and there are first, second, and third rank awards each year. It was found that awarded papers were cited on average six times after 10 years, control articles two times. Moreover, first rank award articles were cited 9 times, second rank articles 6 times, and third rank articles 4 times on average. The jury decides in the year after the publication of the articles. The authors argue that publications citing awarded articles in the first year after their publication cannot possibly have been influenced by the award. For citation counts of a 1-year citation window, awarded articles are cited 0.8 times, control group articles 0.2 times on average. And again, the ranks of the awards correspond to different citation levels. Thus it is evident that the citation reception of the articles is different even before the awards have been decided. Citing researchers and the expert committee independently agree on the importance of these articles.

We can replicate this test with our data by restricting the citations to those received in the year of publication of the thesis and the next year. This results in average citation counts of 0.040 for theses which received awards and 0.012 for theses which did not receive any award. Award-winning Ph.D. theses are cited more than three times as often as other theses even before the awards have been granted and before any possible publicity had enough time to manifest itself in increased citation counts.
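The early-window restriction used in this replication can be sketched as follows; the data structures are illustrative assumptions. Only citations received in the thesis's publication year or the following year are counted, so any visibility effect of the award itself is excluded by construction.

```python
# Count only citations within the early window [pub_year, pub_year + 1]
# and compare average early citation counts of awarded vs. other theses.
from statistics import mean

def early_window_avg(theses, citations):
    # theses: id -> (pub_year, awarded); citations: list of (thesis_id, citing_year)
    counts = {tid: 0 for tid in theses}
    for tid, citing_year in citations:
        pub_year = theses[tid][0]
        if pub_year <= citing_year <= pub_year + 1:
            counts[tid] += 1
    awarded = [counts[t] for t, (_, a) in theses.items() if a]
    others = [counts[t] for t, (_, a) in theses.items() if not a]
    return mean(awarded), mean(others)
```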

Application: a preliminary study of citation impact differences between departments

In this section we consider research questions 3 and 4, which concern the existence and magnitude of differences in Ph.D. thesis citation impact between universities in the same discipline. This application is preliminary because only the citation impact of the thesis itself, but not of the thesis-related publications, is considered here, and because we use a subject classification based on that of the National Library. Footnote 14 To mitigate these limitations as much as possible, we study here only two subjects, from the humanities (English and American language and literature, henceforth EALL) and the social sciences (sociology), which are characterized by Ph.D. theses mainly published as monographs rather than as articles and thus show relatively high dissertation citation counts. These are also disciplines specifically identifiable in the classification system and as distinct university departments. We furthermore restrict the thesis publication years to the years covered by the national-scale pilot research assessment exercises discussed in the introduction section, which were carried out by the German Council of Science and Humanities in these two disciplines (Wissenschaftsrat 2008 , 2012 ), in order to be able to compare the results and to test whether the number of observations in typical evaluation periods yields sufficient data for useful results.

We use multilevel Bayesian negative binomial regressions in which the observations (theses) are nested in the grouping factor universities, estimated with the R package brms , version 2.8.0 (Bürkner 2017 ). By using a multilevel model we can explicitly model the correlation of observations within a group, that is to say, take into account the characteristics of the departments which affect all Ph.D. candidates of a department and their research quality. The negative binomial response distribution is appropriate for the characteristic highly skewed, non-negative citation count distribution. The default prior distributions of brms were used. Model estimations are run with two MCMC chains with 2500 iterations each, of which the first 500 are for sampler warmup. As population level variables we include a thesis language dummy (German [reference category], English, Unknown/Other), the publication year, and dummies for whether the dissertation received an award (reference category: no award received). There were no awards identified for EALL in the observation period, so this variable only applies to the Sociology models. For the language variable we used the data from the National Library where available and supplemented it with automatically identified language of the dissertation title using R package cld3 , version 1.1. Languages other than English and German and theses with unidentifiable titles were coded as Unknown/Other.

Models are run for the two disciplines separately, once without including the university variable (null models) and once including it as a random intercept, making these multilevel models. If the multilevel model shows a better fit, i.e. can explain the data better, this would indicate significant variation between the different university departments in a discipline and higher similarity of citation impact of theses within a university than expected at random. In other words, the null model assumes independence of observations while the multilevel model controls for correlation of units within a group, here the citation counts of theses from a university. The results are presented in Table 3 . The coefficient of determination (row ‘R²’) is calculated according to Nakagawa et al. ( 2017 ).

Regarding the population-level variables, the publication year has a negative effect in EALL (younger theses received fewer citations, as expected) but no significant effect in Sociology. As the Sociology data are from an earlier period, this supports the notion that the citation counts used in the Sociology models have stabilized while those from EALL have not. According to the credible intervals, there is no significant language effect on citations in the Sociology null model, but controlling for the grouping structure reveals language to be a significant predictor. In EALL, English-language theses received substantially more citations than German-language theses. There is a strong positive effect in Sociology of having received an award for the Ph.D. thesis. Comparing the null models with their respective multilevel models, that is A to B and C to D, we see that introducing the grouping structure does not affect the population-level variable effects other than language for Sociology. In both disciplines, the group effect (standard deviation of the random intercept) is significantly above zero and the model fit in terms of R² improved, indicating that the hierarchical model is more appropriate and that the university department is a significant predictor. However, the values of the coefficients of determination are small, which suggests that citation counts are driven not so much by the included population-level predictors and group membership as by additional unobserved thesis-level characteristics. This also means it is not possible to estimate a particular thesis’s citation impact with any accuracy from knowing only the department at which it originated. The estimated group effects describe the citation impact of particular university departments. The distributions of the estimates from models B and D with associated 95% credible intervals are displayed in Fig. 2.
It is evident that while there are substantial differences in the estimates as a whole, there is also large uncertainty about the exact magnitude of all effects, indicated by the wide credible intervals. This is a consequence of two facts: first, most departments produce theses across the range of citation impact, and small differences in the ratios of high-, middle- and low-impact theses determine the estimated group effects; second, given the high within-group variability, there are likely too few observations in the applied time period to arrive at more precise estimates.

Figure 2: Ph.D. thesis citation impact regression group effect estimates for (a) 49 sociology departments (2001–2005) and (b) 52 English and American language and literature departments (2004–2010). Means and 95% posterior probability ranges

The results of the above-mentioned research evaluation of sociology departments in the Forschungsrating series of national-scale pilot assessments allow for a comparison between the university group effects of Ph.D. thesis citation impact obtained in the present study and the qualitative ordinal ratings given by the expert committee in the category ‘support of early career researchers’ (see footnote 15). In the sociology assessment exercise, the ECR support dimension was explicitly intended to reflect both the supportive actions of the departments and their successes. The reviewer panel put special weight on the presence of structured degree programs and scholarships and on the professorship positions obtained by graduates. Further indicators taken into account were the number of conferred doctorates, the list of Ph.D. theses with publisher information, and a self-report of actions and successes. This dimension was rated by the committee on a five-point scale ranging from 1, ‘not satisfactory’, to 5, ‘excellent’ (Wissenschaftsrat 2008, p. 22).

For 47 universities, both an estimated citation impact score (the group effect coefficient from the above regression) and an ECR support rating were available for sociology in the same observation period. A tabulated comparison is presented in Table 4. The Kendall \(\tau\) rank correlation between these two variables at the department level is 0.36 (\(p < 0.01\)), indicating a moderate association, but the mean citation scores in the table do not exhibit a clear pattern of increase with increasing expert-rated ECR support. The bulk of departments were rated in the lower-middle and middle categories; that is, the ratings are highly concentrated in this range, making distinctions difficult.
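
For readers who want to reproduce this kind of comparison, Kendall’s rank correlation can be computed directly. The sketch below implements the tie-corrected tau-b variant (an assumption on our part: the paper does not state which variant was used, but tau-b is the common choice when one variable is an ordinal rating with many ties) on hypothetical data, not the study’s:

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation with the standard tie correction."""
    nc = nd = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                nc += 1  # concordant pair
            else:
                nd += 1  # discordant pair
    n0 = len(x) * (len(x) - 1) // 2
    return (nc - nd) / sqrt((n0 - ties_x) * (n0 - ties_y))

# Hypothetical ordinal ECR ratings (1-5) and department group-effect estimates
ratings = [2, 3, 2, 4, 3, 5, 1, 3, 4, 2]
scores = [-0.2, 0.1, -0.4, 0.3, -0.1, 0.5, -0.3, 0.2, 0.1, 0.0]
tau = kendall_tau_b(ratings, scores)
```

With fully concordant, tie-free inputs the function returns exactly 1.0; with many ties in the ratings, the denominator correction keeps the statistic within \([-1, 1]\).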

Figure 3: ECR support expert committee rating and mean estimates of Ph.D. thesis citation impact of 47 German sociology departments (2001–2005)

This relationship is displayed in Fig. 3. It can be seen that while the five rating groups do have some association with the citation analysis estimates, there is large variability within the groups, especially for categories 3 and 4. In fact, there is much overlap across all the rating groups in terms of citation impact scores. The university department with the highest citation impact effect estimate was rated as belonging to the middle groups of ECR support. In summary, the association between a department’s rated ECR support and the citation impact of its theses is demonstrable but moderate in size.

In this study we have demonstrated the utility of combining citation data from two distinct sources, Scopus and Google Books, for citation analysis of Ph.D. theses in the context of research evaluation of university departments. Our study has a number of limitations hampering its immediate implementation in practice. We did not have verified data about the theses produced by departments and instead approximated publication sets using a subject classification. We did not take into account Ph.D. candidates’ publications other than the theses, such as journal articles, proceedings papers, and book chapters, which resulted in low citation counts in the sciences. These limitations must be resolved before any application to research evaluation; the present study is to be understood as a feasibility study only.

We now turn to the discussion of the results as they relate to the guiding research questions. The first research question concerned the typical citation counts of Ph.D. theses. We found that German Ph.D. theses were cited more often in the book literature covered by Google Books than in the periodical literature covered by Scopus, and that it takes 10–15 years for citation counts to reach a stable level at which they do not increase any further. At this stage, about 40% of theses remained uncited. We further found large differences in typical citation rates across fields. Theses are cited more often in the social sciences and humanities, especially in history and in archeology and prehistory, but citation rates were very low in physics and astronomy, chemistry, and veterinary medicine. Furthermore, there were distinctive patterns in the origin of the bulk of citations between the two data sources, in line with the typical publication conventions of the sciences and the social sciences and humanities. Science theses received more citations from Scopus, that is, primarily the journal literature, than from the book literature covered by Google Books. The social sciences and humanities, on the other hand, obtained far more citations from the book literature covered in Google Books. Nevertheless, these fields’ theses also receive substantial numbers of citations from the journal literature, which must not be neglected. Thus, with regard to our second research question, we can state that Google Books is clearly a very useful citation data source for social sciences and humanities dissertations, and most likely also for publications beyond dissertations from these areas of research. Citations from Google Books are complementary to those from Scopus as they clearly cover a different literature; the two data sources have little overlap.

Our multilevel regression results allow us to affirm that there are clearly observable differences in thesis citation impact between departments in a given discipline (research question 3). However, they are of small to moderate magnitude, and the major part of citation count variation is found at the individual level (research question 4). Our results do not allow any statements about what factors caused these differences. They could be the result of mutual selection processes or of different levels of support or research capacity across departments. Our departmental citation impact results, interpretable as the collective performance of early career researchers at the university department level, are roughly in line with the qualitative ratings of an expert committee in our case study of sociology. This association neither rules out nor confirms any particular explanation, as both department supportive actions and individual ECR successes were compounded in the rated dimension, which furthermore showed only limited variation.

Research question 5 concerned the association of citation counts of Ph.D. theses and their research quality. Using information about dissertation awards we showed that theses which received such awards also received higher citation rates vis-à-vis comparable theses which did not win such awards. This result strongly suggests that if researchers cite Ph.D. theses, they do tend to prefer the higher quality ones, as recognized by award granting bodies and committees.

As a case in point for the status change approach we note that, in Germany, holding a Ph.D. degree is a requirement for applying for funding at the national public science funding body Deutsche Forschungsgemeinschaft.

While in other countries it is the norm that doctoral theses are evaluated by experts external to the university, this has traditionally not been the case in Germany (Wissenschaftsrat 2002). In fact, grading of the thesis by the Ph.D. supervisor is not considered inappropriate by the German Rectors’ Conference (Hochschulrektorenkonferenz 2012, p. 7).

The assessed disciplines were chemistry, sociology, electrical and information engineering, and English and American language and literature. Possibly the most important experimentally varied factor was the criterion by which research outputs were attributed to units: either all outputs in the reference period by researchers employed at the unit at assessment time (“current potential” principle) or all outputs of the units’ research staff in the period, regardless of employment at assessment time (“work-done-at” principle).

As an exception, the committee on the evaluation of English and American language and literature considered the support of ECRs as a sub-dimension of the ‘enabling of research’ dimension, alongside third-party funding activity and ‘infrastructures and networks’. However, within this category, it was given special importance.

Other applied indicators are extra-scientific in that they measure compliance with science-external political directives, such as the share of female doctoral candidates.

Another issue deserving more scrutiny is the incentive structures promoted by the indicators. Indicators such as the number of granted Ph.D.’s and the number of current Ph.D. candidates, which were applied in all exercises, could further exacerbate the Ph.D. oversupply in academia (Rogge and Tesch 2016).

It would be justified to (also) include publications by ECRs other than theses in an ECR performance dimension, ideally weighted appropriately for the ECR’s co-authorship contribution (Donner 2020).

The field of medicine was not included because medical theses (for a Dr. med. degree) typically have lower research requirements and are therefore generally not comparable with theses in other subjects (Wissenschaftsrat 2004, pp. 74–75). It was not possible to distinguish between Dr. med. theses and other degree theses within the medicine category, which means that regular natural science dissertations on medical subjects are not included in this dataset if medicine was the first assigned class.

Classification mapping according to https://wiki.dnb.de/download/attachments/141265749/ddcSachgruppenDNBKonkordanzNeuAlt.pdf accessed 07/18/2019. It should be noted that some classes were not used for some time; for example, electrical engineering was not used between 2004 and 2010 but was grouped under engineering/mechanical engineering alongside mining/construction technology/environmental technology.

https://www.deutsche-botanische-gesellschaft.de/en/about-us/promoting-young-scientists/wilhelm-pfeffer-prize accessed 26/04/2021.

https://www.deutsche-botanische-gesellschaft.de/ueber-die-dbg/nachwuchsfoerderung/wilhelm-pfeffer-preis/satzung-pfeffer-stiftung §4, accessed 26/04/2021.

These figures are lower than the official numbers for completed doctoral exams as published by the Federal Statistical Office in Series 11/4/2 because medicine is not included.

As this is a very low figure, we manually checked the results and did not notice any issues. We note that about 58,000 (about 17%) of the Google Books citation records did not have any names of authors.

This is not ideal because its classes are not aligned with the delimitations of university departments. The National Library data do not contain information about the university department which accepted a thesis. As this exercise is intended not as an evaluation of university departments but only as a feasibility study, we still use the classification for lack of a better alternative. It goes without saying that this course of action is not appropriate for any actual evaluation use. Instead, a verified list of the dissertations of each unit would be required and all publications related to the theses would have to be included.

This is not possible for EALL because early career researcher support was not assessed separately but compounded with acquisition of third party funding and ‘infrastructures and networks’ into a dimension called ‘enabling of research’ and the ratings in this category were further differentiated by four sub-fields (Wissenschaftsrat 2012 , p. 19).

Aksnes, D. W., Langfeldt, L., & Wouters, P. (2019). Citations, citation indicators, and research quality: An overview of basic concepts and theories. Sage Open, 9 (1). https://doi.org/10.1177/2158244019829575 .

Allison, P. D., & Long, J. S. (1990). Departmental effects on scientific productivity. American Sociological Review, 55(4), 469–478. https://doi.org/10.2307/2095801 .


Bornmann, L., & Daniel, H.-D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64 (1), 45–80. https://doi.org/10.1108/00220410810844150 .

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80 (1), 1–28.

Chuadja, M. (2021). Promotionen an der Charité Berlin von 1998 bis 2015. Qualität, Dauer, Promotionstyp (Ph.D. thesis). Charité - Universitätsmedizin Berlin.

Consortium for the National Report on Junior Scholars. (2017). 2017 National Report on Junior Scholars. Statistical Data and Research Findings on Doctoral Students and Doctorate Holders in Germany. Overview of Key Results . Retrieved from https://www.buwin.de/dateien/buwin-2017-keyresults.pdf .

Diekmann, A., Näf, M., & Schubiger, M. (2012). Die Rezeption (Thyssen-) preisgekrönter Artikel in der “Scientific Community.” Kölner Zeitschrift für Soziologie und Sozialpsychologie, 64 (3), 563–581. https://doi.org/10.1007/s11577-012-0175-4 .

Donner, P. (2016). Enhanced self-citation detection by fuzzy author name matching and complementary error estimates. Journal of the Association for Information Science and Technology, 67 (3), 662–670. https://doi.org/10.1002/asi.23399 .

Donner, P. (2020). A validation of coauthorship credit models with empirical data from the contributions of PhD candidates. Quantitative Science Studies, 1 (2), 551–564. https://doi.org/10.1162/qss_a_00048 .

Evans, S. C., Amaro, C. M., Herbert, R., Blossom, J. B., & Roberts, M. C. (2018). “Are you gonna publish that?” Peer-reviewed publication outcomes of doctoral dissertations in psychology. PloS ONE, 13 (2), e0192219. https://doi.org/10.1371/journal.pone.0192219

García-Pérez, M. A. (2010). Accuracy and completeness of publication and citation records in the Web of Science, PsycINFO, and Google Scholar: A case study for the computation of h indices in Psychology. Journal of the American Society for Information Science and Technology, 61 (10), 2070–2085. https://doi.org/10.1002/asi.21372 .

Glänzel, W., & Thijs, B. (2004). Does co-authorship inflate the share of self-citations? Scientometrics, 61 (3), 395–404. https://doi.org/10.1023/b:scie.0000045117.13348.b1 .

Hay, A. (1985). Some differences in citation between articles based on thesis work and those written by established researchers: Human geography in the UK 1974–1984. Social Science Information Studies, 5 (2), 81–85. https://doi.org/10.1016/0143-6236(85)90017-1 .

Heinisch, D. P., & Buenstorf, G. (2018). The next generation (plus one): An analysis of doctoral student’s academic fecundity based on a novel approach to advisor identification. Scientometrics, 117 (1), 351–380. https://doi.org/10.1007/s11192-018-2840-5

Heinisch, D. P., Koenig, J., & Otto, A. (2020). A supervised machine learning approach to trace doctorate recipient’s employment trajectories. Quantitative Science Studies, 1 (1), 94–116. https://doi.org/10.1162/qss_a_00001

Hemlin, S. (1996). Research on research evaluation. Social Epistemology, 10 (2), 209–250. https://doi.org/10.1080/02691729608578815 .

Hesli, V. L., & Lee, J. M. (2011). Faculty research productivity: Why do some of our colleagues publish more than others? PS: Political Science & Politics, 44 (2), 393–408. https://doi.org/10.1017/S1049096511000242 .

Hilmer, C. E., & Hilmer, M. J. (2007). On the relationship between the student-advisor match and early career research productivity for agricultural and resource economics Ph.Ds. American Journal of Agricultural Economics, 89 (1), 162–175. https://doi.org/10.1111/j.1467-8276.2007.00970.x .

Hinze, S., Butler, L., Donner, P., & McAllister, I. (2019). Different processes, similar results? A comparison of performance assessment in three countries. In W. Glänzel, H. F. Moed, U. Schmoch, & M. Thelwall (Eds.), Springer handbook of science and technology indicators (pp. 465–484). Berlin: Springer. https://doi.org/10.1007/978-3-030-02511-3_18 .

Hochschulrektorenkonferenz. (2012). Zur Qualitätssicherung in Promotionsverfahren . Retrieved from https://www.hrk.de/positionen/beschluss/detail/zur-qualitaetssicherung-in-promotionsverfahren/ .

Jaeger, M. (2006). Leistungsbezogene Budgetierung an deutschen Universitäten - Umsetzung und Perspektiven. Wissenschaftsmanagement, 12 (3), 32–38.


Kim, K., & Karau, S. J. (2009). Working environment and the research productivity of doctoral students in management. Journal of Education for Business, 85 (2), 101–106. https://doi.org/10.1080/08832320903258535 .

Kousha, K., & Thelwall, M. (2015). An automatic method for extracting citations from Google Books. Journal of the Association for Information Science and Technology, 66 (2), 309–320. https://doi.org/10.1002/asi.23170 .

Kousha, K., & Thelwall, M. (2019). Can Google Scholar and Mendeley help to assess the scholarly impacts of dissertations? Journal of Informetrics, 13 (2), 467–484. https://doi.org/10.1016/j.joi.2019.02.009 .

Larivière, V. (2012). On the shoulders of students? The contribution of PhD students to the advancement of knowledge. Scientometrics, 90 (2), 463–481. https://doi.org/10.1007/s11192-011-0495-6 .

Larivière, V., Zuccala, A., & Archambault, É. (2008). The declining scientific impact of theses: Implications for electronic thesis and dissertation repositories and graduate studies. Scientometrics, 74 (1), 109–121. https://doi.org/10.1007/s11192-008-0106-3 .

Laudel, G., & Gläser, J. (2008). From apprentice to colleague: The metamorphosis of early career researchers. Higher Education, 55 (3), 387–406. https://doi.org/10.1007/s10734-007-9063-7 .

Long, J. S., & McGinnis, R. (1985). The effects of the mentor on the academic career. Scientometrics, 7 (3–6), 255–280. https://doi.org/10.1007/BF02017149 .

Martin, B. R., & Irvine, J. (1983). Assessing basic research: Some partial indicators of scientific progress in radio astronomy. Research Policy, 12 (2), 61–90. https://doi.org/10.1016/0048-7333(83)90005-7 .

Nakagawa, S., Johnson, P. C. D., & Schielzeth, H. (2017). The coefficient of determination \({R}^{2}\) and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. Journal of the Royal Society Interface, 14 (134), 20170213. https://doi.org/10.1098/rsif.2017.0213 .

Nederhof, A., & van Raan, A. (1987). Peer review and bibliometric indicators of scientific performance: A comparison of cum laude doctorates with ordinary doctorates in physics. Scientometrics, 11 (5–6), 333–350. https://doi.org/10.1007/bf02279353 .

Nederhof, A., & van Raan, A. (1989). A validation study of bibliometric indicators: The comparative performance of cum laude doctorates in chemistry. Scientometrics, 17 (5–6), 427–435. https://doi.org/10.1007/BF02017463 .

Niggemann, F. (2020). Interne LOM und ZLV als Instrumente der Universitätsleitung. Qualität in der Wissenschaft, 14 (4), 94–98.

Oestmann, J. W., Meyer, M., & Ziemann, E. (2015). Medizinische Promotionen: Höhere wissenschaftliche Effizienz. Deutsches Ärzteblatt , 112 (42), A–1706/B–1416/C–1388.

Projektgruppe Indikatorenmodell. (2014). Indikatorenmodell für die Berichterstattung zum wissenschaftlichen Nachwuchs . Retrieved from https://www.destatis.de/DE/Themen/Gesellschaft-Umwelt/Bildung-Forschung-Kultur/Hochschulen/Publikationen/Downloads-Hochschulen/indikatorenmodell-endbericht.pdf .

Rogge, J.-C., & Tesch, J. (2016). Wissenschaftspolitik und wissenschaftliche Karriere. In D. Simon, A. Knie, S. Hornbostel, & K. Zimmermann (Eds.), Handbuch Wissenschaftspolitik (2nd ed., pp. 355–374). Berlin: Springer. https://doi.org/10.1007/978-3-658-05455-7_25

Statistisches Bundesamt. (2018). Prüfungen an Hochschulen 2017 . Retrieved from https://www.destatis.de/DE/Themen/Gesellschaft-Umwelt/Bildung-Forschung-Kultur/Hochschulen/Publikationen/Downloads-Hochschulen/pruefungen-hochschulen-2110420177004.pdf .

Vollmar, M. (2019). Neue Promovierendenstatistik: Analyse der ersten Erhebung 2017. WISTA - Wirtschaft und Statistik, 1, 68–80.

Wespel, J., & Jaeger, M. (2015). Leistungsorientierte Zuweisungsverfahren der Länder: Praktische Umsetzung und Entwicklungen. Hochschulmanagement, 10 (3+4), 97–105.

Wissenschaftsrat. (2002). Empfehlungen zur Doktorandenausbildung . Saarbrücken.

Wissenschaftsrat. (2004). Empfehlungen zu forschungs-und lehrförderlichen Strukturen in der Universitätsmedizin . Berlin.

Wissenschaftsrat. (2007). Forschungsleistungen deutscher Universitäten und außeruniversitärer Einrichtungen in der Chemie . Köln.

Wissenschaftsrat. (2008). Forschungsleistungen deutscher Universitäten und außeruniversitärer Einrichtungen in der Soziologie . Köln.

Wissenschaftsrat. (2011). Ergebnisse des Forschungsratings Elektrotechnik und Informationstechnik . Köln.

Wissenschaftsrat. (2012). Ergebnisse des Forschungsratings Anglistik und Amerikanistik . Köln.

Yoels, W. C. (1974). On the fate of the Ph.D. dissertation: A comparative examination of the physical and social sciences. Sociological Focus, 7 (1), 1–13. https://doi.org/10.1080/00380237.1974.10570872 .


Acknowledgements

Funding was provided by the German Federal Ministry of Education and Research [Grant Numbers 01PQ16004 and 01PQ17001]. We thank Beatrice Schulz for help with data collection.

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Abteilung 2, Forschungssystem und Wissenschaftsdynamik, Deutsches Zentrum für Hochschul- und Wissenschaftsforschung, Schützenstr. 6a, 10117, Berlin, Germany

Paul Donner


Corresponding author

Correspondence to Paul Donner .

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Donner, P. Citation analysis of Ph.D. theses with data from Scopus and Google Books. Scientometrics 126 , 9431–9456 (2021). https://doi.org/10.1007/s11192-021-04173-w


Received : 21 January 2021

Accepted : 30 September 2021

Published : 24 October 2021

Issue Date : December 2021

DOI : https://doi.org/10.1007/s11192-021-04173-w


Keywords

  • Early career researchers
  • Ph.D. thesis
  • Doctoral students
  • Doctoral degree holders
  • Citation analysis
  • Research performance
  • Research assessment



Know the Difference: PhD Dissertation vs. Thesis


One of the biggest turning points of any PhD student’s journey is the submission of a research writing project in the form of either a PhD thesis or a PhD dissertation. From an academic perspective, the thesis/dissertation is in many ways a major indicator of the abilities and expertise that you have gained as a doctoral candidate.

The mere task of understanding the requirements and compiling a PhD dissertation/thesis is in itself huge. However, what may be confusing to understand, especially if you are just embarking upon your doctoral journey, is the difference between thesis and dissertation.

Table of Contents

  • Similarities in PhD dissertation vs. thesis
  • 1. Understanding differences in the meaning of the two terms
  • 2. Difference between thesis and dissertation based upon geographical location
  • 3. Understanding a difference in content for a PhD thesis vs. PhD dissertation

Similarities in PhD dissertation vs. thesis

These two terms are often used interchangeably when referring to doctoral studies as there are a number of similarities between them:

  • The very first commonality between thesis and dissertation is that the submission of both is considered the official culmination of the candidate’s doctoral work.
  • Both the thesis and the dissertation demonstrate the ability of a doctoral candidate to effectively communicate their process of resolving a problem statement.
  • Both test a candidate’s analytical reasoning and critical thinking while showcasing their expertise in a particular subject area.
  • Both the thesis and the dissertation are evaluated by an official review committee consisting of internal as well as external examiners who are experts in the specific subject area being explored in the doctoral study.
  • Based upon the reviews of the committee members, both documents are then subject to changes and re-submission as required.
  • Lastly, both a thesis and a dissertation can be treated as official publications that may be available as resources in the university library.

Owing to the above-mentioned similarities, the confusion between the correct usage of dissertation vs. thesis is quite understandable. In order to ensure the proper usage of these two terms, it’s crucial to understand the differences between a PhD thesis and a PhD dissertation. Here are some quick pointers that may be useful.

Differences between thesis and dissertation

Since most academic institutions will continue to use these terms interchangeably, it is imperative that you confirm the intricate details regarding the expected structure of a PhD thesis/dissertation with your respective institution. The points below should help clarify some of the major doubts you may have regarding a PhD thesis vs. dissertation.

1. Understanding differences in the meaning of the two terms

In order to better understand the meaning of thesis vs. dissertation, let us go back to the origin of the terms. The term ‘thesis’ originates from the Greek ‘tithenai’, which means ‘to place a proposition’, while the term ‘dissertation’ derives from the Latin ‘disserere’/‘dissertare’, i.e., ‘to (continue to) examine and discuss’. 1 To simplify further, a thesis by itself may simply represent an argument that you put forth and describe in depth, while a dissertation may represent a written summary/discussion of a particular work. 2


2. Difference between thesis and dissertation based upon geographical location

In countries/institutions that follow the British education system, it is common to call the final doctoral research writing project a PhD thesis, while countries/institutions that follow the American education system prefer to call it a PhD dissertation. In case you are unsure which education system your institution follows, it may be a good idea to verify this with the respective personnel so that you can plan your doctoral journey effectively.

3. Understanding a difference in content for a PhD thesis vs. PhD dissertation

While the above two points may help in understanding the differences between thesis and dissertation on a surface level, as a PhD student it is crucial for you to understand the deeper differences in the content and the type of work that goes into each of them. Let us do this by revisiting the differences in the origin of the two terms: ‘dissertare’, or to discuss (dissertation), vs. ‘to place a proposition’ (thesis). In my experience, the content of a PhD dissertation often comprises peer-reviewed publications that the doctoral candidate published during their doctoral work, along with supplementary chapters.

On the other hand, while compiling a PhD thesis, a doctoral candidate may need to describe the doctoral work in detail with the help of distinct chapters comprising: abstract, introduction, literature review, methodology, results, discussion, conclusion and bibliography/references. Thus, the main difference between thesis and dissertation lies in the way the written document is presented, although the doctoral work done by the candidate will mostly remain the same.

References:

1. What is the Difference Between a Dissertation and a Thesis? Postgrad.com. https://www.postgrad.com/advice/exams/dissertation-and-theses/difference-between-a-dissertation-and-a-thesis/
2. The PhD Thesis. FindAPhD.com. https://www.findaphd.com/guides/phd-thesis

The time it takes to write a PhD thesis vs. dissertation can vary depending on several factors, including the research topic, the scope of the project, the research methodology, and the specific requirements of the academic program. However, in general, a dissertation is typically longer and more comprehensive than a thesis, and therefore may take longer to complete.

A dissertation for doctoral programs is typically required after the completion of required coursework, passing comprehensive exams, and hitting any other specific milestones outlined by the program. This means PhD students usually devote years to developing their dissertation, which demonstrates their ability to conduct independent research and make original contributions to the field.

A thesis statement is used in academic writing to provide a concise summary of the main argument or point being made in an essay or paper. It appears in the introduction at the beginning of the paper to guide the reader and provide focus for the writing. A thesis statement can be used in academic essays, research papers, analytical papers and literature reviews, and presents readers with a clear roadmap of the research being conducted.



How to find resources by format

Why use a dissertation or a thesis?

A dissertation is the final large research paper, based on original research, required in many disciplines to complete a PhD degree. A thesis is the same idea but for a master's degree.

They are often considered scholarly sources since they are closely supervised by a committee, are directed at an academic audience, are extensively researched, follow research methodology, and are cited in other scholarly work. Often the research is newer or answering questions that are more recent, and can help push scholarship in new directions. 

Search for dissertations and theses

Locating dissertations and theses

The ProQuest Dissertations and Theses Global database includes doctoral dissertations and selected master's theses from major universities worldwide.

  • Searchable by subject, author, advisor, title, school, date, etc.
  • More information about full text access and requesting through Interlibrary Loan

NDLTD – Networked Digital Library of Theses and Dissertations provides free online access to over a million theses and dissertations from all over the world.

WorldCat Dissertations and Theses searches library catalogs from across the U.S. and worldwide.

Locating University of Minnesota Dissertations and Theses

Use Libraries Search to search by title or author, and add the word "thesis" in the search box. Write down the library and call number and find it on the shelf. They can be checked out.

Check the  University Digital Conservancy  for online access to dissertations and theses from 2007 to present as well as historic, scanned theses from 1887-1923.

Other Sources for Dissertations and Theses

  • Center for Research Libraries
  • DART-Europe E-Thesis Portal
  • Theses Canada
  • Ethos (Great Britain)
  • Australasian Digital Theses in Trove
  • DiVA (Sweden)
  • E-Thesis at the University of Helsinki
  • DissOnline (Germany)
  • List of libraries worldwide - to search for a thesis when you know the institution and cannot find in the larger collections
  • ProQuest Dissertations Express  - to search for a digitized thesis (not a free resource but open to our guest users)

University of Minnesota Dissertations and Theses FAQs

What dissertations and theses are available?

With minor exceptions, all doctoral dissertations and all "Plan A" master's theses accepted by the University of Minnesota are available in the University Libraries system. In some cases (see below) only a non-circulating copy in University Archives exists, but for doctoral dissertations from 1940 to date, and for master's theses from 1925 to date, a circulating copy should almost always be available.

"Plan B" papers, accepted in the place of a thesis in many master's degree programs, are not received by the University Libraries and are generally not available. (The only real exceptions are a number of old library school Plan B papers on publishing history, which have been separately cataloged.) In a few cases individual departments may have maintained files of such papers.

In what libraries are U of M dissertations and theses located?

Circulating copies of doctoral dissertations:

  • Use Libraries Search to look for the author or title of the work desired to determine location and call number of a specific dissertation. Circulating copies of U of M doctoral dissertations can be in one of several locations in the library system, depending upon the date and the department for which the dissertation was done. The following are the general rules:
  • Dissertations prior to 1940 Circulating copies of U of M dissertations prior to 1940 do not exist (with rare exceptions): for these, only the archival copy (see below) is available. Also, most dissertations prior to 1940 are not cataloged in MNCAT and can only be identified by the departmental listings described below.  
  • Dissertations from 1940-1979 Circulating copies of U of M dissertations from 1940 to 1979 will in most cases be held within the Elmer L. Andersen Library, with three major classes of exceptions: dissertations accepted by biological, medical, and related departments are housed in the Health Science Library; science/engineering dissertations from 1970 to date will be located in the Science and Engineering Library (in Walter); and dissertations accepted by agricultural and related departments are available at the Magrath Library or one of the other libraries on the St. Paul campus (the Magrath Library maintains records of locations for such dissertations).  
  • Dissertations from 1980-date Circulating copies of U of M dissertations from 1980 to date at present may be located either in Wilson Library (see below) or in storage; consult Libraries Search for location of specific items. Again, exceptions noted above apply here also; dissertations in their respective departments will instead be in Health Science Library or in one of the St. Paul campus libraries.

Circulating copies of master's theses:

  • Theses prior to 1925 Circulating copies of U of M master's theses prior to 1925 do not exist (with rare exceptions); for these, only the archival copy (see below) is available.  
  • Theses from 1925-1996 Circulating copies of U of M master's theses from 1925 to 1996 may be held in storage; consult Libraries search in specific instances. Once again, there are exceptions and theses in their respective departments will be housed in the Health Science Library or in one of the St. Paul campus libraries.  
  • Theses from 1997-date Circulating copies of U of M master's theses from 1997 to date will be located in Wilson Library (see below), except for the same exceptions for Health Science  and St. Paul theses. There is also an exception to the exception: MHA (Masters in Health Administration) theses through 1998 are in the Health Science Library, but those from 1999 on are in Wilson Library.

Archival copies (non-circulating)

Archival (non-circulating) copies of virtually all U of M doctoral dissertations from 1888-1952, and of U of M master's theses from all years up to the present, are maintained by University Archives (located in the Elmer L. Andersen Library). These copies must be consulted on the premises, and it is highly recommended for the present that users make an appointment in advance to ensure that the desired works can be retrieved for them from storage. For dissertations accepted prior to 1940 and for master's theses accepted prior to 1925, University Archives is generally the only option (e.g., there usually will be no circulating copy). Archival copies of U of M doctoral dissertations from 1953 to the present are maintained by Bell and Howell Corporation (formerly University Microfilms Inc.), which produces print or filmed copies from our originals upon request. (There are a very few post-1952 U of M dissertations not available from Bell and Howell; these include such things as music manuscripts and works with color illustrations or extremely large pages that will not photocopy well; in these few cases, our archival copy is retained in University Archives.)

Where is a specific dissertation or thesis located?

To locate a specific dissertation or thesis it is necessary to have its call number. Use Libraries Search for the author or title of the item, just as you would for any other book. Depending on date of acceptance and cataloging, a typical call number for such materials should look something like one of the following:

Dissertations: MnU-D 78-342 or 378.7M66 ODR7617
Plan "A" Theses: MnU-M 83-67 or 378.7M66 OL6156

Libraries Search will also tell the library location (MLAC, Health Science Library, Magrath or another St. Paul campus library, Science and Engineering, Business Reference, Wilson Annex or Wilson Library). Those doctoral dissertations still in Wilson Library (which in all cases should be 1980 or later and will have "MnU-D" numbers) are located in the central section of the third floor. Those master's theses in Wilson (which in all cases will be 1997 or later and will have "MnU-M" numbers) are also located in the central section of the third floor. Both dissertations and theses circulate and can be checked out, like any other books, at the Wilson Circulation desk on the first floor.

How can dissertations and theses accepted by a specific department be located?

Wilson Library contains a series of bound and loose-leaf notebooks, arranged by department and within each department by date, listing dissertations and theses. Information given for each entry includes name of author, title, and date (but not call number, which must be looked up individually). These notebooks are no longer current, but they do cover listings by department from the nineteenth century up to approximately 1992. Many pre-1940 U of M dissertations and pre-1925 U of M master's theses are not cataloged (and exist only as archival copies). Such dissertations can be identified only with these volumes. The books and notebooks are shelved in the general collection under these call numbers: Wilson Ref LD3337 .A5 and Wilson Ref quarto LD3337 .U9x. Major departments of individual degree candidates are also listed under their names in the GRADUATE SCHOOL COMMENCEMENT programs of the U of M, available in University Archives and (for recent years) also in Wilson stacks (LD3361 .U55x).


School of Electrical and Computer Engineering

College of Engineering

Ph.D. Dissertation Defense: Arindam Mandal

Title: Design of Digitally-Assisted and Artifact-Robust Next-Generation Bi-directional Neural Interfaces

Dr. Visvesh Sathe, ECE, Chair, Advisor

Dr. Shaolan Li, ECE

Dr. Muhannad Bakir, ECE

Dr. Farrokh Ayazi, ECE

Dr. Sam Sober, Emory

Who was Thomas Matthew Crooks? What we know about the Donald Trump rally shooter

A 20-year-old man from Pennsylvania has been identified as the suspect who attempted to assassinate Donald Trump at a political rally in the United States, law enforcement officials said.

Thomas Matthew Crooks has been named as the "subject involved" in the incident, the FBI said in a statement.

Crooks, who was killed by Secret Service snipers at the scene, was from Bethel Park, close to where the rally was held on Saturday local time.

Thomas Crooks

Bethel Park School District confirmed in a statement that Crooks graduated from Bethel Park High School in 2022. 

This is what we know so far about the suspect and how the shooting played out. 

What do we know about the shooter?

Mr Trump was injured when multiple shots were fired at the stage, but the former president's campaign says he is doing "fine".

One attendee was killed and two critically injured in the incident, according to authorities. 

Crooks had not been attending the rally in Butler, Pennsylvania.

He is suspected of carrying out the attack from a rooftop on a building outside the event.

Witnesses said they alerted police to a gunman on the roof of the building, which was about 120 metres away from the stage where Mr Trump began giving an address at around 6:02pm (8:02am Sunday AEST).

They spotted the suspected shooter several minutes before shots rang out and Mr Trump fell to the ground, rising moments later with blood streaming down his face.

Map showing distance between stage and where shooter was on roof.

So far, the FBI has only released Crooks's name, age and where he lived. 

They earlier said the gunman was not carrying identification, so they analysed his DNA to provide a biometric confirmation of his identity. 

Bethel Park, where Crooks lived, is about an hour south of Butler. 

The Federal Aviation Administration said on Sunday that the airspace over Bethel Park was closed "effective immediately" for special security reasons.

Crooks graduated from Bethel Park High School in 2022, according to a statement from the Bethel Park School District.

"Our school district will cooperate fully with the active law enforcement investigation surrounding this case, and as such, we are limited in what we can publicly disclose," the statement read.

"The school district wishes to express its sincere wishes for a speedy and complete recovery for Mr. Trump and those in attendance at the Saturday event who may have been physically harmed or emotionally impacted by these tragic events." 

He received a $US500 ($740) "star award" from the National Maths and Science Initiative , according to a local media report.

State voter records show that Crooks was a registered Republican and the upcoming election would have been the first time he was old enough to vote.

When Crooks was 17 he made a $US15 donation to ActBlue, a political action committee that raises money for left-leaning and Democratic politicians, according to a 2021 Federal Election Commission filing.

The donation was earmarked for the Progressive Turnout Project, a national group that rallies Democrats to vote. The groups did not immediately respond to a Reuters request for comment.

Crooks's father, Matthew Crooks, 53, told CNN that he was trying to figure out what happened and would wait until he spoke to law enforcement before speaking about his son.

What weapon was involved?

Two officials spoke to The Associated Press on the condition of anonymity, saying an AR-style rifle was recovered at the scene.

AR-style rifles are common in mass shootings in the US, said Professor David Smith from the United States Studies Centre.

They are affordable, lightweight and semi-automatic, meaning they can fire multiple rounds quickly. 

Mr Trump was discussing border crossing numbers when the shots, at least five, were fired.

He released a statement on his Truth Social platform describing the moment he was struck.

"I was shot with a bullet that pierced the upper part of my right ear. I knew immediately that something was wrong in that I heard a whizzing sound, shots, and immediately felt the bullet ripping through the skin," he wrote.

Up to a dozen of the deadliest mass shootings in the US since 2006 have involved an AR-15.

Semi-automatic rifles such as the AR-15 have been banned in the past, but are currently widely available in most states. 

President Joe Biden has been calling to reinstate a nationwide ban that expired in 2004.

"One of my fears is that even though most of the major assassination attempts of the 20th century actually led to significant advances in gun control, this one will not," Professor Smith told ABC News channel.

"This doesn't seem like a moment where Americans are going to ask themselves whether things have gone too far with this violence.

"Instead, this seems more like a moment that is going to accelerate the current polarisation."

Possible motives?

The FBI, which is leading the investigation, told reporters that it is too early to say whether the shooter acted alone, and has not provided a motive.

But police do not believe there is "any other existing threat out there".

The FBI is calling for people with video of the event or information to share it with authorities.

Trump can be seen entering a car backwards, surrounded by suited men and soldiers

Professor Claire Finkelstein from the University of Pennsylvania Law School said we are likely to learn more about possible motives in the coming days.

She noted there has been an increase in political violence. 

"We have seen an increase in violent rhetoric around political events of all sorts, and political cause," Professor Finkelstein told ABC News. 

"And the irony is, of course, that Trump himself has fomented some of this violent rhetoric. 

"I think this is a lesson to all of us about how dangerous it is for politicians to increase the violent rhetoric around their campaigns and around their causes."

A huddle of suited men and women move through a rally ground with American flags and crowds on either side

Homeland Security Secretary Alejandro Mayorkas and US Secret Service Director Kimberly Cheatle have briefed President Joe Biden.

They are working with law enforcement partners to respond to and investigate the shooting, Mr Mayorkas said on X.

Republican US House Speaker Mike Johnson also said the House will conduct a full investigation of the attack on Trump's campaign rally.

"The American people deserve to know the truth," Mr Johnson said.

"We will have Secret Service Director Kimberly Cheatle and other appropriate officials from DHS and the FBI appear for a hearing before our committees ASAP."





COMMENTS

  1. PDF High-Dimensional Similarity Search for Large Datasets

    A Dissertation Presented to the Faculty of Princeton University in Candidacy for the Degree of Doctor of Philosophy Recommended for Acceptance by the Department of Computer Science ... similarity search, and for his guidance and encouragement throughout the course of this study.

  2. High-dimensional similarity search and sketching : algorithms and hardness

This item appears in the following Collection(s): Doctoral Theses.

  3. What is the acceptable similarity in a mathematics PhD dissertation

    I have checked the originality of my PhD thesis in mathematics using Turnitin. The similarity was 31%. Is this percentage acceptable by most committees?

  4. PDF Representation Learning for Efficient and Effective Similarity Search

Representation Learning for Efficient and Effective Similarity Search and Recommendation. Advisors: Stephen Alstrup, Christina Lioma, Jakob Grue Simonsen. Submitted to the PhD School of The Faculty of Science, University of Copenhagen. Abstract: How data is represented and operationalized is critical for building computational solutions that are ...

  5. Improved Similarity Search for Large Data in Machine Learning and Robotics

    A thesis submitted in partial fulfilment of the requirements ... Josiah Walker , declare that this thesis titled, 'Improved Similarity Search for Large Data in Machine Learning and Robotics' and the work presented in it are my own. ... for sharing our marriage with my PhD, for being understanding and patient through all the various stages ...

  6. PDF UTS Similarity Search

    Traditional similarity search techniques used for standard time series are not always effective for uncertain time series data analysis. This motivates our work in this dissertation.

  7. Learning task-specific similarity

    Abstract The right measure of similarity between examples is important in many areas of computer science. In particular it is a critical component in example-based learning methods. Similarity is commonly defined in terms of a conventional distance function, but such a definition does not necessarily capture the inherent meaning of similarity, which tends to depend on the underlying task. We ...

  8. Similarity check for researchers

    iThenticate - Similarity check for researchers Prevent sloppy referencing or plagiarism with the Similarity Check for researchers. The tool can be used by (co-)authors to authenticate manuscripts, and to check PhD theses.

  9. Scalable Similarity Search

    The team includes 3 PhD students and 3 post-docs. The aim of the project is to improve theory and practice of algorithms for high-dimensional similarity search on big data, and to extend similarity search algorithms to work in settings where data is distributed (using a communication complexity perspective) or uncertain (using a statistical ...

  10. Learning Task-Specific Similarity

    Abstract The right measure of similarity between examples is important in many areas of computer science. In particular it is a critical component in example- based learning methods. Similarity is commonly defined in terms of a conventional distance function, but such a definition does not necessarily capture the inherent meaning of similarity, which tends to depend on the underlying task. We ...

  11. Free Plagiarism Checker in Partnership with Turnitin

    Catch accidental plagiarism with Scribbr, in partnership with Turnitin, using the same plagiarism checker software as most universities.

  12. OATD

    Advanced research and scholarship. Theses and dissertations, free to find, free to use. Advanced Search Options Find ETDs with:

  13. UK Doctoral Thesis Metadata from EThOS // British Library

    The datasets in this collection comprise snapshots in time of metadata descriptions of hundreds of thousands of PhD theses awarded by UK Higher Education institutions aggregated by the British Library's EThOS service. The data is estimated to cover around 98% of all PhDs ever awarded by UK Higher Education institutions, dating back to 1787.

  14. Open Access Theses and Dissertations (OATD)

    OATD.org provides open access graduate theses and dissertations published around the world. Metadata (information about the theses) comes from over 1100 colleges, universities, and research institutions. OATD currently indexes 6,654,285 theses and dissertations.

  15. Dissertations & Theses

    ProQuest Dissertations & Theses provides researchers with unmatched search and reference link functionality that delivers results from across the globe.

  16. Turnitin Access To Plagiarism Check For GMS Students

How it works: Upload your papers to Blackboard Learn to check the similarity index. Submit multiple versions of papers, take-home assignments, theses or dissertations, and rewrite text as needed. Extended Directions:

  17. Similarity Index of Doctoral Theses Submitted to Universities in Kerala

21% similarity index in its PhD theses, followed by M.G University with 20.4%; the University of Kerala has the least, with 17.9%. The paper points out the importance of user awareness programmes and training programmes on anti-plagiarism for the research guides, research scholars and library staff members.

  18. EBSCO Open Dissertations

    EBSCO Open Dissertations is a collaboration between EBSCO and BiblioLabs to increase traffic and discoverability of ETD research. You can join the movement and add your theses and dissertations to the database, making them freely available to researchers everywhere while increasing traffic to your institutional repository.

  19. Citation analysis of Ph.D. theses with data from Scopus and ...

    This study investigates the potential of citation analysis of Ph.D. theses to obtain valid and useful early career performance indicators at the level of university departments. For German theses from 1996 to 2018 the suitability of citation data from Scopus and Google Books is studied and found to be sufficient to obtain quantitative estimates of early career researchers' performance at ...

  20. Know the Difference: PhD Dissertation vs. Thesis

Table of Contents: Similarities in PhD dissertation vs. thesis; Differences between thesis and dissertation; 1. Understanding differences in the meaning of the two terms; 2. Difference between thesis and dissertation based upon geographical location; 3. Understanding a difference in content for a PhD thesis v/s PhD dissertation; FAQ

  21. How to find resources by format

    Search the Libraries by format. Why use a dissertation or a thesis? A dissertation is the final large research paper, based on original research, for many disciplines to be able to complete a PhD degree. The thesis is the same idea but for a masters degree.

  22. Ph.D. Program

    The dissertation is the culmination of the student's graduate career and is expected to be a substantial and original work of scholarship or criticism. Students within normative time complete the dissertation in their fourth through sixth years.

  23. Ph.D. Dissertation Defense

Title: Coverage Control for Constrained Heterogeneous Multi-Robot Teams. Committee: Dr. Samuel Coogan, ECE, Chair, Advisor; Dr. Magnus Egerstedt, Irvine, Co-Advisor; Dr. ...

  24. Ph.D. Dissertation Defense

Ph.D. Dissertation Defense - Arindam Mandal. Wednesday, July 24, 2024, 10:00 AM. Location: Room W218, Van Leer. Title: Design of Digitally-Assisted and Artifact-Robust Next-Generation Bi-directional Neural Interfaces.

  25. What we know about the shooter behind Trump assassination attempt

    The FBI have named Thomas Matthew Crooks, a 20-year-old man from Pennsylvania, as the "subject involved" in the attempted assassination of former President Donald Trump.
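Several of the results above refer to a "similarity index" reported by tools such as Turnitin or iThenticate. As a rough, purely illustrative sketch of what such a percentage measures (this is not Turnitin's actual algorithm; real tools use document fingerprinting, large source corpora, and quote/bibliography filtering), here is a naive word n-gram overlap score between a document and one candidate source:

```python
def ngrams(text, n=5):
    """Set of lowercase word n-grams; a crude stand-in for fingerprinting."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity_index(document, source, n=5):
    """Percentage of the document's n-grams that also appear in the source.

    A high value flags overlap worth inspecting; it does not by itself
    prove plagiarism (common phrases and quotations also match).
    """
    doc = ngrams(document, n)
    if not doc:
        return 0.0
    return 100.0 * len(doc & ngrams(source, n)) / len(doc)

doc = "the quick brown fox jumps over the lazy dog near the river bank"
src = "the quick brown fox jumps over the lazy dog in the field"
print(round(similarity_index(doc, src), 1))  # 5 of 9 five-grams match: 55.6
```

As the comments in the answers above stress, such a number only flags spans to review; a committee (or the author) still has to inspect each match to decide whether it is a quotation, a common phrase, or genuine plagiarism.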