dedupmarcxml documentation

How to import modules

# Import libraries
from dedupmarcxml.evaluate import evaluate_records_similarity, get_similarity_score
from dedupmarcxml.briefrecord import BriefRec

Base example code

# Import libraries
from dedupmarcxml.evaluate import evaluate_records_similarity, get_similarity_score
from dedupmarcxml.briefrecord import BriefRec

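# xml1 and xml2 are placeholders for Etree Element objects,
# each holding the XML data of one MARCXML record
rec1 = BriefRec(xml1)
rec2 = BriefRec(xml2)

# Detailed evaluation: a dictionary of similarity scores
score_detailed = evaluate_records_similarity(rec1, rec2)

# Single overall score computed from the detailed evaluation
score = get_similarity_score(score_detailed, method='mean')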

Contents

class dedupmarcxml.briefrecord.BriefRec

Class representing a brief record object

You can create a brief record object from a SruRecord object or from the XML data of a MARCXML record using an Etree Element object.

The namespaces are removed from the XML data.

Variables:
  • error – boolean, True if an error occurred

  • error_messages – list of strings containing the error messages

  • data – JSON object with the brief record information
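To picture what removing the namespaces means in practice, here is a minimal sketch that uses only the Python standard library. It is not the code used by BriefRec: the helper strip_namespaces and the sample record are assumptions made for illustration. Only the MARCXML namespace URI is standard.

```python
import xml.etree.ElementTree as ET

# Sample MARCXML record, invented for this illustration
MARCXML = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<controlfield tag="001">991234</controlfield>'
    '<datafield tag="245" ind1="1" ind2="0">'
    '<subfield code="a">Example title</subfield>'
    '</datafield>'
    '</record>'
)


def strip_namespaces(root):
    """Hypothetical helper: drop the '{namespace}' prefix from every tag, in place."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith('{'):
            elem.tag = elem.tag.split('}', 1)[1]
    return root


root = ET.fromstring(MARCXML)
# Before stripping, the root tag is '{http://www.loc.gov/MARC21/slim}record'
strip_namespaces(root)

# After stripping, plain paths work without a namespace map
title = root.find("datafield[@tag='245']/subfield[@code='a']").text
```

Once the namespaces are gone, fields can be addressed with short paths such as datafield[@tag='245'], which keeps any extraction code compact.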

dedupmarcxml.evaluate.evaluate_records_similarity(rec1: BriefRec, rec2: BriefRec, prevent_auto_match=False) → Dict[str, float]

Evaluate similarity between two records

Parameters:
  • rec1 – BriefRec object

  • rec2 – BriefRec object

  • prevent_auto_match – if True, the record IDs of both records are compared; if they are identical, 0 is returned for every parameter so that a record cannot match itself

Returns:

dictionary with a matching score (float) for each evaluated parameter
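The effect of prevent_auto_match can be sketched as a small guard applied to a score dictionary. This is an illustration under assumptions, not the library's implementation: the function apply_auto_match_guard, the field names and the record IDs are all hypothetical.

```python
from typing import Dict


def apply_auto_match_guard(scores: Dict[str, float],
                           rec_id1: str,
                           rec_id2: str,
                           prevent_auto_match: bool = False) -> Dict[str, float]:
    """Hypothetical guard: zero every score when a record is compared with itself."""
    if prevent_auto_match and rec_id1 == rec_id2:
        return {field: 0.0 for field in scores}
    return dict(scores)


# Hypothetical per-field scores for a record compared with itself
raw_scores = {'title': 1.0, 'creators': 1.0, 'date': 1.0}

guarded = apply_auto_match_guard(raw_scores, '991234', '991234',
                                 prevent_auto_match=True)
unguarded = apply_auto_match_guard(raw_scores, '991234', '991234')
```

With the guard enabled, a record that appears in its own candidate list gets all-zero scores and so drops out of any later ranking.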

dedupmarcxml.evaluate.get_similarity_score(sim_analysis: Dict[str, float], method: str | None = 'mean') → float

Return the similarity score between two records

It uses the result of the similarity evaluation of two records (see dedupmarcxml.evaluate.evaluate_records_similarity).

Parameters:
  • sim_analysis – dictionary containing the results of the evaluation of similarity of two records

  • method – method to use to calculate the similarity score, default method is the mean

Returns:

similarity score between two records as float
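As a rough picture of the default method, the sketch below reduces a score dictionary to its arithmetic mean. It assumes that 'mean' denotes a plain unweighted average of the per-parameter scores; the library may weight parameters or treat missing values differently. The function mean_similarity and the field names are hypothetical.

```python
from statistics import mean
from typing import Dict


def mean_similarity(sim_analysis: Dict[str, float]) -> float:
    """Hypothetical stand-in: unweighted arithmetic mean of the per-parameter scores."""
    return mean(sim_analysis.values())


# Hypothetical detailed evaluation of two records
sim_analysis = {'title': 0.5, 'creators': 1.0, 'date': 0.75}

overall = mean_similarity(sim_analysis)  # (0.5 + 1.0 + 0.75) / 3 = 0.75
```

A single overall score like this is convenient for applying a threshold, while the detailed dictionary remains useful for seeing which parameters disagree.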