dedupmarcxml documentation
How to import modules
# Import libraries
from dedupmarcxml.evaluate import evaluate_records_similarity, get_similarity_score
from dedupmarcxml.briefrecord import BriefRec
Base example code
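# Import libraries
from lxml import etree
from dedupmarcxml.evaluate import evaluate_records_similarity, get_similarity_score
from dedupmarcxml.briefrecord import BriefRec
# Build brief records from MARCXML record elements ('record1.xml' and 'record2.xml' are placeholder file names)
rec1 = BriefRec(etree.parse('record1.xml').getroot())
rec2 = BriefRec(etree.parse('record2.xml').getroot())
# Detailed analysis: dictionary mapping each similarity criterion to a float
score_detailed = evaluate_records_similarity(rec1, rec2)
# Aggregate the detailed analysis into a single similarity score
score = get_similarity_score(score_detailed, method='mean')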
Contents
- class dedupmarcxml.briefrecord.BriefRec
Class representing a brief record object
You can create a brief record object from a SruRecord object or from the XML data of a MARCXML record using an etree Element object. The namespaces are removed from the XML data.
- Variables:
error – boolean, True if an error occurred
error_messages – list of strings with the error messages
data – JSON object with the brief record information
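The namespace removal mentioned above can be illustrated with the Python standard library. This is a minimal sketch, assuming that "removing namespaces" means stripping the {uri} prefix that ElementTree adds to tag names; it is not the code used by BriefRec, and strip_namespaces and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample MARCXML record in the MARC 21 slim namespace (illustrative data only)
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">991000000000000000</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Sample title</subfield>
  </datafield>
</record>"""


def strip_namespaces(elem: ET.Element) -> ET.Element:
    """Hypothetical helper: remove '{namespace-uri}' prefixes from elem and its descendants, in place."""
    for node in elem.iter():
        if isinstance(node.tag, str) and node.tag.startswith('{'):
            node.tag = node.tag.split('}', 1)[1]
    return elem


record = strip_namespaces(ET.fromstring(MARCXML))

# Without namespaces, plain paths work without a namespace map
title = record.find("datafield[@tag='245']/subfield[@code='a']").text
print(record.tag, title)  # record Sample title
```

Working on namespace-free elements keeps the paths short, at the cost of no longer distinguishing elements that share a local name across namespaces.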
- dedupmarcxml.evaluate.evaluate_records_similarity(rec1: BriefRec, rec2: BriefRec, prevent_auto_match=False) → Dict[str, float]
Evaluate similarity between two records
- Parameters:
rec1 – BriefRec object
rec2 – BriefRec object
prevent_auto_match – if True, the record IDs of both records are compared; if they are identical, every criterion is set to 0 so that a record cannot match itself
- Returns:
dictionary mapping each similarity criterion to a float score
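The prevent_auto_match behaviour described above can be sketched as follows. This is not the library's code: zero_if_same_record is a hypothetical helper, the record IDs are passed explicitly instead of being read from BriefRec objects, and the criterion names (title, editions, years) are placeholders, not necessarily the keys returned by evaluate_records_similarity.

```python
from typing import Dict


def zero_if_same_record(rec_id1: str, rec_id2: str,
                        sim_analysis: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical helper: set every criterion to 0 when both records share the same ID."""
    if rec_id1 == rec_id2:
        return {criterion: 0.0 for criterion in sim_analysis}
    return dict(sim_analysis)


# Placeholder criterion names and scores, for illustration only
analysis = {'title': 1.0, 'editions': 1.0, 'years': 1.0}

print(zero_if_same_record('991234', '991234', analysis))
# {'title': 0.0, 'editions': 0.0, 'years': 0.0}
print(zero_if_same_record('991234', '995678', analysis))
# {'title': 1.0, 'editions': 1.0, 'years': 1.0}
```

Keeping the same keys and only zeroing the values means the result can still be passed to get_similarity_score unchanged.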
- dedupmarcxml.evaluate.get_similarity_score(sim_analysis: Dict[str, float], method: str | None = 'mean') → float
Return the similarity score between two records
It uses the result of the similarity evaluation of two records produced by dedupmarcxml.evaluate.evaluate_records_similarity().
- Parameters:
sim_analysis – dictionary containing the results of the evaluation of similarity of two records
method – method used to aggregate the individual criterion scores into one similarity score; defaults to 'mean'
- Returns:
similarity score between two records as a float
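A minimal sketch of the default 'mean' aggregation, assuming it is the arithmetic mean of the values in sim_analysis. similarity_score_sketch is a hypothetical function that only handles 'mean', and the criterion names and scores are made up for illustration.

```python
from statistics import mean
from typing import Dict, Optional


def similarity_score_sketch(sim_analysis: Dict[str, float],
                            method: Optional[str] = 'mean') -> float:
    """Hypothetical stand-in for get_similarity_score covering only the 'mean' method."""
    if method != 'mean':
        raise ValueError(f'method {method!r} is not covered by this sketch')
    # Arithmetic mean: every criterion has the same weight
    return mean(sim_analysis.values())


# Made-up detailed analysis, for illustration only
analysis = {'title': 1.0, 'editions': 0.5, 'years': 0.75}
print(similarity_score_sketch(analysis))  # 0.75
```

Because every criterion has the same weight in an arithmetic mean, a single low score lowers the overall result in proportion to the number of criteria.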