### Install refextract

Source: http://pythonhosted.org/refextract

Install the refextract library from PyPI with pip before using any of its functions.

```bash
pip install refextract
```

--------------------------------

### Extract Journal Reference

Source: http://pythonhosted.org/refextract

Use extract_journal_reference to parse a single journal reference string (journal title, volume, page) into a dictionary of structured fields.

```python
from refextract import extract_journal_reference

reference = extract_journal_reference("J.Phys.,A39,13445")
print(reference)
```

```python
{
    'extra_ibids': [],
    'is_ibid': False,
    'misc_txt': u'',
    'page': u'13445',
    'title': u'J. Phys.',
    'type': 'JOURNAL',
    'volume': u'A39',
    'year': ''
}
```

--------------------------------

### Extract references from a file with custom format

Source: http://pythonhosted.org/refextract

Customize the format of extracted journal references with the reference_format parameter (default: '{title} {volume} ({year}) {page}'). Here `path` is the location of the PDF.

```python
>>> from refextract import extract_references_from_file
>>> extract_references_from_file(path, reference_format="{title},{volume},{page}")
```

--------------------------------

### Override knowledge bases for file extraction

Source: http://pythonhosted.org/refextract

Pass override_kbs_files to replace the default journals knowledge base with your own file, which can improve extraction accuracy.

```python
>>> from refextract import extract_references_from_file
>>> extract_references_from_file(path, override_kbs_files={'journals': 'my/path/to.kb'})
```

--------------------------------

### Override knowledge bases for string extraction

Source: http://pythonhosted.org/refextract

Pass override_kbs_files the same way when extracting from a raw string, here `source`.

```python
>>> from refextract import extract_references_from_string
>>> extract_references_from_string(source, override_kbs_files={'journals': 'my/path/to.kb'})
```

--------------------------------

### Extract references from a string with custom format

Source: http://pythonhosted.org/refextract

Customize the output format of references extracted from a raw string with the reference_format parameter.
```python
>>> from refextract import extract_references_from_string
>>> extract_references_from_string(source, reference_format="{title},{volume},{page}")
```

--------------------------------

### Extract References from URL

Source: http://pythonhosted.org/refextract

Extract references directly from a PDF at a URL with extract_references_from_url. The URL must be reachable; per the API reference below, a 404 raises FullTextNotAvailable.

```python
from refextract import extract_references_from_url

result = extract_references_from_url("http://arxiv.org/pdf/1503.07589v1.pdf")
print(result)
```

```python
{
    'references': [
        {'author': [u'F. Englert and R. Brout'],
         'doi': [u'10.1103/PhysRevLett.13.321'],
         'journal_page': [u'321'],
         'journal_reference': ['Phys.Rev.Lett.,13,1964'],
         'journal_title': [u'Phys.Rev.Lett.'],
         'journal_volume': [u'13'],
         'journal_year': [u'1964'],
         'linemarker': [u'1'],
         'title': [u'Broken symmetry and the mass of gauge vector mesons'],
         'year': [u'1964']},
        ...
    ],
    'stats': {
        'author': 15,
        'date': '2016-01-12 10:52:58',
        'doi': 1,
        'misc': 0,
        'old_stats_str': '0-1-1-15-0-1-0',
        'reportnum': 1,
        'status': 0,
        'title': 1,
        'url': 0,
        'version': u'0.1.0.dev20150722'
    }
}
```

--------------------------------

### Extract References from File

Source: http://pythonhosted.org/refextract

Extract references from a full-text PDF with extract_references_from_file, passing the path to the file. Per the API reference below, a missing file raises FullTextNotAvailable.

```python
from refextract import extract_references_from_file

result = extract_references_from_file("some/fulltext/1503.07589v1.pdf")
print(result)
```

```python
{
    'references': [
        {'author': [u'F. Englert and R. Brout'],
         'doi': [u'10.1103/PhysRevLett.13.321'],
         'journal_page': [u'321'],
         'journal_reference': ['Phys.Rev.Lett.,13,1964'],
         'journal_title': [u'Phys.Rev.Lett.'],
         'journal_volume': [u'13'],
         'journal_year': [u'1964'],
         'linemarker': [u'1'],
         'title': [u'Broken symmetry and the mass of gauge vector mesons'],
         'year': [u'1964']},
        ...
    ],
    'stats': {
        'author': 15,
        'date': '2016-01-12 10:52:58',
        'doi': 1,
        'misc': 0,
        'old_stats_str': '0-1-1-15-0-1-0',
        'reportnum': 1,
        'status': 0,
        'title': 1,
        'url': 0,
        'version': u'0.1.0.dev20150722'
    }
}
```

--------------------------------

### extract_references_from_file

Source: http://pythonhosted.org/refextract

Extracts references from a local PDF file.

```APIDOC
## extract_references_from_file

### Description
Extracts references from a local PDF file. Raises FullTextNotAvailable if the file does not exist.

### Parameters
- **path** (string) - Required - Path to the local PDF file.
- **recid** (any) - Optional - Record ID.
- **reference_format** (string) - Optional - Format string for references (default: '{title} {volume} ({year}) {page}').
- **linker_callback** (function) - Optional - Callback function executed for every reference element found.
- **override_kbs_files** (dict) - Optional - Dictionary to override knowledge base files.

### Response
- **result** (dict) - A dictionary with the extracted references and stats.
```

--------------------------------

### extract_references_from_string

Source: http://pythonhosted.org/refextract

Extracts references from a raw string.

```APIDOC
## extract_references_from_string

### Description
Extracts references from a raw string. Raises FullTextNotAvailable if the source is invalid.

### Parameters
- **source** (string) - Required - The raw string to extract references from.
- **is_only_references** (boolean) - Optional - Set to False when the string contains more than just the references; this improves accuracy.
- **recid** (any) - Optional - Record ID.
- **reference_format** (string) - Optional - Format string for references.
- **linker_callback** (function) - Optional - Callback function executed for every reference element found.
- **override_kbs_files** (dict) - Optional - Dictionary to override knowledge base files.
```

--------------------------------

### extract_references_from_url

Source: http://pythonhosted.org/refextract

Extracts references from a PDF located at a URL.

```APIDOC
## extract_references_from_url

### Description
Extracts references from the PDF at the given URL. Raises FullTextNotAvailable if the URL returns a 404.

### Parameters
- **url** (string) - Required - The URL of the PDF file.
- **headers** (dict) - Optional - HTTP headers for the request.
- **chunk_size** (int) - Optional - Chunk size for downloading (default: 1024).
- **kwargs** (dict) - Optional - Additional keyword arguments.
```

--------------------------------

### extract_journal_reference

Source: http://pythonhosted.org/refextract

Extracts journal reference information from a given string.

```APIDOC
## extract_journal_reference

### Description
Extracts the journal reference from a string and parses it for specific journal information.

### Parameters
- **line** (string) - Required - The input string containing the journal reference.
- **override_kbs_files** (dict) - Optional - Dictionary to override knowledge base files for journal names.
```
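--------------------------------

### Summarize extracted references (sketch)

The file and URL examples return a dictionary whose `references` entry is a list of per-reference dictionaries in which every field is a list of strings. The helper below is a minimal sketch that walks that shape, assuming it matches the sample output shown in this document; `first` and `summarize_references` are hypothetical names, not part of refextract, and the code does not call the library, so it can be tried on the sample data alone.

```python
from typing import Any, Dict, List


def first(ref: Dict[str, List[str]], key: str, default: str = "") -> str:
    """Return the first value of a list-valued field, or `default` if it is missing or empty."""
    values = ref.get(key) or []
    return values[0] if values else default


def summarize_references(result: Dict[str, Any]) -> List[str]:
    """Build one 'author, title, doi' line per extracted reference.

    Hypothetical helper: assumes `result` looks like the sample output,
    i.e. {'references': [{'author': [...], 'title': [...], ...}, ...], 'stats': {...}}.
    """
    lines = []
    for ref in result.get("references", []):
        parts = [first(ref, "author"), first(ref, "title"), first(ref, "doi")]
        # Drop fields that were not populated for this reference.
        lines.append(", ".join(part for part in parts if part))
    return lines


# A trimmed copy of the first reference from the sample output.
sample = {
    "references": [
        {
            "author": ["F. Englert and R. Brout"],
            "doi": ["10.1103/PhysRevLett.13.321"],
            "title": ["Broken symmetry and the mass of gauge vector mesons"],
            "year": ["1964"],
        }
    ],
    "stats": {"author": 15, "doi": 1},
}

for line in summarize_references(sample):
    print(line)
```

Reading only the first element of each list keeps the sketch short; if a reference can carry several authors or DOIs, iterate over the whole list instead.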
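--------------------------------

### Preview a reference_format template (sketch)

The reference_format parameter takes a template built from `{title}`, `{volume}`, `{year}` and `{page}` placeholders, with `'{title} {volume} ({year}) {page}'` as the documented default. To preview what a template produces, the sketch below fills it from the dictionary returned by `extract_journal_reference`. It uses Python's `str.format` as a stand-in, which is an assumption for illustration rather than refextract's own formatting code, and `render_reference` is a hypothetical helper.

```python
from typing import Dict

# Default template documented for the reference_format parameter.
DEFAULT_FORMAT = "{title} {volume} ({year}) {page}"


def render_reference(parsed: Dict[str, object], template: str = DEFAULT_FORMAT) -> str:
    """Fill a reference_format-style template from an extract_journal_reference result.

    Hypothetical preview helper: str.format stands in for refextract's
    internal formatting; missing fields are rendered as empty strings.
    """
    fields = {key: parsed.get(key, "") for key in ("title", "volume", "year", "page")}
    return template.format(**fields)


# Values copied from the extract_journal_reference sample output.
parsed = {
    "extra_ibids": [],
    "is_ibid": False,
    "misc_txt": "",
    "page": "13445",
    "title": "J. Phys.",
    "type": "JOURNAL",
    "volume": "A39",
    "year": "",
}

print(render_reference(parsed, "{title},{volume},{page}"))  # J. Phys.,A39,13445
print(render_reference(parsed))  # J. Phys. A39 () 13445
```

The empty parentheses in the second line come from the sample's empty `year` field; this sketch does not show whether refextract itself drops them.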