Revisit records don't seem to adhere to the --fixed-dt timestamp #31

Open
Shrinks99 opened this issue Nov 3, 2023 · 0 comments

All other records in the created WARC file appear to adhere to the --fixed-dt flag when the user sets it. Revisit records, which warcit creates automatically based on the directory structure, are the only ones that seem to exhibit this issue.

This may be because revisit records derive warc_date differently from other records. See:

warcit/warcit/warcit.py

Lines 547 to 554 in d94ecd7

warc_date = record.rec_headers['WARC-Date']
source_uri = record.rec_headers['WARC-Source-URI']
revisit_record = writer.create_revisit_record(index_url, digest, url, warc_date)
# no creation date needed, as it matches warc-date
#revisit_record.rec_headers['WARC-Creation-Date'] = warc_date
revisit_record.rec_headers['WARC-Source-URI'] = source_uri
vs

warcit/warcit/warcit.py

Lines 495 to 501 in d94ecd7

# timestamp
if self.use_mapfile and file_info.mapfile_results and 'timestamp' in file_info.mapfile_results:
    warc_date = self._set_fixed_dt(file_info.mapfile_results['timestamp'])
elif self.fixed_dt:
    warc_date = self.fixed_dt
else:
    warc_date = datetime_to_iso_date(file_info.modified_dt)
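
One possible direction, sketched under assumptions rather than taken from warcit: resolve the revisit date with the same precedence as the second snippet above, and only fall back to the date read from the record headers when neither a mapfile timestamp nor --fixed-dt is set. The helper name `resolve_warc_date`, its parameters, and the sample dates are hypothetical; plain ISO strings stand in for warcit's own date handling (the real code passes the mapfile timestamp through `_set_fixed_dt`).

```python
from typing import Optional


def resolve_warc_date(
    fallback_date: str,
    fixed_dt: Optional[str] = None,
    mapfile_timestamp: Optional[str] = None,
) -> str:
    """Pick a WARC-Date using the precedence of the regular record path.

    Hypothetical helper, not part of warcit. All dates are assumed to be
    ISO 8601 strings that have already been normalized.
    """
    if mapfile_timestamp:
        # A per-file timestamp from the mapfile wins, as in the second snippet.
        return mapfile_timestamp
    if fixed_dt:
        # Otherwise the user-supplied --fixed-dt value applies.
        return fixed_dt
    # Only now fall back to the date the revisit path currently uses.
    return fallback_date


# Assumed values for illustration only.
header_date = "2023-11-03T03:02:23Z"  # date read from record.rec_headers
fixed_dt = "2010-01-01T00:00:00Z"     # date the user passed via --fixed-dt

print(resolve_warc_date(header_date, fixed_dt=fixed_dt))
```

Whether the revisit path should also honor mapfile timestamps is a design question this sketch does not settle; it simply mirrors the existing precedence.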

Screenshot

This issue appears to affect only revisit records, as shown below.

Screenshot 2023-11-02 at 11 02 23 PM

The timestamp of the current URL shows the date the WARC was created instead of the --fixed-dt date. The HTML file shows the correct date, that is, the time at which these website files would have been seen (according to the user of warcit).

Screenshot 2023-11-02 at 11 07 58 PM
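
The behavior in the screenshots could also be checked without a replay tool. The following is a minimal sketch, assuming the records come from warcio's `ArchiveIterator` and that the expected date is given in the same ISO format the WARC-Date headers use; the function name `revisit_dates_not_matching` and the file name in the usage comment are placeholders.

```python
def revisit_dates_not_matching(records, expected_date):
    """Return (target URI, WARC-Date) for revisit records whose date differs.

    `records` is any iterable of objects exposing `rec_type` and
    `rec_headers.get_header(name)`, as warcio records do.
    The comparison is a plain string match, so both dates must share a format.
    """
    mismatches = []
    for record in records:
        if record.rec_type != "revisit":
            continue
        warc_date = record.rec_headers.get_header("WARC-Date")
        if warc_date != expected_date:
            uri = record.rec_headers.get_header("WARC-Target-URI")
            mismatches.append((uri, warc_date))
    return mismatches


# Usage with warcio (requires `pip install warcio`); the path is a placeholder:
#
# from warcio.archiveiterator import ArchiveIterator
# with open("example.warc.gz", "rb") as stream:
#     records = ArchiveIterator(stream)
#     for uri, date in revisit_dates_not_matching(records, "2010-01-01T00:00:00Z"):
#         print(uri, date)
```

An empty result would suggest the revisit records follow --fixed-dt; any entries returned are the records that still carry a different date.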