Skip to content

Ramy-Badr-Ahmed/php-swh-client

Repository files navigation

GitHub top language GitHub DOI

SWH

SWH API Client

This is a PHP API client/connector for Software Heritage (SWH) web API - currently in Beta phase. The client is wrapped round the Illuminate Http package and the GuzzleHTTP library.

Note

Detailed documentation can be found in the wiki pages of this very repository.

A demonstrable version (some features) can be accessed here: Demo Version

Working on new features and fixes will be gladly considered. Please feel free to report.

Installation Steps:

1) Clone this project.

2) Open a console session and navigate to the cloned directory:

    Run "composer install"

    This should involve installing the PHP REPL, PsySH

3) (Optional) Acquire SWH tokens for increased SWH-API Rate-Limits.

4) Prepare .env file and add tokens:   

    4.1) Rename/Copy the cloned ".env.example" file to .env
            cp .env.example .env   
            
    4.2) (Optional) Edit these two token keys:
    
            SWH_TOKEN_PROD=Your_TOKEN_FROM_SWH_ACCOUNT              # step 3)                 
            SWH_TOKEN_STAGING=Your_STAGING_TOKEN_FROM_SWH_ACCOUNT   # step 3)                 

5) (optional) Add psysh to PATH.

Quickstart:

In a console session inside the cloned directory, start the php REPL:

$ psysh     // if not added to PATH replace with: vendor/bin/psysh

Psy Shell v0.12.0 (PHP 8.2.0 — cli) by Justin Hileman

This will open a REPL console-based session where one can test the functionality of the api classes and their methods before building a suitable workflow/use-cases.

Presets

As a one-time configuration parameter, you can set the desired returned data type by SWH (default JSON):

> namespace Module\HTTPConnector;
> use Module\HTTPConnector;         

> HTTPClient::setOptions(responseType:'object')     // json/collect/object available

Visits

Retrieve Latest Full Visit in the SWH archive:

> namespace Module\OriginVisits;
> use Module\OriginVisits; 

> $visitObject = new SwhVisits('https://github.com/torvalds/linux/');

> $visitObject->getVisit('latest', requireSnapshot: true)

More details on further swh visits methods: SwhVisits.

DAG Model:

As graph Nodes, retrieve node Contents, Edges or find a Path to other nodes (top-bottom):

> namespace Module\DAGModel;
> use Module\DAGModel; 

> $snpNode = new GraphNode('swh:1:snp:bcfd516ef0e188d20056c77b8577577ac3ca6e58')

> $snpNode->nodeHopp()   // node contents

> $snpNode->nodeEdges()  // node edges keyed by the respective name

> $revNode = new GraphNode('swh:1:rev:9cf5bf02b583b93aa0d149cac1aa06ee4a4f655c')

> $revNode->nodeTraversal('deps/nghttp2/lib/includes/nghttp2/nghttp2ver.h.in') //  traverse to a deeply nested file

More details on:

Archive

You can specify repositories URL w/o paths and archive to SWH using one of the two variants (static/non-static methods):

> namespace Module\Archival;
> use Module\Archival; 
    
> $saveRequest = new Archive('https://github.com/torvalds/linux/')    // Example 1
> $saveRequest->save2Swh()
    
> $newSaveRequest = Archive::repository('https://github.com/hylang/hy/tree/stable/hy/core')  // Example 2

    // in both cases: the returned POST response contains the save request id and date

Enquire about archival status using the id/date of the archival request (available in the initial POST response)

> $saveRequest->getArchivalStatus($saveRequestDateOrID)     // current status is returned 
> $saveRequest->trackArchivalStatus($saveRequestDateOrID)   // tracks until archival has succeeded

More details on further archive methods: Archive.

EBNF Grammar

Validate a given swhID. TypeError is thrown for non-valid swhIDs.

> namespace Module\DataType; 
> use Module\DataType; 
         
$snpID = new SwhcoreId('swh:1:snp:bcfd516ef0e188d20056c77b8577577ac3ca6e5Z') // throws TypeError Exception

Full details of the SWHID persistent Identifiers: Syntax

Note

Todo: Core identifiers with qualifiers.

MetaData

Returns a list of metadata authorities that provided metadata on the given target

> namespace Module\MetaData;
> use Module\MetaData; 

> SwhMetaData::getOriginMetaData('https://github.com/torvalds/linux/')

More details on further metadata methods: Metadata.