Introduction to the CLI tool
ahcrawler has a command line tool located in the ./bin/ subdirectory. With it you can:
- list current profiles
- (re)index a website profile
- delete data of a profile
- flush all data of all profiles
It was written to be used in cronjobs and for manual indexing.
Syntax
Calling it without parameters shows a help text.
Basic syntax
Most commands you will need have a structure with 3 parameter blocks:
cli.php [action] [for which data] [for which profile]
Each parameter has a short and a long variant; the long ones are more readable.
Actions
--action [name of action] or -a [name of action]
Known actions are:
Action | Description |
---|---|
list | list all existing profiles |
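index | start crawler to (re)index searchindex, resources or both |
reindex | delete and reindex searchindex and resources of a profile in a single step |
update | start crawler to rescan the items of searchindex or resources that failed in the last run |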
empty | remove existing data of a profile |
flush | drop data for ALL profiles |
Data
--data [name] or -d [name]
Valid data items are:
Item | Description |
---|---|
searchindex | the database of the web content for a website search; this is always the first data item you need to fill! |
resources | the resources used in your website (links, images, CSS and JS files) |
search | the search terms entered by your visitors (if you use the search form) |
all | short for searchindex + resources |
full | short for searchindex + resources + search (actions empty and flush only) |
Profile
In the backend you can define multiple profiles for different websites. Most actions need the id of the profile to handle; only list and flush work without it.
--profile [id] or -p [id]
List profiles
The list action shows the ids of your profiles. You need these ids for the parameter --profile (or -p) in the other actions.
Example:
cli.php --action list
(Re-) Create the website data
With the reindex action you can
- delete existing indexed data and
- start the indexer
This is the simplest way to create or fully update a profile. It handles both data stores in a single step: it deletes and indexes searchindex and resources.
The --profile parameter defines the profile to handle.
Example:
cli.php --action reindex --profile 1
Remark: On shared hosting with a limited execution time you can split the work into smaller steps: run the actions one by one (empty, then index, then update), handle each data item (searchindex and resources) separately, and loop over all profiles.
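The split described in the remark can be scripted. The following bash sketch loops over the given profile ids and runs empty, index and update separately for searchindex and then resources, so that each single call stays short. Assumptions, not taken from the ahcrawler docs: the tool is started as php ./bin/cli.php from the ahcrawler directory (override it with the CLI variable), it exits with a non-zero status on failure, and the function names run_step and crawl_in_steps are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split crawling into small steps for hosts with a short
# execution time limit. Assumptions (not from the ahcrawler docs):
# - the tool is started as "php ./bin/cli.php"; override via CLI
# - cli.php exits with a non-zero status on failure (unverified)
# - run_step and crawl_in_steps are hypothetical helper names

CLI="${CLI:-php ./bin/cli.php}"

# Run one cli.php call; $CLI is split into words on purpose.
run_step() {
    # shellcheck disable=SC2086
    $CLI "$@"
}

# For each profile id: empty, index and update every data item
# separately. searchindex comes first, as it must be filled first.
crawl_in_steps() {
    local profile data
    for profile in "$@"; do
        for data in searchindex resources; do
            run_step --action empty  --data "$data" --profile "$profile" || return 1
            run_step --action index  --data "$data" --profile "$profile" || return 1
            run_step --action update --data "$data" --profile "$profile" || return 1
        done
    done
}
```

Source the file and call, for example, crawl_in_steps 1 2 with the ids reported by cli.php --action list.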
Create index of a website
With the index action you can start the indexer to rescan the searchindex, the linked resources or both.
The --profile parameter defines the profile to handle. The --data parameter tells what to index:
- searchindex - the database of the web content for a website search; this is always the first data item you need to fill!
- resources
- all (searchindex + resources)
Example:
cli.php --action index --data all --profile 1
Remarks:
- If the website was crawled before, you may want to delete the data of this profile first (action empty) or flush all indexed content of all profiles (action flush).
- To delete already indexed data you need to call the “empty” action (see below).
Rescan last errors
With the update action you can complete a scan. It starts the indexer to check all items that failed in the last run and have an error status. Run (and repeat) the update command only after a full index of a website profile.
The --profile parameter defines the profile to handle. The --data parameter tells what to update:
- searchindex
- resources
Example:
cli.php --action update --data resources --profile 1
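Since update is meant to follow a full index, the usual order for one profile is index first, then update for each data item. A minimal bash sketch of that order, assuming the tool is started as php ./bin/cli.php (override it with the CLI variable) and that cli.php returns a non-zero exit status on failure, which is not confirmed here; the function name full_scan is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: full index of one profile, then update runs that rescan the
# items with an error status. Assumptions (not from the ahcrawler docs):
# - the tool is started as "php ./bin/cli.php"; override via CLI
# - cli.php exits with a non-zero status on failure (unverified)
# - full_scan is a hypothetical helper name

CLI="${CLI:-php ./bin/cli.php}"

# $1 is the profile id (see: cli.php --action list).
full_scan() {
    local profile="$1" data
    [ -n "$profile" ] || { echo "usage: full_scan <profile-id>" >&2; return 2; }
    # shellcheck disable=SC2086
    $CLI --action index --data all --profile "$profile" || return 1
    # update accepts searchindex or resources, so call it once per item.
    for data in searchindex resources; do
        # shellcheck disable=SC2086
        $CLI --action update --data "$data" --profile "$profile" || return 1
    done
}
```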
Empty data of a single website profile
With the empty action you can delete all entries of the given profile id. This command initiates a DELETE in the database table(s) for all items with the given profile id.
The --profile parameter defines the profile to handle. The --data parameter tells what to delete:
- searchindex
- resources
- all (searchindex + resources)
- search - be careful: in most cases you don’t want this
- full (searchindex + resources + search) - be careful: in most cases you don’t want this
Example:
cli.php --action empty --data searchindex --profile 1
Flush data of all website profiles
With the flush action you can delete all data of all profiles. This command executes a DROP TABLE in the database. Use the flush command if you have created a search index and a resources scan and want to rebuild them from scratch.
The --profile parameter is not needed: dropping tables affects all profiles. The --data parameter tells what to delete:
- searchindex
- resources
- all (searchindex + resources)
- search - be careful: in most cases you don’t want this
- full (searchindex + resources + search) - be careful: in most cases you don’t want this
Example:
cli.php --action flush --data all
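Because flush drops the tables of ALL profiles, a manual run can be guarded by a confirmation prompt. A minimal bash sketch, assuming the tool is started as php ./bin/cli.php (override it with the CLI variable); the function name flush_confirmed is hypothetical, and the prompt is an addition of this sketch, not a feature of cli.php.

```shell
#!/usr/bin/env bash
# Sketch: ask before running the flush action. Assumptions (not from
# the ahcrawler docs):
# - the tool is started as "php ./bin/cli.php"; override via CLI
# - flush_confirmed is a hypothetical helper name

CLI="${CLI:-php ./bin/cli.php}"

# $1 is the data item: searchindex, resources, all, search or full.
flush_confirmed() {
    local data="$1" answer
    case "$data" in
        searchindex|resources|all|search|full) ;;
        *) echo "unknown data item: $data" >&2; return 2 ;;
    esac
    # Extra warning for the items that contain the visitors' search terms.
    case "$data" in
        search|full) echo "WARNING: this also drops the search terms of your visitors" >&2 ;;
    esac
    read -r -p "Drop '$data' for ALL profiles? [y/N] " answer
    if [ "$answer" != "y" ]; then
        echo "aborted" >&2
        return 1
    fi
    # shellcheck disable=SC2086
    $CLI --action flush --data "$data"
}
```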