This section will guide you through the basic tasks of the web-based AmCat Navigator environment.
1. Introduction
This section guides you through the web-based AmCat Navigator environment in two steps. First, it covers the global structure of this application, which is used to organise and search large quantities of texts for content analysis. Second, it explains specific tasks step by step, for example creating a project, adding articles to it and assigning coding jobs to coders. AmCat is the graphical user interface of an underlying database that contains all the data on projects, articles and coding. It helps researchers organise and store thousands of texts without having to know or write database query languages like SQL. The environment is typically used for research on topics such as news developments during elections and other political information in newspapers. For this purpose AmCat was originally written by members of the Communication Science section at the Vrije Universiteit Amsterdam, the Netherlands. Nowadays they use the system to create and store news content for computer-assisted content analysis with software such as iNet, and for automatic indexing of news texts. Of course, this does not exclude other purposes: texts selected from, for example, websites such as a user forum can be added and distributed in the same way. The next section outlines the global structure and frequently used terms.
2. Structure
2.1 General
AmCat Navigator has two major components. The first and main component is the ability to create and structure the article sets that have to be coded for a specific research project. This part is covered in detail in this paragraph. The second component is the ability to search texts quickly with (Boolean) search strings. The results can be saved either as a graph or as a CSV file that can be edited in programs like Microsoft Excel. An explanation of the search option can be found in the ‘how to’ section. The project-management part of AmCat is structured like a pyramid, following the hierarchical connections between the so-called ‘tables’ that are linked together in the database behind the Navigator. These tables contain information about projects, batches, coding jobs, articles and so on. They are used to combine all the elements needed to define, for example, a new project with certain articles from newspaper x published between dates y and z (section 3.1 explains how to create a project). The following subparagraphs explain the distinct elements of a project: first the term project, then what batches consist of, and third, coding jobs.
2.2 AmCat expressions
_ Projects _
At the top of the pyramid are the project labels. Projects are the basic units by which a certain collection of articles can be identified. In practice an AmCat project corresponds to a specific research project, because AmCat projects are usually labelled with a brief description or the title of a study. The most important project label, though, is the so-called project id. This is an automatically generated, fixed number that uniquely identifies a project and can be used to select and download the necessary data into a data file for statistical software such as SPSS.
_ Batches _
Within a project one can find batches: sets of articles taken from an uploaded text file. Such a file can, for example, consist of articles from a specific date, medium or event. Batches are also identified by a unique, automatically allocated number. This batch id can be used to divide the different article collections within a project into separate coding jobs (see below).
_ Coding jobs _
Coding jobs are linked to batches. A coding job consists of an article set selected by, for example, batch id, medium, period or another article selection; a selection from the full set of articles within a project is possible too. After defining a coding job, one can assign it by its id number to a specific coder. A coder usually codes the content of a job with specialised content analysis software such as iNet.
_ Creator, owner and coder _
The terms creator, owner and coder all refer to a user id: the number or name that belongs to a single AmCat account. With this label one can, for instance, assign coding jobs to users or, with the right system privileges, create projects and jobs. The creator is shown in the project overview and indicates who created a project. The owner can be found in the ‘coding job details’ overview and indicates who created that specific coding job. The coder label on the coding job details page shows which coder a job is assigned to.
3. How to employ specific AmCAT tasks
Using the AmCat Navigator it is easy to create projects or quickly search within an article set without having to compose complex syntax. This section explains how, step by step. The main tasks are creating a project, uploading texts to the database and distributing article sets across coders. These steps are described first, in detail and illustrated with images. After that, the search options are explained, for instance how to use Boolean operators to search articles for certain words, read the retrieved articles, or export the results either as a graph or as a data file.
3.1 Create a project
Preparing the connection to the database
In order to access AmCat, a secure network connection to the database has to be established. Two ways can be used.
The first and most straightforward option is to use iNet (open-source software that can be downloaded from its website). Open the program, navigate to file → connect → connect to Anoko DB → Anoko via SSH Tunnel and log in with your credentials, which are identical to your AmCat user name and password. Open any coding job; you should then be able to connect to AmCat in your web browser by using the URL localhost.
The second option is to use PuTTY, a so-called SSH client. Make sure you have configured all network settings; if not, send an e-mail to request the details. Then open the session, probably labelled AmCat. A black terminal window pops up. Enter your user name and press Enter, then enter your password and press Enter again. Open a web browser window and go to the URL localhost.
At the address localhost, a login screen should appear (see image below). Enter your credentials and click the login button to enter the AmCat Navigator environment.

_ Defining a new project _
The main window presents three options, as the first image shows. We need the View Projects button to access the projects section where we are able to define a project title and description.

Next is the My Projects overview. This table presents all projects and coding jobs assigned to or owned by a specific user. The first column shows the project id number, followed by its name and description, the creator and, in the last column, the date on which the project or job was originally inserted in the database. Two links are situated above this table: Create New Project and View All Projects. The latter gives a user with sufficient system privileges access to an overview of all projects in the database. Coders cannot open this view and are restricted to the My Projects table. Since we want to create a new project, we only need the first link: Create New Project.

Labelling a new project is straightforward. Enter a clear name in the first text box and a description of the research project in the second field. For the purpose of this manual, we define a project named ‘Research X’ with the description ‘Example AmCat manual’. Then we click the Create Project button. (Press Cancel to abandon the new project and return to the My Projects overview.)

A new screen containing all project details appears immediately. AmCat has automatically generated the project id number 304, as shown in the image below. This page is our administrative ‘control room’ for managing every aspect of the project. Notice the familiar elements that were already introduced earlier in this manual: article batches and coding jobs. These are the parts of the project details screen we need in order to perform the most elementary project management tasks: (1) adding texts by uploading plain text files (presented as distinct article batches), (2) distributing these texts across coders by defining coding jobs, and (3) adding users to the project. These tasks are explained in the subparagraphs that follow.

_ Add texts to the project _
Once the project is created, we want to upload newspaper articles or other texts to it. The starting point is the project details page (shown in the previous subparagraph), which lists different actions below the blue table containing all project information. Click the third link, Add articles to Project, and the Add articles to Project # page should appear (see image below).
The Add articles window offers two ways to add texts. With Upload File, the first and most frequently used option, large numbers of texts bundled in a single file can be uploaded at once. This is much more efficient than the Direct Input option, which adds only one article at a time; Direct Input is best used when a single article is missing and has to be added to the project later on. Since we want to upload a large collection of texts, we choose Upload File.
Above the upload field, as shown in the image below, AmCat lists the permitted file types: a plain text file (.txt) downloaded from the newspaper archive LexisNexis, a compressed archive containing several text or XML files (.zip), or an XML file (.xml). For our example we use a simple text file from LexisNexis containing ten New York Times articles about President Obama. Make sure the file extension has been changed from .TXT (in capitals) to .txt, because otherwise the system refuses to accept the file. In the File text field, enter the path to the file on your computer or navigate to it with the button at the end of the field. Next to New Batch Name, the software lets us name our article collection or, in AmCat terms, our article batch. Our collection gets the descriptive name ‘Articles Obama NY Times’ so that we can identify the content of the batch later on. A batch name is not required, though: if the text box is left blank, the file name is used instead.
Finally, press the Upload File button to add all texts to the project.
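If a LexisNexis download arrives with an uppercase extension, the renaming step mentioned above can also be scripted. The Python sketch below is only an illustration. The function name normalise_extension is hypothetical, and so is the folder in the usage note. The script renames *.TXT files to *.txt before upload and does not interact with AmCat in any way.

```python
from pathlib import Path


def normalise_extension(folder: str) -> list[Path]:
    """Rename *.TXT files in `folder` to *.txt and return the new paths.

    Hypothetical helper: AmCat rejects the uppercase extension, so files are
    renamed before uploading. Existing *.txt files are never overwritten.
    """
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        # Only touch regular files whose extension is a non-lowercase variant of .txt
        if not path.is_file() or path.suffix.lower() != ".txt" or path.suffix == ".txt":
            continue
        target = path.with_suffix(".txt")
        # Skip if a *different* file already uses the lowercase name
        # (on case-insensitive file systems the target is the same file).
        if target.exists() and not target.samefile(path):
            continue
        path.rename(target)
        renamed.append(target)
    return renamed
```

For example, `normalise_extension("downloads/lexisnexis")` returns the renamed paths. A file whose lowercase name is already taken by another file is left alone rather than overwriting that file.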

When AmCat has finished uploading (this can take a while, as some files are several megabytes in size), a green message confirms that the articles have been added successfully. It states the number of texts, the name and id number of the article batch, and the source file(s) the articles were taken from. Our batch has received id 4450 and contains ten articles, as expected (see image below). Below this message, AmCat offers three follow-up actions. The first lets us inspect all articles in the batch (View Batch #). The second returns us to the Project Details page to manage other details of the project (View Project Details). The third takes us back to the Add articles screen (Add more articles to this project). We use the second link, because we now have to divide the articles between coders by creating coding jobs.

_ Formulate coding jobs _
Back on the project details page, we select the blue link Article selection, which can be found below the title ‘Coding Jobs’. On the following screen we formulate a coding job.

In the Article Selection section we have to define every detail needed to select the right articles and run the right script. The image below shows the sequence of steps described in this subparagraph. The basic first step (keep the pyramid-like structure of the environment in mind) is to make sure we select the right project id, which is ‘304 – Research X’ in our case. Next, AmCat asks by which criterion we want to select our articles (the field next to Select). We choose batch, because we have just created a batch containing all the texts we want to assign to coders. Select the right batch id from the menu below. Optionally, we can add rules to select articles from a batch by, for example, newspaper or date. Our batch does not contain articles from several media or from a specific range of days, so we select ‘all media’ and ‘all dates’. (Another useful way to select an article sample is to copy and paste the ids of the articles you want to include; select the last option next to Select, article ids, to use it.) The third basic step is to select the action we want to perform. To define a coding job we have to run a script: select Run Script and then, in the dropdown menu, the script AmCat has to run: New Coding Job.

After this basic configuration, the details of the new coding job have to be specified (see the image below).
First, we name the coding job (name). Select one or more coders (coders) to add the job to their iNet job overview; hold down the Ctrl key to select several coders at once. Then choose the coding schemes the coders have to use in iNet. The unitSchema refers to the type of NET schema (a kind of relational content analysis) you want to use to code the phrases of the texts. The articleSchema defines which additional article data the coders have to code. Then make sure the type is set to text. Finally, enter the number of articles each job has to contain (setSize). The right value, and the number of distinct jobs that will be generated, depend on the total number of articles and the number of coders selected. In our example we enter ‘10’, the total number of articles in our batch. The overlap box is an additional option used to calculate inter-coder reliability. When it is set to, for example, five texts, each coder codes the same five articles, which can then be compared to compute reliability scores. Click the Submit Query button to generate the coding job. A green message then confirms that the job has been created successfully.
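A small worked example makes it easier to see how setSize and overlap interact. The Python sketch below is a hypothetical model, not AmCat's own allocation algorithm, which this manual does not describe. It assumes the first overlap articles are copied into every set and that each set is then filled with unique articles up to setSize. How the sets are distributed over the selected coders is not modelled.

```python
def split_into_sets(article_ids, set_size, overlap=0):
    """Hypothetical model of how a batch could be divided into coding-job sets.

    Assumption (not AmCat's documented algorithm): the first `overlap`
    articles are copied into every set so that coders' work can be compared,
    and each set is topped up with unique articles until it holds `set_size`.
    """
    ids = list(article_ids)
    if not 0 <= overlap < set_size:
        raise ValueError("overlap must be at least 0 and smaller than set_size")
    shared, rest = ids[:overlap], ids[overlap:]
    step = set_size - overlap  # number of unique articles per set
    sets = [shared + rest[i:i + step] for i in range(0, len(rest), step)]
    return sets or [shared]
```

Under these assumptions, 20 articles with setSize 10 and overlap 5 give three sets of ten articles. Each set starts with the same five shared articles. The manual's own example (10 articles, setSize 10, no overlap) gives a single set.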

3.2 Search through texts
Besides the features for organising your data, AmCAT has built-in search tools for analysing large quantities of texts (automated content analysis). For example, we may want to find out how much media attention Obama and the economic crisis received in all news articles published in USA Today and The New York Times between October 2008 and February 2009. We can answer such questions with the following process. First, we create a so-called search index of all articles stored in a single batch, in multiple batches or in the whole project. Second, we upload or paste a search query that contains all the key words (search strings) we want to search for. Third, AmCAT provides many options to present or export the results of our automated content analysis (tables, graphs, export to SPSS and so on). This paragraph covers each of these steps.
Create search index
To search through the texts stored in our project, we have to define a search index (more technically, a Lucene index). This index records which textual information is stored where in the database underlying AmCAT, whereas the batches we created earlier are just collections of articles stored somewhere in the database. A search index can consist of a single batch, multiple batches, coding jobs, an already saved selection of articles or a list of article ids (unique identification numbers). To define an index, we navigate to the article selection page via our project details page (the hyperlink is placed below the search indices section). An alternative path is the menu item ‘article selection’ below ‘query’ on the left-hand side of the AmCAT Navigator screen. Then use the following steps to create a searchable index of the texts in our project (see also the images below):
Project: select the right project, or make sure the proper AmCAT project is selected. This determines which project, and which of its articles, will be loaded, and within which project the index will be created.
Selection: this section defines which articles will be included in the search index. The first condition describes which collection of stored texts will be indexed. This can be all articles saved in our project, a batch (select one batch, or hold down the Ctrl key to pick multiple batches), a previously saved selection of articles (saved with the article selection tool as well), all articles within a coding job, or a list containing the ids of the articles you want to include (paste the numbers in the text box, one number per line). Next, we decide which media to include. In our example about Obama and the crisis, we choose the option ‘filter’ and select The New York Times and USA Today while holding down the Ctrl key (leave the menu next to ‘media’ untouched to select all media in your project). The last selection option is the period to include. We select the option ‘between’, as we only want to include articles published between October 1 2008 and March 1 2009. The other options are ‘is’ (a single day), ‘from’ (everything published after date x) and ‘before’ (everything published prior to date x).
Action: at this point we run a specialised script to create a search index according to the selection criteria defined above. Click ‘run script’ and select the option ‘New Lucene Index’. Next, enter the index’s name (in our case ‘example’; see image below). Set ‘Split paragraphs’ to ‘No’, as we do not want to perform analyses within distinct paragraphs (e.g. ‘person x or word y appears within a paragraph’).
Submit query: click this button to create the index. This can take a while, depending on the number of articles the AmCAT system has to process. Refresh your project details page after a few minutes to see whether the index is finished or is still in the ‘search indices queue’ (see second image below).
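The media and period conditions of the selection step can be expressed as a simple filter. The Python sketch below is a minimal illustration, not AmCAT code, and it rests on three assumptions. First, articles are represented as dicts with the hypothetical keys medium and date. Second, ‘from’ includes the start date. Third, ‘between’ includes the start date but excludes the end date, which would match the ‘October 2008 to February 2009’ period mentioned earlier. AmCAT's actual boundary handling is not documented here.

```python
from datetime import date
from typing import Optional


def in_period(published: date, mode: str, start: date, end: Optional[date] = None) -> bool:
    """Mimic the period options of the article selection.

    Assumption: 'from' includes the start date, 'before' excludes it, and
    'between' includes the start date but excludes the end date.
    """
    if mode == "is":
        return published == start
    if mode == "from":
        return published >= start
    if mode == "before":
        return published < start
    if mode == "between":
        if end is None:
            raise ValueError("'between' needs an end date")
        return start <= published < end
    raise ValueError(f"unknown period option: {mode!r}")


def select_articles(articles, media=None, mode="between", start=None, end=None):
    """Keep articles from the chosen media (None = all media) in the chosen period.

    `articles` is a list of dicts with hypothetical keys 'medium' and 'date';
    start=None stands for 'all dates'.
    """
    return [
        a for a in articles
        if (media is None or a["medium"] in media)
        and (start is None or in_period(a["date"], mode, start, end))
    ]
```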


Search commands
Now that we have specified a Lucene index, we can search through the texts using single key words or combinations of synonyms and special characters. This section covers typical search commands for automated content analysis. More detailed information can be found in the Lucene query syntax documentation.
In AmCAT, we click on the name of the search index on the project details page to open the search dialog. In our case this is index number 502, called ‘example’, which we created earlier (see image below).

The screen that appears provides two ways to enter search commands. We can use either the text box next to ‘query’ or upload a separate text file containing our search strings (see image below). Either option works. A text file has the advantage that all commands are edited and saved in a single place. The text box is faster, as search strings can be pasted or typed directly into it, for example when we just want to check how often Obama is mentioned in the news or to test a single search query.

Search commands can be as simple or as complex as we want them to be. We can search for articles containing a single word, such as the name of a person or a topic like ‘economy’. However, it is usually wise to include several synonyms to increase the chance of retrieving most of the relevant texts from the large collection of articles stored in our project, or to search for combinations of terms (e.g. ‘Obama’ mentioned together with ‘economic crisis’). The following operators combine search strings or synonyms into one search statement:
AND statements: both words connected by AND have to occur in the same article. For instance, the command Obama AND economy selects only articles that contain both the word ‘Obama’ and the word ‘economy’. Important: the AND operator has to be written in capitals.
OR statements: at least one of the words connected by OR has to be present in an article. Obama OR economy tells the system to select texts that contain the word Obama, the word economy, or both. AmCAT also interprets spaces between words as OR (in other words, Obama economy is the same as Obama OR economy). Important: the OR operator has to be written in capitals as well.
Combinations of AND and OR statements: relevant texts have to contain synonym 1 or 2, and also word 3 or 4. The statement (Obama OR “White House” OR “American government”) AND (economy OR “economic crisis” OR “financial crisis” OR recession OR downturn) selects news stories that mention ‘Obama’ or one of the synonyms for the U.S. government, and also one of the synonyms for the economic crisis. In this kind of statement, the words connected by OR have to be placed between parentheses. Furthermore, words that have to appear directly after each other, like White House, have to be placed between quotation marks. Otherwise AmCAT searches for the words white and house separately, because a space between words is equivalent to OR.
Word distances: sometimes it is important that certain words occur within a certain distance of each other, for example to find only articles about ‘economic growth’ and not stories that mention the economy in the lead and the negative impact of growing deficits somewhere at the end of the text. In this case we restrict the distance between words, as ‘growth’ has to appear near ‘economic’ or ‘economy’: “economic/economy growth/growing/rise/rising”~5. This statement means that ‘economic’ or ‘economy’ (within such a statement the slash acts as an OR operator and a space as AND) has to appear within five words of ‘growth’ or one of its synonyms.
Wildcard statements: to search for economic, economy, economist or economical with a single key word, use a so-called wildcard: econom*. Defining search strings this way increases the chance of finding texts that contain derived words related to the topic economy. Use the * symbol to allow any variation (as in econom*) or the ? symbol to allow exactly one variable character (te?t finds both ‘text’ and ‘test’).
Exclude statements: add AND NOT (“bush trees/hiking/tree/wood/woods”~10) at the end of a statement to make sure AmCAT ignores articles about hiking through the bush in Australia when you search for the former U.S. president George Bush (e.g. bush AND (economy OR recession) AND NOT (“bush trees/hiking/tree/wood/woods”~10)).
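The Python sketch below evaluates two of the example queries against a single article text to make the meaning of these operators concrete. It is not how AmCAT or Lucene work internally. The helpers term, phrase and near are hypothetical, and AND, OR and NOT are written with Python's own and, or and not. Tokenisation is a simple lowercase word split. The near helper is a simplified stand-in for the ~N proximity operator, because Lucene's own distance ("slop") calculation differs in detail.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Lowercase the text and split it into words (a simplification of Lucene's analyzer)."""
    return re.findall(r"\w+", text.lower())


def term(words, pattern):
    """True if any word matches `pattern`; * means any characters, ? exactly one."""
    return any(fnmatchcase(w, pattern.lower()) for w in words)


def phrase(words, text):
    """True if the words of `text` occur directly after each other ("White House")."""
    target = text.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def near(words, left, right, distance):
    """True if a word from `left` and one from `right` are at most `distance` words apart.

    `left` and `right` are lists of alternatives, like economic/economy in AmCAT.
    """
    lpos = [i for i, w in enumerate(words) if any(fnmatchcase(w, p.lower()) for p in left)]
    rpos = [i for i, w in enumerate(words) if any(fnmatchcase(w, p.lower()) for p in right)]
    return any(abs(i - j) <= distance for i in lpos for j in rpos)


def obama_and_crisis(text):
    """(Obama OR "White House" OR "American government") AND (economy OR ... OR downturn)"""
    w = tokenize(text)
    actor = term(w, "obama") or phrase(w, "white house") or phrase(w, "american government")
    topic = (term(w, "economy") or phrase(w, "economic crisis") or phrase(w, "financial crisis")
             or term(w, "recession") or term(w, "downturn"))
    return actor and topic


def bush_economy(text):
    """bush AND (economy OR recession) AND NOT ("bush trees/hiking/tree/wood/woods"~10)"""
    w = tokenize(text)
    outdoors = near(w, ["bush"], ["trees", "hiking", "tree", "wood", "woods"], 10)
    return term(w, "bush") and (term(w, "economy") or term(w, "recession")) and not outdoors
```

For example, obama_and_crisis("President Obama spoke about the recession.") is true, while a story about Bush hiking through the bush is rejected by bush_economy even if it mentions the economy.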
Furthermore, every search command has to start on a new line, both in the AmCAT text box next to ‘query’ and in a separate text file (a command searching for Obama and the economy in general has to be separated by a line break from a subsequent command searching for Obama and the financial crisis). It is also advisable to put a label, in other words a variable name, in front of every search command. This gives every search string a clear, distinguishable label that becomes the variable name when we export our data to SPSS or show the frequencies in a table (see next paragraph). If we want a label for our query about the economic crisis, for instance, we put economic_crisis_freq# in front of the search string (e.g. economic_crisis_freq# “economic/economy growth/growing/rise/rising”~5). The number sign always marks the end of the label.
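Before a query file is pasted or uploaded, it can help to check that every line carries the intended label. The Python sketch below splits such a file into label/query pairs, following the rules described above: one command per line, with the label ending at the first number sign. The function name parse_queries is hypothetical. Its fallback label for unlabelled lines (q followed by the line number) is also an assumption, and AmCAT's own handling may differ. It rejects duplicate labels, since each label becomes a variable name.

```python
def parse_queries(text: str) -> dict[str, str]:
    """Split a query file into {label: query}, one search command per line.

    The label ends at the first '#'. Unlabelled lines get a hypothetical
    label 'q<line number>'. Blank lines are skipped.
    """
    queries = {}
    for n, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        label, sep, query = line.partition("#")
        if not sep:  # no number sign: the whole line is the query
            label, query = f"q{n}", line
        label, query = label.strip(), query.strip()
        if label in queries:
            raise ValueError(f"duplicate label on line {n}: {label}")
        queries[label] = query
    return queries
```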
In the subsequent section we will use all these search commands and present or export the results.
Present or export results
This section covers how to present or export search results. Because AmCAT provides so many output types, frequently used options are explained in separate sections below. We use the search query Obama_economy_general_freq#obama AND economy to get an idea of how often Obama is mentioned together with the economy in general. The second command is more specific, as it measures how frequently Obama appears in news stories about the financial crisis: Obama_downturn_freq#(Obama OR “White House” OR “American government”) AND (economy OR “economic crisis” OR “financial crisis” OR recession OR downturn).
Show retrieved articles
The basic output action ‘show summary’ offers a Google-like presentation of the data (see image below). It shows the titles of the retrieved texts and relevant passages with the key words in bold. We can click on a headline to read the whole article. This presentation is a handy tool for inspecting the data and thus for testing the search strings.
‘Show summary’ includes two additional features. The first is exporting the data to Excel or other spreadsheet software (see the check box ‘save as csv’). This CSV file contains the keywords we used (in the first column), the unique article ids (which can be reused as a list of article ids in the article selection), the headlines, the hits and the unique id number of the newspaper or other medium (see second image below for an example). The second feature of ‘show summary’ is the box ‘show as list’, which presents only the titles of the texts.
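The article ids in a ‘show summary’ export can be reused in the article selection, where ids are pasted one per line. The Python sketch below collects them from the CSV file. It assumes a comma-separated UTF-8 file with a header row, with the article id in the second column as in the column order described above. The function name and the file name in the usage note are hypothetical, so check the column index against your own export.

```python
import csv


def article_ids_from_summary(csv_path: str) -> list[str]:
    """Collect the unique article ids from a 'show summary' CSV export.

    Assumption: a header row, and the article id in the second column
    (keyword, article id, headline, hits, medium id).
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = csv.reader(f)
        next(rows, None)  # skip the header row
        ids = [row[1] for row in rows if len(row) > 1]
    # An article can match several keywords; keep each id once, in order of appearance
    return list(dict.fromkeys(ids))
```

Printing `"\n".join(article_ids_from_summary("summary.csv"))` then gives a list with one id per line, ready to paste into the article ids box.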


Show table: exporting the results of automated content analysis to SPSS
The most important feature for automated content analysis is the ‘show table’ tool (the second ‘action’; see image below). This option lets us export search results to SPSS or Excel for further analysis, for example as a file containing the frequencies of Obama and the economy or the financial crisis in the news per day, month, year or medium.
First, we specify how AmCAT has to break down the frequencies: by time interval, by medium or per article. In the first menu we choose between search term per interval (a search term is a full search command such as Obama_economy_general_freq#obama AND economy), search term per medium, medium per interval, or search term per article. The second menu defines the time interval: frequencies per day, week, month, quarter or year. The third menu asks whether the values in the table or data file should express the number of articles, the number of hits or percentages.
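A small aggregation shows the difference between counting articles and counting hits. The Python sketch below is not AmCAT's implementation. It assumes we already have, for one search term, the publication date and the number of hits of every retrieved article, and it groups them per month. Percentages are left out because the manual does not say what they are a percentage of.

```python
from collections import defaultdict


def per_month(matches):
    """Aggregate (publication date, hits) pairs for one search term per month.

    `matches` holds one pair per retrieved article. Returns
    {"YYYY-MM": {"articles": n, "hits": h}}, sorted by month.
    """
    table = defaultdict(lambda: {"articles": 0, "hits": 0})
    for published, hits in matches:
        key = f"{published.year}-{published.month:02d}"
        table[key]["articles"] += 1  # each retrieved article counts once
        table[key]["hits"] += hits   # hits count every occurrence of the search term
    return dict(sorted(table.items()))
```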

When we press the submit button without checking the ‘save as csv’ box, AmCAT generates a web-based frequency table on the same page (as the screen above shows). As soon as we tick the box ‘save as csv’, the data are exported to a file. To work with this file in, for instance, SPSS, we perform the following tasks. First, the CSV file has to be saved as an Excel file (*.xls or *.xlsx) using spreadsheet software like Microsoft Excel or OpenOffice Calc. Second, we open the Excel file in SPSS by navigating to File, Open, Data and choosing ‘Excel’ in the ‘Files of type’ menu (below ‘File name’; see images below). Then we make sure to tick ‘Read variable names from the first row of data’ in the subsequent dialog and click ‘OK’ to load the data file.



Show graph
Third, AmCAT can also visualise our data in graphs (both line and stacked). The menus are almost identical to those of the ‘show table’ output option described above (see image below). In the first menu we select whether we want to show the frequencies per time interval (specified in the second menu), per medium, or per medium per time interval (day, week and so on). The second dropdown menu defines the interval: per day, week, month, quarter, year or medium. The last menu asks which measure we want to use (number of hits, number of articles or percentages).
AmCAT presents a line graph by default. Optionally, we can check the ‘stacked graph’ box to choose the alternative data presentation. The latter visualisation is shown in the second image below.
To reuse a graph, for example in a text document or a presentation, right-click on it to save or copy the image.


Visualise co-occurring key words: cluster maps
A handy tool for interpreting data is the ‘cluster map’ functionality. This mapping tool shows which individual key words co-occur most frequently within the same texts, using connected, coloured ‘balloons’ of varying sizes to visualise the frequencies and overlap. The image below contains an example. It tracks how often individual concepts like ‘economy’, ‘crisis’, ‘Obama’ and ‘McCain’ appear in the same news stories. Most obvious is the fact that both Obama and McCain are mentioned most frequently in articles covering the economy in general (the red and blue ‘clouds’ overlap with the large green balloon for economy at the upper right). Articles about the financial crisis show the same pattern: most of the time, both candidates are mentioned in the same stories as the emerging downturn. In this way we can derive a global picture from the texts.
To examine the articles underlying the cluster map, click on one of the yellow dots. A small article viewer then pops up, with the retrieved key words marked in red.
Additionally, the exact co-occurrence frequencies can be exported to Excel or other spreadsheet software by ticking the box ‘save as csv’.
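The numbers behind a cluster map come down to co-occurrence counts. The Python sketch below shows the idea under a simple assumption: for each article we have the set of key words it matched, and we count how many articles contain each pair of key words. It illustrates the concept only. It is not how AmCAT computes or draws its cluster maps, and the example key word sets in the test are made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(articles_keywords):
    """Count in how many articles each pair of key words occurs together.

    `articles_keywords` holds one collection of matched key words per article.
    Returns a Counter keyed by alphabetically sorted key word pairs.
    """
    counts = Counter()
    for keywords in articles_keywords:
        # set() ignores repeated matches within one article; sorting fixes the pair order
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts
```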
