Fip DataFormatting and AP Agate


version 003-19j0


Overview



This is a brief overview of how to convert AP agate files into CCI NITF using Fip Data Formats.




How the Data is formatted from AP



AP sends files in two formats :

1. Typeset, with embedded fixed spaces which are control characters

2. Zwire, with no markup and data in fields with a semicolon as a separator.


For either format, the first character of each line indicates what type of line it is.


There are three such record-type characters, listed below. Unfortunately two of them are control/non-printable characters.


The Typeset format - which is a lot uglier - has no definable separator between fields and relies on control/unprintable characters as fixed spaces. These - 020, 031, 035, 036, 037 - can be mapped directly to CCI fixed spaces, but that defeats the object of having a style-independent feed - and there is no table to talk about.


Basically you want to use the Zwire feed wherever possible as it is so much easier to use.


This record type character is the same in both cases and is used by Fip Data Formats as the Record Key (reckey).


^ (caret or hat) means a Heading line

which could also be a Byline, Subhead, AP keyword or Editorial msg

BS (backspace) means a table line

this is the actual data !

Any line starting BS caret is a column heading

TAB (ordinary tab) means a footnote or run-on text.

plus a LAST line which is the AP date/time stamp

this starts TAB space ETX (decimal 3) AP-....
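As a rough illustration of the record keys listed above, the following Python sketch labels a line by its first character. The function name and the label strings are made up for this example and are not part of Fip.

```python
# Record-type characters as described above; the label names are illustrative.
CARET = "^"
BS = "\b"     # backspace
TAB = "\t"
ETX = "\x03"  # decimal 3


def classify_record(line: str) -> str:
    """Return a label for one AP agate line based on its first character(s)."""
    if line.startswith(BS + CARET):
        return "column_heading"
    if line.startswith(BS):
        return "table"
    if line.startswith(CARET):
        return "heading"
    if line.startswith(TAB):
        # The closing AP date/time stamp starts TAB, space, ETX.
        if line.startswith(TAB + " " + ETX):
            return "timestamp"
        return "text"
    return "unknown"
```

The BS-caret test has to come before the plain BS test, otherwise column headings would be reported as ordinary table lines.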


Example of the Zwire file :


^AM-HKN--NHL Expanded Glance,0567< 

^National Hockey League= 

^Expanded Standings= 

^At A Glance= 

^By The Associated Press= 

^All Times EST= 

^EASTERN CONFERENCE= 

^Atlantic Division= 

(BS)^;W;L;T;Pts;GF;GA;Home;Away;Div 

(BS)New Jersey;17;7;3;37;81;67;7-;3-;3;10-;4-;0;7-;2-;1 

(BS)San Jose;6;14;7;19;57;68;5-;7-;3;1-;7-;4;2-;6-;2 

^Monday's Games= 

(TAB)Phoenix 2, Montreal 2, tie 

(TAB)N.Y. Rangers 5, Calgary 2 

(TAB)St. Louis 0, Colorado 0, tie< 

^Tuesday's Games= 

(TAB)Edmonton at Carolina, 7 p.m. 

(TAB)Tampa Bay at Pittsburgh, 7:30 p.m. 

(TAB)St. Louis at Dallas, 8 p.m. 

(TAB)(SPC)(ETX)AP-ES-12-15-98 0941EST< 
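To see what the semicolon-separated fields in the sample look like once split, here is a small Python sketch. It assumes the (BS) shown above is a real backspace character at the start of the line; the function name is hypothetical.

```python
def split_zwire_fields(line: str) -> list[str]:
    """Drop the leading record-type character and split the rest on ';'."""
    body = line[1:].rstrip()      # the first character is the record key
    if body.startswith("^"):      # column-heading lines start BS then caret
        body = body[1:]
    return body.split(";")
```

Note that in the sample above the column-heading line splits into 10 fields while each data line splits into 16, because the Home, Away and Div records each arrive as three fields (for example `7-`, `3-`, `3`).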


NITF


NITF header and trailer are added automatically by the rest of the FIP food chain - in particular 'ipedsys'.


So all we are concerned with is the actual table, headings and extra text.


Rules about NITF :


- Every tag must have an associated end tag.

So if you start with a <H3> for your heading, you must end the heading with a </H3>.


Structure is :

<H2>Main Headings</H2>

<H4>Sub Head</H4>

<TABLE>

<THEAD> all column headers here (optional)

<TR> <TD>ColHdr1</TD>

<TD>ColHdr2</TD>

<TD>ColHdr3</TD>

</TR>

</THEAD>

<TBODY> actual table data

<TR> <TD>Data1</TD>

<TD>Data2</TD>

<TD>Data3</TD>

</TR>

<TR> <TD>Data1</TD>

<TD>Data2</TD>

<TD>Data3</TD>

</TR>

</TBODY>

<TFOOT> all foot lines (optional)

<TR> <TD>Footer1</TD>

<TD>Footer2</TD>

<TD>Footer3</TD>

</TR>

</TFOOT>

</TABLE>
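The skeleton above can be generated mechanically. The following Python sketch is only an illustration of the nesting (it is not how Fip produces the output); the function name is hypothetical and cell text is inserted as-is, with no escaping.

```python
def nitf_table(col_headers, rows):
    """Return the THEAD/TBODY skeleton shown above as a string."""
    def tr(cells):
        # One <TD> ... </TD> pair per field, all wrapped in a single <TR>.
        tds = "".join(f"<TD>{c}</TD>\n" for c in cells)
        return f"<TR>\n{tds}</TR>\n"

    parts = ["<TABLE>\n"]
    if col_headers:  # THEAD is optional
        parts.append("<THEAD>\n" + tr(col_headers) + "</THEAD>\n")
    parts.append("<TBODY>\n" + "".join(tr(r) for r in rows) + "</TBODY>\n")
    parts.append("</TABLE>\n")
    return "".join(parts)
```

Building the table in one place like this makes it hard to forget an end tag, which is the main rule of this section.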



-Each row in the table must be wrapped by the TR (Table Row) tag : <TR> row </TR>


-Each field in a row in the table must be wrapped by the TD (Table Data) tag : <TD> data </TD>


-For fields spanning more than one column, use COLSPAN :

"<TD COLSPAN=3>"


-Fractions are specified by the <FRAC> tag

<FRAC> 1 / 2 </FRAC>

** The FRAC tag has 'evolved' over time so check with the CCI types that this is the format used at the site.

The standard AP characterSet maps the following characters to fractions :

[ 1/8

\ 1/4

] 3/8

{ 1/2

| 5/8

} 3/4

~ 7/8

So these can be mapped EITHER in 'ipxchg' or in the dataformatting file using 'match'.
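For illustration only, the mapping above could be expressed like this in Python. It assumes the 1/4 character is a single backslash and uses the `<FRAC> 1 / 2 </FRAC>` layout shown above, which may differ at your site; in practice the mapping would live in 'ipxchg' or in a 'match', not in Python.

```python
# AP fraction characters and the fractions they stand for (from the list above).
AP_FRACTIONS = {
    "[": "1/8",
    "\\": "1/4",   # assumed to be a single backslash
    "]": "3/8",
    "{": "1/2",
    "|": "5/8",
    "}": "3/4",
    "~": "7/8",
}


def expand_fractions(text: str) -> str:
    """Replace each AP fraction character with a <FRAC> element."""
    out = []
    for ch in text:
        if ch in AP_FRACTIONS:
            num, den = AP_FRACTIONS[ch].split("/")
            out.append(f"<FRAC> {num} / {den} </FRAC>")
        else:
            out.append(ch)
    return "".join(out)
```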


-Note that special characters that are NOT part of the normal keyboard - hearts, arrows, currency symbols like the Euro - can be denoted in the '&name;' format. Get the CCI system admin to give you a list of such characters.

eg &club;

&dagger;


-Any data inside a table not wrapped with <TD> data </TD> will float off either before or after - but not inside - the table.

eg:

<table>

<tr><td>Orlando</td></tr>

<tr><td>Miami</td></tr>

<br>Bahamas<br>

<tr><td>Tampa</td></tr>

</table>

This will show 'Bahamas' on a line on its own after the table.


-As NITF is XML compliant, all start-of-tags MUST have a corresponding end-of-tag. So if you specify a <TBODY>, there must also be a </TBODY>.

(However in practice, certain end tags can be safely ignored - </TD>, </TR>, </P>)

-BUT always remember to end the table with a </TABLE>.

If using Internet Explorer, it will compensate for a missing EndTable

but CCI TableEdit or Netscape will NOT (Netscape will strip any data too)

It is good practice to use 'after' to make sure you have ended the table.

i.e. if you had a line like :

r#\b ifprv r=\b \n</TABLE>\n

then add

after ifprv r=\b \n</TABLE>\n

this will take care of the case where there are no records after the last tabular record.
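The logic behind the two lines above (close the table on the first non-table record, and again at end of input if nothing followed the table) can be sketched in Python as follows. This is a model of the idea, not of Fip itself; the function name and the (is_table_line, text) input shape are assumptions for the example.

```python
def wrap_tables(records):
    """Yield output chunks, putting <TABLE> ... </TABLE> around runs of table records.

    `records` is an iterable of (is_table_line, text) pairs.
    """
    in_table = False
    for is_table_line, text in records:
        if is_table_line and not in_table:
            yield "\n<TABLE>\n"
            in_table = True
        elif not is_table_line and in_table:
            # First non-table record after a table: end the table.
            yield "\n</TABLE>\n"
            in_table = False
        yield text
    if in_table:
        # Equivalent of the 'after' line: no record followed the last table line.
        yield "\n</TABLE>\n"
```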


Linking to CCI Table Edit


- The Link to CCI TableEdit uses two attributes to the TABLE tag - 'class' and 'style'.

TableEdit picks up these and automatically maps them to a template.


What should go where ?


In 'class', we need the (PUB)(DEPT)(SECTION)(TOPIC)

eg SUNDAY_SPT_NFL_GLANCE


and 'style' is the individual table inside the class and is only used where there are two or more DIFFERENT looking tables in a file - for example Box Scores

(but not where there are multiple instances of the same style - like the Baseball standings which are repeated 3 or more times for each division).


Class and style are always double quoted. eg :

<table class="MAG_FEA_BRIDGE" style="hand">


CCI Table Edit normally uses the '-' as an internal separator, so use the underscore '_' rather than '-' inside class and style names.
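A small Python sketch of assembling the TABLE start tag from its parts, following the examples above. The function name is hypothetical, and joining the parts with '_' is inferred from the two examples rather than from any CCI documentation.

```python
def table_open_tag(class_parts, style=None):
    """Build a <table> start tag with class (and optional style) double quoted."""
    # Avoid '-' inside the names, as TableEdit uses it as an internal separator.
    cleaned = [p.replace("-", "_") for p in class_parts]
    tag = '<table class="' + "_".join(cleaned) + '"'
    if style:
        tag += f' style="{style}"'
    return tag + ">"
```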


Putting in Native CCI markup


To the purist this is abhorrent.

To the realist there are a couple of fudges which you may need.


-Leader Dots - where two bits of data are placed in a single tab field with leader dots, the CCI command '<WL>' can be used :

eg Arthur...........76

Glen.............12

Chris............-3

will read

<tr><td>Arthur<wl>76</td></tr>

<tr><td>Glen<wl>12</td></tr>

<tr><td>Chris<wl>-3</td></tr>

<WS> will do the same without the dots.
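The leader-dot rewrite above amounts to replacing the run of dots with `<wl>`. A Python sketch of that substitution follows; treating two or more consecutive dots as a leader is an assumption made for the example, as is the function name.

```python
import re

# Two or more dots between the label and the value count as a leader here;
# that threshold is an assumption for this sketch.
LEADER = re.compile(r"^(.*?)\.{2,}(.*)$")


def leader_row(line: str) -> str:
    """Turn 'Arthur...........76' into a one-cell row using <wl>."""
    text = line.strip()
    m = LEADER.match(text)
    if not m:
        return f"<tr><td>{text}</td></tr>"
    label, value = m.group(1), m.group(2)
    return f"<tr><td>{label}<wl>{value}</td></tr>"
```

A single dot, as in 'N.Y.', is left alone because it does not reach the two-dot threshold.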


-Bold a single field - where a single field needs to be bold and all other fields in the same column are not. Some papers like to embolden the home team in a table. CCI TableEdit obviously does not search the data for keywords. So put <B> and </B> before and after the data using Fip.


-Fixed Spaces - For very large tables like Stox listings which go on (and on) for several columns, CCI TableEdit can take some time to justify. So a fudge is to make the table one or two columns wide rather than 5 or 6, and use <fig> to space each figure/number space. To do this you will have to count the number of characters that are in each field so you can pad by the right amount !
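The character counting can be sketched as below. This assumes one `<fig>` stands in for one missing character and that figures are padded on the left so they line up on the right; both are assumptions for the example, so check the actual convention with the site.

```python
def pad_with_fig(value: str, width: int) -> str:
    """Left-pad `value` to `width` characters using one <fig> per missing character."""
    missing = max(width - len(value), 0)
    return "<fig>" * missing + value
```

For example, `pad_with_fig("12", 5)` puts three `<fig>` commands in front of `12`; a value already at or over the width is returned unchanged.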


-Quads



Hints



HTML/NITF ignore NewLines and spaces, boiling many down to a single space.

You can use this to make it a lot more readable by inserting a NL after each bit of data.


For Headers .....

You normally want to strip the Byline (By The Associated Press), the AP keyword etc.

So add these BEFORE any output of headlines :

r=^ ifeq "AM-" caps zapspc f1 continue

r=^ ifeq "PM-" caps zapspc f1 continue

r=^ ifeq "BC-" caps zapspc f1 continue

r=^ ifeq "BYTHEASSO" caps zapspc f1 continue


AP notes or messages to Editors are normally Headlines starting 'Eds:'. These should be passed to CCI as <ED-MSG> tags:

r=^ ifeq "EDS:" caps zapspc f1 <ED-MSG> zapspcextra zapequal zaplt zapgt f1 </ED-MSG>\n continue
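To make the intent of the heading rules above easier to follow, here is a Python model of them. It rests on assumptions about the Fip builtins that this document does not spell out: that 'caps' upper-cases, 'zapspc' removes all spaces, 'ifeq' compares against the start of the field, and 'zapequal', 'zaplt' and 'zapgt' remove '=', '<' and '>'. The byline prefix is written as the start of 'By The Associated Press' after upper-casing and space removal. All function names are hypothetical.

```python
import re

SKIP_PREFIXES = ("AM-", "PM-", "BC-", "BYTHEASSO")


def heading_key(text: str) -> str:
    """Upper-case the heading and remove all spaces, for prefix comparison."""
    return text.upper().replace(" ", "")


def format_heading(line: str):
    """Return None for headings to drop, an <ED-MSG> element for editor notes,
    or the cleaned heading text otherwise."""
    text = line[1:] if line.startswith("^") else line
    key = heading_key(text)
    if key.startswith(SKIP_PREFIXES):
        return None
    # Remove quad characters, then squeeze runs of spaces.
    cleaned = re.sub(r" {2,}", " ", re.sub(r"[=<>]", "", text)).strip()
    if key.startswith("EDS:"):
        return f"<ED-MSG>{cleaned}</ED-MSG>\n"
    return cleaned
```

As in the rules themselves, the drop tests run before anything is output, so a keyword line or byline never reaches the headline stage.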



For TableLines .... (type BS or \b)

Start a table at the first TableLine (type BS) with :

r=\b ifnprv r=\b "\n<TABLE class=\x22SPT_NHL_GLANCE\x22>"


For Text .... (type TAB or \t)

Always stop on a TAB line containing ETX

r=\t ifcon \003 f1 stop!


Decision - do you want to output these with a hard end of line each, or as run-on text wrapping over one or more lines ?

If One input line = One output line -> put a `<BR>' at the end of each.

r=\t zapspcextra f1 "<BR>\n"

If you want to run-on, check whether the previous line was also a TAB to add a separator :

r=\t ifprv r=\t "; "

r=\t zapspcextra f1 

You may also want to track the last record through, to add a dot or something :

r#\t ifprv r=\t ".<br>\n"
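The two choices for text lines can be compared side by side in Python. This is a sketch of the intended output only; the function names are hypothetical, and extra spaces are squeezed in the way 'zapspcextra' is assumed to do.

```python
def hard_breaks(lines):
    """One input line = one output line, each ending in <BR>."""
    return "".join(" ".join(l.split()) + "<BR>\n" for l in lines if l.strip())


def join_run_on(lines, separator="; ", terminator=".<br>\n"):
    """Join consecutive text lines into one run-on paragraph."""
    cleaned = [" ".join(line.split()) for line in lines]
    cleaned = [c for c in cleaned if c]
    if not cleaned:
        return ""
    # Separator between lines, terminator once after the last one.
    return separator.join(cleaned) + terminator
```

With the Monday's Games lines from the sample, `join_run_on` gives a single paragraph with the results separated by "; " and closed with a dot, while `hard_breaks` keeps one result per line.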


Watch out for - The Quad Center or Quad Left chr ('=' or '<') at the end of data. This can be stripped off with two 'match's and a builtin to remove extra spaces :

in the middle section :

match: eatlt +<++

match: eateq +=++

in the output section

r=\t zapspcextra eatlt eateq f1 

(Quad right is a '>' but is almost never seen)
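As a cross-check of what the stripped data should look like, here is a Python sketch that removes quad characters from the end of a line only. Whether the 'match' rules above remove every occurrence or just the trailing one is not stated here, so this narrower behaviour is an assumption, as is the function name.

```python
def strip_trailing_quad(text: str) -> str:
    """Remove trailing '=', '<' or '>' quad characters and squeeze extra spaces."""
    stripped = text.rstrip()
    while stripped and stripped[-1] in "=<>":
        stripped = stripped[:-1].rstrip()
    return " ".join(stripped.split())
```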


Any line ....

Catch an end of table with the first NON-tableLine

r#\b ifprv r=\b \n</TABLE>\n


To make it easier to read, put all fixed strings in the output section in double quotes. It also allows you to embed spaces in a string (Spaces are normally stripped).

BUT how do you specify a double quote to be output ?

(Embarrassingly) you need to put it in octal '\042' or hex '\x22' :

"<table class=\x22MAG_FEA_BRIDGE\x22 style=\x22hand\x22>"


Extra Fip Bits



How do you get the newly formatted file over to CCI?


In the data format parameter file use the 'hdr' and 'name' keywords to make sure the new file has the same information as the raw file EXCEPT you want to change the Source (FipHdr SU) to FORM or 'FMT'. Therefore once the file is on CCI, it will have a different name but enough information to be able to track back to the raw file if there are problems.


For AP traffic, the standard 'hdr' and 'name' are :

hdr     SH:\SH\nSN:\SN\nSU:FORM.AP\nHY:\HY\nHM:\HM\nHD:\HD\nHH:\HH\nHN:\HN\nHB:\HB\nHG:\HG\n

name    #SN:\SN#DU:formcci+w4formatted


or if you do not need a copy of the Formatted data back in the w4 browser, remove the '+w4formatted'.