Not an official ACM page
[Problem F | 1995 Regional problem set | My ACM problem archive | my home page]

Problem E -- HTML Syntax Checking -- Filename HTML

This problem requires you to write a program to syntactically validate some simple text written using HTML, the HyperText Markup Language in which all documents available on the World Wide Web are written. We'll not consider the semantics, or meaning, of these documents, and will only consider simplified syntactical rules. In these documents you'll find ordinary text (with arbitrary line lengths) interspersed with markup tags. The markup tags we consider will always occur in pairs. An example will illustrate this point:
  	This is some ordinary text. <ATAG>  Here's some additional
 	text. <BOLD>  This will be boldface. </BOLD>  Still more 
There are two pair of markup tags in the example. The meaning of ATAG and BOLD is unimportant to us in this problem, but typically a markup tag requests particular treatment of the text to which it applies. The markup tags are easily identified because they always appear in angle brackets (that is, a less than symbol and a greater than symbol). The tags we'll consider will always be written as a sequence of no more than 10 upper-case alphabetic characters. The end of the document region affected by the tag is indicated by a tag with the same name preceded by a forward slash, '/' As illustrated, the tagged regions may encompass more than one line of text. Also as shown in the example, the HTML tags must nest properly, just like BEGIN...END pairs in Pascal, or '{' and '}' in C/C++.


The input will consist of a number of test cases. Each will begin with a separate line containing an integer specifying the number of lines of text in the test case, NL. NL will never exceed 32767. The end of input is marked by a value of zero (0) for NL.

Following the line specifying NL in each test case there will be NL lines of text which are to be checked for syntactic conformance. Remember that there is no maximum line length limit.


For each test case, output the test case number (they are numbered sequentially starting with 1). If the text is in conformance with the rules specified above, then output the word "OK". If the text is erroneous, then output one of the following messages for the first error only:
 	line #: bad character in tag name
	line #: too many/few characters in tag name 
	line #: expected </xxxxxxxxxx> 
	line #: no matching begin tag.  
The '#' in these messages is to be replaced by the line number of the test case when the offending tag was detected. Examples of when these messages are displayed are found in the example input and expected output immediately following.

If an error is detected, after producing the appropriate error message your program must skip any remaining lines in the erroneous test case to reach the beginning of the next test case.

Example Input

This is some ordinary text. 
<BEGIN> This is included in the BEGIN tag </BEGIN>  
        <START>    Here's some stuff 
and so is this 
     more stuff. </START> 
   This has a null tag <> 
   And an extra line after the error 
   This has some good stuff <OKAY> and some bad stuff later on. 
   <GOOD> All is still okay, but later on we'll have an error. 
   </GOOD> We're still in the pink! <THISISTOOLONG> 
   This line will be skipped.  
   As will this one. 
   This is an interesting error: <ERROR 
   This one is okay
                                                <IN> </IN>
   Mismatch <START> </STOP> 
    Missing start symbol:   <OK></OK></NOTOK>   more garbage... 

Sample solution

Expected Output

Test Case 1 

Test Case 2 
line 1: too many/few characters in tag name. 
Test Case 3 
line 3: too many/few characters in tag name. 
Test Case 4 
line 1: bad character in tag name. 
Test Case 5 
Test Case 6 
line 1: expected </START> 
Test Case 7 
line 1: no matching begin tag.

This page maintained by Ed Karrels.
Last updated September 20, 1999