Parsing incoming data

Parsing incoming data

ctrlspace
I'm using Http to fetch an HTML page as a string, and I then need to find a particular tag in that HTML. I'm attempting to parse the incoming string with PCDataXmlParser, but I get an odd error back about '<' not being valid in XML. Do you have a better suggestion for how to parse the incoming data? Or does Dispatch have any parsing functions built in?

Thanks,
Chris.
p.s. I'm very new to scala.
 

[code]
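def upcDatabase() = {
  val http = new Http
  var stream: String = ""
  // >- passes the response body to the function as a String
  http("http://www.upcdatabase.com/item/0606949324124" >- (arg => stream = arg))
  // parse the captured body with PCDataXmlParser
  PCDataXmlParser(stream)
}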
[/code]

[exception]
INF: [console logger] dispatch: GET http://www.upcdatabase.com/item/0606949324124
log4j:WARN No appenders could be found for logger (org.apache.http.impl.conn.SingleClientConnManager).
log4j:WARN Please initialize the log4j system properly.
:96:5: '<' not allowed in attrib value
     ^
:97:1: '<' not allowed in attrib value
</p>
^
:98:1: '<' not allowed in attrib value
^
:99:27: whitespace expected
                          ^
:99:27: '>' expected instead of '%'
                          ^
Exception in thread "main" java.lang.ExceptionInInitializerError
        at ca.ctrlspace.loveItHateItWeb.xml.UpcDatabaseFeed.main(UpcDatabaseFeed.scala)
Caused by: java.lang.RuntimeException: FATAL
        at scala.Predef$.error(Predef.scala:76)
        at scala.xml.parsing.MarkupParser$class.xToken(MarkupParser.scala:267)
...
[/exception]

Re: Parsing incoming data

n8han
Administrator
Dispatch does have a simple interface to the XML parser built into Scala:

http("http://www.upcdatabase.com/item/0606949324124" <> { elem =>
  // use elem here...
})

This is the type of elem: http://www.scala-lang.org/docu/files/api/scala/xml/Elem.html

That parser may be too strict for this page, though. I haven't used the one from Lift that you're talking about, but if you do use it, or anything else that takes a stream, you should do so inside the block, e.g.:

http("http://www.upcdatabase.com/item/0606949324124" >> { stream =>
  val parser = PCDataXmlParser(stream);
  // use parser here...
})

The parser might read the whole stream immediately, or it might not. Generally I would do as much as possible inside the block before the stream is closed, and then the last statement produces whatever value (tuple, if necessary) you need to use outside the block.
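A minimal sketch of that pattern, independent of Dispatch: `withStream` and `summarize` are hypothetical names made up for illustration, and an in-memory stream stands in for the HTTP response. All reading happens inside the block, and the block's last expression (a tuple here) is the value available afterwards.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.io.Source

object StreamBlockSketch {
  // Hypothetical helper: opens a stream, lends it to `block`,
  // and closes it once the block has returned.
  def withStream[T](open: => InputStream)(block: InputStream => T): T = {
    val in = open
    try block(in) finally in.close()
  }

  // Everything that touches the stream happens inside the block;
  // the last expression is what the caller gets back.
  def summarize(html: String): (Int, Boolean) =
    withStream(new ByteArrayInputStream(html.getBytes("UTF-8"))) { stream =>
      val text = Source.fromInputStream(stream, "UTF-8").mkString
      (text.length, text.contains("<title>"))
    }

  def main(args: Array[String]): Unit = {
    val (length, hasTitle) = summarize("<html><head><title>Example item</title></head></html>")
    println(s"length=$length hasTitle=$hasTitle")
  }
}
```

Returning plain values rather than the stream itself means nothing outside the block depends on the stream still being open.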

Nathan

Re: Parsing incoming data

ctrlspace
Thanks for the input, Nathan. Coming from Java to Scala, I realize I have a few things to get used to, like making use of blocks, which is an awesome concept.

Unfortunately, the HTML I'm pulling isn't well-formed, so the XML parser won't work.
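To illustrate the kind of failure involved, here is a rough sketch that uses the JDK's built-in DOM parser as a stand-in (not PCDataXmlParser or Dispatch's `<>` handler). The object name and sample markup are made up for the example. A strict XML parser accepts well-formed markup but rejects typical HTML such as an unclosed `<br>`.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import scala.util.Try

object StrictParseDemo {
  // Returns true only if the markup is well-formed XML.
  // The JDK parser may also print its own error message to stderr.
  def parses(markup: String): Boolean = Try {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    builder.parse(new ByteArrayInputStream(markup.getBytes("UTF-8")))
  }.isSuccess

  def main(args: Array[String]): Unit = {
    println(parses("<p>ok</p>"))         // well-formed
    println(parses("<p>one<br>two</p>")) // <br> is never closed
  }
}
```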

c.