<\/td> | \ <\/td> | (PHONE NUMBER[^<]*)<\/td>/g;
This one line contains all four pieces of data needed to complete the
record, so if the page fails to parse, the first thing to check is whether
this pattern still matches.
c) Do any further format changes, then print the record in the format that
the C++ code accepts.
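
The "does the pattern still match" check from step (b) can be made automatic.
The following is only a minimal sketch: the pattern, the table-cell layout and
the sample page are all invented for illustration (they are not Mapquest's
real HTML or the script's real regex), and the '|' separator in the output is
a placeholder, not the format the C++ code expects.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real extraction pattern. The four cells
# (name, address, distance, phone) are an assumed layout.
my $record_re = qr{<td>([^<]+)</td>\s*<td>([^<]+)</td>\s*<td>([\d.]+) miles away</td>\s*<td>(\(\d{3}\) \d{3}-\d{4})</td>};

# Return one array ref of four captured fields per record found in the page.
sub extract_records {
    my ($html) = @_;
    my @records;
    while ($html =~ /$record_re/g) {
        push @records, [$1, $2, $3, $4];
    }
    return @records;
}

# Made-up sample page standing in for a downloaded result page.
my $page = '<tr><td>Pizza Palace</td> <td>12 Main St</td> '
         . '<td>1.4 miles away</td> <td>(555) 123-4567</td></tr>';

my @records = extract_records($page);

# Zero matches on a page that downloaded fine is the signal to look at
# the pattern first.
warn "No records matched; the page layout has probably changed.\n"
    unless @records;

# Placeholder output format: fields joined with '|'.
print join('|', @$_), "\n" for @records;
```

Warning on zero matches makes a layout change show up right away instead of
as an empty result further down the line.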
3) If the page no longer even downloads correctly, go to the previously
mentioned websites to determine what happened. Possible problems are
(a) Mapquest no longer supports that type of query, (b) Mapquest has
changed the layout of the request, or (c) Mapquest has gotten fancy and now
requires more authentication, JavaScript, or HTTPObjects.
(a) This is a major problem because the demo system depends on this query,
so some redesign of what the system is capable of is needed. Good luck.
(b) This shouldn't be too difficult to determine. Just type recognizable
garbage into the new HTML form, as in the following link:
http://www.mapquest.com/maps/map.adp?formtype=searchadv&searchtype=search&country=US&addtohistory=&cat=AAABBBCCC&address=DDDEEEFFF&city=GGGHHHIII&state=ZZ&zipcode=55555
and try to determine what data goes where in the new query. If the
request layout changed, then most likely the display layout changed too,
so this then reduces to a (2) problem.
(c) This is a major problem that requires either additional Perl modules,
major rewriting, or perhaps even a language change. If this happens,
perhaps there is a legacy Mapquest website that will stay on the web, or
there is another site that gives directions in a layout similar to
Mapquest that can be used. I know of none, but I am also not looking,
since Mapquest works wonderfully for my purposes.
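
For case (b), reading off "what data goes where" can be done mechanically once
the garbage values are known. The sketch below is an assumption about how one
might do it, not part of the existing script: map_sentinels is a hypothetical
helper, and the box labels are made up. It takes a URL copied from the browser
and reports which query parameter carried each garbage value. It does no URL
decoding, so the garbage should be plain letters and digits.

```perl
use strict;
use warnings;

# Given a URL and a map of (garbage value => label of the form box it was
# typed into), return a map of (label => query parameter name).
sub map_sentinels {
    my ($url, %sentinel) = @_;
    my %found;
    my ($query) = $url =~ /\?(.*)\z/s;
    return %found unless defined $query;
    for my $pair (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $value && exists $sentinel{$value};
        $found{ $sentinel{$value} } = $name;
    }
    return %found;
}

my $url = 'http://www.mapquest.com/maps/map.adp?formtype=searchadv'
        . '&searchtype=search&country=US&addtohistory=&cat=AAABBBCCC'
        . '&address=DDDEEEFFF&city=GGGHHHIII&state=ZZ&zipcode=55555';

# The labels on the right are illustrative names for the form boxes.
my %where = map_sentinels(
    $url,
    'AAABBBCCC' => 'category box',
    'DDDEEEFFF' => 'address box',
    'GGGHHHIII' => 'city box',
    'ZZ'        => 'state box',
    '55555'     => 'zip box',
);

print "$_ -> $where{$_}\n" for sort keys %where;
```

Running the same helper on a URL produced by the new form shows quickly which
parameter names changed.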
Hints:
- Do not remove all HTML initially, because
a) It is quite difficult to remove all HTML using just a regular expression.
b) If the HTML is removed, context of the data is lost. Sure you can look for
"miles away" and the specific layout of a phone number, but you can do
that while keeping all the HTML present.
- It may help to split the document into lines by newlines or by tags.
a) On the other hand, you may split apart the pieces of HTML that you want
to match with a reg'ex, so be careful.
- Download as many HTML pages as you can to find a pattern. It is not fun to
find a pattern and spend time coding it, only to find out that it won't work
for all of them.
- Do realize that you may not be able to get all the entries. Some entries may
not have "miles away" written by them, so they may not be easy to capture
and may not really be worth the extra time.
- Perl is a powerful reg'ex language and the current coder does not have tons
of experience with Perl. Therefore there are probably much easier/better ways
of writing what is there. Don't follow the bad practices if there is a better
way of writing what is already there.
- Some Perl expressions confuse the emacs syntax coloring into treating the
rest of the lines as a comment or string, so sometimes things like #" or
# '"])> must be written after a reg'ex so that emacs will behave and color
things correctly.