[kwlug-disc] Help!

William Park opengeometry at yahoo.ca
Wed Dec 31 09:10:42 EST 2014


On Wed, Dec 31, 2014 at 08:42:15AM -0500, Joe Wennechuk wrote:
> Hello All,
> Slightly off topic, but I know you guys can help. I have applied for a
> job, and they have asked me to write a Java class that searches HTML
> from websites for links. I am using this regex:
>
>   Pattern pattern = Pattern.compile("<a[^>]*>(.*?)</a>",
>       Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
>
> to find them, but based on the constraints I don't think I'm doing it
> right, as I am not finding all of the links. Here are the constraints.
> Can anyone help?
>
> Implementation constraints:
>   * For simplification assume that the link is defined as
>     '<[whitespace]a[whitespace]' or '<[whitespace]A[whitespace]'.
>     ('<a ', '< a h', '<A >', '<a	attr=' are all valid links)

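Are they testing your Java knowledge?
    - If so, the constraints require you to allow for whitespace, e.g.
      between '<' and 'a'.  Your regex expects 'a' immediately after
      '<', so it misses links like '< a h'.  That may be the problem.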
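
      A minimal sketch of one way to read that constraint: '<', then
      optional whitespace, then 'a' or 'A', then at least one whitespace
      character.  The names LinkFinder and findLinkStarts are made up
      for illustration, and whether whitespace after 'a' is mandatory is
      my assumption based on the examples given ('<a ', '< a h', '<A >').

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original assignment.
final class LinkFinder {
    // Assumed reading of the constraint:
    //   '<'  optional whitespace  'a' or 'A'  at least one whitespace char.
    // \s covers spaces, tabs and newlines, so '<a\tattr=' also matches.
    private static final Pattern LINK_START =
            Pattern.compile("<\\s*a\\s", Pattern.CASE_INSENSITIVE);

    private LinkFinder() {
    }

    /** Returns the character offset of every link start found in html. */
    static List<Integer> findLinkStarts(String html) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = LINK_START.matcher(html);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }
}
```

      With this pattern, '< a href=...>' and '<A >' are found, while
      '</a>' and '<abbr>' are not.  Your original '<a[^>]*>' would also
      have matched '<abbr>'.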

Or do they just want the list of links?
    - In that case, there are better ways to get the list of links, e.g.
	lynx -dump -listonly http://...
-- 
William




