Regex in Java
Introduction
Suppose you have created a list of students with their enrollment numbers. One student enters the enrollment number as CS101 and another enters it as 101-CS. Because the formats differ, inconsistencies arise in the program, so it is important to check whether each enrollment number follows a consistent pattern. That is where regex comes into the picture.
Regex stands for regular expression. In Java, it is an API for defining patterns that are used to search and manipulate strings. In simple words, regex in Java can be used to define constraints on strings so that they follow a regular pattern. For example, it can be used for email and password validation.
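As a minimal sketch of the enrollment-number scenario, assume the agreed format is two uppercase letters followed by three digits. That format, the class name EnrollmentCheck, and the method isConsistent are assumptions made for illustration only.

```java
import java.util.regex.Pattern;

class EnrollmentCheck {
    // Assumed format: two uppercase letters followed by three digits, e.g. CS101.
    private static final Pattern ENROLLMENT = Pattern.compile("[A-Z]{2}\\d{3}");

    // Returns true only if the whole string follows the assumed format.
    static boolean isConsistent(String enrollmentNumber) {
        return ENROLLMENT.matcher(enrollmentNumber).matches();
    }

    public static void main(String[] args) {
        System.out.println(isConsistent("CS101"));  // true
        System.out.println(isConsistent("101-CS")); // false
    }
}
```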
Regex is defined in the java.util.regex package, which provides 1 interface and 3 classes, listed below:
- MatchResult Interface
- Matcher Class
- Pattern Class
- PatternSyntaxException Class
Before going to their implementation, let’s discuss each of these in detail.
MatchResult Interface
This interface contains the query methods used to inspect the result of a match operation on an input string against a regular expression. The various methods of this interface are listed below:
- start(): It returns the start index of the match found in the string.
- start(int group): It returns the start index of the subsequence captured by the given group during the previous match operation.
- end(): It returns the offset after the last character matched.
- end(int group): It returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
- group(): It returns the input subsequence matched by the previous match operation.
- group(int group): It returns the input subsequence captured by the given group during the previous match operation.
- groupCount(): It returns the number of capturing groups in the pattern.
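The methods above can be sketched as follows. The class name MatchResultDemo, the helper firstToken, and the sample input are illustrative assumptions; Matcher.toMatchResult() is used to obtain a snapshot of the current match.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchResultDemo {
    // Finds the first "letters followed by digits" token and returns the match as a snapshot.
    static MatchResult firstToken(String input) {
        // Group 1 captures the letters, group 2 captures the digits.
        Matcher matcher = Pattern.compile("([A-Z]+)(\\d+)").matcher(input);
        if (!matcher.find()) {
            throw new IllegalArgumentException("no match in: " + input);
        }
        return matcher.toMatchResult();
    }

    public static void main(String[] args) {
        MatchResult result = firstToken("ID: CS101 enrolled");
        System.out.println(result.group());      // CS101
        System.out.println(result.start());      // 4
        System.out.println(result.end());        // 9
        System.out.println(result.group(1));     // CS
        System.out.println(result.start(2));     // 6
        System.out.println(result.groupCount()); // 2
    }
}
```

Note that end() is exclusive: the match occupies indices 4 to 8, and end() returns 9.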
Matcher Class
This class implements the MatchResult interface and is used to perform match operations on a sequence of characters by interpreting a pattern. It has many methods, some of which are listed below.
- pattern()- It returns the pattern which is interpreted by this matcher.
- matches()- It returns true if the entire input string matches the pattern.
- find()- It attempts to find the next subsequence of the input that matches the pattern and returns true if one is found. Calling it repeatedly finds multiple occurrences.
- groupCount()- It returns the number of capturing groups in this matcher's pattern.
- start()- It returns the start index of the previous match.
- end()- It returns the offset after the last character of the previous match.
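A short sketch of the difference between find() and matches() is given below. The class name MatcherDemo and the helper allNumbers are hypothetical; each returned entry has the form text@start-end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Collects every run of digits by calling find() until it returns false.
    static List<String> allNumbers(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = DIGITS.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return found;
    }

    public static void main(String[] args) {
        // find() locates matches anywhere inside the input.
        System.out.println(allNumbers("CS101 and EE205"));     // [101@2-5, 205@12-15]
        // matches() succeeds only when the whole input matches.
        System.out.println(DIGITS.matcher("101").matches());   // true
        System.out.println(DIGITS.matcher("CS101").matches()); // false
    }
}
```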
Pattern Class
This class is the compiled representation of a regular expression. A regular expression, which is specified as a string, is first compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that matches arbitrary character sequences against the regular expression.
The methods of this class are as follows:
- compile(String regex)- It compiles the given regular expression into a pattern.
- compile(String regex, int flags)- It compiles the given regular expression into a pattern with the given flags, such as Pattern.CASE_INSENSITIVE.
- matcher(CharSequence input)- It creates a matcher that will match the given input against the pattern.
- matches(String regex, CharSequence input)- It compiles the given regular expression and matches the input against the pattern.
- quote(String s)- It returns a literal pattern string for the specified string, so that its characters are matched literally.
- split(CharSequence input)- It splits the given input sequence around matches of the pattern.
- toString()- It returns the string representation of the pattern.
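The following sketch exercises several of these methods. The class name PatternDemo and its helper methods are assumptions for illustration; the Pattern calls themselves are the ones listed above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternDemo {
    // compile(String, int): the CASE_INSENSITIVE flag lets "cs101" match "CS\\d+".
    static boolean matchesIgnoringCase(String input) {
        return Pattern.compile("CS\\d+", Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // split(CharSequence): breaks the input around each comma and any surrounding spaces.
    static String[] splitOnCommas(String input) {
        return Pattern.compile("\\s*,\\s*").split(input);
    }

    // quote(String): every character of the literal is matched as-is, so "." is a real dot.
    static boolean isExactly(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(matchesIgnoringCase("cs101"));                          // true
        System.out.println(Arrays.toString(splitOnCommas("CS101, EE205 ,ME310"))); // [CS101, EE205, ME310]
        System.out.println(isExactly("a.b", "a.b"));                               // true
        System.out.println(isExactly("a.b", "axb"));                               // false
        System.out.println(Pattern.compile("\\d+").toString());                    // \d+
    }
}
```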
PatternSyntaxException Class
This class extends the IllegalArgumentException class. It is an unchecked exception that is thrown to indicate a syntax error in a regular-expression pattern. The methods of this class are:
- getDescription(): It fetches the description of the error.
- getIndex(): It retrieves the index of the error.
- getMessage(): It returns the description, index, erroneous regular expression pattern, and visual indication of the error index within the pattern.
- getPattern(): It returns the regular expression pattern which generates the error.
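As a rough sketch, the exception can be caught around Pattern.compile. The class name SyntaxErrorDemo and the helper describe are hypothetical, and the exact wording of the description depends on the JDK, so no output is shown for the failing case.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Returns "ok" if the regex compiles, otherwise a short summary of the syntax error.
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return "ok";
        } catch (PatternSyntaxException e) {
            // getDescription(): what went wrong; getIndex(): where; getPattern(): the faulty regex.
            return e.getDescription() + " at index " + e.getIndex() + " in " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("\\d+"));  // ok
        System.out.println(describe("[a-z"));  // the character class is never closed
    }
}
```

Because the exception is unchecked, the compiler does not force this try/catch; it is mainly useful when the pattern comes from user input.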
Now, let us see how these classes for Regex in Java are used for the validation of certain sequences of characters against a regular expression.
Regex in Java for Email Validation
A valid email should follow the format ‘X@Y.domain’, where X is 1 to 20 characters, each of which is a lowercase or uppercase letter, an underscore (_), or a hyphen (-). X is followed by the symbol ‘@’, the domain name Y, a dot (.), and 2 to 3 characters. Thus, the regular expression will include:
- \w- It denotes a single word character: a letter, a digit, or an underscore. It is a short form for “[a-zA-Z_0-9]”.
- @ symbol
- Dot symbol (.)- It is written as \. in the pattern, because an unescaped dot matches any character.
- $- It denotes the end of the input string.
Below is the implementation of pattern matching in Java for email validation.
import java.util.regex.Pattern;
Output:
info.abhi40@gmail.com is a valid email
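A minimal sketch of such a check, which prints the output shown above, might look like this. The class name EmailValidator and the exact pattern are assumptions rather than the original listing; in particular, the pattern also allows dots and digits before the ‘@’ so that the sample address is accepted.

```java
import java.util.regex.Pattern;

class EmailValidator {
    // Assumed pattern: 1 to 20 word characters, dots, or hyphens, then '@',
    // a domain name, a literal dot, and 2 to 3 letters up to the end of the input.
    static final String EMAIL_REGEX = "^[\\w.-]{1,20}@[\\w-]+\\.[a-zA-Z]{2,3}$";

    static boolean isValid(String email) {
        // Pattern.matches compiles the regex and tests the whole input in one call.
        return Pattern.matches(EMAIL_REGEX, email);
    }

    public static void main(String[] args) {
        String email = "info.abhi40@gmail.com";
        System.out.println(email + (isValid(email) ? " is a valid email" : " is not a valid email"));
    }
}
```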
You might wonder why an extra backslash is used everywhere in the regular expression. For example, in place of ‘\w’, we have used ‘\\w’. In a Java string literal, the backslash is itself an escape character, so a literal backslash must be written as ‘\\’. The source text ‘\\w’ therefore produces the two-character regex ‘\w’. Alternatively, regex in Java can also be used in the following way:
import java.util.regex.Pattern;
Output:
info.abhi40@gmail.com is a valid email
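One way to sketch this alternative is to compile the pattern once with Pattern.compile and then create a Matcher for each input. The class name CompiledEmailValidator and the pattern are assumptions for illustration, not the original listing; the pattern allows 1 to 20 word characters, dots, or hyphens before the ‘@’.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompiledEmailValidator {
    // Assumed pattern, compiled once so it can be reused for many inputs.
    private static final Pattern EMAIL =
            Pattern.compile("^[\\w.-]{1,20}@[\\w-]+\\.[a-zA-Z]{2,3}$");

    static boolean isValid(String email) {
        Matcher matcher = EMAIL.matcher(email);
        // matches() requires the entire input to match the pattern.
        return matcher.matches();
    }

    public static void main(String[] args) {
        String email = "info.abhi40@gmail.com";
        System.out.println(email + (isValid(email) ? " is a valid email" : " is not a valid email"));
    }
}
```

Compiling the pattern once avoids recompiling the regular expression for every address that is checked.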