



String manipulation in PHP

Defining Strings
In PHP, we can define strings in multiple ways. Let’s discuss them one by one:
1. Single Quote Strings
We can declare strings using single quotes. When a string is defined with single quotes, the PHP
interpreter does not parse it and prints it exactly as written. Consider:
$many = 9;
$sentence = 'Our solar system contains $many planets';
print $sentence;
Our solar system contains $many planets
2. Double Quote Strings
PHP also allows declaring strings in double quotes. A double-quoted string is parsed when it is evaluated,
and any variables it contains are replaced by their values. Consider:
$many = 9;
$sentence = "Our solar system contains $many planets";
print $sentence;
Our solar system contains 9 planets
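The difference between the two quoting styles can be sketched side by side. The curly-brace form and the $fruit variable are illustrative additions, not part of the original example; braces are one way to mark where a variable name ends when it is directly followed by other letters.

```php
<?php
$many = 9;
$fruit = "apple"; // hypothetical variable, used only for the brace example

// Single quotes: the text is kept literally, $many is not replaced.
$single = 'Our solar system contains $many planets';

// Double quotes: $many is replaced by its value.
$double = "Our solar system contains $many planets";

// Curly braces mark the end of the variable name, so the "s" is plain text.
$plural = "I like {$fruit}s";

echo $single, "\n"; // Our solar system contains $many planets
echo $double, "\n"; // Our solar system contains 9 planets
echo $plural, "\n"; // I like apples
```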
3. Heredoc
The heredoc method of defining strings is used for complex strings that spread over multiple lines.
Heredoc supports all the features of double quotes, including variable replacement, and it lets a string
span more than one line without string concatenation. Inside a heredoc we may also use double quotes
without escaping them. Consider:
$name = "Bootsity";

$here_doc = <<<EOT
This is $name website
for PHP, Laravel and Angular Tutorials
EOT;

echo $here_doc;
This is Bootsity website
for PHP, Laravel and Angular Tutorials

4. Nowdoc
The nowdoc string definition method in PHP is like the heredoc method, but it works the way single
quotes work: no parsing takes place inside a nowdoc block, so variables are not replaced.
Nowdoc comes in handy when we are working with raw data that does not need to be parsed
(variables not expanded). Consider:
$name = "Bootsity";

$now_doc = <<<'EOT'
This is $name website
for PHP, Laravel and Angular Tutorials
EOT;

echo $now_doc;
This is $name website
for PHP, Laravel and Angular Tutorials

The only difference between heredoc and nowdoc syntax is that the nowdoc starting identifier is surrounded by
single quotes.

Concatenating strings
In PHP, string concatenation is done using the concatenation operator, which is denoted by a dot (.).
$str = "Bootsity" . " Tutorials";
echo $str; // will print - Bootsity Tutorials

Some important String functions

The most frequently used built-in PHP functions related to strings are given below:
1. strtolower
strtolower makes a string lowercase
print strtolower("PHP Tutorials"); // php tutorials
Similarly, to make a string uppercase, we can use strtoupper.
2. strlen
strlen computes the length of a string
print strlen("abcd"); // 4
3. explode
explode splits a string at a given delimiter and returns the pieces as an array
$pizza = "piece1 piece2 piece3 piece4 piece5 piece6";
$pieces = explode(" ", $pizza);
echo $pieces[0]; // piece1
echo $pieces[1]; // piece2
4. substr
substr returns part of a string
$rest = substr("abcdef", 0, -1); // returns "abcde"
$rest = substr("abcdef", 2, -1); // returns "cde"
$rest = substr("abcdef", 4, -4); // returns "" (false before PHP 8.0)
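The start argument of substr can also be negative, in which case it counts from the end of the string. A minimal sketch, with variable names chosen only for illustration:

```php
<?php
$text = "abcdef";

$fromSecond = substr($text, 1, 3);  // start at index 1, take 3 characters
$lastTwo    = substr($text, -2);    // negative start counts from the end
$middle     = substr($text, -3, 2); // start 3 from the end, take 2 characters

echo $fromSecond, "\n"; // bcd
echo $lastTwo, "\n";    // ef
echo $middle, "\n";     // de
```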
5. ucfirst
ucfirst makes a string’s first character uppercase
$foo = 'hello world!';
$foo = ucfirst($foo); // Hello world!
6. strrev
strrev reverses the given string
$str = "my name is Sonu";
echo strrev($str); // unoS si eman ym
7. implode
implode returns a string built from the elements of an array
$arr = array('Hello','World!','Beautiful','Day!');
echo implode(" ", $arr); // Hello World! Beautiful Day!
8. str_word_count()
The PHP str_word_count() function counts the number of words in a string:
echo str_word_count("Hello world!"); // outputs 2
9. strpos()
The PHP strpos() function searches for a specific text within a string.
If a match is found, the function returns the character position of the first match (positions start at 0).
If no match is found, it returns false.
echo strpos("Hello world!", "world"); // outputs 6
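Because a match at the very start of the string has position 0, the result of strpos should be compared with false using a strict operator. The sketch below shows one way the loose check can go wrong; the variable names are illustrative.

```php
<?php
$text = "Hello world!";

$pos = strpos($text, "Hello"); // 0, because the match is at the very start

// Loose check: 0 is treated as false, so the match is missed.
$looseCheck = $pos ? "found" : "not found";

// Strict check: only a real false means "no match".
$strictCheck = ($pos !== false) ? "found" : "not found";

echo $looseCheck, "\n";  // not found
echo $strictCheck, "\n"; // found
var_dump(strpos($text, "xyz")); // bool(false)
```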
10. str_replace()
The PHP str_replace() function replaces some characters with some other characters in a string.
echo str_replace("world", "Dolly", "Hello world!"); // outputs Hello Dolly!
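str_replace also accepts arrays of search and replacement values, and an optional fourth argument that receives the number of replacements made. A small sketch, assuming we want to swap two words at once:

```php
<?php
$search  = ["Hello", "world"];
$replace = ["Goodbye", "Dolly"];

// $count is filled in by str_replace with the number of replacements.
$count = 0;
$result = str_replace($search, $replace, "Hello world!", $count);

echo $result, "\n"; // Goodbye Dolly!
echo $count, "\n";  // 2
```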

Comments
Comments are notes that we write in our programs to give information about the current code. Comments
are meta-data about code and are not executed at run-time. PHP supports two types of comments:
1. Single line comments and
2. Multi line comments.

/* Multiline comment
it spreads across multiple lines */

// Single line comment

# Another syntax for single line comment


Regular Expressions
Regular expressions are one of the most important tools in any programming language,
because they let us search for patterns in strings.
Where can regular expressions be used? Almost everywhere: for form validation, browser
detection, spam filtering, checking the strength of passwords, and much more.

1. How to find matching strings? (preg_match)

preg_match is a built-in PHP function that searches a string for a match to a regular
expression. If it finds a match, it returns 1; otherwise it returns 0 (or false if an error occurs).

The syntax is straightforward:

preg_match($regex, $string, $match);
$regex is the regular expression, and it needs to be enclosed in delimiters, most commonly slashes (/). For example:
$regex = "/exp/";
$string stands for the string inside which we look for the matching pattern.
$match is the array that stores the first match that the function finds
(preg_match stops searching as soon as it finds the first match).

For example:
// The regex here is the word 'go'.
$regex = "/go/";
// We search for a match inside this string.
$string = "you gotta give the go kart a try";
// preg_match returns 1 if it finds a match and 0 if it does not.
if (preg_match($regex, $string, $match)) {
    echo "We found a match to the expression: " . $match[0];
} else {
    echo "We found no match.";
}

Result is:
We found a match to the expression: go
preg_match searches for a string that matches the regular expression ($regex) in the
string ($string), and the first match it finds is stored as the first item of
the $match array.
If we change the string to something that doesn't match the pattern, preg_match will
return 0 and the $match array will be empty.
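The return values and the contents of the match array can be inspected directly. This is a minimal sketch; the second pattern, /stop/, is a hypothetical pattern chosen only because it does not occur in the string.

```php
<?php
$string = "you gotta give the go kart a try";

$found   = preg_match("/go/", $string, $match);  // pattern occurs in the string
$missing = preg_match("/stop/", $string, $none); // pattern does not occur

var_dump($found);   // int(1)
var_dump($missing); // int(0)
print_r($match);    // holds only the first "go" (the one in "gotta")
print_r($none);     // an empty array
```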
# The i modifier
Regular expressions are case sensitive by default. This means that the regular
expression /chr*/ is different from /Chr*/, only because the first expression
starts with a lowercase 'c' while the second starts with a capital 'C'. But we can change
this behavior by adding the i modifier right after the closing delimiter of the regular
expression.
So, the regular expression:
$regex = "/Chr*/i";
will match both 'Chrome' and 'chrome'.
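The effect of the i modifier can be checked by running the same pattern with and without it. A small sketch, using a plain /chrome/ pattern and sample spellings chosen for illustration:

```php
<?php
$names = ["Chrome", "chrome", "CHROME"];

$sensitive = [];
$insensitive = [];
foreach ($names as $name) {
    $sensitive[$name]   = preg_match("/chrome/", $name);  // exact case only
    $insensitive[$name] = preg_match("/chrome/i", $name); // any case
}

print_r($sensitive);   // only "chrome" gives 1
print_r($insensitive); // all three give 1
```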

2. Global regular expression matching (preg_match_all)

We learned about preg_match that stops searching as soon as it finds the first match.
But in order to find all the matches to the regular expression (i.e., global matching), we
need to use another built-in function, preg_match_all.
In the following example, we perform a global regular expression matching for the
expression $regex in the $string, and each match is stored in the $matches array.
$regex = "/reg/";
$string = "Both regex and regexp are short for regular expression";

if (preg_match_all($regex, $string, $matches)) {
    // $matches[0] holds every match of the full pattern.
    print_r($matches[0]);
}

Result is:
[0] => reg
[1] => reg
[2] => reg
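preg_match_all also returns the number of matches it found, and the full matches are collected under index 0 of the result array. In the sketch below the pattern is extended with \w* (an illustrative change) so that each whole word starting with "reg" is returned.

```php
<?php
$string = "Both regex and regexp are short for regular expression";

// \w* extends each match to the end of the word.
$count = preg_match_all('/reg\w*/', $string, $matches);

echo $count, "\n";    // 3
print_r($matches[0]); // regex, regexp, regular
```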
# Meta what? Meta character!
Metacharacters are characters that have special meaning in regular expressions.
Let's learn our first 3 metacharacters:

Metacharacter matches

. The dot metacharacter matches any character except for a new line

^ The caret metacharacter indicates the start of the string

$ The dollar metacharacter indicates end of the string

// The metacharacter $ indicates that we look for a match at the end of the string.
$regex = "/3.2$/";
$string = "13.2";
if (preg_match($regex, $string, $match)) {
    echo "We found a match to: $match[0]";
} else {
    echo "No match!";
}
We found a match to: 3.2

# The problem of string literals

The metacharacters have a special meaning (^ stands for the start of the string, $ for
the end of the string, and so on). The way to change this behavior is by escaping the
metacharacters with a backslash (\).
For example, the following regex, matches exactly the string "3.2":
$regex = "/3\.2/";
And this example, matches exactly the string "^$":
$regex = "/\^\\$/";
$string = "^$^";
if (preg_match($regex, $string, $match))
echo "We found a match to: $match[0]";
We found a match to: ^$

The expression matches

\. Simply a dot

\$ Simply the $ character

\^ Simply the caret character

\\ Simply the backslash character

In some cases, you don't need to escape the metacharacters, because the regex engine
can tell from the context that they are meant literally (for example, a dot inside
square brackets). On the other hand, adding a backslash before a punctuation
metacharacter that you want to match literally does no harm. So, if you're not sure,
use the backslash.
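When the literal text comes from a variable, PHP's built-in preg_quote function can add the backslashes for us. A minimal sketch, assuming the text to match is "3.2$" and that slashes are used as delimiters:

```php
<?php
$literal = '3.2$';

// preg_quote escapes regex metacharacters; the second argument
// also escapes the delimiter if it appears in the text.
$regex = "/" . preg_quote($literal, "/") . "/";

echo $regex, "\n";                              // /3\.2\$/
echo preg_match($regex, 'price: 3.2$'), "\n";   // 1
echo preg_match($regex, 'price: 312$'), "\n";   // 0, the dot is literal now
```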

# Character sets
We can use character sets to specify a range of characters that we want to match. A character set is written between square brackets.

Character set matches

[ab] a or b

[abc] a, b or c

[A-Z] the Uppercase letters A-Z

[a-z] lowercase letters (a-z)

[A-Za-z] lowercase or Uppercase letters

[a-d] the range of letters from a to d

[a-dA-D] both Uppercase and lower case letters a-d

[a-dm-p] the range of letters from a to d and from m to p

[0-9] all the digits

[1-4] the range of digits from 1 to 4

[a-zA-Z0-9] all the letters and all the digits

The caret (^) symbol represents the start of the string, but when it is the first character inside the square
brackets it indicates the negation of the character set.

For example, if [a-z] means the range of letters from a to z, then [^a-z], with the caret symbol inside the
brackets, means every character that is not in the set of lowercase letters.
Let's see some more examples:

Character set matches

[^A-Z] every character that is not an Uppercase letter

[^a-z] everything that is not a lowercase letter

[^A-Za-z] every character other than English letters

[^0-9] everything that is not a digit

[^A-Za-z0-9] everything that is not a digit or a letter

[^a-d] everything that is not in the range of a-d
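Negated character sets are handy for stripping unwanted characters. The sketch below, with a made-up phone number, removes everything that is not a digit and checks whether a string contains any non-letter character.

```php
<?php
$phone = "(555) 123-4567"; // hypothetical sample input

// Replace every character that is not a digit with an empty string.
$digitsOnly = preg_replace('/[^0-9]/', '', $phone);

// 1 if the string contains at least one non-letter character, 0 otherwise.
$mixed   = preg_match('/[^A-Za-z]/', "PHP8");
$letters = preg_match('/[^A-Za-z]/', "PHP");

echo $digitsOnly, "\n"; // 5551234567
echo $mixed, "\n";      // 1
echo $letters, "\n";    // 0
```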

# Shorthand character sets

We have shortcuts for some of the most commonly used character sets.

shorthand matches

\s White space characters like space, tab and new line

\d Matches any digit (0-9)

\w Matches word characters, including the English letters (a-zA-Z), digits (0-9), and underscore (_)

For example, the regular expression:

$regex = "/\s\w\s\d\d\d\d$/";
searches for a pattern that matches a white space, a word character, another white space, and then 4 digits at the end of the string.
Let's test the expression with the following code:

$regex = "/\s\w\s\d\d\d\d$/";
$string = "bat a 1000";

if (preg_match($regex, $string, $match)) {
    echo "We found a match: '$match[0]'";
} else {
    echo "No match!";
}

Result is:
We found a match: ' a 1000'

We found the pattern of a space, followed by a word character, followed by another space and 4 digits at the end of the string.

# Quantifiers in regular expressions

We use quantifiers in order to specify the number of times that a group of characters or a character can be
repeated in a regular expression.

For example, in order to find a match to the string "Mississippi", we can use the following expression:
$regex = "/Mis{2}is{2}ip{2}i/";

The {2} in the expression means exactly 2 times.

The following table gives examples to the use of quantifiers:

quantifier Searches for

n{x} the letter 'n' exactly x times

n{2} the letter 'n' exactly 2 times

n{x,y} 'n' between x and y times

n{2,3} 'n' between 2 and 3 times

n{x,} 'n' at least x times

n{2,} 'n' at least 2 times

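n{0,y} 'n' not more than y times (write the lower bound explicitly: older PCRE versions, as bundled with PHP before 8.4, treat {,y} as literal text)

n{0,3} 'n' not more than 3 times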

For example, we can find a match to both "color" and "colour" by using the following regular expression:
$regex = "/colou{0,1}r/";

Here, the use of the quantifier makes the 'u' optional.

Other regex quantifiers are less specific:

quantifier Searches for

* zero times or more

+ at least 1 time

? zero or 1 time

For example, the following expression can match both "color" and "colour".
$regex = "/colou?r/";
Since the "?" metacharacter makes the one character that it follows optional, the regular expression finds a
match with or without the "u".
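The quantifiers can be tried out on a few sample words. In this sketch the patterns are anchored with ^ and $ (an addition for the test) so that the whole word has to match, which makes the limits of each quantifier visible.

```php
<?php
// The ? quantifier: 'u' may appear zero or one time.
$optional = [];
foreach (["color", "colour", "colouur"] as $word) {
    $optional[$word] = preg_match('/^colou?r$/', $word);
}

// The {2,3} quantifier: 'n' must appear 2 or 3 times.
$range = [];
foreach (["n", "nn", "nnn", "nnnn"] as $word) {
    $range[$word] = preg_match('/^n{2,3}$/', $word);
}

print_r($optional); // color => 1, colour => 1, colouur => 0
print_r($range);    // n => 0, nn => 1, nnn => 1, nnnn => 0
```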

# Lazy and greedy expressions

Quantifiers are greedy. They are greedy because they try to match the longest string possible. This may have
unforeseen outcomes, since we might get a much longer match than we anticipated.

For example, we would like to replace the colors of both cats (which are found inside span elements) with the
string 'M@#!' in the following sentence.

$string = "Said the <span>striped</span> cat to the <span>orange</span> cat.";

// ordinary greedy expression.

$regex = "/<span>.+<\/span>/";

echo preg_replace($regex, 'M@#!', $string);

Said the M@#! cat.

Not quite what we expected.

Instead of replacing each span separately, the expression started replacing from the first opening span tag
and ended with the last closing span tag. This behavior is caused by the greedy nature of the regular
expression.

To get what we want, we need to make the expression lazy. We can make the expression lazy by adding the
'?' symbol right after the quantifier.
Let's add the '?' symbol after the '+' quantifier to make it lazy.

$string = "Said the <span>striped</span> cat to the <span>orange</span> cat.";

// Lazy expression with the '?' symbol.

$regex = "/<span>.+?<\/span>/";

echo preg_replace($regex, 'M@#!', $string);

Result is:
Said the M@#! cat to the M@#! cat.
Every span is separately replaced.
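Another common way to avoid an overly long match is to replace the dot with a negated character set, so the match cannot run past the next tag. This is an alternative sketch, not the method used above, and it assumes the span contents never contain a '<' character.

```php
<?php
$string = "Said the <span>striped</span> cat to the <span>orange</span> cat.";

// [^<]+ matches one or more characters that are not '<',
// so each match has to stop before the closing tag.
$regex = '/<span>[^<]+<\/span>/';

$result = preg_replace($regex, 'M@#!', $string);
echo $result; // Said the M@#! cat to the M@#! cat.
```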

# How to find a match to set of expressions?

In order to choose between several alternatives we need to put the pipe (|) symbol between the different
alternatives. For example,

To find a match to one of the strings 'png' or 'jpeg' we use the expression '/png|jpeg/'.
We can have more than two alternatives, '/png|jpeg|gif|bmp/'.

If we want to match both 'jpg' and 'jpeg', we can add the expression 'jpeg' to the set: '/png|jpg|gif|bmp|jpeg/'.

We can also make the 'e' in 'jpeg' optional, by adding the '?' symbol right after it: '/png|jpe?g|gif|bmp/'.

The following expression matches images filenames:

$regex = "/^([A-Za-z0-9-_.])+\.(png|jpe?g|gif|bmp)$/";

Pay attention to the parentheses around the set of file extensions. The parentheses separate the extensions
from the rest of the expression, so the first alternative in the set is 'png' and not '([A-Za-z0-9-_.])+\.png'.
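The filename expression can be tried against a few sample names. The filenames below are made up for illustration; note that the pattern is case sensitive and does not allow spaces.

```php
<?php
$regex = '/^([A-Za-z0-9-_.])+\.(png|jpe?g|gif|bmp)$/';
$files = ["photo_01.jpg", "logo.PNG", "scan.jpeg", "notes.txt", "my photo.png"];

$results = [];
foreach ($files as $file) {
    // preg_match returns 1 for a match and 0 otherwise.
    $results[$file] = preg_match($regex, $file);
}

print_r($results);
// photo_01.jpg => 1, logo.PNG => 0, scan.jpeg => 1,
// notes.txt => 0, my photo.png => 0
```

Adding the i modifier after the closing delimiter would let 'logo.PNG' match as well.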

# How to match alternatives when the order does matter?

In the previous section, we saw how to write a set of options when the order does not matter, but how can we
search for matches when the order does matter?

Let's take, for example, the following string:

'The sum of 2 and 3 is 5'
which is equivalent to:
'The sum of 3 and 2 is 5'.

The first regex that comes to mind may be the following:

$regex = "/^The sum of (2|3) and (3|2) is 5$/";
The problem is that the regex can also match the following strings:
"The sum of 2 and 2 is 5"
"The sum of 3 and 3 is 5"

Which are obviously wrong.

To find a perfect match, we need to modify the regex a little bit, so it can only match the right options.
$regex = "/^The sum of (2( and 3)|3( and 2)) is 5$/";
Let's test the regex:

$regex = "/^The sum of (2( and 3)|3( and 2)) is 5$/";

$string = "The sum of 2 and 3 is 5";

if (preg_match($regex, $string, $match)) {
    echo "We found a match to the expression: " . $match[0];
} else {
    echo "We found no match.";
}

Result is:

We found a match to the expression: The sum of 2 and 3 is 5

# Capturing groups and backreferences

When we use parentheses we capture the expressions, and so we can later
backreference these expressions with the '$' metacharacter. The first group that we
captured will be referenced by '$1', the second group by '$2', the third group by '$3',
and so on.
In the following example, we take the dates in the European date format and re-format
them into the American date format.

$string = "16-04-2016";
$regex = "/([0-9]{1,2})-([0-9]{1,2})-([0-9]{4})/";
// The first group references the day
// The second group references the month
// The third group references the year
// Swap the first group with the second
$replace = "$2-$1-$3";

echo preg_replace($regex, $replace, $string);

Result is:
04-16-2016

If we want to avoid capturing one of the groups, we add the non-capturing
marker (?:) at the beginning of the group, so it is still matched but is not stored as a numbered group.
For example, in order to avoid capturing the year, we can add the non-capturing
marker at the beginning of the group that matches the year:

$string = "16-04-2016";
$regex = "/([0-9]{1,2})-([0-9]{1,2})-(?:[0-9]{4})/";
$replace = "$2-$1-$3";

echo preg_replace($regex ,$replace ,$string );

Accordingly, the result does not contain the third group, because '$3' no longer refers to a captured group and is replaced with an empty string:
04-16-

# How to improve the performance of capturing groups?

It is advisable to use capturing groups only when they are really needed, because
storing the captured text adds some overhead to the regex.
Of course, there are cases in which we have no choice but to use parentheses for
grouping. So, a neat way to solve the problem is to avoid capturing the groups, by
using the non-capturing group, '(?:)'.

For example, we can improve the performance of the regex that searches for images by
adding the non-capturing group:
$regex = "/^(?:[A-Za-z0-9-_.])+\.(?:png|jpe?g|gif|bmp)$/";
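The difference between the two kinds of groups shows up in the match array. In this sketch the filename 'logo.png' is a made-up input, and the name part is written as a single group with the + inside it so that the whole name is captured.

```php
<?php
$file = "logo.png";

// Capturing groups: the name and the extension are stored separately.
preg_match('/^([A-Za-z0-9_.-]+)\.(png|jpe?g|gif|bmp)$/', $file, $capturing);

// Non-capturing groups: only the full match is stored.
preg_match('/^(?:[A-Za-z0-9_.-]+)\.(?:png|jpe?g|gif|bmp)$/', $file, $nonCapturing);

print_r($capturing);    // logo.png, logo, png
print_r($nonCapturing); // logo.png
```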
# The search for strings that don't match

Sometimes we may be interested only in those strings that do not match the regular
expression, and in these cases we precede the PHP function with the "not operator"
(!) in order to reverse the result of the boolean expression.

In the following example, we want to detect strings that do not contain the
string "fowl":
$regex = "/fowl/";
$string = "Birds of a feather flock together.";
// In order to search for strings that don't match
// we precede the PHP function with the not operator, "!"
if (!preg_match($regex, $string, $match)) {
    echo "No match";
} else {
    echo "There is a match";
}

Result is:
No match

# Search and replace with preg_replace

In order to replace strings, we use the preg_replace() function, with the following
syntax:
preg_replace($regex, $replace, $string);
• $regex - the expression that we search for.
• $replace - what we want the match to be replaced with.
• $string - the string in which we look for the expression.

In the following example, we replace all the wrong forms of the word 'misspelled' with
the correct form.

$regex = "/miss?pp?ell?e?d/";
$replace = "misspelled";
$string = "He mispeled the word in all of his emails.";

echo preg_replace($regex, $replace, $string);

Result is:
He misspelled the word in all of his emails.

# How to split strings by regular expressions? (preg_split)

preg_split is the built-in PHP function that we use when we want to split a string by
regular expression. It has the following syntax:
preg_split($regex, $string);
• $regex - the expression that we search for, and want to split at.
• $string - the string in which we search for the expression.

In the following example we want to split at the comma followed by one or more white spaces:
$regex = "/,\s+/";
$string = "html, css, javascript, php";

$languages = preg_split($regex, $string);

print_r($languages);

Result is:
[0] => html
[1] => css
[2] => javascript
[3] => php
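preg_split also accepts a limit and a flags argument. The sketch below assumes a messier input with mixed separators and a trailing comma, and uses the PREG_SPLIT_NO_EMPTY flag to drop empty pieces from the result.

```php
<?php
$string = "html, css;javascript ,  php,"; // hypothetical untidy input

// Split at a comma or semicolon with optional white space around it.
// -1 means "no limit"; PREG_SPLIT_NO_EMPTY removes empty pieces.
$languages = preg_split('/\s*[,;]\s*/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($languages); // html, css, javascript, php
```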

# How to search for matches inside arrays? (preg_grep)

preg_grep searches for matches inside arrays, and returns an array that
consists only of the matching items (their original keys are preserved).
The syntax:
preg_grep($pattern, $array);
$pattern stands for the regular expression that we search for.
$array stands for the array in which we search for matching items.
In the following example, we search inside the $models array for items that start with 't'
(lower or upper case):
$models = array("Bentley", "Tesla", "Maserati", "toyota", "Subaru");

$output = preg_grep('/^t[a-z]+/i', $models);

print_r( $output );

Result is:
[1] => Tesla
[3] => toyota
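Because preg_grep keeps the original keys, the result above starts at index 1. The sketch below shows two related options: array_values to renumber the result, and the PREG_GREP_INVERT flag to return the items that do not match.

```php
<?php
$models = ["Bentley", "Tesla", "Maserati", "toyota", "Subaru"];

// Matching items keep their original keys: 1 and 3.
$startsWithT = preg_grep('/^t[a-z]+/i', $models);

// array_values renumbers the keys from 0.
$reindexed = array_values($startsWithT);

// PREG_GREP_INVERT returns the items that do NOT match.
$others = preg_grep('/^t[a-z]+/i', $models, PREG_GREP_INVERT);

print_r($reindexed); // Tesla, toyota
print_r($others);    // Bentley, Maserati, Subaru (keys 0, 2, 4)
```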

# Form validation example

<style>
.error {color: #FF0000;}
</style>

<?php
// define variables and set to empty values
$nameErr = $emailErr = $genderErr = $websiteErr = "";
$name = $email = $gender = $comment = $website = "";

// trim white space, remove backslashes, and escape HTML special characters
function test_input($data) {
    $data = trim($data);
    $data = stripslashes($data);
    $data = htmlspecialchars($data);
    return $data;
}

// validate the fields only after the form has been submitted
if ($_SERVER["REQUEST_METHOD"] == "POST") {
    if (empty($_POST["name"])) {
        $nameErr = "Name is required";
    } else {
        $name = test_input($_POST["name"]);
    }

    if (empty($_POST["email"])) {
        $emailErr = "Email is required";
    } else {
        $email = test_input($_POST["email"]);
    }

    if (empty($_POST["website"])) {
        $website = "";
    } else {
        $website = test_input($_POST["website"]);
    }

    if (empty($_POST["comment"])) {
        $comment = "";
    } else {
        $comment = test_input($_POST["comment"]);
    }

    if (empty($_POST["gender"])) {
        $genderErr = "Gender is required";
    } else {
        $gender = test_input($_POST["gender"]);
    }
}
?>

<h2>PHP Form Validation Example</h2>

<p><span class="error">* required field.</span></p>
<form method="post" action="w3_example.php">
Name: <input type="text" name="name">
<span class="error">* <?php echo $nameErr;?></span>
<br><br>
E-mail: <input type="text" name="email">
<span class="error">* <?php echo $emailErr;?></span>
<br><br>
Website: <input type="text" name="website">
<span class="error"><?php echo $websiteErr;?></span>
<br><br>
Comment: <textarea name="comment" rows="5" cols="40"></textarea>
<br><br>
Gender:
<input type="radio" name="gender" value="female">Female
<input type="radio" name="gender" value="male">Male
<span class="error">* <?php echo $genderErr;?></span>
<br><br>
<input type="submit" name="submit" value="Submit">
</form>

<?php
echo "<h2>Your Input:</h2>";

echo $name;
echo "<br>";
echo $email;
echo "<br>";
echo $website;
echo "<br>";
echo $comment;
echo "<br>";
echo $gender;
?>