Tuesday, January 31, 2017

Elixir - Strings

Strings in Elixir are delimited by double quotes and encoded in UTF-8. In C and C++, a default char is a single byte, so it can hold only 256 different values (and ASCII itself defines just 128 characters). UTF-8, by contrast, can encode every Unicode code point, and Unicode has 1,114,112 of them (U+0000 to U+10FFFF), so far more characters are available. Since Elixir strings use UTF-8, we can also use symbols like ö, ł, etc.
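Because UTF-8 uses a variable number of bytes per character, the number of characters in a string and the number of bytes it occupies can differ. The short sketch below (the variable names are only illustrative) counts both for a word containing ł, using String.length/1 and Kernel's byte_size/1:

```elixir
word = "Hełło"
chars = String.length(word)   # 5: each letter counts as one character
bytes = byte_size(word)       # 7: each "ł" needs 2 bytes in UTF-8
cp = ?ł                       # 322, the Unicode code point U+0142

IO.puts("#{word}: #{chars} characters, #{bytes} bytes")
IO.puts(cp)
```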

Create a String

To create a string, bind it to a variable:
str = "Hello world"
To print this to your console, simply call the IO.puts function and pass it the variable str:
str = "Hello world"
IO.puts(str)
Running the above program produces the following result:
Hello world

Empty Strings

You can create an empty string using the string literal, "". For example,
a = ""
if String.length(a) === 0 do
 IO.puts("a is an empty string")
end
Running the above program produces the following result:
a is an empty string
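Comparing the length with 0 is only one way to detect an empty string. The sketch below shows a few equivalent checks; the names by_compare, by_length, by_bytes and describe are illustrative, not part of any library:

```elixir
a = ""

# Three equivalent checks for an empty string
by_compare = a == ""
by_length = String.length(a) == 0
# byte_size/1 is constant time, while String.length/1 has to walk the string
by_bytes = byte_size(a) == 0

# Pattern matching on "" also works, e.g. in an anonymous function clause
describe = fn
  "" -> "empty"
  _other -> "not empty"
end

IO.inspect({by_compare, by_length, by_bytes})  # {true, true, true}
IO.puts(describe.("Elixir"))                   # not empty
```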

String Interpolation

String interpolation is a way to construct a new string from a mix of constants, variables, literals, and expressions by including their values inside a string literal. Elixir supports string interpolation: to insert a variable (or any other expression) into a string, wrap it in curly braces and put a '#' sign in front of the opening brace, as in #{x}. For example:
x = "Apocalypse" 
y = "X-men #{x}"
IO.puts(y)
This takes the value of x and substitutes it into y. Running the above program produces the following result:
X-men Apocalypse
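Interpolation is not limited to variables: any expression can go between the braces, as long as its value can be converted to a string. The sketch below (variable names are illustrative) interpolates a function call and an arithmetic expression, and uses inspect/1 for a map, which has no plain string conversion and would otherwise raise a Protocol.UndefinedError:

```elixir
x = "Apocalypse"

y = "X-men #{String.upcase(x)}"   # a function call inside #{}
z = "1 + 2 = #{1 + 2}"            # integers are converted automatically
w = "map: #{inspect(%{a: 1})}"    # inspect/1 turns any term into a string

IO.puts(y)   # X-men APOCALYPSE
IO.puts(z)   # 1 + 2 = 3
IO.puts(w)   # map: %{a: 1}
```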

String Concatenation

We have already seen string concatenation in previous chapters. The '<>' operator concatenates strings in Elixir. For example, to concatenate two strings separated by a space:
x = "Dark"
y = "Knight"
z = x <> " " <> y
IO.puts(z)
Running the above program produces the following result:
Dark Knight
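The same string can be built in other ways too. This sketch compares <> with interpolation and with Enum.join/2, which is convenient when the pieces are already in a list (the variables a, b and c are only for illustration):

```elixir
x = "Dark"
y = "Knight"

a = x <> " " <> y            # <> joins two strings
b = "#{x} #{y}"              # interpolation gives the same result
c = Enum.join([x, y], " ")   # joins every element of a list with a separator

IO.puts(a == b and b == c)   # true
```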

String Length

To get the length of a string, use the String.length function. Pass it a string and it returns the number of characters (graphemes) in it, which is not necessarily the number of bytes. For example,
IO.puts(String.length("Hello"))
Running the above program produces the following result:
5
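A "character" as counted by String.length can be made of several Unicode code points. In the sketch below, the letter e followed by U+0301 (combining acute accent) displays as a single é, so it counts as one grapheme even though it contains two code points and three bytes:

```elixir
# "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one character
s = "e\u0301"

graphemes = String.length(s)                # 1 grapheme
codepoints = length(String.codepoints(s))   # 2 code points
bytes = byte_size(s)                        # 3 bytes: 1 for "e", 2 for U+0301

IO.inspect({graphemes, codepoints, bytes})  # {1, 2, 3}
```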

Reversing a String

To reverse a string, pass it to the String.reverse function. For example,
IO.puts(String.reverse("Elixir"))
Running the above program produces the following result:
rixilE
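String.reverse works on graphemes, so multi-byte characters such as ł stay intact. As a small illustration, the sketch below also uses it in a case-insensitive palindrome check; check_palindrome is a hypothetical helper written for this example, not a built-in function:

```elixir
# Multi-byte characters are reversed as whole characters
reversed = String.reverse("Hełło")   # "ołłeH"

# Hypothetical helper: a case-insensitive palindrome check
check_palindrome = fn s ->
  normalized = String.downcase(s)
  normalized == String.reverse(normalized)
end

IO.puts(reversed)
IO.puts(check_palindrome.("Racecar"))  # true
IO.puts(check_palindrome.("Elixir"))   # false
```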

String Comparison

To compare two strings, we can use the == or the === operator (for strings, the two behave the same). For example,
var_1 = "Hello world"
var_2 = "Hello Elixir"
if var_1 === var_2 do
 IO.puts("#{var_1} and #{var_2} are the same")
else
 IO.puts("#{var_1} and #{var_2} are not the same")
end
Running the above program produces the following result:
Hello world and Hello Elixir are not the same
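String equality is case-sensitive. The sketch below shows a case-insensitive comparison built with String.downcase, ordering with <, which compares the underlying bytes, and the one place where == and === actually differ, namely integers versus floats:

```elixir
a = "Hello Elixir"
b = "hello elixir"

exact = a == b                                          # false: case matters
ignore_case = String.downcase(a) == String.downcase(b)  # true

# < and > compare strings byte by byte, so in ASCII every
# uppercase letter sorts before every lowercase letter
ordered = "Zebra" < "apple"                             # true

# == and === only differ for numbers: === also tells integers from floats
loose = 1 == 1.0    # true
strict = 1 === 1.0  # false

IO.inspect({exact, ignore_case, ordered, loose, strict})
```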

String Matching

We have already seen the use of the =~ string match operator. To check if a string matches a regex, we can either use the string match operator or the String.match? function. For example,
IO.puts(String.match?("foo", ~r/foo/))
IO.puts(String.match?("bar", ~r/foo/))
Running the above program produces the following result:
true
false
The same check can also be done with the =~ operator. For example,
IO.puts("foo" =~ ~r/foo/)
Running the above program produces the following result:
true
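A few related variations are sketched below: the i modifier for a case-insensitive regex, =~ with a plain string on the right (a substring check), and Regex.run/2, which returns the matched text followed by any captured groups. The sample input is made up for illustration:

```elixir
# The i modifier makes the regex case-insensitive
ci = "FOO" =~ ~r/foo/i        # true

# With a plain string on the right, =~ checks for a substring
sub = "foobar" =~ "bar"       # true

# Regex.run/2 returns the whole match followed by each captured group
found = Regex.run(~r/(\d+)-(\d+)/, "pages 12-34")

IO.inspect({ci, sub})  # {true, true}
IO.inspect(found)      # ["12-34", "12", "34"]
```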

String Functions

Elixir provides a large number of string functions in its String module; some of the most commonly used ones are listed here. For more information, see the String module in the official Elixir documentation.
1. at(string, position): Returns the grapheme at the given zero-based position of a UTF-8 string. Negative positions count back from the end. Returns nil if position is greater than or equal to the string length.
2. capitalize(string): Converts the first character of the given string to uppercase and the rest to lowercase.
3. contains?(string, contents): Checks whether string contains any of the given contents.
4. downcase(string): Converts all characters in the given string to lowercase.
5. ends_with?(string, suffixes): Returns true if string ends with any of the given suffixes.
6. first(string): Returns the first grapheme of a UTF-8 string, or nil if the string is empty.
7. last(string): Returns the last grapheme of a UTF-8 string, or nil if the string is empty.
8. replace(subject, pattern, replacement, options \\ []): Returns a new string created by replacing occurrences of pattern (a string or a regex) in subject with replacement. All occurrences are replaced unless the global: false option is given.
9. slice(string, start, len): Returns the substring that starts at offset start and is len graphemes long.
10. split(string): Divides a string into substrings at each Unicode whitespace occurrence, ignoring leading and trailing whitespace. Groups of whitespace are treated as a single occurrence. Divisions do not occur on non-breaking whitespace.
11. upcase(string): Converts all characters in the given string to uppercase.
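Here is a quick sketch that calls each function from the list above once, with the result noted next to each call (the sample strings are arbitrary):

```elixir
s = "elixir"

results = [
  String.at(s, 0),                                   # "e" (positions start at 0)
  String.at(s, -1),                                  # "r" (negative counts from the end)
  String.at(s, 10),                                  # nil (out of range)
  String.capitalize("eLIXIR"),                       # "Elixir"
  String.contains?(s, ["z", "x"]),                   # true
  String.downcase("ELIXIR"),                         # "elixir"
  String.ends_with?(s, ["ir", "er"]),                # true
  String.first(s),                                   # "e"
  String.last(""),                                   # nil
  String.replace("a-b-c", "-", "+"),                 # "a+b+c" (global by default)
  String.replace("a-b-c", "-", "+", global: false),  # "a+b-c"
  String.slice(s, 1, 3),                             # "lix"
  String.split("  one  two three "),                 # ["one", "two", "three"]
  String.upcase("łódź")                              # "ŁÓDŹ"
]

IO.inspect(results)
```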

Binaries

A binary is just a sequence of bytes. Binaries are defined using << >>. For example:
<< 0, 1, 2, 3 >>
Of course, those bytes can be arranged in any way, even in a sequence that does not form a valid string. For example,
<< 239, 191, 19 >>
Strings are also binaries. In fact, the string concatenation operator <> is really a binary concatenation operator. Since these bytes are not printable text, we use IO.inspect rather than IO.puts to display them:
IO.inspect(<< 0, 1 >> <> << 2, 3 >>)
Running the above program produces the following result:
<<0, 1, 2, 3>>
Because strings are UTF-8 encoded, a single character can take up more than one byte: the ł character, for example, takes up 2 bytes.
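The sketch below makes the link between strings and bytes concrete. It uses :binary.bin_to_list/1 from Erlang's standard library to list the bytes of ł, and String.valid?/1 to show that 239, 191, 19 is not valid UTF-8: 239 starts a three-byte sequence, so the next two bytes must be continuation bytes (128 to 191), and 19 is not one.

```elixir
# A string literal is just shorthand for its UTF-8 bytes
same = "abc" == <<97, 98, 99>>           # true

# ł (U+0142) is stored as two bytes
l_bytes = :binary.bin_to_list("ł")       # [197, 130]
n_bytes = byte_size("ł")                 # 2

# Not every binary is a valid UTF-8 string
valid = String.valid?(<<239, 191, 19>>)  # false

IO.inspect({same, l_bytes, n_bytes, valid})
```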
Since each number in a binary is meant to be a byte, a value greater than 255 is truncated. To prevent this, we use the size modifier to specify how many bits the number should take. For example:
IO.inspect(<< 256 >>) # truncated to 8 bits, prints <<0>>
IO.inspect(<< 256 :: size(16) >>) # takes 16 bits (2 bytes), prints <<1, 0>>
Running the above program produces the following result:
<<0>>
<<1, 0>>
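The size modifier works in patterns as well as in constructors. In the sketch below, the same two bytes are read back either as two 8-bit numbers or as one 16-bit number. Multi-byte segments are big-endian by default; the little modifier reverses the byte order:

```elixir
bin = <<256::size(16)>>              # <<1, 0>>

<<hi, lo>> = bin                     # two 8-bit segments: hi = 1, lo = 0
<<whole::size(16)>> = bin            # one 16-bit segment: whole = 256

little = <<256::size(16)-little>>    # <<0, 1>>

IO.inspect({hi, lo, whole, little})
```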
We can also use the utf8 modifier, which treats the number as a Unicode code point and stores its UTF-8 bytes, so the result is a printable string:
IO.puts(<< 256 :: utf8 >>)
Running the above program produces the following result:
Ā
Elixir provides the is_binary function to check whether a value is a binary. Note that only values whose size in bits is a multiple of 8 are binaries.
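The utf8 modifier also works in the other direction: in a pattern it decodes the first code point of a string. The sketch below encodes and decodes Ā (code point 256) and then applies is_binary to a string and to an integer; the variable names are illustrative:

```elixir
cp = ?Ā                                  # 256
encoded = <<cp::utf8>>                   # "Ā", stored as <<196, 128>>

# Decode the first code point and keep the rest as a binary
<<first::utf8, rest::binary>> = "Ābc"    # first = 256, rest = "bc"

checks = {is_binary("Ā"), is_binary(256)}  # {true, false}

IO.inspect({encoded, first, rest, checks})
```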

Bitstrings

If we define a binary using the size modifier and pass it a value that is not a multiple of 8, we end up with a bitstring instead of a binary. For example,
bs = << 1 :: size(1) >>
IO.inspect(bs)
IO.puts(is_binary(bs))
IO.puts(is_bitstring(bs))
Running the above program produces the following result:
<<1::size(1)>>
false
true
This means that the variable bs is not a binary but a bitstring. Put another way, a binary is a bitstring whose number of bits is divisible by 8. Pattern matching works the same way on binaries and bitstrings.
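As a sketch of that pattern matching, the example below splits a single byte into a 1-bit flag and a 7-bit number, measures a bitstring with bit_size/1, and matches a fixed prefix of a string while capturing the rest with the binary modifier:

```elixir
# 200 is 0b11001000: the top bit is 1, the remaining 7 bits are 72
<<flag::size(1), value::size(7)>> = <<200>>

bits = bit_size(<<1::size(1)>>)               # 1

# Match a known prefix and capture the remainder of the string
<<"Hello ", name::binary>> = "Hello Elixir"   # name = "Elixir"

IO.inspect({flag, value, bits, name})         # {1, 72, 1, "Elixir"}
```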
