Note that use of the starting value is necessary, because we want to be able to reduce lists of lengths 0 and 1 as well. The default starting value is zero."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## String handling\n",
"We have already seen how to index, slice, concatenate, and repeat strings. Let's now look into what methods the `str` class offers. In Python strings are immutable. This means that for instance the following assignment is not legal:"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [],
"source": [
"s=\"text\"\n",
"# s[0] = \"a\" # This is not legal in Python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because of the immutability of the strings, the string methods work by returning a value; they don't have any side-effects. In the rest of this section we briefly describe several of these methods. The methods are here divided into five groups."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Classification of strings\n",
"All the following methods will take no parameters and return a truth value. An empty string will always result in `False`.\n",
"\n",
"* `s.isalpha()` True if all characters are letters or digits\n",
"* `s.isalpha()` True if all characters are letters\n",
"* `s.isdigit()` True if all characters are digits\n",
"* `s.islower()` True if contains letters, and all are lowercase\n",
"* `s.isupper()` True if contains letters, and all are uppercase\n",
"* `s.isspace()` True if all characters are whitespace\n",
"* `s.istitle()` True if uppercase in the beginning of word, elsewhere lowercase"
]
},
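{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the classification methods listed above, the following snippet tabulates a few of them for some sample strings. The sample strings and the name `checks` are made up for illustration:\n",
"\n",
"```python\n",
"samples = ['abc', 'abc123', '123', 'Hello World', '   ', '']\n",
"# Map each sample to the results of five classification methods\n",
"checks = {s: (s.isalpha(), s.isalnum(), s.isdigit(), s.isspace(), s.istitle())\n",
"          for s in samples}\n",
"for s, result in checks.items():\n",
"    print(repr(s), result)\n",
"```\n",
"\n",
"Note how the empty string gives `False` for every method."
]
},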
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### String transformations\n",
"The following methods do conversions between lower and uppercase characters in the string. All these methods return a new string.\n",
"\n",
"* `s.lower()` Change all letters to lowercase\n",
"* `s.upper()` Change all letters to uppercase\n",
"* `s.capitalize()` Change all letters to capitalcase\n",
"* `s.title()` Change to titlecase\n",
"* `s.swapcase()` Change all uppercase letters to lowercase, and vice versa\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
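{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short sketch of the transformation methods above, applied to a made-up sample string. The last line shows that the original string is left untouched, since every method returns a new string:\n",
"\n",
"```python\n",
"s = 'hello WORLD of python'\n",
"print(s.lower())       # hello world of python\n",
"print(s.upper())       # HELLO WORLD OF PYTHON\n",
"print(s.capitalize())  # Hello world of python\n",
"print(s.title())       # Hello World Of Python\n",
"print(s.swapcase())    # HELLO world OF PYTHON\n",
"print(s)               # hello WORLD of python\n",
"```"
]
},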
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Searching for substrings\n",
"All the following methods get the wanted substring as the\n",
"parameter, except the replace method, which also gets the\n",
"replacing string as a parameter\n",
"\n",
"* `s.count(substr)` Counts the number of occurences of a substring\n",
"* `s.find(substr)` Finds index of the first occurence of a substring, or -1\n",
"* `s.rfind(substr)` Finds index of the last occurence of a substring, or -1\n",
"* `s.index(substr)` Like find, except ValueError is raised if not found\n",
"* `s.rindex(substr)` Like rfind, except ValueError is raised if not found\n",
"* `s.startswith(substr)` Returns True if string starts with a given substring\n",
"* `s.endswith(substr)` Returns True if string ends with a given substring\n",
"* `s.replace(substr, replacement)` Returns a string where occurences of one string\n",
"are replaced by another\n",
"\n",
"Keep also in mind that the expression `\"issi\" in \"mississippi\"` returns a truth value of whether the first string occurs in the second string.\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
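{
"cell_type": "markdown",
"metadata": {},
"source": [
"The search methods can be illustrated with a small sketch like the following. The sample string is chosen only for illustration:\n",
"\n",
"```python\n",
"s = 'mississippi'\n",
"print(s.count('ss'))          # 2\n",
"print(s.find('issi'))         # 1\n",
"print(s.rfind('issi'))        # 4\n",
"print(s.find('xyz'))          # -1\n",
"print(s.startswith('mis'))    # True\n",
"print(s.replace('ss', 'SS'))  # miSSiSSippi\n",
"try:\n",
"    s.index('xyz')            # unlike find, index raises an exception\n",
"except ValueError:\n",
"    print('substring not found')\n",
"```"
]
},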
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Trimming and adjusting\n",
"* `s.strip(x)` Removes leading and trailing whitespace by default, or characters found in string x\n",
"* `s.lstrip(x)` Same as strip but only leading characters are removed\n",
"* `s.rstrip(x)` Same as strip but only trailing characters are removed\n",
"* `s.ljust(n)` Left justifies string inside a field of length n\n",
"* `s.rjust(n)` Right justifies string inside a field of length n\n",
"* `s.center(n)` Centers string inside a field of length n"
]
},
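{
"cell_type": "markdown",
"metadata": {},
"source": [
"A brief sketch of trimming and adjusting, using made-up sample strings. Note that the parameter of `strip` is treated as a set of characters to remove, not as a prefix or suffix:\n",
"\n",
"```python\n",
"s = '  --hello--  '\n",
"print(repr(s.strip()))              # '--hello--'\n",
"print(repr(s.strip(' -')))          # 'hello'\n",
"print(repr('xxhixx'.lstrip('x')))   # 'hixx'\n",
"print(repr('xxhixx'.rstrip('x')))   # 'xxhi'\n",
"print(repr('hi'.ljust(6)))          # 'hi    '\n",
"print(repr('hi'.rjust(6)))          # '    hi'\n",
"```"
]
},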
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An example of using the `center` method and string repetition:"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-----------\n",
"| * |\n",
"| *** |\n",
"| ***** |\n",
"| ******* |\n",
"|*********|\n",
"| * |\n",
"| * |\n",
"-----------\n"
]
}
],
"source": [
"L=[1,3,5,7,9,1,1]\n",
"print(\"-\"*11)\n",
"for i in L:\n",
" s=\"*\"*i \n",
" print(\"|%s|\" % s.center(9))\n",
"print(\"-\"*11)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Joining and splitting\n",
"The `join(seq)` method joins the strings of the sequence `seq`. The string itself is used as a delimitter. An example:"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'abc--def--ghi'"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\"--\".join([\"abc\", \"def\", \"ghi\"])"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99\n",
" 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99\n"
]
}
],
"source": [
"L=list(map(lambda x : \" %s\" % x, range(100)))\n",
"s=\"\"\n",
"for x in L:\n",
" s = s + x # Don't ever do this, it creates a new string at every iteration\n",
"print(s)\n",
"print(\"\".join(L)) # This is the correct way of building a string out of smaller strings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
If you want to build a string out of smaller strings, then\n",
"first put the small strings into a list, and then use the `join` method to catenate the pieces together. It is much more efficient this way. Use the `+` catenation operator only if you have very few short strings that you want to catenate.
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The method `split(sep=None)` divides a string into pieces that are separated by the string `sep`. The pieces are returned in a list. For instance, the call `'abc--def--ghi'.split(\"--\")` will result in"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['abc', 'def', 'ghi']"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"'abc--def--ghi'.split(\"--\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If no parameters are given to the `split` method, then it splits at any sequence of white space."
]
},
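{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between splitting with and without an explicit separator can be sketched as follows. The sample strings and variable names are invented for illustration:\n",
"\n",
"```python\n",
"line = '  one   two\\tthree  '\n",
"words = line.split()             # split at any run of whitespace\n",
"print(words)                     # ['one', 'two', 'three']\n",
"\n",
"fields = 'a,b,,c'.split(',')     # an explicit separator keeps empty pieces\n",
"print(fields)                    # ['a', 'b', '', 'c']\n",
"\n",
"print(' '.join(words))           # one two three\n",
"```\n",
"\n",
"Combining `split` and `join` like this is a common way to normalise the whitespace of a string."
]
},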
{
"cell_type": "markdown",
"metadata": {},
"source": [
"####
Exercise 18 (acronyms)
\n",
"\n",
"Write function `acronyms` which takes a string as a parameter and returns a list of acronyms. A word is an acronym if it has length at least two, and all its characters are in uppercase. Before acronym detection, delete punctuation with the `strip` method.\n",
"\n",
"Test this function in the `main` function with the following call:\n",
"```python\n",
"print(acronyms(\"\"\"For the purposes of the EU General Data Protection Regulation (GDPR), the controller of your personal information is International Business Machines Corporation (IBM Corp.), 1 New Orchard Road, Armonk, New York, United States, unless indicated otherwise. Where IBM Corp. or a subsidiary it controls (not established in the European Economic Area (EEA)) is required to appoint a legal representative in the EEA, the representative for all such cases is IBM United Kingdom Limited, PO Box 41, North Harbour, Portsmouth, Hampshire, United Kingdom PO6 3AU.\"\"\"))\n",
"```\n",
"\n",
"This should return\n",
"```['EU', 'GDPR', 'IBM', 'IBM', 'EEA', 'EEA', 'IBM', 'PO', 'PO6', '3AU']```\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"####
Exercise 19 (sum equation)
\n",
"\n",
"Write a function `sum_equation` which takes a list of positive integers as parameters and returns a string with an equation of the sum of the elements.\n",
"\n",
"Example:\n",
"`sum_equation([1,5,7])`\n",
"returns\n",
"`\"1 + 5 + 7 = 13\"`\n",
"Observe, the spaces should be exactly as shown above. For an empty list the function should return the string \"0 = 0\".\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Modules"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To ease management of large programs, software is divided\n",
"into smaller pieces. In Python these pieces are called *modules*.\n",
"A module should be a unit that is as independent from other\n",
"modules as possible.\n",
"Each file in Python corresponds to a module.\n",
"Modules can contain classes, objects, functions, ...\n",
"For example, functions to handle regular expressions are in\n",
"module `re`\n",
"\n",
"The standard library of Python consists of hundreds of\n",
"modules. Some of the most common standard modules include\n",
"\n",
"* `re`\n",
"* `math`\n",
"* `random`\n",
"* `os`\n",
"* `sys`\n",
"\n",
"Any file with extension `.py` that contains Python source code\n",
"is a module. So, no special notation is needed to create a module."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using modules\n",
"\n",
"Let’s say that we need to use the cosine function.\n",
"This function, and many other mathematical functions are\n",
"located in the `math` module.\n",
"To tell Python that we want to access the features offered by\n",
"this module, we can give the statement `import math`.\n",
"Now the module is loaded into memory.\n",
"We can now call the function like this:\n",
"```python\n",
"math.cos(0)\n",
"1.0\n",
"```\n",
"\n",
"Note that we need to include the module name where the `cos`\n",
"function is found.\n",
"This is because other modules may have a function (or other\n",
"attribute of a module) with the same name.\n",
"This usage of different namespace for each module prevents\n",
"name clashes. For example, functions `gzip.open`, `os.open` are not to be confused\n",
"with the builtin `open` function."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Breaking the namespace\n",
"\n",
"If the cosine is needed a lot, then it might be tedious to\n",
"always specify the namespace, especially if the name of the\n",
"namespace/module is long.\n",
"For these cases there is another way of importing modules.\n",
"Bring a name to the current scope with\n",
"`from math import cos` statement.\n",
"Now we can use it without the namespace specifier: `cos(1)`.\n",
"\n",
"Several names can be imported to the current scope with\n",
"`from math import name1, name2, ...`\n",
"Or even all names of the module with `from math import *`\n",
"The last form is sensible only in few cases, normally it just\n",
"confuses things since the user may have no idea what names\n",
"will be imported."
]
},
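{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two import styles can be compared with a small sketch. Both names refer to the very same function object; only the way of reaching it differs:\n",
"\n",
"```python\n",
"import math                # access through the module namespace\n",
"from math import cos, pi   # bring selected names to the current scope\n",
"\n",
"print(math.cos(0))         # 1.0\n",
"print(cos(pi))             # -1.0\n",
"print(cos is math.cos)     # True\n",
"```"
]
},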
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Module lookup\n",
"\n",
"When we try to import a module `mod` with the import\n",
"statement, the lookup proceeds in the following order:\n",
"\n",
"* Check if it is a builtin module\n",
"* Check if the file `mod.py` is found in any of the folders in\n",
"the list `sys.path`. The first item in this list is the current\n",
"folder\n",
"\n",
"When Python is started, the `sys.path` list is initialised with\n",
"the contents of the `PYTHONPATH` environment variable"
]
},
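{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the lookup described above in practice, one can inspect `sys.builtin_module_names` and `sys.path`. This is only a small sketch; the folders printed depend on the installation:\n",
"\n",
"```python\n",
"import os\n",
"import sys\n",
"\n",
"# Modules compiled into the interpreter are checked first\n",
"print('sys' in sys.builtin_module_names)   # True\n",
"\n",
"# Other modules are searched from the folders in sys.path, in order\n",
"for folder in sys.path[:3]:\n",
"    print(repr(folder))\n",
"\n",
"# The __file__ attribute tells where the source file of a module is located\n",
"print(os.__file__)\n",
"```"
]
},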
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Module hierarchy\n",
"\n",
"The standard library contains hundreds of modules.\n",
"Hence, it is hard to comprehend what the library includes.\n",
"The modules therefore need to be organised somehow.\n",
"In Python the modules can be organised into hierarchies using\n",
"*packages*.\n",
"A package is a module that can contain other packages and\n",
"modules.\n",
"For example, the `numpy` package contains subpackages `core`,\n",
"`distutils`, `f2py`, `fft`, `lib`, `linalg`, `ma`, `numarray`, `oldnumeric`,\n",
"`random`, and `testing`.\n",
"And package `numpy.linalg` in turn contains modules `linalg`,\n",
"`lapack_lite` and `info`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Importing from packages\n",
"\n",
"The statement `import numpy` imports the top-level package `numpy`\n",
"and its subpackages. \n",
"\n",
"* `import numpy.linalg` imports the subpackage only, and\n",
"* `import numpy.linalg.linalg` imports the module only\n",
"\n",
"If we want to skip the long namespace specification, we can\n",
"use the form\n",
"\n",
"```python\n",
"from numpy.linalg import linalg\n",
"```\n",
"\n",
"or\n",
"\n",
"```python\n",
"from numpy.linalg import linalg as lin\n",
"```\n",
"\n",
"if we want to use a different name for the module. The following command imports the function `det` (computes the determinant of a matrix) from the module linalg, which is contained in a subpackage linalg, which belongs to package numpy:\n",
"```python\n",
"from numpy.linalg.linalg import det\n",
"```\n",
"\n",
"Had we only imported the top-level package `numpy` we would have to refer to the `det` function with the full name `numpy.linalg.linalg.det`.\n",
"\n",
"Here's a recap of the module hierarchy:\n",
"\n",
"```\n",
"numpy package\n",
" .\n",
"linalg subpackage\n",
" .\n",
"linalg module\n",
" .\n",
" det function\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Correspondence between folder and module hierarchies\n",
"\n",
"The packages are represented by folders in the filesystem.\n",
"The folder should contain a file named `__init__.py` that\n",
"makes up the package body. This handles the initialisation of\n",
"the package.\n",
"The folder may contain also further folders\n",
"(subpackages) or Python files (normal modules).\n",
"\n",
"```\n",
"a/\n",
" __init__.py\n",
" b.py\n",
" c/\n",
" __init__.py\n",
" d.py\n",
" e.py\n",
"```\n",
"![package.svg](package.svg)"
]
},
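{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of this correspondence, the snippet below builds part of the folder tree above in a temporary folder and imports the module `d` from it. The file contents (the variable `x` and the function `f`) are hypothetical, chosen only for illustration:\n",
"\n",
"```python\n",
"import importlib\n",
"import sys\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"with tempfile.TemporaryDirectory() as root:\n",
"    pkg = Path(root) / 'a'\n",
"    (pkg / 'c').mkdir(parents=True)               # folders a/ and a/c/\n",
"    (pkg / '__init__.py').write_text('')          # makes a a package\n",
"    (pkg / 'b.py').write_text('x = 1')            # module a.b\n",
"    (pkg / 'c' / '__init__.py').write_text('')    # makes a.c a subpackage\n",
"    (pkg / 'c' / 'd.py').write_text('def f(): return 42')  # module a.c.d\n",
"\n",
"    sys.path.insert(0, root)                      # make the tree importable\n",
"    try:\n",
"        d = importlib.import_module('a.c.d')\n",
"        result = d.f()\n",
"        print(d.__name__, result)                 # a.c.d 42\n",
"    finally:\n",
"        sys.path.remove(root)\n",
"```\n",
"\n",
"The dotted name `a.c.d` follows the folder path `a/c/d.py`."
]
},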
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Contents of a module\n",
"\n",
"Suppose we have a module named `mod.py`.\n",
"All the assignments, class definitions with the `class` statement,\n",
"and function definitions with `def` statement will create new\n",
"attributes to this module.\n",
"Let’s import this module from another Python file using the\n",
"`import mod` statement.\n",
"After the import we can access the attributes of the module\n",
"object using the normal dot notation: `mod.f()`,\n",
"`mod.myclass()`, `mod.a`, etc.\n",
"Note that Python doesn’t really have global variables that are\n",
"visible to all modules. All variables belong to some module\n",
"namespace.\n",
"\n",
"One can query the attributes of an object using the `dir` function. With no\n",
"parameters, it shows the attributes of the current module. Try executing `dir()` in\n",
"an IPython shell or in a Jupyter notebook! After that, define the following attributes, and try running `dir()`\n",
"again:\n",
"\n",
"```python\n",
"a=5\n",
"def f(i):\n",
" return i + 1\n",
"```\n",
"\n",
"The above definitions created a *data attribute* called `a` and a *function attribute* called `f`.\n",
"We will talk more about attributes next week when we will talk about objects.\n",
"\n",
"Just like other objects, the module object contains its\n",
"attributes in the dictionary `modulename.__dict__`\n",
"Usually a module contains at least the attributes `__name__` and\n",
"`__file__`. Other common attributes are `__version__`,\n",
"`__author__` and `__doc__` , which contains the docstring of the\n",
"module.\n",
"If the first statement of a file is a string, this is taken as the\n",
"docstring for that module. Note that the docstring of the module really must be the first non-empty non-comment line.\n",
"The attribute `__file__` is always the filename of the module.\n",
"\n",
"The module attribute `__name__` has value `“__main__”` if we in are the main program,\n",
"otherwise some other module has imported us and name\n",
"equals `__file__`."
]
},
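{
"cell_type": "markdown",
"metadata": {},
"source": [
"The module attributes discussed above can be inspected for any imported module. A minimal sketch using the standard `math` module:\n",
"\n",
"```python\n",
"import math\n",
"\n",
"print(math.__name__)                   # math\n",
"print(math.__doc__ is not None)        # True, the module has a docstring\n",
"print('cos' in dir(math))              # True\n",
"print(math.__dict__['pi'] == math.pi)  # True, attributes live in __dict__\n",
"\n",
"# In a script that is run directly, or in a notebook, this prints __main__\n",
"print(__name__)\n",
"```"
]
},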
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Python it is possible to put statements on the top-level of our module `mod` so that they don't belong to any function. For instance like this:\n",
"\n",
"```python\n",
"for _ in range(3):\n",
" print(\"Hello\")\n",
"```\n",
"\n",
"But if somebody imports our module with `import mod`, then all the statements at the top-level will be executed. This may be surprising to the user who imported the module. The user will usually say, explicitly when he/she wants to execute some code from the imported module.\n",
"\n",
"It is better style to put these statements inside some function. If they don't fit in any other function, then you can use, for example, the function named `main`, like this:\n",
"\n",
"```python\n",
"def main():\n",
" for _ in range(3):\n",
" print(\"Hello\")\n",
"\n",
"if __name__ == \"__main__\": # We call main only when this module is not being imported, but directly executed\n",
" main() # for example with 'python3 mod.py'\n",
"```\n",
"\n",
"You probably have seen this mechanism used in the exercise stubs.\n",
"Note that in Python the `main` has no special meaning, it is just our convention to use it here.\n",
"Now if somebody imports `mod`, the `for` loop won't be automatically executed. If we want, we can call it explicitly with `mod.main()`. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"####
Exercise 20 (usemodule)
\n",
"\n",
"Create your own module as file `triangle.py` in the `src` folder. The module should contain two functions:\n",
"\n",
"* `hypothenuse` which returns the length of the hypothenuse when given the lengths of two other sides of a right-angled triangle\n",
"* `area` which returns the area of the right-angled triangle, when two sides, perpendicular to each other, are given as parameters.\n",
"\n",
"Make sure both the functions and the module have descriptive docstrings. Add also the `__version__` and `__author__` attributes to the module. Call both your functions from the main function (which is in file `usemodule.py`)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"* We have learned that Python's code blocks are denoted by consistent indenting, with spaces or tabs, unlike in many other languages\n",
"* Python's `for` loops goes through all the elements of a container without the need of worrying about the positions (indices) of the elements in the container\n",
"* More generally, an iterable is an object whose elements can be gone through one by one using a `for` loop. Such as `range(1,7)`\n",
"* Python has dynamic typing: the type of a name is known only when we run the program. The type might not be fixed, that is, if a name is created, for example, in a loop, then its type might change at each iteration.\n",
"* Visibility of a name: a name that refers to a variable can disappear in the middle of a code block, if a `del` statement is issued!\n",
"* Python is good at string handling, but remember that if you want to concatenate large number of strings, use the `join` method. Concatenating by the `+` operator multiple times is very inefficient\n",
"* Several useful tools exist to process sequences: `map`, `reduce`, `filter`, `zip`, `enumerate`, and `range`. The unnamed lambda function can be helpful with these tools. Note that these tools (except the `reduce`) don't return lists, but iterables, for efficiency reasons: Most often we don't want to store the result from these tools to a container (such as a list), we may only want to iterate through the result!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}