Web scraping using Python

This is a continuation of Python Libraries.

In the previous article I briefly went through Python's control and loop statements in comparison with C. I also discussed how Jupyter Notebook works and used it to look at the Python core libraries.

Key Takeaway

I have already discussed the if-else, for and while compound statements used to control program flow. Today I will discuss another compound statement, "with ... as". I will also cover input(), file operations, string manipulation and web scraping with the Beautiful Soup package. This will be followed by a demo in Jupyter Notebook.

The Beautiful Soup demo is only meant to show what Python can do. If you are looking for a quick website creation solution, you can find experts at this link.

Input from Keyboard 

  •  C function 
    • int scanf(const char *format, ...)
  • Python
    • input([prompt])
    • The prompt, if given, is written to standard output without a trailing newline; the line typed by the user is returned as a string
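
Because input() always returns a string, numeric input has to be converted explicitly, which is the part scanf() handles through its format string. Here is a minimal sketch of that idea. The function name ask_number and its reader parameter are made up for this illustration; reader defaults to the built-in input() and only exists so the function can be tried without a keyboard.

```python
def ask_number(prompt="Enter a number: ", reader=input):
    """Keep asking until the typed text can be converted to an int."""
    while True:
        text = reader(prompt)          # input() returns the typed line as a str
        try:
            return int(text)           # int() ignores surrounding whitespace
        except ValueError:
            print(f"{text!r} is not a whole number, please try again.")
```

In a notebook cell, ask_number() would simply wait for keyboard input and return the integer once a valid value is typed.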

String manipulation

Here is a comparison between some of the C library string manipulation functions and the corresponding Python operations.

Calculating the length of a string

    • C
      • size_t strlen(const char *str)
    • Python 
      • len(s1)

Copy a string to another string

    • C
      • char *strcpy(char *dest, const char *src)
    • Python
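      • s1=s2 (s1 now refers to the same string object as s2; Python strings are immutable, so no separate copy is needed)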

Concatenating (joining) two strings

    • C
      • char *strcat(char *dest, const char *src)
    • Python
      • s3=s1+s2

Comparing two strings

    • C
      • int strcmp(const char *str1, const char *str2)
    • Python
      • s1 == s2, s1 < s2, s1 > s2 (max(s1,s2) returns the string that sorts later, not a comparison result)
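
Python has no direct strcmp() equivalent, but the relational operators give the same ordering information. The helper below is only an illustration (the name strcmp is borrowed from C and is not a Python built-in). It returns -1, 0 or 1, while C only promises a negative, zero or positive value.

```python
def strcmp(s1, s2):
    """Return -1, 0 or 1 depending on how s1 sorts relative to s2."""
    # True and False behave as 1 and 0, so the subtraction yields the sign.
    return (s1 > s2) - (s1 < s2)


first = "apple"
second = "banana"
order = strcmp(first, second)      # -1, because "apple" sorts before "banana"
larger = max(first, second)        # "banana": max() picks the later string
```

Note that Python compares strings by Unicode code point, so uppercase letters sort before lowercase ones, just as they do with strcmp() on ASCII text.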

Converting string to lowercase

    • C
      • char *strlwr(char *string); (not part of standard C, provided by some compilers)
    • Python
      • s1.lower()

Converting string to uppercase

    • C
      • char *strupr(char *string); (not part of standard C, provided by some compilers)
    • Python
      • s1.upper()
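
The Python operations listed above can be tried together in one notebook cell. This is a small sketch with made-up sample values. Unlike the C functions, none of these operations modifies a string in place; each one returns a new string.

```python
s2 = "Embedded"
s1 = s2                      # "copy": s1 and s2 now name the same string object
length = len(s1)             # 8 characters, like strlen()
s3 = s1 + " Python"          # concatenation builds a new string, like strcat()
lowered = s3.lower()         # "embedded python"
raised = s3.upper()          # "EMBEDDED PYTHON"

print(length, s3, lowered, raised)
print(s1 is s2)              # True: assignment did not create a second string
```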

File operations 

  • C
    • FILE *fopen(const char *filename, const char *mode)
    • int fclose(FILE *stream)
  • Python
    • open()
    • file.close()
    • with ... as
    • Python's with statement supports the concept of a runtime context defined by a context manager. This is implemented using a pair of methods, __enter__() and __exit__(), that allow user-defined classes to define a runtime context that is entered before the statement body is executed and exited when the statement ends.

      • The entry method does the initialization, while the exit method does the necessary cleanup, even if the body raises an exception. This leaves the program in a stable (exception-safe) state, an idea supported by many other object-oriented languages as well. We will check this in the demo with automatic file closure.
        • with_stmt ::= "with" with_item ("," with_item)* ":" suite
        • with_item ::= expression ["as" target]
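
The following sketch illustrates both points: a file opened in a with statement is closed automatically, and a user-defined class can take part in the same protocol. The class name Announce and both function names are invented for this illustration, and the file is written to a temporary folder so nothing is left behind.

```python
import os
import tempfile


class Announce:
    """Hypothetical context manager that records when it is entered and exited."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")       # initialization step
        return self                    # this value is bound to the "as" target

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("exit")        # cleanup step, runs even after an error
        return False                   # False means: do not swallow the exception


def file_is_closed_after_with():
    """Write a small file and report whether it was closed automatically."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.txt")
        with open(path, "w") as f:
            f.write("hello\n")
        return f.closed                # no explicit f.close() was needed


def exception_safety_log():
    """Show that __exit__ runs before the exception reaches the handler."""
    log = []
    try:
        with Announce(log):
            log.append("body")
            raise ValueError("something went wrong")
    except ValueError:
        log.append("handled")
    return log
```

Calling file_is_closed_after_with() returns True, and exception_safety_log() returns the steps in the order enter, body, exit, handled.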

Anaconda Packages

    There are many Python-based libraries available in Anaconda Cloud. Let's try some web scraping with Python libraries.

  • Requests Library
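    • This library lets you make HTTP requests from Python
    • Check that the package is available (conda search lists the versions found in the configured channels, while conda list requests shows whether it is already installed)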
    • C:\>conda search requests
      Loading channels: done
      # Name Version Build Channel
      requests 0.13.9 py26_0 pkgs/free
      requests 0.13.9 py27_0 pkgs/free
      :::::::::::::::::::::::::::::::::::::::::::::::::::::
      requests 2.21.0 py37_0 pkgs/main
  • Beautiful Soup
    • Beautiful Soup is a Python library for extracting data from HTML, XML and other markup languages
    • Check that this package is available in Anaconda and install it if it is not already present
C:\>conda search beautiful-soup
Loading channels: done
# Name Version Build Channel
beautiful-soup 4.3.1 py26_0 pkgs/free
:::::::::::::::::::::::::::::::::::::::::::::::::::
beautiful-soup 4.3.2 py33_1 pkgs/free
beautiful-soup 4.3.2 py34_0 pkgs/free
beautiful-soup 4.3.2 py34_1 pkgs/free
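
With both packages in place, a basic scraping flow is to fetch a page with Requests and hand the HTML to Beautiful Soup. The code below is only a rough sketch of that flow, not the code from the video. The function names and the optional session parameter are assumptions made for this example, and the built-in "html.parser" is used so that no extra parser package is needed. Beautiful Soup is imported under the module name bs4.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, session=None, timeout=10):
    """Download a page and return its HTML text."""
    # A requests.Session (or any object with a compatible get()) may be
    # passed in; otherwise the module-level requests.get() is used.
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    response.raise_for_status()        # raise an exception on 4xx/5xx replies
    return response.text


def extract_title_and_links(html):
    """Return the page title and a list of (link text, href) pairs."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    links = [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)   # skip anchors without href
    ]
    return title, links
```

A typical call would be extract_title_and_links(fetch_html(url)) with the address of a page you are allowed to scrape. It is worth checking the site's terms and robots.txt first.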

Let's look at the video at the YouTube link.

Reference:

Coding style : https://docs.python-guide.org/writing/style/

This article is discussed in the Web scraping using Python video.

Please subscribe to the Embedkari YouTube channel for additional embedded-related content.

Embedkari provides 100-minute videos on step-by-step Python use with Cloud and Machine Learning. These videos are part of the low-cost training and are priced at Rs 500 only. Please check the details of Low Cost Training Option1.
