Web Scraping using Python

This blog continues my last blog. Please SUBSCRIBE to the YouTube channel Embedkari to access detailed videos on this and other important topics.

In the last post I walked through Python's control and loop statements alongside their C equivalents, and discussed working with Jupyter Notebook and the Python core libraries.

Key Takeaway

   I have already discussed the if-else, for, and while compound statements used to control program flow. Today I will discuss another compound statement, "with ... as". Apart from that I will cover input(), file operations, string manipulation, and web scraping with the Beautiful Soup package. This will be followed by a demo in Jupyter Notebook.

The Beautiful Soup demo is just to show Python's capability. If you are looking for a quick website-creation solution, you can find experts like in this link.

Input from Keyboard 

  •  C function 
    • int scanf(const char *format, …)
  • Python
    • input([prompt])
    • The prompt, if given, is written to standard output without a trailing newline; input() then reads one line from standard input and returns it as a string (without the newline)
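A minimal sketch of input() in action. Stdin is redirected from a string here so the example runs non-interactively; the prompt text and the value 42 are just illustrative.

```python
import io
import sys

# input([prompt]) writes the prompt to stdout without a trailing
# newline, then returns one line read from stdin (newline stripped).
def read_age():
    return int(input("Enter your age: "))

# Redirect stdin so the sketch runs without a keyboard.
sys.stdin = io.StringIO("42\n")
age = read_age()
print(age)
```

This is roughly the Python counterpart of C's scanf("%d", &age), except that input() always returns a string and the conversion to int is explicit.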

String manipulation 

Here is a comparison between some of the C library string-manipulation functions and the corresponding Python equivalents. 

  • Calculating the length of string
    • C
      • size_t strlen(const char *str)
    • Python 
      • len(s1)
  • Copy a string to another string
    • C
      • char *strcpy(char *dest, const char *src)
    • Python
      • s1=s2
  •  Concatenation(joins) of two strings
    • C
      • char *strcat(char *dest, const char *src)
    • Python
      • s3=s1+s2
  • Comparison of two strings
    • C
      • int strcmp(const char *str1, const char *str2)
    • Python
      • s1 == s2, s1 < s2 (relational operators; compare lexicographically)
  • Converting string to lowercase
    • C
      • char *strlwr(char *string); (non-standard C extension)
    • Python
      • s1.lower()
  •  Converts string to uppercase
    • C
      • char *strupr(char *string); (non-standard C extension)
    • Python
      • s1.upper()
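The comparisons above can be sketched in a few lines of Python; the sample strings are arbitrary. Note that since Python strings are immutable, plain assignment behaves like a copy.

```python
s1 = "embed"
s2 = "kari"

length = len(s1)        # C: strlen -> 5
s_copy = s1             # C: strcpy (rebinding; safe because strings
                        # are immutable)
s3 = s1 + s2            # C: strcat -> "embedkari"
equal = (s1 == s2)      # C: strcmp(s1, s2) == 0 -> False
ordered = (s1 < s2)     # C: strcmp(s1, s2) < 0, lexicographic -> True
upper = s3.upper()      # C: strupr -> "EMBEDKARI"
lower = "KARI".lower()  # C: strlwr -> "kari"

print(length, s3, equal, ordered, upper, lower)
```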

File operations 

  • C
    • FILE *fopen(const char *filename, const char *mode)
    • int fclose(FILE *stream)
  • Python
    • open()
    • file.close()
    • with ... as 
    • Python's with statement supports the concept of a runtime context defined by a context manager. This is implemented with a pair of methods (__enter__ and __exit__) that let user-defined classes define a runtime context that is entered before the statement body executes and exited when the statement ends.

      • Entry performs initialization and exit performs the necessary cleanup. This helps achieve a stable, exception-safe state, as supported by many other OOP languages as well. We will check this in the demo with automatic file closure. 
        • with_stmt ::= “with” with_item (“,” with_item)* “:” suite
        • with_item ::= expression [“as” target]
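The automatic cleanup described above can be sketched with file I/O; the file path and contents are illustrative, and a temp directory is used to keep the example self-contained.

```python
import os
import tempfile

# Illustrative path in the system temp directory.
path = os.path.join(tempfile.gettempdir(), "embedkari_demo.txt")

# open() returns a context manager: __enter__ yields the file
# object, and __exit__ closes it automatically, even if the
# with body raises an exception.
with open(path, "w") as f:
    f.write("hello from the with statement\n")
print(f.closed)   # the handle is already closed here

with open(path) as f:
    first_line = f.readline()

os.remove(path)   # tidy up the demo file
print(first_line.rstrip())
```

This replaces the explicit fopen()/fclose() pairing in C, where a forgotten fclose() (or an early return) leaks the handle.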

Anaconda Packages

    There are many Python-based libraries available on Anaconda Cloud. Let's try some web scraping with Python libraries. 

  • Requests Library
    • This allows you to make HTTP requests from Python
    • Verify its installation 
    • C:\>conda search requests
      Loading channels: done
      # Name Version Build Channel
      requests 0.13.9 py26_0 pkgs/free
      requests 0.13.9 py27_0 pkgs/free
      requests 2.21.0 py37_0 pkgs/main
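Once the package is installed, a GET request is a one-liner. A minimal sketch, assuming network access; the URL is just a placeholder.

```python
import requests  # third-party; install with: conda install requests

# Fetch a page; timeout avoids hanging forever on a dead host.
resp = requests.get("https://example.com", timeout=10)
print(resp.status_code)                   # 200 on success
print(resp.headers.get("Content-Type"))   # e.g. text/html; charset=...
html = resp.text                          # body decoded to a string
print(len(html) > 0)
```

The resp.text string is what we will hand to Beautiful Soup for parsing.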
  • Beautiful Soup 
    • Beautiful Soup is a Python library to extract data from HTML, XML and other markup languages
  • Verify that this package is installed in Anaconda. Install it if not found
  • C:\>conda search beautiful-soup
    Loading channels: done
    # Name Version Build Channel
    beautiful-soup 4.3.1 py26_0 pkgs/free
    beautiful-soup 4.3.2 py33_1 pkgs/free
    beautiful-soup 4.3.2 py34_0 pkgs/free
    beautiful-soup 4.3.2 py34_1 pkgs/free
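A minimal Beautiful Soup sketch, assuming the package is installed (it is imported as bs4). A small hardcoded page stands in for a downloaded one; in a real scrape the HTML would come from requests.get(url).text.

```python
from bs4 import BeautifulSoup  # conda package: beautiful-soup

# Illustrative HTML standing in for a fetched page.
html = """
<html><head><title>Embedkari Demo</title></head>
<body>
  <a href="https://example.com/one">First</a>
  <a href="https://example.com/two">Second</a>
</body></html>
"""

# Parse with the stdlib parser; extract the title and all link targets.
soup = BeautifulSoup(html, "html.parser")
title = str(soup.title.string)
links = [a["href"] for a in soup.find_all("a")]
print(title)
print(links)
```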
Let's look at the demo at the YouTube link.

Reference :

Coding style : https://docs.python-guide.org/writing/style/



