Extracting Wikipedia data in Python

Wikipedia is the world's largest free encyclopedia. Its data is easily accessible through a Python library called wikipedia. In this tutorial, we will see how to use it.

Installation

Before using this API, we first have to install it manually, because it is not a built-in module. Just type the following command in your command prompt.

$ pip install wikipedia

Searching Wikipedia

First, we will see how to search Wikipedia for a query with this API.

Search Method

The search method returns the list of search results for our query. Just like Google, Wikipedia has its own search engine. Here is the code.

import wikipedia
result = wikipedia.search("Tesla Inc.")
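By default the search returns up to 10 titles. If we only want a few, the results parameter of search() limits the count. Here is a minimal sketch; the top_results helper and its wiki argument are hypothetical names added for this example (wiki only lets us swap in a stand-in module when testing), and the wikipedia package is assumed to be installed.

```python
def top_results(query, limit=5, wiki=None):
    """Return at most `limit` article titles that match `query`."""
    if wiki is None:
        # Imported lazily; assumes `pip install wikipedia` was run.
        import wikipedia as wiki
    # `results` caps how many titles the search gives back.
    return wiki.search(query, results=limit)
```

Calling top_results("Tesla Inc.", limit=3) should then give back a list of at most three titles.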

Suggestions

Wikipedia can also suggest an alternative query when we use the search() method. If we pass the suggestion parameter to this method, it also returns a suggestion for our query (if any).

result = wikipedia.search("Tesla Inc.", suggestion=True)

Result

(["Tesla Inc.", ....], None)

This returns a tuple containing our search results and the suggestion. There is no suggestion for our query, which is why the second element is None.
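Since the return value is a tuple, we can unpack it into two variables and use the suggestion when the search itself finds nothing. This is only a sketch of one possible approach; search_following_suggestion and its wiki argument are hypothetical names made up for this example.

```python
def search_following_suggestion(query, wiki=None):
    """Search for `query`; if nothing matches, retry with the suggestion."""
    if wiki is None:
        # Imported lazily; assumes `pip install wikipedia` was run.
        import wikipedia as wiki
    # With suggestion=True the call returns (list_of_titles, suggestion_or_None).
    titles, suggestion = wiki.search(query, suggestion=True)
    if not titles and suggestion is not None:
        # No direct hits, but a suggestion was offered: search for that instead.
        titles = wiki.search(suggestion)
    return titles
```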

Getting Summary

With this API, we can get the summary of any article published on Wikipedia by its title. For this, we just have to run the following code:

wikipedia.summary("Google maps")

You have to pass the title of an article published on Wikipedia.
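A title can match several articles, or none at all. In those cases the library raises DisambiguationError or PageError, so it can help to catch them. The sketch below also uses the sentences parameter of summary() to shorten the text. The short_summary helper, its wiki argument, and the wording of the returned message are assumptions made for this example.

```python
def short_summary(title, sentences=2, wiki=None):
    """Return a short summary, a hint for ambiguous titles, or None if missing."""
    if wiki is None:
        # Imported lazily; assumes `pip install wikipedia` was run.
        import wikipedia as wiki
    try:
        # `sentences` limits the summary to the first few sentences.
        return wiki.summary(title, sentences=sentences)
    except wiki.exceptions.DisambiguationError as err:
        # The title fits several articles; err.options lists the candidates.
        return "Ambiguous title, try one of: " + ", ".join(err.options[:5])
    except wiki.exceptions.PageError:
        # No article matches this title.
        return None
```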

Languages

The wikipedia module lets us change the language in which we want to read the articles.

wikipedia.set_lang("fr")

In the above code, fr is the language code for French.
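Keep in mind that set_lang() changes the language for every later call, not only the next one. A small sketch of fetching one summary in another language and then switching back to English (the library's default) could look like this. summary_in_language and its wiki argument are hypothetical names, and the title is assumed to exist on that language's Wikipedia.

```python
def summary_in_language(title, lang, wiki=None):
    """Return the summary of `title` from the `lang` edition of Wikipedia."""
    if wiki is None:
        # Imported lazily; assumes `pip install wikipedia` was run.
        import wikipedia as wiki
    wiki.set_lang(lang)  # affects all following calls
    try:
        return wiki.summary(title)
    finally:
        # Switch back to English so later calls behave as before.
        wiki.set_lang("en")
```

For example, summary_in_language("Inde", "fr") would ask the French Wikipedia for its article about India.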

Supported Languages

To get the list of supported languages, run this code:

wikipedia.languages()

The languages() method returns a dictionary that maps each language code to the name of that language, with one entry for every language in which Wikipedia articles are written.
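This makes it easy to check whether a language code is supported before passing it to set_lang(). A minimal sketch, where language_name and its wiki argument are hypothetical names for this example:

```python
def language_name(code, wiki=None):
    """Return the language name for `code`, or None if it is not supported."""
    if wiki is None:
        # Imported lazily; assumes `pip install wikipedia` was run.
        import wikipedia as wiki
    # languages() gives a dict keyed by language code.
    return wiki.languages().get(code)
```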

Page Details

This API also lets us access any page hosted on the Wikipedia website. To access the page details, first run this code:

India = wikipedia.page("India")

Now we will use this India variable (a WikipediaPage instance) to get the page details.

Title and URL

To get the title of the page, just enter the following code:

India.title

To get the URL of the page, we can enter the following code:

India.url

Content

To get the full text of the article, just run the following code:

India.content
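The content of a large article can be very long, so printing all of it is rarely useful. One possible way to collect the page details from this section in a single place, with only a short preview of the content, is sketched below. page_overview, its preview_chars and wiki arguments, and the keys of the returned dictionary are all assumptions made for this example.

```python
def page_overview(title, preview_chars=200, wiki=None):
    """Return the title, URL and a short content preview of a page."""
    if wiki is None:
        # Imported lazily; assumes `pip install wikipedia` was run.
        import wikipedia as wiki
    page = wiki.page(title)
    return {
        "title": page.title,
        "url": page.url,
        # Only the first `preview_chars` characters of the article text.
        "preview": page.content[:preview_chars],
        # Total number of characters in the article text.
        "length": len(page.content),
    }
```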

Thanks

Read the full article on TECHWITHPIE.