Classification in Data Mining

Classification is a data mining technique that assigns categories to a collection of data to support more accurate prediction and analysis. Sometimes loosely referred to as a decision tree, after one of its best-known methods, classification is one of several techniques intended to make the analysis of very large datasets effective.

Why Classification?

Very large databases are becoming the norm in today's world of "big data." Imagine a database holding many terabytes of data; a terabyte is one trillion bytes.

Facebook alone collects 600 terabytes of new data every single day (as of 2014, the last time it reported these figures). The primary challenge of big data is how to make sense of it.

And sheer volume is not the only problem: big data also tends to be diverse, unstructured, and fast-changing. Consider audio and video data, social media posts, 3D data, or geospatial data. This kind of data is not easily categorized or organized.

To meet this challenge, a range of automatic methods for extracting useful information has been developed, classification among them.

How Classification Works

At the risk of moving too far into tech-speak, let's discuss how classification works. The goal is to create a set of classification rules that will answer a question, make a decision, or predict behavior. To start, a set of training data is assembled that contains a certain set of attributes along with the outcome for each record.

The job of the classification algorithm is to discover how that set of attributes leads to its outcome.

Scenario: Suppose a credit card company is trying to decide which prospects should receive a credit card offer.

This might be its set of training data:

Training Data
Name       Age   Gender   Annual Income   Credit Card Offer
John Doe   25    M        $39,500         No
Jane Doe   56    F        $125,000        Yes
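
As a rough illustration, the training table above could be held in code as a list of records, with the attributes kept apart from the known outcome. The `Prospect` class and the `split_features_and_label` helper are names invented for this sketch; they do not come from any particular library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prospect:
    """One row of the credit card table (hypothetical structure)."""
    name: str
    age: int
    gender: str
    annual_income: int                 # US dollars
    card_offer: Optional[str] = None   # "Yes"/"No"; None when not yet known


# The two training records shown in the table.
TRAINING_DATA = [
    Prospect("John Doe", 25, "M", 39_500, "No"),
    Prospect("Jane Doe", 56, "F", 125_000, "Yes"),
]


def split_features_and_label(prospect):
    """Separate the attributes from the outcome the algorithm must explain."""
    features = {
        "age": prospect.age,
        "gender": prospect.gender,
        "annual_income": prospect.annual_income,
    }
    return features, prospect.card_offer


for prospect in TRAINING_DATA:
    print(split_features_and_label(prospect))
```

Keeping the outcome in its own field makes it easy to reuse the same record type later for data where the outcome is still unknown.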

The "predictor" columns Age, Gender, and Annual Income determine the value of the "prediction attribute," Credit Card Offer. In a training set, the prediction attribute is known. The classification algorithm then tries to determine how its value was reached: what relationships exist between the predictors and the decision? It develops a set of prediction rules, usually expressed as IF/THEN statements, for example:

IF (Age > 18 AND Age < 75) AND Annual Income > 40,000 THEN Credit Card Offer = Yes
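
A rule of this kind translates almost directly into code. The sketch below assumes the age test is meant as a range (older than 18 and younger than 75); the function name `credit_card_offer` is made up for illustration.

```python
def credit_card_offer(age: int, annual_income: float) -> str:
    """Apply the example IF/THEN rule to one prospect.

    Assumption: the age condition is read as a range, 18 < age < 75.
    """
    if 18 < age < 75 and annual_income > 40_000:
        return "Yes"
    return "No"


# The two training records from the scenario.
print(credit_card_offer(25, 39_500))    # John Doe -> No
print(credit_card_offer(56, 125_000))   # Jane Doe -> Yes
```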

Obviously, this is a simple example, and the algorithm would need a far larger data sample than the two records shown here. Further, the prediction rules are likely to be much more complex, including sub-rules that capture attribute details.
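
To give a feel for how a rule might be derived from a larger sample, here is a minimal sketch of a one-attribute learner that searches for the income cutoff that best separates "Yes" from "No". This is a deliberately simplified stand-in, not the method any particular product uses. Only the first two records come from the scenario; the other four are hypothetical and exist only to make the search meaningful.

```python
def learn_income_threshold(records):
    """Find the cutoff t for the rule: income > t -> "Yes", else "No".

    records: list of (annual_income, label) pairs, label is "Yes" or "No".
    Returns (best_threshold, accuracy_on_these_records).
    """
    incomes = sorted({income for income, _ in records})
    if len(incomes) < 2:
        raise ValueError("need at least two distinct incomes")

    # Candidate cutoffs: midpoints between neighbouring distinct incomes.
    candidates = [(low + high) / 2 for low, high in zip(incomes, incomes[1:])]

    best_threshold, best_correct = None, -1
    for threshold in candidates:
        correct = sum(
            (income > threshold) == (label == "Yes")
            for income, label in records
        )
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold, best_correct / len(records)


RECORDS = [
    (39_500, "No"),     # John Doe (from the scenario)
    (125_000, "Yes"),   # Jane Doe (from the scenario)
    (28_000, "No"),     # hypothetical
    (36_000, "No"),     # hypothetical
    (41_000, "Yes"),    # hypothetical
    (52_000, "Yes"),    # hypothetical
]

print(learn_income_threshold(RECORDS))   # (40250.0, 1.0)
```

A real classifier would consider every attribute, combine several tests into sub-rules, and guard against fitting noise in the sample.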

Next, the algorithm is given a "prediction set" of data to analyze, but this set lacks the prediction attribute (or decision):

Predictor Data
Name          Age   Gender   Annual Income   Credit Card Offer
Jack Frost    42    M        $88,000
Mary Murray   16    F        $0
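
Filling in the missing column amounts to running each record through the prediction rules. The sketch below applies the example rule from this article, again assuming the age test is a range (18 < age < 75); `predict_offer` is a hypothetical helper name.

```python
# The two records from the prediction set; the outcome is not yet known.
PREDICTION_SET = [
    {"name": "Jack Frost", "age": 42, "gender": "M", "annual_income": 88_000},
    {"name": "Mary Murray", "age": 16, "gender": "F", "annual_income": 0},
]


def predict_offer(record):
    """Apply the example rule; assumes the age test means 18 < age < 75."""
    if 18 < record["age"] < 75 and record["annual_income"] > 40_000:
        return "Yes"
    return "No"


predictions = {record["name"]: predict_offer(record) for record in PREDICTION_SET}
print(predictions)   # {'Jack Frost': 'Yes', 'Mary Murray': 'No'}
```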

This predictor data helps estimate the accuracy of the prediction rules, and the rules are then tweaked until the developer considers the predictions effective and useful.
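
Scoring a rule requires records whose outcome is already known, so the sketch below measures accuracy against the two labelled training records from the scenario and compares two candidate income cutoffs. The `make_rule` and `accuracy` helpers and the 30,000 alternative are assumptions made for illustration. In practice a rule would be scored on labelled records kept out of training.

```python
LABELLED = [
    {"name": "John Doe", "age": 25, "annual_income": 39_500, "card_offer": "No"},
    {"name": "Jane Doe", "age": 56, "annual_income": 125_000, "card_offer": "Yes"},
]


def make_rule(min_income):
    """Build a rule with an adjustable income cutoff (age range fixed)."""
    def rule(record):
        if 18 < record["age"] < 75 and record["annual_income"] > min_income:
            return "Yes"
        return "No"
    return rule


def accuracy(rule, labelled_records):
    """Fraction of records where the rule matches the known outcome."""
    hits = sum(rule(r) == r["card_offer"] for r in labelled_records)
    return hits / len(labelled_records)


# Compare the original cutoff with a looser, hypothetical one.
for cutoff in (40_000, 30_000):
    print(cutoff, accuracy(make_rule(cutoff), LABELLED))
# 40000 1.0
# 30000 0.5
```

Lowering the cutoff to 30,000 wrongly offers a card to John Doe, which is the kind of signal a developer would use when tweaking the rules.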

Day-to-Day Examples of Classification

Classification, along with other data mining techniques, is behind much of our day-to-day experience as consumers.

Weather forecasts might use classification to report whether the day will be rainy, sunny, or cloudy. Medical professionals might analyze health conditions to predict medical outcomes. One classification method, naive Bayes, uses conditional probability to categorize spam emails. From fraud detection to product offers, classification works behind the scenes every day, analyzing data and producing predictions.
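
To make the spam example concrete, here is a minimal naive Bayes sketch in plain Python. It assumes whitespace tokenization and add-one (Laplace) smoothing, and the four training messages are invented for illustration. Real spam filters use far larger corpora and more careful text processing.

```python
import math
from collections import Counter, defaultdict


def train(messages):
    """messages: list of (text, label) pairs. Returns a simple model tuple."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()             # label -> number of messages
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set().union(*word_counts.values())
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability score."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Log prior: share of training messages carrying this label.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue   # ignore words never seen in training
            # Conditional probability of the word given the label,
            # with add-one smoothing so unseen pairs are not zero.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training messages.
MESSAGES = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

model = train(MESSAGES)
print(classify("claim your free prize", model))    # spam
print(classify("team meeting on monday", model))   # ham
```

The "naive" part is the assumption that each word contributes independently to the score, which is why the per-word log probabilities can simply be added.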